Unicode filenames on Windows with Python & subprocess.Popen()

Updated: 2024-10-27 18:21:50

Problem description

Why does the following occur:

    >>> u'\u0308'.encode('mbcs')  # UMLAUT
    '\xa8'
    >>> u'\u041A'.encode('mbcs')  # CYRILLIC CAPITAL LETTER KA
    '?'
    >>>

I have a Python application accepting filenames from the operating system. It works for some international users, but not others.

For example, this unicode filename: u'\u041a\u0433\u044b\u044b\u0448\u0444\u0442'

will not encode with Windows 'mbcs' encoding (the one used by the filesystem, returned by sys.getfilesystemencoding()). I get '???????', indicating the encoder fails on those characters. But this makes no sense, since the filename came from the user to begin with.

Update: Here's the background to my reasons behind this. I have a file on my system with a name in Cyrillic. I want to call subprocess.Popen() with that file as an argument. Popen won't handle Unicode. Normally I can get away with encoding the argument with the codec given by sys.getfilesystemencoding(). In this case it won't work.
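The '???????' result described above is consistent with a lossy encode into a code page that has no Cyrillic letters. On Windows, 'mbcs' encodes with the system ANSI code page. The sketch below is an illustration only: because the 'mbcs' codec exists only on Windows, it uses cp1252 (Western European) and cp1251 (Cyrillic) as stand-ins, and the helper name round_trips is hypothetical.

```python
# Illustration only: cp1252 and cp1251 stand in for whatever ANSI code page
# 'mbcs' would map to on a given Windows machine (an assumption of this sketch).
name = u'\u041a\u0433\u044b\u044b\u0448\u0444\u0442'

# A Western European code page has no Cyrillic letters, so each character
# is replaced by '?' when errors='replace' is used.
western = name.encode('cp1252', 'replace')

# A Cyrillic code page can represent every character in the name.
cyrillic = name.encode('cp1251', 'replace')


def round_trips(text, codepage):
    """Return True if text survives an encode/decode cycle in codepage."""
    return text.encode(codepage, 'replace').decode(codepage) == text


print(western)
print(round_trips(name, 'cp1251'), round_trips(name, 'cp1252'))
```

If this is what is happening, the filename is valid on the (Unicode) filesystem but cannot be expressed in the machine's 8-bit code page, which would explain why it works for some international users and not for others.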

Solution

In Py3K (at least from Python 3.2), subprocess.Popen and sys.argv work consistently with (default Unicode) strings on Windows. CreateProcessW and GetCommandLineW are evidently used.
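A minimal sketch of the Python 3 behaviour described above: a file with a Cyrillic name is created in a temporary directory and its path is passed, as an ordinary str, to a child Python process. The helper names child_sees_file and demo are hypothetical, and the sketch assumes the local filesystem accepts such names.

```python
import os
import subprocess
import sys
import tempfile


def child_sees_file(path):
    """Pass path as a plain str argument and let the child check that it exists."""
    child_code = "import os, sys; sys.exit(0 if os.path.isfile(sys.argv[1]) else 1)"
    return subprocess.call([sys.executable, "-c", child_code, path]) == 0


def demo():
    """Create a Cyrillic-named file and hand it to a subprocess."""
    name = u'\u041a\u0433\u044b\u044b\u0448\u0444\u0442.txt'
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, name)
        with open(path, 'w') as handle:
            handle.write('x')
        return child_sees_file(path)


if __name__ == "__main__":
    print(demo())
```

No manual encoding with sys.getfilesystemencoding() is involved here; the argument stays a Unicode string all the way to the child.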

In Python 2 (up to v2.7.2 at least), subprocess.Popen is buggy with Unicode arguments. It sticks to CreateProcessA, while the os.* functions are consistent with Unicode. shlex.split adds further nonsense.

Pywin32's win32process.CreateProcess also doesn't auto-switch to the W version, nor is there a win32process.CreateProcessW. Same with GetCommandLine. Thus ctypes.windll.kernel32.CreateProcessW... needs to be used. The subprocess module perhaps should be fixed regarding this issue.
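What a direct CreateProcessW call through ctypes might look like is sketched below. This is not the patch referenced in this answer and not what subprocess does internally; it is a minimal illustration that only starts a process, waits for it, and returns the exit code. The function names build_command_line and run_unicode are hypothetical, and the sketch is written for Python 3 syntax and has no handling for pipes, environment, or working directory.

```python
import ctypes
import subprocess
import sys


def build_command_line(args):
    """Join arguments into one Unicode command line using Windows quoting rules."""
    return subprocess.list2cmdline(args)


def run_unicode(args):
    """Start a process via CreateProcessW and return its exit code (Windows only)."""
    if sys.platform != "win32":
        raise OSError("CreateProcessW is only available on Windows")
    from ctypes import wintypes

    class STARTUPINFOW(ctypes.Structure):
        _fields_ = [
            ("cb", wintypes.DWORD), ("lpReserved", wintypes.LPWSTR),
            ("lpDesktop", wintypes.LPWSTR), ("lpTitle", wintypes.LPWSTR),
            ("dwX", wintypes.DWORD), ("dwY", wintypes.DWORD),
            ("dwXSize", wintypes.DWORD), ("dwYSize", wintypes.DWORD),
            ("dwXCountChars", wintypes.DWORD), ("dwYCountChars", wintypes.DWORD),
            ("dwFillAttribute", wintypes.DWORD), ("dwFlags", wintypes.DWORD),
            ("wShowWindow", wintypes.WORD), ("cbReserved2", wintypes.WORD),
            ("lpReserved2", ctypes.c_void_p), ("hStdInput", wintypes.HANDLE),
            ("hStdOutput", wintypes.HANDLE), ("hStdError", wintypes.HANDLE),
        ]

    class PROCESS_INFORMATION(ctypes.Structure):
        _fields_ = [
            ("hProcess", wintypes.HANDLE), ("hThread", wintypes.HANDLE),
            ("dwProcessId", wintypes.DWORD), ("dwThreadId", wintypes.DWORD),
        ]

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # Declare signatures so handles are not truncated on 64-bit Windows.
    kernel32.CreateProcessW.argtypes = [
        wintypes.LPCWSTR, wintypes.LPWSTR, ctypes.c_void_p, ctypes.c_void_p,
        wintypes.BOOL, wintypes.DWORD, ctypes.c_void_p, wintypes.LPCWSTR,
        ctypes.POINTER(STARTUPINFOW), ctypes.POINTER(PROCESS_INFORMATION),
    ]
    kernel32.CreateProcessW.restype = wintypes.BOOL
    kernel32.WaitForSingleObject.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    kernel32.WaitForSingleObject.restype = wintypes.DWORD
    kernel32.GetExitCodeProcess.argtypes = [
        wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
    kernel32.GetExitCodeProcess.restype = wintypes.BOOL
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]
    kernel32.CloseHandle.restype = wintypes.BOOL

    startup = STARTUPINFOW()
    startup.cb = ctypes.sizeof(startup)
    info = PROCESS_INFORMATION()
    # CreateProcessW may modify the command line, so pass a writable buffer.
    cmdline = ctypes.create_unicode_buffer(build_command_line(args))

    ok = kernel32.CreateProcessW(None, cmdline, None, None, False, 0, None, None,
                                 ctypes.byref(startup), ctypes.byref(info))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        kernel32.WaitForSingleObject(info.hProcess, 0xFFFFFFFF)  # INFINITE
        exit_code = wintypes.DWORD()
        kernel32.GetExitCodeProcess(info.hProcess, ctypes.byref(exit_code))
        return exit_code.value
    finally:
        kernel32.CloseHandle(info.hThread)
        kernel32.CloseHandle(info.hProcess)
```

A call such as run_unicode([u'notepad.exe', unicode_path]) would then hand the Unicode path to the child unchanged, with unicode_path standing for any Unicode filename. On Python 2.7 the same Win32 calls apply, but this code has not been checked there.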

Passing UTF-8 in argv[1:] to private apps remains clumsy on a Unicode OS. Such tricks may be legitimate on 8-bit "Latin1" string OSes like Linux.
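To illustrate the kind of private encoding trick mentioned above, here is a sketch of a variation on it: instead of raw UTF-8 bytes, the argument is percent-encoded (UTF-8 underneath) so that only ASCII crosses the process boundary, and the receiving program decodes it again. This only works when you control both programs. The helper names encode_arg, decode_arg, and echo_codepoints are hypothetical, and the sketch uses Python 3's urllib.parse.

```python
import subprocess
import sys
from urllib.parse import quote, unquote


def encode_arg(text):
    """Percent-encode text (as UTF-8) so the argument is pure ASCII."""
    return quote(text, safe="")


def decode_arg(arg):
    """Reverse encode_arg in the receiving program."""
    return unquote(arg)


def echo_codepoints(text):
    """Send text to a child process and return the child's ascii() view of it."""
    child_code = ("import sys; from urllib.parse import unquote; "
                  "print(ascii(unquote(sys.argv[1])))")
    out = subprocess.check_output(
        [sys.executable, "-c", child_code, encode_arg(text)])
    return out.decode("ascii").strip()


if __name__ == "__main__":
    print(encode_arg(u'\u041a\u0433'))
    print(echo_codepoints(u'\u041a\u0433'))
```

The clumsiness the answer refers to is visible here: every program on the receiving end has to know about the convention and undo it.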

UPDATE: vaab has created a patched version of Popen for Python 2.7 which fixes the issue. See gist.github/vaab/2ad7051fc193167f15f85ef573e54eb9 and the blog post with explanations: vaab.blog.kal.fr/2017/03/16/fixing-windows-python-2-7-unicode-issue-with-subprocesss-popen/

Published: 2023-08-05 05:49:15

Article link: https://www.elefans.com/category/jswz/34/1303061.html