Why does the following occur:
>>> u'\u0308'.encode('mbcs')  # UMLAUT
'\xa8'
>>> u'\u041A'.encode('mbcs')  # CYRILLIC CAPITAL LETTER KA
'?'

I have a Python application accepting filenames from the operating system. It works for some international users, but not others.
For example, this unicode filename: u'\u041a\u0433\u044b\u044b\u0448\u0444\u0442'
will not encode with Windows 'mbcs' encoding (the one used by the filesystem, returned by sys.getfilesystemencoding()). I get '???????', indicating the encoder fails on those characters. But this makes no sense, since the filename came from the user to begin with.
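A way to see what is going on: on Windows, 'mbcs' is not the encoding NTFS stores names in (that is UTF-16); it is the current ANSI code page. The minimal sketch below assumes that code page is cp1252 (Western European) and uses Python's portable cp1252 and cp1251 codecs as stand-ins for 'mbcs', which only exists on Windows. Note that Python's own cp1252 codec rejects U+0308 outright, so the '\xa8' above presumably comes from Windows' best-fit mapping. unencodable_chars is a hypothetical helper; the output comments show Python 3.

```python
# The filename from the question.
NAME = u'\u041a\u0433\u044b\u044b\u0448\u0444\u0442'


def unencodable_chars(text, codepage):
    """Hypothetical helper: return the characters of text that
    codepage cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


# Assumed Western European ANSI code page (cp1252): none of the Cyrillic
# letters fit, so a lossy encoder substitutes '?' for each of them.
print(len(unencodable_chars(NAME, 'cp1252')))    # 7
print(NAME.encode('cp1252', errors='replace'))   # b'???????'

# With a Russian ANSI code page (cp1251) the same name encodes losslessly.
print(unencodable_chars(NAME, 'cp1251'))         # []
```

So the same filename can round-trip through 'mbcs' on one user's machine and not on another's, depending on that machine's ANSI code page.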
Update: Here's the background to this... I have a file on my system whose name is in Cyrillic. I want to call subprocess.Popen() with that file as an argument. Popen won't handle unicode. Normally I can get away with encoding the argument with the codec given by sys.getfilesystemencoding(). In this case it won't work.
Solution
In Py3K (at least from Python 3.2), subprocess.Popen and sys.argv work consistently with (default Unicode) strings on Windows. Evidently, CreateProcessW and GetCommandLineW are used.
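As a quick check of the Python 3 behavior, the sketch below passes the Cyrillic name as an ordinary str argument to a child Python interpreter, which echoes it back. It assumes Python 3.7+ (for capture_output) and, on Linux, a UTF-8 filesystem encoding; the file itself need not exist, since only the argument is echoed. ascii() keeps the child's output pure ASCII regardless of console encoding.

```python
import subprocess
import sys

# The Cyrillic filename from the question, passed as a plain str.
name = '\u041a\u0433\u044b\u044b\u0448\u0444\u0442.txt'

# Child process: a second Python interpreter that echoes its first argument.
child_code = 'import sys; print(ascii(sys.argv[1]))'

result = subprocess.run(
    [sys.executable, '-c', child_code, name],
    capture_output=True, text=True, check=True)

print(result.stdout.strip())  # '\u041a\u0433\u044b\u044b\u0448\u0444\u0442.txt'
```

No manual encoding with sys.getfilesystemencoding() is needed here; the argument list is handed over as Unicode.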
In Python 2 (up to v2.7.2 at least), subprocess.Popen is buggy with Unicode arguments: it sticks to CreateProcessA (while the os.* functions handle Unicode consistently). On top of that, shlex.split returns garbled results for Unicode input.
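For comparison, a minimal Python 3 sketch of the two string-splitting/joining steps involved: shlex.split on a str keeps the Cyrillic intact (though it applies POSIX shell quoting rules, not Windows rules), and subprocess.list2cmdline, an undocumented helper in the subprocess module, is what Popen uses on Windows to join an argument list with MS C runtime quoting before calling CreateProcessW. "myapp" is a hypothetical program name.

```python
import shlex
import subprocess

# A command line as a (Unicode) str; "myapp" is hypothetical.
line = 'myapp "\u041a\u0433\u044b\u044b\u0448\u0444\u0442 file.txt"'

# Python 3's shlex.split works on str and keeps the Cyrillic intact.
tokens = shlex.split(line)
# tokens == ['myapp', 'Кгыышфт file.txt']

# Joining back with Windows (MS C runtime) quoting rules, as Popen does.
cmdline = subprocess.list2cmdline(tokens)
# cmdline == 'myapp "Кгыышфт file.txt"'
```

Passing a list to Popen avoids the shlex round trip altogether, which is usually the safer choice on Windows.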
Pywin32's win32process.CreateProcess also doesn't switch automatically to the W version, and there is no win32process.CreateProcessW. The same goes for GetCommandLine. Thus ctypes.windll.kernel32.CreateProcessW... has to be used. The subprocess module should perhaps be fixed in this respect.
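A minimal sketch of calling CreateProcessW through ctypes, assuming the caller only needs to start a process and wait for its exit code (no pipes, environment or window options). run_unicode is a hypothetical helper, not the API of any library; the ctypes calls used also exist in Python 2.7, where the arguments should be unicode objects. Argument quoting reuses subprocess.list2cmdline, the undocumented helper Popen itself uses on Windows.

```python
import ctypes
import subprocess
import sys

# Plain ctypes types rather than ctypes.wintypes, so these definitions
# also load on non-Windows systems (where the call itself is refused).
DWORD = ctypes.c_uint32
WORD = ctypes.c_uint16
HANDLE = ctypes.c_void_p
INFINITE = 0xFFFFFFFF


class STARTUPINFOW(ctypes.Structure):
    _fields_ = [
        ("cb", DWORD), ("lpReserved", ctypes.c_wchar_p),
        ("lpDesktop", ctypes.c_wchar_p), ("lpTitle", ctypes.c_wchar_p),
        ("dwX", DWORD), ("dwY", DWORD), ("dwXSize", DWORD), ("dwYSize", DWORD),
        ("dwXCountChars", DWORD), ("dwYCountChars", DWORD),
        ("dwFillAttribute", DWORD), ("dwFlags", DWORD),
        ("wShowWindow", WORD), ("cbReserved2", WORD),
        ("lpReserved2", ctypes.c_void_p),
        ("hStdInput", HANDLE), ("hStdOutput", HANDLE), ("hStdError", HANDLE),
    ]


class PROCESS_INFORMATION(ctypes.Structure):
    _fields_ = [("hProcess", HANDLE), ("hThread", HANDLE),
                ("dwProcessId", DWORD), ("dwThreadId", DWORD)]


def run_unicode(args):
    """Hypothetical helper: start args (a list of Unicode strings) with
    CreateProcessW, wait for it, and return its exit code."""
    if sys.platform != "win32":
        raise OSError("CreateProcessW is only available on Windows")
    # A private WinDLL instance, so the argtypes set below do not affect
    # other users of ctypes.windll.kernel32.
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateProcessW.argtypes = [
        ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_void_p, ctypes.c_void_p,
        ctypes.c_int, DWORD, ctypes.c_void_p, ctypes.c_wchar_p,
        ctypes.POINTER(STARTUPINFOW), ctypes.POINTER(PROCESS_INFORMATION)]
    kernel32.CreateProcessW.restype = ctypes.c_int
    kernel32.WaitForSingleObject.argtypes = [HANDLE, DWORD]
    kernel32.WaitForSingleObject.restype = DWORD
    kernel32.GetExitCodeProcess.argtypes = [HANDLE, ctypes.POINTER(DWORD)]
    kernel32.GetExitCodeProcess.restype = ctypes.c_int
    kernel32.CloseHandle.argtypes = [HANDLE]
    kernel32.CloseHandle.restype = ctypes.c_int

    # CreateProcessW may write to lpCommandLine, so pass a writable buffer.
    cmdline = ctypes.create_unicode_buffer(subprocess.list2cmdline(args))
    si = STARTUPINFOW()
    si.cb = ctypes.sizeof(si)
    pi = PROCESS_INFORMATION()
    if not kernel32.CreateProcessW(None, cmdline, None, None, False, 0,
                                   None, None, ctypes.byref(si),
                                   ctypes.byref(pi)):
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        kernel32.WaitForSingleObject(pi.hProcess, INFINITE)
        code = DWORD()
        kernel32.GetExitCodeProcess(pi.hProcess, ctypes.byref(code))
        return code.value
    finally:
        kernel32.CloseHandle(pi.hProcess)
        kernel32.CloseHandle(pi.hThread)
```

On Windows, something like run_unicode([u'notepad.exe', u'\u041a\u0433\u044b\u044b\u0448\u0444\u0442.txt']) would open the Cyrillic-named file without going through the ANSI code page.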
Passing UTF-8 in argv[1:] to your own applications remains clumsy on a Unicode OS. Such tricks may be legitimate on OSes with 8-bit "Latin1"-style byte strings, like Linux.
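On the receiving side, a Python 2 script on Windows can recover its Unicode arguments directly instead of relying on UTF-8 tricks: GetCommandLineW returns the original command line, and shell32's CommandLineToArgvW splits it. The sketch below assumes that approach; unicode_argv is a hypothetical helper, and off Windows it simply falls back to sys.argv.

```python
import ctypes
import sys


def unicode_argv():
    """Hypothetical helper: return the process's command line as a list of
    Unicode strings, parsed from GetCommandLineW() on Windows."""
    if sys.platform != "win32":
        # Elsewhere, fall back to sys.argv unchanged.
        return list(sys.argv)
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    shell32 = ctypes.WinDLL("shell32", use_last_error=True)
    kernel32.GetCommandLineW.argtypes = []
    kernel32.GetCommandLineW.restype = ctypes.c_wchar_p
    kernel32.LocalFree.argtypes = [ctypes.c_void_p]
    kernel32.LocalFree.restype = ctypes.c_void_p
    shell32.CommandLineToArgvW.argtypes = [
        ctypes.c_wchar_p, ctypes.POINTER(ctypes.c_int)]
    shell32.CommandLineToArgvW.restype = ctypes.POINTER(ctypes.c_wchar_p)

    argc = ctypes.c_int(0)
    argv = shell32.CommandLineToArgvW(kernel32.GetCommandLineW(),
                                      ctypes.byref(argc))
    if not argv:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        # Copy the strings out before the array is freed.
        return [argv[i] for i in range(argc.value)]
    finally:
        kernel32.LocalFree(argv)
```

Unlike sys.argv, the Windows result starts with the interpreter path and any interpreter options; typically the last len(sys.argv) - 1 entries correspond to sys.argv[1:].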
UPDATE: vaab has created a patched version of Popen for Python 2.7 that fixes the issue. See gist.github/vaab/2ad7051fc193167f15f85ef573e54eb9
Blog post with explanations: vaab.blog.kal.fr/2017/03/16/fixing-windows-python-2-7-unicode-issue-with-subprocesss-popen/