I used to think I had this whole encoding stuff pretty figured out. I seem to be wrong because I can't explain what's happening here.
What I was trying to do is to use the tabulate module to print a nicely formatted table using
    from tabulate import tabulate
    s = tabulate([[1,2],[3,4]], ["x","y"], tablefmt="fancy_grid")
    print(s)
in IPython 3.5.0's interactive console under Windows 10. I expected the result to be
    ╒═════╤═════╕
    │   x │   y │
    ╞═════╪═════╡
    │   1 │   2 │
    ├─────┼─────┤
    │   3 │   4 │
    ╘═════╧═════╛

but instead, I got a
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2552' in position 0: character maps to <undefined>
Puzzled, I tried to find out where the problem was and looked at the repr of the string:
    In [15]: s
    Out[15]: '╒═════╤═════╕\n│   x │   y │\n╞═════╪═════╡\n│   1 │   2 │\n├─────┼─────┤\n│   3 │   4 │\n╘═════╧═════╛'
Hmm, all the characters can be displayed by the terminal (even the first one that triggered the error).
Just to check a few details:
    In [16]: sys.stdout.encoding
    Out[16]: 'cp850'

    In [17]: s.encode("cp850")
    [...]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2552' in position 0: character maps to <undefined>
So which encoding is the terminal using? Python says that it's cp850, and it tells me that cp850 doesn't have a ╒ character (which is true, it's one of the characters from cp437 that had to make room for accented letters), but I can see it in the terminal window!
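The claim about the two code pages can be checked with Python's built-in codecs alone, without a Windows console. This is a minimal sketch; `can_encode` is a hypothetical helper name introduced only for illustration.

```python
def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# U+2552 mixes single and double lines, U+2550 is a plain double line,
# U+2502 is a plain single line.
for ch in "\u2552\u2550\u2502":
    print(
        "U+%04X" % ord(ch),
        "cp437:", can_encode(ch, "cp437"),
        "cp850:", can_encode(ch, "cp850"),
    )
```

Only the mixed single/double box-drawing character should be reported as missing from cp850, which would explain why the error points at position 0 rather than at a later character.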
To complicate things further, when using the native Python console instead of IPython, the error seems more understandable:
    >>> s
    '\u2552═══\u2564═══\u2555\n│ 1 │ 2 │\n├───┼───┤\n│ 3 │ 4 │\n\u2558═══\u2567═══\u255b'
    >>> sys.stdout.encoding
    'cp850'
    >>> print(s)
    Traceback (most recent call last):
    [...]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2552' in position 0: character maps to <undefined>
So at least Python is consistent, but what's happening with IPython?
Recommended answer
IPython uses the OEM code page in interactive mode, like any other Python console program:
    In [1]: '\u2552'
    ERROR - failed to write data to stream: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp850'>
    Out[1]:

    In [2]: !chcp
    Active code page: 850
The result changes if pyreadline is installed (it enables colors in the IPython console among other things):
    In [1]: '\u2552'
    Out[1]: '╒'

    In [2]: import sys

    In [3]: sys.stdout.encoding
    Out[3]: 'cp850'

    In [4]: !chcp
    Active code page: 850
Once pyreadline is installed, IPython's sys.displayhook writes the result to readline's console object, which uses the WriteConsoleW() Windows Unicode API. That API can print Unicode characters even when they cannot be encoded in the current code page (to see them, you might need to configure a TrueType font such as Lucida Console in the Windows console).
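To illustrate the mechanism, here is a rough sketch of calling WriteConsoleW() directly through ctypes. It is not pyreadline's actual code; `write_console_w` and `utf16_units` are hypothetical names, and the function simply reports failure on non-Windows systems or when stdout is not a real console (for example, when it is redirected to a file).

```python
import sys


def utf16_units(text):
    """Count UTF-16 code units, which is the length WriteConsoleW expects."""
    return len(text.encode("utf-16-le")) // 2


def write_console_w(text):
    """Try to print text via WriteConsoleW; return True only on success."""
    if sys.platform != "win32":
        return False  # the API exists only on Windows

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE,
        wintypes.LPCWSTR,
        wintypes.DWORD,
        ctypes.POINTER(wintypes.DWORD),
        wintypes.LPVOID,
    ]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    STD_OUTPUT_HANDLE = -11 & 0xFFFFFFFF  # (DWORD)-11
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    written = wintypes.DWORD(0)
    # The text is passed as UTF-16, so the console code page is not involved.
    ok = kernel32.WriteConsoleW(
        handle, text, utf16_units(text), ctypes.byref(written), None
    )
    return bool(ok)
```

A caller could try `write_console_w(s + "\n")` first and fall back to `print(s)` when it returns False. Because the string never passes through the cp850 codec on this path, the UnicodeEncodeError from the question should not occur.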