Problem description
>>> str1 = unicode('María','utf8')
>>> str2 = u'María'.encode('utf8')
>>> str1 == str2
False
How is that possible?
Just in case it is relevant, I'm using the IPython Notebook.
Solution
You have a unicode string and a byte string. They are not the same thing.
One holds a Unicode value, María. The other holds a UTF-8 encoded series of bytes, 'Mar\xc3\xada'.
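To see the difference concretely, here is a small sketch in Python 3. This is an assumption about the reader's setup: the question uses Python 2, where `unicode` and `str` play the roles that `str` and `bytes` play in Python 3. It shows that the text value and its UTF-8 encoding differ in both type and length.

```python
# Python 3 sketch: str holds Unicode code points, bytes holds raw bytes.
# 'í' is written as an escape so the source file encoding does not matter.
text = 'Mar\u00eda'             # 'María': 5 code points
encoded = text.encode('utf-8')  # UTF-8 turns 'í' into the two bytes 0xC3 0xAD

print(type(text).__name__, len(text))        # str 5
print(type(encoded).__name__, len(encoded))  # bytes 6
print(encoded)                               # b'Mar\xc3\xada'
print(text == encoded)                       # False: text never equals bytes
```

In Python 3 there is no implicit conversion at all, so comparing text with bytes is always False.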
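Python 2 does do an implicit conversion when comparing Unicode and byte string values: it decodes the byte string with the default codec (sys.getdefaultencoding(), which is 'ascii' unless your environment changes it). Decoding 'Mar\xc3\xada' as ASCII fails, so Python 2 emits a UnicodeWarning and treats the two values as unequal. You should not count on that conversion, because its result depends entirely on the default codec set for your system.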
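Rather than relying on the implicit conversion, convert explicitly so both sides have the same type before comparing. The sketch below is in Python 3; the helper name `same_text` and its UTF-8 default are illustrative assumptions, not part of the original answer. It decodes any bytes with a codec you name instead of a system default.

```python
def same_text(a, b, encoding='utf-8'):
    """Compare two values as text, decoding any bytes with an explicit codec.

    Hypothetical helper: the codec is passed in rather than taken from a
    system default, so the result does not depend on the environment.
    """
    if isinstance(a, bytes):
        a = a.decode(encoding)
    if isinstance(b, bytes):
        b = b.decode(encoding)
    return a == b


str1 = b'Mar\xc3\xada'.decode('utf-8')  # like unicode('María', 'utf8') in Python 2
str2 = 'Mar\u00eda'.encode('utf-8')     # like u'María'.encode('utf8') in Python 2

print(str1 == str2)           # False: str compared with bytes
print(same_text(str1, str2))  # True: both sides compared as text
```

The general rule is to decode bytes to text as soon as they enter your program and encode only on output, so comparisons always happen between values of the same type.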
If you don't yet know what Unicode really is, or why UTF-8 is not the same thing as Unicode, or want to know anything else about encodings, see:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder