Two apparently equal Python Unicode/UTF-8 encoded strings do not match

Updated: 2024-10-25 20:18:54

Problem description

>>> str1 = unicode('María','utf8')
>>> str2 = u'María'.encode('utf8')
>>> str1 == str2
False

How is that possible?

In case it is relevant, I'm using the IPython Notebook.

Solution

You have a unicode string and a byte string. They are not the same thing.

One holds a Unicode value, María. The other holds a UTF-8 encoded series of bytes, 'Mar\xc3\xada'.

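Python 2 does perform an implicit conversion when comparing Unicode and byte string values: it tries to decode the byte string with the default codec, which is ASCII unless it has been changed. Do not count on that conversion; it depends entirely on the default codec set for your system. Here the bytes \xc3\xad are not valid ASCII, so the decode fails and the comparison comes out False.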
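A minimal sketch of the point above, written so that it should run on both Python 2 and Python 3. The variable names (`text`, `data`, and the three `*_equal` flags) are made up for illustration and are not from the question. The idea is to compare text with text or bytes with bytes, converting explicitly with a named codec rather than relying on any implicit conversion.

```python
# -*- coding: utf-8 -*-
# Illustrative names only; the question itself uses str1 and str2.
text = u'Mar\xeda'            # Unicode text: 5 code points (\xed is the accented i)
data = text.encode('utf8')    # UTF-8 encoded bytes: 6 bytes, 'Mar\xc3\xada'

# Mixed comparison: text against bytes. On Python 3 this is always False.
# On Python 2 with the default ASCII codec it is also False, because the
# implicit decode of the bytes fails.
mixed_equal = (text == data)

# Explicit conversion in either direction makes the comparison meaningful.
decoded_equal = (data.decode('utf8') == text)   # text compared with text
encoded_equal = (text.encode('utf8') == data)   # bytes compared with bytes

print(len(text), len(data))
print(mixed_equal, decoded_equal, encoded_equal)
```

Which direction to convert in is a design choice; decoding bytes to text as soon as they enter the program and encoding only on output is the approach the references below recommend.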

If you don't yet know what Unicode really is, or why UTF-8 is not the same thing, or want to know anything else about encodings, see:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky

The Python Unicode HOWTO

Pragmatic Unicode by Ned Batchelder

Published: 2023-04-28 10:26:28

Article link: https://www.elefans.com/category/jswz/34/1171534.html
