Python Unicode encoding error

Updated: 2024-10-28 12:19:06

Problem description

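I'm reading and parsing an Amazon XML file, and while the XML file shows a ’, I get the following error when I try to print it:

'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in range(128)

From what I've read online so far, the error comes from the fact that the XML file is in UTF-8, but Python wants to handle it as ASCII-encoded text. Is there a simple way to make the error go away and have my program print the XML as it reads it?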

Solution

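Most likely you parsed the XML fine, and the failure happens when you try to print its contents, because they contain non-ASCII Unicode characters that the ASCII codec cannot represent. Try encoding your Unicode string as ASCII first:

unicodeData.encode('ascii', 'ignore')

The 'ignore' argument tells encode() to skip any character it cannot encode, so those characters are silently dropped from the output. From the Python docs: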

>>> # Python 2: u = unichr(40960) + u'abcd' + unichr(1972)
>>> u = chr(40960) + u'abcd' + chr(1972)
>>> u.encode('utf-8')
b'\xea\x80\x80abcd\xde\xb4'
>>> u.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
b'abcd'
>>> u.encode('ascii', 'replace')
b'?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
b'&#40960;abcd&#1972;'
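To see what each error handler does to the character from the question (U+2019, the right single quotation mark), here is a minimal sketch assuming Python 3. The sample string and the to_ascii helper are made up for illustration; they are not from the original XML or from any library.

```python
# Hypothetical sample text; "\u2019" is the character named in the error message.
text = "Amazon\u2019s catalog"


def to_ascii(value, errors="ignore"):
    """Encode to ASCII with the given error handler, then decode back to str.

    This helper is an illustrative assumption, not a standard-library function.
    """
    return value.encode("ascii", errors).decode("ascii")


print(to_ascii(text))                       # Amazons catalog
print(to_ascii(text, "replace"))            # Amazon?s catalog
print(to_ascii(text, "xmlcharrefreplace"))  # Amazon&#8217;s catalog
print(to_ascii(text, "backslashreplace"))   # Amazon\u2019s catalog
```

'ignore' loses the character entirely, while 'xmlcharrefreplace' and 'backslashreplace' keep enough information to recover it later, which may matter if the printed output is meant to be read back in.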

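You might want to read this article: www.joelonsoftware/articles/Unicode.html, which I found very useful as a basic tutorial on what's going on. After reading it, you'll stop feeling like you're just guessing which commands to use (or at least that's what happened to me).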

Published: 2023-08-05 05:50:16

Article link: https://www.elefans.com/category/jswz/34/1303070.html