Parsing large XML files with Python

Updated: 2024-10-26 10:32:45
Parsing a large XML file with Python - etree.parse error

I'm trying to parse the following XML file using the lxml.etree.iterparse function.

"sampleoutput.xml":

<item>
  <title>Item 1</title>
  <desc>Description 1</desc>
</item>
<item>
  <title>Item 2</title>
  <desc>Description 2</desc>
</item>

I tried the code from "Parsing Large XML file with Python lxml and Iterparse".

Before the etree.iterparse(MYFILE) call, I opened the file with MYFILE = open("/Users/eric/Desktop/wikipedia_map/sampleoutput.xml", "r").

But it produces the following error:

Traceback (most recent call last):
  File "/Users/eric/Documents/Programming/Eclipse_Workspace/wikipedia_mapper/testscraper.py", line 6, in <module>
    for event, elem in context :
  File "iterparse.pxi", line 491, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:98565)
  File "iterparse.pxi", line 543, in lxml.etree.iterparse._read_more_events (src/lxml/lxml.etree.c:99086)
  File "parser.pxi", line 590, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:74712)
lxml.etree.XMLSyntaxError: Extra content at the end of the document, line 5, column 1

Any ideas? Thank you!

Best answer

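The problem is that an XML document is not well-formed unless it has exactly one top-level element. Your file has two top-level <item> elements, which is why the parser reports "Extra content at the end of the document" at line 5, where the second <item> begins. You can fix your sample by wrapping the entire document in <items></items> tags. You also need to rename the <desc> tags to <description> so that they match the query you're using (description).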
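If editing a very large file just to add a wrapper element is impractical, one possible workaround is to supply the wrapper at parse time. The sketch below is only an illustration, not part of the original answer: it feeds a synthetic <items> start tag, then the file contents in chunks, then the closing tag, into an XMLPullParser. The function name iter_wrapped_items and the chunk size are hypothetical, and the approach assumes the file has no XML declaration or DOCTYPE of its own. It reads the original <desc> tag rather than description.

```python
import io

try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Fallback: the standard library offers a similar XMLPullParser.
    import xml.etree.ElementTree as etree

# The question's sample: two top-level <item> elements, no single root.
FRAGMENT = b"""<item>
  <title>Item 1</title>
  <desc>Description 1</desc>
</item>
<item>
  <title>Item 2</title>
  <desc>Description 2</desc>
</item>
"""


def _drain(parser):
    """Yield (title, desc) for every <item> the parser has finished so far."""
    for _event, elem in parser.read_events():
        if elem.tag == "item":
            yield elem.findtext("title"), elem.findtext("desc")
            elem.clear()  # drop the children to keep memory use low


def iter_wrapped_items(fileobj, chunk_size=64 * 1024):
    """Parse a rootless sequence of <item> elements from a binary file object.

    Hypothetical helper: wraps the stream in a synthetic <items> root.
    """
    parser = etree.XMLPullParser(events=("end",))
    parser.feed(b"<items>")  # synthetic opening tag
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
        yield from _drain(parser)
    parser.feed(b"</items>")  # synthetic closing tag
    yield from _drain(parser)
    parser.close()
    yield from _drain(parser)  # pick up anything flushed by close()


if __name__ == "__main__":
    for title, desc in iter_wrapped_items(io.BytesIO(FRAGMENT)):
        print(title, desc)
```

For a real file, the io.BytesIO object would be replaced by a file opened in binary mode ("rb"), since the wrapper tags are fed as bytes.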

The following document produces correct results with your existing code:

<items>
  <item>
    <title>Item 1</title>
    <description>Description 1</description>
  </item>
  <item>
    <title>Item 2</title>
    <description>Description 2</description>
  </item>
</items>
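As a rough check, the corrected document can be run through iterparse as sketched below. This is not the code from the linked question, which is not reproduced here; the function name iter_items is hypothetical, and the document is read from an in-memory buffer rather than from sampleoutput.xml so that the example is self-contained.

```python
import io

try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Fallback: the standard library's iterparse has a similar interface.
    import xml.etree.ElementTree as etree

# The corrected document: a single <items> root and <description> tags.
XML = b"""<items>
  <item>
    <title>Item 1</title>
    <description>Description 1</description>
  </item>
  <item>
    <title>Item 2</title>
    <description>Description 2</description>
  </item>
</items>
"""


def iter_items(source):
    """Yield (title, description) for each <item>, one element at a time."""
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag == "item":
            yield elem.findtext("title"), elem.findtext("description")
            elem.clear()  # free the finished element's children


items = list(iter_items(io.BytesIO(XML)))
print(items)
```

Because each <item> is cleared once it has been processed, memory use stays roughly constant regardless of how many items the file contains, which is the main reason to prefer iterparse over etree.parse for large files.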

Published: 2023-08-07 11:53:00