I need to parse an XML file containing a number of CDATA blocks, which I need to retain for later plotting:
    <process id="process1">
      <log name="name1" device="device1"><![CDATA[timestamp value]]></log>
      <log name="name2" device="device2"><![CDATA[timestamp value, timestamp value, timestamp]]></log>
    </process>
I will need to do this repeatedly and quickly, so I am looking for the best way to do it. I've read that ElementTree is one of the faster options, but I am open to other suggestions.
Recommended answer

Here are two examples:
    from lxml import etree
    import xml.etree.ElementTree as ElementTree

    CONTENT = """
    <process id="process1">
      <log name="name1" device="device1"><![CDATA[timestamp value]]></log>
      <log name="name2" device="device2"><![CDATA[timestamp value, timestamp value, timestamp]]></log>
    </process>
    """

    def parse_with_lxml():
        # lxml: select every <log> element with XPath
        root = etree.fromstring(CONTENT)
        for log in root.xpath("//log"):
            print(log.text)

    def parse_with_stdlib():
        # Standard library: walk every <log> element with iter()
        root = ElementTree.fromstring(CONTENT)
        for log in root.iter('log'):
            print(log.text)

    if __name__ == '__main__':
        parse_with_lxml()
        parse_with_stdlib()

Output:
    timestamp value
    timestamp value, timestamp value, timestamp
    timestamp value
    timestamp value, timestamp value, timestamp

In both cases, the CDATA content is available through the element's text attribute.
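Since the question mentions keeping the data for later plotting, here is a minimal sketch of one way to store each log's CDATA text, keyed by its name and device attributes. It uses only the standard library. The helper names `collect_logs` and `split_records` are hypothetical, and splitting on commas is an assumption based on the sample data rather than a documented format.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<process id="process1">
  <log name="name1" device="device1"><![CDATA[timestamp value]]></log>
  <log name="name2" device="device2"><![CDATA[timestamp value, timestamp value, timestamp]]></log>
</process>
"""

def collect_logs(xml_text):
    """Map (name, device) to the raw CDATA text of each <log> element."""
    root = ET.fromstring(xml_text)
    logs = {}
    for log in root.iter("log"):
        key = (log.get("name"), log.get("device"))
        # The parser exposes CDATA as ordinary text; empty elements give None.
        logs[key] = log.text or ""
    return logs

def split_records(raw):
    """Split raw text into records, assuming comma-separated entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]

logs = collect_logs(SAMPLE)
for key, raw in logs.items():
    print(key, split_records(raw))
```

Keeping the raw text in a dictionary means the file is parsed once, and the records can be split or converted to numbers only when a particular log is plotted.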