How to split an XML file (on specific N node) while conserving a header and a footer with Python? [closed]

I am new to Python and I don't know where to start with the solution of my problem.

Here is what I need to do: read an XML file from a folder and split it into multiple XML files (in another folder) on a specific repeating node (entered by the user), while keeping the header (what comes before that node) and the footer (what comes after it).

Here is an example:

    <?xml version="1.0"?>
    <catalog catalogName="cat1" catalogType="bestsellers">
      <headerNode node="1">
        <param1>value1</param1>
        <param2>value2</param2>
      </headerNode>
      <book id="bk101">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <genre>Computer</genre>
        <price>44.95</price>
        <publish_date>2000-10-01</publish_date>
        <description>An in-depth look at creating applications with XML.</description>
      </book>
      <book id="bk102">
        <author>Ralls, Kim</author>
        <title>Midnight Rain</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
        <publish_date>2000-12-16</publish_date>
        <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
      </book>
      <book id="bk103">
        <author>Corets, Eva</author>
        <title>Maeve Ascendant</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
        <publish_date>2000-11-17</publish_date>
        <description>After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society.</description>
      </book>
      <footerNode node="2">
        <param1>value1</param1>
        <param2>value2</param2>
      </footerNode>
    </catalog>

So the goal would be to have 3 XML files (because there are 3 instances of the "book" node), each containing the "headerNode" + 1 "book" + "footerNode".

The first file would be like this:

    <?xml version="1.0"?>
    <catalog catalogName="cat1" catalogType="bestsellers">
      <headerNode node="1">
        <param1>value1</param1>
        <param2>value2</param2>
      </headerNode>
      <book id="bk101">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <genre>Computer</genre>
        <price>44.95</price>
        <publish_date>2000-10-01</publish_date>
        <description>An in-depth look at creating applications with XML.</description>
      </book>
      <footerNode node="2">
        <param1>value1</param1>
        <param2>value2</param2>
      </footerNode>
    </catalog>

The only constraint is that it needs to be done with "ElementTree" and not "lxml" library (because lxml is not included in the production dependencies).

EDIT: So here is the code based on the answer from "MK Ultra".

For now I have modified it to take two parameters (the first is the name of the XML file without its extension, the second is the split node). It reads the XML and generates the XML files in the same folder as the script, using the loop index to name the files.

    import sys
    import xml.etree.ElementTree as ET
    import os

    # Get the current directory
    cwd = os.getcwd()

    # Load the xml
    doc = ET.parse(r"%s/%s.xml" % (cwd, sys.argv[1]))
    root = doc.getroot()

    # Get the header element
    header = root.find("headerNode")

    # Get the footer element
    footer = root.find("footerNode")

    # loop over the split nodes and create the new xml files
    for idx, book in enumerate(root.findall(sys.argv[2])):
        top = ET.Element(root.tag)
        top.append(header)
        top.append(book)
        top.append(footer)
        out_book = ET.ElementTree(top)
        # the output file name is the input file name plus the loop index
        out_path = "%s/%s_%s.xml" % (cwd, sys.argv[1], idx)
        out_book.write(open(out_path, "wb"))

How can I make the "headerNode"/"footerNode" part generic? By this I mean that it would be "book" or something else like "novel", "paper", etc. The correct value would only be known by the user of the script (which is not me obviously) when running it.

EDIT2: Just modified the original file to add attributes to the "catalog" node, because I cannot copy the attributes when creating the split files.

Best answer

The algorithm goes as follows:

1. Parse your XML file and get the existing root.
2. From it, build the base shared by all books, new_root: the catalog with the header and footer.
3. Iterate over the children of the root to get every element with the tag 'book'.
4. Insert the book element into new_root and write it to a file. Here the file name is the same as the book's id.

    import os
    from xml.etree.ElementTree import ElementTree, parse, Element

    # question 2 - tag name as input from user!
    # (input() on Python 3; use raw_input() on Python 2)
    tag_name = input("Enter tag name:")

    root = parse('sample.xml').getroot()
    new_root = Element(root.tag)

    # question 1 - multiple header and footer!
    new_root.extend(root.findall('.//headerNode'))
    new_root.extend(root.findall('.//footerNode'))

    for elem in root:
        if elem.tag == tag_name:
            # place the node right after the header (index 1 assumes a single headerNode)
            new_root.insert(1, elem)
            # question 3 - write output to file!
            out_path = os.path.join('path/to/folder', elem.get('id') + '.xml')
            ElementTree(new_root).write(out_path)
            # take the node out again so the next file starts from header + footer only
            new_root.remove(elem)

Sample Output:

File Name: bk101.xml

    <catalog>
      <headerNode node="1">
        <param1>value1</param1>
        <param2>value2</param2>
      </headerNode>
      <book id="bk101">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <genre>Computer</genre>
        <price>44.95</price>
        <publish_date>2000-10-01</publish_date>
        <description>An in-depth look at creating applications with XML.</description>
      </book>
      <footerNode node="2">
        <param1>value1</param1>
        <param2>value2</param2>
      </footerNode>
    </catalog>
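
Note that the sample output has lost the catalogName and catalogType attributes of the "catalog" root, which is the problem EDIT2 in the question describes. A minimal sketch of one way to keep them, assuming it is enough to pass a copy of root.attrib when creating the new root. The helper name make_root_like is hypothetical and not part of the original answer:

```python
import xml.etree.ElementTree as ET


def make_root_like(root):
    """Return a new, empty element with the same tag and attributes as root."""
    # dict(root.attrib) makes an explicit copy, so later edits to one
    # element's attributes do not leak into the other.
    return ET.Element(root.tag, dict(root.attrib))


root = ET.fromstring(
    '<catalog catalogName="cat1" catalogType="bestsellers">'
    '<book id="bk101"/>'
    '</catalog>'
)
new_root = make_root_like(root)

# The new root carries the attributes but none of the children yet.
print(ET.tostring(new_root, encoding="unicode"))
```

In the answer's code this would presumably replace the `Element(root.tag)` call, with the rest of the loop unchanged.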

Happy Coding!
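
For the remaining part of the question (making the header and footer generic), one possible approach is to avoid naming "headerNode" and "footerNode" at all: treat every child of the root that comes before the first matching node as the header, and every child after the last matching node as the footer. The sketch below assumes the split nodes are direct children of the root. The function name split_by_tag, its parameters, and the output naming scheme are hypothetical, not taken from the original answer:

```python
import os
import xml.etree.ElementTree as ET


def split_by_tag(xml_path, tag_name, out_dir):
    """Write one file per direct child named tag_name; return the paths written.

    Assumption: header = children before the first match,
    footer = children after the last match. Non-matching children that
    sit between two matches are not copied.
    """
    root = ET.parse(xml_path).getroot()
    children = list(root)
    hits = [i for i, child in enumerate(children) if child.tag == tag_name]
    if not hits:
        return []

    header = children[:hits[0]]
    footer = children[hits[-1] + 1:]

    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(xml_path))[0]

    written = []
    for idx, i in enumerate(hits):
        # Copy the root's tag and attributes, then add header + one node + footer.
        new_root = ET.Element(root.tag, dict(root.attrib))
        new_root.extend(header + [children[i]] + footer)
        out_path = os.path.join(out_dir, "%s_%d.xml" % (base, idx))
        ET.ElementTree(new_root).write(
            out_path, encoding="utf-8", xml_declaration=True
        )
        written.append(out_path)
    return written
```

With the example file from the question saved as catalog.xml, a call such as `split_by_tag("catalog.xml", "book", "out")` should write out/catalog_0.xml through out/catalog_2.xml. The tag name and both paths could come from sys.argv, as in the question's script.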
