尝试使用Python解析XLS(XML)文件

编程入门 行业动态 更新时间:2024-10-07 22:29:19
本文介绍了尝试使用Python解析XLS(XML)文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

我从NetSuite ERP下载了一个"xls"文件。文件根目录是".xls",但它实际上是一个XML文件。我有一个PANAS脚本,它将组合几个XLS或XLSX文件,但PANDA似乎无法处理这种奇怪的XLS/XML文件类型,所以我有另一个脚本试图解析XML数据并保存到XLS或XLSX。然而,下面的脚本似乎不起作用,因为它的结果是"None"。有谁能用我的样例代码、新代码或解决这个奇怪的XLS/XML解析问题的新方法为我指明正确的方向吗?

提前感谢!

XML示例代码:

<?xml version="1.0" encoding="utf-16"?> <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="www.w3/TR/REC-html40"> <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office"> <Author>NetSuite Reports</Author> <LastAuthor>NetSuite Reports</LastAuthor> <Company>NetSuite</Company> </DocumentProperties> <Styles> <Style ss:ID="company"> <Alignment ss:Horizontal="Center" /> <Font ss:Size="12" ss:Bold="1" /> </Style> <Style ss:ID="subcompany"> <Alignment ss:Horizontal="Center" /> <Font ss:Size="14" ss:Bold="1" /> </Style> <Style ss:ID="error"> <Alignment ss:Horizontal="Center" /> <Interior ss:Color="#f0d0d0" ss:Pattern="Solid" /> <Font ss:Bold="1" /> </Style> <Style ss:ID="header_l"> <Alignment ss:Horizontal="Left" /> <Font ss:Size="7" ss:Bold="1" /> <Interior ss:Color="#d0d0d0" ss:Pattern="Solid" /> </Style> <Style ss:ID="header_r"> <Alignment ss:Horizontal="Right" /> <Font ss:Size="7" ss:Bold="1" /> <Interior ss:Color="#d0d0d0" ss:Pattern="Solid" /> </Style> <Style ss:ID="header_c"> <Alignment ss:Horizontal="Center" /> <Font ss:Size="7" ss:Bold="1" /> <Interior ss:Color="#d0d0d0" ss:Pattern="Solid" /> </Style> <Style ss:ID="scheckbox"> <Alignment ss:Vertical="Center" ss:Horizontal="Center" /> </Style> <Style ss:ID="Default" ss:Name="Normal"> <Alignment ss:Vertical="Bottom" /> <Borders /> <Font ss:FontName="Arial" ss:Size="8" /> <Interior /> <NumberFormat /> <Protection /> </Style> <Style ss:ID="s53"> <Alignment ss:Vertical="Center" ss:Horizontal="Left" /> <Font ss:FontName="Arial" ss:Size="8" ss:Color="#000000" ss:Bold="1" ss:Italic="0" /> <Borders> <Border ss:Position="Top" ss:LineStyle="Dash" ss:Weight="1" ss:Color="#cccccc" /> </Borders> </Style> <Style ss:ID="s52"> <Alignment ss:Horizontal="Left" ss:Indent="1" /> <Font ss:FontName="Arial" ss:Size="8" ss:Color="#000000" ss:Bold="0" ss:Italic="0" /> <Borders /> </Style> <Style ss:ID="s51"> <Alignment ss:Vertical="Center" ss:Horizontal="Right" /> <Font ss:FontName="Arial" ss:Size="8" ss:Color="#000000" ss:Bold="0" ss:Italic="0" /> <NumberFormat ss:Format="&quot;€&quot;#,##0.00" /> <Borders /> </Style> <Style ss:ID="s50"> <Alignment ss:Vertical="Center" ss:Horizontal="Left" /> <Font ss:FontName="Arial" ss:Size="8" ss:Color="#000000" ss:Bold="1" ss:Italic="0" /> <Borders /> </Style> <Style ss:ID="s58"> <Alignment ss:Horizontal="Left" ss:Indent="2" /> <Font ss:FontName="Arial" ss:Size="8" ss:Color="#000000" ss:Bold="1" ss:Italic="0" /> <Borders> <Border ss:Position="Top" ss:LineStyle="Dash" ss:Weight="1" ss:Color="#cccccc" /> </Borders> </Style> <Style ss:ID="s54"> <Alignment ss:Vertical="Center" ss:Horizontal="Right" /> <Font ss:FontName="Arial" ss:Size="8" ss:Color="#000000" ss:Bold="1" ss:Italic="0" /> <NumberFormat ss:Format="&quot;€&quot;#,##0.00" /> <Borders> <Border ss:Position="Top" ss:LineStyle="Dash" ss:Weight="1" ss:Color="#cccccc" /> </Borders> </Style> <Style ss:ID="s59"> <Alignment ss:Horizontal="Left" ss:Indent="1" /> <Font ss:FontName="Arial" ss:Size="8" ss:Color="#000000" ss:Bold="1" ss:Italic="0" /> <Borders> <Border ss:Position="Top" ss:LineStyle="Dash" ss:Weight="1" ss:Color="#cccccc" /> </Borders> </Style> <Style ss:ID="s56"> <Alignment ss:Horizontal="Left" ss:Indent="2" /> <Font ss:FontName="Arial" ss:Size="8" ss:Color="#000000" ss:Bold="1" ss:Italic="0" /> <Borders /> </Style> <Style ss:ID="s57"> <Alignment ss:Horizontal="Left" ss:Indent="3" /> <Font ss:FontName="Arial" ss:Size="8" ss:Color="#000000" ss:Bold="0" ss:Italic="0" /> <Borders /> </Style> <Style ss:ID="s55"> <Alignment ss:Horizontal="Left" ss:Indent="1" /> <Font ss:FontName="Arial" ss:Size="8" ss:Color="#000000" ss:Bold="1" ss:Italic="0" /> <Borders /> </Style> <Style ss:ID="s60"> <Alignment ss:Vertical="Center" ss:Horizontal="Left" /> <Font ss:FontName="Arial" ss:Size="8" ss:Color="#000000" ss:Bold="1" ss:Italic="0" /> <Borders> <Border ss:Position="Top" ss:LineStyle="Dash" ss:Weight="1" ss:Color="#cccccc" /> </Borders> </Style> </Styles> <Worksheet ss:Name="TrialBalance"> <Table> <Row> <Cell ss:StyleID="company" ss:MergeAcross="1"> <Data ss:Type="String">Parent Company</Data> </Cell> </Row> <Row> <Cell ss:StyleID="company" ss:MergeAcross="1"> <Data ss:Type="String">Company Holdings Inc. : Company A B.V.</Data> </Cell> </Row> <Row> <Cell ss:StyleID="subcompany" ss:MergeAcross="1"> <Data ss:Type="String">Trial Balance</Data> </Cell> </Row> <Row> <Cell ss:StyleID="subcompany" ss:MergeAcross="1"> <Data ss:Type="String">End of Feb 2020</Data> </Cell> </Row> <Row> <Cell ss:StyleID="subcompany" ss:MergeAcross="1"> <Data ss:Type="String" /> </Cell> </Row> <Row> <Cell ss:StyleID="subcompany" ss:MergeAcross="1"> <Data ss:Type="String" /> </Cell> </Row> <Row> <Cell ss:StyleID="header_l"> <Data ss:Type="String">Account</Data> </Cell> <Cell ss:StyleID="header_r" ss:MergeDown="0" ss:Index="2"> <Data ss:Type="String">Total</Data> </Cell> </Row> <Row> <Cell ss:StyleID="s50"> <Data ss:Type="String">10000 - CASH &amp; CASH EQUIVALENTS</Data> </Cell> <Cell ss:StyleID="s51" /> </Row> <Row> <Cell ss:StyleID="s52"> <Data ss:Type="String">10101 - Bank - 9999 - Company A - EUR</Data> </Cell> <Cell ss:StyleID="s51"> <Data ss:Type="Number">1234567.01</Data> </Cell> </Row> <Row> <Cell ss:StyleID="s53"> <Data ss:Type="String">Total - 10000 - CASH &amp; CASH EQUIVALENTS</Data> </Cell> <Cell ss:Formula="SUM(R[-1]C)" ss:StyleID="s54"> <Data ss:Type="Number">1234567.01</Data> </Cell> </Row> </Table> </Worksheet> </Workbook>

将XML解析为XLS的Python代码:

import pandas as pd import xml.etree.cElementTree as ET tree = ET.parse(r"C:UsersNAMEDocumentsootfolderexamplefile.xls") root = tree.getroot() def getvalueofnode(node): """ return node text or None """ return node.text if node is not None else None def main(): """ main """ parsed_xml = tree dfcols = ['account', 'total'] df_xml = pd.DataFrame(columns=dfcols) for node in parsed_xml.getroot(): account = node.attrib.get('Type="String"') total = node.find('Type="Number"') df_xml = df_xml.append( pd.Series([account, getvalueofnode(total)], index=dfcols), ignore_index=True) print(df_xml) main()

Python解析XML文件结果:

account total 0 None None 推荐答案

避免通过追加系列甚至DataFrame等对象来生成数据框。相反,构建一个要绑定到DataFrame的词典列表。此外,因为您的XML有一个默认的命名空间,所以您必须分配前缀来解析该命名空间下的任何元素

import pandas as pd import xml.etree.cElementTree as ET ns = {"doc": "urn:schemas-microsoft-com:office:spreadsheet"} tree = ET.parse(r"C:PathToInput.xml") root = tree.getroot() def getvalueofnode(node): """ return node text or None """ return node.text if node is not None else None def main(): """ main """ parsed_xml = tree data = [] for i, node in enumerate(root.findall('.//doc:Row', ns)): if i > 6: data.append({'account': getvalueofnode(node.find('doc:Cell[1]/doc:Data', ns)), 'total': getvalueofnode(node.find('doc:Cell[2]/doc:Data', ns))}) return(pd.DataFrame(data)) output_df = main() print(output_df) # account total # 0 10000 - CASH & CASH EQUIVALENTS None # 1 10101 - Bank - 9999 - Company A - EUR 1234567.01 # 2 Total - 10000 - CASH & CASH EQUIVALENTS 1234567.01

或者,使用win32com(仅适用于Windows用户)将Excel样式的XML保存为xlsx使用Workbook.SaveAs方法(仅适用于Windows用户),并使用pandas.read_excel跳过相应的行进行读取。

import win32com.client import pandas as pd # SAVE EXCEL FILE try: xlApp = win32com.client.Dispatch("Excel.Application") xlWbk = xlApp.Workbooks.Open(r"C:PathToInput.xml") xlWbk.SaveAs(r"C:PathToOutput.xlsx", 51) xlWbk.Close(True) xlApp.Quit() except Exception as e: print(e) finally: xlWbk = None; xlApp = None del xlWbk; del xlApp # READ EXCEL FILE output_df = pd.read_excel(r"C:PathToOutput.xlsx", skiprows = 6) print(output_df) # Account Total # 0 10000 - CASH & CASH EQUIVALENTS NaN # 1 10101 - Bank - 9999 - Company A - EUR 1234567.01 # 2 Total - 10000 - CASH & CASH EQUIVALENTS 1234567.01

更多推荐

尝试使用Python解析XLS(XML)文件

本文发布于:2023-11-28 06:13:06,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1641248.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:文件   Python   XLS   XML

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!