This article walks through parsing an XML file with Python and exporting selected fields to CSV.
Problem description
I am having trouble collecting two more pieces of data while converting XML to CSV with Python.
They are the description tag and the generatedOn tag.
For the description tag I tried item.find('description').text, but it did not work.
For the generatedOn tag, I would like to concatenate the items inside it into a single value.
Please see the sample XML below:
<?xml version="1.0" encoding="UTF-8"?>
<omGroups xmlns="urn:nortel:namespaces:mcp:oms"
          xmlns:xsi="www.w3/2001/XMLSchema-instance"
          xsi:schemaLocation="urn:nortel:namespaces:mcp:oms OMSchema.xsd">
  <group>
    <name>RecordingSystem</name>
    <row>
      <package>com.nortelnetworks.mcp.ne.base.recsystem.fw.system</package>
      <class>RecSysFileOMRow</class>
      <usage name="closedFileCount" hasThresholds="true">
        <measures> closed file count </measures>
        <description> This register counts the number of closed files in the spool directory of a particular stream and a particular system. Files in the spool directory store the raw OAM records where they are sent to the Element Manager for formatting. </description>
        <notes> Minor and major alarms when the value of closedFileCount exceeds certain thresholds. Configure the threshold values for minor and major alarms for this OM through engineering parameters for minorBackLogCount and majorBackLogCount, respectively. These engineering parameters are grouped under the parameter group of Log, OM, and Accounting for the logs’ corresponding system. </notes>
      </usage>
      <usage name="processedFileCount" hasThresholds="true">
        <measures> Processed file count </measures>
        <description> The register counts the number of processed files in the spool directory of a particular stream and a particular system. Files in the spool directory store the raw OAM records and then send the records to the Element Manager for formatting. </description>
      </usage>
    </row>
    <documentation>
      <description> Rows of this OM group provide a count of the number of files contained within the directory (which is the OM row key value). </description>
      <rowKey> The full name of the directory containing the files counted by this row. </rowKey>
    </documentation>
    <generatedOn>
      <all/>
    </generatedOn>
  </group>
  <group traffic="true">
    <name>Ports</name>
    <row>
      <package>com.nortelnetworks.ims.cap.mediaportal.host</package>
      <class>PortsOMRow</class>
      <usage name="rtpMpPortUsage">
        <measures> BCP port usage </measures>
        <description> Meter showing number of ports in use. </description>
      </usage>
      <lwGauge name="connMapEntriesLWM">
        <measures> Lowest simultaneous port usage </measures>
        <description> Lowest number of simultaneous ports detected to be in use during the collection interval </description>
      </lwGauge>
      <hwGauge name="connMapEntriesHWM">
        <measures> Highest simultaneous port usage </measures>
        <description> Highest number of simultaneous ports detected to be in use during the collection interval. </description>
      </hwGauge>
      <waterMark name="connMapEntries">
        <measures> Connections map entries </measures>
        <description> Meter showing the number of connections in the host CPU connection map. </description>
        <bwg lwref="connMapEntriesLWM" hwref="connMapEntriesHWM"/>
      </waterMark>
      <counter name="portUsageSampleCnt">
        <measures> Usage sample count </measures>
        <description> The number of 100-second samples taken during the collection interval contributing to the average report. </description>
      </counter>
      <counter name="sampledRtpMpPortUsage">
        <measures> In-use ports usage </measures>
        <description> Provides the sum of the in-use ports every 100 seconds. </description>
      </counter>
      <precollector>
        <package>com.nortelnetworks.ims.cap.mediaportal.host</package>
        <class>PortsOMCenturyPrecollector</class>
        <collector>centurySecond</collector>
      </precollector>
    </row>
    <documentation>
      <description> </description>
      <rowKey> </rowKey>
    </documentation>
    <generatedOn>
      <list>
        <ne>sessmgr</ne>
        <ne>rtpportal</ne>
      </list>
    </generatedOn>
  </group>
</omGroups>

Code
import csv
from bs4 import BeautifulSoup

# xml_string is assumed to hold the sample XML shown above.
soup = BeautifulSoup(xml_string, 'html.parser')

with open('data.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out)
    writer.writerow(['General name:SpecificName', 'RegisterType', 'Measures'])
    for item in soup.select('row [name]'):
        writer.writerow([item.find_previous('name').text + ':' + item['name'],
                         item.name,
                         item.find('measures').get_text(strip=True)])

Recommended answer
You can try the following code:
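import csv
import re
from bs4 import BeautifulSoup

# xml_string is assumed to hold the sample XML from the question.
soup = BeautifulSoup(xml_string, 'html.parser')

with open('data.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out)
    writer.writerow(['General name:SpecificName', 'RegisterType', 'Measures',
                     'Description', 'generatedOn'])
    # Every element inside <row> that carries a name attribute is one register.
    for item in soup.select('row [name]'):
        # Strip the surrounding whitespace, then collapse runs of two or more
        # whitespace characters into a single space.
        desc = item.find('description').get_text(strip=True)
        desc = re.sub(r'\s{2,}', ' ', desc)
        # Join all <ne> values of the enclosing <group>; the result is an
        # empty string when the group has no <ne> elements (e.g. only <all/>).
        generatedOn = ','.join(ne.get_text(strip=True)
                               for ne in item.find_parent('group').select('ne'))
        writer.writerow([item.find_previous('name').text + ':' + item['name'],
                         item.name,
                         item.find('measures').get_text(strip=True),
                         desc,
                         generatedOn])

This creates data.csv.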
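For comparison, here is a minimal sketch of the same extraction using only the standard library's xml.etree.ElementTree. It is not the answer's method, just an alternative under stated assumptions: the names NS, SAMPLE, clean, local_name, rows_from_xml and to_csv are hypothetical, and SAMPLE is a trimmed-down excerpt of the question's XML. One reason it may be worth considering is that html.parser lowercases tag names, so a RegisterType such as lwGauge is written as lwgauge, whereas ElementTree preserves the original case.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Default namespace declared on <omGroups> in the question's XML.
NS = {"om": "urn:nortel:namespaces:mcp:oms"}

# Hypothetical, trimmed-down excerpt of the question's XML (one group only).
SAMPLE = """\
<omGroups xmlns="urn:nortel:namespaces:mcp:oms">
  <group traffic="true">
    <name>Ports</name>
    <row>
      <usage name="rtpMpPortUsage">
        <measures> BCP port usage </measures>
        <description> Meter showing number
          of ports in use. </description>
      </usage>
      <lwGauge name="connMapEntriesLWM">
        <measures> Lowest simultaneous port usage </measures>
        <description> Lowest number of simultaneous ports detected to be in use during the collection interval </description>
      </lwGauge>
    </row>
    <generatedOn> <list> <ne>sessmgr</ne> <ne>rtpportal</ne> </list> </generatedOn>
  </group>
</omGroups>
"""


def clean(text):
    """Trim the text and collapse every whitespace run to one space."""
    return " ".join((text or "").split())


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def rows_from_xml(xml_text):
    """Yield one CSV row (a list of strings) per named register."""
    root = ET.fromstring(xml_text)
    for group in root.findall("om:group", NS):
        group_name = clean(group.findtext("om:name", default="", namespaces=NS))
        # Join every <ne> under this group; empty when only <all/> is present.
        generated_on = ",".join(
            clean(ne.text) for ne in group.findall(".//om:ne", NS)
        )
        for row in group.findall("om:row", NS):
            for item in row.iter():
                if "name" not in item.attrib:
                    continue  # skip <package>, <class>, <measures>, ...
                yield [
                    f"{group_name}:{item.get('name')}",
                    local_name(item.tag),
                    clean(item.findtext("om:measures", default="", namespaces=NS)),
                    clean(item.findtext("om:description", default="", namespaces=NS)),
                    generated_on,
                ]


def to_csv(xml_text):
    """Return the CSV text, header included, for the given XML string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["General name:SpecificName", "RegisterType", "Measures",
                     "Description", "generatedOn"])
    writer.writerows(rows_from_xml(xml_text))
    return buffer.getvalue()


print(to_csv(SAMPLE))
```

Note that clean replaces even a single newline with a space, while re.sub(r'\s{2,}', ' ', ...) leaves lone whitespace characters untouched; which behaviour is preferable depends on the data. To write a file instead of printing, the string returned by to_csv can be saved with open('data.csv', 'w', newline='').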