Python ElementTree xml output to csv

Python ElementTree xml output to csv - python

I have the following XML file ('registerreads_EE.xml'):
<?xml version="1.0" encoding="us-ascii" standalone="yes"?>
<ReadingDocument xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ReadingStatusRefTable>
<ReadingStatusRef Ref="1">
<UnencodedStatus SourceValidation="SIMPLE">
<StatusCodes>
<Signal>XX</Signal>
</StatusCodes>
</UnencodedStatus>
</ReadingStatusRef>
</ReadingStatusRefTable>
<Header>
<IEE_System Id="XXXXXXXXXXXXXXX" />
<Creation_Datetime Datetime="2015-10-22T09:05:32Z" />
<Timezone Id="UTC" />
<Path FilePath="X:\XXXXXXXXXXXX.xml" />
<Export_Template Id="XXXXX" />
<CorrelationID Id="" />
</Header>
<ImportExportParameters ResubmitFile="false" CreateGroup="true">
<DataFormat TimestampType="XXXXXX" Type="XXXX" />
</ImportExportParameters>
<Channels>
<Channel StartDate="2015-10-21T00:00:00-05:00" EndDate="2015-10-22T00:00:00-05:00">
<ChannelID ServicePointChannelID="73825603:301" />
<Readings>
<Reading Value="3577.0" ReadingTime="2015-10-21T00:00:00-05:00" StatusRef="1" />
<Reading Value="3601.3" ReadingTime="2015-10-22T00:00:00-05:00" StatusRef="1" />
</Readings>
<ExportRequest RequestID="152" EntityType="ServicePoint" EntityID="73825603" RequestSource="Scheduled" />
</Channel>
<Channel StartDate="2015-10-21T00:00:00-05:00" EndDate="2015-10-22T00:00:00-05:00">
<ChannelID ServicePointChannelID="73825604:301" />
<Readings>
<Reading Value="3462.5" ReadingTime="2015-10-21T00:00:00-05:00" StatusRef="1" />
<Reading Value="3501.5" ReadingTime="2015-10-22T00:00:00-05:00" StatusRef="1" />
</Readings>
<ExportRequest RequestID="152" EntityType="ServicePoint" EntityID="73825604" RequestSource="Scheduled" />
</Channel>
</Channels>
</ReadingDocument>
I want to parse the XML of the channel data into a csv file.
He is what I have written in Python 2.7.10:
import xml.etree.ElementTree as ET
tree = ET.parse('registerreads_EE.xml')
root = tree.getroot()[3]
for channel in tree.iter('Channel'):
for exportrequest in channel.iter('ExportRequest'):
entityid = exportrequest.attrib.get('EntityID')
for meterread in channel.iter('Reading'):
read = meterread.attrib.get('Value')
date = meterread.attrib.get('ReadingTime')
print read[:-2],",",date[:10],",",entityid
tree.write(open('registerreads_EE.csv','w'))
Here is the screen output when the above is run:
3577 , 2015-10-21 , 73825603
3601 , 2015-10-22 , 73825603
3462 , 2015-10-21 , 73825604
3501 , 2015-10-22 , 73825604
The 'registerreads.csv' output file is like the original XML file, minus the first line.
I would like the printed output above outputted to a csv file with headers of read, date, entityid.
I am having difficulty with this. This is my first python program. Any help is appreciated.

Use the csv module not lxml module to write rows to csv file. But still use lxml to parse and extract content from xml file:
import xml.etree.ElementTree as ET
import csv
tree = ET.parse('registerreads_EE.xml')
root = tree.getroot()[3]
with open('registerreads_EE.csv', 'w', newline='') as r:
writer = csv.writer(r)
writer.writerow(['read', 'date', 'entityid']) # WRITING HEADERS
for channel in tree.iter('Channel'):
for exportrequest in channel.iter('ExportRequest'):
entityid = exportrequest.attrib.get('EntityID')
for meterread in channel.iter('Reading'):
read = meterread.attrib.get('Value')
date = meterread.attrib.get('ReadingTime')
# WRITE EACH ROW ITERATIVELY
writer.writerow([read[:-2],date[:10],entityid])

Related

How to Extract the Information from XML Soap Response?

We have a requirement to get the data from a SOAP XML Response.
Below is the associated XML file
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetResultResponse xmlns="http://www.relatics.com/">
<GetResultResult>
<Report ReportName="RFC" GeneratedOn="2022-12-22" EnvironmentID="XXXX" EnvironmentName="Systematic Assurance – an XXX Solution" EnvironmentURL="https://XXXX.relaticsonline.com/" WorkspaceID="XXXXX" WorkspaceName="P - ADL Program Management - XXX" TargetDevice="Pc" ReportField="" xmlns="">
<Change_module>
<applied_individual_change_request Change_Request="TestKZIreport" RFC_GUID="XXXXX">
<code RFC_Code="VtW-0101" />
<progress RFC_Progress="agreed" />
<applied_individual_project_organisation Organisation="XXXX" />
<applied__individual_discipline Discipline="Highways" />
<specification Specification="Context of Documents">
<code Specification_Code="1.1.1a" />
</specification>
<applied_individual_workpackage Workpackage="Enabling work">
<code Workpackage_Code="WP-01" />
</applied_individual_workpackage>
<physical_object Physical_Object="Train Station">
<code Physical_Object_Code="TFO-0001" />
</physical_object>
<person approver="XXX" />
<applied_individual_change_consequence_qualification Consequence_Value="10 days">
<applied_conceptual_change_consequence_aspect Consequence_Aspect="Schedule" />
</applied_individual_change_consequence_qualification>
<document Document_Name="WI 300 Design.pdf">
<code Document_Code="DOC-0002" />
</document>
<answer_status BR_Status="no" />
<applied_individual_business_rule Business_Rule="Change Review compliance">
<code BR_Code="BR-006" />
</applied_individual_business_rule>
<applied_individual_change_consequence_qualification Consequence_Value="XXX">
<applied_conceptual_change_consequence_aspect Consequence_Aspect="Finance" />
</applied_individual_change_consequence_qualification>
</applied_individual_change_request>
</Change_module>
</Report>
</GetResultResult>
</GetResultResponse>
</soap:Body>
</soap:Envelope>
i need all the tag values after Change_module.i tried some online help in Stack overflow but it didn't work.
I never worked with XML documents before and here is the sample code i
tried from Stack Overflow.
import xml.etree.ElementTree as ET
import pandas as pd
import numpy as np
tree = ET.parse("Relatics_XML.xml")
root = tree.getroot()
print(root.tag)
print(root.attrib)
namespaces = {"soap": "http://www.w3.org/2003/05/soap-envelope/",
"xsi": "http://www.w3.org/2001/XMLSchema-instance",
"xsd": "http://www.w3.org/2001/XMLSchema/",
'a': 'http://www.relatics.com/',}
names = tree.findall('./soap:Body''/a:GetResultResponse''/a:GetResultResult', namespaces)
print(names)
for name in names:
print(name.text)
i tried different methods like find and findall and also inside the method i try to pass different values but all its printing is null.
I'm not sure how to get the values out of tags.

Using xml.etree.ElementTree make life easier.
documentation in here
It can parsing tag attribute or innerText.
import xml.etree.ElementTree as ET
xml = """\
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetResultResponse xmlns="http://www.relatics.com/">
<GetResultResult>
<Report ReportName="RFC" GeneratedOn="2022-12-22" EnvironmentID="XXXX" EnvironmentName="Systematic Assurance – an XXX Solution" EnvironmentURL="https://XXXX.relaticsonline.com/" WorkspaceID="XXXXX" WorkspaceName="P - ADL Program Management - XXX" TargetDevice="Pc" ReportField=""
xmlns="">
<Change_module>
<applied_individual_change_request Change_Request="TestKZIreport" RFC_GUID="XXXXX">
<code RFC_Code="VtW-0101" />
<progress RFC_Progress="agreed" />
<applied_individual_project_organisation Organisation="XXXX" />
<applied__individual_discipline Discipline="Highways" />
<specification Specification="Context of Documents">
<code Specification_Code="1.1.1a" />
</specification>
<applied_individual_workpackage Workpackage="Enabling work">
<code Workpackage_Code="WP-01" />
</applied_individual_workpackage>
<physical_object Physical_Object="Train Station">
<code Physical_Object_Code="TFO-0001" />
</physical_object>
<person approver="XXX" />
<applied_individual_change_consequence_qualification Consequence_Value="10 days">
<applied_conceptual_change_consequence_aspect Consequence_Aspect="Schedule" />
</applied_individual_change_consequence_qualification>
<document Document_Name="WI 300 Design.pdf">
<code Document_Code="DOC-0002" />
</document>
<answer_status BR_Status="no" />
<applied_individual_business_rule Business_Rule="Change Review compliance">
<code BR_Code="BR-006" />
</applied_individual_business_rule>
<applied_individual_change_consequence_qualification Consequence_Value="XXX">
<applied_conceptual_change_consequence_aspect Consequence_Aspect="Finance" />
</applied_individual_change_consequence_qualification>
</applied_individual_change_request>
</Change_module>
</Report>
</GetResultResult>
</GetResultResponse>
</soap:Body>
</soap:Envelope>
"""
root = ET.fromstring(xml)
print("RFC_Code: " + str(root.find(".//code[#RFC_Code]").attrib))
print("RFC_Progress: " + str(root.find(".//progress[#RFC_Progress]").attrib))
print("specification: " + str(root.find(".//specification[#Specification]").attrib))
print("Specification_Code: " + str(root.find(".//code[#Specification_Code]").attrib))
print("Workpackage_Code: " + str(root.find(".//code[#Workpackage_Code]").attrib))
print("Document_Code: " + str(root.find(".//code[#Document_Code]").attrib))
Result
$ python get-data.py
RFC_Code: {'RFC_Code': 'VtW-0101'}
RFC_Progress: {'RFC_Progress': 'agreed'}
specification: {'Specification': 'Context of Documents'}
Specification_Code: {'Specification_Code': '1.1.1a'}
Workpackage_Code: {'Workpackage_Code': 'WP-01'}
Document_Code: {'Document_Code': 'DOC-0002'}
If you using xml file open, using this code
with open('data.xml', 'r') as xml_file:
root = ET.parse(xml_file)

Parse over GET Response

I have a requests call that gives me some somewhat formatted XML data like this:
<info>
<stats vol="545080705" orders="718021755"/>
<symbols timestamp="2022-09-08 19:56:37" count="11394">
<symbol name="TQQQ" vol="8700394" last="28.23" matched="8382339" />
<symbol name="SPY" vol="8571092" last="401.00" matched="8209174" />
<symbol name="SQQQ" vol="7091770" last="44.39" matched="6734334" />
<symbol name="AVCT" vol="6493626" last="0.17" matched="6469576" />
<symbol name="UVXY" vol="6158364" last="9.42" matched="6142800" />
I'm having difficulty figuring out how to convert this into either a dictionary, or data frame, or some other object which I can loop over and extract the NAME, VOL, LAST & MATCHED items.

You can parse this code as follow, but it depends what you need from it :
from bs4 import BeautifulSoup as bs
import pandas as pd
response = """<info>
<stats vol="545080705" orders="718021755"/>
<symbols timestamp="2022-09-08 19:56:37" count="11394">
<symbol name="TQQQ" vol="8700394" last="28.23" matched="8382339" />
<symbol name="SPY" vol="8571092" last="401.00" matched="8209174" />
<symbol name="SQQQ" vol="7091770" last="44.39" matched="6734334" />
<symbol name="AVCT" vol="6493626" last="0.17" matched="6469576" />
<symbol name="UVXY" vol="6158364" last="9.42" matched="6142800" />
</symbols></info>"""
content = bs(response,"lxml-xml" )
df = pd.read_xml(str(content), xpath="//symbol")
Output :
last matched name vol
0 28.23 8382339 TQQQ 8700394
1 401.00 8209174 SPY 8571092
2 44.39 6734334 SQQQ 7091770
3 0.17 6469576 AVCT 6493626
4 9.42 6142800 UVXY 6158364

There are many ways you can parse and convert XML. Here is one of the ways using Beautifulsoup
doc = '''
<info>
<stats vol="545080705" orders="718021755"/>
<symbols timestamp="2022-09-08 19:56:37" count="11394">
<symbol name="TQQQ" vol="8700394" last="28.23" matched="8382339" />
<symbol name="SPY" vol="8571092" last="401.00" matched="8209174" />
<symbol name="SQQQ" vol="7091770" last="44.39" matched="6734334" />
<symbol name="AVCT" vol="6493626" last="0.17" matched="6469576" />
<symbol name="UVXY" vol="6158364" last="9.42" matched="6142800" />
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(doc, 'lxml-xml')
for sym in soup.find_all('symbol'):
print("-"*32)
print(sym.get("name"))
print(sym.get("vol"))
print(sym.get("last"))
print(sym.get("matched"))
If you want more ways you can check this link

Not getting anything while parsing xml file in python

I am a beginner in python and trying to parse the below string from xmlfile where I can get an output like below :
Expected output :{macman:(linkedin,facebook,stack overflow)}
so that I can find out which user has which clients
XML File : <user name="macman"> <client name="linkedin" />
<client name="facebook" /> <client name="stack overflow" />
</user>
Code i am trying :
import urllib.request as url
import xml.etree.ElementTree as ET
xmlfile=url.urlopen(r'file:///D:/Python/user.html')
tree=ET.parse(xmlfile)
root=tree.getroot()
print(root.tag)
for item in root.findall("./user"):
users={}
for child in item:
users[child.tag]=child.text
print(users)

You need to findall() the client not the user. So, your code should look like this:
...
users = {root.get("name"): []}
for item in root.findall("client"):
users[root.get("name")].append(item.get("name"))
print(users)
#{'macman': ['linkedin', 'facebook', 'stack overflow']}

Below
import xml.etree.ElementTree as ET
xml = '''<r><user name="macman1"> <client name="linkedin1" />
<client name="facebook1" /> <client name="stack overflow1" />
</user><user name="macman2"> <client name="linkedin2" />
<client name="facebook2" /> <client name="stack overflow2" />
</user></r>'''
root = ET.fromstring(xml)
data = {u.attrib['name']: [c.attrib['name'] for c in u.findall('./client')] for u in root.findall('.//user')}
print(data)
output
{'macman2': ['linkedin2', 'facebook2', 'stack overflow2'], 'macman1': ['linkedin1', 'facebook1', 'stack overflow1']}

Python XML get immediate child elements only

I have an xml file as below:
<?xml version="1.0" encoding="utf-8"?>
<EDoc CID="1000101" Cname="somename" IName="iname" CSource="e1" Version="1.0">
<RIGLIST>
<RIG RIGID="100001" RIGName="RgName1">
<ListID>
<nodeA nodeAID="1000011" nodeAName="node1A" nodeAExtID="9000011" />
<nodeA nodeAID="1000012" nodeAName="node2A" nodeAExtID="9000012" />
<nodeA nodeAID="1000013" nodeAName="node3A" nodeAExtID="9000013" />
<nodeA nodeAID="1000014" nodeAName="node4A" nodeAExtID="9000014" />
<nodeA nodeAID="1000015" nodeAName="node5A" nodeAExtID="9000015" />
<nodeA nodeAID="1000016" nodeAName="node6A" nodeAExtID="9000016" />
<nodeA nodeAID="1000017" nodeAName="node7A" nodeAExtID="9000017" />
</ListID>
</RIG>
<RIG RIGID="100002" RIGName="RgName2">
<ListID>
<nodeA nodeAID="1000021" nodeAName="node1B" nodeAExtID="9000021" />
<nodeA nodeAID="1000022" nodeAName="node2B" nodeAExtID="9000022" />
<nodeA nodeAID="1000023" nodeAName="node3B" nodeAExtID="9000023" />
</ListID>
</RIG>
</RIGLIST>
</EDoc>
I need to search for the Node value RIGName and if match is found print out all the values of nodeAName
Example:
Searching for RIGName = "RgName2" should print all the values as node1B, node2B, node3B
As of now I am only able to get the first part as below:
import xml.etree.ElementTree as eT
import re
xmlfilePath = "Path of xml file"
tree = eT.parse(xmlfilePath)
root = tree.getroot()
for elem in root.iter("RIGName"):
# print(elem.tag, elem.attrib)
if re.findall(searchtxt, elem.attrib['RIGName'], re.IGNORECASE):
print(elem.attrib)
count += 1
How can I get only the immediate child node values?

Switching from xml.etree to lxml would give you a way to do it in a single go because of a much better XPath query language support:
In [1]: from lxml import etree as ET
In [2]: tree = ET.parse('input.xml')
In [3]: root = tree.getroot()
In [4]: root.xpath('//RIG[#RIGName = "RgName2"]/ListID/nodeA/#nodeAName')
Out[4]: ['node1B', 'node2B', 'node3B']

Finding values in XML document using Python

I have following code that tries to get values from XML document:
from xml.dom import minidom
xml = """<SoccerFeed TimeStamp="20130328T152947+0000">
<SoccerDocument uID="f131897" Type="Result" />
<Competition uID="c87">
<MatchData>
<MatchInfo TimeStamp="20070812T144737+0100" Weather="Windy"Period="FullTime" MatchType="Regular" />
<MatchOfficial uID="o11068"/>
<Stat Type="match_time">91</Stat>
<TeamData TeamRef="t810" Side="Home" Score="4" />
<TeamData TeamRef="t2012" Side="Away" Score="1" />
</MatchData>
<Team uID="t810" />
<Team uID="t2012" />
<Venue uID="v2158" />
</SoccerDocument>
</SoccerFeed>"""
xmldoc = minidom.parseString(xml)
soccerfeed = xmldoc.getElementsByTagName("SoccerFeed")[0]
soccerdocument = soccerfeed.getElementsByTagName("SoccerDocument")[0]
#Match Data
MatchData = soccerdocument.getElementsByTagName("MatchData")[0]
MatchInfo = MatchData.getElementsByTagName("MatchInfo")[0]
Goal = MatchData.getElementsByTagNameNS("Side", "Score")
The Goal is being set to [], but I would like to get the score value, which is 4.

It looks like you are searching for the wrong XML node. Check following line:
Goal = MatchData.getElementsByTagNameNS("Side", "Score")
You probably are looking for following:
Goal = MatchData.getElementsByTagName("TeamData")[0].getAttribute("Score")
NOTE: Document.getElementsByTagName, Document.getElementsByTagNameNS, Element.getElementsByTagName, Element.getElementsByTagNameNS return a list of nodes, not just a scalar value.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python ElementTree xml output to csv - python

Related

How to Extract the Information from XML Soap Response?

Parse over GET Response

Not getting anything while parsing xml file in python

Python XML get immediate child elements only

Finding values in XML document using Python

Categories

Resources