How to use ElemenTree for reading XML files in Python? - python

I've got an XML file which looks like this:
<?xml version="1.0"?>
-<Object>
<ID>Object_01</ID>
<Location>Manchester</Location>
<Date>01-01-2020</Date>
<Time>15u59m05s</Time>
-<Max_25Hz>
<25Hz>0.916631065043311</25Hz>
<25Hz>0.797958008447961</25Hz>
</Max_25Hz>
-<Max_75Hz>
<75Hz>1.96599232706463</75Hz>
<75Hz>1.48317837078523</75Hz>
</Max_75Hz>
</Object>
I still don't really understand the difference between attributes and text. With below code I tried to receive all the values using text.
import xml.etree.ElementTree as ET
root = r'c:\data\FF\Desktop\My_files\XML-files\Object_01.xml'
tree = ET.parse(root)
root = tree.getroot()
for elem in root:
for subelem in elem:
print(subelem.text)
Expected output:
Object_01
Manchester
01-01-2020
15u59m05s
0.916631065043311
0.797958008447961
1.96599232706463
1.48317837078523
Received output:
0.916631065043311
0.797958008447961
1.96599232706463
1.48317837078523
I tried to do to same with .attributes in the hope to receive all the 'column' names but then I received:
{}
{}
{}
{}

You can access them directly above the for-loop.
Ex:
tree = ET.ElementTree(ET.fromstring(X))
root = tree.getroot()
for elem in root:
print(elem.text) #! Access them Here
for subelem in elem:
print(subelem.text)
Output:
Object_01
Manchester
01-01-2020
15u59m05s
0.916631065043311
0.797958008447961
1.96599232706463
1.48317837078523

You could give a try to https://github.com/martinblech/xmltodict.
It is almost a replacement for json module. This allows to read an xml file into a python dict. This simplifies greatly accessing the xml content.
Something like:
from xmldict import *
root = r'c:\data\FF\Desktop\My_files\XML-files\Object_01.xml'
with open(root) as file:
xmlStr = file.read()
xmldict = xml.parse(xmlStr)
print (xmldict['Object']['Id'])

Related

How to access UBL 2.1 xml tag using python

I need to access the tags in UBL 2.1 and modify them depend on the on the user input on python.
So, I used the ElementTree library to access the tags and modify them.
Here is a sample of the xml code:
<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ns2="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
<ns1:ProfileID>reporting:1.0</ns1:ProfileID>
<ns1:ID>0</ns1:ID>
<ns1:UUID>dbdf65eb-5d66-47e6-bb0c-a84bbf7baa30</ns1:UUID>
<ns1:IssueDate>2022-11-05</ns1:IssueDate>
The issue :
I want to access the tags but it is doesn't modifed and enter the loop
I tried both ways:
mytree = ET.parse('test.xml')
myroot = mytree.getroot()
for x in myroot.find({xmlns:ns1=urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate}"):
x.text = '1999'
mytree.write('test.xml')
mytree = ET.parse('test.xml')
myroot = mytree.getroot()
for x in myroot.iter('./Invoice/AllowanceCharge/ChargeIndicator'):
x.text = str('true')
mytree.write('test.xml')
None of them worked and modify the tag.
So the questions is : How can I reach the specific tag and modify it?
If you correct the namespace and the brakets in your for loop it works for a valid XML like (root tag must be closed!):
Input:
<?xml version="1.0" encoding="utf-8"?>
<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ns2="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
<ns1:ProfileID>reporting:1.0</ns1:ProfileID>
<ns1:ID>0</ns1:ID>
<ns1:UUID>dbdf65eb-5d66-47e6-bb0c-a84bbf7baa30</ns1:UUID>
<ns1:IssueDate>2022-11-05</ns1:IssueDate>
</ns0:Invoice>
Your repaired code:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for elem in root.findall("{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate"):
elem.text = '1999'
tree.write('test_changed.xml', encoding='utf-8', xml_declaration=True)
ET.dump(root)
Output:
<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
<ns1:ProfileID>reporting:1.0</ns1:ProfileID>
<ns1:ID>0</ns1:ID>
<ns1:UUID>dbdf65eb-5d66-47e6-bb0c-a84bbf7baa30</ns1:UUID>
<ns1:IssueDate>1999</ns1:IssueDate>
</ns0:Invoice>

Finding element in xml with python

I am trying to parse XML before converting it's content into lists and then into CSV. Unfortunately, I think my search terms for finding the initial element are failing, causing subsequent searches further down the hierarchy. I am new to XML, so I've tried variations on namespace dictionaries and including the namespace references... The simplified XML is given below:
<?xml version="1.0" encoding="utf-8"?>
<StationList xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:add="http://www.govtalk.gov.uk/people/AddressAndPersonalDetails"
xmlns:com="http://nationalrail.co.uk/xml/common" xsi:schemaLocation="http://internal.nationalrail.co.uk/xml/XsdSchemas/External/Version4.0/nre-station-v4-0.xsd"
xmlns="http://nationalrail.co.uk/xml/station">
<Station xsi:schemaLocation="http://internal.nationalrail.co.uk/xml/XsdSchemas/External/Version4.0/nre-station-v4-0.xsd">
<ChangeHistory>
<com:ChangedBy>spascos</com:ChangedBy>
<com:LastChangedDate>2018-11-07T00:00:00.000Z</com:LastChangedDate>
</ChangeHistory>
<Name>Aber</Name>
</Station>​
The Code I am using to try to extract the com/...xml/station / ChangedBy element is below
tree = ET.parse(rootfilepath + "NRE_Station_Dataset_2019_raw.xml")
root = tree.getroot()
#get at the tags and their data
#for elem in tree.iter():
# print(f"this the tag {elem.tag} and this is the data: {elem.text}")
#open file for writing
station_data = open(rootfilepath + 'station_data.csv','w')
csvwriter = csv.writer(station_data)
station_head = []
count = 0
#inspiration for this code: http://blog.appliedinformaticsinc.com/how-to- parse-and-convert-xml-to-csv-using-python/
#this is where it goes wrong; some combination of the namespace and the tag can't find anything in line 27, 'StationList'
for member in root.findall('{http://nationalrail.co.uk/xml/station}Station'):
station = []
if count == 0:
changedby = member.find('{http://nationalrail.co.uk/xml/common}ChangedBy').tag
station_head.append(changedby)
name = member.find('{http://nationalrail.co.uk/xml/station}Name').tag
station_head.append(name)
count = count+1
changedby = member.find('{http://nationalrail.co.uk/xml/common}ChangedBy').text
station.append(changedby)
name = member.find('{http://nationalrail.co.uk/xml/station}Name').text
station.append(name)
csvwriter.writerow(station)
I have tried:
using dictionaries of namespaces but that results in nothing being found at all
using hard coded namespaces but that results in "Attribute Error: 'NoneType' object has no attribute 'tag'
Thanks in advance for all and any assistance.
First of all your XML is invalid (</StationList> is absent at the end of a file).
Assuming you have valid XML file:
<?xml version="1.0" encoding="utf-8"?>
<StationList xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:add="http://www.govtalk.gov.uk/people/AddressAndPersonalDetails"
xmlns:com="http://nationalrail.co.uk/xml/common" xsi:schemaLocation="http://internal.nationalrail.co.uk/xml/XsdSchemas/External/Version4.0/nre-station-v4-0.xsd"
xmlns="http://nationalrail.co.uk/xml/station">
<Station xsi:schemaLocation="http://internal.nationalrail.co.uk/xml/XsdSchemas/External/Version4.0/nre-station-v4-0.xsd">
<ChangeHistory>
<com:ChangedBy>spascos</com:ChangedBy>
<com:LastChangedDate>2018-11-07T00:00:00.000Z</com:LastChangedDate>
</ChangeHistory>
<Name>Aber</Name>
</Station>​
</StationList>
Then you can convert your XML to JSON and simply address to the required value:
import xmltodict
with open('file.xml', 'r') as f:
data = xmltodict.parse(f.read())
changed_by = data['StationList']['Station']['ChangeHistory']['com:ChangedBy']
Output:
spascos
Try lxml:
#!/usr/bin/env python3
from lxml import etree
ns = {"com": "http://nationalrail.co.uk/xml/common"}
with open("so.xml") as f:
tree = etree.parse(f)
for t in tree.xpath("//com:ChangedBy/text()", namespaces=ns):
print(t)
Output:
spascos
You can use Beautifulsoup which is an html and xml parser
from bs4 import BeautifulSoup
fd = open(rootfilepath + "NRE_Station_Dataset_2019_raw.xml")
soup = BeautifulSoup(fd,'lxml-xml')
for i in soup.findAll('ChangeHistory'):
print(i.ChangedBy.text)

How to parse the xml with xmlns attribute using python

<?xml version="1.0" ?>
<school xmlns="loyo:22:2.2">
<profile>
<student xmlns="loyo:5:542">
<marks>
<mark java="java:/lo">
<ca1>200</ca1>
</mark>
</marks>
</student>
</profile>
</school>
I trying to access the ca1 text. I am using etree but I cannot access it. I'm using below code.
import xml.etree.ElementTree as ET
tree = ET.parse('mca.xml')
root = tree.getroot()
def getElementsData(xpath):
elements = list()
if root.findall(xpath):
for elem in root.findall(xpath):
elements.append(elem.text)
return elements
else:
raise SystemExit("Invalid xpath provided")
t = getElementsData('.//ca1')
for i in t:
print(i)
I tried in different way to access it I don't know the exact problem. Is it recording file type issue?
Your document has namespaces on nodes school and student, you need to incorporate the namespaces in your search. Since you are looking for ca1, which is under student, you will need to specify the namespace that student node has:
import xml.etree.ElementTree as ET
tree = ET.parse('mca.xml')
root = tree.getroot()
def getElementsData(xpath, namespaces):
elements = root.findall(xpath, namespaces)
if elements == []:
raise SystemExit("Invalid xpath provided")
return elements
namespaces = {'ns_school': 'loyo:22:2.2', 'ns_student': 'loyo:5:542'}
elements = getElementsData('.//ns_student:ca1', namespaces)
for element in elements:
print(element)
Notes
Since your namespaces have no names, I gave them such names as ns_school, ns_student, but these name can be anything (e.g. ns1, mystudent, ...)
In a more complex system, I recommend raising some other kinds of errors and let the caller decide whether or not to exit.
How about traversing like this
import xml.etree.ElementTree
e = xml.etree.ElementTree.parse('test.xml').getroot()
data = e.getchildren()[0].getchildren()[0].getchildren()[0].getchildren()[0].getchildren()[0].text
print(data)
Try the following xpath
tree.xpath('//ca1//text()')[0].strip()

LXML add an element into root

Im trying to take two elements from one file (file1.xml), and write them onto the end of another file (file2.xml). I am able to get them to print out, but am stuck trying to write them onto file2.xml! Help !
filename = "file1.xml"
appendtoxml = "file2.xml"
output_file = appendtoxml.replace('.xml', '') + "_editedbyed.xml"
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(filename, parser)
etree.tostring(tree)
root = tree.getroot()
a = root.findall(".//Device")
b = root.findall(".//Speaker")
for r in a:
print etree.tostring(r)
for e in b:
print etree.tostring(e)
NewSub = etree.SubElement (root, "Audio(just writes audio..")
print NewSub
I want the results of a, b to be added onto the end of outputfile.xml in the root.
Parse both the input file and the file you wish to append to.
Use root.append(elt) to append Element, elt, to root.
Then use tree.write to write the new tree to a file (e.g. appendtoxml):
Note: The links above point to documentation for xml.etree from the standard
library. Since lxml's API tries to be compatible with the standard library's
xml.etree, the standard library documentation applies to lxml as well (at
least for these methods). See http://lxml.de/api.html for information on where
the APIs differ.
import lxml.etree as ET
filename = "file1.xml"
appendtoxml = "file2.xml"
output_file = appendtoxml.replace('.xml', '') + "_editedbyed.xml"
parser = ET.XMLParser(remove_blank_text=True)
tree = ET.parse(filename, parser)
root = tree.getroot()
out_tree = ET.parse(appendtoxml, parser)
out_root = out_tree.getroot()
for path in [".//Device", ".//Speaker"]:
for elt in root.findall(path):
out_root.append(elt)
out_tree.write(output_file, pretty_print=True)
If file1.xml contains
<?xml version="1.0"?>
<root>
<Speaker>boozhoo</Speaker>
<Device>waaboo</Device>
<Speaker>anin</Speaker>
<Device>gigiwishimowin</Device>
</root>
and file2.xml contains
<?xml version="1.0"?>
<root>
<Speaker>jubal</Speaker>
<Device>crane</Device>
</root>
then file2_editedbyed.xml will contain
<root>
<Speaker>jubal</Speaker>
<Device>crane</Device>
<Device>waaboo</Device>
<Device>gigiwishimowin</Device>
<Speaker>boozhoo</Speaker>
<Speaker>anin</Speaker>
</root>

python parsing xml with ElementTree doesn't give interested result

I have an xml file like this
<?xml version="1.0"?>
<sample>
<text>My name is <b>Wrufesh</b>. What is yours?</text>
</sample>
I have a python code like this
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
for child in root:
print child.text()
I only get
'My name is' as an output.
I want to get
'My name is <b>Wrufesh</b>. What is yours?' as an output.
What can I do?
You can get your desired output using using ElementTree.tostringlist():
>>> import xml.etree.ElementTree as ET
>>> root = ET.parse('sample.xml').getroot()
>>> l = ET.tostringlist(root.find('text'))
>>> l
['<text', '>', 'My name is ', '<b', '>', 'Wrufesh', '</b>', '. What is yours?', '</text>', '\n']
>>> ''.join(l[2:-2])
'My name is <b>Wrufesh</b>. What is yours?'
I wonder though how practical this is going to be for generic use.
I don't think treating tag in xml as a string is right. You can access the text part of xml like this:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
text = root[0]
for i in text.itertext():
print i
# As you can see, `<b>` and `</b>` is a pair of tags but not strings.
print text._children
I would suggest pre-processing the xml file to wrap elements under <text> element in CDATA. You should be able to read the values without a problem afterwards.
<text><![CDATA[<My name is <b>Wrufesh</b>. What is yours?]]></text>

Categories