Printing child node using xml - python

Here is the API XML I am working with:
<response>
<request>polaris</request>
<status>0</status>
<verbiage>OK</verbiage>
<object id="S251">
<type id="1">Star</type>
<name>α UMi</name>
<catId>α UMi</catId>
<constellation id="84">Ursa Minor</constellation>
<ra unit="hour">2.5301944</ra>
<de unit="degree">89.264167</de>
<mag>2.02</mag>
</object>
<object id="S251">
<type id="1">Star</type>
<name>α UMi</name>
<catId>α UMi</catId>
<constellation id="84">Ursa Minor</constellation>
<ra unit="hour">2.5301944</ra>
<de unit="degree">89.264167</de>
<mag>2.02</mag>
</object>
</response>
Here is my current code:
#!/usr/bin/env python
import xml.etree.ElementTree as ET
tree = ET.parse('StarGaze.xml')
root = tree.getroot()
callevent=root.find('polaris')
Moc1=callevent.find('polaris')
for node in Moc1.getiterator():
    if node.tag=='constellation id':
        print node.tag, node.attrib, node.text
I want to be able to print defined children. For example:
constellation id=
ra unit=
Any help would be very much appreciated.

Iterate over the object nodes with findall(), locate the constellation and ra children of each one with find(), and read their attributes from the .attrib dictionary:
import xml.etree.ElementTree as ET
tree = ET.parse('StarGaze.xml')
root = tree.getroot()
for obj in root.findall("object"):
    constellation = obj.find("constellation")
    ra = obj.find("ra")
    print(constellation.attrib["id"], constellation.text, ra.attrib["unit"], ra.text)
Would print:
84 Ursa Minor hour 2.5301944
84 Ursa Minor hour 2.5301944
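If you want output closer to the `constellation id=` / `ra unit=` layout from the question, a small variation can loop over a list of wanted tags. This is only a minimal sketch: it parses a shortened inline copy of the response instead of StarGaze.xml, and the `describe` helper and its `wanted` parameter are illustrative names, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response from the question, inlined so the sketch is self-contained.
XML = """<response>
  <object id="S251">
    <constellation id="84">Ursa Minor</constellation>
    <ra unit="hour">2.5301944</ra>
    <de unit="degree">89.264167</de>
  </object>
</response>"""

def describe(obj, wanted=("constellation", "ra")):
    """Return one 'tag attr=value text' line per wanted child that exists."""
    lines = []
    for tag in wanted:
        node = obj.find(tag)
        if node is None:  # find() returns None when the child is missing
            continue
        attrs = " ".join("{}={}".format(k, v) for k, v in node.attrib.items())
        lines.append("{} {} {}".format(node.tag, attrs, node.text))
    return lines

root = ET.fromstring(XML)
for obj in root.findall("object"):
    for line in describe(obj):
        print(line)
```

Checking for `None` before touching `.attrib` avoids an AttributeError when an object lacks one of the requested children.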

Related

How to access UBL 2.1 xml tag using python

I need to access the tags in a UBL 2.1 document and modify them depending on user input in Python.
So, I used the ElementTree library to access the tags and modify them.
Here is a sample of the xml code:
<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ns2="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
<ns1:ProfileID>reporting:1.0</ns1:ProfileID>
<ns1:ID>0</ns1:ID>
<ns1:UUID>dbdf65eb-5d66-47e6-bb0c-a84bbf7baa30</ns1:UUID>
<ns1:IssueDate>2022-11-05</ns1:IssueDate>
The issue:
I want to access the tags, but they are not modified and the loop body is never entered.
I tried both of these ways:
mytree = ET.parse('test.xml')
myroot = mytree.getroot()
for x in myroot.find({xmlns:ns1=urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate}"):
    x.text = '1999'
mytree.write('test.xml')
mytree = ET.parse('test.xml')
myroot = mytree.getroot()
for x in myroot.iter('./Invoice/AllowanceCharge/ChargeIndicator'):
    x.text = str('true')
mytree.write('test.xml')
Neither of them worked or modified the tag.
So the question is: how can I reach a specific tag and modify it?
If you correct the namespace and the brackets in your for loop, it works for valid XML like the following (the root tag must be closed!):
Input:
<?xml version="1.0" encoding="utf-8"?>
<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ns2="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
<ns1:ProfileID>reporting:1.0</ns1:ProfileID>
<ns1:ID>0</ns1:ID>
<ns1:UUID>dbdf65eb-5d66-47e6-bb0c-a84bbf7baa30</ns1:UUID>
<ns1:IssueDate>2022-11-05</ns1:IssueDate>
</ns0:Invoice>
Your repaired code:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for elem in root.findall("{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate"):
    elem.text = '1999'
tree.write('test_changed.xml', encoding='utf-8', xml_declaration=True)
ET.dump(root)
Output:
<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
<ns1:ProfileID>reporting:1.0</ns1:ProfileID>
<ns1:ID>0</ns1:ID>
<ns1:UUID>dbdf65eb-5d66-47e6-bb0c-a84bbf7baa30</ns1:UUID>
<ns1:IssueDate>1999</ns1:IssueDate>
</ns0:Invoice>
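As an alternative to spelling out the `{uri}tag` form, `find()` and `findall()` also accept a prefix-to-URI mapping. The sketch below assumes a shortened inline copy of the invoice; the `cbc` prefix and the `NS` name are arbitrary choices for illustration, and only the URI has to match the document.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the invoice, inlined so the sketch runs without test.xml.
XML = """<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
<ns1:ID>0</ns1:ID>
<ns1:IssueDate>2022-11-05</ns1:IssueDate>
</ns0:Invoice>"""

# "cbc" is a local alias; it does not need to match the ns1 prefix used in the file.
NS = {"cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"}

root = ET.fromstring(XML)
issue_date = root.find("cbc:IssueDate", NS)
if issue_date is not None:  # find() returns None when nothing matches
    issue_date.text = "1999"

print(ET.tostring(root, encoding="unicode"))
```

The mapping keeps the search strings short when several tags from the same namespace have to be modified.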

xml.etree.ElementTree .remove

I'm trying to remove tags from an ALTO XML file with remove().
My Alto file looks like this:
<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-2.xsd">
<Description>
<MeasurementUnit>pixel</MeasurementUnit>
<sourceImageInformation>
<fileName>filename</fileName>
</sourceImageInformation>
</Description>
<Layout>
<Page>
<PrintSpace>
<TextBlock>
<Shape><Polygon/></Shape>
<TextLine>
<Shape><Polygon/></Shape>
<String CONTENT="ABCDEF" HPOS="1234" VPOS="1234" WIDTH="1234" HEIGHT="1234" />
</TextLine>
</TextBlock>
</PrintSpace>
</Page>
</Layout>
</alto>
And my code is:
import xml.etree.ElementTree as ET
tree = ET.parse("file.xml")
root = tree.getroot()
ns = {'alto': 'http://www.loc.gov/standards/alto/ns-v4#'}
ET.register_namespace("", "http://www.loc.gov/standards/alto/ns-v4#")
for Test in root.findall('.//alto:TextBlock', ns):
    root.remove(Test)
tree.write('out.xml', encoding="UTF-8", xml_declaration=True)
Here is the error I get:
ValueError: list.remove(x): x not in list
Thanks a lot for your help 💐
ElementFather.remove(ElementChild) works only if ElementChild is a direct sub-element of ElementFather. In your case, TextBlock is a child of PrintSpace rather than of the root, so you have to call remove() on PrintSpace:
import xml.etree.ElementTree as ET
tree = ET.parse("file.xml")
root = tree.getroot()
ns = {'alto': 'http://www.loc.gov/standards/alto/ns-v4#'}
ET.register_namespace("", "http://www.loc.gov/standards/alto/ns-v4#")
for Test in root.findall('.//alto:TextBlock', ns):
    PrintSpace = root.find('.//alto:PrintSpace',ns)
    PrintSpace.remove(Test)
tree.write('out.xml', encoding="UTF-8", xml_declaration=True)
Note: This code is only an example of a working solution; it can certainly be improved.
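One possible improvement, sketched here under assumptions, is to look up each element's actual parent instead of hard-coding PrintSpace. ElementTree elements do not keep a reference to their parent, so a common workaround is to build a child-to-parent dictionary first. The inline document below is a made-up, simplified structure with two parents, used only to show that the approach does not depend on a single PrintSpace; `parent_of` is an illustrative name.

```python
import xml.etree.ElementTree as ET

NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Made-up, simplified document with TextBlock elements under two different parents.
XML = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page>
    <PrintSpace><TextBlock><TextLine/></TextBlock></PrintSpace>
    <PrintSpace><TextBlock/><TextBlock/></PrintSpace>
  </Page></Layout>
</alto>"""

root = ET.fromstring(XML)

# Elements do not store their parent, so build a child -> parent lookup first.
parent_of = {child: parent for parent in root.iter() for child in parent}

# findall() returns a list, so removing elements while looping over it is safe.
for block in root.findall(".//alto:TextBlock", NS):
    parent_of[block].remove(block)

remaining = root.findall(".//alto:TextBlock", NS)
print(len(remaining))
```

Because every TextBlock is removed from whichever element really contains it, the ValueError from the question cannot occur.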

XML Attributes Empty

I'm reading an xml object into Python 3.6 on Windows 10 from file. Here is a sample of the xml:
<?xml version="1.0"?>
<rss version="2.0" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<item>
<BurnLocation># 32 40 52.99 # 80 57 33.00</BurnLocation>
<geo:lat>32.681389</geo:lat>
<geo:long>-80.959167</geo:long>
<County>Jasper</County>
<BurnType>PD</BurnType>
<BurnTypeDescription>PILED DEBRIS</BurnTypeDescription>
<Acres>2</Acres>
</item>
<item>
<BurnLocation># 33 29 34.26 # 81 15 52.89</BurnLocation>
<geo:lat>33.492851</geo:lat>
<geo:long>-81.264694</geo:long>
<County>Orangebrg</County>
<BurnType>PD</BurnType>
<BurnTypeDescription>PILED DEBRIS</BurnTypeDescription>
<Acres>1</Acres>
</item>
</channel>
</rss>
Here is a version of my code:
import os
import xml.etree.ElementTree as ET
local_filename = os.path.join('C:\\Temp\\test\\', filename)
tree = ET.parse(local_filename)
root = tree.getroot()
for child in root:
    for next1 in child:
        for next2 in next1:
            print(next2.tag,next2.attrib)
The issue I'm having is that I cannot seem to isolate the attributes of the child tags, they are coming up as empty dictionaries. Here is an example of the result:
BurnLocation {}
{http://www.w3.org/2003/01/geo/wgs84_pos#}lat {}
{http://www.w3.org/2003/01/geo/wgs84_pos#}long {}
County {}
BurnType {}
BurnTypeDescription {}
Acres {}
BurnLocation {}
{http://www.w3.org/2003/01/geo/wgs84_pos#}lat {}
{http://www.w3.org/2003/01/geo/wgs84_pos#}long {}
County {}
BurnType {}
BurnTypeDescription {}
Acres {}
I am trying to print out the values within the tags (e.g. Jasper). What am I doing wrong?
What you want here is the text contents of each element, and not their attributes.
This ought to do it (slightly simplified for a fixed filename):
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
for child in root:
    for next1 in child:
        for next2 in next1:
            print ('{} = "{}"'.format(next2.tag,next2.text))
        print ()
However, I'd simplify it a bit by:
locating all <item> elements at once, and
then looping over each item's child elements.
Thus:
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
for item in tree.findall('*/item'):
    for elem in list(item):
        print ('{} = "{}"'.format(elem.tag,elem.text))
    print ()
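If you only need a few specific values per item, findtext() can fetch them by name, including the namespaced geo:lat and geo:long children. The following is a minimal sketch on a shortened inline copy of the feed; the `NS` mapping and the `rows` list are illustrative names.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the feed from the question, inlined so the sketch is self-contained.
XML = """<rss version="2.0" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
<channel>
<item>
<geo:lat>32.681389</geo:lat>
<geo:long>-80.959167</geo:long>
<County>Jasper</County>
</item>
</channel>
</rss>"""

# Prefix -> URI mapping used to resolve "geo:" in the search strings.
NS = {"geo": "http://www.w3.org/2003/01/geo/wgs84_pos#"}

root = ET.fromstring(XML)
rows = []
for item in root.findall("channel/item"):
    county = item.findtext("County")  # un-namespaced child
    lat = float(item.findtext("geo:lat", namespaces=NS))
    lon = float(item.findtext("geo:long", namespaces=NS))
    rows.append((county, lat, lon))

print(rows)
```

This avoids the `{http://...}lat` style tags in the output and gives you typed values to work with.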

Python ElementTree - Search children/grandchildren in poorly written XML

I'm trying to parse a poorly structured XML file and output the node name and the content of a tag (only if it exists), and only if the value of <String name="content"> is greater than 30 day(s).
Thus far I can search the children elements using ElementTree, but I need help with the poorly nested info. I can't change the XML because it's a vendor provided report. I'm a complete newbie, so please coach me on what I need to do or provide for better help. Thanks in advance.
Example File:
<?xml version="1.0" encoding="UTF-8"?>
<ReportSection>
<ReportHead>
<Criteria>
<HeadStuff value=Dont Care>
</HeadStuff>
</Criteria>
</ReportHead>
<ReportBody>
<ReportSection name="UpTime" category="rule">
<ReportSection name="NodeName.domain.net" category="node">
<String name="node">NodeName.domain.net</String>
<String name="typeName">Windows Server</String>
<OID>-1y2p0ij32e8c8:-1y2p0idhghwg6</OID>
<ReportSection name="UpTime" category="element">
<ReportSection name="2015-09-20 18:50:10.0" category="version">
<String name="version">UpTime</String>
<OID>-1y2p0ij32e8cj:-1y2p0ibspofhp</OID>
<Integer name="changeType">2</Integer>
<String name="changeTypeName">Modified</String>
<Timestamp name="changeTime" displayvalue="9/20/15 6:50 PM">1442793010000</Timestamp>
<ReportSection name="versionContent" category="versionContent">
<String name="content">12 day(s), 7 hour(s), 33 minute(s), 8 second(s)</String>
<String name="content"></String>
</ReportSection>
</ReportSection>
</ReportSection>
</ReportSection>
</ReportSection>
</ReportBody>
</ReportSection>
The idea would be to locate the content node, extract the number of days from it, check the value if needed, and then locate the node name. Example (using lxml.etree):
import re
from lxml import etree
pattern = re.compile(r"^(\d+) day\(s\)")
data = """your XML here"""
tree = etree.fromstring(data)
content = tree.findtext(".//String[@name='content']")
if content:
    match = pattern.search(content)
    if match:
        days = int(match.group(1))
        # TODO: check the days if needed
        node = tree.findtext(".//String[@name='node']")
        print(node, days)
Prints:
NodeName.domain.net 12
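To cover a report with several nodes and apply the "more than 30 days" check, the same idea can be sketched with the standard library's xml.etree.ElementTree. The inline report below is a made-up, well-formed reduction of the sample with two hypothetical nodes (a.domain.net and b.domain.net); `long_running` and `THRESHOLD` are illustrative names.

```python
import re
import xml.etree.ElementTree as ET

# Made-up, well-formed reduction of the vendor report with two hypothetical nodes.
XML = """<ReportSection>
  <ReportBody>
    <ReportSection name="UpTime" category="rule">
      <ReportSection name="a.domain.net" category="node">
        <String name="node">a.domain.net</String>
        <ReportSection name="versionContent" category="versionContent">
          <String name="content">12 day(s), 7 hour(s)</String>
          <String name="content"></String>
        </ReportSection>
      </ReportSection>
      <ReportSection name="b.domain.net" category="node">
        <String name="node">b.domain.net</String>
        <ReportSection name="versionContent" category="versionContent">
          <String name="content">45 day(s), 1 hour(s)</String>
        </ReportSection>
      </ReportSection>
    </ReportSection>
  </ReportBody>
</ReportSection>"""

DAYS = re.compile(r"^(\d+) day\(s\)")
THRESHOLD = 30

def long_running(root, threshold=THRESHOLD):
    """Return (node name, days) pairs whose uptime exceeds the threshold."""
    results = []
    for section in root.iter("ReportSection"):
        if section.get("category") != "node":
            continue  # skip the rule, element and version wrappers
        name = section.findtext("String[@name='node']")
        for content in section.iterfind(".//String[@name='content']"):
            match = DAYS.search(content.text or "")  # empty tags have text None
            if match and int(match.group(1)) > threshold:
                results.append((name, int(match.group(1))))
    return results

root = ET.fromstring(XML)
print(long_running(root))
```

Filtering on the category attribute rather than on nesting depth keeps the search independent of how deeply the vendor nests the ReportSection elements.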

Using XPath in ElementTree

My XML file looks like the following:
<?xml version="1.0"?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
<Items>
<Item>
<ItemAttributes>
<ListPrice>
<Amount>2260</Amount>
</ListPrice>
</ItemAttributes>
<Offers>
<Offer>
<OfferListing>
<Price>
<Amount>1853</Amount>
</Price>
</OfferListing>
</Offer>
</Offers>
</Item>
</Items>
</ItemSearchResponse>
All I want to do is extract the ListPrice.
This is the code I am using:
>> from elementtree import ElementTree as ET
>> fp = open("output.xml","r")
>> element = ET.parse(fp).getroot()
>> e = element.findall('ItemSearchResponse/Items/Item/ItemAttributes/ListPrice/Amount')
>> for i in e:
>>     print i.text
>>
>> e
>>
Absolutely no output. I also tried
>> e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')
No difference.
What am I doing wrong?
You have two problems:
1) element contains only the root element, not recursively the whole document. It is of type Element not ElementTree.
2) Your search string needs to use namespaces if you keep the namespace in the XML.
To fix problem #1:
You need to change:
element = ET.parse(fp).getroot()
to:
element = ET.parse(fp)
To fix problem #2:
You can remove the xmlns declaration from the XML document so it looks like this:
<?xml version="1.0"?>
<ItemSearchResponse>
<Items>
<Item>
<ItemAttributes>
<ListPrice>
<Amount>2260</Amount>
</ListPrice>
</ItemAttributes>
<Offers>
<Offer>
<OfferListing>
<Price>
<Amount>1853</Amount>
</Price>
</OfferListing>
</Offer>
</Offers>
</Item>
</Items>
</ItemSearchResponse>
With this document you can use the following search string:
e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')
The full code:
from xml.etree import ElementTree as ET
fp = open("output.xml","r")
element = ET.parse(fp)
e = element.findall('Items/Item/ItemAttributes/ListPrice/Amount')
for i in e:
    print(i.text)
Alternate fix to problem #2:
Otherwise, you need to specify the xmlns inside the search string for each element.
The full code:
from xml.etree import ElementTree as ET
fp = open("output.xml","r")
element = ET.parse(fp)
namespace = "{http://webservices.amazon.com/AWSECommerceService/2008-08-19}"
e = element.findall('{0}Items/{0}Item/{0}ItemAttributes/{0}ListPrice/{0}Amount'.format(namespace))
for i in e:
    print(i.text)
Both print:
2260
from xml.etree import ElementTree as ET
tree = ET.parse("output.xml")
namespace = tree.getroot().tag[1:].split("}")[0]
amount = tree.find(".//{%s}Amount" % namespace).text
Also, consider using lxml, which is typically faster:
from lxml import etree as ET
ElementTree is namespace-aware, so every element in your XML has a fully qualified name like
{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Items
So the search has to include the namespace,
e.g.
search = '{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Items/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Item/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}ItemAttributes/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}ListPrice/{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Amount'
element.findall( search )
returns a list containing the Amount element whose text is 2260
I ended up stripping out the xmlns from the raw XML like this:
import re
def strip_ns(xml_string):
    return re.sub('xmlns="[^"]+"', '', xml_string)
Obviously be very careful with this, but it worked well for me.
One of the most straightforward approaches, which also works on Python 3, is shown below.
It starts at the root and searches downward until it reaches the first
namespaced "Amount" tag:
from xml.etree import ElementTree as ET
tree = ET.parse('output.xml')
root = tree.getroot()
#print(root)
e = root.find(".//{http://webservices.amazon.com/AWSECommerceService/2008-08-19}Amount")
print(e.text)
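Note that `.//...Amount` returns the first Amount in document order, which happens to be the ListPrice one here. A slightly stricter variant, sketched below on an inline copy of the document, names the ListPrice parent and passes a prefix mapping instead of repeating the URI; the `aws` prefix and the `NS` name are arbitrary choices for illustration.

```python
from xml.etree import ElementTree as ET

# Compact copy of the document from the question, inlined so the sketch is self-contained.
XML = """<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2008-08-19">
<Items><Item>
<ItemAttributes><ListPrice><Amount>2260</Amount></ListPrice></ItemAttributes>
<Offers><Offer><OfferListing><Price><Amount>1853</Amount></Price></OfferListing></Offer></Offers>
</Item></Items>
</ItemSearchResponse>"""

# Local alias for the default namespace declared on the root element.
NS = {"aws": "http://webservices.amazon.com/AWSECommerceService/2008-08-19"}

root = ET.fromstring(XML)

# Only Amount elements that sit directly under a ListPrice element.
list_prices = [a.text for a in root.findall(".//aws:ListPrice/aws:Amount", NS)]

# Every Amount element in the document, for comparison.
all_amounts = [a.text for a in root.iter("{%s}Amount" % NS["aws"])]

print(list_prices)
print(all_amounts)
```

Anchoring the search on ListPrice keeps the result correct even if the offer price were to appear earlier in the document.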
