Parsing nested attributes - python

Good day, dear developers.
I can't fully parse an XML file.
The structure looks like this:
<foo>
<bar1 id="1">
<bar2>
<foobar id="2">name1</foobar>
<foobar id="3">name2</foobar>
</bar2>
</bar1>
</foo>
I am using the xml.etree library, with code like:
source.get('Id')
which gives me the first attribute.
To get a nested tag, I use code like:
source.find('bar/foobar').text
The question is: how do I get the next nested attributes (id=2 and id=3)?
It shows an error when I try to use a path with a slash:
source.get('bar/id')
Other attempts give me just the first attribute, which I already have. The second nested attribute also has the same name, id.
Thank you in advance for the help.

Below is a working example
import xml.etree.ElementTree as ET
xml = '''<foo>
<bar1 id="1">
<bar2>
<foobar id="2">name1</foobar>
<foobar id="3">name2</foobar>
</bar2>
</bar1>
</foo>'''
root = ET.fromstring(xml)
ids = [f.attrib.get('id') for f in root.findall('.//foobar')]
print(ids)
Output:
['2', '3']

You need to specify a working XPath expression, like:
foobars = source.findall('bar1/bar2/foobar')
for elem in foobars:
    print(elem.get('id'))
Output:
2
3

It works now for a single bar1, but what if there are several bar1 elements, like this?
<foo>
<bar1 id="1">
<bar2>
<foobar id="2">name1</foobar>
<foobar id="3">name2</foobar>
</bar2>
</bar1>
<bar1 id="2">
<bar2>
<foobar id="2">name3</foobar>
<foobar id="3">name4</foobar>
</bar2>
</bar1>
</foo>
The loop (findall followed by for) prints all four ids, but I need just the two that belong to each bar1 row.
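One way to approach this follow-up is to loop over the bar1 elements first and then search relative to each one. The sketch below assumes the two-bar1 document shown above; the name ids_per_bar1 is made up for illustration.

```python
import xml.etree.ElementTree as ET

xml = '''<foo>
<bar1 id="1">
<bar2>
<foobar id="2">name1</foobar>
<foobar id="3">name2</foobar>
</bar2>
</bar1>
<bar1 id="2">
<bar2>
<foobar id="2">name3</foobar>
<foobar id="3">name4</foobar>
</bar2>
</bar1>
</foo>'''

root = ET.fromstring(xml)

# Map each bar1 id to the ids of the foobar elements nested inside it.
ids_per_bar1 = {}
for bar1 in root.findall('bar1'):
    # The path is relative to the current bar1, so foobar elements
    # under other bar1 rows are not included.
    ids_per_bar1[bar1.get('id')] = [f.get('id') for f in bar1.findall('bar2/foobar')]

print(ids_per_bar1)
```

Because findall is called on each bar1 rather than on the root, every row only sees its own two foobar elements.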

Related

XML accessing elements within the tree with Etree python

I'm trying to access information within a XML file via python Etree. The XML looks like this:
<events-data>
<dossier-event event-type="new" id="EVT_4573534">
<event-date>
<date>20220816</date>
</event-date>
<event-code>EPIDOSNWIAI</event-code>
<event-text event-text-type="DESCRIPTION">text</event-text>
</dossier-event>
</events-data>
<events-data>
<dossier-event event-type="new" id="EVT_4573535">
<event-date>
<date>20220402</date>
</event-date>
<event-code>EPIDOS PCT</event-code>
<event-text event-text-type="DESCRIPTION">text1</event-text>
</dossier-event>
</events-data>
I want to access the <date>20220402</date> element and retrieve the date, so 20220402. My attempt looks like this:
root_events = ET.fromstring(response_events.content)
for element in root_events.iter('{http://myapi/register}date'):
    print(element.text)
The problem: there is an unknown number of <date>[date]</date> elements before and after this date that are not within <events-data> or <event-date>. But if I try to list all tags, attributes, or text of <event-date>, the result is empty. Can someone explain how I can access only the dates within something like
<event-date>
<date>20220402</date>
</event-date>

If you fix your XML to have a root tag, an XPath query might work for you:
import xml.etree.ElementTree as ET
for event_date in ET.parse("sample.xml").getroot().findall(".//events-data/dossier-event/event-date/date"):
    print(event_date.text)
This produces the following output:
$ python sample.py
20220816
20220402
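If the real response is namespaced, as the {http://myapi/register} prefix in the question suggests, the path needs the namespace too. The following is a minimal sketch under that assumption: the namespace URI is taken from the question's iter() call, while the wrapping root element and the stray top-level date are hypothetical stand-ins for the parts of the response that were not shown.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI, copied from the question's iter() call.
NS = {'r': 'http://myapi/register'}

# Hypothetical document: a root wrapper, one stray <date> outside
# <event-date>, and one <date> inside it.
xml = '''<root xmlns="http://myapi/register">
<date>20220101</date>
<events-data>
<dossier-event event-type="new" id="EVT_4573535">
<event-date>
<date>20220402</date>
</event-date>
</dossier-event>
</events-data>
</root>'''

root = ET.fromstring(xml)

# Only <date> elements that are direct children of <event-date> match.
event_dates = [d.text for d in root.findall('.//r:event-date/r:date', NS)]
print(event_dates)
```

Restricting the path to event-date/date is what keeps the unrelated date elements out of the result.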

Is there a way to filter xml by an id attribute in elementtree using python

I am new to Python, and ElementTree in particular, and am trying to filter some information from an XML file. I have managed to locate the information I need for every TX id (coordinates); however, I want to filter this to just TX id "TxA". I've included a section of the XML file and the code below, with some comments to help show the problem. Any help or guidance is greatly appreciated.
All lists were previously set up (hence the appending).
The commented sections do work for all TX ids; however, I am now going back to try to filter.
subelm.attrib gives the TX id. I've shown two attempts I've made in the code on lines 5-8.
Part of the XML:
<TX id="TxA">
<Tx_WGS84_Longitude>-105.0846057</Tx_WGS84_Longitude>
<Tx_WGS84_Latitude>42.9565772</Tx_WGS84_Latitude>
<Tx_Easting>678133.8818</Tx_Easting>
<Tx_Northing>895120.939</Tx_Northing>
<Channel id="3">
<Ant_Config> =</Ant_Config>
<Ant_name>Tx_ant_name1</Ant_name>
<Ant_Electrode_1>TxAEL1<Ant_Easting>678135.1069</Ant_Easting><Ant_Northing>895248.2057</Ant_Northing></Ant_Electrode_1>
<Ant_Electrode_2>TxAEL2<Ant_Easting>678137.0213</Ant_Easting><Ant_Northing>891059.2502</Ant_Northing></Ant_Electrode_2>
</Channel>
<Channel id="1">
<Ant_Config> =</Ant_Config>
<Ant_name>Tx_ant_name2</Ant_name>
<Ant_Electrode_1>TxAEL1<Ant_Easting>678135.1069</Ant_Easting><Ant_Northing>895248.2057</Ant_Northing></Ant_Electrode_1>
<Ant_Electrode_2>TxAEL2<Ant_Easting>678137.0213</Ant_Easting><Ant_Northing>891059.2502</Ant_Northing></Ant_Electrode_2>
</Channel>
</TX>
<TX id="TxB">
<Tx_WGS84_Longitude>-105.08459550832</Tx_WGS84_Longitude>
<Tx_WGS84_Latitude>42.9506068474998</Tx_WGS84_Latitude>
<Tx_Easting>678135.4896</Tx_Easting>
<Tx_Northing>893006.9206</Tx_Northing>
<Channel id="3">
<Ant_Config> =</Ant_Config>
<Ant_name>Tx_ant_name1</Ant_name>
<Ant_Electrode_1>TxBEL1<Ant_Easting>678135.6131</Ant_Easting><Ant_Northing>893055.2569</Ant_Northing></Ant_Electrode_1>
<Ant_Electrode_2>TxBEL2<Ant_Easting>678138.3127</Ant_Easting><Ant_Northing>888854.3852</Ant_Northing></Ant_Electrode_2>
</Channel>
<Channel id="1">
<Ant_Config> =</Ant_Config>
<Ant_name>Tx_ant_name2</Ant_name>
<Ant_Electrode_1>TxBEL1<Ant_Easting>678135.6131</Ant_Easting><Ant_Northing>893055.2569</Ant_Northing></Ant_Electrode_1>
<Ant_Electrode_2>TxBEL2<Ant_Easting>678138.3127</Ant_Easting><Ant_Northing>888854.3852</Ant_Northing></Ant_Electrode_2>
</Channel>
Part of the Python code:
for child in root:
    if child.tag=='Layout':
        for subelm in child:
            if subelm.tag=='TX':
                for name in subelm.iter('TxA'):
                    print (subelm.attrib)
                if ('id' in subelm.attrib.text):
                    print (subelm.attrib.text)
                for channel in subelm:
                    for electrode in channel:
                        for electrode1 in electrode.iter('Ant_Electrode_1'):
                            for electrode1 in electrode.iter('Ant_Easting'):
                                x1t.append(electrode1.text)
                            for electrode1 in electrode.iter('Ant_Northing'):
                                y1t.append(electrode1.text)
                        for electrode2 in electrode.iter('Ant_Electrode_2'):
                            for electrode2 in electrode.iter('Ant_Easting'):
                                x2t.append(electrode2.text)
                            for electrode2 in electrode.iter('Ant_Northing'):
                                y2t.append(electrode2.text)

You could just use an XPath expression:
>>> from lxml import etree
>>> with open('data.xml') as fd:
...     doc = etree.parse(fd)
...
>>> matches = doc.xpath('//TX[@id="TxA"]')
>>> matches
[<Element TX at 0x7fbfdbf50370>]
>>> matches[0].find('Tx_WGS84_Longitude').text
'-105.0846057'
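The same attribute filter is also available in the standard library, since xml.etree.ElementTree supports the [@attrib='value'] predicate in its limited XPath subset. Here is a rough sketch of collecting the electrode coordinates for TxA only. It assumes a trimmed copy of the XML from the question with Layout as the root element, and it reuses the question's list names.

```python
import xml.etree.ElementTree as ET

# Trimmed sample based on the question; Layout is assumed to be the root here.
xml = '''<Layout>
<TX id="TxA">
<Channel id="3">
<Ant_Electrode_1>TxAEL1<Ant_Easting>678135.1069</Ant_Easting><Ant_Northing>895248.2057</Ant_Northing></Ant_Electrode_1>
<Ant_Electrode_2>TxAEL2<Ant_Easting>678137.0213</Ant_Easting><Ant_Northing>891059.2502</Ant_Northing></Ant_Electrode_2>
</Channel>
</TX>
<TX id="TxB">
<Channel id="3">
<Ant_Electrode_1>TxBEL1<Ant_Easting>678135.6131</Ant_Easting><Ant_Northing>893055.2569</Ant_Northing></Ant_Electrode_1>
<Ant_Electrode_2>TxBEL2<Ant_Easting>678138.3127</Ant_Easting><Ant_Northing>888854.3852</Ant_Northing></Ant_Electrode_2>
</Channel>
</TX>
</Layout>'''

layout = ET.fromstring(xml)
x1t, y1t, x2t, y2t = [], [], [], []

# The predicate keeps only TX elements whose id attribute equals "TxA".
for tx in layout.findall("TX[@id='TxA']"):
    for channel in tx.findall('Channel'):
        # findtext returns the text of the first match, or None if absent.
        x1t.append(channel.findtext('Ant_Electrode_1/Ant_Easting'))
        y1t.append(channel.findtext('Ant_Electrode_1/Ant_Northing'))
        x2t.append(channel.findtext('Ant_Electrode_2/Ant_Easting'))
        y2t.append(channel.findtext('Ant_Electrode_2/Ant_Northing'))

print(x1t, y1t, x2t, y2t)
```

If Layout is a child of the real root rather than the root itself, the path would need to start with Layout/ or .// instead.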

Parsing subchilds in XML with ElementTree

I'm trying to extract information from an XML document with ElementTree in Python 3.2.
The XML looks like this:
<Page Id="1">
<Group>4</Group>
<Type>
<Letter>B</Letter>
<Number>101</Number>
<Deep>
<A>900</A>
<B>900</B>
</Deep>
</Type>
</Page>
I manage to get the element data from "Group" with:
for Page in root.iter('Page'):
    Group = Page.find('Group').text
And the "Letter" data with:
for Type in root.iter('Type'):
    Dim = Type.find('Letter').text
However, I can't figure out how to get the data from the subchildren of "Deep" (A and B).
All help is greatly appreciated!

You are very close. Use find to locate the Deep tag and then iterate over it.
Ex:
import xml.etree.ElementTree as ET
tree = ET.parse(filename)
root = tree.getroot()
for Type in root.iter('Type'):
    for deep_tag in Type.find("Deep"):
        print(deep_tag.text)
Output:
900
900
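If the tag names matter as well as the values, the children of Deep can be collected into a dictionary. This is a small sketch using the Page XML from the question as an inline string; the name deep_values is only illustrative.

```python
import xml.etree.ElementTree as ET

xml = '''<Page Id="1">
<Group>4</Group>
<Type>
<Letter>B</Letter>
<Number>101</Number>
<Deep>
<A>900</A>
<B>900</B>
</Deep>
</Type>
</Page>'''

page = ET.fromstring(xml)

# find returns None when the path does not match, so guard before iterating.
deep = page.find('Type/Deep')
deep_values = {child.tag: child.text for child in deep} if deep is not None else {}

print(deep_values)
```

The None check avoids a TypeError on pages that have no Deep element.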

is there a way to get the attribute text directly in a xml without traversing through children in elementree in python

I am using the Python module xml.etree.ElementTree for parsing XML files.
I am curious to know if there is a way to directly find an attribute that is nested deeply.
For example, if I want to get the name attribute of neighbor (if it exists),
I need to traverse through country/rank/year/gdppc when my root is data. Is there a quick way to look up that attribute?
<data>
<country name="Liechtenstein">
<rank>
<year>
<gdppc>
<neighbor name="Austria" direction="E"/>
</gdppc>
</year>
</rank>
</country>
</data>
EDIT:
I tried something along these lines, but it did not help. I am not sure whether I should be using resp.content for the XML retrieved:
resp=requests.get(url_fetch,params=query)
with open(resp.content) as fd:
    doc = ElementTree.parse(fd)
    name = doc.find('PubmedArticle//Volume').text
    print(name)
here is the xml:

Depending on what your data looks like and exactly what you're trying to accomplish, you could do something like this:
from xml.etree import ElementTree
with open('data.xml') as fd:
    doc = ElementTree.parse(fd)
    name = doc.find('country[@name="Liechtenstein"]//neighbor').get('name')
    print(name)
Given the input above, this would yield:
Austria
If you're parsing XML with Python, you may want to look at the lxml module, which has full support for XPath queries.
This works for me with the URL you gave above:
#!/usr/bin/python
import requests
from xml.etree import ElementTree
res = requests.get('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=24059499&retmode=xml')
doc = ElementTree.fromstring(res.content)
ele = doc.find('.//PubmedArticle//Volume')
print(ele.text)
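Since the question asks about an attribute that may or may not exist, it can help to wrap the lookup so a missing element does not raise an AttributeError. The following is a sketch only, using the data XML from the question; nested_attr is a hypothetical helper, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

xml = '''<data>
<country name="Liechtenstein">
<rank>
<year>
<gdppc>
<neighbor name="Austria" direction="E"/>
</gdppc>
</year>
</rank>
</country>
</data>'''

root = ET.fromstring(xml)


def nested_attr(element, path, attr, default=None):
    """Return attribute `attr` of the first match for `path`, or `default`."""
    match = element.find(path)
    return default if match is None else match.get(attr, default)


# './/' searches all descendants, so the intermediate tags need not be listed.
print(nested_attr(root, './/neighbor', 'name'))
# 'capital' does not occur in the sample, so the default is returned.
print(nested_attr(root, './/capital', 'name'))
```

The first call prints Austria and the second prints None, because find returns None when nothing matches.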

Convert XML to python objects using lxml

I'm trying to use the lxml library to parse an XML file. What I want is to use XML as the data source, but still maintain the normal Django way of interacting with the resulting objects. From the docs, I can see that lxml.objectify is what I'm supposed to use, but I don't know how to proceed after: list = objectify.parse('myfile.xml')
Any help will be very much appreciated. Thanks.
A sample of the file (has about 100+ records) is this:
<store>
<book>
<publisher>Hodder &...</publisher>
<isbn>345123890</isbn>
<author>King</author>
<comments>
<comment rank='1'>Interesting</comment>
</comments>
<pages>200</pages>
</book>
<book>
<publisher>Penguin Books</publisher>
<isbn>9011238XX</isbn>
<author>Armstrong</author>
<comments />
<pages>150</pages>
</book>
</store>
From this, I want to do the following (something just as easy to write as Books.objects.all() and Books.object.get_object_or_404(isbn=selected) is most preferred ):
Display a list of all books with their respective attributes
Enable viewing of further details of a book by selecting it from the list

Firstly, "list" isn't a very good variable name because it shadows the built-in type "list".
Now, say you have this xml:
<root>
<node1 val="foo">derp</node1>
<node2 val="bar" />
</root>
Now, you could do this (objectify.parse returns a tree, so call getroot() first):
from lxml import objectify
root = objectify.parse("myfile.xml").getroot()
print(root.node1.get("val"))  # prints "foo"
print(root.node1.text)  # prints "derp"
print(root.node2.get("val"))  # prints "bar"
Another tip: when you have lots of nodes with the same name, you can loop over them.
>>> xml = """<root>
... <node val="foo">derp</node>
... <node val="bar" />
... </root>"""
>>> root = objectify.fromstring(xml)
>>> for node in root.node:
...     print(node.get("val"))
...
foo
bar
Edit
You should be able to simply set your Django context to the books object, and use that from your templates.
context = dict(
    books=root.book,
    # other stuff
)
And then you'll be able to iterate through the books in the template, and access each book object's attributes.
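For the two stated goals (list all books, then look one up by ISBN), plain Python objects may be enough. The sketch below deliberately swaps lxml.objectify for the standard library's xml.etree.ElementTree plus a dataclass, so it is an alternative approach rather than the objectify API. Book, load_books and get_by_isbn are hypothetical names, and the first book's publisher is a placeholder because the original sample elides it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Sample modeled on the question; "Example Publisher" is a placeholder.
xml = '''<store>
<book>
<publisher>Example Publisher</publisher>
<isbn>345123890</isbn>
<author>King</author>
<pages>200</pages>
</book>
<book>
<publisher>Penguin Books</publisher>
<isbn>9011238XX</isbn>
<author>Armstrong</author>
<pages>150</pages>
</book>
</store>'''


@dataclass
class Book:
    publisher: str
    isbn: str
    author: str
    pages: int


def load_books(xml_text):
    """Build one Book per <book> element (roughly Books.objects.all())."""
    store = ET.fromstring(xml_text)
    return [
        Book(
            publisher=b.findtext('publisher', default=''),
            isbn=b.findtext('isbn', default=''),
            author=b.findtext('author', default=''),
            pages=int(b.findtext('pages', default='0')),
        )
        for b in store.findall('book')
    ]


def get_by_isbn(books, isbn):
    """Return the first book with a matching ISBN, or None if absent."""
    return next((b for b in books if b.isbn == isbn), None)


books = load_books(xml)
for book in books:
    print(book.author, book.isbn, book.pages)
print(get_by_isbn(books, '9011238XX'))
```

In a Django view, a None result from get_by_isbn could be turned into an Http404, which would roughly mirror get_object_or_404.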
