How to get the child of child using Python's ElementTree

I'm building a Python file that communicates with a PLC. When compiling, the PLC creates an XML file that delivers important information about the program. The XML looks more or less like this:
<visu>
<time>12:34</time>
<name>my_visu</name>
<language>english</language>
<vars>
<var name="input1">2</var>
<var name="input2">45.6</var>
<var name="input3">"hello"</var>
</vars>
</visu>
The important part is found under the child "vars". Using Python, I want to write a script that, when given the argument "input2", prints "45.6".
So far I can read all children of "visu", but I don't know how to tell Python to search among "the children of a child". Here is what I have so far:
import xml.etree.ElementTree as ET
tree = ET.parse("file.xml")
root = tree.getroot()
for child in root:
    if child.tag == "vars":
        .......
        if ( "childchild".attrib.get("name") == "input2" ):
            print "childchild".text
Any ideas how I can complete the script? (or maybe a more efficient way of programming it?)

You'd be better off using an XPath search here:
name = 'input2'
value = root.find('.//vars/var[@name="{}"]'.format(name)).text
This searches for a <var> tag directly below a <vars> tag whose name attribute is equal to the value of the Python variable name, then retrieves the text of that tag.
Demo:
>>> from xml.etree import ElementTree as ET
>>> sample = '''\
... <visu>
... <time>12:34</time>
... <name>my_visu</name>
... <language>english</language>
... <vars>
... <var name="input1">2</var>
... <var name="input2">45.6</var>
... <var name="input3">"hello"</var>
... </vars>
... </visu>
... '''
>>> root = ET.fromstring(sample)
>>> name = 'input2'
>>> root.find('.//vars/var[@name="{}"]'.format(name)).text
'45.6'
You can also do this the hard way and loop over the elements manually; each element can be iterated over directly to reach its children:
name = 'input2'
for elem in root:
    if elem.tag == 'vars':
        for var in elem:
            if var.attrib.get('name') == name:
                print(var.text)
but using element.find() or element.findall() is probably going to be easier and more concise.
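As a rough sketch, the XPath lookup above could be wrapped in a small function that returns None when no matching <var> exists, instead of failing with an AttributeError on the missing element. The function name get_var and the None fallback are assumptions made for illustration; they are not part of the question:

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<visu>
  <time>12:34</time>
  <name>my_visu</name>
  <language>english</language>
  <vars>
    <var name="input1">2</var>
    <var name="input2">45.6</var>
    <var name="input3">"hello"</var>
  </vars>
</visu>
"""


def get_var(root, name):
    """Return the text of <vars>/<var name=...>, or None if it is absent.

    Hypothetical helper; `name` is assumed not to contain a single quote.
    """
    elem = root.find(".//vars/var[@name='{}']".format(name))
    return elem.text if elem is not None else None


root = ET.fromstring(SAMPLE)
print(get_var(root, "input2"))   # 45.6
print(get_var(root, "missing"))  # None
```

For a real file, ET.parse("file.xml").getroot() would take the place of ET.fromstring(SAMPLE).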

Related

Is there something wrong with my script or the XML file? I am using ElementTree in attempt to get out child attributes

This is a shortened version of the XML file that I am trying to parse:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<TipsContents xmlns="http://www.avendasys.com/tipsapiDefs/1.0">
<TipsHeader exportTime="Mon May 04 20:05:47 SAST 2020" version="6.8"/>
<Endpoints>
<Endpoint macVendor="SHENZHEN RF-LINK TECHNOLOGY CO.,LTD." macAddress="c46e7b2939cb" status="Known">
<EndpointProfile updatedAt="May 04, 2020 10:02:21 SAST" profiledBy="Policy Manager" addedAt="Mar 04, 2020 17:31:53 SAST" fingerprint="{}" conflict="false" name="Windows" family="Windows" category="Computer" staticIP="false" ipAddress="xxx.xxx.xxx.xxx"/>
<EndpointTags tagName="Username" tagValue="xxxxxxxx"/>
<EndpointTags tagName="Disabled Reason" tagValue="IS_ACTIVE"/>
</Endpoint>
</Endpoints>
<TagDictionaries>
<TagDictionary allowMultiple="false" mandatory="true" defaultValue="false" dataType="Boolean" attributeName="DOMAIN-MACHINES" entityName="Endpoint"/>
<TagDictionary allowMultiple="false" mandatory="true" defaultValue="true" dataType="Boolean" attributeName="IS_ACTIVE" entityName="Endpoint"/>
<TagDictionary allowMultiple="true" mandatory="false" dataType="String" attributeName="Disabled Reason" entityName="Endpoint"/>
<TagDictionary allowMultiple="false" mandatory="false" dataType="String" attributeName="Username" entityName="Endpoint"/>
</TagDictionaries>
</TipsContents>
I run the following script:
import xml.etree.ElementTree as ET
f = open("Endpoint-5.xml", 'r')
tree = ET.parse(f)
root = tree.getroot()
This is what my outputs look like:
In [8]: root = tree.getroot()
In [9]: root.findall('.')
Out[9]: [<Element '{http://www.avendasys.com/tipsapiDefs/1.0}TipsContents' at 0x10874b410>]
In [10]: root.findall('./TipsHeader')
Out[10]: []
In [11]: root.findall('./TipsContents')
Out[11]: []
In [15]: root.findall('{http://www.avendasys.com/tipsapiDefs/1.0}TipsContents//Endpoints/Endpoint/EndpointProfile')
Out[15]: []
I have been following this: https://docs.python.org/3/library/xml.etree.elementtree.html#example
among other tutorials but I don't seem to get an output.
I have also tried from lxml import html.
My script is as follows:
tree = html.fromstring(html=f)
updatedAt = tree.xpath("//TipsContents/Endpoints/Endpoint/EndpointProfile/@updatedAt")
name = tree.xpath("//TipsContents/Endpoints/Endpoint/EndpointProfile/@name")
category = tree.xpath("//TipsContents/Endpoints/Endpoint/EndpointProfile/@category")
tagValue = tree.xpath("//TipsContents/Endpoints/Endpoint/EndpointTags[@tagName = 'Username']/@tagValue")
active = tree.xpath("//TipsContents/Endpoints/Endpoint/EndpointTags[@tagName = 'Disabled Reason']/@tagValue")
print("Name:",name)
The above attempt also returns nothing.
I am able to parse an XML document from an API and use the second attempt successfully, but when I do this with a local file I do not get any results.
Any assistance will be appreciated.

Note that your input XML contains a default namespace, so to refer to
any element you have to specify the namespace.
One way to do this is to define a dictionary of namespaces
(prefix: full URI), in your case:
ns = {'tips': 'http://www.avendasys.com/tipsapiDefs/1.0'}
Then, when calling findall:
- put the appropriate prefix (followed by ':') before each element name,
- pass the namespace dictionary as the second argument.
The code to do it is:
for elem in root.findall('./tips:TipsHeader', ns):
    print(elem.attrib)
The result, for your input sample, is:
{'exportTime': 'Mon May 04 20:05:47 SAST 2020', 'version': '6.8'}
As far as root.findall('./TipsContents') is concerned, it will return
an empty list, even if you specify the namespace as above.
The reason is that TipsContents is the name of the root node,
whereas you attempt to find an element with the same name below in
the XML tree, but it contains no such element.
If you want to access attributes of the root element, you can run:
print(root.attrib)
but to get something more than an empty dictionary, you have to add
some attributes to the root element (namespace is not an attribute).
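The same namespace dictionary should also work for the nested lookups from the question. The following is only a sketch on a trimmed copy of the sample; the variable names and the choice to collect tuples in a list called rows are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from the question.
SAMPLE = """\
<TipsContents xmlns="http://www.avendasys.com/tipsapiDefs/1.0">
  <TipsHeader exportTime="Mon May 04 20:05:47 SAST 2020" version="6.8"/>
  <Endpoints>
    <Endpoint macAddress="c46e7b2939cb" status="Known">
      <EndpointProfile name="Windows" category="Computer"/>
      <EndpointTags tagName="Username" tagValue="xxxxxxxx"/>
      <EndpointTags tagName="Disabled Reason" tagValue="IS_ACTIVE"/>
    </Endpoint>
  </Endpoints>
</TipsContents>
"""

ns = {'tips': 'http://www.avendasys.com/tipsapiDefs/1.0'}
root = ET.fromstring(SAMPLE)

rows = []
# Every element step in the path needs the prefix, not just the first one.
for endpoint in root.findall('./tips:Endpoints/tips:Endpoint', ns):
    profile = endpoint.find('tips:EndpointProfile', ns)
    # Unprefixed attributes are not in the default namespace: no prefix here.
    user = endpoint.find("tips:EndpointTags[@tagName='Username']", ns)
    rows.append((profile.get('name'), profile.get('category'),
                 user.get('tagValue')))

print(rows)
```

Note that the path starts below the root element, since root already is TipsContents.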

XML reading the last entry from the file in python

I want to read the last entry of the xml file and get its value. Here is my xml file
<TestSuite>
<TestCase>
<name>tcname1</name>
<total>1</total>
<totalpass>0</totalpass>
<totalfail>0</totalfail>
<totalerror>1</totalerror>
</TestCase>
<TestCase>
<name>tcname2</name>
<total>1</total>
<totalpass>0</totalpass>
<totalfail>0</totalfail>
<totalerror>1</totalerror>
</TestCase>
</TestSuite>
I want to get the <total>, <totalpass>, <totalfail> and <totalerror> values from the last <TestCase> element in the file. I have tried this code to do that:
import xmltodict
with open(filename) as fd:
    doc = xmltodict.parse(fd.read())
length=len(doc['TestSuite']['TestCase'])
tp=doc['TestSuite']['TestCase'][length-1]['totalpass']
tf=doc['TestSuite']['TestCase'][length-1]['totalfail']
te=doc['TestSuite']['TestCase'][length-1]['totalerror']
total=doc['TestSuite']['TestCase'][length-1]['total']
This works for XML files with two or more TestCase tags, but fails with this error for a file with only one TestCase tag:
Traceback (most recent call last):
File "HTMLReportGenerationFromXML.py", line 52, in <module>
tp=doc['TestSuite']['TestCase'][length-1]['totalpass']
KeyError: 4
It seems that instead of the number of TestCase elements, it is taking the number of subtags as the length. Please help me resolve this issue.

Since you only want the last one, you can use a negative index to retrieve it:
import xml.etree.ElementTree as et
tree = et.parse('test.xml')
# collect all the test cases
test_cases = tree.findall('TestCase')
# pull data from the last one
last = test_cases[-1]
total = last.find('total').text
totalpass = last.find('totalpass').text
totalfail = last.find('totalfail').text
totalerror = last.find('totalerror').text
print(total, totalpass, totalfail, totalerror)

Why didn't I do this in the first place! Use XPath.
The first example involves processing the XML file with just one TestCase element, the second with two of them. The key point is to use the XPath last() function.
>>> from lxml import etree
>>> tree = etree.parse('temp.xml')
>>> last_TestCase = tree.xpath('.//TestCase[last()]')[0]
>>> for child in last_TestCase.iterchildren():
...     child.tag, child.text
...
('name', 'tcname2')
('total', '1')
('totalpass', '0')
('totalfail', '0')
('totalerror', '1')
>>>
>>> tree = etree.parse('temp_2.xml')
>>> last_TestCase = tree.xpath('.//TestCase[last()]')[0]
>>> for child in last_TestCase.iterchildren():
...     child.tag, child.text
...
('name', 'tcname1')
('reason', 'reason')
('total', '2')
('totalpass', '0')
('totalfail', '0')
('totalerror', '2')

I have tried this, and it works for me:
import xml.etree.ElementTree as ET
tree = ET.parse('temp.xml')
root = tree.getroot()
print(root)
total = []
totalpass = []
totalfail = []
totalerror = []
# collect the values of every TestCase, in document order
for test in root.findall('TestCase'):
    total.append(test.find('total').text)
    totalpass.append(test.find('totalpass').text)
    totalfail.append(test.find('totalfail').text)
    totalerror.append(test.find('totalerror').text)
# the last entry of each list belongs to the last TestCase
print(total[-1], totalpass[-1], totalfail[-1], totalerror[-1])

The reason for your error is that with xmltodict, doc['TestSuite']['TestCase'] is a list only when the XML contains more than one TestCase element:
>>> type(doc2['TestSuite']['TestCase']) # here doc2 comes from an XML file with more than one entry
list
but it is a kind of dictionary for a file with a single entry:
>>> type(doc['TestSuite']['TestCase']) # doc comes from a one-entry file
collections.OrderedDict
That's the reason. You could try to manage the issue in the following way:
import xmltodict
with open(filename) as fd:
    doc = xmltodict.parse(fd.read())
cases = doc['TestSuite']['TestCase']
if isinstance(cases, list):
    last = cases[-1]  # several entries: take the last one
else:
    last = cases  # a single entry is parsed as a dict, not a list
tp = last['totalpass']
tf = last['totalfail']
te = last['totalerror']
total = last['total']
Otherwise, you can use another library for the XML parsing.
...let me know if it helps!
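The list-or-dict check can also be pulled out into a tiny helper, so the rest of the code always sees a list. This is only a sketch: the helper names as_list and last_test_case are made up for illustration, and plain dicts stand in for the result of xmltodict.parse() so that it runs without the library or an XML file:

```python
def as_list(value):
    """Return value unchanged if it is a list, else wrap it in a list."""
    return value if isinstance(value, list) else [value]


def last_test_case(doc):
    """Return the last TestCase mapping from a parsed document."""
    return as_list(doc['TestSuite']['TestCase'])[-1]


# Plain dicts standing in for what xmltodict.parse() returns (an assumption
# made so the sketch is self-contained).
one_entry = {'TestSuite': {'TestCase': {
    'name': 'tcname1', 'total': '1', 'totalpass': '0',
    'totalfail': '0', 'totalerror': '1'}}}
two_entries = {'TestSuite': {'TestCase': [
    {'name': 'tcname1', 'total': '1', 'totalpass': '0',
     'totalfail': '0', 'totalerror': '1'},
    {'name': 'tcname2', 'total': '1', 'totalpass': '0',
     'totalfail': '0', 'totalerror': '1'}]}}

for doc in (one_entry, two_entries):
    last = last_test_case(doc)
    print(last['name'], last['total'], last['totalpass'],
          last['totalfail'], last['totalerror'])
```

The helper does not cover a TestSuite with no TestCase at all; that case would still raise a KeyError.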

Using "info.get" for a child element in Python / lxml

I'm trying to get the attribute of a child element in Python, using lxml.
This is the structure of the xml:
<GroupInformation groupId="crid://thing.com/654321" ordered="true">
<GroupType value="show" xsi:type="ProgramGroupTypeType"/>
<BasicDescription>
<Title type="main" xml:lang="EN">A programme</Title>
<RelatedMaterial>
<HowRelated href="urn:eventis:metadata:cs:HowRelatedCS:2010:boxCover">
<Name>Box cover</Name>
</HowRelated>
<MediaLocator>
<mpeg7:MediaUri>file://ftp.something.com/Images/123456.jpg</mpeg7:MediaUri>
</MediaLocator>
</RelatedMaterial>
</BasicDescription>
The code I've got is below. The bit I want to return is the 'value' attribute ("show" in the example) under 'grouptype' (third line from the bottom):
file_name = input('Enter the file name, including .xml extension: ')
print('Parsing ' + file_name)
from lxml import etree
parser = etree.XMLParser()
tree = etree.parse(file_name, parser)
root = tree.getroot()
nsmap = {'xmlns': 'urn:tva:metadata:2010','mpeg7':'urn:tva:mpeg7:2008'}
with open(file_name+'.log', 'w', encoding='utf-8') as f:
    for info in root.xpath('//xmlns:GroupInformation', namespaces=nsmap):
        crid = (info.get('groupId'))
        grouptype = info.find('.//xmlns:GroupType', namespaces=nsmap)
        gtype = grouptype.get('value')
        titlex = info.find('.//xmlns:BasicDescription/xmlns:Title', namespaces=nsmap)
        title = titlex.text if titlex is not None else 'Missing'
Can anyone explain to me how to implement it? I had a quick look at the xsi namespace, but was unable to get it to work (and didn't know if it was the right thing to do).

Is this what you are looking for?
grouptype.attrib['value']
PS: Why the parentheses around the assignment values? They look unnecessary.
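For comparison, here is a small sketch of get() next to attrib[...]. It uses the standard library's ElementTree rather than lxml, on the assumption that get() and attrib behave the same way in both for this purpose. The wrapper element named Root and the namespace declarations on it are invented so that the fragment parses on its own:

```python
import xml.etree.ElementTree as ET

# <Root> and its xmlns declarations are hypothetical; the question's snippet
# does not show where its namespaces are declared.
SAMPLE = """\
<Root xmlns="urn:tva:metadata:2010"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <GroupInformation groupId="crid://thing.com/654321" ordered="true">
    <GroupType value="show" xsi:type="ProgramGroupTypeType"/>
  </GroupInformation>
</Root>
"""

ns = {'tva': 'urn:tva:metadata:2010'}
XSI = 'http://www.w3.org/2001/XMLSchema-instance'

root = ET.fromstring(SAMPLE)
info = root.find('tva:GroupInformation', ns)
grouptype = info.find('tva:GroupType', ns)

gtype = grouptype.get('value')               # plain attribute
same = grouptype.attrib['value']             # same value via the dict
missing = grouptype.get('no-such-attr')      # get() gives None, no KeyError
xsi_type = grouptype.get('{%s}type' % XSI)   # namespaced attribute

print(gtype, same, missing, xsi_type)
```

The practical difference is that attrib[...] raises a KeyError for a missing attribute, while get() returns None (or a default passed as its second argument).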

How should I parse this xml string in python?

My XML string is -
xmlData = """<SMSResponse xmlns="http://example.com" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Cancelled>false</Cancelled>
<MessageID>00000000-0000-0000-0000-000000000000</MessageID>
<Queued>false</Queued>
<SMSError>NoError</SMSError>
<SMSIncomingMessages i:nil="true"/>
<Sent>false</Sent>
<SentDateTime>0001-01-01T00:00:00</SentDateTime>
</SMSResponse>"""
I am trying to parse it and get the values of the tags Cancelled, MessageID, SMSError, etc. I am using Python's ElementTree library. So far, I have tried things like -
root = ET.fromstring(xmlData)
print(root.find('Sent'))  # gives None
for child in root:
    print(child.find('MessageId'))  # also gives None
Although, I am able to print the tags with -
for child in root:
    print(child.tag)
    # child.tag for the tag Cancelled is - {http://example.com}Cancelled
and their respective values with -
for child in root:
    print(child.text)
How do I get something like -
print(child.Queued)  # will print false
Like in PHP we can access them with the root -
$xml = simplexml_load_string($data);
$status = $xml->SMSError;

Your document has a namespace on it, so you need to include the namespace when searching:
root = ET.fromstring(xmlData)
print(root.find('{http://example.com}Sent'))
print(root.find('{http://example.com}MessageID'))
output:
<Element '{http://example.com}Sent' at 0x1043e0690>
<Element '{http://example.com}MessageID' at 0x1043e0350>
The find() and findall() methods also take a namespace map; you can search with an arbitrary prefix, and the prefix will be looked up in that map, to save typing:
nsmap = {'n': 'http://example.com'}
print(root.find('n:Sent', namespaces=nsmap))
print(root.find('n:MessageID', namespaces=nsmap))
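On Python 3.8 or newer, ElementTree paths also accept a {*} wildcard in place of the namespace, which may be convenient when the namespace URI is not important. This is a minimal sketch that assumes Python 3.8+ and reuses the XML from the question:

```python
import xml.etree.ElementTree as ET

xmlData = """<SMSResponse xmlns="http://example.com" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Cancelled>false</Cancelled>
<MessageID>00000000-0000-0000-0000-000000000000</MessageID>
<Queued>false</Queued>
<SMSError>NoError</SMSError>
<SMSIncomingMessages i:nil="true"/>
<Sent>false</Sent>
<SentDateTime>0001-01-01T00:00:00</SentDateTime>
</SMSResponse>"""

root = ET.fromstring(xmlData)

# Without a namespace the lookup finds nothing.
print(root.find('Queued'))           # None

# '{*}' matches the tag name in any namespace (Python 3.8+ only).
print(root.find('{*}Queued').text)   # false
print(root.findtext('{*}SMSError'))  # NoError
```

The wildcard trades precision for brevity: it would also match a Queued element from a different namespace, so the explicit namespace map is the safer choice when several namespaces are mixed.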

If you're set on Python standard XML libraries, you could use something like this:
root = ET.fromstring(xmlData)
namespace = 'http://example.com'
def query(tree, nodename):
    # build the fully qualified name, e.g. {http://example.com}Queued
    return tree.find('{{{ex}}}{nodename}'.format(ex=namespace, nodename=nodename))
queued = query(root, 'Queued')
print(queued.text)

You can create a dictionary and directly get values out of it...
tree = ET.fromstring(xmlData)
root = {}
for child in tree:
    # strip the "{namespace}" part from the tag and use the rest as the key
    root[child.tag.split("}")[1]] = child.text
print(root["Queued"])

With lxml.etree:
In [8]: import lxml.etree as et
In [9]: doc=et.fromstring(xmlData)
In [10]: ns={'n':'http://example.com'}
In [11]: doc.xpath('n:Queued/text()',namespaces=ns)
Out[11]: ['false']
With elementtree you can do:
import xml.etree.ElementTree as ET
root=ET.fromstring(xmlData)
ns={'n':'http://example.com'}
root.find('n:Queued',namespaces=ns).text
Out[13]: 'false'

Xpath select attribute of current node?

I use Python with lxml to process XML. After I query/filter to get to the nodes I want, I have a problem: how do I get an attribute's value with XPath? Here is my input example.
>print(etree.tostring(node, pretty_print=True ))
<rdf:li xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>
The value I want is in resource=... . Currently I just use lxml to get the value. I wonder if it is possible to do this in pure XPath? Thanks.
EDIT: I forgot to say that this is not a root node, so I can't use // here. I have around 2000-3000 others in the XML file. My first attempts were playing around with ".@attrib" and "self::*@", but those do not seem to work.
EDIT2: I will try my best to explain (this is my first time dealing with an XML problem using XPath, and English is not one of my favorite fields). Here is my input snippet: http://pastebin.com/kZmVdbQQ (the full file is available from http://www.comp-sys-bio.org/yeastnet/, using version 4).
In my code, I try to get the speciesType nodes that have a chebi resource link (<rdf:li rdf:resource="urn:miriam:obo.chebi:...."/>), and then I try to get the value of the rdf:resource attribute in rdf:li. I am pretty sure it would be easy to get an attribute of a child node if I start from a parent node like speciesType, but I wonder how to do it if I start from rdf:li itself. From my understanding, "//" in XPath will look for nodes everywhere, not only under the current node.
Below is my code:
import lxml.etree as etree
tree = etree.parse("yeast_4.02.xml")
root = tree.getroot()
ns = {"sbml": "http://www.sbml.org/sbml/level2/version4",
      "rdf":"http://www.w3.org/1999/02/22-rdf-syntax-ns#",
      "body":"http://www.w3.org/1999/xhtml",
      "re": "http://exslt.org/regular-expressions"
      }
#good enough for now
maybemeta = root.xpath("//sbml:speciesType[descendant::rdf:li[starts-with(@rdf:resource, 'urn:miriam:obo.chebi') and not(starts-with(@rdf:resource, 'urn:miriam:uniprot'))]]", namespaces = ns)
def extract_name_and_chebi(node):
    name = node.attrib['name']
    chebies = node.xpath("./sbml:annotation//rdf:li[starts-with(@rdf:resource, 'urn:miriam:obo.chebi') and not(starts-with(@rdf:resource, 'urn:miriam:uniprot'))]", namespaces=ns) #get all rdf:li node with chebi resource
    assert len(chebies) == 1
    #my current solution to get rdf:resource value from rdf:li node
    rdfNS = "{" + ns.get('rdf') + "}"
    chebi = chebies[0].attrib[rdfNS + 'resource']
    #do protein later
    return (name, chebi)
metaWithChebi = map(extract_name_and_chebi, maybemeta)
fo = open("metabolites.txt", "w")
for name, chebi in metaWithChebi:
    fo.write("{0}\t{1}\n".format(name, chebi))

Prefix the attribute name with @ in the XPath query:
>>> from lxml import etree
>>> xml = """\
... <?xml version="1.0" encoding="utf8"?>
... <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
... <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>
... </rdf:RDF>
... """
>>> tree = etree.fromstring(xml)
>>> ns = {'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}
>>> tree.xpath('//rdf:li/@rdf:resource', namespaces=ns)
['urn:miriam:obo.chebi:CHEBI%3A37671']
EDIT
Here's a revised version of the script in the question:
import lxml.etree as etree
ns = {
    'sbml': 'http://www.sbml.org/sbml/level2/version4',
    'rdf':'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'body':'http://www.w3.org/1999/xhtml',
    're': 'http://exslt.org/regular-expressions',
    }
def extract_name_and_chebi(node):
    # select the attribute values directly instead of the rdf:li elements
    chebies = node.xpath("""
        .//rdf:li[
        starts-with(@rdf:resource, 'urn:miriam:obo.chebi')
        ]/@rdf:resource
        """, namespaces=ns)
    return node.attrib['name'], chebies[0]
with open('yeast_4.02.xml') as xml:
    tree = etree.parse(xml)
maybemeta = tree.xpath("""
    //sbml:speciesType[descendant::rdf:li[
    starts-with(@rdf:resource, 'urn:miriam:obo.chebi')]]
    """, namespaces = ns)
with open('metabolites.txt', 'w') as output:
    for node in maybemeta:
        output.write('%s\t%s\n' % extract_name_and_chebi(node))

To select the attribute named rdf:resource of the current node, use this XPath expression:
@rdf:resource
For this to work correctly, you must register the association of the prefix "rdf" with the corresponding namespace.
If you don't know how to register the rdf namespace, it is still possible to select the attribute with this XPath expression:
@*[name()='rdf:resource']

Well, I got it. The XPath expression I need here is "./@rdf:resource", not ".@rdf:resource". But why? I thought "./" indicates a child of the current node.
