xml.etree.ElementTree not finding all Elements in XML - python

I have the following XML file that I'm trying to iterate through using xml.etree:
<safetypadapiresponse><url></url><refcode /><status>SUCCESS</status><message><pcrs>
<pcr>
<eCase01m>1234</eCase01m>
<eProcedures03>12 Lead ECG Obtained</eProcedures03>
<eMedications03>Oxygen</eMedications03>
</pcr>
</pcrs></message></safetypadapiresponse>
I'm unable to find any of the child elements after 'message' with the following:
import xml.etree.ElementTree as ET
tree = ET.parse(xmlFile)
root = tree.getroot()
for member in root.findall('pcr'):
    print(member)
Iterating over the root directly lists only these child elements:
for member in root:
    print(member)
Element 'url'
Element 'refcode'
Element 'status'
Element 'message'
I'm trying to retrieve all the information under the pcr element (i.e. eCase01m, eProcedures03, eMedications03).

You can use findall() in two ways. Unhelpfully this is mentioned in two different parts of the docs:
Element.findall() finds only elements with a tag which are direct
children of the current element.
...
Finds all matching subelements, by tag name or path. Returns a list
containing all matching elements in document order.
What this means is if you look for a tag, you are only searching the direct children of the current element.
You can use a path expression instead (ElementTree supports a limited subset of XPath), which can search the whole document below the current node for matches. Either of the following should do:
root.findall('./message/pcrs/pcr') # Find them relative to this node
root.findall('.//pcr') # Find them anywhere below the current node
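As a minimal sketch of the difference, the three patterns can be compared on the document from the question. The XML is inlined here with ET.fromstring so the example is self-contained; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# The document from the question, inlined so the example runs on its own.
XML = """<safetypadapiresponse><url></url><refcode /><status>SUCCESS</status><message><pcrs>
<pcr>
<eCase01m>1234</eCase01m>
<eProcedures03>12 Lead ECG Obtained</eProcedures03>
<eMedications03>Oxygen</eMedications03>
</pcr>
</pcrs></message></safetypadapiresponse>"""

root = ET.fromstring(XML)

direct = root.findall('pcr')                  # bare tag: direct children only
by_path = root.findall('./message/pcrs/pcr')  # explicit path from the root
anywhere = root.findall('.//pcr')             # any depth below the root

print(len(direct), len(by_path), len(anywhere))

# Each pcr element can be iterated to reach its own children.
for pcr in anywhere:
    for field in pcr:
        print(field.tag, field.text)
```

The bare tag finds nothing because pcr sits three levels below the root, while both path forms find the single pcr element.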

For the sake of completeness, let me add that if you parse the file with lxml you can also use its xpath() method (xml.etree.ElementTree has no xpath()):
for i in tree.xpath('*//pcr/*'):
    print(i.tag)
Output:
eCase01m
eProcedures03
eMedications03
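If lxml is not available, roughly the same result can be had from the standard library, since findall() accepts the path './/pcr/*'. This is a sketch on the question's document; the record dictionary is an illustrative addition, not part of either answer.

```python
import xml.etree.ElementTree as ET

XML = """<safetypadapiresponse><url></url><refcode /><status>SUCCESS</status><message><pcrs>
<pcr>
<eCase01m>1234</eCase01m>
<eProcedures03>12 Lead ECG Obtained</eProcedures03>
<eMedications03>Oxygen</eMedications03>
</pcr>
</pcrs></message></safetypadapiresponse>"""

root = ET.fromstring(XML)

# './/pcr/*' selects every direct child of every pcr element.
tags = [child.tag for child in root.findall('.//pcr/*')]
print(tags)

# Collect one pcr element's fields as a tag -> text mapping.
record = {child.tag: child.text for child in root.find('.//pcr')}
print(record)
```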

Related

How i can use 'list' object on other attributes on xml.etree.ElementTree parse

How should I handle this? I want to get the text and tags of all the children of "alarmTime", but I get an error. How do I work with the 'list' object, or how can I avoid the error?
My code is:
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
for elem in root.iter(tag='alarmTime'):
    data = elem.getchildren()
    print(data.text)
Error is:
AttributeError: 'list' object has no attribute 'text'
The problem you're having is that getchildren() returns a list of the child elements, not a single element. So this list doesn't have the .text attribute. To access this, you now need to loop over this list (of child elements) and retrieve the text for each of them.
You can do this really easily with the ElementTree package, as it lets you treat the element like a list containing its child elements. So you don't need to use .getchildren to make a list of child elements - elem itself is already one. (Since this feature was introduced, .getchildren has been deprecated, and it was removed in Python 3.9, so you should use the list-like features instead.)
So your code should look something like this:
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
# find each alarmTime element
for elem in root.iter(tag='alarmTime'):
    # loop over its child elements
    for child_elem in elem:
        # print the tag and text
        print(child_elem.tag)
        print(child_elem.text)
Obviously, you might want to store this information, rather than just printing it. If you're expecting to find multiple copies of the alarmTime element, each with child elements whose tag and text you want to retrieve, I'd use a list of lists, with each child element stored as a tuple of its tag and text. I also normally use the term subelement rather than child element:
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
# main list for alarmTime data
alarmTime_list = []
for elem in root.iter(tag='alarmTime'):
    # create a list for data from this alarmTime element
    data = []
    # loop over subelements
    for subelem in elem:
        # add the subelement tag and text as a tuple
        data.append((subelem.tag, subelem.text))
    # add the set of data for this alarmTime element to the main list
    alarmTime_list.append(data)
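A variation on the same idea stores each alarmTime element as a dictionary instead of a list of tuples. The question does not show sample.xml, so the document below, including the hour and minute tags, is a hypothetical stand-in.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for sample.xml; the real file is not shown in the question.
SAMPLE = """<alarms>
  <alarmTime><hour>07</hour><minute>30</minute></alarmTime>
  <alarmTime><hour>22</hour><minute>15</minute></alarmTime>
</alarms>"""

root = ET.fromstring(SAMPLE)

# One dictionary per alarmTime element, mapping subelement tag -> text.
alarm_times = [
    {subelem.tag: subelem.text for subelem in elem}
    for elem in root.iter('alarmTime')
]
print(alarm_times)
```

A dictionary keeps only the last value when a tag repeats inside one alarmTime element, so the list of tuples is the safer choice if repeated tags are possible.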
These references might be helpful for you:
I found this quite helpful as an introduction when I was learning to use ElementTree: http://effbot.org/zone/element.htm
This is more detailed, and also covers iter() and iterparse():
https://eli.thegreenplace.net/2012/03/15/processing-xml-in-python-with-elementtree
The official docs are quite good too:
https://docs.python.org/3.6/library/xml.etree.elementtree.html

Basic Python Parsing XML with xml.etree - Issue

I am trying to parse XML and am having a hard time. I don't understand why the results keep printing [<Element 'Results' at 0x105fc6110>]
I am trying to extract Social from my example with the following code:
import xml.etree.ElementTree as ET
root = ET.parse("test.xml")
results = root.findall("Results")
print(results)  # [<Element 'Results' at 0x105fc6110>]
# WHAT IS THIS??
for result in results:
    print(result.find("Social"))  # None
the XML looks like this:
<?xml version="1.0"?>
<List1>
<NextOffset>AAA</NextOffset>
<Results>
<R>
<D>internet.com</D>
<META>
<Social>
<v>http://twitter.com/internet</v>
<v>http://facebook.com/internet</v>
</Social>
<Telephones>
<v>+1-555-555-6767</v>
</Telephones>
</META>
</R>
</Results>
</List1>
findall returns a list of xml.etree.ElementTree.Element objects. In your case there is only one Results node, so you can use find to get the first (here, only) match.
Once you have it, call find with the .// syntax, which searches anywhere below that element, not only among its direct children.
Once you have found Social, call findall on the v tag and print the text of each match:
import xml.etree.ElementTree as ET
root = ET.parse("test.xml")
result = root.find("Results")
social = result.find(".//Social")
for r in social.findall("v"):
    print(r.text)
results in:
http://twitter.com/internet
http://facebook.com/internet
Note that I did not perform any validity checks on the XML file. You should check whether the find method returns None and handle that case accordingly.
Note that although I'm not an XML expert myself, I learned everything I know about parsing it by following this lxml tutorial.
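The None check mentioned above might look like the following sketch. The helper name social_links is hypothetical, and the XML is a trimmed copy of the question's document, inlined so the example runs on its own.

```python
import xml.etree.ElementTree as ET

XML = """<List1>
<NextOffset>AAA</NextOffset>
<Results>
<R>
<D>internet.com</D>
<META>
<Social>
<v>http://twitter.com/internet</v>
<v>http://facebook.com/internet</v>
</Social>
</META>
</R>
</Results>
</List1>"""


def social_links(root):
    """Return the text of every <v> under the first <Social>, or [] if absent."""
    social = root.find('.//Social')
    # Compare with None explicitly: an element without children is falsy.
    if social is None:
        return []
    return [v.text for v in social.findall('v')]


links = social_links(ET.fromstring(XML))
missing = social_links(ET.fromstring('<List1/>'))
print(links, missing)
```

The explicit "is None" comparison matters because a found element that happens to have no children would fail a plain truthiness test.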
results = root.findall("Results") is a list of xml.etree.ElementTree.Element objects.
type(results)
# list
type(results[0])
# xml.etree.ElementTree.Element
Given a plain tag name, find and findall look only at direct children. The iter method iterates over matching descendants at any level.
Option 1
If <Results> could potentially have more than one <Social> element, you could use this:
for result in results:
    for soc in result.iter("Social"):
        for link in soc.iter("v"):
            print(link.text)
That's worst case scenario. If you know there'll be one <Social> per <Results> then it simplifies to:
for soc in root.iter("Social"):
    for link in soc.iter("v"):
        print(link.text)
both return
"http://twitter.com/internet"
"http://facebook.com/internet"
Option 2
Or use nested list comprehensions and do it with one line of code. Because Python...
socialLinks = [[v.text for v in soc] for soc in root.iter("Social")]
# socialLinks == [['http://twitter.com/internet', 'http://facebook.com/internet']]
socialLinks is a list of lists. The outer list has one entry per <Social> element (only one in this example). Each inner list contains the text of the v elements within that particular <Social> element.
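If a flat list is enough, a single path expression can replace the nesting. This is a sketch under the assumption that grouping by Social element is not needed; the XML is a trimmed copy of the question's document.

```python
import xml.etree.ElementTree as ET

XML = """<List1>
<Results>
<R>
<META>
<Social>
<v>http://twitter.com/internet</v>
<v>http://facebook.com/internet</v>
</Social>
<Telephones>
<v>+1-555-555-6767</v>
</Telephones>
</META>
</R>
</Results>
</List1>"""

root = ET.fromstring(XML)

# './/Social/v' matches v elements that are direct children of any Social
# element, so the v under Telephones is not included.
flat_links = [v.text for v in root.findall('.//Social/v')]
print(flat_links)
```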

ElementTree XML API not matching subelement

I am attempting to use the USPS API to return the status of package tracking. I have a method that returns an ElementTree.Element object built from the XML string returned from the USPS API.
This is the returned XML string.
<?xml version="1.0" encoding="UTF-8"?>
<TrackResponse>
<TrackInfo ID="EJ958088694US">
<TrackSummary>The Postal Service could not locate the tracking information for your
request. Please verify your tracking number and try again later.</TrackSummary>
</TrackInfo>
</TrackResponse>
I format that into an Element object
response = xml.etree.ElementTree.fromstring(xml_str)
Now I can see in the xml string that the tag 'TrackSummary' exists and I would expect to be able to access that using ElementTree's find method.
As extra proof I can iterate over the response object and prove that the 'TrackSummary' tag exists.
for item in response.iter():
print(item, item.text)
returns:
<Element 'TrackResponse' at 0x00000000041B4B38> None
<Element 'TrackInfo' at 0x00000000041B4AE8> None
<Element 'TrackSummary' at 0x00000000041B4B88> The Postal Service could not locate the tracking information for your request. Please verify your tracking number and try again later.
So here is the problem.
print(response.find('TrackSummary'))
returns
None
Am I missing something here? Seems like I should be able to find that child element without a problem?
import xml.etree.ElementTree as ET  # uses the C accelerator automatically since Python 3.3; cElementTree was removed in 3.9
response = ET.fromstring(xml_str)
From the XPath syntax table in the docs:
* selects all child elements. For example, */egg selects all grandchildren named egg.
elements = response.findall('*/TrackSummary')  # findall returns a list
print(elements[0].text)  # print the first match, or iterate over the list
Output:
The Postal Service could not locate the tracking information for your
request. Please verify your tracking number and try again later.
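The .find() method with a bare tag name searches only the direct children, not recursively. To search recursively, you need a path expression. In XPath, the double slash // means a search at any depth, so with the standard library you can use response.find('.//TrackSummary'). If you parse with lxml instead, elements also have an xpath() method (xml.etree.ElementTree elements do not). Try this:
# returns a list of elements with tag TrackSummary
response.xpath('//TrackSummary')
# returns a list of the text contained in each TrackSummary tag
response.xpath('//TrackSummary/text()')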
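For the standard library alone, a minimal sketch of the recursive lookup on the question's response could look like this. The whitespace normalisation at the end is an optional extra, added because the summary text contains a line break.

```python
import xml.etree.ElementTree as ET

XML_STR = """<TrackResponse>
<TrackInfo ID="EJ958088694US">
<TrackSummary>The Postal Service could not locate the tracking information for your
request. Please verify your tracking number and try again later.</TrackSummary>
</TrackInfo>
</TrackResponse>"""

response = ET.fromstring(XML_STR)

# A bare tag only matches direct children of TrackResponse, so this is None.
print(response.find('TrackSummary'))

# './/' searches at any depth below the current element.
summary = response.find('.//TrackSummary')

# Collapse the embedded newline and any repeated spaces into single spaces.
text = ' '.join(summary.text.split())
print(text)

# Attributes are read with get(); TrackInfo is a direct child, so a bare tag works.
tracking_id = response.find('TrackInfo').get('ID')
print(tracking_id)
```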

How to identify a specific XML element without ID attributes

I'm working with XML documents that have many <text> elements. But none of them have IDs or any other attribute--just the element tag. Is there any way I can use python to tell one of these elements from another (other than by the contents)? For example do elements have some inherent index number based on their position in the document or something like that?
If you have lxml ElementTree, and you want to get details for a particular element:
>>> element
<Element e at 0x7f71068abf38>
You can find index of element inside parent and full path of element:
>>> element.getparent().index(element)
0
>>> element.getroottree().getpath(element)
'/root/e[1]'
That's all you have. For more sophisticated info (such as "global index" of element in the whole document) you should write custom code.
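The standard library's ElementTree has no getparent(), but similar positional information can be derived with a parent map. The sketch below uses a made-up document; the tag names and variable names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Made-up document with several attribute-less <text> elements.
root = ET.fromstring('<root><text>a</text><note/><text>b</text><text>c</text></root>')

# Map every child element to its parent.
parent_map = {child: parent for parent in root.iter() for child in parent}

target = root.findall('text')[1]  # the element containing "b"
parent = parent_map[target]

# Position among all children of the parent (0-based).
index_in_parent = list(parent).index(target)

# Position among same-tag siblings (1-based, as in an XPath predicate).
nth_of_tag = parent.findall(target.tag).index(target) + 1

# "Global index": position in document order over the whole tree.
global_index = list(root.iter()).index(target)

print(index_in_parent, nth_of_tag, global_index)
```

The 1-based sibling position can be used to find the element again later with a positional predicate, for example root.find('text[2]').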

Python comparing XML output to a list

I have an XML that looks something like this:
<Import>
<spId>1234</spId>
<GroupFlag>false</GroupFlag>
</Import>
I want to extract the value of spId and compare it with a list and I have the following script:
import xml.etree.ElementTree as ET
xml_file = "c:/somefile.xml"
sp_id_list = ['1234']
tree = ET.parse(xml_file)
root = tree.getroot()
for sp_id in root.findall('./spId'):
    if sp_id.text in sp_id_list:
        print(sp_id.text)
This doesn't work for spId (numeric) but works for comparing GroupFlag (string) with a list. Why is this happening and how can I rectify this problem?
Sorry for the stupid question, I am a noob to this.
Your code example works correctly if your XML sample posted here is given as input XML file.
However, you want to find all elements, so I assume that your real document has many <Import> items. If a list of items is not wrapped in a parent tag, it is not valid XML, and parsing it would raise xml.etree.ElementTree.ParseError.
So, I assume that in your real document <Import> is not a root element and <Import> elements are somewhere deeper in the document, for example
<Parent>
<Import>
<spId>1234</spId>
<GroupFlag>false</GroupFlag>
</Import>
<Import>
<spId>1234</spId>
<GroupFlag>false</GroupFlag>
</Import>
</Parent>
In that case the search pattern './spId' cannot find those tags, since it matches only direct children of the root element. You can use a path that matches tags at all levels beneath the root or, even better, give the direct path from the root to the level where spId is located:
# all subelements, on all levels beneath the current element
root.findall('.//spId')
# all spId elements directly in Import tags that are directly
# beneath the root element (as in the above XML example)
root.findall('./Import/spId')
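Put together with the comparison from the question, this could look like the sketch below. The second spId value is changed to 5678 here, purely to show the filter at work, and stripping whitespace before comparing is a defensive assumption rather than something the question's data requires.

```python
import xml.etree.ElementTree as ET

# The wrapped example from above, with the second spId changed for illustration.
XML = """<Parent>
<Import>
<spId>1234</spId>
<GroupFlag>false</GroupFlag>
</Import>
<Import>
<spId>5678</spId>
<GroupFlag>false</GroupFlag>
</Import>
</Parent>"""

sp_id_list = ['1234']
root = ET.fromstring(XML)

# './spId' looks only at direct children of <Parent>, so nothing matches.
direct = root.findall('./spId')

# './Import/spId' follows the actual nesting. Element text is always a string,
# so it is compared against the string values in sp_id_list.
matches = [
    sp_id.text.strip()
    for sp_id in root.findall('./Import/spId')
    if sp_id.text and sp_id.text.strip() in sp_id_list
]
print(direct, matches)
```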
