I'm having difficulty parsing an XML tree using xml.etree.ElementTree in Python. Basically, I'm making a request to an API that gives an XML response, and trying to extract the values of several elements in the tree.
This is what I've done so far with no success:
root = etree.fromstring(resp_arr[0])
walkscore = root.find('./walkscore')
Here is my XML tree:
<result>
<status>1</status>
<walkscore>95</walkscore>
<description>walker's paradise</description>
<updated>2009-12-25 03:40:16.006257</updated>
<logo_url>https://cdn.walk.sc/images/api-logo.png</logo_url>
<more_info_icon>https://cdn.walk.sc/images/api-more-info.gif</more_info_icon>
<ws_link>http://www.walkscore.com/score/1119-8th-Avenue-Seattle-WA-98101/lat=47.6085/lng=-122.3295/?utm_source=myrealtysite.com&utm_medium=ws_api&utm_campaign=ws_api</ws_link>
<help_link>https://www.redfin.com/how-walk-score-works</help_link>
<snapped_lat>47.6085</snapped_lat>
<snapped_lon>-122.3295</snapped_lon>
</result>
Essentially, I'm trying to pull the walkscores from the XML document but my code isn't returning a value. Does anyone with experience using ElementTree have any advice to help me extract the values I'm after?
Your XML is malformed: the & characters in the ws_link URL are not escaped. If I replace each & with &amp;, then it's parseable:
>>> from xml.etree import ElementTree as ET
>>> tree = ET.fromstring(xml)
>>> tree.find('./walkscore').text
'95'
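Here is a minimal sketch of the same fix on a shortened, hypothetical copy of the response. It shows that the raw document is rejected, then reads fields with findtext(). The blanket replace assumes the payload contains no entities that are already escaped, because an existing &amp; would become &amp;amp;. If you control the API output, escaping there is the better option.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical copy of the API response: the bare "&" in ws_link
# makes the document not well-formed.
raw = ("<result><status>1</status><walkscore>95</walkscore>"
       "<ws_link>http://www.walkscore.com/score/?utm_source=x&utm_medium=ws_api</ws_link>"
       "</result>")

try:
    ET.fromstring(raw)
    parsed_raw = True
except ET.ParseError as err:
    parsed_raw = False
    print("raw response rejected:", err)

# Escape bare ampersands (assumes no already-escaped entities in the payload).
root = ET.fromstring(raw.replace("&", "&amp;"))

# findtext() returns the child's text, or None if the child is missing.
walkscore = root.findtext("walkscore")
link = root.findtext("ws_link")   # the parsed text contains a plain "&" again
print(walkscore)

# Or collect every child element's text in one pass.
fields = {child.tag: child.text for child in root}
```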
I just started learning Python and have to write a program that parses XML files.
I have multiple entries as seen below and I need, as a starting point, to return all the different d:Name entries in a list.
Unfortunately, I can't manage to use findall with prefixes.
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
lst = tree.findall('.//{d}Name')
I read that if d is a prefix, I need to use the namespace URI instead of d. But I don't understand which URI applies in my case, or how to make a successful search when I have the following file.
I have XML that looks like this (simplified):
<feed xml:base="http://projectserver/ps/_api/">
<entry>
<id>
http://projectserver/ps/_api/ProjectServer/EnterpriseResources('some id...')
</id>
<content type="application/xml">
<m:properties>
<d:Name>
WHAT I NEED
</d:Name>
</m:properties>
</content>
</entry>
<entry>
...
So I ran the following, which printed the list of tags. There I found the {URI}Name tag, which I then used to do the search properly.
for elem in tree.iter():
    print(elem.tag)
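ElementTree stores a prefixed tag such as d:Name as {namespace-URI}Name, where the URI comes from the xmlns:d declaration in the document (the simplified XML above omits it). A minimal sketch of three ways to search for it, assuming the URIs below, which are typical of OData feeds but are an assumption here; copy the real ones from your feed's xmlns attributes or from the tree.iter() printout.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the namespace URIs are ASSUMED (typical OData values);
# take the real ones from the xmlns:d / xmlns:m attributes in your file.
data = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <entry><content type="application/xml">
    <m:properties><d:Name>
      First
    </d:Name></m:properties>
  </content></entry>
  <entry><content type="application/xml">
    <m:properties><d:Name>Second</d:Name></m:properties>
  </content></entry>
</feed>"""

root = ET.fromstring(data)

# Option 1: map a prefix to the URI and pass the mapping to findall().
# Only the URI has to match; the prefix name is local to this dict.
ns = {"d": "http://schemas.microsoft.com/ado/2007/08/dataservices"}
names = [el.text.strip() for el in root.findall(".//d:Name", ns)]

# Option 2: spell out the expanded {URI}tag form yourself.
expanded = [el.text.strip() for el in root.iter("{%s}Name" % ns["d"])]

# Option 3 (Python 3.8+): match Name in any namespace with the {*} wildcard.
wildcard = [el.text.strip() for el in root.findall(".//{*}Name")]

print(names)
```

The strip() calls are there because, as in the question's XML, the text inside d:Name may be surrounded by newlines and indentation.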
In my case, I have to find a few elements in the XML file and update their values through their text attribute. For that, I have to search for the XML elements A, B and C. My project uses xml.etree and Python. Currently I am using:
self.get_root.findall('H/A/T')
self.get_root.findall('H/B/T')
self.get_root.findall('H/C/T')
The sample XML file:
<H><A><T>text-i-have-to-update</H></A></T>
<H><B><T>text-i-have-to-update</H></B></T>
<H><C><T>text-i-have-to-update</H></C></T>
As you can see, only the middle element in the path differs. Is there a way to simplify the code with something like self.get_root.findall(H|(A,B,C)|T)? Any guidance in the right direction will do! Thanks!
I went through the similar question "XPath to select multiple tags", but it didn't work for my case.
Update: maybe regular expression inside the findall()?
The XML in your question is malformed; assuming it's properly formatted (like below), try this:
import xml.etree.ElementTree as ET
data = """<root>
<H><A><T>text-i-have-to-update</T></A></H>
<H><B><T>text-i-have-to-update</T></B></H>
<H><C><T>text-i-have-to-update</T></C></H>
</root>"""
doc = ET.fromstring(data)
for item in doc.findall('.//H//T'):
item.text = "modified text"
print(ET.tostring(doc).decode())
Output:
<root>
<H><A><T>modified text</T></A></H>
<H><B><T>modified text</T></B></H>
<H><C><T>modified text</T></C></H>
</root>
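Note that .//H//T also matches a T under any other child of H. ElementTree implements only a subset of XPath: it has no union operator (|) and findall() does not accept regular expressions. So if you need exactly A, B and C, one option is to run one path per middle tag and combine the results. A minimal sketch, with a hypothetical extra D branch to show that it is left alone:

```python
import xml.etree.ElementTree as ET

data = """<root>
<H><A><T>text-i-have-to-update</T></A></H>
<H><B><T>text-i-have-to-update</T></B></H>
<H><C><T>text-i-have-to-update</T></C></H>
<H><D><T>leave-me-alone</T></D></H>
</root>"""

doc = ET.fromstring(data)

# No "|" in ElementTree's XPath subset, so build one path per wanted tag.
wanted = ("A", "B", "C")
targets = [t for tag in wanted for t in doc.findall(f"H/{tag}/T")]

for t in targets:
    t.text = "modified text"

print(ET.tostring(doc).decode())
```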
I have a couple of GB of log files in XML format. What I am interested in is finding the rows with a specific command and extracting the user who issued the command.
Which is to say I want to check a particular field in all rows for a specific value and then extract an unknown value from another field on the same line. How do I go about doing that? (I've tried turning to documentation and other sources without understanding how it works.)
I currently use Python 2.7.15, but if 3.* is better or easier in any way I'll use that.
Here's an example of a row in a logfile that I'm interested in:
<?xml version="1.0" encoding="UTF-8"?>
<IHEYr4>
<UserAuthenticated>
<LocalUsername>User1</LocalUsername>
<Action>Login</Action>
</UserAuthenticated>
<Host>192.168.1.15</Host>
<TimeStamp>2018-01-18T02:31:00</TimeStamp>
</IHEYr4>
Using ElementTree
Demo:
x = """<?xml version="1.0" encoding="UTF-8"?>
<IHEYr4>
<UserAuthenticated>
<LocalUsername>User1</LocalUsername>
<Action>Login</Action>
</UserAuthenticated>
<Host>192.168.1.15</Host>
<TimeStamp>2018-01-18T02:31:00</TimeStamp>
</IHEYr4>
"""
import xml.etree.ElementTree as ET
xmlVal = ET.fromstring(x)
if xmlVal.find("UserAuthenticated/Action").text == 'Login':
    print(xmlVal.find("Host").text, xmlVal.find("TimeStamp").text)
Output:
('192.168.1.15', '2018-01-18T02:31:00')
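For a couple of GB of logs, building the whole tree at once can use a lot of memory. Below is a minimal Python 3 sketch of a streaming alternative with ET.iterparse() that also pulls out the user, as the question asks. It assumes (hypothetically) that each log file wraps its IHEYr4 records in a single root element, shown here as <log>. If each record is instead a separate XML document, parse them one at a time with ET.fromstring() as in the demo above. The function name users_for_action is illustrative.

```python
import io
import xml.etree.ElementTree as ET

def users_for_action(source, action):
    """Yield LocalUsername for every IHEYr4 record whose Action equals `action`."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "IHEYr4":
            if elem.findtext("UserAuthenticated/Action") == action:
                yield elem.findtext("UserAuthenticated/LocalUsername")
            root.clear()  # drop finished records so memory stays bounded

# Hypothetical two-record log; in practice pass a filename instead of BytesIO.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<log>
<IHEYr4>
<UserAuthenticated><LocalUsername>User1</LocalUsername><Action>Login</Action></UserAuthenticated>
<Host>192.168.1.15</Host>
</IHEYr4>
<IHEYr4>
<UserAuthenticated><LocalUsername>User2</LocalUsername><Action>Logout</Action></UserAuthenticated>
<Host>192.168.1.16</Host>
</IHEYr4>
</log>"""

users = list(users_for_action(io.BytesIO(sample), "Login"))
print(users)
```

Clearing the root after each finished record is what keeps memory flat. Without it, iterparse still builds the complete tree in the background.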
I'm using ElementTree, and I can get tags and attributes but not the actual content between elements.
From this XML:
<tag_name attrib="1">I WANT THIS INFO HERE</tag_name>
Here's my Python code:
import urllib2
import xml.etree.ElementTree as ET
XML = urllib2.urlopen("http://URL/file.xml")
Tree = ET.parse(XML)
for node in Tree.getiterator():
    print node.tag, node.attrib
This prints most of the XML file, and I understand what 'tag' and 'attrib' are, but how do I get the 'Content'? I tried looking through ElementTree's docs, but I think this might be too basic of a question.
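The .text attribute (it is an attribute, not a method) gives you the required text value. Prefer Tree.iter() over getiterator(), which was deprecated and then removed in Python 3.9:
for node in Tree.iter():
    print(node.tag, node.attrib, node.text)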
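A small, self-contained sketch of .text on the question's element, plus two related parts of the same API that matter once an element has children. In that case .text holds only the text before the first child, text after a child is stored on that child's .tail, and itertext() yields all of it in document order. The <doc> and <p> markup is a hypothetical example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<doc><tag_name attrib="1">I WANT THIS INFO HERE</tag_name>'
    "<p>Hello <b>big</b> world</p></doc>"
)

node = root.find("tag_name")
print(node.tag, node.attrib, node.text)

p = root.find("p")
before = p.text               # text before the first child only
after = p.find("b").tail      # text after </b> is stored on <b>
full = "".join(p.itertext())  # all text inside <p>, in order
```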
Did you try XPath?
There are a lot of libraries that extract content from tags with a simple yet powerful syntax.
Here is an example:
from scrapy.selector import XmlXPathSelector  # older Scrapy releases
xs = XmlXPathSelector(text="<tags>your xml</tags>")
print(xs.select("//tag_name[@attrib='1']/text()").extract())
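If you would rather stay with the standard library, ElementTree's XPath subset also supports attribute predicates such as [@attrib='1']. It does not support the text() step, so you read .text from the matched elements instead. A minimal sketch with hypothetical input:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<tags><tag_name attrib="1">I WANT THIS INFO HERE</tag_name>'
    '<tag_name attrib="2">other</tag_name></tags>'
)

# [@attrib='1'] filters on the attribute value; .text replaces XPath's text().
texts = [el.text for el in root.findall(".//tag_name[@attrib='1']")]
print(texts)
```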
I'm working on parsing an XML-Sheet in Python. The XML has a structure like this:
<layer1>
<layer2>
<element>
<info1></info1>
</element>
<element>
<info1></info1>
</element>
<element>
<info1></info1>
</element>
</layer2>
</layer1>
Without layer2, I have no problem accessing the data in info1; there I can address it with: root.firstChild.childNodes[0].childNodes[0].data
But with layer2, I'm really in trouble. My thought was that I could do it similarly, like this: root.firstChild.firstChild.childNodes[0].childNodes[0].data
Solution
So this is how I solved my problem:
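# cElementTree was removed in Python 3.9; ElementTree uses the C accelerator automatically.
from xml.etree import ElementTree as ET

tree = ET.parse("test.xml")
root = tree.getroot()
# Keep only the <element> nodes whose <info1> text is "abc".
for layer2 in root.findall('layer2'):
    # findall() returns a list, so removing children while looping is safe.
    for node in layer2.findall('element'):
        if node.find('info1').text != "abc":
            layer2.remove(node)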
Don't use the minidom API if you can help it. Use the ElementTree API instead; the xml.dom.minidom documentation explicitly states that:
Users who are not already proficient with the DOM should consider using the xml.etree.ElementTree module for their XML processing instead.
Here is a short sample that uses the ElementTree API to access your elements:
from xml.etree import ElementTree as ET
tree = ET.parse('inputfile.xml')
for info in tree.findall('.//element/info1'):
    print(info.text)
This uses an XPath expression to list all info1 elements that are contained inside an element element, regardless of their position in the overall XML document.
If all you need is the first info1 element, use .find():
print(tree.find('.//info1').text)
With the DOM API, .firstChild could easily be a Text node instead of an Element node; you always need to loop over the .childNodes sequence to find the first Element match:
def findFirstElement(node):
    for child in node.childNodes:
        if child.nodeType == node.ELEMENT_NODE:
            return child
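To see why .firstChild goes wrong, here is a minimal sketch (hypothetical info1 values) that walks layer1, layer2, element and info1 with minidom. In indented XML, an element's firstChild is the whitespace Text node before its first child element, so the helper skips every non-Element node.

```python
from xml.dom import minidom

doc = minidom.parseString("""<layer1>
  <layer2>
    <element><info1>a</info1></element>
    <element><info1>b</info1></element>
  </layer2>
</layer1>""")

def find_first_element(node):
    """Return the first child that is an Element, skipping Text nodes."""
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            return child
    return None

root = doc.documentElement          # <layer1>
whitespace = root.firstChild        # a Text node, not <layer2>
layer2 = find_first_element(root)
info1 = find_first_element(find_first_element(layer2))
print(info1.firstChild.data)
```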
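but for your case, perhaps using .getElementsByTagName() suffices. It returns a list of matching elements, and the text lives in each element's child Text node:
root.getElementsByTagName('info1')[0].firstChild.data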
Does this work? (I'm not great at Python, just a quick thought.)
name[0].firstChild.nodeValue