Issue with XML parsing in python


I'm having difficulty parsing an XML tree using xml.etree.ElementTree in Python. Basically, I'm making a request to an API that gives an XML response, and trying to extract the values of several elements in the tree.
This is what I've done so far with no success:
root = etree.fromstring(resp_arr[0])
walkscore = root.find('./walkscore')
Here is my XML tree:
<result>
<status>1</status>
<walkscore>95</walkscore>
<description>walker's paradise</description>
<updated>2009-12-25 03:40:16.006257</updated>
<logo_url>https://cdn.walk.sc/images/api-logo.png</logo_url>
<more_info_icon>https://cdn.walk.sc/images/api-more-info.gif</more_info_icon>
<ws_link>http://www.walkscore.com/score/1119-8th-Avenue-Seattle-WA-98101/lat=47.6085/lng=-122.3295/?utm_source=myrealtysite.com&utm_medium=ws_api&utm_campaign=ws_api</ws_link>
<help_link>https://www.redfin.com/how-walk-score-works</help_link>
<snapped_lat>47.6085</snapped_lat>
<snapped_lon>-122.3295</snapped_lon>
</result>
Essentially, I'm trying to pull the walkscores from the XML document but my code isn't returning a value. Does anyone with experience using ElementTree have any advice to help me extract the values I'm after?
Sam

Your XML appears to be malformed: the ws_link URL contains bare & characters. But if I replace instances of & with &amp;, then it's parseable:
>>> from xml.etree import ElementTree as ET
>>> tree = ET.fromstring(xml)
>>> tree.find('./walkscore').text
'95'
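If the response cannot be fixed at the source, one possible workaround is to escape the bare ampersands before parsing. The sketch below is only an illustration: the sample document is a shortened version of the one in the question, and the BARE_AMP pattern is a hypothetical helper that assumes any & not already starting an entity reference should become &amp;.

```python
import re
import xml.etree.ElementTree as ET

# Shortened sample modelled on the question; the URL has unescaped '&'.
raw = """<result>
<status>1</status>
<walkscore>95</walkscore>
<ws_link>http://www.walkscore.com/score/?utm_source=myrealtysite.com&utm_medium=ws_api&utm_campaign=ws_api</ws_link>
</result>"""

# Match '&' only when it does not already begin an entity such as &amp; or &#38;
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")
fixed = BARE_AMP.sub("&amp;", raw)

root = ET.fromstring(fixed)
# find() returns an Element; the value itself is in its .text attribute
walkscore = root.find("./walkscore").text
print(walkscore)
```

Note that find() returns an Element object, so the value is read from .text rather than from the return value of find() itself.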

Related

How do I search for a Tag in xml file using ElementTree where i have prefixes (python)

I just started learning Python and have to write a program that parses xml files.
I have multiple entries as seen below and I need, as a starting point, to return all the different d:Name entries in a list.
Unfortunately, I can't manage to use findall with prefixes.
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
lst = tree.findall('.//{d}Name')
I read that if d is a prefix, I need to use the URI instead of the prefix. But I don't understand which is the URI in my case, or how to make a successful search when I have the following file.
I have an XML that looks like this (simplified):
<feed xml:base="http://projectserver/ps/_api/">
<entry>
<id>
http://projectserver/ps/_api/ProjectServer/EnterpriseResources('some id...')
</id>
<content type="application/xml">
<m:properties>
<d:Name>
WHAT I NEED
</d:Name>
</m:properties>
</content>
</entry>
<entry>
...
This bypassed my problem so thank you!
If you are using Python 3.8 or later, this post may help: link – Jim Rhodes
So I ran the following, which returned the list of tags, where I found the {URI}Name that I then used to do the search properly.
for elem in tree.iter():
    print(elem.tag)
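Once the URI is known, the search can be written in a few equivalent ways. The following is a minimal sketch, not the poster's actual file: the namespace URIs http://example.com/d and http://example.com/m are placeholders, since the simplified XML in the question does not show the real xmlns declarations.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs; the real ones come from the xmlns:d / xmlns:m
# declarations in the actual feed, which the question does not show.
xml_text = """<feed xmlns:d="http://example.com/d" xmlns:m="http://example.com/m">
  <entry><content type="application/xml"><m:properties>
    <d:Name>First</d:Name>
  </m:properties></content></entry>
  <entry><content type="application/xml"><m:properties>
    <d:Name>Second</d:Name>
  </m:properties></content></entry>
</feed>"""

root = ET.fromstring(xml_text)

# Option 1: map the prefix to its URI and keep the prefix in the path
ns = {"d": "http://example.com/d"}
names = [el.text.strip() for el in root.findall(".//d:Name", ns)]

# Option 2: spell out the URI in braces, as printed by elem.tag
same = [el.text.strip() for el in root.findall(".//{http://example.com/d}Name")]

# Option 3 (Python 3.8+): '{*}' matches the tag in any namespace
any_ns = [el.text.strip() for el in root.findall(".//{*}Name")]

print(names)
```

The braces in '{d}Name' must contain the full URI, not the prefix, which is why the original findall('.//{d}Name') found nothing.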

can we search multiple pattern using etree findall() in xml?

In my case, I have to find a few elements in the XML file and update their values using the text attribute. For that, I have to search for XML elements A, B and C. My project uses xml.etree and Python. Currently I am using:
self.get_root.findall('H/A/T')
self.get_root.findall('H/B/T')
self.get_root.findall('H/C/T')
The sample XML file:
<H><A><T>text-i-have-to-update</H></A></T>
<H><B><T>text-i-have-to-update</H></B></T>
<H><C><T>text-i-have-to-update</H></C></T>
As we can notice, only the middle element in the path is different. Is there a way to optimize the code using something like self.get_root.findall(H|(A,B,C)|T)? Any guidance in the right direction will do! Thanks!
I went through the similar question: XPath to select multiple tags but it didn't work for my case
Update: maybe regular expression inside the findall()?

The XML in your question is malformed; assuming it's properly formatted (like below), try this:
import xml.etree.ElementTree as ET
data = """<root>
<H><A><T>text-i-have-to-update</T></A></H>
<H><B><T>text-i-have-to-update</T></B></H>
<H><C><T>text-i-have-to-update</T></C></H>
</root>"""
doc = ET.fromstring(data)
for item in doc.findall('.//H//T'):
    item.text = "modified text"
print(ET.tostring(doc).decode())
Output:
<root>
<H><A><T>modified text</T></A></H>
<H><B><T>modified text</T></B></H>
<H><C><T>modified text</T></C></H>
</root>
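The './/H//T' path matches every T below any H, whatever the middle element is. If only A, B and C should be touched, one option is to filter on the tag of the middle element, since ElementTree's limited XPath support has no union or regular-expression operator. A rough sketch, with an extra D element added purely for illustration:

```python
import xml.etree.ElementTree as ET

# The <D> row is an illustrative addition to show that it is left alone.
data = """<root>
<H><A><T>old</T></A></H>
<H><B><T>old</T></B></H>
<H><C><T>old</T></C></H>
<H><D><T>keep</T></D></H>
</root>"""

doc = ET.fromstring(data)
wanted = {"A", "B", "C"}

# Walk every direct child of each H and keep only the wanted tags
for middle in doc.findall("./H/*"):
    if middle.tag in wanted:
        for t in middle.findall("./T"):
            t.text = "modified text"

texts = [t.text for t in doc.iter("T")]
print(texts)
```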

Parsing and extracting field values from XML files

I have a couple of GB of log files in XML format. What I am interested in is finding the rows with a specific command and extracting the user that ordered the command.
Which is to say I want to check a particular field in all rows for a specific value and then extract an unknown value from another field on the same line. How do I go about doing that? (I've tried turning to documentation and other sources without understanding how it works.)
I currently use Python 2.7.15, but if 3.* is better or easier in any way I'll use that.
Here's an example of a row in a logfile that I'm interested in:
<?xml version="1.0" encoding="UTF-8"?>
<IHEYr4>
<UserAuthenticated>
<LocalUsername>User1</LocalUsername>
<Action>Login</Action>
</UserAuthenticated>
<Host>192.168.1.15</Host>
<TimeStamp>2018-01-18T02:31:00</TimeStamp>
</IHEYr4>

Using ElementTree
Demo:
x = """<?xml version="1.0" encoding="UTF-8"?>
<IHEYr4>
<UserAuthenticated>
<LocalUsername>User1</LocalUsername>
<Action>Login</Action>
</UserAuthenticated>
<Host>192.168.1.15</Host>
<TimeStamp>2018-01-18T02:31:00</TimeStamp>
</IHEYr4>
"""
import xml.etree.ElementTree as ET
xmlVal = ET.fromstring(x)
if xmlVal.find("UserAuthenticated/Action").text == 'Login':
    print(xmlVal.find("Host").text, xmlVal.find("TimeStamp").text)
Output:
('192.168.1.15', '2018-01-18T02:31:00')
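The demo prints the host and timestamp; to get the user who issued the command, the same approach can read LocalUsername instead. Below is a sketch under the assumption that each log row is available as its own XML string; users_for_action is a hypothetical helper name, and the second record is made up for the example.

```python
import xml.etree.ElementTree as ET

def users_for_action(records, action):
    """Return LocalUsername for every record whose Action equals `action`."""
    users = []
    for record in records:
        root = ET.fromstring(record)
        for auth in root.iter("UserAuthenticated"):
            # findtext() returns None when the child element is missing
            if auth.findtext("Action") == action:
                users.append(auth.findtext("LocalUsername"))
    return users

# Two illustrative records in the shape shown in the question
records = [
    "<IHEYr4><UserAuthenticated><LocalUsername>User1</LocalUsername>"
    "<Action>Login</Action></UserAuthenticated><Host>192.168.1.15</Host></IHEYr4>",
    "<IHEYr4><UserAuthenticated><LocalUsername>User2</LocalUsername>"
    "<Action>Logout</Action></UserAuthenticated><Host>192.168.1.16</Host></IHEYr4>",
]

logins = users_for_action(records, "Login")
print(logins)
```

For files of several GB, feeding the records one at a time like this avoids holding the whole log in memory; how to split the file into records depends on how the log is actually laid out.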

Grab Content from XML using Python? almost there

I'm using ElementTree and I can get tags and attributes, but not the actual content between elements.
from this XML:
<tag_name attrib="1">I WANT THIS INFO HERE</tag_name>
here's my python code:
import urllib2
import xml.etree.ElementTree as ET
XML = urllib2.urlopen("http://URL/file.xml")
Tree = ET.parse(XML)
for node in Tree.getiterator():
    print node.tag, node.attrib
This prints most of the XML file, and I understand what 'tag' and 'attrib' are, but how do I get the 'Content'? I tried looking through ElementTree's docs, but I think this might be too basic of a question.

The .text attribute should give you the required text value.
for node in Tree.getiterator():
    print node.tag, node.attrib, node.text
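The snippets above are Python 2. A rough Python 3 equivalent is sketched below; it uses iter() because getiterator() was removed in Python 3.9, and it parses an inline string (with a made-up doc wrapper element) instead of fetching the URL.

```python
import xml.etree.ElementTree as ET

# Inline sample; the <doc> wrapper is only there to give the tree a root.
root = ET.fromstring('<doc><tag_name attrib="1">I WANT THIS INFO HERE</tag_name></doc>')

# .text holds the character data directly inside each element
rows = [(node.tag, node.attrib, node.text) for node in root.iter()]
for tag, attrib, text in rows:
    print(tag, attrib, text)
```

An element with no text before its first child, such as doc here, has .text set to None.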

Did you try XPath?
There are a lot of libraries to extract content from tags with a very easy yet powerful syntax.
Here is an example:
import XmlXPathSelector
xs = XmlXPathSelector(text="<tags>your xml</tags>")
print xs.select("//tag_name[@attrib='1']/text()").extract()
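ElementTree itself supports a small XPath subset, including attribute predicates, so a similar lookup may be possible without an extra library. A minimal sketch, assuming the element sits somewhere below a root element (the tags wrapper and the second tag_name are illustrative):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<tags>'
    '<tag_name attrib="1">I WANT THIS INFO HERE</tag_name>'
    '<tag_name attrib="2">not this one</tag_name>'
    '</tags>'
)

# [@attrib='1'] keeps only elements whose attribute has that value.
# ElementTree has no text() step, so the content is read from .text.
match = root.find(".//tag_name[@attrib='1']")
content = match.text
print(content)
```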

Python minidom: How to access an element

I'm working on parsing an XML-Sheet in Python. The XML has a structure like this:
<layer1>
<layer2>
<element>
<info1></info1>
</element>
<element>
<info1></info1>
</element>
<element>
<info1></info1>
</element>
</layer2>
</layer1>
Without layer2, I have no problem accessing the data in info1; there I can address info1 with: root.firstChild.childNodes[0].childNodes[0].data. But with layer2, I'm really in trouble.
So my thought was that I could do it similarly, like this: root.firstChild.firstChild.childNodes[0].childNodes[0].data
########## Solution
So this is how I solved my problem:
from xml.etree import cElementTree as ET
tree = ET.parse("test.xml")
root = tree.getroot()
# findall() returns a list, so removing nodes inside the loop is safe
for elem in root.findall('./layer2'):
    for node in elem.findall('element'):
        x = node.find('info1').text
        if x != "abc":
            elem.remove(node)

Don't use the minidom API if you can help it. Use the ElementTree API instead; the xml.dom.minidom documentation explicitly states that:
Users who are not already proficient with the DOM should consider using the xml.etree.ElementTree module for their XML processing instead.
Here is a short sample that uses the ElementTree API to access your elements:
from xml.etree import ElementTree as ET
tree = ET.parse('inputfile.xml')
for info in tree.findall('.//element/info1'):
    print info.text
This uses an XPath expression to list all info1 elements that are contained inside an element element, regardless of their position in the overall XML document.
If all you need is the first info1 element, use .find():
print tree.find('.//info1').text
With the DOM API, .firstChild could easily be a Text node instead of an Element node; you always need to loop over the .childNodes sequence to find the first Element match:
def findFirstElement(node):
    for child in node.childNodes:
        if child.nodeType == node.ELEMENT_NODE:
            return child
but for your case, perhaps using .getElementsByTagName() suffices:
# getElementsByTagName() returns a list of elements; assumes info1 is not empty
root.getElementsByTagName('info1')[0].firstChild.data
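To make the Text-node pitfall concrete, here is a small minidom sketch. The sample document follows the structure in the question but is indented and filled with illustrative values (abc, def), which is an assumption about the real file.

```python
from xml.dom import minidom

# Indented sample: the whitespace between tags becomes Text nodes in the DOM.
doc = minidom.parseString("""<layer1>
  <layer2>
    <element><info1>abc</info1></element>
    <element><info1>def</info1></element>
  </layer2>
</layer1>""")
root = doc.documentElement

# The first child of layer1 is the newline and indentation, not layer2
first_is_text = root.firstChild.nodeType == root.TEXT_NODE

# getElementsByTagName() searches all descendants and skips the Text nodes
values = [node.firstChild.data for node in root.getElementsByTagName("info1")]
print(first_is_text, values)
```

This is why chains of .firstChild calls break as soon as the file is pretty-printed, while a tag-based lookup keeps working.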

does this work? (im not amazing at python just a quick thought)
name[0].firstChild.nodeValue
