Parsing XML Attribute Names - python

I'm using Python, currently 2.7, but I can easily change to the latest and greatest.
I need to parse this XML and return the INT value contained within the item. This isn't my XML. It comes from a piece of enterprise-level software.
<counters>
<item name="stats/counters/session/responsetime" type="int">1047</item>
<item name="stats/counters/session/responsecount" type="int">7423</item>
<item name="stats/counters/init/inittime" type="int">36339</item>
<item name="stats/counters/init/fetchtime" type="int">8097</item>
<item name="stats/connectionsetups" type="int">579</item>
<item name="stats/activesessions" type="int">4294967289</item>
<item name="stats/activeconnections" type="int">0</item>
</counters>
Code:
import requests
import xml.etree.ElementTree as ET

def _getstats():
    resp = requests.get(urlStats)
    # Writing XML to disk. This makes parsing it MUCH easier.
    # The with block closes the file, so no explicit close() is needed.
    with open('stats_10.xml', 'wb') as f:
        f.write(resp.content)
    tree = ET.parse('stats_10.xml')
    root = tree.getroot()
    active = root.find('stats/activesessions')
    print(active)
The return is always None. I'm using ElementTree. I have read through the documentation (https://docs.python.org/3.0/library/xml.etree.elementtree.html) and many Stack Overflow pages.
I think the problem is that the parser doesn't understand the slash.
I attempted to pull by name using "active = int(root['stats/activesessions'])" in place of root.find, which returns this error:
TypeError: list indices must be integers, not str
I also tried xmltodict, but that was even worse than using ElementTree. The error would always be 'list indices must be integers'.
Lastly, this is a dynamic XML document. Indexing by ROW is not an option because at idle the software returns 10 rows, for example, and under load it returns 15, with the additional rows mixed in among the others. I have to pull by child name.
Thank you in advance for any assistance!
ADDITION:
I can run an iteration through the XML and pull the value. However, as stated above, the XML will change and the number of rows will increase, thus throwing my indices off.
active = root[5].text
print active

I believe the find method is looking for a tag name, not an attribute value. You need to find the item tag, check if it has a name attribute, and then check if the attribute equals "stats/activesessions". If this condition is met, you can read in the value of the item tag.

This is obviously me not understanding XML and how it's structured. Added this in my code and I get the return value I'm looking for.
for item in root.findall("./item[@name='system/starttime']"):
    starttime = int(item.text)
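As a minimal sketch of the same idea applied to the counters document from the question: the XML is inlined here as a shortened string (an assumption made to keep the example self-contained; the real document comes from the HTTP response), and each item is looked up by its name attribute rather than by position.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question, inlined for illustration.
xml_text = """<counters>
<item name="stats/counters/session/responsetime" type="int">1047</item>
<item name="stats/connectionsetups" type="int">579</item>
<item name="stats/activesessions" type="int">4294967289</item>
<item name="stats/activeconnections" type="int">0</item>
</counters>"""

root = ET.fromstring(xml_text)

# Look up one counter by its name attribute; row position does not matter.
item = root.find("./item[@name='stats/activesessions']")
active = int(item.text) if item is not None else None

# Or collect every counter into a dict keyed by the name attribute.
counters = {el.get("name"): int(el.text) for el in root.findall("item")}

print(active)
print(counters["stats/connectionsetups"])
```

Because the lookup is keyed on the attribute value, rows appearing or disappearing under load should not affect it.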

Related

How to append data to a parsed XML object - Python

I am trying to take an xml document parsed with lxml objectify in python and add subelements to it.
The problem is that I can't work out how to do this. The only real option I've found is a complete reconstruction of the data with objectify.Element and objectify.SubElement, but that... doesn't make any sense.
With JSON for instance, I can just import the data as an object and navigate to any section and read, add and edit data super easily.
How do I parse an xml document with objectify so that I can add subelement data to it?
data.xml
<?xml version='1.0' encoding='UTF-8'?>
<data>
<items>
<item> text_1 </item>
<item> text_2 </item>
</items>
</data>
I'm sure there is an answer to this online, but either my search terminology is bad or I'm dumb, because I can't find a solution. I'm sorry if this is a duplicate question.
I guess it has been quite difficult to explain the question, but the problem essentially comes down to an ambiguity in how .Element and .SubElement can be applied.
This reference contains actionable and replicable ways in which one can append or add data to a parsed XML file.
Solving the key problem of:
How do I reference content in a nested tag without reconstructing the entire tree?
The author uses the find feature, which is called with root.findall("./tag")
This allows one to find the nested tag that they wanted without having to reconstruct the tree.
Here is one of the examples that they have used:
cost = ["$2000", "$3000", "$4000"]
traveltime = ["3hrs", "8hrs", "15hrs"]
i = 0
for elm in root.findall("./country"):
    ET.SubElement(elm, "Holiday", attrib={"fight_cost": cost[i],
                                          "trip_duration": traveltime[i]})
    i += 1
This example also answers the question of How do you dynamically add data to XML?
They have accomplished this by defining the lists outside of the loop and iterating through them with i
In essence, this example helps explain how to reference nested content without remaking the entire tree, as well as how to dynamically add data to xml trees.
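Applied to the data.xml from the question, the same approach might look like the sketch below. It uses the standard library's xml.etree.ElementTree rather than lxml objectify (a deliberate swap, since that is what the referenced example uses), and the new text " text_3 " is an illustrative value, not something from the original post.

```python
import xml.etree.ElementTree as ET

# data.xml from the question, inlined (XML declaration omitted for brevity).
xml_text = """<data>
  <items>
    <item> text_1 </item>
    <item> text_2 </item>
  </items>
</data>"""

root = ET.fromstring(xml_text)

# Navigate to the nested <items> element without rebuilding the tree.
items = root.find("./items")

# Append a new <item> child in place; " text_3 " is a made-up value.
new_item = ET.SubElement(items, "item")
new_item.text = " text_3 "

texts = [el.text.strip() for el in items.findall("item")]
print(texts)
```

The existing elements are left untouched; only the new child is created.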

Parsing XML in python: selecting an attribute given that a child node has a specific attribute

Given the xml
xmlstr = '''
<myxml>
<Description id="10">
<child info="myurl"/>
</Description>
</myxml>'''
I'd like to get the id of Description only where child has an attribute of info.
import xml.etree.ElementTree as ET
root = ET.fromstring(xmlstr)
a = root.find(".//Description/[child/@info]")
print(a.attrib)
and changing the find to .//Description/[child[@info]]
both return an error of:
SyntaxError: invalid predicate
I know that etree only supports a subset of xpath, but this doesn't seem particularly weird - should this work? If so, what have I done wrong?!
Changing the find to .//Description/[child] does work, and returns
{'id': '10'}
as expected
You've definitely hit ElementTree's limited XPath support. If we look at the source directly (the 3.7 source code), we can see that when an ElementPath expression is parsed, only these forms are recognized inside a predicate:
[@attribute]
[@attribute='value']
[tag]
[.='value'] or [tag='value']
[index] or [last()] or [last()-index]
Which means that both of your rather simple expressions are not supported.
If you really want or need to stick with the built-in ElementTree library, one way to solve this is to find all Description tags via .findall() and keep only those that have a child element with an info attribute.
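A minimal sketch of that filtering approach follows. The second Description element (id="11", whose child has no info attribute) is an addition made here purely to show that non-matching elements are skipped; it is not part of the original XML.

```python
import xml.etree.ElementTree as ET

# XML from the question plus one extra, non-matching Description
# (id="11") that is added here only for illustration.
xmlstr = '''
<myxml>
  <Description id="10">
    <child info="myurl"/>
  </Description>
  <Description id="11">
    <child/>
  </Description>
</myxml>'''

root = ET.fromstring(xmlstr)

# "child[@info]" is a supported form: a tag followed by an
# [@attribute] predicate, evaluated relative to each Description.
matching_ids = [
    desc.get("id")
    for desc in root.findall(".//Description")
    if desc.find("child[@info]") is not None
]
print(matching_ids)
```

The nested condition is split into two steps, each of which stays within the subset of XPath that ElementTree understands.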
You can also read the attributes through keys() and get(), which is a slightly more structured way to gather the data:
import xml.etree.ElementTree as ET
root = ET.fromstring(xmlstr)
wht = root.find(".//Description")
wht.keys() #--> ['id']
wht.get('id') # --> '10'

python to parse xml to get value

I have an XML response from one of my systems, and I am trying to get a value out of it using Python code. I need an expert's view to highlight my mistake.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><ns3:loginResponse xmlns:ns2="http://ws.core.product.xxxxx.com/groupService/" xmlns:ns3="http://ws.core.product.xxxxx.com/loginService/" xmlns:ns4="http://ws.core.product.xxxxx.com/userService/"><ns3:return>YWeVDwuZwHdxxxxxxxxxxx_GqLtkNTE.</ns3:return></ns3:loginResponse>
I am using the code below and had no luck getting the value YWeVDwuZwHdxxxxxxxxxxx_GqLtkNTE. I haven't used XML parsing with namespaces before. response.text holds the XML response above.
responsetree = ET.ElementTree(ET.fromstring(response.text))
responseroot = responsetree.getroot()
for a in responseroot.iter('return'):
    print(a.attrib)
YWeVDwuZwHdxxxxxxxxxxx_GqLtkNTE is not in the attrib. It is the element text.
The attrib in this case is an empty dict.
See https://www.cmi.ac.in/~madhavan/courses/prog2-2012/docs/diveintopython3/xml.html about parsing XML docs that use namespaces.
The reference from the other answer helped me understand the concepts.
Once I understood the XML structure, it was plain and simple. I'm adding the result here in case it helps someone in the future as a quick reference.
responsetree = ET.ElementTree(ET.fromstring(response.text))
responseroot = responsetree.getroot()
responseroot[0].text
I'm keeping it simple for the sake of understanding. You might need to check len(responseroot) and/or loop over the children with a condition to get the right value. You can also use findall or find to get the item you are interested in.
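For a lookup that does not depend on the element's position, a namespace-aware find is one option. The sketch below inlines a trimmed copy of the response (the XML declaration and the unused ns2/ns4 declarations are left out for brevity), and the names xml_text, namespaces and token are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question, inlined for illustration.
xml_text = (
    '<ns3:loginResponse '
    'xmlns:ns3="http://ws.core.product.xxxxx.com/loginService/">'
    '<ns3:return>YWeVDwuZwHdxxxxxxxxxxx_GqLtkNTE.</ns3:return>'
    '</ns3:loginResponse>'
)

root = ET.fromstring(xml_text)

# Map a prefix to the namespace URI. The prefix used in the query only
# has to match this dict, not the prefix used in the document.
namespaces = {"ns3": "http://ws.core.product.xxxxx.com/loginService/"}

token_element = root.find("ns3:return", namespaces)
token = token_element.text if token_element is not None else None
print(token)
```

Searching for a bare 'return' finds nothing because the element's full tag includes the namespace URI, which is why the mapping is needed.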

elementtree - get title?

<item rdf:about="http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-0005">
<title>CVE-2014-0005</title>
<link>http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-0005</link>
<description>PicketBox and JBossSX, as used in Red Hat JBoss Enterprise Application Platform (JBEAP) 6.2.2 and JBoss BRMS before 6.0.3 roll up patch 2, allows remote authenticated users to read and modify the application sever configuration and state by deploying a crafted application.</description>
<dc:date>2015-02-20T16:59:00Z</dc:date>
</item>
<item rdf:about="http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-1831">
<title>CVE-2014-1831 (passenger)</title>
<link>http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-1831</link>
<description>Phusion Passenger before 4.0.37 allows local users to write to certain files and directories via a symlink attack on (1) control_process.pid or a (2) generation-* file.</description>
<dc:date>2015-02-19T15:59:02Z</dc:date>
</item>
Hi,
Given the above, I am trying to extract the text value from the title item out of an xml file. However the below isn't working (am getting no results). Please advise.
def processxml():
    tree = ET.parse('nvd-rss.xml')
    for item in tree.findall( 'item/title' ):
        print (title.text)
Thanks in advance
As BorrajaX mentioned, you need to change the code in the for loop to print(item.text), because the loop variable is item and it iterates over all the elements that the findall() method returns. You can get the text inside an element by reading the text attribute of the Element instance.
# create an ElementTree instance called tree
for element in tree.findall( 'item/title' ):
    print(element.text)
Some other attributes of an ElementTree element instance:
.tag Name of the element
.text Text inside the element
.tail Text following the element
.attrib Dictionary containing all element's attribute names and their corresponding values
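A small sketch of these four attributes in action; the XML fragment here is made up for illustration (a simplified item with an id attribute and some trailing text), not taken from the NVD feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, simplified from the feed for illustration.
root = ET.fromstring(
    '<items><item id="1"><title>CVE-2014-0005</title> trailing</item></items>'
)

item = root.find("item")
title = root.find("item/title")

print(title.tag)    # name of the element
print(title.text)   # text inside the element
print(title.tail)   # text after the closing tag, up to the next tag
print(item.attrib)  # dict of the element's attributes
```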

How to grab the attribute value using beautifulSoup?

Code:
soup=BeautifulSoup(f.read())
data=soup.findAll('node',{'id':'memory'})
print data
Output
[<node id="memory" claimed="true" class="memory" handle="DMI:000E">
<description>
System Memory
</description>
<physid>
e
</physid>
<slot>
System board or motherboard
</slot>
<size units="bytes">
3221225472
</size>
<capacity units="bytes">
3221225472
</capacity>
</node>]
Now how do I grab the values, such as the data between the tags (System Memory and so on)? Any help is appreciated.
To get <...>this</...> you should use the contents field. Note that findAll returns a list of matches, so take the first one:
print data[0].description.contents
To get attributes, access them as if the tag were a dictionary:
print data[0].size['units']
And to iterate over all the tags inside the node, use findAll, which you already know:
for node in data[0].findAll(True):
    # do stuff on node
BeautifulSoup can create a tree. You can then iterate over that tree and get the attributes.
Check out the following link:
http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html#TheattributesofTags
