xml file parsing in python - python

xml file :
<global>
<rtmp>
<fcsapp>
<password>
<key>hello123</key>
<key>check123</key>
</password>
</fcsapp>
</rtmp>
</global>
python code : To obtain all the key tag values.
hello123
check123
using xml.etree.ElementTree
for streams in xmlRoot.iter('global'):
    xpath = "/rtmp/fcsapp/password"
    tag = "key"
    for child in streams.findall(xpath):
        resultlist.append(child.find(tag).text)
print resultlist
The output obtained is [hello123], but I want it to display both ([hello123, check123])
How do I obtain this?

Using lxml and cssselect I would do it like this:
>>> from lxml.html import fromstring
>>> doc = fromstring(open("foo.xml", "r").read())
>>> doc.cssselect("password key")
[<Element key at 0x7f77a6786cb0>, <Element key at 0x7f77a6786d70>]
>>> [e.text for e in doc.cssselect("password key")]
['hello123 \n ', 'check123 \n ']

With lxml and XPath you can do it in the following way:
from lxml import etree
xml = """
<global>
<rtmp>
<fcsapp>
<password>
<key>hello123</key>
<key>check123</key>
</password>
</fcsapp>
</rtmp>
</global>
"""
tree = etree.fromstring(xml)
result = tree.xpath('//password/key/text()')
print result # ['hello123', 'check123']

Try the BeautifulSoup package: https://pypi.python.org/pypi/BeautifulSoup

Using xml.etree.ElementTree:
for streams in xmlRoot.iter('global'):
    tag = "key"
    # iter() walks all descendants, so every <key> element is visited
    for child in streams.iter(tag):
        resultlist.append(child.text)
print resultlist
You have to iterate over the "key" tags in the for loop to obtain the desired result. The above code solves the problem.
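For reference, here is a minimal self-contained sketch of the same idea in Python 3, assuming the XML from the question is held in a string. The names `XML_TEXT` and `keys` are illustrative and not from the original post. In the question, `child.find(tag)` returns only the first matching child, which is why a single value came back; `findall()` with a relative path returns every `<key>`.

```python
import xml.etree.ElementTree as ET

# Assumption: the document from the question, inlined as a string.
XML_TEXT = """
<global>
  <rtmp>
    <fcsapp>
      <password>
        <key>hello123</key>
        <key>check123</key>
      </password>
    </fcsapp>
  </rtmp>
</global>
"""

root = ET.fromstring(XML_TEXT)  # root is the <global> element

# A relative path (no leading "/") is resolved against the root element.
keys = [key.text for key in root.findall("rtmp/fcsapp/password/key")]
print(keys)
```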

Related

changing attribute value in xml via lxml python

here is my xml:
<request><table attributeA="50" attributeB="1"></table>........</request>
how do I update attributeA's value, to have something like attributeA="456"
<request><table attributeA="456" attributeB="1"></table>........</request>
Use etree and XPath:
>>> from lxml import etree
>>> xml = '<request><table attributeA="50" attributeB="1"></table></request>'
>>> root = etree.fromstring(xml)
>>> for el in root.xpath("//table[@attributeA]"):
...     el.attrib['attributeA'] = "456"
...
>>> print etree.tostring(root)
<request><table attributeA="456" attributeB="1"/></request>
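If lxml is not installed, the same update can be sketched with the standard library's `xml.etree.ElementTree` in Python 3. This is an alternative under that assumption, not the answer's own code; the variable names `xml_text` and `updated` are illustrative.

```python
import xml.etree.ElementTree as ET

xml_text = '<request><table attributeA="50" attributeB="1"></table></request>'
root = ET.fromstring(xml_text)

# ".//table[@attributeA]" selects every <table> that carries an attributeA attribute.
for el in root.findall(".//table[@attributeA]"):
    el.set("attributeA", "456")  # set() overwrites the existing value

updated = ET.tostring(root, encoding="unicode")
print(updated)
```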

How to use xmltodict to get items out of an xml file

I am trying to easily access values from an xml file.
<artikelen>
<artikel nummer="121">
<code>ABC123</code>
<naam>Highlight pen</naam>
<voorraad>231</voorraad>
<prijs>0.56</prijs>
</artikel>
<artikel nummer="123">
<code>PQR678</code>
<naam>Nietmachine</naam>
<voorraad>587</voorraad>
<prijs>9.99</prijs>
</artikel>
..... etc
If I want to access the value ABC123, how do I get it?
import xmltodict
with open('8_1.html') as fd:
    doc = xmltodict.parse(fd.read())
print(doc[fd]['code'])
Using your example:
import xmltodict
with open('artikelen.xml') as fd:
    doc = xmltodict.parse(fd.read())
If you examine doc, you'll see it's an OrderedDict, ordered by tag:
>>> doc
OrderedDict([('artikelen',
              OrderedDict([('artikel',
                            [OrderedDict([('@nummer', '121'),
                                          ('code', 'ABC123'),
                                          ('naam', 'Highlight pen'),
                                          ('voorraad', '231'),
                                          ('prijs', '0.56')]),
                             OrderedDict([('@nummer', '123'),
                                          ('code', 'PQR678'),
                                          ('naam', 'Nietmachine'),
                                          ('voorraad', '587'),
                                          ('prijs', '9.99')])])]))])
The root node is called artikelen, and there is a subnode artikel, which is a list of OrderedDict objects, so if you want the code for every article, you would do:
codes = []
for artikel in doc['artikelen']['artikel']:
    codes.append(artikel['code'])
# >>> codes
# ['ABC123', 'PQR678']
If you specifically want the code only when nummer is 121, you could do this:
code = None
for artikel in doc['artikelen']['artikel']:
    # xmltodict stores XML attributes under keys prefixed with '@'
    if artikel['@nummer'] == '121':
        code = artikel['code']
        break
That said, if you're parsing XML documents and want to search for a specific value like that, I would consider using XPath expressions, which are supported by ElementTree.
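A minimal sketch of that XPath approach with the standard library in Python 3, assuming the document is available as a string (the sample is trimmed to the fields needed, and `XML_TEXT`, `match` and `code` are illustrative names):

```python
import xml.etree.ElementTree as ET

# Assumption: a shortened copy of the artikelen document from the question.
XML_TEXT = """
<artikelen>
  <artikel nummer="121">
    <code>ABC123</code>
    <naam>Highlight pen</naam>
  </artikel>
  <artikel nummer="123">
    <code>PQR678</code>
    <naam>Nietmachine</naam>
  </artikel>
</artikelen>
"""

root = ET.fromstring(XML_TEXT)

# [@nummer='121'] keeps only the <artikel> whose attribute matches.
# Attribute values are strings, so the value is quoted inside the path.
match = root.find("artikel[@nummer='121']/code")
code = match.text if match is not None else None
print(code)
```

The predicate replaces the explicit loop and `break`; `find()` returns `None` when nothing matches, so the guard avoids an `AttributeError`.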
This is using xml.etree.
You can try this:
for artikelobj in root.findall('artikel'):
    print artikelobj.find('code').text
If you want to extract a specific code based on the attribute 'nummer' of artikel, then you can try this:
for artikelobj in root.findall('artikel'):
    # attribute values are strings, so compare with '121', not 121
    if artikelobj.get('nummer') == '121':
        print artikelobj.find('code').text
This will print only the code you want.
You can use the lxml package with an XPath expression.
from lxml import etree
f = open("8_1.html", "r")
tree = etree.parse(f)
expression = "/artikelen/artikel[1]/code"
l = tree.xpath(expression)
code = next(i.text for i in l)
print code
# ABC123
The thing to notice here is the expression. /artikelen is the root element. /artikel[1] chooses the first artikel element under the root (note that XPath positions start at 1, not 0). /code is the child element under artikel[1]. You can read more in the lxml documentation and the XPath syntax reference.
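The 1-based positions can be checked with a short sketch. It uses the standard library's ElementTree in Python 3 rather than lxml, on the assumption that its path syntax follows the same convention for positional predicates; the inline document and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<artikelen>"
    "<artikel nummer='121'><code>ABC123</code></artikel>"
    "<artikel nummer='123'><code>PQR678</code></artikel>"
    "</artikelen>"
)

# Path positions are 1-based: artikel[1] is the first <artikel>.
first_code = root.find("artikel[1]/code").text
second_code = root.find("artikel[2]/code").text

# Python indexing over the same children is 0-based.
first_by_index = root[0].find("code").text

print(first_code, second_code, first_by_index)
```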
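To read .xml files with attribute-style access, use lxml.objectify (plain lxml.etree elements do not expose child elements as attributes):
from lxml import objectify
root = objectify.parse(filename).getroot()
value = root.node1.node2.variable_name.text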

Iterating over xml response on python

I have this structure on a xml response
<response xmlns="http://www.some.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.some.com/ot ot.xsd ">
<OneTree> </OneTree>
<TwoTree>
<count>10</count>
<alist>
<aelement>
<Name>FirstEntry</Name>
</aelement>
<aelement>
<Name>FirstEntry</Name>
</aelement>
</alist>
</TwoTree>
And I'm trying to print out the values of the Name elements.
So far I've managed to print the value of <count>:
tree = ElementTree.fromstring(response.content)
for child in tree:
    if (child.tag == '{http://www.some.com/ot}TwoTree'):
        print child[0].text
I'm having trouble reaching the <alist> subtree and printing the Names; looking up tags or attributes is not working for this structure. Little help?
I think it is easier to use the findall() method, which accepts basic XPath syntax:
namespaces = {'d': 'http://www.some.com'}
result = tree.findall('.//d:Name', namespaces)
print [r.text for r in result]
complete demo :
raw = '''<response xmlns="http://www.some.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.some.com/ot ot.xsd ">
<OneTree> </OneTree>
<TwoTree>
<count>10</count>
<alist>
<aelement>
<Name>FirstEntry</Name>
</aelement>
<aelement>
<Name>FirstEntry</Name>
</aelement>
</alist>
</TwoTree>
</response>'''
from xml.etree import ElementTree as ET
tree = ET.fromstring(raw)
namespaces = {'d': 'http://www.some.com'}
result = tree.findall('.//d:Name', namespaces)
print [r.text for r in result]
output :
['FirstEntry', 'FirstEntry']
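The prefix map is needed because a default namespace (`xmlns="..."`) applies to every unprefixed element below it, and ElementTree stores each tag as `{namespace-uri}localname`. The sketch below, in Python 3, shows the same lookup written with that fully qualified form and why a bare tag name finds nothing. The document is a shortened copy of the demo above; `NS`, `names` and `unqualified` are illustrative names.

```python
import xml.etree.ElementTree as ET

raw = """<response xmlns="http://www.some.com">
  <TwoTree>
    <count>10</count>
    <alist>
      <aelement><Name>FirstEntry</Name></aelement>
      <aelement><Name>FirstEntry</Name></aelement>
    </alist>
  </TwoTree>
</response>"""

tree = ET.fromstring(raw)
print(tree.tag)  # the root tag carries the namespace URI in braces

# Qualify the local name with the namespace URI to match the stored tags.
NS = "http://www.some.com"
names = [el.text for el in tree.iter("{%s}Name" % NS)]
print(names)

# A bare local name does not match any namespaced element.
unqualified = tree.findall(".//Name")
print(unqualified)
```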

python parsing xml with ElementTree doesn't give interested result

I have an xml file like this
<?xml version="1.0"?>
<sample>
<text>My name is <b>Wrufesh</b>. What is yours?</text>
</sample>
I have a python code like this
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
for child in root:
    print child.text
I only get
'My name is' as an output.
I want to get
'My name is <b>Wrufesh</b>. What is yours?' as an output.
What can I do?
You can get your desired output using ElementTree.tostringlist():
>>> import xml.etree.ElementTree as ET
>>> root = ET.parse('sample.xml').getroot()
>>> l = ET.tostringlist(root.find('text'))
>>> l
['<text', '>', 'My name is ', '<b', '>', 'Wrufesh', '</b>', '. What is yours?', '</text>', '\n']
>>> ''.join(l[2:-2])
'My name is <b>Wrufesh</b>. What is yours?'
I wonder though how practical this is going to be for generic use.
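A somewhat more general sketch, in Python 3, that avoids slicing the token list: join the element's leading text with the serialized form of each child. It relies on `ET.tostring()` including a child's tail text by default. The XML is inlined here instead of being read from `sample.xml`, and `text_el`, `inner_xml` and `plain` are illustrative names.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<sample><text>My name is <b>Wrufesh</b>. What is yours?</text></sample>"
)
text_el = root.find("text")

# Leading text of <text>, then each child serialized with its markup.
# ET.tostring() also emits the child's tail (the text that follows it).
inner_xml = (text_el.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in text_el
)
print(inner_xml)

# itertext() yields only the character data and drops the tags.
plain = "".join(text_el.itertext())
print(plain)
```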
I don't think treating the tags in XML as strings is right. You can access the text parts of the XML like this:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
text = root[0]
for i in text.itertext():
    print i
# As you can see, `<b>` and `</b>` are a pair of tags, not strings.
print list(text)  # the child elements of <text>
I would suggest pre-processing the XML file to wrap the content of the <text> element in CDATA. You should be able to read the value without a problem afterwards.
<text><![CDATA[My name is <b>Wrufesh</b>. What is yours?]]></text>

Python lxml xpath XPathEvalError: Invalid expression -- why?

I'm trying to parse an SVG document with lxml. Here's my code:
nsmap = {
    'svg': 'http://www.w3.org/2000/svg',
    'xlink': 'http://www.w3.org/1999/xlink',
}
root = etree.XML(svg)
# this works (finds the element with the given ID)
root.xpath('./svg:g/svg:g/svg:g[@id="route_1_edge"]', namespaces=nsmap)
# this yields "XPathEvalError: Invalid expression"
root.xpath('./svg:g/svg:g/svg:g[fn:startswith(@id,"route_1")]', namespaces=nsmap)
Anyone know why the first one works and the second doesn't? If I change the third svg:g to svg:text I don't get an exception, so it seems to be something to do with the g element in particular that it doesn't like, though, again, the simple g[@id="foo"] search works fine.
The "startswith" function is spelled starts-with. Also, omit the fn:.
root.xpath('./svg:g/svg:g/svg:g[starts-with(@id,"route_1")]', namespaces=nsmap)
import lxml.etree as etree
import lxml.builder as builder
nsmap = {
    'svg': 'http://www.w3.org/2000/svg',
    'xlink': 'http://www.w3.org/1999/xlink',
}
E = builder.ElementMaker(
    namespace='http://www.w3.org/2000/svg',
    nsmap=nsmap)
root = (
    E.root(
        E.g(
            E.g(
                E.g(id="route_1_edge")))))
print(etree.tostring(root, pretty_print=True))
print(root.xpath('./svg:g/svg:g/svg:g[@id="route_1_edge"]', namespaces=nsmap))
print(root.xpath('./svg:g/svg:g/svg:g[starts-with(@id,"route_1")]', namespaces=nsmap))
yields
<svg:root xmlns:svg="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<svg:g>
<svg:g>
<svg:g id="route_1_edge"/>
</svg:g>
</svg:g>
</svg:root>
[<Element {http://www.w3.org/2000/svg}g at 0xb7462c34>]
[<Element {http://www.w3.org/2000/svg}g at 0xb7462be4>]
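starts-with() is an XPath 1.0 function that lxml evaluates. The standard library's ElementTree supports only a subset of XPath with no function calls, so if lxml is not available, an equivalent filter can be sketched in plain Python 3. This is an alternative under that assumption, not part of the answer; the element structure mirrors the demo above and `SVG_NS` and `matches` are illustrative names.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
nsmap = {"svg": SVG_NS}

root = ET.fromstring(
    '<svg:root xmlns:svg="%s">'
    '<svg:g><svg:g><svg:g id="route_1_edge"/></svg:g></svg:g>'
    '</svg:root>' % SVG_NS
)

# Select the candidate elements by path, then test the id prefix in Python.
matches = [
    g for g in root.findall("./svg:g/svg:g/svg:g", nsmap)
    if g.get("id", "").startswith("route_1")
]
print([g.get("id") for g in matches])
```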
