xml file :
<global>
<rtmp>
<fcsapp>
<password>
<key>hello123</key>
<key>check123</key>
</password>
</fcsapp>
</rtmp>
</global>
Python code: I want to obtain the values of all the key tags:
hello123
check123
using xml.etree.ElementTree
for streams in xmlRoot.iter('global'):
    xpath = "/rtmp/fcsapp/password"
    tag = "key"
    for child in streams.findall(xpath):
        resultlist.append(child.find(tag).text)
print resultlist
The output obtained is [hello123], but I want it to display both ([hello123, check123])
How do I obtain this?
Using lxml and cssselect I would do it like this:
>>> from lxml.html import fromstring
>>> doc = fromstring(open("foo.xml", "r").read())
>>> doc.cssselect("password key")
[<Element key at 0x7f77a6786cb0>, <Element key at 0x7f77a6786d70>]
>>> [e.text for e in doc.cssselect("password key")]
['hello123 \n ', 'check123 \n ']
With lxml and XPath you can do it as follows:
from lxml import etree
xml = """
<global>
<rtmp>
<fcsapp>
<password>
<key>hello123</key>
<key>check123</key>
</password>
</fcsapp>
</rtmp>
</global>
"""
tree = etree.fromstring(xml)
result = tree.xpath('//password/key/text()')
print result # ['hello123', 'check123']
Try the BeautifulSoup package: https://pypi.python.org/pypi/BeautifulSoup
Using xml.etree.ElementTree:
for streams in xmlRoot.iter('global'):
    tag = "key"
    # iterate over every <key> descendant instead of taking only the first one
    for child in streams.iter(tag):
        resultlist.append(child.text)
print resultlist
You have to iterate over the "key" tags in the for loop to obtain the desired result. The code above solves the problem.
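The reason the original loop stopped at hello123 is that find() returns only the first matching child, while findall() and iter() return every match. A minimal sketch of the same idea with a relative path, written for Python 3 and assuming the XML from the question is held in a string (the names xml_text and keys are illustrative, not from the original post):

```python
import xml.etree.ElementTree as ET

# Assumption: the document from the question, inlined as a string.
xml_text = """
<global>
  <rtmp>
    <fcsapp>
      <password>
        <key>hello123</key>
        <key>check123</key>
      </password>
    </fcsapp>
  </rtmp>
</global>
"""

root = ET.fromstring(xml_text)  # root is the <global> element

# The path is relative to <global>; findall() returns every matching <key>.
keys = [key.text for key in root.findall("./rtmp/fcsapp/password/key")]
print(keys)
```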
Related
Here is my XML:
<request><table attributeA="50" attributeB="1"></table>........</request>
How do I update attributeA's value to get something like attributeA="456"?
<request><table attributeA="456" attributeB="1"></table>........</request>
Use etree and XPath:
>>> from lxml import etree
>>> xml = '<request><table attributeA="50" attributeB="1"></table></request>'
>>> root = etree.fromstring(xml)
>>> for el in root.xpath("//table[@attributeA]"):
...     el.attrib['attributeA'] = "456"
...
>>> print etree.tostring(root)
<request><table attributeA="456" attributeB="1"/></request>
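If lxml is not available, roughly the same update can be sketched with the standard library's xml.etree.ElementTree, which is a swapped-in alternative rather than the method shown above. This is written for Python 3; the names xml_text and updated are illustrative:

```python
import xml.etree.ElementTree as ET

xml_text = '<request><table attributeA="50" attributeB="1"></table></request>'
root = ET.fromstring(xml_text)

# Visit every <table> element and overwrite attributeA where it is present.
for table in root.iter("table"):
    if "attributeA" in table.attrib:
        table.set("attributeA", "456")

# encoding="unicode" makes tostring() return str instead of bytes (Python 3).
updated = ET.tostring(root, encoding="unicode")
print(updated)
```

Element.set() and assignment through el.attrib are interchangeable here; set() is simply the documented accessor.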
I am trying to easily access values from an xml file.
<artikelen>
<artikel nummer="121">
<code>ABC123</code>
<naam>Highlight pen</naam>
<voorraad>231</voorraad>
<prijs>0.56</prijs>
</artikel>
<artikel nummer="123">
<code>PQR678</code>
<naam>Nietmachine</naam>
<voorraad>587</voorraad>
<prijs>9.99</prijs>
</artikel>
..... etc
If I want to access the value ABC123, how do I get it?
import xmltodict
with open('8_1.html') as fd:
    doc = xmltodict.parse(fd.read())
print(doc[fd]['code'])
Using your example:
import xmltodict
with open('artikelen.xml') as fd:
    doc = xmltodict.parse(fd.read())
If you examine doc, you'll see it's an OrderedDict, ordered by tag:
>>> doc
OrderedDict([('artikelen',
              OrderedDict([('artikel',
                            [OrderedDict([('@nummer', '121'),
                                          ('code', 'ABC123'),
                                          ('naam', 'Highlight pen'),
                                          ('voorraad', '231'),
                                          ('prijs', '0.56')]),
                             OrderedDict([('@nummer', '123'),
                                          ('code', 'PQR678'),
                                          ('naam', 'Nietmachine'),
                                          ('voorraad', '587'),
                                          ('prijs', '9.99')])])]))])
The root node is called artikelen, and it has a subnode artikel, which is a list of OrderedDict objects. So if you want the code for every article, you would do:
codes = []
for artikel in doc['artikelen']['artikel']:
    codes.append(artikel['code'])
# >>> codes
# ['ABC123', 'PQR678']
If you specifically want the code only when nummer is 121, you could do this:
code = None
for artikel in doc['artikelen']['artikel']:
    # xmltodict prefixes attribute names with '@'
    if artikel['@nummer'] == '121':
        code = artikel['code']
        break
That said, if you're parsing XML documents and want to search for a specific value like that, I would consider using XPath expressions, which are supported by ElementTree.
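As a rough illustration of that suggestion, the standard library's ElementTree supports a limited XPath subset that includes attribute predicates such as [@nummer='121']. The sketch below is written for Python 3 and inlines a closed copy of the sample document; the names xml_text, code and all_codes are illustrative:

```python
import xml.etree.ElementTree as ET

# Assumption: the sample from the question, closed with </artikelen>.
xml_text = """
<artikelen>
  <artikel nummer="121">
    <code>ABC123</code>
    <naam>Highlight pen</naam>
    <voorraad>231</voorraad>
    <prijs>0.56</prijs>
  </artikel>
  <artikel nummer="123">
    <code>PQR678</code>
    <naam>Nietmachine</naam>
    <voorraad>587</voorraad>
    <prijs>9.99</prijs>
  </artikel>
</artikelen>
"""

root = ET.fromstring(xml_text)  # root is <artikelen>

# Text of <code> inside the <artikel> whose nummer attribute is '121'.
code = root.findtext("./artikel[@nummer='121']/code")

# Every <code> value, in document order.
all_codes = [el.text for el in root.findall("./artikel/code")]

print(code)
print(all_codes)
```

Attribute values are always strings, so the predicate compares against '121', not the integer 121.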
This uses xml.etree. You can try this:
for artikelobj in root.findall('artikel'):
    print artikelobj.find('code').text
If you want to extract a specific code based on the 'nummer' attribute of artikel, you can try this:
for artikelobj in root.findall('artikel'):
    # attribute values are strings, so compare against '121'
    if artikelobj.get('nummer') == '121':
        print artikelobj.find('code').text
This prints only the code you want.
You can use the lxml package with an XPath expression.
from lxml import etree
f = open("8_1.html", "r")
tree = etree.parse(f)
expression = "/artikelen/artikel[1]/code"
l = tree.xpath(expression)
code = next(i.text for i in l)
print code
# ABC123
The thing to notice here is the expression. /artikelen is the root element. /artikel[1] selects the first artikel element under the root (note that XPath positions start at 1, not 0). /code is the child element under artikel[1]. You can read more about this in the lxml documentation and the XPath syntax reference.
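To read .xml files with attribute-style access to child elements, use lxml.objectify (plain lxml.etree elements do not expose children as attributes):
from lxml import objectify
root = objectify.parse(filename).getroot()
value = root.node1.node2.variable_name.text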
I have this structure on a xml response
<reponse xmls="http://www.some.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.some.com/ot ot.xsd ">
<OneTree> </OneTree>
<TwoTree>
<count>10</count>
<alist>
<aelement>
<Name>FirstEntry</Name>
</aelement>
<aelement>
<Name>FirstEntry</Name>
</aelement>
</alist>
</TwoTree>
I'm trying to print out the values of the Name elements.
So far I've managed to print the value of <count>:
tree = ElementTree.fromstring(response.content)
for child in tree:
    if (child.tag == '{http://www.some.com/ot}TwoTree'):
        print child[0].text
I'm having trouble getting to the tree under <alist> and then printing out the Names; looking for tags or attributes is not working for this structure. Any help?
I think it is easier to use the findall() method, which accepts basic XPath syntax:
namespaces = {'d': 'http://www.some.com'}
result = tree.findall('.//d:Name', namespaces)
print [r.text for r in result]
Complete demo:
raw = '''<response xmlns="http://www.some.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.some.com/ot ot.xsd ">
<OneTree> </OneTree>
<TwoTree>
<count>10</count>
<alist>
<aelement>
<Name>FirstEntry</Name>
</aelement>
<aelement>
<Name>FirstEntry</Name>
</aelement>
</alist>
</TwoTree>
</response>'''
from xml.etree import ElementTree as ET
tree = ET.fromstring(raw)
namespaces = {'d': 'http://www.some.com'}
result = tree.findall('.//d:Name', namespaces)
print [r.text for r in result]
Output:
['FirstEntry', 'FirstEntry']
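An alternative to passing a prefix map is to write the namespace directly into the tag name as {uri}tag, which is how ElementTree stores namespaced tags internally. A minimal Python 3 sketch under that assumption; the trimmed document, the second entry's value SecondEntry, and the names NS, names and count are illustrative only:

```python
import xml.etree.ElementTree as ET

# Assumption: a trimmed version of the response with a default namespace.
raw = """<response xmlns="http://www.some.com">
  <TwoTree>
    <count>10</count>
    <alist>
      <aelement><Name>FirstEntry</Name></aelement>
      <aelement><Name>SecondEntry</Name></aelement>
    </alist>
  </TwoTree>
</response>"""

NS = "{http://www.some.com}"  # namespace URI wrapped in braces

tree = ET.fromstring(raw)

# iter() walks all descendants, so the nesting depth of <Name> does not matter.
names = [el.text for el in tree.iter(NS + "Name")]

# A path can mix several qualified tags.
count = tree.findtext(NS + "TwoTree/" + NS + "count")

print(names)
print(count)
```

The URI must match the xmlns value exactly; the question's check against http://www.some.com/ot would only match if the document declared that URI.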
I have an xml file like this
<?xml version="1.0"?>
<sample>
<text>My name is <b>Wrufesh</b>. What is yours?</text>
</sample>
I have a python code like this
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
for child in root:
    print child.text
I only get
'My name is' as an output.
I want to get
'My name is <b>Wrufesh</b>. What is yours?' as an output.
What can I do?
You can get your desired output using ElementTree.tostringlist():
>>> import xml.etree.ElementTree as ET
>>> root = ET.parse('sample.xml').getroot()
>>> l = ET.tostringlist(root.find('text'))
>>> l
['<text', '>', 'My name is ', '<b', '>', 'Wrufesh', '</b>', '. What is yours?', '</text>', '\n']
>>> ''.join(l[2:-2])
'My name is <b>Wrufesh</b>. What is yours?'
I wonder though how practical this is going to be for generic use.
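For more generic use, one possible approach is to rebuild the inner markup from the element's own text plus the serialized form of each child, rather than slicing the token list. The helper inner_xml below is hypothetical (not part of ElementTree), and the sketch assumes Python 3:

```python
import xml.etree.ElementTree as ET


def inner_xml(element):
    """Return the content between an element's start and end tags as a string."""
    # Text that appears before the first child element (may be None).
    parts = [element.text or ""]
    for child in element:
        # tostring() serializes the child and also its tail text,
        # i.e. the text that follows the child's closing tag.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


root = ET.fromstring(
    "<sample><text>My name is <b>Wrufesh</b>. What is yours?</text></sample>"
)
markup = inner_xml(root.find("text"))
print(markup)
```

This does not depend on how many tokens the serializer emits for the opening and closing tags, which is what makes the slice l[2:-2] fragile.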
I don't think treating XML tags as strings is the right approach. You can access the text parts of the XML like this:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
text = root[0]
for i in text.itertext():
    print i
# As you can see, `<b>` and `</b>` are a pair of tags, not strings.
# list(text) returns the child elements of <text>.
print list(text)
I would suggest pre-processing the XML file to wrap the content of the <text> element in CDATA. You should be able to read the value without a problem afterwards.
<text><![CDATA[My name is <b>Wrufesh</b>. What is yours?]]></text>
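Once the content is wrapped in CDATA, the parser treats the inner markup as plain character data, so the element's text attribute holds the whole string. A short Python 3 sketch, assuming the pre-processed document is inlined as xml_text (an illustrative name):

```python
import xml.etree.ElementTree as ET

# Assumption: the sample file after the <text> content was wrapped in CDATA.
xml_text = (
    "<sample>"
    "<text><![CDATA[My name is <b>Wrufesh</b>. What is yours?]]></text>"
    "</sample>"
)

root = ET.fromstring(xml_text)

# <b> is no longer parsed as a child element; it is part of the text.
text = root.find("text").text
print(text)
```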
I'm trying to parse an SVG document with lxml. Here's my code:
nsmap = {
'svg': 'http://www.w3.org/2000/svg',
'xlink': 'http://www.w3.org/1999/xlink',
}
root = etree.XML(svg)
# this works (finds the element with the given ID)
root.xpath('./svg:g/svg:g/svg:g[@id="route_1_edge"]', namespaces=nsmap)
# this yields "XPathEvalError: Invalid expression"
root.xpath('./svg:g/svg:g/svg:g[fn:startswith(@id,"route_1")]', namespaces=nsmap)
Anyone know why the first one works and the second doesn't? If I change the third svg:g to svg:text I don't get an exception, so it seems to be something to do with the g element in particular that it doesn't like, though, again, the simple g[@id="foo"] search works fine.
The "startswith" function is spelled starts-with. Also, omit the fn:.
root.xpath('./svg:g/svg:g/svg:g[starts-with(@id,"route_1")]', namespaces=nsmap)
import lxml.etree as etree
import lxml.builder as builder
nsmap = {
'svg': 'http://www.w3.org/2000/svg',
'xlink': 'http://www.w3.org/1999/xlink',
}
E = builder.ElementMaker(
namespace='http://www.w3.org/2000/svg',
nsmap=nsmap)
root = (
    E.root(
        E.g(
            E.g(
                E.g(id="route_1_edge")))))
print(etree.tostring(root, pretty_print=True))
print(root.xpath('./svg:g/svg:g/svg:g[@id="route_1_edge"]', namespaces=nsmap))
print(root.xpath('./svg:g/svg:g/svg:g[starts-with(@id,"route_1")]', namespaces=nsmap))
yields
<svg:root xmlns:svg="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<svg:g>
<svg:g>
<svg:g id="route_1_edge"/>
</svg:g>
</svg:g>
</svg:root>
[<Element {http://www.w3.org/2000/svg}g at 0xb7462c34>]
[<Element {http://www.w3.org/2000/svg}g at 0xb7462be4>]
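The starts-with() function is available through lxml's full XPath 1.0 support; the limited XPath subset in the standard library's xml.etree.ElementTree does not include it. Where only the standard library is at hand, a comparable filter can be sketched in plain Python. This is a swapped-in technique, not the lxml call above; it assumes Python 3, and the sample SVG string and the names svg_text, matches and ids are illustrative:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Assumption: a tiny stand-in document with two nested <g> candidates.
svg_text = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<g><g>'
    '<g id="route_1_edge"/>'
    '<g id="route_2_edge"/>'
    '</g></g>'
    '</svg>'
)

root = ET.fromstring(svg_text)

# iter() visits <g> elements at any depth, unlike the fixed three-level path
# in the XPath version; str.startswith() plays the role of starts-with().
matches = [
    g for g in root.iter("{%s}g" % SVG_NS)
    if g.get("id", "").startswith("route_1")
]
ids = [g.get("id") for g in matches]
print(ids)
```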