I have the following xml:
<text>test<br/><br/>All you need to know about British birds.<br/></text>
I am wishing to set the whole content of the tag <text> to 11111
I'm using pythong and lxml and the following are my codes:
import nltk
import lxml.etree as le
current_file = '/Users/noor/Dropbox/apps/APIofLife/src/clear_description/bird.rdf'
f = open(current_file,'r')
doc=le.parse(f)
for elem in doc.xpath("//text"):
elem.text = "11111"
f.close()
f = open(current_file,'w')
f.write(le.tostring(doc))
f.close()
However, after running the above codes, my results are:
<text>11111<br/><br/>All you need to know about British birds.<br/></text>
I want to know why the whole content of the tag <text> has not been changed to 11111
According to lxml.etree._Element documentation, text property correspond to the text before the first subelement.
You need to delete sub elements:
>>> import lxml.etree as le
>>>
>>> root = le.fromstring('''<text>test<br/><br/>
... All you need to know about British birds.
... <br/></text>''')
>>> for elem in root.xpath("//text"):
... elem.text = '1111'
... del elem[:] # <----------
...
>>> le.tostring(root)
'<text>1111</text>'
Related
I am trying to easily access values from an xml file.
<artikelen>
<artikel nummer="121">
<code>ABC123</code>
<naam>Highlight pen</naam>
<voorraad>231</voorraad>
<prijs>0.56</prijs>
</artikel>
<artikel nummer="123">
<code>PQR678</code>
<naam>Nietmachine</naam>
<voorraad>587</voorraad>
<prijs>9.99</prijs>
</artikel>
..... etc
If i want to acces the value ABC123, how do I get it?
import xmltodict
with open('8_1.html') as fd:
doc = xmltodict.parse(fd.read())
print(doc[fd]['code'])
Using your example:
import xmltodict
with open('artikelen.xml') as fd:
doc = xmltodict.parse(fd.read())
If you examine doc, you'll see it's an OrderedDict, ordered by tag:
>>> doc
OrderedDict([('artikelen',
OrderedDict([('artikel',
[OrderedDict([('#nummer', '121'),
('code', 'ABC123'),
('naam', 'Highlight pen'),
('voorraad', '231'),
('prijs', '0.56')]),
OrderedDict([('#nummer', '123'),
('code', 'PQR678'),
('naam', 'Nietmachine'),
('voorraad', '587'),
('prijs', '9.99')])])]))])
The root node is called artikelen, and there a subnode artikel which is a list of OrderedDict objects, so if you want the code for every article, you would do:
codes = []
for artikel in doc['artikelen']['artikel']:
codes.append(artikel['code'])
# >>> codes
# ['ABC123', 'PQR678']
If you specifically want the code only when nummer is 121, you could do this:
code = None
for artikel in doc['artikelen']['artikel']:
if artikel['#nummer'] == '121':
code = artikel['code']
break
That said, if you're parsing XML documents and want to search for a specific value like that, I would consider using XPath expressions, which are supported by ElementTree.
This is using xml.etree
You can try this:
for artikelobj in root.findall('artikel'):
print artikelobj.find('code')
if you want to extract a specific code based on the attribute 'nummer' of artikel, then you can try this:
for artikelobj in root.findall('artikel'):
if artikel.get('nummer') == 121:
print artikelobj.find('code')
this will print only the code you want.
You can use lxml package using XPath Expression.
from lxml import etree
f = open("8_1.html", "r")
tree = etree.parse(f)
expression = "/artikelen/artikel[1]/code"
l = tree.xpath(expression)
code = next(i.text for i in l)
print code
# ABC123
The thing to notice here is the expression. /artikelen is the root element. /artikel[1] chooses the first artikel element under root(Notice first element is not at index 0). /code is the child element under artikel[1]. You can read more about at lxml and xpath syntax.
To read .xml files :
import lxml.etree as ET
root = ET.parse(filename).getroot()
value = root.node1.node2.variable_name.text
I'm appending a node to an xml, but i want it to insert before some tags, could that be possible?
newNode = xmldoc.createElement("tag2")
txt = xmldoc.createTextNode("value2")
newNode.appendChild(txt)
n.appendChild(newNode)
This is my XML. When I append the child, it add after UniMed, I want it to insert after Cantidad and before UniMed. (Simplified version of my XML) "Item" can have more childs, and i do not know how many.
<ns0:Item>
<ns0:Cantidad>1</ns0:Cantidad>
<ns0:UniMed>L</ns0:UniMed>
</ns0:Item>
I think i can solve it by reading al the childs of Item, erase them, and then add them in the order I want.
But i dont think its the best idea...
Any ideas?
EDITED
SOLUTION
itemChildNodes = n.childNodes
n.insertBefore(newNode, itemChildNodes[itemChildNodes.length-2])
Use insertBefore method to insert New created tag.
Demo:
>>> from xml.dom import minidom
>>> content = """
... <xml>
... <Item>
... <Cantidad>1</Cantidad>
... <UniMed>L</UniMed>
... </Item>
... </xml>
... """
>>> root = minidom.parseString(content)
>>> insert_tag = root.createElement("tag2")
>>> htext = root.createTextNode('test')
>>> insert_tag.appendChild(htext)
<DOM Text node "'test'">
>>>
>>> items = root.getElementsByTagName("Item")
>>> item = items[0]
>>> item_chidren = item.childNodes
>>> item.insertBefore(insert_tag, item_chidren[2])
<DOM Element: tag2 at 0xb700da8c>
>>> root.toxml()
u'<?xml version="1.0" ?><xml>\n\t<Item>\n\t <Cantidad>1</Cantidad><tag2>test</tag2>\n\t <UniMed>L</UniMed>\n\t</Item>\n</xml>'
>>>
I want to update xml file with new information by using lxml library.
For example, I have this code:
>>> from lxml import etree
>>>
>>> tree = etree.parse('books.xml')
where 'books.xml' file, has this content: http://www.w3schools.com/dom/books.xml
I want to update this file with new book:
>>> new_entry = etree.fromstring('''<book category="web" cover="paperback">
... <title lang="en">Learning XML 2</title>
... <author>Erik Ray</author>
... <year>2006</year>
... <price>49.95</price>
... </book>''')
My question is, how can I update tree element tree with new_entry tree and save the file.
Here you go, get the root of the tree, append your new element, save the tree as a string to a file:
from lxml import etree
tree = etree.parse('books.xml')
new_entry = etree.fromstring('''<book category="web" cover="paperback">
<title lang="en">Learning XML 2</title>
<author>Erik Ray</author>
<year>2006</year>
<price>49.95</price>
</book>''')
root = tree.getroot()
root.append(new_entry)
f = open('books-mod.xml', 'wb')
f.write(etree.tostring(root, pretty_print=True))
f.close()
I don't have enough reputation to comment, therefore I'll write an answer...
The most simple change make the code of Guillaume work is to change the line
f = open('books-mod.xml', 'w')
to
f = open('books-mod.xml', 'wb')
I have an xml file like this
<?xml version="1.0"?>
<sample>
<text>My name is <b>Wrufesh</b>. What is yours?</text>
</sample>
I have a python code like this
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
for child in root:
print child.text()
I only get
'My name is' as an output.
I want to get
'My name is <b>Wrufesh</b>. What is yours?' as an output.
What can I do?
You can get your desired output using using ElementTree.tostringlist():
>>> import xml.etree.ElementTree as ET
>>> root = ET.parse('sample.xml').getroot()
>>> l = ET.tostringlist(root.find('text'))
>>> l
['<text', '>', 'My name is ', '<b', '>', 'Wrufesh', '</b>', '. What is yours?', '</text>', '\n']
>>> ''.join(l[2:-2])
'My name is <b>Wrufesh</b>. What is yours?'
I wonder though how practical this is going to be for generic use.
I don't think treating tag in xml as a string is right. You can access the text part of xml like this:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
text = root[0]
for i in text.itertext():
print i
# As you can see, `<b>` and `</b>` is a pair of tags but not strings.
print text._children
I would suggest pre-processing the xml file to wrap elements under <text> element in CDATA. You should be able to read the values without a problem afterwards.
<text><![CDATA[<My name is <b>Wrufesh</b>. What is yours?]]></text>
xml file :
<global>
<rtmp>
<fcsapp>
<password>
<key>hello123</key>
<key>check123</key>
</password>
</fcsapp>
</rtmp>
</global>
python code : To obtain all the key tag values.
hello123
check123
using xml.etree.ElementTree
for streams in xmlRoot.iter('global'):
xpath = "/rtmp/fcsapp/password"
tag = "key"
for child in streams.findall(xpath):
resultlist.append(child.find(tag).text)
print resultlist
The output obtained is [hello123], but I want it to display both ([hello123, check123])
How do I obtain this?
Using lxml and cssselect I would do it like this:
>>> from lxml.html import fromstring
>>> doc = fromstring(open("foo.xml", "r").read())
>>> doc.cssselect("password key")
[<Element key at 0x7f77a6786cb0>, <Element key at 0x7f77a6786d70>]
>>> [e.text for e in doc.cssselect("password key")]
['hello123 \n ', 'check123 \n ']
With lxml and xpath You can do it in the following way:
from lxml import etree
xml = """
<global>
<rtmp>
<fcsapp>
<password>
<key>hello123</key>
<key>check123</key>
</password>
</fcsapp>
</rtmp>
</global>
"""
tree = etree.fromstring(xml)
result = tree.xpath('//password/key/text()')
print result # ['hello123', 'check123']
try beautifulsoup package "https://pypi.python.org/pypi/BeautifulSoup"
using xml.etree.ElementTree
for streams in xmlRoot.iter('global'):
xpath = "/rtmp/fcsapp/password"
tag = "key"
for child in streams.iter(tag):
resultlist.append(child.text)
print resultlist
have to iter over the "key" tag in for loop to obtain the desired result. The above code solves the problem.