I would like to get the inner XML of an element in ElementTree. For example, given this XML:
<?xml version="1.0" encoding="UTF-8"?>
<item>
<child>asd</child>
hello world
<ch>jkl</ch>
</item>
I would like to get:
<child>asd</child>
hello world
<ch>jkl</ch>
Here's what I tried so far:
import xml.etree.ElementTree as ET
root = ET.fromstring("""<?xml version="1.0" encoding="UTF-8"?>
<item>
<child>asd</child>
hello world
<ch>jkl</ch>
</item>""")
print(root.text)
Try
print(ET.tostring(root.find('.//child')).decode(),ET.tostring(root.find('.//ch')).decode())
Or, more readable:
elems = ['child', 'ch']
for elem in elems:
    print(ET.tostring(root.find(f'.//{elem}')).decode())
The output, based on the xml in your question, should be what you're looking for.
Building on Jack Fleeting's answer, I came up with a more general solution that doesn't depend on the specific tags in my XML:
import xml.etree.ElementTree as ET
root = ET.fromstring("""<?xml version="1.0" encoding="UTF-8"?>
<item>
<child>asd</child>
hello world
<ch>jkl</ch>
</item>""")
for elem in root:
    # Serialize each direct child; ET.tostring() also writes the child's tail text ("hello world")
    print(ET.tostring(elem).decode())
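Both answers work because ET.tostring() serializes an element together with its tail text, and that is how "hello world" ends up in the output. Neither of them keeps text that sits before the first child, though. Below is a minimal sketch of a reusable helper that covers that case. The name inner_xml is hypothetical and is not part of ElementTree. The sketch assumes no namespaces, because namespaced children would be serialized with generated ns0-style prefixes.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(elem):
    """Return everything between elem's start and end tags as a string."""
    # Text before the first child (None if absent); escape it so the result stays valid XML
    parts = [escape(elem.text or "")]
    for child in elem:
        # tostring() writes the child element *and* its tail text
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


root = ET.fromstring("<item><child>asd</child>hello world<ch>jkl</ch></item>")
print(inner_xml(root))
```

With the question's original whitespace, the result also includes the newlines around each child.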
I want to remove an element but keep its children. I tried the code below, but it removes the children as well.
Code:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for item in root.findall('item'):
    root.remove(item)
print(ET.tostring(root))
Output:
<root>
</root>
test.xml:
<?xml version="1.0" ?>
<root>
<item>
<data>
<number>01</number>
<step>one</step>
</data>
</item>
</root>
Expected outcome:
<?xml version="1.0" ?>
<root>
<data>
<number>01</number>
<step>one</step>
</data>
</root>
You should move all children of item to root before removing item:
for item in root.findall('item'):
    # Append each child of <item> to <root>; they are added at the end of root
    for child in item:
        root.append(child)
    root.remove(item)
print(ET.tostring(root))
The code results in:
<root>
<data>
<number>01</number>
<step>one</step>
</data>
</root>
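The loop above appends the moved children to the end of root. It also discards any text directly inside <item> and any text after it. If document order matters, one option is a sketch that splices the children in at item's old position and carries that text over. The helpers unwrap and _append_text are hypothetical names, and the sketch assumes elem is a direct child of parent.

```python
import xml.etree.ElementTree as ET


def _append_text(parent, index, text):
    """Attach text just before position `index` in parent (to parent.text or a sibling's tail)."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def unwrap(parent, elem):
    """Replace elem, a direct child of parent, with elem's own children, keeping document order."""
    index = list(parent).index(elem)
    children = list(elem)
    parent.remove(elem)
    # Text inside elem before its first child now precedes the spliced-in children
    _append_text(parent, index, elem.text)
    for offset, child in enumerate(children):
        parent.insert(index + offset, child)
    # elem's tail now follows the last spliced-in child (or the previous sibling)
    _append_text(parent, index + len(children), elem.tail)


root = ET.fromstring("<root><item><data><number>01</number><step>one</step></data></item></root>")
for item in root.findall("item"):  # findall() returns a list, so mutating root is safe
    unwrap(root, item)
print(ET.tostring(root, encoding="unicode"))
```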
Find the element with the data tag, remove it, and extend the element's parent with the element's children.
import xml.etree.ElementTree as etree
data = """
<root>
<item>
<data>
<number>01</number>
<step>one</step>
</data>
</item>
</root>
"""
def iterparent(tree):
    # Yield every (parent, child) pair in the tree
    for parent in tree.iter():
        for child in parent:
            yield parent, child
tree = etree.fromstring(data)
# Materialize the pairs first so the tree is not modified while it is being iterated
for parent, child in list(iterparent(tree)):
    if child.tag == "data":
        parent.remove(child)
        parent.extend(child)
print(etree.tostring(tree).decode())
This outputs:
<root>
<item>
<number>01</number>
<step>one</step>
</item>
</root>
Adapted from a similar answer for your particular use case. To unwrap <item> instead, as in the question's expected outcome, compare child.tag with "item".
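ElementTree elements do not store a reference to their parent, which is why the answer above walks (parent, child) pairs. A common alternative is to build a child-to-parent dict once. The sketch below uses that approach, and the function name unwrap_all is hypothetical. It inserts the children at the removed element's old position and handles nested matches. It also drops the removed element's own text and tail, which is fine when those are only indentation whitespace.

```python
import xml.etree.ElementTree as ET


def unwrap_all(root, tag):
    """Remove every element named `tag` below root, splicing its children into its parent."""
    # ElementTree elements have no parent pointer, so map child -> parent once
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Collect matches up front; the tree must not change while root.iter() is running
    for elem in list(root.iter(tag)):
        if elem is root:
            continue  # the root has no parent to splice into
        parent = parent_of[elem]
        index = list(parent).index(elem)
        parent.remove(elem)
        for offset, child in enumerate(list(elem)):
            parent.insert(index + offset, child)
            parent_of[child] = parent  # keep the map correct for nested matches
        # elem.text and elem.tail are discarded in this sketch
    return root


root = ET.fromstring("<root><item><data><number>01</number><step>one</step></data></item></root>")
unwrap_all(root, "item")
print(ET.tostring(root, encoding="unicode"))
```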
I have the following example XML tree:
<main>
<section>
<list key="capital" value="sydney">
<items>
<item id="abc-123"></item>
<item id="abc-345"></item>
</items>
</list>
<list key="capital" value="tokyo">
<items>
<item id="def-678"></item>
<item id="def-901"></item>
</items>
</list>
</section>
</main>
Do you know how to run a query that will extract the "items" node under "list" with key="capital" and value="tokyo" (which should extract item nodes with id="def-678" and id="def-901")?
Thanks so much for your help!
You can use an XPath expression with the find() or findall() method; xml.etree supports a limited subset of XPath (see the documentation):
from xml.etree import ElementTree as ET
raw = '''your xml string here'''
root = ET.fromstring(raw)
result = root.findall(".//list[@key='capital'][@value='tokyo']/items/item")
Console test output:
>>> for r in result:
...     print(ET.tostring(r).decode())
...
<item id="def-678" />
<item id="def-901" />
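The sketch below runs end to end on the question's XML, with the closing tag corrected to </main>. It selects the <items> node itself, which is what the question asks for, and then reads the ids. Each [@attr='value'] predicate narrows the <list> matches in turn.

```python
import xml.etree.ElementTree as ET

RAW = """<main>
  <section>
    <list key="capital" value="sydney">
      <items>
        <item id="abc-123"></item>
        <item id="abc-345"></item>
      </items>
    </list>
    <list key="capital" value="tokyo">
      <items>
        <item id="def-678"></item>
        <item id="def-901"></item>
      </items>
    </list>
  </section>
</main>"""

root = ET.fromstring(RAW)
# Chained predicates: key must be 'capital' AND value must be 'tokyo'
items = root.find(".//list[@key='capital'][@value='tokyo']/items")
ids = [item.get("id") for item in items.findall("item")]
print(ids)
```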
I am new to XPath. Here is my XML file:
<?xml version='1.0' encoding='utf-8'?>
<items>
<item>
<country>India</country>
<referenceId>IN375TP</referenceId>
<price>400</price>
</item>
<item>
<country>Australia</country>
<referenceId>AU120ED</referenceId>
<price>15</price>
</item>
<item>
<country>United Kingdom</country>
<referenceId>UK862RB</referenceId>
<price>20</price>
</item>
</items>
I want the following <item> tag as an output:
<item>
<country>Australia</country>
<referenceId>AU120ED</referenceId>
<price>15</price>
</item>
Note: please use a condition like /items/item[referenceId/text()="AU120ED"].
If you want to find the item by country, you can use an XPath predicate that matches the item under items whose country child has the given text:
from lxml.etree import parse, HTMLParser
xml = parse("check.xml", HTMLParser())
print(xml.find(".//items//item[country='Australia']"))
<Element item at 0x7f40faa28950>
If you actually want to search by reference ID, just change the predicate to item[referenceid='AU120ED'] (lowercase, because HTMLParser lowercases tag names):
print(xml.find(".//items//item[referenceid='AU120ED']"))
<Element item at 0x7f02c0c24998>
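With the standard library's xml.etree.ElementTree, which supports [child='text'] predicates but not text():
from xml.etree import ElementTree as et
xml = et.parse("check.xml")
item = xml.getroot().find("item[referenceId='AU120ED']")
print(et.tostring(item).decode())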
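Below is a self-contained sketch of the standard-library route, using the question's data inline instead of check.xml. The helper find_item is hypothetical, and it assumes ref_id contains no single quote because the value is pasted into the path. If lxml is available, its xpath() method accepts the full XPath 1.0 condition from the question, for example tree.xpath('/items/item[referenceId/text()="AU120ED"]').

```python
import xml.etree.ElementTree as ET

XML = """<items>
  <item>
    <country>India</country>
    <referenceId>IN375TP</referenceId>
    <price>400</price>
  </item>
  <item>
    <country>Australia</country>
    <referenceId>AU120ED</referenceId>
    <price>15</price>
  </item>
  <item>
    <country>United Kingdom</country>
    <referenceId>UK862RB</referenceId>
    <price>20</price>
  </item>
</items>"""


def find_item(root, ref_id):
    """Return the <item> whose <referenceId> text equals ref_id, or None."""
    # ElementTree's XPath subset supports [child='text'] but not text()
    return root.find(f"item[referenceId='{ref_id}']")


root = ET.fromstring(XML)
item = find_item(root, "AU120ED")
print(ET.tostring(item, encoding="unicode"))
```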
Suppose I have an XML file like this:
<?xml version="1.0" encoding="utf-8"?>
<items>
<?xml version="1.0" encoding="utf-8"?>
<items>
<item>
<price>1500</price>
<info> asfgfdff</info>
</item>
</items>
How do I parse it so that the parser picks up only the most recently written XML tree?
import re
with open('file', 'r') as f:
    newestXml = []
    for line in f:
        # A new XML declaration starts a new document: drop everything collected so far
        if re.search(r'^<\?xml', line):
            newestXml = [line]
        else:
            newestXml.append(line)
At the end of the loop, newestXml will contain all the lines from the last occurrence of <?xml to the end of the file.
Now you can join the lines (''.join(newestXml)) and parse the result with an XML parser.
Note - I can't check this code now, so it may contain small mistakes, but I hope the idea will help you.
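A variant of the same idea works on the whole text at once. It finds the last "<?xml" with str.rfind() and parses from there. The function name parse_last_document is hypothetical. The sketch assumes the string "<?xml" never appears inside a comment or CDATA section. For a real file, pass it the contents, e.g. parse_last_document(open("file", encoding="utf-8").read()).

```python
import xml.etree.ElementTree as ET


def parse_last_document(text):
    """Parse only the XML document that begins at the last '<?xml' declaration."""
    start = text.rfind("<?xml")
    # rfind() returns -1 when there is no declaration; parse everything in that case
    return ET.fromstring(text[start:] if start != -1 else text)


CONTENT = """<?xml version="1.0" encoding="utf-8"?>
<items>
<?xml version="1.0" encoding="utf-8"?>
<items>
<item>
<price>1500</price>
<info> asfgfdff</info>
</item>
</items>
"""

root = parse_last_document(CONTENT)
print(root.find("item/price").text)
```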
I can read tags, except when there is a prefix. I'm not having luck searching SO for a previous question.
I need to read media:content. I tried image = node.find("media:content").
RSS input:
<channel>
<title>Popular Photography in the last 1 week</title>
<item>
<title>foo</title>
<media:category label="Miscellaneous">photography/misc</media:category>
<media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
</item>
<item> ... </item>
</channel>
I can read the sibling tag title:
from xml.etree import ElementTree
with open('cache1.rss', 'rt') as f:
    tree = ElementTree.parse(f)
for node in tree.findall('.//channel/item'):
    title = node.find("title").text
I've been reading the docs, but I'm stuck on the 'prefix' part.
Here's an example of using XML namespaces with ElementTree:
>>> x = '''\
<channel xmlns:media="http://www.w3.org/TR/html4/">
<title>Popular Photography in the last 1 week</title>
<item>
<title>foo</title>
<media:category label="Miscellaneous">photography/misc</media:category>
<media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
</item>
<item> ... </item>
</channel>
'''
>>> from xml.etree import ElementTree
>>> node = ElementTree.fromstring(x)
>>> for elem in node.findall('item/{http://www.w3.org/TR/html4/}category'):
...     print(elem.text)
...
photography/misc
media is a namespace prefix; it has to be declared somewhere earlier with xmlns:media="...". See http://lxml.de/xpathxslt.html#namespaces-and-prefixes for how to define XML namespaces for use in XPath expressions in lxml.
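In the standard library, find() and findall() accept a namespaces dict that maps a prefix to a URI, so "media:content" can be used directly. The snippet from the question has no xmlns:media declaration, and parsing it as shown fails with a ParseError ("unbound prefix"). This sketch assumes the feed declares the Media RSS namespace http://search.yahoo.com/mrss/. The URI in NS must match whatever xmlns:media="..." the real feed uses.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed declares xmlns:media with the Media RSS URI used in NS below
RSS = """<channel xmlns:media="http://search.yahoo.com/mrss/">
  <title>Popular Photography in the last 1 week</title>
  <item>
    <title>foo</title>
    <media:category label="Miscellaneous">photography/misc</media:category>
    <media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
  </item>
</channel>"""

NS = {"media": "http://search.yahoo.com/mrss/"}

channel = ET.fromstring(RSS)
results = []
for node in channel.findall("item"):
    title = node.find("title").text
    # The namespaces mapping lets find() expand "media:" to "{http://search.yahoo.com/mrss/}"
    content = node.find("media:content", NS)
    results.append((title, content.get("url")))
print(results)
```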