Get xml value of ElementTree Element

I would like to get the XML content of an element in ElementTree, that is, everything between its start and end tags. For example, if I had the XML:
<?xml version="1.0" encoding="UTF-8"?>
<item>
<child>asd</child>
hello world
<ch>jkl</ch>
</item>
it would get me:
<child>asd</child>
hello world
<ch>jkl</ch>
Here's what I tried so far:
import xml.etree.ElementTree as ET
root = ET.fromstring("""<?xml version="1.0" encoding="UTF-8"?>
<item>
<child>asd</child>
hello world
<ch>jkl</ch>
</item>""")
print(root.text)

Try:
print(ET.tostring(root.find('.//child')).decode(), ET.tostring(root.find('.//ch')).decode())
Or, more readable:
elems = ['child', 'ch']
for elem in elems:
    print(ET.tostring(root.find(f'.//{elem}')).decode())
The output, based on the xml in your question, should be what you're looking for.

Building on Jack Fleeting's answer, I created a solution I feel is more general, not just relating to the xml I inserted.
import xml.etree.ElementTree as ET
root = ET.fromstring("""<?xml version="1.0" encoding="UTF-8"?>
<item>
<child>asd</child>
hello world
<ch>jkl</ch>
</item>""")
for elem in root:
    # Serialize each direct child itself; looking it up again by tag would only ever return the first match.
    print(ET.tostring(elem).decode())
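If the goal is everything between the element's start and end tags, including any text that sits before the first child, one possible sketch is below. The helper name inner_xml is hypothetical; it relies on the fact that ET.tostring() serializes an element together with its tail text.

```python
import xml.etree.ElementTree as ET

def inner_xml(element):
    """Return the markup between an element's start and end tags (hypothetical helper)."""
    # element.text is the text before the first child; it is None when absent.
    parts = [element.text or ""]
    # ET.tostring() emits each child followed by its tail text.
    for child in element:
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)

root = ET.fromstring("<item><child>asd</child>hello world<ch>jkl</ch></item>")
print(inner_xml(root))
```

Passing encoding="unicode" makes tostring() return a str, so no decode() call is needed.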

Related

Python : Remove an element but not its children from xml?

I want to remove an element but not its children. I tried this code, but it removes the children as well.
Code:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for item in root.findall('item'):
    root.remove(item)
print(ET.tostring(root))
Output:
<root>
</root>
test.xml
<?xml version="1.0" ?>
<root>
<item>
<data>
<number>01</number>
<step>one</step>
</data>
</item>
</root>
expected outcome
<?xml version="1.0" ?>
<root>
<data>
<number>01</number>
<step>one</step>
</data>
</root>

You should move all children of item to root before removing it:
for item in root.findall('item'):
    for child in item:
        root.append(child)
    root.remove(item)
print(ET.tostring(root))
The code results in:
<root>
<data>
<number>01</number>
<step>one</step>
</data>
</root>
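Appending moves the children to the end of the parent. If the position among siblings matters, a variant that inserts the children where the removed element used to be might look like the sketch below. The function name unwrap and the extra a and b sibling elements are made up for illustration, and the sketch ignores the removed element's text and tail.

```python
import xml.etree.ElementTree as ET

def unwrap(parent, element):
    """Replace element with its own children, keeping their position in parent."""
    # Find the position of the element among its siblings.
    index = list(parent).index(element)
    parent.remove(element)
    # Re-insert the children one by one, starting at the old position.
    for offset, child in enumerate(list(element)):
        parent.insert(index + offset, child)

root = ET.fromstring(
    "<root><a/><item><data><number>01</number><step>one</step></data></item><b/></root>"
)
for item in root.findall("item"):
    unwrap(root, item)
print([child.tag for child in root])
```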

Find the element with the data tag, remove it, and extend the element's parent with the element's children.
import xml.etree.ElementTree as etree
data = """
<root>
<item>
<data>
<number>01</number>
<step>one</step>
</data>
</item>
</root>
"""
def iterparent(tree):
    # Yield (parent, child) pairs for every element in the tree.
    for parent in tree.iter():
        for child in parent:
            yield parent, child
tree = etree.fromstring(data)
# Collect the pairs first so the tree is not modified while it is being iterated.
for parent, child in list(iterparent(tree)):
    if child.tag == "data":
        parent.remove(child)
        parent.extend(child)
print(etree.tostring(tree).decode())
This will output:
<root>
<item>
<number>01</number>
<step>one</step>
</item>
</root>
Adapted from a similar answer for your particular use case.

How to query XML node using ElementTree in python

I have the following example XML tree:
<main>
<section>
<list key="capital" value="sydney">
<items>
<item id="abc-123"></item>
<item id="abc-345"></item>
</items>
</list>
<list key="capital" value="tokyo">
<items>
<item id="def-678"></item>
<item id="def-901"></item>
</items>
</list>
</section>
</main>
Do you know how to run a query that will extract the "items" node under "list" with key="capital" and value="tokyo" (which should extract item nodes with id="def-678" and id="def-901")?
Thanks so much for your help!

You can use an XPath expression that xml.etree supports (see the documentation) via the find() or findall() method:
from xml.etree import ElementTree as ET
raw = '''your xml string here'''
root = ET.fromstring(raw)
result = root.findall(".//list[@key='capital'][@value='tokyo']/items/item")
Console test output:
>>> for r in result:
...     print(ET.tostring(r).decode())
...
...
<item id="def-678" />
<item id="def-901" />
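For reference, here is a self-contained version of the same query run against the XML from the question, collecting the id attributes instead of printing the serialized elements. Attribute tests in the XPath subset that xml.etree supports are written with an @ prefix.

```python
from xml.etree import ElementTree as ET

raw = """<main>
<section>
<list key="capital" value="sydney">
<items>
<item id="abc-123"></item>
<item id="abc-345"></item>
</items>
</list>
<list key="capital" value="tokyo">
<items>
<item id="def-678"></item>
<item id="def-901"></item>
</items>
</list>
</section>
</main>"""

root = ET.fromstring(raw)
# Both predicates must hold for the same <list> element.
items = root.findall(".//list[@key='capital'][@value='tokyo']/items/item")
ids = [item.get("id") for item in items]
print(ids)
```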

How to get specific block(group) based on child node's value in XPath from the XML?

I am a newbie to XPath. I have the following XML file.
Here is my XML file:
<?xml version='1.0' encoding='utf-8'?>
<items>
<item>
<country>India</country>
<referenceId>IN375TP</referenceId>
<price>400</price>
</item>
<item>
<country>Australia</country>
<referenceId>AU120ED</referenceId>
<price>15</price>
</item>
<item>
<country>United Kingdom</country>
<referenceId>UK862RB</referenceId>
<price>20</price>
</item>
</items>
I want the following <item> tag as an output:
<item>
<country>Australia</country>
<referenceId>AU120ED</referenceId>
<price>15</price>
</item>
Note: Please use condition like /items/item[referenceId/text()="AU120ED"]

If you want to find the item by country, you can use an XPath predicate that matches an item whose country child has the given text:
from lxml.etree import parse, HTMLParser
xml = parse("check.xml",HTMLParser())
print(xml.find("//items//item[country='Australia']"))
<Element item at 0x7f40faa28950>
If you actually want to search by referenceId, just change the predicate to item[referenceid='AU120ED'] (the tag name is lowercase here because HTMLParser lowercases element names):
print(xml.find("//items//item[referenceid='AU120ED']"))
<Element item at 0x7f02c0c24998>
With the standard library's xml.etree, which keeps the original case of tag names:
from xml.etree import ElementTree as et
xml = et.parse("check.xml")
print(xml.find(".").find("./item[referenceId='AU120ED']"))
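To get the whole matching block as text rather than an Element object, the found element can be serialized again. A small sketch using only the standard library and the XML from the question, parsed from a string instead of check.xml:

```python
from xml.etree import ElementTree as ET

xml_text = """<items>
<item>
<country>India</country>
<referenceId>IN375TP</referenceId>
<price>400</price>
</item>
<item>
<country>Australia</country>
<referenceId>AU120ED</referenceId>
<price>15</price>
</item>
<item>
<country>United Kingdom</country>
<referenceId>UK862RB</referenceId>
<price>20</price>
</item>
</items>"""

root = ET.fromstring(xml_text)
# Select the <item> whose <referenceId> child has exactly this text.
item = root.find("./item[referenceId='AU120ED']")
if item is not None:
    print(ET.tostring(item, encoding="unicode"))
```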

how to parse the second xml tree in a file

Suppose I have an XML file like:
<?xml version="1.0" encoding="utf-8"?>
<items>
<?xml version="1.0" encoding="utf-8"?>
<items>
<item>
<price>1500</price>
<info> asfgfdff</info>
</item>
</items>
How do I parse so that the parser selects the recently updated xml tree?

import re
with open('file', 'r') as f:
    newestXml = []
    for line in f:
        # A new XML declaration starts a new document, so discard everything collected so far.
        if re.search(r'^<\?xml', line):
            newestXml = [line]
        else:
            newestXml.append(line)
At the end of the loop, newestXml will contain all the lines from the last occurrence of <?xml to the end of the file.
Now you can combine the lines and use the xml parser to parse the xml.
Note - I can't check this code now, so it may contain small mistakes, but I hope the idea will help you.
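A compact variant of the same idea, shown here on a string so that it runs on its own: locate the last XML declaration and parse from there. The function name parse_last_document is made up for this sketch, and it assumes the text after the last declaration is a complete, well-formed document.

```python
import xml.etree.ElementTree as ET

def parse_last_document(text):
    """Parse the document that starts at the last XML declaration in text."""
    start = text.rfind("<?xml")
    if start == -1:
        # No declaration at all: treat the whole text as a single document.
        start = 0
    # Encode to bytes so the parser honors the declared encoding itself.
    return ET.fromstring(text[start:].encode("utf-8"))

sample = """<?xml version="1.0" encoding="utf-8"?>
<items>
<?xml version="1.0" encoding="utf-8"?>
<items>
<item>
<price>1500</price>
<info> asfgfdff</info>
</item>
</items>"""

root = parse_last_document(sample)
print(root.find("item/price").text)
```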

parse .xml with prefix's on tags? xml.etree.ElementTree

I can read tags, except when there is a prefix. I'm not having luck searching SO for a previous question.
I need to read media:content. I tried image = node.find("media:content").
Rss input:
<channel>
<title>Popular Photography in the last 1 week</title>
<item>
<title>foo</title>
<media:category label="Miscellaneous">photography/misc</media:category>
<media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
</item>
<item> ... </item>
</channel>
I can read a sibling tag title.
from xml.etree import ElementTree
with open('cache1.rss', 'rt') as f:
    tree = ElementTree.parse(f)
for node in tree.findall('.//channel/item'):
    title = node.find("title").text
I've been using the docs, yet stuck on the 'prefix' part.

Here's an example of using XML namespaces with ElementTree:
>>> x = '''\
<channel xmlns:media="http://www.w3.org/TR/html4/">
<title>Popular Photography in the last 1 week</title>
<item>
<title>foo</title>
<media:category label="Miscellaneous">photography/misc</media:category>
<media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
</item>
<item> ... </item>
</channel>
'''
>>> node = ElementTree.fromstring(x)
>>> for elem in node.findall('item/{http://www.w3.org/TR/html4/}category'):
...     print(elem.text)
...
photography/misc

media is an XML namespace prefix; it has to be declared somewhere earlier in the document with xmlns:media="...". See http://lxml.de/xpathxslt.html#namespaces-and-prefixes for how to define XML namespaces for use in XPath expressions in lxml.
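With xml.etree.ElementTree, the prefix can also be kept in the query by passing a prefix-to-URI mapping to find() or findall(). In the sketch below the namespace URI is the placeholder from the answer above, not necessarily the one your feed declares; use whatever xmlns:media value appears in your document.

```python
from xml.etree import ElementTree

# Assumption: this URI must match the xmlns:media declaration in the actual feed.
NS = {"media": "http://www.w3.org/TR/html4/"}

rss = """<channel xmlns:media="http://www.w3.org/TR/html4/">
<title>Popular Photography in the last 1 week</title>
<item>
<title>foo</title>
<media:category label="Miscellaneous">photography/misc</media:category>
<media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
</item>
</channel>"""

channel = ElementTree.fromstring(rss)
urls = []
for item in channel.findall("item"):
    # The prefix in the path is resolved through the NS mapping.
    content = item.find("media:content", NS)
    if content is not None:
        urls.append(content.get("url"))
print(urls)
```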
