Accessing the values of an element while parsing XML in Python - python

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<person>
<first-name>First_Name</first-name>
<last-name>Last_Name</last-name>
<headline>Headline</headline>
<location>
<name>Some_city, STATE </name>
<country>
<code>us</code>
</country>
</location>
</person>
I'm trying to access First_Name, Last_Name, Headline and Some_city, STATE
So far I have:
import xml.etree.ElementTree as ET
tree = ET.parse(data)
root = tree.getroot()
for child in root:
    print(child)
Which prints out:
<Element 'first-name' at 0x110726b10>
<Element 'last-name' at 0x110726b50>
<Element 'headline' at 0x110726b90>
<Element 'location' at 0x110726bd0>
How can I access the value of 'first-name'?

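Use the .text attribute of each element:
for child in root:
    print(child.text)
Note that for an element such as location, which only contains other elements, .text is just the whitespace before its first child, so read its children instead.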
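To pick out the specific values the question asks for, rather than printing every child, the elements can be looked up by name. The following is a minimal sketch that assumes the sample document from the question is available as a string (the variable name xml_text is made up for illustration) and uses findtext() with a path for the nested city name.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch is self-contained.
xml_text = """<person>
  <first-name>First_Name</first-name>
  <last-name>Last_Name</last-name>
  <headline>Headline</headline>
  <location>
    <name>Some_city, STATE </name>
    <country><code>us</code></country>
  </location>
</person>"""

root = ET.fromstring(xml_text)

# findtext() returns the text of the first matching element, or None.
first_name = root.findtext("first-name")
last_name = root.findtext("last-name")
headline = root.findtext("headline")

# Nested elements are reached with a path; strip() drops the trailing space.
city = root.findtext("location/name", default="").strip()

print(first_name, last_name, headline, city)
```

Using findtext() avoids a separate None check on the element before reading its text.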

Related

How to iterate over XML children with same name as current element and avoid current element in iteration?

I have
An XML document (I can't change the naming) that uses the same name for a node and its direct children, here items
I want
To iterate over the children only, i.e. the items that have a description field
My issue
The parent node of type items also appears in the iteration, as if iter() were also applied to the element itself, if I understand correctly
from xml.etree import ElementTree
content = """<?xml version="1.0" encoding="utf-8"?>
<root>
<items>
<items>
<description>foo1</description>
</items>
<items>
<description>foo2</description>
</items>
</items>
</root>
"""
tree = ElementTree.fromstring(content)
print(">>", tree.find("items"))
for item in tree.find("items").iter("items"):
    print(item, item.find("description"))
Current output
>> <Element 'items' at 0x0000020B5CBF8720>
<Element 'items' at 0x0000020B5CBF8720> None
<Element 'items' at 0x0000020B5CBF8770> <Element 'description' at 0x0000020B5CBF87C0>
<Element 'items' at 0x0000020B5CBF8810> <Element 'description' at 0x0000020B5CBF8860>
Expected output
>> <Element 'items' at 0x0000020B5CBF8720>
<Element 'items' at 0x0000020B5CBF8770> <Element 'description' at 0x0000020B5CBF87C0>
<Element 'items' at 0x0000020B5CBF8810> <Element 'description' at 0x0000020B5CBF8860>
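iter() yields the element it is called on as well as its descendants, which is why the outer items shows up. Use an XPath-style path with findall() to match only the inner elements:
tree.findall('items/items')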
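The difference between the two calls can be checked side by side. This is a small sketch on a trimmed copy of the XML from the question; the variable names via_iter, via_findall and via_children are only for illustration.

```python
from xml.etree import ElementTree

content = """<root>
  <items>
    <items><description>foo1</description></items>
    <items><description>foo2</description></items>
  </items>
</root>"""

tree = ElementTree.fromstring(content)
outer = tree.find("items")

# iter() starts with the element it is called on, so `outer` is included.
via_iter = list(outer.iter("items"))

# findall() with a relative path matches children only, so `outer` is skipped.
via_findall = tree.findall("items/items")

# Equivalent: search the direct children of `outer`.
via_children = outer.findall("items")

descriptions = [item.findtext("description") for item in via_findall]
print(len(via_iter), len(via_findall), descriptions)
```

Both findall() variants return only the two inner elements, so no filtering on the description field is needed.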

Python : Remove an element but not its children from xml?

I want to remove an element but not its children. I tried this code, but it removes the children as well.
code
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for item in root.findall('item'):
    root.remove(item)
print(ET.tostring(root))
>>> <root>
</root>
test.xml
<?xml version="1.0" ?>
<root>
<item>
<data>
<number>01</number>
<step>one</step>
</data>
</item>
</root>
expected outcome
<?xml version="1.0" ?>
<root>
<data>
<number>01</number>
<step>one</step>
</data>
</root>
You should move all children of item to root before removing it:
for item in root.findall('item'):
    for child in item:
        root.append(child)
    root.remove(item)
print(ET.tostring(root))
The code results in:
<root>
<data>
<number>01</number>
<step>one</step>
</data>
</root>
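One caveat with append() is that the moved children end up at the end of root, which matters if root has other children after item. A possible variant, sketched below, re-inserts the children at the position item occupied. The before and after elements are hypothetical siblings added only to make the ordering visible; they are not part of the original test.xml.

```python
import xml.etree.ElementTree as ET

# <before/> and <after/> are made-up siblings used to show ordering.
xml_text = (
    "<root><before/>"
    "<item><data><number>01</number><step>one</step></data></item>"
    "<after/></root>"
)
root = ET.fromstring(xml_text)

for item in root.findall("item"):
    index = list(root).index(item)  # position of <item> under root
    root.remove(item)
    # Re-insert the children where <item> used to be, preserving their order.
    for offset, child in enumerate(list(item)):
        root.insert(index + offset, child)

tags = [child.tag for child in root]
print(tags)
```

This sketch does not carry over any tail text that followed item, which is usually only whitespace.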
Find the element with the data tag, remove it, and extend its parent with its children. (Note that this removes data rather than item.)
import xml.etree.ElementTree as etree
data = """
<root>
<item>
<data>
<number>01</number>
<step>one</step>
</data>
</item>
</root>
"""
def iterparent(tree):
    # Yield (parent, child) pairs for every element in the tree.
    for parent in tree.iter():
        for child in parent:
            yield parent, child

tree = etree.fromstring(data)
# Take a snapshot with list() so the tree is not modified while it is being iterated.
for parent, child in list(iterparent(tree)):
    if child.tag == "data":
        parent.remove(child)
        parent.extend(child)
print(etree.tostring(tree))
will output
<root>
<item>
<number>01</number>
<step>one</step>
</item>
</root>
Adapted from a similar answer for your particular use case.

iterate through XML?

What is the easiest way to navigate through XML with python?
<html>
<body>
<soapenv:envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:body>
<getservicebyidresponse xmlns="http://www.something.com/soa/2.0/SRIManagement">
<code xmlns="">
0
</code>
<client xmlns="">
<action xsi:nil="true">
</action>
<actionmode xsi:nil="true">
</actionmode>
<clientid>
405965216
</clientid>
<firstname xsi:nil="true">
</firstname>
<id xsi:nil="true">
</id>
<lastname>
Last Name
</lastname>
<role xsi:nil="true">
</role>
<state xsi:nil="true">
</state>
</client>
</getservicebyidresponse>
</soapenv:body>
</soapenv:envelope>
</body>
</html>
I would go with regex and try to get the values of the lines I need, but is there a Pythonic way? Something like xml[0][1], etc.?
As @deceze already pointed out, you can use xml.etree.ElementTree here.
import xml.etree.ElementTree as ET
tree = ET.parse("path_to_xml_file")
root = tree.getroot()
You can iterate over root and all of its descendants with iter():
for child in root.iter():
    if child.tag == 'clientid':
        print(child.tag, child.text.strip())
Children are nested, and we can access specific child nodes by index, so root[0][1] should work (as long as the indices are correct).
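As an alternative to counting indices, the client element can be located by tag and its fields collected in one pass. The sketch below uses a trimmed copy of the document from the question, and skipping elements marked xsi:nil="true" is an assumption about what the asker wants. Because client is declared with xmlns="", its tag and those of its children carry no namespace.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document from the question.
xml_text = """<html><body>
<soapenv:envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soapenv:body>
    <getservicebyidresponse xmlns="http://www.something.com/soa/2.0/SRIManagement">
      <client xmlns="">
        <clientid> 405965216 </clientid>
        <firstname xsi:nil="true"></firstname>
        <lastname> Last Name </lastname>
      </client>
    </getservicebyidresponse>
  </soapenv:body>
</soapenv:envelope>
</body></html>"""

root = ET.fromstring(xml_text)

# ElementTree stores namespaced attribute names as "{uri}name".
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"

# <client xmlns=""> resets the default namespace, so a plain tag name matches.
client = root.find(".//client")

values = {}
for field in client:
    if field.get(XSI_NIL) == "true":
        continue  # assumption: nil fields are not of interest
    values[field.tag] = (field.text or "").strip()

print(values)
```

Looking elements up by tag keeps working if the service adds or reorders fields, which index-based access would not.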

Find Parent of specific Child

Is it possible with the package xml.etree to find the parent of a child? For example:
<ELEMENTS>
<CONSTANT-SPECIFICATION>
</CONSTANT-SPECIFICATION>
</ELEMENTS>
<ELEMENTS>
<DATA-SPECIFICATION>
</DATA-SPECIFICATION>
</ELEMENTS>
I am looking for the ELEMENTS object that contains the child CONSTANT-SPECIFICATION.
You can use the .//ELEMENTS[CONSTANT-SPECIFICATION] XPath expression, for example:
import xml.etree.ElementTree as ET
data = """<?xml version="1.0" encoding="ISO-8859-1"?>
<ROOT>
<ELEMENTS>
<CONSTANT-SPECIFICATION>
</CONSTANT-SPECIFICATION>
</ELEMENTS>
<ELEMENTS>
<DATA-SPECIFICATION>
</DATA-SPECIFICATION>
</ELEMENTS>
</ROOT>
"""
root = ET.fromstring(data)
# XML tag names are case-sensitive, so the path must match the upper-case tags.
print(root.find('.//ELEMENTS[CONSTANT-SPECIFICATION]'))
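When the child element is already in hand and its parent is needed, xml.etree offers no parent pointer on elements. A common workaround is to build a child-to-parent mapping once, as in this sketch; the name parent_map is only illustrative, and the sample is a shortened form of the data above.

```python
import xml.etree.ElementTree as ET

data = """<ROOT>
  <ELEMENTS><CONSTANT-SPECIFICATION/></ELEMENTS>
  <ELEMENTS><DATA-SPECIFICATION/></ELEMENTS>
</ROOT>"""
root = ET.fromstring(data)

# Map every child element to its parent by walking the tree once.
parent_map = {child: parent for parent in root.iter() for child in parent}

child = root.find(".//CONSTANT-SPECIFICATION")
parent = parent_map[child]

# Cross-check against the predicate-based lookup.
same = parent is root.find(".//ELEMENTS[CONSTANT-SPECIFICATION]")
print(parent.tag, same)
```

The mapping reflects the tree at the time it was built, so it needs rebuilding after elements are added or removed.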

python remove element containing namespace

I am trying to remove an element in an xml which contains a namespace.
Here is my code:
templateXml = """<?xml version="1.0" encoding="UTF-8"?>
<Metadata xmlns="http://www.amazon.com/UnboxMetadata/v1">
<Movie>
<CountryOfOrigin>US</CountryOfOrigin>
<TitleInfo>
<Title locale="en-GB">The Title</Title>
<Actor>
<ActorName locale="en-GB">XXX</ActorName>
<Character locale="en-GB">XXX</Character>
</Actor>
</TitleInfo>
</Movie>
</Metadata>"""
from lxml import etree
tree = etree.fromstring(templateXml)
namespaces = {'ns':'http://www.amazon.com/UnboxMetadata/v1'}
for checkActor in tree.xpath('//ns:Actor', namespaces=namespaces):
    etree.strip_elements(tree, 'ns:Actor')
In my actual XML I have lots of tags, so I am trying to search for the Actor tags that contain XXX and completely remove each such tag and its contents. But it's not working.
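Use the remove() method on the element's parent:
for checkActor in tree.xpath('//ns:Actor', namespaces=namespaces):
    checkActor.getparent().remove(checkActor)
print(etree.tostring(tree, pretty_print=True, xml_declaration=True).decode())
On Python 3, lxml rejects a str that carries an encoding declaration, so parse templateXml.encode('utf-8') instead of templateXml.
This prints: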
<?xml version='1.0' encoding='ASCII'?>
<Metadata xmlns="http://www.amazon.com/UnboxMetadata/v1">
<Movie>
<CountryOfOrigin>US</CountryOfOrigin>
<TitleInfo>
<Title locale="en-GB">The Title</Title>
</TitleInfo>
</Movie>
</Metadata>
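The loop above removes every Actor element. To remove only those whose ActorName is XXX, as the question describes, a filter can be added. The sketch below uses the standard-library xml.etree.ElementTree instead of lxml so that it runs without extra packages; since those elements have no getparent(), it walks the possible parents instead. The second actor, "Someone Else", is a made-up entry added to show that non-matching actors are kept.

```python
import xml.etree.ElementTree as ET

NS = {"ns": "http://www.amazon.com/UnboxMetadata/v1"}

# "Someone Else" is a hypothetical second actor, not part of the original XML.
xml_text = """<Metadata xmlns="http://www.amazon.com/UnboxMetadata/v1">
  <Movie>
    <TitleInfo>
      <Title locale="en-GB">The Title</Title>
      <Actor><ActorName locale="en-GB">XXX</ActorName></Actor>
      <Actor><ActorName locale="en-GB">Someone Else</ActorName></Actor>
    </TitleInfo>
  </Movie>
</Metadata>"""
root = ET.fromstring(xml_text)

# Snapshot the elements first, then inspect each one's direct Actor children.
for parent in list(root.iter()):
    for actor in parent.findall("ns:Actor", NS):
        if actor.findtext("ns:ActorName", namespaces=NS) == "XXX":
            parent.remove(actor)

remaining = [
    actor.findtext("ns:ActorName", namespaces=NS)
    for actor in root.iterfind(".//ns:Actor", NS)
]
print(remaining)
```

With lxml the same condition could be placed inside the original loop before calling getparent().remove().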
