Python: Remove an element but not its children from XML?

I want to remove an element but keep its children. I tried the code below, but it removes the children as well.
code
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for item in root.findall('item'):
    root.remove(item)
print(ET.tostring(root))
>>> <root>
</root>
test.xml
<?xml version="1.0" ?>
<root>
<item>
<data>
<number>01</number>
<step>one</step>
</data>
</item>
</root>
expected outcome
<?xml version="1.0" ?>
<root>
<data>
<number>01</number>
<step>one</step>
</data>
</root>

You should move all children of item to root before removing it:
for item in root.findall('item'):
    # Re-attach each child to root, then drop the now-redundant wrapper.
    for child in item:
        root.append(child)
    root.remove(item)
print(ET.tostring(root))
The code results in:
<root>
<data>
<number>01</number>
<step>one</step>
</data>
</root>
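Appending moves the children to the end of root, which changes their order when item has siblings. A minimal sketch of an in-place variant follows. The helper name unwrap and the sample document are illustrative assumptions, not part of the original answer, and any text or tail attached to the removed element is dropped.

```python
import xml.etree.ElementTree as ET


def unwrap(parent, element):
    """Replace `element` inside `parent` with the element's own children.

    Hypothetical helper for illustration; text/tail of `element` is discarded.
    """
    index = list(parent).index(element)
    children = list(element)
    parent.remove(element)
    # Insert the children where the removed element used to be.
    for offset, child in enumerate(children):
        parent.insert(index + offset, child)


root = ET.fromstring(
    "<root><a/><item><data><number>01</number></data></item><b/></root>"
)
for item in root.findall("item"):
    unwrap(root, item)
print(ET.tostring(root, encoding="unicode"))
```

With this approach data ends up between a and b rather than after b.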

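Find the element with the data tag, remove it, and extend its parent with its children. Note that this unwraps data rather than item; to get the outcome expected in the question, test for the "item" tag instead.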
import xml.etree.ElementTree as etree
data = """
<root>
<item>
<data>
<number>01</number>
<step>one</step>
</data>
</item>
</root>
"""
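def iterparent(tree):
    # Yield (parent, child) pairs; ElementTree elements do not store a parent reference.
    for parent in tree.iter():
        for child in parent:
            yield parent, child
tree = etree.fromstring(data)
# Materialize the pairs first so the tree is not modified while it is being iterated.
for parent, child in list(iterparent(tree)):
    if child.tag == "data":
        parent.remove(child)
        parent.extend(child)
print(etree.tostring(tree))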
will output
<root>
<item>
<number>01</number>
<step>one</step>
</item>
</root>
Adapted from a similar answer for your particular use case.

Related

Get xml value of ElementTree Element

I would like to get the xml value of an element in ElementTree. For example, if I had the code:
<?xml version="1.0" encoding="UTF-8"?>
<item>
<child>asd</child>
hello world
<ch>jkl</ch>
</item>
It would get me
<child>asd</child>
hello world
<ch>jkl</ch>
Here's what I tried so far:
import xml.etree.ElementTree as ET
root = ET.fromstring("""<?xml version="1.0" encoding="UTF-8"?>
<item>
<child>asd</child>
hello world
<ch>jkl</ch>
</item>""")
print(root.text)
Try
print(ET.tostring(root.find('.//child')).decode(),ET.tostring(root.find('.//ch')).decode())
Or, more readable:
elems = ['child','ch']
for elem in elems:
    print(ET.tostring(root.find(f'.//{elem}')).decode())
The output, based on the xml in your question, should be what you're looking for.
Building on Jack Fleeting's answer, I created a solution I feel is more general, not just relating to the xml I inserted.
import xml.etree.ElementTree as ET
root = ET.fromstring("""<?xml version="1.0" encoding="UTF-8"?>
<item>
<child>asd</child>
hello world
<ch>jkl</ch>
</item>""")
for elem in root:
    print(ET.tostring(root.find(f'.//{elem.tag}')).decode())
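Looking children up by tag returns the first match each time, so repeated tags would print the same element twice. As a hedged alternative, the sketch below builds the inner XML as one string: the element's leading text followed by each serialized child (tostring includes the child's tail text). The helper name inner_xml is an assumption for illustration.

```python
import xml.etree.ElementTree as ET


def inner_xml(element):
    """Return the markup between an element's start and end tags as a string."""
    # element.text is the text before the first child (None if absent).
    parts = [element.text or ""]
    # tostring() serializes each child together with its tail text.
    parts.extend(ET.tostring(child, encoding="unicode") for child in element)
    return "".join(parts)


root = ET.fromstring("<item><child>asd</child>hello world<ch>jkl</ch></item>")
print(inner_xml(root))
```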

Python parse XML content of an element when there is a child element

I have an XML file as below:
<?xml version="1.0" encoding="UTF-8"?>
<data>
<text>
I have <num1>two</num1> apples and <num2>four</num2> mangoes
</text>
</data>
I want to parse the file, get the whole content of text including its child elements, and assign it to the variable sentence:
sentence = "I have two apples and four mangoes"
How can I do that using Python ElementTree?
xml = """
<?xml version="1.0" encoding="UTF-8"?>
<data>
<text>
I have <num1>two</num1> apples and <num2>four</num2> mangoes
</text>
</data>
"""
from xml.etree import ElementTree as ET
x_data = ET.fromstring(xml.strip())
all_text = list(x_data.findall(".//text")[0].itertext())
print(" ".join([text.strip() for text in all_text]))
Iterate over the text of the parent node with itertext(), then process the pieces as needed.
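Joining the stripped fragments with spaces would insert a space wherever an inline element sits inside a word. A small variation, sketched here under the assumption that collapsing all whitespace runs is acceptable, concatenates the fragments first and normalizes afterwards:

```python
import xml.etree.ElementTree as ET

xml = """<data>
<text>
I have <num1>two</num1> apples and <num2>four</num2> mangoes
</text>
</data>"""

text_elem = ET.fromstring(xml).find("text")
# Concatenate every text fragment, then collapse runs of whitespace.
sentence = " ".join("".join(text_elem.itertext()).split())
print(sentence)
```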

How to get specific block(group) based on child node's value in XPath from the XML?

I am new to XPath and have the following XML file:
<?xml version='1.0' encoding='utf-8'?>
<items>
<item>
<country>India</country>
<referenceId>IN375TP</referenceId>
<price>400</price>
</item>
<item>
<country>Australia</country>
<referenceId>AU120ED</referenceId>
<price>15</price>
</item>
<item>
<country>United Kingdom</country>
<referenceId>UK862RB</referenceId>
<price>20</price>
</item>
</items>
I want the following <item> tag as an output:
<item>
<country>Australia</country>
<referenceId>AU120ED</referenceId>
<price>15</price>
</item>
Note: Please use condition like /items/item[referenceId/text()="AU120ED"]
If you want to find the item by country, you can use an XPath expression that selects the item in items whose country child has the given text:
from lxml.etree import parse, HTMLParser
xml = parse("check.xml",HTMLParser())
print(xml.find("//items//item[country='Australia']"))
<Element item at 0x7f40faa28950>
If you actually want to search by referenceId, just change the predicate to item[referenceid='AU120ED'] (the tag name is lowercase here because HTMLParser lowercases element names):
print(xml.find("//items//item[referenceid='AU120ED']"))
<Element item at 0x7f02c0c24998>
With the standard library's xml.etree, tag names keep their original case:
from xml.etree import ElementTree as et
xml = et.parse("check.xml")
print(xml.find(".").find("./item[referenceId='AU120ED']"))
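To get the matching block as markup rather than an Element repr, the result can be serialized. The following is a self-contained sketch using only the standard library; the inline sample is a shortened copy of the question's data, used instead of check.xml so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

xml = """<items>
  <item><country>India</country><referenceId>IN375TP</referenceId><price>400</price></item>
  <item><country>Australia</country><referenceId>AU120ED</referenceId><price>15</price></item>
</items>"""

root = ET.fromstring(xml)
# [referenceId='AU120ED'] keeps items that have a referenceId child with that exact text.
match = root.find("./item[referenceId='AU120ED']")
if match is not None:
    print(ET.tostring(match, encoding="unicode"))
```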

Find Parent of specific Child

Is it possible with the package xml.etree to find the parent of a child? For example:
<ELEMENTS>
<CONSTANT-SPECIFICATION>
</CONSTANT-SPECIFICATION>
</ELEMENTS>
<ELEMENTS>
<DATA-SPECIFICATION>
</DATA-SPECIFICATION>
</ELEMENTS>
I am looking for the ELEMENTS element that contains the child CONSTANT-SPECIFICATION.
You can use the XPath expression .//ELEMENTS[CONSTANT-SPECIFICATION], for example:
import xml.etree.ElementTree as ET
data = """<?xml version="1.0" encoding="ISO-8859-1"?>
<ROOT>
<ELEMENTS>
<CONSTANT-SPECIFICATION>
</CONSTANT-SPECIFICATION>
</ELEMENTS>
<ELEMENTS>
<DATA-SPECIFICATION>
</DATA-SPECIFICATION>
</ELEMENTS>
</ROOT>
"""
root = ET.fromstring(data)
# XML tag names are case-sensitive, so the path must match the document's casing.
print(root.find('.//ELEMENTS[CONSTANT-SPECIFICATION]'))
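When the child is already in hand and its parent is needed, a common workaround is a child-to-parent dictionary, since xml.etree elements keep no reference to their parent. This is a sketch with a condensed version of the sample document; the name parent_map is just an illustrative choice.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<ROOT>"
    "<ELEMENTS><CONSTANT-SPECIFICATION/></ELEMENTS>"
    "<ELEMENTS><DATA-SPECIFICATION/></ELEMENTS>"
    "</ROOT>"
)
# Build a child -> parent lookup once; Element objects do not store their parent.
parent_map = {child: parent for parent in root.iter() for child in parent}

child = root.find(".//CONSTANT-SPECIFICATION")
parent = parent_map[child]
print(parent.tag)
```

The map has to be rebuilt if the tree is modified afterwards.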

Accessing the values of an element while parsing XML in Python

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<person>
<first-name>First_Name</first-name>
<last-name>Last_Name</last-name>
<headline>Headline</headline>
<location>
<name>Some_city, STATE </name>
<country>
<code>us</code>
</country>
</location>
</person>
I'm trying to access First_Name, Last_Name, Headline and Some_city, STATE
So far I have:
import xml.etree.ElementTree as ET
tree = ET.parse(data)
root = tree.getroot()
for child in root:
    print(child)
Which prints out:
<Element 'first-name' at 0x110726b10>
<Element 'last-name' at 0x110726b50>
<Element 'headline' at 0x110726b90>
<Element 'location' at 0x110726bd0>
How can I access the value of 'first-name'?
Use the .text attribute:
for child in root:
    print(child.text)
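For location the loop prints only whitespace, because the city name sits in a nested name element. A hedged sketch using findtext with simple paths reaches all four values directly; the XML is inlined here so the snippet is self-contained.

```python
import xml.etree.ElementTree as ET

xml = """<person>
  <first-name>First_Name</first-name>
  <last-name>Last_Name</last-name>
  <headline>Headline</headline>
  <location>
    <name>Some_city, STATE </name>
    <country><code>us</code></country>
  </location>
</person>"""

root = ET.fromstring(xml)
# findtext() returns the text of the first matching element, or the default.
first_name = root.findtext("first-name")
last_name = root.findtext("last-name")
headline = root.findtext("headline")
# The city is nested one level down; strip the trailing space in the data.
city = root.findtext("location/name", default="").strip()
print(first_name, last_name, headline, city)
```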
