Unable to remove element/node using ElementTree - python

I have an issue with ElementTree that I can't quite figure out. I've read all of its documentation as well as all the information I could find on this forum. I have a couple of elements/nodes that I am trying to remove using ElementTree. I don't get any errors with the following code, but when I look at the output file I wrote the changes to, the elements/nodes that I expected to be removed are still there. I have a document that looks like this:
<data>
<config>
<script filename="test1.txt"></script>
<documentation filename="test2.txt"></script>
</config>
</data>
My code looks as follows:
import os
import xml.etree.ElementTree as ElementTree
xmlTree = ElementTree.parse(os.path.join(sourcePath, "test.xml"))
xmlRoot = xmlTree.getroot()
for doc in xmlRoot.findall('documentation'):
    xmlRoot.remove(doc)
xmlTree.write(os.path.join(sourcePath, "testTWO.xml"))
The result is I get the following document:
<data>
<config>
<script filename="test1.txt" />
<documentation filename="test2.txt" />
</config>
</data>
What I need is something more like this. I am not stuck using ElementTree. If there is a better solution with lxml or some other library, I am all ears. I know ElementTree can be a little bit of a pain at times.
<data>
<config>
</config>
</data>

xmlRoot.findall('documentation') in your code didn't find anything, because <documentation> isn't a direct child of the root element <data>. It is actually a direct child of <config>:
"Element.findall() finds only elements with a tag which are direct children of the current element". [19.7.1.3. Finding interesting elements]
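To see the quoted rule in action, here is a small check against a well-formed copy of the sample (self-closing tags are used only to keep it short). A bare tag name matches direct children only; a path through config/, or the .// prefix for "any depth", is needed to reach <documentation>:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data><config>"
    '<script filename="test1.txt"/>'
    '<documentation filename="test2.txt"/>'
    "</config></data>"
)

direct = root.findall("documentation")           # direct children of <data> only
by_path = root.findall("config/documentation")   # explicit child path
anywhere = root.findall(".//documentation")      # descendants at any depth
print(len(direct), len(by_path), len(anywhere))  # 0 1 1
```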
This is one possible way to remove all children of <config> using findall(), given the sample XML you posted (and assuming that the actual XML closes <documentation> with a proper </documentation> tag instead of </script>):
# ...... (xmlTree and xmlRoot are created as in the question)
config = xmlRoot.find('config')
# find all children of config
for doc in config.findall('*'):
    config.remove(doc)
    # print just to make sure the element to be removed is correct
    print(ElementTree.tostring(doc))
# ...... (then write xmlTree out as in the question)
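If the goal is to remove only the <documentation> elements, wherever they sit, one sketch is to walk every element as a potential parent and remove matching direct children. This is a common stdlib idiom because ElementTree elements, unlike lxml's, have no getparent(). The helper name remove_all is hypothetical:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<data>
<config>
<script filename="test1.txt"></script>
<documentation filename="test2.txt"></documentation>
</config>
</data>"""

def remove_all(root, tag):
    """Remove every <tag> element below root, at any depth; return how many."""
    removed = 0
    # Snapshot iter() into a list so we never mutate the tree while walking it.
    for parent in list(root.iter()):
        # findall() returns a list, so removing from parent inside this loop is safe.
        for child in parent.findall(tag):
            parent.remove(child)
            removed += 1
    return removed

root = ET.fromstring(SAMPLE)
count = remove_all(root, "documentation")
print(count)
print(ET.tostring(root, encoding="unicode"))
```

With a parsed file, call remove_all(xmlTree.getroot(), "documentation") and then xmlTree.write(...) as before.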

Related

Reaching xml tags with namespaces in python - jupyter

Hello, I am trying to parse the XML pasted below, but my code doesn't return anything. Can anyone help me, please?
<data>
<row>
<document>
<doc:document xmlns:ed="http://mail.yahoo.com/d/folders/1" xmlns:reslvd="http://mail.yahoo.com/d/folders/1/234" xmlns:ct="http://mail.yahoo.com/d/folders/cool/storybro" xmlns:doc="http://mail.yahoo.com/d/folders/nice/alldone">
<ed:EI>
<ed:SID>9865-346</ed:SID>
</doc:document>
</document>
</row>
</data>
My code is:
x = []
for value in root.findall(".//*[@name='ed:SID']"):
    x.append(value.text)
Your predicate is wrong: //*[@name='ed:SID'] means "find any element that has an attribute called name whose value is ed:SID". Your sample XML doesn't have an element that meets these requirements.
Also, your sample XML is not well-formed: <ed:EI> is never closed, so it should be self-closing, like this: <ed:EI/>.
Also, you have to deal with the namespace: ElementTree stores <ed:SID> under its expanded name, {http://mail.yahoo.com/d/folders/1}SID, not under the ed: prefix.
Finally, if I understand you correctly, this is what you're looking for (the {*} namespace wildcard requires Python 3.8 or later):
for value in root.findall(".//{*}SID"):
    x.append(value.text)
print(x)
The output, given your sample xml, is
['9865-346']
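An alternative that does not rely on the 3.8+ {*} wildcard is to pass findall() a prefix-to-URI mapping. The sketch below uses the URI the sample binds to ed:, with <ed:EI> made self-closing so the document parses; the prefix in the mapping is your choice and does not have to match the document's:

```python
import xml.etree.ElementTree as ET

# The question's sample, with <ed:EI> self-closed so it is well-formed.
SAMPLE = """<data>
<row>
<document>
<doc:document xmlns:ed="http://mail.yahoo.com/d/folders/1"
              xmlns:reslvd="http://mail.yahoo.com/d/folders/1/234"
              xmlns:ct="http://mail.yahoo.com/d/folders/cool/storybro"
              xmlns:doc="http://mail.yahoo.com/d/folders/nice/alldone">
<ed:EI/>
<ed:SID>9865-346</ed:SID>
</doc:document>
</document>
</row>
</data>"""

root = ET.fromstring(SAMPLE)

# Map a prefix to the namespace URI declared in the document.
namespaces = {"ed": "http://mail.yahoo.com/d/folders/1"}
x = [value.text for value in root.findall(".//ed:SID", namespaces)]
print(x)  # ['9865-346']

# Equivalent without a mapping: spell out the expanded {uri}local name.
same = [v.text for v in root.findall(".//{http://mail.yahoo.com/d/folders/1}SID")]
```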

How do I search for a tag in an XML file using ElementTree when I have prefixes (Python)

I just started learning Python and have to write a program that parses xml files.
I have multiple entries as seen below and I need, as a starting point, to return all the different d:Name entries in a list.
Unfortunately, I can't manage to use findall with prefixes.
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
lst = tree.findall('.//{d}Name')
I read that if d is a prefix, I need to use the URI instead of the prefix. But I don't understand which URI that is in my case, or how to make a successful search in the following file.
I have an XML that looks like this (simplified):
<feed xml:base="http://projectserver/ps/_api/">
<entry>
<id>
http://projectserver/ps/_api/ProjectServer/EnterpriseResources('some id...')
</id>
<content type="application/xml">
<m:properties>
<d:Name>
WHAT I NEED
</d:Name>
</m:properties>
</content>
</entry>
<entry>
...
This bypassed my problem so thank you!
If you are using Python 3.8 or later, this post may help: link – Jim Rhodes
So I ran the following, which printed every tag in the document. Among them I found the expanded {URI}Name form, which I then used to do the search properly.
for elem in tree.iter():
    print(elem.tag)
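Instead of reading the URI off the printed tags, you can ask the parser for the document's xmlns declarations with iterparse()'s "start-ns" event and pass them straight to findall(). The simplified XML in the question omits its declarations, so the urn:example:... URIs below are placeholders; a real feed supplies its own, and collect_namespaces is a hypothetical helper name:

```python
import io
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions); a real feed declares its own.
SAMPLE = b"""<feed xml:base="http://projectserver/ps/_api/"
      xmlns="urn:example:feed"
      xmlns:m="urn:example:metadata"
      xmlns:d="urn:example:data">
  <entry>
    <content type="application/xml">
      <m:properties><d:Name>WHAT I NEED</d:Name></m:properties>
    </content>
  </entry>
  <entry>
    <content type="application/xml">
      <m:properties><d:Name>SECOND NAME</d:Name></m:properties>
    </content>
  </entry>
</feed>"""

def collect_namespaces(source):
    """Return {prefix: uri} from the document's xmlns declarations (first wins)."""
    namespaces = {}
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        namespaces.setdefault(prefix, uri)  # the default namespace has prefix ""
    return namespaces

namespaces = collect_namespaces(io.BytesIO(SAMPLE))
root = ET.fromstring(SAMPLE)
names = [elem.text.strip() for elem in root.findall(".//d:Name", namespaces)]
print(namespaces["d"])
print(names)
```

With a file on disk, pass the filename to collect_namespaces() and use ET.parse() for the tree.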

Better way to find interactive deeper element tag in xml?

I want to find the deepest (last) XML tag iteratively. I found some other questions, but they all show a fixed way to find it. I always want to add each new element to the last tag.
from xml.etree.ElementTree import Element
root = Element('soap:Envelope', {"xmlns:soap":"http://www.w3.org/2003/05/soap-envelope", "xmlns:aut":"Automidia"})
sub_elementos = [Element("soap:Body"),
                 Element("information", {"token":"ABC"}),
                 Element("data"),
                 Element("value")]
for elemento in sub_elementos:
    list(root.iter())[-1].append(elemento)  # This is the way I've found
I saw in the ElementTree documentation that findall() supports XPath for navigating XML easily. I want to know how I can use it to find the last element with the last() function, instead of list(root.iter())[-1] as written in my code above, which in my opinion hurts readability. Any ideas how I could achieve this?
This is my final output:
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:aut="Automidia">
<soap:Body>
<information token="ABC">
<data>
<value/>
</data>
</information>
</soap:Body>
</soap:Envelope>
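Something like this: keep a reference to the element you just created and make it the parent of the next one.
import xml.etree.ElementTree as ET
# dicts keep insertion order (Python 3.7+), so the nesting follows this order
tree_elements = {'body':{}, 'info':{'token':'ABC'}, 'data':{}, 'value':{}}
tree = ET.Element('root')
root = tree
for ele, ele_attrs in tree_elements.items():
    # SubElement returns the new child, which becomes the parent of the next one
    root = ET.SubElement(root, ele, ele_attrs)
ET.dump(tree)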
output
<root><body><info token="ABC"><data><value /></data></info></body></root>
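Applied to the question's SOAP envelope, the same idea can be sketched with a real namespace instead of literal "soap:" tag names: register the URI under the soap prefix and use {uri}local tags, so ElementTree writes the xmlns:soap declaration itself. The chain list and variable names are illustrative choices:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
ET.register_namespace("soap", SOAP_NS)  # serialize this URI with the soap: prefix

# (tag, attributes) pairs, outermost first; each becomes a child of the previous one.
chain = [
    (f"{{{SOAP_NS}}}Body", {}),
    ("information", {"token": "ABC"}),
    ("data", {}),
    ("value", {}),
]

envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
deepest = envelope
for tag, attrs in chain:
    # SubElement returns the new child, so `deepest` always points at the last one added.
    deepest = ET.SubElement(deepest, tag, attrs)

xml_text = ET.tostring(envelope, encoding="unicode")
print(xml_text)
```

The question's xmlns:aut declaration does not appear in this output: ElementTree only writes declarations for namespaces that tags or attributes actually use.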

Python add new element by xml ElementTree

XML file
<?xml version="1.0" encoding="utf-8"?>
<Info xmlns="BuildTest">
<RequestDate>5/4/2020 12:27:46 AM</RequestDate>
</Info>
I want to add a new element inside the Info tag.
Here is what I did.
import xml.etree.ElementTree as ET
tree = ET.parse('example.xml')
root = tree.getroot()
ele = ET.Element('element1')
ele.text = 'ele1'
root.append(ele)
tree.write("output.xhtml")
Output
<ns0:Info xmlns:ns0="BuildTest">
<ns0:RequestDate>5/4/2020 12:27:46 AM</ns0:RequestDate>
<element1>ele1</element1></ns0:Info>
Three questions:
The <?xml version="1.0" encoding="utf-8"?> is missing.
The namespace is wrong.
The whitespace of the new element is gone.
I saw many questions on this topic; most of them suggest other packages.
Is there any way to handle this properly with ElementTree?
Processing instructions are not XML elements (and <?xml ...?>, strictly the XML declaration, only looks like one). Searching for "are processing instructions part of an XML" turns up this statement as the first result:
Processing instructions are markup, but they're not elements.
Since the package you are using is literally called ElementTree, you can reasonably expect its objects to be a tree of elements. If I remember correctly, DOM-compliant XML packages can support non-element markup in XML.
For the namespace issue, the answer is on Stack Overflow, at "Remove ns0 from XML": you just have to register the namespace you specified in the top element of your document before writing the tree. The following worked for me:
ET.register_namespace("", "BuildTest")
As for the whitespace: the new element has no whitespace around it. You can assign to its tail member to add a line feed after the element.
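All three points can be combined in one sketch, assuming Python 3.9+ for ET.indent(). It registers BuildTest as the default namespace, creates the new element in that same namespace (otherwise the tree holds an un-namespaced element1 that a reparse would place in BuildTest), re-indents the tree, and asks write() for the declaration. Writing to a BytesIO instead of output.xhtml is only for the demonstration:

```python
import io
import xml.etree.ElementTree as ET

NS = "BuildTest"
# Register before writing, so the output uses xmlns="BuildTest" instead of ns0:.
ET.register_namespace("", NS)

SOURCE = """<?xml version="1.0" encoding="utf-8"?>
<Info xmlns="BuildTest">
<RequestDate>5/4/2020 12:27:46 AM</RequestDate>
</Info>"""

tree = ET.parse(io.BytesIO(SOURCE.encode("utf-8")))
root = tree.getroot()

# Put the new element in the same namespace as its parent.
ele = ET.SubElement(root, f"{{{NS}}}element1")
ele.text = "ele1"

ET.indent(tree, space="  ")  # Python 3.9+: rewrites whitespace-only text/tail

out = io.BytesIO()
tree.write(out, encoding="utf-8", xml_declaration=True)
result = out.getvalue().decode("utf-8")
print(result)
```

Note that write() emits the declaration with single quotes (<?xml version='1.0' encoding='utf-8'?>), which is equivalent XML but not byte-identical to the input.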

Python: Read and write namespaced XML using ElementTree

This XML file is named example.xml:
<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>14.0.0</modelVersion>
<groupId>.com.foobar.flubber</groupId>
<artifactId>uberportalconf</artifactId>
<version>13-SNAPSHOT</version>
<packaging>pom</packaging>
<name>Environment for UberPortalConf</name>
<description>This is the description</description>
<properties>
<birduberportal.version>11</birduberportal.version>
<promotiondevice.version>9</promotiondevice.version>
<foobarportal.version>6</foobarportal.version>
<eventuberdevice.version>2</eventuberdevice.version>
</properties>
<!-- A lot more here, but as it is irrelevant for the problem I have removed it -->
</project>
If I load the example.xml file above using ElementTree and print the root node:
>>> from xml.etree import ElementTree
>>> tree = ElementTree.parse('example.xml')
>>> print(tree.getroot())
<Element '{http://maven.apache.org/POM/4.0.0}project' at 0x26ee0f0>
I see that the element's tag also contains the namespace http://maven.apache.org/POM/4.0.0.
How do I:
Get the foobarportal.version text, increase it by one, and write the XML file back, keeping the namespace the document had when loaded and without changing the overall XML layout.
Get it to load using any namespace, not just http://maven.apache.org/POM/4.0.0. I still don't want to strip the namespace, as I want the XML to stay the same except for the changed foobarportal.version, as in 1 above.
My current approach is not XML-aware, but it fulfills 1 and 2 above:
Grep for <foobarportal.version>(.*)</foobarportal.version>
Take the contents of the match group and increase it by one.
Write it back.
It would be nice to have an XML aware solution, as it would be more robust. The XML namespace handling of ElementTree is making it more complicated.
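If your question is simply "how do I search by a namespaced element name", then the answer is that ElementTree (and lxml) understand the {namespace} syntax. Since getroot() already returns the project element, search below it:
tree.getroot().find('{http://maven.apache.org/POM/4.0.0}properties/{http://maven.apache.org/POM/4.0.0}foobarportal.version')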
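For the full round trip (points 1 and 2), one XML-aware sketch reads the namespace from the root tag instead of hard-coding it, registers it as the default so it is written back without ns0:, and keeps comments via TreeBuilder(insert_comments=True) (Python 3.8+; by default ElementTree drops them). The trimmed sample and the bump_version name are illustrative. Whitespace inside the root survives, but the layout is not guaranteed byte-identical: the declaration is rewritten, for example.

```python
import io
import re
import xml.etree.ElementTree as ET

SOURCE = """<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <version>13-SNAPSHOT</version>
  <properties>
    <foobarportal.version>6</foobarportal.version>
  </properties>
  <!-- A comment that should survive the round trip -->
</project>"""

def bump_version(source, tag_name):
    # Keep comments in the tree (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(source, parser)
    root = tree.getroot()

    # "{http://maven.apache.org/POM/4.0.0}project" -> that URI; works for any namespace.
    match = re.match(r"\{(.*)\}", root.tag)
    ns = match.group(1) if match else ""
    prefix = f"{{{ns}}}" if ns else ""
    if ns:
        ET.register_namespace("", ns)  # write it back as the default namespace

    elem = root.find(f".//{prefix}{tag_name}")  # assumes the element exists
    elem.text = str(int(elem.text) + 1)

    out = io.BytesIO()
    tree.write(out, encoding="utf-8", xml_declaration=True)
    return out.getvalue().decode("utf-8")

result = bump_version(io.BytesIO(SOURCE.encode("utf-8")), "foobarportal.version")
print(result)
```

For a real file, pass the path (e.g. "example.xml") instead of the BytesIO and write the returned text back to disk.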
