Editing a specific node with libxml2 in Python

I use Python with libxml2 for a web application.
I need to add a child node to one of the articles in an XML document with this structure:
<?xml version="1.0" encoding="UTF-8"?>
<archive>
<article>text<span>tag</span></article>
<article>text2<span>tag2</span></article>
<article>text3<span>tag3</span></article>
</archive>
The result must be something like this:
<?xml version="1.0" encoding="UTF-8"?>
<archive>
<article>text<span>tag</span></article>
<article>text2<span>tag2</span><counter count='1'/></article>
<article>text3<span>tag3</span></article>
</archive>
Sorry for the bad English.
Edit: the web client will specify when I need to add the counter node.
My problem is that I can't find a way to select the node I need to modify.
I can modify an XML document by adding a child to the root element, but I haven't found a way to do that for elements other than the root.

If you can narrow down the XPath results to a single node then this code will work:
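#!/usr/bin/env python
import libxml2
doc = libxml2.parseFile("foo.xml")
# XPath positions are 1-based, so article[2] is the second <article>
nodes = doc.xpathEval('/archive/article[2]')
# newChild(namespace, name, content) appends a new child element
newElement = nodes[0].newChild(None, 'counter', None)
newElement.newProp('count', '1')
print(nodes[0].serialize())
# libxml2 documents are not garbage collected; free them explicitly
doc.freeDoc()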
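
Since the web client decides which article gets the counter, the position can be passed in rather than hard-coded. The sketch below shows the same edit using the standard library's xml.etree.ElementTree instead of libxml2 (a swapped-in library, chosen only so the example runs without extra packages). The add_counter helper and its parameters are hypothetical names, not part of either library.

```python
import xml.etree.ElementTree as ET

ARCHIVE_XML = """<archive>
<article>text<span>tag</span></article>
<article>text2<span>tag2</span></article>
<article>text3<span>tag3</span></article>
</archive>"""


def add_counter(root, position, count=1):
    """Append <counter count="..."/> to the article at 1-based `position`.

    Hypothetical helper for illustration only.
    """
    # ElementTree's XPath subset supports positional predicates such as [2]
    article = root.find("article[%d]" % position)
    if article is None:
        raise IndexError("no article at position %d" % position)
    ET.SubElement(article, "counter", {"count": str(count)})
    return article


root = ET.fromstring(ARCHIVE_XML)
article = add_counter(root, 2)
print(ET.tostring(article, encoding="unicode"))
```

With libxml2 the equivalent selection would be an XPath expression built from the requested position, as in the answer above.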

Related

Better way to iteratively find the deepest element in XML?

I want to find the last, deepest XML tag iteratively. I found some other questions, but they all give a fixed way to find it. I want to always add each new element to the last tag.
from xml.etree.ElementTree import Element
root = Element('soap:Envelope', {"xmlns:soap":"http://www.w3.org/2003/05/soap-envelope", "xmlns:aut":"Automidia"})
sub_elementos = [Element("soap:Body"),
                 Element("information", {"token":"ABC"}),
                 Element("data"),
                 Element("value")]
for elemento in sub_elementos:
    list(root.iter())[-1].append(elemento) # This is the way I've found
I saw in the xml ElementTree documentation that there is a findall() method that supports XPath for navigating XML easily. I want to know how I can use it to find the last element with the last() function, instead of list(root.iter())[-1] as in my code above, which in my opinion hurts readability. Any ideas how I could achieve this?
This is my final output:
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:aut="Automidia">
<soap:Body>
<information token="ABC">
<data>
<value/>
</data>
</information>
</soap:Body>
</soap:Envelope>
Something like this:
import xml.etree.ElementTree as ET
# dicts keep insertion order (Python 3.7+), so the nesting follows this order
tree_elements = {'body':{}, 'info':{'token':'ABC'}, 'data':{}, 'value':{}}
tree = ET.Element('root')
root = tree
for ele, ele_attrs in tree_elements.items():
    # each new element becomes the parent of the next one
    root = ET.SubElement(root, ele)
    root.attrib = ele_attrs
ET.dump(tree)
Output:
<root><body><info token="ABC"><data><value /></data></info></body></root>
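
The same idea can be applied to the Element list from the question: rather than searching for the deepest element on every pass, keep a reference to the element appended last. This is a minimal sketch; the variable name current is only illustrative.

```python
from xml.etree.ElementTree import Element, tostring

root = Element("soap:Envelope", {
    "xmlns:soap": "http://www.w3.org/2003/05/soap-envelope",
    "xmlns:aut": "Automidia",
})
sub_elementos = [
    Element("soap:Body"),
    Element("information", {"token": "ABC"}),
    Element("data"),
    Element("value"),
]

# `current` always points at the innermost element added so far
current = root
for elemento in sub_elementos:
    current.append(elemento)
    current = elemento

xml = tostring(root, encoding="unicode")
print(xml)
```

This avoids rebuilding list(root.iter()) for every element, and it does not depend on the new element being last in document order.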

Python add new element by xml ElementTree

XML file
<?xml version="1.0" encoding="utf-8"?>
<Info xmlns="BuildTest">
<RequestDate>5/4/2020 12:27:46 AM</RequestDate>
</Info>
I want to add a new element inside the Info tag.
Here is what I did.
import xml.etree.ElementTree as ET
tree = ET.parse('example.xml')
root = tree.getroot()
ele = ET.Element('element1')
ele.text = 'ele1'
root.append(ele)
tree.write("output.xhtml")
Output
<ns0:Info xmlns:ns0="BuildTest">
<ns0:RequestDate>5/4/2020 12:27:46 AM</ns0:RequestDate>
<element1>ele1</element1></ns0:Info>
Three questions:
The <?xml version="1.0" encoding="utf-8"?> is missing.
The namespace is wrong.
The whitespace of the new element is gone.
I saw many questions related to this topic; most of them suggest other packages.
Is there any way ElementTree can handle this properly?
The <?xml ...?> line looks like a processing instruction, and processing instructions are not considered XML elements. If you Google "are processing instructions part of an XML", the first result states:
Processing instructions are markup, but they're not elements.
Since the package you are using is literally called ElementTree, you can reasonably expect its objects to be trees of elements. If I remember correctly, DOM-compliant XML packages can support non-element markup in XML.
For the namespace issue, the answer is on Stack Overflow, at "Remove ns0 from XML": you just have to register the namespace you specified in the top element of your document. The namespace string must match exactly, including case. The following worked for me:
ET.register_namespace("", "BuildTest")
As for the whitespace: the new element does not have any whitespace. You can assign to its tail attribute to add a linefeed after the element.
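
Putting the three points together, a minimal sketch might look like the following. It parses the document from an in-memory bytes string and writes to a BytesIO buffer instead of the files from the question (an assumption made so the example is self-contained). The xml_declaration argument of ElementTree.write() is what brings the declaration back.

```python
import io
import xml.etree.ElementTree as ET

SOURCE = b"""<?xml version="1.0" encoding="utf-8"?>
<Info xmlns="BuildTest">
  <RequestDate>5/4/2020 12:27:46 AM</RequestDate>
</Info>"""

# Serialize the BuildTest namespace as the default namespace (no ns0 prefix)
ET.register_namespace("", "BuildTest")

root = ET.fromstring(SOURCE)

# Put the new element in the same namespace as its siblings
ele = ET.SubElement(root, "{BuildTest}element1")
ele.text = "ele1"

# Whitespace lives in .tail: indent before the new element, newline after it
root[0].tail = "\n  "
ele.tail = "\n"

buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
output = buf.getvalue().decode("utf-8")
print(output)
```

To write to a file as in the question, pass a file name to write() instead of the buffer.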

replicate namespace and root element attributes with ElementTree

Trying to replicate the following root element including namespace:
<ns0:StdFX1.3 xmlns:ns0="http://website.com/schemas/StdFX1.3.In"
CutOff="2200LON" DataSource="" SpotDataSource="">
</ns0:StdFX1.3>
Here is my code so far:
import xml.etree.ElementTree as ET
ET.register_namespace("", "http://website.com/schemas/StdFX1.3.In")
top = ET.Element('{http://website.com/schemas/StdFX1.3.In}Stuff')
It only gets me the following, though:
<?xml version='1.0' encoding='UTF-8'?>
< xmlns="http://website.com/schemas/StdFX1.3.In">
I gave up and used string substitution on the serialized output.
ET.tostring(root, encoding="unicode").replace("mangled toplevel namespace", '<ns0:StdFX1.3 xmlns:ns0="http://website.com/schemas/StdFX1.3.In" CutOff="2200LON" DataSource="" SpotDataSource="">')
Likewise for the closing tag. Any other way just wouldn't keep the changes I specified.
You can use the fromstring method to get back to an element tree. I was just submitting the XML, so I didn't need to, and I can't remember whether the round trip keeps your desired changes, but you do get the desired XML.
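
One possible alternative to string substitution, as a sketch: when no prefix is registered for a namespace, ElementTree generates one itself, starting at ns0 for the first unregistered namespace in the serialized tree, and the root attributes can be passed to the Element constructor. This assumes a fresh interpreter in which register_namespace("", ...) has not been called for this URI. The ns0 name is chosen by ElementTree; it cannot be requested through register_namespace, which rejects prefixes of the form ns followed by digits.

```python
import xml.etree.ElementTree as ET

NS = "http://website.com/schemas/StdFX1.3.In"

# Qualified tag name in {uri}local form, plus the plain root attributes
top = ET.Element(
    "{%s}StdFX1.3" % NS,
    {"CutOff": "2200LON", "DataSource": "", "SpotDataSource": ""},
)

xml = ET.tostring(top, encoding="unicode")
print(xml)

# Round trip: the parsed root has the same qualified tag and attributes
parsed = ET.fromstring(xml)
```

An empty element is serialized in the short form (ending in />) rather than with a separate closing tag.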

Unable to remove element/node using ElementTree

I have an issue with ElementTree that I can't quite figure out. I've read all their documentation as well as all the information I could find on this forum. I have a couple elements/nodes that I am trying to remove using ElementTree. I don't get any errors with the following code, but when I look at the output file I wrote the changes to, the elements/nodes that I expected to be removed are still there. I have a document that looks like this:
<data>
<config>
<script filename="test1.txt"></script>
<documentation filename="test2.txt"></script>
</config>
</data>
My code looks as follows:
import os
import xml.etree.ElementTree as ElementTree
xmlTree = ElementTree.parse(os.path.join(sourcePath, "test.xml"))
xmlRoot = xmlTree.getroot()
for doc in xmlRoot.findall('documentation'):
    xmlRoot.remove(doc)
xmlTree.write(os.path.join(sourcePath, "testTWO.xml"))
The result is I get the following document:
<data>
<config>
<script filename="test1.txt" />
<documentation filename="test2.txt" />
</config>
</data>
What I need is something more like this. I am not stuck using ElementTree. If there is a better solution with lxml or some other library, I am all ears. I know ElementTree can be a little bit of a pain at times.
<data>
<config>
</config>
</data>
xmlRoot.findall('documentation') in your code didn't find anything, because <documentation> isn't a direct child of the root element <data>. It is actually a direct child of <config>:
"Element.findall() finds only elements with a tag which are direct children of the current element". [19.7.1.3. Finding interesting elements]
This is one possible way to remove all children of <config> using findall(), given the sample XML you posted (and assuming that the actual XML has the <documentation> element closed with a proper closing tag instead of </script>):
......
config = xmlRoot.find('config')
# find all children of config
for doc in config.findall('*'):
    config.remove(doc)
    # print just to make sure the removed element is the right one
    print(ElementTree.tostring(doc))
......
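
If only the <documentation> elements should go, wherever they sit in the tree, one approach is to walk every potential parent and call remove() on the element that actually owns the child. This is a sketch on an in-memory copy of the sample XML (with the closing tag corrected), not on the files from the question.

```python
import xml.etree.ElementTree as ET

DATA_XML = """<data>
<config>
<script filename="test1.txt"></script>
<documentation filename="test2.txt"></documentation>
</config>
</data>"""

root = ET.fromstring(DATA_XML)

# findall() with a bare tag only looks at direct children of <data>
direct_hits = root.findall("documentation")

# Snapshot the elements first so removal does not disturb the iteration
for parent in list(root.iter()):
    for doc in parent.findall("documentation"):
        parent.remove(doc)

print(ET.tostring(root, encoding="unicode"))
```

Replacing "documentation" with "*" in the inner findall() would remove every child instead, which produces an empty <config> as in the desired output.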

Python: Read and write namespaced XML using ElementTree

This XML file is named example.xml:
<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>14.0.0</modelVersion>
<groupId>.com.foobar.flubber</groupId>
<artifactId>uberportalconf</artifactId>
<version>13-SNAPSHOT</version>
<packaging>pom</packaging>
<name>Environment for UberPortalConf</name>
<description>This is the description</description>
<properties>
<birduberportal.version>11</birduberportal.version>
<promotiondevice.version>9</promotiondevice.version>
<foobarportal.version>6</foobarportal.version>
<eventuberdevice.version>2</eventuberdevice.version>
</properties>
<!-- A lot more here, but as it is irrelevant for the problem I have removed it -->
</project>
If I load the example.xml file above using ElementTree and print the root node:
>>> from xml.etree import ElementTree
>>> tree = ElementTree.parse('example.xml')
>>> print tree.getroot()
<Element '{http://maven.apache.org/POM/4.0.0}project' at 0x26ee0f0>
I see that Element also contains the namespace http://maven.apache.org/POM/4.0.0.
How do I:
1. Get the foobarportal.version text, increase it by one and write the XML file back while keeping the namespace the document had when loaded and also not changing the overall XML layout.
2. Get it to load using any namespace, not just http://maven.apache.org/POM/4.0.0. I still don't want to strip the namespace, as I want the XML to stay the same except for changing foobarportal.version as in 1 above.
The current way is not aware of XML but fulfills 1 and 2 above:
1. Grep for <foobarportal.version>(.*)</foobarportal.version>
2. Take the contents of the match group and increase it by one
3. Write it back.
It would be nice to have an XML aware solution, as it would be more robust. The XML namespace handling of ElementTree is making it more complicated.
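If your question is simply "how do I search by a namespaced element name", then the answer is that both ElementTree and lxml understand the {namespace}tag syntax. Note that find() searches below the element it is called on, so from the root <project> element you look up a descendant, for example:
tree.getroot().find('.//{http://maven.apache.org/POM/4.0.0}foobarportal.version')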
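
A minimal sketch of the whole round trip with ElementTree, under a few assumptions: it uses a shortened in-memory copy of example.xml, it reads the namespace from the root tag so that it is not tied to the Maven URI, and it registers that namespace as the default so the output keeps xmlns="..." instead of an ns0 prefix. ElementTree's default parser drops comments, so the layout is not guaranteed to survive byte for byte.

```python
import xml.etree.ElementTree as ET

POM_XML = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <birduberportal.version>11</birduberportal.version>
    <foobarportal.version>6</foobarportal.version>
  </properties>
</project>"""

root = ET.fromstring(POM_XML)

# A namespaced tag looks like '{uri}project'; extract whatever URI is used
ns_uri = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
if ns_uri:
    # Keep the namespace as the default one when serializing
    ET.register_namespace("", ns_uri)
prefix = "{%s}" % ns_uri if ns_uri else ""

# './/' searches all descendants of the root for the qualified tag
version = root.find(".//%sfoobarportal.version" % prefix)
version.text = str(int(version.text) + 1)

result = ET.tostring(root, encoding="unicode")
print(result)
```

For a file on disk, ElementTree.parse() and tree.write() would take the place of fromstring() and tostring().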
