keep original xml declaration when editing with python


My original XML file looks like this:
<?xml version="1.0" encoding="utf-8"?>
<foo/>
and I want to change it to:
<?xml version="1.0" encoding="utf-8"?>
<foo>
<bar>confusing dev</bar>
</foo>
I am using xml.etree.ElementTree as suggested by this tutorial:
import xml.etree.ElementTree as etree
with open('file.xml', 'r+b') as f:
    tree = etree.parse(f)
    f.seek(0, 0)
    tree.write(f, xml_declaration=True)  # default argument: encoding="us-ascii"
This outputs:
<?xml version='1.0' encoding='us-ascii'?>
<foo/>
How do I get the encoding of file.xml at runtime and pass it as an argument to tree.write? Or is there a better way to edit XML in Python? I just want to change some Element.text values while keeping the declaration and namespaces unchanged.
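One way to carry the original encoding over, sketched here with the standard library only: read the encoding from the declaration in the raw bytes, then pass it back when serializing. The helper names declared_encoding and add_bar, the regular expression, and the UTF-8 fallback are assumptions made for illustration, and the xml_declaration argument of ET.tostring needs Python 3.8 or later. ElementTree writes the declaration with single quotes, so the result is equivalent to the original declaration but not byte-identical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: captures the value of encoding="..." in a leading XML declaration.
DECLARATION = re.compile(rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][\w.\-]*)["']""")


def declared_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or `default` if none is found."""
    match = DECLARATION.match(raw.lstrip(b"\xef\xbb\xbf"))  # skip a UTF-8 BOM, if any
    return match.group(1).decode("ascii") if match else default


def add_bar(raw: bytes) -> bytes:
    """Append <bar>confusing dev</bar> to the root and reuse the declared encoding."""
    encoding = declared_encoding(raw)
    root = ET.fromstring(raw)
    ET.SubElement(root, "bar").text = "confusing dev"
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


original = b'<?xml version="1.0" encoding="utf-8"?>\n<foo/>'
print(add_bar(original).decode("utf-8"))
```

To update the file in place, the returned bytes could be written back with open('file.xml', 'wb'), which also truncates any leftover bytes from a longer original.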

Related

How do I remove a comment outside of the root element of an XML document using python lxml

How do you remove comments above or below the root node of an xml document using python's lxml module? I want to remove only one comment above the root node, NOT all comments in the entire document. For instance, given the following xml document
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- This comment needs to be removed -->
<root>
<!-- This comment needs to STAY -->
<a/>
</root>
I want to output
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<!-- This comment needs to STAY -->
<a/>
</root>
The usual way to remove an element would be element.getparent().remove(element), but this doesn't work for nodes outside the root element since getparent returns None. I also tried the suggestions from this stackoverflow answer, but the first answer (using a parser that removes comments) removes all comments from the document, including the ones I want to keep, and the second answer (adding a dummy opening and closing tag around the document) doesn't work if the document has a directive above the root element.
I can get access to the comment above the root element using the following code, but how do I remove it from the document?
from lxml import etree as ET
tree = ET.parse("./sample_file.xml")
root = tree.getroot()
comment = root.getprevious()
# What do I do with comment now??
I've tried doing the following, but none of them worked:
comment.getparent().remove(comment) says AttributeError: 'NoneType' object has no attribute 'remove'
del comment does nothing
comment.clear() does nothing
comment.text = "" renders an empty comment <!---->
root.remove(comment) says ValueError: Element is not a child of this node.
tree.remove(comment) says AttributeError: 'lxml.etree._ElementTree' object has no attribute 'remove'
tree[:] = [root] says TypeError: 'lxml.etree._ElementTree' object does not support item assignment
Initialize a new tree with tree = ET.ElementTree(root). Serializing this new tree still has the comments somehow.

You could just build another tree by serializing the root element with tostring() and re-parsing the result with fromstring().
from lxml import etree
tree = etree.parse("sample_file.xml")
new_tree = etree.fromstring(etree.tostring(tree.getroot()))
print(etree.tostring(new_tree, xml_declaration=True, encoding="UTF-8", standalone=True).decode())
printed output...
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<root>
<!-- This comment needs to STAY -->
<a/>
</root>
Note: This will also remove any processing instructions before root, so another option is to append the comment to root before removing...
from lxml import etree
tree = etree.parse("sample_file.xml")
root = tree.getroot()
for comment_to_delete in root.xpath("preceding::comment()"):
    root.append(comment_to_delete)
    root.remove(comment_to_delete)
print(etree.tostring(tree, xml_declaration=True, encoding="UTF-8", standalone=True).decode())
This produces the same output as above, but will retain any processing instructions that occur before root.

You can parse an XML file with comments using XMLPullParser.
If your input file looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- This comment needs to be removed -->
<root>
<!-- This comment needs to STAY -->
<a/>
<b>Text</b>
</root>
Parse the file and write it to a new one:
import xml.etree.ElementTree as ET
# Append one line of text to the intermediate output file
def write_delete_xml(text):
    with open('Cleaned.xml', 'a') as my_file:
        my_file.write(f'{text}')
with open('Remove_Comment.xml', 'r', encoding='utf-8') as xml:
    feedstring = xml.readlines()
parser = ET.XMLPullParser(['start', 'end', 'comment'])
for line in enumerate(feedstring):
    # Copy the XML declaration (first line) unchanged
    if line[0] == 0 and line[1].startswith('<?'):
        write_delete_xml(line[1])
    parser.feed(line[1])
    for event, elem in parser.read_events():
        # Keep every comment except the one on the second line (index 1)
        if event == "comment" and line[0] != 1:
            write_delete_xml(line[1])
        if event == "start" and r'\>' not in line[1]:
            write_delete_xml(f"{line[1]}")
        if event == "end":
            write_delete_xml(f"{line[1]}")
# Remove duplicate lines (a line is written once per event it produced)
xml_list = []
with open('Cleaned.xml', 'rb') as xml:
    lines = xml.readlines()
    for line in lines:
        if line not in xml_list:
            xml_list.append(line)
with open('Cleaned_final.xml', 'wb') as my_file:
    for line in xml_list:
        my_file.write(line)
print('Cleaned.xml')
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<!-- This comment needs to STAY -->
<a/>
<b>Text</b>
</root>
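A shorter standard-library alternative, offered as a sketch rather than a drop-in replacement: since Python 3.8, ET.TreeBuilder(insert_comments=True) keeps comments that appear inside the root element and does not insert those outside it, so parsing and re-serializing drops only the outer comment. The function name strip_outer_comments is hypothetical. Processing instructions outside the root are dropped as well, and the standalone="no" attribute of the declaration is not reproduced.

```python
import xml.etree.ElementTree as ET


def strip_outer_comments(raw: bytes) -> bytes:
    """Re-serialize the document, keeping only comments inside the root element."""
    # insert_comments=True adds comments found inside the root element to the tree;
    # comments before or after the root are not inserted.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(raw, parser=parser)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


sample = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- This comment needs to be removed -->
<root>
<!-- This comment needs to STAY -->
<a/>
<b>Text</b>
</root>"""
print(strip_outer_comments(sample).decode("utf-8"))
```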

How to load xml file with specific paragraph by xml in Python?

I have an XML file structured like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<book>
<toc> <tocdiv pagenum="564">
<title>9thmemo</title>
<tocdiv pagenum="588">
<title>b</title>
</tocdiv>
</tocdiv></toc>
<chapter><title>9thmemo</title>
<para>...</para>
<para>...</para></chapter>
<chapter>...</chapter>
<chapter>...</chapter>
</book>
There are several chapters in <book>...</book>, and each chapter has a title. I only want to read the full content of the chapter titled "9thmemo", not the others.
I tried to read it with the following code:
from xml.dom import minidom
filename = "result.xml"
file = minidom.parse(filename)
chapters = file.getElementsByTagName('chapter')
for i in range(10):
    print(chapters[i])
but this only prints the object address of each chapter.
If I try to access a sub-element such as chapters[i].title, I get an error saying the attribute cannot be found.

I only want to read all content of this chapter,"9thmemo"(not others)
The code in the question never tries to locate the specific chapter. The code below uses an XPath expression with a predicate to find it.
Try the following:
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<book>
<toc>
<tocdiv pagenum="564">
<title>9thmemo</title>
<tocdiv pagenum="588">
<title>b</title>
</tocdiv>
</tocdiv>
</toc>
<chapter>
<title>9thmemo</title>
<para>A</para>
<para>B</para>
</chapter>
<chapter>...</chapter>
<chapter>...</chapter>
</book>'''
root = ET.fromstring(xml)
chapter = root.find('.//chapter[title="9thmemo"]')
para_data = ','.join(p.text for p in chapter.findall('para'))
print(para_data)
output
A,B
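For readers who want to stay with minidom as in the question, here is a minimal sketch of the same lookup. chapters[i].title fails because DOM nodes expose their children through childNodes and getElementsByTagName, not as Python attributes. The helper names text_of and chapter_paragraphs are hypothetical, and the sample document is a trimmed version of the one in the question.

```python
from xml.dom import minidom


def text_of(node):
    """Concatenate the direct text children of a DOM node."""
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)


def chapter_paragraphs(xml_text, wanted_title):
    """Return the text of every <para> in the chapter whose <title> matches."""
    doc = minidom.parseString(xml_text)
    for chapter in doc.getElementsByTagName("chapter"):
        # Only direct <title> children count, not titles nested deeper.
        titles = [
            node for node in chapter.childNodes
            if node.nodeType == node.ELEMENT_NODE and node.tagName == "title"
        ]
        if titles and text_of(titles[0]).strip() == wanted_title:
            return [text_of(para) for para in chapter.getElementsByTagName("para")]
    return []


sample = """<book>
<toc><tocdiv pagenum="564"><title>9thmemo</title></tocdiv></toc>
<chapter><title>9thmemo</title><para>A</para><para>B</para></chapter>
<chapter><title>other</title><para>C</para></chapter>
</book>"""
print(chapter_paragraphs(sample, "9thmemo"))
```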

How to get the content of child->child->child->child in XML file using Python

<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.056.001.01" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<FIToFIPmtCxlReq>
<Assgnmt>
<Id>TEST-ISO-81</Id>
<Assgnr>
<Agt>
<FinInstnId>
<BIC>CCCCGB2L</BIC>
</FinInstnId>
</Agt>
</Assgnr>
<Assgne>
<Agt>
<FinInstnId>
<BIC>MMMMGB2L</BIC>
</FinInstnId>
</Agt>
</Assgne>
<CreDtTm>2009-03-24T11:22:59</CreDtTm>
</Assgnmt>
<Undrlyg>
<TxInf>
<CxlId>103012345</CxlId>
<Case>
<Id>ISO_TEST_CASE</Id>
<Cretr>
<Agt>
<FinInstnId>
<BIC>MMMMGB2L</BIC>
</FinInstnId>
</Agt>
</Cretr>
</Case>
</TxInf>
</Undrlyg>
</FIToFIPmtCxlReq>
</Document>
I want to get the content of "TxInf": all of its children, their descendants, and the data they contain.
What I have tried is:
import xml.etree.ElementTree as ET
from xml.etree import ElementTree
tree = ET.parse('R3-CAMT.056.001.07-ISO-V.XML')
root = tree.getroot()
for element in root.iter():
    if element.tag == "{urn:iso:std:iso:20022:tech:xsd:camt.056.001.01}TxInf":
        tree._setroot(element.tag)
print(root.tag)
print(root.attrib)
Can I change the root with _setroot, or is there another way to do this?

Try something along these lines to see if it works for you:
for r in root.findall(".//*"):
    if 'TxInf' in r.tag:
        print(ET.tostring(r))
By the way, this may be easier with lxml, if you can use it.
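A namespace-aware variant with the standard library, as a sketch: instead of substring-matching the tag, pass a prefix map to find() and walk the matched element with iter(). The prefix "camt" and the function name describe_tx_inf are arbitrary choices for illustration, and the sample is a trimmed, well-formed version of the document in the question.

```python
import xml.etree.ElementTree as ET

# "camt" is an arbitrary local alias for the document's default namespace.
NS = {"camt": "urn:iso:std:iso:20022:tech:xsd:camt.056.001.01"}


def describe_tx_inf(xml_text):
    """Return (local tag name, stripped text) for TxInf and all of its descendants."""
    root = ET.fromstring(xml_text)
    tx_inf = root.find(".//camt:TxInf", NS)
    if tx_inf is None:
        return []
    return [
        (elem.tag.split("}", 1)[-1], (elem.text or "").strip())
        for elem in tx_inf.iter()
    ]


sample = """<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.056.001.01">
<FIToFIPmtCxlReq><Undrlyg><TxInf>
<CxlId>103012345</CxlId>
<Case><Id>ISO_TEST_CASE</Id></Case>
</TxInf></Undrlyg></FIToFIPmtCxlReq>
</Document>"""
for name, text in describe_tx_inf(sample):
    print(name, text)
```

There is no need to change the tree's root for this: the element returned by find() can be serialized or traversed on its own.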

Adding XML file to the last element of existing XML using Python

I have an XML file like this; let's call it XML_old:
<?xml version="1.0" encoding="UTF-8"?>
<!--
// Description : ahbbus12alda.xml
// modifications; this notice must be included on any copy.
-->
<ipxact:component xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ipxact="http://www.accellera.org/XMLSchema/IPXACT/1685-2014"
xsi:schemaLocation="http://www.accellera.org/XMLSchema/IPXACT/1685-2014
http://www.accellera.org/XMLSchema/IPXACT/1685-2014/index.xsd">
<ipxact:vendor>spiritconsortium.org</ipxact:vendor>
<ipxact:library>Leon2RTL</ipxact:library>
<ipxact:name>ahbbus12</ipxact:name>
<ipxact:version>1.3</ipxact:version>
<ipxact:busInterfaces>
<ipxact:busInterface>
<ipxact:name>AHBClk</ipxact:name>
</ipxact:busInterface>
</ipxact:busInterfaces>
</ipxact:component>
Also, I have another XML file, XML_1, like:
<?xml version="1.0" encoding="UTF-8"?>
<ipxact:component1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ipxact="http://www.accellera.org/XMLSchema/IPXACT/1685-2014"
xsi:schemaLocation="http://www.accellera.org/XMLSchema/IPXACT/1685-2014
http://www.accellera.org/XMLSchema/IPXACT/1685-2014/index.xsd">
<ipxact:vendor1>spiritconsortium.org</ipxact:vendor1>
<ipxact:library1>Leon2RTL</ipxact:library1>
<ipxact:name1>ahbbus34</ipxact:name1>
<ipxact:version1>1.3</ipxact:version1>
<ipxact:busInterfaces1>
<ipxact:busInterface1>
<ipxact:name1>AHBClk</ipxact:name1>
</ipxact:busInterface1>
</ipxact:busInterfaces1>
</ipxact:component1>
I want to add XML_1 to XML_old as the last child of the component element and create a new XML file like this:
<?xml version="1.0" encoding="UTF-8"?>
<!--
// Description : ahbbus12alda.xml
// modifications; this notice must be included on any copy.
-->
<ipxact:component xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ipxact="http://www.accellera.org/XMLSchema/IPXACT/1685-2014"
xsi:schemaLocation="http://www.accellera.org/XMLSchema/IPXACT/1685-2014
http://www.accellera.org/XMLSchema/IPXACT/1685-2014/index.xsd">
<ipxact:vendor>spiritconsortium.org</ipxact:vendor>
<ipxact:library>Leon2RTL</ipxact:library>
<ipxact:name>ahbbus12</ipxact:name>
<ipxact:version>1.3</ipxact:version>
<ipxact:busInterfaces>
<ipxact:busInterface>
<ipxact:name>AHBClk</ipxact:name>
</ipxact:busInterface>
</ipxact:busInterfaces>
<ipxact:component1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ipxact="http://www.accellera.org/XMLSchema/IPXACT/1685-2014"
xsi:schemaLocation="http://www.accellera.org/XMLSchema/IPXACT/1685-2014
http://www.accellera.org/XMLSchema/IPXACT/1685-2014/index.xsd">
<ipxact:vendor1>spiritconsortium.org</ipxact:vendor1>
<ipxact:library1>Leon2RTL</ipxact:library1>
<ipxact:name1>ahbbus34</ipxact:name1>
<ipxact:version1>1.3</ipxact:version1>
<ipxact:busInterfaces1>
<ipxact:busInterface1>
<ipxact:name1>AHBClk</ipxact:name1>
</ipxact:busInterface1>
</ipxact:busInterfaces1>
</ipxact:component1>
</ipxact:component>
How can I do this? I would appreciate any help.
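One possible approach, sketched with the standard library and assuming both inputs are well-formed: parse both documents and append the root of the second to the root of the first. The function name append_document is hypothetical. Registering the prefixes keeps ipxact: and xsi: in the output instead of generated names such as ns0. xml.etree.ElementTree drops the comment that precedes the root and declares each namespace only once on the outer root, so the result is equivalent to, but not textually identical with, the desired output above; lxml may be a better fit if the leading comment has to survive.

```python
import xml.etree.ElementTree as ET

IPXACT = "http://www.accellera.org/XMLSchema/IPXACT/1685-2014"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Keep the original prefixes when serializing.
ET.register_namespace("ipxact", IPXACT)
ET.register_namespace("xsi", XSI)


def append_document(parent_xml: bytes, child_xml: bytes) -> bytes:
    """Append the root of child_xml as the last child of the root of parent_xml."""
    parent_root = ET.fromstring(parent_xml)
    child_root = ET.fromstring(child_xml)
    parent_root.append(child_root)
    return ET.tostring(parent_root, encoding="UTF-8", xml_declaration=True)


xml_old = (
    b'<ipxact:component xmlns:ipxact="http://www.accellera.org/XMLSchema/IPXACT/1685-2014">'
    b"<ipxact:name>ahbbus12</ipxact:name></ipxact:component>"
)
xml_1 = (
    b'<ipxact:component1 xmlns:ipxact="http://www.accellera.org/XMLSchema/IPXACT/1685-2014">'
    b"<ipxact:name1>ahbbus34</ipxact:name1></ipxact:component1>"
)
print(append_document(xml_old, xml_1).decode("utf-8"))
```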

Combine XML files similar to ConfigParser's multiple file support

I'm writing an application configuration module that uses XML in its files. Consider the following example:
<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<PathA>/Some/path/to/directory</PathA>
<PathB>/Another/path</PathB>
</Settings>
Now, I'd like to override certain elements in a different file that gets loaded afterwards. Example of the override file:
<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<PathB>/Change/this/path</PathB>
</Settings>
When querying the document (with overrides) with XPath, I'd like to get this as the element tree:
<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<PathA>/Some/path/to/directory</PathA>
<PathB>/Change/this/path</PathB>
</Settings>
This is similar to what Python's ConfigParser does with its read() method, but done with XML. How can I implement this?

You could convert the XML into an instance of a Python class:
import lxml.etree as ET
import io
class Settings(object):
    def __init__(self, text):
        # text must be bytes, because it is wrapped in io.BytesIO
        root = ET.parse(io.BytesIO(text)).getroot()
        self.settings = dict((elt.tag, elt.text) for elt in root.xpath('/Settings/*'))
    def update(self, other):
        self.settings.update(other.settings)
text = b'''\
<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<PathA>/Some/path/to/directory</PathA>
<PathB>/Another/path</PathB>
</Settings>'''
text2 = b'''\
<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<PathB>/Change/this/path</PathB>
</Settings>'''
s = Settings(text)
s2 = Settings(text2)
s.update(s2)
print(s.settings)
yields
{'PathA': '/Some/path/to/directory', 'PathB': '/Change/this/path'}
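If the merged result should stay an element tree that can still be queried with paths, as the question asks, the override can be applied to the tree itself. The following is a minimal sketch using the standard library; the function name merge_settings is hypothetical, and it assumes a flat layout with uniquely named children directly under Settings, as in the examples above.

```python
import xml.etree.ElementTree as ET


def merge_settings(base_xml, *override_xmls):
    """Apply each override document to the base tree, later files winning."""
    base = ET.fromstring(base_xml)
    for override_xml in override_xmls:
        for override in ET.fromstring(override_xml):
            existing = base.find(override.tag)
            if existing is None:
                base.append(override)          # setting only present in the override
            else:
                existing.text = override.text  # replace the base value
    return base


base_text = """<Settings>
<PathA>/Some/path/to/directory</PathA>
<PathB>/Another/path</PathB>
</Settings>"""
override_text = """<Settings>
<PathB>/Change/this/path</PathB>
</Settings>"""
merged = merge_settings(base_text, override_text)
print(merged.findtext("PathA"), merged.findtext("PathB"))
```

Nested settings would need a recursive merge, which this sketch deliberately leaves out.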

Must you use XML? The same can be achieved much more simply with JSON.
Suppose this is the text from the first config file:
text='''
{
"PathB": "/Another/path",
"PathA": "/Some/path/to/directory"
}
'''
and this is the text from the second:
text2='''{
"PathB": "/Change/this/path"
}'''
Then, to merge the two, you simply load each into a dict and call update:
import json
config=json.loads(text)
config2=json.loads(text2)
config.update(config2)
print(config)
yields the Python dict:
{'PathB': '/Change/this/path', 'PathA': '/Some/path/to/directory'}
