Combine XML files similar to ConfigParser's multiple file support - python

I'm writing an application configuration module that uses XML in its files. Consider the following example:
<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<PathA>/Some/path/to/directory</PathA>
<PathB>/Another/path</PathB>
</Settings>
Now, I'd like to override certain elements in a different file that gets loaded afterwards. Example of the override file:
<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<PathB>/Change/this/path</PathB>
</Settings>
When querying the document (with overrides) with XPath, I'd like to get this as the element tree:
<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<PathA>/Some/path/to/directory</PathA>
<PathB>/Change/this/path</PathB>
</Settings>
This is similar to what Python's ConfigParser does with its read() method, but done with XML. How can I implement this?

You could convert the XML into an instance of Python class:
import lxml.etree as ET
import io
class Settings(object):
def __init__(self,text):
root=ET.parse(io.BytesIO(text)).getroot()
self.settings=dict((elt.tag,elt.text) for elt in root.xpath('/Settings/*'))
def update(self,other):
self.settings.update(other.settings)
text='''\
<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<PathA>/Some/path/to/directory</PathA>
<PathB>/Another/path</PathB>
</Settings>'''
text2='''\
<?xml version="1.0" encoding="UTF-8"?>
<Settings>
<PathB>/Change/this/path</PathB>
</Settings>'''
s=Settings(text)
s2=Settings(text2)
s.update(s2)
print(s.settings)
yields
{'PathB': '/Change/this/path', 'PathA': '/Some/path/to/directory'}

Must you use XML? The same could be achieved with JSON much simpler:
Suppose this is the text from the first config file:
text='''
{
"PathB": "/Another/path",
"PathA": "/Some/path/to/directory"
}
'''
and this is the text from the second:
text2='''{
"PathB": "/Change/this/path"
}'''
Then to merge the to, you simply load each into a dict, and call update:
import json
config=json.loads(text)
config2=json.loads(text2)
config.update(config2)
print(config)
yields the Python dict:
{u'PathB': u'/Change/this/path', u'PathA': u'/Some/path/to/directory'}

Related

How to load xml file with specifc paragraph by xml in Python?

I have a xml file and its structure like that,
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<book>
<toc> <tocdiv pagenum="564">
<title>9thmemo</title>
<tocdiv pagenum="588">
<title>b</title>
</tocdiv>
</tocdiv></toc>
<chapter><title>9thmemo</title>
<para>...</para>
<para>...</para></chapter>
<chapter>...</chapter>
<chapter>...</chapter>
</book>
There are several chapters in the <book>...</book>, and each chapter has a title, I only want to read all content of this chapter,"9thmemo"(not others)
I tried to read by following code:
from xml.dom import minidom
filename = "result.xml"
file = minidom.parse(filename)
chapters = file.getElementsByTagName('chapter')
for i in range(10):
print(chapters[i])
I only get the address of each chapter...
if I add some sub-element like chapters[i].title, it shows cannot find this attribute
I only want to read all content of this chapter,"9thmemo"(not others)
The problem with the code is that it does not try to locate the specific 'chapter' while the answer code uses xpath in order to locate it.
Try the below
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<book>
<toc>
<tocdiv pagenum="564">
<title>9thmemo</title>
<tocdiv pagenum="588">
<title>b</title>
</tocdiv>
</tocdiv>
</toc>
<chapter>
<title>9thmemo</title>
<para>A</para>
<para>B</para>
</chapter>
<chapter>...</chapter>
<chapter>...</chapter>
</book>'''
root = ET.fromstring(xml)
chapter = root.find('.//chapter/[title="9thmemo"]')
para_data = ','.join(p.text for p in chapter.findall('para'))
print(para_data)
output
A,B

Removing Elements from a KML (Python)

I generated a KML file using Python's SimpleKML library and the following script, the output of which is also shown below:
import simplekml
kml = simplekml.Kml()
ground = kml.newgroundoverlay(name='Aerial Extent')
ground.icon.href = 'C:\\Users\\mdl518\\Desktop\\aerial_image.png'
ground.latlonbox.north = 46.55537
ground.latlonbox.south = 46.53134
ground.latlonbox.east = 48.60005
ground.latlonbox.west = 48.57678
ground.latlonbox.rotation = 0.090320
kml.save(".//aerial_extent.kml")
The output KML:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Document id="1">
<GroundOverlay id="2">
<name>Aerial Extent</name>
<Icon id="3">
<href>C:\\Users\\mdl518\\Desktop\\aerial_image.png</href>
</Icon>
<LatLonBox>
<north>46.55537</north>
<south>46.53134</south>
<east>48.60005</east>
<west>48.57678</west>
<rotation>0.090320</rotation>
</LatLonBox>
</GroundOverlay>
</Document>
However, I am trying to remove the "Document" tag from this KML since it is a default element generated with SimpleKML, while keeping the child elements (e.g. GroundOverlay). Additionally, is there a way to remove the "id" attributes associated with specific elements (i.e. for the GroundOverlay, Icon elements)? I am exploring the usage of ElementTree/lxml to enable this, but these seem to be more specific to XML files as opposed to KMLs. Here's what I'm trying to use to modify the KML, but it is unable to remove the Document element:
from lxml import etree
tree = etree.fromstring(open("C:\\Users\\mdl518\\Desktop\\aerial_extent.kml").read())
for item in tree.xpath("//Document[#id='1']"):
item.getparent().remove(item)
print(etree.tostring(tree, pretty_print=True))
Here is the final desired output XML:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<GroundOverlay>
<name>Aerial Extent</name>
<Icon>
<href>C:\\Users\\mdl518\\Desktop\\aerial_image.png</href>
</Icon>
<LatLonBox>
<north>46.55537</north>
<south>46.53134</south>
<east>48.60005</east>
<west>48.57678</west>
<rotation>0.090320</rotation>
</LatLonBox>
</GroundOverlay>
</kml>
Any insights are most appreciated!
You are getting tripped up on the dreaded namespaces...
Try using something like this:
ns = {'kml': 'http://www.opengis.net/kml/2.2'}
for item in tree.xpath("//kml:Document[#id='1']",namespaces=ns):
item.getparent().remove(item)
Edit:
To remove just the parent and retain all its descendants, try the following:
retain = doc.xpath("//kml:Document[#id='1']/kml:GroundOverlay",namespaces=ns)[0]
for item in doc.xpath("//kml:Document[#id='1']",namespaces=ns):
anchor = item.getparent()
anchor.remove(item)
anchor.insert(1,retain)
print(etree.tostring(doc, pretty_print=True).decode())
This should get you the desired output.

Adding XML file to the last element of existing XML using Python

I have a XML file like this, let's name is XML_old:
<?xml version="1.0" encoding="UTF-8"?>
<!--
// Description : ahbbus12alda.xml
// modifications; this notice must be included on any copy.
-->
<ipxact:component xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ipxact="http://www.accellera.org/XMLSchema/IPXACT/1685-2014"
xsi:schemaLocation="http://www.accellera.org/XMLSchema/IPXACT/1685-2014
http://www.accellera.org/XMLSchema/IPXACT/1685-2014/index.xsd">
<ipxact:vendor>spiritconsortium.org</ipxact:vendor>
<ipxact:library>Leon2RTL</ipxact:library>
<ipxact:name>ahbbus12</ipxact:name>
<ipxact:version>1.3</ipxact:version>
<ipxact:busInterfaces>
<ipxact:busInterface>
<ipxact:name>AHBClk</ipxact:name>
</ipxact:busInterface>
</ipxact:busInterfaces>
</ipxact:component>
Also, I have another XML file, XML_1, like:
<?xml version="1.0" encoding="UTF-8"?>
<ipxact:component1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ipxact="http://www.accellera.org/XMLSchema/IPXACT/1685-2014"
xsi:schemaLocation="http://www.accellera.org/XMLSchema/IPXACT/1685-2014
http://www.accellera.org/XMLSchema/IPXACT/1685-2014/index.xsd">
<ipxact:vendor1>spiritconsortium.org</ipxact:vendor1>
<ipxact:library1>Leon2RTL</ipxact:library1>
<ipxact:name1>ahbbus34</ipxact:name1>
<ipxact:version1>1.3</ipxact:version1>
<ipxact:busInterfaces1>
<ipxact:busInterface1>
<ipxact:name1>AHBClk</ipxact:name1>
<ipxact:busInterface1>
<ipxact:busInterfaces1>
</ipxact:component1>
I want to add XML_1 to the XML_old as a last child of component element in XML_old file and create a new XML file like
<?xml version="1.0" encoding="UTF-8"?>
<!--
// Description : ahbbus12alda.xml
// modifications; this notice must be included on any copy.
-->
<ipxact:component xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ipxact="http://www.accellera.org/XMLSchema/IPXACT/1685-2014"
xsi:schemaLocation="http://www.accellera.org/XMLSchema/IPXACT/1685-2014
http://www.accellera.org/XMLSchema/IPXACT/1685-2014/index.xsd">
<ipxact:vendor>spiritconsortium.org</ipxact:vendor>
<ipxact:library>Leon2RTL</ipxact:library>
<ipxact:name>ahbbus12</ipxact:name>
<ipxact:version>1.3</ipxact:version>
<ipxact:busInterfaces>
<ipxact:busInterface>
<ipxact:name>AHBClk</ipxact:name>
</ipxact:busInterface>
</ipxact:busInterfaces>
<ipxact:component1 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ipxact="http://www.accellera.org/XMLSchema/IPXACT/1685-2014"
xsi:schemaLocation="http://www.accellera.org/XMLSchema/IPXACT/1685-2014
http://www.accellera.org/XMLSchema/IPXACT/1685-2014/index.xsd">
<ipxact:vendor1>spiritconsortium.org</ipxact:vendor1>
<ipxact:library1>Leon2RTL</ipxact:library1>
<ipxact:name1>ahbbus34</ipxact:name1>
<ipxact:version1>1.3</ipxact:version1>
<ipxact:busInterfaces1>
<ipxact:busInterface1>
<ipxact:name1>AHBClk</ipxact:name1>
<ipxact:busInterface1>
<ipxact:busInterfaces1>
</ipxact:component1>
</ipxact:component>
I wonder how can I do this? I appreciate if you can help me.

keep original xml declaration when editing with python

my original xml file looks like this:
<?xml version="1.0" encoding="utf-8"?>
<foo/>
and I want to change it to
<?xml version="1.0" encoding="utf-8"?>
<foo>
<bar>confusing dev</bar>
</foo>
I am using xml.etree.ElementTree as suggested by this tutorial
with open('file.xml','r+b') as f:
tree = etree.parse(f)
f.seek(0,0)
tree.write(f,xml_declaration=True)# default argument: encoding="us-ascii"
this outputs
<?xml version='1.0' encoding='us-ascii'?>
<foo/>
But how do I get the encoding of file.xml at runtime and pass it as an argument to tree.write or is there a better way to edit xml in python? I just want to change some Element.text but keep the declaration and namespace unchanged.

XPath with LXML Element

I am trying to parse an XML document using lxml etree. The XML doc I am parsing looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.openarchives.org/OAI/2.0/">\t
<codeBook version="2.5" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="ddi:codebook:2_5" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd">
<docDscr>
<citation>
<titlStmt>
<titl>Test Title</titl>
</titlStmt>
<prodStmt>
<prodDate/>
</prodStmt>
</citation>
</docDscr>
<stdyDscr>
<citation>
<titlStmt>
<titl>Test Title 2</titl>
<IDNo agency="UKDA">101</IDNo>
</titlStmt>
<rspStmt>
<AuthEnty>TestAuthEntry</AuthEnty>
</rspStmt>
<prodStmt>
<copyright>Yes</copyright>
</prodStmt>
<distStmt/>
<verStmt>
<version date="">1</version>
</verStmt>
</citation>
<stdyInfo>
<subject>
<keyword>2009</keyword>
<keyword>2010</keyword>
<topcClas>CLASS</topcClas>
<topcClas>ffdsf</topcClas>
</subject>
<abstract>This is an abstract piece of text.</abstract>
<sumDscr>
<timePrd event="single">2020</timePrd>
<nation>UK</nation>
<anlyUnit>Test</anlyUnit>
<universe>test</universe>
<universe>hello</universe>
<dataKind>fdsfdsf</dataKind>
</sumDscr>
</stdyInfo>
<method>
<dataColl>
<timeMeth>test timemeth</timeMeth>
<dataCollector>test data collector</dataCollector>
<sampProc>test sampprocess</sampProc>
<deviat>test deviat</deviat>
<collMode>test collMode</collMode>
<sources/>
</dataColl>
</method>
<dataAccs>
<setAvail>
<accsPlac>Test accsPlac</accsPlac>
</setAvail>
<useStmt>
<restrctn>NONE</restrctn>
</useStmt>
</dataAccs>
<othrStdyMat>
<relPubl>122</relPubl>
<relPubl>12332</relPubl>
</othrStdyMat>
</stdyDscr>
</codeBook>
</metadata>
I wrote the following code to try and process it:
from lxml import etree
import pdb
f = open('/vagrant/out2.xml', 'r')
xml_str = f.read()
xml_doc = etree.fromstring(xml_str)
f.close()
From what I understand from the lxml xpath docs, I should be able to get the text from a specific element as follows:
xml_doc.xpath('/metadata/codeBook/docDscr/citation/titlStmt/titl/text()')
However, when I run this it returns an empty array.
The only xpath I can get to return something is using a wildcard:
xml_doc.xpath('*')
Which returns [<Element {ddi:codebook:2_5}codeBook at 0x7f8da8a413f8>].
I've read through the docs and I'm not understanding what is going wrong with this. Any help is appreciated.
You need to take the default namespace into account so instead of
xml_doc.xpath('/metadata/codeBook/docDscr/citation/titlStmt/titl/text()')
use
xml_doc.xpath.xpath(
'/oai:metadata/ddi:codeBook/ddi:docDscr/ddi:citation/ddi:titlStmt/ddi:titl/text()',
namespaces={
'oai': 'http://www.openarchives.org/OAI/2.0/',
'ddi': 'ddi:codebook:2_5'
}
)

Categories