copy xml node and children to new xml file - python

I have the following code that loops over a set of records and moves each record to a new file:
import os
import xml.etree.cElementTree as ET
for filename in os.listdir('modemuze'):
    if filename.endswith('.xml'):
        original_tree = ET.ElementTree(file='modemuze/'+filename)
        root = original_tree.getroot()
        for child in root[2]:
            if child.tag == "{http://www.openarchives.org/OAI/2.0/}record":
                new_tree = ET.ElementTree(file='test.xml')
                new_root = new_tree.getroot()
                new_root.append(child)
The file that contains the records I want to move has the following structure:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<OAI-PMH xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd" xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<responseDate>2018-02-14T12:15:11.241+02:00</responseDate>
<request verb="ListRecords">147.102.11.65:9000/fashionedmfp/oai</request>
<ListRecords>
<record>
<header>
<identifier>oai:fashionedmfp:8c549bcd078e2ce84a265d318547c5f8e8bf0cd0</identifier>
<datestamp>2016-06-27</datestamp>
</header>
<metadata>
<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:edm="http://www.europeana.eu/schemas/edm/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:edmfp="http://www.europeanafashion.eu/edmfp/" xmlns:rdaGr2="http://rdvocab.info/ElementsGr2/" xmlns:ore="http://www.openarchives.org/ore/terms/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:mrel="http://id.loc.gov/vocabulary/relators/" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:wgs84="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:crm="http://www.cidoc-crm.org/rdfs/cidoc_crm_v5.0.2_english_label.rdfs#" xmlns:gr="http://www.heppnetz.de/ontologies/goodrelations/v1#" xmlns:xalan="http://xml.apache.org/xalan" xmlns:skos="http://www.w3.org/2004/02/skos/core#">
<edm:ProvidedCHO rdf:about="localID/europeana-fashion/10918x1y95563">
<dcterms:created>circa 1855-1874</dcterms:created>
<dcterms:issued>1875</dcterms:issued>
<dcterms:issued>1875</dcterms:issued>
<dcterms:medium>katoen</dcterms:medium>
<dcterms:provenance>schenking</dcterms:provenance>
<dcterms:provenance>1958</dcterms:provenance>
<dcterms:spatial>
<edm:Place>
<skos:prefLabel>Nederland</skos:prefLabel>
</edm:Place>
</dcterms:spatial>
<dcterms:creator>
<edm:Agent/>
</dcterms:creator>
<dc:creator>
<edm:Agent/>
</dc:creator>
<dc:description>Jurk (kind) van wit piqué, gegarneerd met broderie en soustache</dc:description>
<dc:identifier>0061211</dc:identifier>
<dc:title>jurken</dc:title>
<dc:title>kinderkleding</dc:title>
<edm:type>IMAGE</edm:type>
<edmfp:localType>jurken</edmfp:localType>
<edmfp:localType>kinderkleding</edmfp:localType>
<edmfp:technique>piqué-weefsel</edmfp:technique>
</edm:ProvidedCHO>
<ore:Aggregation rdf:about="localID/europeana-fashion/Aggregation_10918x1y95563">
<edm:aggregatedCHO rdf:resource="localID/europeana-fashion/10918x1y95563"/>
<edm:dataProvider>
<edm:Agent>
<skos:prefLabel>Gemeentemuseum Den Haag</skos:prefLabel>
</edm:Agent>
</edm:dataProvider>
<edm:hasView>
<edm:WebResource rdf:about="http://images.gemeentemuseum.nl/br/0061211.jpg"/>
</edm:hasView>
<edm:isShownBy>
<edm:WebResource rdf:about="http://images.gemeentemuseum.nl/br/0061211.jpg"/>
</edm:isShownBy>
<edm:provider>
<edm:Agent rdf:about="http://www.europeanafashion.eu/">
<skos:prefLabel>Europeana Fashion</skos:prefLabel>
</edm:Agent>
</edm:provider>
<edm:rights rdf:resource="http://www.europeana.eu/rights/rr-f/"/>
</ore:Aggregation>
</rdf:RDF>
</metadata>
</record>
</ListRecords>
</OAI-PMH>
The file I want to place the records in has the following structure:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ListRecords>
</ListRecords>
But when I run this code, my test.xml file contains no records.
Why is this happening?
I am totally new to XML and Python, so if I missed something or something is unclear, please let me know so I can clarify or add it.
I am doing this in Python 2.7.
All help and suggestions are very much appreciated!

You need to write the changes back to the file: appending to new_root only changes the tree in memory, not test.xml on disk.
Add new_tree.write('test.xml') to the code.
As a result, the code should look like this:
import os
import xml.etree.cElementTree as ET
for filename in os.listdir('modemuze'):
    if filename.endswith('.xml'):
        original_tree = ET.ElementTree(file='modemuze/'+filename)
        root = original_tree.getroot()
        for child in root[2]:
            if child.tag == "{http://www.openarchives.org/OAI/2.0/}record":
                new_tree = ET.ElementTree(file='test.xml')
                new_root = new_tree.getroot()
                new_root.append(child)
                # save the modified tree back to disk
                new_tree.write('test.xml')
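The fix above re-parses and rewrites test.xml once per record. A minimal sketch of an alternative, assuming Python 3 and the standard-library xml.etree.ElementTree (cElementTree was deprecated in 3.3 and removed in 3.9), finds records by their qualified name instead of by the position root[2] and serializes once. It works on strings so it can be tried in isolation; the function name copy_records and the "oai" prefix are illustrative choices, not part of the original code. Registering a prefix avoids the auto-generated ns0: prefixes ElementTree otherwise writes for the OAI namespace.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
# Hypothetical prefix choice: serialize OAI elements as oai:record
# instead of the auto-generated ns0:record.
ET.register_namespace("oai", OAI_NS)


def copy_records(source_xml, target_xml):
    """Return target_xml with every OAI <record> from source_xml appended."""
    source_root = ET.fromstring(source_xml)
    target_root = ET.fromstring(target_xml)
    # Match on the namespaced tag rather than relying on root[2].
    for record in source_root.iter("{%s}record" % OAI_NS):
        target_root.append(record)
    return ET.tostring(target_root, encoding="unicode")
```

With real files, the same idea is: parse test.xml once with ET.parse before looping over os.listdir('modemuze'), append into its getroot(), and call tree.write('test.xml', encoding='utf-8', xml_declaration=True) once after the loop.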

Related

modify node and extract data from xml file in python

I am new to Python and am looking for advice on the best approach to the following task:
I have an XML file that looks like this:
<component xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009 http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009/index.xsd">
<memoryMaps>
<memoryMap>
<name>name</name>
<description>description</description>
<peripheral>
<name>periph</name>
<description>description</description>
<baseAddress>0x0</baseAddress>
<range>0x8</range>
<width>32</width>
<register>
<name>reg1</name>
<displayName>reg1</displayName>
<addressOffset>0x0</addressOffset>
<size>32</size>
<access>read-write</access>
<reset>
<value>0x00000002</value>
<mask>0xFFFFFFFF</mask>
</reset>
<field>
</field>
</register>
</peripheral>
</memoryMap>
</memoryMaps>
</component>
I want to split the "reset" node into two separate nodes, "resetValue" and "resetMask", keeping the data from "value" and "mask" as their contents, as follows:
........
<access>read-write</access>
<resetValue>0x00000002</resetValue>
<resetMask>0xFFFFFFFF</resetMask>
<field>
.............
I have managed to parse my XML file, but I don't know how to start on this first modification.
Thanks for any guidance.
Code that creates two subelements under 'register' and removes the unneeded 'reset' element:
import xml.etree.ElementTree as ET
xml = '''<component xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009 http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009/index.xsd">
<memoryMaps>
<memoryMap>
<name>name</name>
<description>description</description>
<peripheral>
<name>periph</name>
<description>description</description>
<baseAddress>0x0</baseAddress>
<range>0x8</range>
<width>32</width>
<register>
<name>reg1</name>
<displayName>reg1</displayName>
<addressOffset>0x0</addressOffset>
<size>32</size>
<access>read-write</access>
<reset>
<value>0x00000002</value>
<mask>0xFFFFFFFF</mask>
</reset>
<field>
</field>
</register>
</peripheral>
</memoryMap>
</memoryMaps>
</component>'''
root = ET.fromstring(xml)
register = root.find('.//register')
value = register.find('.//reset/value').text
mask = register.find('.//reset/mask').text
v = ET.SubElement(register, 'resetValue')
v.text = value
m = ET.SubElement(register, 'resetMask')
m.text = mask
register.remove(register.find('reset'))
ET.dump(root)
Output:
<component xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009 http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009/index.xsd">
<memoryMaps>
<memoryMap>
<name>name</name>
<description>description</description>
<peripheral>
<name>periph</name>
<description>description</description>
<baseAddress>0x0</baseAddress>
<range>0x8</range>
<width>32</width>
<register>
<name>reg1</name>
<displayName>reg1</displayName>
<addressOffset>0x0</addressOffset>
<size>32</size>
<access>read-write</access>
<field />
<resetValue>0x00000002</resetValue>
<resetMask>0xFFFFFFFF</resetMask>
</register>
</peripheral>
</memoryMap>
</memoryMaps>
</component>
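Note that SubElement appends, so in the output above resetValue and resetMask end up after <field>, not where <reset> was. A minimal sketch that keeps the position the question asks for, assuming the same standard-library ElementTree and the element names from the question: it removes <reset> and inserts the two new elements at its index with Element.insert. The helper name split_reset is hypothetical.

```python
import xml.etree.ElementTree as ET


def split_reset(register):
    """Replace <reset><value/><mask/></reset> with <resetValue>/<resetMask> in place."""
    reset = register.find("reset")
    if reset is None:
        return  # nothing to split in this register
    # Position of <reset> among the register's direct children.
    index = list(register).index(reset)
    value = ET.Element("resetValue")
    value.text = reset.findtext("value")
    mask = ET.Element("resetMask")
    mask.text = reset.findtext("mask")
    # Reuse the old tail so any surrounding indentation is kept.
    value.tail = reset.tail
    mask.tail = reset.tail
    register.remove(reset)
    register.insert(index, value)
    register.insert(index + 1, mask)
```

To process every register, loop over a list snapshot such as root.findall('.//register') and call split_reset on each.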

Add tag with content to existing XML (resx) using python

I have an XML file with a number of strings:
<?xml version="1.0" encoding="UTF-8"?>
<Strings>
<String id="TEST_STRING_FROM_XML">
<en>Test string from XML</en>
<de>Testzeichenfolge aus XML</de>
<es>Cadena de prueba de XML</es>
<fr>Tester la chaîne à partir de XML</fr>
<it>Stringa di test da XML</it>
<ja>XMLからのテスト文字列</ja>
<ko>XML에서 테스트 문자열</ko>
<nl>Testreeks van XML</nl>
<pl>Łańcuch testowy z XML</pl>
<pt>Cadeia de teste de XML</pt>
<ru>Тестовая строка из XML</ru>
<sv>Teststräng från XML</sv>
<zh-CHS>从XML测试字符串</zh-CHS>
<zh-CHT>從XML測試字符串</zh-CHT>
<Comment>A test string that comes from a shared XML file.</Comment>
</String>
<String id="TEST_STRING_FROM_XML_2">
<en>Another test string from XML.</en>
<de></de>
<es></es>
<fr></fr>
<it></it>
<ja></ja>
<ko></ko>
<nl></nl>
<pl></pl>
<pt></pt>
<ru></ru>
<sv></sv>
<zh-CHS></zh-CHS>
<zh-CHT></zh-CHT>
<Comment>Another test string that comes from a shared XML file.</Comment>
</String>
</Strings>
And I would like to append these strings to a resx file with a long list of strings in the following format:
<?xml version="1.0" encoding="utf-8"?>
<root>
<!--
Microsoft ResX Schema
Version 2.0
**a bunch of schema and header stuff...**
-->
<data name="STRING_NAME_1" xml:space="preserve">
<value>This is a value 1</value>
<comment>This is a comment 1</comment>
</data>
<data name="STRING_NAME_2" xml:space="preserve">
<value>This is a value 2</value>
<comment>This is a comment 2</comment>
</data>
</root>
But using the following snippet of Python code:
import sys, os, os.path, re
import xml.etree.ElementTree as ET
from xml.dom import minidom
existingStrings = []
newStrings = {}
languages = []
resx = '*path to resx file*'
def LoadAllNewStrings():
    src_root = ET.parse('Strings.xml').getroot()
    for src_string in src_root.findall('String'):
        src_id = src_string.get('id')
        src_value = src_string.findtext("en")
        src_comment = src_string.findtext("Comment")
        content = [src_value, src_comment]
        newStrings[src_id] = content
def ExscludeExistingStrings():
    dest_root = ET.parse(resx)
    for stringName in dest_root.findall('Name'):
        for stringId in newStrings:
            if stringId == stringName:
                newStrings.remove(stringId)
def PrettifyXML(element):
    roughString = ET.tostring(element, 'utf-8')
    reparsed = minidom.parseString(roughString)
    return reparsed.toprettyxml(indent=" ")
def AddMissingStringsToLocalResource():
    ExscludeExistingStrings()
    with open(resx, "a") as output:
        root = ET.parse(resx).getroot()
        for newString in newStrings:
            data = ET.Element("data", name=newString)
            newStringContent = newStrings[newString]
            newStringValue = newStringContent[0]
            newStringComment = newStringContent[1]
            ET.SubElement(data, "value").text = newStringValue
            ET.SubElement(data, "comment").text = newStringComment
            output.write(PrettifyXML(data))
if __name__ == "__main__":
    LoadAllNewStrings()
    AddMissingStringsToLocalResource()
I get the following XML appended to the end of the resx file:
<data name="STRING_NAME_2" xml:space="preserve">
<value>This is a value 1</value>
<comment>This is a comment 1</comment>
</data>
</root><?xml version="1.0" ?>
<data name="TEST_STRING_FROM_XML">
<value>Test string from XML</value>
<comment>A test string that comes from a shared XML file.</comment>
</data>
<?xml version="1.0" ?>
<data name="TEST_STRING_FROM_XML_2">
<value>Another test string from XML.</value>
<comment>Another test string that comes from a shared XML file.</comment>
</data>
That is, the root element is closed and my new strings are appended after it. Any ideas on how to add the data elements inside the existing root properly?
with open(resx, "a") as output:
No. Don't open XML files as text files. Not for reading, not for writing, not for appending. Never.
The typical life cycle of an XML file is:
1. parsing (with an XML parser)
2. reading or modification (with a DOM API)
3. if there were changes: serialization (also with a DOM API)
At no point should you ever call open() on an XML file. XML files are not supposed to be treated as if they were plain text. They are not.
import xml.etree.ElementTree as ET
# parsing
resx = ET.parse(resx_path)
root = resx.getroot()
# modification
for newString in newStrings:
    newStringContent = newStrings[newString]
    # create node
    data = ET.Element("data", name=newString)
    ET.SubElement(data, "value").text = newStringContent[0]
    ET.SubElement(data, "comment").text = newStringContent[1]
    # append node, e.g. to the top level element
    root.append(data)
# serialization
resx.write(resx_path, encoding='utf-8', xml_declaration=True)
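One more detail when following this approach: ElementTree's default parser discards comments, so the "Microsoft ResX Schema" header comment would be missing from the rewritten file. A minimal sketch, assuming Python 3.8+ (where TreeBuilder accepts insert_comments=True) and working on strings for brevity, keeps comments, skips ids that already exist as <data name="...">, and adds the xml:space="preserve" attribute the existing entries use. The function names load_new_strings and add_missing_strings are hypothetical.

```python
import xml.etree.ElementTree as ET

# Clark notation for the xml:space attribute; serialized back as xml:space.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def load_new_strings(strings_xml):
    """Map each String id to (English value, comment) from Strings.xml content."""
    src_root = ET.fromstring(strings_xml)
    return {
        s.get("id"): (s.findtext("en"), s.findtext("Comment"))
        for s in src_root.findall("String")
    }


def add_missing_strings(resx_xml, new_strings):
    """Return resx_xml with a <data> element added for every id not yet present."""
    # Keep <!-- --> comments such as the ResX schema header (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(resx_xml, parser=parser)
    existing = {data.get("name") for data in root.findall("data")}
    for string_id, (value, comment) in new_strings.items():
        if string_id in existing:
            continue  # the resx file already defines this string
        data = ET.SubElement(root, "data", {"name": string_id, XML_SPACE: "preserve"})
        ET.SubElement(data, "value").text = value
        ET.SubElement(data, "comment").text = comment
    return ET.tostring(root, encoding="unicode")
```

For files, pass the same parser to ET.parse(resx_path, parser=parser) and finish with tree.write(resx_path, encoding="utf-8", xml_declaration=True).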

parsing serial numbers tags from xml python

I have an XML file; a shorter version is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<DATA>
<_1>
<member_id>AFCE6DB97D4CD67D</member_id>
</_1>
<_2>
<member_id>AFCE6DB97D4CD67D</member_id>
</_2>
</DATA>
I am using the following code to parse
tree = ElementTree.parse(args['inputxml'])
root = tree.getroot()
for dat in root:
    memberID = dat.find('member_id').text
I am able to parse the member ID, but I am not sure how to get the serial-number tags <_1>, <_2>, etc. The number keeps increasing with every new record in the XML.
You could use lxml's xpath():
xml = """<?xml version="1.0" encoding="UTF-8"?>
<DATA>
<_1>
<member_id>AFCE6DB97D4CD67D</member_id>
</_1>
<_2>
<member_id>AFCE6DB97D4CD67D</member_id>
</_2>
</DATA>"""
root = etree.fromstring(xml)
members = root.xpath("//member_id")
for m in members:
print m.text, m.getparent().tag
This prints:
AFCE6DB97D4CD67D _1
AFCE6DB97D4CD67D _2
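With the standard-library loop from the question, no XPath is needed: the serial number is simply each child's own tag. A minimal sketch, assuming every child of <DATA> is named _N with N an integer; member_ids_by_serial is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def member_ids_by_serial(xml_text):
    """Return [(serial_number, member_id), ...] for each <_N> child of <DATA>."""
    root = ET.fromstring(xml_text)
    result = []
    for record in root:
        # record.tag is the serial-number tag itself, e.g. "_1".
        serial = int(record.tag.lstrip("_"))
        result.append((serial, record.findtext("member_id")))
    return result
```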

Python add Tags to XML using lxml

I have the following Input XML:
<?xml version="1.0" encoding="utf-8"?>
<Scenario xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Scenario.xsd">
<TestCase>test_startup_0029</TestCase>
<ShortDescription>Restart of the EVC with missing ODO5 board.</ShortDescription>
<Events>
<Event Num="1">Switch on the EVC</Event>
</Events>
<HW-configuration>
<ELBE5A>true</ELBE5A>
<ELBE5K>false</ELBE5K>
</HW-configuration>
<SystemFailure>true</SystemFailure>
</Scenario>
My program does add three tags to the XML, but they are not formatted correctly.
The Output XML looks like the following:
<Scenario xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Scenario.xsd">
<TestCase>test_startup_0029</TestCase>
<ShortDescription>Restart of the EVC with missing ODO5 board.</ShortDescription>
<Events>
<Event Num="1">Switch on the EVC</Event>
</Events>
<HW-configuration>
<ELBE5A>true</ELBE5A>
<ELBE5K>false</ELBE5K>
</HW-configuration>
<SystemFailure>true</SystemFailure>
<Duration>12</Duration><EVC-SW-Version>08.02.0001.0027</EVC-SW-Version><STAC-Release>08.02.0001.0027</STAC-Release></Scenario>
This is my source code:
import os
from lxml import etree
class XmlManager:
    @staticmethod
    def write_xml(xml_path, duration, evc_sw_version):
        xml_path = os.path.abspath(xml_path)
        if os.path.isfile(xml_path) and xml_path.endswith(".xml"):
            # parse XML into etree
            root = etree.parse(xml_path).getroot()
            # add tags
            duration_tag = etree.SubElement(root, "Duration")
            duration_tag.text = duration
            sw_version_tag = etree.SubElement(root, "EVC-SW-Version")
            sw_version_tag.text = evc_sw_version
            stac_release = evc_sw_version
            stac_release_tag = etree.SubElement(root, "STAC-Release")
            stac_release_tag.text = stac_release
            # write changes to the XML-file
            tree = etree.ElementTree(root)
            tree.write(xml_path, pretty_print=False)
        else:
            XmlManager.logger.log("Invalid path to XML-file")
def main():
    xml = r".\Test_Input_Data_Base\blnmerf1_md1czjyc_REL_V_08.01.0001.000x\Test_startup_0029\Test_startup_0029.xml"
    XmlManager.write_xml(xml, "12", "08.02.0001.0027")
My question is how to add the new tags to the XML in the right format. I guess it still works when the changed XML is parsed again, but it is not nicely formatted. Any ideas? Thanks in advance.
To ensure nicely pretty-printed output, you need to do two things:
1. Parse the input file using an XMLParser object with remove_blank_text=True.
2. Write the output using pretty_print=True.
Example:
from lxml import etree
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse("Test_startup_0029.xml", parser)
root = tree.getroot()
duration_tag = etree.SubElement(root, "Duration")
duration_tag.text = "12"
sw_version_tag = etree.SubElement(root, "EVC-SW-Version")
sw_version_tag.text = "08.02.0001.0027"
stac_release_tag = etree.SubElement(root, "STAC-Release")
stac_release_tag.text = "08.02.0001.0027"
tree.write("output.xml", pretty_print=True)
Contents of output.xml:
<Scenario xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Scenario.xsd">
<TestCase>test_startup_0029</TestCase>
<ShortDescription>Restart of the EVC with missing ODO5 board.</ShortDescription>
<Events>
<Event Num="1">Switch on the EVC</Event>
</Events>
<HW-configuration>
<ELBE5A>true</ELBE5A>
<ELBE5K>false</ELBE5K>
</HW-configuration>
<SystemFailure>true</SystemFailure>
<Duration>12</Duration>
<EVC-SW-Version>08.02.0001.0027</EVC-SW-Version>
<STAC-Release>08.02.0001.0027</STAC-Release>
</Scenario>
See also http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output.
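If the file is handled with the standard library instead of lxml, Python 3.9+ provides xml.etree.ElementTree.indent, which overwrites whitespace-only text and tails so existing and newly added elements line up. This swaps in a different library from the answer above; it is a minimal sketch under that assumption, add_tags_pretty is a hypothetical name, and the tag names come from the question.

```python
import xml.etree.ElementTree as ET


def add_tags_pretty(xml_text, duration, evc_sw_version):
    """Append the three tags from the question and return re-indented XML."""
    root = ET.fromstring(xml_text)
    new_tags = (
        ("Duration", duration),
        ("EVC-SW-Version", evc_sw_version),
        ("STAC-Release", evc_sw_version),  # the question reuses the SW version
    )
    for tag, text in new_tags:
        ET.SubElement(root, tag).text = text
    # Replace whitespace-only text/tail with two-space indentation (Python 3.9+).
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")
```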

how can I select all descendants of a certain element with ElementTree in Python 3.3?

This is the sample data.
input.xml
<root>
<entry id="1">
<headword>go</headword>
<example>I <hw>go</hw> to school.</example>
</entry>
</root>
I'd like to put the <example> node and its descendants into a new <examplegrp> element. That is,
output.xml
<root>
<entry id="1">
<headword>go</headword>
<examplegrp>
<example>I <hw>go</hw> to school.</example>
</examplegrp>
</entry>
</root>
My poor and incomplete script is:
import codecs
import xml.etree.ElementTree as ET
fin = codecs.open(r'input.xml', 'rb', encoding='utf-8')
data = ET.parse(fin)
root = data.getroot()
example = root.find('.//example')
for elem in example.iter():
    pass  # ...and then I don't know what to do...
Here's an example of how it can be done:
text = """
<root>
<entry id="1">
<headword>go</headword>
<example>I <hw>go</hw> to school.</example>
</entry>
</root>
"""
import io
import lxml.etree
data = lxml.etree.parse(io.StringIO(text))
root = data.getroot()
for entry in root.xpath('//example/ancestor::entry[1]'):
    examplegrp = lxml.etree.SubElement(entry, "examplegrp")
    nodes = [node for node in entry.xpath('./example')]
    for node in nodes:
        entry.remove(node)
        examplegrp.append(node)
print(lxml.etree.tostring(root, pretty_print=True).decode())
which will output:
<root>
<entry id="1">
<headword>go</headword>
<examplegrp><example>I <hw>go</hw> to school.</example>
</examplegrp></entry>
</root>
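Since the question asks about the standard-library ElementTree on Python 3.3, here is a minimal sketch without lxml. ElementTree elements have no getparent(), so it walks the <entry> parents and swaps each <example> for a new wrapper at the same index, which also keeps <examplegrp> where the example was. wrap_examples is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def wrap_examples(root, wrapper_tag="examplegrp"):
    """Wrap every <example> child of an <entry> in a new wrapper element, in place."""
    # findall returns a list, so rearranging children does not disturb the loop.
    for entry in root.findall(".//entry"):
        for index, child in enumerate(list(entry)):
            if child.tag != "example":
                continue
            group = ET.Element(wrapper_tag)
            # The group takes over the example's trailing whitespace.
            group.tail, child.tail = child.tail, None
            entry.remove(child)
            group.append(child)  # <example> moves with its <hw> descendants
            entry.insert(index, group)
    return root
```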
http://docs.python.org/3/library/xml.dom.html?highlight=xml#node-objects
http://docs.python.org/3/library/xml.dom.html?highlight=xml#document-objects
You probably want to follow the paradigm of creating a Document element and appending each result to it:
group = document.createElement(tagName)
for found in founds:
    group.appendChild(found)
Or something like this.
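The createElement/appendChild idea above can be made concrete with xml.dom.minidom. A minimal sketch, assuming the input fits in memory as a string; wrap_examples_dom is a hypothetical name. replaceChild puts the new group where the <example> was before the example is moved inside it.

```python
from xml.dom import minidom


def wrap_examples_dom(xml_text):
    """Wrap each <example> element in a new <examplegrp> element."""
    document = minidom.parseString(xml_text)
    # Snapshot the node list first, because the loop rearranges the tree.
    for example in list(document.getElementsByTagName("example")):
        group = document.createElement("examplegrp")
        # The group takes the example's place in its parent...
        example.parentNode.replaceChild(group, example)
        # ...and the example, with its <hw> descendants, moves inside the group.
        group.appendChild(example)
    return document.documentElement.toxml()
```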
