Need some help generating XML with Python - python

I have some variables in Python that I need to store as XML. I have been using the python:LXML module for this so far. Not too experienced with it. Have tried playing around with various tutorials and docs, but I am at a dead end need some help.
Here is the python script:
root = etree.Element("root")
coins=etree.Element("coins")
doc=etree.ElementTree(coins)
coins.append(etree.Element("trader"))
coins.append(etree.Element("metal"))
coins.append(etree.Element("type"))
coins.append(etree.Element("price"))
coins[0].text="Gold.co.uk"
coins[0].attrib["variable"]=("GLDAG_MAPLE")
coins[1].text="Silver"
coins[2].text="Britannia"
coins[3].text=str(GLDAG_MAPLE)
doc.write('data.xml', pretty_print=True)
As of now it outputs this:
<coins>
<trader variable="GLDAG_MAPLE">Gold.co.uk</trader>
<metal>Silver</metal>
<type>Britannia</type>
<price>
£31.20
</price>
</coins>
However I would like it to look like this:
<root>
<coin>
<trader> Gold.co.uk </trader>
<type> Britannia </type>
<price> £31.20 </price>
</coin>
</root>
The tags and their sub-tags would be duplicated for every type of coin. I have no idea how to construct the XML so that the output looks like the third code-block. So far I have tried to follow other scripts that I have seen on github and other sites but modify them to suit my needs but my scripts keep failing or producing incorrect resaults for some reason.
If someone could help me out then that would be great!

You can simply append the Element to root:
from lxml import etree
coinItems = [
{'trader': 'Gold.co.uk', 'metal': 'Silver', 'type': 'Britannia'},
{'trader': 'copper.co.uk', 'metal': 'Copper', 'type': 'World'}
]
root = etree.Element("root")
for ci in coinItems:
coin=etree.Element("coin")
etree.SubElement(coin, "trader", {'variable': 'GLDAG_MAPLE'}).text = ci['trader'] # example how to use attributes!
etree.SubElement(coin, "metal").text = ci['metal']
etree.SubElement(coin, "type").text = ci['type']
root.append(coin)
fName = '/tmp/data.xml'
with open(fName, 'wb') as f:
# remove encoding here, in case you want escaped ASCII characters: £
f.write(etree.tostring(root, xml_declaration=True, encoding="utf-8", pretty_print=True))
print(open(fName).read())
Output:
<?xml version='1.0' encoding='utf-8'?>
<root>
<coin>
<trader variable="GLDAG_MAPLE">Gold.co.uk</trader>
<metal>Silver</metal>
<type>Britannia</type>
</coin>
<coin>
<trader variable="GLDAG_MAPLE">copper.co.uk</trader>
<metal>Copper</metal>
<type>World</type>
</coin>
</root>

I prefer using the lxml builder (https://lxml.de/api/lxml.builder.ElementMaker-class.html) because imho it is easier to see the structure of your XML document.
from lxml.builder import E
root = E.root(
E.coin(
E.trader("Gold.co.uk",
variable="GLDAG_MAPLE"),
E.metal("silver"),
E.price("£31.20")
)
)
You can then append the root element to your main document.

Related

How to force ElementTree to keep xmlns attribute within its original element?

I have an input XML file:
<?xml version='1.0' encoding='utf-8'?>
<configuration>
<runtime name="test" version="1.2" xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
<ns0:assemblyBinding>
<ns0:dependentAssembly />
</ns0:assemblyBinding>
</runtime>
</configuration>
...and Python script:
import xml.etree.ElementTree as ET
file_xml = 'test.xml'
tree = ET.parse(file_xml)
root = tree.getroot()
print (root.tag)
print (root.attrib)
element_runtime = root.find('.//runtime')
print (element_runtime.tag)
print (element_runtime.attrib)
tree.write(file_xml, xml_declaration=True, encoding='utf-8', method="xml")
...which gives the following output:
>test.py
configuration
{}
runtime
{'name': 'test', 'version': '1.2'}
...and has an undesirable side-effect of modifying XML into:
<?xml version='1.0' encoding='utf-8'?>
<configuration xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
<runtime name="test" version="1.2">
<ns0:assemblyBinding>
<ns0:dependentAssembly />
</ns0:assemblyBinding>
</runtime>
</configuration>
My original script modifies XML so I do have to call tree.write and save edited file. But the problem is that ElementTree parser moves xmlns attribute from runtime element up to the root element configuration which is not desirable in my case.
I can't remove xmlns attribute from the root element (remove it from the dictionary of its attributes) as it is not listed in a list of its attributes (unlike the attributes listed for runtime element).
Why does xmlns attribute never gets listed within the list of attributes for any element?
How to force ElementTree to keep xmlns attribute within its original element?
I am using Python 3.5.1 on Windows.
xml.etree.ElementTree pulls all namespaces into the first element as it internally doesn't track on which element the namespace was declared originally.
If you don't want that, you'll have to write your own serialisation logic.
The better alternative would be to use lxml instead of xml.etree, because it preserves the location where a namespace prefix is declared.
Following #mata advice, here I give an answer with an example with code and xml file attached.
The xml input is as shown in the picture (original and modified)
The python codes check the NtnlCcy Name and if it is "EUR", convert the Price to USD (by multiplying EURUSD: = 1.2) and change the NtnlCcy Name to "USD".
The python code is as follows:
from lxml import etree
pathToXMLfile = r"C:\Xiang\codes\Python\afmreports\test_original.xml"
tree = etree.parse(pathToXMLfile)
root = tree.getroot()
EURUSD = 1.2
for Rchild in root:
print ("Root child: ", Rchild.tag, ". \n")
if Rchild.tag.endswith("Pyld"):
for PyldChild in Rchild:
print ("Pyld Child: ", PyldChild.tag, ". \n")
Doc = Rchild.find('{001.003}Document')
FinInstrNodes = Doc.findall('{001.003}FinInstr')
for FinInstrNode in FinInstrNodes:
FinCcyNode = FinInstrNode.find('{001.003}NtnlCcy')
FinPriceNode = FinInstrNode.find('{001.003}Price')
FinCcyNodeText = ""
if FinCcyNode is not None:
CcyNodeText = FinCcyNode.text
if CcyNodeText == "EUR":
PriceText = FinPriceNode.text
Price = float(PriceText)
FinPriceNode.text = str(Price * EURUSD)
FinCcyNode.text = "USD"
tree.write(r"C:\Xiang\codes\Python\afmreports\test_modified.xml", encoding="utf-8", xml_declaration=True)
print("\n the program runs to the end! \n")
As we compare the original and modified xml files, the namespace remains unchanged, the whole structure of the xml remains unchanged, only some NtnlCcy and Price Nodes have been changed, as desired.
The only minor difference we do not want is the first line. In the original xml file, it is <?xml version="1.0" encoding="UTF-8"?>, while in the modified xml file, it is <?xml version='1.0' encoding='UTF-8'?>. The quotation sign changes from double quotation to single quotation. But we think this minor difference should not matter.
The original file context will be attached for your easy test:
<?xml version="1.0" encoding="UTF-8"?>
<BizData xmlns="001.001">
<Hdr>
<AppHdr xmlns="001.002">
<Fr>
<Id>XXX01</Id>
</Fr>
<To>
<Id>XXX02</Id>
</To>
<CreDt>2019-10-25T15:38:30</CreDt>
</AppHdr>
</Hdr>
<Pyld>
<Document xmlns="001.003">
<FinInstr>
<Id>NLENX240</Id>
<FullNm>AO.AAI</FullNm>
<NtnlCcy>EUR</NtnlCcy>
<Price>9</Price>
</FinInstr>
<FinInstr>
<Id>NLENX681</Id>
<FullNm>AO.ABN</FullNm>
<NtnlCcy>USD</NtnlCcy>
<Price>10</Price>
</FinInstr>
<FinInstr>
<Id>NLENX320</Id>
<FullNm>AO.ING</FullNm>
<NtnlCcy>EUR</NtnlCcy>
<Price>11</Price>
</FinInstr>
</Document>
</Pyld>

XPath with LXML Element

I am trying to parse an XML document using lxml etree. The XML doc I am parsing looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.openarchives.org/OAI/2.0/">\t
<codeBook version="2.5" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="ddi:codebook:2_5" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd">
<docDscr>
<citation>
<titlStmt>
<titl>Test Title</titl>
</titlStmt>
<prodStmt>
<prodDate/>
</prodStmt>
</citation>
</docDscr>
<stdyDscr>
<citation>
<titlStmt>
<titl>Test Title 2</titl>
<IDNo agency="UKDA">101</IDNo>
</titlStmt>
<rspStmt>
<AuthEnty>TestAuthEntry</AuthEnty>
</rspStmt>
<prodStmt>
<copyright>Yes</copyright>
</prodStmt>
<distStmt/>
<verStmt>
<version date="">1</version>
</verStmt>
</citation>
<stdyInfo>
<subject>
<keyword>2009</keyword>
<keyword>2010</keyword>
<topcClas>CLASS</topcClas>
<topcClas>ffdsf</topcClas>
</subject>
<abstract>This is an abstract piece of text.</abstract>
<sumDscr>
<timePrd event="single">2020</timePrd>
<nation>UK</nation>
<anlyUnit>Test</anlyUnit>
<universe>test</universe>
<universe>hello</universe>
<dataKind>fdsfdsf</dataKind>
</sumDscr>
</stdyInfo>
<method>
<dataColl>
<timeMeth>test timemeth</timeMeth>
<dataCollector>test data collector</dataCollector>
<sampProc>test sampprocess</sampProc>
<deviat>test deviat</deviat>
<collMode>test collMode</collMode>
<sources/>
</dataColl>
</method>
<dataAccs>
<setAvail>
<accsPlac>Test accsPlac</accsPlac>
</setAvail>
<useStmt>
<restrctn>NONE</restrctn>
</useStmt>
</dataAccs>
<othrStdyMat>
<relPubl>122</relPubl>
<relPubl>12332</relPubl>
</othrStdyMat>
</stdyDscr>
</codeBook>
</metadata>
I wrote the following code to try and process it:
from lxml import etree
import pdb
f = open('/vagrant/out2.xml', 'r')
xml_str = f.read()
xml_doc = etree.fromstring(xml_str)
f.close()
From what I understand from the lxml xpath docs, I should be able to get the text from a specific element as follows:
xml_doc.xpath('/metadata/codeBook/docDscr/citation/titlStmt/titl/text()')
However, when I run this it returns an empty array.
The only xpath I can get to return something is using a wildcard:
xml_doc.xpath('*')
Which returns [<Element {ddi:codebook:2_5}codeBook at 0x7f8da8a413f8>].
I've read through the docs and I'm not understanding what is going wrong with this. Any help is appreciated.
You need to take the default namespace into account so instead of
xml_doc.xpath('/metadata/codeBook/docDscr/citation/titlStmt/titl/text()')
use
xml_doc.xpath.xpath(
'/oai:metadata/ddi:codeBook/ddi:docDscr/ddi:citation/ddi:titlStmt/ddi:titl/text()',
namespaces={
'oai': 'http://www.openarchives.org/OAI/2.0/',
'ddi': 'ddi:codebook:2_5'
}
)

XML row structure in one row

Strange error occured, got a XML-file emailed to me which was wrongly formated. The info in the file was all in one row.
Like this
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Text><otherText><printdate>2015-02-08</printdate>
Does anyone know a quick way to fix this by using a python script or something that has had the same error?
I want to make the file like this.
<?xml version="1.0" encoding="ISO-8859-1"?>
<Text>
<OtherText>
<Name>VH2</Name>
<PrintDate>2015-02-05</PrintDate>
Thanks!
It seems you want to print pretty, if you look into other XML libraries, such as lxml, it support pretty print.
import lxml.etree as etree
x = etree.parse("filename")
print etree.tostring(x, pretty_print = True)
However, you can also try this:
Pretty printing XML in Python
If the XML is well formed this snippet will work
#!/usr/bin/python
import xml.dom.minidom
def main():
ugly_xml = open('ugly.xml', 'r')
pretty_xml = open('pretty.xml', 'w')
xmll = xml.dom.minidom.parseString(ugly_xml.read())
pretty_xml.write(xmll.toprettyxml())
if __name__ == "__main__":
main()

Can I create this XML file with lxml?

I'm trying to generate an xml that looks exactly like this:
<?xml version="1.0" encoding="utf-8"?>
<XML type="formats" version="4">
<format type="format" uid="BEAUTY:MasterBeauty">
<type>video</type>
<channelsDepth type="uint">16</channelsDepth>
<channelsEncoding type="string">Float</channelsEncoding>
<channelsEndianess type="string">Little Endian</channelsEndianess>
<fieldDominance type="int">2</fieldDominance>
<height type="uint">1080</height>
<nbChannels type="uint">4</nbChannels>
<pixelLayout type="string">ABGR</pixelLayout>
<pixelRatio type="float">1</pixelRatio>
<rowOrdering type="string">up</rowOrdering>
<width type="uint">1920</width>
</format>
</XML>
It's part of a VFX nodal workflow script ensemble and this file is part of a "read media" node.
I've spent the whole week looking at many different things but can't find anything close to this. I picked lxml for the pretty print thing. I was able to generate a bunch of other simpler (to me) xml files but for this one, I gotta say … i'm lost. Complete fail so far!
Could someone kindly shed a light on this please?
MY QUESTIONS:
- is lxml appropriate for this?
- if no, what is a better choice? (i did look for ElementTree example, no luck!)
- if yes, where do i start? Could someone share a piece a code to get me started?
What i could create so far was things like this one:
import os, sys
import lxml.etree
import lxml.builder as lb
from lxml import etree
E = lxml.builder.ElementMaker()
Setup = E.Setup
Base = E.Base
Version = E.Version
Note = E.Note
Expanded = E.Expanded
ScrollBar = E.ScrollBar
Frames = E.Frames
Current_Time = E.Current_Time
Input_DataType = E.Input_DataType
ClampMode = E.ClampMode
AdapDegrad = E.AdapDegrad
UsedAsTransition = E.UsedAsTransition
State = E.State
root_node = Setup(
Base(
Version('12.030000'),
Note(''),
Expanded('False'),
ScrollBar('0'),
Frames('0'),
Current_Time('1'),
Input_DataType('3'),
ClampMode('0'),
AdapDegrad('False'),
UsedAsTransition('False')
),
State(),
)
print lxml.etree.tostring(root_node, pretty_print=True)
str = etree.tostring(root_node, pretty_print=True)
myXMLfile = open('/Users/stefan/XenDRIVE/___DEV/PYTHON/Create_xlm/create_Batch_xml_setups/result/xml_result/root.root_node.xml', 'w')
myXMLfile.write(str)
myXMLfile.close()
Hope those are "acceptable" questions.
Thank you in advance for any help.
First, make the format node and then add it to the root XML node.
Example code (follow it to create more nodes):
from lxml import etree
from lxml.builder import ElementMaker
E = ElementMaker()
format = E.format(
E.type("video"),
E.channelsDepth("16", type="uint"),
# create more elements here
type="format",
uid="BEAUTY:MasterBeauty"
)
root = E.XML(
format,
type="formats",
version="4"
)
print(etree.tostring(root, xml_declaration=True, encoding='utf-8', pretty_print=True))
Prints:
<?xml version='1.0' encoding='utf-8'?>
<XML version="4" type="formats">
<format type="format" uid="BEAUTY:MasterBeauty">
<type>video</type>
<channelsDepth type="uint">16</channelsDepth>
</format>
</XML>

Python: Xml parsing method

I have a problem with a python script which is used to parse a xml file. This is the xml file:
file.xml
<Tag1 SchemaVersion="1.1" xmlns="http://www.microsoft.com/axe">
<RandomTag>TextText</RandomTag>
<Tag2 xmlns="http://schemas.datacontract.org/2004/07">
<AnotherRandom>Abc</AnotherRandom>
</Tag2>
</Tag1>
I am using xml.etree.ElementTree as parsing method. My task is to change the tags between RandomTag (in this case "TextText"). This is the python code:
python code
import xml.etree.ElementTree as ET
customXmlFile = 'file.xml'
ns = {
'ns': 'http://www.microsoft.com/axe',
'sc': 'http://schemas.datacontract.org/2004/07/Microsoft.Assessments.Relax.ObjectModel_V1'
}
tree = ET.parse(customXmlFile)
root = tree.getroot()
node = root.find('ns:RandomTag', namespaces=ns)
node.text = 'NEW TEXT'
ET.register_namespace('', 'http://www.microsoft.com/axe')
tree.write(customXmlFile + ".new",
xml_declaration=True,
encoding='utf-8',
method="xml")
I don't have run time errors, the code works fine, but all the namespaces are defined in the first node (Tag1) and in AnotherRandom and Tag2 is used a shorcut. Also, the SchemaVersion is deleted.
file.xml.new - output
<?xml version='1.0' encoding='utf-8'?>
<Tag1 xmlns="http://www.microsoft.com/axe" xmlns:ns1="http://schemas.datacontract.org/2004/07" SchemaVersion="1.1">
<RandomTag>NEW TEXT</RandomTag>
<ns1:Tag2>
<ns1:AnotherRandom>Abc</ns1:AnotherRandom>
</ns1:Tag2>
</Tag1>
file.xml.new - desired output
<Tag1 SchemaVersion="1.1" xmlns="http://www.microsoft.com/axe">
<RandomTag>TextText</RandomTag>
<Tag2 xmlns="http://schemas.datacontract.org/2004/07">
<AnotherRandom>NEW TEXT</AnotherRandom>
</Tag2>
</Tag1>
What should I change to get exact the same format of XML as at the beggining with that only text changed?
This is a bit of a hack but will do kind of what you want. However, playing around with namespaces like this surely violates the XML standard. I suggest you check out lxml if you want better handling of namespaces.
You must call register_namespace() before parsing in the file. Since repeated calls to that function overwrite previous mapping, you must manually edit the internal dict.
import xml.etree.ElementTree as ET
customXmlFile = 'test.xml'
ns = {'ns': 'http://www.microsoft.com/axe',
'sc': 'http://schemas.datacontract.org/2004/07/'}
ET.register_namespace('', 'http://www.microsoft.com/axe')
ET._namespace_map['http://schemas.datacontract.org/2004/07'] = ''
tree = ET.parse(customXmlFile)
root = tree.getroot()
node = root.find('ns:RandomTag', namespaces=ns)
node.text = 'NEW TEXT'
tree.write(customXmlFile + ".new",
xml_declaration=True,
encoding='utf-8',
method="xml")
For more information about this see:
http://effbot.org/zone/element-namespaces.htm
Saving XML files using ElementTree
Cannot write XML file with default namespace

Categories