Parsing subchilds in XML with ElementTree


I'm trying to extract information from an XML document with ElementTree in Python 3.2.
The XML looks like this:
<Page Id="1">
    <Group>4</Group>
    <Type>
        <Letter>B</Letter>
        <Number>101</Number>
        <Deep>
            <A>900</A>
            <B>900</B>
        </Deep>
    </Type>
</Page>
I can get the element data from "Group" with:
for Page in root.iter('Page'):
    Group = Page.find('Group').text
And the "Letter" data with:
for Type in root.iter('Type'):
    Dim = Type.find('Letter').text
However, I can't figure out how to get the data from the child elements of "Deep" (A and B).
All help is greatly appreciated!

You are very close. Use find to locate the Deep tag and then iterate over it.
Example:
import xml.etree.ElementTree as ET
tree = ET.parse(filename)
root = tree.getroot()
for Type in root.iter('Type'):
    for deep_tag in Type.find("Deep"):
        print(deep_tag.text)
Output:
900
900
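If you need A and B individually rather than in document order, a relative path such as Deep/A can be passed to find or findtext. A minimal sketch, assuming the XML from the question is held in a string (the variable names are illustrative, not from the original post):

```python
import xml.etree.ElementTree as ET

xml_text = """
<Page Id="1">
    <Group>4</Group>
    <Type>
        <Letter>B</Letter>
        <Number>101</Number>
        <Deep>
            <A>900</A>
            <B>900</B>
        </Deep>
    </Type>
</Page>
"""

# fromstring returns the root element, which is <Page> here.
root = ET.fromstring(xml_text)

for type_elem in root.iter('Type'):
    # findtext follows the relative path and returns the element text,
    # or None when no element matches.
    a_value = type_elem.findtext('Deep/A')
    b_value = type_elem.findtext('Deep/B')
    print(a_value, b_value)
```

Using findtext avoids an AttributeError on a missing element, because it returns None instead of requiring a .text lookup on the result of find.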

Related

parsing XML in python by using xml.etree.ElementTree

I fetch an XML file using the request module, and I want to use the xml.etree.ElementTree module to read the value of the sysname element, which is
core-usg-01
but I'm stuck. I wrote this simple code to get the sysname element, but the output is empty.
Python code:
import xml.etree.ElementTree as ET
tree = ET.parse('usg.xml')
root = tree.getroot()
print(root.findall('sysname'))
XML file:
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
<data>
<system-state xmlns="urn:ietf:params:xml:ns:yang:ietf-system">
<sysname xmlns="urn:huawei:params:xml:ns:yang:huawei-system">
core-usg-01
</sysname>
</system-state>
</data>
</rpc-reply>

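Every element in this document is namespaced, so the bare tag name 'sysname' matches nothing, and findall without a './/' prefix only looks at direct children of the root. Iterate over the root with iter() to see the fully qualified tags:
for child in root.iter():
    print(child.tag, child.attrib)
This prints each tag in the tree together with its attributes: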
{urn:ietf:params:xml:ns:netconf:base:1.0}rpc-reply {'message-id': '1'}
{urn:ietf:params:xml:ns:netconf:base:1.0}data {}
{urn:ietf:params:xml:ns:yang:ietf-system}system-state {}
{urn:huawei:params:xml:ns:yang:huawei-system}sysname {}
Now you can reach the desired tag with the following code:
for child in root.findall('.//{urn:ietf:params:xml:ns:yang:ietf-system}system-state'):
    temp = child.find('.//{urn:huawei:params:xml:ns:yang:huawei-system}sysname')
    print(temp.text)
The output will look like this:
core-usg-01

Try the one-liner below:
import xml.etree.ElementTree as ET
xml = '''<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
<data>
<system-state xmlns="urn:ietf:params:xml:ns:yang:ietf-system">
<sysname xmlns="urn:huawei:params:xml:ns:yang:huawei-system">
core-usg-01
</sysname>
</system-state>
</data>
</rpc-reply>'''
root = ET.fromstring(xml)
print(root.find('.//{urn:huawei:params:xml:ns:yang:huawei-system}sysname').text)
Output:
core-usg-01
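The long {uri}tag form can be shortened by passing a prefix-to-URI mapping to find. The sketch below assumes the same XML as above; the prefixes sys and hw are arbitrary labels chosen for this example, and the strip() call is there because the element text includes the newlines around the value.

```python
import xml.etree.ElementTree as ET

xml_text = '''<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1">
  <data>
    <system-state xmlns="urn:ietf:params:xml:ns:yang:ietf-system">
      <sysname xmlns="urn:huawei:params:xml:ns:yang:huawei-system">
        core-usg-01
      </sysname>
    </system-state>
  </data>
</rpc-reply>'''

# Prefixes are local labels (an assumption of this sketch); only the URIs
# have to match the ones declared in the document.
ns = {
    'sys': 'urn:ietf:params:xml:ns:yang:ietf-system',
    'hw': 'urn:huawei:params:xml:ns:yang:huawei-system',
}

root = ET.fromstring(xml_text)
sysname = root.find('.//sys:system-state/hw:sysname', ns)

# Guard against a missing element, then drop the surrounding whitespace.
hostname = sysname.text.strip() if sysname is not None and sysname.text else None
print(hostname)
```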

Need some help generating XML with Python

I have some variables in Python that I need to store as XML. I have been using the lxml module for this so far, but I am not very experienced with it. I have tried various tutorials and docs, but I am at a dead end and need some help.
Here is the python script:
root = etree.Element("root")
coins=etree.Element("coins")
doc=etree.ElementTree(coins)
coins.append(etree.Element("trader"))
coins.append(etree.Element("metal"))
coins.append(etree.Element("type"))
coins.append(etree.Element("price"))
coins[0].text="Gold.co.uk"
coins[0].attrib["variable"]=("GLDAG_MAPLE")
coins[1].text="Silver"
coins[2].text="Britannia"
coins[3].text=str(GLDAG_MAPLE)
doc.write('data.xml', pretty_print=True)
As of now it outputs this:
<coins>
<trader variable="GLDAG_MAPLE">Gold.co.uk</trader>
<metal>Silver</metal>
<type>Britannia</type>
<price>
£31.20
</price>
</coins>
However I would like it to look like this:
<root>
<coin>
<trader> Gold.co.uk </trader>
<type> Britannia </type>
<price> £31.20 </price>
</coin>
</root>
The tags and their sub-tags would be duplicated for every type of coin. I have no idea how to construct the XML so that the output looks like the third code block. So far I have tried to adapt scripts from GitHub and other sites to my needs, but my scripts keep failing or producing incorrect results.
If someone could help me out, that would be great!

You can simply append the Element to root:
from lxml import etree
coinItems = [
    {'trader': 'Gold.co.uk', 'metal': 'Silver', 'type': 'Britannia'},
    {'trader': 'copper.co.uk', 'metal': 'Copper', 'type': 'World'}
]
root = etree.Element("root")
for ci in coinItems:
    coin = etree.Element("coin")
    etree.SubElement(coin, "trader", {'variable': 'GLDAG_MAPLE'}).text = ci['trader']  # example how to use attributes!
    etree.SubElement(coin, "metal").text = ci['metal']
    etree.SubElement(coin, "type").text = ci['type']
    root.append(coin)
fName = '/tmp/data.xml'
with open(fName, 'wb') as f:
    # remove encoding here, in case you want escaped ASCII characters: £
    f.write(etree.tostring(root, xml_declaration=True, encoding="utf-8", pretty_print=True))
print(open(fName).read())
Output:
<?xml version='1.0' encoding='utf-8'?>
<root>
<coin>
<trader variable="GLDAG_MAPLE">Gold.co.uk</trader>
<metal>Silver</metal>
<type>Britannia</type>
</coin>
<coin>
<trader variable="GLDAG_MAPLE">copper.co.uk</trader>
<metal>Copper</metal>
<type>World</type>
</coin>
</root>
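The same loop-and-append pattern can also produce the trader/type/price layout the question asks for. The sketch below is an assumption-laden variant rather than the answer's code: it uses the standard library's xml.etree.ElementTree instead of lxml (the Element and SubElement calls have lxml counterparts), ET.indent requires Python 3.9 or later, and the second coin's price is a made-up placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the second price is a placeholder, not a real quote.
coin_items = [
    {'trader': 'Gold.co.uk', 'type': 'Britannia', 'price': '£31.20'},
    {'trader': 'copper.co.uk', 'type': 'World', 'price': '£1.50'},
]

root = ET.Element('root')
for item in coin_items:
    # One <coin> element per dictionary, attached directly to <root>.
    coin = ET.SubElement(root, 'coin')
    for tag in ('trader', 'type', 'price'):
        ET.SubElement(coin, tag).text = item[tag]

ET.indent(root)  # pretty-printing helper, available from Python 3.9
xml_out = ET.tostring(root, encoding='unicode')
print(xml_out)
```

Passing encoding='unicode' returns a str and keeps the £ sign as a literal character instead of a numeric character reference.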

I prefer using the lxml builder (https://lxml.de/api/lxml.builder.ElementMaker-class.html) because imho it is easier to see the structure of your XML document.
from lxml.builder import E
root = E.root(
    E.coin(
        E.trader("Gold.co.uk",
                 variable="GLDAG_MAPLE"),
        E.metal("silver"),
        E.price("£31.20")
    )
)
You can then append the root element to your main document.

XPath with LXML Element

I am trying to parse an XML document using lxml etree. The XML doc I am parsing looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.openarchives.org/OAI/2.0/">
<codeBook version="2.5" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="ddi:codebook:2_5" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd">
<docDscr>
<citation>
<titlStmt>
<titl>Test Title</titl>
</titlStmt>
<prodStmt>
<prodDate/>
</prodStmt>
</citation>
</docDscr>
<stdyDscr>
<citation>
<titlStmt>
<titl>Test Title 2</titl>
<IDNo agency="UKDA">101</IDNo>
</titlStmt>
<rspStmt>
<AuthEnty>TestAuthEntry</AuthEnty>
</rspStmt>
<prodStmt>
<copyright>Yes</copyright>
</prodStmt>
<distStmt/>
<verStmt>
<version date="">1</version>
</verStmt>
</citation>
<stdyInfo>
<subject>
<keyword>2009</keyword>
<keyword>2010</keyword>
<topcClas>CLASS</topcClas>
<topcClas>ffdsf</topcClas>
</subject>
<abstract>This is an abstract piece of text.</abstract>
<sumDscr>
<timePrd event="single">2020</timePrd>
<nation>UK</nation>
<anlyUnit>Test</anlyUnit>
<universe>test</universe>
<universe>hello</universe>
<dataKind>fdsfdsf</dataKind>
</sumDscr>
</stdyInfo>
<method>
<dataColl>
<timeMeth>test timemeth</timeMeth>
<dataCollector>test data collector</dataCollector>
<sampProc>test sampprocess</sampProc>
<deviat>test deviat</deviat>
<collMode>test collMode</collMode>
<sources/>
</dataColl>
</method>
<dataAccs>
<setAvail>
<accsPlac>Test accsPlac</accsPlac>
</setAvail>
<useStmt>
<restrctn>NONE</restrctn>
</useStmt>
</dataAccs>
<othrStdyMat>
<relPubl>122</relPubl>
<relPubl>12332</relPubl>
</othrStdyMat>
</stdyDscr>
</codeBook>
</metadata>
I wrote the following code to try and process it:
from lxml import etree
import pdb
f = open('/vagrant/out2.xml', 'r')
xml_str = f.read()
xml_doc = etree.fromstring(xml_str)
f.close()
From what I understand from the lxml xpath docs, I should be able to get the text from a specific element as follows:
xml_doc.xpath('/metadata/codeBook/docDscr/citation/titlStmt/titl/text()')
However, when I run this it returns an empty list.
The only xpath I can get to return something is using a wildcard:
xml_doc.xpath('*')
Which returns [<Element {ddi:codebook:2_5}codeBook at 0x7f8da8a413f8>].
I've read through the docs and I'm not understanding what is going wrong with this. Any help is appreciated.

You need to take the default namespaces into account (the document declares one on metadata and a different one on codeBook), so instead of
xml_doc.xpath('/metadata/codeBook/docDscr/citation/titlStmt/titl/text()')
use
xml_doc.xpath(
    '/oai:metadata/ddi:codeBook/ddi:docDscr/ddi:citation/ddi:titlStmt/ddi:titl/text()',
    namespaces={
        'oai': 'http://www.openarchives.org/OAI/2.0/',
        'ddi': 'ddi:codebook:2_5'
    }
)
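The prefix-to-URI mapping is not specific to lxml. As a hedged illustration, the sketch below applies the same idea with the standard library's xml.etree.ElementTree on a trimmed copy of the document. ElementTree supports only a subset of XPath, so there is no text() step and the path is relative to the root element; this is a swapped-in technique, not the lxml xpath call from the answer.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document from the question, keeping both default namespaces.
xml_text = '''<metadata xmlns="http://www.openarchives.org/OAI/2.0/">
  <codeBook version="2.5" xmlns="ddi:codebook:2_5">
    <docDscr>
      <citation>
        <titlStmt>
          <titl>Test Title</titl>
        </titlStmt>
      </citation>
    </docDscr>
  </codeBook>
</metadata>'''

ns = {
    'oai': 'http://www.openarchives.org/OAI/2.0/',
    'ddi': 'ddi:codebook:2_5',
}

root = ET.fromstring(xml_text)  # root is the oai:metadata element

# The path starts below the root, so oai:metadata itself is not part of it.
title = root.findtext(
    'ddi:codeBook/ddi:docDscr/ddi:citation/ddi:titlStmt/ddi:titl',
    namespaces=ns,
)
print(title)
```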

Can I create this XML file with lxml?

I'm trying to generate an xml that looks exactly like this:
<?xml version="1.0" encoding="utf-8"?>
<XML type="formats" version="4">
<format type="format" uid="BEAUTY:MasterBeauty">
<type>video</type>
<channelsDepth type="uint">16</channelsDepth>
<channelsEncoding type="string">Float</channelsEncoding>
<channelsEndianess type="string">Little Endian</channelsEndianess>
<fieldDominance type="int">2</fieldDominance>
<height type="uint">1080</height>
<nbChannels type="uint">4</nbChannels>
<pixelLayout type="string">ABGR</pixelLayout>
<pixelRatio type="float">1</pixelRatio>
<rowOrdering type="string">up</rowOrdering>
<width type="uint">1920</width>
</format>
</XML>
It's part of a set of VFX nodal workflow scripts, and this file belongs to a "read media" node.
I've spent the whole week looking at many different things but can't find anything close to this. I picked lxml for its pretty printing. I was able to generate a bunch of other (to me) simpler XML files, but for this one, I have to say … I'm lost. Complete fail so far!
Could someone kindly shed some light on this, please?
MY QUESTIONS:
- Is lxml appropriate for this?
- If not, what is a better choice? (I did look for an ElementTree example, no luck!)
- If yes, where do I start? Could someone share a piece of code to get me started?
What I could create so far was things like this one:
import os, sys
import lxml.etree
import lxml.builder as lb
from lxml import etree
E = lxml.builder.ElementMaker()
Setup = E.Setup
Base = E.Base
Version = E.Version
Note = E.Note
Expanded = E.Expanded
ScrollBar = E.ScrollBar
Frames = E.Frames
Current_Time = E.Current_Time
Input_DataType = E.Input_DataType
ClampMode = E.ClampMode
AdapDegrad = E.AdapDegrad
UsedAsTransition = E.UsedAsTransition
State = E.State
root_node = Setup(
Base(
Version('12.030000'),
Note(''),
Expanded('False'),
ScrollBar('0'),
Frames('0'),
Current_Time('1'),
Input_DataType('3'),
ClampMode('0'),
AdapDegrad('False'),
UsedAsTransition('False')
),
State(),
)
# tostring returns bytes, so decode for printing and write in binary mode
xml_bytes = etree.tostring(root_node, pretty_print=True)
print(xml_bytes.decode())
with open('/Users/stefan/XenDRIVE/___DEV/PYTHON/Create_xlm/create_Batch_xml_setups/result/xml_result/root.root_node.xml', 'wb') as myXMLfile:
    myXMLfile.write(xml_bytes)
Hope those are "acceptable" questions.
Thank you in advance for any help.

First, make the format node and then add it to the root XML node.
Example code (follow it to create more nodes):
from lxml import etree
from lxml.builder import ElementMaker
E = ElementMaker()
format = E.format(
    E.type("video"),
    E.channelsDepth("16", type="uint"),
    # create more elements here
    type="format",
    uid="BEAUTY:MasterBeauty"
)
root = E.XML(
    format,
    type="formats",
    version="4"
)
# tostring returns bytes when an encoding is given, so decode before printing
print(etree.tostring(root, xml_declaration=True, encoding='utf-8', pretty_print=True).decode('utf-8'))
Prints:
<?xml version='1.0' encoding='utf-8'?>
<XML version="4" type="formats">
<format type="format" uid="BEAUTY:MasterBeauty">
<type>video</type>
<channelsDepth type="uint">16</channelsDepth>
</format>
</XML>
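Since the remaining children all share the shape tag, type attribute, text, they can be generated from a table instead of being written out one by one. The sketch below is one possible way to do that, not the answer's code: it uses the standard library's xml.etree.ElementTree (lxml.etree offers matching Element and SubElement calls), the fields list is copied from the target document in the question, and ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# (tag, type attribute, text) for each child, taken from the target document.
fields = [
    ('channelsDepth', 'uint', '16'),
    ('channelsEncoding', 'string', 'Float'),
    ('channelsEndianess', 'string', 'Little Endian'),
    ('fieldDominance', 'int', '2'),
    ('height', 'uint', '1080'),
    ('nbChannels', 'uint', '4'),
    ('pixelLayout', 'string', 'ABGR'),
    ('pixelRatio', 'float', '1'),
    ('rowOrdering', 'string', 'up'),
    ('width', 'uint', '1920'),
]

root = ET.Element('XML', {'type': 'formats', 'version': '4'})
fmt = ET.SubElement(root, 'format', {'type': 'format', 'uid': 'BEAUTY:MasterBeauty'})

# <type> is the only child without a type attribute, so it is added separately.
ET.SubElement(fmt, 'type').text = 'video'
for tag, type_name, value in fields:
    ET.SubElement(fmt, tag, {'type': type_name}).text = value

ET.indent(root)  # pretty-printing helper, available from Python 3.9
xml_bytes = ET.tostring(root, encoding='utf-8', xml_declaration=True)
print(xml_bytes.decode('utf-8'))
```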

python elementree blank output

I am parsing XML output from vCloud, but I am not able to reach the values:
<?xml version="1.0" encoding="UTF-8"?>
<SupportedVersions xmlns="http://www.vmware.com/vcloud/versions" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.vmware.com/vcloud/versions http://10.10.6.12/api/versions/schema/versions.xsd">
<VersionInfo>
<Version>1.5</Version>
<LoginUrl>https://api.vcd.portal.skyscapecloud.com/api/sessions</LoginUrl>
<MediaTypeMapping>
<MediaType>application/vnd.vmware.vcloud.instantiateVAppTemplateParams+xml</MediaType>
<ComplexTypeName>InstantiateVAppTemplateParamsType</ComplexTypeName>
<SchemaLocation>http://api.vcd.portal.skyscapecloud.com/api/v1.5/schema/master.xsd</SchemaLocation>
</MediaTypeMapping>
<MediaTypeMapping>
<MediaType>application/vnd.vmware.admin.vmwProviderVdcReferences+xml</MediaType>
<ComplexTypeName>VMWProviderVdcReferencesType</ComplexTypeName>
<SchemaLocation>http://api.vcd.portal.skyscapecloud.com/api/v1.5/schema/vmwextensions.xsd</SchemaLocation>
</MediaTypeMapping>
<MediaTypeMapping>
<MediaType>application/vnd.vmware.vcloud.customizationSection+xml</MediaType>
<ComplexTypeName>CustomizationSectionType</ComplexTypeName>
<SchemaLocation>http://api.vcd.portal.skyscapecloud.com/api/v1.5/schema/master.xsd</SchemaLocation>
</MediaTypeMapping>
This is what I have been using:
import xml.etree.ElementTree as ET
data = ET.fromstring(content)
versioninfo = data.findall("VersionInfo/Version")
print len(versioninfo)
print versioninfo.text
However, this gives a blank output. Any suggestions?

Try this:
import xml.etree.ElementTree as ET
data = ET.fromstring(content)
versioninfo = data.find(
"ns:VersionInfo/ns:Version",
namespaces={'ns':'http://www.vmware.com/vcloud/versions'})
print(versioninfo.text)
Use .find(), not .findall(), to return a single element; .findall() returns a list, which has no .text attribute.
Your XML uses namespaces. The full path to your desired object is '{http://www.vmware.com/vcloud/versions}VersionInfo/{http://www.vmware.com/vcloud/versions}Version'. By passing in the namespaces parameter, you can use the shortcut syntax ns:VersionInfo/ns:Version.
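When several matching elements are expected, as with the MediaTypeMapping entries, findall with the same namespaces mapping returns a list to loop over. A minimal sketch, assuming a shortened and properly closed copy of the response (the original post's XML is cut off before the closing tags):

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the vCloud response in the question.
content = '''<SupportedVersions xmlns="http://www.vmware.com/vcloud/versions">
  <VersionInfo>
    <Version>1.5</Version>
    <MediaTypeMapping>
      <MediaType>application/vnd.vmware.vcloud.instantiateVAppTemplateParams+xml</MediaType>
      <ComplexTypeName>InstantiateVAppTemplateParamsType</ComplexTypeName>
    </MediaTypeMapping>
    <MediaTypeMapping>
      <MediaType>application/vnd.vmware.vcloud.customizationSection+xml</MediaType>
      <ComplexTypeName>CustomizationSectionType</ComplexTypeName>
    </MediaTypeMapping>
  </VersionInfo>
</SupportedVersions>'''

ns = {'ns': 'http://www.vmware.com/vcloud/versions'}
data = ET.fromstring(content)

# findall returns a list of elements; each one is queried with the same mapping.
mappings = data.findall('ns:VersionInfo/ns:MediaTypeMapping', ns)
type_names = [m.findtext('ns:ComplexTypeName', namespaces=ns) for m in mappings]
print(len(mappings), type_names)
```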

