Parse XML attribute to variable with ElementTree - python

Hello, I'm writing a bit of code in Maya and running into some issues with ElementTree. I need help reading in this XML, or something similar. The XML is generated based on a selection, so it can change.
<root>
<Locations>
<1 name="CacheLocation">C:\Users\daunish\Desktop</1>
</Locations>
<Objects>
<1 name="Sphere">[u'pSphere1', u'pSphere2']</1>
<2 name="Cube">[u'pCube1']</2>
</Objects>
</root>
I need a way of searching for a particular "name" inside "Locations", and passing the text to a variable.
I also need a way of going through each line inside "Objects" and performing a function on it, as in a for loop.
I'm open to all suggestions; I have been going crazy trying to get this to work. If you think I should format the XML differently, I'm up for that as well. Thanks in advance for the help.

[Note: your XML is not well formed because you can't have tags that start with a number]
Not sure what you've tried but there are many ways to do this, here's one:
Find the first element with name=CacheLocation in Locations:
>>> filename = root.find("./Locations/*[@name='CacheLocation']").text
>>> filename
'C:\\Users\\daunish\\Desktop'
Iterating over all the elements in Objects:
>>> import ast
>>> for target in root.find("./Objects"):
...     for i in ast.literal_eval(target.text):
...         print(target.get('name'), i)
...
Sphere pSphere1
Sphere pSphere2
Cube pCube1
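For completeness, here is a self-contained sketch of both steps. It assumes the XML is regenerated with valid tag names; the `item` tag is a hypothetical replacement for the numeric tags, and the variable names are illustrative only.

```python
import ast
import xml.etree.ElementTree as ET

# Assumption: numeric tags such as <1> are replaced by a valid name, here "item".
XML = r"""<root>
<Locations>
<item name="CacheLocation">C:\Users\daunish\Desktop</item>
</Locations>
<Objects>
<item name="Sphere">[u'pSphere1', u'pSphere2']</item>
<item name="Cube">[u'pCube1']</item>
</Objects>
</root>"""

root = ET.fromstring(XML)

# Look up one element in Locations by its "name" attribute.
cache_location = root.find("./Locations/*[@name='CacheLocation']").text

# Walk every child of Objects; each element's text is a Python list literal.
objects = {}
for target in root.find("./Objects"):
    objects[target.get("name")] = ast.literal_eval(target.text)

print(cache_location)
print(objects)
```

Collecting the results in a dictionary keyed by name is just one option; the loop body could equally call whatever function needs to run per object.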

Related

Python: Add subelement to an element based on value of a sibling subelement?

Lots of python XML parsing tutorials out there, but not that many on updating XML, and none I can find that match my needs. Sorry for the N00B.
I have a need to add subelements to a particular element based on the value of another subelement.
<CadData>
<FireIncidentCollection>
<FireIncident>
<IncidentNo>12345</IncidentNo>
<ApparatusCollection>
<Apparatus>
<Unit>E29</Unit>
</Apparatus>
<Apparatus>
<Unit>TW29</Unit>
</Apparatus>
<Apparatus>
<Unit>R29</Unit>
</Apparatus>
</ApparatusCollection>
</FireIncident>
</FireIncidentCollection>
</CadData>
I have values and even other subtrees I need to add based on the "Unit" value of an "Apparatus" element. For example, I may need to add this snippet in that "Apparatus" element when the "Unit"=="TW29":
<DispatchTime>20221115T06:05:04-6.00</DispatchTime>
<ApparatusPersonnelCollection>
<ApparatusPersonnel>
<ID>23456</ID>
</ApparatusPersonnel>
<ApparatusPersonnel>
<ID>78901</ID>
</ApparatusPersonnel>
</ApparatusPersonnelCollection>
So far I'm resisting the urge to dump everything to a DB and re-writing the whole file each time :). I'm sure there's a way in ElementTree or DOM, but I can't figure it out (not for lack of effort). Any pointers are greatly appreciated. Thanks!
(Oh, and no, I don't control the schema- I just have to adhere to it).
The first step would be to put this information into a dictionary - then it will be much easier to update your data
I'd recommend using the xmltodict library, together with this tutorial, which will let you convert the XML into a dictionary that you can traverse.
From there, just traverse down the dictionary. The nice thing about the xmltodict library is that it will convert xml with the same tag into a list, so your ApparatusCollection might look something like this once it's converted:
>>> data['CadData']['FireIncidentCollection']['FireIncident']['ApparatusCollection']
{'Apparatus': [{'Unit': 'E29'}, {'Unit': 'TW29'}, {'Unit': 'R29'}]}
Just find the "ApparatusCollection" key, and update the "Apparatus" section. My guess is that it would look close to this:
{'Apparatus': [
    {
        'Unit': 'E29',
        'DispatchTime': '20221115T06:05:04-6.00',
        'ApparatusPersonnelCollection': {
            'ApparatusPersonnel': [{'ID': '23456'}, {'ID': '78901'}]
        }
    }, ...
After you've added what you need, just convert the dictionary back into XML.
Hope this helps!
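If you would rather stay with the standard library, the same update can probably be done in place with ElementTree instead of xmltodict. This is only a sketch of that alternative: it uses the sample data from the question, and the hard-coded dispatch time and personnel IDs stand in for wherever the real values come from.

```python
import xml.etree.ElementTree as ET

XML = """<CadData>
<FireIncidentCollection>
<FireIncident>
<IncidentNo>12345</IncidentNo>
<ApparatusCollection>
<Apparatus><Unit>E29</Unit></Apparatus>
<Apparatus><Unit>TW29</Unit></Apparatus>
<Apparatus><Unit>R29</Unit></Apparatus>
</ApparatusCollection>
</FireIncident>
</FireIncidentCollection>
</CadData>"""

root = ET.fromstring(XML)

# The predicate [Unit='TW29'] selects Apparatus elements whose Unit child has that text.
for apparatus in root.iterfind(".//Apparatus[Unit='TW29']"):
    ET.SubElement(apparatus, "DispatchTime").text = "20221115T06:05:04-6.00"
    collection = ET.SubElement(apparatus, "ApparatusPersonnelCollection")
    for person_id in ("23456", "78901"):
        person = ET.SubElement(collection, "ApparatusPersonnel")
        ET.SubElement(person, "ID").text = person_id

# Serialize the whole tree again once all updates are applied.
updated = ET.tostring(root, encoding="unicode")
print(updated)
```

New children are appended after the existing Unit element, so if the schema requires a particular child order, that would need checking.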

Find for multiple tags' values with lxml

I am using lxml to parse an XML like this sample one:
<compounddef xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="d2/db7/class_foo" kind="class">
<compoundname>FooClass</compoundname>
<sectiondef kind="public-type">
<memberdef kind="typedef" id="d2/db7/class_bar">
<type><ref refid="d3/d73/struct_foo" kindref="compound">StructFoo</ref></type>
<definition>StructFooDefinition</definition>
</memberdef>
</sectiondef>
</compounddef>
I'm trying to get the element whose refid is "d3/d73/struct_foo" and whose <definition> contains the text "Foo".
There could be many refids with that value and many definitions containing Foo, but only one has this combination.
I am able to first find all the elements with that refid and then filter this list by checking which of them contain "Foo" in the <definition>, but since I'm working with a really big XML file (~1 GB) and the application is time-sensitive, I wanted to avoid this.
I tried combining the various etree paths using the keyword 'and' or '//precede:...', but without success.
My last try was:
self.dox_tree_root_.xpath(".//compounddef[@kind = 'class']//memberdef[@kind='typedef'][/type/ref[@refid='%s'] and contains(definition, 'name')]" % (independent_type_refid, name)))
but it is giving me an error.
Is there a way to combine the two filters inside one command?
You can use XPath:
//a[.//ref[@refid="12345"] and contains(c, "Good")]
If I understand you correctly, this should get you close enough:
.//compounddef[@kind = 'class']//memberdef[@kind='typedef'][./type/ref[@refid='d3/d73/struct_foo']][contains(.//definition, 'Foo')]//definition
Output:
StructFooDefinition
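Since the file is around 1 GB, it may also be worth trying a streaming pass that never builds the whole tree. The sketch below is a different technique from the XPath above: it uses the standard library's `iterparse` and applies both filters in one pass. The function name `find_definitions` and the extra sample members are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<compounddef xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="d2/db7/class_foo" kind="class">
<compoundname>FooClass</compoundname>
<sectiondef kind="public-type">
<memberdef kind="typedef" id="d2/db7/class_bar">
<type><ref refid="d3/d73/struct_foo" kindref="compound">StructFoo</ref></type>
<definition>StructFooDefinition</definition>
</memberdef>
<memberdef kind="typedef" id="d2/db7/class_baz">
<type><ref refid="d3/d73/struct_foo" kindref="compound">StructFoo</ref></type>
<definition>OtherDefinition</definition>
</memberdef>
<memberdef kind="typedef" id="d2/db7/class_qux">
<type><ref refid="d9/d99/struct_other" kindref="compound">Other</ref></type>
<definition>AnotherFooDefinition</definition>
</memberdef>
</sectiondef>
</compounddef>"""


def find_definitions(source, refid, needle):
    """Return definition texts of typedef members matching both filters."""
    matches = []
    in_class = False
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag == "compounddef":
            # Attributes are already available on the "start" event.
            in_class = event == "start" and elem.get("kind") == "class"
            if event == "end":
                elem.clear()
        elif event == "end" and elem.tag == "memberdef":
            if in_class and elem.get("kind") == "typedef":
                has_ref = any(ref.get("refid") == refid
                              for ref in elem.iterfind("type/ref"))
                definition = elem.findtext("definition") or ""
                if has_ref and needle in definition:
                    matches.append(definition)
            elem.clear()  # drop the member's children to keep memory low
    return matches


print(find_definitions(io.BytesIO(SAMPLE), "d3/d73/struct_foo", "Foo"))
```

Whether this beats a single lxml XPath call on your data would have to be measured; the main benefit is bounded memory use.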

How to access attribute value in xml containing namespace using ElementTree in python

XML file:
<?xml version="1.0" encoding="iso-8859-1"?>
<rdf:RDF xmlns:cim="http://iec.ch/TC57/2008/CIM-schema-cim13#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<cim:Terminal rdf:ID="A_T1">
<cim:Terminal.ConductingEquipment rdf:resource="#A_EF2"/>
<cim:Terminal.ConnectivityNode rdf:resource="#A_CN1"/>
</cim:Terminal>
</rdf:RDF>
I want to get the attribute value of the Terminal.ConnectivityNode element and the attribute value of the Terminal element as output from the above XML. This is what I have tried:
Python code:
from elementtree import ElementTree as etree
tree= etree.parse(r'N:\myinternwork\files xml of bus systems\cimxmleg.xml')
cim= "{http://iec.ch/TC57/2008/CIM-schema-cim13#}"
rdf= "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
Appending the line below to the code:
print tree.find('{0}Terminal'.format(cim)).attrib
Output 1 is as expected:
{'{http://www.w3.org/1999/02/22-rdf-syntax-ns#}ID': 'A_T1'}
If I instead append this line:
print tree.find('{0}Terminal'.format(cim)).attrib['rdf:ID']
Output 2: KeyError on 'rdf:ID'
If I append this line:
print tree.find('{0}Terminal/{0}Terminal.ConductivityEquipment'.format(cim))
Output 3: None
How do I get Output 2 as A_T1 and Output 3 as #A_CN1?
What is the significance of {0} in the above code? I found online that it must be used, but I didn't understand why.
First off, the {0} you're wondering about is part of the syntax for Python's built-in string formatting facility. The Python documentation has a fairly comprehensive guide to the syntax. In your case, it simply gets substituted by cim, which results in the string {http://iec.ch/TC57/2008/CIM-schema-cim13#}Terminal.
The problem here is that ElementTree is a bit silly about namespaces. Instead of being able to simply supply the namespace prefix (like cim: or rdf:), you have to supply the fully qualified name in {namespace-uri}name form. This means that rdf:ID becomes {http://www.w3.org/1999/02/22-rdf-syntax-ns#}ID, which is very clunky.
ElementTree does support a way to use the namespace prefix for finding tags (by passing a prefix-to-URI mapping to find), but not for attributes. This means you'll have to expand rdf: to {http://www.w3.org/1999/02/22-rdf-syntax-ns#} yourself.
In your case, it could look as following (note also that ID is case-sensitive):
tree.find('{0}Terminal'.format(cim)).attrib['{0}ID'.format(rdf)]
Those substitutions expand to:
tree.find('{http://iec.ch/TC57/2008/CIM-schema-cim13#}Terminal').attrib['{http://www.w3.org/1999/02/22-rdf-syntax-ns#}ID']
With those hoops jumped through, it works (note that the ID is A_T1 and not #A_T1, however). Of course, this is all really annoying to have to deal with, so you could also switch to lxml and have it mostly handled for you.
Your third case doesn't work simply because 1) it's named Terminal.ConductingEquipment and not Terminal.ConductivityEquipment, and 2) if you really want #A_CN1 and not #A_EF2, that's the ConnectivityNode and not the ConductingEquipment. You can get #A_CN1 with tree.find('{0}Terminal/{0}Terminal.ConnectivityNode'.format(cim)).attrib['{0}resource'.format(rdf)].
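Putting it together, a minimal sketch for Python 3 with the standard library's xml.etree.ElementTree might look like this. The `NS` mapping and the `qname` helper are names invented here for illustration; the XML declaration is dropped because the document is parsed from a string.

```python
import xml.etree.ElementTree as ET

XML = """<rdf:RDF xmlns:cim="http://iec.ch/TC57/2008/CIM-schema-cim13#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<cim:Terminal rdf:ID="A_T1">
<cim:Terminal.ConductingEquipment rdf:resource="#A_EF2"/>
<cim:Terminal.ConnectivityNode rdf:resource="#A_CN1"/>
</cim:Terminal>
</rdf:RDF>"""

# Prefix-to-URI mapping; find() accepts it for tag lookups.
NS = {
    "cim": "http://iec.ch/TC57/2008/CIM-schema-cim13#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def qname(prefix, local):
    """Expand prefix:local into the {uri}local form used for attribute keys."""
    return "{%s}%s" % (NS[prefix], local)


root = ET.fromstring(XML)

# Tags can use the prefix form when the mapping is passed in.
terminal = root.find("cim:Terminal", NS)
node = terminal.find("cim:Terminal.ConnectivityNode", NS)

# Attribute keys still need the expanded {uri}name form.
terminal_id = terminal.get(qname("rdf", "ID"))
node_resource = node.get(qname("rdf", "resource"))

print(terminal_id, node_resource)
```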

Converting a python set into a XML element

I'm still new to making Python apps... But I'm willing to learn...
I want to make hash tags (converted from a string that was generated) and turn them into an element for an XML etree.
e.g.
from the string (object rawData)
rawData = "I'm soooo sleepy - feeling bored #journal #asleep"
I already got code from here to convert these hashtags (#journal and #asleep) into a python set:
hashTags = extract_hash_tags(rawData)
Result would be this (Now I already have a set of tags):
hashTags = set(['journal', 'asleep'])
The problem now is to make that set into:
<array>
<string>journal</string>
<string>asleep</string>
</array>
I know that I'm gonna make a loop for this that'll make individual parts of the set into elements.
I'm still rusty at loops though.
I'm using lxml because I need to prettify the xml. It gets the job done though.
EDIT: The answer for the stackoverflow question used a set not an array. Sorry 'bout that mistake...
With lxml:
from lxml import etree
# Code to make the hashTags set...
array = etree.Element('array')
# Note: array can also be created with etree.SubElement(parent, 'array')
for tag in hashTags:
    string = etree.SubElement(array, 'string')
    string.text = tag
# tostring() returns bytes; decode for readable output on Python 3
print(etree.tostring(array, pretty_print=True).decode())
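If lxml is not available, roughly the same result can be had from the standard library. In this sketch, `extract_hash_tags` is a hypothetical stand-in for the helper mentioned in the question, and `ET.indent` (Python 3.9+) takes the place of lxml's pretty_print.

```python
import re
import xml.etree.ElementTree as ET


def extract_hash_tags(text):
    # Hypothetical stand-in for the helper referred to in the question.
    return set(re.findall(r"#(\w+)", text))


raw_data = "I'm soooo sleepy - feeling bored #journal #asleep"
hash_tags = extract_hash_tags(raw_data)

array = ET.Element("array")
for tag in sorted(hash_tags):  # sets are unordered; sort for stable output
    ET.SubElement(array, "string").text = tag

if hasattr(ET, "indent"):  # ET.indent only exists on Python 3.9 and later
    ET.indent(array)

xml_text = ET.tostring(array, encoding="unicode")
print(xml_text)
```

Sorting is only there to make the element order repeatable; drop it if the order of the tags does not matter.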

Extracting XML Element and Attribute Data with Python 3

I'm looking to extract the values of a particular attribute from a particular element, using Python 3.
An example of the element in question (Atom3d):
<Atom3d ID="18" Mapping="43" Parent="2" Name="C7"
XYZ="0.0148299997672439,0.283699989318848,1.0291999578476" Connections="33,39"
TemperatureType="Isotropic" IsotropicTemperature="0.0677"
AnisotropicTemperature="0,0,0,0,0,0,0,0,0" Occupancy="0.708" Components="C"/>
I need to extract the XYZ value, and further need to take this value and separate the comma-separated numbers within it. I need to use these numbers in another input file of a different format, so I was thinking to assign them to three separate variables and take it from there.
I'm very inexperienced with Python, and completely so when it comes to XML. I'm not sure of which libraries I would need to use, if such libraries even exist and how to use them if they do.
http://docs.python.org/3/library/xml.etree.elementtree.html
>>> from xml.etree import ElementTree as ET
>>> elem = ET.fromstring('''<Atom3d ID="18" Mapping="43" Parent="2" Name="C7"
... XYZ="0.0148299997672439,0.283699989318848,1.0291999578476" Connections="33,39"
... TemperatureType="Isotropic" IsotropicTemperature="0.0677"
... AnisotropicTemperature="0,0,0,0,0,0,0,0,0" Occupancy="0.708" Components="C"/>
... ''')
get attribute using get('attribute-name'):
>>> elem.get('XYZ')
'0.0148299997672439,0.283699989318848,1.0291999578476'
split string by ',':
>>> elem.get('XYZ').split(',')
['0.0148299997672439', '0.283699989318848', '1.0291999578476']
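To end up with three separate numeric variables, the split strings can be converted to floats and unpacked. A small sketch, using a shortened copy of the element and assuming the attribute always holds exactly three comma-separated numbers:

```python
from xml.etree import ElementTree as ET

# Shortened copy of the Atom3d element from the question.
elem = ET.fromstring(
    '<Atom3d ID="18" Name="C7" '
    'XYZ="0.0148299997672439,0.283699989318848,1.0291999578476"/>'
)

# Convert each comma-separated piece to float and unpack into x, y, z.
x, y, z = (float(value) for value in elem.get("XYZ").split(","))

print(x, y, z)
```

The unpacking raises a ValueError if the attribute holds more or fewer than three values, which is a cheap sanity check on the input.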
