Parsing XML with namespace into dictionary

I'm having a hard time following the xml.etree.ElementTree documentation with regard to parsing an XML document with a namespace and nested tags.
To begin, the xml tree I am trying to parse looks like:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ROOT-MAIN xmlns="http://fakeurl.com/page">
<Alarm> <!-- I don't care about these types of objects -->
<Node>
<location>Texas></location>
<name>John</name>
</Node>
</Alarm>
<Alarm> <!-- I care about these types of objects -->
<CreateTime>01/01/2011</CreateTime>
<Story>
<Node>
<Name>Ethan</Name>
<Address category="residential">
<address>1421 Morning SE</address>
</Address>
</Node>
</Story>
<Build>
<Action category="build_value_1">Build was successful</Action>
</Build>
<OtherData type="string" meaning="favoriteTVShow">Purple</OtherData>
<OtherData type="string" meaning="favoriteColor">Seinfeld</OtherData>
</Alarm>
</ROOT-MAIN>
I am trying to build a list of dictionaries whose structure mirrors the second <Alarm> object. When parsing this XML file, I do the following:
import xml.etree.ElementTree as ET

tree = ET.parse('data/' + filename)
root = tree.getroot()
namespace = '{http://fakeurl.com/page}'
for alarm in tree.findall(namespace + 'Alarm'):
    for elem in alarm.iter():
        try:
            creation_time = elem.find(namespace + 'CreateTime')
            for story in elem.findall(namespace + 'Story'):
                for node in story.findall(namespace + 'Node'):
                    for Address in node.findall(namespace + 'Address'):
                        address = Address.find(namespace + 'address').text
            for build in elem.findall(namespace + 'Build'):
                category = build.find(namespace + 'Action').attrib
                action = build.find(namespace + 'Action').text
            for otherdata in elem.findall(namespace + 'OtherData'):
                # not sure how to get the 'meaning' attribute value as well as the text value for these <OtherData> tags
                pass
        except:
            pass
Right now I'm just trying to get values for:
<address>
<Action> (attribute value and text value)
<OtherData> (attribute value and text value)
I'm sort of able to do this with nested for loops, but I was hoping for a cleaner XPath solution, which I haven't figured out how to write with a namespace.
Any suggestions would be much appreciated.

Here is one way (it collects a subset of the elements you mentioned; add more code to collect the rest):
import xml.etree.ElementTree as ET
import re
xmlstring = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root xmlns="http://fakeurl.com/page">
<Alarm>
<Node>
<location>Texas></location>
<name>John</name>
</Node>
</Alarm>
<Alarm>
<CreateTime>01/01/2011</CreateTime>
<Story>
<Node>
<Name>Ethan</Name>
<Address category="residential">
<address>1421 Morning SE</address>
</Address>
</Node>
</Story>
<Build>
<Action category="build_value_1">Build was successful</Action>
</Build>
<OtherData type="string" meaning="favoriteTVShow">Purple</OtherData>
<OtherData type="string" meaning="favoriteColor">Seinfeld</OtherData>
</Alarm>
</root>'''
# Strip the default namespace declaration so plain tag names can be used below.
xmlstring = re.sub(' xmlns="[^"]+"', '', xmlstring, count=1)
root = ET.fromstring(xmlstring)
alarms = root.findall('Alarm')
alarms_list = []
for alarm in alarms:
    create_time = alarm.find('CreateTime')
    # Only alarms that carry a CreateTime are of interest.
    if create_time is not None:
        entry = {'create_time': create_time.text}
        alarms_list.append(entry)
        actions = alarm.findall('Build/Action')
        if actions:
            entry['builds'] = []
            for action in actions:
                entry['builds'].append({'category': action.attrib['category'], 'status': action.text})
print(alarms_list)
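
As an alternative to stripping the namespace with a regular expression, ElementTree accepts a prefix-to-URI mapping as the `namespaces` argument of `find`, `findall`, and `findtext`. The following is a minimal sketch under that assumption; the prefix `p`, the function name `parse_alarms`, and the dictionary keys are illustrative choices, not part of the original answer, and the sample XML is a cleaned-up copy of the one in the question.

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the question's document (sample data only).
XML = """<ROOT-MAIN xmlns="http://fakeurl.com/page">
  <Alarm>
    <Node><location>Texas</location><name>John</name></Node>
  </Alarm>
  <Alarm>
    <CreateTime>01/01/2011</CreateTime>
    <Story>
      <Node>
        <Name>Ethan</Name>
        <Address category="residential">
          <address>1421 Morning SE</address>
        </Address>
      </Node>
    </Story>
    <Build>
      <Action category="build_value_1">Build was successful</Action>
    </Build>
    <OtherData type="string" meaning="favoriteTVShow">Purple</OtherData>
    <OtherData type="string" meaning="favoriteColor">Seinfeld</OtherData>
  </Alarm>
</ROOT-MAIN>"""

# The prefix "p" is arbitrary; only the URI has to match the document.
NS = {"p": "http://fakeurl.com/page"}


def parse_alarms(xml_text):
    """Return one dictionary per <Alarm> that has a <CreateTime> child."""
    root = ET.fromstring(xml_text)
    alarms = []
    for alarm in root.findall("p:Alarm", NS):
        create_time = alarm.find("p:CreateTime", NS)
        if create_time is None:
            continue  # skip the alarms we do not care about
        alarms.append({
            "create_time": create_time.text,
            # findtext returns None if the path does not match.
            "address": alarm.findtext(
                "p:Story/p:Node/p:Address/p:address", namespaces=NS),
            # Attribute value and text value of every <Action>.
            "actions": [
                {"category": action.get("category"), "text": action.text}
                for action in alarm.findall("p:Build/p:Action", NS)
            ],
            # Map each <OtherData> "meaning" attribute to its text.
            "other_data": {
                other.get("meaning"): other.text
                for other in alarm.findall("p:OtherData", NS)
            },
        })
    return alarms


result = parse_alarms(XML)
print(result)
```

This keeps the document untouched and answers the open point in the question: `element.get("meaning")` reads one attribute, while `element.text` reads the text.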

Related

xml find is always None

How can I use .find or .get to get the values I want?
I have looked at the examples, but I don't know what I did wrong.
I always get None.
<?xml version="1.0" encoding="UTF-8"?>
<config version="1.0" xmlns="http://www.ipc.com/ver10">
<types>
<bitRateType>
<enum>VBR</enum>
<enum>CBR</enum>
</bitRateType>
<quality>
<enum>lowest</enum>
<enum>lower</enum>
</quality>
</types>
<streams type="list" count="2">
<item id="0">
<name type="string" maxLen="32"><![CDATA[rtsp://192.168.0.175:554/chID=2&streamType=main]]></name>
<resolution>2592x1520</resolution>
</item>
<item id="1">
<name type="string" maxLen="32"><![CDATA[rtsp://192.168.0.175:554/chID=2&streamType=sub1]]></name>
<resolution>704x480</resolution>
</item>
</streams>
</config>
My code:
tree = ET.fromstring(res.text)
types = tree.find('types')
streams = tree.find('streams')
item = streams.findall('item')[0]
print(item.get('name'))
print(item.get('resolution'))
print(types, streams, item)

You can try this approach, which indexes into the tree by position:
import xml.etree.ElementTree as ET
new_tree = ET.parse('test.xml')
new_root = new_tree.getroot()
types = new_root[0]
bitRateType = types[0]
quality = types[1]
streams = new_root[1]
item0_name = streams[0][0]
item0_resolution = streams[0][1]
item1_name = streams[1][0]
item1_resolution = streams[1][1]
print(item0_name.attrib)
print(item1_resolution.text)
var.attrib holds all attributes of an element, for example:
print(item0_name.attrib)
{'type': 'string', 'maxLen': '32'}
var.text holds the text of an element, for example:
print(item1_resolution.text)
704x480
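
The likely reason `find` returned None in the question is the default namespace declared on `<config>`: every tag is really `{http://www.ipc.com/ver10}tag`. In addition, `item.get('name')` looks up an attribute, while `name` and `resolution` are child elements. Here is a minimal sketch of a namespace-aware lookup; the prefix `ipc` and the variable names are illustrative assumptions, and the XML is a shortened copy of the one in the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's document (sample data only).
CONFIG = """<config version="1.0" xmlns="http://www.ipc.com/ver10">
  <streams type="list" count="2">
    <item id="0">
      <name type="string" maxLen="32"><![CDATA[rtsp://192.168.0.175:554/chID=2&streamType=main]]></name>
      <resolution>2592x1520</resolution>
    </item>
    <item id="1">
      <name type="string" maxLen="32"><![CDATA[rtsp://192.168.0.175:554/chID=2&streamType=sub1]]></name>
      <resolution>704x480</resolution>
    </item>
  </streams>
</config>"""

# Arbitrary prefix mapped to the document's default namespace URI.
ns = {"ipc": "http://www.ipc.com/ver10"}
root = ET.fromstring(CONFIG)

# Without the namespace the lookup fails, as in the question.
unqualified = root.find("streams")

# With the namespace mapping, child elements are found by path.
streams = [
    {
        "id": item.get("id"),  # "id" is an attribute, so get() works
        "name": item.findtext("ipc:name", namespaces=ns),
        "resolution": item.findtext("ipc:resolution", namespaces=ns),
    }
    for item in root.findall("ipc:streams/ipc:item", ns)
]
print(unqualified)
print(streams)
```

Compared with positional indexing, this does not break if the order of the child elements changes.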

modify node and extract data from xml file in python

I am new to Python and am looking for advice on the best approach to the following task.
I have an XML file that looks like this:
<component xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009 http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009/index.xsd">
<memoryMaps>
<memoryMap>
<name>name</name>
<description>description</description>
<peripheral>
<name>periph</name>
<description>description</description>
<baseAddress>0x0</baseAddress>
<range>0x8</range>
<width>32</width>
<register>
<name>reg1</name>
<displayName>reg1</displayName>
<addressOffset>0x0</addressOffset>
<size>32</size>
<access>read-write</access>
<reset>
<value>0x00000002</value>
<mask>0xFFFFFFFF</mask>
</reset>
<field>
</field>
</register>
</peripheral>
</memoryMap>
</memoryMaps>
</component>
I want to replace the "reset" node with two separate nodes, "resetValue" and "resetMask", which keep the data from "value" and "mask" respectively, as follows:
........
<access>read-write</access>
<resetValue>0x00000002</resetValue>
<resetMask>0xFFFFFFFF</resetMask>
<field>
.............
I managed to parse my XML file successfully, but I don't know how to start on this modification.
Thank you for any guidance.

Here is code that creates two sub-elements under 'register' and removes the unneeded 'reset' element:
import xml.etree.ElementTree as ET
xml = '''<component xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009 http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009/index.xsd">
<memoryMaps>
<memoryMap>
<name>name</name>
<description>description</description>
<peripheral>
<name>periph</name>
<description>description</description>
<baseAddress>0x0</baseAddress>
<range>0x8</range>
<width>32</width>
<register>
<name>reg1</name>
<displayName>reg1</displayName>
<addressOffset>0x0</addressOffset>
<size>32</size>
<access>read-write</access>
<reset>
<value>0x00000002</value>
<mask>0xFFFFFFFF</mask>
</reset>
<field>
</field>
</register>
</peripheral>
</memoryMap>
</memoryMaps>
</component>'''
root = ET.fromstring(xml)
register = root.find('.//register')
value = register.find('.//reset/value').text
mask = register.find('.//reset/mask').text
v = ET.SubElement(register, 'resetValue')
v.text = value
m = ET.SubElement(register, 'resetMask')
m.text = mask
register.remove(register.find('reset'))
ET.dump(root)
output
<?xml version="1.0" encoding="UTF-8"?>
<component xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009 http://www.spiritconsortium.org/XMLSchema/SPIRIT/1685-2009/index.xsd">
<memoryMaps>
<memoryMap>
<name>name</name>
<description>description</description>
<peripheral>
<name>periph</name>
<description>description</description>
<baseAddress>0x0</baseAddress>
<range>0x8</range>
<width>32</width>
<register>
<name>reg1</name>
<displayName>reg1</displayName>
<addressOffset>0x0</addressOffset>
<size>32</size>
<access>read-write</access>
<field />
<resetValue>0x00000002</resetValue>
<resetMask>0xFFFFFFFF</resetMask>
</register>
</peripheral>
</memoryMap>
</memoryMaps>
</component>
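
Note that `ET.SubElement` appends at the end, so in the output above the new elements land after `<field />`, whereas the question shows them directly after `<access>`. If the position matters, one option is `Element.insert` at the index of the old `reset` element. The following is a small sketch under that assumption, using a shortened sample document; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened sample with the same register layout as in the question.
XML = """<component>
  <register>
    <name>reg1</name>
    <access>read-write</access>
    <reset>
      <value>0x00000002</value>
      <mask>0xFFFFFFFF</mask>
    </reset>
    <field />
  </register>
</component>"""

root = ET.fromstring(XML)

# list() takes a snapshot so the tree can be modified while looping.
for register in list(root.iter("register")):
    reset = register.find("reset")
    if reset is None:
        continue
    position = list(register).index(reset)  # where <reset> currently sits

    reset_value = ET.Element("resetValue")
    reset_value.text = reset.findtext("value")
    reset_mask = ET.Element("resetMask")
    reset_mask.text = reset.findtext("mask")
    # Reuse the old whitespace so the layout stays roughly the same.
    reset_value.tail = reset.tail
    reset_mask.tail = reset.tail

    register.remove(reset)
    register.insert(position, reset_value)
    register.insert(position + 1, reset_mask)

child_tags = [child.tag for child in root.find("register")]
print(child_tags)
```

Looping over every `register` also covers files with more than one register, which `root.find('.//register')` alone would not.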

Filling values under child through element tree

I have an XML file and a text file. I wrote a script that parses the text file into a dictionary of keys and values. Now I need to go into the XML file and fill in the values for the children of the testgroup element; the values include the testcase title, ident, and so on.
Also, based on the length of aa in the script, I need to create that many children under testgroup. I have minimal exposure to ElementTree, so any recommendation would be very helpful.
xml = """<?xml version="1.0" encoding="UTF-8"?>
<testmodule title="hello" version="version 2">
<description> 'world' </description>
<engineer>
<info>
<name>Test </name>
<description> 'test' </description>
</info>
</engineer>
<preparation>
<initialize title="Set">
</initialize>
</preparation>
<variants>
<variant name="A">Test </variant>
<variant name="B">test</variant>
<variant name="C">Test test</variant>
</variants>
<testgroup title="Testing" ident="Testing" >
<testcase title="Check" ident= "3_1" name="Number" variants="A">
<param name="Testcase" type="string">Checking of Correct SW and Part identifiers </param>
<param name="TestcaseRequirements" type="string"></param>
<param name="Test" type="string">TS_Automation=Manual;TS_Method=Bench_Test;TS_Priority=1;TS_Tested_By=rjrjjn;TS_Written_By=SUN;TS_Review_done=No;TS_Regression=No;</param>
</testcase>
"""
import re
import xml.etree.ElementTree as ET

# Read the whole text file into one string.
with open('C:\\Users\\rjrn8w\\Desktop\\Test.txt', "r") as f:
    ee = f.read()

aa = re.findall(r'<TC_HEADER_START>([\s\S]*)</TC_HEADER_END>', ee)
for j in aa:
    # Build a dictionary from the key=value pairs of each header block.
    ak = dict(re.findall(r'(\S+)=(.*)', j.strip()))
    print(ak)

tree = ET.parse('C:\\Users\\rjrn8w\\Documents\\My Received Files\\new.xml')
root = tree.getroot()
for child in root:
    if child.tag == 'testgroup':
        for element in child:
            for elem in element:
                import pdb; pdb.set_trace()  # debugging breakpoint
                print(elem.tag)
ak={'TS_Regression': 'No', 'ident': '1 ', 'TestcaseRequirements': '36978', 'name': '"T01">', 'title': '"DHCP " ', 'TS_Review_done': 'Yes;', 'TestcaseTestType': 'Test', 'TS_Priority': '1;', 'TS_Tested_By': 'qz9ghv;', 'TS_Techniques': 'Full Testing;', 'variants': '"A C" ', 'StakeholderRequirements': '1236\t\t\t\t', 'TS_Implemented': 'Yes;', 'TS_Automation': 'Automated;', 'TestcaseDescription': ' This test verifies DHCP discovery is halted after tester is connected'}

You can read the Python documentation; it covers how to create an Element, add attributes, set values, and append the new element to an existing element as a child.
child=xml.etree.ElementTree.Element(tag, attrib={}, **extra)
existedelement.append(child)
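
To make this concrete, here is a minimal sketch of building one `testcase` element per parsed dictionary and attaching it under `testgroup`. The split between attribute keys and `param` children, the helper name `add_testcase`, and the cleaned-up sample dictionary are all assumptions made for illustration; the real mapping depends on what the target file expects.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already cleaned-up dictionary (quotes and trailing
# characters from the raw text file are assumed to be stripped).
parsed = {
    "title": "DHCP",
    "ident": "1",
    "name": "T01",
    "variants": "A C",
    "TestcaseRequirements": "36978",
    "TestcaseDescription": "This test verifies DHCP discovery is halted "
                           "after tester is connected",
}

# Assumption: these keys become attributes of <testcase>; every other
# key becomes a <param name="..."> child.
ATTRIBUTE_KEYS = ("title", "ident", "name", "variants")


def add_testcase(testgroup, values):
    """Append a <testcase> built from `values` to `testgroup`."""
    attrs = {key: values[key] for key in ATTRIBUTE_KEYS if key in values}
    testcase = ET.SubElement(testgroup, "testcase", attrs)
    for key, value in values.items():
        if key in ATTRIBUTE_KEYS:
            continue
        param = ET.SubElement(testcase, "param", {"name": key, "type": "string"})
        param.text = value
    return testcase


# Tiny stand-in for the real file; in practice use ET.parse(path).getroot().
root = ET.fromstring(
    '<testmodule title="hello"><testgroup title="Testing" ident="Testing" /></testmodule>'
)
testgroup = root.find("testgroup")

# One <testcase> per parsed dictionary (here a single one).
for values in [parsed]:
    add_testcase(testgroup, values)

print(ET.tostring(root, encoding="unicode"))
```

With the real data, the list passed to the final loop would hold one dictionary per match in `aa`, which gives as many `testcase` children as there are header blocks.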

Python xpath with xml.etree.ElementTree: multiple conditions

I am trying to count from an XML file all the XML nodes of the form:
....
<node id="0">
<data key="d0">Attribute</data>
....
</node>
....
For example a file like this:
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph edgedefault="directed">
<node id="0">
<data key="d0">Attribute</data>
<data key="d1">Foo</data>
</node>
What I have tried is:
x = graphml_root.findall(".//"+nsfy("node")+"/["+nsfy("data")+"='Attribute']")
But this only says that the text of the data element has to be "Attribute". I want to make sure that "Attribute" is the text of the data element with key="d0", so I tried this:
x = graphml_root.findall(".//"+nsfy("node")+"/"+nsfy("data")+"[@key='d0']"+"[""'Attribute']")
But it returns an empty list, so I am missing something.
NOTE:
I had to write a little lambda to avoid copying the XML namespace all the time:
nsfy = lambda x : '{http://graphml.graphdrawing.org/xmlns}'+x #to be able to read namespace tags
Thanks.

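Try something like this (note the @ for the attribute test, and that the tags need the namespace, here via nsfy from the question):
nodes = []
containers = graphml_root.findall('.//' + nsfy('node') + '/' + nsfy('data') + '[@key="d0"]')
for container in containers:
    if container.text == "Attribute":
        nodes.append(container)
count = len(nodes)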

from lxml import etree
f= '''
<node id="0">
<data key="d0" t="32">Attribute</data>
<data key="d1">Foo</data>
</node>'''
root = etree.XML(f)
data = root.xpath('.//*[@key="d0" and text()="Attribute"]')
print(data)
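lxml provides the xpath method, which supports full XPath 1.0, and that's all it takes.
UPDATE
According to the xml.etree documentation, ElementTree does not support this syntax; see the documentation section on the XPath subset that xml.etree supports.
So all you can do is find .//*[@key="d0"] and then test whether its text equals "Attribute".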
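
For completeness, here is a standard-library sketch of both routes. The first filters in Python, as described above. The second relies on the `[.='text']` predicate, which the ElementTree documentation lists as added in Python 3.7, chained after the attribute predicate; treat it as an option only if you can assume that version. The prefix `g` and the second sample node are illustrative additions.

```python
import xml.etree.ElementTree as ET

# Sample based on the question, plus a second node that must NOT match:
# its "Attribute" text sits under key="d1", not key="d0".
GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="directed">
    <node id="0">
      <data key="d0">Attribute</data>
      <data key="d1">Foo</data>
    </node>
    <node id="1">
      <data key="d0">Other</data>
      <data key="d1">Attribute</data>
    </node>
  </graph>
</graphml>"""

ns = {"g": "http://graphml.graphdrawing.org/xmlns"}
root = ET.fromstring(GRAPHML)

# Route 1: select by attribute with the path, compare the text in Python.
matching_nodes = [
    node
    for node in root.findall(".//g:node", ns)
    if any(
        data.text == "Attribute"
        for data in node.findall("g:data[@key='d0']", ns)
    )
]

# Route 2 (assumes Python 3.7+): chain an attribute predicate and a
# text predicate on the same step.
chained = root.findall(".//g:node/g:data[@key='d0'][.='Attribute']", ns)

print(len(matching_nodes), len(chained))
```

Route 1 returns the `node` elements themselves, while route 2 returns the matching `data` elements; for counting, either works as long as each node has at most one `data` with key "d0".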

how can I select all descendants of a certain element with ElementTree in Python 3.3?

This is the sample data.
input.xml
<root>
<entry id="1">
<headword>go</headword>
<example>I <hw>go</hw> to school.</example>
</entry>
</root>
I'd like to put the <example> node and its descendants into a new <examplegrp> node. That is,
output.xml
<root>
<entry id="1">
<headword>go</headword>
<examplegrp>
<example>I <hw>go</hw> to school.</example>
</examplegrp>
</entry>
</root>
My poor and incomplete script is:
import codecs
import xml.etree.ElementTree as ET
fin = codecs.open(r'input.xml', 'rb', encoding='utf-8')
data = ET.parse(fin)
root = data.getroot()
example = root.find('.//example')
for elem in example.iter():
---and then I don't know what to do---

Here's an example of how it can be done:
text = """
<root>
<entry id="1">
<headword>go</headword>
<example>I <hw>go</hw> to school.</example>
</entry>
</root>
"""
import io
import lxml.etree

data = lxml.etree.parse(io.StringIO(text))
root = data.getroot()
# For every entry that contains an example, wrap its examples in a new group.
for entry in root.xpath('//example/ancestor::entry[1]'):
    examplegrp = lxml.etree.SubElement(entry, "examplegrp")
    nodes = [node for node in entry.xpath('./example')]
    for node in nodes:
        entry.remove(node)
        examplegrp.append(node)
# tostring returns bytes, so decode before printing.
print(lxml.etree.tostring(root, pretty_print=True).decode())
which will output:
<root>
<entry id="1">
<headword>go</headword>
<examplegrp><example>I <hw>go</hw> to school.</example>
</examplegrp></entry>
</root>
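
Since the question asks about ElementTree specifically, here is a sketch of the same idea with only the standard library. It assumes the wrapper should take the position of the original `<example>` inside `<entry>`; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

TEXT = """<root>
<entry id="1">
<headword>go</headword>
<example>I <hw>go</hw> to school.</example>
</entry>
</root>"""

root = ET.fromstring(TEXT)

# list() takes a snapshot so the tree can be modified while looping.
for entry in list(root.iter("entry")):
    for example in entry.findall("example"):
        position = list(entry).index(example)  # slot of the old <example>
        group = ET.Element("examplegrp")
        # The whitespace after </example> now belongs after </examplegrp>.
        group.tail = example.tail
        example.tail = None
        entry.remove(example)
        group.append(example)  # descendants such as <hw> move along with it
        entry.insert(position, group)

result = ET.tostring(root, encoding="unicode")
print(result)
```

There is no need to walk the descendants with `iter()`: moving the `<example>` element moves its whole subtree.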

http://docs.python.org/3/library/xml.dom.html?highlight=xml#node-objects
http://docs.python.org/3/library/xml.dom.html?highlight=xml#document-objects
You probably want to follow the pattern of creating an element from the Document and appending each result to it:
group = document.createElement(tagName)
for found in founds:
    group.appendChild(found)
Or something like this.
