xml missing element in python - python

I'm using the DOM parser (minidom) in Python 2.7.2. The goal is to extract the .db file and use it on SQL Server. I currently have no problem with the sqlite3 library. I have read the similar questions and answers about how to handle a missing element while parsing XML files, but I still couldn't figure out the solution. The XML has 15000+ elements. Here is a basic sample of the XML:
<topo>
<vlancard>
<id>4545</id>
<nodeValue>21</nodeValue>
<vlanName>voice</vlanName>
</vlancard>
<vlancard>
<id>1234</id>
<nodeValue>42</nodeValue>
<vlanName>camera</vlanName>
</vlancard>
<vlancard>
<id>9876</id>
<nodeValue>84</nodeValue>
</vlancard>
</topo>
Like the third element, several elements do not have the vlanName node. That makes the element counts inconsistent, i.e.:
from xml.dom import minidom
xmldoc = minidom.parse(r'c:\vlan.xml')  # raw string, so \v is not read as an escape sequence
vlId = xmldoc.getElementsByTagName('id')
vlValue = xmldoc.getElementsByTagName('nodeValue')
vlName = xmldoc.getElementsByTagName('vlanName')
After running the module:
IndexError: list index out of range
>>> len(vlId)
16163
>>> len(vlName)
16155
Because of this, the elements get out of order: when filling the table, the parser skips the missing elements, so values end up paired with the wrong rows. I use a simple while loop to insert the values into the table.
x = 0
while x < len(vlId):
    c.execute('''insert into vlan ('id','nodeValue','vlanName') values ('%s','%s','%s') ''' % (vlId[x].firstChild.nodeValue, vlValue[x].firstChild.nodeValue, vlName[x].firstChild.nodeValue))
    x = x + 1
How else can I do this? Any help will be appreciated.
Yusuf

Instead of parsing the entire XML and then inserting, process each vlancard in turn: retrieve its id, nodeValue, and vlanName, and then insert them into the DB. That way a missing vlanName only affects its own row.
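A minimal sketch of that approach, assuming the sample XML from the question and an in-memory SQLite database. The helper name child_text and the TEXT column types are illustrative assumptions, not part of the original code. A missing vlanName is stored as NULL, and the insert uses ? placeholders instead of string formatting.

```python
import sqlite3
from xml.dom import minidom

# Sample data from the question; the third vlancard has no <vlanName>.
XML = """<topo>
<vlancard><id>4545</id><nodeValue>21</nodeValue><vlanName>voice</vlanName></vlancard>
<vlancard><id>1234</id><nodeValue>42</nodeValue><vlanName>camera</vlanName></vlancard>
<vlancard><id>9876</id><nodeValue>84</nodeValue></vlancard>
</topo>"""


def child_text(card, tag):
    """Return the text of the first <tag> inside card, or None if it is missing or empty."""
    nodes = card.getElementsByTagName(tag)
    if nodes and nodes[0].firstChild is not None:
        return nodes[0].firstChild.nodeValue
    return None


xmldoc = minidom.parseString(XML)

# Build one row per <vlancard>, so the three values always belong together.
rows = [
    (child_text(card, "id"), child_text(card, "nodeValue"), child_text(card, "vlanName"))
    for card in xmldoc.getElementsByTagName("vlancard")
]

conn = sqlite3.connect(":memory:")  # assumption: in-memory DB for the demo
conn.execute("CREATE TABLE vlan (id TEXT, nodeValue TEXT, vlanName TEXT)")
conn.executemany("INSERT INTO vlan (id, nodeValue, vlanName) VALUES (?, ?, ?)", rows)
conn.commit()

stored = conn.execute("SELECT id, nodeValue, vlanName FROM vlan").fetchall()
```

For the real file, minidom.parse with the file path would replace minidom.parseString, and the connection would point at the actual .db file.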

Related

Getting values from an XML file that has deep keys and values

I have a very large XML file produced by an application; part of its tree is as below:
There are several items under 'item', from 0 to 7. These are always named with numbers, which can range from 0 to any number.
Each of these items has multiple children, all with the same structure as in the tree above. Only the item number (0 to 7) varies; the rest of the structure stays the same.
Within that structure I have a value <bbmds_questiontype>, which can be Multiple Choice, Matching, or Essay.
What I need is a list of the values of <mat_formattedtext>, i.e. the output is supposed to be:
<0>
<bbmds_questiontype>Multiple Choice</bbmds_questiontype>
<mat_formattedtext>This is first question </mat_formattedtext></0>
<1>
<bbmds_questiontype>Multiple Choice</bbmds_questiontype>
<mat_formattedtext>This is second question </mat_formattedtext> </1>
<2>
<bbmds_questiontype>Essay</bbmds_questiontype>
<mat_formattedtext>This is first question </mat_formattedtext> </2>
....
I have tried several solutions, including the xml tree API and xmltodict, but they all get complicated because filters have to be applied across different branches of children.
import xmltodict
with open("C:/Users/SS/Desktop/moodlexml/00001_questions.dat") as fd:
    doc = xmltodict.parse(fd.read())
shortened = doc['questestinterop']['assessment']['section']['item']  # == u'an attribute'
Any advice will be appreciated to proceed further.
Have you tried parsing with bs4 (BeautifulSoup)? It's simple.
Check it out:
https://linuxhint.com/parse_xml_python_beautifulsoup/
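As an alternative to BeautifulSoup, here is a minimal sketch using only the standard library's ElementTree. The sample XML is hypothetical and heavily simplified, because the real nesting is not shown in the question; the only assumptions are that each question lives under an <item> element and that <bbmds_questiontype> and <mat_formattedtext> appear somewhere below it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the real file (actual nesting unknown).
XML = """<questestinterop><assessment><section>
  <item>
    <itemmetadata><bbmds_questiontype>Multiple Choice</bbmds_questiontype></itemmetadata>
    <presentation><flow><mat_formattedtext>This is first question </mat_formattedtext></flow></presentation>
  </item>
  <item>
    <itemmetadata><bbmds_questiontype>Essay</bbmds_questiontype></itemmetadata>
    <presentation><flow><mat_formattedtext>This is second question </mat_formattedtext></flow></presentation>
  </item>
</section></assessment></questestinterop>"""

root = ET.fromstring(XML)

questions = []
for item in root.iter("item"):
    # ".//tag" searches at any depth below the item, so the exact branch does not matter.
    qtype = item.findtext(".//bbmds_questiontype")
    text = item.findtext(".//mat_formattedtext")
    questions.append((qtype, text.strip() if text is not None else None))

for index, (qtype, text) in enumerate(questions):
    print(index, qtype, text)
```

Searching by tag name at any depth avoids chaining dictionary keys through every level, which is where the xmltodict approach tends to get complicated.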

Find for multiple tags' values with lxml

I am using lxml to parse an XML like this sample one:
<compounddef xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="d2/db7/class_foo" kind="class">
<compoundname>FooClass</compoundname>
<sectiondef kind="public-type">
<memberdef kind="typedef" id="d2/db7/class_bar">
<type><ref refid="d3/d73/struct_foo" kindref="compound">StructFoo</ref></type>
<definition>StructFooDefinition</definition>
</memberdef>
</sectiondef>
</compounddef>
I'm trying to get the element whose refid is "d3/d73/struct_foo" and whose <definition> contains the text "Foo".
There could be many refids with that value and many definitions containing Foo, but only one has this combination.
I am able to first find all the elements with that refid and then filter this list by checking which of them contains "Foo" in the <definition>, but since I'm working with a really big XML file (~1 GB) and the application is time sensitive, I wanted to avoid this.
I tried combining the various etree paths using the keyword 'and' or '//precede:...', but without success.
My last try was:
self.dox_tree_root_.xpath(".//compounddef[@kind = 'class']//memberdef[@kind='typedef'][/type/ref[@refid='%s'] and contains(definition, 'name')]" % (independent_type_refid, name)))
but it is giving me an error.
Is there a way to combine the two filters inside one command?
You can use XPath:
//a[.//ref[@refid="12345"] and contains(c, "Good")]
If I understand you correctly, this should get you close enough:
.//compounddef[@kind = 'class']//memberdef[@kind='typedef'][./type/ref[@refid='d3/d73/struct_foo']][contains(.//definition, 'Foo')]//definition
Output:
StructFooDefinition
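For comparison, the same two conditions can be checked in a single pass with the standard library's ElementTree, which has no contains() function, so the text test is done in Python. This is only a sketch and not a replacement for the lxml XPath above: the <doxygen> wrapper, the second memberdef, and the function name find_members are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# The question's sample, wrapped in a hypothetical <doxygen> root, plus one
# extra memberdef that has the right refid but no "Foo" in its definition.
XML = """<doxygen>
<compounddef id="d2/db7/class_foo" kind="class">
  <compoundname>FooClass</compoundname>
  <sectiondef kind="public-type">
    <memberdef kind="typedef" id="d2/db7/class_bar">
      <type><ref refid="d3/d73/struct_foo" kindref="compound">StructFoo</ref></type>
      <definition>StructFooDefinition</definition>
    </memberdef>
    <memberdef kind="typedef" id="d2/db7/class_baz">
      <type><ref refid="d3/d73/struct_foo" kindref="compound">StructFoo</ref></type>
      <definition>OtherDefinition</definition>
    </memberdef>
  </sectiondef>
</compounddef>
</doxygen>"""


def find_members(root, refid, name):
    """Return typedef memberdefs whose type refers to refid and whose definition contains name."""
    matches = []
    for compound in root.iter("compounddef"):
        if compound.get("kind") != "class":
            continue
        for member in compound.iter("memberdef"):
            if member.get("kind") != "typedef":
                continue
            # Both conditions are tested on the same memberdef before it is kept.
            has_ref = any(ref.get("refid") == refid for ref in member.iterfind("./type/ref"))
            definition = member.findtext("definition") or ""
            if has_ref and name in definition:
                matches.append(member)
    return matches


root = ET.fromstring(XML)
found = find_members(root, "d3/d73/struct_foo", "Foo")
print([m.findtext("definition") for m in found])
```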

Parse XML attribute to variable with ElementTree

Hello, I'm writing a bit of code in Maya and running into some issues with ElementTree. I need help reading in this XML, or something similar. The XML is generated based on a selection, so it can change.
<root>
<Locations>
<1 name="CacheLocation">C:\Users\daunish\Desktop</1>
</Locations>
<Objects>
<1 name="Sphere">[u'pSphere1', u'pSphere2']</1>
<2 name="Cube">[u'pCube1']</2>
</Objects>
</root>
I need a way of searching for a particular "name" inside "Locations" and passing the text to a variable.
I also need a way of going through each line inside "Objects" and performing a function on it, as in a for loop.
I'm open to all suggestions; I have been going crazy trying to get this to work. If you think I should format the XML differently, I'm up for that as well. Thanks in advance for the help.
[Note: your XML is not well formed because you can't have tags that start with a number]
Not sure what you've tried but there are many ways to do this, here's one:
Find the first element with name=CacheLocation in Locations:
>>> filename = root.find("./Locations/*[@name='CacheLocation']").text
>>> filename
'C:\\Users\\daunish\\Desktop'
Iterating over all the elements in Objects:
>>> import ast
>>> for target in root.find("./Objects"):
...     for i in ast.literal_eval(target.text):
...         print(target.get('name'), i)
Sphere pSphere1
Sphere pSphere2
Cube pCube1
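Putting the two lookups together as a self-contained sketch. Since tags cannot start with a number, this assumes the numeric tags are replaced by a hypothetical <item> tag; the variable names cache_location and pairs are also illustrative.

```python
import ast
import xml.etree.ElementTree as ET

# Hypothetical well-formed variant of the XML: numeric tags renamed to <item>.
# A raw string keeps the backslashes in the Windows path intact.
XML = r"""<root>
  <Locations>
    <item name="CacheLocation">C:\Users\daunish\Desktop</item>
  </Locations>
  <Objects>
    <item name="Sphere">[u'pSphere1', u'pSphere2']</item>
    <item name="Cube">[u'pCube1']</item>
  </Objects>
</root>"""

root = ET.fromstring(XML)

# Look up one location by its name attribute and keep its text in a variable.
cache_location = root.find("./Locations/*[@name='CacheLocation']").text

# Walk every child of <Objects>; literal_eval turns the stored list text back into a list.
pairs = []
for target in root.find("./Objects"):
    for member in ast.literal_eval(target.text):
        pairs.append((target.get("name"), member))

print(cache_location)
print(pairs)
```

Storing each object name as its own child element, rather than as the printed form of a Python list, would remove the need for ast.literal_eval.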

Extracting XML Element and Attribute Data with Python 3

I'm looking to extract the values of a particular attribute from a particular element, using Python 3.
An example of the element in question (Atom3d):
<Atom3d ID="18" Mapping="43" Parent="2" Name="C7"
XYZ="0.0148299997672439,0.283699989318848,1.0291999578476" Connections="33,39"
TemperatureType="Isotropic" IsotropicTemperature="0.0677"
AnisotropicTemperature="0,0,0,0,0,0,0,0,0" Occupancy="0.708" Components="C"/>
I need to extract the XYZ value, and further need to take this value and separate the comma-separated numbers within it. I need to use these numbers in another input file of a different format, so I was thinking to assign them to three separate variables and take it from there.
I'm very inexperienced with Python, and completely so when it comes to XML. I'm not sure of which libraries I would need to use, if such libraries even exist and how to use them if they do.
http://docs.python.org/3/library/xml.etree.elementtree.html
>>> from xml.etree import ElementTree as ET
>>> elem = ET.fromstring('''<Atom3d ID="18" Mapping="43" Parent="2" Name="C7"
... XYZ="0.0148299997672439,0.283699989318848,1.0291999578476" Connections="33,39"
... TemperatureType="Isotropic" IsotropicTemperature="0.0677"
... AnisotropicTemperature="0,0,0,0,0,0,0,0,0" Occupancy="0.708" Components="C"/>
... ''')
get attribute using get('attribute-name'):
>>> elem.get('XYZ')
'0.0148299997672439,0.283699989318848,1.0291999578476'
split string by ',':
>>> elem.get('XYZ').split(',')
['0.0148299997672439', '0.283699989318848', '1.0291999578476']
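To finish the step the question asks for, the three strings can be converted to floats and unpacked into separate variables. A short sketch, using a shortened copy of the element for brevity; the names x, y, and z are just illustrative:

```python
from xml.etree import ElementTree as ET

# Shortened copy of the element from the question (only some attributes kept).
elem = ET.fromstring(
    '<Atom3d ID="18" Name="C7" '
    'XYZ="0.0148299997672439,0.283699989318848,1.0291999578476"/>'
)

# Split the attribute on commas and convert each piece to a float.
x, y, z = (float(value) for value in elem.get("XYZ").split(","))

print(x, y, z)
```

The unpacking raises ValueError if XYZ does not hold exactly three numbers, which is a useful check when the values feed another input file.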

Search for specific XML element Attribute values

Using Python ElementTree to construct and edit test messages:
Part of XML as follows:
<FIXML>
<TrdMtchRpt TrdID="$$+TrdID#" RptTyp="0" TrdDt="20120201" MtchTyp="4" LastMkt="ABCD" LastPx="104.11">
The TrdID attribute contains a value beginning with $$ to mark it as variable data that must be amended once the message is constructed from a template, in this case to the next sequential number. That number is stored in a dictionary: the overall idea is to load a dictionary from a file that lists each attribute key with its associated value, such as the next sequential number (e.g. the dictionary file contains $$+TrdID# 12345, using a space as the delimiter).
So far my script iterates the parsed XML and examines each indexed element in turn. There will be several fields in the xml file that require updating so I need to avoid using hard coded references to element tags.
How can I search the element/attribute to identify if the attribute contains a key where the corresponding value starts with or contains the specific string $$?
And for reasons unknown to me we cannot use lxml!
You can use XPath.
import lxml.etree as etree
from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO
xml = """<FIXML>
<TrdMtchRpt TrdID="$$+TrdID#"
RptTyp="0"
TrdDt="20120201"
MtchTyp="4"
LastMkt="ABCD"
LastPx="104.11"/>
</FIXML>"""
tree = etree.parse(StringIO(xml))
To find elements TrdMtchRpt where the attribute TrdID starts with $$:
r = tree.xpath("//TrdMtchRpt[starts-with(@TrdID, '$$')]")
r[0].tag == 'TrdMtchRpt'
r[0].get("TrdID") == '$$+TrdID#'
If you want to find any element where at least one attribute starts with $$, test each attribute inside its own predicate (starts-with(@*, '$$') would only look at the first attribute):
r = tree.xpath("//*[@*[starts-with(., '$$')]]")
r[0].tag == 'TrdMtchRpt'
r[0].get("TrdID") == '$$+TrdID#'
Look at the documentation:
http://lxml.de/xpathxslt.html#the-xpath-method
http://www.w3schools.com/xpath/xpath_functions.asp#string
http://www.w3schools.com/xpath/xpath_syntax.asp
You can use the ElementTree package from the standard library. It gives you an object with a hierarchical data structure built from the XML document.
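Since lxml is ruled out in the question, here is a minimal standard-library sketch: walk every element, inspect every attribute, and replace any value that starts with $$. The replacements dictionary is a hypothetical stand-in for the one loaded from the space-delimited file, and no element or attribute names are hard-coded in the loop.

```python
import xml.etree.ElementTree as ET

XML = """<FIXML>
  <TrdMtchRpt TrdID="$$+TrdID#" RptTyp="0" TrdDt="20120201"
              MtchTyp="4" LastMkt="ABCD" LastPx="104.11"/>
</FIXML>"""

# Hypothetical lookup table, as if read from the space-delimited dictionary file.
replacements = {"$$+TrdID#": "12345"}

root = ET.fromstring(XML)

updated = []
for elem in root.iter():
    # Copy the items first so the attributes can be changed while looping.
    for key, value in list(elem.attrib.items()):
        if value.startswith("$$"):
            # Unknown placeholders are left unchanged rather than dropped.
            elem.set(key, replacements.get(value, value))
            updated.append((elem.tag, key))

print(updated)
print(ET.tostring(root).decode())
```

Using "$$" in value instead of value.startswith("$$") would cover the "contains" case from the question.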
