Nested XML structures from list of dicts in Python 3

I have a list of dictionaries, and I would like to create an XML structure from it. Below is a sample of what I have tried. The main issue is that I cannot get out of the element "item" (I am probably not using the correct terms, so please forgive me in advance):
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
mapping = [
    {"struct": "root\\item\\itemno"},
    {"struct": "root\\item\\itemdup"},
    {"struct": "root\\item\\edge\\type"},
    {"struct": "root\\item\\edge\\len\\l1"},
    {"struct": "root\\item\\edge\\len\\l2"},
]
# The first item will always be the root
root = ET.Element(mapping[0]["struct"].split("\\")[0])
for structure in mapping:
    structure = structure["struct"].split("\\")
    # remove the first element, root, as this was already added
    del structure[0]
    iter = 1
    for element in structure:
        print("Element being processed", element)
        if not root.findall(element):
            if iter == 1:
                sub = ET.SubElement(root, element)
                print(f"iter {iter} - element not found in root {element} - Added to ROOT")
            else:
                ET.SubElement(sub, element)
                print(f"iter {iter} - element not found in root {element} - Added to SUB")
        else:
            print(f"iter {iter} - element found {element}")
        iter += 1
ET.dump(root)
The result I get from this is the following:
<root>
  <item>
    <itemno />
    <itemdup />
    <edge />
    <type />
    <edge />
    <len />
    <l1 />
    <edge />
    <len />
    <l2 />
  </item>
</root>
What I would like is the following:
<root>
  <item>
    <itemno />
    <itemdup />
    <edge>
      <type />
      <len>
        <l1 />
        <l2 />
      </len>
    </edge>
  </item>
</root>
It starts well, but when it gets to "type" it goes inside "item" and not "edge".
I put some print statements to help me debug it but I could not figure it out.
I have already looked into similar issues on SO, but the main problem is that my list changes, so I don't always know the name of the element I must search for or where to place it.

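You have nested elements, so you should build them differently. Your code always searches root with findall(), which only looks at the direct children of root, and it always adds new elements to the same sub.
For every structure, first set
sub = root
and then, for each element, if it doesn't exist yet as a child of sub, create it and make it the new sub
sub = ET.SubElement(sub, element)
and if it already exists, get it and descend into it
sub = item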
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
from xml.dom import minidom
mapping = [
    {"struct": "root\\item\\itemno"},
    {"struct": "root\\item\\itemdup"},
    {"struct": "root\\item\\edge\\type"},
    {"struct": "root\\item\\edge\\len\\l1"},
    {"struct": "root\\item\\edge\\len\\l2"},
]
# The first item will always be the root
root = ET.Element(mapping[0]["struct"].split("\\")[0])
for structure in mapping:
    structure = structure["struct"].split("\\")
    # remove the first element, root, as this was already added
    del structure[0]
    # every path starts again at the root
    sub = root
    for element in structure:
        print("Element:", element)
        item = sub.find(element)
        # compare with None: an element without children is falsy
        if item is None:
            sub = ET.SubElement(sub, element)
        else:
            sub = item
    print('---')
# ET.dump(root)
text = minidom.parseString(ET.tostring(root)).toprettyxml(indent="  ")
print(text)
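The same idea can be wrapped in a small reusable function. This is a sketch: build_tree is a hypothetical helper name, it assumes every path starts with the same root tag and that each part is a plain tag name (find() treats characters such as / or [ as path syntax), and ET.indent() needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

def build_tree(paths, sep="\\"):
    """Build nested elements from sep-delimited paths (hypothetical helper).

    The first part of each path is taken as the root tag."""
    root = None
    for path in paths:
        tags = path.split(sep)
        if root is None:
            root = ET.Element(tags[0])
        node = root
        for tag in tags[1:]:
            child = node.find(tag)        # direct child with this tag, or None
            if child is None:             # not "if not child": a leaf element is falsy
                child = ET.SubElement(node, tag)
            node = child                  # descend, whether the child is new or existing
    return root

mapping = [
    {"struct": "root\\item\\itemno"},
    {"struct": "root\\item\\itemdup"},
    {"struct": "root\\item\\edge\\type"},
    {"struct": "root\\item\\edge\\len\\l1"},
    {"struct": "root\\item\\edge\\len\\l2"},
]
root = build_tree(m["struct"] for m in mapping)
ET.indent(root)                           # pretty-print in place (Python 3.9+)
print(ET.tostring(root, encoding="unicode"))

# Going through an existing leaf: "a" has no children when "r\\a\\b" is processed
demo = build_tree(["r\\a", "r\\a\\b"])
print(ET.tostring(demo, encoding="unicode"))   # <r><a><b /></a></r>
```

The second call is why the check uses is None: with if not item, the childless <a> would be treated as missing and a duplicate <a> would be created.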

Related

XML to table form in Excel

When you open an XML file in Excel, you are prompted to choose how to open it, and one of the options is to open it as an XML table.
It basically opens the XML file as a table, and based on the analysis I have done, it seems to do a pretty good job.
My question: I want to convert an XML file into table form, the way that Excel feature does. Is that possible?
The reason I want this result is that working with tables is really easy using libraries like pandas. However, I don't want to open every XML file in Excel, show the table and then save it again. It is not very time efficient.
This is my XML file:
<?xml version="1.0" encoding="utf-8"?>
<ProjectData>
  <FINAL>
    <START id="ID0001" service_code="0x5196">
      <Docs Docs_type="START">
        <Rational>225196</Rational>
        <Qualify>6251960000A0DE</Qualify>
      </Docs>
      <Description num="1213f2312">The parameter</Description>
      <SetFile dg="" dg_id="">
        <SetData value="32" />
      </SetFile>
    </START>
    <START id="DG0003" service_code="0x517B">
      <Docs Docs_type="START">
        <Rational>23423</Rational>
        <Qualify>342342</Qualify>
      </Docs>
      <Description num="3423423f3423">The third</Description>
      <SetFile dg="" dg_id="">
        <FileX dg="" axis_pts="2" name="" num="" dg_id="" />
        <FileY unit="" axis_pts="20" name="TOOLS" text_id="23423" unit_id="" />
        <SetData x="E1" value="21259" />
        <SetData x="E2" value="0" />
      </SetFile>
    </START>
    <START id="ID0048" service_code="0x5198">
      <RawData rawdata_type="OPDATA">
        <Request>225198</Request>
        <Response>343243324234234</Response>
      </RawData>
      <Meaning text_id="434234234">The forth</Meaning>
      <ValueDataset unit="m" unit_id="FEDS">
        <FileX dg="kg" discrete="false" axis_pts="19" name="weight" text_id="SDF3" unit_id="SDGFDS" />
        <SetData xin="sdf" xax="233" value="323" />
        <SetData xin="123" xax="213" value="232" />
        <SetData xin="2321" xax="232" value="23" />
      </ValueDataset>
    </START>
  </FINAL>
</ProjectData>
So let's say I have the following input.xml file:
<main>
  <item name="item1" image="a"></item>
  <item name="item2" image="b"></item>
  <item name="item3" image="c"></item>
  <item name="item4" image="d"></item>
</main>
You can use the following code:
import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse('input.xml')
tags = [e.attrib for e in tree.getroot()]
df = pd.DataFrame(tags)
# df:
#     name image
# 0  item1     a
# 1  item2     b
# 2  item3     c
# 3  item4     d
And this should be independent of the number of attributes in a given file.
To write a simple CSV file from pandas, you can use the to_csv method (see the pandas documentation). If it needs to be an Excel sheet, you can use to_excel instead; writing .xlsx files requires an Excel engine such as openpyxl to be installed.
# Write to csv without the row names
df.to_csv('file_name.csv', index = False)
# Write to xlsx sheet without the row names
df.to_excel('file_name.xlsx', index=False)
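If pandas is not available, a similar CSV can be written with the standard library's csv module. This sketch is an alternative to to_csv, not what the answer uses: it collects the attribute dicts the same way, builds the column list as the union of all attribute names (so items with different attributes still line up), and lets csv.DictWriter fill missing values with an empty string. The size attribute in the sample is made up for illustration, and io.StringIO stands in for a real file opened with open('file_name.csv', 'w', newline='').

```python
import csv
import io
import xml.etree.ElementTree as ET

XML = """<main>
  <item name="item1" image="a"></item>
  <item name="item2" image="b" size="10"></item>
</main>"""

# One dict of attributes per child of the root, as in the pandas version
rows = [dict(e.attrib) for e in ET.fromstring(XML)]

# Columns: every attribute name, in order of first appearance
fieldnames = []
for row in rows:
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)  # missing keys are written as ''
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```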
UPDATE:
For your XML file, and based on your clarification in the comments, I suggest the following, where all elements in the first level in the tree will be rows, and every attribute or node text will be column:
import xml.etree.ElementTree as ET
import pandas as pd

def has_children(e):
    '''Check if element, e, has children'''
    return len(e) > 0

def has_attrib(e):
    '''Check if element, e, has attributes'''
    return len(e.attrib) > 0

def get_unique_key(mydict, key):
    '''Generate a unique key by appending * while key already exists in mydict'''
    while key in mydict:
        key = key + '*'
    return key

tree = ET.parse('input2.xml')
root = tree.getroot()
# Get first level:
lvl_one = list(root)
myList = []
for e in lvl_one:
    mydict = {}
    # Iterate over each node in the level-one element (including the element itself)
    for node in e.iter():
        # leaf nodes contribute their text
        if not has_children(node) and node.text is not None:
            unique_key = get_unique_key(mydict, node.tag)
            mydict[unique_key] = node.text
        # every node contributes its attributes
        if has_attrib(node):
            for key in node.attrib:
                unique_key = get_unique_key(mydict, key)
                mydict[unique_key] = node.attrib[key]
    myList.append(mydict)
print(pd.DataFrame(myList))
Notice that in this code, for each key I check whether the column name already exists, and if it does, I create a new column name by appending '*'.
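To see what a single row looks like, here is a condensed, standard-library-only version of the same logic applied to a trimmed copy of the second START element (only some attributes kept). One thing to watch: with the full file above, list(root) yields the single FINAL element, so the loop in the answer produces one row containing everything. If one row per START is wanted, iterating root.iter('START'), as this sketch does, is one way to get it. The function names unique_key and flatten are hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<ProjectData>
  <FINAL>
    <START id="DG0003">
      <Description num="3423423f3423">The third</Description>
      <SetFile>
        <SetData x="E1" value="21259" />
        <SetData x="E2" value="0" />
      </SetFile>
    </START>
  </FINAL>
</ProjectData>"""

def unique_key(row, key):
    # Append '*' until the key is unused, the same rule as the answer's helper
    while key in row:
        key += '*'
    return key

def flatten(element):
    row = {}
    for node in element.iter():                  # the element itself, then all descendants
        if len(node) == 0 and node.text is not None:
            row[unique_key(row, node.tag)] = node.text   # leaf text becomes a column
        for key, value in node.attrib.items():
            row[unique_key(row, key)] = value            # every attribute becomes a column
    return row

root = ET.fromstring(XML)
rows = [flatten(start) for start in root.iter('START')]  # one row per START element
print(rows[0])
# {'id': 'DG0003', 'Description': 'The third', 'num': '3423423f3423',
#  'x': 'E1', 'value': '21259', 'x*': 'E2', 'value*': '0'}
```

The list of dicts can be passed straight to pd.DataFrame, exactly like myList in the answer.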

How do I write a function that takes an xml file and an integer value X as parameters and updates the attributes of the xml based on the given integer?

I am trying to write a function that will take as parameters my xml file file.xml and an integer I want to input from the keyboard.
My XML file looks like this:
<root>
  <item name="A" days="10"/>
  <item name="B" days="20"/>
</root>
I have the integer X:
X = int(input("X value is: "))
I want to add the X value to the days attribute in my xml.
For X = 1.1, I want the output:
A, 11.1 days
B, 21.1 days
I don't know how to write the function: when I tried calling it, the name of the file I wanted to open was not recognized:
read_xml(file.xml)
NameError: name 'file' is not defined
But more importantly, I don't know how to add an integer value to the attribute of an xml file.
What I did so far using the ElementTree library:
import xml.etree.ElementTree as et
tree = et.ElementTree(file='file.xml')
root = tree.getroot()
for item in root.findall('item'):
    name = item.get('name')
    days = item.get('days')
    print(f'{name}, {days} days')
At this moment I get the desired output format but without the integer X added to the days attribute.
Please let me know if you have any idea how to solve this in Python3.
Thanks!!!
import xml.etree.ElementTree as ET
xml = '''<root>
  <item name="A" days="10"/>
  <item name="B" days="20"/>
</root>'''
def change_days_value(x):
    root = ET.fromstring(xml)
    items = root.findall('.//item')
    for item in items:
        # attribute values are strings: convert, add x, store back as a string
        item.attrib['days'] = str(int(item.attrib['days']) + x)
    ET.dump(root)
# read this value from the user, e.g. x = float(input("X value is: "))
x = 1.1
change_days_value(x)
Output:
<root>
  <item name="A" days="11.1" />
  <item name="B" days="21.1" />
</root>
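Putting the pieces of the question together, here is a sketch of a function that takes a file name and a number, adds the number to every days attribute, prints the lines in the requested format and writes the file back. The file name has to be passed as a string, 'file.xml'; written without quotes, Python reads file.xml as the attribute xml of a variable named file, which is where the NameError comes from. Since 1.1 is not an integer, the value is read with float() rather than int(). The function name add_to_days and the temporary demo file are assumptions for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def add_to_days(path, x):
    """Add x to the days attribute of every <item> in the XML file at path, then save it."""
    tree = ET.parse(path)
    for item in tree.getroot().iter('item'):
        days = float(item.get('days')) + x   # attributes are strings, so convert first
        text = f'{days:g}'                   # ':g' gives '11.1' for 11.1 and '11' for 11.0
        item.set('days', text)
        print(f"{item.get('name')}, {text} days")
    tree.write(path)                         # overwrite the file with the new values
    return tree

# Demo file in a temporary directory (stands in for the real file.xml)
path = os.path.join(tempfile.mkdtemp(), 'file.xml')
with open(path, 'w') as f:
    f.write('<root>\n  <item name="A" days="10"/>\n  <item name="B" days="20"/>\n</root>\n')

x = 1.1   # e.g. x = float(input("X value is: "))
tree = add_to_days(path, x)   # with the real file: add_to_days('file.xml', x)
```

This prints "A, 11.1 days" and "B, 21.1 days". Note that ':g' switches to exponent notation for very large values, so plain str() may be preferable if the days can grow past six digits.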

python ElementTree remove issue

I have an XML file as follows:
<plugin-config>
  <properties>
    <property name="AZSRVC_CONNECTION" value="diamond_plugins#AZSRVC" />
    <property name="DIAMOND_HOST" value="10.0.230.1" />
    <property name="DIAMOND_PORT" value="3333" />
  </properties>
  <pack-list>
    <vsme-pack id="monthly_50MB">
      <campaign-list>
        <campaign id="2759" type="SOB" />
        <campaign id="2723" type="SUBSCRIBE" />
      </campaign-list>
    </vsme-pack>
    <vsme-pack id="monthly_500MB">
      <campaign-list>
        <campaign id="3879" type="SOB" />
        <campaign id="3885" type="SOB" />
        <campaign id="2724" type="SUBSCRIBE" />
        <campaign id="1111" type="COB" />
      </campaign-list>
    </vsme-pack>
  </pack-list>
</plugin-config>
And I am trying to run this Python script to remove the 'campaign' element with a specific id:
import xml.etree.ElementTree as ET
tree = ET.parse('pack-assign-config.xml')
root = tree.getroot()
pack_list = root.find('pack-list')
camp_list = pack_list.find(".//vsme-pack[@id='{pack_id}']".format(pack_id=pack_id)).find('campaign-list').findall('campaign')
for camp in camp_list:
    if camp.get('id') == '2759':
        camp_list.remove(camp)
tree.write('out.xml')
I run the script, but out.xml is the same as the input file, so the element is not removed.
Issue:
findall() returns a plain Python list, so camp_list.remove(camp) only removes the element from that list; the XML tree itself is never modified. To delete a node, you have to call remove() on its parent element (here, the campaign-list element), not on the list returned by:
camp_list = pack_list.find(".//vsme-pack[@id='{pack_id}']".format(pack_id=pack_id)).find('campaign-list').findall('campaign')
Fixed Code Example
Here is working code that removes the nodes from the XML:
import xml.etree.ElementTree as ET
tree = ET.parse('pack-assign-config.xml')
root = tree.getroot()
# Each vsme-pack element under pack-list
for pack in root.findall('.//pack-list/vsme-pack'):
    # campaign-list is the parent of the campaign elements
    for camp_list in pack.findall('campaign-list'):
        # findall() returns a new list, so removing from the tree inside this loop is safe
        for camp in camp_list.findall('campaign'):
            if camp.get('id') in ('2759', '3879'):
                # remove() is called on the parent element, which changes the tree
                camp_list.remove(camp)
tree.write("out.xml")
Hope this helps.
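The key point is that remove() is called on the parent element. Below is a sketch of a reusable version that selects one pack with an [@id='...'] predicate (part of the limited XPath subset ElementTree supports) and removes campaigns by id. remove_campaigns is a hypothetical name, and the XML is a trimmed copy of the file above.

```python
import xml.etree.ElementTree as ET

XML = """<plugin-config>
  <pack-list>
    <vsme-pack id="monthly_50MB">
      <campaign-list>
        <campaign id="2759" type="SOB" />
        <campaign id="2723" type="SUBSCRIBE" />
      </campaign-list>
    </vsme-pack>
  </pack-list>
</plugin-config>"""

def remove_campaigns(root, pack_id, ids):
    """Remove <campaign> elements whose id is in ids from one vsme-pack; return how many."""
    # [@id='...'] matches on an attribute value; the result is the campaign-list element
    camp_list = root.find(f".//vsme-pack[@id='{pack_id}']/campaign-list")
    if camp_list is None:
        return 0
    removed = 0
    # findall() returns a new list, so removing from the tree while looping is safe
    for camp in camp_list.findall('campaign'):
        if camp.get('id') in ids:
            camp_list.remove(camp)   # remove from the parent element, not from a list
            removed += 1
    return removed

root = ET.fromstring(XML)
removed = remove_campaigns(root, 'monthly_50MB', {'2759'})
print(removed)                                        # 1
print([c.get('id') for c in root.iter('campaign')])   # ['2723']
```

With a real file, the same call works on tree.getroot() after ET.parse('pack-assign-config.xml'), followed by tree.write('out.xml').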

Using Python LXML to remove XML element values but leave one placeholder

I have an XML file in which I would like to clear the text in the 'value' child elements, but leave one empty 'value' element as a placeholder for adding text at a later date. I am using Python's LXML module.
Here's an example of the XML section:
<spec class="Spec" name="New Test">
  <mainreport>
    <item name="New Item">First Item</item>
  </mainreport>
  <case class="CaseItem" name="Some Name">
    <extraelement>
      <item name="ID">Some Id</item>
    </extraelement>
    <pool class="String" name="Originator">
      <value>A</value>
      <value>B</value>
      <value>C</value>
    </pool>
    <pool class="String" name="Target">
      <value>D</value>
      <value>E</value>
      <value>F</value>
    </pool>
And here's what I am hoping to output:
<spec class="Spec" name="New Test">
  <mainreport>
    <item name="New Item">First Item</item>
  </mainreport>
  <case class="CaseItem" name="Some Name">
    <extraelement>
      <item name="ID">Some Id</item>
    </extraelement>
    <pool class="String" name="Originator">
      <value></value>
    </pool>
    <pool class="String" name="Target">
      <value></value>
    </pool>
I have written the following code, but it only adds the "value" tag to the last pool element:
import lxml.etree as et
import os
xml_match = os.path.join("input.xml")
doc = et.parse(xml_match)
for elem in doc.xpath('//case/pool/value'):
    elem.getparent().remove(elem)
blankval = et.Element("value")
blankval.text = ""
for elem in doc.xpath('//case/pool'):
    elem.insert(1, blankval)
outFile = "output.xml"
doc.write(outFile)
I would remove all value elements and append an empty one in a single loop:
for elem in doc.xpath('//case/pool'):
    for value in elem.findall("value"):
        elem.remove(value)
    blankval = et.Element("value")
    blankval.text = ""
    elem.append(blankval)
There is also a handy .clear() method, but it would also remove the attributes of the pool element (class and name).
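If .clear() is still preferred, one workaround is to copy the attributes first and restore them afterwards. This sketch uses the standard library's xml.etree.ElementTree rather than lxml (lxml's clear() also resets attributes by default) and is an alternative to the loop above, not part of the original answer.

```python
import xml.etree.ElementTree as ET

pool = ET.fromstring(
    '<pool class="String" name="Target"><value>D</value><value>E</value></pool>'
)
saved = dict(pool.attrib)          # copy the attributes before clearing
pool.clear()                       # drops children, text, tail and attributes
print(pool.attrib)                 # {}
pool.attrib.update(saved)          # put class and name back
ET.SubElement(pool, 'value').text = ''
# short_empty_elements=False writes <value></value> instead of <value />
print(ET.tostring(pool, encoding='unicode', short_empty_elements=False))
# <pool class="String" name="Target"><value></value></pool>
```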
Your current approach does not work because you reuse the exact same blankval element: in lxml an element can only have one parent, so inserting it into the next pool moves it out of the previous one. Instead, create a new element inside the loop before each insert:
for elem in doc.xpath('//case/pool'):
    blankval = et.Element("value")
    blankval.text = ""
    elem.insert(1, blankval)
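The effect of reusing one element can be reproduced with a small sketch. It uses the standard library's xml.etree.ElementTree instead of lxml, and there the behaviour differs: ElementTree does not move the element but puts the very same object into both parents, so a later change to one placeholder shows up in every pool. Creating a fresh element per parent avoids the problem in both libraries. The trimmed XML is a stand-in for the real file.

```python
import xml.etree.ElementTree as ET

XML = '<case><pool name="Originator"/><pool name="Target"/></case>'

# Reusing one element: both pools end up holding the same object
root = ET.fromstring(XML)
shared = ET.Element('value')
for pool in root.findall('pool'):
    pool.append(shared)
shared.text = 'X'                    # changes the placeholder in both pools at once
print(ET.tostring(root, encoding='unicode'))
# <case><pool name="Originator"><value>X</value></pool><pool name="Target"><value>X</value></pool></case>

# A new element per pool keeps the placeholders independent
root = ET.fromstring(XML)
for pool in root.findall('pool'):
    ET.SubElement(pool, 'value').text = ''
root.find('pool/value').text = 'X'   # only the first pool's placeholder changes
print(ET.tostring(root, encoding='unicode', short_empty_elements=False))
# <case><pool name="Originator"><value>X</value></pool><pool name="Target"><value></value></pool></case>
```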

Minidom element insertion into xml

I have some problems inserting a data structure into an XML document, without much success so far. I have a file, e.g.:
<?xml version="1.0" ?>
<marl version="2.1" xmlns="xxxx.xsd">
  <mcdata id="2" scope="all" type="plan">
    <header>
      <log action="created"/>
    </header>
    <mObject class="foo" distName="a-1">
      <p name="Ethernet">false</p>
      <list name="pass"/>
    </mObject>
    <mObject class="bar" distName="a-1/b-2">
      <p name="Voltage">false</p>
    </mObject>
  </mcdata>
</marl>
The basic version of my code goes like this, but it seems to have no effect: output.xml is the same as mini.xml.
from xml.dom.minidom import parse
document = parse('mini.xml')
mo = document.getElementsByTagName("mObject")
element = document.createElement("mObject")
mo.append(element)
# writexml() writes str, so the file must be opened in text mode ('w', not 'wb')
with open('output.xml', 'w') as out:
    document.writexml(out)
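getElementsByTagName() returns a NodeList, which is just a Python list of the matching nodes, so appending to it does not change the document. Instead, create a new node and decorate it as needed: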
#create node <mObject>
element = document.createElement("mObject")
#add text content to the node
element.appendChild(document.createTextNode("content"))
#add attribute id to the node
element.setAttribute("id" , "foo")
#result: <mObject id="foo">content</mObject>
Then add the newly created node to a parent node:
#select a parent node
mc = document.getElementsByTagName("mcdata")[0]
#append the new node as child of the parent
mc.appendChild(element)
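For a complete, runnable version, here is a sketch that parses a shortened copy of the document from a string, builds an <mObject> with a <p> child, appends it to <mcdata> and writes the result. The attribute values (class "baz", distName "a-1/c-3", Voltage "true") are made up for illustration, and the output goes to a temporary directory instead of output.xml.

```python
import os
import tempfile
from xml.dom import minidom

XML = """<?xml version="1.0" ?>
<marl version="2.1" xmlns="xxxx.xsd">
  <mcdata id="2" scope="all" type="plan">
    <mObject class="foo" distName="a-1">
      <p name="Ethernet">false</p>
    </mObject>
  </mcdata>
</marl>"""

document = minidom.parseString(XML)
mcdata = document.getElementsByTagName("mcdata")[0]   # the parent node

# Build <mObject class="baz" distName="a-1/c-3"><p name="Voltage">true</p></mObject>
element = document.createElement("mObject")
element.setAttribute("class", "baz")
element.setAttribute("distName", "a-1/c-3")
p = document.createElement("p")
p.setAttribute("name", "Voltage")
p.appendChild(document.createTextNode("true"))
element.appendChild(p)

mcdata.appendChild(element)   # this is what actually changes the document

path = os.path.join(tempfile.mkdtemp(), "output.xml")
with open(path, "w", encoding="utf-8") as out:   # text mode: writexml writes str
    document.writexml(out, encoding="utf-8")
```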
