Minidom element insertion into xml - python

I am having problems inserting a data structure into an XML document, and so far I have had little success. I have a file, e.g.:
<?xml version="1.0" ?>
<marl version="2.1" xmlns="xxxx.xsd">
<mcdata id="2" scope="all" type="plan">
<header>
<log action="created"/>
</header>
<mObject class="foo" distName="a-1">
<p name="Ethernet">false</p>
<list name="pass"/>
</mObject>
<mObject class="bar" distName="a-1/b-2">
<p name="Voltage">false</p>
</mObject>
</mcdata>
</marl>
The basic version of my code looks like this, but it seems to have no effect, because output.xml is the same as mini.xml.
from xml.dom.minidom import *
document = parse('mini.xml')
mo = document.getElementsByTagName("mObject")
element = document.createElement("mObject")
mo.append(element)
with open('output.xml', 'wb') as out:
    document.writexml(out)
    out.close()

Create a new node and decorate it as needed :
#create node <mObject>
element = document.createElement("mObject")
#add text content to the node
element.appendChild(document.createTextNode("content"))
#add attribute id to the node
element.setAttribute("id" , "foo")
#result: <mObject id="foo">content</mObject>
Add the newly created node to a parent node :
#select a parent node
mc = document.getElementsByTagName("mcdata")[0]
#append the new node as child of the parent
mc.appendChild(element)
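Putting the two steps together: the code in the question had no effect because getElementsByTagName returns a list-like NodeList, and appending to that list does not touch the document tree. Below is a minimal sketch of the whole round trip. The function name add_mobject, the shortened sample document and the attribute values are made up for illustration, and the XML is parsed from a string so the example runs without mini.xml.

```python
from xml.dom.minidom import parseString

# Shortened stand-in for mini.xml (assumption: same overall shape as the question's file)
SOURCE = """<?xml version="1.0" ?>
<marl version="2.1" xmlns="xxxx.xsd">
<mcdata id="2" scope="all" type="plan">
<mObject class="foo" distName="a-1"/>
</mcdata>
</marl>"""


def add_mobject(xml_text, attrs):
    """Append a new <mObject> with the given attributes to the first <mcdata>."""
    document = parseString(xml_text)
    # Pick the parent node that lives in the tree
    parent = document.getElementsByTagName("mcdata")[0]
    element = document.createElement("mObject")
    for name, value in attrs.items():
        element.setAttribute(name, value)
    # appendChild modifies the document itself, unlike list.append on a NodeList
    parent.appendChild(element)
    return document.toxml()


result = add_mobject(SOURCE, {"class": "baz", "distName": "a-1/c-3"})
print(result)
```

To save the result, write the returned string to a file opened in text mode, e.g. open('output.xml', 'w'). On Python 3, writexml also expects a text-mode file, so 'wb' would need to become 'w' there.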

Related

Nested XML structures from list of dicts in Python 3

I have a list of dictionaries, and I would like to create an XML structure from it. Below is a sample of what I have tried. The main issue is that I cannot get out of the element "item" (I am probably not using the correct terms, please, forgive me in advance):
import xml.etree.ElementTree as ET
mapping = [
    {"struct": "root\\item\\itemno"},
    {"struct": "root\\item\\itemdup"},
    {"struct": "root\\item\\edge\\type"},
    {"struct": "root\\item\\edge\\len\\l1"},
    {"struct": "root\\item\\edge\\len\\l2"},
]
# The first item will always be the root
root = ET.Element(mapping[0]["struct"].split("\\")[0])
for structure in mapping:
    structure = structure["struct"].split("\\")
    # remove the first element, root, as this was already added
    del structure[0]
    iter = 1
    for element in structure:
        print("Element being processed", element)
        if not root.findall(element):
            if iter == 1:
                sub = ET.SubElement(root, element)
                print(f"iter {iter} - element not found in root {element} - Added to ROOT")
            else:
                ET.SubElement(sub, element)
                print(f"iter {iter} - element not found in root {element} - Added to SUB")
        else:
            print(f"iter {iter} - element found {element}")
        iter += 1
ET.dump(root)
The result I get from this is the following:
<root>
  <item>
    <itemno />
    <itemdup />
    <edge />
    <type />
    <edge />
    <len />
    <l1 />
    <edge />
    <len />
    <l2 />
  </item>
</root>
What I would like is the following:
<root>
  <item>
    <itemno />
    <itemdup />
    <edge>
      <type />
      <len>
        <l1 />
        <l2 />
      </len>
    </edge>
  </item>
</root>
It starts well, but when it gets to "type" it goes inside "item" and not "edge".
I put some print statements to help me debug it but I could not figure it out.
I have already looked into similar issues in SO, but the main problem is that my list changes, so I don't always know the name of the element I must search for and where to place it.
Your elements are nested, so you should build the tree in a different way.
For every structure, first set
sub = root
Then, if the item doesn't exist yet, create it and use it as the new sub
sub = ET.SubElement(sub, element)
and if the item already exists, just use it
sub = item
import xml.etree.ElementTree as ET
mapping = [
    {"struct": "root\\item\\itemno"},
    {"struct": "root\\item\\itemdup"},
    {"struct": "root\\item\\edge\\type"},
    {"struct": "root\\item\\edge\\len\\l1"},
    {"struct": "root\\item\\edge\\len\\l2"},
]
# The first item will always be the root
root = ET.Element(mapping[0]["struct"].split("\\")[0])
for structure in mapping:
    structure = structure["struct"].split("\\")
    # remove the first element, root, as this was already added
    del structure[0]
    # start every path at the root again
    sub = root
    for element in structure:
        print("Element:", element)
        item = sub.find(element)
        # compare with None: an element without children is falsy,
        # so "if not item" could create duplicates of childless elements
        if item is None:
            sub = ET.SubElement(sub, element)
        else:
            sub = item
    print('---')
#ET.dump(root)
from xml.dom import minidom
text = minidom.parseString(ET.tostring(root)).toprettyxml(indent=" ")
print(text)
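The same idea can be wrapped in a small reusable function. This is only a sketch: build_tree and its sep parameter are names invented here, and the first path deliberately stops at item to show why the check has to be "is None" (a childless item must still be reused rather than created a second time).

```python
import xml.etree.ElementTree as ET


def build_tree(paths, sep="\\"):
    """Build a nested element tree from separator-delimited paths.

    Assumes every path starts with the same root tag.
    """
    root = ET.Element(paths[0].split(sep)[0])
    for path in paths:
        parent = root  # restart at the root for every path
        for tag in path.split(sep)[1:]:
            child = parent.find(tag)
            if child is None:  # only create the element when it is missing
                child = ET.SubElement(parent, tag)
            parent = child  # descend one level
    return root


paths = [
    r"root\item",
    r"root\item\itemno",
    r"root\item\edge\type",
    r"root\item\edge\len\l1",
    r"root\item\edge\len\l2",
]
root = build_tree(paths)
print(ET.tostring(root, encoding="unicode"))
```

Because the element names are never hard-coded, this keeps working when the list of paths changes, which was the main concern in the question.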

XML to table form in Excel

Excel offers this option when opening an XML file. You get prompted with the option shown in the linked picture.
It basically opens the XML file as a table and, based on the analysis I have done, it seems to do a pretty good job.
The second linked picture shows how it looks after I opened an XML file in Excel in table form.
My question: I want to convert an XML file into table form the way that Excel feature does. Is that possible?
The reason I want this result is that working with tables inside Excel is really easy using libraries like pandas. However, I don't want to open every XML file with Excel, show the table, and then save it again. That is not very time efficient.
This is my XML file
<?xml version="1.0" encoding="utf-8"?>
<ProjectData>
<FINAL>
<START id="ID0001" service_code="0x5196">
<Docs Docs_type="START">
<Rational>225196</Rational>
<Qualify>6251960000A0DE</Qualify>
</Docs>
<Description num="1213f2312">The parameter</Description>
<SetFile dg="" dg_id="">
<SetData value="32" />
</SetFile>
</START>
<START id="DG0003" service_code="0x517B">
<Docs Docs_type="START">
<Rational>23423</Rational>
<Qualify>342342</Qualify>
</Docs>
<Description num="3423423f3423">The third</Description>
<SetFile dg="" dg_id="">
<FileX dg="" axis_pts="2" name="" num="" dg_id="" />
<FileY unit="" axis_pts="20" name="TOOLS" text_id="23423" unit_id="" />
<SetData x="E1" value="21259" />
<SetData x="E2" value="0" />
</SetFile>
</START>
<START id="ID0048" service_code="0x5198">
<RawData rawdata_type="OPDATA">
<Request>225198</Request>
<Response>343243324234234</Response>
</RawData>
<Meaning text_id="434234234">The forth</Meaning>
<ValueDataset unit="m" unit_id="FEDS">
<FileX dg="kg" discrete="false" axis_pts="19" name="weight" text_id="SDF3" unit_id="SDGFDS" />
<SetData xin="sdf" xax="233" value="323" />
<SetData xin="123" xax="213" value="232" />
<SetData xin="2321" xax="232" value="23" />
</ValueDataset>
</START>
</FINAL>
</ProjectData>
So let's say I have the following input.xml file:
<main>
<item name="item1" image="a"></item>
<item name="item2" image="b"></item>
<item name="item3" image="c"></item>
<item name="item4" image="d"></item>
</main>
You can use the following code:
import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse('input.xml')
tags = [ e.attrib for e in tree.getroot() ]
df = pd.DataFrame(tags)
# df:
# name image
# 0 item1 a
# 1 item2 b
# 2 item3 c
# 3 item4 d
And this should be independent of the number of attributes in a given file.
To write to a simple CSV file from pandas, you can use the to_csv method. If it needs to be an Excel sheet, you can use to_excel.
# Write to csv without the row names
df.to_csv('file_name.csv', index = False)
# Write to xlsx sheet without the row names
df.to_excel('file_name.xlsx', index=False)
UPDATE:
For your XML file, and based on your clarification in the comments, I suggest the following, where all elements in the first level in the tree will be rows, and every attribute or node text will be column:
import xml.etree.ElementTree as ET
import pandas as pd
def has_children(e):
    ''' Check if element, e, has children'''
    return len(e) > 0
def has_attrib(e):
    ''' Check if element, e, has attributes'''
    return len(e.attrib) > 0
def get_unique_key(mydict, key):
    ''' Generate a unique key if key already exists in mydict'''
    while key in mydict:
        key = key + '*'
    return key
tree = ET.parse('input2.xml')
root = tree.getroot()
# Get first level:
lvl_one = list(root)
myList = []
for e in lvl_one:
    mydict = {}
    # Iterate over each node in level one element
    for node in e.iter():
        # leaf nodes contribute their text as a column
        if not has_children(node) and node.text is not None:
            unique_key = get_unique_key(mydict, node.tag)
            mydict[unique_key] = node.text
        # every attribute becomes a column as well
        if has_attrib(node):
            for key in node.attrib:
                unique_key = get_unique_key(mydict, key)
                mydict[unique_key] = node.attrib[key]
    # one dictionary (row) per first-level element
    myList.append(mydict)
print(pd.DataFrame(myList))
Notice in this code, I check if the column name exists for each key, and if it exists, I create a new column name by suffixing with '*'.
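To see the '*' suffixing in action without pandas or a file on disk, here is a small standard-library sketch. The names SAMPLE, unique_key and flatten are made up for this illustration, the sample is a cut-down version of the question's XML, and, unlike the code above, whitespace-only text is skipped (an assumption, not something the original does).

```python
import xml.etree.ElementTree as ET

# Cut-down sample: one START element with two SetData nodes sharing the attribute "value"
SAMPLE = """<FINAL>
  <START id="ID0001" service_code="0x5196">
    <Docs Docs_type="START"><Rational>225196</Rational></Docs>
    <SetFile dg=""><SetData value="32" /><SetData value="7" /></SetFile>
  </START>
</FINAL>"""


def unique_key(row, key):
    """Append '*' until the key no longer collides with an existing column."""
    while key in row:
        key += "*"
    return key


def flatten(element):
    """Flatten one element and all its descendants into a single row dict."""
    row = {}
    for node in element.iter():
        text = (node.text or "").strip()
        if len(node) == 0 and text:  # leaf node with real text
            row[unique_key(row, node.tag)] = text
        for name, value in node.attrib.items():
            row[unique_key(row, name)] = value
    return row


rows = [flatten(start) for start in ET.fromstring(SAMPLE)]
print(rows[0])
```

The resulting list of dictionaries can be passed to pd.DataFrame exactly like myList above.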

Filling values under child through element tree

I have an XML file and a text file. I have written a script that parses the text file into a dictionary of keys and values. Now I have to go into the XML file and fill in the values for the testcase children of testgroup; the values include the test case title, ident, etc.
Also, based on the length of aa in the script, I need to create children under testgroup. I have minimal exposure to ElementTree, so any recommendation would be highly helpful.
xml = """<?xml version="1.0" encoding="UTF-8"?>
<testmodule title="hello" version="version 2">
<description> 'world' </description>
<engineer>
<info>
<name>Test </name>
<description> 'test' </description>
</info>
</engineer>
<preparation>
<initialize title="Set">
</initialize>
</preparation>
<variants>
<variant name="A">Test </variant>
<variant name="B">test</variant>
<variant name="C">Test test</variant>
</variants>
<testgroup title="Testing" ident="Testing" >
<testcase title="Check" ident= "3_1" name="Number" variants="A">
<param name="Testcase" type="string">Checking of Correct SW and Part identifiers </param>
<param name="TestcaseRequirements" type="string"></param>
<param name="Test" type="string">TS_Automation=Manual;TS_Method=Bench_Test;TS_Priority=1;TS_Tested_By=rjrjjn;TS_Written_By=SUN;TS_Review_done=No;TS_Regression=No;</param>
</testcase>
"""
import re
import xml.etree.ElementTree as ET
# read the whole text file into one string
with open('C:\\Users\\rjrn8w\\Desktop\\Test.txt', "r") as f:
    ee = f.read()
aa = re.findall(r'<TC_HEADER_START>([\s\S]*)</TC_HEADER_END>', ee)
for j in aa:
    # every "key=value" line becomes one dictionary entry
    ak = dict(re.findall(r'(\S+)=(.*)', j.strip()))
    print(ak)
tree = ET.parse('C:\\Users\\rjrn8w\\Documents\\My Received Files\\new.xml')
root = tree.getroot()
for child in root:
    if child.tag == 'testgroup':
        for element in child:
            for elem in element:
                print(elem.tag)
ak={'TS_Regression': 'No', 'ident': '1 ', 'TestcaseRequirements': '36978', 'name': '"T01">', 'title': '"DHCP " ', 'TS_Review_done': 'Yes;', 'TestcaseTestType': 'Test', 'TS_Priority': '1;', 'TS_Tested_By': 'qz9ghv;', 'TS_Techniques': 'Full Testing;', 'variants': '"A C" ', 'StakeholderRequirements': '1236\t\t\t\t', 'TS_Implemented': 'Yes;', 'TS_Automation': 'Automated;', 'TestcaseDescription': ' This test verifies DHCP discovery is halted after tester is connected'}
You can read the Python documentation for xml.etree.ElementTree. It explains how to create an Element, add attributes and values, and append the new element to an existing element as a child.
child=xml.etree.ElementTree.Element(tag, attrib={}, **extra)
existedelement.append(child)
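As a rough sketch of how that could look for this question: one testcase element is created per parsed dictionary, a few keys become attributes and the remaining keys become param children. The names add_testcases, ATTRIBUTE_KEYS and records, the trimmed XML and the split between attributes and params are all assumptions made for illustration, not something the question specifies.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the question's XML
XML = """<testmodule title="hello" version="version 2">
  <testgroup title="Testing" ident="Testing" />
</testmodule>"""

# Hypothetical parsed records, one dictionary per test case
records = [
    {"title": "DHCP", "ident": "1", "name": "T01", "variants": "A C",
     "TestcaseRequirements": "36978"},
]

# Assumption: these keys map to testcase attributes, everything else to <param>
ATTRIBUTE_KEYS = ("title", "ident", "name", "variants")


def add_testcases(root, records):
    """Create one <testcase> under <testgroup> for every record."""
    group = root.find("testgroup")
    for record in records:
        attrs = {k: record[k] for k in ATTRIBUTE_KEYS if k in record}
        case = ET.SubElement(group, "testcase", attrs)
        for key, value in record.items():
            if key in ATTRIBUTE_KEYS:
                continue
            param = ET.SubElement(case, "param", {"name": key, "type": "string"})
            param.text = value
    return root


root = add_testcases(ET.fromstring(XML), records)
print(ET.tostring(root, encoding="unicode"))
```

The number of testcase elements follows the length of the records list, which corresponds to creating children based on the length of aa.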

How to insert text from file into new XML tags

I have the following code to try to parse an XML file such that it reads from external text files (if found) and inserts its contents into newly introduced tags and saves a new XML file with the resultant manipulations.
The code looks like this:
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
import os
# define our data file
data_file = 'test2_of_2016-09-19.xml'
tree = ET.ElementTree(file=data_file)
root = tree.getroot()
for element in root:
    if element.find('File_directory') is not None:
        directory = element.find('File_directory').text
    if element.find('Introduction') is not None:
        introduction = element.find('Introduction').text
    if element.find('Directions') is not None:
        directions = element.find('Directions').text
for element in root:
    if element.find('File_directory') is not None:
        if element.find('Introduction') is not None:
            intro_tree = directory+introduction
            with open(intro_tree, 'r') as f:
                intro_text = f.read()
            f.closed
            intro_body = ET.SubElement(element,'Introduction_Body')
            intro_body.text = intro_text
        if element.find('Directions') is not None:
            directions_tree = directory+directions
            with open(directions_tree, 'r') as f:
                directions_text = f.read()
            f.closed
            directions_body = ET.SubElement(element,'Directions_Body')
            directions_body.text = directions_text
tree.write('new_' + data_file)
The problem is that it seems like the last found instance of file_directory, introduction, and directions is saved and spread out to multiple entries, which is not desired as each entry has its own individual record so to speak.
The source XML file looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Row>
<Entry_No>1</Entry_No>
<Waterfall_Name>Bridalveil Fall</Waterfall_Name>
<File_directory>./waterfall_writeups/1_Bridalveil_Fall/</File_directory>
<Introduction>introduction-bridalveil-fall.html</Introduction>
<Directions>directions-bridalveil-fall.html</Directions>
</Row>
<Row>
<Entry_No>52</Entry_No>
<Waterfall_Name>Switzer Falls</Waterfall_Name>
<File_directory>./waterfall_writeups/52_Switzer_Falls/</File_directory>
<Introduction>introduction-switzer-falls.html</Introduction>
<Directions>directions-switzer-falls.html</Directions>
</Row>
</Root>
The desired output XML should look like this:
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Row>
<Entry_No>1</Entry_No>
<Waterfall_Name>Bridalveil Fall</Waterfall_Name>
<File_directory>./waterfall_writeups/1_Bridalveil_Fall/</File_directory>
<Introduction>introduction-bridalveil-fall.html</Introduction>
<Directions>directions-bridalveil-fall.html</Directions>
<Introduction_Body>Text from ./waterfall_writeups/1_Bridalveil_Fall/introduction-bridalveil-fall.html</Introduction_Body>
<Directions_Body>Text from ./waterfall_writeups/1_Bridalveil_Fall/directions-bridalveil-fall.html</Directions_Body>
</Row>
<Row>
<Entry_No>52</Entry_No>
<Waterfall_Name>Switzer Falls</Waterfall_Name>
<File_directory>./waterfall_writeups/52_Switzer_Falls/</File_directory>
<Introduction>introduction-switzer-falls.html</Introduction>
<Directions>directions-switzer-falls.html</Directions>
<Introduction_Body>Text from ./waterfall_writeups/52_Switzer_Falls/introduction-switzer-falls.html</Introduction_Body>
<Directions_Body>Text from ./waterfall_writeups/52_Switzer_Falls/directions-switzer-falls.html</Directions_Body>
</Row>
</Root>
But what I end up getting is:
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Row>
<Entry_No>1</Entry_No>
<Waterfall_Name>Bridalveil Fall</Waterfall_Name>
<File_directory>./waterfall_writeups/1_Bridalveil_Fall/</File_directory>
<Introduction>introduction-bridalveil-fall.html</Introduction>
<Directions>directions-bridalveil-fall.html</Directions>
<Introduction_Body>Text from ./waterfall_writeups/52_Switzer_Falls/introduction-switzer-falls.html</Introduction_Body>
<Directions_Body>Text from ./waterfall_writeups/52_Switzer_Falls/directions-switzer-falls.html</Directions_Body>
</Row>
<Row>
<Entry_No>52</Entry_No>
<Waterfall_Name>Switzer Falls</Waterfall_Name>
<File_directory>./waterfall_writeups/52_Switzer_Falls/</File_directory>
<Introduction>introduction-switzer-falls.html</Introduction>
<Directions>directions-switzer-falls.html</Directions>
<Introduction_Body>Text from ./waterfall_writeups/52_Switzer_Falls/introduction-switzer-falls.html</Introduction_Body>
<Directions_Body>Text from ./waterfall_writeups/52_Switzer_Falls/directions-switzer-falls.html</Directions_Body>
</Row>
</Root>
As an aside, is there any way to introduce the body tags' content without it all being printed on one line (for readability)?
The first for loop iterates over the Row elements of your document, assigning new values to your directory, introduction, and directions variables respectively, with each iteration, ending up with the values from the last occurring Row element.
What I would do is create a dictionary to map tag names to text contents, and then use that mapping to add the new sub-elements on the fly. Example (without reading the referenced files):
for row in root:
    # map tag names to text contents for this row only
    elements = {}
    for node in row:
        elements[node.tag] = node.text
    directory = elements['File_directory']
    intro_tree = directory + elements['Introduction']
    intro_body = ET.SubElement(row, 'Introduction_Body')
    intro_body.text = 'Text from %s' % intro_tree
    directions_tree = directory + elements['Directions']
    directions_body = ET.SubElement(row, 'Directions_Body')
    directions_body.text = 'Text from %s' % directions_tree
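A version that actually reads the referenced files might look like the sketch below. The function name attach_bodies and its base_dir parameter are invented here, and skipping rows or files that are missing is an assumption about the desired behaviour. The demo builds two small files in a temporary directory so it can run anywhere.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def attach_bodies(root, base_dir="."):
    """For every row, read its own Introduction/Directions files into *_Body elements."""
    for row in root:
        # Per-row mapping, so values never leak from one row into another
        fields = {node.tag: node.text for node in row}
        directory = fields.get("File_directory")
        if directory is None:
            continue
        for tag in ("Introduction", "Directions"):
            filename = fields.get(tag)
            if filename is None:
                continue
            path = os.path.join(base_dir, directory, filename)
            if not os.path.isfile(path):
                continue  # assumption: silently skip files that do not exist
            with open(path, "r", encoding="utf-8") as handle:
                body = ET.SubElement(row, tag + "_Body")
                body.text = handle.read()
    return root


# Demo with two throwaway files, one per row
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a", "b"):
        os.makedirs(os.path.join(tmp, name))
        with open(os.path.join(tmp, name, "intro.html"), "w", encoding="utf-8") as handle:
            handle.write("intro for " + name)
    root = ET.fromstring(
        "<Root>"
        "<Row><File_directory>a/</File_directory><Introduction>intro.html</Introduction></Row>"
        "<Row><File_directory>b/</File_directory><Introduction>intro.html</Introduction></Row>"
        "</Root>"
    )
    attach_bodies(root, base_dir=tmp)
    bodies = [row.findtext("Introduction_Body") for row in root]

print(bodies)
```

Each row ends up with the text of its own file, because the lookup dictionary is rebuilt inside the loop instead of being filled once before it.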

lxml, get xml between elements

Given this sample XML:
<xml>
<pb facs="id1" />
<aa></aa>
<aa></aa>
<lot-of-xml></lot-of-xml>
<pb facs="id2" />
<bb></bb>
<bb></bb>
<lot-of-xml></lot-of-xml>
</xml>
I need to parse it and get all the content between the pb elements, saving each part into a distinct external file.
Expected result:
$ cat id1
<aa></aa>
<aa></aa>
<lot-of-xml></lot-of-xml>
$ cat id2
<bb></bb>
<bb></bb>
<lot-of-xml></lot-of-xml>
What is the correct XPath axis to use?
from lxml import etree
xml = etree.parse("sample.xml")
for pb in xml.xpath('//pb'):
    filename = pb.xpath('@facs')[0]
    f = open(filename, 'w')
    content = {{ HOW TO GET THE CONTENT HERE? }}
    f.write(content)
    f.close()
Is there any XPath expression to get all descendants and stop when a new pb is reached?
Do you want to extract the tags between two pb elements? Strictly speaking, they are not inside pb: because each pb is self-closing, the tags that follow it are siblings on the same level as pb, not its children. If you closed pb after the test tag instead, test would become a child of pb.
In other words if your xml is like this:
<xml>
<pb facs="id1">
<test></test>
</pb>
<test></test>
<pb facs="id2" />
<test></test>
<test></test>
</xml>
Then you can use
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for child in root:
    for subchild in child:
        print(subchild)
to print the subchild('test') with pb as a parent.
If that's not the case (you just want to extract the attributes of the pb tag), then you can use either of the two methods shown below.
With Python's built-in etree:
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
for child in root:
    if child.get('facs'):
        print(child.get('facs'))
With the lxml library you can parse it like this:
from lxml import etree
tree = etree.parse('test.xml')
root = tree.getroot()
for child in root:
    if child.get('facs'):
        print(child.get('facs'))
OK, I tested this code:
lists = []
for node in tree.findall('*'):
    if node.tag == 'pb':
        # every pb starts a new group
        lists.append([])
    else:
        # everything else belongs to the most recent pb
        lists[-1].append(node)
Output:
>>> lists
[[<Element test at 2967fa8>, <Element test at 2a89030>, <Element lot-of-xml at 2a89080>], [<Element test at 2a89170>, <Element test at 2a891c0>, <Element lot-of-xml at 2a89210>]]
Input file (just in case):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xml>
<pb facs="id1" />
<test></test>
<test></test>
<lot-of-xml></lot-of-xml>
<pb facs="id2" />
<test></test>
<test></test>
<lot-of-xml></lot-of-xml>
</xml>
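To get from those grouped nodes to the file contents the question asks for, each group can be serialised and keyed by the facs value of its pb. The sketch below uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages; split_by_pb is a made-up name, and the XML is parsed from a string rather than from sample.xml.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<xml>
<pb facs="id1" />
<aa></aa>
<aa></aa>
<lot-of-xml></lot-of-xml>
<pb facs="id2" />
<bb></bb>
<bb></bb>
<lot-of-xml></lot-of-xml>
</xml>"""


def split_by_pb(root):
    """Map each pb's facs value to the serialised siblings that follow it."""
    chunks = {}
    current = None
    for node in root:
        if node.tag == "pb":
            # A new pb starts a new chunk
            current = chunks.setdefault(node.get("facs"), [])
        elif current is not None:
            # strip() drops the tail whitespace that tostring() includes
            current.append(ET.tostring(node, encoding="unicode").strip())
    return {name: "\n".join(parts) for name, parts in chunks.items()}


chunks = split_by_pb(ET.fromstring(SAMPLE))
for name, content in chunks.items():
    print(name, "->", content.replace("\n", " "))
```

Writing the files is then one more loop: open each name in chunks for writing and write its content. Note that ElementTree serialises empty elements in the short form, so the output will not be byte-identical to the input.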
