Python ElementTree: replacing elements in a loop - python

I'm trying to write a script that loops, creating an XML file on each pass with incrementing values for two items: an IP address (using netaddr) and the tag/member element, which increments from tag01 to tag10.
from netaddr import IPNetwork
import xml.dom.minidom
import lxml.etree as etree
import xml.etree.cElementTree as ET
ip = IPNetwork('10.10.10.0/24')
count = 1
tag = range(1,10)
uid = ET.Element("message")
type = ET.SubElement(uid, "type").text = "update"
payload = ET.SubElement(uid, "payload")
register = ET.SubElement(payload, "register")
entry = ET.SubElement(register, "entry", ip="11.11.11.11")
tag = ET.SubElement(entry, "tag")
ET.SubElement(tag, "member").text = "tag1"
tree = ET.ElementTree(uid)
while count <= 10:
    elemtag = tree.findall(".//member")
    for elemt in elemtag:
        elemt.text = 'tag{}'.format(tag)
    elemip = tree.findall(".//entry")
    for elemi in elemip:
        elemi.text = 'ip="{}"'.format(ip)
    count += 1
    ET.dump(uid)
    print(count)
#tree.write("tmp.xml")
#x = etree.parse("tmp.xml")
#print etree.tostring(x, pretty_print=True)
#etree.parse("tmp.xml").write("pretty.xml", encoding="utf-8", pretty_print=True)
#os.system('tool.py -f pretty.xml')
I figured out how to create the XML I needed using ElementTree, and if I comment out my loop and write the resulting XML, it's correct, looks good, and works with the tool that consumes it (yay!).
<message>
<type>update</type>
<payload>
<register>
<entry ip="11.11.11.11">
<tag>
<member>tag1</member>
</tag>
</entry>
</register>
</payload>
</message>
However, when I add my loop to replace the values for the two elements, I just can't seem to get it right, and I'm clobbering the tags/elements in the tree.
<message>
<type>update</type>
<payload>
<register>
<entry ip="11.11.11.11">ip="10.10.10.0/24"<tag><member>tag<Element 'tag' at
0x7f7b29d66c90></member></tag></entry>
</register>
</payload>
</message>
I keep trying different things to replace the elements, but they just end up as different permutations of wrong, and I just can't seem to get it right. Hoping someone can help me figure out what I'm missing. Thanks in advance!

<entry> is an element, and "ip" is an attribute of that element. Setting elemi.text adds text content inside <entry> instead of changing the attribute.
An element's attribute can be changed like this:
tree.find('.//entry').attrib['ip'] = "22.22.22.22"
"tag1" is the .text of the <member> element, but the tag variable was reassigned on this line:
tag = ET.SubElement(entry, "tag")
As a result, the tag Element object itself, rather than a tag number, was formatted into the member's text.
I updated your code so that it prints a new <message> on each "count" iteration. I hope it helps.
from netaddr import IPNetwork
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
ip = IPNetwork('10.10.10.0/24')
tag_lst = list(range(1, 11))
count = 1
uid = ET.Element("message")
type = ET.SubElement(uid, "type").text = "update"
payload = ET.SubElement(uid, "payload")
register = ET.SubElement(payload, "register")
entry = ET.SubElement(register, "entry", ip="11.11.11.11")
tag = ET.SubElement(entry, "tag")
ET.SubElement(tag, "member").text = "tag1"
tree = ET.ElementTree(uid)
while count <= 10:
    tree.find('.//member').text = "tag"+str(count)
    tree.find('.//entry').attrib['ip'] = format(ip[count])
    count += 1
    ET.dump(uid)
    print(count)
XML output is:
<message><type>update</type><payload><register><entry ip="10.10.10.1"><tag><member>tag1</member></tag></entry></register></payload></message>
<message><type>update</type><payload><register><entry ip="10.10.10.2"><tag><member>tag2</member></tag></entry></register></payload></message>
<message><type>update</type><payload><register><entry ip="10.10.10.3"><tag><member>tag3</member></tag></entry></register></payload></message>
<message><type>update</type><payload><register><entry ip="10.10.10.4"><tag><member>tag4</member></tag></entry></register></payload></message>
<message><type>update</type><payload><register><entry ip="10.10.10.5"><tag><member>tag5</member></tag></entry></register></payload></message>
<message><type>update</type><payload><register><entry ip="10.10.10.6"><tag><member>tag6</member></tag></entry></register></payload></message>
<message><type>update</type><payload><register><entry ip="10.10.10.7"><tag><member>tag7</member></tag></entry></register></payload></message>
<message><type>update</type><payload><register><entry ip="10.10.10.8"><tag><member>tag8</member></tag></entry></register></payload></message>
<message><type>update</type><payload><register><entry ip="10.10.10.9"><tag><member>tag9</member></tag></entry></register></payload></message>
<message><type>update</type><payload><register><entry ip="10.10.10.10"><tag><member>tag10</member></tag></entry></register></payload></message>
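An alternative, if you would rather not mutate one shared tree, is to build a fresh message on each iteration. This is only a sketch: build_message is a hypothetical helper, and it uses the standard-library ipaddress module in place of netaddr (an assumption, made so the snippet has no third-party dependency).

```python
import ipaddress
import xml.etree.ElementTree as ET


def build_message(ip, member):
    """Build one <message> tree; ip goes into an attribute, member into text."""
    message = ET.Element("message")
    ET.SubElement(message, "type").text = "update"
    payload = ET.SubElement(message, "payload")
    register = ET.SubElement(payload, "register")
    # ip is an attribute of <entry>, so it is passed as a keyword argument
    entry = ET.SubElement(register, "entry", ip=str(ip))
    tag = ET.SubElement(entry, "tag")
    # the member name is text content, so it is assigned to .text
    ET.SubElement(tag, "member").text = member
    return message


# ipaddress stands in for netaddr here (assumption); indexing works the same way
network = ipaddress.ip_network("10.10.10.0/24")
messages = [build_message(network[i], "tag{}".format(i)) for i in range(1, 11)]
for message in messages:
    print(ET.tostring(message, encoding="unicode"))
```

Because each message is its own tree, nothing from a previous iteration can leak into the next one, and each tree can be written to its own file if the consuming tool needs that.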

Related

How convert xml to csv file using python (in row)?

I want to convert this XML document to CSV. I tried, but it does not work, and I don't know what I'm doing wrong. I'm new to this.
Here is the XML that I want to convert:
<?xml version="1.0" encoding="UTF-8"?>
<Shot
Shotcode = "30AA"
ShotDate = "4/2/2000">
<Images>
<Image
ImageNumber="103"
RawFileName="18_Shot_30AA.jpg" />
<Image
ImageNumber="104"
RawFileName="17_Shot_30AA.jpg" />
<Image
ImageNumber="105"
RawFileName="14_Shot_30AA" />
</Images>
<Metrics>
<Metric
Name = "30AA"
TypeId = "163"
Value = "0" />
<Metric
Name = "Area"
TypeId = "10"
Value = "63" />
</Metrics>
</Shot>
This is what I have so far. It is not the complete program, but it shows what I'm doing.
import xml.etree.ElementTree as ET
import csv
tree = ET.parse("30AA.xml")
root = tree.getroot()
30AA = open('30AA.csv', 'w+')
csvwriter = csv.writer(30AA)
head = []
count = 0 #loops
for member in root.findall('Shot'):
Shot = []
if count == 0:
ShotCode = member.find('ShotCode').tag
head.append(ShotCode)
ShotDate = member.find('ShotDate').tag
head.append(ShotDate)
csvwriter.writerow(head)
count = count + 1
ShotCode = member.find('ShotCode').txt
Shot.append(ShotCode)
ShotDate = member.find('ShotDate').txt
Shot.append(ShotDate)
30AA.close()
The result I expect is:
Shotcode 30AA
ShotDate 4/2/2000
Imagen 103
Imagen 104
Imagen 105
Name TypeId Value
30AA 163 0
area 10 63
I think I see what's going wrong. It looks like a CSV problem, but the main issue is in how the XML is read.
The root of your XML is the Shot tag itself, so root.findall('Shot') finds nothing: findall searches the children of root, and root has no Shot elements inside it.
That is why you're not getting anything in your output.
Also, to read an attribute of a tag, use .attrib['name_of_attribute']. For example, instead of member.find('ShotCode').tag, use root.attrib['Shotcode'] (the attribute is spelled Shotcode in your XML).
That changes the rest of the script quite a bit; you then need something like this:
import csv
import xml.etree.ElementTree as ET

tree = ET.parse("30AA.xml")
root = tree.getroot()
_30AA = open('30AA.csv', 'w+')
csvwriter = csv.writer(_30AA)
ShotCode = root.attrib['Shotcode']
csvwriter.writerow(['ShotCode', ShotCode])
ShotDate = root.attrib['ShotDate']
csvwriter.writerow(['ShotDate', ShotDate])
# member is going to be the <Images> and <Metrics>
for member in root:
    # list(member) replaces getchildren(), which was removed in Python 3.9
    submembers = list(member)
    # Write the names of the attributes as headings
    keys = submembers[0].attrib.keys()
    csvwriter.writerow(keys)
    for submember in submembers:
        row_data = [submember.attrib[k] for k in keys]
        csvwriter.writerow(row_data)
_30AA.close()
This should give you the output you want.
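To see the rows without touching the file system, the same idea can be sketched against an in-memory string. The shot_to_rows helper and the trimmed XML sample are made up for illustration; only the attribute names come from the question.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document (assumption: same attribute names)
XML = """<Shot Shotcode="30AA" ShotDate="4/2/2000">
  <Images>
    <Image ImageNumber="103" RawFileName="18_Shot_30AA.jpg" />
    <Image ImageNumber="104" RawFileName="17_Shot_30AA.jpg" />
  </Images>
  <Metrics>
    <Metric Name="30AA" TypeId="163" Value="0" />
    <Metric Name="Area" TypeId="10" Value="63" />
  </Metrics>
</Shot>"""


def shot_to_rows(root):
    """Flatten the root attributes and each child group into CSV rows."""
    rows = [[name, value] for name, value in root.attrib.items()]
    for group in root:  # <Images>, then <Metrics>
        children = list(group)
        if not children:
            continue  # skip empty groups instead of failing on children[0]
        keys = list(children[0].attrib)
        rows.append(keys)  # heading row for this group
        for child in children:
            rows.append([child.get(key, "") for key in keys])
    return rows


rows = shot_to_rows(ET.fromstring(XML))
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
print(buffer.getvalue())
```

Separating the row building from the writing makes it easy to check the rows before committing them to a file.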

element attributes missing when parsing XML with iterparse/lxml/python 2

Here's my use case:
I have a potentially large XML file, and I want to output the frequency of all the unique structural variations of a given element type. Element attributes should be included as part of the uniqueness test. The output should sort the variations by frequency.
Here's a trivial input example, with 4 entries for automobile:
<automobile>
<mileage>20192</mileage>
<year>2005</year>
<user_defined name="color">red</user_defined>
</automobile>
<automobile>
<mileage>1098</mileage>
<year>2018</year>
<user_defined name="color">blue</user_defined>
</automobile>
<automobile>
<mileage>17964</mileage>
<year>2012</year>
<user_defined name="title_status">salvage</user_defined>
</automobile>
<automobile>
<mileage>198026</mileage>
<year>1990</year>
</automobile>
The output I expect would look like this:
<automobile automobile_frequency="2">
<mileage />
<year />
<user_defined name="color" />
</automobile>
<automobile automobile_frequency="1">
<mileage />
<year />
<user_defined name="title_status" />
</automobile>
<automobile automobile_frequency="1">
<mileage />
<year />
</automobile>
I've implemented the code using iterparse, but when it's processing the elements, the attributes do not exist in the element. The code logic appears to be correct, but attributes simply don't exist; they are not written in the output, and they are not present for the uniqueness test. Per the above input example, this is what I get on output:
<root>
<automobile automobile_frequency="3">
<mileage/>
<year/>
<user_defined/>
</automobile>
<automobile automobile_frequency="1">
<mileage/>
<year/>
</automobile>
</root>
The usage is:
xplore.py input.xml node_to_explore
In the above example, I used:
xplore.py trivial.xml automobile
Here's the source:
from lxml import etree
import sys
import re
from datetime import datetime
# global node signature map
structure_map = {}
# global code frequency map
frequency_map = {}
# output tree
tmp_root = etree.Element("tmp_root")
def process_element(el):
    global target
    if el.tag != target:
        return
    # get the structure of the element
    structure = get_structure(el)
    global structure_map
    structure_key = etree.tostring(structure, pretty_print=True)
    if structure_key not in structure_map.keys():
        # add signature to structure map
        structure_map[structure_key] = structure
        # add node to output
        global tmp_root
        tmp_root.append(structure)
        # add signature to frequency map
        frequency_map[structure_key] = 1
    else:
        # increment frequency map
        frequency_map[structure_key] += 1
# returns a unique string representing the structure of the node
# including attributes
def get_structure(el):
    # create new element for the return value
    ret = etree.Element(el.tag)
    # get attributes
    attribute_keys = el.attrib.keys()
    for attribute_key in attribute_keys:
        ret.set(attribute_key, el.get(attribute_key))
    # check for children
    children = list(el)
    for child in children:
        ret.append(get_structure(child))
    return ret
if len(sys.argv) < 3:
    print "Must specify an XML file for processing, as well as an element type!"
    exit(0)
# Get XML file
xml = sys.argv[1]
# Get output file name
output_file = xml[0:xml.rindex(".")]+".txt"
# get target element type to evaluate
target = sys.argv[2]
# mark start
startTime = datetime.now()
# Parse XML
print '==========================='
print 'Parsing XML'
print '==========================='
context = etree.iterparse(xml, events=('end',))
for event, element in context:
    process_element(element)
    element.clear()
# create tree sorted by frequency
ranked = sorted(frequency_map.items(), key=lambda x: x[1], reverse=True)
root = etree.Element("root")
for item in ranked:
    structure = structure_map[item[0]]
    structure.set(target+"_frequency", str(item[1]))
    root.append(structure)
# pretty print root
out = open(output_file, 'w')
out.write(etree.tostring(root, pretty_print=True))
# output run time
time = datetime.now() - startTime
reg3 = re.compile("\\d+:\\d(\\d:\\d+\\.\\d{4})")
time = re.search(reg3, unicode(time))
time = "Runtime: %ss" % (time.group(1).encode("utf-8"))
print(time)
In the debugger, I can clearly see that the attributes are missing from elements in the calls to get_structure. Can anyone tell me why this is the case?
The data:
<root>
<automobile>
<mileage>20192</mileage>
<year>2005</year>
<user_defined name="color">red</user_defined>
</automobile>
<automobile>
<mileage>1098</mileage>
<year>2018</year>
<user_defined name="color">blue</user_defined>
</automobile>
<automobile>
<mileage>17964</mileage>
<year>2012</year>
<user_defined name="title_status">salvage</user_defined>
</automobile>
<automobile>
<mileage>198026</mileage>
<year>1990</year>
</automobile>
</root>
The code:
from lxml import etree
import sys
import re
from datetime import datetime
# global node signature map
structure_map = {}
# global code frequency map
frequency_map = {}
# output tree
tmp_root = etree.Element("tmp_root")
def process_element(el):
    # get the structure of the element
    structure = get_structure(el)
    global structure_map
    structure_key = etree.tostring(structure, pretty_print=True)
    if structure_key not in structure_map.keys():
        # add signature to structure map
        structure_map[structure_key] = structure
        # add node to output
        global tmp_root
        tmp_root.append(structure)
        # add signature to frequency map
        frequency_map[structure_key] = 1
    else:
        # increment frequency map
        frequency_map[structure_key] += 1
# returns a unique string representing the structure of the node
# including attributes
def get_structure(el):
    # create new element for the return value
    ret = etree.Element(el.tag)
    # get attributes
    attribute_keys = el.attrib.keys()
    for attribute_key in attribute_keys:
        ret.set(attribute_key, el.get(attribute_key))
    # check for children
    children = list(el)
    for child in children:
        ret.append(get_structure(child))
    return ret
if len(sys.argv) < 3:
    print "Must specify an XML file for processing, as well as an element type!"
    exit(0)
# Get XML file
xml = sys.argv[1]
# Get output file name
output_file = xml[0:xml.rindex(".")]+".txt"
# get target element type to evaluate
target = sys.argv[2]
# mark start
startTime = datetime.now()
# Parse XML
print '==========================='
print 'Parsing XML'
print '==========================='
context = etree.iterparse(xml, events=('end',))
element_to_clear = []
for event, element in context:
    element_to_clear.append(element)
    if element.tag == target:
        process_element(element)
        # clear only after the whole target element has been processed
        for ele in element_to_clear:
            ele.clear()
        element_to_clear = []
# create tree sorted by frequency
ranked = sorted(frequency_map.items(), key=lambda x: x[1], reverse=True)
root = etree.Element("root")
for item in ranked:
    structure = structure_map[item[0]]
    structure.set(target+"_frequency", str(item[1]))
    root.append(structure)
# pretty print root
out = open(output_file, 'w')
out.write(etree.tostring(root, pretty_print=True))
# output run time
time = datetime.now() - startTime
reg3 = re.compile("\\d+:\\d(\\d:\\d+\\.\\d{4})")
time = re.search(reg3, unicode(time))
time = "Runtime: %ss" % (time.group(1).encode("utf-8"))
print(time)
The command: xplore.py trivial.xml automobile
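The answer above does not say why the change works, so here is a small sketch of what appears to be going on. With events=('end',), a child's end event fires before its parent's, and clear() resets an element's attributes as well as its children and text. Clearing every element as soon as it ends therefore empties user_defined before automobile is inspected. The sketch uses the standard-library xml.etree.ElementTree rather than lxml (an assumption made to avoid the dependency; clear() drops attributes in both), and child_attrs and the trimmed data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed version of the question's data (assumption: two entries are enough)
DATA = b"""<root>
<automobile><year>2005</year><user_defined name="color">red</user_defined></automobile>
<automobile><year>1990</year></automobile>
</root>"""


def child_attrs(clear_every_element):
    """Return, for each <automobile>, the attribute dicts of its children."""
    seen = []
    for event, el in ET.iterparse(io.BytesIO(DATA), events=("end",)):
        if el.tag == "automobile":
            seen.append([dict(child.attrib) for child in el])
            el.clear()  # safe: this subtree has been fully processed
        elif clear_every_element:
            el.clear()  # wipes the child's attributes before the parent is seen
    return seen


eager = child_attrs(True)      # mirrors the question: clear on every end event
deferred = child_attrs(False)  # mirrors the answer: clear only after the target
print(eager)
print(deferred)
```

In the eager run the name attribute is already gone by the time automobile is reached, which matches the merged frequency of 3 in the question's output; in the deferred run it is still there.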

How can I parse a XML file to a dictionary in Python?

I am trying to parse an XML file using the Python library minidom (I have also tried the xml.etree.ElementTree API).
My XML (resource.xml)
<?xml version='1.0'?>
<quota_result xmlns="https://some_url">
</quota_rule>
<quota_rule name='max_mem_per_user/5'>
<users>user1</users>
<limit resource='mem' limit='1550' value='921'/>
</quota_rule>
<quota_rule name='max_mem_per_user/6'>
<users>user2</users>
<limit resource='mem' limit='2150' value='3'/>
</quota_rule>
</quota_result>
I would like to parse this file, store the information in a dictionary of the following form, and be able to access it:
dict={user1=[resource,limit,value],user2=[resource,limit,value]}
So far I have only been able to do things like:
docXML = minidom.parse("resource.xml")
for node in docXML.getElementsByTagName('limit'):
print node.getAttribute('value')
You can use getElementsByTagName and getAttribute to trace the result:
from xml.dom.minidom import parse

dict_users = dict()
docXML = parse('mydata.xml')
users = docXML.getElementsByTagName("quota_rule")
for node in users:
    user = 'None'
    # check the length of tag_user to see whether the <users> tag exists
    tag_user = node.getElementsByTagName("users")
    if len(tag_user) == 0:
        print("tag <users> is not exist")
    else:
        user = tag_user[0]
    resource = node.getElementsByTagName("limit")[0].getAttribute("resource")
    limit = node.getElementsByTagName("limit")[0].getAttribute("limit")
    value = node.getElementsByTagName("limit")[0].getAttribute("value")
    # store under 'None' when the rule has no <users> tag
    if user == 'None':
        dict_users['None'] = [resource, limit, value]
    else:
        dict_users[user.firstChild.data] = [resource, limit, value]
print(dict_users)  # run after removing <users>user1</users> from the XML
Output:
tag <users> is not exist
{'None': [u'mem', u'1550', u'921'], u'user2': [u'mem', u'2150', u'3']}
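Since the question also mentions xml.etree.ElementTree, here is a rough equivalent with that API. The main thing to watch is the default namespace declared on quota_result: ElementTree needs it in every search path. The quota_by_user helper, the q prefix, and the cleaned-up sample document (with the stray closing tag dropped) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the question's document (assumption)
XML = """<?xml version='1.0'?>
<quota_result xmlns="https://some_url">
  <quota_rule name='max_mem_per_user/5'>
    <users>user1</users>
    <limit resource='mem' limit='1550' value='921'/>
  </quota_rule>
  <quota_rule name='max_mem_per_user/6'>
    <users>user2</users>
    <limit resource='mem' limit='2150' value='3'/>
  </quota_rule>
</quota_result>"""

# The prefix "q" is arbitrary; only the URI has to match the document
NS = {"q": "https://some_url"}


def quota_by_user(xml_text):
    """Map each user to [resource, limit, value] taken from its <limit>."""
    root = ET.fromstring(xml_text)
    result = {}
    for rule in root.findall("q:quota_rule", NS):
        # fall back to the key 'None' when a rule has no <users> child
        user = rule.findtext("q:users", default="None", namespaces=NS)
        limit = rule.find("q:limit", NS)
        result[user] = [limit.get("resource"), limit.get("limit"), limit.get("value")]
    return result


quotas = quota_by_user(XML)
print(quotas["user1"])
```

Attribute names are not namespaced here, so limit.get("resource") needs no prefix even though the element names do.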

Generating XML dynamically in Python

I am generating XML data using Python libraries as follows:
def multiwan_info_save(request):
data = {}
init = "init"
try:
form = Addmultiwanform(request.POST)
except:
pass
if form.is_valid():
from_sv = form.save(commit=False)
obj_get = False
try:
obj_get = MultiWAN.objects.get(isp_name=from_sv.isp_name)
except:
obj_get = False
nameservr = request.POST.getlist('nameserver_mw')
for nm in nameservr:
nameserver1, is_new = NameServer.objects.get_or_create(name=nm)
from_sv.nameserver = nameserver1
from_sv.save()
# main(init)
top = Element('ispinfo')
# comment = Comment('Generated for PyMOTW')
#top.append(comment)
all_connection = MultiWAN.objects.all()
for conn in all_connection:
child = SubElement(top, 'connection number ='+str(conn.id)+'name='+conn.isp_name+'desc='+conn.description )
subchild_ip = SubElement(child,'ip_address')
subchild_subnt = SubElement(child,'subnet')
subchild_gtwy = SubElement(child,'gateway')
subchild_nm1 = SubElement(child,'probe_server1')
subchild_nm2 = SubElement(child,'probe_server2')
subchild_interface = SubElement(child,'interface')
subchild_weight = SubElement(child,'weight')
subchild_ip.text = str(conn.ip_address)
subchild_subnt.text = str(conn.subnet)
subchild_gtwy.text = str(conn.gateway)
subchild_nm1.text = str(conn.nameserver.name)
# subchild_nm2.text = conn.
subchild_weight.text = str(conn.weight)
subchild_interface.text = str(conn.interface)
print "trying to print _____________________________"
print tostring(top)
print "let seeeeeeeeeeeeeeeeee +++++++++++++++++++++++++"
But I am getting output like the following:
<ispinfo><connection number =5name=Airtelllldesc=Largets TRelecome ><ip_address>192.168.1.23</ip_address><subnet>192.168.1.23</subnet><gateway>192.168.1.23</gateway><probe_server1>192.168.99.1</probe_server1><probe_server2 /><interface>eth0</interface><weight>160</weight></connection number =5name=Airtelllldesc=Largets TRelecome ><connection number =6name=Uninordesc=Uninor><ip_address>192.166.55.23</ip_address><subnet>192.166.55.23</subnet><gateway>192.168.1.23</gateway><probe_server1>192.168.99.1</probe_server1><probe_server2 /><interface>eth0</interface><weight>160</weight></connection number =6name=Uninordesc=Uninor><connection number =7name=Airteldesc=Largets TRelecome ><ip_address>192.168.1.23</ip_address><subnet>192.168.1.23</subnet><gateway>192.168.1.23</gateway><probe_server1>192.168.99.1</probe_server1><probe_server2 /><interface>eth0</interface><weight>160</weight></connection number =7name=Airteldesc=Largets TRelecome ></ispinfo>
How can I write this in proper XML format?
Thanks in advance
UPDATED to include simulation of both creating and printing of the XML tree
The Basic Issue
Your code is generating invalid connection tags like this:
<connection number =5name=Airtelllldesc=Largets TRelecome ></connection number =5name=Airtelllldesc=Largets TRelecome >
when they should look like this (I am omitting the sub-elements in between. Your code is generating these correctly):
<connection number="5" name="Airtellll" desc="Largets TRelecome" ></connection>
If you had valid XML, this code would print it neatly:
from lxml import etree
xml = '''<ispinfo><connection number="5" name="Airtellll" desc="Largets TRelecome" ><ip_address>192.168.1.23</ip_address><subnet>192.168.1.23</subnet><gateway>192.168.1.23</gateway><probe_server1>192.168.99.1</probe_server1><probe_server2 /><interface>eth0</interface><weight>160</weight></connection></ispinfo>'''
xml = etree.XML(xml)
print(etree.tostring(xml, pretty_print=True, encoding="unicode"))
Generating Valid XML
A small simulation follows:
from lxml import etree
# Some dummy text
conn_id = 5
conn_name = "Airtelll"
conn_desc = "Largets TRelecome"
ip = "192.168.1.23"
# Building the XML tree
# Note how attributes and text are added, using the Element methods
# and not by concatenating strings as in your question
root = etree.Element("ispinfo")
child = etree.SubElement(root, 'connection',
                         number=str(conn_id),
                         name=conn_name,
                         desc=conn_desc)
subchild_ip = etree.SubElement(child, 'ip_address')
subchild_ip.text = ip
# and pretty-printing it
print(etree.tostring(root, pretty_print=True, encoding="unicode"))
This will produce:
<ispinfo>
<connection desc="Largets TRelecome" number="5" name="Airtelll">
<ip_address>192.168.1.23</ip_address>
</connection>
</ispinfo>
A single line is still proper XML, in the sense that an XML parser will understand it.
For pretty-printing to sys.stdout, use etree.dump(root), which pretty-prints by default in lxml.
For pretty-printing to a stream, use the write method of ElementTree with pretty_print=True.
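To tie this back to the loop in the question, here is a sketch of building one connection element per record. It uses only the standard library (xml.etree.ElementTree plus minidom for indentation) instead of lxml, and the connections list is a hypothetical stand-in for the rows returned by MultiWAN.objects.all(); build_ispinfo is likewise a made-up name.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical stand-in for the rows returned by MultiWAN.objects.all()
connections = [
    {"id": 5, "isp_name": "Airtellll", "description": "Largets TRelecome",
     "ip_address": "192.168.1.23", "weight": 160},
    {"id": 6, "isp_name": "Uninor", "description": "Uninor",
     "ip_address": "192.166.55.23", "weight": 160},
]


def build_ispinfo(rows):
    """Build <ispinfo> with one <connection> element per row."""
    top = ET.Element("ispinfo")
    for row in rows:
        # The tag name is just 'connection'; everything else is an attribute
        child = ET.SubElement(top, "connection", number=str(row["id"]),
                              name=row["isp_name"], desc=row["description"])
        ET.SubElement(child, "ip_address").text = str(row["ip_address"])
        ET.SubElement(child, "weight").text = str(row["weight"])
    return top


top = build_ispinfo(connections)
# minidom re-parses the serialized tree only to add indentation
pretty = minidom.parseString(ET.tostring(top)).toprettyxml(indent="  ")
print(pretty)
```

Passing the attributes as keyword arguments lets the library handle quoting and escaping, which is exactly what the string concatenation in the question bypassed.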

Generating Xml using python

Kindly have a look at the code below. I am using it to generate XML with Python.
from lxml import etree
# Some dummy text
conn_id = 5
conn_name = "Airtelll"
conn_desc = "Largets TRelecome"
ip = "192.168.1.23"
# Building the XML tree
# Note how attributes and text are added, using the Element methods
# and not by concatenating strings as in your question
root = etree.Element("ispinfo")
child = etree.SubElement(root, 'connection',
                         number=str(conn_id),
                         name=conn_name,
                         desc=conn_desc)
subchild_ip = etree.SubElement(child, 'ip_address')
subchild_ip.text = ip
# and pretty-printing it
print(etree.tostring(root, pretty_print=True, encoding="unicode"))
This will produce:
<ispinfo>
<connection desc="Largets TRelecome" number="5" name="Airtelll">
<ip_address>192.168.1.23</ip_address>
</connection>
</ispinfo>
But I want it to look like this:
<ispinfo>
<connection desc="Largets TRelecome" number='1' name="Airtelll">
<ip_address>192.168.1.23</ip_address>
</connection>
</ispinfo>
That is, the number attribute should be in single quotes. Any idea how I can achieve this?
There is no flag in lxml to do this, so you have to resort to manual manipulation.
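import re
# encoding="unicode" makes tostring return text, so the str pattern also works on Python 3
re.sub(r'number="([0-9]+)"', r"number='\1'", etree.tostring(root, pretty_print=True, encoding="unicode"))
However, why do you want to do this? There is no difference other than cosmetics.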
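A quick way to convince yourself that the quote style is purely cosmetic is to parse the rewritten string again. This sketch uses the standard-library xml.etree.ElementTree in place of lxml (an assumption, so it runs without extra packages); the regular expression is the one from the answer.

```python
import re
import xml.etree.ElementTree as ET

root = ET.Element("ispinfo")
child = ET.SubElement(root, "connection", number="5",
                      name="Airtelll", desc="Largets TRelecome")
ET.SubElement(child, "ip_address").text = "192.168.1.23"

# Serialize to text, then swap the quote style of the number attribute only
double_quoted = ET.tostring(root, encoding="unicode")
single_quoted = re.sub(r'number="([0-9]+)"', r"number='\1'", double_quoted)
print(single_quoted)

# A parser reads both forms identically
reparsed = ET.fromstring(single_quoted)
print(reparsed.find("connection").get("number"))
```

Note that the substitution works on the serialized text, so it has to be the last step: serializing the tree again brings the double quotes back.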
