Writing XML (ET.Element) to a file - python

I have created an XML file using xml.etree.ElementTree.
The created XML basically looks like this:
<testsuite name="Exploatering Tests">
<testsuite name="device"></testsuite>
<testsuite name="management"></testsuite>
<testsuite name="auto"></testsuite>
</testsuite>
It all looks good, but I want to export only the child elements into a file and for that I am using the following code:
# First Testsuite Level
testsuite_exploatering = ET.Element('testsuite')
testsuite_exploatering.set("name", "Exploatering Tests")
# ... here I recursively add lots of elements to testsuite_exploatering ...
# Write XML to file
output_xml = ET.ElementTree(testsuite_exploatering)
output_xml.write(output_file, xml_declaration=True, encoding="UTF-8")
It writes the XML element into the file correctly, but how should I modify it to write only the inner elements into the file? I don't want the Exploatering Tests element written to the file. I want the file to look like this:
<testsuite name="device"></testsuite>
<testsuite name="management"></testsuite>
<testsuite name="auto"></testsuite>

I'm not sure that's a legal XML structure, since a well-formed XML document needs a single root element. If you really want to force the output into that format, you can iterate over the children and write each of them like so:
# binary mode, because write() emits encoded bytes by default
with open('output.xml', 'wb') as out_handle:
    # iterating over an element yields its direct children
    for child in testsuite_exploatering:
        ET.ElementTree(child).write(out_handle)
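As a minimal sketch of the same idea using only the standard library (Python 3), each child can be serialized with ET.tostring and the pieces joined into one string. The helper name serialize_children is made up for illustration, and the small tree below just mirrors the sample from the question. Keep in mind the result is an XML fragment, not a well-formed document.

```python
import xml.etree.ElementTree as ET

# Rebuild the sample structure from the question
parent = ET.Element("testsuite", name="Exploatering Tests")
for suite_name in ("device", "management", "auto"):
    ET.SubElement(parent, "testsuite", name=suite_name)


def serialize_children(element):
    """Serialize only the direct children of element, one per line (hypothetical helper)."""
    return "\n".join(
        # short_empty_elements=False keeps the <a></a> form instead of <a />
        ET.tostring(child, encoding="unicode", short_empty_elements=False)
        for child in element
    )


fragment = serialize_children(parent)
print(fragment)
```

The resulting string can then be written with an ordinary text-mode file handle.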

Related

Write KML file from another

I'm trying to:
- read a KML file
- remove the Placemark element if name = 'ZONE'
- write a new KML file without the element
This is my code:
from pykml import parser
kml_file_path = '../Source/Lombardia.kml'
removeList = list()
with open(kml_file_path) as f:
    folder = parser.parse(f).getroot().Document.Folder
for pm in folder.Placemark:
    if pm.name == 'ZONE':
        removeList.append(pm)
        print(pm.name)
for tag in removeList:
    parent = tag.getparent()
    parent.remove(tag)
#Write the new file
#I cannot reach the solution help me
and this is the KML:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
<name>Lombardia</name>
<Style>
...
</Style>
<Folder>
<Placemark>
<name>ZOGNO</name>
<styleUrl>#FEATURES_LABELS</styleUrl>
<Point>
<coordinates>9.680530595139061,45.7941656233647,0</coordinates>
</Point>
</Placemark>
<Placemark>
<name>ZONE</name>
<styleUrl>#FEATURES_LABELS</styleUrl>
<Point>
<coordinates>10.1315885854064,45.7592449779275,0</coordinates>
</Point>
</Placemark>
</Folder>
</Document>
</kml>
The problem is that when I write the new KML file this still has the element I want to delete.
In fact, I want to delete the element that contains name = ZONE.
What am I doing wrong?
Thank you.
--- Final Code
This is the working code thanks to @Dawid Ferenczy:
from lxml import etree
import pykml
from pykml import parser
kml_file_path = '../Source/Lombardia.kml'
# parse the input file into an object tree
with open(kml_file_path) as f:
    tree = parser.parse(f)
# get a reference to the "Document.Folder" node
folder = tree.getroot().Document.Folder
# iterate through all "Document.Folder.Placemark" nodes and find and remove all nodes
# which contain child node "name" with content "ZONE"
for pm in folder.Placemark:
    if pm.name == 'ZONE':
        parent = pm.getparent()
        parent.remove(pm)
# convert the object tree into a string and write it into an output file
# (binary mode, because etree.tostring returns bytes)
with open('output.kml', 'wb') as output:
    output.write(etree.tostring(tree, pretty_print=True))
Consider XSLT, the special-purpose language designed to transform XML files. Because KML files are XML files, this solution is viable. Python's third-party module lxml can run XSLT 1.0 scripts, and it does so without a single loop.
Specifically, the XSLT script runs the Identity Transform to copy the entire document as is. Then the script runs an empty template on the element (conditional on specific logic) to remove that element. To accommodate the default namespace, a prefix, doc, is used for the XPath search.
XSLT (save as .xsl file, a special .xml file to be loaded in Python below)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:doc="http://earth.google.com/kml/2.2">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="doc:Placemark[doc:name='ZONE']"/>
</xsl:stylesheet>
Python
import lxml.etree as et
# LOAD XML AND XSL
doc = et.parse('/path/to/Input.xml')
xsl = et.parse('/path/to/XSLT_Script.xsl')
# CONFIGURE TRANSFORMER
transform = et.XSLT(xsl)
# RUN TRANSFORMATION
result = transform(doc)
# PRINT RESULT
print(result)
# SAVE TO FILE
with open('output.xml', 'wb') as f:
    f.write(result)
Output
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
<name>Lombardia</name>
<Style>
...
</Style>
<Folder>
<Placemark>
<name>ZOGNO</name>
<styleUrl>#FEATURES_LABELS</styleUrl>
<Point>
<coordinates>9.680530595139061,45.7941656233647,0</coordinates>
</Point>
</Placemark>
</Folder>
</Document>
</kml>
You have the following issues in your code:
- you're not storing the whole parsed object tree anywhere (you only keep a reference to the node "Document.Folder": folder = parser.parse(f).getroot().Document.Folder), but you want to write it back into a file, so you need to store it
- I don't understand why you need two loops and the list removeList when you can delete elements directly in the first loop
- you're not reading the documentation: how to write the object tree into a file is well described under the examples in the pykml library's documentation
Try the following code:
from lxml import etree
from pykml import parser
kml_file_path = './input.kml'
# parse the input file into an object tree
with open(kml_file_path) as f:
    tree = parser.parse(f)
# get a reference to the "Document.Folder" node
folder = tree.getroot().Document.Folder
# iterate through all "Document.Folder.Placemark" nodes and find and remove all nodes
# which contain child node "name" with content "ZONE"
for pm in folder.Placemark:
    if pm.name == 'ZONE':
        parent = pm.getparent()
        parent.remove(pm)
# convert the object tree into a string and write it into an output file
# (binary mode, because etree.tostring returns bytes)
with open('output.kml', 'wb') as output:
    output.write(etree.tostring(tree, pretty_print=True))
It's very simple:
- the KML file is parsed into an object tree and stored in the variable tree
- the same object tree is directly manipulated (an element is removed)
- the same object tree is written back into a file
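For comparison, the same parse, modify, write pattern can be sketched with only the standard library's xml.etree.ElementTree instead of pykml. This is an illustration under assumptions, not the answer's method: the function name remove_placemarks and the trimmed-down sample document are invented here, and the namespace URI is the one from the question's KML.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://earth.google.com/kml/2.2"
# Keep the KML namespace as the default (unprefixed) namespace on output
ET.register_namespace("", KML_NS)

# Trimmed-down stand-in for the question's file
SAMPLE = """<kml xmlns="http://earth.google.com/kml/2.2">
<Document><Folder>
<Placemark><name>ZOGNO</name></Placemark>
<Placemark><name>ZONE</name></Placemark>
</Folder></Document>
</kml>"""


def remove_placemarks(root, unwanted_name):
    """Remove every Placemark whose <name> equals unwanted_name; return how many were removed."""
    ns = {"k": KML_NS}
    removed = 0
    # findall returns lists, so removing while looping is safe
    for folder in root.findall(".//k:Folder", ns):
        for pm in folder.findall("k:Placemark", ns):
            if pm.findtext("k:name", namespaces=ns) == unwanted_name:
                folder.remove(pm)
                removed += 1
    return removed


root = ET.fromstring(SAMPLE)
removed_count = remove_placemarks(root, "ZONE")
remaining = [pm.findtext("{%s}name" % KML_NS) for pm in root.iter("{%s}Placemark" % KML_NS)]
output = ET.tostring(root, encoding="unicode")
print(output)
```

The key point is the same as in the answer above: the whole tree (here root) is kept, modified in place, and serialized, rather than only the Folder node.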

Split one large .xml file in more .xml files (python)

I've been trying for a few days now to split one large .xml file into several smaller .xml files in Python, but I haven't succeeded yet, so I am asking for your help.
My large .xml file looks like this:
<Root>
<Testcase>
<Info1>[]</Info1>
<Info2>[]</Info2>
</Testcase>
<Testcase>
<Info1>[]</Info1>
<Info2>[]</Info2>
</Testcase>
...
...
...
<Testcase>
<Info1>[]</Info1>
<Info2>[]</Info2>
</Testcase>
</Root>
It has over 2000 children, and what I would like to do is parse this .xml file and split it into smaller .xml files with 100 children each. That would result in roughly 20 new .xml files.
How can I do that?
Thank you!
Later edit:
I've tried to parse the .xml file using xml.etree.ElementTree
import xml.etree.ElementTree as ET
file = open('Testcase.xml', 'r')
tree = ET.parse(file)
root = tree.getroot()
total_testcases = 0
for Testcase in root.findall('Testcase'):
    total_testcases += 1
nr_of_files = (total_testcases // 100) + 1
for i in range(nr_of_files + 1):
    tree.write('Testcase%d.xml' % (i), encoding="UTF-8")
The thing is I don't know how to specifically get only the Testcases and copy them to another file...
Actually, root.findall('Testcase') will return a list of "Testcase" sub-elements.
So what you need to do is:
- create a new root
- add sub-elements to that root
Here is an example:
>>> tcs = root.findall('Testcase')
>>> tcs
[<Element 'Testcase' at 0x23e14e0>, <Element 'Testcase' at 0x23e1828>]
>>> len(tcs)
2
>>> r = ET.Element('Root')
>>> r.append(tcs[0])
>>> ET.tostring(r, 'utf-8')
'<Root><Testcase>\n <Info1>[]</Info1>\n <Info2>[]</Info2>\n </Testcase>\n </Root>'
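Extending that idea to groups of 100, a rough sketch could slice the list returned by findall and attach each slice to a fresh root. The function name split_into_chunks and the in-memory stand-in for the large file are assumptions made for this example; the output file naming in the trailing comment follows the question's attempt.

```python
import xml.etree.ElementTree as ET


def split_into_chunks(root, tag, chunk_size):
    """Return a list of new root elements, each holding up to chunk_size children."""
    children = root.findall(tag)
    chunks = []
    for start in range(0, len(children), chunk_size):
        # New root with the same tag and attributes as the original
        new_root = ET.Element(root.tag, root.attrib)
        new_root.extend(children[start:start + chunk_size])
        chunks.append(new_root)
    return chunks


# Small in-memory stand-in for the large file (250 test cases)
root = ET.Element("Root")
for _ in range(250):
    ET.SubElement(root, "Testcase")

chunks = split_into_chunks(root, "Testcase", 100)
chunk_sizes = [len(chunk) for chunk in chunks]
print(chunk_sizes)

# Writing each chunk to its own file would then look like:
# for i, chunk in enumerate(chunks):
#     ET.ElementTree(chunk).write("Testcase%d.xml" % i,
#                                 encoding="UTF-8", xml_declaration=True)
```

With a real file, root would come from ET.parse('Testcase.xml').getroot() instead of being built in memory.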

Generate output files from template file and csv data in python

I need to generate XML files populated with data from a CSV file in Python.
I have two input files:
one CSV file named data.csv containing data like this:
ID YEAR PASS LOGIN HEX_LOGIN
14Z 2013 (3e?k<.P#H}l hex0914Z F303935303031345A
14Z 2014 EAeW+ZM..--r hex0914Z F303935303031345A
.......
One Template file named template.xml
<?xml version="1.0"?>
<SecurityProfile xmlns="security_profile_v1">
<year></year>
<security>
<ID></ID>
<login></login>
<hex_login></hex_login>
<pass></pass>
</security>
</SecurityProfile>
I want to get as many output files as there are lines in the CSV data file, each output file named YEAR_ID, with the data from the CSV file in the XML fields:
Output file contents:
Content of output file #1 named 2013_0950014z:
<?xml version="1.0"?>
<SecurityProfile xmlns="security_profile_v1">
<year>2013</year>
<security>
<ID>14Z</ID>
<login>hex0914</login>
<hex_login>F303935303031345A</hex_login>
<pass>(3e?k<.P#H}l</pass>
</security>
</SecurityProfile>
Content of output file #2 named 2014_0950014z:
<?xml version="1.0"?>
<SecurityProfile xmlns="security_profile_v1">
<year>2014</year>
<security>
<ID>14Z</ID>
<login>hex0914</login>
<hex_login>F303935303031345A</hex_login>
<pass>EAeW+ZM..--r</pass>
</security>
</SecurityProfile>
Thank you for your suggestions.
Can you make changes to the template? If so, I would do the following to make this a bit simpler:
<?xml version="1.0"?>
<SecurityProfile xmlns="security_profile_v1">
<year>{year}</year>
<security>
<ID>{id}</ID>
<login>{login}</login>
<hex_login>{hex_login}</hex_login>
<pass>{pass}</pass>
</security>
</SecurityProfile>
Then, something like this would work:
import csv
input_file_name = "some_file.csv"  # name/path of your csv file
template_file_name = "some_file.xml"  # name/path of your xml template
output_file_name = "{}_09500{}.xml"
with open(template_file_name, "r") as template_file:
    template = template_file.read()
with open(input_file_name, "r") as csv_file:
    # pass delimiter="\t" to DictReader if the file is tab-separated
    my_reader = csv.DictReader(csv_file)
    for row in my_reader:
        # "pass" is a Python keyword, so the values are supplied through a
        # dict rather than as keyword arguments
        values = {"year": row["YEAR"],
                  "id": row["ID"],
                  "login": row["LOGIN"],
                  "hex_login": row["HEX_LOGIN"],
                  "pass": row["PASS"]}
        with open(output_file_name.format(row["YEAR"], row["ID"]), "w") as current_out:
            current_out.write(template.format(**values))
If you can't modify the template, or want to process it as XML instead of basic string manipulation, then it's a bit more involved.
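One possible shape of the XML-based route, sketched with the standard library's xml.etree.ElementTree, is to parse the unmodified template, set the text of each element, and serialize the result. The function name fill_template and the FIELDS mapping are invented for this example, and the sample row assumes tab-separated input. A side benefit of going through an XML library is that characters such as < in a password are escaped automatically.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = "security_profile_v1"
# Keep the template's default namespace unprefixed on output
ET.register_namespace("", NS)

TEMPLATE = """<?xml version="1.0"?>
<SecurityProfile xmlns="security_profile_v1">
<year></year>
<security>
<ID></ID>
<login></login>
<hex_login></hex_login>
<pass></pass>
</security>
</SecurityProfile>"""

# Hypothetical mapping: template tag name -> CSV column name
FIELDS = {"year": "YEAR", "ID": "ID", "login": "LOGIN",
          "hex_login": "HEX_LOGIN", "pass": "PASS"}


def fill_template(template_text, row):
    """Parse the template, set each mapped element's text from row, return XML text."""
    root = ET.fromstring(template_text)
    for tag, column in FIELDS.items():
        # Elements live in the default namespace, so the tag must be qualified
        root.find(".//{%s}%s" % (NS, tag)).text = row[column]
    return ET.tostring(root, encoding="unicode")


# One sample row, assumed tab-separated
sample_csv = ("ID\tYEAR\tPASS\tLOGIN\tHEX_LOGIN\n"
              "14Z\t2013\t(3e?k<.P#H}l\thex0914Z\tF303935303031345A\n")
rows = list(csv.DictReader(io.StringIO(sample_csv), delimiter="\t"))
filled = fill_template(TEMPLATE, rows[0])
print(filled)
```

Each filled string would then be written to its own output file inside the loop over rows.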
EDIT:
Modified answer to use csv.DictReader rather than csv.reader.
Fixed variable names opening input CSV file and writing the output. Removed 'binary' mode file operations.
import csv
from collections import defaultdict
header = '<?xml version="1.0"?><SecurityProfile xmlns="security_profile_v1">\n'
footer = '\n</SecurityProfile>'
entry = '''<security>
<ID>{0[ID]}</ID>
<login>{0[LOGIN]}</login>
<hex_login>{0[HEX_LOGIN]}</hex_login>
<pass>{0[PASS]}</pass>
</security>'''
# group the CSV rows by year
rows = defaultdict(list)
with open('infile.csv') as f:
    reader = csv.DictReader(f, delimiter='\t')
    for item in reader:
        rows[item['YEAR']].append(item)
# write one output file per year
for year, data in rows.items():
    with open('{}.xml'.format(year), 'w') as f:
        f.write(header)
        f.write('<year>{}</year>\n'.format(year))
        for record in data:
            f.write(entry.format(record))
            f.write('\n')
        f.write(footer)

remove <?xml version="1.0" ?> using xml.dom.minidom

I am generating XML files using xml.dom.minidom. Every time I generate a file, <?xml version="1.0" ?> appears on the very first row, and the generated file looks like this:
<?xml version="1.0" ?>
<Root>
data
</Root>
Is there any way to get the output without it? My output should look like this:
<Root>
data
</Root>
The best solution I found was to write out .childNodes[0], i.e. write out:
doc.childNodes[0].toprettyxml()
to the file, which will omit the xml version tag.
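A small sketch of that approach, using documentElement (the root element, which is the same node as childNodes[0] when the document has no leading comment or doctype). The tiny document built here is only a stand-in for the asker's data.

```python
from xml.dom import minidom

# Build a tiny document equivalent to the sample in the question
doc = minidom.Document()
root = doc.createElement("Root")
root.appendChild(doc.createTextNode("data"))
doc.appendChild(root)

# Serializing the Document node includes the XML declaration...
with_declaration = doc.toxml()
# ...while serializing only the root element leaves it out
without_declaration = doc.documentElement.toxml()

print(with_declaration)
print(without_declaration)
```

The same distinction applies to toprettyxml(), which is what the answer above writes to the file.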
If you are happy just to trim the first line from the file, use this code:
f = open('file.txt', 'r')
lines = f.readlines()
f.close()
f = open('file.txt', 'w')
# the lines from readlines() keep their newline characters, so write them back as-is
f.writelines(lines[1:])
f.close()
This does the job where old_data is the xml to strip
new_data = old_data[old_data.find("?>")+2:]

Generating xml in python and lxml

I have this XML from SQL, and I want to generate the same thing with Python 2.7 and lxml:
<?xml version="1.0" encoding="utf-16"?>
<results>
<Country name="Germany" Code="DE" Storage="Basic" Status="Fresh" Type="Photo" />
</results>
Now I have:
from lxml import etree
# create XML
results = etree.Element('results')
country = etree.Element('country')
country.text = 'Germany'
results.append(country)
filename = "xmltestthing.xml"
FILE = open(filename, "w")
FILE.write(etree.tostring(results, pretty_print=True))
FILE.close()
Do you know how to add the rest of the attributes?
Note this also prints the BOM
>>> from lxml.etree import tostring
>>> from lxml.builder import E
>>> print tostring(
E.results(
E.Country(name='Germany',
Code='DE',
Storage='Basic',
Status='Fresh',
Type='Photo')
), pretty_print=True, xml_declaration=True, encoding='UTF-16')
��<?xml version='1.0' encoding='UTF-16'?>
<results>
<Country Status="Fresh" Type="Photo" Code="DE" Storage="Basic" name="Germany"/>
</results>
from lxml import etree
# Create the root element
page = etree.Element('results')
# Make a new document tree
doc = etree.ElementTree(page)
# Add the subelements
pageElement = etree.SubElement(page, 'Country',
name='Germany',
Code='DE',
Storage='Basic')
# For multiple attributes, pass them as keyword arguments as shown above
# Save to XML file
outFile = open('output.xml', 'w')
doc.write(outFile, xml_declaration=True, encoding='utf-16')
Save to the XML file by passing the filename directly:
doc.write('output.xml', xml_declaration=True, encoding='utf-16')
instead of:
outFile = open('output.xml', 'w')
doc.write(outFile, xml_declaration=True, encoding='utf-16')
Promoting my comment to an answer:
@sukbir is probably not using Windows. What happens is that lxml writes a newline (0A 00 in UTF-16LE) between the XML header and the body. Windows text mode then mangles this into 0D 0A 00, which makes everything after it look like UTF-16BE, hence the Chinese etc. characters when you display it. You can get around this in this instance by using "wb" instead of "w" when you open the file. However, I'd strongly suggest that you use 'UTF-8' (spelled EXACTLY like that) as your encoding. Why are you using UTF-16? Do you like large files and/or weird problems?
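The byte-level effect described above can be reproduced without Windows by simulating the newline translation by hand. This is only an illustration under that assumption: the replace call stands in for what text mode does to the byte 0x0A, and the two-element sample string is made up.

```python
# UTF-16LE bytes for a tiny document with a newline in the middle
original = "<a/>\n<b/>".encode("utf-16-le")

# Simulate Windows text mode: every 0x0A byte becomes 0x0D 0x0A
mangled = original.replace(b"\n", b"\r\n")

# The extra byte shifts everything after the newline by one position,
# so the remaining two-byte code units are read with the wrong pairing
decoded = mangled.decode("utf-16-le", errors="replace")

print(len(original), len(mangled))
print(repr(decoded))
```

Everything before the newline still decodes correctly, while the text after it turns into unrelated characters, which matches the symptom in the question. Opening the file with "wb" avoids the translation entirely.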
