I want to update xml file with new information by using lxml library.
For example, I have this code:
>>> from lxml import etree
>>>
>>> tree = etree.parse('books.xml')
where 'books.xml' file, has this content: http://www.w3schools.com/dom/books.xml
I want to update this file with new book:
>>> new_entry = etree.fromstring('''<book category="web" cover="paperback">
... <title lang="en">Learning XML 2</title>
... <author>Erik Ray</author>
... <year>2006</year>
... <price>49.95</price>
... </book>''')
My question is, how can I update tree element tree with new_entry tree and save the file.
Here you go, get the root of the tree, append your new element, save the tree as a string to a file:
from lxml import etree
tree = etree.parse('books.xml')
new_entry = etree.fromstring('''<book category="web" cover="paperback">
<title lang="en">Learning XML 2</title>
<author>Erik Ray</author>
<year>2006</year>
<price>49.95</price>
</book>''')
root = tree.getroot()
root.append(new_entry)
f = open('books-mod.xml', 'wb')
f.write(etree.tostring(root, pretty_print=True))
f.close()
I don't have enough reputation to comment, therefore I'll write an answer...
The most simple change make the code of Guillaume work is to change the line
f = open('books-mod.xml', 'w')
to
f = open('books-mod.xml', 'wb')
Related
I've got an XML file which looks like this:
<?xml version="1.0"?>
-<Object>
<ID>Object_01</ID>
<Location>Manchester</Location>
<Date>01-01-2020</Date>
<Time>15u59m05s</Time>
-<Max_25Hz>
<25Hz>0.916631065043311</25Hz>
<25Hz>0.797958008447961</25Hz>
</Max_25Hz>
-<Max_75Hz>
<75Hz>1.96599232706463</75Hz>
<75Hz>1.48317837078523</75Hz>
</Max_75Hz>
</Object>
I still don't really understand the difference between attributes and text. With below code I tried to receive all the values using text.
import xml.etree.ElementTree as ET
root = r'c:\data\FF\Desktop\My_files\XML-files\Object_01.xml'
tree = ET.parse(root)
root = tree.getroot()
for elem in root:
for subelem in elem:
print(subelem.text)
Expected output:
Object_01
Manchester
01-01-2020
15u59m05s
0.916631065043311
0.797958008447961
1.96599232706463
1.48317837078523
Received output:
0.916631065043311
0.797958008447961
1.96599232706463
1.48317837078523
I tried to do to same with .attributes in the hope to receive all the 'column' names but then I received:
{}
{}
{}
{}
You can access them directly above the for-loop.
Ex:
tree = ET.ElementTree(ET.fromstring(X))
root = tree.getroot()
for elem in root:
print(elem.text) #! Access them Here
for subelem in elem:
print(subelem.text)
Output:
Object_01
Manchester
01-01-2020
15u59m05s
0.916631065043311
0.797958008447961
1.96599232706463
1.48317837078523
You could give a try to https://github.com/martinblech/xmltodict.
It is almost a replacement for json module. This allows to read an xml file into a python dict. This simplifies greatly accessing the xml content.
Something like:
from xmldict import *
root = r'c:\data\FF\Desktop\My_files\XML-files\Object_01.xml'
with open(root) as file:
xmlStr = file.read()
xmldict = xml.parse(xmlStr)
print (xmldict['Object']['Id'])
I am trying to parse XML before converting it's content into lists and then into CSV. Unfortunately, I think my search terms for finding the initial element are failing, causing subsequent searches further down the hierarchy. I am new to XML, so I've tried variations on namespace dictionaries and including the namespace references... The simplified XML is given below:
<?xml version="1.0" encoding="utf-8"?>
<StationList xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:add="http://www.govtalk.gov.uk/people/AddressAndPersonalDetails"
xmlns:com="http://nationalrail.co.uk/xml/common" xsi:schemaLocation="http://internal.nationalrail.co.uk/xml/XsdSchemas/External/Version4.0/nre-station-v4-0.xsd"
xmlns="http://nationalrail.co.uk/xml/station">
<Station xsi:schemaLocation="http://internal.nationalrail.co.uk/xml/XsdSchemas/External/Version4.0/nre-station-v4-0.xsd">
<ChangeHistory>
<com:ChangedBy>spascos</com:ChangedBy>
<com:LastChangedDate>2018-11-07T00:00:00.000Z</com:LastChangedDate>
</ChangeHistory>
<Name>Aber</Name>
</Station>
The Code I am using to try to extract the com/...xml/station / ChangedBy element is below
tree = ET.parse(rootfilepath + "NRE_Station_Dataset_2019_raw.xml")
root = tree.getroot()
#get at the tags and their data
#for elem in tree.iter():
# print(f"this the tag {elem.tag} and this is the data: {elem.text}")
#open file for writing
station_data = open(rootfilepath + 'station_data.csv','w')
csvwriter = csv.writer(station_data)
station_head = []
count = 0
#inspiration for this code: http://blog.appliedinformaticsinc.com/how-to- parse-and-convert-xml-to-csv-using-python/
#this is where it goes wrong; some combination of the namespace and the tag can't find anything in line 27, 'StationList'
for member in root.findall('{http://nationalrail.co.uk/xml/station}Station'):
station = []
if count == 0:
changedby = member.find('{http://nationalrail.co.uk/xml/common}ChangedBy').tag
station_head.append(changedby)
name = member.find('{http://nationalrail.co.uk/xml/station}Name').tag
station_head.append(name)
count = count+1
changedby = member.find('{http://nationalrail.co.uk/xml/common}ChangedBy').text
station.append(changedby)
name = member.find('{http://nationalrail.co.uk/xml/station}Name').text
station.append(name)
csvwriter.writerow(station)
I have tried:
using dictionaries of namespaces but that results in nothing being found at all
using hard coded namespaces but that results in "Attribute Error: 'NoneType' object has no attribute 'tag'
Thanks in advance for all and any assistance.
First of all your XML is invalid (</StationList> is absent at the end of a file).
Assuming you have valid XML file:
<?xml version="1.0" encoding="utf-8"?>
<StationList xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:add="http://www.govtalk.gov.uk/people/AddressAndPersonalDetails"
xmlns:com="http://nationalrail.co.uk/xml/common" xsi:schemaLocation="http://internal.nationalrail.co.uk/xml/XsdSchemas/External/Version4.0/nre-station-v4-0.xsd"
xmlns="http://nationalrail.co.uk/xml/station">
<Station xsi:schemaLocation="http://internal.nationalrail.co.uk/xml/XsdSchemas/External/Version4.0/nre-station-v4-0.xsd">
<ChangeHistory>
<com:ChangedBy>spascos</com:ChangedBy>
<com:LastChangedDate>2018-11-07T00:00:00.000Z</com:LastChangedDate>
</ChangeHistory>
<Name>Aber</Name>
</Station>
</StationList>
Then you can convert your XML to JSON and simply address to the required value:
import xmltodict
with open('file.xml', 'r') as f:
data = xmltodict.parse(f.read())
changed_by = data['StationList']['Station']['ChangeHistory']['com:ChangedBy']
Output:
spascos
Try lxml:
#!/usr/bin/env python3
from lxml import etree
ns = {"com": "http://nationalrail.co.uk/xml/common"}
with open("so.xml") as f:
tree = etree.parse(f)
for t in tree.xpath("//com:ChangedBy/text()", namespaces=ns):
print(t)
Output:
spascos
You can use Beautifulsoup which is an html and xml parser
from bs4 import BeautifulSoup
fd = open(rootfilepath + "NRE_Station_Dataset_2019_raw.xml")
soup = BeautifulSoup(fd,'lxml-xml')
for i in soup.findAll('ChangeHistory'):
print(i.ChangedBy.text)
Im trying to take two elements from one file (file1.xml), and write them onto the end of another file (file2.xml). I am able to get them to print out, but am stuck trying to write them onto file2.xml! Help !
filename = "file1.xml"
appendtoxml = "file2.xml"
output_file = appendtoxml.replace('.xml', '') + "_editedbyed.xml"
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(filename, parser)
etree.tostring(tree)
root = tree.getroot()
a = root.findall(".//Device")
b = root.findall(".//Speaker")
for r in a:
print etree.tostring(r)
for e in b:
print etree.tostring(e)
NewSub = etree.SubElement (root, "Audio(just writes audio..")
print NewSub
I want the results of a, b to be added onto the end of outputfile.xml in the root.
Parse both the input file and the file you wish to append to.
Use root.append(elt) to append Element, elt, to root.
Then use tree.write to write the new tree to a file (e.g. appendtoxml):
Note: The links above point to documentation for xml.etree from the standard
library. Since lxml's API tries to be compatible with the standard library's
xml.etree, the standard library documentation applies to lxml as well (at
least for these methods). See http://lxml.de/api.html for information on where
the APIs differ.
import lxml.etree as ET
filename = "file1.xml"
appendtoxml = "file2.xml"
output_file = appendtoxml.replace('.xml', '') + "_editedbyed.xml"
parser = ET.XMLParser(remove_blank_text=True)
tree = ET.parse(filename, parser)
root = tree.getroot()
out_tree = ET.parse(appendtoxml, parser)
out_root = out_tree.getroot()
for path in [".//Device", ".//Speaker"]:
for elt in root.findall(path):
out_root.append(elt)
out_tree.write(output_file, pretty_print=True)
If file1.xml contains
<?xml version="1.0"?>
<root>
<Speaker>boozhoo</Speaker>
<Device>waaboo</Device>
<Speaker>anin</Speaker>
<Device>gigiwishimowin</Device>
</root>
and file2.xml contains
<?xml version="1.0"?>
<root>
<Speaker>jubal</Speaker>
<Device>crane</Device>
</root>
then file2_editedbyed.xml will contain
<root>
<Speaker>jubal</Speaker>
<Device>crane</Device>
<Device>waaboo</Device>
<Device>gigiwishimowin</Device>
<Speaker>boozhoo</Speaker>
<Speaker>anin</Speaker>
</root>
I have an xml file like this
<?xml version="1.0"?>
<sample>
<text>My name is <b>Wrufesh</b>. What is yours?</text>
</sample>
I have a python code like this
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
for child in root:
print child.text()
I only get
'My name is' as an output.
I want to get
'My name is <b>Wrufesh</b>. What is yours?' as an output.
What can I do?
You can get your desired output using using ElementTree.tostringlist():
>>> import xml.etree.ElementTree as ET
>>> root = ET.parse('sample.xml').getroot()
>>> l = ET.tostringlist(root.find('text'))
>>> l
['<text', '>', 'My name is ', '<b', '>', 'Wrufesh', '</b>', '. What is yours?', '</text>', '\n']
>>> ''.join(l[2:-2])
'My name is <b>Wrufesh</b>. What is yours?'
I wonder though how practical this is going to be for generic use.
I don't think treating tag in xml as a string is right. You can access the text part of xml like this:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
text = root[0]
for i in text.itertext():
print i
# As you can see, `<b>` and `</b>` is a pair of tags but not strings.
print text._children
I would suggest pre-processing the xml file to wrap elements under <text> element in CDATA. You should be able to read the values without a problem afterwards.
<text><![CDATA[<My name is <b>Wrufesh</b>. What is yours?]]></text>
i have a questiion about formatting xml files after generating them. Here is my code:
import csv
from xml.etree.ElementTree import Element, SubElement, Comment, tostring
from xml.etree.ElementTree import ElementTree
import xml.etree.ElementTree as etree
root = Element('Solution')
root.set('version','1.0')
tree = ElementTree(root)
head = SubElement(root, 'DrillHoles')
head.set('total_holes', '238')
description = SubElement(head,'description')
with open ('1250_12.csv', 'r') as data:
current_group = None
reader = csv.reader(data)
i = 0
for row in reader:
if i > 0:
x1,y1,z1,x2,y2,z2,cost = row
if current_group is None or i != current_group.text:
current_group = SubElement(description, 'hole',{'hole_id':"%s"%i})
information = SubElement (current_group, 'hole',{'collar':', '.join((x1,y1,z1)),
'toe':', '.join((x2,y2,z2)),
'cost': cost})
i+=1
Which produces the following xml file:
<?xml version="1.0"?>
-<Solution version="1.0">
-<DrillHoles total_holes="238">
-<description>
-<hole hole_id="1">
<hole toe="5797.82, 3061.01, 2576.29" cost="102.12" collar="5720.44, 3070.94, 2642.19"/></hole>
that is just a part of the xml file but it is enough to serve this purpose.
There are many things i would like to change, first is i would like the toe,cost, and collar to be on different lines like so:
<collar>0,-150,0</collar>
<toe>69.9891,-18.731,-19.2345</toe>
<cost>15</cost>
and i would like it to be in the order of collar then toe then cost shown above.
Furthermore, in the xml file it displays : "hole toe ="5797.82, 3061.01, 2576.29", how do i get rid of the hole? Yea thats about it, i am really new to this python thing so go easy on me. haha