How to save data in xml - python

I want to save data as XML in this format:
<posts>
<row Id="1" PostTypeId="1" AcceptedAnswerId="13"
CreationDate="2010-09-13T19:16:26.763" Score="297" ViewCount="472045"
Body="<p>This is a common question by those who have just rooted their phones. What apps, ROMs, benefits, etc. do I get from rooting? What should I be doing now?</p>
"
OwnerUserId="10"
LastEditorUserId="16575" LastEditDate="2013-04-05T15:50:48.133"
LastActivityDate="2018-05-19T19:51:11.530"
Title="I've rooted my phone. Now what? What do I gain from rooting?"
Tags="<rooting><root-access>"
AnswerCount="3" CommentCount="0" FavoriteCount="194"
CommunityOwnedDate="2011-01-25T08:44:10.820" />
</posts>
I tried this, but I don't know how to save the data in the above format:
import xml.etree.cElementTree as ET
root = ET.Element("posts")
row = ET.SubElement(root, "row")
ET.SubElement(row, questionText = "questionText").text = questionText
ET.SubElement(row, votes = "votes").text = votes
ET.SubElement(row, tags = "tags").text = tags
tree = ET.ElementTree(root)
tree.write("data.xml")

The docs for SubElement say it has the following arguments: parent, tag, attrib={}, **extra.
You omit the tag, so you get an error TypeError: SubElement() takes at least 2 arguments (1 given) with your code. You need something like:
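import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
root = ET.Element("posts")
row = ET.SubElement(root, "row", attrib={"foo":"bar", "baz":"qux"})
tree = ET.ElementTree(root)
tree.write("data.xml")
Outputs: <posts><row foo="bar" baz="qux" /></posts> (Python 3.8+ keeps the insertion order of attributes; older versions sort them, so baz comes first there)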
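Applied to the format in the question, each field becomes an attribute of row. The following is only a minimal sketch: the attribute values are abbreviated samples taken from the question, and it prints the serialized string instead of writing data.xml. Note that ElementTree escapes the HTML inside Body and Tags, which is what makes the result valid XML.

```python
import xml.etree.ElementTree as ET

root = ET.Element("posts")

# Every field of a post is stored as an attribute on <row>.
# The values below are shortened samples from the question.
ET.SubElement(root, "row", attrib={
    "Id": "1",
    "PostTypeId": "1",
    "Score": "297",
    "Body": "<p>This is a common question.</p>",
    "Tags": "<rooting><root-access>",
})

# Serialize to a str; use ET.ElementTree(root).write("data.xml") to save to disk.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Parsing the string back returns the original, unescaped attribute values.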


How to access UBL 2.1 xml tag using python

I need to access the tags in a UBL 2.1 document and modify them depending on user input in Python.
So I used the ElementTree library to access the tags and modify them.
Here is a sample of the XML:
<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ns2="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
<ns1:ProfileID>reporting:1.0</ns1:ProfileID>
<ns1:ID>0</ns1:ID>
<ns1:UUID>dbdf65eb-5d66-47e6-bb0c-a84bbf7baa30</ns1:UUID>
<ns1:IssueDate>2022-11-05</ns1:IssueDate>
The issue:
I want to access the tags, but they are not modified and the loop is never entered.
I tried both of these approaches:
mytree = ET.parse('test.xml')
myroot = mytree.getroot()
for x in myroot.find({xmlns:ns1=urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate}"):
    x.text = '1999'
mytree.write('test.xml')
mytree = ET.parse('test.xml')
myroot = mytree.getroot()
for x in myroot.iter('./Invoice/AllowanceCharge/ChargeIndicator'):
    x.text = str('true')
mytree.write('test.xml')
Neither of them modified the tag.
So the question is: how can I reach a specific tag and modify it?
If you correct the namespace and the brackets in your for loop, it works for valid XML such as the following (the root tag must be closed!):
Input:
<?xml version="1.0" encoding="utf-8"?>
<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ns2="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
<ns1:ProfileID>reporting:1.0</ns1:ProfileID>
<ns1:ID>0</ns1:ID>
<ns1:UUID>dbdf65eb-5d66-47e6-bb0c-a84bbf7baa30</ns1:UUID>
<ns1:IssueDate>2022-11-05</ns1:IssueDate>
</ns0:Invoice>
Your repaired code:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for elem in root.findall("{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate"):
    elem.text = '1999'
tree.write('test_changed.xml', encoding='utf-8', xml_declaration=True)
ET.dump(root)
Output:
<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
<ns1:ProfileID>reporting:1.0</ns1:ProfileID>
<ns1:ID>0</ns1:ID>
<ns1:UUID>dbdf65eb-5d66-47e6-bb0c-a84bbf7baa30</ns1:UUID>
<ns1:IssueDate>1999</ns1:IssueDate>
</ns0:Invoice>
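As an alternative to spelling out the {namespace}tag form, findall also accepts a prefix-to-URI mapping. The sketch below assumes a shortened copy of the invoice held in a string; the prefix cbc is an arbitrary alias chosen here, not something the file has to declare.

```python
import xml.etree.ElementTree as ET

XML = """<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
<ns1:ID>0</ns1:ID>
<ns1:IssueDate>2022-11-05</ns1:IssueDate>
</ns0:Invoice>"""

# Only the URI has to match the document; the prefix is a local alias.
NS = {"cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"}

root = ET.fromstring(XML)
for elem in root.findall("cbc:IssueDate", NS):
    elem.text = "1999"

issue_date = root.find("cbc:IssueDate", NS).text
print(issue_date)
```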

XML parser returns NoneType

I am trying to parse the XML below using ElementTree in Python, but "member" comes back as None, and calling .text on it raises an AttributeError.
<address-group>
<entry name="TBR">
<static>
<member>TBR1-1.1.1.1_21</member>
<member>TBR2-2.2.2.2_24</member>
<member>TBR3-3.3.3.3_21</member>
<member>TBR4-4.4.4.4_24</member>
</static>
</entry>
</address-group>
Here is my code:
import xml.etree.ElementTree as ET
tree = ET.parse("addrgrp.xml")
root = tree.getroot()
tag = root.tag
print (tag)
attr = root.attrib
for entries in root.findall("entry"):
    name = entries.get('name')
    print (name)
    ip = entries.find('static')
    print (ip)
    for mem in ip.findall('member'):
        member = mem.find('member')
        print (member)
The code below aggregates the members of each entry by entry name:
import xml.etree.ElementTree as ET
import pprint
XML = '''
<address-group>
<entry name="TBR1">
<static>
<member>TBR1-1.1.1.1_21</member>
<member>TBR2-2.2.2.2_24</member>
<member>TBR3-3.3.3.3_21</member>
<member>TBR4-4.4.4.4_24</member>
</static>
</entry>
<entry name="TBR2">
<static>
<member>TBR1-4.1.1.1_21</member>
<member>TBR2-4.2.2.2_24</member>
<member>TBR3-4.3.3.3_21</member>
<member>TBR4-9.4.4.4_24</member>
</static>
</entry>
</address-group>'''
root = ET.fromstring(XML)
data_by_entry = {}
entries = root.findall('.//entry')
for entry in entries:
    data_by_entry[entry.attrib['name']] = [m.text for m in entry.findall('./static/member')]
pprint.pprint(data_by_entry)
Output:
{'TBR1': ['TBR1-1.1.1.1_21',
          'TBR2-2.2.2.2_24',
          'TBR3-3.3.3.3_21',
          'TBR4-4.4.4.4_24'],
 'TBR2': ['TBR1-4.1.1.1_21',
          'TBR2-4.2.2.2_24',
          'TBR3-4.3.3.3_21',
          'TBR4-9.4.4.4_24']}
The source of your problem is this:
inside the for mem in ip.findall('member'): loop, mem is already the current member element,
but the first instruction in the loop is member = mem.find('member'),
so you try to find another (nested) member inside the current member,
which doesn't exist, and find returns None.
Another flaw in your code is that there is no point in printing a node which does
not have any text.
Change your loop to the code below:
for entries in root.findall('entry'):
    name = entries.get('name')
    print(name)
    ip = entries.find('static')
    print('Members:')
    for mem in ip.findall('member'):
        print(mem.text)
and you will get a meaningful result.
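The same None problem can reappear if an entry has no static child, because entries.find('static') then returns None. One way around it, sketched below, is to pass the whole path to findall, which returns an empty list when nothing matches. The entry named EMPTY is a hypothetical addition used only to show that case.

```python
import xml.etree.ElementTree as ET

XML = """<address-group>
<entry name="TBR"><static><member>TBR1-1.1.1.1_21</member><member>TBR2-2.2.2.2_24</member></static></entry>
<entry name="EMPTY"></entry>
</address-group>"""

root = ET.fromstring(XML)
members_by_entry = {}
for entry in root.findall("entry"):
    # findall with a path yields [] when <static> is missing, so no None check is needed.
    members_by_entry[entry.get("name")] = [m.text for m in entry.findall("static/member")]

print(members_by_entry)
```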

parse the child tags with a specific matching string in xml using python

I want to parse an XML string that has Topics as the parent tag and Topic1, Topic2 as child tags.
<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics></SignificantDevelopments>
I want to parse this XML so that I can get the attribute value of every Topic tag, and I want to do it in a for loop.
I have tried the following code:
import xml.etree.cElementTree as ET
tree = ET.ElementTree(file='sample.xml')
#get the root element
root = tree.getroot()
namespace = {'xmlns': 'urn:reuterscompanycontent:significantdevelopments03'}
for devs in root.findall('xmlns:Topics' ,namespace):
    for child_tags in devs.findall('xmlns:./', namespace):
        print 'child: ', child_tags.tag
I just want to add a wildcard like Topic/d in the second-to-last line so that I can parse every tag matching Topic.
You can check that the tag property starts with the namespace plus the prefix Topic, for instance
from xml.etree import ElementTree as ET  # cElementTree was removed in Python 3.9
root = ET.fromstring('<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics></SignificantDevelopments>')
topics = [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]
for topic in topics:
    print(topic.text)
or shorter as
from xml.etree import ElementTree as ET  # cElementTree was removed in Python 3.9
root = ET.fromstring('<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics></SignificantDevelopments>')
for topic in [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]:
    print(topic.text)
Or put the check into an if statement inside of your for statements.
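Since the question asks for the attribute values and a digit wildcard, a regular expression on the full tag name is another option. This is a sketch under assumptions: the pattern Topic followed by one or more digits is my reading of the intended wildcard, and the XML is a shortened copy of the sample with the ampersand escaped.

```python
import re
import xml.etree.ElementTree as ET

XML = ('<SignificantDevelopments xmlns="urn:reuterscompanycontent:significantdevelopments03">'
       '<Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1>'
       '<Topic2 Code="207">Mergers &amp; Acquisitions</Topic2>'
       '<ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1></Topics>'
       '</SignificantDevelopments>')

# ElementTree stores tags as "{namespace}localname", so the namespace is part of the pattern.
TOPIC_TAG = re.compile(r"\{urn:reuterscompanycontent:significantdevelopments03\}Topic\d+")

root = ET.fromstring(XML)
# fullmatch rejects ParentTopic1 because its local name does not begin with "Topic".
codes = [el.get("Code") for el in root.iter() if TOPIC_TAG.fullmatch(el.tag)]
print(codes)
```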

Python add Tags to XML using lxml

I have the following Input XML:
<?xml version="1.0" encoding="utf-8"?>
<Scenario xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Scenario.xsd">
<TestCase>test_startup_0029</TestCase>
<ShortDescription>Restart of the EVC with missing ODO5 board.</ShortDescription>
<Events>
<Event Num="1">Switch on the EVC</Event>
</Events>
<HW-configuration>
<ELBE5A>true</ELBE5A>
<ELBE5K>false</ELBE5K>
</HW-configuration>
<SystemFailure>true</SystemFailure>
</Scenario>
My program adds three tags to the XML, but they are badly formatted.
The output XML looks like the following:
<Scenario xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Scenario.xsd">
<TestCase>test_startup_0029</TestCase>
<ShortDescription>Restart of the EVC with missing ODO5 board.</ShortDescription>
<Events>
<Event Num="1">Switch on the EVC</Event>
</Events>
<HW-configuration>
<ELBE5A>true</ELBE5A>
<ELBE5K>false</ELBE5K>
</HW-configuration>
<SystemFailure>true</SystemFailure>
<Duration>12</Duration><EVC-SW-Version>08.02.0001.0027</EVC-SW-Version><STAC-Release>08.02.0001.0027</STAC-Release></Scenario>
This is my source code:
import os
from lxml import etree
class XmlManager:
    @staticmethod
    def write_xml(xml_path, duration, evc_sw_version):
        xml_path = os.path.abspath(xml_path)
        if os.path.isfile(xml_path) and xml_path.endswith(".xml"):
            # parse XML into etree
            root = etree.parse(xml_path).getroot()
            # add tags
            duration_tag = etree.SubElement(root, "Duration")
            duration_tag.text = duration
            sw_version_tag = etree.SubElement(root, "EVC-SW-Version")
            sw_version_tag.text = evc_sw_version
            stac_release = evc_sw_version
            stac_release_tag = etree.SubElement(root, "STAC-Release")
            stac_release_tag.text = stac_release
            # write changes to the XML-file
            tree = etree.ElementTree(root)
            tree.write(xml_path, pretty_print=False)
        else:
            XmlManager.logger.log("Invalid path to XML-file")
def main():
    xml = r".\Test_Input_Data_Base\blnmerf1_md1czjyc_REL_V_08.01.0001.000x\Test_startup_0029\Test_startup_0029.xml"
    XmlManager.write_xml(xml, "12", "08.02.0001.0027")
My question is how to add the new tags to the XML in the right format. I guess the changed XML can still be parsed this way, but it is not nicely formatted. Any ideas? Thanks in advance.
To ensure nice pretty-printed output, you need to do two things:
1. Parse the input file using an XMLParser object with remove_blank_text=True.
2. Write the output using pretty_print=True.
Example:
from lxml import etree
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse("Test_startup_0029.xml", parser)
root = tree.getroot()
duration_tag = etree.SubElement(root, "Duration")
duration_tag.text = "12"
sw_version_tag = etree.SubElement(root, "EVC-SW-Version")
sw_version_tag.text = "08.02.0001.0027"
stac_release_tag = etree.SubElement(root, "STAC-Release")
stac_release_tag.text = "08.02.0001.0027"
tree.write("output.xml", pretty_print=True)
Contents of output.xml:
<Scenario xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Scenario.xsd">
  <TestCase>test_startup_0029</TestCase>
  <ShortDescription>Restart of the EVC with missing ODO5 board.</ShortDescription>
  <Events>
    <Event Num="1">Switch on the EVC</Event>
  </Events>
  <HW-configuration>
    <ELBE5A>true</ELBE5A>
    <ELBE5K>false</ELBE5K>
  </HW-configuration>
  <SystemFailure>true</SystemFailure>
  <Duration>12</Duration>
  <EVC-SW-Version>08.02.0001.0027</EVC-SW-Version>
  <STAC-Release>08.02.0001.0027</STAC-Release>
</Scenario>
See also http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output.
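If lxml is not an option, a comparable result may be possible with the standard library alone. The sketch below swaps in xml.etree.ElementTree and its indent function (available from Python 3.9) in place of lxml's pretty_print; it is not the lxml approach described above, and the XML is a shortened stand-in for the scenario file.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Scenario>
  <TestCase>test_startup_0029</TestCase>
  <SystemFailure>true</SystemFailure>
</Scenario>"""

root = ET.fromstring(SAMPLE)

# Append the new tags; without re-indenting they would end up on one line.
for tag, text in [("Duration", "12"), ("EVC-SW-Version", "08.02.0001.0027")]:
    ET.SubElement(root, tag).text = text

# Rewrite the whitespace between elements so every child sits on its own line.
ET.indent(root, space="  ")
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```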

LXML add an element into root

I'm trying to take two elements from one file (file1.xml) and write them onto the end of another file (file2.xml). I am able to print them out, but I am stuck trying to write them into file2.xml. Help!
filename = "file1.xml"
appendtoxml = "file2.xml"
output_file = appendtoxml.replace('.xml', '') + "_editedbyed.xml"
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(filename, parser)
etree.tostring(tree)
root = tree.getroot()
a = root.findall(".//Device")
b = root.findall(".//Speaker")
for r in a:
    print etree.tostring(r)
for e in b:
    print etree.tostring(e)
NewSub = etree.SubElement (root, "Audio(just writes audio..")
print NewSub
I want the results of a, b to be added onto the end of outputfile.xml in the root.
Parse both the input file and the file you wish to append to.
Use root.append(elt) to append Element, elt, to root.
Then use tree.write to write the new tree to a file (e.g. appendtoxml):
Note: The links above point to documentation for xml.etree from the standard
library. Since lxml's API tries to be compatible with the standard library's
xml.etree, the standard library documentation applies to lxml as well (at
least for these methods). See http://lxml.de/api.html for information on where
the APIs differ.
import lxml.etree as ET
filename = "file1.xml"
appendtoxml = "file2.xml"
output_file = appendtoxml.replace('.xml', '') + "_editedbyed.xml"
parser = ET.XMLParser(remove_blank_text=True)
tree = ET.parse(filename, parser)
root = tree.getroot()
out_tree = ET.parse(appendtoxml, parser)
out_root = out_tree.getroot()
for path in [".//Device", ".//Speaker"]:
    for elt in root.findall(path):
        out_root.append(elt)
out_tree.write(output_file, pretty_print=True)
If file1.xml contains
<?xml version="1.0"?>
<root>
<Speaker>boozhoo</Speaker>
<Device>waaboo</Device>
<Speaker>anin</Speaker>
<Device>gigiwishimowin</Device>
</root>
and file2.xml contains
<?xml version="1.0"?>
<root>
<Speaker>jubal</Speaker>
<Device>crane</Device>
</root>
then file2_editedbyed.xml will contain
<root>
<Speaker>jubal</Speaker>
<Device>crane</Device>
<Device>waaboo</Device>
<Device>gigiwishimowin</Device>
<Speaker>boozhoo</Speaker>
<Speaker>anin</Speaker>
</root>
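The same append pattern can be tried without any files. The sketch below uses the standard library's xml.etree.ElementTree instead of lxml and two short in-memory documents based on the samples above. The copy.deepcopy call is my own addition: with the standard library, append does not detach an element from its original parent, so copying keeps the two trees independent.

```python
import copy
import xml.etree.ElementTree as ET

SOURCE = "<root><Speaker>boozhoo</Speaker><Device>waaboo</Device></root>"
TARGET = "<root><Speaker>jubal</Speaker><Device>crane</Device></root>"

src_root = ET.fromstring(SOURCE)
dst_root = ET.fromstring(TARGET)

# Collect all Device elements first, then all Speaker elements, as in the answer.
for path in (".//Device", ".//Speaker"):
    for elt in src_root.findall(path):
        dst_root.append(copy.deepcopy(elt))

merged = [(child.tag, child.text) for child in dst_root]
print(merged)
```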
