I'm a new to python so please bear with me as I try to explain what I am trying to do
here is my xml
<?xml version="1.0"?>
<playlist>
<list>
<txdate>2015-10-30</txdate>
<channel>cake</channel>
<name>Play List</name>
</list>
<eventlist>
<event type="MEDIA">
<title>title1</title>
<starttype>FIX</starttype>
<mediaid>a</mediaid>
<onairtime>2015-10-30T13:30:00:00</onairtime>
<som>00:00:40:03</som>
<duration>01:15:47:15</duration>
<reconcilekey>123</reconcilekey>
<category>PROGRAM</category>
<subtitles>
<cap>CLOSED</cap>
<file>a</file>
<lang>ENG</lang>
<lang>GER</lang>
</subtitles>
</event>
<event type="MEDIA">
<title>THREE DAYS AND A CHILD</title>
<mediaid>b</mediaid>
<onairtime>2015-10-30T14:45:47:15</onairtime>
<som>00:00:00:00</som>
<duration>01:19:41:07</duration>
<reconcilekey>321</reconcilekey>
<category>PROGRAM</category>
<subtitles>
<cap>CLOSED</cap>
<file>b</file>
<lang>ENG</lang>
<lang>GER</lang>
</subtitles>
</event>
</eventlist>
</playlist>
I would like to print all the mediaid values to a file
this is my code so far
import os
import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
root = tree.getroot()
wfile = 'new.csv'
for child in root:
child.find( "media type" )
for x in child.iter("mediaid"):
file = open(wfile, 'a')
file.write(str(x))
file.close
I tried this with a few other nonstandard libraries but I didn't have much success
For your requirement (as mentioned in the comments) -
just the mediaid from each <event type="MEDIA">
You should use findall() method of ElementTree to get all the event elements with type="MEDIA" , and then get the child mediaid element from it. Example -
import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
root = tree.getroot()
with open('new.csv','w') as outfile:
for elem in root.findall('.//event[#type="MEDIA"]'):
mediaidelem = elem.find('./mediaid')
if mediaidelem is not None:
outfile.write("{}\n".format(mediaidelem.text))
Related
I need to access the tags in UBL 2.1 and modify them depend on the on the user input on python.
So, I used the ElementTree library to access the tags and modify them.
Here is a sample of the xml code:
<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ns2="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
<ns1:ProfileID>reporting:1.0</ns1:ProfileID>
<ns1:ID>0</ns1:ID>
<ns1:UUID>dbdf65eb-5d66-47e6-bb0c-a84bbf7baa30</ns1:UUID>
<ns1:IssueDate>2022-11-05</ns1:IssueDate>
The issue :
I want to access the tags but it is doesn't modifed and enter the loop
I tried both ways:
mytree = ET.parse('test.xml')
myroot = mytree.getroot()
for x in myroot.find({xmlns:ns1=urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate}"):
x.text = '1999'
mytree.write('test.xml')
mytree = ET.parse('test.xml')
myroot = mytree.getroot()
for x in myroot.iter('./Invoice/AllowanceCharge/ChargeIndicator'):
x.text = str('true')
mytree.write('test.xml')
None of them worked and modify the tag.
So the questions is : How can I reach the specific tag and modify it?
If you correct the namespace and the brakets in your for loop it works for a valid XML like (root tag must be closed!):
Input:
<?xml version="1.0" encoding="utf-8"?>
<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ns2="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
<ns1:ProfileID>reporting:1.0</ns1:ProfileID>
<ns1:ID>0</ns1:ID>
<ns1:UUID>dbdf65eb-5d66-47e6-bb0c-a84bbf7baa30</ns1:UUID>
<ns1:IssueDate>2022-11-05</ns1:IssueDate>
</ns0:Invoice>
Your repaired code:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for elem in root.findall("{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate"):
elem.text = '1999'
tree.write('test_changed.xml', encoding='utf-8', xml_declaration=True)
ET.dump(root)
Output:
<ns0:Invoice xmlns:ns0="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns1="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
<ns1:ProfileID>reporting:1.0</ns1:ProfileID>
<ns1:ID>0</ns1:ID>
<ns1:UUID>dbdf65eb-5d66-47e6-bb0c-a84bbf7baa30</ns1:UUID>
<ns1:IssueDate>1999</ns1:IssueDate>
</ns0:Invoice>
I'm trying to remove tags from an Xml.Alto file with remove.
My Alto file looks like this:
<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-2.xsd"> <Description>
<MeasurementUnit>pixel</MeasurementUnit>
<sourceImageInformation>
<fileName>filename</fileName>
</sourceImageInformation>
</Description>
<Layout>
<Page>
<PrintSpace>
<TextBlock>
<Shape><Polygon/></Shape>
<TextLine>
<Shape><Polygon/></Shape>
<String CONTENT="ABCDEF" HPOS="1234" VPOS="1234" WIDTH="1234" HEIGHT="1234" />
</TextLine>
</TextBlock>
</PrintSpace>
</Page>
</Layout>
</alto>
AND my code is :
import xml.etree.ElementTree as ET
tree = ET.parse("file.xml")
root = tree.getroot()
ns = {'alto': 'http://www.loc.gov/standards/alto/ns-v4#'}
ET.register_namespace("", "http://www.loc.gov/standards/alto/ns-v4#")
for Test in root.findall('.//alto:TextBlock', ns):
root.remove(Test)
tree.write('out.xml', encoding="UTF-8", xml_declaration=True)
Here is the error I get:
ValueError: list.remove(x): x not in list
Thanks a lot for your help 💐
ElementFather.remove(ElementChild) works only if the ElementChild is a sub-element of ElementFather. In your case, you have to call remove from PrintSpace.
import xml.etree.ElementTree as ET
tree = ET.parse("file.xml")
root = tree.getroot()
ns = {'alto': 'http://www.loc.gov/standards/alto/ns-v4#'}
ET.register_namespace("", "http://www.loc.gov/standards/alto/ns-v4#")
for Test in root.findall('.//alto:TextBlock', ns):
PrintSpace = root.find('.//alto:PrintSpace',ns)
PrintSpace.remove(Test)
tree.write('out.xml', encoding="UTF-8", xml_declaration=True)
Note: This code is only an example of a working solution, for sure you can improve it.
I have the following Input XML:
<?xml version="1.0" encoding="utf-8"?>
<Scenario xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Scenario.xsd">
<TestCase>test_startup_0029</TestCase>
<ShortDescription>Restart of the EVC with missing ODO5 board.</ShortDescription>
<Events>
<Event Num="1">Switch on the EVC</Event>
</Events>
<HW-configuration>
<ELBE5A>true</ELBE5A>
<ELBE5K>false</ELBE5K>
</HW-configuration>
<SystemFailure>true</SystemFailure>
</Scenario>
My Program does add three Tags to the XML but they are formatted false.
The Output XML looks like the following:
<Scenario xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Scenario.xsd">
<TestCase>test_startup_0029</TestCase>
<ShortDescription>Restart of the EVC with missing ODO5 board.</ShortDescription>
<Events>
<Event Num="1">Switch on the EVC</Event>
</Events>
<HW-configuration>
<ELBE5A>true</ELBE5A>
<ELBE5K>false</ELBE5K>
</HW-configuration>
<SystemFailure>true</SystemFailure>
<Duration>12</Duration><EVC-SW-Version>08.02.0001.0027</EVC-SW-Version><STAC-Release>08.02.0001.0027</STAC-Release></Scenario>
Thats my Source-Code:
class XmlManager:
#staticmethod
def write_xml(xml_path, duration, evc_sw_version):
xml_path = os.path.abspath(xml_path)
if os.path.isfile(xml_path) and xml_path.endswith(".xml"):
# parse XML into etree
root = etree.parse(xml_path).getroot()
# add tags
duration_tag = etree.SubElement(root, "Duration")
duration_tag.text = duration
sw_version_tag = etree.SubElement(root, "EVC-SW-Version")
sw_version_tag.text = evc_sw_version
stac_release = evc_sw_version
stac_release_tag = etree.SubElement(root, "STAC-Release")
stac_release_tag.text = stac_release
# write changes to the XML-file
tree = etree.ElementTree(root)
tree.write(xml_path, pretty_print=False)
else:
XmlManager.logger.log("Invalid path to XML-file")
def main():
xml = r".\Test_Input_Data_Base\blnmerf1_md1czjyc_REL_V_08.01.0001.000x\Test_startup_0029\Test_startup_0029.xml"
XmlManager.write_xml(xml, "12", "08.02.0001.0027")
My Question is how to add the new tags to the XML in the right format. I guess its working that way for parsing again the changed XML but its not nice formated. Any Ideas? Thanks in advance.
To ensure nice pretty-printed output, you need to do two things:
Parse the input file using an XMLParser object with remove_blank_text=True.
Write the output using pretty_print=True
Example:
from lxml import etree
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse("Test_startup_0029.xml", parser)
root = tree.getroot()
duration_tag = etree.SubElement(root, "Duration")
duration_tag.text = "12"
sw_version_tag = etree.SubElement(root, "EVC-SW-Version")
sw_version_tag.text = "08.02.0001.0027"
stac_release_tag = etree.SubElement(root, "STAC-Release")
stac_release_tag.text = "08.02.0001.0027"
tree.write("output.xml", pretty_print=True)
Contents of output.xml:
<Scenario xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Scenario.xsd">
<TestCase>test_startup_0029</TestCase>
<ShortDescription>Restart of the EVC with missing ODO5 board.</ShortDescription>
<Events>
<Event Num="1">Switch on the EVC</Event>
</Events>
<HW-configuration>
<ELBE5A>true</ELBE5A>
<ELBE5K>false</ELBE5K>
</HW-configuration>
<SystemFailure>true</SystemFailure>
<Duration>12</Duration>
<EVC-SW-Version>08.02.0001.0027</EVC-SW-Version>
<STAC-Release>08.02.0001.0027</STAC-Release>
</Scenario>
See also http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output.
I am trying to parse an xml and get the value of dir_path as below,however I dont seem to get the desired output,whats wrong here and how to fix it?
input.xml
<?xml version="1.0" ?>
<data>
<software>
<name>xyz</name>
<role>xyz</role>
<future>unknown</future>
</software>
<software>
<name>abc</name>
<role>abc</role>
<future>clear</future>
<dir_path cmm_root_path_var="COMP_softwareROOT">\\location\software\INR\</dir_path>
<loadit reduced="true">
<RW>yes</RW>
<readonly>R/</readonly>
</loadit>
<upload reduced="true">
</upload>
</software>
<software>
<name>def</name>
<role>def</role>
<future>clear</future>
<dir_path cmm_root_path_var="COMP2_softwareROOT">\\location1\software\INR\</dir_path>
<loadit reduced="true">
<RW>yes</RW>
<readonly>R/</readonly>
</loadit>
<upload reduced="true">
</upload>
</software>
</data>
CODE:-
tree = ET.parse(input.xml)
root = tree.getroot()
dir_path = root.find(".//dir_path")
print dir_path.text
OUTPUT:-
.\
EXPECTED OUTPUT:-
\\location\software\INR\
Try the following:
from xml.etree import ElementTree as ET
tree = ET.parse('filename.xml')
item = tree.find('software/[name="abc"]/dir_path')
print(item.text if item is not None else None)
Im trying to take two elements from one file (file1.xml), and write them onto the end of another file (file2.xml). I am able to get them to print out, but am stuck trying to write them onto file2.xml! Help !
filename = "file1.xml"
appendtoxml = "file2.xml"
output_file = appendtoxml.replace('.xml', '') + "_editedbyed.xml"
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(filename, parser)
etree.tostring(tree)
root = tree.getroot()
a = root.findall(".//Device")
b = root.findall(".//Speaker")
for r in a:
print etree.tostring(r)
for e in b:
print etree.tostring(e)
NewSub = etree.SubElement (root, "Audio(just writes audio..")
print NewSub
I want the results of a, b to be added onto the end of outputfile.xml in the root.
Parse both the input file and the file you wish to append to.
Use root.append(elt) to append Element, elt, to root.
Then use tree.write to write the new tree to a file (e.g. appendtoxml):
Note: The links above point to documentation for xml.etree from the standard
library. Since lxml's API tries to be compatible with the standard library's
xml.etree, the standard library documentation applies to lxml as well (at
least for these methods). See http://lxml.de/api.html for information on where
the APIs differ.
import lxml.etree as ET
filename = "file1.xml"
appendtoxml = "file2.xml"
output_file = appendtoxml.replace('.xml', '') + "_editedbyed.xml"
parser = ET.XMLParser(remove_blank_text=True)
tree = ET.parse(filename, parser)
root = tree.getroot()
out_tree = ET.parse(appendtoxml, parser)
out_root = out_tree.getroot()
for path in [".//Device", ".//Speaker"]:
for elt in root.findall(path):
out_root.append(elt)
out_tree.write(output_file, pretty_print=True)
If file1.xml contains
<?xml version="1.0"?>
<root>
<Speaker>boozhoo</Speaker>
<Device>waaboo</Device>
<Speaker>anin</Speaker>
<Device>gigiwishimowin</Device>
</root>
and file2.xml contains
<?xml version="1.0"?>
<root>
<Speaker>jubal</Speaker>
<Device>crane</Device>
</root>
then file2_editedbyed.xml will contain
<root>
<Speaker>jubal</Speaker>
<Device>crane</Device>
<Device>waaboo</Device>
<Device>gigiwishimowin</Device>
<Speaker>boozhoo</Speaker>
<Speaker>anin</Speaker>
</root>