how to get value of an xml element not directly under root

I am trying to parse an XML file and get the value of dir_path, as shown below. However, I don't get the desired output. What is wrong here, and how do I fix it?
input.xml
<?xml version="1.0" ?>
<data>
<software>
<name>xyz</name>
<role>xyz</role>
<future>unknown</future>
</software>
<software>
<name>abc</name>
<role>abc</role>
<future>clear</future>
<dir_path cmm_root_path_var="COMP_softwareROOT">\\location\software\INR\</dir_path>
<loadit reduced="true">
<RW>yes</RW>
<readonly>R/</readonly>
</loadit>
<upload reduced="true">
</upload>
</software>
<software>
<name>def</name>
<role>def</role>
<future>clear</future>
<dir_path cmm_root_path_var="COMP2_softwareROOT">\\location1\software\INR\</dir_path>
<loadit reduced="true">
<RW>yes</RW>
<readonly>R/</readonly>
</loadit>
<upload reduced="true">
</upload>
</software>
</data>
CODE:-
tree = ET.parse('input.xml')
root = tree.getroot()
dir_path = root.find(".//dir_path")
print dir_path.text
OUTPUT:-
.\
EXPECTED OUTPUT:-
\\location\software\INR\

Try the following:
from xml.etree import ElementTree as ET
tree = ET.parse('filename.xml')
item = tree.find('software[name="abc"]/dir_path')
print(item.text if item is not None else None)
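
As a minimal sketch of the same idea, the snippet below parses a trimmed copy of the question's input from a string (an assumption made so the example is self-contained) and looks up dir_path by the sibling name element. The helper name dir_path_for is hypothetical and not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's input.xml; a raw string keeps the backslashes intact.
XML = r"""<data>
  <software><name>xyz</name><role>xyz</role></software>
  <software>
    <name>abc</name>
    <dir_path cmm_root_path_var="COMP_softwareROOT">\\location\software\INR\</dir_path>
  </software>
  <software>
    <name>def</name>
    <dir_path cmm_root_path_var="COMP2_softwareROOT">\\location1\software\INR\</dir_path>
  </software>
</data>"""

root = ET.fromstring(XML)


def dir_path_for(root, name):
    """Hypothetical helper: dir_path text of the <software> whose <name> is `name`, or None."""
    node = root.find("software[name='{}']/dir_path".format(name))
    return node.text if node is not None else None


# iter() walks the whole tree, so it also finds elements that are not directly under the root.
all_paths = [elem.text for elem in root.iter("dir_path")]

print(dir_path_for(root, "abc"))
print(all_paths)
```

The predicate form selects one specific software entry, while iter() is the simpler choice when every dir_path in the file is wanted.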

Related

xml.etree.ElementTree .remove

I'm trying to remove tags from an ALTO XML file with remove().
My ALTO file looks like this:
<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-2.xsd">
<Description>
<MeasurementUnit>pixel</MeasurementUnit>
<sourceImageInformation>
<fileName>filename</fileName>
</sourceImageInformation>
</Description>
<Layout>
<Page>
<PrintSpace>
<TextBlock>
<Shape><Polygon/></Shape>
<TextLine>
<Shape><Polygon/></Shape>
<String CONTENT="ABCDEF" HPOS="1234" VPOS="1234" WIDTH="1234" HEIGHT="1234" />
</TextLine>
</TextBlock>
</PrintSpace>
</Page>
</Layout>
</alto>
And my code is:
import xml.etree.ElementTree as ET
tree = ET.parse("file.xml")
root = tree.getroot()
ns = {'alto': 'http://www.loc.gov/standards/alto/ns-v4#'}
ET.register_namespace("", "http://www.loc.gov/standards/alto/ns-v4#")
for Test in root.findall('.//alto:TextBlock', ns):
    root.remove(Test)
tree.write('out.xml', encoding="UTF-8", xml_declaration=True)
Here is the error I get:
ValueError: list.remove(x): x not in list
Thanks a lot for your help 💐

parent.remove(child) works only if child is a direct sub-element of parent. In your case the TextBlock elements are children of PrintSpace, not of the root, so you have to call remove() on PrintSpace.
import xml.etree.ElementTree as ET
tree = ET.parse("file.xml")
root = tree.getroot()
ns = {'alto': 'http://www.loc.gov/standards/alto/ns-v4#'}
ET.register_namespace("", "http://www.loc.gov/standards/alto/ns-v4#")
for Test in root.findall('.//alto:TextBlock', ns):
    PrintSpace = root.find('.//alto:PrintSpace', ns)
    PrintSpace.remove(Test)
tree.write('out.xml', encoding="UTF-8", xml_declaration=True)
Note: This code is only an example of a working solution, for sure you can improve it.
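
One possible improvement, sketched here under the assumption that a file may contain several PrintSpace elements, is to build a child-to-parent map first and remove each TextBlock from its actual parent. The sample document and the name parent_of are illustrative only; they are not from the original question.

```python
import xml.etree.ElementTree as ET

ns = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Illustrative sample with two PrintSpace parents (not the asker's real file).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page>
    <PrintSpace><TextBlock><TextLine/></TextBlock></PrintSpace>
    <PrintSpace><TextBlock/><TextBlock/></PrintSpace>
  </Page></Layout>
</alto>"""

root = ET.fromstring(SAMPLE)

# ElementTree elements do not store a reference to their parent,
# so build a child -> parent lookup once for the whole tree.
parent_of = {child: parent for parent in root.iter() for child in parent}

# findall() returns a list, so removing elements while looping over it is safe.
for text_block in root.findall(".//alto:TextBlock", ns):
    parent_of[text_block].remove(text_block)

remaining = root.findall(".//alto:TextBlock", ns)
print(len(remaining))
```

This avoids hard-coding which element is the parent, at the cost of one extra pass over the tree.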

parsing serial numbers tags from xml python

I have an XML file; a shortened version is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<DATA>
<_1>
<member_id>AFCE6DB97D4CD67D</member_id>
</_1>
<_2>
<member_id>AFCE6DB97D4CD67D</member_id>
</_2>
</DATA>
I am using the following code to parse it:
tree = ElementTree.parse(args['inputxml'])
root = tree.getroot()
for dat in root:
    memberID = dat.find('member_id').text
I am able to parse the member ID, but I am not sure how to parse the serial-number tags <_1>, <_2>, etc. This number keeps growing with every new record in the XML.

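You could use lxml's xpath():
from lxml import etree
# A bytes literal is used because lxml rejects str input that carries an encoding declaration.
xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<DATA>
<_1>
<member_id>AFCE6DB97D4CD67D</member_id>
</_1>
<_2>
<member_id>AFCE6DB97D4CD67D</member_id>
</_2>
</DATA>"""
root = etree.fromstring(xml)
members = root.xpath("//member_id")
for m in members:
    # The parent's tag is the serial-number element, e.g. _1
    print(m.text, m.getparent().tag)
This prints:
AFCE6DB97D4CD67D _1
AFCE6DB97D4CD67D _2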
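
The same information is also reachable with the standard library alone, since the loop variable in the question's code already is the serial-number element. The sketch below assumes every record tag has the form of an underscore followed by digits, as in the sample; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<DATA>
  <_1><member_id>AFCE6DB97D4CD67D</member_id></_1>
  <_2><member_id>AFCE6DB97D4CD67D</member_id></_2>
</DATA>"""

root = ET.fromstring(XML)

records = []
for dat in root:
    # dat.tag is the element name itself, e.g. "_1"; strip the underscore to get the number.
    # Assumption: tags always look like "_<digits>", as in the question's sample.
    serial = int(dat.tag.lstrip("_"))
    member_id = dat.findtext("member_id")
    records.append((serial, member_id))

print(records)
```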

Python add Tags to XML using lxml

I have the following Input XML:
<?xml version="1.0" encoding="utf-8"?>
<Scenario xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Scenario.xsd">
<TestCase>test_startup_0029</TestCase>
<ShortDescription>Restart of the EVC with missing ODO5 board.</ShortDescription>
<Events>
<Event Num="1">Switch on the EVC</Event>
</Events>
<HW-configuration>
<ELBE5A>true</ELBE5A>
<ELBE5K>false</ELBE5K>
</HW-configuration>
<SystemFailure>true</SystemFailure>
</Scenario>
My program adds three tags to the XML, but they are badly formatted.
The Output XML looks like the following:
<Scenario xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Scenario.xsd">
<TestCase>test_startup_0029</TestCase>
<ShortDescription>Restart of the EVC with missing ODO5 board.</ShortDescription>
<Events>
<Event Num="1">Switch on the EVC</Event>
</Events>
<HW-configuration>
<ELBE5A>true</ELBE5A>
<ELBE5K>false</ELBE5K>
</HW-configuration>
<SystemFailure>true</SystemFailure>
<Duration>12</Duration><EVC-SW-Version>08.02.0001.0027</EVC-SW-Version><STAC-Release>08.02.0001.0027</STAC-Release></Scenario>
That's my source code:
import os
from lxml import etree

class XmlManager:
    @staticmethod
    def write_xml(xml_path, duration, evc_sw_version):
        xml_path = os.path.abspath(xml_path)
        if os.path.isfile(xml_path) and xml_path.endswith(".xml"):
            # parse XML into etree
            root = etree.parse(xml_path).getroot()
            # add tags
            duration_tag = etree.SubElement(root, "Duration")
            duration_tag.text = duration
            sw_version_tag = etree.SubElement(root, "EVC-SW-Version")
            sw_version_tag.text = evc_sw_version
            stac_release = evc_sw_version
            stac_release_tag = etree.SubElement(root, "STAC-Release")
            stac_release_tag.text = stac_release
            # write changes to the XML-file
            tree = etree.ElementTree(root)
            tree.write(xml_path, pretty_print=False)
        else:
            XmlManager.logger.log("Invalid path to XML-file")

def main():
    xml = r".\Test_Input_Data_Base\blnmerf1_md1czjyc_REL_V_08.01.0001.000x\Test_startup_0029\Test_startup_0029.xml"
    XmlManager.write_xml(xml, "12", "08.02.0001.0027")
My question is how to add the new tags to the XML in the right format. I guess the changed XML can still be parsed again as it is, but it is not nicely formatted. Any ideas? Thanks in advance.

To ensure nice pretty-printed output, you need to do two things:
1. Parse the input file using an XMLParser object with remove_blank_text=True.
2. Write the output using pretty_print=True.
Example:
from lxml import etree
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse("Test_startup_0029.xml", parser)
root = tree.getroot()
duration_tag = etree.SubElement(root, "Duration")
duration_tag.text = "12"
sw_version_tag = etree.SubElement(root, "EVC-SW-Version")
sw_version_tag.text = "08.02.0001.0027"
stac_release_tag = etree.SubElement(root, "STAC-Release")
stac_release_tag.text = "08.02.0001.0027"
tree.write("output.xml", pretty_print=True)
Contents of output.xml:
<Scenario xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Scenario.xsd">
<TestCase>test_startup_0029</TestCase>
<ShortDescription>Restart of the EVC with missing ODO5 board.</ShortDescription>
<Events>
<Event Num="1">Switch on the EVC</Event>
</Events>
<HW-configuration>
<ELBE5A>true</ELBE5A>
<ELBE5K>false</ELBE5K>
</HW-configuration>
<SystemFailure>true</SystemFailure>
<Duration>12</Duration>
<EVC-SW-Version>08.02.0001.0027</EVC-SW-Version>
<STAC-Release>08.02.0001.0027</STAC-Release>
</Scenario>
See also http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output.
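
If lxml is not available, a similar result can be sketched with the standard library. This assumes Python 3.9 or later, where xml.etree.ElementTree.indent() exists. The input below is a shortened stand-in for the question's file, not the real data.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the Scenario file from the question.
SOURCE = """<Scenario>
  <TestCase>test_startup_0029</TestCase>
  <SystemFailure>true</SystemFailure>
</Scenario>"""

root = ET.fromstring(SOURCE)

# Append the three new tags; at this point they carry no surrounding whitespace.
for tag, text in [("Duration", "12"),
                  ("EVC-SW-Version", "08.02.0001.0027"),
                  ("STAC-Release", "08.02.0001.0027")]:
    ET.SubElement(root, tag).text = text

# indent() rewrites whitespace-only text and tails in place (Python 3.9+).
ET.indent(root, space="  ")
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Unlike the lxml approach, no special parser option is needed here, because indent() overwrites the existing whitespace-only text nodes itself.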

python extract xml element value to csv

I'm new to Python, so please bear with me as I try to explain what I am trying to do.
Here is my XML:
<?xml version="1.0"?>
<playlist>
<list>
<txdate>2015-10-30</txdate>
<channel>cake</channel>
<name>Play List</name>
</list>
<eventlist>
<event type="MEDIA">
<title>title1</title>
<starttype>FIX</starttype>
<mediaid>a</mediaid>
<onairtime>2015-10-30T13:30:00:00</onairtime>
<som>00:00:40:03</som>
<duration>01:15:47:15</duration>
<reconcilekey>123</reconcilekey>
<category>PROGRAM</category>
<subtitles>
<cap>CLOSED</cap>
<file>a</file>
<lang>ENG</lang>
<lang>GER</lang>
</subtitles>
</event>
<event type="MEDIA">
<title>THREE DAYS AND A CHILD</title>
<mediaid>b</mediaid>
<onairtime>2015-10-30T14:45:47:15</onairtime>
<som>00:00:00:00</som>
<duration>01:19:41:07</duration>
<reconcilekey>321</reconcilekey>
<category>PROGRAM</category>
<subtitles>
<cap>CLOSED</cap>
<file>b</file>
<lang>ENG</lang>
<lang>GER</lang>
</subtitles>
</event>
</eventlist>
</playlist>
I would like to print all the mediaid values to a file.
This is my code so far:
import os
import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
root = tree.getroot()
wfile = 'new.csv'
for child in root:
    child.find( "media type" )
    for x in child.iter("mediaid"):
        file = open(wfile, 'a')
        file.write(str(x))
        file.close
I tried this with a few other nonstandard libraries, but I didn't have much success.

For your requirement (as mentioned in the comments):
just the mediaid from each <event type="MEDIA">
you should use the findall() method of ElementTree to get all the event elements with type="MEDIA", and then get the child mediaid element from each. Example:
import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
root = tree.getroot()
with open('new.csv', 'w') as outfile:
    # @type selects on the element's "type" attribute
    for elem in root.findall('.//event[@type="MEDIA"]'):
        mediaidelem = elem.find('./mediaid')
        if mediaidelem is not None:
            outfile.write("{}\n".format(mediaidelem.text))
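
If more than one column is wanted, the csv module handles quoting and delimiters. The following is a rough sketch only: the title column, the event with type="OTHER", and the helper name write_media_rows are assumptions added for illustration, and an in-memory buffer stands in for new.csv.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Reduced sample; the type="OTHER" event is invented to show the attribute filter.
XML = """<playlist><eventlist>
  <event type="MEDIA"><title>title1</title><mediaid>a</mediaid></event>
  <event type="MEDIA"><title>THREE DAYS AND A CHILD</title><mediaid>b</mediaid></event>
  <event type="OTHER"><title>skipped</title><mediaid>c</mediaid></event>
</eventlist></playlist>"""

root = ET.fromstring(XML)


def write_media_rows(root, out):
    """Hypothetical helper: write one CSV row per <event type="MEDIA">."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["mediaid", "title"])
    for event in root.findall('.//event[@type="MEDIA"]'):
        # findtext() returns the default instead of raising when a child is missing.
        writer.writerow([event.findtext("mediaid", default=""),
                         event.findtext("title", default="")])


buffer = io.StringIO()  # swap for open('new.csv', 'w', newline='') to write a real file
write_media_rows(root, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```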

how can I select all descendants of a certain element with ElementTree in Python 3.3?

This is the sample data.
input.xml
<root>
<entry id="1">
<headword>go</headword>
<example>I <hw>go</hw> to school.</example>
</entry>
</root>
I'd like to put the <example> node and its descendants into an <examplegrp> element. That is,
output.xml
<root>
<entry id="1">
<headword>go</headword>
<examplegrp>
<example>I <hw>go</hw> to school.</example>
</examplegrp>
</entry>
</root>
My poor and incomplete script is:
import codecs
import xml.etree.ElementTree as ET
fin = codecs.open(r'input.xml', 'rb', encoding='utf-8')
data = ET.parse(fin)
root = data.getroot()
example = root.find('.//example')
for elem in example.iter():
    # ---and then I don't know what to do---

Here's an example of how it can be done:
text = """
<root>
<entry id="1">
<headword>go</headword>
<example>I <hw>go</hw> to school.</example>
</entry>
</root>
"""
import io
import lxml.etree
data = lxml.etree.parse(io.StringIO(text))
root = data.getroot()
for entry in root.xpath('//example/ancestor::entry[1]'):
    examplegrp = lxml.etree.SubElement(entry, "examplegrp")
    nodes = [node for node in entry.xpath('./example')]
    for node in nodes:
        # moving the node also moves its descendants
        entry.remove(node)
        examplegrp.append(node)
print(lxml.etree.tostring(root, pretty_print=True).decode())
which will output:
<root>
<entry id="1">
<headword>go</headword>
<examplegrp><example>I <hw>go</hw> to school.</example>
</examplegrp></entry>
</root>
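
Since the question asks about ElementTree specifically, here is a rough standard-library sketch of the same move. It assumes every example child of an entry should go into a single examplegrp wrapper, placed where the first example was; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <entry id="1">
    <headword>go</headword>
    <example>I <hw>go</hw> to school.</example>
  </entry>
</root>"""

root = ET.fromstring(XML)

for entry in root.iter("entry"):
    examples = entry.findall("example")  # direct children only
    if not examples:
        continue
    # Put the wrapper at the position of the first <example>.
    position = list(entry).index(examples[0])
    group = ET.Element("examplegrp")
    entry.insert(position, group)
    for example in examples:
        entry.remove(example)
        group.append(example)  # descendants such as <hw> move with their parent

result = ET.tostring(root, encoding="unicode")
print(result)
```

No explicit loop over the descendants is needed, because an element carries its whole subtree with it when it is moved.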

http://docs.python.org/3/library/xml.dom.html?highlight=xml#node-objects
http://docs.python.org/3/library/xml.dom.html?highlight=xml#document-objects
You probably want to follow the paradigm of creating an element on the Document and appending each result to it.
# `document` is the parsed xml.dom Document; `founds` holds the nodes you located
group = document.createElement(tagName)
for found in founds:
    group.appendChild(found)
Or something like this.
