Append xml to existing xml in python - python

I have an XML file as:
<a>
<b>
<c>
<condition>
....
</condition>
</c>
</b>
</a>
I have another XML in string type as :
<condition>
<comparison compare="and">
<operand idref="Agent" type="boolean" />
<comparison compare="lt">
<operand idref="Premium" type="float" />
<operand type="int" value="10000" />
</comparison>
</comparison>
</condition>
I need to comment out the 'condition' block in the first XML and then insert this second XML in its place.
I have not tried commenting out the first block yet, but I did try to append the second XML to the first. The append works, but the '<' and '>' characters come out escaped as &lt; and &gt;, like this:
<a>
<b>
<c>
<condition>
....
</condition>
&lt;condition&gt;
&lt;comparison compare="and"&gt;
&lt;operand idref="Agent" type="boolean"/&gt;
&lt;comparison compare="lt"&gt;
&lt;operand idref="Premium" type="float"/&gt;
&lt;operand type="int" value="10000"/&gt;
&lt;/comparison&gt;
&lt;/comparison&gt;
&lt;/condition&gt;
How do I convert this back to < and > rather than &lt; and &gt;?
And how do I delete or comment out the <condition> block of the first XML, below which I will append the new XML?
tree = ET.parse('basexml.xml') #This is the xml where i will append
tree1 = etree.parse(open('newxml.xml')) # This is the xml to append
xml_string = etree.tostring(tree1, pretty_print = True) #converted the xml to string
tree.find('a/b/c').text = xml_string #updating the content of the path with this new string(xml)
I converted the 'newxml.xml' into a string 'xml_string' and then appended to the path a/b/c of the first xml

You are adding newxml.xml, as a string, to the text property of the <c> element. That does not work: text content is escaped on serialization, which is why '<' and '>' come out as &lt; and &gt;. You need to add an Element object as a child of <c>.
Here is how it can be done:
from xml.etree import ElementTree as ET
# Parse both files into ElementTree objects
base_tree = ET.parse("basexml.xml")
new_tree = ET.parse("newxml.xml")
# Get a reference to the "c" element (the parent of "condition")
c = base_tree.find(".//c")
# Remove old "condition" and append new one
old_condition = c.find("condition")
new_condition = new_tree.getroot()
c.remove(old_condition)
c.append(new_condition)
print(ET.tostring(base_tree.getroot(), encoding="unicode"))
Result:
<a>
<b>
<c>
<condition>
<comparison compare="and">
<operand idref="Agent" type="boolean" />
<comparison compare="lt">
<operand idref="Premium" type="float" />
<operand type="int" value="10000" />
</comparison>
</comparison>
</condition></c>
</b>
</a>
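The question also asks how to comment out the old block instead of deleting it. A minimal sketch of one way to do that with the standard library follows. It assumes the old block can be serialized and wrapped in ET.Comment; the inline BASE and NEW strings and the <old/> placeholder are made up for illustration, and the approach breaks if the old block itself contains "--", which is not allowed inside an XML comment.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for basexml.xml and newxml.xml
BASE = "<a><b><c><condition><old/></condition></c></b></a>"
NEW = ('<condition><comparison compare="and">'
       '<operand idref="Agent" type="boolean"/>'
       '</comparison></condition>')

base_root = ET.fromstring(BASE)
new_condition = ET.fromstring(NEW)

# Locate the parent and the position of the old block
c = base_root.find(".//c")
old_condition = c.find("condition")
index = list(c).index(old_condition)

# Serialize the old block and wrap the resulting text in a comment node
comment = ET.Comment(ET.tostring(old_condition, encoding="unicode"))

# Swap the old element for the comment, then add the new block after it
c.remove(old_condition)
c.insert(index, comment)
c.insert(index + 1, new_condition)

result = ET.tostring(base_root, encoding="unicode")
print(result)
```

When the result is parsed again with the default parser, the comment is ignored, so only the new <condition> element is visible to find() and findall().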

Related

How to retrieve all values of a specific attribute from sub-elements that contain this attribute?

I have the following XML file:
<main>
<node>
<party iot="00">Big</party>
<children type="me" value="3" iot="A">
<p>
<display iot="B|S">
<figure iot="FF"/>
</display>
</p>
<li iot="C"/>
<ul/>
</children>
</node>
<node>
<party iot="01">Small</party>
<children type="me" value="1" iot="N">
<p>
<display iot="T|F">
<figure iot="MM"/>
</display>
</p>
</children>
</node>
</main>
How can I retrieve all values of iot attribute from sub-elements of children of the first node? I need to retrieve the values of iot as a list.
The expected result:
iot_list = ['A','B|S','FF','C']
This is my current code:
import xml.etree.ElementTree as ET
mytree = ET.parse("file.xml")
myroot = mytree.getroot()
list_nodes = myroot.findall('node')
for n in list_nodes:
    # ???
This is easier to do using the lxml library:
If the sample xml in your question represents the exact structure of the actual xml:
from lxml import etree
data = """[your xml above]"""
doc = etree.XML(data)
print(doc.xpath('//node[1]//*[not(self::party)][@iot]/@iot'))
More generically:
for t in doc.xpath('//node[1]//children'):
    print(t.xpath('.//descendant-or-self::*/@iot'))
In either case, the output should be
['A', 'B|S', 'FF', 'C']
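If installing lxml is not an option, the same list can probably be built with the standard library alone. This is a sketch that assumes the structure shown in the question: it takes the first <children> element and walks it in document order with iter(). The DATA string is a copy of the question's sample.

```python
import xml.etree.ElementTree as ET

DATA = """<main>
<node>
<party iot="00">Big</party>
<children type="me" value="3" iot="A">
<p><display iot="B|S"><figure iot="FF"/></display></p>
<li iot="C"/>
<ul/>
</children>
</node>
<node>
<party iot="01">Small</party>
<children type="me" value="1" iot="N">
<p><display iot="T|F"><figure iot="MM"/></display></p>
</children>
</node>
</main>"""

root = ET.fromstring(DATA)

# find() returns the first match, i.e. <children> of the first <node>
first_children = root.find("node/children")

# iter() yields the element itself and all descendants in document order
iot_list = [el.get("iot") for el in first_children.iter() if "iot" in el.attrib]
print(iot_list)
```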

XML Parsing with Python - find attribute value with Elementtree

I am working on XML parsing and I want to get the value of a specific attribute.
I have an XML file (see below) and I want to get the value of val in the second line, after lid="diagnosticEcgSpeed", which is -1.
<global>
<setting lid="diagnosticEcgSpeed" val="-1" pers="" res="" unit="mm/s">
<txt id="001041" description="" type="">Geschwindigkeit</txt>
<value lid="1" val="-1" text="50"/>
<value lid="2" val="-2" text="25"/>
<value lid="4" val="-4" text="12,5"/>
<!-- todo: only one value is needed -> use adult value -->
<preset i="-1" c="-1" a="-1" />
</setting>
<setting lid="diagnosticEcgScale" val="10" unit="mm/mV" pers="" res="">
<txt id="001040" description="" type="">Amplitudenskalierung</txt>
<value lid="2" val="2" />
<value lid="5" val="5" />
<value lid="10" val="10" />
<value lid="20" val="20" />
<!-- todo: only one value is needed -> use adult value -->
<preset i="10" c="10" a="10" />
</setting>
</global>
I tried so far this code:
import xml.etree.ElementTree as ET
tree = ET.parse('basics.xml')
root = tree.getroot()
y=root.find(".//*[@lid='diagnosticEcgSpeed']").attrib['val']
print(y)
And the return is
Traceback (most recent call last):
File "parsing_example.py", line 5, in <module>
y=root.find(".//*[@lid='diagnosticEcgSpeed']").attrib['val']
KeyError: 'val'
I don't understand what I am doing wrong when trying to get the value of val.
You can use the following XPath: .//setting[@lid='diagnosticEcgSpeed'] to retrieve the element and then read its attribute.
See the example below:
data = """
<global>
<setting lid="diagnosticEcgSpeed" val="-1" pers="" res="" unit="mm/s">
<txt id="001041" description="" type="">Geschwindigkeit</txt>
<value lid="1" val="-1" text="50"/>
<value lid="2" val="-2" text="25"/>
<value lid="4" val="-4" text="12,5"/>
<!-- todo: only one value is needed -> use adult value -->
<preset i="-1" c="-1" a="-1" />
</setting>
</global>
"""
import xml.etree.ElementTree as ET
tree = ET.fromstring(data)
y=tree.find(".//setting[@lid='diagnosticEcgSpeed']").attrib["val"]
print(y)
In your case if you want to extract this value directly from a file you can use the following:
import xml.etree.ElementTree as ET
tree = ET.parse('./basics.xml')
y=tree.find(".//setting[@lid='diagnosticEcgSpeed']").attrib['val']
print(y)
Which outputs:
-1
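Indexing .attrib raises KeyError when the matched element lacks the attribute, and find() returns None when nothing matches. A slightly more defensive variant is sketched below. The helper name setting_value and the trimmed DATA string are made up for illustration.

```python
import xml.etree.ElementTree as ET

DATA = """<global>
<setting lid="diagnosticEcgSpeed" val="-1" unit="mm/s">
<value lid="1" val="-1" text="50"/>
</setting>
<setting lid="diagnosticEcgScale" val="10" unit="mm/mV"/>
</global>"""


def setting_value(root, lid, default=None):
    """Return the val attribute of the <setting> with the given lid."""
    element = root.find(f".//setting[@lid='{lid}']")
    if element is None:
        # No <setting> with this lid exists
        return default
    # get() returns the default instead of raising KeyError
    return element.get("val", default)


root = ET.fromstring(DATA)
print(setting_value(root, "diagnosticEcgSpeed"))
print(setting_value(root, "doesNotExist"))
```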

Parsing XML with Python - Accessing Values

I have recently got a RaspberryPi and have started to learn Python. To begin with I want to parse an XML file and I am doing this via the untangle library.
My XML looks like:
<?xml version="1.0" encoding="utf-8"?>
<weatherdata>
<location>
<name>Katherine</name>
<type>Administrative division</type>
<country>Australia</country>
<timezone id="Australia/Darwin" utcoffsetMinutes="570" />
<location altitude="176" latitude="-14.65012" longitude="132.17414" geobase="geonames" geobaseid="7839404" />
</location>
<sun rise="2019-02-04T06:33:52" set="2019-02-04T19:16:15" />
<forecast>
<tabular>
<time from="2019-02-04T06:30:00" to="2019-02-04T12:30:00" period="1">
<!-- Valid from 2019-02-04T06:30:00 to 2019-02-04T12:30:00 -->
<symbol number="9" numberEx="9" name="Rain" var="09" />
<precipitation value="1.8" />
<!-- Valid at 2019-02-04T06:30:00 -->
<windDirection deg="314.8" code="NW" name="Northwest" />
<windSpeed mps="3.3" name="Light breeze" />
<temperature unit="celsius" value="26" />
<pressure unit="hPa" value="1005.0" />
</time>
<time from="2019-02-04T12:30:00" to="2019-02-04T18:30:00" period="2">
<!-- Valid from 2019-02-04T12:30:00 to 2019-02-04T18:30:00 -->
<symbol number="9" numberEx="9" name="Rain" var="09" />
<precipitation value="2.3" />
<!-- Valid at 2019-02-04T12:30:00 -->
<windDirection deg="253.3" code="WSW" name="West-southwest" />
<windSpeed mps="3.0" name="Light breeze" />
<temperature unit="celsius" value="29" />
<pressure unit="hPa" value="1005.0" />
</time>
</tabular>
</forecast>
</weatherdata>
From this I would like to be able to print out the from and to attributes of the <time> element as well as the value attribute in its child node <temperature>
I can correctly print out the temperature values if I run the Python script below:
for forecast in data.weatherdata.forecast.tabular.time:
    print (forecast.temperature['value'])
but if I run
for forecast in data.weatherdata.forecast.tabular:
    print ("time is " + forecast.time['from'] + "and temperature is " + forecast.time.temperature['value'])
I get an error:
print (forecast.time['from'] + forecast.time.temperature['value'])
TypeError: list indices must be integers, not str
Can anyone advise how I can correctly access these values?
forecast.time should be a list, as it does have multiple values, one for each <time> node.
Did you expect forecast.time['from'] to automatically aggregate that data?
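In other words, iterate over the individual <time> nodes and read both the attributes and the child from each one. With untangle that presumably means looping over data.weatherdata.forecast.tabular.time, as the first loop in the question does, and reading forecast['from'] inside the loop. The sketch below shows the same idea with the standard library's ElementTree rather than untangle, so that it runs without extra packages. The DATA string is a trimmed copy of the question's file.

```python
import xml.etree.ElementTree as ET

DATA = """<weatherdata>
<forecast>
<tabular>
<time from="2019-02-04T06:30:00" to="2019-02-04T12:30:00" period="1">
<temperature unit="celsius" value="26" />
</time>
<time from="2019-02-04T12:30:00" to="2019-02-04T18:30:00" period="2">
<temperature unit="celsius" value="29" />
</time>
</tabular>
</forecast>
</weatherdata>"""

root = ET.fromstring(DATA)

rows = []
# One iteration per <time> node, so each item is a single element, not a list
for time in root.findall("forecast/tabular/time"):
    temperature = time.find("temperature")
    rows.append((time.get("from"), time.get("to"), temperature.get("value")))

for start, end, value in rows:
    print(f"from {start} to {end}: temperature is {value}")
```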

How to skip the <Br /> element when combining text in XML using Python?

I've been trying to combine the text of all Content elements in an XML file using Python.
I succeeded in combining all the Content text, but I need to exclude any Content that sits directly below a <Br /> element.
A <Br /> element represents a line break (Enter) in Adobe InDesign.
This XML is exported from Adobe InDesign.
Here is an example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Root>
<Story>
<ParagraphStyleRange>
<CharacterStyleRange>
<Content>AAA</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Content>BBB</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Br />
<Content>CCC</Content>
<Br />
<Content>DDD</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Content>EEE</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Br />
<Content>FFF</Content>
<Br />
</CharacterStyleRange>
</ParagraphStyleRange>
</Story>
</Root>
And this is the result I want:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Root>
<Story>
<ParagraphStyleRange>
<CharacterStyleRange>
<Content>AAA</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Content>AAABBB</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Br />
<Content>CCC</Content>
<Br />
<Content>DDD</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Content>DDDEEE</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Br />
<Content>FFF</Content>
<Br />
</CharacterStyleRange>
</ParagraphStyleRange>
</Story>
</Root>
As you can see, I don't want to prepend a Content's text to the next one if there is a <Br /> element directly above the Content it would be added to.
In detail, the first Content element's text is AAA and the next one is BBB.
In this case AAA should be attached in front of BBB.
BBB is not attached in front of CCC because there is a <Br /> element directly above the CCC Content.
How can I recognize the <Br /> element so that I can skip it?
This is my code so far, but it doesn't work well:
import xml.etree.ElementTree as ET

tree = ET.parse("C:\\Br_test.xml")
root = tree.getroot()
for ParagraphStyleRange in root.findall('.//Story/ParagraphStyleRange'):
    CharacterStyleRange_count = len(ParagraphStyleRange.findall('CharacterStyleRange'))
    #print(CharacterStyleRange_count)
    if int(CharacterStyleRange_count) >= 2 :
        try :
            Content_collect = ''
            for CharacterStyleRange in ParagraphStyleRange.findall('CharacterStyleRange'):
                Br_count = len(CharacterStyleRange.findall('Br'))
                print(Br_count)
                if int(Br_count) == 0 :
                    for Content in CharacterStyleRange.findall('Content'):
                        Content_collect += Content.text
                        Content.text = str(Content_collect)
                        print(Content_collect)
            #---- Code to delete Contents that are attached to next one---
            #for CharacterStyleRange in ParagraphStyleRange.findall('CharacterStyleRange')[:-1]:
            #    for Content in CharacterStyleRange.findall('Content'):
            #        Content_remove = CharacterStyleRange.remove(Content)
        except:
            pass
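One possible reading of the rule described above, as a sketch only: walk each ParagraphStyleRange in document order, remember the original text of the most recent Content, and forget it whenever a Br is seen. A Content then receives a prefix only when the element directly before it (among Br and Content elements) was another Content. Two things are assumptions: the helper name merge_contents is made up, and the choice to prepend only the immediately preceding original text (not an accumulated string) is inferred from the expected output, since the question does not show a run of three or more Contents without a Br.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Root><Story><ParagraphStyleRange>
<CharacterStyleRange><Content>AAA</Content></CharacterStyleRange>
<CharacterStyleRange><Content>BBB</Content></CharacterStyleRange>
<CharacterStyleRange><Br /><Content>CCC</Content><Br /><Content>DDD</Content></CharacterStyleRange>
<CharacterStyleRange><Content>EEE</Content></CharacterStyleRange>
<CharacterStyleRange><Br /><Content>FFF</Content><Br /></CharacterStyleRange>
</ParagraphStyleRange></Story></Root>"""


def merge_contents(root):
    """Prepend the previous Content's text unless a Br sits in between."""
    for paragraph in root.iter("ParagraphStyleRange"):
        carry = None  # original text of the preceding Content, if any
        for elem in paragraph.iter():
            if elem.tag == "Br":
                # A line break resets the carry-over
                carry = None
            elif elem.tag == "Content":
                original = elem.text or ""
                if carry is not None:
                    elem.text = carry + original
                carry = original
    return root


root = merge_contents(ET.fromstring(SAMPLE))
texts = [content.text for content in root.iter("Content")]
print(texts)
```

To apply this to a file, parse it with ET.parse(), call the helper on tree.getroot(), and save with tree.write().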

python element tree iterparse filter nodes and children

I am trying to use ElementTree's iterparse function to filter nodes based on their text and write them to a new file. I am using iterparse because the input file is large (100+ MB).
input.xml
<xmllist>
<page id="1">
<title>movie title 1</title>
<text>this is a movie in theatres</text>
</page>
<page id="2">
<title>movie title 2</title>
<text>this is a horror film</text>
</page>
<page id="3">
<title></title>
<text>actor in film</text>
</page>
<page id="4">
<title>some other topic</title>
<text>nothing related</text>
</page>
</xmllist>
Expected output (all pages where the text has "movie" or "film" in them)
<xmllist>
<page id="1">
<title>movie title 1</title>
<text>this is a movie in theatres</text>
</page>
<page id="2">
<title>movie title 2</title>
<text>this is a horror film</text>
</page>
<page id="3">
<title></title>
<text>actor in film</text>
</page>
</xmllist>
Current code
import xml.etree.cElementTree as etree
from xml.etree.cElementTree import dump
output_file=open('/tmp/outfile.xml','w')
for event, elem in iter(etree.iterparse("/tmp/test.xml", events=('start','end'))):
    if event == "end" and elem.tag == "page": #need to add condition to search for strings
        output_file.write(elem)
        elem.clear()
How do I add the regular expression to filter based on page's text attribute?
You're looking for a child, not an attribute, so it's simplest to analyze the title as it "passes by" in the iteration and remember the result until you reach the end of the enclosing page:
import re
good_page = False
for event, elem in etree.iterparse("/tmp/test.xml", events=('start','end')):
    if event == 'end':
        if elem.tag == 'title':
            # elem.text is None for an empty <title></title>
            good_page = re.search(r'film|movie', elem.text or '')
        elif elem.tag == 'page':
            if good_page:
                # write() needs a string, so serialize the element first
                output_file.write(etree.tostring(elem, encoding='unicode'))
            good_page = False
            elem.clear()
The re.search will return None if not found, and the if treats that as false, so we're avoiding the writing of pages without a title as well as ones whose title's text does not match your desired RE.
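The expected output in the question also keeps page 3, whose title is empty but whose <text> child contains "film". A self-contained sketch that checks both children might look like the following. It listens only for 'end' events, so each <page> is complete when it is inspected. The names filter_pages, PATTERN and SAMPLE are made up for illustration, and the in-memory buffers stand in for the real input and output files.

```python
import io
import re
import xml.etree.ElementTree as ET

SAMPLE = b"""<xmllist>
<page id="1"><title>movie title 1</title><text>this is a movie in theatres</text></page>
<page id="2"><title>movie title 2</title><text>this is a horror film</text></page>
<page id="3"><title></title><text>actor in film</text></page>
<page id="4"><title>some other topic</title><text>nothing related</text></page>
</xmllist>"""

PATTERN = re.compile(r"film|movie")


def filter_pages(source, out):
    """Copy every <page> whose title or text matches PATTERN to out."""
    out.write("<xmllist>\n")
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "page":
            continue
        # Empty elements have text None, so fall back to an empty string
        texts = [child.text or "" for child in elem if child.tag in ("title", "text")]
        if any(PATTERN.search(text) for text in texts):
            elem.tail = "\n"
            out.write(ET.tostring(elem, encoding="unicode"))
        # Drop the page's children to keep memory use low on large files
        elem.clear()
    out.write("</xmllist>\n")


buffer = io.StringIO()
filter_pages(io.BytesIO(SAMPLE), buffer)
kept_ids = [page.get("id") for page in ET.fromstring(buffer.getvalue())]
print(kept_ids)
```

For a real file, pass the input path as source and an open text-mode file as out. The emptied <page> elements stay attached to the root in this sketch, so very large inputs may need them removed from the root as well.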
