Parsing XML with Python - Accessing Values - python

I have recently got a RaspberryPi and have started to learn Python. To begin with I want to parse an XML file and I am doing this via the untangle library.
My XML looks like:
<?xml version="1.0" encoding="utf-8"?>
<weatherdata>
<location>
<name>Katherine</name>
<type>Administrative division</type>
<country>Australia</country>
<timezone id="Australia/Darwin" utcoffsetMinutes="570" />
<location altitude="176" latitude="-14.65012" longitude="132.17414" geobase="geonames" geobaseid="7839404" />
</location>
<sun rise="2019-02-04T06:33:52" set="2019-02-04T19:16:15" />
<forecast>
<tabular>
<time from="2019-02-04T06:30:00" to="2019-02-04T12:30:00" period="1">
<!-- Valid from 2019-02-04T06:30:00 to 2019-02-04T12:30:00 -->
<symbol number="9" numberEx="9" name="Rain" var="09" />
<precipitation value="1.8" />
<!-- Valid at 2019-02-04T06:30:00 -->
<windDirection deg="314.8" code="NW" name="Northwest" />
<windSpeed mps="3.3" name="Light breeze" />
<temperature unit="celsius" value="26" />
<pressure unit="hPa" value="1005.0" />
</time>
<time from="2019-02-04T12:30:00" to="2019-02-04T18:30:00" period="2">
<!-- Valid from 2019-02-04T12:30:00 to 2019-02-04T18:30:00 -->
<symbol number="9" numberEx="9" name="Rain" var="09" />
<precipitation value="2.3" />
<!-- Valid at 2019-02-04T12:30:00 -->
<windDirection deg="253.3" code="WSW" name="West-southwest" />
<windSpeed mps="3.0" name="Light breeze" />
<temperature unit="celsius" value="29" />
<pressure unit="hPa" value="1005.0" />
</time>
</tabular>
</forecast>
</weatherdata>
From this I would like to be able to print out the from and to attributes of the <time> element as well as the value attribute in its child node <temperature>
I can correctly print out the temperature values if I run the Python script below:
for forecast in data.weatherdata.forecast.tabular.time:
print (forecast.temperature['value'])
but if I run
for forecast in data.weatherdata.forecast.tabular:
print ("time is " + forecast.time['from'] + "and temperature is " + forecast.time.temperature['value'])
I get an error:
print (forecast.time['from'] + forecast.time.temperature['value'])
TypeError: list indices must be integers, not str
Can anyone advise how I can correctly access these values?

forecast.time should be a list, as it does have multiple values, one for each <time> node.
Did you expect forecast.time['from'] to automatically aggregate that data?

Related

get default value if don't find elements with iterfind

I have these code to extract of xml file some elements:
for general in tree.iter('FOLDER'):
nameFolder = general.attrib.get('FOLDER_NAME')
for job_nodeOS in tablaGeneral.iterfind(".//JOB[#APPL_TYPE='OS']"):
listaOS.clear()
listaOS.append(job_name)
listaOS.append(nameFolder)
listaOS.append(daily)
for job_nodeOS3 in job_nodeOS.iterfind("ON"):
listaOS.append(job_nodeOS3.get('STMT',"NO APLICA"))
listaOS.append(job_nodeOS3.get('CODE',"NO APLICA"))
for job_nodeOS4 in job_nodeOS3.iterfind("DOMAIL"):
listaOS.append(job_nodeOS5.get('SUBJECT',"NO APLICA"))
listaOS.append(job_nodeOS5.get('MESSAGE',"NO APLICA"))
for variable_name in variablesOS:
variable_node = job_nodeOS.find(f"./VARIABLE[#NAME='{variable_name}']")
variable_value = variable_node.get("VALUE", default_value) if variable_node is not None else default_value
#print(job_name, variable_name.lstrip("%"), "=", variable_value)
listaOS.append(variable_value)
My problem is that if the clause for don't find any occurrences, I need listaOS add default values ('NO APLICA').
A piece of code xml:
<?xml version="1.0" encoding="utf-8"?>
<!--Exported at 11-06-2022 17:14:50-->
<DEFTABLE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Folder.xsd">
<FOLDER SERVER="PROD" VERSION="800" PLATFORM="UNIX" FOLDER_NAME="PALNF" >
<JOB ID="256" APPLICATION="HOUSE" SUBAPP="SERVER" JOBNAME="JOBA" APPL_TYPE="OS">
<SHOUT WHEN="LATESUB" TIME="0825" URGENCY="R" DEST="DESTINATION" MESSAGE="HI" DAYSOFFSET="0" />
<ON STMT="*" CODE="*NETWORK*">
<DOACTION ACTION="NOTOK" />
<DOMAIL URGENCY="U" DEST="EXMAPLE#EXMAPLE.COM" SUBJECT="SUBJECT" MESSAGE="HI" />
</ON>
</JOB>
<JOB ID="1" APPLICATION="OFFICE" SUBAPP="Google" JOBNAME="Google_Update_Task_Machine_UA" APPL_TYPE="OS">
<VARIABLE NAME="%%PARM1" VALUE="GoogleUpdate.exe" />
<VARIABLE NAME="%%PARM2" VALUE="/ua /installsource scheduler" />
</JOB>
</FOLDER>
<FOLDER SERVER="PROD" VERSION="800" PLATFORM="UNIX" FOLDER_NAME="PALNF_CALENDARIO">
<JOB ID="2" APPLICATION="APP" SUBAPP="SUB" JOBNAME="NOSCHEDULER" APPL_TYPE="OS" />
<JOB ID="3" APPLICATION="APP" SUBAPP="SUB" JOBNAME="NOSCHEDULER_CONMONHDAYS" APPL_TYPE="OS" />
</FOLDER>
</DEFTABLE>
Do you know how could I get that?
Thanks and sorry for my English!

python parse conditional multiple lines in text file

I would like to parse a text file, but with multiple conditions:
Input:
something<br />
something <br />
something<br />
Modifications made by xy (xy) on 2019/12/10 10:40:23<br />
location: A --> B<br />
something<br />
something<br />
something<br />
Modifications made by xz (xz) on 2020/01/17 11:11:59<br />
analyzer: C --> D<br />
analyzer: B --> D<br />
analyzer: G --> D<br />
location: E --> F<br />
something<br />
something<br />
something
Task:
I need to find the "location: x --> y" and date before the location. The txt file can contains unknown number of location change.
Required Output:
2019/12/10 10:40:23, location: A --> B
2020/01/17 11:11:59, location: E --> F
I tried some code eg.:
with open('log.txt', 'r') as searchfile:
for line in searchfile:
if 'location' in line:
print (line)
but only find the locations and I don't know how to find the dates for them.
Thank you in advance.
Just keep track of corresponding times and locations as such:
with open('log.txt', 'r') as searchfile:
time = None
for line in searchfile:
if line.startswith('Modifications made by'):
time = line.split('on')[-1].strip()
elif line.startswith('location') and time is not None:
print(f'{time}, {line}')

using python to convert XML to Json and trigger a post API request

im lookning for a solution to convert XML to Json and use the Json as the payload for post request.
I'm aiming for the following logic:
search for all root.listing.scedules.s and parse #s #d #p #c.
in root.listing.programs parse #t [p.id = #p (from scedules)] ->"Prime Discussion"
3, in root.listing.channels parse #c [c.id = #c (from scedules)] -> "mychannel"
once I have all the info parsed, I want to build a JSON containing all the params and send it using post request
I also look for a solution which will trigger multiple post APIs as the number of root.listing.scedules.s elements
{
"time":"{#s}",
"durartion":"{#d}",
"programID":"{#p}",
"title":"{#t}",
"channelName":"{#c}",
}
<?xml version="1.0" encoding="UTF-8"?>
<root>
<listings>
<schedules>
<s s="2019-09-26T00:00:00" d="1800" p="1569735" c="100007">
<f id="3" />
</s>
</schedules>
<programs>
<p id="1569735" t="Prime Discussion" d="Discussion on Current Affairs." rd="Discussion on Current Affairs." l="en">
<f id="2" />
<f id="21" />
<k id="6" v="20160614" />
<k id="1" v="2450548" />
<k id="18" v="12983658" />
<k id="21" v="12983658" />
<k id="10" v="Program" />
<k id="19" v="SH024505480000" />
<k id="20" v="http://tmsimg.com/assets/p12983658_b_h5_aa.jpg" />
<c id="607" />
<r o="1" r="1" n="100" />
<r o="2" r="1" n="1000" />
<r o="3" r="1" n="10000" />
</p>
</programs>
</listings>
<channels>
<c id="100007" c="mychannel" l="Prime Asia TV SD" d="Prime Asia TV SD" t="Digital" iso639="hi" />
<c id="10035" c="AETV" l="A&amp;E Canada" d="A&amp;E Canada" t="Digital" u="WWW.AETV.COM" iso639="en" />
</channels>
</root>
currently, i use this code to parse the scedules.s elements (part 1) and need some help with parts 2,3,4
import xml.etree.ElementTree as ET
tree = ET.parse('ChannelsProgramsTest.xml')
root = tree.getroot()
for sched in root[0][0].findall('s'):
new = sched.get('s'),sched.get('p'),sched.get('d'),sched.get('c')
print(new)
Below (I think it is the core solution you were looking for)
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<root>
<listings>
<schedules>
<s s="2019-09-26T00:00:00" d="1800" p="1569735" c="100007">
<f id="3" />
</s>
</schedules>
<programs>
<p id="1569735" t="Prime Discussion" d="Discussion on Current Affairs." rd="Discussion on Current Affairs." l="en">
<f id="2" />
<f id="21" />
<k id="6" v="20160614" />
<k id="1" v="2450548" />
<k id="18" v="12983658" />
<k id="21" v="12983658" />
<k id="10" v="Program" />
<k id="19" v="SH024505480000" />
<k id="20" v="http://tmsimg.com/assets/p12983658_b_h5_aa.jpg" />
<c id="607" />
<r o="1" r="1" n="100" />
<r o="2" r="1" n="1000" />
<r o="3" r="1" n="10000" />
</p>
</programs>
</listings>
<channels>
<c id="100007" c="mychannel" l="Prime Asia TV SD" d="Prime Asia TV SD" t="Digital" iso639="hi" />
<c id="10035" c="AETV" l="A&amp;E Canada" d="A&amp;E Canada" t="Digital" u="WWW.AETV.COM" iso639="en" />
</channels>
</root>'''
tree = ET.fromstring(xml)
listings = tree.findall('.//listings')
for entry in listings:
# This is the first requirement: find s,d,p,c under 's' element
s = entry.find('./schedules/s')
print(s.attrib)
# now that we have s,d,p,c we can move on and look for the program with a specific id
program = entry.find("./programs/p[#id='{}']".format(s.attrib['p']))
print(program.attrib['t'])
# find the channel
channel = tree.find(".//channels/c[#id='{}']".format(s.attrib['c']))
print(channel.attrib['c'])
output
{'s': '2019-09-26T00:00:00', 'd': '1800', 'p': '1569735', 'c': '100007'}
Prime Discussion
mychannel
I'm still somewhat new to Stackoverflow in usage but not years. I think this is a somewhat duplicate question but I do not know how to tag this as duplicate yet.
A very good explanation of XML to JSON via Python is in the following post by the author of the library suggested.
Converting XML to JSON using Python?
The data source may have unknown characters in it that you will need to code for if you don't use a library
ie, newlines, unicode characters, other 'stray' characters. Often libraries will have done this for you already and you don't have to re-invent the wheel.

How to pass <Br /> element when combine text in XML using python?

I've been trying to combine all text in the content element in XML using python.
I succeeded combining all content text but need to except content which is right below <'Br /> element.
<'Br /> element means Enter in adobe indesign program.
This XML is exported from adobe indesign.
This is example as follow :
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Root>
<Story>
<ParagraphStyleRange>
<CharacterStyleRange>
<Content>AAA</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Content>BBB</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Br />
<Content>CCC</Content>
<Br />
<Content>DDD</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Content>EEE</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Br />
<Content>FFF</Content>
<Br />
</CharacterStyleRange>
</ParagraphStyleRange>
</Story>
</Root>
and it's what i want as follow :
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Root>
<Story>
<ParagraphStyleRange>
<CharacterStyleRange>
<Content>AAA</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Content>AAABBB</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Br />
<Content>CCC</Content>
<Br />
<Content>DDD</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Content>DDDEEE</Content>
</CharacterStyleRange>
<CharacterStyleRange>
<Br />
<Content>FFF</Content>
<Br />
</CharacterStyleRange>
</ParagraphStyleRange>
</Story>
</Root>
As you see, i don't want to add content text to next one if there is <'Br /> element right above the content that i want to add.
In detail, the first Content element text is AAA and next one is BBB.
in this case AAA should be attched in front of BBB.
and BBB is not attached in front of CCC because there is <'Br /> element right above CCC Content.
Would you help me how to recognize the <'Br /> element to pass?
this is what i'am doing code so far, but it doesn't work well...
tree = ET.parse("C:\\Br_test.xml")
root = tree.getroot()
for ParagraphStyleRange in root.findall('.//Story/ParagraphStyleRange'):
CharacterStyleRange_count = len(ParagraphStyleRange.findall('CharacterStyleRange'))
#print(CharacterStyleRange_count)
if int(CharacterStyleRange_count) >= 2 :
try :
Content_collect = ''
for CharacterStyleRange in ParagraphStyleRange.findall('CharacterStyleRange'):
Br_count = len(CharacterStyleRange.findall('Br'))
print(Br_count)
if int(Br_count) == 0 :
for Content in CharacterStyleRange.findall('Content'):
Content_collect += Content.text
Content.text = str(Content_collect)
print(Content_collect)
#---- Code to delete Contents that are attached to next one---
#for CharacterStyleRange in ParagraphStyleRange.findall('CharacterStyleRange')[:-1]:
# for Content in CharacterStyleRange.findall('Content'):
# Content_remove = CharacterStyleRange.remove(Content)
except:
pass

how to edit XML files in batch / python

I'm trying to edit xml files in a batch / python script
this is my xml file:
<?xml version="1.0" encoding="UTF-8"?>
<task name="analyse">
<taskInfo taskId="21a09311-ade3-4e9a-af21-d13be8b7ba45" runAt="2015-05-20 13:48:50" runTime="5 minutes, 53 seconds">
<project name="13955 - HMI Volvo Truck PA15" number="e20d51c0-71dc-4572-8f9b-4c150bf35222" />
<language lcid="1031" name="German (Germany)" />
<tm name="ENG-DEU_en-GB_de-DE.sdltm" />
<settings reportInternalFuzzyLeverage="yes" reportLockedSegments="no" reportCrossFileRepetitions="yes" minimumMatchScore="70" searchMode="bestWins" missingFormattingPenalty="1" differentFormattingPenalty="1" multipleTranslationsPenalty="1" autoLocalizationPenalty="0" textReplacementPenalty="0" />
</taskInfo>
<file name="VT MAIN TRACK_PA15_Default_DE-DE_20150520_102527.xlf.sdlxliff" guid="111f9ba6-82f6-45fb-ac49-8bf6cf57c169">
<analyse>
<perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
<inContextExact segments="60" words="55" characters="755" placeables="3" tags="0" />
' Replace the Value word="55" with "0"
<exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
<locked segments="0" words="0" characters="0" placeables="0" tags="0" />
<crossFileRepeated segments="2" words="20" characters="0" placeables="0" tags="0" />
'Cut the value words="20" replace with 0
<repeated segments="17" words="34" characters="293" placeables="2" tags="0" />
'add the value to current value 20 to 34 so the new value is words="54"
<total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
<new segments="126" words="434" characters="2384" placeables="18" tags="5" />
<fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
<fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
<fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
<internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
<internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
<internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
</analyse>
</file>
<file name="VT MAIN TRACK_PA15_Default_DE-DE_20150523_254796.xlf.sdlxliff" guid="111f9ba6-82f6-45fb-ac49-8bf6cf57c169">
<analyse>
<perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
<inContextExact segments="60" words="67" characters="755" placeables="3" tags="0" />
' Replace the Value word="67" with "0"
<exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
<locked segments="0" words="0" characters="0" placeables="0" tags="0" />
<crossFileRepeated segments="2" words="35" characters="0" placeables="0" tags="0" />
'Cut the value words="35" replace with 0
<repeated segments="17" words="54" characters="293" placeables="2" tags="0" />
'add the value to current value 35 to 54 so the new value is words="89"
<total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
<new segments="126" words="434" characters="2384" placeables="18" tags="5" />
<fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
<fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
<fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
<internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
<internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
<internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
</analyse>
</file>
<batchTotal>
<analyse>
<perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
<inContextExact segments="60" words="139" characters="755" placeables="3" tags="0" />
<exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
<locked segments="0" words="0" characters="0" placeables="0" tags="0" />
<crossFileRepeated segments="0" words="0" characters="0" placeables="0" tags="0" />
<repeated segments="17" words="54" characters="293" placeables="2" tags="0" />
<total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
<new segments="126" words="434" characters="2384" placeables="18" tags="5" />
<fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
<fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
<fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
<internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
<internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
<internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
</analyse>
</batchTotal>
</task>
general notes:
the <task> is the root element (end element </task>)
the important here is to modify a few tags in a section called file <file> and endtag </file>
there can be X occurrences of <file>*</file>
What i need,
for each <file> element, i would like to:
In <inContextExact>, Set the value of the attribute words with 0
<inContextExact ... words="55" ... /> => <inContextExact ... words="0" ... />
In <crossFileRepeated>, Set the value of the attribute words with 0
<crossFileRepeated ... words="20" ... /> => <crossFileRepeated ... words="0" ... />
In <total>, Set the value of the words attribute to be calculated by my own logic
<total ... words="1462" ... /> => <total ... words="??" ... />
I could really appreciate an example of processing XML files in batch / python
Let's utilize python!
it's extremely easy to do that in python. and since you said it's ok to make a solution in python, check the script below.
here's how you can iterate over a directory contains xml files and process them as requested in python while saving the file changes.
from xml.etree import ElementTree
import os
def edit_xml_file(data):
e = ElementTree.fromstring(data)
for file_element in e.findall('file'):
analyse_element = file_element.find('analyse')
in_context_exact_element = analyse_element.find('inContextExact')
in_context_exact_words = int(in_context_exact_element.get('words'))
in_context_exact_element.set('words', '0')
cross_file_repeated_element = analyse_element.find('crossFileRepeated')
cross_file_repeated_words = int(cross_file_repeated_element.get('words'))
cross_file_repeated_element.set('words', '0')
total_element = analyse_element.find('total')
total_element.set('words', str(in_context_exact_words + cross_file_repeated_words))
xmlstr = ElementTree.tostring(e)
return xmlstr
def main():
source_directory = 'xmlfiles'
for filename in os.listdir(source_directory):
if not filename.endswith('.xml'):
continue
xml_file_path = os.path.join(source_directory, filename)
with open(xml_file_path, 'r+b') as f:
data = f.read()
fixed_data = edit_xml_file(data)
f.seek(0)
f.write(fixed_data)
f.truncate()
if __name__ == '__main__':
main()
in this solution, iv'e used the built in ElementTree utility
Necessary tools
Here are the necessary tools you will need to create a script in Excel VBA or VBscript:
Looping text files in a directory: link
Reading text files: link
Writing text files: link
Replacing using RegExp: link
Example Regex to get you going:
<exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
->
<exact segments="114" words="0" characters="1687" placeables="14" tags="3" />
Use this regex:
(words="[0-9]+?") or words="([0-9]+?)" even better
Below an example of processing a single row:
Dim re as RegExp
set re = new RegExp
re.Pattern = "words="([0-9]+?)"
newTextRow = re.Replace(textRow, 0) 'Replace word value with 0
The approach
Loop through your XML files using the Dir function
Read the contents of the file using the link above on how to read text files in VBA
Loop through all rows and use the RegExp function to replace the necessary word params
Save the output back to the XML file using the link above on how to write text files in VBA

Categories