Python XML file creation using a loop and assigning values to subelements - python

I'm using the xml.etree.ElementTree module with Python 3.6 to create an XML file with dozens of subelements. What I'm aiming for should look like this:
<shots>
<shot id="0">
<Audio_Channels>2</Audio_Channels>
<Audio_File>testhq12.mov</Audio_File>
<Audio_Fps>Unspecified</Audio_Fps>
...
<Type>C</Type>
<Width>4096</Width>
</shot>
<shot id="1">
....
</shots>
So far I've been using the following code to create this structure, but it gets very ugly when there are a lot of 'sub-fields' to add:
_audio_channels = Element('Audio_Channels')
shot.append(_audio_channels)
_audio_channels.text = str(audio_channels_data)
_audio_file = Element('Audio_File')
shot.append(_audio_file)
_audio_file.text = str(audio_file_data)
.
.
.
And so I've tried to simplify it with a loop looking somewhat like this:
fields = ['Audio_Channels', 'Audio_File', 'Audio_Fps', ...]
for k in fields:
    prop = Element(k)
    shot.append(prop)
But I have no idea how to assign any text to them later on, using only the elements from the fields list as a sort of key.
I tried this, but it's not working:
shot.insert(str(audio_file_data), 'Audio_File')

If I understand correctly what you are after, try something like this:
import xml.etree.ElementTree as ET
fields = ['Audio_Channels', 'Audio_File', 'Audio_Fps']
dats = [ 2,'testhq12.mov', 'Unspecified']
shots = ET.Element('shots')
shot = ET.SubElement(shots, 'shot')
for f, d in zip(fields, dats):
    elem = ET.Element(f)
    elem.text = str(d)
    shot.append(elem)
The output should look something like:
<shots>
<shot>
<Audio_Channels>2</Audio_Channels>
<Audio_File>testhq12.mov</Audio_File>
<Audio_Fps>Unspecified</Audio_Fps>
</shot>
</shots>
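If the values are kept alongside their field names, a dictionary can drive the loop, and ET.SubElement can create and attach each child in one call. The following is a minimal sketch under that assumption; `shot_data`, the `id` value, and the replacement value "25" are illustrative and not part of the original code. It also shows one way to set a child's text later by looking it up by tag name with `find`.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-shot data: field name -> value (sample values from the question).
shot_data = {
    "Audio_Channels": 2,
    "Audio_File": "testhq12.mov",
    "Audio_Fps": "Unspecified",
}

shots = ET.Element("shots")
# Attributes such as id can be passed as keyword arguments.
shot = ET.SubElement(shots, "shot", id="0")

for name, value in shot_data.items():
    # SubElement creates the child and appends it to shot in one step.
    ET.SubElement(shot, name).text = str(value)

xml_text = ET.tostring(shots, encoding="unicode")
print(xml_text)

# Later on, a child can be found by its tag name and given new text.
shot.find("Audio_Fps").text = "25"
```

Because `find` returns the first direct child with the given tag, the field names themselves act as the keys the question asks about.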

Related

Make loop temp variable correlate with index instead of string

So I have this program that reads lines from a file and inserts them into a list line by line (not pictured in code.) My objective is to find a specific start index indicated by a number surrounded by XML formats and find a specific end index indicated by a "/Invoice". I am able to successfully find these indexes using the start_indexes and end_indexes functions I created below.
I was informed of (and experienced firsthand) the dangers of using del on a list inside a loop; otherwise that solution would have been perfect. Instead, I was advised to add everything I wanted to delete to a new list, which I would then somehow use to delete the difference from my original list.
With that context being given, my question is as follows:
What is the best way to accomplish what I am trying to do with the def deletion_list()?
I am aware that the "lines" in lst_file are strings, and I am attempting to compare them to indexes. That's where I am stumped; I don't know a way to convert the temp variable that is a string and make it into an index so the function works as I expect, or if there is a better way to do it.
start_indexes = []
for i in str_lst:
    invoice_index_start = lst_file.index('<InvoiceNumber>' + i + '</InvoiceNumber>\n')
    start_indexes.append(invoice_index_start)
end_indexes = []
constant = '</Invoice>\n'
for i in range(0,len(start_indexes)):
    invoice_index_end = lst_file.index(constant, start_indexes[i])
    end_indexes.append(invoice_index_end + 1)
result = []
def deletion_list():
    for lines in lst_file:
        if lst_file[] > lst_file[invoice_index_start] and lst_file[] < lst_file[invoice_index_end]
            result.append(lines)
    return lst_file
I assume your file, Invoice_1.xml, looks similar to the input below, and that you want to remove InvoiceNumber 2 and 4.
Input:
<?xml version="1.0" encoding="utf-8"?>
<root>
<Invoices>
<InvoiceNumber>1</InvoiceNumber>
<InvoiceNumber>2</InvoiceNumber>
<InvoiceNumber>3</InvoiceNumber>
<InvoiceNumber>4</InvoiceNumber>
</Invoices>
</root>
You can parse the input XML file and write the changed XML to Invoice_2.xml:
import xml.etree.ElementTree as ET
tree = ET.parse('Invoice_1.xml')
root = tree.getroot()
print("Original file:")
ET.dump(root)
rem_list = ['2', '4']
parent_map = {(c, p) for p in root.iter() for c in p}
for (c, p) in parent_map:
    if c.text in rem_list:
        p.remove(c)
tree.write('Invoice_2.xml', encoding='utf-8', xml_declaration=True)
tree1 = ET.parse('Invoice_2.xml')
root1 = tree1.getroot()
print("Changed file:")
ET.dump(root1)
Output:
<?xml version='1.0' encoding='utf-8'?>
<root>
<Invoices>
<InvoiceNumber>1</InvoiceNumber>
<InvoiceNumber>3</InvoiceNumber>
</Invoices>
</root>
If you want to delete items from a list, the best way is to loop through a copy of the original list while deleting from the original.
a: list = [1,2,3,4,5,6]
for item in a[:]:
    if item % 2 == 0:
        a.remove(item)
You can simplify your problem by using XML parsing. Refer to this: XML parser-Real python
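For the line-based approach in the question, building a new list of the lines to keep avoids deleting while iterating. The following is a minimal sketch, assuming each block to drop runs from an `<InvoiceNumber>` line to the next `</Invoice>` line. The helper name `drop_invoices` is hypothetical, and the sample lines (including the `<Total>` tag) are made up to stand in for the asker's data. Like the question's start_indexes, it starts at the `<InvoiceNumber>` line, so any opening line before it stays in place.

```python
def drop_invoices(lines, numbers):
    """Return a new list without the lines from each matching
    <InvoiceNumber> line through the following </Invoice> line."""
    skip = set()
    for number in numbers:
        start = lines.index('<InvoiceNumber>' + number + '</InvoiceNumber>\n')
        end = lines.index('</Invoice>\n', start)
        skip.update(range(start, end + 1))
    # enumerate pairs each line with its index, so no string-to-index
    # conversion is needed.
    return [line for i, line in enumerate(lines) if i not in skip]


# Made-up sample data standing in for the asker's lst_file and str_lst.
lst_file = [
    '<Invoice>\n', '<InvoiceNumber>1</InvoiceNumber>\n', '<Total>10</Total>\n', '</Invoice>\n',
    '<Invoice>\n', '<InvoiceNumber>2</InvoiceNumber>\n', '<Total>20</Total>\n', '</Invoice>\n',
]
str_lst = ['2']
result = drop_invoices(lst_file, str_lst)
print(result)
```

Collecting the indexes in a set keeps the membership test cheap and leaves the original list untouched.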

parse large xml in python

I have a very large xml file (about 100mb) with multiple elements similar to the one in this example
<adrmsg:hasMember>
<aixm:DesignatedPoint gml:id="ID_197095_1650420151927_74256">
<gml:identifier codeSpace="urn:uuid:">084e1bb6-94f7-450f-a88e-44eb465cd5a6</gml:identifier>
<aixm:timeSlice>
<aixm:DesignatedPointTimeSlice gml:id="ID_197095_1650420151927_74257">
<gml:validTime>
<gml:TimePeriod gml:id="ID_197095_1650420151927_74258">
<gml:beginPosition>2020-12-31T00:00:00</gml:beginPosition>
<gml:endPosition indeterminatePosition="unknown"/>
</gml:TimePeriod>
</gml:validTime>
<aixm:interpretation>BASELINE</aixm:interpretation>
<aixm:featureLifetime>
<gml:TimePeriod gml:id="ID_197095_1650420151927_74259">
<gml:beginPosition>2020-12-31T00:00:00</gml:beginPosition>
<gml:endPosition indeterminatePosition="unknown"/>
</gml:TimePeriod>
</aixm:featureLifetime>
<aixm:designator>BITLA</aixm:designator>
<aixm:type>ICAO</aixm:type>
<aixm:location>
<aixm:Point gml:id="ID_197095_1650420151927_74260">
<gml:pos srsName="urn:ogc:def:crs:EPSG::4326">40.87555555555556 21.358055555555556</gml:pos>
</aixm:Point>
</aixm:location>
<aixm:extension>
<adrext:DesignatedPointExtension gml:id="ID_197095_1650420151927_74261">
<adrext:pointUsage>
<adrext:PointUsage gml:id="ID_197095_1650420151927_74262">
<adrext:role>FRA_ENTRY</adrext:role>
<adrext:reference_border>
<adrext:AirspaceBorderCrossingObject gml:id="ID_197095_1650420151927_74263">
<adrext:exitedAirspace xlink:href="urn:uuid:78447f69-9671-41c5-a7b7-bdd82c60e978"/>
<adrext:enteredAirspace xlink:href="urn:uuid:afb35b5b-6626-43ff-9d92-875bbd882c05"/>
</adrext:AirspaceBorderCrossingObject>
</adrext:reference_border>
</adrext:PointUsage>
</adrext:pointUsage>
<adrext:pointUsage>
<adrext:PointUsage gml:id="ID_197095_1650420151927_74264">
<adrext:role>FRA_EXIT</adrext:role>
<adrext:reference_border>
<adrext:AirspaceBorderCrossingObject gml:id="ID_197095_1650420151927_74265">
<adrext:exitedAirspace xlink:href="urn:uuid:78447f69-9671-41c5-a7b7-bdd82c60e978"/>
<adrext:enteredAirspace xlink:href="urn:uuid:afb35b5b-6626-43ff-9d92-875bbd882c05"/>
</adrext:AirspaceBorderCrossingObject>
</adrext:reference_border>
</adrext:PointUsage>
</adrext:pointUsage>
</adrext:DesignatedPointExtension>
</aixm:extension>
</aixm:DesignatedPointTimeSlice>
</aixm:timeSlice>
</aixm:DesignatedPoint>
</adrmsg:hasMember>
The ultimate goal is to parse the data from this very big XML file into a pandas DataFrame.
So far I cannot 'capture' the data that I am looking for.
I only manage to 'capture' the data from the very last element in that large XML file.
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
ab = {'aixm':'http://www.aixm.aero/schema/5.1.1', 'adrext':'http://www.aixm.aero/schema/5.1.1/extensions/EUR/ADR', 'gml':'http://www.opengis.net/gml/3.2'}
for point in root.findall('.//aixm:DesignatedPointTimeSlice', ab):
designator = point.find('.//aixm:designator', ab)
d = point.find('.//{http://www.aixm.aero/schema/5.1.1}type', ab)
for pos in point.findall('.//gml:pos', ab):
print(designator.text, pos.text, d.text)
The print statement returns the data that I would like to have but, as mentioned, only for the very last element of the file, whereas I would like the result returned for all of them:
ZIFSA 54.02111111111111 27.823888888888888 ICAO
Could you please advise on the path I should follow? I need some help.
Thank you very much.
Assuming all three needed nodes (aixm:designator, aixm:type, and gml:pos) are always present, consider parsing the parent nodes, aixm:DesignatedPointTimeSlice and aixm:Point, and then joining them. Finally, select the three columns needed.
import pandas as pd
ab = {
    'aixm':'http://www.aixm.aero/schema/5.1.1',
    'adrext':'http://www.aixm.aero/schema/5.1.1/extensions/EUR/ADR',
    'gml':'http://www.opengis.net/gml/3.2'
}
time_slice_df = pd.read_xml(
    'file.xml', xpath=".//aixm:DesignatedPointTimeSlice", namespaces=ab
).add_prefix("time_slice_")
point_df = pd.read_xml(
    'file.xml', xpath=".//aixm:Point", namespaces=ab
).add_prefix("point_")
time_slice_df = (
    time_slice_df.join(point_df)
    .reindex(
        ["time_slice_designator", "time_slice_type", "point_pos"],
        axis="columns"
    )
)
As of pandas 1.5, read_xml supports iterparse, which allows retrieval of descendant nodes and is not limited to XPath expressions:
time_slice_df = pd.read_xml(
    'file.xml',
    namespaces = ab,
    iterparse = {"aixm:DesignatedPointTimeSlice":
        ["aixm:designator", "aixm:type", "aixm:Point"]
    }
)
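For a file of this size, the standard library's ElementTree.iterparse is another option: it streams the document and lets each time slice be turned into a row as soon as its closing tag is seen. The following is a minimal sketch, not the answer's method. The helper `iter_points` is a hypothetical name, and the inline sample is a shortened, made-up stand-in for file.xml that keeps only the namespace URIs and tag names from the question.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URIs from the question, in ElementTree's {uri}tag form.
AIXM = "{http://www.aixm.aero/schema/5.1.1}"
GML = "{http://www.opengis.net/gml/3.2}"


def iter_points(source):
    """Yield one dict per aixm:DesignatedPointTimeSlice while streaming."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == AIXM + "DesignatedPointTimeSlice":
            yield {
                "designator": elem.findtext(".//" + AIXM + "designator"),
                "type": elem.findtext(".//" + AIXM + "type"),
                "pos": elem.findtext(".//" + GML + "pos"),
            }
            elem.clear()  # drop the children that have already been read


# Shortened, made-up sample standing in for the real file.
sample = b"""<root xmlns:aixm="http://www.aixm.aero/schema/5.1.1" xmlns:gml="http://www.opengis.net/gml/3.2">
<aixm:DesignatedPointTimeSlice>
<aixm:designator>BITLA</aixm:designator>
<aixm:type>ICAO</aixm:type>
<aixm:location><aixm:Point><gml:pos>40.875 21.358</gml:pos></aixm:Point></aixm:location>
</aixm:DesignatedPointTimeSlice>
<aixm:DesignatedPointTimeSlice>
<aixm:designator>ZIFSA</aixm:designator>
<aixm:type>ICAO</aixm:type>
<aixm:location><aixm:Point><gml:pos>54.021 27.823</gml:pos></aixm:Point></aixm:location>
</aixm:DesignatedPointTimeSlice>
</root>"""

rows = list(iter_points(io.BytesIO(sample)))
print(rows)
```

The resulting list of dicts can be passed to pandas.DataFrame(rows); for the real file, the path 'file.xml' can be given to `iter_points` instead of the in-memory sample.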

How to fill an XML file with a random number of items

For testing purposes, I need to fill an XML file with, for example, 1000 lines (P0 through P999), each with between 1 and 20 steps, and then add the random steps.
How can I do that? I can't find any (good) example with a lot of for loops.
I hope to do this in Python.
The XML needs to look something like this:
<root>
<P>P0<NPS>20</NPS><STEPS>5,19,22,12,0,3,22,4,11,0,2,7,20,19,16,24,9,2,15,6,</STEPS></P>
<P>P1<NPS>2</NPS><STEPS>12,21,</STEPS></P>
<P>P2<NPS>15</NPS><STEPS>21,23,10,18,23,22,17,4,17,15,17,18,18,14,22,</STEPS></P>
<P>P3<NPS>4</NPS><STEPS>15,24,12,10,</STEPS></P>
...
</root>
Something like this:
import random
NUM_OF_LINES = 10
MAX_NUM_OF_STEPS = 7
STEP_RANGE = 20
TEMPLATE = '<P>P{}<NPS>{}</NPS><STEPS>{}</STEPS></P>'
print('<root>')
# Start at 0 so that the first line is P0
for i in range(NUM_OF_LINES):
    steps = random.randint(1, MAX_NUM_OF_STEPS)
    step_values = [str(random.randint(0, STEP_RANGE)) for x in range(steps)]
    line = TEMPLATE.format(i, steps, ','.join(step_values))
    print(line)
print('</root>')
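The same structure can also be built with xml.etree.ElementTree rather than string templates, which leaves escaping and well-formedness to the library. The following is a minimal sketch; the function name `build_random_xml` and its defaults are assumptions (the upper step value of 24 is only inferred from the sample in the question).

```python
import random
import xml.etree.ElementTree as ET


def build_random_xml(num_lines=1000, max_steps=20, step_range=24):
    """Build <root> with num_lines <P> elements, each with random steps."""
    root = ET.Element("root")
    for i in range(num_lines):
        p = ET.SubElement(root, "P")
        p.text = "P{}".format(i)  # text placed before the first child
        n_steps = random.randint(1, max_steps)
        steps = [str(random.randint(0, step_range)) for _ in range(n_steps)]
        ET.SubElement(p, "NPS").text = str(n_steps)
        ET.SubElement(p, "STEPS").text = ",".join(steps)
    return root


root = build_random_xml(num_lines=5)
print(ET.tostring(root, encoding="unicode"))
```

To write the result to disk, the root can be wrapped in ET.ElementTree(root) and saved with its write method.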

Parsing through a deep-nested XML File in Python

I am looking at an xml file similar to the below:
<pinnacle_line_feed>
<PinnacleFeedTime>1418929691920</PinnacleFeedTime>
<lastContest>28962804</lastContest>
<lastGame>162995589</lastGame>
<events>
<event>
<event_datetimeGMT>2014-12-19 11:15</event_datetimeGMT>
<gamenumber>422739932</gamenumber>
<sporttype>Alpine Skiing</sporttype>
<league>DH 145</league>
<IsLive>No</IsLive>
<participants>
<participant>
<participant_name>Kjetil Jansrud (NOR)</participant_name>
<contestantnum>2001</contestantnum>
<rotnum>2001</rotnum>
<visiting_home_draw>Visiting</visiting_home_draw>
</participant>
<participant>
<participant_name>The Field</participant_name>
<contestantnum>2002</contestantnum>
<rotnum>2002</rotnum>
<visiting_home_draw>Home</visiting_home_draw>
</participant>
</participants>
<periods>
<period>
<period_number>0</period_number>
<period_description>Matchups</period_description>
<periodcutoff_datetimeGMT>2014-12-19 11:15</periodcutoff_datetimeGMT>
<period_status>I</period_status>
<period_update>open</period_update>
<spread_maximum>200</spread_maximum>
<moneyline_maximum>100</moneyline_maximum>
<total_maximum>200</total_maximum>
<moneyline>
<moneyline_visiting>116</moneyline_visiting>
<moneyline_home>-136</moneyline_home>
</moneyline>
</period>
</periods>
<PinnacleFeedTime>1418929691920</PinnacleFeedTime>
</event>
</events>
</pinnacle_line_feed>
I have parsed the file with the code below:
pinny_url = 'http://xml.pinnaclesports.com/pinnacleFeed.aspx?sportType=Basketball'
tree = ET.parse(urllib.urlopen(pinny_url))
root = tree.getroot()
list = []
for event in root.iter('event'):
    event_datetimeGMT = event.find('event_datetimeGMT').text
    gamenumber = event.find('gamenumber').text
    sporttype = event.find('sporttype').text
    league = event.find('league').text
    IsLive = event.find('IsLive').text
    for participants in event.iter('participants'):
        for participant in participants.iter('participant'):
            p1_name = participant.find('participant_name').text
            contestantnum = participant.find('contestantnum').text
            rotnum = participant.find('rotnum').text
            vhd = participant.find('visiting_home_draw').text
    for periods in event.iter('periods'):
        for period in periods.iter('period'):
            period_number = period.find('period_number').text
            desc = period.find('period_description').text
            pdatetime = period.find('periodcutoff_datetimeGMT')
            status = period.find('period_status').text
            update = period.find('period_update').text
            max = period.find('spread_maximum').text
            mlmax = period.find('moneyline_maximum').text
            tot_max = period.find('total_maximum').text
            for moneyline in period.iter('moneyline'):
                ml_vis = moneyline.find('moneyline_visiting').text
                ml_home = moneyline.find('moneyline_home').text
However, I am hoping to get the nodes separated by event, similar to a 2D table (as in a pandas dataframe). The full XML file has multiple "event" children, and some events do not share the same nodes as above. I am struggling quite mightily to take each event node and simply create a 2D table in which the tag acts as the column name and the text acts as the value.
Up to this point, I have done the above to gauge how I might put that information into a dictionary and subsequently put a number of dictionaries into a list from which I can create a dataframe using pandas. That has not worked out, as all attempts have required me to find and replace text to create the dictionaries, and Python has not responded well to that when attempting to subsequently create a dataframe. I have also used a simple:
for elt in tree.iter():
    list.append("'%s': '%s'" % (elt.tag, elt.text.strip()))
which worked quite well for simply pulling out every single tag and the corresponding text, but I was unable to make anything of that, because every attempt at finding and replacing the text to create dictionaries failed.
Any assistance would be greatly appreciated.
Thank you.
Here's an easy way to get your XML into a pandas dataframe. This utilizes the awesome requests library (which you can switch for urllib if you'd like), as well as the always helpful xmltodict library, available on PyPI. (NOTE: a reverse library is also available, known as dicttoxml.)
import json
import pandas
import requests
import xmltodict
web_request = requests.get(u'http://xml.pinnaclesports.com/pinnacleFeed.aspx?sportType=Basketball')
# Make that unwieldy XML doc look like a native Dictionary!
result = xmltodict.parse(web_request.text)
# Next, convert the nested OrderedDict to a real dict, which isn't strictly necessary, but helps you
# visualize what the structure of the data looks like
normal_dict = json.loads(json.dumps(result.get('pinnacle_line_feed', {}).get(u'events', {}).get(u'event', [])))
# Now, make that dictionary into a dataframe
df = pandas.DataFrame.from_dict(normal_dict)
To get some idea of what this is starting to look like, here's the first couple of lines of the CSV:
>>> from StringIO import StringIO
>>> foo = StringIO() # A fake file to write to
>>> df.to_csv(foo) # Output the df to a CSV file
>>> foo.seek(0) # And rewind the file to the beginning
>>> print ''.join(foo.readlines()[:3])
,IsLive,event_datetimeGMT,gamenumber,league,participants,periods,sporttype
0,No,2015-01-10 23:00,426688683,Argentinian,"{u'participant': [{u'contestantnum': u'1071', u'rotnum': u'1071', u'visiting_home_draw': u'Home', u'participant_name': u'Obras Sanitarias'}, {u'contestantnum': u'1072', u'rotnum': u'1072', u'visiting_home_draw': u'Visiting', u'participant_name': u'Libertad'}]}",,Basketball
1,No,2015-01-06 23:00,426686588,Argentinian,"{u'participant': [{u'contestantnum': u'1079', u'rotnum': u'1079', u'visiting_home_draw': u'Home', u'participant_name': u'Boca Juniors'}, {u'contestantnum': u'1080', u'rotnum': u'1080', u'visiting_home_draw': u'Visiting', u'participant_name': u'Penarol'}]}","{u'period': {u'total_maximum': u'450', u'total': {u'total_points': u'152.5', u'under_adjust': u'-107', u'over_adjust': u'-103'}, u'spread_maximum': u'450', u'period_description': u'Game', u'moneyline_maximum': u'450', u'period_number': u'0', u'period_status': u'I', u'spread': {u'spread_visiting': u'3', u'spread_adjust_visiting': u'-102', u'spread_home': u'-3', u'spread_adjust_home': u'-108'}, u'periodcutoff_datetimeGMT': u'2015-01-06 23:00', u'moneyline': {u'moneyline_visiting': u'136', u'moneyline_home': u'-150'}, u'period_update': u'open'}}",Basketball
Notice that the participants and periods columns are still their native Python dictionaries. You'll either need to remove them from the columns list, or do some additional mangling to get them to flatten out:
# Remove the offending columns in this example by selecting particular columns to show
>>> from StringIO import StringIO
>>> foo = StringIO() # A fake file to write to
>>> df.to_csv(foo, columns=['IsLive', 'event_datetimeGMT', 'gamenumber', 'league', 'sporttype'])
>>> foo.seek(0) # And rewind the file to the beginning
>>> print ''.join(foo.readlines()[:3])
,IsLive,event_datetimeGMT,gamenumber,league,sporttype
0,No,2015-01-10 23:00,426688683,Argentinian,Basketball
1,No,2015-01-06 23:00,426686588,Argentinian,Basketball
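If the nested columns should be flattened instead of dropped, one option is to walk each event with ElementTree and keep only the leaf nodes, using the tag as the column name and the text as the value. The following is a minimal sketch using the standard library only, not the answer's xmltodict method. The helper `flatten_event` and its numeric-suffix rule for repeated tags are assumptions, and the inline sample is a made-up, shortened version of the feed.

```python
import xml.etree.ElementTree as ET


def flatten_event(event):
    """Map each leaf tag under an event to its text.
    Repeated tags get a numeric suffix (_2, _3, ...)."""
    row = {}
    for elem in event.iter():
        if len(elem) == 0:  # leaf node: it has no child elements
            key = elem.tag
            n = 2
            while key in row:
                key = "{}_{}".format(elem.tag, n)
                n += 1
            row[key] = (elem.text or "").strip()
    return row


# Made-up, shortened sample; the two events do not share the same nodes.
sample = """<pinnacle_line_feed><events>
<event><gamenumber>1</gamenumber><league>DH 145</league>
<participants>
<participant><participant_name>A</participant_name></participant>
<participant><participant_name>B</participant_name></participant>
</participants></event>
<event><gamenumber>2</gamenumber><IsLive>No</IsLive></event>
</events></pinnacle_line_feed>"""

root = ET.fromstring(sample)
rows = [flatten_event(event) for event in root.iter("event")]
print(rows)
```

Passing `rows` to pandas.DataFrame gives one row per event, with NaN where an event lacks a given tag.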

Python Elementtree SubElement incorrect

I am trying to add a SubElement to a separate SubElement and I can't figure out why the output xml is coming out incorrectly.
import xml.etree.cElementTree as ET
def CreateXml(list):
    Channel = ET.Element("Channel")
    cell = ET.SubElement(Channel,"cell")
    pngIcc0 = ET.SubElement(cell,"InputIcc0")
    pngIcc0.set("StructName","pngImage")
    pngIcc0.text = "DataBuffer_t"
    type0 = ET.SubElement(pngIcc0,"DataType")
    type0.set("Type","dataBuffer_t")
    pngOut0 = ET.SubElement(cell,"OutputIcc0")
    pngOut0.set("StructName","rawImage")
    pngOut0.text = "DataBuffer_t"
    tree = ET.ElementTree(Channel)
    tree.write("E:\Programming/ChannelCreation.xml")
The Resulting xml looks like
<InputIcc0 StructName="pngImage">
DataBuffer_t
<DataType Type="dataBuffer_t"/>
</InputIcc0>
If I want to make type0 a child of pngIcc0, which is a child of cell, what is the correct way to do it? Or am I going about this completely wrong? I don't have much experience with Python or XML.
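In ElementTree, an element's text is serialized before its first child, so the output shown above is consistent with DataType already being a child of InputIcc0. The following is a minimal sketch that rebuilds the same elements and inspects the nesting directly. It uses xml.etree.ElementTree rather than cElementTree and lowercase variable names, both of which are choices made for this illustration.

```python
import xml.etree.ElementTree as ET

channel = ET.Element("Channel")
cell = ET.SubElement(channel, "cell")
png_icc0 = ET.SubElement(cell, "InputIcc0", StructName="pngImage")
png_icc0.text = "DataBuffer_t"
type0 = ET.SubElement(png_icc0, "DataType", Type="dataBuffer_t")
png_out0 = ET.SubElement(cell, "OutputIcc0", StructName="rawImage")
png_out0.text = "DataBuffer_t"

# Iterating over an element yields its direct children only.
cell_children = [child.tag for child in cell]
icc0_children = [child.tag for child in png_icc0]
print(cell_children)
print(icc0_children)
print(ET.tostring(channel, encoding="unicode"))
```

If DataType appears in the second list and not in the first, the parent-child relationship is the one the question asks for.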
