Create a dictionary from an XML using xpath


I would like to create a dictionary from an XML file using XPath. Here's an example of the XML:
<Contracts>
  <Contract ID="1">
    <UnwantedPatterns>
      <Pattern>0</Pattern>
      <Pattern>1</Pattern>
    </UnwantedPatterns>
  </Contract>
  <Contract ID="2">
    <UnwantedPatterns>
      <Pattern>0</Pattern>
      <Pattern>1</Pattern>
    </UnwantedPatterns>
  </Contract>
</Contracts>
What I would like is to have the contract ID as the key and the unwanted patterns as the value.
Here's my code:
UnwantedPatterns = []
key = []
DictUP = {}
for ID in root.xpath('//Contracts'):
    key = ID.xpath('./Contract/@ID')
    for patterns in root.xpath('.//Contract/UnwantedPatterns/Pattern'):
        DictUP[key] = UnwantedPatterns.append(patterns.text)
I get the error "unhashable type: 'list'". Thank you for your help. The output should look like this:
{1: 0,1
2: 0,1}

xpath() returns a list, and a list cannot be used as a dictionary key, so instead of
key = ID.xpath('./Contract/@ID')
try
key = ID.xpath('./Contract/@ID')[0]
As for the output: list.append() returns None, so DictUP[key] = UnwantedPatterns.append(patterns.text) stores None, and because a dictionary holds only one value per key, each iteration overwrites the previous one. Build the list of patterns for each contract first and assign it once.
Try
DictUP = {}
# Loop over each Contract so that every ID gets its own list of patterns
for contract in root.xpath('//Contract'):
    key = contract.xpath('./@ID')[0]
    DictUP[key] = [pattern.text for pattern in contract.xpath('./UnwantedPatterns/Pattern')]
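
For reference, here is a minimal self-contained sketch of the same idea. It swaps in the standard-library xml.etree.ElementTree for lxml so that it runs without extra packages, and it assumes the XML is well-formed with a Contracts root element, which the question's code implies but does not show. The names xml_data and dict_up are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the question's XML (Contracts root is a guess)
xml_data = """<Contracts>
  <Contract ID="1">
    <UnwantedPatterns><Pattern>0</Pattern><Pattern>1</Pattern></UnwantedPatterns>
  </Contract>
  <Contract ID="2">
    <UnwantedPatterns><Pattern>0</Pattern><Pattern>1</Pattern></UnwantedPatterns>
  </Contract>
</Contracts>"""

root = ET.fromstring(xml_data)

dict_up = {}
for contract in root.findall('Contract'):
    # The ID attribute is a string; wrap it in int() if numeric keys are wanted
    key = contract.get('ID')
    dict_up[key] = [p.text for p in contract.findall('UnwantedPatterns/Pattern')]

print(dict_up)
```

Note that both the keys and the pattern values come back as strings.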

Related

how to get desired output in python from following output

I am getting the output pasted below.
[{'accel-world-infinite-burst-2016': 'https://yts.mx/torrent/download/92E58C7C69D015DA528D8D7F22844BF49D702DFC'}, {'accel-world-infinite-burst-2016': 'https://yts.mx/torrent/download/3086E306E7CB623F377B6F99261F82CC8BB57115'}, {'accel-world-infinite-burst-2016': 'https://yifysubtitles.org/movie-imdb/tt5923132'}, {'anna-to-the-infinite-power-1983': 'https://yts.mx/torrent/download/E92B664EE87663D7E5EC8E9FEED574C586A95A62'}, {'anna-to-the-infinite-power-1983': 'https://yts.mx/torrent/download/4F6F194996AC29924DB7596FB646C368C4E4224B'}, {'anna-to-the-infinite-power-1983': 'https://yts.mx/movies/anna-to-the-infinite-power-1983/request-subtitle'}, {'infinite-2021': 'https://yts.mx/torrent/download/304DB2FEC8901E996B066B74E5D5C010D2F818B4'}, {'infinite-2021': 'https://yts.mx/torrent/download/1320D6D3B332399B2F4865F36823731ABD1444C0'}, {'infinite-2021': 'https://yts.mx/torrent/download/45821E5B2E339382E7EAEFB2D89967BB2C9835F6'}, {'infinite-2021': 'https://yifysubtitles.org/movie-imdb/tt6654210'}, {'infinite-potential-the-life-ideas-of-david-bohm-2020': 'https://yts.mx/torrent/download/47EB04FBC7DC37358F86A5BFC115A0361F019B5B'}, {'infinite-potential-the-life-ideas-of-david-bohm-2020': 'https://yts.mx/torrent/download/88223BEAA09D0A3D8FB7EEA62BA9C5EB5FDE9282'}, {'infinite-potential-the-life-ideas-of-david-bohm-2020': 'https://yts.mx/movies/infinite-potential-the-life-ideas-of-david-bohm-2020/request-subtitle'}, {'the-infinite-man-2014': 'https://yts.mx/torrent/download/0E2ACFF422AF4F62877F59EAE4EF93C0B3623828'}, {'the-infinite-man-2014': 'https://yts.mx/torrent/download/52437F80F6BDB6FD326A179FC8A63003832F5896'}, {'the-infinite-man-2014': 'https://yifysubtitles.org/movie-imdb/tt2553424'}, {'nick-and-norahs-infinite-playlist-2008': 'https://yts.mx/torrent/download/DA101D139EE3668EEC9EC5B855B446A39C6C5681'}, {'nick-and-norahs-infinite-playlist-2008': 'https://yts.mx/torrent/download/8759CD554E8BB6CFFCFCE529230252AC3A22D4D4'}, {'nick-and-norahs-infinite-playlist-2008': 
'https://yifysubtitles.org/movie-imdb/tt0981227'}]
As you can see, each movie has multiple links, and the movie name is repeated for each link. I want all links for the same movie to appear in a single object, e.g.
[{accel-world-infinite-burst-2016:{link1,link2,link3,link4},........]
for item in li:
    # print(item.partition("movies/")[2])
    movieName["Movies"].append(item.partition("movies/")[2])
    req = requests.get(item)
    s = soup(req.text, "html.parser")
    m = s.find_all("p", {"class": "hidden-xs hidden-sm"})
    # print(m[0])
    for a in m[0].find_all('a', href=True):
        # movieName['Movies'][item.partition("movies/")[2]] = (a['href'])
        downloadLinks.append({item.partition("movies/")[2]: a['href']})

You can try this:
# data = your list of dicts (named data to avoid shadowing the built-in input)
otp_dict = {}
for entry in data:
    for key, value in entry.items():
        if key not in otp_dict:
            otp_dict[key] = [value]
        else:
            otp_dict[key].append(value)
print(otp_dict)
Output: {'accel-world-infinite-burst-2016': [link1, link2], ...}
The output is a dict that maps each movie to a list of links. If you want sets, as in your desired output, try this:
otp_dict = {}
for entry in data:
    for key, value in entry.items():
        if key not in otp_dict:
            otp_dict[key] = {value}
        else:
            otp_dict[key].add(value)
Output: {'accel-world-infinite-burst-2016': {link1, link2}, ...}
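
The same grouping can also be sketched with collections.defaultdict, which removes the explicit "key not in dict" check. The sample list below is hypothetical and only mimics the shape of the scraped data; download_links and grouped are illustrative names.

```python
from collections import defaultdict

# Hypothetical sample shaped like the list of single-key dicts in the question
download_links = [
    {'infinite-2021': 'link1'},
    {'infinite-2021': 'link2'},
    {'the-infinite-man-2014': 'link3'},
]

grouped = defaultdict(list)
for entry in download_links:
    for movie, link in entry.items():
        # A missing key is created with an empty list automatically
        grouped[movie].append(link)

grouped = dict(grouped)  # convert back to a plain dict for printing
print(grouped)
```

Using defaultdict(set) with .add() instead would give sets of links, at the cost of losing the original order.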

Element XML parsing not giving proper result

I have the sample XML file below and I am trying to generate the JSON below, but I am not getting the expected result: only one document is added to the dictionary.
Sample Input XML:
<results status="passed">
<num-records>2</num-records>
<records>
<volume-info>
<flexible-volume-info>
<agg-name>aggr1_split</aggregate-name>
</flexible-volume-info>
<volume-name>volume1</volume-name>
<volume-size>
<actual-size>44</actual-volume-size>
<afs-avail>90</afs-avail>
</volume-size>
</volume-info>
<volume-info>
<flexible-volume-info>
<agg-name>aggr2_split</aggregate-name>
</flexible-volume-info>
<volume-name>volume2</volume-name>
<volume-size>
<actual-size>10</actual-volume-size>
<afs-avail>14</afs-avail>
</volume-size>
</volume-info>
</records>
</results>
Expected Output:
{
"agg-name": "aggr1_split",
"volume-name": "volume1",
"actual-size": "44"
},
{
"agg-name": "aggr2_split",
"volume-name": "volume2",
"actual-size": "10"
}
Sample code:
result = {}
for child in root.iter("records"):
    result['agg-name'] = child.find('volume-info/flexible-volume-info/agg-name').text
    result['volume-name'] = child.find('volume-info/volume-name').text
    result['actual-size'] = child.find('volume-info/volume-size/actual-size').text
print result

Your expected output would be a dictionary containing multiple identical keys, which is not possible. You either need to choose different keys for each iteration of your loop or, better still, build a list of dictionaries:
import xml.etree.ElementTree as ET
xml_data = """<results status="passed">
<num-records>2</num-records>
<records>
<volume-info>
<flexible-volume-info>
<agg-name>aggr1_split</agg-name>
</flexible-volume-info>
<volume-name>volume1</volume-name>
<volume-size>
<actual-size>44</actual-size>
<afs-avail>90</afs-avail>
</volume-size>
</volume-info>
<volume-info>
<flexible-volume-info>
<agg-name>aggr2_split</agg-name>
</flexible-volume-info>
<volume-name>volume2</volume-name>
<volume-size>
<actual-size>10</actual-size>
<afs-avail>14</afs-avail>
</volume-size>
</volume-info>
</records>
</results>"""
root = ET.fromstring(xml_data)
results = []
for child in root.iter("volume-info"):
    result = {}
    print(child)
    result['agg-name'] = child.find('flexible-volume-info/agg-name').text
    result['volume-name'] = child.find('volume-name').text
    result['actual-size'] = child.find('volume-size/actual-size').text
    results.append(result)
print(results)
This would give you:
[{'agg-name': 'aggr1_split', 'volume-name': 'volume1', 'actual-size': '44'}, {'agg-name': 'aggr2_split', 'volume-name': 'volume2', 'actual-size': '10'}]
Your XML is also badly formed: the opening and closing tags do not always match (for example, <agg-name> is closed with </aggregate-name>).
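
As a quick illustration of why the mismatched tags matter, the sketch below feeds a fragment with the same kind of mismatch to ElementTree and catches the resulting ParseError. The fragment and the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Fragment with a mismatched closing tag, like the one in the question
bad_xml = "<volume-size><actual-size>44</actual-volume-size></volume-size>"

try:
    ET.fromstring(bad_xml)
    parsed = True
except ET.ParseError as exc:
    # The parser rejects the document instead of guessing the intended tag
    parsed = False
    print("XML is not well-formed:", exc)
```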

Dictionary key and value flipping themselves unexpectedly

I am running Python 3.5, and I've defined a function that creates XML SubElements and adds them under another element. The attributes are in a dictionary, but for some reason the dictionary keys and values will sometimes flip when I execute the script.
Here is a snippet of roughly what I have (the code is split across many functions, so I combined it here):
import xml.etree.ElementTree as ElementTree
def AddSubElement(parent, tag, text='', attributes = None):
    XMLelement = ElementTree.SubElement(parent, tag)
    XMLelement.text = text
    if attributes != None:
        for key, value in attributes:
            XMLelement.set(key, value)
    print("attributes =",attributes)
    return XMLelement
descriptionTags = ([('xmlns:g' , 'http://base.google.com/ns/1.0')])
XMLroot = ElementTree.Element('rss')
XMLroot.set('version', '2.0')
XMLchannel = ElementTree.SubElement(XMLroot,'channel')
AddSubElement(XMLchannel,'g:description', 'sporting goods', attributes=descriptionTags )
AddSubElement(XMLchannel,'link', 'http://'+ domain +'/')
XMLitem = AddSubElement(XMLchannel,'item')
AddSubElement(XMLitem, 'g:brand', Product['ProductManufacturer'], attributes=bindingParam)
AddSubElement(XMLitem, 'g:description', Product['ProductDescriptionShort'], attributes=bindingParam)
AddSubElement(XMLitem, 'g:price', Product['ProductPrice'] + ' USD', attributes=bindingParam)
The key and value do get switched, because I sometimes see this in the console:
attributes = [{'xmlns:g', 'http://base.google.com/ns/1.0'}]
attributes = [{'http://base.google.com/ns/1.0', 'xmlns:g'}]
attributes = [{'http://base.google.com/ns/1.0', 'xmlns:g'}]
...
And here is the xml string that sometimes comes out:
<rss version="2.0">
<channel>
<title>example.com</title>
<g:description xmlns:g="http://base.google.com/ns/1.0">sporting goods</g:description>
<link>http://www.example.com/</link>
<item>
<g:id http://base.google.com/ns/1.0="xmlns:g">8987983</g:id>
<title>Some cool product</title>
<g:brand http://base.google.com/ns/1.0="xmlns:g">Cool</g:brand>
<g:description http://base.google.com/ns/1.0="xmlns:g">Why is this so cool?</g:description>
<g:price http://base.google.com/ns/1.0="xmlns:g">69.00 USD</g:price>
...
What is causing this to flip?

attributes = [{'xmlns:g', 'http://base.google.com/ns/1.0'}]
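This is a list containing a set, not a dictionary. Sets are unordered, and in Python 3.5 dictionaries are not ordered either (dicts only guarantee insertion order from Python 3.7). Unpacking a two-element set with for key, value in attributes can therefore yield the two strings in either order, which is why the attribute name and value sometimes swap.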
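
One possible fix, sketched under the assumption that the attributes can be passed as a plain dict mapping attribute name to value: each key/value pair then has a fixed role regardless of iteration order. The function name add_sub_element is a hypothetical rewrite of the question's AddSubElement.

```python
import xml.etree.ElementTree as ElementTree

def add_sub_element(parent, tag, text='', attributes=None):
    """Create a child element and set its attributes from a dict."""
    element = ElementTree.SubElement(parent, tag)
    element.text = text
    # In a dict, the name is always the key and the value is always the value
    for key, value in (attributes or {}).items():
        element.set(key, value)
    return element

root = ElementTree.Element('rss')
channel = ElementTree.SubElement(root, 'channel')
description = add_sub_element(
    channel, 'g:description', 'sporting goods',
    attributes={'xmlns:g': 'http://base.google.com/ns/1.0'},
)
print(description.attrib)
```

A list of (name, value) tuples would work equally well; the point is to avoid a set, whose elements have no defined order.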

How to associate values of tags with the label of the tag using ElementTree in a Pythonic way

I have some xml files I am trying to process.
Here is a derived sample from one of the files
fileAsString = """<?xml version="1.0" encoding="utf-8"?>
<eventDocument>
<schemaVersion>X2</schemaVersion>
<eventTable>
<eventTransaction>
<eventTitle>
<value>Some Event</value>
</eventTitle>
<eventDate>
<value>2003-12-31</value>
</eventDate>
<eventCoding>
<eventType>47</eventType>
<eventCode>A</eventCode>
<footnoteId id="F1"/>
<footnoteId id="F2"/>
</eventCoding>
<eventCycled>
<value></value>
</eventCycled>
<eventAmounts>
<eventVoltage>
<value>40000</value>
</eventVoltage>
</eventAmounts>
</eventTransaction>
</eventTable>
</eventDocument>"""
Note that there can be many eventTables in each document, and events can have more details than just the ones I have isolated.
My goal is to create a dictionary in the following form
{'eventTitle':'Some Event', 'eventDate':'2003-12-31','eventType':'47',\
'eventCode':'A', 'eventCoding_FTNT_1':'F1','eventCoding_FTNT_2':'F2',\
'eventCycled': , 'eventVoltage':'40000'}
I am actually reading these in from files, but assuming I have a string, here is my code for getting the text of the elements directly below the eventTransaction element whose text is inside a value tag:
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
myXML = ET.fromstring(fileAsString)
eventTransactions = [e for e in myXML.iter() if e.tag == 'eventTransaction']
testTransaction = eventTransactions[0]
my_dict = {}
for child_of in testTransaction:
    grand_children_tags = [e.tag for e in child_of]
    if grand_children_tags == ['value']:
        my_dict[child_of.tag] = [e.text for e in child_of][0]
>>> my_dict
{'eventTitle': 'Some Event', 'eventCycled': None, 'eventDate': '2003-12-31'}
This seems wrong because I am not really taking advantage of XML; I am using brute force instead, but I have not found a better example.
Is there a clearer and more pythonic way to create the output I am looking for?

Use XPath to pull out the elements you're interested in.
The following code creates a list of lists of dicts (i.e. tables/transactions/info):
tables = []
myXML = ET.fromstring(fileAsString)
for table in myXML.findall('./eventTable'):
    transactions = []
    tables.append(transactions)
    for transaction in table.findall('./eventTransaction'):
        info = {}
        # Search within this transaction only, not the whole table
        for element in transaction.findall('.//*[value]'):
            info[element.tag] = element.find('./value').text or ''
        coding = transaction.find('./eventCoding')
        if coding is not None:
            for tag in 'eventType', 'eventCode':
                element = coding.find('./%s' % tag)
                if element is not None:
                    info[tag] = element.text or ''
            for index, element in enumerate(coding.findall('./footnoteId')):
                info['eventCoding_FTNT_%d' % index] = element.get('id', '')
        if info:
            transactions.append(info)
Output:
[[{'eventCode': 'A',
'eventCoding_FTNT_0': 'F1',
'eventCoding_FTNT_1': 'F2',
'eventCycled': '',
'eventDate': '2003-12-31',
'eventTitle': 'Some Event',
'eventType': '47',
'eventVoltage': '40000'}]]
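
The output above numbers the footnotes from 0, while the question asked for eventCoding_FTNT_1 and eventCoding_FTNT_2. A minimal sketch of one way to get 1-based keys is to pass start=1 to enumerate. The trimmed sample transaction below is made up for illustration and is not the full document from the question.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical transaction based on the question's sample
sample = """<eventTransaction>
  <eventTitle><value>Some Event</value></eventTitle>
  <eventCoding>
    <eventType>47</eventType>
    <footnoteId id="F1"/>
    <footnoteId id="F2"/>
  </eventCoding>
</eventTransaction>"""

transaction = ET.fromstring(sample)
info = {}

# Every descendant that has a <value> child contributes tag -> value text
for element in transaction.findall('.//*[value]'):
    info[element.tag] = element.find('value').text or ''

coding = transaction.find('eventCoding')
if coding is not None:
    # start=1 makes the footnote keys 1-based, as in the question
    for index, footnote in enumerate(coding.findall('footnoteId'), start=1):
        info['eventCoding_FTNT_%d' % index] = footnote.get('id', '')

print(info)
```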

Extracting the value of a key with BeautifulSoup

I want to extract the value of the "archivo" key of something like this:
...
<applet name="bla" code="Any.class" archive="Any.jar">
<param name="abc" value="space='1' archivo='bla.jpg'" </param>
<param name="def" value="space='2' archivo='bli.jpg'" </param>
<param name="jkl" value="space='3' archivo='blu.jpg'" </param>
</applet>
...
I suppose I need a list with [bla.jpg, bli.jpg, ...], so I try options like:
inputTag = soup.findAll("param",{'value':'archivo'})
or
inputTag = soup.findAll(attrs={"value" : "archivo"})
or
inputTag = soup.findAll("archivo")
and I always get an empty list: []
Other unsuccessful options:
inputTag = soup.findAll("param",{"value" : "archivo"}.contents)
I get an error like: 'dict' object has no attribute 'contents'
inputTag = unicode(getattr(soup.findAll('archivo'), 'string', ''))
I get nothing.
Finally, I have seen "Difference between attrMap and attrs in beautifulSoup", and tried:
for tag in soup.recursiveChildGenerator():
    print tag['archivo']
This finds nothing; the key must be one of the tag's own attributes, such as name, code or archive.
And lastly:
tag.attrs = [(key,value) for key,value in tag.attrs if key == 'archivo']
but tag.attrs finds nothing either.
OK, with jcollado's help I could get the list this way:
imageslist = []
patron = re.compile(r"archivo='([\w\./]+)'")
for tag in soup.findAll('param'):
    if patron.search(tag['value']):
        imageslist.append(patron.search(tag['value']).group(1))

The problem here is that archivo isn't an attribute of param, but something inside the value attribute. To extract archivo from value, I suggest using a regular expression as follows:
>>> archivo_regex = re.compile(r"archivo='([\w\./]+)'")
>>> [archivo_regex.search(tag['value']).group(1)
... for tag in soup.findAll('param')]
[u'bla.jpg', u'bli.jpg', u'blu.jpg']
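
The list comprehension above assumes every param contains archivo; if one does not, search() returns None and .group(1) raises AttributeError. Below is a small sketch of a guarded version that works on the value strings directly (standard library only, so BeautifulSoup is not needed to run it). The helper name extract_archivos and the third sample value are hypothetical.

```python
import re

ARCHIVO_RE = re.compile(r"archivo='([\w./]+)'")

def extract_archivos(values):
    """Return the archivo file names found in a list of value strings."""
    found = []
    for value in values:
        match = ARCHIVO_RE.search(value)
        if match:  # skip values that carry no archivo entry
            found.append(match.group(1))
    return found

# The first two strings mirror the question; the last one has no archivo
values = [
    "space='1' archivo='bla.jpg'",
    "space='2' archivo='bli.jpg'",
    "space='3'",
]
print(extract_archivos(values))
```

With BeautifulSoup, the values list could be built from the value attribute of each param tag before calling the helper.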
