Access data in a list of dictionaries in Python

I have a list of dictionaries, with some nested dictionaries inside:
[{'id': '67569006',
'kind': 'analytics#accountSummary',
'name': 'Adopt-a-Hydrant',
'webProperties': [{'id': 'UA-62536006-1',
'internalWebPropertyId': '102299473',
'kind': 'analytics#webPropertySummary',
'level': 'STANDARD',
'name': 'Adopt-a-Hydrant',
'profiles': [{'id': '107292146',
'kind': 'analytics#profileSummary',
'name': 'Adopt a Hydrant view1',
'type': 'WEB'},
{'id': '1372982608',
'kind': 'analytics#profileSummary',
'name': 'Unfiltered view',
'type': 'WEB'}],
'websiteUrl': 'https://example1.com/'}]},
{'id': '44824959',
'kind': 'analytics#accountSummary',
'name': 'Adorn',
'webProperties': [{'id': 'UA-62536006-1',
'internalWebPropertyId': '75233390',
'kind': 'analytics#webPropertySummary',
'level': 'STANDARD',
'name': 'Website 2',
'profiles': [{'id': '77736192',
'kind': 'analytics#profileSummary',
'name': 'All Web Site Data',
'type': 'WEB'}],
'websiteUrl': 'http://www.example2.com'}]},
]
I'm trying to print the site name, URL, and view. If a site has two or more views, all of them should be printed, and this is where it gets tricky.
So far I've tried:
all_properties = [The list above]
for single_property in all_properties:
    single_propery_name = single_property['name']
    view_name = single_property['webProperties'][0]['profiles'][0]['name']
    view_id = single_property['webProperties'][0]['profiles'][0]['id']
    print(single_propery_name, view_name, view_id)
This almost works, but it prints only the first view (profile > name) of each property. Some properties have more than one view, and I need those views printed as well.
The output now is:
Adopt-a-Hydrant Adopt a Hydrant view1 107292146
Website 2 All Web Site Data 77736192
So it's skipping the second view of the first property. I tried nesting a second for loop but couldn't get it to work. The final output should be:
Adopt-a-Hydrant Adopt a Hydrant view1 107292146
Adopt-a-Hydrant Unfiltered view 1372982608
Website 2 All Web Site Data 77736192
Any ideas on how to get that?

You need to iterate through the profiles list for each single_property:
for single_property in all_properties:
    single_property_name = single_property['name']
    for profile in single_property['webProperties'][0]['profiles']:
        view_name = profile['name']
        view_id = profile['id']
        print(single_property_name, view_name, view_id)
It would probably help to read a little in the Python docs about lists and how to iterate through them.
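The loop above only looks at the first entry of 'webProperties'. As a minimal sketch, assuming an account may hold several web properties, the same idea can be extended with one more loop. The helper name iter_views and the trimmed sample data are illustrative, not part of the original code:

```python
# Trimmed sample in the same shape as the list from the question.
all_properties = [
    {'name': 'Adopt-a-Hydrant',
     'webProperties': [{'name': 'Adopt-a-Hydrant',
                        'websiteUrl': 'https://example1.com/',
                        'profiles': [{'id': '107292146', 'name': 'Adopt a Hydrant view1'},
                                     {'id': '1372982608', 'name': 'Unfiltered view'}]}]},
    {'name': 'Adorn',
     'webProperties': [{'name': 'Website 2',
                        'websiteUrl': 'http://www.example2.com',
                        'profiles': [{'id': '77736192', 'name': 'All Web Site Data'}]}]},
]


def iter_views(accounts):
    """Yield (site name, url, view name, view id) for every view (hypothetical helper)."""
    for account in accounts:
        # .get(..., []) tolerates accounts or properties with no nested list.
        for web_property in account.get('webProperties', []):
            for profile in web_property.get('profiles', []):
                yield (web_property['name'], web_property['websiteUrl'],
                       profile['name'], profile['id'])


rows = list(iter_views(all_properties))
for row in rows:
    print(*row)
```

Here the site name is taken from the web property rather than from the account, which is what the expected output in the question ("Website 2") suggests.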

Another option, using one-line list comprehensions:
for single_property in data:
    single_propery_name = single_property['name']
    view_name = [i['name'] for i in single_property['webProperties'][0]['profiles']]
    view_id = [i['id'] for i in single_property['webProperties'][0]['profiles']]
    print(single_propery_name, view_name, view_id)
Note that this prints the view names and IDs as lists, one line per property. The point is that you will have to loop inside the lists. You could also create objects, if you think your data would be more manageable that way.
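One way to read the "make objects" suggestion is to flatten the nested structure into small records. This is only a sketch: the View class and its field names are invented for illustration, and the sample data is a trimmed copy of the list from the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """One row per view (hypothetical record type, not from the original post)."""
    site_name: str
    url: str
    view_name: str
    view_id: str


data = [
    {'name': 'Adopt-a-Hydrant',
     'webProperties': [{'name': 'Adopt-a-Hydrant',
                        'websiteUrl': 'https://example1.com/',
                        'profiles': [{'id': '107292146', 'name': 'Adopt a Hydrant view1'},
                                     {'id': '1372982608', 'name': 'Unfiltered view'}]}]},
    {'name': 'Adorn',
     'webProperties': [{'name': 'Website 2',
                        'websiteUrl': 'http://www.example2.com',
                        'profiles': [{'id': '77736192', 'name': 'All Web Site Data'}]}]},
]

# A nested comprehension walks accounts -> web properties -> profiles.
views = [
    View(prop['name'], prop['websiteUrl'], profile['name'], profile['id'])
    for account in data
    for prop in account['webProperties']
    for profile in prop['profiles']
]

for view in views:
    print(view.site_name, view.url, view.view_name, view.view_id)
```

Once the data is flat, filtering or sorting by any field becomes an ordinary list operation.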

If you're getting really confused, don't be afraid to just make variables.
Look at how much more readable this is:
for item in data:
    webProperties = item['webProperties'][0]
    print("Name: " + webProperties["name"])
    print("URL: " + webProperties["websiteUrl"])
    print("PRINTING VIEWS\n")
    print("----------------------------")
    views = webProperties['profiles']
    for view in views:
        print("ID: " + view['id'])
        print("Kind: " + view['kind'])
        print("Name: " + view['name'])
        print("Type: " + view['type'])
        print("----------------------------")
    print("\n\n\n")
Here, data is defined as the list you gave us:
data = [{'id': '67569006',
'kind': 'analytics#accountSummary',
'name': 'Adopt-a-Hydrant',
'webProperties': [{'id': 'UA-62536006-1',
'internalWebPropertyId': '102299473',
'kind': 'analytics#webPropertySummary',
'level': 'STANDARD',
'name': 'Adopt-a-Hydrant',
'profiles': [{'id': '107292146',
'kind': 'analytics#profileSummary',
'name': 'Adopt a Hydrant view1',
'type': 'WEB'},
{'id': '1372982608',
'kind': 'analytics#profileSummary',
'name': 'Unfiltered view',
'type': 'WEB'}],
'websiteUrl': 'https://example1.com/'}]},
{'id': '44824959',
'kind': 'analytics#accountSummary',
'name': 'Adorn',
'webProperties': [{'id': 'UA-62536006-1',
'internalWebPropertyId': '75233390',
'kind': 'analytics#webPropertySummary',
'level': 'STANDARD',
'name': 'Website 2',
'profiles': [{'id': '77736192',
'kind': 'analytics#profileSummary',
'name': 'All Web Site Data',
'type': 'WEB'}],
'websiteUrl': 'http://www.example2.com'}]},
]

Related

How do I read a yaml file into a Jupyter notebook?

I have a file from an Open API Spec that I have been trying to access in a Jupyter notebook. It is a .yaml file. I was able to upload it into Jupyter and put it in the same folder as the notebook I'd like to use to access it. I am new to Jupyter and Python, so I'm sorry if this is a basic question. I found a forum that suggested this code to read the data (in my file: "openapi.yaml"):
import yaml
with open("openapi.yaml", 'r') as stream:
    try:
        print(yaml.safe_load(stream))
    except yaml.YAMLError as exc:
        print(exc)
This seems to bring the data in, but it is a completely unstructured stream like so:
{'openapi': '3.0.0', 'info': {'title': 'XY Tracking API', 'version': '2.0', 'contact': {'name': 'Narrativa', 'url': 'http://link, 'email': '}, 'description': 'The XY Tracking Project collects information from different data sources to provide comprehensive data for the XYs, X-Y. Contact Support:'}, 'servers': [{'url': 'link'}], 'paths': {'/api': {'get': {'summary': 'Data by date range', 'tags': [], 'responses': {'200': {'description': 'OK', 'content': {'application/json': {'schema': {'$ref': '#/components/schemas/covidtata'}}}}}, 'operationId': 'get-api', 'parameters': [{'schema': {'type': 'string', 'format': 'date'}, 'in': 'query', 'name': 'date_from', 'description': 'Date range beginig (YYYY-DD-MM)', 'required': True}, {'schema': {'type': 'string', 'format': 'date'}, 'in': 'query', 'name': 'date_to', 'description': 'Date range ending (YYYY-DD-MM)'}], 'description': 'Returns the data for a specific date range.'}}, '/api/{date}': {'parameters': [{'schema': {'type': 'string', 'format': 'date'}, 'name': 'date', 'in': 'path', 'required': True}], 'get': {'summary': 'Data by date', 'tags': [], 'responses': {'200': {'description': 'OK', 'content': {'application/json': {'schema': {'$ref': '#/components/schemas/data'}}}}}, 'operationId': 'get-api-date', 'description': 'Returns the data for a specific day.'}}, '/api/country/{country}': {'parameters': [{'schema': {'type': 'string', 'example': 'spain'}, 'name': 'country', 'in': 'path', 'required': True, 'example': 'spain'}, {'schema': {'type': 'strin
...etc.
I'd like to work through the data for analysis but can't seem to access it correctly. Any help would be extremely appreciated!!! Thank you so much for reading.
What you're seeing in the output is the printed form of the Python dictionary that yaml.safe_load returns (note the single quotes and True), not an unstructured stream. Printed on one line it is compact rather than human-readable, but the data itself is fully structured. Assign the result to a variable instead of printing it, and you should be able to work with it just fine in your code using ordinary dictionary and list access.
Alternatively, you may want to consider another parser/emitter such as ruamel.yaml, which some people find easier to work with than the package you're currently importing. Dumping the data back out as YAML with it can preserve line breaks and indentation for better readability.
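As a rough sketch of what "working with the data" can look like, the snippet below uses a small hand-written dictionary as a stand-in for the result of yaml.safe_load (an assumption, so that the example runs without the file). It pretty-prints the structure with the standard library and then pulls out one summary per operation:

```python
import json
from pprint import pprint

# Hypothetical stand-in for: spec = yaml.safe_load(stream)
# It mirrors a small part of the output shown in the question.
spec = {
    "openapi": "3.0.0",
    "info": {"title": "XY Tracking API", "version": "2.0"},
    "paths": {
        "/api": {
            "get": {"summary": "Data by date range", "operationId": "get-api"},
        },
        "/api/{date}": {
            "parameters": [{"name": "date", "in": "path", "required": True}],
            "get": {"summary": "Data by date", "operationId": "get-api-date"},
        },
    },
}

# Two ways to get a readable view; neither changes the data.
pprint(spec, width=80)
print(json.dumps(spec, indent=2))

# Ordinary key access reaches any nested value.
title = spec["info"]["title"]

# Collect (path, method, summary) for each operation. Path-level entries
# such as "parameters" are lists, not operations, so they are skipped.
operations = [
    (path, method, details["summary"])
    for path, methods in spec["paths"].items()
    for method, details in methods.items()
    if isinstance(details, dict) and "summary" in details
]
for operation in operations:
    print(operation)
```

With the real file, the only change should be replacing the literal dictionary with the value returned by the loader.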

Python parsing JSON nested data

I am trying to parse this JSON data from the setlist.fm API. I am trying to get all the song names in order from each setlist. I have looked around, but none of the methods described on the internet are working.
Here is the JSON data
{'itemsPerPage': 20,
'page': 1,
'setlist': [{'artist': {'disambiguation': '',
'mbid': 'cc197bad-dc9c-440d-a5b5-d52ba2e14234',
'name': 'Coldplay',
'sortName': 'Coldplay',
'tmid': 806431,
'url': 'https://www.setlist.fm/setlists/coldplay-3d6bde3.html'},
'eventDate': '15-11-2017',
'id': '33e0845d',
'info': 'Last show of the A Head Full of Dreams Tour',
'lastUpdated': '2017-11-23T14:51:05.000+0000',
'sets': {'set': [{'song': [{'cover': {'disambiguation': '',
'mbid': '9dee40b2-25ad-404c-9c9a-139feffd4b57',
'name': 'Maria Callas',
'sortName': 'Callas, Maria',
'url': 'https://www.setlist.fm/setlists/maria-callas-33d6706d.html'},
'name': 'O mio babbino caro',
'tape': True},
{'info': 'extended intro with Charlie '
'Chaplin speech',
'name': 'A Head Full of Dreams'},
{'name': 'Yellow'},
{'name': 'Every Teardrop Is a '
'Waterfall'},
{'name': 'The Scientist'},
{'info': 'with "Oceans" excerpt in '
'intro',
'name': 'God Put a Smile Upon Your '
'Face'},
{'info': 'with Tiësto Remix outro',
'name': 'Paradise'}]},
{'name': 'B-Stage',
'song': [{'name': 'Always in My Head'},
{'name': 'Magic'},
{'info': 'single version',
'name': 'Everglow'}]},
{'name': 'A-Stage',
'song': [{'info': 'with "Army of One" excerpt '
'in intro',
'name': 'Clocks'},
{'info': 'partial',
'name': 'Midnight'},
{'name': 'Charlie Brown'},
{'name': 'Hymn for the Weekend'},
{'info': 'with "Midnight" excerpt in '
'intro',
'name': 'Fix You'},
{'name': 'Viva la Vida'},
{'name': 'Adventure of a Lifetime'},
{'cover': {'disambiguation': '',
'mbid': '3f8a5e5b-c24b-4068-9f1c-afad8829e06b',
'name': 'Soda Stereo',
'sortName': 'Soda Stereo',
'tmid': 1138263,
'url': 'https://www.setlist.fm/setlists/soda-stereo-7bd6d204.html'},
'name': 'De música ligera'}]},
{'name': 'C-Stage',
'song': [{'info': 'extended',
'name': 'Kaleidoscope',
'tape': True},
{'info': 'acoustic',
'name': 'In My Place'},
{'name': 'Amor Argentina'}]},
{'name': 'A-Stage',
'song': [{'cover': {'mbid': '2c82c087-8300-488e-b1e4-0b02b789eb18',
'name': 'The Chainsmokers '
'& Coldplay',
'sortName': 'Chainsmokers, '
'The & '
'Coldplay',
'url': 'https://www.setlist.fm/setlists/the-chainsmokers-and-coldplay-33ce5029.html'},
'name': 'Something Just Like This'},
{'name': 'A Sky Full of Stars'},
{'info': 'Extended Outro; followed by '
'‘Believe In Love’ Tour '
'Conclusion Video',
'name': 'Up&Up'}]}]},
'tour': {'name': 'A Head Full of Dreams'},
'url': 'https://www.setlist.fm/setlist/coldplay/2017/estadio-ciudad-de-la-plata-la-plata-argentina-33e0845d.html',
'venue': {'city': {'coords': {'lat': -34.9313889,
'long': -57.9488889},
'country': {'code': 'AR', 'name': 'Argentina'},
'id': '3432043',
'name': 'La Plata',
'state': 'Buenos Aires',
'stateCode': '01'},
'id': '3d62153',
'name': 'Estadio Ciudad de La Plata',
'url': 'https://www.setlist.fm/venue/estadio-ciudad-de-la-plata-la-plata-argentina-3d62153.html'},
'versionId': '7b4ce6d0'},
{'artist': {'disambiguation': '',
'mbid': 'cc197bad-dc9c-440d-a5b5-d52ba2e14234',
'name': 'Coldplay',
'sortName': 'Coldplay',
'tmid': 806431,
'url': 'https://www.setlist.fm/setlists/coldplay-3d6bde3.html'},
'eventDate': '14-11-2017',
'id': '63e08ec7',
'info': '"Paradise", "Something Just Like This" and "De música '
'ligera" were soundchecked',
'lastUpdated': '2017-11-15T02:40:25.000+0000',
'sets': {'set': [{'song': [{'cover': {'disambiguation': '',
'mbid': '9dee40b2-25ad-404c-9c9a-139feffd4b57',
'name': 'Maria Callas',
'sortName': 'Callas, Maria',
'url': 'https://www.setlist.fm/setlists/maria-callas-33d6706d.html'},
'name': 'O mio babbino caro',
'tape': True},
{'info': 'extended intro with Charlie '
'Chaplin speech',
'name': 'A Head Full of Dreams'},
{'name': 'Yellow'},
{'name': 'Every Teardrop Is a '
'Waterfall'},
{'name': 'The Scientist'},
{'info': 'with "Oceans" excerpt in '
'intro',
'name': 'Birds'},
{'info': 'with Tiësto Remix outro',
'name': 'Paradise'}]},
{'name': 'B-Stage',
'song': [{'name': 'Always in My Head'},
{'name': 'Magic'},
{'info': 'single version; dedicated '
'to the Argentinian victims '
'of the New York terrorist '
'attack',
'name': 'Everglow'}]},
{'name': 'A-Stage',
'song': [{'info': 'with "Army of One" excerpt '
'in intro',
'name': 'Clocks'},
{'info': 'partial',
'name': 'Midnight'},
{'name': 'Charlie Brown'},
{'name': 'Hymn for the Weekend'},
{'info': 'with "Midnight" excerpt in '
'intro',
'name': 'Fix You'},
{'name': 'Viva la Vida'},
{'name': 'Adventure of a Lifetime'},
{'cover': {'disambiguation': '',
'mbid': '3f8a5e5b-c24b-4068-9f1c-afad8829e06b',
'name': 'Soda Stereo',
'sortName': 'Soda Stereo',
'tmid': 1138263,
'url': 'https://www.setlist.fm/setlists/soda-stereo-7bd6d204.html'},
'info': 'Coldplay debut',
'name': 'De música ligera'}]},
{'name': 'C-Stage',
'song': [{'info': 'Part 1: "The Guest House"',
'name': 'Kaleidoscope',
'tape': True},
{'info': 'acoustic; Will on lead '
'vocals',
'name': 'In My Place'},
{'info': 'song made for Argentina',
'name': 'Amor Argentina'},
{'info': 'Part 2: "Amazing Grace"',
'name': 'Kaleidoscope',
'tape': True}]},
{'name': 'A-Stage',
'song': [{'name': 'Life Is Beautiful'},
{'cover': {'mbid': '2c82c087-8300-488e-b1e4-0b02b789eb18',
'name': 'The Chainsmokers '
'& Coldplay',
'sortName': 'Chainsmokers, '
'The & '
'Coldplay',
'url': 'https://www.setlist.fm/setlists/the-chainsmokers-and-coldplay-33ce5029.html'},
'name': 'Something Just Like This'},
{'name': 'A Sky Full of Stars'},
{'name': 'Up&Up'}]}]},
This is part of the JSON I grabbed from the URL.
Below is the code I am trying to use:
import requests
import json
from pprint import *
url = "https://api.setlist.fm/rest/1.0/artist/cc197bad-dc9c-440d-a5b5-d52ba2e14234/setlists?p=1"
headers = {'x-api-key': 'API-KEY',
           'Accept': 'application/json'}
r = requests.get(url, headers=headers)
data = json.loads(r.text)
#pprint(r.json())
response = data['setlist']
#pprint(response)
for item in response:
    pprint(item['sets']['set']['song']['name'])
However I get this error that I cannot resolve nor find any help online with:
pprint(item['sets']['set']['song']['name'])
TypeError: list indices must be integers or slices, not str
Dictionaries (dict) are accessed by keys.
Lists are accessed by indexes.
For example:
# Dict get 'item'.
data = {'key': 'item'}
data['key']
# List get 'item0'.
data = ['item0', 'item1']
data[0]
# Dict with List get 'item0'.
data = {'key': ['item0', 'item1']}
data['key'][0]
Both container types can be nested in JSON, and each needs to be accessed in
its own way.
You have nested lists, which need to be iterated over, and that can be done
with a for loop.
I have no access to usable JSON data apart from the incomplete Python object
that you show, so I have not tested my code and cannot guarantee that it
is correct. Even if it is not, it may demonstrate how to do the task.
import requests
import json
from pprint import *
url = "https://api.setlist.fm/rest/1.0/artist/cc197bad-dc9c-440d-a5b5-d52ba2e14234/setlists?p=1"
headers = {'x-api-key': 'API-KEY',
           'Accept': 'application/json'}
r = requests.get(url, headers=headers)
data = json.loads(r.text)
result = []
for setlist_item in data['setlist']:
    for set_item in setlist_item['sets']['set']:
        for song_item in set_item['song']:
            result.append(song_item['name'])
print(result)
Each for loop walks one level of the nested lists, and the innermost one adds
each song name to result.
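Since the answer's code could not be tested against the live API, here is a small offline sketch of the same three-level walk. The sample payload is a trimmed, hand-written imitation of the response shape, and the helper name songs_by_date is invented for illustration. It groups the song names per show and uses .get so that a show without sets or songs yields an empty list instead of a KeyError:

```python
# Trimmed, hypothetical sample in the same shape as the response above.
data = {
    "setlist": [
        {
            "eventDate": "15-11-2017",
            "sets": {"set": [
                {"song": [{"name": "Yellow"}, {"name": "The Scientist"}]},
                {"name": "B-Stage", "song": [{"name": "Magic"}]},
            ]},
        },
        # A show for which no songs have been entered.
        {"eventDate": "14-11-2017", "sets": {"set": []}},
    ]
}


def songs_by_date(payload):
    """Map each event date to its song names, in setlist order."""
    result = {}
    for setlist_item in payload.get("setlist", []):
        names = [
            song["name"]
            for set_item in setlist_item.get("sets", {}).get("set", [])
            for song in set_item.get("song", [])
        ]
        result[setlist_item["eventDate"]] = names
    return result


songs = songs_by_date(data)
print(songs)
```

Keeping one list per show preserves the "in order from each setlist" requirement, which a single flat list would lose.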

Convert dictionary lists to multi-dimensional list of dictionaries

I've been trying to convert the following:
data = {'title':['doc1','doc2','doc3'], 'name':['test','check'], 'id':['ddi5i'] }
to:
[{'title':'doc1', 'name': 'test', 'id': 'ddi5i'},
{'title':'doc2', 'name': 'test', 'id': 'ddi5i'},
{'title':'doc3', 'name': 'test', 'id': 'ddi5i'},
{'title':'doc1', 'name': 'check', 'id': 'ddi5i'},
{'title':'doc2', 'name': 'check', 'id': 'ddi5i'},
{'title':'doc3', 'name': 'check', 'id': 'ddi5i'}]
I've tried various options (list comprehensions, pandas, and custom code), but nothing seems to work. For example, the following:
pd.DataFrame(data).to_dict('list')
throws an error because pandas requires all of the lists to be the same length in order to map them. Besides, the output would only be one-dimensional, which is not what I'm looking for.
itertools.product may be what you're looking for here, and it can be applied to the values of your data to get appropriate value groupings for the new dicts. Something like
list(dict(zip(data, ele)) for ele in product(*data.values()))
Demo
>>> from itertools import product
>>> list(dict(zip(data, ele)) for ele in product(*data.values()))
[{'id': 'ddi5i', 'name': 'test', 'title': 'doc1'},
{'id': 'ddi5i', 'name': 'test', 'title': 'doc2'},
{'id': 'ddi5i', 'name': 'test', 'title': 'doc3'},
{'id': 'ddi5i', 'name': 'check', 'title': 'doc1'},
{'id': 'ddi5i', 'name': 'check', 'title': 'doc2'},
{'id': 'ddi5i', 'name': 'check', 'title': 'doc3'}]
How this works becomes clear once you see
>>> list(product(*data.values()))
[('test', 'doc1', 'ddi5i'),
('test', 'doc2', 'ddi5i'),
('test', 'doc3', 'ddi5i'),
('check', 'doc1', 'ddi5i'),
('check', 'doc2', 'ddi5i'),
('check', 'doc3', 'ddi5i')]
and now it is just a matter of zipping back into a dict with the original keys.
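The row order in the demo depends on the order in which the dictionary yields its keys. On Python 3.7 and later, dictionaries keep insertion order, so product(*data.values()) would vary 'id' fastest and 'title' slowest. As a sketch, assuming the exact row order from the question matters, the key order can be stated explicitly (the keys list is an addition for illustration):

```python
from itertools import product

data = {'title': ['doc1', 'doc2', 'doc3'], 'name': ['test', 'check'], 'id': ['ddi5i']}

# product() varies its last argument fastest, so list the keys from
# slowest-changing to fastest-changing.
keys = ['id', 'name', 'title']
rows = [dict(zip(keys, combo)) for combo in product(*(data[k] for k in keys))]

for row in rows:
    print(row)
```

Dictionary equality ignores key order, so each row compares equal to the corresponding row requested in the question.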

Mongo Distinct Query with full row object

First of all, I'm new to Mongo, so I don't know much, and I cannot just remove the duplicate rows because of some dependencies.
I have the following data stored in Mongo:
{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 2, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'},
{'id': 5, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}
You can see that some of the rows are duplicates with different ids.
Since fixing this on the input side will take a long time, I must tackle it on output.
I need the data in the following way:
{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}
My query
keys = db.collection.distinct('key', {})
all_data = db.collection.find({'key': {$in: keys}})
As you can see, it takes two queries to get a single result set. Please combine them into one, as the database is very large.
I might also create a unique key on the key field, but the value is so long (152 characters) that I doubt it will help me.
Or will it?
You need to use the aggregation framework for this. There are multiple ways to do this; the solution below uses the $$ROOT variable to get the first document for each group:
db.data.aggregate([{
    "$sort": {
        "_id": 1
    }
}, {
    "$group": {
        "_id": "$key",
        "first": {
            "$first": "$$ROOT"
        }
    }
}, {
    "$project": {
        "_id": 0,
        "id": "$first.id",
        "key": "$first.key",
        "name": "$first.name",
        "country": "$first.country"
    }
}])
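To make the logic of the pipeline easier to follow, here is a plain-Python imitation of its three stages on a shortened copy of the sample rows. It is only an illustration of what sort, group, and first do, not a replacement for running the aggregation on the server. The helper name first_per_key and the shortened key values are made up, and the sketch sorts on the 'id' field instead of '_id':

```python
# Shortened, shuffled copy of the sample rows (key values abbreviated).
docs = [
    {'id': 2, 'key': 'qscderftgbv', 'name': 'some name', 'country': 'US'},
    {'id': 1, 'key': 'qscderftgbv', 'name': 'some name', 'country': 'US'},
    {'id': 3, 'key': 'pehnvosjiji', 'name': 'some name', 'country': 'IN'},
    {'id': 5, 'key': 'pfvvjwovnew', 'name': 'some name', 'country': 'IN'},
    {'id': 4, 'key': 'pfvvjwovnew', 'name': 'some name', 'country': 'IN'},
]


def first_per_key(documents):
    """Keep the lowest-id document for each distinct 'key' value."""
    first = {}
    # Sorting plays the role of the $sort stage.
    for doc in sorted(documents, key=lambda d: d['id']):
        # setdefault only stores the first document seen per key,
        # which is what $group with $first does.
        first.setdefault(doc['key'], doc)
    # Returning the whole documents corresponds to the $project stage.
    return list(first.values())


unique_docs = first_per_key(docs)
for doc in unique_docs:
    print(doc)
```

One difference worth knowing: this sketch returns groups in first-seen order, whereas the order of groups coming out of $group is not something to rely on without a further sort.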

Problems parsing XML with lxml

I've been trying to parse an XML feed into a Pandas dataframe and can't work out where I'm going wrong.
import pandas as pd
import requests
import lxml.objectify
path = "http://www2.cineworld.co.uk/syndication/listings.xml"
xml = lxml.objectify.parse(path)
root = xml.getroot()
The next bit of code is to parse through the bits I want and create a list of show dictionaries.
shows_list = []
for r in root.cinema:
    rec = {}
    rec['name'] = r.attrib['name']
    rec['info'] = r.attrib["root"] + r.attrib['url']
    listing = r.find("listing")
    for f in listing.film:
        film = rec
        film['title'] = f.attrib['title']
        film['rating'] = f.attrib['rating']
        shows = f.find("shows")
        for s in shows['show']:
            show = rec
            show['time'] = s.attrib['time']
            show['url'] = s.attrib['url']
            #print show
            shows_list.append(rec)
df = pd.DataFrame(shows_list)
When I run the code, the film and time fields seem to be replicated multiple times across rows. However, if I put a print statement into the code (it's commented out), the dictionaries appear as I would expect.
What am I doing wrong? Please feel free to let me know if there's a more pythonic way of doing the parsing process.
EDIT: To clarify:
These are the last few rows of the data if I use a print statement to check what's happening as I loop through.
{'info': 'http://cineworld.co.uk/cinemas/107/information', 'rating': 'TBC', 'name': 'Cineworld Stoke-on-Trent', 'title': "Dad's Army", 'url': '/booking?performance=4729365&seats=STANDARD', 'time': '2016-02-07T20:45:00'}
{'info': 'http://cineworld.co.uk/cinemas/107/information', 'rating': 'TBC', 'name': 'Cineworld Stoke-on-Trent', 'title': "Dad's Army", 'url': '/booking?performance=4729366&seats=STANDARD', 'time': '2016-02-08T20:45:00'}
{'info': 'http://cineworld.co.uk/cinemas/107/information', 'rating': 'TBC', 'name': 'Cineworld Stoke-on-Trent', 'title': "Dad's Army", 'url': '/booking?performance=4729367&seats=STANDARD', 'time': '2016-02-09T20:45:00'}
{'info': 'http://cineworld.co.uk/cinemas/107/information', 'rating': 'TBC', 'name': 'Cineworld Stoke-on-Trent', 'title': "Dad's Army", 'url': '/booking?performance=4729368&seats=STANDARD', 'time': '2016-02-10T20:45:00'}
{'info': 'http://cineworld.co.uk/cinemas/107/information', 'rating': 'TBC', 'name': 'Cineworld Stoke-on-Trent', 'title': "Dad's Army", 'url': '/booking?performance=4729369&seats=STANDARD', 'time': '2016-02-11T20:45:00'}
{'info': 'http://cineworld.co.uk/cinemas/107/information', 'rating': 'PG', 'name': 'Cineworld Stoke-on-Trent', 'title': 'Autism Friendly Screening - Goosebumps', 'url': '/booking?performance=4782937&seats=STANDARD', 'time': '2016-02-07T11:00:00'}
This is the end of the list:
...
{'info': 'http://cineworld.co.uk/cinemas/107/information',
'name': 'Cineworld Stoke-on-Trent',
'rating': 'PG',
'time': '2016-02-07T11:00:00',
'title': 'Autism Friendly Screening - Goosebumps',
'url': '/booking?performance=4782937&seats=STANDARD'},
{'info': 'http://cineworld.co.uk/cinemas/107/information',
'name': 'Cineworld Stoke-on-Trent',
'rating': 'PG',
'time': '2016-02-07T11:00:00',
'title': 'Autism Friendly Screening - Goosebumps',
'url': '/booking?performance=4782937&seats=STANDARD'},
{'info': 'http://cineworld.co.uk/cinemas/107/information',
'name': 'Cineworld Stoke-on-Trent',
'rating': 'PG',
'time': '2016-02-07T11:00:00',
'title': 'Autism Friendly Screening - Goosebumps',
'url': '/booking?performance=4782937&seats=STANDARD'},
{'info': 'http://cineworld.co.uk/cinemas/107/information',
'name': 'Cineworld Stoke-on-Trent',
'rating': 'PG',
'time': '2016-02-07T11:00:00',
'title': 'Autism Friendly Screening - Goosebumps',
'url': '/booking?performance=4782937&seats=STANDARD'}]
Your code only has one object that keeps getting updated: rec. Try this:
from copy import copy
shows_list = []
for r in root.cinema:
    rec = {}
    rec['name'] = r.attrib['name']
    rec['info'] = r.attrib["root"] + r.attrib['url']
    listing = r.find("listing")
    for f in listing.film:
        film = copy(rec) # New object
        film['title'] = f.attrib['title']
        film['rating'] = f.attrib['rating']
        shows = f.find("shows")
        for s in shows['show']:
            show = copy(film) # New object, changed reference
            show['time'] = s.attrib['time']
            show['url'] = s.attrib['url']
            #print show
            shows_list.append(show) # Changed reference
df = pd.DataFrame(shows_list)
With this structure, the data in rec is copied into each film, and the data in each film is copied into each show. Then, at the end, show is added to the shows_list.
You might want to read this article to learn more about what's happening in your line film = rec, i.e. you are giving another name to the original dictionary rather than creating a new dictionary.
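The difference between giving a dictionary a second name and copying it can be seen in a few lines without any XML. This is a stripped-down sketch with made-up loop data, not the original feed:

```python
from copy import copy

titles = ["Dad's Army", 'Goosebumps']

# Aliasing: `film = rec` does not create a new dict, so every appended
# entry is the same object and ends up showing the last title.
rec = {'name': 'Cineworld Stoke-on-Trent'}
alias_rows = []
for title in titles:
    film = rec
    film['title'] = title
    alias_rows.append(film)

# Copying: each iteration gets its own dict, so earlier rows keep their values.
base = {'name': 'Cineworld Stoke-on-Trent'}
copied_rows = []
for title in titles:
    film = copy(base)
    film['title'] = title
    copied_rows.append(film)

print(alias_rows)
print(copied_rows)
```

A shallow copy is enough here because the values are plain strings. If the dictionaries held nested lists or dicts that also get modified, copy.deepcopy would be the safer choice.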
