Problems parsing XML with lxml - python

I've been trying to parse an XML feed into a Pandas dataframe and can't work out where I'm going wrong.
import pandas as pd
import requests
import lxml.objectify
path = "http://www2.cineworld.co.uk/syndication/listings.xml"
xml = lxml.objectify.parse(path)
root = xml.getroot()
The next bit of code is to parse through the bits I want and create a list of show dictionaries.
shows_list = []
for r in root.cinema:
    rec = {}
    rec['name'] = r.attrib['name']
    rec['info'] = r.attrib['root'] + r.attrib['url']
    listing = r.find("listing")
    for f in listing.film:
        film = rec
        film['title'] = f.attrib['title']
        film['rating'] = f.attrib['rating']
        shows = f.find("shows")
        for s in shows['show']:
            show = rec
            show['time'] = s.attrib['time']
            show['url'] = s.attrib['url']
            #print show
            shows_list.append(rec)
df = pd.DataFrame(shows_list)
When I run the code, the film and time fields are replicated multiple times across rows. However, if I put a print statement into the code (it's commented out above), the dictionaries appear as I would expect.
What am I doing wrong? And please feel free to let me know if there's a more Pythonic way to do the parsing.
EDIT: To clarify:
These are the last five rows of the data if I use a print statement to check what's happening as I loop through.
{'info': 'http://cineworld.co.uk/cinemas/107/information', 'rating': 'TBC', 'name': 'Cineworld Stoke-on-Trent', 'title': "Dad's Army", 'url': '/booking?performance=4729365&seats=STANDARD', 'time': '2016-02-07T20:45:00'}
{'info': 'http://cineworld.co.uk/cinemas/107/information', 'rating': 'TBC', 'name': 'Cineworld Stoke-on-Trent', 'title': "Dad's Army", 'url': '/booking?performance=4729366&seats=STANDARD', 'time': '2016-02-08T20:45:00'}
{'info': 'http://cineworld.co.uk/cinemas/107/information', 'rating': 'TBC', 'name': 'Cineworld Stoke-on-Trent', 'title': "Dad's Army", 'url': '/booking?performance=4729367&seats=STANDARD', 'time': '2016-02-09T20:45:00'}
{'info': 'http://cineworld.co.uk/cinemas/107/information', 'rating': 'TBC', 'name': 'Cineworld Stoke-on-Trent', 'title': "Dad's Army", 'url': '/booking?performance=4729368&seats=STANDARD', 'time': '2016-02-10T20:45:00'}
{'info': 'http://cineworld.co.uk/cinemas/107/information', 'rating': 'TBC', 'name': 'Cineworld Stoke-on-Trent', 'title': "Dad's Army", 'url': '/booking?performance=4729369&seats=STANDARD', 'time': '2016-02-11T20:45:00'}
{'info': 'http://cineworld.co.uk/cinemas/107/information', 'rating': 'PG', 'name': 'Cineworld Stoke-on-Trent', 'title': 'Autism Friendly Screening - Goosebumps', 'url': '/booking?performance=4782937&seats=STANDARD', 'time': '2016-02-07T11:00:00'}
This is the end of the list:
...
{'info': 'http://cineworld.co.uk/cinemas/107/information',
'name': 'Cineworld Stoke-on-Trent',
'rating': 'PG',
'time': '2016-02-07T11:00:00',
'title': 'Autism Friendly Screening - Goosebumps',
'url': '/booking?performance=4782937&seats=STANDARD'},
{'info': 'http://cineworld.co.uk/cinemas/107/information',
'name': 'Cineworld Stoke-on-Trent',
'rating': 'PG',
'time': '2016-02-07T11:00:00',
'title': 'Autism Friendly Screening - Goosebumps',
'url': '/booking?performance=4782937&seats=STANDARD'},
{'info': 'http://cineworld.co.uk/cinemas/107/information',
'name': 'Cineworld Stoke-on-Trent',
'rating': 'PG',
'time': '2016-02-07T11:00:00',
'title': 'Autism Friendly Screening - Goosebumps',
'url': '/booking?performance=4782937&seats=STANDARD'},
{'info': 'http://cineworld.co.uk/cinemas/107/information',
'name': 'Cineworld Stoke-on-Trent',
'rating': 'PG',
'time': '2016-02-07T11:00:00',
'title': 'Autism Friendly Screening - Goosebumps',
'url': '/booking?performance=4782937&seats=STANDARD'}]

Your code only has one object that keeps getting updated: rec. Try this:
from copy import copy

shows_list = []
for r in root.cinema:
    rec = {}
    rec['name'] = r.attrib['name']
    rec['info'] = r.attrib['root'] + r.attrib['url']
    listing = r.find("listing")
    for f in listing.film:
        film = copy(rec)  # new object
        film['title'] = f.attrib['title']
        film['rating'] = f.attrib['rating']
        shows = f.find("shows")
        for s in shows['show']:
            show = copy(film)  # new object, changed reference
            show['time'] = s.attrib['time']
            show['url'] = s.attrib['url']
            #print show
            shows_list.append(show)  # changed reference
df = pd.DataFrame(shows_list)
With this structure, the data in rec is copied into each film, and the data in each film is copied into each show. Then, at the end, each show is added to shows_list.
You might want to read up on how Python names bind to objects to understand what's happening in your line film = rec: you are giving another name to the original dictionary rather than creating a new dictionary.
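The aliasing can be demonstrated in a few lines, independent of the feed (hypothetical values, only to show the mechanics):

```python
from copy import copy

rec = {'name': 'Cineworld'}

alias = rec                  # same object under a second name
alias['title'] = 'Film A'
print(rec['title'])          # Film A -- rec was mutated too

fresh = copy(rec)            # shallow copy: an independent dict
fresh['title'] = 'Film B'
print(rec['title'])          # Film A -- rec is untouched this time
```

This is exactly why every appended dict ends up holding the last show's values when no copy is made.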

Related

How save a json file in python from api response when the class is a list and object is not serializable

I have tried to find the answer but could not.
I am looking for a way to save a JSON file from Python to my computer.
I call the API
configuration = api.Configuration()
configuration.api_key['X-XXXX-Application-ID'] = 'xxxxxxx'
configuration.api_key['X-XXX-Application-Key'] = 'xxxxxxxx1'

## List our parameters as search operators
opts = {
    'title': 'Deutsche Bank',
    'body': 'fraud',
    'language': ['en'],
    'published_at_start': 'NOW-7DAYS',
    'published_at_end': 'NOW',
    'per_page': 1,
    'sort_by': 'relevance'
}

try:
    ## Make a call to the Stories endpoint for stories that meet the criteria of the search operators
    api_response = api_instance.list_stories(**opts)
    ## Print the returned story
    pp(api_response.stories)
except ApiException as e:
    print('Exception when calling DefaultApi->list_stories: %s\n' % e)
I got a response like this:
[{'author': {'avatar_url': None, 'id': 1688440, 'name': 'Pranav Nair'},
'body': 'The law firm will investigate whether the bank or its officials have '
'engaged in securities fraud or unlawful business practices. '
'Industries: Bank Referenced Companies: Deutsche Bank',
'categories': [{'confident': False,
'id': 'IAB11-5',
'level': 2,
'links': {'_self': 'https://,
'parent': 'https://'},
'score': 0.39,
'taxonomy': 'iab-qag'},
{'confident': False,
'id': 'IAB3-12',
'level': 2,
'links': {'_self': 'https://api/v1/classify/taxonomy/iab-qag/IAB3-12',
'score': 0.16,
'taxonomy': 'iab-qag'},
'clusters': [],
'entities': {'body': [{'indices': [[168, 180]],
'links': {'dbpedia': 'http://dbpedia.org/resource/Deutsche_Bank'},
'score': 1.0,
'text': 'Deutsche Bank',
'types': ['Bank',
'Organisation',
'Company',
'Banking',
'Agent']},
{'indices': [[80, 95]],
'links': {'dbpedia': 'http://dbpedia.org/resource/Securities_fraud'},
'score': 1.0,
'text': 'securities fraud',
'types': ['Practice', 'Company']},
'hashtags': ['#DeutscheBank', '#Bank', '#SecuritiesFraud'],
'id': 3004661328,
'keywords': ['Deutsche',
'behalf',
'Bank',
'firm',
'investors',
'Deutsche Bank',
'bank',
'fraud',
'unlawful'],
'language': 'en',
'links': {'canonical': None,
'coverages': '/coverages?story_id=3004661328',
'permalink': 'https://www.snl.com/interactivex/article.aspx?KPLT=7&id=58657069',
'related_stories': '/related_stories?story_id=3004661328'},
'media': [],
'paragraphs_count': 1,
'published_at': datetime.datetime(2020, 5, 19, 16, 8, 5, tzinfo=tzutc()),
'sentences_count': 2,
'sentiment': {'body': {'polarity': 'positive', 'score': 0.599704},
'title': {'polarity': 'neutral', 'score': 0.841333}},
'social_shares_count': {'facebook': [],
'google_plus': [],
'source': {'description': None,
'domain': 'snl.com',
'home_page_url': 'http://www.snl.com/',
'id': 8256,
'links_in_count': None,
'locations': [{'city': 'Charlottesville',
'country': 'US',
'state': 'Virginia'}],
'logo_url': None,
'name': 'SNL Financial',
'scopes': [{'city': None,
'country': 'US',
'level': 'national',
'state': None},
{'city': None,
'country': None,
'level': 'international',
'state': None}],
'title': None},
'summary': {'sentences': ['The law firm will investigate whether the bank or '
'its officials have engaged in securities fraud or '
'unlawful business practices.',
'Industries: Bank Referenced Companies: Deutsche '
'Bank']},
'title': "Law firm to investigate Deutsche Bank's US ops on behalf of "
'investors',
'translations': {'en': None},
'words_count': 26}]
The documentation says: "Stories you retrieve from the API are returned as JSON objects by default. These JSON story objects contain 22 top-level fields, whereas a full story object will contain 95 unique data points."
The class is a list. When I try to save the JSON file I get the error "TypeError: Object of type Story is not JSON serializable".
How can I save a JSON file to my computer?
The response you got is not JSON: JSON uses double quotes, but here it is single quotes. Copy and paste your response into http://json.parser.online.fr/ to see the issues.
If you change it like
[{"author": {"avatar_url": None, "id": 1688440, "name": "Pranav Nair"},
"body": "......
It will work. You can then parse it with Python's json module:
import json
data = json.loads(fixed_response_text)  # the corrected, double-quoted text
Ideally, though, returning valid JSON is the API provider's job; failing that, you have to convert the result you got before loading it.
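One extra wrinkle: even after converting each story to a plain dict, the response holds values the json module cannot encode, such as the datetime.datetime in published_at. A common workaround is json.dump(..., default=str), which stringifies anything json does not recognise. A sketch, assuming story is such a plain dict (the field names here are hypothetical, mirroring the response above):

```python
import json
import datetime

# Hypothetical plain-dict story, standing in for one element of api_response.stories
story = {
    'title': "Law firm to investigate Deutsche Bank's US ops",
    'published_at': datetime.datetime(2020, 5, 19, 16, 8, 5),
}

# default=str turns datetimes (and other unknown types) into strings
with open('stories.json', 'w') as f:
    json.dump(story, f, default=str, indent=2)
```

How to turn a Story object into a plain dict depends on the client library (some generated clients expose a to_dict() method; check yours).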

Getting a dict key's value as a list (the value is a dict)

I am fairly new to Python so my apologies if the terminology is mistaken. I assume it would be easier to explain through the code itself.
{'continue': {'rvcontinue': '20160625113031|17243371', 'continue': '||'},
'query': {'pages': {'4270': {'pageid': 4270,
'ns': 0,
'title': 'Bulgaristan',
'revisions': [{'revid': 17077223,
'parentid': 16909061,
'user': '85.103.140.217',
'anon': '',
'userid': 0,
'timestamp': '2016-05-11T15:30:31Z',
'comment': 'BULGARİSTAN',
'tags': ['visualeditor']},
{'revid': 17077230,
'parentid': 17077223,
'user': 'GurayKant',
'userid': 406350,
'timestamp': '2016-05-11T15:31:31Z',
'comment': '[[Özel:Katkılar/Muratero|Muratero]] ([[Kullanıcı_mesaj:Muratero|mesaj]]) tarafından yapılmış 16907788 numaralı değişiklikler geri getirildi. ([[VP:TW|TW]])',
'tags': []},
{'revid': 17079353,
'parentid': 17077230,
'user': '85.105.16.34',
'anon': '',
'userid': 0,
'timestamp': '2016-05-12T11:03:43Z',
'comment': 'Dipnotta 2001 sayımı verilmesine rağmen burada 2011 yazılmış.',
'tags': ['visualeditor']},
{'revid': 17085285,
'parentid': 17079353,
'user': 'İazak',
'userid': 200435,
'timestamp': '2016-05-14T09:36:18Z',
'comment': 'Gerekçe: Etnik dağılım kaynağı 2001, nüfus sayımı 2011 tarihli.',
'tags': []},
{'revid': 17109975,
'parentid': 17085285,
'user': 'Kudelski',
'userid': 167898,
'timestamp': '2016-05-21T13:14:44Z',
'comment': 'Düzeltme.',
'tags': []}]}}}}
From this data I want to get the values of 'pageid', 'title', 'revid', 'user', 'userid', 'timestamp', 'comment' and 'tags'.
I can use y['query']['pages'], but I do not want to hard-code '4270', since that number will change each time I call the API.
I hope that is explanatory enough! Thank you very much! I can give additional info if necessary!
You can use list() to convert .keys() or .values() to a list and take only the first element.
data = {...your dictionary...}

item = list(data['query']['pages'].values())[0]

print('title:', item['title'])
print('pageid:', item['pageid'])

for x in item['revisions']:
    print('---')
    print('revid:', x['revid'])
    print('user:', x['user'])
    print('userid:', x['userid'])
    print('timestamp:', x['timestamp'])
    print('comment:', x['comment'])
    print('tags:', x['tags'])
But if you have many pages in this dictionary, then you should use a for loop:
for key, item in data['query']['pages'].items():
    print('key:', key)
    print('title:', item['title'])
    print('pageid:', item['pageid'])
    for x in item['revisions']:
        print('---')
        print('revid:', x['revid'])
        print('user:', x['user'])
        print('userid:', x['userid'])
        print('timestamp:', x['timestamp'])
        print('comment:', x['comment'])
        print('tags:', x['tags'])
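If only the first page is ever needed, next(iter(...)) gets the first value without building the whole list. A sketch over a trimmed, hypothetical stand-in for the response shape in the question:

```python
# Trimmed stand-in for the API response shape in the question
data = {'query': {'pages': {'4270': {'pageid': 4270,
                                     'title': 'Bulgaristan',
                                     'revisions': []}}}}

# First value of the pages dict, whatever its changing key happens to be
item = next(iter(data['query']['pages'].values()))
print(item['title'])  # Bulgaristan
```

This avoids materialising an intermediate list just to index element 0.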

Python parsing JSON nested data

I am trying to parse this JSON data from the setlist.fm API. I want to get all the song names, in order, from each setlist. I have looked around, but none of the methods described on the internet are working.
Here is the JSON data
{'itemsPerPage': 20,
'page': 1,
'setlist': [{'artist': {'disambiguation': '',
'mbid': 'cc197bad-dc9c-440d-a5b5-d52ba2e14234',
'name': 'Coldplay',
'sortName': 'Coldplay',
'tmid': 806431,
'url': 'https://www.setlist.fm/setlists/coldplay-3d6bde3.html'},
'eventDate': '15-11-2017',
'id': '33e0845d',
'info': 'Last show of the A Head Full of Dreams Tour',
'lastUpdated': '2017-11-23T14:51:05.000+0000',
'sets': {'set': [{'song': [{'cover': {'disambiguation': '',
'mbid': '9dee40b2-25ad-404c-9c9a-139feffd4b57',
'name': 'Maria Callas',
'sortName': 'Callas, Maria',
'url': 'https://www.setlist.fm/setlists/maria-callas-33d6706d.html'},
'name': 'O mio babbino caro',
'tape': True},
{'info': 'extended intro with Charlie '
'Chaplin speech',
'name': 'A Head Full of Dreams'},
{'name': 'Yellow'},
{'name': 'Every Teardrop Is a '
'Waterfall'},
{'name': 'The Scientist'},
{'info': 'with "Oceans" excerpt in '
'intro',
'name': 'God Put a Smile Upon Your '
'Face'},
{'info': 'with Tiësto Remix outro',
'name': 'Paradise'}]},
{'name': 'B-Stage',
'song': [{'name': 'Always in My Head'},
{'name': 'Magic'},
{'info': 'single version',
'name': 'Everglow'}]},
{'name': 'A-Stage',
'song': [{'info': 'with "Army of One" excerpt '
'in intro',
'name': 'Clocks'},
{'info': 'partial',
'name': 'Midnight'},
{'name': 'Charlie Brown'},
{'name': 'Hymn for the Weekend'},
{'info': 'with "Midnight" excerpt in '
'intro',
'name': 'Fix You'},
{'name': 'Viva la Vida'},
{'name': 'Adventure of a Lifetime'},
{'cover': {'disambiguation': '',
'mbid': '3f8a5e5b-c24b-4068-9f1c-afad8829e06b',
'name': 'Soda Stereo',
'sortName': 'Soda Stereo',
'tmid': 1138263,
'url': 'https://www.setlist.fm/setlists/soda-stereo-7bd6d204.html'},
'name': 'De música ligera'}]},
{'name': 'C-Stage',
'song': [{'info': 'extended',
'name': 'Kaleidoscope',
'tape': True},
{'info': 'acoustic',
'name': 'In My Place'},
{'name': 'Amor Argentina'}]},
{'name': 'A-Stage',
'song': [{'cover': {'mbid': '2c82c087-8300-488e-b1e4-0b02b789eb18',
'name': 'The Chainsmokers '
'& Coldplay',
'sortName': 'Chainsmokers, '
'The & '
'Coldplay',
'url': 'https://www.setlist.fm/setlists/the-chainsmokers-and-coldplay-33ce5029.html'},
'name': 'Something Just Like This'},
{'name': 'A Sky Full of Stars'},
{'info': 'Extended Outro; followed by '
'‘Believe In Love’ Tour '
'Conclusion Video',
'name': 'Up&Up'}]}]},
'tour': {'name': 'A Head Full of Dreams'},
'url': 'https://www.setlist.fm/setlist/coldplay/2017/estadio-ciudad-de-la-plata-la-plata-argentina-33e0845d.html',
'venue': {'city': {'coords': {'lat': -34.9313889,
'long': -57.9488889},
'country': {'code': 'AR', 'name': 'Argentina'},
'id': '3432043',
'name': 'La Plata',
'state': 'Buenos Aires',
'stateCode': '01'},
'id': '3d62153',
'name': 'Estadio Ciudad de La Plata',
'url': 'https://www.setlist.fm/venue/estadio-ciudad-de-la-plata-la-plata-argentina-3d62153.html'},
'versionId': '7b4ce6d0'},
{'artist': {'disambiguation': '',
'mbid': 'cc197bad-dc9c-440d-a5b5-d52ba2e14234',
'name': 'Coldplay',
'sortName': 'Coldplay',
'tmid': 806431,
'url': 'https://www.setlist.fm/setlists/coldplay-3d6bde3.html'},
'eventDate': '14-11-2017',
'id': '63e08ec7',
'info': '"Paradise", "Something Just Like This" and "De música '
'ligera" were soundchecked',
'lastUpdated': '2017-11-15T02:40:25.000+0000',
'sets': {'set': [{'song': [{'cover': {'disambiguation': '',
'mbid': '9dee40b2-25ad-404c-9c9a-139feffd4b57',
'name': 'Maria Callas',
'sortName': 'Callas, Maria',
'url': 'https://www.setlist.fm/setlists/maria-callas-33d6706d.html'},
'name': 'O mio babbino caro',
'tape': True},
{'info': 'extended intro with Charlie '
'Chaplin speech',
'name': 'A Head Full of Dreams'},
{'name': 'Yellow'},
{'name': 'Every Teardrop Is a '
'Waterfall'},
{'name': 'The Scientist'},
{'info': 'with "Oceans" excerpt in '
'intro',
'name': 'Birds'},
{'info': 'with Tiësto Remix outro',
'name': 'Paradise'}]},
{'name': 'B-Stage',
'song': [{'name': 'Always in My Head'},
{'name': 'Magic'},
{'info': 'single version; dedicated '
'to the Argentinian victims '
'of the New York terrorist '
'attack',
'name': 'Everglow'}]},
{'name': 'A-Stage',
'song': [{'info': 'with "Army of One" excerpt '
'in intro',
'name': 'Clocks'},
{'info': 'partial',
'name': 'Midnight'},
{'name': 'Charlie Brown'},
{'name': 'Hymn for the Weekend'},
{'info': 'with "Midnight" excerpt in '
'intro',
'name': 'Fix You'},
{'name': 'Viva la Vida'},
{'name': 'Adventure of a Lifetime'},
{'cover': {'disambiguation': '',
'mbid': '3f8a5e5b-c24b-4068-9f1c-afad8829e06b',
'name': 'Soda Stereo',
'sortName': 'Soda Stereo',
'tmid': 1138263,
'url': 'https://www.setlist.fm/setlists/soda-stereo-7bd6d204.html'},
'info': 'Coldplay debut',
'name': 'De música ligera'}]},
{'name': 'C-Stage',
'song': [{'info': 'Part 1: "The Guest House"',
'name': 'Kaleidoscope',
'tape': True},
{'info': 'acoustic; Will on lead '
'vocals',
'name': 'In My Place'},
{'info': 'song made for Argentina',
'name': 'Amor Argentina'},
{'info': 'Part 2: "Amazing Grace"',
'name': 'Kaleidoscope',
'tape': True}]},
{'name': 'A-Stage',
'song': [{'name': 'Life Is Beautiful'},
{'cover': {'mbid': '2c82c087-8300-488e-b1e4-0b02b789eb18',
'name': 'The Chainsmokers '
'& Coldplay',
'sortName': 'Chainsmokers, '
'The & '
'Coldplay',
'url': 'https://www.setlist.fm/setlists/the-chainsmokers-and-coldplay-33ce5029.html'},
'name': 'Something Just Like This'},
{'name': 'A Sky Full of Stars'},
{'name': 'Up&Up'}]}]},
This is part of the JSON I grabbed from the URL.
Below is the code I am trying to use:
import requests
import json
from pprint import *

url = "https://api.setlist.fm/rest/1.0/artist/cc197bad-dc9c-440d-a5b5-d52ba2e14234/setlists?p=1"
headers = {'x-api-key': 'API-KEY',
           'Accept': 'application/json'}

r = requests.get(url, headers=headers)
data = json.loads(r.text)
#pprint(r.json())
response = data['setlist']
#pprint(response)
for item in response:
    pprint(item['sets']['set']['song']['name'])
However I get this error, which I cannot resolve and cannot find any help with online:
pprint(item['sets']['set']['song']['name'])
TypeError: list indices must be integers or slices, not str
Dictionaries (Dict) are accessed by keys.
Lists are accessed by indexes.
i.e.
# Dict: get 'item'.
data = {'key': 'item'}
data['key']

# List: get 'item0'.
data = ['item0', 'item1']
data[0]

# Dict with List: get 'item0'.
data = {'key': ['item0', 'item1']}
data['key'][0]
Both storage types can be nested in JSON, and each needs to be accessed in a different manner.
You have nested lists, which need to be indexed through, and that can be done with a for loop.
I have no access to workable JSON data, only the incomplete Python object you show, so I have not tested this code and can give no assurance that it is correct. Even so, it should demonstrate how to approach the task.
import requests
import json
from pprint import *

url = "https://api.setlist.fm/rest/1.0/artist/cc197bad-dc9c-440d-a5b5-d52ba2e14234/setlists?p=1"
headers = {'x-api-key': 'API-KEY',
           'Accept': 'application/json'}

r = requests.get(url, headers=headers)
data = json.loads(r.text)

result = []
for setlist_item in data['setlist']:
    for set_item in setlist_item['sets']['set']:
        for song_item in set_item['song']:
            result += [song_item['name']]
print(result)
Each for loop steps one level deeper into the nested lists, until the innermost loop can extend result with each song name.
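The same traversal can be collapsed into a single list comprehension. The stand-in data below is a minimal, hypothetical slice of the response shape shown in the question:

```python
# Minimal stand-in for the parsed API response (assumed shape, not live data)
data = {'setlist': [
    {'sets': {'set': [
        {'song': [{'name': 'Yellow'}, {'name': 'Fix You'}]},
        {'name': 'B-Stage', 'song': [{'name': 'Magic'}]},
    ]}},
]}

# One comprehension, same three levels of nesting as the for loops
result = [song['name']
          for setlist_item in data['setlist']
          for set_item in setlist_item['sets']['set']
          for song in set_item['song']]
print(result)  # ['Yellow', 'Fix You', 'Magic']
```

The for clauses read left to right in the same order as the nested loops, so it is a direct transliteration.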

Access data into list of dictionaries python

I have a list of dictionaries, with some nested dictionaries inside:
[{'id': '67569006',
'kind': 'analytics#accountSummary',
'name': 'Adopt-a-Hydrant',
'webProperties': [{'id': 'UA-62536006-1',
'internalWebPropertyId': '102299473',
'kind': 'analytics#webPropertySummary',
'level': 'STANDARD',
'name': 'Adopt-a-Hydrant',
'profiles': [{'id': '107292146',
'kind': 'analytics#profileSummary',
'name': 'Adopt a Hydrant view1',
'type': 'WEB'},
{'id': '1372982608',
'kind': 'analytics#profileSummary',
'name': 'Unfiltered view',
'type': 'WEB'}],
'websiteUrl': 'https://example1.com/'}]},
{'id': '44824959',
'kind': 'analytics#accountSummary',
'name': 'Adorn',
'webProperties': [{'id': 'UA-62536006-1',
'internalWebPropertyId': '75233390',
'kind': 'analytics#webPropertySummary',
'level': 'STANDARD',
'name': 'Website 2',
'profiles': [{'id': '77736192',
'kind': 'analytics#profileSummary',
'name': 'All Web Site Data',
'type': 'WEB'}],
'websiteUrl': 'http://www.example2.com'}]},
]
I'm trying to print each site's name, URL and views; if a site has 2 or more views, I want to print them all, and this is where it gets tricky.
So far I've tried:
all_properties = [the list above]

for single_property in all_properties:
    single_property_name = single_property['name']
    view_name = single_property['webProperties'][0]['profiles'][0]['name']
    view_id = single_property['webProperties'][0]['profiles'][0]['id']
    print(single_property_name, view_name, view_id)
This almost works, but it prints only the first view (profiles > name) of each property; some properties have more than one view, and I need those views printed as well.
The output now is:
Adopt-a-Hydrant Adopt a Hydrant view1 107292146
Website 2 All Web Site Data 77736192
So it's skipping the second view of the first property. I tried nesting a sub-for-loop but I can't get it to work. The final output should be:
Adopt-a-Hydrant Adopt a Hydrant view1 107292146
Adopt-a-Hydrant Unfiltered View 1372982608
Website 2 All Web Site Data 77736192
Any ideas on how to get that?
You need to iterate through the profiles list for each single_property:
for single_property in all_properties:
    single_property_name = single_property['name']
    for profile in single_property['webProperties'][0]['profiles']:
        view_name = profile['name']
        view_id = profile['id']
        print(single_property_name, view_name, view_id)
It would probably help to read a little in the Python docs about lists and how to iterate through them.
Just another proposition, with one-line loops (list comprehensions):
for single_property in data:
    single_property_name = single_property['name']
    view_name = [i['name'] for i in single_property['webProperties'][0]['profiles']]
    view_id = [i['id'] for i in single_property['webProperties'][0]['profiles']]
    print(single_property_name, view_name, view_id)
The point is that you have to loop inside the lists. You could also build objects, if you think that would make your data more manageable.
If you're getting really confused, don't be afraid to just make variables. Look at how much more readable this is:
for item in data:
    webProperties = item['webProperties'][0]
    print("Name: " + webProperties["name"])
    print("URL: " + webProperties["websiteUrl"])
    print("PRINTING VIEWS\n")
    print("----------------------------")
    views = webProperties['profiles']
    for view in views:
        print("ID: " + view['id'])
        print("Kind: " + view['kind'])
        print("Name: " + view['name'])
        print("Type: " + view['type'])
        print("----------------------------")
    print("\n\n\n")
where data is the list from your question:
data = [{'id': '67569006',
'kind': 'analytics#accountSummary',
'name': 'Adopt-a-Hydrant',
'webProperties': [{'id': 'UA-62536006-1',
'internalWebPropertyId': '102299473',
'kind': 'analytics#webPropertySummary',
'level': 'STANDARD',
'name': 'Adopt-a-Hydrant',
'profiles': [{'id': '107292146',
'kind': 'analytics#profileSummary',
'name': 'Adopt a Hydrant view1',
'type': 'WEB'},
{'id': '1372982608',
'kind': 'analytics#profileSummary',
'name': 'Unfiltered view',
'type': 'WEB'}],
'websiteUrl': 'https://example1.com/'}]},
{'id': '44824959',
'kind': 'analytics#accountSummary',
'name': 'Adorn',
'webProperties': [{'id': 'UA-62536006-1',
'internalWebPropertyId': '75233390',
'kind': 'analytics#webPropertySummary',
'level': 'STANDARD',
'name': 'Website 2',
'profiles': [{'id': '77736192',
'kind': 'analytics#profileSummary',
'name': 'All Web Site Data',
'type': 'WEB'}],
'websiteUrl': 'http://www.example2.com'}]},
]
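Note that all the snippets above index webProperties[0]; if an account ever carries more than one web property, one more loop covers it. A sketch over a trimmed copy of the sample data:

```python
# Trimmed sample with the same shape as the data above
data = [{'name': 'Adopt-a-Hydrant',
         'webProperties': [{'websiteUrl': 'https://example1.com/',
                            'profiles': [{'id': '107292146',
                                          'name': 'Adopt a Hydrant view1'},
                                         {'id': '1372982608',
                                          'name': 'Unfiltered view'}]}]}]

# Loop over every web property, not just the first
for account in data:
    for prop in account['webProperties']:
        for view in prop['profiles']:
            print(account['name'], view['name'], view['id'])
```

With the sample data this prints one line per view, including the second "Unfiltered view" row that the original code skipped.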

Convert dictionary lists to multi-dimensional list of dictionaries

I've been trying to convert the following:
data = {'title':['doc1','doc2','doc3'], 'name':['test','check'], 'id':['ddi5i'] }
to:
[{'title':'doc1', 'name': 'test', 'id': 'ddi5i'},
{'title':'doc2', 'name': 'test', 'id': 'ddi5i'},
{'title':'doc3', 'name': 'test', 'id': 'ddi5i'},
{'title':'doc1', 'name': 'check', 'id': 'ddi5i'},
{'title':'doc2', 'name': 'check', 'id': 'ddi5i'},
{'title':'doc3', 'name': 'check', 'id': 'ddi5i'}]
I've tried various options (list comprehensions, pandas and custom code) but nothing seems to work. For example, the following:
pandas.DataFrame(data).to_dict('list')
throws an error because, since it tries to map the lists onto columns, they all have to be the same length. Besides, the output would only be one-dimensional, which is not what I'm looking for.
itertools.product may be what you're looking for here: applied to the values of your data, it produces the value groupings for the new dicts. Something like
list(dict(zip(data, ele)) for ele in product(*data.values()))
Demo
>>> from itertools import product
>>> list(dict(zip(data, ele)) for ele in product(*data.values()))
[{'id': 'ddi5i', 'name': 'test', 'title': 'doc1'},
{'id': 'ddi5i', 'name': 'test', 'title': 'doc2'},
{'id': 'ddi5i', 'name': 'test', 'title': 'doc3'},
{'id': 'ddi5i', 'name': 'check', 'title': 'doc1'},
{'id': 'ddi5i', 'name': 'check', 'title': 'doc2'},
{'id': 'ddi5i', 'name': 'check', 'title': 'doc3'}]
How this works becomes clear once you see
>>> list(product(*data.values()))
[('test', 'doc1', 'ddi5i'),
('test', 'doc2', 'ddi5i'),
('test', 'doc3', 'ddi5i'),
('check', 'doc1', 'ddi5i'),
('check', 'doc2', 'ddi5i'),
('check', 'doc3', 'ddi5i')]
and now it is just a matter of zipping back into a dict with the original keys.
