faster and more 'pythonic' list of dictionaries - python

For simplicity, I've provided 2 lists in a list, but I'm actually dealing with a hundred lists in a list, each containing a sizable number of dictionaries. I only want to get the value of the 'status' key in the 1st dictionary without checking any other dictionaries in that list (since I know they all contain the same value at that key). Then I will perform some sort of clustering within each big dictionary. I need to efficiently concatenate all 'title' values. Is there a way to make my code more elegant and much faster?
I have:
nested = [
[
{'id': 287, 'title': 'hungry badger', 'status': 'High'},
{'id': 437, 'title': 'roadtrip to Kansas','status': 'High'}
],
[
{'id': 456, 'title': 'happy title here','status': 'Medium'},
{'id': 342,'title': 'soft big bear','status': 'Medium'}
]
]
I'd like:
result = [
{
'High': [
{'id': 287, 'title': 'hungry badger'},
{'id': 437, 'title': 'roadtrip to Kansas'}
]
},
{
'Medium': [
{'id': 456, 'title': 'happy title here'},
{'id': 342, 'title': 'soft big bear'}
]
}
]
What I tried:
for oneList in nested:
    result = {}
    for i in oneList:
        a = list(i.keys())
        m = [i[key] for key in a if key not in ['id', 'title']]
        result[m[0]] = oneList
        for key in a:
            if key not in ['id', 'title']:
                del i[key]

from itertools import groupby
result = groupby(sum(nested,[]), lambda x: x['status'])
How it works:
sum(nested,[]) concatenates all your inner lists together into one big list of dictionaries
groupby(..., lambda x: x['status']) groups all your objects by their status property
Note that itertools.groupby returns an iterator (not a list) and only groups consecutive items with equal keys, so the input must already be ordered by status, as it is here. If you want to materialize the groups you need to do something like the following.
from itertools import groupby
result = groupby(sum(nested,[]), lambda x: x['status'])
result = {key:list(val) for key,val in result}
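Because groupby only merges adjacent items, the approach above can silently split a status into several groups if the flattened list is not ordered. A minimal sketch of a safer variant follows, assuming the sample data from the question; the names flat, get_status and by_status are illustrative, and itertools.chain.from_iterable is swapped in for sum(nested, []) to avoid repeated list copying.

```python
from itertools import chain, groupby
from operator import itemgetter

nested = [
    [
        {'id': 287, 'title': 'hungry badger', 'status': 'High'},
        {'id': 437, 'title': 'roadtrip to Kansas', 'status': 'High'},
    ],
    [
        {'id': 456, 'title': 'happy title here', 'status': 'Medium'},
        {'id': 342, 'title': 'soft big bear', 'status': 'Medium'},
    ],
]

# Flatten in linear time; sum(nested, []) builds a new list at every step.
flat = list(chain.from_iterable(nested))

# groupby only merges adjacent items with equal keys, so sort by that key first.
# list.sort is stable, so items keep their original order within each status.
get_status = itemgetter('status')
flat.sort(key=get_status)

# Materialize each group, because groupby yields lazy sub-iterators.
by_status = {status: list(group) for status, group in groupby(flat, key=get_status)}
```

Note that this produces a single dictionary keyed by status and keeps the 'status' key in every item, which differs from the per-list result asked for in the question.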

You could make a defaultdict for each nested list:
import collections
nested = [
[{'id': 287, 'title': 'hungry badger', 'status': 'High'},
{'id': 437, 'title': 'roadtrip to Kansas','status': 'High'}],
[{'id': 456, 'title': 'happy title here','status': 'Medium'},
{'id': 342,'title': 'soft big bear','status': 'Medium'}] ]
result = []
for l in nested:
    r = collections.defaultdict(list)
    for d in l:
        name = d.pop('status')  # removes 'status' from the original dictionary
        r[name].append(d)
    result.append(dict(r))  # plain dict, so it prints as shown below
This gives the following result:
>>> import pprint
>>> pprint.pprint(result)
[{'High': [{'id': 287, 'title': 'hungry badger'},
{'id': 437, 'title': 'roadtrip to Kansas'}]},
{'Medium': [{'id': 456, 'title': 'happy title here'},
{'id': 342, 'title': 'soft big bear'}]}]
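Since the question states that every dictionary in an inner list shares the same status, the status can be read once from the first dictionary. The following is only a sketch under that assumption; the function name group_by_first_status and the titles list are hypothetical, and the input dictionaries are copied rather than modified.

```python
nested = [
    [
        {'id': 287, 'title': 'hungry badger', 'status': 'High'},
        {'id': 437, 'title': 'roadtrip to Kansas', 'status': 'High'},
    ],
    [
        {'id': 456, 'title': 'happy title here', 'status': 'Medium'},
        {'id': 342, 'title': 'soft big bear', 'status': 'Medium'},
    ],
]


def group_by_first_status(lists, key='status'):
    """Build one {status: [dicts without the status key]} mapping per inner list."""
    result = []
    for inner in lists:
        if not inner:
            continue  # an empty inner list has no status to read
        status = inner[0][key]  # assumed identical for the whole inner list
        stripped = [{k: v for k, v in d.items() if k != key} for d in inner]
        result.append({status: stripped})
    return result


result = group_by_first_status(nested)

# One concatenated title string per inner list, joined with spaces.
titles = [' '.join(d['title'] for d in inner) for inner in nested]
```

Building new dictionaries costs a little more memory than d.pop('status'), but it leaves nested intact for later steps such as the title concatenation.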

Related

Python, sort dict based on external list

I have to sort a dict like:
jobs = {'elem_05': {'id': 'fifth'},
'elem_03': {'id': 'third'},
'elem_01': {'id': 'first'},
'elem_00': {'id': 'zeroth'},
'elem_04': {'id': 'fourth'},
'elem_02': {'id': 'second'}}
based on the "id" elements, whose order can be found in a list:
sorting_list = ['zeroth', 'first', 'second', 'third', 'fourth', 'fifth']
The trivial way to solve the problem is to use:
tmp = {}
for x in sorting_list:
    for k, v in jobs.items():
        if v["id"] == x:
            tmp.update({k: v})
but I was trying to figure out a more efficient and pythonic way.
I've been trying sorted and lambda functions as key, but I'm not familiar with that yet, so I was unsuccessful so far.
I would use a dictionary as key for sorted:
order = {k:i for i,k in enumerate(sorting_list)}
# {'zeroth': 0, 'first': 1, 'second': 2, 'third': 3, 'fourth': 4, 'fifth': 5}
out = dict(sorted(jobs.items(), key=lambda x: order.get(x[1].get('id'))))
output:
{'elem_00': {'id': 'zeroth'},
'elem_01': {'id': 'first'},
'elem_02': {'id': 'second'},
'elem_03': {'id': 'third'},
'elem_04': {'id': 'fourth'},
'elem_05': {'id': 'fifth'}}
There is a way to sort the dict using lambda as a sorting key:
jobs = {'elem_05': {'id': 'fifth'},
'elem_03': {'id': 'third'},
'elem_01': {'id': 'first'},
'elem_00': {'id': 'zeroth'},
'elem_04': {'id': 'fourth'},
'elem_02': {'id': 'second'}}
sorting_list = ['zeroth', 'first', 'second', 'third', 'fourth', 'fifth']
sorted_jobs = dict(sorted(jobs.items(), key=lambda x: sorting_list.index(x[1]['id'])))
print(sorted_jobs)
This outputs
{'elem_00': {'id': 'zeroth'}, 'elem_01': {'id': 'first'}, 'elem_02': {'id': 'second'}, 'elem_03': {'id': 'third'}, 'elem_04': {'id': 'fourth'}, 'elem_05': {'id': 'fifth'}}
I have a feeling the sorted expression could be cleaner but I didn't get it to work any other way.
You can use OrderedDict:
from collections import OrderedDict
# Sort the (key, value) pairs by the position of each value's 'id' in sorting_list.
sorted_jobs = OrderedDict(sorted(jobs.items(), key=lambda kv: sorting_list.index(kv[1]['id'])))
This creates an OrderedDict object, which behaves much like a dict and can be converted to a plain dict using dict(sorted_jobs).
Similar to what is already posted, but with error checking in case an id doesn't appear in sorting_list:
sorting_list = ['zeroth', 'first', 'second', 'third', 'fourth', 'fifth']
jobs = {'elem_05': {'id': 'fifth'},
'elem_03': {'id': 'third'},
'elem_01': {'id': 'first'},
'elem_00': {'id': 'zeroth'},
'elem_04': {'id': 'fourth'},
'elem_02': {'id': 'second'}}
def custom_order(item):
    try:
        return sorting_list.index(item[1]["id"])
    except ValueError:
        # ids missing from sorting_list are placed at the end
        return len(sorting_list)
jobs_sorted = {k: v for k, v in sorted(jobs.items(), key=custom_order)}
print(jobs_sorted)
The sorted function costs O(n log n) in average time complexity. For a linear time complexity you can instead create a reverse mapping that maps each ID to the corresponding dict entry:
mapping = {d['id']: (k, d) for k, d in jobs.items()}
so that you can then construct a new dict by mapping sorting_list with the ID mapping above:
dict(map(mapping.get, sorting_list))
which, with your sample input, returns:
{'elem_00': {'id': 'zeroth'}, 'elem_01': {'id': 'first'}, 'elem_02': {'id': 'second'}, 'elem_03': {'id': 'third'}, 'elem_04': {'id': 'fourth'}, 'elem_05': {'id': 'fifth'}}
Demo: https://replit.com/#blhsing/WorseChartreuseFonts
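The reverse-mapping idea fails if sorting_list contains an id that no job has, because mapping.get then returns None and dict() cannot unpack it. A small sketch that skips such ids is shown below; the extra 'sixth' entry and the names by_id and ordered are illustrative, and the result relies on dicts preserving insertion order (Python 3.7+).

```python
jobs = {'elem_05': {'id': 'fifth'},
        'elem_03': {'id': 'third'},
        'elem_01': {'id': 'first'},
        'elem_00': {'id': 'zeroth'},
        'elem_04': {'id': 'fourth'},
        'elem_02': {'id': 'second'}}

# 'sixth' is a hypothetical id with no matching job, added to show the skip.
sorting_list = ['zeroth', 'first', 'second', 'third', 'fourth', 'fifth', 'sixth']

# One pass over jobs: map each id to its (key, value) pair.
by_id = {job['id']: (name, job) for name, job in jobs.items()}

# One pass over sorting_list: emit the pairs in the requested order,
# ignoring ids that do not occur in jobs.
ordered = dict(by_id[i] for i in sorting_list if i in by_id)
```

Jobs whose id is absent from sorting_list are dropped by this sketch; if they should be kept, they would need to be appended separately.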

understanding nested python dict comprehension

I am getting along with dict comprehensions and trying to understand how the below 2 dict comprehensions work:
select_vals = ['name', 'pay']
test_dict = {'data': [{'name': 'John', 'city': 'NYC', 'pay': 70000}, {'name': 'Mike', 'city': 'NYC', 'pay': 80000}, {'name': 'Kate', 'city': 'Houston', 'pay': 65000}]}
dict_comp1 = [{key: item[key] for key in select_vals } for item in test_dict['data'] if item['pay'] > 65000 ]
The above line gets me
[{'name': 'John', 'pay': 70000}, {'name': 'Mike', 'pay': 80000}]
dict_comp2 = [{key: item[key]} for key in select_vals for item in test_dict['data'] if item['pay'] > 65000 ]
The above line gets me
[{'name': 'John'}, {'name': 'Mike'}, {'pay': 70000}, {'pay': 80000}]
How do the two outputs differ when written as for loops? When I write it as a for loop:
dict_comp3 = []
for key in select_vals:
    for item in test_dict['data']:
        if item['pay'] > 65000:
            dict_comp3.append({key: item[key]})
print(dict_comp3)
The above loop gives me the same result as dict_comp2:
[{'name': 'John'}, {'name': 'Mike'}, {'pay': 70000}, {'pay': 80000}]
How do I get the same output as dict_comp1 with a for loop?
The iteration over select_vals should be the inner one:
result = []
for item in test_dict['data']:
    if item['pay'] > 65000:
        aux = {}
        for key in select_vals:
            aux[key] = item[key]
        result.append(aux)
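The general rule is that the for and if clauses of a comprehension nest from left to right, while the expression at the front is the innermost body. In dict_comp1 the front expression is itself a dict comprehension, so one whole dictionary is built per item; in dict_comp2 both for clauses belong to the list comprehension, so one single-key dictionary is built per (key, item) pair. A short sketch checking both equivalences, where loop1 and loop2 are illustrative names:

```python
select_vals = ['name', 'pay']
test_dict = {'data': [{'name': 'John', 'city': 'NYC', 'pay': 70000},
                      {'name': 'Mike', 'city': 'NYC', 'pay': 80000},
                      {'name': 'Kate', 'city': 'Houston', 'pay': 65000}]}

dict_comp1 = [{key: item[key] for key in select_vals}
              for item in test_dict['data'] if item['pay'] > 65000]
dict_comp2 = [{key: item[key]}
              for key in select_vals
              for item in test_dict['data'] if item['pay'] > 65000]

# dict_comp1: a single outer loop; the dict comprehension is the appended element.
loop1 = []
for item in test_dict['data']:
    if item['pay'] > 65000:
        loop1.append({key: item[key] for key in select_vals})

# dict_comp2: two loops nested in the order the for clauses are written.
loop2 = []
for key in select_vals:
    for item in test_dict['data']:
        if item['pay'] > 65000:
            loop2.append({key: item[key]})
```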

Fastest method: for value in DictA, find value in DictB and retrieve other DictB value?

For each item in dictA, I want to search for it in dictB, if dictB has it then I want to pull some other values from dictB and add it to dictA.
An example that is working is here, however it is rather slow as I have 50,000+ items to search through and it will perform this similar function on multiple dicts.
Is there a fast method of performing this search?
dictA = [
{'id': 12345},
{'id': 67890},
{'id': 11111},
{'id': 22222}
]
dictB = [
{'id': 63351, 'name': 'Bob'},
{'id': 12345, 'name': 'Carl'},
{'id': 59933, 'name': 'Amy'},
{'id': 11111, 'name': 'Chris'}
]
for i in dictA:
    name = None
    for j in dictB:
        if i['id'] == j['id']:
            name = j['name']
    i['name'] = name
The dictA output after this would be:
dictA = [
{'id': 12345, 'name': 'Carl'},
{'id': 67890, 'name': None},
{'id': 11111, 'name': 'Chris'},
{'id': 22222, 'name': None}
]
What you have are lists of dicts. Assuming each id is unique, you can convert dictB into a single dict keyed by id, which makes each lookup a constant-time operation:
dictA = [
{'id': 12345},
{'id': 67890},
{'id': 11111},
{'id': 22222}
]
dictB = [
{'id': 63351, 'name': 'Bob'},
{'id': 12345, 'name': 'Carl'},
{'id': 59933, 'name': 'Amy'},
{'id': 11111, 'name': 'Chris'}
]
actual_dictB = dict()
for d in dictB:
    actual_dictB[d['id']] = d['name']
for i in dictA:
    # each lookup is now O(1) on average, so the whole pass is O(n + m)
    # for n items in dictA and m items in dictB;
    # get() is used instead of pop() so repeated ids in dictA still match
    i['name'] = actual_dictB.get(i['id'], None)
print(dictA)
Follow up for additional question:
actual_dictB = dict()
for d in dictB:
    id_ = d['id']
    d.pop('id')  # note: this removes 'id' from the dictionaries in dictB
    actual_dictB[id_] = d
tmp = dict([(k, None) for k in dictB[0].keys() if k != 'id'])
for i in dictA:
    if i['id'] not in actual_dictB:
        i.update(tmp)
    else:
        i.update(actual_dictB[i['id']])
print(dictA)
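The lookup-table idea can also be written more compactly. This is a minimal sketch using the sample data from the question; name_by_id is an illustrative name, and if dictB repeats an id the last entry wins, which matches the behaviour of the original nested loop.

```python
dictA = [
    {'id': 12345},
    {'id': 67890},
    {'id': 11111},
    {'id': 22222},
]
dictB = [
    {'id': 63351, 'name': 'Bob'},
    {'id': 12345, 'name': 'Carl'},
    {'id': 59933, 'name': 'Amy'},
    {'id': 11111, 'name': 'Chris'},
]

# Build the index once: one pass over dictB.
name_by_id = {row['id']: row['name'] for row in dictB}

# One pass over dictA; dict.get returns None when the id is missing.
for row in dictA:
    row['name'] = name_by_id.get(row['id'])
```

The same index can be reused for every other list that needs names from dictB, so the cost of building it is paid only once.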

Loop in dict python

I have already looked around but have not found any help for this.
This is my dict:
{'id': 1, 'name': 'Studio Pierrot'}
{'id': 29, 'name': 'VAP'}
{'id': 102, 'name': 'FUNimation Entertainment'}
{'id': 148, 'name': 'Hakusensha'}
{'id': 238, 'name': 'AT-X'}
{'id': 751, 'name': 'Marvelous AQL'}
{'id': 1211, 'name': 'Tokyo MX'}
aproducers = an.info['Producers'][0]['name']
for key in aproducers:
    print(key)
The output is like:
S
t
u
d
i
o
...
I want to output just Studio Pierrot,VAP,FUNimation Entertainment...
You’re looping over a string, the single name value of the first producer. You need to loop over the producers instead:
for producer in an.info['Producers']:
    print(producer['name'])
I suggest using the dict methods keys(), values() and items(), and using nested dicts.
For the last question you can just use:
listproducer = []
for producer in an.info['Producers']:
    listproducer.append(producer['name'])
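To get the comma-separated output asked for in the question, the names can be joined into one string. In this sketch the producers list is a hypothetical stand-in for an.info['Producers'], filled with the first three entries from the question.

```python
# Hypothetical stand-in for an.info['Producers'] from the question.
producers = [
    {'id': 1, 'name': 'Studio Pierrot'},
    {'id': 29, 'name': 'VAP'},
    {'id': 102, 'name': 'FUNimation Entertainment'},
]

# Collect one name per producer dictionary, then join them with commas.
names = [producer['name'] for producer in producers]
line = ','.join(names)
print(line)
```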

How to merge lists of dictionaries

With lists of dictionaries such as the following:
user_course_score = [
{'course_id': 1456, 'score': 56},
{'course_id': 316, 'score': 71}
]
courses = [
{'course_id': 1456, 'name': 'History'},
{'course_id': 316, 'name': 'Science'},
{'course_id': 926, 'name': 'Geography'}
]
What is the best way to combine them into the following list of dictionaries:
user_course_information = [
{'course_id': 1456, 'score': 56, 'name': 'History'},
{'course_id': 316, 'score': 71, 'name': 'Science'},
{'course_id': 926, 'name': 'Geography'} # Note: the student did not take this test
]
Or would it be better to store the data differently, such as:
courses = {
'1456': 'History',
'316': 'Science',
'926': 'Geography'
}
Thanks for your help.
Here's a possible solution:
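def merge_lists(l1, l2, key):
    merged = {}
    for item in l1 + l2:
        if item[key] in merged:
            merged[item[key]].update(item)
        else:
            # copy so the input dictionaries are not modified
            merged[item[key]] = dict(item)
    # values() is a view in Python 3, so convert it to a list
    return list(merged.values())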
courses = merge_lists(user_course_score, courses, 'course_id')
Produces:
[{'course_id': 1456, 'name': 'History', 'score': 56},
{'course_id': 316, 'name': 'Science', 'score': 71},
{'course_id': 926, 'name': 'Geography'}]
As you can see, I used a dictionary ('merged') as a halfway point. Of course, you can skip a step by storing your data differently, but it depends also on the other uses you may have for those variables.
All the best.
A dictionary is basically a collection of (key, value) pairs.
In your case, user_course_score can just be a dictionary of (course_id, score) pairs rather than a list of dictionaries (you are complicating it unnecessarily).
Similarly, courses can just be a dictionary of (course_id, name) pairs.
What you suggested at the end is the right way :)
Rahul is correct; a list of dictionaries is not the right way to do this. Think about it like this: dictionaries are mappings between pieces of data. Your final example, courses, is the right way to store the data; you could then do something like this to store the per-user data:
courses = {
1456: 'History',
316: 'Science',
926: 'Geography'
} # Note the lack of quotes
test_scores = {
    1456: { <user-id>: <score on History test> },
    316: { <user-id>: <score on Science test> },
    926: { <user-id>: <score on Geography test> }
}
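With the data stored as dictionaries keyed by course id, the combined list from the question can be produced with direct lookups. The sketch below assumes a single user; the scores dictionary is a hypothetical simplification of test_scores for that one user.

```python
courses = {
    1456: 'History',
    316: 'Science',
    926: 'Geography',
}

# Hypothetical: one user's scores keyed by course_id.
scores = {1456: 56, 316: 71}

user_course_information = []
for course_id, name in courses.items():
    row = {'course_id': course_id, 'name': name}
    if course_id in scores:  # courses the user did not take get no 'score' key
        row['score'] = scores[course_id]
    user_course_information.append(row)
```

Each membership test on scores is a constant-time operation on average, so the merge is linear in the number of courses.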
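You could also update courses in place. The comprehension is used only for its side effects here, since dict.update returns None:
[
    course.update(score) for course
    in courses for score in user_course_score
    if course['course_id'] == score['course_id']
]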
:)
