How to merge lists of dictionaries - python

With lists of dictionaries such as the following:
user_course_score = [
{'course_id': 1456, 'score': 56},
{'course_id': 316, 'score': 71}
]
courses = [
{'course_id': 1456, 'name': 'History'},
{'course_id': 316, 'name': 'Science'},
{'course_id': 926, 'name': 'Geography'}
]
What is the best way to combine them into the following list of dictionaries:
user_course_information = [
{'course_id': 1456, 'score': 56, 'name': 'History'},
{'course_id': 316, 'score': 71, 'name': 'Science'},
{'course_id': 926, 'name': 'Geography'} # Note: the student did not take this test
]
Or would it be better to store the data differently, such as:
courses = {
'1456': 'History',
'316': 'Science',
'926': 'Geography'
}
Thanks for your help.

Here's a possible solution:
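def merge_lists(l1, l2, key):
    merged = {}
    for item in l1 + l2:
        if item[key] in merged:
            merged[item[key]].update(item)
        else:
            # Copy so the input dictionaries are not modified by update().
            merged[item[key]] = dict(item)
    # list() is needed on Python 3, where values() returns a view.
    return list(merged.values())
courses = merge_lists(user_course_score, courses, 'course_id')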
Produces:
[{'course_id': 1456, 'name': 'History', 'score': 56},
{'course_id': 316, 'name': 'Science', 'score': 71},
{'course_id': 926, 'name': 'Geography'}]
As you can see, I used a dictionary ('merged') as a halfway point. Of course, you can skip a step by storing your data differently, but it depends also on the other uses you may have for those variables.
All the best.

A dictionary is basically a collection of (key, value) pairs.
In your case, user_course_score can just be a dictionary of (course_id, score) pairs rather than a list of dictionaries; the list only complicates things unnecessarily.
Similarly, courses can just be a dictionary of (course_id, name) pairs.
What you suggested at the end is the right way :)
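A minimal sketch of that suggestion, assuming each course_id appears at most once per list. The names scores_by_id and names_by_id are made up for illustration and do not come from the question:

```python
user_course_score = [
    {'course_id': 1456, 'score': 56},
    {'course_id': 316, 'score': 71},
]
courses = [
    {'course_id': 1456, 'name': 'History'},
    {'course_id': 316, 'name': 'Science'},
    {'course_id': 926, 'name': 'Geography'},
]

# Re-key both lists by course_id (illustrative names).
scores_by_id = {d['course_id']: d['score'] for d in user_course_score}
names_by_id = {d['course_id']: d['name'] for d in courses}

# Rebuild the combined list; courses without a score simply omit the key.
user_course_information = []
for course_id, name in names_by_id.items():
    info = {'course_id': course_id, 'name': name}
    if course_id in scores_by_id:
        info['score'] = scores_by_id[course_id]
    user_course_information.append(info)

print(user_course_information)
```

With the data keyed by course_id, looking up a single score or name is a direct dictionary access instead of a scan through a list.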

Rahul is correct; a list of dictionaries is not the right way to do this. Think about it like this: dictionaries are mappings between pieces of data. Your final example, courses, is the right way to store the data; you could then do something like this to store the per-user data:
courses = {
1456: 'History',
316: 'Science',
926: 'Geography'
} # Note the lack of quotes
test_scores = {
1456: { <user-id>: <score on History test> },
316: { <user-id>: <score on Science test> },
926: { <user-id>: <score on Geography test> }
}

You could also try:
[
course.update(score) for course
in courses for score in user_course_score
if course['course_id'] == score['course_id']
]
:)
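Note that the comprehension above is used only for its side effect: dict.update() returns None, so the list it builds contains only None values while the dictionaries in courses are changed in place, and every course is compared against every score. A sketch of the same in-place update written as a plain loop with a lookup table, assuming course_id values are unique (score_by_id is an illustrative name):

```python
courses = [
    {'course_id': 1456, 'name': 'History'},
    {'course_id': 316, 'name': 'Science'},
    {'course_id': 926, 'name': 'Geography'},
]
user_course_score = [
    {'course_id': 1456, 'score': 56},
    {'course_id': 316, 'score': 71},
]

# Index the scores once so each course needs a single lookup.
score_by_id = {s['course_id']: s for s in user_course_score}

for course in courses:
    score = score_by_id.get(course['course_id'])
    if score is not None:
        course.update(score)  # modifies the dictionary inside courses

print(courses)
```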

Related

How to create a new dictionary from two other dictionaries taking different keys and values with iteration?

I have two dictionaries as input.
The first one has pin numbers as keys; each value is a dictionary that maps a code to its probability.
`dict_1={
8063: {'code15': 99.61, 'code17': 96.14}, 9621: {'code15': 88.59}, 1583: {'code17': 99.37, 'code14': 87.37},
7631: {'code17': 99.88}, 4345: {'code11': 99.97, 'code12': 99.93}, 1799: {'code11': 99.8, 'code12': 99.18},
3604: {'code17': 98.77}, 1098: {'code12': 99.96}, 3752: {'code11': 99.95}}`
The second one has clusters as keys; each value is a dictionary holding a count and a list of pin numbers and names.
`dict_2={ 0: {
'count': 4,
'id': [{'pin': 8063,'name': 'John'},
{'pin': 9621,'name': 'Maria'},
{'pin': 1583,'name': 'Peter'},
{'pin': 7631,'name': 'Jess'}]},
3: {
'count': 5,
'id': [
{'pin': 4345,'name': 'George'},
{'pin': 1799,'name': 'Kevin'},
{'pin': 3604,'name': 'Sarah'},
{'pin': 1098,'name': 'Stewie'},
{'pin': 3752, 'name': 'Jan'}]}
}`
I want to create a dictionary that has the clusters from dict_2 as keys. Each value should be a dictionary that maps the codes from dict_1 to a flat list of probabilities, pins and names. This should be the output:
`merged_dict = {
0: {'code15': [99.61, 8063, 'John', 88.59, 9621, 'Maria'],
'code17': [99.37, 1583, 'Peter', 96.14, 8063, 'John', 99.88, 7631, 'Jess'],
'code14': [87.37, 1583, 'Peter']},
3: {'code11': [99.97, 4345, 'George', 99.8, 1799, 'Kevin', 99.95, 3752, 'Jan'],
'code12': [99.93, 4345, 'George', 99.18, 1799, 'Kevin', 99.96, 1098, 'Stewie'],
'code17': [98.77, 3604, 'Sarah']}
}`
I've tried to merge the dictionaries using the pin as the common element, but without success.
The following function builds that result:
def merge_dict(dict_1, dict_2):
    dict_3 = {}
    for cluster, data in dict_2.items():
        dict_3[cluster] = {}
        for person in data['id']:
            for code, prob in dict_1[person['pin']].items():
                # Append probability, pin and name to this code's list
                # (creating the list the first time the code is seen).
                dict_3[cluster].setdefault(code, []).extend(
                    [prob, person['pin'], person['name']])
    return dict_3
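A self-contained sketch of this kind of merge on trimmed-down sample data, assuming that every pin in dict_2 also appears in dict_1 and that each value is a flat list of probability, pin, name entries as in the desired output. It uses collections.defaultdict, and merge_by_cluster is an illustrative name:

```python
from collections import defaultdict

dict_1 = {
    8063: {'code15': 99.61, 'code17': 96.14},
    1583: {'code17': 99.37, 'code14': 87.37},
    4345: {'code11': 99.97},
}
dict_2 = {
    0: {'count': 2, 'id': [{'pin': 8063, 'name': 'John'},
                           {'pin': 1583, 'name': 'Peter'}]},
    3: {'count': 1, 'id': [{'pin': 4345, 'name': 'George'}]},
}

def merge_by_cluster(probabilities, clusters):
    merged = {}
    for cluster, data in clusters.items():
        by_code = defaultdict(list)
        for person in data['id']:
            # A KeyError here means a pin is missing from probabilities.
            for code, prob in probabilities[person['pin']].items():
                by_code[code].extend([prob, person['pin'], person['name']])
        merged[cluster] = dict(by_code)  # store a plain dict
    return merged

merged_dict = merge_by_cluster(dict_1, dict_2)
print(merged_dict)
```

Entries within each list follow the order in which the pins are listed in dict_2, so the ordering may differ from a hand-written expected result.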

Python match nested structure

I'm trying to define some logic that verifies that everything in one nested dictionary also exists in another nested dictionary.
For example:
official_data = {
'Name': 'John Smith',
'ID': 123123232,
'Family': [
{'Name': 'Sarah Smith','ID': 12312323},
{'Name': 'Joe Smith','ID': 12312324},
{'Name': 'Tim Smith','ID': 12312325},
{'Name': 'Sally Smith','ID': 12312326}
],
'Info': {
'InfoList': [
{'text': ['Personal Info Message']},
{'text': ['Secondary Message']}
]
}
}
sample_data = {
'Family': [
{"Name": 'Joe Smith'}
],
'Info': {
'InfoList': [
{'text': ['Secondary Message']}
]
}
}
matches(official_data, sample_data) # True, because everything in sample data exists in official_data, despite official_data having MORE values.
different_sample = {
'Info': {
'InfoList': [{}]
}
}
matches(official_data, different_sample) # True, because the structure of Dict -> Dict -> List -> Dict exists
bad_data = {'ID': 54242343}
matches(official_data, bad_data) # False, because the ID of bad_data is not the ID of official_data
other_bad_data = {
'Info': {
'InfoList': {}
}
}
matches(official_data, other_bad_data) # False, because InfoList is a list in official data
I have a feeling such logic SHOULD be easy to implement, or has already been implemented and is in wide use, but I am struggling to find what I want, and implementing it on my own becomes complicated, with recursive solutions and casting lists into sets in order to make sure order is ignored.
I'm wondering if I'm missing something obvious, or if this logic is actually really niche and would have to be designed from scratch.
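One possible recursive sketch of such a matches() function, written from the four examples above rather than taken from any library. It assumes that dictionaries are compared key by key, that list order is ignored, that each sample list element only needs some matching element in the official list (several sample elements may match the same one), and that all other values are compared with ==:

```python
def matches(official, sample):
    """Return True if everything in sample can be found in official."""
    if isinstance(sample, dict):
        if not isinstance(official, dict):
            return False
        return all(
            key in official and matches(official[key], value)
            for key, value in sample.items()
        )
    if isinstance(sample, list):
        if not isinstance(official, list):
            return False
        # Order is ignored: each sample element needs some matching element.
        return all(
            any(matches(candidate, element) for candidate in official)
            for element in sample
        )
    # Scalars (strings, numbers, ...) must be equal.
    return official == sample


official_data = {
    'Name': 'John Smith',
    'ID': 123123232,
    'Family': [
        {'Name': 'Sarah Smith', 'ID': 12312323},
        {'Name': 'Joe Smith', 'ID': 12312324},
    ],
    'Info': {'InfoList': [
        {'text': ['Personal Info Message']},
        {'text': ['Secondary Message']},
    ]},
}

print(matches(official_data, {'Family': [{'Name': 'Joe Smith'}]}))
print(matches(official_data, {'Info': {'InfoList': [{}]}}))
print(matches(official_data, {'ID': 54242343}))
print(matches(official_data, {'Info': {'InfoList': {}}}))
```

Because nested lists are searched with any(), the cost grows with the product of the list lengths; for large structures a different strategy may be needed.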

Recursively sort a list of nested dictionaries by value

I have a list of dictionaries, themselves with nested lists of dictionaries. All of the nest levels have a similar structure, thankfully. I desire to sort these nested lists of dictionaries. I grasp the technique to sort a list of dictionaries by value. I'm struggling with the recursion that will sort the inner lists.
def reorder(l, sort_by):
    # I have been trying to add a recursion here
    # so that the function calls itself for each
    # nested group of "children". So far, fail
    return sorted(l, key=lambda k: k[sort_by])
l = [
{ 'name': 'steve',
'children': [
{ 'name': 'sam',
'children': [
{'name': 'sally'},
{'name': 'sabrina'}
]
},
{'name': 'sydney'},
{'name': 'sal'}
]
},
{ 'name': 'fred',
'children': [
{'name': 'fritz'},
{'name': 'frank'}
]
}
]
print(reorder(l, 'name'))
Sort the current level first, then recurse into each item's "children" list:
def reorder(l, sort_by):
    l = sorted(l, key=lambda x: x[sort_by])
    for item in l:
        if "children" in item:
            item["children"] = reorder(item["children"], sort_by)
    return l
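The reorder() function above returns a new outer list but reassigns "children" on the original dictionaries. If the input must stay untouched, a copying variant could look like the following sketch, which assumes nested lists always live under the 'children' key (reorder_copy is an illustrative name):

```python
def reorder_copy(items, sort_by):
    result = []
    for item in sorted(items, key=lambda d: d[sort_by]):
        new_item = dict(item)  # shallow copy, so the input dict is not changed
        if 'children' in new_item:
            new_item['children'] = reorder_copy(new_item['children'], sort_by)
        result.append(new_item)
    return result


tree = [
    {'name': 'steve', 'children': [
        {'name': 'sam', 'children': [{'name': 'sally'}, {'name': 'sabrina'}]},
        {'name': 'sydney'},
        {'name': 'sal'},
    ]},
    {'name': 'fred', 'children': [{'name': 'fritz'}, {'name': 'frank'}]},
]

sorted_tree = reorder_copy(tree, 'name')
print([person['name'] for person in sorted_tree])
```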
Since you state "I grasp the technique to sort a list of dictionaries by value" I will post some code for recursively gathering data from another SO post I made, and leave it to you to implement your sorting technique. The code:
myjson = {
'transportation': 'car',
'address': {
'driveway': 'yes',
'home_address': {
'state': 'TX',
'city': 'Houston'}
},
'work_address': {
'state': 'TX',
'city': 'Sugarland',
'location': 'office-tower',
'salary': 30000}
}
def get_keys(some_dictionary, parent=None):
    for key, value in some_dictionary.items():
        if '{}.{}'.format(parent, key) not in my_list:
            my_list.append('{}.{}'.format(parent, key))
        if isinstance(value, dict):
            get_keys(value, parent='{}.{}'.format(parent, key))
my_list = []
get_keys(myjson, parent='myjson')
print(my_list)
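This is intended to retrieve all keys recursively from the JSON data. It outputs the following (the order depends on the dictionary's iteration order, so it can differ between Python versions):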
['myjson.address',
'myjson.address.home_address',
'myjson.address.home_address.state',
'myjson.address.home_address.city',
'myjson.address.driveway',
'myjson.transportation',
'myjson.work_address',
'myjson.work_address.state',
'myjson.work_address.salary',
'myjson.work_address.location',
'myjson.work_address.city']
The main thing to note is that the check if isinstance(value, dict): causes get_keys() to be called again, which is what makes the function recursive (but only for nested dictionaries in this case).
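The same traversal can also be sketched as a generator, which avoids the module-level my_list. This is only an illustration; iter_key_paths is a made-up name, and like get_keys() it descends into nested dictionaries only, not into lists:

```python
def iter_key_paths(mapping, parent):
    """Yield a dotted path for every key, descending into nested dicts."""
    for key, value in mapping.items():
        path = '{}.{}'.format(parent, key)
        yield path
        if isinstance(value, dict):
            # Recurse, using the current path as the new parent.
            yield from iter_key_paths(value, path)


data = {
    'transportation': 'car',
    'address': {
        'driveway': 'yes',
        'home_address': {'state': 'TX', 'city': 'Houston'},
    },
}

paths = list(iter_key_paths(data, 'myjson'))
print(paths)
```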

faster and more 'pythonic' list of dictionaries

For simplicity, I've provided 2 lists in a list, but I'm actually dealing with about a hundred lists in a list, each containing a sizable number of dictionaries. I only want to read the value of the 'status' key from the 1st dictionary in each list, without checking any other dictionaries in that list (since I know they all contain the same value for that key). Then I will perform some sort of clustering within each big dictionary. I need to efficiently concatenate all 'title' values. Is there a way to make my code more elegant and much faster?
I have:
nested = [
[
{'id': 287, 'title': 'hungry badger', 'status': 'High'},
{'id': 437, 'title': 'roadtrip to Kansas','status': 'High'}
],
[
{'id': 456, 'title': 'happy title here','status': 'Medium'},
{'id': 342,'title': 'soft big bear','status': 'Medium'}
]
]
I'd like:
result = [
{
'High': [
{'id': 287, 'title': 'hungry badger'},
{'id': 437, 'title': 'roadtrip to Kansas'}
]
},
{
'Medium': [
{'id': 456, 'title': 'happy title here'},
{'id': 342, 'title': 'soft big bear'}
]
}
]
What I tried:
for oneList in nested:
    result = {}
    for i in oneList:
        a = list(i.keys())
        m = [i[key] for key in a if key not in ['id', 'title']]
        result[m[0]] = oneList
        for key in a:
            if key not in ['id', 'title']:
                del i[key]
from itertools import groupby
result = groupby(sum(nested,[]), lambda x: x['status'])
How it works:
sum(nested, []) concatenates all your inner lists into one big list of dictionaries
groupby(..., lambda x: x['status']) groups consecutive objects by their status property
Note that itertools.groupby returns an iterator (not a list), so if you want to materialize the groups you need to do something like the following.
from itertools import groupby
result = groupby(sum(nested,[]), lambda x: x['status'])
result = {key:list(val) for key,val in result}
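One caveat: itertools.groupby only groups consecutive items with equal keys, so the snippet above relies on dictionaries with the same status already being adjacent. A sketch that sorts by the key first and also drops 'status' from each grouped dictionary, assuming every dictionary has a 'status' key (by_status is an illustrative name):

```python
from itertools import chain, groupby
from operator import itemgetter

nested = [
    [
        {'id': 287, 'title': 'hungry badger', 'status': 'High'},
        {'id': 437, 'title': 'roadtrip to Kansas', 'status': 'High'},
    ],
    [
        {'id': 456, 'title': 'happy title here', 'status': 'Medium'},
        {'id': 342, 'title': 'soft big bear', 'status': 'Medium'},
    ],
]

# Flatten, then sort so that equal statuses are adjacent (sorted() is stable).
flat = sorted(chain.from_iterable(nested), key=itemgetter('status'))

by_status = {
    status: [{k: v for k, v in d.items() if k != 'status'} for d in group]
    for status, group in groupby(flat, key=itemgetter('status'))
}
print(by_status)
```

chain.from_iterable is used here instead of sum(nested, []) because sum builds a new list for every inner list, which gets slow when there are many of them.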
You could make a defaultdict for each nested list:
import collections
nested = [
[{'id': 287, 'title': 'hungry badger', 'status': 'High'},
{'id': 437, 'title': 'roadtrip to Kansas','status': 'High'}],
[{'id': 456, 'title': 'happy title here','status': 'Medium'},
{'id': 342,'title': 'soft big bear','status': 'Medium'}] ]
result = []
for l in nested:
    r = collections.defaultdict(list)
    for d in l:
        name = d.pop('status')  # removes 'status' from the original dictionary
        r[name].append(d)
    result.append(dict(r))  # plain dict, so it prints as shown below
This gives the following result:
>>> import pprint
>>> pprint.pprint(result)
[{'High': [{'id': 287, 'title': 'hungry badger'},
{'id': 437, 'title': 'roadtrip to Kansas'}]},
{'Medium': [{'id': 456, 'title': 'happy title here'},
{'id': 342, 'title': 'soft big bear'}]}]

Loop in dict python

I have already looked around but have not found any help for this.
This is my dict:
{'id': 1, 'name': 'Studio Pierrot'}
{'id': 29, 'name': 'VAP'}
{'id': 102, 'name': 'FUNimation Entertainment'}
{'id': 148, 'name': 'Hakusensha'}
{'id': 238, 'name': 'AT-X'}
{'id': 751, 'name': 'Marvelous AQL'}
{'id': 1211, 'name': 'Tokyo MX'}
aproducers = an.info['Producers'][0]['name']
for key in aproducers:
    print(key)
The output is like:
S
t
u
d
i
o
...
I want to output just Studio Pierrot,VAP,FUNimation Entertainment...
You’re looping over a string, the single name value of the first producer. You need to loop over the producers instead:
for producer in an.info['Producers']:
    print(producer['name'])
I suggest using the dictionary methods keys(), values() and items(), and working with nested dicts.
For the last question you can simply collect the names in a list:
listproducer = []
for producer in an.info['Producers']:
    listproducer.append(producer['name'])
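To get the comma-separated output asked for in the question, the names can be collected with a list comprehension and joined. In this sketch the list producers is a hypothetical stand-in for an.info['Producers'], since the object an is not shown in the question:

```python
# Hypothetical stand-in for an.info['Producers'].
producers = [
    {'id': 1, 'name': 'Studio Pierrot'},
    {'id': 29, 'name': 'VAP'},
    {'id': 102, 'name': 'FUNimation Entertainment'},
]

# One name per producer dictionary.
names = [producer['name'] for producer in producers]

# Join the names into a single comma-separated string.
joined = ','.join(names)
print(joined)
```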
