Merge list of python dictionaries using multiple keys - python

I want to merge two lists of dictionaries, using multiple keys.
I have a single list of dicts with one set of results:
l1 = [{'id': 1, 'year': '2017', 'resultA': 2},
{'id': 2, 'year': '2017', 'resultA': 3},
{'id': 1, 'year': '2018', 'resultA': 3},
{'id': 2, 'year': '2018', 'resultA': 5}]
And another list of dicts for another set of results:
l2 = [{'id': 1, 'year': '2017', 'resultB': 5},
{'id': 2, 'year': '2017', 'resultB': 8},
{'id': 1, 'year': '2018', 'resultB': 7},
{'id': 2, 'year': '2018', 'resultB': 9}]
And I want to combine them using the 'id' and 'year' keys to get the following:
all = [{'id': 1, 'year': '2017', 'resultA': 2, 'resultB': 5},
{'id': 2, 'year': '2017', 'resultA': 3, 'resultB': 8},
{'id': 1, 'year': '2018', 'resultA': 3, 'resultB': 7},
{'id': 2, 'year': '2018', 'resultA': 5, 'resultB': 9}]
I know that for combining two lists of dicts on a single key, I can use this:
l1 = {d['id']:d for d in l1}
all = [dict(d, **l1.get(d['id'], {})) for d in l2]
But it ignores the year, providing the following incorrect result:
all = [{'id': 1, 'year': '2018', 'resultA': 3, 'resultB': 5},
{'id': 2, 'year': '2018', 'resultA': 5, 'resultB': 8},
{'id': 1, 'year': '2018', 'resultA': 3, 'resultB': 7},
{'id': 2, 'year': '2018', 'resultA': 5, 'resultB': 9}]
Treating this as I would in R, by adding in the second variable I want to merge on, I get a KeyError:
l1 = {d['id','year']:d for d in l1}
all = [dict(d, **l1.get(d['id','year'], {})) for d in l2]
How do I merge using multiple keys?

Instead of d['id','year'], use the tuple (d['id'], d['year']) as your key.
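A minimal sketch of that suggestion, applied to the two lists from the question. The names index and merged are only illustrative and do not appear in the original posts.

```python
l1 = [{'id': 1, 'year': '2017', 'resultA': 2},
      {'id': 2, 'year': '2017', 'resultA': 3},
      {'id': 1, 'year': '2018', 'resultA': 3},
      {'id': 2, 'year': '2018', 'resultA': 5}]
l2 = [{'id': 1, 'year': '2017', 'resultB': 5},
      {'id': 2, 'year': '2017', 'resultB': 8},
      {'id': 1, 'year': '2018', 'resultB': 7},
      {'id': 2, 'year': '2018', 'resultB': 9}]

# Index l1 by the (id, year) tuple; tuples are hashable, so they work as dict keys.
index = {(d['id'], d['year']): d for d in l1}

# For each row of l2, look up the l1 row with the same (id, year) and merge the two.
merged = [dict(d, **index.get((d['id'], d['year']), {})) for d in l2]
print(merged)
```

Rows of l2 that have no partner in l1 are kept unchanged, because index.get falls back to an empty dict.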

You can combine both lists and group the resulting list by id and year, then merge the dicts that share the same keys.
Grouping can be done with itertools.groupby, and merging with collections.ChainMap:
>>> from itertools import groupby
>>> from collections import ChainMap
>>> [dict(ChainMap(*list(g))) for _,g in groupby(sorted(l1+l2, key=lambda x: (x['id'],x['year'])),key=lambda x: (x['id'],x['year']))]
[{'resultA': 2, 'id': 1, 'resultB': 5, 'year': '2017'}, {'resultA': 3, 'id': 1, 'resultB': 7, 'year': '2018'}, {'resultA': 3, 'id': 2, 'resultB': 8, 'year': '2017'}, {'resultA': 5, 'id': 2, 'resultB': 9, 'year': '2018'}]
Alternatively, to avoid the lambda, you can use operator.itemgetter:
>>> from operator import itemgetter
>>> [dict(ChainMap(*list(g))) for _,g in groupby(sorted(l1+l2, key=itemgetter('id', 'year')),key=itemgetter('id', 'year'))]

Expanding on @AlexHall's suggestion, you can use collections.defaultdict to help you:
from collections import defaultdict
d = defaultdict(dict)
for i in l1 + l2:
    results = {k: v for k, v in i.items() if k not in ('id', 'year')}
    d[(i['id'], i['year'])].update(results)
Result
defaultdict(dict,
{(1, '2017'): {'resultA': 2, 'resultB': 5},
(1, '2018'): {'resultA': 3, 'resultB': 7},
(2, '2017'): {'resultA': 3, 'resultB': 8},
(2, '2018'): {'resultA': 5, 'resultB': 9}})
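The defaultdict above is keyed by (id, year) tuples, whereas the question asked for a flat list of dicts. A possible way to get back to that shape is sketched below; the name rows is an assumption made for illustration, and the result order relies on dicts preserving insertion order (Python 3.7+).

```python
from collections import defaultdict

l1 = [{'id': 1, 'year': '2017', 'resultA': 2},
      {'id': 2, 'year': '2017', 'resultA': 3},
      {'id': 1, 'year': '2018', 'resultA': 3},
      {'id': 2, 'year': '2018', 'resultA': 5}]
l2 = [{'id': 1, 'year': '2017', 'resultB': 5},
      {'id': 2, 'year': '2017', 'resultB': 8},
      {'id': 1, 'year': '2018', 'resultB': 7},
      {'id': 2, 'year': '2018', 'resultB': 9}]

# Same grouping step as in the answer: collect the non-key fields per (id, year).
d = defaultdict(dict)
for i in l1 + l2:
    results = {k: v for k, v in i.items() if k not in ('id', 'year')}
    d[(i['id'], i['year'])].update(results)

# Unpack each tuple key back into 'id' and 'year' fields next to the results.
rows = [{'id': id_, 'year': year, **results} for (id_, year), results in d.items()]
print(rows)
```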

Related

Python: Descending order and just 3 objects has a high value [duplicate]

This question already has answers here:
How do I sort a list of dictionaries by a value of the dictionary?
(20 answers)
Closed 6 months ago.
I have an unsorted list of objects like this. I want it in descending order by value, keeping only the objects that have one of the 3 highest values:
[{'id': 1, 'value': 3},
{'id': 2, 'value': 6},
{'id': 3, 'value': 8},
{'id': 4, 'value': 8},
{'id': 5, 'value': 10},
{'id': 6, 'value': 9},
{'id': 7, 'value': 8},
{'id': 8, 'value': 4},
{'id': 9, 'value': 5}]
The result I want is in descending order and contains only the objects with the 3 highest values, like this:
[{'id': 5, 'value': 10},
{'id': 6, 'value': 9},
{'id': 7, 'value': 8},
{'id': 3, 'value': 8},
{'id': 4, 'value': 8},]
Please help me, thanks
t = [{'id': 1, 'value': 3},
{'id': 2, 'value': 6},
{'id': 3, 'value': 8},
{'id': 4, 'value': 8},
{'id': 5, 'value': 10},
{'id': 6, 'value': 9},
{'id': 7, 'value': 8}]
newlist = sorted(t, key=lambda d: d['value'])
newlist.reverse()
print(newlist[:3])
# [{'id': 5, 'value': 10}, {'id': 6, 'value': 9}, {'id': 7, 'value': 8}]
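The slice above keeps exactly three objects, while the desired output in the question keeps every object whose value is among the three highest values (so all the 8s survive). A sketch of that reading follows; it is an assumption about what the asker meant, and top_values and top_rows are illustrative names.

```python
t = [{'id': 1, 'value': 3},
     {'id': 2, 'value': 6},
     {'id': 3, 'value': 8},
     {'id': 4, 'value': 8},
     {'id': 5, 'value': 10},
     {'id': 6, 'value': 9},
     {'id': 7, 'value': 8},
     {'id': 8, 'value': 4},
     {'id': 9, 'value': 5}]

# The three largest distinct values (10, 9 and 8 for this data).
top_values = sorted({d['value'] for d in t}, reverse=True)[:3]

# Keep every row whose value is one of them, in descending order of value.
# sorted() is stable, so rows with equal values keep their input order.
top_rows = sorted((d for d in t if d['value'] in top_values),
                  key=lambda d: d['value'], reverse=True)
print(top_rows)
```

The tied rows come out in input order (ids 3, 4, 7), which differs from the order shown in the question; the question does not say how ties should be ordered.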

How to convert key to value in dictionary type?

I have a question about converting dictionary keys to values.
First, I have a word count of this form, taken from a DataFrame.
[Example]
dict = {'forest': 10, 'station': 3, 'office': 7, 'park': 2}
I want to get this result.
[Result]
result = {'name': 'forest', 'value': 10,
'name': 'station', 'value': 3,
'name': 'office', 'value': 7,
'name': 'park', 'value': 2}
Please check this issue.
As Rakesh said:
dict cannot have duplicate keys
The closest way to achieve what you want is to build something like that
my_dict = {'forest': 10, 'station': 3, 'office': 7, 'park': 2}
result = list(map(lambda x: {'name': x[0], 'value': x[1]}, my_dict.items()))
You will get
result = [
{'name': 'forest', 'value': 10},
{'name': 'station', 'value': 3},
{'name': 'office', 'value': 7},
{'name': 'park', 'value': 2},
]
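The same list can also be written as a list comprehension, which some readers find easier to follow than map with a lambda. This is only an equivalent sketch of the answer above, not a different technique.

```python
my_dict = {'forest': 10, 'station': 3, 'office': 7, 'park': 2}

# One {'name': ..., 'value': ...} dict per (key, value) pair, in insertion order.
result = [{'name': name, 'value': value} for name, value in my_dict.items()]
print(result)
```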
As Rakesh said, you can't have duplicate keys in a dictionary.
You can simply try this.
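word_counts = {'forest': 10, 'station': 3, 'office': 7, 'park': 2}  # renamed from dict, which shadows the built-in
result = {}
count = 0
for key in word_counts:
    # use a running integer as the key of each {'name', 'value'} entry
    result[count] = {'name': key, 'value': word_counts[key]}
    count = count + 1
print(result)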

Combining multiple lists of dictionaries

I have several lists of dictionaries, where each dictionary contains a unique id value that is common among all lists. I'd like to combine them into a single list of dicts, where each dict is joined on that id value.
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
desired_output = [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
I tried doing something like the answer found at https://stackoverflow.com/a/42018660/7564393, but I'm getting very confused since I have more than 2 lists. Should I try using a defaultdict approach? More importantly, I am NOT always going to know the other values, only that the id value is present in all dicts.
You can use itertools.groupby():
from itertools import groupby
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
desired_output = []
for _, values in groupby(sorted([*list1, *list2, *list3], key=lambda x: x['id']), key=lambda x: x['id']):
    temp = {}
    for d in values:
        temp.update(d)
    desired_output.append(temp)
Result:
[{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
# combine all lists
d = {}  # id -> merged dict
for l in [list1, list2, list3]:
    for list_d in l:
        if 'id' not in list_d:
            continue
        key = list_d['id']  # named key rather than id, to avoid shadowing the built-in
        if key not in d:
            d[key] = dict(list_d)  # copy, so the input dicts are not modified
        else:
            d[key].update(list_d)
# dicts with same id are grouped together since id is used as key
res = list(d.values())
print(res)
You can first build a dict of dicts, then turn it into a list:
from itertools import chain
from collections import defaultdict
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
dict_out = defaultdict(dict)
for d in chain(list1, list2, list3):
    dict_out[d['id']].update(d)
out = list(dict_out.values())
print(out)
# [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
itertools.chain allows you to iterate on all the dicts contained in the 3 lists. We build a dict dict_out having the id as key, and the corresponding dict being built as value. This way, we can easily update the already built part with the small dict of our current iteration.
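The same chain plus defaultdict pattern extends to joining on more than one field if the key is built with operator.itemgetter. The helper below is a hypothetical sketch (merge_on is not a standard-library function or a name from the answers), and it assumes every dict contains all of the key fields.

```python
from collections import defaultdict
from itertools import chain
from operator import itemgetter


def merge_on(key_fields, *lists):
    """Merge dicts from all lists that share the same values for key_fields."""
    # itemgetter('id') returns a single value; itemgetter('id', 'year') returns a tuple.
    get_key = itemgetter(*key_fields)
    merged = defaultdict(dict)
    for d in chain(*lists):
        merged[get_key(d)].update(d)
    return list(merged.values())


list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
out = merge_on(['id'], list1, list2, list3)
print(out)

a = [{'id': 1, 'year': '2017', 'resultA': 2}, {'id': 1, 'year': '2018', 'resultA': 3}]
b = [{'id': 1, 'year': '2017', 'resultB': 5}, {'id': 1, 'year': '2018', 'resultB': 7}]
out2 = merge_on(['id', 'year'], a, b)
print(out2)
```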
Here is a plain-function approach that does not use itertools (which is excellent for rapid development work).
It works for any number of lists, because the function takes a variable number of arguments, and it lets the caller choose the type of the return value (list or dict).
By default it returns a list, as you want; if you pass as_list=False it returns a dictionary instead.
I used a dictionary internally because lookups by id are fast.
Have a look at the get_packed_list() function below.
get_packed_list()
def get_packed_list(*dicts_lists, as_list=True):
    output = {}
    for dicts_list in dicts_lists:
        for dictionary in dicts_list:
            _id = dictionary.pop("id")  # id() is in-built function so preferred _id
            if _id not in output:
                # Create new id
                output[_id] = {"id": _id}
            for key in dictionary:
                output[_id][key] = dictionary[key]
            dictionary["id"] = _id  # push back the 'id' after work (call by reference mechanism)
    if as_list:
        return [output[key] for key in output]
    return output  # dictionary
Test
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
output = get_packed_list(list1, list2, list3)
print(output)
# [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]
output = get_packed_list(list1, list2, list3, as_list=False)
print(output)
# {1: {'id': 1, 'value': 20, 'sum': 10, 'total': 30}, 2: {'id': 2, 'value': 21, 'sum': 11, 'total': 32}}
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
print(list1+list2+list3)
list1 = [{'id': 1, 'value': 20}, {'id': 2, 'value': 21}]
list2 = [{'id': 1, 'sum': 10}, {'id': 2, 'sum': 11}]
list3 = [{'id': 1, 'total': 30}, {'id': 2, 'total': 32}]
result = []
for i in range(0,len(list1)):
    final_dict = dict(list(list1[i].items()) + list(list2[i].items()) + list(list3[i].items()))
    result.append(final_dict)
print(result)
output : [{'id': 1, 'value': 20, 'sum': 10, 'total': 30}, {'id': 2, 'value': 21, 'sum': 11, 'total': 32}]

Separate list elements by their property in Python

I have list p1:
p1 = [
{'id': 1, 'area': 5},
{'id': 2, 'area': 6},
{'id': 3, 'area': 10},
{'id': 4, 'area': 6},
{'id': 5, 'area': 6},
{'id': 6, 'area': 6},
{'id': 7, 'area': 4},
{'id': 8, 'area': 4}
]
And I need to separate this list by area value, like this (p2):
p2 = {
4: [
{'id': 7, 'area': 4},
{'id': 8, 'area': 4}
],
5: [
{'id': 1, 'area': 5}
],
6: [
{'id': 2, 'area': 6},
{'id': 4, 'area': 6},
{'id': 5, 'area': 6},
{'id': 6, 'area': 6}
],
10: [
{'id': 3, 'area': 10}
]
}
My solution is:
areas = {x['area'] for x in p1}
p2 = {}
for area in areas:
    p2[area] = [x for x in p1 if x['area'] == area]
It seems to work, but is there any better and more "pythonic" solution?
Using groupby you get
>>> import itertools
>>> f = lambda t: t['area']
>>> {i: list(b) for i, b in itertools.groupby(sorted(p1, key=f), key=f)}
Gives
{4: [{'area': 4, 'id': 7},
{'area': 4, 'id': 8}],
5: [{'area': 5, 'id': 1}],
6: [{'area': 6, 'id': 2},
{'area': 6, 'id': 4},
{'area': 6, 'id': 5},
{'area': 6, 'id': 6}],
10: [{'area': 10, 'id': 3}]}
edit: If you don't like using lambdas you can also do, as suggested by bro-grammer
>>> import operator
>>> f = operator.itemgetter('area')
You can simply use defaultdict:
from collections import defaultdict
result = defaultdict(list)
for i in p1:
    result[i['area']].append(i)
Yes, use one of the grouping idioms. Using a vanilla dict:
In [15]: p1 = [
...: {'id': 1, 'area': 5},
...: {'id': 2, 'area': 6},
...: {'id': 3, 'area': 10},
...: {'id': 4, 'area': 6},
...: {'id': 5, 'area': 6},
...: {'id': 6, 'area': 6},
...: {'id': 7, 'area': 4},
...: {'id': 8, 'area': 4}
...: ]
In [16]: p2 = {}
In [17]: for d in p1:
...:     p2.setdefault(d['area'], []).append(d)
...:
In [18]: p2
Out[18]:
{4: [{'area': 4, 'id': 7}, {'area': 4, 'id': 8}],
5: [{'area': 5, 'id': 1}],
6: [{'area': 6, 'id': 2},
{'area': 6, 'id': 4},
{'area': 6, 'id': 5},
{'area': 6, 'id': 6}],
10: [{'area': 10, 'id': 3}]}
Or more neatly, using a defaultdict:
In [23]: from collections import defaultdict
In [24]: p2 = defaultdict(list)
In [25]: for d in p1:
...:     p2[d['area']].append(d)
...:
In [26]: p2
Out[26]:
defaultdict(list,
{4: [{'area': 4, 'id': 7}, {'area': 4, 'id': 8}],
5: [{'area': 5, 'id': 1}],
6: [{'area': 6, 'id': 2},
{'area': 6, 'id': 4},
{'area': 6, 'id': 5},
{'area': 6, 'id': 6}],
10: [{'area': 10, 'id': 3}]})

Splitting a list of dictionaries into several lists of dictionaries

I've been whacking away at this for a while to no avail... Any help would be greatly
appreciated.
I have:
[{'event': 0, 'voltage': 1, 'time': 0},
{'event': 0, 'voltage': 2, 'time': 1},
{'event': 1, 'voltage': 1, 'time': 2},
{'event': 1, 'voltage': 2, 'time': 3},
{'event': 2, 'voltage': 1, 'time': 4},
{'event': 2, 'voltage': 2, 'time': 5},
...]
and I want to split that list of dictionaries up per event like this (there can be arbitrarily many events):
list0 = [{'event': 0, 'voltage': 1, 'time': 0},
{'event': 0, 'voltage': 2, 'time': 1}]
list1 = [{'event': 1, 'voltage': 1, 'time': 2},
{'event': 1, 'voltage': 2, 'time': 3}]
list2 = [{'event': 2, 'voltage': 1, 'time': 4},
{'event': 2, 'voltage': 2, 'time': 5}]
listN = ...
Use defaultdict:
import collections
result = collections.defaultdict(list)
for d in dict_list:
    result[d['event']].append(d)
result_list = result.values() # Python 2.x
result_list = list(result.values()) # Python 3
This way, you don't have to make any assumptions about how many different events there are or if there are any events missing.
This gives you a list of lists. If you want a dict indexed by event, I would probably use dict(result) if you plan on doing any random access.
As far as constructing a bunch of individual lists, I think that that's a bad idea. It will necessitate creating them as globals or using eval (or getting hacky in some other way) unless you know exactly how many there are going to be which you claim not to. It's best to just keep them in a container.
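As a rough illustration of keeping the groups in a container, the sketch below looks up one event's rows by key instead of through a variable named list1. The sample data is the six rows from the question, and event_1 is an illustrative name.

```python
import collections

dict_list = [{'event': 0, 'voltage': 1, 'time': 0},
             {'event': 0, 'voltage': 2, 'time': 1},
             {'event': 1, 'voltage': 1, 'time': 2},
             {'event': 1, 'voltage': 2, 'time': 3},
             {'event': 2, 'voltage': 1, 'time': 4},
             {'event': 2, 'voltage': 2, 'time': 5}]

result = collections.defaultdict(list)
for d in dict_list:
    result[d['event']].append(d)

# Random access: the rows for event 1, with no need for a separate list1 variable.
event_1 = result[1]
print(event_1)

# Iterating over every event works however many events there are.
for event, rows in sorted(result.items()):
    print(event, len(rows))
```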
This one is O(n log n) because of the sort, but I wouldn't worry too much unless there are a lot of items in the list.
If the list is already sorted by event, you can skip the sort, of course.
>>> from operator import itemgetter
>>> from itertools import groupby
>>> d=[{'event': 0, 'voltage': 1, 'time': 0},
... {'event': 0, 'voltage': 2, 'time': 1},
... {'event': 1, 'voltage': 1, 'time': 2},
... {'event': 1, 'voltage': 2, 'time': 3},
... {'event': 2, 'voltage': 1, 'time': 4},
... {'event': 2, 'voltage': 2, 'time': 5}]
>>> groupby(sorted(d, key=itemgetter('event')), key=itemgetter('event'))
<itertools.groupby object at 0xb78138c4>
>>> for x in _:
...     print(x[0], list(x[1]))
...
0 [{'event': 0, 'voltage': 1, 'time': 0}, {'event': 0, 'voltage': 2, 'time': 1}]
1 [{'event': 1, 'voltage': 1, 'time': 2}, {'event': 1, 'voltage': 2, 'time': 3}]
2 [{'event': 2, 'voltage': 1, 'time': 4}, {'event': 2, 'voltage': 2, 'time': 5}]
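Why the sort matters: itertools.groupby only groups consecutive items with equal keys, so it can be skipped only when the input is already ordered by event. The small sketch below uses made-up sample rows (presorted and shuffled are illustrative names) to show what happens in each case.

```python
from itertools import groupby
from operator import itemgetter

by_event = itemgetter('event')

presorted = [{'event': 0, 'time': 0},
             {'event': 0, 'time': 1},
             {'event': 1, 'time': 2}]
# Same rows, but the two event-0 rows are no longer adjacent.
shuffled = [presorted[0], presorted[2], presorted[1]]

# Already sorted by event: one group per event, no sort needed.
keys_presorted = [k for k, _ in groupby(presorted, key=by_event)]

# Not sorted: event 0 is split into two separate groups.
keys_shuffled = [k for k, _ in groupby(shuffled, key=by_event)]

# Sorting first restores one group per event.
keys_fixed = [k for k, _ in groupby(sorted(shuffled, key=by_event), key=by_event)]

print(keys_presorted, keys_shuffled, keys_fixed)
```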
dict_list = [{'event': 0, 'voltage': 1, 'time': 0},
{'event': 0, 'voltage': 2, 'time': 1},
{'event': 1, 'voltage': 1, 'time': 2},
{'event': 1, 'voltage': 2, 'time': 3},
{'event': 2, 'voltage': 1, 'time': 4},
{'event': 2, 'voltage': 2, 'time': 5},
]
import collections
dol = collections.defaultdict(list)
for d in dict_list:
    k = d["event"]
    dol[k].append(d)
print(dol)
If you know that your "event" keys are consecutive zero-based integers, you can use a list instead, but the extra complexity may not gain you anything.
defaultdict was added in python 2.5, but the workaround for earlier versions is not hard (see Nick D's code).
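For comparison, here is a sketch of the list-based variant mentioned above. It assumes the event values really are the integers 0, 1, 2, ... with none missing; lol (list of lists) and n_events are illustrative names.

```python
dict_list = [{'event': 0, 'voltage': 1, 'time': 0},
             {'event': 0, 'voltage': 2, 'time': 1},
             {'event': 1, 'voltage': 1, 'time': 2},
             {'event': 1, 'voltage': 2, 'time': 3},
             {'event': 2, 'voltage': 1, 'time': 4},
             {'event': 2, 'voltage': 2, 'time': 5}]

# Assumption: events are consecutive integers starting at 0.
n_events = max(d['event'] for d in dict_list) + 1

# Build independent inner lists; [[]] * n_events would repeat one shared list.
lol = [[] for _ in range(n_events)]
for d in dict_list:
    lol[d['event']].append(d)

print(lol[2])
```

The extra step of computing n_events up front is the added complexity the answer refers to; the defaultdict version needs no such assumption.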
I think what you really want is to filter them:
elist = [{'event': 0, 'voltage': 1, 'time': 0},
{'event': 0, 'voltage': 2, 'time': 1},
{'event': 1, 'voltage': 1, 'time': 2},
{'event': 1, 'voltage': 2, 'time': 3},
{'event': 2, 'voltage': 1, 'time': 4},
{'event': 2, 'voltage': 2, 'time': 5}]
def get_events(elist, n):
    # the built-in filter() is lazy in Python 3 (itertools.ifilter in Python 2)
    return filter(lambda d: d['event'] == n, elist)
for e in get_events(elist, 0):
    print(e)
This solution does not create additional structures, which matters if the event list is huge.
Another very nice solution is to use groupby:
from itertools import groupby
from operator import itemgetter
for group in groupby(elist, itemgetter('event')):
    event_id, event_list = group  # event_id rather than id, to avoid shadowing the built-in
    for e in event_list:
        print(e)
{'time': 0, 'event': 0, 'voltage': 1}
{'time': 1, 'event': 0, 'voltage': 2}
{'time': 2, 'event': 1, 'voltage': 1}
{'time': 3, 'event': 1, 'voltage': 2}
{'time': 4, 'event': 2, 'voltage': 1}
{'time': 5, 'event': 2, 'voltage': 2}
A simple implementation will suffice in my opinion:
grouping = {}
for d in dictlist:
    if d[field] not in grouping:
        grouping[d[field]] = []
    grouping[d[field]].append(d)
result = list(grouping.values())
