inside list of dictionaries, merge lists based on key

inside list of dictionaries, merge lists based on key - python

I have nested dictionaries in a list of dictionaries, I want to merge the lists based on 'id'
res = [{'i': ['1'], 'id': '123'},
{'i': ['1'], 'id': '123'},
{'i': ['1','2','3','4','5','6'],'id': '123'},
{'i': ['1'], 'id': '234'},
{'i': ['1','2','3','4','5'],'id': '234'}]
Desired output:
[{'i': [1, 1, 1, 2, 3, 4, 5, 6], 'id': '123'},
{'i': [1, 1, 2, 3, 4, 5], 'id': '234'}]
I am trying to merge the nested dictionaries based on key "id". I couldn't figure out the best way out:
import collections
d = collections.defaultdict(list)
for i in res:
for k, v in i.items():
d[k].extend(v)
The above code is merging all the lists, but i wantto merge lists based on key "id".

Something like this should do the trick
from collections import defaultdict
merged = defaultdict(list)
for r in res:
merged[r['id']].extend(r['i'])
output = [{'id': key, 'i': merged_list} for key, merged_list in merged.items()]

The following produces the desired output, using itertools.groupby:
from operator import itemgetter
from itertools import groupby
k = itemgetter('id')
[
{'id': k, 'i': [x for d in g for x in d['i']]}
for k, g in groupby(sorted(res, key=k), key=k)
]

I'm not sure what the expected behavior should be when there are duplicates -- for example, should the lists be:
treated like a set() ?
appended, and there could be multiple items, such as [1,1,2,3...] ?
doesn't matter -- just take any
Here would be one variation where we use a dict comprehension:
{item['id']: item for item in res}.values()
# [{'i': ['1', '2', '3', '4', '5'], 'id': '234'}, {'i': ['1', '2', '3', '4', '5', '6'], 'id': '123'}]
If you provide a bit more information in your question, I can update the answer accordingly.

Related

Separating nested list and dictionary in separate columns

I created an function to gather the following sample list below:
full_list = ['Group1', [{'a':'1', 'b':'2'},{'c':'3', 'x':'1'}]
'Group2', [{'d':'7', 'e':'18'}],
'Group3', [{'m':'21'}, {'n':'44','p':'13'}]]
As you can see some of the elements inside the lists are made up of key-value pair dictionaries.
And these dictionaries are of different sizes (number of kv pairs).
Can anyone suggest what to use in python to display this list in separate columns?
Group1 Group2 Group3
{'a':'1', 'b':'2'} {'d':'7', 'e':'18'} {'m':'21'}
{'c':'3', 'x':'1'} {'n':'44','p':'13'}
I am not after a solution but rather a point in the right direction for a novice like me.
I have briefly looked at itertools and pandas dataframes
Thanks in advance

Here is one way:
First extract the columns and the data:
import pandas as pd
columns = full_list[::2]
#['Group1', 'Group2', 'Group3']
data = full_list[1::2]
#[[{'a': '1', 'b': '2'}, {'c': '3', 'x': '1'}],
# [{'d': '7', 'e': '18'}],
# [{'m': '21'}, {'n': '44', 'p': '13'}]]
Here the [::2] means iterate from begin to end but only every 2 items and so does [1::2] but it starts iterating from index 1 (second position)
Then create a pd.DataFrame:
df = pd.DataFrame(data)
#0 {'a': '1', 'b': '2'} {'c': '3', 'x': '1'}
#1 {'d': '7', 'e': '18'} None
#2 {'m': '21'} {'n': '44', 'p': '13'}
Ooops but the columns and rows are transposed so we need to convert it:
df = df.T
Then add the columns:
df.columns = columns
And there we have it:
Group1 Group2 Group3
0 {'a': '1', 'b': '2'} {'d': '7', 'e': '18'} {'m': '21'}
1 {'c': '3', 'x': '1'} None {'n': '44', 'p': '13'}

Python - Sum the value in the list of dictionary based on the same key

I have a list of dictionaries which looks like:
data = [{'stat3': '5', 'stat2': '4', 'player': '1'},
{'stat3': '8', 'stat2': '1', 'player': '1'},
{'stat3': '6', 'stat2': '1', 'player': '3'},
{'stat3': '3', 'stat2': '7', 'player': '3'}]
And I want to get a nested dictionary whose keys are the value from the key('player') and whose values are dictionaries of aggregated stats.
The output should:
{'3': {'stat3': 9, 'stat2': 8, 'player': '3'},
'1': {'stat3': 13, 'stat2': 5, 'player': '1'}}
The following is my code:
from collections import defaultdict
result = {}
total_stat = defaultdict(int)
for dict in data:
total_stat[dict['player']] += int(dict['stat3'])
total_stat[dict['player']] += int(dict['stat2'])
total_stat = ([{'player': info, 'stat3': total_stat[info],
'stat2': total_stat[info]} for info in
sorted(total_stat, reverse=True)])
for item in total_stat:
result.update({item['player']: item})
print(result)
However, I got this:
{'3': {'player': '3', 'stat3': 17, 'stat2': 17},
'1': {'player': '1', 'stat3': 18, 'stat2': 18}}
How could I make it right? Or are there other approaches?

Your data is rather a DataFrame, a natural pandas solution is :
In [34]: pd.DataFrame.from_records(data).astype(int).groupby('player').sum().T.to_dict()
Out[34]: {1: {'stat2': 5, 'stat3': 13}, 3: {'stat2': 8, 'stat3': 9}}

Just use a more nested default-factory:
>>> total_stat = defaultdict(lambda : defaultdict(int))
>>> value_fields = 'stat2', 'stat3'
>>> for datum in data:
... player_data = total_stat[datum['player']]
... for k in value_fields:
... player_data[k] += int(datum[k])
...
>>> from pprint import pprint
>>> pprint(total_stat)
defaultdict(<function <lambda> at 0x1023490d0>,
{'1': defaultdict(<class 'int'>, {'stat2': 5, 'stat3': 13}),
'3': defaultdict(<class 'int'>, {'stat2': 8, 'stat3': 9})})

This solution use a nested dictionary. The out is a {player: Counter} dictionary, where as Counter itself is another dictionary {stat: score}
import collections
def split_player_stat(dict_object):
"""
Split a row of data into player, stat
>>> split_player_stat({'stat3': '5', 'stat2': '4', 'player': '1'})
'1', {'stat3': 5, 'stat2': 4}
"""
key = dict_object['player']
value = {k: int(v) for k, v in dict_object.items() if k != 'player'}
return key, value
data = [{'stat3': '5', 'stat2': '4', 'player': '1'},
{'stat3': '8', 'stat2': '1', 'player': '1'},
{'stat3': '6', 'stat2': '1', 'player': '3'},
{'stat3': '3', 'stat2': '7', 'player': '3'}]
out = collections.defaultdict(collections.Counter)
for player_stat in data:
player, stat = split_player_stat(player_stat)
out[player].update(stat)
print(out)
The magic of this solution is done by the collections.defaultdict and collections.Counter classes, both behaves like dictionaries.

Not the best code, nor the more pythonic, but I think you should be able to walk through it and figure out where your code went wrong.
def sum_stats_by_player(data):
result = {}
for dictionary in data:
print(f"evaluating dictionary {dictionary}")
player = dictionary["player"]
stat3 = int(dictionary["stat3"])
stat2 = int(dictionary["stat2"])
# if the player isn't in our result
if player not in result:
print(f"\tfirst time player {player}")
result[player] = {} # add the player as an empty dictionary
result[player]["player"] = player
if "stat3" not in result[player]:
print(f"\tfirst time stat3 {stat3}")
result[player]["stat3"] = stat3
else:
print(f"\tupdating stat3 { result[player]['stat3'] + stat3}")
result[player]["stat3"] += stat3
if "stat2" not in result[player]:
print(f"\tfirst time stat2 {stat2}")
result[player]["stat2"] = stat2
else:
print(f"\tupdating stat2 { result[player]['stat2'] + stat2}")
result[player]["stat2"] += stat2
return result
data = [{'stat3': '5', 'stat2': '4', 'player': '1'},
{'stat3': '8', 'stat2': '1', 'player': '1'},
{'stat3': '6', 'stat2': '1', 'player': '3'},
{'stat3': '3', 'stat2': '7', 'player': '3'}]
print(sum_stats_by_player(data))

Most of the solution here are making the problem too complex. Let's make it simple and more readable. Here you go:
In [26]: result = {}
In [27]: req_key = 'player'
In [29]: for dct in data:
...: player_val = dct.pop(req_key)
...: result.setdefault(player_val, {req_key: player_val})
...: for k, v in dct.items():
...: result[player_val][k] = result[player_val].get(k, 0) + int(v)
In [30]: result
Out[30]:
{'1': {'player': '1', 'stat2': 5, 'stat3': 13},
'3': {'player': '3', 'stat2': 8, 'stat3': 9}}
Here you go simple and clean. For this simple problem no need of imports. Now coming to the program:
result.setdefault(player_val, {'player': player_val})
It sets the default value as "player": 3 or "player": 1 if there is no such key in the result.
result[player_val][k] = result[player_val].get(k, 0) + int(v)
This adds up the value for keys with common values.

Another version using Counter
import itertools
from collections import Counter
def count_group(group):
c = Counter()
for g in group:
g_i = dict([(k, int(v)) for k, v in g.items() if k != 'player'])
c.update(g_i)
return dict(c)
sorted_data = sorted(data, key=lambda x:x['player'])
results = [(k, count_group(g)) for k, g in itertools.groupby(sorted_data, lambda x: x['player'])]
print(results)
To give
[('1', {'stat3': 13, 'stat2': 5}), ('3', {'stat3': 9, 'stat2': 8})]

Two loops would allow you to:
group your data by a primary key
aggregate all secondary information
These two tasks are accomplished in the aggregate_statistics function shown below.
from collections import Counter
from pprint import pprint
def main():
data = [{'player': 1, 'stat2': 4, 'stat3': 5},
{'player': 1, 'stat2': 1, 'stat3': 8},
{'player': 3, 'stat2': 1, 'stat3': 6},
{'player': 3, 'stat2': 7, 'stat3': 3}]
new_data = aggregate_statistics(data, 'player')
pprint(new_data)
def aggregate_statistics(table, key):
records_by_key = {}
for record in table:
data = record.copy()
records_by_key.setdefault(data.pop(key), []).append(Counter(data))
new_data = []
for second_key, value in records_by_key.items():
start, *remaining = value
for record in remaining:
start.update(record)
new_data.append(dict(start, **{key: second_key}))
return new_data
if __name__ == '__main__':
main()

How do I loop over dictionary and check for values passed by a variable in Python

I've got the below dictionary and list - how do I loop over the dictionary checking if b == '1' while passing '1' as variable from a list?
dic = {'info': [{'a':0, 'b':'1'},{'a':0, 'b':'3'},{'a':0, 'b':'3'},{'a':0, 'b':'1'}]}
lst = ['1']
I want to return {'a':0, 'b':'1'}, {'a':0, 'b':'1'}.

This is a general solution using filter; the built-in method, you will have to adopt it to your needs:
>>> list(filter(lambda d: d['b'] in lst, dic['info']))
[{'b': '1', 'a': 0}, {'b': '1', 'a': 0}]
Converting the filter object into a list using list constructor is necessary only in Python3, whereas in Python2, it is not required:
>>> filter(lambda d: d['b'] in lst, dic['info'])
[{'b': '1', 'a': 0}, {'b': '1', 'a': 0}]
EDIT: To make the solution more general in case multiple items in lst, then consider the following:
>>> dic
{'info': [{'b': '1', 'a': 0}, {'b': '3', 'a': 0}, {'b': '3', 'a': 0}, {'b': '1', 'a': 0}, {'b': '2', 'a': '1'}]}
>>>
>>> lst
['1', '2']
>>> def filter_dict(dic_lst, lst):
lst_out = []
for sub_d in dic_lst:
if any(x == sub_d['b'] for x in lst):
lst_out.append(sub_d)
return lst_out
>>> filter_dict(dic['info'], lst)
[{'b': '1', 'a': 0}, {'b': '1', 'a': 0}, {'b': '2', 'a': '1'}]
OR:
>>> list(map(lambda x: list(filter(lambda d: d['b'] in x, dic['info'])),lst))
[[{'b': '1', 'a': 0}, {'b': '1', 'a': 0}], [{'b': '2', 'a': '1'}]]

Just a simple list comprehension:
In [22]: dic = {'info': [{'a':0, 'b':'1'},{'a':0, 'b':'3'},{'a':0, 'b':'3'},{'a':0, 'b':'1'}]}
In [23]: lst = ['1']
In [25]: [sub_dict for sub_dict in dic['info'] if sub_dict['b'] == lst[0]]
Out[25]: [{'a': 0, 'b': '1'}, {'a': 0, 'b': '1'}]

You could use a filter approach:
filter(lambda x:x['b'] in list, dic['info'])
It will create a generator which you can materialize in a list:
result = list(filter(lambda x:x['b'] in list, dic['info']))
Mind I would however rename your list variable since you here override a reference to the list type.

from collections import defaultdict
dic = {'info': [{'a':0, 'b':'1'},{'a':0, 'b':'3'},{'a':0, 'b':'3'},{'a':0, 'b':'1'}]}
d = defaultdict(list)
for each in dic['info']:
d[each['b']].append(each)
out:
defaultdict(list,
{'1': [{'a': 0, 'b': '1'}, {'a': 0, 'b': '1'}],
'3': [{'a': 0, 'b': '3'}, {'a': 0, 'b': '3'}]})
in:
d['1']
out:
[{'a': 0, 'b': '1'}, {'a': 0, 'b': '1'}]
Build an index dict to avoid iterate again.

First go my simple loop and iteration way
Input:
>>> dic
{'info': [{'a': 0, 'b': '1'}, {'a': 0, 'b': '3'}, {'a': 0, 'b': '3'}, {'a': 0, 'b': '1'}]}
>>> l
['1']
New List variable for result.
>>> result = []
Algo
Iterate diction by iteritems method of dictionary.
Value of main dictionary is list data type. so again iterate list by for loop.
Check b key is present in sub dictionary and check its value is present in given list l.
If yes, then append to result list.
code:
>>> for k,v in dic.iteritems():
... for i in v:
... if "b" in i and i["b"] in l:
... result.append(i)
...
Output:
>>> result
[{'a': 0, 'b': '1'}, {'a': 0, 'b': '1'}]
>>>
Notes:
Do not use list as variable name because list is reversed keyword for Python
Read basic things of dictionary and list which has properties.
Try to write code first.

You can make use of a list comprehension, or just do it using filter.
list comprehension
dict = {'info': [{'a':0, 'b':'1'},{'a':0, 'b':'3'},{'a':0, 'b':'3'},{'a':0, 'b':'1'}]}
lst = ['1']
result = [i for i in dict['info'] if i['b'] == lst[0]]
print result # [{'a': 0, 'b': '1'}, {'a': 0, 'b': '1'}]
filter
dict = {'info': [{'a':0, 'b':'1'},{'a':0, 'b':'3'},{'a':0, 'b':'3'},{'a':0, 'b':'1'}]}
list(filter(lambda i: i['b'] in lst, dic['info']))
# [{'b': '1', 'a': 0}, {'b': '1', 'a': 0}]

How to find the difference between two lists of dictionaries?

I have two lists of dictionaries and I'd like to find the difference between them (i.e. what exists in the first list but not the second, and what exists in the second list but not the first list).
The issue is that it is a list of dictionaries
a = [{'a': '1'}, {'c': '2'}]
b = [{'a': '1'}, {'b': '2'}]
set(a) - set(b)
Result
TypeError: unhashable type: 'dict'
Desired Result:
{'c': '2'}
How do I accomplish this?

You can use the in operator to see if it is in the list
a = [{'a': '1'}, {'c': '2'}]
b = [{'a': '1'}, {'b': '2'}]
>>> {'a':'1'} in a
True
>>> {'a':'1'} in b
True
>>> [i for i in a if i not in b]
[{'c': '2'}]

I'd like to find the difference between them (i.e. what exists in the first list but not the second, and what exists in the second list but not the first list)
According to your definition, you looking for a Symmetric difference:
>>> import itertools
>>> a = [{'a': '1'}, {'c': '2'}]
>>> b = [{'a': '1'}, {'b': '2'}]
>>> intersec = [item for item in a if item in b]
>>> sym_diff = [item for item in itertools.chain(a,b) if item not in intersec]
>>> intersec
[{'a': '1'}]
>>> sym_diff
[{'c': '2'}, {'b': '2'}
Alternatively (using the plain difference as given in your example):
>>> a_minus_b = [item for item in a if item not in b]
>>> b_minus_a = [item for item in b if item not in a]
>>> sym_diff = list(itertools.chain(a_minus_b,b_minus_a))
>>> a_minus_b
[{'c': '2'}]
>>> b_minus_a
[{'b': '2'}]
>>> sym_diff
[{'c': '2'}, {'b': '2'}]

You can also you filter with a lambda:
If you want the different items in each list:
print filter(lambda x: x not in b,a) + filter(lambda x: x not in a,b)
[{'c': '2'}, {'b': '2'}]
Or just filter(lambda x: x not in b,a) to get the elements in a but not in b
If you don't want to create the full list of dicts in memory you can use itertools.ifilter
from itertools import ifilter
diff = ifilter(lambda x: x not in b,a)
Then just iterate over diff:
for uniq in diff:
print uniq

loop for to print a dictionary of dictionaries

I'd like to find a way to print a list of dictionnaries line by line, so that the result be clear and easy to read
the list is like this.
myList = {'1':{'name':'x',age:'18'},'2':{'name':'y',age:'19'},'3':{'name':'z',age:'20'}...}
and the result should be like this:
>>> '1':{'name':'x',age:'18'}
'2':{'name':'y',age:'19'}
'3':{'name':'z',age:'20'} ...

Using your example:
>>> myList = {'1':{'name':'x','age':'18'},'2':{'name':'y','age':'19'},'3':{'name':'z','age':'20'}}
>>> for k, d in myList.items():
print k, d
1 {'age': '18', 'name': 'x'}
3 {'age': '20', 'name': 'z'}
2 {'age': '19', 'name': 'y'}
More examples:
A list of dictionaries:
>>> l = [{'a':'1'},{'b':'2'},{'c':'3'}]
>>> for d in l:
print d
{'a': '1'}
{'b': '2'}
{'c': '3'}
A dictionary of dictionaries:
>>> D = {'d1': {'a':'1'}, 'd2': {'b':'2'}, 'd3': {'c':'3'}}
>>> for k, d in D.items():
print d
{'b': '2'}
{'c': '3'}
{'a': '1'}
If you want the key of the dicts:
>>> D = {'d1': {'a':'1'}, 'd2': {'b':'2'}, 'd3': {'c':'3'}}
>>> for k, d in D.items():
print k, d
d2 {'b': '2'}
d3 {'c': '3'}
d1 {'a': '1'}

>>> import json
>>> dicts = {1: {'a': 1, 'b': 2}, 2: {'c': 3}, 3: {'d': 4, 'e': 5, 'f':6}}
>>> print(json.dumps(dicts, indent=4))
{
"1": {
"a": 1,
"b": 2
},
"2": {
"c": 3
},
"3": {
"d": 4,
"e": 5,
"f": 6
}
}

One more option - pprint, made for pretty-printing.
The pprint module provides a capability to “pretty-print” arbitrary Python data structures in a form which can be used as input to the interpreter.
List of dictionaries:
from pprint import pprint
l = [{'a':'1'},{'b':'2'},{'c':'3'}]
pprint(l, width=1)
Output:
[{'a': '1'},
{'b': '2'},
{'c': '3'}]
Dictionary with dictionaries in values:
from pprint import pprint
d = {'a':{'b':'c'}},{'d':{'e':'f'}}
pprint(d, width=1)
Output:
({'a': {'b': 'c'}},
{'d': {'e': 'f'}})

myList = {'1':{'name':'x','age':'18'},
'2':{'name':'y','age':'19'},
'3':{'name':'z','age':'20'}}
for item in myList:
print(item,':',myList[item])
Output:
3 : {'age': '20', 'name': 'z'}
2 : {'age': '19', 'name': 'y'}
1 : {'age': '18', 'name': 'x'}
item is used to iterate keys in the dict, and myList[item] is the value corresponding to the current key.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

inside list of dictionaries, merge lists based on key - python

Something like this should do the trick from collections import defaultdict merged = defaultdict(list) for r in res: merged[r['id']].extend(r['i']) output = [{'id': key, 'i': merged_list} for key, merged_list in merged.items()]

The following produces the desired output, using itertools.groupby: from operator import itemgetter from itertools import groupby k = itemgetter('id') [ {'id': k, 'i': [x for d in g for x in d['i']]} for k, g in groupby(sorted(res, key=k), key=k) ]

Related

Separating nested list and dictionary in separate columns

Python - Sum the value in the list of dictionary based on the same key

How do I loop over dictionary and check for values passed by a variable in Python

How to find the difference between two lists of dictionaries?

loop for to print a dictionary of dictionaries

Categories

Resources