Merge two dictionaries based on keys with no duplicates

Dict 1 :
{'result2': [{'2': '22'}, {'22': '222'}], 'result1': [{'1': '11'}, {'11': '111'}]}
Dict 2:
{'result2': [{'two': 'twentytwo'}, {'22': '222'}], 'result1': [{'one': 'eleven'}, {'11': '111'}]}
I want to merge them based on the keys result1 and result2 and preserve the same format.
Expected Output
{"result1":[{"1":"11"},{"one":"eleven"},{"11":"111"}]},{"result2":[{"2":"22"},{"two":"twentytwo"},{"22":"222"}]}
Note that the keys of the two dicts are not necessarily at the same index.

It's always good to include the desired output format. It looks like you want a list of dicts that contains the data from both inputs for result1 and result2.
Given -
a = [{"result1":[{"1":"11"},{"11":"111"}]},{"result2":[{"2":"22"},{"22":"222"}]}]
b = [{"result1":[{"one":"eleven"},{"11":"111"}]},{"result2":[{"two":"twentytwo"},{"22":"222"}]}]
Here is code-
desired_op = []
for index, item in enumerate(a):
    inner_item = {}
    for key, value in item.items():
        # print(key, value + b[index][key])
        inner_item[key] = value + b[index][key]
    desired_op.append(inner_item)
print(desired_op)
[{'result1': [{'1': '11'}, {'11': '111'}, {'one': 'eleven'}, {'11': '111'}]}, {'result2': [{'2': '22'}, {'22': '222'}, {'two': 'twentytwo'}, {'22': '222'}]}]
If you need unique items, it gets a bit more complex:
desired_op = []
for index, item in enumerate(a):
    inner_item = {}
    for key, value in item.items():
        # key each inner dict by its first key, so a repeated key is kept only once
        inner_item[key] = list({list(litem.keys())[0]: litem for litem in value + b[index][key]}.values())
    desired_op.append(inner_item)
print(desired_op)
[{'result1': [{'1': '11'}, {'11': '111'}, {'one': 'eleven'}]}, {'result2': [{'2': '22'}, {'22': '222'}, {'two': 'twentytwo'}]}]
Follow-up on a comment:
A simple extend can work, but for this you must be sure that the first item of the second list contains result1 and the second item contains result2 as dict keys.
For example (this won't remove duplicates):
a[0]['result1'].extend(b[0]['result1'])
a[1]['result2'].extend(b[1]['result2'])
print(a)
[{'result1': [{'1': '11'}, {'11': '111'}, {'one': 'eleven'}, {'11': '111'}]}, {'result2': [{'2': '22'}, {'22': '222'}, {'two': 'twentytwo'}, {'22': '222'}]}]
If you aren't sure about the index positions in the second list, this will work (starting again from the original a):
keys_mapping = {list(item.keys())[0]: index for index, item in enumerate(a)}
print(keys_mapping)
{'result1': 0, 'result2': 1}
for item in b:
    key = list(item.keys())[0]
    a[keys_mapping[key]][key].extend(item[key])
print(a)
[{'result1': [{'1': '11'}, {'11': '111'}, {'one': 'eleven'}, {'11': '111'}]}, {'result2': [{'2': '22'}, {'22': '222'}, {'two': 'twentytwo'}, {'22': '222'}]}]

You could separate this into two phases:
merge the dictionaries by appending the lists
remove duplicates from the resulting merged dictionary
...
dict1 = {'result2': [{'2': '22'}, {'22': '222'}],
         'result1': [{'1': '11'}, {'11': '111'}]}
dict2 = {'result2': [{'two': 'twentytwo'}, {'22': '222'}],
         'result1': [{'one': 'eleven'}, {'11': '111'}]}
# merge phase
dict3 = {k: dict1.get(k, []) + dict2.get(k, []) for k in {*dict1, *dict2}}
# uniqueness phase
dict3 = {k: [d for i, d in enumerate(v) if v.index(d) == i]
         for k, v in dict3.items()}
print(dict3)
{'result2': [{'2': '22'}, {'22': '222'}, {'two': 'twentytwo'}],
 'result1': [{'1': '11'}, {'11': '111'}, {'one': 'eleven'}]}
Note that you could combine the two phases in one large dictionary comprehension:
dict3 = {k: [d for i, d in enumerate(v) if v.index(d) == i]
         for k in {*dict1, *dict2}
         for v in [dict1.get(k, []) + dict2.get(k, [])]}
If dict1 and dict2 are guaranteed to have the same keys, then the whole process can be performed more concisely:
dict3 = {k:v+[w for w in dict2[k] if w not in v] for k,v in dict1.items()}
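The two phases can also be wrapped in a small reusable function. This is only a sketch: the name merge_unique is made up for illustration, and it assumes every value is a list. It uses dict.fromkeys instead of a set union so that the key order of the inputs is preserved.

```python
def merge_unique(d1, d2):
    """Merge two dicts of lists key by key, dropping repeated items."""
    merged = {}
    # dict.fromkeys keeps first-seen key order, unlike a set union
    for key in dict.fromkeys([*d1, *d2]):
        items = []
        for item in d1.get(key, []) + d2.get(key, []):
            # the inner dicts are unhashable, so compare with == via "in"
            if item not in items:
                items.append(item)
        merged[key] = items
    return merged


dict1 = {'result2': [{'2': '22'}, {'22': '222'}],
         'result1': [{'1': '11'}, {'11': '111'}]}
dict2 = {'result2': [{'two': 'twentytwo'}, {'22': '222'}],
         'result1': [{'one': 'eleven'}, {'11': '111'}]}
merged = merge_unique(dict1, dict2)
print(merged)
# {'result2': [{'2': '22'}, {'22': '222'}, {'two': 'twentytwo'}], 'result1': [{'1': '11'}, {'11': '111'}, {'one': 'eleven'}]}
```

Keys that exist in only one of the inputs are kept as they are, since dict.get falls back to an empty list.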

Related

pythonic way to break a dict which value is a list to several dict

description
I need to break a dict that has a list as one of its values into several dicts, keeping the other parts unchanged.
The key I want to break may have a different name, but cond always has exactly one value whose type is list.
example
input
cond = {"type":"image","questionType":["3","4","5"]}
cond = {"type":"example","fieldToBreak":["1","2","3"],"fieldInt":1,"fieldFloat":0.1}
output
[
{'type': 'image', 'questionType': '3'},
{'type': 'image', 'questionType': '4'},
{'type': 'image', 'questionType': '5'}
]
[
{'type': 'example', 'fieldToBreak': '1', 'fieldInt': 1, 'fieldFloat': 0.1},
{'type': 'example', 'fieldToBreak': '2', 'fieldInt': 1, 'fieldFloat': 0.1},
{'type': 'example', 'fieldToBreak': '3', 'fieldInt': 1, 'fieldFloat': 0.1}
]
what I have tried
cond_queue = []
for k, v in cond.items():
    if isinstance(v, list):
        for ele in v:
            cond_copy = cond.copy()
            cond_copy[k] = ele
            cond_queue.append(cond_copy)
        break
It works, but I think it is not the best pythonic solution.
question:
Any better pythonic solution?

A possible approach using Python's built-in functions and standard library. The code should work with any number of keys. If the original dict contains several lists, it creates all combinations of their elements. I'm not sure whether that is the behavior you want.
import itertools
def dict_to_inflated_list(d):
    ans, keys, vals = list(), list(), list()
    # copy keys and 'listified' values in the same order
    for k, v in d.items():
        keys.append(k)
        vals.append(v if isinstance(v, list) else [v])
    # iterate over all possible combinations of elements of all 'listified' values
    for combination in itertools.product(*vals):
        ans.append({k: v for k, v in zip(keys, combination)})
    return ans
if __name__ == '__main__':
    cond = {'type': 'image', 'questionType': ['3', '4', '5']}
    print(dict_to_inflated_list(cond))
    cond = {'a': 0, 'b': [1, 2], 'c': [10, 20]}
    print(dict_to_inflated_list(cond))
Output:
[{'type': 'image', 'questionType': '3'}, {'type': 'image', 'questionType': '4'}, {'type': 'image', 'questionType': '5'}]
[{'a': 0, 'b': 1, 'c': 10}, {'a': 0, 'b': 1, 'c': 20}, {'a': 0, 'b': 2, 'c': 10}, {'a': 0, 'b': 2, 'c': 20}]

Something like the below (the solution is based on the input from the post, which I assume represents the general case):
cond = {"type": "image", "questionType": ["3", "4", "5"]}
data = [{"type": "image", "questionType": e} for e in cond['questionType']]
print(data)
output
[{'type': 'image', 'questionType': '3'}, {'type': 'image', 'questionType': '4'}, {'type': 'image', 'questionType': '5'}]

This little function does the job without any extra argument except the input dictionary
def unpack_dict(d):
    n = [len(v) for k, v in d.items() if type(v) is list][0]  # number of items in the list
    r = []
    for i in range(n):
        _d = {}
        for k, v in d.items():
            if type(v) is list:
                _d[k] = v[i]
            else:
                _d[k] = v
        r.append(_d)
    return r
cond = {"type":"example","fieldToBreak":["1","2","3"],"fieldInt":1,"fieldFloat":0.1}
unpack_dict(cond)
[{'type': 'example', 'fieldToBreak': '1', 'fieldInt': 1, 'fieldFloat': 0.1},
{'type': 'example', 'fieldToBreak': '2', 'fieldInt': 1, 'fieldFloat': 0.1},
{'type': 'example', 'fieldToBreak': '3', 'fieldInt': 1, 'fieldFloat': 0.1}]
The function determines how many items (n) there are in the list entry and uses that info to extract the right value to be inserted in the dictionary. Looping over n (for i in range(n):) is used to append the correct number of dictionaries in the final output. That's it. Quite simple to read and understand.
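For comparison, the same idea can be sketched more compactly by first locating the list-valued key and then overriding only that entry in a copy of the dict. The name unpack_dict_compact is hypothetical, and the sketch assumes exactly one value is a list, as stated in the question.

```python
def unpack_dict_compact(d):
    # find the key whose value is a list (assumed to exist exactly once)
    list_key = next(k for k, v in d.items() if isinstance(v, list))
    # {**d, list_key: item} copies d and overrides only the list entry;
    # the key keeps its original position in the new dict
    return [{**d, list_key: item} for item in d[list_key]]


cond = {"type": "example", "fieldToBreak": ["1", "2", "3"], "fieldInt": 1, "fieldFloat": 0.1}
unpacked = unpack_dict_compact(cond)
print(unpacked[0])
# {'type': 'example', 'fieldToBreak': '1', 'fieldInt': 1, 'fieldFloat': 0.1}
```

If no value is a list, next raises StopIteration, so a caller that cannot guarantee the input shape would need to handle that case.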

try this:
lens = 0
for index, item in enumerate(cond):
    if isinstance(cond[item], list):
        lens = len(cond[item])
        idx = index
        break
print([{k: v if i != idx else v[j] for i, (k, v) in enumerate(cond.items())} for j in range(lens)])
output:
# cond = {"type":"image","questionType":["3","4","5"]}
[{'type': 'image', 'questionType': '3'},
{'type': 'image', 'questionType': '4'},
{'type': 'image', 'questionType': '5'}]
# cond = {"type":"example","fieldToBreak":["1","2","3"],"fieldInt":1,"fieldFloat":0.1}
[{'type': 'example', 'fieldToBreak': '1', 'fieldInt': 1, 'fieldFloat': 0.1},
{'type': 'example', 'fieldToBreak': '2', 'fieldInt': 1, 'fieldFloat': 0.1},
{'type': 'example', 'fieldToBreak': '3', 'fieldInt': 1, 'fieldFloat': 0.1}]
If the dict has another shape:
# cond = {"questionType":["3","4","5"], "type":"image"}
[{'questionType': '3', 'type': 'image'},
{'questionType': '4', 'type': 'image'},
{'questionType': '5', 'type': 'image'}]

inside list of dictionaries, merge lists based on key

I have nested dictionaries in a list of dictionaries, I want to merge the lists based on 'id'
res = [{'i': ['1'], 'id': '123'},
{'i': ['1'], 'id': '123'},
{'i': ['1','2','3','4','5','6'],'id': '123'},
{'i': ['1'], 'id': '234'},
{'i': ['1','2','3','4','5'],'id': '234'}]
Desired output:
[{'i': [1, 1, 1, 2, 3, 4, 5, 6], 'id': '123'},
{'i': [1, 1, 2, 3, 4, 5], 'id': '234'}]
I am trying to merge the nested dictionaries based on key "id". I couldn't figure out the best way out:
import collections
d = collections.defaultdict(list)
for i in res:
    for k, v in i.items():
        d[k].extend(v)
The above code merges all the lists, but I want to merge the lists based on the key "id".

Something like this should do the trick
from collections import defaultdict
merged = defaultdict(list)
for r in res:
    merged[r['id']].extend(r['i'])
output = [{'id': key, 'i': merged_list} for key, merged_list in merged.items()]
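The desired output in the question shows integers, while the input lists hold strings. A minimal variation of the snippet above, assuming every element of 'i' is a numeric string, converts the values while merging:

```python
from collections import defaultdict

res = [{'i': ['1'], 'id': '123'},
       {'i': ['1'], 'id': '123'},
       {'i': ['1', '2', '3', '4', '5', '6'], 'id': '123'},
       {'i': ['1'], 'id': '234'},
       {'i': ['1', '2', '3', '4', '5'], 'id': '234'}]

merged = defaultdict(list)
for r in res:
    # int() raises ValueError on non-numeric strings; that is an assumption about the data
    merged[r['id']].extend(int(x) for x in r['i'])

output = [{'i': values, 'id': key} for key, values in merged.items()]
print(output)
# [{'i': [1, 1, 1, 2, 3, 4, 5, 6], 'id': '123'}, {'i': [1, 1, 2, 3, 4, 5], 'id': '234'}]
```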

The following groups the lists by id using itertools.groupby (the input must be sorted by the same key first):
from operator import itemgetter
from itertools import groupby
get_id = itemgetter('id')
[
    {'id': key, 'i': [x for d in group for x in d['i']]}
    for key, group in groupby(sorted(res, key=get_id), key=get_id)
]

I'm not sure what the expected behavior should be when there are duplicates -- for example, should the lists be:
treated like a set() ?
appended, and there could be multiple items, such as [1,1,2,3...] ?
doesn't matter -- just take any
Here is one variation using a dict comprehension, which keeps only the last entry for each id:
list({item['id']: item for item in res}.values())
# [{'i': ['1', '2', '3', '4', '5', '6'], 'id': '123'}, {'i': ['1', '2', '3', '4', '5'], 'id': '234'}]
If you provide a bit more information in your question, I can update the answer accordingly.
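As a rough sketch of the first option (treating each merged list like a set), the following keeps one copy of every value per id while preserving first-seen order. Whether this is the intended behavior is an assumption; the variable name merged is only illustrative.

```python
res = [{'i': ['1'], 'id': '123'},
       {'i': ['1'], 'id': '123'},
       {'i': ['1', '2', '3', '4', '5', '6'], 'id': '123'},
       {'i': ['1'], 'id': '234'},
       {'i': ['1', '2', '3', '4', '5'], 'id': '234'}]

merged = {}
for item in res:
    seen = merged.setdefault(item['id'], [])
    for x in item['i']:
        # skip values already collected for this id
        if x not in seen:
            seen.append(x)

print(merged)
# {'123': ['1', '2', '3', '4', '5', '6'], '234': ['1', '2', '3', '4', '5']}
```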

Swap keys of nested dictionaries

I have a dictionary as follows:
Each key has a dictionary associated with it.
dict_sample = {'a': {'d0': '1', 'd1': '2', 'd2': '3'}, 'b': {'d0': '1'}, 'c': {'d1': '1'}}
I need the output as follows:
output_dict = {'d0': {'a': 1, 'b': 1}, 'd1': {'a': 2, 'c': 1}, 'd2': {'a': 3}}
I'd appreciate any help on the pythonic way to achieve this. Thank You !

I believe this produces the desired output
>>> from collections import defaultdict
>>> d = defaultdict(dict)
>>>
>>> dict_sample = {'a': {'d0': '1', 'd1': '2', 'd2': '3'}, 'b': {'d0': '1'}, 'c': {'d1': '1'}}
>>>
>>> for key, value in dict_sample.items():
...     for k, v in value.items():
...         d[k][key] = v
...
>>> d
defaultdict(<class 'dict'>, {'d0': {'a': '1', 'b': '1'}, 'd1': {'a': '2', 'c': '1'}, 'd2': {'a': '3'}})

You can use dict.setdefault on a new dict with a nested loop:
d = {}
# for each key and sub-dict in the main dict
for k1, s in dict_sample.items():
    # for each key and value in the sub-dict
    for k2, v in s.items():
        # this is equivalent to d[k2][k1] = int(v), except that when k2 is not yet in d,
        # setdefault will initialize d[k2] with {} (a new dict)
        d.setdefault(k2, {})[k1] = int(v)
d would become:
{'d0': {'a': 1, 'b': 1}, 'd1': {'a': 2, 'c': 1}, 'd2': {'a': 3}}
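One way to sanity-check a swap like this is to apply it twice, which should give back the original mapping. The helper below is a hypothetical wrapper around the same setdefault loop (without the int conversion, so the round trip compares equal):

```python
def swap_nested(d):
    """Turn {outer: {inner: value}} into {inner: {outer: value}}."""
    out = {}
    for outer, inner in d.items():
        for key, value in inner.items():
            out.setdefault(key, {})[outer] = value
    return out


dict_sample = {'a': {'d0': '1', 'd1': '2', 'd2': '3'}, 'b': {'d0': '1'}, 'c': {'d1': '1'}}
swapped = swap_nested(dict_sample)
print(swapped)
# {'d0': {'a': '1', 'b': '1'}, 'd1': {'a': '2', 'c': '1'}, 'd2': {'a': '3'}}
print(swap_nested(swapped) == dict_sample)
# True
```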

Python - Sum the value in the list of dictionary based on the same key

I have a list of dictionaries which looks like:
data = [{'stat3': '5', 'stat2': '4', 'player': '1'},
{'stat3': '8', 'stat2': '1', 'player': '1'},
{'stat3': '6', 'stat2': '1', 'player': '3'},
{'stat3': '3', 'stat2': '7', 'player': '3'}]
And I want to get a nested dictionary whose keys are the value from the key('player') and whose values are dictionaries of aggregated stats.
The output should be:
{'3': {'stat3': 9, 'stat2': 8, 'player': '3'},
'1': {'stat3': 13, 'stat2': 5, 'player': '1'}}
The following is my code:
from collections import defaultdict
result = {}
total_stat = defaultdict(int)
for dict in data:
    total_stat[dict['player']] += int(dict['stat3'])
    total_stat[dict['player']] += int(dict['stat2'])
total_stat = ([{'player': info, 'stat3': total_stat[info],
                'stat2': total_stat[info]} for info in
               sorted(total_stat, reverse=True)])
for item in total_stat:
    result.update({item['player']: item})
print(result)
However, I got this:
{'3': {'player': '3', 'stat3': 17, 'stat2': 17},
'1': {'player': '1', 'stat3': 18, 'stat2': 18}}
How could I make it right? Or are there other approaches?

Your data is naturally tabular, so a pandas solution (with pandas imported as pd) is:
In [34]: pd.DataFrame.from_records(data).astype(int).groupby('player').sum().T.to_dict()
Out[34]: {1: {'stat2': 5, 'stat3': 13}, 3: {'stat2': 8, 'stat3': 9}}

Just use a more nested default-factory:
>>> total_stat = defaultdict(lambda : defaultdict(int))
>>> value_fields = 'stat2', 'stat3'
>>> for datum in data:
...     player_data = total_stat[datum['player']]
...     for k in value_fields:
...         player_data[k] += int(datum[k])
...
>>> from pprint import pprint
>>> pprint(total_stat)
defaultdict(<function <lambda> at 0x1023490d0>,
{'1': defaultdict(<class 'int'>, {'stat2': 5, 'stat3': 13}),
'3': defaultdict(<class 'int'>, {'stat2': 8, 'stat3': 9})})

This solution uses a nested dictionary. The output is a {player: Counter} dictionary, where each Counter is itself a {stat: score} dictionary.
import collections
def split_player_stat(dict_object):
    """
    Split a row of data into player, stat
    >>> split_player_stat({'stat3': '5', 'stat2': '4', 'player': '1'})
    ('1', {'stat3': 5, 'stat2': 4})
    """
    key = dict_object['player']
    value = {k: int(v) for k, v in dict_object.items() if k != 'player'}
    return key, value
data = [{'stat3': '5', 'stat2': '4', 'player': '1'},
        {'stat3': '8', 'stat2': '1', 'player': '1'},
        {'stat3': '6', 'stat2': '1', 'player': '3'},
        {'stat3': '3', 'stat2': '7', 'player': '3'}]
out = collections.defaultdict(collections.Counter)
for player_stat in data:
    player, stat = split_player_stat(player_stat)
    out[player].update(stat)
print(out)
The magic of this solution is done by the collections.defaultdict and collections.Counter classes, both of which behave like dictionaries.
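The reason Counter fits here is that Counter.update adds the incoming values to the existing counts, whereas dict.update overwrites them. A small side-by-side sketch (variable names are illustrative only):

```python
from collections import Counter

totals = Counter({'stat3': 5, 'stat2': 4})
totals.update({'stat3': 8, 'stat2': 1})   # Counter.update adds to existing counts
print(dict(totals))
# {'stat3': 13, 'stat2': 5}

plain = {'stat3': 5, 'stat2': 4}
plain.update({'stat3': 8, 'stat2': 1})    # dict.update replaces existing values
print(plain)
# {'stat3': 8, 'stat2': 1}
```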

Not the best code, nor the most Pythonic, but I think you should be able to walk through it and figure out where your code went wrong.
def sum_stats_by_player(data):
    result = {}
    for dictionary in data:
        print(f"evaluating dictionary {dictionary}")
        player = dictionary["player"]
        stat3 = int(dictionary["stat3"])
        stat2 = int(dictionary["stat2"])
        # if the player isn't in our result
        if player not in result:
            print(f"\tfirst time player {player}")
            result[player] = {}  # add the player as an empty dictionary
            result[player]["player"] = player
        if "stat3" not in result[player]:
            print(f"\tfirst time stat3 {stat3}")
            result[player]["stat3"] = stat3
        else:
            print(f"\tupdating stat3 { result[player]['stat3'] + stat3}")
            result[player]["stat3"] += stat3
        if "stat2" not in result[player]:
            print(f"\tfirst time stat2 {stat2}")
            result[player]["stat2"] = stat2
        else:
            print(f"\tupdating stat2 { result[player]['stat2'] + stat2}")
            result[player]["stat2"] += stat2
    return result
data = [{'stat3': '5', 'stat2': '4', 'player': '1'},
{'stat3': '8', 'stat2': '1', 'player': '1'},
{'stat3': '6', 'stat2': '1', 'player': '3'},
{'stat3': '3', 'stat2': '7', 'player': '3'}]
print(sum_stats_by_player(data))

Most of the solutions here make the problem too complex. Let's keep it simple and readable:
In [26]: result = {}
In [27]: req_key = 'player'
In [29]: for dct in data:
    ...:     player_val = dct.pop(req_key)
    ...:     result.setdefault(player_val, {req_key: player_val})
    ...:     for k, v in dct.items():
    ...:         result[player_val][k] = result[player_val].get(k, 0) + int(v)
In [30]: result
Out[30]:
{'1': {'player': '1', 'stat2': 5, 'stat3': 13},
 '3': {'player': '3', 'stat2': 8, 'stat3': 9}}
Simple and clean; no imports are needed for a problem this small. Now for how it works:
result.setdefault(player_val, {'player': player_val})
This sets the default value to {'player': '3'} or {'player': '1'} if there is no such key in result yet.
result[player_val][k] = result[player_val].get(k, 0) + int(v)
This adds up the values of each stat for rows that share the same player.
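One thing to be aware of is that dct.pop removes the 'player' key from every dict in data, so the input is modified by the loop. A sketch of the same setdefault idea that leaves the input untouched might look like this (the function name sum_by_key is an assumption for illustration):

```python
def sum_by_key(rows, req_key='player'):
    result = {}
    for row in rows:
        key_val = row[req_key]  # read the key instead of popping it
        totals = result.setdefault(key_val, {req_key: key_val})
        for k, v in row.items():
            if k != req_key:
                totals[k] = totals.get(k, 0) + int(v)
    return result


data = [{'stat3': '5', 'stat2': '4', 'player': '1'},
        {'stat3': '8', 'stat2': '1', 'player': '1'},
        {'stat3': '6', 'stat2': '1', 'player': '3'},
        {'stat3': '3', 'stat2': '7', 'player': '3'}]
summed = sum_by_key(data)
print(summed)
# {'1': {'player': '1', 'stat3': 13, 'stat2': 5}, '3': {'player': '3', 'stat3': 9, 'stat2': 8}}
```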

Another version using Counter
import itertools
from collections import Counter
def count_group(group):
    c = Counter()
    for g in group:
        g_i = dict([(k, int(v)) for k, v in g.items() if k != 'player'])
        c.update(g_i)
    return dict(c)
sorted_data = sorted(data, key=lambda x:x['player'])
results = [(k, count_group(g)) for k, g in itertools.groupby(sorted_data, lambda x: x['player'])]
print(results)
To give
[('1', {'stat3': 13, 'stat2': 5}), ('3', {'stat3': 9, 'stat2': 8})]

Two loops would allow you to:
group your data by a primary key
aggregate all secondary information
These two tasks are accomplished in the aggregate_statistics function shown below.
from collections import Counter
from pprint import pprint
def main():
    data = [{'player': 1, 'stat2': 4, 'stat3': 5},
            {'player': 1, 'stat2': 1, 'stat3': 8},
            {'player': 3, 'stat2': 1, 'stat3': 6},
            {'player': 3, 'stat2': 7, 'stat3': 3}]
    new_data = aggregate_statistics(data, 'player')
    pprint(new_data)
def aggregate_statistics(table, key):
    records_by_key = {}
    for record in table:
        data = record.copy()
        records_by_key.setdefault(data.pop(key), []).append(Counter(data))
    new_data = []
    for second_key, value in records_by_key.items():
        start, *remaining = value
        for record in remaining:
            start.update(record)
        new_data.append(dict(start, **{key: second_key}))
    return new_data
if __name__ == '__main__':
    main()

How do I loop over dictionary and check for values passed by a variable in Python

I've got the below dictionary and list - how do I loop over the dictionary checking if b == '1' while passing '1' as variable from a list?
dic = {'info': [{'a':0, 'b':'1'},{'a':0, 'b':'3'},{'a':0, 'b':'3'},{'a':0, 'b':'1'}]}
lst = ['1']
I want to return {'a':0, 'b':'1'}, {'a':0, 'b':'1'}.

This is a general solution using the built-in filter function; you will have to adapt it to your needs:
>>> list(filter(lambda d: d['b'] in lst, dic['info']))
[{'b': '1', 'a': 0}, {'b': '1', 'a': 0}]
In Python 3, filter returns a lazy iterator, so wrapping it in list() is needed to get a list; in Python 2, filter already returns a list:
>>> filter(lambda d: d['b'] in lst, dic['info'])
[{'b': '1', 'a': 0}, {'b': '1', 'a': 0}]
EDIT: To make the solution more general in case there are multiple items in lst, consider the following:
>>> dic
{'info': [{'b': '1', 'a': 0}, {'b': '3', 'a': 0}, {'b': '3', 'a': 0}, {'b': '1', 'a': 0}, {'b': '2', 'a': '1'}]}
>>>
>>> lst
['1', '2']
>>> def filter_dict(dic_lst, lst):
...     lst_out = []
...     for sub_d in dic_lst:
...         if any(x == sub_d['b'] for x in lst):
...             lst_out.append(sub_d)
...     return lst_out
...
>>> filter_dict(dic['info'], lst)
[{'b': '1', 'a': 0}, {'b': '1', 'a': 0}, {'b': '2', 'a': '1'}]
OR:
>>> list(map(lambda x: list(filter(lambda d: d['b'] in x, dic['info'])),lst))
[[{'b': '1', 'a': 0}, {'b': '1', 'a': 0}], [{'b': '2', 'a': '1'}]]

Just a simple list comprehension:
In [22]: dic = {'info': [{'a':0, 'b':'1'},{'a':0, 'b':'3'},{'a':0, 'b':'3'},{'a':0, 'b':'1'}]}
In [23]: lst = ['1']
In [25]: [sub_dict for sub_dict in dic['info'] if sub_dict['b'] == lst[0]]
Out[25]: [{'a': 0, 'b': '1'}, {'a': 0, 'b': '1'}]

You could use a filter approach:
filter(lambda x: x['b'] in lst, dic['info'])
In Python 3 this creates a lazy iterator, which you can materialize into a list:
result = list(filter(lambda x: x['b'] in lst, dic['info']))
Make sure the variable is not named list, since that would shadow the built-in list type and break the list(...) call above.

from collections import defaultdict
dic = {'info': [{'a':0, 'b':'1'},{'a':0, 'b':'3'},{'a':0, 'b':'3'},{'a':0, 'b':'1'}]}
d = defaultdict(list)
for each in dic['info']:
    d[each['b']].append(each)
out:
defaultdict(list,
{'1': [{'a': 0, 'b': '1'}, {'a': 0, 'b': '1'}],
'3': [{'a': 0, 'b': '3'}, {'a': 0, 'b': '3'}]})
in:
d['1']
out:
[{'a': 0, 'b': '1'}, {'a': 0, 'b': '1'}]
Build an index dict once to avoid iterating over the list again for every lookup.
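With the index in place, looking up every value from the question's lst becomes a dictionary access per value. A short sketch, assuming the same dic and lst as in the question:

```python
from collections import defaultdict

dic = {'info': [{'a': 0, 'b': '1'}, {'a': 0, 'b': '3'}, {'a': 0, 'b': '3'}, {'a': 0, 'b': '1'}]}
lst = ['1']

# build the index once: value of 'b' -> list of matching sub-dicts
index = defaultdict(list)
for entry in dic['info']:
    index[entry['b']].append(entry)

# index.get avoids creating empty entries for values that are not present
matches = [entry for wanted in lst for entry in index.get(wanted, [])]
print(matches)
# [{'a': 0, 'b': '1'}, {'a': 0, 'b': '1'}]
```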

First, my simple loop-and-iterate way.
Input:
>>> dic
{'info': [{'a': 0, 'b': '1'}, {'a': 0, 'b': '3'}, {'a': 0, 'b': '3'}, {'a': 0, 'b': '1'}]}
>>> l
['1']
A new list variable for the result:
>>> result = []
Algorithm:
Iterate over the dictionary with its items method (iteritems in Python 2).
Each value of the main dictionary is a list, so iterate over that list with a for loop.
Check that the b key is present in the sub-dictionary and that its value is in the given list l.
If so, append the sub-dictionary to the result list.
Code:
>>> for k, v in dic.items():
...     for i in v:
...         if "b" in i and i["b"] in l:
...             result.append(i)
...
Output:
>>> result
[{'a': 0, 'b': '1'}, {'a': 0, 'b': '1'}]
>>>
Notes:
Do not use list as a variable name, because it shadows Python's built-in list type.
Read up on the basic properties of dictionaries and lists.
Try to write the code yourself first.

You can make use of a list comprehension, or just do it using filter.
List comprehension:
dic = {'info': [{'a':0, 'b':'1'},{'a':0, 'b':'3'},{'a':0, 'b':'3'},{'a':0, 'b':'1'}]}
lst = ['1']
result = [i for i in dic['info'] if i['b'] == lst[0]]
print(result)  # [{'a': 0, 'b': '1'}, {'a': 0, 'b': '1'}]
filter:
dic = {'info': [{'a':0, 'b':'1'},{'a':0, 'b':'3'},{'a':0, 'b':'3'},{'a':0, 'b':'1'}]}
list(filter(lambda i: i['b'] in lst, dic['info']))
# [{'b': '1', 'a': 0}, {'b': '1', 'a': 0}]
