I want to generate a dictionary from a list of dictionaries, grouping list items by the value of some key, such as:
input_list = [
{'a':'tata', 'b': 'foo'},
{'a':'pipo', 'b': 'titi'},
{'a':'pipo', 'b': 'toto'},
{'a':'tata', 'b': 'bar'}
]
output_dict = {
'pipo': [
{'a': 'pipo', 'b': 'titi'},
{'a': 'pipo', 'b': 'toto'}
],
'tata': [
{'a': 'tata', 'b': 'foo'},
{'a': 'tata', 'b': 'bar'}
]
}
So far I've found two ways of doing this. The first simply iterates over the list, creates a sublist in the dict for each key value, and appends the elements matching that key to the sublist:
l = [
{'a':'tata', 'b': 'foo'},
{'a':'pipo', 'b': 'titi'},
{'a':'pipo', 'b': 'toto'},
{'a':'tata', 'b': 'bar'}
]
res = {}
for e in l:
    res[e['a']] = res.get(e['a'], [])
    res[e['a']].append(e)
And another using itertools.groupby:
import itertools
from operator import itemgetter
l = [
{'a':'tata', 'b': 'foo'},
{'a':'pipo', 'b': 'titi'},
{'a':'pipo', 'b': 'toto'},
{'a':'tata', 'b': 'bar'}
]
l = sorted(l, key=itemgetter('a'))
res = dict((k, list(g)) for k, g in itertools.groupby(l, key=itemgetter('a')))
I wonder which alternative is the most efficient.
Is there any more pythonic/concise or better performing way of achieving this?
Is it correct that you want to group your input list by the value of the 'a' key of the list elements? If so, your first approach is the best. One minor improvement is to use dict.setdefault:
res = {}
for item in l:
    res.setdefault(item['a'], []).append(item)
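A closely related variant (not part of the original answer) swaps setdefault for collections.defaultdict, which creates the empty list the first time a key is seen. A minimal sketch using the question's sample data:

```python
from collections import defaultdict

input_list = [
    {'a': 'tata', 'b': 'foo'},
    {'a': 'pipo', 'b': 'titi'},
    {'a': 'pipo', 'b': 'toto'},
    {'a': 'tata', 'b': 'bar'},
]

# defaultdict(list) supplies an empty list for unseen keys,
# so no get()/setdefault() call is needed inside the loop
res = defaultdict(list)
for item in input_list:
    res[item['a']].append(item)

# Convert back to a plain dict so later lookups don't silently create keys
output_dict = dict(res)
print(output_dict)
```

Both versions do a single pass over the input; the choice is mostly a matter of taste.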
If by efficient you mean "time efficient", you can measure it with the built-in timeit module.
For example:
import timeit
import itertools
from operator import itemgetter
input = [{'a': 'tata', 'b': 'foo'},
{'a': 'pipo', 'b': 'titi'},
{'a': 'pipo', 'b': 'toto'},
{'a': 'tata', 'b': 'bar'}]
def solution1():
    res = {}
    for e in input:
        res[e['a']] = res.get(e['a'], [])
        res[e['a']].append(e)
    return res
def solution2():
    l = sorted(input, key=itemgetter('a'))
    res = dict(
        (k, list(g)) for k, g in itertools.groupby(l, key=itemgetter('a'))
    )
    return res
t = timeit.Timer(solution1)
print(t.timeit(10000))
# 0.0122511386871
t = timeit.Timer(solution2)
print(t.timeit(10000))
# 0.0366218090057
Please refer to the official timeit documentation for further information.
A one-liner:
>>> import itertools
>>> input_list = [
... {'a':'tata', 'b': 'foo'},
... {'a':'pipo', 'b': 'titi'},
... {'a':'pipo', 'b': 'toto'},
... {'a':'tata', 'b': 'bar'}
... ]
>>> {k:[v for v in input_list if v['a'] == k] for k, val in itertools.groupby(input_list,lambda x: x['a'])}
{'tata': [{'a': 'tata', 'b': 'foo'}, {'a': 'tata', 'b': 'bar'}], 'pipo': [{'a': 'pipo', 'b': 'titi'}, {'a': 'pipo', 'b': 'toto'}]}
The best approach is the first one you mentioned, and you can make it even more elegant by using setdefault, as mentioned by bernhard above. This approach is O(n): we iterate over the input once, and for each item we look up the output dict we are building to find the list to append it to, which takes (amortized) constant time per item (lookup + append). So the overall complexity is O(n), which is optimal.
When using itertools.groupby, you must sort the input beforehand (which is O(n log n)).
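To illustrate why the sort is required: groupby only merges consecutive items with equal keys, so unsorted input produces the same key more than once. A small sketch with the question's data:

```python
import itertools
from operator import itemgetter

input_list = [
    {'a': 'tata', 'b': 'foo'},
    {'a': 'pipo', 'b': 'titi'},
    {'a': 'pipo', 'b': 'toto'},
    {'a': 'tata', 'b': 'bar'},
]
key = itemgetter('a')

# Without sorting, 'tata' appears as two separate groups
unsorted_keys = [k for k, _ in itertools.groupby(input_list, key=key)]
print(unsorted_keys)  # ['tata', 'pipo', 'tata']

# After sorting by the same key, each key forms exactly one group
sorted_groups = {k: list(g) for k, g in itertools.groupby(sorted(input_list, key=key), key=key)}
print(list(sorted_groups))  # ['pipo', 'tata']
```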
Does anyone know how to get the index of the values from dictionary 2 in the lists of dictionary 1? Like this:
Dictionary_1 = {'A': ['Tom', 'Jane', 'Joe'], 'B': ['Joana', 'Clare', 'Tom'], 'C': ['Clare', 'Jane', 'Joe']}
Dictionary_2 = {'A': 'Tom', 'B': 'Clare', 'C': 'Jane'}
RESULT = {'A': 1, 'B': 2, 'C': 2}
EDIT:
Sorry, guys. First of all, I got confused and forgot that I needed the indices to start at 0 instead of 1.
I was having a problem, but it was because the list inside dictionary 1 was a unicode string instead of a list.
Also, in the example I used here, the keys exist in both dictionaries, but in the code I'm writing that isn't the case. I didn't post the original code here because it was bigger, so I tried to summarize it as much as I could. Sorry for that too.
So I got it working with this code:
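import ast

RESULT = {}
for x, y in Dictionary_1.items():
    for a, b in Dictionary_2.items():
        if x == a:
            z = Dictionary_1[x]
            # The list was stored as a string; literal_eval parses it without running arbitrary code
            r = ast.literal_eval(z)
            if '{0}'.format(b) in r:
                RESULT[a] = r.index('{0}'.format(b))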
I know it looks messy, but I'm still learning.
I really appreciate your help guys!
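Since the EDIT mentions that the list arrived as a string, a safer alternative to eval() is ast.literal_eval(), which only accepts Python literals. A minimal sketch, assuming the string looks like the repr of a list of unicode strings (the raw value below is hypothetical):

```python
import ast

# Hypothetical input: the list stored as its string representation
raw = "[u'Tom', u'Jane', u'Joe']"

# literal_eval parses literals only, so unlike eval() it cannot execute code
names = ast.literal_eval(raw)
print(names)                # ['Tom', 'Jane', 'Joe']
print(names.index('Jane'))  # 1
```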
You can try using a dict comprehension:
dict1={'A':['Tom','Jane','Joe'],'B':['Joana','Clare','Tom'],'C':['Clare','Jane','Joe']}
dict2={'A':'Tom','B':'Clare','C':'Jane'}
result={k:dict1[k].index(v)+1 for k,v in dict2.items()}
# {'A': 1, 'B': 2, 'C': 2}
#Or
# {k:dict1.get(k).index(v)+1 for k,v in dict2.items()}
Assuming you want 0-based indices, you can use list.index() with a dict comprehension:
d1 = {'A': ['Tom', 'Jane', 'Joe'], 'B': ['Joana', 'Clare', 'Tom'], 'C': ['Clare', 'Jane', 'Joe']}
d2 = {'A': 'Tom', 'B': 'Clare', 'C': 'Jane'}
result = {k: d1[k].index(v) for k, v in d2.items()}
print(result)
# {'A': 0, 'B': 1, 'C': 1}
If you want to have indices starting at 1, then you can do d1[k].index(v) + 1.
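Because the EDIT notes that in the real code the keys are not always shared between the two dictionaries, a guarded version may help. A sketch, assuming 0-based indices and that unmatched entries should simply be skipped (the 'Bob' and 'D' entries are hypothetical):

```python
d1 = {'A': ['Tom', 'Jane', 'Joe'], 'B': ['Joana', 'Clare', 'Tom'], 'C': ['Clare', 'Jane', 'Joe']}
# Hypothetical: 'Bob' is not in d1['A'], and 'D' is not a key of d1
d2 = {'A': 'Bob', 'B': 'Clare', 'C': 'Jane', 'D': 'Joe'}

# Skip keys missing from d1 (KeyError) and values missing from the list (ValueError)
result = {k: d1[k].index(v) for k, v in d2.items() if k in d1 and v in d1[k]}
print(result)  # {'B': 1, 'C': 1}
```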
An easy-to-understand solution:
d1 = {'A': ['Tom', 'Jane', 'Joe'], 'B': ['Joana', 'Clare', 'Tom'], 'C': ['Clare', 'Jane', 'Joe']}
d2 = {'A': 'Tom', 'B': 'Clare', 'C': 'Jane'}
output = {}
for k, v in d2.items():
    output[k] = d1[k].index(v) + 1
print(output)
This is certainly not the best approach but this is what I did quickly:
dict1 = {0: ['Tom', 'Jane', 'Joe'], 1: ['Joana', 'Clare', 'Tom'], 2: ['Clare', 'Jane', 'Joe']}
dict2 ={0: 'Tom', 1: 'Clare', 2: 'Jane'}
result = {}
val_list = list(dict1.values())
for i in range(len(dict1)):
    result.update({i: val_list[i].index(dict2[i])})
print(result)
for a list of dictionaries
sample_dict = [
{'a': 'woot', 'b': 'nope', 'c': 'duh', 'd': 'rough', 'e': '1'},
{'a': 'coot', 'b': 'nope', 'c': 'ruh', 'd': 'rough', 'e': '2'},
{'a': 'doot', 'b': 'nope', 'c': 'suh', 'd': 'rough', 'e': '3'},
{'a': 'soot', 'b': 'nope', 'c': 'fuh', 'd': 'rough', 'e': '4'},
{'a': 'toot', 'b': 'nope', 'c': 'cuh', 'd': 'rough', 'e': '1'}
]
How do I make a separate dictionary that contains all the key-value pairs that match a certain key? With a list comprehension I created a list of all the matching key-value pairs like this:
container = [[key,val] for s in sample_dict for key,val in s.iteritems() if key == 'a']
Now container gives me:
[['a', 'woot'], ['a', 'coot'], ['a', 'doot'], ['a', 'soot'], ['a', 'toot']]
Which is all fine... but if I do the same with a dict comprehension, I get only a single key-value pair. Why does this happen?
container = {key : val for s in sample_dict for key,val in s.iteritems() if key == 'a'}
The container gives only a single element
{'a': 'toot'}
I want something like:
{'a': ['woot','coot','doot','soot','toot']}
How do I do this with minimal change to the code above ?
You are generating multiple key-value pairs with the same key, and a dictionary will only ever store unique keys.
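A tiny sketch of that overwriting behaviour, using a few of the question's values:

```python
pairs = [('a', 'woot'), ('a', 'coot'), ('a', 'toot')]

# Every iteration assigns to the same key 'a', so only the last value survives
as_dict = {key: val for key, val in pairs}
print(as_dict)  # {'a': 'toot'}
```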
If you wanted just one key, you'd use a dictionary with a list comprehension:
container = {'a': [s['a'] for s in sample_dict if 'a' in s]}
Note that there is no need to iterate over the nested dictionaries in sample_dict if all you wanted was a specific key; in the above I simply test if the key exists ('a' in s) and extract the value for that key with s['a']. This is much faster than looping over all the keys.
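If you ever need the values for every key rather than just 'a' (an extension beyond what the question asks), a defaultdict can collect them in one pass. A minimal sketch:

```python
from collections import defaultdict

sample_dict = [
    {'a': 'woot', 'b': 'nope', 'c': 'duh', 'd': 'rough', 'e': '1'},
    {'a': 'coot', 'b': 'nope', 'c': 'ruh', 'd': 'rough', 'e': '2'},
    {'a': 'doot', 'b': 'nope', 'c': 'suh', 'd': 'rough', 'e': '3'},
    {'a': 'soot', 'b': 'nope', 'c': 'fuh', 'd': 'rough', 'e': '4'},
    {'a': 'toot', 'b': 'nope', 'c': 'cuh', 'd': 'rough', 'e': '1'},
]

# Append each value to the list for its key, preserving input order
container = defaultdict(list)
for s in sample_dict:
    for key, val in s.items():
        container[key].append(val)

print(container['a'])  # ['woot', 'coot', 'doot', 'soot', 'toot']
print(container['e'])  # ['1', '2', '3', '4', '1']
```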
Another option:
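def filter_by_key(arr, x):
    # Named so it doesn't shadow the built-in filter(); e.get(x) yields None for dicts without x
    return {x: [e.get(x) for e in arr]}
So, from here, you can construct the dict based on the original array and the key:
filter_by_key(sample_dict, 'a')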
# {'a': ['woot', 'coot', 'doot', 'soot', 'toot']}
I want to write code that takes the following inputs:
list (list of maps)
request_keys (list of strings)
operation (add, subtract, multiply, concat)
The code would look through the list for maps that have the same values for all keys except the keys given in request_keys. Upon finding two maps whose values match on those keys, the code would apply the operation (add, subtract, multiply, concat) to the two maps and combine them into one map. This combined map would replace the other two maps.
I have written the following piece of code to do this. It only does the add operation, but it can be extended to the other operations.
In [83]: list
Out[83]:
[{'a': 2, 'b': 3, 'c': 10},
{'a': 2, 'b': 3, 'c': 3},
{'a': 2, 'b': 4, 'c': 4},
{'a': 2, 'b': 3, 'c': 2},
{'a': 2, 'b': 3, 'c': 3}]
In [84]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:def func(list,request_keys):
:    new_list = []
:    found_indexes = []
:    for i in range(0,len(list)):
:        new_item = list[i]
:        if i in found_indexes:
:            continue
:        for j in range(0,len(list)):
:            if i != j and {k: v for k,v in list[i].iteritems() if k not in request_keys} == {k: v for k,v in list[j].iteritems() if k not in request_keys}:
:                found_indexes.append(j)
:                for request_key in request_keys:
:                    new_item[request_key] += list[j][request_key]
:        new_list.append(new_item)
:    return new_list
:--
In [85]: func(list,['c'])
Out[85]: [{'a': 2, 'b': 3, 'c': 18}, {'a': 2, 'b': 4, 'c': 4}]
In [86]:
What I want to know is: is there a faster, more memory-efficient, cleaner and more Pythonic way of doing the same?
Thank you
You manually generate all the combinations and then compare each of those combinations. This is pretty wasteful. Instead, I suggest grouping the dictionaries in another dictionary by their matching keys, then adding the "same" dictionaries. Also, you forgot the operator parameter.
import collections, operator, functools
def func(lst, request_keys, op=operator.add):
    matching_dicts = collections.defaultdict(list)
    for d in lst:
        key = tuple(sorted(((k, d[k]) for k in d if k not in request_keys)))
        matching_dicts[key].append(d)
    for group in matching_dicts.values():
        merged = dict(group[0])
        merged.update({key: functools.reduce(op, (g[key] for g in group))
                       for key in request_keys})
        yield merged
What this does: first, it creates a dictionary mapping the key-value pairs that have to be equal for two dictionaries to match to all the dictionaries that have those key-value pairs. Then it iterates over those groups, using the first dict of each group as a prototype and updating it with the sum (or product, or whatever, depending on the operator) of all the dicts in that group for the request_keys.
Note that this returns a generator. If you want a list, just call it like list(func(...)), or accumulate the merged dicts in a list and return that list.
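The question's operation names (add, subtract, multiply, concat) can be mapped onto the operator module so the same grouping idea covers all four. A minimal sketch; the OPS table and the merge_maps name are hypothetical, and unlike the generator above it returns a list directly. Like the answer, it copies each group's first dict, so the input maps are left untouched:

```python
import collections
import functools
import operator

# Hypothetical table mapping the question's operation names to operator functions
OPS = {
    'add': operator.add,
    'subtract': operator.sub,
    'multiply': operator.mul,
    'concat': operator.concat,
}


def merge_maps(maps, request_keys, operation='add'):
    op = OPS[operation]
    groups = collections.defaultdict(list)
    for d in maps:
        # Signature = every key/value pair except the request keys
        sig = tuple(sorted((k, v) for k, v in d.items() if k not in request_keys))
        groups[sig].append(d)
    merged_list = []
    for group in groups.values():
        merged = dict(group[0])  # copy, so the input maps are not mutated
        for key in request_keys:
            merged[key] = functools.reduce(op, (g[key] for g in group))
        merged_list.append(merged)
    return merged_list


data = [{'a': 2, 'b': 3, 'c': 10},
        {'a': 2, 'b': 3, 'c': 3},
        {'a': 2, 'b': 4, 'c': 4},
        {'a': 2, 'b': 3, 'c': 2},
        {'a': 2, 'b': 3, 'c': 3}]

print(merge_maps(data, ['c']))              # [{'a': 2, 'b': 3, 'c': 18}, {'a': 2, 'b': 4, 'c': 4}]
print(merge_maps(data, ['c'], 'multiply'))  # [{'a': 2, 'b': 3, 'c': 180}, {'a': 2, 'b': 4, 'c': 4}]
```

Grouping by a hashable signature keeps the work to a single pass over the list, instead of comparing every pair of maps.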
from itertools import groupby
from operator import itemgetter
def mergeDic(inputData, request_keys):
    # Assumes every dict has the same keys as the first one, and at least two
    # compared keys (itemgetter with a single key returns a scalar, not a tuple)
    keys = inputData[0].keys()
    comparedKeys = [item for item in keys if item not in request_keys]
    grouper = itemgetter(*comparedKeys)
    result = []
    for key, grp in groupby(sorted(inputData, key=grouper), grouper):
        grp = list(grp)  # the group iterator can only be consumed once
        temp_dict = dict(zip(comparedKeys, key))
        for request_key in request_keys:
            temp_dict[request_key] = sum(item[request_key] for item in grp)
        result.append(temp_dict)
    return result
inputData = [{'a': 2, 'b': 3, 'c': 10},
{'a': 2, 'b': 3, 'c': 3},
{'a': 2, 'b': 4, 'c': 4},
{'a': 2, 'b': 3, 'c': 2},
{'a': 2, 'b': 3, 'c': 3}]
from pprint import pprint
pprint(mergeDic(inputData,['c']))
I picked up Python not too long ago.
An example is below.
I have dictionaries within a list:
myword = [{'a': 2},{'b':3},{'c':4},{'a':1}]
I need to change it to the output below:
[{'a':3} , {'b':3} , {'c':4}]
Is there a way I can add the values together? I tried using Counter, but it prints out each dict separately.
What I did using Counter:
from collections import Counter
for i in range(len(myword)):
    text = myword[i]
    print(Counter(text))
The output:
Counter({'a': 2})
Counter({'b': 3})
Counter({'c': 4})
Counter({'a': 1})
I have read the question linked below, but it only compared two dicts:
Is there a better way to compare dictionary values
Thanks!
Merge the dictionaries into one Counter, then split it back into single-item dicts:
>>> from collections import Counter
>>> myword = [{'a': 2}, {'b':3}, {'c':4}, {'a':1}]
>>> c = Counter()
>>> for d in myword:
...     c.update(d)
...
>>> [{key: value} for key, value in c.items()]
[{'a': 3}, {'c': 4}, {'b': 3}]
>>> [{key: value} for key, value in sorted(c.items())]
[{'a': 3}, {'b': 3}, {'c': 4}]
I have a dictionary where the value elements are lists:
d1={'A': [], 'C': ['SUV'], 'B': []}
I need to concatenate the values into a single list, but only the lists that are non-empty.
Expected output:
o=['SUV']
Help is appreciated.
from itertools import chain
d1={'A': [], 'C': ['SUV'], 'B': []}
print list(chain.from_iterable(d1.itervalues()))
>>> d1 = {'A': [], 'C': ['SUV'], 'B': []}
>>> [ele for lst in d1.itervalues() for ele in lst]
['SUV']
You can use itertools.chain, but the order can be arbitrary because dicts are unordered collections (before Python 3.7; since 3.7 they preserve insertion order). So you may have to sort the dict by keys or values to get the desired result.
>>> d1={'A': [], 'C': ['SUV'], 'B': []}
>>> from itertools import chain
>>> list(chain(*d1.values())) # or chain.from_iterable(d1.values()), which avoids unpacking every list into an argument
['SUV']
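Following that advice, here is a minimal sketch that visits the lists in sorted key order, so the result no longer depends on dict ordering (flatten_sorted is a hypothetical helper name):

```python
from itertools import chain

d1 = {'A': [], 'C': ['SUV'], 'B': []}
d2 = {'A': ["I", "Don't", "Drive"], 'C': ['SUV'], 'B': ["Boy"]}


def flatten_sorted(d):
    # Walk the lists in key order; empty lists contribute nothing
    return list(chain.from_iterable(d[key] for key in sorted(d)))


print(flatten_sorted(d1))  # ['SUV']
print(flatten_sorted(d2))  # ['I', "Don't", 'Drive', 'Boy', 'SUV']
```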
>>> from operator import add
>>> d1={'A': [], 'C': ['SUV'], 'B': []}
>>> reduce(add,d1.itervalues())
['SUV']
Or a more comprehensive example:
>>> d2={'A': ["I","Don't","Drive"], 'C': ['SUV'], 'B': ["Boy"]}
>>> reduce(add,d2.itervalues())
['I', "Don't", 'Drive', 'SUV', 'Boy']
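One caveat with reduce: without an initializer it raises TypeError when the dict is empty, and each add builds a new list, so the cost can grow roughly quadratically with the number of lists. A sketch in Python 3 syntax that passes [] as the initializer:

```python
from functools import reduce
from operator import add

empty = {}
d2 = {'A': ["I", "Don't", "Drive"], 'C': ['SUV'], 'B': ["Boy"]}

# The third argument is the start value, returned unchanged for an empty iterable
empty_result = reduce(add, empty.values(), [])
full_result = reduce(add, d2.values(), [])
print(empty_result)  # []
print(full_result)   # ['I', "Don't", 'Drive', 'SUV', 'Boy']
```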