Create Dictionary Of Lists from List of Dictionaries in Python - python

I have a list of dictionaries as follows.
[{'a' : 1, 'b' : 2, 'c' : 2},
{'a' : 2, 'b' : 3, 'c' : 3},
{'a' : 3, 'b' : 5, 'c' : 6},
{'a' : 4, 'b' : 7, 'c' : 8},
{'a' : 1, 'b' : 8, 'c' : 9},
{'a' : 2, 'b' : 0, 'c' : 0},
{'a' : 5, 'b' : 1, 'c' : 3},
{'a' : 7, 'b' : 4, 'c' : 5}]
I want to create a dictionary of lists from the above list, which should look as follows.
{1 : [{'a' : 1, 'b' : 2, 'c' : 2}, {'a' : 1, 'b' : 8, 'c' : 9}],
2 : [{'a' : 2, 'b' : 3, 'c' : 3}, {'a' : 2, 'b' : 0, 'c' : 0}],
3 : [{'a' : 3, 'b' : 5, 'c' : 6}],
4 : [{'a' : 4, 'b' : 7, 'c' : 8}],
5 : [{'a' : 5, 'b' : 1, 'c' : 3}],
7 : [{'a' : 7, 'b' : 4, 'c' : 5}]}
Basically, I want to pick one key of the dictionaries, say 'a', and build a new dictionary whose keys are the distinct values of 'a' (1, 2, 3, 4, 5, 7) and whose values are lists of all the dictionaries that have that value for 'a'.
I know the simplest approach is to iterate over the list and build the required dictionary. I am just curious whether there is another way of doing it.

A collections.defaultdict is probably the most efficient option, since it builds all the groups in a single pass over the list:
from collections import defaultdict
l = [{'a': 1, 'b': 2, 'c': 2},
{'a': 2, 'b': 3, 'c': 3},
{'a': 3, 'b': 5, 'c': 6},
{'a': 4, 'b': 7, 'c': 8},
{'a': 1, 'b': 8, 'c': 9},
{'a': 2, 'b': 0, 'c': 0},
{'a': 5, 'b': 1, 'c': 3},
{'a': 7, 'b': 4, 'c': 5}]
dct = defaultdict(list)
for d in l:
    dct[d["a"]].append(d)
from pprint import pprint as pp
pp(dict(dct))
Output:
{1: [{'a': 1, 'b': 2, 'c': 2}, {'a': 1, 'b': 8, 'c': 9}],
2: [{'a': 2, 'b': 3, 'c': 3}, {'a': 2, 'b': 0, 'c': 0}],
3: [{'a': 3, 'b': 5, 'c': 6}],
4: [{'a': 4, 'b': 7, 'c': 8}],
5: [{'a': 5, 'b': 1, 'c': 3}],
7: [{'a': 7, 'b': 4, 'c': 5}]}

A normal dictionary with the setdefault method can also be used.
Code:
data=[{'a' : 1, 'b' : 2, 'c' : 2},
{'a' : 2, 'b' : 3, 'c' : 3},
{'a' : 3, 'b' : 5, 'c' : 6},
{'a' : 4, 'b' : 7, 'c' : 8},
{'a' : 1, 'b' : 8, 'c' : 9},
{'a' : 2, 'b' : 0, 'c' : 0},
{'a' : 5, 'b' : 1, 'c' : 3},
{'a' : 7, 'b' : 4, 'c' : 5}]
dictionary_list={}
for row in data:
    dictionary_list.setdefault(row["a"],[]).append(row)
print(dictionary_list)
Output:
{1: [{'a': 1, 'c': 2, 'b': 2}, {'a': 1, 'c': 9, 'b': 8}],
2: [{'a': 2, 'c': 3, 'b': 3}, {'a': 2, 'c': 0, 'b': 0}],
3: [{'a': 3, 'c': 6, 'b': 5}],
4: [{'a': 4, 'c': 8, 'b': 7}],
5: [{'a': 5, 'c': 3, 'b': 1}],
7: [{'a': 7, 'c': 5, 'b': 4}]}

You can do it in the following way:
mylist = [
{'a' : 1, 'b' : 2, 'c' : 2},
{'a' : 2, 'b' : 3, 'c' : 3},
{'a' : 3, 'b' : 5, 'c' : 6},
{'a' : 4, 'b' : 7, 'c' : 8},
{'a' : 1, 'b' : 8, 'c' : 9},
{'a' : 2, 'b' : 0, 'c' : 0},
{'a' : 5, 'b' : 1, 'c' : 3},
{'a' : 7, 'b' : 4, 'c' : 5}
]
def get_dict(mylist, required_key):
    result_dict = {}
    for mydict in mylist:
        # create an empty list the first time a value is seen
        result_dict.setdefault(mydict[required_key], [])
        result_dict[mydict[required_key]].append(mydict)
    return result_dict
result_dict = get_dict(mylist, required_key = 'a')
print(result_dict)
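One thing none of the versions above handle is a dictionary that lacks the grouping key: mydict[required_key] would raise KeyError. Here is a minimal sketch of a variant that skips such rows by default. The name group_by_key and the skip_missing flag are my own inventions for illustration, not something from the answers above.

```python
from collections import defaultdict

def group_by_key(rows, required_key, skip_missing=True):
    """Group dicts by the value stored under required_key (hypothetical helper)."""
    grouped = defaultdict(list)
    for row in rows:
        if required_key not in row:
            if skip_missing:
                continue  # drop rows that do not have the grouping key
            raise KeyError(required_key)
        grouped[row[required_key]].append(row)
    # return a plain dict so that looking up an unknown key later raises KeyError
    return dict(grouped)

rows = [{'a': 1, 'b': 2}, {'b': 3}, {'a': 1, 'b': 8}, {'a': 2, 'b': 0}]
result = group_by_key(rows, 'a')
print(result)
```

Whether skipping or raising is the right default depends on your data; if a missing key means the input is broken, pass skip_missing=False.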

Related

How to deduplicate dictionary in Python? Is there any method to override hashCode of some class like Java? [duplicate]

This question already has answers here:
List of unique dictionaries
(23 answers)
Closed 1 year ago.
elements = [
{'a' : 1, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 3, 'c': 3},
{'a' : 1, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 2},
{'a' : 1, 'b' : 2, 'c': 3, 'd' : 4},
{'v' : [1,2,3]}
]
Given the above list of dicts in Python, how can I efficiently deduplicate it to the following collection (order doesn't matter)?
result = [
{'a' : 1, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 3, 'c': 3},
{'a' : 2, 'b' : 2},
{'a' : 1, 'b' : 2, 'c': 3, 'd' : 4},
{'v' : [1,2,3]}
]
The naive method is to use a set, but dicts in Python are unhashable. Right now, my solution is to serialize each dict to a string, for example in JSON format (since dicts have no order, two different strings can correspond to the same dict, so I have to impose some key order). However, this method has too high a time complexity.
My Questions:
How to efficiently deduplicate dictionaries in Python?
More generally, is there any way to override a class's hash function, like hashCode in Java, so that its instances can be used in a set or as dict keys?
For your toy example with few data you can use the repr of the inner dictionaries as key for a new dictionary, then collect all the values:
elements = [{'a' : 1, 'b' : 2, 'c': 3}, {'a' : 2, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 3, 'c': 3}, {'a' : 1, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 2, 'c': 3}, {'a' : 2, 'b' : 2},
{'a' : 1, 'b' : 2, 'c': 3, 'd' : 4}, {'v' : [1,2,3]}]
kv = {repr(inner):inner for inner in elements}
elements = list(kv.values())
print(elements)
Output:
[{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 2, 'c': 3}, {'a': 2, 'b': 3, 'c': 3},
{'a': 2, 'b': 2}, {'a': 1, 'b': 2, 'c': 3, 'd': 4}, {'v': [1, 2, 3]}]
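If you check the id() of your inner dictionaries you'll see that, among duplicates, the last one survives. Note that repr reflects the order in which keys are stored, so two equal dicts whose keys were inserted in a different order would not be treated as duplicates by this approach.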
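On the second question: Python's counterpart to Java's hashCode/equals is the pair __hash__/__eq__. Below is a rough sketch, not a definitive implementation, of an order-independent alternative to repr. The names freeze, dedupe and HashableDict are made up for this example, and it assumes every leaf value is already hashable (ints, strings and so on).

```python
def freeze(obj):
    """Recursively convert dicts/lists/sets into hashable equivalents."""
    if isinstance(obj, dict):
        return frozenset((k, freeze(v)) for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(v) for v in obj)
    if isinstance(obj, set):
        return frozenset(freeze(v) for v in obj)
    return obj  # assumption: anything else is already hashable

def dedupe(dicts):
    """Keep the first occurrence of each distinct dict, preserving order."""
    seen = set()
    unique = []
    for d in dicts:
        key = freeze(d)
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique

class HashableDict(dict):
    """dict subclass usable in a set; do not mutate it after hashing."""
    def __hash__(self):
        return hash(freeze(self))

elements = [
    {'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 2, 'c': 3},
    {'a': 2, 'b': 3, 'c': 3}, {'a': 1, 'b': 2, 'c': 3},
    {'a': 2, 'b': 2, 'c': 3}, {'a': 2, 'b': 2},
    {'a': 1, 'b': 2, 'c': 3, 'd': 4}, {'v': [1, 2, 3]},
]
unique = dedupe(elements)
as_set = {HashableDict(d) for d in elements}
print(unique)
```

Each dict is visited once, so the cost is roughly linear in the total number of items. The HashableDict variant inherits equality from dict and only adds the hash, which is why mutating an instance after it has been put into a set would break lookups.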

Python create dictionary from dataframe in loop

first post!
I am trying to write a function that creates dictionaries in a loop from a dataframe.
Assume these two simple dataframes already exist:
import pandas as pd
data1 = {'A':[1, 2, 3, 4], 'B':[5, 6, 7, 8]}
df1 = pd.DataFrame(data1)
and
data2 = {'C':[9, 10], 'D':[11, 12], 'E':[13, 14] }
df2 = pd.DataFrame(data2)
I want to be able to create a function like this:
def create_dict(df):
where the end result for df1 is:
dict1 = { 'A' : 1, 'B' : 5}
dict2 = { 'A' : 2, 'B' : 6}
dict3 = { 'A' : 3, 'B' : 7}
dict4 = { 'A' : 4, 'B' : 8}
and the end result for df2 is:
dict1 = { 'C' : 9, 'D' : 11, 'E' : 13}
dict2 = { 'C' : 10, 'D' : 12, 'E' : 14}
I was looking at dictionary comprehensions to handle this, but I'm not sure how to approach the problem. Thanks!
Use pandas.DataFrame.to_dict with orient="records":
df1.to_dict(orient="records")
Output:
[{'A': 1, 'B': 5}, {'A': 2, 'B': 6}, {'A': 3, 'B': 7}, {'A': 4, 'B': 8}]
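For intuition about what the records orientation gives you, here is a pure-Python sketch that works on the raw column dicts (data1, data2) instead of the DataFrames. columns_to_rows is a name I made up for illustration; it is not a pandas function and not how pandas implements to_dict. The last line shows one way to get the dict1, dict2, ... names from the question as keys of a single dictionary, which is usually easier to work with than separate variables.

```python
def columns_to_rows(columns):
    """Turn {'col': [v0, v1, ...]} into [{'col': v0, ...}, {'col': v1, ...}]."""
    names = list(columns)
    # zip(*values) walks the columns in parallel, yielding one row at a time
    return [dict(zip(names, row)) for row in zip(*columns.values())]

data1 = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
data2 = {'C': [9, 10], 'D': [11, 12], 'E': [13, 14]}

records1 = columns_to_rows(data1)
records2 = columns_to_rows(data2)

# label each record dict1, dict2, ... as in the question
named = {"dict%d" % i: rec for i, rec in enumerate(records1, start=1)}
print(named)
```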

Convert dict to list of dict for each combinations

I have a dict that looks like this:
my_dict = {
"a":[1, 2, 3],
"b":[10],
"c":[4, 5],
"d":[11]
}
And I would like to obtain a list containing all combinations, keeping the keys and values, like this:
result = [
{"a":1, "b":10, "c":4, "d":11},
{"a":1, "b":10, "c":5, "d":11},
{"a":2, "b":10, "c":4, "d":11},
{"a":2, "b":10, "c":5, "d":11},
{"a":3, "b":10, "c":4, "d":11},
{"a":3, "b":10, "c":5, "d":11}
]
Does someone have a solution for this?
Is there an existing way to do this, or how should I proceed to do it myself?
Thank you.
A task for itertools.product:
>>> from itertools import product
>>> for dict_items in product(*[product([k],v) for k, v in my_dict.items()]):
...     print(dict(dict_items))
{'a': 1, 'b': 10, 'c': 4, 'd': 11}
{'a': 1, 'b': 10, 'c': 5, 'd': 11}
{'a': 2, 'b': 10, 'c': 4, 'd': 11}
{'a': 2, 'b': 10, 'c': 5, 'd': 11}
{'a': 3, 'b': 10, 'c': 4, 'd': 11}
{'a': 3, 'b': 10, 'c': 5, 'd': 11}
Small explanation:
The inner product(...) will expand the dict to a list such as [[(k1, v11), (k1, v12), ...], [(k2, v21), (k2, v22), ...], ...].
The outer product(...) will reassemble the items lists by choosing one tuple from each list.
dict(...) will create a dictionary from a sequence of (k1, v#), (k2, v#), ... tuples.
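The three steps explained above can be made visible by storing the intermediate results. This is just the same one-liner unrolled for illustration; the variable names pairs_per_key and combos are mine, and the element order shown in the comments assumes Python 3.7+, where dicts keep insertion order.

```python
from itertools import product

my_dict = {"a": [1, 2, 3], "b": [10], "c": [4, 5], "d": [11]}

# Step 1: the inner product pairs each key with each of its values.
pairs_per_key = [list(product([k], v)) for k, v in my_dict.items()]
# pairs_per_key[0] is [('a', 1), ('a', 2), ('a', 3)]

# Step 2: the outer product picks one (key, value) pair from every list.
combos = list(product(*pairs_per_key))
# combos[0] is (('a', 1), ('b', 10), ('c', 4), ('d', 11))

# Step 3: dict() turns each tuple of pairs into a dictionary.
result = [dict(combo) for combo in combos]
print(result)
```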
Try:
import itertools
def permute(d):
    k = d.keys()
    # one combination per element of the Cartesian product of the value lists
    perms = itertools.product(*d.values())
    return [dict(zip(k, v)) for v in perms]
Example usage:
>>> from pprint import pprint
>>> d = {'a': [1, 2, 3], 'b': [10], 'c': [4, 5], 'd': [11]}
>>> pprint(permute(d))
[{'a': 1, 'b': 10, 'c': 4, 'd': 11},
{'a': 1, 'b': 10, 'c': 5, 'd': 11},
{'a': 2, 'b': 10, 'c': 4, 'd': 11},
{'a': 2, 'b': 10, 'c': 5, 'd': 11},
{'a': 3, 'b': 10, 'c': 4, 'd': 11},
{'a': 3, 'b': 10, 'c': 5, 'd': 11}]
Assuming that you are only interested in my_dict having 4 keys, it is simple enough to use nested for loops:
my_dict = {
"a": [1, 2, 3],
"b": [10],
"c": [4, 5],
"d": [11]
}
result = []
for a_val in my_dict['a']:
    for b_val in my_dict['b']:
        for c_val in my_dict['c']:
            for d_val in my_dict['d']:
                result.append({'a': a_val, 'b': b_val, 'c': c_val, 'd': d_val})
print(result)
This gives the expected result.
You can use:
from itertools import product
allNames = sorted(my_dict)
values = list(product(*(my_dict[Name] for Name in allNames)))
# zip with allNames rather than a hard-coded key list, so it works for any keys
d = [dict(zip(allNames, i)) for i in values]
Output:
[{'a': 1, 'c': 4, 'b': 10, 'd': 11},
{'a': 1, 'c': 5, 'b': 10, 'd': 11},
{'a': 2, 'c': 4, 'b': 10, 'd': 11},
{'a': 2, 'c': 5, 'b': 10, 'd': 11},
{'a': 3, 'c': 4, 'b': 10, 'd': 11},
{'a': 3, 'c': 5, 'b': 10, 'd': 11}]
itertools.product produces the Cartesian product of the iterables passed to it.
dict.values() supplies those iterables, one list per key.
For each combination, zip up the dict's keys with the combination.
Use a list comprehension to collect the results:
from itertools import product
from pprint import pprint
my_dict = {
"a":[1, 2, 3],
"b":[10],
"c":[4, 5],
"d":[11]
}
result = [dict(zip(my_dict,i)) for i in product(*my_dict.values())]
pprint(result)
Output:
[{'a': 1, 'b': 10, 'c': 4, 'd': 11},
{'a': 1, 'b': 10, 'c': 5, 'd': 11},
{'a': 2, 'b': 10, 'c': 4, 'd': 11},
{'a': 2, 'b': 10, 'c': 5, 'd': 11},
{'a': 3, 'b': 10, 'c': 4, 'd': 11},
{'a': 3, 'b': 10, 'c': 5, 'd': 11}]

Pythonic way to group items in a list [duplicate]

This question already has an answer here:
Group list of dictionaries to list of list of dictionaries with same property value
(1 answer)
Closed 8 years ago.
Consider a list of dicts:
items = [
{'a': 1, 'b': 9, 'c': 8},
{'a': 1, 'b': 5, 'c': 4},
{'a': 2, 'b': 3, 'c': 1},
{'a': 2, 'b': 7, 'c': 9},
{'a': 3, 'b': 8, 'c': 2}
]
Is there a pythonic way to extract and group these items by their 'a' field, such that:
result = {
1 : [{'b': 9, 'c': 8}, {'b': 5, 'c': 4}],
2 : [{'b': 3, 'c': 1}, {'b': 7, 'c': 9}],
3 : [{'b': 8, 'c': 2}]
}
References to any similar Pythonic constructs are appreciated.
Use itertools.groupby:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> {k: list(g) for k, g in groupby(items, itemgetter('a'))}
{1: [{'a': 1, 'c': 8, 'b': 9},
{'a': 1, 'c': 4, 'b': 5}],
2: [{'a': 2, 'c': 1, 'b': 3},
{'a': 2, 'c': 9, 'b': 7}],
3: [{'a': 3, 'c': 2, 'b': 8}]}
If the items are not in sorted order, you can either sort them first and then use groupby, or use collections.OrderedDict (if order matters) or collections.defaultdict to do it in O(N) time:
>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> for item in items:
...     d.setdefault(item['a'], []).append(item)
...
>>> dict(d.items())
{1: [{'a': 1, 'c': 8, 'b': 9},
{'a': 1, 'c': 4, 'b': 5}],
2: [{'a': 2, 'c': 1, 'b': 3},
{'a': 2, 'c': 9, 'b': 7}],
3: [{'a': 3, 'c': 2, 'b': 8}]}
Update:
I see that you only want the keys that were not used for grouping to be returned. For that you'll need to do something like this:
>>> group_keys = {'a'}
>>> {k:[{k:d[k] for k in d.viewkeys() - group_keys} for d in g]
...  for k, g in groupby(items, itemgetter(*group_keys))}
{1: [{'c': 8, 'b': 9},
{'c': 4, 'b': 5}],
2: [{'c': 1, 'b': 3},
{'c': 9, 'b': 7}],
3: [{'c': 2, 'b': 8}]}
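Note that dict.viewkeys() exists only in Python 2. As a sketch of the same idea for Python 3, assuming a single grouping key (the group_key variable is my own naming) and that items is already sorted by it:

```python
from itertools import groupby
from operator import itemgetter

items = [
    {'a': 1, 'b': 9, 'c': 8},
    {'a': 1, 'b': 5, 'c': 4},
    {'a': 2, 'b': 3, 'c': 1},
    {'a': 2, 'b': 7, 'c': 9},
    {'a': 3, 'b': 8, 'c': 2},
]

group_key = 'a'
result = {
    # copy every entry except the one we grouped on
    key: [{k: v for k, v in d.items() if k != group_key} for d in group]
    for key, group in groupby(items, itemgetter(group_key))
}
print(result)
```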
Note: This code assumes that the data is already sorted by the grouping key. If it is not, we have to sort it ourselves first.
from itertools import groupby
print({key:list(grp) for key, grp in groupby(items, key=lambda x:x["a"])})
Output
{1: [{'a': 1, 'b': 9, 'c': 8}, {'a': 1, 'b': 5, 'c': 4}],
2: [{'a': 2, 'b': 3, 'c': 1}, {'a': 2, 'b': 7, 'c': 9}],
3: [{'a': 3, 'b': 8, 'c': 2}]}
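To see why the sortedness assumption matters: groupby only merges consecutive items with equal keys, so with unsorted input the same key can show up in several runs, and in a dict comprehension the later run silently overwrites the earlier one. A small demonstration, using made-up data (unsorted_items) rather than the list from the question:

```python
from itertools import groupby
from operator import itemgetter

unsorted_items = [{'a': 1, 'b': 9}, {'a': 2, 'b': 3}, {'a': 1, 'b': 5}]
by_a = itemgetter('a')

# Without sorting, key 1 forms two separate runs; the second replaces the first.
naive = {k: list(g) for k, g in groupby(unsorted_items, by_a)}

# Sorting first puts equal keys next to each other, so nothing is lost.
fixed = {k: list(g) for k, g in groupby(sorted(unsorted_items, key=by_a), by_a)}

print(naive)
print(fixed)
```

Here naive ends up with only {'a': 1, 'b': 5} under key 1, while fixed keeps both dictionaries. Since sorted is stable, the items within each group keep their original relative order.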
To get the result in the same format you asked for,
from itertools import groupby
from operator import itemgetter
a_getter, getter, keys = itemgetter("a"), itemgetter("b", "c"), ("b", "c")
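def recon_dicts(item):
    # rebuild a dict that contains only the "b" and "c" entries
    return dict(zip(keys, getter(item)))
# list(...) is needed in Python 3, where map returns a lazy iterator
{key: list(map(recon_dicts, grp)) for key, grp in groupby(items, key=a_getter)}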
Output
{1: [{'c': 8, 'b': 9}, {'c': 4, 'b': 5}],
2: [{'c': 1, 'b': 3}, {'c': 9, 'b': 7}],
3: [{'c': 2, 'b': 8}]}
If the data is not sorted already, you can either use the defaultdict method in this answer, or you can use the sorted function to sort based on a, like this
{key: list(map(recon_dicts, grp))
for key, grp in groupby(sorted(items, key=a_getter), key=a_getter)}
References:
operator.itemgetter
itertools.groupby
zip, map, dict, sorted

Dynamic Dictionary of dictionaries Python

I wanted to create a dictionary of dictionaries in Python:
Suppose I already have a list which contains the keys:
keys = ['a', 'b', 'c', 'd', 'e']
value = [1, 2, 3, 4, 5]
Suppose I have a data field with numeric values (20 of them).
I want to define a dictionary which stores 4 different dictionaries, each mapping the given keys to the corresponding values:
dictionary = {}
for i in range(4):
    dictionary[i] = {}
    for j in range(len(keys)):
        dictionary[i][keys[j]] = value[j]
So basically, it should be like:
dictionary[0] = {'a' : 1, 'b' : 2, 'c' : 3, 'd': 4, 'e':5}
dictionary[1] = {'a' : 1, 'b' : 2, 'c' : 3, 'd': 4, 'e':5}
dictionary[2] = {'a' : 1, 'b' : 2, 'c' : 3, 'd': 4, 'e':5}
dictionary[3] = {'a' : 1, 'b' : 2, 'c' : 3, 'd': 4, 'e':5}
What is the best way to achieve this?
Use a list comprehension and dict(zip(keys,value)) will return the dict for you.
>>> keys = ['a', 'b', 'c', 'd', 'e']
>>> value = [1, 2, 3, 4, 5]
>>> dictionary = [dict(zip(keys,value)) for _ in range(4)]
>>> from pprint import pprint
>>> pprint(dictionary)
[{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}]
If you want a dict of dicts then use a dict comprehension:
>>> keys = ['a', 'b', 'c', 'd', 'e']
>>> value = [1, 2, 3, 4, 5]
>>> dictionary = {i: dict(zip(keys,value)) for i in range(4)}
>>> pprint(dictionary)
{0: {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
1: {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
2: {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
3: {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}}
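An alternative that only zips once:
from itertools import repeat
# materialize the pairs; in Python 3 a bare zip iterator would be exhausted after the first dict
pairs = list(zip(keys, value))
list(map(dict, repeat(pairs, 4)))
Or, maybe, just use dict.copy and construct the dict once:
[d.copy() for d in repeat(dict(zip(keys, value)), 4)]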
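The copy matters because repeat hands back the very same object each time. A short sketch of the difference, where template, shared and copies are names I picked for illustration:

```python
from itertools import repeat

keys = ['a', 'b', 'c', 'd', 'e']
value = [1, 2, 3, 4, 5]
template = dict(zip(keys, value))

shared = list(repeat(template, 4))            # four references to ONE dict
copies = [template.copy() for _ in range(4)]  # four independent dicts

shared[0]['a'] = 99   # visible through every entry of shared, and in template
copies[0]['a'] = -1   # affects only copies[0]

print(shared[3]['a'], copies[1]['a'])
```

Keep in mind that dict.copy is shallow: if the values were themselves lists or dicts, the copies would still share those inner objects, and copy.deepcopy would be the safer choice.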
For a list of dictionaries:
dictionary = [dict(zip(keys,value)) for i in range(4)]
If you really wanted a dictionary of dictionaries like you said:
dictionary = dict((i,dict(zip(keys,value))) for i in range(4))
I suppose a dictionary lets you use pop by key and other dict methods that a list does not offer.
BTW: if this is really a data/number crunching application, I'd suggest moving on to numpy and/or pandas, which are great modules for that.
Edit re: OP comments,
if you want indices for the type of data you are talking about:
# if these are used as dict keys, they must be tuples and not lists
[(i,j) for i in range(4) for j in range(3)]
# same can come from itertools.product
from itertools import product
list(product(range(4), range(3)))
