Python create dictionary from dataframe in loop - python

first post!
I am trying to write a function that creates dictionaries in a loop from a dataframe.
Assume these two simple dataframes already exist:
data1 = {'A':[1, 2, 3, 4], 'B':[5, 6, 7, 8]}
df1 = pd.DataFrame(data1)
and
data2 = {'C':[9, 10], 'D':[11, 12], 'E':[13, 14] }
df2 = pd.DataFrame(data2)
I want to be able to create a function like this:
def create_dict(df):
where the end result for df1 is:
dict1 = { 'A' : 1, 'B' : 5}
dict2 = { 'A' : 2, 'B' : 6}
dict3 = { 'A' : 3, 'B' : 7}
dict4 = { 'A' : 4, 'B' : 8}
and the end result for df2 is:
dict1 = { 'C' : 9, 'D' : 11, 'E' : 13}
dict2 = { 'C' : 10, 'D' : 12, 'E' : 14}
I was looking at dictionary comprehensions to handle this, but I'm not sure how to approach the problem. Thanks!

Use pandas.DataFrame.to_dict with orient="records", which returns a list with one dictionary per row:
df1.to_dict(orient="records")
Output:
[{'A': 1, 'B': 5}, {'A': 2, 'B': 6}, {'A': 3, 'B': 7}, {'A': 4, 'B': 8}]
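For intuition, the same row-wise reshaping can be sketched without pandas, starting from the column dictionaries used to build the dataframes. This is only an illustration of what the "records" orientation produces, not how pandas implements it; the function name create_dicts is made up for this example. For an existing dataframe, df.to_dict(orient="list") should give a column mapping of the same shape as data1.

```python
def create_dicts(columns):
    """Turn {'col': [v0, v1, ...]} into a list with one dictionary per row."""
    names = list(columns)
    # zip(*...) walks all column lists in lockstep, yielding one tuple per row
    return [dict(zip(names, row)) for row in zip(*columns.values())]


data1 = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
data2 = {'C': [9, 10], 'D': [11, 12], 'E': [13, 14]}

rows1 = create_dicts(data1)
rows2 = create_dicts(data2)
print(rows1)
print(rows2)
```

Keeping the rows in a list (rows1[0], rows1[1], ...) is usually easier to work with than creating separately named variables such as dict1, dict2.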

Related

Updating a nested dictionary whose root keys match the index of a certain dataframe with said dataframe’s values

I have a nested dict that is uniform throughout (i.e. each 2nd level dict will have the same keys).
{
'0': {'a': 1, 'b': 2},
'1': {'a': 3, 'b': 4},
'2': {'a': 5, 'b': 6},
}
and the following data frame
c
0 9
1 6
2 4
Is there a way (without for loops) to update/map the dict/key-values such that I get
{
'0': {'a': 1, 'b': 2, 'c': 9},
'1': {'a': 3, 'b': 4, 'c': 6},
'2': {'a': 5, 'b': 6, 'c': 4},
}
Try this
# input
my_dict = {
'0': {'a': 1, 'b': 2},
'1': {'a': 3, 'b': 4},
'2': {'a': 5, 'b': 6},
}
my_df = pd.DataFrame({'c': [9, 6, 4]})
# build df from my_dict
df1 = pd.DataFrame.from_dict(my_dict, orient='index')
# append my_df as a column to df1
df1['c'] = my_df.values
# get dictionary
df1.to_dict('index')
But a simple loop is much more efficient here. I tested on a sample with 1 million entries and the loop was 2x faster. [1]
for d, c in zip(my_dict.values(), my_df['c']):
    d['c'] = c
my_dict
{'0': {'a': 1, 'b': 2, 'c': 9},
'1': {'a': 3, 'b': 4, 'c': 6},
'2': {'a': 5, 'b': 6, 'c': 4}}
[1] Constructing a dataframe is expensive, so unless you want a dataframe anyway (and possibly do other computations later), it's not worth constructing one for a task such as this one.
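If the original dictionaries should stay untouched, a dictionary comprehension is another option. The sketch below is an assumption-laden variant of the loop above: c_values is a plain list standing in for my_df['c'], and it relies on the dictionary order matching the row order, just as the zip-based loop does.

```python
my_dict = {
    '0': {'a': 1, 'b': 2},
    '1': {'a': 3, 'b': 4},
    '2': {'a': 5, 'b': 6},
}
c_values = [9, 6, 4]  # stand-in for my_df['c'] (assumption for this sketch)

# Build new inner dictionaries instead of mutating the existing ones
updated = {
    key: {**inner, 'c': c}
    for (key, inner), c in zip(my_dict.items(), c_values)
}
print(updated)
```

This still iterates under the hood, so it is a matter of style rather than a way to avoid looping.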

How to deduplicate dictionary in Python? Is there any method to override hashCode of some class like Java? [duplicate]

elements = [
{'a' : 1, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 3, 'c': 3},
{'a' : 1, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 2},
{'a' : 1, 'b' : 2, 'c': 3, 'd' : 4},
{'v' : [1,2,3]}
]
Given the above list of dicts in Python, how can I efficiently deduplicate it into the following collection (order doesn't matter)?
result = [
{'a' : 1, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 3, 'c': 3},
{'a' : 2, 'b' : 2},
{'a' : 1, 'b' : 2, 'c': 3, 'd' : 4},
{'v' : [1,2,3]}
]
The naive method is to use a set, but a dict in Python is unhashable. Right now, my solution is to serialize each dict to a string, for example as JSON (since a dict has no order, two different strings can correspond to the same dict, so I have to enforce some key order). However, this method has too high a time complexity.
My Questions:
How to efficiently deduplicate dictionary in Python?
More generally, is there any method to override a class's hashCode like Java to use set or dict?
For your toy example with little data you can use the repr of each inner dictionary as the key of a new dictionary, then collect all the values:
elements = [{'a' : 1, 'b' : 2, 'c': 3}, {'a' : 2, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 3, 'c': 3}, {'a' : 1, 'b' : 2, 'c': 3},
{'a' : 2, 'b' : 2, 'c': 3}, {'a' : 2, 'b' : 2},
{'a' : 1, 'b' : 2, 'c': 3, 'd' : 4}, {'v' : [1,2,3]}]
kv = {repr(inner):inner for inner in elements}
elements = list(kv.values())
print(elements)
Output:
[{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 2, 'c': 3}, {'a': 2, 'b': 3, 'c': 3},
{'a': 2, 'b': 2}, {'a': 1, 'b': 2, 'c': 3, 'd': 4}, {'v': [1, 2, 3]}]
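If you check the id() of your inner dictionaries you'll see that the last one of each group of duplicates survives. Note that repr depends on key insertion order, so {'a': 1, 'b': 2} and {'b': 2, 'a': 1} would be treated as different.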
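As a sketch of both parts of the question: the first function deduplicates with a canonical string key (json.dumps with sort_keys=True, so key order does not matter), which assumes all values are JSON-serializable. The second part shows the Python counterpart of overriding equals/hashCode in Java, namely defining __eq__ and __hash__. The names dedupe_dicts and Point are hypothetical and only serve this illustration.

```python
import json


def dedupe_dicts(dicts):
    """Return the unique dictionaries, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for d in dicts:
        # sort_keys=True makes the key independent of insertion order
        key = json.dumps(d, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique


class Point:
    """Hypothetical class whose instances can be stored in a set or used as dict keys."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Objects that compare equal must return the same hash
        return hash((self.x, self.y))


elements = [
    {'a': 1, 'b': 2, 'c': 3},
    {'a': 2, 'b': 2, 'c': 3},
    {'a': 2, 'b': 3, 'c': 3},
    {'a': 1, 'b': 2, 'c': 3},
    {'a': 2, 'b': 2, 'c': 3},
    {'a': 2, 'b': 2},
    {'a': 1, 'b': 2, 'c': 3, 'd': 4},
    {'v': [1, 2, 3]},
]
unique = dedupe_dicts(elements)
points = {Point(1, 2), Point(1, 2), Point(2, 3)}
print(unique)
print(len(points))
```

Each dictionary is serialized once and set lookups are constant time on average, so the whole pass is linear in the total size of the input.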

Three lists zipped into list of dicts

Consider the following:
>>> # list of length n
>>> idx = ['a', 'b', 'c', 'd']
>>> # list of length n
>>> l_1 = [1, 2, 3, 4]
>>> # list of length n
>>> l_2 = [5, 6, 7, 8]
>>> # first key
>>> key_1 = 'mkt_o'
>>> # second key
>>> key_2 = 'mkt_c'
How do I zip this mess to look like this?
{
'a': {'mkt_o': 1, 'mkt_c': 5},
'b': {'mkt_o': 2, 'mkt_c': 6},
'c': {'mkt_o': 3, 'mkt_c': 7},
'd': {'mkt_o': 4, 'mkt_c': 8},
...
}
The closest I've got is something like this:
>>> dict(zip(idx, zip(l_1, l_2)))
{'a': (1, 5), 'b': (2, 6), 'c': (3, 7), 'd': (4, 8)}
Which of course has tuples as values instead of dictionaries, and
>>> dict(zip(('mkt_o', 'mkt_c'), (1,2)))
{'mkt_o': 1, 'mkt_c': 2}
Which seems like it might be promising, but again, fails to meet requirements.
{k : {key_1 : v1, key_2 : v2} for k,v1,v2 in zip(idx, l_1, l_2)}
Solution 1: You may use zip twice (actually thrice) with dictionary comprehension to achieve this as:
idx = ['a', 'b', 'c', 'd']
l_1 = [1, 2, 3, 4]
l_2 = [5, 6, 7, 8]
keys = ['mkt_o', 'mkt_c'] # your keys in another list
new_dict = {k: dict(zip(keys, v)) for k, v in zip(idx, zip(l_1, l_2))}
Solution 2: You may also use zip with nested list comprehension as:
new_dict = dict(zip(idx, [{key_1: i, key_2: j} for i, j in zip(l_1, l_2)]))
Solution 3: using dictionary comprehension on top of zip as shared in DYZ's answer:
new_dict = {k : {key_1 : v1, key_2 : v2} for k,v1,v2 in zip(idx, l_1, l_2)}
All the above solutions will return new_dict as:
{
'a': {'mkt_o': 1, 'mkt_c': 5},
'b': {'mkt_o': 2, 'mkt_c': 6},
'c': {'mkt_o': 3, 'mkt_c': 7},
'd': {'mkt_o': 4, 'mkt_c': 8}
}
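The idea behind Solution 1 extends to any number of value lists. The following is a minimal sketch under that assumption; the helper name zip_to_nested, the third list l_3 and the key 'mkt_h' are invented for the example and are not part of the question.

```python
def zip_to_nested(idx, keys, *columns):
    """Map each index label to a dict built from the matching entries of every column."""
    # zip(*columns) yields one tuple of values per index label
    return {label: dict(zip(keys, values))
            for label, values in zip(idx, zip(*columns))}


idx = ['a', 'b', 'c', 'd']
l_1 = [1, 2, 3, 4]
l_2 = [5, 6, 7, 8]
l_3 = [9, 10, 11, 12]  # extra, made-up column for illustration

two_keys = zip_to_nested(idx, ['mkt_o', 'mkt_c'], l_1, l_2)
three_keys = zip_to_nested(idx, ['mkt_o', 'mkt_c', 'mkt_h'], l_1, l_2, l_3)
print(two_keys)
print(three_keys)
```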
You're working with dicts, lists, indices, keys and would like to transpose the data. It might make sense to work with pandas (DataFrame, .T and .to_dict):
>>> import pandas as pd
>>> idx = ['a', 'b', 'c', 'd']
>>> l_1 = [1, 2, 3, 4]
>>> l_2 = [5, 6, 7, 8]
>>> key_1 = 'mkt_o'
>>> key_2 = 'mkt_c'
>>> pd.DataFrame([l_1, l_2], index=[key_1, key_2], columns = idx)
a b c d
mkt_o 1 2 3 4
mkt_c 5 6 7 8
>>> pd.DataFrame([l_1, l_2], index=[key_1, key_2], columns = idx).T
mkt_o mkt_c
a 1 5
b 2 6
c 3 7
d 4 8
>>> pd.DataFrame([l_1, l_2], index=[key_1, key_2], columns = idx).to_dict()
{'a': {'mkt_o': 1, 'mkt_c': 5},
'b': {'mkt_o': 2, 'mkt_c': 6},
'c': {'mkt_o': 3, 'mkt_c': 7},
'd': {'mkt_o': 4, 'mkt_c': 8}
}
It can also be done with dict, zip, map and repeat from itertools:
>>> from itertools import repeat
>>> dict(zip(idx, map(dict, zip(zip(repeat(key_1), l_1), zip(repeat(key_2), l_2)))))
{'a': {'mkt_c': 5, 'mkt_o': 1}, 'c': {'mkt_c': 7, 'mkt_o': 3}, 'b': {'mkt_c': 6, 'mkt_o': 2}, 'd': {'mkt_c': 8, 'mkt_o': 4}}

Create Dictionary Of Lists from List of Dictionaries in Python

I have a list of dictionaries as follows.
[{'a' : 1, 'b' : 2, 'c' : 2},
{'a' : 2, 'b' : 3, 'c' : 3},
{'a' : 3, 'b' : 5, 'c' : 6},
{'a' : 4, 'b' : 7, 'c' : 8},
{'a' : 1, 'b' : 8, 'c' : 9},
{'a' : 2, 'b' : 0, 'c' : 0},
{'a' : 5, 'b' : 1, 'c' : 3},
{'a' : 7, 'b' : 4, 'c' : 5}]
I want to create a dictionary of lists from above list which should be as follows.
{1 : [{'a' : 1, 'b' : 2, 'c' : 2}, {'a' : 1, 'b' : 8, 'c' : 9}],
2 : [{'a' : 2, 'b' : 3, 'c' : 3}, {'a' : 2, 'b' : 0, 'c' : 0}],
3 : [{'a' : 3, 'b' : 5, 'c' : 6}],
4 : [{'a' : 4, 'b' : 7, 'c' : 8}],
5 : [{'a' : 5, 'b' : 1, 'c' : 3}],
7 : [{'a' : 7, 'b' : 4, 'c' : 5}]}
Basically I want to pick one of the keys in dictionary say 'a', and create new dictionary with the values of that key (1, 2, 3, 4, 5, 7) as keys for new dictionary to be created, and values for new dictionary should be list of all the dictionaries containing that value as value for key 'a'.
I know the simplest approach is to iterate over the list and build the required dictionary. I am just curious whether there is another way of doing it.
A collections.defaultdict will be the most efficient:
from collections import defaultdict
l = [{'a': 1, 'b': 2, 'c': 2},
{'a': 2, 'b': 3, 'c': 3},
{'a': 3, 'b': 5, 'c': 6},
{'a': 4, 'b': 7, 'c': 8},
{'a': 1, 'b': 8, 'c': 9},
{'a': 2, 'b': 0, 'c': 0},
{'a': 5, 'b': 1, 'c': 3},
{'a': 7, 'b': 4, 'c': 5}]
dct = defaultdict(list)
for d in l:
    dct[d["a"]].append(d)
from pprint import pprint as pp
pp(dict(dct))
Output:
{1: [{'a': 1, 'b': 2, 'c': 2}, {'a': 1, 'b': 8, 'c': 9}],
2: [{'a': 2, 'b': 3, 'c': 3}, {'a': 2, 'b': 0, 'c': 0}],
3: [{'a': 3, 'b': 5, 'c': 6}],
4: [{'a': 4, 'b': 7, 'c': 8}],
5: [{'a': 5, 'b': 1, 'c': 3}],
7: [{'a': 7, 'b': 4, 'c': 5}]}
A normal dictionary with the setdefault method can also be used.
Code:
data=[{'a' : 1, 'b' : 2, 'c' : 2},
{'a' : 2, 'b' : 3, 'c' : 3},
{'a' : 3, 'b' : 5, 'c' : 6},
{'a' : 4, 'b' : 7, 'c' : 8},
{'a' : 1, 'b' : 8, 'c' : 9},
{'a' : 2, 'b' : 0, 'c' : 0},
{'a' : 5, 'b' : 1, 'c' : 3},
{'a' : 7, 'b' : 4, 'c' : 5}]
dictionary_list={}
for row in data:
    dictionary_list.setdefault(row["a"], []).append(row)
print(dictionary_list)
Output:
{1: [{'a': 1, 'c': 2, 'b': 2}, {'a': 1, 'c': 9, 'b': 8}],
2: [{'a': 2, 'c': 3, 'b': 3}, {'a': 2, 'c': 0, 'b': 0}],
3: [{'a': 3, 'c': 6, 'b': 5}],
4: [{'a': 4, 'c': 8, 'b': 7}],
5: [{'a': 5, 'c': 3, 'b': 1}],
7: [{'a': 7, 'c': 5, 'b': 4}]}
You can do it in following way
mylist = [
{'a' : 1, 'b' : 2, 'c' : 2},
{'a' : 2, 'b' : 3, 'c' : 3},
{'a' : 3, 'b' : 5, 'c' : 6},
{'a' : 4, 'b' : 7, 'c' : 8},
{'a' : 1, 'b' : 8, 'c' : 9},
{'a' : 2, 'b' : 0, 'c' : 0},
{'a' : 5, 'b' : 1, 'c' : 3},
{'a' : 7, 'b' : 4, 'c' : 5}
]
def get_dict(mylist, required_key):
    result_dict = {}
    for mydict in mylist:
        result_dict.setdefault(mydict[required_key], [])
        result_dict[mydict[required_key]].append(mydict)
    return result_dict
result_dict = get_dict(mylist, required_key = 'a')
print(result_dict)
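Since the question asks whether there is another way, here is a sketch using itertools.groupby from the standard library. It is not taken from the answers above, and it is generally slower than the defaultdict approach because groupby only merges adjacent items, which forces a sort first. The variable names rows and grouped are chosen for this example.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {'a': 1, 'b': 2, 'c': 2},
    {'a': 2, 'b': 3, 'c': 3},
    {'a': 3, 'b': 5, 'c': 6},
    {'a': 4, 'b': 7, 'c': 8},
    {'a': 1, 'b': 8, 'c': 9},
    {'a': 2, 'b': 0, 'c': 0},
    {'a': 5, 'b': 1, 'c': 3},
    {'a': 7, 'b': 4, 'c': 5},
]

by_a = itemgetter('a')
# sorted() is stable, so rows keep their original relative order within each group
grouped = {key: list(group)
           for key, group in groupby(sorted(rows, key=by_a), key=by_a)}
print(grouped)
```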

Dynamic Dictionary of dictionaries Python

I wanted to create a dictionary of dictionaries in Python:
Suppose I already have a list which contains the keys:
keys = ['a', 'b', 'c', 'd', 'e']
value = [1, 2, 3, 4, 5]
Suppose I have a data field with numeric values (20 of them)
I want to define a dictionary that stores 4 different dictionaries, each mapping the given keys to their corresponding values:
dictionary = {}
for i in range(4):
    dictionary[i] = {}
    for j in range(5):
        dictionary[i][keys[j]] = value[j]
So basically, it should be like:
dictionary[0] = {'a' : 1, 'b' : 2, 'c' : 3, 'd': 4, 'e':5}
dictionary[1] = {'a' : 1, 'b' : 2, 'c' : 3, 'd': 4, 'e':5}
dictionary[2] = {'a' : 1, 'b' : 2, 'c' : 3, 'd': 4, 'e':5}
dictionary[3] = {'a' : 1, 'b' : 2, 'c' : 3, 'd': 4, 'e':5}
What is the best way to achieve this?
Use a list comprehension and dict(zip(keys,value)) will return the dict for you.
>>> keys = ['a', 'b', 'c', 'd', 'e']
>>> value = [1, 2, 3, 4, 5]
>>> dictionary = [dict(zip(keys,value)) for _ in range(4)]
>>> from pprint import pprint
>>> pprint(dictionary)
[{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}]
If you want a dict of dicts then use a dict comprehension:
>>> keys = ['a', 'b', 'c', 'd', 'e']
>>> value = [1, 2, 3, 4, 5]
>>> dictionary = {i: dict(zip(keys,value)) for i in range(4)}
>>> pprint(dictionary)
{0: {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
1: {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
2: {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
3: {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}}
An alternative that only zips once:
from itertools import repeat
list(map(dict, repeat(list(zip(keys, value)), 4)))
Or, maybe, just use dict.copy and construct the dict once:
[d.copy() for d in repeat(dict(zip(keys, value)), 4)]
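The copy matters because repeat hands back the same object every time. The short sketch below (variable names shared, copies and template are chosen for this illustration) shows the difference between four references to one dictionary and four independent dictionaries.

```python
from itertools import repeat

keys = ['a', 'b', 'c', 'd', 'e']
value = [1, 2, 3, 4, 5]
template = dict(zip(keys, value))

shared = list(repeat(template, 4))            # four references to ONE dict
copies = [template.copy() for _ in range(4)]  # four independent dicts

shared[0]['a'] = 99   # visible through every entry of shared
copies[0]['a'] = -1   # affects only the first copy

print(shared[3]['a'])
print(copies[1]['a'])
```

Note that dict.copy is shallow, which is sufficient here because the values are plain integers.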
For a list of dictionaries:
dictionary = [dict(zip(keys,value)) for i in range(4)]
If you really wanted a dictionary of dictionaries like you said:
dictionary = dict((i,dict(zip(keys,value))) for i in range(4))
I suppose that would let you use pop or other dict methods that are not available on a list.
BTW: if this is really a data/number crunching application, I'd suggest moving on to numpy and/or pandas as great modules.
Edit re: OP comments,
if you want indices for the type of data you are talking about:
# dict keys must be tuples and not lists
[(i, j) for i in range(4) for j in range(3)]
# the same can come from itertools.product
from itertools import product
list(product(range(4), range(3)))
