Convert list of dicts of dict into DataFrame - python

I have a list of dictionaries, each containing a nested dictionary, that looks like:
[{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
{'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
{'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
{'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]
and the result should look like:
a c d e f
0 1 1 2 3 4
1 2 2 3 4 3
2 3 3 4 5 2
3 4 4 5 6 1
while the default pd.DataFrame(data) looks like:
a b f
0 1 {'c': 1, 'd': 2, 'e': 3} 4
1 2 {'c': 2, 'd': 3, 'e': 4} 3
2 3 {'c': 3, 'd': 4, 'e': 5} 2
3 4 {'c': 4, 'd': 5, 'e': 6} 1
How can I do this with pandas? Thanks.

You need to flatten the nested JSON-like data, as such:
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize
data = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
{'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
{'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
{'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]
df = json_normalize(data)  # already returns a DataFrame, no from_dict wrapper needed
df
# output:
a b.c b.d b.e f
0 1 1 2 3 4
1 2 2 3 4 3
2 3 3 4 5 2
3 4 4 5 6 1
You can rename the columns once that's done.
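To see what the flattening step amounts to, here is a minimal plain-Python sketch that produces the same dotted keys without pandas. The helper name `flatten` and its `sep` parameter are made up for this illustration and are not part of pandas.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep` (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into the nested dict and merge its flattened items.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

data = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
        {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3}]
rows = [flatten(r) for r in data]
print(rows[0])  # {'a': 1, 'b.c': 1, 'b.d': 2, 'b.e': 3, 'f': 4}
```

A list of flat dicts like `rows` can then be passed straight to pd.DataFrame.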

json_normalize is what you're looking for!
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize
x = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
{'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
{'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
{'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]
sep = '::::' # string that doesn't appear in column names
frame = json_normalize(x, sep=sep)
frame.columns = frame.columns.str.split(sep).str[-1]
print(frame)
Output
a c d e f
0 1 1 2 3 4
1 2 2 3 4 3
2 3 3 4 5 2
3 4 4 5 6 1

import pandas as pd
z=[{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
{'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
{'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
{'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]
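step1 = pd.DataFrame(z)
column_with_sets = 'b'
# Expand the nested dicts into their own columns.
step2 = pd.DataFrame(list(step1[column_with_sets]))
# Drop the nested column and attach the expanded columns side by side.
step3 = pd.concat([step1.drop(columns=column_with_sets), step2], axis=1)
# Sort the columns alphabetically (reindex_axis was removed from pandas).
step4 = step3.reindex(sorted(step3.columns), axis=1)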

Related

Updating a nested dictionary whose root keys match the index of a certain dataframe with said dataframe’s values

I have a nested dict that is uniform throughout (i.e. each 2nd level dict will have the same keys).
{
'0': {'a': 1, 'b': 2},
'1': {'a': 3, 'b': 4},
'2': {'a': 5, 'b': 6},
}
and the following data frame
c
0 9
1 6
2 4
Is there a way (without for loops) to update/map the dict/key-values such that I get
{
'0': {'a': 1, 'b': 2, 'c': 9},
'1': {'a': 3, 'b': 4, 'c': 6},
'2': {'a': 5, 'b': 6, 'c': 4},
}
Try this:
import pandas as pd
# input
my_dict = {
'0': {'a': 1, 'b': 2},
'1': {'a': 3, 'b': 4},
'2': {'a': 5, 'b': 6},
}
my_df = pd.DataFrame({'c': [9, 6, 4]})
# build df from my_dict
df1 = pd.DataFrame.from_dict(my_dict, orient='index')
# append my_df as a column to df1
df1['c'] = my_df.values
# get dictionary
df1.to_dict('index')
But a simple loop is much more efficient here. I tested on a sample with 1 million entries and the loop was 2x faster (see note 1).
for d, c in zip(my_dict.values(), my_df['c']):
    d['c'] = c
my_dict
{'0': {'a': 1, 'b': 2, 'c': 9},
'1': {'a': 3, 'b': 4, 'c': 6},
'2': {'a': 5, 'b': 6, 'c': 4}}
1: Constructing a dataframe is expensive, so unless you want a dataframe (and possibly do other computations later), it's not worth it to construct one for a task such as this one.
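If the original dict should stay untouched, the same pairing can be written as a comprehension. This is only a sketch: it assumes the new values are available as a plain list in the same order as the dict's keys, and `new_values` is a stand-in name for the contents of my_df['c'].

```python
my_dict = {
    '0': {'a': 1, 'b': 2},
    '1': {'a': 3, 'b': 4},
    '2': {'a': 5, 'b': 6},
}
new_values = [9, 6, 4]  # assumed stand-in for my_df['c'].tolist()

# Build new inner dicts instead of mutating the existing ones.
updated = {key: {**inner, 'c': value}
           for (key, inner), value in zip(my_dict.items(), new_values)}
print(updated)
```

This still iterates under the hood; it just avoids modifying my_dict in place.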

circular changing of key values in python dict [duplicate]

I have a dict :
d = {'a': 0, 'b': 1, 'c': 2, 'd': 3}
Is there any Python API that produces the results below?
API(d)... = {'a': 1, 'b': 2, 'c': 3, 'd': 0}
API(d)... = {'a': 2, 'b': 3, 'c': 0, 'd': 1}
API(d)... = {'a': 3, 'b': 0, 'c': 1, 'd': 2}
You can implement it simply with the standard library alone, like this:
def rotate(d):
    keys = d.keys()
    values = list(d.values())
    # Move the first value to the end; keys stay in place.
    values = values[1:] + values[:1]
    d = dict(zip(keys, values))
    return d
d = {'a': 0, 'b': 1, 'c': 2, 'd': 3}
d = rotate(d)
print(d)
d = rotate(d)
print(d)
d = rotate(d)
print(d)
d = rotate(d)
print(d)
Output :
{'a': 1, 'b': 2, 'c': 3, 'd': 0}
{'a': 2, 'b': 3, 'c': 0, 'd': 1}
{'a': 3, 'b': 0, 'c': 1, 'd': 2}
{'a': 0, 'b': 1, 'c': 2, 'd': 3}
Try this: a function that rotates the list of values by one position, moving the first value to the end.
def API(d):
    val = list(d.values())
    val.append(val.pop(0))
    return dict(zip(d, val))
d = {'a': 0, 'b': 1, 'c': 2, 'd': 3}
d= API(d)
# {'a': 1, 'b': 2, 'c': 3, 'd': 0}
d= API(d)
# {'a': 2, 'b': 3, 'c': 0, 'd': 1}
d= API(d)
# {'a': 3, 'b': 0, 'c': 1, 'd': 2}
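For rotating by more than one position in a single call, collections.deque offers a rotate method. The sketch below is one possible variant; the function name `rotate_values` and its `steps` parameter are invented for this example.

```python
from collections import deque

def rotate_values(d, steps=1):
    """Return a new dict whose values are shifted left by `steps` positions."""
    values = deque(d.values())
    # A negative argument rotates to the left, so the first value moves to the end.
    values.rotate(-steps)
    return dict(zip(d, values))

d = {'a': 0, 'b': 1, 'c': 2, 'd': 3}
print(rotate_values(d))     # {'a': 1, 'b': 2, 'c': 3, 'd': 0}
print(rotate_values(d, 2))  # {'a': 2, 'b': 3, 'c': 0, 'd': 1}
```

This relies on dicts preserving insertion order, which is guaranteed from Python 3.7 on.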

Is there any way to sort this dictionaries by lowest value from keys?

I just want to sort these dictionaries, which are built from values in an input file.
def sortdicts():
    listofs = splitndict()
    print(sorted(listofs))
The splitndict() function has this output:
[{'a': 1, 'b': 2}, {'c': 2, 'd': 4}, {'a': 7, 'c': 3}, {'y': 5, 'x': 0}]
While the input is from another file and it's:
a 1
b 2
c 2
d 4
a 7
c 3
x 0
y 5
I used this to split the dictionary:
def splitndict():
    listofd = []
    variablesRead = readfromfile()
    splitted = [i.split() for i in variablesRead]
    d = {}
    for lines in splitted:
        if lines:
            d[lines[0]] = int(lines[1])
        elif d == {}:
            pass
        else:
            listofd.append(d)
            d = {}
    print(listofd)
    return listofd
The output file should look like this:
[{'y': 5, 'x': 0}, {'a': 1, 'b': 2}, {'c': 2, 'd': 4}, {'a': 7, 'c': 3}]
This is the expected output because the list needs to be sorted by the lowest value in each dictionary.
array = [{'y': 5, 'x': 0}, {'a': 1, 'b': 2}, {'c': 2, 'd': 4}, {'a': 7, 'c': 3}]
For the above array:
array = sorted(array, key=lambda element: min(element.values()))
Here element.values() returns all the values of a dictionary and min returns the smallest of them.
sorted passes each dictionary (an element) to the lambda function one by one and sorts on the basis of the results.
x = [{'y': 5, 'x': 0}, {'a': 1, 'b': 2}, {'c': 2, 'd': 4}, {'a': 7, 'c': 3}]
sorted(x, key=lambda i: min(i.values()))
Output is
[{'y': 5, 'x': 0}, {'a': 1, 'b': 2}, {'c': 2, 'd': 4}, {'a': 7, 'c': 3}]
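Putting the parsing and the sorting together, a self-contained sketch might look like the following. It assumes that blank lines separate the groups in the input (the question's splitting code relies on empty lines, although the listing does not show them), and it reads from a string rather than a file. The name `split_into_dicts` is made up for this example.

```python
text = """a 1
b 2

c 2
d 4

a 7
c 3

x 0
y 5"""

def split_into_dicts(lines):
    """Group 'key value' lines into dicts, starting a new dict at each blank line."""
    groups, current = [], {}
    for line in lines:
        parts = line.split()
        if parts:
            current[parts[0]] = int(parts[1])
        elif current:
            groups.append(current)
            current = {}
    if current:
        # Keep the last group even when the input does not end with a blank line.
        groups.append(current)
    return groups

groups = split_into_dicts(text.splitlines())
result = sorted(groups, key=lambda d: min(d.values()))
print(result)
```

Because sorted is stable, dictionaries with the same minimum keep their original relative order.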

Python For each group in DataFrame create a list of dictionaries

Trying to create a dictionary out of each grouping defined in column 'id' in Python. Below is the pandas DataFrame.
id | day | b | c
-----------------
A1 1 H 2
A1 1 C 1
A1 2 H 3
A1 2 C 5
A2 1 H 5
A2 1 C 6
A2 2 H 2
A2 2 C 1
What I am trying to accomplish is a list of dictionaries for each 'id':
id A1: [{H: 2, C: 1}, {H: 3, C: 5}]
id A2: [{H: 5, C: 6}, {H: 2, C: 1}]
A little bit long :-)
df.groupby(['id','day'])[['b','c']].apply(lambda x : {t[0]:t[1] for t in x.values.tolist()}).groupby(level=0).apply(list)
Out[815]:
id
A1 [{'H': 2, 'C': 1}, {'H': 3, 'C': 5}]
A2 [{'H': 5, 'C': 6}, {'H': 2, 'C': 1}]
dtype: object
Let's reshape the dataframe then use groupby and to_dict:
df.set_index(['id','day','b']).unstack()['c']\
.groupby(level=0).apply(lambda x: x.to_dict('records'))
Output:
id
A1 [{'H': 2, 'C': 1}, {'H': 3, 'C': 5}]
A2 [{'H': 5, 'C': 6}, {'H': 2, 'C': 1}]
dtype: object
We can use two groupby calls, i.e.:
one = df.groupby(['id','day']).apply(lambda x : dict(zip(x['b'],x['c']))).reset_index()
id day 0
0 A1 1 {'C': 1, 'H': 2}
1 A1 2 {'C': 5, 'H': 3}
2 A2 1 {'C': 6, 'H': 5}
3 A2 2 {'C': 1, 'H': 2}
one.groupby('id')[0].apply(list)
id
A1 [{'C': 1, 'H': 2}, {'C': 5, 'H': 3}]
A2 [{'C': 6, 'H': 5}, {'C': 1, 'H': 2}]
Name: 0, dtype: object
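For comparison, the same two-level grouping can be sketched in plain Python without pandas. The rows below are the question's table written as tuples (id, day, b, c); the variable names are chosen for this illustration only.

```python
rows = [
    ('A1', 1, 'H', 2), ('A1', 1, 'C', 1),
    ('A1', 2, 'H', 3), ('A1', 2, 'C', 5),
    ('A2', 1, 'H', 5), ('A2', 1, 'C', 6),
    ('A2', 2, 'H', 2), ('A2', 2, 'C', 1),
]

# First level: one {b: c} mapping per (id, day) pair.
per_day = {}
for id_, day, b, c in rows:
    per_day.setdefault((id_, day), {})[b] = c

# Second level: collect the per-day mappings into a list per id.
result = {}
for (id_, day), mapping in per_day.items():
    result.setdefault(id_, []).append(mapping)

print(result)
```

The order of the lists follows the order of the rows, since dicts preserve insertion order in Python 3.7 and later.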

Combining all combinations of two lists into a dict of special form

I have two lists:
var_a = [1,2,3,4]
var_b = [6,7]
I want to have a list of dicts as follows:
result = [{'a':1,'b':6},{'a':1,'b':7},{'a':2,'b':6},{'a':2,'b':7},....]
I think the result should be clear.
>>> import itertools
>>> # itertools.izip exists only in Python 2; the built-in zip works in Python 3
>>> [{k:v for k,v in zip('ab', comb)} for comb in itertools.product([1,2,3,4], [6,7])]
[{'a': 1, 'b': 6}, {'a': 1, 'b': 7}, {'a': 2, 'b': 6}, {'a': 2, 'b': 7}, {'a': 3, 'b': 6}, {'a': 3, 'b': 7}, {'a': 4, 'b': 6}, {'a': 4, 'b': 7}]
from itertools import product
a = [1,2,3,4]
b = [6,7]
[dict(zip(('a','b'), (i,j))) for i,j in product(a,b)]
yields
[{'a': 1, 'b': 6},
{'a': 1, 'b': 7},
{'a': 2, 'b': 6},
{'a': 2, 'b': 7},
{'a': 3, 'b': 6},
{'a': 3, 'b': 7},
{'a': 4, 'b': 6},
{'a': 4, 'b': 7}]
If the variable names are given to you, you could use:
>>> a = [1,2,3,4]
>>> b = [6,7]
>>> from itertools import product
>>> nameTup = ('a', 'b')
>>> [dict(zip(nameTup, elem)) for elem in product(a, b)]
[{'a': 1, 'b': 6}, {'a': 1, 'b': 7}, {'a': 2, 'b': 6}, {'a': 2, 'b': 7}, {'a': 3, 'b': 6}, {'a': 3, 'b': 7}, {'a': 4, 'b': 6}, {'a': 4, 'b': 7}]
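The same idea extends to any number of named lists if they are kept in a dict. A minimal sketch, assuming the inputs are gathered in a dict called `variables` (a name invented here, with a third list 'c' added purely for illustration):

```python
from itertools import product

variables = {'a': [1, 2, 3, 4], 'b': [6, 7], 'c': ['x', 'y']}

# product(*values) yields one tuple per combination; zip pairs it with the names.
result = [dict(zip(variables, combo)) for combo in product(*variables.values())]

print(len(result))  # 16
print(result[0])    # {'a': 1, 'b': 6, 'c': 'x'}
```

The last variable changes fastest, matching the ordering of the two-list results above.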
