I have a DataFrame:
df = pd.DataFrame({"a":[1,1,1,2,2,2,3,3], "b":["a","a","a","b","b","b","c","c"], "c":[0,0,1,0,1,1,0,1], "d":["x","y","z","x","y","y","z","x"]})
a b c d
0 1 a 0 x
1 1 a 0 y
2 1 a 1 z
3 2 b 0 x
4 2 b 1 y
5 2 b 1 y
6 3 c 0 z
7 3 c 1 x
I want to group by columns a and b to get the following output:
a b e
0 1 a [{'c': 0, 'd': 'x'}, {'c': 0, 'd': 'y'}, {'c': 1, 'd': 'z'}]
1 2 b [{'c': 0, 'd': 'x'}, {'c': 1, 'd': 'y'}, {'c': 1, 'd': 'y'}]
2 3 c [{'c': 0, 'd': 'z'}, {'c': 1, 'd': 'x'}]
My solution:
new_df = df.groupby(["a","b"])["c","d"].apply(lambda x: x.to_dict(orient="records")).reset_index(name="e")
But the issue is that it behaves inconsistently; sometimes I get the error below:
reset_index() got an unexpected keyword argument "name"
It would be helpful if someone could point out the issue in the above solution or suggest an alternative way of doing this.
You can do
new = df.groupby(['a','b'])[['c','d']].apply(lambda x: x.to_dict('records')).to_frame('e').reset_index()
Out[13]:
a b e
0 1 a [{'c': 0, 'd': 'x'}, {'c': 0, 'd': 'y'}, {'c':...
1 2 b [{'c': 0, 'd': 'x'}, {'c': 1, 'd': 'y'}, {'c':...
2 3 c [{'c': 0, 'd': 'z'}, {'c': 1, 'd': 'x'}]
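As for the inconsistency: Series.reset_index accepts a name argument but DataFrame.reset_index does not, so your call fails whenever the apply step returns a DataFrame rather than a Series (which can happen depending on what the lambda returns and on the pandas version). A minimal sketch that sidesteps this by naming the Series explicitly, same logic as above:
new_df = (
    df.groupby(["a", "b"])[["c", "d"]]
      .apply(lambda x: x.to_dict(orient="records"))   # one list of row-dicts per group
      .rename("e")                                    # force a named Series
      .reset_index()
)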
Alternatively we can do:
df['e'] = df[['c', 'd']].agg(lambda s: dict(zip(s.index, s.values)), axis=1)
df1 = df.groupby(['a', 'b'])['e'].agg(list).reset_index()
# print(df1)
a b e
0 1 a [{'c': 0, 'd': 'x'}, {'c': 0, 'd': 'y'}, {'c':...
1 2 b [{'c': 0, 'd': 'x'}, {'c': 1, 'd': 'y'}, {'c':...
2 3 c [{'c': 0, 'd': 'z'}, {'c': 1, 'd': 'x'}]
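A slightly shorter variant of the same idea (a sketch; it likewise adds a temporary e column to df). DataFrame.to_dict('records') already returns one dict per row, so the row-wise agg can be skipped:
df['e'] = df[['c', 'd']].to_dict('records')   # one dict per row, aligned with df's rows
df1 = df.groupby(['a', 'b'])['e'].agg(list).reset_index()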
Related
I want to cut a pandas DataFrame with duplicated values in a column into separate DataFrames.
So from this:
df = pd.DataFrame({'block': ['A', 'B', 'B', 'C'],
                   'd': [{'A': 1}, {'B': 'A'}, {'B': 3}, {'C': 'Z'}]})
block d
0 A {'A': 1 }
1 B {'B': 'A'}
2 B {'B': 3 }
3 C {'C': 'Z'}
I would like to achieve two separate data frames:
df1:
block d
0 A {'A': 1 }
1 B {'B': 'A'}
2 C {'C': 'Z'}
df2:
block d
0 A {'A': 1 }
1 B {'B': 3 }
2 C {'C': 'Z'}
Another example:
df = pd.DataFrame({'block': ['A', 'B', 'B', 'C', 'C'],
                   'd': [{'A': 1}, {'B': 'A'}, {'B': 3}, {'C': 'Z'}, {'C': 10}]})
Result:
block d
0 A {'A': 1 }
1 B {'B': 'A'}
2 C {'C': 'Z'}
block d
0 A {'A': 1 }
1 B {'B': 'A'}
2 C {'C': 10 }
block d
0 A {'A': 1 }
1 B {'B': 3 }
2 C {'C': 'Z'}
block d
0 A {'A': 1 }
1 B {'B': 3 }
2 C {'C': 10 }
I should add that I want to preserve the order of the column 'block'.
I tried pandas explode and the itertools package, but without good results. If someone knows how to solve this, please help.
One way using pandas.DataFrame.groupby, iterrows and itertools.product:
from itertools import product

prods = []
for _, d in df.groupby("block"):
    # collect the rows (as Series) belonging to each block value
    prods.append([s for _, s in d.iterrows()])

# take one row per block (cartesian product) and stitch each combination back into a DataFrame
dfs = [pd.concat(ss, axis=1).T for ss in product(*prods)]
print(dfs)
Output:
[ block d
0 A {'A': 1}
1 B {'B': 'A'}
3 C {'C': 'Z'},
block d
0 A {'A': 1}
2 B {'B': 3}
3 C {'C': 'Z'}]
Output for second sample df:
[ block d
0 A {'A': 1}
1 B {'B': 'A'}
3 C {'C': 'Z'},
block d
0 A {'A': 1}
1 B {'B': 'A'}
4 C {'C': 10},
block d
0 A {'A': 1}
2 B {'B': 3}
3 C {'C': 'Z'},
block d
0 A {'A': 1}
2 B {'B': 3}
4 C {'C': 10}]
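Note that each piece keeps its original row labels (0, 1, 3, ...), as in the output above. If you want each resulting frame renumbered from 0, an optional follow-up is:
dfs = [d.reset_index(drop=True) for d in dfs]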
I have a list of dictionaries of dictionaries that looks like:
[{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
{'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
{'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
{'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]
and the result should looks like:
a c d e f
0 1 1 2 3 4
1 2 2 3 4 3
2 3 3 4 5 2
3 4 4 5 6 1
while the default pd.DataFrame(data) looks like:
a b f
0 1 {'c': 1, 'd': 2, 'e': 3} 4
1 2 {'c': 2, 'd': 3, 'e': 4} 3
2 3 {'c': 3, 'd': 4, 'e': 5} 2
3 4 {'c': 4, 'd': 5, 'e': 6} 1
How can I do this with pandas? Thanks.
You need to flatten the nested JSON into flat columns, like so:
import pandas as pd
# note: in pandas >= 1.0 this is available directly as pd.json_normalize
from pandas.io.json import json_normalize
data = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
{'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
{'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
{'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]
df = json_normalize(data)  # json_normalize already returns a flat DataFrame
df
# output:
a b.c b.d b.e f
0 1 1 2 3 4
1 2 2 3 4 3
2 3 3 4 5 2
3 4 4 5 6 1
You can rename the columns once it's done.
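For example, a minimal rename sketch (in newer pandas, 1.0+, json_normalize is also available directly as pd.json_normalize):
df = pd.json_normalize(data)                                   # pandas >= 1.0
df = df.rename(columns={'b.c': 'c', 'b.d': 'd', 'b.e': 'e'})   # drop the 'b.' prefix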
json_normalize is what you're looking for!
import pandas as pd
from pandas.io.json import json_normalize
x = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
{'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
{'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
{'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]
sep = '::::' # string that doesn't appear in column names
frame = json_normalize(x, sep=sep)
frame.columns = frame.columns.str.split(sep).str[-1]
print(frame)
Output
a c d e f
0 1 1 2 3 4
1 2 2 3 4 3
2 3 3 4 5 2
3 4 4 5 6 1
import pandas as pd

z = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
     {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
     {'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
     {'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1}]

step1 = pd.DataFrame(z)
column_with_sets = 'b'
# expand the nested dicts into their own DataFrame (columns c, d, e)
step2 = pd.DataFrame(list(step1[column_with_sets]))
# drop the nested column and concatenate the expanded columns alongside the rest
step3 = pd.concat([step1[[i for i in step1.columns if column_with_sets not in i]], step2], axis=1)
# sort the columns alphabetically: a, c, d, e, f
step4 = step3.reindex(sorted(step3.columns), axis=1)
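Printing step4 then gives the flat frame with columns ordered a, c, d, e, f, matching the desired output shown in the question:
print(step4)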
Trying to create a dictionary out of each grouping defined by the 'id' column in Python. Below is the pandas DataFrame.
id | day | b | c
-----------------
A1 1 H 2
A1 1 C 1
A1 2 H 3
A1 2 C 5
A2 1 H 5
A2 1 C 6
A2 2 H 2
A2 2 C 1
What I am trying to accomplish is a list of dictionaries for each 'id':
id A1: [{H: 2, C: 1}, {H: 3, C: 5}]
id A2: [{H: 5, C: 6}, {H: 2, C: 1}]
A little bit long... :-)
(df.groupby(['id','day'])[['b','c']]
   .apply(lambda x: {t[0]: t[1] for t in x.values.tolist()})
   .groupby(level=0)
   .apply(list))
Out[815]:
id
A1 [{'H': 2, 'C': 1}, {'H': 3, 'C': 5}]
A2 [{'H': 5, 'C': 6}, {'H': 2, 'C': 1}]
dtype: object
Let's reshape the dataframe then use groupby and to_dict:
df.set_index(['id','day','b']).unstack()['c']\
.groupby(level=0).apply(lambda x: x.to_dict('records'))
Output:
id
A1 [{'H': 2, 'C': 1}, {'H': 3, 'C': 5}]
A2 [{'H': 5, 'C': 6}, {'H': 2, 'C': 1}]
dtype: object
We can make use of a dual groupby, i.e.:
one = df.groupby(['id','day']).apply(lambda x : dict(zip(x['b'],x['c']))).reset_index()
id day 0
0 A1 1 {'C': 1, 'H': 2}
1 A1 2 {'C': 5, 'H': 3}
2 A2 1 {'C': 6, 'H': 5}
3 A2 2 {'C': 1, 'H': 2}
one.groupby('id')[0].apply(list)
id
A1 [{'C': 1, 'H': 2}, {'C': 5, 'H': 3}]
A2 [{'C': 6, 'H': 5}, {'C': 1, 'H': 2}]
Name: 0, dtype: object
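If you want a plain Python dict keyed by id, as in the desired id A1: [...] listing, any of the resulting Series above can be converted with .to_dict(); a small sketch using one from the last approach:
result = one.groupby('id')[0].apply(list).to_dict()
# {'A1': [{'C': 1, 'H': 2}, {'C': 5, 'H': 3}], 'A2': [{'C': 6, 'H': 5}, {'C': 1, 'H': 2}]}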
In Python, I have some tuples like b, and I want to add them to an empty list without unpacking them. Here I simplify b so that it repeats itself; in reality the values in b would differ, so it would be b1, b2, b3...
b = ({'a': 1, 'b': 1, 'c': 1}, 'y')
bb = [b, b, b]
print(len(bb))
print(len(bb[0]))
bb
This gives
3
2
Out[204]: [({'a': 1, 'b': 1, 'c': 1}, 'y'), ({'a': 1, 'b': 1, 'c': 1}, 'y'), ({'a': 1, 'b': 1, 'c': 1}, 'y')]
which is what I want. But since I am now doing this in a loop, I cannot write bb = [b, b, b]. The syntax I came up with creates a nested hierarchy that I do not want.
bb = ()
b = ({'a': 1, 'b': 1, 'c': 1}, 'y')
bb = [bb, b]
# in reality I build up bb 3 times in a for loop
bb = [bb, b]
bb = [bb, b]
print(len(bb))
print(len(bb[0]))
bb
This gives
[[[(), ({'a': 1, 'b': 1, 'c': 1}, 'y')], ({'a': 1, 'b': 1, 'c': 1},'y')], ({'a': 1, 'b': 1, 'c': 1}, 'y')]
and is not what I wanted. How can I loop and reach the first outcome?
Just use a list comprehension:
b = ({'a': 1, 'b': 1, 'c': 1}, 'y')
bb = [b for i in range(3)]
Output:
[({'a': 1, 'c': 1, 'b': 1}, 'y'), ({'a': 1, 'c': 1, 'b': 1}, 'y'), ({'a': 1, 'c': 1, 'b': 1}, 'y')]
Start with a list and use append:
bb = []
b = ({'a': 1, 'b': 1, 'c': 1}, 'y')
for _ in range(3):
    bb.append(b)
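This reproduces the first outcome; a quick check, mirroring the prints from the question:
print(len(bb))     # 3
print(len(bb[0]))  # 2, since b itself is a 2-tuple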
Given the following, how can I set the NaN/None values in column B based on the other columns? Should I use apply?
from decimal import Decimal

import pandas as pd

d = [
    {'A': 2, 'B': Decimal('628.00'), 'C': 1, 'D': 'blue'},
    {'A': 1, 'B': None, 'C': 3, 'D': 'orange'},
    {'A': 3, 'B': None, 'C': 1, 'D': 'orange'},
    {'A': 2, 'B': Decimal('575.00'), 'C': 2, 'D': 'blue'},
    {'A': 4, 'B': None, 'C': 1, 'D': 'blue'},
]
df = pd.DataFrame(d)
# Make sure types are correct
df['B'] = df['B'].astype('float')
df['C'] = df['C'].astype('int')
In : df
Out:
A B C D
0 2 628 1 blue
1 1 NaN 3 orange
2 3 NaN 1 orange
3 2 575 2 blue
4 4 NaN 1 blue
In : df.dtypes
Out:
A int64
B float64
C int64
D object
dtype: object
Here is an example of the "rules" to set B when the value is None:
def make_B(c, d):
    """When B is None, the value of B depends on C and D."""
    if d == 'blue':
        return Decimal('1400.89') * 1 * c
    elif d == 'orange':
        return Decimal('2300.57') * 2 * c
    raise ValueError(f"unexpected value for D: {d}")
Here is the way I solved it.
I define make_B as below:
import numpy as np

def make_B(x):
    # when B is NaN, the value of B depends on C and D
    if np.isnan(x['B']):
        if x['D'] == 'blue':
            return Decimal('1400.89') * 1 * x['C']
        elif x['D'] == 'orange':
            return Decimal('2300.57') * 2 * x['C']
    else:
        return x['B']
Then I use apply and assign the result back to B:
df['B'] = df.apply(make_B, axis=1)
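For reference, a vectorized sketch of the same fill, assuming only 'blue' and 'orange' occur in D and that plain floats (rather than Decimal) are acceptable given B's float64 dtype:
import numpy as np

# rule: blue -> 1400.89 * 1 * C, orange -> 2300.57 * 2 * C (floats instead of Decimal)
fill = np.where(df['D'].eq('blue'),
                1400.89 * 1 * df['C'],
                2300.57 * 2 * df['C'])
df['B'] = df['B'].fillna(pd.Series(fill, index=df.index))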