Pivot Tables with Pandas - Python

I have the following data saved as a pandas DataFrame:
   Animal  Day  age  Food   kg
        1    1    3    17  0.1
        1    1    3    22  0.7
        1    2    3    17  0.8
        2    2    7    15  0.1
With pivot I get the following:
output = (df.pivot(index=["Animal", "Food"], columns="Day", values="kg")
            .add_prefix("Day")
            .reset_index()
            .rename_axis(None, axis=1))
>>> output
   Animal  Food  Day1  Day2
0       1    17   0.1   0.8
1       1    22   0.7   NaN
2       2    15   NaN   0.1
However, I would like the age column (and other columns) to still be included.
It is also possible that the age value is not constant for a given animal; in that case it does not matter which age value is taken.
   Animal  Food  age  Day1  Day2
0       1    17    3   0.1   0.8
1       1    22    3   0.7   NaN
2       2    15    7   NaN   0.1
How do I need to change the code above?

IIUC, what you want is to pivot the weight (kg) but aggregate the age.
To my knowledge, you need to do both operations separately: one with pivot, the other with groupby (here I used first for the example, but this could be any aggregation), and then join the results:
(df.pivot(index=["Animal", "Food"],
          columns="Day",
          values="kg",
          )
   .add_prefix('Day')
   .join(df.groupby(['Animal', 'Food'])['age'].first())
   .reset_index()
)
I am adding a non-ambiguous example (here the age of Animal 1 changes on Day 2).
Input:
   Animal  Day  age  Food   kg
0       1    1    3    17  0.1
1       1    1    3    22  0.7
2       1    2    4    17  0.8
3       2    2    7    15  0.1
Output:
   Animal  Food  Day1  Day2  age
0       1    17   0.1   0.8    3
1       1    22   0.7   NaN    3
2       2    15   NaN   0.1    7

Use pivot and add the other columns to the index (note that pivot's arguments are keyword-only in recent pandas):
>>> df.pivot(index=df.columns[~df.columns.isin(['Day', 'kg'])], columns='Day', values='kg') \
        .add_prefix('Day').reset_index().rename_axis(columns=None)
   Animal  age  Food  Day1  Day2
0       1    3    17   0.1   0.8
1       1    3    22   0.7   NaN
2       2    7    15   NaN   0.1


assign values of one dataframe column to another dataframe column based on condition

I am trying to compare two dataframes on different columns and assign a value to one dataframe based on the match.
df1:
       date  value1  value2
   4/1/2021       A       1
   4/2/2021       B       2
   4/6/2021       C       3
   4/4/2021       D       4
   4/5/2021       E       5
   4/6/2021       F       6
   4/2/2021       G       7
df2:
       Date  percent
   4/1/2021      0.1
   4/2/2021      0.2
   4/6/2021      0.6
Output:
       date  value1  value2  per
   4/1/2021       A       1  0.1
   4/2/2021       B       2  0.2
   4/6/2021       C       3  0.6
   4/4/2021       D       4    0
   4/5/2021       E       5    0
   4/6/2021       F       6    0
   4/2/2021       G       7  0.2
Code 1:
df1['per'] = np.where(df1['date'] == df2['Date'], df2['percent'], 0)
Error:
ValueError: Can only compare identically-labeled Series objects
Note: I changed the column name df2['Date'] to df2['date'] and then tried merging.
Code 2:
new = pd.merge(df1, df2, on=['date'], how='inner')
Error:
ValueError: You are trying to merge on object and datetime64[ns] columns. If you wish to proceed you should use pd.concat
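The second error means the two merge keys have different dtypes (df1's date is a string/object column while df2's Date is datetime64). One way out, assuming both columns actually parse as dates, is to normalize the dtypes before merging; a minimal sketch with made-up two-row frames:

```python
import pandas as pd

# Hypothetical two-row versions of the frames, mirroring the dtype clash:
# df1 carries string dates, df2 carries real datetimes.
df1 = pd.DataFrame({"date": ["4/1/2021", "4/2/2021"],
                    "value1": ["A", "B"], "value2": [1, 2]})
df2 = pd.DataFrame({"Date": pd.to_datetime(["4/1/2021", "4/2/2021"]),
                    "percent": [0.1, 0.2]})

# Normalize both keys to datetime64 so the merge no longer mixes dtypes.
df1["date"] = pd.to_datetime(df1["date"])
out = (df1.merge(df2, left_on="date", right_on="Date", how="left")
          .drop(columns="Date"))
print(out)
```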
You can build a date-to-percent mapping from df2 and map it onto df1's dates:
df1['per'] = df1['date'].map(dict(zip(df2['Date'], df2['percent']))).fillna(0)
       date  value1  value2  per
0  4/1/2021       A       1  0.1
1  4/2/2021       B       2  0.2
2  4/6/2021       C       3  0.6
3  4/4/2021       D       4  0.0
4  4/5/2021       E       5  0.0
5  4/6/2021       F       6  0.6
6  4/2/2021       G       7  0.2
You could use pd.merge with a left join to keep all rows from df1 and bring over the date-matching rows from df2:
pd.merge(df1,df2,left_on='date',right_on='Date', how='left').fillna(0).drop('Date',axis=1)
Prints:
         date  value1  value2  percent
0  04/01/2021       A       1      0.1
1  04/02/2021       B       2      0.2
2  04/06/2021       C       3      0.6
3  04/04/2021       D       4      0.0
4  04/05/2021       E       5      0.0
5  04/06/2021       F       6      0.6
6  04/02/2021       G       7      0.2
*I think there is a typo in your penultimate row: percent should be 0.6, IIUC.

Pandas groupby and calculate 1/count

I have a pandas dataframe like so:
   id  val
    1   10
    1   20
    2   19
    2   21
    2   15
Now I want to group by id and compute a weight column as 1 / (count of rows in each group), so the final dataframe will be:
   id  val  weight
    1   10    0.50
    1   20    0.50
    2   19    0.33
    2   21    0.33
    2   15    0.33
What's the easiest way to achieve this?
Use GroupBy.transform with division:
df['weight'] = 1 / df.groupby('id')['id'].transform('size')
#alternative
#df['weight'] = df.groupby('id')['id'].transform('size').rdiv(1)
Or Series.map with Series.value_counts:
df['weight'] = 1 / df['id'].map(df['id'].value_counts())
#alternative
#df['weight'] = df['id'].map(df['id'].value_counts()).rdiv(1)
print(df)
   id  val    weight
0   1   10  0.500000
1   1   20  0.500000
2   2   19  0.333333
3   2   21  0.333333
4   2   15  0.333333
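Both approaches produce the same result; a quick self-contained check on the sample data:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2, 2, 2], "val": [10, 20, 19, 21, 15]})

# Method 1: broadcast the group size back onto each row.
w_transform = 1 / df.groupby("id")["id"].transform("size")
# Method 2: look up each id's frequency via value_counts.
w_map = 1 / df["id"].map(df["id"].value_counts())

df["weight"] = w_transform
print(df)
```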

pandas number of items in one column per value in another column

I have two dataframes. Say, for example, frame 1 is the student info:
   student_id  course
            1       a
            2       b
            3       c
            4       a
            5       f
            6       f
Frame 2 records each interaction a student has with a program:
   student_id  day  number_of_clicks
            1    4                60
            1    5                34
            1    7                87
            2    3                33
            2    4                29
            2    8               213
            2    9                46
            3    2               103
3 2 103
I am trying to add the information from frame 2 to frame 1, i.e. for each student I would like to know the number of different days they accessed the database and the sum of all their clicks on those days, e.g.:
   student_id  course  no_days  total_clicks
            1       a        3           181
            2       b        4           321
            3       c        1           103
            4       a        0             0
            5       f        0             0
            6       f        0             0
I have tried to do this with groupby, but I could not add the information back into frame 1 or figure out how to sum the number of clicks. Any ideas?
First we aggregate your df2 to the desired information using GroupBy.agg. Then we merge that information into df1:
agg = df2.groupby('student_id').agg(
    no_days=('day', 'size'),
    total_clicks=('number_of_clicks', 'sum')
)
df1 = df1.merge(agg, on='student_id', how='left').fillna(0)
   student_id  course  no_days  total_clicks
0           1       a      3.0         181.0
1           2       b      4.0         321.0
2           3       c      1.0         103.0
3           4       a      0.0           0.0
4           5       f      0.0           0.0
5           6       f      0.0           0.0
Or, if you like one-liners, here is the same method in a single, more SQL-like expression:
df1.merge(
    df2.groupby('student_id').agg(
        no_days=('day', 'size'),
        total_clicks=('number_of_clicks', 'sum')
    ),
    on='student_id',
    how='left'
).fillna(0)
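One caveat: no_days and total_clicks come back as floats because the left join introduces NaN for students with no interactions. If integer columns are preferred, an explicit astype after the fill is one option (a sketch; fillna's downcast='infer' is another route):

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [1, 2, 3, 4, 5, 6],
                    "course": ["a", "b", "c", "a", "f", "f"]})
df2 = pd.DataFrame({"student_id": [1, 1, 1, 2, 2, 2, 2, 3],
                    "day": [4, 5, 7, 3, 4, 8, 9, 2],
                    "number_of_clicks": [60, 34, 87, 33, 29, 213, 46, 103]})

agg = df2.groupby("student_id").agg(
    no_days=("day", "size"),
    total_clicks=("number_of_clicks", "sum"),
)
# fillna leaves the joined columns as float; cast them back to int explicitly.
out = (df1.merge(agg, on="student_id", how="left")
          .fillna(0)
          .astype({"no_days": int, "total_clicks": int}))
print(out)
```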
Use merge, fillna the null values, then aggregate with groupby.agg (note that as_index=False already leaves the group keys as columns, so no reset_index is needed):
import numpy as np

df = df1.merge(df2, how='left').fillna(0, downcast='infer') \
        .groupby(['student_id', 'course'], as_index=False) \
        .agg({'day': np.count_nonzero, 'number_of_clicks': np.sum})
print(df)
print(df)
   student_id  course  day  number_of_clicks
0           1       a    3               181
1           2       b    4               321
2           3       c    1               103
3           4       a    0                 0
4           5       f    0                 0
5           6       f    0                 0

how to add a mean column for the groupby movieID?

I have a df like below
   userId  movieId  rating
0       1       31     2.0
1       2       10     4.0
2       2       17     5.0
3       2       39     5.0
4       2       47     4.0
5       3       31     3.0
6       3       10     2.0
I need to add two columns: mean, the mean rating for each movie, and diff, the difference between rating and mean.
Please note that movieId can be repeated because different users may rate the same movie; here rows 0 and 5 are for movieId 31, and rows 1 and 6 are for movieId 10.
   userId  movieId  rating  mean  diff
0       1       31     2.0   2.5  -0.5
1       2       10     4.0   3.0   1.0
2       2       17     5.0   5.0   0.0
3       2       39     5.0   5.0   0.0
4       2       47     4.0   4.0   0.0
5       3       31     3.0   2.5   0.5
6       3       10     2.0   3.0  -1.0
Here is some of my code, which calculates the mean but collapses the frame to one row per movie:
df = df.groupby('movieId')['rating'].agg(['count','mean']).reset_index()
You can use transform to keep the same number of rows when calculating mean with groupby. Calculating the difference is straightforward from that:
df['mean'] = df.groupby('movieId')['rating'].transform('mean')
df['diff'] = df['rating'] - df['mean']
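A self-contained sketch of this on the question's data:

```python
import pandas as pd

df = pd.DataFrame({"userId": [1, 2, 2, 2, 2, 3, 3],
                   "movieId": [31, 10, 17, 39, 47, 31, 10],
                   "rating": [2.0, 4.0, 5.0, 5.0, 4.0, 3.0, 2.0]})

# transform broadcasts each movie's mean back onto every row of that movie.
df["mean"] = df.groupby("movieId")["rating"].transform("mean")
df["diff"] = df["rating"] - df["mean"]
print(df)
```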

Pandas Divide two data-frames based on similar keys

Suppose I have a df with values as:
user_id  sub_id  score
     39      16      1
     39       4      1
     40       1      3
     40       2      3
     40       3      3
and
user_id  score
     39      2
     40     30
So I want to divide the score columns based on the key user_id, so that the result is:
user_id  sub_id  score
     39      16    0.5
     39       4    0.5
     40       1    0.1
     40       2    0.1
     40       3    0.1
I have tried the div operation, but it is not working as I need: it only divides the first appearance and gives me NaN everywhere else.
Is there any direct pandas operation or do I need to group both df's and then do the division?
Thanks
I think you need to divide using div by a Series created with map:
df1['score'] = df1['score'].div(df1['user_id'].map(df2.set_index('user_id')['score']))
print(df1)
   user_id  sub_id  score
0       39      16    0.5
1       39       4    0.5
2       40       1    0.1
3       40       2    0.1
4       40       3    0.1
Detail:
print(df1['user_id'].map(df2.set_index('user_id')['score']))
0     2
1     2
2    30
3    30
4    30
Name: user_id, dtype: int64
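End to end, a runnable sketch of this map-and-divide step on the sample frames (a merge on user_id followed by an ordinary column division would work equally well):

```python
import pandas as pd

df1 = pd.DataFrame({"user_id": [39, 39, 40, 40, 40],
                    "sub_id": [16, 4, 1, 2, 3],
                    "score": [1, 1, 3, 3, 3]})
df2 = pd.DataFrame({"user_id": [39, 40], "score": [2, 30]})

# Look up each row's per-user total from df2, then divide elementwise.
totals = df1["user_id"].map(df2.set_index("user_id")["score"])
df1["score"] = df1["score"].div(totals)
print(df1)
```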
