Converting from group by output to separate columns in pandas - python

I have a grouped by pandas dataframe that looks like so:
id type count
1 A 4
1 B 5
2 A 3
3 C 0
3 B 6
and I am hoping to get an output:
id A B C
1 4 5 0
2 3 0 0
3 0 6 0
I feel like there is a straightforward solution to this that I am not seeing.

Use pivot, then fill the missing id/type combinations with 0:
df.pivot(index='id', columns='type', values='count').fillna(0).astype(int)
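A self-contained sketch rebuilding the sample data above (pivot_table with fill_value=0 is an equivalent alternative that also tolerates duplicate id/type pairs):
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 3, 3],
                   'type': ['A', 'B', 'A', 'C', 'B'],
                   'count': [4, 5, 3, 0, 6]})

# one row per id, one column per type; missing pairs become NaN, then 0
out = (df.pivot(index='id', columns='type', values='count')
         .fillna(0)
         .astype(int)
         .reset_index())
out.columns.name = None  # drop the leftover 'type' axis label
print(out)
#    id  A  B  C
# 0   1  4  5  0
# 1   2  3  0  0
# 2   3  0  6  0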

Related

Search and update values in other dataframe for specific columns

I have two different dataframes in pandas.
First:
A B C D VALUE
1 2 3 5 0
1 5 3 2 0
2 5 3 2 0
Second:
A B C D Value
5 3 3 2 1
1 5 4 3 1
I want the A and B column values of the first dataframe to be looked up in the second dataframe. If the A and B values match, update the VALUE column: search only two columns in the other dataframe and update only one column. It is essentially the UPDATE ... JOIN process known from SQL.
Result:
A B C D VALUE
1 2 3 5 0
1 5 3 2 1
2 5 3 2 0
In the result above, only the VALUE of the matching row (the second row) changes. Despite my attempts, I could not succeed: I only want the VALUE column of the matches to change, but my attempts also change A and B.
You can use a merge:
cols = ['A', 'B']
df1['VALUE'] = (df2.merge(df1[cols], on=cols, how='right')['Value']
                   .fillna(df1['VALUE'], downcast='infer'))
output:
A B C D VALUE
0 1 2 3 5 0
1 1 5 3 2 1
2 2 5 3 2 0
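A self-contained sketch of the same idea, rebuilding the sample frames above (fillna's downcast argument is deprecated in recent pandas, so this version casts explicitly instead):
import pandas as pd

df1 = pd.DataFrame({'A': [1, 1, 2], 'B': [2, 5, 5], 'C': [3, 3, 3],
                    'D': [5, 2, 2], 'VALUE': [0, 0, 0]})
df2 = pd.DataFrame({'A': [5, 1], 'B': [3, 5], 'C': [3, 4],
                    'D': [2, 3], 'Value': [1, 1]})

cols = ['A', 'B']
# right merge keeps one row per df1 row; 'Value' is NaN where A/B have no match
matched = df2.merge(df1[cols], on=cols, how='right')['Value']
# fall back to the old VALUE where there was no match
df1['VALUE'] = matched.fillna(df1['VALUE']).astype(int)
print(df1)
#    A  B  C  D  VALUE
# 0  1  2  3  5      0
# 1  1  5  3  2      1
# 2  2  5  3  2      0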

Pandas join two DataFrames where one is a pivot table

Say I have two DataFrames that look like the following:
df1:
movieID 1 2 3 4
userID
0 2 0 0 2
1 1 1 4 0
2 0 2 3 0
3 1 2 0 0
and
df2:
userID movieID
0 0 2
1 0 3
2 0 4
3 1 3
What I am trying to accomplish is joining the two so that df2 contains a new column with the associated rating of a user for a specific movie. Thus df2 in this example would become:
df2:
userID movieID rating
0 0 2 0
1 0 3 0
2 0 4 2
3 1 3 4
I don't believe that simply reformatting df2 to have the same shape as df1 would work, because there is no guarantee that it will contain all userIDs or movieIDs. I've looked into the merge function, but I'm confused about how to set the how and on parameters in this scenario. If anyone can explain how I could achieve this, it would be greatly appreciated.
You can apply() by row to index df1.loc[row.userID, row.movieID].
Just make sure the dtype of df1.columns matches df2.movieID, and the dtype of df1.index matches df2.userID.
df1.columns = df1.columns.astype(df2.movieID.dtype)
df1.index = df1.index.astype(df2.userID.dtype)
df2['rating'] = df2.apply(lambda row: df1.loc[row.userID, row.movieID], axis=1)
# userID movieID rating
# 0 0 2 0
# 1 0 3 0
# 2 0 4 2
# 3 1 3 4
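A vectorized alternative (a sketch, assuming df1 and df2 as above with matching dtypes): reshape the rating matrix to long form with stack and merge on both keys, which avoids the row-wise apply:
# one (userID, movieID, rating) row per cell of the pivot table
long = (df1.stack()
           .rename('rating')
           .rename_axis(['userID', 'movieID'])
           .reset_index())
df2 = df2.merge(long, on=['userID', 'movieID'], how='left')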

Filtering a database based on the content of another

So I need to assign a new dataframe of features without protected attributes. I'm provided two .csv files: one has all the information for each instance, and the other labels each column with 1 if the attribute is a protected feature, 2 if the attribute is the value to be predicted, and 0 otherwise.
I'm not entirely certain how to go about this since I'm not very well versed, but from my understanding it would be something similar to
df = pd.read_csv("x.csv")
pdf = pd.read_csv("y.csv")
newDf = df.iloc[? && pdf[cols?]]
So, given 2 different dataframes with the same labels:
A B C
0 7 3 1
1 8 3 1
2 9 2 1
A B C
0 0 1 1
Expected output would be:
A
0 7
1 8
2 9
You need to use any with negation to find all the columns with a 0 value in the second dataframe (df2), and use that as the list of columns to fetch from the first (df1):
df1[list(df1.columns.to_series().loc[~df2.any()])]
Output:
A
0 7
1 8
2 9
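The same selection can be written more directly with a boolean column mask (a minimal sketch, rebuilding the two sample frames):
import pandas as pd

df1 = pd.DataFrame({'A': [7, 8, 9], 'B': [3, 3, 2], 'C': [1, 1, 1]})
df2 = pd.DataFrame({'A': [0], 'B': [1], 'C': [1]})

# keep only the columns whose flag in df2 is zero everywhere
result = df1.loc[:, ~df2.any()]
print(result)
#    A
# 0  7
# 1  8
# 2  9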
Below is the output as per my understanding of your question; tell me if I am mistaken.
df1:
A B C D
0 7 3 1 6
1 8 3 1 4
2 9 1 1 1
df2:
A B C D
0 0 1 1 0
df3 = pd.DataFrame()
for i in df1.columns:
    if df2[i][0] == 0:  # keep columns whose flag row in df2 is 0
        df3[i] = df1[i]
Output:
df3:
A D
0 7 6
1 8 4
2 9 1

Replace string values in pandas to their count

I'm trying to calculate the count of some values in a data frame, like
user_id event_type
1 a
1 a
1 b
2 a
2 b
2 c
2 c
and I want to get table like
user_id event_type event_type_count
1 a 2
1 a 2
1 b 1
2 a 1
2 b 1
2 c 2
2 c 2
In other words, I want to replace each value with the count of that value in the data frame.
I've tried using df.join(pd.crosstab)..., but I get a large data frame with many columns.
Which way is better to solve this problem?
Use GroupBy.transform by both columns with GroupBy.size:
df['event_type_count'] = df.groupby(['user_id','event_type'])['event_type'].transform('size')
print (df)
user_id event_type event_type_count
0 1 a 2
1 1 a 2
2 1 b 1
3 2 a 1
4 2 b 1
5 2 c 2
6 2 c 2
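For reference, a runnable version of the same approach (a sketch rebuilding the sample data; transform('size') computes each group's row count and broadcasts it back to every row of that group, which is why the result lines up with the original frame):
import pandas as pd

df = pd.DataFrame({'user_id': [1, 1, 1, 2, 2, 2, 2],
                   'event_type': ['a', 'a', 'b', 'a', 'b', 'c', 'c']})

# number of rows in each (user_id, event_type) group, aligned to df's index
df['event_type_count'] = (df.groupby(['user_id', 'event_type'])['event_type']
                            .transform('size'))
print(df)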

Python pandas nonzero cumsum

I want to apply cumsum to a dataframe in pandas in Python, but without the zeros. Simply put, I want to leave the zeros alone and apply cumsum to the rest of the dataframe. Suppose I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 0, 1],
                   'b': [2, 5, 0, 0],
                   'c': [0, 1, 2, 5]})
a b c
0 1 2 0
1 2 5 1
2 0 0 2
3 1 0 5
and the result should be
a b c
0 1 2 0
1 3 7 1
2 0 0 3
3 4 0 8
Any ideas how to do that while avoiding loops? In R there is the ave function, but I'm very new to Python and I don't know the equivalent here.
You can mask the df so that you only overwrite the non-zero cells:
df[df != 0] = df.cumsum()
print(df)
a b c
0 1 2 0
1 3 7 1
2 0 0 3
3 4 0 8
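An equivalent non-mutating form (a sketch; the zeros contribute nothing to the running sum, so df.cumsum() already carries the right totals past them, and where just zeroes those cells back out):
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 0, 1],
                   'b': [2, 5, 0, 0],
                   'c': [0, 1, 2, 5]})

# keep the running sums only in the originally non-zero cells; 0 elsewhere
result = df.cumsum().where(df != 0, 0)
print(result)
#    a  b  c
# 0  1  2  0
# 1  3  7  1
# 2  0  0  3
# 3  4  0  8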
