I have a dataframe df with columns [ShowOnAir, AfterPremier, ID, EverOnAir].
My condition is:
for the first element of each groupby(df.ID) group,
if (df.ShowOnAir == 0 or df.AfterPremier == 0), then EverOnAir = 0,
else EverOnAir = 1.
I am not sure how to compare the first element of each group with the elements of the original dataframe df.
I would really appreciate some help with this.
Thank you
You can get a row number for your groups by using cumsum, then you can do your logic on the resulting dataframe:
df = pd.DataFrame([[1],[1],[2],[2],[2]])
df['n']=1
df.groupby(0).cumsum()
n
0 1
1 2
2 1
3 2
4 3
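A note in passing: groupby.cumcount gives this within-group row number directly, without the helper column. A minimal sketch on the same frame:

```python
import pandas as pd

df = pd.DataFrame([[1], [1], [2], [2], [2]], columns=[0])

# cumcount numbers the rows within each group starting at 0;
# adding 1 reproduces the helper-column cumsum above
row_num = df.groupby(0).cumcount() + 1
print(row_num.tolist())  # [1, 2, 1, 2, 3]
```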
You can first create a new column EverOnAir filled with 1. Then groupby by ID and apply a custom function f, which picks the first element of each column with iat and fills in 0 where the condition holds:
print(df)
ShowOnAir AfterPremier ID
0 0 0 a
1 0 1 a
2 1 1 a
3 1 1 b
4 1 0 b
5 0 0 b
6 0 1 c
7 1 0 c
8 0 0 c
def f(x):
    x['EverOnAir'].iat[0] = np.where((x['ShowOnAir'].iat[0] == 0) |
                                     (x['AfterPremier'].iat[0] == 0), 0, 1)
    return x

df['EverOnAir'] = 1
print(df.groupby('ID').apply(f))
ShowOnAir AfterPremier ID EverOnAir
0 0 0 a 0
1 0 1 a 1
2 1 1 a 1
3 1 1 b 1
4 1 0 b 1
5 0 0 b 1
6 0 1 c 0
7 1 0 c 1
8 0 0 c 1
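The same result can be had without apply: duplicated marks the first row of each ID group, so a boolean mask can set the 0s directly. A sketch on the sample frame above:

```python
import pandas as pd

df = pd.DataFrame({'ShowOnAir':    [0, 0, 1, 1, 1, 0, 0, 1, 0],
                   'AfterPremier': [0, 1, 1, 1, 0, 0, 1, 0, 0],
                   'ID': list('aaabbbccc')})

# ~duplicated() is True only on the first row of each ID group
first = ~df['ID'].duplicated()
df['EverOnAir'] = 1
df.loc[first & ((df['ShowOnAir'] == 0) | (df['AfterPremier'] == 0)),
       'EverOnAir'] = 0
print(df['EverOnAir'].tolist())  # [0, 1, 1, 1, 1, 1, 0, 1, 1]
```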
How do I do this operation using pandas?
Initial Df:
A B C D
0 0 1 0 0
1 0 1 0 0
2 0 0 1 1
3 0 1 0 1
4 1 1 0 0
5 1 1 1 0
Final Df:
A B C D Param
0 0 1 0 0 1
1 0 1 0 0 0
2 0 0 1 1 2
3 0 1 0 1 0
4 1 1 0 0 1
5 1 1 1 0 0
Basically, Param is the count of 1s in that row that appear for the first time in their own column.
Example:
index 0 : the 1 in column B appears for the first time in its column, hence Param = 1
index 1 : no 1 appears for the first time in its own column, hence Param = 0
index 2 : the 1s in columns C and D appear for the first time in their columns, hence Param = 2
index 3 : no 1 appears for the first time in its own column, hence Param = 0
index 4 : the 1 in column A appears for the first time in its column, hence Param = 1
index 5 : no 1 appears for the first time in its own column, hence Param = 0
You can do it with idxmax and value_counts:
df['Param']=df.idxmax().value_counts().reindex(df.index,fill_value=0)
df
A B C D Param
0 0 1 0 0 1
1 0 1 0 0 0
2 0 0 1 1 2
3 0 1 0 1 0
4 1 1 0 0 1
5 1 1 1 0 0
You can check for duplicated values, multiply with df and sum:
df['Param'] = df.apply(lambda x: ~x.duplicated()).mul(df).sum(1)
Output:
A B C D Param
0 0 1 0 0 1
1 0 1 0 0 0
2 0 0 1 1 2
3 0 1 0 1 0
4 1 1 0 0 1
5 1 1 1 0 0
Assuming these are integers, you can use cumsum() twice to isolate the first occurrence of 1.
df2 = (df.cumsum() > 0).cumsum() == 1
df['Param'] = df2.sum(axis = 1)
print(df)
If df elements are strings, you should first convert them to integers.
df = df.astype(int)
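Assembled into a runnable sketch of the double-cumsum idea on the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 0, 1, 1],
                   'B': [1, 1, 0, 1, 1, 1],
                   'C': [0, 0, 1, 0, 0, 1],
                   'D': [0, 0, 1, 1, 0, 0]})

# first cumsum > 0 marks everything from the first 1 onward in each column;
# a second cumsum == 1 then isolates exactly that first occurrence
first_one = (df.cumsum() > 0).cumsum() == 1
df['Param'] = first_one.sum(axis=1)
print(df['Param'].tolist())  # [1, 0, 2, 0, 1, 0]
```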
I have a df in python that looks something like this:
'A'
0
1
0
0
1
1
1
1
0
I want to create another column that cumulatively adds the 1s from column A, and starts over whenever the value in column A becomes 0 again. So the desired output:
'A' 'B'
0 0
1 1
0 0
0 0
1 1
1 2
1 3
1 4
0 0
This is what I am trying, but it's just replicating column A:
df.B[df.A ==0] = 0
df.B[df.A !=0] = df.A + df.B.shift(1)
You can build group labels with cumsum, then do cumcount within each group:
df['B']=(df.groupby(df.A.eq(0).cumsum()).cumcount()).where(df.A==1,0)
Out[81]:
0 0
1 1
2 0
3 0
4 1
5 2
6 3
7 4
8 0
dtype: int64
Use shift with ne and groupby.cumsum:
df['B'] = df.groupby(df['A'].shift().ne(df['A']).cumsum())['A'].cumsum()
print(df)
A B
0 0 0
1 1 1
2 0 0
3 0 0
4 1 1
5 1 2
6 1 3
7 1 4
8 0 0
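Both answers build group labels out of column A; the second approach, assembled as a self-contained sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 0, 0, 1, 1, 1, 1, 0]})

# every change in A starts a new group; a cumulative sum of A
# within each group then counts consecutive 1s and resets at 0
group = df['A'].shift().ne(df['A']).cumsum()
df['B'] = df.groupby(group)['A'].cumsum()
print(df['B'].tolist())  # [0, 1, 0, 0, 1, 2, 3, 4, 0]
```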
I have 2 columns whose values should determine an update to a third column, for only 1 row.
I have-
df = pd.DataFrame({'A':[1,1,2,3,4,4],
'B':[2,2,4,3,2,1],
'C':[0] * 6})
print (df)
A B C
0 1 2 0
1 1 2 0
2 2 4 0
3 3 3 0
4 4 2 0
5 4 1 0
If A = 1 and B = 2, then only the first matching row should get C = 1, like this:
print (df)
A B C
0 1 2 1
1 1 2 0
2 2 4 0
3 3 3 0
4 4 2 0
5 4 1 0
Right now I have used
df.loc[(df['A']==1) & (df['B']==2)].iloc[[0]].loc['C'] = 1
but it doesn't change the dataframe.
Solution if the mask always matches at least one row:
Create a boolean mask and set the value at the first True index, found with idxmax:
mask = (df['A']==1) & (df['B']==2)
df.loc[mask.idxmax(), 'C'] = 1
But if no value matches, idxmax returns the index of the first False value, so add an if-else:
mask = (df['A']==1) & (df['B']==2)
idx = mask.idxmax() if mask.any() else np.repeat(False, len(df))
df.loc[idx, 'C'] = 1
print (df)
A B C
0 1 2 1
1 1 2 0
2 2 4 0
3 3 3 0
4 4 2 0
5 4 1 0
mask = (df['A']==10) & (df['B']==20)
idx = mask.idxmax() if mask.any() else np.repeat(False, len(df))
df.loc[idx, 'C'] = 1
print (df)
A B C
0 1 2 0
1 1 2 0
2 2 4 0
3 3 3 0
4 4 2 0
5 4 1 0
Using pd.Series.cumsum to ensure only the first matching row is selected:
mask = df['A'].eq(1) & df['B'].eq(2)
df.loc[mask & mask.cumsum().eq(1), 'C'] = 1
print(df)
A B C
0 1 2 1
1 1 2 0
2 2 4 0
3 3 3 0
4 4 2 0
5 4 1 0
If performance is a concern, see Efficiently return the index of the first value satisfying condition in array.
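In that spirit, a numpy-based sketch of the same one-row update, which sidesteps the idxmax edge case entirely:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 4, 4],
                   'B': [2, 2, 4, 3, 2, 1],
                   'C': [0] * 6})

mask = (df['A'] == 1) & (df['B'] == 2)
# np.flatnonzero returns the positions of all True values;
# taking the first (if any) updates exactly one row
hits = np.flatnonzero(mask.to_numpy())
if hits.size:
    df.iloc[hits[0], df.columns.get_loc('C')] = 1
print(df['C'].tolist())  # [1, 0, 0, 0, 0, 0]
```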
I am trying to export a cumulative count based on two columns in a pandas df.
An example is the df below. I'm trying to produce a count based on Value and Count. So when the count increases, I want to attribute that to the adjacent value.
import pandas as pd
d = ({
'Value' : ['A','A','B','C','D','A','B','A'],
'Count' : [0,1,1,2,3,3,4,5],
})
df = pd.DataFrame(d)
I have used this:
for val in ['A','B','C','D']:
cond = df.Value.eq(val) & df.Count.eq(int)
df.loc[cond, 'Count_' + val] = cond[cond].cumsum()
If I replace int with a specific number it returns the count, but I need this to work for any number, since the Count column keeps increasing.
My intended output is:
Value Count A_Count B_Count C_Count D_Count
0 A 0 0 0 0 0
1 A 1 1 0 0 0
2 B 1 1 0 0 0
3 C 2 1 0 1 0
4 D 3 1 0 1 1
5 A 3 1 0 1 1
6 B 4 1 1 1 1
7 A 5 2 1 1 1
So the count increases on the second row, so Value A gets 1. The count increases again on row 4, the first time for Value C, so C gets 1. The same happens on rows 5 and 7. The count increases on row 8, so A becomes 2.
You could use str.get_dummies with diff and cumsum:
In [262]: df['Value'].str.get_dummies().multiply(df['Count'].diff().gt(0), axis=0).cumsum()
Out[262]:
A B C D
0 0 0 0 0
1 1 0 0 0
2 1 0 0 0
3 1 0 1 0
4 1 0 1 1
5 1 0 1 1
6 1 1 1 1
7 2 1 1 1
Joined back onto the original frame, that is:
In [266]: df.join(df['Value'].str.get_dummies()
.multiply(df['Count'].diff().gt(0), axis=0)
.cumsum().add_suffix('_Count'))
Out[266]:
Value Count A_Count B_Count C_Count D_Count
0 A 0 0 0 0 0
1 A 1 1 0 0 0
2 B 1 1 0 0 0
3 C 2 1 0 1 0
4 D 3 1 0 1 1
5 A 3 1 0 1 1
6 B 4 1 1 1 1
7 A 5 2 1 1 1
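Assembled as a self-contained script on the question's data, using the same idioms as above:

```python
import pandas as pd

df = pd.DataFrame({'Value': ['A', 'A', 'B', 'C', 'D', 'A', 'B', 'A'],
                   'Count': [0, 1, 1, 2, 3, 3, 4, 5]})

# one indicator column per value, kept only on rows where Count rose,
# then accumulated down the frame
counts = (df['Value'].str.get_dummies()
            .multiply(df['Count'].diff().gt(0), axis=0)
            .cumsum()
            .add_suffix('_Count'))
out = df.join(counts)
print(out['A_Count'].tolist())  # [0, 1, 1, 1, 1, 1, 1, 2]
```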
In pandas I have the following data frame:
a b
0 0
1 1
2 1
0 0
1 0
2 1
Now I want to do the following:
create a new column c, and for each row where a = 0 fill c with 1. Then c should keep being filled with 1s up to and including the first following row where b = 1 (and this is where I am stuck), so the output should look like this:
a b c
0 0 1
1 1 1
2 1 0
0 0 1
1 0 1
2 1 1
Thanks!
It seems you need:
df['c'] = df.groupby(df.a.eq(0).cumsum())['b'].cumsum().le(1).astype(int)
print (df)
a b c
0 0 0 1
1 1 1 1
2 2 1 0
3 0 0 1
4 1 0 1
5 2 1 1
Detail:
print (df.a.eq(0).cumsum())
0 1
1 1
2 1
3 2
4 2
5 2
Name: a, dtype: int32
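Put together as a runnable sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 0, 1, 2],
                   'b': [0, 1, 1, 0, 0, 1]})

# each a == 0 starts a new block; within a block, c stays 1 until
# the running total of b exceeds 1 (i.e. after the first b == 1)
block = df['a'].eq(0).cumsum()
df['c'] = df.groupby(block)['b'].cumsum().le(1).astype(int)
print(df['c'].tolist())  # [1, 1, 0, 1, 1, 1]
```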