I have a dataframe df with columns [ShowOnAir, AfterPremier, ID, EverOnAir].
My condition is:
for the first element of each groupby(df.ID) group,
if (df.ShowOnAir == 0 or df.AfterPremier == 0), then EverOnAir = 0,
else EverOnAir = 1.
I am not sure how to compare the first element of each group with the elements of the original dataframe df.
I would really appreciate some help with this.
Thank you
You can get a row number for your groups by using cumsum, then you can do your logic on the resulting dataframe:
df = pd.DataFrame([[1],[1],[2],[2],[2]])
df['n']=1
df.groupby(0).cumsum()
n
0 1
1 2
2 1
3 2
4 3
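A note in passing: groupby.cumcount gives this within-group row number directly, without the helper column. A minimal sketch on the same frame:

```python
import pandas as pd

df = pd.DataFrame([[1], [1], [2], [2], [2]], columns=[0])

# cumcount numbers the rows within each group starting at 0;
# adding 1 reproduces the helper-column cumsum above
row_num = df.groupby(0).cumcount() + 1
print(row_num.tolist())  # [1, 2, 1, 2, 3]
```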
You can first create a new column EverOnAir filled with 1. Then groupby by ID and apply a custom function f, which picks the first element of each column with iat and fills in 0 where the condition holds:
print(df)
ShowOnAir AfterPremier ID
0 0 0 a
1 0 1 a
2 1 1 a
3 1 1 b
4 1 0 b
5 0 0 b
6 0 1 c
7 1 0 c
8 0 0 c
def f(x):
    x['EverOnAir'].iat[0] = np.where((x['ShowOnAir'].iat[0] == 0) |
                                     (x['AfterPremier'].iat[0] == 0), 0, 1)
    return x

df['EverOnAir'] = 1
print(df.groupby('ID').apply(f))
ShowOnAir AfterPremier ID EverOnAir
0 0 0 a 0
1 0 1 a 1
2 1 1 a 1
3 1 1 b 1
4 1 0 b 1
5 0 0 b 1
6 0 1 c 0
7 1 0 c 1
8 0 0 c 1
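The same result can be had without apply: duplicated marks the first row of each ID group, so a boolean mask can set the 0s directly. A sketch on the sample frame above:

```python
import pandas as pd

df = pd.DataFrame({'ShowOnAir':    [0, 0, 1, 1, 1, 0, 0, 1, 0],
                   'AfterPremier': [0, 1, 1, 1, 0, 0, 1, 0, 0],
                   'ID': list('aaabbbccc')})

# ~duplicated() is True only on the first row of each ID group
first = ~df['ID'].duplicated()
df['EverOnAir'] = 1
df.loc[first & ((df['ShowOnAir'] == 0) | (df['AfterPremier'] == 0)),
       'EverOnAir'] = 0
print(df['EverOnAir'].tolist())  # [0, 1, 1, 1, 1, 1, 0, 1, 1]
```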
How do I do this operation using pandas?
Initial Df:
A B C D
0 0 1 0 0
1 0 1 0 0
2 0 0 1 1
3 0 1 0 1
4 1 1 0 0
5 1 1 1 0
Final Df:
A B C D Param
0 0 1 0 0 1
1 0 1 0 0 0
2 0 0 1 1 2
3 0 1 0 1 0
4 1 1 0 0 1
5 1 1 1 0 0
Basically, Param is the count of 1s in that row that appear for the first time in their own column.
Example:
index 0 : the 1 in column B appears for the first time in its column, hence Param = 1
index 1 : no 1 appears for the first time in its own column, hence Param = 0
index 2 : the 1s in columns C and D appear for the first time in their columns, hence Param = 2
index 3 : no 1 appears for the first time in its own column, hence Param = 0
index 4 : the 1 in column A appears for the first time in its column, hence Param = 1
index 5 : no 1 appears for the first time in its own column, hence Param = 0
You can do it with idxmax and value_counts:
df['Param']=df.idxmax().value_counts().reindex(df.index,fill_value=0)
df
A B C D Param
0 0 1 0 0 1
1 0 1 0 0 0
2 0 0 1 1 2
3 0 1 0 1 0
4 1 1 0 0 1
5 1 1 1 0 0
You can check for duplicated values, multiply with df and sum:
df['Param'] = df.apply(lambda x: ~x.duplicated()).mul(df).sum(1)
Output:
A B C D Param
0 0 1 0 0 1
1 0 1 0 0 0
2 0 0 1 1 2
3 0 1 0 1 0
4 1 1 0 0 1
5 1 1 1 0 0
Assuming these are integers, you can use cumsum() twice to isolate the first occurrence of 1.
df2 = (df.cumsum() > 0).cumsum() == 1
df['Param'] = df2.sum(axis = 1)
print(df)
If df elements are strings, you should first convert them to integers.
df = df.astype(int)
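Assembled into a runnable sketch of the double-cumsum idea on the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 0, 1, 1],
                   'B': [1, 1, 0, 1, 1, 1],
                   'C': [0, 0, 1, 0, 0, 1],
                   'D': [0, 0, 1, 1, 0, 0]})

# first cumsum > 0 marks everything from the first 1 onward in each column;
# a second cumsum == 1 then isolates exactly that first occurrence
first_one = (df.cumsum() > 0).cumsum() == 1
df['Param'] = first_one.sum(axis=1)
print(df['Param'].tolist())  # [1, 0, 2, 0, 1, 0]
```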
I have a df in python that looks something like this:
'A'
0
1
0
0
1
1
1
1
0
I want to create another column that cumulatively adds the 1s from column A, and starts over whenever the value in column A becomes 0 again. So the desired output:
'A' 'B'
0 0
1 1
0 0
0 0
1 1
1 2
1 3
1 4
0 0
This is what I am trying, but it's just replicating column A:
df.B[df.A ==0] = 0
df.B[df.A !=0] = df.A + df.B.shift(1)
You can build group labels with cumsum, then do cumcount within each group:
df['B']=(df.groupby(df.A.eq(0).cumsum()).cumcount()).where(df.A==1,0)
Out[81]:
0 0
1 1
2 0
3 0
4 1
5 2
6 3
7 4
8 0
dtype: int64
Use shift with ne and groupby.cumsum:
df['B'] = df.groupby(df['A'].shift().ne(df['A']).cumsum())['A'].cumsum()
print(df)
A B
0 0 0
1 1 1
2 0 0
3 0 0
4 1 1
5 1 2
6 1 3
7 1 4
8 0 0
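Both answers build group labels out of column A; the second approach, assembled as a self-contained sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 0, 0, 1, 1, 1, 1, 0]})

# every change in A starts a new group; a cumulative sum of A
# within each group then counts consecutive 1s and resets at 0
group = df['A'].shift().ne(df['A']).cumsum()
df['B'] = df.groupby(group)['A'].cumsum()
print(df['B'].tolist())  # [0, 1, 0, 0, 1, 2, 3, 4, 0]
```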
I have 2 columns whose values should determine an update to a third column, for only 1 row.
I have-
df = pd.DataFrame({'A':[1,1,2,3,4,4],
'B':[2,2,4,3,2,1],
'C':[0] * 6})
print (df)
A B C
0 1 2 0
1 1 2 0
2 2 4 0
3 3 3 0
4 4 2 0
5 4 1 0
If A = 1 and B = 2, then only the first matching row should get C = 1, like this:
print (df)
A B C
0 1 2 1
1 1 2 0
2 2 4 0
3 3 3 0
4 4 2 0
5 4 1 0
Right now I have used
df.loc[(df['A']==1) & (df['B']==2)].iloc[[0]].loc['C'] = 1
but it doesn't change the dataframe.
Solution if the mask always matches at least one row:
Create a boolean mask and set the value at the first True index, found with idxmax:
mask = (df['A']==1) & (df['B']==2)
df.loc[mask.idxmax(), 'C'] = 1
But if no value matches, idxmax returns the index of the first False value, so add an if-else:
mask = (df['A']==1) & (df['B']==2)
idx = mask.idxmax() if mask.any() else np.repeat(False, len(df))
df.loc[idx, 'C'] = 1
print (df)
A B C
0 1 2 1
1 1 2 0
2 2 4 0
3 3 3 0
4 4 2 0
5 4 1 0
mask = (df['A']==10) & (df['B']==20)
idx = mask.idxmax() if mask.any() else np.repeat(False, len(df))
df.loc[idx, 'C'] = 1
print (df)
A B C
0 1 2 0
1 1 2 0
2 2 4 0
3 3 3 0
4 4 2 0
5 4 1 0
Using pd.Series.cumsum to ensure only the first matching row is selected:
mask = df['A'].eq(1) & df['B'].eq(2)
df.loc[mask & mask.cumsum().eq(1), 'C'] = 1
print(df)
A B C
0 1 2 1
1 1 2 0
2 2 4 0
3 3 3 0
4 4 2 0
5 4 1 0
If performance is a concern, see Efficiently return the index of the first value satisfying condition in array.
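In that spirit, a numpy-based sketch of the same one-row update, which sidesteps the idxmax edge case entirely:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 4, 4],
                   'B': [2, 2, 4, 3, 2, 1],
                   'C': [0] * 6})

mask = (df['A'] == 1) & (df['B'] == 2)
# np.flatnonzero returns the positions of all True values;
# taking the first (if any) updates exactly one row
hits = np.flatnonzero(mask.to_numpy())
if hits.size:
    df.iloc[hits[0], df.columns.get_loc('C')] = 1
print(df['C'].tolist())  # [1, 0, 0, 0, 0, 0]
```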
I am trying to export a cumulative count based on two columns in a pandas df.
An example is the df below. I'm trying to produce a count based on Value and Count. So when the count increases, I want to attribute that to the adjacent value.
import pandas as pd
d = ({
'Value' : ['A','A','B','C','D','A','B','A'],
'Count' : [0,1,1,2,3,3,4,5],
})
df = pd.DataFrame(d)
I have used this:
for val in ['A','B','C','D']:
cond = df.Value.eq(val) & df.Count.eq(int)
df.loc[cond, 'Count_' + val] = cond[cond].cumsum()
If I replace int with a specific number it returns the count, but I need this to work for any number, since the Count column keeps increasing.
My intended output is:
Value Count A_Count B_Count C_Count D_Count
0 A 0 0 0 0 0
1 A 1 1 0 0 0
2 B 1 1 0 0 0
3 C 2 1 0 1 0
4 D 3 1 0 1 1
5 A 3 1 0 1 1
6 B 4 1 1 1 1
7 A 5 2 1 1 1
So the count increases on the second row, so Value A gets 1. The count increases again on row 4, the first time for Value C, so C gets 1. The same happens on rows 5 and 7. The count increases on row 8, so A becomes 2.
You could use str.get_dummies with diff and cumsum:
In [262]: df['Value'].str.get_dummies().multiply(df['Count'].diff().gt(0), axis=0).cumsum()
Out[262]:
A B C D
0 0 0 0 0
1 1 0 0 0
2 1 0 0 0
3 1 0 1 0
4 1 0 1 1
5 1 0 1 1
6 1 1 1 1
7 2 1 1 1
Joined back onto the original frame, that is:
In [266]: df.join(df['Value'].str.get_dummies()
.multiply(df['Count'].diff().gt(0), axis=0)
.cumsum().add_suffix('_Count'))
Out[266]:
Value Count A_Count B_Count C_Count D_Count
0 A 0 0 0 0 0
1 A 1 1 0 0 0
2 B 1 1 0 0 0
3 C 2 1 0 1 0
4 D 3 1 0 1 1
5 A 3 1 0 1 1
6 B 4 1 1 1 1
7 A 5 2 1 1 1
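Assembled as a self-contained script on the question's data, using the same idioms as above:

```python
import pandas as pd

df = pd.DataFrame({'Value': ['A', 'A', 'B', 'C', 'D', 'A', 'B', 'A'],
                   'Count': [0, 1, 1, 2, 3, 3, 4, 5]})

# one indicator column per value, kept only on rows where Count rose,
# then accumulated down the frame
counts = (df['Value'].str.get_dummies()
            .multiply(df['Count'].diff().gt(0), axis=0)
            .cumsum()
            .add_suffix('_Count'))
out = df.join(counts)
print(out['A_Count'].tolist())  # [0, 1, 1, 1, 1, 1, 1, 2]
```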
In pandas I have the following data frame:
a b
0 0
1 1
2 1
0 0
1 0
2 1
Now I want to do the following:
create a new column c, and for each row where a = 0 fill c with 1. Then c should keep being filled with 1s up to and including the first following row where b = 1 (and this is where I am stuck), so the output should look like this:
a b c
0 0 1
1 1 1
2 1 0
0 0 1
1 0 1
2 1 1
Thanks!
It seems you need:
df['c'] = df.groupby(df.a.eq(0).cumsum())['b'].cumsum().le(1).astype(int)
print (df)
a b c
0 0 0 1
1 1 1 1
2 2 1 0
3 0 0 1
4 1 0 1
5 2 1 1
Detail:
print (df.a.eq(0).cumsum())
0 1
1 1
2 1
3 2
4 2
5 2
Name: a, dtype: int32
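Put together as a runnable sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 0, 1, 2],
                   'b': [0, 1, 1, 0, 0, 1]})

# each a == 0 starts a new block; within a block, c stays 1 until
# the running total of b exceeds 1 (i.e. after the first b == 1)
block = df['a'].eq(0).cumsum()
df['c'] = df.groupby(block)['b'].cumsum().le(1).astype(int)
print(df['c'].tolist())  # [1, 1, 0, 1, 1, 1]
```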