Find first row with condition after each row satisfying another condition - python

In pandas I have the following data frame:
a b
0 0
1 1
2 1
0 0
1 0
2 1
Now I want to do the following:
Create a new column c, and for each row where a = 0 fill c with 1. Then c should keep being filled with 1s up to and including the first row, after each such row, where b = 1 (this is where I'm stuck), so the output should look like this:
a b c
0 0 1
1 1 1
2 1 0
0 0 1
1 0 1
2 1 1
Thanks!

It seems you need:
df['c'] = df.groupby(df.a.eq(0).cumsum())['b'].cumsum().le(1).astype(int)
print (df)
a b c
0 0 0 1
1 1 1 1
2 2 1 0
3 0 0 1
4 1 0 1
5 2 1 1
Detail:
print (df.a.eq(0).cumsum())
0 1
1 1
2 1
3 2
4 2
5 2
Name: a, dtype: int32
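For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question and names the intermediate steps (`block` and `ones_so_far` are just illustrative names). One caveat: `le(1)` keeps c at 1 until a second 1 shows up in b, so it matches the expected output here but would not cut off right after the first 1 if a 0 followed it. The `c_strict` column is my own assumption about the intended rule, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2, 0, 1, 2], "b": [0, 1, 1, 0, 0, 1]})

# Every row with a == 0 opens a new block; cumsum turns those starts into block ids.
block = df["a"].eq(0).cumsum()

# Running number of b == 1 rows seen so far inside each block.
ones_so_far = df.groupby(block)["b"].cumsum()

# The answer's rule: 1 while at most one b == 1 row has been seen in the block.
df["c"] = ones_so_far.le(1).astype(int)

# Assumed stricter rule: 1 only while no b == 1 row came strictly before this one
# (this relies on b containing only 0s and 1s).
df["c_strict"] = (ones_so_far - df["b"]).eq(0).astype(int)

print(df)
```

On the sample data both columns agree; they only differ when a block contains a 0 in b after its first 1.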

Related

Splitting a non-delimited column and creating an additional column with the position of each value

I have a problem in which I want to take Table 1 and turn it into Table 2 using Python.
Does anybody have any ideas? I've tried to split the Value column from Table 1, but each value has a different length, so I can't define a fixed number of pieces to split it into.
I also have not been able to work out how to create a new column that records the position each character had in the string.
Table 1, before:
ID Value
1 000000S
2 000FY
Table 2, after:
ID Position Value
1 1 0
1 2 0
1 3 0
1 4 0
1 5 0
1 6 0
1 7 S
2 1 0
2 2 0
2 3 0
2 4 F
2 5 Y
You can split the string to individual characters and explode:
out = (df
       .assign(Value=df['Value'].apply(list))
       .explode('Value')
      )
output:
ID Value
0 1 0
0 1 0
0 1 0
0 1 0
0 1 0
0 1 0
0 1 S
1 2 0
1 2 0
1 2 0
1 2 F
1 2 Y
Given:
ID Value
0 1 000000S
1 2 000FY
Doing:
df.Value = df.Value.apply(list)
df = df.explode('Value')
df['Position'] = df.groupby('ID').cumcount() + 1
Output:
ID Value Position
0 1 0 1
0 1 0 2
0 1 0 3
0 1 0 4
0 1 0 5
0 1 0 6
0 1 S 7
1 2 0 1
1 2 0 2
1 2 0 3
1 2 F 4
1 2 Y 5
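Putting the two answers together, a self-contained sketch might look like the following. The frame is rebuilt from Table 1, and the `reset_index` and column reordering are my additions so the result lines up with Table 2; they are not part of either answer.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2], "Value": ["000000S", "000FY"]})

out = (
    df.assign(Value=df["Value"].apply(list))  # "000FY" -> ['0', '0', '0', 'F', 'Y']
      .explode("Value")                       # one row per character
      .reset_index(drop=True)                 # drop the repeated 0/1 index labels
)

# Position of each character within its ID, counted from 1.
out["Position"] = out.groupby("ID").cumcount() + 1

# Same column order as Table 2.
out = out[["ID", "Position", "Value"]]
print(out)
```

`cumcount` relies on the rows staying in their original character order, which `explode` preserves.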

How do I create a column whose values count the 1s in each row that appear for the first time in their own column?

How do I do this operation using pandas?
Initial Df:
A B C D
0 0 1 0 0
1 0 1 0 0
2 0 0 1 1
3 0 1 0 1
4 1 1 0 0
5 1 1 1 0
Final Df:
A B C D Param
0 0 1 0 0 1
1 0 1 0 0 0
2 0 0 1 1 2
3 0 1 0 1 0
4 1 1 0 0 1
5 1 1 1 0 0
Basically, Param is the number of 1s in that row that appear for the first time in their own column.
Example:
index 0 : the 1 in column B appears for the first time, hence Param = 1
index 1 : none of the 1s appears for the first time in its own column, hence Param = 0
index 2 : the 1s in columns C and D appear for the first time in their columns, hence Param = 2
index 3 : none of the 1s appears for the first time in its own column, hence Param = 0
index 4 : the 1 in column A appears for the first time in its column, hence Param = 1
index 5 : none of the 1s appears for the first time in its own column, hence Param = 0
I would use idxmax (the index of the first 1 in each column) and value_counts:
df['Param'] = df.idxmax().value_counts().reindex(df.index, fill_value=0)
df
A B C D Param
0 0 1 0 0 1
1 0 1 0 0 0
2 0 0 1 1 2
3 0 1 0 1 0
4 1 1 0 0 1
5 1 1 1 0 0
You can check for duplicated values, multiply with df and sum:
df['Param'] = df.apply(lambda x: ~x.duplicated()).mul(df).sum(axis=1)
Output:
A B C D Param
0 0 1 0 0 1
1 0 1 0 0 0
2 0 0 1 1 2
3 0 1 0 1 0
4 1 1 0 0 1
5 1 1 1 0 0
Assuming these are integers, you can use cumsum() twice to isolate the first occurrence of 1.
df2 = (df.cumsum() > 0).cumsum() == 1
df['Param'] = df2.sum(axis = 1)
print(df)
If df elements are strings, you should first convert them to integers.
df = df.astype(int)
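As a quick cross-check, here is a sketch that runs all three approaches on the frame from the question and confirms they agree. The variable names are mine. The last part illustrates a caveat: `idxmax` returns the first row label even for a column that contains no 1 at all, so the first answer would count a phantom 1 there, while the cumsum approach would not. The all-zero column E is a hypothetical addition, not from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0, 0, 0, 0, 1, 1],
    "B": [1, 1, 0, 1, 1, 1],
    "C": [0, 0, 1, 0, 0, 1],
    "D": [0, 0, 1, 1, 0, 0],
})

# 1) Row label of the first 1 per column, then count labels per row.
via_idxmax = df.idxmax().value_counts().reindex(df.index, fill_value=0)

# 2) Mark first occurrences per column, keep only those that are 1s.
via_duplicated = df.apply(lambda col: ~col.duplicated()).mul(df).sum(axis=1)

# 3) Inner cumsum > 0 is True from the first 1 onward; outer cumsum == 1
#    is True only on the row of that first 1.
via_cumsum = ((df.cumsum() > 0).cumsum() == 1).sum(axis=1)

# Hypothetical column with no 1 at all: idxmax still reports row 0 for it.
with_empty = df.assign(E=0)
phantom = with_empty.idxmax().value_counts().reindex(with_empty.index, fill_value=0)
robust = ((with_empty.cumsum() > 0).cumsum() == 1).sum(axis=1)
```

If every column is guaranteed to contain at least one 1, any of the three should do.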

Using previous row value while creating a new column

I have a df in python that looks something like this:
'A'
0
1
0
0
1
1
1
1
0
I want to create another column that keeps a running count of the 1s in column A and starts over whenever the value in column A becomes 0 again. So the desired output is:
'A' 'B'
0 0
1 1
0 0
0 0
1 1
1 2
1 3
1 4
0 0
This is what I am trying, but it's just replicating column A:
df.B[df.A ==0] = 0
df.B[df.A !=0] = df.A + df.B.shift(1)
Let us do cumsum with groupby cumcount:
df['B'] = df.groupby(df.A.eq(0).cumsum()).cumcount().where(df.A == 1, 0)
Out[81]:
0 0
1 1
2 0
3 0
4 1
5 2
6 3
7 4
8 0
dtype: int64
Use shift with ne and groupby.cumsum:
df['B'] = df.groupby(df['A'].shift().ne(df['A']).cumsum())['A'].cumsum()
print(df)
A B
0 0 0
1 1 1
2 0 0
3 0 0
4 1 1
5 1 2
6 1 3
7 1 4
8 0 0
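To see why both answers give the same result, here is a small sketch that builds the frame from the question and exposes the grouping keys (`zero_block` and `run_id` are illustrative names, not from the answers). The first key starts a new group at every 0; the second starts a new group whenever A changes value.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 0, 0, 1, 1, 1, 1, 0]})

# Answer 1: a new group starts at every 0; cumcount numbers the rows after it.
zero_block = df["A"].eq(0).cumsum()
b_cumcount = df.groupby(zero_block).cumcount().where(df["A"] == 1, 0)

# Answer 2: a new group starts whenever A differs from the previous row,
# so each run of consecutive 1s is summed on its own.
run_id = df["A"].ne(df["A"].shift()).cumsum()
b_cumsum = df.groupby(run_id)["A"].cumsum()

df["B"] = b_cumsum
print(df)
```

Both rely on A containing only 0s and 1s; the second version reads a little more directly as "cumulative sum within each run".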

Cumulative count in a pandas df

I am trying to export a cumulative count based on two columns in a pandas df.
An example is the df below. I'm trying to export a count based on Value and Count, so that whenever Count increases, the increase is attributed to the Value in the same row.
import pandas as pd
d = ({
    'Value' : ['A','A','B','C','D','A','B','A'],
    'Count' : [0,1,1,2,3,3,4,5],
    })
df = pd.DataFrame(d)
I have used this:
for val in ['A','B','C','D']:
    cond = df.Value.eq(val) & df.Count.eq(int)
    df.loc[cond, 'Count_' + val] = cond[cond].cumsum()
If I alter int to a specific number it will return the count. But I need this to read any number as the Count column keeps increasing.
My intended output is:
Value Count A_Count B_Count C_Count D_Count
0 A 0 0 0 0 0
1 A 1 1 0 0 0
2 B 1 1 0 0 0
3 C 2 1 0 1 0
4 D 3 1 0 1 1
5 A 3 1 0 1 1
6 B 4 1 1 1 1
7 A 5 2 1 1 1
So the count increases on the second row, so 1 is attributed to Value A. The count increases again on row 4, and it's the first time for Value C, so C becomes 1. The same applies to rows 5 and 7. The count increases on row 8, so A becomes 2.
You could use str.get_dummies, diff and cumsum:
In [262]: df['Value'].str.get_dummies().multiply(df['Count'].diff().gt(0), axis=0).cumsum()
Out[262]:
A B C D
0 0 0 0 0
1 1 0 0 0
2 1 0 0 0
3 1 0 1 0
4 1 0 1 1
5 1 0 1 1
6 1 1 1 1
7 2 1 1 1
Joined back onto the original frame with a _Count suffix, that is:
In [266]: df.join(df['Value'].str.get_dummies()
                  .multiply(df['Count'].diff().gt(0), axis=0)
                  .cumsum().add_suffix('_Count'))
Out[266]:
Value Count A_Count B_Count C_Count D_Count
0 A 0 0 0 0 0
1 A 1 1 0 0 0
2 B 1 1 0 0 0
3 C 2 1 0 1 0
4 D 3 1 0 1 1
5 A 3 1 0 1 1
6 B 4 1 1 1 1
7 A 5 2 1 1 1
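Broken into steps, the one-liner above can be sketched as follows. The intermediate names `increased`, `dummies` and `counts` are mine, and the `astype(int)` is a defensive assumption in case the dummy columns are not already integers.

```python
import pandas as pd

df = pd.DataFrame({
    "Value": ["A", "A", "B", "C", "D", "A", "B", "A"],
    "Count": [0, 1, 1, 2, 3, 3, 4, 5],
})

# True on rows where Count went up compared with the previous row.
increased = df["Count"].diff().gt(0)

# One 0/1 indicator column per distinct Value.
dummies = df["Value"].str.get_dummies()

# Keep an indicator only where Count increased, then accumulate per column.
counts = (
    dummies.multiply(increased, axis=0)
           .astype(int)
           .cumsum()
           .add_suffix("_Count")
)

out = df.join(counts)
print(out)
```

The first row never counts as an increase because `diff()` is NaN there and `NaN > 0` is False.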

for loop to extract header for a dataframe in pandas

I am a newbie in python. I have a data frame that looks like this:
A B C D E
0 1 0 1 0 1
1 0 1 0 0 1
2 0 1 1 1 0
3 1 0 0 1 0
4 1 0 0 1 1
How can I write a for loop to gather, for each row, the names of the columns that contain a 1? I expect my result set to look like this:
A B C D E Result
0 1 0 1 0 1 ACE
1 0 1 0 0 1 BE
2 0 1 1 1 0 BCD
3 1 0 0 1 0 AD
4 1 0 0 1 1 ADE
Can anyone help me with that? Thank you!
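DataFrame.dot is a good fit here: you want the matrix product of your 0/1 matrix and the vector of column names, where multiplying a name by 1 keeps it, multiplying it by 0 gives an empty string, and the sum concatenates the pieces: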
df.dot(df.columns)
Out[5]:
0 ACE
1 BE
2 BCD
3 AD
4 ADE
If your dataframe is numeric, obtain the boolean matrix first by testing your df against 0:
(df!=0).dot(df.columns)
PS: Just assign the result to the new column:
df['Result'] = df.dot(df.columns)
df
Out[7]:
A B C D E Result
0 1 0 1 0 1 ACE
1 0 1 0 0 1 BE
2 0 1 1 1 0 BCD
3 1 0 0 1 0 AD
4 1 0 0 1 1 ADE
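For completeness, a runnable sketch of the dot trick on the frame from the question, next to a more explicit row-wise join for comparison. The row-wise version is my own addition, not from the answer, and is included only to show what the dot product is computing.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 0, 0, 1, 1],
    "B": [0, 1, 1, 0, 0],
    "C": [1, 0, 1, 0, 0],
    "D": [0, 0, 1, 1, 1],
    "E": [1, 1, 0, 0, 1],
})

# Boolean matrix: True where the cell is non-zero.
flags = df != 0

# Dot product with the column names: True * "A" == "A", False * "A" == "",
# and the row sum concatenates the surviving names.
via_dot = flags.dot(df.columns)

# Explicit equivalent: per row, join the labels whose flag is True.
via_join = flags.apply(lambda row: "".join(row.index[row.to_numpy()]), axis=1)

df["Result"] = via_dot
print(df)
```

Computing `flags` before adding the Result column matters, since the new string column should not take part in the product.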
