Change 1st row of a dataframe based on a condition in pandas - python

I have two columns, and based on their values I want to update a third column for only one row.
I have-
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 4, 4],
                   'B': [2, 2, 4, 3, 2, 1],
                   'C': [0] * 6})
print (df)
A B C
0 1 2 0
1 1 2 0
2 2 4 0
3 3 3 0
4 4 2 0
5 4 1 0
If A = 1 and B = 2, then only the first matching row should get C = 1, like this -
print (df)
A B C
0 1 2 1
1 1 2 0
2 2 4 0
3 3 3 0
4 4 2 0
5 4 1 0
Right now I have used
df.loc[(df['A']==1) & (df['B']==2)].iloc[[0]].loc['C'] = 1
but it doesn't change the dataframe.

Solution if the mask always matches at least one row:
Create a boolean mask and set C at the first True index value, found by idxmax:
mask = (df['A']==1) & (df['B']==2)
df.loc[mask.idxmax(), 'C'] = 1
But if nothing matches, idxmax returns the index of the first False value, so add an if-else guard:
import numpy as np
mask = (df['A']==1) & (df['B']==2)
# an all-False indexer selects no rows, so the assignment becomes a no-op
idx = mask.idxmax() if mask.any() else np.repeat(False, len(df))
df.loc[idx, 'C'] = 1
print (df)
A B C
0 1 2 1
1 1 2 0
2 2 4 0
3 3 3 0
4 4 2 0
5 4 1 0
mask = (df['A']==10) & (df['B']==20)
idx = mask.idxmax() if mask.any() else np.repeat(False, len(df))
df.loc[idx, 'C'] = 1
print (df)
A B C
0 1 2 0
1 1 2 0
2 2 4 0
3 3 3 0
4 4 2 0
5 4 1 0
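An equivalent way to express "the first matching index, or nothing" is plain index lookup; a minimal sketch of that variant:
mask = (df['A']==1) & (df['B']==2)
hits = df.index[mask]         # index labels of all matching rows
if len(hits):
    df.loc[hits[0], 'C'] = 1  # update only the first match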

Using pd.Series.cumsum to ensure only the first row matching the criteria is updated:
mask = df['A'].eq(1) & df['B'].eq(2)
df.loc[mask & mask.cumsum().eq(1), 'C'] = 1
print(df)
A B C
0 1 2 1
1 1 2 0
2 2 4 0
3 3 3 0
4 4 2 0
5 4 1 0
If performance is a concern, see Efficiently return the index of the first value satisfying condition in array.
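In that spirit, a minimal sketch that locates the first hit positionally with numpy (an assumption-laden variant, not the linked answer itself):
import numpy as np
mask = df['A'].eq(1) & df['B'].eq(2)
pos = np.flatnonzero(mask.to_numpy())  # positions of all matching rows
if pos.size:
    df.iloc[pos[0], df.columns.get_loc('C')] = 1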

Related

Using previous row value while creating a new column

I have a df in python that looks something like this:
'A'
0
1
0
0
1
1
1
1
0
I want to create another column that adds cumulative 1's from column A, and starts over if the value in column A becomes 0 again. So desired output:
'A' 'B'
0 0
1 1
0 0
0 0
1 1
1 2
1 3
1 4
0 0
This is what I am trying, but it's just replicating column A:
df.B[df.A ==0] = 0
df.B[df.A !=0] = df.A + df.B.shift(1)
Let us do cumsum to build the groups, then groupby with cumcount:
df['B'] = df.groupby(df.A.eq(0).cumsum()).cumcount().where(df.A == 1, 0)
Out[81]:
0 0
1 1
2 0
3 0
4 1
5 2
6 3
7 4
8 0
dtype: int64
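Detail - the grouping key restarts at every zero in A:
df.A.eq(0).cumsum()
# -> 1, 1, 2, 3, 3, 3, 3, 3, 4  for A = 0, 1, 0, 0, 1, 1, 1, 1, 0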
Use shift with ne and groupby.cumsum:
df['B'] = df.groupby(df['A'].shift().ne(df['A']).cumsum())['A'].cumsum()
print(df)
A B
0 0 0
1 1 1
2 0 0
3 0 0
4 1 1
5 1 2
6 1 3
7 1 4
8 0 0
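Detail - the key labels each consecutive run of equal values, so the cumulative sum restarts per run:
key = df['A'].shift().ne(df['A']).cumsum()
# -> 1, 2, 3, 3, 4, 4, 4, 4, 5  for A = 0, 1, 0, 0, 1, 1, 1, 1, 0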

How to apply cummulative count on multiple columns of dataframe

Dataframe
a b c
0 0 1 1
1 0 1 1
2 0 0 1
3 0 0 1
4 1 1 0
5 1 1 1
6 1 1 1
7 0 0 1
I am trying to apply a cumulative count (cumcount) across multiple columns of a dataframe; I have tried applying the cumulative count by grouping each column. Is there an easy way to achieve the expected output?
I have tried this code, but it is not working:
li = []
for column in df.columns:
    li.append(df.groupby(column)[column].cumcount())
pd.concat(li, axis=1)
Expected output
a b c
0 1 1 1
1 1 2 2
2 1 1 3
3 1 1 4
4 1 1 1
5 2 2 1
6 3 3 2
7 1 1 3
Create consecutive groups by comparing with shifted values, apply cumcount per column, and finally set 1 where the original value is 0 using a boolean mask:
df = (df.ne(df.shift()).cumsum()
        .apply(lambda x: df.groupby(x).cumcount() + 1)
        .mask(df == 0, 1))
print (df)
a b c
0 1 1 1
1 1 2 2
2 1 1 3
3 1 1 4
4 1 1 1
5 2 2 1
6 3 3 2
7 1 1 3
Another solution if performance is important - count only the 1 values, then set everything else to 1 with np.where:
import numpy as np
a = df == 1
b = a.cumsum()
# within each run of 1s: running total minus the total just before the run started
arr = np.where(a, b - b.mask(a).ffill().fillna(0).astype(int), 1)
df = pd.DataFrame(arr, index=df.index, columns=df.columns)
print (df)
a b c
0 1 1 1
1 1 2 2
2 1 1 3
3 1 1 4
4 1 1 1
5 2 2 1
6 3 3 2
7 1 1 3
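If you prefer to keep the per-column loop from the question, a minimal sketch of a working variant (assuming the original df) groups each column by its runs of consecutive equal values instead of by its raw values:
li = []
for column in df.columns:
    runs = df[column].ne(df[column].shift()).cumsum()   # label consecutive runs
    counts = df.groupby(runs)[column].cumcount().add(1)
    li.append(counts.mask(df[column].eq(0), 1).rename(column))
df = pd.concat(li, axis=1)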

Find first row with condition after each row satisfying another condition

In pandas I have the following data frame:
a b
0 0
1 1
2 1
0 0
1 0
2 1
Now I want to do the following:
Create a new column c, and for each row where a = 0 fill c with 1. Then c should stay 1 up to and including the first following row where b = 1 (and here I'm stuck), so the output should look like this:
a b c
0 0 1
1 1 1
2 1 0
0 0 1
1 0 1
2 1 1
Thanks!
It seems you need:
df['c'] = df.groupby(df.a.eq(0).cumsum())['b'].cumsum().le(1).astype(int)
print (df)
a b c
0 0 0 1
1 1 1 1
2 2 1 0
3 0 0 1
4 1 0 1
5 2 1 1
Detail:
print (df.a.eq(0).cumsum())
0 1
1 1
2 1
3 2
4 2
5 2
Name: a, dtype: int32
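The cumulative sum of b within those groups, compared with le(1), then gives c:
df.groupby(df.a.eq(0).cumsum())['b'].cumsum()
# -> 0, 1, 2, 0, 0, 1   and .le(1) -> 1, 1, 0, 1, 1, 1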

return rows with unique pairs across columns

I'm trying to find rows that have unique pairs of values across 2 columns, so this dataframe:
A B
1 0
2 0
3 0
0 1
2 1
3 1
0 2
1 2
3 2
0 3
1 3
2 3
will be reduced to only the rows that don't match up when flipped. For instance, (1, 3) is a combination I only want returned once; if the same pair exists with the columns flipped, (3, 1), it can be removed. The table I'm looking to get is:
A B
0 2
0 3
1 0
1 2
1 3
2 3
Where there is only one occurrence of each pair of values that are mirrored if the columns are flipped.
I think you can use apply with sorted + drop_duplicates (on recent pandas, return a Series from the lambda so apply keeps the DataFrame shape):
df = df.apply(lambda x: pd.Series(sorted(x), index=x.index), axis=1).drop_duplicates()
print (df)
A B
0 0 1
1 0 2
2 0 3
4 1 2
5 1 3
8 2 3
Faster solution with numpy.sort:
df = pd.DataFrame(np.sort(df.values, axis=1), index=df.index,
                  columns=df.columns).drop_duplicates()
print (df)
A B
0 0 1
1 0 2
2 0 3
4 1 2
5 1 3
8 2 3
Solution without sorting with DataFrame.min and DataFrame.max:
a = df.min(axis=1)
b = df.max(axis=1)
df['A'] = a
df['B'] = b
df = df.drop_duplicates()
print (df)
A B
0 0 1
1 0 2
2 0 3
4 1 2
5 1 3
8 2 3
Loading the data:
import numpy as np
import pandas as pd
a = np.array("1 2 3 0 2 3 0 1 3 0 1 2".split(), dtype=np.double)
b = np.array("0 0 0 1 1 1 2 2 2 3 3 3".split(), dtype=np.double)
df = pd.DataFrame(dict(A=a, B=b))
In case you don't need to sort the entire DF:
df["trans"] = df.apply(
lambda row: (min(row['A'], row['B']), max(row['A'], row['B'])), axis=1
)
df.drop_duplicates("trans")
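If the helper column should not remain in the result, it can be dropped afterwards; a small usage sketch:
df = df.drop_duplicates("trans").drop(columns="trans")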

Problems with pandas and numpy where condition/multiple values?

I have the following pandas dataframe:
A B
1 3
0 3
1 2
0 1
0 0
1 4
....
0 0
I would like to add a new column on the right side, following this condition:
if the value in B is 3 or 2, put 1 in new_col, for instance:
(*)
A B new_col
1 3 1
0 3 1
1 2 1
0 1 0
0 0 0
1 4 0
....
0 0 0
So I tried the following:
df['new_col'] = np.where(df['B'] == 3 & 2,'1','0')
However, it did not work:
A B new_col
1 3 0
0 3 0
1 2 1
0 1 0
0 0 0
1 4 0
....
0 0 0
Any idea how to do a multiple-condition statement with pandas and numpy like (*)?
You can use pandas isin, which returns a boolean showing whether the elements you're looking for are contained in column 'B'. (The original attempt fails because 3 & 2 evaluates to 2 first, so df['B'] == 3 & 2 only tests df['B'] == 2.)
df['new_col'] = df['B'].isin([3, 2])
A B new_col
0 1 3 True
1 0 3 True
2 1 2 True
3 0 1 False
4 0 0 False
5 1 4 False
Then, you can use astype to convert the boolean values to 0 and 1, True being 1 and False being 0
df['new_col'] = df['B'].isin([3, 2]).astype(int)
Output:
A B new_col
0 1 3 1
1 0 3 1
2 1 2 1
3 0 1 0
4 0 0 0
5 1 4 0
Using numpy:
>>> df['new_col'] = np.where(np.logical_or(df['B'] == 3, df['B'] == 2), '1','0')
>>> df
A B new_col
0 1 3 1
1 0 3 1
2 1 2 1
3 0 1 0
4 0 0 0
5 1 4 0
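The two approaches combine naturally; a small sketch passing the isin mask straight to np.where:
df['new_col'] = np.where(df['B'].isin([2, 3]), 1, 0)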
df['new_col'] = [1 if x in [2, 3] else 0 for x in df.B]
The operators *, +, ^ work on booleans as expected, and mixing booleans with integers gives the expected result. So you can also do:
df['new_col'] = [(x in [2, 3]) * 1 for x in df.B]
Using numpy broadcasting:
df['new'] = (df.B.values[:, None] == np.array([2, 3])).any(1) * 1
Timing (plots omitted): benchmarks over the given data set and over 60,000 rows.
df = pd.DataFrame({'A': [1, 0, 1, 0, 0, 1], 'B': [3, 3, 2, 1, 0, 4]})
print(df)
df['C'] = [1 if vals == 2 or vals == 3 else 0 for vals in df['B']]
print(df)
A B
0 1 3
1 0 3
2 1 2
3 0 1
4 0 0
5 1 4
A B C
0 1 3 1
1 0 3 1
2 1 2 1
3 0 1 0
4 0 0 0
5 1 4 0
