I have a df like this:
ID Number
1 0
1 0
1 1
2 0
2 0
3 1
3 1
3 0
I want to apply a 5 to any ID that has a 1 anywhere in the Number column, and a 0 to those that don't. For example, if a 1 appears anywhere in the Number column for ID 1, I want to place a 5 in the Total column for every row with that ID.
My desired output would look like this:
ID Number Total
1 0 5
1 0 5
1 1 5
2 0 0
2 0 0
3 1 5
3 1 5
3 0 5
I'm trying to think of a way to leverage applymap for this, but I'm not sure how to implement it.
Use transform to add a column to your df as a result of a groupby on 'ID':
In [6]:
df['Total'] = df.groupby('ID')['Number'].transform(lambda x: 5 if (x == 1).any() else 0)
df
Out[6]:
ID Number Total
0 1 0 5
1 1 0 5
2 1 1 5
3 2 0 0
4 2 0 0
5 3 1 5
6 3 1 5
7 3 0 5
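A fully vectorized variant, offered as a sketch that assumes Number only ever holds 0 or 1, is to transform with 'max' and scale the result:

# the group max is 1 exactly when any row in the group is 1,
# so multiplying by 5 yields the desired 0/5 column
df['Total'] = df.groupby('ID')['Number'].transform('max') * 5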
You can use DataFrame.groupby() on the ID column, take the max() of the Number column, turn that into a dictionary, and then use the dictionary to create the Total column. Example -
grouped = df.groupby('ID')['Number'].max().to_dict()
df['Total'] = df.apply(lambda row: 5 if grouped[row['ID']] else 0, axis=1)
Demo -
In [44]: df
Out[44]:
ID Number
0 1 0
1 1 0
2 1 1
3 2 0
4 2 0
5 3 1
6 3 1
7 3 0
In [56]: grouped = df.groupby('ID')['Number'].max().to_dict()
In [58]: df['Total'] = df.apply(lambda row: 5 if grouped[row['ID']] else 0, axis=1)
In [59]: df
Out[59]:
ID Number Total
0 1 0 5
1 1 0 5
2 1 1 5
3 2 0 0
4 2 0 0
5 3 1 5
6 3 1 5
7 3 0 5
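Since grouped is a plain dict keyed by ID, Series.map avoids the row-wise apply entirely; a sketch, again assuming Number is only ever 0 or 1:

# map each ID to its group max (0 or 1), then scale to 0/5
df['Total'] = df['ID'].map(grouped) * 5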
I have a df in python that looks something like this:
'A'
0
1
0
0
1
1
1
1
0
I want to create another column that cumulatively counts the 1s from column A and starts over whenever the value in column A becomes 0 again. So the desired output:
'A' 'B'
0 0
1 1
0 0
0 0
1 1
1 2
1 3
1 4
0 0
This is what I am trying, but it's just replicating column A:
df.B[df.A ==0] = 0
df.B[df.A !=0] = df.A + df.B.shift(1)
Let's do a cumsum to mark the groups, then groupby with cumcount:
df['B'] = df.groupby(df.A.eq(0).cumsum()).cumcount().where(df.A == 1, 0)
Out[81]:
0 0
1 1
2 0
3 0
4 1
5 2
6 3
7 4
8 0
dtype: int64
Use shift with ne and cumsum to label consecutive runs, then take a grouped cumsum:
df['B'] = df.groupby(df['A'].shift().ne(df['A']).cumsum())['A'].cumsum()
print(df)
A B
0 0 0
1 1 1
2 0 0
3 0 0
4 1 1
5 1 2
6 1 3
7 1 4
8 0 0
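For reference, a self-contained version of this approach with the sample data rebuilt from the question (the first answer produces the same column):

import pandas as pd

df = pd.DataFrame({'A': [0, 1, 0, 0, 1, 1, 1, 1, 0]})

# label each run of equal values, then cumulative-sum A within each run
runs = df['A'].shift().ne(df['A']).cumsum()
df['B'] = df.groupby(runs)['A'].cumsum()
print(df)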
When I pass my data to the groupby function with the following code
x = x.groupby(['Time', 'Distance'], as_index=True, observed=False).size().reset_index()
x.columns = ['Time', 'Distance', 'Flow']
x.head(3)
I get this output:
Time Distance Flow
0 0 5 1
1 0 7 170
2 0 8 10
However, I need to do some smoothing, so I also need the skipped values, such as:
Time Distance Flow
0 0 0 0
1 0 1 0
2 0 2 0
3 0 3 0
4 0 4 0
5 0 5 1
etc. In short, I also need the missing group combinations. How can I do this?
Use:
x = pd.DataFrame({
'Time':[0,1,1,1,1,0],
'Distance':[4,5,4,5,5,3],
})
df = x.groupby(['Time', 'Distance'],as_index=True,observed=False).size()
print (df)
Time Distance
0 3 1
4 1
1 4 1
5 3
dtype: int64
df1 = df.unstack(fill_value=0).stack().reset_index(name='Flow')
print (df1)
Time Distance Flow
0 0 3 1
1 0 4 1
2 0 5 0
3 1 3 0
4 1 4 1
5 1 5 3
Or:
m = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
df1 = df.reindex(m, fill_value=0).reset_index(name='Flow')
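Note that both variants only fill in combinations of values that each appear somewhere in the data. If Distance values that never occur at all are also needed (0, 1, 2, ... as in the desired output), one sketch is to build that level explicitly as a full integer range:

import numpy as np
import pandas as pd

# extend the Distance level to the full range 0..max, so distances that
# never occur anywhere in the data are included as well
full_dist = np.arange(df.index.get_level_values('Distance').max() + 1)
m = pd.MultiIndex.from_product([df.index.levels[0], full_dist],
                               names=df.index.names)
df1 = df.reindex(m, fill_value=0).reset_index(name='Flow')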
For each unique user in the DataFrame, I would like to delete the row where val equals 1 first occurs, along with all rows before it.
For instance, from the following DataFrame I would like to get a second DataFrame with those rows removed for each user.
user val
0 1 0
1 1 1
2 1 0
3 1 1
4 2 0
5 2 0
6 2 1
7 2 0
8 3 1
9 3 0
10 3 0
11 3 0
12 3 1
The desired output:
user val
0 1 0
1 1 1
2 2 0
3 3 0
4 3 0
5 3 0
6 3 1
Sample Data
import pandas as pd
s = [1,1,1,1,2,2,2,2,3,3,3,3,3]
t = [0,1,0,1,0,0,1,0,1,0,0,0,1]
df = pd.DataFrame(zip(s,t), columns=['user', 'val'])
Use groupby with cummax and shift to remove all rows before, and including, the first 1 in the 'val' column per user.
Assuming your values are only 1 or 0, it's also possible to create the mask with a double cumsum.
m = df.groupby('user').val.apply(lambda x: x.eq(1).cummax().shift().fillna(False))
# m = df.groupby('user').val.apply(lambda x: x.cumsum().cumsum().gt(1))
df.loc[m]
Output:
user val
2 1 0
3 1 1
7 2 0
9 3 0
10 3 0
11 3 0
12 3 1
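If you prefer to avoid apply, an equivalent sketch keeps everything as grouped Series operations (fill_value on a grouped shift assumes pandas >= 0.24):

# the grouped cumulative sum flags everything from the first 1 onward
# within each user; the grouped shift then excludes that first 1 itself
flagged = df['val'].eq(1).groupby(df['user']).cumsum().gt(0)
m = flagged.groupby(df['user']).shift(fill_value=False)
print(df.loc[m])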
I have this original dataframe:
ID T value
1 0 1
1 4 3
2 0 0
2 4 1
2 7 3
The missing rows should take the value from the previous row.
The output should look like this:
ID T value
1 0 1
1 1 1
1 2 1
1 3 1
1 4 3
2 0 0
2 1 0
2 2 0
2 3 0
2 4 1
2 5 1
2 6 1
2 7 3
... ... ...
I tried a loop, but it takes a long time to process.
Any idea how to solve this for large dataframe?
Thanks!
This solution requires unique integer values in T within each group.
Use groupby with a custom function - for each group use reindex, then replace the NaNs in the value column by forward filling with ffill:
import numpy as np

df1 = (df.groupby('ID')[['T', 'value']]
         .apply(lambda x: x.set_index('T')
                           .reindex(np.arange(x['T'].min(), x['T'].max() + 1)))
         .ffill()
         .astype(int)
         .reset_index())
print (df1)
ID T value
0 1 0 1
1 1 1 1
2 1 2 1
3 1 3 1
4 1 4 3
5 2 0 0
6 2 1 0
7 2 2 0
8 2 3 0
9 2 4 1
10 2 5 1
11 2 6 1
12 2 7 3
If you get the error:
ValueError: cannot reindex from a duplicate axis
it means there are duplicated T values within a group, like:
print (df)
ID T value
0 1 0 1
1 1 4 3
2 2 0 0
3 2 4 1 <- T=4 is duplicated within group 2
4 2 4 3 <- T=4 is duplicated within group 2
5 2 7 3
The solution is to aggregate the values first so that T is unique per group - e.g. by sum:
df = df.groupby(['ID', 'T'], as_index=False)['value'].sum()
print (df)
ID T value
0 1 0 1
1 1 4 3
2 2 0 0
3 2 4 4
4 2 7 3
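An alternative sketch that avoids reindex altogether: build the full integer range of T per ID, then left-merge the original frame and forward fill (this assumes the same precondition of unique T per group):

import numpy as np
import pandas as pd

# one row per (ID, T) covering min(T)..max(T) of each ID
full = (df.groupby('ID')['T']
          .apply(lambda s: pd.Series(np.arange(s.min(), s.max() + 1)))
          .rename('T')
          .reset_index(level=0))
df1 = full.merge(df, on=['ID', 'T'], how='left')
# rows created by the merge hold NaN; fill them from the previous row per ID
df1['value'] = df1.groupby('ID')['value'].ffill().astype(int)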
Dataframe
a b c
0 0 1 1
1 0 1 1
2 0 0 1
3 0 0 1
4 1 1 0
5 1 1 1
6 1 1 1
7 0 0 1
I am trying to apply a cumulative count (cumcount) to multiple columns of the DataFrame; I have tried applying the cumulative count by grouping each column separately. Is there an easy way to achieve the expected output?
I have tried this code, but it is not working:
li = []
for column in df.columns:
    li.append(df.groupby(column)[column].cumcount())
pd.concat(li, axis=1)
Expected output
a b c
0 1 1 1
1 1 2 2
2 1 1 3
3 1 1 4
4 1 1 1
5 2 2 1
6 3 3 2
7 1 1 3
Create consecutive groups by comparing with shifted values, apply cumcount to each column, and finally set 1 where the original value is 0 using a boolean mask:
df = (df.ne(df.shift()).cumsum()
        .apply(lambda x: df.groupby(x).cumcount() + 1)
        .mask(df == 0, 1))
print (df)
a b c
0 1 1 1
1 1 2 2
2 1 1 3
3 1 1 4
4 1 1 1
5 2 2 1
6 3 3 2
7 1 1 3
Another solution, if performance is important - count only the 1 values, and finally set 1 everywhere else via np.where:
a = df == 1
b = a.cumsum()
arr = np.where(a, b-b.mask(a).ffill().fillna(0).astype(int), 1)
df = pd.DataFrame(arr, index=df.index, columns=df.columns)
print (df)
a b c
0 1 1 1
1 1 2 2
2 1 1 3
3 1 1 4
4 1 1 1
5 2 2 1
6 3 3 2
7 1 1 3
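For copy-paste, here is a self-contained version of that second approach, with the sample frame rebuilt from the question:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 0, 0, 1, 1, 1, 0],
                   'b': [1, 1, 0, 0, 1, 1, 1, 0],
                   'c': [1, 1, 1, 1, 0, 1, 1, 1]})

a = df == 1      # mask of the 1s we want to count
b = a.cumsum()   # running total of 1s per column
# subtracting the total as of the last 0 restarts the count at each run
# of 1s; positions holding 0 are set to 1, as in the expected output
arr = np.where(a, b - b.mask(a).ffill().fillna(0).astype(int), 1)
df = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(df)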