How do I get a running count from a pandas groupby when my data is boolean 0s and 1s? I want each consecutive run of 0s or 1s to be treated as its own group, with the count resetting whenever the value changes.
I currently have this, which sums up all the 1s and 0s per value instead:
df['grp'] = df.groupby("dir")["dir"].cumsum()
My desired output
df = pd.DataFrame({"dir":[1,1,1,1,0,0,0,1,1,1,1,0,0,0],
"grp": [1,2,3,4,1,2,3,1,2,3,4,1,2,3,]})
Use:
In [1495]: df['grp'] = df.groupby((df['dir'] != df['dir'].shift(1)).cumsum()).cumcount()+1
In [1496]: df
Out[1496]:
dir grp
0 1 1
1 1 2
2 1 3
3 1 4
4 0 1
5 0 2
6 0 3
7 1 1
8 1 2
9 1 3
10 1 4
11 0 1
12 0 2
13 0 3
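To see why this works: df['dir'] != df['dir'].shift(1) is True at the start of every consecutive run, its cumsum gives each run a unique id, and cumcount numbers the rows inside each run. A step-by-step sketch (the names starts and run_id are chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({"dir": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]})

# True wherever a new run of equal values begins (the first row compares
# against NaN, so it is always True)
starts = df['dir'] != df['dir'].shift(1)

# Cumulative sum of the start flags -> a unique id per consecutive run
run_id = starts.cumsum()

# Number the rows within each run, starting from 1
df['grp'] = df.groupby(run_id).cumcount() + 1
```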
Related
I have a table like the following, and I want to fill a new column based on conditions from the other columns. In this case, count the rows that:
- have the same value in Cond1
- have an Amount different from zero
- are from previous months
The column "CountPreviousMonth" is what I need to fill in.
Cond1  Amount  Month  CountPreviousMonth
a      10      1      0
a      20      2      1
a      15      3      2
b      10      1      0
b      0       2      1
b      15      3      1
c      5       1      0
c      25      2      1
c      15      3      2
When Month is 1 the count is zero, because it is the first month.
For Cond1=b the count stays at 1 in Month 3, because the Amount in Month 2 was zero.
In Excel I used COUNTIFS, but I would like to do this in Python. I could do it with a for loop, but the real table has many rows and that wouldn't be efficient. Is there a better way to calculate it?
First replace Month with missing values where Amount=0, then use a custom lambda function with Series.shift and forward-fill the missing values:
f = lambda x: x.shift(fill_value=0).ffill().astype(int)
df['count'] = df['Month'].mask(df['Amount'].eq(0)).groupby(df['Cond1']).transform(f)
print (df)
Cond1 Amount Month CountPreviousMonth count
0 a 10 1 0 0
1 a 20 2 1 1
2 a 15 3 2 2
3 b 10 1 0 0
4 b 0 2 1 1
5 b 15 3 1 1
6 c 5 1 0 0
7 c 25 2 1 1
8 c 15 3 2 2
Shift Amount down by 1, and check which of the previous values are not equal to 0 (the first row compares NaN, which counts as non-zero):
arr = df.Amount.shift().ne(0)
Get a boolean mask where Month is 1:
repl = df.Month.eq(1)
Index arr with repl, so the first month of each group always counts as a start:
arr[repl] = True
Group by Cond1, run a cumulative sum, and finally subtract 1 so that every group starts at 0:
df.assign(CountPreviousMonth = arr.groupby(df.Cond1).cumsum().sub(1))
Cond1 Amount Month CountPreviousMonth
0 a 10 1 0
1 a 20 2 1
2 a 15 3 2
3 b 10 1 0
4 b 0 2 1
5 b 15 3 1
6 c 5 1 0
7 c 25 2 1
8 c 15 3 2
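The steps above combined into one self-contained sketch, using the sample table from the question:

```python
import pandas as pd

df = pd.DataFrame({'Cond1': list('aaabbbccc'),
                   'Amount': [10, 20, 15, 10, 0, 15, 5, 25, 15],
                   'Month': [1, 2, 3, 1, 2, 3, 1, 2, 3]})

# Previous row's Amount is non-zero (first row compares NaN, which passes)
arr = df.Amount.shift().ne(0)

# The first month of each group always counts as a start
arr[df.Month.eq(1)] = True

# Cumulative sum per Cond1 group, minus 1 so every group starts at 0
df['CountPreviousMonth'] = arr.groupby(df.Cond1).cumsum().sub(1)
```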
This is very likely a duplicate, but I'm not sure what to search for to find it.
I have a column in a dataframe that cycles from 0 to some value a number of times (in my example it cycles to 4 three times). I want to create another column that simply shows which cycle each row belongs to. Example:
import pandas as pd
df = pd.DataFrame({'A':[0,1,2,3,4,0,1,2,3,4,0,1,2,3,4]})
df['desired_output'] = [0,0,0,0,0,1,1,1,1,1,2,2,2,2,2]
print(df)
A desired_output
0 0 0
1 1 0
2 2 0
3 3 0
4 4 0
5 0 1
6 1 1
7 2 1
8 3 1
9 4 1
10 0 2
11 1 2
12 2 2
13 3 2
14 4 2
I was thinking maybe something along the lines of a groupby(), cumsum() and transform(), but I'm not quite sure how to implement it. Could be wrong though.
Compare with 0 using Series.eq, then apply Series.cumsum, and finally subtract 1:
df['desired_output'] = df['A'].eq(0).cumsum() - 1
print (df)
A desired_output
0 0 0
1 1 0
2 2 0
3 3 0
4 4 0
5 0 1
6 1 1
7 2 1
8 3 1
9 4 1
10 0 2
11 1 2
12 2 2
13 3 2
14 4 2
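If the cycles don't always restart at exactly 0, a variant of the same idea with Series.diff marks a new cycle at any drop in the value (a sketch, assuming any decrease means a reset):

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]})

# A negative difference means the counter dropped, i.e. a new cycle began;
# the first row's diff is NaN, which compares False, so counting starts at 0
df['cycle'] = df['A'].diff().lt(0).cumsum()
```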
For each unique user in the DataFrame, I would like to delete the row where val equals 1 for the first time, together with all rows before it.
For instance, given the following DataFrame, I would like to get another DataFrame in which, for each user, the first row where "val" is 1 and all of its previous rows have been removed.
user val
0 1 0
1 1 1
2 1 0
3 1 1
4 2 0
5 2 0
6 2 1
7 2 0
8 3 1
9 3 0
10 3 0
11 3 0
12 3 1
Expected output:
user val
0 1 0
1 1 1
2 2 0
3 3 0
4 3 0
5 3 0
6 3 1
Sample Data
import pandas as pd
s = [1,1,1,1,2,2,2,2,3,3,3,3,3]
t = [0,1,0,1,0,0,1,0,1,0,0,0,1]
df = pd.DataFrame(zip(s,t), columns=['user', 'val'])
Use groupby with cummax and shift to remove all rows up to and including the first 1 in the 'val' column per user.
Assuming your values are only 1 or 0, it is also possible to create the mask with a double cumsum.
m = df.groupby('user').val.apply(lambda x: x.eq(1).cummax().shift().fillna(False))
# m = df.groupby('user').val.apply(lambda x: x.cumsum().cumsum().gt(1))
df.loc[m]
Output:
user val
2 1 0
3 1 1
7 2 0
9 3 0
10 3 0
11 3 0
12 3 1
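The same mask can also be built with transform instead of apply, which keeps the original index directly (a sketch on the sample data above; m and out are names chosen for illustration):

```python
import pandas as pd

s = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3]
t = [0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1]
df = pd.DataFrame(zip(s, t), columns=['user', 'val'])

# cummax marks every row from the first 1 onward; shifting down one row
# per group excludes that first 1 itself (and everything before it)
m = (df.groupby('user')['val']
       .transform(lambda x: x.eq(1).cummax().shift(fill_value=False))
       .astype(bool))

out = df.loc[m]
```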
I have a dataframe that looks like the following. The rightmost column is my desired column:
Group1 Group2 Value Target_Column
1 3 0 0
1 3 1 1
1 4 1 1
1 4 1 0
2 5 5 5
2 5 1 0
2 6 0 0
2 6 1 1
2 6 9 0
How do I identify the first non-zero value within each group (defined by the two columns Group1 & Group2), and then create a column that keeps that first non-zero value and shows zero everywhere else?
This question is very similar to one posed earlier here:
Identify first non-zero element within a group in pandas
but that solution gives an error on groups based on multiple columns.
I have tried:
import pandas as pd
dt = pd.DataFrame({'Group1': [1,1,1,1,2,2,2,2,2], 'Group2': [3,3,4,4,5,5,6,6,6], 'Value': [0,1,1,1,5,1,0,1,9]})
dt['Newcol']=0
dt.loc[dt.Value.ne(0).groupby(dt['Group1','Group2']).idxmax(),'Newcol']=dt.Value
Setup
df['flag'] = df.Value.ne(0)
Using numpy.where and assign:
df.assign(
    target=np.where(df.index.isin(df.groupby(['Group1', 'Group2']).flag.idxmax()),
                    df.Value, 0)
).drop(columns='flag')
Using loc and assign
df.assign(
    target=df.loc[df.groupby(['Group1', 'Group2']).flag.idxmax(), 'Value']
).fillna(0).astype(int).drop(columns='flag')
Both produce:
Group1 Group2 Value target
0 1 3 0 0
1 1 3 1 1
2 1 4 1 1
3 1 4 1 0
4 2 5 5 5
5 2 5 1 0
6 2 6 0 0
7 2 6 1 1
8 2 6 9 0
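A possible variant that skips the helper flag column and builds the mask inline (a sketch on the same data; first_nonzero is a name chosen here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Group1': [1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'Group2': [3, 3, 4, 4, 5, 5, 6, 6, 6],
                   'Value': [0, 1, 1, 1, 5, 1, 0, 1, 9]})

# Index of the first non-zero Value within each (Group1, Group2) group;
# idxmax on a boolean Series returns the first True
first_nonzero = df['Value'].ne(0).groupby([df['Group1'], df['Group2']]).idxmax()

# Keep Value only on those rows, zero elsewhere
df['target'] = np.where(df.index.isin(first_nonzero), df['Value'], 0)
```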
The result may be off when a group contains two equal non-zero values, since I do not know which occurrence you need.
Using user3483203's setup:
df['flag'] = df.Value.ne(0)
df['Target']=df.sort_values(['flag'],ascending=False).drop_duplicates(['Group1','Group2']).Value
df['Target'].fillna(0,inplace=True)
df
Out[20]:
Group1 Group2 Value Target_Column Target
0 1 3 0 0 0.0
1 1 3 1 1 1.0
2 1 4 1 1 1.0
3 1 4 1 0 0.0
4 2 5 5 5 5.0
5 2 5 1 0 0.0
6 2 6 0 0 0.0
7 2 6 1 1 1.0
8 2 6 9 0 0.0
I have a df like this:
ID Number
1 0
1 0
1 1
2 0
2 0
3 1
3 1
3 0
I want to apply a 5 to any ids that have a 1 anywhere in the number column and a zero to those that don't. For example, if the number "1" appears anywhere in the Number column for ID 1, I want to place a 5 in the total column for every instance of that ID.
My desired output would look as such
ID Number Total
1 0 5
1 0 5
1 1 5
2 0 0
2 0 0
3 1 5
3 1 5
3 0 5
I was trying to think of a way to leverage applymap for this, but I'm not sure how to implement it.
Use transform to add a column to your df as a result of a groupby on 'ID':
In [6]:
df['Total'] = df.groupby('ID')['Number'].transform(lambda x: 5 if (x == 1).any() else 0)
df
Out[6]:
ID Number Total
0 1 0 5
1 1 0 5
2 1 1 5
3 2 0 0
4 2 0 0
5 3 1 5
6 3 1 5
7 3 0 5
You can use DataFrame.groupby() on the ID column, take the max() of the Number column, convert that to a dictionary, and then use it to create the 'Total' column. Example -
grouped = df.groupby('ID')['Number'].max().to_dict()
df['Total'] = df.apply((lambda row:5 if grouped[row['ID']] else 0), axis=1)
Demo -
In [44]: df
Out[44]:
ID Number
0 1 0
1 1 0
2 1 1
3 2 0
4 2 0
5 3 1
6 3 1
7 3 0
In [56]: grouped = df.groupby('ID')['Number'].max().to_dict()
In [58]: df['Total'] = df.apply((lambda row:5 if grouped[row['ID']] else 0), axis=1)
In [59]: df
Out[59]:
ID Number Total
0 1 0 5
1 1 0 5
2 1 1 5
3 2 0 0
4 2 0 0
5 3 1 5
6 3 1 5
7 3 0 5
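Since Number contains only 0s and 1s, both approaches above can be reduced to a single vectorized transform (a sketch on the sample data):

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 3, 3, 3],
                   'Number': [0, 0, 1, 0, 0, 1, 1, 0]})

# max() per ID is 1 if the group contains any 1, else 0; scale by 5
df['Total'] = df.groupby('ID')['Number'].transform('max') * 5
```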