I have the following dataframe in pandas:
code rank quant sales
123 1 0 2
123 1 12 2
123 1 0 2
123 2 0 1
123 2 10 1
I want to do a conditional cumulative sum of sales grouped by rank: where quant is not zero, add it to the cumulative sum on the same row.
code rank quant sales cumsum
123 1 0 2 2
123 1 12 2 16
123 1 0 2 18
123 2 0 1 1
123 2 10 1 12
How can I do this in pandas?
Add the columns together first and then use GroupBy.cumsum with the df['rank'] Series:
df['cumsum'] = df['quant'].add(df['sales']).groupby(df['rank']).cumsum()
Or use sum over both columns:
df['cumsum'] = df[['quant', 'sales']].sum(axis=1).groupby(df['rank']).cumsum()
An alternative is to create the new column before the groupby:
df['cumsum'] = (df.assign(cumsum=df['quant'].add(df['sales']))
                  .groupby('rank')['cumsum'].cumsum())
print (df)
code rank quant sales cumsum
0 123 1 0 2 2
1 123 1 12 2 16
2 123 1 0 2 18
3 123 2 0 1 1
4 123 2 10 1 12
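For a fully reproducible check, here is a minimal sketch that builds the sample frame from the question and applies the first approach:
import pandas as pd

df = pd.DataFrame({'code': [123] * 5,
                   'rank': [1, 1, 1, 2, 2],
                   'quant': [0, 12, 0, 0, 10],
                   'sales': [2, 2, 2, 1, 1]})

# add quant and sales row-wise, then take the cumulative sum within each rank
df['cumsum'] = df['quant'].add(df['sales']).groupby(df['rank']).cumsum()
print(df)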
How do I get the cumsum of a pandas groupby when my data is boolean (0s and 1s)? I want to treat each run of 0s or 1s as a unique group, with the count resetting whenever a new value appears.
I currently have this, which just sums up all the 1s and 0s:
df['grp'] = df.groupby("dir")["dir"].cumsum()
My desired output:
df = pd.DataFrame({"dir":[1,1,1,1,0,0,0,1,1,1,1,0,0,0],
"grp": [1,2,3,4,1,2,3,1,2,3,4,1,2,3,]})
Use:
In [1495]: df['grp'] = df.groupby((df['dir'] != df['dir'].shift(1)).cumsum()).cumcount()+1
In [1496]: df
Out[1496]:
dir grp
0 1 1
1 1 2
2 1 3
3 1 4
4 0 1
5 0 2
6 0 3
7 1 1
8 1 2
9 1 3
10 1 4
11 0 1
12 0 2
13 0 3
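To see why this works, it can help to inspect the intermediate run identifier; a minimal sketch with the same 'dir' data:
import pandas as pd

df = pd.DataFrame({'dir': [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]})

# a new run starts wherever the value differs from the previous row;
# the cumsum of those change points gives each consecutive run its own label
runs = (df['dir'] != df['dir'].shift(1)).cumsum()
print(runs.tolist())  # [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4]

# cumcount() + 1 then numbers the rows within each run, starting from 1
df['grp'] = df.groupby(runs).cumcount() + 1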
This is very likely a duplicate, but I'm not sure what to search for to find it.
I have a column in a dataframe that cycles from 0 to some value a number of times (in my example it cycles to 4 three times). I want to create another column that simply shows which cycle each row belongs to. Example:
import pandas as pd
df = pd.DataFrame({'A':[0,1,2,3,4,0,1,2,3,4,0,1,2,3,4]})
df['desired_output'] = [0,0,0,0,0,1,1,1,1,1,2,2,2,2,2]
print(df)
A desired_output
0 0 0
1 1 0
2 2 0
3 3 0
4 4 0
5 0 1
6 1 1
7 2 1
8 3 1
9 4 1
10 0 2
11 1 2
12 2 2
13 3 2
14 4 2
I was thinking maybe something along the lines of a groupby(), cumsum() and transform(), but I'm not quite sure how to implement it. Could be wrong though.
Compare with 0 using Series.eq, then apply Series.cumsum, and finally subtract 1:
df['desired_output'] = df['A'].eq(0).cumsum() - 1
print (df)
A desired_output
0 0 0
1 1 0
2 2 0
3 3 0
4 4 0
5 0 1
6 1 1
7 2 1
8 3 1
9 4 1
10 0 2
11 1 2
12 2 2
13 3 2
14 4 2
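The intermediate steps can be inspected separately; a minimal sketch, assuming each cycle starts at exactly 0 as in the sample:
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]})

# True exactly where a new cycle begins
starts = df['A'].eq(0)
# running count of cycle starts: 1 for the first cycle, 2 for the second, ...
cycle = starts.cumsum()
# subtract 1 to make the numbering zero-based
df['desired_output'] = cycle - 1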
I have the following dataframe:
product Week_Number Sales
1 1 10
2 1 15
1 2 20
And I would like to group by product and Week_Number and create a column with the next week's sales for that product:
product Week_Number Sales next_week
1 1 10 20
2 1 15 0
1 2 20 0
Use DataFrame.sort_values with DataFrameGroupBy.shift:
# if the data may not be sorted by both columns
df = df.sort_values(['product', 'Week_Number'])
# pandas 0.24+
df['next_week'] = df.groupby('product')['Sales'].shift(-1, fill_value=0)
# pandas below 0.24
# df['next_week'] = df.groupby('product')['Sales'].shift(-1).fillna(0, downcast='int')
print (df)
product Week_Number Sales next_week
0 1 1 10 20
1 2 1 15 0
2 1 2 20 0
If duplicates are possible and you need to aggregate by sum first in your real data:
df = df.groupby(['product','Week_Number'], as_index=False)['Sales'].sum()
df['next_week'] = df.groupby('product')['Sales'].shift(-1).fillna(0, downcast='int')
print (df)
product Week_Number Sales next_week
0 1 1 10 20
1 1 2 20 0
2 2 1 15 0
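For reference, a minimal reproducible sketch of the approach above, using the data from the question (fill_value in shift needs pandas 0.24+):
import pandas as pd

df = pd.DataFrame({'product': [1, 2, 1],
                   'Week_Number': [1, 1, 2],
                   'Sales': [10, 15, 20]})

# sort first so that shift(-1) pulls the next week's sales within each product
df = df.sort_values(['product', 'Week_Number'])
df['next_week'] = df.groupby('product')['Sales'].shift(-1, fill_value=0)
print(df)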
First sort the data, then apply shift using transform:
df = pd.DataFrame(data={'product':[1,2,1],
'week_number':[1,1,2],
'sales':[10,15,20]})
df.sort_values(['product', 'week_number'], inplace=True)
df['next_week'] = df.groupby(['product'])['sales'].transform(pd.Series.shift, -1, fill_value=0)
print(df)
product week_number sales next_week
0 1 1 10 20
2 1 2 20 0
1 2 1 15 0
For each unique user in the DataFrame, I would like to delete the row where val first equals 1, together with all of that user's earlier rows.
For instance, given the following DataFrame, I would like to get another DataFrame in which, for each user, the first row where "val" is 1 and all of that user's preceding rows have been removed.
user val
0 1 0
1 1 1
2 1 0
3 1 1
4 2 0
5 2 0
6 2 1
7 2 0
8 3 1
9 3 0
10 3 0
11 3 0
12 3 1
Expected output:
user val
0 1 0
1 1 1
2 2 0
3 3 0
4 3 0
5 3 0
6 3 1
Sample Data
import pandas as pd
s = [1,1,1,1,2,2,2,2,3,3,3,3,3]
t = [0,1,0,1,0,0,1,0,1,0,0,0,1]
df = pd.DataFrame(zip(s,t), columns=['user', 'val'])
Group by user, then use cummax and shift to build a mask that removes all rows before, and including, the first 1 in the 'val' column per user.
Assuming your values are only 1 or 0, it is also possible to create the mask with a double cumsum.
m = df.groupby('user').val.apply(lambda x: x.eq(1).cummax().shift().fillna(False))
# m = df.groupby('user').val.apply(lambda x: x.cumsum().cumsum().gt(1))
df.loc[m]
Output:
user val
2 1 0
3 1 1
7 2 0
9 3 0
10 3 0
11 3 0
12 3 1
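To see what the mask keeps, it can help to print it next to the data; a minimal sketch reusing the sample above (the keep column is only for display):
import pandas as pd

s = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3]
t = [0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1]
df = pd.DataFrame(list(zip(s, t)), columns=['user', 'val'])

# cummax() turns True at the first 1 and stays True;
# shift() delays that by one row, so the first 1 itself is still dropped
m = df.groupby('user').val.apply(lambda x: x.eq(1).cummax().shift().fillna(False))
print(df.assign(keep=m.values))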
I have the following original dataframe:
ID T value
1 0 1
1 4 3
2 0 0
2 4 1
2 7 3
For the filled-in rows, value is the same as in the previous row.
The output should look like:
ID T value
1 0 1
1 1 1
1 2 1
1 3 1
1 4 3
2 0 0
2 1 0
2 2 0
2 3 0
2 4 1
2 5 1
2 6 1
2 7 3
... ... ...
I tried a loop, but it takes a long time to process.
Any idea how to solve this for a large dataframe?
Thanks!
For this solution, unique integer values in T are necessary within each group.
Use groupby with a custom function: for each group, reindex with the full integer range and then replace NaNs in the value column by forward filling with ffill:
import numpy as np

df1 = (df.groupby('ID')[['T', 'value']]
         .apply(lambda x: x.set_index('T')
                           .reindex(np.arange(x['T'].min(), x['T'].max() + 1)))
         .ffill()
         .astype(int)
         .reset_index())
print (df1)
ID T value
0 1 0 1
1 1 1 1
2 1 2 1
3 1 3 1
4 1 4 3
5 2 0 0
6 2 1 0
7 2 2 0
8 2 3 0
9 2 4 1
10 2 5 1
11 2 6 1
12 2 7 3
If you get the error:
ValueError: cannot reindex from a duplicate axis
it means there are duplicated T values within a group, like:
print (df)
ID T value
0 1 0 1
1 1 4 3
2 2 0 0
3 2 4 1 <- T=4 is duplicated within group 2
4 2 4 3 <- T=4 is duplicated within group 2
5 2 7 3
The solution is to aggregate the values first so that T is unique per group, e.g. by sum:
df = df.groupby(['ID', 'T'], as_index=False)['value'].sum()
print (df)
ID T value
0 1 0 1
1 1 4 3
2 2 0 0
3 2 4 4
4 2 7 3
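After this aggregation makes T unique within each group, the reindex solution above can be applied unchanged; a minimal end-to-end sketch:
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2, 2, 2],
                   'T': [0, 4, 0, 4, 4, 7],
                   'value': [1, 3, 0, 1, 3, 3]})

# collapse duplicated T values within each ID first
df = df.groupby(['ID', 'T'], as_index=False)['value'].sum()

# now every T is unique per group, so reindex + ffill works
df1 = (df.groupby('ID')[['T', 'value']]
         .apply(lambda x: x.set_index('T')
                           .reindex(np.arange(x['T'].min(), x['T'].max() + 1)))
         .ffill()
         .astype(int)
         .reset_index())
print(df1)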