How do I get a running count from a pandas groupby when my data is boolean 0s and 1s? I want each consecutive run of 0s or 1s to be treated as its own group, with the count resetting whenever the value changes.
I currently have this, which sums up all the 1s and 0s per value instead:
df['grp'] = df.groupby("dir")["dir"].cumsum()
My desired output
df = pd.DataFrame({"dir":[1,1,1,1,0,0,0,1,1,1,1,0,0,0],
"grp": [1,2,3,4,1,2,3,1,2,3,4,1,2,3,]})
Use:
In [1495]: df['grp'] = df.groupby((df['dir'] != df['dir'].shift(1)).cumsum()).cumcount()+1
In [1496]: df
Out[1496]:
dir grp
0 1 1
1 1 2
2 1 3
3 1 4
4 0 1
5 0 2
6 0 3
7 1 1
8 1 2
9 1 3
10 1 4
11 0 1
12 0 2
13 0 3
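To see why this works: df['dir'] != df['dir'].shift(1) is True at the start of every consecutive run, its cumsum gives each run a unique id, and cumcount numbers the rows inside each run. A step-by-step sketch (the names starts and run_id are chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({"dir": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]})

# True wherever a new run of equal values begins (the first row compares
# against NaN, so it is always True)
starts = df['dir'] != df['dir'].shift(1)

# Cumulative sum of the start flags -> a unique id per consecutive run
run_id = starts.cumsum()

# Number the rows within each run, starting from 1
df['grp'] = df.groupby(run_id).cumcount() + 1
```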
Related
I have a table like the following, and I want to fill a new column based on conditions from the other columns. In this case, count the rows that:
- have the same value in Cond1
- have an Amount different from zero
- are from previous months
The column "CountPreviousMonth" is what I need to fill in.
Cond1  Amount  Month  CountPreviousMonth
a      10      1      0
a      20      2      1
a      15      3      2
b      10      1      0
b      0       2      1
b      15      3      1
c      5       1      0
c      25      2      1
c      15      3      2
When Month is 1 the count is zero, because it is the first month.
For Cond1=b the count stays at 1 in Month 3, because the Amount in Month 2 was zero.
In Excel I used COUNTIFS, but I would like to do this in Python. I could do it with a for loop, but the real table has many rows and that wouldn't be efficient. Is there a better way to calculate it?
First replace Month with missing values where Amount=0, then use a custom lambda function with Series.shift and forward-fill the missing values:
f = lambda x: x.shift(fill_value=0).ffill().astype(int)
df['count'] = df['Month'].mask(df['Amount'].eq(0)).groupby(df['Cond1']).transform(f)
print (df)
Cond1 Amount Month CountPreviousMonth count
0 a 10 1 0 0
1 a 20 2 1 1
2 a 15 3 2 2
3 b 10 1 0 0
4 b 0 2 1 1
5 b 15 3 1 1
6 c 5 1 0 0
7 c 25 2 1 1
8 c 15 3 2 2
Shift Amount down by 1, and check which of the previous values are not equal to 0 (the first row compares NaN, which counts as non-zero):
arr = df.Amount.shift().ne(0)
Get a boolean mask where Month is 1:
repl = df.Month.eq(1)
Index arr with repl, so the first month of each group always counts as a start:
arr[repl] = True
Group by Cond1, run a cumulative sum, and finally subtract 1 so that every group starts at 0:
df.assign(CountPreviousMonth = arr.groupby(df.Cond1).cumsum().sub(1))
Cond1 Amount Month CountPreviousMonth
0 a 10 1 0
1 a 20 2 1
2 a 15 3 2
3 b 10 1 0
4 b 0 2 1
5 b 15 3 1
6 c 5 1 0
7 c 25 2 1
8 c 15 3 2
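The steps above combined into one self-contained sketch, using the sample table from the question:

```python
import pandas as pd

df = pd.DataFrame({'Cond1': list('aaabbbccc'),
                   'Amount': [10, 20, 15, 10, 0, 15, 5, 25, 15],
                   'Month': [1, 2, 3, 1, 2, 3, 1, 2, 3]})

# Previous row's Amount is non-zero (first row compares NaN, which passes)
arr = df.Amount.shift().ne(0)

# The first month of each group always counts as a start
arr[df.Month.eq(1)] = True

# Cumulative sum per Cond1 group, minus 1 so every group starts at 0
df['CountPreviousMonth'] = arr.groupby(df.Cond1).cumsum().sub(1)
```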
This is very likely a duplicate, but I'm not sure what to search for to find it.
I have a column in a dataframe that cycles from 0 to some value a number of times (in my example it cycles to 4 three times). I want to create another column that simply shows which cycle each row belongs to. Example:
import pandas as pd
df = pd.DataFrame({'A':[0,1,2,3,4,0,1,2,3,4,0,1,2,3,4]})
df['desired_output'] = [0,0,0,0,0,1,1,1,1,1,2,2,2,2,2]
print(df)
A desired_output
0 0 0
1 1 0
2 2 0
3 3 0
4 4 0
5 0 1
6 1 1
7 2 1
8 3 1
9 4 1
10 0 2
11 1 2
12 2 2
13 3 2
14 4 2
I was thinking maybe something along the lines of a groupby(), cumsum() and transform(), but I'm not quite sure how to implement it. Could be wrong though.
Compare with 0 using Series.eq, then apply Series.cumsum, and finally subtract 1:
df['desired_output'] = df['A'].eq(0).cumsum() - 1
print (df)
A desired_output
0 0 0
1 1 0
2 2 0
3 3 0
4 4 0
5 0 1
6 1 1
7 2 1
8 3 1
9 4 1
10 0 2
11 1 2
12 2 2
13 3 2
14 4 2
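If the cycles don't always restart at exactly 0, a variant of the same idea with Series.diff marks a new cycle at any drop in the value (a sketch, assuming any decrease means a reset):

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]})

# A negative difference means the counter dropped, i.e. a new cycle began;
# the first row's diff is NaN, which compares False, so counting starts at 0
df['cycle'] = df['A'].diff().lt(0).cumsum()
```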
For each unique user in the DataFrame, I would like to delete the row where val equals 1 for the first time, together with all rows before it.
For instance, given the following DataFrame, I would like to get another DataFrame in which, for each user, the first row where "val" is 1 and all of its previous rows have been removed.
user val
0 1 0
1 1 1
2 1 0
3 1 1
4 2 0
5 2 0
6 2 1
7 2 0
8 3 1
9 3 0
10 3 0
11 3 0
12 3 1
Expected output:
user val
0 1 0
1 1 1
2 2 0
3 3 0
4 3 0
5 3 0
6 3 1
Sample Data
import pandas as pd
s = [1,1,1,1,2,2,2,2,3,3,3,3,3]
t = [0,1,0,1,0,0,1,0,1,0,0,0,1]
df = pd.DataFrame(zip(s,t), columns=['user', 'val'])
Use groupby with cummax and shift to remove all rows up to and including the first 1 in the 'val' column per user.
Assuming your values are only 1 or 0, it is also possible to create the mask with a double cumsum.
m = df.groupby('user').val.apply(lambda x: x.eq(1).cummax().shift().fillna(False))
# m = df.groupby('user').val.apply(lambda x: x.cumsum().cumsum().gt(1))
df.loc[m]
Output:
user val
2 1 0
3 1 1
7 2 0
9 3 0
10 3 0
11 3 0
12 3 1
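The same mask can also be built with transform instead of apply, which keeps the original index directly (a sketch on the sample data above; m and out are names chosen for illustration):

```python
import pandas as pd

s = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3]
t = [0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1]
df = pd.DataFrame(zip(s, t), columns=['user', 'val'])

# cummax marks every row from the first 1 onward; shifting down one row
# per group excludes that first 1 itself (and everything before it)
m = (df.groupby('user')['val']
       .transform(lambda x: x.eq(1).cummax().shift(fill_value=False))
       .astype(bool))

out = df.loc[m]
```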
I have a dataframe that looks like the following. The rightmost column is my desired column:
Group1 Group2 Value Target_Column
1 3 0 0
1 3 1 1
1 4 1 1
1 4 1 0
2 5 5 5
2 5 1 0
2 6 0 0
2 6 1 1
2 6 9 0
How do I identify the first non-zero value within each group (defined by the two columns Group1 & Group2), and then create a column that keeps that first non-zero value and shows zero everywhere else?
This question is very similar to one posed earlier here:
Identify first non-zero element within a group in pandas
but that solution gives an error on groups based on multiple columns.
I have tried:
import pandas as pd
dt = pd.DataFrame({'Group1': [1,1,1,1,2,2,2,2,2], 'Group2': [3,3,4,4,5,5,6,6,6], 'Value': [0,1,1,1,5,1,0,1,9]})
dt['Newcol']=0
dt.loc[dt.Value.ne(0).groupby(dt['Group1','Group2']).idxmax(),'Newcol']=dt.Value
Setup
df['flag'] = df.Value.ne(0)
Using numpy.where and assign:
df.assign(
    target=np.where(df.index.isin(df.groupby(['Group1', 'Group2']).flag.idxmax()),
                    df.Value, 0)
).drop(columns='flag')
Using loc and assign
df.assign(
    target=df.loc[df.groupby(['Group1', 'Group2']).flag.idxmax(), 'Value']
).fillna(0).astype(int).drop(columns='flag')
Both produce:
Group1 Group2 Value target
0 1 3 0 0
1 1 3 1 1
2 1 4 1 1
3 1 4 1 0
4 2 5 5 5
5 2 5 1 0
6 2 6 0 0
7 2 6 1 1
8 2 6 9 0
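A possible variant that skips the helper flag column and builds the mask inline (a sketch on the same data; first_nonzero is a name chosen here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Group1': [1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'Group2': [3, 3, 4, 4, 5, 5, 6, 6, 6],
                   'Value': [0, 1, 1, 1, 5, 1, 0, 1, 9]})

# Index of the first non-zero Value within each (Group1, Group2) group;
# idxmax on a boolean Series returns the first True
first_nonzero = df['Value'].ne(0).groupby([df['Group1'], df['Group2']]).idxmax()

# Keep Value only on those rows, zero elsewhere
df['target'] = np.where(df.index.isin(first_nonzero), df['Value'], 0)
```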
The result may be off when a group contains two equal non-zero values, since I do not know which occurrence you need.
Using user3483203's setup:
df['flag'] = df.Value.ne(0)
df['Target']=df.sort_values(['flag'],ascending=False).drop_duplicates(['Group1','Group2']).Value
df['Target'].fillna(0,inplace=True)
df
Out[20]:
Group1 Group2 Value Target_Column Target
0 1 3 0 0 0.0
1 1 3 1 1 1.0
2 1 4 1 1 1.0
3 1 4 1 0 0.0
4 2 5 5 5 5.0
5 2 5 1 0 0.0
6 2 6 0 0 0.0
7 2 6 1 1 1.0
8 2 6 9 0 0.0
I have a df like this:
ID Number
1 0
1 0
1 1
2 0
2 0
3 1
3 1
3 0
I want to apply a 5 to any ids that have a 1 anywhere in the number column and a zero to those that don't. For example, if the number "1" appears anywhere in the Number column for ID 1, I want to place a 5 in the total column for every instance of that ID.
My desired output would look as such
ID Number Total
1 0 5
1 0 5
1 1 5
2 0 0
2 0 0
3 1 5
3 1 5
3 0 5
I was trying to think of a way to leverage applymap for this, but I'm not sure how to implement it.
Use transform to add a column to your df as a result of a groupby on 'ID':
In [6]:
df['Total'] = df.groupby('ID')['Number'].transform(lambda x: 5 if (x == 1).any() else 0)
df
Out[6]:
ID Number Total
0 1 0 5
1 1 0 5
2 1 1 5
3 2 0 0
4 2 0 0
5 3 1 5
6 3 1 5
7 3 0 5
You can use DataFrame.groupby() on the ID column, take the max() of the Number column, convert that to a dictionary, and then use it to create the 'Total' column. Example -
grouped = df.groupby('ID')['Number'].max().to_dict()
df['Total'] = df.apply((lambda row:5 if grouped[row['ID']] else 0), axis=1)
Demo -
In [44]: df
Out[44]:
ID Number
0 1 0
1 1 0
2 1 1
3 2 0
4 2 0
5 3 1
6 3 1
7 3 0
In [56]: grouped = df.groupby('ID')['Number'].max().to_dict()
In [58]: df['Total'] = df.apply((lambda row:5 if grouped[row['ID']] else 0), axis=1)
In [59]: df
Out[59]:
ID Number Total
0 1 0 5
1 1 0 5
2 1 1 5
3 2 0 0
4 2 0 0
5 3 1 5
6 3 1 5
7 3 0 5
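Since Number contains only 0s and 1s, both approaches above can be reduced to a single vectorized transform (a sketch on the sample data):

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 3, 3, 3],
                   'Number': [0, 0, 1, 0, 0, 1, 1, 0]})

# max() per ID is 1 if the group contains any 1, else 0; scale by 5
df['Total'] = df.groupby('ID')['Number'].transform('max') * 5
```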