I have a dataframe with a single column:
A
0.0
0.0
0.0
12.0
0.0
0.0
34.0
0.0
0.0
0.0
0.0
11.0
I want the output like this, with a counter column. I want the counter to restart after a non-zero value: for the row after every non-zero value, the counter should be initialized again and then increment.
A Counter
0.0 1
0.0 2
0.0 3
12.0 4
0.0 1
0.0 2
34.0 3
0.0 1
0.0 2
0.0 3
0.0 4
11.0 5
Let us try cumsum to create the groupby key; the [::-1] here reverses the order, so that each non-zero value closes the group formed by the rows before it:
df['Counter'] = df.A.groupby(df.A.ne(0)[::-1].cumsum()).cumcount()+1
Out[442]:
0 1
1 2
2 3
3 4
4 1
5 2
6 3
7 1
8 2
9 3
10 4
11 5
dtype: int64
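A self-contained sketch of the same idea, with the sample data copied from the question:

import pandas as pd

df = pd.DataFrame({"A": [0.0, 0.0, 0.0, 12.0, 0.0, 0.0, 34.0,
                         0.0, 0.0, 0.0, 0.0, 11.0]})

# reversed cumsum: each non-zero value closes the group formed by the
# rows before it, so the key is constant within one counter segment
key = df.A.ne(0)[::-1].cumsum()

# cumcount numbers the rows within each group from 0, hence the +1
df['Counter'] = df.A.groupby(key).cumcount() + 1
print(df)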
Related
Assume we have a table that looks like the following:
id  week_num  people  date     level  a  b
1   1         20      1990101  1      2  3
1   2         30      1990108  1      2  3
1   3         40      1990115  1      2  3
1   5         100     1990129  1      2  3
1   7         100     1990212  1      2  3
week_num skips "4" and "6" because the corresponding "people" is 0. However, we want all the rows included, as in the following table.
id  week_num  people  date     level  a  b
1   1         20      1990101  1      2  3
1   2         30      1990108  1      2  3
1   3         40      1990115  1      2  3
1   4         0       1990122  1      2  3
1   5         100     1990129  1      2  3
1   6         0       1990205  1      2  3
1   7         100     1990212  1      2  3
The date starts at 1990101, and each row adds 7 days to the previous date when the week_num is consecutive (e.g. 1, 2 is consecutive; 1, 3 is not).
How can we use Python (pandas) to achieve this goal?
Note: each id has 10 week_num values (1, 2, 3, ..., 10); the output must include all week_num values with the corresponding "people" and "date".
Update: other columns like "level", "a", "b" should stay the same even when we add the skipped week_num rows.
This assumes that the date restarts at 1990-01-01 for each id:
import itertools
import pandas as pd

# reindex to get all combinations of ids and week numbers
df_full = (df.set_index(["id", "week_num"])
             .reindex(list(itertools.product([1, 2], range(1, 11))))
             .reset_index())
# fill people with zero
df_full = df_full.fillna({"people": 0})
# forward fill the other columns
cols_ffill = ["level", "a", "b"]
df_full[cols_ffill] = df_full[cols_ffill].ffill()
# reconstruct date from the week number, starting at 1990-01-01 for each id
df_full["date"] = pd.to_datetime("1990-01-01") + (df_full.week_num - 1) * pd.Timedelta("1w")
df_full
# out:
id week_num people date level a b
0 1 1 20.0 1990-01-01 1.0 2.0 3.0
1 1 2 30.0 1990-01-08 1.0 2.0 3.0
2 1 3 40.0 1990-01-15 1.0 2.0 3.0
3 1 4 0.0 1990-01-22 1.0 2.0 3.0
4 1 5 100.0 1990-01-29 1.0 2.0 3.0
5 1 6 0.0 1990-02-05 1.0 2.0 3.0
6 1 7 100.0 1990-02-12 1.0 2.0 3.0
7 1 8 0.0 1990-02-19 1.0 2.0 3.0
8 1 9 0.0 1990-02-26 1.0 2.0 3.0
9 1 10 0.0 1990-03-05 1.0 2.0 3.0
10 2 1 0.0 1990-01-01 1.0 2.0 3.0
11 2 2 0.0 1990-01-08 1.0 2.0 3.0
12 2 3 0.0 1990-01-15 1.0 2.0 3.0
13 2 4 0.0 1990-01-22 1.0 2.0 3.0
14 2 5 0.0 1990-01-29 1.0 2.0 3.0
15 2 6 0.0 1990-02-05 1.0 2.0 3.0
16 2 7 0.0 1990-02-12 1.0 2.0 3.0
17 2 8 0.0 1990-02-19 1.0 2.0 3.0
18 2 9 0.0 1990-02-26 1.0 2.0 3.0
19 2 10 0.0 1990-03-05 1.0 2.0 3.0
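The [1, 2] passed to itertools.product is hard-coded; when every id occurs at least once in df, the list could instead be derived from the data (a sketch, not part of the original answer):

# derive the id list from the data rather than hard-coding it
all_pairs = itertools.product(df["id"].unique(), range(1, 11))
df_full = (df.set_index(["id", "week_num"])
             .reindex(list(all_pairs))
             .reset_index())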
I have a dataframe that looks like the following:
df
0 1 2 3 4
0 0.0 NaN NaN NaN NaN
1 NaN 0.0 0.0 NaN 4.0
2 NaN 2.0 0.0 NaN 5.0
3 NaN NaN NaN 0.0 NaN
4 NaN 0.0 3.0 NaN 0.0
I would like a dataframe of all the (i, j, value) pairs whose value is not NaN. The dataframe should look like the following:
df
    i  j  val
0   0  0  0.0
1   1  1  0.0
2   1  2  0.0
3   1  4  4.0
4   2  1  2.0
5   2  2  0.0
6   2  4  5.0
7   3  3  0.0
8   4  1  0.0
9   4  2  3.0
10  4  4  0.0
Use DataFrame.stack with Series.rename_axis and Series.reset_index:
df = df.stack().rename_axis(('i','j')).reset_index(name='val')
print(df)
i j val
0 0 0 0.0
1 1 1 0.0
2 1 2 0.0
3 1 4 4.0
4 2 1 2.0
5 2 2 0.0
6 2 4 5.0
7 3 3 0.0
8 4 1 0.0
9 4 2 3.0
10 4 4 0.0
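Note that stack drops NaN cells by default, which is exactly why only the non-NaN pairs survive. In older pandas versions this is exposed through the dropna keyword; applied to the original wide frame:

# keep the NaN cells as rows too (classic DataFrame.stack signature)
df.stack(dropna=False)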
Like this:
In [379]: df.stack().reset_index(name='val').rename(columns={'level_0':'i', 'level_1':'j'})
Out[379]:
i j val
0 0 0 0.0
1 1 1 0.0
2 1 2 0.0
3 1 4 4.0
4 2 1 2.0
5 2 2 0.0
6 2 4 5.0
7 3 3 0.0
8 4 1 0.0
9 4 2 3.0
10 4 4 0.0
I have the following dataframe:
0 0 0
1 0 0
1 1 0
1 1 1
1 1 1
0 0 0
0 1 0
0 1 0
0 0 0
How do you get a dataframe which looks like this?
0 0 0
4 0 0
4 3 0
4 3 2
4 3 2
0 0 0
0 2 0
0 2 0
0 0 0
Thank you for your help.
You may need a for loop here, with transform: use cumsum to create the group key, then assign the run counts back to the non-zero positions of your original df.
for x in df.columns:
    df.loc[df[x] != 0, x] = df[x].groupby(df[x].eq(0).cumsum()[df[x] != 0]).transform('count')
df
Out[229]:
1 2 3
0 0.0 0.0 0.0
1 4.0 0.0 0.0
2 4.0 3.0 0.0
3 4.0 3.0 2.0
4 4.0 3.0 2.0
5 0.0 0.0 0.0
6 0.0 2.0 0.0
7 0.0 2.0 0.0
8 0.0 0.0 0.0
Or, without a for loop:
# stack the frame and sort so that each column's values are contiguous
s = df.stack().sort_index(level=1)
# zeros delimit the runs: within each column, group by the cumsum of the
# zero-mask; each group holds one zero plus the following non-zero run,
# hence the sub(1) (this assumes every run is preceded by a zero)
s2 = s.groupby([s.index.get_level_values(1), s.eq(0).cumsum()]).transform('count').sub(1).unstack()
# keep the original zeros and fill the non-zero positions with the counts
df = df.mask(df != 0).combine_first(s2)
df
Out[255]:
1 2 3
0 0.0 0.0 0.0
1 4.0 0.0 0.0
2 4.0 3.0 0.0
3 4.0 3.0 2.0
4 4.0 3.0 2.0
5 0.0 0.0 0.0
6 0.0 2.0 0.0
7 0.0 2.0 0.0
8 0.0 0.0 0.0
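For reference, a self-contained restatement of the loop-based approach that groups only the non-zero cells; the column labels 1, 2, 3 are assumptions taken from the output above:

import pandas as pd

df = pd.DataFrame({1: [0, 1, 1, 1, 1, 0, 0, 0, 0],
                   2: [0, 0, 1, 1, 1, 0, 1, 1, 0],
                   3: [0, 0, 0, 1, 1, 0, 0, 0, 0]})

for x in df.columns:
    nonzero = df[x] != 0
    # zeros delimit the runs, so the cumsum of the zero-mask is constant
    # within each run of consecutive non-zero values
    key = df[x].eq(0).cumsum()
    df.loc[nonzero, x] = df[x][nonzero].groupby(key[nonzero]).transform('count')

print(df)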
My goal is to perform a groupby, then create rolling total stats, and then shift. I need the shift to restart at the first instance of each unique player. Right now it is shifting the entire dataframe once, rather than doing it for each grouped player.
Original Data -
player date won
0 A 2016-01-11 0
1 A 2016-02-01 0
2 A 2016-02-01 1
3 A 2016-02-01 1
4 A 2016-10-24 0
5 A 2016-10-31 0
6 A 2018-10-22 0
7 B 2016-10-24 0
8 B 2016-10-24 1
9 B 2017-11-13 0
Things I've tried -
Attempt #1:
temp = temp_master.groupby('player', sort=False)[count_fields].rolling(10, min_periods=1).sum().shift(1).reset_index(drop=True)
temp = temp.add_suffix('_total')
temp['won_total'].head(10)
0 NaN
1 0.0
2 0.0
3 1.0
4 2.0
5 2.0
6 2.0
7 2.0
8 0.0
9 1.0
Attempt #2:
temp = temp_master.groupby('player', sort=False)[count_fields].shift(1).rolling(10, min_periods=1).sum().reset_index(drop=True)
temp = temp.add_suffix('_total')
temp['won_total'].head(10)
0 NaN
1 0.0
2 0.0
3 1.0
4 2.0
5 2.0
6 2.0
7 2.0
8 2.0
9 3.0
Attempt #3:
temp = temp_master.groupby('player', sort=False)[count_fields].rolling(10, min_periods=1).sum().reset_index(drop=True)
temp = temp.add_suffix('_total')
temp = temp.shift(1)
temp['won_total'].head(10)
0 NaN
1 0.0
2 0.0
3 1.0
4 2.0
5 2.0
6 2.0
7 2.0
8 0.0
9 1.0
This is what I need the results to be -
0 NaN
1 0.0
2 0.0
3 1.0
4 2.0
5 2.0
6 2.0
7 NaN
8 0.0
9 1.0
Index #7 should equal NaN. It is the first instance of player B, and I want the shift to restart at the first instance of every new player so the stats are summarized per player.
Index #8 should equal 0.
Index #9 should equal 1.
Attempts #1 and #3 look close, but they don't assign the NaN value at the start of each new player. And #3 no longer groups by player at all, so I know that won't really work.
Also, this will be done on a fair amount of data (around 100K-300K rows), and the 'count_fields' list contains around 3K-4K columns that I am calculating. Just something to be aware of.
Any ideas on how to create running stats by player, shifting down within every player?
You need apply here: the two functions are not chained under the groupby object. sum is computed under the groupby, but shift is then applied to the result of sum, i.e. to the whole column rather than within each group.
temp = (temp_master.groupby('player', sort=False)['won']
                   .apply(lambda x: x.rolling(10, min_periods=1).sum().shift(1))
                   .reset_index(drop=True))
temp
0 NaN
1 0.0
2 0.0
3 1.0
4 2.0
5 2.0
6 2.0
7 NaN
8 0.0
9 1.0
Name: won, dtype: float64
Another option, if you don't want to use apply, is to layer a second groupby call and perform the shifting there:
(df.groupby('player', sort=False)
   .won.rolling(10, min_periods=1)
   .sum()
   .groupby(level=0)
   .shift()
   .reset_index(drop=True))
0 NaN
1 0.0
2 0.0
3 1.0
4 2.0
5 2.0
6 2.0
7 NaN
8 0.0
9 1.0
Name: won, dtype: float64
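Both patterns extend to several columns at once. A sketch for the question's count_fields (assumed to be the list of stat columns present in temp_master):

temp = (temp_master.groupby('player', sort=False)[count_fields]
                   .apply(lambda g: g.rolling(10, min_periods=1).sum().shift(1))
                   .reset_index(drop=True))
temp = temp.add_suffix('_total')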
I have two dataframes:
dayData
power_comparison final_average_delta_power calculated_power
1 0.0 0.0 0
2 0.0 0.0 0
3 0.0 0.0 0
4 0.0 0.0 0
5 0.0 0.0 0
7 0.0 0.0 0
and
historicPower
power
0 0.0
1 0.0
2 0.0
3 -1.0
4 0.0
5 1.0
7 0.0
I'm trying to reindex the historicPower dataframe to have the same shape as the dayData dataframe (so in this example it would look like):
power
1 0.0
2 0.0
3 -1.0
4 0.0
5 1.0
7 0.0
The dataframes in reality will be a lot larger, with different shapes.
I think you can use reindex if the index has no duplicates:
historicPower = historicPower.reindex(dayData.index)
print (historicPower)
power
1 0.0
2 0.0
3 -1.0
4 0.0
5 1.0
7 0.0
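If dayData's index contained labels that are missing from historicPower, reindex would insert NaN rows there; a fill value can be supplied instead:

# hypothetical: fill unmatched labels with 0 instead of NaN
historicPower = historicPower.reindex(dayData.index, fill_value=0)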