I have a dataframe with a single column:
A
0.0
0.0
0.0
12.0
0.0
0.0
34.0
0.0
0.0
0.0
0.0
11.0
I want the output like this, with a counter column. I want the counter to restart after a non-zero value: for the row after every non-zero value, the counter should be initialized again and then increment.
A Counter
0.0 1
0.0 2
0.0 3
12.0 4
0.0 1
0.0 2
34.0 3
0.0 1
0.0 2
0.0 3
0.0 4
11.0 5
Let us try cumsum to create the groupby key; the [::-1] here reverses the order, so that each non-zero value closes the group formed by the rows before it:
df['Counter'] = df.A.groupby(df.A.ne(0)[::-1].cumsum()).cumcount()+1
Out[442]:
0 1
1 2
2 3
3 4
4 1
5 2
6 3
7 1
8 2
9 3
10 4
11 5
dtype: int64
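A self-contained sketch of the same idea, with the sample data copied from the question:

import pandas as pd

df = pd.DataFrame({"A": [0.0, 0.0, 0.0, 12.0, 0.0, 0.0, 34.0,
                         0.0, 0.0, 0.0, 0.0, 11.0]})

# reversed cumsum: each non-zero value closes the group formed by the
# rows before it, so the key is constant within one counter segment
key = df.A.ne(0)[::-1].cumsum()

# cumcount numbers the rows within each group from 0, hence the +1
df['Counter'] = df.A.groupby(key).cumcount() + 1
print(df)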
Related
Assume we have a table that looks like the following:
id  week_num  people  date     level  a  b
1   1         20      1990101  1      2  3
1   2         30      1990108  1      2  3
1   3         40      1990115  1      2  3
1   5         100     1990129  1      2  3
1   7         100     1990212  1      2  3
week_num skips "4" and "6" because the corresponding "people" is 0. However, we want all the rows included, as in the following table.
id  week_num  people  date     level  a  b
1   1         20      1990101  1      2  3
1   2         30      1990108  1      2  3
1   3         40      1990115  1      2  3
1   4         0       1990122  1      2  3
1   5         100     1990129  1      2  3
1   6         0       1990205  1      2  3
1   7         100     1990212  1      2  3
The date starts at 1990101, and each row adds 7 days to the previous date when the week_num is consecutive (e.g. 1, 2 is consecutive; 1, 3 is not).
How can we use Python (pandas) to achieve this goal?
Note: each id has 10 week_num values (1, 2, 3, ..., 10); the output must include all week_num values with the corresponding "people" and "date".
Update: other columns like "level", "a", "b" should stay the same even when we add the skipped week_num rows.
This assumes that the date restarts at 1990-01-01 for each id:
import itertools
import pandas as pd

# reindex to get all combinations of ids and week numbers
df_full = (df.set_index(["id", "week_num"])
             .reindex(list(itertools.product([1, 2], range(1, 11))))
             .reset_index())
# fill people with zero
df_full = df_full.fillna({"people": 0})
# forward fill the other columns
cols_ffill = ["level", "a", "b"]
df_full[cols_ffill] = df_full[cols_ffill].ffill()
# reconstruct date from the week number, starting at 1990-01-01 for each id
df_full["date"] = pd.to_datetime("1990-01-01") + (df_full.week_num - 1) * pd.Timedelta("1w")
df_full
# out:
id week_num people date level a b
0 1 1 20.0 1990-01-01 1.0 2.0 3.0
1 1 2 30.0 1990-01-08 1.0 2.0 3.0
2 1 3 40.0 1990-01-15 1.0 2.0 3.0
3 1 4 0.0 1990-01-22 1.0 2.0 3.0
4 1 5 100.0 1990-01-29 1.0 2.0 3.0
5 1 6 0.0 1990-02-05 1.0 2.0 3.0
6 1 7 100.0 1990-02-12 1.0 2.0 3.0
7 1 8 0.0 1990-02-19 1.0 2.0 3.0
8 1 9 0.0 1990-02-26 1.0 2.0 3.0
9 1 10 0.0 1990-03-05 1.0 2.0 3.0
10 2 1 0.0 1990-01-01 1.0 2.0 3.0
11 2 2 0.0 1990-01-08 1.0 2.0 3.0
12 2 3 0.0 1990-01-15 1.0 2.0 3.0
13 2 4 0.0 1990-01-22 1.0 2.0 3.0
14 2 5 0.0 1990-01-29 1.0 2.0 3.0
15 2 6 0.0 1990-02-05 1.0 2.0 3.0
16 2 7 0.0 1990-02-12 1.0 2.0 3.0
17 2 8 0.0 1990-02-19 1.0 2.0 3.0
18 2 9 0.0 1990-02-26 1.0 2.0 3.0
19 2 10 0.0 1990-03-05 1.0 2.0 3.0
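The [1, 2] passed to itertools.product is hard-coded; when every id occurs at least once in df, the list could instead be derived from the data (a sketch, not part of the original answer):

# derive the id list from the data rather than hard-coding it
all_pairs = itertools.product(df["id"].unique(), range(1, 11))
df_full = (df.set_index(["id", "week_num"])
             .reindex(list(all_pairs))
             .reset_index())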
I have a dataframe that looks like the following:
df
0 1 2 3 4
0 0.0 NaN NaN NaN NaN
1 NaN 0.0 0.0 NaN 4.0
2 NaN 2.0 0.0 NaN 5.0
3 NaN NaN NaN 0.0 NaN
4 NaN 0.0 3.0 NaN 0.0
I would like a dataframe of all the (i, j, value) pairs whose value is not NaN. The dataframe should look like the following:
df
    i  j  val
0   0  0  0.0
1   1  1  0.0
2   1  2  0.0
3   1  4  4.0
4   2  1  2.0
5   2  2  0.0
6   2  4  5.0
7   3  3  0.0
8   4  1  0.0
9   4  2  3.0
10  4  4  0.0
Use DataFrame.stack with Series.rename_axis and Series.reset_index:
df = df.stack().rename_axis(('i','j')).reset_index(name='val')
print(df)
i j val
0 0 0 0.0
1 1 1 0.0
2 1 2 0.0
3 1 4 4.0
4 2 1 2.0
5 2 2 0.0
6 2 4 5.0
7 3 3 0.0
8 4 1 0.0
9 4 2 3.0
10 4 4 0.0
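Note that stack drops NaN cells by default, which is exactly why only the non-NaN pairs survive. In older pandas versions this is exposed through the dropna keyword; applied to the original wide frame:

# keep the NaN cells as rows too (classic DataFrame.stack signature)
df.stack(dropna=False)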
Like this:
In [379]: df.stack().reset_index(name='val').rename(columns={'level_0':'i', 'level_1':'j'})
Out[379]:
i j val
0 0 0 0.0
1 1 1 0.0
2 1 2 0.0
3 1 4 4.0
4 2 1 2.0
5 2 2 0.0
6 2 4 5.0
7 3 3 0.0
8 4 1 0.0
9 4 2 3.0
10 4 4 0.0
I have the following dataframe:
0 0 0
1 0 0
1 1 0
1 1 1
1 1 1
0 0 0
0 1 0
0 1 0
0 0 0
How do you get a dataframe which looks like this?
0 0 0
4 0 0
4 3 0
4 3 2
4 3 2
0 0 0
0 2 0
0 2 0
0 0 0
Thank you for your help.
You may need a for loop here, with transform: use cumsum to create the group key, then assign the run counts back to the non-zero positions of your original df.
for x in df.columns:
    df.loc[df[x] != 0, x] = df[x].groupby(df[x].eq(0).cumsum()[df[x] != 0]).transform('count')
df
Out[229]:
1 2 3
0 0.0 0.0 0.0
1 4.0 0.0 0.0
2 4.0 3.0 0.0
3 4.0 3.0 2.0
4 4.0 3.0 2.0
5 0.0 0.0 0.0
6 0.0 2.0 0.0
7 0.0 2.0 0.0
8 0.0 0.0 0.0
Or, without a for loop:
# stack the frame and sort so that each column's values are contiguous
s = df.stack().sort_index(level=1)
# zeros delimit the runs: within each column, group by the cumsum of the
# zero-mask; each group holds one zero plus the following non-zero run,
# hence the sub(1) (this assumes every run is preceded by a zero)
s2 = s.groupby([s.index.get_level_values(1), s.eq(0).cumsum()]).transform('count').sub(1).unstack()
# keep the original zeros and fill the non-zero positions with the counts
df = df.mask(df != 0).combine_first(s2)
df
Out[255]:
1 2 3
0 0.0 0.0 0.0
1 4.0 0.0 0.0
2 4.0 3.0 0.0
3 4.0 3.0 2.0
4 4.0 3.0 2.0
5 0.0 0.0 0.0
6 0.0 2.0 0.0
7 0.0 2.0 0.0
8 0.0 0.0 0.0
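For reference, a self-contained restatement of the loop-based approach that groups only the non-zero cells; the column labels 1, 2, 3 are assumptions taken from the output above:

import pandas as pd

df = pd.DataFrame({1: [0, 1, 1, 1, 1, 0, 0, 0, 0],
                   2: [0, 0, 1, 1, 1, 0, 1, 1, 0],
                   3: [0, 0, 0, 1, 1, 0, 0, 0, 0]})

for x in df.columns:
    nonzero = df[x] != 0
    # zeros delimit the runs, so the cumsum of the zero-mask is constant
    # within each run of consecutive non-zero values
    key = df[x].eq(0).cumsum()
    df.loc[nonzero, x] = df[x][nonzero].groupby(key[nonzero]).transform('count')

print(df)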
My goal is to perform a groupby, then create rolling total stats, and then shift. I need the shift to restart at the first instance of each unique player. Right now it is shifting the entire dataframe once, rather than doing it for each grouped player.
Original Data -
player date won
0 A 2016-01-11 0
1 A 2016-02-01 0
2 A 2016-02-01 1
3 A 2016-02-01 1
4 A 2016-10-24 0
5 A 2016-10-31 0
6 A 2018-10-22 0
7 B 2016-10-24 0
8 B 2016-10-24 1
9 B 2017-11-13 0
Things I've tried -
Attempt #1:
temp = temp_master.groupby('player', sort=False)[count_fields].rolling(10, min_periods=1).sum().shift(1).reset_index(drop=True)
temp = temp.add_suffix('_total')
temp['won_total'].head(10)
0 NaN
1 0.0
2 0.0
3 1.0
4 2.0
5 2.0
6 2.0
7 2.0
8 0.0
9 1.0
Attempt #2:
temp = temp_master.groupby('player', sort=False)[count_fields].shift(1).rolling(10, min_periods=1).sum().reset_index(drop=True)
temp = temp.add_suffix('_total')
temp['won_total'].head(10)
0 NaN
1 0.0
2 0.0
3 1.0
4 2.0
5 2.0
6 2.0
7 2.0
8 2.0
9 3.0
Attempt #3:
temp = temp_master.groupby('player', sort=False)[count_fields].rolling(10, min_periods=1).sum().reset_index(drop=True)
temp = temp.add_suffix('_total')
temp = temp.shift(1)
temp['won_total'].head(10)
0 NaN
1 0.0
2 0.0
3 1.0
4 2.0
5 2.0
6 2.0
7 2.0
8 0.0
9 1.0
This is what I need the results to be -
0 NaN
1 0.0
2 0.0
3 1.0
4 2.0
5 2.0
6 2.0
7 NaN
8 0.0
9 1.0
Index #7 should equal NaN. It is the first instance of player B, and I want the shift to restart at the first instance of every new player so the stats are summarized per player.
Index #8 should equal 0.
Index #9 should equal 1.
Attempts #1 and #3 look close, but they don't assign the NaN value at the start of each new player. And #3 no longer groups by player at all, so I know that won't really work.
Also, this will be done on a fair amount of data (around 100K-300K rows), and the 'count_fields' list contains around 3K-4K columns that I am calculating. Just something to be aware of.
Any ideas on how to create running stats by player, shifting down within every player?
You need apply here: the two functions are not chained under the groupby object. sum is computed under the groupby, but shift is then applied to the result of sum, i.e. to the whole column rather than within each group.
temp = (temp_master.groupby('player', sort=False)['won']
                   .apply(lambda x: x.rolling(10, min_periods=1).sum().shift(1))
                   .reset_index(drop=True))
temp
0 NaN
1 0.0
2 0.0
3 1.0
4 2.0
5 2.0
6 2.0
7 NaN
8 0.0
9 1.0
Name: won, dtype: float64
Another option, if you don't want to use apply, is to layer a second groupby call and perform the shifting there:
(df.groupby('player', sort=False)
   .won.rolling(10, min_periods=1)
   .sum()
   .groupby(level=0)
   .shift()
   .reset_index(drop=True))
0 NaN
1 0.0
2 0.0
3 1.0
4 2.0
5 2.0
6 2.0
7 NaN
8 0.0
9 1.0
Name: won, dtype: float64
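Both patterns extend to several columns at once. A sketch for the question's count_fields (assumed to be the list of stat columns present in temp_master):

temp = (temp_master.groupby('player', sort=False)[count_fields]
                   .apply(lambda g: g.rolling(10, min_periods=1).sum().shift(1))
                   .reset_index(drop=True))
temp = temp.add_suffix('_total')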
I have two dataframes:
dayData
power_comparison final_average_delta_power calculated_power
1 0.0 0.0 0
2 0.0 0.0 0
3 0.0 0.0 0
4 0.0 0.0 0
5 0.0 0.0 0
7 0.0 0.0 0
and
historicPower
power
0 0.0
1 0.0
2 0.0
3 -1.0
4 0.0
5 1.0
7 0.0
I'm trying to reindex the historicPower dataframe to have the same shape as the dayData dataframe (so in this example it would look like):
power
1 0.0
2 0.0
3 -1.0
4 0.0
5 1.0
7 0.0
The dataframes in reality will be a lot larger, with different shapes.
I think you can use reindex if the index has no duplicates:
historicPower = historicPower.reindex(dayData.index)
print (historicPower)
power
1 0.0
2 0.0
3 -1.0
4 0.0
5 1.0
7 0.0
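If dayData's index contained labels that are missing from historicPower, reindex would insert NaN rows there; a fill value can be supplied instead:

# hypothetical: fill unmatched labels with 0 instead of NaN
historicPower = historicPower.reindex(dayData.index, fill_value=0)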