Avoid iteration over rows for computation in pandas - python

County date available_wheat usage rate (%) consumption
A 1/2/2021 100.00 3
A 1/3/2021 3
A 1/4/2021 2
A 1/5/2021 5
A 1/6/2021 1
A 1/7/2021 2
A 1/8/2021 5
A 1/9/2021 6
A 1/10/2021 7
A 1/11/2021 8
A 1/12/2021 1
A 1/13/2021 2
Above is my dataframe. I need to fill in the available_wheat column: the available amount has to be reduced by the usage rate (%) on each row. I am able to do this using iterrows (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iterrows.html).
My dataframe is much bigger than what is displayed here, so the question is: is it possible to vectorize the calculation, using a lambda or something else?
Expected output:
County date available_wheat usage rate (%) consumption
A 1/2/2021 100.00 3 3.00
A 1/3/2021 97.00 3 2.91
A 1/4/2021 94.09 2 1.88
A 1/5/2021 92.21 5 4.61
A 1/6/2021 87.60 1 0.88
A 1/7/2021 86.72 2 1.73
A 1/8/2021 84.99 5 4.25
A 1/9/2021 80.74 6 4.84
A 1/10/2021 75.89 7 5.31
A 1/11/2021 70.58 8 5.65
A 1/12/2021 64.93 1 0.65
A 1/13/2021 64.29 2 1.29
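For reference, here is a minimal construction of the input frame that the snippets below assume (a hypothetical setup; only the first available_wheat value is known, the rest is what needs to be computed):
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'County': ['A'] * 12,
    'date': pd.date_range('2021-01-02', periods=12, freq='D'),
    'available_wheat': [100.0] + [np.nan] * 11,
    'usage rate (%)': [3, 3, 2, 5, 1, 2, 5, 6, 7, 8, 1, 2],
    'consumption': np.nan,
})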

You need to use a shifted cumprod of your usage rate:
# remaining fraction per day (shifted so the first row keeps its full stock), accumulated with cumprod
factor = df['usage rate (%)'].shift(fill_value=0).rsub(100).div(100).cumprod()
# propagate the initial stock and scale it by the cumulative factor
df['available_wheat'] = df['available_wheat'].ffill().mul(factor)
# each day's consumption is that day's usage rate applied to that day's stock
df['consumption'] = df['usage rate (%)'].mul(df['available_wheat']).div(100)
NB. If you have several counties and want to handle them independently, perform all of the above within a groupby (shown below). Add round(2) to get 2 decimal places.
output:
County date available_wheat usage rate (%) consumption
0 A 1/2/2021 100.000000 3 3.000000
1 A 1/3/2021 97.000000 3 2.910000
2 A 1/4/2021 94.090000 2 1.881800
3 A 1/5/2021 92.208200 5 4.610410
4 A 1/6/2021 87.597790 1 0.875978
5 A 1/7/2021 86.721812 2 1.734436
6 A 1/8/2021 84.987376 5 4.249369
7 A 1/9/2021 80.738007 6 4.844280
8 A 1/10/2021 75.893727 7 5.312561
9 A 1/11/2021 70.581166 8 5.646493
10 A 1/12/2021 64.934673 1 0.649347
11 A 1/13/2021 64.285326 2 1.285707
grouped per County
Same logic in a groupby:
# group_keys=False keeps the original row index so that factor aligns with df
factor = (df.groupby('County', group_keys=False)['usage rate (%)']
            .apply(lambda s: s.shift(fill_value=0).rsub(100).div(100).cumprod())
          )
df['available_wheat'] = df.groupby('County')['available_wheat'].ffill().mul(factor)
df['consumption'] = df['usage rate (%)'].mul(df['available_wheat']).div(100)
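If you want the two-decimal figures shown in the expected output, a final rounding pass is enough (optional sketch):
df[['available_wheat', 'consumption']] = df[['available_wheat', 'consumption']].round(2)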

import numpy as np
import pandas as pd

available_wheat2 = 100  # running stock, kept in a module-level variable

def function1(ss: pd.Series):
    global available_wheat2
    ss['available_wheat'] = available_wheat2
    ss.consumption = np.round(available_wheat2 * ss['usage rate (%)'] / 100, 2)
    available_wheat2 = available_wheat2 - ss['consumption']
    return ss

df1.apply(function1, axis=1)  # df1 is the input frame shown above
out:
County date available_wheat usage rate (%) consumption
0 A 1/2/2021 100.00 3 3.00
1 A 1/3/2021 97.00 3 2.91
2 A 1/4/2021 94.09 2 1.88
3 A 1/5/2021 92.21 5 4.61
4 A 1/6/2021 87.60 1 0.88
5 A 1/7/2021 86.72 2 1.73
6 A 1/8/2021 84.99 5 4.25
7 A 1/9/2021 80.74 6 4.84
8 A 1/10/2021 75.90 7 5.31
9 A 1/11/2021 70.59 8 5.65
10 A 1/12/2021 64.94 1 0.65
11 A 1/13/2021 64.29 2 1.29
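Note that the running total in this version lives in a module-level variable, so with several counties it would not reset between groups. A minimal per-county variant of the same stateful idea (a sketch, still iterating; fill_county is a hypothetical helper, and it assumes each county's first row carries its starting stock):
def fill_county(g: pd.DataFrame) -> pd.DataFrame:
    available = g['available_wheat'].iloc[0]  # starting stock for this county
    out = g.copy()
    for i in out.index:
        out.loc[i, 'available_wheat'] = round(available, 2)
        consumption = round(available * out.loc[i, 'usage rate (%)'] / 100, 2)
        out.loc[i, 'consumption'] = consumption
        available -= consumption
    return out

filled = df1.groupby('County', group_keys=False).apply(fill_county)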

Related

How to calculate last two week sum for each group ID

I have an input data frame:
ID        Date  Amount
A   2021-08-03     100
A   2021-08-04     100
A   2021-08-06      20
A   2021-08-07     100
A   2021-08-09     300
A   2021-08-11     100
A   2021-08-12     100
A   2021-08-13      10
A   2021-08-23      10
A   2021-08-24      10
A   2021-08-26      10
A   2021-08-28      10
Desired output data frame:
ID        Date  Amount  TwoWeekSum
A   2021-08-03     100         320
A   2021-08-04     100         320
A   2021-08-06      20         320
A   2021-08-07     100         320
A   2021-08-09     300         830
A   2021-08-11     100         830
A   2021-08-12     100         830
A   2021-08-13      10         830
A   2021-08-23      10          40
A   2021-08-24      10          40
A   2021-08-26      10          40
A   2021-08-28      10          40
I want to calculate the last-two-week total sum, i.e. TwoWeekSum = current week total sum + previous week total sum. For example, if the current week is 34, then TwoWeekSum is the week 34 total plus the week 33 total.
Please help me get this into the output data frame shown above so I can use it for further analysis.
Thank you, folks!
Use:
#convert values to datetimes
df['Date'] = pd.to_datetime(df['Date'])
#convert values to weeks
df['week'] = df['Date'].dt.isocalendar().week
#aggregate sum per ID and weeks, then add missing weeks and sum in rolling
f = lambda x: (x.reindex(range(x.index.min(), x.index.max() + 1))
                .rolling(2, min_periods=1).sum())
df1 = df.groupby(['ID', 'week'])['Amount'].sum().reset_index(level=0).groupby('ID').apply(f)
print (df1)
Amount
ID week
A 31 320.0
32 830.0
33 510.0
34 40.0
#last add to original DataFrame per ID and weeks
df=df.join(df1.rename(columns={'Amount':'TwoWeekSum'}),on=['ID','week']).drop('week',axis=1)
print (df)
ID Date Amount TwoWeekSum
0 A 2021-08-03 100 320.0
1 A 2021-08-04 100 320.0
2 A 2021-08-06 20 320.0
3 A 2021-08-07 100 320.0
4 A 2021-08-09 300 830.0
5 A 2021-08-11 100 830.0
6 A 2021-08-12 100 830.0
7 A 2021-08-13 10 830.0
8 A 2021-08-23 10 40.0
9 A 2021-08-24 10 40.0
10 A 2021-08-26 10 40.0
11 A 2021-08-28 10 40.0
per = pd.period_range(df['Date'].min(), df['Date'].max(), freq='W')
mapper = (df.groupby(df['Date'].astype('Period[W]'))['Amount'].sum()
            .reindex(per, fill_value=0)
            .rolling(2, min_periods=1).sum())
out = df['Date'].astype('Period[W]').map(mapper)
out
0 320.0
1 320.0
2 320.0
3 320.0
4 830.0
5 830.0
6 830.0
7 830.0
8 40.0
9 40.0
10 40.0
11 40.0
Name: Date, dtype: float64
Assign out to the TwoWeekSum column:
df.assign(TwoWeekSum=out)
ID Date Amount TwoWeekSum
0 A 2021-08-03 100 320.0
1 A 2021-08-04 100 320.0
2 A 2021-08-06 20 320.0
3 A 2021-08-07 100 320.0
4 A 2021-08-09 300 830.0
5 A 2021-08-11 100 830.0
6 A 2021-08-12 100 830.0
7 A 2021-08-13 10 830.0
8 A 2021-08-23 10 40.0
9 A 2021-08-24 10 40.0
10 A 2021-08-26 10 40.0
11 A 2021-08-28 10 40.0
Update
If there are several IDs, group by ID as well, then merge:
per = pd.period_range(df['Date'].min(), df['Date'].max(), freq='W')
s = df['Date'].astype('Period[W]')
idx = pd.MultiIndex.from_product([df['ID'].unique(), per])
# note: the 2-period window below can straddle the boundary between IDs; roll within each ID if that matters
df1 = (df.groupby(['ID', s])['Amount'].sum()
         .reindex(idx, fill_value=0)
         .rolling(2, min_periods=1).sum()
         .reset_index()
         .set_axis(['ID', 'period', 'TwoWeekSum'], axis=1))
df.assign(period=s).merge(df1, how='left').drop('period', axis=1)
Try using groupby to group the dataframe by week, then use transform('sum') to add up the values per week and broadcast the result back to the rows:
df['TwoWeekSum'] = df.groupby(df['Date'].dt.isocalendar().week)['Amount'].transform('sum')
And then:
print(df)
Gives:
ID Date Amount TwoWeekSum
0 A 2021-08-03 100 320
1 A 2021-08-04 100 320
2 A 2021-08-06 20 320
3 A 2021-08-07 100 320
4 A 2021-08-09 300 830
5 A 2021-08-11 100 830
6 A 2021-08-12 100 830
7 A 2021-08-13 10 830
8 A 2021-08-23 10 40
9 A 2021-08-24 10 40
10 A 2021-08-26 10 40
11 A 2021-08-28 10 40

How to split string of text by conjunction in python?

I have a dataframe which is a transcript of a 2 person conversation. In the df are words, their timestamps, and the label of the speaker. It looks like this.
word start stop speaker
0 but 2.72 2.85 2
1 that's 2.85 3.09 2
2 alright 3.09 3.47 2
3 we'll 8.43 8.69 1
4 have 8.69 8.97 1
5 to 8.97 9.07 1
6 okay 9.19 10.01 2
7 sure 10.02 11.01 2
8 what? 11.02 12.00 1
9 i 12.01 13.00 2
10 agree 13.01 14.00 2
11 but 14.01 15.00 2
12 i 15.01 16.00 2
13 disagree 16.01 17.00 2
14 thats 17.01 18.00 1
15 fine 18.01 19.00 1
16 however 19.01 20.00 1
17 you 20.01 21.00 1
18 are 21.01 22.00 1
19 like 22.01 23.00 1
20 this 23.01 24.00 1
21 and 24.01 25.00 1
I have code to combine all words per speaker turn into one utterance while preserving the timestamps and the speaker label. Using this code:
df.groupby([(df['speaker'] != df['speaker'].shift()).cumsum(), df['speaker']], as_index=False).agg({
    'word': ' '.join,
    'start': 'min',
    'stop': 'max'
})
I get this:
word start stop speaker
0 but that's alright 2.72 3.47 2
1 we'll have to 8.43 9.07 1
2 okay sure 9.19 11.01 2
3 what? 11.02 12.00 1
However, I want to split these combined utterances into sub-utterances based on the presence of a conjunctive adverb ('however', 'and', 'but', etc.). As a result, I want this:
word start stop speaker
0 but that's alright 2.72 3.47 2
1 we'll have to 8.43 9.07 1
2 okay sure 9.19 11.01 2
3 what? 11.02 12.00 1
4 I agree 12.01 14.00 2
5 but i disagree 14.01 17.00 2
6 thats fine 17.01 19.00 1
7 however you are 19.01 22.00 1
8 like this 22.01 24.00 1
9 and 24.01 25.00 1
Any recommendations on accomplishing this task would be appreciated.
You can add an OR (|) and check whether the word is in a specific list before grouping (e.g. with df['word'].isin(['however', 'and', 'but'])):
df.groupby([((df['speaker'] != df['speaker'].shift()) | df['word'].isin(['however', 'and', 'but'])).cumsum(), df['speaker']], as_index=False).agg({
    'word': ' '.join,
    'start': 'min',
    'stop': 'max'
})

percentile across dataframes, with missing values

I have several pandas dataframes (say a normal python list) which look like the following two. Note that there can be (in fact there are) some missing values at random dates. I need to compute percentiles of TMAX and/or TMAX_ANOM across the several dataframes, for each date, ignoring the missing values.
YYYY MM DD TMAX TMAX_ANOM
0 1980 7 1 13.0 2.333333
1 1980 7 2 14.3 2.566667
2 1980 7 3 15.6 2.800000
3 1980 7 4 16.9 3.033333
4 1980 8 1 18.2 3.266667
5 1980 8 2 19.5 3.500000
6 1980 8 3 20.8 3.733333
7 1980 8 4 22.1 3.966667
8 1981 7 1 10.0 -0.666667
9 1981 7 2 11.0 -0.733333
10 1981 7 3 12.0 -0.800000
11 1981 7 4 13.0 -0.866667
12 1981 8 1 14.0 -0.933333
13 1981 8 2 15.0 -1.000000
14 1981 8 3 16.0 -1.066667
15 1981 8 4 17.0 -1.133333
16 1982 7 1 9.0 -1.666667
17 1982 7 2 9.9 -1.833333
18 1982 7 3 10.8 -2.000000
19 1982 7 4 11.7 -2.166667
20 1982 8 1 12.6 -2.333333
21 1982 8 2 13.5 -2.500000
22 1982 8 3 14.4 -2.666667
23 1982 8 4 15.3 -2.833333
YYYY MM DD TMAX TMAX_ANOM
0 1980 7 1 14.0 3.666667
1 1980 7 2 15.4 4.033333
2 1980 7 3 16.8 4.400000
3 1980 7 4 18.2 4.766667
4 1980 8 1 19.6 5.133333
6 1980 8 3 22.4 5.866667
7 1980 8 4 23.8 6.233333
8 1981 7 1 10.0 -0.333333
9 1981 7 2 11.0 -0.366667
10 1981 7 3 12.0 -0.400000
11 1981 7 4 13.0 -0.433333
12 1981 8 1 14.0 -0.466667
13 1981 8 2 15.0 -0.500000
14 1981 8 3 16.0 -0.533333
15 1981 8 4 17.0 -0.566667
16 1982 7 1 7.0 -3.333333
17 1982 7 2 7.7 -3.666667
18 1982 7 3 8.4 -4.000000
19 1982 7 4 9.1 -4.333333
20 1982 8 1 9.8 -4.666667
21 1982 8 2 10.5 -5.000000
23 1982 8 4 11.9 -5.666667
So, just to be clear: in this example with just two dataframes (and taking the percentile to be the median to simplify the discussion), as output I need a dataframe with 24 rows, the same YYYY/MM/DD fields, and TMAX (and/or TMAX_ANOM) replaced as follows: for 1980/7/1 it must be the median of 13 and 14, for 1980/7/2 the median of 14.3 and 15.4, and so on. When there are missing values (for example 1980/8/2 in the second dataframe here), the median must be computed from just the remaining dataframes, so in this case the value would simply be 19.5.
I have not been able to find a clean way to accomplish this with either numpy or pandas. Any suggestions, or should I just resort to manual looping?
#dates as indexes
df1.index = pd.to_datetime(dict(year = df1.YYYY, month = df1.MM, day = df1.DD))
df2.index = pd.to_datetime(dict(year = df2.YYYY, month = df2.MM, day = df2.DD))
#binding useful columns
new_df = df1[['TMAX','TMAX_ANOM']].join(df2[['TMAX','TMAX_ANOM']], lsuffix = '_df1', rsuffix = '_df2')
#calculating quantile
new_df['TMAX_quantile'] = new_df[['TMAX_df1', 'TMAX_df2']].quantile(0.5, axis = 1)
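If there are more than two dataframes in the list (call it dfs), a variant that avoids pairwise joins is to stack them and group by date; a sketch, assuming every frame has the same YYYY/MM/DD/TMAX/TMAX_ANOM columns:
import pandas as pd

# dfs is the Python list of dataframes described in the question
stacked = pd.concat(dfs, ignore_index=True)
out = (stacked
       .groupby(['YYYY', 'MM', 'DD'])[['TMAX', 'TMAX_ANOM']]
       .quantile(0.5)   # dates missing from some frames simply contribute fewer values
       .reset_index())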

Pandas Dataframe: shift/merge multiple rows sharing the same column values into one row

Sorry for any possible confusion with the title. I will describe my question better with the following code and pictures.
I have a dataframe with multiple columns. The first two columns, by which the rows are sorted, are 'Route' and 'ID' (sorry about the formatting; all the rows here have a 'Route' value of 100 and 'ID' from 1 to 3).
df1.head(9)
Route ID Year Vol Truck_Vol Truck_%
0 100 1 2017.0 7016 635.0 9.1
1 100 1 2014.0 6835 NaN NaN
2 100 1 2011.0 5959 352.0 5.9
3 100 2 2018.0 15828 NaN NaN
4 100 2 2015.0 13114 2964.0 22.6
5 100 2 2009.0 11844 1280.0 10.8
6 100 3 2016.0 15434 NaN NaN
7 100 3 2013.0 18699 2015.0 10.8
8 100 3 2010.0 15903 NaN NaN
What I want to have is
Route ID Year Vol1 Truck_Vol1 Truck_%1 Year2 Vol2 Truck_Vol2 Truck_%2 Year3 Vol3 Truck_Vol3 Truck_%3
0 100 1 2017 7016 635.0 9.1 2014 6835 NaN NaN 2011 5959 352.0 5.9
1 100 2 2018 15828 NaN NaN 2015 13114 2964.0 22.6 2009 11844 1280.0 10.8
2 100 3 2016 15434 NaN NaN 2013 18699 2015.0 10.8 2010 15903 NaN NaN
Again, sorry for the messy formatting. Let me try a simplified version.
Input:
Route ID Year Vol T_%
0 100 1 2017 100 1.0
1 100 1 2014 200 NaN
2 100 1 2011 300 2.0
3 100 2 2018 400 NaN
4 100 2 2015 500 3.0
5 100 2 2009 600 4.0
Desired Output:
Route ID Year Vol T_% Year.1 Vol.1 T_%.1 Year.2 Vol.2 T_%.2
0 100 1 2017 100 1.0 2014 200 NaN 2011 300 2
1 100 2 2018 400 NaN 2015 500 3.0 2009 600 4
So basically just move the cells shown in the picture
I am stumped here. The names for the newly generated columns don't matter.
For this current dataframe, I have three rows per 'group' like shown in the code. It will be great if the answer can accommodate any number of rows each group.
Thanks for your time.
with groupby + cumcount + set_index + unstack
df1 = (df.assign(cid=df.groupby(['Route', 'ID']).cumcount())
         .set_index(['Route', 'ID', 'cid'])
         .unstack(-1)
         .sort_index(axis=1, level=1))
df1.columns = [f'{x}{y}' for x, y in df1.columns]
df1 = df1.reset_index()
Output df1:
Route ID T_%0 Vol0 Year0 T_%1 Vol1 Year1 T_%2 Vol2 Year2
0 100 1 1.0 100 2017 NaN 200 2014 2.0 300 2011
1 100 2 NaN 400 2018 3.0 500 2015 4.0 600 2009
melt + pivot_table
v = df.melt(id_vars=['Route', 'ID'])
v['variable'] += v.groupby(['Route', 'ID', 'variable']).cumcount().astype(str)
res = v.pivot_table(index=['Route', 'ID'], columns='variable', values='value')
variable T_% 0 T_% 1 T_% 2 Vol 0 Vol 1 Vol 2 Year 0 Year 1 Year 2
Route ID
100 1 1.0 NaN 2.0 100.0 200.0 300.0 2017.0 2014.0 2011.0
2 NaN 3.0 4.0 400.0 500.0 600.0 2018.0 2015.0 2009.0
If you want to sort these:
c = res.columns.str.extract(r'(\d+)')[0].values.astype(int)
res.iloc[:,np.argsort(c)]
variable T_%0 Vol0 Year0 T_%1 Vol1 Year1 T_%2 Vol2 Year2
Route ID
100 1 1.0 100.0 2017.0 NaN 200.0 2014.0 2.0 300.0 2011.0
2 NaN 400.0 2018.0 3.0 500.0 2015.0 4.0 600.0 2009.0
You asked about why I used cumcount. To explain, here is what v looks like from above:
Route ID variable value
0 100 1 Year 2017.0
1 100 1 Year 2014.0
2 100 1 Year 2011.0
3 100 2 Year 2018.0
4 100 2 Year 2015.0
5 100 2 Year 2009.0
6 100 1 Vol 100.0
7 100 1 Vol 200.0
8 100 1 Vol 300.0
9 100 2 Vol 400.0
10 100 2 Vol 500.0
11 100 2 Vol 600.0
12 100 1 T_% 1.0
13 100 1 T_% NaN
14 100 1 T_% 2.0
15 100 2 T_% NaN
16 100 2 T_% 3.0
17 100 2 T_% 4.0
If I used pivot_table on this DataFrame, you would end up with something like this:
variable T_% Vol Year
Route ID
100 1 1.5 200.0 2014.0
2 3.5 500.0 2014.0
Obviously you are losing data here. cumcount is the solution, as it turns the variable series into this:
Route ID variable value
0 100 1 Year0 2017.0
1 100 1 Year1 2014.0
2 100 1 Year2 2011.0
3 100 2 Year0 2018.0
4 100 2 Year1 2015.0
5 100 2 Year2 2009.0
6 100 1 Vol0 100.0
7 100 1 Vol1 200.0
8 100 1 Vol2 300.0
9 100 2 Vol0 400.0
10 100 2 Vol1 500.0
11 100 2 Vol2 600.0
12 100 1 T_%0 1.0
13 100 1 T_%1 NaN
14 100 1 T_%2 2.0
15 100 2 T_%0 NaN
16 100 2 T_%1 3.0
17 100 2 T_%2 4.0
Now there is a count of repeated elements per unique Route and ID, so every melted value keeps its own column after pivoting.
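For a concrete picture, cumcount just numbers the rows within each (Route, ID) group, which is what makes the melted variable names unique; against the simplified input from the question (a small illustration):
df.groupby(['Route', 'ID']).cumcount()
0    0
1    1
2    2
3    0
4    1
5    2
dtype: int64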

Improve Performance of Apply Method

I would like to group my df by the variable "cod_id" and then apply this function:
[df.loc[df['dt_op'].between(d, d + pd.Timedelta(days = 7)), 'quantity'].sum() \
for d in df['dt_op']]
Moving from this df:
print(df)
dt_op quantity cod_id
20/01/18 1 613
21/01/18 8 611
21/01/18 1 613
...
To this one:
print(final_df)
n = 7
dt_op quantity product_code Final_Quantity
20/01/18 1 613 2
21/01/18 8 611 8
25/01/18 1 613 1
...
I tried with:
def lookforward(x):
    L = [x.loc[x['dt_op'].between(row.dt_op, row.dt_op + pd.Timedelta(days=7)),
               'quantity'].sum()
         for row in x.itertuples(index=False)]
    return pd.Series(L, index=x.index)
s = df.groupby('cod_id').apply(lookforward)
s.index = s.index.droplevel(0)
df['Final_Quantity'] = s
print(df)
dt_op quantity cod_id Final_Quantity
0 2018-01-20 1 613 2
1 2018-01-21 8 611 8
2 2018-01-21 1 613 1
But it is not an efficient solution, since it is computationally slow.
How can I improve its performance? I would accept even a completely different function/approach that leads to the same result.
EDIT:
Here is a subset of the original dataset, with just one product (cod_id == 2), on which I ran the code provided by "w-m":
print(df)
cod_id dt_op quantita final_sum
0 2 2017-01-03 1 54.0
1 2 2017-01-04 1 53.0
2 2 2017-01-13 1 52.0
3 2 2017-01-23 2 51.0
4 2 2017-01-26 1 49.0
5 2 2017-02-03 1 48.0
6 2 2017-02-27 1 47.0
7 2 2017-03-05 1 46.0
8 2 2017-03-15 1 45.0
9 2 2017-03-23 1 44.0
10 2 2017-03-27 2 43.0
11 2 2017-03-31 3 41.0
12 2 2017-04-04 1 38.0
13 2 2017-04-05 1 37.0
14 2 2017-04-15 2 36.0
15 2 2017-04-27 2 34.0
16 2 2017-04-30 1 32.0
17 2 2017-05-16 1 31.0
18 2 2017-05-18 1 30.0
19 2 2017-05-19 1 29.0
20 2 2017-06-03 1 28.0
21 2 2017-06-04 1 27.0
22 2 2017-06-07 1 26.0
23 2 2017-06-13 2 25.0
24 2 2017-06-14 1 23.0
25 2 2017-06-20 1 22.0
26 2 2017-06-22 2 21.0
27 2 2017-06-28 1 19.0
28 2 2017-06-30 1 18.0
29 2 2017-07-03 1 17.0
30 2 2017-07-06 2 16.0
31 2 2017-07-07 1 14.0
32 2 2017-07-13 1 13.0
33 2 2017-07-20 1 12.0
34 2 2017-07-28 1 11.0
35 2 2017-08-06 1 10.0
36 2 2017-08-07 1 9.0
37 2 2017-08-24 1 8.0
38 2 2017-09-06 1 7.0
39 2 2017-09-16 2 6.0
40 2 2017-09-20 1 4.0
41 2 2017-10-07 1 3.0
42 2 2017-11-04 1 2.0
43 2 2017-12-07 1 1.0
Edit 2018-10-17: this approach doesn't work, because forward-looking rolling windows on sparse time series are not currently supported by pandas; see the comments.
Using for loops can be a performance killer when doing pandas operations.
The for loop around the rows plus their timedelta of 7 days can be replaced with a .rolling("7D"). To get a forward-rolling time delta (current date + 7 days), we reverse the df by date, as shown here.
Then no custom function is required anymore, and you can just take .quantity.sum() from the groupby.
quant_sum = df.sort_values("dt_op", ascending=False).groupby("cod_id") \
              .rolling("7D", on="dt_op").quantity.sum()
cod_id dt_op
611 2018-01-21 8.0
613 2018-01-21 1.0
2018-01-20 2.0
Name: quantity, dtype: float64
result = df.set_index(["cod_id", "dt_op"])
result["final_sum"] = quant_sum
result.reset_index()
cod_id dt_op quantity final_sum
0 613 2018-01-20 1 2.0
1 611 2018-01-21 8 8.0
2 613 2018-01-21 1 1.0
Implementing the exact behavior from the question is difficult due to two shortcomings in pandas: groupby/rolling/transform is not available, and forward-looking rolling over sparse dates is not implemented (see the other answer for more details).
This answer attempts to work around both by resampling the data, filling in all days, and then joining the quant_sums back with the original data.
# Create a temporary df with all in-between days filled in with zeros
filled = df.set_index("dt_op").groupby("cod_id") \
           .resample("D").asfreq().fillna(0) \
           .quantity.to_frame()

# Reverse and sum
filled["quant_sum"] = filled.reset_index().set_index("dt_op") \
                            .iloc[::-1] \
                            .groupby("cod_id") \
                            .rolling(7, min_periods=1) \
                            .quantity.sum().astype(int)

# Join with the original `df`, dropping the filled days
result = df.set_index(["cod_id", "dt_op"]).join(filled.quant_sum).reset_index()
