How to calculate last two week sum for each group ID - python

**I have an input data frame**

ID  Date        Amount
A   2021-08-03  100
A   2021-08-04  100
A   2021-08-06  20
A   2021-08-07  100
A   2021-08-09  300
A   2021-08-11  100
A   2021-08-12  100
A   2021-08-13  10
A   2021-08-23  10
A   2021-08-24  10
A   2021-08-26  10
A   2021-08-28  10
desired output data frame:

ID  Date        Amount  TwoWeekSum
A   2021-08-03  100     320
A   2021-08-04  100     320
A   2021-08-06  20      320
A   2021-08-07  100     320
A   2021-08-09  300     830
A   2021-08-11  100     830
A   2021-08-12  100     830
A   2021-08-13  10      830
A   2021-08-23  10      40
A   2021-08-24  10      40
A   2021-08-26  10      40
A   2021-08-28  10      40
I want to calculate the last-two-week total sum, i.e.
TwoWeekSum = current week total sum + previous week total sum. For example, if the current week is week 34, then TwoWeekSum is the week 34 total plus the week 33 total.
Please help me get this into the output data frame above so I can use it for further analysis.
Thank you, folks!

Use:
#convert values to datetimes
df['Date'] = pd.to_datetime(df['Date'])
#convert values to weeks
df['week'] = df['Date'].dt.isocalendar().week
#aggregate sum per ID and weeks, then add missing weeks and sum in rolling
f = lambda x: x.reindex(range(x.index.min(), x.index.max() + 1)).rolling(2, min_periods=1).sum()
df1 = df.groupby(['ID', 'week'])['Amount'].sum().reset_index(level=0).groupby('ID').apply(f)
print (df1)
Amount
ID week
A 31 320.0
32 830.0
33 510.0
34 40.0
#last add to original DataFrame per ID and weeks
df=df.join(df1.rename(columns={'Amount':'TwoWeekSum'}),on=['ID','week']).drop('week',axis=1)
print (df)
ID Date Amount TwoWeekSum
0 A 2021-08-03 100 320.0
1 A 2021-08-04 100 320.0
2 A 2021-08-06 20 320.0
3 A 2021-08-07 100 320.0
4 A 2021-08-09 300 830.0
5 A 2021-08-11 100 830.0
6 A 2021-08-12 100 830.0
7 A 2021-08-13 10 830.0
8 A 2021-08-23 10 40.0
9 A 2021-08-24 10 40.0
10 A 2021-08-26 10 40.0
11 A 2021-08-28 10 40.0

per = pd.period_range(df['Date'].min(), df['Date'].max(), freq='w')
mapper = df.groupby(df['Date'].astype('Period[W]'))['Amount'].sum().reindex(per, fill_value=0).rolling(2, 1).sum()
out = df['Date'].astype('Period[W]').map(mapper)
out
0 320.0
1 320.0
2 320.0
3 320.0
4 830.0
5 830.0
6 830.0
7 830.0
8 40.0
9 40.0
10 40.0
11 40.0
Name: Date, dtype: float64
Assign out to the TwoWeekSum column:
df.assign(TwoWeekSum=out)
ID Date Amount TwoWeekSum
0 A 2021-08-03 100 320.0
1 A 2021-08-04 100 320.0
2 A 2021-08-06 20 320.0
3 A 2021-08-07 100 320.0
4 A 2021-08-09 300 830.0
5 A 2021-08-11 100 830.0
6 A 2021-08-12 100 830.0
7 A 2021-08-13 10 830.0
8 A 2021-08-23 10 40.0
9 A 2021-08-24 10 40.0
10 A 2021-08-26 10 40.0
11 A 2021-08-28 10 40.0
Update
If there are multiple IDs, group by ID as well and merge:
per = pd.period_range(df['Date'].min(), df['Date'].max(), freq='w')
s = df['Date'].astype('Period[W]')
idx = pd.MultiIndex.from_product([df['ID'].unique(), per])
df1 = df.groupby(['ID', s]).sum().reindex(idx, fill_value=0).rolling(2, 1).agg(sum).reset_index().set_axis(['ID', 'period', 'TwoWeekSum'], axis=1)
df.assign(period=s).merge(df1, how='left').drop('period', axis=1)

Try using groupby to group the dataframe by week (use dt.isocalendar().week, since Series.dt.week is deprecated), then use transform('sum') to add up the values weekly and repeat them across each week's rows:
df['TwoWeekSum'] = df.groupby(df['Date'].dt.isocalendar().week)['Amount'].transform('sum')
And then:
print(df)
Gives:
   ID       Date  Amount  TwoWeekSum
0   A 2021-08-03     100         320
1   A 2021-08-04     100         320
2   A 2021-08-06      20         320
3   A 2021-08-07     100         320
4   A 2021-08-09     300         510
5   A 2021-08-11     100         510
6   A 2021-08-12     100         510
7   A 2021-08-13      10         510
8   A 2021-08-23      10          40
9   A 2021-08-24      10          40
10  A 2021-08-26      10          40
11  A 2021-08-28      10          40
Note that this broadcasts each week's own total (week 32 gives 510, not the desired 830), so a rolling sum over consecutive weeks, as in the other answers, is still needed to get a true two-week sum.
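The answers above all follow the same recipe: weekly totals per ID, zero-fill the missing weeks, take a rolling window of 2, then map the result back onto the rows. In case a self-contained version is useful, here is a compact sketch of that recipe (variable names are my own; the data is the sample from the question):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'ID': ['A'] * 12,
    'Date': pd.to_datetime(['2021-08-03', '2021-08-04', '2021-08-06', '2021-08-07',
                            '2021-08-09', '2021-08-11', '2021-08-12', '2021-08-13',
                            '2021-08-23', '2021-08-24', '2021-08-26', '2021-08-28']),
    'Amount': [100, 100, 20, 100, 300, 100, 100, 10, 10, 10, 10, 10],
})

week = df['Date'].dt.isocalendar().week.astype(int)
# weekly totals per ID
weekly = df.groupby(['ID', week])['Amount'].sum()
# per ID: fill gap weeks with 0, then sum current + previous week
rolled = (weekly.reset_index(level=0)
                .groupby('ID')['Amount']
                .apply(lambda s: s.reindex(range(s.index.min(), s.index.max() + 1),
                                           fill_value=0)
                                 .rolling(2, min_periods=1).sum()))
# map the (ID, week) totals back onto the original rows
df['TwoWeekSum'] = [rolled.loc[(i, w)] for i, w in zip(df['ID'], week)]
```

This reproduces the desired output column (320 for week 31, 830 for week 32, 40 for week 34).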

Related

I want a weekly sum for each ID group, where a week here runs from Sunday to Thursday.

**I have an input data frame**

ID  Date        Amount
A   2021-08-03  100
A   2021-08-04  100
A   2021-08-06  20
A   2021-08-07  100
A   2021-08-09  300
A   2021-08-11  100
A   2021-08-12  100
A   2021-08-13  10
A   2021-08-23  10
A   2021-08-24  10
A   2021-08-26  10
A   2021-08-28  10
desired output data frame:

ID  Date        Amount  OneWeekAmount  TwoWeekAmount  ThreeWeekAmount
A   2021-08-03  100     200            200            200
A   2021-08-04  100     200            200            200
A   2021-08-06  20      200            200            200
A   2021-08-07  100     200            200            200
A   2021-08-09  300     500            700            700
A   2021-08-11  100     500            700            700
A   2021-08-12  100     500            700            700
A   2021-08-13  10      500            700            700
A   2021-08-23  10      30             30             530
A   2021-08-24  10      30             30             530
A   2021-08-26  10      30             30             530
A   2021-08-28  10      30             30             530
Note: a week here is a full week from Sunday to Thursday.
I need the weekly sum in OneWeekAmount for each group, i.e. I summed the amounts for 2021-08-03 and 2021-08-04 because those days are Tuesday and Wednesday, and excluded 2021-08-06 and 2021-08-07 because those days are Friday and Saturday.
TwoWeekAmount = OneWeekAmount + previous week's amount
ThreeWeekAmount = present week + previous two weeks' sum
Please help me get this output.
Thank you in advance.
df.groupby(['ID', df['Date'].dt.isocalendar().week])['Amount'].sum()
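Grouping by the ISO week number uses Monday-start weeks and ignores the Sunday-to-Thursday rule in the question. A hedged sketch of one way to honour it: anchor weeks Sunday through Saturday with to_period('W-SAT'), zero out Friday/Saturday amounts so only Sunday-Thursday counts, then roll over consecutive weeks per ID (the helper name and the reindex-then-roll step are my own choices, not from the answer above):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'ID': ['A'] * 12,
    'Date': pd.to_datetime(['2021-08-03', '2021-08-04', '2021-08-06', '2021-08-07',
                            '2021-08-09', '2021-08-11', '2021-08-12', '2021-08-13',
                            '2021-08-23', '2021-08-24', '2021-08-26', '2021-08-28']),
    'Amount': [100, 100, 20, 100, 300, 100, 100, 10, 10, 10, 10, 10],
})

week = df['Date'].dt.to_period('W-SAT')        # weeks running Sunday..Saturday
# zero out Friday (dayofweek 4) and Saturday (5): only Sunday..Thursday counts
sun_thu = df['Amount'].where(~df['Date'].dt.dayofweek.isin([4, 5]), 0)
weekly = sun_thu.groupby([df['ID'], week]).sum()
full = pd.period_range(week.min(), week.max(), freq='W-SAT')

def rolling_weeks(n):
    # per-ID rolling sum over n consecutive weeks, empty weeks counted as 0
    r = (weekly.groupby(level=0)
               .apply(lambda s: s.droplevel(0).reindex(full, fill_value=0)
                                 .rolling(n, min_periods=1).sum()))
    return [r.loc[(i, w)] for i, w in zip(df['ID'], week)]

for n, col in [(1, 'OneWeekAmount'), (2, 'TwoWeekAmount'), (3, 'ThreeWeekAmount')]:
    df[col] = rolling_weeks(n)
```

On the sample data this reproduces the desired columns (200/500/30, 200/700/30, and 200/700/530).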

Pandas calculate percent growth over rows

I've created the following pandas dataframe and am trying to calculate the growth in % between the years given in Col2:
Col1  Col2  Jan   Feb   Mrz   Total
A     2019  100   200   300   600
A     2020  200   300   400   900
B     2019  10    20    30    60
B     2020  20    30    40    90
C     2019  1000  2000  3000  6000
C     2020  2000  3000  4000  9000
The table including the results should look like this (see the last 3 rows):

Col1  Col2             Jan   Feb   Mrz   Total
A     2019             100   200   300   600
A     2020             200   300   400   900
B     2019             10    20    30    60
B     2020             20    30    40    90
C     2019             1000  2000  3000  6000
C     2020             2000  3000  4000  9000
A     GrowthInPercent  100   50    33    50
B     GrowthInPercent  100   50    33    50
C     GrowthInPercent  100   50    33    50
Is there a way to calculate the GrowthInPercent values using a pandas function?
I do not get it ;-(
You can use pct_change with groupby (df.append is deprecated, so the frames are combined with pd.concat):
u = (df[['Col1']].join(df.drop(columns='Col2').groupby('Col1').pct_change()
                         .mul(100).round())
       .dropna().assign(Col2='Growth%'))
out = pd.concat([df, u], ignore_index=True)
print(out)
  Col1     Col2     Jan     Feb     Mrz   Total
0    A     2019   100.0   200.0   300.0   600.0
1    A     2020   200.0   300.0   400.0   900.0
2    B     2019    10.0    20.0    30.0    60.0
3    B     2020    20.0    30.0    40.0    90.0
4    C     2019  1000.0  2000.0  3000.0  6000.0
5    C     2020  2000.0  3000.0  4000.0  9000.0
6    A  Growth%   100.0    50.0    33.0    50.0
7    B  Growth%   100.0    50.0    33.0    50.0
8    C  Growth%   100.0    50.0    33.0    50.0
Note: this assumes the data is sorted by Col1 and Col2; if not, first sort with df = df.sort_values(by=['Col1','Col2']).
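The core of the trick is just grouped pct_change, which computes each row's relative change against the previous row within its group. A minimal sketch with the data trimmed to one value column (column names as in the question):

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'A', 'B', 'B'],
                   'Col2': [2019, 2020, 2019, 2020],
                   'Total': [600, 900, 60, 90]})

# growth of each row vs the previous row within its Col1 group, in percent
growth = (df.drop(columns='Col2')
            .groupby('Col1')
            .pct_change()
            .mul(100).round())
```

The 2019 rows come out as NaN (no previous row inside the group), which is why the full answer drops them with dropna() before appending the Growth% rows.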

Increase column by percentage every X rows

I have the following dataframe:
amount
01-01-2020 100
01-02-2020 100
01-03-2020 100
01-04-2020 100
01-05-2020 100
01-06-2020 100
01-07-2020 100
01-08-2020 100
01-09-2020 100
01-10-2020 100
01-11-2020 100
01-12-2020 100
I need to add a new column which starts with 100 and increases the value by 10% every 4 months, ie:
amount result
01-01-2020 100 100
01-02-2020 100 100
01-03-2020 100 100
01-04-2020 100 100
01-05-2020 100 110
01-06-2020 100 110
01-07-2020 100 110
01-08-2020 100 110
01-09-2020 100 121
01-10-2020 100 121
01-11-2020 100 121
01-12-2020 100 121
I think you need Grouper with a 4-month frequency plus GroupBy.ngroup to number the groups, then multiply the group number by 100, divide by 10 (the 10% step) and finally add 100:
df.index = pd.to_datetime(df.index, dayfirst=True)
df['result'] = df.groupby(pd.Grouper(freq='4MS')).ngroup().mul(100).div(10).add(100)
print (df)
amount result
2020-01-01 100 100.0
2020-02-01 100 100.0
2020-03-01 100 100.0
2020-04-01 100 100.0
2020-05-01 100 110.0
2020-06-01 100 110.0
2020-07-01 100 110.0
2020-08-01 100 110.0
2020-09-01 100 120.0
2020-10-01 100 120.0
2020-11-01 100 120.0
2020-12-01 100 120.0
If the datetimes are consecutive and each period is always exactly 4 rows, it is possible to use:
df['result'] = np.arange(len(df)) // 4 * 100 / 10 + 100
print (df)
amount result
2020-01-01 100 100.0
2020-02-01 100 100.0
2020-03-01 100 100.0
2020-04-01 100 100.0
2020-05-01 100 110.0
2020-06-01 100 110.0
2020-07-01 100 110.0
2020-08-01 100 110.0
2020-09-01 100 120.0
2020-10-01 100 120.0
2020-11-01 100 120.0
2020-12-01 100 120.0
Here is another way:
pct = .1
df['result'] = df['amount'] * (1 + pct) ** (np.arange(len(df))//4)
You forgot to subtract the boolean values for each period:
df['result'] = df['amount'] * (1 + pct) ** (np.arange(len(df))//4) - np.arange(len(df))//4
This way you will get the correct results.
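For what it's worth, the desired output in the question compounds (121 = 110 × 1.1), which the plain exponent form already produces; the ngroup approach above grows linearly (100, 110, 120) instead. A minimal numpy sketch of the compounded version (assuming, as above, exactly 4 rows per period):

```python
import numpy as np

pct = 0.10
# one growth step every 4 rows, compounded: (1 + pct) ** 0, ** 1, ** 2, ...
result = 100 * (1 + pct) ** (np.arange(12) // 4)
# blocks of four: 100, 110, 121 (up to floating-point rounding)
```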

Pandas dataframe Groupby and retrieve date range

Here is my dataframe that I am working on. There are two pay periods defined:
first 15 days and last 15 days for each month.
date employee_id hours_worked id job_group report_id
0 2016-11-14 2 7.50 385 B 43
1 2016-11-15 2 4.00 386 B 43
2 2016-11-30 2 4.00 387 B 43
3 2016-11-01 3 11.50 388 A 43
4 2016-11-15 3 6.00 389 A 43
5 2016-11-16 3 3.00 390 A 43
6 2016-11-30 3 6.00 391 A 43
I need to group by employee_id and job_group, but at the same time obtain the semi-monthly date range for each grouped row.
For example, the grouped results would look like the following:
Expected Output:
date employee_id hours_worked job_group report_id
1 2016-11-15 2 11.50 B 43
2 2016-11-30 2 4.00 B 43
4 2016-11-15 3 17.50 A 43
5 2016-11-16 3 9.00 A 43
Is this possible using pandas dataframe groupby?
Use Grouper with the SM (semi-month) frequency, and at the end add SemiMonthEnd:
df['date'] = pd.to_datetime(df['date'])
d = {'hours_worked':'sum','report_id':'first'}
df = (df.groupby(['employee_id', 'job_group',
                  pd.Grouper(freq='SM', key='date', closed='right')])
        .agg(d)
        .reset_index())
df['date'] = df['date'] + pd.offsets.SemiMonthEnd(1)
print (df)
employee_id job_group date hours_worked report_id
0 2 B 2016-11-15 11.5 43
1 2 B 2016-11-30 4.0 43
2 3 A 2016-11-15 17.5 43
3 3 A 2016-11-30 9.0 43
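To see what the final offset does: SemiMonthEnd(1) rolls a date forward to the next semi-month end, i.e. the 15th or the last day of the month. A minimal illustration:

```python
import pandas as pd

d1 = pd.Timestamp('2016-11-03') + pd.offsets.SemiMonthEnd(1)  # lands on the 15th
d2 = pd.Timestamp('2016-11-16') + pd.offsets.SemiMonthEnd(1)  # lands on month end
```

This is why adding it to the right-closed SM group labels snaps each bucket to 2016-11-15 / 2016-11-30.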
a. First (for each employee_id), group with multiple Grouper objects and take .sum() on the hours_worked column. Second, use DateOffset to land in the correct bi-weekly bucket. After these 2 steps, the date in the grouped DF is assigned based on 2 brackets (date ranges): if the day of month (from the date column) is <= 15, the day in date is set to 15, else it is set to the month-end day; this day is then used to assemble a new date.
b. (For each employee_id) get the .last() record for the job_group and report_id columns
c. merge a. and b. on the employee_id key
# a.
hours = (df.groupby([pd.Grouper(key='employee_id'),
                     pd.Grouper(key='date', freq='SM')])['hours_worked']
           .sum()
           .reset_index())
hours['date'] = pd.to_datetime(hours['date'])
hours['date'] = hours['date'] + pd.DateOffset(days=14)
# Assign day based on bracket (date range) 0-15 or bracket (date range) >15
from pandas.tseries.offsets import MonthEnd
hours['bracket'] = hours['date'] + MonthEnd(0)
hours['bracket'] = pd.to_datetime(hours['bracket']).dt.day
hours.loc[hours['date'].dt.day <= 15, 'bracket'] = 15
hours['date'] = pd.to_datetime(dict(year=hours['date'].dt.year,
                                    month=hours['date'].dt.month,
                                    day=hours['bracket']))
hours.drop('bracket', axis=1, inplace=True)
# b.
others = (df.groupby('employee_id')[['job_group', 'report_id']]
            .last()
            .reset_index())
# c.
merged = hours.merge(others, how='inner', on='employee_id')
Raw data for employee_id == 1 and employee_id == 3:
df.sort_values(by=['employee_id','date'], inplace=True)
print(df[df.employee_id.isin([1,3])])
index date employee_id hours_worked id job_group report_id
0 0 2016-11-14 1 7.5 481 A 43
10 10 2016-11-21 1 6.0 491 A 43
11 11 2016-11-22 1 5.0 492 A 43
15 15 2016-12-14 1 7.5 496 A 43
25 25 2016-12-21 1 6.0 506 A 43
26 26 2016-12-22 1 5.0 507 A 43
6 6 2016-11-02 3 6.0 487 A 43
4 4 2016-11-08 3 6.0 485 A 43
3 3 2016-11-09 3 11.5 484 A 43
5 5 2016-11-11 3 3.0 486 A 43
20 20 2016-11-12 3 3.0 501 A 43
21 21 2016-12-02 3 6.0 502 A 43
19 19 2016-12-08 3 6.0 500 A 43
18 18 2016-12-09 3 11.5 499 A 43
Output
print(merged)
employee_id date hours_worked job_group report_id
0 1 2016-11-15 7.5 A 43
1 1 2016-11-30 11.0 A 43
2 1 2016-12-15 7.5 A 43
3 1 2016-12-31 11.0 A 43
4 2 2016-11-15 31.0 B 43
5 2 2016-12-15 31.0 B 43
6 3 2016-11-15 29.5 A 43
7 3 2016-12-15 23.5 A 43
8 4 2015-03-15 5.0 B 43
9 4 2016-02-29 5.0 B 43
10 4 2016-11-15 5.0 B 43
11 4 2016-11-30 15.0 B 43
12 4 2016-12-15 5.0 B 43
13 4 2016-12-31 15.0 B 43

Shift element by 2 when there is a change in value in a column and then forward fill using pandas

I have a pandas dataframe with a date index and 100 columns of stock prices.
For each stock, whenever there is a price change, I want the change to take effect with a lag of 2 rows, and then forward-fill afterwards.
E.g. data of 2 columns (a subset of my data):
Stock A Stock B
1/1/2000 100 50
1/2/2000 100 50
1/3/2000 100 50
1/4/2000 350 50
1/5/2000 350 50
1/6/2000 350 50
1/7/2000 350 25
1/8/2000 350 25
1/9/2000 500 25
1/10/2000 500 25
1/11/2000 500 25
1/12/2000 500 150
1/1/2001 250 150
1/2/2001 250 150
1/3/2001 250 150
1/4/2001 250 150
1/5/2001 250 150
1/6/2001 250 150
1/7/2001 250 150
1/8/2001 75 150
1/9/2001 75 150
1/10/2001 75 25
1/11/2001 75 25
1/12/2001 75 25
1/1/2002 75 25
Now the output I desire is this:
Stock A Stock B
1/1/2000
1/2/2000
1/3/2000
1/4/2000
1/5/2000 100
1/6/2000 100
1/7/2000 100
1/8/2000 100 50
1/9/2000 100 50
1/10/2000 350 50
1/11/2000 350 50
1/12/2000 350 50
1/1/2001 350 25
1/2/2001 500 25
1/3/2001 500 25
1/4/2001 500 25
1/5/2001 500 25
1/6/2001 500 25
1/7/2001 500 25
1/8/2001 500 25
1/9/2001 250 25
1/10/2001 250 25
1/11/2001 250 150
1/12/2001 250 150
1/1/2002 250 150
Example for stock A:
When stock A changed the first time (100 to 350), the previous value (100) was assigned 2 days ahead (1/5/2000). When it changed again from 350 to 500, 350 was assigned 2 days ahead (1/10/2000), etc. Then a forward fill takes place.
Any help would be appreciated.
df.where(df.diff(-1).fillna(0).ne(0)).shift(2).ffill()
A B
2000-01-01 NaN NaN
2000-02-01 NaN NaN
2000-03-01 NaN NaN
2000-04-01 NaN NaN
2000-05-01 100.0 NaN
2000-06-01 100.0 NaN
2000-07-01 100.0 NaN
2000-08-01 100.0 50.0
2000-09-01 100.0 50.0
2000-10-01 350.0 50.0
2000-11-01 350.0 50.0
2000-12-01 350.0 50.0
2001-01-01 350.0 25.0
2001-02-01 500.0 25.0
2001-03-01 500.0 25.0
2001-04-01 500.0 25.0
2001-05-01 500.0 25.0
2001-06-01 500.0 25.0
2001-07-01 500.0 25.0
2001-08-01 500.0 25.0
2001-09-01 250.0 25.0
2001-10-01 250.0 25.0
2001-11-01 250.0 150.0
2001-12-01 250.0 150.0
2002-01-01 250.0 150.0
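To unpack the one-liner: diff(-1).ne(0) marks the last row before each change, where() blanks every other row with NaN, and shift(2).ffill() pushes the kept values two rows ahead and carries them forward. A minimal sketch on a short single-column series:

```python
import pandas as pd

s = pd.Series([100, 100, 100, 350, 350, 350, 500, 500])
# keep only the last value before each change, push it 2 rows ahead, forward-fill
out = s.where(s.diff(-1).fillna(0).ne(0)).shift(2).ffill()
```

The leading rows stay NaN until the first kept value has been shifted in, which matches the blank top rows in the desired output above.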
