Delete rows with an integer in the time index - python

This is part of a DataFrame. As you can see, there are some integers in the time column (rent_time) where a timestamp should be. How can I delete the records that have an integer instead of a timestamp?
rent_time rent_price_per_square_meter
0 2016-11-28 09:01:58 0.400000
1 2016-11-28 09:02:35 0.400000
2 2016-11-28 09:02:43 0.400000
3 2016-11-28 09:03:21 0.400000
4 2016-11-28 09:03:21 0.400000
5 2016-11-28 09:03:34 0.400000
6 2016-11-28 09:03:34 0.400000
7 2017-06-17 02:49:33 0.933333
8 2017-03-19 01:30:03 0.490196
9 2017-03-10 06:39:03 11.111111
10 2017-03-09 14:40:03 16.666667
11 908797 11.000000
12 2017-06-08 03:27:52 22.000000
13 2017-06-30 03:03:11 22.000000
14 2017-02-20 11:04:48 12.000000
15 2017-03-05 13:53:39 6.842105
16 2017-03-06 14:00:01 6.842105
17 2017-03-15 02:38:54 20.000000
18 2017-03-15 02:19:07 13.043478
19 2017-02-24 15:10:00 25.000000
20 2017-06-26 02:17:31 13.043478
21 82368 11.111111
22 2017-06-30 07:53:55 4.109589
23 2017-07-17 02:42:43 20.000000
24 2017-06-30 07:38:00 5.254237
25 2017-06-30 07:49:00 4.920635
26 2017-06-30 05:26:26 4.189189

You can use boolean indexing: to_datetime with the parameter errors='coerce' returns NaT for values that are not datetimes, and notnull then keeps only the rows with valid datetimes:
df1 = df[pd.to_datetime(df['rent_time'], errors='coerce').notnull()]
print (df1)
rent_time rent_price_per_square_meter
0 2016-11-28 09:01:58 0.400000
1 2016-11-28 09:02:35 0.400000
2 2016-11-28 09:02:43 0.400000
3 2016-11-28 09:03:21 0.400000
4 2016-11-28 09:03:21 0.400000
5 2016-11-28 09:03:34 0.400000
6 2016-11-28 09:03:34 0.400000
7 2017-06-17 02:49:33 0.933333
8 2017-03-19 01:30:03 0.490196
9 2017-03-10 06:39:03 11.111111
10 2017-03-09 14:40:03 16.666667
12 2017-06-08 03:27:52 22.000000
13 2017-06-30 03:03:11 22.000000
14 2017-02-20 11:04:48 12.000000
15 2017-03-05 13:53:39 6.842105
16 2017-03-06 14:00:01 6.842105
17 2017-03-15 02:38:54 20.000000
18 2017-03-15 02:19:07 13.043478
19 2017-02-24 15:10:00 25.000000
20 2017-06-26 02:17:31 13.043478
22 2017-06-30 07:53:55 4.109589
23 2017-07-17 02:42:43 20.000000
24 2017-06-30 07:38:00 5.254237
25 2017-06-30 07:49:00 4.920635
26 2017-06-30 05:26:26 4.189189
EDIT:
If you need a DatetimeIndex for further data processing:
df['rent_time'] = pd.to_datetime(df['rent_time'], errors='coerce')
df = df.dropna(subset=['rent_time']).set_index('rent_time')
print (df)
rent_price_per_square_meter
rent_time
2016-11-28 09:01:58 0.400000
2016-11-28 09:02:35 0.400000
2016-11-28 09:02:43 0.400000
2016-11-28 09:03:21 0.400000
2016-11-28 09:03:21 0.400000
2016-11-28 09:03:34 0.400000
2016-11-28 09:03:34 0.400000
2017-06-17 02:49:33 0.933333
2017-03-19 01:30:03 0.490196
2017-03-10 06:39:03 11.111111
2017-03-09 14:40:03 16.666667
2017-06-08 03:27:52 22.000000
2017-06-30 03:03:11 22.000000
2017-02-20 11:04:48 12.000000
2017-03-05 13:53:39 6.842105
2017-03-06 14:00:01 6.842105
2017-03-15 02:38:54 20.000000
2017-03-15 02:19:07 13.043478
2017-02-24 15:10:00 25.000000
2017-06-26 02:17:31 13.043478
2017-06-30 07:53:55 4.109589
2017-07-17 02:42:43 20.000000
2017-06-30 07:38:00 5.254237
2017-06-30 07:49:00 4.920635
2017-06-30 05:26:26 4.189189
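As a small sketch of the two steps above, the following uses a hypothetical four-row frame whose rent_time column holds strings. The explicit format argument is an assumption added here so that stray numbers are rejected no matter how the installed pandas version guesses formats; the original answer relies on errors='coerce' alone.

```python
import pandas as pd

# Hypothetical sample: two valid timestamps and two stray numbers, all as strings.
df = pd.DataFrame({
    "rent_time": ["2016-11-28 09:01:58", "908797", "2017-06-08 03:27:52", "82368"],
    "rent_price_per_square_meter": [0.4, 11.0, 22.0, 11.111111],
})

# Values that do not match the format become NaT instead of raising.
parsed = pd.to_datetime(df["rent_time"], format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Step 1: keep only the rows whose rent_time parsed successfully.
df1 = df[parsed.notnull()]

# Step 2: store the parsed values, drop the NaT rows and use the column as the index.
df2 = df.assign(rent_time=parsed).dropna(subset=["rent_time"]).set_index("rent_time")

print(df1)
print(df2)
```

Here df1 keeps the original string column, while df2 has a DatetimeIndex that can be used for time-based slicing or resampling.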

Related

Why can't I filter a Python index using a defined list?

I'm taking a pandas and NumPy course and ran into a problem. The course instructor's solution worked, but when I followed every step it didn't work for me.
Pardon the length; I included the actual datasets for clarity.
I assigned the following items in a list to the variable "dates" as instructed:
dates = [
"2016-12-22",
"2017-05-03",
"2017-01-06",
"2017-03-05",
"2017-02-12",
"2017-03-21",
"2017-04-14",
"2017-04-15",
]
Then I have a Series named oil_series with the following data.
The index name is date:
date
2016-12-20
2016-12-21
2016-12-22
2016-12-23
2016-12-27
2016-12-28
2016-12-29
2016-12-30
2017-01-03
2017-01-04
2017-01-05
2017-01-06
2017-01-09
2017-01-10
2017-01-11
2017-01-12
2017-01-13
2017-01-17
2017-01-18
2017-01-19
2017-01-20
2017-01-23
2017-01-24
2017-01-25
2017-01-26
2017-01-27
2017-01-30
2017-01-31
2017-02-01
2017-02-02
2017-02-03
2017-02-06
2017-02-07
2017-02-08
2017-02-09
2017-02-10
2017-02-13
2017-02-14
2017-02-15
2017-02-16
2017-02-17
2017-02-21
2017-02-22
2017-02-23
2017-02-24
2017-02-27
2017-02-28
2017-03-01
2017-03-02
2017-03-03
2017-03-06
2017-03-07
2017-03-08
2017-03-09
2017-03-10
2017-03-13
2017-03-14
2017-03-15
2017-03-16
2017-03-17
2017-03-20
2017-03-21
2017-03-22
2017-03-23
2017-03-24
2017-03-27
2017-03-28
2017-03-29
2017-03-30
2017-03-31
2017-04-03
2017-04-04
2017-04-05
2017-04-06
2017-04-07
2017-04-10
2017-04-11
2017-04-12
2017-04-13
2017-04-17
2017-04-18
2017-04-19
2017-04-20
2017-04-21
2017-04-24
2017-04-25
2017-04-26
2017-04-27
2017-04-28
2017-05-01
2017-05-02
2017-05-03
2017-05-04
2017-05-05
2017-05-08
2017-05-09
2017-05-10
2017-05-11
2017-05-12
2017-05-15
Values
52.22
51.44
51.98
52.01
52.82
54.01
53.8
53.75
52.36
53.26
53.77
53.98
51.95
50.82
52.19
53.01
52.36
52.45
51.12
51.39
52.33
52.77
52.38
52.14
53.24
53.18
52.63
52.75
53.9
53.55
53.81
53.01
52.19
52.37
52.99
53.84
52.96
53.21
53.11
53.41
53.41
54.02
53.61
54.48
53.99
54.04
54
53.82
52.63
53.33
53.19
52.68
49.83
48.75
48.05
47.95
47.24
48.34
48.3
48.34
47.79
47.02
47.29
47
47.3
47.02
48.36
49.47
50.3
50.54
50.25
50.99
51.14
51.69
52.25
53.06
53.38
53.12
53.19
52.62
52.46
50.49
50.26
49.64
48.9
49.22
49.22
48.96
49.31
48.83
47.65
47.79
45.55
46.23
46.46
45.84
47.28
47.81
47.83
48.86
So when I write the following code to filter oil_series, whose index holds dates, against the "dates" list I created:
mask = (oil_series.index.isin(dates)) & (oil_series <= 50)
oil_series.loc[mask]
The following error occurs:
Error from running the code
Please help me understand the problem.
According to your comment, you have a MultiIndex with only one level. Convert it to a regular Index and your code should work:
oil_series.index = oil_series.index.get_level_values('date')
# Your code here.
mask = (oil_series.index.isin(dates)) & (oil_series <= 50)
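To illustrate the fix, here is a minimal sketch that builds a one-level MultiIndex by hand. The three sample rows are taken from the data above, and storing the dates as strings is an assumption, since the question does not show the index dtype.

```python
import pandas as pd

dates = ["2016-12-22", "2017-05-03", "2017-01-06", "2017-03-21"]

# A Series whose index is a MultiIndex with a single level named "date".
idx = pd.MultiIndex.from_arrays(
    [["2016-12-22", "2016-12-23", "2017-03-21"]], names=["date"]
)
oil_series = pd.Series([51.98, 52.01, 47.02], index=idx)

# Flatten the one-level MultiIndex into a regular Index of labels.
oil_series.index = oil_series.index.get_level_values("date")

# Now isin compares plain labels against the list, as the course code expects.
mask = (oil_series <= 50) & oil_series.index.isin(dates)
result = oil_series.loc[mask]
print(result)
```

Only 2017-03-21 survives: 2016-12-22 is in the list but its price is above 50, and 2016-12-23 is not in the list at all.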

Add to values of a DataFrame using coordinates

I have a dataframe a:
Out[68]:
p0_4 p5_7 p8_9 p10_14 p15 p16_17 p18_19 p20_24 p25_29 \
0 1360.0 921.0 676.0 1839.0 336.0 668.0 622.0 1190.0 1399.0
1 308.0 197.0 187.0 411.0 67.0 153.0 172.0 336.0 385.0
2 76.0 59.0 40.0 72.0 16.0 36.0 20.0 56.0 82.0
3 765.0 608.0 409.0 1077.0 220.0 359.0 342.0 873.0 911.0
4 1304.0 906.0 660.0 1921.0 375.0 725.0 645.0 1362.0 1474.0
5 195.0 135.0 78.0 262.0 44.0 97.0 100.0 265.0 229.0
6 1036.0 965.0 701.0 1802.0 335.0 701.0 662.0 1321.0 1102.0
7 5072.0 3798.0 2865.0 7334.0 1399.0 2732.0 2603.0 4976.0 4575.0
8 1360.0 962.0 722.0 1758.0 357.0 710.0 713.0 1761.0 1660.0
9 743.0 508.0 369.0 1118.0 286.0 615.0 429.0 738.0 885.0
10 1459.0 1015.0 679.0 1732.0 337.0 746.0 677.0 1493.0 1546.0
11 828.0 519.0 415.0 1057.0 190.0 439.0 379.0 788.0 1024.0
12 1042.0 690.0 503.0 1204.0 219.0 451.0 465.0 1193.0 1406.0
p30_44 p45_59 p60_64 p65_74 p75_84 p85_89 p90plus
0 4776.0 8315.0 2736.0 5463.0 2819.0 738.0 451.0
1 1004.0 2456.0 988.0 2007.0 1139.0 313.0 153.0
2 291.0 529.0 187.0 332.0 108.0 31.0 10.0
3 2807.0 5505.0 2060.0 4104.0 2129.0 516.0 252.0
4 4524.0 9406.0 3034.0 6003.0 3366.0 840.0 471.0
5 806.0 1490.0 606.0 1288.0 664.0 185.0 108.0
6 4127.0 8311.0 2911.0 6111.0 3525.0 1029.0 707.0
7 16917.0 27547.0 8145.0 15950.0 9510.0 2696.0 1714.0
8 5692.0 9380.0 3288.0 6458.0 3830.0 1050.0 577.0
9 2749.0 5696.0 2014.0 4165.0 2352.0 603.0 288.0
10 4676.0 7654.0 2502.0 5077.0 3004.0 754.0 461.0
11 2799.0 4880.0 1875.0 3951.0 2294.0 551.0 361.0
12 3288.0 5661.0 1974.0 4007.0 2343.0 623.0 303.0
and a series d:
Out[70]:
2 p45_59
10 p45_59
11 p45_59
Is there a simple way to add 1 to the numbers in a that have the same index and column labels as in d?
I have tried:
a[d] +=1
However, this adds 1 to every value in the column, not just the values with indices 2, 10 and 11.
Thank you in advance.
You might want to try this.
a.loc[list(d.index), list(d.values)] += 1
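Note that .loc with two lists selects every combination of the given rows and columns, so the one-liner above matches the request only when all entries of d name the same column, as they do here. If d could point at different columns, a per-cell update is one alternative. The sketch below is not from the original answer and uses a small hypothetical frame:

```python
import pandas as pd

# Hypothetical data: two columns, three rows.
a = pd.DataFrame({"p30_44": [1.0, 2.0, 3.0], "p45_59": [10.0, 20.0, 30.0]})

# Each entry of d is one (row label -> column label) coordinate.
d = pd.Series(["p45_59", "p30_44"], index=[0, 2])

# Update exactly the listed cells, one coordinate at a time.
for row_label, col_label in d.items():
    a.at[row_label, col_label] += 1

print(a)
```

Only the cells (0, p45_59) and (2, p30_44) change; the cross-product form would also have touched (0, p30_44) and (2, p45_59).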

Taking the mean value of N last days, including NaNs

I have this data frame:
ID Date X 123_Var 456_Var 789_Var
A 16-07-19 3 777.0 250.0 810.0
A 17-07-19 9 637.0 121.0 529.0
A 20-07-19 2 295.0 272.0 490.0
A 21-07-19 3 778.0 600.0 544.0
A 22-07-19 6 741.0 792.0 907.0
A 25-07-19 6 435.0 416.0 820.0
A 26-07-19 8 590.0 455.0 342.0
A 27-07-19 6 763.0 476.0 753.0
A 02-08-19 6 717.0 211.0 454.0
A 03-08-19 6 152.0 442.0 475.0
A 05-08-19 6 564.0 340.0 302.0
A 07-08-19 6 105.0 929.0 633.0
A 08-08-19 6 948.0 366.0 586.0
B 07-08-19 4 509.0 690.0 406.0
B 08-08-19 2 413.0 725.0 414.0
B 12-08-19 2 170.0 702.0 912.0
B 13-08-19 3 851.0 616.0 477.0
B 14-08-19 9 475.0 447.0 555.0
B 15-08-19 1 412.0 403.0 708.0
B 17-08-19 2 299.0 537.0 321.0
B 18-08-19 4 310.0 119.0 125.0
C 16-07-19 3 777.0 250.0 810.0
C 17-07-19 9 637.0 121.0 529.0
C 20-07-19 2 NaN NaN NaN
C 21-07-19 3 NaN NaN NaN
C 22-07-19 6 741.0 792.0 907.0
C 25-07-19 6 NaN NaN NaN
C 26-07-19 8 590.0 455.0 342.0
C 27-07-19 6 763.0 476.0 753.0
C 02-08-19 6 717.0 211.0 454.0
C 03-08-19 6 NaN NaN NaN
C 05-08-19 6 564.0 340.0 302.0
C 07-08-19 6 NaN NaN NaN
C 08-08-19 6 948.0 366.0 586.0
I want to show the mean value of the last n days (using the Date column), excluding the value of the current day.
I'm using this code (what should I do to fix this?):
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
n = 4
cols = df.filter(regex='Var').columns
df = df.set_index('Date')
df_ = df.set_index('ID', append=True).swaplevel(1,0)
df1 = df.groupby('ID').rolling(window=f'{n+1}D')[cols].count()
df2 = df.groupby('ID').rolling(window=f'{n+1}D')[cols].mean()
df3 = (df1.mul(df2)
.sub(df_[cols])
.div(df1[cols].sub(1)).add_suffix(f'_{n}')
)
df4 = df_.join(df3)
Expected result:
ID Date X 123_Var 456_Var 789_Var 123_Var_4 456_Var_4 789_Var_4
A 16-07-19 3 777.0 250.0 810.0 NaN NaN NaN
A 17-07-19 9 637.0 121.0 529.0 777.000000 250.000000 810.0
A 20-07-19 2 295.0 272.0 490.0 707.000000 185.500000 669.5
A 21-07-19 3 778.0 600.0 544.0 466.000000 196.500000 509.5
A 22-07-19 6 741.0 792.0 907.0 536.500000 436.000000 517.0
A 25-07-19 6 435.0 416.0 820.0 759.500000 696.000000 725.5
A 26-07-19 8 590.0 455.0 342.0 588.000000 604.000000 863.5
A 27-07-19 6 763.0 476.0 753.0 512.500000 435.500000 581.0
A 02-08-19 6 717.0 211.0 454.0 NaN NaN NaN
A 03-08-19 6 152.0 442.0 475.0 717.000000 211.000000 454.0
A 05-08-19 6 564.0 340.0 302.0 434.500000 326.500000 464.5
A 07-08-19 6 105.0 929.0 633.0 358.000000 391.000000 388.5
A 08-08-19 6 948.0 366.0 586.0 334.500000 634.500000 467.5
B 07-08-19 4 509.0 690.0 406.0 NaN NaN NaN
B 08-08-19 2 413.0 725.0 414.0 509.000000 690.000000 406.0
B 12-08-19 2 170.0 702.0 912.0 413.000000 725.000000 414.0
B 13-08-19 3 851.0 616.0 477.0 291.500000 713.500000 663.0
B 14-08-19 9 475.0 447.0 555.0 510.500000 659.000000 694.5
B 15-08-19 1 412.0 403.0 708.0 498.666667 588.333333 648.0
B 17-08-19 2 299.0 537.0 321.0 579.333333 488.666667 580.0
B 18-08-19 4 310.0 119.0 125.0 395.333333 462.333333 528.0
C 16-07-19 3 777.0 250.0 810.0 NaN NaN NaN
C 17-07-19 9 637.0 121.0 529.0 777.000000 250.000000 810.0
C 20-07-19 2 NaN NaN NaN 707.000000 185.500000 669.5
C 21-07-19 3 NaN NaN NaN 637.000000 121.000000 529.0
C 22-07-19 6 741.0 792.0 907.0 NaN NaN NaN
C 25-07-19 6 NaN NaN NaN 741.000000 792.000000 907.0
C 26-07-19 8 590.0 455.0 342.0 741.000000 792.000000 907.0
C 27-07-19 6 763.0 476.0 753.0 590.000000 455.000000 342.0
C 02-08-19 6 717.0 211.0 454.0 NaN NaN NaN
C 03-08-19 6 NaN NaN NaN 717.000000 211.000000 454.0
C 05-08-19 6 564.0 340.0 302.0 717.000000 211.000000 454.0
C 07-08-19 6 NaN NaN NaN 564.000000 340.000000 302.0
C 08-08-19 6 948.0 366.0 586.0 564.000000 340.000000 302.0
The digits after the decimal point don't matter.
These threads might help:
Taking the mean value of N last days
Taking the min value of N last days
Try:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
# Select the columns with a list; tuple-style selection no longer works in recent pandas.
df1 = (df.groupby('ID')[['Date','123_Var','456_Var','789_Var']].rolling('4D', on='Date', closed='left').mean())
dfx = (df.set_index(['ID','Date'])
.join(df1.reset_index().set_index(['ID','Date']), rsuffix='_4')
.reset_index()
.drop('level_1',axis=1))
print(dfx.to_string())
ID Date X 123_Var 456_Var 789_Var 123_Var_4 456_Var_4 789_Var_4
0 A 2019-07-16 3 777.0 250.0 810.0 NaN NaN NaN
1 A 2019-07-17 9 637.0 121.0 529.0 777.000000 250.000000 810.0
2 A 2019-07-20 2 295.0 272.0 490.0 707.000000 185.500000 669.5
3 A 2019-07-21 3 778.0 600.0 544.0 466.000000 196.500000 509.5
4 A 2019-07-22 6 741.0 792.0 907.0 536.500000 436.000000 517.0
5 A 2019-07-25 6 435.0 416.0 820.0 759.500000 696.000000 725.5
6 A 2019-07-26 8 590.0 455.0 342.0 588.000000 604.000000 863.5
7 A 2019-07-27 6 763.0 476.0 753.0 512.500000 435.500000 581.0
8 A 2019-08-02 6 717.0 211.0 454.0 NaN NaN NaN
9 A 2019-08-03 6 152.0 442.0 475.0 717.000000 211.000000 454.0
10 A 2019-08-05 6 564.0 340.0 302.0 434.500000 326.500000 464.5
11 A 2019-08-07 6 105.0 929.0 633.0 358.000000 391.000000 388.5
12 A 2019-08-08 6 948.0 366.0 586.0 334.500000 634.500000 467.5
13 B 2019-08-07 4 509.0 690.0 406.0 NaN NaN NaN
14 B 2019-08-08 2 413.0 725.0 414.0 509.000000 690.000000 406.0
15 B 2019-08-12 2 170.0 702.0 912.0 413.000000 725.000000 414.0
16 B 2019-08-13 3 851.0 616.0 477.0 170.000000 702.000000 912.0
17 B 2019-08-14 9 475.0 447.0 555.0 510.500000 659.000000 694.5
18 B 2019-08-15 1 412.0 403.0 708.0 498.666667 588.333333 648.0
19 B 2019-08-17 2 299.0 537.0 321.0 579.333333 488.666667 580.0
20 B 2019-08-18 4 310.0 119.0 125.0 395.333333 462.333333 528.0
21 C 2019-07-16 3 777.0 250.0 810.0 NaN NaN NaN
22 C 2019-07-17 9 637.0 121.0 529.0 777.000000 250.000000 810.0
23 C 2019-07-20 2 NaN NaN NaN 707.000000 185.500000 669.5
24 C 2019-07-21 3 NaN NaN NaN 637.000000 121.000000 529.0
25 C 2019-07-22 6 741.0 792.0 907.0 NaN NaN NaN
26 C 2019-07-25 6 NaN NaN NaN 741.000000 792.000000 907.0
27 C 2019-07-26 8 590.0 455.0 342.0 741.000000 792.000000 907.0
28 C 2019-07-27 6 763.0 476.0 753.0 590.000000 455.000000 342.0
29 C 2019-08-02 6 717.0 211.0 454.0 NaN NaN NaN
30 C 2019-08-03 6 NaN NaN NaN 717.000000 211.000000 454.0
31 C 2019-08-05 6 564.0 340.0 302.0 717.000000 211.000000 454.0
32 C 2019-08-07 6 NaN NaN NaN 564.000000 340.000000 302.0
33 C 2019-08-08 6 948.0 366.0 586.0 564.000000 340.000000 302.0
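The key piece of the answer above is the time-based window with closed='left', which covers the four days before each row and leaves out the row itself. A minimal sketch for a single ID, using the first five 123_Var values of ID A and a plain Series instead of the grouped frame (a simplification made here), could look like this:

```python
import pandas as pd

# First five dates and 123_Var values of ID A from the question.
dates = pd.to_datetime(
    ["16-07-19", "17-07-19", "20-07-19", "21-07-19", "22-07-19"],
    format="%d-%m-%y",
)
s = pd.Series([777.0, 637.0, 295.0, 778.0, 741.0], index=dates)

# closed='left' makes each window [t - 4 days, t): the current row is excluded,
# and NaN values inside the window are skipped by mean().
prev_mean = s.rolling("4D", closed="left").mean()
print(prev_mean)
```

The first row has no earlier observations and comes out as NaN; the remaining rows should agree with the 123_Var_4 column shown for ID A.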

Select pandas dataframe rows between dates and set column value

In the dataframe below, I want to set row values in the column p50 to NaN if they are below 2.0 between the dates May 15th and August 15th 2018.
date p50
2018-03-02 2018-03-02 NaN
2018-03-03 2018-03-03 NaN
2018-03-04 2018-03-04 0.022590
2018-03-05 2018-03-05 NaN
2018-03-06 2018-03-06 -0.042227
2018-03-07 2018-03-07 NaN
2018-03-08 2018-03-08 NaN
2018-03-09 2018-03-09 -0.028646
2018-03-10 2018-03-10 NaN
2018-03-11 2018-03-11 -0.045244
2018-03-12 2018-03-12 NaN
2018-03-13 2018-03-13 NaN
2018-03-14 2018-03-14 -0.020590
2018-03-15 2018-03-15 NaN
2018-03-16 2018-03-16 -0.028317
2018-03-17 2018-03-17 NaN
2018-03-18 2018-03-18 NaN
2018-03-19 2018-03-19 NaN
2018-03-20 2018-03-20 NaN
2018-03-21 2018-03-21 NaN
2018-03-22 2018-03-22 NaN
2018-03-23 2018-03-23 NaN
2018-03-24 2018-03-24 -0.066800
2018-03-25 2018-03-25 NaN
2018-03-26 2018-03-26 -0.104135
2018-03-27 2018-03-27 NaN
2018-03-28 2018-03-28 NaN
2018-03-29 2018-03-29 -0.115200
2018-03-30 2018-03-30 NaN
2018-03-31 2018-03-31 -0.000455
... ...
2018-07-03 2018-07-03 NaN
2018-07-04 2018-07-04 2.313035
2018-07-05 2018-07-05 NaN
2018-07-06 2018-07-06 NaN
2018-07-07 2018-07-07 NaN
2018-07-08 2018-07-08 NaN
2018-07-09 2018-07-09 0.054513
2018-07-10 2018-07-10 NaN
2018-07-11 2018-07-11 NaN
2018-07-12 2018-07-12 3.711159
2018-07-13 2018-07-13 NaN
2018-07-14 2018-07-14 6.583810
2018-07-15 2018-07-15 NaN
2018-07-16 2018-07-16 NaN
2018-07-17 2018-07-17 0.070182
2018-07-18 2018-07-18 NaN
2018-07-19 2018-07-19 3.688812
2018-07-20 2018-07-20 NaN
2018-07-21 2018-07-21 NaN
2018-07-22 2018-07-22 0.876552
2018-07-23 2018-07-23 NaN
2018-07-24 2018-07-24 1.077895
2018-07-25 2018-07-25 NaN
2018-07-26 2018-07-26 NaN
2018-07-27 2018-07-27 3.802159
2018-07-28 2018-07-28 NaN
2018-07-29 2018-07-29 0.077402
2018-07-30 2018-07-30 NaN
2018-07-31 2018-07-31 NaN
2018-08-01 2018-08-01 3.202214
The dataframe has a datetime index. I do the following:
mask = (group['date'] > '2018-5-15') & (group['date'] <= '2018-8-15')
group[mask].loc[group[mask]['p50'] < 2.]['p50'] = np.NaN
However, this does not update the dataframe. How can I fix this?
I think you should use .loc, like this:
mask = (group['date'] > '2018-5-15') & (group['date'] <= '2018-8-15')
group.loc[mask&(group['p50'] < 2),'p50']=np.nan
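The original attempt fails because group[mask] returns a copy, so the assignment never reaches group itself. A single .loc call with a combined row mask and a column label writes into the original frame. A small sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: only the second and third rows fall inside the date range.
dates = pd.to_datetime(["2018-05-01", "2018-06-01", "2018-07-04", "2018-09-01"])
group = pd.DataFrame({"date": dates, "p50": [0.5, 1.0, 2.3, 0.1]}, index=dates)

# Rows inside the date range ...
mask = (group["date"] > "2018-05-15") & (group["date"] <= "2018-08-15")

# ... that also have p50 below 2 are set to NaN in one indexing step.
group.loc[mask & (group["p50"] < 2), "p50"] = np.nan
print(group)
```

Only the 2018-06-01 row is changed: it is inside the range and below 2, while the 2018-07-04 row stays because its value is 2.3.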

How to compute a column depending on previous values of one and current values of another column

I don't have much experience with pandas, and I have the following DataFrame:
month A B
2/28/2017 0.7377573034 0
3/31/2017 0.7594787565 3.7973937824
4/30/2017 0.7508308808 3.7541544041
5/31/2017 0.7038814004 7.0388140044
6/30/2017 0.6920212254 11.0723396061
7/31/2017 0.6801610503 11.5627378556
8/31/2017 0.6683008753 10.6928140044
9/30/2017 0.7075915026 11.3214640415
10/31/2017 0.6989436269 7.6883798964
11/30/2017 0.6259514607 4.3816602247
12/31/2017 0.6119757303 3.671854382
1/31/2018 0.633 3.798
2/28/2018 0.598 4.784
3/31/2018 0.673 5.384
4/30/2018 0.673 1.346
5/31/2018 0.609 0
6/30/2018 0.609 0
7/31/2018 0.609 0
8/31/2018 0.609 0
9/30/2018 0.673 0
10/31/2018 0.673 0
11/30/2018 0.598 0
12/31/2018 0.598 0
I need to compute column C, which is basically column A times column B, except that the value of column B is taken from the corresponding month of the previous year. In addition, for rows that have no corresponding month in the previous year, the value should be zero. To be more specific, this is what I expect C to be:
C
0 # these values are zero because the corresponding month in the previous year is not in column A
0
0
0
0
0
0
0
0
0
0
0
0 # 0.598 * 0
2.5556460155552 # 0.673 * 3.7973937824
2.5265459139593 # 0.673 * 3.7541544041
4.2866377286796 # 0.609 * 7.0388140044
6.7430548201149 # 0.609 * 11.0723396061
7.0417073540604 # 0.609 * 11.5627378556
6.5119237286796 # 0.609 * 10.6928140044
7.6193452999295 # 0.673 * 11.3214640415
5.1742796702772 # 0.673 * 7.6883798964
2.6202328143706 # 0.598 * 4.3816602247
2.195768920436 # 0.598 * 3.671854382
How can I achieve this? I am sure there might be a way to do it not using a for loop. Thanks in advance.
In [73]: (df.drop('B', axis=1)
...: .merge(df.drop('A', axis=1)
...: .assign(month=df.month + pd.offsets.MonthEnd(12)),
...: on='month', how='left')
...: .eval("C = A * B", inplace=False)
...: .fillna(0)
...: )
...:
Out[73]:
month A B C
0 2017-02-28 0.737757 0.000000 0.000000
1 2017-03-31 0.759479 0.000000 0.000000
2 2017-04-30 0.750831 0.000000 0.000000
3 2017-05-31 0.703881 0.000000 0.000000
4 2017-06-30 0.692021 0.000000 0.000000
5 2017-07-31 0.680161 0.000000 0.000000
6 2017-08-31 0.668301 0.000000 0.000000
7 2017-09-30 0.707592 0.000000 0.000000
8 2017-10-31 0.698944 0.000000 0.000000
9 2017-11-30 0.625951 0.000000 0.000000
10 2017-12-31 0.611976 0.000000 0.000000
11 2018-01-31 0.633000 0.000000 0.000000
12 2018-02-28 0.598000 0.000000 0.000000
13 2018-03-31 0.673000 3.797394 2.555646
14 2018-04-30 0.673000 3.754154 2.526546
15 2018-05-31 0.609000 7.038814 4.286638
16 2018-06-30 0.609000 11.072340 6.743055
17 2018-07-31 0.609000 11.562738 7.041707
18 2018-08-31 0.609000 10.692814 6.511924
19 2018-09-30 0.673000 11.321464 7.619345
20 2018-10-31 0.673000 7.688380 5.174280
21 2018-11-30 0.598000 4.381660 2.620233
22 2018-12-31 0.598000 3.671854 2.195769
Explanation:
We can generate a helper DF like this (we add 12 months to the month column and drop the A column):
In [77]: df.drop('A', axis=1).assign(month=df.month + pd.offsets.MonthEnd(12))
Out[77]:
month B
0 2018-02-28 0.000000
1 2018-03-31 3.797394
2 2018-04-30 3.754154
3 2018-05-31 7.038814
4 2018-06-30 11.072340
5 2018-07-31 11.562738
6 2018-08-31 10.692814
7 2018-09-30 11.321464
8 2018-10-31 7.688380
9 2018-11-30 4.381660
10 2018-12-31 3.671854
11 2019-01-31 3.798000
12 2019-02-28 4.784000
13 2019-03-31 5.384000
14 2019-04-30 1.346000
15 2019-05-31 0.000000
16 2019-06-30 0.000000
17 2019-07-31 0.000000
18 2019-08-31 0.000000
19 2019-09-30 0.000000
20 2019-10-31 0.000000
21 2019-11-30 0.000000
22 2019-12-31 0.000000
Now we can merge it with the original DF (we don't need the B column in the original DF):
In [79]: (df.drop('B', axis=1)
...: .merge(df.drop('A', axis=1)
...: .assign(month=df.month + pd.offsets.MonthEnd(12)),
...: on='month', how='left'))
Out[79]:
month A B
0 2017-02-28 0.737757 NaN
1 2017-03-31 0.759479 NaN
2 2017-04-30 0.750831 NaN
3 2017-05-31 0.703881 NaN
4 2017-06-30 0.692021 NaN
5 2017-07-31 0.680161 NaN
6 2017-08-31 0.668301 NaN
7 2017-09-30 0.707592 NaN
8 2017-10-31 0.698944 NaN
9 2017-11-30 0.625951 NaN
10 2017-12-31 0.611976 NaN
11 2018-01-31 0.633000 NaN
12 2018-02-28 0.598000 0.000000
13 2018-03-31 0.673000 3.797394
14 2018-04-30 0.673000 3.754154
15 2018-05-31 0.609000 7.038814
16 2018-06-30 0.609000 11.072340
17 2018-07-31 0.609000 11.562738
18 2018-08-31 0.609000 10.692814
19 2018-09-30 0.673000 11.321464
20 2018-10-31 0.673000 7.688380
21 2018-11-30 0.598000 4.381660
22 2018-12-31 0.598000 3.671854
Then, using .eval("C = A * B", inplace=False), we can generate a new column "on the fly".
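Put together, the steps can be sketched as a short script. The four-row frame below is hypothetical and chosen so the products are easy to check by hand, and it assumes the month column already holds month-end datetimes:

```python
import pandas as pd

# Hypothetical data: two months in 2017 and the same two months in 2018.
df = pd.DataFrame({
    "month": pd.to_datetime(["2017-02-28", "2017-03-31", "2018-02-28", "2018-03-31"]),
    "A": [0.7, 0.8, 0.5, 0.5],
    "B": [2.0, 4.0, 1.0, 3.0],
})

# Helper frame: last year's B, relabelled with the month it should be matched to.
prev_b = df.drop(columns="A").assign(month=df["month"] + pd.offsets.MonthEnd(12))

# Left merge keeps every row of the original; months without a previous year get NaN.
merged = df.drop(columns="B").merge(prev_b, on="month", how="left")

# Multiply current A by last year's B and turn the unmatched rows into zeros.
result = merged.eval("C = A * B").fillna(0)
print(result)
```

As in the answer, column B in the result holds the previous year's values, so the 2017 rows end up with B and C equal to 0.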
