I have two datetime columns - ColumnA and ColumnB. I want to create a new column - ColumnC, using conditional logic.
Originally, I created ColumnB from a YearMonth column of dates such as 201907, 201908, etc.
When ColumnA is NaN, I want to choose ColumnB.
Otherwise, I want to choose ColumnA.
Currently, my code below is causing ColumnC to have mixed formats. I'm not sure how to get rid of the rows that come out as long integers (nanosecond timestamps); I want the whole column to be YYYY-MM-DD.
ID YearMonth ColumnA ColumnB ColumnC
0 1 201712 2017-12-29 2017-12-31 2017-12-29
1 1 201801 2018-01-31 2018-01-31 2018-01-31
2 1 201802 2018-02-28 2018-02-28 2018-02-28
3 1 201806 2018-06-29 2018-06-30 2018-06-29
4 1 201807 2018-07-31 2018-07-31 2018-07-31
5 1 201808 2018-08-31 2018-08-31 2018-08-31
6 1 201809 2018-09-28 2018-09-30 2018-09-28
7 1 201810 2018-10-31 2018-10-31 2018-10-31
8 1 201811 2018-11-30 2018-11-30 2018-11-30
9 1 201812 2018-12-31 2018-12-31 2018-12-31
10 1 201803 NaN 2018-03-31 1522454400000000000
11 1 201804 NaN 2018-04-30 1525046400000000000
12 1 201805 NaN 2018-05-31 1527724800000000000
13 1 201901 NaN 2019-01-31 1548892800000000000
14 1 201902 NaN 2019-02-28 1551312000000000000
15 1 201903 NaN 2019-03-31 1553990400000000000
16 1 201904 NaN 2019-04-30 1556582400000000000
17 1 201905 NaN 2019-05-31 1559260800000000000
18 1 201906 NaN 2019-06-30 1561852800000000000
19 1 201907 NaN 2019-07-31 1564531200000000000
20 1 201908 NaN 2019-08-31 1567209600000000000
21 1 201909 NaN 2019-09-30 1569801600000000000
df['ColumnB'] = pd.to_datetime(df['YearMonth'], format='%Y%m', errors='coerce').dropna() + pd.offsets.MonthEnd(0)
df['ColumnC'] = np.where(pd.isna(df['ColumnA']), pd.to_datetime(df['ColumnB'], format='%Y%m%d'), df['ColumnA'])
df['ColumnC'] = np.where(df['ColumnA'].isnull(), df['ColumnB'], df['ColumnA'])
Just figured it out!
df['ColumnC'] = np.where(pd.isna(df['ColumnA']), pd.to_datetime(df['ColumnB']), pd.to_datetime(df['ColumnA']))
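For reference, np.where returns a plain NumPy array, and when it mixes dtypes the datetime64 values can decay to their raw integer-nanosecond representation, which is why wrapping both branches in pd.to_datetime restores the format. A simpler sketch that avoids np.where entirely, assuming both columns are already datetime64:
df['ColumnC'] = df['ColumnA'].fillna(df['ColumnB'])    # keeps the datetime dtype
df['ColumnC'] = df['ColumnC'].dt.strftime('%Y-%m-%d')  # only if plain YYYY-MM-DD strings are wanted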
I'm trying to merge two dataframes by time with multiple matches. I'm looking for all the instances of df2 whose timestamp falls 7 days or less before endofweek in df1. There may be more than one record that fits, and I want all of the matches, not just the first or last (which is all pd.merge_asof returns).
import pandas as pd
df1 = pd.DataFrame({'endofweek': ['2019-08-31', '2019-08-31', '2019-09-07', '2019-09-07', '2019-09-14', '2019-09-14'], 'GroupCol': [1234,8679,1234,8679,1234,8679]})
df2 = pd.DataFrame({'timestamp': ['2019-08-30 10:00', '2019-08-30 10:30', '2019-09-07 12:00', '2019-09-08 14:00'], 'GroupVal': [1234, 1234, 8679, 1234], 'TextVal': ['1234_1', '1234_2', '8679_1', '1234_3']})
df1['endofweek'] = pd.to_datetime(df1['endofweek'])
df2['timestamp'] = pd.to_datetime(df2['timestamp'])
I've tried
pd.merge_asof(df1, df2, tolerance=pd.Timedelta('7d'), direction='backward', left_on='endofweek', right_on='timestamp', left_by='GroupCol', right_by='GroupVal')
but that gets me
endofweek GroupCol timestamp GroupVal TextVal
0 2019-08-31 1234 2019-08-30 10:30:00 1234.0 1234_2
1 2019-08-31 8679 NaT NaN NaN
2 2019-09-07 1234 NaT NaN NaN
3 2019-09-07 8679 NaT NaN NaN
4 2019-09-14 1234 2019-09-08 14:00:00 1234.0 1234_3
5 2019-09-14 8679 2019-09-07 12:00:00 8679.0 8679_1
I'm losing the text 1234_1. Is there a way to do a sort of outer join for pd.merge_asof, where I can keep all of the instances of df2 and not just the first or last?
My ideal result would look like this (assuming that the endofweek times are treated like 00:00:00 on that date):
endofweek GroupCol timestamp GroupVal TextVal
0 2019-08-31 1234 2019-08-30 10:00:00 1234.0 1234_1
1 2019-08-31 1234 2019-08-30 10:30:00 1234.0 1234_2
2 2019-08-31 8679 NaT NaN NaN
3 2019-09-07 1234 NaT NaN NaN
4 2019-09-07 8679 NaT NaN NaN
5 2019-09-14 1234 2019-09-08 14:00:00 1234.0 1234_3
6 2019-09-14 8679 2019-09-07 12:00:00 8679.0 8679_1
pd.merge_asof only does a left join. After a lot of frustration trying to speed up the groupby/merge_ordered attempt below, I found it more intuitive and faster to run pd.merge_asof on both data sources in opposite directions, and then combine them with an outer join.
left_merge = pd.merge_asof(df1, df2,
tolerance=pd.Timedelta('7d'), direction='backward',
left_on='endofweek', right_on='timestamp',
left_by='GroupCol', right_by='GroupVal')
right_merge = pd.merge_asof(df2, df1,
tolerance=pd.Timedelta('7d'), direction='forward',
left_on='timestamp', right_on='endofweek',
left_by='GroupVal', right_by='GroupCol')
merged = (left_merge.merge(right_merge, how="outer")
.sort_values(['endofweek', 'GroupCol', 'timestamp'])
.reset_index(drop=True))
merged
endofweek GroupCol timestamp GroupVal TextVal
0 2019-08-31 1234 2019-08-30 10:00:00 1234.0 1234_1
1 2019-08-31 1234 2019-08-30 10:30:00 1234.0 1234_2
2 2019-08-31 8679 NaT NaN NaN
3 2019-09-07 1234 NaT NaN NaN
4 2019-09-07 8679 NaT NaN NaN
5 2019-09-14 1234 2019-09-08 14:00:00 1234.0 1234_3
6 2019-09-14 8679 2019-09-07 12:00:00 8679.0 8679_1
In addition, it is much faster than my other answer (shown below):
import time
n=1000
start=time.time()
for i in range(n):
left_merge = pd.merge_asof(df1, df2,
tolerance=pd.Timedelta('7d'), direction='backward',
left_on='endofweek', right_on='timestamp',
left_by='GroupCol', right_by='GroupVal')
right_merge = pd.merge_asof(df2, df1,
tolerance=pd.Timedelta('7d'), direction='forward',
left_on='timestamp', right_on='endofweek',
left_by='GroupVal', right_by='GroupCol')
merged = (left_merge.merge(right_merge, how="outer")
.sort_values(['endofweek', 'GroupCol', 'timestamp'])
.reset_index(drop=True))
end = time.time()
end-start
15.040804386138916
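As an aside, for small frames a plain merge on the group key followed by a time-window filter is an even simpler sketch of the same logic (a hypothetical alternative, not benchmarked above); it materializes every within-group pairing, so it assumes the groups are small:
# pair every df1 row with every df2 row in the same group
cross = df1.merge(df2, left_on='GroupCol', right_on='GroupVal', how='left')
# keep pairs whose timestamp falls 0-7 days before endofweek (midnight)
delta = cross['endofweek'] - cross['timestamp']
matches = cross[delta.between(pd.Timedelta(0), pd.Timedelta('7d'))]
# left-join back so group/week combinations with no match keep a NaT row
result = df1.merge(matches, on=['endofweek', 'GroupCol'], how='left')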
One way I tried is using groupby on one data frame, and then subsetting the other one in a pd.merge_ordered:
merged = (df1.groupby(['GroupCol', 'endofweek']).
apply(lambda x: pd.merge_ordered(x, df2[(
(df2['GroupVal']==x.name[0])
&(abs(df2['timestamp']-x.name[1])<=pd.Timedelta('7d')))],
left_on='endofweek', right_on='timestamp')))
merged
endofweek GroupCol timestamp GroupVal TextVal
GroupCol endofweek
1234 2019-08-31 0 NaT NaN 2019-08-30 10:00:00 1234.0 1234_1
1 NaT NaN 2019-08-30 10:30:00 1234.0 1234_2
2 2019-08-31 1234.0 NaT NaN NaN
2019-09-07 0 2019-09-07 1234.0 NaT NaN NaN
2019-09-14 0 NaT NaN 2019-09-08 14:00:00 1234.0 1234_3
1 2019-09-14 1234.0 NaT NaN NaN
8679 2019-08-31 0 2019-08-31 8679.0 NaT NaN NaN
2019-09-07 0 2019-09-07 8679.0 NaT NaN NaN
2019-09-14 0 NaT NaN 2019-09-07 12:00:00 8679.0 8679_1
1 2019-09-14 8679.0 NaT NaN NaN
merged[['endofweek', 'GroupCol']] = (merged[['endofweek', 'GroupCol']]
.fillna(method="bfill"))
merged.reset_index(drop=True, inplace=True)
merged
endofweek GroupCol timestamp GroupVal TextVal
0 2019-08-31 1234.0 2019-08-30 10:00:00 1234.0 1234_1
1 2019-08-31 1234.0 2019-08-30 10:30:00 1234.0 1234_2
2 2019-08-31 1234.0 NaT NaN NaN
3 2019-09-07 1234.0 NaT NaN NaN
4 2019-09-14 1234.0 2019-09-08 14:00:00 1234.0 1234_3
5 2019-09-14 1234.0 NaT NaN NaN
6 2019-08-31 8679.0 NaT NaN NaN
7 2019-09-07 8679.0 NaT NaN NaN
8 2019-09-14 8679.0 2019-09-07 12:00:00 8679.0 8679_1
9 2019-09-14 8679.0 NaT NaN NaN
However, this approach seems very slow:
import time
n=1000
start=time.time()
for i in range(n):
merged = (df1.groupby(['GroupCol', 'endofweek']).
apply(lambda x: pd.merge_ordered(x, df2[(
(df2['GroupVal']==x.name[0])
&(abs(df2['timestamp']-x.name[1])<=pd.Timedelta('7d')))],
left_on='endofweek', right_on='timestamp')))
end = time.time()
end-start
40.72932052612305
I would greatly appreciate any improvements!
I have a pandas dataframe like this,
id d1 d2
0 1 2016-12-15 2017-02-08
1 2 2017-04-28 2017-07-20
2 3 2017-07-28 2017-10-19
3 4 2018-02-20 2019-01-21
4 5 2019-03-19 2019-06-10
5 1 2019-05-24 2019-05-30
6 2 2019-06-04 2019-07-22
I want to check whether any d2 is greater than the next row's d1; if so, I want to set that d2 to the next d1 minus one day.
I can figure out where I want to change the date with this code,
x['d2'].gt(x['d1'].shift(-1))
I am not sure how to proceed efficiently after this.
Result I am looking for is like this,
id d1 d2
0 1 2016-12-15 2017-02-08
1 2 2017-04-28 2017-07-20
2 3 2017-07-28 2017-10-19
3 4 2018-02-20 2019-01-21
4 5 2019-03-19 2019-05-23
5 1 2019-05-24 2019-05-30
6 2 2019-06-04 2019-07-22
How can I do this in pandas without loops?
I am currently solving this with apply:
x.apply(lambda x : x['d1_shifted'] - pd.Timedelta(days=1) if x['d2'] > x['d1_shifted'] else x['d2'], axis=1)
Try:
c = df.d2.gt(df.d1.shift(-1))
df = df.assign(d2=np.where(c, df.d1.shift(-1) - pd.Timedelta(1, unit='d'), df.d2))
print(df)
id d1 d2
0 1 2016-12-15 2017-02-08
1 2 2017-04-28 2017-07-20
2 3 2017-07-28 2017-10-19
3 4 2018-02-20 2019-01-21
4 5 2019-03-19 2019-05-23
5 1 2019-05-24 2019-05-30
6 2 2019-06-04 2019-07-22
Another way is to assign directly with .loc and pd.DateOffset, as follows:
m = df.d2.gt(df.d1.shift(-1))
df.loc[m, 'd2'] = df.shift(-1).loc[m, 'd1'] - pd.DateOffset(1)
Out[947]:
id d1 d2
0 1 2016-12-15 2017-02-08
1 2 2017-04-28 2017-07-20
2 3 2017-07-28 2017-10-19
3 4 2018-02-20 2019-01-21
4 5 2019-03-19 2019-05-23
5 1 2019-05-24 2019-05-30
6 2 2019-06-04 2019-07-22
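Equivalently, Series.mask expresses the same replace-where-true idea in one chained expression (a sketch, assuming d1 and d2 are datetime64 columns):
next_d1 = df['d1'].shift(-1)
# replace d2 with next_d1 minus one day wherever d2 exceeds the next row's d1
df['d2'] = df['d2'].mask(df['d2'].gt(next_d1), next_d1 - pd.Timedelta(days=1))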
The following is a simplification of the problem at hand.
I have a dataframe containing three columns, the date a state began, the state itself, and a flag field. It looks similar to this:
df = pd.DataFrame(
{'begin': pd.to_datetime(['2018-01-05', '2018-07-11', '2018-11-14', '2019-02-19']),
'state': [1, 2, 3, 4],
'started': [1, 0, 0, 0]
}
)
df
begin state started
0 2018-01-05 1 1
1 2018-07-11 2 0
2 2018-11-14 3 0
3 2019-02-19 4 0
I want to resample the dates so that they have a monthly period, which I achieve as follows:
df = df.set_index('begin', drop=False).resample('m').ffill()
df
begin state started
begin
2018-01-31 2018-01-05 1 1
2018-02-28 2018-01-05 1 1
2018-03-31 2018-01-05 1 1
2018-04-30 2018-01-05 1 1
2018-05-31 2018-01-05 1 1
2018-06-30 2018-01-05 1 1
2018-07-31 2018-07-11 2 0
2018-08-31 2018-07-11 2 0
2018-09-30 2018-07-11 2 0
2018-10-31 2018-07-11 2 0
2018-11-30 2018-11-14 3 0
2018-12-31 2018-11-14 3 0
2019-01-31 2018-11-14 3 0
2019-02-28 2019-02-19 4 0
Everything looks ok, except for the flag column (started). I need it to be 1 exactly once, at its first occurrence as in the original dataframe.
The desired output is:
begin state started
begin
2018-01-31 2018-01-05 1 1
2018-02-28 2018-01-05 1 0
2018-03-31 2018-01-05 1 0
2018-04-30 2018-01-05 1 0
2018-05-31 2018-01-05 1 0
2018-06-30 2018-01-05 1 0
2018-07-31 2018-07-11 2 0
2018-08-31 2018-07-11 2 0
2018-09-30 2018-07-11 2 0
2018-10-31 2018-07-11 2 0
2018-11-30 2018-11-14 3 0
2018-12-31 2018-11-14 3 0
2019-01-31 2018-11-14 3 0
2019-02-28 2019-02-19 4 0
Thus, for a given combination of begin and state, if started is 1, it should be 1 only at the first occurrence of this combination.
Is there an efficient way to achieve this?
If the started column contains only 1s and 0s, use DataFrame.duplicated, specifying both columns in a list:
mask = df.duplicated(['begin','started'])
It is also possible to rewrite only the 1 values by chaining another mask:
mask = df.duplicated(['begin','started']) & df['started'].eq(1)
df.loc[mask, 'started'] = 0
Or:
df['started'] = np.where(mask, 0, df['started'])
print (df)
begin state started
begin
2018-01-31 2018-01-05 1 1
2018-02-28 2018-01-05 1 0
2018-03-31 2018-01-05 1 0
2018-04-30 2018-01-05 1 0
2018-05-31 2018-01-05 1 0
2018-06-30 2018-01-05 1 0
2018-07-31 2018-07-11 2 0
2018-08-31 2018-07-11 2 0
2018-09-30 2018-07-11 2 0
2018-10-31 2018-07-11 2 0
2018-11-30 2018-11-14 3 0
2018-12-31 2018-11-14 3 0
2019-01-31 2018-11-14 3 0
2019-02-28 2019-02-19 4 0
You could also do:
df = df.set_index('begin', drop=False).resample('m').ffill()
df.loc[df['started'].duplicated(keep='first'), 'started'] = 0
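If more than one state could carry started=1, a variant that dedupes on the state itself is a safer sketch (assumption: each state value identifies one block; starts again from the original frame):
df = df.set_index('begin', drop=False).resample('m').ffill()
# zero the flag on every row that repeats an already-seen state
df.loc[df['state'].duplicated(), 'started'] = 0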
I have some code which produces a dataframe with columns date and colx (a given value). df =
index date colx
2018-08-09 NaN NaN
2018-08-10 2018-08-10 00:00:00 -0.200460
2018-08-13 NaN NaN
2018-08-14 NaN NaN
2018-08-15 NaN NaN
2018-08-16 NaN NaN
2018-08-17 NaN NaN
2018-08-20 NaN NaN
2018-08-21 NaN NaN
2018-08-22 2018-08-22 00:00:00 -2.317475
2018-08-23 2018-08-23 00:00:00 -1.652724
2018-08-24 2018-08-24 00:00:00 -3.669870
2018-08-27 2018-08-27 00:00:00 -3.807074
2018-08-28 2018-08-28 00:00:00 -0.257006
2018-08-29 NaN NaN
2018-08-30 2018-08-30 00:00:00 -0.374825
2018-08-31 2018-08-31 00:00:00 -5.655345
2018-09-03 2018-09-03 00:00:00 -4.631105
2018-09-04 2018-09-04 00:00:00 -4.722768
2018-09-05 2018-09-05 00:00:00 -3.012673
2018-09-06 NaN NaN
The date column is the same as the index for selected rows, and NaN elsewhere.
What I am looking to achieve, and am unsure how, is to extract the first date and last date of each block of data (without the 00:00:00).
With the help of the following link I was able to tackle the conditional cumulative sum, but not the extraction of the data into the required output below:
python pandas conditional cumulative sum
b = df.colx
c = b.cumsum()
df['cumsumcolx'] = c.sub(c.mask(b != 0).ffill(), fill_value=0).astype(float)
This code gives me:
index date colx cumsumcolx
2018-08-09 0 0 0
2018-08-10 2018-08-10 00:00:00 -0.200460 -0.200460
2018-08-13 0 0 0
2018-08-14 0 0 0
2018-08-15 0 0 0
2018-08-16 0 0 0
2018-08-17 0 0 0
2018-08-20 0 0 0
2018-08-21 0 0 0
2018-08-22 2018-08-22 00:00:00 -2.317475 -2.317475
2018-08-23 2018-08-23 00:00:00 -1.652724 -3.970198
2018-08-24 2018-08-24 00:00:00 -3.669870 -7.640069
2018-08-27 2018-08-27 00:00:00 -3.807074 -11.447143
2018-08-28 2018-08-28 00:00:00 -0.257006 -11.704148
2018-08-29 0 0 0
2018-08-30 2018-08-30 00:00:00 -0.374825 -0.374825
2018-08-31 2018-08-31 00:00:00 -5.655345 -6.030169
2018-09-03 2018-09-03 00:00:00 -4.631105 -10.661275
2018-09-04 2018-09-04 00:00:00 -4.722768 -15.384043
2018-09-05 2018-09-05 00:00:00 -3.012673 -18.396715
2018-09-06 0 0 0
Thus, I'm asking for help with the extraction so that I achieve the expected output as a table/dataframe:
entrydate exitdate cumsumcolx
2018-08-10 2018-08-10 -0.200460
2018-08-22 2018-08-28 -11.704148
2018-08-30 2018-09-05 -18.396715
My df is very long, so I've taken just a snippet of it for illustration purposes.
Thank you
First you need to label the separations between groups:
blanks = df.date.isnull()
Then label the groups themselves:
df['group'] = blanks.cumsum()
Now you have a column which labels each group, with one small defect: the first member of each group is a NaN row. Simply remove such rows:
df = df[~blanks]
Then use groupby:
grouped = df.groupby('group')
entrydate = grouped.date.first()
exitdate = grouped.date.last()
cumsumcolx = grouped.colx.sum()
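The three pieces can then be assembled into the requested table; a sketch, assuming the date column was parsed with pd.to_datetime so that .dt.date can drop the 00:00:00:
result = pd.DataFrame({'entrydate': entrydate.dt.date,
                       'exitdate': exitdate.dt.date,
                       'cumsumcolx': cumsumcolx}).reset_index(drop=True)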
Another, similar solution is below:
def AggSum(dfg):
    # one row per block: first index value, last date, sum of colx
    return pd.DataFrame([[dfg.iloc[0].idx, dfg.iloc[-1].date, dfg.colx.sum()]],
                        columns=['entrydate', 'exitdate', 'cumsumcolx'])

df['idx'] = pd.to_datetime(df['idx'])    # assumes the index was reset into a column named 'idx'
df['date'] = pd.to_datetime(df['date'])
df['Group'] = df.colx.isnull().cumsum()  # NaN rows delimit the blocks
df2 = df[df.colx.notnull()].groupby('Group', as_index=False).apply(AggSum)
df2.reset_index(drop=True, inplace=True)
#Output dataframe
entrydate exitdate cumsumcolx
0 2018-08-10 2018-08-10 -0.200460
1 2018-08-22 2018-08-28 -11.704149
2 2018-08-30 2018-09-05 -18.396716
I do not have much experience with pandas, and I have the following DataFrame:
month A B
2/28/2017 0.7377573034 0
3/31/2017 0.7594787565 3.7973937824
4/30/2017 0.7508308808 3.7541544041
5/31/2017 0.7038814004 7.0388140044
6/30/2017 0.6920212254 11.0723396061
7/31/2017 0.6801610503 11.5627378556
8/31/2017 0.6683008753 10.6928140044
9/30/2017 0.7075915026 11.3214640415
10/31/2017 0.6989436269 7.6883798964
11/30/2017 0.6259514607 4.3816602247
12/31/2017 0.6119757303 3.671854382
1/31/2018 0.633 3.798
2/28/2018 0.598 4.784
3/31/2018 0.673 5.384
4/30/2018 0.673 1.346
5/31/2018 0.609 0
6/30/2018 0.609 0
7/31/2018 0.609 0
8/31/2018 0.609 0
9/30/2018 0.673 0
10/31/2018 0.673 0
11/30/2018 0.598 0
12/31/2018 0.598 0
I need to compute column C, which is basically column A times column B, except that the value of column B is taken from the same month of the previous year. In addition, for rows whose corresponding month in the previous year is not present, this value should be zero. To be more specific, this is what I expect C to be:
C
0 # these values are zero because the corresponding month in the previous year is not in the data
0
0
0
0
0
0
0
0
0
0
0
0 # 0.598 * 0
2.5556460155552 # 0.673 * 3.7973937824
2.5265459139593 # 0.673 * 3.7541544041
4.2866377286796 # 0.609 * 7.0388140044
6.7430548201149 # 0.609 * 11.0723396061
7.0417073540604 # 0.609 * 11.5627378556
6.5119237286796 # 0.609 * 10.6928140044
7.6193452999295 # 0.673 * 11.3214640415
5.1742796702772 # 0.673 * 7.6883798964
2.6202328143706 # 0.598 * 4.3816602247
2.195768920436 # 0.598 * 3.671854382
How can I achieve this? I am sure there must be a way to do it without a for loop. Thanks in advance.
In [73]: (df.drop('B',1)
...: .merge(df.drop('A',1)
...: .assign(month=df.month + pd.offsets.MonthEnd(12)),
...: on='month', how='left')
...: .eval("C = A * B", inplace=False)
...: .fillna(0)
...: )
...:
Out[73]:
month A B C
0 2017-02-28 0.737757 0.000000 0.000000
1 2017-03-31 0.759479 0.000000 0.000000
2 2017-04-30 0.750831 0.000000 0.000000
3 2017-05-31 0.703881 0.000000 0.000000
4 2017-06-30 0.692021 0.000000 0.000000
5 2017-07-31 0.680161 0.000000 0.000000
6 2017-08-31 0.668301 0.000000 0.000000
7 2017-09-30 0.707592 0.000000 0.000000
8 2017-10-31 0.698944 0.000000 0.000000
9 2017-11-30 0.625951 0.000000 0.000000
10 2017-12-31 0.611976 0.000000 0.000000
11 2018-01-31 0.633000 0.000000 0.000000
12 2018-02-28 0.598000 0.000000 0.000000
13 2018-03-31 0.673000 3.797394 2.555646
14 2018-04-30 0.673000 3.754154 2.526546
15 2018-05-31 0.609000 7.038814 4.286638
16 2018-06-30 0.609000 11.072340 6.743055
17 2018-07-31 0.609000 11.562738 7.041707
18 2018-08-31 0.609000 10.692814 6.511924
19 2018-09-30 0.673000 11.321464 7.619345
20 2018-10-31 0.673000 7.688380 5.174280
21 2018-11-30 0.598000 4.381660 2.620233
22 2018-12-31 0.598000 3.671854 2.195769
Explanation:
we can generate a helper DF like this (we have added 12 months to the month column and dropped the A column):
In [77]: df.drop('A',1).assign(month=df.month + pd.offsets.MonthEnd(12))
Out[77]:
month B
0 2018-02-28 0.000000
1 2018-03-31 3.797394
2 2018-04-30 3.754154
3 2018-05-31 7.038814
4 2018-06-30 11.072340
5 2018-07-31 11.562738
6 2018-08-31 10.692814
7 2018-09-30 11.321464
8 2018-10-31 7.688380
9 2018-11-30 4.381660
10 2018-12-31 3.671854
11 2019-01-31 3.798000
12 2019-02-28 4.784000
13 2019-03-31 5.384000
14 2019-04-30 1.346000
15 2019-05-31 0.000000
16 2019-06-30 0.000000
17 2019-07-31 0.000000
18 2019-08-31 0.000000
19 2019-09-30 0.000000
20 2019-10-31 0.000000
21 2019-11-30 0.000000
22 2019-12-31 0.000000
now we can merge it with the original DF (we don't need the B column in the original DF):
In [79]: (df.drop('B',1)
...: .merge(df.drop('A',1)
...: .assign(month=df.month + pd.offsets.MonthEnd(12)),
...: on='month', how='left'))
Out[79]:
month A B
0 2017-02-28 0.737757 NaN
1 2017-03-31 0.759479 NaN
2 2017-04-30 0.750831 NaN
3 2017-05-31 0.703881 NaN
4 2017-06-30 0.692021 NaN
5 2017-07-31 0.680161 NaN
6 2017-08-31 0.668301 NaN
7 2017-09-30 0.707592 NaN
8 2017-10-31 0.698944 NaN
9 2017-11-30 0.625951 NaN
10 2017-12-31 0.611976 NaN
11 2018-01-31 0.633000 NaN
12 2018-02-28 0.598000 0.000000
13 2018-03-31 0.673000 3.797394
14 2018-04-30 0.673000 3.754154
15 2018-05-31 0.609000 7.038814
16 2018-06-30 0.609000 11.072340
17 2018-07-31 0.609000 11.562738
18 2018-08-31 0.609000 10.692814
19 2018-09-30 0.673000 11.321464
20 2018-10-31 0.673000 7.688380
21 2018-11-30 0.598000 4.381660
22 2018-12-31 0.598000 3.671854
then, using .eval("C = A * B", inplace=False), we can generate the new column "on the fly".
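An equivalent sketch without the merge, assuming month has been parsed to month-end timestamps: index B by month, step each row's month back one year, and map:
s = df.set_index('month')['B']
prev = df['month'] - pd.offsets.MonthEnd(12)  # same month, previous year
df['C'] = (df['A'] * prev.map(s)).fillna(0)   # missing months become 0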