I have two data frames that contain time-series data that are on different ranges. One starts earlier, and ends earlier. Also, one is monthly and one is quarterly. However, the index of both is in the form of YYYY-MM-DD. Is there a cute way of merging these dataframes using "Python" and "Pandas"?
Thanks!
/edit
One set:
DATE GDP GPDI NFLS
0 1947-01-01 243.1 35.9 112.815
1 1947-04-01 246.3 34.5 111.253
2 1947-07-01 250.1 34.9 113.023
3 1947-10-01 260.3 43.2 111.440
The other one:
DATE INDPRO M08354USM310NNBR GDP
(...)
334 1946-11-01 13.3916 NaN NaN
335 1946-12-01 13.4721 NaN NaN
336 1947-01-01 13.6332 42.8 NaN
337 1947-02-01 13.7137 42.5 NaN
Together I would like to join them, such that
DATE INDPRO M08354USM310NNBR GDP GPDI NFLS
1946-11-01 13.3916 NaN NaN NaN NaN
1946-12-01 13.4721 NaN NaN NaN NaN
1947-01-01 13.6332 42.8 243.1 35.9 112.815
1947-02-01 13.7137 42.5 NaN NaN NaN
(...)
Just perform an outer merge. The fact that the periods differ and only partly overlap actually works in your favor:
merged = df1.merge(df2, on='DATE', how='outer')
merged
Out[54]:
DATE GDP_x GPDI NFLS INDPRO M08354USM310NNBR GDP_y
0 1947-01-01 243.1 35.9 112.815 13.6332 42.8 NaN
1 1947-04-01 246.3 34.5 111.253 NaN NaN NaN
2 1947-07-01 250.1 34.9 113.023 NaN NaN NaN
3 1947-10-01 260.3 43.2 111.440 NaN NaN NaN
4 1946-11-01 NaN NaN NaN 13.3916 NaN NaN
5 1946-12-01 NaN NaN NaN 13.4721 NaN NaN
6 1947-02-01 NaN NaN NaN 13.7137 42.5 NaN
[7 rows x 7 columns]
You can then rename, fill, or drop the redundant 'GDP_y' column.
To sort by the merged 'DATE' column, call sort_values (the older DataFrame.sort was removed in pandas 0.20):
In [57]:
merged.sort_values('DATE')
Out[57]:
DATE GDP_x GPDI NFLS INDPRO M08354USM310NNBR GDP_y
4 1946-11-01 NaN NaN NaN 13.3916 NaN NaN
5 1946-12-01 NaN NaN NaN 13.4721 NaN NaN
0 1947-01-01 243.1 35.9 112.815 13.6332 42.8 NaN
6 1947-02-01 NaN NaN NaN 13.7137 42.5 NaN
1 1947-04-01 246.3 34.5 111.253 NaN NaN NaN
2 1947-07-01 250.1 34.9 113.023 NaN NaN NaN
3 1947-10-01 260.3 43.2 111.440 NaN NaN NaN
[7 rows x 7 columns]
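To avoid the GDP_x/GDP_y suffixes in the first place, one option is to drop the all-NaN GDP column from the monthly frame before merging. This is only a sketch on toy data: the frame names `quarterly` and `monthly` are made up for illustration, and it assumes the monthly GDP column really carries no information.

```python
import pandas as pd

# Toy stand-ins for the quarterly and monthly frames in the question.
quarterly = pd.DataFrame({
    "DATE": pd.to_datetime(["1947-01-01", "1947-04-01"]),
    "GDP": [243.1, 246.3],
    "GPDI": [35.9, 34.5],
})
monthly = pd.DataFrame({
    "DATE": pd.to_datetime(["1946-12-01", "1947-01-01", "1947-02-01"]),
    "INDPRO": [13.4721, 13.6332, 13.7137],
    "GDP": [float("nan")] * 3,  # assumed to be empty, as in the question
})

# Drop the empty GDP column so no _x/_y suffixes appear,
# then outer-merge on DATE and sort chronologically.
merged = (
    monthly.drop(columns="GDP")
    .merge(quarterly, on="DATE", how="outer")
    .sort_values("DATE")
    .reset_index(drop=True)
)
print(merged)
```

If the monthly GDP column did contain some values, filling one column from the other (for example with combine_first) would probably be a better fit than dropping it.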
The dataframe I am working with is as follows:
date AA1 AB2 AC3 AD4
0 1996-01-01 00:00:00 NaN NaN NaN NaN
1 1996-01-01 01:00:00 NaN 19.2 NaN NaN
2 1996-01-01 02:00:00 NaN 16.4 NaN NaN
3 1996-01-01 03:00:00 NaN 23.5 NaN NaN
4 1996-01-01 04:00:00 20.4 NaN NaN NaN
... ... ... ... ... ...
219164 2020-12-31 20:00:00 13.4 NaN 23.0 26.6
219165 2020-12-31 21:00:00 14.2 NaN 19.6 28.3
219166 2020-12-31 22:00:00 13.5 NaN 17.9 20.5
219167 2020-12-31 23:00:00 NaN NaN 16.7 20.7
219168 2021-01-01 00:00:00 NaN NaN NaN NaN
These are hourly data readings taken from different sensors from the year 1996 to 2021.
My goal is to be able to fill the NaN values with the monthly mean for each of the columns based on the date.
I have tried grouping the data and getting the monthly means for the group, though I am not sure where to go from here to transfer the grouped means to the original, larger dataframe, filling in some of the NaN values.
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
tem = df.groupby(['year', 'month']).mean().reset_index()
The resulting dataframe looks like this, with fewer rows because of the grouping:
year month AA1 AB2 AC3 AD4
0 1996 1 20.1 18.3 NaN NaN
1 1996 2 NaN NaN NaN NaN
2 1996 3 NaN NaN NaN NaN
3 1996 4 NaN NaN NaN NaN
4 1996 5 NaN NaN NaN NaN
... ... ... ... ... ... ...
296 2020 9 NaN NaN 15.7 20.2
297 2020 10 NaN NaN 15.3 19.7
298 2020 11 NaN NaN 26.7 25.9
299 2020 12 NaN NaN 24.6 25.3
300 2021 1 NaN NaN NaN NaN
Any advice on how I can implement this would be helpful. In the end, I need the original dataset indices, dates and columns, but with the NaN values filled with the means calculated from the monthly groups. The months with all NaN values can be ignored for the time being.
Assuming your date column is of type datetime64 or equivalent:
df['AA1'] = df['AA1'].fillna(df.groupby(df.date.dt.month)['AA1'].transform('mean'))
Or looping over all your columns (except the date column):
for col in df.columns.drop('date'):
    df[col] = df[col].fillna(df.groupby(df.date.dt.month)[col].transform('mean'))
If you only want the mean of the month in that specific year, add df.date.dt.year to the group by function:
for col in df.columns.drop('date'):
    df[col] = df[col].fillna(df.groupby([df.date.dt.year, df.date.dt.month])[col].transform('mean'))
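Here is a small self-contained check of the year-and-month variant on made-up readings. The values are invented, and only the column name AA1 is taken from the question. The point is that transform('mean') returns a series aligned with the original rows, so fillna can use it directly.

```python
import numpy as np
import pandas as pd

# Invented hourly readings spanning two months.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 02:00",
        "2020-02-01 00:00", "2020-02-01 01:00",
    ]),
    "AA1": [10.0, np.nan, 20.0, np.nan, 4.0],
})

# Group keys: calendar year and month of each reading.
keys = [df["date"].dt.year.rename("year"), df["date"].dt.month.rename("month")]

# transform keeps the original row index, so the result lines up with df.
monthly_mean = df.groupby(keys)["AA1"].transform("mean")
filled = df["AA1"].fillna(monthly_mean)
print(filled.tolist())
```

Months in which every reading is NaN keep their NaN values, since the mean of such a group is itself NaN.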
I have two dictionaries of data frames LP3 and ExeedenceDict. The ExeedenceDict is a dictionary of 4 dataframes with keys 'two','ten','twentyfive','onehundred'. The LP3 dictionary has keys 'LP_DevilMalad', 'LP_Bloomington', 'LP_DevilEvans', 'LP_Deep', 'LP_Maple', 'LP_CubMaple', 'LP_Cottonwood', 'LP_Mill', 'LP_CubNrPreston'
Edit: I am not sure of the most concise way to title this question, but I think the title suits what I am asking.
There is a column in each dataframe within the ExeedenceDict that has row values equal to the keys in the LP3 dictionary.
Below is a 'blank' dataframe for two in the ExeedenceDict that I created. Using the code:
ExeedenceDF = []
cols = ['Location','Size','Annual Exceedence', 'With Reg Skew','Without Reg Skew','5% Lower','95% Upper']
for i in range(5):
    i = pd.DataFrame(columns=cols)
    i['Location'] = LP_names
    i['Size'] = [39.8,24,34,29.7,21.2,53.7,61.7,27.6,31.6]
    ExeedenceDF.append(i)
ExeedenceDict = {'two':ExeedenceDF[0], 'ten':ExeedenceDF[1], 'twentyfive':ExeedenceDF[2], 'onehundred':ExeedenceDF[3]}
Location Size Annual Exceedence With Reg Skew Without Reg Skew 5% Lower 95% Upper
0 LP_DevilMalad 39.8 NaN NaN NaN NaN NaN
1 LP_Bloomington 24.0 NaN NaN NaN NaN NaN
2 LP_DevilEvans 34.0 NaN NaN NaN NaN NaN
3 LP_Deep 29.7 NaN NaN NaN NaN NaN
4 LP_Maple 21.2 NaN NaN NaN NaN NaN
5 LP_CubMaple 53.7 NaN NaN NaN NaN NaN
6 LP_Cottonwood 61.7 NaN NaN NaN NaN NaN
7 LP_Mill 27.6 NaN NaN NaN NaN NaN
8 LP_CubNrPreston 31.6 NaN NaN NaN NaN NaN
Below is the dataframe for the key LP_DevilMalad in the LP3 dictionary. This dictionary was built by reading in data from 10 excel spreadsheets. Using the code:
LP_names = ['LP_DevilMalad', 'LP_Bloomington', 'LP_DevilEvans', 'LP_Deep', 'LP_Maple', 'LP_CubMaple', 'LP_Cottonwood', 'LP_Mill', 'LP_CubNrPreston']
for i, df in enumerate(LP_Data):
    LP_Data[i] = LP_Data[i].dropna()
    LP_Data[i]['Annual Exceedence'] = 1 / LP_Data[i]['Annual Exceedence']
    LP_Data[i] = LP_Data[i].loc[LP_Data[i]['Annual Exceedence'].isin([2, 10, 25, 100])]
LP3 = {k:v for (k,v) in zip(LP_names, LP_Data)}
'LP_DevilMalad': Annual Exceedence With Reg Skew Without Reg Skew Log Variance of Est \
6 2.0 21.4 22.4 0.0091
9 10.0 46.5 44.7 0.0119
10 25.0 60.2 54.6 0.0166
12 100.0 81.4 67.4 0.0270
5% Lower 95% Upper
6 14.1 31.2
9 32.1 85.7
10 40.6 136.2
12 51.3 250.6
I am having trouble matching the keys of LP3 to the Location column of each dataframe in ExeedenceDict. The goal is a script that does all of this iteratively, ideally with some sort of dictionary comprehension.
The caveat is that the 'two' dataframe takes only the row with index label 6 from each LP3 dataframe, 'ten' takes index 9, 'twentyfive' takes index 10, and 'onehundred' takes index 12.
The goal dataframe for key 'two' in ExeedenceDict, based on the two dataframes above, would look something like this:
Note that the remaining rows would be filled with the values at index 6 from the other dataframes in the LP3 dictionary.
Location Size Annual Exceedence With Reg Skew Without Reg Skew 5% Lower 95% Upper
0 LP_DevilMalad 39.8 2 21.4 22.4 14.1 31.2
1 LP_Bloomington 24.0 NaN NaN NaN NaN NaN
2 LP_DevilEvans 34.0 NaN NaN NaN NaN NaN
3 LP_Deep 29.7 NaN NaN NaN NaN NaN
4 LP_Maple 21.2 NaN NaN NaN NaN NaN
5 LP_CubMaple 53.7 NaN NaN NaN NaN NaN
6 LP_Cottonwood 61.7 NaN NaN NaN NaN NaN
7 LP_Mill 27.6 NaN NaN NaN NaN NaN
8 LP_CubNrPreston 31.6 NaN NaN NaN NaN NaN
Can't test it without a reproducible example, but I would do something along the lines:
index_map = {
    "two": 6,
    "ten": 9,
    "twentyfive": 10,
    "onehundred": 12
}
col_of_interest = ["Annual Exceedence", "With Reg Skew", "Without Reg Skew", "5% Lower", "95% Upper"]
for index_key, df in ExeedenceDict.items():
    lp_index = index_map[index_key]
    for lp_val in df['Location'].values:
        df.loc[df['Location'] == lp_val, col_of_interest] = LP3[lp_val].loc[lp_index, col_of_interest].values
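Since the loop above was posted untested, here is a cut-down version that can be run on toy data. The location names LP_A and LP_B, the reduced column list, and all numbers except the LP_DevilMalad-style 21.4 and 46.5 are invented for illustration. The structure (one dataframe per key, rows picked by index label) mirrors the question.

```python
import numpy as np
import pandas as pd

cols = ["Annual Exceedence", "With Reg Skew"]  # shortened column list

# Hypothetical LP3 dictionary: each frame keeps the original row labels 6 and 9.
LP3 = {
    "LP_A": pd.DataFrame(
        {"Annual Exceedence": [2.0, 10.0], "With Reg Skew": [21.4, 46.5]}, index=[6, 9]
    ),
    "LP_B": pd.DataFrame(
        {"Annual Exceedence": [2.0, 10.0], "With Reg Skew": [11.0, 30.0]}, index=[6, 9]
    ),
}
index_map = {"two": 6, "ten": 9}

# One blank frame per key, with NaN placeholders in the value columns.
ExeedenceDict = {
    key: pd.DataFrame({"Location": list(LP3), "Size": [39.8, 24.0]}).assign(
        **{c: np.nan for c in cols}
    )
    for key in index_map
}

# Copy the row with the mapped index label from each LP3 frame
# into the matching Location row.
for key, df in ExeedenceDict.items():
    row_label = index_map[key]
    for name in df["Location"]:
        df.loc[df["Location"] == name, cols] = LP3[name].loc[row_label, cols].values

print(ExeedenceDict["two"])
```

Looking rows up by label with .loc only works while the LP3 frames keep their original index, which the dropna and isin filtering in the question should preserve.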
I'm working on this raw data frame that needs some cleaning. So far, I have transformed this xlsx file
into this pandas dataframe:
print(df.head(16))
date technician alkalinity colour uv ph turbidity \
0 2020-02-01 00:00:00 Catherine 24.5 33 0.15 7.24 1.53
1 Unnamed: 2 NaN NaN NaN NaN NaN 2.31
2 Unnamed: 3 NaN NaN NaN NaN NaN 2.08
3 Unnamed: 4 NaN NaN NaN NaN NaN 2.2
4 Unnamed: 5 Michel 24 35 0.152 7.22 1.59
5 Unnamed: 6 NaN NaN NaN NaN NaN 1.66
6 Unnamed: 7 NaN NaN NaN NaN NaN 1.71
7 Unnamed: 8 NaN NaN NaN NaN NaN 1.53
8 2020-02-02 00:00:00 Catherine 24 NaN 0.145 7.21 1.44
9 Unnamed: 10 NaN NaN NaN NaN NaN 1.97
10 Unnamed: 11 NaN NaN NaN NaN NaN 1.91
11 Unnamed: 12 NaN NaN 33.0 NaN NaN 2.07
12 Unnamed: 13 Michel 24 34 0.15 7.24 1.76
13 Unnamed: 14 NaN NaN NaN NaN NaN 1.84
14 Unnamed: 15 NaN NaN NaN NaN NaN 1.72
15 Unnamed: 16 NaN NaN NaN NaN NaN 1.85
temperature
0 3
1 NaN
2 NaN
3 NaN
4 3
5 NaN
6 NaN
7 NaN
8 3
9 NaN
10 NaN
11 NaN
12 3
13 NaN
14 NaN
15 NaN
From here, I want to combine the rows so that I only have one row for each date. The values for each row will be the mean in the respective columns. ie.
print(new_df.head(2))
date time alkalinity colour uv ph turbidity temperature
0 2020-02-01 00:00:00 24.25 34 0.151 7.23 1.83 3
1 2020-02-02 00:00:00 24 33.5 0.148 7.23 1.82 3
How can I accomplish this when I have Unnamed values in my date column? Thanks!
Try setting the values to NaN and then use ffill:
df.loc[df.date.str.contains('Unnamed', na=False), 'date'] = np.nan
df.date = df.date.ffill()
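Once the placeholders are forward-filled, the per-date means follow from a groupby. A minimal sketch on invented readings; it assumes every placeholder contains the text 'Unnamed' and that the first row of each block holds the real date.

```python
import numpy as np
import pandas as pd

# Invented sample: two dates, each followed by one placeholder row.
df = pd.DataFrame({
    "date": ["2020-02-01", "Unnamed: 2", "2020-02-02", "Unnamed: 4"],
    "turbidity": [1.5, 2.5, 1.0, 2.0],
    "ph": [7.2, np.nan, 7.4, np.nan],
})

# Blank out the placeholders, carry the last real date forward, parse as datetime.
is_placeholder = df["date"].astype(str).str.contains("Unnamed")
df["date"] = pd.to_datetime(df["date"].mask(is_placeholder).ffill())

# One row per date; mean() skips NaN values within each group.
daily = df.groupby("date", as_index=False).mean()
print(daily)
```

In the real data the technician column is text, so it would need to be dropped first, or mean(numeric_only=True) used instead.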
If I understand correctly, you want to drop the rows that contain 'Unnamed' in the date column, right?
Please look here:
https://stackoverflow.com/a/27360130/12790501
The solution would be something like this:
df = df.drop(df[df.date.astype(str).str.contains('Unnamed')].index)
Edit:
No, I would like to replace those Unnamed values with the date so I
could then use the groupby('date') function to return the mean values
for the columns
So in that case you can just iterate over the whole table:
last_date = ''
for i in df.index:
    # str() guards against cells that are already datetime objects
    if 'Unnamed' not in str(df.at[i, 'date']):
        last_date = df.at[i, 'date']
    else:
        df.at[i, 'date'] = last_date
If the 'date' column is of type object (i.e. string), you can loop over the numbers seen in the image, provided they follow a regular pattern:
for n in range(2, 9):
    df.loc[df['date'] == 'Unnamed: ' + str(n), 'date'] = your_value
I have the following DataFrame:
fin_data[fin_data['Ticker']=='DNMR']
high low open close volume adj_close Ticker CUMLOGRET_1 PCTRET_1 CUMPCTRET_1 OBV EMA_5 EMA_10 EMA_20 VWMA_15 BBL_20_2.0 BBM_20_2.0 BBU_20_2.0 RSI_14 PVT MACD_10_20_9 MACDh_10_20_9 MACDs_10_20_9 VOLUME_SMA_10 NAV Status Premium_over_NAV
date
2020-05-28 4.700000 4.700000 4.700000 4.700000 100.0 4.700000 DNMR NaN NaN NaN 100.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 10 Completed -0.530
2020-05-29 4.700000 4.700000 4.700000 4.700000 0.0 4.700000 DNMR 0.000000 0.000000 0.000000 100.0 NaN NaN NaN NaN NaN NaN NaN NaN 0.000000e+00 NaN NaN NaN NaN 10 Completed -0.530
2020-06-01 9.660000 9.630000 9.630000 9.660000 2000.0 9.660000 DNMR 0.720431 1.055319 1.055319 2100.0 NaN NaN NaN NaN NaN NaN NaN 100.000000 2.110638e+05 NaN NaN NaN NaN 10 Completed -0.034
2020-06-02 9.660000 9.650000 9.650000 9.660000 60020 9.660000 DNMR 0.720431 0.000000 1.055319 2100.0 NaN NaN NaN NaN NaN NaN NaN 100.000000 2.110638e+05 NaN NaN NaN NaN 10 Completed -0.034
2020-06-03 9.720000 9.630000 9.720000 9.630000 1100.0 9.630000 DNMR 0.717321 -0.003106 1.052214 1000.0 7.670000 NaN NaN NaN NaN NaN NaN 99.303423 2.107222e+05 NaN NaN NaN NaN 10 Completed -0.037
I'd like to either drop the first two rows where the close price is 4.70 or replace 4.70 by 9.66.
In order to drop the rows I tried this but it's giving me an error:
fin_data.drop(fin_data[fin_data['Ticker']=='DNMR'],axis=0,inplace=True)
KeyError: "['high' 'low' 'open' 'close' 'volume' 'adj_close' 'Ticker' 'CUMLOGRET_1'\n 'PCTRET_1' 'CUMPCTRET_1' 'OBV' 'EMA_5' 'EMA_10' 'EMA_20' 'VWMA_15'\n 'BBL_20_2.0' 'BBM_20_2.0' 'BBU_20_2.0' 'RSI_14' 'PVT' 'MACD_10_20_9'\n 'MACDh_10_20_9' 'MACDs_10_20_9' 'VOLUME_SMA_10' 'NAV' 'Status'\n 'Premium_over_NAV'] not found in axis"
Then I tried to replace the 4.70 values, but even though the code executed without an error, the DataFrame is unchanged.
fin_data.loc[fin_data['Ticker']=='DNMR','adj_close'][0:2] = 9.66
Please note that I don't want to delete the data for those two dates (2020-05-28 and 2020-5-29) for other Tickers in the database but just for this one ('DNMR')
Thanks.
You are using it wrong. To drop the rows in question (or rather, to keep everything except them), you should do
fin_data = fin_data[~((fin_data['Ticker'] == 'DNMR') & (fin_data['close'] == 4.7))]
to point out something about your drop-issue:
the drop() method expects "single label or list-like Index or column labels to drop" (DataFrame.drop) so this would work (notice the .index after the subset)
df
a b c d
0 10 8 3 5
1 5 5 3 1
2 2 2 8 6
df.drop(df[df["a"]== 10].index, axis= 0, inplace= True)
df
a b c d
1 5 5 3 1
2 2 2 8 6
BUT, if you have dates as indexes and there are multiple rows with the same dates, you would also drop those.
A solution would be to reset the index to integers.. but (even though I don't see why you would want non-unique indexes) that may not be what you want and you should stick to Jimmar's answer :)
It's quite often simpler to use a mask, both for updating values and for dropping rows:
import pandas as pd
import io
fin_data = pd.read_csv(io.StringIO("""date high low open close volume adj_close Ticker CUMLOGRET_1 PCTRET_1 CUMPCTRET_1 OBV EMA_5 EMA_10 EMA_20 VWMA_15 BBL_20_2.0 BBM_20_2.0 BBU_20_2.0 RSI_14 PVT MACD_10_20_9 MACDh_10_20_9 MACDs_10_20_9 VOLUME_SMA_10 NAV Status Premium_over_NAV
2020-05-28 4.700000 4.700000 4.700000 4.700000 100.0 4.700000 DNMR NaN NaN NaN 100.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 10 Completed -0.530
2020-05-29 4.700000 4.700000 4.700000 4.700000 0.0 4.700000 DNMR 0.000000 0.000000 0.000000 100.0 NaN NaN NaN NaN NaN NaN NaN NaN 0.000000e+00 NaN NaN NaN NaN 10 Completed -0.530
2020-06-01 9.660000 9.630000 9.630000 9.660000 2000.0 9.660000 DNMR 0.720431 1.055319 1.055319 2100.0 NaN NaN NaN NaN NaN NaN NaN 100.000000 2.110638e+05 NaN NaN NaN NaN 10 Completed -0.034
2020-06-02 9.660000 9.650000 9.650000 9.660000 60020 9.660000 DNMR 0.720431 0.000000 1.055319 2100.0 NaN NaN NaN NaN NaN NaN NaN 100.000000 2.110638e+05 NaN NaN NaN NaN 10 Completed -0.034
2020-06-03 9.720000 9.630000 9.720000 9.630000 1100.0 9.630000 DNMR 0.717321 -0.003106 1.052214 1000.0 7.670000 NaN NaN NaN NaN NaN NaN 99.303423 2.107222e+05 NaN NaN NaN NaN 10 Completed -0.037"""), sep=r"\s+")
fin_data.date=pd.to_datetime(fin_data.date)
fin_data = fin_data.set_index(["date"])
mask = fin_data["Ticker"].eq("DNMR") & fin_data["close"].eq(4.7)
fin_data.loc[mask, "close"] = 0
print(fin_data.iloc[:,0:6].to_markdown())
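| date                | high | low  | open | close | volume | adj_close |
|:--------------------|-----:|-----:|-----:|------:|-------:|----------:|
| 2020-05-28 00:00:00 | 4.7  | 4.7  | 4.7  | 0     | 100    | 4.7       |
| 2020-05-29 00:00:00 | 4.7  | 4.7  | 4.7  | 0     | 0      | 4.7       |
| 2020-06-01 00:00:00 | 9.66 | 9.63 | 9.63 | 9.66  | 2000   | 9.66      |
| 2020-06-02 00:00:00 | 9.66 | 9.65 | 9.65 | 9.66  | 60020  | 9.66      |
| 2020-06-03 00:00:00 | 9.72 | 9.63 | 9.72 | 9.63  | 1100   | 9.63      |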
fin_data = fin_data.drop(fin_data.loc[mask].index, axis=0)
print(fin_data.iloc[:,0:6].to_markdown())
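| date                | high | low  | open | close | volume | adj_close |
|:--------------------|-----:|-----:|-----:|------:|-------:|----------:|
| 2020-06-01 00:00:00 | 9.66 | 9.63 | 9.63 | 9.66  | 2000   | 9.66      |
| 2020-06-02 00:00:00 | 9.66 | 9.65 | 9.65 | 9.66  | 60020  | 9.66      |
| 2020-06-03 00:00:00 | 9.72 | 9.63 | 9.72 | 9.63  | 1100   | 9.63      |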
I'm trying to multiply columns from two different dataframes into a new df. The first dataframe (df1) contains the prices for different items, and the column header is the date. The second dataframe (df2) contains the quantity of each item.
df1
Date 1990-01-03 1990-01-04 1990-01-05 ... 2020-04-09 2020-04-14 2020-04-15
AAAAAAA 1.11 1.11 1.09 ... 102.22 103.46 103.96
BBBBBBB NaN NaN NaN ... 308.70 314.95 314.10
CCCCCCC NaN NaN NaN ... 65.34 58.72 56.18
DDDDDDD 5.52 5.51 5.53 ... 104.50 106.03 NaN
EEEEEEE NaN NaN NaN ... 1211.45 1269.23 NaN
FFFFFFF NaN NaN NaN ... 36.14 36.85 NaN
GGGGGGG 93.35 94.37 94.37 ... 1564.00 1537.50 1482.50
HHHHHHH NaN NaN NaN ... 45.69 46.68 46.24
IIIIIII NaN NaN NaN ... 75.10 74.88 74.40
JJJJJJJ 328.76 328.25 327.74 ... 6168.00 6448.00 6296.00
KKKKKKK NaN NaN NaN ... 23.49 23.50 24.04
LLLLLLL 4.45 4.41 4.34 ... 36.55 35.96 NaN
MMMMMMM 1.96 1.96 1.94 ... 141.23 146.03 NaN
NNNNNNN 1.09 1.09 1.09 ... 267.99 287.05 NaN
OOOOOOO 1.09 1.09 1.08 ... 201.53 207.17 NaN
PPPPPPP NaN NaN NaN ... 98.00 100.80 100.50
QQQQQQQ NaN NaN NaN ... 129.00 128.40 124.20
RRRRRRR NaN NaN NaN ... 140.60 141.45 139.60
[18 rows x 7658 columns]
and df2
Symbol Average Purchase Price Quantity
0 AAAAAAA 49.980 320.0
1 BBBBBBB 239.125 120.0
2 CCCCCCC 223.040 40.0
3 DDDDDDD 90.370 100.0
4 EEEEEEE 701.300 10.0
5 FFFFFFF 35.150 120.0
6 GGGGGGG 1259.000 700.0
7 HHHHHHH 32.050 250.0
8 IIIIIII 53.300 240.0
9 JJJJJJJ 6805.000 130.0
10 KKKKKKK 27.590 1000.0
11 LLLLLLL 82.120 170.0
12 MMMMMMM 106.470 150.0
13 NNNNNNN 95.970 308.0
14 OOOOOOO 81.420 150.0
15 PPPPPPP 39.690 60.0
16 QQQQQQQ 35.270 104.0
17 RRRRRRR 68.240 12.0
however when I use the function:
date = '2020-04-14'
total = df2[['Quantity']].mul(df1[date], axis=0)
print(total)
(Ideally, I'd like to do it for every date but I'm just learning so I thought I'd start out with one date)
I get:
Quantity
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
17 NaN
AAAAAAA NaN
BBBBBBB NaN
CCCCCCC NaN
DDDDDDD NaN
EEEEEEE NaN
FFFFFFF NaN
GGGGGGG NaN
HHHHHHH NaN
IIIIIII NaN
JJJJJJJ NaN
KKKKKKK NaN
LLLLLLL NaN
MMMMMMM NaN
NNNNNNN NaN
OOOOOOO NaN
PPPPPPP NaN
QQQQQQQ NaN
RRRRRRR NaN
how can I solve this?
It is a problem of indexes. The index of the product dataframe shows that the symbols are the index of the first dataframe, while the second has a sequential integer index, so no labels match and every result is NaN. Assuming that no symbol is repeated in either dataframe, you could set Symbol as the index of the second one:
date = '2020-04-14'
total = df2.set_index('Symbol')[['Quantity']].mul(df1[date], axis=0)
print(total)
it gives:
Quantity
Symbol
AAAAAAA 33107.2
BBBBBBB 37794.0
CCCCCCC 2348.8
DDDDDDD 10603.0
EEEEEEE 12692.3
FFFFFFF 4422.0
GGGGGGG 1076250.0
HHHHHHH 11670.0
IIIIIII 17971.2
JJJJJJJ 838240.0
KKKKKKK 23500.0
LLLLLLL 6113.2
MMMMMMM 21904.5
NNNNNNN 88411.4
OOOOOOO 31075.5
PPPPPPP 6048.0
QQQQQQQ 13353.6
RRRRRRR 1697.4
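The same index alignment extends to every date at once, which the question mentions as the eventual goal. A rough sketch on a two-symbol, two-date cut of the data: the prices are copied from the question, and the reversed row order in df2 is deliberate to show that matching is by label, not by position.

```python
import numpy as np
import pandas as pd

# Prices: symbols as the index, one column per date.
df1 = pd.DataFrame(
    {"2020-04-14": [103.46, 314.95], "2020-04-15": [103.96, np.nan]},
    index=["AAAAAAA", "BBBBBBB"],
)
# Quantities, listed in a different order than df1 on purpose.
df2 = pd.DataFrame({"Symbol": ["BBBBBBB", "AAAAAAA"], "Quantity": [120.0, 320.0]})

# A Series indexed by symbol; axis=0 matches it against df1's row labels.
quantity = df2.set_index("Symbol")["Quantity"]
values = df1.mul(quantity, axis=0)
print(values)
```

Dates where a price is missing stay NaN in the result.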
The problem is in the indexing: your data frames have different indices. To make your code work, unify the indices of both data frames with the pandas.DataFrame.reset_index() method. You can use the following code.
>>> df1.reset_index(inplace=True)
This changes the index of df1 to integers from 0 to 17, the same index that df2 has. Note that the rows are then paired by position, so the result is only correct if both data frames list the symbols in the same order.