Columnwise operation on multiple mapped columns using pandas - python

I have two dataframes, df1 and df2, and I want to fill in the "New_Amount_Dollar" column of df2. df1 holds historical currency data; for each row of df2 I want to use its Currency and Amount_Dollar, together with the matching date in df1, to compute New_Amount_Dollar.
For Currency in [AUD, BWP], we need to multiply Amount_Dollar by the respective currency value for the respective date.
For other currencies, we need to divide Amount_Dollar by the respective currency value for the respective date.
e.g. in df2 the first currency is AUD for Date = '01-01-2019', so I want to calculate the New_Amount_Dollar value such that
New_Amount_Dollar = Amount_Dollar * AUD value from df1, i.e. New_Amount_Dollar = 19298 * 98 = 1891204
Another example: in df2 the third currency is COP for Date = '03-01-2019', so I want to calculate the New_Amount_Dollar value such that
New_Amount_Dollar = Amount_Dollar / COP value from df1, i.e. New_Amount_Dollar = 5000 / 0.043 = 116279.07
import pandas as pd
data1 = {'Date':['01-01-2019', '02-01-2019', '03-01-2019',
'04-01-2019','05-01-2019'],
'AUD':[98, 98.5, 99, 99.5, 97],
'BWP':[30,31,33,32,31],
'CAD':[0.02,0.0192,0.0196,0.0196,0.0192],
'BND':[0.99,0.952,0.970,0.980,0.970],
'COP':[0.05,0.047,0.043,0.047,0.045]}
df1 = pd.DataFrame(data1)
data2 = {'Date':['01-01-2019', '02-01-2019', '03-01-2019', '04-01-2019','05-01-2019'],
'Currency':['AUD','AUD','COP','CAD','BND'],
'Amount_Dollar':[19298, 19210, 5000, 200, 2300],
'New_Amount_Dollar':[0,0,0,0,0]
}
df2 = pd.DataFrame(data2)
print (df2)
df1
Date AUD BWP CAD BND COP
0 01-01-2019 98.0 30 0.0200 0.990 0.050
1 02-01-2019 98.5 31 0.0192 0.952 0.047
2 03-01-2019 99.0 33 0.0196 0.970 0.043
3 04-01-2019 99.5 32 0.0196 0.980 0.047
4 05-01-2019 97.0 31 0.0192 0.970 0.045
df2
Date Currency Amount_Dollar New_Amount_Dollar
0 01-01-2019 AUD 19298 0
1 02-01-2019 AUD 19210 0
2 03-01-2019 COP 5000 0
3 04-01-2019 CAD 200 0
4 05-01-2019 BND 2300 0
Expected result
Date Currency Amount_Dollar New_Amount_Dollar
0 01-01-2019 AUD 19298 1891204
1 02-01-2019 AUD 19210 1892185.0
2 03-01-2019 COP 5000 116279.07
3 04-01-2019 CAD 200 10204.08
4 05-01-2019 BND 2300 2371.13

You want lookup and isin():
import numpy as np

# mask: True where we multiply, False where we divide
s = df2['Currency'].isin(['AUD', 'BWP'])
# the rate for each (Date, Currency) pair
m = df1.set_index('Date').lookup(df2['Date'], df2['Currency'])
df2['New_Amount_Dollar'] = df2['Amount_Dollar'] * np.where(s, m, 1/m)
Output:
Date Currency Amount_Dollar New_Amount_Dollar
0 01-01-2019 AUD 19298 1891204.00
1 02-01-2019 AUD 19210 1892185.00
2 03-01-2019 COP 5000 116279.07
3 04-01-2019 CAD 200 10204.08
4 05-01-2019 BND 2300 2371.13
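Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0. A minimal sketch of an equivalent lookup for newer pandas, using Index.get_indexer to turn each (Date, Currency) pair into positional indices (variable names wide/rows/cols are my own):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Date': ['01-01-2019', '02-01-2019', '03-01-2019', '04-01-2019', '05-01-2019'],
                    'AUD': [98, 98.5, 99, 99.5, 97],
                    'BWP': [30, 31, 33, 32, 31],
                    'CAD': [0.02, 0.0192, 0.0196, 0.0196, 0.0192],
                    'BND': [0.99, 0.952, 0.970, 0.980, 0.970],
                    'COP': [0.05, 0.047, 0.043, 0.047, 0.045]})
df2 = pd.DataFrame({'Date': ['01-01-2019', '02-01-2019', '03-01-2019', '04-01-2019', '05-01-2019'],
                    'Currency': ['AUD', 'AUD', 'COP', 'CAD', 'BND'],
                    'Amount_Dollar': [19298, 19210, 5000, 200, 2300]})

wide = df1.set_index('Date')
# positional row/column indices for each (Date, Currency) pair
rows = wide.index.get_indexer(df2['Date'])
cols = wide.columns.get_indexer(df2['Currency'])
m = wide.to_numpy()[rows, cols]

s = df2['Currency'].isin(['AUD', 'BWP'])
df2['New_Amount_Dollar'] = df2['Amount_Dollar'] * np.where(s, m, 1 / m)
```

This produces the same result as the lookup-based version above.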

Try using melt and merge:
import numpy as np

df_out = df2.merge(df1.melt('Date', var_name='Currency'), on=['Date', 'Currency'])
df_out['New_Amount_Dollar'] = (df_out['Amount_Dollar'] *
                               np.where(df_out['Currency'].isin(['AUD', 'BWP']),
                                        df_out['value'],
                                        1/df_out['value']))
print(df_out)
Output:
Date Currency Amount_Dollar New_Amount_Dollar value
0 01-01-2019 AUD 19298 1891204.000 98.000
1 02-01-2019 AUD 19210 1892185.000 98.500
2 03-01-2019 COP 5000 116279.070 0.043
3 04-01-2019 CAD 200 10204.082 0.020
4 05-01-2019 BND 2300 2371.134 0.970

Related

Modify duplicate rows with datetime

I have a dataframe with id, purchase date, price of purchase and duration in days,
df
id purchased_date price duration
1 2020-01-01 16.50 2
2 2020-01-01 24.00 4
What I'm trying to do: wherever the duration is greater than 1 day, I want the extra days split into duplicated rows, the price divided by the number of individual days, and the date to increase by 1 day for each day purchased. Effectively giving me this,
df_new
id purchased_date price duration
1 2020-01-01 8.25 1
1 2020-01-02 8.25 1
2 2020-01-01 6.00 1
2 2020-01-02 6.00 1
2 2020-01-03 6.00 1
2 2020-01-04 6.00 1
So far I've managed to duplicate the rows based on the duration using:
df['price'] = df['price']/df['duration']
df = df.loc[df.index.repeat(df.duration)]
and then I've tried using,
df.groupby(['id', 'purchased_date']).purchased_date.apply(lambda n: n + pd.to_timedelta(1, unit='d'))
however, this just gets stuck in an endless loop and I'm a bit stuck.
My plan is to put this all in a function but for now I just want to get the process working.
Thank you for any help.
Use GroupBy.cumcount for a counter, pass it to to_timedelta to get day timedeltas, and add them to the purchased_date column:
df['price'] = df['price']/df['duration']
df = df.loc[df.index.repeat(df.duration)].assign(duration=1)
df['purchased_date'] += pd.to_timedelta(df.groupby(level=0).cumcount(), unit='d')
df = df.reset_index(drop=True)
print (df)
id purchased_date price duration
0 1 2020-01-01 8.25 1
1 1 2020-01-02 8.25 1
2 2 2020-01-01 6.00 1
3 2 2020-01-02 6.00 1
4 2 2020-01-03 6.00 1
5 2 2020-01-04 6.00 1
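Since the question mentions wanting to put this all in a function, here is a minimal sketch of the cumcount approach above packaged up (the function name split_by_day is my own):

```python
import pandas as pd

def split_by_day(df):
    # one row per day: divide the price, repeat each row by its
    # duration, then bump each copy's date by its position in the group
    out = df.copy()
    out['price'] = out['price'] / out['duration']
    out = out.loc[out.index.repeat(out['duration'])].assign(duration=1)
    out['purchased_date'] = out['purchased_date'] + pd.to_timedelta(
        out.groupby(level=0).cumcount(), unit='d')
    return out.reset_index(drop=True)

df = pd.DataFrame({'id': [1, 2],
                   'purchased_date': pd.to_datetime(['2020-01-01', '2020-01-01']),
                   'price': [16.50, 24.00],
                   'duration': [2, 4]})
result = split_by_day(df)
```

The groupby(level=0) relies on the repeated rows sharing their original index label, so it must run before reset_index.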
An approach with pandas.date_range and explode:
(df.assign(price=df['price'].div(df['duration']),
           purchased_date=df.apply(lambda x: pd.date_range(x['purchased_date'],
                                                           periods=x['duration']),
                                   axis=1),
           duration=1)
 .explode('purchased_date', ignore_index=True)
)
output:
id purchased_date price duration
0 1 2020-01-01 8.25 1
1 1 2020-01-02 8.25 1
2 2 2020-01-01 6.00 1
3 2 2020-01-02 6.00 1
4 2 2020-01-03 6.00 1
5 2 2020-01-04 6.00 1
Here is an easy-to-understand approach:
1. Assign the averaged 'price' value
2. Create a temporary 'end_date' column
3. Modify 'purchased_date' to hold a list of datetimes
4. Explode 'purchased_date' to form new rows
5. Assign 1 to the duration column
6. Delete the temporary 'end_date' column
Code:
df['price'] = df['price']/df['duration']
df['end_date'] = df.purchased_date + pd.to_timedelta(df.duration.sub(1), unit='d')
df['purchased_date'] = df.apply(lambda x: pd.date_range(start=x['purchased_date'], end=x['end_date']), axis=1)
df = df.explode('purchased_date').reset_index(drop=True)
df = df.assign(duration=1)
del df['end_date']
print (df)
id purchased_date price duration
0 1 2020-01-01 8.25 1
1 1 2020-01-02 8.25 1
2 2 2020-01-01 6.00 1
3 2 2020-01-02 6.00 1
4 2 2020-01-03 6.00 1
5 2 2020-01-04 6.00 1

Select Pandas dataframe rows between two dates

I am working on two tables as follows:
A first table df1 giving a rate and a validity period:
rates = {'rate': [ 0.974, 0.966, 0.996, 0.998, 0.994, 1.006, 1.042, 1.072, 0.954],
'Valid from': ['31/12/2018','15/01/2019','01/02/2019','01/03/2019','01/04/2019','15/04/2019','01/05/2019','01/06/2019','30/06/2019'],
'Valid to': ['14/01/2019','31/01/2019','28/02/2019','31/03/2019','14/04/2019','30/04/2019','31/05/2019','29/06/2019','31/07/2019']}
df1 = pd.DataFrame(rates)
df1['Valid to'] = pd.to_datetime(df1['Valid to'])
df1['Valid from'] = pd.to_datetime(df1['Valid from'])
rate Valid from Valid to
0 0.974 2018-12-31 2019-01-14
1 0.966 2019-01-15 2019-01-31
2 0.996 2019-01-02 2019-02-28
3 0.998 2019-01-03 2019-03-31
4 0.994 2019-01-04 2019-04-14
5 1.006 2019-04-15 2019-04-30
6 1.042 2019-01-05 2019-05-31
7 1.072 2019-01-06 2019-06-29
8 0.954 2019-06-30 2019-07-31
A second table df2 listing recorded amounts and corresponding dates
data = {'date': ['03/01/2019','23/01/2019','27/02/2019','14/03/2019','05/04/2019','30/04/2019','14/06/2019'],
'amount': [200,305,155,67,95,174,236,]}
df2 = pd.DataFrame(data)
df2['date'] = pd.to_datetime(df2['date'])
date amount
0 2019-03-01 200
1 2019-01-23 305
2 2019-02-27 155
3 2019-03-14 67
4 2019-05-04 95
5 2019-04-30 174
6 2019-06-14 236
The objective is to retrieve from df1 the rate applicable to each row of df2, using iteration and based on the date in df2.
Example: the date on the first row in df2 is 2019-01-03, therefore the applicable rate would be 0.974
The explanations given here (https://www.interviewqs.com/ddi_code_snippets/select_pandas_dataframe_rows_between_two_dates) gives me an idea on how to retrieve the rows on df2 between two dates in df1.
But I didn't manage to retrieve from df1 the applicable rate to each row on df2 using iteration.
If your dataframes are not very big, you can simply do the join on a dummy key and then filter to narrow it down to what you need. See the example below (note that I had to update your example a little bit to get correct date formatting).
import pandas as pd
rates = {'rate': [ 0.974, 0.966, 0.996, 0.998, 0.994, 1.006, 1.042, 1.072, 0.954],
'valid_from': ['31/12/2018','15/01/2019','01/02/2019','01/03/2019','01/04/2019','15/04/2019','01/05/2019','01/06/2019','30/06/2019'],
'valid_to': ['14/01/2019','31/01/2019','28/02/2019','31/03/2019','14/04/2019','30/04/2019','31/05/2019','29/06/2019','31/07/2019']}
df1 = pd.DataFrame(rates)
df1['valid_to'] = pd.to_datetime(df1['valid_to'],format ='%d/%m/%Y')
df1['valid_from'] = pd.to_datetime(df1['valid_from'],format='%d/%m/%Y')
Then your df1 would be
rate valid_from valid_to
0 0.974 2018-12-31 2019-01-14
1 0.966 2019-01-15 2019-01-31
2 0.996 2019-02-01 2019-02-28
3 0.998 2019-03-01 2019-03-31
4 0.994 2019-04-01 2019-04-14
5 1.006 2019-04-15 2019-04-30
6 1.042 2019-05-01 2019-05-31
7 1.072 2019-06-01 2019-06-29
8 0.954 2019-06-30 2019-07-31
This is your second data frame df2
data = {'date': ['03/01/2019','23/01/2019','27/02/2019','14/03/2019','05/04/2019','30/04/2019','14/06/2019'],
'amount': [200,305,155,67,95,174,236,]}
df2 = pd.DataFrame(data)
df2['date'] = pd.to_datetime(df2['date'],format ='%d/%m/%Y')
Then your df2 would look like the following
date amount
0 2019-01-03 200
1 2019-01-23 305
2 2019-02-27 155
3 2019-03-14 67
4 2019-04-05 95
5 2019-04-30 174
6 2019-06-14 236
Your solution:
df1['key'] = 1
df2['key'] = 1
df_output = pd.merge(df1, df2, on='key').drop('key', axis=1)
# keep rows whose date falls inside the validity period (inclusive on both ends)
df_output = df_output[(df_output['date'] >= df_output['valid_from']) & (df_output['date'] <= df_output['valid_to'])]
This is how the result df_output would look:
rate valid_from valid_to date amount
0 0.974 2018-12-31 2019-01-14 2019-01-03 200
8 0.966 2019-01-15 2019-01-31 2019-01-23 305
16 0.996 2019-02-01 2019-02-28 2019-02-27 155
24 0.998 2019-03-01 2019-03-31 2019-03-14 67
32 0.994 2019-04-01 2019-04-14 2019-04-05 95
40 1.006 2019-04-15 2019-04-30 2019-04-30 174
55 1.072 2019-06-01 2019-06-29 2019-06-14 236
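As a side note: because the validity periods here are contiguous and non-overlapping, pd.merge_asof can do the same interval lookup without the quadratic cross join. A sketch on the same data (both frames must be sorted on their join keys; each date picks up the last rate whose valid_from is on or before it):

```python
import pandas as pd

df1 = pd.DataFrame({'rate': [0.974, 0.966, 0.996, 0.998, 0.994, 1.006, 1.042, 1.072, 0.954],
                    'valid_from': ['31/12/2018', '15/01/2019', '01/02/2019', '01/03/2019', '01/04/2019',
                                   '15/04/2019', '01/05/2019', '01/06/2019', '30/06/2019'],
                    'valid_to': ['14/01/2019', '31/01/2019', '28/02/2019', '31/03/2019', '14/04/2019',
                                 '30/04/2019', '31/05/2019', '29/06/2019', '31/07/2019']})
df1['valid_from'] = pd.to_datetime(df1['valid_from'], format='%d/%m/%Y')
df1['valid_to'] = pd.to_datetime(df1['valid_to'], format='%d/%m/%Y')

df2 = pd.DataFrame({'date': ['03/01/2019', '23/01/2019', '27/02/2019', '14/03/2019',
                             '05/04/2019', '30/04/2019', '14/06/2019'],
                    'amount': [200, 305, 155, 67, 95, 174, 236]})
df2['date'] = pd.to_datetime(df2['date'], format='%d/%m/%Y')

# backward asof-merge: match each date to the last valid_from <= date
out = pd.merge_asof(df2.sort_values('date'), df1.sort_values('valid_from'),
                    left_on='date', right_on='valid_from')
```

If the periods could have gaps, you would still need to filter on valid_to afterwards; with contiguous periods as here, the asof match alone is enough.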

Column-wise Mapping and operations on dataframe using pandas

I have two dataframes, df1 and df2, and I want to fill in the New_Amount_Dollar column of df2. df1 holds historical currency data; for each row of df2 I want to use its Currency and Amount_Dollar, together with the matching date in df1, to compute New_Amount_Dollar.
e.g. in df2 the first currency is AUD for Date = '01-01-2019', so I want to calculate the New_Amount_Dollar value such that
New_Amount_Dollar = Amount_Dollar / AUD value from df1,
i.e. New_Amount_Dollar = 19298 / 98 = 196.92
Another example: in df2 the third currency is COP for Date = '03-01-2019', so I want to calculate the New_Amount_Dollar value such that
New_Amount_Dollar = Amount_Dollar / COP value from df1,
i.e. New_Amount_Dollar = 5000 / 0.043 = 116279.07
import pandas as pd
data1 = {'Date':['01-01-2019', '02-01-2019', '03-01-2019', '04-01-2019','05-01-2019'],
'AUD':[98, 98.5, 99, 99.5, 97],
'BWP':[30,31,33,32,31],
'CAD':[0.02,0.0192,0.0196,0.0196,0.0192],
'BND':[0.99,0.952,0.970,0.980,0.970],
'COP':[0.05,0.047,0.043,0.047,0.045]}
df1 = pd.DataFrame(data1)
data2 = {'Date':['01-01-2019', '02-01-2019', '03-01-2019', '04-01-2019','05-01-2019'],
'Currency':['AUD','AUD','COP','CAD','BND'],
'Amount_Dollar':[19298, 19210, 5000, 200, 2300],
'New_Amount_Dollar':[0,0,0,0,0]
}
df2 = pd.DataFrame(data2)
df1
Date AUD BWP CAD BND COP
0 01-01-2019 98.0 30 0.0200 0.990 0.050
1 02-01-2019 98.5 31 0.0192 0.952 0.047
2 03-01-2019 99.0 33 0.0196 0.970 0.043
3 04-01-2019 99.5 32 0.0196 0.980 0.047
4 05-01-2019 97.0 31 0.0192 0.970 0.045
df2
Date Currency Amount_Dollar New_Amount_Dollar
0 01-01-2019 AUD 19298 0
1 02-01-2019 AUD 19210 0
2 03-01-2019 COP 5000 0
3 04-01-2019 CAD 200 0
4 05-01-2019 BND 2300 0
Expected Result
Date Currency Amount_Dollar New_Amount_Dollar
0 01-01-2019 AUD 19298 196.92
1 02-01-2019 AUD 19210 195.03
2 03-01-2019 COP 5000 116279.07
3 04-01-2019 CAD 200 10204.08
4 05-01-2019 BND 2300 2371.13
Use DataFrame.lookup on df1 indexed by Date (via DataFrame.set_index) to get an array of rates, then divide the Amount_Dollar column by it:
arr = df1.set_index('Date').lookup(df2['Date'], df2['Currency'])
df2['New_Amount_Dollar'] = df2['Amount_Dollar'] / arr
print (df2)
Date Currency Amount_Dollar New_Amount_Dollar
0 01-01-2019 AUD 19298 196.918367
1 02-01-2019 AUD 19210 195.025381
2 03-01-2019 COP 5000 116279.069767
3 04-01-2019 CAD 200 10204.081633
4 05-01-2019 BND 2300 2371.134021
But if the dates don't match, forward-fill the missing days with DataFrame.asfreq:
import pandas as pd
data1 = {'Date':['01-01-2019', '02-01-2019', '03-01-2019',
'04-01-2019','05-01-2019','08-01-2019'],
'AUD':[98, 98.5, 99, 99.5, 97,100],
'BWP':[30,31,33,32,31,20],
'CAD':[0.02,0.0192,0.0196,0.0196,0.0192,0.2],
'BND':[0.99,0.952,0.970,0.980,0.970,.23],
'COP':[0.05,0.047,0.043,0.047,0.045,0.023]}
df1 = pd.DataFrame(data1)
data2 = {'Date':['01-01-2019', '02-01-2019', '03-01-2019', '04-01-2019','07-01-2019'],
'Currency':['AUD','AUD','COP','CAD','BND'],
'Amount_Dollar':[19298, 19210, 5000, 200, 2300],
'New_Amount_Dollar':[0,0,0,0,0]
}
df2 = pd.DataFrame(data2)
print (df1)
Date AUD BWP CAD BND COP
0 01-01-2019 98.0 30 0.0200 0.990 0.050
1 02-01-2019 98.5 31 0.0192 0.952 0.047
2 03-01-2019 99.0 33 0.0196 0.970 0.043
3 04-01-2019 99.5 32 0.0196 0.980 0.047
4 05-01-2019 97.0 31 0.0192 0.970 0.045
5 08-01-2019 100.0 20 0.2000 0.230 0.023
print (df2)
Date Currency Amount_Dollar New_Amount_Dollar
0 01-01-2019 AUD 19298 0
1 02-01-2019 AUD 19210 0
2 03-01-2019 COP 5000 0
3 04-01-2019 CAD 200 0
4 07-01-2019 BND 2300 0
df1['Date'] = pd.to_datetime(df1['Date'], dayfirst=True)
df2['Date'] = pd.to_datetime(df2['Date'], dayfirst=True)
print (df1.set_index('Date').asfreq('D', method='ffill'))
AUD BWP CAD BND COP
Date
2019-01-01 98.0 30 0.0200 0.990 0.050
2019-01-02 98.5 31 0.0192 0.952 0.047
2019-01-03 99.0 33 0.0196 0.970 0.043
2019-01-04 99.5 32 0.0196 0.980 0.047
2019-01-05 97.0 31 0.0192 0.970 0.045
2019-01-06 97.0 31 0.0192 0.970 0.045
2019-01-07 97.0 31 0.0192 0.970 0.045
2019-01-08 100.0 20 0.2000 0.230 0.023
arr = df1.set_index('Date').asfreq('D', method='ffill').lookup(df2['Date'], df2['Currency'])
df2['New_Amount_Dollar'] = df2['Amount_Dollar'] / arr
print (df2)
Date Currency Amount_Dollar New_Amount_Dollar
0 2019-01-01 AUD 19298 196.918367
1 2019-01-02 AUD 19210 195.025381
2 2019-01-03 COP 5000 116279.069767
3 2019-01-04 CAD 200 10204.081633
4 2019-01-07 BND 2300 2371.134021

Convert pandas column with single list of values into rows

I have the following dataframe:
symbol PSAR
0 AAPL [nan,100,200]
1 PYPL [nan,300,400]
2 SPY [nan,500,600]
I am trying to turn the PSAR list values into rows like the following:
symbol PSAR
AAPL nan
AAPL 100
AAPL 200
PYPL nan
PYPL 300
... ...
SPY 600
I have been trying to solve it by following the answers in this post (one key difference being that that post has a list of lists) but can't get there:
How to convert column with list of values into rows in Pandas DataFrame.
(df['PSAR'].stack().reset_index(level=1, drop=True).to_frame('PSAR')
 .join(df[['symbol']], how='left'))
Not a slick one, but this does the job:
list_of_lists = []
df_as_dict = dict(df.values)  # {symbol: PSAR list}
for key, values in df_as_dict.items():
    list_of_lists += [[key, value] for value in values]
pd.DataFrame(list_of_lists)
returns:
0 1
0 AAPL NaN
1 AAPL 100.0
2 AAPL 200.0
3 PYPL NaN
4 PYPL 300.0
5 PYPL 400.0
6 SPY NaN
7 SPY 500.0
8 SPY 600.0
Pandas >= 0.25:
df1 = pd.DataFrame({'symbol':['AAPL', 'PYPL', 'SPY'],
'PSAR':[[None,100,200], [None,300,400], [None,500,600]]})
print(df1)
symbol PSAR
0 AAPL [None, 100, 200]
1 PYPL [None, 300, 400]
2 SPY [None, 500, 600]
df1.explode('PSAR')
symbol PSAR
0 AAPL None
0 AAPL 100
0 AAPL 200
1 PYPL None
1 PYPL 300
1 PYPL 400
2 SPY None
2 SPY 500
2 SPY 600
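From pandas 1.1 on, explode can also reset the index in the same call via ignore_index=True, which avoids a separate reset_index. A small sketch on the same data:

```python
import pandas as pd

df1 = pd.DataFrame({'symbol': ['AAPL', 'PYPL', 'SPY'],
                    'PSAR': [[None, 100, 200], [None, 300, 400], [None, 500, 600]]})
# each list element becomes its own row; the index is renumbered 0..n-1
out = df1.explode('PSAR', ignore_index=True)
```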

pandas calculates column value means on groups and means across whole dataframe

I have a df where df['period'] = (df['date1'] - df['date2']) / np.timedelta64(1, 'D'):
code y_m date1 date2 period
1000 201701 2017-12-10 2017-12-09 1
1000 201701 2017-12-14 2017-12-12 2
1000 201702 2017-12-15 2017-12-13 2
1000 201702 2017-12-17 2017-12-15 2
2000 201701 2017-12-19 2017-12-18 1
2000 201701 2017-12-12 2017-12-10 2
2000 201702 2017-12-11 2017-12-10 1
2000 201702 2017-12-13 2017-12-12 1
2000 201702 2017-12-11 2017-12-10 1
then groupby code and y_m to calculate the average of date1-date2,
df_avg_period = df.groupby(['code', 'y_m'])['period'].mean().reset_index(name='avg_period')
code y_m avg_period
1000 201701 1.5
1000 201702 2
2000 201701 1.5
2000 201702 1
but I like to convert df_avg_period into a matrix that transposes column code to rows and y_m to columns, like
0 1 2 3
0 -1 0 201701 201702
1 0 1.44 1.44 1.4
2 1000 1.75 1.5 2
3 2000 1.20 1.5 1
-1 represents a dummy value, indicating either that a value doesn't exist for a specific code/y_m cell or simply serving to maintain the matrix shape. 0 represents 'all' values, i.e. the average over the code, the y_m, or both: e.g. cell (1,1) averages the period values over all rows in df, and (1,2) averages the period for 201701 across all rows that have this value for y_m in df.
Apparently pivot_table cannot give correct results using mean, so I am wondering how to achieve that correctly?
pivot_table with margins=True
piv = df.pivot_table(
index='code', columns='y_m', values='period', aggfunc='mean', margins=True
)
# housekeeping
(piv.reset_index()
    .rename_axis(None, axis=1)
    .rename({'code': -1, 'All': 0}, axis=1)
    .sort_index(axis=1)
)
-1 0 201701 201702
0 1000 1.750000 1.5 2.0
1 2000 1.200000 1.5 1.0
2 All 1.444444 1.5 1.4
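To check the margins against the sample data, here is a minimal reproduction (the period values are taken from the question's table; margins=True computes the aggfunc over the underlying rows, not over the already-aggregated means):

```python
import pandas as pd

df = pd.DataFrame({'code': [1000] * 4 + [2000] * 5,
                   'y_m': [201701, 201701, 201702, 201702,
                           201701, 201701, 201702, 201702, 201702],
                   'period': [1, 2, 2, 2, 1, 2, 1, 1, 1]})
# 'All' row and 'All' column hold the grand/marginal means
piv = df.pivot_table(index='code', columns='y_m', values='period',
                     aggfunc='mean', margins=True)
```

The grand mean is 13/9 ≈ 1.4444, matching the 'All'/'All' cell in the output above.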
