Matching and adding a column to a data frame - python

I am going crazy over this one. I am trying to add a new column to a data frame DF1, based on values found in another data frame DF2. This is how they look:
DF1=
Date Amount Currency
0 2014-08-20 -20000000 EUR
1 2014-08-20 -12000000 CAD
2 2014-08-21 10000 EUR
3 2014-08-21 20000 USD
4 2014-08-22 25000 USD
DF2=
NAME OPEN
0 EUR 10
1 CAD 20
2 USD 30
Now, I would like to create a new column in DF1, named 'Amount (Local)', where each amount in 'Amount' is multiplied by the matching value found in DF2, yielding this result:
DF1=
Date Amount Currency Amount (Local)
0 2014-08-20 -20000000 EUR -200000000
1 2014-08-20 -12000000 CAD -240000000
2 2014-08-21 10000 EUR 100000
3 2014-08-21 20000 USD 600000
4 2014-08-22 25000 USD 750000
If there exists a method for adding a column to DF1 based on an arbitrary function, instead of just the multiplication in the problem above, that would also be very much appreciated.
Thanks,

You can use map with a dict built from your second df (in my case it is called df1; yours is DF2), and then multiply the result by the amount:
In [65]:
df['Amount (Local)'] = df['Currency'].map(dict(df1[['NAME','OPEN']].values)) * df['Amount']
df
Out[65]:
Date Amount Currency Amount (Local)
index
0 2014-08-20 -20000000 EUR -200000000
1 2014-08-20 -12000000 CAD -240000000
2 2014-08-21 10000 EUR 100000
3 2014-08-21 20000 USD 600000
4 2014-08-22 25000 USD 750000
Breaking this down: map looks each value up as a dict key. Here we match Currency against the NAME keys, and the dict values are the OPEN rates. The result of this step alone is:
In [66]:
df['Currency'].map(dict(df1[['NAME','OPEN']].values))
Out[66]:
index
0 10
1 20
2 10
3 30
4 30
Name: Currency, dtype: int64
We then simply multiply this series by the Amount column of df (DF1 in your case) to get the desired result.
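Putting it together as a runnable sketch (column names as in the question; the row-wise apply at the end shows the general-function variant the asker requested):

```python
import pandas as pd

# Sample frames from the question (DF1 -> df, DF2 -> df1, as in this answer)
df = pd.DataFrame({
    'Date': ['2014-08-20', '2014-08-20', '2014-08-21', '2014-08-21', '2014-08-22'],
    'Amount': [-20000000, -12000000, 10000, 20000, 25000],
    'Currency': ['EUR', 'CAD', 'EUR', 'USD', 'USD'],
})
df1 = pd.DataFrame({'NAME': ['EUR', 'CAD', 'USD'], 'OPEN': [10, 20, 30]})

# Build a {NAME: OPEN} lookup and map it over the Currency column
rates = dict(df1[['NAME', 'OPEN']].values)
df['Amount (Local)'] = df['Currency'].map(rates) * df['Amount']

# For an arbitrary function instead of plain multiplication, apply row-wise
df['Amount (fn)'] = df.apply(lambda row: row['Amount'] * rates[row['Currency']], axis=1)
```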

Use fancy-indexing to create a currency array aligned with your data in df1, then use it in multiplication, and assign the result to a new column in df1:
import pandas as pd
ccy_series = pd.Series([10,20,30], index=['EUR', 'CAD', 'USD'])
df1 = pd.DataFrame({'amount': [-200, -120, 1, 2, 2.5], 'ccy': ['EUR', 'CAD', 'EUR', 'USD', 'USD']})
aligned_ccy = ccy_series[df1.ccy].reset_index(drop=True)
aligned_ccy
=>
0 10
1 20
2 10
3 30
4 30
dtype: int64
df1['amount_local'] = df1.amount * aligned_ccy
df1
=>
amount ccy amount_local
0 -200.0 EUR -2000
1 -120.0 CAD -2400
2 1.0 EUR 10
3 2.0 USD 60
4 2.5 USD 75

Related

Stacking column indices on top of one another using Pandas

I'm looking to stack the indices of some columns on top of one another, this is what I currently have:
Buy Buy Currency Sell Sell Currency
Date
2013-12-31 100 CAD 100 USD
2014-01-02 200 USD 200 CAD
2014-01-03 300 CAD 300 USD
2014-01-06 400 USD 400 CAD
This is what I'm looking to achieve:
Buy/Sell Buy/Sell Currency
100 USD
100 CAD
200 CAD
200 USD
300 USD
300 CAD
And so on. Basically I want to take the values in "Buy" and "Buy Currency" and stack them under the values in the "Sell" and "Sell Currency" columns, one after the other.
I should mention that my data frame has 10 columns in total, so using
df_pl.stack(level=0)
doesn't seem to work.
One option is with pivot_longer from pyjanitor, where for this particular use case, you pass a list of regular expressions (to names_pattern) to aggregate the desired column labels into new groups (in names_to):
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(
    index=None,
    names_to=['Buy/Sell', 'Buy/Sell Currency'],
    names_pattern=[r"Buy$|Sell$", ".+Currency$"],
    ignore_index=False,
    sort_by_appearance=True,
)
Buy/Sell Buy/Sell Currency
Date
2013-12-31 100 CAD
2013-12-31 100 USD
2014-01-02 200 USD
2014-01-02 200 CAD
2014-01-03 300 CAD
2014-01-03 300 USD
2014-01-06 400 USD
2014-01-06 400 CAD
Using concat:
import pandas as pd
print(pd.concat(
    [df['Buy'], df['Sell']], axis=1
).stack().reset_index(1, drop=True).rename(index='buy/sell')
)
output:
0 100
0 100
1 200
1 200
2 300
2 300
3 400
3 400
# if 'Date' is still a regular column, make it the index first
df.set_index('Date', inplace=True)

# create a mapping to new column names
d = {'Buy Currency': 'Buy/Sell Currency',
     'Sell Currency': 'Buy/Sell Currency',
     'Buy': 'Buy/Sell',
     'Sell': 'Buy/Sell'}
df.columns = df.columns.map(d)

# stack the first two columns on top of the next two columns
out = pd.concat([df.iloc[:, :2],
                 df.iloc[:, 2:]],
                ignore_index=True)
out
Buy/Sell Buy/Sell Currency
0 100 CAD
1 200 USD
2 300 CAD
3 400 USD
4 100 USD
5 200 CAD
6 300 USD
7 400 CAD
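For reference, this approach runs end to end on the sample data from the question (dates assumed to be the index, as in the original frame):

```python
import pandas as pd

df = pd.DataFrame({
    'Buy': [100, 200, 300, 400],
    'Buy Currency': ['CAD', 'USD', 'CAD', 'USD'],
    'Sell': [100, 200, 300, 400],
    'Sell Currency': ['USD', 'CAD', 'USD', 'CAD'],
}, index=pd.to_datetime(['2013-12-31', '2014-01-02', '2014-01-03', '2014-01-06']))

# Map both column pairs onto one shared pair of names
d = {'Buy': 'Buy/Sell', 'Sell': 'Buy/Sell',
     'Buy Currency': 'Buy/Sell Currency', 'Sell Currency': 'Buy/Sell Currency'}
df.columns = df.columns.map(d)

# Stack the first pair of columns on top of the second pair
out = pd.concat([df.iloc[:, :2], df.iloc[:, 2:]], ignore_index=True)
```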

Multiplying columns in separate pandas dataframes

I'm trying to multiply data from 2 different dataframes, and my code is as below:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'v_contract_number': ['VN120001438', 'VN120001439',
                                          'VN120001440', 'VN120001438',
                                          'VN120001439', 'VN120001440'],
                    'Currency': ['VND', 'USD', 'KRW', 'USD', 'KRW', 'USD'],
                    'Amount': [10000, 5000, 6000, 200, 150, 175]})
df2 = pd.DataFrame({'Currency': ['VND','USD','KRW'],'Rate': [1,23000,1200]})
print(df1)
# df1
v_contract_number Currency Amount
0 VN120001438 VND 10000
1 VN120001439 USD 5000
2 VN120001440 KRW 6000
3 VN120001438 USD 200
4 VN120001439 KRW 150
5 VN120001440 USD 175
print(df2)
Currency Rate
0 VND 1
1 USD 23000
2 KRW 1200
df1 = df1.merge(df2)
df1['VND Amount'] = df1['Amount'].mul(df1['Rate'])
df1.drop('Rate', axis=1, inplace=True)
print(df1)
# result
v_contract_number Currency Amount VND Amount
0 VN120001438 VND 10000 10000
1 VN120001439 USD 5000 115000000
2 VN120001438 USD 200 4600000
3 VN120001440 USD 175 4025000
4 VN120001440 KRW 6000 7200000
5 VN120001439 KRW 150 180000
This is exactly what I want, but I would like to know whether there is another way that avoids the merge and drop I did.
The reason I drop 'Rate' is that I don't want it to appear in my report.
Thanks and best regards
You can use pandas' map for this:
df2 = df2.set_index('Currency').squeeze() # squeeze converts to a Series
df1.assign(VND_Amount = df1.Amount.mul(df1.Currency.map(df2)))
v_contract_number Currency Amount VND_Amount
0 VN120001438 VND 10000 10000
1 VN120001439 USD 5000 115000000
2 VN120001440 KRW 6000 7200000
3 VN120001438 USD 200 4600000
4 VN120001439 KRW 150 180000
5 VN120001440 USD 175 4025000
You can avoid the drop by not overwriting df1 on the merge operation:
df1["VND Amount"] = df1.merge(df2, on="Currency").eval("Amount * Rate")
Alternatively you can use .reindex to align df2 to df1 based on the currency column:
df1["VND Amount"] = (
    df1["Amount"] *
    (df2.set_index("Currency")["Rate"]  # set the index and return the Rate column
        .reindex(df1["Currency"])       # align "Rate" values to df1 "Currency"
        .to_numpy()                     # get a numpy array to avoid pandas
    )                                   # auto-alignment on math ops
)

In Python, how can I map column value in Table 1 to another Table 2 and append back to Table 1?

I have a table with a list of currencies as a column in df1['Ccy']:
Ccy
0 GBP
1 GBP
2 USD
3 EUR
4 GBP
5 USD
I have a second table that has values, df2:
Ccy FX Rate
0 USD 0.750244
1 JPY 0.007196
2 GBP 1.000000
3 EUR 0.893390
How can I create a mapping as a new column in df1 that has the FX Rate column values per respective currency, e.g:
Ccy FX Rate
0 GBP 1.000000
1 GBP 1.000000
2 USD 0.750244
3 EUR 0.893390
4 GBP 1.000000
5 USD 0.750244
I could do a mapping like the one below, but it replaces the original currencies rather than creating a new column with the mapped numerical values:
rename_dict = df2.set_index('Ccy').to_dict()['FX Rate']
df1 = df1.replace(rename_dict)
I simply want to add the mappings as a new column to the original df1.
Thanks!
You can merge the two dataframes on "Ccy":
newdf = pd.merge(df1,df2,on="Ccy",how="left")
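A minimal sketch with the sample data from the question; `how="left"` keeps every row of df1 and would leave NaN for any currency missing from df2:

```python
import pandas as pd

df1 = pd.DataFrame({'Ccy': ['GBP', 'GBP', 'USD', 'EUR', 'GBP', 'USD']})
df2 = pd.DataFrame({'Ccy': ['USD', 'JPY', 'GBP', 'EUR'],
                    'FX Rate': [0.750244, 0.007196, 1.0, 0.893390]})

# Left merge: one FX Rate per df1 row, df1's row order preserved
newdf = pd.merge(df1, df2, on='Ccy', how='left')
```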

How can I filter a pandas dataframe with another smaller pandas dataframe

I have 2 Data frames the first looks like this
df1:
MONEY Value
0 EUR 850
1 USD 750
2 CLP 1
3 DCN 1
df2:
Money
0 USD
1 USD
2 USD
3 USD
4 EGP
... ...
25984 USD
25985 DCN
25986 USD
25987 CLP
25988 USD
I want to remove the "Money" values of df2 that are not present in df1, and add a "Value" column to df2 filled with the matching values from df1:
Money Value
0 USD 750
1 USD 750
2 USD 750
3 USD 750
... ...
25984 USD 750
25985 DCN 1
25986 USD 750
25987 CLP 1
25988 USD 750
Step by step:
df1.set_index("MONEY")["Value"]
This code turns the MONEY column into the index of the dataframe, which results in:
print(df1)
MONEY
EUR 850
USD 150
DCN 1
df2["Money"].map(df1.set_index("MONEY")["Value"])
This code maps the content of df2 against the values in df1. It returns the following:
0 150.0
1 NaN
2 850.0
3 NaN
Now we assign this series to a new column in df2 called Value. Putting it all together:
df2["Value"] = df2["Money"].map(df1.set_index("MONEY")["Value"])
df2 now looks like:
Money Value
0 USD 150.0
1 GBP NaN
2 EUR 850.0
3 CLP NaN
Only one thing is left to do: delete any rows that have a NaN value:
df2.dropna(inplace=True)
Entire code sample:
import pandas as pd
# Create df1
x_1 = ["EUR", 850], ["USD", 150], ["DCN", 1]
df1 = pd.DataFrame(x_1, columns=["MONEY", "Value"])
# Create df2
x_2 = "USD", "GBP", "EUR", "CLP"
df2 = pd.DataFrame(x_2, columns=["Money"])
# Create new column in df2 called 'Value'
df2["Value"] = df2["Money"].map(df1.set_index("MONEY")["Value"])
# Drops any rows that have 'NaN' in column 'Value'
df2.dropna(inplace=True)
print(df2)
Outputs:
Money Value
0 USD 150.0
2 EUR 850.0
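If you prefer to filter first and map afterwards, `isin` is a direct alternative to dropping NaN rows. A sketch on the question's sample data, with one unmatched currency:

```python
import pandas as pd

df1 = pd.DataFrame({'MONEY': ['EUR', 'USD', 'CLP', 'DCN'],
                    'Value': [850, 750, 1, 1]})
df2 = pd.DataFrame({'Money': ['USD', 'USD', 'EGP', 'DCN', 'CLP']})

# Keep only rows whose currency appears in df1, then map the Value column on
filtered = df2[df2['Money'].isin(df1['MONEY'])].copy()
filtered['Value'] = filtered['Money'].map(df1.set_index('MONEY')['Value'])
```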

Pandas DataFrame currency conversion

I have a DataFrame with two columns:
col1 | col2
20 EUR
31 GBP
5 JPY
I may have 10000 rows like this.
How can I do a fast currency conversion to a base currency of GBP?
Should I use easymoney?
I know how to apply the conversion to a single row, but I do not know how to iterate over all the rows quickly.
EDIT:
I would like to apply something like:
def convert_currency(amount, currency_symbol):
    converted = ep.currency_converter(amount=1000, from_currency=currency_symbol,
                                      to_currency="GBP")
    return converted

df.loc[df.currency != 'GBP', 'col1'] = convert_currency(currency_data.col1, df.col2)
but it does not work yet.
Join a third column with the conversion rates for each currency, joining on the currency code in col2. Then create a column with the translated amount.
dfRate:
code | rate
EUR 1.123
USD 2.234
df2 = pd.merge(df1, dfRate, how='left', left_on=['col2'], right_on=['code'])
df2['translatedAmt'] = df2['col1'] / df2['rate']
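A runnable sketch of this merge-then-divide approach on the question's frame; the rate numbers here are placeholders, not real FX rates:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [20, 31, 5], 'col2': ['EUR', 'GBP', 'JPY']})
# Hypothetical per-GBP conversion rates (illustrative numbers only)
dfRate = pd.DataFrame({'code': ['EUR', 'GBP', 'JPY'], 'rate': [1.25, 1.0, 180.0]})

# Join the rate for each row's currency, then divide to translate into GBP
df2 = pd.merge(df1, dfRate, how='left', left_on=['col2'], right_on=['code'])
df2['translatedAmt'] = df2['col1'] / df2['rate']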
df = pd.DataFrame([[20, 'EUR'], [31, 'GBP'], [5, 'JPY']], columns=['value', 'currency'])
print(df)
value currency
0 20 EUR
1 31 GBP
2 5 JPY
def convert_to_gbp(args):  # placeholder for your fancy conversion function
    amount, currency = args
    rates = {'EUR': 2, 'JPY': 10, 'GBP': 1}
    return rates[currency] * amount

df.assign(**{'In GBP': df.apply(convert_to_gbp, axis=1)})
value currency In GBP
0 20 EUR 40
1 31 GBP 31
2 5 JPY 50
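For 10000 rows, a vectorized map over precomputed rates is usually much faster than a row-wise apply, since it avoids a Python-level function call per row. A sketch with the same placeholder rates:

```python
import pandas as pd

df = pd.DataFrame([[20, 'EUR'], [31, 'GBP'], [5, 'JPY']], columns=['value', 'currency'])
rates = {'EUR': 2, 'JPY': 10, 'GBP': 1}  # placeholder rates, as above

# Vectorized: map each currency to its rate, then multiply column-wise
df['In GBP'] = df['value'] * df['currency'].map(rates)
```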