Stacking column indices on top of one another using Pandas

I'm looking to stack the indices of some columns on top of one another, this is what I currently have:
Buy Buy Currency Sell Sell Currency
Date
2013-12-31 100 CAD 100 USD
2014-01-02 200 USD 200 CAD
2014-01-03 300 CAD 300 USD
2014-01-06 400 USD 400 CAD
This is what I'm looking to achieve:
Buy/Sell Buy/Sell Currency
100 USD
100 CAD
200 CAD
200 USD
300 USD
300 CAD
And so on. Basically, I want to take the values in "Buy" and "Buy Currency" and stack them with the values in "Sell" and "Sell Currency", one pair after the other.
I should mention that my data frame has 10 columns in total, so using
df_pl.stack(level=0)
doesn't seem to work.

One option is with pivot_longer from pyjanitor, where for this particular use case, you pass a list of regular expressions (to names_pattern) to aggregate the desired column labels into new groups (in names_to):
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(index=None,
names_to = ['Buy/Sell', 'Buy/Sell Currency'],
names_pattern = [r"Buy$|Sell$", ".+Currency$"],
ignore_index = False,
sort_by_appearance=True)
Buy/Sell Buy/Sell Currency
Date
2013-12-31 100 CAD
2013-12-31 100 USD
2014-01-02 200 USD
2014-01-02 200 CAD
2014-01-03 300 CAD
2014-01-03 300 USD
2014-01-06 400 USD
2014-01-06 400 CAD

Using concat (note that this stacks only the Buy and Sell values, not the currency columns):
import pandas as pd
print(pd.concat(
[df['Buy'], df['Sell']], axis=1
).stack().reset_index(1, drop=True).rename(index='buy/sell')
)
output:
0 100
0 100
1 200
1 200
2 300
2 300
3 400
3 400

# assuming Date is still a regular column; skip this line if it is already the index
df.set_index('Date', inplace=True)
# create a mapping to new column names
d={'Buy Currency': 'Buy/Sell Currency',
'Sell Currency' : 'Buy/Sell Currency',
'Buy' : 'Buy/Sell',
'Sell' :'Buy/Sell'
}
df.columns=df.columns.map(d)
# stack first two columns over the next two columns
out=pd.concat([ df.iloc[:,:2],
df.iloc[:,2:]
],
ignore_index=True
)
out
Buy/Sell Buy/Sell Currency
0 100 CAD
1 200 USD
2 300 CAD
3 400 USD
4 100 USD
5 200 CAD
6 300 USD
7 400 CAD
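If the Date index should be kept and each Buy row should sit directly above its Sell row, the same concat idea can be adapted. The following is only a sketch under assumptions: the frame is rebuilt from the question's sample, and the names new_names, buy, sell and stacked are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question, with Date as the index.
df = pd.DataFrame(
    {
        "Buy": [100, 200, 300, 400],
        "Buy Currency": ["CAD", "USD", "CAD", "USD"],
        "Sell": [100, 200, 300, 400],
        "Sell Currency": ["USD", "CAD", "USD", "CAD"],
    },
    index=pd.DatetimeIndex(
        ["2013-12-31", "2014-01-02", "2014-01-03", "2014-01-06"], name="Date"
    ),
)

# Give both halves the same column labels so concat lines them up.
new_names = ["Buy/Sell", "Buy/Sell Currency"]
buy = df[["Buy", "Buy Currency"]].set_axis(new_names, axis=1)
sell = df[["Sell", "Sell Currency"]].set_axis(new_names, axis=1)

# A stable sort on the index interleaves the rows per date
# while keeping the Buy row ahead of the Sell row.
stacked = pd.concat([buy, sell]).sort_index(kind="mergesort")
print(stacked)
```

With 10 columns, the same pattern should extend by listing each group of columns to be stacked and renaming every group to the shared labels before the concat.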

Related

merging quarterly and monthly data while doing ffill on multiindex

I am trying to merge a quarterly series and a monthly series, and in the process essentially "downsampling" the quarterly series. Both dataframes contain a DATE column, BANK, and the remaining columns are various values either in a monthly or quarterly format. The complication I have had is that it is a multiindex, so if I try:
merged_data=df1.join(df2).reset_index(['DATE', 'BANK_CODE']).ffill()
the forward fill for quarterly data up to the last monthly datapoint is not done for each respective bank as I intended. Could anyone help with this please? Note: I have also tried to resample the quarterly dataframe separately, however I do not know of a way to downsample it to a monthly level until a certain date (should be the latest date in the monthly data).
df2 = df2.set_index(['DATE']).groupby(['BANK']).resample('M')['VALUE'].ffill()
df1:
Date Bank Value1 Value2
2021-06-30 bank 1 2000 7000
2021-07-31 bank 1 3000 2000
2021-06-30 bank 2 6000 9000
df2:
Date Bank Value1 Value2
2021-06-30 bank 1 2000 5000
2021-09-30 bank 1 5000 4000
2021-06-30 bank 2 9000 10000
HERE IS A MINI EXAMPLE
Using the data provided, assuming df1 is monthly and df2 is quarterly.
Set index and resample your quarterly data to monthly:
# monthly data
x1 = df1.set_index(['Bank','Date'])
# quarterly data, resampling back to monthly
x2 = ( df2.set_index('Date')
.groupby('Bank')
.resample('M')  # month-end frequency; pandas 2.2+ prefers the alias 'ME'
.ffill()
.drop(columns='Bank')
)
Merge both with an outer join, which keeps every (Bank, Date) pair found in either frame:
x1.join(x2, lsuffix='_m', rsuffix='_q', how='outer').fillna(0)
Value1_m Value2_m Value1_q Value2_q
Bank Date
bank 1 2021-06-30 2000.0 7000.0 2000 5000
2021-07-31 3000.0 2000.0 2000 5000
2021-08-31 0.0 0.0 2000 5000
2021-09-30 0.0 0.0 5000 4000
bank 2 2021-06-30 6000.0 9000.0 9000 10000
The _m suffixes mark the values from df1, and the _q suffixes mark those from df2. I'm assuming you'll know how to explain or deal with the differences between monthly and quarterly values on the same dates.
As you can see, there is no need to specify the interval; this is provided automatically.
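If the goal is instead to carry the quarterly values forward only up to the latest monthly date, and separately for each bank, one possible sketch is to left-join onto the monthly rows and forward fill inside a groupby. The names monthly, quarterly, merged and q_cols are assumptions for illustration; the data is the sample from the question.

```python
import pandas as pd

# Monthly (df1) and quarterly (df2) samples from the question.
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2021-06-30", "2021-07-31", "2021-06-30"]),
    "Bank": ["bank 1", "bank 1", "bank 2"],
    "Value1": [2000, 3000, 6000],
    "Value2": [7000, 2000, 9000],
})
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["2021-06-30", "2021-09-30", "2021-06-30"]),
    "Bank": ["bank 1", "bank 1", "bank 2"],
    "Value1": [2000, 5000, 9000],
    "Value2": [5000, 4000, 10000],
})

monthly = df1.set_index(["Bank", "Date"]).sort_index()
quarterly = df2.set_index(["Bank", "Date"]).sort_index()

# A left join keeps exactly the monthly dates, so nothing extends past them.
merged = monthly.join(quarterly, lsuffix="_m", rsuffix="_q", how="left")

# Forward fill the quarterly columns within each bank only,
# so values never leak from one bank into the next.
q_cols = ["Value1_q", "Value2_q"]
merged[q_cols] = merged.groupby(level="Bank")[q_cols].ffill()
print(merged)
```

The groupby is the part that addresses the original complaint: a plain ffill() on the joined frame runs straight across bank boundaries.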

Multiplying columns in separate pandas dataframes

I'm trying to multiply data from 2 different dataframes and my code as below:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'v_contract_number': ['VN120001438','VN120001439',
'VN120001440','VN120001438',
'VN120001439','VN120001440'],
'Currency': ['VND','USD','KRW','USD','KRW','USD'],
'Amount': [10000,5000,6000,200,150,175]})
df2 = pd.DataFrame({'Currency': ['VND','USD','KRW'],'Rate': [1,23000,1200]})
print(df1)
# df1
v_contract_number Currency Amount
0 VN120001438 VND 10000
1 VN120001439 USD 5000
2 VN120001440 KRW 6000
3 VN120001438 USD 200
4 VN120001439 KRW 150
5 VN120001440 USD 175
print(df2)
Currency Rate
0 VND 1
1 USD 23000
2 KRW 1200
df1 = df1.merge(df2)
df1['VND AMount'] = df1['Amount'].mul(df1['Rate'])
df1.drop('Rate', axis=1, inplace=True)
print(df1)
# result
v_contract_number Currency Amount VND AMount
0 VN120001438 VND 10000 10000
1 VN120001439 USD 5000 115000000
2 VN120001438 USD 200 4600000
3 VN120001440 USD 175 4025000
4 VN120001440 KRW 6000 7200000
5 VN120001439 KRW 150 180000
This is exactly what I want, but I would like to know whether there is another way to do this without merging and dropping as I did.
The reason I drop 'Rate' is that I don't want it to appear in my report.
Thanks and best regards
You can use pandas' map for this:
df2 = df2.set_index('Currency').squeeze() # squeeze converts to a Series
df1.assign(VND_Amount = df1.Amount.mul(df1.Currency.map(df2)))
v_contract_number Currency Amount VND_Amount
0 VN120001438 VND 10000 10000
1 VN120001439 USD 5000 115000000
2 VN120001440 KRW 6000 7200000
3 VN120001438 USD 200 4600000
4 VN120001439 KRW 150 180000
5 VN120001440 USD 175 4025000
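You can avoid the drop by not overwriting df1 on the merge operation (a left merge keeps df1's row order, so the result lines up with df1 as long as df1 has the default 0..n-1 index):
df1["VND Amount"] = df1.merge(df2, on="Currency", how="left").eval("Amount * Rate")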
Alternatively you can use .reindex to align df2 to df1 based on the currency column:
df1["VND Amount"] = (
df1["Amount"] *
(df2.set_index("Currency")["Rate"] # set the index and return Rate column
.reindex(df1["Currency"]) # align "Rate" values to df1 "Currency"
.to_numpy() # get numpy array to avoid pandas
# auto alignment on math ops
)
)
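As a quick sanity check, the map and reindex approaches above can be run side by side on the question's data. This is a sketch only; rates, via_map and via_reindex are illustrative names that do not appear in the answers.

```python
import pandas as pd

df1 = pd.DataFrame({
    "v_contract_number": ["VN120001438", "VN120001439", "VN120001440",
                          "VN120001438", "VN120001439", "VN120001440"],
    "Currency": ["VND", "USD", "KRW", "USD", "KRW", "USD"],
    "Amount": [10000, 5000, 6000, 200, 150, 175],
})
df2 = pd.DataFrame({"Currency": ["VND", "USD", "KRW"], "Rate": [1, 23000, 1200]})

# One lookup Series: currency code -> rate.
rates = df2.set_index("Currency")["Rate"]

# map: look up each row's currency in the Series.
via_map = df1["Amount"] * df1["Currency"].map(rates)

# reindex: align the rates to df1's currency column, then drop the index
# with to_numpy() so the multiplication is purely positional.
via_reindex = df1["Amount"] * rates.reindex(df1["Currency"]).to_numpy()

df1["VND Amount"] = via_map
print(df1)
```

Both routes leave df1 without a Rate column, so there is nothing to drop afterwards. A currency missing from df2 would show up as NaN in either result.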

How can I filter a pandas dataframe with another smaller pandas dataframe

I have 2 data frames. They look like this:
df1:
MONEY Value
0 EUR 850
1 USD 750
2 CLP 1
3 DCN 1
df2:
Money
0 USD
1 USD
2 USD
3 USD
4 EGP
... ...
25984 USD
25985 DCN
25986 USD
25987 CLP
25988 USD
I want to remove the rows of df2 whose "Money" value is not present in df1, and add a column holding the matching values from the "Value" column of df1:
Money Value
0 USD 720
1 USD 720
2 USD 720
3 USD 720
... ...
25984 USD 720
25985 DCN 1
25986 USD 720
25987 CLP 1
25000 USD 720
Step by step:
df1.set_index("MONEY")["Value"]
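This turns the MONEY column into the index and selects the Value column, which gives a Series keyed by currency (shown here for the sample data used in the full code at the end):
print(df1.set_index("MONEY")["Value"])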
MONEY
EUR 850
USD 150
DCN 1
df2["Money"].map(df1.set_index("MONEY")["Value"])
This looks up each Money value of df2 in that Series; currencies that are missing from df1 become NaN. It returns the following:
0 150.0
1 NaN
2 850.0
3 NaN
Now we assign the previous column to a new column in df2 called Value. Putting it all together:
df2["Value"] = df2["Money"].map(df1.set_index("MONEY")["Value"])
df2 now looks like:
Money Value
0 USD 150.0
1 GBP NaN
2 EUR 850.0
3 CLP NaN
Only one thing is left to do: delete any rows that have a NaN value:
df2.dropna(inplace=True)
Entire code sample:
import pandas as pd
# Create df1
x_1 = ["EUR", 850], ["USD", 150], ["DCN", 1]
df1 = pd.DataFrame(x_1, columns=["MONEY", "Value"])
# Create df2
x_2 = "USD", "GBP", "EUR", "CLP"
df2 = pd.DataFrame(x_2, columns=["Money"])
# Create new column in df2 called 'Value'
df2["Value"] = df2["Money"].map(df1.set_index("MONEY")["Value"])
# Drops any rows that have 'NaN' in column 'Value'
df2.dropna(inplace=True)
print(df2)
Outputs:
Money Value
0 USD 150.0
2 EUR 850.0
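A possible variant, sketched under the same sample data, filters first with isin and maps afterwards. The names lookup and kept are assumptions for illustration. Because no NaN is ever introduced, Value stays an integer column, and rows that happen to hold NaN in other columns are not dropped by accident.

```python
import pandas as pd

df1 = pd.DataFrame({"MONEY": ["EUR", "USD", "DCN"], "Value": [850, 150, 1]})
df2 = pd.DataFrame({"Money": ["USD", "GBP", "EUR", "CLP"]})

# Currency -> value lookup built from df1.
lookup = df1.set_index("MONEY")["Value"]

# Keep only the rows whose currency exists in df1, then attach the value.
kept = df2[df2["Money"].isin(lookup.index)].copy()
kept["Value"] = kept["Money"].map(lookup)
print(kept)
```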

Pandas Pivot Table sum based on other column (as though had two indexes)

I have a database with transfers orders between two cities. I have, in each record, a departure date, the amount to be delivered, a returning date and the amount to be returned.
The database is something like this:
df = pd.DataFrame({"dep_date":[201701,201701,201702,201703], "del_amount":[100,200,300,400],"ret_date":[201703,201702,201703,201705], "ret_amount":[50,75,150,175]})
df
dep_date del_amount ret_date ret_amount
0 201701 100 201703 50
1 201701 200 201702 75
2 201702 300 201703 150
3 201703 400 201705 175
I want to get a pivot table with dep_date as index, showing the sum of del_amount in that month and the returned amount scheduled for the same month as the departure date.
It's an odd construction, because it seems to have two indexes. The result that I need is:
del_amount ret_amount
dep_date
201701 300 0
201702 300 75
201703 400 200
Note that some returning dates do not match any departure month. Does anyone know if it is possible to build a proper aggfunc in the pivot_table environment to achieve this? If it is not possible, can anyone tell me the best approach?
Thanks in advance
You'll need two groupby + sum operations, followed by a reindex and concatenation -
i = df.groupby(df.dep_date % 100)['del_amount'].sum()
j = df.groupby(df.ret_date % 100)['ret_amount'].sum()
pd.concat([i, j.reindex(i.index, fill_value=0)], axis=1)
del_amount ret_amount
dep_date
1 300 0
2 300 75
3 400 200
If you want to group on the entire date (and not just the month number), change df.groupby(df.dep_date % 100) to df.groupby('dep_date').
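For reference, here is a self-contained sketch of that full-date variant on the question's data. The names dep, ret and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "dep_date": [201701, 201701, 201702, 201703],
    "del_amount": [100, 200, 300, 400],
    "ret_date": [201703, 201702, 201703, 201705],
    "ret_amount": [50, 75, 150, 175],
})

# Total delivered per departure month and total returned per return month.
dep = df.groupby("dep_date")["del_amount"].sum()
ret = df.groupby("ret_date")["ret_amount"].sum()

# Align the returns to the departure months; months with no return get 0,
# and return months with no departure (201705) are left out.
result = pd.concat([dep, ret.reindex(dep.index, fill_value=0)], axis=1)
print(result)
```

Using fill_value=0 in the reindex keeps ret_amount as integers, whereas a join followed by fillna(0) converts the column to floats.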
Use
In [97]: s1 = df.groupby('dep_date')['del_amount'].sum()
In [98]: s2 = df.groupby('ret_date')['ret_amount'].sum()
In [99]: s1.to_frame().join(s2.rename_axis('dep_date')).fillna(0)
Out[99]:
del_amount ret_amount
dep_date
201701 300 0.0
201702 300 75.0
201703 400 200.0
Split it into two DataFrames, aggregate each of them, then join:
s=df.loc[:,df.columns.str.startswith('de')]
v=df.loc[:,df.columns.str.startswith('ret')]
# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
s.set_index('dep_date').groupby(level=0).sum().join(v.set_index('ret_date').groupby(level=0).sum()).fillna(0)
Out[449]:
del_amount ret_amount
dep_date
201701 300 0.0
201702 300 75.0
201703 400 200.0

Matching and adding column to data frame

I am going crazy about this one. I am trying to add a new column to a data frame DF1, based on values found in another data frame DF2. This is how they look,
DF1=
Date Amount Currency
0 2014-08-20 -20000000 EUR
1 2014-08-20 -12000000 CAD
2 2014-08-21 10000 EUR
3 2014-08-21 20000 USD
4 2014-08-22 25000 USD
DF2=
NAME OPEN
0 EUR 10
1 CAD 20
2 USD 30
Now, I would like to create a new column in DF1, named 'Amount (Local)', where each amount in 'Amount' is multiplied by the matching value found in DF2, yielding this result:
DF1=
Date Amount Currency Amount (Local)
0 2014-08-20 -20000000 EUR -200000000
1 2014-08-20 -12000000 CAD -240000000
2 2014-08-21 10000 EUR 100000
3 2014-08-21 20000 USD 600000
4 2014-08-22 25000 USD 750000
If there exists a method for adding a column to DF1 based on a function, instead of just multiplication as the above problem, that would be very much appreciated also.
Thanks,
You can use map with a dict built from your second df (in my case it is called df1; yours is DF2), and then multiply the result by the amount:
In [65]:
df['Amount (Local)'] = df['Currency'].map(dict(df1[['NAME','OPEN']].values)) * df['Amount']
df
Out[65]:
Date Amount Currency Amount (Local)
index
0 2014-08-20 -20000000 EUR -200000000
1 2014-08-20 -12000000 CAD -240000000
2 2014-08-21 10000 EUR 100000
3 2014-08-21 20000 USD 600000
4 2014-08-22 25000 USD 750000
Breaking this down: map looks up each value among the dict's keys. Here each Currency is matched against the NAME keys, and the dict values are the OPEN values, so the result is:
In [66]:
df['Currency'].map(dict(df1[['NAME','OPEN']].values))
Out[66]:
index
0 10
1 20
2 10
3 30
4 30
Name: Currency, dtype: int64
We then simply multiply this series against the Amount column from df (DF1 in your case) to get the desired result.
Use fancy-indexing to create a currency array aligned with your data in df1, then use it in multiplication, and assign the result to a new column in df1:
import pandas as pd
ccy_series = pd.Series([10,20,30], index=['EUR', 'CAD', 'USD'])
df1 = pd.DataFrame({'amount': [-200, -120, 1, 2, 2.5], 'ccy': ['EUR', 'CAD', 'EUR', 'USD', 'USD']})
aligned_ccy = ccy_series[df1.ccy].reset_index(drop=True)
aligned_ccy
=>
0 10
1 20
2 10
3 30
4 30
dtype: int64
df1['amount_local'] = df1.amount *aligned_ccy
df1
=>
amount ccy amount_local
0 -200.0 EUR -2000
1 -120.0 CAD -2400
2 1.0 EUR 10
3 2.0 USD 60
4 2.5 USD 75
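The question also asks about building the new column from an arbitrary function rather than a plain multiplication. One way to sketch this is to look up the rate first and then apply the function pairwise. The function inflow_local and the column 'Inflow (Local)' are hypothetical, invented purely for illustration.

```python
import pandas as pd

DF1 = pd.DataFrame({
    "Date": ["2014-08-20", "2014-08-20", "2014-08-21", "2014-08-21", "2014-08-22"],
    "Amount": [-20000000, -12000000, 10000, 20000, 25000],
    "Currency": ["EUR", "CAD", "EUR", "USD", "USD"],
})
DF2 = pd.DataFrame({"NAME": ["EUR", "CAD", "USD"], "OPEN": [10, 20, 30]})

# Rate for each row of DF1, looked up by currency.
rate = DF1["Currency"].map(DF2.set_index("NAME")["OPEN"])

# The plain multiplication from the question.
DF1["Amount (Local)"] = DF1["Amount"] * rate


def inflow_local(amount, rate):
    """Hypothetical rule: convert positive amounts only, count outflows as 0."""
    return amount * rate if amount > 0 else 0


# Any two-argument function can be applied to the (amount, rate) pairs.
DF1["Inflow (Local)"] = [inflow_local(a, r) for a, r in zip(DF1["Amount"], rate)]
print(DF1)
```

For large frames a vectorized expression is usually faster than a Python-level loop, so this pattern is best kept for logic that cannot easily be expressed with column arithmetic.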
