I want to populate a pandas dataframe row by row, where each new row is calculated from the contents of the previous row. I am using this for simple financial projections.
Let us take a dataframe 'df_basic_financials':
df_basic_financials = pd.DataFrame({'current_account': [18357.], 'savings_account': [14809.]})
Now I want to forecast what my current and savings accounts will look like in five years, assuming that I earn 24000 a year, that my savings account yields 2% yearly, and that I spend no money and transfer nothing to my savings account.
How do I write the code so that I get this:
   current_account  savings_account
0            18357       14809
1            42357       15105.18
2            66357       15407.2836
and so on, for any number of years I want, each time using the calculation 'value of the previous row in the same column + 24000' for current_account and 'value of the previous row in the same column * 1.02' for savings_account.
You can get the number of years from the user and then run the code this way:
import pandas as pd

df = pd.DataFrame({'current_account': [18357.], 'savings_account': [14809.]})
years = int(input("Enter years: "))
for n in range(years):
    # Take the last row and compute the next one from it
    lastrow = df.iloc[-1]
    df.loc[len(df.index)] = [lastrow['current_account'] + 24000,
                             lastrow['savings_account'] * 1.02]
df
The output will be (for years = 5):
   current_account  savings_account
0          18357.0     14809.000000
1          42357.0     15105.180000
2          66357.0     15407.283600
3          90357.0     15715.429272
4         114357.0     16029.737857
5         138357.0     16350.332615
Just use math
df_basic_financials = pd.DataFrame({'current_account': [18357.], 'savings_account': [14809.]})
current_account_projection = [df_basic_financials['current_account'].iloc[-1] + (24000 * i) for i in range(10)]
savings_account_projection = [df_basic_financials['savings_account'].iloc[-1] * (1.02 ** i) for i in range(10)]
df_basic_financials = pd.DataFrame({'current_account': current_account_projection, 'savings_account': savings_account_projection})
If you really want an iterative solution, apply the update to the last row via .iloc[-1]:
current_account_next = df_basic_financials.iloc[-1]['current_account'] + 24000
savings_account_next = df_basic_financials.iloc[-1]['savings_account'] * 1.02
df_basic_financials = pd.concat(
    [df_basic_financials,
     pd.DataFrame([{'current_account': current_account_next,
                    'savings_account': savings_account_next}])],
    ignore_index=True)
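To project several years with this iterative approach, the same update can be wrapped in a loop; a minimal sketch, starting from the original one-row frame (five years as an example):
for _ in range(5):
    # Compute the next row from the last one and append it
    last = df_basic_financials.iloc[-1]
    df_basic_financials.loc[len(df_basic_financials)] = [
        last['current_account'] + 24000,
        last['savings_account'] * 1.02,
    ]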
I started an actuarial project in Python (based on the idea shown in this model) in which I want to simulate, from a set of inputs and with some degree of internal randomness added (as done here: https://github.com/Saurabh0503/Financial-modelling-and-valuationn/blob/main/Dynamic%20Salary%20Retirement%20Model%20Internal%20Randomness.ipynb), how long it will take for an individual to retire, with a certain amount of wealth, a certain annual salary, and a certain annual payment (calculated as the desired cash divided by the number of years needed to retire). In my variation of the model, the user can define his/her own parameters, making the model more flexible and user friendly, and there is a function that calculates the desired retirement cash based on the individual's propensity both to save and to spend.
The problem is that I want to summarize the output I obtain from the model (by taking the mean, max, min and standard deviation of wealth, salary and years to retirement), so I have to save the results and recall them when needed; but I have no idea what to do to accomplish this task.
I tried a solution consisting of saving the simulation's output in a pandas dataframe. In particular, I wrote this function:
def get_salary_wealth_year_case_df(data):
    all_ytrs = []
    salary = []
    wealth = []
    annual_payments = []
    for i in range(data.n_iter):
        ytr = years_to_retirement(data, print_output=False)
        sal = salary_at_year(data, year, case, print_output=False)
        wlt = wealth_at_year(data, year, prior_wealth, case, print_output=False)
        pmt = annual_pmts_case_df(wealth_at_year, year, case, print_output=False)
        all_ytrs.append(ytr)
        salary.append(sal)
        wealth.append(wlt)
        annual_payments.append(pmt)
    df = pd.DataFrame()
    df['Years to Retirement'] = all_ytrs
    df['Salary'] = salary
    df['Wealth'] = wealth
    df['Annual Payments'] = annual_payments
    return df
I need feedback on what I'm doing. Am I doing it right? If so, are there more efficient ways to do it? If not, what should I do? Thanks in advance!
Given the inputs used by the function, I'm assuming your code (as it is) will do just fine in terms of computation speed.
As suggested, you can add a saving option to your function so that the results being returned are also stored in a .csv file.
def get_salary_wealth_year_case_df(data, path):
    all_ytrs = []
    salary = []
    wealth = []
    annual_payments = []
    for i in range(data.n_iter):
        ytr = years_to_retirement(data, print_output=False)
        sal = salary_at_year(data, year, case, print_output=False)
        wlt = wealth_at_year(data, year, prior_wealth, case, print_output=False)
        pmt = annual_pmts_case_df(wealth_at_year, year, case, print_output=False)
        all_ytrs.append(ytr)
        salary.append(sal)
        wealth.append(wlt)
        annual_payments.append(pmt)
    df = pd.DataFrame()
    df['Years to Retirement'] = all_ytrs
    df['Salary'] = salary
    df['Wealth'] = wealth
    df['Annual Payments'] = annual_payments
    # Save the dataframe to a given path inside your workspace
    # (keep the header row so the file can be re-read with its column names)
    df.to_csv(path, index=False)
    return df
After saving, returning the object might be optional; it depends on whether you are going to use this dataframe later in your code.
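Since the goal is to summarize the results, the saved file can be re-read and aggregated in one step; a minimal sketch, assuming a hypothetical 'results.csv' written by the function above:
import pandas as pd

results = pd.read_csv('results.csv')  # hypothetical path passed to the function above
# Mean, max, min and standard deviation for each column of interest
summary = results[['Years to Retirement', 'Salary', 'Wealth']].agg(['mean', 'max', 'min', 'std'])
print(summary)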
I have 2 dataframes:
df_hist: daily data of share values
df_buy_data: dates when shares were bought
I want to add the share holdings to df_hist for each date, calculated from df_buy_data depending on the date. In my version I have to iterate over the dataframe, which works, but I guess it is not very nice...
hist_data = {'Date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04'], 'Value': [23, 22, 21, 24]}
df_hist = pd.DataFrame(hist_data)
buy_data = {'Date': ['2022-01-01', '2022-01-04'], 'Ticker': ['Index1', 'Index1'], 'NumberOfShares': [15, 29]}
df_buy_data = pd.DataFrame(buy_data)

for i, historical_row in df_hist.iterrows():
    ticker_count = df_buy_data.loc[df_buy_data['Date'] <= historical_row['Date']] \
        .groupby('Ticker')['NumberOfShares'].sum()
    if len(ticker_count) > 0:
        df_hist.at[i, 'Index1_NumberOfShares'] = ticker_count.item()
    else:
        df_hist.at[i, 'Index1_NumberOfShares'] = 0

df_hist
How can I improve this?
Thanks for the help!
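One way to avoid the row loop is a cumulative sum plus an as-of merge; a sketch, assuming a single ticker as in the example (the CumShares column is introduced here just for illustration):
import pandas as pd

hist_data = {'Date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04'], 'Value': [23, 22, 21, 24]}
df_hist = pd.DataFrame(hist_data)
buy_data = {'Date': ['2022-01-01', '2022-01-04'], 'Ticker': ['Index1', 'Index1'], 'NumberOfShares': [15, 29]}
df_buy_data = pd.DataFrame(buy_data)

# Work with real datetimes so the as-of merge can compare dates
df_hist['Date'] = pd.to_datetime(df_hist['Date'])
df_buy_data['Date'] = pd.to_datetime(df_buy_data['Date'])

# Running total of shares bought up to each purchase date
df_buy_data['CumShares'] = df_buy_data['NumberOfShares'].cumsum()

# For each daily row, take the last purchase row with Date <= that day
df_hist = pd.merge_asof(df_hist.sort_values('Date'),
                        df_buy_data[['Date', 'CumShares']].sort_values('Date'),
                        on='Date')
df_hist['Index1_NumberOfShares'] = df_hist['CumShares'].fillna(0)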
Hello, I have the following code:
# Import libraries
import numpy as np
import pandas as pd
import datetime as dt
# Connect to Drive
from google.colab import drive
drive.mount('/content/drive')
# Read Data
ruta = '/content/drive/MyDrive/example.csv'
df = pd.read_csv(ruta)
df.head(15)
d = pd.date_range(start="2015-01-01",end="2022-01-01", freq='MS')
dates = pd.DataFrame({"DATE":d})
df["DATE"] = pd.to_datetime(df["DATE"])
df_merge = pd.merge(dates, df, how='outer', on='DATE')
The data that I am using can be downloaded here: DATA
What I am trying to achieve is something known as a Rolling Year.
First I create this metric, grouped by category:
# ROLLING YEAR
##################################################################################################
# I want to make a Rolling Year for each category. That means: how much each category sold from 12 months ago TO the current month
# RY_ACTUAL: one year has 12 months, so I pass 12 as the rolling window
f = lambda x: x.rolling(12).sum()
df_merge["RY_ACTUAL"] = df_merge.groupby(["CATEGORY"])['Sales'].transform(f)  # transform keeps the original row index
# RY_24: a 24-month rolling window, to compare the actual RY vs the last RY
f_1 = lambda x: x.rolling(24).sum()
df_merge["RY_24"] = df_merge.groupby(["CATEGORY"])['Sales'].transform(f_1)
# RY_LAST: subtract RY_ACTUAL from RY_24 to get the previous year's amount (RY vs RY-1)
df_merge["RY_LAST"] = df_merge["RY_24"] - df_merge["RY_ACTUAL"]
##################################################################################################
df_merge.head(30)
And it works perfectly, because if you download the file and then filter, for example, for the "Blue" category, you can see something like this:
That means, if you stop at the 2015-November row, the RY_ACTUAL column shows the sum of the previous 12 records.
My next goal is to create a similar column using the rolling function, but with the following condition:
The column must sum the sales of ALL the categories, as long as the Colour/Animal column is equal to 'Colour'. For example, if I stop at 2016-December, it should give me the sum of ALL the sales of the colours from 2016-January to 2016-December.
This was my attempt:
df_merge.loc[(df_merge['Colour/Animal'] == 'Colour'),'Sales'].apply(f)
Could anyone help me code this example correctly?
Thanks in advance, community!
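A possible direction (a sketch, not a verified answer; the column name RY_COLOUR_ALL is made up for illustration, and it assumes one row per DATE and CATEGORY, as produced by the merge above): sum the 'Colour' sales across all categories per date first, roll that series, then map it back onto the rows.
# Total 'Colour' sales per date, across all categories
colour_per_date = (df_merge[df_merge['Colour/Animal'] == 'Colour']
                   .groupby('DATE')['Sales'].sum())
# 12-month rolling sum of that per-date total
colour_ry = colour_per_date.rolling(12).sum()
# Attach the value for each row's date
df_merge['RY_COLOUR_ALL'] = df_merge['DATE'].map(colour_ry)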
I am writing an emulation of a bank deposit account in pandas.
I got stuck with compound interest (it is the result of reinvesting interest, so that interest in the next period is earned on the principal sum plus previously accumulated interest).
So far I have the following code:
import pandas as pd
from pandas.tseries.offsets import MonthEnd
from datetime import datetime
# Create a date range
start = '21/11/2017'
now = datetime.now()
date_rng = pd.date_range(start=start, end=now, freq='d')
# Create an example data frame with the timestamp data
df = pd.DataFrame(date_rng, columns=['Date'])
# Add column (EndOfMonth) - shows the last day of the current month
df['LastDayOfMonth'] = pd.to_datetime(df['Date']) + MonthEnd(0)
# Add columns for Debit, Credit, Total, Description
df['Debit'] = 0
df['Credit'] = 0
df['Total'] = 0
df['Description'] = ''
# Flag the last day of each month (vectorized; no loop is needed)
df['IsItLastDay'] = (df['LastDayOfMonth'] == df['Date'])
# Add the transaction of the first deposit
df.loc[df.Date == '2017-11-21', ['Debit', 'Description']] = 10000, "First deposit"
# Calculate the principal sum (the sum of all deposits minus all withdrawals plus all compound interest)
df['Total'] = (df.Debit - df.Credit).cumsum()
# Calculate interest per day and Cumulative interest
# 11% is the interest rate per year
df['InterestPerDay'] = (df['Total'] * 0.11) / 365
df['InterestCumulative'] = ((df['Total'] * 0.11) / 365).cumsum()
# Change the order of columns
df = df[['Date', 'LastDayOfMonth', 'IsItLastDay', 'InterestPerDay', 'InterestCumulative', 'Debit', 'Credit', 'Total', 'Description']]
df.to_excel("results.xlsx")
The output file looks fine, but I need the following:
The "InterestCumulative" column should be added to the "Total" column on the last day of each month (compounding the interest).
At the beginning of each month the "InterestCumulative" column should be cleared (because the interest was added to the principal sum).
How can I do this?
You're going to need to loop, as your total changes depending on previous rows, which then affects the later rows. As a result your current interest calculations are wrong.
total = 0
cumulative_interest = 0
total_per_day = []
interest_per_day = []
cumulative_per_day = []

for day in df.itertuples():
    # Deposits and withdrawals take effect immediately
    total += day.Debit - day.Credit
    # Simple daily interest on the current principal
    interest = total * 0.11 / 365
    cumulative_interest += interest
    if day.IsItLastDay:
        # Month end: capitalize the accumulated interest
        total += cumulative_interest
    total_per_day.append(total)
    interest_per_day.append(interest)
    cumulative_per_day.append(cumulative_interest)
    if day.IsItLastDay:
        # Reset after recording, so the new month starts from zero
        cumulative_interest = 0

df.Total = total_per_day
df.InterestPerDay = interest_per_day
df.InterestCumulative = cumulative_per_day
This is unfortunately a lot harder to read, but that's what happens when values depend on previous values. Depending on your exact requirements there may be nice ways to simplify this using math, but otherwise this is what you've got.
I've written this directly into Stack Overflow, so it may not be perfect.
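On the "simplify using math" point: when there are no deposits or withdrawals during a month, the loop above credits the month's accrued interest in one step, so each month just multiplies the balance by a fixed factor. A sketch of that closed form (the month lengths below are illustrative; 10000 is the first deposit from the question):
# Each month the balance grows by (1 + 0.11 * days_in_month / 365), because
# daily interest accrues on a constant principal and is capitalized at month end
days_in_month = [30, 31, 31, 28]  # illustrative month lengths
balance = 10000.0                 # the first deposit
for d in days_in_month:
    balance *= 1 + 0.11 * d / 365
print(balance)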
I've set up a simulation example below.
Setup:
I have weekly data, say 6 years of data, with around 1000 stocks each week (some weeks more, some weeks fewer than 1000). I randomly choose 75 stocks at time t0. At t1 some stocks die (with probability p they go out of fashion) or leave the index (structural reasons such as a merger). I need to simulate stocks so that every week I have exactly 75 stocks. Every week some stocks die (between 0 and 75) and I pick new ones, not from the existing 75. I also check whether a stock leaves due to structural reasons. Every week I calculate the returns of the 75 stocks.
Questions: Is there an obvious way to improve the speed? I started with pandas objects (group, sort), which was too slow. I haven't tried parallelizing the loop. I'm more interested to hear whether I should use numba (but it doesn't have the np.in1d function) or whether there is a faster way to shuffle (I actually only need to shuffle the ones). I've also thought about creating a fixed array with all stock ids, using NaN; the problem here is that I need 75 names, so I would still need to filter out those NaNs every week.
Maybe this is too detailed a problem for this forum; I apologize if that's the case.
Code:
from timeit import default_timer
import numpy as np
# Create dataset
n_weeks = 312  # Approximately 6 years of weekly data
n_stocks = np.random.normal(1000, 5, n_weeks).astype(dtype=np.uint16)  # Around 1000 stocks every week, but not fixed
idx_new_week = np.cumsum(np.hstack((0, n_stocks)))
# We give each stock a stock id
n_obs = n_stocks.sum()
stock_id = np.ones([n_obs], dtype=np.uint16)
for j in range(1, n_weeks + 1):
    stock_id[idx_new_week[j-1]:idx_new_week[j]] = np.cumsum(np.ones(n_stocks[j-1]))
stock_rtn = np.random.normal(0, 0.25/np.sqrt(52), n_obs) # Simulated forward (one week ahead) return for each stock
# Simulation part
# Week 0: pick 75 stocks randomly
# Week n >= 1: a stock dies for two reasons
# 1) randomness (probability 'p')
# 2) structural event (could be a merger, or falling out of the index).
# We cannot assume that it is always the highest stock id which dies for structural reasons (as it looks like here)
# If a stock dies we randomly pick a replacement from that week's stock dataset (not including the ones which died this week)
n_sim = 100  # I want this to be 1 million
n_stock_cand = 75  # For this example we pick 75 stocks
p_survival = 0.90
# The weekly periodical returns
pf_rtn = np.zeros([n_weeks, n_sim])
start = default_timer()
for k in range(0, n_sim):
    # Randomly choose n_stock_cand stocks at time zero
    boolean_list = np.array([False] * (n_stocks[0] - n_stock_cand) + [True] * n_stock_cand)
    np.random.shuffle(boolean_list)  # Shuffle the list
    stock_id_this_week = stock_id[idx_new_week[0]:idx_new_week[1]][boolean_list]
    stock_rtn_this_week = stock_rtn[idx_new_week[0]:idx_new_week[1]][boolean_list]
    # This part only simulates the Buzz portfolio names - later we simulate returns from specific holdings of the 75 names
    for j in range(1, n_weeks):
        pf_rtn[j-1, k] = stock_rtn_this_week.mean()
        # Find the number of stocks to keep
        boolean_keep_stocks = np.random.rand(n_stock_cand) < p_survival
        # Next we need to check if a stock is still part of the universe next period
        stock_cand_temp = stock_id[idx_new_week[j-1]:idx_new_week[j]]
        stock_rtn_temp = stock_rtn[idx_new_week[j-1]:idx_new_week[j]]
        boolean_keep_stocks = (boolean_keep_stocks) & (np.in1d(stock_id_this_week, stock_cand_temp, assume_unique=True))
        n_stocks_to_replace = n_stock_cand - boolean_keep_stocks.sum()  # Number of new stocks to pick this week
        if n_stocks_to_replace > 0:
            # We have to pick from stocks which are not already part of the portfolio
            boolean_cand = np.in1d(stock_cand_temp, stock_id_this_week, assume_unique=True, invert=True)
            n_stocks_to_pick_from = boolean_cand.sum()
            boolean_list = np.array([False] * (n_stocks_to_pick_from - n_stocks_to_replace) + [True] * n_stocks_to_replace)
            np.random.shuffle(boolean_list)  # Shuffle the list
            # First avoid picking the same stock twice, then pick from the unique candidate list
            stock_id_new = stock_cand_temp[boolean_cand][boolean_list]  # The new stocks
            stock_rtn_new = stock_rtn_temp[boolean_cand][boolean_list]  # and their returns
            stock_id_this_week = np.hstack((stock_id_this_week[boolean_keep_stocks], stock_id_new))
            stock_rtn_this_week = np.hstack((stock_rtn_this_week[boolean_keep_stocks], stock_rtn_new))
        else:
            # No replacement of stocks / all survive, but the order might differ
            boolean_cand = np.in1d(stock_cand_temp, stock_id_this_week, assume_unique=True, invert=False)
            stock_id_this_week = stock_cand_temp[boolean_cand]
            stock_rtn_this_week = stock_rtn_temp[boolean_cand]
    # PnL last period
    pf_rtn[n_weeks-1, k] = stock_rtn_this_week.mean()
print(default_timer() - start)
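On the faster-shuffle question: instead of building a full-length boolean mask and shuffling it, index positions can be drawn directly without replacement via np.random.choice. A minimal sketch of the week-0 selection under that change (the replacement step inside the loop would be adapted the same way); whether it is faster should be measured on your data:
import numpy as np

# Draw 75 distinct positions out of that week's universe instead of
# shuffling a boolean mask of the full length
pick = np.random.choice(n_stocks[0], size=n_stock_cand, replace=False)
stock_id_this_week = stock_id[idx_new_week[0]:idx_new_week[1]][pick]
stock_rtn_this_week = stock_rtn[idx_new_week[0]:idx_new_week[1]][pick]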