I have the following table and want to calculate the rolling beta of each stock with respect to QQQ.
Date      AAPL     AMD     NVDA     SPY      QQQ
20200121  316.550  51.050  256.930  223.340  331.310
20200123  319.290  51.700  259.940  224.530  331.740
20200127  308.960  49.260  266.680  218.050  323.520
20200129  324.320  47.510  265.750  221.720  326.620
20200131  309.330  46.970  268.050  219.010  321.740
20200204  318.860  49.470  264.540  227.420  329.080
20200205  321.580  49.850  266.720  228.250  332.820
20200206  325.210  49.320  268.340  230.170  333.960
20200210  321.540  52.270  271.760  231.960  334.690
I am trying something along these lines:
df = df.pct_change()
for i in df.columns:
    df[f'{i}_Beta'] = df['QQQ'].rolling(10).cov(df[i])
but I can't figure out how to get the proper output. Any help would be appreciated.
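One possible reading of the problem: the attempt above computes only the rolling covariance, whereas beta is usually defined as the covariance of the stock's returns with the benchmark's returns divided by the variance of the benchmark's returns. The sketch below assumes that definition. The helper name `rolling_beta` and the window of 5 are illustrative choices, not from the question (the sample has only 9 rows, so a window of 10 would yield nothing but NaN).

```python
import pandas as pd

# Prices copied from the table in the question.
prices = pd.DataFrame(
    {
        "AAPL": [316.55, 319.29, 308.96, 324.32, 309.33, 318.86, 321.58, 325.21, 321.54],
        "AMD": [51.05, 51.70, 49.26, 47.51, 46.97, 49.47, 49.85, 49.32, 52.27],
        "NVDA": [256.93, 259.94, 266.68, 265.75, 268.05, 264.54, 266.72, 268.34, 271.76],
        "SPY": [223.34, 224.53, 218.05, 221.72, 219.01, 227.42, 228.25, 230.17, 231.96],
        "QQQ": [331.31, 331.74, 323.52, 326.62, 321.74, 329.08, 332.82, 333.96, 334.69],
    },
    index=pd.to_datetime(
        ["20200121", "20200123", "20200127", "20200129", "20200131",
         "20200204", "20200205", "20200206", "20200210"],
        format="%Y%m%d",
    ),
)


def rolling_beta(prices, benchmark="QQQ", window=5):
    """Hypothetical helper: rolling cov(stock, benchmark) / rolling var(benchmark)."""
    returns = prices.pct_change()
    bench_var = returns[benchmark].rolling(window).var()
    betas = pd.DataFrame(index=returns.index)
    for col in returns.columns:
        if col == benchmark:
            continue  # the benchmark's beta against itself is 1 by definition
        cov = returns[col].rolling(window).cov(returns[benchmark])
        betas[f"{col}_Beta"] = cov / bench_var
    return betas


betas = rolling_beta(prices, benchmark="QQQ", window=5)
print(betas)
```

Keeping the betas in a separate frame avoids iterating over columns that the loop itself is adding, which is one thing that can make the original loop's output confusing.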
Related
I am working with stock data coming from Yahoo Finance.
import pandas as pd
import yfinance as yf

def load_y_finance_data(y_finance_tickers: list):
    df = pd.DataFrame()
    print("Loading Y-Finance data ...")
    for ticker in y_finance_tickers:
        df[ticker.replace("^", "")] = yf.download(
            ticker,
            auto_adjust=True, # only download adjusted data
            progress=False,
        )["Close"]
    print("Done loading Y-Finance data!")
    return df
x = load_y_finance_data(["^VIX", "^GSPC"])
x
VIX GSPC
Date
1990-01-02 17.240000 359.690002
1990-01-03 18.190001 358.760010
1990-01-04 19.219999 355.670013
1990-01-05 20.110001 352.200012
1990-01-08 20.260000 353.790009
DataSize=(8301, 2)
I want to perform a sliding-window operation over 50-day periods: compute the correlation (using the corr() function) for a 50-day slice of the data (day_1 to day_50), then move the window forward by one day (day_2 to day_51), and so on.
I tried the naive approach of using a for loop, and it works, but it takes too much time. Code below:
data_size = len(x)
period = 50
df = pd.DataFrame()
for i in range(data_size-period):
    df.loc[i, "GSPC_VIX_corr"] = x[["GSPC", "VIX"]][i:i+period].corr().loc["GSPC", "VIX"]
df
GSPC_VIX_corr
0 -0.703156
1 -0.651513
2 -0.602876
3 -0.583256
4 -0.589086
How can I do this more efficiently? Is there any built-in way I can use?
Thanks :)
You can use the rolling window functionality of Pandas, which supports many different aggregations, including corr(). Instead of your for loop, do this:
x["VIX"].rolling(window=period).corr(x["GSPC"])
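As a rough check that the rolling call matches the loop, here is a small sketch on synthetic data (random walks standing in for the real VIX and GSPC series, which are an assumption made only to keep the example offline). Two differences are worth knowing: rolling() labels each window by its last row, so the first period-1 entries are NaN, and the loop in the question stops one window short because it uses range(data_size-period).

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the two price series (assumption: not real market data).
rng = np.random.default_rng(0)
n, period = 200, 50
x = pd.DataFrame(
    {
        "VIX": rng.normal(size=n).cumsum(),
        "GSPC": rng.normal(size=n).cumsum(),
    }
)

# Built-in rolling correlation; the result is aligned to the END of each window.
rolling_corr = x["VIX"].rolling(window=period).corr(x["GSPC"])

# Loop version covering every full window (note the + 1 to include the last one).
loop_corr = pd.Series(
    [
        x[["GSPC", "VIX"]].iloc[i:i + period].corr().loc["GSPC", "VIX"]
        for i in range(n - period + 1)
    ]
)

# Drop the leading NaNs and renumber so both series are indexed by window number.
aligned = rolling_corr.dropna().reset_index(drop=True)
same = np.allclose(aligned, loop_corr)
print(same)
```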
I'm following a tutorial on using Yfinance in Jupyter Notebook to get prices for SPY (S&P 500) in a dataframe. The code looks simple, but I can't seem to get the desired results.
df_tickers = pd.DataFrame()
spyticker = yf.Ticker("SPY")
print(spyticker)
df_ticker = spyticker.history(period="max", interval="1d", start="1998-12-01", end="2022-01-01" , auto_adjust=True, rounding=True)
df_ticker.head()
The error states: "SPY: No data found for this date range, symbol may be delisted." But when I print spyticker, I get the correct yfinance object:
yfinance.Ticker object <SPY>
I am not sure what your problem is but if I use the following:
spyticker = yf.Ticker("SPY")
df_ticker = spyticker.history(period="max", interval="1d", start="1998-12-01", end="2022-01-01" , auto_adjust=True, rounding=True)
df_ticker.head()
I get the following:
Open High Low Close Volume Dividends Stock Splits
Date
1998-12-01 76.02 77.27 75.43 77.00 8950600 0.0 0
1998-12-02 76.74 77.19 75.94 76.78 7495500 0.0 0
1998-12-03 76.76 77.45 75.35 75.51 12145300 0.0 0
1998-12-04 76.35 77.58 76.27 77.49 10339500 0.0 0
1998-12-07 77.29 78.21 77.25 77.86 4290000 0.0 0
My only explanation is that the call to spyticker.history already returns a dataframe, so it isn't necessary to define the df_ticker beforehand.
So I have an Excel file containing data on a specific stock.
The file contains about 2 months of data. It records the Open, Close, High and Low prices and the trade Volume in 5-minute intervals, so there are about 3000 rows.
I want to calculate the daily RSI (or EMA, if it's easier) of the stock. I'm making a summary table that collects the daily data, so it converts my table of 3000+ rows into a table with only about 60 rows (each row represents one day).
Essentially, I want some code that groups the data by date and then calculates the RSI as a single value for each day. RSI is given by 100-(100/(1+RS)), where RS = average gain of up periods / average loss of down periods.
Note: My file uses 'Datetime', so each row's 'Datetime' looks something like '2022-03-03 9:30-5:00' and the next row would be '2022-03-03 9:35-5:00', etc. So the code needs to look only at the date and ignore the time, I guess.
Some code to maybe help understand what I'm looking for:
Here I'm reading my file. I want the code to take the loaded data, group it by date and then calculate the RSI of each day using the formula I wrote above.
dat = pd.read_csv('AMD_5m.csv',index_col='Datetime',parse_dates=['Datetime'],
                  date_parser=lambda x: pd.to_datetime(x, utc=True))
dates = backtest.get_dates(dat.index)
#create a summary table
cols = ['Num. Obs.', 'Num. Trade', 'PnL', 'Win. Ratio','RSI'] #add additional fields if necessary
summary_table = pd.DataFrame(index = dates, columns=cols)
# loop backtest by dates
This is the code I used to fill out the other columns in my summary table. I'll put my SMA (simple moving average) function below.
for d in dates:
    this_dat = dat.loc[dat.index.date==d]
    #find the number of observations in date d
    summary_table.loc[d]['Num. Obs.'] = this_dat.shape[0]
    #get trading (i.e. position holding) signals
    signals = backtest.SMA(this_dat['Close'].values, window=10)
    #find the number of trades in date d
    summary_table.loc[d]['Num. Trade'] = np.sum(np.diff(signals)==1)
    #find PnLs for 100 shares
    shares = 100
    PnL = -shares*np.sum(this_dat['Close'].values[1:]*np.diff(signals))
    if np.sum(np.diff(signals))>0:
        #close position at market close
        PnL += shares*this_dat['Close'].values[-1]
    summary_table.loc[d]['PnL'] = PnL
    #find the win ratio
    ind_in = np.where(np.diff(signals)==1)[0]+1
    ind_out = np.where(np.diff(signals)==-1)[0]+1
    num_win = np.sum((this_dat['Close'].values[ind_out]-this_dat['Close'].values[ind_in])>0)
    if summary_table.loc[d]['Num. Trade']!=0:
        summary_table.loc[d]['Win. Ratio'] = 1. *num_win/summary_table.loc[d]['Num. Trade']
This is my function for calculating Simple Moving Average. I was told to try and adapt this for RSI or for EMA (Exponential Moving Average). Apparently adapting this for EMA isn't too troublesome but I can't figure it out.
def SMA(p,window=10,signal_type='buy only'):
    #input price "p", look-back window "window",
    #signal type = buy only (default) --gives long signals, sell only --gives sell signals, both --gives both long and short signals
    #return a list of signals = 1 for long position and -1 for short position
    signals = np.zeros(len(p))
    if len(p)<window:
        #no signal if no sufficient data
        return signals
    sma = list(np.zeros(window)+np.nan) #the first few prices do not give technical indicator values
    sma += [np.average(p[k:k+window]) for k in np.arange(len(p)-window)]
    for i in np.arange(len(p)-1):
        if np.isnan(sma[i]):
            continue #skip the open market time window
        if sma[i]<p[i] and (signal_type=='buy only' or signal_type=='both'):
            signals[i] = 1
        elif sma[i]>p[i] and (signal_type=='sell only' or signal_type=='both'):
            signals[i] = -1
    return signals
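On the EMA part of the question, a minimal sketch of how the SMA function might be adapted is shown below. It assumes the common span-based smoothing factor alpha = 2/(window+1), computed with pandas' ewm(), and mirrors two details of SMA(): the indicator at bar i only uses prices before bar i, and the first `window` bars produce no signal. The name `EMA` and these choices are assumptions, not part of the original backtest module.

```python
import numpy as np
import pandas as pd


def EMA(p, window=10, signal_type="buy only"):
    """Hypothetical EMA analogue of SMA(): 1 for long, -1 for short, 0 otherwise."""
    p = np.asarray(p, dtype=float)
    signals = np.zeros(len(p))
    if len(p) < window:
        # no signal if there is not enough data
        return signals
    # EMA with alpha = 2 / (window + 1); shift(1) so bar i sees only earlier prices
    ema = pd.Series(p).ewm(span=window, adjust=False).mean().shift(1)
    ema.iloc[:window] = np.nan  # warm-up period, same length as in SMA()
    ema = ema.to_numpy()
    long_ok = signal_type in ("buy only", "both")
    short_ok = signal_type in ("sell only", "both")
    for i in range(len(p) - 1):  # like SMA(), the last bar never gets a signal
        if np.isnan(ema[i]):
            continue
        if ema[i] < p[i] and long_ok:
            signals[i] = 1
        elif ema[i] > p[i] and short_ok:
            signals[i] = -1
    return signals


# Steadily rising prices: the lagging EMA stays below the price after warm-up.
rising = np.arange(1.0, 31.0)
print(EMA(rising, window=10))
```

Because the signature matches SMA(), it could be swapped into the daily loop in place of backtest.SMA, if the backtest module is extended with it.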
I have two solutions to this. One is to loop through each group and add the relevant data to the summary_table; the other is to calculate the whole series at once and assign it to the RSI column.
I first recreated the data:
import yfinance
import pandas as pd
# initially created similar data through yfinance,
# then copied this to Excel and changed the Datetime column to match yours.
df = yfinance.download("AAPL", period="60d", interval="5m")
# copied it and read it as a dataframe
df = pd.read_clipboard(sep=r'\s{2,}', engine="python")
df.head()
# Datetime Open High Low Close Adj Close Volume
#0 2022-03-03 09:30-05:00 168.470001 168.910004 167.970001 168.199905 168.199905 5374241
#1 2022-03-03 09:35-05:00 168.199997 168.289993 167.550003 168.129898 168.129898 1936734
#2 2022-03-03 09:40-05:00 168.119995 168.250000 167.740005 167.770004 167.770004 1198687
#3 2022-03-03 09:45-05:00 167.770004 168.339996 167.589996 167.718094 167.718094 2128957
#4 2022-03-03 09:50-05:00 167.729996 167.970001 167.619995 167.710007 167.710007 968410
Then I formatted the data and created the summary_table:
df["date"] = pd.to_datetime(df["Datetime"].str[:16], format="%Y-%m-%d %H:%M").dt.date
# calculate percentage change from open and close of each row
df["gain"] = (df["Close"] / df["Open"]) - 1
# your summary table, slightly changing the index to use the dates above
cols = ['Num. Obs.', 'Num. Trade', 'PnL', 'Win. Ratio','RSI'] #add additional fields if necessary
summary_table = pd.DataFrame(index=df["date"].unique(), columns=cols)
Option 1:
# loop through each group, calculate the average gain and loss, then RSI
for grp, data in df.groupby("date"):
    # average gain for gain greater than 0
    average_gain = data[data["gain"] > 0]["gain"].mean()
    # average loss for gain less than 0
    average_loss = data[data["gain"] < 0]["gain"].mean()
    # add to relevant cell of summary_table (single .loc call avoids chained assignment)
    summary_table.loc[grp, "RSI"] = 100 - (100 / (1 + (average_gain / average_loss)))
Option 2:
# define a function to apply in the groupby
def rsi_calc(series):
    avg_gain = series[series > 0].mean()
    avg_loss = series[series < 0].mean()
    return 100 - (100 / (1 + (avg_gain / avg_loss)))
summary_table["RSI"] = df.groupby("date")["gain"].apply(rsi_calc)
Output (same for each):
summary_table.head()
# Num. Obs. Num. Trade PnL Win. Ratio RSI
#2022-03-03 NaN NaN NaN NaN -981.214015
#2022-03-04 NaN NaN NaN NaN 501.950956
#2022-03-07 NaN NaN NaN NaN -228.379066
#2022-03-08 NaN NaN NaN NaN -2304.451654
#2022-03-09 NaN NaN NaN NaN -689.824739
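Note that the values above fall outside the 0 to 100 range that RSI normally occupies. A likely cause is that the average loss enters the formula as a negative number, which makes RS negative. The sketch below is a variant that uses the magnitude of the average loss instead. The function name `rsi_calc_abs`, the handling of days with no up or no down bars, and the toy data are all assumptions for illustration. It keeps the answer's per-day simple averages and is not the smoothed multi-period RSI used by most charting tools.

```python
import pandas as pd


def rsi_calc_abs(gains):
    """RSI from one day's per-bar gains, using the magnitude of the average loss."""
    avg_gain = gains[gains > 0].mean()
    avg_loss = gains[gains < 0].abs().mean()
    if pd.isna(avg_gain) and pd.isna(avg_loss):
        return float("nan")  # no up or down bars at all
    if pd.isna(avg_loss):
        return 100.0  # no down bars: RS is unbounded, RSI saturates at 100
    if pd.isna(avg_gain):
        return 0.0  # no up bars: RS is 0
    return 100 - 100 / (1 + avg_gain / avg_loss)


# Toy data standing in for the "date" and "gain" columns built above.
toy = pd.DataFrame(
    {
        "date": ["2022-03-03"] * 4 + ["2022-03-04"] * 3,
        "gain": [0.01, -0.02, 0.03, -0.01, 0.02, 0.01, 0.005],
    }
)
rsi = toy.groupby("date")["gain"].apply(rsi_calc_abs)
print(rsi)
```

With the real data, the same function could be passed to df.groupby("date")["gain"].apply(...) in place of rsi_calc.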
I am using yahoo finance in python and when I run the following code:
print(apple.history('max'))
It gives me this output:
Open High ... Dividends Stock Splits
Date ...
1980-12-12 0.100323 0.100759 ... 0.0 0.0
1980-12-15 0.095525 0.095525 ... 0.0 0.0
How do I get the output to show the Low price, Close price, and Volume for each date as many sites show it does? It only shows me the 3 dots in between High and Dividends.
ticker.history() returns a Pandas DataFrame. You can access any column by its name, e.g. 'Low'. A primer on indexing and selecting data can be found in the docs.
By default, the number of rows and columns shown when printing a DataFrame is limited. However, you can disable this limit.
import yfinance as yf
import pandas as pd
apple = yf.Ticker('AAPL')
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(apple.history('max')[['Low', 'Close', 'Volume']])
    # or: display(...) if working with jupyter
I have a dataframe df:
Open Volume Adj Close Ticker
Date
2006-11-18 140.750000 45505300 114.480649 SPY
2006-11-18 100.470001 274000 72.382071 AGG
2006-11-19 140.750000 45505300 114.480649 SPY
2006-11-19 100.470001 274000 72.382071 AGG
2006-11-22 140.750000 45505300 114.480649 SPY
2006-11-22 100.470001 274000 72.382071 AGG
I use this command to select the row I want, where today is "2006-11-22":
df[df.index==today]
But what I actually want is the row for 2006-11-19 (the previous trade day).
I don't know in advance that the previous trade day is "2006-11-19".
I only know that today is "2006-11-22".
How do I write this code?
Thank you very much.
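One way this could be approached, sketched under the assumption that the index is a DatetimeIndex and that every trade day appears in it: take the largest index value strictly earlier than today, then select the rows for that date. The frame below is rebuilt from the sample in the question, and the variable names are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {
        "Open": [140.75, 100.470001] * 3,
        "Volume": [45505300, 274000] * 3,
        "Adj Close": [114.480649, 72.382071] * 3,
        "Ticker": ["SPY", "AGG"] * 3,
    },
    index=pd.to_datetime(
        ["2006-11-18", "2006-11-18", "2006-11-19",
         "2006-11-19", "2006-11-22", "2006-11-22"]
    ),
)
df.index.name = "Date"

today = pd.Timestamp("2006-11-22")

# All index dates strictly before today; the latest one is the previous trade day.
earlier = df.index[df.index < today]
previous_day = earlier.max() if len(earlier) else None

# Rows for the previous trade day (empty frame if there is none).
previous_rows = df[df.index == previous_day] if previous_day is not None else df.iloc[0:0]
print(previous_day)
print(previous_rows)
```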