creating signals based on current and prior time periods - python

I'm trying to write a trading algo and I am very new to Python.
Lots of things are easy to understand, but I get lost easily. I have a strategy I want to use, but the coding is getting in the way.
I want to create two moving averages, and when they cross I want that to be a signal.
The part I am currently struggling with is including information about the prior period.
When
MovingAverage1( last 10 candles ) == MovingAverage2( last 20 candles ),
that's a signal,
but is it a buy or a sell?
When
MovingAverage1( last 10 candles after skipping most recent ) > MovingAverage2( last 10 candles after skipping most recent )
then sell.
Here is what I've got so far, where the MAs I am using are simplified for this question:
class MyMACrossStrategy (Strategy):
    """
    Requires:
    symbol - A stock symbol on which to form a strategy on.
    bars - A DataFrame of bars for the above symbol.
    short_window - Lookback period for short moving average.
    long_window - Lookback period for long moving average."""

    def __init__(self, symbol, bars, short_window=4, long_window=9):
        self.symbol = symbol
        self.bars = bars
        self.short_window = short_window
        self.long_window = long_window

    # Function Helper for indicators
    def fill_for_noncomputable_vals(input_data, result_data):
        non_computable_values = np.repeat(
            np.nan, len(input_data) - len(result_data)
        )
        filled_result_data = np.append(non_computable_values, result_data)
        return filled_result_data

    def simple_moving_average(data, period):
        """
        Simple Moving Average.
        Formula:
        SUM(data / N)
        """
        catch_errors.check_for_period_error(data, period)
        # Mean of Empty Slice RuntimeWarning doesn't affect output so it is
        # suppressed
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", category=RuntimeWarning)
            sma = [np.mean(data[idx-(period-1):idx+1]) for idx in range(0, len(data))]
        sma = fill_for_noncomputable_vals(data, sma)
        return sma

    def hull_moving_average(data, period):
        """
        Hull Moving Average.
        Formula:
        HMA = WMA(2*WMA(n/2) - WMA(n)), sqrt(n)
        """
        catch_errors.check_for_period_error(data, period)
        hma = wma(
            2 * wma(data, int(period/2)) - wma(data, period), int(np.sqrt(period))
        )
        return hma

    def generate_signals(self):
        """Returns the DataFrame of symbols containing the signals
        to go long, short or hold (1, -1 or 0)."""
        signals = pd.DataFrame(index=self.bars.index)
        signals['signal'] = 0.0
        # Create the set of moving averages over the
        # respective periods
        signals['Fast_Line'] = sma(bars['Close'], self.short_window)
        signals['Slow_line'] = hma(bars['Close'], self.long_window)
        signals1['Fast_Line'] = sma(bars['Close'], self.short_window[-1])
        signals1['Slow_line'] = hma(bars['Close'], self.long_window[-1])
        # Create a 'signal' (invested or not invested) when the short moving average crosses the long
        # moving average, but only for the period greater than the shortest moving average window
        signals['signal'][self.short_window:] = np.where(signals['Fast_Line'][self.short_window:]
            > signals['Slow_line'][self.short_window:], 1.0, 0.0)
        # Take the difference of the signals in order to generate actual trading orders
        signals['positions'] = signals['signal'].diff()
        if signals['Fast_Line'] = signals['Slow_Line'] and ...
        return signals
Hopefully my question makes sense.

I am assuming that you want to test your strategy first before using it in a live market. You can download the stock data from Yahoo Finance in CSV format and load it with the code below:
import pandas as pd
import numpy as np
data = pd.read_csv('MSFT.csv')
Once the data is stored in the pandas DataFrame data, you can compute moving averages of the closing price with the following code,
if you are planning the crossover strategy:
sma_days=20
lma_days=50
data['SMA_20']=data['Close'].rolling(window=sma_days,center=False).mean()
data['SMA_50']=data['Close'].rolling(window=lma_days,center=False).mean()
data['SIGNAL']=np.where(data['SMA_20']>data['SMA_50'],'BUY','SELL')
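Note that the SIGNAL column above marks which average is on top in every row, not the moment they cross. To bring in the prior period that the question asks about, one option is to compare each row with the previous one via shift. The following is only a sketch under my own assumptions: the function name crossover_signals, the column names fast/slow/signal, and the 1/-1/0 encoding are made up for illustration, and plain rolling means stand in for whatever averages you actually use.

```python
import numpy as np
import pandas as pd

def crossover_signals(close, short_window=10, long_window=20):
    """Hypothetical helper: 1 where the fast average crosses above the slow one,
    -1 where it crosses below, 0 otherwise."""
    out = pd.DataFrame({"Close": close})
    out["fast"] = out["Close"].rolling(short_window).mean()
    out["slow"] = out["Close"].rolling(long_window).mean()

    # State in the current period and in the prior period
    above = out["fast"] > out["slow"]
    prev_above = above.shift(1, fill_value=False)

    # Only trust rows where both the current and the prior slow average exist
    valid = out["slow"].notna() & out["slow"].shift(1).notna()

    out["signal"] = 0
    out.loc[valid & above & ~prev_above, "signal"] = 1    # crossed up: buy
    out.loc[valid & ~above & prev_above, "signal"] = -1   # crossed down: sell
    return out
```

Testing for exact equality of the two averages (==) will almost never fire on real prices, which is why the sketch looks for a change in which line is on top between the prior row and the current row.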


Calculating EMA from SMA in python

I've defined a function that calculates SMA(Simple Moving average), compares the value to the closing price 'p' and gives a signal as to whether we should buy or sell, represented by a 1 or -1 respectively.
I now want to create a function for EMA (Exponential Moving Average). The formula for EMA is EMA = Closing price x multiplier + EMA (previous day) x (1-multiplier) where the multiplier is [2 ÷ (number of observations + 1)].
However, I'm not sure how to do this in Spyder/Python.
My code for SMA is:
def SMA(p,window=10,signal_type='buy only'):
    #input price "p", look-back window "window",
    #signal type = buy only (default) --gives long signals, sell only --gives sell signals, both --gives both long and short signals
    #return a list of signals = 1 for long position and -1 for short position
    signals = np.zeros(len(p))
    if len(p)<window:
        #no signal if no sufficient data
        return signals
    sma = list(np.zeros(window)+np.nan) #the first few prices do not give technical indicator values
    sma += [np.average(p[k:k+window]) for k in np.arange(len(p)-window)]
    for i in np.arange(len(p)-1):
        if np.isnan(sma[i]):
            continue #skip the open market time window
        if sma[i]<p[i] and (signal_type=='buy only' or signal_type=='both'):
            signals[i] = 1
        elif sma[i]>p[i] and (signal_type=='sell only' or signal_type=='both'):
            signals[i] = -1
    return signals
This gets called in another file like so. I want to be able to just change 'backtest.SMA' to 'backtest.EMA' so I can see the signals that EMA produces.
signals = backtest.SMA(this_dat['Close'].values, window=10)
('this_dat['Close'].values' is a list of closing price values taken from a table)
I defined EMA in the same way I defined SMA, but I'm not sure how to obtain EMA:
def EMA(p,window=10,signal_type='buy only'):
    signals = np.zeros(len(p))
    if len(p)<window:
        #no signal if no sufficient data
        return signals
    multiplier=(2/(len(p)+1))
    ema = list(np.zeros(window)+np.nan)
    #ema = Closing price x multiplier + EMA (previous day) x (1-multiplier)#
    ema +=
    for i in np.arange(len(p)-1):
        if np.isnan(ema[i]):
            continue #skip the open market time window
        if ema[i]<p[i] and (signal_type=='buy only' or signal_type=='both'):
            signals[i] = 1
        elif ema[i]>p[i] and (signal_type=='sell only' or signal_type=='both'):
            signals[i] = -1
    return signals
I need help finding a way to write that formula for EMA in Python; I can't figure it out. If anyone knows a package I can use that might work, then I'm also open to trying it, but I would prefer to compute EMA using my SMA function as a template.
Any help would be greatly appreciated, thanks.
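One way the recursion from the formula could be written, keeping the same signature as SMA, is sketched below. A few things here are assumptions on my part and not from the question: the helper name ema_values is made up, the multiplier uses the look-back window (2 / (window + 1)) where the attempt above uses len(p), the first EMA value is seeded with the plain average of the first window prices, and each EMA value is compared with the price on the same bar.

```python
import numpy as np

def ema_values(p, window=10):
    """Hypothetical helper: EMA series, NaN until a full window is available."""
    p = np.asarray(p, dtype=float)
    ema = np.full(len(p), np.nan)
    if len(p) < window:
        return ema
    multiplier = 2.0 / (window + 1)        # assumption: based on the window, not len(p)
    ema[window - 1] = p[:window].mean()    # assumption: seed with the SMA of the first window
    for i in range(window, len(p)):
        # EMA = Closing price x multiplier + EMA (previous day) x (1 - multiplier)
        ema[i] = p[i] * multiplier + ema[i - 1] * (1 - multiplier)
    return ema

def EMA(p, window=10, signal_type='buy only'):
    """Same interface as SMA: 1 for a long signal, -1 for a short signal, 0 otherwise."""
    p = np.asarray(p, dtype=float)
    signals = np.zeros(len(p))
    ema = ema_values(p, window)
    for i in range(len(p)):
        if np.isnan(ema[i]):
            continue  # not enough history yet
        if ema[i] < p[i] and signal_type in ('buy only', 'both'):
            signals[i] = 1
        elif ema[i] > p[i] and signal_type in ('sell only', 'both'):
            signals[i] = -1
    return signals
```

If a package is acceptable, pandas also has Series.ewm(span=window, adjust=False).mean(), which applies the same recursion but seeds it with the first price, so its early values will differ from the sketch.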

How to smoothen the value of Stochastic Oscillator from (14,1,3) to (14,3,3) in Numpy/Pandas?

I have code that reproduces exactly the values shown on the TradingView website for any stock, for a Stochastic Oscillator with settings (14,1,3). What would I have to do to smooth it to (14,3,3)?
This is the blog which uses the same idea I am talking about, and below is my code:
df.sort_index(ascending=False,inplace=True) #My stock is Newest First order
k_period = 14
d_period = 3
LOW,HIGH,CLOSE = "LOW", "HIGH", "CLOSE" # Column names
# Adds a "n_high" column with max value of previous 14 periods
df['n_high'] = df[HIGH].rolling(k_period).max()
# Adds an "n_low" column with min value of previous 14 periods
df['n_low'] = df[LOW].rolling(k_period).min()
# Uses the min/max values to calculate the %k (as a percentage)
df['%K'] = (df[CLOSE] - df['n_low']) * 100 / (df['n_high'] - df['n_low'])
# Uses the %k to calculates a SMA over the past 3 values of %k
df['%D'] = df['%K'].rolling(d_period).mean()
Found the solution. It was a silly adjustment: you need to apply a rolling mean (.rolling(smooth_k).mean()) to the Blue Line as well. Here is the adjusted code.
def Stochastic(data, k_period:int = 14, d_period:int = 3, smooth_k = 3, names:tuple = ('OPEN','CLOSE','LOW','HIGH'),return_df:bool=False):
    '''
    Implementation of the Stochastic Oscillator. Returns the Fast and Slow lines values or the whole DataFrame
    args:
        data: Pandas Dataframe of the stock
        k_period: Period for the %K /Fast / Blue line
        d_period: Period for the %D / Slow /Red / Signal Line
        smooth_k: Smoothening the Fast line value. With increase/ decrease in number, it becomes the Fast or Slow Stochastic
        names: Names of the columns which contains the corresponding values
        return_df: Whether to return the DataFrame or the Values
    out:
        Returns either the Array containing (fast_line,slow_line) values or the entire DataFrame
    '''
    OPEN, CLOSE, LOW, HIGH = names
    df = data.copy()
    if df.iloc[0,0] > df.iloc[1,0]: # if the first Date entry [0,0] is > previous data entry [1,0] then it is in descending order, then reverse it for calculation
        df.sort_index(ascending=False, inplace = True)
    # Adds a "n_high" column with max value of previous 14 periods
    df['n_high'] = df[HIGH].rolling(k_period).max()
    # Adds an "n_low" column with min value of previous 14 periods
    df['n_low'] = df[LOW].rolling(k_period).min()
    # Uses the min/max values to calculate the %k (as a percentage)
    df['Blue Line'] = (df[CLOSE] - df['n_low']) * 100 / (df['n_high'] - df['n_low']) # %K or so-called Fast Line
    if smooth_k > 1: # Smoothen the fast, blue line
        df['Blue Line'] = df['Blue Line'].rolling(smooth_k).mean()
    # Uses the %k to calculate a SMA over the past 3 values of %k
    df['Red Line'] = df['Blue Line'].rolling(d_period).mean() # %D or so-called Slow Line
    df.drop(['n_high','n_low'],inplace=True,axis=1)
    df.sort_index(ascending = True, inplace = True)
    if return_df:
        return df
    return df.iloc[0,-2:] # Fast and Slow line values

How to slice on DateTime objects more efficiently and compute a given statistic at each iteration?

I am dealing with a pandas dataframe where the index is a DateTime object and the columns represent minute-by-minute returns on several stocks from the SP500 index, together with a column of returns from the index. It's fairly long (100 stocks, 1510 trading days, minute-by-minute data each day) and looks like this (only three stocks for the sake of example):
DateTime SPY AAPL AMZN T
2014-01-02 9:30 0.032 -0.01 0.164 0.007
2014-01-02 9:31 -0.012 0.02 0.001 -0.004
2014-01-02 9:32 -0.015 0.031 0.004 -0.001
I am trying to compute the betas of each stock for each different day and for each 30-minute window. The beta of a stock in this case is defined as the covariance between its returns and the SPY returns divided by the variance of SPY in the same period. My desired output is a 3-dimensional numpy array beta_HF where beta_HF[s, i, j], for instance, means the beta of stock s at day i at window j. At this moment, I am computing the betas in the following way (let returns be full dataframe):
trading_days = pd.unique(returns.index.date)
window = "30min"
moments = pd.date_range(start = "9:30", end = "16:00", freq = window).time
def dispersion(trading_days, moments, df, verbose = True):
    index = 'SPY'
    beta_HF = np.zeros((df.shape[1] - 1, len(trading_days), len(moments) - 1))
    for i, day in enumerate(trading_days):
        daily_data = df[df.index.date == day]
        start_time = dt.time(9,30)
        for j, end_time in enumerate(moments[1:]):
            moment_data = daily_data.between_time(start_time, end_time)
            covariances = np.array([moment_data[index].cov(moment_data[symbol]) for symbol in df])
            beta_HF[:, i,j] = covariances[1:]/covariances[0]
        if verbose == True:
            if np.remainder(i, 100) == 0:
                print("Current Trading Day: {}".format(day))
    return(beta_HF)
The dispersion() function generates the correct output. However, I understand that I am looping over long iterables and this is not very efficient. I seek a more efficient way to "slice" the dataframe at each 30-minute window for each day in the sample and compute the covariances. Effectively, for each slice, I need to compute 101 numbers (100 covariances + 1 variance). On my local machine (a 2013 Retina i5 Macbook Pro) it's taking around 8 minutes to compute everything. I tested it on a research server of my university and the computing time was basically the same, which probably implies that computing power is not the bottleneck but my code has low quality in this part. I would appreciate any ideas on how to make this faster.
One might point out that parallelization is the way to go here since the elements in beta_HF never interact with each other. So this seems to be easy to parallelize. However, I have never implemented anything with parallelization so I am very new to these concepts. Any ideas on how to make the code run faster? Thanks a lot!
You can use pandas Grouper to group your data by frequency. The only drawbacks are that you cannot have overlapping windows and that it will iterate over times that do not exist in your data.
The first issue basically means that the window will slide from 9:30-9:59 to 10:00-10:29 instead of 9:30-10:00 to 10:00-10:30.
The second issue comes into play during holidays and at night, when no trading takes place. Hence, if you have a long period without trading, you might want to split the DataFrame and combine the pieces afterwards.
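To see the first point concretely, here is a small self-contained check of how Grouper bins a single trading day of minute data. The frame demo and its contents are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# One trading day of minute timestamps, 09:30 through 16:00 inclusive
idx = pd.date_range("2014-01-02 09:30", "2014-01-02 16:00", freq="min")
demo = pd.DataFrame({"SPY": np.arange(len(idx), dtype=float)}, index=idx)

# Number of rows that fall into each 30-minute bin
counts = demo.groupby(pd.Grouper(freq="30min")).size()

# Rows belonging to the first bin: everything before the second bin's label
first_bin = demo.loc[demo.index < counts.index[1]]

print(counts.index[0], "to", first_bin.index[-1])  # first bin starts 09:30, ends 09:59
print(counts.iloc[-1])                             # the 16:00 bin holds a single row
```

Each bin is closed on the left, so 10:00 belongs to the second window, and the last bin of the day contains only the 16:00 observation.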
Create example data
import pandas as pd
import numpy as np
time = pd.date_range(start="2014-01-02 09:30",
end="2014-01-02 16:00", freq="min")
time = time.append( pd.date_range(start="2014-01-03 09:30",
end="2014-01-03 16:00", freq="min") )
df = pd.DataFrame(data=np.random.rand(time.shape[0], 4)-0.5,
index=time, columns=['SPY','AAPL','AMZN','T'])
define the range you want to use
freq = '30min'
obs_per_day = len(pd.date_range(start = "9:30", end = "16:00", freq = "30min"))
trading_days = len(pd.unique(df.index.date))
make a function to calculate the beta values
def beta(df):
    if df.empty: # returns nan when no trading takes place
        return np.nan
    mat = df.to_numpy() # numpy is faster than pandas
    m = mat.mean(axis=0)
    mat = mat - m[np.newaxis,:] # demean
    dof = mat.shape[0] - 1 # degrees of freedom
    if dof != 0: # check if your data has more than one observation
        mat = mat.T.dot(mat[:,0]) / dof # covariance with first column
        return mat[1:] / mat[0] # beta
    else:
        return np.zeros(mat.shape[1] - 1) # return zeros for too short data e.g. 16:00
and in the end use pd.groupby().apply()
res = df.groupby(pd.Grouper(freq=freq)).apply(beta)
res = np.array( [k for k in res.values if ~np.isnan(k).any()] ) # remove NaN
res = res.reshape([trading_days, obs_per_day, df.shape[1]-1])
Note that the result is in a slightly different shape than yours.
The results also differ a bit because of the different window sliding. To check whether the results are the same, simply try something like this:
trading_days = pd.unique(df.index.date)
# Your result
moments1 = pd.date_range(start = "9:30", end = "10:00", freq = "30min").time
beta(df[df.index.date == trading_days[0]].between_time(moments1[0], moments1[1]))
# mine
moments2 = pd.date_range(start = "9:30", end = "10:00", freq = "29min").time
beta(df[df.index.date == trading_days[0]].between_time(moments2[0], moments2[1]))

Python - path dependent simulation

I've set up a simulation example below.
Setup:
I have weekly data, say 6 years of data, with around 1000 stocks each week (some weeks more, other weeks fewer than 1000). I randomly choose 75 stocks at time t0. At t1 some stocks die (with probability p, they go out of fashion) or leave the index (for structural reasons such as a merger). I need to simulate stocks so that every week I have exactly 75 stocks. Every week some stocks die (between 0 and 75) and I pick new ones that are not among the existing 75. I also check whether a stock leaves due to structural reasons. Every week I calculate the returns of the 75 stocks.
Questions: Is there an obvious way to improve the speed? I started with pandas objects (group, sort), which was too slow. I haven't tried to parallelize the loop. I'm more interested to hear whether I should use numba (but it doesn't have the np.in1d function) or whether there is a faster way to shuffle (I actually only need to shuffle the ones). I've also thought about creating a fixed array with all stock ids using NaN; the problem here is that I need 75 names, so I would still need to filter out these NaNs every week.
Maybe this is too detailed a problem for this forum; I apologize if that's the case.
Code:
from timeit import default_timer
import numpy as np
# Create dataset
n_weeks = 312 # Approximately 6 years of weekly data
n_stocks = np.random.normal(1000, 5, n_weeks).astype(dtype=np.uint16) # Around 1000 stocks every week but not fixed
idx_new_week = np.cumsum(np.hstack((0, n_stocks)))
# We give each stock a stock id
n_obs = n_stocks.sum()
stock_id = np.ones([n_obs], dtype=np.uint16)
for j in range(1, n_weeks+1):
    stock_id[idx_new_week[j-1]:idx_new_week[j]] = np.cumsum(np.ones(n_stocks[j-1]))
stock_rtn = np.random.normal(0, 0.25/np.sqrt(52), n_obs) # Simulated forward (one week ahead) return for each stock
# Simulation part
# Week 0 pick randomly 75 stocks
# Week n >=1 a stock dies for two reasons
# 1) randomness (probability 'p')
# 2) structural event (could be merger, fall out of index).
# We cannot assume that it is always the high stockid which dies for structural reasons (as it looks like here)
# If a stock dies we randomly pick a stock from the "deak" stock dataset (not including the ones which die this week)
n_sim = 100 # I want this to be 1 mill
n_stock_cand = 75 # For this example we pick 75 stocks
p_survial = 0.90
# The weekly periodical returns
pf_rtn = np.zeros([n_weeks, n_sim])
start = default_timer()
for k in range(0, n_sim):
    # Randomly choose n_stock_cand at time zero
    boolean_list = np.array([False] * (n_stocks[0] - n_stock_cand) + [True] * n_stock_cand)
    np.random.shuffle(boolean_list) # Shuffle the list
    stock_id_this_week = stock_id[idx_new_week[0]:idx_new_week[1]][boolean_list]
    stock_rtn_this_week = stock_rtn[idx_new_week[0]:idx_new_week[1]][boolean_list]
    # This part only simulates the Buzz portfolio names - later we simulate returns and from specific holdings of the 75 names
    for j in range(1, n_weeks):
        pf_rtn[j-1, k] = stock_rtn_this_week.mean()
        # Find the number of stocks to keep
        boolean_keep_stocks = np.random.rand(n_stock_cand) < p_survial
        # Next we need to check if a stock is still part of the universe next period
        stock_cand_temp = stock_id[idx_new_week[j-1]:idx_new_week[j]]
        stock_rtn_temp = stock_rtn[idx_new_week[j-1]:idx_new_week[j]]
        boolean_keep_stocks = (boolean_keep_stocks) & (np.in1d(stock_id_this_week, stock_cand_temp, assume_unique=True))
        n_stocks_to_replace = n_stock_cand - boolean_keep_stocks.sum() # Number of new stocks to pick this week
        if n_stocks_to_replace > 0:
            # We have to pick from stocks which are not part of the portfolio already
            boolean_cand = np.in1d(stock_cand_temp, stock_id_this_week, assume_unique=True, invert=True)
            n_stocks_to_pick_from = boolean_cand.sum()
            boolean_list = np.array([False] * (n_stocks_to_pick_from - n_stocks_to_replace) + [True] * n_stocks_to_replace)
            np.random.shuffle(boolean_list) # Shuffle the list
            # First avoid picking the same stock twice, next pick from the unique candidate list
            stock_id_new = stock_cand_temp[boolean_cand][boolean_list] # The new stocks
            stock_rtn_new = stock_rtn_temp[boolean_cand][boolean_list] # and their returns
            stock_id_this_week = np.hstack((stock_id_this_week[boolean_keep_stocks], stock_id_new))
            stock_rtn_this_week = np.hstack((stock_rtn_this_week[boolean_keep_stocks], stock_rtn_new))
        else:
            # No replacement of stocks / all survive but order might differ
            boolean_cand = np.in1d(stock_cand_temp, stock_id_this_week, assume_unique=True, invert=False)
            stock_id_this_week = stock_cand_temp[boolean_cand]
            stock_rtn_this_week = stock_rtn_temp[boolean_cand]
    # PnL last period
    pf_rtn[n_weeks-1, k] = stock_rtn_this_week.mean()
print(default_timer() - start)
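On the "faster way to shuffle" part: one idea that might be worth timing is to draw the replacement names directly with a sample without replacement, so no boolean list of roughly 1000 entries has to be built and shuffled each week. The sketch below is only an illustration under my own assumptions: the function name replace_dead is hypothetical, it tracks stock ids only (returns would need the same indexing), and I have not measured whether it is actually faster on the data above.

```python
import numpy as np

def replace_dead(held_ids, universe_ids, p_survival, rng):
    """Hypothetical helper: keep survivors, then top up to the original count
    with names drawn from the universe that are not currently held."""
    # Random deaths, plus structural exits (no longer in this week's universe)
    keep = rng.random(held_ids.size) < p_survival
    keep &= np.isin(held_ids, universe_ids, assume_unique=True)
    survivors = held_ids[keep]

    # Candidates exclude every currently held name, including this week's deaths
    pool = np.setdiff1d(universe_ids, held_ids, assume_unique=True)
    n_new = held_ids.size - survivors.size
    new_ids = rng.choice(pool, size=n_new, replace=False)  # distinct draws
    return np.concatenate((survivors, new_ids))

rng = np.random.default_rng(0)
universe = np.arange(1, 1001, dtype=np.uint16)              # this week's stock ids
portfolio = rng.choice(universe, size=75, replace=False)    # week-0 pick
portfolio = replace_dead(portfolio, universe, 0.90, rng)
print(portfolio.size)
```

Here np.isin plays the role of np.in1d, and Generator.choice with replace=False guarantees the new names are distinct.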

How to price a SimpleCashFlow

I would like to use QuantLib to price a portfolio of liabilities, which are modeled to be deterministic future cash-flows. I am now modelling them as a strip of FixedRateBonds with zero coupons, which seems like a very inelegant solution.
Problem:
Question 1: Is there a way to create an 'Instrument' that is just a 'SimpleCashFlow', 'Redemption' etc. and price it on a discount curve?
Question 2: Is it possible to construct a 'CashFlows' object or Instrument from multiple SimpleCashFlow's and price it on a curve?
Many thanks in advance
Code Example:
See code below for an example of what I am trying to do.
from QuantLib import *
# set params
calc_date = Date(30, 3, 2017)
risk_free_rate = 0.01
discount_curve = YieldTermStructureHandle(
FlatForward(calc_date, risk_free_rate, ActualActual()))
bond_engine = DiscountingBondEngine(discount_curve)
# characteristics of the cash-flow that I am trying to NPV
paymentdate = Date(30, 3, 2018)
paymentamount = 1000
# this works: pricing a fixed rate bond with no coupons
schedule = Schedule(paymentdate-1, paymentdate, Period(Annual), TARGET(),
Unadjusted, Unadjusted, DateGeneration.Backward, False)
fixed_rate_bond = FixedRateBond(0, paymentamount, schedule, [0.0],ActualActual())
bond_engine = DiscountingBondEngine(discount_curve)
fixed_rate_bond.setPricingEngine(bond_engine)
print(fixed_rate_bond.NPV())
# create a simple cashflow
simple_cash_flow = SimpleCashFlow(paymentamount, paymentdate)
# Q1: how to create instrument, set pricing engine and price a SimpleCashFlow?
#wrongcode:# simple_cash_flow.setPricingEngine(bond_engine)
#wrongcode:# print(simple_cash_flow.NPV())
# Q2: can I stick multiple cashflows into a single instrument, e.g.:
# how do I construct and price a CashFlows object from multiple 'SimpleCashFlow's?
simple_cash_flow2 = SimpleCashFlow(paymentamount, Date(30, 3, 2019))
#wrongcode:# cashflows_multiple = CashFlows([simple_cash_flow, simple_cash_flow2])
#wrongcode:# cashflows_multiple.setPricingEngine(bond_engine)
#wrongcode:# print(cashflows_multiple.NPV())
There are a couple of possible approaches. If you want to use an instrument, you can use a ZeroCouponBond instead of the fixed-rate one you're currently using:
bond = ZeroCouponBond(0, TARGET(), paymentamount, paymentdate)
bond.setPricingEngine(bond_engine)
print(bond.NPV())
Using an instrument will give you notifications and recalculation if the discount curve were to change, but might be overkill if you want a single pricing. In that case, you might work directly with the cashflows by using the methods of the CashFlows class:
cf = SimpleCashFlow(paymentamount, paymentdate)
print(CashFlows.npv([cf], discount_curve, True))
where the last parameter is True if you want to include any cashflow happening on today's date and False otherwise (note that this will give you a result a bit different from your calculation; that's because the payment date you used is a TARGET holiday, and the FixedRateBond constructor adjusts it to the next business day).
The above also works with several cash flows:
cfs = [SimpleCashFlow(paymentamount, paymentdate),
SimpleCashFlow(paymentamount*0.5, paymentdate+180),
SimpleCashFlow(paymentamount*2, paymentdate+360)]
print(CashFlows.npv(cfs, discount_curve, True))
Finally, if you want to do the same with an instrument, you can use the base Bond class and pass the cashflows directly:
custom_bond = Bond(0, TARGET(), 100.0, Date(), Date(), cfs)
custom_bond.setPricingEngine(bond_engine)
print(custom_bond.NPV())
This works but is kind of a kludge: the bond uses the passed cashflows directly and ignores the passed face amount and maturity date.
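As a rough sanity check on the numbers printed above, the same discounting can be approximated by hand without QuantLib. This sketch rests on assumptions of mine: it uses continuous compounding (which I believe is what FlatForward defaults to), it approximates the year fraction as days/365 where the curve above uses ActualActual, and the function name npv_flat_continuous is made up. Small differences from the QuantLib output are therefore to be expected.

```python
import math
from datetime import date

def npv_flat_continuous(cashflows, valuation_date, rate):
    """Hypothetical helper: discount (pay_date, amount) pairs on a flat,
    continuously compounded rate, using a simplified days/365 year fraction."""
    total = 0.0
    for pay_date, amount in cashflows:
        if pay_date < valuation_date:
            continue  # skip cashflows that have already been paid
        t = (pay_date - valuation_date).days / 365.0
        total += amount * math.exp(-rate * t)
    return total

calc_date = date(2017, 3, 30)
flows = [(date(2018, 3, 30), 1000.0)]
print(npv_flat_continuous(flows, calc_date, 0.01))
```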
