Getting my def function to apply.() to my stocks - python

I'm having trouble getting my code to work. I'm coding Python in a backtesting environment called "Quantopian". Regardless, the .apply(), Series, pd, or whatever terminology is beyond my skill level (assuming I'm even on the right track, lol).
What I'm trying to accomplish:
Taking a couple stocks and constantly calculating the MACD. Then when the indicator meets a certain condition, the algo purchases or sells that specific stock.
What the MACD is simplistically:
A momentum indicator that looks at historical data, using 12, 26 and 9 day Exponential Moving Averages and comparing them with each other.
I've designed my own function, that's not my problem....
Help:
I'm trying to apply it to the pool of stocks in my universe to constantly calculate the MACD every minute.
Where I'm specifically confused:
I defined a MACD function but don't know how to get it to calculate every minute for whatever stocks are in my pool.
CODE:
import numpy as np
import math
import talib as ta
import pandas as pd
def initialize(context):
    set_commission(commission.PerTrade(cost=10))
    context.stocks = symbols('AAPL', 'GOOG_L')

def handle_data(context, data):
    for stock in context.stocks:
        prices_fast = data.history(context.stocks, "close", 390, "1m").resample("30min").dropna()
        prices_slow = data.history(context.stocks, "close", 390, "1m").resample("30min").dropna()
        prices_signal = data.history(context.stocks, "close", 390, "1m").resample("30min").dropna()
        curr_price = data.history(context.stocks, "price", 30, "1m").resample("30min")[-1:].dropna()
        series = pd.Series([stock]).dropna()
        macd = series.apply(MACD)
        macd_func = stock.apply(MACD)
        if macd_func[stock] > 0:
            order(stock, 1)
        print macd_func
        record(macd=macd_func[stock])
def MACD(prices_fast, prices_slow, prices_signal, curr_price):
    # Setting MACD Conditions:
    slow = 26
    fast = 12
    signal = 9
    # Calculating Averages:
    avg_fast = pd.rolling_sum(prices_fast[:fast], fast)[-1:] / fast
    avg_slow = pd.rolling_sum(prices_slow[:slow], slow)[-1:] / slow
    avg_signal = pd.rolling_sum(prices_signal[:signal], signal)[-1:] / signal
    # Calculating the Weighting Multipliers:
    A = 2 / (fast + 1)
    B = 2 / (slow + 1)
    C = 2 / (signal + 1)
    # Calculating the Exponential Moving Averages:
    EMA_fast = (curr_price * A) + [avg_fast * (1 - A)]
    EMA_slow = (curr_price * B) + [avg_slow * (1 - B)]
    EMA_signal = (curr_price * C) + [avg_signal * (1 - C)]
    # Calculating MACD Histogram:
    macd = EMA_fast - EMA_slow - EMA_signal
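(For reference, a rough, untested sketch of one way the per-stock wiring could look, using the already imported talib for the MACD math and assuming Quantopian's data.history accepts a single asset; the 30-minute resampling is left out for brevity.)

def handle_data(context, data):
    for stock in context.stocks:
        # Pull the last 390 one-minute closes for just this one stock.
        closes = data.history(stock, "close", 390, "1m").dropna()
        macd, macd_signal, macd_hist = ta.MACD(closes.values,
                                               fastperiod=12,
                                               slowperiod=26,
                                               signalperiod=9)
        latest_hist = macd_hist[-1]
        if np.isnan(latest_hist):
            continue  # not enough bars yet for a full 26/9 MACD
        record(macd=latest_hist)
        if latest_hist > 0:
            order(stock, 1)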
If someone could give me a handle, I would GREATLY appreciate it!
Thank you very VERY much,
Mike

Related

Does anyone know how to calculate the Coppock Curve in python

I'm currently trying to calculate the Coppock Curve for a strategy I'm making in Python.
I've written it like this (ROC1 is the 11 length and ROC2 is the 14 length):
final = wma_onehr*(rocOne_onehr+rocTwo_onehr)
I know my values are correct, but this is the only calculation for it and it does not match TradingView at all. For instance, when I run it I get
ROC1: -1.094
ROC2: -0.961
WMA: 7215.866
And my answer is -15037.864744
While TradingView is at -0.9
These values are nowhere near close, and I'm just wondering why I have not found a way to get a value like that of any kind. (I'm using the taapio API if anyone's wondering.)
Take a look at the function below. Note that the data_array passed to the function is a one-dimensional numpy array that contains the close prices of the financial asset.
import numpy as np
def coppock_curve(data_array, sht_roc_length=11, long_roc_length=14, curve_length=10):  # Coppock Curve
    """
    :param data_array: one-dimensional numpy array of close prices
    :param sht_roc_length: Short Rate of Change length
    :param long_roc_length: Long Rate of Change length
    :param curve_length: Coppock Curve Line length
    :return: Coppock oscillator values
    """
    data_array = data_array[-(curve_length + max(sht_roc_length, long_roc_length, curve_length) + 1):]
    # Calculation of short rate of change
    roc11 = (data_array[-(curve_length + 1):] - data_array[-(curve_length + sht_roc_length + 1):-sht_roc_length]) / \
        data_array[-(curve_length + sht_roc_length + 1):-sht_roc_length] * 100
    # Calculation of long rate of change
    roc14 = (data_array[-(curve_length + 1):] - data_array[:-long_roc_length]) / data_array[:-long_roc_length] * 100
    sum_values = roc11 + roc14
    # Calculation of the Coppock curve line (weighted moving average of the sum)
    curve = np.convolve(sum_values, np.arange(1, curve_length + 1, dtype=int)[::-1], 'valid') / \
        np.arange(1, curve_length + 1).sum()
    return curve
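For example, a quick call on synthetic prices, just to show how the function is invoked and what it returns:

import numpy as np

np.random.seed(1)
closes = 100 + np.cumsum(np.random.randn(300))  # made-up close prices
print(coppock_curve(closes))                    # the most recent Coppock curve values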

Fast EMA calculation on large dataset with irregular time intervals

I have data that has 800,000+ rows. I want to take an Exponential Moving Average (EMA) of one of the columns. The times are not evenly sampled and I want to decay the EMA on each update (row). The code I have is this:
window = 5
for i in range(1, len(series)):
    dt = series['datetime'][i] - series['datetime'][i - 1]
    decay = 1 - numpy.exp(-dt / window)
    result[i] = (1 - decay) * result[i - 1] + decay * series['midpoint'].iloc[i]
return pandas.Series(result, index=series.index)
The problem is, for 800,000 rows, this is very slow. Is there any way to optimize this using some other features of numpy? I can't vectorize it because result[i] is dependent on result[i-1].
sample data here:
Timestamp Midpoint
1559655000001096130 2769.125
1559655000001162260 2769.127
1559655000001171688 2769.154
1559655000001408734 2769.138
1559655000001424200 2769.123
1559655000001433128 2769.110
1559655000001541560 2769.125
1559655000001640406 2769.125
1559655000001658436 2769.127
1559655000001755924 2769.129
1559655000001793266 2769.125
1559655000001878688 2769.143
1559655000002061024 2769.125
How about something like the following which takes me 0.34 seconds to run on a series of irregularly spaced data with 900k rows? I am assuming the window of 5 implies a 5 day span.
First, let's create some sample data.
# Create sample data for a price stream of 2.6m price observations sampled 1 second apart.
seconds_per_day = 60 * 60 * 24 # 60 seconds / minute * 60 minutes / hour * 24 hours / day
starting_value = 100
annualized_vol = .3
sampling_percentage = .35 # 35%
start_date = '2018-12-01'
end_date = '2018-12-31'
np.random.seed(0)
idx = pd.date_range(start=start_date, end=end_date, freq='s') # One second intervals.
periodic_vol = annualized_vol * (1/ 252 / seconds_per_day) ** 0.5
daily_returns = np.random.randn(len(idx)) * periodic_vol
cumulative_indexed_return = (1 + daily_returns).cumprod() * starting_value
index_level = pd.Series(cumulative_indexed_return, index=idx)
# Sample 35% of the simulated prices to create a time series of 907k rows with irregular time intervals.
s = index_level.sample(frac=sampling_percentage).sort_index()
Now let's create a generator function to store the latest value of the exponentially weighted time series. This can run c. 4x faster by installing numba, importing it, and then adding the single decorator line @jit(nopython=True) above the function definition.
from numba import jit  # Optional, see below.

# @jit(nopython=True)  # Optional, see below.
def ewma_generator(vals, decay_vals):
    result = vals[0]
    yield result
    for val, decay in zip(vals[1:], decay_vals[1:]):
        result = result * (1 - decay) + val * decay
        yield result
Now let's run this generator on the irregularly spaced series s. For this sample with 900k rows, it takes me 1.2 seconds to run the following code. I can further cut the execution time down to 0.34 seconds by optionally using the just-in-time compiler from numba. You first need to install that package, e.g. conda install numba. Note that I used a list comprehension to populate the ewma values from the generator, and then I assign these values back to the original series after first converting it to a dataframe.
# Assumes time series data is now named `s`.
window = 5 # Span of 5 days?
dt = pd.Series(s.index).diff().dt.total_seconds().div(seconds_per_day) # Measured in days.
decay = (1 - (dt / -window).apply(np.exp))
g = ewma_generator(s.values, decay.values)
result = s.to_frame('midpoint').assign(
    ewma=pd.Series([next(g) for _ in range(len(s))], index=s.index))
>>> result.tail()
midpoint ewma
2018-12-30 23:59:45 103.894471 105.546004
2018-12-30 23:59:49 103.914077 105.545929
2018-12-30 23:59:50 103.901910 105.545910
2018-12-30 23:59:53 103.913476 105.545853
2018-12-31 00:00:00 103.910422 105.545720
>>> result.shape
(907200, 2)
To make sure the numbers follow our intuition, let's visualize the result taking hourly samples. This looks good to me.
obs_per_day = 24 # 24 hourly observations per day.
step = int(seconds_per_day / obs_per_day)
>>> result.iloc[::step, :].plot()
A slight improvement may be obtained by iterating on the underlying numpy arrays instead of on pandas DataFrames and Series:
result = np.ndarray(len(series))
window = 5
serdt = series['datetime'].values
sermp = series['midpoint'].values
result[0] = sermp[0]  # seed the first value; np.ndarray() leaves it uninitialized
for i in range(1, len(series)):
    dt = serdt[i] - serdt[i - 1]
    decay = 1 - numpy.exp(-dt / window)
    result[i] = (1 - decay) * result[i - 1] + decay * sermp[i]
return pandas.Series(result, index=series.index)
With your sample data it is about 6 times faster than the original method.
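If more speed is still needed, the same recurrence can usually be compiled with numba (an optional dependency, so treat this as a sketch under that assumption):

import numpy as np
from numba import njit

@njit
def irregular_ewma(times, values, window):
    # Same decay recurrence as above, compiled to machine code by numba.
    out = np.empty(len(values))
    out[0] = values[0]
    for i in range(1, len(values)):
        dt = times[i] - times[i - 1]
        decay = 1 - np.exp(-dt / window)
        out[i] = (1 - decay) * out[i - 1] + decay * values[i]
    return out

# ewma = irregular_ewma(series['datetime'].values.astype(np.float64),
#                       series['midpoint'].values, 5.0)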

How to implement Ehlers Homodyne Discriminator in Pandas series

I want to include a dynamic 'lookback period' for my stock indicators for a given time period entry. I've previously implemented Ehlers' Homodyne Discriminator using a rolling window; every time a new datapoint comes into my algorithm, the discriminator is recalculated (but retains memory of prior calculations...see below). I would rather determine the period using Pandas as it seems to be a faster method of implementing data processing over large datasets.
Note that I encounter data two ways: first, historical data is generated in bulk; and second, data comes in 1 minute at a time and will be added to the historical data for reprocessing.
The issues I face are:
1. Calculations are dependent on the at-index value of period, and period depends on the other calculations (see original script). However, the calculations using pandas are currently done in bulk, so the data never changes over time, which it should.
2. The dataframe includes values for multiple assets (MultiIndex), so I currently process the discriminator once per asset; is there a way I can run this once and let Pandas do the grouping?
3. Should I simply reprocess the entire dataset every time new data comes in, or should I do away with the benefits of Pandas and just iterate through each new row and use my old script?
Historical Data:
close high low open volume
symbol time
SPY 2019-06-07 15:41:00 288.03 288.060 287.98 288.030 132296.0
2019-06-07 15:42:00 288.04 288.060 287.96 288.035 103635.0
2019-06-07 15:43:00 288.15 288.160 288.04 288.045 144841.0
2019-06-07 15:44:00 288.10 288.190 288.09 288.150 166086.0
2019-06-07 15:45:00 287.93 288.120 287.93 288.100 145304.0
2019-06-07 15:46:00 287.77 287.935 287.75 287.935 253202.0
2019-06-07 15:47:00 287.86 287.870 287.76 287.760 140996.0
2019-06-07 15:48:00 287.78 287.865 287.76 287.860 178082.0
2019-06-07 15:49:00 287.83 287.855 287.62 287.790 631133.0
2019-06-07 15:50:00 287.83 287.915 287.78 287.825 279326.0
Original Script (self.Value is the actual period). If you don't use QuantConnect, I'm sure you could just replace all RollingWindows with arrays of reversed data or reverse the references. In this script, Update is called every time a new row is created in the dataframe:
class HomodyneDiscriminatorPeriodOld():
    Values = RollingWindow[int](2)
    SmoothedPeriod = RollingWindow[float](2)
    Smooth = RollingWindow[float](7)
    Detrend = RollingWindow[float](7)
    Source = RollingWindow[float](4)
    I1 = RollingWindow[float](7)
    I2 = RollingWindow[float](7)
    Q1 = RollingWindow[float](7)
    Q2 = RollingWindow[float](7)
    Re = RollingWindow[float](2)
    Im = RollingWindow[float](2)

    def FillWindows(self, *args, value=0):
        for window in args:
            for i in range(window.Size):
                window.Add(value)

    def __init__(self, period=1):
        self.Value = period
        self.Period = period
        # Start with history
        self.FillWindows(self.Smooth, self.SmoothedPeriod, self.Detrend, self.I1, self.I2, self.Q1, self.Q2, self.Re, self.Im)
        self.FillWindows(self.Values, value=self.Value)

    def __repr__(self):
        return "{}".format(self.Value)

    def Weighted(self, first, second, percent=0.2):
        return percent * first + (1 - percent) * second

    def Quadrature(self, window):
        C1 = 0.0962
        C2 = 0.5769
        C3 = self.Period * 0.075 + 0.54
        return (window[0] * C1 + window[2] * C2 - window[4] * C2 - window[6] * C1) * C3

    def Update(self, data):
        self.Source.Add((data.High + data.Low) / 2)
        if not self.Source.IsReady: return self.Value
        #
        # --- Start the Homodyne Discriminator Calculations
        #
        # Mutable Variables (non-series)
        self.Smooth.Add((self.Source[0] * 4.0 + self.Source[1] * 3.0 + self.Source[2] * 2.0 + self.Source[3]) / 10.0)
        self.Detrend.Add(self.Quadrature(self.Smooth))
        # Compute InPhase and Quadrature components
        self.Q1.Add(self.Quadrature(self.Detrend))
        self.I1.Add(self.Detrend[3])
        # Advance Phase of I1 and Q1 by 90 degrees
        jI = self.Quadrature(self.I1)
        jQ = self.Quadrature(self.Q1)
        # Phaser addition for 3 bar averaging and
        # smooth I and Q components before applying discriminator
        self.I2.Add(self.Weighted(self.I1[0] - jQ, self.I2[0]))
        self.Q2.Add(self.Weighted(self.Q1[0] + jI, self.Q2[0]))
        # Extract Homodyne Discriminator
        self.Re.Add(self.Weighted(self.I2[0] * self.I2[1] + self.Q2[0] * self.Q2[1], self.Re[0]))
        self.Im.Add(self.Weighted(self.I2[0] * self.Q2[1] - self.Q2[0] * self.I2[1], self.Im[0]))
        # Calculate the period
        period = ((math.pi * 2) / math.atan(self.Im[0] / self.Re[0])) if (self.Re[0] != 0 and self.Im[0] != 0) else 0
        period = min(max(max(min(period, 1.5 * self.Period), 0.6667 * self.Period), 6), 50)
        self.Period = self.Weighted(period, self.Period)
        self.SmoothedPeriod.Add(self.Weighted(self.Period, self.SmoothedPeriod[0], 0.33))
        self.Value = round(self.SmoothedPeriod[0] * 0.5 - 1)
        if self.Value < 1: self.Value = 1
        self.Values.Add(self.Value)
        return self.Value
Pandas Script. Update is currently only called once, after the bulk import of historical data. I have yet to implement a walk-forward method of calculation as indicated by question 3, if it's even required:
class HomodyneDiscriminatorPeriod():
    def Weighted(self, series, other=None, percent=0.2):
        if other is None: other = series
        return percent * series + (1 - percent) * other

    def Quadrature(self, series):
        C1 = 0.0962
        C2 = 0.5769
        C3 = self.Frame.period * 0.075 + 0.54
        return (series * C1 + series.shift(2) * C2 - series.shift(4) * C2 - series.shift(6) * C1) * C3

    def Update(self, frame):
        # Add period column to timeframe's dataframe
        frame['period'] = 1
        # Initialize internal dataframe with same structure
        # as timeframe's dataframe but without original columns
        self.Frame = pd.DataFrame().reindex_like(frame)
        self.Frame.drop(frame.columns, axis=1)
        self.Frame['period'] = 1
        self.Frame['smoothed_period'] = 1
        self.Frame['i2'] = 0
        self.Frame['q2'] = 0
        self.Frame['re'] = 0
        self.Frame['im'] = 0
        # Shorthand references
        period = self.Frame['period']
        smoothed_period = self.Frame['smoothed_period']
        i2 = self.Frame['i2']
        q2 = self.Frame['q2']
        re = self.Frame['re']
        im = self.Frame['im']
        #
        # --- Start the Homodyne Discriminator Calculations
        #
        # Mutable Variables (non-series)
        hl2 = (frame.high + frame.low) / 2
        smooth = (hl2 * 4.0 + hl2.shift(1) * 3.0 + hl2.shift(2) * 2.0 + hl2.shift(3)) / 10.0
        detrend = self.Quadrature(smooth)
        # Compute InPhase and Quadrature components
        q1 = self.Quadrature(detrend)
        i1 = detrend.shift(3)
        # Advance Phase of I1 and Q1 by 90 degrees
        ji = self.Quadrature(i1)
        jq = self.Quadrature(q1)
        # Phaser addition for 3 bar averaging and
        # smooth i and q components before applying discriminator
        i2 = self.Weighted(i1 - jq)
        q2 = self.Weighted(q1 + ji)
        # Extract Homodyne Discriminator
        re = self.Weighted(i2 * i2.shift(1) + q2 * q2.shift(1))
        im = self.Weighted(i2 * q2.shift(1) - q2 * i2.shift(1))
        # Calculate the period
        # TODO: Use 360 or 2 * np.pi???? Official doc says 360...
        _period = (2 * np.pi / np.arctan(im / re)).clip(upper=1.5 * period, lower=0.6667 * period).clip(upper=50, lower=6)
        period = self.Weighted(_period, period)
        smoothed_period = self.Weighted(period, smoothed_period, 0.33)
        return (smoothed_period * 0.5 - 1).round().clip(lower=1)
I would think that recalculating the homodyne filter for the entire dataset each time a new bar became available would be much too expensive. Recall, most of Ehlers' cycle filters are determined recursively -- and the homodyne looks back more bars than the supersmoother or high-pass filter. Given this, most trading platforms simply hold the resulting arrays in memory, and then just pick off the array elements a few bars back to calculate results for each new bar.
Note that none of the platforms go all the way back to the beginning and calculate the resulting output arrays for the entire time series when a new bar becomes available. If Pandas is that fast, then this may not be an issue. But in theory, I would not do that computationally, since it would be duplicative (unnecessary) computation. In other words, no matter how fast a platform is, why would you calculate the same array elements over and over again thousands of times within the time series, when you only need to look back about 6 bars for most Ehlers filters, and a few more for the homodyne when each new bar becomes available?
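As a rough illustration of that incremental approach (not the full homodyne, and the class below is made up for the example), the idea is to keep only the last few bars in a small buffer and update once per new bar instead of recomputing the whole series:

from collections import deque

class IncrementalSmooth:
    """Maintains the 4-bar weighted smooth used as the homodyne's first step."""

    def __init__(self):
        self.hl2 = deque(maxlen=4)  # most recent value at index -1

    def update(self, high, low):
        self.hl2.append((high + low) / 2)
        if len(self.hl2) < 4:
            return None  # not enough history yet
        h = list(self.hl2)
        return (h[-1] * 4.0 + h[-2] * 3.0 + h[-3] * 2.0 + h[-4]) / 10.0

# smoother = IncrementalSmooth()
# smooth = smoother.update(bar.high, bar.low)  # call once per new bar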

Average True Range and Exponential Moving Average Functions on PandasDataSeries needed

I am stuck while calculating the Average True Range [ATR] of a Series.
ATR is basically an Exponential Moving Average of the True Range [TR].
TR is nothing but MAX of -
Method 1: Current High less the current Low
Method 2: Current High less the previous Close (absolute value)
Method 3: Current Low less the previous Close (absolute value)
In Pandas we don't have an inbuilt EMA function. Rather, we have EWMA, which is a weighted moving average.
If someone can help me calculate the EMA, that will also be good enough.
def ATR(df, n):
    df['H-L'] = abs(df['High'] - df['Low'])
    df['H-PC'] = abs(df['High'] - df['Close'].shift(1))
    df['L-PC'] = abs(df['Low'] - df['Close'].shift(1))
    df['TR'] = df[['H-L', 'H-PC', 'L-PC']].max(axis=1)
    df['ATR_' + str(n)] = pd.ewma(df['TR'], span=n, min_periods=n)
    return df
The above code doesn't give an error, but it doesn't give correct values either. I compared it with manually calculated ATR values on the same data series in Excel, and the values were different.
ATR excel formula-
Current ATR = [(Prior ATR x 13) + Current TR] / 14
- Multiply the previous 14-day ATR by 13.
- Add the most recent day's TR value.
- Divide the total by 14
This is the dataseries I used as a sample
start='2016-1-1'
end='2016-10-30'
auro=web.DataReader('AUROPHARMA.NS','yahoo',start,end)
You do need to use ewma
See here: An exponential moving average (EMA) is a type of moving average that is similar to a simple moving average, except that more weight is given to the latest data.
Read more: Exponential Moving Average (EMA) http://www.investopedia.com/terms/e/ema.asp#ixzz4ishZbOGx
I don't think your Excel formula is right... Here is a manual way to calculate the EMA in Python:
def exponential_average(values, window):
    weights = np.exp(np.linspace(-1., 0., window))
    weights /= weights.sum()
    a = np.convolve(values, weights)[:len(values)]
    a[:window] = a[window]
    return a
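For instance, applied to the True Range column from the question's ATR() (assuming df already carries the 'TR' and 'H-L' columns computed there):

# The first TR value is NaN (no previous close), so fill it before smoothing.
tr = df['TR'].fillna(df['H-L']).values
df['ATR_14'] = exponential_average(tr, 14)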
scipy.signal.lfilter could help you.
scipy.signal.lfilter(b, a, x, axis=-1,zi=None)
The filter function is implemented as a direct II transposed structure. This means that the filter implements:
a[0]*y[n] = b[0]*x[n] + b[1]*x[n-1] + ... + b[M]*x[n-M]
- a[1]*y[n-1] - ... - a[N]*y[n-N]
If we normalize the above formula, we obtain the following one:
y[n] = b'[0]*x[n] + b'[1]*x[n-1] + ... + b'[M]*x[n-M]
- a'[1]*y[n-1] - ... - a'[N]*y[n-N]
where b'[i] = b[i]/a[0], i = 0,1,...,M; a'[j] = a[j]/a[0],j = 1,2,...,N
and a'[0] = 1
Exponential Moving Average formula:
y[n] = alpha*x[n] + (1-alpha)*y[n-1]
So to apply scipy.signal.lfilter, by the formula above we can set a and b as below:
a[0] = 1, a[1] = -(1-alpha)
b[0] = alpha
My implementation is below; I hope it helps.
import numpy as np
import scipy.signal as sig

def ema(values, window_size):
    alpha = 2. / (window_size + 1)
    a = np.array([1, alpha - 1.])
    b = np.array([alpha])
    zi = sig.lfilter_zi(b, a)
    y, _ = sig.lfilter(b, a, values, zi=zi)
    return y
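To tie this back to the ATR question: the Excel recursion Current ATR = [(Prior ATR x 13) + Current TR] / 14 is an exponential average with alpha = 1/14, so on a reasonably recent pandas (an assumption about the version in use) it can be reproduced directly with ewm, for example:

import pandas as pd

def ATR(df, n=14):
    # True Range: the largest of the three candidate ranges.
    h_l = df['High'] - df['Low']
    h_pc = (df['High'] - df['Close'].shift(1)).abs()
    l_pc = (df['Low'] - df['Close'].shift(1)).abs()
    tr = pd.concat([h_l, h_pc, l_pc], axis=1).max(axis=1)
    # Wilder smoothing: ATR_t = (ATR_{t-1} * (n - 1) + TR_t) / n,
    # i.e. an EMA with alpha = 1/n and adjust=False.
    df['ATR_' + str(n)] = tr.ewm(alpha=1.0 / n, adjust=False, min_periods=n).mean()
    return df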

Efficiently Running Newton Algorithm

This is related to another question I asked earlier. I want to run the Newton method on a large dataset. Below is the code I created using a loop. I need to run it on ~50 million lines and the loop is quite unwieldy. Is there a more efficient way to run it using Pandas/NumPy/etc.? Thanks in advance.
In:
from pandas import *
from pylab import *
import pandas as pd
import pylab as plt
import numpy as np
from scipy import *
import scipy
df = DataFrame(list([100,2,34.1556,9,105,-100]))
df = DataFrame.transpose(df)
df = df.rename(columns={0:'Face',1:'Freq',2:'N',3:'C',4:'Mkt_Price',5:'Yield'})
df2= df
df = concat([df, df2])
df = df.reset_index(drop=True)
df
Out:
Face Freq N C Mkt_Price Yield
0 100 2 34.1556 9 105 -100
1 100 2 34.1556 9 105 -100
In:
def Px(Rate):
    return Mkt_Price - (Face * ( 1 + Rate / Freq ) ** ( - N ) + ( C / Rate ) * ( 1 - (1 + ( Rate / Freq )) ** -N ) )

for count, row in df.iterrows():
    Face = row['Face']
    Freq = row['Freq']
    N = row['N']
    C = row['C']
    Mkt_Price = row['Mkt_Price']
    row['Yield'] = scipy.optimize.newton(Px, .1, tol=.0001, maxiter=100)

df
Out:
Face Freq N C Mkt_Price Yield
0 100 2 34.1556 9 105 0.084419
1 100 2 34.1556 9 105 0.084419
One possibility that pops into my mind is that you might do it vectorized. However, you must then throw away all conditional code and just run the required number of iterations.
The basic step in Newton-Raphson is always the same, so you do not need to have any conditional code. Your function Px looks as if it could be vectorized without any extra effort.
The steps are roughly:
def Px(Rate, Mkt_Price, Face, Freq, N, C):
    return Mkt_Price - (Face * ( 1 + Rate / Freq ) ** ( - N ) + ( C / Rate ) * ( 1 - (1 + ( Rate / Freq )) ** -N ) )

# initialize the iteration vector (every row starts from the 0.1 guess)
y = 0.1 * np.ones(num_rows)
# just a guess for the differentiation step, might be smaller
h = 1e-6
# then iterate for a suitable number of iterations
for i in range(100):
    f = Px(y, Mkt_Price, Face, Freq, N, C)
    fp = Px(y + h, Mkt_Price, Face, Freq, N, C)
    y -= h * f / (fp - f)
After this you have the iteration results in y. I have assumed Mkt_Price, Face, etc. are 50-million-row vectors.
There will be billions of calculations, so this will still take maybe a dozen seconds. Also, there is no error checking, so if something goes wildly oscillating, there is nothing to warn you about it.
One way to make this better is to calculate the first derivative analytically, as it can be done. The practical improvement may be small, though. You will have to experiment to find the best number of iterations. If the function converges fast (as I suppose), 20 iterations will be plenty.
The code is completely untested, but it should illustrate the idea.
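For completeness, a rough sketch of how that vectorized iteration might be hooked up to the DataFrame from the question (also untested; column names come from the example above and Px is the six-argument version defined here):

import numpy as np

# Pull the columns out as flat NumPy arrays once, outside the loop.
Face = df['Face'].values.astype(float)
Freq = df['Freq'].values.astype(float)
N = df['N'].values.astype(float)
C = df['C'].values.astype(float)
Mkt_Price = df['Mkt_Price'].values.astype(float)

y = 0.1 * np.ones(len(df))  # starting guess for every row
h = 1e-6
for _ in range(20):         # fixed iteration count, no convergence check
    f = Px(y, Mkt_Price, Face, Freq, N, C)
    fp = Px(y + h, Mkt_Price, Face, Freq, N, C)
    y -= h * f / (fp - f)

df['Yield'] = y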
