Average True Range and Exponential Moving Average Functions on PandasDataSeries needed

Average True Range and Exponential Moving Average Functions on PandasDataSeries needed - python

I am stuck while calculating Average True Range[ATR] of a Series.
ATR is basically a Exp Movin Avg of TrueRange[TR]
TR is nothing but MAX of -
Method 1: Current High less the current Low
Method 2: Current High less the previous Close (absolute value)
Method 3: Current Low less the previous Close (absolute value)
In Pandas we dont have an inbuilt EMA function. Rather we have EWMA which is a weighted moving average.
If someone helps to calculate EMA that also will be good enough
def ATR(df,n):
df['H-L']=abs(df['High']-df['Low'])
df['H-PC']=abs(df['High']-df['Close'].shift(1))
df['L-PC']=abs(df['Low']-df['Close'].shift(1))
df['TR']=df[['H-L','H-PC','L-PC']].max(axis=1)
df['ATR_' + str(n)] =pd.ewma(df['TR'], span = n, min_periods = n)
return df
The above code doesnt give error but it also doesnt give correct values either. I compared it with manually calculating ATR values on same dataseries in excel and values were different
ATR excel formula-
Current ATR = [(Prior ATR x 13) + Current TR] / 14
- Multiply the previous 14-day ATR by 13.
- Add the most recent day's TR value.
- Divide the total by 14
This is the dataseries I used as a sample
start='2016-1-1'
end='2016-10-30'
auro=web.DataReader('AUROPHARMA.NS','yahoo',start,end)

You do need to use ewma
See here: An exponential moving average (EMA) is a type of moving average that is similar to a simple moving average, except that more weight is given to the latest data.
Read more: Exponential Moving Average (EMA) http://www.investopedia.com/terms/e/ema.asp#ixzz4ishZbOGx
I dont think your excel formula is right... Here is a manual way to calculate ema in python
def exponential_average(values, window):
weights = np.exp(np.linspace(-1.,0.,window))
weights /= weights.sum()
a = np.convolve(values, weights) [:len(values)]
a[:window]=a[window]
return a

scipy.signal.lfilter could help you.
scipy.signal.lfilter(b, a, x, axis=-1,zi=None)
The filter function is implemented as a direct II transposed structure. This means that the filter implements:
a[0]*y[n] = b[0]*x[n] + b[1]*x[n-1] + ... + b[M]*x[n-M]
- a[1]*y[n-1] - ... - a[N]*y[n-N]
If we normalize the above formula, we obtain the following one:
y[n] = b'[0]*x[n] + b'[1]*x[n-1] + ... + b'[M]*x[n-M]
- a'[1]*y[n-1] + ... + a'[N]*y[n-N]
where b'[i] = b[i]/a[0], i = 0,1,...,M; a'[j] = a[j]/a[0],j = 1,2,...,N
and a'[0] = 1
Exponential Moving Average formula:
y[n] = alpha*x[n] + (1-alpha)*y[n-1]
So to apply scipy.signal.lfilter, by the formula above we can set a and b as below:
a[0] = 1, a[1] = -(1-alpha)
b[0] = alpha
My implementation is as below, hope it to help you.
def ema(values, window_size):
alpha = 2./ (window_size + 1)
a = np.array([1, alpha - 1.])
b = np.array([alpha])
zi = sig.lfilter_zi(b, a)
y, _ = sig.lfilter(b, a, values, zi=zi)
return y

Related

RMS value of a function

Now the full code / questions
I would like to estimate the random fluctuations of the function v - therefore I would like to calculate the RMS value of it:
import numpy as np
import matplotlib.pyplot as plt
def HHmodel(I,length, area):
v = []
m = []
h = []
z = []
n = []
squares = []
vsquare = (-60)*(-60)
sumsquares = 0
rms = []
a= []
dt = 0.05
t = np.linspace(0,100,length)
#constants
Cm = area#microFarad
ENa=50 #miliVolt
EK=-77 #miliVolt
El=-54 #miliVolt
g_Na=120*area #mScm-2
g_K=36*area #mScm-2
g_l=0.03*area #mScm-2
def alphaN(v):
return 0.01*(v+50)/(1-np.exp(-(v+50)/10))
def betaN(v):
return 0.125*np.exp(-(v+60)/80)
def alphaM(v):
return 0.1*(v+35)/(1-np.exp(-(v+35)/10))
def betaM(v):
return 4.0*np.exp(-0.0556*(v+60))
def alphaH(v):
return 0.07*np.exp(-0.05*(v+60))
def betaH(v):
return 1/(1+np.exp(-(0.1)*(v+30)))
#Initialize the voltage and the channels :
v.append(-60)
rms.append(1)
m0 = alphaM(v[0])/(alphaM(v[0])+betaM(v[0]))
n0 = alphaN(v[0])/(alphaN(v[0])+betaN(v[0]))
h0 = alphaH(v[0])/(alphaH(v[0])+betaH(v[0]))
#t.append(0)
m.append(m0)
n.append(n0)
h.append(h0)
#solving ODE using Euler's method:
for i in range(1,len(t)):
m.append(m[i-1] + dt*((alphaM(v[i-1])*(1-m[i-1]))-betaM(v[i-1])*m[i-1]))
n.append(n[i-1] + dt*((alphaN(v[i-1])*(1-n[i-1]))-betaN(v[i-1])*n[i-1]))
h.append(h[i-1] + dt*((alphaH(v[i-1])*(1-h[i-1]))-betaH(v[i-1])*h[i-1]))
gNa = g_Na * h[i-1]*(m[i-1])**3
gK=g_K*n[i-1]**4
gl=g_l
INa = gNa*(v[i-1]-ENa)
IK = gK*(v[i-1]-EK)
Il=gl*(v[i-1]-El)
v.append(v[i-1]+(dt)*((1/Cm)*(I[i-1]-(INa+IK+Il))))
#v.append(v[i-1]+(dt)*((1/Cm)*(I-(INa+IK+Il))))
meansquare = np.sqrt((np.square(v).sum()))
return v,area,meansquare
spikeEvents = [] #timing each spike
length = 1000*5 #the time period
fluctuations = []
output = []
for j in range(1, 10):
barcode = np.zeros(length)
noisyI = np.random.normal(0,9,length)
area = 1.0+0.1*j
res = HHmodel(noisyI,length,area)
output.append(res[2])
print('Done.')
The goal should be that the fluctuations of v increase in some way with the size of the are a - I was thinking here of the rms amplitude as a reasonable measure
BR
edit:
for i in range(1,len(t)):
m.append(m[i-1] + dt*((alphaM(v[i-1])*(1-m[i-1]))-betaM(v[i-1])*m[i-1]))
n.append(n[i-1] + dt*((alphaN(v[i-1])*(1-n[i-1]))-betaN(v[i-1])*n[i-1]))
h.append(h[i-1] + dt*((alphaH(v[i-1])*(1-h[i-1]))-betaH(v[i-1])*h[i-1]))
gNa = g_Na * h[i-1]*(m[i-1])**3
gK=g_K*n[i-1]**4
gl=g_l
INa = gNa*(v[i-1]-ENa)
IK = gK*(v[i-1]-EK)
Il=gl*(v[i-1]-El)
v.append(v[i-1]+(dt)*((1/Cm)*(I[i-1]-(INa+IK+Il))))
z.append(v[i-1]-np.mean(v))
#v.append(v[i-1]+(dt)*((1/Cm)*(I-(INa+IK+Il))))
mean = sum(np.square(v))/len(v)
squared_diffs =[(item-mean)**2 for item in v]
ms_diff = sum(squared_diffs)/len(squared_diffs)
rms_diff =np.sqrt(ms_diff)
return v,area,rms_diff
edit2:
Plot for j in range(1,10) - blue: rmsvalue as calculated in edit 1, yellow 1/sqrt(j)
edit3:
Plot for j in range(1,100) - but the "size" of fluctuations should increase, and not decrease and center somewhere

A few minor notes:
So, basically your "function" v is a one-timestep discrete evaluation of some function rather than a true function, but that's not really relevant here.
As indicated by comments above, you should calculate v for all timesteps and aggregate the squared values, then sum them outside of the loop and normalize by dividing by len(v).
It is also unclear why in iteration i you calculate v[i] but the corresponding squared value you calculate is v[i-1] squared. Should use same index on same loop iteration or you'll likely end up missing an element.
I would say that the reason that the result is not useful is that root-mean square is not really ever used for a function's outputs (RMS in this case is just some sort of less useful mean that gives extra weight to outliers); rather RMS is generally used on the error or variance of that function's outputs. RMS error or variance tells you how far, in the function's original units, does the average function value differ from the average value?). Note that this is only really an imporant metric if you expect the value of v to be constant.
Given all this, it's hard to say from your question what your intention is and what you're actually trying to do with this info so I will guess that what you really care about is how much the value of v is varying from the mean. In this case, you can use RMS difference from mean value of v calculated as such:
for i in range(1,len(t)):
#calculate v[i] here, omitted for simplicity
# get mean value
mean = sum(squares)/len(squares)
# you want to get the squared value of the difference, not the value itself
squared_diffs = [(item - mean)**2 for item in v)]
# get mean squared diff
ms_diff = sum(squared_diffs) / len(squared_diffs)
# return root of mean squared diff
rms_diff = np.sqrt(ms_diff)
return v,area,rms_diff
Again, this is only useful if you expect the outputs of v to be a constant. If not, you would try to fit a different model (linear, quadratic, etc.) to the function and then calculate the RMS error. Question would be much clearer if you indicated goal of this calculation.

Does anyone know how to calculate the Coppock Curve in python

I'm currently trying to calculate the Coppock Curve for a strategy i'm making in python.
I've written it like this(ROC1 is the 11 length and the ROC2 is the 14 length):
final = wma_onehr*(rocOne_onehr+rocTwo_onehr)
I know my values are correct but this is the only calculation for it and it does not match with tradingview at all. For instance when I run it I get
ROC1: -1.094
ROC2: -0.961
WMA: 7215.866
And my answer is -15037.864744
While Tradingview is at -0.9
These values are know where near close and i'm just wondering why I have not found a way to get a value like that of any kind. (I'm using taapio api if anyones wondering)

Take a look at below function. Note that data_array that is passed to function is a one dimensional numpy array that contains close prices of financial asset.
import numpy as np
def coppock_curve(data_array, sht_roc_length=11, long_roc_length=14, curve_length=10): # Coppock Curve
"""
:param sht_roc_length: Short Rate of Change length
:param long_roc_length: Long Rate of Change length
:param curve_length: Coppock Curve Line length
:return: Coppock oscillator values
"""
data_array = data_array[-(curve_length + max(sht_roc_length, long_roc_length, curve_length) + 1):]
# Calculation of short rate of change
roc11 = (data_array[-(curve_length + 1):] - data_array[-(curve_length + sht_roc_length + 1):-sht_roc_length]) /\
data_array[-(curve_length + sht_roc_length + 1):-sht_roc_length] * 100
roc14 = (data_array[-(curve_length + 1):] - data_array[:-long_roc_length]) / data_array[:-long_roc_length] * 100
sum_values = roc11 + roc14 # calculation of long rate of change
curve = np.convolve(sum_values, np.arange(1, curve_length + 1, dtype=int)[::-1], 'valid') / \
np.arange(1, curve_length + 1).sum() # calculation of coppock curve line

Looking for code that runs faster or an opinion whether the run time is reasonable

this is in Python
I'm trying to replace NaN values in a dataframe with x, x ~ N.trunc(upper, lower, mu, sigma)
The dataframe's shape about (150000,150)
import scipy.stats as stats
df = pd.read_csv(r'C:\Users\User\Desktop\Coding\Data Project\df1.csv')
for k in df.columns:
upper = np.nanmax(df[str(k)])
lower = np.nanmin(df[str(k)])
mu = df.loc[:,str(k)].mean()
sigma = df.loc[:,str(k)].std()
def fill_nan(column_value): #fill_nan finds NaN values and replaces them with x, N.trunc(upper, lower, mu, sigma,)
if np.isnan(column_value) == True:
column_value = stats.truncnorm((lower - mu) / sigma, (upper - mu) / sigma, loc=mu, scale=sigma).rvs()
return column_value
df[str(k)] = df[str(k)].apply(fill_nan) # runs fill_nan on each column
print('NaN count on dataframe is :%d' %df.isnull().sum().sum())
df.to_csv(r'C:\Users\User\Desktop\Coding\Data Project\df2.csv')
#run-time about 7 minutes
Please estimate whether a 7 minute run time is appropriate to complete this task and how, if at all possible, I could speed up this code or write other faster code.
Thank you.

There are many things to optimize:
1. convert your columns to string only once
You have many lines that contain str(k). Convert your code to only convert the key to a string once by doing k=str(k) once in the beginning and replace all other occurrences by just k
2. declare the fill_nan function only once:
Basically same thing as before, move the declaration of the function out of the loop.
3. optimize the fill_nan function with numba
have a look at https://numba.pydata.org
4. Don't use your own fill_nan at all
instead of iterating over each cell in each column you can simply use the df.fillna method. This way you do not need to apply the previous 2 steps
You did not provide a sample csv so the following code is untested:
for k in df.columns:
k = str(k)
upper = np.nanmax(df[k])
lower = np.nanmin(df[k])
mu = df.loc[:,k].mean()
sigma = df.loc[:,k].std()
column_value = stats.truncnorm((lower - mu) / sigma, (upper - mu) / sigma, loc=mu, scale=sigma).rvs()
df[k] = df[k].fillna(column_value)
5. don't loop at all
this is more for readability than code performance but you should also be able to do:
You did not provide a sample csv so the following code is untested:
upper = np.nanmax(df)
lower = np.nanmin(df)
mu = df.mean()
sigma = df.std()
norm_values = stats.truncnorm((lower - mu) / sigma, (upper - mu) / sigma, loc=mu, scale=sigma).rvs()
df = df.fillna(pd.Series(norm_values))

I testec for 150000 rows and 7 columns, it took less than one second. You need to create upper,lower etc only once.
upper = df.max()
lower = df.min()
mu = df.mean()
sigma = df.std()
column_values = {}
for column_name in df.columns:
column_value = stats.truncnorm((lower[column_name] - mu[column_name]) / sigma[column_name], (upper[column_name] - mu[column_name]) / sigma[column_name], loc=mu[column_name], scale=sigma[column_name]).rvs()
df[column_name].fillna(column_value, inplace=True)
df

solving for a compound interest between PV and "summation" FV?

Given inputs of:
present value = 11, SUMMATIONS of future values that = 126, and n = 7 (periods of change)
how can I solve for a rate of chain that would create a chain that sums into being the FV? This is different from just solving for a rate of return between 11 and 126. This is solving for the rate of return that allows the summation to 126. Ive been trying different ideas and looking up IRR and NPV functions but the summation aspect is stumping me.
In case the summation aspect isn't clear, if I assume a rate of 1.1, that would turn PV = 11 into a list like so (that adds up to nearly the FV 126), how can I solve for r only knowing n, summation fv and pv?:
11
12.1
13.31
14.641
16.1051
17.71561
19.487171
21.4358881
total = 125.7947691
Thank you.
EDIT:
I attempted to create a sort of iterator, but it's hanging after the first loop...
for r in (1.01,1.02,1.03,1.04,1.05,1.06,1.07,1.08,1.09,1.10,1.11,1.12):
print r
test = round(11* (1-r**8) / (1 - r),0)
print test
while True:
if round(126,0) == round(11* (1-r**8) / (1 - r),0):
answer = r
break
else:
pass
EDIT 2:
IV = float(11)
SV = float(126)
N = 8
# sum of a geometric series: (SV = IV * (1 - r^n) / (1 - r )
# r^n - (SV/PV)*r + ((SV - IV)/IV) = 0
# long form polynomial to be solved, with an n of 3 for example:
# 1r^n + 0r^n + 0r^n + -(SV/PV)r + ((SV - IV)/IV)
# each polynomial coefficient can go into numpy.roots to solve
# for the r that solves for the abcd * R = 0 above.
import numpy
array = numpy.roots([1.,0.,0.,0.,0.,0.,0.,(-SV)/IV,(SV-IV)/IV])
for i in array:
if i > 1:
a = str(i)
b = a.split("+")
answer = float(b[0])
print answer
I'm getting a ValueError that my string "1.10044876702" cant be converted to float. any ideas?
SOLVED: i.real gets the real part of it. no need for split or string conversion ie:
for i in array:
if i > 1:
a = i.real
answer = float(a)
print answer

Sum of a geometric series
Subbing in,
126 = 11 * (1 - r**8) / (1 - r)
where we need to solve for r. After rearranging,
r**8 - (126/11)*r + (115/11) = 0
then using NumPy
import numpy as np
np.roots([1., 0., 0., 0., 0., 0., 0., -126./11, 115./11])
gives
array([-1.37597528+0.62438671j, -1.37597528-0.62438671j,
-0.42293755+1.41183514j, -0.42293755-1.41183514j,
0.74868844+1.1640769j , 0.74868844-1.1640769j ,
1.10044877+0.j , 1.00000000+0.j ])
where the first six roots are imaginary and the last is invalid (gives a div-by-0 in the original equation), so the only useable answer is r = 1.10044877.
Edit:
Per the Numpy docs, np.root expects an array-like object (aka a list) containing the polynomial coefficients. So the parameters above can be read as 1.0*r^8 + 0.*r^7 + 0.*r^6 + 0.*r^5 + 0.*r^4 + 0.*r^3 + 0.*r^2 + -126./11*r + 115./11, which is the polynomial to be solved.
Your iterative solver is pretty crude; it will get you a ballpark answer, but the calculation time is exponential with the desired degree of accuracy. We can do much better!
No general analytic solution is known for an eighth-order equation, so some numeric method is needed.
If you really want to code your own solver from scratch, the simplest is Newton-Raphson method - start with a guess, then iteratively evaluate the function and offset your guess by the error divided by the first derivative to hopefully converge on a root - and hope that your initial guess is a good one and your equation has real roots.
If you care more about getting good answers quickly, np.root is hard to beat - it computes the eigenvectors of the companion matrix to simultaneously find all roots, both real and complex.
Edit 2:
Your iterative solver is hanging because of your while True clause - r never changes in the loop, so you will never break. Also, else: pass is redundant and can be removed.
After a good bit of rearranging, your code becomes:
import numpy as np
def iterative_test(rng, fn, goal):
return min(rng, key=lambda x: abs(goal - fn(x)))
rng = np.arange(1.01, 1.20, 0.01)
fn = lambda x: 11. * (1. - x**8) / (1. - x)
goal = 126.
sol = iterative_test(rng, fn, goal)
print('Solution: {} -> {}'.format(sol, fn(sol)))
which results in
Solution: 1.1 -> 125.7947691
Edit 3:
Your last solution is looking much better, but you must keep in mind that the degree of the polynomial (and hence the length of the array passed to np.roots) changes as the number of periods changes.
import numpy as np
def find_rate(present_value, final_sum, periods):
"""
Given the initial value, sum, and number of periods in
a geometric series, solve for the rate of growth.
"""
# The formula for the sum of a geometric series is
# final_sum = sum_i[0..periods](present_value * rate**i)
# which can be reduced to
# final_sum = present_value * (1 - rate**(periods+1) / (1 - rate)
# and then rearranged as
# rate**(periods+1) - (final_sum / present_value)*rate + (final_sum / present_value - 1) = 0
# Build the polynomial
poly = [0.] * (periods + 2)
poly[ 0] = 1.
poly[-2] = -1. * final_sum / present_value
poly[-1] = 1. * final_sum / present_value - 1.
# Find the roots
roots = np.roots(poly)
# Discard unusable roots
roots = [rt for rt in roots if rt.imag == 0. and rt.real != 1.]
# Should be zero or one roots left
if len(roots):
return roots[0].real
else:
raise ValueError('no solution found')
def main():
pv, fs, p = 11., 126., 7
print('Solution for present_value = {}, final_sum = {}, periods = {}:'.format(pv, fs, p))
print('rate = {}'.format(find_rate(pv, fs, p)))
if __name__=="__main__":
main()
This produces:
Solution for present_value = 11.0, final_sum = 126.0, periods = 7:
rate = 1.10044876702

Solving the polynomial roots is overkill. This computation is usually made with a solver such as the Newton method directly applied to the exponential formula. Works for fractional durations too.
For example, https://math.stackexchange.com/questions/502976/newtons-method-annuity-due-equation

Relative Strength Index in python pandas

I am new to pandas. What is the best way to calculate the relative strength part in the RSI indicator in pandas? So far I got the following:
from pylab import *
import pandas as pd
import numpy as np
def Datapull(Stock):
try:
df = (pd.io.data.DataReader(Stock,'yahoo',start='01/01/2010'))
return df
print 'Retrieved', Stock
time.sleep(5)
except Exception, e:
print 'Main Loop', str(e)
def RSIfun(price, n=14):
delta = price['Close'].diff()
#-----------
dUp=
dDown=
RolUp=pd.rolling_mean(dUp, n)
RolDown=pd.rolling_mean(dDown, n).abs()
RS = RolUp / RolDown
rsi= 100.0 - (100.0 / (1.0 + RS))
return rsi
Stock='AAPL'
df=Datapull(Stock)
RSIfun(df)
Am I doing it correctly so far? I am having trouble with the difference part of the equation where you separate out upward and downward calculations

It is important to note that there are various ways of defining the RSI. It is commonly defined in at least two ways: using a simple moving average (SMA) as above, or using an exponential moving average (EMA). Here's a code snippet that calculates various definitions of RSI and plots them for comparison. I'm discarding the first row after taking the difference, since it is always NaN by definition.
Note that when using EMA one has to be careful: since it includes a memory going back to the beginning of the data, the result depends on where you start! For this reason, typically people will add some data at the beginning, say 100 time steps, and then cut off the first 100 RSI values.
In the plot below, one can see the difference between the RSI calculated using SMA and EMA: the SMA one tends to be more sensitive. Note that the RSI based on EMA has its first finite value at the first time step (which is the second time step of the original period, due to discarding the first row), whereas the RSI based on SMA has its first finite value at the 14th time step. This is because by default rolling_mean() only returns a finite value once there are enough values to fill the window.
import datetime
from typing import Callable
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pandas_datareader.data as web
# Window length for moving average
length = 14
# Dates
start, end = '2010-01-01', '2013-01-27'
# Get data
data = web.DataReader('AAPL', 'yahoo', start, end)
# Get just the adjusted close
close = data['Adj Close']
# Define function to calculate the RSI
def calc_rsi(over: pd.Series, fn_roll: Callable) -> pd.Series:
# Get the difference in price from previous step
delta = over.diff()
# Get rid of the first row, which is NaN since it did not have a previous row to calculate the differences
delta = delta[1:]
# Make the positive gains (up) and negative gains (down) Series
up, down = delta.clip(lower=0), delta.clip(upper=0).abs()
roll_up, roll_down = fn_roll(up), fn_roll(down)
rs = roll_up / roll_down
rsi = 100.0 - (100.0 / (1.0 + rs))
# Avoid division-by-zero if `roll_down` is zero
# This prevents inf and/or nan values.
rsi[:] = np.select([roll_down == 0, roll_up == 0, True], [100, 0, rsi])
rsi.name = 'rsi'
# Assert range
valid_rsi = rsi[length - 1:]
assert ((0 <= valid_rsi) & (valid_rsi <= 100)).all()
# Note: rsi[:length - 1] is excluded from above assertion because it is NaN for SMA.
return rsi
# Calculate RSI using MA of choice
# Reminder: Provide ≥ `1 + length` extra data points!
rsi_ema = calc_rsi(close, lambda s: s.ewm(span=length).mean())
rsi_sma = calc_rsi(close, lambda s: s.rolling(length).mean())
rsi_rma = calc_rsi(close, lambda s: s.ewm(alpha=1 / length).mean()) # Approximates TradingView.
# Compare graphically
plt.figure(figsize=(8, 6))
rsi_ema.plot(), rsi_sma.plot(), rsi_rma.plot()
plt.legend(['RSI via EMA/EWMA', 'RSI via SMA', 'RSI via RMA/SMMA/MMA (TradingView)'])
plt.show()

dUp= delta[delta > 0]
dDown= delta[delta < 0]
also you need something like:
RolUp = RolUp.reindex_like(delta, method='ffill')
RolDown = RolDown.reindex_like(delta, method='ffill')
otherwise RS = RolUp / RolDown will not do what you desire
Edit: seems this is a more accurate way of RS calculation:
# dUp= delta[delta > 0]
# dDown= delta[delta < 0]
# dUp = dUp.reindex_like(delta, fill_value=0)
# dDown = dDown.reindex_like(delta, fill_value=0)
dUp, dDown = delta.copy(), delta.copy()
dUp[dUp < 0] = 0
dDown[dDown > 0] = 0
RolUp = pd.rolling_mean(dUp, n)
RolDown = pd.rolling_mean(dDown, n).abs()
RS = RolUp / RolDown

My answer is tested on StockCharts sample data.
StockChart RSI info
def RSI(series, period):
delta = series.diff().dropna()
u = delta * 0
d = u.copy()
u[delta > 0] = delta[delta > 0]
d[delta < 0] = -delta[delta < 0]
u[u.index[period-1]] = np.mean( u[:period] ) #first value is sum of avg gains
u = u.drop(u.index[:(period-1)])
d[d.index[period-1]] = np.mean( d[:period] ) #first value is sum of avg losses
d = d.drop(d.index[:(period-1)])
rs = pd.DataFrame.ewm(u, com=period-1, adjust=False).mean() / \
pd.DataFrame.ewm(d, com=period-1, adjust=False).mean()
return 100 - 100 / (1 + rs)
#sample data from StockCharts
data = pd.Series( [ 44.34, 44.09, 44.15, 43.61,
44.33, 44.83, 45.10, 45.42,
45.84, 46.08, 45.89, 46.03,
45.61, 46.28, 46.28, 46.00,
46.03, 46.41, 46.22, 45.64 ] )
print RSI( data, 14 )
#output
14 70.464135
15 66.249619
16 66.480942
17 69.346853
18 66.294713
19 57.915021

I too had this question and was working down the rolling_apply path that Jev took. However, when I tested my results, they didn't match up against the commercial stock charting programs I use, such as StockCharts.com or thinkorswim. So I did some digging and discovered that when Welles Wilder created the RSI, he used a smoothing technique now referred to as Wilder Smoothing. The commercial services above use Wilder Smoothing rather than a simple moving average to calculate the average gains and losses.
I'm new to Python (and Pandas), so I'm wondering if there's some brilliant way to refactor out the for loop below to make it faster. Maybe someone else can comment on that possibility.
I hope you find this useful.
More info here.
def get_rsi_timeseries(prices, n=14):
# RSI = 100 - (100 / (1 + RS))
# where RS = (Wilder-smoothed n-period average of gains / Wilder-smoothed n-period average of -losses)
# Note that losses above should be positive values
# Wilder-smoothing = ((previous smoothed avg * (n-1)) + current value to average) / n
# For the very first "previous smoothed avg" (aka the seed value), we start with a straight average.
# Therefore, our first RSI value will be for the n+2nd period:
# 0: first delta is nan
# 1:
# ...
# n: lookback period for first Wilder smoothing seed value
# n+1: first RSI
# First, calculate the gain or loss from one price to the next. The first value is nan so replace with 0.
deltas = (prices-prices.shift(1)).fillna(0)
# Calculate the straight average seed values.
# The first delta is always zero, so we will use a slice of the first n deltas starting at 1,
# and filter only deltas > 0 to get gains and deltas < 0 to get losses
avg_of_gains = deltas[1:n+1][deltas > 0].sum() / n
avg_of_losses = -deltas[1:n+1][deltas < 0].sum() / n
# Set up pd.Series container for RSI values
rsi_series = pd.Series(0.0, deltas.index)
# Now calculate RSI using the Wilder smoothing method, starting with n+1 delta.
up = lambda x: x if x > 0 else 0
down = lambda x: -x if x < 0 else 0
i = n+1
for d in deltas[n+1:]:
avg_of_gains = ((avg_of_gains * (n-1)) + up(d)) / n
avg_of_losses = ((avg_of_losses * (n-1)) + down(d)) / n
if avg_of_losses != 0:
rs = avg_of_gains / avg_of_losses
rsi_series[i] = 100 - (100 / (1 + rs))
else:
rsi_series[i] = 100
i += 1
return rsi_series

You can use rolling_apply in combination with a subfunction to make a clean function like this:
def rsi(price, n=14):
''' rsi indicator '''
gain = (price-price.shift(1)).fillna(0) # calculate price gain with previous day, first row nan is filled with 0
def rsiCalc(p):
# subfunction for calculating rsi for one lookback period
avgGain = p[p>0].sum()/n
avgLoss = -p[p<0].sum()/n
rs = avgGain/avgLoss
return 100 - 100/(1+rs)
# run for all periods with rolling_apply
return pd.rolling_apply(gain,n,rsiCalc)

# Relative Strength Index
# Avg(PriceUp)/(Avg(PriceUP)+Avg(PriceDown)*100
# Where: PriceUp(t)=1*(Price(t)-Price(t-1)){Price(t)- Price(t-1)>0};
# PriceDown(t)=-1*(Price(t)-Price(t-1)){Price(t)- Price(t-1)<0};
# Change the formula for your own requirement
def rsi(values):
up = values[values>0].mean()
down = -1*values[values<0].mean()
return 100 * up / (up + down)
stock['RSI_6D'] = stock['Momentum_1D'].rolling(center=False,window=6).apply(rsi)
stock['RSI_12D'] = stock['Momentum_1D'].rolling(center=False,window=12).apply(rsi)
Momentum_1D = Pt - P(t-1) where P is closing price and t is date

You can get a massive speed up of Bill's answer by using numba. 100 loops of 20k row series( regular = 113 seconds, numba = 0.28 seconds ). Numba excels with loops and arithmetic.
import numpy as np
import numba as nb
#nb.jit(fastmath=True, nopython=True)
def calc_rsi( array, deltas, avg_gain, avg_loss, n ):
# Use Wilder smoothing method
up = lambda x: x if x > 0 else 0
down = lambda x: -x if x < 0 else 0
i = n+1
for d in deltas[n+1:]:
avg_gain = ((avg_gain * (n-1)) + up(d)) / n
avg_loss = ((avg_loss * (n-1)) + down(d)) / n
if avg_loss != 0:
rs = avg_gain / avg_loss
array[i] = 100 - (100 / (1 + rs))
else:
array[i] = 100
i += 1
return array
def get_rsi( array, n = 14 ):
deltas = np.append([0],np.diff(array))
avg_gain = np.sum(deltas[1:n+1].clip(min=0)) / n
avg_loss = -np.sum(deltas[1:n+1].clip(max=0)) / n
array = np.empty(deltas.shape[0])
array.fill(np.nan)
array = calc_rsi( array, deltas, avg_gain, avg_loss, n )
return array
rsi = get_rsi( array or series, 14 )

rsi_Indictor(close,n_days):
rsi_series = pd.DataFrame(close)
# Change = close[i]-Change[i-1]
rsi_series["Change"] = (rsi_series["Close"] - rsi_series["Close"].shift(1)).fillna(0)
# Upword Movement
rsi_series["Upword Movement"] = (rsi_series["Change"][rsi_series["Change"] >0])
rsi_series["Upword Movement"] = rsi_series["Upword Movement"].fillna(0)
# Downword Movement
rsi_series["Downword Movement"] = (abs(rsi_series["Change"])[rsi_series["Change"] <0]).fillna(0)
rsi_series["Downword Movement"] = rsi_series["Downword Movement"].fillna(0)
#Average Upword Movement
# For first Upword Movement Mean of first n elements.
rsi_series["Average Upword Movement"] = 0.00
rsi_series["Average Upword Movement"][n] = rsi_series["Upword Movement"][1:n+1].mean()
# For Second onwords
for i in range(n+1,len(rsi_series),1):
#print(rsi_series["Average Upword Movement"][i-1],rsi_series["Upword Movement"][i])
rsi_series["Average Upword Movement"][i] = (rsi_series["Average Upword Movement"][i-1]*(n-1)+rsi_series["Upword Movement"][i])/n
#Average Downword Movement
# For first Downword Movement Mean of first n elements.
rsi_series["Average Downword Movement"] = 0.00
rsi_series["Average Downword Movement"][n] = rsi_series["Downword Movement"][1:n+1].mean()
# For Second onwords
for i in range(n+1,len(rsi_series),1):
#print(rsi_series["Average Downword Movement"][i-1],rsi_series["Downword Movement"][i])
rsi_series["Average Downword Movement"][i] = (rsi_series["Average Downword Movement"][i-1]*(n-1)+rsi_series["Downword Movement"][i])/n
#Relative Index
rsi_series["Relative Strength"] = (rsi_series["Average Upword Movement"]/rsi_series["Average Downword Movement"]).fillna(0)
#RSI
rsi_series["RSI"] = 100 - 100/(rsi_series["Relative Strength"]+1)
return rsi_series.round(2)
For More Information

You do this using finta package as well just to add above
ref: https://github.com/peerchemist/finta/tree/master/examples
import pandas as pd
from finta import TA
import matplotlib.pyplot as plt
ohlc = pd.read_csv("C:\\WorkSpace\\Python\\ta-lib\\intraday_5min_IBM.csv", index_col="timestamp", parse_dates=True)
ohlc['RSI']= TA.RSI(ohlc)

It is not really necessary to calculate the mean, because after they are divided, you only need to calculate the sum, so we can use Series.cumsum ...
def rsi(serie, n):
diff_serie = close.diff()
cumsum_incr = diff_serie.where(lambda x: x.gt(0), 0).cumsum()
cumsum_decr = diff_serie.where(lambda x: x.lt(0), 0).abs().cumsum()
rs_serie = cumsum_incr.div(cumsum_decr)
rsi = rs_serie.mul(100).div(rs_serie.add(1)).fillna(0)
return rsi

Less code here but seems to work for me:
df['Change'] = (df['Close'].shift(-1)-df['Close']).shift(1)
df['ChangeAverage'] = df['Change'].rolling(window=2).mean()
df['ChangeAverage+'] = df.apply(lambda x: x['ChangeAverage'] if x['ChangeAverage'] > 0 else 0,axis=1).rolling(window=14).mean()
df['ChangeAverage-'] = df.apply(lambda x: x['ChangeAverage'] if x['ChangeAverage'] < 0 else 0,axis=1).rolling(window=14).mean()*-1
df['RSI'] = 100-(100/(1+(df['ChangeAverage+']/df['ChangeAverage-'])))

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Average True Range and Exponential Moving Average Functions on PandasDataSeries needed - python

Related

RMS value of a function

Does anyone know how to calculate the Coppock Curve in python

Looking for code that runs faster or an opinion whether the run time is reasonable

solving for a compound interest between PV and "summation" FV?

Relative Strength Index in python pandas

Categories

Resources