I am just wondering how to convert the Pine Script dev() function into Python code. Is my interpretation correct?
The Pine Script example is the following:
plot(dev(close, 10))

// the same in pine
pine_dev(source, length) =>
    mean = sma(source, length)
    sum = 0.0
    for i = 0 to length - 1
        val = source[i]
        sum := sum + abs(val - mean)
    dev = sum / length

plot(pine_dev(close, 10))
My Python code is the following:
df["SMA_highest"] = ta.sma(df["Close"], 10)
df["dev_abs_highest"] = (df["Close"] - df["SMA_highest"]).abs()
df["dev_cumsum_highest"] = df["dev_abs_highest"].rolling(window=10).sum()
df["DEV_SMA_highest"] = df["dev_cumsum_highest"] / 10
What do I need to adjust in the Python code to have the same result as in the Pine Script?
Thanks for any hints :)
I was looking for the same script too and did not find a ready-to-go solution, so I implemented it myself. Unfortunately I have not tested it completely, because the stock prices between yfinance and TradingView differ a little, so the results differ a little too.
def pine_dev(column):
    # with the default raw=False, rolling.apply passes each window as a Series,
    # so use positional access via .iloc
    summ = 0.0
    mean = column.mean()
    length = len(column)
    for i in range(length):
        summ = summ + abs(column.iloc[i] - mean)
    return summ / length

diffavg = stock[columnname].rolling(days).apply(pine_dev)
Basically I use the rolling function; if you apply a function to it, that function receives all values from the rolling window.
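The same idea fits in a single rolling apply, which also keeps the Pine semantics: the current window's own mean is subtracted from every value in the window (which is why a per-row (Close - SMA).abs() gives a different result). A minimal sketch with hypothetical data:

```python
import pandas as pd

# hypothetical Close series; a window of 3 keeps the arithmetic easy to check
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 6.0]})

# Pine's dev(): mean absolute deviation of each window from that window's SMA
df["dev"] = df["Close"].rolling(3).apply(
    lambda w: abs(w - w.mean()).mean(), raw=True
)
```

For the window [1, 2, 3] the mean is 2, so the deviations are 1, 0, 1 and dev is 2/3.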
I'm stuck on returning the result from a function that checks samples for an A/B test. The calculation is correct, but somehow I'm getting the result twice. The code and output are below.
def test(sample1, sample2):
    for i in it.chain(range(len(sample1)), range(len(sample2))):
        alpha = .05
        difference = (sample1['step_conversion'][i] - sample2['step_conversion'][i]) / 100
        if (i > 0):
            p_combined = (sample1['unq_user'][i] + sample2['unq_user'][i]) / (sample1['unq_user'][i-1] + sample2['unq_user'][i-1])
            z_value = difference / mth.sqrt(
                p_combined * (1 - p_combined) * (1 / sample1['unq_user'][i-1] + 1 / sample2['unq_user'][i-1]))
            distr = st.norm(0, 1)
            p_value = (1 - distr.cdf(abs(z_value))) * 2
            print(sample1['event_name'][i], 'p-value: ', p_value)
            if p_value < alpha:
                print('Deny H0')
            else:
                print('Accept H0')
    return
So I need each result in the output just once (tagged in the box), but I get it twice, once from each sample.
When using Pandas dataframes, you should avoid most for loops, and use the standard vectorised approach. Use NumPy where applicable.
First, I've reset the indices of the dataframes, to be sure .loc can be used with a standard numerical index.
sample1 = sample1.reset_index()
sample2 = sample2.reset_index()
The code below does what I think your for loop does.
I can't test it, and without a clear description, example dataframes and expected outcome, it is anyone's guess if the code below does what you want. But it may get close, and mostly serves as an example of the vectorised approach.
import numpy as np

difference = (sample1['step_conversion'] - sample2['step_conversion']) / 100
n = len(sample1)
# Note that Pandas uses `n` as the highest *valid* index when using `.loc`; `n-1` is one lower
p_combined = ((sample1.loc[1:, 'unq_user'] + sample2.loc[1:, 'unq_user']).reset_index(drop=True) /
              (sample1.loc[:n-1, 'unq_user'] + sample2.loc[:n-1, 'unq_user'])).reset_index(drop=True)
z_value = difference / np.sqrt(
    p_combined * (1 - p_combined) * (
        1 / sample1.loc[:n-1, 'unq_user'] + 1 / sample2.loc[:n-1, 'unq_user']))
distr = st.norm(0, 1)  # st is presumably scipy.stats, as in the question
p_value = (1 - distr.cdf(np.abs(z_value))) * 2
sample1['p_value'] = p_value
print(sample1)

# The below prints a list of True values for elements for which the condition holds.
# You can also use e.g. `print(sample1[p_value < alpha])`.
alpha = 0.05
print('Deny H0:')
print(p_value < alpha)
print('Accept H0:')
print(p_value > alpha)
No for loop needed, and for a large dataframe, the above will be notably faster.
Note that the .reset_index(drop=True) calls are a bit ugly. But without them, Pandas would align the two series on equal indices before dividing, which is not what we want. This way, that is avoided.
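For completeness, the doubled output in the original loop is caused by the chained ranges themselves, not by the statistics: it.chain(range(len(sample1)), range(len(sample2))) yields every index twice when the samples have equal length. A toy illustration, independent of the real dataframes:

```python
import itertools as it

# toy stand-in for the two equal-length samples
rows = [10, 20, 30]

# chaining the two ranges visits each index twice, so every
# p-value would be computed and printed twice
chained = list(it.chain(range(len(rows)), range(len(rows))))
assert chained == [0, 1, 2, 0, 1, 2]

# a single range visits each row exactly once
single = list(range(len(rows)))
assert single == [0, 1, 2]
```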
For a list of numbers
val numbers = Seq(0.0817381355303346, 0.08907955219917718, 0.10581384008994665, 0.10970915785902469, 0.1530743353025532, 0.16728932033107657, 0.181932212814931, 0.23200826752868853, 0.2339654613723784, 0.2581657775305527, 0.3481071101229365, 0.5010850992326521, 0.6153244818101578, 0.6233250409474894, 0.6797744231690304, 0.6923891392381571, 0.7440316016776881, 0.7593186414698002, 0.8028091068764153, 0.8780699052482807, 0.8966649331194205)
Python/pandas computes the following percentiles:
25% 0.167289
50% 0.348107
75% 0.692389
However, scala returns:
calcPercentiles(Seq(.25, .5, .75), sortedNumber.toArray)
25% 0.1601818278168149
50% 0.3481071101229365
75% 0.7182103704579226
The numbers almost match, but are different. How can I get rid of the difference (and most likely fix a bug in my Scala code)?
val sortedNumber = numbers.sorted

import scala.collection.mutable

case class PercentileResult(percentile: Double, value: Double)

// https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/stats/DescriptiveStats.scala#L537
def calculatePercentile(arr: Array[Double], p: Double) = {
  // +1 so that the .5 == mean for even number of elements.
  val f = (arr.length + 1) * p
  val i = f.toInt
  if (i == 0) arr.head
  else if (i >= arr.length) arr.last
  else {
    arr(i - 1) + (f - i) * (arr(i) - arr(i - 1))
  }
}

def calcPercentiles(percentiles: Seq[Double], arr: Array[Double]): Array[PercentileResult] = {
  val results = new mutable.ListBuffer[PercentileResult]
  percentiles.foreach(p => {
    val r = PercentileResult(percentile = p, value = calculatePercentile(arr, p))
    results.append(r)
  })
  results.toArray
}
python:
import pandas as pd
df = pd.DataFrame({'foo':[0.0817381355303346, 0.08907955219917718, 0.10581384008994665, 0.10970915785902469, 0.1530743353025532, 0.16728932033107657, 0.181932212814931, 0.23200826752868853, 0.2339654613723784, 0.2581657775305527, 0.3481071101229365, 0.5010850992326521, 0.6153244818101578, 0.6233250409474894, 0.6797744231690304, 0.6923891392381571, 0.7440316016776881, 0.7593186414698002, 0.8028091068764153, 0.8780699052482807, 0.8966649331194205]})
display(df.head())
df.describe()
After a bit of trial and error I wrote this code, which returns the same results as pandas (using linear interpolation, as that is the pandas default):
def calculatePercentile(numbers: Seq[Double], p: Double): Double = {
  // interpolate only - no special handling of the case when the rank is an integer
  val rank = (numbers.size - 1) * p
  val i = numbers(math.floor(rank).toInt)
  val j = numbers(math.ceil(rank).toInt)
  val fraction = rank - math.floor(rank)
  i + (j - i) * fraction
}
From that I would say that the error was here:
(arr.length + 1) * p
The 0th percentile should map to index 0, and the 100th percentile to the maximal index.
So for numbers (.size == 21) that would be indices 0 and 20. However, for 100% you would get an index value of 22 - bigger than the size of the array! If not for these guard clauses:
else if (i >= arr.length) arr.last
you would get an error and could suspect that something is wrong. Perhaps the authors of the code:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/stats/DescriptiveStats.scala#L537
used a different definition of percentile... (?) or they might simply have a bug. I cannot tell.
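In fact, the Scala numbers can be reproduced as well: the (length + 1) * p rank appears to correspond to the Hyndman-Fan type 6 ("weibull") percentile definition, which suggests a different definition rather than a bug. A quick NumPy cross-check (the method argument needs NumPy 1.22+):

```python
import numpy as np

numbers = [0.0817381355303346, 0.08907955219917718, 0.10581384008994665,
           0.10970915785902469, 0.1530743353025532, 0.16728932033107657,
           0.181932212814931, 0.23200826752868853, 0.2339654613723784,
           0.2581657775305527, 0.3481071101229365, 0.5010850992326521,
           0.6153244818101578, 0.6233250409474894, 0.6797744231690304,
           0.6923891392381571, 0.7440316016776881, 0.7593186414698002,
           0.8028091068764153, 0.8780699052482807, 0.8966649331194205]

# NumPy's default "linear" method matches the pandas describe() output
linear = np.percentile(numbers, [25, 50, 75])

# the (length + 1) * p rank matches the "weibull" (type 6) method,
# which reproduces the Scala output
weibull = np.percentile(numbers, [25, 50, 75], method="weibull")
```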
BTW: This:
def calcPercentiles(percentiles:Seq[Double], arr: Array[Double]): Array[PercentileResult]
could be written much more simply, like this:
def calcPercentiles(percentiles: Seq[Double], numbers: Seq[Double]): Seq[PercentileResult] =
  percentiles.map { p =>
    PercentileResult(p, calculatePercentile(numbers, p))
  }
I'm trying to convert TradingView script code to Python, but there is an operator I don't understand and I'd like some help with it.
https://www.tradingview.com/script/Q0eQz7ll-Fisher-Transform-Indicator-by-Ehlers-Strategy/
Length = input(10, minval=1)
xHL2 = hl2
xMaxH = highest(xHL2, Length)
xMinL = lowest(xHL2,Length)
nValue1 = 0.33 * 2 * ((xHL2 - xMinL) / (xMaxH - xMinL) - 0.5) + 0.67 * nz(nValue1[1])
nValue2 = iff(nValue1 > .99, .999,
     iff(nValue1 < -.99, -.999, nValue1))
nFish = 0.5 * log((1 + nValue2) / (1 - nValue2)) + 0.5 * nz(nFish[1])
pos = iff(nFish > nz(nFish[1]), 1,
     iff(nFish < nz(nFish[1]), -1, nz(pos[1], 0)))
barcolor(pos == -1 ? red : pos == 1 ? green : blue)
plot(nFish, color=green, title="Fisher")
plot(nz(nFish[1]), color=red, title="Trigger")
The expressions I don't understand are the (nFish[1]) and (nValue1[1]) parts.
In the script manual (https://www.tradingview.com/study-script-reference/#op_[]), it says that [] is the series subscript operator and provides access to previous values of a series.
I tried to convert the script to Python with a dataframe that looks like the one below, but I have no idea how to translate the (nFish[1]) and (nValue1[1]) parts.
Date Open High Low Close
37821 2016/10/13 18:10:00 50.31 50.31 50.27 50.28
37822 2016/10/13 18:09:00 50.30 50.31 50.29 50.31
37823 2016/10/13 18:08:00 50.31 50.31 50.30 50.31
37824 2016/10/13 18:07:00 50.34 50.34 50.31 50.32
37825 2016/10/13 18:06:00 50.37 50.37 50.35 50.35
37826 2016/10/13 18:05:00 50.35 50.37 50.34 50.37
37827 2016/10/13 18:04:00 50.39 50.39 50.35 50.35
for x in range(len(df)):
    Pt = (df.iloc[x,2] + df.iloc[x,3]) / 2.0
    MaxH = df.iloc[x:x+9, 2].max()
    MinL = df.iloc[x:x+9, 3].min()
    X = 0.33 * 2.0 * ((Pt - MinL)/(MaxH - MinL) - 0.5) * 0.67 * X[1] # ?????
I'd like to know the meaning of the square brackets in the first script, and if it's possible, I'd like to know how to convert it to Python.
If you have a variable such as x = [], that means the variable is a list (that is what it's called in Python, though you may know it as an array), so you can store values in it, such as x = [1,2,3].
You can then retrieve these values later in your program by doing x[0], where 0 is the index of the 1st item in the list.
Edit: So in this case, x[0] would be equal to 1, as that is the 1st item in the list.
In Pine Script on TradingView, a variable like nFish[1] is the value of the variable nFish 1 bar ago on the chart. nFish[8] is the value of nFish 8 bars ago. I hope that helps.
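Putting that together: in a Python loop, nValue1[1] and nFish[1] are simply the values computed on the previous iteration, so you can carry them in plain variables, initialised to 0, which is what nz() does on the first bar. A sketch with hypothetical toy data, not a full port of the strategy:

```python
import math
import pandas as pd

# hypothetical toy bars; with constant prices the transform stays at 0
df = pd.DataFrame({"High": [2.0, 2.0, 2.0, 2.0],
                   "Low":  [1.0, 1.0, 1.0, 1.0]})
length = 3
hl2 = (df["High"] + df["Low"]) / 2

value1 = 0.0  # plays the role of nz(nValue1[1]) on the first bar
fish = 0.0    # plays the role of nz(nFish[1]) on the first bar
fisher = []
for t in range(len(df)):
    window = hl2.iloc[max(0, t - length + 1): t + 1]
    hi, lo = window.max(), window.min()
    ratio = 0.0 if hi == lo else (hl2.iloc[t] - lo) / (hi - lo) - 0.5
    value1 = 0.33 * 2 * ratio + 0.67 * value1          # nValue1
    if value1 > 0.99:                                  # the iff() clamps
        value1 = 0.999
    elif value1 < -0.99:
        value1 = -0.999
    fish = 0.5 * math.log((1 + value1) / (1 - value1)) + 0.5 * fish  # nFish
    fisher.append(fish)
```

Each pass reads the previous pass's value1 and fish, exactly as nValue1[1] and nFish[1] do in Pine.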
I was looking to replicate the Fisher Transform of TradingView in Python.
After looking into multiple codes out there, I found that they implement the Fisher Transform differently. I am still trying to find out:
1 - What are MaxH and MinL? In some documentation MaxH is the highest of all HIGHs in a certain period (ref: https://wizardforcel.gitbooks.io/python-quant-uqer/content/131.html), while in other implementations like yours it is the highest of the price (high+low)/2 in a certain period (ref: https://www.mesasoftware.com/papers/UsingTheFisherTransform.pdf).
2 - Which price does TradingView use to calculate the Fisher Transform? Some implementations use (high + low)/2, and others use the close/open price.
3 - The ratio of multiplications: some use (0.33 & 0.67) where others use (0.5 & 0.5).
Here is the Python script that I put together, using the highest high and lowest low, but it is not matching TradingView's Fisher 9 indicator. I tried using the highest/lowest of the price hl with no luck as well.
def fisherT(high, low, n):
    high = np.asarray(high, dtype=np.float32)
    low = np.asarray(low, dtype=np.float32)
    sizeArray = len(high)
    hl = (high + low) / 2
    maxHln = [max(high[x:x+n]) for x in range(sizeArray-n+1)]
    minHln = [min(low[x:x+n]) for x in range(sizeArray-n+1)]
    valTn = [0.33*2*((hl[x+n-1]-minHln[x])/(maxHln[x]-minHln[x])-0.5) if maxHln[x]-minHln[x] != 0
             else 0.33*2*((hl[x+n-1]-minHln[x])/(0.001)-0.5) for x in range(sizeArray-n+1)]
    for i in range(1, sizeArray-n+1):
        valTn[i] = valTn[i] + 0.67 * valTn[i-1]
        if valTn[i] > 0.99:
            valTn[i] = 0.999
        elif valTn[i] < -0.99:
            valTn[i] = -0.999
    _fisher = [0.5*np.log((1.0+valTn[x])/(1.0-valTn[x])) for x in range(sizeArray-n+1)]
    _fisherSignal = []
    for i in range(1, sizeArray-n+1):
        _fisher[i] = _fisher[i] + 0.5 * _fisher[i-1]
        _fisherSignal.append(_fisher[i-1])
    fisher = np.zeros(sizeArray)
    fisherSignal = np.zeros(sizeArray)
    fisher[n-1:] = _fisher
    fisherSignal[n:] = _fisherSignal
    return fisher, fisherSignal
I'm doing an exercise that asks for a function that approximates the value of pi using Leibniz's formula. Wikipedia explains it as the series pi/4 = 1 - 1/3 + 1/5 - 1/7 + ..., also written in sigma notation as the sum of (-1)^n / (2n + 1) for n from 0 to infinity.
Logical thinking comes to me easily, but I wasn't given much of a formal education in maths, so I was a bit lost as to what the leftmost symbols in the second form represent. I first tried pi = ( (-1)**n / (2*n + 1) ) * 4, but that returned 1.9999990000005e-06 instead of 3.14159..., since it only evaluates a single term rather than summing them. So I used an accumulator pattern instead (the chapter of the guide this exercise is in mentions them as well), and it worked fine. However, I can't help thinking that it's somewhat contrived and there's probably a better way to do it, given Python's focus on simplicity and making programmes as short as possible. This is the full code:
def myPi(n):
    denominator = 1
    addto = 1
    for i in range(n):
        denominator = denominator + 2
        addto = addto - (1/denominator)
        denominator = denominator + 2
        addto = addto + (1/denominator)
    pi = addto * 4
    return pi

print(myPi(1000000))
Does anyone know a better function?
The Leibniz formula translates directly into Python with no muss or fuss:
>>> steps = 1000000
>>> sum((-1.0)**n / (2.0*n+1.0) for n in reversed(range(steps))) * 4
3.1415916535897934
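The reversed(range(steps)) presumably adds the smallest terms first, which tends to reduce floating-point rounding error slightly; the series converges either way. A quick check, using the standard alternating-series error bound (the truncation error is below the first omitted term):

```python
import math

steps = 100000
forward = sum((-1.0)**n / (2.0*n + 1.0) for n in range(steps)) * 4
backward = sum((-1.0)**n / (2.0*n + 1.0) for n in reversed(range(steps))) * 4

# both directions land within the first omitted term, 4 / (2*steps + 1)
assert abs(forward - math.pi) < 4 / (2*steps + 1)
assert abs(backward - math.pi) < 4 / (2*steps + 1)
```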
The capital sigma here is sigma notation. It is notation used to represent a summation in concise form.
So your sum is actually an infinite sum. The first term, for n=0, is:
(-1)**0/(2*0+1)
This is added to
(-1)**1/(2*1+1)
and then to
(-1)**2/(2*2+1)
and so on forever. The summation is what is known mathematically as a convergent series.
In Python you would write it like this:
def estimate_pi(terms):
    result = 0.0
    for n in range(terms):
        result += (-1.0)**n/(2.0*n+1.0)
    return 4*result
If you wanted to optimise a little, you can avoid the exponentiation.
def estimate_pi(terms):
    result = 0.0
    sign = 1.0
    for n in range(terms):
        result += sign/(2.0*n+1.0)
        sign = -sign
    return 4*result
....
>>> estimate_pi(100)
3.1315929035585537
>>> estimate_pi(1000)
3.140592653839794
Using pure Python you can do something like:
def term(n):
    return ((-1.)**n / (2.*n + 1.)) * 4.

def pi(nterms):
    return sum(map(term, range(nterms)))
and then calculate pi with the number of terms you need to reach a given precision:
pi(100)
# 3.13159290356
pi(1000)
# 3.14059265384
The following version uses Ramanujan's formula as outlined in this SO post - it uses a relation between pi and the "monster group", as discussed in this article.
import math

def Pi(x):
    Pi = 0
    Add = 0
    for i in range(x):
        Add = (math.factorial(4*i) * (1103 + 26390*i)) / (((math.factorial(i))**4) * (396**(4*i)))
        Pi = Pi + ((math.sqrt(8)) / 9801) * Add
    Pi = 1/Pi
    print(Pi)

Pi(100)
This was my approach:
def estPi(terms):
    outPut = 0.0
    for i in range(1, (2 * terms), 4):
        outPut = outPut + (1/i) - (1/(i+2))
    return 4 * outPut
I take in the number of terms the user wants; then in the for loop I double it, to account for only using odd denominators.
at 100 terms I get 3.1315929035585537
at 1000 terms I get 3.140592653839794
at 10000 terms I get 3.1414926535900345
at 100000 terms I get 3.1415826535897198
at 1000000 terms I get 3.1415916535897743
at 10000000 terms I get 3.1415925535897915
at 100000000 terms I get 3.141592643589326
at 1000000000 terms I get 3.1415926525880504
Actual Pi is 3.1415926535897932
Got to love a convergent series.
def myPi(iters):
    pi = 0
    sign = 1
    denominator = 1
    for i in range(iters):
        pi = pi + (sign/denominator)
        # alternating between negative and positive
        sign = sign * -1
        denominator = denominator + 2
    pi = pi * 4.0
    return pi

pi_approx = myPi(10000)
print(pi_approx)
Old thread, but I wanted to stuff around with this, and coincidentally I came up with pretty much the same as user3220980.
# Gregory-Leibniz
# pi accurate to 8 dp in around 80 sec
# pi to 5 dp in .06 seconds
import time

start_time = time.time()
pi = 4  # start at 4
times = 100000000
for i in range(3, times, 4):
    pi = pi - (4/i) + (4/(i + 2))
print(pi)
print("{} seconds".format(time.time() - start_time))
What's an efficient way to calculate a trimmed or winsorized standard deviation of a list?
I don't mind using numpy, but if I have to make a separate copy of the list, it's going to be quite slow.
This will make two copies, but you should give it a try because it should be very fast.
def trimmed_std(data, low, high):
    tmp = np.asarray(data)
    return tmp[(low <= tmp) & (tmp < high)].std()
Do you need to do rank order trimming (i.e. a 5% trim)?
Update:
If you need percentile trimming, the best way I can think of is to sort the data first. Something like this should work:
def trimmed_std(data, percentile):
    data = np.array(data)
    data.sort()
    percentile = percentile / 2.
    low = int(percentile * len(data))
    high = int((1. - percentile) * len(data))
    return data[low:high].std(ddof=0)
You can obviously implement this without using numpy, but even including the time of converting the list to an array, using numpy is faster than anything I could think of.
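A usage sketch of the percentile version above (restated so it runs standalone, with toy numbers): a 20% total trim on ten points drops one point from each end.

```python
import numpy as np

def trimmed_std(data, percentile):
    # sort, drop percentile/2 from each tail, then take the population std
    data = np.array(data)
    data.sort()
    percentile = percentile / 2.0
    low = int(percentile * len(data))
    high = int((1.0 - percentile) * len(data))
    return data[low:high].std(ddof=0)

vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# percentile=0.2 gives low=1, high=9, so values 2..9 remain
result = trimmed_std(vals, 0.2)
```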
This is what generator functions are for.
SD requires two passes, plus a count. For this reason, you'll need to "tee" some iterators over the base collection.
So.
trimmed = (x for x in the_list if low <= x < high)
sum_iter, len_iter, var_iter = itertools.tee(trimmed, 3)
n = sum(1 for x in len_iter)
mean = sum(sum_iter) / n
sd = math.sqrt(sum((x - mean)**2 for x in var_iter) / (n - 1))
Something like that might do what you want without copying anything.
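One caveat with the tee() approach: fully draining len_iter forces tee to buffer the entire sequence internally, so the data effectively gets copied anyway. A genuinely single-pass alternative (not from the original answer) is Welford's algorithm:

```python
import math

def trimmed_std_onepass(data, low, high):
    """Sample standard deviation of values in [low, high),
    computed in one pass with Welford's algorithm - nothing is buffered."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in data:
        if low <= x < high:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
    return math.sqrt(m2 / (n - 1))

# 100 falls outside [0, 10), so this is the sample std of [1, 2, 3, 4]
result = trimmed_std_onepass([1, 2, 3, 4, 100], 0, 10)
```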
In order to get an unbiased trimmed mean you have to account for fractional bits of items in the list as described here and (a little less directly) here. I wrote a function to do it:
from math import modf

def percent_tmean(data, pcent):
    # make sure data is a list
    dc = list(data)
    # find the number of items
    n = len(dc)
    # sort the list
    dc.sort()
    # get the proportion to trim
    p = pcent / 100.0
    k = n * p
    # print "n = %i\np = %.3f\nk = %.3f" % ( n,p,k )
    # get the decimal and integer parts of k
    dec_part, int_part = modf(k)
    # get an index we can use
    index = int(int_part)
    # trim down the list (guarded: dc[0:0] would empty the list when index == 0)
    if index > 0:
        dc = dc[index: index * -1]
    # deal with the case of trimming fractional items
    if dec_part != 0.0:
        # deal with the first remaining item
        dc[0] = dc[0] * (1 - dec_part)
        # deal with the last remaining item
        dc[-1] = dc[-1] * (1 - dec_part)
    return sum(dc) / (n - 2.0 * k)
I also made an iPython Notebook that demonstrates it.
My function will probably be slower than those already posted but it will give unbiased results.