How to change this code from Scala to Python

def computeTotalVariationDistance(p: Distribution, q: Distribution): Double = {
  val pSum = p.sum
  val qSum = q.sum
  val l1Distance = p.zip(q)
    .map { case (pVal, qVal) =>
      math.abs((pVal / pSum) - (qVal / qSum))
    }
    .sum
  0.5 * l1Distance
}
Can someone help me convert this code to Python?

It's actually relatively straightforward. Instead of map, you can use a list comprehension:
l1Distance = sum(
    [abs((pVal / pSum) - (qVal / qSum))
     for pVal, qVal in zip(p, q)]
)
I have not tested this but it should work, or something very similar.
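For completeness, here is a sketch of the whole function translated to Python (also untested, mirroring the Scala structure; the snake_case names are my own choice):

def compute_total_variation_distance(p, q):
    # p and q are sequences of non-negative weights of equal length,
    # standing in for the Scala Distribution values
    p_sum = sum(p)
    q_sum = sum(q)
    l1_distance = sum(
        abs((p_val / p_sum) - (q_val / q_sum))
        for p_val, q_val in zip(p, q)
    )
    return 0.5 * l1_distance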


TensorFlow Go - how to port a Python query

Pretty new to the whole world of TF and co. I managed to create/train/predict a model in a Jupyter notebook using Python.
For production, I'd like to use Golang, but I am unable to find a "simple" sample of how to do the prediction inside Go.
I'd like to have this piece of Python, for Go:
sample = {
    'b': 200,
    'c': 10,
    'd': 1,
}
input_dict = {name: tf.convert_to_tensor([value]) for name, value in sample.items()}
predictions = reloaded_model.predict(input_dict)
prob = tf.nn.sigmoid(predictions[0])
Does anyone have a good tutorial for model.Session.Run using github.com/galeone/tensorflow/tensorflow/go?
Regards,
Helmut

OK, so I managed to do it, so I'm answering it myself for other folks!
Using tfgo:
package main

import (
    "fmt"
    "math"
    "time"

    tf "github.com/galeone/tensorflow/tensorflow/go"
    tg "github.com/galeone/tfgo"
)

func main() {
    model := tg.LoadModel("../krot-classifier", []string{"serve"}, nil)
    start := time.Now()
    fakeInput, _ := tf.NewTensor([][]float32{{1}})
    //fakeString, _ := tf.NewTensor([][]string{{"127.0.0"}})
    results := model.Exec([]tf.Output{
        model.Op("StatefulPartitionedCall", 0),
    }, map[tf.Output]*tf.Tensor{
        model.Op("serving_default_b", 0): fakeInput,
        model.Op("serving_default_c", 0): fakeInput,
        model.Op("serving_default_d", 0): fakeInput,
    })
    predictions := results[0]
    pred := predictions.Value().([][]float32)
    prob := sigmoid(float64(pred[0][0]))
    fmt.Printf("%.1f percent probability\n", 100*prob)
    fmt.Printf("inference took %s\n", time.Since(start)) // use the timer we started above
}

func sigmoid(x float64) float64 {
    return 1 / (1 + math.Exp(-x))
}
For string inputs, see the commented-out fakeString tensor above.
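As an aside (my own addition, not part of the original answer): the hardest part is usually discovering op names like serving_default_b and StatefulPartitionedCall in the first place. One way to list them from Python, assuming the same model directory as above:

import tensorflow as tf

# load the SavedModel and print its serving signature, which shows the
# input names ("b", "c", "d") and the output tensor the Go code binds to
model = tf.saved_model.load("../krot-classifier")
sig = model.signatures["serving_default"]
print(sig.structured_input_signature)
print(sig.structured_outputs)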

TradingView Pine Script's RMA (the moving average used in RSI; it is the exponentially weighted moving average with alpha = 1 / length) in Python, pandas

I've been trying to get the same results as TradingView's RMA method, but I don't know how to accomplish it.
On their page, RMA is computed as:
plot(rma(close, 15))

// the same in Pine
pine_rma(src, length) =>
    alpha = 1/length
    sum = 0.0
    sum := na(sum[1]) ? sma(src, length) : alpha * src + (1 - alpha) * nz(sum[1])
plot(pine_rma(close, 15))
To test, I used an input series and their result; this is the input column and the same input after applying TradingView's rma(input, 14):
data = [[588.0,519.9035093599585],
[361.98999999999984,508.62397297710436],
[412.52000000000055,501.7594034787397],
[197.60000000000042,480.0337318016869],
[208.71999999999932,460.6541795301378],
[380.1100000000006,454.90102384941366],
[537.6599999999999,460.8123792887413],
[323.5600000000013,451.0086379109742],
[431.78000000000077,449.6351637744761],
[299.6299999999992,438.9205092191563],
[225.1900000000005,423.65404427493087],
[292.42000000000013,414.28018396957873],
[357.64999999999964,410.23517082889435],
[692.5100000000003,430.3976586268306],
[219.70999999999916,415.34854015348543],
[400.32999999999987,414.2757872853794],
[604.3099999999995,427.849659622138],
[204.29000000000087,411.8811125062711],
[176.26000000000022,395.0510330415374],
[204.1800000000003,381.41738782428473],
[324.0,377.3161458368358],
[231.67000000000007,366.91284970563316],
[184.21000000000092,353.8626461552309],
[483.0,363.08674285842864],
[290.6399999999994,357.911975511398],
[107.10000000000036,339.996834403441],
[179.0,328.49706051748086],
[182.36000000000058,318.05869905194663],
[275.0,314.98307769109323],
[135.70000000000073,302.17714357030087],
[419.59000000000015,310.56377617242225],
[275.6399999999994,308.06922073153487],
[440.48999999999984,317.5278478221396],
[224.0,310.8472872634153],
[548.0100000000001,327.78748103031415],
[257.0,322.73123238529183],
[267.97999999999956,318.82043007205664],
[366.51000000000016,322.2268279240526],
[341.14999999999964,323.57848307233456],
[147.4200000000001,310.9957342814536],
[158.78000000000063,300.12318183277836],
[416.03000000000077,308.4022402732943],
[360.78999999999917,312.14422311091613],
[1330.7299999999996,384.90035003156487],
[506.92000000000013,393.61603931502464],
[307.6100000000006,387.4727507925229],
[296.7299999999996,380.991125735914],
[462.0,386.7774738976345],
[473.8099999999995,392.9940829049463],
[769.4200000000002,419.88164841173585],
[971.4799999999997,459.2815306680404],
[722.1399999999994,478.0571356203232],
[554.9799999999996,483.5516259331572],
[688.5,498.19079550936027],
[292.0,483.462881544406],
[634.9500000000007,494.2833900055199]]
# Create the pandas DataFrame
dfRMA = pd.DataFrame(data, columns = ['input', 'wantedrma'])
dfRMA['try1'] = dfRMA['input'].ewm( alpha=1/14, adjust=False).mean()
dfRMA['try2'] = numpy_ewma_vectorized(dfRMA['input'],14)
dfRMA
ewm does not give me the same results, so I searched and found this, but it just replicates the plain EWMA:
def numpy_ewma_vectorized(data, window):
    alpha = 1/window
    alpha_rev = 1-alpha
    scale = 1/alpha_rev
    n = data.shape[0]
    r = np.arange(n)
    scale_arr = scale**r
    offset = data[0]*alpha_rev**(r+1)
    pw0 = alpha*alpha_rev**(n-1)
    mult = data*pw0*scale_arr
    cumsums = mult.cumsum()
    out = offset + cumsums*scale_arr[::-1]
    return out
I am getting results that do not match (the try1 and try2 columns above).
Do you know how to translate the Pine Script rma method into pandas?
I realized that with pandas ewm the values seem to converge; the last rows are closer and closer to the wanted value. Is this correct?
...
As far as I know, Pine Script uses data that is not exported, so the weighted mean takes into account previous records that are not in your table, meaning that you can't reproduce the results without more information.
What you need to do is load around 50-100 points (depending on alpha) of data further into the past than what you will actually use, and use a threshold when comparing the data. You need both Python and Pine Script to be working from the same, or at least a similar, "history".
So you do the calculations over the whole dataframe and then skip the first rows. You can see the effect of the historical data in your own example: the difference between your calculation and the Pine Script one vanishes quickly after the 55th point, though of course the difference will also depend on alpha.
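A minimal sketch of that warm-up effect (my own illustration, on synthetic data): compute the same ewm with and without the first 50 points of history and watch the difference decay:

import pandas as pd

full = pd.Series(range(100), dtype=float)
rma_full = full.ewm(alpha=1/14, adjust=False).mean()
rma_cut = full[50:].ewm(alpha=1/14, adjust=False).mean()
# the gap shrinks by a factor of (1 - alpha) per step
print((rma_full[50:] - rma_cut).abs().tail())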
So your code may actually already be well written. In any case, you can use the pandas implementation directly; it will be easier and probably faster. A separate answer sketches the same seeded-RMA approach in JavaScript:
const cloneArray = (input) => [...input]
const pluck = (input, key) => input.map(element => element[key])

const pineSma = (source, length) => {
    let sum = 0.0
    for (let i = 0; i < length; ++i) {
        sum += source[i] / length
    }
    return sum
}

const pineRma = (src, length, last) => {
    const alpha = 1.0 / length
    return alpha * src[0] + (1.0 - alpha) * last
}

const calculatePineRma = (candles, sourceKey, length) => {
    const results = []
    for (let i = 0; i <= candles.length - length; ++i) {
        const sourceCandles = cloneArray(candles.slice(i, i + length)).reverse()
        const closes = pluck(sourceCandles, sourceKey)
        if (i === 0) {
            results.push(pineSma(closes, length))
            continue
        }
        results.push(pineRma(closes, length, results[results.length - 1]))
    }
    return results
}
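For reference, a direct pandas translation of the Pine recursion (my own sketch, untested against TradingView): seed with the SMA of the first length values, then apply the recursive blend. It still needs the extra history discussed above to match the wantedrma column.

import pandas as pd

def pine_rma(src: pd.Series, length: int) -> pd.Series:
    # sum := na(sum[1]) ? sma(src, length) : alpha*src + (1 - alpha)*sum[1]
    alpha = 1.0 / length
    rma = pd.Series(float("nan"), index=src.index)
    rma.iloc[length - 1] = src.iloc[:length].mean()  # SMA seed
    for i in range(length, len(src)):
        rma.iloc[i] = alpha * src.iloc[i] + (1 - alpha) * rma.iloc[i - 1]
    return rma

dfRMA['try3'] = pine_rma(dfRMA['input'], 14)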

What am I doing wrong in this R to Python translation?

I have code in R which implements the Metropolis-Hastings algorithm:
trials <- 100000
sim <- numeric(trials)
sim[1] <- 2
for (i in 2:trials) {
  old <- sim[i-1]
  prop <- runif(1, 0, 5)
  acc <- (exp(-(prop-1)^2/2) + exp(-(prop-4)^2/2)) /
         (exp(-(old-1)^2/2) + exp(-(old-4)^2/2))
  if (runif(1) < acc)
    sim[i] <- prop
  else
    sim[i] <- old
}
mean(sim)
var(sim)
and the results are right.
But when I translate it into Python, the results are different.
trials = 100000
sim = np.repeat(0, trials+1)
sim[0] = 2
for i in range(2, trials):
    old = sim[i-1]
    prop = np.random.uniform(0, 5, 1)
    acc = (np.exp(-(prop-1)**2/2) + np.exp(-(prop-4)**2/2)) / (np.exp(-(old-1)**2/2) + np.exp(-(old-4)**2/2))
    if np.random.uniform(1) < acc:
        sim[i] = prop
    else:
        sim[i] = old
Why? What am I doing wrong here?
Firstly, you want to start your Python for loop at i=1, i.e. range(1, trials), since Python indexing starts at 0.
Secondly, np.random.uniform(1) sets low=1 while high stays at its default of 1.0, so you are just producing 1.0; that needs to change, e.g. np.random.uniform(min, max, size=1), or np.random.uniform(size=1) if you just want a uniform number between 0 and 1. Have a look at the documentation for np.random.uniform if this isn't clear.
Update
Thirdly, you are unknowingly casting to integers: np.repeat(0, trials) produces an integer array, so every assignment sim[i] = prop truncates the proposal. When you're used to R this is easy to forget (I just did it myself). I have refactored your code below, and this solution should give you a result similar to what you see in R.
Here I turned sim into a float vector, and I made sure to subtract and divide using floats inside the exp functions. Hope this works for you.
import numpy as np

trials = 100000
sim = np.repeat(0, trials).astype(np.float64)
sim[0] = 2.0
for i in range(1, trials):
    old = sim[i-1]
    prop = np.random.uniform(0, 5, 1)
    acc = (np.exp(-(prop-1.0)**2/2.0) + np.exp(-(prop-4.0)**2/2.0)) \
        / (np.exp(-(old-1.0)**2/2.0) + np.exp(-(old-4.0)**2/2.0))
    if np.random.uniform(size=1) < acc:
        sim[i] = prop
    else:
        sim[i] = old
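The integer-cast pitfall is easy to demonstrate in isolation (my own illustration):

import numpy as np

sim_int = np.repeat(0, 5)            # dtype is int64, so assignments get truncated
sim_int[0] = 2.7
print(sim_int[0])                    # prints 2: the proposal was silently truncated

sim_float = np.repeat(0, 5).astype(np.float64)
sim_float[0] = 2.7
print(sim_float[0])                  # prints 2.7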

Percentiles in pandas vs. Scala: where is the bug?

For this list of numbers:
val numbers = Seq(0.0817381355303346, 0.08907955219917718, 0.10581384008994665, 0.10970915785902469, 0.1530743353025532, 0.16728932033107657, 0.181932212814931, 0.23200826752868853, 0.2339654613723784, 0.2581657775305527, 0.3481071101229365, 0.5010850992326521, 0.6153244818101578, 0.6233250409474894, 0.6797744231690304, 0.6923891392381571, 0.7440316016776881, 0.7593186414698002, 0.8028091068764153, 0.8780699052482807, 0.8966649331194205)
Python / pandas computes the following percentiles:
25% 0.167289
50% 0.348107
75% 0.692389
However, Scala returns:
calcPercentiles(Seq(.25, .5, .75), sortedNumber.toArray)
25% 0.1601818278168149
50% 0.3481071101229365
75% 0.7182103704579226
The numbers are almost matching, but different. How can I get rid of the difference (and most likely fix a bug in my Scala code)?
val sortedNumber = numbers.sorted
import scala.collection.mutable

case class PercentileResult(percentile: Double, value: Double)

// https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/stats/DescriptiveStats.scala#L537
def calculatePercentile(arr: Array[Double], p: Double) = {
  // +1 so that the .5 == mean for even number of elements.
  val f = (arr.length + 1) * p
  val i = f.toInt
  if (i == 0) arr.head
  else if (i >= arr.length) arr.last
  else {
    arr(i - 1) + (f - i) * (arr(i) - arr(i - 1))
  }
}

def calcPercentiles(percentiles: Seq[Double], arr: Array[Double]): Array[PercentileResult] = {
  val results = new mutable.ListBuffer[PercentileResult]
  percentiles.foreach(p => {
    val r = PercentileResult(percentile = p, value = calculatePercentile(arr, p))
    results.append(r)
  })
  results.toArray
}
Python:
import pandas as pd
df = pd.DataFrame({'foo':[0.0817381355303346, 0.08907955219917718, 0.10581384008994665, 0.10970915785902469, 0.1530743353025532, 0.16728932033107657, 0.181932212814931, 0.23200826752868853, 0.2339654613723784, 0.2581657775305527, 0.3481071101229365, 0.5010850992326521, 0.6153244818101578, 0.6233250409474894, 0.6797744231690304, 0.6923891392381571, 0.7440316016776881, 0.7593186414698002, 0.8028091068764153, 0.8780699052482807, 0.8966649331194205]})
display(df.head())
df.describe()
After a bit of trial and error, I wrote this code, which returns the same results as pandas (using linear interpolation, as that is the pandas default):
def calculatePercentile(numbers: Seq[Double], p: Double): Double = {
  // interpolate only - no special handling of the case when rank is an integer
  val rank = (numbers.size - 1) * p
  val i = numbers(math.floor(rank).toInt)
  val j = numbers(math.ceil(rank).toInt)
  val fraction = rank - math.floor(rank)
  i + (j - i) * fraction
}
From that I would say that the error was here:
(arr.length + 1) * p
A percentile of 0 should map to index 0, and the 100% percentile should map to the maximal index. So for these numbers (.size == 21) that would be indices 0 and 20. However, for 100% the formula above gives an index of 22, bigger than the size of the array! If not for these guard clauses:
else if (i >= arr.length) arr.last
you would get an error and could suspect that something is wrong. Perhaps the authors of the code at
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/stats/DescriptiveStats.scala#L537
used a different definition of percentile (?), or they might simply have a bug. I cannot tell.
BTW: This:
def calcPercentiles(percentiles: Seq[Double], arr: Array[Double]): Array[PercentileResult]
could be written much more simply like this:
def calcPercentiles(percentiles: Seq[Double], numbers: Seq[Double]): Seq[PercentileResult] =
  percentiles.map { p =>
    PercentileResult(p, calculatePercentile(numbers, p))
  }
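As a sanity check (my own, not part of the original answer): the linear-interpolation rule reproduces the pandas numbers exactly, and NumPy's default percentile method is the same linear one:

import numpy as np

numbers = [0.0817381355303346, 0.08907955219917718, 0.10581384008994665,
           0.10970915785902469, 0.1530743353025532, 0.16728932033107657,
           0.181932212814931, 0.23200826752868853, 0.2339654613723784,
           0.2581657775305527, 0.3481071101229365, 0.5010850992326521,
           0.6153244818101578, 0.6233250409474894, 0.6797744231690304,
           0.6923891392381571, 0.7440316016776881, 0.7593186414698002,
           0.8028091068764153, 0.8780699052482807, 0.8966649331194205]
# rank = (21 - 1) * 0.25 = 5.0, an exact index, so the 25th percentile
# is numbers[5] = 0.16728932..., just as pandas reports
print(np.percentile(numbers, [25, 50, 75]))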

Linear interpolation: how to implement this algorithm in C? (Python version is given)

There exists one very good linear interpolation method. It performs linear interpolation requiring at most one multiply per output sample. I found its description in the third edition of Understanding DSP by Lyons. This method involves a special hold buffer. Given a number of samples to be inserted between any two input samples, it produces output points using linear interpolation. Here, I have rewritten this algorithm in Python:
temp1, temp2 = 0, 0
iL = 1.0 / L
for i in x:
    hold = [i-temp1] * L
    temp1 = i
    for j in hold:
        temp2 += j
        y.append(temp2 * iL)
where x contains the input samples, L is the number of points to be inserted, and y will contain the output samples.
My question is: how can such an algorithm be implemented in ANSI C in the most effective way, e.g. is it possible to avoid the second loop?
NOTE: the presented Python code is just to understand how this algorithm works.
UPDATE: here is an example of how it works in Python:

x = []
y = []
hold = []
num_points = 20
points_inbetween = 2

temp1, temp2 = 0, 0

for i in range(num_points):
    x.append( sin(i*2.0*pi * 0.1) )

L = points_inbetween
iL = 1.0/L
for i in x:
    hold = [i-temp1] * L
    temp1 = i
    for j in hold:
        temp2 += j
        y.append(temp2 * iL)
Let's say x = [.... 10, 20, 30 ....]. Then, if L = 2 (the code emits L output samples per input sample), it will produce [... 10, 15, 20, 25, 30 ...]
Interpolation in the sense of "signal sample rate increase"
... or as I call it, "upsampling" (wrong term, probably; disclaimer: I have not read Lyons'). I just had to understand what the code does and then rewrite it for readability. As given it has a couple of problems:
a) it is inefficient: two loops is OK, but it does a multiplication for every single output item; it also uses an intermediary list (hold) and generates the result with append (small beer);
b) it interpolates the first interval wrongly: it generates fake data in front of the first element. Say we have multiplier = 5 and seq = [20, 30]; it will generate [4, 8, 12, 16, 20, 22, 24, 26, 28, 30] instead of [20, 22, 24, 26, 28, 30].
So here is the algorithm in the form of a generator:

def upsampler(seq, multiplier):
    if seq:
        step = 1.0 / multiplier
        y0 = seq[0]
        yield y0
        for y in seq[1:]:
            dY = (y - y0) * step
            for i in range(multiplier - 1):
                y0 += dY
                yield y0
            y0 = y
            yield y0
OK, and now for some tests:
>>> list(upsampler([], 3)) # this is just the same as [Y for Y in upsampler([], 3)]
[]
>>> list(upsampler([1], 3))
[1]
>>> list(upsampler([1,2], 3))
[1, 1.3333333333333333, 1.6666666666666665, 2]
>>> from math import sin, pi
>>> seq = [sin(2.0*pi * i/10) for i in range(20)]
>>> seq
[0.0, 0.58778525229247314, 0.95105651629515353, 0.95105651629515364, 0.58778525229247325, 1.2246063538223773e-016, -0.58778525229247303, -0.95105651629515353, -0.95105651629515364, -0.58778525229247336, -2.4492127076447545e-016, 0.58778525229247214, 0.95105651629515353, 0.95105651629515364, 0.58778525229247336, 3.6738190614671318e-016, -0.5877852522924728, -0.95105651629515342, -0.95105651629515375, -0.58778525229247347]
>>> list(upsampler(seq, 2))
[0.0, 0.29389262614623657, 0.58778525229247314, 0.76942088429381328, 0.95105651629515353, 0.95105651629515364, 0.95105651629515364, 0.7694208842938135, 0.58778525229247325, 0.29389262614623668, 1.2246063538223773e-016, -0.29389262614623646, -0.58778525229247303, -0.76942088429381328, -0.95105651629515353, -0.95105651629515364, -0.95105651629515364, -0.7694208842938135, -0.58778525229247336, -0.29389262614623679, -2.4492127076447545e-016, 0.29389262614623596, 0.58778525229247214, 0.76942088429381283, 0.95105651629515353, 0.95105651629515364, 0.95105651629515364, 0.7694208842938135, 0.58778525229247336, 0.29389262614623685, 3.6738190614671318e-016, -0.29389262614623618, -0.5877852522924728, -0.76942088429381306, -0.95105651629515342, -0.95105651629515364, -0.95105651629515375, -0.76942088429381361, -0.58778525229247347]
And here is my translation to C, fit into Kratz's fn template:
/**
 * @param src      caller-supplied array with data
 * @param src_len  length of src
 * @param steps    upsampling factor (output samples per input sample)
 * @param dst      output param, will be filled with (src_len - 1) * steps + 1 samples
 */
float* linearInterpolation(float* src, int src_len, int steps, float* dst)
{
    float step, y0, dY;
    float *src_end;
    float *out = dst;  /* remember the start of the output for the return value */
    if (src_len > 0) {
        step = 1.0f / steps;
        /* emit the current sample in the loop condition, then interpolate
           steps - 1 points towards the next one */
        for (src_end = src + src_len; *dst++ = y0 = *src++, src < src_end; ) {
            dY = (*src - y0) * step;
            for (int i = steps - 1; i > 0; i--) {
                *dst++ = y0 += dY;
            }
        }
    }
    return out;
}
Please note the C snippet is "typed but never compiled or run", so there might be syntax errors, off-by-1 errors etc. But overall the idea is there.
In that case I think you can avoid the second loop:
def interpolate2(x, L):
    new_list = []
    new_len = (len(x) - 1) * (L + 1)
    for i in range(0, new_len):
        step = i // (L + 1)       # integer division: index of the left sample
        substep = i % (L + 1)
        fr = x[step]
        to = x[step + 1]
        dy = float(to - fr) / float(L + 1)
        y = fr + (dy * substep)
        new_list.append(y)
    new_list.append(x[-1])
    return new_list

print(interpolate2([10, 20, 30], 3))
You just calculate the member in the position you want directly. Though that might not be the most efficient way to do it; the only way to be sure is to compile both and see which one is faster.
Well, first of all, your code is broken. L is not defined, and neither is y or x.
Once that was fixed, I ran Cython on the resulting code:
L = 1
temp1, temp2 = 0, 0
iL = 1.0 / L
y = []
x = range(5)
for i in x:
    hold = [i-temp1] * L
    temp1 = i
    for j in hold:
        temp2 += j
        y.append(temp2 * iL)
And that seemed to work. I haven't tried to compile it, though, and you can also improve the speed a lot by adding different optimizations.
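If you want to actually build it, here is a minimal sketch (my own assumption, not part of the original answer; interp.pyx is a hypothetical filename for the snippet above):

# setup.py -- build with: python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("interp.pyx"))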
"e.g. is it possible to avoid the second loop?"
If it is, then it's possible in Python too. And I don't see how, although I don't see why you would do it the way you do: first creating a list of length L of i-temp1 is completely pointless. Just loop L times:
L = 1
temp1, temp2 = 0, 0
iL = 1.0 / L
y = []
x = range(5)
for i in x:
    hold = i-temp1
    temp1 = i
    for j in range(L):
        temp2 += hold
        y.append(temp2 * iL)
It all seems overcomplicated for what you get out of it, though. What are you trying to do, actually? Interpolate something? (Duh, it says so in the title. Sorry about that.)
There are surely easier ways of interpolating.
Update: a much simplified interpolation function:

# A simple list, so it's easy to see that you interpolate.
indata = [float(x) for x in range(0, 110, 10)]
points_inbetween = 3

outdata = [indata[0]]
for point in indata[1:]:  # all except the first
    step = (point - outdata[-1]) / (points_inbetween + 1)
    for i in range(points_inbetween + 1):  # + 1 so the last step lands on the point itself
        outdata.append(outdata[-1] + step)
I don't see a way to get rid of the inner loop, nor a reason for wanting to do so.
Converting it to C I'll leave up to someone else, or even better, to Cython, as C is a great language if you want to talk to hardware, but otherwise just needlessly difficult.
I think you need the two loops. You have to step over the samples in x to initialize the interpolator, not to mention copy their values into y, and you have to step over the output samples to fill in their values. I suppose you could do one loop to copy x into the appropriate places in y, followed by another loop to use all the values from y, but that will still require some stepping logic. Better to use the nested loop approach.
(And, as Lennart Regebro points out) As a side note, I don't see why you do hold = [i-temp1] * L. Instead, why not do hold = i-temp1, then loop for j in xrange(L): and temp2 += hold? This will use less memory but otherwise behave exactly the same.
Here's my try at a C implementation of your algorithm. Before trying to further optimize it, I'd suggest you profile its performance with all compiler optimizations enabled.
/**
 * @param src      caller-supplied array with data
 * @param src_len  length of src
 * @param steps    number of output samples per input sample
 * @param dst      output param, needs to be of size src_len * steps
 */
float* linearInterpolation(float* src, size_t src_len, size_t steps, float* dst)
{
    float* dst_ptr = dst;
    float* src_ptr = src;
    float stepIncrement = 1.0f / steps;
    float temp1 = 0.0f;
    float temp2 = 0.0f;
    float hold;
    size_t idx_src, idx_steps;

    /* faithful port of the original Python, including its hold buffer */
    for (idx_src = 0; idx_src < src_len; ++idx_src)
    {
        hold = *src_ptr - temp1;
        temp1 = *src_ptr;
        ++src_ptr;
        for (idx_steps = 0; idx_steps < steps; ++idx_steps)
        {
            temp2 += hold;
            *dst_ptr = temp2 * stepIncrement;
            ++dst_ptr;
        }
    }
    return dst;
}
