I'm trying to calculate the standard deviation in Python without NumPy or any other external library; only math is allowed. I want to get better at writing algorithms and am doing this as a bit of "homework" while I improve my Python skills. My goal is to translate the standard deviation formula into Python, but I am not getting the correct result.
I'm using a list of speeds, where speeds = [86,87,88,86,87,85,86]
When I run:
std_dev = numpy.std(speeds)
print(std_dev)
I get: 0.903507902905. But I don't want to rely on numpy. So...
My implementation is as follows:
import math
speeds = [86,87,88,86,87,85,86]
def get_mean(array):
    sum = 0
    for i in array:
        sum = sum + i
    mean = sum/len(array)
    return mean
def get_std_dev(array):
    # get mu
    mean = get_mean(array)
    # (x[i] - mu)**2
    for i in array:
        array = (i - mean) ** 2
        return array
    sum_sqr_diff = 0
    # get sigma
    for i in array:
        sum_sqr_diff = sum_sqr_diff + i
        return sum_sqr_diff
    # get mean of squared differences
    variance = 1/len(array)
    mean_sqr_diff = (variance * sum_sqr_diff)
    std_dev = math.sqrt(mean_sqr_diff)
    return std_dev
std_dev = get_std_dev(speeds)
print(std_dev)
Now when I run:
std_dev = get_std_dev(speeds)
print(std_dev)
I get: [0] but I am expecting 0.903507902905
What am I missing here?
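The problem in your code is that you reuse the name array and return in the middle of the loop:
def get_std_dev(array):
    # get mu
    mean = get_mean(array)  # <-- this is about 86.4
    # (x[i] - mu)**2
    for i in array:
        array = (i - mean) ** 2  # <-- array is now a single number, not the list (about 0.18 with true division, exactly 0 under Python 2 integer division)
        return array  # <-- this is the value returned, on the first iteration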
Now let us look at the algorithm you are using. Note that there are two standard deviation formulas in common use. Which one is appropriate depends on whether the data is the whole population or a sample drawn from it.
sqrt(sum((x - mean)^2) / n)
or
sqrt(sum((x - mean)^2) / (n - 1))
The first is the population standard deviation, which is what numpy.std computes by default; the second is the sample standard deviation. For big values of n the two are nearly equal, since the -1 is insignificant. The first formula can be rewritten as
sqrt(sum(x^2) / n - mean^2)
So how would you do this in python?
def std_dev1(array):
    n = len(array)
    mean = sum(array) / n
    sumsq = sum(v * v for v in array)
    return (sumsq / n - mean * mean) ** 0.5
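To make the difference between the two formulas concrete, here is a minimal sketch that implements both with only the standard library and compares them with the statistics module. The names population_std and sample_std are made up for this illustration.

```python
import math
import statistics

def population_std(values):
    """sqrt(sum((x - mean)^2) / n)"""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def sample_std(values):
    """sqrt(sum((x - mean)^2) / (n - 1)); needs at least two values."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

speeds = [86, 87, 88, 86, 87, 85, 86]
print(population_std(speeds))  # about 0.9035, the numpy.std value from the question
print(sample_std(speeds))      # about 0.976
# The standard library offers both variants as pstdev and stdev.
print(statistics.pstdev(speeds), statistics.stdev(speeds))
```

For only seven values the two results differ noticeably, so it is worth deciding which definition you actually want before comparing against a library.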
speeds = [86,87,88,86,87,85,86]
# Calculate the mean of the values in your list
mean_speeds = sum(speeds) / len(speeds)
# Calculate the variance of the values in your list
# This is 1/N * sum((x - mean(X))^2)
var_speeds = sum((x - mean_speeds) ** 2 for x in speeds) / len(speeds)
# Take the square root of variance to get standard deviation
sd_speeds = var_speeds ** 0.5
>>> sd_speeds
0.9035079029052513
There are a few problems in the code; one of them is the return statement inside the for loop. You can try this:
import math
def get_mean(array):
    return sum(array) / len(array)
def get_std_dev(array):
    n = len(array)
    mean = get_mean(array)
    squares_arr = []
    for item in array:
        squares_arr.append((item - mean) ** 2)
    return math.sqrt(sum(squares_arr) / n)
If you don't want to use NumPy, that's fine; give the statistics module from Python's standard library a try:
import statistics
st_dev = statistics.pstdev(speeds)
print(st_dev)
Or, if you still want a custom solution, I recommend the following approach, which uses generator expressions instead of the explicit, error-prone loops:
import math
mean = sum(speeds) / len(speeds)
var = sum((l-mean)**2 for l in speeds) / len(speeds)
st_dev = math.sqrt(var)
print(st_dev)
You need to get rid of the return statements inside the for loops:
def get_std_dev(array):
    # get mu
    mean = get_mean(array)
    sum_sqr_diff = 0
    # get sigma
    for i in array:
        sum_sqr_diff = sum_sqr_diff + (i - mean)**2
    # get mean of squared differences
    variance = 1/len(array)
    mean_sqr_diff = (variance * sum_sqr_diff)
    std_dev = math.sqrt(mean_sqr_diff)
    return std_dev
Related
I'd like to calculate an average for each value in a list. To do so, I wrote a function that takes a list as a parameter, calculates the averages, and should return them as a list.
Here is the signal:
import random
random_data = [10 * random.uniform(0,1) for i in range(1000)]
peak = [100 * random.uniform(0,1) for i in range(50)] + [0] * 950
random.shuffle(peak)
signal = [peak[x] + random_data[x] for x in range(len(random_data))]
Now I'd like to calculate m as follows.
'''
m1 = 1/(number of signal) * x1
m2 = 1/(number of signal) * (x1+x2)
m3 = 1/(number of signal) * (x1+x2+x3)
...
'''
I wrote the following function to calculate m. How would I change the function to return the list of m values?
def mean_values(s):
    for i in range(len(s)):
        m[i] = 1/len(s)*s[i]
    return m[i]
mean_values(signal)
#mean_values(np.array(signal))
It makes more sense to keep m as a single float rather than a list.
The mean is
s = 1 / n * Σxi
and you can use this to get a new mean from the previous one:
s' = s + (x1 - s) / n1
where s is the latest mean, x1 is the new value and n1 is the new length.
NumPy also has a prebuilt function, np.mean(), which does this and handles Python lists too.
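As a rough sketch of how a list of m values could be returned, the code below shows two readings of the question. scaled_partial_sums follows the definition written in the question (each partial sum divided by the total length), while running_means applies the update formula above so that each entry is the mean of the values seen so far. Both function names and the short sample list are made up for this example.

```python
from itertools import accumulate

def scaled_partial_sums(s):
    """m_k = (x1 + ... + xk) / len(s), as written in the question."""
    n = len(s)
    return [total / n for total in accumulate(s)]

def running_means(s):
    """Mean of the first k values for each k, using s' = s + (x - s) / k."""
    means = []
    mean = 0.0
    for k, x in enumerate(s, start=1):
        mean += (x - mean) / k
        means.append(mean)
    return means

signal = [2.0, 4.0, 6.0, 8.0]  # hypothetical stand-in for the question's signal
print(scaled_partial_sums(signal))  # [0.5, 1.5, 3.0, 5.0]
print(running_means(signal))        # [2.0, 3.0, 4.0, 5.0]
```

Both versions build the result list inside the function and return it once, after the loop has finished.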
I want to calculate the 1, 2 and 3 sigma error of a distribution using Python, as described on the Wikipedia page for the 68–95–99.7 rule. So far I have written the following code. Is this the correct way to compute such KPIs? Thanks.
import numpy as np
# sensor and reference value
temperature_measured = np.random.rand(1000) # value from a sensor under test
temperature_reference = np.random.rand(1000) # value from a best sensor from market
# error computation
error = temperature_reference - temperature_measured
error_sigma = np.std(error)
error_mean = np.mean(error)
# kpi computation
expected_sigma = 1 # 1 degree deviation is allowed (Customer requirement)
kpi_1_sigma = (abs(error - error_mean) < 1*expected_sigma).mean()*100.0 >= 68.27
kpi_2_sigma = (abs(error - error_mean) < 2*expected_sigma).mean()*100.0 >= 95.45
kpi_3_sigma = (abs(error - error_mean) < 3*expected_sigma).mean()*100.0 >= 99.73
I would recommend using the definition you found on Wikipedia and calculating the percentiles directly, i.e. taking half the difference between the percentiles that correspond to mu+sigma and mu-sigma:
((mu+sigma)-(mu-sigma))/2.
sigma1 = (np.percentile(error, 50+34.1, axis=0)- np.percentile(error, 50-34.1, axis=0))/2.
sigma2 = (np.percentile(error, 50+34.1+13.6, axis=0)- np.percentile(error, 50-34.1-13.6, axis=0))/2.
sigma3 = (np.percentile(error, 50+34.1+13.6+2.1, axis=0)- np.percentile(error, 50-34.1-13.6-2.1, axis=0))/2.
An easier way could be like so (quoted from another source):
NumPy's std yields the standard deviation, which is usually denoted
with "sigma". To get the 2-sigma or 3-sigma ranges, you can simply
multiply sigma by 2 or 3:
print([x.mean() - 3 * x.std(), x.mean() + 3 * x.std()])
result:
[-27.545797458510656, 52.315028227741429]
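To check how closely a data set follows the 68–95–99.7 rule, one option is to count the fraction of values that fall within k measured standard deviations of the mean. The sketch below uses only the standard library; coverage_within is a made-up helper, and the normally distributed errors are synthetic data generated for the illustration. Note that it uses the measured sigma, not a required sigma such as expected_sigma in the question.

```python
import random
import statistics

def coverage_within(values, k):
    """Fraction of values within k population standard deviations of their mean."""
    mean = sum(values) / len(values)
    sigma = statistics.pstdev(values, mu=mean)
    return sum(abs(v - mean) < k * sigma for v in values) / len(values)

random.seed(0)  # fixed seed so the sketch is repeatable
# Hypothetical errors drawn from a normal distribution with sigma = 0.5.
error = [random.gauss(0.0, 0.5) for _ in range(20000)]
for k in (1, 2, 3):
    # For normal data these fractions should be near 0.6827, 0.9545 and 0.9973.
    print(k, coverage_within(error, k))
```

If the fractions are far from those values, the errors are probably not normally distributed, and a sigma-based KPI may be misleading.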
Now the full code / questions
I would like to estimate the random fluctuations of the function v - therefore I would like to calculate the RMS value of it:
import numpy as np
import matplotlib.pyplot as plt
def HHmodel(I,length, area):
    v = []
    m = []
    h = []
    z = []
    n = []
    squares = []
    vsquare = (-60)*(-60)
    sumsquares = 0
    rms = []
    a= []
    dt = 0.05
    t = np.linspace(0,100,length)
    #constants
    Cm = area#microFarad
    ENa=50 #miliVolt
    EK=-77 #miliVolt
    El=-54 #miliVolt
    g_Na=120*area #mScm-2
    g_K=36*area #mScm-2
    g_l=0.03*area #mScm-2
    def alphaN(v):
        return 0.01*(v+50)/(1-np.exp(-(v+50)/10))
    def betaN(v):
        return 0.125*np.exp(-(v+60)/80)
    def alphaM(v):
        return 0.1*(v+35)/(1-np.exp(-(v+35)/10))
    def betaM(v):
        return 4.0*np.exp(-0.0556*(v+60))
    def alphaH(v):
        return 0.07*np.exp(-0.05*(v+60))
    def betaH(v):
        return 1/(1+np.exp(-(0.1)*(v+30)))
    #Initialize the voltage and the channels :
    v.append(-60)
    rms.append(1)
    m0 = alphaM(v[0])/(alphaM(v[0])+betaM(v[0]))
    n0 = alphaN(v[0])/(alphaN(v[0])+betaN(v[0]))
    h0 = alphaH(v[0])/(alphaH(v[0])+betaH(v[0]))
    #t.append(0)
    m.append(m0)
    n.append(n0)
    h.append(h0)
    #solving ODE using Euler's method:
    for i in range(1,len(t)):
        m.append(m[i-1] + dt*((alphaM(v[i-1])*(1-m[i-1]))-betaM(v[i-1])*m[i-1]))
        n.append(n[i-1] + dt*((alphaN(v[i-1])*(1-n[i-1]))-betaN(v[i-1])*n[i-1]))
        h.append(h[i-1] + dt*((alphaH(v[i-1])*(1-h[i-1]))-betaH(v[i-1])*h[i-1]))
        gNa = g_Na * h[i-1]*(m[i-1])**3
        gK=g_K*n[i-1]**4
        gl=g_l
        INa = gNa*(v[i-1]-ENa)
        IK = gK*(v[i-1]-EK)
        Il=gl*(v[i-1]-El)
        v.append(v[i-1]+(dt)*((1/Cm)*(I[i-1]-(INa+IK+Il))))
        #v.append(v[i-1]+(dt)*((1/Cm)*(I-(INa+IK+Il))))
    meansquare = np.sqrt((np.square(v).sum()))
    return v,area,meansquare
spikeEvents = [] #timing each spike
length = 1000*5 #the time period
fluctuations = []
output = []
for j in range(1, 10):
    barcode = np.zeros(length)
    noisyI = np.random.normal(0,9,length)
    area = 1.0+0.1*j
    res = HHmodel(noisyI,length,area)
    output.append(res[2])
print('Done.')
The goal is to show that the fluctuations of v increase in some way with the size of the area. I was thinking of the RMS amplitude as a reasonable measure.
BR
edit:
    for i in range(1,len(t)):
        m.append(m[i-1] + dt*((alphaM(v[i-1])*(1-m[i-1]))-betaM(v[i-1])*m[i-1]))
        n.append(n[i-1] + dt*((alphaN(v[i-1])*(1-n[i-1]))-betaN(v[i-1])*n[i-1]))
        h.append(h[i-1] + dt*((alphaH(v[i-1])*(1-h[i-1]))-betaH(v[i-1])*h[i-1]))
        gNa = g_Na * h[i-1]*(m[i-1])**3
        gK=g_K*n[i-1]**4
        gl=g_l
        INa = gNa*(v[i-1]-ENa)
        IK = gK*(v[i-1]-EK)
        Il=gl*(v[i-1]-El)
        v.append(v[i-1]+(dt)*((1/Cm)*(I[i-1]-(INa+IK+Il))))
        z.append(v[i-1]-np.mean(v))
        #v.append(v[i-1]+(dt)*((1/Cm)*(I-(INa+IK+Il))))
    mean = sum(np.square(v))/len(v)
    squared_diffs =[(item-mean)**2 for item in v]
    ms_diff = sum(squared_diffs)/len(squared_diffs)
    rms_diff =np.sqrt(ms_diff)
    return v,area,rms_diff
edit2:
Plot for j in range(1,10) - blue: rmsvalue as calculated in edit 1, yellow 1/sqrt(j)
edit3:
Plot for j in range(1,100) - but the "size" of fluctuations should increase, and not decrease and center somewhere
A few minor notes:
So, basically your "function" v is a one-timestep discrete evaluation of some function rather than a true function, but that's not really relevant here.
As indicated by comments above, you should calculate v for all timesteps and aggregate the squared values, then sum them outside of the loop and normalize by dividing by len(v).
It is also unclear why in iteration i you calculate v[i] but the corresponding squared value you calculate is v[i-1] squared. You should use the same index within one loop iteration, or you'll likely end up missing an element.
I would say the result is not useful because root mean square is rarely applied to a function's raw outputs (there, RMS is just a less useful kind of mean that gives extra weight to outliers); rather, RMS is generally applied to the error or deviation of those outputs. The RMS error or deviation tells you how far, in the function's original units, the values typically lie from the average value. Note that this is only really an important metric if you expect the value of v to be constant.
Given all this, it's hard to say from your question what your intention is and what you're actually trying to do with this info, so I will guess that what you really care about is how much the value of v varies around its mean. In this case, you can use the RMS difference from the mean value of v, calculated as follows:
    for i in range(1,len(t)):
        pass  # calculate v[i] here, omitted for simplicity
    # get mean value of v itself (not of the squared values)
    mean = sum(v)/len(v)
    # you want to get the squared value of the difference, not the value itself
    squared_diffs = [(item - mean)**2 for item in v]
    # get mean squared diff
    ms_diff = sum(squared_diffs) / len(squared_diffs)
    # return root of mean squared diff
    rms_diff = np.sqrt(ms_diff)
    return v,area,rms_diff
Again, this is only useful if you expect the outputs of v to be constant. If not, you would fit a different model (linear, quadratic, etc.) to the function and then calculate the RMS error against that model. The question would be much clearer if you stated the goal of this calculation.
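The difference between the RMS of the raw values and the RMS deviation from the mean can be seen on a small made-up example. The two traces below are hypothetical stand-ins for v: both sit near -60, and the second fluctuates ten times more than the first. The function names are chosen for this sketch only.

```python
import math

def rms(values):
    """Root mean square of the raw values."""
    return math.sqrt(sum(x * x for x in values) / len(values))

def rms_deviation(values):
    """Root mean square of the differences from the mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

# Hypothetical traces: both rest near -60, the second fluctuates ten times more.
quiet = [-60.0 + 0.1 * d for d in (-1, 1, -1, 1)]
noisy = [-60.0 + 1.0 * d for d in (-1, 1, -1, 1)]
print(rms(quiet), rms(noisy))                      # both close to 60
print(rms_deviation(quiet), rms_deviation(noisy))  # about 0.1 and 1.0
```

The raw RMS is dominated by the resting level of about -60 and barely changes, while the RMS deviation tracks the size of the fluctuations.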
I have tried to solve this but I cannot.
I'm trying to determine the standard deviation in finance, that is:
Pr = probabilities, equal to [0.3, 0.4, 0.3]
r = returns, equal to [0.10, 0.05, 0.30]
So, first I calculate my average:
E(r) = 0.3*0.10 + 0.4*0.05 + 0.3*0.30 = 0.14
Second, I calculate my variance:
Var = 0.3*(0.1-0.14)^2 + 0.4*(0.05-0.14)^2 + 0.3*(0.3 - 0.14)^2 = 0.0114
Third, my standard deviation is
Var^(1/2) = 0.10677078, rounded to 0.10677
In Python, I have tried to solve this using basic arithmetic, but I cannot get it to work.
My code is:
import math
def dev_stan(prob, ret):
    Pro = 0
    Des_Stan = 0
    Var = 0
    for i in range(len(ret)):
        Pro += prob[i]*ret[i]
        Var += (ret[i] - Pro)**2*prob[i]
        Des_Stan = (math.sqrt(Var))
    return Des_Stan, Var, Pro, ret, prob
x = [0.30,0.4,0.30]
y = [0.10,0.05,0.30]
print(dev_stan(x,y))
This code results in 0.0956556, but this is not the correct answer.
Your problem is trying to calculate the mean and variance and standard deviation as some sort of running total, all calculated simultaneously. You can't do that with those specific formulas that you are using here. As you showed by hand, you did the mean calculation first, and only after you got the full mean did you calculate the variance, and then only after getting the variance did you calculate the standard deviation. You can't just apply that variance formula to a piece of the mean and hope things work out right.
import math
def dev_stan(prob, ret):
    Pro = 0
    Des_Stan = 0
    Var = 0
    for i in range(len(ret)):
        Pro += prob[i]*ret[i]
    for i in range(len(ret)):
        Var += (ret[i] - Pro)**2*prob[i]
    Des_Stan = (math.sqrt(Var))
    return Des_Stan, Var, Pro, ret, prob
This should work. Notice that the final Des_Stan has to be computed outside the for loop. If you want to compute a running estimate of the mean, variance and standard deviation, you will have to use different formulas.
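For reference, one such set of running formulas is the weighted incremental update (a weighted variant of Welford's method). The sketch below is one possible single-pass version, not the only one; running_weighted_stats is a made-up name, and it assumes the first weight is positive so that the division is defined.

```python
import math

def running_weighted_stats(probs, returns):
    """Single-pass weighted mean, variance and standard deviation."""
    weight_sum = 0.0
    mean = 0.0
    sq_dev = 0.0  # running weighted sum of squared deviations
    for p, r in zip(probs, returns):
        weight_sum += p
        old_mean = mean
        # Move the mean towards r in proportion to the weight of the new value.
        mean = old_mean + (p / weight_sum) * (r - old_mean)
        # Uses both the old and the updated mean, which keeps the sum exact.
        sq_dev += p * (r - old_mean) * (r - mean)
    variance = sq_dev / weight_sum
    return mean, variance, math.sqrt(variance)

mean, var, std = running_weighted_stats([0.30, 0.4, 0.30], [0.10, 0.05, 0.30])
print(mean, var, std)  # about 0.14, 0.0114 and 0.10677
```

The key difference from the code in the question is that each update corrects for the change in the mean, instead of measuring deviations from a mean that is still incomplete.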
For mathematical calculations in Python, NumPy is what you want.
import numpy as np
def dev_stan(x, y):
    mean = x.dot(y)
    var = np.sum(x * (y - mean) ** 2)
    std = np.sqrt(var)
    return mean, var, std, x, y
x = np.array([0.30,0.4,0.30])
y = np.array([0.10,0.05,0.30])
print(dev_stan(x,y))
So I have a multi-variable framework with 3 variables that have different weights, and an array of variations such as:
import numpy as np
weights = np.array([-2.61540125, -0.2480875, -0.2737325])
var = np.array([[0.00660683, -0.03470032, -0.02153846],
[0.00458204, -0.02614379, -0.02830189],
[-0.00098619, -0.00671141, 0.0032362],
[0.00175217, -0.02591793, -0.01217039],
[0.00077738, 0.00886918, 0.00821355],
[0.00077677, -0.02197802, 0.00100000]])
I can easily calculate the standard deviation with np.dot:
cov = np.cov(var.T)
standard_deviation = np.sqrt(weights.T.dot(cov).dot(weights))
standard_deviation
Out:
0.0044526680008974574
But I would like to override the correlation matrix, assume a correlation of 1 between all variables, and find the standard deviation based on that assumption. Is there a simple matrix operation I can do with NumPy to achieve this? I can do it with a loop, but I feel that this is not efficient.
The result should be:
0.015335270585229297
I did this to arrive at it:
import pandas as pd
def stdev(weights, corr, stdevs):
    total = 0.0
    for i in range(0, len(weights)):
        for j in range(0, len(weights)):
            if i < j:
                total = total + weights[i] * stdevs[i] * weights[j] * stdevs[j] * corr[i, j]
            else:
                total = total + weights[i] * stdevs[i] * weights[j] * stdevs[j] * corr[j, i]
    return total ** 0.5
cor_one = np.ones((len(weights), len(weights)))
# named stdevs so that it does not overwrite the stdev function
stdevs = pd.DataFrame(var).std(ddof=1)
stdev(weights, cor_one, stdevs)
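One observation that may help: when every correlation is 1, the double sum over i and j factors into the square of a single sum, so the result is the absolute value of sum(w_i * s_i). The sketch below checks this with plain Python on the data from the question; std_all_correlated is a made-up name, and statistics.stdev is used as the sample standard deviation (ddof=1).

```python
import statistics

weights = [-2.61540125, -0.2480875, -0.2737325]
var = [[0.00660683, -0.03470032, -0.02153846],
       [0.00458204, -0.02614379, -0.02830189],
       [-0.00098619, -0.00671141, 0.0032362],
       [0.00175217, -0.02591793, -0.01217039],
       [0.00077738, 0.00886918, 0.00821355],
       [0.00077677, -0.02197802, 0.00100000]]

def std_all_correlated(weights, observations):
    """Standard deviation of the weighted sum when every correlation is 1."""
    # Sample standard deviation (ddof=1) of each column.
    stdevs = [statistics.stdev(col) for col in zip(*observations)]
    # With corr[i][j] == 1 the double sum equals (sum_i w_i * s_i) ** 2.
    return abs(sum(w * s for w, s in zip(weights, stdevs)))

print(std_all_correlated(weights, var))  # about 0.015335, the value expected in the question
```

With NumPy arrays the same quantity should be obtainable as abs(weights.dot(stdevs)), where stdevs holds the per-column sample standard deviations, so no explicit loop or correlation matrix is needed.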