I have tried to solve this but I cannot.
I'm trying to determine the standard deviation in finance, that is:
Pr = probabilities = [0.3, 0.4, 0.3]
r = returns = [0.10, 0.05, 0.30]
So, first I calculate my average:
E(r) = 0.3*0.10 + 0.4*0.05 + 0.3*0.30 = 0.14
Second, I calculate my variance:
Var = 0.3*(0.10-0.14)^2 + 0.4*(0.05-0.14)^2 + 0.3*(0.30-0.14)^2 = 0.0114
Third, my standard deviation is:
Var^(1/2) = 0.10677078, rounded to 0.10677
In Python, I have tried to solve this using basic arithmetic, but I cannot get it to work.
My code is:
import math
def dev_stan(prob, ret):
    Pro = 0
    Des_Stan = 0
    Var = 0
    for i in range(len(ret)):
        Pro += prob[i]*ret[i]
        Var += (ret[i] - Pro)**2*prob[i]
        Des_Stan = (math.sqrt(Var))
    return Des_Stan, Var, Pro, ret, prob
x = [0.30,0.4,0.30]
y = [0.10,0.05,0.30]
print(dev_stan(x,y))
This code results in 0.0956556, but this is not the answer.
The problem is that you are trying to calculate the mean, variance, and standard deviation as running totals, all updated simultaneously. You can't do that with the specific formulas you are using here. As you showed by hand, you calculated the mean first, and only after you had the full mean did you calculate the variance; only after getting the variance did you take the standard deviation. You can't apply the variance formula to a partial mean and hope things work out.
import math
def dev_stan(prob, ret):
    Pro = 0
    Des_Stan = 0
    Var = 0
    for i in range(len(ret)):
        Pro += prob[i]*ret[i]
    for i in range(len(ret)):
        Var += (ret[i] - Pro)**2*prob[i]
    Des_Stan = (math.sqrt(Var))
    return Des_Stan, Var, Pro, ret, prob
should work. Notice that the final Des_Stan has to be computed outside the for loop. If you want a running estimate of the mean, variance, and standard deviation, you will have to use different formulas; a one-pass sketch follows below.
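For completeness, here is a minimal sketch of such a one-pass approach, the weighted variant of Welford's online algorithm (the function and variable names are my own); it reproduces the 0.14 / 0.0114 / 0.10677 figures from the question in a single loop:
import math

def dev_stan_running(prob, ret):
    # one-pass weighted mean/variance (weighted Welford algorithm)
    w_sum = 0.0   # running sum of weights (probabilities)
    mean = 0.0    # running weighted mean
    s = 0.0       # running sum of weighted squared deviations
    for p, r in zip(prob, ret):
        w_sum += p
        delta = r - mean
        mean += (p / w_sum) * delta
        s += p * delta * (r - mean)   # uses the mean before and after the update
    var = s / w_sum                   # w_sum is 1.0 when the probabilities sum to 1
    return math.sqrt(var), var, mean

print(dev_stan_running([0.3, 0.4, 0.3], [0.10, 0.05, 0.30]))
# approximately (0.10677, 0.0114, 0.14)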
For mathematical calculations in Python, NumPy is what you want.
import numpy as np
def dev_stan(x, y):
    mean = x.dot(y)
    var = np.sum(x * (y - mean) ** 2)
    std = np.sqrt(var)
    return mean, var, std, x, y
x = np.array([0.30,0.4,0.30])
y = np.array([0.10,0.05,0.30])
print(dev_stan(x,y))
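As a side note (my addition, not part of the original answer): np.average accepts a weights argument, so the weighted mean and variance can also be written directly with the same x and y arrays:
mean = np.average(y, weights=x)               # 0.14
var = np.average((y - mean) ** 2, weights=x)  # 0.0114
std = np.sqrt(var)                            # approximately 0.10677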
I'm trying to calculate standard deviation in Python without the use of NumPy or any external library except for math. I want to get better at writing algorithms and am just doing this as a bit of "homework" as I improve my Python skills. My goal is to translate this formula into Python, but I am not getting the correct result.
I'm using an array of speeds where speeds = [86,87,88,86,87,85,86]
When I run:
std_dev = numpy.std(speeds)
print(std_dev)
I get: 0.903507902905. But I don't want to rely on numpy. So...
My implementation is as follows:
import math
speeds = [86,87,88,86,87,85,86]
def get_mean(array):
    sum = 0
    for i in array:
        sum = sum + i
    mean = sum/len(array)
    return mean
def get_std_dev(array):
    # get mu
    mean = get_mean(array)
    # (x[i] - mu)**2
    for i in array:
        array = (i - mean) ** 2
        return array
    sum_sqr_diff = 0
    # get sigma
    for i in array:
        sum_sqr_diff = sum_sqr_diff + i
        return sum_sqr_diff
    # get mean of squared differences
    variance = 1/len(array)
    mean_sqr_diff = (variance * sum_sqr_diff)
    std_dev = math.sqrt(mean_sqr_diff)
    return std_dev
std_dev = get_std_dev(speeds)
print(std_dev)
Now when I run:
std_dev = get_std_dev(speeds)
print(std_dev)
I get: [0] but I am expecting 0.903507902905
What am I missing here?
The problem in your code is the reuse of array and the return in the middle of the loop:
def get_std_dev(array):
    # get mu
    mean = get_mean(array)        # <-- this is 86.4
    # (x[i] - mu)**2
    for i in array:
        array = (i - mean) ** 2   # <-- this is almost 0
        return array              # <-- this is the value returned
Now let us look at the algorithm you are using. Note that there are two standard deviation formulas in common use, and there are various arguments as to which one is correct:
sqrt(sum((x - mean)^2) / n)
or
sqrt(sum((x - mean)^2) / (n -1))
For big values of n, the first formula is typically used, since the -1 is insignificant. Expanding the square, the first formula can be rewritten as
sqrt(sum(x^2) /n - mean^2)
So how would you do this in Python?
def std_dev1(array):
    n = len(array)
    mean = sum(array) / n
    sumsq = sum(v * v for v in array)
    return (sumsq / n - mean * mean) ** 0.5
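Assuming the speeds list from the question, a quick check (my own example):
speeds = [86, 87, 88, 86, 87, 85, 86]
print(std_dev1(speeds))  # approximately 0.9035079029052513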
speeds = [86,87,88,86,87,85,86]
# Calculate the mean of the values in your list
mean_speeds = sum(speeds) / len(speeds)
# Calculate the variance of the values in your list
# This is 1/N * sum((x - mean(X))^2)
var_speeds = sum((x - mean_speeds) ** 2 for x in speeds) / len(speeds)
# Take the square root of variance to get standard deviation
sd_speeds = var_speeds ** 0.5
>>> sd_speeds
0.9035079029052513
There are some problems in the code; one of them is the return inside the for statement. You can try this:
import math

def get_mean(array):
    return sum(array) / len(array)

def get_std_dev(array):
    n = len(array)
    mean = get_mean(array)
    squares_arr = []
    for item in array:
        squares_arr.append((item - mean) ** 2)
    return math.sqrt(sum(squares_arr) / n)
If you don't want to use NumPy, that's OK; give the statistics package in the Python standard library a try:
import statistics
st_dev = statistics.pstdev(speeds)
print(st_dev)
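One thing to be aware of (my note): pstdev is the population formula (divide by n), which matches numpy.std's default, while statistics.stdev uses the sample formula (divide by n - 1) and gives a slightly larger value:
import statistics
speeds = [86, 87, 88, 86, 87, 85, 86]
print(statistics.pstdev(speeds))  # approximately 0.90351 (population, like numpy.std)
print(statistics.stdev(speeds))   # approximately 0.97590 (sample, n - 1)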
Alternatively, if you still want a custom solution, I recommend the following approach using a generator expression instead of your complex, buggy one:
import math
mean = sum(speeds) / len(speeds)
var = sum((l-mean)**2 for l in speeds) / len(speeds)
st_dev = math.sqrt(var)
print(st_dev)
Echoing the other answers: you need to get rid of the return statements inside the for loops.
def get_std_dev(array):
    # get mu
    mean = get_mean(array)
    sum_sqr_diff = 0
    # get sigma
    for i in array:
        sum_sqr_diff = sum_sqr_diff + (i - mean)**2
    # get mean of squared differences
    variance = 1/len(array)
    mean_sqr_diff = (variance * sum_sqr_diff)
    std_dev = math.sqrt(mean_sqr_diff)
    return std_dev
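Assuming the get_mean helper and the import math from the question, a quick check with the speeds list (my own example):
speeds = [86, 87, 88, 86, 87, 85, 86]
print(get_std_dev(speeds))  # approximately 0.9035079029052513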
I want to calculate the 1, 2, and 3 sigma errors of a distribution using Python, as described on the Wikipedia page for the 68–95–99.7 rule. So far I have written the following code. Is this a correct way to compute such KPIs?
import numpy as np
# sensor and reference values
temperature_measured = np.random.rand(1000)   # values from a sensor under test
temperature_reference = np.random.rand(1000)  # values from the best sensor on the market
# error computation
error = temperature_reference - temperature_measured
error_sigma = np.std(error)
error_mean = np.mean(error)
# kpi computation
expected_sigma = 1  # 1 degree of deviation is allowed (customer requirement)
kpi_1_sigma = (abs(error - error_mean) < 1*expected_sigma).mean()*100.0 >= 68.27
kpi_2_sigma = (abs(error - error_mean) < 2*expected_sigma).mean()*100.0 >= 95.45
kpi_3_sigma = (abs(error - error_mean) < 3*expected_sigma).mean()*100.0 >= 99.73
I would recommend using the definition you found on Wikipedia and just calculating the percentiles, i.e., computing the difference
((mu + sigma) - (mu - sigma)) / 2
sigma1 = (np.percentile(error, 50+34.1, axis=0)- np.percentile(error, 50-34.1, axis=0))/2.
sigma2 = (np.percentile(error, 50+34.1+13.6, axis=0)- np.percentile(error, 50-34.1-13.6, axis=0))/2.
sigma3 = (np.percentile(error, 50+34.1+13.6+2.1, axis=0)- np.percentile(error, 50-34.1-13.6-2.1, axis=0))/2.
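As a quick sanity check (my own example, not from the original post): on roughly normal synthetic data, the percentile-based estimate should land close to np.std:
import numpy as np
rng = np.random.default_rng(0)  # synthetic, roughly normal errors
error = rng.normal(0, 1, 100000)
sigma1 = (np.percentile(error, 50 + 34.1) - np.percentile(error, 50 - 34.1)) / 2
print(sigma1, np.std(error))    # both should be close to 1.0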
An easier way could be like so (taken from here):
NumPy's std yields the standard deviation, which is usually denoted with "sigma". To get the 2-sigma or 3-sigma ranges, you can simply multiply sigma by 2 or 3:
print([x.mean() - 3 * x.std(), x.mean() + 3 * x.std()])
result:
[-27.545797458510656, 52.315028227741429]
Now the full code and question:
I would like to estimate the random fluctuations of the function v, so I would like to calculate its RMS value:
import numpy as np
import matplotlib.pyplot as plt

def HHmodel(I, length, area):
    v = []
    m = []
    h = []
    z = []
    n = []
    squares = []
    vsquare = (-60)*(-60)
    sumsquares = 0
    rms = []
    a = []
    dt = 0.05
    t = np.linspace(0, 100, length)
    # constants
    Cm = area        # microFarad
    ENa = 50         # miliVolt
    EK = -77         # miliVolt
    El = -54         # miliVolt
    g_Na = 120*area  # mScm-2
    g_K = 36*area    # mScm-2
    g_l = 0.03*area  # mScm-2
    def alphaN(v):
        return 0.01*(v+50)/(1-np.exp(-(v+50)/10))
    def betaN(v):
        return 0.125*np.exp(-(v+60)/80)
    def alphaM(v):
        return 0.1*(v+35)/(1-np.exp(-(v+35)/10))
    def betaM(v):
        return 4.0*np.exp(-0.0556*(v+60))
    def alphaH(v):
        return 0.07*np.exp(-0.05*(v+60))
    def betaH(v):
        return 1/(1+np.exp(-(0.1)*(v+30)))
    # initialize the voltage and the channels:
    v.append(-60)
    rms.append(1)
    m0 = alphaM(v[0])/(alphaM(v[0])+betaM(v[0]))
    n0 = alphaN(v[0])/(alphaN(v[0])+betaN(v[0]))
    h0 = alphaH(v[0])/(alphaH(v[0])+betaH(v[0]))
    #t.append(0)
    m.append(m0)
    n.append(n0)
    h.append(h0)
    # solving the ODE using Euler's method:
    for i in range(1, len(t)):
        m.append(m[i-1] + dt*((alphaM(v[i-1])*(1-m[i-1]))-betaM(v[i-1])*m[i-1]))
        n.append(n[i-1] + dt*((alphaN(v[i-1])*(1-n[i-1]))-betaN(v[i-1])*n[i-1]))
        h.append(h[i-1] + dt*((alphaH(v[i-1])*(1-h[i-1]))-betaH(v[i-1])*h[i-1]))
        gNa = g_Na * h[i-1]*(m[i-1])**3
        gK = g_K*n[i-1]**4
        gl = g_l
        INa = gNa*(v[i-1]-ENa)
        IK = gK*(v[i-1]-EK)
        Il = gl*(v[i-1]-El)
        v.append(v[i-1]+(dt)*((1/Cm)*(I[i-1]-(INa+IK+Il))))
        #v.append(v[i-1]+(dt)*((1/Cm)*(I-(INa+IK+Il))))
    meansquare = np.sqrt((np.square(v).sum()))
    return v, area, meansquare

spikeEvents = []  # timing each spike
length = 1000*5   # the time period
fluctuations = []
output = []
for j in range(1, 10):
    barcode = np.zeros(length)
    noisyI = np.random.normal(0, 9, length)
    area = 1.0+0.1*j
    res = HHmodel(noisyI, length, area)
    output.append(res[2])
print('Done.')
The goal is that the fluctuations of v should increase in some way with the size of the area; I was thinking of the RMS amplitude as a reasonable measure.
edit:
for i in range(1, len(t)):
    m.append(m[i-1] + dt*((alphaM(v[i-1])*(1-m[i-1]))-betaM(v[i-1])*m[i-1]))
    n.append(n[i-1] + dt*((alphaN(v[i-1])*(1-n[i-1]))-betaN(v[i-1])*n[i-1]))
    h.append(h[i-1] + dt*((alphaH(v[i-1])*(1-h[i-1]))-betaH(v[i-1])*h[i-1]))
    gNa = g_Na * h[i-1]*(m[i-1])**3
    gK = g_K*n[i-1]**4
    gl = g_l
    INa = gNa*(v[i-1]-ENa)
    IK = gK*(v[i-1]-EK)
    Il = gl*(v[i-1]-El)
    v.append(v[i-1]+(dt)*((1/Cm)*(I[i-1]-(INa+IK+Il))))
    z.append(v[i-1]-np.mean(v))
    #v.append(v[i-1]+(dt)*((1/Cm)*(I-(INa+IK+Il))))
mean = sum(np.square(v))/len(v)
squared_diffs = [(item-mean)**2 for item in v]
ms_diff = sum(squared_diffs)/len(squared_diffs)
rms_diff = np.sqrt(ms_diff)
return v, area, rms_diff
edit2:
Plot for j in range(1, 10) (image omitted): blue is the RMS value as calculated in edit 1, yellow is 1/sqrt(j).
edit3:
Plot for j in range(1, 100) (image omitted): but the "size" of the fluctuations should increase, not decrease and settle around some value.
A few minor notes:
So, basically your "function" v is a one-timestep discrete evaluation of some function rather than a true function, but that's not really relevant here.
As indicated by comments above, you should calculate v for all timesteps and aggregate the squared values, then sum them outside of the loop and normalize by dividing by len(v).
It is also unclear why in iteration i you calculate v[i] but the corresponding squared value you calculate is v[i-1] squared. Use the same index in the same loop iteration, or you'll likely end up missing an element.
I would say the reason the result is not useful is that the root mean square is not really used on a function's raw outputs (RMS in this case is just a less useful mean that gives extra weight to outliers); rather, RMS is generally used on the error or variance of the function's outputs. RMS error tells you how far, in the function's original units, the function's values typically lie from the average value. Note that this is only an important metric if you expect the value of v to be constant.
Given all this, it's hard to say from your question what your intention is and what you're actually trying to do with this info, so I will guess that what you really care about is how much the value of v varies from its mean. In this case, you can use the RMS difference from the mean value of v, calculated as such:
for i in range(1, len(t)):
    ...  # calculate v[i] here, omitted for simplicity
# get the mean value of v
mean = sum(v) / len(v)
# you want the squared value of the difference, not the value itself
squared_diffs = [(item - mean)**2 for item in v]
# get the mean squared diff
ms_diff = sum(squared_diffs) / len(squared_diffs)
# return the root of the mean squared diff
rms_diff = np.sqrt(ms_diff)
return v, area, rms_diff
Again, this is only useful if you expect the outputs of v to be constant. If not, you would fit a different model (linear, quadratic, etc.) to the function and then calculate the RMS error against that model. The question would be much clearer if you indicated the goal of this calculation.
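To illustrate that last point, a minimal sketch of computing the RMS error against a fitted model (the linear fit is my own illustrative choice; t and v are assumed to be the arrays from the question):
import numpy as np
coeffs = np.polyfit(t, v, 1)            # fit a line v ~ a*t + b
residuals = v - np.polyval(coeffs, t)   # deviations from the fitted model
rms_error = np.sqrt(np.mean(residuals ** 2))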
Let's suppose we have a list to which we append an integer between 15 and 32 (call the integer rand) in each iteration. I want to design an algorithm that assigns a reward around 1 (between 0.75 and 1.25) to each rand. The rule for assigning the reward goes like this:
First we calculate the average of the list. If rand is more than the average, we expect the reward to be less than 1, and if rand is less than the average, the reward should be higher than 1. The greater the distance between the average and rand, the more the reward increases/decreases.
for example:
rand = 15, avg = 23 then reward = 1.25
rand = 32, avg = 23 then reward = 0.75
rand = 23, avg = 23 then reward = 1
and so on.
I developed the code below for this algorithm:
import numpy as np

rollouts = np.array([])
i = 0

def modify_reward(lst, rand):
    reward = 1
    constant1 = 0.25
    constant2 = 1
    std = np.std(lst)
    global avg
    avg = np.mean(lst)
    sub = np.subtract(avg, rand)
    landa = sub / std if std != 0 else 0
    coefficient = -1 + (2 / (1 + np.exp(-constant2 * landa)))
    md_reward = reward + (reward * constant1 * coefficient)
    return md_reward

while i < 100:
    rand = np.random.randint(15, 33)
    rollouts = np.append(rollouts, rand)
    modified_reward = modify_reward(rollouts, rand)
    i += 1
    print([i, rand, avg, modified_reward])

# test the reward for upper bound and lower bound
rand1, rand2 = 15, 32
reward1, reward2 = modify_reward(rollouts, rand1), modify_reward(rollouts, rand2)
print(['reward for upper bound', rand1, avg, reward1])
print(['reward for lower bound', rand2, avg, reward2])
The algorithm works fairly well, but if you look at the examples below, you will notice the problem with it:
rand = 15, avg = 23.94 then reward = 1.17 # which has to be 1.25
rand = 32, avg = 23.94 then reward = 0.84 # which has to be 0.75
rand = 15, avg = 27.38 then reward = 1.15 # which has to be 1.25
rand = 32, avg = 27.38 then reward = 0.93 # which has to be 0.75
As you might have noticed, the algorithm doesn't consider the distance between avg and the bounds (15, 32).
The closer avg moves to the lower or upper bound, the more unbalanced modified_reward becomes.
I need modified_reward to be assigned uniformly, no matter whether avg moves toward the upper or lower bound.
Can anyone suggest a modification to this algorithm that takes into account the distance between avg and the bounds of the list?
Putting together these two requirements:
if rand is more than average, we expect the reward to be less than 1, and if rand is less than average, the reward gets higher than 1.
I need modified_reward to be uniformly assigned, no matter avg moves toward upper bound or lower bound.
is slightly tricky, depending on what you mean by 'uniformly'.
If you want 15 to always be rewarded with 1.25, and 32 to always be rewarded with 0.75, you can't have a single linear relationship while also respecting your first requirement.
If you are happy with two linear relationships, you can aim for a situation where modified_reward depends on rand as two line segments with a 'knee' at avg (plot omitted; I produced it with a Wolfram Alpha query). I expect you'll be able to derive the formulae for each part without too much trouble; a sketch is below.
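For what it's worth, a minimal sketch of those two segments as code (the constants come from the question; the function name is my own):
def reward(rand, avg, lo=15, hi=32, width=0.25):
    # two line segments meeting at avg: 1 - width at hi, 1 + width at lo
    if rand >= avg:
        return 1 - width * (rand - avg) / (hi - avg)
    return 1 + width * (avg - rand) / (avg - lo)

print(reward(15, 23), reward(32, 23), reward(23, 23))  # 1.25 0.75 1.0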
This code implements a linear distribution of weights proportional to the distance from average towards your given limits.
import numpy as np

class Rewarder(object):
    lo = 15
    hi = 32
    weight = 0.25

    def __init__(self):
        self.lst = np.array([])

    def append(self, x):
        self.lst = np.append(self.lst, [x])

    def average(self):
        return np.mean(self.lst)

    def distribution(self, a, x, b):
        '''
        Return a number between 0 and 1 proportional to
        the distance of x from a towards b.
        Note: modify this fraction if you want a normal distribution
        or quadratic etc.
        '''
        return (x - a) / (b - a)

    def reward(self, x):
        avg = self.average()
        if x > avg:
            w = self.distribution(avg, x, self.hi)
        else:
            w = -self.distribution(avg, x, self.lo)
        return 1 - self.weight * w

rollouts = Rewarder()
rollouts.append(23)
print(rollouts.reward(15))
print(rollouts.reward(32))
print(rollouts.reward(23))
Producing:
1.25
0.75
1.0
The code in your question seems to be using np.std which I presume is an attempt to get a normal distribution. Remember that the normal distribution never actually gets to zero.
If you tell me what shape you want for the distribution we can modify Rewarder.distribution to suit.
Edit:
I can't access the paper you refer to, but I infer that you want a sigmoid-style distribution of rewards giving a deviation of 0 at the mean and approximately +/-0.25 at the min and max. Using the error function as the weighting, if we scale by 2 we get approximately 0.995 at the min and max.
Override the Rewarder.distribution:
import math

class RewarderERF(Rewarder):
    def distribution(self, a, x, b):
        """
        Return an error function (sigmoid) weighting of the distance from a.
        Note: scaled to reduce the error at max to ~0.003
        ref: https://en.wikipedia.org/wiki/Sigmoid_function
        """
        return math.erf(2.0 * super(RewarderERF, self).distribution(a, x, b))

rollouts = RewarderERF()
rollouts.append(23)
print(rollouts.reward(15))
print(rollouts.reward(32))
print(rollouts.reward(23))
results in:
1.24878131454
0.75121868546
1.0
You can choose which error function suits your application and how much error you can accept at the min and max. I'd also expect that you'd integrate all these functions into your class; I've split everything out so we can see the parts.
Regarding calculating the mean: do you need to keep the list of values and recalculate each time, or can you keep a count and a running total of the sum? Then you would not need numpy for this calculation; a sketch follows below.
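To illustrate, a tiny sketch of the count-and-total idea (my own code, not part of the answer above):
class RunningMean:
    """Keep a running mean without storing every value."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, x):
        self.count += 1
        self.total += x

    def mean(self):
        return self.total / self.count

rm = RunningMean()
for x in (23, 15, 32):
    rm.add(x)
print(rm.mean())  # approximately 23.33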
I don't understand why you are calculating md_reward like this. Please provide the logic and reasoning. But
landa = sub / std if std != 0 else 0
coefficient = -1 + ( 2 / (1 + np.exp(-constant2 * landa)))
md_reward = reward + (reward * constant1 * coefficient)
will not give what you are looking for. Consider the cases below:
For md_reward to be 0.75:
--> coefficient should be -1
--> landa == -infinity (a large negative value, i.e., rand would have to be much larger than 32)
For md_reward to be 1:
--> coefficient should be 0
--> landa == 0 (std == 0 or sub == 0), which is possible
For md_reward to be 1.25:
--> coefficient should be 1
--> landa == +infinity (a large positive value, i.e., rand would have to be much smaller than 15)
If you want to normalize the reward from avg to max and from avg to min, check the links below:
https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range
https://stats.stackexchange.com/questions/70553/what-does-normalization-mean-and-how-to-verify-that-a-sample-or-a-distribution
Now modify your function to something like the below.
def modify_reward(lst, rand):
    reward = 1
    constant1 = 0.25
    min_value = 15
    max_value = 32
    avg = np.mean(lst)
    if rand >= avg:
        # normalize rand from avg to max
        md_reward = reward - constant1*(rand - avg)/(max_value - avg)
    else:
        # normalize rand from min to avg
        md_reward = reward + constant1*(1 - (rand - min_value)/(avg - min_value))
    return md_reward
I have used the following normalization:
Normalized: (X − min(X)) / (max(X) − min(X))
For the case rand >= avg, min(X) is avg and max(X) is max_value; for the case rand < avg, min(X) is min_value and max(X) is avg.
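A quick check of this version against the examples in the question (my own test; with a single-element list, the average is 23):
print(modify_reward([23], 15))  # 1.25
print(modify_reward([23], 32))  # 0.75
print(modify_reward([23], 23))  # 1.0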
Hope this helps.
Try this
def modify_reward(lst, rand):
    reward = 1
    constant = 0.25  # think of this as the +/- amount from the initial reward
    global avg
    avg = np.mean(lst)
    sub = np.subtract(avg, rand)
    dreward = 0
    if sub > 0:
        dreward = sub/(avg-15)  # put your lower boundary instead of 15
    elif sub < 0:
        dreward = sub/(32-avg)  # put your higher boundary instead of 32
    md_reward = reward + (dreward*constant)
    return md_reward
This is a linear solution inspired by @AakashM. I don't know if this is what you were looking for, but it fits your description.
In R there is a very useful function that helps with determining parameters for a two-sided t-test in order to obtain a target statistical power.
The function is called power.prop.test.
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/power.prop.test.html
You can call it using:
power.prop.test(p1 = .50, p2 = .75, power = .90)
And it will tell you n, the sample size needed to obtain this power. This is extremely useful for determining sample sizes for tests.
Is there a similar function in the scipy package?
I've managed to replicate the function using the formula below for n and the inverse survival function norm.isf from scipy.stats:
from scipy.stats import norm

def sample_power_probtest(p1, p2, power=0.8, sig=0.05):
    z = norm.isf([sig/2])  # two-sided t test
    zp = -1 * norm.isf([power])
    d = (p1-p2)
    s = 2*((p1+p2)/2)*(1-((p1+p2)/2))
    n = s * ((zp + z)**2) / (d**2)
    return int(round(n[0]))

def sample_power_difftest(d, s, power=0.8, sig=0.05):
    z = norm.isf([sig/2])
    zp = -1 * norm.isf([power])
    n = s * ((zp + z)**2) / (d**2)
    return int(round(n[0]))

if __name__ == '__main__':
    n = sample_power_probtest(0.1, 0.11, power=0.8, sig=0.05)
    print(n)  # 14752
    n = sample_power_difftest(0.1, 0.5, power=0.8, sig=0.05)
    print(n)  # 392
Some of the basic power calculations are now available in statsmodels
http://statsmodels.sourceforge.net/devel/stats.html#power-and-sample-size-calculations
http://jpktd.blogspot.ca/2013/03/statistical-power-in-statsmodels.html
The blog article does not yet take the latest changes to the statsmodels code into account. Also, I haven't decided yet how many wrapper functions to provide, since many power calculations just reduce to the basic distribution.
>>> import statsmodels.stats.api as sms
>>> es = sms.proportion_effectsize(0.5, 0.75)
>>> sms.NormalIndPower().solve_power(es, power=0.9, alpha=0.05, ratio=1)
76.652940372066908
In R stats:
> power.prop.test(p1 = .50, p2 = .75, power = .90)
     Two-sample comparison of proportions power calculation
              n = 76.7069301141077
             p1 = 0.5
             p2 = 0.75
      sig.level = 0.05
          power = 0.9
    alternative = two.sided
NOTE: n is number in *each* group
Using R's pwr package:
> library(pwr)
> h <- ES.h(0.5, 0.75)
> pwr.2p.test(h = h, power = 0.9, sig.level = 0.05)
     Difference of proportion power calculation for binomial distribution (arcsine transformation)
              h = 0.5235987755982985
              n = 76.6529406106181
      sig.level = 0.05
          power = 0.9
    alternative = two.sided
NOTE: same sample sizes
Matt's answer for getting the needed n (per group) is almost right, but there is a small error.
Given d (the difference in means), s (the standard deviation), sig (the significance level, typically .05), and power (typically .80), the formula for calculating the number of observations per group is:
n = 2*s^2 * ((z_(sig/2) + z_power)^2) / d^2
As you can see in his formula, he has
n = s * ((zp + z)**2) / (d**2)
The "s" part is wrong. A correct function that reproduces R's functionality is:
def sample_power_difftest(d, s, power=0.8, sig=0.05):
    z = norm.isf([sig/2])
    zp = -1 * norm.isf([power])
    n = (2*(s**2)) * ((zp + z)**2) / (d**2)
    return int(round(n[0]))
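As a quick check (my own example): for a difference in means of 0.5 with a standard deviation of 1, this gives the familiar answer of about 63 observations per group under the normal approximation:
n = sample_power_difftest(0.5, 1.0, power=0.8, sig=0.05)
print(n)  # 63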
Hope this helps.
You also have:
from statsmodels.stats.power import tt_ind_solve_power
where you pass None for the value you want to obtain. For instance, to obtain the number of observations for effect_size = 0.1, power = 0.8, and so on, you would call:
tt_ind_solve_power(effect_size=0.1, nobs1=None, alpha=0.05, power=0.8, ratio=1, alternative='two-sided')
and obtain 1570.7330663315456 as the number of observations required.
Or, to obtain the power attainable with the other values fixed:
tt_ind_solve_power(effect_size=0.2, nobs1=200, alpha=0.05, power=None, ratio=1, alternative='two-sided')
which gives 0.5140816347005553.