Finding the self-consistent solution to an equation - python

At the bottom of this question is a set of functions transcribed from a published neural-network model. When I call R, I get the following error:
RuntimeError: maximum recursion depth exceeded while calling a Python object
Note that within each call to R, a recursive call to R is made for every other neuron in the network, which is what causes the recursion depth to be exceeded: each return value of R depends on all the others (the network involves N = 512 values in total). Does anyone have an idea what method should be used to compute the self-consistent solution for R? Note that R itself is a smooth function. I've tried treating this as a vector root-finding problem, but the 512 dimensions are not independent, and with so many degrees of freedom the roots are never found (using the scipy.optimize functions). Does Python have any tools that can help with this, or would it be more natural to solve for R in something like Mathematica? I don't know how this is normally done.
"""Recurrent model with strong excitatory recurrence."""
import numpy as np
l = 3.14
def R(x_i):
"""Steady-state firing rate of neuron at location x_i.
Parameters
----------
x_i : number
Location of this neuron.
Returns
-------
rate : float
Firing rate.
"""
N = 512
T = 1
x = np.linspace(-2, 2, N)
sum_term = 0
for x_j in x:
sum_term += J(x_i - x_j) * R(x_j)
rate = I_S(x_i) + I_A(x_i) + 1.0 / N * sum_term - T
if rate < 0:
return 0
return rate
def I_S(x):
"""Sensory input.
Parameters
----------
x : number
Location of this neuron.
Returns
-------
float
Sensory input to neuron at x.
"""
S_0 = 0.46
S_1 = 0.66
x_S = 0
sigma_S = 1.31
return S_0 + S_1 * np.exp(-0.5 * (x - x_S) ** 2 / sigma_S ** 2)
def I_A(x):
"""Attentional additive bias.
Parameters
----------
x : number
Location of this neuron.
Returns
-------
number
Additive bias for neuron at x.
"""
x_A = 0
A_1 = 0.089
sigma_A = 0.35
A_0 = 0
sigma_A_prime = 0.87
if np.abs(x - x_A) < l:
return (A_1 * np.exp(-0.5 * (x - x_A) ** 2 / sigma_A ** 2) +
A_0 * np.exp(-0.5 * (x - x_A) ** 2 / sigma_A_prime ** 2))
return 0
def J(dx):
"""Connection strength.
Parameters
----------
dx : number
Neuron i's distance from neuron j.
Returns
-------
number
Connection strength.
"""
J_0 = -2.5
J_1 = 8.5
sigma_J = 1.31
if np.abs(dx) < l:
return J_0 + J_1 * np.exp(-0.5 * dx ** 2 / sigma_J ** 2)
return 0
if __name__ == '__main__':
pass

This recursion never ends because there is no termination condition before the recursive call; adjusting the maximum recursion depth does not help:
def R(x_i):
    ...
    for x_j in x:
        sum_term += J(x_i - x_j) * R(x_j)
Perhaps you should be doing something like:
# some suitable initial guess
state = guess
while True:  # or a fixed number of iterations
    next_state = compute_next_state(state)
    if some_condition_check(state, next_state):
        # return answer
        return state
    if some_other_check(state, next_state):
        # something wrong, terminate
        raise ...
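Applied to the model in the question, a minimal vectorized sketch of that fixed-point iteration could look like the following (it reuses J, I_S and I_A from the posted code; the zero initial guess, the iteration cap and the tolerance are arbitrary choices, and if plain iteration does not converge, damping the update or handing r - f(r) to a root solver such as scipy.optimize.root is the usual fallback):
import numpy as np

N, T = 512, 1.0
x = np.linspace(-2, 2, N)

# Precompute the external input and the connection matrix once.
I_ext = np.array([I_S(xi) + I_A(xi) for xi in x])
W = np.array([[J(xi - xj) for xj in x] for xi in x])

r = np.zeros(N)  # initial guess for the rates
for _ in range(10000):
    # Same update as R(), but for all neurons at once.
    r_new = np.maximum(0.0, I_ext + W.dot(r) / N - T)
    if np.max(np.abs(r_new - r)) < 1e-10:
        break
    r = r_new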

Change the maximum recursion depth using sys.setrecursionlimit:
import sys
sys.setrecursionlimit(10000)

def rec(i):
    if i > 1000:
        print('i is over 1000!')
        return
    rec(i + 1)

rec(0)
More info: https://docs.python.org/3/library/sys.html#sys.setrecursionlimit


Scipy's bfgs for CEC2013 benchmark functions with 30 dimensions -> precision loss for Rotated Bent Cigar & Rotated Schaffers F7

I'm using SciPy's BFGS on the CEC2013 benchmark functions (source: https://github.com/Naeemeh146/cec-benchmark-2013-python), but the Rotated Bent Cigar and Rotated Schaffers F7 functions don't work well in 30 dimensions: both terminate with a precision-loss warning. I'm new to optimization and I've googled a lot, but I still don't know how to solve this...🙏
Reproduce the problem of Rotated Bent Cigar
After importing fmin_bfgs from scipy.optimize, I used np.ones(30) as the initial point (x0). For Rotated Bent Cigar, I set gtol=1e-9 so that the result value isn't "nan".
from scipy.optimize import fmin_bfgs

fmin_bfgs(f3, np.ones(30), fprime=None, args=(), gtol=1e-9, norm=float('inf'),
          epsilon=math.sqrt(np.finfo(float).eps), maxiter=None, full_output=0,
          disp=1, retall=0, callback=None)
Output I got
RuntimeWarning: overflow encountered in double_scalars
  Y = Z[0]**2 + 1e6 * np.sum(Z[1:]**2) - 1200
RuntimeWarning: overflow encountered in square
  Y = Z[0]**2 + 1e6 * np.sum(Z[1:]**2) - 1200
RuntimeWarning: invalid value encountered in subtract
  df = fun(x) - f0
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 77457993654581510275072.000000
         Iterations: 28
         Function evaluations: 3524
         Gradient evaluations: 113
Reproduce the problem of Rotated Schaffers F7
For Rotated Schaffers F7, I used the default value of gtol=1e-5 as below:
fmin_bfgs(f7, np.ones(30), fprime=None, args=(), gtol=1e-5, norm=float('inf'),
          epsilon=math.sqrt(np.finfo(float).eps), maxiter=None, full_output=0,
          disp=1, retall=0, callback=None)
Output I got
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 2406912647990380032.000000
         Iterations: 1
         Function evaluations: 1531
         Gradient evaluations: 49
The benchmark functions code
The benchmark function code was written as below, based on this code.
def read_M(self, dim, m):
    return self.rotate_data[m * dim : (m + 1) * dim]

def shift_data(self, dim, m):
    return np.array(self.sd[m * dim : (m + 1) * dim])

def carat(self, dim, alpha):  # I don't know whether this is correct or not!!!
    return alpha ** (np.arange(dim) / (2 * (dim - 1)))

def T_asy(self, X, Y, beta):
    D = len(X)
    for i in range(D):
        if X[i] > 0:
            Y[i] = X[i] ** (1 + beta * (i / (D - 1)) * np.sqrt(X[i]))
    pass

# Rotated Bent Cigar Function
# 3
def f3(self, X):
    X_shift = X - self.O
    X_rotate = self.M1 @ X_shift
    self.T_asy(X_rotate, X_shift, 0.5)
    Z = self.M2 @ X_shift
    Y = Z[0]**2 + 1e6 * np.sum(Z[1:]**2) - 1200
    return Y

# Rotated Schaffers F7 Function
# 7
def f7(self, X):
    d = len(X)
    X_shift = X - self.O
    X_rotate = self.M1 @ X_shift
    self.T_asy(X_rotate, X_shift, 0.5)
    Z = self.M2 @ (self.carat(d, 10) * X_shift)
    Z = np.sqrt(Z[:-1]**2 + Z[1:]**2)
    Y = (np.sum(np.sqrt(Z) + np.sqrt(Z) * np.sin(50 * Z**0.2)**2) / (d - 1))**2 - 800
    return Y
Note: I really appreciate any help I get from you. Wish you the best. 🙇‍♀️
I also tried making epsilon smaller, but that didn't help either.

Generate mesh from implicit function

I want to use Python (or Python 3) to generate a volume (3D) mesh from an implicit function:
def func(x, y, z):
    q = 0.25
    mu = q/(1.+q)
    return -(1-mu)*pow(x*x + y*y + z*z, -1./2.) - mu*pow(pow(x-1, 2) + y*y + z*z, -1./2.) - 0.5*(pow(x-mu, 2) + y*y) + 1.9023266381531847
This function has a complicated isosurface, but I want to restrict the surface to lie between x = -0.615 and x = 1.4 and between y = -0.6 and y = 0.6; there are no restrictions in the z-direction, but the interesting part is between z = +/- 1.
I have tried pygalmesh, but I could not adapt their example to my function: it crashes my Python kernel without output. Is it possible to get pygalmesh to do this? If not, what would be a better way?
Just for the record, this doesn't crash without output for me:
import numpy
import pygalmesh

class GrimReaper(pygalmesh.DomainBase):
    def __init__(self):
        super().__init__()

    def eval(self, x):
        q = 0.25
        mu = q / (1.0 + q)
        x, y, z = x
        return (
            -(1 - mu) / numpy.sqrt(x ** 2 + y ** 2 + z ** 2)
            - mu / numpy.sqrt((x - 1) ** 2 + y ** 2 + z ** 2)
            - 0.5 * ((x - mu) ** 2 + y ** 2)
            + 1.9023266381531847
        )

    def get_bounding_sphere_squared_radius(self):
        return 2.0

d = GrimReaper()
mesh = pygalmesh.generate_mesh(d, cell_size=0.1)
mesh.write("out.xmf")
Inserting protection balls...
refine_balls = true
min_balls_radius = 0
min_balls_weight = 0
insert_corners() done. Nb of points in triangulation: 0
insert_balls_on_edges() done. Nb of points in triangulation: 0
refine_balls() done. Nb of points in triangulation: 0
construct initial points (nb_points: 12)
s.py:17: RuntimeWarning: divide by zero encountered in double_scalars
+ 1.9023266381531847
12/12 initial point(s) found...
Start surface scan...Scanning triangulation for bad facets (sequential) - number of finite facets = 50...
Number of bad facets: 0
scanning edges (lazy)
scanning vertices (lazy)
end scan. [Bad facets:0]
Refining Surface...
Legend of the following line: (#vertices,#steps,#facets to refine,#tets to refine)
(12,0,0,0)
Total refining surface time: 1.90735e-05s
Start volume scan...Scanning triangulation for bad cells (sequential)... 20 cells scanned, done.
Number of bad cells: 1
end scan. [Bad tets:1]
Refining...
Legend of the following line: (#vertices,#steps,#facets to refine,#tets to refine)
(23,11,0,70) (18839.3 vertices/s)Segmentation fault (core dumped)
It works fine without the term - 0.5 * ((x - mu) ** 2 + y ** 2).
The segfault points towards an issue in CGAL. Perhaps it's useful to file a bug there.
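If the goal is also to restrict the surface to the box from the question (x in [-0.615, 1.4], y in [-0.6, 0.6], and, since the interesting part lies between z = +/- 1, |z| <= 1), one workaround that stays within the DomainBase API shown above is to have eval report "outside" whenever the point leaves that box. This is only a sketch: the sign convention (negative means inside) and the enlarged bounding-sphere radius are assumptions, and a hard cut-off like this can leave a ragged boundary where the isosurface meets the box.
import numpy
import pygalmesh

class ClippedDomain(pygalmesh.DomainBase):
    def __init__(self):
        super().__init__()

    def eval(self, x):
        q = 0.25
        mu = q / (1.0 + q)
        x, y, z = x
        # Outside the requested box, return a positive value so the point
        # is treated as lying outside the domain (assumed sign convention).
        if not (-0.615 <= x <= 1.4 and -0.6 <= y <= 0.6 and -1.0 <= z <= 1.0):
            return 1.0
        return (
            -(1 - mu) / numpy.sqrt(x ** 2 + y ** 2 + z ** 2)
            - mu / numpy.sqrt((x - 1) ** 2 + y ** 2 + z ** 2)
            - 0.5 * ((x - mu) ** 2 + y ** 2)
            + 1.9023266381531847
        )

    def get_bounding_sphere_squared_radius(self):
        # large enough to contain the clipping box
        return 5.0

mesh = pygalmesh.generate_mesh(ClippedDomain(), cell_size=0.1)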

Problems Solving XOR with Genetic Algorithm

I'm trying to solve the XOR problem using a neural network. For training I'm using a genetic algorithm.
population size : 200
max_generations: 10000
crossover rate : 0.8
mutation rate : 0.1
number of weights : 9
activation function : sigmoid
selection method : higher selection probability for the individuals with the best fitness
Code:
def crossover(self, wfather, wmother):
    r = np.random.random()
    if r <= self.crossover_perc:
        new_weight = self.crossover_perc*wfather + (1-self.crossover_perc)*wmother
        new_weight2 = self.crossover_perc*wmother + (1-self.crossover_perc)*wfather
        return new_weight, new_weight2
    else:
        return wfather, wmother

def select(self, fits):
    percentuais = np.array(fits) / float(sum(fits))
    vet = [percentuais[0]]
    for p in percentuais[1:]:
        vet.append(vet[-1] + p)
    r = np.random.random()
    #print(len(vet), r)
    for i in range(len(vet)):
        if r <= vet[i]:
            return i

def mutate(self, weight):
    r = np.random.random()
    if r <= self.mut_perc:
        mutr = np.random.randint(self.number_weights)
        weight[mutr] = weight[mutr] + np.random.normal()
    return weight

def activation_fuction(self, net):
    return 1 / (1 + math.exp(-net))
Problem:
About 5 out of 10 runs work fine.
Expected Output:
0,0 0
0,1 1
1,0 1
1,1 0
Tests:
It's inconsistent; sometimes I get four 0's, sometimes three 1's, or other mixed results.
Could you help me find the error?
Edit:
All Code:
def create_initial_population(self):
    population = np.random.uniform(-40, 40, [self.population_size, self.number_weights])
    return population

def feedforward(self, inp1, inp2, weights):
    bias = 1
    x = self.activation_fuction(bias * weights[0] + (inp1 * weights[1]) + (inp2 * weights[2]))
    x2 = self.activation_fuction(bias * weights[3] + (inp1 * weights[4]) + (inp2 * weights[5]))
    out = self.activation_fuction(bias * weights[6] + (x * weights[7]) + (x2 * weights[8]))
    print(inp1, inp2, out)
    return out

def fitness(self, weights):
    y1 = abs(0.0 - self.feedforward(0.0, 0.0, weights))
    y2 = abs(1.0 - self.feedforward(0.0, 1.0, weights))
    y3 = abs(1.0 - self.feedforward(1.0, 0.0, weights))
    y4 = abs(0.0 - self.feedforward(1.0, 1.0, weights))
    error = (y1 + y2 + y3 + y4) ** 2
    # print("Error: ", 1/error)
    return 1 / error

def sortpopbest(self, pop):
    pop_with_fit = [(weights, self.fitness(weights)) for weights in pop]
    sorted_population = sorted(pop_with_fit, key=lambda weights_fit: weights_fit[1])  # Worst -> Best
    fits = []
    pop = []
    for i in sorted_population:
        pop.append(i[0])
        fits.append(i[1])
    return pop, fits

def execute(self):
    pop = self.create_initial_population()
    for g in range(self.max_generations):  # maximum number of generations
        pop, fits = self.sortpopbest(pop)
        nova_pop = []
        for c in range(int(self.population_size/2)):
            weights = pop[self.select(fits)]
            weights2 = pop[self.select(fits)]
            new_weights, new_weights2 = self.crossover(weights, weights2)
            new_weights = self.mutate(new_weights)
            new_weights2 = self.mutate(new_weights2)
            #print(fits)
            nova_pop.append(new_weights)   # add to the new population
            nova_pop.append(new_weights2)
        pop = nova_pop
    print(len(fits), fits)
Some input:
XOR is a simple problem. With a few hundred random initializations, you should have some lucky ones that solve it immediately (if "solved" means that their output is correct after applying a threshold). This is a good test to see whether your initialization and feed-forward pass are correct, without debugging the whole GA all at once. Or you could just hand-craft the correct weights and biases and see if that works (see the sketch below).
Your initial weights (uniform -40...+40) are way too large. I guess for XOR this may be okay-ish. But initial weights should be such that most neurons don't saturate, yet aren't fully in the linear zone of the sigmoid either.
After your implementation works, have a look at this numpy implementation of the feed-forward pass of a neural network for how to do it with less code.
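For the hand-crafted sanity check suggested above, here is a minimal standalone sketch that reuses the same 9-weight layout as feedforward (bias, inp1, inp2 for each hidden unit, then bias, hidden1, hidden2 for the output); the particular weight values are just one choice that makes the hidden units act like OR and NAND and the output like AND:
import math

def sigmoid(net):
    return 1 / (1 + math.exp(-net))

def feedforward(inp1, inp2, w):
    bias = 1
    h1 = sigmoid(bias * w[0] + inp1 * w[1] + inp2 * w[2])   # ~ OR
    h2 = sigmoid(bias * w[3] + inp1 * w[4] + inp2 * w[5])   # ~ NAND
    return sigmoid(bias * w[6] + h1 * w[7] + h2 * w[8])     # ~ AND(OR, NAND) = XOR

weights = [-10, 20, 20, 30, -20, -20, -30, 20, 20]
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(feedforward(a, b, weights)))   # prints 0, 1, 1, 0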

How to simulate a reaction with an order < 1 in pyomo?

I am simulating a chemical reaction of the form A --> B --> C using a chemical batch reactor model. The corresponding ODE system is as follows:
dcA/dt = - kA * cA(t) ** nA1
dcB/dt = kA * cA(t) ** nA1 - kB * cB(t) ** nB2
dcC/dt = kB * cB(t) ** nB2
Pyomo solves the ODE system fine if the exponents nA1 and nB2 are 1 or higher. But in my case they are below 1, and as the component concentrations approach zero the ODE integration fails, returning only NaNs. The reason is that once the concentrations approach zero they numerically become values of, for example, cA(t) = -10e-20, and then the expression cA(t)**nA1 can no longer be evaluated.
I tried to implement a workaround of the form:
if cA < 0:
    R1 = 0
else:
    R1 = kA * cA(t) ** nA1
but I wasn't able to do it properly, as I had a hard time with the Pyomo syntax.
This is the minimal working example:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from pyomo.environ import *
from pyomo.dae import *

V = 40      # l
kA = 0.5    # 1/min
kB = 0.1    # 1/min
nA1 = 0.5
nB2 = 0.5
cAf = 2.0   # mol/l

def batch_plot(t, y):
    plt.plot(t, y[:, 0], label = "cA")
    plt.plot(t, y[:, 1], label = "cB")
    plt.plot(t, y[:, 2], label = "cC")
    plt.legend()

def batch():
    m = ConcreteModel()

    m.t = ContinuousSet(bounds = (0, 500))

    m.cA = Var(m.t, domain = NonNegativeReals)
    m.cB = Var(m.t, domain = NonNegativeReals)
    m.cC = Var(m.t, domain = NonNegativeReals)

    m.dcA = DerivativeVar(m.cA, wrt = m.t)
    m.dcB = DerivativeVar(m.cB, wrt = m.t)
    m.dcC = DerivativeVar(m.cC, wrt = m.t)

    m.cA[0] = cAf
    m.cB[0] = 0
    m.cC[0] = 0

    R1 = lambda m, t: kA * m.cA[t] ** nA1
    R2 = lambda m, t: kB * m.cB[t] ** nB2

    m.odeA = Constraint(m.t, rule = lambda m, t: m.dcA[t] == - R1(m, t) )
    m.odeB = Constraint(m.t,
                        rule = lambda m, t: m.dcB[t] == R1(m, t) - R2(m, t) )
    m.odeC = Constraint(m.t,
                        rule = lambda m, t: m.dcC[t] == R2(m, t) )
    return m

tsim, profiles = Simulator(batch(), package = "scipy").simulate(numpoints = 100)
batch_plot(tsim, profiles)
I expect the ode integration to work even with reaction orders below 1.
Does anybody have an idea on how to achieve this?
There are two aims in modifying the power function x^n:
extend to negative x in a smooth way so that the numerical method does not hiccup close to x=0 and
have a small slope for small x so that the numerical integration for very small x has a greater chance to be stable.
The first condition is satisfied by constructs like
x*max(eps,abs(x))^(n-1),
x*(eps+abs(x-eps))^(n-1), or
x*(eps^2+abs(x-eps)^2)^(0.5*(n-1)),
which all have exactly the value x^n for x > eps and are continuous and piecewise smooth. But the slope at x=0 is of size eps^(n-1), which will require very small step sizes even after the system stabilizes.
The solution is to extract even more integer power from the rational power in the form of
x*abs(x) * max(eps,abs(x))^(n-2)
or one of the other variants for the last factor. For 0<x<eps and n=0.5 this results in the value r(x)=x^2 * eps^(-1.5), so that the equation x'=-k*r(x) has the solution x(t)=x1/(1+x1*k*eps^(-1.5)*(t-t1)) after the solution has fallen to a value 0<x1<eps at t=t1. The slope of r is small near x=0 (it vanishes at x=0), which is nice for numerical integrators.
This was implemented for scipy.integrate.solve_ivp, using method LSODA and rather strict tolerances, with the ODE right side function
# your original function, stabilizes at negative values
power0 = lambda x,n: max(0,x) ** n;
# linear at x=0, small step sizes
def power1(x,n): eps=1e-4; return x * max(eps, abs(x)) ** (n-1);
def power2(x,n): eps=1e-4; return x * (eps**2+(x-eps)**2) ** (0.5*(n-1))
# quadratic at x=0, large step sizes on the tail
eps = 1e-8
power3 = lambda x,n: x * abs(x) * max(eps,abs(x)) ** (n-2)
power4 = lambda x,n: x * abs(x) * (eps**2+(x-eps)**2) ** (0.5*n-1)
# select the power approximation used
power = power3
def model(t, u):
    cA, cB, cC = u
    R1 = kA * power(cA, nA1)
    R2 = kB * power(cB, nB2)
    return [-R1, R1 - R2, R2]
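For completeness, a sketch of the driver this refers to; it assumes the constants and power definitions above (kA, kB, nA1, nB2, cAf come from the question's MWE), and the tolerances are just one example of the "rather strict" settings mentioned:
from scipy.integrate import solve_ivp
import numpy as np

kA, kB, nA1, nB2, cAf = 0.5, 0.1, 0.5, 0.5, 2.0

sol = solve_ivp(model, (0, 500), [cAf, 0.0, 0.0], method="LSODA",
                rtol=1e-10, atol=1e-12, dense_output=True)

# evaluate the dense solution on a regular grid for plotting
t = np.linspace(0, 500, 200)
cA, cB, cC = sol.sol(t)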
The integration runs successfully, using step sizes of 20-30 towards the tail end. The resulting plot looks qualitatively correct, and a zoom on small values shows that the solution is smooth and remains positive.

Is there any Python function/library for calculating binomial confidence intervals?

I need to calculate binomial confidence intervals for a large set of data within a Python script. Do you know any Python function or library that can do this?
Ideally I would like to have a function like the one at http://statpages.org/confint.html implemented in Python.
Thanks for your time.
Just noting because it hasn't been posted elsewhere here that statsmodels.stats.proportion.proportion_confint lets you get a binomial confidence interval with a variety of methods. It only does symmetric intervals, though.
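For example (the 'beta' method is statsmodels' name for the Clopper-Pearson exact interval):
from statsmodels.stats.proportion import proportion_confint

# 13 successes out of 100 trials, 95% interval
low, high = proportion_confint(13, 100, alpha=0.05, method='beta')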
I would say that R (or another stats package) would probably serve you better if you have the option. That said, if you only need the binomial confidence interval you probably don't need an entire library. Here's the function in my most naive translation from javascript.
def binP(N, p, x1, x2):
    p = float(p)
    q = p/(1-p)
    k = 0.0
    v = 1.0
    s = 0.0
    tot = 0.0
    while(k<=N):
        tot += v
        if(k >= x1 and k <= x2):
            s += v
        if(tot > 10**30):
            s = s/10**30
            tot = tot/10**30
            v = v/10**30
        k += 1
        v = v*q*(N+1-k)/k
    return s/tot

def calcBin(vx, vN, vCL = 95):
    '''
    Calculate the exact confidence interval for a binomial proportion

    Usage:
    >>> calcBin(13,100)
    (0.07107391357421874, 0.21204372406005856)
    >>> calcBin(4,7)
    (0.18405151367187494, 0.9010086059570312)
    '''
    vx = float(vx)
    vN = float(vN)
    # Set the confidence bounds
    vTU = (100 - float(vCL))/2
    vTL = vTU

    vP = vx/vN
    if(vx==0):
        dl = 0.0
    else:
        v = vP/2
        vsL = 0
        vsH = vP
        p = vTL/100
        while((vsH-vsL) > 10**-5):
            if(binP(vN, v, vx, vN) > p):
                vsH = v
                v = (vsL+v)/2
            else:
                vsL = v
                v = (v+vsH)/2
        dl = v

    if(vx==vN):
        ul = 1.0
    else:
        v = (1+vP)/2
        vsL = vP
        vsH = 1
        p = vTU/100
        while((vsH-vsL) > 10**-5):
            if(binP(vN, v, 0, vx) < p):
                vsH = v
                v = (vsL+v)/2
            else:
                vsL = v
                v = (v+vsH)/2
        ul = v
    return (dl, ul)
While the scipy.stats module has a method .interval() to compute the equal-tails confidence interval, it lacks a similar method to compute the highest density interval. Here is a rough way to do it using methods found in scipy and numpy.
This solution also assumes you want to use a Beta distribution as a prior. The hyper-parameters a and b are set to 1, so that the default prior is a uniform distribution between 0 and 1.
import numpy
from scipy.stats import beta
from scipy.stats import norm

def binomial_hpdr(n, N, pct, a=1, b=1, n_pbins=1e3):
    """
    Function computes the posterior mode along with the upper and lower bounds of the
    **Highest Posterior Density Region**.

    Parameters
    ----------
    n: number of successes
    N: sample size
    pct: the size of the confidence interval (between 0 and 1)
    a: the alpha hyper-parameter for the Beta distribution used as a prior (Default=1)
    b: the beta hyper-parameter for the Beta distribution used as a prior (Default=1)
    n_pbins: the number of bins to segment the p_range into (Default=1e3)

    Returns
    -------
    A tuple that contains the mode as well as the lower and upper bounds of the interval
    (mode, lower, upper)
    """
    # fixed random variable object for posterior Beta distribution
    rv = beta(n+a, N-n+b)
    # determine the mode and standard deviation of the posterior
    stdev = rv.stats('v')**0.5
    mode = (n+a-1.)/(N+a+b-2.)
    # compute the number of sigma that corresponds to this confidence
    # this is used to set the rough range of possible success probabilities
    n_sigma = numpy.ceil(norm.ppf((1+pct)/2.))+1
    # set the min and max values for success probability
    max_p = mode + n_sigma * stdev
    if max_p > 1:
        max_p = 1.
    min_p = mode - n_sigma * stdev
    if min_p < 0:
        min_p = 0.
    # make the range of success probabilities (linspace needs an integer count)
    p_range = numpy.linspace(min_p, max_p, int(n_pbins)+1)
    # construct the probability mass function over the given range
    if mode > 0.5:
        sf = rv.sf(p_range)
        pmf = sf[:-1] - sf[1:]
    else:
        cdf = rv.cdf(p_range)
        pmf = cdf[1:] - cdf[:-1]
    # find the upper and lower bounds of the interval
    sorted_idxs = numpy.argsort(pmf)[::-1]
    cumsum = numpy.cumsum(numpy.sort(pmf)[::-1])
    j = numpy.argmin(numpy.abs(cumsum - pct))
    upper = p_range[(sorted_idxs[:j+1]).max()+1]
    lower = p_range[(sorted_idxs[:j+1]).min()]
    return (mode, lower, upper)
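Usage is just the success count, the sample size and the interval mass, e.g.:
# posterior mode plus the 95% highest-density interval for 13 successes in 100 trials
mode, lower, upper = binomial_hpdr(13, 100, 0.95)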
Just been trying this myself. If it helps, here's my solution, which takes two lines of code and seems to give equivalent results to that JS page. This is the frequentist one-sided interval; I'm calling the input argument the MLE (maximum likelihood estimate) of the binomial parameter theta, i.e. mle = number of successes / number of trials. I find the upper bound of the one-sided interval. The alpha value used here is therefore double the one in the JS page for the upper limit.
from scipy.stats import binom
from scipy.optimize import bisect

def binomial_ci(mle, N, alpha=0.05):
    """
    One sided confidence interval for a binomial test.

    If after N trials we obtain mle as the proportion of those
    trials that resulted in success, find c such that

    P(k/N < mle; theta = c) = alpha

    where k/N is the proportion of successes in the set of trials,
    and theta is the success probability for each trial.
    """
    to_minimise = lambda c: binom.cdf(mle*N, N, c) - alpha
    return bisect(to_minimise, 0, 1)
To find the two sided interval, call with (1-alpha/2) and alpha/2 as arguments.
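That is, for a central 95% interval, something like:
mle = 13 / 100.0   # e.g. 13 successes in 100 trials
lower = binomial_ci(mle, 100, alpha=1 - 0.05 / 2)
upper = binomial_ci(mle, 100, alpha=0.05 / 2)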
The following gives the exact (Clopper-Pearson) interval for the binomial distribution in a simple way.
def binomial_ci(x, n, alpha=0.05):
    # x is number of successes, n is number of trials
    from scipy import stats
    if x == 0:
        c1 = 0
    else:
        c1 = stats.beta.interval(1-alpha, x, n-x+1)[0]
    if x == n:
        c2 = 1
    else:
        c2 = stats.beta.interval(1-alpha, x+1, n-x)[1]
    return c1, c2
You may check the code by e.g.:
p1,p2 = binomial_ci(2,7)
from scipy import stats
assert abs(stats.binom.cdf(1,7,p1)-.975)<1E-5
assert abs(stats.binom.cdf(2,7,p2)-.025)<1E-5
assert abs(binomial_ci(0,7, alpha=.1)[0])<1E-5
assert abs((1-binomial_ci(0,7, alpha=.1)[1])**7-0.05)<1E-5
assert abs(binomial_ci(7,7, alpha=.1)[1]-1)<1E-5
assert abs((binomial_ci(7,7, alpha=.1)[0])**7-0.05)<1E-5
I used the relation between the binomial proportion confidence interval and the regularized incomplete beta function, as described here:
https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Clopper%E2%80%93Pearson_interval
I needed to do this as well. I was using R and wanted to learn a way to work it out for myself. I would not say it is strictly pythonic.
The docstring explains most of it. It assumes you have scipy installed.
def exact_CI(x, N, alpha=0.95):
    """
    Calculate the exact confidence interval of a proportion
    where there is a wide range in the sample size or the proportion.

    This method avoids the assumption that data are normally distributed. The sample size
    and proportion are described by a beta distribution.

    Parameters
    ----------
    x: the number of cases from which the proportion is calculated as a positive integer.
    N: the sample size as a positive integer.
    alpha : set at 0.95 for 95% confidence intervals.

    Returns
    -------
    The proportion with the lower and upper confidence intervals as a dict.
    """
    from scipy.stats import beta
    x = float(x)
    N = float(N)
    p = round((x/N)*100, 2)

    intervals = [round(i, 4)*100 for i in beta.interval(alpha, x, N-x+1)]
    intervals.insert(0, p)

    result = {'Proportion': intervals[0], 'Lower CI': intervals[1], 'Upper CI': intervals[2]}
    return result
A numpy/scipy-free way of computing the same thing, using the Wilson score and an approximation to the inverse of the normal cumulative distribution function:
import math

def binconf(p, n, c=0.95):
    '''
    Calculate binomial confidence interval based on the number of positive and
    negative events observed.

    Parameters
    ----------
    p: int
        number of positive events observed
    n: int
        number of negative events observed
    c : optional, [0,1]
        confidence percentage. e.g. 0.95 means 95% confident the probability of
        success lies between the 2 returned values

    Returns
    -------
    theta_low : float
        lower bound on confidence interval
    theta_high : float
        upper bound on confidence interval
    '''
    p, n = float(p), float(n)
    N = p + n
    if N == 0.0: return (0.0, 1.0)
    p = p / N
    z = normcdfi(1 - 0.5 * (1-c))

    a1 = 1.0 / (1.0 + z * z / N)
    a2 = p + z * z / (2 * N)
    a3 = z * math.sqrt(p * (1-p) / N + z * z / (4 * N * N))

    return (a1 * (a2 - a3), a1 * (a2 + a3))

def erfi(x):
    """Approximation to inverse error function"""
    a = 0.147  # MAGIC!!!
    a1 = math.log(1 - x * x)
    a2 = (
        2.0 / (math.pi * a)
        + a1 / 2.0
    )
    return (
        sign(x) *
        math.sqrt(math.sqrt(a2 * a2 - a1 / a) - a2)
    )

def sign(x):
    if x < 0: return -1
    if x == 0: return 0
    if x > 0: return 1

def normcdfi(p, mu=0.0, sigma2=1.0):
    """Inverse CDF of normal distribution"""
    if mu == 0.0 and sigma2 == 1.0:
        return math.sqrt(2) * erfi(2 * p - 1)
    else:
        return mu + math.sqrt(sigma2) * normcdfi(p)
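Note that binconf takes positive and negative counts rather than successes and trials, so the 13-out-of-100 example used earlier becomes:
low, high = binconf(13, 87)   # 13 positives, 87 negatives, 95% Wilson interval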
Astropy provides such a function (although installing and importing astropy may be a bit excessive):
astropy.stats.binom_conf_interval
I am not an expert on statistics, but binomtest is built into SciPy and produces the same results as the accepted answer:
from scipy.stats import binomtest
binomtest(13, 100).proportion_ci()
Out[11]: ConfidenceInterval(low=0.07107304618545972, high=0.21204067708744978)
binomtest(4, 7).proportion_ci()
Out[25]: ConfidenceInterval(low=0.18405156764007, high=0.9010117215575631)
It uses the Clopper-Pearson exact method by default, which matches Curt's accepted answer; for comparison, that answer gives these values:
Usage:
>>> calcBin(13,100)
(0.07107391357421874, 0.21204372406005856)
>>> calcBin(4,7)
(0.18405151367187494, 0.9010086059570312)
It also has options for Wilson's method, with or without continuity correction, which matches TheBamf's astropy answer:
binomtest(4, 7).proportion_ci(method='wilson')
Out[32]: ConfidenceInterval(low=0.2504583645276572, high=0.8417801447485302)
binom_conf_interval(4, 7, 0.95, interval='wilson')
Out[33]: array([0.25045836, 0.84178014])
This also matches R's binom.test and statsmodels.stats.proportion.proportion_confint, according to cxrodgers' comment:
For 30 successes in 60 trials, both R's binom.test and statsmodels.stats.proportion.proportion_confint give (.37, .63) using Clopper-Pearson.
binomtest(30, 60).proportion_ci(method='exact')
Out[34]: ConfidenceInterval(low=0.3680620319424367, high=0.6319379680575633)
