Python nodal/piecewise-linear power-law generator

I need efficient python code that returns (not fits) a piecewise-linear (or, actually, piecewise power-law) continuous function for an arbitrary number of nodes (/poles/control points) defined by their positions plus (preferably) slopes rather than amplitudes. For example, for three pieces (four nodes) I have:
def powerLawFuncTriple(x,C,alpha,beta,gamma,xmin,xmax,x0,x1):
    """
    Extension of the two-power-law version described at
    http://en.wikipedia.org/wiki/Power_law#Broken_power_law
    """
    if x <= xmin or x > xmax:
        return 0.0
    elif xmin < x <= x0:
        n = C * (x ** alpha)
    elif x0 < x <= x1:
        n = C * x0**(alpha-beta) * (x ** beta)
    elif x1 < x <= xmax:
        n = C * x0**(alpha-beta) * x1**(beta-gamma) * (x ** gamma)
    return n
Do any helpful functions already exist? If not, what is an efficient way to write code that generates these functions? Maybe this amounts to evaluating, rather than fitting, one of the SciPy built-ins.
Somewhat related: Fitting piecewise function in Python
One possible answer may be:
def piecewise(x,n0,posns=[1.0,2.0,3.0],alpha=[-2.0,-1.5]):
    if x <= posns[0] or x > posns[-1]: return 0.0
    n=n0*x**alpha[0]
    np=len(posns)
    for ip in range(np):
        if posns[ip] < x <= posns[ip+1]: return n
        n *= (posns[ip+1]/float(x))**(alpha[ip]-alpha[ip+1])
    return n
But this must get slower as x increases. Would list comprehension or anything else speed up the loop?
Thanks!
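On the speed question, one option is to precompute the prefactor of every piece once and then look up the right piece for a whole array of x values at the same time. The following is only a minimal sketch under assumptions: the name piecewise_powerlaw and its argument layout are hypothetical (not from any library), it assumes NumPy is available, and it follows the same interval convention as the loop above (piece k covers posns[k] < x <= posns[k+1], zero outside).

```python
import numpy as np

def piecewise_powerlaw(x, n0, posns, alphas):
    """Continuous piecewise power law on the nodes `posns` (length K+1)
    with one slope per piece in `alphas` (length K); zero outside the nodes."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    posns = np.asarray(posns, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    # Prefactor of each piece, chosen so neighbouring pieces agree at the
    # interior nodes: coef[k] = coef[k-1] * posns[k] ** (alphas[k-1] - alphas[k])
    factors = posns[1:-1] ** (alphas[:-1] - alphas[1:])
    coefs = n0 * np.concatenate(([1.0], np.cumprod(factors)))
    # Only evaluate points strictly above the first node and up to the last one
    inside = (x > posns[0]) & (x <= posns[-1])
    # searchsorted(side="left") returns i with posns[i-1] < x <= posns[i]
    idx = np.searchsorted(posns, x[inside], side="left") - 1
    out = np.zeros_like(x)
    out[inside] = coefs[idx] * x[inside] ** alphas[idx]
    return out

vals = piecewise_powerlaw([0.5, 1.5, 2.5, 3.5, 5.0], 2.0,
                          [1.0, 2.0, 3.0, 4.0], [-2.0, -1.5, -1.0])
print(vals)
```

The lookup is a binary search, so the cost per point grows with the logarithm of the number of nodes rather than linearly, and the prefactors are computed once rather than on every call to the loop body.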

In the end I've gone with the excellent CosmoloPy module:
http://pydoc.net/Python/CosmoloPy/0.1.103/cosmolopy.utils
class PiecewisePowerlaw()
"""
You can specify the intervals and power indices, and this class
will figure out the coefficients needed to make the function
continuous and normalized to unit integral.
"""
Seems to work swimmingly.


How to fit a piecewise (alternating linear and constant segments) function to a parabolic function?

I have a function, for example f(x) = k·x^(1/a) (the input_func in the code below), but this can be something else as well, like a quadratic or logarithmic function. I am only interested in a restricted domain of x (1 to 50000 in the code below). The parameters of the function (a and k in this case) are known as well.
My goal is to fit a continuous piece-wise function to this, which contains alternating segments of linear functions (i.e. sloped straight segments, each with intercept of 0) and constants (i.e. horizontal segments joining the sloped segments together). The first and last segments are both sloped. And the number of segments should be pre-selected between around 9-29 (that is 5-15 linear steps + 4-14 constant plateaus).
Formally
The input function:
The fitted piecewise function:
I am looking for the optimal parameters (c, r, b), in the least-squares sense, when the number of segments (n) is specified beforehand.
The resulting constants (c) and breakpoints (r) should be natural numbers, and the slopes (b) should be rounded to two decimal places.
I have tried to do the fitting numerically using the pwlf package with a piecewise-constant model, and then post-processed the resulting constant model with some graphical intuition to "slice" the constant steps with the slopes. It works to some extent, but I am sure it is suboptimal in both fit quality and computational efficiency: it takes several minutes to generate a fit with 8 slopes on the range 1-50000. I am sure there must be a better way to do this.
My idea is that, instead of using only numerical methods/ML, the known algebraic form of the input function could be exploited, for example through algebraic transforms (integrals), to arrive at a simpler optimization problem.
import numpy as np
import matplotlib.pyplot as plt
import pwlf
# The input function
def input_func(x,k,a):
    return np.power(x,1/a)*k
x = np.arange(1,5e4)
y = input_func(x, 1.8, 1.3)
plt.plot(x,y);
def pw_fit(func, x_r, no_seg, *fparams):
    # working on the specified range
    x = np.arange(1,x_r)
    y_input = func(x, *fparams)
    my_pwlf = pwlf.PiecewiseLinFit(x, y_input, degree=0)
    res = my_pwlf.fit(no_seg)
    yHat = my_pwlf.predict(x)
    # Function values at the breakpoints
    y_isec = func(res, *fparams)
    # Slope values at the breakpoints
    slopes = np.round(y_isec / res, decimals=2)
    slopes = slopes[1:]
    # For the first slope value, I use the intersection of the first constant plateau and the input function
    slopes = np.insert(slopes,0,np.round(y_input[np.argwhere(np.diff(np.sign(y_input - yHat))).flatten()[0]] / np.argwhere(np.diff(np.sign(y_input - yHat))).flatten()[0], decimals=2))
    plateaus = np.unique(np.round(yHat))
    # If due to rounding slope values (to two decimals), there is no change in a subsequent step, I just remove those segments
    to_del = np.argwhere(np.diff(slopes) == 0).flatten()
    slopes = np.delete(slopes,to_del + 1)
    plateaus = np.delete(plateaus,to_del)
    breakpoints = [np.ceil(plateaus[0]/slopes[0])]
    for idx, j in enumerate(slopes[1:-1]):
        breakpoints.append(np.floor(plateaus[idx]/j))
        breakpoints.append(np.ceil(plateaus[idx+1]/j))
    breakpoints.append(np.floor(plateaus[-1]/slopes[-1]))
    return slopes, plateaus, breakpoints
slo, plat, breaks = pw_fit(input_func, 50000, 8, 1.8, 1.3)
# The piecewise function itself
def pw_calc(x, slopes, plateaus, breaks):
    x = x.astype('float')
    cond_list = [x < breaks[0]]
    for idx, j in enumerate(breaks[:-1]):
        cond_list.append((j <= x) & (x < breaks[idx+1]))
    cond_list.append(breaks[-1] <= x)
    func_list = [lambda x: x * slopes[0]]
    for idx, j in enumerate(slopes[1:]):
        func_list.append(plateaus[idx])
        func_list.append(lambda x, j=j: x * j)
    return np.piecewise(x, cond_list, func_list)
y_output = pw_calc(x, slo, plat, breaks)
plt.plot(x,y,y_output);
(Not important, but I think the fitted piecewise function is not continuous as it is. Intervals should be x<=r1; r1<x<=r2; ....)
As Anatolyg has pointed out, it looks to me that in the optimal solution (for the function posted at least, and probably for any function whose derivative is nonzero), the horizontal segments will collapse to a point or to the minimum segment length (in this case 1).
EDIT---------------------------------------------
The behavior above is only valid if the sloped segments are allowed to have an intercept. If the intercepts are zero, as posted in the question, one thing must be taken into account: is the initial parabolic function defined at zero or nearby? Imagine the function y = 0.001 * sqrt(x - 1000): the segments defined as b*x will then have a slope close to zero and will be so similar to the constant segments that the best fit will be just the single line without intercept that best fits the whole function.
Provided that the function is defined at zero or nearby, you can start by approximating the curve with linear segments only (with intercepts):
1. Divide the function domain into N intervals (equal intervals, or intervals whose size is a function of the average curvature, i.e. second derivative, of the function along the domain).
2. Do a linear fit/regression in each interval.
3. For each interval, if a point (or bunch of points) at the edge of the interval is better fitted by the line of the neighboring interval than by the line of its own interval, assign it to the neighboring interval.
4. Repeat from step 2 until no edge points are moved.
The linear regressions can be optimized so that the covariance matrices are not recomputed from scratch on each iteration; instead, only the contributions of the moved points are added to the previous covariance matrices.
Then each linear segment (LSi) is replaced by a combination of a small constant segment at the beginning (Cbi), a linear segment without intercept (Si), and another constant segment at the end (Cei). These segments are easy to calculate, as Si will contain the midpoint of LSi, and Cbi and Cei will take the begin and end values of LSi, respectively. The interval of each segment then has to be calculated as an intersection between lines.
With this, the constant end segment will be collinear with the constant begin segment of the next interval, so they will merge, resulting in a series of interleaved constant and linear segments.
But this is only a floating-point starting solution. Next, you have to apply all the roundings, which will disturb the segments quite a lot, because the conditions of integer intervals and linear segments without intercept can conflict strongly. In fact, b, c and r are not totally independent: if ci and ri+1 are known, then bi+1 is already fixed.
If nothing is broken so far, the final task is to minimize the error/cost function (I assume it will be the integral of the error between the parabolic function and the segments). My guess is that gradients will be quite a pain here: if you change, for example, one ci, all the remaining bj and cj have to adapt as well because of the integer-interval restriction. However, if you can generalize the derivatives between parameters (how much do I have to adapt bi+1 if ci changes by one unit?), you can propagate the change of one parameter to all the other parameters and obtain a kind of gradient. Then, for each interval, you can estimate the ideal parameter and, averaging over all intervals, calculate the best gradient step. Let me illustrate this:
Assuming first that r parameters are fixed, if I change c1 by one unit, b2 changes by 0.1, c2 changes by -0.2 and b3 changes by 0.2. This would be the gradient.
Then I estimate, comparing with the parabolic curve, that c1 should increase 0.5 (to reduce the cost by 10 points), b2 should increase 0.2 (to reduce the cost by 5 points), c2 should increase 0.2 (to reduce the cost by 6 points) and b3 should increase 0.1 (to reduce the cost by 9 points).
Finally, the gradient step would be (0.5/1·10 + 0.2/0.1·5 + 0.2/(-0.2)·6 + 0.1/0.2·9)/(10 + 5 + 6 + 9) = 13.5/30 = 0.45. Thus, c1 would increase by 0.45 units, b2 would increase by 0.45·0.1, and so on.
When you add the r parameters to the pot, the calculation is no longer straightforward, because integer intervals do not have a proper derivative. However, you can treat the r parameters as floating-point values, calculate and apply the gradient step, and then apply the roundings.
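The weighted average in the illustration can be checked with a few lines of Python. This is just a sketch of that one arithmetic step, using the example numbers from the text; the variable names (sensitivity, desired, gain) are made up for illustration.

```python
# How much each parameter moves when c1 changes by one unit: c1, b2, c2, b3
sensitivity = [1.0, 0.1, -0.2, 0.2]
# Change each parameter would ideally make, judged against the curve
desired = [0.5, 0.2, 0.2, 0.1]
# Cost reduction expected from each of those ideal changes (used as weights)
gain = [10, 5, 6, 9]

# Each parameter "votes" for a step in c1 (desired / sensitivity),
# and the votes are averaged with the cost reductions as weights.
votes = [d / s for d, s in zip(desired, sensitivity)]
step = sum(v * g for v, g in zip(votes, gain)) / sum(gain)

print(step)                                  # step applied to c1
print([step * s for s in sensitivity])       # resulting change of c1, b2, c2, b3
```

Note that c2 votes for a negative step, because it should increase while its sensitivity to c1 is negative.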
We can integrate the squared error function for linear and constant pieces and let SciPy optimize it. Python 3:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize
xl = 1
xh = 50000
a = 1.3
p = 1 / a
n = 8
def split_b_and_c(bc):
    return bc[::2], bc[1::2]
def solve_for_r(b, c):
    r = np.empty(2 * n)
    r[0] = xl
    r[1:-1:2] = c / b[:-1]
    r[2::2] = c / b[1:]
    r[-1] = xh
    return r
def linear_residual_integral(b, x):
    return (
        (x ** (2 * p + 1)) / (2 * p + 1)
        - 2 * b * x ** (p + 2) / (p + 2)
        + b ** 2 * x ** 3 / 3
    )
def constant_residual_integral(c, x):
    return x ** (2 * p + 1) / (2 * p + 1) - 2 * c * x ** (p + 1) / (p + 1) + c ** 2 * x
def squared_error(bc):
    b, c = split_b_and_c(bc)
    r = solve_for_r(b, c)
    linear = np.sum(
        linear_residual_integral(b, r[1::2]) - linear_residual_integral(b, r[::2])
    )
    constant = np.sum(
        constant_residual_integral(c, r[2::2])
        - constant_residual_integral(c, r[1:-1:2])
    )
    return linear + constant
def evaluate(x, b, c, r):
    i = 0
    while x > r[i + 1]:
        i += 1
    return b[i // 2] * x if i % 2 == 0 else c[i // 2]
def main():
    bc0 = (xl + (xh - xl) * np.arange(1, 4 * n - 2, 2) / (4 * n - 2)) ** (
        p - 1 + np.arange(2 * n - 1) % 2
    )
    bc = scipy.optimize.minimize(
        squared_error, bc0, bounds=[(1e-06, None) for i in range(2 * n - 1)]
    ).x
    b, c = split_b_and_c(bc)
    r = solve_for_r(b, c)
    X = np.linspace(xl, xh, 1000)
    Y = [evaluate(x, b, c, r) for x in X]
    plt.plot(X, X ** p)
    plt.plot(X, Y)
    plt.show()
if __name__ == "__main__":
    main()
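For reference, the two residual-integral helpers in the code above read as antiderivatives of the squared residual between x^p (the input function with k = 1, which is what the code uses) and each kind of piece. A short derivation under that reading, with b a slope and c a plateau value:

```latex
\begin{aligned}
\int \left(x^{p} - b x\right)^{2} dx
  &= \int \left(x^{2p} - 2 b\, x^{p+1} + b^{2} x^{2}\right) dx
   = \frac{x^{2p+1}}{2p+1} - \frac{2 b\, x^{p+2}}{p+2} + \frac{b^{2} x^{3}}{3}, \\
\int \left(x^{p} - c\right)^{2} dx
  &= \int \left(x^{2p} - 2 c\, x^{p} + c^{2}\right) dx
   = \frac{x^{2p+1}}{2p+1} - \frac{2 c\, x^{p+1}}{p+1} + c^{2} x .
\end{aligned}
```

Continuity at a breakpoint r between a sloped piece b·x and a plateau c requires b·r = c, i.e. r = c / b, which is what solve_for_r computes on both sides of each plateau.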
I have tried to come up with a new solution myself, based on the idea of @Amo Robb: I partitioned the domain and curve-fitted a dual piece (constant plus linear) in each part, with the help of np.maximum. I used 1 / f'(x) as the function to place the breakpoints, but I know this is arbitrary and does not provide a global optimum. Maybe there is some optimal function for these breakpoints. But this solution is OK for me, as it may be appropriate to have a better fit on the first segments at the expense of the error on the later segments. (The task itself is actually a cost-based retail margin calculation {supply price -> added margin}, as the retail POS software can only work with such a piecewise margin function.)
The answer from @David Eisenstat is the correct optimal solution if the parameters are allowed to be floats. Unfortunately, the POS software cannot use floats. It is OK to round the c's and r's afterwards, but the b's must be rounded to two decimals, as they are entered as percentages, and this constraint would ruin the optimal solution obtained with long floats. I will try to further improve my solution with both Amo's and David's valuable input. Thank you for that!
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# The input function f(x)
def input_func(x,k,a):
    return np.power(x,1/a) * k
# 1 / f(x)'
def one_per_der(x,k,a):
    return a / (k * np.power(x, 1/a-1))
# 1 / f(x)' inverted
def one_per_der_inv(x,k,a):
    return np.power(a / (x*k), a / (1-a))
def segment_fit(start,end,y,first_val):
    b, _ = curve_fit(lambda x,b: np.maximum(first_val, b*x), np.arange(start,end), y[start-1:end-1])
    b = float(np.round(b, decimals=2))
    bp = np.round(first_val / b)
    last_val = np.round(b * end)
    return b, bp, last_val
def pw_fit(end_range, no_seg, **fparams):
    y_bps = np.linspace(one_per_der(1, **fparams), one_per_der(end_range,**fparams) , no_seg+1)[1:]
    x_bps = np.round(one_per_der_inv(y_bps, **fparams))
    y = input_func(x, **fparams)
    slopes = [np.round(float(curve_fit(lambda x,b: x * b, np.arange(1,x_bps[0]), y[:int(x_bps[0])-1])[0]), decimals = 2)]
    plats = [np.round(x_bps[0] * slopes[0])]
    bps = []
    for i, xbp in enumerate(x_bps[1:]):
        b, bp, last_val = segment_fit(int(x_bps[i]+1), int(xbp), y, plats[i])
        slopes.append(b); bps.append(bp); plats.append(last_val)
    breaks = sorted(list(x_bps) + bps)[:-1]
    # If due to rounding slope values (to two decimals), there is no change in a subsequent step, I just remove those segments
    to_del = np.argwhere(np.diff(slopes) == 0).flatten()
    breaks_to_del = np.concatenate((to_del * 2, to_del * 2 + 1))
    slopes = np.delete(slopes,to_del + 1)
    plats = np.delete(plats[:-1],to_del)
    breaks = np.delete(breaks,breaks_to_del)
    return slopes, plats, breaks
def pw_calc(x, slopes, plateaus, breaks):
    x = x.astype('float')
    cond_list = [x < breaks[0]]
    for idx, j in enumerate(breaks[:-1]):
        cond_list.append((j <= x) & (x < breaks[idx+1]))
    cond_list.append(breaks[-1] <= x)
    func_list = [lambda x: x * slopes[0]]
    for idx, j in enumerate(slopes[1:]):
        func_list.append(plateaus[idx])
        func_list.append(lambda x, j=j: x * j)
    return np.piecewise(x, cond_list, func_list)
fparams = {'k':1.8, 'a':1.2}
end_range = 5e4
no_steps = 10
x = np.arange(1, end_range)
y = input_func(x, **fparams)
slopes, plats, breaks = pw_fit(end_range, no_steps, **fparams)
y_output = pw_calc(x, slopes, plats, breaks)
plt.plot(x,y_output,y);

Python script for finding intersections for a graph function

I have this python code for finding intersections in the function "f(x)=x**2+x-2" with "g(x)=6-x"
import math
#brute force the functions with numbers until his Y values match, and then, do this for the other point.
def funcs():
    for X in range(-100, 100):
        funcA = (X**2)+X-2
        funcB = 6 - X
        if funcA == funcB:
            print("##INTERSECTION FOUND!!")
            print(f"({X},{funcA})")
            print(f"({X},{funcB})")
        else:
            pass
funcs()
But my problem is that the script only works with THAT SPECIFIC MATH FUNCTION; if I change the math function even a little, the code won't work.
The code just checks where the Y values of f(x) and g(x) match, and does the same for the other point.
Here is the output:
##INTERSECTION FOUND!!
(-4,10)
(-4,10)
##INTERSECTION FOUND!!
(2,4)
(2,4)
In general, this is a root-finding problem.
Define h(x) = f(x) - g(x).
An intersection point x satisfies f(x) = g(x), i.e. h(x) = 0.
There are many methods for root-finding problems, for example the bisection method and Newton's method.
Here is a numerical example with the bisection method.
def f(x):
    return x ** 2 + x - 2
def g(x):
    return 6 - x
def h(x):
    return f(x) - g(x)
def bisection(a, b):
    eps = 10 ** -10
    ha = h(a)
    hb = h(b)
    if ha * hb > 0:
        raise ValueError("Bad input")
    for i in range(1000):  # fix iteration number
        ha = h(a)
        midpoint = (a + b) / 2
        hm = h(midpoint)
        if abs(hm) < eps:
            return midpoint
        if hm * ha >= 0:
            a = midpoint
        else:
            b = midpoint
    raise RuntimeError("Out of iterations")
if __name__ == '__main__':
    print(bisection(0, 100))
    print(bisection(-100, 0))
Output:
1.999999999998181
-3.999999999996362
Why are the numbers so close but not exact? Because the problem is solved numerically. Other answers that use the sympy package solve the problem symbolically, which gives the exact answer, but that only works for simple problems.
Why [0, 100] and [-100, 0]? Because I sketched the graph and know there is a root within each interval. In practice, the bisection method requires an interval [a, b] such that h(a) * h(b) < 0. For the big interval [-100, 100] we have h(-100) * h(100) > 0, so the bisection method does not work in this case. The big interval is therefore partitioned so that some of the sub-intervals [a, b] satisfy h(a) * h(b) < 0, say [-100, 0] and [0, 100].
Why abs(hm) < eps? This tests whether hm is close to 0. On computers, we consider two floating-point numbers equal if the absolute value of their difference (here abs(hm), the difference from 0) is smaller than a threshold eps. eps is usually between 10 ** -10 and 10 ** -15, because Python floats carry about 15 significant decimal digits.
Newton's method will give you one of the two roots, depending on the initial point.
For further study, search for "root finding problem" or "numerical root finding".
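To make the remark about Newton's method concrete, here is a minimal sketch for the same h(x) = f(x) - g(x). The function name newton, its tolerance, and the hand-written derivative dh are assumptions made for this illustration, not part of the answer above.

```python
def h(x):
    # f(x) - g(x) with f(x) = x**2 + x - 2 and g(x) = 6 - x
    return (x ** 2 + x - 2) - (6 - x)

def dh(x):
    # Derivative of h, written out by hand: h(x) = x**2 + 2*x - 8
    return 2 * x + 2

def newton(func, dfunc, x0, tol=1e-12, max_iter=50):
    """Newton iteration x <- x - func(x)/dfunc(x), starting from x0."""
    x = x0
    for _ in range(max_iter):
        step = func(x) / dfunc(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Out of iterations")

print(newton(h, dh, 10.0))    # converges to the root near 2
print(newton(h, dh, -10.0))   # converges to the root near -4
```

Starting on either side of x = -1 (where the derivative vanishes) leads to a different root, which is the dependence on the initial point mentioned above.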
Since you want the intersection, you are looking for a solution of f(x) - g(x) = 0. So you can use fsolve from SciPy to find a root of f(x) - g(x):
from scipy.optimize import fsolve
def func(X):
    funcA = (X ** 2) + X - 2
    funcB = 6 - X
    return (funcA - funcB)
x = fsolve(func,0)
print(x)
You could employ sympy, Python's symbolic math library:
from sympy import symbols, Eq, solve
X = symbols('X', real=True)
funcA = (X ** 2) + X - 2
funcB = 6 - X
sol = solve(Eq(funcA, funcB))
print(sol) # --> [-4, 2]
To obtain the corresponding values for funcA and funcB
for s in sol:
    print(f'X={s} funcA({s})={funcA.subs(X, s)} funcB({s})={funcB.subs(X, s)} ')
# X=-4 funcA(-4)=10 funcB(-4)=10
# X=2 funcA(2)=4 funcB(2)=4
For some functions, the result may remain symbolic, as that is the most exact form; .evalf() obtains a numeric approximation. For example:
funcA = X ** 2 + X - 2
funcB = - 2*X ** 2 + X + 7
sol = solve(Eq(funcA, funcB))
for s in sol:
    print(f'X={s} funcA(X)={funcA.subs(X, s)} funcB(X)={funcB.subs(X, s)}')
    print(f'X={s.evalf()} funcA(X)={funcA.subs(X, s).evalf()} funcB(X)={funcB.subs(X, s).evalf()}')
Output:
X=-sqrt(3) funcA(X)=1 - sqrt(3) funcB(X)=1 - sqrt(3)
X=-1.73205080756888 funcA(X)=-0.732050807568877 funcB(X)=-0.732050807568877
X=sqrt(3) funcA(X)=1 + sqrt(3) funcB(X)=1 + sqrt(3)
X=1.73205080756888 funcA(X)=2.73205080756888 funcB(X)=2.73205080756888

How to add several constraints to differential_evolution?

I have the same problem as in this question, but I want to add several constraints to the optimization problem rather than only one.
So, e.g., I want to maximize x1 + 5 * x2 with the constraints that the sum of x1 and x2 is smaller than 5 and x2 is smaller than 3 (needless to say, the actual problem is far more complicated and cannot simply be thrown into scipy.optimize.minimize like this one; it just serves to illustrate the problem...).
I can do an ugly hack like this:
from scipy.optimize import differential_evolution
import numpy as np
def simple_test(x, more_constraints):
    # check whether all constraints evaluate to True
    if all(map(eval, more_constraints)):
        return -1 * (x[0] + 5 * x[1])
    # if not all constraints evaluate to True, return a positive number
    return 10
bounds = [(0., 5.), (0., 5.)]
additional_constraints = ['x[0] + x[1] <= 5.', 'x[1] <= 3']
result = differential_evolution(simple_test, bounds, args=(additional_constraints, ), tol=1e-6)
print(result.x, result.fun, sum(result.x))
This will print
[ 1.99999986 3. ] -16.9999998396 4.99999985882
as one would expect.
Is there a better/ more straightforward way to add several constraints than using the rather 'dangerous' eval?
One option is to pass callables instead of strings, something like this:
additional_constraints = [lambda x: x[0] + x[1] <= 5., lambda x: x[1] <= 3]
def simple_test(x, more_constraints):
    # check whether all constraints evaluate to True
    if all(constraint(x) for constraint in more_constraints):
        return -1 * (x[0] + 5 * x[1])
    # if not all constraints evaluate to True, return a positive number
    return 10
There is a proper way to enforce multiple nonlinear constraints with scipy.optimize.differential_evolution, as asked in the question: use the scipy.optimize.NonlinearConstraint class.
Below I give a non-trivial example of optimizing the classic Rosenbrock function inside a region defined by the intersection of two circles.
import numpy as np
from scipy import optimize
# Rosenbrock function
def fun(x):
    return 100*(x[1] - x[0]**2)**2 + (1 - x[0])**2
# Function defining the nonlinear constraints:
# 1) x^2 + (y - 3)^2 < 4
# 2) (x - 1)^2 + (y + 1)^2 < 13
def constr_fun(x):
    r1 = x[0]**2 + (x[1] - 3)**2
    r2 = (x[0] - 1)**2 + (x[1] + 1)**2
    return r1, r2
# No lower limit on constr_fun
lb = [-np.inf, -np.inf]
# Upper limit on constr_fun
ub = [4, 13]
# Bounds are irrelevant for this problem, but are needed
# for differential_evolution to compute the starting points
bounds = [[-2.2, 1.5], [-0.5, 2.2]]
nlc = optimize.NonlinearConstraint(constr_fun, lb, ub)
sol = optimize.differential_evolution(fun, bounds, constraints=nlc)
# Accurate solution by Mathematica
true = [1.174907377273171, 1.381484428610871]
print(f"nfev = {sol.nfev}")
print(f"x = {sol.x}")
print(f"err = {sol.x - true}\n")
This prints the following with default parameters:
nfev = 636
x = [1.17490808 1.38148613]
err = [7.06260962e-07 1.70116282e-06]
Here is a visualization of the function (contours) and the feasible region defined by the nonlinear constraints (shading inside the green line). The constrained global minimum is indicated by the yellow dot, while the magenta one shows the unconstrained global minimum.
This constrained problem has an obvious local minimum at (x, y) ~ (-1.2, 1.4) on the boundary of the feasible region which will make local optimizers fail to converge to the global minimum for many starting locations. However, differential_evolution consistently finds the global minimum as expected.
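The same pattern can be applied to the small problem from the question (maximize x1 + 5 * x2 subject to x1 + x2 <= 5 and x2 <= 3). This is a sketch, assuming a SciPy version whose differential_evolution accepts the constraints argument as used in this thread; the names objective and constraint_values are made up for the illustration, and since the solver is stochastic the printed digits will vary slightly between runs.

```python
import numpy as np
from scipy.optimize import NonlinearConstraint, differential_evolution

def objective(x):
    # Maximizing x1 + 5*x2 is the same as minimizing its negative
    return -(x[0] + 5 * x[1])

def constraint_values(x):
    # One entry per constraint: x1 + x2 (must be <= 5) and x2 (must be <= 3)
    return np.array([x[0] + x[1], x[1]])

# No lower limits; upper limits 5 and 3 for the two constraint values
nlc = NonlinearConstraint(constraint_values, [-np.inf, -np.inf], [5.0, 3.0])
bounds = [(0.0, 5.0), (0.0, 5.0)]

result = differential_evolution(objective, bounds, constraints=nlc)
print(result.x, result.fun, sum(result.x))
```

Both constraints are expressed in a single NonlinearConstraint by returning a vector, so no eval and no penalty value inside the objective is needed.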

Riemann sum in python

I need help coding a program that will use the Riemann definition (left AND right rules) to calculate the integral of f(x)=sin(x) from a=0 to b=2*pi. I can do this by hand for days, but I have zero idea how to code it with python.
Did you take a look at this code: http://statmath.org/calculate_area.pdf
# Calculate the area under a curve
#
# Example Function y = x^2
#
# This program integrates the function from x1 to x2
# x2 must be greater than x1, otherwise the program will print an error message.
#
x1 = float(input('x1='))
x2 = float (input('x2='))
if x1 > x2:
    print('The calculated area will be negative')
# Compute delta_x for the integration interval
#
delta_x = ((x2-x1)/1000)
j = abs ((x2-x1)/delta_x)
i = int (j)
print('i =', i)
# initialize
n=0
A= 0.0
x = x1
# Begin Numerical Integration
while n < i:
    delta_A = x**2 * delta_x
    x = x + delta_x
    A = A + delta_A
    n = n+1
print('Area Under the Curve =', A)
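The program above evaluates the function at the left end of each sub-interval, so it is a left Riemann sum for y = x^2. Adapted to the question, a minimal sketch that computes both the left and the right rule for f(x) = sin(x) on [0, 2*pi] could look like this; the function name riemann_sums and the choice of n = 1000 sub-intervals are assumptions for illustration.

```python
import math

def riemann_sums(f, a, b, n):
    """Return (left_sum, right_sum) for f on [a, b] with n equal sub-intervals."""
    dx = (b - a) / n
    # Left rule: sample f at the left end of each sub-interval
    left = sum(f(a + i * dx) for i in range(n)) * dx
    # Right rule: sample f at the right end of each sub-interval
    right = sum(f(a + (i + 1) * dx) for i in range(n)) * dx
    return left, right

left, right = riemann_sums(math.sin, 0.0, 2.0 * math.pi, 1000)
print('left rule  =', left)
print('right rule =', right)
```

Both sums come out very close to zero, as expected, because the positive and negative halves of the sine wave cancel over a full period.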
From my experience, looking at the equations from wiki has helped me with translating into python. Here are a few wiki pages:
Riemann definition
Fundamental theorem of calculus
Numerical integration
Also, the math module of Python will help you with this:
Python Math
After checking these out, look at some examples of other mathematical equations in the python language to understand how to integrate some of the math functions.

Calculating Probability of a Random Variable in a Distribution in Python

Given a mean and standard-deviation defining a normal distribution, how would you calculate the following probabilities in pure-Python (i.e. no Numpy/Scipy or other packages not in the standard library)?
1. The probability of a random variable r where r < x or r <= x.
2. The probability of a random variable r where r > x or r >= x.
3. The probability of a random variable r where x > r > y.
I've found some libraries, like Pgnumerics, that provide functions for calculating these, but the underlying math is unclear to me.
Edit: To show this isn't homework, posted below is my working code for Python<=2.6, albeit I'm not sure if it handles the boundary conditions correctly.
from math import *
import unittest
def erfcc(x):
    """
    Complementary error function.
    """
    z = abs(x)
    t = 1. / (1. + 0.5*z)
    r = t * exp(-z*z-1.26551223+t*(1.00002368+t*(.37409196+
        t*(.09678418+t*(-.18628806+t*(.27886807+
        t*(-1.13520398+t*(1.48851587+t*(-.82215223+
        t*.17087277)))))))))
    if (x >= 0.):
        return r
    else:
        return 2. - r
def normcdf(x, mu, sigma):
    t = x-mu;
    y = 0.5*erfcc(-t/(sigma*sqrt(2.0)));
    if y>1.0:
        y = 1.0;
    return y
def normpdf(x, mu, sigma):
    u = (x-mu)/abs(sigma)
    y = (1/(sqrt(2*pi)*abs(sigma)))*exp(-u*u/2)
    return y
def normdist(x, mu, sigma, f):
    if f:
        y = normcdf(x,mu,sigma)
    else:
        y = normpdf(x,mu,sigma)
    return y
def normrange(x1, x2, mu, sigma, f=True):
    """
    Calculates probability of random variable falling between two points.
    """
    p1 = normdist(x1, mu, sigma, f)
    p2 = normdist(x2, mu, sigma, f)
    return abs(p1-p2)
All these are very similar: If you can compute #1 using a function cdf(x), then the solution to #2 is simply 1 - cdf(x), and for #3 it's cdf(x) - cdf(y).
Since Python has included the (Gauss) error function in the math module since version 2.7, you can do this by calculating the cdf of the normal distribution using the equation from the article you linked to:
import math
print(0.5 * (1 + math.erf((x - mean)/math.sqrt(2 * standard_dev**2))))
where mean is the mean and standard_dev is the standard deviation.
Some notes, since what you asked seems relatively straightforward given the information in the article:
The CDF of a random variable (say X) is the probability that X lies between -infinity and some limit, say x (lower case). For continuous distributions, the CDF is the integral of the pdf. The cdf is exactly what you described for #1: you want some normally distributed RV to be between -infinity and x (<= x).
< and <= (as well as > and >=) are the same for continuous random variables, because the probability that the rv takes any single value is 0. So whether or not x itself is included doesn't matter when calculating probabilities for continuous distributions.
The probabilities sum to 1: if X is not < x then it is >= x, so if you have cdf(x), then 1 - cdf(x) is the probability that X >= x. Since >= is equivalent to > for continuous random variables, this is also the probability that X > x.
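Putting these notes together, the three probabilities from the question can be sketched with the standard library alone. The function names below (normal_cdf, prob_below, prob_above, prob_between) are made up for this illustration, and the sketch assumes Python 2.7 or later so that math.erf is available.

```python
import math

def normal_cdf(x, mean, standard_dev):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (standard_dev * math.sqrt(2.0))))

def prob_below(x, mean, standard_dev):
    # Case 1: r < x (same as r <= x for a continuous variable)
    return normal_cdf(x, mean, standard_dev)

def prob_above(x, mean, standard_dev):
    # Case 2: r > x (same as r >= x)
    return 1.0 - normal_cdf(x, mean, standard_dev)

def prob_between(y, x, mean, standard_dev):
    # Case 3: x > r > y, i.e. cdf(x) - cdf(y) with y as the lower limit
    return normal_cdf(x, mean, standard_dev) - normal_cdf(y, mean, standard_dev)

print(prob_below(0.0, 0.0, 1.0))
print(prob_above(1.0, 0.0, 1.0))
print(prob_between(-1.96, 1.96, 0.0, 1.0))
```

For the standard normal distribution the last call gives roughly 0.95, the familiar two-sided 95% interval.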
