Calculating the expectation of functions over a normal distribution - Python

I want to compute the expectation of certain functions across a normal distribution.
An example:
import math
from scipy.integrate import quad
from scipy.stats import norm

mu = 100
k = 100
sigma = 10
val, err = quad(lambda x: norm.pdf((x - mu) / sigma) * x if x > k else 0, -math.inf, math.inf)
print(val)
This prints 4.878683842492743e-288 which is clearly not the correct answer.
I assume this is happening because SciPy is unable to integrate the Gaussian. How can I solve this? Ideally, I'd want a method that allows one to integrate all sorts of functions against a Gaussian, not something specific to the example above.
Thanks!

I think this is a problem with the quadrature (sometimes it doesn't really adapt), which doesn't like the if statement.
So I would suggest something like this (integrate from k to infinity):
def f(x):
    return 1 / sigma * norm.pdf((x - mu) / sigma) * x

val, err = quad(f, k, math.inf)
Notice, as implied by Jimmy, that the correct form of the Gaussian needs the 1/sigma factor.
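Since the asker wants something generic, it's also worth knowing that scipy.stats distributions expose an expect method that handles exactly this kind of integral. A minimal sketch, assuming the same mu, sigma, and k as above:
from scipy.stats import norm

mu, sigma, k = 100, 10, 100

# E[X * 1{X > k}] under N(mu, sigma**2); lb restricts the integration range.
val = norm.expect(lambda x: x, loc=mu, scale=sigma, lb=k)
print(val)  # roughly 53.99 for these parameters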
Another way to do this integral would be to force quad to be careful at some points. My favorite way is to do something like
import numpy as np
from scipy.integrate import quad

# The Gaussian pdf times x. Note that 0.5*(np.sign(x - k) + 1) is 0 for x < k and 1 otherwise.
def f(x):
    return 1 / (np.sqrt(2 * np.pi) * sigma) * np.exp(-0.5 * ((x - mu) / sigma)**2) * x * 0.5 * (np.sign(x - k) + 1)

# Substituting x = u / (1 - u**2) maps the whole real line onto (-1, 1).
def G(u):
    x = u / (1 - u**2)
    return f(x) * (1 + u**2) / (1 - u**2)**2

quad(G, -1, 1, points=np.linspace(-0.999, 0.999, 25))
I would suggest reading this to see how you can optimize such integrals.
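As a sanity check for either approach: this particular expectation has a closed form,
$$
E\left[X \cdot 1_{\{X > k\}}\right] = \mu \, \Phi\!\left(\frac{\mu - k}{\sigma}\right) + \sigma \, \varphi\!\left(\frac{k - \mu}{\sigma}\right),
$$
where $\Phi$ and $\varphi$ are the standard normal CDF and pdf. For mu = k = 100 and sigma = 10 this gives $50 + 10/\sqrt{2\pi} \approx 53.99$, which is what the corrected integrals above should return.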

Related

Derivatives in python

I am trying to find the coefficients of a finite series, $f(x) = \sum_n a_n x^n$. To get the $n$th coefficient, we can take the $n$th derivative evaluated at zero and divide by $n!$. Equivalently, by Cauchy's integral formula, the $n$th coefficient is
$$
a_n = \frac{1}{2\pi i} \oint_C \frac{f(z)}{z^{n+1}} \, dz
$$
I believe this code takes the derivative of a function using the above contour integral.
import math
import numpy
import matplotlib.pyplot as plt

# F is the Poisson(10) probability generating function exp(mean*(x - 1)),
# so its Taylor coefficients at 0 are the Poisson probabilities p(n).
def F(x):
    mean = 10
    return math.exp(mean * (x.real - 1))

# Exact Poisson pmf, for comparison.
def p(n):
    mean = 10
    return (math.pow(mean, n) * math.exp(-mean)) / math.factorial(n)

# Approximate the n-th derivative of func at a via the Cauchy integral
# formula, sampled at n_steps equally spaced points on a circle of radius r.
def integration(func, a, n, r, n_steps):
    z = r * numpy.exp(2j * numpy.pi * numpy.arange(0, 1, 1. / n_steps))
    return math.factorial(n) * numpy.mean(func(a + z) / z**n)
ns = list(range(20))
f2 = numpy.vectorize(F)
plt.plot(ns,[p(n) for n in ns], label='Actual')
plt.plot(ns,[integration(f2, a=0., n=n, r=1., n_steps=100).real/math.factorial(n) for n in ns], label='Numerical derivative')
plt.legend()
However, it is clear that the numerical derivative is completely off the actual values of the coefficients of the series. What am I doing wrong?
The formulas in the Mathematics Stack Exchange answer that you're using to derive the coefficients of the power series expansion of F are based on complex analysis - coming for example from Cauchy's residue theorem (though other derivations are possible). One of the assumptions necessary to make those formulas work is that you have a holomorphic (i.e., complex differentiable) function.
Your definition of F gives a function that's not holomorphic. (For one thing, it always gives a real result for any complex input, which isn't possible for a non-constant holomorphic function.) But it's easily fixed to be holomorphic, while continuing to return the same result for real inputs.
Here's a fixed version of F, which replaces x.real with x. Since the input to exp is now complex, it's also necessary to use cmath.exp instead of math.exp to avoid a TypeError:
import cmath

def F(x):
    mean = 10
    return cmath.exp(mean * (x - 1))
After that fix for F, if I run your code I get rather surprisingly accurate results: the two curves in the graph lie right on top of one another. (I had to print out the values to double-check that the graph really did show two lines on top of each other.)
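A minimal spot check of the fixed version, assuming F, p, and integration from the code above are in scope:
# Compare the contour-integral coefficient with the exact Poisson pmf at n = 2.
f2 = numpy.vectorize(F)
approx = integration(f2, a=0., n=2, r=1., n_steps=100).real / math.factorial(2)
print(approx, p(2))  # the two values should agree to many decimal places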

python scipy.integrate inverse ppf

Hello, I would like to know what x value would give me a specific value of a numerical integral.
So far I am using scipy.integrate and it works OK, but given an integral value, is there a way to know what x gave that result?
Let's say I have a function f(x) = |2-x| for 1 <= x <= 3.
I would like to know what x value gives me 0.25 (the first quartile).
This is what norm.ppf() does in scipy.stats for the normal distribution; in this particular case I am using a PDF (probability density function), but it could be any integral.
Thanks and regards
I use binary search (bisection) to find the answer in logarithmic time (very fast).
import math, scipy.integrate, scipy.optimize

def f(x):
    return math.sin(x)

a, b = 0, 10
integral_value = 0.5

# Bisect on x: the integral from a to x, minus the target, changes sign on [a, b].
res_x = scipy.optimize.bisect(
    lambda x: scipy.integrate.quad(f, a, x)[0] - integral_value,
    a, b
)
print(
    'found point x', res_x, ', integral value at this point',
    scipy.integrate.quad(f, a, res_x)[0]
)
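Applied to the question's example: f(x) = |2 - x| on [1, 3] integrates to 1, so it is a valid pdf, and the first quartile can be found the same way (the exact answer is 2 - 1/sqrt(2) ≈ 1.2929):
import scipy.integrate, scipy.optimize

# Find the x in [1, 3] where the integral from 1 to x reaches 0.25.
quartile = scipy.optimize.bisect(
    lambda x: scipy.integrate.quad(lambda t: abs(2 - t), 1, x)[0] - 0.25,
    1, 3
)
print(quartile)  # approximately 1.2929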

Solving an ODE with a random variable with a custom pdf

I have the following ODE:
where p is a probability, and y is a random variable with pdf:
epsilon is a small cut-off value (typically 0.0001).
I am looking to numerically solve this system in Python, for t = 0 to about 500.
Is there a way I can implement this using numpy/scipy?
There's an annoying lack of quality Python packages for SDE integration. You have kind of an unusual setup, in that your random variable has an explicit dependence on your integrand (at least it appears that y depends on p). Because of this, it'll probably be hard to find a preexisting implementation that meets your needs.
Fortunately, the simplest method for SDE integration, Euler-Maruyama, is very easy to implement, as in the eulmar function below:
from matplotlib import pyplot as plt
import numpy as np
def eulmar(func, randfunc, x0, tinit, tfinal, dt):
    times = np.arange(tinit, tfinal + dt, dt)
    x = np.zeros(times.size, dtype=float)
    x[0] = x0
    for i, t in enumerate(times[:-1]):
        # Euler-Maruyama step: the drift is scaled by dt and the random increment
        # by sqrt(dt) (the usual convention when randfunc returns a unit-scale sample).
        x[i+1] = x[i] + func(x[i], t) * dt + randfunc(x[i], t) * np.sqrt(dt)
    return times, x
You could then use eulmar to integrate your SDE like so:
def func(x, t):
    return 1 - 2*x

def randfunc(x, t):
    return np.random.rand()

# A small step keeps the explicit scheme stable (here |1 - 2*dt| < 1 requires dt < 1).
times, x = eulmar(func, randfunc, 0, 0, 500, 0.1)
plt.plot(times, x)
You'll have to supply your own randfunc, however. Like above, it should be a function that takes x and t as arguments and returns a single sample from your random variable y. If you're having trouble coming up with a way to generate samples of y, since you know the PDF you can always use rejection sampling (though it does tend to be fairly inefficient), as sketched below.
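A minimal rejection-sampling sketch, assuming the pdf is supported on a known interval [lo, hi] and bounded above by a known pdf_max (all placeholder names, to be replaced by whatever your actual density requires):
import numpy as np

def rejection_sample(pdf, lo, hi, pdf_max):
    # Propose uniformly on [lo, hi] and accept with probability pdf(x) / pdf_max.
    while True:
        x = np.random.uniform(lo, hi)
        if np.random.uniform(0.0, pdf_max) < pdf(x):
            return x

# Hypothetical usage with a toy density 2x on (0, 1):
y = rejection_sample(lambda x: 2.0 * x, 0.0, 1.0, 2.0)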
Notes
This is not a particularly efficient implementation of Euler-Maruyama. For example, the random samples are usually generated all at once (e.g. np.random.rand(500)). However, you can't pre-generate your random samples here, since y depends on p.

Numerical integration with singularities in python (principal value)

I'm trying to integrate a function with singularities using the quad function in scipy.integrate but I'm not getting the desired answer. Here is the code:
from scipy.integrate import quad

def fun(x):
    return 1. / (1 - x**2)

quad(fun, -2, 2, points=[-1, 1])
This results in an IntegrationWarning and a return value of about 0.4.
The poles of the integrand are at -1 and 1. The answer should be roughly 1.09 (calculated using pen and paper).
The option weight='cauchy' can be used to efficiently compute the principal value of divergent integrals like this one. It means that the function provided to quad will be implicitly multiplied by 1/(x-wvar), so adjust that function accordingly (multiply it by x-wvar where wvar is the point of singularity).
# Split the range at 0 so each piece contains exactly one singularity,
# and multiply the integrand by (x - wvar) as weight='cauchy' requires.
i1 = quad(lambda x: -1./(x+1), 0, 2, weight='cauchy', wvar=1)[0]
i2 = quad(lambda x: -1./(x-1), -2, 0, weight='cauchy', wvar=-1)[0]
result = i1 + i2
The result is 1.0986122886681091.
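For reference, the exact principal value is ln 3, so a quick check (assuming result from above):
import numpy as np
print(result, np.log(3))  # both approximately 1.09861228866811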
With a simple function like this, you can also do symbolic integration with SymPy:
from sympy import symbols, integrate
x = symbols('x')
f = integrate(1/(1-x**2), x)
result = (f.subs(x, 2) - f.subs(x, -2)).evalf()
Result: 1.09861228866811. Without evalf() it would be log(3).
I also couldn't get it to work with the original function. I came up with this to evaluate the principal value in scipy:
from scipy.integrate import quad

def principal_value(func, a, b, poles, eps=10**(-6)):
    # Edge pieces: from a to just before the first pole, and from just after the last pole to b.
    res = quad(func, a, poles[0] - eps)[0] + quad(func, poles[-1] + eps, b)[0]
    # Inner pieces between consecutive poles.
    for i in range(len(poles) - 1):
        res += quad(func, poles[i] + eps, poles[i+1] - eps)[0]
    return res
Here func is your function handle, a and b are the integration limits, poles is a sorted list of poles, and eps is how closely you want to approach them. Because each pole is excluded symmetrically, the divergent contributions cancel and the sum approximates the principal value. You can make eps smaller and smaller to get a better result, but maybe SymPy will be better for a problem like this.
With this function and the default eps I get 1.0986112886023367 as a result, which is almost the same as WolframAlpha gives.
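For completeness, a usage sketch with the question's integrand (fun and principal_value as defined above):
print(principal_value(fun, -2, 2, poles=[-1, 1]))  # approximately 1.09861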

Riemann sum of a probability density

I am trying to find the probability that a random variable exceeds a specific value, i.e. Pr(x > a), where a is some constant, typically much higher than the average of x, and x does not follow any standard Gaussian distribution. So I wanted to fit some other probability density function and take the integral of the pdf of x from a to infinity. As this is a problem of modelling the spikes, I considered it an extreme value analysis problem, and found that the Weibull distribution might be appropriate.
Regarding extreme value distributions, the Weibull distribution has a very "not-easy-to-implement" integral, and I therefore figured I could just get the pdf from Scipy, and do a Riemann-sum. I also thought that I could as well simply evaluate the kernel density, get the pdf, and do the same with the Riemann sum, to approximate the integral.
I found a Q here on Stack which provided a neat method for doing Riemann sums in Python, and I adapted that code to fit my problem. But when I evaluate the integral I get weird numbers, indicating that something is either wrong with the KDE, or the Riemann sum-function.
Two scenarios, the first with the Weibull, in accordance with the Scipy documentation:
import numpy as np
import scipy.stats as ss

x = theData
x_grid = np.linspace(0, np.max(x), len(x))
p = ss.weibull_min.fit(x[x != 0], floc=0)
pd = ss.weibull_min.pdf(x_grid, p[0], p[1], p[2])
which produces a fitted density (the plot is not reproduced here). I then also tried the KDE method as follows:
pd = ss.gaussian_kde(x).pdf(x_grid)
which I subsequently run through the following function:
def riemannSum(a, b, n):
    dx = (b - a) / n
    s = 0.0
    x = a
    for i in range(n):
        s += pd[x]
        x += dx
    return s * dx

print(riemannSum(950.0, 1612.0, 10000))
print(riemannSum(0.0, 1612.0, 100000))
In the case of the Weibull, it gives me
>> 0.272502150549
>> 18.2860384829
and in the case of the KDE, I get
>> 0.448450460469
>> 18.2796021034
This is obviously wrong. Taking the integral of the entire density should give me 1, and 18.2+ is quite far off.
Am I wrong in my assumptions about what I can do with these density functions? Or have I made some mistake in the Riemann sum function?
the Weibull distribution has a very "not-easy-to-implement" integral
Huh?!
The Weibull distribution has a very well-defined CDF, so implementing the integral is pretty much a one-liner (OK, make it two for clarity):
from math import exp

# CDF of a Weibull with shape k and scale lmbd: 1 - exp(-(x/lmbd)**k).
def WeibullCDF(x, lmbd, k):
    q = pow(x / lmbd, k)
    return 1.0 - exp(-q)
And, of course, there is ss.weibull_min.cdf(x_grid, p[0], p[1], p[2]) if you want to pick it from the standard library.
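And since the original goal was Pr(x > a), the survival function gives that tail probability directly, with no Riemann sum at all. A sketch, assuming p is the fit from the question and a is the threshold of interest:
# Pr(x > a) = 1 - CDF(a), exposed in scipy as the survival function sf.
tail = ss.weibull_min.sf(a, p[0], p[1], p[2])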
I know there is an accepted answer that worked for you, but I stumbled across this while looking for how to do a Riemann sum of a probability density, and others may too, so I will give this a go.
Basically, I think you had (what is now) an older version of numpy that allowed floating-point indexing, and your pd variable pointed to an array of values drawn from the pdf at the points in x_grid. Nowadays numpy raises an error when you try to use a floating-point index, but since yours didn't, you were accessing the value of the pdf at the grid index corresponding to that number. What you needed to do was calculate the pdf at the new values you actually wanted to use in your Riemann sum.
I edited the code from the question to create a method that works for calculating the integral of the pdf.
def riemannSum(a, b, n):
    dx = (b - a) / n
    s = 0.0
    x = 0
    # Evaluate the pdf at the points actually used in this particular sum.
    pd = weibull_min.pdf(np.linspace(a, b, n), p[0], p[1], p[2])
    for i in range(n):
        s += pd[x]
        x += 1
    return s * dx
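With that change, a quick check of the full-support integral (a rough expectation; the exact value depends on the fitted parameters):
print(riemannSum(0.0, 1612.0, 100000))  # should now be close to 1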
The Riemann implementation below can also be used (it's Java instead of Python, sorry).
import static java.lang.Math.exp;
import static java.lang.Math.pow;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.IntStream;

public class WeibullPDF
{
    public interface Riemann extends BiFunction<Function<Double, Double>, Integer,
            BinaryOperator<Double>> { }

    public static void main(String args[])
    {
        int N = 100000;

        // Left Riemann sum: sample f at n equally spaced points on [a, b].
        Riemann s = (f, n) -> (a, b) ->
            IntStream.range(0, n)
                     .mapToDouble(i -> f.apply(a + i * ((b - a) / n)) * ((b - a) / n))
                     .sum();

        double k = 1.5;

        // Weibull pdf with shape k and scale 1: k * x^(k-1) * exp(-x^k).
        Optional<Double> weibull =
            Optional.of(s.apply(x -> k * pow(x, k - 1) * exp(-pow(x, k)), N).apply(0.0, 1612.0));

        weibull.ifPresent(System.out::println); // prints 0.9993617886716168
    }
}
