python: random sampling from self-defined probability function [duplicate]

This question already has answers here:
Fast arbitrary distribution random sampling (inverse transform sampling)
(5 answers)
Closed 5 years ago.
I have a piecewise quartic distribution with a probability density function:
p(x)= c(x/a)^2 if 0≤x<a;
c((b+a-x)^2/b)^2 if a≤x≤b;
0 otherwise
Suppose c, a and b are known. I am trying to draw 100 random samples from the distribution. How can I do it with numpy/scipy?

One standard way is to find an explicit formula G = F^-1 for the inverse of the cumulative distribution function F. That is doable here (although G will naturally be piecewise defined); you then use G(U), where U is uniform on [0,1], to generate your samples.
In this case I think I have worked out the details, but you will need to check the calculus/algebra.
First of all, to streamline things it helps to introduce a couple of new parameters. Let
f(a,b,c,d,e,x) = c*x**2 #if 0 <= x <= a
and
f(a,b,c,d,e,x) = d*(x-e)**4 #if a < x <= b
Then your p(x) is given by
p(x) = f(a,b,c/a**2,c/b**2,a+b,x)
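I integrated f to get the cumulative distribution function, inverted it, and got the following:
import math

def real_root(v, n):
    # real n-th root for odd n (v ** (1/n) gives a complex number when v < 0)
    return math.copysign(abs(v) ** (1 / n), v)

def Finverse(a, b, c, d, e, x):
    # x is a probability in [0, 1]; c*a**3/3 is the CDF value at the breakpoint a
    if x <= (c * a**3) / 3:
        return (3 * x / c) ** (1 / 3)
    else:
        return e + real_root((a - e)**5 + 5 * (x - c * a**3 / 3) / d, 5)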
Assuming this is right, then simply:
import random

def randX(a, b, c):
    u = random.random()  # uniform on [0, 1)
    return Finverse(a, b, c / a**2, c / b**2, a + b, u)
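Since the question asks for 100 samples with numpy, here is a vectorized sketch of the same inverse-CDF idea. It assumes the support ends at x = b exactly as written in the question (with b > a), and it chooses c so that the density integrates to 1 over [0, b]; both are my assumptions, as are the helper names `make_sampler`, `cdf` and `inv_cdf`. The parameters c1, d and e play the roles of c/a**2, c/b**2 and a+b above.

```python
import numpy as np

def make_sampler(a, b):
    """Vectorized CDF and inverse CDF for p(x) = c (x/a)^2 on [0, a)
    and (c/b^2)(a+b-x)^4 on [a, b], assuming b > a.

    Assumption: c is chosen so the density integrates to 1 over [0, b].
    """
    c = 1.0 / (a / 3.0 + (b**5 - a**5) / (5.0 * b**2))
    c1, d, e = c / a**2, c / b**2, a + b   # same roles as in Finverse
    u_split = c1 * a**3 / 3.0              # CDF value at x = a

    def cdf(x):
        x = np.asarray(x, dtype=float)
        left = c1 * x**3 / 3.0
        right = u_split + d / 5.0 * ((x - e)**5 - (a - e)**5)
        return np.where(x < a, left, right)

    def inv_cdf(u):
        u = np.asarray(u, dtype=float)
        left = np.cbrt(3.0 * u / c1)
        v = (a - e)**5 + 5.0 * (u - u_split) / d
        # real fifth root: v is negative on this branch because x - e < 0
        right = e + np.sign(v) * np.abs(v)**0.2
        return np.where(u <= u_split, left, right)

    return cdf, inv_cdf

rng = np.random.default_rng(42)
cdf, inv_cdf = make_sampler(a=1.0, b=3.0)
samples = inv_cdf(rng.random(100))   # 100 draws, as the question asks
```

Checking that cdf(inv_cdf(u)) returns u on a few points is a cheap way to catch algebra mistakes in the inversion.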
In this case it was possible to work out an explicit formula. When you can't work out such a formula for the inverse, consider using the Monte Carlo methods described by @lucianopaz.

As your function is bounded both in x and p(x), I recommend Monte Carlo rejection sampling. The basic principle is to draw two uniform random numbers: a candidate x within the bounds [0,b], and a height y in [0,pmax]. If y is less than or equal to p(x), the candidate x is returned; otherwise the loop moves on to the next iteration.
import numpy as np

def rejection_sampler(p, xbounds, pmax):
    while True:
        # candidate x uniform on [xbounds[0], xbounds[1]], height y uniform on [0, pmax]
        x = np.random.uniform(xbounds[0], xbounds[1])
        y = np.random.uniform(0, pmax)
        if y <= p(x):
            return x
Here, p should be a callable for your normalized piecewise probability density, xbounds can be a list or tuple containing the lower and upper bounds, and pmax is the maximum of the probability density in the x interval (any upper bound works, at the cost of more rejected candidates).
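To draw many samples at once, the same accept/reject rule can be applied to whole batches of candidates. This is a sketch under my own assumptions: the density is taken from the question with support [0, b] and b > a, the hypothetical helpers `piecewise_density` and `rejection_sample` are illustrative, and c is set to 1 because rejection sampling only needs p up to a constant factor.

```python
import numpy as np

def piecewise_density(x, a=1.0, b=3.0, c=1.0):
    """Density from the question, up to the constant c.

    Assumed support: [0, b] with b > a, as written in the question.
    """
    x = np.asarray(x, dtype=float)
    inner = np.where(x < a, c * (x / a) ** 2, c * (b + a - x) ** 4 / b ** 2)
    return np.where((x >= 0) & (x <= b), inner, 0.0)

def rejection_sample(p, xbounds, pmax, n, rng=None):
    """Vectorized rejection sampling: propose in batches until n points are accepted."""
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = xbounds
    accepted = []
    total = 0
    while total < n:
        x = rng.uniform(lo, hi, size=n)      # candidate x values
        y = rng.uniform(0.0, pmax, size=n)   # heights under the flat envelope pmax
        keep = x[y <= p(x)]
        accepted.append(keep)
        total += keep.size
    return np.concatenate(accepted)[:n]

a, b = 1.0, 3.0
# With c = 1, p approaches 1 just left of a and jumps to b**2 at x = a,
# so max(1, b**2) bounds the density on [0, b].
pmax = max(1.0, b ** 2)
samples = rejection_sample(lambda x: piecewise_density(x, a, b), (0.0, b), pmax, 100,
                           rng=np.random.default_rng(0))
```

The acceptance rate equals the area under p divided by the area of the box (b - 0) * pmax, so a tight pmax matters when the density is sharply peaked.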

Related

Generating 3D Gaussian Data [duplicate]

This question already has answers here:
Generating 3D Gaussian distribution in Python
(2 answers)
Closed 2 years ago.
I'm trying to generate a 3D distribution, where x, y represents the surface plane, and z is the magnitude of some value, distributed over a range.
I'm looking at numpy's multivariate_normal, but it only lets me get a number of samples. I'd like the ability to specify some x, y coordinate, and get back what the z value should be; so I'd be able to query gp(x, y) and get back a z value that adheres to some mean and covariance.
Perhaps a more illustrative (toy) example: assume I have some temperature distribution that can be modeled as a gaussian process. So I might have a mean temperature of 20 at (0, 0), and some covariance [[1, 0], [0, 1]]. I'd like to be able to create a model that I can then query at different x, y locations to get the temperature at that position (so, at (5, 5) I might get back something like 7 degrees).
How to best accomplish this?
I assume that your data can be stored in a single np.array, which I will call X in my code, with X.shape = (n,2), where n is the number of data points (n = 1 if you want to evaluate a single point at a time) and 2 is the dimension of the plane spanned by your x and y coordinates. Then:
import numpy as np

def estimate_gaussian(X):
    # sample mean and covariance of the rows of X
    return X.mean(axis=0), np.cov(X.T)

def mva_gaussian(X, mu, sigma2):
    k = len(mu)
    # if sigma2 is a vector, use it as the diagonal of the covariance matrix
    if sigma2.ndim == 1:
        sigma2 = np.diag(sigma2)
    X = X - mu
    # multivariate normal density at each row of X, returned as an (n, 1) column
    return (2 * np.pi)**(-k/2) * np.linalg.det(sigma2)**(-0.5) * \
        np.exp(-0.5 * np.sum(np.multiply(X.dot(np.linalg.inv(sigma2)), X), axis=1)).reshape((X.shape[0], 1))
This will do what you want: given data points, you get the value of the Gaussian function at those points (or at a single point). It is actually a generalized version of what you need, since this function describes a multivariate Gaussian. You seem to be interested in the k = 2 case with a diagonal covariance matrix sigma2.
Moreover, this is also a probability density, which you say you don't want. There isn't enough information to know exactly what you are trying to fit (i.e. what you expect the three parameters of the Gaussian function to be; usually people are interested in a normal distribution). Nevertheless, you can change the parameters in the return statement of mva_gaussian to suit your needs, and skip estimate_gaussian if you don't want a normalized distribution. A normalized function would still give you what you seek, a real-valued temperature, as long as you know the normalization, which you do :-)
You can create a multivariate normal using scipy.stats.multivariate_normal.
>>> import scipy.stats
>>> dist = scipy.stats.multivariate_normal(mean=[2,3], cov=[[1,0], [0,1]])
Then, to evaluate p(x,y), use the pdf method:
>>> dist.pdf([2,3])
0.15915494309189535
>>> dist.pdf([1,1])
0.013064233284684921
This is the probability density (which you called z) at any [x,y].
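To query many (x, y) locations at once, pdf also accepts an array whose last axis holds the coordinates. A short sketch, reusing the mean and covariance from the answer; the grid ranges are arbitrary choices for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

dist = multivariate_normal(mean=[2, 3], cov=[[1, 0], [0, 1]])

# Build a grid of query points; np.dstack gives shape (ny, nx, 2),
# and pdf then returns one density value per grid point, shape (ny, nx).
x = np.linspace(0, 4, 5)
y = np.linspace(1, 5, 5)
X, Y = np.meshgrid(x, y)
Z = dist.pdf(np.dstack((X, Y)))
```

With the default meshgrid indexing, Z[i, j] is the density at (x[j], y[i]).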

Derivatives blow up in python

I am trying to find higher order derivatives of a dataset (x,y). x and y are 1D arrays of length N.
Let's say I generate them as:
xder0=np.linspace(0,10,1000)
yder0=np.sin(xder0)
I define a derivative function that takes two arrays (x,y) and returns (x1, y1), where y1 is the derivative computed at each index as (y[i+1]-y[i])/(x[i+1]-x[i]) and x1 is the mean of x[i+1] and x[i].
Here is the function that does it:
import numpy as np

def deriv(x,y):
    # delx holds the midpoints (x1), ydiff the forward differences (y1)
    delx =np.zeros((len(x)-1), dtype=np.longdouble)
    ydiff=np.zeros((len(x)-1), dtype=np.longdouble)
    for i in range(len(x)-1):
        delx[i] =(x[i+1]+x[i])/2.0
        ydiff[i] =(y[i+1]-y[i])/(x[i+1]-x[i])
    return delx, ydiff
Now to calculate the first derivative, I call this function as:
xder1, yder1 = deriv(xder0, yder0)
Similarly for second derivative, I call this function giving first derivatives as input:
xder2, yder2 = deriv(xder1, yder1)
And it goes on:
xder3, yder3 = deriv(xder2, yder2)
xder4, yder4 = deriv(xder3, yder3)
xder5, yder5 = deriv(xder4, yder4)
xder6, yder6 = deriv(xder5, yder5)
xder7, yder7 = deriv(xder6, yder6)
xder8, yder8 = deriv(xder7, yder7)
xder9, yder9 = deriv(xder8, yder8)
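The explicit loop and the chain of numbered variables can be replaced by a vectorized version of the same forward difference, applied repeatedly. This is a sketch that computes exactly the same quantities as the loop above (midpoints and difference quotients, kept in longdouble as in the question); the list `ders` is a hypothetical replacement for xder1...xder9.

```python
import numpy as np

def deriv(x, y):
    """Vectorized form of the loop: forward differences at interval midpoints."""
    x = np.asarray(x, dtype=np.longdouble)
    y = np.asarray(y, dtype=np.longdouble)
    xmid = 0.5 * (x[1:] + x[:-1])    # (x[i+1] + x[i]) / 2
    slope = np.diff(y) / np.diff(x)  # (y[i+1] - y[i]) / (x[i+1] - x[i])
    return xmid, slope

xder0 = np.linspace(0, 10, 1000)
yder0 = np.sin(xder0)

# ders[k] holds (x, y) for the k-th derivative
ders = [(xder0, yder0)]
for order in range(1, 10):
    ders.append(deriv(*ders[-1]))
```

Each call shortens the arrays by one element, so ders[k] has 1000 - k points.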
Something peculiar happens after I reach order 7. The 7th order becomes very noisy! Earlier derivatives are all either sine or cos functions as expected. However 7th order is a noisy sine. And hence all derivatives after that blow up.
Any idea what is going on?
This is a well-known stability issue with numerical interpolation using equally spaced points. Read the answers at http://math.stackexchange.com.
To overcome this problem you have to use non-equally spaced points, such as the roots of a Legendre polynomial. The instability occurs because of the lack of information at the boundaries, so a higher concentration of points near the boundaries is required, as given by the roots of Legendre polynomials or of others with similar properties, such as Chebyshev polynomials.
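One way to act on the suggestion of boundary-clustered points is sketched below. The technique is my own choice rather than something from the answer: sample sin at Chebyshev points of the first kind on [0, 10], fit a Chebyshev series with numpy's `Chebyshev.fit`, and differentiate the series exactly with `deriv` instead of taking repeated finite differences. The node count (64) and degree (26) are assumptions picked for this smooth test function.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def chebyshev_nodes(lo, hi, n):
    """Chebyshev points of the first kind, mapped from [-1, 1] to [lo, hi]."""
    k = np.arange(n)
    t = np.cos(np.pi * (2 * k + 1) / (2 * n))
    return 0.5 * (lo + hi) + 0.5 * (hi - lo) * t

lo, hi = 0.0, 10.0
x = chebyshev_nodes(lo, hi, 64)   # denser near both ends of the interval
y = np.sin(x)

# Least-squares Chebyshev fit, then exact differentiation of the series
series = Chebyshev.fit(x, y, deg=26, domain=[lo, hi])
d7 = series.deriv(7)

xs = np.linspace(lo, hi, 201)
err = np.max(np.abs(d7(xs) - (-np.cos(xs))))   # 7th derivative of sin is -cos
print(f"max error of 7th derivative: {err:.2e}")
```

This relies on the data being available at points you choose; for data that only exists on an equally spaced grid the fit would behave differently.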

Stochastic integration with python

I want to numerically solve integrals that contain white noise.
Mathematically, white noise can be described by a random variable X(t) with time average Avg[X(t)] = 0 and correlation function Avg[X(t) X(t')] = delta_distribution(t-t').
A simple example would be to calculate the integral of X(t) from t=0 to t=1. On average this is of course zero, but what I need are different realizations of this integral.
The problem is that this does not work with scipy.integrate.quad().
Are there any packages for python that deal with stochastic integrals?
This is a good starting point for numerical SDE methods: http://math.gmu.edu/~tsauer/pre/sde.pdf.
Here is a simple numpy solver for the stochastic differential equation dX_t = a(t,X_t)dt + b(t,X_t)dW_t, which I wrote for a class project last year. It is based on the forward Euler method for ordinary differential equations, and in practice it is fairly widely used for solving SDEs.
import numpy as np

def euler_maruyama(a, b, x0, t):
    N = len(t)
    x = np.zeros((N, len(x0)))
    x[0] = x0
    for i in range(N - 1):
        dt = t[i + 1] - t[i]
        # Brownian increment with mean 0 and variance dt (normal() takes the standard deviation)
        dWt = np.random.normal(0, np.sqrt(dt))
        x[i + 1] = x[i] + a(t[i], x[i]) * dt + b(t[i], x[i]) * dWt
    return x
Essentially, at each time step the deterministic part of the equation is integrated using forward Euler, and the stochastic part is integrated against a normal random variable dWt with mean 0 and variance dt.
The reason we generate dWt like this is based on the definition of Brownian motion. In particular, if $W$ is a Brownian motion, then $(W_t-W_s)$ is normally distributed with mean 0 and variance $t-s$. So dWt is a discretization of the change in $W$ over a small time interval.
This is the docstring from the function above:
Parameters
----------
a : callable a(t,X_t),
t is scalar time and X_t is vector position
b : callable b(t,X_t),
where t is scalar time and X_t is vector position
x0 : ndarray
the initial position
t : ndarray
list of times at which to evaluate trajectory
Returns
-------
x : ndarray
positions of trajectory at each time in t
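For the example in the question (the integral of white noise from t=0 to t=1), the solver reduces to a = 0 and b = 1, so each realization is just a sum of Brownian increments. A vectorized sketch that produces many realizations at once; the function name `white_noise_integral` and the path/step counts are illustrative assumptions:

```python
import numpy as np

def white_noise_integral(n_paths, n_steps, t_end=1.0, rng=None):
    """Realizations of the integral of white noise over [0, t_end].

    Equivalent to euler_maruyama with a = 0 and b = 1: each realization
    is a sum of independent increments dW ~ N(0, dt), i.e. a sample of W(t_end).
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = t_end / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    return dW.sum(axis=1)

samples = white_noise_integral(20000, 200, rng=np.random.default_rng(0))
print(samples.mean(), samples.var())   # should be near 0 and near t_end = 1
```

Each entry of samples is one realization of the integral; their spread reflects the variance t_end of W(t_end).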

How to simulate from an (arbitrary) continuous probability distribution? [duplicate]

This question already has answers here:
Fast arbitrary distribution random sampling (inverse transform sampling)
(5 answers)
Closed 5 years ago.
I have a probability density function like this:
from math import sin

def p1(x):
    return ( sin(x) ** (-0.75) ) / (4.32141 * (x ** (1/5)))
I want to generate random values on [0, 1] with this pdf. How can I do that?
As Francis mentioned, it helps a lot to know the CDF of your distribution.
Anyway, scipy provides a handy way to define custom distributions.
It looks roughly like this:
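import numpy as np
from scipy import stats

class your_distribution(stats.rv_continuous):
    def _pdf(self, x):
        # should integrate to 1 over the support [a, b] set below
        return ( np.sin(x) ** (-0.75) ) / (4.32141 * (x ** (1/5)))

# restrict the support to [0, 1]; by default it is the whole real line
distribution = your_distribution(a=0, b=1)
distribution.rvs()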
Without using scipy and given a numerical sampling of your PDF, you can sample using a cumulative distribution and linear interpolation. The code below assumes equal spacing in x. It could be modified to do an integration for an arbitrarily sampled PDF. Note it renormalises the PDF to 1 within the range of x.
import numpy as np
def randdist(x, pdf, nvals):
"""Produce nvals random samples from pdf(x), assuming constant spacing in x."""
# get cumulative distribution from 0 to 1
cumpdf = np.cumsum(pdf)
cumpdf *= 1/cumpdf[-1]
# input random values
randv = np.random.uniform(size=nvals)
# find where random values would go
idx1 = np.searchsorted(cumpdf, randv)
# get previous value, avoiding division by zero below
idx0 = np.where(idx1==0, 0, idx1-1)
idx1[idx0==0] = 1
# do linear interpolation in x
frac1 = (randv - cumpdf[idx0]) / (cumpdf[idx1] - cumpdf[idx0])
randdist = x[idx0]*(1-frac1) + x[idx1]*frac1
return randdist
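The answer notes that the code could be modified to integrate an arbitrarily sampled PDF. A sketch of one such modification, using my own choices: the CDF is built with the trapezoidal rule (so uneven spacing is fine) and inverted with `np.interp`. The name `sample_from_grid_pdf` is hypothetical, and the grid x is assumed increasing with pdf >= 0 and positive between grid points.

```python
import numpy as np

def sample_from_grid_pdf(x, pdf, n, rng=None):
    """Inverse transform sampling from a PDF tabulated on a (possibly uneven) grid x.

    The CDF is integrated with the trapezoidal rule and renormalized to 1 on [x[0], x[-1]].
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    # cumulative trapezoidal integral, starting at 0 at x[0]
    increments = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x)
    cdf = np.concatenate(([0.0], np.cumsum(increments)))
    cdf /= cdf[-1]
    # invert the piecewise-linear interpolation of the CDF by swapping x and cdf
    return np.interp(rng.random(n), cdf, x)

# Example on an uneven grid: pdf(x) = 2x on [0, 1], whose samples have mean 2/3
x = np.linspace(0.0, 1.0, 2001) ** 2
samples = sample_from_grid_pdf(x, 2.0 * x, 200000, rng=np.random.default_rng(0))
```

For the p1 in the question, the grid should start slightly above 0, since p1 is singular at x = 0.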

Generating numbers with Gaussian function in a range using python

I want to use the Gaussian function in Python to generate some numbers within a specific range, given the mean and variance.
So let's say I have a range between 0 and 10,
and I want my mean to be 3 and variance to be 4:
mean = 3, variance = 4
How can I do that?
Use random.gauss. From the docs:
random.gauss(mu, sigma)
Gaussian distribution. mu is the mean, and sigma is the standard deviation. This is slightly
faster than the normalvariate() function defined below.
It seems to me that you can clamp the results of this, but that wouldn't make it a Gaussian distribution. I don't think you can satisfy all the constraints simultaneously. If you want to clamp it to the range [0, 10], you could get your numbers:
num = min(10, max(0, random.gauss(3, 2)))  # sigma = sqrt(variance) = 2
But then the resulting distribution of numbers won't be truly Gaussian. In this case, it seems you can't have your cake and eat it, too.
There's probably a better way to do this, but this is the function I ended up creating to solve this problem:
import random

def trunc_gauss(mu, sigma, bottom, top):
    a = random.gauss(mu, sigma)
    # redraw until the value falls inside [bottom, top]
    while not (bottom <= a <= top):
        a = random.gauss(mu, sigma)
    return a
If we break it down line by line:
import random
This allows us to use functions from the random library, which includes a gaussian random number generator (random.gauss).
def trunc_gauss(mu, sigma, bottom, top):
The function arguments allow us to specify the mean (mu) and standard deviation (sigma), as well as the top and bottom of our desired range.
a = random.gauss(mu, sigma)
Inside the function, we generate an initial random number according to a gaussian distribution.
while not (bottom <= a <= top):
    a = random.gauss(mu, sigma)
Next, the while loop checks if the number is within our specified range, and generates a new random number as long as the current number is outside our range.
return a
As soon as the number is inside our range, the while loop stops running and the function returns the number.
This should give a better approximation of a Gaussian distribution than clamping, since we don't artificially pile up the outliers at the top and bottom of our range by rounding them up or down.
I'm quite new to Python, so there are most probably simpler ways, but this worked for me.
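scipy also ships a truncated normal, `scipy.stats.truncnorm`, which samples the same distribution as the redraw loop without rejection. Its bounds are given in standardized units, (bound - mu) / sigma. A sketch, where the wrapper name `trunc_gauss_scipy` is hypothetical and sigma = 2 comes from the question's variance of 4:

```python
from scipy.stats import truncnorm

def trunc_gauss_scipy(mu, sigma, bottom, top, size=None, random_state=None):
    """Normal(mu, sigma) truncated to [bottom, top], via scipy.stats.truncnorm.

    truncnorm expects its bounds relative to loc and scale: (bound - mu) / sigma.
    """
    a, b = (bottom - mu) / sigma, (top - mu) / sigma
    return truncnorm.rvs(a, b, loc=mu, scale=sigma, size=size, random_state=random_state)

samples = trunc_gauss_scipy(3, 2, 0, 10, size=1000, random_state=0)  # variance 4 -> sigma 2
```

Note that after truncation the sample mean and variance no longer equal exactly mu and sigma**2.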
I was working on some numerical computation and came across this Python tutorial site: http://www.python-course.eu/weighted_choice_and_sample.php
Here is a solution based on it, in case you don't have time to visit the site.
I don't know how many Gaussian values you need, so I'll use n = 100; you gave mu = 3 and variance = 4, which makes sigma = 2. Here's the code:
from random import gauss
n = 100
values = []
frequencies = {}
while len(values) < n:
value = gauss(3, 2)
if 0 < value < 10:
frequencies[int(value)] = frequencies.get(int(value), 0) + 1
values.append(value)
print(values)
I hope this helps. You can get the plot as well. It's all in the tutorials.
If you have a small range of integers, you can create a list with a gaussian distribution of the numbers within that range and then make a random choice from it.
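A sketch of that idea with the standard library: weight each integer in the range by an (unnormalized) Gaussian density and let `random.choices` pick from them. The helper name `gaussian_int_choice` is hypothetical, and sigma = 2 corresponds to the question's variance of 4.

```python
import math
import random

def gaussian_int_choice(lo, hi, mu, sigma, k=1):
    """Pick k integers from lo..hi, each weighted by a Gaussian density at that integer."""
    values = list(range(lo, hi + 1))
    weights = [math.exp(-0.5 * ((v - mu) / sigma) ** 2) for v in values]
    return random.choices(values, weights=weights, k=k)

random.seed(0)
picks = gaussian_int_choice(0, 10, mu=3, sigma=2, k=1000)   # variance 4 -> sigma 2
```

Because the range cuts off the tails unevenly, the mean of the picks is a little above 3.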
import numpy as np
from random import uniform
from scipy.special import erf,erfinv
import math
def trunc_gauss(mu, sigma, xmin=np.nan, xmax=np.nan):
    """Truncated Gaussian distribution via inverse transform sampling.
    mu is the mean, and sigma is the standard deviation (before truncation).
    """
    s = sigma * math.sqrt(2)  # erf works with the scaled variable (x - mu) / (sigma * sqrt(2))
    zmin = -1 if np.isnan(xmin) else erf((xmin - mu) / s)
    zmax = 1 if np.isnan(xmax) else erf((xmax - mu) / s)
    y = uniform(zmin, zmax)
    z = erfinv(y)
    # This will not come up often, but if y rounds to +/-1 (|y| >= 0.9999999999999999),
    # erfinv returns +/-inf; the largest finite z is about 5.805018683193454
    while math.isinf(z):
        z = erfinv(uniform(zmin, zmax))
    return mu + z * s
You can use minimal code to draw 150 values:
import numpy as np
s = np.random.normal(3, 2, 150)  # mean = 3, standard deviation = sqrt(variance 4) = 2; not restricted to [0, 10]
print(s)
You can check that the samples look normally distributed by plotting a histogram:
import seaborn as sns
import matplotlib.pyplot as plt
AA1_plot = sns.histplot(s, kde=True)
plt.show()
