I would like to associate a probability value to a number.
Let's say I consider a normal probability distribution with mean = 7 and std = 3.
I can generate a random number from such a distribution like this:
np.random.normal(7, 3, 1)
I would like a method that, given a number, returns the probability associated with it.
For instance, what probability does such a distribution associate with the value 0.6?
Let's assume I generate the histogram of n random values.
import numpy as np
import matplotlib.pyplot as plt
x = np.random.normal(7, 3, 100000)
plt.hist(x, 10, density=True)
Here I can see that a value of 5 has a probability of ~0.11, while a value of 20 has probability 0.
For any normalized continuous distribution represented on a histogram as you have above, the only way to find the probability for a given histogram bin is to take the integral of that distribution over the range of that bin. So this depends on:
The distribution
The range of the bin you are considering
You can use the scipy package, for example, to compute this integral numerically.
https://docs.scipy.org/doc/scipy/reference/tutorial/integrate.html
If you need something simpler, you can approximate this probability by taking the value of the PDF at the center of the bin and multiplying it by the width of the bin.
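A minimal sketch with scipy.stats, assuming the mean 7 / std 3 distribution from the question and a hypothetical bin of width 0.1 centred on 0.6. It compares the exact bin probability (the CDF difference, which equals the integral of the PDF, here also computed with scipy.integrate.quad) with the midpoint approximation (PDF at the centre times the width):

```python
from scipy import integrate, stats

# Distribution from the question: mean 7, standard deviation 3.
dist = stats.norm(loc=7, scale=3)

# Hypothetical bin around 0.6; the width 0.1 is an assumption for illustration.
center, width = 0.6, 0.1
lo, hi = center - width / 2, center + width / 2

# Exact bin probability: difference of CDF values at the bin edges.
p_exact = dist.cdf(hi) - dist.cdf(lo)

# Same quantity by numerically integrating the PDF over the bin.
p_quad, _ = integrate.quad(dist.pdf, lo, hi)

# Midpoint approximation: PDF at the bin centre times the bin width.
p_approx = dist.pdf(center) * width

print(p_exact, p_quad, p_approx)
```

For narrow bins the midpoint approximation is very close to the exact value; for wide bins prefer the CDF difference.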
Related
I have a vector of floats V with values from 0 to 1. I want to create a histogram with some bin width, say A = 0.01, and check how close the resulting histogram is to a uniform distribution, getting a single value between zero and one, where 0 means it correlates perfectly and 1 means it does not correlate at all. For me, correlation here mainly means the shape of the histogram.
How would one do such a thing in Python with NumPy?
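You can create the histogram with np.histogram. Then you can build the uniform reference histogram by giving every bin the mean count of that histogram (np.mean). Finally, you can compare the two with a statistical measure. Note, however, that the Pearson coefficient (scipy.stats.pearsonr) is undefined when one of its inputs is constant, which is exactly the case for a uniform histogram, so a goodness-of-fit test such as scipy.stats.chisquare, which assumes equally likely bins by default, is better suited here.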
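One way to get a single number between 0 and 1 is sketched below. It uses a swapped-in measure, the total variation distance between the normalised histogram and a uniform one (0 for identical shapes, at most 1 - 1/number_of_bins for completely different ones), plus scipy.stats.chisquare as a formal goodness-of-fit test. The data V is hypothetical uniform noise; the bin width A = 0.01 comes from the question.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical data: V is assumed to hold floats in [0, 1].
V = rng.uniform(0.0, 1.0, size=10_000)

# 100 bins of width A = 0.01 covering [0, 1].
A = 0.01
edges = np.linspace(0.0, 1.0, int(round(1 / A)) + 1)
counts, _ = np.histogram(V, bins=edges)

# Convert counts to bin probabilities and build the uniform reference.
p = counts / counts.sum()
u = np.full_like(p, 1.0 / p.size)

# Total variation distance: 0 means identical shapes, values near 1 mean very different.
tv = 0.5 * np.abs(p - u).sum()

# Chi-square goodness-of-fit test; without f_exp, scipy assumes equally likely bins.
chi2, p_value = stats.chisquare(counts)

print(tv, p_value)
```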
Suppose I draw randomly from a normal distribution with mean zero and standard deviation represented by a vector of, say, dimension 3 with
import numpy as np
scale_rng=np.array([1,2,3])
eps=np.random.normal(0,scale_rng)
I need to compute a weighted average based on some simulations in which I draw the eps above. The weights of this average are "the probability of eps" (hence I will have a vector with 3 weights). By weighted average I simply mean an arithmetic sum where each component is multiplied by a weight, i.e. a number between 0 and 1, and where all the weights sum to one.
This weighted average should be calculated as follows: I have a time series of observations for one variable, x. I calculate an expanding rolling standard deviation of x (call these values scale). Then, for each time observation in x, I draw a random variable eps from a normal distribution as explained above and add it, obtaining y = x + eps. Finally, I need to compute the weighted average of y, where each value of y is weighted by "the probability of drawing that value of eps from a normal distribution with mean zero and standard deviation equal to scale".
Now, I know that I cannot simply use the points on the pdf corresponding to the randomly drawn values, because a normal random variable is continuous and, as such, the probability of drawing any exact value is zero. Hence, the only solution I found is to discretize the normal distribution into a certain number of bins and then find the probability that a value drawn with the code above falls in its bin. How could I do this in Python?
EDIT: the solution I found is to use
norm.cdf(eps_it+0.5, loc=0, scale=scale_rng)-norm.cdf(eps_it-0.5, loc=0, scale=scale_rng)
which is not really based on the discretization but at least it seems feasible to me "probability-wise".
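To see how the ±0.5 window in the edit relates to the density, here is a sketch (the seed and the helper name window_weights are hypothetical) that normalises the window probabilities into weights and compares them with weights built directly from the PDF. As the window shrinks, the two agree, because each window probability is approximately pdf(eps) times the window width and the common width cancels on normalisation; with a fixed window of width 1 they can differ noticeably.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
scale_rng = np.array([1.0, 2.0, 3.0])
eps = rng.normal(0.0, scale_rng)  # one draw per component


def window_weights(eps, scale, half_width):
    # Probability that N(0, scale) lands within +/- half_width of each draw.
    p = norm.cdf(eps + half_width, loc=0, scale=scale) - norm.cdf(eps - half_width, loc=0, scale=scale)
    return p / p.sum()  # normalise so the weights sum to 1


# Density-based weights: the limit of the window weights as the window shrinks.
dens = norm.pdf(eps, loc=0, scale=scale_rng)
pdf_weights = dens / dens.sum()

w_wide = window_weights(eps, scale_rng, 0.5)     # the +/-0.5 window from the edit
w_narrow = window_weights(eps, scale_rng, 1e-3)  # a much narrower window

print(w_wide)
print(w_narrow)
print(pdf_weights)
```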
Here's an example that keeps everything continuous.
import numpy as np
from scipy import stats
# some function we want a Monte Carlo estimate of
def fn(eps):
    return np.sum(np.abs(eps), axis=1)
# define distribution of eps
sd = np.array([1, 2, 3])
d_eps = stats.norm(0, sd)
# draw uniform samples so we don't double-apply the normal density
eps = np.random.uniform(-6*sd, 6*sd, size=(10000, 3))
# calculate weights (in practice, working with log-densities is better for numerical stability)
w = np.prod(d_eps.pdf(eps), axis=1)
# normalise so the weights sum to 1
w /= np.sum(w)
# get estimate
print(np.sum(fn(eps) * w))
which gives me 4.71, 4.74, 4.70, 4.78 if I run it a few times. We can verify this is correct by simply taking a mean when eps is drawn from a normal distribution directly:
np.mean(fn(d_eps.rvs(size=(10000, 3))))
which gives essentially the same values but, as expected, with lower variance, e.g. 4.79, 4.76, 4.77, 4.82, 4.80.
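The value both estimates should approach can also be worked out by hand: for X ~ N(0, σ), E|X| = σ·sqrt(2/π), so with sd = [1, 2, 3] the target is 6·sqrt(2/π). A quick check, reusing only the sd vector from the example:

```python
import numpy as np

sd = np.array([1, 2, 3])
# E|X| = sd * sqrt(2/pi) for X ~ N(0, sd); fn sums |eps| over the three components.
exact = np.sum(sd) * np.sqrt(2 / np.pi)
print(exact)  # ≈ 4.787
```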
Let Z = X/Y, where X and Y are two normal variables. I know the mean and standard deviation of X and Y. How can I find the probability P(Z > a), where
What you are asking is not so simple; the Wikipedia page https://en.wikipedia.org/wiki/Ratio_distribution has a lot of information. In short, the ratio of two independent zero-mean normal variables follows a Cauchy distribution (with scale σX/σY), from which you can compute your desired probability.
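A sketch of both routes, assuming X and Y are independent (the Cauchy result relies on this): the closed form P(Z > a) = 1/2 - arctan(a·σY/σX)/π for zero means, via scipy.stats.cauchy, and a plain Monte Carlo estimate that also works for non-zero means. The function names and the values a = 1.5, σX = 2, σY = 1 are illustrative choices.

```python
import numpy as np
from scipy import stats


def tail_prob_zero_mean(a, sx, sy):
    # For independent X ~ N(0, sx), Y ~ N(0, sy), Z = X / Y is Cauchy(0, sx / sy).
    return stats.cauchy(loc=0, scale=sx / sy).sf(a)


def tail_prob_monte_carlo(a, mx, sx, my, sy, n=1_000_000, seed=0):
    # Works for any means, still assuming X and Y are independent.
    rng = np.random.default_rng(seed)
    z = rng.normal(mx, sx, n) / rng.normal(my, sy, n)
    return np.mean(z > a)


a, sx, sy = 1.5, 2.0, 1.0
p_scipy = tail_prob_zero_mean(a, sx, sy)
p_cf = 0.5 - np.arctan(a * sy / sx) / np.pi  # closed form for zero means
p_mc = tail_prob_monte_carlo(a, 0.0, sx, 0.0, sy)
print(p_scipy, p_cf, p_mc)
```

For non-zero means there is no equally simple closed form, so the Monte Carlo route (or the more involved formulas on the Wikipedia page) is the practical option.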
I know that the normal distribution is always greater than 0 for any chosen value of the mean and the standard deviation.
>>> np.random.normal(scale=0.3, size=x.shape)
[ 0.15038925 -0.34161875 -0.07159422 0.41803414 0.39900799 0.10714512
0.5770597 -0.16351734 0.00962916 0.03901677]
Here the mean is 0.0 and the standard deviation is 0.3, but some values in the ndarray are negative. Am I wrong in my interpretation that the normal distribution curve is always positive?
Edit:
But the normpdf function in MATLAB always gives an array of positive values, which I guess is the probability density function (the y axis), whereas numpy.random.normal gives both positive and negative values (the x axis). Now this is confusing.
Values generated from a normal distribution can take negative values.
For example, for a normal distribution with mean 0, we need both positive and negative values for the average to be zero. Also, a sample from a mean-0 normal distribution is equally likely to be positive or negative.
It can actually take any real value, and every interval of real numbers has positive probability. You might be confusing this with the probability density function, which is always positive.
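A small illustration (the seed and sample size are arbitrary choices): the samples returned by the generator are x values and include negatives, while the density evaluated at those samples with scipy.stats.norm.pdf, analogous to MATLAB's normpdf, is always positive.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Samples (x values) from N(0, 0.3): roughly half of them are negative.
x = rng.normal(loc=0.0, scale=0.3, size=1000)
# Density (y values) at those samples: always positive.
y = norm.pdf(x, loc=0.0, scale=0.3)

frac_negative = np.mean(x < 0)
print(x.min(), frac_negative, y.min())
```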
According to the documentation of np.random.normal at "https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html", the output is the sample (x), not the density (y). Therefore, the output can be negative.
In other words, np.random.normal is used to draw samples that follow the normal distribution, not to randomly generate probability values that follow it.
If you are modelling a probability, don't expect its mean to be 0: that makes no sense, since it would mean you expect your random event never to occur.
Try using something like np.random.normal(0.5, 0.3, 1000) to express your normal probability distribution, keeping in mind that roughly 10% of such samples still fall outside [0, 1].
Also, take a closer look at the math of the normal distribution so you can construct your probability density functions easily.
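As a quick check of that suggestion, the sketch below counts how many draws from a normal distribution with mean 0.5 and standard deviation 0.3 fall outside [0, 1] and compares this with the exact share from the CDF (the seed is an arbitrary choice):

```python
import numpy as np
from scipy.stats import norm

samples = np.random.default_rng(1).normal(0.5, 0.3, 1000)
# Empirical share of draws that are not valid probabilities.
outside = np.mean((samples < 0) | (samples > 1))
# Exact share under N(0.5, 0.3): P(X < 0) + P(X > 1).
exact = norm.cdf(0, 0.5, 0.3) + norm.sf(1, 0.5, 0.3)
print(outside, exact)
```

If values must stay in [0, 1], clipping or a distribution supported on that interval is needed.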
I have a question:
Given a mean and a variance, I want to calculate the probability of a sample using a normal distribution as the probability basis.
The numbers are:
import numpy as np
mean = -0.546369
var = 0.006443
curr_sample = -0.466102
prob = 1/(np.sqrt(2*np.pi*var))*np.exp( -( ((curr_sample - mean)**2)/(2*var) ) )
I get a probability which is larger than 1! I get prob = 3.014558...
What is causing this? Does the small variance mess something up? It's a perfectly legal input to the formula and should give something small, not greater than 1! Any suggestions?
OK, what you compute is not a probability but a probability density (which may be larger than one). To get 1, you have to integrate the density over the whole distribution, like so:
import numpy as np
mean = -0.546369
var = 0.006443
curr_sample = np.linspace(-10,10,10000)
prob = np.sum( 1/(np.sqrt(2*np.pi*var))*np.exp( -( ((curr_sample - mean)**2)/(2*var) ) ) * (curr_sample[1]-curr_sample[0]) )
print(prob)
which results in
0.99999999999961509
The formula you give is a probability density, not a probability. The density formula is such that when you integrate it between two values of x, you get the probability of being in that interval. However, this means that the probability of getting any particular sample is, in fact, 0 (it's the density times the infinitesimally small dx).
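Using the numbers from the question, the sketch below (the interval width dx = 0.001 is a hypothetical choice) shows the difference: the density exceeds 1, but the probability of landing in a small interval around the sample, computed as a CDF difference, is tiny and approximately equals density times dx. Note that scipy's scale parameter is the standard deviation, not the variance.

```python
import numpy as np
from scipy.stats import norm

mean, var = -0.546369, 0.006443
curr_sample = -0.466102
dist = norm(loc=mean, scale=np.sqrt(var))  # scale is the standard deviation

density = dist.pdf(curr_sample)  # about 3.0146: a density, not a probability
dx = 0.001  # hypothetical interval width
p_interval = dist.cdf(curr_sample + dx / 2) - dist.cdf(curr_sample - dx / 2)
print(density, p_interval, density * dx)
```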
So what are you actually trying to calculate? You probably want something like the probability of getting your value or larger, the so-called tail probability, which is often used in statistics (it so happens that this is given by the error function when you're talking about a normal distribution, although you need to be careful of exactly how it's defined).
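For a normal distribution, the upper-tail probability can be obtained from scipy.stats.norm.sf, or from the complementary error function via P(X ≥ x) = 0.5·erfc((x - mean)/(sd·sqrt(2))). A sketch with the question's numbers:

```python
import math
from scipy.stats import norm

mean, var = -0.546369, 0.006443
curr_sample = -0.466102
sd = math.sqrt(var)

# Upper-tail probability P(X >= curr_sample) from scipy's survival function.
p_sf = norm.sf(curr_sample, loc=mean, scale=sd)
# Same quantity through the complementary error function.
p_erfc = 0.5 * math.erfc((curr_sample - mean) / (sd * math.sqrt(2)))
print(p_sf, p_erfc)  # both about 0.159
```

Here the sample sits almost exactly one standard deviation above the mean, which is why the tail probability is close to the familiar 0.159.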
When considering the bell-shaped probability density function (PDF) of a given mean and variance, the peak value of the curve (the height at the mode) is 1/sqrt(2*pi*var). For the standard normal distribution (mean 0 and var 1) it is 1/sqrt(2*pi) ≈ 0.399, and it exceeds 1 whenever var < 1/(2*pi) ≈ 0.159. Hence, when evaluating the PDF of a general normal distribution at a specific value, results larger than 1 are possible.
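A short check of the peak-height formula (the helper name peak_height is hypothetical), compared against scipy.stats.norm.pdf evaluated at the mean:

```python
import numpy as np
from scipy.stats import norm


def peak_height(var):
    # Height of the normal PDF at its mode (the mean): 1 / sqrt(2*pi*var).
    return 1.0 / np.sqrt(2 * np.pi * var)


print(peak_height(1.0))       # standard normal: about 0.399
print(peak_height(0.006443))  # the variance from the question: about 4.97
print(1 / (2 * np.pi))        # the peak exceeds 1 whenever var is below this (about 0.159)
```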