I want to create a normal distributed array with numpy.random.normal that only consists of positive values.
For example, the following illustrates that it sometimes gives back negative values and sometimes positive ones. How can I modify it so that it only gives back positive values?
>>> import numpy
>>> numpy.random.normal(10,8,3)
array([ -4.98781629, 20.12995344, 4.7284051 ])
>>> numpy.random.normal(10,8,3)
array([ 17.71918829, 15.97617052, 1.2328115 ])
>>>
I guess I could solve it somehow like this:
myList = numpy.random.normal(10, 8, 3)
while any(item < 0 for item in myList):
    # run again until all items are positive values
    myList = numpy.random.normal(10, 8, 3)
The normal distribution, by definition, extends from -inf to +inf so what you are asking for doesn't make sense mathematically.
You can take a normal distribution and take the absolute value to "clip" to positive values, or just discard negative values, but you should understand that it will no longer be a normal distribution.
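For illustration, a rough sketch of those two options (folding with abs, or simply discarding negatives), using the question's mean and standard deviation:
import numpy as np

samples = np.random.normal(10, 8, 1000)
folded = np.abs(samples)          # "clip" to positive values (a folded normal)
kept = samples[samples >= 0]      # or discard the negative draws (a truncated normal)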
I assume that what you mean is that you want to modify the probability density such that it is the same shape as normal in the positive range, and zero in negative. That is a pretty common practical case. In such case, you cannot simply take the absolute value of generated normal random variables. Instead, you have to generate a new independent normally distributed number until you come up with a positive one. One way to do that is recursively, see below.
import numpy as np

def PosNormal(mean, sigma):
    x = np.random.normal(mean, sigma)    # draw one sample
    return x if x >= 0 else PosNormal(mean, sigma)
What about using a lognormal distribution along these lines (data being your existing array of positive values)?
mu = np.mean(np.log(data))
sigma = np.std(np.log(data))
new_list = np.random.lognormal(mu, sigma, length_of_new_list)
Or maybe you could just 'shift' your entire distribution to the 'right' by subtracting the min (or adding the abs val of your min):
y = np.random.normal(0.0, 1.0, 10)
y
array([-0.16934484, 0.06163384, -0.29714508, -0.25917105, -0.0395456 ,
0.17424635, -0.42289079, 0.71837785, 0.93113373, 1.12096384])
y - min(y)
array([0.25354595, 0.48452463, 0.12574571, 0.16371974, 0.38334519,
0.59713714, 0. , 1.14126864, 1.35402452, 1.54385463])
The question is reasonable. For motivation, consider simulations of biological cells. The distribution of the count of a type of molecule in a cell can be approximated by the normal distribution, but must be non-negative to be physically meaningful.
My whole-cell simulator uses this method to sample the initial distribution of a molecule's count:
def non_neg_normal_sample(random_state, mean, std, max_iters=1000):
    """ Obtain a non-negative sample from a normal distribution

    The distribution returned is normal for 0 <= x, and 0 for x < 0

    Args:
        random_state (:obj:`numpy.random.RandomState`): a random state
        mean (:obj:`float`): mean of the normal dist. to sample
        std (:obj:`float`): std of the normal dist. to sample
        max_iters (:obj:`int`, optional): maximum number of draws of the true normal distribution

    Returns:
        :obj:`float`: a normal sample that is not negative

    Raises:
        :obj:`ValueError`: if taking `max_iters` normal samples does not obtain one that is not negative
    """
    iter = 0
    while True:
        sample = random_state.normal(mean, std)
        iter += 1
        if 0 <= sample:
            return sample
        if max_iters <= iter:
            raise ValueError(f"{iter} draws of a normal dist. with mean {mean:.2E} and std {std:.2E} "
                             f"fails to obtain a non-negative sample")
I expand on @gena-kukartsev's answer in two ways: first, I avoid recursion, which could overflow the call stack. (Let's avoid answers that can overflow the stack on Stack Overflow!) Second, I catch possibly bad input by limiting the number of samples drawn from the distribution.
You can offset your entire array by its lowest (left-most) value. What you get may not be a true normal distribution, but within the scope of your work, dealing with a finite array, you can ensure that the values are positive and fit under a bell curve.
>>> mu,sigma = (0,1.0)
>>> s = np.random.normal(mu, 1.0, 100)
>>> s
array([-0.58017653, 0.50991809, -1.13431539, -2.34436721, -1.20175652,
0.56225648, 0.66032708, -0.98493441, 2.72538462, -1.28928887])
>>> np.min(s)
-2.3443672118476226
>>> abs(np.min(s))
2.3443672118476226
>>> np.add(s,abs(np.min(s)))
array([ 1.76419069, 2.85428531, 1.21005182, 0. , 1.14261069,
2.90662369, 3.00469429, 1.3594328 , 5.06975183, 1.05507835])
You could use high loc with low scale:
np.random.normal(100, 10, 10) /100
[0.96568643 0.92123722 0.83242272 0.82323367 1.07532713 0.90125736
0.91226052 0.90631754 1.08473303 0.94115643]
My problem:
I have an array of ufloats (i.e. a uarray) from Python's uncertainties package.
All values of the array have their own errors, and I need a function that gives me the average of the array, accounting for both the error I get when calculating the mean of the nominal values and the influence of the values' errors.
I have a uarray:
2 +/- 1
3 +/- 2
4 +/- 3
and need a function that gives me the average value of the array.
Thanks
Assuming Gaussian statistics, the uncertainties stem from Gaussian parent distributions. In such a case, it is standard to weight the measurements (nominal values) by the inverse variance. This application to the general weighted average gives,
$$ \frac{\sum_i w_i x_i}{\sum_i w_i} = \frac{\sum_i x_i/\sigma_i^2}{\sum_i 1/\sigma_i^2} $$.
One need only perform good ol' error propagation on this to get the uncertainty of the weighted average as,
$$ \sqrt{\sum_i \frac{1}{\sum_j 1/\sigma_j^2}} = \sqrt{\frac{N}{\sum_i 1/\sigma_i^2}} $$
I don't have an n-length formula on hand to do this syntactically, but here's how one could get the weighted average and its uncertainty in a simple two-value case:
import numpy as np
import uncertainties as un

a = un.ufloat(5, 2)
b = un.ufloat(8, 4)
wavg = un.ufloat((a.n/a.s**2 + b.n/b.s**2) / (1/a.s**2 + 1/b.s**2),
                 np.sqrt(2 / (1/a.s**2 + 1/b.s**2)))
print(wavg)
# 5.6+/-2.5298221281347035
As one would expect, the result tends more toward the value with the smaller uncertainty. This is good, since a smaller uncertainty in a measurement implies that its nominal value is closer to the true value of the parent distribution than values with larger uncertainties.
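For what it's worth, here is a possible n-length generalization of the two-value code above; the helper name weighted_avg is mine, and it keeps the same sqrt(N / sum(1/sigma_i^2)) convention for the uncertainty:
import numpy as np
import uncertainties as un

def weighted_avg(values):
    # values: an iterable of ufloats
    noms = np.array([v.n for v in values])
    w = 1.0 / np.array([v.s for v in values])**2
    mean = np.sum(w * noms) / np.sum(w)
    err = np.sqrt(len(noms) / np.sum(w))   # uncertainty convention used above
    return un.ufloat(mean, err)

print(weighted_avg([un.ufloat(5, 2), un.ufloat(8, 4)]))   # reproduces 5.6+/-2.5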
Unless I'm missing something, you could calculate the sum divided by the length of the array:
from uncertainties import unumpy, ufloat
import numpy as np
arr = np.array([ufloat(2, 1), ufloat(3, 2), ufloat(4,3)])
print(sum(arr)/len(arr))
# 3.0+/-1.2
You can also define it like this:
arr1 = unumpy.uarray([2, 3, 4], [1, 2, 3])
print(sum(arr1)/len(arr1))
# 3.0+/-1.2
uncertainties takes care of the rest.
I used Captain Morgan's answer to serve up some sweet Python code for a project and discovered that it needed a little extra ingredient:
import numpy as np
from uncertainties import ufloat
from uncertainties import unumpy as unp

epsilon = unp.nominal_values(values).mean() / 1e12
wavg = ufloat(sum([v.n/(v.s**2 + epsilon) for v in values]) / sum([1/(v.s**2 + epsilon) for v in values]),
              np.sqrt(len(values) / sum([1/(v.s**2 + epsilon) for v in values])))
if wavg.s <= np.sqrt(epsilon):
    wavg = ufloat(wavg.n, 0.0)
Without that little something (epsilon) we'd get div/0 errors from observations recorded with zero uncertainty.
If you already have a .csv file which stores variables in 'mean+/-std' format, you could try the code below; it works for me.
import pandas as pd
from uncertainties import ufloat_fromstr

df = pd.read_csv(r'Z:\compare\SL2P_PAR.csv')
for i in range(len(df.uncertainty)):
    df.loc[i, 'mean'] = ufloat_fromstr(df['uncertainty'][i]).n
    df.loc[i, 'std'] = ufloat_fromstr(df['uncertainty'][i]).s
I have a question:
Given mean and variance I want to calculate the probability of a sample using a normal distribution as probability basis.
The numbers are:
mean = -0.546369
var = 0.006443
curr_sample = -0.466102
prob = 1/(np.sqrt(2*np.pi*var))*np.exp( -( ((curr_sample - mean)**2)/(2*var) ) )
I get a probability which is larger than 1! I get prob = 3.014558...
What is causing this? Is the fact that the variance is so small messing something up? It's a perfectly legal input to the formula and should give something small, not greater than 1! Any suggestions?
Ok, what you compute is not a probability, but a probability density (which may be larger than one). In order to get 1 you have to integrate the density over its whole range, like so:
import numpy as np
mean = -0.546369
var = 0.006443
curr_sample = np.linspace(-10,10,10000)
prob = np.sum( 1/(np.sqrt(2*np.pi*var))*np.exp( -( ((curr_sample - mean)**2)/(2*var) ) ) * (curr_sample[1]-curr_sample[0]) )
print(prob)
which results in
0.99999999999961509
The formula you give is a probability density, not a probability. The density formula is such that when you integrate it between two values of x, you get the probability of being in that interval. However, this means that the probability of getting any particular sample is, in fact, 0 (it's the density times the infinitesimally small dx).
So what are you actually trying to calculate? You probably want something like the probability of getting your value or larger, the so-called tail probability, which is often used in statistics (it so happens that this is given by the error function when you're talking about a normal distribution, although you need to be careful of exactly how it's defined).
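For illustration, a minimal sketch of that tail probability using SciPy's survival function, assuming the numbers from the question:
import numpy as np
from scipy.stats import norm

mean, var, curr_sample = -0.546369, 0.006443, -0.466102
p_tail = norm.sf(curr_sample, loc=mean, scale=np.sqrt(var))   # P(X >= curr_sample)
print(p_tail)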
When considering the bell-shaped probability density function (PDF) of given mean and variance, the peak value of the curve (the height at the mode) is 1/sqrt(2*pi*var). For the standard normal distribution (mean 0 and var 1) this is 1/sqrt(2*pi), roughly 0.4. Hence, when the variance is small enough, values of a general normal pdf larger than 1 are entirely possible.
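A quick check with the question's numbers shows this is exactly what happens here:
import numpy as np

var = 0.006443
peak = 1 / np.sqrt(2 * np.pi * var)   # height of the pdf at the mean
print(peak)                           # about 4.97, so densities above 1 are expected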
What function can I use in Python if I want to sample a truncated integer power law?
That is, given two parameters a and m, generate a random integer x in the range [1,m) that follows a distribution proportional to 1/x^a.
I've been searching around numpy.random, but I haven't found this distribution.
AFAIK, neither NumPy nor SciPy defines this distribution for you. However, using SciPy it is easy to define your own discrete distribution function with scipy.stats.rv_discrete:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

def truncated_power_law(a, m):
    x = np.arange(1, m+1, dtype='float')
    pmf = 1/x**a
    pmf /= pmf.sum()
    return stats.rv_discrete(values=(range(1, m+1), pmf))

a, m = 2, 10
d = truncated_power_law(a=a, m=m)

N = 10**4
sample = d.rvs(size=N)

plt.hist(sample, bins=np.arange(m)+0.5)
plt.show()
I don't use Python, so rather than risk syntax errors I'll try to describe the solution algorithmically. This is a brute-force discrete inversion. It should translate quite easily into Python. I'm assuming 0-based indexing for the array.
Setup:
Generate an array cdf of size m with cdf[0] = 1 as the first entry, cdf[i] = cdf[i-1] + 1/(i+1)**a for the remaining entries.
Scale all entries by dividing cdf[m-1] into each -- now they actually are CDF values.
Usage:
Generate your random values by generating a Uniform(0,1) and
searching through cdf[] until you find an entry greater than your
uniform. Return the index + 1 as your x-value.
Repeat for as many x-values as you want.
For instance, with a,m = 2,10, I calculate the probabilities directly as:
[0.6452579827864142, 0.16131449569660355, 0.07169533142071269, 0.04032862392415089, 0.02581031931145657, 0.017923832855178172, 0.013168530260947229, 0.010082155981037722, 0.007966147935634743, 0.006452579827864143]
and the CDF is:
[0.6452579827864142, 0.8065724784830177, 0.8782678099037304, 0.9185964338278814, 0.944406753139338, 0.9623305859945162, 0.9754991162554634, 0.985581272236501, 0.9935474201721358, 1.0]
When generating, if I got a Uniform outcome of 0.90 I would return x=4 because 0.918... is the first CDF entry larger than my uniform.
If you're worried about speed you could build an alias table, but with a geometric decay the probability of early termination of a linear search through the array is quite high. With the given example, for instance, you'll terminate on the first peek almost 2/3 of the time.
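Here is a rough Python translation of the algorithm above (my sketch, not the original author's code):
import random

def sample_truncated_power_law(a, m):
    # Setup: build the CDF over x = 1..m, then normalize it
    cdf = [1.0]
    for i in range(1, m):
        cdf.append(cdf[-1] + 1.0 / (i + 1) ** a)
    cdf = [c / cdf[-1] for c in cdf]
    # Usage: invert a Uniform(0,1) by linear search through the CDF
    u = random.random()
    for i, c in enumerate(cdf):
        if u < c:
            return i + 1
    return m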
Use numpy.random.zipf and just reject any samples greater than or equal to m
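A hedged sketch of that idea (note that numpy.random.zipf requires a > 1, and the rejection keeps the pmf proportional to 1/x^a on [1, m)):
import numpy as np

def truncated_zipf(a, m, size):
    out = np.empty(0, dtype=int)
    while out.size < size:
        draws = np.random.zipf(a, size)
        out = np.concatenate([out, draws[draws < m]])   # keep only samples below m
    return out[:size]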
I want to use the Gaussian function in Python to generate some numbers in a specific range, given the mean and variance.
So let's say I have a range between 0 and 10, and I want my mean to be 3 and my variance to be 4:
mean = 3, variance = 4
How can I do that?
Use random.gauss. From the docs:
random.gauss(mu, sigma)
Gaussian distribution. mu is the mean, and sigma is the standard deviation. This is slightly
faster than the normalvariate() function defined below.
It seems to me that you can clamp the results of this, but that wouldn't make it a Gaussian distribution. I don't think you can satisfy all the constraints simultaneously. If you want to clamp it to the range [0, 10], you could get your numbers:
num = min(10, max(0, random.gauss(3, 4)))
But then the resulting distribution of numbers won't be truly Gaussian. In this case, it seems you can't have your cake and eat it, too.
There's probably a better way to do this, but this is the function I ended up creating to solve this problem:
import random
def trunc_gauss(mu, sigma, bottom, top):
    a = random.gauss(mu, sigma)
    while not (bottom <= a <= top):
        a = random.gauss(mu, sigma)
    return a
If we break it down line by line:
import random
This allows us to use functions from the random library, which includes a gaussian random number generator (random.gauss).
def trunc_gauss(mu, sigma, bottom, top):
The function arguments allow us to specify the mean (mu) and standard deviation (sigma), as well as the top and bottom of our desired range.
a = random.gauss(mu, sigma)
Inside the function, we generate an initial random number according to a gaussian distribution.
while not (bottom <= a <= top):
    a = random.gauss(mu, sigma)
Next, the while loop checks if the number is within our specified range, and generates a new random number as long as the current number is outside our range.
return a
As soon as the number is inside our range, the while loop stops running and the function returns the number.
This should give a better approximation of a gaussian distribution, since we don't artificially inflate the top and bottom boundaries of our range by rounding up or down the outliers.
I'm quite new to Python, so there are most probably simpler ways, but this worked for me.
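For example, with the question's numbers (mu = 3, sigma = 2 for a variance of 4, and a range of 0 to 10), one might call it like this:
samples = [trunc_gauss(3, 2, 0, 10) for _ in range(5)]
print(samples)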
I was working on some numerical analytical computation and I ran into this python tutorial site - http://www.python-course.eu/weighted_choice_and_sample.php
Now, this is what I proffer as a solution should anyone be too busy as to not hit the site.
I don't know how many Gaussian values you need, so I'll go with 100 as n. You gave mu as 3 and variance as 4, which makes sigma = 2. Here's the code:
from random import gauss

n = 100
values = []
frequencies = {}

while len(values) < n:
    value = gauss(3, 2)
    if 0 < value < 10:
        frequencies[int(value)] = frequencies.get(int(value), 0) + 1
        values.append(value)

print(values)
I hope this helps. You can get the plot as well. It's all in the tutorials.
If you have a small range of integers, you can create a list with a gaussian distribution of the numbers within that range and then make a random choice from it.
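A minimal sketch of that idea (the helper name is mine), weighting each integer in the range by the Gaussian pdf and drawing with random.choices:
import math
import random

def gauss_weighted_int(low, high, mu, sigma, k=1):
    xs = list(range(low, high + 1))
    # weight each integer by the (unnormalized) normal pdf
    weights = [math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) for x in xs]
    return random.choices(xs, weights=weights, k=k)

print(gauss_weighted_int(0, 10, 3, 2, k=5))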
import numpy as np
from random import uniform
from scipy.special import erf,erfinv
import math
def trunc_gauss(mu, sigma, xmin=np.nan, xmax=np.nan):
    """Truncated Gaussian distribution.

    mu is the mean, and sigma is the standard deviation.
    """
    if np.isnan(xmin):
        zmin = -1                                       # erf(-inf) = -1, i.e. no lower bound
    else:
        zmin = erf((xmin - mu) / (sigma * np.sqrt(2)))
    if np.isnan(xmax):
        zmax = 1                                        # erf(+inf) = 1, i.e. no upper bound
    else:
        zmax = erf((xmax - mu) / (sigma * np.sqrt(2)))
    y = uniform(zmin, zmax)
    z = erfinv(y)
    # This will not come up often, but if y >= 0.9999999999999999
    # erfinv saturates (max z = 5.805018683193454), so redraw until finite
    while math.isinf(z):
        z = erfinv(uniform(zmin, zmax))
    return mu + z * sigma * np.sqrt(2)
You can use this minimalistic code to draw 150 samples:
import numpy as np
s = np.random.normal(3, 2, 150)   # mean = 3, std = 2, i.e. variance = 4
print(s)
The normal distribution is, like others, a random, stochastic distribution, so we can check the result visually:
import seaborn as sns
import matplotlib.pyplot as plt
AA1_plot = sns.distplot(s, kde=True, rug=False)
plt.show()
Where, in the range 10-20, 15 would be twice as likely to be returned as either extreme.
You can use random.triangular() with Python >= 2.6:
n = random.triangular(10, 20)
n will be a floating point value, so you need to convert it to int.
As pointed out by Blender, you really need to be more specific. But in the simplest case you can generate a Triangular Distribution from a uniform variate.
Try and see if this works (sorry if it's not very readable):
import random

def randIntWeight(low, high):
    # pick a distance from the midpoint, then place it on a random side
    distanceFromMedian = random.uniform(0, (high - low) / 2.0)
    return low + (high - low) / 2.0 + distanceFromMedian * random.choice((-1, 1))
I'm still brushing up on my Probability Theory, so please correct me if this isn't right.
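For comparison, a common way to get a symmetric triangular variate from uniform draws is to average two of them; a quick sketch (names are mine):
import random

def tri_between(low, high):
    # the mean of two independent uniforms is triangular, peaked at the midpoint
    return (random.uniform(low, high) + random.uniform(low, high)) / 2.0

print(int(tri_between(10, 20)))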
Another built-in function is numpy.random.normal:
numpy.random.normal(loc=0.0, scale=1.0, size=None)
Draw random samples from a normal (Gaussian) distribution.
You can specify loc=15.0 to set the mean, and scale (the standard deviation) to 2 through 5 to make the range of likely values narrower or broader. It doesn't let you define a specific range, but you can always take the output and re-roll it if it falls outside of some range. This gives you a more nuanced way to get values around a certain value.
Set size=None to return a single value.
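For instance, a simple re-roll helper along those lines might look like this (the name bounded_normal is mine):
import numpy as np

def bounded_normal(loc, scale, low, high):
    # redraw until the sample falls inside [low, high]
    x = np.random.normal(loc, scale)
    while not (low <= x <= high):
        x = np.random.normal(loc, scale)
    return x

print(bounded_normal(15, 3, 10, 20))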
From https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.normal.html
mu, sigma = 0, 0.1 # mean and standard deviation
s = np.random.normal(mu, sigma, 1000)
... (see link below for full code)
plt.show()