Generate random integer weighted toward median - python

For example, in the range 10-20, 15 should be twice as likely to be returned as either extreme.

You can use random.triangular() with Python >= 2.6:
n = random.triangular(10, 20)
n will be a floating point value, so you need to convert it to int.
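A minimal sketch of that conversion, assuming that rounding to the nearest integer is acceptable. The helper name triangular_int is hypothetical. Note that random.triangular has zero density at the endpoints, so after rounding, 10 and 20 come out much less often than "half as often as 15"; the tally at the end makes this visible.

```python
import random
from collections import Counter

def triangular_int(low, high):
    """Return an integer in [low, high], weighted toward the midpoint."""
    # With no explicit mode, random.triangular uses the midpoint as the mode.
    return int(round(random.triangular(low, high)))

# Tally many draws to see the shape of the resulting distribution.
counts = Counter(triangular_int(10, 20) for _ in range(100_000))
for value in sorted(counts):
    print(value, counts[value])
```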

As pointed out by Blender, you really need to be more specific. But in the simplest case you can generate a Triangular Distribution from a uniform variate.

Try and see if this works (sorry if it's not very readable):
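import random
def randIntWeight(low, high):
    # Taking the smaller of two uniform draws makes small distances from the
    # midpoint more likely, which gives a symmetric triangular distribution.
    half = (high - low) / 2.0
    distanceFromMedian = min(random.uniform(0, half), random.uniform(0, half))
    # Pick the side of the midpoint at random, then round to an integer.
    return int(round((low + high) / 2.0 + distanceFromMedian * random.choice((-1, 1))))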
I'm still brushing up on my Probability Theory, so please correct me if this isn't right.

Another option is numpy.random.normal:
numpy.random.normal(loc=0.0, scale=1.0, size=None)
Draw random samples from a normal (Gaussian) distribution.
You can specify loc=15.0 to set the mean and a scale somewhere between 2 and 5 to make the spread of values narrower or broader. The scale is the standard deviation: roughly 68% of samples fall within one scale of the mean (15) and roughly 95% within two. It doesn't let you define a specific range, but you can always take the output and re-roll it if it falls outside of some range. This gives you a more nuanced way to get values around a certain value.
Set size=None to return one value.
From https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.normal.html:
import numpy as np
import matplotlib.pyplot as plt
mu, sigma = 0, 0.1 # mean and standard deviation
s = np.random.normal(mu, sigma, 1000)
... (see the link above for the full code)
plt.show()
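The re-roll idea can be sketched as follows. To keep the example dependency-free it uses the standard library's random.gauss as a stand-in for numpy.random.normal; the function name normal_int_in_range and the choice of mean 15 and standard deviation 2 are assumptions for illustration.

```python
import random

def normal_int_in_range(mean, std, low, high):
    """Draw from a normal distribution, re-rolling until the rounded value is in [low, high]."""
    while True:
        value = round(random.gauss(mean, std))
        if low <= value <= high:
            return value

samples = [normal_int_in_range(15, 2, 10, 20) for _ in range(1000)]
print(min(samples), max(samples))
```

Re-rolling discards the tails, so the result is a truncated normal distribution, not a true one.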

Related

Can normal distribution prob density be greater than 1?... based on python code checkup

I have a question:
Given mean and variance I want to calculate the probability of a sample using a normal distribution as probability basis.
The numbers are:
mean = -0.546369
var = 0.006443
curr_sample = -0.466102
prob = 1/(np.sqrt(2*np.pi*var))*np.exp( -( ((curr_sample - mean)**2)/(2*var) ) )
I get a probability which is larger than 1! I get prob = 3.014558...
What is causing this? The fact that the variance is too small messes something up? It's a totally legal input to the formula and should give something small not greater than 1! Any suggestions?
Ok, what you compute is not a probability, but a probability density (which may be larger than one). In order to get 1 you have to integrate over the normal distribution like so:
import numpy as np
mean = -0.546369
var = 0.006443
curr_sample = np.linspace(-10,10,10000)
prob = np.sum( 1/(np.sqrt(2*np.pi*var))*np.exp( -( ((curr_sample - mean)**2)/(2*var) ) ) * (curr_sample[1]-curr_sample[0]) )
print(prob)
which results in
0.99999999999961509
The formula you give is a probability density, not a probability. The density formula is such that when you integrate it between two values of x, you get the probability of being in that interval. However, this means that the probability of getting any particular sample is, in fact, 0 (it's the density times the infinitesimally small dx).
So what are you actually trying to calculate? You probably want something like the probability of getting your value or larger, the so-called tail probability, which is often used in statistics (it so happens that this is given by the error function when you're talking about a normal distribution, although you need to be careful of exactly how it's defined).
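As a rough illustration of the difference, the sketch below computes both the density and the upper-tail probability for the numbers in the question. It assumes the standard relation P(X >= x) = 0.5 * erfc((x - mean) / (sigma * sqrt(2))); the variable names density and upper_tail are chosen here for illustration.

```python
import math

mean = -0.546369
var = 0.006443
curr_sample = -0.466102

sigma = math.sqrt(var)
z = (curr_sample - mean) / sigma  # distance from the mean in standard deviations

# Density at curr_sample: may exceed 1 because it is not a probability.
density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Probability of drawing a value at least as large as curr_sample: always in [0, 1].
upper_tail = 0.5 * math.erfc(z / math.sqrt(2))

print(density)
print(upper_tail)
```

Here the sample lies almost exactly one standard deviation above the mean, so the tail probability comes out near 0.159 even though the density is about 3.01.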
When considering the bell-shaped probability density function (PDF) of a given mean and variance, the peak value of the curve (the height at the mode) is 1/sqrt(2*pi*var). For the standard normal distribution (mean 0 and var 1) this is about 0.399, and it exceeds 1 whenever var is smaller than 1/(2*pi), roughly 0.159. Hence when evaluating the pdf of a general normal distribution at a specific value, results larger than 1 are possible.

Monte Carlo simulations by large no of trials

Consider the following program.
import math
import random
def inside_unit_circle(point):
    """
    Compute distance of point from origin
    """
    distance = math.sqrt(point[0] ** 2 + point[1] ** 2)
    return distance < 1
def estimate_mystery(num_trials):
    """
    Main function
    """
    num_inside = 0
    for dumm_idx in range(num_trials):
        new_point = [2 * random.random() - 1, 2 * random.random() - 1]
        if inside_unit_circle(new_point):
            num_inside += 1
    return float(num_inside) / num_trials
print(estimate_mystery(10000))
This program uses random.random() to generate a random set of points that are uniformly distributed over the square with corners at
(1, 1) (−1, 1)
(1,−1) (−1,−1)
Here, being uniformly distributed means that each point in the square has an equal chance of being generated. The method then tests whether these points lie inside a unit circle.
As one increases the number of trials, the value returned by estimate_mystery tends towards a specific value that has a simple expression involving a well-known constant. Enter this value as a math expression below. (Do not enter a floating point number.)
So you need to run estimate_mystery with increasingly higher numbers of trials. As you do so, it will become clear that the value converges to the following simple expression:
\[ \sum_{k=1}^{\infty} \frac{e^{i\pi(k+1)}}{2k-1} \]
It should be noted, however, that this is not the only correct answer. The following would have been valid too, where \zeta is the Riemann zeta function:
However, this does not include the well-known constant e.
I'm not sure why this is confusing. It's quite clear that the sum expression is correct, and it's written quite clearly: the code below the image is very standard LaTeX formatting for mathematical expressions. But to illustrate its correctness, here's a plot showing the convergence when taking the sum to n, and running estimate_mystery up to n as well:
Hrmm... maybe this wasn't what your question wanted? It should also converge to the following, where \gamma is a unit circle around z=0 on the complex plane:
\[ -i\oint_\gamma z^{-3}e^{\frac{z}{2}}\,dz \]
If you try estimate_mystery() with different inputs such as 100, 1000, 10000 and 100000, you will see results of roughly 0.81, 0.781, 0.7807 and 0.7855, respectively (the exact numbers vary from run to run because the points are random).
This means that the more you increase the number of trials, the closer the result gets to (converges to) about 0.785. This number can be expressed in terms of Pi.
You can find it with a simple calculation: Pi * x = 0.7855 gives x ~ 0.25, so the value is Pi/4. This also follows geometrically: the unit circle has area Pi and the square has area 4, so a uniformly random point in the square lands inside the circle with probability Pi/4.
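To check the convergence numerically, here is a small sketch that repeats the experiment with a seeded generator and prints the distance from Pi/4. The function name estimate_quarter_pi and the seed parameter are additions for this illustration and are not part of the original program.

```python
import math
import random

def estimate_quarter_pi(num_trials, seed=None):
    """Fraction of uniform points in [-1, 1] x [-1, 1] that fall inside the unit circle."""
    rng = random.Random(seed)  # a seeded generator makes runs repeatable
    inside = 0
    for _ in range(num_trials):
        x = 2 * rng.random() - 1
        y = 2 * rng.random() - 1
        if x * x + y * y < 1:
            inside += 1
    return inside / num_trials

for n in (100, 10_000, 1_000_000):
    estimate = estimate_quarter_pi(n, seed=0)
    print(n, estimate, abs(estimate - math.pi / 4))
```

The error shrinks roughly like 1/sqrt(n), so each extra digit of accuracy costs about 100 times more trials.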

Calculate the exact integral in Python

I need to write Python code to calculate the exact value of the integral from -5 to 5 of 1/(1+x^2).
I know the answer is 2arctan(5), which is approximately 2.746801...
I have below the code I have written, however I am getting a slightly different answer and I was wondering if there is anything I can do to make this code more accurate? Thanks for any help!
## The function to be integrated
def func(x):
    return 1/(1 + x**2)
## Defining variables
a = -5.0
b = 5.0
dx = 1.0
Area = 0
## Number of trapezoids
n = int((b-a)/dx)
## Loop to calculate area and sum
for i in range(1, n+1):
    x0 = a + (i-1)*dx
    x1 = a + i*dx
    ## Area of each trapezoid
    Ai = dx*(func(x0) + func(x1))/2.0
    ## Cumulative sum of areas
    Area = Area + Ai
print("The exact value is: ", Area)
The answer I am getting is 2.756108...
I know it's a small difference, however, it is a difference and I would like to try for something more exact.
The reason you are getting an approximate value for the integral is that you are using an approximation technique: the trapezoidal rule, which replaces the curve with straight line segments (a first-order approximation) to compute the value of the definite integral.
There are two ways to evaluate an integral: analytically or numerically (by approximation). Your method is of the second variety, and since it's an approximation it will generate a value that is within a certain margin of error of the real value.
The point of my answer is that there is no way for you to calculate the exact value of the integral using a numeric approach (definitely not in the case of this function). So you will have to settle for a certain margin of error that you're willing to accept and then choose a delta-x sufficiently small to get you within that range.
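To see how the error depends on delta-x, the sketch below reruns the trapezoidal rule with more and more trapezoids and compares each result with 2*arctan(5). The helper name trapezoid and the particular values of n are assumptions for illustration.

```python
import math

def func(x):
    return 1 / (1 + x**2)

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal-width trapezoids."""
    dx = (b - a) / n
    total = 0.5 * (f(a) + f(b))  # the two endpoints count half
    for i in range(1, n):
        total += f(a + i * dx)   # interior points count fully
    return total * dx

exact = 2 * math.atan(5)
for n in (10, 100, 1000, 10000):
    approx = trapezoid(func, -5.0, 5.0, n)
    print(n, approx, abs(approx - exact))
```

With n = 10 (dx = 1.0) this reproduces the 2.756108... from the question; larger n shrinks the error, but it never becomes exactly zero.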

Generating numbers with Gaussian function in a range using python

I want to use the gaussian function in python to generate some numbers between a specific range giving the mean and variance
so lets say I have a range between 0 and 10
and I want my mean to be 3 and variance to be 4
mean = 3, variance = 4
how can I do that ?
Use random.gauss. From the docs:
random.gauss(mu, sigma)
Gaussian distribution. mu is the mean, and sigma is the standard deviation. This is slightly
faster than the normalvariate() function defined below.
It seems to me that you can clamp the results of this, but that wouldn't make it a Gaussian distribution. I don't think you can satisfy all the constraints simultaneously. If you want to clamp it to the range [0, 10], you could get your numbers like this (random.gauss takes the standard deviation, so a variance of 4 means sigma = 2):
num = min(10, max(0, random.gauss(3, 2)))
But then the resulting distribution of numbers won't be truly Gaussian. In this case, it seems you can't have your cake and eat it, too.
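A quick way to see why clamping distorts the distribution, as a sketch: every out-of-range draw lands exactly on a boundary, so the boundaries collect extra probability mass. The example uses sigma = 2 (the square root of the requested variance of 4) and a fixed seed; both the seed and the sample size are arbitrary choices.

```python
import random

rng = random.Random(0)
n = 100_000

# Clamp every draw into [0, 10].
clamped = [min(10, max(0, rng.gauss(3, 2))) for _ in range(n)]

# Fraction of samples sitting exactly on each boundary.
at_lower = sum(1 for v in clamped if v == 0) / n
at_upper = sum(1 for v in clamped if v == 10) / n
print(at_lower, at_upper)
```

Several percent of the samples end up exactly at 0, which a continuous Gaussian would never produce.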
There's probably a better way to do this, but this is the function I ended up creating to solve this problem:
import random
def trunc_gauss(mu, sigma, bottom, top):
    a = random.gauss(mu, sigma)
    while (bottom <= a <= top) == False:
        a = random.gauss(mu, sigma)
    return a
If we break it down line by line:
import random
This allows us to use functions from the random library, which includes a gaussian random number generator (random.gauss).
def trunc_gauss(mu, sigma, bottom, top):
The function arguments allow us to specify the mean (mu) and standard deviation (sigma), as well as the top and bottom of our desired range.
    a = random.gauss(mu, sigma)
Inside the function, we generate an initial random number according to a gaussian distribution.
    while (bottom <= a <= top) == False:
        a = random.gauss(mu, sigma)
Next, the while loop checks if the number is within our specified range, and generates a new random number as long as the current number is outside our range.
    return a
As soon as the number is inside our range, the while loop stops running and the function returns the number.
This should give a better approximation of a gaussian distribution, since we don't artificially inflate the top and bottom boundaries of our range by rounding up or down the outliers.
I'm quite new to Python, so there are most probably simpler ways, but this worked for me.
I was working on some numerical analytical computation and I ran into this python tutorial site - http://www.python-course.eu/weighted_choice_and_sample.php
This is what I offer as a solution for anyone too busy to visit the site.
I don't know how many gaussian values you need, so I'll go with n = 100. You gave mu as 3 and the variance as 4, which makes sigma = 2. Here's the code:
from random import gauss
n = 100
values = []
frequencies = {}
while len(values) < n:
    value = gauss(3, 2)
    if 0 < value < 10:
        frequencies[int(value)] = frequencies.get(int(value), 0) + 1
        values.append(value)
print(values)
I hope this helps. You can get the plot as well. It's all in the tutorials.
If you have a small range of integers, you can create a list with a gaussian distribution of the numbers within that range and then make a random choice from it.
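One way to read that suggestion, as a sketch: weight each integer in the range by the Gaussian density at that value and let random.choices (Python 3.6+) pick from the weighted list. The function name gauss_weighted_choice and its parameters are hypothetical.

```python
import math
import random

def gauss_weighted_choice(mu, sigma, low, high, k=1):
    """Pick k integers from [low, high] with Gaussian-shaped weights."""
    values = list(range(low, high + 1))
    # Unnormalized Gaussian density at each integer; random.choices normalizes the weights.
    weights = [math.exp(-0.5 * ((v - mu) / sigma) ** 2) for v in values]
    return random.choices(values, weights=weights, k=k)

print(gauss_weighted_choice(3, 2, 0, 10, k=20))
```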
import math
from random import uniform
import numpy as np
from scipy.special import erf, erfinv
def trunc_gauss(mu, sigma, xmin=np.nan, xmax=np.nan):
    """Truncated Gaussian distribution.
    mu is the mean, and sigma is the standard deviation.
    """
    # erf maps the real line onto (-1, 1), so -1 and 1 mean "no truncation".
    # The normal CDF is 0.5*(1 + erf((x - mu)/(sigma*sqrt(2)))), hence the sqrt(2).
    if np.isnan(xmin):
        zmin = -1
    else:
        zmin = erf((xmin - mu) / (sigma * math.sqrt(2)))
    if np.isnan(xmax):
        zmax = 1
    else:
        zmax = erf((xmax - mu) / (sigma * math.sqrt(2)))
    y = uniform(zmin, zmax)
    z = erfinv(y)
    # This will not come up often, but if y is extremely close to +/-1,
    # erfinv returns +/-inf; in that case draw again.
    while math.isinf(z):
        z = erfinv(uniform(zmin, zmax))
    return mu + z * sigma * math.sqrt(2)
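The same inverse-CDF idea can be sketched with only the standard library, assuming Python 3.8+ for statistics.NormalDist. The function name trunc_gauss_inv_cdf is hypothetical, and the final clamp is only there to absorb floating-point rounding at the edges.

```python
import random
from statistics import NormalDist

def trunc_gauss_inv_cdf(mu, sigma, low, high):
    """Sample a normal(mu, sigma) restricted to [low, high] by inverting the CDF."""
    dist = NormalDist(mu, sigma)
    # Probability levels that correspond to the two ends of the range.
    p_low, p_high = dist.cdf(low), dist.cdf(high)
    # A uniform draw between those levels maps back to a value inside the range.
    u = random.uniform(p_low, p_high)
    return min(high, max(low, dist.inv_cdf(u)))

print([round(trunc_gauss_inv_cdf(3, 2, 0, 10), 3) for _ in range(5)])
```

Unlike rejection sampling, this needs exactly one uniform draw per sample, which helps when the allowed range covers only a small part of the distribution.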
You can use minimal code for 150 values:
import numpy as np
s = np.random.normal(3, 2, 150)  # mean = 3, standard deviation = 2 (variance = 4)
print(s)
The normal distribution is a random (stochastic) distribution like any other, so we can check the result by plotting it:
import seaborn as sns
import matplotlib.pyplot as plt
AA1_plot = sns.histplot(s, kde=True)  # histplot replaces the deprecated distplot
plt.show()

Python numpy.random.normal only positive values

I want to create a normal distributed array with numpy.random.normal that only consists of positive values.
For example, the following illustrates that it sometimes gives back negative values and sometimes positive ones. How can I modify it so that it only gives back positive values?
>>> import numpy
>>> numpy.random.normal(10,8,3)
array([ -4.98781629, 20.12995344, 4.7284051 ])
>>> numpy.random.normal(10,8,3)
array([ 17.71918829, 15.97617052, 1.2328115 ])
>>>
I guess I could solve it somehow like this:
myList = numpy.random.normal(10,8,3)
# run again until all items are positive values
while (myList < 0).any():
    myList = numpy.random.normal(10,8,3)
The normal distribution, by definition, extends from -inf to +inf so what you are asking for doesn't make sense mathematically.
You can take a normal distribution and take the absolute value to "clip" to positive values, or just discard negative values, but you should understand that it will no longer be a normal distribution.
I assume that what you mean is that you want to modify the probability density such that it is the same shape as normal in the positive range, and zero in negative. That is a pretty common practical case. In such case, you cannot simply take the absolute value of generated normal random variables. Instead, you have to generate a new independent normally distributed number until you come up with a positive one. One way to do that is recursively, see below.
import numpy as np
def PosNormal(mean, sigma):
    x = np.random.normal(mean, sigma, 1)
    return(x if x>=0 else PosNormal(mean,sigma))
What about using a lognormal distribution, along these lines:
mu = np.mean(np.log(list))
sigma = np.std(np.log(list))
new_list = np.random.lognormal(mu, sigma, length_of_new_list)
data = np.random.randint(low=1,high=100,size=(4,4),dtype='int')
Or maybe you could just 'shift' your entire distribution to the 'right' by subtracting the min (or adding the abs val of your min):
y = np.random.normal(0.0, 1.0, 10)
y
array([-0.16934484, 0.06163384, -0.29714508, -0.25917105, -0.0395456 ,
0.17424635, -0.42289079, 0.71837785, 0.93113373, 1.12096384])
y - min(y)
array([0.25354595, 0.48452463, 0.12574571, 0.16371974, 0.38334519,
0.59713714, 0. , 1.14126864, 1.35402452, 1.54385463])
The question is reasonable. For motivation, consider simulations of biological cells. The distribution of the count of a type of molecule in a cell can be approximated by the normal distribution, but must be non-negative to be physically meaningful.
My whole-cell simulator uses this method to sample the initial distribution of a molecule's count:
def non_neg_normal_sample(random_state, mean, std, max_iters=1000):
    """ Obtain a non-negative sample from a normal distribution
    The distribution returned is normal for 0 <= x, and 0 for x < 0
    Args:
        random_state (:obj:`numpy.random.RandomState`): a random state
        mean (:obj:`float`): mean of the normal dist. to sample
        std (:obj:`float`): std of the normal dist. to sample
        max_iters (:obj:`int`, optional): maximum number of draws of the true normal distribution
    Returns:
        :obj:`float`: a normal sample that is not negative
    Raises:
        :obj:`ValueError`: if taking `max_iters` normal sample does not obtain one that is not negative
    """
    iter = 0
    while True:
        sample = random_state.normal(mean, std)
        iter += 1
        if 0 <= sample:
            return sample
        if max_iters <= iter:
            raise ValueError(f"{iter} draws of a normal dist. with mean {mean:.2E} and std {std:.2E} "
                             f"fails to obtain a non-negative sample")
I expand on @gena-kukartsev's answer in two ways: First, I avoid recursion, which could overflow the call stack. (Let's avoid answers that can overflow the stack on stackoverflow!) Second, I catch possibly bad input by limiting the number of samples of the distribution.
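When a whole array is needed, as in the question, the same redraw-until-valid idea can be applied to all entries at once. This is only a sketch: the function name positive_normal and its parameters are hypothetical, and np.random.default_rng assumes NumPy 1.17 or newer.

```python
import numpy as np

def positive_normal(mean, std, size, rng=None):
    """Return `size` normal samples, redrawing any negative entries until none remain."""
    rng = np.random.default_rng() if rng is None else rng
    out = rng.normal(mean, std, size)
    bad = out < 0
    while bad.any():
        # Redraw only the entries that are still negative.
        out[bad] = rng.normal(mean, std, int(bad.sum()))
        bad = out < 0
    return out

samples = positive_normal(10, 8, 1000, rng=np.random.default_rng(0))
print(samples.min(), samples.mean())
```

Like the loop above, this produces a truncated normal distribution, so its mean is somewhat higher than the `mean` argument.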
You can offset your entire array by its lowest (leftmost) value. What you get may not be a true normal distribution, but within the scope of your work, dealing with a finite array, you can ensure that the values are non-negative and still fit under a bell curve.
>>> mu,sigma = (0,1.0)
>>> s = np.random.normal(mu, sigma, 10)
>>> s
array([-0.58017653, 0.50991809, -1.13431539, -2.34436721, -1.20175652,
0.56225648, 0.66032708, -0.98493441, 2.72538462, -1.28928887])
>>> np.min(s)
-2.3443672118476226
>>> abs(np.min(s))
2.3443672118476226
>>> np.add(s,abs(np.min(s)))
array([ 1.76419069, 2.85428531, 1.21005182, 0. , 1.14261069,
2.90662369, 3.00469429, 1.3594328 , 5.06975183, 1.05507835])
You could use a high loc with a low scale:
np.random.normal(100, 10, 10) /100
[0.96568643 0.92123722 0.83242272 0.82323367 1.07532713 0.90125736
0.91226052 0.90631754 1.08473303 0.94115643]
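How safe this is can be estimated from the normal CDF: the chance of a negative draw is P(X < 0) = 0.5 * erfc(loc / (scale * sqrt(2))). The sketch below compares the parameters from the question (10, 8) with the ones above (100, 10); dividing by 100 afterwards does not change the sign of a sample.

```python
import math

def prob_negative(loc, scale):
    """Probability that a normal(loc, scale) draw is below zero."""
    return 0.5 * math.erfc(loc / (scale * math.sqrt(2)))

for loc, scale in ((10, 8), (100, 10)):
    print(loc, scale, prob_negative(loc, scale))
```

With loc=10 and scale=8 about one draw in ten is negative; with loc=100 and scale=10 a negative draw is still possible in principle but vanishingly unlikely.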
