Creating normal distribution (Python 3.1)

I have scipy and numpy, Python v3.1
I need to create a 1D array of length 3 million, using random numbers between 100 and 60,000 (inclusive). It has to fit a normal distribution.
Using 'a = numpy.random.standard_normal(3000000)', I get a normal distribution of the required length; I'm not sure how to achieve the required range.

A standard normal distribution has mean 0 and standard deviation 1. What I understand from your requirements is that you need one with mean (60000+100)/2 and standard deviation (60000-100)/2. Take each value from the standard_normal() result, multiply it by the new standard deviation, and add the new mean.
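A minimal sketch of that transform (assuming numpy is available, as the asker stated; the mean and standard deviation follow the interpretation above):
import numpy
mu = (60000 + 100) / 2.0       # target mean
sigma = (60000 - 100) / 2.0    # target standard deviation, per the interpretation above
a = numpy.random.standard_normal(3000000) * sigma + mu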
I haven't used NumPy, but a quick search of the docs says that you can achieve what you want directly by using numpy.random.normal().
One last tidbit: normal distributions are not bounded, so every value has a nonzero probability of occurring. Your requirements should be stated in terms of means and variances (or standard deviations), not of limits.

If you want a truly random normal distribution, you can't guarantee how far the numbers will spread. You can reduce the probability of outliers, however, by specifying the standard deviation:
>>> n = 3000000
>>> sigma5 = 1.0 / 1744278
>>> n * sigma5
1.7199093263803131 # Expect ~1.7 values in 3 million outside the range at 5 std. dev.
>>> sigma6 = 1.0 / 506800000
>>> n * sigma6
0.0059194948697711127 # Expect ~0.006 values in 3 million outside the range at 6 std. dev.
>>> sigma7 = 1.0 / 390600000000
>>> n * sigma7
7.6804915514592934e-06 # Expect ~8e-6 values in 3 million outside the range at 7 std. dev.
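(An aside, not part of the original answer: those hard-coded tail probabilities can be reproduced with scipy.stats.norm, which the asker has available.)
from scipy.stats import norm
print(2 * norm.sf(5))   # ~5.73e-07, i.e. about 1 in 1744278
print(2 * norm.sf(6))   # ~1.97e-09, i.e. about 1 in 506800000
print(2 * norm.sf(7))   # ~2.56e-12, i.e. about 1 in 390600000000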
Therefore, in this case, ensuring that the standard deviation is only 1/6 or 1/7 of half the range will give you reasonable confidence that your data will not exceed the range.
>>> span = 60000 - 100 # renamed from 'range' to avoid shadowing the builtin
>>> spread = (span / 2) / 6 # Anything outside of the range will be six std. dev. from the mean
>>> mean = (60000 + 100) / 2
>>> a = numpy.random.normal(loc = mean, scale = spread, size = n)
>>> min(a)
6320.0238199673404
>>> max(a)
55044.015566089176
Of course, you can still get values that fall outside the range here.
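If the bounds are hard requirements, one option (a sketch, not part of the original answer; strictly it yields a truncated normal rather than a normal distribution) is to redraw any out-of-range values:
import numpy
n = 3000000
mean = (60000 + 100) / 2
spread = ((60000 - 100) / 2) / 6
a = numpy.random.normal(loc=mean, scale=spread, size=n)
out = (a < 100) | (a > 60000)    # boolean mask of out-of-range draws
while out.any():                 # redraw only the offending entries
    a[out] = numpy.random.normal(loc=mean, scale=spread, size=out.sum())
    out = (a < 100) | (a > 60000)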

Try this nice little method:
You'll want a method that just makes one random number.
import random
values = [random.randint(lo, hi) for _ in range(numitems)]  # lo/hi and values avoid shadowing the builtins min, max, and list
This will give you a list of numitems uniformly distributed random integers between lo and hi.
Of course, 3000000 is a lot of items to have in memory. Consider making the random numbers as they are needed by the program.
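A minimal sketch of that lazy approach (using the same random.randint building block; random_numbers is a name chosen here for illustration):
import random

def random_numbers(lo, hi):
    # a generator: yields one random integer at a time instead of holding 3,000,000 in memory
    while True:
        yield random.randint(lo, hi)

stream = random_numbers(100, 60000)
print(next(stream))  # draw values only when the program needs them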

Related

Python - How to generate Random Natural Number with no limit on Upper Bound

I want to generate a random natural number without any upper bound.
For example, I can generate a random natural number using random.randint(1, 1000), but I don't want to specify the upper bound. How can I achieve that?
The main problem here isn't even the maximum integer the computer can hold (including temporarily, while generating such large random numbers). Let's assume the OP agrees to sacrifice precision when getting very large numbers and agrees to keep them as floats.
The fundamental problem here is actually the distribution of this natural number. random.randint(1, 1000) returns a random integer from the uniform distribution. This distribution can't exist without an upper bound, because then the probability mass function (pmf) would return only zeroes. The pmf must sum to 1 over all outcomes, which is impossible for a uniform distribution without an upper bound, because the sum of infinitely many equal positive terms diverges.
However, there are other discrete distributions on the natural numbers which, although unbounded on the right, assign progressively lower probability to larger integers. But the packages that generate them in Python usually work with numpy data types (which, like C and unlike regular Python or Wolfram, limit the size of the integer), so in Python there is an inherent bound anyway.
from scipy.stats import nbinom
print(nbinom.rvs(1e10, 0.5, size=1) + 1)
One could try to write a numerical algorithm in regular Python to generate let's say this negative binomial + 1 random variable from the random number generated by the continuous uniform distribution from 0.0 to 1.0, each time calculating the integral of the former's pmf in regular Python (and reversing it into the quantile function), but it will be grossly inefficient.
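That said, for the n=1 special case of the negative binomial above (the geometric distribution), the quantile function has a closed form, so inverse-transform sampling in plain Python is cheap; a sketch (geometric_sample is a name chosen here for illustration):
import math
import random

def geometric_sample(p=0.5):
    # inverse transform: F(k) = 1 - (1-p)**k for k = 1, 2, ..., so k = ceil(log(1-u) / log(1-p))
    u = random.random()
    return max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))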
It is not possible to choose a number from 0 to infinity. However, you can use sys.maxsize for the upper bound, which is the largest size/index value the platform supports (Python integers themselves are unbounded). Do not forget to import the sys module.
import sys, random
n = random.randint(1, sys.maxsize)
If we write p(n) for the probability of drawing the number n, then even if p(n) never drops to zero, the series of p(n) must still converge, with a sum equal to 1. So p(n) must decrease fast enough.
A possibility is to take an exponential distribution. The parameter of the distribution determines the finite average value of the random number.
Natively, the distribution returns a floating-point number, so we have to use the int() conversion function.
Like this, aiming at an average value of 20:
$ python3
Python 3.10.6 (main, Aug 2 2022, 00:00:00) [GCC 12.1.1 20220507 (Red Hat 12.1.1-1)] on linux
...
>>>
>>> import numpy as np
>>>
>>> ys = list(map(int, np.random.exponential(20, 10)))
>>>
>>> ys
[7, 5, 36, 4, 10, 3, 26, 45, 9, 17]
>>>
>>> ys2 = list(map(int, np.random.exponential(20, 100)))
>>>
>>> sum(ys2) / len(ys2)
18.89
>>>
>>> ys4 = list(map(int, np.random.exponential(20, 10000)))
>>>
>>> sum(ys4) / len(ys4)
19.5025
>>>
>>> min(ys4)
0
>>> max(ys4)
207
>>>
>>> quit()
$

How to make a biased random number out of a large set of numbers

I want to make Python 3.7.1 pick a number between 0 and 100. However, I want a lower number to be much more likely than a higher number, in a reverse exponential-smooth-graduation-curve kind of way (it doesn't have to be exact).
I guess I could start with
myrandomnumber = random.randint(0, 100)
And then link that to an array of some sort to determine differing percentages for each number. I've seen other people do that with random die rolls, but the thing is, that's quite neat only for six possibilities; I want to do this for a hundred (or more) and don't want to sit there making a huge array with a hundred entries just for that. Of course I could do it this way I suppose, but I feel like Python probably has a really easy way to do this that I'm missing.
Thanks, folks!
What you probably want is a gamma distributed random number.
For example, with k=1 and θ=2.0:
There are algorithms for using a uniformly distributed random function to generate normal-, exponential-, or gamma-distributed values.
But since you're in Python, you could probably jump straight to using numpy's random.gamma function:
import numpy
# the (1, 2.0) shape ends basically at 20; multiply by 5 to get a 0..100 scale
numpy.random.gamma(1, 2.0) * 5
I'm going with the assumption that you want to generate integer values over a bounded range, and that you mean non-uniformly distributed when you talk about "bias". Since you don't have a particular parametric distribution in mind, one approach is to start with a continuous distribution and take the "floor" of the outcomes using int(). You'll want to increase the upper bound by 1 so that rounding down gives values inclusive of that bound.
One easy choice is a triangular distribution. Python provides the random.triangular() function, which takes 3 arguments: the lower bound, the upper bound, and the mode. Here's a discretized version:
import random as rnd
import math
import sys
def triangle(upper_bound):
    return int(rnd.triangular(0.0, float(upper_bound + 1) - sys.float_info.epsilon, 0.0))
I've subtracted float's epsilon from the upper bound to prevent the (extremely unlikely) chance of getting an outcome of 101 when your upper bound is 100. Another bounded distribution choice might be the beta distribution, which you could then scale and truncate.
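The beta suggestion could be discretized the same way; a hypothetical sketch (the alpha/beta values and the scaled_beta name are illustrative, chosen so the mass bunches toward 0):
import random

def scaled_beta(upper_bound, alpha=1.0, beta=3.0):
    # betavariate() returns a float between 0 and 1; alpha < beta skews outcomes toward 0,
    # and min() guards the edge case where scaling would round up past the bound
    return min(upper_bound, int(random.betavariate(alpha, beta) * (upper_bound + 1)))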
If you want the distribution shifted even further down the scale towards 0, you could use distributions such as the exponential, or more generally the gamma, with truncation and rounding. Both of those have infinite support, so there are a couple of ways to truncate. The simpler way is to use acceptance/rejection—keep generating values until you get one in range:
def expo_w_rejection(upper_bound, scale_param = 0.4):
    upper_bound += 1
    while True:
        candidate = rnd.expovariate(1.0 / (upper_bound * scale_param))
        if candidate < upper_bound:
            return int(candidate)
As before, bump the upper limit up by 1 to get outcomes that include the upper limit after truncating. I've also included an optional scale_param which should be a value strictly between 0 and 1, i.e., not inclusive of either limit. Values closer to 0 will cause the results to bunch more to the left, values closer to 1 yield less bunching.
The other method would be to use the inverse transform technique for generating, and to restrict the range of the uniform to not exceed the upper bound based on evaluating the cumulative distribution function at the target upper bound:
def trunc_exp(upper_bound, scale_param = 0.4):
    upper_bound = float(upper_bound) + 1.0 - sys.float_info.epsilon
    trunc = 1.0 - math.exp(-1.0 / scale_param)
    return int((-upper_bound * scale_param) * math.log(1.0 - trunc * rnd.random()))
Both approaches yield distributionally similar results, as could be seen in the screenshot attached to the original answer: "Column 1" was generated with truncated inversion, while "Column 2" was generated with acceptance/rejection.
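A quick sanity check (a sketch, not from the original answer) using the two functions defined above; their sample means should agree closely:
sample_inv = [trunc_exp(100) for _ in range(100000)]
sample_rej = [expo_w_rejection(100) for _ in range(100000)]
print(sum(sample_inv) / len(sample_inv))  # the two means should be close
print(sum(sample_rej) / len(sample_rej))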

Discretization size and Recurrence period of numpy uniform

My queries are regarding the generation of uniform random numbers using numpy.random.uniform on [0, 1).
Does this implementation involve a uniform step-size, i.e. are the universe of possibilities {0,a,2a,...,Na} where (N+1)a = 1 and a is constant?
If the above is true, then what's the value of this step-size? I noticed that the value of numpy.nextafter(x,y) keeps on changing depending upon x. Hence my question regarding whether a uniform step-size was used to implement numpy.random.uniform.
If the step-size is not uniform, then what would be the best way to figure out the number of unique values that numpy.random.uniform(low=0, high=1) can take?
What's the recurrence period of numpy.random.uniform, i.e. after how many samples will I see my original number again? For maximum efficiency, this should be equal to the number of unique values.
I tried looking up the source code at Github but didn't find anything directly interpretable there.
The relevant function is
double
rk_double(rk_state *state)
{
    /* shifts : 67108864 = 0x4000000, 9007199254740992 = 0x20000000000000 */
    long a = rk_random(state) >> 5, b = rk_random(state) >> 6;
    return (a * 67108864.0 + b) / 9007199254740992.0;
}
which is found in randomkit.c inside the numpy source tree.
As you can see, the granularity is 1 / 9007199254740992.0, which equals 2**-53, the (downward) float64 resolution at 1.
>>> 1 / 9007199254740992.0
1.1102230246251565e-16
>>> 2**-53
1.1102230246251565e-16
>>> 1-np.nextafter(1.0, 0)
1.1102230246251565e-16
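(A sketch, assuming the legacy generator shown above: every draw is an exact integer multiple of 2**-53, which can be checked directly.)
import numpy as np
u = np.random.uniform(0, 1, size=1000)
k = u * 2**53                    # recover the 53-bit integer numerator
assert np.all(k == np.floor(k))  # holds because each value is (a*2**26 + b) / 2**53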

How do you generate a random number in a range with a specific average in python? [closed]

Closed 9 years ago. This question needs details or clarity and is not currently accepting answers.
So let's say I want to generate a random number between -1000 and 1000 and I want the average to be x. How would I do this?
Edit: just to be clear, the numbers generated should fall in a standard normal distribution with an average of x.
So if I generated a thousand numbers and found the average of them it would be x.
I tried this but it doesn't seem to work:
sum_ = 0
for i in range(0, 10):
    sum_ += random.triangular(-1000, 1000, 10)
print(sum_ / 10)
and I want this to give me something ~10 but I'm obviously not using the right code.
The standard normal distribution has infinite range; there is a nonzero probability of finding points outside any given interval. You could use triangular, just remember that the mean is (a + b + mode) / 3, so triangular(a, b, 3*x - a - b) will get you what you want:
from random import triangular
a = 0
b = 10
x = 3
test = [triangular(a, b, 3*x - a - b) for _ in range(1000)]
sum(test) / 1000.0
# 3.006828109140065
As several comments have mentioned, these two requirements conflict with each other:
I want to generate a random number between -1000 and 1000
and
the numbers generated should fall in a standard normal distribution with an average of x
because a standard normal distribution has an infinite domain. If you choose numbers from a normal distribution, there will be some probability that you get a value greater than 1000 or less than -1000. Conversely, if you do anything to limit the range to [-1000,1000], then you will not be drawing from a normal distribution.
One option is to generate numbers according to a truncated normal distribution, which is just like a standard normal distribution except that the probability is set to zero outside the range [-1000,1000]. The easiest way to do this is to pick a number according to a normal distribution, and if it's outside the desired range, just pick again.
SIGMA = 10.0  # you can pick this value to be pretty much anything

def generate_number(average):
    x = random.normalvariate(average, SIGMA)
    while x > 1000 or x < -1000:
        x = random.normalvariate(average, SIGMA)
    return x
Here SIGMA is the standard deviation of the normal distribution, which governs how spread out the values will be. If SIGMA is small and average is not close to 1000 or -1000, or to be precise: if (1000-average)/SIGMA and (1000+average)/SIGMA are both larger than 2 or 3, then this method will be fairly efficient because it will usually hit a number within the desired range [-1000,1000] the first time. But if one of those ratios is small, like around 1 or less, then the algorithm will sometimes have to loop once or twice. That probably won't be a big deal. (If you wanted to avoid it there are advanced techniques you could use, but I don't think it'd be worth the complexity.)
Another option, which is kind of what your example code in the question does, is to drop the requirement of using a normal distribution entirely, and use some other probability distribution which is naturally restricted to a certain range. Your example code, equivalent to
random.triangular(-1000,1000,mode)
uses a distribution in which the probability increases linearly from -1000 to the mode and then decreases linearly from the mode to 1000. The catch with this, though, is that the mode is the value which has the largest probability of being chosen. It's not the same as the average of the numbers chosen. The actual average is (min + max + mode) / 3, or in your case, since min + max = -1000 + 1000 = 0, just mode / 3, so if you wanted to generate numbers with a specified average, you would have to use
def generate_number(average):
    mode = 3 * average
    if mode < -1000 or mode > 1000:
        raise ValueError('Average cannot be satisfied: %f' % average)
    return random.triangular(-1000, 1000, mode)
Note that using this distribution means you can never produce numbers with an average less than -1000./3. or greater than 1000./3., unless you also adjust the min or max values accordingly.
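A quick usage check (an addition, not from the original answer): the empirical mean of many draws should approach the requested average.
samples = [generate_number(10) for _ in range(100000)]
print(sum(samples) / len(samples))  # should be close to 10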
normalvariate(self, mu, sigma) method of Random instance
Normal distribution.
mu is the mean, and sigma is the standard deviation.
i.e.
import random
x = random.normalvariate(2, 17)
Here 2 is the mean, and 17 is the standard deviation. If you want to scale linearly you would add and multiply by the appropriate values.
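A sketch of that linear scaling (the example values here are illustrative): to get mean 2 and standard deviation 17 from a standard normal draw, multiply then add.
import random
z = random.normalvariate(0, 1)  # standard normal: mean 0, std. dev. 1
x = 2 + 17 * z                  # now mean 2, std. dev. 17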

Calculate poisson probability percentage

When you use the POISSON function in Excel (or in OpenOffice Calc), it takes two arguments:
an integer
an 'average' number
and returns a float.
In Python (I tried RandomArray and NumPy) it returns an array of random poisson numbers.
What I really want is the percentage probability that this event will occur (it is a constant number, whereas the array contains different numbers every time - so is it an average?).
for example:
print poisson(2.6,6)
returns [1 3 3 0 1 3] (and every time I run it, it's different).
The number I get from Calc/Excel is 3.19 (POISSON(6,2.6,0)*100).
Am I using Python's poisson wrong (no pun intended!) or am I missing something?
scipy has what you want
>>> scipy.stats.distributions
<module 'scipy.stats.distributions' from '/home/coventry/lib/python2.5/site-packages/scipy/stats/distributions.pyc'>
>>> scipy.stats.distributions.poisson.pmf(6, 2.6)
array(0.031867055625524499)
It's worth noting that it's pretty easy to calculate by hand, too.
It is easy to do by hand, but you can overflow doing it that way. You can do the exponent and factorial in a loop to avoid the overflow:
import math

def poisson_probability(actual, mean):
    # naive: math.exp(-mean) * mean**actual / math.factorial(actual)
    # iterative, to keep the components from getting too large or small:
    p = math.exp(-mean)
    for i in range(actual):
        p *= mean
        p /= i + 1
    return p
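Checking this against the scipy value quoted above:
print(poisson_probability(6, 2.6))  # ~0.03187, matching scipy.stats.distributions.poisson.pmf(6, 2.6)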
This page explains why you get an array, and the meaning of the numbers in it, at least.
