Generating independent random variables - python

I would like to generate n independent uniform variables on [0,1]. I am not sure if numpy.random.uniform(0,1,size=n) is fine because nothing tells me that the data is independently generated.
Do I instead have to loop n times on numpy.random.uniform(0,1,size=1)? Do I have to do something with the seed?

You can use the function exactly as you wrote: numpy.random.uniform(0,1,size=n). Do not loop.
From the documentation: "... any value within the given interval is equally likely to be drawn by uniform." Each value is as independent as a computer can make it.
Setting a seed will create the same array of random numbers each time. This is useful for testing so you can get the same output each time, but if you want the array to be a new set of random numbers each time the function is called, don't set a seed.
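As a minimal illustration of that seeding behaviour (the seed value 42 is arbitrary):

import numpy as np

# Same seed -> identical "random" arrays, which is handy for tests.
np.random.seed(42)
first = np.random.uniform(0, 1, size=5)
np.random.seed(42)
second = np.random.uniform(0, 1, size=5)
print(np.array_equal(first, second))  # True

# No reseeding -> the generator keeps advancing, so the next call gives fresh values.
third = np.random.uniform(0, 1, size=5)
print(np.array_equal(first, third))   # False (with overwhelming probability)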

numpy.random.uniform(0,1,size=n) generates values as independently as possible using a pseudo-random number generator.
If you're not convinced, you can check that using the size argument and a for loop will give you exactly the same results:
import numpy as np

n = 5

# One vectorized call with size=n
np.random.seed(5)
print(np.random.uniform(0, 1, size=n))

# n separate calls, starting from the same seed
np.random.seed(5)
for _ in range(n):
    print(np.random.uniform(0, 1))
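If you are on NumPy 1.17 or later, the same thing can also be written with the newer Generator API, which avoids the global seed entirely (just a sketch, not required for the answer above):

import numpy as np

rng = np.random.default_rng()           # an independent generator instance
sample = rng.uniform(0.0, 1.0, size=5)  # n i.i.d. uniform values on [0, 1)
print(sample)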

Related

Is there a way to symbolically represent the random sum of a random variable?

Say I have some discrete random variable N, and some other random variable X. I want to make a new compound random variable Y as follows.
Y = \sum_{i=1}^{N} X_{i}
where X_i are independent and identically distributed random variables from the same distribution as X.
(Sorry for the code block, I have too low rep to post pictures.)
So code-wise I'd begin with something like
from sympy import abc, stats, S
N = stats.Geometric(name='N', p=S.One/2)
X = stats.Normal('X', 0, 1)  # Normal also needs a mean and a standard deviation
or whatever distributions I was interested in for the discrete and summand random variables. I've been looking at the docs and examples, but can't find a way to represent the compound variable I'm interested in, and then be able to invoke stats.E(Y) or stats.variance(Y).
I know I can calculate conditional expected values and variances, along with other computations, to get the "total" expected value and "total" variance. I can do that on my own, but I'd like to verify that work with a direct symbolic construction that I can then invoke.
As long as you stick to the normal distribution, as in your example, the sum given N is again normal, and the overall mean and variance of Y can be worked out with the laws of total expectation and total variance, though you may need to sweat to get them if the discrete random variable is exotic. But if you deviate from the normal, ooh-la-la, things become very nasty very quickly. Monte Carlo is often the only solution. In the place of the sympy developers, I would stay away from this mess.
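Since Monte Carlo is the suggested fallback, here is a minimal sketch (using NumPy rather than sympy; the function name one_compound_draw and the simulation size are just illustrative) that estimates E(Y) and Var(Y) for the example distributions N ~ Geometric(1/2) and X ~ N(0, 1):

import numpy as np

rng = np.random.default_rng(0)
n_sims = 100_000

def one_compound_draw():
    # Draw N (support 1, 2, ...), then sum N independent standard normals.
    n = rng.geometric(0.5)
    return rng.normal(0.0, 1.0, size=n).sum()

samples = np.array([one_compound_draw() for _ in range(n_sims)])

# Sanity check: by the laws of total expectation/variance,
# E[Y] = E[N]E[X] = 0 and Var(Y) = E[N]Var(X) + Var(N)E[X]^2 = 2 here.
print(samples.mean(), samples.var())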

Why am I getting different bootstrap results using different algorithms?

I am using two different methods to try to generate a bootstrap sample:
import numpy as np

np.random.seed(335)
y = np.random.normal(0, 1, 5)
b = np.empty(len(y))  # initializes an empty vector
for j in range(len(y)):
    a = np.random.randint(1, len(y))  # Draws a random integer from 1 to n, where n is our sample size
    b[j] = y[a-1]  # indices in python start at zero, the worst part of Python in my opinion
c = np.random.choice(y, size=5)
print(b)
print(c)
and for my output I get different results
[1.04749432 1.71963433 1.71963433 1.71963433 1.71963433]
[-0.25224454 -0.25224454 0.46604474 1.71963433 0.46604474]
I think the answer has something to do with the random number generator, but I'm confused as to the exact reason.
This comes down to the use of different algorithms for randomized selection. There are numerous equivalent ways to select items at random with replacement using a pseudorandom generator (or to generate random variates from any other distribution). In particular, the algorithm for numpy.random.choice need not make use of numpy.random.randint in theory. What matters is that these equivalent ways should produce the same distribution of random variates. In the case of NumPy, look at NumPy's source code.
Another, less important, reason for different results is that the two selection procedures (randint and choice) each consume pseudorandom numbers from the underlying generator, and these can differ because the procedures didn't begin with the same seed (more precisely, from the same point in the sequence of pseudorandom numbers). If we set the seed to the same value before beginning each procedure:
np.random.seed(335)
y = np.random.normal(0, 1, 5)
b = np.empty(len(y))

np.random.seed(999999)  # Seed selection procedure 1
for j in range(len(y)):
    a = np.random.randint(1, len(y))
    b[j] = y[a-1]

np.random.seed(999999)  # Seed selection procedure 2
c = np.random.choice(y, size=5)

print(b)
print(c)
then each procedure will begin with the same pseudorandom numbers. But even so, the two procedures may use different algorithms for random selection, and these differences may still lead to different results.
(However, numpy.random.* functions, such as randint and choice, have become legacy functions as of NumPy 1.17, and their algorithms are expected to remain as they are for backward compatibility reasons. That version didn't deprecate any numpy.random.* functions, however, so they are still available for the time being. See also this question. In newer applications you should make use of the new system introduced in version 1.17, including numpy.random.Generator, if you have that version or later. One advantage of the new system is that the application relies less on global state.)
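As a sketch of what that newer system looks like for this bootstrap (the seed value is arbitrary, and b and c will still generally differ because each call advances the generator's state):

import numpy as np

rng = np.random.default_rng(335)          # one Generator instance, no global state
y = rng.normal(0, 1, 5)

# Resample with replacement, once via explicit indices and once via choice.
b = y[rng.integers(0, len(y), size=len(y))]   # integers() excludes the upper bound
c = rng.choice(y, size=len(y), replace=True)
print(b)
print(c)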

How best to initialize python's random module within a function

I'm looking for some info on generating as random a number as possible when the random module is used inside a function, like so:
import random as rd

def coinFlip():
    flip = rd.random()
    if flip > .5:
        return "Heads"
    else:
        return "Tails"

def main():
    for i in range(1000000):
        print(coinFlip())

main()
Edit: Ideally the above script would always yield different results, which limits my ability to use random.seed().
Does the random module embedded within a function initialize with a new seed each time the function is called? (Instead of using the previous generated random number as the seed.)
If so...
Is the default initialization from the system time exact enough to pull a truly random number, considering that the system times in the for loop here would be very close together or maybe even the same (depending on the precision of the system clock)?
Is there a way to initialize the random module outside of the function and have the function pull the next random number (to avoid multiple initializations)?
Any other more pythonic ways to accomplish this?
Thank you very much!
Use random.seed() if you want to initialize the pseudo-random number generator yourself.
You can have a look here:
If you don't initialize the pseudo-random number generator with random.seed(), then internally the random generator calls the seed function itself and uses the current system time as the seed value. That's why we get a different value every time random.random() is executed.
If you want a different number every time, you should not bother with initializing the random module, since by default it seeds itself internally from the current system time (which is always different).
Just use:
from random import random

def coinFlip():
    if random() > .5:
        return "Heads"
    else:
        return "Tails"
To make it clearer: the random module is not re-initialized each time it is used, only at import time, so every time you call random.random() you simply get the next number in the pseudo-random sequence, which will essentially always be different.
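A small sketch of that behaviour: repeated calls just walk further along one pseudo-random sequence, and reseeding with the same value replays it (the seed 123 is arbitrary):

import random

random.seed(123)
first_pair = (random.random(), random.random())
random.seed(123)
second_pair = (random.random(), random.random())
print(first_pair == second_pair)  # True: same seed, same sequence
print(random.random())            # just the next number in the sequence; no reseeding per call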
For starters:
This module implements pseudo-random number generators for various distributions.
[..]
The functions supplied by this module are actually bound methods of a hidden instance of the random.Random class. You can instantiate your own instances of Random to get generators that don’t share state.
https://docs.python.org/3/library/random.html
The random module is a Pseudo-Random Number Generator. All PRNGs are entirely deterministic and have state. Meaning, if the PRNG is in the same state, the next "random" number will always be the same. As the above paragraph explains, your rd.random() call is really a call to an implicitly instantiated Random object.
So:
Does the random module embedded within a function initialize with a new seed each time the function is called?
No.
Is there a way to initialize a random module outside of the function and have the function pull the next random number (so to avoid multiple initializations.)
You don't need to avoid multiple initialisation, as it's not happening. You can instantiate your own Random object if you want to control the state exactly.
class random.Random([seed])
Class that implements the default pseudo-random number generator used by the random module.
random.seed(a=None, version=2)
Initialize the random number generator. If a is omitted or None, the current system time is used. [..]
So, the implicitly instantiated Random object uses the system time as initial seed (read further though), and from there will keep state. So each time you start your Python instance, it will be seeded differently, but will be seeded only once.
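If you do want explicit control, here is a minimal sketch that adapts the coinFlip example from the question to use its own, explicitly seeded generator (the name my_rng and the seed 12345 are arbitrary):

import random

my_rng = random.Random(12345)   # a private Random instance with its own state

def coinFlip():
    # Uses the explicit instance instead of the hidden module-level one.
    return "Heads" if my_rng.random() > .5 else "Tails"

print([coinFlip() for _ in range(5)])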

How to use a random seed value in order to unittest a PRNG in Python?

I'm still pretty new to programming and just learning how to unittest. I need to test a function that returns a random value. I've so far found answers suggesting the use of a specific seed value so that the 'random' sequence is constant and can be compared. This is what I've got so far:
This is the function I want to test:
import random

def roll():
    '''Returns a random number in the range 1 to 6, inclusive.'''
    return random.randint(1, 6)
And this is my unittest:
class Tests(unittest.TestCase):
    def test_random_roll(self):
        random.seed(900)
        seq = random.randint(1, 6)
        self.assertEqual(roll(), seq)
How do I set the corresponding seed value for the PRNG in the function so that it can be tested without writing it into the function itself? Or is this completely the wrong way to go about testing a random number generator?
Thanks
The other answers are correct as far as they go. Here I'm answering the deeper question of how to test a random number generator:
Your provided function is not really a random number generator, as its entire implementation depends on a provided random number generator. In other words, you are trusting that Python provides you with a sensible random generator. For most purposes, this is a good thing to do. If you are writing cryptographic primitives, you might want to do something else, and at that point you would want some really robust test strategies (but they will never be enough).
Testing that a function returns a specific sequence of numbers tells you virtually nothing about the correctness of your function in terms of "producing random numbers". A predefined sequence of numbers is the opposite of a random sequence.
So, what do you actually want to test? For the roll function, I think you'd like to test:
That given 'enough' rolls it produces all the numbers between 1 and 6, preferably in 'approximately' equal proportions.
That it doesn't produce anything else.
The problem with 1. is that your function is defined to produce a random sequence, so there is always a non-zero chance that any hard limits you put in to define 'enough' or 'approximately equal' will occasionally fail. You could do some calculations to pick limits that make your test unlikely to fail more than, say, 1 in a billion times, or you could slap in a random.seed() call, which means the test will never fail if it passes once (unless the underlying implementation from Python changes).
Item 2. could be 'tested' more easily: generate some large number N of rolls and check that every result is within the expected range.
For all of this, however, I'd ask what value the unit tests actually are. You literally cannot write a test to check whether something is 'random' or not. To see whether the function has a reasonable source of randomness and uses it correctly, tests are useless - you have to inspect the code. Once you have done that, it's clear that your function is correct (providing Python provides a decent random number generator).
In short, this is one of those cases where unit tests provide extremely little value. I would probably just write one test (item 2 above), and leave it at that.
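A minimal sketch of that item 2 test (the range check) might look like the following; the class and test names are just illustrative, and roll is copied from the question rather than imported:

import unittest
import random

def roll():
    '''Returns a random number in the range 1 to 6, inclusive.'''
    return random.randint(1, 6)

class RollRangeTest(unittest.TestCase):
    def test_roll_stays_in_range(self):
        # Item 2: many rolls, every result must be between 1 and 6.
        for _ in range(1000):
            self.assertIn(roll(), {1, 2, 3, 4, 5, 6})

if __name__ == '__main__':
    unittest.main()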
By seeding the PRNG with a known seed, you know which sequence it will produce, so you can test for this sequence:
class Tests(unittest.TestCase):
    def test_random_roll(self):
        random.seed(900)
        self.assertEqual(roll(), 6)
        self.assertEqual(roll(), 2)
        self.assertEqual(roll(), 5)

Can one use negative numbers as seeds for random number generation?

This is not a coding question, but I am hoping that someone has come across this in the forums here. I am using Python to run some simulations. I need to run many replications using different random number seeds. I have two questions:
Are negative numbers okay as seeds?
Should I keep some distance in the seeds?
Currently I am using random.org to create 50 numbers between -100000 and +100000, which I use as seeds. Is this okay?
Thanks.
Quoting random.seed([x]):
Optional argument x can be any hashable object.
Both positive and negative numbers are hashable, and many other objects besides.
>>> hash(42)
42
>>> hash(-42)
-42
>>> hash("hello")
-1267296259
>>> hash(("hello", "world"))
759311865
Is it important that your simulations are repeatable? The canonical way to seed an RNG is by using the current system time, and indeed this is random's default behaviour:
random.seed([x])
Initialize the basic random number generator. Optional argument x can be any hashable object. If x is omitted or None, current system time is used; current system time is also used to initialize the generator when the module is first imported.
I would only deviate from this behaviour if repeatability is important. If it is important, then your random.org seeds are a reasonable solution.
Should I keep some distance in the seeds?
No. For a good quality RNG, the choice of seed will not affect the quality of the output. A set of seeds [1,2,3,4,5,6,7,8,9,10] should result in the same quality of randomness as any random selection of 10 ints. But even if a selection of random uniformly-distributed seeds were desirable, maintaining some distance would break that distribution.
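For what it's worth, here is a minimal sketch of how those pre-generated seeds might be used, one per replication (run_simulation is a hypothetical stand-in for your own simulation code, and the seed values are made up):

import random

seeds = [-73913, 4512, 99120]   # e.g. numbers obtained from random.org

def run_simulation(rng):
    # Hypothetical replication: just draws a few values from the supplied generator.
    return [rng.random() for _ in range(3)]

results = []
for seed in seeds:
    rng = random.Random(seed)   # a separate, reproducible generator per replication
    results.append(run_simulation(rng))
print(results)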
