How does Python seed the Mersenne Twister?

How does Python seed its Mersenne twister pseudorandom number generator used in the built-in random library if no explicit seed value is provided? Is it based on the clock somehow? If so, is the seed found when the random module is imported or when it is first called?
Python's documentation does not seem to have the answer.

In modern versions of Python (cf. http://svn.python.org/projects/python/branches/release32-maint/Lib/random.py), Random.seed tries to use 32 bytes read from /dev/urandom. If that doesn't work, it uses the current time (a is an optional value that can be used to explicitly seed the PRNG):
if a is None:
    try:
        a = int.from_bytes(_urandom(32), 'big')
    except NotImplementedError:
        import time
        a = int(time.time() * 256) # use fractional seconds

The seed is based on the clock or (if available) an operating system source. The random module creates (and hence seeds) a shared Random instance when it is imported, not when first used.
References
Python docs for random.seed:
random.seed(a=None, version=2)
Initialize the random number generator.
If a is omitted or None, the current system time is used. If randomness sources are provided by the operating system, they are used
instead of the system time (see the os.urandom() function for details
on availability).
Source of random.py (heavily snipped):
from os import urandom as _urandom
class Random(_random.Random):
    def __init__(self, x=None):
        self.seed(x)
    def seed(self, a=None, version=2):
        if a is None:
            try:
                a = int.from_bytes(_urandom(32), 'big')
            except NotImplementedError:
                import time
                a = int(time.time() * 256) # use fractional seconds
# Create one instance, seeded from current time, and export its methods
# as module-level functions. The functions share state across all uses
#(both in the user's code and in the Python libraries), but that's fine
# for most programs and is easier for the casual user than making them
# instantiate their own Random() instance.
_inst = Random()
The last line is at the top level, so it is executed when the module is loaded.

From this answer, I found the source of random.py. In the Random class, the seed is set when the object is constructed. The module instantiates a Random object and uses it for all of the module methods. So if the random number is produced with random.random() or another module method, then the seed was set at the time of the import. If the random number is produced by another instance of Random, then the seed was set at the time of the construction of that instance.
From the source:
# Create one instance, seeded from current time, and export its methods
# as module-level functions. The functions share state across all uses
#(both in the user's code and in the Python libraries), but that's fine
# for most programs and is easier for the casual user than making them
# instantiate their own Random() instance.
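As a rough illustration of the point above (a sketch, not taken from the original answers): module-level functions such as random.random are bound methods, so their __self__ attribute should point at the hidden shared instance, while an explicitly constructed Random is seeded at construction time. The variable names are illustrative only.

```python
import random

# Module-level functions are bound methods of one hidden Random instance,
# which was created (and seeded) when the module was imported.
shared = random.random.__self__
is_random_instance = isinstance(shared, random.Random)
same_object = random.seed.__self__ is shared
print(is_random_instance, same_object)   # True True

# Separate instances are seeded when they are constructed, so two
# instances built with the same explicit seed start in the same state.
own_a = random.Random(12345)
own_b = random.Random(12345)
same_first_value = own_a.random() == own_b.random()
print(same_first_value)                  # True
```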

The other answers are correct, but to summarize something from comments above which might be missed by someone else looking for the answer I tracked down today:
The typical reference implementations of Mersenne Twister take a seed and then internally (usually in the constructor) call this.init_genrand(seed)
If you do that and use a simple number you will get different results than what Python uses -- and probably wonder why like I did.
To get the same results in another language (Node.js in my case) as you would in Python, you need an implementation that supports the init_by_array method, and you must initialize it with init_by_array([seed]).
This applies if you're using a simple 32-bit integer value. If your seed is something else, Python passes it in differently (e.g. numbers larger than 32 bits are split up and passed 32 bits per array element), but that should at least help someone get going in the right direction.
The node.js implementation I ended up using was https://gist.github.com/banksean/300494 and it worked beautifully. I could not find one in npm which had the support I needed -- might have to add one.
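The splitting of larger seeds mentioned above can be sketched as follows. The helper name seed_to_key_words is hypothetical, and the word order (least significant 32 bits first) and the dropping of the sign are assumptions about CPython's behaviour that should be checked against the port you are using.

```python
def seed_to_key_words(seed):
    """Split an integer seed into 32-bit words for init_by_array.

    Assumptions: the sign is discarded and the least significant word
    comes first. Verify both against the implementation you are porting.
    """
    seed = abs(seed)
    words = []
    while True:
        words.append(seed & 0xFFFFFFFF)   # lowest 32 bits
        seed >>= 32                       # move on to the next word
        if seed == 0:
            break
    return words

print(seed_to_key_words(5))            # [5]
print(seed_to_key_words(2**32 + 7))    # [7, 1]
```

Under these assumptions a seed that fits in 32 bits becomes a one-element array, which matches the init_by_array([seed]) call described in the answer.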

Related

Difference Between np.random.uniform() and uniform() using built-in python packages

I'm using np.random.uniform() to generate a number in a class. Surprisingly, when I run the code, I can't see any expected changes in my results. On the other hand, when I use uniform() from python built-in packages, I see the changes in my results and that's obviously normal.
Are they really the same or is there anything tricky in their implementation?
Thank you in advance!
Create one module, say, blankpaper.py, with only two lines of code
import numpy as np
np.random.seed(420)
Then, in your main script, execute
import numpy as np
import blankpaper
print(np.random.uniform())
You should be getting exactly the same numbers.
When a module or library sets np.random.seed(some_number), it is global. Behind the numpy.random.* functions is an instance of the global RandomState generator, emphasis on the global.
It is very likely that something that you are importing is doing the aforementioned.
Change the main script to
import numpy as np
import blankpaper
rng = np.random.default_rng()
print(rng.uniform())
and you should be getting new numbers each time.
default_rng is a constructor for the random number class, Generator. As stated in the documentation,
This function does not manage a default global instance.
In reply to the question, "[a]re you setting a seed first?", you said
Yes, I'm using it but it doesn't matter if I don't use a seed or
change the seed number. I checked it several times.
Imagine we redefine blankpaper.py to contain the lines
import numpy as np
def foo():
    np.random.seed(420)
    print("I exist to always give you the same number.")
and suppose your main script is
import numpy as np
import blankpaper
np.random.seed(840)
blankpaper.foo()
print(np.random.uniform())
then you should be getting the same numbers as were obtained from executing the first main script (top of the answer).
In this case, the setting of the seed is hidden in one of the functions in the blankpaper module, but the same thing would happen if blankpaper.foo were a class and blankpaper.foo's __init__() method set the seed.
So this setting of the global seed can be quite "hidden".
Note also that the above also applies for the functions in the random module
The functions supplied by this module are actually bound methods of a
hidden instance of the random.Random class. You can instantiate your
own instances of Random to get generators that don’t share state.
So when uniform() from the random module was generating different numbers each time for you, it was very likely because neither you nor any other module set the seed shared by functions from the random module.
In both numpy and random, if your class or application wants to have its own state, create an instance of Generator from numpy or Random from random (or SystemRandom for cryptographically secure randomness). This will be something you can pass around within your application. Its methods will be the functions in the numpy.random or random module, only they will have their own state (unless you explicitly set them to be equal).
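The same effect can be reproduced with only the standard library. This is a minimal sketch using the random module rather than numpy; third_party_setup is a hypothetical stand-in for an imported module that quietly reseeds the shared generator.

```python
import random

def third_party_setup():
    # Hypothetical stand-in for imported code that reseeds the shared,
    # module-level generator behind random.uniform() and friends.
    random.seed(420)

def draw_global():
    third_party_setup()
    return random.uniform(0, 1)

# The shared state is reset before every draw, so the value repeats.
global_repeats = draw_global() == draw_global()
print(global_repeats)           # True

# An own instance keeps its own state and ignores the global reseed.
rng = random.Random(7)
reference = random.Random(7)    # same seed, never disturbed
a = rng.random()
third_party_setup()
b = rng.random()
own_unaffected = [a, b] == [reference.random(), reference.random()]
print(own_unaffected)           # True
```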
Finally, I am not claiming that this is exactly what is causing your problem (I had to make a few inferences since I cannot see your code), but this is a very likely reason.
Any questions/concerns please let me know!

How best to initialize python's random module within a function

I'm looking for some info on generating as random a number as possible when the random module is used within a function like so:
import random as rd
def coinFlip():
    flip = rd.random()
    if flip > .5:
        return "Heads"
    else:
        return "Tails"
def main():
    for i in range(1000000):
        print(coinFlip())
main()
Edit: Ideally the above script would always yield different results therefore limiting my ability to use random.seed()
Does the random module embedded within a function initialize with a new seed each time the function is called? (Instead of using the previous generated random number as the seed.)
If so...
Is the default initialization on system time exact enough to pull a truly random number considering that the system times in the for loop here would be so close together or maybe even the same (depending on the precision of the system time.)
Is there a way to initialize a random module outside of the function and have the function pull the next random number (so to avoid multiple initializations.)
Any other more pythonic ways to accomplish this?
Thank you very much!
Use random.seed() if you want to initialize the pseudo-random number generator yourself.
You can have a look here:
If you don’t initialize the pseudo-random number generator using a
random.seed (), internally random generator call the seed function and
use current system current time value as the seed value. That’s why
whenever we execute random.random() we always get a different value
If you always want different numbers, you should not bother initializing the random module yourself, since by default it internally uses the current system time as the seed (which is always different).
Just use:
from random import random
def coinFlip():
    if random() > .5:
        return "Heads"
    else:
        return "Tails"
To be clear, the random module is not re-initialized each time it is used, only at import time, so every call to random.random() returns the next number in the generator's sequence.
For starters:
This module implements pseudo-random number generators for various distributions.
[..]
The functions supplied by this module are actually bound methods of a hidden instance of the random.Random class. You can instantiate your own instances of Random to get generators that don’t share state.
https://docs.python.org/3/library/random.html
The random module is a Pseudo-Random Number Generator. All PRNGs are entirely deterministic and have state. Meaning, if the PRNG is in the same state, the next "random" number will always be the same. As the above paragraph explains, your rd.random() call is really a call to an implicitly instantiated Random object.
So:
Does the random module embedded within a function initialize with a new seed each time the function is called?
No.
Is there a way to initialize a random module outside of the function and have the function pull the next random number (so to avoid multiple initializations.)
You don't need to avoid multiple initialisation, as it's not happening. You can instantiate your own Random object if you want to control the state exactly.
class random.Random([seed])
Class that implements the default pseudo-random number generator used by the random module.
random.seed(a=None, version=2)
Initialize the random number generator. If a is omitted or None, the current system time is used. [..]
So, the implicitly instantiated Random object uses the system time as initial seed (read further though), and from there will keep state. So each time you start your Python instance, it will be seeded differently, but will be seeded only once.
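A minimal sketch of the "instantiate your own Random object" option: the generator is created once outside the function and passed in, so each call simply pulls the next number. The names coin_flip and flips are illustrative, not from the question.

```python
import random

def coin_flip(rng):
    # Pull the next number from the generator that was passed in;
    # nothing is re-seeded here.
    return "Heads" if rng.random() > 0.5 else "Tails"

def flips(seed, n):
    rng = random.Random(seed)   # seeded exactly once per sequence
    return [coin_flip(rng) for _ in range(n)]

# With seed=None the sequence differs from run to run.
print(flips(None, 5))

# With an explicit seed the whole sequence is reproducible.
reproducible = flips(42, 10) == flips(42, 10)
print(reproducible)             # True
```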

Reset global variables in timeit.repeat

Scenario
Let test be the module we run as __main__. This module contains one global variable named primes, which is initialized in the module with the following assignment.
primes = []
The module also contains a function named pi, which alters this global variable:
def pi(n):
    global primes
    """Some code that modifies the global 'primes' variable"""
I then want to time said function using the builtin timeit module. I want to use the timeit.repeat function and get the minimum value of the timing, as a way of improving the measurement's accuracy (instead of measuring just one time, which may be subject to slow-down due to unrelated processes).
print(min(timeit.repeat('test.pi(50000)',
                        setup="import test",
                        number=1, repeat=10)) * 1000)
The problem is that the pi function behaves differently depending on the value of primes: I expected that, for each repetition, the import test statement in the setup parameter would re-run the primes = [] statement in the test, thus 'resetting' primes so that the code being executed would be identical for each repetition. But, instead, the value of primes that resulted from the previous execution is used, so I had to add the statement test.primes = [] to the setup parameter:
print(min(timeit.repeat('test.pi(50000)',
                        setup="import test \n" + "test.primes = []",
                        number=1, repeat=10)) * 1000)
Question
This leads me to the question: is there a direct way (i.e. in one statement) to 'reset' the values of all the global variables to what they were when they were first assigned in the module?
In this specific scenario adding that one statement to manually 'reset' primes works fine, but consider a case in which there are a lot of global variables, and you want to 'reset' all of them.
Side quest-ion
Why doesn't the statement import test re-run the initial primes = [] assignment?
Let's start with your side question, because it turns out that it's actually central to everything:
Why doesn't the statement import test re-run the initial primes = [] assignment?
Because, as explained in the docs on the import system and the import statement, what import test does is, loosely, this pseudocode:
if 'test' not in sys.modules:
    find, load (compiling if needed), and exec the module
    sys.modules['test'] = result
test = sys.modules['test']
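The caching can be observed directly. This sketch uses the standard-library json module as an arbitrary example; the marker attribute is hypothetical and exists only for the demonstration.

```python
import sys
import json            # first import: the module is loaded and cached
import json as again   # second import: fetched from sys.modules

same_module = again is json
cached = sys.modules["json"] is json
print(same_module, cached)      # True True

# State stored on the module object survives later imports, which is
# why a repeated "import test" does not reset the module's globals.
json.marker = "set once"        # hypothetical attribute, illustration only
import json as third
print(third.marker)             # set once
```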
OK, but why does it do that?
If you have two modules that both import the same module, they expect to see the same globals. And remember that types, functions, etc. defined at the top level of a module are all globals. For example, if sortedlist.py imports collections.abc to define class SortedList(collections.abc.Sequence):, and scraper.py imports collections.abc to check isinstance(something, collections.abc.Sequence), you'd want a SortedList to pass that test, but it won't if those are two completely independent types because they came from two different module objects that happen to have the same name.
If you have 12 modules that all import pandas as pd, you'd be running all the Pandas initialization code 12 times. Except that some of your modules also probably import each other, so they'd each be run multiple times, and import Pandas each time. How long do you think it would take to run all the Pandas initialization 60 times?
So, reusing existing modules is almost always what you want.
And when you don't, that's usually a sign that there's something wrong with your design (which may well be the case here).
But "almost always" isn't "always". So there are ways around it. None of them are usually a good idea for live code, but for things like unit tests and benchmarking, there are three basic options that are all fine, as long as the tradeoffs are the ones you want:
del sys.modules['test']. This is obviously pretty hacky, but it actually does exactly what you want here. Any existing references to the old module are completely untouched, but the next time anyone does import test, they're going to get a brand-new test module.
importlib.reload(test). This sounds great, but it may on the one hand be overkill (notice that it forces the module source to be recompiled, which you don't need), while on the other it may not be sufficient (it doesn't actually reset the globals—if your code does primes = [] at the top level, that line gets executed, so who cares, but if your code instead does, say, globals().setdefault('primes', []) inside the pi function, you care).
Instead of import test, manually do all the steps up through executing the module (see the examples in the importlib docs), but don't store it in sys.modules['test'] or in test, just store it in a local variable you discard after each test. This is probably the cleanest, although it does mean 6 lines of code instead of 1.
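The third option might look roughly like the sketch below. It assumes the module lives in a plain .py file; the file content, the name test_fresh and the helper load_fresh are hypothetical stand-ins for the question's test module.

```python
import importlib.util
import os
import tempfile

# Hypothetical stand-in for the question's module with a mutable global.
SOURCE = (
    "primes = []\n"
    "\n"
    "def pi(n):\n"
    "    primes.append(n)\n"
    "    return len(primes)\n"
)

def load_fresh(path, name="test_fresh"):
    """Execute the file as a new module object without using sys.modules."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)   # runs the top-level code again
    return module

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_fresh.py")
    with open(path, "w") as fh:
        fh.write(SOURCE)
    first = load_fresh(path)
    first.pi(50000)
    second = load_fresh(path)         # globals start over in this copy
    print(first.primes, second.primes)   # [50000] []
```

For benchmarking, a call such as load_fresh could go in the setup code so that every repetition starts from a freshly executed module.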

Implementing common random numbers in a simulation

I am building a small simulation in Python and I would like to use Common Random Numbers to reduce variation. I know that I must achieve synchronization for CRN to work:
CRN requires synchronization of the random number streams, which ensures that in addition to using the same random numbers to simulate all configurations, a specific random number used for a specific purpose in one configuration is used for exactly the same purpose in all other configurations.
I was wondering if the way I wanted to implement it in my simulation was valid or if I should be using a different approach.
My simulation has three different classes (ClassA, ClassB, ClassC), and ClassA objects have random travel times, ClassB objects have random service times and random usage rates, and ClassC objects have random service times. Of course there can be multiple instances of each class of object.
At the start of the simulation I specify a single random number seed (replication_seed) so that I can use a different seed for each simulation replication.
import numpy.random as npr
rep_rnd_strm = npr.RandomState(replication_seed)
Then in the constructor for each Class, I use rep_rnd_strm to generate a seed that is used to initialize the random number stream for the instance of the class:
self.class_rnd_strm = npr.RandomState(rep_rnd_strm.randint(10000000))
I then use self.class_rnd_strm to generate a seed for each random number stream needed for the class instance. For example the constructor of ClassA has:
self.travel_time_strm = npr.RandomState(self.class_rnd_strm.randint(10000000))
while the constructor of ClassB has:
self.service_time_strm = npr.RandomState(self.class_rnd_strm.randint(10000000))
self.usage_rate_strm = npr.RandomState(self.class_rnd_strm.randint(10000000))
Is what I am doing here a valid approach to getting synchronization to work, or should I be doing things differently?
Yes. That is a valid approach to make it replicable, but only if you can guarantee that there is no randomness in the order in which the various instances of the various classes are instantiated.
This is because if they are instantiated in a different order, then they will get a different seed for their random number generator.
If I understand the question correctly, you are seeding with a (pseudo) random number, so you're not synchronizing your CRN.
Consider:
self.travel_time_strm = npr.RandomState(seed=self.seed)
self.service_time_strm = npr.RandomState(seed=self.seed)
self.usage_rate_strm = npr.RandomState(seed=self.seed)
Where self.seed is an instance variable set per class, perhaps from a keyword argument.
def __init__(self, *args, **kwargs):
    self.seed = None
    if kwargs.get('seed'):
        self.seed = kwargs.get('seed')
One initial setting of the seed at the start of your simulation run is good for the reproducible random numbers allowing you to reproduce any given simulation run exactly by re-using its seed.
Do nothing in your classes other than use calls to random.[method of your choice] to get your initial random numbers. Don't touch the seed again in the simulation run.
As long as all your randomness uses the random package and you avoid resetting the seed during the run, this should give you the behaviour you need.
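One way to sidestep the instantiation-order concern raised above is to derive each stream's seed from stable identifiers instead of from a parent stream. The sketch below is only an illustration: it uses the standard-library random.Random with string seeds rather than numpy's RandomState, the helper make_stream and the identifiers are hypothetical, and it makes no claim about the statistical independence of the resulting streams.

```python
import random

def make_stream(replication_seed, entity_id, purpose):
    # Hypothetical helper: the seed depends only on stable identifiers,
    # not on the order in which objects happen to be constructed.
    return random.Random(f"{replication_seed}/{entity_id}/{purpose}")

# Configuration 1 creates A before B.
a1 = make_stream(2024, "A-1", "travel_time")
b1 = make_stream(2024, "B-1", "service_time")

# Configuration 2 creates them in the opposite order.
b2 = make_stream(2024, "B-1", "service_time")
a2 = make_stream(2024, "A-1", "travel_time")

travel_synced = a1.random() == a2.random()
service_synced = b1.random() == b2.random()
print(travel_synced, service_synced)   # True True
```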

how to query seed used by random.random()?

Is there any way to find out what seed Python used to seed its random number generator?
I know I can specify my own seed, but I'm quite happy with Python managing it. But, I do want to know what seed it used, so that if I like the results I'm getting in a particular run, I could reproduce that run later. If I had the seed that was used then I could.
If the answer is I can't, then what's the best way to generate a seed myself? I want them to always be different from run to run---I just want to know what was used.
UPDATE: yes, I mean random.random()! mistake... [title updated]
It is not possible to get the automatic seed back out from the generator. I normally generate seeds like this:
import random
import sys
seed = random.randrange(sys.maxsize)
rng = random.Random(seed)
print("Seed was:", seed)
This way it is time-based, so each time you run the script (manually) it will be different, but if you are using multiple generators they won't have the same seed simply because they were created almost simultaneously.
The state of the random number generator isn't always simply a seed. For example, a secure PRNG typically has an entropy buffer, which is a larger block of data.
You can, however, save and restore the entire state of the random number generator, so you can reproduce its results later on:
import random
old_state = random.getstate()
print(random.random())
random.setstate(old_state)
print(random.random())
# You can also restore the state into your own instance of the PRNG, to avoid
# thread-safety issues from using the default, global instance.
prng = random.Random()
prng.setstate(old_state)
print(prng.random())
The results of getstate can, of course, be pickled if you want to save it persistently.
http://docs.python.org/library/random.html#random.getstate
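A short sketch of the pickling idea mentioned above. The state is kept in memory as bytes here; writing those bytes to a file would work the same way. The variable names are illustrative.

```python
import pickle
import random

rng = random.Random()
saved = pickle.dumps(rng.getstate())    # bytes that could be written to disk

expected = [rng.random() for _ in range(3)]

# Later (possibly in another session): restore the state and replay.
restored = random.Random()
restored.setstate(pickle.loads(saved))
replayed = [restored.random() for _ in range(3)]
print(replayed == expected)             # True
```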
You can subclass random.Random and rewrite the seed() method the same way Python does (v3.5 in this example), storing the seed value in a variable before calling super():
import random
class Random(random.Random):
    def seed(self, a=None, version=2):
        from os import urandom as _urandom
        from hashlib import sha512 as _sha512
        if a is None:
            try:
                # Seed with enough bytes to span the 19937 bit
                # state space for the Mersenne Twister
                a = int.from_bytes(_urandom(2500), 'big')
            except NotImplementedError:
                import time
                a = int(time.time() * 256) # use fractional seconds
        if version == 2:
            if isinstance(a, (str, bytes, bytearray)):
                if isinstance(a, str):
                    a = a.encode()
                a += _sha512(a).digest()
                a = int.from_bytes(a, 'big')
        self._current_seed = a
        super().seed(a)
    def get_seed(self):
        return self._current_seed
If you test it, a first random value generated with a new seed and a second value generated using the same seed (with the get_seed() method we created) will be equal:
>>> rnd1 = Random()
>>> seed = rnd1.get_seed()
>>> v1 = rnd1.randint(1, 0x260)
>>> rnd2 = Random(seed)
>>> v2 = rnd2.randint(1, 0x260)
>>> v1 == v2
True
If you store/copy the huge seed value and try using it in another session the value generated will be exactly the same.
Since no one has mentioned that the best random sample you can get in any programming language usually comes from the operating system, here is the following code:
import os
random_data = os.urandom(8)
seed = int.from_bytes(random_data, byteorder="big")
This is cryptographically secure.
Source: https://www.quora.com/What-is-the-best-way-to-generate-random-seeds-in-python
With a value of 8, it seems to produce around the same number of digits as sys.maxsize for me.
>>> int.from_bytes(os.urandom(8), byteorder="big")
17520563261454622261
>>> sys.maxsize
9223372036854775807
>>>
If you "set" the seed using random.seed(None), the randomizer is automatically seeded as a function of the system time. However, you can't access this value, as you observed. What I do when I want to randomize but still know the seed is this:
import datetime
import random
tim = datetime.datetime.now()
randseed = tim.hour*10000+tim.minute*100+tim.second
random.seed(randseed)
Note: the reason I prefer this to using time.time() as proposed by @Abdallah is that this way the randseed is human-readable and immediately understandable, which often has big benefits. Date components and even microseconds could also be added as needed.
I wanted to do the same thing but could not get the seed. Since the default seed is generated from the time, I created my own seed from the system time and used it, so now I know which seed was used.
import random
import time
SEED = int(time.time())
random.seed(SEED)
The seed is an internal variable in the random package which is used to create the next random number. When a new number is requested, the seed is updated, too.
I would simply use 0 as a seed if you want to be sure to have the same random numbers every time, or make it configurable.
CorelDraw once had a random pattern generator, which was initialized with a seed. Patterns varied drastically for different seeds, so the seed was important configuration information of the pattern. It should be part of the config options for your runs.
EDIT: As noted by ephemient, the internal state of a random number generator may be more complex than the seed, depending on its implementation.
