For Python 3, I can find many different places on the internet stating that the default seed for the random module is based on system time.
Is this also the case for Python 2.7? I imagine it is, because if I start two different Python processes, and in both I do import random; random.random() then the two different processes return different results.
If it does use system time, what is the actual seed used? (E.g. "number of seconds since midnight" or "number of microseconds since UNIX epoch", or ...)
If not, what is used to seed the PRNG?
This is the source code about how to generate default seed for a Random object.
try:
# Seed with enough bytes to span the 19937 bit
# state space for the Mersenne Twister
a = long(_hexlify(_urandom(2500)), 16)
except NotImplementedError:
import time
a = long(time.time() * 256) # use fractional seconds
urandom equals to os.urandom. And for more information about urandom, please check this page.
Related
I am trying to generate true random numbers that would be considered cryptographically secure in MicroPython (a variant of Python that is used for microcontrollers). MicroPython does not currently support Python's secrets library.
I understand that I can use os.urandom to generate cryptographically secure random numbers, but would like to bring in the conveniences of setting minimums, maximums, ranges, choices, etc... that are available in Python's (and MicroPython's) random library.
In order to do this, I am contemplating "seeding" the pseudo random number generator with a sufficiently large input from os.urandom (please see example code below). This code considers some of the concepts described here: https://stackoverflow.com/a/72908523/17870197
What are the security implications of this approach? Would numbers output by this code be considered cryptographically secure?
import os
import random
count = 4
def generate_true_random_int(min_int, max_int):
seed_bytes = os.urandom(32)
seed_int = int.from_bytes(seed_bytes, "big")
random.seed(seed_int)
return random.randint(min_int, max_int)
for x in range(count):
min_int = 1
max_int = 9999
true_random_int = generate_true_random_int(min_int, max_int)
print(true_random_int)
I am running a code that could potentially benefit from different initialization(s) of random number generators. I use libraries torch and python. I am using the following lines of code to set random seed at the beginning of every iteration.
import numpy as np
import torch
seed = np.random.randint(0, 1000)
print(f"Seed: {seed}")
np.random.seed(seed)
torch.manual_seed(seed)
For some reason though, across (many) iterations I have observed that the seed is always set to one value, 688 in my case. What I do not understand is that the generation of the seed variable is not governed by the seed that is set later. So why does the same seed get set every time and how do I fix it? Thanks.
In your example, you initialize the default random number generator implicitly by not calling and providing seed for the RandomState class. In such cases, NumPy obtains an alternative source for the seed which may be not random enough.
Furthermore, it is not considered as a good practice to generate a random number on a small set of numbers and use it to seed the random number generator, because the probability that you'll generate the same seed is high. However, if you have similar seed values and a not too good initialization, it is a common practice to use a fast, tiny, but maybe not too good random number generator to create good quality seed values, or the whole initial state itself. But there is no need to do it manually, because NumPy's legacy random implementation follows a specific case of a scientifically sound approach [1] that ensures good different initial states even on similar (e.g. adjacent) seed values. I.e. you can seed your simulations with 0 to 1000, and the random numbers you get with NumPy in the different iterations will look completely different. You can also use this seed value to identify your calculation when you save it, or when you create statistics.
I am not sure about the implementation of random number generator in torch though. It seems it takes a 64-bit integer. If it suits your need, you can generate random numbers with NumPy's engine on this range and use at as a seed value. If you make 2 simulations, the probability that the 2 seed values are the same is 1/2^{64} ~ 5 * 10^{-20}.
With the example below, it is ensured that the state of NumPy's random generator is different in each iteration of the for loop, and the random state of torch is most probably different in each iteration
.
import numpy as np
import torch
max_sim = 3 # how many simulations you need
for numpy_seed in range(max_sim):
np.random.seed(numpy_seed)
torch_seed = np.random.randint(low=-2**63,
high=2**63,
dtype=np.int64)
print(torch_seed)
torch.manual_seed(torch_seed)
# do the rest of the simulation
# output:
# 900450186894289455
# -1530673954295414549
# -1180685649882019313
[1]: Matsumoto, Makoto and Wada, Isaku and Kuramoto, Ai and Ashihara, Hyo,
Title: Common Defects in Initialization of Pseudorandom Number Generators; around equation 30
I cannot reproduce your result as well like #iacob, and I believe the script to set the seed has no problem.
I am debugging a 64-bit Linux ELF binary which uses time() to generate a seed. Then this seed is used by srand() to seed the random number generator. And rand() is used to generate the random number.
I have the value of the seed and now I am trying to reproduce the same result as the binary.
seed = 0x93ae5c6
srand(seed)
rand() returns 0x000000003173C91C
If I use Python to generate the random number, I get a different result
import random
random.seed(0x93ae5c6)
random.random() returns 0.8019104241491927
Is it because Python generates random numbers in a different way than glibc on Linux?
Try this
Python port of the GLIBC rng
I have such code and use Jupyter-Notebook
for j in range(timesteps):
a_int = np.random.randint(largest_number/2) # int version
and i get random numbers, but when i try to move part of code to the functions, i start to receive same number in each iteration
def create_train_data():
np.random.seed(seed=int(time.time()))
a_int = np.random.randint(largest_number/2) # int version
return a
for j in range(timesteps):
c = create_train_data()
Why it's happend and how to fix it? i think maybe it because of processes in Jupyter-Notebook
The offending line of code is
np.random.seed(seed=int(time.time()))
Since you're executing in a loop that completes fairly quickly, calling int() on the time reduces your random seed to the same number for the entire loop. If you really want to manually set the seed, the following is a more robust approach.
def create_train_data():
a_int = np.random.randint(largest_number/2) # int version
return a
np.random.seed(seed=int(time.time()))
for j in range(timesteps):
c = create_train_data()
Note how the seed is being created once and then used for the entire loop, so that every time a random integer is called the seed changes without being reset.
Note that numpy already takes care of a pseudo-random seed. You're not gaining more random results by using it. A common reason for manually setting the seed is to ensure reproducibility. You set the seed at the start of your program (top of your notebook) to some fixed integer (I see 42 in a lot of tutorials), and then all the calculations follow from that seed. If somebody wants to verify your results, the stochasticity of the algorithms can't be a confounding factor.
The other answers are correct in saying that it is because of the seed. If you look at the Documentation From SciPy you will see that seeds are used to create a predictable random sequence. However, I think the following answer from another question regarding seeds gives a better overview of what it does and why/where to use it.
What does numpy.random.seed(0) do?
Hans Musgrave's answer is great if you are happy with pseudo-random numbers. Pseudo-random numbers are good for most applications but they are problematic if used for cryptography.
The standard approach for getting one truly random number is seeding the random number generator with the system time before pulling the number, like you tried. However, as Hans Musgrave pointed out, if you cast the time to int, you get the time in seconds which will most likely be the same throughout the loop. The correct solution to seed the RNG with a time is:
def create_train_data():
np.random.seed()
a_int = np.random.randint(largest_number/2) # int version
return a
This works because Numpy already uses the computer clock or another source of randomness for the seed if you pass no arguments (or None) to np.random.seed:
Parameters: seed : {None, int, array_like}, optional Random seed used
to initialize the pseudo-random number generator. Can be any integer
between 0 and 2**32 - 1 inclusive, an array (or other sequence) of
such integers, or None (the default). If seed is None, then
RandomState will try to read data from /dev/urandom (or the Windows
analogue) if available or seed from the clock otherwise.
It all depends on your application though. Do note the warning in the docs:
Warning The pseudo-random generators of this module should not be used
for security purposes. For security or cryptographic uses, see the
secrets module.
I ran into this today and can't figure out why. I have several functions chained together that perform some time consuming operations as part of a larger pipeline. I've included these here, pared down to a test example, as best as I could. The issue is that when I call a function directly, I get the expected output (e.g., 5 different trees). However, when I call the same function in a multiprocessing pool with apply_async (or apply, doesn't matter), I get 5 trees, but they are all the same.
I've documented this in an IPython notebook, which can be viewed here: http://nbviewer.ipython.org/gist/cfriedline/0e275d528ff1a8d674c6
In cell 91, I create 5 trees (each with 10 tips), and return two lists. The first containing the non-multiprocessing trees, and the second from apply_async.
In cell 92, you can see the results of creating trees without multiprocessing, and in 93, with multiprocessing.
What I expect is that there would be a total of 10 different trees between the two tests, but instead all of the multiprocessing trees are identical. Makes little sense to me.
Relevant versions of things:
Linux 2.6.18-238.12.1.el5 x86_64 GNU/Linux
Python 2.7.6 :: Anaconda 1.9.2 (64-bit)
IPython 2.0.0
Rpy2 2.3.9
Thanks!
Chris
I solved this one, with a point in the right direction from #mgilson. In fact, it was a random number problem, just not in python - in R (sigh). The state of R is copied when the Pool is created, meaning so is its random seed. To fix, just a little rpy2 as below calling R's set.seed function (with some process specific stuff for good measure):
def create_tree(num_tips, type):
"""
creates the taxa tree in R
#param num_tips: number of taxa to create
#param type: type for naming (e.g., 'taxa')
#return: a dendropy Tree
#rtype: dendropy.Tree
"""
r = rpy2.robjects.r
set_seed = r('set.seed')
set_seed(int((time.time()+os.getpid()*1000)))
rpy2.robjects.globalenv['numtips'] = num_tips
rpy2.robjects.globalenv['treetype'] = type
name = _get_random_string(20)
if type == "T":
r("%s = rtree(numtips, rooted=T, tip.label=paste(treetype, seq(1:(numtips)), sep=''))" % name)
else:
r("%s = rtree(numtips, rooted=F, tip.label=paste(treetype, seq(1:(numtips)), sep=''))" % name)
tree = r[name]
return ape_to_dendropy(tree)
I'm not 100% familiar with these libraries, however, on Linux, (IIRC) multiprocessing uses os.fork. This means that the state of the random module (which you're using) will also be forked and that each of your processes will generate the same sequence of random numbers resulting in a not-so-random _get_random_string function.
If I'm right, and you make the pool smaller than the number of trees that you want, you should see that you get groups of N identical trees (where N is the number of pools).
I think that probably the ideal solution is to re-seed the random number generator inside of each of the processes. It's unlikely that they'll run at exactly the same time, so you should get differing results.