Python NumPy: Random number in a loop

I have the following code, running in a Jupyter notebook:

for j in range(timesteps):
    a_int = np.random.randint(largest_number/2)  # int version

and I get random numbers. But when I move part of the code into a function, I start to receive the same number on every iteration:

def create_train_data():
    np.random.seed(seed=int(time.time()))
    a_int = np.random.randint(largest_number/2)  # int version
    return a_int

for j in range(timesteps):
    c = create_train_data()

Why does this happen and how do I fix it? I suspect it has something to do with how Jupyter notebooks run their processes.

The offending line of code is
np.random.seed(seed=int(time.time()))
Since you're executing in a loop that completes fairly quickly, calling int() on the time reduces your random seed to the same number for the entire loop, so the generator is reset to the same state on every call. If you really want to set the seed manually, the following is a more robust approach.
def create_train_data():
    a_int = np.random.randint(largest_number/2)  # int version
    return a_int

np.random.seed(seed=int(time.time()))
for j in range(timesteps):
    c = create_train_data()
Note how the seed is set once and then used for the entire loop, so the generator's state advances with every random integer drawn instead of being reset on each call.
Note that NumPy already takes care of seeding its pseudo-random generator; you're not getting more random results by seeding it yourself. A common reason for manually setting the seed is to ensure reproducibility: you set the seed at the start of your program (top of your notebook) to some fixed integer (42 shows up in a lot of tutorials), and all the calculations follow from that seed. If somebody wants to verify your results, the stochasticity of the algorithms can't be a confounding factor.
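For example, a minimal sketch of that reproducibility setup (the fixed value 42 and the randint call are just illustrations):

import numpy as np

np.random.seed(42)  # fixed seed at the top of the notebook

# Every run now produces the same "random" sequence, so anyone can
# reproduce the calculations exactly.
print(np.random.randint(50, size=5))  # same five integers on every run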

The other answers are correct in saying that it is because of the seed. If you look at the NumPy documentation (hosted on docs.scipy.org) you will see that seeds are used to create a predictable random sequence. However, I think the following answer to another question about seeds gives a better overview of what seeding does and why/where to use it.
What does numpy.random.seed(0) do?

Hans Musgrave's answer is great if you are happy with pseudo-random numbers. Pseudo-random numbers are good for most applications but they are problematic if used for cryptography.
The standard approach to getting a fresh random number is to seed the random number generator with the system time before pulling the number, like you tried. However, as Hans Musgrave pointed out, if you cast the time to int you get the time in seconds, which will most likely be the same throughout the loop. The correct way to re-seed the RNG is:
def create_train_data():
    np.random.seed()
    a_int = np.random.randint(largest_number/2)  # int version
    return a_int
This works because NumPy already uses the computer clock or another source of randomness for the seed if you pass no arguments (or None) to np.random.seed. From the docs:

Parameters: seed : {None, int, array_like}, optional. Random seed used to initialize the pseudo-random number generator. Can be any integer between 0 and 2**32 - 1 inclusive, an array (or other sequence) of such integers, or None (the default). If seed is None, then RandomState will try to read data from /dev/urandom (or the Windows analogue) if available, or seed from the clock otherwise.
It all depends on your application though. Do note the warning in the docs:
Warning: The pseudo-random generators of this module should not be used for security purposes. For security or cryptographic uses, see the secrets module.
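For completeness, if you do need cryptographically strong values, a minimal sketch with the standard-library secrets module (as the warning suggests) looks like this:

import secrets

# Cryptographically strong alternatives to the numpy/random generators.
token = secrets.token_hex(16)             # 32-character hex string
value = secrets.randbelow(10000)          # integer in [0, 10000)
choice = secrets.choice(['a', 'b', 'c'])  # secure choice from a sequence
print(token, value, choice)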

Related

Is it "cryptographically secure" to seed MicroPython pseudo random number generator with an input from os.urandom?

I am trying to generate true random numbers that would be considered cryptographically secure in MicroPython (a variant of Python that is used for microcontrollers). MicroPython does not currently support Python's secrets library.
I understand that I can use os.urandom to generate cryptographically secure random numbers, but would like to bring in the conveniences of setting minimums, maximums, ranges, choices, etc... that are available in Python's (and MicroPython's) random library.
In order to do this, I am contemplating "seeding" the pseudo random number generator with a sufficiently large input from os.urandom (please see example code below). This code considers some of the concepts described here: https://stackoverflow.com/a/72908523/17870197
What are the security implications of this approach? Would numbers output by this code be considered cryptographically secure?
import os
import random

count = 4

def generate_true_random_int(min_int, max_int):
    seed_bytes = os.urandom(32)
    seed_int = int.from_bytes(seed_bytes, "big")
    random.seed(seed_int)
    return random.randint(min_int, max_int)

for x in range(count):
    min_int = 1
    max_int = 9999
    true_random_int = generate_true_random_int(min_int, max_int)
    print(true_random_int)

Setting a different seed for each run of the code

I am running code that could potentially benefit from different initializations of the random number generators. I use the torch and numpy libraries. I am using the following lines of code to set the random seed at the beginning of every iteration.
import numpy as np
import torch
seed = np.random.randint(0, 1000)
print(f"Seed: {seed}")
np.random.seed(seed)
torch.manual_seed(seed)
For some reason though, across (many) iterations I have observed that the seed is always set to one value, 688 in my case. What I do not understand is why: the generation of the seed variable should not be governed by the seed that is only set afterwards. So why does the same seed get set every time, and how do I fix it? Thanks.
In your example, you initialize the default random number generator implicitly, by never providing a seed to the RandomState class before the first draw. In such cases, NumPy obtains the seed from an alternative source, which may not be random enough.
Furthermore, it is not considered good practice to draw a random number from a small range and use it to seed the random number generator, because the probability of generating the same seed twice is high. If you have similar seed values and a not-too-good initialization routine, it is common practice to use a fast, tiny (and perhaps not very good) random number generator to create good-quality seed values, or even the whole initial state. With NumPy, however, there is no need to do this manually: its legacy random implementation follows a special case of a scientifically sound approach [1] that produces well-separated initial states even for similar (e.g. adjacent) seed values. That is, you can seed your simulations with 0 to 1000, and the random numbers you get from NumPy in the different iterations will look completely different. You can also use the seed value to identify your calculation when you save it, or when you compile statistics.
I am not sure about the implementation of the random number generator in torch, though. It seems to take a 64-bit integer seed. If that suits your needs, you can generate the seed with NumPy's engine over that range and use it as the seed value for torch. If you run 2 simulations, the probability that the 2 seed values are the same is 1/2^64 ≈ 5 * 10^-20.
With the example below, the state of NumPy's random generator is guaranteed to be different in each iteration of the for loop, and the random state of torch is almost certainly different in each iteration.
import numpy as np
import torch

max_sim = 3  # how many simulations you need
for numpy_seed in range(max_sim):
    np.random.seed(numpy_seed)
    torch_seed = np.random.randint(low=-2**63,
                                   high=2**63,
                                   dtype=np.int64)
    print(torch_seed)
    torch.manual_seed(torch_seed)
    # do the rest of the simulation

# output:
# 900450186894289455
# -1530673954295414549
# -1180685649882019313
[1] Matsumoto, M., Wada, I., Kuramoto, A., and Ashihara, H.: Common Defects in Initialization of Pseudorandom Number Generators (see around equation 30).
Like #iacob, I cannot reproduce your result either, and I believe the script that sets the seed has no problem.

random: what is the default seed?

For Python 3, I can find many different places on the internet stating that the default seed for the random module is based on system time.
Is this also the case for Python 2.7? I imagine it is, because if I start two different Python processes, and in both I do import random; random.random() then the two different processes return different results.
If it does use system time, what is the actual seed used? (E.g. "number of seconds since midnight" or "number of microseconds since UNIX epoch", or ...)
If not, what is used to seed the PRNG?
This is the source code that generates the default seed for a Random object (from the Python 2 random module):
try:
    # Seed with enough bytes to span the 19937 bit
    # state space for the Mersenne Twister
    a = long(_hexlify(_urandom(2500)), 16)
except NotImplementedError:
    import time
    a = long(time.time() * 256)  # use fractional seconds
_urandom here is os.urandom. For more information about urandom, see its documentation.
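A small sketch of the equivalent in Python 3, seeding two independent generators explicitly from os.urandom (roughly what the default seeding path above does whenever os.urandom is available):

import os
import random

# Draw plenty of entropy from the OS and use it as each generator's seed.
r1 = random.Random(int.from_bytes(os.urandom(2500), "big"))
r2 = random.Random(int.from_bytes(os.urandom(2500), "big"))

print(r1.random())  # almost certainly differs from r2's first value
print(r2.random())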

Sympy reconfigures the randomness seed

Using the Python symbolic computation module Sympy in a simulation is difficult: I need reliable, fixed inputs, so I use seed() from the random module.
However, every time I call even a simple sympy function, it seems to overwrite the seed with a new value, so I get new output every time. I have searched a little bit and found related questions, but neither of them has a solution.
Consider this code:
from sympy import *
import random

random.seed(1)
for _ in range(2):
    x = symbols('x')
    equ = (x ** random.randint(1, 5)) ** Rational(random.randint(1, 5) / 2)
    print(equ)
This outputs
(x**2)**(5/2)
x**4
on the first run, and
(x**2)**(5/2)
(x**5)**(3/2)
on the second run, and every time I run the script it returns new output. I need a way to fix this so that seed() is actually enforced.
Does this help? From the docs on random:
"You can instantiate your own instances of Random to get generators that don’t share state"
Usage:
import random
# Create a new pseudo random number generator
prng = random.Random()
prng.seed(1)
This number generator will be unaffected by sympy (or by anything else that touches the module-level functions in random).
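Applied to the loop from the question, a sketch could look like this; the output is then identical on every run, no matter what sympy does to the global generator:

from sympy import symbols, Rational
import random

prng = random.Random()  # private generator with its own state
prng.seed(1)

for _ in range(2):
    x = symbols('x')
    equ = (x ** prng.randint(1, 5)) ** Rational(prng.randint(1, 5) / 2)
    print(equ)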

Python something resets my random seed

My question is the exact opposite of this one.
This is an excerpt from my test file
f1 = open('seed1234','r')
f2 = open('seed7883','r')
s1 = eval(f1.read())
s2 = eval(f2.read())
f1.close()
f2.close()
####
test_sampler1.random_inst.setstate(s1)
out1 = test_sampler1.run()
self.assertEqual(out1,self.out1_regress) # this is fine and passes
test_sampler2.random_inst.setstate(s2)
out2 = test_sampler2.run()
self.assertEqual(out2,self.out2_regress) # this FAILS
Some info -
test_sampler1 and test_sampler2 are two objects of a class that performs some stochastic sampling. The class has an attribute random_inst, which is an object of type random.Random(). The file seed1234 contains a TestSampler's random_inst state as returned by random.getstate() when it was seeded with 1234, and you can guess what seed7883 is. What I did was create a TestSampler in the terminal, give it a random seed of 1234, acquire the state with rand_inst.getstate(), and save it to a file. When I then re-run the regression test I always get the same output.
HOWEVER
The same procedure as above doesn't work for test_sampler2 - whatever I do, I do not get the same random sequence of numbers. I am using Python's random module and I am not importing it anywhere else, but I do use numpy in some places (though not numpy.random).
The only difference between test_sampler1 and test_sampler2 is that they are created from 2 different files. I know this is a big deal and it is totally dependent on the code I wrote but I also can't simply paste ~800 lines of code here, I am merely looking for some general idea of what I might be messing up...
What might be scrambling the state of test_sampler2's random number generator?
Solution
There were 2 separate issues with my code:
1. My script is a command line script, and after refactoring it to use Python's optparse library I found out that I was setting the seed for my sampler with something like seed = sys.argv[1], which meant the seed was a str, not an int (seed accepts any hashable object, as I found out the hard way). This explains why I would get two different sequences for the "same" seed: one when running the script from the command line with something like python sample 1234 (seed is the string "1234"), and another from my unit_tests.py file, where I would create an instance like test_sampler1 = TestSampler(seed=1234).
2. I have a function for discrete distribution sampling which I borrowed from here (see the accepted answer). The code there was missing something fundamental: it was still non-deterministic in the sense that if you give it the same values and probabilities arrays but permuted (say values ['a','b'] with probs [0.1,0.9] versus values ['b','a'] with probs [0.9,0.1]), the seeded PRNG gives you the same random sample, say 0.3, but because the cumulative probability intervals are laid out differently, in one case you get a 'b' and in the other an 'a'. To fix it, I zipped the values and probabilities together and sorted by probability, and tadaa - I now always get the same probability intervals.
After fixing both issues the code worked as expected i.e. out2 started behaving deterministically.
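A minimal sketch of that second fix (a hypothetical helper, not the actual code from the answer linked above):

import random

def sample_discrete(values, probs, rng):
    # Sort the (value, probability) pairs so the cumulative intervals do not
    # depend on the order in which the caller passed the arrays.
    # (Equal probabilities would need a secondary sort key to stay deterministic.)
    pairs = sorted(zip(values, probs), key=lambda vp: vp[1])
    r = rng.random()
    cumulative = 0.0
    for value, prob in pairs:
        cumulative += prob
        if r <= cumulative:
            return value
    return pairs[-1][0]  # guard against floating-point round-off

rng = random.Random(1234)
print(sample_discrete(['a', 'b'], [0.1, 0.9], rng))
rng = random.Random(1234)
print(sample_discrete(['b', 'a'], [0.9, 0.1], rng))  # same result as above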
The only thing (apart from an internal Python bug) that can change the state of a random.Random instance is calling methods on that instance. So the problem lies in something you haven't shown us. Here's a little test program:
from random import Random

r1 = Random()
r2 = Random()
for _ in range(100):
    r1.random()
for _ in range(200):
    r2.random()

r1state = r1.getstate()
r2state = r2.getstate()
with open("r1state", "w") as f:
    print >> f, r1state
with open("r2state", "w") as f:
    print >> f, r2state

for _ in range(100):
    with open("r1state") as f:
        r1.setstate(eval(f.read()))
    with open("r2state") as f:
        r2.setstate(eval(f.read()))
    assert r1state == r1.getstate()
    assert r2state == r2.getstate()
I haven't run that all day, but I bet I could and never see a failing assert ;-)
BTW, it's certainly more common to use pickle for this kind of thing, but it's not going to solve your real problem. The problem is not in getting or setting the state. The problem is that something you haven't yet found is calling methods on your random.Random instance(s).
While it's a major pain in the butt to do so, you could try adding print statements to random.py to find out what's doing it. There are cleverer ways to do that, but better to keep it dirt simple so that you don't end up actually debugging the debugging code.
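For the record, one of those cleverer ways is to swap a tracing subclass of random.Random onto the suspect sampler (a sketch using the names from the question; only random() is traced here, but other methods can be wrapped the same way):

import random
import traceback

class TracingRandom(random.Random):
    def random(self):
        traceback.print_stack(limit=3)  # show who is consuming the stream
        return super(TracingRandom, self).random()

test_sampler2.random_inst = TracingRandom()
test_sampler2.random_inst.setstate(s2)  # state format matches random.Random
out2 = test_sampler2.run()              # unexpected callers now show up in the traces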
