The code below attempts to illustrate what I want. I basically want two instances of "random" that operate independently of each other: I want to seed "random" within one class without affecting "random" in another class. How can I do that?
class RandomSeeded:
    def __init__(self, seed):
        import random as r1
        self.random = r1
        self.random.seed(seed)
    def get(self):
        print(self.random.choice([4, 5, 6, 7, 8, 9, 2, 3, 4, 5, 6, 7]))

class Random:
    def __init__(self):
        import random as r2
        self.random = r2
        self.random.seed()
    def get(self):
        print(self.random.choice([4, 5, 6, 7, 8, 9, 2, 3, 4, 5, 6, 7]))

if __name__ == '__main__':
    t = RandomSeeded('asdf')
    t.get()    # random is seeded within t
    s = Random()
    s.get()
    t.get()    # random should still be seeded within t, but is no longer
Class random.Random exists specifically to allow the behavior you want -- modules are intrinsically singletons, but classes are meant to be multiply instantiated, so both kinds of needs are covered.
Should you ever need an independent copy of a module (which you definitely don't in the case of random!), try using copy.deepcopy on it -- in many cases it will work. However, the need is very rare, because modules don't normally keep global mutable state except by keeping one privileged instance of a class they also offer for "outside consumption" (other examples besides random include fileinput).
For the seeded random numbers, make your own instance of random.Random. The random documentation explains this class; the module-level functions are really bound methods of a single hidden instance of it, which is what gets shared when you use the functions directly.
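For example, a minimal sketch of that approach applied to the question's classes (names adapted slightly so the second class doesn't shadow random.Random; random.Random accepts the same seed values as random.seed()):

import random

class RandomSeeded:
    def __init__(self, seed):
        self.random = random.Random(seed)    # private, seeded generator
    def get(self):
        print(self.random.choice([4, 5, 6, 7, 8, 9, 2, 3, 4, 5, 6, 7]))

class Unseeded:
    def __init__(self):
        self.random = random.Random()        # seeded from OS entropy / time
    def get(self):
        print(self.random.choice([4, 5, 6, 7, 8, 9, 2, 3, 4, 5, 6, 7]))

t = RandomSeeded('asdf')
s = Unseeded()
t.get()
s.get()     # does not disturb t's generator
t.get()     # continues t's seeded sequence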
Sadly, having two independent RNGs can be less random than having a single RNG and using an "offset" into the generated sequence.
Using an "offset" means you generate both complete sequences of samples up front, and then use them for your simulation. Something like this:
import random

def makeSequences(sequences=2, size=1000000):
    g = random.Random()
    return [[g.random() for _ in range(size)] for _ in range(sequences)]

t, s = makeSequences(2)
RNGs can only be proven to have desirable randomness properties for a single seed and a single sequence of numbers. Because two parallel sequences use the same constants for the multiplier and modulus, there's a chance that they have a detectable correlation with each other.
Related
I'm looking for some info on generating numbers that are as random as possible when the random module is used inside a function, like so:
import random as rd

def coinFlip():
    flip = rd.random()
    if flip > .5:
        return "Heads"
    else:
        return "Tails"

def main():
    for i in range(1000000):
        print(coinFlip())

main()
Edit: Ideally the above script would always yield different results, which limits my ability to use random.seed().
Does the random module embedded within a function initialize with a new seed each time the function is called? (Instead of using the previous generated random number as the seed.)
If so...
Is the default initialization based on system time exact enough to pull a truly random number, considering that the system times within the for loop here would be so close together or maybe even the same (depending on the precision of the system time)?
Is there a way to initialize the random module outside of the function and have the function pull the next random number (so as to avoid multiple initializations)?
Any other more pythonic ways to accomplish this?
Thank you very much!
Use random.seed() if you want to initialize the pseudo-random number generator yourself.
You can have a look here:
If you don't initialize the pseudo-random number generator using random.seed(), then internally the random generator calls the seed function and uses the current system time as the seed value. That's why we get a different value whenever we execute random.random().
If you always want a different number, then you should not bother with initializing the random module, since internally it is seeded by default from the current system time (which is always different).
Just use:
from random import random

def coinFlip():
    if random() > .5:
        return "Heads"
    else:
        return "Tails"
To make it more clear: the random module is not re-initialized each time it is used, only at import time, so every call to random.random() gives you the next number from the generator, which will practically always differ from the previous one.
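A quick way to see this behaviour for yourself (a small illustrative snippet, not part of the quoted material):

import random

# Two plain calls just advance the module's single hidden generator:
print(random.random(), random.random())     # almost certainly two different values

# Re-seeding with a fixed value restarts the sequence reproducibly:
random.seed(42)
first = [random.random() for _ in range(3)]
random.seed(42)
second = [random.random() for _ in range(3)]
print(first == second)                      # True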
For starters:
This module implements pseudo-random number generators for various distributions.
[..]
The functions supplied by this module are actually bound methods of a hidden instance of the random.Random class. You can instantiate your own instances of Random to get generators that don’t share state.
https://docs.python.org/3/library/random.html
The random module is a Pseudo-Random Number Generator. All PRNGs are entirely deterministic and have state. Meaning, if the PRNG is in the same state, the next "random" number will always be the same. As the above paragraph explains, your rd.random() call is really a call to an implicitly instantiated Random object.
So:
Does the random module embedded within a function initialize with a new seed each time the function is called?
No.
Is there a way to initialize a random module outside of the function and have the function pull the next random number (so to avoid multiple initializations.)
You don't need to avoid multiple initialisations, as they are not happening. You can instantiate your own Random object if you want to control the state exactly.
class random.Random([seed])
Class that implements the default pseudo-random number generator used by the random module.
random.seed(a=None, version=2)
Initialize the random number generator. If a is omitted or None, the current system time is used. [..]
So, the implicitly instantiated Random object uses the system time as initial seed (read further though), and from there will keep state. So each time you start your Python instance, it will be seeded differently, but will be seeded only once.
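If you do want explicit control over the state, a minimal sketch of using your own Random instance for the coin flips (the seed value 1234 is arbitrary):

import random

rng = random.Random(1234)        # dedicated generator with its own state

def coinFlip():
    return "Heads" if rng.random() > .5 else "Tails"

flips = [coinFlip() for _ in range(10)]
print(flips)                     # reproducible: re-seeding rng replays this sequence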
I have a bunch of functions which apply to a similar object, for example a Numpy array which represents an n-dimensional box:
import numpy as np

# 3-D box parameterized as:
#   box[0] = 3-D min coordinate
#   box[1] = 3-D max coordinate
box = np.array([
    [1, 3, 0],
    [4, 5, 7]
])
Now I have a whole bunch of functions that I want to run on lists of boxes, eg. volumes, intersection, smallest_containing_box, etc. In my mind here is the way I was hoping to set this up:
# list of test functions:
test_funcs = [volume, intersection, smallest_containing_box, ...]

# manually create a bunch of inputs and outputs
test_set_1 = (
    input = [boxA, boxB, ...],    # where each of these are np.array objects
    output = [
        [volA, volB, ...],        # floats I calculated manually
        intersection,             # np.array representing the correct intersection
        smallest_containing_box,  # etc.
    ]
)

# Create a bunch of these, eg. test_set_2, test_set_3, etc. and bundle them in a list:
test_sets = [test_set_1, ...]

# Now run the set of tests over each of these:
test_results = [[assertEqual(test(t.input), t.output) for test in test_funcs] for t in test_sets]
The reason I want to structure it this way is so that I can create multiple sets of (input, answer) pairs and just run all the tests over each. Unless I'm missing something, the structure of unittest doesn't seem to work well with this approach. Instead, it seems like it wants me to create an individual TestCase object for each pair of function and input, i.e.
class TestCase1(unittest.TestCase):
    def setUp(self):
        self.input = [...]
        self.volume = [volA, volB, ...]
        self.intersection = ...
        # etc.
    def test_volume(self):
        self.assertEqual(volume(self.input), self.volume)
    def test_intersection(self):
        self.assertEqual(intersection(self.input), self.intersection)
    # etc.

# Repeat this for every test case!?
This seems like a crazy amount of boilerplate. Am I missing something?
Let me try to describe how I understand your approach: you have implemented a number of different functions that have a similarity, namely that they operate on the same types of input data. In your tests you try to make use of that similarity: you create some input data and pass that input data to all of your functions.
This test-data centric approach is unusual. The typical unit-testing approach is code-centric. The reason is, that one primary goal of unit-testing is to find bugs in the code. Different functions have (obviously) different code, and therefore the types of bugs may be different. Thus, test data is typically carefully designed to identify certain kinds of bugs in the respective code. Test design methods are approaches that methodically design test cases such that ideally all likely bugs would be detected.
I am sceptical that with your test-data centric approach you will be equally successful in finding the bugs in your different functions: for the volume function there may be overflow scenarios (and also underflow scenarios) that don't apply to intersection or smallest_containing_box. In contrast, there will have to be empty intersections, one-point intersections, etc. Thus it seems that each of the functions probably needs specifically designed test scenarios.
Regarding the boilerplate code that seems to be the consequence of code-centric unit-testing: there are several ways to limit it. You would, agreed, have different test methods for different functions under test. But then you could use parameterized tests to avoid further code duplication. And, for the cases where you still see an advantage in using (at least sometimes) common test data for the different functions, you can use factory functions that create the test data and can be called from the different test cases. For example, you could have a factory function make_unit_cube to be used from different tests, as in the sketch below.
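For illustration, here is a sketch of the parameterized-test plus factory-function idea using unittest's subTest; the volume stand-in and make_unit_cube are assumptions for the sake of a runnable example, not your actual code:

import unittest
import numpy as np

def make_unit_cube():
    # factory function: a box several test cases can share
    return np.array([[0, 0, 0],
                     [1, 1, 1]])

def volume(box):
    # stand-in for the function under test: product of the edge lengths
    return float(np.prod(box[1] - box[0]))

class TestVolume(unittest.TestCase):
    def test_volumes(self):
        # (box, expected volume) pairs; subTest reports each failing case separately
        cases = [
            (make_unit_cube(), 1.0),
            (np.array([[1, 3, 0], [4, 5, 7]]), 42.0),
        ]
        for box, expected in cases:
            with self.subTest(box=box.tolist()):
                self.assertAlmostEqual(volume(box), expected)

if __name__ == '__main__':
    unittest.main()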
Try unittest.TestSuite(). That gives you an object where you can add test cases. In your case, create the suite, then loop over your lists, creating instances of TestCase which all have only a single test method. Pass the test data to the constructor and save them to properties there instead of in setUp().
The unit test runner will detect the suite when you create it in a method called suite() and run all of them.
Note: Assign a name to each TestCase instance, or it will be very hard to find out which one failed.
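A minimal sketch of that suite-building approach; FunctionCheck and the example cases are illustrative names, not part of unittest itself:

import unittest

class FunctionCheck(unittest.TestCase):
    # One TestCase instance per (name, function, input, expected) tuple.
    def __init__(self, name, func, data, expected):
        super().__init__('run_check')
        self.name = name
        self.func = func
        self.data = data
        self.expected = expected

    def run_check(self):
        self.assertEqual(self.func(self.data), self.expected)

    def __str__(self):
        return self.name        # so failures are identifiable in the report

def suite():
    # Illustrative cases; replace with your box functions and test sets.
    cases = [
        ('sum of list', sum, [1, 2, 3], 6),
        ('max of list', max, [1, 2, 3], 3),
    ]
    s = unittest.TestSuite()
    for name, func, data, expected in cases:
        s.addTest(FunctionCheck(name, func, data, expected))
    return s

if __name__ == '__main__':
    unittest.TextTestRunner(verbosity=2).run(suite())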
I have a Python class with a number of functions. Evaluating each function requires a long list of constants (a few dozen); the constants are different for each function. Each function will be called many times (millions), so performance is a major concern. What is the proper way to handle this situation, both in terms of readability and in terms of speed? (I'm relatively new to python)
I could make the constants class attributes, but this seems to clash with the safety provided by scopes (each constant is used only locally in a single function). Also, it would be convenient to be able to use the same variable names in different functions. Besides, I find that all the self.s make the code unreadable, especially when trying to keep line length under 79.
I now have the constants defined locally at the start of each function. This is reasonably clear, but I'm not sure if it's optimal in terms of performance? Is new memory allocated and freed for every function call? Especially since you apparently can't declare proper constants in Python.
Maybe it's better to put the parameters for each function in a separate class and pass an object of this class as an argument? This would maintain the proper scopes while ensuring each constant is defined only once?
I ran some speed tests using the logistic map with single parameter r as an example. Defining r locally at the start of the function gives the best performance, while setting r either as a class attribute or passing it to the function as an argument (individually or as attribute of an instance of a parameter class) raises computation time by roughly 10%. So I suppose the interpreter recognizes the fact that a locally defined r is a constant when the object is created. (Note that I test with only a single parameter, so the difference may be larger in more complicated models.)
In case anyone is curious, here is the code of the test:
import matplotlib.pyplot as plt
import timeit

# Define class
class Mod:
    def __init__(self):
        self.reset()
        # self.r = 3.9          # -> uncomment when testing class attribute

    def reset(self):
        self.x = 0.5

    def step(self):             # -> add r when testing argument
        # ARGUMENT
        # self.x = r*self.x*(1-self.x)        # +10% computation time
        # CLASS ATTRIBUTE
        # self.x = self.r*self.x*(1-self.x)   # +10% computation time
        # LOCAL
        r = 3.9
        self.x = r*self.x*(1-self.x)

# Length of test
trials = int(1e5)
T = int(1e3)

m = Mod()
# r = 3.9                       # -> uncomment when testing argument

# start timer
t0 = timeit.default_timer()
for _ in range(trials):
    x = []
    m.reset()
    for _ in range(T):
        m.step()                # -> add r when testing argument
        x.append(m.x)           # do something with data to avoid interpreter tricks

# print computation time
print(timeit.default_timer() - t0)

plt.plot(x)
plt.show()
On the random module python page (Link Here) there is this warning:
Warning: The pseudo-random generators of this module should not be used for security purposes. Use os.urandom() or SystemRandom if you
require a cryptographically secure pseudo-random number generator.
So what's the difference between os.urandom() and random?
Is one closer to a true random than the other?
Would the secure random be overkill in non-cryptographic instances?
Are there any other random modules in python?
You can read up on the distinction of cryptographically secure RNG in this fantastic answer over at Crypto.SE.
The main distinction between random and the system RNG like urandom is one of use cases. random implements deterministic PRNGs. There are scenarios where you want exactly those. For instance when you have an algorithm with a random element which you want to test, and you need those tests to be repeatable. In that case you want a deterministic PRNG which you can seed.
urandom on the other hand cannot be seeded and draws its source of entropy from many unpredictable sources, making it more random.
True random is something else yet and you'd need a physical source of randomness like something that measures atomic decay; that is truly random in the physical sense, but usually overkill for most applications.
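A short sketch of the practical difference (illustrative only):

import random

# Deterministic, seedable PRNG: the same seed reproduces the same sequence,
# which is what you want for repeatable tests or simulations.
prng = random.Random(1234)
print([prng.randint(0, 9) for _ in range(5)])

# OS-backed generator: draws from os.urandom(); calls to seed() are ignored,
# so the sequence cannot be reproduced.
sysrng = random.SystemRandom()
print([sysrng.randint(0, 9) for _ in range(5)])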
So what's the difference between os.urandom() and random?
Random itself is predictable. That means that given the same seed, the sequence of numbers generated by random is the same. Take a look at this question for a better explanation. This question also illustrates that random isn't really random.
This is generally the case for most programming languages - the generation of random numbers is not truly random. You can use these numbers when
cryptographic security is not a concern or if you want the same pattern of numbers to be generated.
Is one closer to a true random than the other?
Not sure how to answer this question because truly random numbers cannot be generated. Take a look at this article or this question for more information.
Since random generates a repeatable pattern, I would say that os.urandom() is certainly more "random".
Would the secure random be overkill in non-cryptographic instances?
I wrote the following functions and there doesn't appear to be a huge time difference. However, if you don't need cryptographically secure numbers
it doesn't really make sense to use os.urandom(). Again it comes down to the use case, do you want a repeatable pattern, how "random" do you want your numbers, etc?
import time
import os
import random

def generate_random_numbers(x):
    start = time.time()
    random_numbers = []
    for _ in range(x):
        random_numbers.append(random.randrange(1, 10, 1))
    end = time.time()
    print(end - start)

def generate_secure_randoms(x):
    start = time.time()
    random_numbers = []
    for _ in range(x):
        random_numbers.append(os.urandom(1))
    end = time.time()
    print(end - start)

generate_random_numbers(10000)
generate_secure_randoms(10000)
Results:
0.016040563583374023
0.013456106185913086
Are there any other random modules in python?
Python 3.6 introduces the new secrets module
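A brief illustration of what the secrets module offers (a small sketch, not tied to the question's code):

import secrets

token = secrets.token_hex(16)               # 32-character hex string for keys/tokens
flip = secrets.choice(["Heads", "Tails"])   # cryptographically strong choice
n = secrets.randbelow(100)                  # integer in [0, 100)
print(token, flip, n)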
random implements a pseudo-random number generator. Knowing the algorithm and its parameters, we can predict the generated sequence. At the end of this answer is a possible implementation of a linear pseudo-random generator in Python, which shows that the generator can be a simple linear function.
os.urandom uses system entropy sources to have better random generation. Entropy sources are something that we cannot predict, like asynchronous events. For instance the frequency that we hit the keyboard keys cannot be predicted.
Interrupts from other devices can also be unpredictable.
In the random module there is a class, SystemRandom, which uses os.urandom() to generate random numbers.
Actually, it cannot be proven whether a given sequence is random or not. Andrey Kolmogorov worked this out extensively around the 1960s.
One can think that a sequence is random when the rules to obtain the sequence, in any given language, are larger than the sequence itself. Take for instance the following sequence, which seems random:
264338327950288419716939937510
However, we can also represent it as:
pi digits 21 to 50
Since we found a way to represent the sequence smaller than the sequence itself, the sequence is not random. We could even think of a more compact language to represent it, say:
pi[21,50]
or yet another.
But the smallest rules in the most compact language (or the smallest algorithm, if you will) to generate the sequence may never be found, even if they exist.
Finding them depends only on human intelligence, which is not absolute.
There might be a definitive way to prove if a sequence is random, but we will only know it when someone finds it. Or maybe there is no way to prove if randomness even exists.
An implementation of a LCG (Linear congruent generator) in Python can be:
from datetime import datetime
class LCG:
defaultSeed = 0
defaultMultiplier = 1664525
defaultIncrement = 1013904223
defaultModulus = 0x100000000
def __init__(self, seed, a, c, m):
self._x0 = seed #seed
self._a = a #multiplier
self._c = c #increment
self._m = m #modulus
#classmethod
def lcg(cls, seed = None):
if seed is None: seed = cls.defaultSeed
return LCG(int(seed), cls.defaultMultiplier,
cls.defaultIncrement, cls.defaultModulus)
#pre: bound > 0
#returns: pseudo random integer in [0, bound[
def randint(self, bound):
self._x0 = (self._a * self._x0 + self._c) % self._m
return int(abs(self._x0 % bound))
#generate a sequence of 20 digits
rnd = LCG.lcg(datetime.now().timestamp()) #diff seed every time
for i in range(20):
print(rnd.randint(10), end='')
print()
I am building a small simulation in Python and I would like to use Common Random Numbers to reduce variation. I know that I must achieve synchronization for CRN to work:
CRN requires synchronization of the random number streams, which ensures that in addition to using the same random numbers to simulate all configurations, a specific random number used for a specific purpose in one configuration is used for exactly the same purpose in all other configurations.
I was wondering if the way I wanted to implement it in my simulation was valid or if I should be using a different approach.
My simulation has three different classes (ClassA, ClassB, ClassC), and ClassA objects have random travel times, ClassB objects have random service times and random usage rates, and ClassC objects have random service times. Of course there can be multiple instances of each class of object.
At the start of the simulation I specify a single random number seed (replication_seed) so that I can use a different seed for each simulation replication.
import numpy.random as npr
rep_rnd_strm = npr.RandomState(replication_seed)
Then in the constructor for each Class, I use rep_rnd_strm to generate a seed that is used to initialize the random number stream for the instance of the class:
self.class_rnd_strm = npr.RandomState(rep_rnd_strm.randint(10000000))
I then use self.class_rnd_strm to generate a seed for each random number stream needed for the class instance. For example the constructor of ClassA has:
self.travel_time_strm = npr.RandomState(self.class_rnd_strm.randint(10000000))
while the constructor of ClassB has:
self.service_time_strm = npr.RandomState(self.class_rnd_strm.randint(10000000))
self.usage_rate_strm = npr.RandomState(self.class_rnd_strm.randint(10000000))
Is what I am doing here a valid approach to getting synchronization to work, or should I be doing things differently?
Yes. That is a valid approach to make it replicable, but only if you can guarantee that there is no randomness in the order in which the various instances of the various classes are instantiated.
This is because if they are instantiated in a different order, then they will get a different seed for their random number generator.
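One way to reduce that order dependence is to derive each stream's seed from its purpose rather than from the instantiation order. A minimal sketch (the make_stream helper and the offset table are illustrative assumptions, not from the question's code):

import numpy.random as npr

REPLICATION_SEED = 12345   # one seed per replication (value is arbitrary here)

# Fixed per-purpose offsets keep each stream tied to its role regardless of
# the order in which objects are instantiated.
PURPOSE_OFFSETS = {'travel_time': 1, 'service_time': 2, 'usage_rate': 3}

def make_stream(purpose):
    # one dedicated, reproducibly seeded stream per purpose
    return npr.RandomState(REPLICATION_SEED + PURPOSE_OFFSETS[purpose])

travel_times = make_stream('travel_time').exponential(scale=5.0, size=3)
service_times = make_stream('service_time').exponential(scale=2.0, size=3)
print(travel_times, service_times)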
If I understand the question correctly, you are seeding with a (pseudo) random number, so you're not synchronizing your CRN.
Consider:
self.travel_time_strm = npr.RandomState(seed=self.seed)
self.service_time_strm = npr.RandomState(seed=self.seed)
self.usage_rate_strm = npr.RandomState(seed=self.seed)
Where self.seed is an instance variable set per class, perhaps from a keyword argument:
def __init__(self, *args, **kwargs):
    self.seed = None
    if kwargs.get('seed'):
        self.seed = kwargs.get('seed')
One initial setting of the seed at the start of your simulation run gives you reproducible random numbers, allowing you to reproduce any given simulation run exactly by re-using its seed.
Do nothing in your classes other than use calls to random.[method of your choice] to get your initial random numbers. Don't touch the seed again in the simulation run.
As long as all your randomness uses the random package and you avoid resetting the seed during the run, this should give you the behaviour you need.