Why am I getting different bootstrap results using different algorithms? - python

I am using two different methods of trying to generate a bootstrap sample:
import numpy as np
np.random.seed(335)
y=np.random.normal(0,1,5)
b=np.empty(len(y)) #initializes an empty vector
for j in range(len(y)):
    a = np.random.randint(1,len(y)) #Draws a random integer from 1 to n, where n is our sample size
    b[j] = y[a-1] #indices in python start at zero, the worst part of Python in my opinion
c = np.random.choice(y, size=5)
print(b)
print(c)
and for my output I get different results:
[1.04749432 1.71963433 1.71963433 1.71963433 1.71963433]
[-0.25224454 -0.25224454 0.46604474 1.71963433 0.46604474]
I think the answer has something to do with the random number generator, but I'm confused as to the exact reason.

This comes down to the use of different algorithms for randomized selection. There are numerous equivalent ways to select items at random with replacement using a pseudorandom generator (or to generate random variates from any other distribution). In particular, the algorithm for numpy.random.choice need not make use of numpy.random.randint at all. What matters is that these equivalent ways produce the same distribution of random variates, not the same individual values. To see exactly what each function does, look at NumPy's source code.
Another, less important, reason for different results is that the two selection procedures (randint and choice) each consume pseudorandom numbers, and those numbers differ because the procedures didn't begin with the same seed (more precisely, at the same point in the sequence of pseudorandom numbers). If we set the seed to the same value before beginning each procedure:
import numpy as np
np.random.seed(335)
y=np.random.normal(0,1,5)
b=np.empty(len(y))
np.random.seed(999999) # Seed selection procedure 1
for j in range(len(y)):
    a = np.random.randint(1,len(y))
    b[j] = y[a-1]
np.random.seed(999999) # Seed selection procedure 2
c = np.random.choice(y, size=5)
print(b)
print(c)
then each procedure will begin with the same pseudorandom numbers. But even so, the two procedures may use different algorithms for random selection, and these differences may still lead to different results.
(However, numpy.random.* functions, such as randint and choice, have become legacy functions as of NumPy 1.17, and their algorithms are expected to remain as they are for backward compatibility reasons. That version didn't deprecate any numpy.random.* functions, however, so they are still available for the time being. See also this question. In newer applications you should make use of the new system introduced in version 1.17, including numpy.random.Generator, if you have that version or later. One advantage of the new system is that the application relies less on global state.)
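As a rough sketch of what the newer system looks like for this bootstrap, assuming NumPy 1.17 or later: the variable names below follow the question, and `rng` and `idx` are names introduced here for illustration. Note that the upper bound of `Generator.integers` is exclusive by default, as it is for the legacy `randint`, so the question's `randint(1, len(y))` followed by `y[a-1]` can never select the last element of `y`. Drawing zero-based indices over the full range avoids that.

```python
import numpy as np

# A local generator; nothing here touches NumPy's global random state.
rng = np.random.default_rng(335)
y = rng.normal(0, 1, 5)
n = len(y)

# Method 1: draw zero-based indices explicitly.
# The upper bound n is exclusive, so the indices cover 0 .. n-1.
idx = rng.integers(0, n, size=n)
b = y[idx]

# Method 2: let choice() do the resampling with replacement.
c = rng.choice(y, size=n, replace=True)

print(b)
print(c)
```

Both arrays are valid bootstrap samples of `y`, but they are not expected to be equal, for the reasons given above.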

Related

Need help finding GCD (noob approach)

I am currently going through the book Math Adventures with Python by Peter Farrell. For now I am simply trying to improve my math skills while learning Python in a fun way. We made a factors function, as seen below:
def factors(num):
    factorList = []
    for i in range(1, num+1):
        if num % i == 0:
            factorList.append(i)
    return factorList
Exercise 3-1 asks us to make a GCF (Greatest Common Factor) function. All the answers here use built-in Python modules, recursion, or Euclid's algorithm. I have no clue what any of these things mean, let alone how to try them on this assignment. I came up with the following solution using the above function:
def gcFactor(num1, num2):
    fnum1 = factors(num1)
    fnum2 = factors(num2)
    gcf = list(set(fnum1).intersection(fnum2))
    return max(gcf)
print(gcFactor(28,21))
Is this the best way of doing it? Using the .intersection() function seems a little cheaty to me.
What I wanted to do instead is use a loop to go through the values in fnum1 and fnum2, compare them, and return the greatest value that appears in both lists (a value in both is a common factor, and the greatest one is the GCF).
The idea behind your algorithm is sound, but there are a few problems:
In your original version, you used gcf[-1] to get the greatest factor, but that will not always work, since converting a set to a list does not guarantee that the elements will be in sorted order, even if they were sorted before converting to a set. Better to use max (you already changed that).
Using set.intersection is definitely not "cheating" but just making good use of what the language provides. It might be considered cheating to just use math.gcd, but not basic set or list functions.
Your algorithm is rather inefficient. I don't know the book, but I don't think you are actually meant to use the factors function to calculate the GCF; that was probably just an exercise to teach you things like loops and modulo. Consider two very different numbers as inputs, say 23764372 and 6. You'd calculate all the factors of 23764372 first, before testing the very few values that could actually be common factors. Instead of using factors directly, try to rewrite your gcFactor function to test which values up to the min of the two numbers are factors of both numbers.
Even then, your algorithm will not be very efficient. I would suggest reading up on Euclid's Algorithm and trying to implement that next. If you are unsure whether you did it right, you can use your first function as a reference for testing, and to see the difference in performance.
About your factors function itself: note that there is a symmetry: if i is a factor of n, so is n//i. If you use this, you do not have to test all the values up to n but just up to sqrt(n), which reduces the running time from O(n) to O(√n).
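The three suggestions above could be sketched roughly as follows. The function names `gcf_by_testing`, `gcf_euclid` and `factors_fast` are made up for this illustration (they are not from the book), and all three assume positive integer inputs. `math.isqrt` needs Python 3.8 or later.

```python
import math

def gcf_by_testing(num1, num2):
    """Test candidates from min(num1, num2) downwards; the first hit is the GCF."""
    for i in range(min(num1, num2), 0, -1):
        if num1 % i == 0 and num2 % i == 0:
            return i

def gcf_euclid(num1, num2):
    """Euclid's algorithm: replace (a, b) with (b, a mod b) until b is 0."""
    while num2:
        num1, num2 = num2, num1 % num2
    return num1

def factors_fast(num):
    """Collect factors in pairs (i, num // i), testing only up to sqrt(num)."""
    found = set()
    for i in range(1, math.isqrt(num) + 1):
        if num % i == 0:
            found.add(i)
            found.add(num // i)
    return sorted(found)

print(gcf_by_testing(28, 21))
print(gcf_euclid(23764372, 6))
print(factors_fast(28))
```

For the 23764372 and 6 example, `gcf_by_testing` checks at most six candidates and `gcf_euclid` needs only a couple of modulo steps, whereas the factor-list approach loops over all 23764372 candidates first.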

Does Python have a function to mimic the sequence of C's rand()?

I am looking for a Python function that will mimic the behavior of rand() (and srand()) in C with the following requirements:
I can provide the same epoch time to the Python equivalent of srand() to seed the function
The equivalent of rand()%256 should result in the same char value as in C if both were provided the same seed.
So far, I have considered both the random library and NumPy's random library. Instead of returning a random integer from 0 to 32767 as C does, though, both return a floating-point number between 0 and 1 from their random functions. When I tried random.randint(0,32767), I got different results than from my C function.
TL;DR
Is there an existing function/library in Python that follows the same random sequence as C?
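You can accomplish this with CDLL from ctypes (https://docs.python.org/3/library/ctypes.html), which calls the C library's own srand and rand:
from ctypes import CDLL
libc = CDLL("libc.so.6")  # glibc on Linux; the library name differs on other platforms
libc.srand(42)
print(libc.rand() % 32768)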
You can't make a Python version of rand and srand functions "follo[w] the same random sequence" of C's rand and srand because the C standard doesn't specify exactly what that sequence is, even if the seed is given. Notably:
rand uses an unspecified pseudorandom number algorithm, and that algorithm can differ between C implementations, including versions of the same standard library.
rand returns values no greater than RAND_MAX, and RAND_MAX can differ between C implementations.
In general, the best way to "sync" PRNGs between two programs in different languages is to implement the same PRNG in both languages. This may be viable assuming you're not using pseudorandom numbers for information security purposes. See also this question: How to sync a PRNG between C#/Unity and Python?
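As one illustration of implementing the same PRNG on both sides, here is a minimal Python sketch of the portable example generator that the C standard gives in its description of rand (a linear congruential generator with RAND_MAX of 32767). The class name `PortableRand` is hypothetical. This will only match a C program that uses this same function; real C libraries such as glibc use different algorithms for rand(), so it is not a drop-in match for an arbitrary libc.

```python
class PortableRand:
    """Python port of the example rand()/srand() from the C standard (assumed
    to be the generator used on the C side as well)."""

    RAND_MAX = 32767

    def __init__(self, seed=1):
        self.srand(seed)

    def srand(self, seed):
        # Keep the state within 32 bits, like an unsigned 32-bit integer in C.
        self._next = seed & 0xFFFFFFFF

    def rand(self):
        # next = next * 1103515245 + 12345;
        # return (unsigned int)(next / 65536) % 32768;
        self._next = (self._next * 1103515245 + 12345) & 0xFFFFFFFF
        return (self._next // 65536) % 32768


gen = PortableRand()
gen.srand(1)
print([gen.rand() % 256 for _ in range(5)])
```

The output uses only bits 16 to 30 of the state, so it does not matter whether the C side stores the state in a 32-bit or a 64-bit unsigned long.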
You can use random.seed(). The following example should print the same sequence every time it runs:
import random
random.seed(42)
for _ in range(10):
    print(random.randint(0, 32767))
Update: Just saw the last comment by the OP. No, this code won't give you the same sequence as the C code, because of reasons given in other comments. Two different C implementations won't agree either.

Whats the difference between os.urandom() and random?

On the Python documentation page for the random module there is this warning:
Warning: The pseudo-random generators of this module should not be used for security purposes. Use os.urandom() or SystemRandom if you require a cryptographically secure pseudo-random number generator.
So what's the difference between os.urandom() and random?
Is one closer to a true random than the other?
Would the secure random be overkill in non-cryptographic instances?
Are there any other random modules in Python?
You can read up on the distinction of cryptographically secure RNG in this fantastic answer over at Crypto.SE.
The main distinction between random and the system RNG like urandom is one of use cases. random implements deterministic PRNGs. There are scenarios where you want exactly those, for instance when you have an algorithm with a random element which you want to test, and you need those tests to be repeatable. In that case you want a deterministic PRNG which you can seed.
urandom, on the other hand, cannot be seeded and draws its entropy from many unpredictable sources, which makes its output unpredictable rather than reproducible.
True randomness is something else yet: you'd need a physical source of randomness, like something that measures atomic decay. That is truly random in the physical sense, but overkill for most applications.
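A small sketch of that distinction, using only the standard library. The seed value 2024 and the variable names are arbitrary choices for this example.

```python
import os
import random

# Two independent generators seeded identically produce the same sequence.
first = random.Random(2024)
second = random.Random(2024)
seq_a = [first.randint(1, 100) for _ in range(5)]
seq_b = [second.randint(1, 100) for _ in range(5)]
print(seq_a == seq_b)  # True: same seed, same sequence

# os.urandom takes no seed; each call asks the operating system for fresh bytes.
print(os.urandom(8).hex())
print(os.urandom(8).hex())
```

The first part is what makes tests repeatable; there is no way to make the two os.urandom lines print the same value on demand.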
So what's the difference between os.urandom() and random?
random itself is predictable. That means that given the same seed, the sequence of numbers generated by random is the same. Take a look at this question for a better explanation. This question also illustrates that random isn't really random.
This is generally the case for most programming languages: the generation of random numbers is not truly random. You can use these numbers when cryptographic security is not a concern or if you want the same pattern of numbers to be generated.
Is one closer to a true random than the other?
Not sure how to answer this question, because truly random numbers cannot be generated by an algorithm alone. Take a look at this article or this question for more information.
Since random generates a repeatable pattern, I would say that os.urandom() is certainly more "random".
Would the secure random be overkill in non-cryptographic instances?
I wrote the following functions and there doesn't appear to be a huge time difference. However, if you don't need cryptographically secure numbers, it doesn't really make sense to use os.urandom(). Again, it comes down to the use case: do you want a repeatable pattern, how "random" do you want your numbers, etc.?
import time
import os
import random

def generate_random_numbers(x):
    start = time.time()
    random_numbers = []
    for _ in range(x):
        random_numbers.append(random.randrange(1,10,1))
    end = time.time()
    print(end - start)

def generate_secure_randoms(x):
    start = time.time()
    random_numbers = []
    for _ in range(x):
        random_numbers.append(os.urandom(1))
    end = time.time()
    print(end - start)

generate_random_numbers(10000)
generate_secure_randoms(10000)
Results:
0.016040563583374023
0.013456106185913086
Are there any other random modules in python?
Python 3.6 introduced the secrets module, which is intended for generating cryptographically strong random values.
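A brief sketch of what using secrets can look like. The variable names and the colour list are only illustrative.

```python
import secrets

# A 16-byte random token rendered as 32 hexadecimal characters.
token = secrets.token_hex(16)

# A random integer in the range 0 .. 9 (the upper bound is exclusive).
digit = secrets.randbelow(10)

# A random element from a non-empty sequence.
pick = secrets.choice(["red", "green", "blue"])

print(token, digit, pick)
```

Unlike the functions in random, none of these can be seeded, so the output is not reproducible from run to run.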
random implements a pseudo random number generator. Knowing the algorithm and the parameters we can predict the generated sequence. At the end of the text is a possible implementation of a linear pseudo random generator in Python, that shows the generator can be a simple linear function.
os.urandom uses system entropy sources to get better random generation. Entropy sources are things that we cannot predict, like asynchronous events. For instance, the timing with which we hit the keyboard keys cannot be predicted.
Interrupts from other devices can also be unpredictable.
In the random module there is a class, SystemRandom, which uses os.urandom() to generate random numbers.
Actually, it cannot be proven whether a given sequence is random or not. Andrey Kolmogorov worked on this extensively in the 1960s.
One can think of a sequence as random when the rules to obtain the sequence, in any given language, are larger than the sequence itself. Take for instance the following sequence, which seems random:
264338327950288419716939937510
However we can represent it also as:
pi digits 21 to 50
Since we found a way to represent the sequence smaller than the sequence itself, the sequence is not random. We could even think of a more compact language to represent it, say:
pi[21,50]
or yet another.
But the smallest rules, in the most compact language (or the smallest algorithm, if you will), to generate the sequence may never be found, even if they exist.
Finding them depends only on human intelligence, which is not absolute.
There might be a definitive way to prove whether a sequence is random, but we will only know it when someone finds it. Or maybe there is no way to prove that randomness even exists.
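As a rough, practical stand-in for the "shorter description" idea, one can ask a general-purpose compressor how small it can make some data. This is only a sketch: zlib gives an upper bound on the description length, so data that fails to compress is not thereby proven random. The variable names are chosen for this example.

```python
import os
import zlib

# 10,000 bytes with an obvious rule: "repeat 0123456789 a thousand times".
patterned = b"0123456789" * 1000

# 10,000 bytes from the operating system's entropy source.
unpredictable = os.urandom(10_000)

for name, data in [("patterned", patterned), ("os.urandom", unpredictable)]:
    compressed = zlib.compress(data, 9)
    print(f"{name}: {len(data)} bytes -> {len(compressed)} bytes compressed")
```

The patterned input shrinks to a tiny fraction of its size, while the os.urandom bytes stay at roughly their original length because the compressor finds no shorter rule for them.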
An implementation of an LCG (linear congruential generator) in Python could be:
from datetime import datetime

class LCG:
    defaultSeed = 0
    defaultMultiplier = 1664525
    defaultIncrement = 1013904223
    defaultModulus = 0x100000000

    def __init__(self, seed, a, c, m):
        self._x0 = seed  # seed
        self._a = a  # multiplier
        self._c = c  # increment
        self._m = m  # modulus

    @classmethod
    def lcg(cls, seed=None):
        if seed is None:
            seed = cls.defaultSeed
        return LCG(int(seed), cls.defaultMultiplier,
                   cls.defaultIncrement, cls.defaultModulus)

    # pre: bound > 0
    # returns: pseudo random integer in [0, bound[
    def randint(self, bound):
        self._x0 = (self._a * self._x0 + self._c) % self._m
        return int(abs(self._x0 % bound))

# generate a sequence of 20 digits
rnd = LCG.lcg(datetime.now().timestamp())  # diff seed every time
for i in range(20):
    print(rnd.randint(10), end='')
print()

How to use a random seed value in order to unittest a PRNG in Python?

I'm still pretty new to programming and just learning how to unittest. I need to test a function that returns a random value. I've so far found answers suggesting the use of a specific seed value so that the 'random' sequence is constant and can be compared. This is what I've got so far:
This is the function I want to test:
import random
def roll():
    '''Returns a random number in the range 1 to 6, inclusive.'''
    return random.randint(1, 6)
And this is my unittest:
import unittest
class Tests(unittest.TestCase):
    def test_random_roll(self):
        random.seed(900)
        seq = random.randint(1, 6)
        self.assertEqual(roll(), seq)
How do I set the corresponding seed value for the PRNG in the function so that it can be tested without writing it into the function itself? Or is this completely the wrong way to go about testing a random number generator?
Thanks
The other answers are correct as far as they go. Here I'm answering the deeper question of how to test a random number generator:
Your provided function is not really a random number generator, as its entire implementation depends on a provided random number generator. In other words, you are trusting that Python provides you with a sensible random generator. For most purposes, this is a good thing to do. If you are writing cryptographic primitives, you might want to do something else, and at that point you would want some really robust test strategies (but they will never be enough).
Testing that a function returns a specific sequence of numbers tells you virtually nothing about the correctness of your function in terms of "producing random numbers". A predefined sequence of numbers is the opposite of a random sequence.
So, what do you actually want to test? For the roll function, I think you'd like to test:
1. That given 'enough' rolls it produces all the numbers between 1 and 6, preferably in 'approximately' equal proportions.
2. That it doesn't produce anything else.
The problem with 1. is that your function is defined to be a random sequence, so there is always a non-zero chance that any hard limits you put in to define 'enough' or 'approximately equal' will occasionally fail. You could do some calculations to pick some limits that would make sure your test is unlikely to fail more than e.g. 1 in a billion times, or you could slap a random.seed() call that will mean it will never fail if it passes once (unless the underlying implementation from Python changes).
Item 2. could be 'tested' more easily - generate some large 'N' number of items, check that all are within expected outcome.
For all of this, however, I'd ask what value the unit tests actually are. You literally cannot write a test to check whether something is 'random' or not. To see whether the function has a reasonable source of randomness and uses it correctly, tests are useless - you have to inspect the code. Once you have done that, it's clear that your function is correct (providing Python provides a decent random number generator).
In short, this is one of those cases where unit tests provide extremely little value. I would probably just write one test (item 2 above), and leave it at that.
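Checks 1 and 2 could be sketched like this. The class name `RollTests`, the seed 12345, and the choice of 10,000 rolls are arbitrary assumptions for the illustration, and the sketch deliberately skips the 'approximately equal proportions' part.

```python
import random
import unittest

def roll():
    """Returns a random number in the range 1 to 6, inclusive."""
    return random.randint(1, 6)

class RollTests(unittest.TestCase):
    def setUp(self):
        # Fixed seed: if these tests pass once, they keep passing
        # (unless Python's underlying generator changes).
        random.seed(12345)

    def test_rolls_stay_in_range(self):
        # Item 2: nothing outside 1..6 is ever produced.
        for _ in range(10_000):
            self.assertIn(roll(), range(1, 7))

    def test_every_face_appears(self):
        # Item 1 (weak form): every face shows up given 'enough' rolls.
        seen = {roll() for _ in range(10_000)}
        self.assertEqual(seen, {1, 2, 3, 4, 5, 6})
```

Saved in a file, this can be run with python -m unittest.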
By seeding the prng with a known seed, you know which sequence it will produce, so you can test for this sequence:
class Tests(unittest.TestCase):
    def test_random_roll(self):
        random.seed(900)
        self.assertEqual(roll(), 6)
        self.assertEqual(roll(), 2)
        self.assertEqual(roll(), 5)
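A variant that avoids hard-coding expected values, which may differ between Python versions: seed the module-level generator, record what random.randint gives, then reseed and compare against roll(). This works because roll() draws from the same module-level generator that random.seed() resets, so the seed does not need to be written into the function. The class name `SeededRollTests` is made up for this sketch.

```python
import random
import unittest

def roll():
    """Returns a random number in the range 1 to 6, inclusive."""
    return random.randint(1, 6)

class SeededRollTests(unittest.TestCase):
    def test_roll_follows_seeded_sequence(self):
        # Record the sequence the shared generator produces from this seed.
        random.seed(900)
        expected = [random.randint(1, 6) for _ in range(3)]

        # Rewind the generator to the same starting point and use roll().
        random.seed(900)
        actual = [roll() for _ in range(3)]

        self.assertEqual(actual, expected)
```

This only shows that roll() forwards to random.randint(1, 6); it says nothing about how random the results are.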

Can one use negative numbers as seeds for random number generation?

This is not a coding question, but am hoping that someone has come across this in the forums here. I am using Python to run some simulations. I need to run many replications using different random number seeds. I have two questions:
Are negative numbers okay as seeds?
Should I keep some distance in the seeds?
Currently I am using random.org to create 50 numbers between -100000 and +100000, which I use as seeds. Is this okay?
Thanks.
Quoting random.seed([x]):
Optional argument x can be any hashable object.
Both positive and negative numbers are hashable, and many other objects besides.
>>> hash(42)
42
>>> hash(-42)
-42
>>> hash("hello")
-1267296259
>>> hash(("hello", "world"))
759311865
Is it important that your simulations are repeatable? The canonical way to seed a RNG is by using the current system time, and indeed this is random's default behaviour:
random.seed([x])
Initialize the basic random number generator. Optional argument x can be any hashable object. If x is omitted or None, current system time is used; current system time is also used to initialize the generator when the module is first imported.
I would only deviate from this behaviour if repeatability is important. If it is important, then your random.org seeds are a reasonable solution.
Should I keep some distance in the seeds?
No. For a good quality RNG, the choice of seed will not affect the quality of the output. A set of seeds [1,2,3,4,5,6,7,8,9,10] should result in the same quality of randomness as any random selection of 10 ints. But even if a selection of random uniformly-distributed seeds were desirable, maintaining some distance would break that distribution.
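For repeatable replications, one option is to give each replication its own generator object built from its seed, as in the sketch below. The function name `run_replication` and the seed list are hypothetical. One caveat to check on your own interpreter: in some CPython versions the sign of an integer seed is discarded, in which case -42 and 42 would give the same stream, so the last line tests this rather than assuming it.

```python
import random

def run_replication(seed, draws=5):
    """One replication with its own generator, independent of global state."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(draws)]

# Negative and positive seeds are both accepted.
seeds = [-100000, -42, 0, 42, 99999]
results = {seed: run_replication(seed) for seed in seeds}

# Re-running with the same seed reproduces the replication exactly.
print(run_replication(-42) == results[-42])

# Do seeds that differ only in sign collide on this interpreter?
print(results[-42] == results[42])
```

If the second line prints True on your setup, drawing seeds from non-negative integers only would avoid accidentally running the same replication twice.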
