Ok this is one of those trickier than it sounds questions so I'm turning to stack overflow because I can't think of a good answer. Here is what I want: I need Python to generate a simple list of numbers from 0 to 1,000,000,000 in random order to be used for serial numbers (using a random number so that you can't tell how many have been assigned or do timing attacks as easily, i.e. guessing the next one that will come up). These numbers are stored in a database table (indexed) along with the information linked to them. The program generating them doesn't run forever so it can't rely on internal state.
No big deal right? Just generate a list of numbers, shove them into an array and use Python "random.shuffle(big_number_array)" and we're done. Problem is I'd like to avoid having to store a list of numbers (and thus read the file, pop one off the top, save the file and close it). I'd rather generate them on the fly. Problem is that the solutions I can think of have problems:
1) Generate a random number and then check if it has already been used. If it has been used generate a new number, check, repeat as needed until I find an unused one. Problem here is that I may get unlucky and generate a lot of used numbers before getting one that is unused. Possible fix: use a very large pool of numbers to reduce the chances of this (but then I end up with silly long numbers).
2) Generate a random number and then check if it has already been used. If it has been used add or subtract one from the number and check again, keep repeating until I hit an unused number. Problem is this is no longer a random number as I have introduced bias (eventually I will get clumps of numbers and you'd be able to predict the next number with a better chance of success).
3) Generate a random number and then check if it has already been used. If it has been used add or subtract another randomly generated random number and check again, problem is we're back to simply generating random numbers and checking as in solution 1.
4) Suck it up and generate the random list and save it, have a daemon put them into a Queue so there are numbers available (and avoid constantly opening and closing a file, batching it instead).
5) Generate much larger random numbers and hash them (i.e. using MD5) to get a smaller numeric value, we should rarely get collisions, but I end up with larger than needed numbers again.
6) Prepend or append time based information to the random number (i.e. unix timestamp) to reduce chances of a collision, again I get larger numbers than I need.
Anyone have any clever ideas that will reduce the chances of a "collision" (i.e. generating a random number that is already taken) but will also allow me to keep the number "small" (i.e. less than a billion (or a thousand million for you Europeans =))?
Answer and why I accepted it:
So I will simply go with 1, and hope it's not an issue. If it is, I will go with the deterministic solution of generating all the numbers and storing them, so that there is a guarantee of getting a new random number and I can use "small" numbers (i.e. 9 digits instead of an MD5/etc.).
This is a neat problem, and I've been thinking about it for a while (with solutions similar to Sjoerd's), but in the end, here's what I think:
Use your point 1) and stop worrying.
Assuming real randomness, the probability that a random number has already been chosen before is the count of previously chosen numbers divided by the size of your pool, i.e. the maximal number.
If you say you only need a billion numbers, i.e. nine digits: Treat yourself to 3 more digits, so you have 12-digit serial numbers (that's three groups of four digits – nice and readable).
Even when you're close to having chosen a billion numbers previously, the probability that your new number is already taken is still only 0.1% (a billion taken out of a pool of a thousand billion).
Do step 1 and draw again. You can still check for an "infinite" loop, say don't try more than 1000 times or so, and then fallback to adding 1 (or something else).
You'll win the lottery before that fallback ever gets used.
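A minimal sketch of that retry loop, assuming the taken serials can be queried cheaply. Here a plain Python set stands in for the indexed database column, and `draw_serial`, `POOL_SIZE` and `MAX_TRIES` are illustrative names that do not appear in the question:

```python
import secrets

POOL_SIZE = 10**12   # 12-digit serials, as suggested above
MAX_TRIES = 1000     # guard against an "infinite" loop (must be at least 1)

def draw_serial(used, pool_size=POOL_SIZE, max_tries=MAX_TRIES):
    """Return an unused serial and record it in `used`."""
    for _ in range(max_tries):
        candidate = secrets.randbelow(pool_size)
        if candidate not in used:
            used.add(candidate)
            return candidate
    # Fallback: walk upwards from the last candidate until a free slot turns up.
    for offset in range(1, pool_size):
        probe = (candidate + offset) % pool_size
        if probe not in used:
            used.add(probe)
            return probe
    raise RuntimeError("pool exhausted")
```

In a real setup the membership test and the insert would be a single database operation guarded by a unique index, so two processes cannot claim the same serial.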
You could use Format-Preserving Encryption to encrypt a counter. Your counter just goes from 0 upwards, and the encryption uses a key of your choice to turn it into a seemingly random value of whatever radix and width you want.
Block ciphers normally have a fixed block size of e.g. 64 or 128 bits. But Format-Preserving Encryption allows you to take a standard cipher like AES and make a smaller-width cipher, of whatever radix and width you want (e.g. radix 10, width 9 for the parameters of the question), with an algorithm which is still cryptographically robust.
It is guaranteed to never have collisions (because cryptographic algorithms create a 1:1 mapping). It is also reversible (a 2-way mapping), so you can take the resulting number and get back to the counter value you started with.
AES-FFX is one proposed standard method to achieve this.
I've experimented with some basic Python code for AES-FFX--see Python code here (but note that it doesn't fully comply with the AES-FFX specification). It can e.g. encrypt a counter to a random-looking 7-digit decimal number. E.g.:
0000000 0731134
0000001 6161064
0000002 8899846
0000003 9575678
0000004 3030773
0000005 2748859
0000006 5127539
0000007 1372978
0000008 3830458
0000009 7628602
0000010 6643859
0000011 2563651
0000012 9522955
0000013 9286113
0000014 5543492
0000015 3230955
... ...
For another example in Python, using another non-AES-FFX (I think) method, see this blog post "How to Generate an Account Number" which does FPE using a Feistel cipher. It generates numbers from 0 to 2^32-1.
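To make the idea concrete, here is a rough sketch of a small-domain permutation built from a Feistel network plus "cycle walking" (re-encrypting until the value falls back inside the domain). It is not AES-FFX and has not been vetted as cryptography; the SHA-256 round function, the four rounds and the names `feistel_encrypt` and `_round` are assumptions made purely for illustration:

```python
import hashlib

def _round(value, round_no, key, half_bits):
    """Keyed round function: hash (key, round, value) down to half_bits bits."""
    data = key + bytes([round_no]) + value.to_bytes(8, "big")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << half_bits) - 1)

def feistel_encrypt(counter, key, domain=10**9, rounds=4):
    """Map a counter in [0, domain) to a unique value in [0, domain)."""
    half_bits = ((domain - 1).bit_length() + 1) // 2   # two halves of equal width
    mask = (1 << half_bits) - 1
    value = counter
    while True:
        left, right = value >> half_bits, value & mask
        for r in range(rounds):
            left, right = right, left ^ _round(right, r, key, half_bits)
        value = (left << half_bits) | right
        if value < domain:   # cycle-walk: repeat until back inside the domain
            return value
```

Because each Feistel round is invertible whatever the round function is, the mapping over the padded bit width is a permutation, and cycle walking restricts it to a permutation of `[0, domain)`. Only the key and the counter need to be stored.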
With some modular arithmetic and prime numbers, you can create all numbers between 0 and a big prime, out of order. If you choose your numbers carefully, the next number is hard to guess.
modulo = 87178291199 # prime
incrementor = 17180131327 # relative prime
current = 433494437 # some start value
for i in range(1, 100):
    print(current)
    current = (current + incrementor) % modulo
If they don't have to be random, but just not obviously linear (1, 2, 3, 4, ...), then here's a simple algorithm:
Pick two prime numbers. One of them will be the largest number you can generate, so it should be around one billion. The other should be fairly large.
max_value = 795028841
step = 360287471
previous_serial = 0
for i in range(0, max_value):
    previous_serial += step
    previous_serial %= max_value
    print("Serial: %09i" % previous_serial)
Just store the previous serial each time so you know where you left off. I can't prove mathematically that this works (been too long since those particular classes), but it's demonstrably correct with smaller primes:
s = set()
with open("test.txt", "w+") as f:
    previous_serial = 0
    for i in range(0, 2711):
        previous_serial += 1811
        previous_serial %= 2711
        assert previous_serial not in s
        s.add(previous_serial)
You could also prove it empirically with 9-digit primes, it'd just take a bit more work (or a lot more memory).
This does mean that given a few serial numbers, it'd be possible to figure out what your values are--but with only nine digits, it's not likely that you're going for unguessable numbers anyway.
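The reason the stepping scheme never repeats early is that adding a fixed step modulo max_value visits every residue exactly once whenever the step and max_value share no common factor, which is automatic when max_value is prime and the step is smaller than it. A small sketch of that check (the function name `serials` is made up for this example):

```python
from math import gcd

def serials(max_value, step, start=0):
    """Yield max_value distinct serials by repeatedly adding step modulo max_value."""
    if gcd(step, max_value) != 1:
        raise ValueError("step and max_value must be coprime")
    current = start
    for _ in range(max_value):
        current = (current + step) % max_value
        yield current

# Same small-prime check as the assert loop above.
assert len(set(serials(2711, 1811))) == 2711
```

With a step that shares a factor with max_value, for example step 4 and max_value 10, the sequence would cycle through only part of the range, which is why the sketch rejects it.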
If you don't need something cryptographically secure, but just "sufficiently obfuscated"...
Galois Fields
You could try operations in Galois Fields, e.g. GF(2^32), to map a simple incrementing counter x to a seemingly random serial number y:
x = counter_value
y = some_galois_function(x)
- Multiply by a constant
  - Inverse is to multiply by the reciprocal of the constant
- Raise to a power: x^n
- Reciprocal: x^-1
  - Special case of raising to power n
  - It is its own inverse
- Exponentiation of a primitive element: a^x
  - Note that this doesn't have an easily-calculated inverse (discrete logarithm)
  - Ensure a is a primitive element, aka generator
Many of these operations have an inverse, which means, given your serial number, you can calculate the original counter value from which it was derived.
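As a toy illustration of "multiply by a constant" and its inverse, the sketch below works in the much smaller field GF(2^8) with the AES reduction polynomial, because that polynomial is well known to be irreducible. A 32-bit version would need a degree-32 irreducible polynomial instead; the function names are invented for this example:

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)

def gf256_mul(a, b, poly=AES_POLY):
    """Multiply two elements of GF(2^8): carry-less multiply with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:      # degree reached 8, so reduce by the polynomial
            a ^= poly
        b >>= 1
    return result

def gf256_inverse(a):
    """Reciprocal of a non-zero element, computed as a^254 (since a^255 = 1)."""
    result = 1
    for _ in range(254):
        result = gf256_mul(result, a)
    return result

constant = 0x53
scrambled = [gf256_mul(x, constant) for x in range(256)]          # counter -> serial
recovered = [gf256_mul(y, gf256_inverse(constant)) for y in scrambled]  # serial -> counter
```

Multiplying by a non-zero constant permutes the field, so `scrambled` contains every value from 0 to 255 exactly once, and multiplying by the reciprocal undoes it.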
As for finding a library for Galois Field for Python... good question. If you don't need speed (which you wouldn't for this) then you could make your own. I haven't tried these:
NZMATH
Finite field Python package
Sage, although it's a whole environment for mathematical computing, much more than just a Python library
Matrix multiplication in GF(2)
Pick a suitable 32×32 invertible matrix in GF(2), and multiply a 32-bit input counter by it. This is conceptually related to LFSR, as described in S.Lott's answer.
CRC
A related possibility is to use a CRC calculation. Based on the remainder of long-division with an irreducible polynomial in GF(2). Python code is readily available for CRCs (crcmod, pycrc), although you might want to pick a different irreducible polynomial than is normally used, for your purposes. I'm a little fuzzy on the theory, but I think a 32-bit CRC should generate a unique value for every possible combination of 4-byte inputs. Check this. It's quite easy to experimentally check this, by feeding the output back into the input, and checking that it produces a complete cycle of length 2^32-1 (zero just maps to zero). You may need to get rid of any initial/final XORs in the CRC algorithm for this check to work.
I think you are overestimating the problems with approach 1). Unless you have hard-realtime requirements, just checking by random choice terminates rather fast. The probability of needing more than a given number of iterations decays exponentially. With 100M numbers output (10% fill factor) you'll have a one in a billion chance of requiring more than 9 iterations. Even with 50% of numbers taken you'll on average need 2 iterations and have a one in a billion chance of requiring more than 30 checks. Even the extreme case where 99% of the numbers are already taken might still be reasonable: you'll average 100 iterations and have a 1 in a billion chance of requiring more than 2062 iterations.
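Those figures follow from the geometric distribution: if a fraction `fill` of the pool is taken, the chance that the first n draws all collide is `fill**n`, and the mean number of draws is `1 / (1 - fill)`. A short sketch that reproduces them with exact fractions to avoid rounding trouble (`tries_needed` is a name chosen for this example):

```python
from fractions import Fraction

def tries_needed(fill, tail=Fraction(1, 10**9)):
    """Smallest n with fill**n <= tail, i.e. P(more than n draws needed) <= tail."""
    n, p = 0, Fraction(1)
    while p > tail:
        p *= fill
        n += 1
    return n

for fill in (Fraction(1, 10), Fraction(1, 2), Fraction(99, 100)):
    mean_draws = 1 / (1 - fill)   # mean of a geometric distribution
    print(float(fill), float(mean_draws), tries_needed(fill))
```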
The standard Linear Congruential random number generator's seed sequence CANNOT repeat until the full set of numbers from the starting seed value have been generated. Then it MUST repeat precisely.
The internal seed is often large (48 or 64 bits). The generated numbers are smaller (32 bits usually) because the entire set of bits are not random. If you follow the seed values they will form a distinct non-repeating sequence.
The question is essentially one of locating a good seed that generates "enough" numbers. You can pick a seed, and generate numbers until you get back to the starting seed. That's the length of the sequence. It may be millions or billions of numbers.
There are some guidelines in Knuth for picking suitable seeds that will generate very long sequences of unique numbers.
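As a hedged illustration of the "full set before repeating" property, the sketch below measures the period of a small linear congruential generator. The parameters are an assumption chosen to satisfy the Hull-Dobell conditions for a power-of-two modulus (increment odd, multiplier congruent to 1 modulo 4); they are not taken from Knuth's tables or from any particular library:

```python
def lcg_states(seed, a, c, m):
    """Yield successive internal states of x -> (a*x + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def period(seed, a, c, m):
    """Count steps until the state returns to the seed (assumes a is invertible mod m)."""
    for steps, x in enumerate(lcg_states(seed, a, c, m), start=1):
        if x == seed:
            return steps

M = 2**16
full = period(0, a=25173, c=13849, m=M)   # conditions satisfied: every state visited
short = period(0, a=3, c=1, m=M)          # multiplier not 1 mod 4: shorter cycle
```

Here `full` equals the modulus, so the internal states form one cycle through all 65536 values, while the second parameter set returns to its seed earlier.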
You can run 1) without running into the problem of too many wrong random numbers if you just decrease the random interval by one each time.
For this method to work, you will need to save the numbers already given (which you want to do anyway) and also save the quantity of numbers taken.
It is pretty obvious that, after having collected 10 numbers, your pool of possible random numbers will have been decreased by 10. Therefore, you must not choose a number between 1 and 1,000,000 but between 1 and 999,990. Of course this number is not the real number but only an index (unless the 10 numbers collected have been 999,991, 999,992, …); you’d have to count now from 1, omitting all the numbers already collected.
Of course, your algorithm should be smarter than just counting from 1 to 1,000,000 but I hope you understand the method.
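One way to turn the index into the real number without counting from the start one by one is to keep the taken numbers sorted and shift the index past every taken number at or below it. A minimal sketch, using 0-based numbers rather than the 1-based range in the text; `kth_unused` and `draw` are hypothetical helper names:

```python
import bisect
import random

def kth_unused(sorted_used, k):
    """Return the k-th (0-based) number that is not in sorted_used."""
    for taken in sorted_used:
        if taken <= k:
            k += 1      # one more slot at or below k is occupied, so shift up
        else:
            break
    return k

def draw(sorted_used, pool_size):
    """Pick uniformly among the unused numbers in [0, pool_size)."""
    index = random.randrange(pool_size - len(sorted_used))
    number = kth_unused(sorted_used, index)
    bisect.insort(sorted_used, number)
    return number
```

Every draw succeeds on the first try, at the price of a linear scan over the taken numbers, so this trades retries for bookkeeping.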
I don’t like drawing random numbers until I get one which fits either. It just feels wrong.
My solution: https://github.com/glushchenko/python-unique-id. I think you should extend the matrix for 1,000,000,000 variations and have fun.
I'd rethink the problem itself... You don't seem to be doing anything sequential with the numbers... and you've got an index on the column which has them. Do they actually need to be numbers?
Consider a sha hash... you don't actually need the entire thing. Do what git or other url shortening services do, and take first 3/4/5 characters of the hash. Given that each character now has 36 possible values instead of 10, you have 2,176,782,336 combinations instead of 999,999 combinations (for six digits). Combine that with a quick check on whether the combination exists (a pure index query) and a seed like a timestamp + random number and it should do for almost any situation.
Do you need this to be cryptographically secure or just hard to guess? How bad are collisions? Because if it needs to be cryptographically strong and have zero collisions, it is, sadly, impossible.
I started trying to write an explanation of the approach used below, but just implementing it was easier and more accurate. This approach has the odd behavior that it gets faster the more numbers you've generated. But it works, and it doesn't require you to generate all the numbers in advance.
As a simple optimization, you could easily make this class use a probabilistic algorithm (generate a random number, and if it's not in the set of used numbers add it to the set and return it) at first, keep track of the collision rate, and switch over to the deterministic approach used here once the collision rate gets bad.
import random

class NonRepeatingRandom(object):
    def __init__(self, maxvalue):
        self.maxvalue = maxvalue
        self.used = set()

    def next(self):
        if len(self.used) >= self.maxvalue:
            raise StopIteration
        r = random.randrange(0, self.maxvalue - len(self.used))
        # Walk to the (r+1)-th unused number, counting from 1.
        result = 0
        for i in range(r + 1):
            result += 1
            while result in self.used:
                result += 1
        self.used.add(result)
        return result

    __next__ = next  # Python 3 iterator protocol

    def __iter__(self):
        return self

    def __getitem__(self, index):
        raise NotImplementedError

    def get_all(self):
        return [i for i in self]
>>> n = NonRepeatingRandom(20)
>>> n.get_all()
[12, 14, 13, 2, 20, 4, 15, 16, 19, 1, 8, 6, 7, 9, 5, 11, 10, 3, 18, 17]
If it is enough for you that a casual observer can't guess the next value, you can use things like a linear congruential generator or even a simple linear feedback shift register to generate the values and keep the state in the database in case you need more values. If you use these right, the values won't repeat until the end of the universe. You'll find more ideas in the list of random number generators.
If you think there might be someone who would have a serious interest in guessing the next values, you can use a database sequence to count the values you generate and encrypt them with an encryption algorithm or another cryptographically strong perfect hash function. However, you need to take care that the encryption algorithm isn't easily breakable if one can get hold of a sequence of successive numbers you generated - a simple RSA, for instance, won't do it because of the Franklin-Reiter Related Message Attack.
Bit late answer, but I haven't seen this suggested anywhere.
Why not use the uuid module to create globally unique identifiers?
To generate a list of unique random numbers within defined bounds, you can do the following:
import random as rnd
pList = list()
length_of_list = 100
upbound = 1000
lowbound = 0
while len(pList) < length_of_list:
    pList.append(rnd.randint(lowbound, upbound))
    pList = list(set(pList))  # drop duplicates after each draw
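When the whole range fits the use case, the standard library can do the same job in one call. A short sketch using `random.sample`, with the same bounds as the snippet above (the variable names are chosen for this example):

```python
import random

low, high, count = 0, 1000, 100
# Sampling from a range object draws without replacement and does not
# build the full list of candidates in memory.
unique_numbers = random.sample(range(low, high + 1), count)
```

This only helps within a single run. It does not remember which numbers were handed out in earlier runs, which is the harder part of the original question.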
I bumped into the same problem and opened a question with a different title before getting to this one. My solution is a random sample generator of indexes (i.e. non-repeating numbers) in the interval [0,maximal), called itersample. Here are some usage examples:
import random
generator=itersample(maximal)
another_number=next(generator) # pick the next non-repeating random number
or
import random
generator=itersample(maximal)
for random_number in generator:
    # do something with random_number
    if some_condition: # exit loop when needed
        break
itersample generates non-repeating random integers, storage need is limited to picked numbers, and the time needed to pick n numbers should be (as some tests confirm) O(n log(n)), regardless of maximal.
Here is the code of itersample:
import random
def itersample(c): # c = upper bound of generated integers
    sampled=[]
    def fsb(a,b): # free spaces before middle of interval a,b
        fsb.idx=a+(b+1-a)//2 # integer division, so the index stays an int
        fsb.last=sampled[fsb.idx]-fsb.idx if len(sampled)>0 else 0
        return fsb.last
    while len(sampled)<c:
        sample_index=random.randrange(c-len(sampled))
        a,b=0,len(sampled)-1
        if fsb(a,a)>sample_index:
            yielding=sample_index
            sampled.insert(0,yielding)
            yield yielding
        elif fsb(b,b)<sample_index+1:
            yielding=len(sampled)+sample_index
            sampled.insert(len(sampled),yielding)
            yield yielding
        else: # sample_index falls inside sampled list
            while a+1<b:
                if fsb(a,b)<sample_index+1:
                    a=fsb.idx
                else:
                    b=fsb.idx
            yielding=a+1+sample_index
            sampled.insert(a+1,yielding)
            yield yielding
You are stating that you store the numbers in a database.
Wouldn't it then be easier to store all the numbers there, and ask the database for a random unused number?
Most databases support such a request.
Examples
MySQL:
SELECT column FROM table
ORDER BY RAND()
LIMIT 1
PostgreSQL:
SELECT column FROM table
ORDER BY RANDOM()
LIMIT 1
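A self-contained sketch of the same idea using SQLite through Python's `sqlite3` module. The table layout (a `serials` table with a `used` flag) and the function name are assumptions for the example, and the pool is kept to 1000 rows so it runs instantly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE serials (number INTEGER PRIMARY KEY, used INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO serials (number) VALUES (?)", ((n,) for n in range(1000)))

def claim_random_serial(conn):
    """Pick a random unused row, mark it used, and return its number (None if exhausted)."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT number FROM serials WHERE used = 0 ORDER BY RANDOM() LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE serials SET used = 1 WHERE number = ?", (row[0],))
        return row[0]
```

Note that `ORDER BY RANDOM()` has to assign a random key to every candidate row, so on a table with a billion rows each pick is expensive. This sketch shows the shape of the query, not a tuned solution.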
Related
I've got a large data problem. My python program calculates numbers from 1 to 2^32, and I want to know if I've already calculated a number.
I could track them as a bitmap using half a gigabyte of memory. But since some numbers can be put in a bag along with others (approx 100 numbers per bag), I was wondering, if there is another way of storing my values, like hashes but less memory consuming.
Thank you for your help.
As I said in my comment above, you can use a set that stores all computed numbers.
And when you compute a new number, you check if it's already in the set.
Let's say compute is the function that computes the numbers:
computedNumbers = set() # initialize set
for i in range(1, 2**32): # for loop
    number = compute(i)
    if number in computedNumbers:
        print("Number", number, " is already computed!")
    else:
        computedNumbers.add(number) # add the number to the set
Hope this helps you.
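For comparison, the bitmap mentioned in the question can be sketched with a `bytearray`, one bit per number. A Python set of integers needs considerably more memory per element than one bit, so the bitmap is the more compact option when most of the range ends up being used. The `Bitmap` class is illustrative, and the demo size is kept small:

```python
class Bitmap:
    """One bit per number; 2**32 numbers need 2**32 / 8 bytes = 512 MiB."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)

    def add(self, n):
        self.bits[n >> 3] |= 1 << (n & 7)

    def __contains__(self, n):
        return bool(self.bits[n >> 3] & (1 << (n & 7)))

seen = Bitmap(2**20)   # small size for the demo; the question needs 2**32
seen.add(12345)
```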
I'm currently trying to solve the 'dance recital' kattis challenge in Python 3. See here
After taking input on how many performances there are in the dance recital, you must arrange performances in such a way that sequential performances by a dancer are minimized.
I've seen this challenge completed in C++, but my code kept running out of time and I wanted to optimize it.
Question: As of right now, I generate all possible permutations of performances and run comparisons off of that. A faster way would be to not generate all permutations, as some of them are simply reversed and would result in the exact same output.
import itertools
print(list(itertools.permutations(range(2))))  # [(0, 1), (1, 0)]: they're the same, backwards and forwards
print(magic_algorithm(range(2)))  # [(0, 1)]: this is what I want
How might I generate such a list of permutations?
I've tried:
-Generating all permutations, running over them again to find reversed() duplicates, and saving the result. This takes too long and the result cannot be hard coded into the solution as the file becomes too big.
-Only generating permutations up to the half-way mark, then stopping, assuming that after that, no unique permutations are generated (not true, as I found out)
-I've checked out questions here, but no one seems to have the same question as me, ditto on the web
Here's my current code:
from itertools import permutations
number_of_routines = int(input()) #first line is number of routines
dance_routine_list = [0]*10
permutation_list = list(permutations(range(number_of_routines))) #generate permutations
for q in range(number_of_routines):
    s = input()
    for c in s:
        v = ord(c) - 65
        dance_routine_list[q] |= (1 << v) #each routine ex.'ABC' is A-Z where each char represents a performer in the routine
def calculate():
    least_changes_possible = 1e9 #this will become smaller, as optimizations are found
    for j in permutation_list:
        tmp = 0
        for i in range(1,number_of_routines):
            tmp += (bin(dance_routine_list[j[i]] & dance_routine_list[j[i - 1]]).count('1')) #each 1 represents a performer who must complete sequential routines
        least_changes_possible = min(least_changes_possible, tmp)
    return least_changes_possible
print(calculate())
Edit: Took a shower and decided adding a 2-element-comparison look-up table would speed it up, as many of the operations are repeated. Still doesn't fix iterating over the whole permutations, but it should help.
Edit: Found another thread that answered this pretty well. How to generate permutations of a list without "reverse duplicates" in Python using generators
Thank you all!
There are at most 10 possible dance routines, so at most 3.6M permutations, and even bad algorithms like generate 'em all and test will be done very quickly.
If you wanted a fast solution for up to 24 or so routines, then I would do it like this...
Given the R dance routines, at any point in the recital, in order to decide which routine you can perform next, you need to know:
Which routines you've already performed, because you can't do those ones next. There are 2^R possible sets of already-performed routines; and
Which routine was performed last, because that helps determine the cost of the next one. There are at most R-1 possible values for that.
So there are fewer than (R-2)*2^R possible recital states...
Imagine a directed graph that connects each possible state to all the possible following states, by an edge for the routine that you would perform to get to that state. Label each edge with the cost of performing that routine.
For example, if you've performed routines 5 and 6, with 5 last, then you would be in state (5,6):5, and there would be an edge to (3,5,6):3 that you could get to after performing routine 3.
Starting at the initial "nothing performed yet" state ():-, use Dijkstra's algorithm to find the least cost path to a state with all routines performed.
Total complexity is O(R^2*2^R) or so, depending on exactly how you implement it.
For R=10, R^2*2^R is ~100,000, which won't take very long at all. For R=24 it's about 9 billion, which is going to take under half a minute in pretty good C++.
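The same state space (set of performed routines, last routine) can also be filled in with a plain dynamic-programming pass instead of Dijkstra, because every transition adds a routine and so states can be processed in increasing order of their bitmask. The sketch below takes that route; it is a swapped-in technique over the states described above, and `min_quick_changes` is a name invented for this example. It assumes at least one routine:

```python
def min_quick_changes(routines):
    """Minimum number of dancers who appear in consecutive routines."""
    n = len(routines)
    masks = [sum(1 << (ord(ch) - ord("A")) for ch in set(r)) for r in routines]
    cost = [[bin(masks[i] & masks[j]).count("1") for j in range(n)] for i in range(n)]
    INF = float("inf")
    # best[done][last]: cheapest way to perform the set `done`, ending with `last`
    best = [[INF] * n for _ in range(1 << n)]
    for i in range(n):
        best[1 << i][i] = 0
    for done in range(1 << n):
        for last in range(n):
            current = best[done][last]
            if current == INF:
                continue
            for nxt in range(n):
                if not done & (1 << nxt):
                    new_done = done | (1 << nxt)
                    candidate = current + cost[last][nxt]
                    if candidate < best[new_done][nxt]:
                        best[new_done][nxt] = candidate
    return min(best[(1 << n) - 1])
```

The three nested loops give the R^2*2^R work estimate quoted above.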
I want to find a seed that creates a specific number sequence:
[115,91,45,76,78,93,35,5,29,8,99,88,98,70,40,116,11,39,102,41,124,98,120,57,36,67,57,23,52,34,75,32,117,66,12,19,86,67,62,121,60,5,54,37,65,18,5,56,66,115,32,99,73,70,115,73,123,74,31]
I wonder if I could find a seed that gives me this result with the function get() I created:
def get():
    seed(x)
    return [choice(range(128)) for _ in range(59)]
where x is a constant: the number that, applied as the seed, gives me the sequence above.
This is a little program I wrote to try to find it, but so far I have tested about 1.6 million seeds and still nothing.
from random import choice, seed
lc =[115,91,45,76,78,93,35,5,29,8,99,88,98,70,40,116,11,39,102,41,124,98,120,57,36,67,57,23,52,34,75,32,117,66,12,19,86,67,62,121,60,5,54,37,65,18,5,56,66,115,32,99,73,70,115,73,123,74,31]
sd, h = 0,0
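while 1:
    seed(sd)
    for c, o in enumerate(lc):
        if not choice(range(128)) == o:
            if c > h :
                print(f"[Seed {sd}] {c} matches")
                h = c
            sd += 1
            break
    else:  # no mismatch: this seed reproduces the whole sequence
        print(f"[Seed {sd}] full match")
        break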
Can someone help me find one of the right seeds?
I hope it is not possible.
Technically, it is possible to write a pseudo-random generator that allows a seed to be recovered from a short sequence of results. But a normal pseudo-random generator should not allow that.
E.g. for the quite common Mersenne Twister the internal state is 624 ints. But your seed is just one int.
Even if you brute-force a seed that gives you the same short sequence, the whole internal state may actually be different, and subsequent generation will go a completely different way.
Any seeded PRNG will have a formula to generate the next number from the internal data it holds. With something as simple as a Linear Congruential PRNG then it is easy to back-calculate the internal data and the numbers used in the formula from the output. With a more complex PRNG, such as Mersenne Twister, then the back-calculation becomes very difficult.
One solution would be to copy the sequence of numbers you want and store them somewhere, pulling them from the store as needed. Alternatively read the documentation of the PRNG used to generate those numbers initially to see if a back-calculation is possible.
If the numbers came from a cryptographically secure PRNG then your task becomes orders of magnitude more difficult.
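To show what "easy to back-calculate" means for a linear congruential generator, here is a sketch that recovers the multiplier and increment from three consecutive outputs. It rests on strong assumptions that are stated only for illustration: the generator exposes its full internal state as output, and the modulus is known and prime. The parameter values are arbitrary:

```python
M = 2**31 - 1   # known prime modulus (an assumption of this sketch)

def lcg(state, a, c, m=M):
    """One step of x -> (a*x + c) mod m."""
    return (a * state + c) % m

def recover_lcg(s0, s1, s2, m=M):
    """Recover (a, c) from three consecutive raw states.

    Uses s2 - s1 = a * (s1 - s0) (mod m); the modular inverse via pow()
    needs Python 3.8 or newer.
    """
    a = (s2 - s1) * pow((s1 - s0) % m, -1, m) % m
    c = (s1 - a * s0) % m
    return a, c

secret_a, secret_c = 48271, 12345
s0 = 20240229
s1 = lcg(s0, secret_a, secret_c)
s2 = lcg(s1, secret_a, secret_c)
recovered = recover_lcg(s0, s1, s2)
```

Once the two parameters are known, every later output can be predicted from the last one seen. Nothing comparable works for `random.seed` with the Mersenne Twister from a handful of small outputs.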
Using brute force and assuming that every seed represents an extraction, with replacement, from your set of 128 numbers, you have a probability of
1/128^59 = 1/2^413 ≈ 1/(2.115 × 10^124)
for every extraction to get your exact set of numbers (assuming uniform distribution for every number extraction of your random function). Which is a probability pretty near to zero.
So yes. You could hang (almost) forever for that brute force search
I need to crack a sha256 hash, and I know the answer is in coordinates, but I don't know what are the coordinate values
example:
3f1c756daec9ebced7ff403acb10430659c13b328c676c4510773dc315784e4e
58.375782 26.742632
Is it possible to create a python script that makes two variables (both with the value 00.000000), then adds them together (ex: k=i+" "+j), then converts k into sha256 and compares it to the sha256 I'm trying to crack? If it doesn't equal the sha256 being cracked, then it adds a value to i (i=i+00.000001) and tries again, and so on and so on.
Producing all possible coordinates between 00.000000 and 99.999999 is easy enough:
from itertools import product
import hashlib
digits = '0123456789'
for combo in product(digits, repeat=16):
    coords = '{}.{} {}.{}'.format(
        ''.join(combo[:2]), ''.join(combo[2:8]),
        ''.join(combo[8:10]), ''.join(combo[10:]))
    # sha256 needs bytes on Python 3
    hash = hashlib.sha256(coords.encode('ascii')).hexdigest()
    if hash == '3f1c756daec9ebced7ff403acb10430659c13b328c676c4510773dc315784e4e':
        print(coords)
        break
This'll brute-force all 10**16 (a big number) combinations. Sit back and relax, this'll take a while.
And by 'a while', we really mean not in your lifetime, or anyone else's. Just iterating over all possible combinations produced by product() takes a huge amount of time, as each added digit to try increases the time required by a factor of 10:
>>> from collections import deque
>>> from itertools import product
>>> from timeit import timeit
>>> digits = '0123456789'
>>> timeit(lambda: deque(product(digits, repeat=8), 0), number=5)
3.014396679995116
>>> timeit(lambda: deque(product(digits, repeat=9), 0), number=5)
30.99540744899423
If producing all possible combinations of 8 digits takes 0.6 seconds (3s divided by 5 repetitions) and 9 digits takes about 6 seconds, you can extrapolate from that that 10 digits takes about a minute, etc. Just producing all possible combinations of 16 digits takes 1 million (10 ** 6) times as much time as 10 digits, so roughly 720 days or just under 2 years to run those in a loop. You could split this task up across 2000 different processes on a large number of computers with enough cores in total to run those processes in parallel, to reduce that to under 12 hours.
Then the loop body itself takes about 2.4 seconds per million iterations:
>>> from random import choice
>>> combo = tuple(choice(digits) for _ in range(16)) # random combination for testing
>>> timeit("""\
... coords = '{}.{} {}.{}'.format(
... ''.join(combo[:2]), ''.join(combo[2:8]),
... ''.join(combo[8:10]), ''.join(combo[10:]))
... hash = hashlib.sha256(coords).hexdigest()
... if hash == '3f1c756daec9ebced7ff403acb10430659c13b328c676c4510773dc315784e4e': pass
... """, 'from __main__ import combo; import hashlib')
2.3429908752441406
But you have 10 ** 10 (10 thousand million) more times work than that, totaling roughly 743 years of computation work. Even being able to run 20 thousand parallel processes only brings that down to about 13.5 days, and it assumes you have 20 thousand cores at your disposal.
Python is just not fast enough for this task. Using GPUs it should be possible to reach 500 million hashes per second (0.5 Gigahash / s), at which point you could run the above brute-force operation and find a solution in about 230 days on such a system. At a cost, of course, because such a rig would cost about $3000-$4000 a month to run! But with enough dedicated hardware you can certainly 'crack' the hash in 'humane' timeline.
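The estimates above are simple division, and a few lines make it easy to redo them for other hash rates. The rates below are the ones quoted in this answer, and `days_to_exhaust` is a helper name made up for this sketch:

```python
SECONDS_PER_DAY = 24 * 60 * 60

def days_to_exhaust(candidates, hashes_per_second):
    """Days needed to try every candidate at a fixed hash rate."""
    return candidates / hashes_per_second / SECONDS_PER_DAY

total = 10**16                    # two coordinates of 8 digits each
python_rate = 1e6 / 2.343         # loop body: about 2.343 s per million iterations
gpu_rate = 500e6                  # 0.5 Gigahash / s

python_years = days_to_exhaust(total, python_rate) / 365
gpu_days = days_to_exhaust(total, gpu_rate)
```

On average the match turns up after half the search space, so the expected time is half of these worst-case figures.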
One of the common claims about hashes is that they discard information, therefore they cannot be reversed. There's infinite messages that have the same hash. You can't know which of the infinite messages that give the same hash is correct.
Of course in practice, a brute force attack often works - either because your search strategy is likely to find the real original message first (most messages with hash collisions are obviously wrong in some trivial way - e.g. the wrong format - and won't turn up in the search because of that) or because your attack needs a different message with the same hash anyway.
In your case, what you know about the message means there's less information in the message than (apparently) in the hash. Of course hashing doesn't create new information, so that means many hashes cannot occur for any co-ordinate strings. You have (with very high probability for good hash algorithms) a 1:1 relationship between possible hashes and possible messages. In principle, you have an encrypted form of your message which can be decrypted.
Many people would call me an idiot for saying this, of course. After all, you still have to find all the hashes for all the possible messages. That may be faster than some people think, but it's a long way from trivial.
It has already been pointed out that there are 10^16 possible combinations based on your co-ordinate format. One thing to check is whether all values for all those digits are possible (and equally probable). Using floating point arithmetic internally shouldn't be a problem - double precision floats aren't 8-digit decimals, but 53 bits of mantissa should be plenty to ensure all those decimal digits are fully in use. However, it may be worth checking that there's no other limitation that reduces the number of cases to check - the obvious one being the precision in how those co-ordinates are measured.
Even if certain digit values are less probable than others, that means ordering the search to check more likely values first will save a lot of time for crackers.
I have the following code:
import random
rand1 = random.Random()
rand2 = random.Random()
rand1.seed(0)
rand2.seed(0)
rand1.jumpahead(1)
rand2.jumpahead(2)
x = [rand1.random() for _ in range(0,5)]
y = [rand2.random() for _ in range(0,5)]
According to the documentation of jumpahead() function I expected x and y to be (pseudo)independent sequences. But the output that I get is:
x: [0.038378463064751012, 0.79353887395667977, 0.13619161852307016, 0.82978789012683285, 0.44296031215986331]
y: [0.98374801970498793, 0.79353887395667977, 0.13619161852307016, 0.82978789012683285, 0.44296031215986331]
If you notice, the 2nd-5th numbers are same. This happens each time I run the code.
Am I missing something here?
rand1.seed(0)
rand2.seed(0)
You initialize them with the same values so you get the same (non-)randomness. Use some value like the current unix timestamp to seed it and you will get better values. But note that if you initialize two RNGs at the same time with the current time though, you will get the same "random" values from them of course.
Update: Just noticed the jumpahead() stuff: Have a look at How should I use random.jumpahead in Python - it seems to answer your question.
I think there is a bug, python's documentation does not make this as clear as it should.
The difference between your two parameters to jumpahead is 1, which means you are only guaranteed to get 1 unique value (which is what happens). If you want more values, you need larger parameters.
EDIT: Further Explanation
Originally, as the name suggests, jumpahead merely jumped ahead in the sequence. It's clear to see in that case that jumping 1 or 2 places ahead in the sequence would not produce independent results. As it turns out, jumping ahead in most random number generators is inefficient. For that reason, python only approximates jumping ahead. Because it's only approximate, python can implement a more efficient algorithm. However, because the method is only "pretending" to jump ahead, passing two similar integers will not result in a very different sequence.
To get different sequences you need the integers passed in to be far apart. In particular, if you want to read a million random integers, you need to separate your jumpaheads by a million.
As a final note, if you have two random number generators, you only need to jumpahead on one of them. You can (and should) leave the other in its original state.
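Note that `jumpahead()` exists only in Python 2 and was removed in Python 3. The sketch below shows what a literal jump-ahead amounts to, by simply discarding outputs; the helper `advance` is hypothetical and this is not how Python 2 implemented the method. It makes the overlap visible: the second stream is the first one shifted by the jump distance.

```python
import random

def advance(rng, n):
    """Literal jump-ahead: discard the next n outputs of rng."""
    for _ in range(n):
        rng.random()

rand1 = random.Random(0)
rand2 = random.Random(0)
advance(rand2, 5)   # rand2 now starts where rand1 will be after 5 draws

x = [rand1.random() for _ in range(10)]
y = [rand2.random() for _ in range(5)]
# y repeats the second half of x, so the two streams overlap as soon as
# more numbers are drawn than the jump distance.
```

That is why the jump distance has to be at least as large as the number of values you intend to draw from the first generator.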