What is the difference between random.sample and random.shuffle in Python?

I have a list a_tot with 1500 elements and I would like to divide it into two lists in a random way: list a_1 would have 1300 elements and list a_2 would have 200. My question is about the best way to randomize the original list of 1500 elements. Once the list is randomized, I could take one slice of 1300 and another slice of 200.
One way is to use random.shuffle, another is to use random.sample. Are there any differences in the quality of the randomization between the two methods? The data in a_1 should be a random sample, and so should the data in a_2.
Any recommendations?
using shuffle:
random.shuffle(a_tot) #get a randomized list
a_1 = a_tot[0:1300] #pick the first 1300
a_2 = a_tot[1300:] #pick the last 200
using sample:
new_t = random.sample(a_tot,len(a_tot)) #get a randomized list
a_1 = new_t[0:1300] #pick the first 1300
a_2 = new_t[1300:] #pick the last 200
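As a quick empirical sanity check (my sketch, not part of the original question), both approaches can be compared on a scaled-down list by estimating how often a given element lands in the small slice; under uniform randomization the frequency should match the slice proportion:
import random

trials = 100000
hits_shuffle = hits_sample = 0
for _ in range(trials):
    data = list(range(15))              # scaled-down stand-in for the 1500-element list
    random.shuffle(data)
    hits_shuffle += 0 in data[13:]      # did element 0 land in the small slice?
    data = random.sample(range(15), 15)
    hits_sample += 0 in data[13:]

print(hits_shuffle / trials, hits_sample / trials)   # both should be close to 2/15 = 0.133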

The source for shuffle:
def shuffle(self, x, random=None, int=int):
    """x, random=random.random -> shuffle list x in place; return None.

    Optional arg random is a 0-argument function returning a random
    float in [0.0, 1.0); by default, the standard random.random.
    """
    if random is None:
        random = self.random
    for i in reversed(xrange(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random() * (i+1))
        x[i], x[j] = x[j], x[i]
The source for sample:
def sample(self, population, k):
    """Chooses k unique random elements from a population sequence.

    Returns a new list containing elements from the population while
    leaving the original population unchanged. The resulting list is
    in selection order so that all sub-slices will also be valid random
    samples. This allows raffle winners (the sample) to be partitioned
    into grand prize and second place winners (the subslices).

    Members of the population need not be hashable or unique. If the
    population contains repeats, then each occurrence is a possible
    selection in the sample.

    To choose a sample in a range of integers, use xrange as an argument.
    This is especially fast and space efficient for sampling from a
    large population: sample(xrange(10000000), 60)
    """

    # XXX Although the documentation says `population` is "a sequence",
    # XXX attempts are made to cater to any iterable with a __len__
    # XXX method. This has had mixed success. Examples from both
    # XXX sides: sets work fine, and should become officially supported;
    # XXX dicts are much harder, and have failed in various subtle
    # XXX ways across attempts. Support for mapping types should probably
    # XXX be dropped (and users should pass mapping.keys() or .values()
    # XXX explicitly).

    # Sampling without replacement entails tracking either potential
    # selections (the pool) in a list or previous selections in a set.

    # When the number of selections is small compared to the
    # population, then tracking selections is efficient, requiring
    # only a small set and an occasional reselection. For
    # a larger number of selections, the pool tracking method is
    # preferred since the list takes less space than the
    # set and it doesn't suffer from frequent reselections.

    n = len(population)
    if not 0 <= k <= n:
        raise ValueError, "sample larger than population"
    random = self.random
    _int = int
    result = [None] * k
    setsize = 21        # size of a small set minus size of an empty list
    if k > 5:
        setsize += 4 ** _ceil(_log(k * 3, 4))   # table size for big sets
    if n <= setsize or hasattr(population, "keys"):
        # An n-length list is smaller than a k-length set, or this is a
        # mapping type so the other algorithm wouldn't work.
        pool = list(population)
        for i in xrange(k):             # invariant: non-selected at [0,n-i)
            j = _int(random() * (n-i))
            result[i] = pool[j]
            pool[j] = pool[n-i-1]       # move non-selected item into vacancy
    else:
        try:
            selected = set()
            selected_add = selected.add
            for i in xrange(k):
                j = _int(random() * n)
                while j in selected:
                    j = _int(random() * n)
                selected_add(j)
                result[i] = population[j]
        except (TypeError, KeyError):   # handle (at least) sets
            if isinstance(population, list):
                raise
            return self.sample(tuple(population), k)
    return result
As you can see, in both cases, the randomization is essentially done by the line int(random() * n). So, the underlying algorithm is essentially the same.

There are two major differences between shuffle() and sample():
1) Shuffle will alter data in-place, so its input must be a mutable sequence. In contrast, sample produces a new list and its input can be much more varied (tuple, string, xrange, bytearray, set, etc).
2) Sample lets you potentially do less work (i.e. a partial shuffle).
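For example (my illustration of the second point), drawing three raffle winners with sample() avoids the cost of rearranging the entire population:
import random

population = list(range(1000000))

# sample does only as much work as the number of elements it returns...
winners = random.sample(population, 3)

# ...whereas the shuffle-then-slice equivalent rearranges all 10**6 elements
random.shuffle(population)
also_winners = population[:3]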
It is interesting to show the conceptual relationship between the two by demonstrating that it would have been possible to implement shuffle() in terms of sample():
def shuffle(p):
    p[:] = sample(p, len(p))
Or vice-versa, implementing sample() in terms of shuffle():
def sample(p, k):
    p = list(p)
    shuffle(p)
    return p[:k]
Neither of these is as efficient as the real implementation of shuffle() or sample(), but it does show their conceptual relationship.

The randomization should be just as good with both options. I'd say go with shuffle, because it's more immediately clear to the reader what it does.

random.shuffle() shuffles the given list in-place. Its length stays the same.
random.sample() picks n items out of the given sequence without replacement (which might also be a tuple or whatever, as long as it has a __len__()) and returns them in randomized order.
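A small demonstration of the difference (my example); note the common pitfall that shuffle() returns None rather than the shuffled list:
import random

data = [1, 2, 3, 4, 5]
print(random.shuffle(data))     # None -- shuffle reorders data in place and returns nothing
print(data)                     # the original list is now reordered
print(random.sample(data, 3))   # a new 3-element list; data itself is left unchanged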

I think they are essentially the same, except that one updates the original list and one only reads it. No difference in the quality of the randomness.

from random import shuffle
from random import sample
x = [[i] for i in range(10)]
shuffle(x)
sample(x,10)
shuffle updates the same list in place, while sample returns an updated copy. sample also provides the facility to choose how many elements you want, whereas shuffle always yields a list of the same length as the input.

Related

Fastest way to find sub-lists of a fixed length, from a given list of values, whose elements sum equals a defined number

In Python 3.6, suppose that I have a list of numbers L, and that I want to find all possible sub-lists S of a given pre-chosen length |S|, such that:
any S has to be shorter than L, that is |S| < |L|
any S can only contain numbers present in L
numbers in S do not have to be unique (they can appear repeatedly)
the sum of all numbers in S should be equal to a pre-determined number N
A trivial solution for this can be found using the Cartesian Product with itertools.product. For example, suppose L is a simple list of all integers between 1 and 10 (inclusive) and |S| is chosen to be 3. Then:
import itertools
L = range(1,11)
N = 8
Slength = 3
result = [list(seq) for seq in itertools.product(L, repeat=Slength) if sum(seq) == N]
However, as larger lists L and/or larger |S| are chosen, the above approach becomes extremely slow. In fact, even for L = range(1,101) with |S|=5 and N=80, the computer almost freezes and it takes approximately an hour to compute the result.
My take is that:
there are a lot of unnecessary computations going on under the hood, given the condition that sub-lists should sum to N
there are a ton of cache misses due to iterating over the possibly millions of lists generated by itertools.product just to keep far fewer
So, my question/challenge is: is there a way I can do this in a more computationally efficient way? Unless we are talking hundreds of Gigabytes, speed to me is more critical than memory, so the challenge focuses more on speed, even if considerations for memory efficiency are a welcome bonus.
So given an input list and a target length and sum, you want all the permutations of the numbers in the input list such that:
The sum equals the target sum
The length equals the target length
The following code should be faster:
# Input
input_list = range(1, 101)

# Targets
target_sum = 15
target_length = 5

# Available numbers
numbers = set(input_list)

# Initialize the stack
stack = [[num] for num in numbers]

result = []

# Loop until we run out of permutations
while stack:
    # Get a permutation from the stack
    current = stack.pop()

    # If it's too short
    if len(current) < target_length:
        # And the sum is too small
        if sum(current) < target_sum:
            # Then for each available number
            for num in numbers:
                # Append said number and put the resulting permutation back into the stack
                stack.append(current + [num])

    # If it's not too short and the sum equals the target, add to the result!
    elif sum(current) == target_sum:
        result.append(current)

print(len(result))
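As a sanity check (my addition, not part of the original answer), the stack-based search can be wrapped in a function and compared against the brute-force itertools.product version on a small instance, where both should find the same number of sequences:
import itertools

def stack_search(values, target_sum, target_length):
    # Same stack-based search as above, wrapped in a function for testing
    numbers = set(values)
    stack = [[num] for num in numbers]
    found = []
    while stack:
        current = stack.pop()
        if len(current) < target_length:
            if sum(current) < target_sum:
                for num in numbers:
                    stack.append(current + [num])
        elif sum(current) == target_sum:
            found.append(current)
    return found

brute = [list(seq) for seq in itertools.product(range(1, 11), repeat=3) if sum(seq) == 8]
assert len(brute) == len(stack_search(range(1, 11), 8, 3))   # both find 21 sequences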

Random Sample of N Distinct Permutations of a List

Suppose I have a Python list of arbitrary length k. Now, suppose I would like a random sample of n (where n <= k!) distinct permutations of that list. I was tempted to try:
import random
import itertools

k = 6
n = 10

mylist = list(range(0, k))

j = random.sample(list(itertools.permutations(mylist)), n)
for i in j:
    print(i)
But, naturally, this code becomes unusably slow when k gets too large. Given that the number of permutations that I may be looking for n is going to be relatively small compared to the total number of permutations, computing all of the permutations is unnecessary. Yet it's important that none of the permutations in the final list are duplicates.
How would you achieve this more efficiently? Remember, mylist could be a list of anything, I just used list(range(0, k)) for simplicity.
You can generate permutations, and keep track of the ones you have already generated. To make it more versatile, I made a generator function:
import random

k = 6
n = 10

mylist = list(range(0, k))

def perm_generator(seq):
    seen = set()
    length = len(seq)
    while True:
        perm = tuple(random.sample(seq, length))
        if perm not in seen:
            seen.add(perm)
            yield perm

rand_perms = perm_generator(mylist)
j = [next(rand_perms) for _ in range(n)]
for i in j:
    print(i)
Naïve implementation
Below is the naïve implementation I did (the version by @Tomothy32, pure standard library using a generator, is well implemented):
import numpy as np

mylist = np.array(mylist)

perms = set()
for i in range(n):                          # (1) Draw n samples from the permutation universe U (#U = k!)
    while True:                             # (2) Endless loop
        perm = np.random.permutation(k)     # (3) Generate a random permutation from U
        key = tuple(perm)
        if key not in perms:                # (4) Check if this permutation has already been drawn (hash table)
            perms.add(key)                  # (5) Insert into the set (add, not update, so the tuple is stored whole)
            break                           # (6) Break the endless loop
    print(i, mylist[perm])
It relies on numpy.random.permutation, which randomly permutes a sequence.
The key idea is:
to generate a new random permutation (the indices, randomly permuted);
to check whether the permutation already exists and store it (as a tuple of ints, because it must be hashable) to prevent duplicates;
then to permute the original list using the index permutation.
This naïve version does not directly suffer from the factorial complexity O(k!) of the itertools.permutations function, which generates all k! permutations before sampling from them.
About Complexity
There is something interesting about the algorithm design and complexity...
If we want to be sure that the loop can end, we must enforce N <= k!, but this is not guaranteed here. Furthermore, assessing the complexity requires knowing how many times the endless loop will actually run before a new random tuple is found to break it.
Limitation
Let's encapsulate the function written by @Tomothy32:
import math

def get_perms(seq, N=10):
    rand_perms = perm_generator(seq)   # use the argument, not the global mylist
    return [next(rand_perms) for _ in range(N)]
For instance, this call works for very small k < 7:
get_perms(list(range(k)), math.factorial(k))
But it will fail (in time and memory) before reaching O(k!) complexity as k grows, because it boils down to randomly finding the single missing key once all other k!-1 keys have been found.
Always look on the bright side...
On the other hand, it seems the method can generate a reasonable number of permuted tuples in a reasonable amount of time when N <<< k!. For example, it is possible to draw more than N=5000 tuples of length k, where 10 < k < 1000, in less than one second.
When k and N are kept small and N <<< k!, the algorithm seems to have a complexity that is:
constant with respect to k;
linear with respect to N.
This is somewhat valuable.

How can I shuffle a very large list stored in a file in Python?

I need to deterministically generate a randomized list containing the numbers from 0 to 2^32-1.
This would be the naive (and totally nonfunctional) way of doing it, just so it's clear what I want.
import random
numbers = range(2**32)
random.seed(0)
random.shuffle(numbers)
I've tried making the list with numpy.arange() and using pycrypto's random.shuffle() to shuffle it. Making the list ate up about 8gb of ram, then shuffling raised that to around 25gb. I only have 32gb to give. But that doesn't matter because...
I've tried cutting the list into 1024 slices and trying the above, but even one of these slices takes way too long. I cut one of these slices into 128 yet smaller slices, and that took about 620ms each. If it grew linearly, then that means the whole thing would take about 22 and a half hours to complete. That sounds alright, but it doesn't grow linearly.
Another thing I've tried is generating random numbers for every entry and using those as indices for their new location. I then go down the list and attempt to place the number at the new index. If that index is already in use, the index is incremented until it finds a free one. This works in theory, and it can do about half of it, but near the end it keeps having to search for new spots, wrapping around the list several times.
Is there any way to pull this off? Is this a feasible goal at all?
Computing all the values up front seems impossible, since Crypto computes a random integer in about a millisecond, so the whole job would take days.
Here is an implementation of the Knuth shuffle as a generator:
from Crypto.Random.random import randint
import numpy as np

def onthefly(n):
    numbers = np.arange(n, dtype=np.uint32)
    for i in range(n):
        j = randint(i, n-1)
        numbers[i], numbers[j] = numbers[j], numbers[i]
        yield numbers[i]
For n=10:
gen=onthefly(10)
print([next(gen) for i in range(9)])
print(next(gen))
#[9, 0, 2, 6, 4, 8, 7, 3, 1]
#5
For n=2**32, the generator takes a minute to initialize, but each call is O(1).
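Note that Crypto.Random cannot be seeded, so this generator is not deterministic as the question requires; here is a sketch (my adaptation) of the same Fisher-Yates-as-generator idea using the seedable standard library PRNG:
import random
import numpy as np

def onthefly_seeded(n, seed=0):
    rng = random.Random(seed)            # deterministic, seedable PRNG
    numbers = np.arange(n, dtype=np.uint32)
    for i in range(n):
        j = rng.randint(i, n - 1)        # inclusive bounds, as in the original
        numbers[i], numbers[j] = numbers[j], numbers[i]
        yield numbers[i]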
If you have a continuous range of numbers, you don't need to store them at all. It is easy to devise a bidirectional mapping between the value in a shuffled list and its position in that list. The idea is to use a pseudo-random permutation and this is exactly what block ciphers provide.
The trick is to find a block cipher that matches exactly your requirement of 32-bit integers. There are very few such block ciphers, but the Simon and Speck ciphers (released by the NSA) are parameterisable and support a block size of 32-bit (usually block sizes are much larger).
This library seems to provide an implementation of that. We can devise the following functions:
def get_value_from_index(key, i):
    cipher = SpeckCipher(key, mode='ECB', key_size=64, block_size=32)
    return cipher.encrypt(i)

def get_index_from_value(key, val):
    cipher = SpeckCipher(key, mode='ECB', key_size=64, block_size=32)
    return cipher.decrypt(val)
The library works with Python's big integers, so you might not even need to encode them.
A 64-bit key (for example 0x123456789ABCDEF0) is not much. You could use a construction similar to the one that increased the key size of DES to Triple DES. Keep in mind that keys should be chosen randomly, and they have to be constant if you want determinism.
If you don't want to use an algorithm by the NSA for that, I would understand. There are others, but I can't find them now. The Hasty Pudding cipher is even more flexible, but I don't know if there is an implementation of that for Python.
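If pulling in a cipher library is undesirable, the same pseudo-random-permutation idea can be sketched with a toy Feistel network over two 16-bit halves (my illustration; not cryptographically strong, but bijective on [0, 2**32) by construction):
import hashlib

def _round_func(half, key, r):
    # Maps a 16-bit half to 16 pseudo-random bits; any function works here,
    # because the Feistel structure is bijective regardless of the round function.
    digest = hashlib.sha256(f"{key}:{r}:{half}".encode()).digest()
    return int.from_bytes(digest[:2], "big")

def feistel_forward(i, key, rounds=4):
    # Position in the virtual shuffled list -> value (a permutation of [0, 2**32))
    left, right = i >> 16, i & 0xFFFF
    for r in range(rounds):
        left, right = right, left ^ _round_func(right, key, r)
    return (left << 16) | right

def feistel_backward(v, key, rounds=4):
    # Value -> position: undo the rounds in reverse order
    left, right = v >> 16, v & 0xFFFF
    for r in reversed(range(rounds)):
        left, right = right ^ _round_func(left, key, r), left
    return (left << 16) | right
Round-tripping any 32-bit i through feistel_forward and then feistel_backward returns i, so the pair plays the same role as the encrypt/decrypt functions above.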
The class I created uses a bitarray to keep track of which numbers have already been used. With the comments, I think the code is pretty self-explanatory.
import bitarray
import random

class UniqueRandom:
    def __init__(self):
        """ Init boolean array of used numbers and set all to False
        """
        self.used = bitarray.bitarray(2**32)
        self.used.setall(False)

    def draw(self):
        """ Draw a previously unused number
            Return False if no free numbers are left
        """
        # Check if there are numbers left to use; return False if none are left
        if self._free() == 0:
            return False
        # Draw a random index
        i = random.randint(0, 2**32-1)
        # Skip ahead from the random index to an undrawn number
        while self.used[i]:
            i = (i+1) % 2**32
        # Update used array
        self.used[i] = True
        # return the selected number
        return i

    def _free(self):
        """ Check how many places are unused
        """
        return self.used.count(False)

def main():
    r = UniqueRandom()
    for _ in range(20):
        print(r.draw())

if __name__ == '__main__':
    main()
Design considerations
While Garrigan Stafford's answer is perfectly fine, the memory footprint of this solution is much smaller: a bitarray of 2**32 bits occupies about 512 MB. Another difference between our answers is that Garrigan's algorithm takes more time to generate a random number as the number of generated values grows (because he keeps iterating until an unused number is found). This algorithm instead just scans forward to the next unused number when a collision occurs, so drawing a number takes roughly the same time regardless of how far the pool of free numbers is exhausted (although the linear scan does make numbers that follow long runs of used numbers slightly more likely to be picked).
Here is a permutation RNG which I wrote, which uses the fact that squaring a number mod a prime (plus some intricacies) gives a pseudo-random permutation.
https://github.com/pytorch/pytorch/blob/09b4f4f2ff88088306ecedf1bbe23d8aac2d3f75/torch/utils/data/_utils/index_utils.py
Short version:
from math import floor, sqrt

def _is_prime(n):
    if n == 2:
        return True
    if n == 1 or n % 2 == 0:
        return False
    for d in range(3, floor(sqrt(n)) + 1, 2):  # can use isqrt in Python 3.8
        if n % d == 0:
            return False
    return True

class Permutation:  # in the linked file this subclasses a Range helper
    """
    Generates a random permutation of integers from 0 up to size.
    Inspired by https://preshing.com/20121224/how-to-generate-a-sequence-of-unique-random-integers/
    """
    size: int
    prime: int
    seed: int

    def __init__(self, size: int, seed: int):
        self.size = size
        self.prime = self._get_prime(size)
        self.seed = seed % self.prime

    def __getitem__(self, index):
        x = self._map(index)
        while x >= self.size:
            # If we map to a number greater than size, then the cycle of successive mappings must eventually result
            # in a number less than size. Proof: The cycle of successive mappings traces a path
            # that either always stays in the set n>=size or it enters and leaves it,
            # else the 1:1 mapping would be violated (two numbers would map to the same number).
            # Moreover, `set(range(size)) - set(map(n) for n in range(size) if map(n) < size)`
            # equals the `set(map(n) for n in range(size, prime) if map(n) < size)`
            # because the total mapping is exhaustive.
            # Which means we'll arrive at a number that wasn't mapped to by any other valid index.
            # This will take at most `prime-size` steps, and `prime-size` is on the order of log(size), so fast.
            # But usually we just need to remap once.
            x = self._map(x)
        return x

    @staticmethod
    def _get_prime(size):
        """
        Returns the prime number >= size which has the form (4n-1)
        """
        n = size + (3 - size % 4)
        while not _is_prime(n):
            # We expect to find a prime after O(log(size)) iterations
            # Using a brute-force primehood test, total complexity is O(log(size)*sqrt(size)), which is pretty good.
            n = n + 4
        return n

    def _map(self, index):
        a = self._permute_qpr(index)
        b = (a + self.seed) % self.prime
        c = self._permute_qpr(b)
        return c

    def _permute_qpr(self, x):
        residue = pow(x, 2, self.prime)
        if x * 2 < self.prime:
            return residue
        else:
            return self.prime - residue
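A quick usage sketch (my example): indexing the Permutation object like a list yields each integer in [0, size) exactly once:
perm = Permutation(size=10, seed=42)
values = [perm[i] for i in range(10)]
print(values)                              # some ordering of 0..9
assert sorted(values) == list(range(10))   # every value appears exactly once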
So one way is to keep track of which numbers you have already given out and keep handing out new random numbers one at a time. Consider:
import random
random.seed(0)

class RandomDeck:
    def __init__(self):
        self.usedNumbers = set()

    def draw(self):
        number = random.randint(0, 2**32-1)
        while number in self.usedNumbers:
            number = random.randint(0, 2**32-1)
        self.usedNumbers.add(number)   # sets use add, not append
        return number

    def shuffle(self):
        self.usedNumbers = set()
As you can see we essentially have a deck of random numbers between 0 and 2^32 but we only store the numbers we have given out to ensure we don't have repeats. Then you can re-shuffle the deck by forgetting all the numbers you have already given out.
This should be efficient in most use cases as long as you don't draw ~1 million numbers without a reshuffle.

Random number generator that returns only one number each time

Does Python have a random number generator that returns only one random integer each time next() is called? Numbers should not repeat, and the generator should return unique random integers in the interval [1, 1 000 000].
I need to generate more than a million different numbers, and that sounds very memory-consuming if all the numbers are generated at the same time and stored in a list.
You are looking for a linear congruential generator with a full period. This will allow you to get a pseudo-random sequence of non-repeating numbers in your target number range.
Implementing an LCG is actually very simple, and looks like this:
def lcg(a, c, m, seed=None):
    num = seed or 0
    while True:
        num = (a * num + c) % m
        yield num
Then, it just comes down to choosing the correct values for a, c, and m to guarantee that the LCG will generate a full period (which is the only guarantee that you get non-repeating numbers). As the Wikipedia article explains, the following three conditions need to be true:
m and c need to be relatively prime.
a - 1 is divisible by all prime factors of m
a - 1 is divisible by 4, if m is also divisible by 4.
The first one is very easily guaranteed by simply choosing a prime for c. Also, this is the value that can be chosen last, and this will ultimately allow us to mix up the sequence a bit.
The relationship between a - 1 and m is more complicated though. In a full period LCG, m is the length of the period. Or in other words, it is the number range your numbers come from. So this is what you are usually choosing first. In your case, you want m to be around 1000000. Choosing exactly your maximum number might be difficult since that restricts you a lot (in both your choice of a and also c), so you can also choose numbers larger than that and simply skip all numbers outside of your range later.
Let’s choose m = 1000000 now though. The prime factors of m are 2 and 5. And it’s also obviously divisible by 4. So for a - 1, we need a number that is a multiple of 2 * 2 * 5 to satisfy the conditions 2 and 3. Let’s choose a - 1 = 160, so a = 161.
For c, we are using a random prime that’s somewhere in between of our range: c = 506903
Putting that into our LCG gives us our desired sequence. We can choose any seed value from the range (0 <= seed <= m) as the starting point of our sequence.
So let’s try it out and verify that what we thought of actually works. For this purpose, we are just collecting all numbers from the generator in a set until we hit a duplicate. At that point, we should have m = 1000000 numbers in the set:
>>> g = lcg(161, 506903, 1000000)
>>> numbers = set()
>>> for n in g:
...     if n in numbers:
...         raise Exception('Number {} already encountered before!'.format(n))
...     numbers.add(n)
...
Traceback (most recent call last):
  File "<pyshell#5>", line 3, in <module>
    raise Exception('Number {} already encountered before!'.format(n))
Exception: Number 506903 already encountered before!
>>> len(numbers)
1000000
And it’s correct! So we did create a pseudo-random sequence of numbers that allowed us to get non-repeating numbers from our range m. Of course, by design, this sequence will be always the same, so it is only random once when you choose those numbers. You can switch up the values for a and c to get different sequences though, as long as you maintain the properties mentioned above.
The big benefit of this approach is of course that you do not need to store all the previously generated numbers. It is a constant space algorithm as it only needs to remember the initial configuration and the previously generated value.
It will also not deteriorate as you get further into the sequence. This is a general problem with solutions that just keep generating a random number until a new one is found that hasn't been encountered before: the longer the list of generated numbers gets, the less likely you are to hit a number that's not in that list with an evenly distributed random algorithm. So getting the 1000000th number will likely take a very long time with memory-based random generators.
But of course, having this simple algorithm, which just performs some multiplication and some addition, may not appear very random. But you have to keep in mind that this is actually the basis for most pseudo-random number generators out there. So random.random() uses something like this internally; it's just that the m is a lot larger, so you don't notice it there.
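Since this LCG yields numbers in [0, m), a thin wrapper (my addition) shifts the output into the interval [1, 1 000 000] that the question asks for:
def unique_randoms_1_to_m(a=161, c=506903, m=1000000, seed=0):
    # The full-period LCG visits every value in [0, m) exactly once per period;
    # adding 1 maps that onto [1, m].
    for num in lcg(a, c, m, seed):
        yield num + 1

gen = unique_randoms_1_to_m()
print([next(gen) for _ in range(5)])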
If you really care about memory, you could use a NumPy array (or a Python array).
A one-million-element NumPy array of int32 (more than enough to contain integers between 0 and 1 000 000) will only consume ~4MB; Python itself would require ~36MB (roughly 28 bytes per integer and 8 bytes per list slot, plus overallocation) for an identical list:
>>> # NumPy array
>>> import numpy as np
>>> np.arange(1000000, dtype=np.int32).nbytes
4000000
>>> # Python list
>>> import sys
>>> import random
>>> l = list(range(1000000))
>>> random.shuffle(l)
>>> size = sys.getsizeof(l)                          # size of the list
>>> size += sum(sys.getsizeof(item) for item in l)   # size of the list elements
>>> size
37000108
You only want unique values and you have a consecutive range (1 million requested items and 1 million different numbers), so you could simply shuffle the range and then yield items from your shuffled array:
def generate_random_integer():
    arr = np.arange(1000000, dtype=np.int32)
    np.random.shuffle(arr)
    yield from arr
    # yield from is equivalent to:
    # for item in arr:
    #     yield item
And it can be called using next:
>>> gen = generate_random_integer()
>>> next(gen)
443727
However, that will throw away the performance benefit of using NumPy, so if you want to use NumPy, don't bother with the generator and just perform the operations (vectorized, if possible) on the array. It consumes much less memory than a Python list and could be orders of magnitude faster (factors of 10-100 are not uncommon!).
For a large number of non-repeating random numbers use an encryption. With a given key, encrypt the numbers: 0, 1, 2, 3, ... Since encryption is uniquely reversible then each encrypted number is guaranteed to be unique, provided you use the same key. For 64 bit numbers use DES. For 128 bit numbers use AES. For other size numbers use some Format Preserving Encryption. For pure numbers you might find Hasty Pudding cipher useful as that allows a large range of different bit sizes and non-bit sizes as well, like [0..5999999].
Keep track of the key and the last number you encrypted. When you need a new unique random number just encrypt the next number you haven't used so far.
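A minimal sketch of this approach (mine, assuming the PyCryptodome package for DES): encrypting an incrementing counter yields unique 64-bit numbers, because encryption with a fixed key is a bijection on 64-bit blocks:
from Crypto.Cipher import DES   # pip install pycryptodome

key = b"8bytekey"               # keep the key fixed for a reproducible sequence
cipher = DES.new(key, DES.MODE_ECB)

def unique_random_64bit():
    counter = 0
    while True:
        block = counter.to_bytes(8, "big")                   # the next plaintext number
        yield int.from_bytes(cipher.encrypt(block), "big")   # unique because DES is a bijection
        counter += 1

gen = unique_random_64bit()
print(next(gen), next(gen))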
Considering that your numbers should fit in a 64-bit integer, one million of them stored in a list would be up to 64 megabytes plus the list object overhead. If your computer can afford that, the easiest way is to use shuffle:
import random
randInts = list(range(1000000))
random.shuffle(randInts)
print(randInts)
Note that the other method is to keep track of the previously generated numbers, which will eventually get you to the point of having all of them stored too.
I just needed that function, and to my huge surprise I haven't found anything that would suit my needs. @poke's answer didn't satisfy me because I needed precise bounds, and the other answers that built lists used too much memory.
Initially, I needed a function that would generate numbers from a to b, where b - a could be anything from 0 to 2^32 - 1, which means the range of those numbers could be as large as the maximal 32-bit unsigned integer.
The idea of my own algorithm is simple both to understand and implement. It's a binary tree, where the next branch is chosen by 50/50 chance boolean generator. Basically, we divide all numbers from a to b into two branches, then decide from which one we yield the next value, then do that recursively until we end up with single nodes, which are also being picked up by random.
The depth of recursion is O(log2(b - a)), which implies that for the given stack limit of 256, your highest range would be 2^256, which is impressive.
Things to note:
a must be less than or equal to b - otherwise no output will be produced.
Boundaries are included, meaning unique_random_generator(0, 3) will generate 0, 1, 2 and 3 (in random order).
TL;DR - here's the code
import math, random

# a, b - inclusive
def unique_random_generator(a, b):
    # corner case on wrong input
    if a > b:
        return

    # end node of the tree
    if a == b:
        yield a
        return

    # middle point of tree division
    c = math.floor((a + b) / 2)

    generator_left = unique_random_generator(a, c)        # left branch - contains all the numbers between 'a' and 'c'
    generator_right = unique_random_generator(c + 1, b)   # right branch - contains all the numbers between 'c + 1' and 'b'

    has_values = True
    while has_values:
        # decide whether we pick up a value from the left branch, or the right
        decision = bool(random.getrandbits(1))
        if decision:
            next_left = next(generator_left, None)
            # if left branch is empty, check the right one
            if next_left is None:
                next_right = next(generator_right, None)
                # if both are empty, this level of recursion is exhausted
                if next_right is None:
                    has_values = False
                else:
                    yield next_right
            else:
                yield next_left
                next_right = next(generator_right, None)
                if next_right is not None:
                    yield next_right
        else:
            next_right = next(generator_right, None)
            # if right branch is empty, check the left one
            if next_right is None:
                next_left = next(generator_left, None)
                # if both are empty, this level of recursion is exhausted
                if next_left is None:
                    has_values = False
                else:
                    yield next_left
            else:
                yield next_right
                next_left = next(generator_left, None)
                if next_left is not None:
                    yield next_left
Usage:
for i in unique_random_generator(0, 2**32):
    print(i)
import random

# number of random entries
x = 1000

# the set of all values
y = set()

while x > 0:
    a = random.randint(0, 10**10)
    if a not in y:
        y.add(a)
        x -= 1
This way you are sure you have perfectly random unique values.
x represents the number of values you want.
You can easily make one yourself:
from random import random

def randgen():
    while True:
        yield random()

ran = randgen()
next(ran)
next(ran)
...

Tetris Random Generator: random.choice with permutations versus random.shuffle

In some implementations of the game of Tetris, there is an algorithm called Random Generator which generates an infinite sequence of permutations of the set of one-sided tetrominoes based on the following algorithm:
Random Generator generates a sequence of all seven one-sided tetrominoes (I, J, L, O, S, T, Z) permuted randomly, as if they were drawn from a bag. Then it deals all seven tetrominoes to the piece sequence before generating another bag.
Elements of this infinite sequence are only generated when necessary, i.e. a random permutation of the 7 one-sided tetrominoes is appended to a queue of tetrominoes whenever more pieces are required than the queue can provide.
I believe there are two primary methods of doing this in Python.
The first method uses itertools.permutations and random.choice
import itertools, random, collections
bag = "IJLOSTZ"
bigbag = list(itertools.permutations(bag))
sequence = collections.deque(random.choice(bigbag))
sequence.extend(random.choice(bigbag))
sequence.extend(random.choice(bigbag))
# . . . Extend as necessary
The second method uses only random.shuffle.
import random, collections
bag = ['I', 'J', 'L', 'O', 'S', 'T', 'Z']
random.shuffle(bag)
sequence = collections.deque(bag)
random.shuffle(bag)
sequence.extend(bag)
random.shuffle(bag)
sequence.extend(bag)
# . . . Extend as necessary
What are the advantages/disadvantages of either method, assuming that the player of Tetris is skilled and the Random Generator must produce a large sequence of one-sided tetrominoes?
I'd say that the time to shuffle a tiny list is simply trivial, so don't worry about it. Either method should be "equally random", so there's no basis for deciding there.
But rather than muck with both lists and deques, I'd use a tile generator instead:
def get_tile():
    from random import shuffle
    tiles = list("IJLOSTZ")
    while True:
        shuffle(tiles)
        for tile in tiles:
            yield tile
Short, sweet, and obvious.
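For instance (my example), pulling two full bags from the generator:
tiles = get_tile()
print([next(tiles) for _ in range(14)])   # two successive 7-piece bags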
Making that peekable
Since I'm old, when I hear "peekable queue" I think "circular buffer". Allocate the memory for a fixed-size buffer once, and keep track of "the next item" with an index variable that wraps around. Of course this pays off a lot more in C than in Python, but for concreteness:
class PeekableQueue:
    def __init__(self, item_getter, maxpeek=50):
        self.getter = item_getter
        self.maxpeek = maxpeek
        self.b = [next(item_getter) for _ in range(maxpeek)]
        self.i = 0

    def pop(self):
        result = self.b[self.i]
        self.b[self.i] = next(self.getter)
        self.i += 1
        if self.i >= self.maxpeek:
            self.i = 0
        return result

    def peek(self, n):
        if not 0 <= n <= self.maxpeek:
            raise ValueError("bad peek argument %r" % n)
        nthruend = self.maxpeek - self.i
        if n <= nthruend:
            result = self.b[self.i : self.i + n]
        else:
            result = self.b[self.i:] + self.b[:n - nthruend]
        return result

q = PeekableQueue(get_tile())
So you consume the next tile via q.pop(), and at any time you can get a list of the next n tiles that will be popped via q.peek(n). And there's no organic Tetris player in the universe fast enough for the speed of this code to make any difference at all ;-)
There are 7! = 5040 permutations of a sequence of 7 distinct objects. Thus, generating all permutations is very costly in terms of both time complexity (O(n!*n)) and space complexity (O(n!*n)). However choosing a random permutation from the sequence of permutations is easy. Let's look at the code for choice from random.py.
def choice(self, seq):
    """Choose a random element from a non-empty sequence."""
    return seq[int(self.random() * len(seq))]   # raises IndexError if seq is empty
As you can see the calculation of the index is O(1) since len(seq) is O(1) for any sequence and self.random() is also O(1). Fetching an element from the list type in Python is also O(1) so the entire function is O(1).
On the other hand, using random.shuffle will swap the elements of your bag in-place. Thus it will use O(1) space complexity. However in terms of time complexity it is not so efficient. Let's look at the code for shuffle from random.py
def shuffle(self, x, random=None, int=int):
    """x, random=random.random -> shuffle list x in place; return None.

    Optional arg random is a 0-argument function returning a random
    float in [0.0, 1.0); by default, the standard random.random.
    """
    if random is None:
        random = self.random
    for i in reversed(xrange(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random() * (i+1))
        x[i], x[j] = x[j], x[i]
random.shuffle implements the Fisher-Yates shuffle, which "is similar to randomly picking numbered tickets out of a hat, or cards from a deck, one after another until there are no more left." However, the number of computations is clearly greater than in the first method, since len(x)-1 calls to random() must be made, along with len(x)-1 swap operations. Each swap requires two fetches from the list and the construction of a 2-tuple for unpacking and assignment.
Based on all of this, I would guess that the first method uses up a lot of memory to store the permutations and incurs an O(n!*n) time overhead up front, but in the long run it is probably more efficient than the second method and will likely keep the framerate stable in an actual implementation of a Tetris game, since there will be fewer computations to do during the game loop. The permutations can be generated before the display is even initialized, which is nice for giving the illusion that your game does not perform many computations.
Here I post finalized code using Tim Peters' suggestion of a generator and a circular buffer. Since the size of the circular buffer is known before its creation and never changes, I did not implement all of the features that circular buffers usually have (you can find those in the Wikipedia article). In any case, it works perfectly for the Random Generator algorithm.
def random_generator():
    import itertools, random
    bag = "IJLOSTZ"
    bigbag = list(itertools.permutations(bag))
    while True:
        for ost in random.choice(bigbag):
            yield ost

def popleft_append(buff, start_idx, it):
    """ This function emulates popleft and append from
    collections.deque all in one step for a circular buffer
    of size n which is always full.

    The argument to parameter "it" must be an infinite
    generator iterable, otherwise it.next() may throw
    an exception at some point """
    left_popped = buff[start_idx]
    buff[start_idx] = it.next()
    return (start_idx + 1) % len(buff), left_popped

def circular_peek(seq, start_idx):
    return seq[start_idx:len(seq)] + seq[:start_idx]

# Example usage for peek queue of size 5
rg = random_generator()
pqsize = 5

# Initialize buffer using pqsize elements from generator
buff = [rg.next() for _ in xrange(pqsize)]
start_idx = 0

# Game loop
while True:
    # Popping one OST (one-sided tetromino) from queue and
    # adding a new one, also updating start_idx
    start_idx, left_popped = popleft_append(buff, start_idx, rg)

    # To show the peek queue currently
    print circular_peek(buff, start_idx)