Python: Memory-efficient random sampling of list of permutations

Python: Memory-efficient random sampling of list of permutations - python

I am seeking to sample n random permutations of a list in Python.
This is my code:
obj = [ 5 8 9 ... 45718 45719 45720]
#type(obj) = numpy.ndarray
pairs = random.sample(list(permutations(obj,2)),k= 150)
Although the code does what I want it to, it causes memory issues. I sometimes receive the error Memory error when running on CPU, and when running on GPU, my virtual machine crashes.
How can I make the code work in a more memory-efficient manner?

This avoids using permutations at all:
count = len(obj)
def index2perm(i,obj):
i1, i2 = divmod(i,len(obj)-1)
if i1 <= i2:
i2 += 1
return (obj[i1],obj[i2])
pairs = [index2perm(i,obj) for i in random.sample(range(count*(count-1)),k=3)]

Building on Pablo Ruiz's excellent answer, I suggest wrapping his sampling solution into a generator function that yields unique permutations by keeping track of what it has already yielded:
import numpy as np
def unique_permutations(sequence, r, n):
"""Yield n unique permutations of r elements from sequence"""
seen = set()
while len(seen) < n:
# This line of code adapted from Pablo Ruiz's answer:
candidate_permutation = tuple(np.random.choice(sequence, r, replace=False))
if candidate_permutation not in seen:
seen.add(candidate_permutation)
yield candidate_permutation
obj = list(range(10))
for permutation in unique_permutations(obj, 2, 15):
# do something with the permutation
# Or, to save the result as a list:
pairs = list(unique_permutations(obj, 2, 15))
My assumption is that you are sampling a small subset of the very large number of possible permutations, in which case collisions will be rare enough that keeping a seen set will not be expensive.
Warnings: this function is an infinite loop if you ask for more permutations than are possible given the inputs. It will also get increasingly slow an n gets close to the number of possible permutations, since collisions will get increasingly frequent.
If I were to put this function in my code base, I would put a shield at the top that calculated the number of possible permutations and raised a ValueError exception if n exceeded that number, and maybe output a warning if n exceeded one tenth that number, or something like that.

You can avoid listing the permutation iterator that could be massive in memory. You can generate random permutations by sampling the list with replace=False.
import numpy as np
obj = np.array([5,8,123,13541,42])
k = 15
permutations = [tuple(np.random.choice(obj, 2, replace=False)) for _ in range(k)]
print(permutations)
This problem becomes much harder, if you for example impose no repetition in your random permutations.
Edit, no repetitions code
I think this is the best possible approach for the non repetition case.
We index all possible permutations from 1 to n**2-n in a permutation matrix where the diagonal should be avoided. We sample the indexes without repetitions and without listing them, then we map the samples to the coordinates of the permutations and then we get the permutations from the indexes of matrix.
import random
import numpy as np
obj = np.array([1,2,3,10,43,19,323,142,334,33,312,31,12])
k = 150
obj_len = len(obj)
indexes = random.sample(range(obj_len**2-obj_len), k)
def mapm(m):
return m + m //(obj_len) +1
permutations = [(obj[mapm(i)//obj_len], obj[mapm(i)%obj_len]) for i in indexes]
This approach is not based on any assumption, does not load the permutations and also the performance is not based on a while loop failing to insert duplicates, as no duplicates are ever generated.

Related

Fast Python algorithm for random partitioning with subset sums equal or close to given ratios

This question is an extension of my previous question: Fast python algorithm to find all possible partitions from a list of numbers that has subset sums equal to a ratio
. I want to divide a list of numbers so that the ratios of subset sums equal to given values. The difference is now I have a long list of 200 numbers so that a enumeration is infeasible. Note that although there are of course same numbers in the list, every number is distinguishable.
import random
lst = [random.randrange(10) for _ in range(200)]
In this case, I want a function to stochastically sample a certain amount of partitions with subset sums equal or close to the given ratios. This means that the solution can be sub-optimal, but I need the algorithm to be fast enough. I guess a Greedy algorithm will do. With that being said, of course it would be even better if there is a relatively fast algorithm that can give the optimal solution.
For example, I want to sample 100 partitions, all with subset sum ratios of 4 : 3 : 3. Duplicate partitions are allowed but should be very unlikely for such long list. The function should be used like this:
partitions = func(numbers=lst, ratios=[4, 3, 3], num_gen=100)
To test the solution, you can do something like:
from math import isclose
eps = 0.05
assert all([isclose(ratios[i] / sum(ratios), sum(x) / sum(lst), abs_tol=eps)
for part in partitions for i, x in enumerate(part)])
Any suggestions?

You can use a greedy heuristic where you generate each partition from num_gen random permutations of the list. Each random permutation is partitioned into len(ratios) contiguous sublists. The fact that the partition subsets are sublists of a permutation make enforcing the ratio condition very easy to do during sublist generation: as soon as the sum of the sublist we are currently building reaches one of the ratios, we "complete" the sublist, add it to the partition and start creating a new sublist. We can do this in one pass through the entire permutation, giving us the following algorithm of time complexity O(num_gen * len(lst)).
M = 100
N = len(lst)
P = len(ratios)
R = sum(ratios)
S = sum(lst)
for _ in range(M):
# get a new random permutation
random.shuffle(lst)
partition = []
# starting index (in the permutation) of the current sublist
lo = 0
# permutation partial sum
s = 0
# index of sublist we are currently generating (i.e. what ratio we are on)
j = 0
# ratio partial sum
rs = ratios[j]
for i in range(N):
s += lst[i]
# if ratio of permutation partial sum exceeds ratio of ratio partial sum,
# the current sublist is "complete"
if s / S >= rs / R:
partition.append(lst[lo:i + 1])
# start creating new sublist from next element
lo = i + 1
j += 1
if j == P:
# done with partition
# remaining elements will always all be zeroes
# (i.e. assert should never fail)
assert all(x == 0 for x in lst[i+1:])
partition[-1].extend(lst[i+1:])
break
rs += ratios[j]
Note that the outer loop can be redesigned to loop indefinitely until num_gen good partitions are generated (rather than just looping num_gen times) for more robustness. This algorithm is expected to produce M good partitions in O(M) iterations (provided random.shuffle is sufficiently random) if the number of good partitions is not too small compared to the total number of partitions of the same size, so it should perform well for for most inputs. For an (almost) uniformly random list like [random.randrange(10) for _ in range(200)], every iteration produces a good partition with eps = 0.05 as is evident by running the example below. Of course, how well the algorithm performs will also depend on the definition of 'good' -- the stricter the closeness requirement (in other words, the smaller the epsilon), the more iterations it will take to find a good partition. This implementation can be found here, and will work for any input (assuming random.shuffle eventually produces all permutations of the input list).
You can find a runnable version of the code (with asserts to test how "good" the partitions are) here.

Iterate though all permutations randomly

I have imported an large array and I want to iterate through all row permutations at random.
The code is designed to break if a certain array produces the desired solution.
The attempt so far involves your normal iterative perturbation procedure:
import numpy as np
import itertools
file = np.loadtxt("my_array.csv", delimiter=", ")
for i in itertools.permutations(file):
** do something **
if condition:
break
However, I would like the iterations to cover all perturbation and at random, with no repeats.
Ideally, (unlike random iteration in Python) I would also avoid storing all permutations of the array in memory.
Therefore a generator based solution would be best.
Is there a simple solution?

The answer is to first write a function that given an integer k in [0, n!) returns the kth permutation:
def unrank(n, k):
pi = np.arange(n)
while n > 0:
pi[n-1], pi[k % n] = pi[k % n], pi[n-1]
k //= n
n -= 1
return pi
This technique is found in Ranking and unranking permutations in linear time by Wendy Myrvold and Frank Ruskey.
Then, if we can generate a random permutation of [0, n!) we are done. We can find a technique for this (without having to construct the whole permutation) in Sometimes-Recurse Shuffle by Ben Morris and Phillip Rogaway. I have an implementation of it available here.
Then, all we have to do is:
import math
a = np.array(...) # Load data.
p = SometimeShuffle(math.factorial(len(a)), "some_random_seed")
for kth_perm in p:
shuffled_indices = unrank(len(a), kth_perm)
shuffled_a = a[shuffled_indices]

Fastest way to find sub-lists of a fixed length, from a given list of values, whose elements sum equals a defined number

In Python 3.6, suppose that I have a list of numbers L, and that I want to find all possible sub-lists S of a given pre-chosen length |S|, such that:
any S has to have length smaller than L, that is |S| < |L|
any S can only contain numbers present in L
numbers in S do not have to be unique (they can appear repeatedly)
the sum of all numbers in S should be equal to a pre-determined number N
A trivial solution for this can be found using the Cartesian Product with itertools.product. For example, suppose L is a simple list of all integers between 1 and 10 (inclusive) and |S| is chosen to be 3. Then:
import itertools
L = range(1,11)
N = 8
Slength = 3
result = [list(seq) for seq in itertools.product(L, repeat=Slength) if sum(seq) == N]
However, as larger lists L are chosen, and or larger |S|, the above approach becomes extremely slow. In fact, even for L = range(1,101) with |S|=5 and N=80, the computer almost freezes and it takes approximately an hour to compute the result.
My take is that:
there is a lot of unnecessary computations going on there under the hood, given the condition that sub-lists should sum to N
there is a ton of cache misses due to iterating over possibly millions of lists generated by itertools.product to just keep much much fewer
So, my question/challenge is: is there a way I can do this in a more computationally efficient way? Unless we are talking hundreds of Gigabytes, speed to me is more critical than memory, so the challenge focuses more on speed, even if considerations for memory efficiency are a welcome bonus.

So given an input list and a target length and sum, you want all the permutations of the numbers in the input list such that:
The sum equals the target sum
The length equals the target length
The following code should be faster:
# Input
input_list = range(1,101)
# Targets
target_sum = 15
target_length = 5
# Available numbers
numbers = set(input_list)
# Initialize the stack
stack = [[num] for num in numbers]
result = []
# Loop until we run out of permutations
while stack:
# Get a permutation from the stack
current = stack.pop()
# If it's too short
if len(current) < target_length:
# And the sum is too small
if sum(current) < target_sum:
# Then for each available number
for num in numbers:
# Append said number and put the resulting permutation back into the stack
stack.append(current + [num])
# If it's not too short and the sum equals the target, add to the result!
elif sum(current) == target_sum:
result.append(current)
print(len(result))

Python Lottery Number Generation

I am working on a lottery number generation program. I have a fixed list of allowed numbers (1-80) from which users can choose 6 numbers. Each number can only be picked once. I want to generate all possible combinations efficiently. Current implementation takes more than 30 seconds if allowed_numbers is [1,...,60]. Above that, it freezes my system.
from itertools import combinations
import numpy as np
LOT_SIZE = 6
allowed_numbers = np.arange(1, 61)
all_combinations = np.array(list(combinations(allowed_numbers, LOT_SIZE)))
print(len(all_combinations))
I think I would need a numpy array (not sure if 2D). Something like,
[[1,2,3,4,5,6],
[1,2,3,4,5,,7],...]
because I want to (quickly) perform several operations on these combinations. These operations may include,
Removing combinations that have only even numbers
Removing combinations who's sum is greater than 150 etc.
Checking if there is only one pair of consecutive numbers (Acceptable: [1,2,4,6,8,10] {Pair: (1,2)}| Not-acceptable: [1,2,4,5,7,9] {Pairs: (1,2) and (4,5)} )
Any help will be appreciated.
Thanks

Some options:
1) apply filters on the iterable instead of on the data, using filter:
def filt(x):
return sum(x) < 7
list(filter(filt, itertools.combinations(allowed, n)))
will save ~15% time vs. constructing the list and applying the filters then, i.e.:
[i for i in itertools.combinations(allowed, n) if filt(i) if filt(i)]
2) Use np.fromiter
arr = np.fromiter(itertools.chain.from_iterable(itertools.combinations(allowed, n)), int).reshape(-1, n)
return arr[arr.sum(1) < 7]
3) work on the generator object itself. In the example above, you can stop the itertools.combinations when the first number is above 7 (as an example):
def my_generator():
for i in itertools.combinations(allowed, n):
if i[0] >= 7:
return
elif sum(i) < 7:
yield i
list(my_generator()) # will build 3x times faster than option 1
Note that np.fromiter becomes less efficient on compound expressions, so the mask is applied afterwards

You can use itertools.combinations(allowed_numbers, 6) to get all combinations of length 6 from your list (this is the fastest way to get this operation done).

Random Sample of N Distinct Permutations of a List

Suppose I have a Python list of arbitrary length k. Now, suppose I would like a random sample of n , (where n <= k!) distinct permutations of that list. I was tempted to try:
import random
import itertools
k = 6
n = 10
mylist = list(range(0, k))
j = random.sample(list(itertools.permutations(mylist)), n)
for i in j:
print(i)
But, naturally, this code becomes unusably slow when k gets too large. Given that the number of permutations that I may be looking for n is going to be relatively small compared to the total number of permutations, computing all of the permutations is unnecessary. Yet it's important that none of the permutations in the final list are duplicates.
How would you achieve this more efficiently? Remember, mylist could be a list of anything, I just used list(range(0, k)) for simplicity.

You can generate permutations, and keep track of the ones you have already generated. To make it more versatile, I made a generator function:
import random
k = 6
n = 10
mylist = list(range(0, k))
def perm_generator(seq):
seen = set()
length = len(seq)
while True:
perm = tuple(random.sample(seq, length))
if perm not in seen:
seen.add(perm)
yield perm
rand_perms = perm_generator(mylist)
j = [next(rand_perms) for _ in range(n)]
for i in j:
print(i)

Naïve implementation
Bellow the naïve implementation I did (well implemented by #Tomothy32, pure PSL using generator):
import numpy as np
mylist = np.array(mylist)
perms = set()
for i in range(n): # (1) Draw N samples from permutations Universe U (#U = k!)
while True: # (2) Endless loop
perm = np.random.permutation(k) # (3) Generate a random permutation form U
key = tuple(perm)
if key not in perms: # (4) Check if permutation already has been drawn (hash table)
perms.update(key) # (5) Insert into set
break # (6) Break the endless loop
print(i, mylist[perm])
It relies on numpy.random.permutation which randomly permute a sequence.
The key idea is:
to generate a new random permutation (index randomly permuted);
to check if permutation already exists and store it (as tuple of int because it must hash) to prevent duplicates;
Then to permute the original list using the index permutation.
This naïve version does not directly suffer to factorial complexity O(k!) of itertools.permutations function which does generate all k! permutations before sampling from it.
About Complexity
There is something interesting about the algorithm design and complexity...
If we want to be sure that the loop could end, we must enforce N <= k!, but it is not guaranteed. Furthermore, assessing the complexity requires to know how many time the endless-loop will actually loop before a new random tuple is found and break it.
Limitation
Let's encapsulate the function written by #Tomothy32:
import math
def get_perms(seq, N=10):
rand_perms = perm_generator(mylist)
return [next(rand_perms) for _ in range(N)]
For instance, this call work for very small k<7:
get_perms(list(range(k)), math.factorial(k))
But will fail before O(k!) complexity (time and memory) when k grows because it boils down to randomly find a unique missing key when all other k!-1 keys have been found.
Always look on the bright side...
On the other hand, it seems the method can generate a reasonable amount of permuted tuples in a reasonable amount of time when N<<<k!. Example, it is possible to draw more than N=5000 tuples of length k where 10 < k < 1000 in less than one second.
When k and N are kept small and N<<<k!, then the algorithm seems to have a complexity:
Constant versus k;
Linear versus N.
This is somehow valuable.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: Memory-efficient random sampling of list of permutations - python

This avoids using permutations at all: count = len(obj) def index2perm(i,obj): i1, i2 = divmod(i,len(obj)-1) if i1 <= i2: i2 += 1 return (obj[i1],obj[i2]) pairs = [index2perm(i,obj) for i in random.sample(range(count*(count-1)),k=3)]

Related

Fast Python algorithm for random partitioning with subset sums equal or close to given ratios

Iterate though all permutations randomly

Fastest way to find sub-lists of a fixed length, from a given list of values, whose elements sum equals a defined number

Python Lottery Number Generation

Random Sample of N Distinct Permutations of a List

Categories

Resources