I have imported a large array and I want to iterate through all row permutations at random.
The code is designed to break if a certain array produces the desired solution.
The attempt so far involves the usual iterative permutation procedure:
import numpy as np
import itertools
file = np.loadtxt("my_array.csv", delimiter=", ")
for i in itertools.permutations(file):
    ...  # do something
    if condition:
        break
However, I would like the iteration to cover all permutations in random order, with no repeats.
Ideally, (unlike random iteration in Python) I would also avoid storing all permutations of the array in memory.
Therefore a generator based solution would be best.
Is there a simple solution?
The answer is to first write a function that given an integer k in [0, n!) returns the kth permutation:
def unrank(n, k):
    pi = np.arange(n)
    while n > 0:
        pi[n-1], pi[k % n] = pi[k % n], pi[n-1]
        k //= n
        n -= 1
    return pi
This technique is found in Ranking and unranking permutations in linear time by Wendy Myrvold and Frank Ruskey.
Then, if we can generate a random permutation of [0, n!) we are done. We can find a technique for this (without having to construct the whole permutation) in Sometimes-Recurse Shuffle by Ben Morris and Phillip Rogaway. I have an implementation of it available here.
Then, all we have to do is:
import math
a = np.array(...) # Load data.
p = SometimeShuffle(math.factorial(len(a)), "some_random_seed")
for kth_perm in p:
    shuffled_indices = unrank(len(a), kth_perm)
    shuffled_a = a[shuffled_indices]
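As a rough, self-contained illustration of the same idea, the sketch below pairs a pure-Python unrank with a much simpler index order. The helper affine_index_order and its parameters a and b are hypothetical and are not the Sometimes-Recurse Shuffle: the map k -> (a*k + b) mod n! visits every index exactly once when a is coprime to n!, but the resulting order is only scrambled, not uniformly random.

```python
import math
from math import gcd


def unrank(n, k):
    """Return the k-th permutation of range(n) (Myrvold-Ruskey unranking)."""
    pi = list(range(n))
    while n > 0:
        pi[n - 1], pi[k % n] = pi[k % n], pi[n - 1]
        k //= n
        n -= 1
    return pi


def affine_index_order(total, a, b):
    """Yield every integer in [0, total) exactly once, in a scrambled order.

    Hypothetical stand-in for SometimeShuffle: k -> (a*k + b) % total is a
    bijection whenever gcd(a, total) == 1. It is NOT a uniformly random
    permutation, only a cheap non-repeating order that needs no storage.
    """
    if gcd(a, total) != 1:
        raise ValueError("a must be coprime to total")
    for k in range(total):
        yield (a * k + b) % total


rows = ["r0", "r1", "r2", "r3"]
total = math.factorial(len(rows))          # 24 row permutations
seen = []
for k in affine_index_order(total, a=7, b=5):
    indices = unrank(len(rows), k)
    seen.append(tuple(rows[i] for i in indices))
```

Every row order appears exactly once in seen, and nothing larger than a single permutation is held in memory at any time.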
I am trying to solve this math problem in python, and I'm not sure what it is called:
The answer X is always 100
Given a list of 5 integers, their sum would equal X
Each integer has to be between 1 and 25
The integers can appear one or more times in the list
I want to find all the possible unique lists of 5 integers that match.
These would match:
20,20,20,20,20
25,25,25,20,5
10,25,19,21,25
along with many more.
I looked at itertools.permutations, but I don't think that handles duplicate integers in the list. I'm thinking there must be a standard math algorithm for this, but my search queries must be poor.
Only other thing to mention is if it matters that the list size could change from 10 integers to some other length (6, 24, etc).
This is a constraint satisfaction problem. These can often be solved by a recursive method in the spirit of dynamic programming: you fix one part of the solution and then solve the remaining subproblem. In Python, we can implement this approach with a recursive function:
def csp_solutions(target_sum, n, i_min=1, i_max=25):
    domain = range(i_min, i_max + 1)
    if n == 1:
        if target_sum in domain:
            return [[target_sum]]
        else:
            return []
    solutions = []
    for i in domain:
        # Check if a solution is still possible when i is picked:
        if (n - 1) * i_min <= target_sum - i <= (n - 1) * i_max:
            # Construct solutions recursively, passing the bounds along:
            solutions.extend([[i] + sol
                              for sol in csp_solutions(target_sum - i, n - 1, i_min, i_max)])
    return solutions
all_solutions = csp_solutions(100, 5)
This yields 23746 solutions, in agreement with the answer by Alex Reynolds.
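If only the number of solutions is needed, the same recursion can be memoized so that it never builds the lists. The function name count_solutions is an illustrative choice, and this is a sketch under the same bounds as above rather than part of the original answer.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_solutions(target_sum, n, i_min=1, i_max=25):
    """Count ordered lists of n integers in [i_min, i_max] summing to target_sum."""
    if n == 0:
        return 1 if target_sum == 0 else 0
    # Prune: n remaining values can only reach sums inside this range.
    if not n * i_min <= target_sum <= n * i_max:
        return 0
    return sum(count_solutions(target_sum - i, n - 1, i_min, i_max)
               for i in range(i_min, i_max + 1))


print(count_solutions(100, 5))  # 23746, the same count as the enumeration
```

Because intermediate results are cached, this also stays fast for the longer list sizes mentioned in the question.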
Another approach with Numpy:
#!/usr/bin/env python
import numpy as np
start = 1
end = 25
entries = 5
total = 100
a = np.arange(start, end + 1)
c = np.array(np.meshgrid(*[a] * entries)).T.reshape(-1, entries)
assert len(c) == len(a) ** entries
s = c.sum(axis=1)
#
# filter all combinations for those that meet sum criterion
#
valid_combinations = c[np.where(s == total)]
print(len(valid_combinations)) # 23746
#
# filter those combinations for unique permutations
#
unique_permutations = set(tuple(sorted(x)) for x in valid_combinations)
print(len(unique_permutations)) # 376
You want combinations_with_replacement from the itertools library. Here is what the code would look like:
from itertools import combinations_with_replacement
values = list(range(1, 26))
candidates = []
for tuple5 in combinations_with_replacement(values, 5):
    if sum(tuple5) == 100:
        candidates.append(tuple5)
On this problem I get 376 candidates. As mentioned in the comments above, if each arrangement of a 5-tuple should be counted separately, then you'd want to look at all permutations of each candidate, which may not all be distinct. For example, (20,20,20,20,20) is the same regardless of how you arrange its elements. However, (21,20,20,20,19) is not: it has several distinct arrangements.
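To connect the two counts, the number of distinct arrangements of each candidate can be computed with the multinomial formula (5! divided by the factorial of each value's multiplicity). The helper name arrangements is an illustrative choice; this is a small sketch, not part of the original answer.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial


def arrangements(multiset):
    """Number of distinct orderings of a tuple that may contain repeats."""
    count = factorial(len(multiset))
    for multiplicity in Counter(multiset).values():
        count //= factorial(multiplicity)
    return count


candidates = [c for c in combinations_with_replacement(range(1, 26), 5)
              if sum(c) == 100]
total_ordered = sum(arrangements(c) for c in candidates)

print(len(candidates))                      # 376
print(arrangements((20, 20, 20, 20, 20)))   # 1
print(arrangements((19, 20, 20, 20, 21)))   # 20
print(total_ordered)                        # 23746
```

Summing the arrangements over the 376 unordered candidates recovers the 23746 ordered solutions reported by the other answers.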
I think this could be what you are searching for: given a target number SUM, a left threshold L, a right threshold R and a size K, find all the possible lists of K elements between L and R whose sum is SUM. As far as I was able to find, though, there isn't a specific name for this problem.
I am seeking to sample n random permutations of a list in Python.
This is my code:
obj = [ 5 8 9 ... 45718 45719 45720]
#type(obj) = numpy.ndarray
pairs = random.sample(list(permutations(obj,2)),k= 150)
Although the code does what I want it to, it causes memory issues. I sometimes receive a MemoryError when running on CPU, and when running on GPU, my virtual machine crashes.
How can I make the code work in a more memory-efficient manner?
This avoids using permutations at all:
import random
count = len(obj)
def index2perm(i, obj):
    # decode i into a pair of distinct positions (i1, i2)
    i1, i2 = divmod(i, len(obj) - 1)
    if i1 <= i2:
        i2 += 1
    return (obj[i1], obj[i2])
pairs = [index2perm(i, obj) for i in random.sample(range(count * (count - 1)), k=3)]
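One way to convince yourself that this index decoding covers every ordered pair exactly once is to compare it with itertools.permutations on a small input. The list small is made up for the check; the function is repeated so the snippet runs on its own.

```python
from itertools import permutations


def index2perm(i, obj):
    """Map i in [0, n*(n-1)) to the i-th ordered pair of distinct elements."""
    i1, i2 = divmod(i, len(obj) - 1)
    if i1 <= i2:
        i2 += 1          # skip the position already taken by i1
    return (obj[i1], obj[i2])


small = ["a", "b", "c", "d"]
n = len(small)
decoded = [index2perm(i, small) for i in range(n * (n - 1))]

# The decoded pairs come out in the same order that permutations() uses.
print(decoded == list(permutations(small, 2)))  # True
```

Since random.sample over a range never materializes the range, only the k sampled indices and their decoded pairs are ever held in memory.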
Building on Pablo Ruiz's excellent answer, I suggest wrapping his sampling solution into a generator function that yields unique permutations by keeping track of what it has already yielded:
import numpy as np
def unique_permutations(sequence, r, n):
    """Yield n unique permutations of r elements from sequence"""
    seen = set()
    while len(seen) < n:
        # This line of code adapted from Pablo Ruiz's answer:
        candidate_permutation = tuple(np.random.choice(sequence, r, replace=False))
        if candidate_permutation not in seen:
            seen.add(candidate_permutation)
            yield candidate_permutation
obj = list(range(10))
for permutation in unique_permutations(obj, 2, 15):
    pass  # do something with the permutation
# Or, to save the result as a list:
pairs = list(unique_permutations(obj, 2, 15))
My assumption is that you are sampling a small subset of the very large number of possible permutations, in which case collisions will be rare enough that keeping a seen set will not be expensive.
Warnings: this function loops forever if you ask for more permutations than are possible given the inputs. It will also get increasingly slow as n gets close to the number of possible permutations, since collisions will become increasingly frequent.
If I were to put this function in my code base, I would put a guard at the top that calculated the number of possible permutations and raised a ValueError exception if n exceeded that number, and maybe output a warning if n exceeded one tenth of that number, or something like that.
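A guard along those lines might look like the sketch below. The name checked_unique_permutations is hypothetical, random.sample is swapped in for np.random.choice to keep the snippet standard-library only, and math.perm (Python 3.8+) assumes the elements of sequence are all distinct.

```python
import math
import random
import warnings


def checked_unique_permutations(sequence, r, n):
    """Yield n unique r-permutations of sequence, refusing impossible requests."""
    possible = math.perm(len(sequence), r)   # number of ordered r-selections
    if n > possible:
        raise ValueError(f"requested {n} permutations, only {possible} exist")
    if n > possible // 10:
        warnings.warn("n is a large fraction of all permutations; "
                      "rejection sampling will slow down")
    seen = set()
    while len(seen) < n:
        candidate = tuple(random.sample(sequence, r))
        if candidate not in seen:
            seen.add(candidate)
            yield candidate


pairs = list(checked_unique_permutations(list(range(10)), 2, 5))
```

Because this is a generator function, the ValueError is raised when the first item is requested, not when the function is called.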
You can avoid listing the permutation iterator that could be massive in memory. You can generate random permutations by sampling the list with replace=False.
import numpy as np
obj = np.array([5,8,123,13541,42])
k = 15
permutations = [tuple(np.random.choice(obj, 2, replace=False)) for _ in range(k)]
print(permutations)
This problem becomes much harder, if you for example impose no repetition in your random permutations.
Edit, no repetitions code
I think this is the best possible approach for the non repetition case.
We index all possible permutations from 0 to n**2-n-1 in a permutation matrix whose diagonal must be avoided. We sample the indexes without repetition and without listing them, map the samples to the coordinates of the permutations, and then read the permutations from those matrix indexes.
import random
import numpy as np
obj = np.array([1,2,3,10,43,19,323,142,334,33,312,31,12])
k = 150
obj_len = len(obj)
indexes = random.sample(range(obj_len**2-obj_len), k)
def mapm(m):
    return m + m // obj_len + 1
permutations = [(obj[mapm(i) // obj_len], obj[mapm(i) % obj_len]) for i in indexes]
This approach is not based on any assumption, does not load the permutations and also the performance is not based on a while loop failing to insert duplicates, as no duplicates are ever generated.
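The index mapping can be checked on a small grid. The sketch below uses a hypothetical helper offdiag_flat_index that takes the size explicitly (mapm above reads it from obj_len) and confirms that every off-diagonal cell is hit once and no diagonal cell is ever produced.

```python
def offdiag_flat_index(m, size):
    """Map m in [0, size*size - size) to a flat index of a size x size grid,
    skipping the diagonal cells 0, size + 1, 2 * (size + 1), ..."""
    return m + m // size + 1


size = 4
flat = [offdiag_flat_index(m, size) for m in range(size * size - size)]
coords = [divmod(f, size) for f in flat]   # (row, column) pairs

print(flat)  # [1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14]
```

The flat indexes 0, 5, 10 and 15, which are the diagonal of a 4 x 4 grid, never appear, so no element is ever paired with itself.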
This question is an extension of my previous question: Fast python algorithm to find all possible partitions from a list of numbers that has subset sums equal to a ratio. I want to divide a list of numbers so that the ratios of the subset sums equal given values. The difference is that now I have a long list of 200 numbers, so enumeration is infeasible. Note that although the list of course contains repeated numbers, every number is distinguishable.
import random
lst = [random.randrange(10) for _ in range(200)]
In this case, I want a function to stochastically sample a certain amount of partitions with subset sums equal or close to the given ratios. This means that the solution can be sub-optimal, but I need the algorithm to be fast enough. I guess a Greedy algorithm will do. With that being said, of course it would be even better if there is a relatively fast algorithm that can give the optimal solution.
For example, I want to sample 100 partitions, all with subset sum ratios of 4 : 3 : 3. Duplicate partitions are allowed but should be very unlikely for such a long list. The function should be used like this:
partitions = func(numbers=lst, ratios=[4, 3, 3], num_gen=100)
To test the solution, you can do something like:
from math import isclose
eps = 0.05
assert all([isclose(ratios[i] / sum(ratios), sum(x) / sum(lst), abs_tol=eps)
for part in partitions for i, x in enumerate(part)])
Any suggestions?
You can use a greedy heuristic where you generate the partitions from num_gen random permutations of the list, one partition per permutation. Each random permutation is partitioned into len(ratios) contiguous sublists. The fact that the partition subsets are sublists of a permutation makes enforcing the ratio condition very easy to do during sublist generation: as soon as the running sum reaches the cumulative share of the current ratio, we "complete" the sublist, add it to the partition and start creating a new sublist. We can do this in one pass through the entire permutation, giving us the following algorithm of time complexity O(num_gen * len(lst)).
M = 100
N = len(lst)
P = len(ratios)
R = sum(ratios)
S = sum(lst)
partitions = []
for _ in range(M):
    # get a new random permutation
    random.shuffle(lst)
    partition = []
    # starting index (in the permutation) of the current sublist
    lo = 0
    # permutation partial sum
    s = 0
    # index of sublist we are currently generating (i.e. what ratio we are on)
    j = 0
    # ratio partial sum
    rs = ratios[j]
    for i in range(N):
        s += lst[i]
        # if ratio of permutation partial sum reaches ratio of ratio partial sum,
        # the current sublist is "complete"
        if s / S >= rs / R:
            partition.append(lst[lo:i + 1])
            # start creating new sublist from next element
            lo = i + 1
            j += 1
            if j == P:
                # done with partition
                # remaining elements will always all be zeroes
                # (i.e. assert should never fail)
                assert all(x == 0 for x in lst[i+1:])
                partition[-1].extend(lst[i+1:])
                break
            rs += ratios[j]
    # keep the partition built from this permutation
    partitions.append(partition)
Note that the outer loop can be redesigned to loop until num_gen good partitions are generated (rather than just looping num_gen times) for more robustness. This algorithm is expected to produce M good partitions in O(M) iterations (provided random.shuffle is sufficiently random) if the number of good partitions is not too small compared to the total number of partitions of the same size, so it should perform well for most inputs. For an (almost) uniformly random list like [random.randrange(10) for _ in range(200)], every iteration produces a good partition with eps = 0.05, as is evident from running the example below. Of course, how well the algorithm performs will also depend on the definition of 'good': the stricter the closeness requirement (in other words, the smaller the epsilon), the more iterations it will take to find a good partition. This implementation can be found here, and will work for any input (assuming random.shuffle eventually produces all permutations of the input list).
You can find a runnable version of the code (with asserts to test how "good" the partitions are) here.
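The "loop until enough good partitions are found" variant mentioned above could be sketched as follows. The function name sample_partitions and the parameters eps, max_tries and rng are assumptions made for illustration, and the last sublist simply takes whatever is left over, which differs slightly from the loop shown earlier.

```python
import random
from math import isclose


def sample_partitions(numbers, ratios, num_gen, eps=0.05, max_tries=100_000, rng=random):
    """Collect num_gen partitions whose subset-sum shares are within eps of ratios."""
    total, ratio_total = sum(numbers), sum(ratios)
    good = []
    for _ in range(max_tries):
        if len(good) == num_gen:
            break
        perm = list(numbers)
        rng.shuffle(perm)
        # Cut the permutation greedily each time the running sum reaches
        # the next cumulative ratio target.
        parts, lo, running, j = [], 0, 0, 0
        target = ratios[0]
        for i, value in enumerate(perm):
            running += value
            if j < len(ratios) - 1 and running / total >= target / ratio_total:
                parts.append(perm[lo:i + 1])
                lo = i + 1
                j += 1
                target += ratios[j]
        parts.append(perm[lo:])     # the last part takes everything left over
        # Keep the partition only if every part is close enough to its share.
        if len(parts) == len(ratios) and all(
                isclose(r / ratio_total, sum(p) / total, abs_tol=eps)
                for r, p in zip(ratios, parts)):
            good.append(parts)
    return good


rng = random.Random(0)
lst = [rng.randrange(10) for _ in range(200)]
partitions = sample_partitions(lst, ratios=[4, 3, 3], num_gen=100, rng=rng)
```

The max_tries cap is there so that a very strict eps cannot turn the search into an endless loop; the function then returns fewer than num_gen partitions.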
I have a task for uni that I need help with.
We were given code that sorts lists correctly, but isn't "well thought out".
I can't find the logical flaw in how it works.
Something about the n loop using the result of the m loop.
Here's the code:
from random import randint
numbers = [randint(0,9) for x in range(20)] #random array for testing the sort
#sorting
for n in range(0, len(numbers)-1):
    for m in range(n + 1, len(numbers)):
        if numbers[n] > numbers[m]:
            a = numbers[n]
            numbers[n] = numbers[m]
            numbers[m] = a
#correctly sorted list
print(numbers)
from random import randint
numbers = [randint(0,9) for x in range(20)]
n = len(numbers)
for i in range(n):
    for m in range(1, n-i):
        # as written this sorts in descending order;
        # change < to > to reverse the order (ascending)
        if numbers[m-1] < numbers[m]:
            (numbers[m-1], numbers[m]) = (numbers[m], numbers[m-1])
print(numbers)
Untested! The n loop has been taken out of the equation but the variable is still referenced through the m loop. This way you are only using the value of m to sort the list without being dependent on comparison to n.
This algorithm is a variant of Bubble Sort (often called exchange sort). Which course is this homework for? If it's an algorithms or data structures course, then I can suggest an issue that may fit: as written, the sort is Θ(n^2), which means its best case (list already sorted) and its worst case (list sorted backwards) have the same time complexity. It always performs about n^2/2 comparisons, so you can clearly come up with a better algorithm to reduce the time complexity. (See Insertion Sort or Merge Sort to learn more.)
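For comparison, here is a sketch of Bubble Sort with an early-exit flag, one common way to avoid the fixed quadratic cost on input that is already sorted. The comparison counter is only there to make the difference visible; none of this comes from the assignment itself.

```python
def bubble_sort(values):
    """Sort a copy of values ascending; return (sorted_list, comparisons)."""
    result = list(values)
    comparisons = 0
    for end in range(len(result) - 1, 0, -1):
        swapped = False
        for m in range(end):
            comparisons += 1
            if result[m] > result[m + 1]:
                result[m], result[m + 1] = result[m + 1], result[m]
                swapped = True
        if not swapped:      # a full pass without swaps: already sorted
            break
    return result, comparisons


print(bubble_sort(range(20))[1])          # 19 comparisons on sorted input
print(bubble_sort(range(19, -1, -1))[1])  # 190 comparisons on reversed input
```

With the flag, the best case drops to a single pass, while the worst case stays quadratic.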
Suppose I have a Python list of arbitrary length k. Now, suppose I would like a random sample of n (where n <= k!) distinct permutations of that list. I was tempted to try:
import random
import itertools
k = 6
n = 10
mylist = list(range(0, k))
j = random.sample(list(itertools.permutations(mylist)), n)
for i in j:
    print(i)
But, naturally, this code becomes unusably slow when k gets too large. Given that the number of permutations that I may be looking for n is going to be relatively small compared to the total number of permutations, computing all of the permutations is unnecessary. Yet it's important that none of the permutations in the final list are duplicates.
How would you achieve this more efficiently? Remember, mylist could be a list of anything, I just used list(range(0, k)) for simplicity.
You can generate permutations, and keep track of the ones you have already generated. To make it more versatile, I made a generator function:
import random
k = 6
n = 10
mylist = list(range(0, k))
def perm_generator(seq):
    seen = set()
    length = len(seq)
    while True:
        perm = tuple(random.sample(seq, length))
        if perm not in seen:
            seen.add(perm)
            yield perm
rand_perms = perm_generator(mylist)
j = [next(rand_perms) for _ in range(n)]
for i in j:
    print(i)
Naïve implementation
Below is the naïve implementation I wrote (well implemented by @Tomothy32 in pure PSL using a generator):
import numpy as np
mylist = np.array(mylist)
perms = set()
for i in range(n):                          # (1) Draw N samples from permutations Universe U (#U = k!)
    while True:                             # (2) Endless loop
        perm = np.random.permutation(k)     # (3) Generate a random permutation from U
        key = tuple(perm)
        if key not in perms:                # (4) Check if permutation already has been drawn (hash table)
            perms.add(key)                  # (5) Insert the whole tuple into the set
            break                           # (6) Break the endless loop
    print(i, mylist[perm])
It relies on numpy.random.permutation, which randomly permutes a sequence.
The key idea is:
to generate a new random permutation (index randomly permuted);
to check if permutation already exists and store it (as tuple of int because it must hash) to prevent duplicates;
Then to permute the original list using the index permutation.
This naïve version does not directly suffer from the factorial complexity O(k!) of the itertools.permutations approach, which generates all k! permutations before sampling from them.
About Complexity
There is something interesting about the algorithm design and complexity...
If we want to be sure that the loop can end, we must enforce N <= k!, but the code does not guarantee this. Furthermore, assessing the complexity requires knowing how many times the endless loop will actually run before a new random tuple is found and breaks it.
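Under the assumption that each draw is uniform over all k! permutations, the expected number of draws has a closed form: once i distinct permutations are stored, a fresh draw is new with probability (k! - i) / k!, so it takes k! / (k! - i) draws on average. The function name expected_draws is an illustrative choice for this sketch.

```python
import math


def expected_draws(n_wanted, k):
    """Expected number of uniform random permutations drawn before n_wanted
    distinct ones (out of k!) have been collected."""
    total = math.factorial(k)
    if n_wanted > total:
        raise ValueError("cannot collect more than k! distinct permutations")
    # With i permutations already stored, the next new one takes
    # total / (total - i) draws on average.
    return sum(total / (total - i) for i in range(n_wanted))


few = expected_draws(10, 6)                     # barely more than 10 draws
everything = expected_draws(math.factorial(6), 6)  # far more than 720 draws
```

When N is tiny compared to k! the expected number of draws is essentially N, whereas collecting all k! permutations is the coupon collector problem and costs roughly k! * ln(k!) draws.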
Limitation
Let's encapsulate the function written by @Tomothy32:
import math
def get_perms(seq, N=10):
    rand_perms = perm_generator(seq)
    return [next(rand_perms) for _ in range(N)]
For instance, this call works for very small k < 7:
get_perms(list(range(k)), math.factorial(k))
But it becomes impractical as k grows because of its O(k!) complexity (time and memory): near the end it boils down to randomly finding the single missing key once all other k!-1 keys have been found.
Always look on the bright side...
On the other hand, the method seems able to generate a reasonable number of permuted tuples in a reasonable amount of time when N << k!. For example, it is possible to draw more than N=5000 tuples of length k, where 10 < k < 1000, in less than one second.
When k and N are kept small and N << k!, the algorithm seems to have a complexity that is:
Constant versus k;
Linear versus N.
This is somehow valuable.