Too many permutations in python

My program needs all the arrangements of 2s and 0s in a list of lists, e.g. [[0,0,0,2],[0,0,2,0],[0,2,0,0], ...]. Each sublist always has n^2 elements, of which n-1 are 2s, so I should get (n^2)!/((n^2-n+1)!*(n-1)!) results.
The problem is that my code first computes all the permutations and then removes the duplicates. For n = 4 that is 16! sublists, which crashes my computer. How can I fix this? (It needs to handle at least n = 8.)
Here is the code:
servers = n*n #number of elements in each sublist
infected = n - 1 #number of 2s
grid = [ 0 for a in range(servers)] #list representing grid, with all 0s
grid = grid[:-infected] + infected * [2] #make last ones 2s
all_infections = list(itertools.permutations(grid)) # !!PROBLEM!! create all permutations of infection (tuples)
all_infections = [list(a) for a in all_infections] # Convert tuples to lists
all_infections.sort()
all_infections = list(k for k,_ in itertools.groupby(all_infections)) #remove duplicates
combinations = len(all_infections)
print (all_infections)
results = []
for index in range(combinations): #calculate the infected states
    results = results + [grid_infecter(all_infections[index],n)]

Your main problem is the combinatorial explosion far beyond the actual problem requirements. As you said, the n=8 case needs only 64!/(57! 7!) results. Why store them all at once?
This leaves you two basic choices:
Write your own permutation routine. These are easy enough to find with a basic search, such as this one.
Build a generator stream from permutations() and eliminate the duplicates before they ever get into your list.
Like so:
def no_duplicate(gen):
    previous = set()
    for permutation in gen:
        if permutation not in previous:
            previous.add(permutation)
            yield permutation

# Now set up a generator pipeline for the permutations
infection_stream = no_duplicate(itertools.permutations(grid))
result_stream = (grid_infecter(list(dish), n) for dish in infection_stream)
result_stream is a generator that you can use for whatever purpose you wish, such as:
results = list(result_stream)
The magic of generators is that, so far, we have only one active permutation at any time. The unique ones are stored in the "previous" set in no_duplicate, but that is the only place where you have a potential space problem. If that exceeds your computer's memory or your patience (after all, the algorithm is O((n^2)!)), then you'll need to write your own permutation generator so you can handle the arrangements one at a time without a long-term "remembering" device.
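One way to write such a generator without any "remembering" device is to choose the positions of the 2s directly with itertools.combinations, so each distinct grid is produced exactly once. This is a minimal sketch, not the answerer's code; the function name infection_grids and its keyword parameters are assumptions made for illustration.

```python
from itertools import combinations


def infection_grids(n, healthy=0, infected_mark=2):
    """Yield every grid of n*n cells that contains exactly n-1 infected cells.

    Each grid is built from one combination of infected positions, so no
    duplicates are ever generated and nothing has to be remembered.
    """
    servers = n * n      # number of elements in each sublist
    infected = n - 1     # number of 2s
    for positions in combinations(range(servers), infected):
        grid = [healthy] * servers
        for pos in positions:
            grid[pos] = infected_mark
        yield grid


# Example: consume the grids one at a time instead of storing them all.
count = sum(1 for _ in infection_grids(3))
print(count)
```

For n = 8 this still yields 64!/(57! 7!) grids, roughly 621 million, so the results are best consumed lazily (for example by feeding each grid to grid_infecter as it arrives) rather than collected into a list.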

Related

Asymmetric Swaps - minimising max/min difference in list through swaps

Was doing some exercises in CodeChef and came across the Asymmetric Swaps problem:
Problem
Chef has two arrays A and B of the same size N.
In one operation, Chef can:
Choose two integers i and j (1 ≤ i, j ≤ N) and swap the elements A_i and B_j.

Chef came up with a task to find the minimum possible value of (A_max - A_min) after performing the swap operation any (possibly zero) number of times.
Since Chef is busy, can you help him solve this task?
Note that A_max and A_min denote the maximum and minimum elements of the array A respectively.
I tried the logic below. It fails for some test cases, but I have no access to those test cases, so I cannot tell where the code produces the wrong output.
T = int(input())
for _ in range(T):
    arraySize = int(input())
    A = list(map(int, input().split()))
    B = list(map(int, input().split()))
    sortedList = sorted(A+B)
    minLower = sortedList[arraySize-1] - sortedList[0] # First half of the sortedList
    minUpper = sortedList[(arraySize*2)-1] - sortedList[arraySize] # Second half of the sortedList
    print(min(minLower,minUpper))
I looked at some submitted answers but did not understand the logic behind them. Can someone point out what I am missing?

The approach to sort the input into one list is the right one. But it is not enough to look at the left and the right half of that sorted list.
It could well be that there is another sublist of length 𝑁 that has its extreme values closer to each other.
Take for instance this input:
A = [1,4,5]
B = [6,11,12]
Then the sorted list is [1,4,5,6,11,12] and [4,5,6] is actually the sublist which minimises the difference between its maximum and minimum value.
So implement a loop that takes the minimum of S[i+N-1] - S[i] over every start index i from 0 to N, where S is the sorted combined list.
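A minimal sketch of that sliding-window loop, assuming both arrays are already read into Python lists. The function name min_range_after_swaps is made up for illustration.

```python
def min_range_after_swaps(a, b):
    """Smallest possible max(A) - min(A) when A may become any N elements of A+B."""
    n = len(a)
    merged = sorted(a + b)
    # There are n + 1 windows of length n in a sorted list of length 2n.
    return min(merged[i + n - 1] - merged[i] for i in range(n + 1))


print(min_range_after_swaps([1, 4, 5], [6, 11, 12]))  # window [4, 5, 6] gives 2
```

The original attempt only checks the windows starting at index 0 and index N, which is why it misses the [4,5,6] window in this example.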

How can I iterate through the result of itertools.product()?

I am trying to implement a Q-learning algorithm. My state space contains all possible combinations of the numbers 0, 1, 2 in a vector of a given length.
Now I am trying to initialize a Q-table full of zeros with as many rows as there are states. In each step I then want to run through the state space and check which state vector is the current one. But that means I have to subscript an itertools.product().
How can I do that? When I try to print the n-th vector from the product, I get an error saying that product is not subscriptable.
I tried this:
import itertools
NUMBER_OF_SECTORS = 6
state_space = itertools.product(*[[0, 1, 2]] * NUMBER_OF_SECTORS)
length = len(list(state_space)) # 729
for obs in range(length):
    print(list(state_space[obs]))
Also, is there a way to get rid of the length variable? When I define the for loop as for obs in range(len(list(state_space))), it is not executed at all.
Thank you very much

You can only iterate over an instance of product once: after that, it is consumed. list iterates over the instance in order to produce a list whose length you compute. Once you do that, the state space is gone; all you have left is the length.
You don't need to convert the state space to a list or compute its length; you can just iterate over it directly:
state_space = itertools.product([0,1,2], repeat=NUMBER_OF_SECTORS)
for state in state_space:
    print(state)
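If the states really do need to be looked up by row number (as for a Q-table), one option is to materialize the product once and keep the list. The sketch below assumes the state space is small enough to hold in memory (3^6 = 729 tuples here); NUM_ACTIONS, state_index and q_table are hypothetical names that do not appear in the question.

```python
import itertools

NUMBER_OF_SECTORS = 6
NUM_ACTIONS = 4  # hypothetical number of actions, chosen only for illustration

# Consume the iterator exactly once and keep the result.
states = list(itertools.product([0, 1, 2], repeat=NUMBER_OF_SECTORS))

# Map each state tuple to its row, so finding the current state is a dict lookup
# instead of a scan over the whole state space.
state_index = {state: row for row, state in enumerate(states)}

# One row of zeros per state.
q_table = [[0.0] * NUM_ACTIONS for _ in states]

current_state = (0, 0, 0, 0, 1, 2)
row = state_index[current_state]
print(row, states[row], q_table[row])
```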

Choosing python data structures to speed up algorithm implementation

So I'm given a large collection (roughly 200k) of lists. Each contains a subset of the numbers 0 through 27. I want to return two of the lists where the product of their lengths is greater than the product of the lengths of any other pair of lists. There's another condition, namely that the lists have no numbers in common.
There's an algorithm I found for this (can't remember the source, apologies for non-specificity of props) which exploits the fact that there are fewer total subsets of the numbers 0 through 27 than there are words in the dictionary.
The first thing I've done is looped through all the lists, found the unique subset of integers that comprise it and indexed it as a number between 0 and 1<<28. As follows:
def index_lists(lists):
    index_hash = {}
    for raw_list in lists:
        length = len(raw_list)
        index = find_index(raw_list)  # compute the subset index before looking it up
        if length > index_hash.get(index, {}).get("length", 0):
            index_hash[index] = {"list": raw_list, "length": length}
    return index_hash
This gives me the longest list, and the length of that list, for each subset that actually occurs in the given collection of lists. Naturally, not all subsets from 0 to (1<<28)-1 are necessarily included, since there's no guarantee that the supplied collection has a list for each unique subset.
What I then want, for each subset 0 through 1<<28 (all of them this time) is the longest list that contains at most that subset. This is the part that is killing me. At a high level, it should, for each subset, first check to see if that subset is contained in the index_hash. It should then compare the length of that entry in the hash (if it exists there) to the lengths stored previously in the current hash for the current subset minus one number (this is an inner loop 27 strong). The greatest of these is stored in this new hash for the current subset of the outer loop. The code right now looks like this:
def at_most_hash(index_hash):
    most_hash = {}
    for i in xrange(1<<28): # pretty sure this is a bad idea
        max_entry = index_hash.get(i)
        if max_entry:
            max_length = max_entry["length"]
            max_list = max_entry["list"]
        else:
            max_length = 0
            max_list = []
        for j in xrange(28): # again, probably not great
            subset_index = i & ~(1<<j) # gets us a pre-computed subset
            at_most_entry = most_hash.get(subset_index, {})
            at_most_length = at_most_entry.get("length",0)
            if at_most_length > max_length:
                max_length = at_most_length
                max_list = at_most_entry["list"]
        most_hash[i] = {"length": max_length, "list": max_list}
    return most_hash
This loop obviously takes several forevers to complete. I feel that I'm new enough to python that my choice of how to iterate and what data structures to use may have been completely disastrous. Not to mention the prospective memory problems from attempting to fill the dictionary. Is there perhaps a better structure or package to use as data structures? Or a better way to set up the iteration? Or maybe I can do this more sparsely?
The next part of the algorithm just cycles through all the lists we were given and takes the product of the subset's max_length and complementary subset's max length by looking them up in at_most_hash, taking the max of those.
Any suggestions here? I appreciate the patience for wading through my long-winded question and less than decent attempt at coding this up.
In theory, this is still a better approach than working with the collection of lists alone, since that approach is roughly O(200k^2) and this one is roughly O(28*2^28 + 200k), yet my implementation is holding me back.

Given that your indexes are just ints, you could save some time and space by using lists instead of dicts. I'd go further and bring in NumPy arrays. They offer compact storage representation and efficient operations that let you implicitly perform repetitive work in C, bypassing a ton of interpreter overhead.
Instead of index_hash, we start by building a NumPy array where index_array[i] is the length of the longest list whose set of elements is represented by i, or 0 if there is no such list:
import numpy
index_array = numpy.zeros(1<<28, dtype=int) # We could probably get away with dtype=int8.
for raw_list in lists:
    i = find_index(raw_list)
    index_array[i] = max(index_array[i], len(raw_list))
We then use NumPy operations to bubble up the lengths in C instead of interpreted Python. Things might get confusing from here:
for bit_index in xrange(28):
    index_array = index_array.reshape([1<<(28-bit_index), 1<<bit_index])
    numpy.maximum(index_array[::2], index_array[1::2], out=index_array[1::2])
index_array = index_array.reshape([1<<28])
Each reshape call takes a new view of the array where data in even-numbered rows corresponds to sets with the bit at bit_index clear, and data in odd-numbered rows corresponds to sets with the bit at bit_index set. The numpy.maximum call then performs the bubble-up operation for that bit. At the end, each cell index_array[i] of index_array represents the length of the longest list whose elements are a subset of set i.
We then compute the products of lengths at complementary indices:
products = index_array * index_array[::-1] # We'd probably have to adjust this part
# if we picked dtype=int8 earlier.
find where the best product is:
best_product_index = products.argmax()
and the longest lists whose elements are subsets of the set represented by best_product_index and its complement are the lists we want.
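The bubble-up step can be checked on a small universe in plain Python before committing to the full 2^28-element array. The following is a sketch under stated assumptions, not the NumPy code above: elements are integers in range(num_bits), and the name best_disjoint_product and the num_bits parameter are invented for illustration. It returns only the best product, not the two lists.

```python
def best_disjoint_product(lists, num_bits):
    """Largest len(x) * len(y) over pairs of lists that share no element."""
    size = 1 << num_bits
    longest = [0] * size  # longest[mask]: longest list whose element set is exactly mask

    for raw_list in lists:
        mask = 0
        for value in raw_list:
            mask |= 1 << value
        longest[mask] = max(longest[mask], len(raw_list))

    # Bubble up one bit at a time: afterwards longest[mask] is the longest list
    # whose element set is a subset of mask.
    for bit in range(num_bits):
        for mask in range(size):
            if mask & (1 << bit):
                longest[mask] = max(longest[mask], longest[mask ^ (1 << bit)])

    full = size - 1
    # Pair every set with its complement, which guarantees disjointness.
    return max(longest[mask] * longest[full ^ mask] for mask in range(size))


print(best_disjoint_product([[0, 1, 1], [2, 3], [0, 2], [3]], num_bits=4))
```

In pure Python this inner loop is far too slow for 28 bits; the point of the NumPy version is to run the same per-bit maximum in C.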

This is a bit too long for a comment so I will post it as an answer. One more direct way to index your subsets as integers is to use "bitsets" with each bit in the binary representation corresponding to one of the numbers.
For example, the set {0,2,3} would be represented by 2^0 + 2^2 + 2^3 = 13 and {4,5} would be represented by 2^4 + 2^5 = 48.
This would allow you to use simple lists instead of dictionaries and Python's generic hashing function.
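The question never shows its find_index helper. One plausible sketch, assuming it is meant to return exactly this bitset encoding for lists of integers between 0 and 27:

```python
def find_index(raw_list):
    """Encode the set of distinct numbers in raw_list as an integer bitset."""
    index = 0
    for value in raw_list:
        index |= 1 << value  # setting a bit twice is harmless, so repeats collapse
    return index


print(find_index([0, 2, 3]))  # 2^0 + 2^2 + 2^3
print(find_index([4, 5]))     # 2^4 + 2^5
```

With this encoding, two lists have no numbers in common exactly when the bitwise AND of their indices is 0.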

Generate a random number from a list, not including one of the numbers

I am trying to create a list of sequential integers from 0 to n, pick a random integer from that list, and then generate another random integer from the same list that is different from the one previously generated.
n = 10
a = np.arange(1,n) #Creating my initial list
for b=np.random.choice(a): #Generating my first random number
c=np.random.choice(np.arange(1,b)) or np.random.choice(np.arange(b+1,n))
I know that this won't work because my for loop is pretty iffy. I haven't used python in a long time and I am just starting a project and getting myself back into it is proving to be a little tricky!

I think the procedure you are trying to perform is random sampling without replacement.
Let's say you want to pick k numbers:
import numpy as np
n = 10
k = 3
a = np.arange(1,n) #Creating my initial list
numbers = np.random.choice(a, k, replace=False)

Another answer:
1) Generate your initial list.
2) Shuffle the list. If there is no library function to do it, then use the Fisher-Yates shuffle. Hint: there is a big time saver here.
3) Pick the first number from the shuffled list. This is your initial number.
4) Pick the second number from the shuffled list. This is your second number that is both (almost) random and not the same as the first number.
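The four steps above can be sketched with the standard library's random.shuffle. The range 0 to n inclusive follows the question's description, and the function name pick_two_distinct is an assumption for illustration; it needs n >= 1 so that two distinct values exist.

```python
import random


def pick_two_distinct(n):
    """Return two different random integers from 0..n using a shuffle."""
    numbers = list(range(n + 1))   # 1) generate the initial list
    random.shuffle(numbers)        # 2) shuffle it in place
    return numbers[0], numbers[1]  # 3) and 4) take the first two entries


first, second = pick_two_distinct(10)
print(first, second)
```

Shuffling the whole list does more work than necessary when only two values are needed, which is presumably the time saver the hint refers to: a Fisher-Yates shuffle can stop after its first two swaps.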

Help on method sample in module random:
sample(self, population, k) method of random.Random instance
Chooses k unique random elements from a population sequence.
Returns a new list containing elements from the population while
leaving the original population unchanged. The resulting list is
in selection order so that all sub-slices will also be valid random
samples. This allows raffle winners (the sample) to be partitioned
into grand prize and second place winners (the subslices).
Members of the population need not be hashable or unique. If the
population contains repeats, then each occurrence is a possible
selection in the sample.
To choose a sample in a range of integers, use xrange as an argument.
This is especially fast and space efficient for sampling from a
large population: sample(xrange(10000000), 60)
Then to pick k random non-repeated numbers in range [0, n] you can do this:
import random
result_list = random.sample(xrange(n + 1), k)

4-sum algorithm in Python [duplicate]

This question already has answers here:
Quadratic algorithm for 4-SUM
(3 answers)
Closed 9 years ago.
I am trying to find whether a list has 4 elements that sum to 0 (and later find what those elements are). I'm trying to make a solution based off the even k algorithm described at https://cs.stackexchange.com/questions/2973/generalised-3sum-k-sum-problem.
I get this code in Python using combinations from the standard library
from itertools import combinations

def foursum(arr):
    seen = {sum(subset) for subset in combinations(arr,2)}
    return any(-x in seen for x in seen)
But this fails for input like [-1, 1, 2, 3]. It fails because it matches the sum (-1+1) with itself. I think this problem will get even worse when I want to find the elements because you can separate a set of 4 distinct items into 2 sets of 2 items in 6 ways: {1,4}+{-2,-3}, {1,-2}+{4,-3} etc etc.
How can I make an algorithm that correctly returns all solutions avoiding this problem?
EDIT: I should have added that I want to use as efficient algorithm as possible. O(len(arr)^4) is too slow for my task...

This works.
import itertools
def foursum(arr):
    seen = {}
    for i in xrange(len(arr)):
        for j in xrange(i+1,len(arr)):
            if arr[i]+arr[j] in seen: seen[arr[i]+arr[j]].add((i,j))
            else: seen[arr[i]+arr[j]] = {(i,j)}
    for key in seen:
        if -key in seen:
            for (i,j) in seen[key]:
                for (p,q) in seen[-key]:
                    if i != p and i != q and j != p and j != q:
                        return True
    return False
EDIT
This can probably be made more Pythonic; I don't know enough Python.

It is normal for the 4SUM problem to permit input elements to be used multiple times. For instance, given the input (2 3 1 0 -4 -1), valid solutions are (3 1 0 -4) and (0 0 0 0).
The basic algorithm is O(n^2): Use two nested loops, each running over all the items in the input, to form all sums of pairs, storing the sums and their components in some kind of dictionary (hash table, AVL tree). Then scan the pair-sums, reporting any quadruple for which the negative of the pair-sum is also present in the dictionary.
If you insist on not duplicating input elements, you can modify the algorithm slightly. When computing the two nested loops, start the second loop beyond the current index of the first loop, so no input elements are taken twice. Then, when scanning the dictionary, reject any quadruples that include duplicates.
I discuss this problem at my blog, where you will see solutions in multiple languages, including Python.

First note that the problem is O(n^4) in worst case, since the output size might be of O(n^4) (you are looking for finding all solutions, not only the binary problem).
Proof:
Take the example [-1]*(n/2) + [1]*(n/2). You need to choose two instances of -1 without repeats, which gives (n/2)*(n/2-1)/2 possibilities, and two instances of 1 without repeats, which gives another (n/2)*(n/2-1)/2 possibilities. The total is (n/2)*(n/2-1)*(n/2)*(n/2-1)/4, which is in Theta(n^4).
Now that we understand we cannot achieve an O(n^2 log n) worst case, we can get to the following algorithm (pseudo-code), which should scale closer to O(n^2 log n) for "good" cases (few identical sums) and is O(n^4) in the worst case (as expected).
Pseudo-code:
subsets <- all subsets of size 2 of indices (not values!)
l <- empty list
for each s in subsets:
    # appending a triplet of (sum, idx1, idx2):
    l.append((arr[s[0]] + arr[s[1]], s[0], s[1]))
sort l by first element (sum) in each tuple
for each x in l:
    binary search l for -x[0]  # for the sum
    for each element y that satisfies the above:
        if x[1] != y[1] and x[2] != y[1] and x[1] != y[2] and x[2] != y[2]:
            yield arr[x[1]], arr[x[2]], arr[y[1]], arr[y[2]]
Probably a pythonic way to do the above will be more elegant and readable, but I am not a python expert I am afraid.
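One possible Python rendering of this idea is sketched below. It swaps the sort and binary search for a dictionary keyed by pair sum, which is a different lookup technique from the pseudo-code, and it reports each index quadruple once instead of once per pairing. The function name four_sum_indices is an assumption.

```python
from collections import defaultdict
from itertools import combinations


def four_sum_indices(arr):
    """Yield sorted index quadruples whose four values sum to zero, each once."""
    pairs_by_sum = defaultdict(list)
    for i, j in combinations(range(len(arr)), 2):  # index pairs, not values
        pairs_by_sum[arr[i] + arr[j]].append((i, j))

    reported = set()
    for total, pairs in pairs_by_sum.items():
        # .get avoids creating new keys while the dictionary is being iterated.
        for p, q in pairs_by_sum.get(-total, ()):
            for i, j in pairs:
                quad = tuple(sorted((i, j, p, q)))
                # Reject pairings that reuse an index, and quadruples already seen
                # through one of the other ways of splitting them into two pairs.
                if len(set(quad)) == 4 and quad not in reported:
                    reported.add(quad)
                    yield quad


print(list(four_sum_indices([-1, 1, 2, 3])))   # no solution: (-1 + 1) may not pair with itself
print(list(four_sum_indices([1, 4, -2, -3])))
```

As the proof above shows, the number of solutions alone can be Theta(n^4), so this sketch has the same worst case; the reported set also grows with the number of solutions.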

EDIT: Of course, the algorithm must take at least as much time as the size of the solution set.
If the number of possible solutions is not large compared to n, then
a suggested solution in O(N^3) is:
Find pair-wise sums of all elements and build a NxN matrix of the sums.
For each element in this matrix, build a struct that would have sumValue, row and column as it fields.
Sort all these N^2 struct elements in a 1D array. (in O(N^2 logN) time).
For each element x in this array, conduct a binary search for its partner y such that x + y = 0 (O(logn) per search).
Now if you find a partner y, check whether its row or column field matches that of the element x. If so, iterate sequentially in both directions until there are no more such y.
If you find some y's that do not have a common row or column with x, then increment the count (or print the solution).
This iteration can at most take 2N steps because the length of rows and columns is N.
Hence the total order of complexity for this algorithm shall be O(N^2 * N) = O(N^3)
