Generate permutations of list by swapping specific elements - python

I'm trying to write a function that generates all possible configurations of a list by swapping certain allowable pairs of elements.
For example, if we have the list:
lst = [1, 2, 3, 4, 5]
And we only allow the swapping of the following pairs of elements:
pairs = [[0, 2], [4, 1]]
i.e., we can only swap the 0th element of the list with the 2nd, and the 4th element with the 1st (there can be any number of allowed pairs of swaps).
I would like the function to return the number of distinct configurations of the list given the allowable swaps.
Since I'm planning on running this for large lists and many allowable swaps, it would be preferable for the function to be as efficient as possible.
I've found examples that generate permutations by swapping all the elements, two at a time, but I can't find a way to specify certain pairs of allowable swaps.

You've been lured off other productive paths by the common term "swap". Switch your attack. Instead, note that you need the product of [a[0], a[2]] and [a[1], a[4]] to get all the possible permutations. You take each of these products (four of them) and distribute the elements in your result sets in the proper sequence. It will look vaguely like this ... I'm using Python as pseudo-code, to some extent.
seq = itertools.product([a[0], a[2]], [a[1], a[4]])
for soln in seq:
    # Each solution "soln" picks the element that lands in position 0
    # and the element that lands in position 1; the unchosen partner of
    # each pair fills position 2 and position 4 respectively, and a[3]
    # never moves.
    b = [soln[0],
         soln[1],
         a[2] if soln[0] == a[0] else a[0],
         a[3],
         a[1] if soln[1] == a[4] else a[4]]
Can you take it from there? That's the idea; I'll leave you to generalize the algorithm.
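To make the generalization concrete, here is a rough sketch of one way it could go, assuming the allowed pairs are disjoint (no index appears in two pairs); count_configurations is just an illustrative name:
import itertools

def count_configurations(lst, pairs):
    # Each factor is the two possible orderings of one swappable pair.
    factors = [[(lst[i], lst[j]), (lst[j], lst[i])] for i, j in pairs]
    configs = set()
    for choice in itertools.product(*factors):
        b = list(lst)
        for (i, j), (x, y) in zip(pairs, choice):
            b[i], b[j] = x, y
        configs.add(tuple(b))
    return len(configs)

print(count_configurations([1, 2, 3, 4, 5], [[0, 2], [4, 1]]))  # 4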

Is there any way I can change my code to make python run it faster?

I created a function max_points that compares two argument strings and returns a certain value in relation to a separately given criterion that involves summing up the values ga, la, ldif, and lgap. It also returns the list of combinations of the strings that reach this value. The strings s and t go through a process of running through their respective anagrams with up to n gaps (in this case, the gap is '_'). Here is an example of what the function should return:
In [3]: max_points('AT_', 'A_T', 2, 5, 1, 0, 2)
Out[3]: (16, [['_A_T_', '_A_T_'],
              ['A__T_', 'A__T_'],
              ['A_T__', 'A_T__']])
The code I have right now is this:
def max_points(s, t, ga, la, ldif, lgap, n=1):
    lst_s = generate_n_gaps(s, n)
    lst_t = generate_n_gaps(t, n)
    point_max = -9999
    for i in lst_s:
        for j in lst_t:
            if len(i) == len(j):
                point = pointage(i, j, ga, la, ldif, lgap)
                if point >= point_max:
                    point_max = point
    ultimate = []
    for i in lst_s:
        for j in lst_t:
            if len(i) == len(j) and pointage(i, j, ga, la, ldif, lgap) == point_max:
                specific = []
                specific.append(i)
                specific.append(j)
                ultimate.append(specific)
    return point_max, ultimate
The other functions, generate_n_gaps and pointage (not shown) work as follows:
generate_n_gaps: Returns a list of all the combinations of the argument strings with up to n gaps.
pointage: Compares only the two argument strings s and t (not all their combinations) and returns an integer value that goes through the same criterion as the max_points function.
You can see that, if the argument strings s and t are longer than 4 or 5 characters and n is larger than 2, the function ends up producing quite a large number of lists. I suspect that is why it takes longer than 2 or 3 seconds for some inputs. Is there any way I can make my code for this specific function faster (<1 sec of runtime)? Or might the problem lie in the other functions it uses, which are not shown here?
One obvious issue here is that you're looping through all i,j combinations twice: once to calculate the maximum value, and then a second time to return all (i,j) combinations that achieve this maximum.
It would probably be more efficient to do this in a single pass. Something like:
point_max = -9999  # or better yet, -math.inf
ultimate = []
for i in lst_s:
    for j in lst_t:
        if len(i) == len(j):
            point = pointage(i, j, ga, la, ldif, lgap)
            if point > point_max:
                point_max = point
                ultimate = []
            if point == point_max:
                specific = []
                specific.append(i)
                specific.append(j)
                ultimate.append(specific)
This should approximately halve your run-time.
If i and j have many different possible lengths, you might also be able to achieve savings by blocking up the comparisons. Instead of simply looping through lst_s and lst_t, split these lists up by length (use a dict structure keyed by length, with each value being the subset of lst_s or lst_t having that length). Then iterate through all possible lengths, checking only the s- and t-values of that length against one another. This is a bit more work to set up, but may be useful depending on how many comparisons it saves you.
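As a rough sketch of that length-blocking idea (lst_s, lst_t and pointage are the names from your code; everything else here is just illustrative):
from collections import defaultdict

# Group the candidate strings by length so only same-length strings are compared.
s_by_len = defaultdict(list)
for i in lst_s:
    s_by_len[len(i)].append(i)
t_by_len = defaultdict(list)
for j in lst_t:
    t_by_len[len(j)].append(j)

for length, group_s in s_by_len.items():
    for i in group_s:
        for j in t_by_len.get(length, ()):
            point = pointage(i, j, ga, la, ldif, lgap)
            # ... same max-tracking and collection logic as above ...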
You haven't included the code for pointage, but I would be looking hard at that to see if there are any possible savings there; you're going to be calling it a lot, so you want to make it as efficient as possible.
More advanced options include parallelisation, and making use of specific information about the "score" function to do more precise blocking of your score calls. But try the simple stuff first and see if that does the job.

Given a set t of tuples containing elements from the set S, what is the most efficient way to build another set whose members are not contained in t?

For example, suppose I had an (n,2) dimensional tensor t whose elements are all from the set S containing random integers. I want to build another tensor d with size (m,2) where individual elements in each tuple are from S, but the whole tuples do not occur in t.
E.g.
S = [0, 1, 2, 3, 7]
t = [[0, 1],
     [7, 3],
     [3, 1]]
d = some_algorithm(S, t)
# d = [[2, 1],
#      [3, 2],
#      [7, 2]]
What is the most efficient way to do this in python? Preferably with pytorch or numpy, but I can work around general solutions.
In my naive attempt, I just use
d = np.random.choice(S,(m,2))
non_dupes = [i not in t for i in d]
d = d[non_dupes]
But both t and S are incredibly large, and this takes an enormous amount of time (not to mention, rarely results in a (m,2) array). I feel like there has to be some fancy tensor thing I can do to achieve this, or maybe making a large hash map of the values in t so checking for membership in t is O(1), but this produces the same issue just with memory. Is there a more efficient way?
An approximate solution is also okay.
My naive attempt would be a base-transformation function that reduces the problem to an integer-set problem.
definitions and assumptions:
let S be a set (unique elements)
let L be the number of elements in S
let t be a set of M-tuples with elements from S
the original order of the elements in t is irrelevant
let I(x) be the index function of the element x in S
let x[n] be the n-th tuple-member of an element of t
let f(x) be our base-transform function (and f^-1 its inverse)
Since S is a set, we can write each element of t as an M-digit number in base L, using the indices of the elements of S as digits.
for M=2 the transformation looks like
f(x) = I(x[1])*L^1 + I(x[0])*L^0
f^-1(x) is also rather trivial: x mod L gives back the index of the least significant digit; take floor(x/L) and repeat until all indices are extracted, then look up the values in S and construct the tuple.
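A minimal sketch of f and f^-1 under these definitions, assuming S is kept as an ordered list of unique elements so that S.index plays the role of I:
def f(tup, S):
    L = len(S)
    # least significant digit first: I(x[0])*L^0 + I(x[1])*L^1 + ...
    return sum(S.index(x) * L**n for n, x in enumerate(tup))

def f_inv(num, S, M):
    L = len(S)
    out = []
    for _ in range(M):
        out.append(S[num % L])  # value of the least significant digit
        num //= L
    return tuple(out)

# e.g. with S = [0, 1, 2, 3, 7]: f((3, 1), S) == 8 and f_inv(8, S, 2) == (3, 1)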
Since now you can represent t as an integer set (read: hash table), calculating the inverse set d becomes rather trivial:
loop from 0 to L^M - 1 and ask your hash table whether the element is already in t or d.
If the size of S is too big, you can also just draw random numbers and check them against the hash table to build a subset of the inverse of t.
does this help you?
If |t| + |d| << |S|^2, then the probability that a randomly chosen tuple has already been chosen (in a single iteration) is relatively small.
To be more exact, if (|t|+|d|) / |S|^2 = C for some constant C < 1, then if you redraw an element until it is a "new" one, the expected number of redraws needed is 1/(1-C).
This means that by redrawing elements until each one is new, you spend O((1/(1-C)) * |d|) draws (on average) to produce d, which is O(|d|) if C is indeed a constant.
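A minimal sketch of that redraw loop, assuming C stays comfortably below 1 and using a hash set for the "seen" check (one of the options discussed next); sample_not_in is just an illustrative name:
import random

def sample_not_in(S, t, m):
    seen = set(map(tuple, t))        # forbidden tuples, O(1) membership tests
    d = []
    while len(d) < m:
        cand = (random.choice(S), random.choice(S))
        if cand not in seen:         # expected ~1/(1-C) draws per new element
            seen.add(cand)
            d.append(cand)
    return d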
Checking whether an element has already been "seen" can be done in several ways:
Keeping hash sets of t and d. This requires extra space, but each lookup takes constant O(1) time. You could also use a Bloom filter instead of storing the actual elements you have already seen; it will make some errors, saying an element has been "seen" when it has not, but never the other way around, so you will still get all elements of d as unique.
Sorting t in place and using binary search. This adds O(|t| log|t|) pre-processing and O(log|t|) per lookup, but requires no additional space (other than where you store d).
If, in fact, |d| + |t| is very close to |S|^2, then an O(|S|^2)-time solution could be to run a Fisher-Yates shuffle over the available choices and take the first |d| elements that do not appear in t.
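For that dense case, a rough sketch (random.shuffle is a Fisher-Yates shuffle under the hood; this is only sensible when all |S|^2 tuples fit in memory, and S, t, m are the names from the question):
import itertools
import random

forbidden = set(map(tuple, t))
choices = list(itertools.product(S, repeat=2))   # every possible pair
random.shuffle(choices)
d = [pair for pair in choices if pair not in forbidden][:m]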

Fast way of getting all sublists of a list

I recently bombed a coding interview because I wasn't able to generate all possible sublists of a list fast enough. More specifically: (using python)
We're given a list of string numbers ["1", "3", "2", ...]
How many sublists of this list of size 6, concatenated, are divisible by 16?
Note that though the elements in the original list may not be unique, you should treat them as unique when constructing your sublists. E.g. for [1, 1, 1], a sublist of the first two 1's and a sublist of the last two 1's are different sublists.
Using itertools.combinations I was able to generate all my sublists fast enough, but then looping through all those sublists to determine which ones were "divisible" by 16 was too slow.
So is there a way to create the sublists at the same speed as (or faster than) itertools.combinations, checking each sublist as I'm creating it to see if it's divisible by 16?
Any insight would be very appreciated!
Sort the list.
Find the smallest sublist (by length) having a sum of at least 16 and divisible by it (say its length is s).
Then check all sublist sizes from s to 6.
That should reduce the number of sublists to check, since the longer the sublist, the fewer sublists of that length there are.

Invertable Cartesian Product Elements/Index Translation Function

I have a problem where I need to identify the elements found at an indexed position within
the Cartesian product of a series of lists but also, the inverse, i.e. identify the indexed position from a unique combination of elements from a series of lists.
I've written the following code which performs the task reasonably well:
import numpy as np

def index_from_combination(meta_list_shape, index_combination):
    list_product = np.prod(meta_list_shape)
    m_factor = np.cumprod([[l] for e, l in enumerate([1] + meta_list_shape)])[0:len(meta_list_shape)]
    return np.sum(index_combination * m_factor, axis=None)

def combination_at_index(meta_list_shape, index):
    il = len(meta_list_shape) - 1
    list_product = np.prod(meta_list_shape)
    assert index < list_product
    m_factor = np.cumprod([[l] for e, l in enumerate([1] + meta_list_shape)])[0:len(meta_list_shape)][::-1]
    idxl = []
    for e, m in enumerate(m_factor):
        if m <= index:
            idxl.append(index // m)
            index = index % m
        else:
            idxl.append(0)
    return idxl[::-1]
e.g.
index_from_combination([3,2],[2,1])
>> 5
combination_at_index([3,2],5)
>> [2,1]
Where [3,2] describes a series of two lists, one containing 3 elements, and the other containing 2 elements. The combination [2,1] denotes a permutation consisting of the 3rd element (zero-indexing) from the 1st list, and the 2nd element (again zero-indexed) from the second list.
...if a little clunkily (and, to save space, one that ignores the actual contents of the lists, and instead works with indexes used elsewhere to fetch the contents from those lists - that's not important here though).
N.B. What is important is that my functions mirror one another such that:
F(a)==b and G(b)==a
i.e. they are the inverse of one another.
From the linked question, it turns out I can replace the second function with the one-liner:
list(itertools.product(['A','B','C'],['P','Q','R'],['X','Y']))[index]
Which will return the unique combination of values for a supplied index integer (though with some question-mark in my mind about how much of that list is instantiated in memory - but again, that's not necessarily important right now).
What I'm asking is, itertools appears to have been built with these types of problems in mind - is there an equally neat one-line inverse to the itertools.product function that, given a combination, e.g. ['A','Q','Y'] will return an integer describing that combination's position within the cartesian product, such that this integer, if fed into the itertools.product function will return the original combination?
Imagine those combinations as two-dimensional X-Y coordinates and use subscript to linear-index conversion and vice versa. Thus, use NumPy's built-ins np.ravel_multi_index to get the linear index and np.unravel_index to get the subscript indices; these become your index_from_combination and combination_at_index respectively.
It's a simple translation and doesn't generate any combination whatsoever, so should be a breeze.
Sample run to make things clearer -
In [861]: np.ravel_multi_index((2,1),(3,2))
Out[861]: 5
In [862]: np.unravel_index(5, (3,2))
Out[862]: (2, 1)
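Tying this back to the lists in the question, the per-list subscripts can come from a plain .index lookup (shown here purely for illustration):
import numpy as np

lists = [['A', 'B', 'C'], ['P', 'Q', 'R'], ['X', 'Y']]
shape = [len(l) for l in lists]

combo = ['A', 'Q', 'Y']
subs = [l.index(v) for l, v in zip(lists, combo)]   # [0, 1, 1]
idx = int(np.ravel_multi_index(subs, shape))        # 3

# ...and back again:
back = [l[i] for l, i in zip(lists, np.unravel_index(idx, shape))]
# back == ['A', 'Q', 'Y'], the same combination that list(itertools.product(*lists))[idx] gives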
The math is simple enough to implement yourself if you don't want the NumPy dependency for some reason -
def index_from_combination(a, b):
    return b[0]*a[1] + b[1]

def combination_at_index(a, b):
    d = b//a[1]
    r = b - a[1]*d
    return d, r
Sample run -
In [881]: index_from_combination([3,2],[2,1])
Out[881]: 5
In [882]: combination_at_index([3,2],5)
Out[882]: (2, 1)

maintaining hierarchically sorted lists in python

I'm not sure if 'hierarchical' is the correct way to label this problem, but I have a series of lists of integers that I'm intending to keep in a 2D numpy array, and I need to keep them sorted in the following way:
array[0,:] = [1, 1, 1, 1, 2, 2, 2, 2, ...]
array[1,:] = [1, 1, 2, 2, 1, 1, 2, 2, ...]
array[2,:] = [1, 2, 1, 2, 1, 2, 1, 2, ...]
...
...
array[n,:] = [...]
So the first list is sorted, then the second list is broken into subsections of elements which all have the same value in the first list and those subsections are sorted, and so on down all the lists.
Initially each list will contain only one integer, and I'll then receive new columns that I need to insert into the array in such a way that it remains sorted as discussed above.
The purpose of keeping the lists in this order is that if I'm given a new column of integers I need to check whether an exact copy of that column exists in the array or not as efficiently as possible, and I assume this ordering will help me do it. It may be that there is a better way to make that check than keeping the lists like this - if you have thoughts about that please mention them!
I assume the correct position for a new column can be found by a series of binary searches but my attempts have been messy - any thoughts on doing this in a tidy and efficient way?
thanks!
If I understand your problem correctly, you have a bunch of sequences of numbers that you need to process, but you need to be able to tell if the latest one is a duplicate of one of the sequences you've processed before. Currently you're trying to insert the new sequences as columns in a numpy array, but that's awkward since numpy is really best with fixed-sized arrays (concatenating or inserting things is always going to be slow).
A much better data structure for your needs is a set. Membership tests and the addition of new items on a set are both very fast (amortized O(1) time complexity). The only limitation is that a set's items must be hashable (which is true for tuples, but not for lists or numpy arrays).
Here's the outline of some code you might be able to use:
seen = set()
for seq in sequences:
    tup = tuple(seq)  # you only need to make a tuple if seq is not already hashable
    if tup not in seen:
        seen.add(tup)
        # do whatever you want with seq here, it has not been seen before
    else:
        pass  # if you want to do something with duplicated sequences, do it here
You can also look at the unique_everseen recipe in the itertools documentation, which does basically the same as the above, but as a well-optimized generator function.
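If you'd rather not copy the recipe yourself, the same function also ships in the third-party more-itertools package; a minimal usage sketch (process is a hypothetical handler for new sequences):
from more_itertools import unique_everseen  # pip install more-itertools

for tup in unique_everseen(map(tuple, sequences)):
    # tup is a tuple version of a sequence that has not been seen before
    process(tup)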
