Adding conditions to Python function `set_partitions`

I want to take a given group of elements and partition them the way the function set_partitions does, but with a condition imposed on the resulting lists. I will try to explain my problem in a simple, understandable way. Here is an example of how this function works:
from more_itertools import set_partitions
a = [1, 2, 3]
print(list(set_partitions(a)))
It returns the possible groupings:
[[[1, 2, 3]],
[[1], [2, 3]],
[[1, 2], [3]],
[[2], [1, 3]],
[[1], [2], [3]]]
I want to add a constraint to this function (for example: each group created must sum to an even number). I could simply generate all the partitions and apply the constraint afterwards, but I need to do this for lists with a huge number of elements, and my computer dies before finishing. If I apply the constraint directly inside set_partitions, the problem becomes tractable because the constraint is very strong: for example, it reduces a total of 4213597 million possible combinations to about 12k valid ones (I cannot go further because of memory problems), so applying the condition during generation would solve the problem.
To do this I need to understand the function set_partitions. It can be used by copying the following code alone, with no auxiliary functions and nothing to import:
def set_partitions(iterable, k=None):
    L = list(iterable)
    n = len(L)

    def set_partitions_helper(L, k):
        # Yields every partition of L into exactly k non-empty groups.
        n = len(L)
        if k == 1:
            yield [L]
        elif n == k:
            yield [[s] for s in L]
        else:
            # Split off the first element: e is L[0], M is a list of the rest.
            e, *M = L
            # Case 1: e forms a group of its own, next to a (k - 1)-partition of M.
            for p in set_partitions_helper(M, k - 1):
                yield [[e], *p]
            # Case 2: e is prepended to each group of a k-partition of M in turn.
            for p in set_partitions_helper(M, k):
                for i in range(len(p)):
                    yield p[:i] + [[e] + p[i]] + p[i + 1 :]

    for k in range(1, n + 1):
        yield from set_partitions_helper(L, k)
I have programming experience and this is my job, but I have not reached this level yet, so there are some things in the function that I do not understand, for example what this is doing:
e, *M = L
or the lines that follow it in the helper function.
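For reference, e, *M = L uses Python's extended iterable unpacking: it binds the first element of L to e and the remaining elements, as a list, to M. A minimal demonstration:

L = [1, 2, 3]
e, *M = L
print(e)  # 1       (the first element)
print(M)  # [2, 3]  (a list of the remaining elements)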
I need some help understanding how this function works so I can find where the condition can be placed.
Thanks a lot!
Addition explaining my specific case:
The elements I'm trying to group into partitions are matrices in a tensor product, i.e. chains of a fixed length.
An element will be a chain of this style, which we can represent as a string:
'xz11y'
I want the elements of each subgroup to commute. '1' commutes with everything; 'x'/'y'/'z' each commute only with themselves (and the identity), so we only need to compare the chains position by position. For example, if we have the 3 matrices
['1z', 'zz', 'x1']
a valid grouping is
'1z', 'zz' | 'x1'
because '1z' commutes with 'zz'.
An invalid grouping is
'1z' | 'zz', 'x1'
because 'zz' and 'x1' have an x and a z in the same position, and these operators do not commute.
This is simplified: I actually have a Hamiltonian of about 19 or more separate operators of this type, and I need to find the groupings that preserve this commutation.
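For concreteness, a minimal sketch of that commutation test as a helper function (the name commutes is my own, not from any library):

def commutes(p, q):
    # Under the rule above, two chains commute when, at every position,
    # the operators are equal or at least one of them is the identity '1'.
    return all(a == b or '1' in (a, b) for a, b in zip(p, q))

print(commutes('1z', 'zz'))  # True  -> they can share a group
print(commutes('zz', 'x1'))  # False -> x and z clash at the first position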

Related

How to modify this power set function to only include 'valid' subsets?

I am using a binomial tree as a list with 2^T - 1 'nodes' and want to create a set of subsets that work within some given criteria (outlined below) on the elements of the list. Right now, I use the following code to generate a tree
def gen_nodes(T):
    nodes = []
    for t in range(T):
        for i in range(2**t):
            nodes += [[t, 1 + i]]
    return nodes
For instance, for T = 1, we get the root
gen_nodes(1) = [[0,1]],
but for T = 2 and T = 3, we get
gen_nodes(2) = [[0,1], [1,1], [1,2]]
gen_nodes(3) = [[0,1], [1,1], [1,2], [2,1], [2,2], [2,3], [2,4]],
et cetera. Right now, I'm using a powerset function courtesy of this wonderful contributor,
def powerset(s):
    x = len(s)
    masks = [1 << i for i in range(x)]
    for i in range(1 << x):
        yield [ss for mask, ss in zip(masks, s) if i & mask]
This has worked great for me, but as I'm sure you've caught at this point, the powerset gets entirely too large, with a time complexity of something like O(2^(2^T)). Initially, I was going to create the entire set by brute force and then apply constraints to pick out the valid subsets afterwards, but it seems like I'm going to run into overflow problems if I don't build those constraints into the powerset function itself.
Basically, I only want the lists e within the output of ls = list(powerset(gen_nodes(T))) such that for every pair [t, i] in e, either [t - 1, i] or [t - 1, i - 1] is also in e.
Returning to the binary tree analogy, this basically says that [t,i] can be in e only if [t-1,i] OR [t-1,i-1] is in e; in other words, if [t,i] is in e, then there must be at least one "path" from [0,1] to [t,i] where each node on the path is also in e. I suspect this will condense the output immensely, but I'm unsure of how to implement it. I think I might have to forgo the powerset function, but I'm not sure how to code it in that case, and would therefore appreciate any help I can get.
EDIT: I should include the desired output, as requested in the comments. Additionally, I've included the function that has been 'working' for me thus far, but it's horribly inefficient. First, let pset be the function that solves this problem, and write pg(i) = pset(gen_nodes(i)) for brevity. Then
pg(1) = [[[0, 1]]],
pg(2) = [[[0, 1]], [[0, 1], [1, 1]], [[0, 1], [1, 2]], [[0, 1], [1, 1], [1, 2]]]
Unfortunately, this set still grows very fast (pg(3) is 17 lists of length up to 6 pairs, pg(4) is 97 lists of length up to 10 pairs, etc), so I can't share much more on this post. However, I did develop a function that works, but seems to be horribly inefficient (pg(6) takes half a second, and then pg(7) takes 4 minutes to complete). It is attached below:
import time

def pset(lst):
    pw_set = [[]]
    start_time = time.time()
    for i in range(0, len(lst)):
        for j in range(0, len(pw_set)):
            ele = pw_set[j].copy()
            if lst[i] == [0, 1]:
                ele = ele + [lst[i]]
                pw_set = pw_set + [ele]
            else:
                if [lst[i][0] - 1, lst[i][1]] in ele or [lst[i][0] - 1, lst[i][1] - 1] in ele:
                    ele = ele + [lst[i]]
                    pw_set = pw_set + [ele]
    print("--- %s seconds ---" % (time.time() - start_time))
    return pw_set[1:]
Here, I just checked whether the 'node' being added had at least one of the nodes preceding it in the set: if not, it was skipped. I checked up to pg(3) and the output is as desired, so I'm thinking it's working, just inefficient. Thus, I've (seemingly) solved the memory overflow problem; now I just need to make this efficient.
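A sketch of one possible direction (the function pset_by_level and its structure are illustrative, not from the thread): grow valid sets level by level, so that invalid subsets are never generated at all, since a node at level t may only be chosen if one of its parents at level t-1 already was:

from itertools import combinations

def pset_by_level(T):
    # Recursively extend a valid set one tree level at a time.
    # The parent condition then holds by construction.
    def grow(level, current):
        if level == T:
            if current:          # drop the empty set, as pset does
                yield current
            return
        admissible = [[level, i] for i in range(1, 2**level + 1)
                      if level == 0
                      or [level - 1, i] in current
                      or [level - 1, i - 1] in current]
        for r in range(len(admissible) + 1):
            for extra in combinations(admissible, r):
                yield from grow(level + 1, current + list(extra))
    yield from grow(0, [])

For T = 2 this reproduces the pg(2) output above; how much it actually saves on larger T would need measuring, since the number of valid sets itself still grows quickly.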

Splitting a list into 2 parts, as equal in sum as possible

I'm trying to wrap my head around this whole thing and I can't seem to figure it out. Basically, I have a list of ints whose values add up to 15. I want to split the list into 2 parts while making each part's total sum as close as possible to the other's. Sorry if I'm not explaining this well.
Example:
list = [4,1,8,6]
I want to achieve something like this:
list = [[8, 1], [6, 4]]
adding the first list up equals 9, and the other equals 10. That's perfect for what I want as they are as close as possible.
What I have now:
my_list = [4,1,8,6]
total_list_sum = 15
def divide_chunks(l, n):
    # looping till length l
    for i in range(0, len(l), n):
        yield l[i:i + n]

n = 2
x = list(divide_chunks(my_list, n))
print(x)
But, that just splits it up into 2 parts.
Any help would be appreciated!
You could use a recursive algorithm and "brute force" partitioning of the list, starting with a target difference of zero and progressively increasing your tolerance for the difference between the two lists:
def sumSplit(left, right=[], difference=0):
    sumLeft, sumRight = sum(left), sum(right)
    # stop recursion if left is smaller than right
    if sumLeft < sumRight or len(left) < len(right):
        return
    # return a solution if the sums match the tolerance target
    if sumLeft - sumRight == difference:
        return left, right, difference
    # recurse, brutally attempting to move each item to the right
    for i, value in enumerate(left):
        solution = sumSplit(left[:i] + left[i+1:], right + [value], difference)
        if solution:
            return solution
    if right or difference > 0:
        return
    # allow for an imperfect split (i.e. a larger difference) ...
    for targetDiff in range(1, sumLeft - min(left) + 1):
        solution = sumSplit(left, right, targetDiff)
        if solution:
            return solution

# sumSplit returns the two lists and the difference between their sums
print(sumSplit([4,1,8,6]))       # ([4, 6], [1, 8], 1)
print(sumSplit([5,3,2,2,2,1]))   # ([3, 2, 2, 1], [5, 2], 1)
print(sumSplit([1,2,3,4,6]))     # ([1, 3, 4], [2, 6], 0)
Use itertools.combinations. First, let's define some functions:
def difference(sublist1, sublist2):
    return abs(sum(sublist1) - sum(sublist2))

def complement(sublist, my_list):
    complement = my_list[:]
    for x in sublist:
        complement.remove(x)
    return complement
The function difference calculates the "distance" between lists, i.e., how similar the sums of the two lists are. complement returns the elements of my_list that are not in sublist.
Finally, what you are looking for:
from itertools import combinations

def divide(my_list):
    lower_difference = sum(my_list) + 1
    for i in range(1, int(len(my_list)/2) + 1):
        for partition in combinations(my_list, i):
            partition = list(partition)
            remainder = complement(partition, my_list)
            diff = difference(partition, remainder)
            if diff < lower_difference:
                lower_difference = diff
                solution = [partition, remainder]
    return solution

test1 = [4,1,8,6]
print(divide(test1))  # [[4, 6], [1, 8]]

test2 = [5,3,2,2,2,1]
print(divide(test2))  # [[5, 3], [2, 2, 2, 1]]
Basically, it tries with every possible division of sublists and returns the one with the minimum "distance".
If you want to make it a little bit faster, you could return the first combination whose difference is 0.
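For instance, a minimal sketch of that early exit, replacing the corresponding lines inside divide's inner loop:

            diff = difference(partition, remainder)
            if diff == 0:
                # a perfect split cannot be improved upon, so stop early
                return [partition, remainder]
            if diff < lower_difference:
                lower_difference = diff
                solution = [partition, remainder]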
I think what you're looking for is a hill-climbing algorithm. I'm not sure this covers all cases, but it at least works for your example. I'll update this if I think of a counterexample or something.
Let's call your list of numbers vals.
vals.sort(reverse=True)
a, b = [], []
for v in vals:
    if sum(a) < sum(b):
        a.append(v)
    else:
        b.append(v)
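For example, with vals = [4, 1, 8, 6] this greedy pass produces a = [6, 4] and b = [8, 1], i.e. sums of 10 and 9.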

Optimize testing all combinations of rows from multiple NumPy arrays

I have three NumPy arrays of ints, same number of columns, arbitrary number of rows each. I am interested in all instances where a row of the first one plus a row of the second one gives a row of the third one ([3, 1, 4] + [1, 5, 9] = [4, 6, 13]).
Here is a pseudo-code:
for i, j in rows(array1), rows(array2):
    if i + j is in rows(array3):
        somehow store the rows this occurred at (e.g. (1, 2, 5) if the 1st row of
        array1 + the 2nd row of array2 gives the 5th row of array3)
I will need to run this for very big matrices so I have two questions:
(1) I can write the above using nested loops but is there a quicker way, perhaps list comprehensions or itertools?
(2) What is the fastest/most memory-efficient way to store the triples? Later I will need to create a heatmap using two as coordinates and the first one as the corresponding value eg. point (2,5) has value 1 in the pseudo-code example.
Would be very grateful for any tips - I know this sounds quite simple but it needs to run fast and I have very little experience with optimization.
Edit: my ugly code was requested in the comments
import numpy as np

# random arrays
A = np.array([[-1,0],[0,-1],[4,1],[-1,2]])
B = np.array([[1,2],[0,3],[3,1]])
C = np.array([[0,2],[2,3]])

# triples stored as numbers with 2 coordinates in an otherwise-zero matrix
output_matrix = np.zeros((B.shape[0], C.shape[0]), dtype=int)
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        for k in range(C.shape[0]):
            if np.array_equal((A[i,] + B[j,]), C[k,]):
                output_matrix[j, k] = i + 1
print(output_matrix)
We can leverage broadcasting to perform all those summations and comparisons in a vectorized manner, then use np.where on the result to get the indices corresponding to the matches, and finally index and assign -
output_matrix = np.zeros((B.shape[0], C.shape[0]), dtype = int)
mask = ((A[:,None,None,:] + B[None,:,None,:]) == C).all(-1)
I,J,K = np.where(mask)
output_matrix[J,K] = I+1
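Note that the intermediate broadcasted comparison has shape (len(A), len(B), len(C), ncols) before the .all(-1) reduction, so memory use grows with the product of the three row counts; for very large arrays it may be worth chunking over A.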
(1) Improvements
You can use sets for the final result in the third matrix, as a + b = c must hold identically. This already replaces one nested loop with a constant-time lookup. I will show you an example of how to do this below, but we first ought to introduce some notation.
For a set-based approach to work, we need a hashable type. Lists will thus not work, but a tuple will: it is an ordered, immutable structure. There is, however, a problem: tuple addition is defined as concatenation, that is,
(0, 1) + (1, 0) = (0, 1, 1, 0).
This will not do for our use-case: we need element-wise addition. As such, we subclass the built-in tuple as follows,
class AdditionTuple(tuple):

    def __add__(self, other):
        """
        Element-wise addition.
        """
        if len(self) != len(other):
            raise ValueError("Undefined behaviour!")

        return AdditionTuple(self[idx] + other[idx]
                             for idx in range(len(self)))
Where we override the default behaviour of __add__. Now that we have a data-type amenable to our problem, let's prepare the data.
You give us,
A = [[-1, 0], [0, -1], [4, 1], [-1, 2]]
B = [[1, 2], [0, 3], [3, 1]]
C = [[0, 2], [2, 3]]
To work with. I say,
from types import SimpleNamespace

A = [AdditionTuple(item) for item in A]
B = [AdditionTuple(item) for item in B]
C = {tuple(item): SimpleNamespace(idx=idx, values=[])
     for idx, item in enumerate(C)}
That is, we modify A and B to use our new data-type, and turn C into a dictionary which supports (amortised) O(1) look-up times.
We can now do the following, eliminating one loop altogether,
from itertools import product

for a, b in product(enumerate(A), enumerate(B)):
    idx_a, a_i = a
    idx_b, b_j = b
    if a_i + b_j in C:  # a_i + b_j == c_k, identically
        C[a_i + b_j].values.append((idx_a, idx_b))
Then,
>>> print(C)
{(2, 3): namespace(idx=1, values=[(3, 2)]), (0, 2): namespace(idx=0, values=[(0, 0), (1, 1)])}
Where for each value in C, you get the index of that value (as idx), and a list of tuples of (idx_a, idx_b) whose elements of A and B together sum to the value at idx in C.
Let us briefly analyse the complexity of this algorithm. Redefining the lists A, B, and C as above is linear in the length of the lists. Iterating over A and B is of course in O(|A| * |B|), and the nested condition computes the element-wise addition of the tuples: this is linear in the length of the tuples themselves, which we shall denote k. The whole algorithm then runs in O(k * |A| * |B|).
This is a substantial improvement over your current O(k * |A| * |B| * |C|) algorithm.
(2) Matrix plotting
Use a dok_matrix, a sparse SciPy matrix representation. Then you can use any heatmap-plotting library you like on the matrix, e.g. Seaborn's heatmap.

Generating conditional data with Hypothesis in Python

I want to generate a list of lists of integers of size 2 with the following conditions.
the first element should be smaller than the second and
all the data should be unique.
I could generate each tuple with a custom function but don't know how to use that to satisfy the second condition.
from hypothesis import assume, strategies as st

@st.composite
def generate_data(draw):
    min_val, max_val = draw(st.lists(st.integers(1, 100), min_size=2, max_size=2))
    assume(min_val < max_val)
    return [min_val, max_val]
I could generate the data by iterating over generate_data a few times in this (inefficient?) way:
>>> [generate_data().example() for _ in range(3)]
[[5, 31], [1, 12], [33, 87]]
But how can I check that the data is unique?
E.g., the following values are invalid:
[[1, 2], [1, 5], ...] # (1 is repeated)
[[1, 2], [1, 2], ...] # (repeated data)
but the following is valid:
[[1, 2], [3, 4], ...]
I think the following strategy satisfies your requirements:
import hypothesis.strategies as st

@st.composite
def unique_pair_lists(draw):
    data = draw(st.lists(st.integers(), unique=True))
    if len(data) % 2 != 0:
        data.pop()
    result = [data[i:i + 2] for i in range(0, len(data), 2)]
    for pair in result:
        pair.sort()
    return result
The idea here is that we generate something that gives the right elements, and then we transform it into something of the right shape. Rather than trying to generate pairs of lists of integers, we just generate a list of unique integers and then group them into pairs (we drop the last element if there's an odd number of integers). We then sort each pair to ensure it's in the right order.
David's solution permits an integer to appear in two sub-lists - for totally unique integers I'd use the following:
@st.composite
def list_of_pairs_of_unique_elements(draw):
    seen = set()
    new_int = (st.integers(1, 100)
               .filter(lambda n: n not in seen)   # check that it's unique
               .map(lambda n: seen.add(n) or n))  # add to seen before the next draw
    return draw(st.lists(st.tuples(new_int, new_int).map(sorted)))
The .filter(...) method is probably what you're looking for.
.example() is only for interactive use - you'll get a warning (or error) if you use it inside @given().
If you might end up filtering out most elements in the range (e.g. an outer list of length > 30, meaning 60 of the 100 possible unique elements), you might get better performance by creating a list of possible elements and popping out of it, rather than rejecting already-seen elements.
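A sketch of that pool idea (the strategy name is mine, and it hard-codes the 1-100 range from the question): draw a shuffled copy of the pool, then slice consecutive pairs, so uniqueness holds by construction.

@st.composite
def unique_pairs_from_pool(draw):
    # Every element of the permutation is distinct, so no filtering is needed.
    pool = draw(st.permutations(range(1, 101)))
    n_pairs = draw(st.integers(0, len(pool) // 2))
    return [sorted(pool[2 * i:2 * i + 2]) for i in range(n_pairs)]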

How to calculate all interleavings of two lists?

I want to create a function that takes in two lists, the lists are not guaranteed to be of equal length, and returns all the interleavings between the two lists.
Input: Two lists that do not have to be equal in size.
Output: All possible interleavings between the two lists that preserve the original list's order.
Example: AllInter([1,2],[3,4]) -> [[1,2,3,4], [1,3,2,4], [1,3,4,2], [3,1,2,4], [3,1,4,2], [3,4,1,2]]
I do not want a solution. I want a hint.
itertools alone would not be capable enough to handle this problem; it requires a bit of understanding of the pegs-and-holes problem.
Consider your example lists
A = [1, 2]
B = [3, 4]
There are four holes (len(A) + len(B)) where you can place the elements (pegs)
|   ||   ||   ||   |
|___||___||___||___|
In Python you can represent empty slots as a list of None
slots = [None]*(len(A) + len(B))
The number of ways you can place the elements (pegs) from the second list into the holes is simply the number of ways you can select len(B) slots from the len(A) + len(B) holes, which is
C(len(A) + len(B), len(B)) = C(4, 2) = 6
These selections are generated by itertools.combinations(range(len(A) + len(B)), len(B)),
which can be depicted as
|   ||   ||   ||   |    Slots
|___||___||___||___|    Selected
  3    4                (0,1)
  3         4           (0,2)
  3              4      (0,3)
       3    4           (1,2)
       3         4      (1,3)
            3    4      (2,3)
Now, for each of these slot positions, fill it with the elements (pegs) from the second list:
for splice in combinations(range(0, len(slots)), len(B)):
    it_B = iter(B)
    for s in splice:
        slots[s] = next(it_B)
Once you are done, you just have to fill the remaining empty slots with the elements (pegs) from the first list
it_A = iter(A)
# compare against None so that falsy elements (e.g. 0) are not overwritten
slots = [e if e is not None else next(it_A) for e in slots]
Before you start the next iteration with another slot position, flush your holes
slots = [None]*(len(slots))
A Python implementation for the above approach
from itertools import combinations

def slot_combinations(A, B):
    slots = [None] * (len(A) + len(B))
    for splice in combinations(range(0, len(slots)), len(B)):
        it_B = iter(B)
        for s in splice:
            slots[s] = next(it_B)
        it_A = iter(A)
        # compare against None so that falsy elements (e.g. 0) are kept
        slots = [e if e is not None else next(it_A) for e in slots]
        yield slots
        slots = [None] * len(slots)
And the output from the above implementation:
list(slot_combinations(A,B))
[[3, 4, 1, 2], [3, 1, 4, 2], [3, 1, 2, 4], [1, 3, 4, 2], [1, 3, 2, 4], [1, 2, 3, 4]]
Hint: Suppose each list had all-identical elements (but different between the two lists), i.e. one list was completely red (say r of them), and the other completely blue (say b of them).
Each element of the output contains r+b of them, r of which are red.
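Counting such arrangements gives C(r+b, r) interleavings in total; for the example above that is C(4, 2) = 6, matching the six outputs.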
Seems like others have spoilt it for you, even though you didn't ask for a solution (but they have a very good explanation).
So here is the code I wrote up quickly.
import itertools

def interleave(lst1, lst2):
    r, b = len(lst1), len(lst2)
    for s in itertools.combinations(range(0, r + b), r):
        lst = [0] * (r + b)
        tuple_idx, idx1, idx2 = 0, 0, 0
        for i in range(0, r + b):
            if tuple_idx < r and i == s[tuple_idx]:
                lst[i] = lst1[idx1]
                idx1 += 1
                tuple_idx += 1
            else:
                lst[i] = lst2[idx2]
                idx2 += 1
        yield lst

def main():
    for s in interleave([1, 2, 3], ['a', 'b']):
        print(s)

if __name__ == "__main__":
    main()
The basic idea is to map the output to (r+b) choose r combinations.
As suggested by @airza, the itertools module is your friend.
If you want to avoid using encapsulated magical goodness, my hint is to use recursion.
Start playing the process of generating the lists in your mind, and when you notice you're doing the same thing again, try to find the pattern. For example:
Take the first element from the first list
Either take the 2nd, or the first from the other list
Either take the 3rd, or the 2nd if you didn't, or another one from the other list
...
Okay, that is starting to look like there's some greater logic we're not using. I'm just incrementing the numbers. Surely I can find a base case that works while changing the "first element", instead of naming higher elements?
Play with it. :)
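In that spirit, a minimal recursive sketch (my own illustration of the hint, not part of the original answer):

def interleavings(a, b):
    # Base case: if either list is empty, the only interleaving is the other list.
    if not a or not b:
        yield list(a or b)
        return
    # Otherwise the next element comes either from a or from b.
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest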
You can try something a little closer to the metal and more elegant (in my opinion): iterating through the different possible slices. Basically, step through and iterate through all three arguments to the standard slice operation, removing anything already added to the final list. I can post a code snippet if you're interested.
