I have this code that creates permutations of a given number. It also gives the permutations for a specified number of digits, so 2 would give all possible 2-digit permutations. I have this in a loop, so for a 4-digit number it produces the permutations for every length: 4, 3, 2 and 1 digits. The problem I'm having is how to store the perm variable, which holds the permutations. I tried making perm a multi-dimensional array and adding each new array to it as the loop iterates. That didn't work because the arrays are different sizes. How can I continue?
def fp(number):
    # A Python program to print all
    # permutations using library function
    from itertools import permutations
    # Get all permutations of [1, 2, 3]
    c = list(map(int, str(number)))
    print(c, len(c))
    i = 1
    while i <= len(c):
        perm = permutations(c, i)  # permute the digits in c, taking i digits at a time
        i += 1
    # Print the obtained permutations
    for i in list(perm):
        print(i)
You are looking for something like a powerset. The itertools documentation has a powerset recipe; I just changed combinations to permutations.
Then loop through all the permutations and append them to a list (you could also use a dictionary).
import itertools

def powerset(iterable):
    s = list(iterable)
    return itertools.chain.from_iterable(itertools.permutations(s, r) for r in range(1, len(s) + 1))

lst_of_numbers = [1, 2, 3]
out = []
for perm in powerset(lst_of_numbers):
    out.append(perm)
print(out)
[(1,), (2,), (3,), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
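If it helps to keep the permutations of each length separate, the dictionary variant mentioned above could look roughly like this. It is only a sketch: the function name perms_by_length is made up for illustration, and it assumes the input is a non-negative integer.

```python
from itertools import permutations


def perms_by_length(number):
    """Map each length r to the list of r-digit permutations of number's digits."""
    digits = [int(ch) for ch in str(number)]  # assumes a non-negative integer
    return {r: list(permutations(digits, r)) for r in range(1, len(digits) + 1)}


grouped = perms_by_length(123)
for r, perms in grouped.items():
    print(r, len(perms))
```

Because each length gets its own list, the different sizes never have to fit into one rectangular array.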
Use another variable to hold all the permutations. Extend it with the list returned by each call to permutations().
def fp(number):
    # A Python program to print all
    # permutations using library function
    from itertools import permutations
    all_perms = []
    c = list(map(int, str(number)))
    print(c, len(c))
    i = 1
    while i <= len(c):
        perm = permutations(c, i)  # permute the digits in c, taking i digits at a time
        all_perms.extend(perm)
        i += 1
    # Print the obtained permutations
    for i in all_perms:
        print(i)
So say I have a list of sequences such as the one below.
I want to remove every sequence whose total sum equals N and/or that has a contiguous subarray whose sum equals N.
For example, if N = 4, then (1,1,2) is not valid since its total is 4. (1,1,3) is also not valid since its subarray (1,3) sums to 4. (1,3,1) is not valid for the same reason.
lst = [
(1,1,1), (1,1,2), (1,1,3),
(1,2,1), (1,2,2), (1,2,3),
(1,3,1), (1,3,2), (1,3,3),
(2,1,1), (2,1,2), (2,1,3),
(2,2,1), (2,2,2), (2,2,3),
(2,3,1), (2,3,2), (2,3,3),
(3,1,1), (3,1,2), (3,1,3),
(3,2,1), (3,2,2), (3,2,3),
(3,3,1), (3,3,2), (3,3,3)
]
What are some ways to do this?
I'm currently trying to see whether I can at least remove sequences based on their running totals being a multiple of N, without yet handling their contiguous subarrays, but I'm unsuccessful:
for elements in list(product(range(1,n), repeat=n-1)):
    lst.append(elements)
for val in lst:
    if np.cumsum(val).any() %n != 0:
        lst2.append(val) # append value to a filtered list
You can use itertools.combinations to generate every pair of slice boundaries and test the sum of each contiguous subsequence (including single elements and the whole sequence):
from itertools import combinations
[t for t in lst if not any(sum(t[l:h]) == 4 for l, h in combinations(range(len(t) + 1), 2))]
This returns:
[(1, 1, 1), (1, 2, 3), (2, 1, 2), (2, 3, 2), (2, 3, 3), (3, 2, 1), (3, 2, 3), (3, 3, 2), (3, 3, 3)]
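For longer sequences, re-summing every slice gets wasteful. One common alternative (not part of the answer above) is to compute prefix sums once and compare their differences. The sketch below assumes the same 27-tuple input as in the question; the helper name has_window_sum is hypothetical.

```python
from itertools import accumulate, product


def has_window_sum(seq, target):
    """True if some contiguous, non-empty slice of seq sums to target."""
    prefix = [0, *accumulate(seq)]  # prefix[k] is the sum of seq[:k]
    return any(
        prefix[j] - prefix[i] == target
        for i in range(len(prefix))
        for j in range(i + 1, len(prefix))
    )


lst = list(product(range(1, 4), repeat=3))  # the 27 tuples from the question
kept = [t for t in lst if not has_window_sum(t, 4)]
print(kept)
```

Each slice sum is then a single subtraction, and the whole-sequence total is just the slice from the first boundary to the last.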
You can split your problem into two subproblems:
The elements in your list sum up to N. Then you can simply test:
if sum(myList) == N:
# do fancy stuff
The elements in your list do not sum up to N. In this case, there might be a contiguous subsequence that sums up to N. To find it, let's define two pointers, l and r. Their names stand for left and right, and they define the boundaries of your subsequence. Then, the solution is the following:
r = 1
l = 0
while r <= len(myList):
    sum_ = sum(myList[l:r])
    if sum_ < N:
        r += 1
    elif sum_ > N:
        l += 1
    else:
        # do fancy stuff and exit from the loop
        break
It works as follows. First you initialize l and r so that you consider the subsequence consisting of only the first element of myList. Then, you sum the elements of the subsequence, and if the sum is lower than N, you enlarge the subsequence by adding 1 to r. If it is greater than N, you shrink the subsequence by adding 1 to l.
Note thanks to eozd:
The above algorithm works only if the elements of the list are non-negative.
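Wrapped in a function, the two-pointer idea could be sketched as below. This version keeps a running window sum instead of calling sum() on every step. It assumes strictly positive elements and a positive target, and the name find_window is invented for this example.

```python
def find_window(values, target):
    """Return (l, r) with sum(values[l:r]) == target, or None if there is none.

    Assumes every element is positive and target > 0.
    """
    left = 0
    window_sum = 0
    for right, value in enumerate(values):
        window_sum += value  # grow the window on the right
        while window_sum > target and left <= right:
            window_sum -= values[left]  # shrink the window from the left
            left += 1
        if window_sum == target and left <= right:
            return left, right + 1
    return None


print(find_window([1, 3, 1], 4))
print(find_window([2, 1, 2], 4))
```

A sequence would then be dropped from the list whenever find_window returns something other than None.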
Imagine we have a list of stocks:
stocks = ['AAPL','GOOGL','IBM']
The specific stocks don't matter, what matters is that we have n items in this list.
Imagine we also have a list of weights, from 0% to 100%:
weights = list(range(101))
Given n = 3 (or any other number) I need to produce a matrix with every possible combination of weights that sums to a full 100%. E.g.
0%, 0%, 100%
1%, 0%, 99%
0%, 1%, 99%
etc...
Is there some method of itertools that can do this? Something in numpy? What is the most efficient way to do this?
The way to optimize this isn't to figure out a faster way to generate the permutations, it's to generate as few permutations as possible.
First, how would you do this if you only wanted the combinations that were in sorted order?
You don't need to generate all possible combinations of 0 to 100 and then filter that. The first number, a, can be anywhere from 0 to 100. The second number, b, can be anywhere from 0 to (100-a). The third number, c, can only be 100-a-b. So:
for a in range(0, 101):
    for b in range(0, 101-a):
        c = 100-a-b
        yield a, b, c
Now, instead of generating 100*100*100 combinations and filtering them down to 100*50*1+1, we're just generating the 100*50*1+1, for a roughly 200x speedup.
However, keep in mind that there are still around X * (X/2)**N answers. So, computing them in X * (X/2)**N time instead of X**N may be optimal—but it's still exponential time. And there's no way around that; you want an exponential number of results, after all.
You can look for ways to make the first part more concise with itertools.product combined with reduce or accumulate, but I think it's going to end up less readable, and you want to be able to extend to any arbitrary N, and also to get all permutations rather than just the sorted ones. So keep it understandable until you do that, and then look for ways to condense it after you're done.
You obviously need to go through N steps, either with a loop or with recursion. I think this is easier to understand with recursion than with a loop.
When n is 1, the only combination is (x,).
Otherwise, for each of the values a from 0 to x, you can have that value, together with all of the combinations of n-1 numbers that sum to x-a. So:
def sum_to_x(x, n):
    if n == 1:
        yield (x,)
        return
    for a in range(x+1):
        for result in sum_to_x(x-a, n-1):
            yield (a, *result)
Now you just need to add in the permutations, and you're done:
def perm_sum_to_x(x, n):
    for combi in sum_to_x(x, n):
        yield from itertools.permutations(combi)
But there's one problem: permutations permutes positions, not values. So if you have, say, (100, 0, 0), the six permutations of that are (100, 0, 0), (100, 0, 0), (0, 100, 0), (0, 0, 100), (0, 100, 0), (0, 0, 100).
If N is very small—as it is in your example, with N=3 and X=100—it may be fine to just generate all 6 permutations of each combination and filter them:
def perm_sum_to_x(x, n):
    for combi in sum_to_x(x, n):
        yield from set(itertools.permutations(combi))
… but if N can grow large, we're talking about a lot of wasted work there as well.
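To make the duplicate problem concrete, here is a tiny check of what itertools.permutations returns for a tuple with repeated values, and what wrapping it in set() leaves behind. The variable names are just for illustration.

```python
from itertools import permutations

combi = (100, 0, 0)

# permutations works on positions, so the two zeros are treated as distinct
all_orderings = list(permutations(combi))

# a set collapses orderings that are equal by value
distinct_orderings = set(all_orderings)

print(len(all_orderings), len(distinct_orderings))
```

For a tuple of length n, permutations always yields n! orderings, so the share that the set throws away grows quickly when values repeat.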
There are plenty of good answers here on how to do permutations without repeated values. See this question, for example. Borrowing an implementation from that answer:
def perm_sum_to_x(x, n):
    for combi in sum_to_x(x, n):
        yield from unique_permutations(combi)
Or, if we can drag in SymPy or more-itertools:
def perm_sum_to_x(x, n):
    for combi in sum_to_x(x, n):
        yield from sympy.multiset_permutations(combi)

def perm_sum_to_x(x, n):
    for combi in sum_to_x(x, n):
        yield from more_itertools.distinct_permutations(combi)
What you are looking for is product from itertools module
you can use it as shown below
from itertools import product
weights = list(range(101))
n = 3
lst_of_weights = [i for i in product(weights,repeat=n) if sum(i)==100]
What you need is combinations_with_replacement because in your question you wrote 0, 0, 100 which means you expect repetition, like 20, 20, 60 etc.
from itertools import combinations_with_replacement
weights = range(11)
n = 3
result = [i for i in combinations_with_replacement(weights, n) if sum(i) == 10]
print(result)
The above code results in
[(0, 0, 10), (0, 1, 9), (0, 2, 8), (0, 3, 7), (0, 4, 6), (0, 5, 5), (1, 1, 8), (1, 2, 7), (1, 3, 6), (1, 4, 5), (2, 2, 6), (2, 3, 5), (2, 4, 4), (3, 3, 4)]
Replace range(11), n and sum(i) == 10 by whatever you need.
This is a classic Stars and bars problem, and Python's itertools module does indeed provide a solution that's both simple and efficient, without any additional filtering needed.
Some explanation first: you want to divide 100 "points" between 3 stocks in all possible ways. For illustration purposes, let's reduce to 10 points instead of 100, with each one worth 10% instead of 1%. Imagine writing those points as a string of ten * characters:
**********
These are the "stars" of "stars and bars". Now to divide the ten stars amongst the 3 stocks, we insert two | divider characters (the "bars" of "stars and bars"). For example, one such division might look like this:
**|*******|*
This particular combination of stars and bars would correspond to the division 20% AAPL, 70% GOOGL, 10% IBM. Another division might look like:
******||****
which would correspond to 60% AAPL, 0% GOOGL, 40% IBM.
It's easy to convince yourself that every string consisting of ten * characters and two | characters corresponds to exactly one possible division of the ten points amongst the three stocks.
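As a small illustration of that correspondence, a stars-and-bars string can be decoded by splitting on the bars and counting the stars in each piece. The helper name decode is hypothetical and is not used in the solution itself.

```python
def decode(stars_and_bars):
    """Turn a string such as '**|*******|*' into the tuple of star counts."""
    return tuple(len(group) for group in stars_and_bars.split("|"))


print(decode("**|*******|*"))   # the 20% / 70% / 10% example
print(decode("******||****"))   # the 60% / 0% / 40% example
```

Two adjacent bars produce an empty piece, which is how a 0% allocation shows up.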
So to solve your problem, all we need to do is generate all possible strings containing ten * star characters and two | bar characters. Or, to think of this another way, we want to find all possible pairs of positions that we can place the two bar characters, in a string of total length twelve. Python's itertools.combinations function can be used to give us those possible positions, (for example with itertools.combinations(range(12), 2)) and then it's simple to translate each pair of positions back to a division of range(10) into three pieces: insert an extra imaginary divider character at the start and end of the string, then find the number of stars between each pair of dividers. That number of stars is simply one less than the distance between the two dividers.
Here's the code:
import itertools
def all_partitions(n, k):
    """
    Generate all partitions of range(n) into k pieces.
    """
    for c in itertools.combinations(range(n+k-1), k-1):
        yield tuple(y-x-1 for x, y in zip((-1,) + c, c + (n+k-1,)))
For the case you give in the question, you want all_partitions(100, 3). But that yields 5151 partitions, starting with (0, 0, 100) and ending with (100, 0, 0), so it's impractical to show the results here. Instead, here are the results in a smaller case:
>>> for partition in all_partitions(5, 3):
... print(partition)
...
(0, 0, 5)
(0, 1, 4)
(0, 2, 3)
(0, 3, 2)
(0, 4, 1)
(0, 5, 0)
(1, 0, 4)
(1, 1, 3)
(1, 2, 2)
(1, 3, 1)
(1, 4, 0)
(2, 0, 3)
(2, 1, 2)
(2, 2, 1)
(2, 3, 0)
(3, 0, 2)
(3, 1, 1)
(3, 2, 0)
(4, 0, 1)
(4, 1, 0)
(5, 0, 0)
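If you want to convince yourself that nothing is missed, the number of results can be compared against the binomial coefficient that counts the bar placements. The sketch below repeats all_partitions so that it runs on its own, and it assumes Python 3.8+ for math.comb.

```python
import itertools
import math


def all_partitions(n, k):
    """Generate all ways of splitting n points into k ordered, possibly empty pieces."""
    for c in itertools.combinations(range(n + k - 1), k - 1):
        yield tuple(y - x - 1 for x, y in zip((-1,) + c, c + (n + k - 1,)))


partitions = list(all_partitions(100, 3))

# choosing 2 bar positions out of 102 slots
expected_count = math.comb(100 + 3 - 1, 3 - 1)

print(len(partitions), expected_count)
print(all(sum(p) == 100 for p in partitions))
```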
Say I have a list of valid X = [1, 2, 3, 4, 5] and a list of valid Y = [1, 2, 3, 4, 5].
I need to generate all combinations of every element in X and every element in Y (in this case, 25) and get those combinations in random order.
This in itself would be simple, but there is an additional requirement: In this random order, there cannot be a repetition of the same x in succession. For example, this is okay:
[1, 3]
[2, 5]
[1, 2]
...
[1, 4]
This is not:
[1, 3]
[1, 2] <== the "1" cannot repeat, because there was already one before
[2, 5]
...
[1, 4]
Now, the least efficient idea would be to simply keep reshuffling the full set until there are no more repetitions. My approach was a bit different: repeatedly create a shuffled variant of X and a list of all Y * X, then pick a random next pair from those. So far, I've come up with this:
import random
output = []
num_x = 5
num_y = 5
all_ys = list(xrange(1, num_y + 1)) * num_x
while True:
    # end if no more are available
    if len(output) == num_x * num_y:
        break
    xs = list(xrange(1, num_x + 1))
    while len(xs):
        next_x = random.choice(xs)
        next_y = random.choice(all_ys)
        if [next_x, next_y] not in output:
            xs.remove(next_x)
            all_ys.remove(next_y)
            output.append([next_x, next_y])
print(sorted(output))
But I'm sure this can be done even more efficiently or in a more succinct way?
Also, my solution first goes through all X values before continuing with the full set again, which is not perfectly random. I can live with that for my particular application case.
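To compare candidate solutions, it can help to have a small checker for the two requirements: every (x, y) pair appears exactly once, and no two neighbouring pairs share an x. The following is only a sketch in Python 3; the name is_valid_order is made up, and it assumes the result is a list of 2-element lists or tuples.

```python
def is_valid_order(pairs, xs, ys):
    """Check that pairs holds every (x, y) once and never repeats x back to back."""
    expected = {(x, y) for x in xs for y in ys}
    complete = (
        len(pairs) == len(expected)
        and {tuple(p) for p in pairs} == expected
    )
    no_repeat = all(a[0] != b[0] for a, b in zip(pairs, pairs[1:]))
    return complete and no_repeat


print(is_valid_order([(1, 1), (2, 1), (1, 2), (2, 2)], [1, 2], [1, 2]))
print(is_valid_order([(1, 1), (1, 2), (2, 1), (2, 2)], [1, 2], [1, 2]))
```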
A simple solution to ensure an average O(N*M) complexity:
import random

def pseudorandom(M,N):
    l=[(x+1,y+1) for x in range(N) for y in range(M)]
    random.shuffle(l)
    for i in range(M*N-1):
        for j in range (i+1,M*N): # find a compatible ...
            if l[i][0] != l[j][0]:
                l[i+1],l[j] = l[j],l[i+1]
                break
        else: # or insert otherwise.
            while True:
                l[i],l[i-1] = l[i-1],l[i]
                i-=1
                if l[i][0] != l[i-1][0]: break
    return l
Some tests:
In [354]: print(pseudorandom(5,5))
[(2, 2), (3, 1), (5, 1), (1, 1), (3, 2), (1, 2), (3, 5), (1, 5), (5, 4),\
(1, 3), (5, 2), (3, 4), (5, 3), (4, 5), (5, 5), (1, 4), (2, 5), (4, 4), (2, 4),\
(4, 2), (2, 1), (4, 3), (2, 3), (4, 1), (3, 3)]
In [355]: %timeit pseudorandom(100,100)
10 loops, best of 3: 41.3 ms per loop
Here is my solution. First, each tuple is chosen among the ones that have a different x value from the previously selected tuple. But I've noticed that you need a final trick for the case where only tuples with the bad x value are left to place at the end.
import random
num_x = 5
num_y = 5
all_ys = range(1,num_y+1)*num_x
all_xs = sorted(range(1,num_x+1)*num_y)
output = []
last_x = -1
for i in range(0,num_x*num_y):
    #get list of possible tuple to place
    all_ind = range(0,len(all_xs))
    all_ind_ok = [k for k in all_ind if all_xs[k]!=last_x]
    ind = random.choice(all_ind_ok)
    last_x = all_xs[ind]
    output.append([all_xs.pop(ind),all_ys.pop(ind)])
    if(all_xs.count(last_x)==len(all_xs)):#if only last_x tuples,
        break
if len(all_xs)>0: # if there are still tuples they are randomly placed
    nb_to_place = len(all_xs)
    while(len(all_xs)>0):
        place = random.randint(0,len(output)-1)
        # compare the x value of the pair, not the whole pair
        if output[place][0]==last_x:
            continue
        if place>0:
            if output[place-1][0]==last_x:
                continue
        output.insert(place,[all_xs.pop(),all_ys.pop()])
print output
Here's a solution using NumPy
import numpy as np

def generate_pairs(xs, ys):
    n = len(xs)
    m = len(ys)
    indices = np.arange(n)
    # one independently shuffled copy of ys per x value
    array = np.tile(ys, (n, 1))
    for row in array:
        np.random.shuffle(row)
    # counts[i] is the number of unused y values left for xs[i]
    counts = np.full_like(xs, m)
    i = -1
    for _ in range(n * m):
        weights = np.array(counts, dtype=float)
        if i != -1:
            weights[i] = 0  # the previous x may not be chosen again
        weights /= np.sum(weights)
        i = np.random.choice(indices, p=weights)
        counts[i] -= 1
        pair = xs[i], array[i, counts[i]]
        yield pair
Here's a Jupyter notebook that explains how it works
Inside the loop, we have to copy the weights, add them up, and choose a random index using the weights. These are all linear in n. So the overall complexity to generate all pairs is O(n^2 m)
But the runtime is deterministic and overhead is low. And I'm fairly sure it generates all legal sequences with equal probability.
An interesting question! Here is my solution. It has the following properties:
If there is no valid solution it should detect this and let you know
The iteration is guaranteed to terminate so it should never get stuck in an infinite loop
Any possible solution is reachable with nonzero probability
I do not know the distribution of the output over all possible solutions, but I think it should be uniform because there is no obvious asymmetry inherent in the algorithm. I would be surprised and pleased to be shown otherwise, though!
import random

def random_without_repeats(xs, ys):
    pairs = [[x,y] for x in xs for y in ys]
    output = [[object()], [object()]]
    seen = set()
    while pairs:
        # choose a random pair from the ones left
        indices = list(set(xrange(len(pairs))) - seen)
        try:
            index = random.choice(indices)
        except IndexError:
            raise Exception('No valid solution exists!')
        # the first element of our randomly chosen pair
        x = pairs[index][0]
        # search for a valid place in output where we slot it in
        for i in xrange(len(output) - 1):
            left, right = output[i], output[i+1]
            if x != left[0] and x != right[0]:
                output.insert(i+1, pairs.pop(index))
                seen = set()
                break
        else:
            # make sure we don't randomly choose a bad pair like that again
            seen |= {i for i in indices if pairs[i][0] == x}
    # trim off the sentinels
    output = output[1:-1]
    assert len(output) == len(xs) * len(ys)
    # no two neighbouring pairs may share the same first element
    assert not any(L[0]==R[0] for L,R in zip(output[:-1], output[1:]))
    return output

nx, ny = 5, 5 # OP example
# nx, ny = 2, 10 # output must alternate in 1st index
# nx, ny = 4, 13 # shuffle 'deck of cards' with no repeating suit
# nx, ny = 1, 5 # should raise 'No valid solution exists!' exception
xs = range(1, nx+1)
ys = range(1, ny+1)
for pair in random_without_repeats(xs, ys):
    print pair
This should do what you want.
rando will never generate the same X twice in a row. However, because duplicate pairs are discarded, the next accepted pair could still repeat the previous X. That seems unlikely (I never noticed it happen in the 10 or so times I ran it without the extra check), but it is possible, so the loop below skips any pair whose X matches the last accepted one.
import itertools
import random
X = [1,2,3,4,5]
Y = [1,2,3,4,5]
def rando(choice_one, choice_two):
    last_x = random.choice(choice_one)
    while True:
        yield last_x, random.choice(choice_two)
        possible_x = choice_one[:]
        possible_x.remove(last_x)
        last_x = random.choice(possible_x)
all_pairs = set(itertools.product(X, Y))
result = []
r = rando(X, Y)
while set(result) != all_pairs:
    pair = next(r)
    if pair not in result:
        if result and result[-1][0] == pair[0]:
            continue
        result.append(pair)
import pprint
pprint.pprint(result)
For completeness, I guess I will throw in the super-naive "just keep shuffling till you get one" solution. It's not guaranteed to even terminate, but if it does, it will have a good degree of randomness, and you did say one of the desired qualities was succinctness, and this sure is succinct:
import itertools
import random
x = range(5) # this is a list in Python 2
y = range(5)
all_pairs = list(itertools.product(x, y))
s = list(all_pairs) # make a working copy
while any(s[i][0] == s[i + 1][0] for i in range(len(s) - 1)):
    random.shuffle(s)
print s
As was commented, for small values of x and y (especially y!), this is actually a reasonably quick solution. Your example of 5 for each completes in an average time of "right away". The deck of cards example (4 and 13) can take much longer, because it will usually require hundreds of thousands of shuffles. (And again, is not guaranteed to terminate at all.)
Distribute the x values (5 times each value) evenly across your output:
import random
def random_combo_without_x_repeats(xvals, yvals):
    # produce all valid combinations, but group by `x` and shuffle the `y`s
    grouped = [[x, random.sample(yvals, len(yvals))] for x in xvals]
    last_x = object()  # sentinel not equal to anything
    while grouped[0][1]:  # still `y`s left
        for _ in range(len(xvals)):
            # shuffle the `x`s, but skip any ordering that would
            # produce consecutive `x`s.
            random.shuffle(grouped)
            if grouped[0][0] != last_x:
                break
        else:
            # we tried to reshuffle N times, but ended up with the same `x` value
            # in the first position each time. This is pretty unlikely, but
            # if this happens we bail out and just reverse the order. That is
            # more than good enough.
            grouped = grouped[::-1]
        # yield a set of (x, y) pairs for each unique x
        # Pick one y (from the pre-shuffled groups) per x
        for x, ys in grouped:
            yield x, ys.pop()
        last_x = x
This shuffles the y values per x first, then gives you a x, y combination for each x. The order in which the xs are yielded is shuffled each iteration, where you test for the restriction.
This is random, but you'll get all numbers between 1 and 5 in the x position before you'll see the same number again:
>>> list(random_combo_without_x_repeats(range(1, 6), range(1, 6)))
[(2, 1), (3, 2), (1, 5), (5, 1), (4, 1),
(2, 4), (3, 1), (4, 3), (5, 5), (1, 4),
(5, 2), (1, 1), (3, 3), (4, 4), (2, 5),
(3, 5), (2, 3), (4, 2), (1, 2), (5, 4),
(2, 2), (3, 4), (1, 3), (4, 5), (5, 3)]
(I manually grouped that into sets of 5). Overall, this makes for a pretty good random shuffling of a fixed input set with your restriction.
It is efficient too; because there is only a 1-in-N chance that you have to re-shuffle the x order, you should only see one reshuffle on average during a full run of the algorithm. The whole algorithm therefore stays within O(N*M) bounds, pretty much ideal for something that produces N times M elements of output. Because we limit the reshuffling to N times at most before falling back to a simple reverse, we avoid the (extremely unlikely) possibility of endlessly reshuffling.
The only drawback then is that it has to create N copies of the M y values up front.
Here is an evolutionary algorithm approach. It first evolves a list in which the elements of X are each repeated len(Y) times and then it randomly fills in each element of Y len(X) times. The resulting orders seem fairly random:
import random

#the following fitness function measures
#the number of times in which
#consecutive elements in a list
#are equal
def numRepeats(x):
    n = len(x)
    if n < 2: return 0
    repeats = 0
    for i in range(n-1):
        if x[i] == x[i+1]: repeats += 1
    return repeats

def mutate(xs):
    #swaps random pairs of elements
    #returns a new list
    #one of the two indices is chosen so that
    #it is in a repeated pair
    #and swapped element is different
    n = len(xs)
    repeats = [i for i in range(n) if (i > 0 and xs[i] == xs[i-1]) or (i < n-1 and xs[i] == xs[i+1])]
    i = random.choice(repeats)
    j = random.randint(0,n-1)
    while xs[j] == xs[i]: j = random.randint(0,n-1)
    ys = xs[:]
    ys[i], ys[j] = ys[j], ys[i]
    return ys

def evolveShuffle(xs, popSize = 100, numGens = 100):
    #tries to evolve a shuffle of xs so that consecutive
    #elements are different
    #takes the best 10% of each generation and mutates each 9
    #times. Stops when a perfect solution is found
    #popsize assumed to be a multiple of 10
    population = []
    for i in range(popSize):
        deck = xs[:]
        random.shuffle(deck)
        fitness = numRepeats(deck)
        if fitness == 0: return deck
        population.append((fitness,deck))
    for i in range(numGens):
        population.sort(key = (lambda p: p[0]))
        newPop = []
        for i in range(popSize//10):
            fit,deck = population[i]
            newPop.append((fit,deck))
            for j in range(9):
                newDeck = mutate(deck)
                fitness = numRepeats(newDeck)
                if fitness == 0: return newDeck
                newPop.append((fitness,newDeck))
        population = newPop
    #if you get here :
    return [] #no special shuffle found

#the following function takes a list x
#with n distinct elements (n>1) and an integer k
#and returns a random list of length nk
#where consecutive elements are not the same
def specialShuffle(x,k):
    n = len(x)
    if n == 2:
        if random.random() < 0.5:
            a,b = x
        else:
            b,a = x
        return [a,b]*k
    else:
        deck = x*k
        return evolveShuffle(deck)

def randOrder(x,y):
    xs = specialShuffle(x,len(y))
    d = {}
    for i in x:
        ys = y[:]
        random.shuffle(ys)
        d[i] = iter(ys)
    pairs = []
    for i in xs:
        pairs.append((i,next(d[i])))
    return pairs
for example:
>>> randOrder([1,2,3,4,5],[1,2,3,4,5])
[(1, 4), (3, 1), (4, 5), (2, 2), (4, 3), (5, 3), (2, 1), (3, 3), (1, 1), (5, 2), (1, 3), (2, 5), (1, 5), (3, 5), (5, 5), (4, 4), (2, 3), (3, 2), (5, 4), (2, 4), (4, 2), (1, 2), (5, 1), (4, 1), (3, 4)]
As len(X) and len(Y) get larger, this has more difficulty finding a solution (and is designed to return the empty list in that eventuality), in which case the parameters popSize and numGens could be increased. As is, it is able to find 20x20 solutions very rapidly. It takes about a minute when X and Y are of size 100, but even then it is able to find a solution (in the times that I have run it).
Interesting restriction! I probably overthought this, solving a more general problem: shuffling an arbitrary list of sequences such that (if possible) no two adjacent sequences share a first item.
from itertools import product
from random import choice, randrange, shuffle

def combine(*sequences):
    return playlist(product(*sequences))

def playlist(sequence):
    r'''Shuffle a set of sequences, avoiding repeated first elements.
    '''#"""#'''
    result = list(sequence)
    length = len(result)
    if length < 2:
        # No rearrangement is possible.
        return result
    def swap(a, b):
        if a != b:
            result[a], result[b] = result[b], result[a]
    swap(0, randrange(length))
    for n in range(1, length):
        previous = result[n-1][0]
        choices = [x for x in range(n, length) if result[x][0] != previous]
        if not choices:
            # Trapped in a corner: Too many of the same item are left.
            # Backtrack as far as necessary to interleave other items.
            minor = 0
            major = length - n
            while n > 0:
                n -= 1
                if result[n][0] == previous:
                    major += 1
                else:
                    minor += 1
                if minor == major - 1:
                    if n == 0 or result[n-1][0] != previous:
                        break
            else:
                # The requirement can't be fulfilled,
                # because there are too many of a single item.
                shuffle(result)
                break
            # Interleave the majority item with the other items.
            major = [item for item in result[n:] if item[0] == previous]
            minor = [item for item in result[n:] if item[0] != previous]
            shuffle(major)
            shuffle(minor)
            result[n] = major.pop(0)
            n += 1
            while n < length:
                result[n] = minor.pop(0)
                n += 1
                result[n] = major.pop(0)
                n += 1
            break
        swap(n, choice(choices))
    return result
This starts out simple, but when it discovers that it can't find an item with a different first element, it figures out how far back it needs to go to interleave that element with something else. Therefore, the main loop traverses the array at most three times (once backwards), but usually just once. Granted, each iteration of the first forward pass checks each remaining item in the array, and the array itself contains every pair, so the overall run time is O((NM)**2).
For your specific problem:
>>> X = Y = [1, 2, 3, 4, 5]
>>> combine(X, Y)
[(3, 5), (1, 1), (4, 4), (1, 2), (3, 4),
(2, 3), (5, 4), (1, 5), (2, 4), (5, 5),
(4, 1), (2, 2), (1, 4), (4, 2), (5, 2),
(2, 1), (3, 3), (2, 5), (3, 2), (1, 3),
(4, 3), (5, 3), (4, 5), (5, 1), (3, 1)]
By the way, this compares x values by equality, not by position in the X array, which may make a difference if the array can contain duplicates. In fact, duplicate values might trigger the fallback case of shuffling all pairs together if more than half of the X values are the same.
I have searched the web, which provided various solutions for producing a matrix of random numbers whose sum is a constant. My problem is slightly different. I want to generate an N x 4 matrix containing an exhaustive list of integers such that the sum of all numbers in each row is exactly 100, with integers in the range [0, 100]. I want the integers to increment sequentially as opposed to being random. How can I do it in Python?
Thank you.
product is a handy way of generating combinations
In [774]: from itertools import product
In [775]: [x for x in product(range(10),range(10)) if sum(x)==10]
Out[775]: [(1, 9), (2, 8), (3, 7), (4, 6), (5, 5), (6, 4), (7, 3), (8, 2), (9, 1)]
The tuples sum to 10, and step sequentially (in the first value at least).
I can generalize it to 3 tuples, and it still runs pretty fast.
In [778]: len([x for x in product(range(100),range(100),range(100)) if sum(x)==100])
Out[778]: 5148
Length 4 tuples takes much longer (on an old machine),
In [780]: len([x for x in product(range(100),range(100),range(100),range(100)) if sum(x)==100])
Out[780]: 176847
So there's probably a case to be made for solving this incrementally.
[x for x in product(range(100),range(100),range(100)) if sum(x)<=100]
runs much faster, producing about the same number of 3-tuples as there were 4-tuples above (within 1 or 2). And the 4th value can be derived from each x.
In [790]: timeit len([x+(100-sum(x),) for x in product(range(100),range(100),range(100)) if sum(x)<=100])
1 loops, best of 3: 444 ms per loop
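Taking the incremental idea one step further, the rows can be generated directly so that nothing has to be filtered out. The recursive generator below is only a sketch (the name rows_summing_to is hypothetical). Unlike the range(100) examples above, it also includes rows that contain 100 itself, and it yields rows in increasing lexicographic order.

```python
def rows_summing_to(total, width):
    """Yield every tuple of `width` non-negative integers that sums to `total`."""
    if width == 1:
        yield (total,)
        return
    for first in range(total + 1):
        # whatever is left after `first` must be covered by the remaining columns
        for rest in rows_summing_to(total - first, width - 1):
            yield (first, *rest)


matrix = list(rows_summing_to(100, 4))
print(len(matrix))
print(matrix[0], matrix[1], matrix[-1])
```

If a NumPy array is needed, passing the resulting list to numpy.array should give an N x 4 integer matrix.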
import itertools
import random
def makerow(L, T, R):
    # make a row of size L and sum T, with the integers from 0-R, in ascending order
    answer = []
    pool = list(itertools.takewhile(lambda x: x<T, range(R+1)))
    for i in range(L-1):
        answer.append(random.choice(pool))
        T -= answer[-1]
        pool = list(itertools.takewhile(lambda x: x<T, range(R+1)))
    answer.append(T)
    answer.sort()
    return answer

def makematrix(M, N, T, R):
    # make a matrix of M rows and N columns per row
    # each row adds up to T
    # using the numbers between 0-R
    return [makerow(N, T, R) for _ in range(M)]