Imagine we have a list of stocks:
stocks = ['AAPL','GOOGL','IBM']
The specific stocks don't matter; what matters is that we have n items in this list.
Imagine we also have a list of weights, from 0% to 100%:
weights = list(range(101))
Given n = 3 (or any other number), I need to produce a matrix with every possible combination of weights that sums to a full 100%. E.g.
0%, 0%, 100%
1%, 0%, 99%
0%, 1%, 99%
etc...
Is there some method of itertools that can do this? Something in numpy? What is the most efficient way to do this?
The way to optimize this isn't to figure out a faster way to generate the permutations, it's to generate as few permutations as possible.
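First, how would you do this if you only wanted the combinations that are in sorted order (a <= b <= c)?
You don't need to generate all possible combinations of 0 to 100 and then filter them. The first number, a, can be anywhere from 0 to 100//3 (any larger and b and c, which are at least a, would push the sum past 100). The second number, b, can be anywhere from a to (100-a)//2. The third number, c, can only be 100-a-b, and the bound on b guarantees that c >= b. So:
def sorted_weights():  # the function name is just for illustration
    for a in range(0, 100 // 3 + 1):
        for b in range(a, (100 - a) // 2 + 1):
            c = 100 - a - b  # c >= b because b <= (100 - a) // 2
            yield a, b, c
Now, instead of generating 101*101*101 (about a million) combinations and filtering them down to the 884 sorted ones, we're generating just those 884, a reduction in work of more than 1000x.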
However, keep in mind that once you include every ordering there are still C(X+N-1, N-1) answers (5151 for X=100, N=3), which is roughly X**(N-1)/(N-1)! for fixed N. So generating exactly those answers, instead of filtering X**N candidates, may be optimal, but the amount of work still grows very quickly with N. And there's no way around that; you want that many results, after all.
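To put a number on the output size: counting every ordering, the number of n-tuples of non-negative integers that sum to a given total is the stars-and-bars count C(total+n-1, n-1). The sketch below (function names are ours, not from the answer) computes it with math.comb, which needs Python 3.8+, and cross-checks it against a brute-force filter on a small case.

```python
import math
from itertools import product


def count_weightings(total, n):
    """Number of n-tuples of non-negative ints summing to total (stars and bars)."""
    return math.comb(total + n - 1, n - 1)


def brute_force_count(total, n):
    """Same count by filtering all (total+1)**n tuples; only use for small inputs."""
    return sum(1 for t in product(range(total + 1), repeat=n) if sum(t) == total)


print(count_weightings(100, 3))  # 5151
print(count_weightings(100, 4))  # 176851
print(brute_force_count(10, 3) == count_weightings(10, 3))  # True
```

The closed form makes it easy to see in advance whether a given X and N will produce a manageable number of rows.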
You can look for ways to make the first part more concise with itertools.product combined with reduce or accumulate, but I think it's going to end up less readable, and you want to be able to extend to any arbitrary N, and also to get all permutations rather than just the sorted ones. So keep it understandable until you do that, and then look for ways to condense it after you're done.
You need to go through N steps, either with a loop or with recursion. I think this is easier to understand with recursion.
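When n is 1, the only combination is (x,), as long as x isn't below the current lower bound lo.
Otherwise, for each value a from lo up to x//n, you can have that value, together with all of the sorted combinations of n-1 numbers, each at least a, that sum to x-a. (a can't exceed x//n, because the n-1 numbers after it are all at least a.) So:
def sum_to_x(x, n, lo=0):
    # Yield non-decreasing n-tuples of integers >= lo that sum to x.
    if n == 1:
        if x >= lo:
            yield (x,)
        return
    # a is at most x // n, because the remaining n-1 values are all >= a.
    for a in range(lo, x // n + 1):
        for result in sum_to_x(x - a, n - 1, a):
            yield (a, *result)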
Now you just need to add in the permutations, and you're done:
import itertools
def perm_sum_to_x(x, n):
    for combi in sum_to_x(x, n):
        yield from itertools.permutations(combi)
But there's one problem: permutations permutes positions, not values. So if you have, say, (100, 0, 0), the six permutations of that are (100, 0, 0), (100, 0, 0), (0, 100, 0), (0, 0, 100), (0, 100, 0), (0, 0, 100).
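You can see this directly in the interpreter; only the standard library is needed:

```python
from itertools import permutations

perms = list(permutations((100, 0, 0)))
print(len(perms))          # 6: one per ordering of the three positions
print(len(set(perms)))     # 3: only three distinct value tuples
print(sorted(set(perms)))  # [(0, 0, 100), (0, 100, 0), (100, 0, 0)]
```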
If N is very small, as it is in your example with N=3 and X=100, it may be fine to just generate all 6 permutations of each combination and deduplicate them with a set:
import itertools
def perm_sum_to_x(x, n):
    for combi in sum_to_x(x, n):
        yield from set(itertools.permutations(combi))
… but if N can grow large, we're talking about a lot of wasted work there as well.
There are plenty of good answers here on how to do permutations without repeated values. See this question, for example. Borrowing an implementation from that answer:
def perm_sum_to_x(x, n):
    for combi in sum_to_x(x, n):
        yield from unique_permutations(combi)
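The implementation from the linked answer isn't reproduced here. As a stand-in, here is a minimal sketch of a unique_permutations generator (our own code, written only to match the name used above) that places one remaining value at a time, tracking how many copies of each value are left with collections.Counter, so equal values are never swapped with each other and no duplicates are produced.

```python
from collections import Counter


def unique_permutations(items):
    """Yield each distinct ordering of items exactly once (hypothetical helper)."""
    items = tuple(items)
    counts = Counter(items)  # remaining multiplicity of each value
    prefix = []

    def extend():
        if len(prefix) == len(items):
            yield tuple(prefix)
            return
        # Try each distinct value (not each position) for the next slot.
        for value in list(counts):
            if counts[value]:
                counts[value] -= 1
                prefix.append(value)
                yield from extend()
                prefix.pop()
                counts[value] += 1

    yield from extend()


print(list(unique_permutations((100, 0, 0))))
# [(100, 0, 0), (0, 100, 0), (0, 0, 100)]
```

Because it never generates the duplicates in the first place, the cost is proportional to the number of distinct orderings rather than to N!.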
Or, if we can drag in SymPy or more-itertools:
from sympy.utilities.iterables import multiset_permutations
def perm_sum_to_x(x, n):
    for combi in sum_to_x(x, n):
        yield from multiset_permutations(combi)  # yields lists, not tuples
import more_itertools
def perm_sum_to_x(x, n):
    for combi in sum_to_x(x, n):
        yield from more_itertools.distinct_permutations(combi)
What you are looking for is product from the itertools module.
You can use it as shown below:
from itertools import product
weights = list(range(101))
n = 3
lst_of_weights = [i for i in product(weights,repeat=n) if sum(i)==100]
What you need is combinations_with_replacement, because in your question you wrote 0, 0, 100, which means you expect repetition, like 20, 20, 60, etc.
from itertools import combinations_with_replacement
weights = range(11)
n = 3
result = [i for i in combinations_with_replacement(weights, n) if sum(i) == 10]
print(result)
The above code results in
[(0, 0, 10), (0, 1, 9), (0, 2, 8), (0, 3, 7), (0, 4, 6), (0, 5, 5), (1, 1, 8), (1, 2, 7), (1, 3, 6), (1, 4, 5), (2, 2, 6), (2, 3, 5), (2, 4, 4), (3, 3, 4)]
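Replace range(11), n and sum(i) == 10 with whatever you need. Note that each combination appears only once, in sorted order; for example, (10, 0, 0) and (0, 10, 0) are not listed separately from (0, 0, 10).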
This is a classic Stars and bars problem, and Python's itertools module does indeed provide a solution that's both simple and efficient, without any additional filtering needed.
Some explanation first: you want to divide 100 "points" between 3 stocks in all possible ways. For illustration purposes, let's reduce to 10 points instead of 100, with each one worth 10% instead of 1%. Imagine writing those points as a string of ten * characters:
**********
These are the "stars" of "stars and bars". Now to divide the ten stars amongst the 3 stocks, we insert two | divider characters (the "bars" of "stars and bars"). For example, one such division might look like this:
**|*******|*
This particular combination of stars and bars would correspond to the division 20% AAPL, 70% GOOGL, 10% IBM. Another division might look like:
******||****
which would correspond to 60% AAPL, 0% GOOGL, 40% IBM.
It's easy to convince yourself that every string consisting of ten * characters and two | characters corresponds to exactly one possible division of the ten points amongst the three stocks.
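The correspondence is easy to check in code. The two helpers below, encode and decode, are hypothetical names used only for illustration; with ten stars, each star stands for 10%.

```python
def decode(stars_and_bars):
    """Turn a string like '**|*******|*' into the number of stars per stock."""
    return tuple(len(piece) for piece in stars_and_bars.split("|"))


def encode(division):
    """Inverse of decode: (2, 7, 1) -> '**|*******|*'."""
    return "|".join("*" * count for count in division)


print(decode("**|*******|*"))  # (2, 7, 1): 20% AAPL, 70% GOOGL, 10% IBM
print(decode("******||****"))  # (6, 0, 4): 60% AAPL, 0% GOOGL, 40% IBM
print(encode((6, 0, 4)))       # ******||****
```

Since split("|") keeps empty pieces between adjacent bars, a 0% share round-trips correctly.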
So to solve your problem, all we need to do is generate all possible strings containing ten * star characters and two | bar characters. Or, to think of this another way, we want to find all possible pairs of positions that we can place the two bar characters, in a string of total length twelve. Python's itertools.combinations function can be used to give us those possible positions, (for example with itertools.combinations(range(12), 2)) and then it's simple to translate each pair of positions back to a division of range(10) into three pieces: insert an extra imaginary divider character at the start and end of the string, then find the number of stars between each pair of dividers. That number of stars is simply one less than the distance between the two dividers.
Here's the code:
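import itertools
def all_partitions(n, k):
    """
    Generate all partitions of range(n) into k pieces.
    """
    # c holds the positions of the k-1 bars among n+k-1 slots.
    for c in itertools.combinations(range(n+k-1), k-1):
        # Count the stars between consecutive bars, with imaginary bars at -1 and n+k-1.
        yield tuple(y-x-1 for x, y in zip((-1,) + c, c + (n+k-1,)))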
For the case you give in the question, you want all_partitions(100, 3). But that yields 5151 partitions, starting with (0, 0, 100) and ending with (100, 0, 0), so it's impractical to show the results here. Instead, here are the results in a smaller case:
>>> for partition in all_partitions(5, 3):
... print(partition)
...
(0, 0, 5)
(0, 1, 4)
(0, 2, 3)
(0, 3, 2)
(0, 4, 1)
(0, 5, 0)
(1, 0, 4)
(1, 1, 3)
(1, 2, 2)
(1, 3, 1)
(1, 4, 0)
(2, 0, 3)
(2, 1, 2)
(2, 2, 1)
(2, 3, 0)
(3, 0, 2)
(3, 1, 1)
(3, 2, 0)
(4, 0, 1)
(4, 1, 0)
(5, 0, 0)
Consider two list comprehensions, gamma and delta, with nearly identical code. The only difference is the list being sliced, alpha or beta:
gamma = [alpha[i:i+30] for i in range(0,49980,30)]
delta = [beta[i:i+30] for i in range(0,49980,30)]
Is there a pythonic way to write this as a one liner (say gamma,delta = ... )?
I have a few other pieces of code that are similar in nature, and I'd like to simplify the code's seeming redundancy.
Although one-line list comprehensions are really useful, they aren't always the best choice. Here, since you're doing the same chunking to both lists, if you wanted to change the chunking you would have to modify both lines.
Instead, we could use a function that chunks any given list, and then use a one-line assignment to build gamma and delta from alpha and beta.
def chunk(l):
    return [l[i:i+30] for i in range(0, len(l), 30)]
gamma, delta = chunk(alpha), chunk(beta)
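A small variation, assuming the chunk size may need to change: make the size a parameter and map the helper over the lists, so any number of lists can be chunked in one line. The sample data here is made up for illustration; like the chunk function above, this chunks the whole list rather than stopping at index 49980.

```python
def chunk(seq, size=30):
    """Split seq into consecutive slices of length size (the last may be shorter)."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


alpha = list(range(100))
beta = list(range(100, 200))

# map applies the same chunking to every list; unpacking pulls out the results.
gamma, delta = map(chunk, (alpha, beta))
print([len(c) for c in gamma])  # [30, 30, 30, 10]
```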
As for combining both list comprehensions into one expression, you can get gamma and delta by using zip with a single list comprehension:
gamma, delta = zip(*[(alpha[i:i+30], beta[i:i+30]) for i in range(0,50000,30)])
(Note that gamma and delta come out as tuples of slices rather than lists.)
A small example to show how zip works:
>>> list(zip(*[(i, i+1) for i in range(0, 10, 2)]))
[(0, 2, 4, 6, 8), (1, 3, 5, 7, 9)]
Here our list comprehension returns a list of tuples:
>>> [(i, i+1) for i in range(0, 10, 2)]
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
Then we unpack this list with *, and zip aggregates the elements from each of the tuples position by position:
>>> list(zip(*[(i, i+1) for i in range(0, 10, 2)]))
[(0, 2, 4, 6, 8), (1, 3, 5, 7, 9)]
As an alternative, for dividing the list into evenly sized chunks, please take a look at "How do you split a list into evenly sized chunks?"
Just another way...
gamma, delta = ([src[i:i+30] for i in range(0,49980,30)] for src in (alpha, beta))
It's a bit faster than the accepted zip solution:
genny 3.439506340350704
zippy 4.3039169818228515
Code:
from timeit import timeit
alpha = list(range(60000))
beta = list(range(60000))
def genny():
    gamma, delta = ([src[i:i+30] for i in range(0,49980,30)] for src in (alpha, beta))
def zippy():
    gamma, delta = zip(*[(alpha[i:i+30], beta[i:i+30]) for i in range(0,50000,30)])
n = 1000
print('genny', timeit(genny, number=n))
print('zippy', timeit(zippy, number=n))
You can use a lambda expression:
g = lambda l: [l[i:i+30] for i in range(0,50000, 30)]
gamma, delta = g(alpha), g(beta)
I have searched the web, which has provided various solutions on how to produce a matrix of random numbers whose sum is a constant. My problem is slightly different. I want to generate an N×4 matrix that is an exhaustive list of integer rows such that the sum of all numbers in each row is exactly 100, with integers in the range [0, 100]. I want the integers to increment sequentially as opposed to being random. How can I do it in Python?
Thank you.
product is a handy way of generating combinations
In [774]: from itertools import product
In [775]: [x for x in product(range(10),range(10)) if sum(x)==10]
Out[775]: [(1, 9), (2, 8), (3, 7), (4, 6), (5, 5), (6, 4), (7, 3), (8, 2), (9, 1)]
The tuples sum to 10, and step sequentially (in the first value at least).
I can generalize it to 3-tuples, and it still runs pretty fast.
In [778]: len([x for x in product(range(100),range(100),range(100)) if sum(x)==100])
Out[778]: 5148
Length-4 tuples take much longer (on an old machine):
In [780]: len([x for x in product(range(100),range(100),range(100),range(100)) if sum(x)==100])
Out[780]: 176847
So there's probably a case to be made for solving this incrementally.
[x for x in product(range(100),range(100),range(100)) if sum(x)<=100]
runs much faster, producing the same number of 3-tuples (within 1 or 2). And the 4th value can be derived from each x as 100-sum(x):
In [790]: timeit len([x+(100-sum(x),) for x in product(range(100),range(100),range(100)) if sum(x)<=100])
1 loops, best of 3: 444 ms per loop
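Taking the incremental idea one step further, here is a sketch (the function name is ours) that builds every row directly: each column ranges over whatever is left of the total, and the last column takes the remainder, so nothing is generated and then thrown away. Rows come out in lexicographically increasing order, which fits the "increment sequentially" requirement. Unlike the range(100) version above, values may reach 100, so rows like (0, 0, 0, 100) are included.

```python
def rows_summing_to(total, ncols):
    """Yield every ncols-tuple of non-negative ints summing to total, in lexicographic order."""
    if ncols == 1:
        yield (total,)
        return
    for first in range(total + 1):
        # The remaining columns must share what is left of the total.
        for rest in rows_summing_to(total - first, ncols - 1):
            yield (first, *rest)


matrix = list(rows_summing_to(100, 4))
print(len(matrix))  # 176851
print(matrix[:3])   # [(0, 0, 0, 100), (0, 0, 1, 99), (0, 0, 2, 98)]
```

If a NumPy array is needed, np.array(matrix) turns the list of rows into an N×4 array.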
import itertools
import random
def makerow(L, T, R):
    # make a row of size L and sum T, with the integers from 0-R, sorted in ascending order
    answer = []
    pool = list(itertools.takewhile(lambda x: x<T, range(R+1)))
    for i in range(L-1):
        answer.append(random.choice(pool))
        T -= answer[-1]
        pool = list(itertools.takewhile(lambda x: x<T, range(R+1)))
    answer.append(T)
    answer.sort()
    return answer
def makematrix(M, N, T, R):
    # make a matrix of M rows and N columns per row
    # each row adds up to T
    # using the numbers between 0-R
    return [makerow(N, T, R) for _ in range(M)]
Does anyone have some thoughts on elegant code and math using python to solve the following problem?
I have two lists of numbers:
A=[83.4,108,-240.2]
B=[10.3,96.7,-5.5,-20.4,30.9,2.1,-6.1,51.5,37.7,-25,-10.7,-250.4,-14.2,56.4,-11.5,163.9,-146.6,-2.6,7.9,-13.2]
I know that B can be divided into three lists of elements of B, such that the three lists together contain all of the elements in B but have no overlapping elements. The sums of these three lists equal the three elements in A.
I can do the brute force method, which is to create all possible combinations of the elements of B into three sets, but the number of possibilities blows up very quickly with the number of elements in B. I've also looked at the knapsack problem, but that seems to require only positive values.
This is indeed a variant of the subset sum problem:
In computer science, the subset sum problem is one of the important problems in complexity theory and cryptography. The problem is this: given a set (or multiset) of integers, is there a non-empty subset whose sum is zero? For example, given the set {−7, −3, −2, 5, 8}, the answer is yes because the subset {−3, −2, 5} sums to zero. The problem is NP-complete.
An equivalent problem is this: given a set of integers and an integer s, does any non-empty subset sum to s?
Proof that it's NP-complete:
The easiest way to prove that some new problem is NP-complete is first to prove that it is in NP, and then to reduce some known NP-complete problem to it.
It is in NP because it can be verified in polynomial time: given a potential solution, simply add up the numbers in the subsets and see if they correspond to the numbers in A. And you can reduce the subset sum problem to this problem in polynomial time: given set x and target sum s, let A = [s, sum(x) - s] and B = x.
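The verification step is short enough to write down. A minimal sketch, where the function name, the candidate format (one list per element of A) and the tolerance are our assumptions: check that the groups use exactly the elements of B, counting repeats, and that each group's sum matches the corresponding element of A. Apart from the Counter bookkeeping this is linear in len(B).

```python
import math
from collections import Counter


def is_valid_split(A, B, groups, tol=1e-9):
    """Check a proposed split of B into len(A) groups."""
    if len(groups) != len(A):
        return False
    # Every element of B must be used exactly once (repeated values counted).
    if Counter(x for g in groups for x in g) != Counter(B):
        return False
    # Values like 83.4 are not exact in binary floating point, so use a tolerance.
    return all(math.isclose(sum(g), a, abs_tol=tol) for g, a in zip(groups, A))


# Small made-up example: split [1, 2, 3, 4] into sums 3 and 7.
print(is_valid_split([3, 7], [1, 2, 3, 4], [[1, 2], [3, 4]]))  # True
print(is_valid_split([3, 7], [1, 2, 3, 4], [[3], [3, 4]]))     # False
```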
It being NP-complete, there is no way to solve this quickly in the general case, using Python or otherwise:
Although any given solution to an NP-complete problem can be verified quickly (in polynomial time), there is no known efficient way to locate a solution in the first place; indeed, the most notable characteristic of NP-complete problems is that no fast solution to them is known. That is, the time required to solve the problem using any currently known algorithm increases very quickly as the size of the problem grows. As a consequence, determining whether or not it is possible to solve these problems quickly, called the P versus NP problem, is one of the principal unsolved problems in computer science today.
As @Claudiu explained well, such problems are NP-complete, so you cannot solve them efficiently in the general case. But for a specific input you can brute-force it (not very efficiently) with the itertools module, as follows:
>>> from itertools import combinations, product
>>> length = len(B)
>>> subset_lengths = [(i, j, k) for i, j, k in product(range(1, length - 1), repeat=3) if i + j + k == length]
>>> indices = range(length)
>>> for i, j, k in subset_lengths:
...     for t in combinations(indices, i):
...         rest = [x for x in indices if x not in t]
...         for p in combinations(rest, j):
...             m = [x for x in rest if x not in p]  # the remaining k indices
...             groups = [[B[x] for x in g] for g in (t, p, m)]
...             if all(abs(sum(g) - a) < 1e-6 for g, a in zip(groups, A)):
...                 print(groups)
In this approach, you first need all the possible length triples whose sum is equal to the length of the main list. Because each sublist is matched with a specific element of A, order matters and lengths may repeat, so use product rather than combinations:
>>> subset_lengths = [(i, j, k) for i, j, k in product(range(1, length - 1), repeat=3) if i + j + k == length]
>>> len(subset_lengths)
171
Then, for each length triple, choose i indices of B for the first sublist, j of the remaining indices for the second, and give whatever is left to the third. Working with indices rather than values guarantees that the three sublists never overlap and together contain all of B, even if B has repeated values.
Finally, keep the splits whose sums equal the corresponding items of A. Because the values are floats, compare the sums with a small tolerance rather than with ==. For 20 elements this still means checking roughly 3**20 (about 3.5 billion) splits, so expect it to be very slow.
In the form f(x,y,z) where x is a given integer sum, y is the minimum length of the sequence, and z is the maximum length of the sequence. But for now let's pretend we're dealing with a sequence of a fixed length, because it will take me a long time to write the question otherwise.
So our function is f(x,r) where x is a given integer sum and r is the length of a sequence in the list of possible sequences.
For x = 10, and r = 2, these are the possible combinations:
1 + 9
2 + 8
3 + 7
4 + 6
5 + 5
Let's store that in Python as a list of pairs:
[(1,9), (2,8), (3,7), (4,6), (5,5)]
So usage looks like:
>>> f(10,2)
[(1,9), (2,8), (3,7), (4,6), (5,5)]
Back to the original question, where a sequence is returned for each length from y to z. In the form f(x,y,z) defined earlier, and leaving out sequences of length 1 (where y-z == 0), this would look like:
>>> f(10,1,3)
[{1: [(1,9), (2,8), (3,7), (4,6), (5,5)],
2: [(1,1,8), (1,2,7), (1,3,6) ... (2,4,4) ...],
3: [(1,1,1,7) ...]}]
So the output is a list of dictionaries where the value is a list of pairs. Not exactly optimal.
So my questions are:
Is there a library that handles this already?
If not, can someone help me write both of the functions I mentioned? (fixed sequence length first)?
Because of the huge gaps in my knowledge of fairly trivial math, could you ignore my approach to integer storage and use whatever structure makes the most sense?
Sorry about all of these arithmetic questions today. Thanks!
The itertools module will definitely be helpful, as we're dealing with permutations; however, this looks suspiciously like a homework task...
Edit: Looks like fun though, so I'll give it a try.
Edit 2: Is this what you want?
from itertools import combinations_with_replacement
from pprint import pprint
f = lambda target_sum, length: [sequence for sequence in combinations_with_replacement(range(1, target_sum+1), length) if sum(sequence) == target_sum]
def f2(target_sum, min_length, max_length):
    sequences = {}
    for length in range(min_length, max_length + 1):
        sequence = f(target_sum, length)
        if len(sequence):
            sequences[length] = sequence
    return sequences
if __name__ == "__main__":
    print("f(10,2):")
    print(f(10,2))
    print()
    print("f(10,1,3)")
    pprint(f2(10,1,3))
Output:
f(10,2):
[(1, 9), (2, 8), (3, 7), (4, 6), (5, 5)]
f(10,1,3)
{1: [(10,)],
2: [(1, 9), (2, 8), (3, 7), (4, 6), (5, 5)],
3: [(1, 1, 8),
(1, 2, 7),
(1, 3, 6),
(1, 4, 5),
(2, 2, 6),
(2, 3, 5),
(2, 4, 4),
(3, 3, 4)]}
The problem is known as Integer Partitions, and has been widely studied.
Here you can find a paper comparing the performance of several algorithms (and proposing a particular one), but there are a lot of references all over the Net.
I just wrote a recursive generator function; you should be able to figure out how to get a list out of it yourself...
def f(x,y):
    if y == 1:
        yield (x, )
    elif y > 1:
        for head in range(1, x-y+2):
            for tail in f(x-head, y-1):
                yield (head, *tail)
def f2(x,y,z):
    for u in range(y, z+1):
        yield from f(x, u)
EDIT: I just saw that it is not exactly what you wanted: my version also generates duplicates where only the ordering differs. But you can simply filter them out by sorting each result and checking for duplicate tuples.
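Here is one way to do that filtering (a sketch; distinct_sums is a name we made up): sort each tuple so that reorderings collapse to the same key, and keep only the first occurrence of each key. The generator f is repeated so the block runs on its own.

```python
def f(x, y):
    """Yield ordered y-tuples of positive ints summing to x (includes reorderings)."""
    if y == 1:
        yield (x,)
    elif y > 1:
        for head in range(1, x - y + 2):
            for tail in f(x - head, y - 1):
                yield (head, *tail)


def distinct_sums(x, y):
    """Like f, but yield each result only once, in sorted form."""
    seen = set()
    for seq in f(x, y):
        key = tuple(sorted(seq))
        if key not in seen:
            seen.add(key)
            yield key


print(list(distinct_sums(10, 2)))
# [(1, 9), (2, 8), (3, 7), (4, 6), (5, 5)]
```

This still walks every ordering internally, so it does more work than a generator that only produces sorted tuples, but it keeps the original function unchanged.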