Iterating over a tuple of tuples: optimizations - Python

Basic problem: take a list of digits, find all permutations, filter, filter again, and sum.
This is my first Python script, so after some research I decided to use itertools.permutations. I then iterate through the permutations and create a new tuple of tuples containing only the tuples I want. I then concatenate the digits of each tuple because I want the permutations as numbers, not as broken-up strings.
Then I do one more filter and sum them together.
For 8 digits this takes about 2.5 seconds, far too slow if I want to scale to 15 digits (my goal).
(I decided to use tuples since a list of the permutations would be too large for memory.)
EDIT: I realized that I don't care about the sum of the permutations, just the count. If I go the generator route, how could I include a counter instead of taking the sum?
I've updated my original code with some [very] slight improvements and shortcuts too, so as not to just copy-paste the suggested answers before I truly understand them.
import itertools
digits = [0,1,2,3,4,5,6,7]
digital = itertools.permutations(digits)
mytuple = ()
for i in digital:
    q = ''
    j = list(i)
    if j[0] != 0:
        for k in range(len(j)):
            q = q + str(j[k])
        mytuple = mytuple + (int(q),)  # store the number, not the string, so % 7 works
#print(mytuple)
z = [i for i in mytuple if i % 7 == 0]
print(len(z))
This being my first Python script, any non-optimization pointers would also be appreciated.
Thanks!

"Generator expressions" (sometimes called generator comprehensions) are your friend, not least because a generator only produces one element at a time, which saves memory. Some time can also be saved by pre-computing the relevant powers of 10 and doing integer arithmetic instead of converting to and from strings:
import itertools
digits = [0,1,2,3,4,5,6,7,8,9]
oom = [ 10 ** i for i, digit in enumerate( digits ) ][ ::-1 ] # orders of magnitude
allperm = itertools.permutations( digits )
firstpass = ( sum( a * b for a, b in zip( perm, oom ) ) for perm in allperm if perm[ 0 ] )
print( sum( i for i in firstpass if i % 7 == 0 ) )
This is faster than the original by a large factor, but the factorial nature of permutations means that 15 digits is still a long way away. I get 0.05s for len(digits)==8, 0.5s for len(digits)==9, but 9.3s for len(digits)==10...
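On the EDIT's question of counting instead of summing: a minimal sketch, assuming Python 3 and the same integer-arithmetic trick as above, keeps the generator pipeline and lets every match contribute 1. The function name count_multiples and its divisor parameter are illustrative, not from the original code.

```python
import itertools


def count_multiples(digits, divisor=7):
    """Count permutations of `digits` (no leading zero) divisible by `divisor`.

    Nothing is stored: each permutation becomes an integer and is tested lazily.
    """
    n = len(digits)
    # Place values, most significant first: [10**(n-1), ..., 10, 1].
    powers = [10 ** i for i in range(n - 1, -1, -1)]
    values = (
        sum(d * p for d, p in zip(perm, powers))
        for perm in itertools.permutations(digits)
        if perm[0] != 0
    )
    # Summing 1 per match yields a count instead of a total.
    return sum(1 for v in values if v % divisor == 0)


print(count_multiples([0, 1, 2, 3, 4, 5, 6, 7]))
```

This still visits every permutation, so it has the same factorial growth; it only swaps the total for a count.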
Since you're working in base 10, digit sequences of length >10 will contain repeats, leading to repeats in the set of permutations. Your strategy will need to change if the repeats are not supposed to be counted separately (e.g. if the question is phrased as "how many 15-digit multiples of 7 are permutations of the following digits...").
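One way to act on that, sketched here as an assumption rather than anything proposed in the answers: count the distinct arrangements directly with a memoized recursion over (how many of each digit value are still unused, current remainder mod 7). Equal digits are treated as a single choice per position, so repeated digits are not double-counted, and the number of states depends on the digit multiplicities rather than on 15!. The function name count_distinct_multiples is hypothetical.

```python
from collections import Counter
from functools import lru_cache


def count_distinct_multiples(digits, divisor=7):
    """Count distinct numbers that use every digit exactly once (no leading zero)
    and are divisible by `divisor`."""
    counts = Counter(digits)
    keys = sorted(counts)                   # distinct digit values
    start = tuple(counts[k] for k in keys)  # how many of each are available

    @lru_cache(maxsize=None)
    def ways(remaining, remainder, first):
        # remaining[i] = unused copies of digit keys[i]
        if not any(remaining):
            return 1 if remainder == 0 else 0
        total = 0
        for idx, digit in enumerate(keys):
            if remaining[idx] == 0:
                continue
            if first and digit == 0:
                continue  # a number may not start with 0
            nxt = list(remaining)
            nxt[idx] -= 1
            total += ways(tuple(nxt), (remainder * 10 + digit) % divisor, False)
        return total

    return ways(start, 0, True)


print(count_distinct_multiples([1, 1, 2]))  # 1 (only 112 = 7 * 16)
print(count_distinct_multiples([1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5]))
```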

Using itertools is a good choice. Well investigated.
I tried to improve @jez's nice solution. I rearranged the range, replaced zip with itertools.izip (Python 2) and cached the lookup in a variable.
import itertools
N = 10
gr = xrange(N-1,-1,-1)
ap = itertools.permutations(gr)
o = [10 ** i for i in gr]
zip = itertools.izip
print(sum(i for i in (sum(a*b for a, b in zip(p, o)) for p in ap if p[0]) if i % 7 == 0))
For me it's about 17% faster for N=9 and 7% faster for N=10. The improvement may be negligible for larger N, but I haven't tested that.

There are many shortcuts in Python you're missing. Try this:
import itertools
digits = [0,1,2,3,4,5,6,7]
digital = itertools.permutations(digits)
mytuple = set()
for i in digital:
    if i[0] != 0:
        mytuple.add(int(''.join(str(d) for d in i)))
z = [i for i in mytuple if i % 7 == 0]
print(sum(z))
It might be hard to get to 15 digits, though. 15! is about 1.3 trillion; even if you could process 10 million permutations per second, it would still take about 36 hours.
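The arithmetic behind that estimate, assuming the quoted throughput of 10 million permutations per second:

```python
import math

orderings = math.factorial(15)  # orderings of 15 positions
rate = 10_000_000               # assumed permutations processed per second
hours = orderings / rate / 3600
print(f"{orderings:,} permutations, about {hours:.1f} hours")
# 1,307,674,368,000 permutations, about 36.3 hours
```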

Related

How to make itertools combinations faster in Python?

I've tried a lot of things and still don't know why it isn't fast. How do I fix it?
It is a CodeWars 6 kyu task:
Given a set of elements (integers or string characters, characters only in RISC-V), where any element may occur more than once, return the number of subsets that do not contain a repeated element.
import itertools
def est_subsets(a):
    counter = 0
    a = list(set(a))
    p = itertools.chain.from_iterable(itertools.combinations(a, r) for r in range(1, len(a) + 1))
    for b in p:
        counter += 1
    return counter
itertools.combinations has to generate every value. But you could compute the number of values that would be generated directly, instead of generating them at all. Use math.comb (added in Python 3.8): with n the number of distinct elements, add up math.comb(n, r) for every subset size r from 1 to n and you'll get the same result in a tiny fraction of the time.
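A minimal sketch of that idea (Python 3.8+ for math.comb). Summing math.comb(n, r) over r = 1..n counts every non-empty subset of the n distinct elements, which by the binomial theorem is simply 2**n - 1; the second, illustrative function uses that closed form.

```python
import math


def est_subsets(a):
    """Count non-empty subsets of the distinct elements of `a` without generating them."""
    n = len(set(a))
    # One term per subset size r: exactly how many tuples combinations(a, r) would yield.
    return sum(math.comb(n, r) for r in range(1, n + 1))


def est_subsets_closed_form(a):
    """Same count via the binomial theorem: C(n, 0) + ... + C(n, n) == 2**n."""
    return 2 ** len(set(a)) - 1


print(est_subsets([1, 2, 3, 4]))                 # 15
print(est_subsets_closed_form(["a", "b", "b"]))  # 3
```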
Please take a look at the manual:
https://docs.python.org/3/library/itertools.html#itertools.combinations
The number of items returned is n! / r! / (n-r)! when 0 <= r <= n, or zero when r > n.
This means you can calculate the number of items it would return without generating them.
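As a small check of that formula, the sketch below computes n! / r! / (n-r)! with math.factorial (integer division is exact here) and compares it with the length of what itertools.combinations actually yields for small inputs. The helper name n_combinations is illustrative.

```python
import itertools
import math


def n_combinations(n, r):
    """Items itertools.combinations yields for n inputs and length r, per the docs' formula."""
    if 0 <= r <= n:
        return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
    return 0  # r > n: no combinations at all


# Cross-check the formula against real generation for small n and r.
checks = [
    n_combinations(n, r) == len(list(itertools.combinations(range(n), r)))
    for n in range(7)
    for r in range(9)
]
print(all(checks))  # True
```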

How do I create a data frame with every possible permutation of integers with certain constraints?

I have ten columns: n1, n2, n3, n4, n5, n6, n7, n8, n9, n10.
The values in a single row must add up to exactly 10, and all of the values must be non-negative integers less than or equal to 5.
I'd like to make a DataFrame with every possible permutation according to the constraints that I've just described. The order matters (i.e., [5,5,0,0,0,0,0,0,0,0] and [5,0,5,0,0,0,0,0,0,0] should both be separate rows).
Here's my attempt:
import itertools as it
import pandas as pd
permutations = [i for i in it.permutations(range(0,6), 10) if sum(i)==10]
df = pd.DataFrame(data=permutations,columns=['x1','x2','x3','x4','x5','x6','x7','x8','x9','x10'])
The problem is that there are zero rows in df: the list permutations is empty, and I don't see why. If I replace it.permutations with it.combinations_with_replacement, the resulting list has length 30. Why does it.permutations return nothing?
It's an easy fix!
Since order matters, you're actually looking for itertools.product (I know, it's a weird name). Here's the documentation: https://docs.python.org/3/library/itertools.html#itertools.product.
Solution:
import itertools as it
permutations = [i for i in it.product(range(6), repeat=10) if sum(i) == 10]
You can't get a permutation of 10 items from a list of 6 items. (Maybe "permutation" doesn't mean what you think it means.)
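To see the difference concretely: itertools.permutations(iterable, r) arranges r distinct positions of its input, so asking for 10 out of 6 yields nothing, whereas itertools.product with repeat lets every position reuse any value.

```python
import itertools

# r = 10 exceeds the 6 available items, so no arrangement exists.
too_long = list(itertools.permutations(range(6), 10))
print(len(too_long))  # 0

# With r <= 6 there are 6 * 5 * 4 = 120 arrangements of 3 distinct values...
print(len(list(itertools.permutations(range(6), 3))))  # 120
# ...but 6 ** 3 = 216 sequences once values may repeat.
print(len(list(itertools.product(range(6), repeat=3))))  # 216
```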
Here's one way to get what you want (though it's going to take a while to run):
import itertools as it
permutations = []
for p in [i for i in it.combinations_with_replacement(range(0,6), 10) if sum(i)==10]:
    permutations += [x for x in set(it.permutations(p))]
(Explanation: each p is one way of choosing 10 values, with repetition, that have the proper sum. Then we use permutations to find all the distinct ways to order those values.)
You chose the incorrect way to repeat things.
import itertools as it
import pandas as pd
permutations = [i for i in it.product(*it.repeat(range(6),10)) if sum(i)==10]
df = pd.DataFrame(data=permutations,columns=['x1','x2','x3','x4','x5','x6','x7','x8','x9','x10'])
This should give exactly 85228 rows.
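The 85228 figure can be confirmed without enumerating all 6**10 (about 60 million) tuples. A minimal sketch, assuming only the constraints stated in the question: a dynamic program that tracks, column by column, how many partial rows reach each running sum. The function name count_rows and its parameters are illustrative.

```python
def count_rows(n_cols=10, max_value=5, total=10):
    """Count length-n_cols sequences with entries in 0..max_value that sum to total,
    i.e. how many tuples the product-based filter keeps."""
    # ways[s] = number of partial rows (over the columns so far) whose sum is s
    ways = [1] + [0] * total
    for _ in range(n_cols):
        new = [0] * (total + 1)
        for s, w in enumerate(ways):
            if w:
                for v in range(max_value + 1):
                    if s + v <= total:
                        new[s + v] += w
        ways = new
    return ways[total]


print(count_rows())  # 85228
```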

Fastest way to find sub-lists of a fixed length, from a given list of values, whose elements sum equals a defined number

In Python 3.6, suppose that I have a list of numbers L, and that I want to find all possible sub-lists S of a given pre-chosen length |S|, such that:
any S has to have length smaller than L, that is |S| < |L|
any S can only contain numbers present in L
numbers in S do not have to be unique (they can appear repeatedly)
the sum of all numbers in S should be equal to a pre-determined number N
A trivial solution for this can be found using the Cartesian Product with itertools.product. For example, suppose L is a simple list of all integers between 1 and 10 (inclusive) and |S| is chosen to be 3. Then:
import itertools
L = range(1,11)
N = 8
Slength = 3
result = [list(seq) for seq in itertools.product(L, repeat=Slength) if sum(seq) == N]
However, as larger lists L and/or larger |S| are chosen, the above approach becomes extremely slow. In fact, even for L = range(1,101) with |S|=5 and N=80, the computer almost freezes and it takes approximately an hour to compute the result.
My take is that:
there is a lot of unnecessary computations going on there under the hood, given the condition that sub-lists should sum to N
there are a ton of cache misses due to iterating over possibly millions of lists generated by itertools.product, only to keep far fewer
So, my question/challenge is: is there a way to do this more computationally efficiently? Unless we are talking hundreds of gigabytes, speed matters more to me than memory, so the challenge focuses on speed, although memory efficiency is a welcome bonus.
So given an input list, a target length, and a target sum, you want all the sequences (with repetition) of numbers from the input list such that:
The sum equals the target sum
The length equals the target length
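The following code should be faster. It builds sequences one number at a time and stops extending any sequence whose sum has already reached the target, which is only valid because every input number is positive: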
# Input
input_list = range(1,101)
# Targets
target_sum = 15
target_length = 5
# Available numbers
numbers = set(input_list)
# Initialize the stack
stack = [[num] for num in numbers]
result = []
# Loop until we run out of permutations
while stack:
    # Get a permutation from the stack
    current = stack.pop()
    # If it's too short
    if len(current) < target_length:
        # And the sum is too small
        if sum(current) < target_sum:
            # Then for each available number
            for num in numbers:
                # Append said number and put the resulting permutation back into the stack
                stack.append(current + [num])
    # If it's not too short and the sum equals the target, add to the result!
    elif sum(current) == target_sum:
        result.append(current)
print(len(result))
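A further refinement, sketched here as an alternative rather than as the answer's method: since the number of remaining slots is known, a prefix can be pruned as soon as the leftover sum can no longer be reached using only the smallest or only the largest available value. Because it relies on those bounds rather than on positivity, this version also handles zero or negative inputs. The generator name sequences_with_sum is hypothetical.

```python
def sequences_with_sum(numbers, target_sum, target_length):
    """Yield every length-`target_length` list (repetition allowed) drawn from
    `numbers` whose elements sum to `target_sum`."""
    values = sorted(set(numbers))
    lo, hi = values[0], values[-1]

    def extend(prefix, remaining_sum, slots):
        if slots == 0:
            if remaining_sum == 0:
                yield list(prefix)
            return
        for v in values:
            rest = remaining_sum - v
            # The other slots - 1 values must each lie between lo and hi.
            if rest < lo * (slots - 1):
                break  # values is sorted, so a larger v only makes rest smaller
            if rest > hi * (slots - 1):
                continue  # rest too large even if every later value is hi
            prefix.append(v)
            yield from extend(prefix, rest, slots - 1)
            prefix.pop()

    yield from extend([], target_sum, target_length)


result = list(sequences_with_sum(range(1, 101), 15, 5))
print(len(result))  # 1001
```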

Python Lottery Number Generation

I am working on a lottery number generation program. I have a fixed list of allowed numbers (1-80) from which users can choose 6 numbers. Each number can only be picked once. I want to generate all possible combinations efficiently. Current implementation takes more than 30 seconds if allowed_numbers is [1,...,60]. Above that, it freezes my system.
from itertools import combinations
import numpy as np
LOT_SIZE = 6
allowed_numbers = np.arange(1, 61)
all_combinations = np.array(list(combinations(allowed_numbers, LOT_SIZE)))
print(len(all_combinations))
I think I would need a numpy array (not sure if 2D). Something like,
[[1,2,3,4,5,6],
[1,2,3,4,5,7],...]
because I want to (quickly) perform several operations on these combinations. These operations may include,
Removing combinations that have only even numbers
Removing combinations whose sum is greater than 150, etc.
Checking if there is only one pair of consecutive numbers (Acceptable: [1,2,4,6,8,10] {Pair: (1,2)}| Not-acceptable: [1,2,4,5,7,9] {Pairs: (1,2) and (4,5)} )
Any help will be appreciated.
Thanks
Some options:
1) apply filters on the iterable instead of on the data, using filter:
def filt(x):
    return sum(x) < 7
list(filter(filt, itertools.combinations(allowed, n)))
will save ~15% time vs. building the list with a comprehension that applies the filter, i.e.:
[i for i in itertools.combinations(allowed, n) if filt(i)]
2) Use np.fromiter
arr = np.fromiter(itertools.chain.from_iterable(itertools.combinations(allowed, n)), int).reshape(-1, n)
arr = arr[arr.sum(1) < 7]
3) Work on the generator object itself. In the example above, you can stop itertools.combinations as soon as the first number reaches 7 (as an example); this relies on allowed being sorted, so that combinations come out in lexicographic order:
def my_generator():
    for i in itertools.combinations(allowed, n):
        if i[0] >= 7:
            return
        elif sum(i) < 7:
            yield i
list(my_generator())  # builds about 3x faster than option 1
Note that np.fromiter becomes less efficient on compound expressions, so the mask is applied afterwards
You can use itertools.combinations(allowed_numbers, 6) to get all combinations of length 6 from your list (this is the fastest way to get this operation done).
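For scale, and as one hedged way to apply the asker's filters lazily: the sketch below prints how many 6-number combinations exist (math.comb, Python 3.8+), then defines a predicate for the three example rules and a generator that yields only the passing combinations without materializing the full list. The 150 threshold comes from the question; reading "only one pair of consecutive numbers" as exactly one pair is an assumption, and the names keep and filtered_combinations are illustrative.

```python
import itertools
import math

LOT_SIZE = 6

print(math.comb(60, LOT_SIZE))  # 50063860
print(math.comb(80, LOT_SIZE))  # 300500200


def keep(combo, max_sum=150):
    """Return True if a sorted combination passes the three example filters."""
    if all(x % 2 == 0 for x in combo):
        return False  # drop combinations made only of even numbers
    if sum(combo) > max_sum:
        return False  # drop combinations whose sum is too large
    consecutive_pairs = sum(1 for a, b in zip(combo, combo[1:]) if b - a == 1)
    return consecutive_pairs == 1  # assumption: exactly one consecutive pair


def filtered_combinations(allowed, size=LOT_SIZE):
    """Lazily yield only the combinations that pass keep(); nothing is stored."""
    return (c for c in itertools.combinations(sorted(allowed), size) if keep(c))
```

Iterating this over range(1, 61) still touches all 50,063,860 combinations in pure Python, so it mainly saves memory rather than time.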

Algorithm - Grouping List in unique pairs

I'm having difficulties with an assignment I've received, and I am pretty sure the problem's text is flawed. I've translated it to this:
Consider a list x[1..2n] with elements from {1,2,..,m}, m < n. Propose and implement in Python an algorithm with a complexity of O(n) that groups the elements into pairs (pairs of (x[i],x[j]) with i < j) such that every element is present in exactly one pair. For each set of pairs, calculate the maximum sum of the pairs, then compare it with the rest of the sets. Return the set that has the minimum of those.
For example, x = [1,5,9,3] can be paired in three ways:
(1,5),(9,3) => Sums: 6, 12 => Maximum 12
(1,9),(5,3) => Sums: 10, 8 => Maximum 10
(1,3),(5,9) => Sums: 4, 14 => Maximum 14
----------
Minimum 10
Solution to be returned: (1,9),(5,3)
The things that strike me oddly are as follows:
Table contents definition: it says that the elements x[1..2n] come from {1..m}, m < n. But if m < n, then there aren't enough elements to populate the list without duplicating some, which is not allowed. So then I would assume m >= 2n. Also, the example has n = 2 but uses elements that are greater than 1, so I assume that's what they meant.
O(n) complexity? So is there a way to combine them in a single loop? I can't think of anything.
My Calculations:
For n = 4:
Number of ways to combine: 6
Valid ways: 3
For n = 6
Number of ways to combine: 910
Valid ways: 15
For n = 8
Number of ways to combine: >30 000
Valid ways: ?
So obviously, I cannot use brute force and then check afterwards whether each grouping is valid. The formula I used to calculate the total possible ways is
C(C(n,2),n/2)
Question:
Is this problem wrongly written and impossible to solve? If so, what conditions should be added or removed to make it feasible? If you are going to suggest some code in python, remember I cannot use any prebuilt functions of any kind. Thank you
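For reference, the number of ways to split an even number k of items into unordered pairs is the product of the odd numbers below it, (k - 1) * (k - 3) * ... * 1. The sketch below uses only a basic loop, in the spirit of the no-prebuilt-functions constraint; it reproduces the valid-way counts of 3 and 15 from the calculations above and shows how fast the count grows. The function name pairings is illustrative.

```python
def pairings(num_elements):
    """Ways to split num_elements items (an even count) into unordered pairs:
    (num_elements - 1) * (num_elements - 3) * ... * 1."""
    count = 1
    for k in range(num_elements - 1, 0, -2):
        count *= k
    return count


for size in (4, 6, 8, 10):
    print(size, pairings(size))
# 4 3
# 6 15
# 8 105
# 10 945
```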
Assuming a sorted list:
def answer(L):
    return list(zip(L[:len(L)//2], L[len(L)//2:][::-1]))
Or if you want to do it more manually:
def answer(L):
    pairs = []
    for i in range(len(L)//2):
        pairs.append((L[i], L[len(L)-i-1]))
    return pairs
Output:
In [3]: answer([1,3,5,9])
Out[3]: [(1, 9), (3, 5)]
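The answer above assumes a sorted list, and a general comparison sort costs O(n log n). A hedged sketch of how the O(n) requirement could be met: if m < n is read as allowing repeated values from {1..m} (an interpretation, not something the assignment text confirms), a counting sort orders the 2n values in O(n + m) = O(n) time, after which the same smallest-with-largest pairing applies. The function name min_max_pairing and the explicit m argument are illustrative.

```python
def min_max_pairing(x, m):
    """Pair the values of x (each an integer in 1..m, even count), smallest with
    largest, after a counting sort; runs in O(len(x) + m) time."""
    # Counting sort: tally how often each value 1..m occurs.
    counts = [0] * (m + 1)
    for value in x:
        counts[value] += 1
    ordered = []
    for value in range(1, m + 1):
        for _ in range(counts[value]):
            ordered.append(value)
    # Pair smallest with largest, second smallest with second largest, and so on.
    pairs = []
    last = len(ordered) - 1
    for i in range(len(ordered) // 2):
        pairs.append((ordered[i], ordered[last - i]))
    return pairs


print(min_max_pairing([1, 5, 9, 3], 9))  # [(1, 9), (3, 5)]
```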
