How to make itertools combinations faster in python?

How to make itertools combinations faster in python? - python

I tried a lot of things and still don't know why it doesn't work fast. How to I fix it?
It is a CodeWars 6 kyu task:
Given a set of elements (integers or string characters, characters only in RISC-V), where any element may occur more than once, return the number of subsets that do not contain a repeated element.
import itertools
def est_subsets(a):
counter = 0
a = list(set(a))
p = itertools.chain.from_iterable(itertools.combinations(a, r)for r in range(1, len(a) + 1))
for b in p:
counter += 1
return counter

itertools.combinations needs to generate all the values. But you could just compute the number of values that would be generated directly, instead of generating them at all. Just use math.comb (added in 3.8), selecting the length of your input and you'll get the same results in a tiny fraction of the time.

Please take a look at the manual:
https://docs.python.org/3/library/itertools.html#itertools.combinations
The number of items returned is n! / r! / (n-r)! when 0 <= r <= n or zero when r > n.
Which means that you can calculate the number or items it should return.

Related

What is the math program I'm trying to solve in python?

I am trying to solve this math problem in python, and I'm not sure what it is called:
The answer X is always 100
Given a list of 5 integers, their sum would equal X
Each integer has to be between 1 and 25
The integers can appear one or more times in the list
I want to find all the possible unique lists of 5 integers that match.
These would match:
20,20,20,20,20
25,25,25,20,5
10,25,19,21,25
along with many more.
I looked at itertools.permutations, but I don't think that handles duplicate integers in the list. I'm thinking there must be a standard math algorithm for this, but my search queries must be poor.
Only other thing to mention is if it matters that the list size could change from 10 integers to some other length (6, 24, etc).

This is a constraint satisfaction problem. These can often be solved by a method called linear programming: You fix one part of the solution and then solve the remaining subproblem. In Python, we can implement this approach with a recursive function:
def csp_solutions(target_sum, n, i_min=1, i_max=25):
domain = range(i_min, i_max + 1)
if n == 1:
if target_sum in domain:
return [[target_sum]]
else:
return []
solutions = []
for i in domain:
# Check if a solution is still possible when i is picked:
if (n - 1) * i_min <= target_sum - i <= (n - 1) * i_max:
# Construct solutions recursively:
solutions.extend([[i] + sol
for sol in csp_solutions(target_sum - i, n - 1)])
return solutions
all_solutions = csp_solutions(100, 5)
This yields 23746 solutions, in agreement with the answer by Alex Reynolds.

Another approach with Numpy:
#!/usr/bin/env python
import numpy as np
start = 1
end = 25
entries = 5
total = 100
a = np.arange(start, end + 1)
c = np.array(np.meshgrid(a, a, a, a, a)).T.reshape(-1, entries)
assert(len(c) == pow(end, entries))
s = c.sum(axis=1)
#
# filter all combinations for those that meet sum criterion
#
valid_combinations = c[np.where(s == total)]
print(len(valid_combinations)) # 23746
#
# filter those combinations for unique permutations
#
unique_permutations = set(tuple(sorted(x)) for x in valid_combinations)
print(len(unique_permutations)) # 376

You want combinations_with_replacement from itertools library. Here is what the code would look like:
from itertools import combinations_with_replacement
values = [i for i in range(1, 26)]
candidates = []
for tuple5 in combinations_with_replacement(values, 5):
if sum(tuple5) == 100:
candidates.append(tuple5)
For me on this problem I get 376 candidates. As mentioned in the comments above if these are counted once for each arrangement of the 5-pair, then you'd want to look at all, permutations of the 5 candidates-which may not be all distinct. For example (20,20,20,20,20) is the same regardless of how you arrange the indices. However, (21,20,20,20,19) is not-this one has some distinct arrangements.

I think that this could be what you are searching for: given a target number SUM, a left treshold L, a right treshold R and a size K, find all the possible lists of K elements between L and R which sum gives SUM. There isn't a specific name for this problem though, as much as I was able to find.

generate unique string of length n without prefilled dictionary

I have an application that is kind of like a URL shortener and need to generate unique URL whenever a user requests.
For this I need a function to map an index/number to a unique string of length n with two requirements:
Two different numbers can not generate same string.
In other words as long as i,j<K: f(i) != f(j). K is the number of possible strings = 26^n. (26 is number of characters in English)
Two strings generated by number i and i+1 don't look similar most of the times. For example they are not abcdef1 and abcdef2. (So that users can not predict the pattern and the next IDs)
This is my current code in Python:
chars = "abcdefghijklmnopqrstuvwxyz"
for item in itertools.product(chars, repeat=n):
print("".join(item))
# For n = 7 generates:
# aaaaaaa
# aaaaaab
# aaaaaac
# ...
The problem with this code is there is no index that I can use to generate unique strings on demand by tracking that index. For example generate 1 million unique strings today and 2 million tomorrow without looping through or collision with the first 1 million.
The other problem with this code is that the strings that are created after each other look very similar and I need them to look random.
One option is to populate a table/dictionary with millions of strings, shuffle them and keep track of index to that table but it takes a lot of memory.
An option is also to check the database of existing IDs after generating a random string to make sure it doesn't exist but the problem is as I get closer to the K (26^n) the chance of collision increases and it wouldn't be efficient to make a lot of check_if_exist queries against the database.
Also if n was long enough I could use UUID with small chance of collision but in my case n is 7.

I'm going to outline a solution for you that is going to resist casual inspection even by a knowledgeable person, though it probably IS NOT cryptographically secure.
First, your strings and numbers are in a one-to-one map. Here is some simple code for that.
alphabet = 'abcdefghijklmnopqrstuvwxyz'
len_of_codes = 7
char_to_pos = {}
for i in range(len(alphabet)):
char_to_pos[alphabet[i]] = i
def number_to_string(n):
chars = []
for _ in range(len_of_codes):
chars.append(alphabet[n % len(alphabet)])
n = n // len(alphabet)
return "".join(reversed(chars))
def string_to_number(s):
n = 0
for c in s:
n = len(alphabet) * n + char_to_pos[c]
return n
So now your problem is how to take an ascending stream of numbers and get an apparently random stream of numbers out of it instead. (Because you know how to turn those into strings.) Well, there are lots of tricks for primes, so let's find a decent sized prime that fits in the range that you want.
def is_prime (n):
for i in range(2, n):
if 0 == n%i:
return False
elif n < i*i:
return True
if n == 2:
return True
else:
return False
def last_prime_before (n):
for m in range(n-1, 1, -1):
if is_prime(m):
return m
print(last_prime_before(len(alphabet)**len_of_codes)
With this we find that we can use the prime 8031810103. That's how many numbers we'll be able to handle.
Now there is an easy way to scramble them. Which is to use the fact that multiplication modulo a prime scrambles the numbers in the range 1..(p-1).
def scramble1 (p, k, n):
return (n*k) % p
Picking a random number to scramble by, int(random.random() * 26**7) happened to give me 3661807866, we get a sequence we can calculate with:
for i in range(1, 5):
print(number_to_string(scramble1(8031810103, 3661807866, i)))
Which gives us
lwfdjoc
xskgtce
jopkctb
vkunmhd
This looks random to casual inspection. But will be reversible for any knowledgeable someone who puts modest effort in. They just have to guess the prime and algorithm that we used, look at 2 consecutive values to get the hidden parameter, then look at a couple of more to verify it.
Before addressing that, let's figure out how to take a string and get the number back. Thanks to Fermat's little theorem we know for p prime and 1 <= k < p that (k * k^(p-2)) % p == 1.
def n_pow_m_mod_k (n, m, k):
answer = 1
while 0 < m:
if 1 == m % 2:
answer = (answer * n) % k
m = m // 2
n = (n * n) % k
return answer
print(n_pow_m_mod_k(3661807866, 8031810103-2, 8031810103))
This gives us 3319920713. Armed with that we can calculate scramble1(8031810103, 3319920713, string_to_number("vkunmhd")) to find out that vkunmhd came from 4.
Now let's make it harder. Let's generate several keys to be scrambling with:
import random
p = 26**7
for i in range(5):
p = last_prime_before(p)
print((p, int(random.random() * p)))
When I ran this I happened to get:
(8031810103, 3661807866)
(8031810097, 3163265427)
(8031810091, 7069619503)
(8031809963, 6528177934)
(8031809917, 991731572)
Now let's scramble through several layers, working from smallest prime to largest (this requires reversing the sequence):
def encode (n):
for p, k in [
(8031809917, 991731572)
, (8031809963, 6528177934)
, (8031810091, 7069619503)
, (8031810097, 3163265427)
, (8031810103, 3661807866)
]:
n = scramble1(p, k, n)
return number_to_string(n)
This will give a sequence:
ehidzxf
shsifyl
gicmmcm
ofaroeg
And to reverse it just use the same trick that reversed the first scramble (reversing the primes so I am unscrambling in the order that I started with):
def decode (s):
n = string_to_number(s)
for p, k in [
(8031810103, 3319920713)
, (8031810097, 4707272543)
, (8031810091, 5077139687)
, (8031809963, 192273749)
, (8031809917, 5986071506)
]:
n = scramble1(p, k, n)
return n
TO BE CLEAR I do NOT promise that this is cryptographically secure. I'm not a cryptographer, and I'm aware enough of my limitations that I know not to trust it.
But I do promise that you'll have a sequence of over 8 billion strings that you are able to encode/decode with no obvious patterns.
Now take this code, scramble the alphabet, regenerate the magic numbers that I used, and choose a different number of layers to go through. I promise you that I personally have absolutely no idea how someone would even approach the problem of figuring out the algorithm. (But then again I'm not a cryptographer. Maybe they have some techniques to try. I sure don't.)

How about :
from random import Random
n = 7
def f(i):
myrandom = Random()
myrandom.seed(i)
alphabet = "123456789"
return "".join([myrandom.choice(alphabet) for _ in range(7)])
# same entry, same output
assert f(0) == "7715987"
assert f(0) == "7715987"
assert f(0) == "7715987"
# different entry, different output
assert f(1) == "3252888"
(change the alphabet to match your need)
This "emulate" a UUID, since you said you could accept a small chance of collision. If you want to avoid collision, what you really need is a perfect hash function (https://en.wikipedia.org/wiki/Perfect_hash_function).

you can try something based on the sha1 hash
#!/usr/bin/python3
import hashlib
def generate_link(i):
n = 7
a = "abcdefghijklmnopqrstuvwxyz01234567890"
return "".join(a[x%36] for x in hashlib.sha1(str(i).encode('ascii')).digest()[-n:])

This is a really simple example of what I outlined in this comment. It just offsets the number based on i. If you want "different" strings, don't use this, because if num is 0, then you will get abcdefg (with n = 7).
alphabet = "abcdefghijklmnopqrstuvwxyz"
# num is the num to convert, i is the "offset"
def num_to_char(num, i):
return alphabet[(num + i) % 26]
# generate the link
def generate_link(num, n):
return "".join([num_to_char(num, i) for i in range(n)])
generate_link(0, 7) # "abcdefg"
generate_link(0, 7) # still "abcdefg"
generate_link(0, 7) # again, "abcdefg"!
generate_link(1, 7) # now it's "bcdefgh"!
You would just need to change the num + i to some complicated and obscure math equation.

Fastest way to find sub-lists of a fixed length, from a given list of values, whose elements sum equals a defined number

In Python 3.6, suppose that I have a list of numbers L, and that I want to find all possible sub-lists S of a given pre-chosen length |S|, such that:
any S has to have length smaller than L, that is |S| < |L|
any S can only contain numbers present in L
numbers in S do not have to be unique (they can appear repeatedly)
the sum of all numbers in S should be equal to a pre-determined number N
A trivial solution for this can be found using the Cartesian Product with itertools.product. For example, suppose L is a simple list of all integers between 1 and 10 (inclusive) and |S| is chosen to be 3. Then:
import itertools
L = range(1,11)
N = 8
Slength = 3
result = [list(seq) for seq in itertools.product(L, repeat=Slength) if sum(seq) == N]
However, as larger lists L are chosen, and or larger |S|, the above approach becomes extremely slow. In fact, even for L = range(1,101) with |S|=5 and N=80, the computer almost freezes and it takes approximately an hour to compute the result.
My take is that:
there is a lot of unnecessary computations going on there under the hood, given the condition that sub-lists should sum to N
there is a ton of cache misses due to iterating over possibly millions of lists generated by itertools.product to just keep much much fewer
So, my question/challenge is: is there a way I can do this in a more computationally efficient way? Unless we are talking hundreds of Gigabytes, speed to me is more critical than memory, so the challenge focuses more on speed, even if considerations for memory efficiency are a welcome bonus.

So given an input list and a target length and sum, you want all the permutations of the numbers in the input list such that:
The sum equals the target sum
The length equals the target length
The following code should be faster:
# Input
input_list = range(1,101)
# Targets
target_sum = 15
target_length = 5
# Available numbers
numbers = set(input_list)
# Initialize the stack
stack = [[num] for num in numbers]
result = []
# Loop until we run out of permutations
while stack:
# Get a permutation from the stack
current = stack.pop()
# If it's too short
if len(current) < target_length:
# And the sum is too small
if sum(current) < target_sum:
# Then for each available number
for num in numbers:
# Append said number and put the resulting permutation back into the stack
stack.append(current + [num])
# If it's not too short and the sum equals the target, add to the result!
elif sum(current) == target_sum:
result.append(current)
print(len(result))

I need help finding a smart solution to shorten the time this code runs

2 days ago i started practicing python 2.7 on Codewars.com and i came across a really interesting problem, the only thing is i think it's a bit too much for my level of python knowledge. I actually did solve it in the end but the site doesn't accept my solution because it takes too much time to complete when you call it with large numbers, so here is the code:
from itertools import permutations
def next_bigger(n):
digz =list(str(n))
nums =permutations(digz, len(digz))
nums2 = []
for i in nums:
z =''
for b in range(0,len(i)):
z += i[b]
nums2.append(int(z))
nums2 = list(set(nums2))
nums2.sort()
try:
return nums2[nums2.index(n)+1]
except:
return -1
"You have to create a function that takes a positive integer number and returns the next bigger number formed by the same digits" - These were the original instructions
Also, at one point i decided to forgo the whole permutations idea, and in the middle of this second attempt i realized that there's no way it would work:
def next_bigger(n):
for i in range (1,11):
c1 = n % (10**i) / (10**(i-1))
c2 = n % (10**(i+1)) / (10**i)
if c1 > c2:
return ((n /(10**(i+1)))*10**(i+1)) + c1 *(10**i) + c2*(10**(i-1)) + n % (10**(max((i-1),0)))
break
if anybody has any ideas, i'm all-ears and if you hate my code, please do tell, because i really want to get better at this.

stolen from http://www.geeksforgeeks.org/find-next-greater-number-set-digits/
Following are few observations about the next greater number.
1) If all digits sorted in descending order, then output is always “Not Possible”. For example, 4321.
2) If all digits are sorted in ascending
order, then we need to swap last two digits. For example, 1234.
3) For
other cases, we need to process the number from rightmost side (why?
because we need to find the smallest of all greater numbers)
You can now try developing an algorithm yourself.
Following is the algorithm for finding the next greater number.
I)
Traverse the given number from rightmost digit, keep traversing till
you find a digit which is smaller than the previously traversed digit.
For example, if the input number is “534976”, we stop at 4 because 4
is smaller than next digit 9. If we do not find such a digit, then
output is “Not Possible”.
II) Now search the right side of above found digit ‘d’ for the
smallest digit greater than ‘d’. For “534976″, the right side of 4
contains “976”. The smallest digit greater than 4 is 6.
III) Swap the above found two digits, we get 536974 in above example.
IV) Now sort all digits from position next to ‘d’ to the end of
number. The number that we get after sorting is the output. For above
example, we sort digits in bold 536974. We get “536479” which is the
next greater number for input 534976.

"formed by the same digits" - there's a clue that you have to break the number into digits: n = list(str(n))
"next bigger". The fact that they want the very next item means that you want to make the least change. Focus on changing the 1s digit. If that doesn't work, try the 10's digit, then the 100's, etc. The smallest change you can make is to exchange two furthest digits to the right that will increase the value of the integer. I.e. exchange the two right-most digits in which the more right-most is bigger.
def next_bigger(n):
n = list(str(n))
for i in range(len(n)-1, -1, -1):
for j in range(i-1, -1, -1):
if n[i] > n[j]:
n[i], n[j] = n[j], n[i]
return int("".join(n))
print next_bigger(123)
Oops. This fails for next_bigger(1675). I'll leave the buggy code here for a while, for whatever it is worth.

How about this? See in-line comments for explanations. Note that the way this is set up, you don't end up with any significant memory use (we're not storing any lists).
from itertools import permutations
#!/usr/bin/python3
def next_bigger(n):
# set next_bigger to an arbitrarily large value to start: see the for-loop
next_bigger = float('inf')
# this returns a generator for all the integers that are permutations of n
# we want a generator because when the potential number of permutations is
# large, we don't want to store all of them in memory.
perms = map(lambda x: int(''.join(x)), permutations(str(n)))
for p in perms:
if (p > n) and (p <= next_bigger):
# we can find the next-largest permutation by going through all the
# permutations, selecting the ones that are larger than n, and then
# selecting the smallest from them.
next_bigger = p
return next_bigger
Note that this is still a brute-force algorithm, even if implemented for speed. Here is an example result:
time python3 next_bigger.py 3838998888
3839888889
real 0m2.475s
user 0m2.476s
sys 0m0.000s
If your code needs to be faster yet, then you'll need a smarter, non-brute-force algorithm.

You don't need to look at all the permutations. Take a look at the two permutations of the last two digits. If you have an integer greater than your integer, that's it. If not, take a look at the permutations of the last three digits, etc.
from itertools import permutations
def next_bigger(number):
check = 2
found = False
digits = list(str(number))
if sorted(digits, reverse=True) == digits:
raise ValueError("No larger number")
while not found:
options = permutations(digits[-1*check:], check)
candidates = list()
for option in options:
new = digits.copy()[:-1*check]
new.extend(option)
candidate = int(''.join(new))
if candidate > number:
candidates.append(candidate)
if candidates:
result = sorted(candidates)[0]
found = True
return result
check += 1

compute mean in python for a generator

I'm doing some statistics work, I have a (large) collection of random numbers to compute the mean of, I'd like to work with generators, because I just need to compute the mean, so I don't need to store the numbers.
The problem is that numpy.mean breaks if you pass it a generator. I can write a simple function to do what I want, but I'm wondering if there's a proper, built-in way to do this?
It would be nice if I could say "sum(values)/len(values)", but len doesn't work for genetators, and sum already consumed values.
here's an example:
import numpy
def my_mean(values):
n = 0
Sum = 0.0
try:
while True:
Sum += next(values)
n += 1
except StopIteration: pass
return float(Sum)/n
X = [k for k in range(1,7)]
Y = (k for k in range(1,7))
print numpy.mean(X)
print my_mean(Y)
these both give the same, correct, answer, buy my_mean doesn't work for lists, and numpy.mean doesn't work for generators.
I really like the idea of working with generators, but details like this seem to spoil things.

In general if you're doing a streaming mean calculation of floating point numbers, you're probably better off using a more numerically stable algorithm than simply summing the generator and dividing by the length.
The simplest of these (that I know) is usually credited to Knuth, and also calculates variance. The link contains a python implementation, but just the mean portion is copied here for completeness.
def mean(data):
n = 0
mean = 0.0
for x in data:
n += 1
mean += (x - mean)/n
if n < 1:
return float('nan')
else:
return mean
I know this question is super old, but it's still the first hit on google, so it seemed appropriate to post. I'm still sad that the python standard library doesn't contain this simple piece of code.

Just one simple change to your code would let you use both. Generators were meant to be used interchangeably to lists in a for-loop.
def my_mean(values):
n = 0
Sum = 0.0
for v in values:
Sum += v
n += 1
return Sum / n

def my_mean(values):
total = 0
for n, v in enumerate(values, 1):
total += v
return total / n
print my_mean(X)
print my_mean(Y)
There is statistics.mean() in Python 3.4 but it calls list() on the input:
def mean(data):
if iter(data) is data:
data = list(data)
n = len(data)
if n < 1:
raise StatisticsError('mean requires at least one data point')
return _sum(data)/n
where _sum() returns an accurate sum (math.fsum()-like function that in addition to float also supports Fraction, Decimal).

The old-fashioned way to do it:
def my_mean(values):
sum, n = 0, 0
for x in values:
sum += x
n += 1
return float(sum)/n

One way would be
numpy.fromiter(Y, int).mean()
but this actually temporarily stores the numbers.

Your approach is a good one, but you should instead use the for x in y idiom instead of repeatedly calling next until you get a StopIteration. This works for both lists and generators:
def my_mean(values):
n = 0
Sum = 0.0
for value in values:
Sum += value
n += 1
return float(Sum)/n

You can use reduce without knowing the size of the array:
from itertools import izip, count
reduce(lambda c,i: (c*(i[1]-1) + float(i[0]))/i[1], izip(values,count(1)),0)

def my_mean(values):
n = 0
sum = 0
for v in values:
sum += v
n += 1
return sum/n
The above is very similar to your code, except by using for to iterate values you are good no matter if you get a list or an iterator.
The python sum method is however very optimized, so unless the list is really, really long, you might be more happy temporarily storing the data.
(Also notice that since you are using python3, you don't need float(sum)/n)

If you know the length of the generator in advance and you want to avoid storing the full list in memory, you can use:
reduce(np.add, generator)/length

Try:
import itertools
def mean(i):
(i1, i2) = itertools.tee(i, 2)
return sum(i1) / sum(1 for _ in i2)
print mean([1,2,3,4,5])
tee will duplicate your iterator for any iterable i (e.g. a generator, a list, etc.), allowing you to use one duplicate for summing and the other for counting.
(Note that 'tee' will still use intermediate storage).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.