How can I obtain all combinations of my list? - python

I am attempting to create an array with all the possible combinations of two numbers.
My array is [0, 17.1].
I wish to obtain all the possible combinations of these two values in a list 48 elements long, where both values can be repeated.
from itertools import combinations_with_replacement
array = [0, 17.1]
combo_wr = combinations_with_replacement(array, 48)
print(len(list(combo_wr)))
I have attempted to use itertools.combinations_with_replacement to create something like combo_wr = combinations_with_replacement(array, 48).
When I print the length of this I would expect a much larger number, but I am only getting 49 combinations of these numbers. Where am I going wrong, or what other functions would work better to get all the possible combinations? Order does not matter in this instance.
Below is what I have tried so far for reproducibility
>>> from itertools import combinations_with_replacement
>>> array = [0, 17.1]
>>> combo_wr = combinations_with_replacement(array, 48)
>>> print(len(list(combo_wr)))
49

A sequence of 48 numbers, each chosen from 2 different options, gives a search space of 2^48, which is about 281.5 trillion.
An added constraint that the sum of the numbers should be larger than 250 means, with [0, 17.1], that at least 15 of the elements must be 17.1, so you reduce your search space by 48 choose 15, which is about 1 trillion, i.e. not enough to make much of a difference.
If you set the first (or last) 15 elements to 17.1, that would reduce the search space to choosing the rest of the elements, so 2^(48-15) = 2^33, which is about 8.6 billion, but I'm not sure that is the constraint you actually want, or whether that is still small enough to be useful.
So code that produces the results you asked for is not likely to help you.
But if you still wanted help generating those trillions of combinations, to clarify the different options available to you:
itertools.product gives every possible sequence of heads and tails
itertools.combinations gives the unordered subsets of a given length
itertools.permutations gives all ways of reordering the given sequence, or orderings of all subsets of a given length
itertools.combinations_with_replacement gives one result per distinct multiset, i.e. per count of each option; for a 2-element input this is like asking "after n coin flips, which outcomes are distinguished only by the number of heads"
permutations and combinations don't make sense with len(array) == 2 and r = 48, since they are about subsets, and product will produce far more redundancy than you want.
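To make those differences concrete, here is a small check using a shorter length so that product is cheap to enumerate (the length 10 is only for illustration; the counts follow from 2^r ordered sequences versus r + 1 distinct counts of 17.1):
from itertools import product, combinations_with_replacement

array = [0, 17.1]
r = 10  # smaller than 48 so product can actually be enumerated

# product: every ordered sequence -> 2**10 = 1024 results
print(len(list(product(array, repeat=r))))

# combinations_with_replacement: one result per count of 17.1 -> r + 1 = 11 results
print(len(list(combinations_with_replacement(array, r))))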
"Order does not matter in this instance."
If that is the case, then it is possible you are just expecting more combos than there actually are.
"I wish to get all of them, but is it possible to narrow down to those which would satisfy, say, a summed value >= 250?"
OK, so then you can get every unique value for the sum of elements with combinations_with_replacement, then do permutations on that:
from itertools import combinations_with_replacement, permutations

array = [0, 17.1]
reps = 48
lower_bound = 250
upper_bound = float("inf")  # you might have an upper bound; if not, remove it from the condition below or leave it as inf

for combo in combinations_with_replacement(array, reps):
    if lower_bound <= sum(combo) <= upper_bound:
        # this combo's 'number of elements that are 17.1' meets the criteria for further investigation
        for perm in permutations(combo):
            do_thing(perm)
Although this still ends up visiting a ton of duplicate entries, since permutations of a sequence with many repeated elements will swap equal elements and produce the same sequences, so we can do better.
First, the combinations_with_replacement is really only communicating how many of each element we are dealing with, so we can just do for k in range(reps + 1) to get that info, and then we want every sequence that has exactly k copies of the second element in array, which happens to be equivalent to choosing k indices to set to that value.
So we can use combinations(range(reps), k) to get a set of indices to set to the second element, and this, I believe, is the smallest set of possible sequences you would have to check to meet the "sum is greater than 250" requirement.
from itertools import combinations

array = [0, 17.1]
reps = 48

def isSummationValidCombo(summation):
    return summation >= 250

for k in range(reps + 1):  # k = number of elements set to array[1], inclusive so the all-17.1 case is not missed
    summation = array[1] * k + array[0] * (reps - k)
    if not isSummationValidCombo(summation):
        continue
    for indices_of_sequence_to_set_to_second_element in combinations(range(reps), k):
        # each combination of k indices to set to the higher value
        seq = [array[0]] * reps
        for idx in indices_of_sequence_to_set_to_second_element:
            seq[idx] = array[1]
        do_thing(seq)
This still leaves your number of combinations at roughly 280 trillion, compared to the 281 trillion that would be hit by product, so you will probably need to figure out other techniques to reduce the search space.
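You can confirm that count without enumerating anything; for example (assuming Python 3.8+ for math.comb):
from math import comb

reps = 48
# number of sequences with at least 15 elements equal to 17.1 (i.e. sum >= 250)
total = sum(comb(reps, k) for k in range(15, reps + 1))
print(total)   # ~2.8e14, still far too many to enumerate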

Related

Fast Python algorithm for random partitioning with subset sums equal or close to given ratios

This question is an extension of my previous question: Fast python algorithm to find all possible partitions from a list of numbers that has subset sums equal to a ratio. I want to divide a list of numbers so that the ratios of subset sums equal given values. The difference is that now I have a long list of 200 numbers, so an enumeration is infeasible. Note that although there are of course equal numbers in the list, every number is distinguishable.
import random
lst = [random.randrange(10) for _ in range(200)]
In this case, I want a function to stochastically sample a certain number of partitions with subset sums equal or close to the given ratios. This means that the solution can be sub-optimal, but I need the algorithm to be fast. I guess a greedy algorithm will do. With that being said, of course it would be even better if there is a relatively fast algorithm that can give the optimal solution.
For example, I want to sample 100 partitions, all with subset sum ratios of 4 : 3 : 3. Duplicate partitions are allowed but should be very unlikely for such a long list. The function should be used like this:
partitions = func(numbers=lst, ratios=[4, 3, 3], num_gen=100)
To test the solution, you can do something like:
from math import isclose
eps = 0.05
assert all([isclose(ratios[i] / sum(ratios), sum(x) / sum(lst), abs_tol=eps)
for part in partitions for i, x in enumerate(part)])
Any suggestions?
You can use a greedy heuristic where you generate each partition from one of num_gen random permutations of the list. Each random permutation is partitioned into len(ratios) contiguous sublists. The fact that the partition subsets are sublists of a permutation makes enforcing the ratio condition very easy to do during sublist generation: as soon as the sum of the sublist we are currently building reaches one of the ratios, we "complete" the sublist, add it to the partition and start creating a new sublist. We can do this in one pass through the entire permutation, giving us the following algorithm of time complexity O(num_gen * len(lst)).
# lst and ratios as in the question, e.g. lst = [random.randrange(10) for _ in range(200)] and ratios = [4, 3, 3]
M = 100   # num_gen
N = len(lst)
P = len(ratios)
R = sum(ratios)
S = sum(lst)

for _ in range(M):
    # get a new random permutation
    random.shuffle(lst)
    partition = []
    # starting index (in the permutation) of the current sublist
    lo = 0
    # permutation partial sum
    s = 0
    # index of sublist we are currently generating (i.e. what ratio we are on)
    j = 0
    # ratio partial sum
    rs = ratios[j]
    for i in range(N):
        s += lst[i]
        # if ratio of permutation partial sum exceeds ratio of ratio partial sum,
        # the current sublist is "complete"
        if s / S >= rs / R:
            partition.append(lst[lo:i + 1])
            # start creating new sublist from next element
            lo = i + 1
            j += 1
            if j == P:
                # done with partition
                # remaining elements will always all be zeroes
                # (i.e. assert should never fail)
                assert all(x == 0 for x in lst[i+1:])
                partition[-1].extend(lst[i+1:])
                break
            rs += ratios[j]
Note that the outer loop can be redesigned to loop indefinitely until num_gen good partitions are generated (rather than just looping num_gen times) for more robustness. This algorithm is expected to produce M good partitions in O(M) iterations (provided random.shuffle is sufficiently random) if the number of good partitions is not too small compared to the total number of partitions of the same size, so it should perform well for most inputs. For an (almost) uniformly random list like [random.randrange(10) for _ in range(200)], every iteration produces a good partition with eps = 0.05, as is evident by running the example below. Of course, how well the algorithm performs will also depend on the definition of 'good': the stricter the closeness requirement (in other words, the smaller the epsilon), the more iterations it will take to find a good partition. This implementation can be found here, and will work for any input (assuming random.shuffle eventually produces all permutations of the input list).
You can find a runnable version of the code (with asserts to test how "good" the partitions are) here.
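If you want the exact interface the question asks for, the same one-pass logic can be wrapped into a function; the names func, numbers, ratios and num_gen below come from the question's desired call, and this is only a sketch of the heuristic above, not a different algorithm:
import random

def func(numbers, ratios, num_gen):
    # greedy one-pass partitioning of random permutations, as described above
    S, R, P = sum(numbers), sum(ratios), len(ratios)
    lst = list(numbers)
    partitions = []
    for _ in range(num_gen):
        random.shuffle(lst)
        partition, lo, s, j, rs = [], 0, 0, 0, ratios[0]
        for i, x in enumerate(lst):
            s += x
            if s / S >= rs / R:
                partition.append(lst[lo:i + 1])
                lo, j = i + 1, j + 1
                if j == P:
                    partition[-1].extend(lst[i + 1:])
                    break
                rs += ratios[j]
        partitions.append(partition)
    return partitions

# e.g., with lst as generated in the question:
# partitions = func(numbers=lst, ratios=[4, 3, 3], num_gen=100)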

Fastest way to find sub-lists of a fixed length, from a given list of values, whose elements sum equals a defined number

In Python 3.6, suppose that I have a list of numbers L, and that I want to find all possible sub-lists S of a given pre-chosen length |S|, such that:
any S has to have length smaller than L, that is |S| < |L|
any S can only contain numbers present in L
numbers in S do not have to be unique (they can appear repeatedly)
the sum of all numbers in S should be equal to a pre-determined number N
A trivial solution for this can be found using the Cartesian Product with itertools.product. For example, suppose L is a simple list of all integers between 1 and 10 (inclusive) and |S| is chosen to be 3. Then:
import itertools
L = range(1,11)
N = 8
Slength = 3
result = [list(seq) for seq in itertools.product(L, repeat=Slength) if sum(seq) == N]
However, as larger lists L are chosen, and/or larger |S|, the above approach becomes extremely slow. In fact, even for L = range(1,101) with |S| = 5 and N = 80, the computer almost freezes and it takes approximately an hour to compute the result.
My take is that:
there are a lot of unnecessary computations going on there under the hood, given the condition that sub-lists should sum to N
there are a ton of cache misses due to iterating over possibly millions of lists generated by itertools.product just to keep far fewer
So, my question/challenge is: is there a way I can do this in a more computationally efficient way? Unless we are talking hundreds of Gigabytes, speed to me is more critical than memory, so the challenge focuses more on speed, even if considerations for memory efficiency are a welcome bonus.
So given an input list and a target length and sum, you want all the permutations of the numbers in the input list such that:
The sum equals the target sum
The length equals the target length
The following code should be faster:
# Input
input_list = range(1, 101)

# Targets
target_sum = 15
target_length = 5

# Available numbers
numbers = set(input_list)

# Initialize the stack
stack = [[num] for num in numbers]
result = []

# Loop until we run out of permutations
while stack:
    # Get a permutation from the stack
    current = stack.pop()
    # If it's too short
    if len(current) < target_length:
        # And the sum is too small
        if sum(current) < target_sum:
            # Then for each available number
            for num in numbers:
                # Append said number and put the resulting permutation back into the stack
                stack.append(current + [num])
    # If it's not too short and the sum equals the target, add to the result!
    elif sum(current) == target_sum:
        result.append(current)

print(len(result))
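As a quick sanity check on the question's small example, the stack-based search and the itertools.product filter should agree; this snippet just restates the two approaches side by side:
import itertools

L, N, Slength = range(1, 11), 8, 3

# brute force via the Cartesian product
brute = {seq for seq in itertools.product(L, repeat=Slength) if sum(seq) == N}

# stack-based search with pruning on the partial sum
numbers = set(L)
stack = [[num] for num in numbers]
fast = set()
while stack:
    current = stack.pop()
    if len(current) < Slength:
        if sum(current) < N:
            for num in numbers:
                stack.append(current + [num])
    elif sum(current) == N:
        fast.add(tuple(current))

assert brute == fast   # both methods find exactly the same sequences
print(len(fast))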

Python Lottery Number Generation

I am working on a lottery number generation program. I have a fixed list of allowed numbers (1-80) from which users can choose 6 numbers. Each number can only be picked once. I want to generate all possible combinations efficiently. Current implementation takes more than 30 seconds if allowed_numbers is [1,...,60]. Above that, it freezes my system.
from itertools import combinations
import numpy as np
LOT_SIZE = 6
allowed_numbers = np.arange(1, 61)
all_combinations = np.array(list(combinations(allowed_numbers, LOT_SIZE)))
print(len(all_combinations))
I think I would need a numpy array (not sure if 2D). Something like,
[[1,2,3,4,5,6],
 [1,2,3,4,5,7],...]
because I want to (quickly) perform several operations on these combinations. These operations may include,
Removing combinations that have only even numbers
Removing combinations whose sum is greater than 150, etc.
Checking if there is only one pair of consecutive numbers (Acceptable: [1,2,4,6,8,10] {Pair: (1,2)}| Not-acceptable: [1,2,4,5,7,9] {Pairs: (1,2) and (4,5)} )
Any help will be appreciated.
Thanks
Some options:
1) Apply filters on the iterable instead of on the data, using filter:
def filt(x):
    return sum(x) < 7

list(filter(filt, itertools.combinations(allowed, n)))
will save ~15% time vs. constructing the list and applying the filters afterwards, i.e.:
[i for i in itertools.combinations(allowed, n) if filt(i)]
2) Use np.fromiter:
arr = np.fromiter(itertools.chain.from_iterable(itertools.combinations(allowed, n)), int).reshape(-1, n)
arr = arr[arr.sum(1) < 7]
3) Work on the generator object itself. In the example above, you can stop the itertools.combinations when the first number is above 7 (as an example):
def my_generator():
    for i in itertools.combinations(allowed, n):
        if i[0] >= 7:
            return
        elif sum(i) < 7:
            yield i

list(my_generator())  # will build ~3x faster than option 1
Note that np.fromiter becomes less efficient on compound expressions, so the mask is applied afterwards; a fuller vectorized sketch of the filters from the question follows below.
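As a concrete illustration of the array route, the specific filters from the question (drop all-even combinations, drop combinations summing above 150, keep those with exactly one consecutive pair) can be expressed as boolean masks on the 2-D array. This is only a sketch: the reduced range below is an assumption so the example stays small, and the consecutive-pair mask follows the interpretation in the question's example:
import itertools
import numpy as np

LOT_SIZE = 6
allowed_numbers = range(1, 31)   # reduced from 1-80 so the example stays small

arr = np.fromiter(
    itertools.chain.from_iterable(itertools.combinations(allowed_numbers, LOT_SIZE)),
    int,
).reshape(-1, LOT_SIZE)

not_all_even = (arr % 2 == 1).any(axis=1)                        # drop combinations with only even numbers
sum_ok = arr.sum(axis=1) <= 150                                  # drop combinations whose sum is greater than 150
one_consec_pair = (np.diff(arr, axis=1) == 1).sum(axis=1) == 1   # exactly one pair of consecutive numbers

filtered = arr[not_all_even & sum_ok & one_consec_pair]
print(len(filtered))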
You can use itertools.combinations(allowed_numbers, 6) to get all combinations of length 6 from your list (this is the fastest way to get this operation done).

Random number generator that returns only one number each time

Does Python have a random number generator that returns only one random integer each time the next() function is called? Numbers should not repeat, and the generator should return unique random integers in the interval [1, 1 000 000].
I need to generate more than a million different numbers, and that sounds very memory consuming if all the numbers are generated at the same time and stored in a list.
You are looking for a linear congruential generator with a full period. This will allow you to get a pseudo-random sequence of non-repeating numbers in your target number range.
Implementing a LCG is actually very simple, and looks like this:
def lcg(a, c, m, seed=None):
    num = seed or 0
    while True:
        num = (a * num + c) % m
        yield num
Then, it just comes down to choosing the correct values for a, c, and m to guarantee that the LCG will generate a full period (which is the only guarantee that you get non-repeating numbers). As the Wikipedia article explains, the following three conditions need to be true:
m and c need to be relatively prime.
a - 1 is divisible by all prime factors of m
a - 1 is divisible by 4, if m is also divisible by 4.
The first one is very easily guaranteed by simply choosing a prime for c. Also, this is the value that can be chosen last, and this will ultimately allow us to mix up the sequence a bit.
The relationship between a - 1 and m is more complicated though. In a full-period LCG, m is the length of the period; or in other words, it is the size of the number range your numbers come from. So this is what you usually choose first. In your case, you want m to be around 1000000. Choosing exactly your maximum number might be difficult since that restricts you a lot (in both your choice of a and also c), so you can also choose a larger number and simply skip all values outside of your range later.
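If you do pick an m larger than your target range, the skipping can be a thin wrapper around the lcg generator above; a small sketch, assuming you only want values in [lo, hi]:
def lcg_in_range(a, c, m, lo, hi, seed=None):
    # pass values from the full-period LCG through, dropping anything outside [lo, hi]
    # (like lcg itself this never terminates on its own; take only as many values as you need)
    for num in lcg(a, c, m, seed):
        if lo <= num <= hi:
            yield num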
Let’s choose m = 1000000 now though. The prime factors of m are 2 and 5. And it’s also obviously divisible by 4. So for a - 1, we need a number that is a multiple of 2 * 2 * 5 to satisfy the conditions 2 and 3. Let’s choose a - 1 = 160, so a = 161.
For c, we are using a random prime that’s somewhere in between of our range: c = 506903
Putting that into our LCG gives us our desired sequence. We can choose any seed value from the range (0 <= seed <= m) as the starting point of our sequence.
So let’s try it out and verify that what we thought of actually works. For this purpose, we are just collecting all numbers from the generator in a set until we hit a duplicate. At that point, we should have m = 1000000 numbers in the set:
>>> g = lcg(161, 506903, 1000000)
>>> numbers = set()
>>> for n in g:
        if n in numbers:
            raise Exception('Number {} already encountered before!'.format(n))
        numbers.add(n)
Traceback (most recent call last):
  File "<pyshell#5>", line 3, in <module>
    raise Exception('Number {} already encountered before!'.format(n))
Exception: Number 506903 already encountered before!
>>> len(numbers)
1000000
And it’s correct! So we did create a pseudo-random sequence of numbers that allowed us to get non-repeating numbers from our range m. Of course, by design, this sequence will be always the same, so it is only random once when you choose those numbers. You can switch up the values for a and c to get different sequences though, as long as you maintain the properties mentioned above.
The big benefit of this approach is of course that you do not need to store all the previously generated numbers. It is a constant space algorithm as it only needs to remember the initial configuration and the previously generated value.
It will also not deteriorate as you get further into the sequence. This is a general problem with solutions that just keep generating a random number until a new one is found that hasn't been encountered before. This is because the longer the list of generated numbers gets, the less likely you are to hit a number that's not in that list with an evenly distributed random algorithm. So getting the 1000000th number will likely take a long time with memory-based random generators.
But of course, an algorithm this simple, which just performs some multiplication and some addition, does not appear very random. Keep in mind, though, that deterministic recurrences like this are the basis of classic pseudo-random number generators (Python's random.random() uses the more elaborate Mersenne Twister rather than an LCG, but the principle of a deterministic recurrence with a very large period is the same); the state space there is just so much larger that you don't notice it.
If you really care about the memory you could use a NumPy array (or a Python array).
A one-million-element NumPy array of int32 (more than enough to hold integers between 0 and 1 000 000) will only consume ~4MB; Python itself would require ~36MB (roughly 28 bytes per integer and 8 bytes for each list element, plus overallocation) for an identical list:
>>> # NumPy array
>>> import numpy as np
>>> np.arange(1000000, dtype=np.int32).nbytes
4000000
>>> # Python list
>>> import sys
>>> import random
>>> l = list(range(1000000))
>>> random.shuffle(l)
>>> size = sys.getsizeof(l) # size of the list
>>> size += sum(sys.getsizeof(item) for item in l) # size of the list elements
>>> size
37000108
You only want unique values and you have a consecutive range (1 million requested items and 1 million different numbers), so you could simply shuffle the range and then yield items from your shuffled array:
def generate_random_integer():
    arr = np.arange(1000000, dtype=np.int32)
    np.random.shuffle(arr)
    yield from arr
    # yield from is equivalent to:
    # for item in arr:
    #     yield item
And it can be called using next:
>>> gen = generate_random_integer()
>>> next(gen)
443727
However, that would throw away the performance benefit of using NumPy, so if you want to use NumPy, don't bother with the generator and just perform the operations (vectorized, where possible) on the array. It consumes much less memory than a Python list and can be orders of magnitude faster (factors of 10-100 are not uncommon!).
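For example, rather than calling next() a million times, you can take whole batches by slicing the shuffled array; a trivial sketch of that vectorized style:
import numpy as np

arr = np.arange(1, 1000001, dtype=np.int32)   # the values 1 .. 1000000
np.random.shuffle(arr)

first_batch = arr[:1000]        # 1000 unique random numbers at once
second_batch = arr[1000:2000]   # the next 1000, still with no repeats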
For a large number of non-repeating random numbers, use encryption. With a given key, encrypt the numbers 0, 1, 2, 3, ... Since encryption is uniquely reversible, each encrypted number is guaranteed to be unique, provided you use the same key. For 64-bit numbers use DES. For 128-bit numbers use AES. For other sizes use some Format Preserving Encryption. For plain number ranges you might find the Hasty Pudding cipher useful, as it allows a large range of different bit sizes, and non-bit sizes as well, like [0..5999999].
Keep track of the key and the last number you encrypted. When you need a new unique random number just encrypt the next number you haven't used so far.
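To make the idea concrete without pulling in a cipher library, here is a toy sketch that encrypts a counter with a small hand-rolled Feistel permutation over a 20-bit block (2^20 > 1 000 000). It is purely illustrative and not cryptographically secure; for real use you would substitute DES/AES or a format-preserving cipher as described above:
import hashlib

def feistel_permute(x, key, bits=20, rounds=4):
    # toy balanced Feistel network: a bijection on [0, 2**bits)
    half = bits // 2
    mask = (1 << half) - 1
    left, right = x >> half, x & mask
    for r in range(rounds):
        digest = hashlib.sha256(f"{key}:{r}:{right}".encode()).digest()
        f = int.from_bytes(digest[:4], "big") & mask
        left, right = right, left ^ f
    return (left << half) | right

def unique_numbers(n, key):
    # encrypt the counter 0, 1, 2, ...; skip outputs that fall outside [1, n]
    counter, produced = 0, 0
    while produced < n:
        candidate = feistel_permute(counter, key) + 1
        counter += 1
        if candidate <= n:
            produced += 1
            yield candidate

gen = unique_numbers(1000000, key="my secret")
print(next(gen), next(gen))   # two distinct numbers from [1, 1000000]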
Considering your numbers should fit in a 64-bit integer, one million of them stored in a list would be up to 64 megabytes plus the list object overhead; if your computer can afford that, the easiest way is to use shuffle:
import random
randInts = list(range(1000000))
random.shuffle(randInts)
print(randInts)
Note that the other method is to keep track of the previously generated numbers, which will get you to the point of having all of them stored too.
I just needed that function, and to my huge surprise I hadn't found anything that would suit my needs. #poke's answer didn't satisfy me because I needed precise bounds, and the other answers that build full lists consumed too much memory.
Initially, I needed a function that would generate numbers from a to b, where b - a could be anything from 0 to 2^32 - 1, which means the range of those numbers could be as large as the maximal 32-bit unsigned integer.
The idea of my own algorithm is simple both to understand and to implement. It's a binary tree, where the next branch is chosen by a 50/50 boolean generator. Basically, we divide all numbers from a to b into two branches, then decide from which one we yield the next value, then do that recursively until we end up with single nodes, which are also picked at random.
The depth of recursion is O(log2(b - a)), which implies that for a given stack limit of 256, your highest range would be about 2^256, which is impressive.
Things to note:
a must be less than or equal to b, otherwise nothing will be yielded.
Boundaries are included, meaning unique_random_generator(0, 3) will generate [0, 1, 2, 3].
TL;DR - here's the code
import math, random

# a, b - inclusive
def unique_random_generator(a, b):
    # corner case on wrong input
    if a > b:
        return

    # end node of the tree
    if a == b:
        yield a
        return

    # middle point of tree division
    c = math.floor((a + b) / 2)
    generator_left = unique_random_generator(a, c)        # left branch - contains all the numbers between 'a' and 'c'
    generator_right = unique_random_generator(c + 1, b)   # right branch - contains all the numbers between 'c + 1' and 'b'

    has_values = True
    while has_values:
        # decide whether we pick up a value from the left branch, or the right
        decision = bool(random.getrandbits(1))
        if decision:
            next_left = next(generator_left, None)
            # if left branch is empty, check the right one
            if next_left is None:
                next_right = next(generator_right, None)
                # if both are empty, this level of the recursion is exhausted
                if next_right is None:
                    has_values = False
                else:
                    yield next_right
            else:
                yield next_left
                next_right = next(generator_right, None)
                if next_right is not None:
                    yield next_right
        else:
            next_right = next(generator_right, None)
            # if right branch is empty, check the left one
            if next_right is None:
                next_left = next(generator_left, None)
                # if both are empty, this level of the recursion is exhausted
                if next_left is None:
                    has_values = False
                else:
                    yield next_left
            else:
                yield next_right
                next_left = next(generator_left, None)
                if next_left is not None:
                    yield next_left
Usage:
for i in unique_random_generator(0, 2**32):
    print(i)
import random

# number of random entries
x = 1000
# the set of all values generated so far
y = set()
while x > 0:
    a = random.randint(0, 10**10)
    if a not in y:
        y.add(a)
        x -= 1
This way you are sure you have perfectly random unique values; x represents the number of values you want.
You can easily make one yourself:
from random import random

def randgen():
    while True:
        yield random()

ran = randgen()
next(ran)
next(ran)
...

Algorithm - Grouping List in unique pairs

I'm having difficulties with an assignment I've received, and I am pretty sure the problem's text is flawed. I've translated it to this:
Consider a list x[1..2n] with elements from {1,2,..,m}, m < n. Propose and implement in Python an algorithm with a complexity of O(n) that groups the elements into pairs (pairs (x[i], x[j]) with i < j) such that every element is present in exactly one pair. For each set of pairs, calculate the maximum of the pair sums, then compare it with the rest of the sets. Return the set that has the minimum of those maximums.
For example, x = [1,5,9,3] can be paired in three ways:
(1,5),(9,3) => Sums: 6, 12 => Maximum 12
(1,9),(5,3) => Sums: 10, 8 => Maximum 10
(1,3),(5,9) => Sums: 4, 14 => Maximum 14
----------
Minimum 10
Solution to be returned: (1,9),(5,3)
The things that strike me oddly are as follows:
List contents definition: it says the list x[1..2n] has elements from {1..m}, m < n. But if m < n, then there aren't enough distinct values to populate the list without duplicating some, which is not allowed. So I would assume m >= 2n. Also, the example has n = 2 but uses elements that are greater than 1, so I assume that's what they meant.
O(n) complexity? So is there a way to combine them in a single loop? I can't think of anything.
My Calculations:
For n = 4:
Number of ways to combine: 6
Valid ways: 3
For n = 6:
Number of ways to combine: 910
Valid ways: 15
For n = 8:
Number of ways to combine: >30 000
Valid ways: ?
So obviously, I cannot use brute force and then figure out if it is valid after then. The formula I used to calculate the total possible ways is
C(C(n,2),n/2)
Question:
Is this problem wrongly written and impossible to solve? If so, what conditions should be added or removed to make it feasible? If you are going to suggest some code in python, remember I cannot use any prebuilt functions of any kind. Thank you
Assuming a sorted list:
def answer(L):
return list(zip(L[:len(L)//2], L[len(L)//2:][::-1]))
Or if you want to do it more manually:
def answer(L):
    answer = []
    for i in range(len(L)//2):
        answer.append((L[i], L[len(L)-i-1]))
    return answer
Output:
In [3]: answer([1,3,5,9])
Out[3]: [(1, 9), (3, 5)]
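If the O(n) requirement is taken literally, the only non-linear step above is the sort. Since the elements are restricted to {1..m}, one possible reading is to sort with a counting sort in O(n + m) before pairing; a sketch under that assumption:
def counting_sort(x, m):
    # O(n + m) sort for values restricted to 1..m
    counts = [0] * (m + 1)
    for v in x:
        counts[v] += 1
    out = []
    for value in range(1, m + 1):
        out.extend([value] * counts[value])
    return out

def answer(L):   # same pairing as above
    return list(zip(L[:len(L)//2], L[len(L)//2:][::-1]))

x = [1, 5, 9, 3]
print(answer(counting_sort(x, m=9)))   # [(1, 9), (3, 5)]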
