Algorithm - Grouping List in unique pairs - python

I'm having difficulties with an assignment I've received, and I am pretty sure the problem's text is flawed. I've translated it to this:
Consider a list x[1..2n] with elements from {1,2,...,m}, m < n. Propose and implement in Python an O(n) algorithm that groups the elements into pairs (pairs (x[i], x[j]) with i < j) such that every element is present in exactly one pair. For each possible set of pairs, calculate the maximum of the pair sums, then compare it with those of the other sets. Return the set whose maximum is the minimum.
For example, x = [1,5,9,3] can be paired in three ways:
(1,5),(9,3) => Sums: 6, 12 => Maximum 12
(1,9),(5,3) => Sums: 10, 8 => Maximum 10
(1,3),(5,9) => Sums: 4, 14 => Maximum 14
----------
Minimum 10
Solution to be returned: (1,9),(5,3)
The things that strike me oddly are as follows:
List contents definition: it says the list has 2n elements taken from {1,...,m}, m < n. But if m < n, then there aren't enough distinct values to populate the list without duplicating some, which is not allowed. So I would assume m >= 2n. Also, the example has n = 2 but uses elements that are greater than 1, so I assume that's what they meant.
O(n) complexity? So is there a way to combine them in a single loop? I can't think of anything.
My Calculations:
For n = 4:
Number of ways to combine: 6
Valid ways: 3
For n = 6
Number of ways to combine: 910
Valid ways: 15
For n = 8
Number of ways to combine: >30 000
Valid ways: ?
So obviously, I cannot use brute force and then check validity afterwards. The formula I used to calculate the total number of possible ways is
C(C(n,2),n/2)
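As a sanity check on the "valid ways" counts: the number of ways to pair up 2k elements is (2k-1)!!, i.e. 3, 15, and 105 for lists of length 4, 6, and 8. A small brute-force enumerator (written only to verify the counts, not as the required solution) confirms this:
def pairings(elems):
    # recursively enumerate every way to split elems into unordered pairs
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for i, partner in enumerate(rest):
        for sub in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + sub

for length in (4, 6, 8):
    print(length, sum(1 for _ in pairings(list(range(length)))))  # 3, 15, 105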
Question:
Is this problem wrongly written and impossible to solve? If so, what conditions should be added or removed to make it feasible? If you are going to suggest some Python code, remember I cannot use any prebuilt functions of any kind. Thank you.

Assuming a sorted list:
def answer(L):
    return list(zip(L[:len(L)//2], L[len(L)//2:][::-1]))
Or if you want to do it more manually:
def answer(L):
    answer = []
    for i in range(len(L)//2):
        answer.append((L[i], L[len(L)-i-1]))
    return answer
Output:
In [3]: answer([1,3,5,9])
Out[3]: [(1, 9), (3, 5)]
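If the input isn't guaranteed to be sorted, a sort can be added up front; a minimal sketch of the full pipeline (the sort makes it O(n log n) overall, while the pairing pass itself stays O(n)):
def min_max_pairing(L):
    # pair the smallest remaining element with the largest,
    # which keeps the maximum pair sum as small as possible
    s = sorted(L)
    return [(s[i], s[len(s) - 1 - i]) for i in range(len(s) // 2)]

print(min_max_pairing([1, 5, 9, 3]))  # [(1, 9), (3, 5)]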


What is the math problem I'm trying to solve in python?

I am trying to solve this math problem in python, and I'm not sure what it is called:
The answer X is always 100
Given a list of 5 integers, their sum would equal X
Each integer has to be between 1 and 25
The integers can appear one or more times in the list
I want to find all the possible unique lists of 5 integers that match.
These would match:
20,20,20,20,20
25,25,25,20,5
10,25,19,21,25
along with many more.
I looked at itertools.permutations, but I don't think that handles duplicate integers in the list. I'm thinking there must be a standard math algorithm for this, but my search queries must be poor.
Only other thing to mention: it may matter that the list size could change from 5 integers to some other length (6, 24, etc.).
This is a constraint satisfaction problem. These can often be solved by recursive backtracking: you fix one part of the solution and then solve the remaining subproblem. In Python, we can implement this approach with a recursive function:
def csp_solutions(target_sum, n, i_min=1, i_max=25):
    domain = range(i_min, i_max + 1)
    if n == 1:
        if target_sum in domain:
            return [[target_sum]]
        else:
            return []
    solutions = []
    for i in domain:
        # Check if a solution is still possible when i is picked:
        if (n - 1) * i_min <= target_sum - i <= (n - 1) * i_max:
            # Construct solutions recursively:
            solutions.extend([[i] + sol
                              for sol in csp_solutions(target_sum - i, n - 1)])
    return solutions

all_solutions = csp_solutions(100, 5)
This yields 23746 solutions, in agreement with the answer by Alex Reynolds.
Another approach with Numpy:
#!/usr/bin/env python
import numpy as np
start = 1
end = 25
entries = 5
total = 100
a = np.arange(start, end + 1)
c = np.array(np.meshgrid(a, a, a, a, a)).T.reshape(-1, entries)
assert(len(c) == pow(end, entries))
s = c.sum(axis=1)
#
# filter all combinations for those that meet sum criterion
#
valid_combinations = c[np.where(s == total)]
print(len(valid_combinations)) # 23746
#
# filter those combinations for unique permutations
#
unique_permutations = set(tuple(sorted(x)) for x in valid_combinations)
print(len(unique_permutations)) # 376
You want combinations_with_replacement from the itertools library. Here is what the code would look like:
from itertools import combinations_with_replacement

values = [i for i in range(1, 26)]
candidates = []
for tuple5 in combinations_with_replacement(values, 5):
    if sum(tuple5) == 100:
        candidates.append(tuple5)
For me this gives 376 candidates. As mentioned in the comments above, if these are counted once for each arrangement of the 5 values, then you'd want to look at all permutations of each candidate, which may not all be distinct. For example, (20,20,20,20,20) is the same regardless of how you arrange the indices, whereas (21,20,20,20,19) has several distinct arrangements.
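If you do want the per-arrangement count, the number of distinct orderings of one candidate tuple can be computed directly from its value multiplicities via the multinomial coefficient; a sketch (summing this over all 376 candidates should reproduce the 23746 ordered solutions):
from math import factorial
from collections import Counter

def distinct_arrangements(t):
    # multinomial coefficient: len(t)! divided by the factorial
    # of each repeated value's multiplicity
    n = factorial(len(t))
    for count in Counter(t).values():
        n //= factorial(count)
    return n

print(distinct_arrangements((20, 20, 20, 20, 20)))  # 1
print(distinct_arrangements((21, 20, 20, 20, 19)))  # 20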
I think this could be what you are searching for: given a target number SUM, a lower threshold L, an upper threshold R and a size K, find all the possible lists of K elements between L and R whose sum is SUM. There isn't a specific name for this problem, though, as far as I was able to find.

How can I obtain all combinations of my list?

I am attempting to create an array with all the possible combinations of two numbers.
My array is [0, 17.1]
I wish to obtain all the possible combinations of these two values in a list 48 elements long, where both values can be repeated.
from itertools import combinations_with_replacement
array = [0, 17.1]
combo_wr = combinations_with_replacement(array, 48)
print(len(list(combo_wr)))
I have attempted to make use of itertools.combinations_with_replacement to create something which looks like the following -> combo_wr = combinations_with_replacement(array, 48).
When I print the length of this I would expect a much larger number, but I am only getting 49 combinations of these numbers. Where am I going wrong, or what other functions would work better to get all the possible combinations? Order does not matter in this instance.
Below is what I have tried so far for reproducibility
>>> from itertools import combinations_with_replacement
>>> array = [0, 17.1]
>>> combo_wr = combinations_with_replacement(array, 48)
>>> print(len(list(combo_wr)))
49
A sequence of 48 numbers, each chosen from 2 different options, gives a search space of 2^48, which is 281.4 trillion.
An added constraint that the sum of the numbers should be larger than 250 means, with [0, 17.1], that at least 15 of the elements must be 17.1; that only reduces your search space by roughly 48 choose 15, about 1 trillion, i.e. not enough to make much of a difference.
If you set the first (or last) 15 elements to 17.1, then it would reduce the search space to choosing the rest of the elements, so 2^(48-15) = 2^33, which is about 8.6 billion; but I'm not sure that is the constraint you actually want, or whether that is still small enough to be useful.
So code that produces the results you asked for is not likely to help you.
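Those counts are easy to sanity-check; a quick sketch (math.comb requires Python 3.8+):
import math

print(2 ** 48)            # 281474976710656 -> ~281 trillion sequences
print(math.comb(48, 15))  # 1093260079344   -> ~1 trillion
print(2 ** (48 - 15))     # 8589934592      -> ~8.6 billion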
But if you still want help generating those trillions of combinations, here is a clarification of the different options available to you (a tiny demo follows):
itertools.product gives every possible ordered sequence, like every sequence of heads and tails
itertools.combinations gives the unordered subsets of a given length
itertools.permutations gives all ways of reordering the given sequence, or orderings of all subsets of a given length
itertools.combinations_with_replacement gives one result per multiset, i.e. per count of each option; for a 2-element input this is like asking, after n coin flips, for one representative sequence per possible number of heads
permutations and combinations don't make sense with len(array)==2 and r=48, since they are about subsets, and product will produce far more redundancy than you want.
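A tiny demo of the four functions on the question's two-element array, using length 3 instead of 48 so the counts stay readable:
from itertools import (product, combinations, permutations,
                       combinations_with_replacement)

array = [0, 17.1]
print(len(list(product(array, repeat=3))))                 # 8: every ordered sequence, 2**3
print(len(list(combinations(array, 2))))                   # 1: unordered subsets of size 2
print(len(list(permutations(array, 2))))                   # 2: ordered subsets of size 2
print(len(list(combinations_with_replacement(array, 3))))  # 4: one per count of 17.1 (0..3)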
"order does not matter in this instance."
If this is the case, then it is possible you are just expecting more combos than there are.
"I wish to get all of them but is it possible to narrow down those of which would satisfy say the summated value of >= 250"
OK, so then you can get every unique value for the sum of the elements with combinations_with_replacement, then run permutations on that:
from itertools import combinations_with_replacement, permutations

array = [0, 17.1]
reps = 48
lower_bound = 250
upper_bound = float("inf")  # you might have an upper bound; if not, remove it from the condition below or leave it as inf

for combo in combinations_with_replacement(array, reps):
    if lower_bound <= sum(combo) <= upper_bound:
        # this combo of "number of elements that are 17.1" meets the criteria for further investigation
        for perm in permutations(combo):
            do_thing(perm)  # do_thing stands in for your own processing
Although this still ends up visiting a ton of duplicate entries, since permutations of a sequence with many duplicate entries will swap elements that are equal and yield the same sequence, so we can do better.
First, the combinations_with_replacement is really only communicating how many of each element we are dealing with, so we can just do for k in range(reps) to get that information; we then want every permutation that has exactly k repeats of the second element of array, which happens to be equivalent to choosing k indices to set to that value.
So we can use combinations(range(reps), k) to get the sets of indices to set to the second element, and this, I believe, is the smallest set of possible sequences you would have to check to meet the "sum is greater than 250" requirement.
from itertools import combinations

array = [0, 17.1]
reps = 48

def isSummationValidCombo(summation):
    return summation >= 250

for k in range(reps + 1):  # k elements set to array[1]; + 1 so the all-17.1 case is included
    summation = array[1] * k + array[0] * (reps - k)
    if not isSummationValidCombo(summation):
        continue
    for indices_of_sequence_to_set_to_second_element in combinations(range(reps), k):
        # each combination of k indices to set to the higher value
        seq = [array[0]] * reps
        for idx in indices_of_sequence_to_set_to_second_element:
            seq[idx] = array[1]
        do_thing(seq)  # do_thing stands in for your own processing
This would leave your number of combinations at about 280 trillion, compared to the 281 trillion that would be hit by product, so you will probably need to figure out other techniques to reduce the search space.

Python Lottery Number Generation

I am working on a lottery number generation program. I have a fixed list of allowed numbers (1-80) from which users can choose 6 numbers. Each number can only be picked once. I want to generate all possible combinations efficiently. Current implementation takes more than 30 seconds if allowed_numbers is [1,...,60]. Above that, it freezes my system.
from itertools import combinations
import numpy as np
LOT_SIZE = 6
allowed_numbers = np.arange(1, 61)
all_combinations = np.array(list(combinations(allowed_numbers, LOT_SIZE)))
print(len(all_combinations))
I think I would need a numpy array (not sure if 2D). Something like,
[[1,2,3,4,5,6],
 [1,2,3,4,5,7], ...]
because I want to (quickly) perform several operations on these combinations. These operations may include,
Removing combinations that have only even numbers
Removing combinations whose sum is greater than 150, etc.
Checking if there is only one pair of consecutive numbers (Acceptable: [1,2,4,6,8,10] {Pair: (1,2)}| Not-acceptable: [1,2,4,5,7,9] {Pairs: (1,2) and (4,5)} )
Any help will be appreciated.
Thanks
Some options:
1) apply filters on the iterable instead of on the data, using filter:
def filt(x):
    return sum(x) < 7

list(filter(filt, itertools.combinations(allowed, n)))
will save ~15% time vs. constructing the list and applying the filters then, i.e.:
[i for i in itertools.combinations(allowed, n) if filt(i)]
2) Use np.fromiter
arr = np.fromiter(itertools.chain.from_iterable(itertools.combinations(allowed, n)), int).reshape(-1, n)
filtered = arr[arr.sum(1) < 7]
3) work on the generator object itself. In the example above, you can stop the itertools.combinations when the first number is above 7 (as an example):
def my_generator():
    for i in itertools.combinations(allowed, n):
        if i[0] >= 7:
            return
        elif sum(i) < 7:
            yield i

list(my_generator())  # will build ~3x faster than option 1
Note that np.fromiter becomes less efficient on compound expressions, so the mask is applied afterwards
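The question's consecutive-pair rule can also be expressed as a predicate and plugged into any of the options above; a minimal sketch (the helper name is illustrative, and the demo uses a small range so the full list fits in memory):
from itertools import combinations

def exactly_one_consecutive_pair(combo):
    # combinations() yields tuples in sorted order, so comparing
    # neighbours finds every pair of consecutive numbers
    return sum(1 for a, b in zip(combo, combo[1:]) if b - a == 1) == 1

valid = [c for c in combinations(range(1, 11), 6)
         if exactly_one_consecutive_pair(c)]
print(valid[0])  # (1, 2, 4, 6, 8, 10)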
You can use itertools.combinations(allowed_numbers, 6) to get all combinations of length 6 from your list (this is the fastest way to get this operation done).

Is it correct to say that this algorithm is O(n+m)?

Firstly I looked into the following questions:
O(N+M) time complexity
Comparing complexity of O(n+m) and O(max(n,m))
Big-O of These Nested Loops
Is this function O(N+M) or O(N*M)?
How to find time complexity of an algorithm
However, I'm still not 100% confident. That said, I have the following python example code:
adjList = [[4,7,8,9],[7,7,5,6],[1,4,3],[2,9],[2,1,7]]

for i in adjList:
    for j in i:
        print "anything else"
I came to think this is an O(n+m) algorithm, here is my reasoning:
I have adjList which is a list of lists. The integers there are randomly picked for the sake of exemplification. This is actually an adjacency list where the vertex 1 is linked to the vertices 4, 7, 8 and 9 and so on.
I know that adjList[i][j] will return the j-th item from the i-th list. So adjList[0][2] is 8
The first for will loop through each of the i lists. If we have N lists this is an O(N).
The second for will loop through each of the j elements of a list. But in this case, j is not a fixed value, e.g., the first list has 4 elements (4,7,8,9) and the third one has 3 elements (1,4,3). So at the end the second for will loop M times, M being the sum of each of the different j values. So, M is the sum of elements of every list. Thus O(M)
In this example, the first for should loop 5 times and the second for should loop 16 times. A total of 21 loops. If I change the adjList to a single big list within a list like this:
adjList = [[4,7,8,9,7,7,5,6,1,4,3,2,9,2,1,7]]
It would still loop through the 16 elements in the second for plus 1 time for the first for.
Thereby I can say that the algorithm will loop N times plus M times. Where N is the number of lists in adjList and M is the sum of elements in each one of the lists inside adjList. So O(N+M)
So, where does my doubt lie?
Everywhere I've looked I've found examples of nested loops being O(N^2) or O(N*M). Even when people mentioned that they can be something other than those, I found no example. I have yet to find an example of O(N+M) nested loops, so I'm still in doubt about whether my reasoning is correct.
Part of me wonders if this is actually an O(N*M) algorithm, but I won't elaborate on that.
Thus my final question remains: is this reasoning correct, and is said algorithm indeed O(N+M)? If not, care to show where my mistakes are?
Your big mistake is that you have not clearly identified what M and N mean.
For example:
Visiting all cells in an N x M matrix is O(N*M).
If you flatten that matrix into a list with P cells, visiting is O(P).
However P == N*M in this context ... so O(M*N) and O(P) mean the same thing ... in this context.
Looking at your reasoning, you seem to have conflated (your) M with the analog of my P. (I say analog because rectangular and ragged arrays are not identical.)
So, M is the sum of elements of every list.
That's not how I have used M. More importantly, it is not how the various other references you have looked at are using M. Specifically the ones that talk about an N x M matrix or an N x avge(M) ragged array. Hence your confusion.
Note that your M and N are not independent variables / parameters. If you scale a problem in N, that implicitly changes the value of M.
Hint: when reasoning about complexity, one way to be sure you get it correct is to go back to basics: work out the formulae for counting the operations performed, and reason about those rather than attempting to reason about how "big O" notation composes.
You define N and M as follows:
Thereby I can say that the algorithm will loop N times plus M times. Where N is the number of lists in adjList and M is the sum of elements in each one of the lists inside adjList. So O(N+M)
By this definition, the algorithm is O(M).[1] To understand why N vanishes, you need to consider the relationship between N and M. Suppose you have two lists, and you want to look at every possible pair of items from the two lists. We'll keep it simple:
list1 = [1, 2, 3]
list2 = [4, 5]
So you want to look at all six of these pairs:
pairs = [(1, 4), (2, 4), (3, 4), (1, 5), (2, 5), (3, 5)]
That's a total of 3 * 2 = 6. Now generalize that; say list1 has N items and list2 has M items. The total number of pairs is N * M, and so this will be an O(N * M) algorithm.
Now suppose that instead of looking at each pair, you just want to look at each item that is in one or the other list. Now you're just looking at all the values that appear in a concatenation of the two lists:
items = [1, 2, 3, 4, 5]
That's a total of 3 + 2 = 5 items. Now generalize; you'll get N + M, and so this will be an O(N + M) algorithm.
Given that, we should expect your case and the above case to be identical, if your case is indeed O(N + M). In other words, we should expect your case to involve looking at all the items from two different lists. But look:
all_lists = [[4,7,8,9],[7,7,5,6],[1,4,3],[2,9],[2,1,7]]
That's the same thing as:
list1 = [4,7,8,9]
list2 = [7,7,5,6]
list3 = [1,4,3]
list4 = [2,9]
list5 = [2,1,7]
Whereas in the O(N + M) case, there were only two lists, here, there are five lists! So this can't be O(N + M).
However, this should give you an idea of how to work out a better description. (Hint: it could include J, K, and L, in addition to M and N.)
The origin of your mistake is that in the first two examples, M and N are defined to be separate from one another, but your definitions of M and N overlap. In order for M and N to be summed or multiplied meaningfully, they need to be unrelated to one another. But in your definitions, the values of M and N are interrelated, and so in a sense, they repeat values that shouldn't be repeated.
Or, to put it in yet another way, suppose the sum of the lengths of all the inner lists adds up to M. If you have to take two steps instead of just one for each of those values, the result is still only a constant value C times M. And for any constant value C, C * O(M) is still O(M). So the work you do in the outer loop is already counted (up to a constant multiplier) by the O(M) term.
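To make that counting argument concrete, a small sketch that tallies both loops over the question's adjList:
adjList = [[4,7,8,9],[7,7,5,6],[1,4,3],[2,9],[2,1,7]]

outer = inner = 0
for sub in adjList:      # executes N times (N = number of inner lists)
    outer += 1
    for item in sub:     # executes M times in total (M = sum of the lengths)
        inner += 1

print(outer, inner)  # 5 16 -> N + M = 21 iterations in total, dominated by M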
Notes:
[1] Well, OK, not quite, as Stefan Pochmann points out. Because of a technicality, it might be more appropriate to say O(max(N, M)), because if any inner lists are empty, you'll still have to visit them.
If your code looked like this:
for i in adjList:
    <something significant 1>
    for j in i:
        <something significant 2>
Then I would agree with you. However <something significant 1> is missing (the internal work python does to execute the loop is not worth considering) and so there is no reason to include the O(N) part. Big O notation is not for counting every single thing, it's for showing how the algorithm scales as inputs get bigger. So we look at what matters, and that means your code should be considered O(M).
The reason nested loops are usually considered O(N*M) is because usually N is defined as the number of iterations of the outer loop (as you've done) and M is defined as the number of iterations of the inner loop per outer iteration, not in total. Therefore N*M by the common definition is equal to M in your definition.
EDIT: some are claiming that the time to loop should be considered, for example in the case of a large number of empty lists. As the code below shows, it takes significantly longer just to construct such a nested list than to nested-loop through it. And that's for a trivial construction; usually it would be more complicated. Therefore I do not believe the looping time is worth considering.
from time import time

start = time()
L = [[] for _ in range(10000000)]
construction_time = time() - start

start = time()
for sub in L:
    for i in sub:
        pass
loop_time = time() - start

print(construction_time / loop_time)  # typically between 3 and 4

iterating from a tuple of tuples optimizations

Basic problem: take a list of digits, find all permutations, filter, filter again, and sum.
This is my first Python script, so after some research I decided to use itertools.permutations. I then iterate through the result and create a new tuple of tuples with only the tuples I want. I then concatenate each tuple's digits because I want the permutations as numbers, not as broken strings.
Then I do one more filter and sum them together.
For 8 digits this takes about 2.5 seconds, far too slow if I want to scale to 15 digits (my goal).
(I decided to use tuples since a list of the permutations would be too large for memory.)
EDIT: I realized that I don't care about the sum of the permutations, but rather just the count. If going the generator path, how could I include a counter instead of taking the sum?
I updated my original code with [very] slight improvements and shortcuts, so as not to just copy-paste suggested answers before I truly understand them.
import itertools

digits = [0,1,2,3,4,5,6,7]
digital = itertools.permutations(digits)
mytuple = ()
for i in digital:
    q = ''
    j = list(i)
    if j[0] != 0:
        for k in range(len(j)):
            q = q + str(j[k])
        mytuple = mytuple + (int(q),)  # convert to int so the % 7 filter below works
#print mytuple
z = [i for i in mytuple if i % 7 == 0]
print len(z)
This being my first Python script, any non-optimization pointers would also be appreciated.
Thanks!
"Generator comprehensions" are your friend. Not least because a "generator" only works on one element at a time, helping you save memory. Also, some time can be saved by pre-computing the relevant powers of 10 and performing integer arithmetic instead of converting to and from strings:
import itertools
digits = [0,1,2,3,4,5,6,7,8,9]
oom = [ 10 ** i for i, digit in enumerate( digits ) ][ ::-1 ] # orders of magnitude
allperm = itertools.permutations( digits )
firstpass = ( sum( a * b for a, b in zip( perm, oom ) ) for perm in allperm if perm[ 0 ] )
print sum( i for i in firstpass if i % 7 == 0 )
This is faster than the original by a large factor, but the factorial nature of permutations means that 15 digits is still a long way away. I get 0.05s for len(digits)==8, 0.5s for len(digits)==9, but 9.3s for len(digits)==10...
Since you're working in base 10, digits sequences of length >10 will contain repeats, leading to repeats in the set of permutations. Your strategy will need to change if the repeats are not supposed to be counted separately (e.g. if the question is phrased as "how many 15-digit multiples of 7 are repermutations of the following digits...").
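Regarding the EDIT in the question (counting instead of summing): with this generator pipeline the change is just to sum 1 for every hit rather than the value itself; a minimal sketch of that variant, in Python 3 syntax:
import itertools

digits = [0, 1, 2, 3, 4, 5, 6, 7]
oom = [10 ** i for i in range(len(digits) - 1, -1, -1)]  # orders of magnitude
allperm = itertools.permutations(digits)
firstpass = (sum(a * b for a, b in zip(perm, oom)) for perm in allperm if perm[0])
print(sum(1 for i in firstpass if i % 7 == 0))  # count of multiples of 7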
Using itertools is a good choice. Well investigated.
I tried to improve the nice solution by @jez. I rearranged the range, replaced zip with itertools.izip, and cached the lookup in a local variable.
import itertools

N = 10
gr = xrange(N-1, -1, -1)
ap = itertools.permutations(gr)
o = [10 ** i for i in gr]
zip = itertools.izip
print sum(i for i in (sum(a*b for a, b in zip(p, o)) for p in ap if p[0]) if i % 7 == 0)
For me it's about 17% faster for N=9 and 7% for N=10. The speed improvement may be negligible for larger Ns, but I haven't tested.
There are many short-cuts in python you're missing. Try this:
import itertools

digits = [0,1,2,3,4,5,6,7]
digital = itertools.permutations(digits)
mytuple = set()
for i in digital:
    if i[0] != 0:
        mytuple.add(int(''.join(str(d) for d in i)))
z = [i for i in mytuple if i % 7 == 0]
print sum(z)
Might be hard to get to 15 digits though. 15! is 1.3 trillion...if you could process 10 million permutations per second, it would still take 36 hours.
