Python - Combinations only if condition applies - python

Suppose I have a huge list of tuples:
tuples = ([1, 2], [2, 1], [3, 2], [25, 73], [1, 3]...)
This list has 360000 elements as of now (it is a list of coprime pairs). I need to make combinations of 3 tuples such that each combination contains only 3 different numbers, for example:
([2, 1], [3, 1], [3, 2])
([2, 1], [5, 1], [5, 2])
I need to discard combinations with 4 or more different numbers while generating the list of combinations.
If I try to brute-force this and test each combination, I end up with 360000 choose 3, which is about 7.77 * 10^15 possible combinations to test.
EDIT:
The problem I am trying to solve is:
Find all combinations of coprime pairs in the form:
(a, b), (a, c), (b, c)
for c < 120000
Steps I've taken:
Generate the ternary tree for all coprime pairs where both numbers are less than 120000
(ISSUE - Generate the combinations, bruteforcing it won't do)

Let's make a dict of sets mapping all larger elements to smaller elements within a tuple:
d = {}
for tup in tuples:
    # here you may limit max(tup) to 120000
    d.setdefault(min(tup), set()).add(max(tup))
# {1: {2, 3}, 2: {3}, 25: {73}}
This also eliminates all symmetrical pairs such as (1, 2) and (2, 1).
And then search for all working combinations:
import itertools as it
for a, bc in d.items():  # d.iteritems() on Python 2
    for b, c in it.combinations(sorted(bc), 2):
        if b in d and c in d[b]:
            print(a, b, c)  # or yield or append to a list
Should be faster than your brute force…
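The dict-of-sets idea above can be wrapped into a small generator for Python 3. This is only a sketch: the function name `coprime_triples` is made up for illustration, and it assumes the input really is a collection of two-element pairs.

```python
from itertools import combinations

def coprime_triples(pairs):
    # Map each smaller element to the set of larger elements it is paired with.
    larger = {}
    for x, y in pairs:
        larger.setdefault(min(x, y), set()).add(max(x, y))
    # For every a, try each pair b < c of its partners and keep (a, b, c)
    # only if (b, c) is itself one of the input pairs.
    for a, partners in larger.items():
        for b, c in combinations(sorted(partners), 2):
            if c in larger.get(b, ()):
                yield (a, b, c)

pairs = [[1, 2], [2, 1], [3, 2], [25, 73], [1, 3]]
print(list(coprime_triples(pairs)))  # [(1, 2, 3)]
```

Yielding instead of printing keeps memory use low when the number of triples is large.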

from math import gcd
for a in range(1, 120000):
    for b in range(a+1, 120000):
        if gcd(a, b) > 1:
            continue
        for c in range(b+1, 120000):
            if gcd(a, c) == 1 and gcd(b, c) == 1:
                print(a, b, c)
With N = 120000 this takes time roughly N^3/pi^2. A brute force check of all tuples takes time N^6/48, so this is much faster -- about 3 * 10^14 times faster.

Related

Maximum sum combinations from 4 arrays

I have 4 arrays with 2 columns: one column measuring meters and the other measuring the money you get per meter, I want to get the highest sum combinations from these 4 arrays but I have 2 rules : the first rule is that each meter value in the sum has to be between 1 and 6 meters, and the second rule is that the meter value of the result has to be equal to 12 meters. I have written a code that gets the maximum sum out of a series of 4 arrays but I don't know how to implement the 2 rules in the code. This is why i am asking for your help.
My 4 arrays :
1,2,3,4,5,6 are the meter values
and the numbers below the meter values is the money earned by meters
A = [[1, 2, 3, 4, 5, 6],
[50.4, 100.8, 201.6, 403.2, 806.4, 1612.8]]
B = [[1, 2, 3, 4, 5, 6],
[40.8, 81.6, 163.2, 326.4, 652.8, 1305.6]]
C = [[1, 2, 3, 4, 5, 6],
[110, 220, 440, 880, 1760, 3520]]
D = [[1, 2, 3, 4, 5, 6],
[64, 128, 256, 512, 1024, 2048]]
My code :
import math
from queue import PriorityQueue
def KMaxCombinations(A, B, C, D, N, K):
    # Max heap.
    pq = PriorityQueue()
    # Insert all the possible
    # combinations in max heap.
    for i in range(0, N):
        for j in range(0, N):
            for k in range(0, N):
                for l in range(0, N):
                    a = A[i] + B[j] + C[k] + D[l]
                    pq.put((-a, a))
    # Pop first K elements from
    # max heap and display them.
    count = 0
    while (count < K):
        print(pq.get()[1])
        count = count + 1
# Driver method
A = [50.4, 100.8, 201.6, 403.2, 806.4, 1612.8]
B = [40.8, 81.6, 163.2, 326.4, 652.8, 1305.6]
C = [110, 220, 440, 880, 1760, 3520]
D = [64, 128, 256, 512, 1024, 2048]
N = len(A)
K = 3
# Function call
KMaxCombinations(A, B, C, D, N, K)
As it has been said in the comments other approaches may be more efficient. And of course we need to put the meters data in the list together with the prices:
A = [(1, 50.4), (2, 100.8), (3, 201.6), (4, 403.2), (5, 806.4), (6, 1612.8)]
B = [(1, 40.8), (2, 81.6), (3, 163.2), (4, 326.4), (5, 652.8), (6, 1305.6)]
C = [(1, 110), (2, 220), (3, 440), (4, 880), (5, 1760), (6, 3520)]
D = [(1, 64), (2, 128), (3, 256), (4, 512), (5, 1024), (6, 2048)]
Then, if we want to keep your approach (just allow me to use itertools.product instead of those 4 for loops) a possible solution would be:
from itertools import product
from queue import PriorityQueue
def KMaxCombinations(A, B, C, D, N, K):
    pq = PriorityQueue()
    for p in product(A, B, C, D):
        meters, prices = list(zip(*p))
        # for/else: allgood becomes True only if no meter value broke the first rule
        for m in meters:
            if not (0 < m < 7):
                allgood = False
                break
        else:
            allgood = True
        if allgood and (sum(meters) == 12):
            a = sum(prices)
            pq.put((-a, a))
    count = 0
    while (count < K):
        print(pq.get()[1])
        count = count + 1
KMaxCombinations(A, B, C, D, N, K)
4123.2
4028.0
3960.8
Here's a modification of posted code to solve for solution.
Using exhaustive search as posted code
Changes:
Used heapq as PriorityQueue
Made A, B, C, D 2-dimensional lists to add meters
Added if conditionals to implement rules
Code
import heapq
def KMaxCombinations(A, B, C, D, N, K):
    # Trying all combinations of the rows of A, B, C, D
    priority_queue = []
    # The first rule is that each meter value in the sum has to be
    # between 1 and 6 meters; it is checked once per array below.
    for i in range(0, N):
        if 1 <= A[0][i] <= 6:
            for j in range(0, N):
                if 1 <= B[0][j] <= 6:
                    for k in range(0, N):
                        if 1 <= C[0][k] <= 6:
                            for l in range(0, N):
                                if 1 <= D[0][l] <= 6:
                                    # second rule is that the meter value of the result has to be equal to 12 meters
                                    if A[0][i] + B[0][j] + C[0][k] + D[0][l] == 12:
                                        money_obtained = A[1][i] + B[1][j] + C[1][k] + D[1][l]
                                        # Add another solution to priority queue
                                        heapq.heappush(priority_queue, (-money_obtained, i, j, k, l))
    # K-th most money; use nsmallest since money is stored
    # as a negative value to make a max heap
    return heapq.nsmallest(K, priority_queue)[K-1]
Test
# Use 2D list for meters and money
# A[0] - list of meters
# A[1] - list of money
# same for B, C, D
A = [[1, 2, 3, 4, 5, 6],
[50.4, 100.8, 201.6, 403.2, 806.4, 1612.8]]
B = [[1, 2, 3, 4, 5, 6],
[40.8, 81.6, 163.2, 326.4, 652.8, 1305.6]]
C = [[1, 2, 3, 4, 5, 6],
[110, 220, 440, 880, 1760, 3520]]
D = [[1, 2, 3, 4, 5, 6],
[64, 128, 256, 512, 1024, 2048]]
K = 3 # 3rd best solution
solution = KMaxCombinations(A, B, C, D, len(A[0]), K)
# Show result
if solution:
    neg_money, *sol = solution  # money was stored negated, so flip the sign back
    print(f'{K}rd Max money is {-neg_money}')
    print(f'Using A[{sol[0]}] = {A[0][sol[0]]}')
    print(f'Using B[{sol[1]}] = {B[0][sol[1]]}')
    print(f'Using C[{sol[2]}] = {C[0][sol[2]]}')
    print(f'Using D[{sol[3]}] = {D[0][sol[3]]}')
    print(f'Sum A, B, C, d meters is: {A[0][sol[0]] + B[0][sol[1]] + C[0][sol[2]] + D[0][sol[3]]}')
else:
    print("No Solution")
Output
3rd Max money is 3960.8
Using A[0] = 1
Using B[3] = 4
Using C[5] = 6
Using D[0] = 1
Sum A, B, C, d meters is: 12
This can be solved with additional info that wasn't in the question. The meter number is just the one-based index of the array. (see comments, and OP: please edit that info into the question. External links to pictures aren't allowed.)
Since you're not familiar with dynamic programming, and this problem lends itself to a brute force approach, we'll do that. You just need the meter number to check the constraints, which you can get from Python's enumerate.
https://docs.python.org/3/library/functions.html#enumerate
Your for loops over a container should not be based on an index, but rather something simple like "for a in A", but since we want the index as well, let's do this
for a_meter, a_value in enumerate(A, start=1):
    for b_meter, b_value in enumerate(B, start=1):
        for c_meter, c_value in enumerate(C, start=1):
            for d_meter, d_value in enumerate(D, start=1):
                if check_constraints(a_meter, b_meter, c_meter, d_meter):
                    value = sum((a_value, b_value, c_value, d_value))
                    pq.put((-value, value))  # put() takes a single item, so pass one tuple
We can check the constraints first, and only include the values that pass in the priority queue.
def check_constraints(a, b, c, d):
    return sum((a, b, c, d)) == 12
When you do this kind of brute forcing, you really want to use itertools. What you've done with these for loops is basically itertools.product.
https://docs.python.org/3/library/itertools.html#itertools.product
Additionally, itertools doesn't care how many different sets of meters you have. It would be easy (with some practice) to write a function like KMaxCombinations(collection_of_meters, target=12, K=3) using itertools.
Additionally, you can chain iterators like enumerate directly into the product. You can also use itertools.islice to skip candidates that can't meet the criteria. That doesn't help in this particular problem, but if the 12 were different, it might be high enough that you can entirely skip the first few readings. Or if it's low enough, you can skip the last several readings.
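As a rough sketch of the `KMaxCombinations(collection_of_meters, target=12, K=3)` idea mentioned above: the snake_case name `k_max_combinations` and the use of `heapq.nlargest` are choices made here for illustration, not something from the question. It assumes, as stated earlier, that the meter number is the one-based index of each price.

```python
import heapq
from itertools import product

def k_max_combinations(collection_of_meters, target=12, K=3):
    # enumerate(..., start=1) pairs every price with its one-based meter number;
    # product then yields one (meter, price) pair per price list.
    candidates = product(
        *(enumerate(prices, start=1) for prices in collection_of_meters)
    )
    # Keep only the totals of combinations whose meters add up to the target.
    totals = (
        sum(price for _, price in combo)
        for combo in candidates
        if sum(meter for meter, _ in combo) == target
    )
    return heapq.nlargest(K, totals)

A = [50.4, 100.8, 201.6, 403.2, 806.4, 1612.8]
B = [40.8, 81.6, 163.2, 326.4, 652.8, 1305.6]
C = [110, 220, 440, 880, 1760, 3520]
D = [64, 128, 256, 512, 1024, 2048]
print(k_max_combinations([A, B, C, D]))
```

Because the function takes a collection of price lists, the same code works for three or five sets of meters without adding or removing loops.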

Combinations of two lists in Python while preserving "column" order

I have an original list in Python that look like this:
list = [a, 1, b, 2, c, 3]
I've split it into two lists as follows:
list_1 = [a, b, c]
list_2 = [1, 2, 3]
What I want to find is a final list of lists that gives me all the possible combinations of list_1 and list_2 without changing any letter or number from their current "column". So the output should look like that:
Desired output:
final_list =[[a, b, c],
[a, 2, c],
[a, b, 3],
[a, 2, 3],
[1, b, c],
[1, 2, c],
[1, b, 3],
[1, 2, 3]]
Any ideas how I might be able to achieve this?
Here is a brute-force recursive approach:
def get_combinations(curr_list, list1, list2, index, overall_list):
    if index == len(list1) or index == len(list2):
        overall_list.append(curr_list[:])
        return
    curr_list[index] = list1[index]
    get_combinations(curr_list, list1, list2, index+1, overall_list)
    curr_list[index] = list2[index]
    get_combinations(curr_list, list1, list2, index+1, overall_list)
list1 = list(input().strip().split())
list2 = list(input().strip().split())
overall_list = []
curr_list = [None] * min(len(list1), len(list2))
get_combinations(curr_list, list1, list2, 0, overall_list)
for l in overall_list:
    print(*l)
Input:
a b c
1 2 3
Output:
a b c
a b 3
a 2 c
a 2 3
1 b c
1 b 3
1 2 c
1 2 3
I think it will be easier to look at your problem this way:
list = [a, 1, b, 2, c, 3]
Make three lists, one for the values the first position can take, one for the second and one for the third:
first = [a,1]
second = [b,2]
third = [c,3]
To generate every combination, you can combine three for loops that iterate over the elements of your three lists. Which list is used in which for loop changes the order in which the combinations are generated. I made my example match your output:
for f in first:
    for t in third:
        for s in second:
            combination = [f, s, t]  # here you have a [f, s, t] combination
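The nested loops generalize to any number of columns with `itertools.product`. The sketch below assumes the letters are strings (the question leaves `a`, `b`, `c` undefined) and produces the combinations in the same order as the recursive answer, which differs slightly from the order in the question.

```python
from itertools import product

list_1 = ["a", "b", "c"]
list_2 = [1, 2, 3]

# zip pairs up the two choices for each column: ("a", 1), ("b", 2), ("c", 3).
# product then picks one value per column in every possible way.
final_list = [list(combo) for combo in product(*zip(list_1, list_2))]

for row in final_list:
    print(row)
```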

Count number of appearances of a row in a tall matrix (list of lists)?

I have a tall (3,000,000 by 2) matrix, represented as a list of lists (A list with 3 million elements, each being a list with two elements) and I need to count the number of times each pair appears as a row (there's a finite number of possible pairs, around 5000). This is what I do so far, but it's highly inefficient:
for a in list1:
    for b in list2:
        count_here = tall_matrix.count([a,b])
Any ideas on how to make this quicker?
Thanks a lot!
This is damned simple using collections.Counter. Since your list contains sub-lists, and sub-lists are not hashable, you'll need to convert them to tuples first:
In [280]: x = [[1, 2], [1, 2], [3, 4], [4, 5], [5, 6], [4, 5]]
In [282]: c = collections.Counter(map(tuple, x))
In [283]: c
Out[283]: Counter({(1, 2): 2, (3, 4): 1, (4, 5): 2, (5, 6): 1})
c stores the counts of every single pair in your list.
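To tie this back to the loops in the question, the counts for each (a, b) pair can then be looked up directly instead of rescanning the matrix. A small sketch with made-up values for `list1` and `list2`:

```python
from collections import Counter

tall_matrix = [[1, 2], [1, 2], [3, 4], [4, 5], [5, 6], [4, 5]]
list1 = [1, 3, 4]  # hypothetical first-column values
list2 = [2, 4, 5]  # hypothetical second-column values

# One pass over the matrix builds all counts.
counts = Counter(map(tuple, tall_matrix))

# A Counter returns 0 for pairs that never appear, so no KeyError handling is needed.
pair_counts = {(a, b): counts[(a, b)] for a in list1 for b in list2}
print(pair_counts[(1, 2)], pair_counts[(1, 4)])  # 2 0
```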
Counter should do the trick :
Test for performance (using IPython) :
In [1]: import random
In [2]: a=[(random.randint(0, 10), random.randint(0, 10)) for i in range(3000000)]
In [3]: from collections import Counter
In [4]: %time c = Counter(a)
CPU times: user 940 ms, sys: 52 ms, total: 992 ms
Wall time: 891 ms

How does a Python custom comparator work?

I have the following Python dict:
[(2, [3, 4, 5]), (3, [1, 0, 0, 0, 1]), (4, [-1]), (10, [1, 2, 3])]
Now I want to sort them on the basis of sum of values of the values of dictionary, so for the first key the sum of values is 3+4+5=12.
I have written the following code that does the job:
def myComparator(a,b):
    print "Values(a,b): ",(a,b)
    sum_a=sum(a[1])
    sum_b=sum(b[1])
    print sum_a,sum_b
    print "Comparision Returns:",cmp(sum_a,sum_b)
    return cmp(sum_a,sum_b)
items.sort(myComparator)
print items
This is what the output that I get after running above:
Values(a,b): ((3, [1, 0, 0, 0, 1]), (2, [3, 4, 5]))
2 12
Comparision Returns: -1
Values(a,b): ((4, [-1]), (3, [1, 0, 0, 0, 1]))
-1 2
Comparision Returns: -1
Values(a,b): ((10, [1, 2, 3]), (4, [-1]))
6 -1
Comparision Returns: 1
Values(a,b): ((10, [1, 2, 3]), (3, [1, 0, 0, 0, 1]))
6 2
Comparision Returns: 1
Values(a,b): ((10, [1, 2, 3]), (2, [3, 4, 5]))
6 12
Comparision Returns: -1
[(4, [-1]), (3, [1, 0, 0, 0, 1]), (10, [1, 2, 3]), (2, [3, 4, 5])]
Now I am unable to understand as to how the comparator is working, which two values are being passed and how many such comparisons would happen? Is it creating a sorted list of keys internally where it keeps track of each comparison made? Also the behavior seems to be very random. I am confused, any help would be appreciated.
Neither the number of comparisons nor which pairs get compared is documented, and both can differ between implementations. The only guarantee is that if the comparison function makes sense the method will sort the list.
CPython uses the Timsort algorithm to sort lists, so what you see is the order in which that algorithm is performing the comparisons (if I'm not mistaken for very short lists Timsort just uses insertion sort)
Python is not keeping track of "keys". It just calls your comparison function every time a comparison is made. So your function can be called many more than len(items) times.
If you want to use keys you should use the key argument. In fact you could do:
items.sort(key=lambda x: sum(x[1]))
This will create the keys and then sort using the usual comparison operator on the keys. This is guaranteed to call the function passed by key only len(items) times.
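For readers on Python 3, where `cmp` and the positional comparator argument are gone, the difference can be observed with `functools.cmp_to_key`. This is a sketch, not the code from the question; the counter dictionary and function names are made up for illustration.

```python
from functools import cmp_to_key

items = [(2, [3, 4, 5]), (3, [1, 0, 0, 0, 1]), (4, [-1]), (10, [1, 2, 3])]
calls = {"cmp": 0, "key": 0}

def compare_by_sum(a, b):
    calls["cmp"] += 1
    sum_a, sum_b = sum(a[1]), sum(b[1])
    # Same contract as the old cmp(): negative, zero or positive.
    return (sum_a > sum_b) - (sum_a < sum_b)

def sum_key(item):
    calls["key"] += 1
    return sum(item[1])

by_cmp = sorted(items, key=cmp_to_key(compare_by_sum))
by_key = sorted(items, key=sum_key)

print(by_cmp == by_key)  # True: both orderings agree
print(calls["key"])      # 4: the key function runs once per item
print(calls["cmp"])      # number of comparisons, which depends on the implementation
```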
Given that your list is:
[a,b,c,d]
The sequence of comparisons you are seeing is:
b < a # -1 true --> [b, a, c, d]
c < b # -1 true --> [c, b, a, d]
d < c # 1 false
d < b # 1 false
d < a # -1 true --> [c, b, d, a]
how the comparator is working
This is well documented:
Compare the two objects x and y and return an integer according to the outcome. The return value is negative if x < y, zero if x == y and strictly positive if x > y.
Instead of calling the cmp function you could have written:
sum_a=sum(a[1])
sum_b=sum(b[1])
if sum_a < sum_b:
    return -1
elif sum_a == sum_b:
    return 0
else:
    return 1
which two values are being passed
From your print statements you can see the two values that are passed. Let's look at the first iteration:
((3, [1, 0, 0, 0, 1]), (2, [3, 4, 5]))
What you are printing here is a tuple (a, b), so the actual values passed into your comparison functions are
a = (3, [1, 0, 0, 0, 1])
b = (2, [3, 4, 5])
By means of your function, you then compare the sum of the two lists in each tuple, which you denote sum_a and sum_b in your code.
and how many such comparisons would happen?
I guess what you are really asking: How does the sort work, by just calling a single function?
The short answer is: it uses the Timsort algorithm, and it calls the comparison function O(n * log n) times (note that the actual number of calls is c * n * log n, where c > 0).
To understand what is happening, picture yourself sorting a list of values, say v = [4,2,6,3]. If you go about this systematically, you might do this:
1. Start at the first value, at index i = 0.
2. Compare v[i] with v[i+1].
3. If v[i+1] < v[i], swap them.
4. Increase i and repeat from step 2 until i == len(v) - 2.
5. Start again at step 1 until no further swaps occur.
So you get, i =
0: 2 < 4 => [2, 4, 6, 3] (swap)
1: 6 < 4 => [2, 4, 6, 3] (no swap)
2: 3 < 6 => [2, 4, 3, 6] (swap)
Start again:
0: 4 < 2 => [2, 4, 3, 6] (no swap)
1: 3 < 4 => [2, 3, 4, 6] (swap)
2: 6 < 4 => [2, 3, 4, 6] (no swap)
Start again - there will be no further swaps, so stop. Your list is sorted. In this example we have run through the list 3 times, and there were 3 * 3 = 9 comparisons.
Obviously this is not very efficient -- the sort() method only calls your comparator function 5 times. The reason is that it employs a more efficient sort algorithm than the simple one explained above.
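The simple procedure walked through above is essentially a bubble sort. A minimal sketch that counts its comparisons (the function name is made up for illustration):

```python
def bubble_sort_count(values):
    v = list(values)  # work on a copy
    comparisons = 0
    swapped = True
    while swapped:  # start again until a full pass makes no swap
        swapped = False
        for i in range(len(v) - 1):
            comparisons += 1
            if v[i + 1] < v[i]:
                v[i], v[i + 1] = v[i + 1], v[i]
                swapped = True
    return v, comparisons

print(bubble_sort_count([4, 2, 6, 3]))  # ([2, 3, 4, 6], 9)
```

The count of 9 matches the three passes of three comparisons traced by hand.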
Also the behavior seems to be very random.
Note that the sequence of values passed to your comparator function is not, in general, defined. However, the sort function does all the necessary comparisons between any two values of the iterable it receives.
Is it creating a sorted list of keys internally where it keeps track of each comparison made?
No, it is not keeping a list of keys internally. Rather the sorting algorithm essentially iterates over the list you give it. In fact it builds subsets of lists to avoid doing too many comparisons - there is a nice visualization of how the sorting algorithm works at Visualising Sorting Algorithms: Python's timsort by Aldo Cortesi
Basically, for the simple list such as [2, 4, 6, 3, 1] and the complex list you provided, the sorting algorithms are the same.
The only differences are the complexity of the elements in the list and the scheme used to compare any two elements (e.g. the myComparator you provided).
There is a good description for Python Sorting: https://wiki.python.org/moin/HowTo/Sorting
First, the cmp() function:
cmp(...)
cmp(x, y) -> integer
Return negative if x<y, zero if x==y, positive if x>y.
You are using this line: items.sort(myComparator), which passes myComparator to sort as its comparison function. sort calls it on pairs of items and uses the negative, zero, or positive value it returns to decide their order.
Since you want to sort based on the sum of each tuples list, you could do this:
mylist = [(2, [3, 4, 5]), (3, [1, 0, 0, 0, 1]), (4, [-1]), (10, [1, 2, 3])]
sorted(mylist, key=lambda pair: sum(pair[1]))
What this does is, I think, exactly what you wanted: it sorts mylist based on the sum() of each tuple's list.

Numpy - group data into sum values

Say I have an array of values:
a = np.array([1,5,4,2,4,3,1,2,4])
and three 'sum' values:
b = 10
c = 9
d = 7
Is there a way to group the values in a into groups of sets where the values combine to equal b,c and d? For example:
b: [5,2,3]
c: [4,4,1]
d: [4,2,1]
b: [5,4,1]
c: [2,4,3]
d: [4,2,1]
b: [4,2,4]
c: [5,4]
d: [1,1,2,3]
Note the sum of b,c and d should remain the same (==26). Perhaps this operation already has a name?
Here's a naive implementation using itertools
from itertools import chain, combinations
def group(n, iterable):
    s = list(iterable)
    return [c for c in chain.from_iterable(combinations(s, r)
                                           for r in range(len(s)+1))
            if sum(c) == n]
group(5, range(5))
yields
[(1, 4), (2, 3), (0, 1, 4), (0, 2, 3)]
Note, this probably will be very slow for large lists because we're essentially constructing and filtering through the power set of that list.
You could use this for
sum_vals = [10, 9, 7]
a = [1, 5, 4, 2, 4, 3, 1, 2, 4]
list(map(lambda x: group(x, a), sum_vals))  # list() is needed on Python 3, where map is lazy
and then zip them together.
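Note that group() is called independently for each target, so nothing in it stops the same element of a from being used in more than one group. If every element must land in exactly one group, a small backtracking search is one alternative. This is a sketch under the assumption that all values are positive; the function name is made up, and it returns only the first partition it finds rather than all of them.

```python
def partition_into_sums(values, targets):
    groups = [[] for _ in targets]
    remaining = list(targets)  # how much each group still needs

    def place(i):
        # All values placed: succeed only if every target was hit exactly.
        if i == len(values):
            return all(r == 0 for r in remaining)
        v = values[i]
        for g in range(len(targets)):
            if v <= remaining[g]:  # prune: value must fit (values assumed positive)
                groups[g].append(v)
                remaining[g] -= v
                if place(i + 1):
                    return True
                remaining[g] += v  # undo and try the next group
                groups[g].pop()
        return False

    return groups if place(0) else None

a = [1, 5, 4, 2, 4, 3, 1, 2, 4]
result = partition_into_sums(a, [10, 9, 7])
print(result)
```

For 9 values and 3 groups the search visits at most 3^9 assignments, so it is fine here but will slow down quickly on longer arrays.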
