Get unique groups from a list - python

I am trying to create an algorithm that builds unique groups of length N from a list L with len(L) values.
EDIT: A unique group is one in which no two values have ever been together in an earlier group.
EDIT: So if the values were people, everyone in a new group should meet only people they have not met before.
Say we have a list L and try to find unique groups of 4:
L = [1,2,3,4,5,6,7,8]
N = 4
unique_groups = [[1,2,3,4], [5,6,7,8]]
len(unique_groups) = 2
So here we have 8 values and 2 unique groups. Any further group would contain at least two values that were already together in a previous group, e.g. [1,2,3,5] or [1,3,5,7], so these groups are not unique.
Where len(L) = 12, we have 3 different groups, while len(L) >= 16 gives us way more options:
L = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
N = 4
unique_groups = [[1,2,3,4], [5,6,7,8], [9,10,11,12], [13,14,15,16], [1,5,9,13], [2,3,7,12] ...]
len(unique_groups) = ?
I have tried a few unintuitive and slow approaches, namely comparing all combinations, which takes a lot of time as the list gets longer.
This is one of the approaches:
    import itertools

    def findsubsets(s, n):
        return list(itertools.combinations(s, n))

    s = [1,2,3,4,5,6,7,8]
    sets = findsubsets(s,4)
    sets_unique = []

    def compare_sets(set1, set2):
        # True if the two sets share at most one value
        init_eq = 0
        for s1 in set1:
            if s1 in set2 and init_eq > 0:
                return False
            elif s1 in set2 and init_eq < 1:
                init_eq += 1
            else:
                continue
        return True

    for s in sets:
        start_point = sets.index(s)
        print(start_point)
        # stop at len(sets); len(sets) + 1 runs past the end of the list
        for i in range(start_point + 1, len(sets)):
            set2 = sets[i]
            if compare_sets(s, set2):
                print(s, set2)
                sets_unique.append(set2)
    print(sets_unique)
EDIT2: The real-life version of this problem is to match employees into groups of N so that no two people ever share a group twice. Every person should meet only new people.

Your question update is quite clear now; thanks.
This problem is isomorphic to sets of points and lines in a projective plane. You are trying to construct as many lines as you can with N points on each line. Look under the subsection "A finite example" for a visualization of the process, and "Vector space construction" for the formal algorithm.
To give you an idea, you begin at an arbitrarily chosen point 1. Make sets (collinear points) by appending (conveniently) consecutive, disjoint triples:
1 2 3 4
1 5 6 7
1 8 9 10
1 11 12 13
...
This gives you all groups containing point 1. Now comes the mathematically interesting part: how you cross-link the points 2 and higher will define a projective plane; the solution from here is not unique. One standard algorithm searches for a homeomorphic solution, a greedy algorithm: at each choice point, choose the lowest numbered point that is legal for the current open slot.
This will give us
2 5 8 11
2 6 9 12
2 7 10 13
...
3 5 9 13
3 6 10 11
et cetera
You will need to define whether you want all possible distinct solutions, or merely the first one you can find. Each distinct solution defines a different projective plane topology.
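The greedy rule described above (always take the lowest-numbered legal choice) can be sketched as a brute-force scan over all candidate groups in lexicographic order. This is only an illustration under that reading of the rule, not the vector space construction; the function name greedy_unique_groups is made up for this example, and the scan visits every combination, so it is only practical for small lists.

```python
from itertools import combinations

def greedy_unique_groups(values, n):
    """Scan n-element groups in lexicographic order and keep a group
    only if none of its pairs has been used by an earlier group."""
    used_pairs = set()
    groups = []
    for candidate in combinations(values, n):
        pairs = set(combinations(candidate, 2))
        if used_pairs.isdisjoint(pairs):
            groups.append(candidate)
            used_pairs |= pairs
    return groups

# 13 points with 4 points per line, as in the listing above
for group in greedy_unique_groups(range(1, 14), 4):
    print(group)
```

For 13 points this reproduces the lines listed above, starting with (1, 2, 3, 4) and (1, 5, 6, 7), and finds 13 groups in total.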
Dealing with this from the standpoint of projective planes has you start with affine planes and derive the projective planes from there. Especially check out the three properties of an affine plane.
Additional references:
http://www.mathpuzzle.com/MAA/47-Fano/mathgames_05_30_06.html
Does that give you enough to play with for now?


Find the number of substrings of a string which can be divisible by a number k

Given a string, I want to find all the sub-strings of the original string that are divisible by an integer k. For example, the string 14917 can form 7 sub-strings which are divisible by the integer 7. The sub-strings are: 14, 1491, 14917, 49, 91, 917 and 7. I have come up with a solution, but it does not run efficiently when a large string is input. My code is
    string = '14917'
    divider = 7
    count = 0
    for i in range(len(string)):
        for j in range(i+1, len(string)+1):
            sub_string = string[i:j]
            if int(sub_string) % divider == 0:
                count += 1
    print(count)
I have read about fast approaches for this kind of problem, most of which talked about computing the rolling remainders of the string, but I could not implement it correctly. Is there any way in which this problem can be solved quickly? Thanks in advance.
Here is an outline of how to solve this problem if we just want the count, we don't mind that there are multiple ways of pulling the same substring out, and k is relatively prime to 10 (which 7 is).
First let's go from the last digit of our number to the first, keeping track of the remainder of the whole number. In the case of 14917 that means compiling the following table:
    number   10**digits % 7   digit   remainder
                                      0
         7   1                7       0+1*7 -> 0
        17   3                1       0+3*1 -> 3
       917   2                9       3+2*9 -> 0
      4917   6                4       0+6*4 -> 3
     14917   4                1       3+4*1 -> 0
Now here is the trick. Whenever you see the same remainder in two places, then from one to the other you've got something divisible by 7. So, for example, between the two 3's you get 49. If a particular value appears i times, then that represents i*(i-1)/2 (possibly identical) substrings that are divisible by 7.
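The counting trick above can be sketched in a few lines. This is a minimal illustration that assumes k is relatively prime to 10 and counts repeated substrings separately, as stated in the outline; the function name is hypothetical.

```python
from collections import Counter

def count_divisible_substrings(digits, k):
    """Count substrings of `digits` divisible by k (duplicates counted).
    Assumes gcd(k, 10) == 1."""
    seen = Counter({0: 1})   # remainder of the empty suffix
    remainder = 0
    power = 1                # 10**position % k, starting at the last digit
    for ch in reversed(digits):
        remainder = (remainder + power * int(ch)) % k
        seen[remainder] += 1
        power = (power * 10) % k
    # every pair of equal suffix remainders marks one divisible substring
    return sum(c * (c - 1) // 2 for c in seen.values())

print(count_divisible_substrings('14917', 7))   # 7
```

Only small integers are involved, so the count takes time linear in the length of the string.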
If we want to get unique substrings, then we have to do a lot more work. But we can still be O(length of string) if we generate a suffix tree so that we can count the duplicates relatively quickly.
To actually produce the numbers, this approach will still be O(n^2). But it will be faster than your existing approach for large strings because you're only ever doing math with small integers. Converting to/from strings to numbers that are thousands of digits long is not particularly fast...
So here is more detail on the complications of the suffix tree approach for count of unique substrings. It is a lot harder to get right.
Above we proceeded from the end of the string back to the beginning, keeping track of the final remainder. But this means that what a particular digit adds to the remainder depends on its position in the string. However, in a tree a given node is at different heights from the ends of the string. This makes the remainder at a particular node harder to calculate.
Instead of calculating a remainder where the contribution of the current digit depends on its height, what we need is some sort of remainder where the contribution of the current digit stays fixed. The trick is to multiply the set of possible remainders bubbling up by 10^-1 instead. Then we'll get 0s if and only if the number starting here is divisible by k. What does 10^-1 (mod k) mean? It means a number m such that (10*m) % k is 1. It can be seen by inspection that 5 works for 7 because 50 = 7*7 + 1. We can always find the inverse with trial and error. In general its existence and value can be more efficiently determined through Euler's Theorem. Either way, in our case it is 5.
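As a small illustration of the inverse described above, the sketch below finds the inverse of 10 modulo k by trial and error and cross-checks it against Python's three-argument pow (negative exponents require Python 3.8 or later). The helper name inverse_of_10 is made up for this example.

```python
def inverse_of_10(k):
    """Return m with (10 * m) % k == 1, found by trial and error."""
    for m in range(1, k):
        if (10 * m) % k == 1:
            return m
    raise ValueError("10 has no inverse modulo %d" % k)

print(inverse_of_10(7))   # 5, because 50 = 7*7 + 1
print(pow(10, -1, 7))     # the same value from the built-in
```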
Now it is more work to multiply the set of remainders by a number instead of the current digit, but it has the advantage that doing this we can merge branches of a tree. Consider, for example, a suffix tree for 5271756. (Note that uniqueness matters because the string 7 appears twice.)
(root):
    a
    b
    c
    d
    e
(a): '17'
    f
(b): '27'
    a
(c): '5'
    b
    e
(d): '7'
    a
    f
(e): '6'(end)
(f): '5'
    e
Now we can work our way back up the tree finding counts of remainders. The calculation for 756 illustrates the idea:
    digit  prev_remainders  remainders
    # for 6
    6      {}               {(6)%7: 1}
    # for 5, 56
    5      {6: 1}           {(5)%7: 1, (5+5*6)%7: 1}
                            {5: 1, 0: 1} = {0: 1, 5: 1}
    # for 7, 756, 75
    7      {0: 1, 5: 1}     {(7)%7: 1, (7+5*0)%7: 1, (7+5*5)%7: 1}
                            {0: 1, 0: 1, 4: 1} = {0: 2, 4: 1}
And so at that point we have 2 strings divisible by 7 starting there, namely 7 and 756.
Filling out the whole tree starting from the root and bubbling back in the same way (done by hand, I could make mistakes - and made a lot of them the first time around!):
(root): {0:8, 1:6, 2:3, 4:1, 5:4, 6:4}
    a
    b
    c
    d
    e
(a): '17' {0:1, 1:3}
    f
(b): '27' {2:3, 6:3}
    a
(c): '5' {0:4, 1:3, 5:1}
    b
    e
(d): '7' {0:3, 4:1, 5:3}
    a
    f
(e): '6'(end) {6:1}
(f): '5' {0:1, 5:1}
    e
From which we conclude that there are 8 substrings divisible by 7. In fact they are:
175 (af)
5271 (cba)
52717 (cbaf)
5271756 (cbafe)
56 (ce)
7 (d)
7175 (daf)
756 (dfe)
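The list can be cross-checked with a brute-force scan. This is only a verification sketch (quadratic in the length of the string and working with big integers), not the suffix tree method; the function name is hypothetical.

```python
def unique_divisible_substrings(digits, k):
    """Collect the distinct substrings of `digits` divisible by k."""
    found = set()
    for i in range(len(digits)):
        for j in range(i + 1, len(digits) + 1):
            if int(digits[i:j]) % k == 0:
                found.add(digits[i:j])
    # shortest first, ties broken alphabetically
    return sorted(found, key=lambda s: (len(s), s))

print(unique_divisible_substrings('5271756', 7))
# ['7', '56', '175', '756', '5271', '7175', '52717', '5271756']
```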
What about the rest? What does it mean that, for example, there are 3 ways of getting 2? It means that there are 3 substrings s such that ( (s%7) * (5^(len(s)-1)) ) %7 == 2. So we didn't need that in our final answer, but we certainly did in the intermediate calculations!

Dynamic programming stack of plates [duplicate]

I've been given a little brainteaser to solve.
The task is to make a function with a single integer parameter. You have to figure out how many different combinations of tower patterns you can make with the given number of bricks (each succeeding tower must be shorter than the previous one, kind of like). There must be 2 or more towers, one right next to the other.
For example, if you were given 3 blocks you can only produce 1 combination of towers, one with a height of 2 and its neighbor having a height of 1:
|
| |
2 1
Given 4 you can only still produce one combination since the next tower must be shorter than the previous:
|
|
| |
3 1
Given 5 you can produce 2 combinations:
|
|
|
| |
4 1
|
| |
| |
3 2
I have a function that can do all of this; however, they give the example that 200 blocks should produce 487067745, which my function simply does not do. I don't know what I am doing wrong. A push in the right direction would be very much appreciated. My function now looks like this:
    def answer(num):
        # divide the blocks so we have two towers, one with a height of n-1 and
        # the other with a height of one
        l1 = num-1
        l2 = 1
        combinations = 0
        while True:
            if l1 > l2:
                # add 1 to the combinations along with how many combinations we
                # can make using the blocks from tower two
                combinations += 1 + answer(l2)
            elif l1 == l2:
                # see if we can make additional towers out of the rightmost tower
                # and add that to the combinations
                combinations += answer( l2 )
            else:
                # if the first tower is smaller than or equal to the other tower
                # then we stop trying to make combinations
                return combinations
            l1 -= 1
            l2 += 1
While this method does work for smaller numbers of bricks (returning 2 combinations for 5 blocks and 1 combination for 3 or 4 blocks), it does not work for much larger numbers that would be impossible to do on sheets of paper.
Wikipedia gives the generating function for q(n), the number of partitions of n with distinct parts, as product (1+x^k) for k=1..infinity. Given that you exclude the possibility of a single tower, the number of different valid tower arrangements is q(n)-1.
This gives this neat O(n^2) time and O(n) space program for counting tower arrangements.
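    def towers(n):
        # A[i] is the coefficient of x^i in the product built so far
        A = [1] + [0] * n
        for k in range(1, n+1):
            # go from high i to low i so each height k is used at most once
            for i in range(n, k-1, -1):
                A[i] += A[i-k]
        return A[n] - 1

    print(towers(200))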
The output is as required:
487067745
To understand the code, one can observe that A stores the first n+1 coefficients of the generating function product(1+x^k) for k=1...infinity. Each time through the k loop we add one more term to the product. We can stop at n rather than infinity, because subsequent terms of the product do not affect the first n+1 coefficients.
Another, more direct, way to think about the code is to define T(i, k) to be the number of tower combinations (including the single tower) with i blocks, and where the maximum height of any tower is k. Then:
    T(0, 0) = 1
    T(i, 0) = 0                            if i > 0
    T(i, k) = T(i, k-1)                    if i < k
            = T(i, k-1) + T(i-k, k-1)      if i >= k
Then one can observe that after j iterations of the for k loop, A contains the values of T(i, j) for i from 0 to n. The update is done somewhat carefully, updating the array from the end backwards so that results are changed only after they are used.
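The recurrence for T(i, k) can also be written down directly and memoized. The sketch below is a plain transcription of the four cases above; the name towers_recursive is made up for this example, and it should give the same counts as the array version.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(i, k):
    """Arrangements of i blocks into distinct heights, each at most k."""
    if k == 0:
        return 1 if i == 0 else 0
    if i < k:
        return T(i, k - 1)
    # either no tower has height k, or exactly one does
    return T(i, k - 1) + T(i - k, k - 1)

def towers_recursive(n):
    # subtract the arrangement that is a single tower of height n
    return T(n, n) - 1

print(towers_recursive(5))     # 2
print(towers_recursive(200))   # 487067745
```

The memo table holds O(n^2) entries, so this uses more space than the one-dimensional array, which keeps only the current column of T.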
Imagine calling the function answer(6). Your code returns 2; the correct answer, however, is 3 (5, 1; 4, 2; 3, 2, 1). Why is this? Your code stops when the number of blocks above the bottom tower is greater than the height of the bottom tower, so it sees 3, 3 and stops; it therefore never considers the combination 3, 2, 1.
My advice would be to rethink the function and try to take into account the idea that you can stack a number of blocks N on top of a tower that is less than N high.

Algorithm - Grouping List in unique pairs

I'm having difficulties with an assignment I've received, and I am pretty sure the problem's text is flawed. I've translated it to this:
Consider a list x[1..2n] with elements from {1,2,..,m}, m < n. Propose and implement in Python an algorithm with a complexity of O(n) that groups the elements into pairs (pairs of (x[i],x[j]) with i < j) such that every element is present in a single pair. For each set of pairs, calculate the maximum sum of the pairs, then compare it with the rest of the sets. Return the set that has the minimum of those.
For example, x = [1,5,9,3] can be paired in three ways:
(1,5),(9,3) => Sums: 6, 12 => Maximum 12
(1,9),(5,3) => Sums: 10, 8 => Maximum 10
(1,3),(5,9) => Sums: 4, 14 => Maximum 14
----------
Minimum 10
Solution to be returned: (1,9),(5,3)
The things that strike me oddly are as follows:
Table contents definition: It says that there are elements of 1..2n, from {1..m}, m < n. But if m < n, then there aren't enough elements to populate the list without duplicating some, which is not allowed. So then I would assume m >= 2n. Also, the example has n = 2 but uses elements that are greater than 1, so I assume that's what they meant.
O(n) complexity: So is there a way to combine them in a single loop? I can't think of anything.
My Calculations:
For n = 4:
Number of ways to combine: 6
Valid ways: 3
For n = 6
Number of ways to combine: 910
Valid ways: 15
For n = 8
Number of ways to combine: >30 000
Valid ways: ?
So obviously, I cannot use brute force and then figure out afterwards whether each result is valid. The formula I used to calculate the total possible ways is
C(C(n,2),n/2)
Question:
Is this problem wrongly written and impossible to solve? If so, what conditions should be added or removed to make it feasible? If you are going to suggest some code in python, remember I cannot use any prebuilt functions of any kind. Thank you
Assuming a sorted list:
    def answer(L):
        return list(zip(L[:len(L)//2], L[len(L)//2:][::-1]))
Or if you want to do it more manually:
    def answer(L):
        answer = []
        # pair the i-th smallest element with the i-th largest
        for i in range(len(L)//2):
            answer.append((L[i], L[len(L)-i-1]))
        return answer
Output:
In [3]: answer([1,3,5,9])
Out[3]: [(1, 9), (3, 5)]
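As a sanity check on the small example from the question, the pairing of smallest with largest can be compared against a brute-force enumeration of every possible pairing. This is only a verification sketch; it uses the built-in sorted for brevity, and the names all_pairings and pair_ends are made up for this example.

```python
def all_pairings(items):
    """Yield every way of splitting `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for tail in all_pairings(remaining):
            yield [(first, partner)] + tail

def pair_ends(sorted_items):
    """Pair the i-th smallest element with the i-th largest."""
    half = len(sorted_items) // 2
    return [(sorted_items[i], sorted_items[-1 - i]) for i in range(half)]

x = [1, 5, 9, 3]
best = min(max(a + b for a, b in p) for p in all_pairings(x))
ends = max(a + b for a, b in pair_ends(sorted(x)))
print(best, ends)   # 10 10
```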

k-greatest double selection

Imagine you have two sacks (A and B) with N and M balls respectively in it. Each ball with a known numeric value (profit). You are asked to extract (with replacement) the pair of balls with the maximum total profit (given by the multiplication of the selected balls).
The best extraction is obvious: Select the greatest valued ball from A as well as from B.
The problem comes when you are asked to give the 2nd or kth best selection. Following the previous approach you should select the greatest valued balls from A and B without repeating selections.
This can be clumsily solved by calculating the value of every possible selection and sorting the results (example in python):
    def solution(A,B,K):
        if K < 1:
            return 0
        pool = []
        for a in A:
            for b in B:
                pool.append(a*b)
        pool.sort(reverse=True)
        if K>len(pool):
            return 0
        return pool[K-1]
This works, but its worst-case time complexity is O(N*M*log(N*M)) and I bet there are better solutions.
I reached a solution based on a table where A and B elements are sorted from higher value to lower and each of these values has associated an index representing the next value to test from the other column. Initially this table would look like:
The first element from A is 25 and it has to be tested (index 2 select from b = 0) against 20 so 25*20=500 is the first greatest selection and, after increasing the indexes to check, the table changes to:
Using these indexes we have a swift way to get the best selection candidates:
25 * 20 = 500 #first from A and second from B
20 * 20 = 400 #second from A and first from B
I tried to code this solution:
    def solution(A,B,K):
        if K < 1:
            return 0
        sa = sorted(A,reverse=true)
        sb = sorted(B,reverse=true)
        for k in xrange(K):
            i = xfrom
            j = yfrom
            if i >= n and j >= n:
                ret = 0
                break
            best = None
            while i < n and j < n:
                selected = False
                #From left
                nexti = i
                nextj = sa[i][1]
                a = sa[nexti][0]
                b = sb[nextj][0]
                if best is None or best[2]<a*b:
                    selected = True
                    best = [nexti,nextj,a*b,'l']
                #From right
                nexti = sb[j][1]
                nextj = j
                a = sa[nexti][0]
                b = sb[nextj][0]
                if best is None or best[2]<a*b:
                    selected = True
                    best = [nexti,nextj,a*b,'r']
                #Keep looking?
                if not selected or abs(best[0]-best[1])<2:
                    break
                i = min(best[:2])+1
                j = i
                print("Continue with: ", best, selected,i,j)
            #go,go,go
            print(best)
            if best[3] == 'l':
                dx[best[0]][1] = best[1]+1
                dy[best[1]][1] += 1
            else:
                dx[best[0]][1] += 1
                dy[best[1]][1] = best[0]+1
            if dx[best[0]][1]>= n:
                xfrom = best[0]+1
            if dy[best[1]][1]>= n:
                yfrom = best[1]+1
            ret = best[2]
        return ret
But it did not work for the on-line Codility judge. (Did I mention this is part of the solution to an already expired Codility challenge, Silicium 2014?)
My questions are:
Is the second approach an unfinished good solution? If that is the case, any clue on what I may be missing?
Do you know any better approach for the problem?
You need to maintain a priority queue.
You start with (sa[0], sb[0]), then move onto (sa[0], sb[1]) and (sa[1], sb[0]). If (sa[0] * sb[1]) > (sa[1] * sb[0]), can we say anything about the comparative sizes of (sa[0], sb[2]) and (sa[1], sb[0])?
The answer is no. Thus we must maintain a priority queue, and after removing each (sa[i], sb[j]) (such that sa[i] * sb[j] is the biggest in the queue), we must add to the priority queue (sa[i + 1], sb[j]) and (sa[i], sb[j + 1]), and repeat this k times.
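A minimal sketch of the priority-queue idea, using Python's heapq (a min-heap, so products are stored negated). It assumes all ball values are non-negative, sorts both lists in descending order, and after each pop pushes the neighbours one step further down each list. The function name and the set of visited index pairs are additions for this example; returning 0 for an out-of-range k follows the question's code.

```python
import heapq

def kth_largest_product(A, B, k):
    """k-th largest product a*b with a in A, b in B (values assumed >= 0)."""
    if k < 1 or k > len(A) * len(B):
        return 0
    sa = sorted(A, reverse=True)
    sb = sorted(B, reverse=True)
    heap = [(-sa[0] * sb[0], 0, 0)]
    seen = {(0, 0)}                      # index pairs already pushed
    for _ in range(k - 1):
        _, i, j = heapq.heappop(heap)
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(sa) and nj < len(sb) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-sa[ni] * sb[nj], ni, nj))
    return -heap[0][0]

print(kth_largest_product([1, 2, 3], [4, 5], 3))   # 10
```

Each step pops one entry and pushes at most two, so the heap never holds more than about k + 1 entries.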
Incidentally, I gave this algorithm as an answer to a different question. The algorithm may seem to be different at first, but essentially it's solving the same problem.
I'm not sure I understand the "with replacement" bit...
...but assuming this is in fact the same as "How to find pair with kth largest sum?", then the key to the solution is to consider the matrix S of all the sums (or products, in your case), constructed from A and B (once they are sorted) -- this paper (referenced by @EvgenyKluev) gives this clue.
(You want A*B rather than A+B... but the answer is the same -- though negative numbers complicate but (I think) do not invalidate the approach.)
An example shows what is going on:
for A = (2, 3, 5, 8, 13)
and B = (4, 8, 12, 16)
we have the (notional) array S, where S[r, c] = A[r] + B[c], in this case:
     6 ( 2+4),  10 ( 2+8),  14 ( 2+12),  18 ( 2+16)
     7 ( 3+4),  11 ( 3+8),  15 ( 3+12),  19 ( 3+16)
     9 ( 5+4),  13 ( 5+8),  17 ( 5+12),  21 ( 5+16)
    12 ( 8+4),  16 ( 8+8),  20 ( 8+12),  24 ( 8+16)
    17 (13+4),  21 (13+8),  25 (13+12),  29 (13+16)
(As the referenced paper points out, we don't need to construct the array S, we can generate the value of an item in S if or when we need it.)
The really interesting thing is that each column of S contains values in ascending order (of course), so we can extract the values from S in descending order by doing a merge of the columns (reading from the bottom).
Of course, merging the columns can be done using a priority queue (heap) -- hence the max-heap solution. The simplest approach being to start the heap with the bottom row of S, marking each heap item with the column it came from. Then pop the top of the heap, and push the next item from the same column as the one just popped, until you pop the kth item. (Since the bottom row is sorted, it is a trivial matter to seed the heap with it.)
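The column merge described above can be sketched with heapq, again storing negated values because heapq is a min-heap. The sketch works on sums, as in the example matrix S; the function name is made up, and raising ValueError for an out-of-range k is a choice made for this example.

```python
import heapq

def kth_largest_sum(A, B, k):
    """k-th largest value of a + b, merging the columns of S bottom-up."""
    if not 1 <= k <= len(A) * len(B):
        raise ValueError("k is out of range")
    rows = sorted(A)
    cols = sorted(B)
    last = len(rows) - 1
    # seed the heap with the bottom row of S, one entry per column
    heap = [(-(rows[last] + c), last, ci) for ci, c in enumerate(cols)]
    heapq.heapify(heap)
    for _ in range(k - 1):
        _, r, ci = heapq.heappop(heap)
        if r > 0:
            # push the next item up from the same column
            heapq.heappush(heap, (-(rows[r - 1] + cols[ci]), r - 1, ci))
    return -heap[0][0]

A = (2, 3, 5, 8, 13)
B = (4, 8, 12, 16)
print([kth_largest_sum(A, B, k) for k in (1, 2, 3, 4)])   # [29, 25, 24, 21]
```

The heap holds at most one entry per column, which is where the log n factor in the complexity comes from.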
The complexity of this is O(k log n) -- where 'n' is the number of columns. The procedure works equally well if you process the rows... so if there are 'm' rows and 'n' columns, you can choose the smaller of the two!
NB: the complexity is not O(k log k)... and since for a given pair of A and B the 'n' is constant, O(k log n) is really O(k)!
If you want to do many probes for different 'k', then the trick might be to cache the state of the process every now and then, so that future 'k's can be done by restarting from the nearest check-point. In the limit, one would run the merge to completion and store all possible values, for O(1) lookup!
