Ways to create unique single elimination brackets


I'm looking for a method to create all possible unique starting positions for a N-player (N is a power of 2) single elimination (knockout) bracket tournament.
Let's say we have players 'A', 'B', 'C', and 'D' and want to find all possible initial positions. The tournament would then look like this:
A vs B, C vs D. Then winner(AB) vs winner(CD).
(I will use the notation (A,B,C,D) for the setup above)
Those would simply be all possible permutations of 4 elements; there are 4! = 24 of them, and they are easy to generate.
But they wouldn't be unique for the tournament, since
(A,B,C,D), (B,A,C,D), (B,A,D,C), (C,D,A,B), ...
would all lead to the same matches being played.
In this case, the set of unique setups is, I think:
(A,B,C,D), (A,C,B,D), (A,D,C,B)
All other combinations would be "symmetric".
Now my questions would be for the general case of N=2^d players:
how many such unique setups are there?
is this a known problem I could look up? Haven't found it yet.
is there a method to generate them all?
how would this method look in python
(questions ranked by perceived usefulness)
I have stumbled upon this entry, but it does not really deal with the problem I'm discussing here.

how many such unique setups are there?
Let there be n teams. There are n! ways to list them in order. We'll start with that, then deal with the over-counting.
Say we have 8 teams. One possibility is
ABCDEFGH
Swapping teams 1 and 2 won't make a difference. We can have
BACDEFGH
and the same teams play. Divide by 2 to account for that. Swapping teams 3 and 4 won't make a difference either, so divide by 2 again. The same goes for teams 5 and 6, and for teams 7 and 8. In total there are n/2 such pairs (4 matches in the first round), so we take n! and divide by 2^(n/2).
But here is the thing. We can also have the order
CDABEFGH
In this example, we are swapping the first pair with the second pair. CDABEFGH is indistinguishable from ABCDEFGH for our purposes. There is one such swap for each of the n/4 second-round matches, so we can divide by a further 2^(n/4).
The same happens at every round: one factor of 2 per match. Since n/2 + n/4 + ... + 1 = n - 1 (the total number of matches), the number of distinct starting positions is n!/2^(n-1). For n = 4 that gives 24/8 = 3, matching the three setups listed in the question.
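For small n, the formula can be sanity-checked by brute force: enumerate all n! seedings and collapse each one to the set of matches it produces, treating the two halves of every sub-bracket as an unordered pair. This is only a sketch (the function names are illustrative, not from the answer) and is practical up to about n = 8:

```python
from itertools import permutations
from math import factorial


def canonical(seeding):
    """Collapse a seeding to the bracket it produces.

    Swapping the two halves of any sub-bracket yields the same matches,
    so each sub-bracket becomes an unordered pair (a frozenset)."""
    if len(seeding) == 1:
        return seeding[0]
    half = len(seeding) // 2
    return frozenset((canonical(seeding[:half]), canonical(seeding[half:])))


def count_by_brute_force(n):
    # Keep one representative per distinct bracket among all n! seedings.
    return len({canonical(p) for p in permutations(range(n))})


def count_by_formula(n):
    # n! orderings, halved once for each of the n - 1 matches.
    return factorial(n) // 2 ** (n - 1)


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, count_by_brute_force(n), count_by_formula(n))
```

Both columns agree for n = 2, 4 and 8 (1, 3 and 315).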
We can also think of it a bit differently. If we look at https://stackoverflow.com/posts/2269581/revisions, we can think of it as a tree.
```
               a
       a               e
   a       c       e       h
 a   b   c   d   e   f   h   g
```
Here there are 8! ways to arrange the letters at the base, each determining one way for the bracket to play out. If we only care about the starting position, it doesn't matter who won. There were a total of 7 games (and each of them could have gone either way), so we divide by 2^7 to account for that over-counting, giving 8!/2^7 = 315.
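To generate the setups directly (the question's last two points) without walking through all n! orderings, one possible approach is to pin the first player of each group into the left half at every node of the tree, which removes the left/right swap at that node. This is a sketch rather than a known reference algorithm; `brackets` and `count_brackets` are hypothetical names, and the output is flat seedings in the question's (A,B,C,D) notation:

```python
from itertools import combinations
from math import comb


def brackets(players):
    """Yield one seeding per distinct bracket of `players`.

    `players` is a tuple of distinct items whose length is a power of 2.
    Each result is a flat tuple: positions 0-1 meet in round one, etc."""
    n = len(players)
    if n == 1:
        yield players
        return
    first, rest = players[0], players[1:]
    half = n // 2
    # Pinning `first` to the left half removes the swap symmetry at this node;
    # choose its half - 1 companions from the remaining players.
    for companions in combinations(rest, half - 1):
        left = (first,) + companions
        right = tuple(p for p in rest if p not in companions)
        for l in brackets(left):
            for r in brackets(right):
                yield l + r


def count_brackets(n):
    # Same recursion, counted instead of listed: C(n-1, n/2-1) * count(n/2)^2.
    if n == 1:
        return 1
    return comb(n - 1, n // 2 - 1) * count_brackets(n // 2) ** 2


if __name__ == "__main__":
    for seeding in brackets(tuple("ABCD")):
        print(seeding)
```

For four players this yields (A,B,C,D), (A,C,B,D) and (A,D,B,C); the last is the question's (A,D,C,B) with the second pair swapped. `count_brackets` agrees with n!/2^(n-1) and shows how fast the numbers grow (315 for 8 players, 638,512,875 for 16), so listing them all is only feasible for small N.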

Related

Possible permutation of variable length arrays in Python

I have 3 different products (A, B, C) to produce.
The quantities to produce are fixed: A=3, B=3, C=2, so 8 products altogether.
The problem is that I only have 2 preparation lines for 3 production lines, each for a specific product: LineA, LineB, LineC.
The preparation lines can prepare all 3 types of products.
That means that I can only have 2 active production lines; the 3rd one is idle for that shift.
The duration of each working shift is equal.
So altogether I'll have 4 working shifts (8 products / 2 preparation lines).
My question is: how can I write an algorithm which shows me all possible permutations?
The output would be something similar to this (this is just one permutation; I'd need all possibilities to see the idle shifts):
```
LineA  LineB  LineC
  A      B      -
  A      -      C
  -      B      C
  A      B      -
```
EDIT:
The actual lists for the above mentioned output are:
AAA
BBB
CC
EDIT 2:
The itertools functions are not working here, as they don't take into account the finite number of items in each list.
I have a finite number of items, so I need a list of lists of all possible permutations/combinations (in this example I would need n 4x3 matrices).
First I would need one combination (like the one drawn in the example), then all the possible combinations of that.
Of course, as the number of products increases, there will be a bigger n of ?x3 matrices.
You can also forget the empty values, so in this case the result for the above example would be n 4x2 matrices.

If I am not overlooking something, the set of shift patterns (AB-, A-C, -BC) is constant: it is exactly the one from your example, and I cannot see any other way to produce the required quantities. So what you have to do is produce all permutations of the 4 production states (3 distinct ones) you showed.
If preparation lines are allowed to be idle, then things get markedly more interesting.
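A minimal sketch of that idea, assuming each active production line makes exactly one item per shift and both preparation lines are always busy (the case without idle preparation lines). `schedules`, `show` and `LINES` are hypothetical names. Instead of hard-coding the patterns, it builds the shifts recursively, so it also works for other quantities; for A=3, B=3, C=2 it finds the 4!/2! = 12 distinct orderings of AB-, AB-, A-C, -BC:

```python
from itertools import combinations

LINES = ("A", "B", "C")  # one production line per product type


def schedules(remaining, n_prep=2):
    """Yield every ordered list of shifts that uses up `remaining` exactly.

    remaining: dict product -> items still to make (mutated, then restored).
    Each shift runs `n_prep` different production lines, one item each;
    the other lines are idle for that shift."""
    if not any(remaining.values()):
        yield []
        return
    available = [p for p in LINES if remaining[p] > 0]
    for active in combinations(available, n_prep):
        for p in active:
            remaining[p] -= 1
        for rest in schedules(remaining, n_prep):
            yield [active] + rest
        for p in active:
            remaining[p] += 1


def show(schedule):
    """Print a schedule as a shifts x lines table, '-' marking an idle line."""
    print("  ".join(f"Line{p}" for p in LINES))
    for active in schedule:
        print("  ".join(f"{(p if p in active else '-'):^5}" for p in LINES))


if __name__ == "__main__":
    all_schedules = list(schedules({"A": 3, "B": 3, "C": 2}))
    print(len(all_schedules), "schedules")
    show(all_schedules[0])
```

Dead ends (for example, only C left to make) simply yield nothing, so only complete schedules come out.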

Possibilities to Split Book-Stack

I'm struggling with the following problem in Python. The problem is more math-specific than Python-specific.
I've got N books in a stack, and P stacks to split it into.
I'm looking for the possibilities to split this stack, avoiding repetitions and empty stacks.
So let's say my stack is 4 books tall: what are the possibilities to split it into 2 stacks?
The possibilities to split would be:
(1,3)
(2,2)
There would also be the possibility of (3,1), but since (1,3) is already in my output, I don't want (3,1) to be there too.
Another example:
5 books, 3 stacks
(3,1,1)
(2,2,1)
Solutions like (1,1,3) and (2,1,2) are not in my output because they are redundant.
I'm looking for an EFFICIENT way to compute the tuples of stacks.
I'm working with starting stack sizes up to 400; this stack could be split into further stacks, which could also be split, and so on.
Is there already a reference which covers this problem?
I think it would be easy to solve in combinatoric terms, but the problem here is that I am interested in the possibilities themselves and not just the number of possibilities!
Any help here?
cheers

Eliminating duplicates:
You can do this by keeping only the first permutation (in sorted order) of each combination, in other words having the smallest stacks in front.
E.g. of {1,2,3}, {1,3,2}, {2,1,3}, {2,3,1}, {3,1,2}, {3,2,1}, keep only {1,2,3}.
Efficiency:
You probably want to do this with recursion, so that at each step you know the size of the current stack is at least the size of the previous one.
You know that all following stack sizes have to be at least the current size. So the maximum size is the number of books left divided by the number of stacks left (rounded down).
E.g. with 10 books left for 3 stacks, floor(10/3) = 3, which is right, because the largest choice left at that point is {3,3,4}.
Hence this prevents you from stepping into a failing combination.
Code
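```python
def bookStack(cur, low, booksLeft, sizes):
    """Print every split of booksLeft books over the stacks sizes[cur:],
    in non-decreasing order and with no empty stack."""
    if len(sizes) == cur + 1:
        # The last stack takes whatever is left (never less than `low`).
        sizes[cur] = booksLeft
        print(sizes)
        return
    # Every later stack holds at least as many books as this one, so this
    # stack can take at most floor(booksLeft / stacksLeft) books.
    high = booksLeft // (len(sizes) - cur) + 1
    for take in range(low, high):
        sizes[cur] = take
        bookStack(cur + 1, take, booksLeft - take, sizes)
```
For 5 books over 3 stacks, call this with:
```python
bookStack(0, 1, 5, [0] * 3)  # prints [1, 1, 3] and [1, 2, 2]
```
Remark: Although you only get unique combinations, their number still grows quickly, so this is only practical for a small number of stacks, or when the number of stacks is almost equal to the number of books. You will notice.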
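If you need the tuples themselves (in the question's largest-first order) rather than printed lists, the same pruning can be written as a generator. `splits` and `count_splits` are hypothetical names; `count_splits` uses the standard recurrence for partitions of n into exactly k parts (either some stack holds exactly one book, so remove it, or every stack holds at least two, so take one book off each), which is handy for checking how many results to expect before enumerating:

```python
from functools import lru_cache


def splits(books, stacks, largest=None):
    """Yield each split of `books` into exactly `stacks` non-empty stacks,
    as a tuple in non-increasing order, e.g. (3, 1, 1)."""
    if largest is None:
        largest = books
    if stacks == 1:
        if 1 <= books <= largest:
            yield (books,)
        return
    # The first (largest) stack needs at least ceil(books / stacks) books,
    # and must leave at least one book for each remaining stack.
    low = -(-books // stacks)
    high = min(largest, books - (stacks - 1))
    for first in range(high, low - 1, -1):
        for rest in splits(books - first, stacks - 1, first):
            yield (first,) + rest


@lru_cache(maxsize=None)
def count_splits(books, stacks):
    """Number of splits of `books` into exactly `stacks` non-empty stacks."""
    if stacks == 0:
        return 1 if books == 0 else 0
    if books < stacks:
        return 0
    return count_splits(books - 1, stacks - 1) + count_splits(books - stacks, stacks)


if __name__ == "__main__":
    print(list(splits(5, 3)))
    print(count_splits(5, 3))
```

For 5 books over 3 stacks this yields (3, 1, 1) and (2, 2, 1), the question's example. Because it is a generator, you can consume the splits one at a time instead of holding them all in memory; the recursion depth grows with the number of stacks, so very large inputs may need a raised recursion limit.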

Statistics: Optimizing probability calculations within python

Setup:
The question is a more complex form of a classic probability question:
70 colored balls are placed in an urn, 10 for each of the seven rainbow colors.
What is the expected number of distinct colors in 20 randomly picked balls?
My solution uses Python's itertools library:
```python
import itertools

combos = itertools.combinations(urn, 20)
print(sum(1 for x in combos))
```
(where urn is a list of the 70 balls in the urn).
I can unpack the iterator up to combinations(urn, 8); past that, my computer can't handle it.
Note: I know this wouldn't give me the answer; this is only the roadblock in my script. In other words, if this worked, my script would work.
Question: How could I find the expected number of colors accurately, without the world's fastest supercomputer? Is my approach even computationally possible?

Since a couple of people have asked to see the mathematical solution, I'll give it. This is one of the Project Euler problems that can be done in a reasonable amount of time with pencil and paper. The answer is
7(1 - (60 choose 20)/(70 choose 20))
To get this write X, the count of colors present, as a sum X0+X1+X2+...+X6, where Xi is 1 if the ith color is present, and 0 if it is not present.
E(X)
= E(X0+X1+...+X6)
= E(X0) + E(X1) + ... + E(X6) by linearity of expectation
= 7E(X0) by symmetry
= 7 * probability that a particular color is present
= 7 * (1- probability that a particular color is absent)
= 7 * (1 - (# ways to pick 20 avoiding a color)/(# ways to pick 20))
= 7 * (1 - (60 choose 20)/(70 choose 20))
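The final line is cheap to evaluate exactly; a minimal sketch using `math.comb` (Python 3.8+) and `Fraction` to avoid rounding:

```python
from fractions import Fraction
from math import comb

# 7 * (1 - C(60,20)/C(70,20)), kept exact with Fraction.
expected = 7 * (1 - Fraction(comb(60, 20), comb(70, 20)))

if __name__ == "__main__":
    print(expected)         # exact rational value
    print(float(expected))  # decimal approximation
```

The decimal value comes out at roughly 6.819 expected colors.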
Expectation is always linear. So, when you are asked to find the average value of some random quantity, it often helps to try to rewrite the quantity as a sum of simpler pieces such as indicator (0-1) random variables.
This does not say how to make the OP's approach work. Although there is a direct mathematical solution, it is good to know how to iterate through the cases in an organized and practicable fashion. This could help if you next wanted a more complicated function of the set of colors present than the count. Duffymo's answer suggested something that I'll make more explicit:
You can break up the ways to draw 20 balls from 70 into categories indexed by the counts of colors. For example, the index (5,5,10,0,0,0,0) means we drew 5 of the first color, 5 of the second color, 10 of the third color, and none of the other colors.
The set of possible indices is contained in the collection of 7-tuples of nonnegative integers with sum 20. Some of these are impossible, such as (11,9,0,0,0,0,0) by the problem's assumption that there are only 10 balls of each color, but we can deal with that. The set of 7-tuples of nonnegative numbers adding up to 20 has size (26 choose 6)=230230, and it has a natural correspondence with the ways of choosing 6 dividers among 26 spaces for dividers or objects. So, if you have a way to iterate through the 6 element subsets of a 26 element set, you can convert these to iterate through all indices.
You still have to weight the cases by the number of ways to draw 20 balls from 70 that produce that case. The weight of (a0,a1,a2,...,a6) is (10 choose a0)(10 choose a1)...(10 choose a6). This handles impossible indices gracefully, since 10 choose 11 is 0, so the product is 0.
So, if you didn't know about the mathematical solution by the linearity of expectation, you could iterate through 230230 cases and compute a weighted average of the number of nonzero coordinates of the index vector, weighted by a product of small binomial terms.
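A sketch of that enumeration, assuming Python 3.8+ for `math.comb` and `math.prod`. It walks the 230230 divider placements with `itertools.combinations`, turns each into a count vector, and weights it by the product of binomials; the function and constant names are illustrative. As a built-in sanity check, the weights must add up to (70 choose 20) (Vandermonde's identity):

```python
from fractions import Fraction
from itertools import combinations
from math import comb, prod

COLORS, PER_COLOR, DRAWN = 7, 10, 20


def expected_colors_by_enumeration():
    """Weighted average of the number of colors present, over all count vectors."""
    slots = DRAWN + COLORS - 1  # 26 positions for 20 balls and 6 dividers
    total_weight = 0
    weighted_colors = 0
    for dividers in combinations(range(slots), COLORS - 1):  # 230230 cases
        bounds = (-1,) + dividers + (slots,)
        # counts[i] = number of balls between consecutive dividers
        counts = [bounds[i + 1] - bounds[i] - 1 for i in range(COLORS)]
        # math.comb(10, a) is 0 for a > 10, so impossible indices get weight 0
        weight = prod(comb(PER_COLOR, a) for a in counts)
        total_weight += weight
        weighted_colors += weight * sum(1 for a in counts if a > 0)
    # Every way to draw 20 of the 70 balls is counted exactly once.
    assert total_weight == comb(COLORS * PER_COLOR, DRAWN)
    return Fraction(weighted_colors, total_weight)


if __name__ == "__main__":
    print(float(expected_colors_by_enumeration()))
```

The result matches the closed form exactly, and the same loop works for any other function of the count vector.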

Wouldn't it just be combinations with repetition?
http://www.mathsisfun.com/combinatorics/combinations-permutations.html

Make an urn with 10 of each color.
Decide on the number of trials you want.
Make a container to hold the result of each trial.
For each trial, pick a random sample of twenty items from the urn, make a set of those items, and add the length of that set to the results.
Find the average of the results.
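Those steps translate almost line for line into Python; a minimal sketch with a seeded `random.Random` so runs are repeatable (the function name and seed are illustrative):

```python
import random
from statistics import mean


def simulate(trials, seed=None):
    """Monte Carlo estimate of the expected number of distinct colors."""
    rng = random.Random(seed)
    urn = [color for color in range(7) for _ in range(10)]  # 10 balls per color
    results = []
    for _ in range(trials):
        draw = rng.sample(urn, 20)       # 20 balls without replacement
        results.append(len(set(draw)))   # number of distinct colors drawn
    return mean(results)


if __name__ == "__main__":
    print(simulate(100_000, seed=1))
```

This only approximates the answer, but it needs no combinatorics and gets within a few thousandths of the exact value with a modest number of trials.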

Comparing similarity between multiple strings with a random starting point

I have a bunch of people's names tied to their respective identifying numbers (e.g. Social Security Number/National ID/Passport Number). Due to duplication, though, one identity number can have up to 100 names, which could be similar or totally different. E.g. ID 221 could have the names Richard Parker, Mary Parker, Aunt May, Parker Richard, M#rrrrryy Richard, etc. Some are typos, but some are totally different names.
Initially, I want to display only 3 (or a similarly small number) of the names that are as different as possible from the rest, so as to alert the viewer that the multiple names might not be typos but could even be a case of identity theft, negligent data capture, or anything else!
I've read up on algorithms to detect similarity and am currently looking at this one, which computes a score where 1 means the two strings are the same and a lower score means they are dissimilar. In my use case, how can I go through, say, 100 names and display the 3 that are most dissimilar? The algorithm escapes me, as I feel like I need a starting point, then compare it against all the others, loop again, etc.

Take the function from https://stackoverflow.com/a/14631287/1082673, as you mentioned, and iterate over all combinations in your list. This will work if you don't have too many entries; otherwise the computation time can increase pretty fast.
Here is how to generate the pairs for a given list:
```python
import itertools

persons = ['person1', 'person2', 'person3']
for p1, p2 in itertools.combinations(persons, 2):
    print("Compare", p1, "and", p2)
```
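Building on that loop, one way to pick the 3 names without choosing a starting point is to score every group of 3 by the sum of its pairwise similarities and keep the lowest-scoring group. The sketch below uses `difflib.SequenceMatcher.ratio()` from the standard library as a stand-in for the linked similarity function (an assumption; substitute your own scorer), and `most_dissimilar` is a hypothetical name. For 100 names that is 4,950 comparisons and 161,700 triples, which is manageable; for much larger lists a greedy pick would be cheaper.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    # Stand-in score in [0, 1]; 1 means identical. Swap in your own function.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def most_dissimilar(names, k=3):
    """Return the k names whose pairwise similarities add up to the least."""
    # Score each pair once; combinations() keeps the input order, so the
    # pairs taken from any group below match these dictionary keys.
    score = {pair: similarity(*pair) for pair in combinations(names, 2)}
    return min(
        combinations(names, k),
        key=lambda group: sum(score[pair] for pair in combinations(group, 2)),
    )


if __name__ == "__main__":
    names = ["Richard Parker", "Mary Parker", "Aunt May",
             "Parker Richard", "M#rrrrryy Richard"]
    print(most_dissimilar(names))
```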

Challenging dynamic programming problem

This is a toned-down version of a computer vision problem I need to solve. Suppose you are given parameters n, q and have to count the number of ways of assigning integers 0..(q-1) to the elements of an n-by-n grid so that for each assignment all of the following are true:
No two neighbors (horizontally or vertically) get the same value.
Value at position (i,j) is 0
Value at position (k,l) is 0
Since (i,j,k,l) are not given, the output should be an array of the counts above, one for every valid setting of (i,j,k,l).
A brute force approach is below. The goal is to get an efficient algorithm that works for q<=100 and for n<=18.
```python
def tuples(n, q):
    """All length-n lists over 0..q-1 (q**n of them)."""
    return [[a] + b for a in range(q) for b in tuples(n - 1, q)] if n > 1 else [[a] for a in range(q)]


def isvalid(t, n):
    """True if no two horizontally or vertically adjacent cells are equal."""
    grid = [t[n * i:n * (i + 1)] for i in range(n)]
    for r in range(n):
        for c in range(n):
            v = grid[r][c]
            left = grid[r][c - 1] if c > 0 else -1
            right = grid[r][c + 1] if c < n - 1 else -1
            top = grid[r - 1][c] if r > 0 else -1
            bottom = grid[r + 1][c] if r < n - 1 else -1
            if v in (left, right, top, bottom):
                return False
    return True


def count(n, q):
    # Enumerate the valid grids once, then count per (pos1, pos2).
    valid = [t for t in tuples(n ** 2, q) if isvalid(t, n)]
    result = []
    for pos1 in range(n ** 2):
        for pos2 in range(n ** 2):
            result.append(sum(1 for t in valid if t[pos1] == 0 and t[pos2] == 0))
    return result


assert count(2, 2) == [1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
```
Update 11/11
I've also asked this on the TopCoder forums, and their solution is the most efficient one I've seen so far (about 3 hours for n=10, any q, by the author's estimate).

Maybe this sounds too simple, but it works. Randomly assign values to all the cells except the two that are fixed at 0. Test all adjacent pairs. Keep computing the fraction of successful casts out of all casts until the variance drops to within an acceptable margin, then multiply that fraction by q^(n^2-2), the number of ways to fill the free cells, to estimate the count.
The risk goes to zero, and all that is at risk is a little runtime.
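A minimal sketch of that sampling estimate, using the question's flat row-major layout; `is_proper` and `estimate_count` are hypothetical names. Note that for large grids valid assignments are an extremely small fraction of random ones, so the estimate needs very many trials before it becomes useful:

```python
import random


def is_proper(grid, n):
    """True if no two horizontally or vertically adjacent cells are equal.

    grid is a flat, row-major list, as in the question's isvalid()."""
    for r in range(n):
        for c in range(n):
            v = grid[r * n + c]
            if c + 1 < n and v == grid[r * n + c + 1]:
                return False
            if r + 1 < n and v == grid[(r + 1) * n + c]:
                return False
    return True


def estimate_count(n, q, pos1, pos2, trials, seed=None):
    """Estimate how many proper grids have 0 at flat positions pos1 and pos2."""
    rng = random.Random(seed)
    free = n * n - len({pos1, pos2})  # cells that are actually random
    hits = 0
    for _ in range(trials):
        grid = [rng.randrange(q) for _ in range(n * n)]
        grid[pos1] = grid[pos2] = 0
        hits += is_proper(grid, n)
    # Fraction of valid random grids times the number of ways to fill free cells.
    return hits / trials * q ** free


if __name__ == "__main__":
    print(estimate_count(2, 2, 0, 3, 20000, seed=1))  # exact answer is 1
```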

This isn't an answer, just a contribution to the discussion which is too long for a comment.
tl;dr: Any algorithm which boils down to "compute the possibilities and count them", such as Eric Lippert's or a brute-force approach, won't work for Yaroslav's goal of q <= 100 and n <= 18.
Let's first think about a single n x 1 column. How many valid numberings of this one column exist? For the first cell we can pick between q numbers. Since we can't repeat vertically, we can pick between q - 1 numbers for the second cell, and therefore q - 1 numbers for the third cell, and so on. For q == 100 and n == 18 that means there are q * (q - 1) ^ (n - 1) = 100 * 99 ^ 17 valid colorings which is very roughly 10 ^ 36.
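The single-column count is easy to confirm by brute force for small values, and the q = 100, n = 18 figure can be checked with Python's arbitrary-precision integers; a small sketch (function names are illustrative):

```python
from itertools import product


def column_colorings(n, q):
    """Brute-force count of n x 1 columns with no two vertical neighbours equal."""
    return sum(
        all(col[r] != col[r + 1] for r in range(n - 1))
        for col in product(range(q), repeat=n)
    )


def column_formula(n, q):
    return q * (q - 1) ** (n - 1)


if __name__ == "__main__":
    for n, q in [(3, 3), (4, 5), (5, 4)]:
        print(n, q, column_colorings(n, q), column_formula(n, q))
    print(len(str(column_formula(18, 100))), "digits")  # the q = 100, n = 18 case
```

100 * 99^17 has 36 digits, i.e. it is a bit under 10^36.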
Now consider any two valid columns (call them the bread columns) separated by a buffer column (call it the mustard column). Here is a trivial algorithm to find a valid set of values for the mustard column when q >= 4. Start at the top cell of the mustard column. We only have to worry about the adjacent cells of the bread columns which have at most 2 unique values. Pick any third number for the mustard column. Consider the second cell of the mustard column. We must consider the previous mustard cell and the 2 adjacent bread cells with a total of at most 3 unique values. Pick the 4th value. Continue to fill out the mustard column.
We have at most 2 columns containing a hard coded cell of 0. Using mustard columns, we can therefore make at least 6 bread columns, each with about 10 ^ 36 solutions for a total of at least 10 ^ 216 valid solutions, give or take an order of magnitude for rounding errors.
There are, according to Wikipedia, about 10 ^ 80 atoms in the universe.
Therefore, be cleverer.

Update 11/11 I've also asked this on TopCoder forums, and their solution is the most efficient one I've seen so far (about 41 hours for n=10, any q, from author's estimate)
I'm the author. Not 41, just 3 embarrassingly parallelizable CPU hours. I've exploited symmetries: for n=10 there are only 675 really distinct pairs of (i,j) and (k,l). My program needs ~16 seconds for each (675 × 16 s ≈ 3 hours).
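The 675 figure can be reproduced by counting pairs of cells up to the 8 symmetries of the square (rotations and reflections), which map valid grids to valid grids. This sketch assumes the pairs are unordered and the two cells may coincide, which is the reading that gives 675 for n = 10; `symmetries` and `distinct_pairs` are hypothetical names:

```python
from itertools import combinations_with_replacement


def symmetries(n):
    """The 8 symmetries of an n x n grid, as maps on (row, col)."""
    m = n - 1
    return [
        lambda i, j: (i, j),            # identity
        lambda i, j: (j, m - i),        # rotate 90 degrees
        lambda i, j: (m - i, m - j),    # rotate 180 degrees
        lambda i, j: (m - j, i),        # rotate 270 degrees
        lambda i, j: (i, m - j),        # mirror left-right
        lambda i, j: (m - i, j),        # mirror top-bottom
        lambda i, j: (j, i),            # transpose
        lambda i, j: (m - j, m - i),    # anti-transpose
    ]


def distinct_pairs(n):
    """Count unordered cell pairs (possibly equal) up to grid symmetry."""
    syms = symmetries(n)
    cells = [(i, j) for i in range(n) for j in range(n)]
    seen = set()
    for a, b in combinations_with_replacement(cells, 2):
        # Canonical representative: smallest image of the pair under any symmetry.
        seen.add(min(tuple(sorted((g(*a), g(*b)))) for g in syms))
    return len(seen)


if __name__ == "__main__":
    print(distinct_pairs(10))
```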

I'm building on the contribution to the discussion by Dave Aaron Smith.
Let's ignore the last two constraints ((i,j) and (k,l)) for now.
With only one column (nx1) the solution is q * (q - 1) ^ (n - 1).
How many choices are there for a second column? q-1 for the top cell (1,2), but then q-1 or q-2 for the cell (2,2), depending on whether (1,2) and (2,1) have the same color or not.
Same thing for (3,2): q-1 or q-2 choices.
We can see we have a binary tree of possibilities, and we need to sum over that tree. Let's say the left child is always "same color on top and at left" and the right child is "different colors".
By computing over the tree the number of ways the left column can produce each such configuration, together with the number of possibilities for the new cells we are coloring, we would count the number of possible colorings of two columns.
But let's now consider the probability distribution of the coloring of the second column: if we want to iterate the process, we need a uniform distribution on the second column. It would be as if the first one never existed, and among all colorings of the first two columns we could say things like "1/q of them have color 0 in the top cell of the second column".
Without a uniform distribution it would be impossible.
The problem : is the distribution uniform ?
Answer :
We would have obtained the same number of solutions by building first the second column, then the first one, and then the third one. The distribution of the second column is uniform in that case, so it is also uniform in the first case.
We can now apply the same "tree idea" to count the number of possibilities for the third column.
I will try to develop this further and build a general formula (since the tree is of size 2^n, we don't want to explore it explicitly).
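One concrete way to count column by column is a transfer-matrix style DP that keeps, for every coloring of the current column, how many ways the columns so far can end in it. This is not the same/different tree described above: it tracks whole column colorings, so it needs no uniformity argument, but with q(q-1)^(n-1) column states it only works for small n and q. The fixed zeroes are passed in as constraints; `column_states` and `count_grid` are hypothetical names:

```python
from itertools import product


def column_states(n, q):
    """All colorings of one n x 1 column with no two vertical neighbours equal."""
    return [c for c in product(range(q), repeat=n)
            if all(c[r] != c[r + 1] for r in range(n - 1))]


def count_grid(n, q, fixed=None):
    """Count proper q-colorings of an n x n grid, column by column.

    fixed maps (row, col) -> required value, e.g. {(i, j): 0, (k, l): 0}.
    ways[t] = number of valid colorings of the columns so far that end in
    column coloring t."""
    fixed = fixed or {}
    states = column_states(n, q)

    def allowed(col):
        # Column colorings that respect the fixed cells in this column.
        return [s for s in states
                if all(s[r] == v for (r, c), v in fixed.items() if c == col)]

    ways = {s: 1 for s in allowed(0)}
    for col in range(1, n):
        # A new column t may follow s if no row repeats horizontally.
        ways = {t: sum(w for s, w in ways.items()
                       if all(a != b for a, b in zip(s, t)))
                for t in allowed(col)}
    return sum(ways.values())


if __name__ == "__main__":
    print(count_grid(2, 2, {(0, 0): 0, (1, 1): 0}))
    print(count_grid(3, 3))
```

`count_grid(2, 2, {(0, 0): 0, (1, 1): 0})` gives 1, in line with the question's assertion for flat positions 0 and 3.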

A few observations which might help other answerers as well:
The values 0..q-1 are interchangeable - they could be letters and the result would be the same.
The constraint that no neighbours match is a very mild one, so a brute-force approach will be excessively expensive. Even if you knew the values in all but one cell, there would still be at least q-4 possibilities for that cell, since it has at most four neighbours.
The output of this will be pretty long - every set of i,j,k,l will need a line. The number of combinations is something like n^2(n^2-3), since the two fixed zeroes can be anywhere except adjacent to each other, unless they need not obey the first rule. For n=18, the maximally hard case (q does not change the number of positions), this is 324 × 321 ≈ 10^5. So that's your minimum complexity, and it is unavoidable as the problem is currently stated.
There are simple cases - when q=2, there are only the two possible checkerboards, so for any given pair of zeroes the answer is 1 if the zeroes are an even Manhattan distance apart and 0 otherwise.
Point 3 makes the whole program O(n^2(n^2-3)) at a minimum, and also suggests that you will need something reasonably efficient for each pair of zeroes. For reference, at a second per pair, ~10^5 pairs take a bit over a day, or a few hours on a 12-core box.
I suspect that there is an elegant answer given a pair of zeroes, but I'm not sure that there is an analytic solution to it. Given that you can do it with 2 or 3 colours depending on the positions of the zeroes, you could split the map into a series of regions, each of which uses only 2 or 3 colours, and then it's just the number of different combinations of 2 or 3 in q (qC2 or qC3) for each region times the number of regions, times the number of ways of splitting the map.

I'm not a mathematician, but it occurs to me that there ought to be an analytical solution to this problem, namely:
First, compute how many different colourings are possible for an NxN board with Q colours (with the constraint that neighbours, defined as cells sharing an edge, don't get the same colour). This ought to be a pretty simple formula.
Then figure out how many of these solutions have 0 in (i,j); this should be a 1/Q fraction of them.
Then figure out how many of the remaining solutions have 0 in (k,l), depending on the Manhattan distance |i-k|+|j-l|, and possibly on the distance to the board edge and the "parity" of these distances, as in distance divisible by 2, by 3, or by Q.
The last part is the hardest, though I think it might still be doable if you are really good at math.
