Create a list of unique numbers by applying transitive closure


I have a list of tuples (each tuple consists of 2 numbers) like:
array = [(1, 2), (1, 3), (2, 4), (5, 8), (8, 10)]
Let's say these numbers are IDs of some DB objects (records), and each tuple holds the IDs of two duplicate objects. That means 1 and 2 are duplicates, and 1 and 3 are duplicates, so 2 and 3 are duplicates as well.
if a == b and b == c then a == c
Now I want to merge all these duplicate object IDs into a single tuple per group, like this:
output = [(1, 2, 3, 4), (5, 8, 10)]
I know I can do this using loops and redundant matches. I just want a better solution that needs less processing (if there is one).

You can use a data structure that makes merging more efficient. Here you build a kind of inverted tree, in which every node points to its parent. In your example you would first create the listed numbers:
1 2 3 4 5 8 10
Now if you iterate over the (1,2) tuple, you look up 1 and 2 in a dictionary, search for their ancestors (there are none yet), and then create a merge node above them:
1  2  3  4  5  8  10
 \/
 12
Next we merge (1,3) so we look up the ancestor of 1 (12) and 3 (3) and perform another merge:
1  2  3  4  5  8  10
 \/   |
 12  /
   \/
  123
Next we merge (2,4) and (5,8) and (8,10):
1  2  3  4  5  8  10
 \/   |  |   \/   |
 12  /   |   58  /
   \/   /      \/
  123  /      5810
     \/
    1234
You also keep a list of the "merge-heads" so you can easily return the elements.
Time to get our hands dirty
So now that we know how to construct such a datastructure, let's implement one. First we define a node:
class Merge:
    def __init__(self,value=None,parent=None,subs=()):
        self.value = value
        self.parent = parent
        self.subs = subs
    def get_ancestor(self):
        cur = self
        while cur.parent is not None:
            cur = cur.parent
        return cur
    def __iter__(self):
        if self.value is not None:
            yield self.value
        elif self.subs:
            for sub in self.subs:
                for val in sub:
                    yield val
Now we first collect the set of all elements that occur in your list:
vals = set(x for tup in array for x in tup)
and create a dictionary for every element in vals that maps to a Merge:
dic = {val:Merge(val) for val in vals}
and the merge_heads:
merge_heads = set(dic.values())
Now for each tuple in the array, we look up the corresponding Merge objects and climb to their ancestors. If both elements already share an ancestor there is nothing to do; otherwise we create a new Merge on top of the two ancestors, remove the two old heads from the merge_heads set and add the new merge to it:
for frm,to in array:
    mra = dic[frm].get_ancestor()
    mrb = dic[to].get_ancestor()
    if mra is mrb:
        continue  # already in the same group; removing the same head twice would raise KeyError
    mr = Merge(subs=(mra,mrb))
    mra.parent = mr
    mrb.parent = mr
    merge_heads.remove(mra)
    merge_heads.remove(mrb)
    merge_heads.add(mr)
Finally after we have done that we can simply construct a set for each Merge in merge_heads:
resulting_sets = [set(merge) for merge in merge_heads]
and resulting_sets will be (order may vary):
[{1, 2, 3, 4}, {8, 10, 5}]
Putting it all together (without class definition):
vals = set(x for tup in array for x in tup)
dic = {val:Merge(val) for val in vals}
merge_heads = set(dic.values())
for frm,to in array:
    mra = dic[frm].get_ancestor()
    mrb = dic[to].get_ancestor()
    if mra is mrb:
        continue  # already in the same group; removing the same head twice would raise KeyError
    mr = Merge(subs=(mra,mrb))
    mra.parent = mr
    mrb.parent = mr
    merge_heads.remove(mra)
    merge_heads.remove(mrb)
    merge_heads.add(mr)
resulting_sets = [set(merge) for merge in merge_heads]
In the worst case this runs in O(n²), but you can balance the tree so that the ancestor is found in O(log n) instead, making it O(n log n). Furthermore, you can short-circuit the chain of ancestors, making it even faster.
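The two optimisations mentioned above are usually known as union by size (keeping the trees shallow) and path compression (short-circuiting the ancestor chain). Below is a minimal sketch of both, assuming a plain dictionary-based forest instead of the Merge class; the names merge_pairs, parent and size are illustrative and not part of the answer's code.

```python
def merge_pairs(pairs):
    """Group elements linked by pairs (union by size + path compression)."""
    parent = {}  # element -> parent element (a root points to itself)
    size = {}    # root -> number of elements in its tree

    def find(x):
        # First pass: locate the root.
        root = x
        while parent[root] != root:
            root = parent[root]
        # Second pass: point every node on the path straight at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in pairs:
        for x in (a, b):
            if x not in parent:
                parent[x] = x
                size[x] = 1
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # already in the same group
        if size[ra] < size[rb]:
            ra, rb = rb, ra  # always hang the smaller tree under the larger
        parent[rb] = ra
        size[ra] += size[rb]

    groups = {}
    for x in parent:
        groups.setdefault(find(x), []).append(x)
    return [tuple(sorted(g)) for g in groups.values()]


array = [(1, 2), (1, 3), (2, 4), (5, 8), (8, 10)]
print(merge_pairs(array))  # [(1, 2, 3, 4), (5, 8, 10)]
```

Because find is iterative, this variant also avoids hitting Python's recursion limit on long chains.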

You can use a disjoint set.
A disjoint set is actually a kind of tree structure. Consider each number as a tree node; every time we read an edge (u, v), we join the trees containing u and v (creating a one-node tree for any number we have not seen yet) by pointing the root of one tree at the root of the other. At the end, we walk through the forest to collect the result.
from collections import defaultdict
def relation(array):
    mapping = {}
    def parent(u):
        if mapping[u] == u:
            return u
        mapping[u] = parent(mapping[u])
        return mapping[u]
    for u, v in array:
        if u not in mapping:
            mapping[u] = u
        if v not in mapping:
            mapping[v] = v
        mapping[parent(u)] = parent(v)
    results = defaultdict(set)
    for u in mapping.keys():
        results[parent(u)].add(u)
    return [tuple(x) for x in results.values()]
In the code above, mapping[u] stores an ancestor of node u (its parent or the root). In particular, the root of every tree, including a one-node tree, is its own ancestor.

See my comment on Moinuddin's answer: the accepted answer does not pass the tests that you can find at http://rosettacode.org/wiki/Set_consolidation#Python . I did not dig into why, though.
I would make a new proposal, based on Willem's answer.
The problem with that proposal is the repeated climbing in the get_ancestor calls: why should we climb the whole tree each time we are asked for our ancestor, when we could just remember the last root found (and still climb up from that point in case it changed)? Indeed, Willem's algorithm is not linear (something like n log n or n²), while we could remove this non-linearity just as easily.
Another problem comes from the iterator: if the tree is too deep (I had the problem in my use case), you get a Python exception (maximum recursion depth exceeded) inside the iterator. So instead of building a full binary tree, we should merge the sub-leaves: instead of branches with 2 leaves, we build branches with N leaves.
My version of the code is as follows:
class Merge:
    def __init__(self,value=None,parent=None,subs=None):
        self.value = value
        self.parent = parent
        self.subs = subs
        self.root = None
        if self.subs:
            subs_a,subs_b = self.subs
            if subs_a.subs:
                subs_a = subs_a.subs
            else:
                subs_a = [subs_a]
            if subs_b.subs:
                subs_b = subs_b.subs
            else:
                subs_b = [subs_b]
            self.subs = subs_a+subs_b
            for s in self.subs:
                s.parent = self
                s.root = None
    def get_ancestor(self):
        cur = self if self.root is None else self.root
        while cur.parent is not None:
            cur = cur.parent
        if cur != self:
            self.root = cur
        return cur
    def __iter__(self):
        if self.value is not None:
            yield self.value
        elif self.subs:
            for sub in self.subs:
                for val in sub:
                    yield val
def treeconsolidate(array):
    vals = set(x for tup in array for x in tup)
    dic = {val:Merge(val) for val in vals}
    merge_heads = set(dic.values())
    for settomerge in array:
        # note: pop() needs mutable sets (or lists) here, not tuples
        frm = settomerge.pop()
        for to in settomerge:
            mra = dic[frm].get_ancestor()
            mrb = dic[to].get_ancestor()
            if mra == mrb:
                continue
            mr = Merge(subs=[mra,mrb])
            merge_heads.remove(mra)
            merge_heads.remove(mrb)
            merge_heads.add(mr)
    resulting_sets = [set(merge) for merge in merge_heads]
    return resulting_sets
For small merges this will not change much, but my experience shows that climbing up the tree in huge sets of many elements can cost a lot: in my case, I have to deal with 100k sets, each of them containing between 2 and 1000 elements, and each element may appear in 1 to 1000 sets...

I think the most efficient way to achieve this is to use sets:
def transitive_closure(array):
    new_list = [set(array.pop(0))] # initialize first set with value of index `0`
    for item in array:
        for i, s in enumerate(new_list):
            if any(x in s for x in item):
                new_list[i] = new_list[i].union(item)
                break
        else:
            new_list.append(set(item))
    return new_list
Sample run:
>>> transitive_closure([(1,2), (1,3), (2,4), (5,8), (8,10)])
[{1, 2, 3, 4}, {8, 10, 5}]
Comparison with other answers (on Python 3.4):
This answer: 6.238126921001822
>>> timeit.timeit("moin()", setup="from __main__ import moin")
6.238126921001822
Willem's solution: 29.115453064994654 (Time related to declaration of class is excluded)
>>> timeit.timeit("willem()", setup="from __main__ import willem")
29.115453064994654
hsfzxjy's solution: 10.049749890022213
>>> timeit.timeit("hsfzxjy()", setup="from __main__ import hsfzxjy")
10.049749890022213
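One caveat worth checking with the first-match strategy above: a pair that bridges two groups which already exist is only added to the first of them, so the two groups never get joined. The sketch below reproduces that strategy under a hypothetical name (first_match_merge) and contrasts it with a variant (merge_all_overlaps, also a hypothetical name) that folds every overlapping group into the new one. It is an illustration of the issue, not a benchmarked replacement.

```python
def first_match_merge(pairs):
    """Add each pair to the first overlapping set only."""
    groups = []
    for pair in pairs:
        for i, group in enumerate(groups):
            if any(x in group for x in pair):
                groups[i] = group | set(pair)
                break
        else:
            groups.append(set(pair))
    return groups


def merge_all_overlaps(pairs):
    """Fold every overlapping set into the new one before storing it."""
    groups = []
    for pair in pairs:
        merged = set(pair)
        remaining = []
        for group in groups:
            if group & merged:
                merged |= group          # absorb the overlapping group
            else:
                remaining.append(group)  # untouched group stays as it is
        remaining.append(merged)
        groups = remaining
    return groups


bridging = [(1, 2), (3, 4), (2, 3)]
print(first_match_merge(bridging))   # [{1, 2, 3}, {3, 4}]
print(merge_all_overlaps(bridging))  # [{1, 2, 3, 4}]
```

The second variant scans all groups for every pair, so it trades speed for correctness on bridging input; a disjoint-set structure avoids that scan.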

Related

Pass in any variable while guaranteeing predefined relationship

I have a simple formula taking basic arithmetic calculations given several inputs.
a = 1
b = 2
c = a + b #3=1+2
d = 4
e = c + d #7=3+4
In theory, these relationships should always hold. I want to write a function that lets the user modify any variable, after which the rest are updated automatically (the update priority is predefined when there is more than one alternative, e.g. update the rightmost node first).
def f(**kwargs):
    #default state
    state = {'a':1, 'b':2, 'c':3, 'd':4, 'e':7}
    ...
    return state
f(a=0) == {'a':0, 'b':2, 'c':2, 'd':4, 'e':6}
f(c=4) == {'a':1, 'b':3, 'c':4, 'd':4, 'e':8}
f(b=2, c=4) == {'a':2, 'b':2, 'c':4, 'd':4, 'e':8}
I tried to use **kwargs and *args to allow the user to pass in any variable, but I then have to hard-code the update logic based on which variable was modified. Any better ideas?
P.S.: this example is for demonstration purposes; the real problem involves many more variables and the mathematical relationships are also more complex (logarithms, exponentials, ...)

You can indeed represent the state of the problem as a directed graph, where each edge between two variables carries an arithmetic expression used to update the state of the node it points to.
Then run a BFS starting at the variable you want to change; it will update all of the variables linked to it, then the ones linked to those, and so on.
If you have to change multiple variables, run the BFS multiple times, starting at each variable.
Then, if one of the variables you started a BFS from gets changed (and no longer holds the wanted value), you know the wanted state is not possible (because the variables are co-dependent).
So each time you add a variable to the problem, you only have to modify the edges connected to it (and from it).
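To make the idea concrete, here is a minimal sketch for the five-variable example from the question. The EDGES table, the order of its entries (which is used here to encode the "rightmost first" priority), the single multi-source BFS and the final consistency check are all assumptions made for illustration, not a definitive implementation.

```python
from collections import deque

DEFAULT = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 7}

# EDGES[x] lists (target, rule) pairs in priority order: when x changes,
# each not-yet-fixed target is recomputed from the current state.
EDGES = {
    'a': [('c', lambda s: s['a'] + s['b'])],
    'b': [('c', lambda s: s['a'] + s['b'])],
    'c': [('e', lambda s: s['c'] + s['d']),
          ('b', lambda s: s['c'] - s['a']),   # preferred: adjust b
          ('a', lambda s: s['c'] - s['b'])],  # fallback when b is fixed
    'd': [('e', lambda s: s['c'] + s['d'])],
    'e': [('d', lambda s: s['e'] - s['c']),
          ('c', lambda s: s['e'] - s['d'])],
}

# The relationships that must hold in every returned state.
CHECKS = [lambda s: s['c'] == s['a'] + s['b'],
          lambda s: s['e'] == s['c'] + s['d']]


def f(**changes):
    state = dict(DEFAULT)
    state.update(changes)
    visited = set(changes)   # user-supplied values are never overwritten
    queue = deque(changes)
    while queue:
        var = queue.popleft()
        for target, rule in EDGES[var]:
            if target in visited:
                continue
            state[target] = rule(state)
            visited.add(target)
            queue.append(target)
    if not all(check(state) for check in CHECKS):
        raise ValueError("requested values are inconsistent")
    return state


print(f(a=0))       # {'a': 0, 'b': 2, 'c': 2, 'd': 4, 'e': 6}
print(f(c=4))       # {'a': 1, 'b': 3, 'c': 4, 'd': 4, 'e': 8}
print(f(b=2, c=4))  # {'a': 2, 'b': 2, 'c': 4, 'd': 4, 'e': 8}
```

Adding a variable then only means adding its entries to EDGES and CHECKS; the traversal itself stays unchanged.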

Problem Analysis
Two numbers are combined with a mathematical operator, resulting in another number.
The resulting numbers depend on a relationship.
These relationships and numbers can be illustrated as a tree structure.
Hence, there are two types of cells (@Copperfield):
Free cells, not depending on a relationship, dangling as leaves in the tree.
Inner cells, depending on a relationship between two cells, we will call them nodes.
Resulting tree never makes cycles.
Assumptions and Rationale
In his comments, @B.Mr.W. says each relationship is formed by mathematical operators and more than one node can point to another node.
I assume he has relations like d = a - b * c. Evaluation of such expressions / relations has to follow operator precedence.
And, such expressions will be anyhow resolved to d = a - (b*c).
Evaluation of such expressions would result in sub-relationships which are again binary.
Note: These binary sub-relationships remain anonymous.
Requirements
1. Create new Leaf cells storing a number.
2. Create new Node cells by defining relationships between other cells.
3. A change to any cell should update all related cells without affecting the relationships; that is, a cascading effect along the related branches.
4. Requirement 3 has to follow an ordering preference. The right-side node is preferred by default.
Solution Analysis
Cells of either type can be combined with cells of the same or the other type using a mathematical operator. (Implemented in class Cell.)
I follow the Python data model to solve this problem, and the interface style becomes:
# create new Leaf nodes with values:
a = Leaf(2)
b = Leaf(3)
c = Leaf(4)
# define relationships
d = a + b*c # (b*c) becomes anonymous
e = d - Leaf(3) # Leaf(3) is also anonymous
( ( 2 + ( 3 * 4 ) ) - 3 ) => { 'a':2, 'b':3, 'c':4, 'd':14, 'e':11 }
# get values of known nodes
a(), b(), c(), d(), e()
# update values
a(1) => { 'a':1, 'b':3, 'c':4, 'd':13, 'e':10 }
b(2) => { 'a':1, 'b':2, 'c':4, 'd':9, 'e':6 }
c(3) => { 'a':1, 'b':2, 'c':3, 'd':7, 'e':4 }
Code:
"""Super-type of Leaves and Nodes."""
class Cell:
    """Arithmetic operation handlers between Leaves/Nodes"""
    def __add__(self, other):
        return Node(leftCell=self, rightCell=other, op='+')
    def __sub__(self, other):
        return Node(leftCell=self, rightCell=other, op='-')
    def __mul__(self, other):
        return Node(leftCell=self, rightCell=other, op='*')
    def __truediv__(self, other):
        return Node(leftCell=self, rightCell=other, op='/')
    """for clean presentation of float values"""
    @staticmethod
    def cleanFloat(val:int|float):
        if isinstance(val, float):
            rounded = round(val)
            if abs(rounded-val)<0.011:
                return rounded
            else:
                return round(val, 2)
        return val
"""Leaves will contain values only"""
class Leaf(Cell):
    def __init__(self, val:int|float):
        self.val = val
    """Getter/Setter for Leaf val"""
    @property
    def val(self):
        return self.cleanFloat(self.__val)
    @val.setter
    def val(self, val:int|float):
        self.__val = self.cleanFloat(val)
    """Getter/Setter of Leaf object."""
    def __call__(self, val:int|float=None):
        if val is None:
            return self.val
        else:
            self.val = val
    def __str__(self):
        return f"{self.val}"
"""Nodes contain left/right child, an arithmetic operation and preferred side for update"""
class Node(Cell):
    def __init__(self, leftCell:Cell, rightCell:Cell,
                 op:str, prefSide:str='right'):
        self.leftCell = leftCell
        self.rightCell = rightCell
        self.op = op
        self.prefSide = prefSide
    """
    Preferred and the other cells for reverse path operation required during update.
    These properties will help clean retrieval.
    """
    @property
    def preferredCell(self):
        match self.prefSide:
            case 'left' : return self.leftCell
            case 'right': return self.rightCell
    @property
    def unPreferredCell(self):
        match self.prefSide:
            case 'left' : return self.rightCell
            case 'right': return self.leftCell
    """Getter/Setter for Nodes"""
    def __call__(self, val :int|float = None):
        if val is None:
            match self.op:
                case '+':
                    nodeVal = self.leftCell() + self.rightCell()
                case '-':
                    nodeVal = self.leftCell() - self.rightCell()
                case '*':
                    nodeVal = self.leftCell() * self.rightCell()
                case '/':
                    nodeVal = self.leftCell() / self.rightCell()
                case _:
                    raise ValueError(f"unsupported operator: {self.op}")
            return self.cleanFloat(nodeVal)
        else:
            match self.op:
                case '+':
                    self.preferredCell( val - self.unPreferredCell() )
                case '*':
                    self.preferredCell( val / self.unPreferredCell() )
                case '-':
                    match self.prefSide:
                        case 'left' :
                            self.preferredCell( val + self.unPreferredCell() )
                        case 'right' :
                            self.preferredCell( self.unPreferredCell() - val )
                case '/':
                    match self.prefSide:
                        case 'left' :
                            self.preferredCell( val * self.unPreferredCell() )
                        case 'right' :
                            self.preferredCell( self.unPreferredCell() / val )
                case _:
                    raise ValueError(f"unsupported operator: {self.op}")
    def __str__(self):
        return f"( {str(self.leftCell)} {self.op} {str(self.rightCell)} )"
if __name__ == '__main__':
    def createTree():
        # create new Leaf nodes having values
        a = Leaf(2)
        b = Leaf(3)
        c = Leaf(4)
        d = Leaf(5)
        e = Leaf(6)
        # define relationships
        f = a + b
        g = d / f - c # (d / f) becomes anonymous, higher precedence
        h = Leaf(9) / e * g
        # here, (Leaf(9)/e) creates the anonymous node, left to right associativity
        return (a,b,c,d,e,f,g,h)
    (a,b,c,d,e,f,g,h) = createTree()
    def treeDict():
        return f"{{ 'a':{a()}, 'b':{b()}, 'c':{c()}, 'd':{d()}, 'e':{e()}, 'f':{f()}, 'g':{g()}, 'h':{h()} }}"
    print('\nget values of known cells:')
    print(f"{h} => {treeDict()}\n")
    print('each cell expanded (take care of anonymous cells):')
    print(f"'a':{a}\n'b':{b}\n'c':{c}\n'd':{d}\n'e':{e}\n'f':{f}\n'g':{g}\n'h':{h}\n")
    print('update values:')
    a(1)
    print( f"a(1) => {treeDict()}")
    b(2)
    print( f"b(2) => {treeDict()}")
    c(3)
    print( f"c(3) => {treeDict()}")
    f(10)
    print(f"f(10) => {treeDict()}")
    g(10)
    print(f"g(10) => {treeDict()}")
    h(100)
    print(f"h(100) => {treeDict()}")
    print('\nchange ordering preference: g.prefSide = "left"')
    (a,b,c,d,e,f,g,h) = createTree()
    g.prefSide = 'left'
    g(1)
    print(f"g(1) => {treeDict()}")
    print('\nchange ordering preference: g.prefSide = "left"')
    (a,b,c,d,e,f,g,h) = createTree()
    g.prefSide = 'left'
    h(0)
    print(f"h(0) => {treeDict()}")
    print("\nAccessing Anonymous Cells:")
    print(f"h.leftCell() : {h.leftCell()}")
    print(f"h.leftCell.leftCell() : {h.leftCell.leftCell()}")

I wonder if this (using SymPy) gets at the desired behavior:
from sympy.abc import *
from sympy import *
ed = Tuple(
    (Eq(d, a+b+c),c),
    (Eq(g, e+f),f),
    (Eq(h,d+g),g))
def indep(ed):
    free = set()
    dep = set()
    for i,_ in ed:
        assert i.lhs.is_Symbol
        free |=i.rhs.free_symbols
        dep |= {i.lhs}
    return free - dep
def F(x, v, eq):
    eq = list(eq)
    free = indep(eq)
    if x in free:
        # leaf
        return [i[0] for i in eq], free
    else:
        for i,y in eq:
            if i.lhs == x:
                return [i[0] for i in eq] + [Eq(x, v)], free - {y}
    assert None, 'unknown parameter'
def update(x, v, ed):
    eqs, ex = F(x, v, ed)
    sol = solve(eqs, exclude=ex, dict=True)
    assert len(sol) == 1
    return Dict(sol), list(ordered(ex))
ed represents the list of equations and the defined "dependent/rightmost" variable for each equation, however you want to define it.
So if you want to update g = 10 you would have the following values:
>>> update(g, 10, ed)
({d: a + b + c, f: 10 - e, g: 10, h: a + b + c + 10}, [a, b, c, e])
The variables on the right are the ones that you would have to specify in order to get known values for all others
>>> _[0].subs(dict(zip(_[1],(1,1,1,1))))
{d: 3, f: 9, g: 10, h: 13}

Is it possible to make this algorithm recursive?

Background
We have a family tradition where my and my siblings' Christmas presents are identified by a code that can be solved using only numbers related to us. For example, the code could be birth month * age + graduation year (This is a simple one). If the numbers were 8 * 22 + 2020 = 2196, the number 2196 would be written on all my Christmas presents.
I've already created a Python class that solves the code with certain constraints, but I'm wondering if it's possible to do it recursively.
Current Code
The first function returns a result set for all possible combinations of numbers and operations that produce a value in target_values
#Master algorithm (Get the result set of all combinations of numbers and cartesian products of operations that reach a target_value, using only the number_of_numbers_in_solution)
#Example: sibling1.results[1] = [(3, 22, 4), (<built-in function add>, <built-in function add>), 29]. This means that 3 + 22 + 4 = 29, and 29 is in target_values
import operator
from itertools import product
from itertools import combinations
NUMBER_OF_OPERATIONS_IN_SOLUTION = 2 #Total numbers involved is this plus 1
NUMBER_OF_NUMBERS_IN_SOLUTION = NUMBER_OF_OPERATIONS_IN_SOLUTION + 1
TARGET_VALUES = {22,27,29,38,39}
def getresults( list ):
    #Add the cartesian product of all possible operations to a variable ops
    ops = []
    opslist = [operator.add, operator.sub, operator.mul, operator.truediv]
    for val in product(opslist, repeat=NUMBER_OF_OPERATIONS_IN_SOLUTION):
        ops.append(val)
    #Get the result set of all combinations of numbers and cartesian products of operations that reach a target_value
    results = []
    for x in combinations(list, NUMBER_OF_NUMBERS_IN_SOLUTION):
        for y in ops:
            result = 0
            for z in range(len(y)):
                #On the first iteration, do the operation on the first two numbers (x[z] and x[z+1])
                if (z == 0):
                    #print(y[z], x[z], x[z+1])
                    result = y[z](x[z], x[z+1])
                #For all other iterations, do the operation on the current result and x[z+1])
                else:
                    #print(y[z], result, x[z+1])
                    result = y[z](result, x[z+1])
            if result in TARGET_VALUES:
                results.append([x, y, result])
            #print (x, y)
    print(len(results))
    return results
Then a class that takes in personal parameters for each person and gets the result set
def getalpha( str, inverse ):
    "Converts string to alphanumeric array of chars"
    array = []
    for i in range(0, len(str)):
        alpha = ord(str[i]) - 96
        if inverse:
            array.append(27 - alpha)
        else:
            array.append(alpha)
    return array;
class Person:
    def __init__(self, name, middlename, birthmonth, birthday, birthyear, age, orderofbirth, gradyear, state, zip, workzip, cityfirst3):
        #final list
        self.listofnums = []
        self.listofnums.extend((birthmonth, birthday, birthyear, birthyear - 1900, age, orderofbirth, gradyear, gradyear - 2000, zip, workzip))
        self.listofnums.extend(getalpha(cityfirst3, False))
        self.results = getresults(self.listofnums)
Finally, a "solve code" method that takes from the result sets and finds any possible combinations that produce the full list of target_values.
#Compares the values of two sets
def compare(l1, l2):
    result = all(map(lambda x, y: x == y, l1, l2))
    return result and len(l1) == len(l2)
#Check every result in sibling2 with a different result target_value and equal operation sets
def comparetwosiblings(current_values, sibling1, sibling2, a, b):
    if sibling2.results[b][2] not in current_values and compare(sibling1.results[a][1], sibling2.results[b][1]):
        okay = True
        #If the indexes aren't alphanumeric, ensure they're the same before adding to new result set
        for c in range(0, NUMBER_OF_NUMBERS_IN_SOLUTION):
            indexintersection = set([index for index, value in enumerate(sibling1.listofnums) if value == sibling1.results[a][0][c]]) & set([index for index, value in enumerate(sibling2.listofnums) if value == sibling2.results[b][0][c]])
            if len(indexintersection) > 0:
                okay = True
            else:
                okay = False
                break
    else:
        okay = False
    return okay
#For every result, we start by adding the result number to the current_values list for sibling1, then cycle through each person and see if a matching operator list leads to a different result number. (Matching indices as well)
#If there's a result set for everyone that leads to five different numbers in the code, the values will be added to the newresult set
def solvecode( sibling1, sibling2, sibling3, sibling4, sibling5 ):
    newresults = []
    current_values = []
    #For every result in sibling1
    for a in range(len(sibling1.results)):
        current_values = []
        current_values.append(sibling1.results[a][2])
        for b in range(len(sibling2.results)):
            if comparetwosiblings(current_values, sibling1, sibling2, a, b):
                current_values.append(sibling2.results[b][2])
                for c in range(len(sibling3.results)):
                    if comparetwosiblings(current_values, sibling1, sibling3, a, c):
                        current_values.append(sibling3.results[c][2])
                        for d in range(len(sibling4.results)):
                            if comparetwosiblings(current_values, sibling1, sibling4, a, d):
                                current_values.append(sibling4.results[d][2])
                                for e in range(len(sibling5.results)):
                                    if comparetwosiblings(current_values, sibling1, sibling5, a, e):
                                        newresults.append([sibling1.results[a][0], sibling2.results[b][0], sibling3.results[c][0], sibling4.results[d][0], sibling5.results[e][0], sibling1.results[a][1]])
                                current_values.remove(sibling4.results[d][2])
                        current_values.remove(sibling3.results[c][2])
                current_values.remove(sibling2.results[b][2])
    print(len(newresults))
    print(newresults)
It's the last "solvecode" method that I'm wondering if I can optimize and make into a recursive algorithm. In some cases it can be helpful to add or remove a sibling, which would look nice recursively (My mom sometimes makes a mistake with one sibling, or we get a new brother/sister-in-law)
Thank you for any and all help! I hope you at least get a laugh out of my weird family tradition.
Edit: In case you want to test the algorithm, here's an example group of siblings that result in exactly one correct solution
#ALL PERSONAL INFO CHANGED FOR STACKOVERFLOW
sibling1 = Person("sibling1", "horatio", 7, 8, 1998, 22, 5, 2020, "ma", 11111, 11111, "red")
sibling2 = Person("sibling2", "liem", 2, 21, 1995, 25, 4, 2018, "ma", 11111, 11111, "pho")
sibling3 = Person("sibling3", "kyle", 4, 21, 1993, 26, 3, 2016, "ma", 11111, 11111, "okl")
sibling4 = Person("sibling4", "jamal", 4, 7, 1991, 29, 2, 2014, "ma", 11111, 11111, "pla")
sibling5 = Person("sibling5", "roberto", 9, 23, 1990, 30, 1, 2012, "ma", 11111, 11111, "boe")

I just spent a while improving the code. A few things I need to mention:
It's not good practice to use Python built-in names (like list, str and zip) as variable names; shadowing them will cause problems and makes the code harder to debug.
I feel like you should use the permutations function: combinations yields unordered selections, while permutations yields ordered ones, which are more numerous and will give more results. For example, for the sibling info you gave, combinations gives only 1 solution through solvecode() while permutations gives 12.
Because you are working with operators, there can be more cases once brackets are involved. To solve that problem and to make the getresults() function a bit more optimized, I suggest you explore reverse Polish notation. Computerphile has an excellent video on it.
You don't need a compare function. list1==list2 works.
Here's the optimized code:
import operator
from itertools import product
from itertools import permutations
NUMBER_OF_OPERATIONS_IN_SOLUTION = 2 #Total numbers involved is this plus 1
NUMBER_OF_NUMBERS_IN_SOLUTION = NUMBER_OF_OPERATIONS_IN_SOLUTION + 1
TARGET_VALUES = {22,27,29,38,39}
def getresults(listofnums):
    #Add the cartesian product of all possible operations to a variable ops
    ops = []
    opslist = [operator.add, operator.sub, operator.mul, operator.truediv]
    for val in product(opslist, repeat=NUMBER_OF_OPERATIONS_IN_SOLUTION):
        ops.append(val)
    #Get the result set of all combinations of numbers and cartesian products of operations that reach a target_value
    results = []
    for x in permutations(listofnums, NUMBER_OF_NUMBERS_IN_SOLUTION):
        for y in ops:
            result = y[0](x[0], x[1])
            if NUMBER_OF_OPERATIONS_IN_SOLUTION>1:
                for z in range(1, len(y)):
                    result = y[z](result, x[z+1])
            if result in TARGET_VALUES:
                results.append([x, y, result])
    return results
def getalpha(string, inverse):
    "Converts string to alphanumeric array of chars"
    array = []
    for i in range(0, len(string)):
        alpha = ord(string[i]) - 96
        array.append(27-alpha if inverse else alpha)
    return array
class Person:
    def __init__(self, name, middlename, birthmonth, birthday, birthyear, age, orderofbirth, gradyear, state, zipcode, workzip, cityfirst3):
        #final list
        self.listofnums = [birthmonth, birthday, birthyear, birthyear - 1900, age, orderofbirth, gradyear, gradyear - 2000, zipcode, workzip]
        self.listofnums.extend(getalpha(cityfirst3, False))
        self.results = getresults(self.listofnums)
#Check every result in sibling2 with a different result target_value and equal operation sets
def comparetwosiblings(current_values, sibling1, sibling2, a, b):
    if sibling2.results[b][2] not in current_values and sibling1.results[a][1]==sibling2.results[b][1]:
        okay = True
        #If the indexes aren't alphanumeric, ensure they're the same before adding to new result set
        for c in range(0, NUMBER_OF_NUMBERS_IN_SOLUTION):
            indexintersection = set([index for index, value in enumerate(sibling1.listofnums) if value == sibling1.results[a][0][c]]) & set([index for index, value in enumerate(sibling2.listofnums) if value == sibling2.results[b][0][c]])
            if len(indexintersection) > 0:
                okay = True
            else:
                okay = False
                break
    else:
        okay = False
    return okay
And now, the million-dollar function, or should I say two functions:
# var contains the loop variables a-e, depth keeps track of sibling number
def rec(arg, var, current_values, newresults, depth):
    for i in range(len(arg[depth].results)):
        if comparetwosiblings(current_values, arg[0], arg[depth], var[0], i):
            if depth<len(arg)-1:
                current_values.append(arg[depth].results[i][2])
                rec(arg, var[:depth]+[i], current_values, newresults, depth+1)
                current_values.remove(arg[depth].results[i][2])
            else:
                # build a fresh index list per match; extending var in place would keep stale indices
                found = var[:depth]+[i]
                # one entry per sibling (not a hard-coded five), then the shared operator tuple
                newresults.append([arg[k].results[found[k]][0] for k in range(len(arg))] + [arg[0].results[found[0]][1]])
def solvecode(*arg):
    newresults = []
    for a in range(len(arg[0].results)):
        current_values = [arg[0].results[a][2]]
        rec(arg, var=[a], current_values=current_values, newresults=newresults, depth=1)
    print(len(newresults))
    print(newresults)
There is a need for two functions: the first one is the recursive one and the second one is a wrapper around it. I've also fulfilled your second wish, which was being able to pass a variable number of siblings into the new solvecode function. I've checked the new functions and they work together exactly like the original solvecode function. Note that there is no significant difference between the two versions' runtimes, although the second one has 8 fewer lines of code. Hope this helped. lmao took me 3 hours.
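As a rough illustration of the reverse Polish notation idea mentioned above: in RPN every operator follows its operands, so a bracketed expression becomes a flat token list that a stack can evaluate without any bracket handling. The sketch below is a minimal evaluator; eval_rpn and the OPS table are hypothetical names, and it is not wired into getresults().

```python
import operator

# Operator tokens mapped to the same functions used in getresults().
OPS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}


def eval_rpn(tokens):
    """Evaluate a list of numbers and operator symbols in postfix order."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()   # the most recent operand is the right-hand one
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(tok)
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


# birth month * age + graduation year, evaluated left to right
print(eval_rpn([8, 22, '*', 2020, '+']))  # 2196
# 2 + (3 * 4): a bracketed case the left-to-right fold cannot express
print(eval_rpn([2, 3, 4, '*', '+']))      # 14
```

Enumerating token orderings instead of fixed operator tuples would then cover the bracketed cases as well, at the cost of a larger search space.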

How to structure a program to work with minesweeper configurations

EDIT: This was a while ago and I've since got it working, if you'd like to see the code it's included at github.com/LewisGaul/minegaulerQt.
I'm trying to write a program to calculate probabilities for the game minesweeper, and have had some difficulty working out how best to structure it. While it may seem quite simple at first with the example below, I would like to know the best way to allow for more complex configurations. Note I am not looking for help with how to calculate probabilities - I know the method, I just need to implement it!
To make it clear what I'm trying to calculate, I will work through a simple example which can be done by hand. Consider a minesweeper configuration
# # # #
# 1 2 #
# # # #
where # represents an unclicked cell. The 1 tells us there is exactly 1 mine in the leftmost 7 unclicked cells, the 2 tells us there are exactly 2 in the rightmost 7. To calculate the probability of each individual cell containing a mine, we need to determine all the different cases (only 2 in this simple case):
1 mine in leftmost 3 cells, 2 mines in rightmost 3 cells (total of 3 mines, 3x3=9 combinations).
1 mine in center 4 cells, 1 mine in rightmost 3 cells (total of 2 mines, 4x3=12 combinations).
Given the probability of a mine being in a random cell is about 0.2, it is (in a random selection of cells) about 4 times more likely there is a total of 2 mines rather than a total of 3, so the total number of mines in a configuration matters, as well as the number of combinations of each configuration. So in this case the probability of case 1 is 9/(9+4x12)=0.158, and the probability of there being a mine in a given leftmost cell is therefore about 0.158/3=0.05, as those cells are effectively equivalent (they share exactly the same revealed neighbours).
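The arithmetic above can be reproduced with a short script. This is only a sketch of the hand calculation: it assumes each case is weighted by its number of combinations times (p/(1-p)) raised to its mine count, with p = 0.2, which gives the factor of 4 between the two cases. The names MINE_DENSITY, GROUP_SIZES and CASES are illustrative.

```python
from math import comb  # Python 3.8+

MINE_DENSITY = 0.2        # assumed prior probability that a cell holds a mine
GROUP_SIZES = (3, 4, 3)   # leftmost 3, centre 4, rightmost 3 cells

# Mines per group (left, centre, right) for the two valid cases.
CASES = [(1, 0, 2), (0, 1, 1)]


def case_weight(counts):
    """Number of arrangements times the relative prior of the mine total."""
    ways = 1
    for size, k in zip(GROUP_SIZES, counts):
        ways *= comb(size, k)
    odds = MINE_DENSITY / (1 - MINE_DENSITY)
    return ways * odds ** sum(counts)


weights = [case_weight(c) for c in CASES]
total = sum(weights)
case_probs = [w / total for w in weights]

# Per-cell probability in each group: expected mines in the group / group size.
cell_probs = [
    sum(p * counts[g] for p, counts in zip(case_probs, CASES)) / GROUP_SIZES[g]
    for g in range(len(GROUP_SIZES))
]

print(round(case_probs[0], 3))  # 0.158
print(round(cell_probs[0], 3))  # 0.053
```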
I have created a GUI with Tkinter which allows me to easily enter configurations such as the one in the example, which stores the grid as a numpy array. I then made a NumberGroup class which isolates each of the clicked/numbered cells, storing the number and a set of the coordinates of its unclicked neighbours. These can be subtracted to get equivalence groups... Although this would not be as straightforward if there were three or more numbers instead of just two. But I am unsure how to go from here to getting the different configurations. I toyed with making a Configuration class, but am not hugely familiar with how different classes should work together. See working code below (numpy required).
Note: I am aware I could have attempted to use a brute force approach, but if possible I would like to avoid that, keeping the equivalent groups separate (in the above example there are 3 equivalence groups, the leftmost 3, the middle 4, the rightmost 3). I would like to hear your thoughts on this.
import numpy as np
grid = np.array(
    [[0, 0, 0, 0],
     [0, 2, 1, 0],
     [0, 0, 0, 0]]
    )
dims = (3, 4) #Dimensions of the grid
class NumberGroup(object):
    def __init__(self, mines, coords, dims=None):
        """Takes a number of mines, and a set of coordinates."""
        if dims:
            self.dims = dims
        self.mines = mines
        self.coords = coords
    def __repr__(self):
        return "<Group of {} cells with {} mines>".format(
            len(self.coords), self.mines)
    def __str__(self):
        if hasattr(self, 'dims'):
            dims = self.dims
        else:
            dims = (max([c[0] for c in self.coords]) + 1,
                    max([c[1] for c in self.coords]) + 1)
        grid = np.zeros(dims, int)
        for coord in self.coords:
            grid[coord] = 1
        return str(grid).replace('0', '.').replace('1', '#')
    def __sub__(self, other):
        if type(other) is NumberGroup:
            return self.coords - other.coords
        elif type(other) is set:
            # a plain set has no .coords attribute, so subtract it directly
            return self.coords - other
        else:
            raise TypeError("Can only subtract a group or a set from another.")
def get_neighbours(coord, dims):
    x, y = coord
    row = [u for u in range(x-1, x+2) if u in range(dims[0])]
    col = [v for v in range(y-1, y+2) if v in range(dims[1])]
    return {(u, v) for u in row for v in col}
groups = []
all_coords = [(i, j) for i in range(dims[0])
              for j in range(dims[1])]
for coord, nr in [(c, grid[c]) for c in all_coords if grid[c] > 0]:
    empty_neighbours = {c for c in get_neighbours(coord, dims)
                        if grid[c] == 0}
    if nr > len(empty_neighbours):
        print("Error: number {} in cell {} is too high.".format(nr, coord))
        break
    groups.append(NumberGroup(nr, empty_neighbours, dims))
print(groups)
for g in groups:
    print(g)
print(groups[0] - groups[1])
UPDATE:
I have added a couple of other classes and restructured a bit (see below for working code), and it is now capable of creating and displaying the equivalence groups, which is a step in the right direction. However I still need to work out how to iterate through all the possible mine-configurations, by assigning a number of mines to each group in a way that creates a valid configuration. Any help is appreciated.
For example,
# # # #
# 2 1 #
# # # #
There are three equivalence groups G1: the left 3, G2: the middle 4, G3: the right 3. I want the code to loop through, assigning groups with mines in the following way:
G1=2 (max the first group) => G2=0 => G3=1 (this is all configs with G1=2)
G1=1 (decrease by one) => G2=1 => G3=0 (this is all with G1=1)
G1=0 => G2=2 INVALID
So we arrive at both configurations. This needs to work for more complicated setups!
import numpy as np
def get_neighbours(coord, dims):
    x, y = coord
    row = [u for u in range(x-1, x+2) if u in range(dims[0])]
    col = [v for v in range(y-1, y+2) if v in range(dims[1])]
    return {(u, v) for u in row for v in col}
class NrConfig(object):
    def __init__(self, grid):
        self.grid = grid
        self.dims = grid.shape  # Dimensions of grid
        self.all_coords = [(i, j) for i in range(self.dims[0])
                           for j in range(self.dims[1])]
        self.numbers = dict()
        self.groups = []
        self.configs = []
        self.get_numbers()
        self.get_groups()
        self.get_configs()
    def __str__(self):
        return str(self.grid).replace('0', '.')
    def get_numbers(self):
        for coord, nr in [(c, self.grid[c]) for c in self.all_coords
                          if self.grid[c] > 0]:
            empty_neighbours = {c for c in get_neighbours(
                coord, self.dims) if self.grid[c] == 0}
            if nr > len(empty_neighbours):
                print("Error: number {} in cell {} is too high.".format(
                    nr, coord))
                return
            self.numbers[coord] = Number(nr, coord, empty_neighbours,
                                         self.dims)
    def get_groups(self):
        coord_neighbours = dict()
        for coord in [c for c in self.all_coords if self.grid[c] == 0]:
            # Must be a set so that order doesn't matter!
            coord_neighbours[coord] = {self.numbers[c] for c in
                get_neighbours(coord, self.dims) if c in self.numbers}
        while coord_neighbours:
            coord, neighbours = coord_neighbours.popitem()
            equiv_coords = [coord] + [c for c, ns in coord_neighbours.items()
                                      if ns == neighbours]
            for c in equiv_coords:
                if c in coord_neighbours:
                    del(coord_neighbours[c])
            self.groups.append(EquivGroup(equiv_coords, neighbours, self.dims))
    def get_configs(self):
        pass  # WHAT GOES HERE?!
class Number(object):
    """Contains information about the group of cells around a number."""
    def __init__(self, nr, coord, neighbours, dims):
        """Takes a number of mines, and a set of coordinates."""
        self.nr = nr
        self.coord = coord
        # A set of the available neighbouring cells' coords.
        self.neighbours = neighbours
        self.dims = dims
    def __repr__(self):
        return "<Number {} with {} empty neighbours>".format(
            int(self), len(self.neighbours))
    def __str__(self):
        grid = np.zeros(self.dims, int)
        grid[self.coord] = int(self)
        for coord in self.neighbours:
            grid[coord] = 9
        return str(grid).replace('0', '.').replace('9', '#')
    def __int__(self):
        # self.nr comes from a numpy array; convert it to a plain int.
        return int(self.nr)
class EquivGroup(object):
    """A group of cells which are effectively equivalent."""
    def __init__(self, coords, nrs, dims):
        self.coords = coords
        # A set of the neighbouring Number objects.
        self.nr_neighbours = nrs
        self.dims = dims
        if self.nr_neighbours:
            self.max_mines = min(len(self.coords),
                                 max(map(int, self.nr_neighbours)))
        else:
            self.max_mines = len(coords)
    def __repr__(self):
        return "<Equivalence group containing {} cells>".format(
            len(self.coords))
    def __str__(self):
        grid = np.zeros(self.dims, int)
        for coord in self.coords:
            grid[coord] = 9
        for number in self.nr_neighbours:
            grid[number.coord] = int(number)
        return str(grid).replace('0', '.').replace('9', '#')
grid = np.array(
    [[0, 0, 0, 0],
     [0, 2, 1, 0],
     [0, 0, 0, 0]]
)
config = NrConfig(grid)
print(config)
print("Number groups:")
for n in config.numbers.values():
    print(n)
print("Equivalence groups:")
for g in config.groups:
    print(g)

If you don't want to brute-force it, you could model the process as a decision tree. Suppose we start with your example:
####
#21#
####
If we want to start placing mines in a valid configuration, we at this point essentially have eight choices. Since it doesn't really matter which square we pick within an equivalence group, we can narrow that down to three choices. The tree branches. Let's go down one branch:
*###
#11#
####
I placed a mine in G1, indicated by the asterisk. Also, I've updated the numbers (just one number in this case) associated with this equivalence group, to indicate that these numbered squares can now border one fewer mines.
This hasn't reduced our freedom of choice for the following step: we can still place a mine in any of the equivalence groups. Let's place another one in G1:
*XX#
*01#
XXX#
Another asterisk marks the new mine, and the numbered square has again been lowered by one. It has now reached zero, meaning it cannot border any more mines. That means that for our next choice of mine placement, all the equivalence groups dependent upon this numbered square are ruled out. Xs mark squares where we can now not place any mine. We can only make one choice now:
*XX*
*00X
XXXX
Here the branch ends and you've found a valid configuration. By running along all the branches in this tree in this manner, you should find all of them. Here we found your first configuration. Of course, there's more than one way to get there. If we had started by placing a mine in G3, we would have been forced to place the other two in G1. That branch leads to the same configuration, so you should check for duplicates. I don't see a way to avoid this redundancy right now.
The second configuration is found by either starting with G2, or placing one mine in G1 and then the second in G2. In either case you again end up at a branch end:
**XX
X00X
XXXX
Invalid configurations like your example with zero mines in G1 do not pop up. There are no valid choices along the tree that lead you there. Here is the whole tree of valid choices.
Choice 1:  1     | 2 | 3
Choice 2:  1 2 3 | 1 | 1
Choice 3:  3   1 |   | 1
Valid configurations are the branch ends at which no further choice is possible, i.e.
113
12
131
21
311
which obviously fall into two equivalence classes if we disregard the order of the numbers.
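The duplicate branches can be avoided by deciding how many mines go into each equivalence group, one group at a time, instead of placing mines one by one. The following is a minimal sketch under that assumption, not the code from the question: `enumerate_configs`, the `(size, bordering_numbers)` tuples and the keys `"two"` and `"one"` are hypothetical names chosen for illustration.

```python
def enumerate_configs(groups, numbers):
    """List every valid assignment of mine counts to equivalence groups.

    groups  -- list of (size, bordering_numbers) tuples (hypothetical layout)
    numbers -- dict mapping a number's key to the mines it still requires
    """
    results = []

    def recurse(index, remaining, counts):
        if index == len(groups):
            # Valid only if every numbered cell is exactly satisfied.
            if all(left == 0 for left in remaining.values()):
                results.append(tuple(counts))
            return
        size, borders = groups[index]
        # A group cannot hold more mines than it has cells, nor more than
        # any bordering number still allows.
        limit = min([size] + [remaining[key] for key in borders])
        for mines in range(limit, -1, -1):
            for key in borders:
                remaining[key] -= mines
            recurse(index + 1, remaining, counts + [mines])
            for key in borders:
                remaining[key] += mines  # undo before trying the next count

    recurse(0, dict(numbers), [])
    return results


# The example grid: G1 borders only the 2, G2 borders both, G3 only the 1.
groups = [(3, {"two"}), (4, {"two", "one"}), (3, {"one"})]
numbers = {"two": 2, "one": 1}
configs = enumerate_configs(groups, numbers)
print(configs)
```

Each tuple gives the number of mines in G1, G2 and G3. Because each group is decided exactly once, no configuration is reached along more than one path, so no duplicate check is needed in this sketch.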

python - unique set of ranges, merging when needed

Is there a data structure that will maintain a unique set of ranges, merging any contiguous or overlapping ranges that are added? I need to track which ranges have been processed, but this may occur in an arbitrary order. E.g.:
range_set = RangeSet() # doesn't exist that I know of, this is what I need help with
def process_data(start, end):
    global range_set
    range_set.add_range(start, end)
    # ...
process_data(0, 10)
process_data(20, 30)
process_data(5, 15)
process_data(50, 60)
print(range_set.missing_ranges())
# [[16,19], [31, 49]]
print(range_set.ranges())
# [[0,15], [20,30], [50, 60]]
Notice that overlapping or contiguous ranges get merged together. What is the best way to do this? I looked at using the bisect module, but its use didn't seem terribly clear.

Another approach is based on sympy.sets.
>>> import sympy as sym
>>> a = sym.Interval(1, 2, left_open=False, right_open=False)
>>> b = sym.Interval(3, 4, left_open=False, right_open=False)
>>> domain = sym.Interval(0, 10, left_open=False, right_open=False)
>>> missing = domain - a - b
>>> missing
[0, 1) U (2, 3) U (4, 10]
>>> 2 in missing
False
>>> missing.complement(domain)
[1, 2] U [3, 4]

You could get similar functionality with Python's built-in set data structure, supposing only integer values are valid for start and end.
>>> whole_domain = set(range(12))
>>> A = set(range(0,1))
>>> B = set(range(4,9))
>>> C = set(range(3,6)) # range(4,6) is processed twice (overlaps B)
>>> done = A | B | C
>>> print done
set([0, 3, 4, 5, 6, 7, 8])
>>> missing = whole_domain - done
>>> print missing
set([1, 2, 9, 10, 11])
This still lacks many 'range'-features but might be sufficient.
A simple query if a certain range was already processed could look like this:
>>> isprocessed = [foo in done for foo in set(range(2,6))]
>>> print isprocessed
[False, True, True, True]
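One of the missing 'range' features is turning a set of integers back into ranges. A minimal sketch of how that could be done with `itertools.groupby` follows; `to_ranges` is a hypothetical helper name, and the result uses inclusive `(start, end)` pairs, which is an assumption about the desired output format.

```python
from itertools import groupby


def to_ranges(values):
    """Collapse a set of integers into sorted, inclusive (start, end) pairs."""
    ranges = []
    # Within a run of consecutive integers, value - position is constant.
    for _, run in groupby(enumerate(sorted(values)),
                          key=lambda pair: pair[1] - pair[0]):
        run = [value for _, value in run]
        ranges.append((run[0], run[-1]))
    return ranges


done = {0, 3, 4, 5, 6, 7, 8}
missing = set(range(12)) - done
processed_ranges = to_ranges(done)
missing_ranges = to_ranges(missing)
print(processed_ranges)
print(missing_ranges)
```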

I've only lightly tested it, but it sounds like you're looking for something like this. You'll need to add the methods to get the ranges and missing ranges yourself, but it should be very straightforward as RangeSet.ranges is a list of Range objects maintained in sorted order. For a more pleasant interface you could write a convenience method that converts it to a list of 2-tuples, for example.
EDIT: I've just modified it to use less-than-or-equal comparisons for merging. Note, however, that this won't merge "adjacent" entries (e.g. it won't merge (1, 5) and (6, 10)). To do this you'd need to simply modify the condition in Range.check_merge().
import bisect
class Range(object):
    # Reduces memory usage, overkill unless you're using a lot of these.
    __slots__ = ["start", "end"]
    def __init__(self, start, end):
        """Initialise this range."""
        self.start = start
        self.end = end
    def __lt__(self, other):
        """Sort ranges by their initial item."""
        return self.start < other.start
    def check_merge(self, other):
        """Merge in specified range and return True iff it overlaps."""
        if other.start <= self.end and other.end >= self.start:
            self.start = min(other.start, self.start)
            self.end = max(other.end, self.end)
            return True
        return False
class RangeSet(object):
    def __init__(self):
        self.ranges = []
    def add_range(self, start, end):
        """Merge or insert the specified range as appropriate."""
        new_range = Range(start, end)
        offset = bisect.bisect_left(self.ranges, new_range)
        # Check if we can merge backwards.
        if offset > 0 and self.ranges[offset - 1].check_merge(new_range):
            new_range = self.ranges[offset - 1]
            offset -= 1
        else:
            self.ranges.insert(offset, new_range)
        # Scan for forward merges, examining each following entry in turn.
        check_offset = offset + 1
        while (check_offset < len(self.ranges) and
               new_range.check_merge(self.ranges[check_offset])):
            check_offset += 1
        # Remove any entries that we've just merged.
        if check_offset - offset > 1:
            self.ranges[offset+1:check_offset] = []
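To illustrate the "adjacent" case mentioned in the edit note, here is a minimal sketch, assuming inclusive integer ranges as in the question. It is a batch version (sort, then merge in one pass), not the incremental bisect-based class above, and `merge_ranges` and `missing_ranges` are hypothetical helper names. The `+ 1` in the comparison is what makes (1, 5) and (6, 10) merge; a similar offset in the condition of Range.check_merge() would presumably have the same effect.

```python
def merge_ranges(ranges):
    """Merge inclusive integer ranges that overlap or touch."""
    merged = []
    for start, end in sorted(ranges):
        # "+ 1" treats directly adjacent integer ranges as mergeable.
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def missing_ranges(merged):
    """Return the inclusive gaps between consecutive merged ranges."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(merged, merged[1:])]


used = merge_ranges([(0, 10), (20, 30), (5, 15), (50, 60)])
gaps = missing_ranges(used)
print(used)
print(gaps)
```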

You have hit on a good solution in your example use case. Rather than try to maintain a set of the ranges that have been used, keep track of the ranges that haven't been used. This makes the problem pretty easy.
class RangeSet:
    def __init__(self, min, max):
        self.__gaps = [(min, max)]
        self.min = min
        self.max = max
    def add(self, lo, hi):
        new_gaps = []
        for g in self.__gaps:
            for ng in (g[0],min(g[1],lo)),(max(g[0],hi),g[1]):
                if ng[1] > ng[0]: new_gaps.append(ng)
        self.__gaps = new_gaps
    def missing_ranges(self):
        return self.__gaps
    def ranges(self):
        i = iter([self.min] + [x for y in self.__gaps for x in y] + [self.max])
        return [(x,y) for x,y in zip(i,i) if y > x]
The magic is in the add method, which checks each existing gap to see whether it is affected by the new range, and adjusts the list of gaps accordingly.
Note that the behaviour of the tuples used for ranges here is the same as Python's range objects, i.e. they are inclusive of the start value and exclusive of the stop value. This class will not behave in exactly the way you described in your question, where your ranges seem to be inclusive of both.

Have a look at portion (https://pypi.org/project/portion/). I'm the maintainer of this library, and it supports disjunctions of continuous intervals out of the box. It automatically simplifies adjacent and overlapping intervals.
Consider the intervals provided in your example:
>>> import portion as P
>>> i = P.closed(0, 10) | P.closed(20, 30) | P.closed(5, 15) | P.closed(50, 60)
>>> # get "used ranges"
>>> i
[0,15] | [20,30] | [50,60]
>>> # get "missing ranges"
>>> i.enclosure - i
(15,20) | (30,50)

Similar to DavidT's answer – also based on sympy's sets, but using a list of any length and addition (union) in a single operation:
import sympy
intervals = [[1,4], [6,10], [3,5], [7,8]] # pairs of left,right
print(intervals)
symintervals = [sympy.Interval(i[0],i[1], left_open=False, right_open=False) for i in intervals]
print(symintervals)
merged = sympy.Union(*symintervals) # one operation; adding to a union one by one is much slower for a large number of intervals
print(merged)
for i in merged.args: # assumes that the "merged" result is a union, not a single interval
    print(i.left, i.right) # getting bounds of merged intervals

Here's my solution:
def flatten(collection):
    subset = set()
    for elem in collection:
        to_add = elem
        to_remove = set()
        for s in subset:
            if s[0] <= to_add[0] <= s[1] or s[0] <= to_add[1] <= s[1] or (s[0] > to_add[0] and s[1] < to_add[1]):
                to_remove.add(s)
                to_add = (min(to_add[0], s[0]), max(to_add[1], s[1]))
        subset -= to_remove
        subset.add(to_add)
    return subset
range_set = {(-12, 4), (3, 20), (21, 25), (25, 30), (-13, -11), (5, 10), (-13, 20)}
print(flatten(range_set))
# {(21, 30), (-13, 20)}

How to get the least used item?

Say you have a defaultdict of usage counts like this:
usage_counts = collections.defaultdict(int)
usage_counts['foo1'] = 3
usage_counts['foo2'] = 3
usage_counts['foo3'] = 1
usage_counts['foo4'] = 1
usage_counts['foo5'] = 56
usage_counts['foo6'] = 65
And you have candidates foo1, foo3, foo4 and foo5 in some list:
candidates = ['foo1', 'foo3', 'foo4', 'foo5']
How can one pick randomly from the pool of least used candidates?
I came up with this function, but I am wondering if there is a better way.
def get_least_used(candidates, usage_counts):
    candidate_counts = collections.defaultdict(int)
    for candidate in candidates:
        candidate_counts[candidate] = usage_counts[candidate]
    lowest = min(v for v in candidate_counts.values())
    return random.choice([c for c in candidates if candidate_counts[c] == lowest])

random.shuffle(candidates)
min_candidate = min(candidates, key=usage_counts.get)
returns the first "minimal" candidate from the shuffled list of candidates.
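Note that `random.shuffle` reorders `candidates` in place. If the original order matters, a shuffled copy can be used instead. The sketch below assumes that; `pick_least_used` is a hypothetical name, and `usage_counts.get(c, 0)` is used so that a candidate without a recorded count is treated as zero (calling `.get` on a defaultdict does not trigger the default factory).

```python
import random


def pick_least_used(candidates, usage_counts):
    """Return a random candidate among those with the lowest usage count."""
    # random.sample returns a new shuffled list and leaves the input untouched.
    shuffled = random.sample(candidates, len(candidates))
    # min() returns the first minimal item, which is random after shuffling.
    return min(shuffled, key=lambda c: usage_counts.get(c, 0))


usage_counts = {'foo1': 3, 'foo2': 3, 'foo3': 1,
                'foo4': 1, 'foo5': 56, 'foo6': 65}
candidates = ['foo1', 'foo3', 'foo4', 'foo5']
choice = pick_least_used(candidates, usage_counts)
print(choice)
```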

You could traverse the list only once if you are willing to do it explicitly, by building a list of the candidates that have the lowest count. If the current count is less than the old minimum, you start a new list; if it is equal, you append to the list:
def get_least_used(candidates, usage_counts):
    mincount = sys.maxint
    for c in candidates :
        count = usage_counts[c]
        if count < mincount:
            leastc = [ c ]
            mincount = count
        elif count == mincount:
            leastc.append(c)
    return random.choice(leastc)
As you said you were using Python 2.6, I initialize mincount with sys.maxint. Under Python 3.x, where sys.maxint no longer exists, you would have to choose another sufficiently large value, such as sys.maxsize or float('inf').
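For reference, a minimal Python 3 sketch of the same single-pass idea, assuming `float('inf')` as the initial minimum and an explicitly initialised result list. With an empty `candidates` list, `random.choice` raises `IndexError` in this version.

```python
import random


def get_least_used(candidates, usage_counts):
    """Single pass: collect every candidate tied for the lowest count."""
    mincount = float('inf')  # larger than any real count
    least_used = []
    for c in candidates:
        count = usage_counts[c]
        if count < mincount:
            # Found a new minimum: start the list over.
            least_used = [c]
            mincount = count
        elif count == mincount:
            least_used.append(c)
    return random.choice(least_used)


usage_counts = {'foo1': 3, 'foo2': 3, 'foo3': 1,
                'foo4': 1, 'foo5': 56, 'foo6': 65}
result = get_least_used(['foo1', 'foo3', 'foo4', 'foo5'], usage_counts)
print(result)
```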
