Check if some elements in a matrix are cohesive - python

I have to write a very little Python program that checks whether some group of coordinates are all connected together (by a line, not diagonally). The next 2 pictures show what I mean. In the left picture all colored groups are cohesive, in the right picture not:
I've already made this piece of code, but it doesn't seem to work and I'm quite stuck, any ideas on how to fix this?
def cohesive(container):
    co = container.pop()
    container.add(co)
    return connected(co, container)
def connected(co, container):
    done = {co}
    todo = set(container)
    while len(neighbours(co, container, done)) > 0 and len(todo) > 0:
        done = done.union(neighbours(co, container, done))
    return len(done) == len(container)
def neighbours(co, container, done):
    output = set()
    for i in range(-1, 2):
        if i != 0:
            if (co[0] + i, co[1]) in container and (co[0] + i, co[1]) not in done:
                output.add((co[0] + i, co[1]))
            if (co[0], co[1] + i) in container and (co[0], co[1] + i) not in done:
                output.add((co[0], co[1] + i))
    return output
Here is a test case that should return True:
cohesive({(1, 2), (1, 3), (2, 2), (0, 3), (0, 4)})
and this should return False:
cohesive({(1, 2), (1, 4), (2, 2), (0, 3), (0, 4)})
Both tests work, but when I try to test it with different numbers the functions fail.

You can just take an element and attach its neighbors while it is possible.
def dist(A, B): return abs(A[0]-B[0]) + abs(A[1]-B[1])
def grow(K, E): return {M for M in E for N in K if dist(M, N) <= 1}
def cohesive(E):
    K = {min(E)}  # an element
    L = grow(K, E)
    while len(K) < len(L): K, L = L, grow(L, E)
    return len(L) == len(E)
grow(K,E) returns the neighborhood of K: every element of E that lies within distance 1 of some element of K (including K itself).
In [1]: cohesive({(1, 2), (1, 3), (2, 2), (0, 3), (0, 4)})
Out[1]: True
In [2]: cohesive({(1, 2), (1, 4), (2, 2), (0, 3), (0, 4)})
Out[2]: False

Usually, to check whether something is connected, you use a disjoint-set data structure. The more efficient variations include weighted quick union and weighted quick union with path compression.
Here's an implementation, http://algs4.cs.princeton.edu/15uf/WeightedQuickUnionUF.java.html, which you can modify to your needs. Also, the implementation found in the book "The Design and Analysis of Computer Algorithms" by A. Aho allows you to specify the name of the group that you add 2 connected elements to, so I think that's the modification you're looking for. (It just involves using 1 extra array which keeps track of group numbers.)
As a side note, since disjoint sets usually apply to arrays, don't forget that you can represent an N by N matrix as an array of size N*N.
EDIT: I just realized that it wasn't clear to me at first what you were asking, and that you also mentioned that diagonal components aren't connected. In that case the algorithm is as follows:
0) Check if all elements refer to the same group.
1) Iterate through the array of pairs that represent coordinates in the matrix in question.
2) For each pair make a set of pairs that satisfies the following formula:
|entry.x - otherEntry.x| + |entry.y - otherEntry.y|=1.
'entry' refers to the element that the outer for loop is referring to.
3) Check if all of the sets overlap. That can be done by "unioning" the sets you're looking at, at the end if you get more than 1 set, then the elements are not cohesive.
The complexity is O(n^2 + n^2 * log(n)).
Example:
(0,4), (1,2), (1,4), (2,2), (2,3)
0) check that they are all in the same group:
all of them belong to group 5.
1) make sets:
set1: (0,4), (1,4)
set2: (1,2), (2,2)
set3: (0,4), (1,4) // here we suppose that sets are sorted; otherwise it should be (1,4), (0,4)
set4: (1,2), (2,2), (2,3)
set5: (2,2), (2,3)
2) check for overlap:
set1 overlaps with set3, so we get:
set1' : (0,4), (1,4)
set2 overlaps with set4 and set 5, so we get:
set2' : (1,2), (2,2), (2,3)
as you can see set1' and set2' don't overlap, hence you get 2 disjoint sets that are in the same group, so the answer is 'false'.
Note that this is inefficient and I don't know how to do it more efficiently, but it answers your question.
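For comparison, the disjoint-set approach mentioned at the top of this answer can be sketched in Python. This is only a minimal sketch, not the Princeton implementation linked above: the names `cohesive_uf`, `find` and `union` are made up for illustration, and it keeps parents in a dictionary keyed by coordinate (with union by size and path halving) instead of using arrays.

```python
def cohesive_uf(cells):
    """Return True if all cells form one 4-connected group."""
    cells = set(cells)
    if not cells:
        return True  # assumption: an empty group counts as cohesive
    parent = {c: c for c in cells}
    size = {c: 1 for c in cells}

    def find(c):
        # follow parent links to the root, halving the path on the way
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra  # attach the smaller tree under the larger one
        size[ra] += size[rb]

    for (x, y) in cells:
        # only look right and down; the other two directions are
        # covered when the neighbouring cell itself is visited
        for n in ((x + 1, y), (x, y + 1)):
            if n in cells:
                union((x, y), n)

    # cohesive means every cell ends up with the same root
    return len({find(c) for c in cells}) == 1
```

Each cell is merged with at most two neighbours, so the work grows roughly linearly with the number of cells rather than quadratically.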

The logic in your connected function seems wrong. You make a todo variable, but then never change its contents. You always look for neighbours around the same starting point.
Try this code instead:
def connected(co, container):
    done = {co}
    todo = {co}
    while len(todo) > 0:
        co = todo.pop()
        n = neighbours(co, container, done)
        done = done.union(n)
        todo = todo.union(n)
    return len(done) == len(container)
todo is a set of all the points we are still to check.
done is a set of all the points we have found to be 4-connected to the starting point.
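Combined with a neighbour lookup, the same todo/done idea gives a complete, self-contained check. The version below is a sketch, assuming the group is passed as a collection of (x, y) tuples; it uses a `collections.deque` as the todo queue, and the name `is_cohesive` is chosen here only to avoid clashing with the functions above.

```python
from collections import deque

def is_cohesive(container):
    container = set(container)
    if not container:
        return True  # assumption: treat an empty group as cohesive
    start = next(iter(container))
    done = {start}          # points known to be connected to start
    todo = deque([start])   # points whose neighbours are still unchecked
    while todo:
        x, y = todo.popleft()
        # the four non-diagonal neighbours
        for n in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if n in container and n not in done:
                done.add(n)
                todo.append(n)
    return len(done) == len(container)
```

A straight line of four cells such as {(0, 0), (0, 1), (0, 2), (0, 3)} is a useful extra test: it needs the search to move away from the starting point, which is exactly where the original loop stops too early.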

I'd tackle this problem differently... if you're looking for five exactly, that means:
Every coordinate in the line has to be neighbouring another coordinate in the line, because anything less means that coordinate is disconnected.
At least three of the coordinates have to be neighbouring another two or more coordinates in the line, because anything less and the groups will be disconnected.
Hence, you can just get the coordinate's neighbours and check whether both conditions are fulfilled.
Here is a basic solution:
def cells_are_connected(connections):
    return all(c > 0 for c in connections)
def groups_are_connected(connections):
    return len([1 for c in connections if c > 1]) > 2
def cohesive(coordinates):
    connections = []
    for x, y in coordinates:
        neighbours = [(x-1, y), (x+1, y), (x, y-1), (x, y+1)]
        connections.append(len([1 for n in neighbours if n in coordinates]))
    return cells_are_connected(connections) and groups_are_connected(connections)
print(cohesive([(1, 2), (1, 3), (2, 2), (0, 3), (0, 4)])) # True
print(cohesive([(1, 2), (1, 4), (2, 2), (0, 3), (0, 4)])) # False
No need for a general-case solution or union logic. :) Do note that it's specific to the five-in-a-line problem, however.

Related

Finding a closed path from list of start and end nodes

I have a list of edges (E) of a graph with nodes V = [1,2,3,4,5,6]:
E = [(1,2), (1,5), (2,3), (3,1), (5,6), (6,1)]
where each tuple (a,b) refers to the start & end node of the edge respectively.
If I know the edges form a closed path in graph G, can I recover the path?
Note that E is not the set of all edges of the graph. It's just a set of edges.
In this example, the path would be 1->2->3->1->5->6->1
A naive approach I can think of is to use a tree: I start with a node, say 1, then look at all tuples that start with 1, here (1,2) and (1,5). That gives two branches, with nodes 2 and 5, and I continue the process until a branch ends at the starting node.
How to code this efficiently in python?
Quoting the other answer: "The networkx package has a function that can generate the desired circuit for you in linear time..."
It is possible that constructing nx.MultiDiGraph() is slower and less efficient than the question asks for, or that pulling in an external package for a single function is excessive. If so, there is another way.
Plan: first we find some path from start_node back to start_node, then we insert all the loops that have not been visited yet.
from itertools import chain
from collections import defaultdict, deque
from typing import Tuple, List, Iterable, Iterator, DefaultDict, Deque
def retrieve_closed_path(arcs: List[Tuple[int, int]], start_node: int = 1) -> Iterator[int]:
    if not arcs:
        return iter([])
    # for each node `u` carries queue of its
    # neighbours to be visited from node `u`
    d: DefaultDict[int, Deque[int]] = defaultdict(deque)
    for u, v in arcs:
        # deque pop and append complexity is O(1)
        d[u].append(v)
    def _dfs(node) -> Iterator[int]:
        out: Iterator[int] = iter([])
        # guarantee, that all queues
        # will be emptied at the end
        while d[node]:
            # chain returns an iterator and helps to
            # avoid unnecessary memory reallocations
            out = chain([node], _dfs(d[node].pop()), out)
            # if we return in this loop from recursive call, then
            # `out` already carries some (node, ...) and we need
            # only to insert all other loops which start at `node`
        return out
    return chain(_dfs(start_node), [start_node])
def path_to_string(path: Iterable[int]) -> str:
    return '->'.join(str(x) for x in path)
Examples:
E = [(1, 2), (2, 1)]
p = retrieve_closed_path(E, 1)
print(path_to_string(p))
>> 1->2->1
E = [(1, 2), (1, 5), (2, 3), (3, 1), (5, 6), (6, 1)]
p = retrieve_closed_path(E, 1)
print(path_to_string(p))
>> 1->5->6->1->2->3->1
E = [(1, 2), (2, 3), (3, 4), (4, 2), (2, 1)]
p = retrieve_closed_path(E, 1)
print(path_to_string(p))
>> 1->2->3->4->2->1
E = [(5, 1), (1, 5), (5, 2), (2, 5), (5, 1), (1, 4), (4, 5)]
p = retrieve_closed_path(E, 1)
print(path_to_string(p))
>> 1->4->5->1->5->2->5->1
You're looking for a directed Eulerian circuit in your (sub)graph. An Eulerian circuit is a trail that visits every edge exactly once.
The networkx package has a function that can generate the desired circuit for you in linear time:
import networkx as nx
edges = [(1,2), (1,5), (2,3), (3,1), (5,6), (6,1)]
G = nx.MultiDiGraph()
G.add_edges_from(edges)
# Prints [(1, 5), (5, 6), (6, 1), (1, 2), (2, 3), (3, 1)]
# which matches the desired output (as asked in the comments).
print([edge for edge in nx.algorithms.euler.eulerian_circuit(G)])
The documentation cites a 1973 paper, if you're interested in understanding how the algorithm works. You can also take a look at the source code here. Note that we're working with multigraphs here, since you can have multiple edges that have the same source and destination node. There are probably other implementations floating around on the Internet, but they may or may not work for multigraphs.
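A directed Eulerian circuit can only exist if every node has as many incoming as outgoing edges (and the edges hang together). Before calling any circuit routine, the degree part of that condition can be checked with the standard library alone. The function below is a sketch; `has_balanced_degrees` is a name made up here, and it deliberately checks only the degree condition, not connectivity.

```python
from collections import Counter

def has_balanced_degrees(edges):
    # count how often each node appears as a source and as a target;
    # parallel edges are counted once per occurrence (multigraph)
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    nodes = set(out_deg) | set(in_deg)
    # Counter returns 0 for nodes it has never seen
    return all(out_deg[n] == in_deg[n] for n in nodes)

edges = [(1, 2), (1, 5), (2, 3), (3, 1), (5, 6), (6, 1)]
print(has_balanced_degrees(edges))             # True
print(has_balanced_degrees([(1, 2), (2, 3)]))  # False
```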

When finding derivatives, how do you use the filter function to return only the terms whose derivatives are not multiplied by zero, in python 3?

I wrote a function that, given the terms of an equation, can find derivatives. However when one of the terms is a zero, the function breaks down. How would I use filter to make sure terms that are multiplied by zero don't return?
Here's my baseline code which works but doesn't include the filter yet:
def find_derivative(function_terms):
    return [(function_terms[0][0]*function_terms[0][1], function_terms[0][1]-1),
            (function_terms[1][0]*function_terms[1][1], function_terms[1][1]-1)]
The function_terms[1][1]-1 reduces the power of the term of the derivative by 1.
It works like this.
Input:
# Represent each polynomial term with a tuple of (coefficient, power)
# f(x) = 4 x^3 - 3 x
four_x_cubed_minus_three_x = [(4, 3), (-3, 1)]
find_derivative(four_x_cubed_minus_three_x)
Output:
[(12, 2), (-3, 0)]
This is the correct answer of 12 x^2 - 3
But here it breaks down:
Input:
# f(x) = 3 x^2 - 11
three_x_squared_minus_eleven = [(3, 2), (-11, 0)]
find_derivative(three_x_squared_minus_eleven)
It is supposed to find the derivative, given the equation.
Output:
[(6, 1), (0, -1)]
This has a "ghost" term of 0 * x^(-1); I don't want this term printed.
Expected Output:
[(6, 1)]
You can use the filter() function to filter the list of tuples and then apply logic on the filtered list. Something like this should work.
filtered_terms = list(filter(lambda x: x[1]!=0, function_terms))
Now you have the tuples without constants. So rather than hard-coding derivatives, try looping through the list to get the derivative.
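def find_derivative(function_terms):
    # drop constant terms (power 0) before differentiating
    filtered_terms = list(filter(lambda x: x[1] != 0, function_terms))
    result = []
    for term in filtered_terms:
        result.append((term[0]*term[1], term[1]-1))
    return result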
There is a symbolic math solver in python called sympy. Maybe it can be useful for you.
from sympy import *
x = symbols('x')
init_printing(use_unicode=True)
equation = 4*x**3 -3*x
diff_equation = equation.diff()
solution = diff_equation.subs({x:2})
Two changes:
Make your routine iterate through the polynomial terms, handling them one at a time, rather than depending on having exactly two terms.
Apply the filtering to the individual term as you encounter it.
I've extended this to also eliminate anything with a zero coefficient, as well as a zero exponent. I added a test case with both of those and a negative exponent, since the symbolic differentiation theorem applies equally.
def find_derivative(function_terms):
    return [(term[0]*term[1], term[1]-1)
            for term in function_terms
            if term[0] * term[1] != 0]
four_x_cubed_minus_three_x = [(4, 3), (-3, 1)]
print(find_derivative(four_x_cubed_minus_three_x) )
three_x_squared_minus_eleven = [(3, 2), (-11, 0)]
print(find_derivative(three_x_squared_minus_eleven) )
fifth_degree = [(1, 5), (-1, 4), (0, 3), (8, 2), (-16, 0), (1, -2)]
print(find_derivative(fifth_degree) )
Output:
[(12, 2), (-3, 0)]
[(6, 1)]
[(5, 4), (-4, 3), (16, 1), (-2, -3)]

How to index a Cartesian product

Suppose that the variables x and theta can take the possible values [0, 1, 2] and [0, 1, 2, 3], respectively.
Let's say that in one realization, x = 1 and theta = 3. The natural way to represent this is by a tuple (1,3). However, I'd like to instead label the state (1,3) by a single index. A 'brute-force' method of doing this is to form the Cartesian product of all the possible ordered pairs (x,theta) and look it up:
import numpy as np
import itertools
N_x = 3
N_theta = 4
np.random.seed(seed = 1)
x = np.random.choice(range(N_x))
theta = np.random.choice(range(N_theta))
def get_box(x, N_x, theta, N_theta):
    states = list(itertools.product(range(N_x),range(N_theta)))
    inds = [i for i in range(len(states)) if states[i]==(x,theta)]
    return inds[0]
print((x, theta))
box = get_box(x, N_x, theta, N_theta)
print(box)
This gives (x, theta) = (1,3) and box = 7, which makes sense if we look it up in the states list:
[(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2), (2, 3)]
However, this 'brute-force' approach seems inefficient, as it should be possible to determine the index beforehand without looking it up. Is there any general way to do this? (The number of states N_x and N_theta may vary in the actual application, and there might be more variables in the Cartesian product).
If you always store your states lexicographically and the possible values for x and theta are always the complete range from 0 to some maximum, as your example suggests, you can use the formula
index = x * N_theta + theta
where (x, theta) is one of your tuples.
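The formula can also be inverted with integer division, which is a handy way to check it. A small sketch using the sizes from the question (N_x = 3, N_theta = 4); the function names `to_index` and `from_index` are illustrative only:

```python
N_x, N_theta = 3, 4

def to_index(x, theta):
    # each step in x skips a full row of N_theta states
    return x * N_theta + theta

def from_index(index):
    # quotient is x, remainder is theta
    return divmod(index, N_theta)

print(to_index(1, 3))   # 7
print(from_index(7))    # (1, 3)
```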
This generalizes in the following way to higher dimensional tuples: If N is a list or tuple representing the ranges of the variables (so N[0] is the number of possible values for the first variable, etc.) and p is a tuple, you get the index into a lexicographically sorted list of all possible tuples using the following snippet:
index = 0
skip = 1
for dimension in reversed(range(len(N))):
    index += skip * p[dimension]
    skip *= N[dimension]
This might not be the most Pythonic way to do it but it shows what is going on: You think of your tuples as a hypercube where you can only go along one dimension, but if you reach the edge, your coordinate in the "next" dimension increases and your traveling coordinate resets. The reader is advised to draw some pictures. ;)
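Instead of drawing pictures, the general formula can be checked against the order in which itertools.product generates the tuples. The sketch below wraps the loop in a function and adds an inverse; the names `lex_index` and `lex_tuple` are assumptions made for this example, and N = [3, 4, 3] is just an arbitrary test case.

```python
import itertools

def lex_index(p, N):
    # same loop as above: the last dimension varies fastest
    index = 0
    skip = 1
    for dimension in reversed(range(len(N))):
        index += skip * p[dimension]
        skip *= N[dimension]
    return index

def lex_tuple(index, N):
    # inverse: peel off one coordinate per dimension, last one first
    p = []
    for n in reversed(N):
        index, digit = divmod(index, n)
        p.append(digit)
    return tuple(reversed(p))

N = [3, 4, 3]
states = list(itertools.product(*[range(n) for n in N]))
# position i in the product must map to index i, and back again
assert all(lex_index(p, N) == i for i, p in enumerate(states))
assert all(lex_tuple(i, N) == p for i, p in enumerate(states))
```

NumPy users may recognise this as the mapping that np.ravel_multi_index and np.unravel_index compute for the default C (row-major) order.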
I think it depends on the data you have. If they are sparse, the best solution is a dictionary, and it works for tuples of any dimension.
import itertools
import random
n = 100
m = 100
l1 = [i for i in range(n)]
l2 = [i for i in range(m)]
a = {}
prod = [element for element in itertools.product(l1, l2)]
for i in prod:
    a[i] = random.randint(1, 100)
A very good source about the performance is in this discussion.
For the sake of completeness I'll include my implementation of Julian Kniephoff's solution, get_box3, with a slightly adapted version of the original implementation, get_box2:
import itertools
# 'Brute-force' method
def get_box2(p, N):
    states = list(itertools.product(*[range(n) for n in N]))
    return states.index(p)
# 'Analytic' method
def get_box3(p, N):
    index = 0
    skip = 1
    for dimension in reversed(range(len(N))):
        index += skip * p[dimension]
        skip *= N[dimension]
    return index
p = (1,3,2) # Tuple characterizing the total state of the system
N = [3,4,3] # List of the number of possible values for each state variable
print("Brute-force method yields %s" % get_box2(p, N))
print("Analytical method yields %s" % get_box3(p, N))
Both the 'brute-force' and 'analytic' method yield the same result:
Brute-force method yields 23
Analytical method yields 23
but I expect the 'analytic' method to be faster. I've changed the representation to p and N as suggested by Julian.

How to determine corner/vertex cells in an arbitrary shape composed of grid cells

I am dealing with polygons composed of square tiles on a 2D grid. A polygon is simply stored as a list of tuples, with each tuple representing the coordinates of a tile. The polygons are always contiguous and have no holes.
What I want to be able to do is determine which of the tiles represent vertices along the border of the polygon, such that later I could trace between each one to produce the polygon's border, or determine the distance between two consecutive vertices to find the length of a side, etc.
Here is an example of a polygon (a 5x4 rectangle with a 3x2 rectangle subtracted from the top left, producing a backward 'L'):
polygon_tiles = [(3, 0), (4, 0), (3, 1), (4, 1), (0, 2), (1, 2), (2, 2), (3, 2),
(4, 2), (0, 3), (1, 3), (2, 3), (3, 3), (4, 3)]
Ideally the algorithm I am seeking would produce a result that looked like this:
polygon_verts = [(3, 0), (4, 0), (4, 3), (0, 3), (0, 2), (3, 2)]
with the vertices listed in order tracing around the border clockwise.
Just fiddling around with some test cases, this problem seems to be much more complicated than I would have thought, especially in weird circumstances like when a polygon has a 1-tile-wide extrusion (in this case one of the tiles might have to be stored as a vertex twice??).
I'm working in Python, but any insight is appreciated, even if it's in pseudocode.
Assuming your shape has no internal holes.
Find the topmost row. Pick the leftmost tile of this row. This guarantees we begin on a corner.
From this tile, attempt to go straight right. If you can't, try down-right, then straight down, etc., until you have picked a direction. This guarantees we can trace a clockwise perimeter of the polygon.
Continue to take steps in your chosen direction. After each step:
If the next step would be onto a tile, rotate counterclockwise and look again.
If the next step would be onto an empty space, rotate clockwise and look again.
Stop rotating once you have moved onto empty space and back onto a tile again.
If we rotated from the initial direction, we must be standing on a vertex. Mark it as such.
Mark every other tile you traverse as being part of the edge.
Keep walking the edge until you arrive at your initial tile. You may walk over tiles more than once in the case of 1 tile extrusions.
If this algorithm doesn't make sense in your head, try getting out some paper and following it by hand :)
This problem is a convex hull variation, for which e.g. the gift wrapping algorithm could be applied. The constraints of discrete coordinates and line directions lead to simplifications. Here is some python code that gives the desired answer (Patashu's answer is in the same spirit):
#!/usr/bin/python
import math
def neighbors(coord):
    for dir in (1,0):
        for delta in (-1,1):
            yield (coord[0]+dir*delta, coord[1]+(1-dir)*delta)
def get_angle(dir1, dir2):
    angle = math.acos(dir1[0] * dir2[0] + dir1[1] * dir2[1])
    cross = dir1[1] * dir2[0] - dir1[0] * dir2[1]
    if cross > 0:
        angle = -angle
    return angle
def trace(p):
    if len(p) <= 1:
        return p
    # start at top left-most point
    pt0 = min(p, key = lambda t: (t[1],t[0]))
    dir = (0,-1)
    pt = pt0
    outline = [pt0]
    while True:
        pt_next = None
        angle_next = 10 # dummy value to be replaced
        dir_next = None
        # find leftmost neighbor
        for n in neighbors(pt):
            if n in p:
                dir2 = (n[0]-pt[0], n[1]-pt[1])
                angle = get_angle(dir, dir2)
                if angle < angle_next:
                    pt_next = n
                    angle_next = angle
                    dir_next = dir2
        if angle_next != 0:
            outline.append(pt_next)
        else:
            # previous point was unnecessary
            outline[-1]=pt_next
        if pt_next == pt0:
            return outline[:-1]
        pt = pt_next
        dir = dir_next
polygon_tiles = [(3, 0), (4, 0), (3, 1), (4, 1), (0, 2), (1, 2), (2, 2), (3, 2),
(4, 2), (0, 3), (1, 3), (2, 3), (3, 3), (4, 3)]
outline = trace(polygon_tiles)
print(outline)
I would just calculate the slopes of the lines between the vertices
# Do sort stuff
vertices = []
# walk over every run of three consecutive points (a, b, c)
for a, b, c in zip(polygon_tiles, polygon_tiles[1:], polygon_tiles[2:]):
    # direction of the line from point 1 to point 2
    dx1, dy1 = b[0] - a[0], b[1] - a[1]
    # direction of the line from point 2 to point 3
    dx2, dy2 = c[0] - b[0], c[1] - b[1]
    # compare the slopes dy1/dx1 and dy2/dx2 by cross-multiplying,
    # which avoids dividing by zero on vertical lines
    # if the slopes differ then you have a vertex
    if dy1 * dx2 != dy2 * dx1:
        vertices.append(b)

List coordinates between a set of coordinates

This should be fairly easy, but I'm getting a headache from trying to figure it out. I want to list all the coordinates between two points. Like so:
1: (1,1)
2: (1,3)
In between: (1,2)
Or
1: (1,1)
2: (5,1)
In between: (2,1), (3,1), (4,1)
It does not need to work with diagonals.
You appear to be a beginning programmer. A general technique I find useful is to do the job yourself, on paper, then look at how you did it and translate that to a program. If you can't see how, break it down into simpler steps until you can.
Depending on how you want to handle the edge cases, this seems to work:
def points_between(p1, p2):
    xs = range(p1[0] + 1, p2[0]) or [p1[0]]
    ys = range(p1[1] + 1, p2[1]) or [p1[1]]
    return [(x,y) for x in xs for y in ys]
print(points_between((1,1), (5,1)))
# [(2, 1), (3, 1), (4, 1)]
print(points_between((5,6), (5,12)))
# [(5, 7), (5, 8), (5, 9), (5, 10), (5, 11)]
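One edge case worth deciding on is the order of the two points: the function above expects p1 to come before p2. A possible variant that accepts the points in either order is sketched below. The name `points_between_any_order` and the choice to raise an error for diagonal input are assumptions of this sketch, not part of the answer above; the result is always listed in ascending order.

```python
def points_between_any_order(p1, p2):
    (x1, y1), (x2, y2) = p1, p2
    if x1 != x2 and y1 != y2:
        # the question does not need diagonals, so reject them here
        raise ValueError("points must share a row or a column")
    if x1 == x2:
        # same column: walk along y
        lo, hi = sorted((y1, y2))
        return [(x1, y) for y in range(lo + 1, hi)]
    # same row: walk along x
    lo, hi = sorted((x1, x2))
    return [(x, y1) for x in range(lo + 1, hi)]

print(points_between_any_order((5, 1), (1, 1)))
# [(2, 1), (3, 1), (4, 1)]
print(points_between_any_order((1, 1), (1, 3)))
# [(1, 2)]
```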
