Efficient way of finding rectangle coordinates in 0-1 arrays - python

Say I have an MxN matrix of 0's and 1's. It may or may not be sparse.
I want a function to efficiently find rectangles in the array, where by rectangle I mean:
a set of 4 elements that are all 1's that create the 4 corners of a
rectangle, such that the sides of the rectangle are orthogonal to the
array axes. In other words, a rectangle is a set of 4 1's elements
with coordinates [row index, column index] like so: [r1,c1], [r1,c2],
[r2,c2], [r2,c1].
E.g. this setup has one rectangle:
0 0 0 1 0 1 0
0 0 0 0 0 0 0
0 1 0 0 0 0 0
1 0 0 1 0 1 0
0 0 0 0 0 0 0
0 0 0 1 0 0 1
For a given MxN array, I want a Python function F(A) that returns a list L of sublists, where each sublist contains the coordinate pairs of all 4 corners of one rectangle. For the case where the same element of the array is a corner of multiple rectangles, it's OK to duplicate those coordinates.
My thinking so far is:
1) find the coordinates of the apex of each right triangle in the array
2) check each right triangle apex coordinate to see if it is part of a rectangle
Step 1) can be achieved by finding those elements that are 1's and are in a column with a column sum >=2, and in a row with a row sum >=2.
Step 2) would then iterate through each coordinate determined to be the apex of a right triangle. For a given apex coordinate, it would iterate through that column, looking at every other apex coordinate from 1) in the same column. For any such pair of apexes in a column, it would then check which of the two rows has the smaller row sum, to know which row is faster to iterate through. It would then iterate through the apex coordinates in that row and check whether the other row also has an apex in the same column. If it does, those 4 points form a rectangle.
I think this will work, but there will be repetition, and overall this procedure seems like it would be reasonably computationally intensive. What are some better ways for detecting rectangle corners in 0-1 arrays?
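As a rough sketch of what I mean for step 1 in NumPy (untested for performance; the array A and the variable names are just illustrative):
import numpy as np

A = np.array([
    [0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 1],
])

# A 1 can be a right-triangle apex only if its row sum and
# its column sum are both >= 2.
row_sums = A.sum(axis=1, keepdims=True)  # shape (M, 1)
col_sums = A.sum(axis=0, keepdims=True)  # shape (1, N)
apexes = np.argwhere((A == 1) & (row_sums >= 2) & (col_sums >= 2))
print(apexes)  # candidate corners as [row, col] pairs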

This is off the top of my head, written during a 5-hour layover at LAX. The following is my algorithm:
Step 1: Search all rows for at least two ones
 |   0 0 0 1 0 1 0
 |   0 0 0 0 0 0 0
 |   0 1 0 0 0 0 0
\|/  1 0 0 1 0 1 0
     0 0 0 0 0 0 0
     0 0 0 1 0 0 1
Output:
-> 0 0 0 1 0 1 0
0 0 0 0 0 0 0
0 1 0 0 0 0 0
-> 1 0 0 1 0 1 0
0 0 0 0 0 0 0
-> 0 0 0 1 0 0 1
Step 2: For each pair of ones in a row, get the indices of the ones in the corresponding columns. Let's say for the first row:
-> 0 0 0 1 0 1 0
you check for ones in the following columns:
      |   |
     \|/ \|/
0 0 0 1 0 1 0
0 0 0 0 0 0 0
0 1 0 0 0 0 0
1 0 0 1 0 1 0
0 0 0 0 0 0 0
0 0 0 1 0 0 1
Step 3: If both indices match, return the indices of all four corners. These can be easily accessed since you know the row and the index of the ones at every step. In our case the searches at columns 3 and 5 both return row 3, assuming you start the index from 0. So we get the indices for the following:
0 0 0 ->1 0 ->1 0
0 0 0 0 0 0 0
0 1 0 0 0 0 0
1 0 0 ->1 0 ->1 0
0 0 0 0 0 0 0
0 0 0 1 0 0 1
Step 4: Repeat for all pairs
Algorithm Complexity
I know you need to search columns * rows * number of pairs, but you can always use hashmaps to make each search O(1). That makes the overall complexity bounded by the number of pairs. Please feel free to comment with any questions.

Here's a Python implementation similar to PseudoAj's solution. It processes the rows starting from the top while constructing a dict whose keys are x coordinates and whose values are sets of the corresponding y coordinates.
For every row the following steps are done:
Generate a list of the x coordinates of the 1s in the current row
If the length of the list is less than 2, move to the next row
Iterate over all coordinate pairs (left, right) where left < right
For every coordinate pair, take the intersection of their sets in the dict of processed rows
For every y coordinate in the intersection, add a rectangle to the result
Finally, update the dict with the coordinates of the current row
Code:
from collections import defaultdict
from itertools import combinations

arr = [
    [0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 1]
]

# List of corner coordinates
result = []

# Dict {x: set(y1, y2, ...)} of 1s in processed rows
d = defaultdict(set)

for y, row in enumerate(arr):
    # Find indexes of 1s in the current row
    coords = [i for i, x in enumerate(row) if x]

    # Move to the next row if there are fewer than two points
    if len(coords) < 2:
        continue

    # For every pair on this row, find all matching pairs on previous rows
    for left, right in combinations(coords, 2):
        for top in d[left] & d[right]:
            result.append(((top, left), (top, right), (y, left), (y, right)))

    # Add the coordinates of this row to the processed rows
    for x in coords:
        d[x].add(y)

print(result)
Output:
[((0, 3), (0, 5), (3, 3), (3, 5))]

Related

Efficient way to find coordinates of connected blobs in binary image

I am looking for the coordinates of connected blobs in a binary image (2d numpy array of 0 or 1).
The skimage library provides a very fast way to label blobs within the array (which I found from similar SO posts). However, I want a list of the coordinates of each blob, not a labelled array. I have a solution which extracts the coordinates from the labelled image, but it is very slow. Far slower than the initial labelling.
Minimal Reproducible example:
import timeit
from skimage import measure
import numpy as np

binary_image = np.array([
    [0,1,0,0,1,1,0,1,1,0,0,1],
    [0,1,0,1,1,1,0,1,1,1,0,1],
    [0,0,0,0,0,0,0,1,1,1,0,0],
    [0,1,1,1,1,0,0,0,0,1,0,0],
    [0,0,0,0,0,0,0,1,1,1,0,0],
    [0,0,1,0,0,0,0,0,0,0,0,0],
    [0,1,0,0,1,1,0,1,1,0,0,1],
    [0,0,0,0,0,0,0,1,1,1,0,0],
    [0,1,1,1,1,0,0,0,0,1,0,0],
])

print(f"\n\n2d array of type: {type(binary_image)}:")
print(binary_image)

labels = measure.label(binary_image)
print(f"\n\n2d array with connected blobs labelled of type {type(labels)}:")
print(labels)

def extract_blobs_from_labelled_array(labelled_array):
    # The goal is to obtain lists of the coordinates
    # of each distinct blob.
    blobs = []
    label = 1
    while True:
        indices_of_label = np.where(labelled_array == label)
        if not indices_of_label[0].size > 0:
            break
        else:
            blob = list(zip(*indices_of_label))
            label += 1
            blobs.append(blob)
    return blobs

if __name__ == "__main__":
    print("\n\nBeginning extract_blobs_from_labelled_array timing\n")
    print("Time taken:")
    print(
        timeit.timeit(
            'extract_blobs_from_labelled_array(labels)',
            globals=globals(),
            number=1
        )
    )
    print("\n\n")
Output:
2d array of type: <class 'numpy.ndarray'>:
[[0 1 0 0 1 1 0 1 1 0 0 1]
[0 1 0 1 1 1 0 1 1 1 0 1]
[0 0 0 0 0 0 0 1 1 1 0 0]
[0 1 1 1 1 0 0 0 0 1 0 0]
[0 0 0 0 0 0 0 1 1 1 0 0]
[0 0 1 0 0 0 0 0 0 0 0 0]
[0 1 0 0 1 1 0 1 1 0 0 1]
[0 0 0 0 0 0 0 1 1 1 0 0]
[0 1 1 1 1 0 0 0 0 1 0 0]]
2d array with connected blobs labelled of type <class 'numpy.ndarray'>:
[[ 0 1 0 0 2 2 0 3 3 0 0 4]
[ 0 1 0 2 2 2 0 3 3 3 0 4]
[ 0 0 0 0 0 0 0 3 3 3 0 0]
[ 0 5 5 5 5 0 0 0 0 3 0 0]
[ 0 0 0 0 0 0 0 3 3 3 0 0]
[ 0 0 6 0 0 0 0 0 0 0 0 0]
[ 0 6 0 0 7 7 0 8 8 0 0 9]
[ 0 0 0 0 0 0 0 8 8 8 0 0]
[ 0 10 10 10 10 0 0 0 0 8 0 0]]
Beginning extract_blobs_from_labelled_array timing
Time taken:
9.346099977847189e-05
9e-05 seconds is small, but so is this example image. In reality I am working with very high resolution images, for which the function takes approximately 10 minutes.
Is there a faster way to do this?
Side note: I'm only using list(zip()) to try to get the numpy coordinates into something I'm used to (I don't use numpy much, just plain Python). Should I be skipping this and just using the coordinates to index as-is? Will that speed it up?
The part of the code that is slow is here:
while True:
    indices_of_label = np.where(labelled_array == label)
    if not indices_of_label[0].size > 0:
        break
    else:
        blob = list(zip(*indices_of_label))
        label += 1
        blobs.append(blob)
First, a complete aside: you should avoid using while True when you know the number of elements you will be iterating over. It's a recipe for hard-to-find infinite-loop bugs.
Instead, you should use:
for label in range(1, np.max(labels) + 1):
and then you can ignore the if ...: break.
A second issue is indeed that you are using list(zip(*)), which is slow compared to NumPy functions. Here you could get approximately the same result with np.transpose(indices_of_label), which will get you a 2D array of shape (n_coords, n_dim), i.e. (n_coords, 2).
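Applied together, those two fixes would look something like this sketch (the helper name is mine, and it still makes one full pass over the image per label, which is the Big Issue addressed next):
def extract_blobs_looping(labelled_array):
    # Bounded loop over the known labels, plus np.transpose;
    # still scans the whole image once per label.
    blobs = []
    for label in range(1, np.max(labelled_array) + 1):
        indices_of_label = np.where(labelled_array == label)
        blobs.append(np.transpose(indices_of_label))  # shape (n_coords, 2)
    return blobs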
But the Big Issue is the expression labelled_array == label. This will examine every pixel of the image once for every label. (Twice, actually, because then you run np.where(), which takes another pass.) This is a lot of unnecessary work, as the coordinates can be found in one pass.
The scikit-image function skimage.measure.regionprops can do this for you. regionprops goes over the image once and returns a list containing one RegionProps object per label. The object has a .coords attribute containing the coordinates of each pixel in the blob. So, here's your code, modified to use that function:
import timeit
from skimage import measure
import numpy as np

binary_image = np.array([
    [0,1,0,0,1,1,0,1,1,0,0,1],
    [0,1,0,1,1,1,0,1,1,1,0,1],
    [0,0,0,0,0,0,0,1,1,1,0,0],
    [0,1,1,1,1,0,0,0,0,1,0,0],
    [0,0,0,0,0,0,0,1,1,1,0,0],
    [0,0,1,0,0,0,0,0,0,0,0,0],
    [0,1,0,0,1,1,0,1,1,0,0,1],
    [0,0,0,0,0,0,0,1,1,1,0,0],
    [0,1,1,1,1,0,0,0,0,1,0,0],
])

print(f"\n\n2d array of type: {type(binary_image)}:")
print(binary_image)

labels = measure.label(binary_image)
print(f"\n\n2d array with connected blobs labelled of type {type(labels)}:")
print(labels)

def extract_blobs_from_labelled_array(labelled_array):
    """Return a list containing coordinates of pixels in each blob."""
    props = measure.regionprops(labelled_array)
    blobs = [p.coords for p in props]
    return blobs

if __name__ == "__main__":
    print("\n\nBeginning extract_blobs_from_labelled_array timing\n")
    print("Time taken:")
    print(
        timeit.timeit(
            'extract_blobs_from_labelled_array(labels)',
            globals=globals(),
            number=1
        )
    )
    print("\n\n")
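As a quick usage check (not part of the original timing code), calling the function gives one (n_coords, 2) coordinate array per blob:
blobs = extract_blobs_from_labelled_array(labels)
print(blobs[0])  # coordinates of the pixels labelled 1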

Numpy Interpolation Between Points Within Array (scipy.griddata)

I have a numpy array of a fixed size holding irregularly spaced data. An example would be:
[1 0 0 0 3 0 0 0 2 0
0 1 0 0 0 0 0 0 2 0
0 1 0 0 1 0 6 0 9 0
0 0 0 0 6 0 3 0 0 1]
I want to keep the array the same shape, but have all the 0 values overwritten with data interpolated from the points that do have data. If the data points in the array are thought of as height values, this would essentially be creating a surface over the points.
I have been trying to use scipy.interpolate.griddata but am continually getting errors. I start with an array of my known data points, as [x, y, value]. For the above (first row only, for brevity):
data = [[0, 0, 1],
        [0, 4, 3],
        [0, 8, 2],
        ...]
I then define
points = (data[:,0], data[:,1])
values = (data[:,2])
Next, I define the points to sample at (in this case, the grid I desire)
grid = np.indices((4,10))
Finally, call griddata
t = interpolate.griddata(points, values, grid, method = 'linear')
This returns the following error
ValueError: number of dimensions in xi does not match x
Am I using the wrong function?
Thanks!
Solved: You need to pass the desired points as a tuple
t = interpolate.griddata(points, values, (grid[0,:,:], grid[1,:,:]), method = 'linear')
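For anyone landing here, a minimal self-contained version of the working call (with a few made-up sample points standing in for the real data):
import numpy as np
from scipy import interpolate

# Known data points as [x, y, value] rows; illustrative only.
data = np.array([
    [0, 0, 1.0],
    [0, 4, 3.0],
    [0, 8, 2.0],
    [3, 4, 6.0],
    [3, 6, 3.0],
    [3, 9, 1.0],
])

points = (data[:, 0], data[:, 1])
values = data[:, 2]

# Sample on the full 4x10 grid; note the tuple of 2D coordinate arrays.
grid = np.indices((4, 10))
t = interpolate.griddata(points, values, (grid[0], grid[1]), method='linear')
print(t)  # NaN outside the convex hull of the input points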

Python: Rearrange indices from np.where()

I would like to rearrange the indices in a tuple which was created with np.where.
The reason for this is that I would like to apply values to a number of special positions (a pipe) in a mesh, which were pre-selected. The values shall be applied in the direction of flow. The direction of flow is defined from top left to bottom left: from (3,0) to (3,6) to (7,6) to (7,0). Currently, the order of the index tuple ind follows the automatic sorting of the indices. This leads to the figure below, where the values 1:10 are correctly applied, but 11:17 are obviously in reverse order.
Is there a better way to grab the indices or how can I rearrange the tuple so that the values are applied in the direction of flow?
import numpy as np
import matplotlib.pyplot as plt
# mesh size
nx, ny = 10, 10
# special positions
sx1, sx2, sy = .3, .7, .7
T = 1
# create mesh
u0 = np.zeros((nx, ny))
# assign values to mesh
u0[int(nx*sx1), 0:int(ny*sy)] = T
u0[int(nx*sx2), 0:int(ny*sy)] = T
u0[int(nx*sx1+1):int(nx*sx2), int(ny*sy-1)] = T
# get indices of special positions
ind = np.where(u0 == T)
# EDIT: hand code sequence
length = len(u0[int(nx*sx2), 0:int(ny*sy)])
ind[0][-length:] = np.flip(ind[0][-length:])
ind[1][-length:] = np.flip(ind[1][-length:])
# apply new values on special positions
u0[ind] = np.arange(1, len(ind[1])+1,1)
fig, ax = plt.subplots()
fig = ax.imshow(u0, cmap=plt.get_cmap('RdBu_r'))
ax.figure.colorbar(fig)
plt.show()
Old image (without edit)
New image (after edit)
I think it's a fallacy to think that you can algorithmically deduce the correct "flow-sequence" of the grid points, by examining the contents of the tuple ind.
Here's an example that illustrates why:
0 0 0 0 0 0 0 0 0 0
A B C D E 0 0 0 0 0
0 0 0 0 F 0 0 0 0 0
0 0 0 0 G 0 0 0 0 0
0 0 0 I H 0 0 0 0 0
0 0 0 J K 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
This is a schematic representation of your grid matrix, where, if you follow the letters A, B, C, etc, you will get the sequence of the flow through the grid-elements.
However, note that, no matter how smart an algorithm is, it will be unable to choose between the two possible flows:
A, B, C, D, E, F, G, H, I, J, K
and
A, B, C, D, E, F, G, H, K, J, I
So, I think you will have to record the sequence explicitly yourself, rather than deduce it from the positions of the T values in the grid.
Any algorithm will stop at the ambiguity at grid location H in the above example.
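In other words, the robust fix is to build the index sequence in flow order yourself. For the pipe in the question, a sketch might look like this (segment boundaries taken from the question's setup):
import numpy as np

nx, ny = 10, 10
sx1, sx2, sy = .3, .7, .7

# Build the path explicitly, segment by segment, in flow direction:
# first leg (3,0) -> (3,6), the connecting segment, then the
# second leg reversed (7,6) -> (7,0).
leg1   = [(int(nx*sx1), y) for y in range(int(ny*sy))]
across = [(x, int(ny*sy) - 1) for x in range(int(nx*sx1) + 1, int(nx*sx2))]
leg2   = [(int(nx*sx2), y) for y in reversed(range(int(ny*sy)))]
path = leg1 + across + leg2

u0 = np.zeros((nx, ny))
rows, cols = zip(*path)
u0[rows, cols] = np.arange(1, len(path) + 1)  # values 1..17 in flow order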

Checking diagonally in nqueen

I have a fragment of my code where I wrote functions to check rows, columns and diagonals for the queen placements so that they will not attack each other. Currently I'm having issues with the diagonal function:
def checkDiagonal(T):
for i in range(len(T) - 1):
if abs(T[i] - T[i + 1]) == 1:
return False
return True
The problem with this function is that it only catches queens in adjacent columns whose rows differ by exactly 1, not pairs that are further apart.
Example, if N = 7 it prints:
Enter the value of N: 7
0 Q 0 0 0 0 0
0 0 0 0 0 0 0
0 0 X 0 0 0 0
0 0 X 0 0 0 0
0 0 X 0 0 0 0
0 0 X 0 0 0 0
Q 0 0 0 0 0 0
The Q in the output is the partial solution I set in the code. The X marks the next possible positions for the queen, but one X in the output is clearly diagonal to a queen and would be attacked.
Partial solution list = [6,0]; in this case it will be passed to the function as T.
Two points (x1, y1) and (x2, y2) are on the same lower left -> upper right diagonal if and only if y1 - x1 == y2 - x2.
If I understand your question correctly, the partial solution T = [0,6] would represent the partial solution [(0,0), (1,6)]. So, since 0 - 0 == 0 != 5 == 6 - 1, these two elements are not on the same diagonal.
However, for the partial solution [0, 6, 2] = [(0,0), (1,6), (2,2)] we would have 0 - 0 == 0 == 2 - 2, and hence the two points would be on the same lower left -> upper right diagonal.
For the upper left -> lower right diagonal you would then have to find a similar condition, which I think you should be able to figure out, but let me know if you don't manage to find it.
This would lead to something like the code (only for this diagonal):
def checkDiagonal(T):
    for i in range(len(T) - 1):
        for j in range(i + 1, len(T)):
            if T[i] - i == T[j] - j:
                return False
    return True
Be careful however that I didn't have time to test this, so there might be small errors in it, but the general idea should be right.
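For reference, a combined check covering both diagonals at once can compare the row distance against the column distance; again, treat this as an untested sketch:
def checkDiagonals(T):
    # Queens at (i, T[i]) and (j, T[j]) share a diagonal exactly
    # when their row distance equals their column distance.
    for i in range(len(T) - 1):
        for j in range(i + 1, len(T)):
            if abs(T[i] - T[j]) == j - i:
                return False
    return True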

Counting of adjacent cells in a numpy array

Past midnight, and maybe someone has an idea how to tackle a problem of mine. I want to count the number of adjacent cells (that is, the number of array fields with other values, e.g. zeroes, in the vicinity of the array values) as a sum over all valid values.
Example:
import numpy
from scipy import ndimage

s = ndimage.generate_binary_structure(2,2)  # Structure can vary
a = numpy.zeros((6,6), dtype=int)           # Example array
a[2:4, 2:4] = 1; a[2,4] = 1                 # with example value structure
print(a)
> [[0 0 0 0 0 0]
   [0 0 0 0 0 0]
   [0 0 1 1 1 0]
   [0 0 1 1 0 0]
   [0 0 0 0 0 0]
   [0 0 0 0 0 0]]
# The value at position [2,4] is surrounded by 6 zeros, while the one at
# position [2,2] has 5 zeros in the vicinity if 's' is the assumed binary structure.
# The total sum of surrounding zeroes is therefore 5+4+6+4+5 == 24.
How can I count the number of zeroes in this way if the structure of my values varies?
I believe I must make use of SciPy's binary_dilation function, which is able to enlarge the value structure, but simple counting of overlaps can't lead me to the correct sum, or can it?
print(ndimage.binary_dilation(a, s).astype(a.dtype))
[[0 0 0 0 0 0]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 1]
[0 1 1 1 1 0]
[0 0 0 0 0 0]]
Use a convolution to count neighbours:
import numpy
import scipy.signal

a = numpy.zeros((6,6), dtype=int)  # Example array
a[2:4, 2:4] = 1; a[2,4] = 1        # with example value structure

b = 1 - a
c = scipy.signal.convolve2d(b, numpy.ones((3,3)), mode='same')
print(numpy.sum(c * a))  # 24
b = 1-a allows us to count each zero while ignoring the ones.
We convolve with a 3x3 all-ones kernel, which sets each element to the sum of it and its 8 neighbouring values (other kernels are possible, such as the + kernel for only orthogonally adjacent values). With these summed values, we mask off the zeros in the original input (since we don't care about their neighbours), and sum over the whole array.
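Continuing from the snippet above, the "+" kernel mentioned in passing would look like this (a sketch, reusing a and b):
# "+"-shaped kernel: only the four orthogonal neighbours count.
# (The centre 1 is harmless: b is 0 wherever a is 1.)
plus = numpy.array([[0, 1, 0],
                    [1, 1, 1],
                    [0, 1, 0]])
c4 = scipy.signal.convolve2d(b, plus, mode='same')
print(numpy.sum(c4 * a))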
I think you already got it: after dilation, the number of 1s is 19; minus the 5 of the starting shape, you have 14, which is the number of distinct zeros surrounding your shape. Your total of 24 counts some zeros more than once, because several shape cells share the same neighbouring zero.
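A quick check of that count, reusing a and s from the question's snippet:
d = ndimage.binary_dilation(a, s)
print(int(d.sum() - a.sum()))  # 19 - 5 == 14 distinct zeros around the shape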
