How to use skimage.measure.regionprops to query labels - python

Can someone help me out with skimage.measure.regionprops? The documentation was confusing to me in describing the list of properties that regionprops provides.
I would like to do the following:
Query a point (x,y) and return which labeled area the point belongs in.
Get an ndarray of all points within a labeled area.
Here is some code showing what I have so far:
import numpy as np
from skimage.measure import label
import matplotlib.pyplot as plt
arr = np.array([[1, 0, 1, 0, 0, 0, 1],
[1, 1, 1, 0, 0, 0, 1],
[0, 1, 1, 0, 0, 0, 1],
[0, 1, 1, 0, 0, 1, 1],
[0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1, 1],
[1, 0, 0, 1, 1, 1, 1],
[1, 0, 0, 1, 1, 1, 1],
[1, 0, 0, 1, 1, 1, 1]])
img = label(arr)
plt.imshow(img)
plt.show()
For example, I want to query arr[8][6] and find out which label it belongs to (green), and I want to get all of the points that belong to an arbitrary label (like green).

The numeric label of any pixel can be retrieved by indexing img:
In [67]: row, col = 8, 6
In [68]: index = img[row, col]
In [69]: print(f'The label of pixel [{row}, {col}] is {index}')
The label of pixel [8, 6] is 2
Then you could use NumPy's nonzero to get the coordinates of all the pixels with the same label:
In [70]: coords = np.nonzero(img == index)
In [71]: coords
Out[71]:
(array([0, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8], dtype=int32),
array([6, 6, 6, 5, 6, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 6], dtype=int32))
In [72]: out = np.zeros(shape = arr.shape + (3,), dtype=np.uint8)
In [73]: out[coords] = [0, 255, 0] # green
In [74]: plt.imshow(out)
Out[74]: <matplotlib.image.AxesImage at 0x11a2ec10>
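Since the question asks about regionprops specifically, here is a minimal sketch of the same two queries done with it. It assumes the same arr as in the question; the names target, regions and points are only illustrative.

```python
import numpy as np
from skimage.measure import label, regionprops

arr = np.array([[1, 0, 1, 0, 0, 0, 1],
                [1, 1, 1, 0, 0, 0, 1],
                [0, 1, 1, 0, 0, 0, 1],
                [0, 1, 1, 0, 0, 1, 1],
                [0, 0, 0, 0, 1, 1, 1],
                [0, 0, 0, 1, 1, 1, 1],
                [1, 0, 0, 1, 1, 1, 1],
                [1, 0, 0, 1, 1, 1, 1],
                [1, 0, 0, 1, 1, 1, 1]])
img = label(arr)

# Query 1: the label of a pixel is simply the value of the labeled image there
# (a value of 0 would mean the pixel is background).
row, col = 8, 6
target = img[row, col]

# regionprops returns one RegionProperties object per non-zero label;
# index them by their .label attribute for direct lookup.
regions = {region.label: region for region in regionprops(img)}

# Query 2: .coords is an (N, 2) ndarray of (row, col) pairs for that region.
points = regions[target].coords
print(target, points.shape)
```

Compared with np.nonzero, which returns a tuple of two index arrays, coords gives one row per pixel, which may be closer to the "ndarray of all points" asked for.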

Related

Python numpy: Add elements of a numpy array of arrays to another array of arrays, initialized to zeros, at the specified positions

Suppose we have a numpy array of numpy arrays of zeros as
arr1 = np.zeros((len(Train), L))
where Train is a (dataset) numpy array of arrays of integers of fixed length.
We also have another 1d numpy array, positions, of length len(Train).
Now we wish to add elements of Train to arr1 at the positions specified by positions.
One way is to use a for loop on the Train array as:
k = len(Train[0])
for i in range(len(Train)):
    arr1[i, int(positions[i]):int(positions[i] + k)] = Train[i, 0:k]
However, going over the entire Train set using the explicit for loop is slow and I would like to optimize it.
Here is one way by generating all the indexes you want to assign to. Setup:
import numpy as np
n = 12 # Number of training samples
l = 8 # Number of columns in the output array
k = 4 # Number of columns in the training samples
arr = np.zeros((n, l), dtype=int)
train = np.random.randint(10, size=(n, k))
positions = np.random.randint(l - k, size=n)
Random example data:
>>> train
array([[3, 4, 3, 2],
[3, 6, 4, 1],
[0, 7, 9, 6],
[4, 0, 4, 8],
[2, 2, 6, 2],
[4, 5, 1, 7],
[5, 4, 4, 4],
[0, 8, 5, 3],
[2, 9, 3, 3],
[3, 3, 7, 9],
[8, 9, 4, 8],
[8, 7, 6, 4]])
>>> positions
array([3, 2, 3, 2, 0, 1, 2, 2, 3, 2, 1, 1])
Advanced indexing with broadcasting trickery:
rows = np.arange(n)[:, None] # Shape (n, 1)
cols = np.arange(k) + positions[:, None] # Shape (n, k)
arr[rows, cols] = train
output:
>>> arr
array([[0, 0, 0, 3, 4, 3, 2, 0],
[0, 0, 3, 6, 4, 1, 0, 0],
[0, 0, 0, 0, 7, 9, 6, 0],
[0, 0, 4, 0, 4, 8, 0, 0],
[2, 2, 6, 2, 0, 0, 0, 0],
[0, 4, 5, 1, 7, 0, 0, 0],
[0, 0, 5, 4, 4, 4, 0, 0],
[0, 0, 0, 8, 5, 3, 0, 0],
[0, 0, 0, 2, 9, 3, 3, 0],
[0, 0, 3, 3, 7, 9, 0, 0],
[0, 8, 9, 4, 8, 0, 0, 0],
[0, 8, 7, 6, 4, 0, 0, 0]])
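As a sanity check, the vectorized assignment can be compared against the explicit loop from the question. This is a small sketch using the same shapes as above; the seed is arbitrary and the names expected and result are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n, l, k = 12, 8, 4              # samples, output columns, sample length

train = rng.integers(10, size=(n, k))
# Start columns chosen so that position + k never exceeds l
positions = rng.integers(l - k + 1, size=n)

# Reference result: the explicit per-row loop
expected = np.zeros((n, l), dtype=int)
for i in range(n):
    expected[i, positions[i]:positions[i] + k] = train[i]

# Vectorized version: rows has shape (n, 1) and cols has shape (n, k),
# so they broadcast to one (row, col) index pair per element of train
rows = np.arange(n)[:, None]
cols = positions[:, None] + np.arange(k)
result = np.zeros((n, l), dtype=int)
result[rows, cols] = train

print(np.array_equal(result, expected))
```

Because every (row, col) pair is distinct here, the assignment order does not matter and the two versions agree.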

Distance transform with Manhattan distance - Python / NumPy / SciPy

I would like to generate a 2d Array like this using Python and Numpy:
[
[0, 1, 2, 3, 4, 4, 3, 4],
[1, 2, 3, 4, 4, 3, 2, 3],
[2, 3, 4, 4, 3, 2, 1, 2],
[3, 4, 4, 3, 2, 1, 0, 1],
[4, 5, 5, 4, 3, 2, 1, 2]
]
Pretty much, the numbers spread left and right starting from the zeros. This matrix shows the distance from any point to the closest zero. I thought this matrix was common, but I couldn't find anything on the web, not even its name. If you have code to efficiently generate such a matrix, or at least know what it's called, please let me know.
Thank you
Here's one with Scipy cdist -
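import numpy as np
from scipy.spatial.distance import cdist
def bwdist_manhattan(a, seedval=1):
    seed_mask = a == seedval
    z = np.argwhere(seed_mask)    # coordinates of the seed pixels
    nz = np.argwhere(~seed_mask)  # coordinates of all other pixels
    out = np.zeros(a.shape, dtype=int)
    # Manhattan ("cityblock") distance from each non-seed pixel to its nearest seed
    out[tuple(nz.T)] = cdist(z, nz, 'cityblock').min(0).astype(int)
    return out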
In MATLAB, this is called the distance transform of a binary image (bwdist), hence the derived function name used here.
Sample run -
In [60]: a # input binary image with 1s at "seed" positions
Out[60]:
array([[1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0]])
In [61]: bwdist_manhattan(a)
Out[61]:
array([[0, 1, 2, 3, 4, 4, 3, 4],
[1, 2, 3, 4, 4, 3, 2, 3],
[2, 3, 4, 4, 3, 2, 1, 2],
[3, 4, 4, 3, 2, 1, 0, 1],
[4, 5, 5, 4, 3, 2, 1, 2]])
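For larger images, building the full seed-by-pixel distance matrix with cdist can use a lot of memory. A possible alternative, sketched here under the assumption that SciPy's chamfer distance transform is acceptable, is scipy.ndimage.distance_transform_cdt with the 'taxicab' metric. The wrapper name manhattan_to_seeds is hypothetical.

```python
import numpy as np
from scipy.ndimage import distance_transform_cdt

def manhattan_to_seeds(a, seedval=1):
    # distance_transform_cdt measures, for every non-zero element, the
    # distance to the nearest zero. The seeds therefore have to be the
    # zeros of the input, which is what `a != seedval` produces.
    return distance_transform_cdt(a != seedval, metric='taxicab')

# Same sample input as above: seeds at (0, 0) and (3, 6)
a = np.zeros((5, 8), dtype=int)
a[0, 0] = 1
a[3, 6] = 1

dist = manhattan_to_seeds(a)
print(dist)
```

The input needs at least one seed, because the transform has nothing to measure against if the image contains no zeros after the comparison.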

Skimage polygon function: Why is the last vertex repeated in the polygon documentation example?

I'm trying to understand how the polygon function works with this example of the documentation:
import numpy as np
from skimage.draw import polygon
img = np.zeros((10, 10), dtype=np.uint8)
r = np.array([1, 2, 8, 1])
c = np.array([1, 7, 4, 1])
rr, cc = polygon(r, c)
img[rr, cc] = 1
img
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
I have a few questions about this:
The r variable has the row coordinates, and the c variable has the column coordinates. From what I see, it means that there are 4 vertices like this: (1,1), (2,7), (8,4) and (1,1). But when I look at the img array, it looks like a triangle... Shouldn't there be 3 vertices instead of 4?
If I remove the last vertex and call the polygon function again, I get the same results.
r = np.array([1, 2, 8])
c = np.array([1, 7, 4])
rr, cc = polygon(r, c)
# rr = array([2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7])
# cc = array([1, 2, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 3, 4, 5, 4, 4])
r2 = np.array([1, 2, 8, 1])
c2 = np.array([1, 7, 4, 1])
rr2, cc2 = polygon(r2, c2)
# rr2 = array([2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7])
# cc2 = array([1, 2, 3, 4, 5, 6, 3, 4, 5, 6, 3, 4, 5, 3, 4, 5, 4, 4])
Why do I get the same results? Is it ignoring the last vertex (1,1)?
The function polygon consumes two sequences, namely the row and column coordinates of the vertices of a polygon. You don't need to repeat the coordinates of the first vertex at the end of both sequences as they are assumed to define closed polygonal chains.
Having a look at the source code is insightful. Under the hood, skimage.draw.polygon calls skimage._draw._polygon, which in turn determines whether a pixel lies inside a polygon through a call to the helper function point_in_polygon. This function contains a for loop that iterates over the line segments that make up the polygon. The code makes clear that the polygonal chain is forced to be closed, since the first line segment is defined by the vertices with indices n_vert - 1 and 0. As a consequence, polygon([1, 2, 8, 1], [1, 7, 4, 1]) returns the coordinates of the pixels that lie inside the polygon defined by the following line segments:
(1, 1) - (1, 1)
(1, 1) - (2, 7)
(2, 7) - (8, 4)
(8, 4) - (1, 1)
while polygon([1, 2, 8], [1, 7, 4]) returns the coordinates of the pixels that lie inside the polygon defined by the following line segments:
(8, 4) - (1, 1)
(1, 1) - (2, 7)
(2, 7) - (8, 4)
As the length of segment (1, 1) - (1, 1) is zero, both polygons are actually the same polygon. This is why you are getting the same results.
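To make the closing behaviour concrete, here is a simplified pure-Python even-odd (ray casting) test. It is only an illustration of the idea described above, not scikit-image's actual source; the function name and signature are borrowed for readability and the helper inside_pixels is hypothetical.

```python
def point_in_polygon(xs, ys, x, y):
    """Even-odd test; the vertex list is implicitly treated as closed."""
    n_vert = len(xs)
    inside = False
    j = n_vert - 1  # the first segment joins the last vertex to vertex 0
    for i in range(n_vert):
        # Does the segment (j -> i) straddle the horizontal line through y?
        # A zero-length segment never does, so it is skipped automatically.
        if (ys[i] > y) != (ys[j] > y):
            # x coordinate at which the segment crosses that line
            x_cross = xs[j] + (y - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j])
            if x < x_cross:
                inside = not inside
        j = i
    return inside

def inside_pixels(r, c, size=10):
    # Rows play the role of y and columns the role of x
    return {(row, col)
            for row in range(size) for col in range(size)
            if point_in_polygon(c, r, col, row)}

open_chain = inside_pixels([1, 2, 8], [1, 7, 4])
closed_chain = inside_pixels([1, 2, 8, 1], [1, 7, 4, 1])
print(open_chain == closed_chain)
```

In the four-vertex call the extra segment (1, 1) - (1, 1) never crosses any scan line, so it cannot toggle the inside flag and both vertex lists describe the same region. The exact pixels selected by this sketch may differ from skimage's output at the boundary, since edge handling is not reproduced here.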

Python numpy array -- close smallest regions

I have a 2D boolean numpy array that represents an image, on which I call skimage.measure.label to label each segmented region, giving me a 2D array of int [0,500]; each value in this array represents the region label for that pixel. I would like to now remove the smallest regions. For example, if my input array is shape (n, n), I would like all labeled regions of < m pixels to be subsumed into the larger surrounding regions. For example if n=10 and m=5, my input could be,
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 7, 8, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 2, 2, 2, 1, 1
4, 4, 4, 4, 2, 2, 2, 2, 1, 1
4, 6, 6, 4, 2, 2, 2, 3, 3, 3
4, 6, 6, 4, 5, 5, 5, 3, 3, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5
and the output is then,
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 0, 0, 1, 1, 1 # 7 and 8 are replaced by 0
0, 0, 0, 0, 0, 0, 0, 1, 1, 1
0, 0, 0, 0, 0, 2, 2, 2, 1, 1
4, 4, 4, 4, 2, 2, 2, 2, 1, 1
4, 4, 4, 4, 2, 2, 2, 3, 3, 3 # 6 is gone, but 3 remains
4, 4, 4, 4, 5, 5, 5, 3, 3, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5
4, 4, 4, 4, 5, 5, 5, 5, 5, 5
I've looked into skimage morphology operations, including binary closing, but none seem to work well for my use case. Any suggestions?
You can do this by performing a binary dilation on the boolean mask of each label. The dilated mask of a label spills over into the adjacent regions, so counting how many pixels of a small region each dilated mask covers tells you which neighbour borders it the most. Using this you can then replace values as needed.
Example code:
import numpy as np
import scipy.ndimage
m = 5
arr = [[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 7, 8, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 2, 2, 2, 1, 1],
[4, 4, 4, 4, 2, 2, 2, 2, 1, 1],
[4, 6, 6, 4, 2, 2, 2, 3, 3, 3],
[4, 6, 6, 4, 5, 5, 5, 3, 3, 5],
[4, 4, 4, 4, 5, 5, 5, 5, 5, 5],
[4, 4, 4, 4, 5, 5, 5, 5, 5, 5]]
arr = np.array(arr)
nval = np.max(arr) + 1
# Compute the number of occurrences of each label
counts, _ = np.histogram(arr, bins=range(nval + 1))
# Dilate the mask of each label so that it overlaps its neighbouring regions
c = np.array([scipy.ndimage.binary_dilation(arr == i)
              for i in range(nval)])
# Loop over the labels with too few pixels and reassign each one to the label
# whose dilated mask overlaps it the most
for i in filter(lambda i: counts[i] < m, range(nval)):
    arr[arr == i] = np.argmax(np.sum(c[:, arr == i], axis=1))
Which gives the expected result:
>>> arr.tolist()
[[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 2, 2, 2, 1, 1],
[4, 4, 4, 4, 2, 2, 2, 2, 1, 1],
[4, 4, 4, 4, 2, 2, 2, 3, 3, 3],
[4, 4, 4, 4, 5, 5, 5, 3, 3, 5],
[4, 4, 4, 4, 5, 5, 5, 5, 5, 5],
[4, 4, 4, 4, 5, 5, 5, 5, 5, 5]]
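The size check on its own can also be written with np.bincount, which counts non-negative integer labels directly. This is a small sketch on the same input; the name small_labels is only illustrative.

```python
import numpy as np

m = 5
arr = np.array([[0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                [0, 0, 7, 8, 0, 0, 0, 1, 1, 1],
                [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                [0, 0, 0, 0, 0, 2, 2, 2, 1, 1],
                [4, 4, 4, 4, 2, 2, 2, 2, 1, 1],
                [4, 6, 6, 4, 2, 2, 2, 3, 3, 3],
                [4, 6, 6, 4, 5, 5, 5, 3, 3, 5],
                [4, 4, 4, 4, 5, 5, 5, 5, 5, 5],
                [4, 4, 4, 4, 5, 5, 5, 5, 5, 5]])

# counts[i] is the number of pixels carrying label i
counts = np.bincount(arr.ravel())

# Labels whose regions have fewer than m pixels
small_labels = np.flatnonzero(counts < m)
print(small_labels)  # [6 7 8]
```

Label 3 has exactly 5 pixels, so with m = 5 it is kept, which matches the expected output in the question.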

Delete rows in ndarray where sum of multiple indexes is 0

So I have a very large two-dimensional numpy array such as:
array([[ 2, 4, 0, 0, 0, 5, 9, 0],
[ 2, 3, 0, 1, 0, 3, 1, 1],
[ 1, 5, 4, 3, 2, 7, 8, 3],
[ 0, 7, 0, 0, 0, 6, 4, 4],
...,
[ 6, 5, 6, 0, 0, 1, 9, 5]])
I would like to quickly remove each row of the array where np.sum(row[2:5]) == 0
The only way I can think to do this is with for loops, but that takes very long when there are millions of rows. Additionally, this needs to be constrained to Python 2.7
Boolean expressions can be used as an index. You can use them to mask the array.
import numpy
inputarray = numpy.array([[ 2, 4, 0, 0, 0, 5, 9, 0],
[ 2, 3, 0, 1, 0, 3, 1, 1],
[ 1, 5, 4, 3, 2, 7, 8, 3],
[ 0, 7, 0, 0, 0, 6, 4, 4],
...,
[ 6, 5, 6, 0, 0, 1, 9, 5]])
mask = numpy.sum(inputarray[:,2:5], axis=1) != 0
result = inputarray[mask,:]
What this is doing:
inputarray[:, 2:5] selects all the columns you want to sum over
axis=1 means the sum runs across the selected columns, giving one sum per row
We want to keep the rows where the sum is not zero
The mask is used as a row index and selects the rows where the boolean expression is True
Another solution would be to use numpy.apply_along_axis to calculate the sums and cast it as a bool, and use that for your index:
my_arr = np.array([[ 2, 4, 0, 0, 0, 5, 9, 0],
[ 2, 3, 0, 1, 0, 3, 1, 1],
[ 1, 5, 4, 3, 2, 7, 8, 3],
[ 0, 7, 0, 0, 0, 6, 4, 4],])
my_arr[np.apply_along_axis(lambda x: bool(sum(x[2:5])), 1, my_arr)]
array([[2, 3, 0, 1, 0, 3, 1, 1],
[1, 5, 4, 3, 2, 7, 8, 3]])
We just cast the sum to a bool, since any number that's not 0 is going to be True.
>>> a
array([[2, 4, 0, 0, 0, 5, 9, 0],
[2, 3, 0, 1, 0, 3, 1, 1],
[1, 5, 4, 3, 2, 7, 8, 3],
[0, 7, 0, 0, 0, 6, 4, 4],
[6, 5, 6, 0, 0, 1, 9, 5]])
You are interested in columns 2 through 4 (the slice 2:5 excludes index 5)
>>> a[:,2:5]
array([[0, 0, 0],
[0, 1, 0],
[4, 3, 2],
[0, 0, 0],
[6, 0, 0]])
>>> b = a[:,2:5]
You want to find the sum of those columns in each row
>>> sum_ = b.sum(1)
>>> sum_
array([0, 1, 9, 0, 6])
These are the rows that meet your criteria
>>> sum_ != 0
array([False, True, True, False, True], dtype=bool)
>>> keep = sum_ != 0
Use boolean indexing to select those rows
>>> a[keep, :]
array([[2, 3, 0, 1, 0, 3, 1, 1],
[1, 5, 4, 3, 2, 7, 8, 3],
[6, 5, 6, 0, 0, 1, 9, 5]])
>>>
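Putting the steps together, the whole operation fits in a small function. This is a sketch with a hypothetical helper name, drop_rows_with_zero_sum, and default column bounds taken from the question.

```python
import numpy as np

def drop_rows_with_zero_sum(arr, start=2, stop=5):
    # One sum per row over columns start..stop-1, then keep the non-zero rows
    keep = arr[:, start:stop].sum(axis=1) != 0
    return arr[keep]

a = np.array([[2, 4, 0, 0, 0, 5, 9, 0],
              [2, 3, 0, 1, 0, 3, 1, 1],
              [1, 5, 4, 3, 2, 7, 8, 3],
              [0, 7, 0, 0, 0, 6, 4, 4],
              [6, 5, 6, 0, 0, 1, 9, 5]])

filtered = drop_rows_with_zero_sum(a)
print(filtered)
```

Note that testing the sum is not the same as testing whether all selected entries are zero: if the array can contain negative numbers, a row such as [1, -1, 0] sums to zero and would be dropped. In that case arr[:, start:stop].any(axis=1) expresses "at least one non-zero entry" instead.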
