I'm using the cv2.goodFeaturesToTrack function to find feature points in an image. The end goal is to extract square blocks of a certain size, with the feature points being the centers of those blocks.
However, lots of the feature points are close to each other, so the blocks are overlapping, which is not what I want.
This is an example of all feature points (centers):
array([[3536., 1419.],
[2976., 1024.],
[3504., 1400.],
[3574., 1505.],
[3672., 1453.],
[3671., 1442.],
[3489., 1429.],
[3108., 737.]])
Let's say I want to find the first n blocks with a blockRadius = 400 which are not overlapping. Any ideas on how to achieve this?
You could get closer with scipy.spatial.KDTree, though it cannot directly return groups (blocks) that contain varying numbers of points. It can, however, be combined with another library, python-igraph, which quickly finds connected components of points that are close to each other:
import numpy as np
from scipy.spatial import KDTree
import igraph as ig
data = np.array([[3536., 1419.],
[2976., 1024.],
[3504., 1400.],
[3574., 1505.],
[3672., 1453.],
[3671., 1442.],
[3489., 1429.],
[3108., 737.]])
edges1 = KDTree(data[:,:1]).query_pairs(r=400)
edges2 = KDTree(data[:,1:]).query_pairs(r=400)
g = ig.Graph(n = len(data), edges=edges1 & edges2)
i = g.clusters()
Here i is a clustering object (an igraph-internal type) whose clusters are sequences of point indices. Here's a quick preview:
>>> print(i)
Clustering with 8 elements and 2 clusters
[0] 0, 2, 3, 4, 5, 6
[1] 1, 7
>>> pal = ig.drawing.colors.ClusterColoringPalette(len(i)) #number of colors used
color = pal.get_many(i.membership) #list of color tags
ig.plot(g, bbox = (200, 100), layout=g.layout('circle'), vertex_label=g.vs.indices,
vertex_color = color, vertex_size = 12, vertex_label_size = 8)
Example of usage:
>>> [data[n] for n in i] #or list(i)
[array([[3536., 1419.],
[3504., 1400.],
[3574., 1505.],
[3672., 1453.],
[3671., 1442.],
[3489., 1429.]]),
array([[2976., 1024.],
[3108., 737.]])]
Remark: this method works with pairs of close points instead of a full n*n distance matrix, which uses less memory in some cases.
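As a side note on the two one-dimensional trees used above: a pair that is within 400 in x and within 400 in y is exactly a pair within Chebyshev (max-norm) distance 400. A minimal sketch, assuming SciPy's query_pairs with p=np.inf is acceptable here, gets the same pairs from a single 2-D tree:

```python
import numpy as np
from scipy.spatial import KDTree

data = np.array([[3536., 1419.], [2976., 1024.], [3504., 1400.], [3574., 1505.],
                 [3672., 1453.], [3671., 1442.], [3489., 1429.], [3108., 737.]])

# Approach from the answer: pairs close in x, intersected with pairs close in y
pairs_x = KDTree(data[:, :1]).query_pairs(r=400)
pairs_y = KDTree(data[:, 1:]).query_pairs(r=400)

# Alternative (an assumption, not part of the answer): one 2-D tree queried
# with the Chebyshev metric, selected by p=np.inf
pairs_cheb = KDTree(data).query_pairs(r=400, p=np.inf)

# Both routes should give the same set of (i, j) index pairs
print((pairs_x & pairs_y) == pairs_cheb)
```

The resulting set can be passed to the graph construction in the same way as edges1 & edges2.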
You'll need something iterative to do that: each point you keep determines which later points get dropped, so this kind of recurrent dropout isn't vectorizable. Something like this will work, I think:
import numpy as np
from scipy.spatial.distance import pdist, squareform
c = np.array([[3536., 1419.],
[2976., 1024.],
[3504., 1400.],
[3574., 1505.],
[3672., 1453.],
[3671., 1442.],
[3489., 1429.],
[3108., 737.]])
dists = squareform(pdist(c, metric = 'chebyshev')) # distance matrix, chebyshev here since you seem to want blocks
indices = np.arange(c.shape[0]) # indices that haven't been dropped (all to start)
out = [0] # always want the first index
while True:
    try:
        indices = indices[dists[indices[0], indices] > 400] # drop indices that are inside the threshold
        out.append(indices[0]) # add the next index that hasn't been dropped to the output
    except IndexError:
        break # once you run out of indices, you'll get an IndexError and you're done
print(out)
[0, 1]
Let's try with a whole bunch of points:
np.random.seed(42)
c = np.random.rand(10000, 2) * 800
dists = squareform(pdist(c, metric = 'chebyshev')) # distance matrix, chebyshev here since you seem to want squares
indices = np.arange(c.shape[0]) # indices that haven't been dropped (all to start)
out = [0] # always want the first index
while True:
    try:
        indices = indices[dists[indices[0], indices] > 400] # drop indices that are inside the threshold
        out.append(indices[0]) # add the next index that hasn't been dropped to the output
    except IndexError:
        break # once you run out of indices, you'll get an IndexError and you're done
print(out, pdist(c[out], metric = 'chebyshev'))
[0, 2, 6, 17] [635.77582886 590.70015659 472.87353138 541.13920029 647.69071411
476.84658995]
So we keep 4 points, which makes sense since four 400x400 blocks tile an 800x800 space. The kept indices are mostly low (17 << 10000), and the distance between kept points is always > 400.
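For 10000 points the full distance matrix already holds 10^8 entries. A minimal sketch of the same greedy selection that avoids the matrix, assuming a KDTree neighbourhood query is an acceptable substitute (the function name greedy_non_overlapping and the max_blocks parameter are hypothetical):

```python
import numpy as np
from scipy.spatial import KDTree

def greedy_non_overlapping(centers, min_dist, max_blocks=None):
    """Keep points in order, dropping every point within min_dist (Chebyshev) of a kept one."""
    tree = KDTree(centers)
    dropped = np.zeros(len(centers), dtype=bool)
    kept = []
    for i in range(len(centers)):
        if dropped[i]:
            continue
        kept.append(i)
        if max_blocks is not None and len(kept) == max_blocks:
            break
        # p=np.inf selects the Chebyshev metric; the query includes point i itself
        dropped[tree.query_ball_point(centers[i], r=min_dist, p=np.inf)] = True
    return kept

c = np.array([[3536., 1419.], [2976., 1024.], [3504., 1400.], [3574., 1505.],
              [3672., 1453.], [3671., 1442.], [3489., 1429.], [3108., 737.]])
print(greedy_non_overlapping(c, 400))  # [0, 1], as in the first example above
```

The loop is still sequential, but each step only touches the neighbours of the kept point rather than a whole matrix row.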
To speed up my code I want to replace my for loops with vectorization or other recommended tools. I found plenty of examples of replacing simple for loops, but nothing I could follow for replacing nested for loops combined with conditions.
With my code I want to check if points (X, Y coordinates) can be connected by lineaments (linear structures). I started pretty simple, but over time the code outgrew itself and is now exhaustingly slow.
Here is a working example of the part taking the most time:
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import MultiLineString, LineString, Point
from shapely.affinity import rotate
from math import sqrt
from tqdm import tqdm
import random as rng
# creating random array of points
xys = rng.sample(range(201 * 201), 100)
points = [list(divmod(xy, 201)) for xy in xys]
# plot points
plt.scatter(*zip(*points))
# calculate length for rotating lines -> diagonal of bounds so all points able to be reached
length = sqrt(2)*200
# calculate angles to rotate lines
angles = []
for a in range(0, 360, 1):
angle = np.deg2rad(a)
angles.append(angle)
# copy points array to helper array (points_list) so original array is not manipulated
points_list = points.copy()
# array to save final lines
lines = []
# iterate over every point in points array to search for connecting lines
for point in tqdm(points):
    # delete point from helper array to speed up iteration -> so points do not get
    # double, triple, ... checked
    if len(points_list) > 0:
        points_list.remove(point)
    else:
        break
    # create line from original point to point at end of line (x+length) - this line
    # gets rotated at calculated angles
    start = Point(point)
    end = Point(start.x+length, start.y)
    line = LineString([start,end])
    # iterate over angle array to rotate line by each angle
    for angle in angles:
        rot_line = rotate(line, angle, origin=start, use_radians=True)
        lst = list(rot_line.coords)
        # save starting point (a) and ending point (b) of rotated line for np.cross()
        # (cross product to check if points on/near rotated line)
        a = np.asarray(lst[0])
        b = np.asarray(lst[1])
        # counter to count number of points on/near line
        count = 0
        line_list = []
        # iterate manipulated points_list array (only points left for which there has
        # not been a line rotated yet)
        for poi in points_list:
            # check whether point (poi) is on/near rotated line by calculating cross
            # product (np.cross())
            p = np.asarray(poi)
            cross = np.cross(p-a,b-a)
            # check if poi is inside accepted deviation from cross product
            if cross > -750 and cross < 750:
                # check if more than 5 points (poi) are on/near the rotated line
                if count < 5:
                    line_list.append(poi)
                    count += 1
                # if 5 points are connected by the rotated line sort the coordinates
                # of the points and check if the length of the line meets the criteria
                else:
                    line_list = sorted(line_list , key=lambda k: [k[1], k[0]])
                    line_length = LineString(line_list)
                    if line_length.length >= 10 and line_length.length <= 150:
                        lines.append(line_list)
                    break
# use shapely's MultiLineString to create lines from coordinates and plot them
# afterwards
multiLines = MultiLineString(lines)
fig, ax = plt.subplots()
ax.set_title("Lines")
for multiLine in multiLines.geoms:
    # print(multiLine)
    plt.plot(*multiLine.xy)
As mentioned above, I was thinking about using pandas or numpy vectorization, so I built a pandas DataFrame for the points and lines (gdf) and one with the different angles (angles) to rotate the lines:
Name    Type       Size         Value
gdf     DataFrame  (122689, 6)  Column names: x, y, value, start, end, line
angles  DataFrame  (360, 1)     Column names: angle
But I ran out of ideas for replacing these nested for loops and their conditions with pandas vectorization. I found an article on Medium that, about halfway through, lists the conditions for vectorization, and I was wondering whether my code is not suitable for vectorization because of dependencies within the loops...
If that is the case, it does not necessarily need to be vectorization; anything that boosts performance is welcome!
You can quite easily vectorize the most computationally intensive part: the innermost loop. The idea is to process the whole points_list at once. np.cross can be applied to all points for each line, and np.where can be used to filter the result (and get the IDs).
Here is the (barely tested) modified main loop:
for point in tqdm(points):
    if len(points_list) > 0:
        points_list.remove(point)
    else:
        break
    start = Point(point)
    end = Point(start.x+length, start.y)
    line = LineString([start,end])
    # CHANGED PART
    if len(points_list) == 0:
        continue
    p = np.asarray(points_list)
    for angle in angles:
        rot_line = rotate(line, angle, origin=start, use_radians=True)
        a, b = np.asarray(rot_line.coords)
        cross = np.cross(p-a,b-a)
        foundIds = np.where((cross > -750) & (cross < 750))[0]
        if foundIds.size > 5:
            # Similar to the initial part, not efficient, but rarely executed
            line_list = p[foundIds][:5].tolist()
            line_list = sorted(line_list, key=lambda k: [k[1], k[0]])
            line_length = LineString(line_list)
            if line_length.length >= 10 and line_length.length <= 150:
                lines.append(line_list)
This is about 15 times faster on my machine.
Most of the time is spent in the shapely module, which is very inefficient here (especially rotate and even np.asarray(rot_line.coords)). Indeed, each call to rotate takes about 50 microseconds, which is simply insane: it should take no more than 50 nanoseconds, that is, 1000 times faster (actually, optimized native code should be able to do that in less than 20 ns on my machine). If you want faster code, please consider not using this package (or improving its performance).
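One way to follow the advice about avoiding shapely for the rotation, sketched under assumptions: the rotated endpoint offset is length * (cos θ, sin θ) (counter-clockwise rotation, as shapely's rotate uses for positive angles), so the cross products for all angles and all points can be formed by broadcasting. The function name cross_all_angles is hypothetical, and this has not been benchmarked against the loop above.

```python
import numpy as np

def cross_all_angles(start, others, angles, length):
    """Cross products (p - a) x (b - a) for every angle (rows) and point (columns)."""
    a = np.asarray(start, dtype=float)
    d = np.asarray(others, dtype=float) - a                          # (n, 2): p - a
    # Offsets b - a of the rotated line end, counter-clockwise rotation assumed
    ba = length * np.column_stack((np.cos(angles), np.sin(angles)))  # (m, 2)
    # 2-D cross product written out explicitly, broadcast to shape (m, n)
    return d[None, :, 0] * ba[:, None, 1] - d[None, :, 1] * ba[:, None, 0]

angles = np.deg2rad(np.arange(360))
others = np.array([[10.0, 0.0], [0.0, 10.0], [50.0, 50.0]])
cross = cross_all_angles((0.0, 0.0), others, angles, length=np.sqrt(2) * 200)
near = np.abs(cross) < 750        # same tolerance as in the loop above
counts = near.sum(axis=1)         # number of points on/near the line, per angle
```

Rows of near with more than 5 hits would then be the candidates for the (rarely executed) sorting and length check.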
I'm trying to vectorize my fitness function for a Minimum Vertex Cover genetic algorithm, but I'm at a loss about how to do it.
As it stands now:
vert_cover_fitness = [1 if self.dna[edge[0]] or self.dna[edge[1]] else -num_edges for edge in edges]
The dna is a one-dimensional binary array of size [0..n], where each index corresponds to a vertex, and its value indicates if we have chosen it or not. edges is a two dimensional positive integer array, where each value corresponds to a vertex (index) in dna. Both are ndarrays.
Simply explained - if one of the vertices connected by an edge is "selected", then we get a score of one. If not, the function is penalized by -num_edges.
I have tried np.vectorize as an attempt to get away cheap with a lambda function:
fit_func = np.vectorize(lambda edge: 1 if self.dna[edge[0]] or self.dna[edge[1]] else -num_edges)
vert_cover_fitness = fit_func(edges)
This returns IndexError: invalid index to scalar variable., as this function is applied to each value, and not each row.
To fix this I tried np.apply_along_axis. This works but it's just a wrapper for a loop so I'm not getting any speedups.
If any Numpy wizards can see some obvious way to do this, I would much appreciate your help. I'm guessing a problem lies with the representation of the problem, and that changing either the dna or edges shapes could help. I'm just not skilled enough to see what I should do.
I came up with this bit of numpy code; it runs about 30x faster than your for loop on my randomly generated data.
import numpy as np
num_vertices = 1000
num_edges = 500
dna = np.random.choice([0, 1], num_vertices)
edges = np.random.randint(0, num_vertices, num_edges * 2).reshape(-1, 2)
vert_cover_fitness1 = [1 if dna[edge[0]] or dna[edge[1]] else -num_edges for edge in edges]
vert_cover_fitness2 = np.full([num_edges], -num_edges)
mask = (dna[edges[:, 0]] | dna[edges[:, 1]]).astype(bool)
vert_cover_fitness2[mask] = 1.0
print((vert_cover_fitness1 == vert_cover_fitness2).all()) # this shows it's correct
Here is the timeit code used to measure the speedup.
import timeit
setup = """
import numpy as np
num_vertices = 1000
num_edges = 500
dna = np.random.choice([0, 1], num_vertices)
edges = np.random.randint(0, num_vertices, num_edges*2).reshape(-1, 2)
"""
python_loop = "[1 if dna[edge[0]] or dna[edge[1]] else -num_edges for edge in edges]"
print(timeit.timeit(python_loop, setup, number=1000))
vectorised="""
vert_cover_fitness2 = np.full([num_edges], -num_edges)
mask = (dna[edges[:, 0]] | dna[edges[:, 1]]).astype(bool)
vert_cover_fitness2[mask] = 1.0
"""
print(timeit.timeit(vectorised, setup, number=1000))
# prints:
# 0.375906624016352
# 0.012783741112798452
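A slightly more compact variant of the same idea, as a sketch: indexing dna with the whole edges array gives one row of two flags per edge, and np.where picks the score. The function name vert_cover_fitness is only illustrative, and this variant has not been timed against the version above.

```python
import numpy as np

def vert_cover_fitness(dna, edges, num_edges):
    # dna[edges] has shape (num_edges, 2): the "selected" flag of both endpoints
    covered = dna[edges].astype(bool).any(axis=1)
    # 1 where at least one endpoint is selected, otherwise the penalty
    return np.where(covered, 1, -num_edges)

dna = np.array([1, 0, 0, 0])
edges = np.array([[0, 1], [1, 2], [2, 3]])
print(vert_cover_fitness(dna, edges, len(edges)))  # [ 1 -3 -3]
```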
I'm trying to generate all possible bidirectional graphs given a set of nodes. I'm storing my graphs as numpy vectors, and need them that way because some downstream code consumes the graphs in this format.
Let's say I have two sets of nodes, where nodes in the same set do not connect. It is, however, possible to have graphs in which members of the two sets do not meet at all.
posArgs= [0,1] # (0->1) / (1->1) is not allowed..neither is (0->1)
negArgs= [2] # (0->2) is possible, and so is (0 2) - meaning no connection.
To illustrate what I mean:
translates as:
import numpy as np
nargs = 3  # total number of nodes
graph = np.array([0,0,1,0,0,1,0,0,0])
singleGraph= np.vstack( np.array_split(np.array(graph), nargs))
print(singleGraph)
# [[0 0 1]
#  [0 0 1]
#  [0 0 0]]
# Where the first row represent node 0's relationships with nodes 0, 1, 2
# the second row represents node 1's relationships with nodes 0, 1, 2 etc
What I want is to generate all the possible ways in which these two sets can then be constructed (i.e. all the possible node combinations as vectors). At the moment I am using itertools.product to generate a power set of all the nodes, then I create a bunch of vectors in which there are circular connections and same set connections. I then delete them from the powerset. Hence using the sets above I have the following code:
import numpy as np
import itertools
posArgs= [0,1] # 0->1 / 1->1 is not allowed..neither is 0->1
negArgs= [2]
nargs= len(posArgs+ negArgs)
allPermutations= np.array(list(itertools.product([0,1], repeat=nargs*nargs)))
# Create list of Attacks that we will never need. Circular attacks, and attacks between arguments of same polarity
circularAttacks = (np.arange(0, nargs*nargs, nargs+1)).tolist()
samePolarityAttacks = []
posList = list(itertools.permutations(posArgs, 2))
negList = list(itertools.permutations(negArgs, 2))
totList = posList + negList
for l in totList:
    ptn = ((l[0]+1)*nargs)- ((nargs+1) - l[1]) + 1 # All the odd +1 are to account for the shift in 0 index
    samePolarityAttacks.append(ptn)
graphsToDelete = np.unique([circularAttacks + samePolarityAttacks])
subGraphs = allPermutations[:,graphsToDelete]
cutDownGraphs = np.delete(allPermutations, (np.where(subGraphs>0)[0]).tolist(), axis = 0)
for graph in cutDownGraphs:
    singleGraph= np.vstack( np.array_split(np.array(graph), nargs))
    print(singleGraph)
My problem is that with 5 or more nodes across both sets, itertools.product tries to produce 2^25 (or more) vectors. This is of course really expensive and makes me run out of memory.
Are you aware of a smart way in which I can reshape this code while ensuring my graphs stay in this numpy array format?
--Additional info:
For two sets, one node a set, all possible combinations look as follows:
Thanks
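One possible direction for the question above, sketched under assumptions: instead of building the full product and deleting rows, enumerate only the matrix entries that are allowed to be 1 (different sets, off the diagonal) and leave the forbidden entries at 0. The function name allowed_graphs is hypothetical, and the generator still yields 2^k vectors for k allowed entries, so it reduces the blow-up rather than removing it.

```python
import itertools
import numpy as np

def allowed_graphs(pos_args, neg_args):
    """Yield flattened adjacency vectors with no self-loops and no same-set edges."""
    nargs = len(pos_args) + len(neg_args)
    allowed = np.ones((nargs, nargs), dtype=bool)
    np.fill_diagonal(allowed, False)               # circular attacks
    for group in (pos_args, neg_args):
        allowed[np.ix_(group, group)] = False      # same-polarity attacks
    free = np.flatnonzero(allowed)                 # flat indices that may be 0 or 1
    for bits in itertools.product([0, 1], repeat=len(free)):
        graph = np.zeros(nargs * nargs, dtype=int)
        graph[free] = bits
        yield graph

graphs = list(allowed_graphs([0, 1], [2]))
print(len(graphs))  # 16, i.e. 2**4 for the four allowed entries
```

Because this is a generator, graphs can also be consumed one at a time instead of being collected in a list.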
I have some code for calculating missing values in an image, based on neighbouring values in a 2D circular window. It also uses the values from one or more temporally-adjacent images at the same locations (i.e. the same 2D window shifted in the 3rd dimension).
For each position that is missing, I need to calculate the value based not necessarily on all the values available in the whole window, but only on the spatially-nearest n cells that do have values (in both images / Z-axis positions), where n is some value less than the total number of cells in the 2D window.
At the minute, it's much quicker to calculate for everything in the window, because my means of sorting to get the nearest n cells with data is the slowest part of the function as it has to be repeated each time even though the distances in terms of window coordinates do not change. I'm not sure this is necessary and feel I must be able to get the sorted distances once, and then mask those in the process of only selecting available cells.
Here's my code for selecting the data to use within a window of the gap cell location:
# radius will in reality be ~100
radius = 2
y,x = np.ogrid[-radius:radius+1, -radius:radius+1]
dist = np.sqrt(x**2 + y**2)
circle_template = dist > radius
# this will in reality be a very large 3 dimensional array
# representing daily images with some gaps, indicated by 0s
dataStack = np.zeros((2,5,5))
dataStack[1] = (np.random.random(25) * 100).reshape(dist.shape)
dataStack[0] = (np.random.random(25) * 100).reshape(dist.shape)
testdata = dataStack[1]
alternatedata = dataStack[0]
random_gap_locations = (np.random.random(25) * 30).reshape(dist.shape) > testdata
testdata[random_gap_locations] = 0
testdata[radius, radius] = 0
# in reality we will go through every gap (zero) location in the data
# for each image and for each gap use slicing to get a window of
# size (radius*2+1, radius*2+1) around it from each image, with the
# gap being at the centre i.e.
# testgaplocation = [radius, radius]
# and the variables testdata, alternatedata below will refer to these
# slices
locations_to_exclude = np.logical_or(circle_template, np.logical_or
(testdata==0, alternatedata==0))
# the places that are inside the circular mask and where both images
# have data
locations_to_include = ~locations_to_exclude
number_available = np.count_nonzero(locations_to_include)
# we only want to do the interpolation calculations from the nearest n
# locations that have data available, n will be ~100 in reality
number_required = 3
available_distances = dist[locations_to_include]
available_data = testdata[locations_to_include]
available_alternates = alternatedata[locations_to_include]
if number_available > number_required:
    # In this case we need to find the closest number_required of elements, based
    # on distances recorded in dist, from available_data and available_alternates
    # Having to repeat this argsort for each gap cell calculation is slow and feels
    # like it should be avoidable
    sortedDistanceIndices = available_distances.argsort(kind = 'mergesort',axis=None)
    requiredIndices = sortedDistanceIndices[0:number_required]
    selected_data = np.take(available_data, requiredIndices)
    selected_alternates = np.take(available_alternates , requiredIndices)
else:
    # we just use available_data and available_alternates as they are...
    pass
# now do stuff with the selected data to calculate a value for the gap cell
This works, but over half of the total time of the function is taken by the argsort of the masked spatial distance data (~900 µs of a total 1.4 ms; this function will run tens of billions of times, so this is an important difference!).
I am sure that I must be able to do this argsort just once outside of the function, when the spatial distance window is originally set up, and then include those sort indices in the masking, to get the first number_required indices without having to redo the sort. The answer might involve putting the various bits that we are extracting from into a record array, but I can't figure out how, if so. Can anyone see how I can make this part of the process more efficient?
So you want to do the sorting outside of the loop:
sorted_dist_idcs = dist.argsort(kind='mergesort', axis=None)
Then, using some variables from the original code, this is what I could come up with, though it still feels like a major round trip:
loc_to_incl_sorted = locations_to_include.take(sorted_dist_idcs)
sorted_dist_idcs_to_incl = sorted_dist_idcs[loc_to_incl_sorted]
required_idcs = sorted_dist_idcs_to_incl[:number_required]
selected_data = testdata.take(required_idcs)
selected_alternates = alternatedata.take(required_idcs)
Note that required_idcs refers to locations in testdata, not in available_data as in the original code. In this snippet I used take because it conveniently indexes the flattened array.
@moarningsun - thanks for the comment and answer. These got me on the right track, but don't quite work for me when the gap is < radius from the edge of the data: in this case I use a window around the gap cell which is "trimmed" to the data bounds. In this situation the indices reflect the "full" window and thus can't be used to select cells from the bounded window.
Unfortunately I edited that part of my code out when I clarified the original question but it's turned out to be relevant.
I've realised now that if you use argsort again on the output of argsort then you get ranks; i.e. the position that each item would have when the overall array was sorted. We can safely mask these and then take the smallest number_required of them (and do this on a structured array to get the corresponding data at the same time).
This implies another sort within the loop, but in fact we can use partitioning rather than a full sort, because all we need is the smallest number_required items. If number_required is substantially less than the number of data items then this is much faster than doing the argsort.
For example, with number_required = 80 and number_available = 15000 the full argsort takes ~900 µs, whereas argpartition followed by index and slice to get the first 80 takes ~110 µs. We still need to do the argsort to get the ranks at the outset (rather than just partitioning based on distance) in order to get the stability of the mergesort, and thus get the "right one" when distance is not unique.
My code as shown below now runs in ~610 µs on real data, including the actual calculations that aren't shown here. I'm happy with that now, but there seem to be several other apparently minor factors that influence the runtime in ways that are hard to understand.
For example putting the circle_template in the structured array along with dist, ranks, and another field not shown here, doubles the runtime of the overall function (even if we don't access circle_template in the loop!). Even worse, using np.partition on the structured array with order=['ranks'] increases the overall function runtime by almost two orders of magnitude vs using np.argpartition as shown below!
# radius will in reality be ~100
radius = 2
y,x = np.ogrid[-radius:radius+1, -radius:radius+1]
dist = np.sqrt(x**2 + y**2)
circle_template = dist > radius
ranks = dist.argsort(axis=None,kind='mergesort').argsort().reshape(dist.shape)
diam = radius * 2 + 1
# putting circle_template in this array too doubles overall function runtime!
fullWindowArray = np.zeros((diam,diam),dtype=[('ranks',ranks.dtype.str),
                                              ('thisdata','f8'),  # float64, matching dataStack below
                                              ('alternatedata','f8'),
                                              ('dist',dist.dtype.str)])
fullWindowArray['ranks'] = ranks
fullWindowArray['dist'] = dist
# this will in reality be a very large 3 dimensional array
# representing daily images with some gaps, indicated by 0s
dataStack = np.zeros((2,5,5))
dataStack[1] = (np.random.random(25) * 100).reshape(dist.shape)
dataStack[0] = (np.random.random(25) * 100).reshape(dist.shape)
testdata = dataStack[1]
alternatedata = dataStack[0]
random_gap_locations = (np.random.random(25) * 30).reshape(dist.shape) > testdata
testdata[random_gap_locations] = 0
testdata[radius, radius] = 0
# in reality we will loop here to go through every gap (zero) location in the data
# for each image
gapz, gapy, gapx = 1, radius, radius
desLeft, desRight = gapx - radius, gapx + radius+1
desTop, desBottom = gapy - radius, gapy + radius+1
extentB, extentR = dataStack.shape[1:]
# handle the case where the gap is < search radius from the edge of
# the data. If this is the case, we can't use the full
# diam * diam window
dataL = max(0, desLeft)
maskL = 0 if desLeft >= 0 else abs(dataL - desLeft)
dataT = max(0, desTop)
maskT = 0 if desTop >= 0 else abs(dataT - desTop)
dataR = min(desRight, extentR)
maskR = diam if desRight <= extentR else diam - (desRight - extentR)
dataB = min(desBottom,extentB)
maskB = diam if desBottom <= extentB else diam - (desBottom - extentB)
# get the slice that we will be working within
# ranks, dist and circle are already populated
boundedWindowArray = fullWindowArray[maskT:maskB,maskL:maskR]
boundedWindowArray['alternatedata'] = alternatedata[dataT:dataB, dataL:dataR]
boundedWindowArray['thisdata'] = testdata[dataT:dataB, dataL:dataR]
# circle_template is kept outside the structured array, so slice it to the same bounds
locations_to_exclude = np.logical_or(circle_template[maskT:maskB, maskL:maskR],
                                     np.logical_or
                                     (boundedWindowArray['thisdata']==0,
                                      boundedWindowArray['alternatedata']==0))
# the places that are inside the circular mask and where both images
# have data
locations_to_include = ~locations_to_exclude
number_available = np.count_nonzero(locations_to_include)
# we only want to do the interpolation calculations from the nearest n
# locations that have data available, n will be ~100 in reality
number_required = 3
data_to_use = boundedWindowArray[locations_to_include]
if number_available > number_required:
    # argpartition seems to be v fast when number_required is
    # substantially < data_to_use.size
    # But partition on the structured array itself with order=['ranks']
    # is almost 2 orders of magnitude slower!
    reqIndices = np.argpartition(data_to_use['ranks'],number_required)[:number_required]
    data_to_use = np.take(data_to_use,reqIndices)
else:
    # we just use available_data and available_alternates as they are...
    pass
# now do stuff with the selected data to calculate a value for the gap cell
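The rank trick described above can be seen in isolation on a tiny example. This is only an illustrative sketch with made-up distances and a made-up availability mask, not the real window data:

```python
import numpy as np

dist = np.array([3.0, 1.0, 2.0, 1.0, 5.0])          # made-up distances
# argsort of a stable argsort gives each element's rank in sorted order
ranks = dist.argsort(kind='mergesort').argsort()     # [3, 0, 2, 1, 4]

available = np.array([True, False, True, True, True])  # made-up data mask
number_required = 2

avail_idx = np.flatnonzero(available)                # positions that have data
avail_ranks = ranks[avail_idx]                       # their precomputed ranks
# argpartition puts the number_required smallest ranks first (in no particular order)
part = np.argpartition(avail_ranks, number_required)[:number_required]
nearest = avail_idx[part]                            # positions of the nearest available cells
print(sorted(nearest.tolist()))                      # [2, 3]
```

The ranks are computed once; only the cheap masking and partitioning would be repeated per gap cell.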
I have a large numpy array and labeled it with the connected component labeling in scipy. Now I want to create subsets of this array in which only the largest or smallest labeled patches (by size) are left.
Both extrema can of course occur several times.
import numpy
from scipy import ndimage
....
# Loaded in my image file here. Too big to paste
....
s = ndimage.generate_binary_structure(2,2) # iterate structure
labeled_array, numpatches = ndimage.label(array,s) # labeling
# get the area (nr. of pixels) of each labeled patch
sizes = ndimage.sum(array,labeled_array,range(1,numpatches+1))
# To get the indices of all the min/max patches. Is this the correct label id?
map = numpy.where(sizes==sizes.max())
mip = numpy.where(sizes==sizes.min())
# This here doesn't work! Now i want to create a copy of the array and fill only those cells
# inside the largest, respecitively the smallest labeled patches with values
feature = numpy.zeros_like(array, dtype=int)
feature[labeled_array == map] = 1
Can someone give me a hint on how to move on?
Here is the full code:
import numpy as np
import matplotlib.pyplot as pl
from scipy import ndimage
array = np.zeros((100, 100), dtype=np.uint8)
x = np.random.randint(0, 100, 2000)
y = np.random.randint(0, 100, 2000)
array[x, y] = 1
pl.imshow(array, cmap="gray", interpolation="nearest")
s = ndimage.generate_binary_structure(2,2) # iterate structure
labeled_array, numpatches = ndimage.label(array,s) # labeling
sizes = ndimage.sum(array,labeled_array,range(1,numpatches+1))
# Get the label ids of all the min/max patches; +1 converts positions in sizes to label ids
max_labels = np.where(sizes==sizes.max())[0] + 1
min_labels = np.where(sizes==sizes.min())[0] + 1
# mark the cells inside the largest, respectively the smallest, labeled patches
max_index = np.zeros(numpatches + 1, np.uint8)
max_index[max_labels] = 1
max_feature = max_index[labeled_array]
min_index = np.zeros(numpatches + 1, np.uint8)
min_index[min_labels] = 1
min_feature = min_index[labeled_array]
Notes:
numpy.where returns a tuple
the size of label 1 is sizes[0], so you need to add 1 to the result of numpy.where
To get a mask array with multiple labels, you can use labeled_array as the index of a label mask array.
First you need a labeled mask, given a mask with only 0 (background) and 1 (foreground):
labeled_mask, cc_num = ndimage.label(mask)
Then find the largest connected component:
largest_cc_mask = (labeled_mask == (np.bincount(labeled_mask.flat)[1:].argmax() + 1))
You can find the smallest object the same way by using argmin() instead.
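Since argmax() and argmin() return only one label, ties are lost. A minimal sketch that keeps every largest (or smallest) component, assuming np.isin is acceptable for building the mask and using a small made-up test mask:

```python
import numpy as np
from scipy import ndimage

# Made-up binary mask: two components of size 3, one of size 2, two of size 1
mask = np.array([[1, 1, 0, 0, 1],
                 [1, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0],
                 [0, 0, 0, 0, 1],
                 [1, 1, 0, 1, 1]], dtype=np.uint8)

labeled_mask, cc_num = ndimage.label(mask)           # default structure: 4-connectivity
sizes = np.bincount(labeled_mask.ravel())[1:]        # sizes[k] is the size of label k + 1

largest_labels = np.flatnonzero(sizes == sizes.max()) + 1
smallest_labels = np.flatnonzero(sizes == sizes.min()) + 1

largest_cc_mask = np.isin(labeled_mask, largest_labels)    # all tied largest patches
smallest_cc_mask = np.isin(labeled_mask, smallest_labels)  # all tied smallest patches
```

Here largest_cc_mask covers both size-3 patches and smallest_cc_mask covers both single-pixel patches.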