Isolate greatest/smallest labeled patches from numpy array - python

I have a large numpy array and labeled it with the connected component labeling in scipy. Now I want to create subsets of this array where only the biggest or smallest labels (by size) are left.
Both extrema can of course occur several times.
import numpy
from scipy import ndimage
....
# Loaded in my image file here. Too big to paste
....
s = ndimage.generate_binary_structure(2,2) # iterate structure
labeled_array, numpatches = ndimage.label(array,s) # labeling
# get the area (nr. of pixels) of each labeled patch
sizes = ndimage.sum(array,labeled_array,range(1,numpatches+1))
# To get the indices of all the min/max patches. Is this the correct label id?
map = numpy.where(sizes==sizes.max())
mip = numpy.where(sizes==sizes.min())
# This here doesn't work! Now I want to create a copy of the array and fill only those cells
# inside the largest, respectively the smallest, labeled patches with values
feature = numpy.zeros_like(array, dtype=int)
feature[labeled_array == map] = 1
Can someone give me a hint on how to move on?

Here is the full code:
import numpy as np
import pylab as pl
from scipy import ndimage
array = np.zeros((100, 100), dtype=np.uint8)
x = np.random.randint(0, 100, 2000)
y = np.random.randint(0, 100, 2000)
array[x, y] = 1
pl.imshow(array, cmap="gray", interpolation="nearest")
s = ndimage.generate_binary_structure(2,2) # iterate structure
labeled_array, numpatches = ndimage.label(array,s) # labeling
sizes = ndimage.sum(array,labeled_array,range(1,numpatches+1))
# To get the indices of all the min/max patches. Is this the correct label id?
map = np.where(sizes == sizes.max())[0] + 1
mip = np.where(sizes == sizes.min())[0] + 1
# Build index arrays that mark the largest, respectively the smallest, labeled patches
max_index = np.zeros(numpatches + 1, np.uint8)
max_index[map] = 1
max_feature = max_index[labeled_array]
min_index = np.zeros(numpatches + 1, np.uint8)
min_index[mip] = 1
min_feature = min_index[labeled_array]
Notes:
- numpy.where returns a tuple.
- The size of label 1 is sizes[0], so you need to add 1 to the result of numpy.where.
- To get a mask array with multiple labels, you can use labeled_array as the index of a label mask array.
The results:

First you need a labeled mask; given a mask with only 0 (background) and 1 (foreground):
labeled_mask, cc_num = ndimage.label(mask)
Then find the largest connected component:
largest_cc_mask = (labeled_mask == (np.bincount(labeled_mask.flat)[1:].argmax() + 1))
You can deduce how to find the smallest object by using argmin() instead.
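For the smallest object, a minimal sketch (the mask here is made up purely for illustration; note that argmin() only returns the first minimum, so with ties you would compare against sizes.min()):
import numpy as np
from scipy import ndimage

# Hypothetical binary mask, just for illustration
mask = (np.random.rand(100, 100) > 0.7).astype(np.uint8)
labeled_mask, cc_num = ndimage.label(mask)

# Pixel count per label; index 0 of bincount is the background, so drop it
sizes = np.bincount(labeled_mask.ravel())[1:]

# argmin() gives only the first smallest label; use == sizes.min() if ties matter
smallest_cc_mask = (labeled_mask == (sizes.argmin() + 1))
all_smallest_mask = np.isin(labeled_mask, np.flatnonzero(sizes == sizes.min()) + 1)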

Related

What is the best way/method to digitize the data of a 3D surface into a grid of pixels with smaller resolution in Python?

I want to digitize (= average out over cells) photon count data into pixels given by a grid that tells how they are aligned. The photon count data is stored in a 2D array. I want to split that data into cells, each of which would correspond to a pixel. The idea is basically the same as changing an HD image to a smaller resolution. I'd like to achieve this in Python.
The digitizing function I've written:
import numpy as np
def digitize(function_data, grid_shape):
    """
    function_data: 2D array of function values of some 3D shape,
        e.g. exp(-(x^2 + y^2)) -> want to digitize this
    grid_shape: an array of length 2 which contains the dimensions of the smaller resolution
    """
    l = len(function_data)
    pixel_len_x = int(l / grid_shape[0])
    pixel_len_y = int(l / grid_shape[1])
    digitized_data = np.empty((grid_shape[0], grid_shape[1]))
    for i in range(grid_shape[0]):      # row-index of pixel in smaller-resolution grid
        for j in range(grid_shape[1]):  # column-index of pixel in smaller-resolution grid
            hd_pixel = []
            for k in range(pixel_len_y):
                hd_pixel.append(function_data[k][j:j * pixel_len_x])
            hd_pixel = np.ravel(hd_pixel)  # turns 2D array into 1D to be able to compute average
            pixel_avg = np.average(hd_pixel)
            digitized_data[i][j] = pixel_avg
    return digitized_data
In theory, this function should do what I want to achieve, but when tested it doesn't yield the expected results. Either a completed version of my function or any other method that achieves my goal would be extremely helpful.
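For what it's worth, if each dimension of function_data happens to be an exact multiple of the corresponding grid_shape entry, the block averaging can be done with a single reshape; this is only a sketch under that divisibility assumption, not a completed version of the function above:
import numpy as np

def digitize_by_reshape(function_data, grid_shape):
    """Average blocks of a 2D array down to grid_shape.

    Assumes both dimensions of function_data are exact multiples
    of the corresponding entries in grid_shape.
    """
    data = np.asarray(function_data)
    rows, cols = grid_shape
    block_rows = data.shape[0] // rows
    block_cols = data.shape[1] // cols
    # Split each axis into (number of blocks, block size) and average
    # over the two block-size axes.
    return data.reshape(rows, block_rows, cols, block_cols).mean(axis=(1, 3))

# Example: a 60x60 Gaussian averaged down to a 20x20 grid
x = np.linspace(-2, 2, 60)
hd = np.exp(-(x[:, None]**2 + x[None, :]**2))
low_res = digitize_by_reshape(hd, (20, 20))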
You could also use an interpolation function if you can use SciPy. Here we use one of the gridded-data interpolating functions, RectBivariateSpline, to upsample your function, but you can find numerous examples on this and other sites.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import RectBivariateSpline as rbs
# Sampling coordinates
x = np.linspace(-2,2,20)
y = np.linspace(-2,2,30)
# Your function
f = np.exp(-(x[:,None]**2 + y**2))
# Interpolator
interp = rbs(x, y, f)
# Higher resolution coordinates
x_hd = np.linspace(x.min(), x.max(), x.size * 5)
y_hd = np.linspace(y.min(), y.max(), y.size * 5)
# New higher res function
f_hd = interp(x_hd, y_hd, grid = True)
# Some plots
fig, ax = plt.subplots(ncols = 2)
ax[0].imshow(f)
ax[1].imshow(f_hd)

How to remove overlapping blocks from numpy array?

I'm using cv2.goodFeaturesToTrack function to find feature points in an image. The end goal is to extract square blocks of certain size, with feature points being the centers of those blocks.
However, lots of the feature points are close to each other, so the blocks are overlapping, which is not what I want.
This is an example of all feature points (centers):
array([[3536., 1419.],
       [2976., 1024.],
       [3504., 1400.],
       [3574., 1505.],
       [3672., 1453.],
       [3671., 1442.],
       [3489., 1429.],
       [3108.,  737.]])
Let's say I want to find the first n blocks with a blockRadius = 400 which are not overlapping. Any ideas on how to achieve this?
You could get close to this with scipy.spatial.KDTree, though it doesn't directly support queries for blocks that consist of varying numbers of points. It can, however, be used in conjunction with another library, python-igraph, which makes it fast to find connected components of close points:
import numpy as np
from scipy.spatial import KDTree
import igraph as ig

data = np.array([[3536., 1419.],
                 [2976., 1024.],
                 [3504., 1400.],
                 [3574., 1505.],
                 [3672., 1453.],
                 [3671., 1442.],
                 [3489., 1429.],
                 [3108.,  737.]])

edges1 = KDTree(data[:, :1]).query_pairs(r=400)
edges2 = KDTree(data[:, 1:]).query_pairs(r=400)
g = ig.Graph(n=len(data), edges=edges1 & edges2)
i = g.clusters()
So the clusters correspond to sequences of indices of block points, stored in an igraph-internal clustering type. Here's a quick preview:
>>> print(i)
Clustering with 8 elements and 2 clusters
[0] 0, 2, 3, 4, 5, 6
[1] 1, 7
>>> pal = ig.drawing.colors.ClusterColoringPalette(len(i)) #number of colors used
color = pal.get_many(i.membership) #list of color tags
ig.plot(g, bbox = (200, 100), layout=g.layout('circle'), vertex_label=g.vs.indices,
vertex_color = color, vertex_size = 12, vertex_label_size = 8)
Example of usage:
>>> [data[n] for n in i] #or list(i)
[array([[3536., 1419.],
        [3504., 1400.],
        [3574., 1505.],
        [3672., 1453.],
        [3671., 1442.],
        [3489., 1429.]]),
 array([[2976., 1024.],
        [3108.,  737.]])]
Remark: this method works with pairs of close points instead of an n*n distance matrix, which is more memory-efficient in some cases.
You'll need something iterative to do that, as recurrent dropouts like this aren't vectorizable. Something like this will work, I think:
import numpy as np
from scipy.spatial.distance import pdist, squareform

c = np.array([[3536., 1419.],
              [2976., 1024.],
              [3504., 1400.],
              [3574., 1505.],
              [3672., 1453.],
              [3671., 1442.],
              [3489., 1429.],
              [3108.,  737.]])

dists = squareform(pdist(c, metric='chebyshev'))  # distance matrix, chebyshev here since you seem to want blocks
indices = np.arange(c.shape[0])  # indices that haven't been dropped (all to start)
out = [0]  # always want the first index
while True:
    try:
        indices = indices[dists[indices[0], indices] > 400]  # drop indices that are inside the threshold
        out.append(indices[0])  # add the next index that hasn't been dropped to the output
    except IndexError:
        break  # once you run out of indices, you'll get an IndexError and you're done
print(out)
[0, 1]
Let's try with a whole bunch of points:
np.random.seed(42)
c = np.random.rand(10000, 2) * 800
dists = squareform(pdist(c, metric='chebyshev'))  # distance matrix, chebyshev here since you seem to want squares
indices = np.arange(c.shape[0])  # indices that haven't been dropped (all to start)
out = [0]  # always want the first index
while True:
    try:
        indices = indices[dists[indices[0], indices] > 400]  # drop indices that are inside the threshold
        out.append(indices[0])  # add the next index that hasn't been dropped to the output
    except IndexError:
        break  # once you run out of indices, you'll get an IndexError and you're done
print(out, pdist(c[out], metric='chebyshev'))
[0, 2, 6, 17] [635.77582886 590.70015659 472.87353138 541.13920029 647.69071411
476.84658995]
So, 4 points (which makes sense, since four 400x400 blocks tile an 800x800 space), mostly low indices (17 << 10000), and the distance between kept points is always > 400.
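If the final goal is to actually cut the square blocks out of the image around the kept points, a rough sketch could look like the following; the image array and the exact block size are placeholders here, not part of the answer above:
import numpy as np

def extract_blocks(image, centers, block_radius):
    """Cut a square patch around each (x, y) center, clipped at the image borders."""
    h, w = image.shape[:2]
    blocks = []
    for x, y in np.asarray(centers, dtype=int):
        x0, x1 = max(x - block_radius, 0), min(x + block_radius, w)
        y0, y1 = max(y - block_radius, 0), min(y + block_radius, h)
        # numpy arrays are indexed row-first, i.e. [y, x]
        blocks.append(image[y0:y1, x0:x1])
    return blocks

# hypothetical usage with the kept indices from above:
# blocks = extract_blocks(img, c[out], block_radius=400)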

Create image histogram using Python

I have a table of 10000 RGB triplets and another table of corresponding colour names. But the number of unique colour names is only 10. Now I want to create a 10-bin histogram for a given image using these two tables. I use NearestNeighbors from scikit-learn for this. Here's part of my code:
import numpy as np
import joblib
from sklearn.neighbors import NearestNeighbors

rgb_matrix = np.asarray(joblib.load('rgb-matrix.pkl'))
rgb_colors = np.asarray(joblib.load('rgb-colors.pkl'))
color_list = []
for i in xrange(len(rgb_colors)):
    color = rgb_colors[i]
    if color not in color_list:
        color_list.append(color)
nbrs = NearestNeighbors(n_neighbors=4, algorithm='ball_tree').fit(rgb_matrix)
rgb_arr = input_image.reshape(-1, 3)
color_arr = nbrs.kneighbors(rgb_arr)[1]  # No of nearest neighbours is set to 4
color_index = np.asarray(color_arr[:, 0])  # Get the top color index
hist = np.zeros(10)
for i in xrange(len(color_index)):
    hist[color_list.index(rgb_colors[color_index[i]])] += 1.0
But this loop makes the process really slow. Is there a way I can use np.histogram here?
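One way to avoid the Python loop, sketched here under the assumption that rgb_colors, color_list and color_index are exactly the variables built above, is to map every row of the colour table to its bin once and then let np.bincount do the counting:
import numpy as np

# Precompute, for every row of the 10000-entry table, the index of its
# colour name in the 10-entry color_list (done once, outside the per-image work).
name_to_bin = {name: i for i, name in enumerate(color_list)}
row_to_bin = np.array([name_to_bin[name] for name in rgb_colors])

# color_index holds, for every pixel, the row of its nearest table entry,
# so the 10-bin histogram is a single bincount over the mapped bins.
hist = np.bincount(row_to_bin[color_index], minlength=len(color_list)).astype(float)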

Python: using X and Y values to draw a picture

I have a series of methods that take an image 89x22 pixels (although the size, theoretically, is irrelevant) and fits a curve to each row of pixels to find the location of the most significant signal. At the end, I have a list of Y-values, one for each row of pixels, and a list of X-values, the location of the most significant peak for each row.
I would like to test different types of curves to see which models the data better, and in order to do so, I would like to be able to print out a new image, also 89x22 pixels, with the location of the most significant peak marked with a single red pixel for each line. I have attached an input example and a (poorly drawn) example of what I expect a good output to look like:
Any suggestions on which modules to start looking in?
class image:
    def importImage(self):
        """Open an image and sort all pixel values into a list of lists"""
        from PIL import Image  # imports Image from PIL library
        im = Image.open("testTop.tif")  # open the file
        size = im.size  # size object is a tuple with the pixel width and pixel height
        width = size[0]  # defines width object as the image width in pixels
        height = size[1]  # defines the height object as the image height in pixels
        allPixels = list(im.getdata())  # makes a list of all pixel values
        pixelList = [allPixels[width * i: width * (i + 1)] for i in range(height)]  # takes mega-list and makes a list of lists by row
        return(pixelList)  # returns list of lists

    def fitCurves(self):
        """
        Iterate through a list of lists and fit a curve to each list of integers.
        Append the position of the list and the location of the vertex to a growing list.
        """
        from scipy.optimize import curve_fit
        import numpy as np
        from matplotlib import pyplot as pp
        from scipy.misc import factorial
        image = self.importImage()
        xList = []
        yList = []
        position = 0
        for row in image:
            # Gaussian fit equations kindly provided by user mcwitt
            x = np.arange(len(row))
            ffunc = lambda x, a, x0, s: a * np.exp(-0.5 * (x - x0)**2 / s**2)  # define function to fit
            p, _ = curve_fit(ffunc, x, row, p0=[100, 5, 2])  # fit with initial guess a=100, x0=5, s=2
            x0 = p[1]
            yList.append(position)
            position = position + 1
            xList.append(x0)
        print(yList)
        print(xList)

newImage = image()
newImage.fitCurves()
Maybe:
import numpy as np
from matplotlib import pyplot as plt
from scipy import ndimage
from scipy import optimize
%matplotlib inline

# just a gaussian (copy paste from lmfit, another great package)
def my_gaussian(p, x):
    amp = p[0]
    cen = p[1]
    wid = p[2]
    return amp * np.exp(-(x - cen)**2 / wid)

# I do like to write a cost function separately. For the least-squares algorithm it should return a vector.
def my_cost(p, data):
    return data - my_gaussian(p, data)

# I load the image and generate the x values
image = ndimage.imread('2d_gaussian.png', flatten=True)
x = np.arange(image.shape[1])
popt = []
# enumerate is a convenient way to loop over an iterable and keep track of the index.
y = []
for index, data in enumerate(image):
    '''this is the trick to make the algorithm robust.
    I plug the index of the maximum value of the current row in as the
    initial guess for the center. Maybe it would be enough to do
    just that and the fit is unnecessary. Haven't checked that.
    '''
    max_index = np.argmax(data)
    # initial guess
    x0 = [1., max_index, 10]
    # call to the solver
    p, _ = optimize.leastsq(my_cost, x0, args=data)
    popt.append(p)
    y.append(index)

'''
I transpose the data.
As a consequence the values are stored row-wise, not column-wise.
It is often easier to store the results inside a loop and
convert the data into a numpy array later.
'''
gaussian_hat = np.array(popt).T
# without the transpose, it would be center = gaussian_hat[:,1]
center = gaussian_hat[1]
y = np.array(y)

'''I do like to use an axis handle for the plot.
Not necessary, but gives me the opportunity to add new axes if necessary.
'''
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.imshow(image)
# since it is just a plot, I can plot the x, y coordinates
ax.plot(center, y, 'k-')

# fit of a 3rd order polynomial
poly = np.polyfit(y, center, 3)
# evaluation at points y
x_hat = np.polyval(poly, y)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.imshow(image)
ax.plot(x_hat, y, 'k-')
plt.savefig('2d_gaussian_fit.png')
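If the goal is really a new 89x22 image file with a single red pixel per row, rather than a matplotlib overlay, a minimal sketch with Pillow might help; it assumes the xList and yList produced by fitCurves in the question and is not part of the answer above:
from PIL import Image

im = Image.open("testTop.tif").convert("RGB")  # same file as in the question
pixels = im.load()

for y, x in zip(yList, xList):
    col = int(round(x))
    if 0 <= col < im.width:
        pixels[col, y] = (255, 0, 0)  # Pillow indexes pixels as (x, y)

im.save("testTop_marked.png")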

Python - Iter through identified component features

I am facing a huge problem. Using the Python libraries NumPy and SciPy, I identified several features in a large array. For this purpose, I created a 3x3 neighbor structure and used it for a connected component analysis --> see docs.
struct = scipy.ndimage.generate_binary_structure(2,2)
labeled_array, num_features = ndimage.label(array,struct)
My problem now is that I want to iterate through all identified features in a loop. Does anyone have an idea how to address individual features in the resulting NumPy array?
Here's an example of handling features identified by ndimage.label. Whether this helps you or not depends on what you want to do with the features.
import numpy as np
import scipy.ndimage as ndi
import matplotlib.pyplot as plt

# Make a small array for the demonstration.
# The ndimage.label() function treats 0 as the "background".
a = np.zeros((16, 16), dtype=int)
a[:6, :8] = 1
a[9:, :5] = 1
a[8:, 13:] = 2
a[5:13, 6:12] = 3

struct = ndi.generate_binary_structure(2, 2)
lbl, n = ndi.label(a, struct)

# Plot the original array.
plt.figure(figsize=(11, 4))
plt.subplot(1, n + 1, 1)
plt.imshow(a, interpolation='nearest')
plt.title("Original")
plt.axis('off')

# Plot the isolated features found by label().
for i in range(1, n + 1):
    # Make an array of zeros the same shape as `a`.
    feature = np.zeros_like(a, dtype=int)
    # Set the elements that are part of feature i to 1.
    # Feature i consists of elements in `lbl` where the value is i.
    # This statement uses numpy's "fancy indexing" to set the corresponding
    # elements of `feature` to 1.
    feature[lbl == i] = 1
    # Make an image plot of the feature.
    plt.subplot(1, n + 1, i + 1)
    plt.imshow(feature, interpolation='nearest', cmap=plt.cm.copper)
    plt.title("Feature {:d}".format(i))
    plt.axis('off')

plt.show()
Here's the image generated by the script:
Just a quick note on an alternative way to solve the above-mentioned problem. Instead of using NumPy "fancy indexing", one could also use the ndimage find_objects function.
example:
# Returns a list of slices for the labeled array. The slices represent the position of features in the labeled area.
s = ndi.find_objects(lbl, max_label=0)
# Then you can simply output the patches
for i in range(n):
    print(a[s[i]])
I will leave the question open because I couldn't solve an additional problem that arose. I want to get the size of the features (already solved, quite easy via ndi.sum()) as well as the number of non-labeled cells in the direct vicinity of each feature (i.e. counting the number of zeros around the feature).
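For that second part, one possible sketch (reusing the lbl, n and struct from the example above, and not a confirmed solution) is to dilate each feature's mask by one cell and count the background cells the dilation adds:
import numpy as np
import scipy.ndimage as ndi

for i in range(1, n + 1):
    feature = (lbl == i)
    # grow the feature by one cell in every direction (8-connected)
    grown = ndi.binary_dilation(feature, structure=struct)
    # cells gained by the dilation that are background (label 0) in the
    # original labeling are exactly the zeros touching the feature
    adjacent_zeros = np.count_nonzero(grown & ~feature & (lbl == 0))
    print("feature", i, "size:", feature.sum(), "adjacent zeros:", adjacent_zeros)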
