I have multichannel microscopy images and would like to use the skimage regionprops_table function with extra_properties that calculate relationships between different channels.
E.g. I have a 2 channel image and for every segmented element I want to measure the correlation, euclidean distance and others.
So rather than calculating the same property for every region in all channels I want to calculate a relationship between the channels at every region. Therefore I expect a single column as a result also.
Example:
import numpy as np
from skimage import measure, segmentation
from skimage import data
from skimage.measure import regionprops_table
from sklearn.metrics.pairwise import euclidean_distances

coffee = data.coffee()
labels = segmentation.slic(coffee, start_label=1)

def euclidean_distance(regionmask, intensity_image):
    dist = np.linalg.norm(intensity_image[regionmask])
    return dist

props = regionprops_table(labels, intensity_image=coffee, extra_properties=(euclidean_distance,))
The code runs but it doesn't compute the difference between channels. Instead it calculates a euclidean distance within every region for the two channels separately.
Instead I want to have something like this:
def euclidean_distance2(regionmask, intensity_image):
    dist = np.linalg.norm(intensity_image[regionmask][..., 0].flatten()
                          - intensity_image[regionmask][..., 1].flatten())
    return dist
This should calculate the distance between the two channels in every region. But this doesn't work.
props = regionprops_table(labels,intensity_image=coffee, extra_properties=(euclidean_distance2,))
>IndexError: index 1 is out of bounds for axis 0 with size 1
I hope I am making this clear. Let me know if it is not.
I think I understand what you are after, but I'm not clear on why you don't just want to run regionprops_table on the separate channels and then calculate the distance between objects. It seems like you're not saving on speed or memory usage by using the extra_properties parameter, but maybe I'm missing something.
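One possible workaround (a sketch that avoids regionprops_table entirely, relying only on the standard RegionProperties attributes slice, image and label) is to loop over the regions yourself and index the full multichannel image with each region's bounding-box slice and mask. The column name channel_distance below is just illustrative:

import numpy as np
from skimage import data, measure, segmentation

coffee = data.coffee()
labels = segmentation.slic(coffee, start_label=1)

results = {"label": [], "channel_distance": []}
for region in measure.regionprops(labels):
    # Crop the full multichannel image to the region's bounding box,
    # then keep only the pixels inside the region mask.
    crop = coffee[region.slice]            # shape (h, w, n_channels)
    pixels = crop[region.image]            # shape (n_pixels, n_channels)
    dist = np.linalg.norm(pixels[:, 0].astype(float) - pixels[:, 1].astype(float))
    results["label"].append(region.label)
    results["channel_distance"].append(dist)

Because the whole intensity image is sliced manually, the callback never sees one channel at a time, so any cross-channel metric (correlation, distance, etc.) can be computed per region.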
After reading this post, and also playing with scikit-image, I found a difference in Python compared to MATLAB's imregionalmax function.
I have these lines of code:
import numpy as np
from scipy import ndimage
from skimage.feature import peak_local_max

manos = np.ones([5, 5])
manos[2, 2] = 0.
manos[2, 4] = 2.
giannis = peak_local_max(manos, min_distance=1, indices=False, exclude_border=False)
giorgos = ndimage.filters.maximum_filter(manos, footprint=np.ones([3, 3]))
giorgos = (giorgos == manos)
I would expect a 2D array with only one True value ([2,4]) for the variables giannis or giorgos, as I get in MATLAB. Instead I get more than one maximum.
Any idea why this works this way and how to make it work like in MATLAB?
Both giannis and giorgos are similar in that they find pixels that are equal or larger than the other pixels in the 3x3 neighborhood. I believe giannis would have some additional thresholding.
Neither of these methods guarantee that the pixels found are actually local maxima. Note where I said "larger or equal" above. Any plateau in your image (a region where all pixels have the same value) that is large enough will be marked by the algorithm, no matter if they are local maxima, local minima or somewhere in between.
For example:
import numpy as np
import matplotlib.pyplot as pp
import scipy.ndimage as ndimage
manos = np.sin(np.arange(100)/10)
manos = np.round(30*manos)/30 # Rounding to create plateaus
giorgos = ndimage.filters.maximum_filter(manos, footprint=np.ones([3]))
giorgos = (giorgos == manos)
pp.plot(manos);
pp.plot(giorgos);
pp.show()
Note how the filter identified three points near the local minimum of the sinusoid. The middle one of these is the actual local minimum, the other two are plateaus that are neither local maxima nor minima.
In contrast, the MATLAB function imregionalmax identifies all plateaus that are surrounded by pixels with a lower value. The algorithm required to do this is very different from the one above. It can be efficiently accomplished using a Union-Find algorithm, or less efficiently using a flood-fill-type algorithm. The main idea is to find a pixel that is not lower than any neighbor, then expand from it to its equal-valued neighbors until the whole plateau has been explored or until you find one of the pixels in the plateau with a higher-valued neighbor.
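For illustration only, a rough 1D sketch of that flood-fill idea might look like this (not an efficient implementation, just a direct translation of the description above):

import numpy as np

def regional_maxima_1d(signal):
    # Mark every plateau that has no higher-valued neighbour.
    n = len(signal)
    is_max = np.zeros(n, dtype=bool)
    visited = np.zeros(n, dtype=bool)
    for start in range(n):
        if visited[start]:
            continue
        # Expand from `start` to all equal-valued neighbours (the plateau).
        plateau = [start]
        queue = [start]
        visited[start] = True
        has_higher_neighbour = False
        while queue:
            i = queue.pop()
            for j in (i - 1, i + 1):
                if j < 0 or j >= n:
                    continue
                if signal[j] == signal[i] and not visited[j]:
                    visited[j] = True
                    plateau.append(j)
                    queue.append(j)
                elif signal[j] > signal[i]:
                    has_higher_neighbour = True
        # Only plateaus surrounded by lower values are regional maxima.
        if not has_higher_neighbour:
            is_max[plateau] = True
    return is_max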
One implementation available from Python is in DIPlib (note: I'm an author):
import diplib as dip
nikos = dip.Maxima(manos)
pp.plot(manos);
pp.plot(nikos);
pp.show()
Another implementation is in scikit-image (thanks to Juan for pointing this out):
import skimage.morphology
nikos = skimage.morphology.local_maxima(manos)
I am working with a 3-D numpy array in Python and want to do post-processing on the CNN output of brain tumor segmentation images. We get a 3-D (208x208x155) numpy array with values 0/1/2/4 for each voxel. I want to remove connected components below a threshold of 1000 for better results.
I tried erosion-dilation but don't get good results. Can anyone help me?
OK, so shrink and grow will, as you realised yourself, not be the way to approach this problem. What you need to do is region labelling, and SciPy has a method that lets you do that for n-dimensional images.
I assume that by threshold less than 1000 you mean sum of the pixel values in the connected components.
Here is an outline of how I would do it.
import numpy as np
from scipy.ndimage import label

segmentation_mask = [...]  # This should be your 3D mask.

# Let us create a binary mask.
# It is 0 everywhere `segmentation_mask` is 0 and 1 everywhere else.
binary_mask = segmentation_mask.copy()
binary_mask[binary_mask != 0] = 1

# Now, we perform region labelling. This way, every connected component
# will have its own label value.
labelled_mask, num_labels = label(binary_mask)

# Let us now remove all the too-small regions.
# Component labels run from 1 to num_labels; 0 is the background.
refined_mask = segmentation_mask.copy()
minimum_cc_sum = 1000
for component in range(1, num_labels + 1):
    if np.sum(refined_mask[labelled_mask == component]) < minimum_cc_sum:
        refined_mask[labelled_mask == component] = 0
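If the Python loop over labels turns out to be slow for many components, the per-label sums can also be computed in a single call. This is a sketch continuing from the variables above, assuming a recent SciPy where ndimage.sum_labels is available (older versions call the same function ndimage.sum):

import numpy as np
from scipy import ndimage

# Sum of `refined_mask` values inside each labelled component (labels 1..num_labels),
# then zero out every component whose sum is below the threshold.
label_ids = np.arange(1, num_labels + 1)
sums = ndimage.sum_labels(refined_mask, labels=labelled_mask, index=label_ids)
too_small = label_ids[sums < minimum_cc_sum]
refined_mask[np.isin(labelled_mask, too_small)] = 0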
I am trying to perform image quantization (reducing the number of colors of an image) using one of the k-means algorithms of numpy/scipy for a school project. The algorithm works fine, but I also want to calculate the sum of the error for each iteration of the algorithm, i.e. the sum of distances of samples to their closest cluster center (this is one of the project tasks).
I couldn't find any kmeans method of numpy or another fast, elegant way to perform this.
Is there such a way or method, and if not, what is the best way to perform this task? My goal is to minimize any re-implementation of the existing kmeans algorithm.
Below is my code so far:
import scipy.cluster.vq as vq

def quantize_rgb(im_orig, n_quant, n_iter):
    """
    A function that performs optimal quantization of a given RGB image.
    :param im_orig: the input RGB image to be quantized (float32 image with values in [0, 1])
    :param n_quant: the number of intensities the output image should have
    :param n_iter: the maximum number of iterations of the optimization procedure (may converge earlier.)
    """
    reshaped_im = im_orig.reshape(im_orig.shape[0] * im_orig.shape[1], 3)
    centroids, label = vq.kmeans2(reshaped_im, n_quant, n_iter)
    reshaped_im = centroids[label]
    im_quant = reshaped_im.reshape(im_orig.shape[0], im_orig.shape[1], 3)
    return im_quant
Simply use
vq.kmeans2(reshaped_im, previous_centers, iter=1, minit="matrix")
to only do one iteration at a time.
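Putting that together with the function from the question, one possible sketch of the full loop looks like this; the per-iteration error is computed with vq.vq, which returns each sample's distance to its closest centroid (the function and variable names below are just illustrative):

import scipy.cluster.vq as vq

def quantize_rgb_with_error(im_orig, n_quant, n_iter):
    reshaped_im = im_orig.reshape(-1, 3)
    # One initial iteration to obtain a first set of centroids.
    centroids, labels = vq.kmeans2(reshaped_im, n_quant, iter=1, minit="points")
    errors = []
    for _ in range(n_iter):
        # Feed the previous centroids back in and run a single iteration.
        centroids, labels = vq.kmeans2(reshaped_im, centroids, iter=1, minit="matrix")
        # Distance of every sample to its closest centroid.
        _, dists = vq.vq(reshaped_im, centroids)
        errors.append(dists.sum())
    im_quant = centroids[labels].reshape(im_orig.shape)
    return im_quant, errors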
I'm implementing a CNN with Theano. According to the paper, I have to do this image preprocessing before training the CNN:
We extracted RGB patches of 61x61 dimensions associated with each poselet activation, subtracted the mean and used this data to train the convnet model shown in Table 1
Can you tell me what "subtracted the mean" means here? Tell me if these steps are correct (this is what I understood):
1) Compute the mean for the red channel, green channel and blue channel for the whole image
2) For each pixel, subtract the mean of the red channel from the red value, the mean of the green channel from the green value, and the same for the blue channel
3) Is it correct to have negative values or do I have to use the absolute value?
Thanks all!!
You should read the paper carefully, but most probably they mean the mean of the patches: you have N patches of 61x61 pixels, each equivalent to a vector of length 61^2 (or 3*61^2 with three channels). What they do is simply compute the mean of each dimension, i.e. the mean over these N vectors with respect to each of the 3*61^2 dimensions. As a result they obtain a mean vector of length 3*61^2 (or a mean matrix / mean patch, if you prefer) and subtract it from all of these N patches. The resulting patches will have negative values; this is perfectly fine, you should not take the absolute value, as neural networks prefer this kind of data.
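As a small numpy illustration of the mean-patch convention described above (and the per-channel variant from the question), using made-up data with the shape from the paper:

import numpy as np

# N hypothetical RGB patches of 61x61 pixels (random stand-in data).
patches = np.random.rand(1000, 61, 61, 3).astype(np.float32)

# Convention 1: a full "mean patch" (one mean per pixel position and channel).
mean_patch = patches.mean(axis=0)              # shape (61, 61, 3)
centered = patches - mean_patch                # negative values are expected

# Convention 2: one scalar mean per color channel over the whole set.
channel_means = patches.mean(axis=(0, 1, 2))   # shape (3,)
centered_per_channel = patches - channel_means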
I would assume the mean mentioned in the paper is the mean over all images used in the training set (computed separately for each channel).
Several indications:
Caffe is a lib for ConvNets. In their tutorial they mention the compute image mean part: http://caffe.berkeleyvision.org/gathered/examples/imagenet.html
For this they use the following script: https://github.com/BVLC/caffe/blob/master/examples/imagenet/make_imagenet_mean.sh
which does what I indicated.
Google played around with ConvNets and published their code here: https://github.com/google/deepdream/blob/master/dream.ipynb and they do also use the mean of the training set.
This is of course only indirect evidence, since I cannot explain why this is done. In fact I stumbled over this question while trying to figure out precisely that.
//EDIT:
In the meantime I found a source confirming my claim (highlighting added by me):
There are three common forms of data preprocessing a data matrix X [...]
Mean subtraction is the most common form of preprocessing. It involves subtracting the mean across every individual feature in the data, and has the geometric interpretation of centering the cloud of data around the origin along every dimension. In numpy, this operation would be implemented as: X -= np.mean(X, axis = 0). With images specifically, for convenience it can be common to subtract a single value from all pixels (e.g. X -= np.mean(X)), or to do so separately across the three color channels.
As we can see, the whole data is used to compute the mean.
I need to convert a codebase relying on the scipy.cluster.vq module to not use scipy so that I can implement it in C++.
First I am trying to replicate the results using only numpy.
Starting with an image of dimensions MxNx3 , I create a "centroids" Kx3 array using kmeans with opencv.
I need to map each pixel of the original image to the pixel value in the centroids array that is closest to the original pixel.
I have it working, but performance is awful. I'm sure there must be more advanced ways to compute this, and I suspect it's related to a nearest neighbour search (maybe?) but don't know for sure.
Here is what I'm currently doing (I think this may be called a "brute force" approach):
1) iterate over every pixel in the image
2) calculate the euclidean distance between this pixel and each pixel in the centroid list
3) return the minimum value from the list generated in step 2
4) assign the original image pixel to the value of the centroids list that returned the minimum distance.
def vq(self, image, centroids):
    x, y, z = image.shape
    Z = np.reshape(image, (x*y, z))
    counts = np.zeros(len(centroids))
    clusterMap = np.zeros(Z.shape, np.uint8)
    for i in range(Z.shape[0]):
        color = Z[i]
        closestIndex = self.getClosestCenter(color, centroids)
        counts[closestIndex] += 1  # tracking how often each color occurs
        clusterMap[i] = centroids[closestIndex]
    return clusterMap, counts

def getClosestCenter(self, color, centers):
    distances = [0 for i in range(len(centers))]
    for i, center in enumerate(centers):
        distances[i] = self.getDistance(color, center)
    return distances.index(min(distances))

def getDistance(self, value1, value2):
    if len(value1) != len(value2):
        return None  # error
    sum = 0
    for i in range(len(value1)):
        sum += (value1[i] - value2[i])**2
    return sum**(0.5)
First of all, profile your code to see where exactly it is slow.
Constructs such as enumerate can be very expensive because they require the creation and garbage collection of many tuple objects. A good rule of thumb is to avoid object allocations in inner loops and functions (this includes hidden objects such as tuples)
Last but not least, kmeans does not use Euclidean distance. It uses sum-of-squares. Get rid of the square root.
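To make that concrete, here is a sketch of the same assignment done with numpy broadcasting instead of the nested Python loops. Only squared distances are computed, since argmin is unchanged by the square root; the temporary (num_pixels, K, 3) array is the main memory cost:

import numpy as np

def vq_vectorized(image, centroids):
    # image: (M, N, 3) array; centroids: (K, 3) array.
    x, y, z = image.shape
    Z = image.reshape(-1, z).astype(np.float32)
    # Squared distance of every pixel to every centroid, shape (x*y, K).
    dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    closest = dists.argmin(axis=1)
    # How often each centroid was chosen, and the quantized image.
    counts = np.bincount(closest, minlength=len(centroids))
    cluster_map = centroids[closest].reshape(x, y, z).astype(np.uint8)
    return cluster_map, counts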