Remove connected components below a threshold in a 3-d array - python

I am working with a 3-D numpy array in Python and want to do post-processing on the CNN output of brain tumor segmentation images. We get a 3-D (208x208x155) numpy array with a value of 0/1/2/4 for each voxel. I want to remove the connected components that fall below a threshold of 1000 for better results.
I tried erosion/dilation but don't get good results. Can anyone help me?

Ok, so shrink and grow will, as you realised yourself, not be the way to approach this problem. What you need to do is region labelling, and it seems that Scipy has a method that will let you do that for nd images.
I assume that by threshold less than 1000 you mean sum of the pixel values in the connected components.
Here is an outline of how I would do it.
import numpy as np
from scipy.ndimage import label

segmentation_mask = [...] # This should be your 3-D mask.

# Let us create a binary mask.
# It is 0 everywhere `segmentation_mask` is 0 and 1 everywhere else.
binary_mask = segmentation_mask.copy()
binary_mask[binary_mask != 0] = 1

# Now, we perform region labelling. This way, every connected component
# will get its own label value (label 0 is the background).
labelled_mask, num_labels = label(binary_mask)

# Let us now remove all the too small regions.
refined_mask = segmentation_mask.copy()
minimum_cc_sum = 1000
for lab in range(1, num_labels + 1):
    if np.sum(refined_mask[labelled_mask == lab]) < minimum_cc_sum:
        refined_mask[labelled_mask == lab] = 0
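If there are many components and the Python loop becomes a bottleneck, the same idea can be vectorised; here is a sketch using scipy.ndimage.sum, which is not part of the original answer, so treat it as untested:

import numpy as np
from scipy import ndimage

# Sum of the segmentation values inside each labelled component (labels 1..num_labels).
sums = ndimage.sum(segmentation_mask, labelled_mask, index=np.arange(1, num_labels + 1))
small_labels = np.arange(1, num_labels + 1)[np.asarray(sums) < minimum_cc_sum]
refined_mask = segmentation_mask.copy()
refined_mask[np.isin(labelled_mask, small_labels)] = 0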

Related

skimage regionprops_table extra_properties multichannel relationship between channels example

I have multichannel microscopy images and would like to use the skimage regionprops_table function with extra_properties that calculate relationships between different channels.
E.g. I have a 2-channel image, and for every segmented element I want to measure the correlation, Euclidean distance, and other relationships between the channels.
So rather than calculating the same property for every region in each channel separately, I want to calculate a relationship between the channels at every region. Therefore I also expect a single column as a result.
Example:
import numpy as np
from skimage import measure, segmentation
from skimage import data
from skimage.measure import regionprops_table
from sklearn.metrics.pairwise import euclidean_distances

coffee = data.coffee()
labels = segmentation.slic(coffee, start_label=1)

def euclidean_distance(regionmask, intensity_image):
    dist = np.linalg.norm(intensity_image[regionmask])
    return dist

props = regionprops_table(labels, intensity_image=coffee, extra_properties=(euclidean_distance,))
The code runs but it doesn't compute the difference between channels. Instead it calculates a euclidean distance within every region for the two channels separately.
Instead I want to have something like this:
def euclidean_distance2(regionmask, intensity_image):
    dist = np.linalg.norm(intensity_image[regionmask][..., 0].flatten()
                          - intensity_image[regionmask][..., 1].flatten())
    return dist
This should calculate the distance between the two channels in every region. But this doesn't work.
props = regionprops_table(labels,intensity_image=coffee, extra_properties=(euclidean_distance2,))
>IndexError: index 1 is out of bounds for axis 0 with size 1
I hope I am making this clear. Let me know if it is not.
I think I understand what you are after, but I'm not clear on why you don't just want to run regionprops_table on the separate channels and then calculate the distance between objects. It seems like you're not saving on speed or memory usage by using the extra_properties parameter, but maybe I'm missing something.
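For what it's worth, here is a rough sketch of that per-channel idea; the helper name channel_norm and the way the two columns are combined afterwards are my own assumptions, not something prescribed by skimage:

import numpy as np
from skimage import data, measure, segmentation

coffee = data.coffee()
labels = segmentation.slic(coffee, start_label=1)

def channel_norm(regionmask, intensity_image):
    # Norm of the region's pixel values in a single channel.
    return np.linalg.norm(intensity_image[regionmask])

# Run regionprops_table once per channel, then combine the columns afterwards.
per_channel = [
    measure.regionprops_table(labels, intensity_image=coffee[..., c],
                              extra_properties=(channel_norm,))
    for c in range(2)
]
# e.g. the difference between the per-region norms of channel 0 and channel 1:
diff = np.abs(per_channel[0]['channel_norm'] - per_channel[1]['channel_norm'])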

How to fill in missing center of a 2d gaussian

I have a 2d gaussian whose center has been destroyed by pixel saturation. I need the center to be filled in because a poorly filled in center will confuse a neural network I'm trying to train. See below:
The scattered nan values I can handle fairly easily, but the large cluster in the gaussian's center I cannot.
I've tried various methods to correct this, but none of them fill in the Gaussian correctly.
Here are some other similar answers that I've tried:
Python Image Processing - How to remove certain contour and blend the value with surrounding pixels?
https://docs.astropy.org/en/stable/convolution/index.html
These work well for the small discrete nans floating around the image, but don't adequately address the center cluster.
This is what I get with convolution infilling:
I've taken slices of the centers as well.
I do actually have a reference image that does not have NaNs. However, the scaling of the pixel values is not constant, so I've made a function that takes into account the different scaling of each pixel.
import numpy as np

def mult_mean_surround(s_arr, c_arr, coord):
    # Average the ratio of the saturated image to the reference image over the
    # valid 8-connected neighbours, then apply that ratio to the reference pixel.
    directions = np.array([[1,0],[-1,0],[0,1],[0,-1],[1,1],[1,-1],[-1,-1],[-1,1]])
    s = np.array([])
    for i in directions:
        try:
            if not np.isnan(s_arr[coord[0]+i[0], coord[1]+i[1]]):
                s = np.append(s, s_arr[coord[0]+i[0], coord[1]+i[1]]
                                 / c_arr[coord[0]+i[0], coord[1]+i[1]])
        except IndexError:
            pass
    if len(s) != 0:
        s_arr[coord[0], coord[1]] = c_arr[coord[0], coord[1]] * np.mean(s)
It copies the corresponding pixel value of the reference image and scales it by the appropriate amount.
Ideally, it would look something like this:
The center is brighter than the rim and it looks more like a Gaussian. However, this method is also substantially slower than the rest, so I'm not sure how to get around either of my issues. I've tried boosting the speed with cupy with no luck, as shown here: Boosting algorithm with cupy
If anyone has any helpful ideas, that would be great.
I am assuming that you are filling the 'hole' with only one gaussian.
First make a mask of all the NaNs, i.e. NaN = 1, not NaN = 0.
You can do a neighbor-count check to remove all mask pixels with no neighbors, then use a clustering algorithm (like DBSCAN) to find the largest cluster of pixels.
Calculate the centroid, width (max x - min x), and height (max y - min y) of the resulting cluster.
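Those steps might look roughly like the sketch below, assuming scikit-learn's DBSCAN is available; the eps and min_samples values are guesses (min_samples=2 also takes care of discarding the isolated NaNs), and image stands for the 2-D array containing the NaNs:

import numpy as np
from sklearn.cluster import DBSCAN

# Coordinates of all NaN pixels.
nan_coords = np.column_stack(np.nonzero(np.isnan(image)))
labels = DBSCAN(eps=1.5, min_samples=2).fit(nan_coords).labels_
# Label -1 marks isolated NaNs (noise); keep the largest proper cluster.
largest = np.argmax(np.bincount(labels[labels >= 0]))
cluster = nan_coords[labels == largest]
centroid_y, centroid_x = cluster.mean(axis=0)
filter_h = cluster[:, 0].max() - cluster[:, 0].min()
filter_w = cluster[:, 1].max() - cluster[:, 1].min()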
You can then use the following code:
import math

def gaussian_fit(query_x, query_y,
                 centroid_x, centroid_y,
                 filter_w, filter_h,
                 sigma_at_edge = 1.0):
    x_coord = (query_x - centroid_x) * 2 / (filter_w * sigma_at_edge)
    y_coord = (query_y - centroid_y) * 2 / (filter_h * sigma_at_edge)
    return math.exp(-1.0*(x_coord**2+y_coord**2))
You may need to rescale the result by some constant.
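Continuing the sketch above, applying gaussian_fit to the cluster could look like this; the rescaling by np.nanmax(image) is only a guess at that constant:

# Fill every pixel of the central cluster with a scaled Gaussian value.
peak = np.nanmax(image)  # guessed rescaling constant
for y, x in cluster:
    image[y, x] = peak * gaussian_fit(x, y, centroid_x, centroid_y,
                                      filter_w, filter_h)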

How to normalize colors acquiring a single color?

I have to build an algorithm that takes an RGB image and returns the image turned into a wood-like mosaic. For this, I was given some wood tablet samples, as seen in the image below:
I'd like to know how I can normalize the colors of each tablet, resulting in a single color, so I can build a map of reference colors to convert the input image colors to.
I've searched for how to achieve that, but all I found was a Wikipedia article, and I couldn't understand much of it.
Thanks in advance for all help you might provide me.
PS: I'm considering using Python to develop this. So if you come up with something done using this language, I'd really appreciate it.
The way to get the average color is to simply take the average of the RGB values.
To get a more accurate average you should do this with linear color values. Usually RGB uses a gamma corrected value, but you can easily undo it then redo it once you have the average. Here's how you'd do it with Python's PIL using a gamma of 2.2:
def average_color(sample):
    pix = sample.load()
    totals = [0.0, 0.0, 0.0]
    for y in range(sample.size[1]):
        for x in range(sample.size[0]):
            color = pix[x, y]
            for c in range(3):
                totals[c] += color[c] ** 2.2
    count = sample.size[0] * sample.size[1]
    color = tuple(int(round((totals[c] / count) ** (1/2.2))) for c in range(3))
    return color
For the sample in the upper left of your examples, the result is (144, 82, 66). Here's a visual of all of them:
To make one color represent a tile, a simple option would be to find the mean color of a random sample of pixels in a specific tile. You can choose an appropriate sample size as a trade-off between speed and accuracy.
For your specific use case, I'd recommend dividing each tile further, say into 3 columns (because of the top-to-bottom design of most wood panels). Find the mean color of each column and eliminate any that falls beyond a certain measure of variance. This is to try to ensure that tiles such as the rightmost one in the 4th row don't get mapped to the darker shade.
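A rough sketch of that column idea; the number of columns and the distance cut-off max_dist are arbitrary choices you would need to tune:

import numpy as np

def tile_color(tile, n_cols=3, max_dist=30.0):
    # `tile` is an H x W x 3 array holding one wood tile.
    cols = np.array_split(tile, n_cols, axis=1)
    means = np.array([c.reshape(-1, 3).mean(axis=0) for c in cols])
    overall = means.mean(axis=0)
    # Drop columns whose mean colour deviates too much from the overall mean.
    keep = np.linalg.norm(means - overall, axis=1) < max_dist
    return means[keep].mean(axis=0) if keep.any() else overall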
An alternate approach would be to convert both your input image and these wood tiles to grayscale and carry out your processing there. The OpenCV library has various simple functions for RGB2GRAY conversions.
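For example (assuming img is an RGB-ordered array):

import cv2
gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)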
One trivial way to normalize the colors is to simply force the mean and standard deviation of RGB values in all images to be the same.
Here is an example with the two panels at the top of the left column in the example image. I'm using MATLAB with DIPimage 3.0, because that is what I know, but this is trivial enough to implement in Python with NumPy, or any other desired language/library:
img = readim('https://i.stack.imgur.com/HK6VY.png')
tab1 = dipcrop; % Interactive cropping of a tile from the displayed image
tab2 = dipcrop;
m1 = mean(tab1);
s1 = std(tab1);
m2 = mean(tab2);
s2 = std(tab2);
tab2b = (tab2 - m2) ./ s2 .* s1 + m1;
What the code does to the image tab2 is, on a per-channel basis, to subtract the mean and divide by the standard deviation. Next, it multiplies each channel by the standard deviation of the corresponding channel of the template image, and adds the mean of that channel.
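A NumPy sketch of the same per-channel operation; the interactive cropping is left out, and tab1 and tab2 are assumed to be H x W x 3 float arrays:

import numpy as np

# Per-channel mean and standard deviation of each tile.
m1, s1 = tab1.mean(axis=(0, 1)), tab1.std(axis=(0, 1))
m2, s2 = tab2.mean(axis=(0, 1)), tab2.std(axis=(0, 1))
# Match the second tile's statistics to the first tile's.
tab2b = (tab2 - m2) / s2 * s1 + m1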

Laplacian of Gaussian Edge Detector Being Affected by Change of Mask Size

For a class, I've written a Laplacian of Gaussian edge detector that works in the following way.
Make a Laplacian of Gaussian mask given the variance of the Gaussian and the size of the mask
Convolve it with the image
Find the zero crossings (in a really shoddy manner); these are the edges of the image
If you so desire, the code for this program can be viewed here, but the most important part is where I create my Gaussian mask which depends on two functions that I've reproduced here for your convenience:
import math
import numpy as np

# range_inc is a helper from the linked program; from its use here and the
# discussion below it is presumably an inclusive range, i.e. range(a, b + 1).

# Function for calculating the laplacian of the gaussian at a given point and with a given variance
def l_o_g(x, y, sigma):
    # Formatted this way for readability
    nom = ( (y**2)+(x**2)-2*(sigma**2) )
    denom = ( (2*math.pi*(sigma**6) ))
    expo = math.exp( -((x**2)+(y**2))/(2*(sigma**2)) )
    return nom*expo/denom

# Create the laplacian of the gaussian, given a sigma
# Note the recommended size is 7 according to this website http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm
# Experimentally, I've found 6 to be much more reliable for images with clear edges and 4 to be better for images with a lot of little edges
def create_log(sigma, size = 7):
    w = math.ceil(float(size)*float(sigma))
    # If the dimension is an even number, make it uneven
    if(w%2 == 0):
        print "even number detected, incrementing"
        w = w + 1
    # Now make the mask
    l_o_g_mask = []
    w_range = int(math.floor(w/2))
    print "Going from " + str(-w_range) + " to " + str(w_range)
    for i in range_inc(-w_range, w_range):
        for j in range_inc(-w_range, w_range):
            l_o_g_mask.append(l_o_g(i,j,sigma))
    l_o_g_mask = np.array(l_o_g_mask)
    l_o_g_mask = l_o_g_mask.reshape(w,w)
    return l_o_g_mask
All in all, it works relatively well, even if it is extremely slow because I don't know how to leverage Numpy. However, whenever I change the size of the Gaussian mask, the thickness of the edges I detect changes drastically.
Here is the image run with a size of mask equivalent to 4 times the given variance of the Gaussian:
Here is the same image run with a size of mask equivalent to 6 times the variance:
I'm kind of baffled, because the only thing the size parameter should change is the accuracy of the approximation of the Laplacian of Gaussian mask before I begin to convolve it with the image. So I ran a test where I wanted to visualize how my mask looked given different size parameters.
Here it is with a size of 4:
Here it is with a size of 6:
The shape of the function seems to be the same as far as I can tell from the zero crossings (they happen to be spaced around four pixels apart) and their peaks. Is there a better way to check?
Any suggestions as to why this issue might be occurring or how to investigate further are appreciated.
It turns out your idea about the effect of increasing the mask size is wrong. Increasing the size doesn't actually improve the quality of the approximation or the resolution of the function. To explain, instead of using a complicated 2D function like the Laplacian of the Gaussian, let's take things back down to one dimension and pretend we are approximating the function f(x) = x^2.
Now your code for calculating the function would look something like this:
def derp(sigma, size):
    w = math.ceil(float(size)*float(sigma))
    # If the dimension is an even number, make it uneven
    if(w%2 == 0):
        print "even number detected, incrementing"
        w = w + 1
    # Now make the mask
    x_mask = []
    w_range = int(math.floor(w/2))
    print "Going from " + str(-w_range) + " to " + str(w_range)
    for i in range_inc(-w_range, w_range):
        x_mask.append(i**2)
    return x_mask
If you were to increase the "size" of this function, you wouldn't be increasing the resolution; you'd actually be increasing the range of x values that you're sampling. For example, for a size of 3 you're evaluating -1, 0, 1; for a size of 5 you're evaluating -2, -1, 0, 1, 2. Notice this doesn't change the spacing between the samples. This is what you're actually seeing when you talk about the zero crossings occurring the same number of pixels apart.
Consequently, when convolving with this really silly mask, you would get really different results. But what if we went back to the Laplacian of the Gaussian?
Well, the nice property the Laplacian of the Gaussian has is that the farther you get from its center, the closer its values are to zero. So unlike our silly x^2 function, you should get essentially the same results once the mask is large enough.
Now, I think the reason you didn't see this in your test cases is that they were too limited in size; your program is too slow for you to really see the difference between size=15 and size=20, but if you were to actually run those cases I think you would see that the image doesn't change that much.
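One quick way to check this numerically is to build the mask at two sizes with the l_o_g function from the question and compare the overlapping parts. This is only a sketch; the helper below just repeats the construction in create_log without the printing:

import math
import numpy as np

def log_mask(sigma, size):
    w = int(math.ceil(float(size) * float(sigma)))
    if w % 2 == 0:
        w += 1
    r = w // 2
    return np.array([[l_o_g(i, j, sigma) for j in range(-r, r + 1)]
                     for i in range(-r, r + 1)])

small = log_mask(1.0, 4)   # 5x5
large = log_mask(1.0, 6)   # 7x7
pad = (large.shape[0] - small.shape[0]) // 2
# The central 5x5 of the larger mask is identical to the smaller mask...
print(np.abs(large[pad:-pad, pad:-pad] - small).max())       # 0.0
# ...and the extra border values are small compared to the central peak.
print(np.abs(large[0]).max() / np.abs(large).max())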
This still doesn't answer what you should be doing; for that, we're going to have to look to the professionals, namely the implementation of gaussian_filter in Scipy (source here).
When you look at their source code, the first thing you'll notice is that when creating their mask they're basically doing the same thing as you: they are always using an integer step size and they are scaling the size of the mask by its standard deviation.
As to why they do it that way, I can't answer, since I don't have that much in-depth knowledge of image processing or Scipy. However, this may make for a good new question to ask on SO.

Finding matching submatrices inside a matrix

I have a 100x200 2D array, expressed as a numpy array, consisting of black (0) and white (255) cells; it comes from a bitmap file. I then have 2D shapes (it's easiest to think of them as letters) that are also 2D arrays of black and white cells. I want to find the locations where these shapes appear exactly inside the larger array.
I know I can naively iterate through the matrix, but this is going to be a 'hot' portion of my code, so speed is a concern. Is there a fast way to perform this in numpy/scipy?
I looked briefly at Scipy's correlate function. I am not interested in 'fuzzy matches', only exact matches. I also looked at some academic papers but they are above my head.
You can use correlate. You'll need to set your black values to -1 and your white values to 1 (or vice-versa) so that you know the value of the peak of the correlation, and that it only occurs with the correct letter.
The following code does what I think you want.
import numpy
from scipy import signal
# Set up the inputs
a = numpy.random.randn(100, 200)
a[a<0] = 0
a[a>0] = 255
b = numpy.random.randn(20, 20)
b[b<0] = 0
b[b>0] = 255
# put b somewhere in a
a[37:37+b.shape[0], 84:84+b.shape[1]] = b
# Now the actual solution...
# Set the black values to -1
a[a==0] = -1
b[b==0] = -1
# and the white values to 1
a[a==255] = 1
b[b==255] = 1
max_peak = numpy.prod(b.shape)
# c will contain max_peak where the overlap is perfect
c = signal.correlate(a, b, 'valid')
overlaps = numpy.where(c == max_peak)
print overlaps
This outputs (array([37]), array([84])), the locations of the offsets set in the code.
You will likely find that if your letter size multiplied by your big array size is bigger than roughly N log(N), where N is the corresponding size of the big array in which you're searching (for each dimension), then you will probably get a speed-up by using an FFT-based algorithm like scipy.signal.fftconvolve (bearing in mind that you'll need to flip each axis of one of the datasets, with flipud and fliplr, if you're using a convolution rather than a correlation). The only modification would be to the assignment of c:
c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')
Comparing the timings on the sizes above:
In [5]: timeit c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')
100 loops, best of 3: 6.78 ms per loop
In [6]: timeit c = signal.correlate(a, b, 'valid')
10 loops, best of 3: 151 ms per loop
Here is a method you may be able to use, or adapt, depending upon the details of your requirements. It uses ndimage.label and ndimage.find_objects:
1. Label the image using ndimage.label; this finds all blobs in the array and labels them with integers.
2. Get the slices of these blobs using ndimage.find_objects.
3. Then use set intersection to see if the found blobs correspond with your wanted blobs.
Code for 1. and 2.:
import scipy
from scipy import ndimage
import matplotlib.pyplot as plt
#flatten to ensure greyscale.
im = scipy.misc.imread('letters.png',flatten=1)
objects, number_of_objects = ndimage.label(im)
letters = ndimage.find_objects(objects)
#to save the images for illustrative purposes only:
plt.imsave('ob.png',objects)
for i, j in enumerate(letters):
    plt.imsave('ob' + str(i) + '.png', objects[j])
example input:
labelled:
isolated blobs to test against:
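A possible sketch for step 3; here the set-intersection idea is implemented as an exact comparison of each isolated blob against a wanted template, where template is assumed to be the letter's binary mask cropped to its bounding box:

import numpy as np

wanted = template.astype(bool)
for i, sl in enumerate(letters):
    blob = (objects[sl] == i + 1)   # this blob isolated inside its bounding box
    if blob.shape == wanted.shape and np.array_equal(blob, wanted):
        print("blob %d matches the template" % i)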
