I have a task of identifying the number of bacterial colonies on a relatively diverse set of top-down photos of a Petri dish located on a table. The basic process is the following:
detect the Petri dish on the image, crop everything outside of it;
apply binary thresholding which should result in a black background and white colonies or clusters thereof;
use simple blob detector or watershed to identify the colonies, highlight them on the source image and output their count.
Input example 1 Input example 2 Input example 3: an edge case
Problem 1
The table around the Petri dish isn't smooth and contains spots so I usually use Hough transform to detect the dish and remove everything outside of it. The problem is that there are light reflections near the edge of the Petri dish represented as rings with their radius on par with that of the dish edge, as well as other reflections that obscure the view of the colonies and affect the thresholding applied. So I need reliable code for detecting the innermost circle that has roughly the same centre as the outer border of the Petri dish and doesn't contain any further reflections, i.e. cropping at the outer border is sub-optimal.
Cropping attempt
fig. 1. Grab first detected circle with a radius within the range of [int(image.shape[1]/4),int(image.shape[1]/2)] from circle Hough Transform, use a mask and crop to [y-r:y+r,x-r:x+r]
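One way to sidestep the rim reflections can be sketched as follows, assuming a circle `(cx, cy, r)` has already been found (for example by the Hough transform). The mask uses a radius shrunk by a hand-picked factor, and the crop uses row-then-column order. The name `mask_inside_circle` and the `shrink` value of 0.85 are illustrative assumptions, not part of the original attempt.

```python
import numpy as np

def mask_inside_circle(image, cx, cy, r, shrink=0.85):
    """Black out everything outside a circle of radius shrink*r, then crop to it."""
    h, w = image.shape[:2]
    rr = int(round(r * shrink))          # shrunken radius, to skip the rim reflections
    yy, xx = np.ogrid[:h, :w]
    inside = (xx - cx) ** 2 + (yy - cy) ** 2 <= rr ** 2
    masked = image.copy()
    masked[~inside] = 0                  # works for grayscale and colour images
    # NumPy indexes rows (y) first, then columns (x).
    y0, y1 = max(cy - rr, 0), min(cy + rr + 1, h)
    x0, x1 = max(cx - rr, 0), min(cx + rr + 1, w)
    return masked[y0:y1, x0:x1]

demo = np.full((200, 200, 3), 255, dtype=np.uint8)
cropped = mask_inside_circle(demo, cx=100, cy=100, r=80)
print(cropped.shape)
```

The shrink factor would have to be tuned per dish type. It trades a thin ring of lost agar for a mask that is free of rim reflections.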
Problem 2.1
The colonies have a colour usually close to the colour of the background (the agar) and in different areas these can overlap (e.g. colony colour in a section A has the same colour as the background in a section B). This renders the method of general thresholding useless. Different photos having different brightness is an issue as well in the context of the binary method and its rigid parameters - for some images a param of (184,255) is useful while on others only a setting as low as (120,255) results in something half-usable.
Gaussian blur, pyrmeanshift, binary threshold
fig 2.1. Gaussian blur (3,3) + pyrmeanshift (6,27) + binary threshold (205,255)
Problem 2.2
The bacterial colonies have round shapes which sometimes form clusters of overlapping circles, and the simple blob detector tends to ignore those. The algorithm is supposed to detect the cluster and identify how many colonies (circles) are in it. To tackle this, I've tried Euclidean distance transform coupled with watershed as an alternative to the simple blob detector, but this needs to be fed a clean image containing nothing other than the colonies themselves, so a robust threshold algorithm is required for removing all the light reflections and eliminating the gradient of the background (the agar). There are also many spots on the Petri dish, usually smaller than the colonies and not really round; these should be ignored by the detector algorithm. I've heard of adaptive thresholding being used to overcome the problem of a varied background, but it tends to convert small non-colony spots on the dish into full-fledged circles, which is far from ideal.
Adaptive threshold - Gaussian method
fig 2.2. Adaptive threshold (Gaussian C)
An attempt at detecting colonies on input example 2
fig 3. A failed attempt at using watershed with distance transform, demonstrating that this algorithm requires a well cleaned-up and properly thresholded input
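To illustrate why the distance-transform approach depends on a clean mask, here is a minimal sketch on a synthetic, already clean cluster of two overlapping discs. It counts the peaks of the Euclidean distance transform, which are the points that would normally seed the watershed. It uses `scipy.ndimage` rather than OpenCV, and the function name and the `min_separation` and `min_depth` values are assumptions chosen for this toy image.

```python
import numpy as np
from scipy import ndimage as ndi

def count_touching_discs(binary, min_separation=9, min_depth=3.0):
    """Estimate the number of round objects from distance-transform peaks."""
    dist = ndi.distance_transform_edt(binary)
    # A pixel is a peak if it equals the maximum of its neighbourhood.
    local_max = ndi.maximum_filter(dist, size=min_separation)
    peaks = (dist == local_max) & (dist >= min_depth)
    # Merge adjacent tied pixels into a single marker (8-connectivity).
    markers, n_peaks = ndi.label(peaks, structure=np.ones((3, 3)))
    return n_peaks, markers, dist

# Two discs of radius 10 whose centres are 14 pixels apart.
yy, xx = np.mgrid[:40, :55]
disc_a = (yy - 20) ** 2 + (xx - 20) ** 2 <= 100
disc_b = (yy - 20) ** 2 + (xx - 34) ** 2 <= 100
cluster = disc_a | disc_b

n_colonies, markers, dist = count_touching_discs(cluster)
print(n_colonies)
```

Any reflection or speck left in the mask produces its own peak, which is consistent with the failure shown in fig 3. The `min_depth` parameter rejects only objects thinner than a few pixels.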
I'm interested to know whether this task is feasible in the context of a varied collection of photos taken in different lighting conditions as well as different colonies having different sizes and colours. If so, what ways are there to approach this?
I am performing adaptive thresholding on an image to find an object's borders. However, on some points of the object, the adaptive threshold has gaps and spills over into other objects (due to colour similarity in rare points on the object's border). These gaps are too large to fill in using morphological closing (dilate followed by erosion).
Is there a way to detect these gaps and remove them, considering that they don't occur very often and are small relative to the perimeter of the entire shape?
I've found that just increasing iterations/window sizes for closing just messes up things too much to be useful.
I want to perform a Hough line transform after a good adaptive threshold, but the adaptive threshold spilling out of the boundaries of the object of interest adds many more potential line detections to the binary image.
I thought maybe running kernels over the detected lines in those images and adding white pixels if sufficiently small breaks are detected in the adaptive threshold borders? Has anyone tried something like this?
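A rough sketch of that idea, assuming the border orientation is known (horizontal here): closing with a line-shaped structuring element bridges gaps along the border without growing it in the perpendicular direction. This uses `scipy.ndimage` on a synthetic image, and the element length of 7 is an arbitrary choice for the toy data.

```python
import numpy as np
from scipy import ndimage as ndi

border = np.zeros((11, 40), dtype=bool)
border[5, 5:35] = True      # a horizontal border, one pixel thick
border[5, 18:22] = False    # a 4-pixel gap left by the threshold

# A 1x7 horizontal element bridges gaps of up to 6 pixels along the row
# but cannot thicken the border vertically.
line_element = np.ones((1, 7), dtype=bool)
bridged = ndi.binary_closing(border, structure=line_element)

print(int(border.sum()), int(bridged.sum()))
```

For borders at arbitrary angles, the element would have to be rotated to follow the local direction, for instance the direction of a line already detected nearby.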
Thanks!
Example gap in threshold before morph close
Example gap in threshold after morph close
I am analyzing histology tissue images stained with a specific protein marker, and I would like to identify the pixels that are positive for that marker. My problem is that thresholding the image gives too many false positives, which I'd like to exclude.
I am using color deconvolution (separate_stains from skimage.color) to get the AEC channel (corresponding to the red marker), separating it from the background (Hematoxylin blue color) and applying cv2 Otsu thresholding to identify the positive pixels using cv2.threshold(blur,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU), but it is also picking up the tissue boundaries (see white lines in the example picture, sometimes it even has random colors other than white) and sometimes even non positive cells (blue regions in the example picture). It's also missing some faint positive pixels which I'd like to capture.
Overall: (1) how do I filter the false positive tissue boundaries and blue pixels? and (2) how do I adjust the Otsu thresholding to capture the faint red positives?
Adding a revised example image -
top left the original image after using HistoQC to identify tissue regions and applying the mask it identified, so that all of the non-tissue regions are black. I should try to adjust its parameters to exclude the folded tissue regions, which appear darker (towards the bottom left of this image). Suggestions for other tools to identify tissue regions are welcome.
top right hematoxylin after the deconvolution
bottom left AEC after the deconvolution
bottom right Otsu thresholding applied on the original RGB image, trying to capture only the AEC-positive pixels but also showing false positives and false negatives
Thanks
#cris-luengo thank you for your input on scikit-image! I am one of the core developers, and based on #assafb input, we are trying to rewrite the code on color/colorconv/separate_stains.
#Assafb: The negative log10 transformation is the Beer-Lambert mapping. What I don't understand in that code is the line rgb += 2. I don't know where that comes from or why they use it. I'm 100% sure it is wrong. I guess they're trying to avoid log10(0), but that should be done differently. I bet this is where your negative values come from, though.
Yes, apparently (I am not the original author of this code) we use rgb += 2 to avoid log10(0). I checked Fiji's Colour Deconvolution plugin, and they add 1 to their input. I tested several input numbers to help with that, and ~2 would get us closer to the desired results.
#Assafb: Compare the implementation in skimage with what is described in the original paper. You'll see several errors in the implementation, most importantly the lack of a division by the max intensity. They should have used -np.log10(rgb/255) (assuming that 255 is the illumination intensity), rather than -np.log10(rgb).
Our input data is float; the max intensity in this case would be 1. I'd say that that's the reason we don't divide by something.
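For reference, the Beer-Lambert mapping under discussion is A = -log10(I / I0). The sketch below is one possible way to avoid log10(0) without adding an offset to the input, namely clipping the ratio to a small positive value. This is an assumption for illustration, not what scikit-image, Fiji or DIPlib actually do, and `eps` is an arbitrary choice.

```python
import numpy as np

def beer_lambert(intensity, background=1.0, eps=1e-6):
    """Map transmitted intensity to absorbance: A = -log10(I / I0)."""
    ratio = np.asarray(intensity, dtype=float) / np.asarray(background, dtype=float)
    # Clip instead of adding an offset, so that I == I0 still maps to zero absorbance.
    ratio = np.clip(ratio, eps, 1.0)
    return -np.log10(ratio)

# Float input in [0, 1], so the illumination intensity I0 is 1.
absorbance = beer_lambert(np.array([1.0, 0.1, 0.0]))
print(absorbance)
```

With 8-bit input the same function could be called with `background=255`, or with a per-channel background triplet.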
Besides that, I opened an issue on scikit-image to discuss these problems and to specify a solution. I have already done some research (I even checked DIPlib's documentation) and implemented a different version of that specific function. However, stains are not my main area of expertise, and we would be glad if you could help evaluate that code, and maybe point to a better solution.
Thank you again for your help!
There are several issues that cause improper quantification. I'll go over the details of how I would recommend you tackle these slides.
I'm using DIPlib, because I'm most familiar with it (I'm an author). It has Python bindings, which I use here, and can be installed with pip install diplib. However, none of this is complicated image processing, and you should be able to do similar processing with other libraries.
Loading image
There is nothing special here, except that the image has strong JPEG compression artifacts, which can interfere with the stain unmixing. We help the process a bit by smoothing the image with a small Gaussian filter.
import diplib as dip
import numpy as np
image = dip.ImageRead('example.png')
image = dip.Gauss(image, [1]) # because of the severe JPEG compression artifacts
Stain unmixing
[Personal note: I find it unfortunate that Ruifrok and Johnston, the authors of the paper presenting the stain unmixing method, called it "deconvolution", since that term already had an established meaning in image processing, especially in combination with microscopy. I always refer to this as "stain unmixing", never "deconvolution".]
This should always be the first step in any attempt at quantifying from a brightfield image. There are three important RGB triplets that you need to determine here: the RGB value of the background (which is the brightness of the light source), and the RGB value of each of the stains. The unmixing process has two components:
First we apply the Beer-Lambert mapping. This mapping is non-linear. It converts the transmitted light (as recorded by the microscope) into absorbance values. Absorbance indicates how strongly each point on the slide absorbs light of the various wavelengths. The stains absorb light, and differ by the relative absorbance in each of the R, G and B channels of the camera.
background_intensity = [209, 208, 215]
image = dip.BeerLambertMapping(image, background_intensity)
I manually determined the background intensity, but you can automate that process quite well if you have whole slide images: in whole slide images, the edges of the image always correspond to background, so you can look there for intensities.
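A minimal sketch of that automation, assuming (as stated above for whole slide images) that a frame of pixels along the image edge contains only background. The helper name `estimate_background` and the frame width are illustrative. The median is used so that a few stray tissue pixels in the frame do not shift the estimate.

```python
import numpy as np

def estimate_background(image, border=10):
    """Per-channel median of a frame of `border` pixels around the image edge."""
    frame = np.ones(image.shape[:2], dtype=bool)
    frame[border:-border, border:-border] = False   # keep only the outer frame
    return np.median(image[frame], axis=0)

# Synthetic RGB image: uniform background with a stained patch in the middle.
demo = np.full((100, 120, 3), (209, 208, 215), dtype=np.uint8)
demo[30:70, 40:80] = (120, 60, 90)
background_estimate = estimate_background(demo)
print(background_estimate)
```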
The second step is the actual unmixing. The mixing of absorbances is a linear process, so the unmixing amounts to solving a set of linear equations at each pixel. For this we need to know the absorbance values for each of the stains in each of the channels. Using standard values (as in skimage.color.hax_from_rgb) might give a good first approximation, but will rarely provide the best quantification.
Stain colors change from assay to assay (for example, hematoxylin has a different color depending on who made it, what tissue is stained, etc.), and change also depending on the camera used to image the slide (each model has different RGB filters). The best way to determine these colors is to prepare a slide for each stain, using all the same protocol but not putting on the other dyes. From these slides you can easily obtain stain colors that are valid for your assay and your slide scanner. This is however rarely if ever done in practice.
A more practical solution involves estimating colors from the slide itself. By finding a spot on the slide where you see each of the stains individually (where stains are not mixed) one can manually determine fairly good values. It is possible to determine appropriate values automatically, but that is much more complex and it will be hard to find an existing implementation. There are a few papers out there that show how to do this with non-negative matrix factorization with a sparsity constraint, which IMO is the best approach we have.
hematoxylin_color = np.array([0.2712, 0.2448, 0.1674])
hematoxylin_color = (hematoxylin_color/np.linalg.norm(hematoxylin_color)).tolist()
aec_color = np.array([0.2129, 0.2806, 0.4348])
aec_color = (aec_color/np.linalg.norm(aec_color)).tolist()
stains = dip.UnmixStains(image, [hematoxylin_color, aec_color])
stains = dip.ClipLow(stains, 0) # set negative values to 0
hematoxylin = stains.TensorElement(0)
aec = stains.TensorElement(1)
Note how the linear unmixing can lead to negative values. This is a result of incorrect color vectors, noise, JPEG artifacts, and things on the slide that absorb light that are not the two stains we defined.
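To make the "set of linear equations at each pixel" concrete, here is a NumPy-only sketch that solves the unmixing by least squares with a pseudo-inverse. It illustrates the principle and is not necessarily how `dip.UnmixStains` is implemented. The function name `unmix` and the tiny test image are assumptions. The stain vectors are the ones used above.

```python
import numpy as np

def unmix(absorbance, stain_vectors):
    """Solve absorbance = concentrations @ stain_vectors per pixel (least squares)."""
    stains = np.asarray(stain_vectors, dtype=float)     # shape (n_stains, 3)
    pixels = absorbance.reshape(-1, 3)                  # shape (n_pixels, 3)
    conc = pixels @ np.linalg.pinv(stains)              # shape (n_pixels, n_stains)
    return conc.reshape(absorbance.shape[:-1] + (stains.shape[0],))

hematoxylin = np.array([0.2712, 0.2448, 0.1674])
aec = np.array([0.2129, 0.2806, 0.4348])
stains = np.stack([hematoxylin / np.linalg.norm(hematoxylin),
                   aec / np.linalg.norm(aec)])

true_conc = np.array([[[0.5, 0.0], [0.2, 0.7]]])        # a 1x2 image, two stains
mixed = true_conc @ stains                              # forward (mixing) model
recovered = unmix(mixed, stains)
print(np.round(recovered, 3))
```

On noise-free synthetic data the concentrations are recovered exactly. On real slides the residual of the fit, and any negative values, indicate how well the chosen stain vectors describe the image.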
Identifying tissue area
You already have a good method for this, which is applied to the original RGB image. However, don't apply the mask to the original image before doing the unmixing above, keep the mask as a separate image. I wrote the next bit of code that finds tissue area based on the hematoxylin stain. It's not very good, and it's not hard to improve it, but I didn't want to waste too much time here.
tissue = dip.MedianFilter(hematoxylin, dip.Kernel(5))
tissue = dip.Dilation(tissue, [20])
tissue = dip.Closing(tissue, [50])
area = tissue > 0.2
Identifying tissue folds
You were asking about this step too. Tissue folds typically appear as larger darker regions in the image. It is not trivial to find an automatic method to identify them, because a lot of other things can create darker regions in the image too. Manual annotation is a good start; if you collect enough manually annotated examples you could train a Deep Learning model to help you out. I did this just as a placeholder. Again, it's not very good, and it identifies some positive regions as folds. Folds are subtracted from the tissue area mask.
folds = dip.Gauss(hematoxylin - aec, [20])
area -= folds > 0.2
Identifying positive pixels
It is important to use a fixed threshold for this. Only a pathologist can tell you what the threshold should be, they are the gold-standard for what constitutes positive and negative.
Note that the slides must all have been prepared following the same protocol. In clinical settings this is relatively easy because the assays used are standardized and validated, and produce a known, limited variation in staining. In an experimental setting, where assays are less strictly controlled, you might see more variation in staining quality. You will even see variation in staining color, unfortunately. You can use automated thresholding methods to at least get some data out, but there will be biases that you cannot control. I don't think there is a way out: inconsistent stain in, inconsistent data out.
Using an image-content-based method such as Otsu causes the threshold to vary from sample to sample. For example, in samples with few positive pixels the threshold will be lower than other samples, yielding a relative overestimation of the percent positive.
positive = aec > 0.1 # pick a threshold according to pathologist's idea what is positive and what is not
pp = 100 * dip.Count(dip.And(positive, area)) / dip.Count(area)
print("Percent positive:", pp)
I get 1.35% in this sample. Note that the % positive pixels is not necessarily related to the % positive cells, and should not be used as a substitute.
I ended up incorporating some of the feedback given above by Chris into the following possible unconventional solution for which I would appreciate getting feedback (to the specific questions below but also general suggestions for improvement or more effective/accurate tools or strategy):
Define (but not apply yet) tissue mask (HistoQC) after optimizing HistoQC script to remove as much of the tissue folds as possible without removing normal tissue area
Apply deconvolution on the original RGB image using hax_from_rgb
Use the second channel, which should correspond to the red stain pixels, and subtract from it the third channel, which as far as I can see corresponds to the background non-red/blue pixels of the image. This step removes the high values in the second channel that show up because of tissue folds or other artifacts that weren't removed in the first step (what does the third channel correspond to? The Green element of RGB?)
Blur the adjusted image and threshold based on the median of the image plus 20 (Semi-arbitrary but it works. Are there better alternatives? Otsu doesn't work here at all)
Apply the tissue regions mask on the thresholded image yielding only positive red/red-ish pixels without the non-tissue areas
Count the % of positive pixels relative to the tissue mask area
I have been trying to apply, as suggested above, the tissue mask on the deconvolution red channel output and then use Otsu thresholding. But it failed, since the black background generated by applying the tissue regions mask makes the Otsu threshold detect the entire tissue as positive. So I have proceeded instead to apply the threshold on the adjusted red channel and then apply the tissue mask before counting positive pixels. I am interested in learning what I am doing wrong here.
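The failure described above can be reproduced on synthetic data. When the masked-out pixels are set to 0 and included in the histogram, Otsu splits black background from tissue. Computing the threshold from the tissue pixels only avoids this. The sketch below uses a hand-written NumPy Otsu, and the image values are made up. Whether an automatic threshold is appropriate at all is a separate question, given the recommendation above to use a fixed threshold.

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Otsu's threshold of a sample: maximise the between-class variance."""
    values = np.asarray(values, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                         # pixels at or below each bin
    w1 = w0[-1] - w0                             # pixels above each bin
    csum = np.cumsum(hist * centers)
    m0 = csum / np.maximum(w0, 1)                # mean of the lower class
    m1 = (csum[-1] - csum) / np.maximum(w1, 1)   # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2
    return centers[np.argmax(between)]

red = np.zeros((100, 100))                 # masked-out pixels are 0 (black)
tissue = np.zeros_like(red, dtype=bool)
tissue[25:75, 25:75] = True
red[tissue] = 0.2                          # negative tissue
red[30:35, 25:75] = 0.5                    # a strip of positive pixels

t_all = otsu_threshold(red)                # histogram dominated by the black background
t_tissue = otsu_threshold(red[tissue])     # histogram of tissue pixels only

frac_all = np.mean(red[tissue] > t_all)
frac_tissue = np.mean(red[tissue] > t_tissue)
print(frac_all, frac_tissue)
```

In this toy case the first threshold marks all of the tissue as positive, while the second isolates the positive strip.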
Other than that, the LoG transformation didn't seem to work well because it produced a lot of stretched bright segments rather than just circular blobs where cells are located. I'm not sure why this is happening.
Use ML for this case.
Manually create a binary mask for each of your pictures: red pixels white, background pixels black.
Work in HSV or Lab color space.
Train a simple classifier: a decision tree or an SVM (linear or with an RBF kernel).
Test it!
Skin color segmentation is a good and very simple example of this approach.
In the future you can add new examples and new cases without refactoring code: just update the dataset and retrain the model.
For a school project I am trying to write a program in Python that tracks the movement of the pupil. In order to do that I am using OpenCV.
After looking up some tutorials on the internet, I noticed that almost everyone is using thresholding to achieve this, since a binary image is necessary for almost every step further down the road (e.g. the Hough circle transform, contours). However, from my understanding thresholding is extremely sensitive to lighting, so such an approach would only return good results in optimal lighting conditions.
So here comes my question: Is there any alternative or better approach than just Thresholding the image? Or is my understanding of thresholding in OpenCV wrong in the first place?
Here is an example image:
The purpose of thresholding is to segment the desired objects from the background, after which you can perform additional processing (applying morphological operations) and then contour filtering to further isolate the desired objects. Instead of applying image processing techniques on a BGR (3-channel) image or a grayscale (1-channel) image with range [0...255], thresholding allows us to obtain a binary image where every pixel takes one of two values (0 or 255 in OpenCV), which makes distinguishing objects easier. Depending on your situation, there are many ways to obtain a binary image. Here are several methods:
cv2.Canny - Canny edge detection which uses a minVal and maxVal to determine edges
cv2.threshold - Simple thresholding with user selected arbitrary global threshold value
cv2.threshold + cv2.THRESH_OTSU - Otsu's thresholding to automatically calculate the threshold value.
cv2.adaptiveThreshold - Adaptive thresholding for images that have different lighting conditions in different areas. Essentially it automatically calculates a threshold value for each region of the image, which gives better results on images with varying illumination
cv2.inRange - Color segmentation. The idea is to use lower and upper threshold ranges to obtain a binary image. Useful when trying to isolate a single color range
I am attempting to use machine learning (namely random forests) for image segmentation. The classifier utilizes a number of different pixel level features to classify pixels as either edge pixels or non edge pixels. I recently applied my classifier to a set of images that are pretty difficult to segment even manually (Image segmentation based on edge pixel map) and am still working on obtaining reasonable contours from the resulting probability map. I also applied the classifier to an easier set of images and am obtaining quite good predicted outlines (Rand index > 0.97) when I adjust the threshold to 0.95. I am interested in improving the segmentation result by filtering contours extracted from the probability map.
Here is the original image:
The expert outlines:
The probability map generated from my classifier:
This can be further refined when I convert the image to binary based on a threshold of 0.95:
I tried filling holes in the probability map, but that left me with a lot of noise and sometimes merged nearby cells. I also tried contour finding in OpenCV, but this didn't work either, as many of these contours are not completely connected: a few pixels will be missing here and there in the outlines.
Edit: I ended up using Canny edge detection on the probability map.
The initial image seems to be well contrasted and I guess we can simply threshold to obtain a good estimate of the cells. Here is a morphological area based filtering of the thresholded image:
Threshold:
Area based opening filter (this needs to be set based on your dataset of cells under study):
Area based closing filter (this needs to be set based on your dataset of cells under study):
Contours using I-Erosion(I):
Code snippet:
% C is the input image
C10 = C>10; % threshold depends on the average contrast in your dataset
C10_areaopen = bwareaopen(C10,2500); % area filter removes small components that are not cells
C10_areaopenclose = ~bwareaopen(~C10_areaopen,100); % area filter fills holes
se = strel('disk',1);
figure, imshow(C10_areaopenclose-imerode(C10_areaopenclose,se)) % inner contour
To get smoother shapes I guess fine opening operations can be performed on the filtered images, thus removing any concave parts of the cells. Also for cells that are attached one could use the distance function and the watershed over the distance function to obtain segmentations of the cells: http://www.ias-iss.org/ojs/IAS/article/viewFile/862/765
I guess this can be also used on your probability/confidence maps to perform nonlinear area based filtering.
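For readers working in Python rather than MATLAB, the area filters above can be approximated with `scipy.ndimage`. This is a sketch on a synthetic mask, not a port that has been checked against the MATLAB output. The helper names are made up, the area limits are toy values, and the 3x3 cross is used as a stand-in for `strel('disk',1)`.

```python
import numpy as np
from scipy import ndimage as ndi

def area_open(mask, min_area):
    """Remove connected components smaller than min_area pixels (8-connectivity)."""
    labels, _ = ndi.label(mask, structure=np.ones((3, 3)))
    areas = np.bincount(labels.ravel())
    keep = areas >= min_area
    keep[0] = False                      # label 0 is the background
    return keep[labels]

def fill_small_holes(mask, max_hole_area):
    """Fill holes smaller than max_hole_area by area-opening the complement."""
    return ~area_open(~mask, max_hole_area)

cells = np.zeros((60, 60), dtype=bool)
cells[10:40, 10:40] = True               # one large "cell" of 900 pixels
cells[20:23, 20:23] = False              # a 9-pixel hole inside it
cells[50:52, 50:52] = True               # a 4-pixel speck of noise

cleaned = fill_small_holes(area_open(cells, 100), 100)
cross = ndi.generate_binary_structure(2, 1)
outline = cleaned & ~ndi.binary_erosion(cleaned, structure=cross)   # inner contour
print(int(cleaned.sum()), int(outline.sum()))
```

The same two filters can be applied to a thresholded probability map. The area limits then depend on the expected cell size.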