I am analyzing histology tissue images stained with a specific protein marker which I would like to identify the positive pixels for that marker. My problem is that thresholding on the image gives too much false positives which I'd like to exclude.
I am using color deconvolution (separate_stains from skimage.color) to get the AEC channel (corresponding to the red marker), separating it from the background (Hematoxylin blue color) and applying cv2 Otsu thresholding to identify the positive pixels using cv2.threshold(blur,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU), but it is also picking up the tissue boundaries (see white lines in the example picture, sometimes it even has random colors other than white) and sometimes even non positive cells (blue regions in the example picture). It's also missing some faint positive pixels which I'd like to capture.
Overall: (1) how do I filter the false positive tissue boundaries and blue pixels? and (2) how do I adjust the Otsu thresholding to capture the faint red positives?
Adding a revised example image -
top left the original image after using HistoQC to identify tissue regions and apply the mask it identified on the tissue such that all of the non-tissue regions are black. I should tru to adjust its parameters to exclude the folded tissue regions which appear more dark (towards the bottom left of this image). Suggestions for other tools to identify tissue regions are welcome.
top right hematoxylin after the deconvolution
bottom left AEC after the deconvolution
bottom right Otsu thresholding applied not the original RGB image trying to capture only the AEC positives pixels but showing also false positives and false negatives
Thanks
#cris-luengo thank you for your input on scikit-image! I am one of the core developers, and based on #assafb input, we are trying to rewrite the code on color/colorconv/separate_stains.
#Assafb: The negative log10 transformation is the Beer-Lambert mapping. What I don't understand in that code is the line rgb += 2. I don't know where that comes from or why they use it. I'm 100% sure it is wrong. I guess they're trying to avoid log10(0), but that should be done differently. I bet this is where your negative values come from, though.
Yes, apparently (I am not the original author of this code) we use rgb += 2 to avoid log10(0). I checked Fiji's Colour Deconvolution plugin, and they add 1 to their input. I tested several input numbers to help on that, and ~2 would let us closer to the desirable results.
#Assafb: Compare the implementation in skimage with what is described in the original paper. You'll see several errors in the implementation, most importantly the lack of a division by the max intensity. They should have used -np.log10(rgb/255) (assuming that 255 is the illumination intensity), rater than -np.log10(rgb).
Our input data is float; the max intensity in this case would be 1. I'd say that that's the reason we don't divide by something.
Besides that, I opened an issue on scikit-image to discuss these problems — and to specify a solution. I made some research already — I even checked DIPlib's documentation —, and implemented a different version of that specific function. However, stains are not my main area of expertise, and we would be glad if you could help evaluating that code — and maybe pointing a better solution.
Thank you again for your help!
There are several issues that cause improper quantification. I'll go over the details of how I would recommend you tackle these slides.
I'm using DIPlib, because I'm most familiar with it (I'm an author). It has Python bindings, which I use here, and can be installed with pip install diplib. However, none of this is complicated image processing, and you should be able to do similar processing with other libraries.
Loading image
There is nothing special here, except that the image has strong JPEG compression artifacts, which can interfere with the stain unmixing. We help the process a bit by smoothing the image with a small Gaussian filter.
import diplib as dip
import numpy as np
image = dip.ImageRead('example.png')
image = dip.Gauss(image, [1]) # because of the severe JPEG compression artifacts
Stain unmixing
[Personal note: I find it unfortunate that Ruifrok and Johnston, the authors of the paper presenting the stain unmixing method, called it "deconvolution", since that term already had an established meaning in image processing, especially in combination with microscopy. I always refer to this as "stain unmixing", never "deconvolution".]
This should always be the first step in any attempt at quantifying from a bightfield image. There are three important RGB triplets that you need to determine here: the RGB value of the background (which is the brightness of the light source), and the RGB value of each of the stains. The unmixing process has two components:
First we apply the Beer-Lambert mapping. This mapping is non-linear. It converts the transmitted light (as recorded by the microscope) into absorbance values. Absorbance indicates how strongly each point on the slide absorbs light of the various wavelengths. The stains absorb light, and differ by the relative absorbance in each of the R, G and B channels of the camera.
background_intensity = [209, 208, 215]
image = dip.BeerLambertMapping(image, background_intensity)
I manually determined the background intensity, but you can automate that process quite well if you have whole slide images: in whole slide images, the edges of the image always correspond to background, so you can look there for intensities.
The second step is the actual unmixing. The mixing of absorbances is a linear process, so the unmixing is solving of a set of linear equations at each pixel. For this we need to know the absorbance values for each of the stains in each of the channels. Using standard values (as in skimage.color.hax_from_rgb) might give a good first approximation, but rarely will provide the best quantification.
Stain colors change from assay to assay (for example, hematoxylin has a different color depending on who made it, what tissue is stained, etc.), and change also depending on the camera used to image the slide (each model has different RGB filters). The best way to determine these colors is to prepare a slide for each stain, using all the same protocol but not putting on the other dyes. From these slides you can easily obtain stain colors that are valid for your assay and your slide scanner. This is however rarely if ever done in practice.
A more practical solution involves estimating colors from the slide itself. By finding a spot on the slide where you see each of the stains individually (where stains are not mixed) one can manually determine fairly good values. It is possible to automatically determine appropriate values, but is much more complex and it'll be hard finding an existing implementation. There are a few papers out there that show how to do this with non-negative matrix factorization with a sparsity constraint, which IMO is the best approach we have.
hematoxylin_color = np.array([0.2712, 0.2448, 0.1674])
hematoxylin_color = (hematoxylin_color/np.linalg.norm(hematoxylin_color)).tolist()
aec_color = np.array([0.2129, 0.2806, 0.4348])
aec_color = (aec_color/np.linalg.norm(aec_color)).tolist()
stains = dip.UnmixStains(image, [hematoxylin_color, aec_color])
stains = dip.ClipLow(stains, 0) # set negative values to 0
hematoxylin = stains.TensorElement(0)
aec = stains.TensorElement(1)
Note how the linear unmixing can lead to negative values. This is a result of incorrect color vectors, noise, JPEG artifacts, and things on the slide that absorb light that are not the two stains we defined.
Identifying tissue area
You already have a good method for this, which is applied to the original RGB image. However, don't apply the mask to the original image before doing the unmixing above, keep the mask as a separate image. I wrote the next bit of code that finds tissue area based on the hematoxylin stain. It's not very good, and it's not hard to improve it, but I didn't want to waste too much time here.
tissue = dip.MedianFilter(hematoxylin, dip.Kernel(5))
tissue = dip.Dilation(tissue, [20])
tissue = dip.Closing(tissue, [50])
area = tissue > 0.2
Identifying tissue folds
You were asking about this step too. Tissue folds typically appear as larger darker regions in the image. It is not trivial to find an automatic method to identify them, because a lot of other things can create darker regions in the image too. Manual annotation is a good start, if you collect enough manually annotated examples you could train a Deep Learning model to help you out. I did this just as a place holder, again it's not very good, and identifies some positive regions as folds. Folds are subtracted from the tissue area mask.
folds = dip.Gauss(hematoxylin - aec, [20])
area -= folds > 0.2
Identifying positive pixels
It is important to use a fixed threshold for this. Only a pathologist can tell you what the threshold should be, they are the gold-standard for what constitutes positive and negative.
Note that the slides must all have been prepared following the same protocol. In clinical settings this is relatively easy because the assays used are standardized and validated, and produce a known, limited variation in staining. In an experimental setting, where assays are less strictly controlled, you might see more variation in staining quality. You will even see variation in staining color, unfortunately. You can use automated thresholding methods to at least get some data out, but there will be biases that you cannot control. I don't think there is a way out: inconsistent stain in, inconsistent data out.
Using an image-content-based method such as Otsu causes the threshold to vary from sample to sample. For example, in samples with few positive pixels the threshold will be lower than other samples, yielding a relative overestimation of the percent positive.
positive = aec > 0.1 # pick a threshold according to pathologist's idea what is positive and what is not
pp = 100 * dip.Count(dip.And(positive, area)) / dip.Count(area)
print("Percent positive:", pp)
I get a 1.35% in this sample. Note that the % positive pixels is not necessarily related to the % positive cells, and should not be used as a substitute.
I ended up incorporating some of the feedback given above by Chris into the following possible unconventional solution for which I would appreciate getting feedback (to the specific questions below but also general suggestions for improvement or more effective/accurate tools or strategy):
Define (but not apply yet) tissue mask (HistoQC) after optimizing HistoQC script to remove as much of the tissue folds as possible without removing normal tissue area
Apply deconvolution on the original RGB image using hax_from_rgb
Using the second channel which should correspond to the red stain pixels, and subtract from it the third channel which as far as I see corresponds to the background non-red/blue pixels of the image. This step removes the high values in the second channel that which up because of tissue folds or other artifacts that weren't removed in the first step (what does the third channel correspond to? The Green element of RGB?)
Blur the adjusted image and threshold based on the median of the image plus 20 (Semi-arbitrary but it works. Are there better alternatives? Otsu doesn't work here at all)
Apply the tissue regions mask on the thresholded image yielding only positive red/red-ish pixels without the non-tissue areas
Count the % of positive pixels relative to the tissue mask area
I have been trying to apply, as suggested above, the tissue mask on the deconvolution red channel output and then use Otsu thresholding. But it failed since the black background generated by the applying the tissue regions mask makes the Otsu threshold detect the entire tissue as positive. So I have proceeded instead to apply the threshold on the adjusted red channel and then apply the tissue mask before counting positive pixels. I am interested in learning what am I doing wrong here.
Other than that, the LoG transformation didn't seem to work well because it produced a lot of stretched bright segments rather than just circular blobs where cells are located. I'm not sure why this is happening.
Use ML for this case.
Create manually binary mask for your pictures: each red pixel - white, background pixels - black.
Work in HSV or Lab color space.
Train simple classifier: decision tree or SVM (linear or with RBF)..
Let's test!
See on a good and very simple example with skin color segmentation.
And in the future you can add new examples and new cases without code refactoring: just update dataset and retrain model.
For a school project I am trying to write a program in Python that tracks the movement of the pupil. In order to do that I am using OpenCV.
After looking up some tutorials on the internet, I noticed that almost everyone is using thresholding to achieve this, since a binary image is necessary for almost every step further down the road (e.g. HoughCircle Transofrmation, Contours). However, from my understanding thresholding is extremly light sensitive, therefore such an approach would only return good results in optimal lightning conditions.
So here comes my question: Is there any alternative or better approach than just Thresholding the image? Or is my understanding of thresholding in OpenCV wrong in the first place?
Here is a example image:
The purpose of thresholding is to segment the desired objects from the background where you can then perform additional processing (applying morphological operations) then perform contour filtering to further isolate the desired objects. Instead of applying image processing techniques on a BGR (3-channel) image or a grayscale (1-channel) image with range [0...255], thresholding allows us to obtain a binary image where every pixel is either 0 or 1 which makes distinguishing objects easier. Depending on your situation, there are many ways obtain a binary image, here are several methods:
cv2.Canny - Canny edge detection which uses a minVal and maxVal to determine edges
cv2.threshold - Simple thresholding with user selected arbitrary global threshold value
cv2.threshold + cv2.THRESH_OTSU - Otsu's thresholding to automatically calculate the threshold value.
cv2.adaptiveThreshold - Adaptive thresholding where the image has different lighting conditions in different areas. Essentially it will automatically calculate the threshold value for different regions of the image and gives better results with images with varying illumination
cv2.inRange - Color segmentation. The idea is to use lower and upper threshold ranges to obtain a binary image. Useful when trying to isolate a single color range
How to go from the image on the left to the image on the right programmatically using Python (and maybe some tools, like OpenCV)?
I made this one by hand using an online tool for clipping. I am completely noob in image processing (especially in practice). I was thinking to apply some edge or contour detection to create a mask, which I will apply later on the original image to paint everything else (except the region of interest) black. But I failed miserably.
The goal is to preprocess a dataset of very similar images, in order to train a CNN binary classifier. I tried to train it by just cropping the image close to the region of interest, but the noise is so high that the CNN learned absolutely nothing.
Can someone help me do this preprocessing?
I used OpenCV's implementation of watershed algorithm to solve your problem. You can find out how to use it if you read this great tutorial, so I will not explain this into a lot of detail.
I selected four points (markers). One is located on the region that you want to extract, one is outside and the other two are within lower/upper part of the interior that does not interest you. I then created an empty integer array (the so-called marker image) and filled it with zeros. Then I assigned unique values to pixels at marker positions.
The image below shows the marker positions and marker values, drawn on the original image:
I could also select more markers within the same area (for example several markers that belong to the area you want to extract) but in that case they should all have the same values (in this case 255).
Then I used watershed. The first input is the image that you provided and the second input is the marker image (zero everywhere except at marker positions). The algorithm stores the result in the marker image; the region that interests you is marked with the value of the region marker (in this case 255):
I set all pixels that did not have the 255 value to zero. I dilated the obtained image three times with 3x3 kernel. Then I used the dilated image as a mask for the original image (i set all pixels outside the mask to zero) and this is the result i got:
You will probably need some kind of method that will find markers automatically. The difficulty of this task depends heavily on the set of the input images. In some cases, the method can be really straightforward and simple (as in the tutorial linked above) but sometimes this can be a tough nut to crack. But I can't recommend anything because I don't know how your images look like in general (you only provided one). :)
I am in the process of putting together an OpenCV script to analyze immunohistochemically stained heart tissue. Our staining procedure renders cell types expressing certain proteins in their plasma membranes with pigments visible under a light microscope, which we use to photograph the images.
So far, I've succeded in segmenting the images to different layers based on color range using a modified version of the frequently cited color segmentation script available through the OpenCV community(http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_colorspaces/py_colorspaces.html).
A screen shot of the original image:
B-Cell layer displayed:
At this point, I would like to calculate the ratio of area of B-Cells to unstained tissue. This operation prompted an extraction of the background cell layer as such based on color range:
Obviously, these results leave much to be desired.
Does anyone have ideas of how to approach this problem? Again, I would like to segment the background tissue (transparent) layer, which is unfortunately fairly sponge-like in texture. My goal is to create a mask representive of the area of unstained tissue. It seems a blur technique is necessary to fill the gaps in the tissue, but the loss in accuracy this approach entails is obvious.
In the sample image, the channels look highly correlated. If you apply decorrelation-stretching to the image you should be able to see more detail. Here in my blog post I've implemented decorrelation-stretching in C++ (unfortualtely not Python).
Using the sample code in the blog I did the following to segment the cell region:
dstretch the CIE Lab image with following targetMean and tergetSigma.
float mu[3] = {128.0f, 128.0f, 128.0f};
float sd[3] = {128.0f, 5.0f, 5.0f};
Mat mean = Mat(3, 1, CV_32F, mu);
Mat sigma = Mat(3, 1, CV_32F, sd);
Convert the dstretched CIE Lab image back to BGR.
Erode this BGR image with a 3x3 rectangular structuring element once.
Apply kmeans clustering to this eroded image with k = 2.
I don't know how good this segmentation is. I think it is possible to get a better segmentation by trying different values for the above parameters (mean, sigma, structuring element size and number of times the image is eroded).
(Following images are not to the original scale)
Original:
dstretched CIE Lab converted back to BGR:
Eroded:
kmeans with k = 2:
I am attempting to use machine learning (namely random forests) for image segmentation. The classifier utilizes a number of different pixel level features to classify pixels as either edge pixels or non edge pixels. I recently applied my classifier to a set of images that are pretty difficult to segment even manually (Image segmentation based on edge pixel map) and am still working on obtaining reasonable contours from the resulting probability map. I also applied the classifier to an easier set of images and am obtaining quite good predicted outlines (Rand index > 0.97) when I adjust the threshold to 0.95. I am interested in improving the segmentation result by filtering contours extracted from the probability map.
Here is the original image:
The expert outlines:
The probability map generated from my classifier:
This can be further refined when I convert the image to binary based on a threshold of 0.95:
I tried filling holes in the probability map, but that left me with a lot of noise and sometimes merged nearby cells. I also tried contour finding in openCV but this didn't work either as many of these contours are not completely connected - a few pixels will be missing here and there in the outlines.
Edit: I ended up using Canny edge detection on the probability map.
The initial image seems to be well contrasted and I guess we can simply threshold to obtain a good estimate of the cells. Here is a morphological area based filtering of the thresholded image:
Threshold:
Area based opening filter(this needs to be set based on your dataset of cells under study):
Area based closing filter(this needs to be set based on your dataset of cells under study):
Contours using I-Erosion(I):
Code snippet:
C is input image
C10 = C>10; %threshold depends on the average contrast in your dataset
C10_areaopen = bwareaopen(C10,2500); %area filters average remove small components that are not cells
C10_areaopenclose = ~bwareaopen(~C10_areaopen,100); %area filter fills holes
se = strel('disk',1);
figure, imshow(C10_areaopenclose-imerode(C10_areaopenclose,se)) %inner contour
To get smoother shapes I guess fine opening operations can be performed on the filtered images, thus removing any concave parts of the cells. Also for cells that are attached one could use the distance function and the watershed over the distance function to obtain segmentations of the cells: http://www.ias-iss.org/ojs/IAS/article/viewFile/862/765
I guess this can be also used on your probability/confidence maps to perform nonlinear area based filtering.