Clipping image/remove background programmatically in Python

Clipping image/remove background programmatically in Python - python

How to go from the image on the left to the image on the right programmatically using Python (and maybe some tools, like OpenCV)?
I made this one by hand using an online tool for clipping. I am completely noob in image processing (especially in practice). I was thinking to apply some edge or contour detection to create a mask, which I will apply later on the original image to paint everything else (except the region of interest) black. But I failed miserably.
The goal is to preprocess a dataset of very similar images, in order to train a CNN binary classifier. I tried to train it by just cropping the image close to the region of interest, but the noise is so high that the CNN learned absolutely nothing.
Can someone help me do this preprocessing?

I used OpenCV's implementation of watershed algorithm to solve your problem. You can find out how to use it if you read this great tutorial, so I will not explain this into a lot of detail.
I selected four points (markers). One is located on the region that you want to extract, one is outside and the other two are within lower/upper part of the interior that does not interest you. I then created an empty integer array (the so-called marker image) and filled it with zeros. Then I assigned unique values to pixels at marker positions.
The image below shows the marker positions and marker values, drawn on the original image:
I could also select more markers within the same area (for example several markers that belong to the area you want to extract) but in that case they should all have the same values (in this case 255).
Then I used watershed. The first input is the image that you provided and the second input is the marker image (zero everywhere except at marker positions). The algorithm stores the result in the marker image; the region that interests you is marked with the value of the region marker (in this case 255):
I set all pixels that did not have the 255 value to zero. I dilated the obtained image three times with 3x3 kernel. Then I used the dilated image as a mask for the original image (i set all pixels outside the mask to zero) and this is the result i got:
You will probably need some kind of method that will find markers automatically. The difficulty of this task depends heavily on the set of the input images. In some cases, the method can be really straightforward and simple (as in the tutorial linked above) but sometimes this can be a tough nut to crack. But I can't recommend anything because I don't know how your images look like in general (you only provided one). :)

Related

Skewing text - How to take advantage of existing edges

I have the following JPG image. If I want to find the edges where the white page meets the black background. So I can rotate the contents a few degrees clockwise. My aim is to straighten the text for using with Tesseract OCR conversion. I don't see the need to rotate the text blocks as I have seen in similar examples.
In the docs Canny Edge Detection the third arg 200 eg edges = cv.Canny(img,100,200) is maxVal and said to be 'sure to be edges'. Is there anyway to determine these (max/min) values ahead of any trial & error approach?
I have used code examples which utilize the Python cv2 module. But the edge detection is set up for simpler applications.
Is there any approach I can use to take the text out of the equation. For example: only detecting edge lines greater than a specified length?
Any suggestions would be appreciated.
Below is an example of edge detection (above image same min/max values) The outer edge of the page is clearly defined. The image is high contrast b/w. It has even lighting. I can't see a need for the use of an adaptive threshold. Simple global is working. Its just at what ratio to use it.
I don't have the answer to this yet. But to add. I now have the contours of the above doc.
I used find contours tutorial with some customization of the file loading. Note: removing words gives a thinner/cleaner outline.

Consider Otsu.
Its chief virtue is that it is adaptive to local
illumination within the image.
In your case, blank margins might be the saving grace.
Consider working on a series of 2x reduced resolution images,
where new pixel is min() (or even max()!) of original four pixels.
These reduced images might help you to focus on the features
that matter for your use case.
The usual way to deskew scanned text is to binarize and
then keep changing theta until "sum of pixels across raster"
is zero, or small. In particular, with few descenders
and decent inter-line spacing, we will see "lots" of pixels
on each line of text and "near zero" between text lines,
when theta matches the original printing orientation.
Which lets us recover (1.) pixels per line, and (2.) inter-line spacing, assuming we've found a near-optimal theta.
In your particular case, focusing on the ... leader dots
seems a promising approach to finding the globally optimal
deskew correction angle. Discarding large rectangles of
pixels in the left and right regions of the image could
actually reduce noise and enhance the accuracy of
such an approach.

Identifying positive pixels after color deconvolution ignoring boundaries

I am analyzing histology tissue images stained with a specific protein marker which I would like to identify the positive pixels for that marker. My problem is that thresholding on the image gives too much false positives which I'd like to exclude.
I am using color deconvolution (separate_stains from skimage.color) to get the AEC channel (corresponding to the red marker), separating it from the background (Hematoxylin blue color) and applying cv2 Otsu thresholding to identify the positive pixels using cv2.threshold(blur,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU), but it is also picking up the tissue boundaries (see white lines in the example picture, sometimes it even has random colors other than white) and sometimes even non positive cells (blue regions in the example picture). It's also missing some faint positive pixels which I'd like to capture.
Overall: (1) how do I filter the false positive tissue boundaries and blue pixels? and (2) how do I adjust the Otsu thresholding to capture the faint red positives?
Adding a revised example image -
top left the original image after using HistoQC to identify tissue regions and apply the mask it identified on the tissue such that all of the non-tissue regions are black. I should tru to adjust its parameters to exclude the folded tissue regions which appear more dark (towards the bottom left of this image). Suggestions for other tools to identify tissue regions are welcome.
top right hematoxylin after the deconvolution
bottom left AEC after the deconvolution
bottom right Otsu thresholding applied not the original RGB image trying to capture only the AEC positives pixels but showing also false positives and false negatives
Thanks

#cris-luengo thank you for your input on scikit-image! I am one of the core developers, and based on #assafb input, we are trying to rewrite the code on color/colorconv/separate_stains.
#Assafb: The negative log10 transformation is the Beer-Lambert mapping. What I don't understand in that code is the line rgb += 2. I don't know where that comes from or why they use it. I'm 100% sure it is wrong. I guess they're trying to avoid log10(0), but that should be done differently. I bet this is where your negative values come from, though.
Yes, apparently (I am not the original author of this code) we use rgb += 2 to avoid log10(0). I checked Fiji's Colour Deconvolution plugin, and they add 1 to their input. I tested several input numbers to help on that, and ~2 would let us closer to the desirable results.
#Assafb: Compare the implementation in skimage with what is described in the original paper. You'll see several errors in the implementation, most importantly the lack of a division by the max intensity. They should have used -np.log10(rgb/255) (assuming that 255 is the illumination intensity), rater than -np.log10(rgb).
Our input data is float; the max intensity in this case would be 1. I'd say that that's the reason we don't divide by something.
Besides that, I opened an issue on scikit-image to discuss these problems — and to specify a solution. I made some research already — I even checked DIPlib's documentation —, and implemented a different version of that specific function. However, stains are not my main area of expertise, and we would be glad if you could help evaluating that code — and maybe pointing a better solution.
Thank you again for your help!

There are several issues that cause improper quantification. I'll go over the details of how I would recommend you tackle these slides.
I'm using DIPlib, because I'm most familiar with it (I'm an author). It has Python bindings, which I use here, and can be installed with pip install diplib. However, none of this is complicated image processing, and you should be able to do similar processing with other libraries.
Loading image
There is nothing special here, except that the image has strong JPEG compression artifacts, which can interfere with the stain unmixing. We help the process a bit by smoothing the image with a small Gaussian filter.
import diplib as dip
import numpy as np
image = dip.ImageRead('example.png')
image = dip.Gauss(image, [1]) # because of the severe JPEG compression artifacts
Stain unmixing
[Personal note: I find it unfortunate that Ruifrok and Johnston, the authors of the paper presenting the stain unmixing method, called it "deconvolution", since that term already had an established meaning in image processing, especially in combination with microscopy. I always refer to this as "stain unmixing", never "deconvolution".]
This should always be the first step in any attempt at quantifying from a bightfield image. There are three important RGB triplets that you need to determine here: the RGB value of the background (which is the brightness of the light source), and the RGB value of each of the stains. The unmixing process has two components:
First we apply the Beer-Lambert mapping. This mapping is non-linear. It converts the transmitted light (as recorded by the microscope) into absorbance values. Absorbance indicates how strongly each point on the slide absorbs light of the various wavelengths. The stains absorb light, and differ by the relative absorbance in each of the R, G and B channels of the camera.
background_intensity = [209, 208, 215]
image = dip.BeerLambertMapping(image, background_intensity)
I manually determined the background intensity, but you can automate that process quite well if you have whole slide images: in whole slide images, the edges of the image always correspond to background, so you can look there for intensities.
The second step is the actual unmixing. The mixing of absorbances is a linear process, so the unmixing is solving of a set of linear equations at each pixel. For this we need to know the absorbance values for each of the stains in each of the channels. Using standard values (as in skimage.color.hax_from_rgb) might give a good first approximation, but rarely will provide the best quantification.
Stain colors change from assay to assay (for example, hematoxylin has a different color depending on who made it, what tissue is stained, etc.), and change also depending on the camera used to image the slide (each model has different RGB filters). The best way to determine these colors is to prepare a slide for each stain, using all the same protocol but not putting on the other dyes. From these slides you can easily obtain stain colors that are valid for your assay and your slide scanner. This is however rarely if ever done in practice.
A more practical solution involves estimating colors from the slide itself. By finding a spot on the slide where you see each of the stains individually (where stains are not mixed) one can manually determine fairly good values. It is possible to automatically determine appropriate values, but is much more complex and it'll be hard finding an existing implementation. There are a few papers out there that show how to do this with non-negative matrix factorization with a sparsity constraint, which IMO is the best approach we have.
hematoxylin_color = np.array([0.2712, 0.2448, 0.1674])
hematoxylin_color = (hematoxylin_color/np.linalg.norm(hematoxylin_color)).tolist()
aec_color = np.array([0.2129, 0.2806, 0.4348])
aec_color = (aec_color/np.linalg.norm(aec_color)).tolist()
stains = dip.UnmixStains(image, [hematoxylin_color, aec_color])
stains = dip.ClipLow(stains, 0) # set negative values to 0
hematoxylin = stains.TensorElement(0)
aec = stains.TensorElement(1)
Note how the linear unmixing can lead to negative values. This is a result of incorrect color vectors, noise, JPEG artifacts, and things on the slide that absorb light that are not the two stains we defined.
Identifying tissue area
You already have a good method for this, which is applied to the original RGB image. However, don't apply the mask to the original image before doing the unmixing above, keep the mask as a separate image. I wrote the next bit of code that finds tissue area based on the hematoxylin stain. It's not very good, and it's not hard to improve it, but I didn't want to waste too much time here.
tissue = dip.MedianFilter(hematoxylin, dip.Kernel(5))
tissue = dip.Dilation(tissue, [20])
tissue = dip.Closing(tissue, [50])
area = tissue > 0.2
Identifying tissue folds
You were asking about this step too. Tissue folds typically appear as larger darker regions in the image. It is not trivial to find an automatic method to identify them, because a lot of other things can create darker regions in the image too. Manual annotation is a good start, if you collect enough manually annotated examples you could train a Deep Learning model to help you out. I did this just as a place holder, again it's not very good, and identifies some positive regions as folds. Folds are subtracted from the tissue area mask.
folds = dip.Gauss(hematoxylin - aec, [20])
area -= folds > 0.2
Identifying positive pixels
It is important to use a fixed threshold for this. Only a pathologist can tell you what the threshold should be, they are the gold-standard for what constitutes positive and negative.
Note that the slides must all have been prepared following the same protocol. In clinical settings this is relatively easy because the assays used are standardized and validated, and produce a known, limited variation in staining. In an experimental setting, where assays are less strictly controlled, you might see more variation in staining quality. You will even see variation in staining color, unfortunately. You can use automated thresholding methods to at least get some data out, but there will be biases that you cannot control. I don't think there is a way out: inconsistent stain in, inconsistent data out.
Using an image-content-based method such as Otsu causes the threshold to vary from sample to sample. For example, in samples with few positive pixels the threshold will be lower than other samples, yielding a relative overestimation of the percent positive.
positive = aec > 0.1 # pick a threshold according to pathologist's idea what is positive and what is not
pp = 100 * dip.Count(dip.And(positive, area)) / dip.Count(area)
print("Percent positive:", pp)
I get a 1.35% in this sample. Note that the % positive pixels is not necessarily related to the % positive cells, and should not be used as a substitute.

I ended up incorporating some of the feedback given above by Chris into the following possible unconventional solution for which I would appreciate getting feedback (to the specific questions below but also general suggestions for improvement or more effective/accurate tools or strategy):
Define (but not apply yet) tissue mask (HistoQC) after optimizing HistoQC script to remove as much of the tissue folds as possible without removing normal tissue area
Apply deconvolution on the original RGB image using hax_from_rgb
Using the second channel which should correspond to the red stain pixels, and subtract from it the third channel which as far as I see corresponds to the background non-red/blue pixels of the image. This step removes the high values in the second channel that which up because of tissue folds or other artifacts that weren't removed in the first step (what does the third channel correspond to? The Green element of RGB?)
Blur the adjusted image and threshold based on the median of the image plus 20 (Semi-arbitrary but it works. Are there better alternatives? Otsu doesn't work here at all)
Apply the tissue regions mask on the thresholded image yielding only positive red/red-ish pixels without the non-tissue areas
Count the % of positive pixels relative to the tissue mask area
I have been trying to apply, as suggested above, the tissue mask on the deconvolution red channel output and then use Otsu thresholding. But it failed since the black background generated by the applying the tissue regions mask makes the Otsu threshold detect the entire tissue as positive. So I have proceeded instead to apply the threshold on the adjusted red channel and then apply the tissue mask before counting positive pixels. I am interested in learning what am I doing wrong here.
Other than that, the LoG transformation didn't seem to work well because it produced a lot of stretched bright segments rather than just circular blobs where cells are located. I'm not sure why this is happening.

Use ML for this case.
Create manually binary mask for your pictures: each red pixel - white, background pixels - black.
Work in HSV or Lab color space.
Train simple classifier: decision tree or SVM (linear or with RBF)..
Let's test!
See on a good and very simple example with skin color segmentation.
And in the future you can add new examples and new cases without code refactoring: just update dataset and retrain model.

How does the Image Registration/alignment and transformation work on the pixel level?

I know the basic flow or process of the Image Registration/Alignment but what happens at the pixel level when 2 images are registered/aligned i.e. similar pixels of moving image which is transformed to the fixed image are kept intact but what happens to the pixels which are not matched, are they averaged or something else?
And how the correct transformation technique is estimated i.e. how will I know that whether to apply translation, scaling, rotation, etc and how much(i.e. what value of degrees for rotation, values for translation, etc.) to apply?
Also, in the initial step how the similar pixel values are identified and matched?
I've implemented the python code given in https://simpleitk.readthedocs.io/en/master/Examples/ImageRegistrationMethod1/Documentation.html
Input images are of prostate MRI scans:
Fixed Image Moving Image Output Image Console output
The difference can be seen in the output image on the top right and top left. But I can't interpret the console output and how the things actually work internally.
It'll be very helpful if I get a deep explanation of this thing. Thank you.

A transformation is applied to all pixels. You might be confusing rigid transformations, which will only translate, rotate and scale your moving image to match the fixed image, with elastic transformations, which will also allow some morphing of the moving image.
Any pixel that a transformation cannot place in the fixed image is interpolated from the pixels that it is able to place, though a registration is not really intelligent.
What it attempts to do is simply reduce a cost function, where a high cost is associated with a large difference and a low cost is associated with a small difference. Cost functions can be intensity based (pixel values) or feature based (shapes). It will (semi-)randomly shift the image around untill a preset criteria is met, generally a maximum amount of iterations.
What that might look like can be seen in the following gif:
http://insightsoftwareconsortium.github.io/SimpleITK-Notebooks/registration_visualization.gif

OpenCV find subjective contours like the human eye does

When humans see markers suggesting the form of a shape, they immediately perceive the shape itself, as in https://en.wikipedia.org/wiki/Illusory_contours. I'm trying to accomplish something similar in OpenCV in order to detect the shape of a hand in a depth image with very heavy noise. In this question, assume that skin color based detection is not working (actually it is the best I've achieved so far but it is not robust under changing light conditions, shadows or skin colors. Also various paper shapes (flat and colorful) are on the table, confusing color-based approaches. This is why I'm attempting to use the depth cam instead).
Here's a sample image of the live footage that is already pre-processed for better contrast and with background gradient removed:
I want to isolate the exact shape of the hand from the rest of the picture. For a human eye this is a trivial thing to do. So here are a few attempts I did:
Here's the result with canny edge detection applied. The problem here is that the black shape inside the hand is larger than the actual hand, causing the detected hand to overshoot in size. Also, the lines are not connected and I fail at detecting contours.
Update: Combining Canny and a morphological closing (4x4 px ellipse) makes contour detection possible with the following result. It is still waaay too noisy.
Update 2: The result can be slightly enhanced by drawing that contour to an empty mask, save that in a buffer and re-detect yet another contour on a merge of three buffered images. The line that combines the buffered images is is hand_img = np.array(np.minimum(255, np.multiply.reduce(self.buf)), np.uint8) which is then morphed once again (closing) and finally contour detected. The results are slightly less horrible than in the picture above but laggy instead.
Alternatively I tried to use an existing CNN (https://github.com/victordibia/handtracking) for detecting the approximate position of the hand's center (this step works) and then flood from there. In order to detect contours the result is put into an OTSU filter and then the largest contour is taken, resulting in the following picture (ignore black rectangles in the left). The problem is that some of the noise is flooded as well and the results are mediocre:
Finally, I tried background removers such as MOG2 or GMG. They are confused by the enormous amount of fast-moving noise. Also they cut off the fingertips (which are crucial for this project). Finally, they don't see enough details in the hand (8 bit plus further color reduction via equalizeHist yield a very poor grayscale resolution) to reliably detect small movements.
It's ridiculous how simple it is for a human to see the exact precise shape of the hand in the first picture and how incredibly hard it is for the computer to draw a shape.
What would be your recommended method to achieve an exact hand segmentation?

After two days of desperate testing, the solution was to VERY carefully apply thresholding to an well-preprocessed image.
Here are the steps:
Remove as much noise as you possibly can. In my case, denoising was done using Intel's pyrealsense2 (I'm using an Intel RealSense depth camera and the algorithms were written for that camera family, thus they work very well). I used rs.temporal_filter() and directly after rs.hole_filling_filter() on every frame.
Capture the very first frame. Besides capturing the exact distance to the table (for later thresholding), this step also saves a still picture that is blurred by a 100x100 px kernel. Since the camera is never mounted perfectly but slightly tilted, there's an ugly grayscale gradient going over the picture and making operations impossible. This still picture is then subtracted from every single later frame, eliminating the gradient. BTW: this gradient removal step is already incorporated in the screenshots shown in the question above
Now the picture is almost noise-free. Do not use equalizeHist. This does not simply increase the general contrast regularly but instead empathizes the remaining noise way too much. This was my main error I did in almost all experiments. Instead, apply a threshold (binary with fixed border) directly. The border is extremely thin, setting it at 104 instead of 205 makes a huge difference.
Invert colors (unless you have taken BINARY_INV in the previous step), apply contours, take the largest one and write it to a mask
Voilà!

Automatically find optimal image threshold value from density of histogram plot

I'm looking to perform optical character recognition (OCR) on a display, and want the program to work under different light conditions. To do this, I need to process and threshold the image such that there is no noise surrounding each digit, allowing me to detect the contour of the digit and perform OCR from there. I need the threshold value I use to be adaptable to these different light conditions. I've tried adaptive thresholding, but I haven't been able to get it to work.
My image processing is simple: load the image (i), grayscale i (g), apply a histogram equalization to g (h), and apply a binary threshold to h with a threshold value = t. I've worked with a couple of different datasets, and found that the optimal threshold value to make the OCR work consistently lies within the range of highest density in a histogram plot of (h) (the only part of the plot without gaps).
A histogram of (h). The values t=[190,220] are optimal for OCR. A more complete set of images describing my problem is available here: http://imgur.com/a/wRgi7
My current solution, which works but is clunky and slow, checks for:
1. There must be 3 digits
2. The first digit must be reasonably small in size
3. There must be at least one contour recognized as a digit
4. The digit must be recognized in the digit dictionary
Barring all cases being accepted, the threshold is increased by 10 (beginning at a low value) and an attempt is made again.
The fact that I can recognize the optimal threshold value on the histogram plot of (h) may just be confirmation bias, but I'd like to know if there's a way I can extract the value. This is different from how I've worked with histograms before, which has been more on finding peaks/valleys.
I'm using cv2 for image processing and matplotlib.pyplot for the histogram plots.

Check this: link it really not depend on density, it works because you did separation of 2 maximums. Local maximums are main classes foreground - left local maximum (text pixels), and background right local maximum (white paper). Optimal threshold should optimally separate these maximums. And the optimal threshold value lies in local minimum region between two local maximums.

At first, I thought "well, just make a histogram of the indexes in which data appears" which would totally work, but I don't think that will actually solve your underlying work you want to do.
I think you're misinterpreting histogram equalization. What histogram equalization does is thins out the histogram in highly concentrated areas so that if you take different bin sizes with the histogram, you'll get more or less equal quantity inside the bins. The only reason those values are dense is specifically because they appear less in the image. Histogram equalization makes other, more popular values, appear less. And the reason that range works out well is, as you see in the original grayscale histogram, values between 190 and 220 are really close to where the image begins to get bright again; i.e., where there is a clear demarkation of bright values.
You can see the way equalizeHist works directly by plotting histograms with different bin sizes. For example, here's looping over bin sizes from 3 to 20:
Edit: So just to be clear, what you want is this demarked area between the lower bump and the higher bump in your original histogram. You don't need to use equalized histograms for this. In fact, this is what Otsu thresholding (following Otsu's method) actually does: you assume the data follows a bimodal distribution, and find the point which clearly marks the point between the two distributions.

Basically, what you're asking is to find the indexes of the longest sequence of non-zero element in a 256 x 1 array.
Based on this answer, you should get what you want like this :
import cv2
import numpy as np
# load in grayscale
img = cv2.imread("image.png",0)
hist = cv2.calcHist([img],[0],None,[256],[0,256])
non_zero_sequences = np.where(np.diff(np.hstack(([False],hist!=0,[False]))))[0].reshape(-1,2)
longest_sequence_id = np.diff(non_zero_sequences,axis=1).argmax()
longest_sequence_start = non_zero_sequences[longest_sequence_id,0]
longest_sequence_stop = non_zero_sequences[longest_sequence_id,1]
Note that it is untested.

I would also recommend to use an automatic thresholding method like the Otsu's method (here a nice explanation of the method).
In Python OpenCV, you have this tutorial that explains how to do Otsu's binarization.
If you want to experiment other automatic thresholding methods, you can look at the ImageJ / Fiji software. For instance, this page summarizes all the methods implemented.
Grayscale image:
Results:
If you want to reimplement the methods, you can check the source code of the Auto_Threshold plugin. I used Fiji for this demo.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.