I'm trying to understand how I can show image segmentation results to users.
I mean that if I have this image:
I want to show to the user this result:
These images are from this GitHub repository. I have checked their code, but I haven't found where they show their results to the users.
How can I show semantic segmentation to the user?
What is the output of a semantic segmentation network?
UNet (the one in the example), like essentially every other network that deals with semantic segmentation, produces as output an image whose size is proportional to that of the input and in which each pixel is classified as one of the specified classes.
For binary classification, the raw output is typically a single-channel float image with values in [0, 1] that must be thresholded at 0.5 to obtain the binary "foreground" mask. It's also possible that the network is trained with two classes (foreground/background); in that case, read on for how to handle multi-class output.
For multi-class classification, the raw output has N channels, one per class, where the value at index [x, y, c] is the score for pixel (x, y) and class c (think of it as the probability that pixel (x, y) belongs to class c, although in principle scores don't have to be probabilities). For each pixel, the selected class is the one whose channel has the highest score.
The output can then be postprocessed (e.g., flattened to a single channel by assigning to each pixel the label of the "winning" class), which seems to be what the example you link does: if you take a look at the implementation of labelVisualize(), they use a dict mapping class codes to colors.
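As a minimal sketch in plain NumPy (the arrays stand in for your network's output and the color palette is purely illustrative):

import numpy as np

# Binary case: single-channel float output in [0, 1] -> threshold at 0.5
binary_out = np.random.rand(256, 256)                         # stand-in for the network output
foreground_mask = (binary_out > 0.5).astype(np.uint8) * 255   # 0 = background, 255 = foreground

# Multi-class case: (H, W, C) scores -> per-pixel argmax over the class channels
multi_out = np.random.rand(256, 256, 3)                       # stand-in for a 3-class output
label_map = np.argmax(multi_out, axis=-1)                     # (H, W) array of class indices

# Visualization: map each class index to an RGB color
class_colors = {0: (0, 0, 0), 1: (0, 255, 0), 2: (255, 0, 0)}
vis = np.zeros((*label_map.shape, 3), dtype=np.uint8)
for class_idx, color in class_colors.items():
    vis[label_map == class_idx] = color

# foreground_mask or vis can now be saved, or overlaid on the input image and shown to the user.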
Related
Let's assume I have a small dataset and I want to implement data augmentation. First I perform image segmentation (after this, the image will be a binary image) and then apply data augmentation. Is this a good way?
For image augmentation in segmentation and instance segmentation, you have to either leave the positions of the objects in the image unchanged (for example, by only manipulating colors), or modify those positions consistently by applying translations and rotations.
So yes, this way works, but you have to take into consideration the type of data you have and what you are looking to achieve. Data augmentation isn't a ready-to-go process that gives good results everywhere.
In case you have a:
Semantic segmentation: each pixel of your image, at row i and column j, is labeled with the class of its enclosing object. This means having your main image I and a label image L of the same size, linking every pixel to its object label. In this case, the data augmentation is applied to both I and L, producing the pair of transformed images (see the sketch after this list).
Instance segmentation: here a mask is generated for every instance in the original image, and the augmentation is applied to all of them, including the original; from these transformed masks we get the new instances.
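A minimal sketch of the semantic segmentation case in plain NumPy (the arrays are dummies; the point is that the same random geometric transform is applied to the image I and the label image L, while color changes would be applied to I only):

import numpy as np

def augment_pair(image, label, rng=np.random):
    # Apply the same random flip/rotation to an image and its label mask.
    if rng.rand() < 0.5:                        # random horizontal flip
        image, label = np.fliplr(image), np.fliplr(label)
    k = rng.randint(0, 4)                       # random rotation by 0/90/180/270 degrees
    image, label = np.rot90(image, k), np.rot90(label, k)
    return image, label

# Example: a dummy RGB image and its single-channel label map
I = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
L = np.random.randint(0, 2, (128, 128), dtype=np.uint8)
I_aug, L_aug = augment_pair(I, L)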
EDIT:
Take a look at CLoDSA (Classification, Localization, Detection and Segmentation Augmentor) it may help you implement your idea.
In case your dataset is small, you should add data augmentation during training. It is important to transform the original images and the targets (masks) in the same way!
For example, if an image is rotated by 90 degrees, then its mask should also be rotated by 90 degrees. Since you are using the Keras library, you should check whether the ImageDataGenerator also transforms the target images (masks) along with the inputs. If it doesn't, you can implement the augmentations yourself. This repository shows how it is done in OpenCV:
https://github.com/kochlisGit/random-data-augmentations
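A common Keras pattern for keeping images and masks in sync (a general ImageDataGenerator sketch, not code from that repository; the arrays are dummies) is to build two generators with identical parameters and the same seed:

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Dummy data: 100 grayscale images and their binary masks, shape (N, H, W, 1)
images = np.random.rand(100, 128, 128, 1)
masks = np.random.randint(0, 2, (100, 128, 128, 1)).astype("float32")

aug_args = dict(rotation_range=90, horizontal_flip=True, vertical_flip=True)
image_datagen = ImageDataGenerator(**aug_args)
mask_datagen = ImageDataGenerator(**aug_args)

seed = 1  # identical seed -> identical random transforms for image and mask
image_gen = image_datagen.flow(images, batch_size=8, seed=seed)
mask_gen = mask_datagen.flow(masks, batch_size=8, seed=seed)

# Note: rotations interpolate, so you may want to re-threshold the masks afterwards.
train_gen = zip(image_gen, mask_gen)   # yields (image_batch, mask_batch) pairs
# model.fit(train_gen, steps_per_epoch=len(images) // 8, epochs=10)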
I am trying to classify images as "good" or "bad". In a given region of interest, if the region is painted well, the image is good; otherwise it is bad. I segmented the painted parts using K-means clustering and then counted the pixels of the painted parts. How can I set a threshold value on the counted pixel numbers to classify images as good or bad? Or is there a better approach I could try? I tried training a simple CNN, but the dataset has a big class imbalance (as far as I observed) and I don't have labels for the images.
There is no "right" answer to your question, you are the only who could know what constitutes an acceptable paint-job. My suggestion would be to create a script which processes a big number of images you consider to be "good", append all your pixel counts to a list and then extract some statistics from that list. See what the min, max, mean values of that list are and decide accordingly what your thershold value would be. Then make the same thing for images you consider to be "bad" and see if the threshold value is always biggest than your max "bad" value. Of course the more data you have, the more reliable your result will be.
I'm creating an encoder-decoder CNN for some images. Each image has a geometric shape around the center - circle, ellipse, etc.
I want my CNN to ignore all the values that fall inside this shape. All my input values have been normalized to the range 0-1, and I set all the values inside the shape to 0.
I thought that setting them to zero would mean they are not updated; however, the output of my encoder-decoder CNN changes the values inside the shape.
What can I do to ensure these values stay put and do not update?
Thank you!
I think you are looking for "partial convolution". This is a work published by Guilin Liu and colleagues that extends convolution to take an input mask as well as an input feature map and apply the convolution only to the unmasked pixels. They also suggest how to compensate for pixels on the boundary of the mask, where the kernel "sees" both valid and masked-out pixels.
Please note that their implementation may have issues running with automatic mixed precision (AMP).
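For reference, here is a minimal PyTorch sketch of the idea (my own simplification, not the authors' official implementation; it only shows the mask-aware convolution, the boundary renormalization, and the mask update):

import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    # Convolution that only uses "valid" pixels (mask == 1) and renormalizes
    # by the number of valid pixels under each kernel window.
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=True)
        # Fixed all-ones kernel used only to count valid pixels per window.
        self.register_buffer("ones", torch.ones(1, 1, kernel_size, kernel_size))
        self.window = float(kernel_size * kernel_size)
        self.padding = padding

    def forward(self, x, mask):
        # x: (N, C, H, W); mask: (N, 1, H, W) with 1 = valid, 0 = ignored (your shape)
        with torch.no_grad():
            valid = F.conv2d(mask, self.ones, padding=self.padding)  # valid pixels per window
        out = self.conv(x * mask)                                    # zero out ignored pixels
        bias = self.conv.bias.view(1, -1, 1, 1)
        out = (out - bias) * (self.window / valid.clamp(min=1.0)) + bias
        new_mask = (valid > 0).float()       # a window with any valid pixel stays valid
        return out * new_mask, new_mask

# Usage: both the feature map and the mask are passed through each layer
x = torch.rand(1, 3, 64, 64)
mask = torch.ones(1, 1, 64, 64)
mask[:, :, 20:40, 20:40] = 0                 # the region to ignore (your shape)
layer = PartialConv2d(3, 8)
y, new_mask = layer(x, mask)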
I am trying to detect plants in photos. I've already labeled the photos that contain plants (with labelImg), but I don't understand how to train the model with background-only photos, so that when there is no plant in an image the model can tell me so.
Do I need to set a labeled box the size of the whole image?
P.S. I'm new to ML, so please don't be rude :)
I recently had a problem where all my training images were zoomed in on the object. This meant that the training images all had very little background information. Since object detection models use space outside bounding boxes as negative examples of these objects, this meant that the model had no background knowledge. So the model knew what objects were, but didn't know what they were not.
So I disagree with #Rika, since background images are sometimes useful. In my case, introducing background images helped.
As I already said, object detection models use non-labeled space in an image as negative examples of a certain object. So you have to save annotation files without bounding boxes for the background images. In the software you use (labelImg), you can use "Verify Image" to make it save the annotation file for the image without any boxes. It then saves a file saying the image should be included in training but containing no bounding box information, and the model uses it as negative examples.
In your case, you don't need to do anything in that regard. Just grab the detection data that you created and train your network on it. When it comes to testing, you usually set a confidence threshold for the bounding boxes, because you may get lots of them and only want the ones with the highest confidence.
Then you get/show the ones with the highest confidence, and there you go: you have your detection result and can do whatever you want with it, like cropping the detections using the bounding box coordinates you get.
If there are no plants, your network will most likely produce bounding boxes with confidence below your threshold, and you simply ignore them.
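As a trivial illustration of that thresholding step (the detection format here is made up; adapt it to whatever your framework actually returns):

# Hypothetical detections: (x_min, y_min, x_max, y_max, confidence)
detections = [
    (34, 50, 120, 180, 0.92),
    (200, 40, 260, 150, 0.11),   # low-confidence box, probably not a plant
]

CONF_THRESHOLD = 0.5
plants = [d for d in detections if d[4] >= CONF_THRESHOLD]

if not plants:
    print("No plant detected in this image")
else:
    for x_min, y_min, x_max, y_max, conf in plants:
        print(f"Plant at ({x_min}, {y_min}, {x_max}, {y_max}) with confidence {conf:.2f}")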
I am in the process of putting together an OpenCV script to analyze immunohistochemically stained heart tissue. Our staining procedure renders cell types expressing certain proteins in their plasma membranes with pigments visible under a light microscope, which we use to photograph the samples.
So far, I've succeeded in segmenting the images into different layers based on color range, using a modified version of the frequently cited color segmentation script available through the OpenCV community (http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_colorspaces/py_colorspaces.html).
A screen shot of the original image:
B-Cell layer displayed:
At this point, I would like to calculate the ratio of the area of B-cells to unstained tissue. This required extracting the background cell layer in the same way, based on color range:
Obviously, these results leave much to be desired.
Does anyone have ideas on how to approach this problem? Again, I would like to segment the background (transparent) tissue layer, which is unfortunately fairly sponge-like in texture. My goal is to create a mask representative of the area of unstained tissue. It seems a blur technique is necessary to fill the gaps in the tissue, but the loss in accuracy this approach entails is obvious.
In the sample image, the channels look highly correlated. If you apply decorrelation stretching to the image, you should be able to see more detail. In my blog post here I've implemented decorrelation stretching in C++ (unfortunately not Python).
Using the sample code in the blog I did the following to segment the cell region:
dstretch the CIE Lab image with the following targetMean and targetSigma:
float mu[3] = {128.0f, 128.0f, 128.0f};  // target mean per channel
float sd[3] = {128.0f, 5.0f, 5.0f};      // target standard deviation per channel
Mat mean = Mat(3, 1, CV_32F, mu);
Mat sigma = Mat(3, 1, CV_32F, sd);
Convert the dstretched CIE Lab image back to BGR.
Erode this BGR image with a 3x3 rectangular structuring element once.
Apply kmeans clustering to this eroded image with k = 2.
I don't know how good this segmentation is. I think it is possible to get a better segmentation by trying different values for the above parameters (mean, sigma, structuring element size and number of times the image is eroded).
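For a rough Python/OpenCV sketch of the same pipeline (my own re-implementation of a generic decorrelation stretch, not the exact code from the blog post, so results may differ; the file names are placeholders):

import cv2
import numpy as np

def dstretch(img, target_mean, target_sigma):
    # Generic decorrelation stretch: decorrelate the channels via the eigenvectors
    # of their covariance matrix, rescale to the target sigmas, shift to the target mean.
    h, w, c = img.shape
    x = img.reshape(-1, c).astype(np.float64)
    mean = x.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))
    t = eigvecs @ np.diag(target_sigma / np.sqrt(eigvals + 1e-8)) @ eigvecs.T
    stretched = (x - mean) @ t + target_mean
    return np.clip(stretched, 0, 255).reshape(h, w, c).astype(np.uint8)

bgr = cv2.imread("tissue.png")                                       # placeholder file name
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2Lab)
lab_ds = dstretch(lab, np.array([128.0, 128.0, 128.0]), np.array([128.0, 5.0, 5.0]))
bgr_ds = cv2.cvtColor(lab_ds, cv2.COLOR_Lab2BGR)                     # step 2: back to BGR
eroded = cv2.erode(bgr_ds, np.ones((3, 3), np.uint8), iterations=1)  # step 3: 3x3 erosion

# Step 4: k-means with k = 2 on the pixel colors
samples = eroded.reshape(-1, 3).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, _ = cv2.kmeans(samples, 2, None, criteria, 3, cv2.KMEANS_RANDOM_CENTERS)
segmented = (labels.reshape(eroded.shape[:2]) * 255).astype(np.uint8)
cv2.imwrite("segmented.png", segmented)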
(Following images are not to the original scale)
Original:
dstretched CIE Lab converted back to BGR:
Eroded:
kmeans with k = 2: