I'm trying to design and train a convolutional neural network to identify circular cells in an image. I am training it on "cutouts" of the full images, which either have a circle in the middle of the image (positive training sample) or don't (negative training sample).
Example of an image with a circle in the middle (the heatmap colors are wonky, the images are all grayscale): http://imgur.com/a/6q8LZ
Rather than just classify the two types of input images (circle in the middle or not), I'd like the network output to be a bitmap that is either uniform (e.g. all -1) if there is no circle in the input image, or has a "blotch" (ideally a single point) in the middle of the image indicating the center of the circle. This would then be applied to a large image containing many such circular cells, and the output should be a bitmap with blotches where the cells are.
In order to train this, I'm using the mean squared error between the output image and a 2D Gaussian template (http://imgur.com/a/fvfP6) for positive training samples, and the MSE between the output image and a uniform matrix with value -1 for negative training samples. Ideally, this should cause the CNN to converge on an output that resembles the Gaussian peak in the middle for positive training samples, and an output that is uniformly -1 for negative training samples.
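For reference, a 2D Gaussian template like the one linked above can be generated along these lines (a rough sketch; the size and sigma values here are placeholders, not the actual ones used):

    import numpy as np

    def gaussian_template(size=32, sigma=3.0):
        # Grid of (x, y) offsets from the image centre.
        ax = np.arange(size) - (size - 1) / 2.0
        xx, yy = np.meshgrid(ax, ax)
        # Peak of 1 at the centre, falling off with distance.
        return np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))

    positive_label = gaussian_template()       # "blotch" in the middle
    negative_label = -np.ones((32, 32))        # uniform -1 template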
HOWEVER, the network keeps converging on a universal solution of "make everything zero". This does not minimize the MSE, so I don't think it's an inherent problem with the network structure (I've tried different structures, from a single-layer CNN with a filter as large as the input image to multilayer CNNs with filters of varying size, all with the same result).
The loss function I am using is as follows:
    # Per-sample sum of squared differences over the spatial dimensions
    weighted_score = tf.reduce_sum(tf.square(tf.sub(conv_squeeze, y)),
                                   reduction_indices=[1, 2])
with conv_squeeze being the output image of the network and y being the label (i.e. the gaussian template shown above). I've already tried averaging over the batch size as suggested here:
Using squared difference of two images as loss function in tensorflow
but without success. I cannot find any academic publications on how to train neural networks with template images as labels, so I would be grateful if anybody could point me in the right direction. Thank you so much!
According to your description, I think you are facing an "imbalanced data" problem. You can try a hinge loss instead of MSE; it may solve your problem.
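A per-pixel hinge loss could be sketched roughly like this (assuming the target bitmap y is encoded with -1/+1 values rather than the Gaussian template, and reusing the tensor names from the question):

    # Sketch of a per-pixel hinge loss: zero for pixels whose prediction is
    # already on the correct side of the margin, positive otherwise.
    hinge = tf.maximum(0.0, 1.0 - y * conv_squeeze)
    loss = tf.reduce_mean(hinge)   # average over pixels and the batch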
I'm implementing a VAE on a dataset of binary images (pixels are black or white), where every pixel in an image has a meaning (it belongs to a class).
Searching online, I found that the best implementation is to use a sigmoid as the last activation function and binary cross-entropy as the loss function; correct me if I'm wrong.
When I generate an image from the latent space, using either random coordinates or coordinates obtained by encoding an input image, I get blurry images, which is normal, but I want only 0 and 1 as values (because I want to know whether an element belongs to that class or not).
So my question is: are there standard procedures for getting only binary image outputs, or for training the model to produce them (maybe by changing the loss or something)? Or does the model have to be implemented this way, with applying a threshold (e.g. 0.5) to the output pixels as the only way to obtain a binary image?
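The thresholding option mentioned above would look roughly like this (a sketch, assuming the decoder output is a NumPy array called recon with values in [0, 1]):

    import numpy as np

    # recon: decoder output after the sigmoid, e.g. shape (H, W) or (H, W, 1)
    binary_image = (recon >= 0.5).astype(np.uint8)   # hard 0/1 values per pixel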
I have a U-Net segmentation network implemented in Keras that simply maps all pixels in an RGB image to 4 categories and is trained on a heat-map mask (Low, Low-Med, High-Med, High). Using categorical cross-entropy or a categorical Dice loss, I am able to get decent results.
However, the mask in its original form is a heat-map image with 256 intensity levels (8 bits). Reducing that resolution to 4 categories just to shoehorn the mask into the U-Net seems like a totally arbitrary introduction of error.
I would like the network to output an image in which each pixel has a value in (0, 1), and train the network with masks produced by multiplying the heat-map image by 1./255.
In this case, the loss function would incorporate the numerical difference between the mask and the network's prediction. Can anyone point me toward someone who has done something similar? I think I am just awful at describing what I'm looking for with the relevant terminology, because this seems like it would be a fairly common goal in computer vision.
If I understand your question correctly, the "ground truth" mask is just a gray-scale image with values in the range [0, 255], meaning there is a strong relation between its values (for example, 25 is closer to 26 than to 70; this is not the case with regular segmentation, where you assign a different class to each pixel and the class values may represent arbitrary objects such as "bicycle" or "person"). In other words, this is a regression problem, and to be more specific an image-to-image regression: you are trying to reconstruct a gray-scale image that should match the ground-truth mask pixel-wise.
If I understood you correctly, you should look for regression losses. Common examples are Mean Squared Error (aka MSE, L2 norm) and Mean Absolute Error (aka MAE, L1 norm). Those are the "usual suspects" and I suggest you start with them, although many other losses exist.
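A minimal sketch of that setup in Keras (assuming an existing unet_model whose final layer is a single-channel sigmoid so predictions lie in [0, 1]; the optimizer, batch size and epoch count are placeholders):

    # Train the existing U-Net as an image-to-image regressor.
    # unet_model is assumed to already end in a 1-channel sigmoid layer.
    unet_model.compile(optimizer='adam', loss='mse')            # or loss='mae'
    unet_model.fit(images, masks / 255.0, batch_size=8, epochs=50)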
How can I give superpixels as input to a CNN? I used the SLIC algorithm to segment my images into superpixels.
How can I use them for classification with a CNN?
I will try to help you. A CNN (convolutional neural network) expects inputs of one fixed shape, whereas each superpixel is an irregular region of varying size. So you need to extract each superpixel and make it its own image. In other words, if you segment your image into 300 superpixels, you then need to create 300 new images, one for each superpixel.
After this, it is clear that each new image will probably have a different size. You can't work like that, because the number of input neurons of the CNN can't change. To deal with this, you can center each "new image" on an N x N background ('N' must be large enough to cover all the new images). With the superpixels centered this way, each pixel of the N x N image becomes an input to your CNN (a rough code sketch is given after the illustration link below). In other words:
1) Each centered superpixel is fed to the network one at a time;
2) The number of inputs to the CNN will be X*Y, where X is shape[0] and Y is shape[1] of the centered superpixel image;
3) With 300 centered superpixels, your CNN must compute an output for each one.
Illustration: https://imgur.com/k8pRDw7
Look at the illustration, and good luck! :)
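A rough sketch of the centering step described above, using scikit-image's SLIC for the segmentation (N, the variable names, and the RGB input are assumptions):

    import numpy as np
    from skimage.segmentation import slic

    # image: an RGB image as an (H, W, 3) array
    segments = slic(image, n_segments=300)        # superpixel label map, shape (H, W)

    def centered_patch(image, segments, label, N=64):
        # Cut out the bounding box of one superpixel (everything outside the
        # superpixel set to zero) and paste it into the middle of an N x N
        # black background. N must be large enough to contain every superpixel.
        mask = segments == label
        ys, xs = np.where(mask)
        cut = (image * mask[..., None])[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        canvas = np.zeros((N, N, 3), dtype=image.dtype)
        y0 = (N - cut.shape[0]) // 2
        x0 = (N - cut.shape[1]) // 2
        canvas[y0:y0 + cut.shape[0], x0:x0 + cut.shape[1]] = cut
        return canvas

    patches = [centered_patch(image, segments, lab) for lab in np.unique(segments)]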
I am extracting a road network from satellite imagery. Here the pixel classification is binary (0 = non-road, 1 = road). Hence, the mask of the complete satellite image, which is 6400 x 6400 pixels, shows one large road network in which each road is connected to another road. For the implementation of the U-Net I divided that large image into 625 images of 256 x 256 pixels.
My question is: can a neural network find structure more easily with an increased batch size (i.e. can it find structure across the different patches in a batch), or can it only find structure if the input image size is enlarged?
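For reference, the tiling described above (6400 / 256 = 25, so 25 x 25 = 625 non-overlapping patches) can be done with a simple reshape; a sketch, assuming the image and its mask are NumPy arrays:

    import numpy as np

    # large_image: (6400, 6400) array, split into 625 tiles of 256 x 256
    tile = 256
    n = large_image.shape[0] // tile                 # 25 tiles per side
    patches = (large_image
               .reshape(n, tile, n, tile)
               .transpose(0, 2, 1, 3)
               .reshape(n * n, tile, tile))          # shape (625, 256, 256)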
If your model is a regular convolutional network (without any weird hacks), the samples in a batch will not be connected to each other.
Depending on which loss function you use, the batch size might be important too. The regular losses ('mse', 'binary_crossentropy', 'categorical_crossentropy', etc.) all keep the samples independent from each other, but some losses might consider the entire batch (an F1-based loss, for instance). If you're using a loss function that doesn't treat samples independently, then the batch size is very important.
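To illustrate the independence point: for a loss like MSE, the batch loss is just the mean of the per-sample losses, so no sample influences another sample's error term. A small NumPy sketch with made-up shapes:

    import numpy as np

    y_true = np.random.rand(8, 256, 256)      # a batch of 8 masks
    y_pred = np.random.rand(8, 256, 256)

    per_sample = ((y_true - y_pred) ** 2).mean(axis=(1, 2))   # one MSE per sample
    batch_loss = per_sample.mean()    # batch loss = mean of independent per-sample losses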
That said, having a bigger batch size may help the net to find its way more easily, since one image might push weights towards one direction, while another may want a different direction. The mean results of all images in the batch should then be more representative of a general weight update.
Now, entering experimental territory (we never know everything about neural networks until we test them), consider this comparison:
a batch with 1 huge image, versus
a batch of patches cut from the same image.
Both will contain the same amount of data, and for a convolutional network it wouldn't make a drastic difference. But in the first case the net will probably be better at finding connections between roads, and may find more segments where the road is covered by something, while the small patches, being full of borders, may look more at textures and be worse at identifying these gaps.
All of this is, of course, a guess. Testing is the best.
My net on my GPU cannot really use big patches, which is bad for me...
I have a project that uses a deep CNN to classify parking lots. My idea is to classify, for every space, whether there is a car in it or not. My question is: how do I prepare my image dataset to train my model?
I have downloaded the PKLot dataset for training, which includes negative and positive images.
Should I convert all my training images to grayscale? Should I resize all my training images to one fixed size? (But if I resize my training images to one fixed size, I have both landscape and portrait images.) Thanks :)
This is an extremely vague question, since every image-processing algorithm takes a different approach to extracting features. However, in your parking-lot example you would probably need to do RGB-to-greyscale conversion and size normalization, among other image-processing techniques.
A great starting point would be in this link: http://www.scipy-lectures.org/advanced/image_processing/
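A minimal sketch of those two preprocessing steps with OpenCV (the file name and the 64 x 64 target size are arbitrary assumptions):

    import cv2

    img = cv2.imread('parking_space.jpg')           # loaded as a BGR colour image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # colour -> greyscale
    fixed = cv2.resize(gray, (64, 64))              # one fixed input size for the CNN
    fixed = fixed / 255.0                           # scale pixel values to [0, 1]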
First detect the cars present in the image, and obtain their size and alignment. Then go for segmentation and labeling of the parking lot by fixing a suitable size and alignment.
Since you want to use the PKLot dataset to train your model and test with real data, the best approach is to make both datasets similar and homogeneous: they should be normalized, fixed-size, gray-scaled and of parameterized shapes. Then you can use the Scale-Invariant Feature Transform (SIFT) for image feature extraction as a basic method. The exact definition of a feature often depends on the problem or the type of application. Since features are used as the starting point and main primitives for subsequent algorithms, the overall algorithm will often only be as good as its feature detector. You can use these types of image features, depending on your problem (a short code sketch follows the list):
Corners / interest points
Edges
Blobs / regions of interest points
Ridges
...
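A minimal SIFT sketch with OpenCV (requires OpenCV >= 4.4 or the contrib package; gray is assumed to be a greyscale image loaded as in the preprocessing sketch above):

    import cv2

    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)   # one 128-dim descriptor per keypoint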