How to give superpixels as input to a CNN? I used the SLIC algorithm to segment the images into superpixels.
How can I use these for classification with a CNN?
I will try to help you. A CNN (Convolutional Neural Network) takes one fixed-size image per input, while a superpixel is an irregularly shaped region of an image. So you need to cut out each superpixel and make it its own image. In other words, if you segment your image into 300 superpixels, you then need to create 300 new images, one for each superpixel.
After this, you will notice that the new images may have different sizes. You can't work like that, because the number of input neurons of the CNN can't change. To fix this, center each "new image" on an NxN background ('N' must be large enough to cover every new image). With the superpixels centered, each pixel of the NxN image is one input of your CNN. In other words:
1) Each centered superpixel is fed to the network one at a time;
2) The number of inputs of the CNN is X*Y, where X is shape[0] and Y is shape[1] of the centered superpixel image;
3) With 300 centered superpixels, your CNN must compute an output for each one.
Illustration: https://imgur.com/k8pRDw7
Look at the illustration and good luck! :)
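The steps above could be sketched roughly as follows. This is a minimal NumPy sketch, not a definitive implementation: the function name superpixels_to_patches, the black background, and the tiny synthetic label map are all assumptions made for illustration. In a real pipeline the label map would come from your SLIC step (for example skimage.segmentation.slic, if that is what you use).

```python
import numpy as np

def superpixels_to_patches(image, labels, size):
    """Cut each superpixel out of `image` and center it on a size x size canvas.

    image  : (H, W) or (H, W, C) array
    labels : (H, W) integer label map, e.g. the output of a SLIC implementation
    size   : side N of the square background; must cover the largest superpixel
    """
    patches = []
    for label in np.unique(labels):
        mask = labels == label
        rows, cols = np.nonzero(mask)
        r0, r1 = rows.min(), rows.max() + 1
        c0, c1 = cols.min(), cols.max() + 1
        h, w = r1 - r0, c1 - c0
        if h > size or w > size:
            raise ValueError("size is too small for superpixel %d" % label)

        # Crop the bounding box and blank out pixels of neighboring superpixels.
        crop = image[r0:r1, c0:c1].copy()
        crop[~mask[r0:r1, c0:c1]] = 0

        # Paste the crop in the middle of a zero (black) background.
        canvas = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
        top, left = (size - h) // 2, (size - w) // 2
        canvas[top:top + h, left:left + w] = crop
        patches.append(canvas)
    return np.stack(patches)

# Hypothetical 6x6 RGB image split into three "superpixels".
image = np.ones((6, 6, 3), dtype=np.float32)
labels = np.zeros((6, 6), dtype=int)
labels[:, 3:] = 1
labels[4:, :] = 2

batch = superpixels_to_patches(image, labels, size=8)  # one 8x8 image per superpixel
```

Each entry of batch has the same shape, so the whole array can be fed to a network with a fixed input size, one superpixel per sample.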
Related
I am working on a problem where I have to classify images into different groups. I am a beginner, working in Keras with a simple Sequential model. How should I handle images with different dimensions in the code below? For example, some images have dimensions 210*1583 while others have 210*603. Please advise.
model.add(Dense(100,input_dim = ?,activation= "sigmoid"))
model.add(Dense(100,input_dim = ?,activation= "sigmoid"))
For simple classification models in Keras the input must always be the same size. Since feedforward neural networks consist of an input layer, one or more hidden layers and one output layer, with all nodes connected, every input node must always receive a value. Moreover, the shape and other hyperparameters of the network are static, so you can't change the number of inputs, and therefore the size of each image, within one network.
The best practice for your case would be to either downsize all Images to the size of your smallest Image, or upsize all Images to the size of your largest Image.
Downsizing
With downsizing you actively delete pixels from your image, including the information they contain. This can hurt accuracy, but it also decreases computation time.
Upsizing
With upsizing you add pixels to your image without adding information. This increases computation time, but you keep the information of each image.
As a good start, I would suggest downsizing your images to the size of the smallest one. This is common practice in science as well [1]. One library that can do this is OpenCV; for implementation details, please refer to the many questions on Stack Overflow:
Python - resize image
Opencv-Python-Resizing image
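To make the downsizing suggestion concrete, here is a small sketch using plain NumPy. The helper resize_nearest and the two random images are assumptions for illustration only; in practice a library routine such as OpenCV's cv2.resize would normally do the resizing, with better interpolation than the nearest-neighbor sampling used here.

```python
import numpy as np

def resize_nearest(image, height, width):
    """Nearest-neighbor resize of an (H, W) or (H, W, C) array."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows][:, cols]

# Two hypothetical grayscale images with different widths.
images = [np.random.rand(210, 1583), np.random.rand(210, 603)]

# Downsize everything to the smallest height and width in the dataset.
target_h = min(img.shape[0] for img in images)
target_w = min(img.shape[1] for img in images)

# Flatten each resized image so it can feed a Dense layer.
X = np.stack([resize_nearest(img, target_h, target_w).ravel() for img in images])

input_dim = X.shape[1]  # the value to pass as input_dim to the first Dense layer
```

Once every image has the same size, input_dim is simply height times width (times the number of channels for color images).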
I am trying to train a model that classifies images.
The problem I have is that the images have different sizes. How should I format my images and/or my model architecture?
You didn't say what architecture you're talking about. Since you said you want to classify images, I'm assuming it's a partly convolutional, partly fully connected network like AlexNet, GoogLeNet, etc. In general, the answer to your question depends on the network type you are working with.
If, for example, your network only contains convolutional units - that is to say, does not contain fully connected layers - it can be invariant to the input image's size. Such a network could process the input images and in turn return another image ("convolutional all the way"); you would have to make sure that the output matches what you expect, since you have to determine the loss in some way, of course.
If you are using fully connected units, though, you're in for trouble: here you have a fixed number of learned weights your network has to work with, so varying inputs would require a varying number of weights, and that's not possible.
If that is your problem, here are some things you can do:
Don't care about squashing the images. A network might learn to make sense of the content anyway; do scale and perspective mean anything to the content anyway?
Center-crop the images to a specific size. If you fear you're losing data, do multiple crops and use these to augment your input data, so that the original image will be split into N different images of correct size.
Pad the images with a solid color to a squared size, then resize.
Do a combination of that.
The padding option might introduce an additional error source to the network's prediction, as the network might (read: likely will) be biased towards images that contain such a padded border.
If you need some ideas, have a look at the Images section of the TensorFlow documentation; there are pieces like resize_image_with_crop_or_pad that take away most of the work.
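For illustration, the center-crop and pad-to-square options can be sketched in plain NumPy as below. The helper names center_crop and pad_to_square are made up for this example, and the crop assumes the image is at least as large as the requested window.

```python
import numpy as np

def center_crop(image, height, width):
    """Cut a height x width window out of the middle of the image.

    Assumes the image is at least height x width.
    """
    top = (image.shape[0] - height) // 2
    left = (image.shape[1] - width) // 2
    return image[top:top + height, left:left + width]

def pad_to_square(image, fill=0):
    """Pad the shorter side with a solid color so the image becomes square."""
    h, w = image.shape[:2]
    side = max(h, w)
    canvas = np.full((side, side) + image.shape[2:], fill, dtype=image.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = image
    return canvas

# Hypothetical 120x200 RGB image.
image = np.ones((120, 200, 3), dtype=np.uint8)
cropped = center_crop(image, 100, 100)  # shape (100, 100, 3)
squared = pad_to_square(image)          # shape (200, 200, 3), ready to be resized
```

Taking several crops at different offsets instead of only the central one gives the augmentation variant mentioned above.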
As for just not caring about squashing, here's a piece of the preprocessing pipeline of the famous Inception network:
# This resizing operation may distort the images because the aspect
# ratio is not respected. We select a resize method in a round robin
# fashion based on the thread number.
# Note that ResizeMethod contains 4 enumerated resizing methods.
# We select only 1 case for fast_mode bilinear.
num_resize_cases = 1 if fast_mode else 4
distorted_image = apply_with_random_selector(
    distorted_image,
    lambda x, method: tf.image.resize_images(x, [height, width], method=method),
    num_cases=num_resize_cases)
They're totally aware of it and do it anyway.
Depending on how far you want or need to go, there is actually a paper here called Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition that handles inputs of arbitrary sizes by processing them in a very special way.
Try making a spatial pyramid pooling layer and put it after your last convolution layer so that the FC layers always get constant-dimensional vectors as input. During training, train on the images from the entire dataset using one particular image size for one epoch. Then, for the next epoch, switch to a different image size and continue training.
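A simplified NumPy sketch of the idea is shown below, mainly to show why the output length no longer depends on the input size. The function name, the pyramid levels (1, 2, 4), and the way bin edges are chosen with np.linspace are assumptions for this example and not necessarily the exact windowing used in the paper.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool an (H, W, C) feature map over 1x1, 2x2 and 4x4 grids.

    The output length is C * sum(n * n for n in levels), whatever H and W are
    (as long as H and W are at least max(levels)).
    """
    h, w, c = feature_map.shape
    pooled = []
    for n in levels:
        # Bin edges that split the height and width into n roughly equal parts.
        row_edges = np.linspace(0, h, n + 1).astype(int)
        col_edges = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[row_edges[i]:row_edges[i + 1],
                                   col_edges[j]:col_edges[j + 1]]
                pooled.append(cell.max(axis=(0, 1)))  # one value per channel
    return np.concatenate(pooled)

# Two hypothetical conv outputs of different spatial size, 8 channels each.
a = spatial_pyramid_pool(np.random.rand(13, 13, 8))
b = spatial_pyramid_pool(np.random.rand(10, 17, 8))
```

Both a and b have length 8 * (1 + 4 + 16) = 168, so the fully connected layers that follow always see the same number of inputs.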
Is it possible to train a convolutional autoencoder (CAE) with a non-square (rectangular) input matrix? All the tutorials and resources I have studied on CAEs seem to use square images. The data I am working with are not images. I have hundreds of single cells, and for each cell there is a matrix (genomic data) with thousands of genes in rows and hundreds of bins in columns (the genomic region of interest for each gene is divided into bins of equal size).
I have tried some models with Keras, but the size of the input to the encoder part of the model is always different from the size of the output matrix of the decoder, so it gives an error. Can someone help me solve this problem?
It is hard to tell what the issue is here since no sample code is provided. However, most probably your matrix dimensions are odd (for example 9×9) or become odd during pooling. To fix this, you need to either pad your input so that the matrix dimensions are even, or crop the decoder layers of your autoencoder so that the output size matches.
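The size arithmetic behind this can be checked with a few lines of Python. This sketch assumes 2x pooling without padding (so sizes are floor-divided) followed by 2x upsampling in the decoder; the function names are made up for the example.

```python
def round_trip(size, num_pools, pool=2):
    """Size after `num_pools` poolings (floor division) and as many upsamplings."""
    for _ in range(num_pools):
        size //= pool
    return size * pool ** num_pools

def padded_size(size, num_pools, pool=2):
    """Smallest size >= `size` that survives the round trip unchanged."""
    factor = pool ** num_pools
    return -(-size // factor) * factor  # round up to a multiple of factor

print(round_trip(9, 1))    # 8: a 9-wide input comes back 8 wide
print(padded_size(9, 1))   # 10: pad 9 up to 10
print(round_trip(10, 1))   # 10: now input and output sizes match
```

The same check applies to each axis separately, so a rectangular matrix is fine as long as both its height and its width are padded to a multiple of 2 raised to the number of pooling layers.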
Good afternoon,
I have a convolutional neural network that performs pixelwise classification with 6 classes on a batch of images. I would like to apply a superpixel algorithm (the one in OpenCV) to the output of the network. More precisely, the superpixels would be calculated from the input images, and then for each of these superpixel locations in the network output I would compute the mode of the output classes, in order to have the same output class for every pixel of a superpixel of the input image.
Since the output of the network during a feedforward pass is a [batch, w, h, 6] size tensor, I was thinking to reshape the tensor to [batch*w, h, 6] and then to iterate for every class (for i in range(6)) and compute the mode of that class for every superpixel and then reshape back to the original size.
What I would code in a numpy-based script should be something like:
for i in range(num_superpixels):
    for j in range(num_classes):  # num_classes = 6
        mask = superpixel_location[i]
        net_new_output[:, :, j][mask] = mode(net_output[:, :, j][mask])
Although this is definitely easy to code in numpy, I am having issues trying to execute it in tensorflow, since I do not know how to implement for loops or how to manage them.
Can you help me out?
Thank you,
MC
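One loop-free way to express the per-superpixel mode is to count votes per (superpixel, class) pair and take the argmax. The NumPy sketch below assumes the network output has already been reduced to one class index per pixel (for example with argmax over the last axis) and that superpixel ids run from 0 to S-1; the function name is made up. The counting step is a segment sum, so it could probably be ported to TensorFlow with something like tf.math.unsorted_segment_sum over one-hot predictions, but that variant is not shown or tested here.

```python
import numpy as np

def superpixel_majority(class_map, segments, num_classes):
    """Replace every pixel's class by the most frequent class in its superpixel.

    class_map : (H, W) integer class per pixel, e.g. net_output.argmax(-1)
    segments  : (H, W) integer superpixel id per pixel, ids in 0..S-1
    """
    num_segments = segments.max() + 1
    # votes[s, k] = number of pixels in superpixel s predicted as class k
    votes = np.zeros((num_segments, num_classes), dtype=np.int64)
    np.add.at(votes, (segments.ravel(), class_map.ravel()), 1)
    winners = votes.argmax(axis=1)  # mode of the classes per superpixel
    return winners[segments]        # broadcast back to (H, W)

# Hypothetical 2x4 image with two superpixels and 6 classes.
segments = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1]])
class_map = np.array([[2, 2, 5, 1],
                      [2, 3, 5, 5]])
result = superpixel_majority(class_map, segments, num_classes=6)
```

For a batch, the same function can be applied per image, or the superpixel ids of different images can be offset so that they do not collide and the whole batch is processed in one call.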
I'm trying to design and train a convolutional neural network to identify circular cells in an image. I am training it on "cutouts" of the full images, which either have a circle in the middle of the image (positive training sample) or don't (negative training sample).
Example of an image with a circle in the middle (the heatmap colors are wonky, the images are all grayscale): http://imgur.com/a/6q8LZ
Rather than just classify the two types of input images (circle or not in the middle), I'd like the network output to be a binary bitmap, which is either a uniform value (e.g. -1) if there is no circle in the input image or has a "blotch" (ideally a single point) in the middle of the image to indicate the center of the circle. This would then be applied to a large image containing many such circular cells and the output should be a bitmap with blotches where the cells are.
In order to train this, I'm using the mean square error between the output image and a 2D gaussian filter (http://imgur.com/a/fvfP6) for positive training samples and the MSE between the image and a uniform matrix with value -1 for negative training samples. Ideally, this should cause the CNN to converge on an image, which resembles the gaussian peak in the middle for positive training samples, and an image, which is uniformly -1 for negative training samples.
HOWEVER, the network keeps converging on a universal solution of "make everything zero". This does not minimize the MSE, so I don't think it's an inherent problem with the network structure (I've tried different structures, from a single-layer CNN with a filter as large as the input image to multilayer CNNs with filters of varying size, all with the same result).
The loss function I am using is as follows:
weighted_score = tf.reduce_sum(tf.square(tf.sub(conv_squeeze, y)),
                               reduction_indices=[1, 2])
with conv_squeeze being the output image of the network and y being the label (i.e. the gaussian template shown above). I've already tried averaging over the batch size as suggested here:
Using squared difference of two images as loss function in tensorflow
but without success. I cannot find any academic publications on how to train neural networks with template images as labels and as such would be grateful for anybody to point me in the right direction. Thank you so much!
According to your description, I think you are facing an "imbalanced data" problem. You can try hinge loss instead of MSE; it may solve your problem.
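As a way to inspect the setup from the question, the two kinds of labels and the per-image squared error can be reproduced in NumPy. The image size (21), the Gaussian width (sigma=2.0), and the peak value of 1 are assumptions picked for illustration, since the question does not state them.

```python
import numpy as np

def gaussian_template(size, sigma):
    """2D Gaussian bump with peak value 1 at the center of a size x size image."""
    coords = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-coords ** 2 / (2.0 * sigma ** 2))
    return np.outer(g, g)

def sum_squared_error(output, label):
    """Per-image loss, matching reduce_sum(square(output - label))."""
    return np.sum((output - label) ** 2)

size = 21
positive_label = gaussian_template(size, sigma=2.0)  # circle in the middle
negative_label = -np.ones((size, size))              # no circle: uniform -1
all_zero = np.zeros((size, size))                    # the degenerate network output

loss_pos = sum_squared_error(all_zero, positive_label)
loss_neg = sum_squared_error(all_zero, negative_label)
```

Comparing loss_pos and loss_neg, and weighting them by how many positive and negative samples are in a batch, shows how much each class contributes to the total loss for a given output.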