Using non-square input matrix for convolutional autoencoder - python

Is it possible to train a convolutional autoencoder (CAE) with a non-square (rectangular) input matrix? All the tutorials and resources I have studied on CAEs seem to use square images. The data I am working with is not image data: I have hundreds of single cells, and for each cell there is a matrix (genomic data) with thousands of genes in rows and hundreds of bins in columns (the genomic region of interest for each gene, divided into bins of equal size).
I have tried some models with Keras, but the size of the input to the encoder part of the model is always different from the size of the output matrix of the decoder, so it raises an error. Can someone help me solve this problem?

It is hard to tell what the issue is here since no sample code is provided. However, most probably your matrix dimensions are odd (e.g. 9×9) or become odd during pooling. To fix this, you need to either pad your input so that the matrix dimensions stay even, or crop the decoder layers of your autoencoder to get a matching output size.
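As a minimal sketch, assuming an input of 1000 genes × 300 bins (the shape and layer sizes here are illustrative, not taken from the question), a rectangular CAE in Keras could look like this; padding="same" keeps odd sizes from breaking the pooling, and the commented Cropping2D line shows where to trim an overshooting decoder:

import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical rectangular input: 1000 genes x 300 bins, one channel.
H, W = 1000, 300
inp = layers.Input(shape=(H, W, 1))

# Encoder: padding="same" keeps convolutions from shrinking odd sizes;
# each pooling halves the dimensions (rounding up when they are odd).
x = layers.Conv2D(16, 3, activation="relu", padding="same")(inp)
x = layers.MaxPooling2D(2, padding="same")(x)        # -> 500 x 150
x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
encoded = layers.MaxPooling2D(2, padding="same")(x)  # -> 250 x 75

# Decoder: each upsampling doubles the dimensions again.
x = layers.Conv2D(8, 3, activation="relu", padding="same")(encoded)
x = layers.UpSampling2D(2)(x)                        # -> 500 x 150
x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D(2)(x)                        # -> 1000 x 300
decoded = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

# If an odd dimension made the decoder overshoot (e.g. 75 -> 38 -> 76),
# trim the extra rows/columns here:
# decoded = layers.Cropping2D(((0, rows_extra), (0, cols_extra)))(decoded)

autoencoder = models.Model(inp, decoded)
autoencoder.compile(optimizer="adam", loss="mse")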

Related

How to tackle different Image dimensions for classification

I am working on a problem where I have to classify images into different groups. I am a beginner, working with Keras and a simple Sequential model. How should I tackle the problem of images with different dimensions in the code below, e.g. some images have dimensions 210×1583 while others have 210×603? Please suggest.
model.add(Dense(100, input_dim=?, activation="sigmoid"))
model.add(Dense(100, input_dim=?, activation="sigmoid"))
For simple classification algorithms in Keras, the input must always be the same size. Since feedforward neural networks consist of an input layer, one or more hidden layers, and an output layer, with all nodes connected, you must always feed something to each input node. Moreover, the shape and other hyperparameters of the neural network are static, so you can't change the number of inputs, and therefore the size of each image, within one neural network.
The best practice for your case would be to either downsize all images to the size of your smallest image, or upsize all images to the size of your largest image.
Downsizing
With downsizing you actively delete pixels from your image, and with them the information they contain. This can lead to overfitting, but it also decreases computational time.
Upsizing
With upsizing you add pixels to your image without adding information. This increases computational time, but you keep the information of each image.
For a good start, I would suggest you try downsizing your images to the smallest one; this is common practice in research as well [1]. One library for this is OpenCV; for implementation details, please refer to the many questions on Stack Overflow:
Python - resize image
Opencv-Python-Resizing image
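For illustration, a minimal downsizing sketch with OpenCV; the file name and the 210×603 target size are placeholder assumptions:

import cv2

# Target size: the dimensions of the smallest image in the dataset (assumed).
target_h, target_w = 210, 603

img = cv2.imread("example.jpg")                      # placeholder path
resized = cv2.resize(img, (target_w, target_h),      # cv2 wants (width, height)
                     interpolation=cv2.INTER_AREA)   # INTER_AREA suits shrinking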

How to train different size of image using cnn? [duplicate]

I am trying to train my model which classifies images.
The problem I have is that they have different sizes. How should I format my images or my model architecture?
You didn't say what architecture you're talking about. Since you said you want to classify images, I'm assuming it's a partly convolutional, partly fully connected network like AlexNet, GoogLeNet, etc. In general, the answer to your question depends on the network type you are working with.
If, for example, your network only contains convolutional units - that is to say, does not contain fully connected layers - it can be invariant to the input image's size. Such a network could process the input images and in turn return another image ("convolutional all the way"); you would have to make sure that the output matches what you expect, since you have to determine the loss in some way, of course.
If you are using fully connected units though, you're in for trouble: here you have a fixed number of learned weights your network has to work with, so varying inputs would require a varying number of weights, and that's not possible.
If that is your problem, here's some things you can do:
Don't care about squashing the images. A network might learn to make sense of the content anyway; do scale and perspective mean anything to the content anyway?
Center-crop the images to a specific size. If you fear you're losing data, do multiple crops and use these to augment your input data, so that the original image will be split into N different images of correct size.
Pad the images with a solid color to a square size, then resize.
Do a combination of that.
The padding option might introduce an additional error source to the network's prediction, as the network might (read: likely will) be biased to images that contain such a padded border.
If you need some ideas, have a look at the Images section of the TensorFlow documentation; there are pieces like resize_image_with_crop_or_pad that take care of the heavy lifting.
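For illustration, a short sketch of that crop-or-pad step; note that in current TensorFlow 2.x the function is named tf.image.resize_with_crop_or_pad, and the 224×224 target and file path are placeholder assumptions:

import tensorflow as tf

# Center-crops larger images and zero-pads smaller ones to 224 x 224.
image = tf.io.read_file("example.jpg")          # placeholder path
image = tf.image.decode_jpeg(image, channels=3)
image = tf.image.resize_with_crop_or_pad(image, 224, 224)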
As for just not caring about squashing, here's a piece of the preprocessing pipeline of the famous Inception network:
# This resizing operation may distort the images because the aspect
# ratio is not respected. We select a resize method in a round robin
# fashion based on the thread number.
# Note that ResizeMethod contains 4 enumerated resizing methods.
# We select only 1 case for fast_mode bilinear.
num_resize_cases = 1 if fast_mode else 4
distorted_image = apply_with_random_selector(
    distorted_image,
    lambda x, method: tf.image.resize_images(x, [height, width], method=method),
    num_cases=num_resize_cases)
They're totally aware of it and do it anyway.
Depending on how far you want or need to go, there actually is a paper called Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition that handles inputs of arbitrary sizes by processing them in a very special way.
Try making a spatial pyramid pooling layer and putting it after your last convolutional layer, so that the FC layers always get constant-dimensional vectors as input. During training, train on images from the entire dataset using one particular image size for an epoch; then, for the next epoch, switch to a different image size and continue training.
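A minimal sketch of such a layer as a custom Keras Layer (the (1, 2, 4) pyramid levels follow the SPP paper; the code assumes the feature map is at least as large as the largest level):

import tensorflow as tf
from tensorflow.keras import layers

class SpatialPyramidPooling(layers.Layer):
    """Pools a variable-size feature map into a fixed-length vector.

    For each pyramid level n, the map is split into an n x n grid and each
    cell is max-pooled, so the output always has C * sum(n^2) features.
    """
    def __init__(self, levels=(1, 2, 4), **kwargs):
        super().__init__(**kwargs)
        self.levels = levels

    def call(self, x):                    # x: (batch, H, W, C), H/W variable
        h, w = tf.shape(x)[1], tf.shape(x)[2]
        pooled = []
        for n in self.levels:
            for i in range(n):
                for j in range(n):
                    # Integer grid boundaries that exactly cover the map;
                    # assumes H and W are both >= max(self.levels).
                    cell = x[:, h * i // n : h * (i + 1) // n,
                               w * j // n : w * (j + 1) // n, :]
                    pooled.append(tf.reduce_max(cell, axis=[1, 2]))
        return tf.concat(pooled, axis=-1)

Inserting x = SpatialPyramidPooling()(x) between the last convolution and the first Dense layer keeps the classifier head's input length fixed regardless of the image size.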

Superpixels input to CNN

How can I give superpixels as input to a CNN? I used the SLIC algorithm to segment the images into superpixels.
How can I use these for classification with a CNN?
I will try to help you. A CNN (convolutional neural network) works on single input samples, not on a collection of regions (each superpixel is its own matrix). So you need to extract each superpixel and make it its own image. In other words, if you segment your image into 300 superpixels, you then need to create 300 new images, one for each superpixel.
After this, each new image will likely have a different size. You can't work like that, because the number of input neurons of the CNN can't change. To deal with this, you can center each "new image" on an N×N background (N must be large enough to cover all new images). With the superpixels centered this way, each pixel becomes an input to your CNN. In other words:
1) Each centered superpixel is input one at a time;
2) The number of inputs to the CNN will be X*Y, where X is shape[0] of the centered superpixel and Y is shape[1];
3) With 300 centered superpixels, your CNN must compute an output for each one.
Illustration: https://imgur.com/k8pRDw7
See the illustration, and good luck! :)
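A minimal sketch of that extraction step with scikit-image; the segment count and the 64×64 canvas size are assumptions, and the canvas must be large enough to hold every superpixel's bounding box:

import numpy as np
from skimage.segmentation import slic

def superpixels_to_patches(image, n_segments=300, canvas=64):
    # Segment with SLIC, then center each superpixel's bounding box
    # on a fixed canvas x canvas background for a fixed-input CNN.
    labels = slic(image, n_segments=n_segments)
    patches = []
    for lab in np.unique(labels):
        mask = labels == lab
        rows, cols = np.where(mask)
        # Cut out the bounding box, zeroing pixels outside the superpixel.
        cut = np.where(mask[..., None], image, 0)
        cut = cut[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
        h, w = cut.shape[:2]
        out = np.zeros((canvas, canvas, image.shape[2]), dtype=image.dtype)
        top, left = (canvas - h) // 2, (canvas - w) // 2
        out[top:top + h, left:left + w] = cut
        patches.append(out)
    return np.stack(patches)   # (n_superpixels, canvas, canvas, channels)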

Multiple Output Vectors for a single Input in Keras

I want to create a Neural Network in Keras for converting handwriting into computer letters.
My first step is to convert a sentence into an array. My array has the shape (1, number of letters, 27). Now I want to input it into my deep neural network and train it.
But how do I input it properly if the dimensions don't match those of my image? And how do I get my predict function to return an output array of shape (1, number of letters, 27)?
It seems like you are attempting handwriting recognition, or more generally optical character recognition (OCR). This is quite a broad field and there are many ways to proceed. One approach I suggest is the following:
It is commonly known that neural networks have fixed-size inputs: if you build a model to take, say, inputs of shape (28, 28, 1), then it will expect exactly that shape. Therefore, having a dimension in your samples that depends on the number of letters in a sentence (something variable) is not recommended, as you will not be able to train a model that way with NNs.
Training such a model becomes possible if you design it to predict one character at a time, instead of a whole sentence of varying length, and then group the predicted characters. The steps to achieve this could be:
Obtain training samples for the characters you wish to recognize (like the MNIST database for example), and design and train your model to predict one character at a time.
Take the image with the writing to classify and pass a sliding window over it that matches your expected input size (say a 28×28 window), then classify each of those windows as a character; a minimal sketch of this step follows the list. Instead of a sliding window, you could also try isolating your desired features somehow and classifying just those 28×28 segments.
Group the predicted characters somehow so you get words (probably grouping those separated by empty spaces) or do whatever you want with the predictions.
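A minimal sketch of the sliding-window step from the list above, assuming a Keras classifier model already trained on 28×28 single-character images and a grayscale text line whose height equals the window size:

import numpy as np

def classify_line(model, line_image, win=28, stride=14):
    # Slide a win x win window along a text line and classify each window.
    # line_image: 2D grayscale array with height == win (an assumption).
    predictions = []
    for x in range(0, line_image.shape[1] - win + 1, stride):
        window = line_image[:, x:x + win].astype("float32") / 255.0
        scores = model.predict(window[np.newaxis, :, :, np.newaxis], verbose=0)
        predictions.append((x, int(np.argmax(scores))))
    return predictions         # (column offset, predicted class) pairs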
You can also try searching for tutorials or guides for Handwriting recognition like this one I have found quite useful. Hope this helps you get on track, good luck.

Training a convolutional network on template images

I'm trying to design and train a convolutional neural network to identify circular cells in an image. I am training it on "cutouts" of the full images, which either have a circle in the middle of the image (positive training sample) or don't (negative training sample).
Example of an image with a circle in the middle (the heatmap colors are wonky, the images are all grayscale): http://imgur.com/a/6q8LZ
Rather than just classify the two types of input images (circle or not in the middle), I'd like the network output to be a binary bitmap, which is either a uniform value (e.g. -1) if there is no circle in the input image or has a "blotch" (ideally a single point) in the middle of the image to indicate the center of the circle. This would then be applied to a large image containing many such circular cells and the output should be a bitmap with blotches where the cells are.
In order to train this, I'm using the mean square error between the output image and a 2D gaussian filter (http://imgur.com/a/fvfP6) for positive training samples and the MSE between the image and a uniform matrix with value -1 for negative training samples. Ideally, this should cause the CNN to converge on an image, which resembles the gaussian peak in the middle for positive training samples, and an image, which is uniformly -1 for negative training samples.
HOWEVER, the network keeps converging on the universal solution of "make everything zero". This does not minimize the MSE, so I don't think it's an inherent problem with the network structure (I've tried different structures, from a single-layer CNN with a filter as large as the input image to multilayer CNNs with filters of varying size, all with the same result).
The loss function I am using is as follows:
weighted_score = tf.reduce_sum(tf.square(tf.sub(conv_squeeze, y)),
                               reduction_indices=[1, 2])
with conv_squeeze being the output image of the network and y being the label (i.e. the Gaussian template shown above). I've already tried averaging over the batch size, as suggested here:
Using squared difference of two images as loss function in tensorflow
but without success. I cannot find any academic publications on how to train neural networks with template images as labels and as such would be grateful for anybody to point me in the right direction. Thank you so much!
According to your description, I think you are facing an "imbalanced data" problem: nearly all target pixels carry the background value, so the error signal from the few peak pixels is swamped. You could try a hinge loss instead of MSE; it may solve your problem.
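For illustration, a minimal sketch of a per-pixel hinge loss in current TensorFlow argument style; conv_squeeze and y are reused from the question, and y is assumed to be +1/-1 per pixel (so the Gaussian target would need thresholding to fit this form):

import tensorflow as tf

# Hinge loss is zero once a pixel is confidently on the correct side
# of the margin, so the background cannot dominate the gradient.
hinge = tf.maximum(0.0, 1.0 - y * conv_squeeze)
weighted_score = tf.reduce_sum(hinge, axis=[1, 2])   # per-image hinge total
loss = tf.reduce_mean(weighted_score)                # average over the batch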
