I'm trying to use a pretrained network such as tf.keras.applications.ResNet50 but I have two problems:
I just want to obtain the top embedding layers at the end of the network, because I don't want to do any image classification. So due to this there is no need for a classes number I think.
tf.keras.applications.ResNet50 takes a default parameter 'classes=1000'
Is there a way to omit this parameter?
My input pictures are 128*128*1 pixels and not 224*224*3
What is the best way to fix my input data shape?
My goal is to make a triplet loss network with the output of a resnet network.
Thanks a lot!
ResNet50 has a parameter include_top exactly for that purpose -- set it to False to skip the last fully connected layer. (It then outputs a feature map with 2048 channels; pass pooling='avg' as well to get a feature vector of length 2048.)
The best way to adapt your image size is to resample the images, e.g. using the dedicated function tf.image.resize_images (tf.image.resize in TensorFlow 2).
Also, I did not notice at first that your input images have only one channel, thx @Daniel. I suggest you build your 3-channel grayscale image on the GPU (not on the host using numpy) to avoid tripling your data transfer to GPU memory, using tf.tile:
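A minimal sketch of the include_top suggestion, assuming TensorFlow 2.x. The pooling="avg" argument and the 128x128x3 input shape are my additions for illustration, and weights=None is used only so the sketch does not download anything; pass weights="imagenet" to get the pretrained network.

```python
import tensorflow as tf

# weights=None keeps this sketch offline; use weights="imagenet" for the
# pretrained filters. With include_top=False the `classes` argument is unused.
backbone = tf.keras.applications.ResNet50(
    include_top=False,          # drop the final 1000-way classification layer
    weights=None,
    input_shape=(128, 128, 3),  # still needs 3 channels
    pooling="avg",              # global average pooling -> one vector per image
)

print(backbone.output_shape)  # (None, 2048)
```

The resulting 2048-dimensional vector is what you would feed into the triplet loss, possibly after a small projection layer of your own.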
im3 = tf.tile(im, (1, 1, 1, 3))
As a complement to the other answer: you will also need to make your images have three channels. Although technically not the best input for ResNet, it is the easiest solution (changing the ResNet model is an option too, if you visit the source code and change the input shape yourself).
Use numpy to pack images in three channels:
images3ch = np.concatenate([images,images,images], axis=-1)
Related
I am working on a problem where I have to classify images into different groups. I am a beginner and working with Keras with a simple Sequential model. How should I tackle the problem of images with different dimensions in the code below, e.g. some images have dimension 2101583 while some have 210603 etc.? Please suggest.
model.add(Dense(100,input_dim = ?,activation= "sigmoid"))
model.add(Dense(100,input_dim = ?,activation= "sigmoid"))
For simple classification algorithms in Keras the input should always be the same size. Since feedforward neural networks consist of an input layer, one or more hidden layers and one output layer, with all nodes connected, you must always provide a value for each input node. Moreover, the shape and other hyperparameters of the neural network are static, so you can't change the number of inputs, and therefore the image size, within one neural network.
The best practice for your case would be to either downsize all Images to the size of your smallest Image, or upsize all Images to the size of your largest Image.
Downsizing
With downsizing you would actively delete pixels from your image, including the information contained in those pixels. This can cost accuracy, but would decrease the computational time, too.
Upsizing
With upsizing you would add pixels to your image, without adding information. This would increase computational time, but you would keep the information of each image.
For a good start I would suggest you try downsizing your images to the size of the smallest one. This is a common practice in science as well [1]. One library to do so is OpenCV; for implementation issues, please refer to the multiple questions on Stack Overflow:
Python - resize image
Opencv-Python-Resizing image
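To make the downsizing suggestion concrete, here is a rough sketch that shrinks every image to the smallest height and width in the set. It uses a plain NumPy nearest-neighbour resize as a stand-in for a real library call such as OpenCV's resize; the function name resize_nearest and the example image sizes are made up for illustration.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return img[rows[:, None], cols]

# Hypothetical dataset with mixed image sizes.
images = [np.zeros((210, 160, 3)), np.zeros((120, 90, 3)), np.zeros((300, 200, 3))]

# Target size: smallest height and smallest width found in the set.
target_h = min(im.shape[0] for im in images)
target_w = min(im.shape[1] for im in images)

batch = np.stack([resize_nearest(im, target_h, target_w) for im in images])
print(batch.shape)  # (3, 120, 90, 3)
```

Note that taking the minimum height and width independently ignores the aspect ratio, so images may come out squashed; a proper interpolation method will also give smoother results than nearest neighbour.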
I am trying to train my model which classifies images.
The problem I have is that they have different sizes. How should I format my images or model architecture?
You didn't say what architecture you're talking about. Since you said you want to classify images, I'm assuming it's a partly convolutional, partly fully connected network like AlexNet, GoogLeNet, etc. In general, the answer to your question depends on the network type you are working with.
If, for example, your network only contains convolutional units - that is to say, does not contain fully connected layers - it can be invariant to the input image's size. Such a network could process the input images and in turn return another image ("convolutional all the way"); you would have to make sure that the output matches what you expect, since you have to determine the loss in some way, of course.
If you are using fully connected units though, you're in for trouble: Here you have a fixed number of learned weights your network has to work with, so varying inputs would require a varying number of weights - and that's not possible.
If that is your problem, here's some things you can do:
Don't care about squashing the images. A network might learn to make sense of the content anyway; does scale and perspective mean anything to the content anyway?
Center-crop the images to a specific size. If you fear you're losing data, do multiple crops and use these to augment your input data, so that the original image will be split into N different images of correct size.
Pad the images with a solid color to a squared size, then resize.
Do a combination of that.
The padding option might introduce an additional error source to the network's prediction, as the network might (read: likely will) be biased to images that contain such a padded border.
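For reference, the crop and pad options from the list can be sketched in a few lines of NumPy. The helper names center_crop and pad_to_square are mine, and the sketch assumes the image is at least as large as the requested crop.

```python
import numpy as np

def center_crop(img, size):
    """Cut a size x size window from the middle of an (H, W, C) image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def pad_to_square(img, value=0):
    """Pad the shorter side with a solid value so that H == W."""
    h, w = img.shape[:2]
    side = max(h, w)
    pad_h, pad_w = side - h, side - w
    # Split the padding evenly between both sides of each spatial axis.
    pads = [(pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2)]
    pads += [(0, 0)] * (img.ndim - 2)  # never pad the channel axis
    return np.pad(img, pads, mode="constant", constant_values=value)

img = np.ones((200, 300, 3))
print(center_crop(img, 128).shape)  # (128, 128, 3)
print(pad_to_square(img).shape)     # (300, 300, 3)
```

After pad_to_square you would still resize to the network's input size, as described above.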
If you need some ideas, have a look at the Images section of the TensorFlow documentation; there are pieces like resize_image_with_crop_or_pad that take away the bigger work.
As for just not caring about squashing, here's a piece of the preprocessing pipeline of the famous Inception network:
# This resizing operation may distort the images because the aspect
# ratio is not respected. We select a resize method in a round robin
# fashion based on the thread number.
# Note that ResizeMethod contains 4 enumerated resizing methods.
# We select only 1 case for fast_mode bilinear.
num_resize_cases = 1 if fast_mode else 4
distorted_image = apply_with_random_selector(
    distorted_image,
    lambda x, method: tf.image.resize_images(x, [height, width], method=method),
    num_cases=num_resize_cases)
They're totally aware of it and do it anyway.
Depending on how far you want or need to go, there actually is a paper called Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition that handles inputs of arbitrary sizes by processing them in a very special way.
Try making a spatial pyramid pooling layer. Then put it after your last convolution layer so that the FC layers always get constant-dimensional vectors as input. During training, train on the images from the entire dataset using one particular image size for one epoch. Then for the next epoch, switch to a different image size and continue training.
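To illustrate why spatial pyramid pooling gives the FC layers a constant-size input, here is a simplified NumPy sketch (not the paper's exact implementation). It max-pools an (H, W, C) feature map over 1x1, 2x2 and 4x4 grids of bins; the function name and the choice of levels are assumptions for the example.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool an (H, W, C) feature map over n x n grids of bins.

    The output length is C * sum(n * n for n in levels), independent of H and W.
    """
    h, w, c = fmap.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin edges: floor for the start, ceil for the end, so no bin is empty.
            r0, r1 = (i * h) // n, -((-(i + 1) * h) // n)
            for j in range(n):
                c0, c1 = (j * w) // n, -((-(j + 1) * w) // n)
                pooled.append(fmap[r0:r1, c0:c1].max(axis=(0, 1)))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
small = spatial_pyramid_pool(rng.random((13, 17, 8)))
large = spatial_pyramid_pool(rng.random((32, 20, 8)))
print(small.shape, large.shape)  # (168,) (168,)
```

Both feature maps end up as vectors of length 8 * (1 + 4 + 16) = 168, which is what lets the image size change between epochs.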
I am working with a convolutional image model, which I convert and store as a YAML file and then load in code.
The full size of the input image is 256 * 256, but during training I train the model on patches of size 128 * 128, and during validation I feed the full-size image. Therefore, the input size of the model is set to None.
I would like the model to crop only the middle 64 * 64 part of the image from this input layer. The model therefore has to crop by a different amount depending on the input image size to produce the same output size (64 * 64). Is it possible to apply an if-else statement in my code? I would appreciate it if you could help me with the code.
patch = (None,None, 6)
x_input = Input(shape=patch)
def get_crop(x):
    from keras.layers import Cropping2D
    if x.get_shape().as_list()[1:3] == [256, 256]:
        return Cropping2D(cropping=(96, 96))(x)
    else:
        return Cropping2D(cropping=(32, 32))(x)
x_crop = Lambda(get_crop)(x_input)
You had better ask on another Stack Exchange site like Cross Validated, but here is a short answer.
When dealing with varying image sizes, there are two solutions. The first is cropping the big image into multiple sub-images and voting over the class you get for each sub-image. The second solution, which is far better, is to have a fully convolutional network. You can replace fully connected blocks by big convolutions and use a global pooling for the classification layer (GlobalAveragePooling or MaxPooling).
Note that those solutions work only if the images you get are bigger. If there are smaller images, the solution is to zoom the image or pad it. But zooming is better.
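A small NumPy sketch of why global pooling makes the classification layer independent of image size. The feature maps and classifier_weights here are random placeholders rather than a trained network.

```python
import numpy as np

def global_average_pool(fmap):
    """Average an (H, W, C) feature map over its spatial axes, giving (C,)."""
    return fmap.mean(axis=(0, 1))

rng = np.random.default_rng(0)
# Hypothetical final classifier: 16 feature channels -> 5 classes.
classifier_weights = rng.normal(size=(16, 5))

# Feature maps of different spatial size, as a fully convolutional
# network would produce for differently sized input images.
for shape in [(8, 8, 16), (20, 12, 16)]:
    fmap = rng.normal(size=shape)
    logits = global_average_pool(fmap) @ classifier_weights
    print(shape, "->", logits.shape)
```

Whatever the spatial size, the pooled vector has one entry per channel, so the same classifier weights apply to every input.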
What I am trying to accomplish is doing inference in Tensorflow with a batch of images at a time instead of a single image. I am wondering what is the most appropriate way of handling processing of multiple images to speed up inference?
Doing inference on a single image is easily done and common in most tutorials, but what I have not seen yet is doing that in a batch-like style.
Here's what I am currently using at a high level:
pl = tf.placeholder(tf.float32)
...
sess.run([boxes, confs], feed_dict={pl: image})
I would appreciate any input on this.
Depending on how your model is designed, you can just feed an array of images to pl. The first dimension of your outputs then corresponds to the index of your image in the batch.
Many tensor ops have an implementation for multiple examples in a batch. There are some exceptions though, for example tf.image.decode_jpeg. In this case, you will have to rewrite your network, using tf.map_fn, for example.
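A rough illustration of what "feed an array of images" means, using NumPy only. fake_model is a hypothetical stand-in for the sess.run call, and the sketch assumes all images share one size and that the placeholder accepts a leading batch dimension (e.g. shape [None, H, W, C]).

```python
import numpy as np

def fake_model(batch):
    """Hypothetical stand-in for sess.run: one score per image (its mean)."""
    return batch.mean(axis=(1, 2, 3))

# Three same-sized images filled with 0, 1 and 2.
images = [np.full((32, 32, 3), fill, dtype=np.float32) for fill in (0.0, 1.0, 2.0)]

batch = np.stack(images, axis=0)  # shape (3, 32, 32, 3)
scores = fake_model(batch)        # shape (3,)

# The first axis of the output lines up with the order of the images.
for index, score in enumerate(scores):
    print(index, float(score))
```

With the real graph you would pass the stacked array as feed_dict={pl: batch} and index the returned boxes and confs along their first dimension in the same way.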
I'm running the default classify_image code of the imagenet model. Is there any way to visualize the features that it has extracted? If I use 'pool_3:0', that gives me the feature vector. Is there any way to overlay this on top of my image to see which features it has picked as important?
Ross Girshick described one way to visualize what a pooling layer has learned: https://www.cs.berkeley.edu/~rbg/papers/r-cnn-cvpr.pdf
Essentially instead of visualizing features, you find a few images that a neuron fires most on. You repeat that for a few or all neurons from your feature vector. The algorithm needs lots of images to choose from of course, e.g. the test set.
I wrote my implementation of this idea for cifar10 model in Tensorflow today, which I want to share (uses OpenCV): https://gist.github.com/kukuruza/bb640cebefcc550f357c
You could use it if you manage to provide the images tensor for reading images by batches, and the pool_3:0 tensor.
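The core of the idea (ranking images by how strongly each neuron fires) is independent of the gist linked above and can be sketched as follows. The features matrix is toy data standing in for pool_3:0 outputs collected over many images, and the function name is mine.

```python
import numpy as np

def top_images_per_neuron(features, k=3):
    """features: (num_images, num_neurons) activations.

    Returns a (num_neurons, k) array of image indices, strongest first.
    """
    order = np.argsort(-features, axis=0)  # per neuron, images by descending activation
    return order[:k].T

# Toy activations: 4 images, 2 neurons.
features = np.array([[0.1, 0.9],
                     [0.8, 0.2],
                     [0.5, 0.7],
                     [0.3, 0.4]])

print(top_images_per_neuron(features, k=2))
```

Row n of the result lists the images to display for neuron n; looking at what those images have in common hints at what the neuron responds to.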