Tensorflow pretrained models input channel range - python

I came across this example which implements a pretrained model. It says:
Format the Data
Use the tf.image module to format the images for the task.
Resize the images to a fixed input size, and rescale the input channels to a range of [-1,1]
IMG_SIZE = 160  # All images will be resized to 160x160

def format_example(image, label):
    image = tf.cast(image, tf.float32)
    image = (image / 127.5) - 1
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image, label
I was wondering about this. What I understand is that image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE)) resizes the images (which can have any size) to one consistent size, while image = (image/127.5) - 1 does not change the size of the images but rescales the pixel values (which are between 0 and 255) to the range [-1,1].

In other examples I have seen normalization/standardization to a range of [0,1], i.e. rescaling by 1.0/255. I do not understand when I have to use which. If I use my own model, is it up to me whether I scale to [-1,1] or [0,1]? When I use a pretrained model, however, I need to know what is required. I googled the MobileNetV2 model but could not find any documentation telling me that the required input range is [-1,1]. In this comment it says all pretrained TensorFlow models require an input range of [-1,1]. Is that true? In particular, is it true that all image models on TensorFlow Hub require a range of [-1,1]?
Finally, how do I find out what the required range is for a given pretrained model? I would not have figured out the [-1,1] for MobileNetV2 on my own; on the TensorFlow MobileNetV2 page I could not find this information.
Furthermore: is there a way to have this done automatically? So that I call a function which looks up the required range for the pretrained TensorFlow model (which would have to store that information somewhere) and applies it (assuming 0-255 is my input)? I think tf.keras.applications.mobilenet_v2.preprocess_input does something else (I do not really understand what it does)? And it is also only for MobileNetV2.

Generally, your concern is: which scaling should I choose, [0, 1] or [-1, 1]? As the answer may differ from case to case, I would like to point the cases out below.
CNN architectures work better with inputs in a small, bounded range, so both [0, 1] and [-1, 1] can be good choices. The best selection can differ depending on the architecture, so it is a good option to try various scales.
Concerning the pre-trained models of Keras, I noticed that most models that use residuals (such as ResNets, MobileNetV2, InceptionResNetV2) use the [-1, 1] scale. Using the [-1, 1] scale in residuals causes some activations to be deactivated in some cases. To see why, consider a perceptron y = wx + b: if w = 1 and b = 1, then the input x = -1 results in y = 0. This shows that with a [-1, 1] scale, some input values can be nullified by the bias (without setting w = 0). This concept mostly holds for models outside Keras as well.
Almost all of the Keras architectures apply some scaling, and I believe in some cases they do not perform exactly the operations suggested in the original papers. So you should stick with the Keras documentation when using their pre-trained models; if the documentation does not mention any scaling, you should avoid scaling the input.
Furthermore, it is worth testing different scaling methods when you work with different datasets, although in most cases this should not improve the accuracy of the model by much. Please let me know if you have more questions. Thanks.
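A minimal sketch of the two scalings, assuming pixel values in [0, 255]; each model family in tf.keras.applications also ships its own preprocess_input, and for MobileNetV2 it applies the same [-1, 1] mapping as the format_example function in the question:

import tensorflow as tf

def scale_to_0_1(image):
    # [0, 255] -> [0, 1]
    return tf.cast(image, tf.float32) / 255.0

def scale_to_minus1_1(image):
    # [0, 255] -> [-1, 1], equivalent to image/127.5 - 1
    return tf.cast(image, tf.float32) / 127.5 - 1.0

# For MobileNetV2 the bundled helper does the [-1, 1] scaling for you:
x = tf.random.uniform((1, 160, 160, 3), maxval=255.0)
y = tf.keras.applications.mobilenet_v2.preprocess_input(x)
print(float(tf.reduce_min(y)), float(tf.reduce_max(y)))  # roughly -1 and 1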

Related

How to train on images of different sizes using a CNN? [duplicate]

I am trying to train my model, which classifies images.
The problem I have is that they have different sizes. How should I format my images and/or my model architecture?
You didn't say what architecture you're talking about. Since you said you want to classify images, I'm assuming it's a partly convolutional, partly fully connected network like AlexNet, GoogLeNet, etc. In general, the answer to your question depends on the network type you are working with.
If, for example, your network only contains convolutional units - that is to say, does not contain fully connected layers - it can be invariant to the input image's size. Such a network could process the input images and in turn return another image ("convolutional all the way"); you would have to make sure that the output matches what you expect, since you have to determine the loss in some way, of course.
If you are using fully connected units though, you're up for trouble: Here you have a fixed number of learned weights your network has to work with, so varying inputs would require a varying number of weights - and that's not possible.
If that is your problem, here's some things you can do:
Don't care about squashing the images. The network might learn to make sense of the content anyway; do scale and perspective even mean anything to the content?
Center-crop the images to a specific size. If you fear you're losing data, do multiple crops and use these to augment your input data, so that the original image will be split into N different images of correct size.
Pad the images with a solid color to a square size, then resize.
Do a combination of that.
The padding option might introduce an additional error source to the network's prediction, as the network might (read: likely will) be biased to images that contain such a padded border.
If you need some ideas, have a look at the Images section of the TensorFlow documentation; there are pieces like resize_image_with_crop_or_pad that take care of the heavy lifting.
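A rough sketch of that crop/pad route, assuming a decoded image tensor and a made-up target size (the function is called resize_image_with_crop_or_pad in TF 1.x and tf.image.resize_with_crop_or_pad in TF 2.x):

import tensorflow as tf

TARGET = 224  # hypothetical fixed size that the fully connected layers expect

def to_fixed_size(image):
    # Center-crops images that are too large and zero-pads images that are
    # too small, so every example ends up TARGET x TARGET.
    return tf.image.resize_with_crop_or_pad(image, TARGET, TARGET)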
As for just don't caring about squashing, here's a piece of the preprocessing pipeline of the famous Inception network:
# This resizing operation may distort the images because the aspect
# ratio is not respected. We select a resize method in a round robin
# fashion based on the thread number.
# Note that ResizeMethod contains 4 enumerated resizing methods.
# We select only 1 case for fast_mode bilinear.
num_resize_cases = 1 if fast_mode else 4
distorted_image = apply_with_random_selector(
    distorted_image,
    lambda x, method: tf.image.resize_images(x, [height, width], method=method),
    num_cases=num_resize_cases)
They're totally aware of it and do it anyway.
Depending on how far you want or need to go, there actually is a paper called Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition that handles inputs of arbitrary sizes by processing them in a very special way.
Try making a spatial pyramid pooling layer and put it after your last convolution layer, so that the FC layers always get constant-dimensional vectors as input. During training, train on the entire dataset using a particular image size for one epoch; then, for the next epoch, switch to a different image size and continue training.
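As a simplified sketch (not the paper's exact layer), such a layer can pool the last convolutional feature map into a few fixed grids and concatenate the results, so the output length is the same no matter what spatial size comes in:

import tensorflow as tf

class SpatialPyramidPooling(tf.keras.layers.Layer):
    # Pools the feature map into 1x1, 2x2 and 4x4 grids and concatenates them,
    # giving a fixed-length vector (21 * channels) for any input size.
    def __init__(self, levels=(1, 2, 4), **kwargs):
        super().__init__(**kwargs)
        self.levels = levels

    def call(self, x):
        channels = x.shape[-1]  # assumes the channel count is known statically
        outputs = []
        for level in self.levels:
            # Resize so the map divides evenly, then max-pool each bin.
            resized = tf.image.resize(x, (level * 4, level * 4))
            pooled = tf.nn.max_pool2d(resized, ksize=4, strides=4, padding="VALID")
            outputs.append(tf.reshape(pooled, (-1, level * level * channels)))
        return tf.concat(outputs, axis=-1)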

When making a Keras model with a non-fixed input size, is it possible to apply different layers depending on the size, using a Lambda layer?

I am dealing with a convolutional image model, which I convert and store to a YAML file and then use in code.
The full size of the input image is 256 * 256, but during training I train the model on patches of size 128 * 128, and in the validation process I get the full-size image. Therefore, the input size of the model is set to None.
I would like to create a model that crops only the middle part of the image, of size 64 * 64, from this input layer. To produce the same output size (64 * 64), the model has to crop a different amount depending on the input image size. Is it possible to apply an if-else statement as in my code below? I would appreciate it if you could help me with the code.
patch = (None, None, 6)
x_input = Input(shape=patch)

def get_crop(x):
    from keras.layers import Cropping2D
    if x.get_shape().as_list()[1:3] == [256, 256]:
        return Cropping2D(cropping=(96, 96))(x)
    else:
        return Cropping2D(cropping=(32, 32))(x)

x_crop = Lambda(get_crop)(x_input)
You would be better off asking on another Stack Exchange site such as Cross Validated, but here is a short answer.
When dealing with varying image sizes, there are two solutions. The first is cropping the big image into multiple sub-images and taking a vote over the class you get for each sub-image. The second solution, which is far better, is to have a fully convolutional network: you can replace fully connected blocks with big convolutions and use a global pooling for the classification layer (GlobalAveragePooling or MaxPooling).
Note that those solutions only work if the images you get are bigger. If there are smaller images, the solution is to zoom the image or pad it, but zooming is usually better.
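A minimal sketch of that fully convolutional option, assuming RGB input and a made-up number of classes; with None for the spatial dimensions, the same model accepts images of any size:

from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 10  # hypothetical

inputs = keras.Input(shape=(None, None, 3))           # height/width left unspecified
x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = layers.GlobalAveragePooling2D()(x)                # collapses any spatial size to 64 features
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = keras.Model(inputs, outputs)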

How to use pre-trained models without classes in Tensorflow?

I'm trying to use a pretrained network such as tf.keras.applications.ResNet50 but I have two problems:
I just want to obtain the top embedding layer at the end of the network, because I don't want to do any image classification, so I think there is no need for a number of classes.
tf.keras.applications.ResNet50 takes a default parameter 'classes=1000'
Is there a way how I can omit this parameter?
My input pictures are 128*128*1 pixels and not 224*224*3
What is the best way to fix my input data shape?
My goal is to make a triplet loss network with the output of a resnet network.
Thanks a lot!
ResNet50 has a parameter include_top exactly for that purpose -- set it to False to skip the last fully connected layer. (With pooling='avg' it then outputs a feature vector of length 2048; otherwise it outputs the last convolutional feature map.)
The best way to adapt your image size is to resample the images, e.g. using the dedicated function tf.image.resize_images (tf.image.resize in TF 2.x).
Also, I did not notice at first that your input images have only one channel, thanks @Daniel. I suggest you build your 3-channel grayscale image on the GPU (not on the host using numpy) to avoid tripling your data transfer to GPU memory, using tf.tile:
im3 = tf.tile(im, (1, 1, 1, 3))
As a complement to the other answer: you will also need to make your images have three channels. Although technically not the best input for ResNet, it is the easiest solution (changing the ResNet model is an option too, if you visit the source code and change the input shape yourself).
Use numpy to pack images in three channels:
images3ch = np.concatenate([images, images, images], axis=-1)
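A rough sketch combining both answers, assuming 128*128*1 inputs and ImageNet weights; with include_top=False the classes argument is never used, and pooling='avg' gives the 2048-dimensional embedding directly:

import tensorflow as tf

inputs = tf.keras.Input(shape=(128, 128, 1))
x = tf.keras.layers.Concatenate()([inputs, inputs, inputs])   # 1 channel -> 3 channels
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet",
    input_shape=(128, 128, 3), pooling="avg")
embeddings = backbone(x)                                      # shape (batch, 2048)
model = tf.keras.Model(inputs, embeddings)                    # feed this into the triplet loss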

How could I feed a custom image into this model?

I've been following a course online and one of the exercises was to create a simple image classification model (using MNIST data) to recognize handwritten digits. I've been trying to load a custom image I drew (a 128x128 jpg), but I can't seem to figure it out. I'm really close, but I think I'm just confused about what input the model expects. Any help would be appreciated!
Here is my code
Simply convert your image to a 128x128 numpy array with values between 0 and 1.
Then:
image = Variable(torch.from_numpy(image))[None, :, :]
classification = model(image)
classification is then a PyTorch Variable containing the probabilities of belonging to each class.
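A rough sketch of those steps, assuming (as the answer above does) a single-channel 128x128 input scaled to [0, 1]; model is the trained network from the question, the file name is made up, and the expected size may differ for your network (raw MNIST is 28x28):

import numpy as np
import torch
from PIL import Image

# Load the drawing, force grayscale, resize to the expected input size.
img = Image.open("my_digit.jpg").convert("L").resize((128, 128))
arr = np.asarray(img, dtype=np.float32) / 255.0   # values in [0, 1]

# Add a batch dimension and run it through the already-trained model.
tensor = torch.from_numpy(arr)[None, :, :]
with torch.no_grad():
    classification = model(tensor)
print(classification.argmax(dim=-1))              # index of the most likely digit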

Tensorflow ConcatOp Error with Object Detection API

I'm following the TensorFlow Object Detection API instructions and trying to train an existing object-detection model ("faster_rcnn_resnet101_coco") with my own dataset, which has 50 classes.
So, according to my own dataset, I created:
TFRecords (for training, evaluation and testing separately)
labelmap.pbtxt
Next, I edited model.config, only for:
model -> faster_rcnn -> num_classes (90 -> 50, the number of classes of my own dataset)
train_config -> batch_size (1 -> 10)
train_config -> num_steps (200000 -> 100)
train_input_reader -> tf_record_input_reader -> input_path (to the path where the TFRecord resides)
train_input_reader -> label_map_path (to the path where labelmap.pbtxt resides)
Finally, I run the command
python train.py \
--logtostderr \
--pipeline_config_path="PATH WHERE CONFIG FILE RESIDES" \
--train_dir="PATH WHERE MODEL DIRECTORY RESIDES"
And I met the error below:
InvalidArgumentError (see above for traceback): ConcatOp : Dimensions
of inputs should match: shape[0] = [1,890,600,3] vs. shape[1] =
[1,766,600,3] [[Node: concat_1 = ConcatV2[N=10, T=DT_FLOAT,
Tidx=DT_INT32,
_device="/job:localhost/replica:0/task:0/cpu:0"](Preprocessor/sub, Preprocessor_1/sub, Preprocessor_2/sub, Preprocessor_3/sub,
Preprocessor_4/sub, Preprocessor_5/sub, Preprocessor_6/sub,
Preprocessor_7/sub, Preprocessor_8/sub, Preprocessor_9/sub,
concat_1/axis)]]
It seems to be about the dimensions of the input images, so it may be caused by not resizing the raw image data.
But as far as I know, the model automatically resizes the input images for training (doesn't it?).
So I'm stuck with this issue.
If there is a solution, I'd appreciate your answer.
Thanks.
UPDATE
When I changed my batch_size field from 10 back to 1 (the original value), it seems to train without any problem... but I don't understand why...
TaeWoo is right, you have to set batch_size to 1 in order to train Faster RCNN.
This is because FRCNN uses a keep_aspect_ratio_resizer, which in turn means that if you have images of different sizes, they will also be different sizes after the preprocessing. This practically makes batching impossible, since a batch tensor has a shape [num_batch, height, width, channels]. You can see this is a problem when (height, width) differ from one example to the next.
This is in contrast to the SSD model, which uses a "normal" resizer, i.e. regardless of the input image, all preprocessed examples will end-up having the same size, which allows them to be batched together.
Now, if you have images of different sizes, you practically have two ways of using batching:
use Faster RCNN and pad your images beforehand, either once before training or continuously as a preprocessing step (see the sketch after this list). I'd suggest the former, since this type of preprocessing seems to slow down learning a lot
use SSD, but be sure that your objects are not affected too much by distortion. This shouldn't be a very big problem; it even works as a form of data augmentation.
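A rough sketch of the padding idea, assuming a made-up upper bound on your image sizes; pad_to_bounding_box only adds zeros on the bottom/right, so pixel content is untouched, but boxes stored as normalized coordinates would have to be rescaled to the new canvas:

import tensorflow as tf

MAX_H, MAX_W = 1024, 1024  # hypothetical upper bounds over the whole dataset

def pad_to_common_size(image):
    # Every padded image has the same shape, so examples can be batched.
    return tf.image.pad_to_bounding_box(image, 0, 0, MAX_H, MAX_W)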
I had the same problem. Setting batch_size=1 does indeed seem to solve it, but I am not sure whether this will have any effect on the accuracy of the model. Would love to get the TF team's answer to this.
I had a similar problem that I want to share; maybe it will help others in similar situations. I changed the SSD object-detection net to find bboxes with a fifth variable, which is an angle. The problem was that we inserted an empty list into the angle variable of the bounding box, and then I had a problem in the tf.concat operation:
Dimensions of inputs should match: shape[0] = [1,43] vs. shape[4] = [1,0]
(shape[0] changed if I rerun the session but shape[4] stayed the same [1,0])
I fixed the problem by fixing my TFRecord to have a list of angles of the same length as the other bbox variables (xmin, xmax, ymin, ymax).
Hope it helps someone; it took me a whole day to find the problem.
Regards,
Alon
