I have a CNN that I built in TF that takes Input(shape = (1000,1000,5)). The input has 5 channels because each sample is a stack of 5 black-and-white images. All my samples are aerial shots of different locales, so the original images vary in size (and are much larger than 1000 x 1000). However, TF requires that all inputs have the same predetermined size, so we decided to crop a 1000 x 1000 region out of every image.
This obviously loses a lot of usable information. I'm looking into ways to build a network with a dynamic input shape, so that it could take the original images in full. I found multiple suggestions online, but I find problems with each.
Making a separate Input() for each image size (say I have 2 sizes), each of which could then be convolved into a tensor of the same shape. The issue is that each of these convolutional layers would be unaffected by half the observations, which is bad (unless I misunderstand how the layer below works). For example:
# a and b are fixed, different, image shapes
for image in inputs:
    if image.shape == a:
        x = Input(shape = a)
        x = conv2d_transpose(output_shape = (1000, 1000, 5))
    else:
        y = Input(shape = b)
        y = conv2d_transpose(output_shape = (1000, 1000, 5))
Using Eager Execution. All the examples I find still use a fixed input shape, so I can't think of a way to use it to iterate over samples and create different inputs. I found this thread, but the asker answered himself and I can't really say I understand his solution.
Resizing the images before the input. This is a particularly bad idea because resizing distorts the image and loses many spatial details, which are vital.
Any input (I'm sorry) would be appreciated.
Any reason not to take the maximum possible size and zero-pad inputs? That's what the fella did in the thread you linked. Kinda tacky, but it was once common practice in NLP, and can optionally take place within convolutional layers.
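A minimal sketch of the zero-padding idea, written in plain Python on nested lists so it runs without any library. The helper names pad_to_shape and pad_batch are made up for illustration, and padding at the bottom and right is just one possible choice; on real arrays the same thing would be done with the framework's own padding op.

```python
def pad_to_shape(image, target_h, target_w, fill=0):
    """Zero-pad a 2-D image (a list of rows) at the bottom and right."""
    # Extend every existing row to the target width.
    padded = [list(row) + [fill] * (target_w - len(row)) for row in image]
    # Append rows of fill values until the target height is reached.
    padded += [[fill] * target_w for _ in range(target_h - len(image))]
    return padded


def pad_batch(images, fill=0):
    """Pad every image to the largest height and width found in the batch."""
    target_h = max(len(img) for img in images)
    target_w = max(len(img[0]) for img in images)
    return [pad_to_shape(img, target_h, target_w, fill) for img in images]


# Two images of different sizes end up with one common shape.
batch = pad_batch([[[1, 2], [3, 4]], [[5, 6, 7]]])
```

After padding, every image in the batch is 2 x 3, so the batch can be stacked into a single tensor.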
If I'm working with a dataset where I have ~100,000 training images and ~20,000 validation images, each of size 32 x 32 x 3, how does the size and the dimensions of my dataset affect the number of Conv2d layers I have in my CNN? My intuition is to use fewer Conv2d layers, 2-3, because any more than 3 layers will be working with parts of the image that are too small to gain relevant data from.
In addition, does it make sense to have layers with a large number of filters, >128? My thought is that when dealing with small images, it doesn't make sense to have a large number of parameters.
Since your images have exactly the same size as the images in CIFAR-10 and CIFAR-100, just have a look at what people have tried on those datasets.
In general you can start with something like a ResNet18. Also I don't quite understand why you say
because any more than 3 layers will be working with parts of the image that are too small to gain relevant data from.
As long as you don't downsample using something like max pooling or a conv with padding 1 and stride 2, the spatial size stays 32x32 and only the number of channels changes, depending on the network.
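To make that concrete, here is a small sketch of the usual output-size arithmetic for a convolution or pooling layer. The function name conv_output_size is made up for illustration; the formula is the standard floor((n + 2p - k) / s) + 1.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# A 3x3 conv with padding 1 and stride 1 keeps 32 -> 32,
# so stacking ten of them still gives a 32x32 feature map.
size = 32
for _ in range(10):
    size = conv_output_size(size, kernel=3, stride=1, padding=1)

# The same conv with stride 2, or a 2x2 max pool with stride 2, halves it.
strided = conv_output_size(32, kernel=3, stride=2, padding=1)   # 16
pooled = conv_output_size(32, kernel=2, stride=2, padding=0)    # 16
```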
Designing networks is almost always about looking at what other people did, seeing what worked for them, and starting from there. You almost never want to do it from scratch on your own, since the iteration cycles are just too long, and the researchers at Google, Facebook, etc. who released those models had far more resources than you will ever have to find something good.
I am trying to train a model that classifies images.
The problem is that the images have different sizes. How should I format my images and/or my model architecture?
You didn't say what architecture you're talking about. Since you said you want to classify images, I'm assuming it's a partly convolutional, partly fully connected network like AlexNet, GoogLeNet, etc. In general, the answer to your question depends on the network type you are working with.
If, for example, your network only contains convolutional units - that is to say, does not contain fully connected layers - it can be invariant to the input image's size. Such a network could process the input images and in turn return another image ("convolutional all the way"); you would have to make sure that the output matches what you expect, since you have to determine the loss in some way, of course.
If you are using fully connected units though, you're in for trouble: here you have a fixed number of learned weights your network has to work with, so varying inputs would require a varying number of weights, and that's not possible.
If that is your problem, here's some things you can do:
Don't care about squashing the images. A network might learn to make sense of the content anyway; do scale and perspective mean anything to the content anyway?
Center-crop the images to a specific size. If you fear you're losing data, do multiple crops and use these to augment your input data, so that the original image will be split into N different images of correct size.
Pad the images with a solid color to a square size, then resize.
Do a combination of that.
The padding option might introduce an additional error source to the network's prediction, as the network might (read: likely will) be biased to images that contain such a padded border.
If you need some ideas, have a look at the Images section of the TensorFlow documentation; there are functions like resize_image_with_crop_or_pad (tf.image.resize_with_crop_or_pad in newer versions) that take care of most of the work.
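The multi-crop option from the list above could look roughly like this. It is only a sketch: crop_boxes is a made-up helper that returns the pixel boxes of fixed-size crops tiling an image, and the rule of adding one extra row or column so the crops reach the border is my own assumption.

```python
def crop_boxes(height, width, size, stride):
    """Return (top, left, bottom, right) boxes of size x size crops."""
    if height < size or width < size:
        raise ValueError("image is smaller than the crop size")
    tops = list(range(0, height - size + 1, stride))
    lefts = list(range(0, width - size + 1, stride))
    # Add a final row/column of crops so the image border is covered too.
    if tops[-1] != height - size:
        tops.append(height - size)
    if lefts[-1] != width - size:
        lefts.append(width - size)
    return [(t, l, t + size, l + size) for t in tops for l in lefts]


# A 2500 x 2000 image split into 1000 x 1000 crops.
boxes = crop_boxes(2500, 2000, size=1000, stride=1000)
```

Each box can then be used to slice the original image, giving N training samples of the same size from one large image.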
As for just not caring about squashing, here's a piece of the preprocessing pipeline of the famous Inception network:
# This resizing operation may distort the images because the aspect
# ratio is not respected. We select a resize method in a round robin
# fashion based on the thread number.
# Note that ResizeMethod contains 4 enumerated resizing methods.
# We select only 1 case for fast_mode bilinear.
num_resize_cases = 1 if fast_mode else 4
distorted_image = apply_with_random_selector(
    distorted_image,
    lambda x, method: tf.image.resize_images(x, [height, width], method=method),
    num_cases=num_resize_cases)
They're totally aware of it and do it anyway.
Depending on how far you want or need to go, there actually is a paper here called Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition that handles inputs of arbitrary sizes by processing them in a very special way.
Try making a spatial pyramid pooling layer. Then put it after your last convolution layer so that the FC layers always get constant-dimensional vectors as input. During training, train on the entire dataset using one particular image size for one epoch. Then, for the next epoch, switch to a different image size and continue training.
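A rough sketch of what such a layer computes, in plain Python on a single-channel feature map stored as a list of rows. The function name, the pyramid levels (1, 2, 4) and the floor/ceil rule for bin boundaries are assumptions made for illustration; a real layer would do this per channel on tensors.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map into a fixed-length vector.

    For each level n the map is split into an n x n grid of bins and the
    maximum of each bin is kept, so the output always has
    sum(n * n for n in levels) entries, whatever the input size.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceil for the end keeps every bin non-empty.
            top = (i * height) // n
            bottom = ((i + 1) * height + n - 1) // n
            for j in range(n):
                left = (j * width) // n
                right = ((j + 1) * width + n - 1) // n
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled


# Feature maps of different sizes give vectors of the same length (1 + 4 + 16).
small = [[r * 5 + c for c in range(5)] for r in range(4)]
large = [[r * 13 + c for c in range(13)] for r in range(9)]
vec_small = spatial_pyramid_pool(small)
vec_large = spatial_pyramid_pool(large)
```

Because both vectors have 21 entries, a fully connected layer placed after this step sees a constant input size.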
I am working with a convolutional image model, which I convert to a YAML file, store, and then load in my code.
The full size of the input image is 256 * 256, but I train the model on patches of size 128 * 128 and feed it the full-size image during validation. Therefore, the input size of the model is set to None.
I would like the model to crop only the middle part of the image, of size 64 * 64, from this input layer. The model therefore has to crop by a different amount depending on the input image size in order to produce the same output size (64 * 64). Is it possible to use an if-else statement for this in my code? I would appreciate it if you could help me with the code.
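from keras.layers import Input, Lambda, Cropping2D

patch = (None, None, 6)
x_input = Input(shape=patch)

def get_crop(x):
    # Note: with shape (None, None, 6) the static shape here is [None, None],
    # so this comparison is never true when the model is built.
    if x.get_shape().as_list()[1:3] == [256, 256]:
        return Cropping2D(cropping=(96, 96))(x)
    else:
        return Cropping2D(cropping=(32, 32))(x)

x_crop = Lambda(get_crop)(x_input)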
You'd better ask this on another Stack Exchange site such as Cross Validated, but here is a short answer.
When dealing with varying image sizes, there are two solutions. The first is to crop the big image into multiple sub-images and take a vote over the classes predicted for the sub-images. The second solution, which is far better, is to use a fully convolutional network: you can replace the fully connected blocks with big convolutions and use global pooling for the classification layer (GlobalAveragePooling or GlobalMaxPooling).
Note that those solutions work only if the images you get are bigger than the expected input size. If some images are smaller, the solution is to zoom the image or pad it. Zooming is the better option.
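As for the center crop from the question, the margins do not need an if-else per size; they can be computed from the input size. A minimal sketch of the arithmetic (center_crop_margins is a made-up helper; inside a graph with unknown static shapes the same computation would have to use the dynamic shape instead of Python ints):

```python
def center_crop_margins(height, width, target=64):
    """Return ((top, bottom), (left, right)) margins for a centered crop."""
    if height < target or width < target:
        raise ValueError("input is smaller than the target crop")
    top = (height - target) // 2
    left = (width - target) // 2
    # The remaining pixels go to the bottom/right, which handles odd sizes.
    return (top, height - target - top), (left, width - target - left)


full = center_crop_margins(256, 256)    # ((96, 96), (96, 96))
patch = center_crop_margins(128, 128)   # ((32, 32), (32, 32))
```

These are the same 96 and 32 used in the question, in the ((top, bottom), (left, right)) layout that Cropping2D accepts.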
I'm following the TensorFlow Object Detection API instructions and trying to train an existing object-detection model ("faster_rcnn_resnet101_coco") on my own dataset, which has 50 classes.
So, for my own dataset, I created
TFRecord files (for training, evaluation and testing separately)
labelmap.pbtxt
Next, I edited model.config, changing only model-faster_rcnn-num_classes (90 -> 50, the number of classes in my own dataset), train_config-batch_size (1 -> 10), train_config-num_steps (200000 -> 100), train_input_reader-tf_record_input_reader-input_path (to the path where the TFRecord resides) and train_input_reader-label_map_path (to the path where labelmap.pbtxt resides).
Finally, I ran the command
python train.py \
    --logtostderr \
    --pipeline_config_path="PATH WHERE CONFIG FILE RESIDES" \
    --train_dir="PATH WHERE MODEL DIRECTORY RESIDES"
And I got the error below:
InvalidArgumentError (see above for traceback): ConcatOp : Dimensions
of inputs should match: shape[0] = [1,890,600,3] vs. shape[1] =
[1,766,600,3] [[Node: concat_1 = ConcatV2[N=10, T=DT_FLOAT,
Tidx=DT_INT32,
_device="/job:localhost/replica:0/task:0/cpu:0"](Preprocessor/sub, Preprocessor_1/sub, Preprocessor_2/sub, Preprocessor_3/sub,
Preprocessor_4/sub, Preprocessor_5/sub, Preprocessor_6/sub,
Preprocessor_7/sub, Preprocessor_8/sub, Preprocessor_9/sub,
concat_1/axis)]]
It seems to be about the dimensions of the input images, so it may be caused by not resizing the raw image data.
But as far as I know, the model automatically resizes the input images for training (doesn't it?)
So I'm stuck on this issue.
If there is a solution, I'd appreciate your answer.
Thanks.
UPDATE
When I changed my batch_size field from 10 back to 1 (the original value), it seems to train without any problem... but I don't understand why...
TaeWoo is right, you have to set batch_size to 1 in order to train Faster RCNN.
This is because FRCNN uses a keep_aspect_ratio_resizer, which in turn means that if you have images of different sizes, they will also be different sizes after the preprocessing. This practically makes batching impossible, since a batch tensor has a shape [num_batch, height, width, channels]. You can see this is a problem when (height, width) differ from one example to the next.
This is in contrast to the SSD model, which uses a "normal" resizer, i.e. regardless of the input image, all preprocessed examples will end-up having the same size, which allows them to be batched together.
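A small sketch of the difference between the two resizers, using only shape arithmetic. The function names are made up, and the defaults are assumptions: min_dim=600 and max_dim=1024 are taken to match the 600 seen in the error message, and 300 x 300 is just an example fixed size. This approximates the behaviour described above, not the API's actual code.

```python
def keep_aspect_ratio_size(height, width, min_dim=600, max_dim=1024):
    """Scale so the short side becomes min_dim, unless the long side
    would then exceed max_dim; in that case scale to max_dim instead."""
    scale = min_dim / min(height, width)
    if max(height, width) * scale > max_dim:
        scale = max_dim / max(height, width)
    return round(height * scale), round(width * scale)


def fixed_size(height, width, target_h=300, target_w=300):
    """A fixed-shape resizer ignores the input size entirely."""
    return target_h, target_w


a = keep_aspect_ratio_size(1780, 1200)   # (890, 600)
b = keep_aspect_ratio_size(1532, 1200)   # (766, 600)
c = fixed_size(1780, 1200)               # (300, 300)
d = fixed_size(1532, 1200)               # (300, 300)
```

The first two results reproduce the 890 x 600 vs. 766 x 600 mismatch from the error, which cannot be concatenated into one batch, while the fixed resizer gives identical shapes.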
Now, if you have images of different sizes, you practically have two ways of using batching:
Use Faster RCNN and pad your images beforehand, either once before training or continuously as a preprocessing step. I'd suggest the former, since this type of preprocessing seems to slow down learning a lot.
Use SSD, but be sure that your objects are not affected too much by distortion. This shouldn't be a very big problem, since it works as a form of data augmentation.
I had the same problem. Setting batch_size=1 does indeed seem to solve this problem, but I am not sure whether this will have any effect on the accuracy of the model. Would love to get the TF team's answer on this.
I had a similar problem that I want to share; maybe it will help others in similar situations. I changed the SSD object-detection net to find bboxes with a fifth variable, which is an angle. The problem was that we wrote an empty list into the angle variable of the bounding box, and then I had a problem in the tf.concat operation:
Dimensions of inputs should match: shape[0] = [1,43] vs. shape[4] = [1,0]
(shape[0] changed if I rerun the session but shape[4] stayed the same [1,0])
I fixed the problem by fixing my TFRecord so that the list of angles has the same length as the other bbox variables (xmin, xmax, ymin, ymax).
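A check like the following, run before writing each record, would have caught it early. This is only a sketch: check_box_fields and the dictionary layout are made up for illustration and are not part of the Object Detection API.

```python
def check_box_fields(example):
    """Raise if the per-box lists of one example differ in length."""
    lengths = {name: len(values) for name, values in example.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"box fields differ in length: {lengths}")
    return lengths


good = {
    "xmin": [0.1, 0.4], "xmax": [0.3, 0.9],
    "ymin": [0.2, 0.5], "ymax": [0.6, 0.8],
    "angle": [15.0, 90.0],
}
check_box_fields(good)  # passes: every field has one value per box

bad = dict(good, angle=[])  # the empty angle list that caused the error
```

Calling check_box_fields(bad) raises a ValueError that names the field lengths, instead of failing later inside tf.concat.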
Hope it helps someone; it took me a whole day to find the problem.
Regards,
Alon
I'm a tensorflow beginner so please bear with me.
Right now I am trying to modify an existing Python program for a CNN that creates a super-resolution image. The code can be found here if you're interested: https://github.com/pinae/Superresolution
The input tensor has the shape <5,240,320,3>, 5 being batch size, 240 and 320 the size of the image(s) and 3 being the number of channels (RGB). I want to modify this program for black and white images, so just 1 channel -> <5,240,320,1>
First, I convert the testing and validation images to b/w:
image = image.convert('L')
Each image then gets written into an array, and this is where my issue starts. The array has the shape <240,320>. The arrays of 5 images get written into a list, which is handed over to TensorFlow.
TensorFlow expects a <5,240,320,1> tensor, but the list of images has the shape <5,240,320>, so one dimension is missing. I tried adding a dimension with np.expand_dims and the like, but without success.
input_batches = np.expand_dims(input_batches, axis=-1)
Why does the index of channels of a tensorflow placeholder seem to start at 1 while the index of resolution starts at 0?
I'm sure there will be many more issues down the road like adjusting the filters but this is where I'm stuck now.
If you have a tensor of shape [5,240,320] you can reshape it to be [5,240,320,1] with this one command
correctSizedTensor = tf.reshape( wrongSizedTensor, [5,240,320,1] )
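To see what that extra axis means, here is a library-free sketch on nested lists (shape and add_channel_axis are made-up helpers, and the tiny batch stands in for the real <5,240,320> one). On real arrays, tf.reshape as above or np.expand_dims(..., axis=-1) does the same thing.

```python
def shape(nested):
    """Shape of a regularly nested list, e.g. (5, 240, 320)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def add_channel_axis(batch):
    """Wrap every pixel of a [batch][height][width] list in its own list."""
    return [[[[pixel] for pixel in row] for row in image] for image in batch]


# A tiny grayscale batch: 2 images of 3 x 4 pixels.
gray = [[[0] * 4 for _ in range(3)] for _ in range(2)]
with_channel = add_channel_axis(gray)
```

Here shape(gray) is (2, 3, 4) and shape(with_channel) is (2, 3, 4, 1): the pixel values are unchanged, each one just sits in a length-1 channel dimension.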
You need to understand the code. I believe the problem may be more deeply rooted. With the limited information, I assume you are using network.py. In there, we see this in line 16:
self.inputs = tf.placeholder(
    tf.float32, [batch_size, dimensions[1], dimensions[0], 3], name='input_images'
)
The depth dimension is already hard coded as 3. You would have to edit that as well, amongst possibly many other things.
As a caveat, most super-resolution CNNs use small patch sizes, definitely not (240, 320). I suspect it will be hard to converge, as the batch size is small.