Issue with tensorflow shape with b/w images - python

I'm a tensorflow beginner so please bear with me.
Right now I am trying to modify an existing Python program for a CNN that creates a super-resolution image. The code can be found here if you're interested: https://github.com/pinae/Superresolution
The input tensor has the shape <5,240,320,3>, 5 being batch size, 240 and 320 the size of the image(s) and 3 being the number of channels (RGB). I want to modify this program for black and white images, so just 1 channel -> <5,240,320,1>
First, I convert the testing and validation images to b/w:
image = image.convert('L')
Each image then gets written into an array, and this is where my issue starts: each array has the shape <240,320>. The arrays of 5 images get written into a list, which is handed over to TensorFlow.
TensorFlow expects a <5,240,320,1> tensor, but the list of images has the shape <5,240,320>, so one dimension is missing. I tried adding a dimension with np.expand_dims and the like, but with no success.
input_batches = np.expand_dims(input_batches, axis=-1)
Why does the channel index of a TensorFlow placeholder seem to start at 1 while the resolution index starts at 0?
I'm sure there will be many more issues down the road, like adjusting the filters, but this is where I'm stuck now.

If you have a tensor of shape [5,240,320], you can reshape it to [5,240,320,1] with this one command:
correctSizedTensor = tf.reshape(wrongSizedTensor, [5, 240, 320, 1])
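On the NumPy side you can do the equivalent before the batch ever reaches TensorFlow; here is a minimal sketch with placeholder data (the variable names are hypothetical, not taken from the repository):

import numpy as np

# five hypothetical grayscale 240x320 images collected in a list
images = [np.zeros((240, 320), dtype=np.float32) for _ in range(5)]

input_batches = np.stack(images)                         # shape (5, 240, 320)
input_batches = np.expand_dims(input_batches, axis=-1)   # shape (5, 240, 320, 1)
print(input_batches.shape)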

You need to understand the code; I believe the problem is more deeply rooted. With the limited information, I assume you are using network.py. There, we see this at line 16:
self.inputs = tf.placeholder(
    tf.float32, [batch_size, dimensions[1], dimensions[0], 3], name='input_images'
)
The depth (channel) dimension is hard-coded as 3. You would have to edit that as well, amongst possibly many other things.
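For the black-and-white case, a minimal sketch of that edit could parameterise the channel count instead of hard-coding it (channels is a hypothetical name, not part of the original code; TF 1.x-style placeholders are assumed, as in the repository):

channels = 1  # hypothetical parameter: 1 for grayscale, 3 for RGB
self.inputs = tf.placeholder(
    tf.float32,
    [batch_size, dimensions[1], dimensions[0], channels],
    name='input_images'
)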
As a caveat, most super-resolution CNNs use small patch sizes, definitely not (240, 320); I also suspect it will be hard to converge, as the batch size is small.

Related

How do I stack the color planes of multiple images and run inference on that?

As part of my project, I chose the topic of remote sensing and deep learning.
I obtained a few images using remote sensing techniques and loaded them into Colab.
(A picture in the original post showed the shape of each image.)
My model, however, requires the data in a different format, the one it was trained on, where 3039 denotes the number of training samples.
So I have to change (2583,1900), (2411,2571), (2583,1900) and (2583,1900,3) into a single array with dimensions (1,128,128,6).
The problem is how to turn three single-channel arrays and one three-channel array into one six-channel array.
How do I do that? Please help me.
You can first resize the arrays to 128x128, then concatenate them along the last dimension:
x = np.concatenate((
    np.resize(dem, (1, 128, 128, 1)),
    np.resize(slope, (1, 128, 128, 1)),
    np.resize(nvdi, (1, 128, 128, 1)),
    np.resize(rgb, (1, 128, 128, 3))), -1)
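Note that np.resize repeats or truncates the raw data rather than resampling the image, so if true image resizing is intended, a sketch along these lines with OpenCV may be closer to what is needed (dem, slope, nvdi and rgb are the arrays from the answer above; cv2 is an assumed extra dependency):

import cv2
import numpy as np

def to_patch(arr, channels):
    # resample to 128x128, then add batch and channel axes
    resized = cv2.resize(arr, (128, 128))
    return resized.reshape(1, 128, 128, channels)

x = np.concatenate((
    to_patch(dem, 1),
    to_patch(slope, 1),
    to_patch(nvdi, 1),
    to_patch(rgb, 3)), axis=-1)   # final shape (1, 128, 128, 6)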

Dealing with varying input sizes for TF

I have a CNN that I built in TF that takes Input(shape=(1000, 1000, 5)). This is because each sample is a stack of 5 black-and-white images. All my samples are aerial shots of different locales, so the original images are of varying sizes (and much larger than 1000x1000). However, TF requires all inputs to have the same pre-determined size, so we decided to crop a 1000x1000 patch from within each image.
This obviously loses a lot of usable information. I'm looking into ways to build a network with a dynamic input shape that could take the original images in full. I found multiple suggestions online, but I see problems with each:
Making a separate Input() for each image size (say I have 2 different sizes), each of which could then be convolved into a tensor of the same shape. The issue is that this means each convolutional layer is unaffected by half the observations, which is bad (unless I misunderstand how the layers below work). For example:
# a and b are fixed, different image shapes
for image in inputs:
    if image.shape == a:
        x = Input(shape=a)
        x = conv2d_transpose(output_shape=(1000, 1000, 5))
    else:
        y = Input(shape=b)
        y = conv2d_transpose(output_shape=(1000, 1000, 5))
Using eager execution. All the examples I find for it still use a fixed input shape, so I can't think of a way to use it to iterate over samples and create different inputs. I found this thread, but the asker answered his own question and I can't really say I understand the solution.
Resizing the images before the input. This is a particularly bad idea, because resizing distorts the image and loses plenty of spatial elements, which are vital.
Any input (I'm sorry) would be appreciated.
Any reason not to take the maximum possible size and zero-pad inputs? That's what the fella did in the thread you linked. Kinda tacky, but it was once common practice in NLP, and can optionally take place within convolutional layers.
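A minimal NumPy sketch of that zero-padding idea, assuming each sample is a stack of five single-channel aerial images and samples is a hypothetical list of such arrays:

import numpy as np

def pad_to(img, target_h, target_w):
    # img has shape (h, w, 5); zero-pad on the bottom/right up to the target size
    h, w, c = img.shape
    padded = np.zeros((target_h, target_w, c), dtype=img.dtype)
    padded[:h, :w, :] = img
    return padded

# pad every sample to the largest height/width found in the dataset
target_h = max(img.shape[0] for img in samples)
target_w = max(img.shape[1] for img in samples)
batch = np.stack([pad_to(img, target_h, target_w) for img in samples])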

How can only giving number of channels and no height and width to my convolutional neural network work?

Hello, I am a bit new to the deep learning community and I have been really struggling with how to feed data through a neural network. I was following the sentdex PyTorch series and learning about convnets. He was using Microsoft's cats and dogs dataset from Kaggle, resized the images to 50 by 50, and turned them into grayscale. If you want to see the video my question refers to, here it is:
https://pythonprogramming.net/convnet-model-deep-learning-neural-network-pytorch/
A few thoughts came to my mind while watching the video. The input he passed is only the colour channel of the image (the layer definition was shown in a screenshot in the original post).
As soon as I saw the input he entered, I wondered why he is only passing the number of channels for a grayscale image, when a conv2d takes 3 inputs.
And it literally works. I tried researching a bit, but nowhere did I find a good explanation for the input shape that is being fed in here.
So I have 2 thoughts and questions about this:
Does that line mean that the convolutional neural network will only take in grayscale images of any height and width? If so, please tell me how to limit the dimensions so that our CNN only accepts an input shape of (50, 50, 1).
And if not, then please explain what it means and how we can make it accept any input.
Convolutional layers use the convolution operation, i.e. sliding a kernel (matrix) over the input and taking the sum of elementwise products at each position. Thus, the input dimensions affect the output dimensions; however, it is not necessary to fix the input dimensions.
So the layer can be defined as nn.Conv2d(1, 32, 5), where 1 is the number of input channels, 32 is the number of output channels, and 5 is the kernel size (5x5 in this case, since it is 2D).
The 32 output channels mean that there are 32 such 5x5 kernels, each applied to the input, and their outputs are stacked to give an output of h x w x 32. Note that this h and w will differ from h_in and w_in if no padding is used, but stay the same if you use (appropriate) padding.
1 input channel mentioned in the layer means that the layer will accept only single channeled inputs (which are effectively grayscale images).
If you want to limit your CNN to use (50, 50, 1) inputs only, then you can resize the image before feeding it (you can do that using OpenCV).
Check this site for some animations of convolutions.
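A minimal PyTorch sketch of this behaviour (the input sizes are made up for illustration): the same layer accepts single-channel inputs of different heights and widths.

import torch
import torch.nn as nn

conv = nn.Conv2d(1, 32, 5)      # 1 input channel, 32 output channels, 5x5 kernel

a = torch.randn(1, 1, 50, 50)   # one 50x50 grayscale image
b = torch.randn(1, 1, 120, 80)  # a differently sized grayscale image

print(conv(a).shape)            # torch.Size([1, 32, 46, 46])
print(conv(b).shape)            # torch.Size([1, 32, 116, 76])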
Update: Adding more things asked in the comments by the OP.
Yes, you can input images of any shape (I suppose they still have to be at least the size of the kernel). So, theoretically, you can input any image to a convolutional layer, but not necessarily to your CNN. That is because the CNN may have flattening operations followed by fully connected layers (nn.Linear). These flattening + fully connected layers expect certain dimensions (which you fix in the code), so you cannot give an arbitrary input image to your CNN; i.e. you have to ensure that the flattened output of the last convolutional layer has the same dimension as the input of your first fully connected layer.
Edit: You can actually give any sized input even for a CNN containing fully-connected layers by using a Global Average Pooling (GAP) layer to reduce the size to a fixed size irrespective of the input size. It is called Adaptive Average Pooling in PyTorch.
For example, consider this network (shown in an image attached to the original answer).
In it, the convolutional kernel sizes are given below the arrows, and the blue cuboids represent the output after each convolutional layer. At the end, there are fully connected layers (boxes with circles) which have fixed dimensions. So the last convolutional layer's output has dimension 6 x 6 x 256 = 9216, which is also the input dimension of the first fully connected layer.
So, basically, you design your network such that the last convolutional output flattened has same dimensions as the first fully connected layer. Note that there are some networks called Fully Convolutional Networks (FCNs) which don't use these fully connected layers and thus are input size independent. The network design and choice of layers depends on your application.
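As a rough illustration of the adaptive-pooling idea mentioned above (the layer sizes here are made up and not taken from the network in the image):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 32, 5), nn.ReLU(),
    nn.Conv2d(32, 64, 5), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),    # -> (batch, 64, 1, 1) regardless of input size
    nn.Flatten(),
    nn.Linear(64, 2),           # fixed-size input thanks to the pooling layer
)

for size in [(50, 50), (120, 80)]:
    x = torch.randn(1, 1, *size)
    print(model(x).shape)       # torch.Size([1, 2]) in both cases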

How to use pre-trained models without classes in Tensorflow?

I'm trying to use a pretrained network such as tf.keras.applications.ResNet50 but I have two problems:
I just want to obtain the top embedding layers at the end of the network, because I don't want to do any image classification, so I think there is no need for a class count.
tf.keras.applications.ResNet50 takes a default parameter 'classes=1000'
Is there a way how I can omit this parameter?
My input pictures are 128*128*1 pixels and not 224*224*3
What is the best way to fix my input data shape?
My goal is to make a triplet loss network with the output of a resnet network.
Thanks a lot!
ResNet50 has a parameter include_top exactly for that purpose: set it to False to skip the last fully connected layer. (With pooling='avg' it then outputs a feature vector of length 2048.)
The best way to reduce your image size is to resample the images, e.g. using the dedicated function tf.image.resize_images (tf.image.resize in TF 2.x).
Also, I did not notice at first that your input images have only one channel, thanks @Daniel. I suggest you build your 3-channel "grayscale" image on the GPU (not on the host using numpy) to avoid tripling your data transfer to GPU memory, using tf.tile:
im3 = tf.tile(im, (1, 1, 1, 3))
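Putting both points together, a hedged sketch in TF 2.x / Keras style (the variable names are hypothetical, and 224x224 is simply ResNet50's default input size):

import tensorflow as tf

inputs = tf.keras.Input(shape=(128, 128, 1))                                  # grayscale input
x = tf.keras.layers.Lambda(lambda im: tf.image.resize(im, (224, 224)))(inputs)
x = tf.keras.layers.Lambda(lambda im: tf.tile(im, (1, 1, 1, 3)))(x)           # fake 3 channels

base = tf.keras.applications.ResNet50(include_top=False, pooling='avg',
                                      weights='imagenet', input_shape=(224, 224, 3))
embeddings = base(x)                                                          # shape (None, 2048)

model = tf.keras.Model(inputs, embeddings)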
As a complement to the other answer: you will also need to make your images have three channels. Although technically not the best input for ResNet, it is the easiest solution (changing the ResNet model is an option too, if you visit the source code and change the input shape yourself).
Use numpy to pack images in three channels:
images3ch = np.concatenate([images,images,images], axis=-1)

Tensorflow ConcatOp Error with Object Detection API

I'm following the TensorFlow Object Detection API instructions and trying to train an existing object detection model ("faster_rcnn_resnet101_coco") on my own dataset, which has 50 classes.
So, for my own dataset, I created:
TFRecords (for training, evaluation and testing separately)
labelmap.pbtxt
Next, I edited model.config, changing only model.faster_rcnn.num_classes (90 -> 50, the number of classes in my own dataset), train_config.batch_size (1 -> 10), train_config.num_steps (200000 -> 100), train_input_reader.tf_record_input_reader.input_path (to the path where the TFRecords reside) and train_input_reader.label_map_path (to the path where labelmap.pbtxt resides).
Finally, I ran the command:
python train.py \
--logtostderr \
--pipeline_config_path="PATH WHERE CONFIG FILE RESIDES" \
--train_dir="PATH WHERE MODEL DIRECTORY RESIDES"
And I got the error below:
InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,890,600,3] vs. shape[1] = [1,766,600,3]
[[Node: concat_1 = ConcatV2[N=10, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](Preprocessor/sub, Preprocessor_1/sub, Preprocessor_2/sub, Preprocessor_3/sub, Preprocessor_4/sub, Preprocessor_5/sub, Preprocessor_6/sub, Preprocessor_7/sub, Preprocessor_8/sub, Preprocessor_9/sub, concat_1/axis)]]
It seems to be about the dimensions of the input images, so it may be caused by not resizing the raw image data.
But as far as I know, the model automatically resizes the input images for training (doesn't it?).
Then I'm stuck on this issue.
If there is a solution, I'd appreciate your answer.
Thanks.
UPDATE
When I changed my batch_size field from 10 back to 1 (the original value), it seems to train without any problem... but I don't understand why...
TaeWoo is right, you have to set batch_size to 1 in order to train Faster RCNN.
This is because FRCNN uses a keep_aspect_ratio_resizer, which in turn means that if you have images of different sizes, they will also be different sizes after the preprocessing. This practically makes batching impossible, since a batch tensor has a shape [num_batch, height, width, channels]. You can see this is a problem when (height, width) differ from one example to the next.
This is in contrast to the SSD model, which uses a "normal" resizer, i.e. regardless of the input image, all preprocessed examples will end up having the same size, which allows them to be batched together.
Now, if you have images of different sizes, you practically have two ways of using batching:
Use Faster RCNN and pad your images beforehand, either once before training or on the fly as a preprocessing step. I'd suggest the former, since this type of preprocessing seems to slow down learning a lot.
Use SSD, but make sure your objects are not affected too much by the distortion. This shouldn't be a very big problem; it even works as a form of data augmentation.
I had the same problem. Setting batch_size=1 does indeed seem to solve it, but I am not sure whether this will have any effect on the accuracy of the model. Would love to get the TF team's answer to this.
I had a similar problem that I want to share; maybe it will help others in similar situations. I changed the SSD object detection net to predict bounding boxes with a fifth variable, an angle. The problem was that we inserted an empty list into the angle variable of the bounding box, and then I got a problem in the tf.concat operation:
Dimensions of inputs should match: shape[0] = [1,43] vs. shape[4] = [1,0]
(shape[0] changed when I reran the session, but shape[4] stayed the same, [1,0].)
I fixed the problem by fixing my TFRecord to have a list of angles of the same length as the other bbox variables (xmin, xmax, ymin, ymax).
Hope it helps someone; it took me a whole day to find the problem.
Regards,
Alon
