I would like to train a CNN only on a cropped part of the CNN output. This means that the input image has a resolution of w x h, and the output image produced by my model is also w x h. The loss should then be computed by comparing the centre crop of the output image (of size w/2 x h/2) with the label.
Is PyTorch taking care of the cropping or will my model not learn properly because of the cropping?
I am aware that training a CNN with a variable input size is possible; however, I am not sure if the weights will be adjusted correctly since my loss operates on a different resolution than my model's output.
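Roughly, the setup I have in mind looks like this (a minimal sketch; model, image and label are placeholders):

import torch
import torch.nn as nn

# Sketch only: compute the loss on the centre crop of the network output.
criterion = nn.MSELoss()

output = model(image)                        # (N, C, h, w), same resolution as the input
_, _, h, w = output.shape
top, left = h // 4, w // 4                   # centre crop of size h/2 x w/2
center = output[:, :, top:top + h // 2, left:left + w // 2]

loss = criterion(center, label)              # label is given at the cropped resolution
loss.backward()                              # gradients flow only through the cropped pixels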
Thanks!
I am using an autoencoder to reduce the dimension of a matrix, and I find that the accuracy is zero, which indicates that the data reconstructed by the autoencoder is very different from the original data. Why?
Part of the code is shown below.
self.autoencoder = Model(inputs=self.input_factor, outputs=self.decoded)
self.autoencoder.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
Supplement:
I'm working on the cancer dataset.
The input to the encoder is a matrix of (141, 39505).
The goal is to reduce the dimension of the matrix to (141,100).
The problem is that the accuracy of the encoder is always zero.
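For reference, a self-contained version of the setup looks roughly like this (the dense layers and the activation choices are assumptions; in my code this lives inside a class, hence the self. in the snippet above):

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Assumed layout: 39505 input features -> 100-dimensional code -> 39505 reconstruction.
input_factor = Input(shape=(39505,))
encoded = Dense(100, activation='relu')(input_factor)
decoded = Dense(39505, activation='linear')(encoded)   # activation is a placeholder

autoencoder = Model(inputs=input_factor, outputs=decoded)
autoencoder.compile(optimizer='adam', loss='mse', metrics=['accuracy'])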
When loading a model and using it for inference, I feed in an array of images, image_tile_tensor, whose shape is (total_tile, tile_height, tile_width, 3); image_tile_tensor is a NumPy array.
I input this into my model for inference using the code below:
image_tile_tensor = tf.image.convert_image_dtype(image_tile_tensor, tf.float32)
image_tile_prediction = model.predict(image_tile_tensor, verbose=1)
During inference, all the predictions come out nicely except for the last few image tiles in image_tile_tensor. For example, if I have 112 image tiles in total, the last 12 tiles have predictions in which every value is NaN, while the first 100 tiles have the expected prediction values.
Any idea what the problem might be? I am a bit lost and don't know where to start debugging this at the moment. If it helps, the input tile_height and tile_width are (192, 192) and the training batch size is 16.
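As a starting point I was thinking of sanity checks along these lines (nothing here is specific to my model; image_tile_tensor and model are as above):

import numpy as np
import tensorflow as tf

# Check whether the NaNs are already present in the raw tiles
# (e.g. from padding at the image border) before blaming the model.
print(np.isnan(image_tile_tensor).any())
print(image_tile_tensor.dtype, image_tile_tensor.min(), image_tile_tensor.max())

# Note: tf.image.convert_image_dtype only rescales integer inputs; if the
# array is already float it is cast without rescaling, so an unexpected
# value range can reach the model.
tiles = tf.image.convert_image_dtype(image_tile_tensor, tf.float32)

# Identify which tiles produce NaN predictions.
preds = model.predict(tiles, verbose=1)
bad_tiles = np.unique(np.argwhere(np.isnan(preds))[:, 0])
print("tiles with NaN predictions:", bad_tiles)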
I am trying to train a U-Net model to do per-pixel regression predictions on images. To do this, I split my large image (1000x1000) into 200x200-pixel squares and use them to train an FCN model with a linear final layer; the loss function is MSE. In the prediction stage, I extract the same boxes, stitch them together, and obtain a final output image. The problem I am getting is that there are discontinuities at the boundaries between boxes (I can clearly see the boxes).
I've tried to deal with this by feeding 250x250 boxes to my FCN and calculating the loss only on the 200x200 centre region. I do the same in the prediction stage: extract 250x250 patches, crop out the 200x200 centre region, and stitch the image back together. Please see some code below:
Loss Function:
criterion = nn.MSELoss()
optimizer = optim.Adam(self.model.parameters(), lr=LR)

for inputs, labels in train_loader:
    inputs, labels = inputs.to(device), labels.to(device)
    optimizer.zero_grad()
    output = model(inputs)
    output = output.squeeze()
    _, dimx, dimy = output.shape
    # compute the loss only on the 200x200 centre of each 250x250 patch
    loss = criterion(output[:, 25:dimx-25, 25:dimy-25], labels[:, 25:dimx-25, 25:dimy-25])
    loss.backward()
    optimizer.step()
My code for predictions is as follows:
pred = np.zeros((height, width))
for i in range(25, height, 200):
    for j in range(25, width, 200):
        # take a 250x250 patch so the model has 25 px of context on each side
        patch = img[:, i-25:i+225, j-25:j+225]
        patch = torch.from_numpy(patch)
        patch = patch.unsqueeze(dim=0).to(device)
        out = model(patch)
        # keep only the 200x200 centre of the prediction and stitch it in
        out = out[0, 0, 25:225, 25:225]
        pred[i:i+200, j:j+200] = out.detach().cpu().numpy()
I'm not sure if my problem makes complete sense. I can provide more clarification if necessary but I have been stuck on this for a while now.
It makes sense to have discontinuities near the boundaries, because nothing in the training requires the network to produce smooth predictions across boxes.
I assume you have limited GPU memory, so you take only 200x200 pixels as input at a time; thus, I would suggest the following two possible workarounds.
First, you could use torchvision.transforms.RandomCrop to generate 200x200 cropped regions as training inputs. At test time, you directly input the whole image to do the prediction. The intuition is that the model sees the full resolution of the images, the same as the test data, while consuming less GPU memory during training. In this case, you should also expect the model to need more time to learn all the training-data patterns, because it only sees partial data at a time.
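A rough sketch of this first idea, cropping the image and its per-pixel label with the same random window (the helper name is illustrative):

import torchvision.transforms as T
import torchvision.transforms.functional as TF

def random_crop_pair(image, target, size=200):
    # Pick one random 200x200 window and apply it to both tensors
    # so the input and the per-pixel label stay aligned.
    i, j, h, w = T.RandomCrop.get_params(image, output_size=(size, size))
    return TF.crop(image, i, j, h, w), TF.crop(target, i, j, h, w)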
Second, you could simply downsample the training inputs (to roughly 0.2x in your case) while keeping the output at the original resolution. For example, after downsampling the input image to 200x200, the model takes it and predicts 1000x1000 pixel-level labels (you could use bilinear upsampling or deconv layers). This workaround has been used in some segmentation implementations (AdaptSeg, DISE).
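And a rough sketch of the second idea, assuming a final interpolation step brings the prediction back to full resolution (full_image, full_labels, model and criterion are placeholders; full_image is a (N, C, 1000, 1000) tensor):

import torch.nn.functional as F

# Downsample the 1000x1000 input before the forward pass, then upsample
# the prediction back to full resolution for the pixel-level loss.
small = F.interpolate(full_image, size=(200, 200), mode='bilinear', align_corners=False)
out = model(small)
out = F.interpolate(out, size=(1000, 1000), mode='bilinear', align_corners=False)
loss = criterion(out, full_labels)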
After some troubleshooting I realized that I had this problem because I was performing batch normalization between each convolutional layer. Removing that step solved the discontinuity problem.
I have a dataset containing 2000 images that look like this https://imgur.com/a/IG3yLpe
The goal is to use CNN to predict the center point (x,y) of the rounded rectangle shape. Center points associated with training images are provided as training labels.
Here is the general flow of my program:
Crop the original images (2592 x 1944) into regions of interest (ROIs) of (2092 x 544)
Use data augmentation techniques (rotation, Gaussian noise, brightness, zoom, shifting) to increase the image quantity to 20000
Crop the augmented ROI images to 64x64 NumPy arrays (aspect ratio is not preserved)
Use 10% of the dataset as validation set
Fit the above images into a CNN model. I use the default Adam optimizer. The model summary and training results are in this file (an illustrative sketch of the model follows this list):
https://gofile.io/?c=G9z8Y7
Evaluate the model on the whole dataset + 100 test images
Compare the prediction results with ground truths
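The actual model summary is in the file linked above; purely as a self-contained illustration, a centre-point regressor of this shape might look like the following (layer sizes are placeholders, assuming 3-channel 64x64 crops):

from tensorflow.keras import layers, models

# Illustrative baseline only: maps a 64x64 crop to the (x, y) centre point.
model = models.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation='relu'),
    layers.GlobalAveragePooling2D(),
    layers.Dense(2)                      # predicted (x, y), e.g. normalised to [0, 1]
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])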
I use the val_mae metric to measure success. I was not satisfied with the prediction results even though val_mae reached 0.61. The largest x gap is 3.4 pixels, and only 50% of the images achieved my goal (x_gap < 1 px and y_gap < 1 px).
How should I tune my CNN model to achieve my target?
Thank you for your help!
I'm trying to calculate the gradient at some layer with respect to the input image. The gradient is defined as
feature = g.get_tensor_by_name('inception/conv2d0_pre_relu:0')
gradient = tf.gradients(tf.reduce_max(feature, 3), x)
and my input image has a spatial size of (299, 299), which is the size that Inception is trained at:
print(img.shape)
# output (299,299,3)
Then the gradient with respect to the input can be calculated as
img_4d=img[np.newaxis]
res = sess.run(gradient, feed_dict={x: img_4d})[0]
print(res.shape)
# output (1,299,299,3)
We see that the gradient has the same shape as the input image, which is expected.
However, it appears that one can use an image of any size and still get the gradient. For example, if I have an img_resized with shape (150,150,3), the gradient with respect to this input will also have shape (150,150,3):
img_resized=skimage.transform.resize(img, [150,150], preserve_range=True)
img_4d=img_resized[np.newaxis]
res = sess.run(gradient, feed_dict={x: img_4d})[0]
res.shape
# output (1,150,150,3)
So why does this work? In my naive picture, the dimensions of the input image must be fixed at (299,299,3), and the gradient at some layer with respect to the input should therefore always have shape (299,299,3). Why is it able to produce a gradient of other sizes?
In other words, what happens in the above code? When we feed an image with shape (150,150,3), does TensorFlow resize the image to (299,299,3), calculate the gradient with shape (299,299,3), and then resize the gradient back to (150,150,3)?
This is expected behaviour, especially in the case of Inception, which can work with input of any size because it is a fully convolutional network. Unlike AlexNet or VGG, which rely on fully connected layers in the later part of the network, fully convolutional networks can work on input of any size. Hope this answers your question.
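As a minimal illustration (Keras is used here only for brevity; layer sizes are arbitrary), a purely convolutional graph places no constraint on the spatial dimensions:

import tensorflow as tf

# Every layer is a convolution, so the spatial dimensions can stay None
# and the same weights apply to any input resolution.
inp = tf.keras.Input(shape=(None, None, 3))
x = tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu')(inp)
x = tf.keras.layers.Conv2D(8, 3, padding='same', activation='relu')(x)
fcn = tf.keras.Model(inp, x)

print(fcn(tf.zeros([1, 299, 299, 3])).shape)   # (1, 299, 299, 8)
print(fcn(tf.zeros([1, 150, 150, 3])).shape)   # (1, 150, 150, 8)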