CV2 resizing with CNN - python

I am using cv2 to resize various images with different dimensions (e.g. 70*300, 800*500, 60*50) to a fixed 200*200 pixel dimension. Later, I feed the pictures into a CNN to classify the images (my understanding is that pictures must have the same size when fed into a CNN).
My questions:
1- How are low-resolution pictures converted into higher-resolution ones, and how are higher resolutions converted into lower ones? Will this affect the information stored in the pictures?
2- Is it good practice to use this approach with a CNN? Or is it better to pad zeros to the end of the image to reach the desired resolution? I have seen many researchers pad the end of a file with zeros when trying to detect malware files, to give all files a common dimension. Does this mean that padding is more accurate than resizing?

Resizing is done using interpolation; see https://chadrick-kwag.net/cv2-resize-interpolation-methods/ for the methods cv2 offers.
Resizing is definitely a lossy process, so you will lose some information.
Both approaches are fine and are used depending on the needs, and resizing is equally applicable here. If your CNN's predictions change drastically between the original and resized images, it is probably badly overfitted. Resizing even acts as a very light form of regularization, and it is advisable to apply further augmentation schemes to the images before CNN training.
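As a rough illustration of the interpolation point, here is a minimal cv2 sketch (the choice of INTER_AREA for shrinking and INTER_CUBIC for enlarging is a common recommendation, not the only valid one):

import cv2

def resize_to_square(img, size=200):
    # Pick the interpolation method depending on whether we shrink or enlarge.
    h, w = img.shape[:2]
    if h > size or w > size:
        interp = cv2.INTER_AREA    # averages source pixels, good for downscaling
    else:
        interp = cv2.INTER_CUBIC   # smoother interpolation when upscaling
    return cv2.resize(img, (size, size), interpolation=interp)

img = cv2.imread("example.jpg")        # hypothetical input file
resized = resize_to_square(img, 200)   # always 200x200, whatever the original size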


How can I reduce the number of channels in an MRI (.nii format) image?

I have been trying to feed a dataset of brain MRI images (the IXI dataset) to a ConvNet; however, some of the images have 140 channels and others 150 channels. How can I make all the images have the same number of channels so that I won't run into trouble with a fixed CNN input shape? I am using the nibabel library to read the .nii files.
EDIT:
I don't have much knowledge about MRI images; which channels should be discarded?
The obvious approach is:
1. Find the minimum number of channels across the samples and discard the extra channels from every other sample. The slices you keep can be taken around the middle of the volume, which will probably contain the most detail, but that depends on the specific domain.
2. Alternatively, pick the mean number of channels as the target, discard channels from images that have more, and append black slices to images that have fewer.
I assume that by "channels" you mean the number of slices, am I right? Then another approach is to duplicate some of the slices so that all volumes have 150 channels. If you think of it as data augmentation, duplicating (and probably making minor alterations) might be a good idea. Of course, depending on the actual content of your images, this may or may not be applicable.
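Building on the crop-or-pad suggestions above, here is a minimal nibabel/NumPy sketch; the target slice count and the assumption that the slice ("channel") axis is the last axis of the volume are mine, so adapt them to your data:

import nibabel as nib
import numpy as np

TARGET = 140  # hypothetical target, e.g. the minimum slice count in the dataset

def load_fixed_slices(path, target=TARGET):
    # Load the .nii volume; assume the slice axis is the last one.
    vol = nib.load(path).get_fdata()
    n = vol.shape[-1]
    if n > target:
        # Keep the central slices, cropping symmetrically from both ends.
        start = (n - target) // 2
        vol = vol[..., start:start + target]
    elif n < target:
        # Pad with black (all-zero) slices at the end.
        pad = [(0, 0)] * (vol.ndim - 1) + [(0, target - n)]
        vol = np.pad(vol, pad)
    return vol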

clever image augmentation - random zoom out

I'm building a CNN to identify facial keypoints. I want to make the net more robust, so I thought about applying some zoom-out transforms, because most pictures have roughly the same keypoint locations, so the net doesn't learn much.
my approach:
I want the augmented images to keep the original image size, so I apply MaxPool2d and then random (not necessarily equal) padding until the original size is reached.
first question
Will this work with simple average padding or zero padding? I'm sure it would be even better if I made the padding look more like the background, but is there a simple way to do that?
second question
The keypoints are the target vector; they come as a row vector of 30 values. I'm getting confused by the logic needed to transform them into the smaller space.
Generally, if an original point was at (x=5, y=7), it transforms to (x=2, y=3). I'm not sure about this, but my manual checks so far have been correct. But what do I do if two keypoints end up in the same new pixel? I can't feed the network fewer target values.
That's it. I would be happy to hear your thoughts.
I suggest using torchvision.transforms.RandomResizedCrop as part of your Compose pipeline. It will give you random zooms AND resize the resulting images to a standard size, which avoids the issues in both of your questions.
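A minimal sketch of that suggestion, assuming PIL images and an output size of 96x96 (both the size and the zoom range are illustrative values, not taken from the question):

import torchvision.transforms as T

train_transforms = T.Compose([
    # Randomly crop a region covering 60-100% of the image, then resize it back to 96x96.
    T.RandomResizedCrop(size=96, scale=(0.6, 1.0), ratio=(1.0, 1.0)),
    T.ToTensor(),
])

# augmented = train_transforms(pil_image)
# Note: if the keypoints are your targets, they must be remapped with the same
# crop/scale parameters that were applied to the image.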

Tensorflow object detection API and images size

I'm practicing with computer vision in general and specifically with the TensorFlow object detection API, and there are a few things I don't really understand yet.
I'm trying to re-train an SSD model to detect one class of custom objects (guitars).
I've been using ssd_mobilenet_v1_coco and ssd_mobilenet_v2_coco models, with a dataset of 1000K pre-labeled images downloaded from the OpenImage dataset. I used the standard configuration file, only modifying the necessary parts.
I'm getting slightly unsatisfactory detections on small objects, which is supposedly normal with SSD models. Here on Stack Overflow I saw people suggest cropping the image into smaller frames, but I'm having trouble understanding a few things:
According to the .config file and the SSD papers, images are resized to a fixed dimension of 300x300 pixels (I'm assuming this holds both when training the model and when using it for inference). So I guess the original size of the training and test/evaluation images doesn't matter, because they're always resized to 300x300 anyway? Then I don't understand why many people suggest using images of the same size as the ones the model has been trained on... does it matter or not?
It's also not really clear to me what "small objects" means in the first place.
Does it refer to the size ratio between the object and the whole image? So a small object is one that covers, say, less than 5% of the total image?
Or does it refer to the number of pixels forming the object?
In the first case, cropping the image around the object would make sense. In the second case, it shouldn't help, because the number of useful pixels identifying the object stays the same.
Thanks!
I am not sure about the answer I am giving below, but it worked for me. As you correctly said, images are resized to 300x300 in the config file of ssd_mobilenet_v2. This resizing compresses the image to 300x300 and thus loses important features. It adversely affects objects that are small in size, since they have the most to lose. Depending on the GPU power you have, you can make some changes in the config file:
1st - change the image_resizer block as follows:
image_resizer {
  fixed_shape_resizer {
    height: 600
    width: 600
  }
}
This doubles the input resolution in each dimension (in the config file).
2nd - The change above will likely throw your GPU out of memory, so you need to reduce the batch size, e.g. from 24 to 12 or 8. That can lead to overfitting, so check the regularization parameters too.
3rd - An optional method is to comment out the data augmentation options in the config file (the original answer showed a screenshot of that section, which is not reproduced here). This helps a lot and cuts training time almost in half. The trade-off is that if an image is not oriented like your training data, the model's confidence will drop and it may completely fail to recognize, say, an inverted cat.
I do not see why one would get better results by keeping the image size the SSD model was trained on. SSD detectors are fully convolutional, and convolutions are not concerned with image sizes.
'Small objects' refers to the number of pixels carrying information about the object. Here is why it makes sense to crop images to improve performance on small objects: the TensorFlow object detection API performs data augmentations before resizing images (check the inputs.transform_input_data docstrings), so cropping and then resizing the crop preserves more information than resizing the full image, because the downsizing factor is smaller for the cropped image than for the full image.
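To make the downsizing-factor argument concrete, here is a small back-of-the-envelope sketch (all sizes are made-up numbers for illustration):

# Hypothetical sizes, for illustration only.
full_w = 1200          # width of the original image
obj_w = 60             # width of a "small" object inside it
crop_w = 400           # width of a crop taken around the object
target = 300           # SSD input width from the config

factor_full = full_w / target   # 4.0   -> the object shrinks to ~15 px
factor_crop = crop_w / target   # ~1.33 -> the object keeps ~45 px

print("object width after full-image resize:", obj_w / factor_full)
print("object width after crop-then-resize:", obj_w / factor_crop)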

How to make a neural network with batches with different input shapes

I want to make a CNN or FCN that takes grayscale images as input and outputs a color image. It is very important to me that the size of the images can vary. I heard that I can only do this if I build an FCN and feed one batch with images of one size and another batch with images of another size. But I don't know how to implement this concept in TensorFlow Keras (the Python version), and I was wondering if you could provide some sample code or pseudocode? I'd appreciate that. Thanks!
I know you want to keep them all at their original size, but that's not possible. Don't worry, though, because the resizing can take place while the images are being fed into the model (in memory); the image file on disk is only read, never modified.
Here's a great example that I frequently reference!
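As a rough sketch of the "resize while feeding" idea using tf.data (the file pattern, target size, and batch size are placeholders, not taken from the linked example):

import tensorflow as tf

TARGET = (256, 256)   # hypothetical common size

def load_pair(path):
    # Decode a color image and build a (grayscale input, color target) pair,
    # resizing on the fly so every batch has a uniform shape.
    img = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(img, TARGET)
    return tf.image.rgb_to_grayscale(img), img

ds = (tf.data.Dataset.list_files("images/*.jpg")      # placeholder path
      .map(load_pair, num_parallel_calls=tf.data.AUTOTUNE)
      .batch(16)
      .prefetch(tf.data.AUTOTUNE))

# model.fit(ds, epochs=...)   # feed the pipeline straight into your Keras model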

Add chroma noise to image

I'm training a deep neural network to improve the quality of images. The images contain some specific types of noise that I want to reduce/remove by means of a deep learning model. To do so, I'm using a huge dataset of similar clear high-res images with barely any noise, adding the specific types of noise to them, and training the network to regenerate the original image (a custom autoencoder network). With one of the several noise types this works very well so far. Without going too far into the details, adding that particular type of noise was easy.
Now I need to add another noise type to the images, more precisely: chroma noise like in the following image (the bottom right one): link
How do I artificially generate and add chroma noise to an image in Python? I can use the full range of image processing packages, PIL, numpy, OpenCV, torchvision...
You need to convert the image to a colorspace such as HSV or CIE Lab, add noise to the chromaticity channels (a and b in Lab, or H and S in HSV), and finally convert back to RGB.
This colorspace conversion step is very common, and most image toolkits should have that functionality.
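A minimal sketch of that recipe with OpenCV and NumPy; the noise level is an arbitrary choice, and note that OpenCV's 8-bit Lab representation stores a and b centered around 128:

import cv2
import numpy as np

def add_chroma_noise(bgr, sigma=10.0):
    # Convert to Lab and perturb only the chroma channels (a, b), leaving L alone.
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    noise = np.random.normal(0.0, sigma, size=lab.shape[:2] + (2,))
    lab[..., 1:] += noise
    lab = np.clip(lab, 0, 255).astype(np.uint8)
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

noisy = add_chroma_noise(cv2.imread("clean.jpg"), sigma=15)   # hypothetical input file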
