Currently, I'm working with a dataset that contains two kinds of images: a "sharp version" of each image and a "blurry version" of the same image, where the blur was added synthetically. My goal is to train a model that takes the blurry version as input and deblurs it as much as it can, so that the "deblurred image" is as close as possible to the sharp version. In the literature, the U-Net architecture seems to give good results. Additionally, I can use a pre-trained U-Net via PyTorch (https://pytorch.org/hub/mateuszbuda_brain-segmentation-pytorch_unet/).
My problem is now: When I train this pre-trained U-Net with my images and then try it on my test set, I get the following output:
The original image:
I know that this pre-trained model is usually used for biomedical image segmentation but I'm rather confused about how I have to modify the model to use it for an Image Deblurring/Reconstruction task. Does anyone have any advice on how to do this?
I would appreciate any feedback :)
The U-Net you're using is built for segmentation (classifying each pixel of the image), whereas you're trying to restore the image (make it sharper). That explains the results you got.
To get what you want, and as DerekG said, you first need to change the number of output channels. Once you change it, you can no longer load the whole pretrained model in one go. You have to copy the parameters layer by layer, up to but not including the last layer.
Because the last layer is then initialized randomly, you need to retrain the model on your training set. You can either freeze the pretrained parts or fine-tune them as well.
Also, I'm not sure what your new dataset is but if it's really not related to biomedical images you should retrain your network from scratch (transfer learning shouldn't be done in these cases), maybe even change the encoder-decoder network.
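A minimal sketch of the "copy everything except the last layer" idea, assuming PyTorch. To keep it runnable offline, `TinyNet` is a hypothetical stand-in for the U-Net and `copy_matching_weights` is an illustrative helper, not part of the linked repository. With the real model you would pass the pretrained network as `pretrained` and a freshly built one with `out_channels=3` as `target`.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the U-Net: a body plus a final 1x1 conv head."""

    def __init__(self, in_channels=3, out_channels=1, features=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, features, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.head = nn.Conv2d(features, out_channels, kernel_size=1)

    def forward(self, x):
        return self.head(self.body(x))


def copy_matching_weights(pretrained, target):
    """Copy every parameter whose name and shape match; return the copied names."""
    source_state = pretrained.state_dict()
    target_state = target.state_dict()
    matching = {
        name: tensor
        for name, tensor in source_state.items()
        if name in target_state and tensor.shape == target_state[name].shape
    }
    # strict=False lets the mismatched head keep its random initialisation.
    target.load_state_dict(matching, strict=False)
    return sorted(matching)


segmentation_net = TinyNet(out_channels=1)  # plays the role of the pretrained model
deblur_net = TinyNet(out_channels=3)        # same body, RGB output
copied = copy_matching_weights(segmentation_net, deblur_net)

# Optionally freeze the copied layers so that only the new head is trained.
for name, param in deblur_net.named_parameters():
    param.requires_grad = name not in copied
```

Whether to freeze the copied layers is a judgment call. If the new images differ a lot from the pretraining data, leaving every parameter trainable is probably the safer default.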
From the included link:
    import torch
    model = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
        in_channels=3, out_channels=1, init_features=32, pretrained=True)
The model you define has a single output channel, resulting in a grayscale image output. You need 3 output channels for an RGB image.
Related
For my project, I have to feed an image, which is an apparent resistivity model, into a convolutional neural network and output a corresponding "true" model image. The idea is like this: apparent resistivity model -> true model
Both sets of images are saved as tiff files in different folders. From what I understand I need to convert these to floating-point tensors to feed them into the CNN. However I'm confused as to the overall big picture - how do I feed both the apparent resistivity model and the true model into the CNN for training? Can I simply create a function that will extract the images from their directories (it is very important this is done in the correct corresponding order), convert them to arrays, normalise them, and then store them in lists? And then pass both sets of lists into the CNN as X_train and y_train?
After training, I will feed more apparent resistivity pictures into the CNN to see the image output and compare it to the "real" true model, in order to check the performance of the model. How will I get the CNN to output an image?
Sorry if these questions are too vague, large or basic. The tutorials I've seen online all deal with training a neural net for classification, which is not helpful to me. Thanks in advance.
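The loading steps described in the question could be sketched as follows. This assumes that both folders use identical file names for corresponding images. `load_image` is a hypothetical caller-supplied function that reads one TIFF file into a 2D array, and the per-image min-max scaling is only one possible normalisation.

```python
from pathlib import Path

import numpy as np


def pair_by_filename(input_dir, target_dir, pattern="*.tif*"):
    """Return (input_path, target_path) pairs matched by file stem."""
    inputs = {p.stem: p for p in Path(input_dir).glob(pattern)}
    targets = {p.stem: p for p in Path(target_dir).glob(pattern)}
    if inputs.keys() != targets.keys():
        unmatched = sorted(inputs.keys() ^ targets.keys())
        raise ValueError(f"Files without a partner: {unmatched}")
    return [(inputs[stem], targets[stem]) for stem in sorted(inputs)]


def normalise(image):
    """Scale an array to the range [0, 1] as float32."""
    image = np.asarray(image, dtype=np.float32)
    span = image.max() - image.min()
    if span == 0:
        return np.zeros_like(image)
    return (image - image.min()) / span


def build_arrays(pairs, load_image):
    """Stack the paired images into X and y arrays of shape (N, H, W, 1)."""
    X = np.stack([normalise(load_image(a)) for a, _ in pairs])
    y = np.stack([normalise(load_image(b)) for _, b in pairs])
    return X[..., np.newaxis], y[..., np.newaxis]
```

Matching by file name instead of relying on directory listing order makes a silent misalignment between the two folders much less likely. The resulting `X` and `y` can then be passed to training as inputs and targets. For the network to output an image, its last layer has to produce a tensor with the same height and width as the target, for example a convolution with a single filter.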
TensorFlow 2.3 introduced new preprocessing layers, such as tf.keras.layers.experimental.preprocessing.Resizing.
However, the typical flow for training on images with Keras uses tf.keras.preprocessing.image.ImageDataGenerator, which only accepts a fixed target_size parameter. As far as I understand, the root cause is that Keras handles each batch of images as a single NumPy array in the background, so all images have to be the same size (is that true?).
While I could then use a model with a resizing layer that was trained on a fixed size to then predict images of arbitrary size, this seems to be risky since the training data and inference data would have systematic differences. One workaround could be to use ImageDataGenerator with a target_size and interpolation method that match the ones of the resizing layer, so that during training the resizing layer basically does nothing, but then it seems that the resizing layer is not really of any benefit.
So the question is, is there a way to train directly on mixed size images to fully take advantage of the resizing layer?
Models generally need to operate on images of a fixed size. If you train a model with a fixed input size, for example 224 x 224, then any image you want to predict on must be resized to 224 x 224 first. More generally, whatever preprocessing you applied to the training images must also be applied to the images you want to predict. For example, if your model was trained on RGB images but the images you want to predict are BGR (as when reading images with cv2), the results will be incorrect, and you need to convert them to RGB. Similarly, if you rescaled your training images by dividing by 255, you should rescale the images you want to predict in the same way.
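As a small illustration of matching the training preprocessing, here is a sketch that assumes the model was trained on RGB images rescaled by 1/255. The function name is hypothetical, and resizing to the training size is left out to keep the example NumPy-only.

```python
import numpy as np


def preprocess_like_training(bgr_image):
    """Convert a BGR uint8 image to RGB floats in [0, 1]."""
    rgb = bgr_image[..., ::-1]             # reverse the channel axis: BGR -> RGB
    return rgb.astype(np.float32) / 255.0  # same rescaling as during training


# A single pure-blue pixel in BGR order, as cv2.imread would return it.
bgr_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)
rgb_pixel = preprocess_like_training(bgr_pixel)
print(rgb_pixel)  # blue now sits in the last channel: [[[0. 0. 1.]]]
```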
I have created my own model and trained it with Keras' ImageDataGenerator using flow_from_directory.
Like this: how to train model with batches. Everything works fine. I checked the generated batches, and the pictures look as they should.
My problem is that I want to use this trained model for online face detection. I crop the faces to the desired width and height and convert them into arrays, but the predictions are terrible.
I think the live-streamed image has to be in the same form as what ImageDataGenerator creates (batches). Any idea how I can convert a cv2.imread(path) image into a batch to predict the class?
You just have to add the batch dimension to convert it to a batch with 1 sample: np.expand_dims(img, axis=0).
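For illustration, this is the effect on the array shape. The zero-filled `face` array is only a placeholder for a crop that has already been resized and rescaled the same way as the training images.

```python
import numpy as np

# Stand-in for a cropped face that has already been resized and rescaled.
face = np.zeros((64, 64, 3), dtype=np.float32)

batch = np.expand_dims(face, axis=0)  # add the leading batch axis
print(face.shape, batch.shape)  # (64, 64, 3) (1, 64, 64, 3)
```

The resulting `batch` has the same layout as one batch from the generator, so it can be passed to the model's predict call directly.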
I've just started with TensorFlow. I wrote a program that trains a model on the Fashion-MNIST dataset and then predicts labels for 'test_images', and it works well so far.
But I am curious how I can use my own image of a shoe or shirt for prediction, given that all the test images have shape 28x28. How can I do this?
What you are describing is data preparation and preprocessing. If you start from a directory of images, one of the first things to do is label them; for this task I recommend labelImg.
If the input also needs to have a specific size, as in the 28x28 example you give, you can resize the images with an image processing library. OpenCV provides resizing functions that work for this.
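A rough sketch of bringing a photo into the 28x28 grayscale format, written with NumPy only so that it runs without extra dependencies. The channel mean and the nearest-neighbour sampling are crude stand-ins for what an image library's grayscale conversion and resize functions would do, and `to_fashion_mnist_input` is a hypothetical name.

```python
import numpy as np


def to_fashion_mnist_input(rgb_image, size=28):
    """Turn an (H, W, 3) uint8 RGB photo into a (1, size, size) float batch."""
    gray = rgb_image.astype(np.float32).mean(axis=-1)  # crude grayscale
    rows = np.arange(size) * gray.shape[0] // size     # nearest-neighbour
    cols = np.arange(size) * gray.shape[1] // size     # sampling grid
    small = gray[rows][:, cols]
    return (small / 255.0)[np.newaxis, ...]            # rescale, add batch axis


# Random pixels stand in for a real photo loaded from disk.
photo = np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)
batch = to_fashion_mnist_input(photo)
print(batch.shape)  # (1, 28, 28)
```

Fashion-MNIST items appear light on a dark background, so a photo taken against a white background may also need to be inverted (`1.0 - batch`) before prediction. The division by 255 assumes the model was trained on pixel values scaled to [0, 1].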
I'm looking for an example of classifying labeled images with an RNN on custom data. I can't find any example other than the MNIST dataset. Something like this repository, where a CNN is used for classification, would be appreciated. Any help regarding the classification of images using an RNN would be helpful. I'm trying to replace the CNN in the following tutorial.
Aymericdamien has some of the best examples out there, and they have an example of using an RNN with images.
https://github.com/aymericdamien/TensorFlow-Examples
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb
The example is using MNIST, but it can be applied to any image.
However, I'll point out that you're unlikely to find many examples of using an RNN to classify an image because RNNs are inferior to CNNs for most image processing tasks. The example linked to above is for educational purposes more than practical purposes.
Now, if you are attempting to use an RNN because you have a sequence of images to process, such as a video, a more natural approach is to combine a CNN (for the image processing part) with an RNN (for the sequence processing part). To do this you would typically pretrain the CNN on a classification task such as ImageNet and then feed each image through the CNN, using the output of the CNN's last layer as the input to the corresponding timestep of the RNN. You would then train the entire network with the loss function defined on the RNN output.
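The CNN-plus-RNN arrangement described above might look roughly like this in PyTorch. The class name, layer sizes and the tiny untrained CNN are all illustrative assumptions; in practice the CNN would be a pretrained backbone.

```python
import torch
import torch.nn as nn


class FrameSequenceClassifier(nn.Module):
    """Hypothetical CNN + LSTM model for clips shaped (batch, time, C, H, W)."""

    def __init__(self, num_classes=5, feature_dim=16, hidden_dim=32):
        super().__init__()
        # Small stand-in CNN; in practice this would be a pretrained backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, feature_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (N, feature_dim, 1, 1)
            nn.Flatten(),             # -> (N, feature_dim)
        )
        self.rnn = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips):
        batch, time = clips.shape[:2]
        frames = clips.flatten(0, 1)                 # (batch*time, C, H, W)
        features = self.cnn(frames).view(batch, time, -1)
        outputs, _ = self.rnn(features)              # (batch, time, hidden_dim)
        return self.classifier(outputs[:, -1])       # logits from the last timestep


model = FrameSequenceClassifier()
clips = torch.randn(2, 4, 3, 32, 32)                 # 2 clips of 4 RGB frames
logits = model(clips)
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 3]))
loss.backward()                                      # gradients reach CNN and LSTM
print(logits.shape)  # torch.Size([2, 5])
```

Because the loss is computed on the RNN output and backpropagated through the frame features, the CNN and the LSTM are trained together, which matches the end-to-end training described above.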