After training my classification model I get an accuracy of 94% on my test data. I am working with TIFF images. To load the data and feed it into the classification model I am using the DataLoader from PyTorch.
My DataLoader function looks like this:
def dataload(self, train_path, batch_train, test_path, batch_train_val):
    # Transforms
    transformer = transforms.Compose([
        # transforms.Resize((450,450)),
        transforms.Resize((150, 150)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),                  # 0-255 to 0-1, numpy to tensors
        transforms.Normalize([0.5, 0.5, 0.5],   # 0-1 to [-1,1]
                             [0.5, 0.5, 0.5])])
    train_loader = DataLoader(
        torchvision.datasets.ImageFolder(train_path, transform=transformer),
        # torchvision.datasets.ImageFolder(train_path, transform=get_transformer()),
        batch_size=batch_train, shuffle=True
    )
    test_loader = DataLoader(
        torchvision.datasets.ImageFolder(test_path, transform=transformer),
        # torchvision.datasets.ImageFolder(train_path, transform=get_transformer()),
        batch_size=batch_train_val, shuffle=True
    )
    return [train_loader, test_loader]
The DataLoader handles the TIFF images and somehow converts them automatically into three-channel images, because a TIFF image has four channels but my model needs a three-channel image as input.
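For reference, ImageFolder's default loader opens each file with PIL and converts it to RGB, which is why the four-channel TIFFs reach the model with three channels. A rough sketch of what that loader does:
from PIL import Image

def pil_loader(path):
    # mirrors torchvision's default behaviour: open with PIL, force RGB
    with open(path, "rb") as f:
        img = Image.open(f)
        return img.convert("RGB")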
When I finally tried to use my saved model I ran into several problems. Since I am loading each image separately to predict its label, I no longer use the DataLoader from PyTorch.
My code looks like this:
from os import listdir
from os.path import isfile, join

from PIL import Image
from torch.autograd import Variable
from torchvision import transforms

all_images = [f for f in listdir(pred_path) if isfile(join(pred_path, f))]
for i in all_images:
    transformer = transforms.Compose([
        transforms.Resize((150, 150)),
        transforms.ToTensor(),
        transforms.Normalize([0.5, 0.5, 0.5],  # 0-1 to [-1,1]
                             [0.5, 0.5, 0.5])])
    image = Image.open(pred_path + "/" + i).convert('RGB')
    image_tensor = transformer(image).float()
    image_tensor = image_tensor.unsqueeze_(0)
    input = Variable(image_tensor)
    output = model(input)
    index = output.data.numpy().argmax()
Since TIFF images have four channels and my model expects a three-channel image, I get an error. However, when I manually convert the TIFFs to JPG images, or convert them directly to RGB in my code, the model always predicts the same label for every image.
The strange part is that I only get all these problems when using the EfficientNet-B7 model. When I use a small custom model, everything works fine and I get neither of the above problems.
Related
I have trained a TensorFlow model and converted it to a TFLite model.
I want to build a TensorFlow Lite (.tflite) model that does pre-processing, model execution, and post-processing. Pre-processing mainly consists of reading a single image, resizing it with padding, and converting it to an array. This array is the input to the TFLite model, and the output of the model is several arrays. These arrays need to be processed to extract meaningful information from them.
Is it possible to create a TFLite model that can do the pre-processing and post-processing? I only want to give an image as input and get the desired output.
For example:
pre-processing.py --> import image, resize image, normalize image, convert to numpy array (float32)
post-processing.py --> read model output arrays, extract segmentation masks, plot on image
WHAT I WANT
input_image.jpg-->model-->output_image.jpg with segmentation mask plotted
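One possible direction (a rough sketch, assuming TF 2.x and that trained_model is your already-trained Keras model; the sizes are placeholders): fold the pre-processing into a wrapper model before conversion, so the .tflite file accepts a raw image tensor. Numeric post-processing (e.g. an argmax over the output to get a mask) can be appended the same way, but plotting the mask onto the image is usually done outside the model.
import tensorflow as tf

raw = tf.keras.Input(shape=(1080, 1920, 3), dtype=tf.uint8)  # raw input image (example size)
x = tf.cast(raw, tf.float32)
x = tf.image.resize_with_pad(x, 512, 512)   # resize with padding, as in pre-processing.py
x = x / 255.0                               # normalize to [0, 1] (float32)
outputs = trained_model(x)                  # the already-trained model
wrapped = tf.keras.Model(raw, outputs)

converter = tf.lite.TFLiteConverter.from_keras_model(wrapped)
with open("model_with_preprocessing.tflite", "wb") as f:
    f.write(converter.convert())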
Currently, I'm working with a dataset where I have two kinds of images: a "sharp version" of each image and a "blurry version" of the same image, where the blur was added synthetically. My goal is to train a model that takes the blurry version as input and deblurs it as much as it can, so that the "deblurred image" is closer to the sharp version. In the literature, the U-Net architecture seemed to be a model with good results. Additionally, I can use a pre-trained U-Net via PyTorch (https://pytorch.org/hub/mateuszbuda_brain-segmentation-pytorch_unet/).
My problem is now: When I train this pre-trained U-Net with my images and then try it on my test set, I get the following output:
The original image:
I know that this pre-trained model is usually used for biomedical image segmentation but I'm rather confused about how I have to modify the model to use it for an Image Deblurring/Reconstruction task. Does anyone have any advice on how to do this?
I would appreciate any feedback :)
The U-Net you're using is for segmentation (classification of each pixel of the image), whereas you're trying to denoise the image (make it "sharper"/remove the blur). That explains the results you got.
To get what you want, as DerekG said, you first need to change the number of output channels. Once you change it, you can no longer load the whole pretrained model: you have to copy the parameters over one by one, up to (but not including) the last layer.
Since the last layer is then initialized randomly, you retrain the model on your training set. You can freeze the pretrained parts or not.
Also, I'm not sure what your new dataset is, but if it really isn't related to biomedical images you should retrain the network from scratch (transfer learning shouldn't be done in such cases), and maybe even change the encoder-decoder architecture.
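A minimal sketch of the parameter copying described above, using the torch.hub U-Net from the linked page (the freezing step is optional):
import torch

# pretrained 1-channel-output U-Net and a fresh 3-channel-output U-Net
pretrained = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
                            in_channels=3, out_channels=1, init_features=32, pretrained=True)
model = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
                       in_channels=3, out_channels=3, init_features=32, pretrained=False)

# copy every tensor whose shape still matches (everything except the last conv layer)
own = model.state_dict()
compatible = {k: v for k, v in pretrained.state_dict().items()
              if k in own and v.shape == own[k].shape}
own.update(compatible)
model.load_state_dict(own)

# optionally freeze the copied, pretrained parameters and train only the last layer
for name, p in model.named_parameters():
    p.requires_grad = name not in compatible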
From the included link:
import torch
model = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
in_channels=3, out_channels=1, init_features=32, pretrained=True)
The model you define has a single output channel, resulting in a grayscale image output. You need 3 output channels for an RGB image.
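For example (with out_channels changed, pretrained=True can no longer load the checkpoint directly, so the pretrained weights would have to be copied over manually as in the other answer):
model = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
                       in_channels=3, out_channels=3, init_features=32, pretrained=False)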
TensorFlow 2.3 introduced new preprocessing layers, such as tf.keras.layers.experimental.preprocessing.Resizing.
However, the typical flow for training on images with Keras uses tf.keras.preprocessing.image.ImageDataGenerator, which can only take a fixed target_size parameter. As far as I understand, the root cause is that Keras handles the images as a NumPy array in the background, where all images have to be the same size (is that true?).
While I could train a model containing a resizing layer on a fixed size and then use it to predict images of arbitrary size, this seems risky, since the training data and inference data would then differ systematically. One workaround would be to use ImageDataGenerator with a target_size and interpolation method that match those of the resizing layer, so that during training the resizing layer basically does nothing, but then the resizing layer doesn't really provide any benefit.
So the question is, is there a way to train directly on mixed size images to fully take advantage of the resizing layer?
Models need to operate on images of a FIXED size. If you train a model on a fixed size, for example (224 x 224), then to use the trained model to make predictions you need to resize those images to 224 x 224. More generally, whatever pre-processing you did on the training images you should also do on the images you wish to predict. For example, if your model was trained on RGB images but the images you want to predict are BGR images (like images read in with cv2), the results will be incorrect; you would need to convert them to RGB first. Similarly, if you rescaled your training images by dividing by 255, you should also rescale the images you want to predict.
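A minimal illustration of that point (the file name and sizes are placeholders; match them to whatever your training pipeline actually used):
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array

img = load_img("test.jpg", target_size=(224, 224))  # same target_size as in training
x = img_to_array(img) / 255.0                       # same rescale=1./255 as in training
x = np.expand_dims(x, axis=0)                       # the model expects a batch dimension
preds = model.predict(x)                            # `model` is the trained Keras model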
I am working on an image classification problem, and my aim is to create a model to which I can feed the image, its class, and the values of the bounding box (x_min, y_min, x_max, y_max). So far I have only worked with image detection, where I used ImageDataGenerator to load my images, so this is something new to me.
In the book Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow by Aurélien Géron, he briefly mentions image classification and localisation and provides an example model:
base_model =keras.applications.xception.Xception(weights="imagenet",include_top=False)
avg = keras.layers.GlobalAveragePooling2D()(base_model.output)
class_output = keras.layers.Dense(n_classes, activation="softmax")(avg)
loc_output = keras.layers.Dense(4)(avg)
model = keras.Model(inputs=base_model.input, outputs=[class_output, loc_output])
model.compile(loss=["sparse_categorical_crossentropy", "mse"], loss_weights=[0.8, 0.2], optimizer='adam', metrics=["accuracy"])
He also mentions that the data should be in the form of a tuple:
(images, (class_labels, bounding_boxes))
But as far as I know, Keras only accepts data in the form of arrays. So could anyone help me understand how the model should be designed and how the inputs should be fed to the model so that it outputs the class of the image and the values of the bounding box?
Let the input data be the image and the target values be the four coordinates, i.e. the coordinates of the bounding box.
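A sketch of how such a two-output model can be fed (array names are placeholders): Keras expects one target per output, in the same order as the model's outputs, either as a tf.data dataset yielding (inputs, (targets1, targets2)) tuples or as plain arrays.
import tensorflow as tf

# images:       (N, H, W, 3) float array
# class_labels: (N,) integer array
# boxes:        (N, 4) float array (x_min, y_min, x_max, y_max), ideally scaled to [0, 1]
dataset = tf.data.Dataset.from_tensor_slices((images, (class_labels, boxes)))
dataset = dataset.shuffle(1000).batch(32).prefetch(tf.data.AUTOTUNE)
model.fit(dataset, epochs=10)

# equivalently, without tf.data:
# model.fit(images, [class_labels, boxes], epochs=10)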
I have created my own model and trained it with Keras's ImageDataGenerator using flow_from_directory.
Like this: how to train model with batches. Everything works fine; I checked the generated batches, and the pictures are as they should be.
My problem is that I want to use this trained model for online face detection. I crop the faces to the desired width and height and convert them into an array, but the prediction is horrible.
I think the live-streamed image has to be in the same form as what the ImageDataGenerator creates (batches). Any idea how I can convert a cv2.imread(path) image into a batch to predict the class?
You just have to add the batch dimension to convert it to a batch with 1 sample: np.expand_dims(img, axis=0).
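A short sketch of that (assuming the generator used target_size=(224, 224) and rescale=1./255; adjust to whatever your flow_from_directory setup actually did):
import cv2
import numpy as np

img = cv2.imread(path)                      # BGR image, shape (H, W, 3)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # Keras generators work in RGB
img = cv2.resize(img, (224, 224))           # same target_size as in training
img = img.astype("float32") / 255.0         # same rescaling as in training
batch = np.expand_dims(img, axis=0)         # shape (1, 224, 224, 3): a batch of one sample
prediction = model.predict(batch)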