I am working on an image classification problem, and my aim is to create a model where I can input an image and get back both its class and the bounding-box values (x_min, y_min, x_max, y_max). So far I have only worked on plain image classification, where I used ImageDataGenerator for loading my images, so this is new to me.
In the book Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, Aurélien Géron briefly covers image classification and localisation and provides an example model:
from tensorflow import keras

base_model = keras.applications.xception.Xception(weights="imagenet",
                                                  include_top=False)
avg = keras.layers.GlobalAveragePooling2D()(base_model.output)
class_output = keras.layers.Dense(n_classes, activation="softmax")(avg)
loc_output = keras.layers.Dense(4)(avg)
model = keras.Model(inputs=base_model.input,
                    outputs=[class_output, loc_output])
model.compile(loss=["sparse_categorical_crossentropy", "mse"],
              loss_weights=[0.8, 0.2], optimizer="adam",
              metrics=["accuracy"])
He also mentions that the data should be in the form of a tuple:
(images, (class_labels, bounding_boxes))
But as far as I know, Keras only accepts data in the form of arrays. Could anyone help me understand how the model should be designed and how to feed it inputs so that it outputs both the class of the image and the values of the bounding box?
Let the input data be the image and the target values be the four coordinates of the bounding box.
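For the two-output model above, here is a minimal sketch of feeding the data in the book's (images, (class_labels, bounding_boxes)) form; the array names, shapes, and random stand-in data are illustrative assumptions, not part of the book's example:

import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for the real data:
# X        -> (num_samples, 299, 299, 3) float32 images
# y_class  -> (num_samples,) integer class labels
# y_box    -> (num_samples, 4) normalised (x_min, y_min, x_max, y_max)
X = np.random.rand(8, 299, 299, 3).astype("float32")
y_class = np.random.randint(0, n_classes, size=(8,))
y_box = np.random.rand(8, 4).astype("float32")

# Keras accepts one target array per model output, in the same order
# as the outputs passed to keras.Model(...):
model.fit(X, (y_class, y_box), epochs=1)

# Equivalently, a tf.data.Dataset shaped exactly like the
# (images, (class_labels, bounding_boxes)) tuple from the book:
dataset = tf.data.Dataset.from_tensor_slices((X, (y_class, y_box))).batch(4)
model.fit(dataset, epochs=1)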
For my project, I have to feed an image, which is an apparent resistivity model, into a convolutional neural network and have it output the corresponding "true" model image. The idea is: apparent resistivity model -> true model.
Both sets of images are saved as TIFF files in different folders. From what I understand, I need to convert them to floating-point tensors to feed them into the CNN. However, I'm confused about the overall big picture: how do I feed both the apparent resistivity model and the true model into the CNN for training? Can I simply write a function that extracts the images from their directories (it is very important this is done in the correct corresponding order), converts them to arrays, normalises them, and stores them in lists? And then pass both lists to the CNN as X_train and y_train?
After training, I will feed more apparent resistivity pictures into the CNN, look at the output images, and compare them to the "real" true models to check the model's performance. How do I get the CNN to output an image?
Sorry if these questions are too vague, large or basic. The tutorials I've seen online all deal with training a neural net for classification, which is not helpful to me. Thanks in advance.
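A minimal sketch of the loading-and-pairing step described above. The folder names, the *.tif glob, and the /255 normalisation are assumptions about the data, and model stands for whatever encoder-decoder network is being trained:

import numpy as np
from pathlib import Path
from PIL import Image

def load_folder(folder):
    # Sort by filename so the apparent-resistivity images and the
    # true-model images stay paired in the same order.
    paths = sorted(Path(folder).glob("*.tif*"))
    images = [np.asarray(Image.open(p), dtype="float32") for p in paths]
    return np.stack(images) / 255.0   # normalise to [0, 1]

X_train = load_folder("apparent_resistivity/")   # placeholder directory names
y_train = load_folder("true_models/")

model.fit(X_train, y_train, epochs=10)

# A network whose last layer has image-shaped output then returns one
# image per input at predict time (X_test loaded the same way as X_train):
predicted_images = model.predict(X_test)   # shape: (n, height, width, channels)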
Hello Stack Overflow!
I am looking to take a ResNet50 face classification model and turn it into an SSD, YOLO or EfficientDet. Is this even possible? Basically, I want to use a trained model that classifies a single class in an image to detect more than one instance of that class in an image: partition an input image and detect the objects (faces) in it based on my ResNet50 classification model, passing my ResNet classification model to the YOLO as a parameter.
Thanks in advance!
Currently, I'm working with a dataset where I have two kinds of images: a "sharp version" of each image and a "blurry version" of the same image, where the blur was added synthetically. My goal is to train a model that takes in the blurry version and deblurs it as much as it can, so that the "deblurred image" is closer to the sharp version. In the literature, the U-Net architecture seemed to be a model with good results. Additionally, I can use a pre-trained U-Net via PyTorch (https://pytorch.org/hub/mateuszbuda_brain-segmentation-pytorch_unet/).
My problem is now: when I train this pre-trained U-Net with my images and then try it on my test set, the output looks wrong (the post's screenshots of the model output and of the original image are omitted here).
I know that this pre-trained model is usually used for biomedical image segmentation, but I'm rather confused about how to modify it for an image deblurring/reconstruction task. Does anyone have any advice on how to do this?
I would appreciate any feedback :)
The U-Net you're using is for segmentation (classification of each pixel of the image), whereas you're trying to denoise the image (make it "sharper" / remove noise). That explains the results you got.
To get what you want, as DerekG said, you first need to modify the number of output channels. Once you modify it, you can no longer load the whole pretrained model: you will have to copy the parameters tensor by tensor, up to the last layer.
As the last layer is initialised randomly, you can then retrain the model on your training set, freezing the pretrained parts or not.
Also, I'm not sure what your new dataset is, but if it's really not related to biomedical images, you should retrain your network from scratch (transfer learning shouldn't be done in such cases), and maybe even change the encoder-decoder network.
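A hedged sketch of that parameter-by-parameter copy, using the hub model from the link. The out_channels=3 choice follows DerekG's answer below; the shape-matching trick is a generic pattern, not part of this repository's API, so check model.state_dict().keys() for your own network:

import torch

# Pretrained 1-output-channel U-Net from the hub link above.
pretrained = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
    in_channels=3, out_channels=1, init_features=32, pretrained=True)

# Fresh U-Net with 3 output channels (RGB), randomly initialised.
model = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
    in_channels=3, out_channels=3, init_features=32, pretrained=False)

# Copy every pretrained tensor whose shape still matches; the final
# convolution (now 3 channels instead of 1) is skipped and stays random.
state = model.state_dict()
for name, tensor in pretrained.state_dict().items():
    if name in state and state[name].shape == tensor.shape:
        state[name] = tensor
model.load_state_dict(state)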
From the included link:
import torch

model = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
                       in_channels=3, out_channels=1, init_features=32,
                       pretrained=True)
The model you define has a single output channel, resulting in a grayscale image output. You need 3 output channels for an RGB image.
I have created my own model and trained it with Keras's ImageDataGenerator, using flow_from_directory (like this: how to train model with batches). Everything works fine: I checked the generated batches, and the pictures are as they should be.
My problem is that I want to use this trained model for online face detection. I crop the faces to the desired width and height and convert them into arrays, but the predictions are horrible.
I think the live-streamed image has to be preprocessed the same way as the batches ImageDataGenerator creates. Any idea how I can convert a cv2.imread(path) image into a batch to predict its class?
You just have to add the batch dimension to convert it into a batch with one sample: np.expand_dims(img, axis=0).
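Beyond the batch dimension, the frame should also go through the same preprocessing the generator applied during training; a sketch, where the (224, 224) target size and the 1/255 rescale are assumptions to be replaced with your generator's actual settings:

import cv2
import numpy as np

img = cv2.imread(path)                       # OpenCV loads images as BGR
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # Keras generators yield RGB
img = cv2.resize(img, (224, 224))            # match flow_from_directory's target_size
img = img.astype("float32") / 255.0          # match rescale=1./255, if used
batch = np.expand_dims(img, axis=0)          # shape: (1, 224, 224, 3)

pred = model.predict(batch)
class_index = np.argmax(pred, axis=-1)[0]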
I've been following a course online, and one of the exercises was to create a simple image classification model (using MNIST data) to recognise handwritten digits. I've been trying to load a custom 128x128 JPG image I drew, but I can't seem to figure it out. I'm really close, but I think I'm just confused about what parameters the model takes in. Any help would be appreciated!!
Here is my code
Simply convert your image to a 128x128 NumPy array with values between 0 and 1.
Then:
import torch

image = torch.from_numpy(image).float()[None, :, :]  # add a batch dimension
classification = model(image)
classification is then a PyTorch tensor containing the model's score for each class (probabilities if your final layer applies softmax). Note that torch.autograd.Variable is deprecated; plain tensors can be passed to the model directly.
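A fuller sketch of the whole path from a JPG to a prediction, assuming a network trained on raw 28x28 MNIST digits; the filename, the 28x28 resize, and the colour inversion are assumptions about your setup:

import numpy as np
import torch
from PIL import Image

img = Image.open("digit.jpg").convert("L")    # placeholder filename; "L" = greyscale
img = img.resize((28, 28))                    # MNIST-trained nets expect 28x28
image = np.asarray(img, dtype="float32") / 255.0
image = 1.0 - image                           # MNIST digits are white-on-black;
                                              # invert if yours is black-on-white
tensor = torch.from_numpy(image)[None, :, :]  # add batch dim; use [None, None, :, :]
                                              # to also add a channel dim if the
                                              # network is convolutional
with torch.no_grad():
    scores = model(tensor)
predicted_digit = scores.argmax(dim=-1).item()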