I've been following an online course, and one of the exercises was to create a simple image classification model (using MNIST data) to recognize handwritten digits. I've been trying to load a custom image I drew (a 128x128 jpg), but I can't seem to figure it out. I'm really close, but I think I'm just confused about what input the model expects. Any help would be appreciated!!
Here is my code:
[code omitted from the original post]
Simply resize and convert your image to a 28x28 NumPy array (the input size MNIST models are trained on) with values between 0 and 1.
Then:
import torch

# Variable is deprecated in modern PyTorch; a plain tensor works directly.
image = torch.from_numpy(image).float()[None, None, :, :]  # add batch and channel dims (a conv net expects (N, 1, 28, 28))
classification = model(image)
classification is then a PyTorch tensor with one score per class: probabilities if your model's final layer applies softmax, raw logits otherwise.
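For completeness, a hedged end-to-end sketch of the whole loading step, assuming a model trained on standard 28x28 MNIST inputs that returns logits; 'digit.jpg' is a placeholder filename:

import numpy as np
import torch
from PIL import Image

# Load the drawn digit as grayscale, resize to MNIST's 28x28, scale to [0, 1].
img = Image.open('digit.jpg').convert('L').resize((28, 28))
arr = np.asarray(img, dtype=np.float32) / 255.0
# MNIST digits are white-on-black; invert if your drawing is black-on-white:
# arr = 1.0 - arr
tensor = torch.from_numpy(arr)[None, None, :, :]  # shape (1, 1, 28, 28): batch, channel, H, W
probs = torch.softmax(model(tensor), dim=1)       # 'model' is the trained network from the question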
For my project, I have to feed an image, which is an apparent resistivity model, into a convolutional neural network and output a corresponding "true" model image. The idea is like this: apparent resistivity model -> true model
Both sets of images are saved as tiff files in different folders. From what I understand, I need to convert these to floating-point tensors to feed them into the CNN. However, I'm confused about the overall big picture: how do I feed both the apparent resistivity model and the true model into the CNN for training? Can I simply write a function that extracts the images from their directories (it is very important this is done in the correct corresponding order), converts them to arrays, normalises them, and stores them in lists? And then pass both sets of lists into the CNN as X_train and y_train?
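A rough sketch of the loading function just described (the folder names, the assumption that paired files share a filename, the 128x128 size, and 8-bit images are all assumptions; PIL/NumPy is just one way to do it):

import os
import numpy as np
from PIL import Image

def load_pairs(apparent_dir, true_dir, size=(128, 128)):
    # Sort filenames so the two folders stay in corresponding order;
    # assumes matching files share the same name in both folders.
    names = sorted(os.listdir(apparent_dir))
    X, y = [], []
    for name in names:
        x_img = Image.open(os.path.join(apparent_dir, name)).resize(size)
        y_img = Image.open(os.path.join(true_dir, name)).resize(size)
        X.append(np.asarray(x_img, dtype=np.float32) / 255.0)  # normalise 8-bit values to [0, 1]
        y.append(np.asarray(y_img, dtype=np.float32) / 255.0)
    return np.stack(X), np.stack(y)

X_train, y_train = load_pairs('apparent/', 'true/')  # hypothetical folder names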
After training, I will feed more apparent resistivity pictures into the CNN to see the image output and compare it to the "real" true model, in order to check the performance of the model. How will I get the CNN to output an image?
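On getting an image out: one common answer is a fully convolutional network whose last layer is itself a convolution with as many channels as the output image, trained with a pixel-wise loss such as MSE. A minimal Keras sketch, with layer sizes that are illustrative only:

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(128, 128, 1))                      # assumed single-channel input
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D()(x)                                   # encoder
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.UpSampling2D()(x)                                   # decoder
outputs = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)  # image-shaped output
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")                    # pixel-wise regression loss
# model.fit(X_train, y_train, ...) then model.predict(new_apparent) returns images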
Sorry if these questions are too vague, broad, or basic. The tutorials I've seen online all deal with training a neural net for classification, which is not helpful to me. Thanks in advance.
Currently, I'm working with a dataset where I have two kinds of images: a "sharp" version of each image and a "blurry" version of the same image, where the blur was added synthetically. My goal is to train a model that takes in the blurry version and deblurs it as much as it can, so that the "deblurred" image is closer to the sharp version. In the literature, the U-Net architecture seemed to be a model with good results. Additionally, I can use a pre-trained U-Net via PyTorch (https://pytorch.org/hub/mateuszbuda_brain-segmentation-pytorch_unet/).
My problem is now: when I train this pre-trained U-Net with my images and then try it on my test set, I get the following output:
[model output image from the original post]
The original image:
[original image from the original post]
I know that this pre-trained model is usually used for biomedical image segmentation, but I'm rather confused about how I have to modify the model to use it for an image deblurring/reconstruction task. Does anyone have any advice on how to do this?
I would appreciate any feedback :)
The U-Net you're using is for segmentation (classification of each pixel of the image), whereas you're trying to denoise the image (make it "sharper"/remove noise). That explains the results you got.
To get what you want, as DerekG said, you first need to modify the number of output channels. Once you modify it, you can't load the whole pretrained model anymore; you'll have to copy the parameters one by one, up to the last layer.
Since the last layer is initialized randomly, you then retrain the model on your training set. You can freeze the pretrained parts or fine-tune them.
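A hedged sketch of that parameter copy, using the hub entry point quoted below: every pretrained weight whose shape still matches is copied, and the reshaped final convolution keeps its random initialization.

import torch

# Load the pretrained 1-channel segmentation U-Net from the linked hub page.
pretrained = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
                            in_channels=3, out_channels=1, init_features=32,
                            pretrained=True)
# Build the same architecture with 3 output channels, without pretrained weights.
model = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
                       in_channels=3, out_channels=3, init_features=32,
                       pretrained=False)
# Copy every pretrained parameter whose shape still matches; the final
# (now 3-channel) convolution stays randomly initialized.
state = model.state_dict()
for name, param in pretrained.state_dict().items():
    if name in state and state[name].shape == param.shape:
        state[name] = param
model.load_state_dict(state)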
Also, I'm not sure what your new dataset is, but if it's really not related to biomedical images, you should retrain your network from scratch (transfer learning shouldn't be done in such cases), and maybe even change the encoder-decoder architecture.
From the included link:
import torch
model = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
                       in_channels=3, out_channels=1, init_features=32, pretrained=True)
The model you define has a single output channel, resulting in a grayscale image output. You need 3 output channels for an RGB image.
I came across this example which implements a pretrained model. It says:
Format the Data
Use the tf.image module to format the images for the task.
Resize the images to a fixed input size, and rescale the input channels to a range of [-1,1]
IMG_SIZE = 160 # All images will be resized to 160x160

def format_example(image, label):
    image = tf.cast(image, tf.float32)
    image = (image/127.5) - 1
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image, label
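Presumably the linked example maps this function over a tf.data dataset; here is a self-contained illustration with dummy data (the array shapes are arbitrary):

import numpy as np
import tensorflow as tf

# Dummy uint8 images, just to show the mapping step.
images = np.random.randint(0, 256, size=(8, 200, 300, 3), dtype=np.uint8)
labels = np.zeros(8, dtype=np.int64)
ds = tf.data.Dataset.from_tensor_slices((images, labels)).map(format_example).batch(4)
for batch_images, batch_labels in ds.take(1):
    # All images are now 160x160 with values in [-1, 1].
    print(batch_images.shape, float(batch_images.numpy().min()))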
I was wondering about this. What I understand is that image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE)) resizes the images (which can have any size) to one consistent size, while image = (image/127.5) - 1 does not change the size of the images but rescales the pixel values (which are between 0 and 255) to the range [-1,1].

In other examples I have seen normalization to a range of [0,1], i.e. rescaling by 1.0/255. I do not understand when to use which. If I build my own model, is it up to me whether to scale to [-1,1] or [0,1]? When I use a pretrained model, however, I need to know what it requires. I googled the MobileNetV2 model but could not find any documentation saying that the required input range is [-1,1]. In this comment it says all pretrained TensorFlow models require an input range of [-1,1]. Is that true? In particular, is it true that all (image) models on TensorFlow Hub require a range of [-1,1]?
Finally, how do I find out the required input range for a pretrained model? I would not have figured out the [-1,1] for MobileNetV2 on my own; I could not find this information on the TensorFlow MobileNetV2 page.
Furthermore: is there a way to have this done automatically? That is, a function that looks up the pretrained model's expected range (assuming an object stores that information) and applies the rescaling to my 0-255 input? I think tf.keras.applications.mobilenet_v2.preprocess_input does something else (I do not really understand what it does), and it is also specific to MobileNetV2.
Generally, you are asking: "Which scaling should I choose, [0, 1] or [-1, 1]?" As the answer can differ from case to case, I will go through the cases below.
CNN architectures train better on inputs in a small, bounded range, so both [0, 1] and [-1, 1] can be good choices. However, the best choice can differ between architectures, so it is worth trying several scales.
Concerning the pre-trained models in Keras, I noticed that most models that use residuals (such as ResNet, MobileNetV2, InceptionResNetV2) use the [-1, 1] scale. Using the [-1, 1] scale in residuals causes some edges to be deactivated in some cases. To see why, consider a perceptron y = wx + b. If w = 1 and b = 1, then the input x = -1 gives y = 0. This shows that with a [-1, 1] scale, some input values can be nullified by the bias (without setting w = 0). This mostly holds for non-Keras models as well.
Almost all of the Keras architectures apply some scaling. In some cases, I believe they do not perform exactly the operations suggested in the original papers, so you should stick to Keras's documentation when using their pre-trained models. If you do not find any scaling in the documentation, you should avoid scaling.
Furthermore, it is worth testing different scaling methods when you work with different datasets, although in most cases this will not greatly change the model's accuracy. Please let me know if you have more questions. Thanks.
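On the automation question: each model family under tf.keras.applications ships its own preprocess_input that applies the scaling its pretrained weights expect; for MobileNetV2 that is the [-1, 1] rescaling discussed above. A small check (the input array here is random dummy data):

import numpy as np
import tensorflow as tf

# preprocess_input for MobileNetV2 maps [0, 255] inputs to [-1, 1].
raw = np.random.uniform(0, 255, size=(1, 160, 160, 3)).astype("float32")
scaled = tf.keras.applications.mobilenet_v2.preprocess_input(raw)
print(scaled.min(), scaled.max())  # close to -1.0 and 1.0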
I'm trying to use a pretrained VGG16 from Keras, but I'm really unsure about what the input range should be.
In short: which of these color orders?
RGB
BGR
And which range?
0 to 255?
balanced from about -125 to about +130?
0 to 1?
-1 to 1?
I notice the file where the model is defined imports an input preprocessor:
from .imagenet_utils import preprocess_input
But this preprocessor is never used in the rest of the file.
Also, when I check the code for this preprocessor, it has two modes: caffe and tf (tensorflow).
Each mode works differently.
Finally, I can't find consistent documentation on the internet.
So, what is the correct range to work with? To what range were the model weights trained?
The model weights were ported from Caffe, so the model expects BGR input.
Caffe uses a BGR color channel scheme for reading image files. This is due to the underlying OpenCV implementation of imread. The assumption of RGB is a common mistake.
You can find the original Caffe model weight files on the VGG website. This link can also be found in the Keras documentation.
I think the second range would be the closest one. There's no scaling during training, but the authors subtracted the mean value of the ILSVRC2014 training set. As stated in the original VGG paper, section 2.1:
The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel.
This sentence is actually what imagenet_utils.preprocess_input(mode='caffe') does.
Convert from RGB to BGR: because keras.preprocessing.image.load_img() loads images in RGB format, this conversion is required for VGG16 (and all models ported from caffe).
Subtract the mean BGR values: (103.939, 116.779, 123.68) is subtracted from the image array.
The preprocessor is not used in vgg16.py. It's imported in the file so that users can use the preprocess function by calling keras.applications.vgg16.preprocess_input(rgb_img_array), without caring about where model weights come from. The argument for preprocess_input() is always an image array in RGB format. If the model was trained with caffe, preprocess_input() will convert the array into BGR format.
Note that the function preprocess_input() is not intended to be called from imagenet_utils module. If you are using VGG16, call keras.applications.vgg16.preprocess_input() and the images will be converted to a suitable format and range that VGG16 was trained on. Similarly, if you are using Inception V3, call keras.applications.inception_v3.preprocess_input() and the images will be converted to the range that Inception V3 was trained on.
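A concrete sketch of that usage ('example.jpg' is a placeholder path):

import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

img = image.load_img('example.jpg', target_size=(224, 224))  # loaded in RGB
x = image.img_to_array(img)           # float32 RGB array with values in [0, 255]
x = np.expand_dims(x, axis=0)         # add a batch dimension -> (1, 224, 224, 3)
x = preprocess_input(x)               # converts RGB -> BGR and subtracts the ImageNet BGR means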
The purpose of the task is to classify images by means of an SVM. The variable images is supposed to contain the image data, and labels the corresponding image labels. What format and dimensions should images and labels have? I tried, unsuccessfully, making images a Python list (appending flattened images) and then, in another attempt, NumPy arrays:
import numpy as np
import cv2

# OpenCV's SVM expects float32 samples (one flattened image per row)
# and integer class labels.
images = np.zeros((number_of_images, image_size), dtype=np.float32)
labels = np.zeros((number_of_images, 1), dtype=np.int32)
svm = cv2.SVM()
svm.train(images, labels)
Is this the right approach to the problem and, if so, what is the correct way to train the classifier?
I don't think you can use raw image data to train an SVM model. OK, you can, but it won't be very fruitful.
The basic approach is to extract some features from each image and use those features to train your model. The set of features forms a dictionary of visual words, each of which describes your image. Because every image is described with the same vocabulary of words, you can compare features across different images. This link gives more details; check it out.
What's next?
Choose a feature extractor for your algorithm - HOG, SURF, or SIFT (link)
Extract features from each image. You'll get a list of descriptors of the same length as your images array.
Initialize a bag-of-words (BoW) model from the descriptors
Train the SVM on the BoW features (a rough sketch follows below)
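A rough sketch of those steps, assuming the OpenCV 2.4-era API used in the question; train_images and train_labels are assumed to be lists you have already loaded, and the vocabulary size is arbitrary:

import numpy as np
import cv2

detector = cv2.SIFT()                   # step 1: feature extractor
# Steps 2-3: collect descriptors and cluster them into a visual vocabulary.
bow_trainer = cv2.BOWKMeansTrainer(50)  # 50 visual words, chosen arbitrarily
for img in train_images:
    kp, desc = detector.detectAndCompute(img, None)
    if desc is not None:
        bow_trainer.add(desc)
vocabulary = bow_trainer.cluster()

# Describe each image as a histogram over the vocabulary
# (assumes every image yields at least one keypoint).
extractor = cv2.BOWImgDescriptorExtractor(detector, cv2.BFMatcher(cv2.NORM_L2))
extractor.setVocabulary(vocabulary)
samples = np.vstack([extractor.compute(img, detector.detect(img))
                     for img in train_images]).astype(np.float32)

# Step 4: train the SVM on the BoW histograms instead of raw pixels.
svm = cv2.SVM()
svm.train(samples, np.array(train_labels, dtype=np.int32))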
Useful links:
C++ very detailed example
Documentation for the existing BoW classifier