I am having an issue feeding my own image into LeNet using the Caffe library. I have deployed the net and initialised the weights obtained through training with no difficulties. As the net was trained on inputs of size 28x28, I tried resizing the input image to 28x28 and feeding it into the deployed LeNet, but it gave me an "unhashable numpy array" error.
Not only that, I also tried to transpose it with img = img.transpose(img, (2,0,1)) after resizing, but that gave me "TypeError: only length-1 arrays can be converted to Python scalars".
Below is the Python code I have tried so far for pre-processing my image:
img = caffe.io.load_image('number5.png')
img = caffe.io.resize_image(img, (28,28), interp_order=3)
img = img.transpose(img, (2,0,1))
I am a beginner in using Caffe and still in the process of learning. I hope someone can give me an example or some insight into how to pre-process an image before feeding it into the net.
Thank you.
Best regards.
The error comes from passing the array in again: img.transpose is a NumPy method, so it only takes the axes. Just use this instead:
img = img.transpose((2,0,1))
You can use caffe.io.Transformer. It is used to preprocess the Caffe 'data' blob. Define it as:
transformer = caffe.io.Transformer({'data':net.blobs['data'].data.shape})
transformer.set_transpose('data',(2,0,1))
Then you can do:
img = caffe.io.load_image('number5.png')
img = caffe.io.resize_image(img, (28,28), interp_order=3)
img_transposed=transformer.preprocess('data',img)
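As a minimal sketch of the next step (assuming net is your deployed LeNet, and that the output layer is named 'prob' as in the standard LeNet deploy prototxt; check yours):
# hypothetical file names; load the deployed net
# net = caffe.Net('lenet_deploy.prototxt', 'lenet.caffemodel', caffe.TEST)
net.blobs['data'].data[0, ...] = img_transposed  # copy the preprocessed image in
output = net.forward()                           # run inference
print(output['prob'].argmax())                   # index of the predicted digit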
I want to be able to predict the class of a single image from the Learner, but I always get an "index out of bounds" exception.
Here is the code:
data = ImageDataLoader.from_folder(path, train="Train", valid="Valid",
                                   ds_tfms=get_transforms(), size=(256,256), bs=32, num_workers=4)
# the model is a Sequential one
learn = Learner(data, model, loss_func=nn.CrossEntropyLoss(), metrics=accuracy)
learn.fit_one_cycle(100, lr_max=3e-3)
img = ...  # PIL image loaded from its path
learn.predict(img)
The model is able to predict on the ImageDataLoader but not on a single image. If anyone has any clue, it would be much appreciated.
Here is a link to the FastAI forums, but it didn't solve the issue:
https://forums.fast.ai/t/how-to-use-learner-predict-list-index-out-of-range/81998/7
EDIT NOTE: I have tried to convert the image to a TensorFlow tensor, but another error is given (screenshot of the error omitted).
I think the problem is that data is a rank-4 tensor whereas img is rank 3. In other words, it is missing the batch (number of points) dimension up front. In TF you can fix that with tf.expand_dims, like so:
img = tf.expand_dims(img, axis=0)
or fix it when passing to the model:
learn.predict(tf.expand_dims(img, axis=0))
You can also look at tf.newaxis (see the second code example here).
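For example, a minimal sketch (assuming img is already a tensor or numpy array):
import tensorflow as tf
# tf.newaxis works like np.newaxis: indexing with it adds a leading batch dimension
img = img[tf.newaxis, ...]  # (H, W, C) -> (1, H, W, C)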
I have been trying to stack two images.
The end result will be used as the input to my convolutional neural network.
I tried using dstack, and I also tried PIL's Image.blend, but I cannot seem to arrive at my desired result.
If anyone has any other ideas I could use, it would be greatly appreciated.
This could help you out:
from PIL import Image
image1 = Image.open("img1.jpg")
image2 = Image.open("img2.jpg")
# resize both images to the same size so they can sit side by side
image1 = image1.resize((224, 224))
image2 = image2.resize((224, 224))
image1_size = image1.size
# create a canvas twice as wide and paste the two images next to each other
new_image = Image.new('RGB', (2*image1_size[0], image1_size[1]), (250, 250, 250))
new_image.paste(image1, (0, 0))
new_image.paste(image2, (image1_size[0], 0))
new_image.save("merged_image.jpg")  # hypothetical output path
Resize them so that they are the same size, and then use np.stack with axis=3 (if you are using multi-channel images; otherwise use axis=2).
Or are you trying to combine them into one image? If so, how? Masking, adding, subtracting?
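For example, a minimal sketch of that stacking (the file names are placeholders):
import numpy as np
from PIL import Image
# resize both to the same size first
img1 = np.array(Image.open("img1.jpg").resize((224, 224)))
img2 = np.array(Image.open("img2.jpg").resize((224, 224)))
stacked = np.stack([img1, img2], axis=3)         # two HxWx3 images -> (224, 224, 3, 2)
combined = np.concatenate([img1, img2], axis=2)  # or along the channel axis -> (224, 224, 6)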
I am trying to load a grayscale image dataset (fashion-mnist) into the MobileNet model to predict handwritten numbers, but according to this tutorial only RGB images can be loaded into the model. When I try to feed fashion-mnist samples, it gives me the following error:
Error when checking input: expected keras_layer_13_input to have shape
(224, 224, 3) but got array with shape (224, 224, 1)
How can I solve this problem?
The pre-trained MobileNet is probably not suitable for this task; you have two different problems. MobileNet is made for ImageNet images, which are 224x224 with 3 color channels, while fashion-mnist images are 28x28 with one color channel. You can repeat the color channel to get RGB:
# data.shape [70000, 224, 224, 1] -> [70000, 224, 224, 3]
data = np.repeat(data, 3, -1)
But before that, you need to resize the images. For example, you can use PIL:
from PIL import Image
data = np.array([Image.fromarray(x).resize([224,224]) for x in data])
There are some small details here which you should figure out yourself, such as the dtype of the images if you have loaded the dataset as numpy arrays. You may need to convert the values to integers with np.uint8().
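Putting both steps together, a minimal sketch (assuming data is a [N, 28, 28] numpy array loaded from fashion-mnist):
import numpy as np
from PIL import Image
data = data.astype(np.uint8)   # PIL expects integer pixel values
data = np.array([np.array(Image.fromarray(x).resize((224, 224))) for x in data])
data = data[..., np.newaxis]   # (N, 224, 224) -> (N, 224, 224, 1)
data = np.repeat(data, 3, -1)  # (N, 224, 224, 1) -> (N, 224, 224, 3)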
MobileNet v2 needs RGB. You might also be able to use the convert function from PIL.
Try this:
from PIL import Image
x= Image.open(input_image).resize((96,96)).convert("RGB")
The documentation is here: https://pillow.readthedocs.io/en/stable/reference/Image.html
Try this: x = np.stack((x,)*3, axis=-1). Please refer to this link for more details: https://github.com/malnakli/ML/blob/master/tf_serving_keras_mobilenetv2/main.ipynb
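As a quick shape check (a hypothetical grayscale batch):
import numpy as np
x = np.zeros((10, 224, 224), dtype=np.uint8)  # N grayscale images
x = np.stack((x,)*3, axis=-1)                 # repeat along a new last axis
print(x.shape)                                # (10, 224, 224, 3)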
I was looking at this Tensorflow tutorial.
In the tutorial the images are magically read like this:
mnist = learn.datasets.load_dataset("mnist")
train_data = mnist.train.images
My images are placed in two directories:
../input/test/
../input/train/
They all have a *.jpg ending.
So how can I read them into my program?
I don't think I can use learn.datasets.load_dataset because this seems to take in a specialized dataset structure, while I only have folders with images.
mnist.train.images is essentially a numpy array of shape [55000, 784], where 55000 is the number of images and 784 is the number of pixels in each image (each image is 28x28).
You need to create a similar numpy array from your data if you want to run this exact code. So you'll need to iterate over all your images, read each image as a numpy array, flatten it, and build a matrix of size [num_examples, image_size].
The following code snippet should do it:
import os
import cv2
import numpy as np

def load_data(img_dir):
    # read every .jpg in img_dir and flatten each image into a single row
    return np.array([cv2.imread(os.path.join(img_dir, img)).flatten()
                     for img in os.listdir(img_dir) if img.endswith(".jpg")])
A more comprehensive version to enable debugging:
import os
import cv2
import numpy as np

list_of_imgs = []
img_dir = "../input/train/"
for img in os.listdir(img_dir):
    img = os.path.join(img_dir, img)
    if not img.endswith(".jpg"):
        continue
    a = cv2.imread(img)
    if a is None:
        print("Unable to read image", img)
        continue
    list_of_imgs.append(a.flatten())
train_data = np.array(list_of_imgs)
Note: if your images are not 28x28x1 (B/W images), you will need to change the neural network architecture (defined in cnn_model_fn). The architecture in the tutorial is a toy architecture which only works for simple images like MNIST. AlexNet may be a good place to start for RGB images.
You can check the answers given in How do I convert a directory of jpeg images to TFRecords file in tensorflow?. The easiest way is to use the utility provided by TensorFlow, build_image_data.py, which does exactly what you want.
I am a beginner in Caffe and I am trying to use the ImageNet model for object classification. My requirement is to detect objects from a webcam feed. For this, I use the following code:
cap = cv2.VideoCapture(0)
while(True):
    ret, frame = cap.read()  # frame is a numpy array
    #frame = caffe.io.array_to_datum(frame)
    img = caffe.io.load_image(frame)
Obviously this does not work since caffe.io.load_image expects an image path.
As you can see, I also tried using the array_to_datum function from caffe's io.py (found in this stackoverflow question) and passing the frame to caffe.io.load_image, but this does not work either.
How can I pass the captured video frames from the webcam directly to caffe.io.load_image? If that is not possible, then what is the right way to load the frame into caffe.io? Please help. Thanks in advance.
caffe.io.load_image does not do much. It only does the following:
Read image from disk (given the path)
Make sure that the returned image has 3 dimensions (HxWx1 or HxWx3)
(see source caffe.io.load_image)
So it does not load the image into your model; it's just a helper function that loads an image from disk. To get an image into memory, you can load it however you like (from disk, from a webcam, etc.). To load the network, feed the image into it, and run inference, you can do something like the following:
# Load pre-trained network
net=caffe.Net(deployprototxt_path, caffemodel_path, caffe.TEST)
# Feed network with input
net.blobs['data'].data[0,...] = frame
# Do inference
net.forward()
# Access prediction
probabilities = net.blobs['prob'].data
Make sure the frame dimensions match the expected input dimensions specified in deploy.prototxt (see this example for CaffeNet).
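For example, a minimal sketch of checking and matching those dimensions (assuming the usual (N, C, H, W) blob layout):
import cv2
# read the expected input shape from the deployed net
_, channels, height, width = net.blobs['data'].data.shape
frame = cv2.resize(frame, (width, height))  # note cv2.resize takes (width, height)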
If you are reading camera frames with OpenCV, you need to re-order the color space from OpenCV's default (BGR) to Caffe's input order (RGB), and convert the values to single-precision float:
# load the caffe model
net = caffe.Net(model_def,      # defines the structure of the model
                model_weights,  # contains the trained weights
                caffe.TEST)     # use test mode (e.g., don't perform dropout)

cap = cv2.VideoCapture(1)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

while(True):
    ret, frame = cap.read()
    # BGR -> RGB, then scale to [0, 1] single-precision floats
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB).astype(np.float32)/255.0
    # you can now pass image to Caffe
    net.blobs['data'].data[...] = image
    # forward pass, obtain detections
    detections = net.forward()['detection_out']
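One caveat, hedged: Caffe blobs are usually channel-first (N, C, H, W), while OpenCV frames are HxWxC, so depending on how your deploy prototxt defines the 'data' blob you may also need a transpose before the assignment:
# HxWxC (OpenCV) -> CxHxW (Caffe's usual blob layout); check your deploy.prototxt
image = image.transpose((2, 0, 1))
net.blobs['data'].data[...] = image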