Using pre-trained inception_resnet_v2 with Tensorflow - python

I have been trying to use the pre-trained inception_resnet_v2 model released by Google. I am using their model definition (https://github.com/tensorflow/models/blob/master/slim/nets/inception_resnet_v2.py) and the given checkpoint (http://download.tensorflow.org/models/inception_resnet_v2_2016_08_30.tar.gz) to load the model in TensorFlow as shown below. [Download and extract the checkpoint file, and download the sample images dog.jpg and panda.jpg to test this code.]
import tensorflow as tf
slim = tf.contrib.slim
from PIL import Image
from inception_resnet_v2 import *
import numpy as np

checkpoint_file = 'inception_resnet_v2_2016_08_30.ckpt'
sample_images = ['dog.jpg', 'panda.jpg']

# Input placeholder for batches of 299x299 RGB images
input_tensor = tf.placeholder(tf.float32, shape=(None, 299, 299, 3))

# Load the model
sess = tf.Session()
arg_scope = inception_resnet_v2_arg_scope()
with slim.arg_scope(arg_scope):
    logits, end_points = inception_resnet_v2(input_tensor, is_training=False)
saver = tf.train.Saver()
saver.restore(sess, checkpoint_file)

for image in sample_images:
    im = Image.open(image).resize((299, 299))
    im = np.array(im)
    im = im.reshape(-1, 299, 299, 3)
    predict_values, logit_values = sess.run(
        [end_points['Predictions'], logits], feed_dict={input_tensor: im})
    print(np.max(predict_values), np.max(logit_values))
    print(np.argmax(predict_values), np.argmax(logit_values))
However, this code does not give the expected results (class 918 is predicted irrespective of the input image). Can someone help me understand where I am going wrong?

The Inception networks expect the input image to have color channels scaled to [-1, 1], as seen here.
You could either use the existing preprocessing, or in your example simply scale the images yourself with im = 2*(im/255.0) - 1.0 before feeding them to the network.
Without scaling, the [0, 255] input is much larger than the network expects, and the biases all work to very strongly predict category 918 (comic books).
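For example, the per-image loop from the question could scale the pixels itself before calling sess.run (a minimal sketch; the variable names follow the question's code):

for image in sample_images:
    im = Image.open(image).resize((299, 299))
    im = np.array(im, dtype=np.float32)
    im = 2 * (im / 255.0) - 1.0   # scale from [0, 255] to [-1, 1]
    im = im.reshape(-1, 299, 299, 3)
    predict_values, logit_values = sess.run(
        [end_points['Predictions'], logits], feed_dict={input_tensor: im})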

Related

Keras: ValueError: decode_predictions expects a batch of predictions

I'm using Keras' pre-trained model VGG16, following this link: Transfer learning. I'm trying to predict the content of an image:
# example of using a pre-trained model as a classifier
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import decode_predictions
from keras.applications.vgg16 import VGG16
# load an image from file
image = load_img('dog.jpg', target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the VGG model
image = preprocess_input(image)
# load the model
model = VGG16()
# predict the probability across all output classes
yhat = model.predict(image)
# convert the probabilities to class labels
label = decode_predictions(yhat)
# retrieve the most likely result, e.g. highest probability
label = label[0][0]
# print the classification
print('%s (%.2f%%)' % (label[1], label[2]*100))
Full Error Output:
ValueError: decode_predictions expects a batch of predictions (i.e. a 2D array of shape (samples, 2622)) for V1 or (samples, 8631) for V2. Found array with shape: (1, 1000)
This is a link to a seemingly similar question on SO.
Any comments and suggestions highly appreciated. Thank you!
I ran your code and it works properly. Since I do not have your image dog.jpg, I used a color JPG image of an Afghan dog and the network correctly identified it as an Afghan Hound, so I suspect there is something amiss with your image. yhat is a 1 x 1000 array as expected. Ensure your image is an RGB image.
Thank you for your help. I was running this in Colab and had earlier test code where, in a different cell, I had imported:
from keras_vggface.vggface import VGGFace
from keras_vggface.utils import preprocess_input
from keras_vggface.utils import decode_predictions
That was the reason for the error.
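In other words, the keras_vggface.utils imports in the other cell shadowed the VGG16 helpers of the same name. One hedged way to avoid such clashes is to import the modules instead of the bare names (a sketch, not the original notebook code):

from keras.applications import vgg16
from keras_vggface import utils as vggface_utils

# ImageNet VGG16 predictions (1000 classes)
label = vgg16.decode_predictions(yhat)

# VGGFace predictions would instead use:
# label = vggface_utils.decode_predictions(yhat_from_vggface_model)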

Tensorflow Lite quantization on Raspberry Pi 0 in image classification problem

I am developing an image classification model/program for the Raspberry Pi 0 W. I was wondering whether there is a code change that would accelerate image processing.
General information:
the main model was trained on EfficientNetB5
image dimensions are 240x320 in grayscale
on the Raspberry it only needs to do image classification; there is no need for live streaming or object detection
I acknowledge that the Raspberry Pi 0 W is not the best match for TF, but maybe there is still a way to accelerate it
at the moment one image is predicted in 60 seconds, which is too much
My thoughts are that maybe I should train the model on smaller input dimensions, and maybe the learning_rate of the main model affects the Pi's speed?
Below I am attaching two scripts.
Tensorflow save_model transformation into tf_lite quantized model
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras.models import load_model

model = load_model('../models/effnet_v22.h5')
TFLITE_QUANT_MODEL = "../tflite_models/effnet_v22_quant.tflite"

run_model = tf.function(lambda x: model(x))
# Save the concrete function.
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype)
)

# Convert the model to a quantized version with post-training quantization
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()
open(TFLITE_QUANT_MODEL, "wb").write(tflite_quant_model)
print("TFLite Quantized Model Is Created")
Processing one image on Raspberry Pi 0
import tensorflow as tf
import numpy as np
import matplotlib.image as img
import cv2

# Load the TFLite model
tflite_interpreter = tf.lite.Interpreter(
    model_path='../../tflite_models/effnet_v22_quant.tflite')

# Take the pre-trained model parameters
input_details = tflite_interpreter.get_input_details()
output_details = tflite_interpreter.get_output_details()
img_width = input_details[0]['shape'][2]
img_height = input_details[0]['shape'][1]

# Load and preprocess the image to be predicted
testimg = img.imread('../img/c21.jpg')
testimg = cv2.resize(testimg, (img_width, img_height))
testimg = cv2.cvtColor(testimg, cv2.COLOR_BGR2GRAY)
testimg = testimg[np.newaxis, ..., np.newaxis]
testimg = np.array(testimg, dtype=np.float32)

# Resize the TFLite tensors
tflite_interpreter.resize_tensor_input(input_details[0]['index'], (1, img_height, img_width, 1))
tflite_interpreter.resize_tensor_input(output_details[0]['index'], (1, 8))
tflite_interpreter.allocate_tensors()

input_details = tflite_interpreter.get_input_details()
output_details = tflite_interpreter.get_output_details()

tflite_interpreter.set_tensor(input_details[0]['index'], testimg)
tflite_interpreter.invoke()
tflite_model_predictions = tflite_interpreter.get_tensor(output_details[0]['index'])

# TFLite prediction results
classes = np.array([101, 102, 104, 105, 107, 110, 113, 115])  # class array creation
mat = np.vstack([classes, tflite_model_predictions])
np.set_printoptions(suppress=True, precision=10)  # to get rid of scientific notation
if np.max(mat[1, :]) > 0.50:
    theclass = int(mat[0, np.argmax(mat[1, :])])
else:
    theclass = "NO_CLASS"
print(mat)
print("The predicted class is", theclass)
You are using the EfficientNet-B5 model, which has nearly 30M parameters. Even with the benefits of TensorFlow Lite and quantization, it is very hard to get inference latency below 30 ms even on a high-performance CPU like the Pixel 4's. Considering you are using a very limited embedded system, 60 seconds per inference is normal.
There is a well-explained blog post about the latency of the EfficientNet-Lite models: https://blog.tensorflow.org/2020/03/higher-accuracy-on-vision-models-with-efficientnet-lite.html
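If switching to a smaller backbone (e.g. an EfficientNet-Lite variant) is not an option, full-integer post-training quantization sometimes reduces CPU latency further than the default optimization used above. A hedged sketch, assuming model is the Keras model loaded in the conversion script and that the representative-dataset generator below is replaced with a few hundred real 240x320 grayscale samples:

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Yield samples shaped like the model input; random data is only a placeholder here.
    for _ in range(100):
        yield [np.random.rand(1, 240, 320, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Optionally force integer input/output as well (the inference script would then
# need to feed uint8 instead of float32):
# converter.inference_input_type = tf.uint8
# converter.inference_output_type = tf.uint8

with open("../tflite_models/effnet_v22_int8.tflite", "wb") as f:
    f.write(converter.convert())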

How to run prediction (using image as input) for a saved model?

Problem:
I am very new to TensorFlow. My specific question is: what particular arguments should I put inside sess.run(fetches, feed_dict)? For instance, how can I find out what the values of those arguments should be?
Steps:
Here is my understanding of the steps after looking at other posts.
Save the trained TensorFlow model; it should consist of 4 files, below are my outputs:
checkpoint
Inception_resnet_v2.ckpt.data-00000-of-00001
Inception_resnet_v2.ckpt.index
Inception_resnet_v2.ckpt.meta
Resize the input image to whatever format is required by the neural network.
Start tensorflow session.
Retrieve the graph and associated parameters, tensors...
Predict the input image.
Code:
Training code:
https://github.com/taki0112/SENet-Tensorflow/blob/master/SE_Inception_resnet_v2.py
[Solved] Test code:
import tensorflow as tf
import numpy as np
import cv2
labels = ["airplane","automobile","bird","cat","deer","dog","frog","horse","ship","truck"]
# Load graph and parameters, etc.
sess=tf.Session()
saver = tf.train.import_meta_graph('./model/Inception_resnet_v2.ckpt.meta')
saver.restore(sess, tf.train.latest_checkpoint("./model/"))
graph = tf.get_default_graph()
# Get tensor names
x = graph.get_tensor_by_name("Placeholder:0")
training_flag = graph.get_tensor_by_name("Placeholder_2:0")
op_to_restore = graph.get_tensor_by_name("final_fully_connected/dense/BiasAdd:0")
# Preprocess the image input (per-channel normalization)
src = cv2.imread("./input/car3.jpg")
dst = cv2.resize(src, (32, 32), interpolation=cv2.INTER_CUBIC)
b, g, r = cv2.split(dst)
b = (b - np.mean(b)) / np.std(b) * .1
g = (g - np.mean(g)) / np.std(g) * .1
r = (r - np.mean(r)) / np.std(r) * .1
src = cv2.merge((b, g, r))
picture = src.reshape(1, 32, 32, 3)  # feed the normalized image rather than the raw resize
feed_dict ={x: picture, training_flag:False}
result_index = sess.run(op_to_restore,feed_dict)
print(result_index)
print (labels[np.argmax(result_index)])
The arguments actually depend on what you're doing: the first argument (fetches) is the ops or tensors you want evaluated, and feed_dict supplies values for the placeholders. Whenever you work with TensorFlow, you define a graph that is fed examples (training data) and some hyperparameters like the learning rate, global step, etc. It is standard practice to feed all the training data and hyperparameters through placeholders. When you build a network using placeholders and save it, the network structure is saved, but the values of the placeholders are not.
Let's see a toy example:
import tensorflow as tf
#Prepare to feed input, i.e. feed_dict and placeholders
w1 = tf.placeholder("float", name="w1")
w2 = tf.placeholder("float", name="w2")
b1= tf.Variable(2.0,name="bias")
feed_dict ={w1:4,w2:8}
#Define a test operation that we will restore
w3 = tf.add(w1,w2)
w4 = tf.multiply(w3,b1,name="op_to_restore")
sess = tf.Session()
sess.run(tf.global_variables_initializer())
#Create a saver object which will save all the variables
saver = tf.train.Saver()
#Run the operation by feeding input
print(sess.run(w4, feed_dict))
# Prints 24, which is (w1+w2)*b1
#Now, save the graph
saver.save(sess, 'my_test_model',global_step=1000)
Now, when we want to restore the model, we not only have to restore the graph and weights, but also prepare a new feed_dict that feeds the new data to the network. We can get references to these saved operations and placeholder variables via the graph.get_tensor_by_name() method. So if you want to train the same model further on new data, you would have to utilize those weights; if, however, you just want to get predictions from the model you trained, you can utilize op_to_restore and a feed_dict with the new data. Something like this, if you follow the above example:
import tensorflow as tf
sess=tf.Session()
#First let's load meta graph and restore weights
saver = tf.train.import_meta_graph('my_test_model-1000.meta')
saver.restore(sess,tf.train.latest_checkpoint('./'))
# Now, let's access and create placeholders variables and
# create feed-dict to feed new data
graph = tf.get_default_graph()
w1 = graph.get_tensor_by_name("w1:0")
w2 = graph.get_tensor_by_name("w2:0")
feed_dict ={w1:13.0,w2:17.0}
#Now, access the op that you want to run.
op_to_restore = graph.get_tensor_by_name("op_to_restore:0")
print(sess.run(op_to_restore, feed_dict))
#This will print 60 which is calculated
#using new values of w1 and w2 and saved value of b1.
So this is how it works. In your case, since you're trying to load the Inception model, your op_to_restore depends on what you're trying to restore; if you could tell us what you're trying to do, it would be possible to suggest something more specific. The other parameter, feed_dict, is just the numpy array of the image pixels you're trying to classify/predict.
I took the code from the following article. This will help you as well. http://cv-tricks.com/tensorflow-tutorial/save-restore-tensorflow-models-quick-complete-tutorial/
Update: For your particular case, you may like to try the following code to predict the classes in the new images.
import tensorflow as tf
slim = tf.contrib.slim
from inception_resnet_v2 import *
# Well, since you're using resnet_v2, this may be equivalent for you.

checkpoint_file = 'inception_resnet_v2_2016_08_30.ckpt'
sample_images = ['dog.jpg', 'panda.jpg']

# Input placeholder for batches of 299x299 RGB images
input_tensor = tf.placeholder(tf.float32, shape=(None, 299, 299, 3))

# Load the model
sess = tf.Session()
arg_scope = inception_resnet_v2_arg_scope()
with slim.arg_scope(arg_scope):
    logits, end_points = inception_resnet_v2(input_tensor, is_training=False)

# With this, you could consider the op variable to be the following
predict_values, logit_values = sess.run(
    [end_points['Predictions'], logits], feed_dict={input_tensor: im})
# Here im is the normalized numpy array of the image pixels.
Furthermore, the following resources may help you even more:
Using pre-trained inception_resnet_v2 with Tensorflow
https://github.com/tensorflow/tensorflow/issues/7172

Load a single image in a pretrained pytorch net

Total newbie here, I'm using this pytorch SegNet implementation with a '.pth' file containing weights from 50 epochs of training.
How can I load a single test image and see the net prediction?
I know this may sound like a stupid question but I'm stuck.
What I've got is:
from segnet import SegNet
import torch
model = SegNet(2)
model.load_state_dict(torch.load('./model_segnet_epoch50.pth'))
How do I "use" the net on a single test picture?
Here is an example with a pre-trained ResNet152 model.
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms

def image_loader(loader, image_name):
    image = Image.open(image_name)
    image = loader(image).float()
    image = torch.tensor(image, requires_grad=True)
    image = image.unsqueeze(0)
    return image

data_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor()
])

model_ft = models.resnet152(pretrained=True)
model_ft.eval()

print(np.argmax(model_ft(image_loader(data_transforms, $FILENAME)).detach().numpy()))
$FILENAME is the path and name of your image to be loaded. I got necessary help from this post.
output = model(image)
Note that the image should be a Variable object and that the output will be one as well.
If your image is, for example, a Numpy array, you can convert it like so:
var_image = Variable(torch.Tensor(image))
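Applied to the SegNet case from the question, a minimal sketch could look like the following (the image path, input size, and RGB assumption are illustrative, not taken from the original code):

import torch
from PIL import Image
from torchvision import transforms
from segnet import SegNet

model = SegNet(2)
model.load_state_dict(torch.load('./model_segnet_epoch50.pth'))
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),   # use the size the network was trained on
    transforms.ToTensor(),
])

image = Image.open('test_image.jpg').convert('RGB')
batch = preprocess(image).unsqueeze(0)   # add a batch dimension

with torch.no_grad():
    output = model(batch)                # per-pixel class scores for SegNet
print(output.shape)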

How to pass images through TensorFlow-Slim VGG Pre-Trained Net in Batches?

I want to pass the images through the network for a transfer learning task. In the following code I'm building the graph and then getting the outputs of a fully connected layer. I wanted to get the outputs in batches because I have an array with more than 20k images.
vgg.vgg_16(images) requires images to be an array of images. I tried feeding an input placeholder (after looking at the docs), but when loading the checkpoint I got the error "There are no variables to save".
I can feed vgg.vgg_16(images) a few images at a time, but then I would need to load the checkpoint for each batch. I'm pretty sure there is a better way to do that. Are there any examples or references I can look at?
import numpy as np
import tensorflow as tf
from tensorflow.contrib import slim
from tensorflow.contrib.slim.nets import vgg

# load images and resize to 224 x 224
images = np.array(read_images(val_filenames[:4], 224, 224), dtype=np.float32)

vgg_graph = tf.Graph()
with vgg_graph.as_default():
    with slim.arg_scope(vgg.vgg_arg_scope()):
        outputs, end_points = vgg.vgg_16(images, is_training=False)
    fc6 = end_points['vgg_16/fc6']

    with tf.Session(graph=vgg_graph) as sess:
        saver = tf.train.Saver()
        saver.restore(sess, 'checkpoints/vgg_16.ckpt')

        # pass images through the network
        fc6_output = sess.run(fc6)
I also looked at this and this reference, but I didn't find the answer.
You can create a placeholder and pass it to the VGG network. Change your code to:
images = tf.placeholder(tf.float32, shape=[batch_size, height, width, channels])

with slim.arg_scope(vgg.vgg_arg_scope()):
    outputs, end_points = vgg.vgg_16(images, is_training=False)
and when running the session, feed the input to the network:
fc6_output = sess.run(fc6, feed_dict={images:batch_images})
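To cover the 20k-image array, one hedged approach is to loop over it in slices and feed each slice through the same placeholder (a sketch; all_images and batch_size are illustrative, and the placeholder's first dimension can be set to None so the last, smaller batch still fits):

import numpy as np

batch_size = 32
fc6_outputs = []
for start in range(0, len(all_images), batch_size):
    batch_images = all_images[start:start + batch_size]
    fc6_outputs.append(sess.run(fc6, feed_dict={images: batch_images}))
fc6_outputs = np.concatenate(fc6_outputs, axis=0)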
