TensorFlow Lite quantization on Raspberry Pi Zero for an image classification problem - Python

I am developing an image classification program for the Raspberry Pi Zero W, and I am wondering whether there is a code change that would speed up image processing.
General information:
the main model was trained on EfficientNetB5
image dimensions are 240x320 in grayscale
on the Raspberry Pi it only needs to classify single images; there is no 'live streaming' and no object detection
I acknowledge that the Raspberry Pi Zero W is not the best match for TensorFlow, but maybe there is still a way to accelerate it
at the moment one image is predicted in about 60 seconds, which is too much
My thoughts are that maybe I should retrain the model with smaller input dimensions, and I also wonder whether the learning_rate used for the main model can affect inference speed on the Pi.
Below I am attaching two scripts.
Converting the TensorFlow saved model into a quantized TFLite model
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras.models import load_model

model = load_model('../models/effnet_v22.h5')
TFLITE_QUANT_MODEL = "../tflite_models/effnet_v22_quant.tflite"

run_model = tf.function(lambda x: model(x))

# Save the concrete function.
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype)
)

# Convert the model to a quantized version with post-training quantization
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()

with open(TFLITE_QUANT_MODEL, "wb") as f:
    f.write(tflite_quant_model)
print("TFLite Quantized Model Is Created")
Processing a single image on the Raspberry Pi Zero
import tensorflow as tf
import numpy as np
import matplotlib.image as img
import cv2

# loading the tflite model
tflite_interpreter = tf.lite.Interpreter(
    model_path='../../tflite_models/effnet_v22_quant.tflite')

# reading the model's input/output details
input_details = tflite_interpreter.get_input_details()
output_details = tflite_interpreter.get_output_details()
img_width = input_details[0]['shape'][2]
img_height = input_details[0]['shape'][1]

# loading and preprocessing the image to be predicted
testimg = img.imread('../img/c21.jpg')
testimg = cv2.resize(testimg, (img_width, img_height))
testimg = cv2.cvtColor(testimg, cv2.COLOR_RGB2GRAY)  # matplotlib.image.imread returns RGB, so convert RGB -> grayscale
testimg = testimg[np.newaxis, ..., np.newaxis]
testimg = np.array(testimg, dtype=np.float32)

# resizing tflite's tensors to a batch-of-one shape
tflite_interpreter.resize_tensor_input(input_details[0]['index'], (1, img_height, img_width, 1))
tflite_interpreter.resize_tensor_input(output_details[0]['index'], (1, 8))
tflite_interpreter.allocate_tensors()
input_details = tflite_interpreter.get_input_details()
output_details = tflite_interpreter.get_output_details()

# running inference
tflite_interpreter.set_tensor(input_details[0]['index'], testimg)
tflite_interpreter.invoke()
tflite_model_predictions = tflite_interpreter.get_tensor(output_details[0]['index'])

# TFLite prediction results
classes = np.array([101, 102, 104, 105, 107, 110, 113, 115])  # class label array
mat = np.vstack([classes, tflite_model_predictions])
np.set_printoptions(suppress=True, precision=10)  # avoid scientific notation
if np.max(mat[1, :]) > 0.50:
    theclass = int(mat[0, np.argmax(mat[1, :])])
else:
    theclass = "NO_CLASS"
print(mat)
print("The predicted class is", theclass)

You are using the EfficientNet-B5 model, which has nearly 30M parameters. Even with the benefits of TensorFlow Lite and quantization, it is very hard to get inference latency below 30 ms even on a high-performance CPU like the one in a Pixel 4. Considering you are running on a very low-powered embedded system, 60 seconds for one inference is not surprising.
There is a well-explained blog post about the latency of the EfficientNet-Lite models: https://blog.tensorflow.org/2020/03/higher-accuracy-on-vision-models-with-efficientnet-lite.html
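Beyond the dynamic-range quantization shown in the question, the biggest practical win is usually a much smaller backbone (for example EfficientNet-Lite0 or MobileNetV2) combined with full-integer post-training quantization, so the Pi executes everything in int8. A minimal conversion sketch, assuming a hypothetical retrained Keras model small_model.h5 and a representative dataset built from your own preprocessed training images (the random arrays below are only placeholders):

import numpy as np
import tensorflow as tf

# Hypothetical smaller Keras model retrained on the same data
model = tf.keras.models.load_model('../models/small_model.h5')

def representative_dataset():
    # Should yield ~100 real, preprocessed training images; random data is a placeholder
    for _ in range(100):
        yield [np.random.rand(1, 240, 320, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer quantization so weights and activations are int8
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open('../tflite_models/small_model_int8.tflite', 'wb') as f:
    f.write(converter.convert())

With an int8 model the array passed to set_tensor has to be uint8 rather than float32, so the Pi-side script would need a matching change.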

Related

Model size is smaller in .onnx format than in .tflite format

I have a pre-trained PyTorch model that I want to convert to TFLite. The model is from the seisbench API. I used the code below for the conversion; it includes some checks to confirm that the various format conversions worked.
I followed the flow .pt -> .onnx -> TensorFlow -> TFLite, but the .onnx file I obtain (98 kB) is smaller than the final TFLite model (108 kB). I am using the onnx-tensorflow library (https://github.com/onnx/onnx-tensorflow) to convert the .onnx file to TensorFlow.
import torch
import onnx
import seisbench.models as sbm

model = sbm.PhaseNet.from_pretrained("instance")  # load the model from the seisbench API
# model.load_state_dict(pNET.state_dict())

print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

# Save model information
print(model.get_model_args())
input_lenght = model.in_samples
input_depth = model.in_channels

# save to .pt
model.eval()  # turn off gradient computations and other training-only operations
torch.save(model, 'pNET.pt')

# check if the model has been saved correctly
temp_model = torch.load('pNET.pt')
temp_model.eval()
print("Model's state_dict:")
for param_tensor in temp_model.state_dict():
    print(param_tensor, "\t", temp_model.state_dict()[param_tensor].size())

# save to .onnx
# define a random sample input; order is (width, depth, length), with width fixed to 1 for time series data
sample_input = torch.randn(1, input_depth, input_lenght, requires_grad=True)

# export
torch.onnx.export(
    model,                   # PyTorch model
    sample_input,            # input tensor
    'pNET.onnx',             # output file name
    input_names=['input'],   # input tensor name (arbitrary)
    output_names=['output']  # output tensor name (arbitrary)
)

# check if the model has been saved correctly
onnx_model = onnx.load('pNET.onnx')
# Check that the IR is well formed
onnx.checker.check_model(onnx_model)
# Print a human-readable representation of the graph
onnx.helper.printable_graph(onnx_model.graph)

# Try to run an inference with the newly saved onnx model
import onnxruntime as ort
import numpy as np

ort_session = ort.InferenceSession('pNET.onnx')
outputs = ort_session.run(
    None,
    {'input': np.random.randn(1, input_depth, input_lenght).astype(np.float32)}  # random input
)
print(outputs)  # check that you get a tensor of the right shape
print(outputs[0].shape)

from onnx_tf.backend import prepare

# Converting to TensorFlow model
onnx_model = onnx.load("pNET.onnx")  # load onnx model
tf_rep = prepare(onnx_model)         # prepare tf representation
tf_rep.export_graph("pNET")          # export the model

# Check if the conversion worked: run a TF inference
import tensorflow as tf

model = tf.saved_model.load("./pNET")
model.trainable = False
input_tensor = tf.random.uniform([1, input_depth, input_lenght])
out = model(**{'input': input_tensor})
print(out)  # check that you get an output of the right shape

# float16 quantization
converter = tf.lite.TFLiteConverter.from_saved_model("./pNET")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_quant_model = converter.convert()

# Save the model
with open('pNETlite16float.tflite', 'wb') as f:
    f.write(tflite_quant_model)  # same size as when I use interpreter instead of converter?
My confusion stems from the fact that I was expecting post-training quantization to reduce the model size. Does TFLite add extra wrappers or methods to a model, increasing its size compared to .onnx?
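One way to check whether the float16 quantization actually took effect (rather than TFLite storing float32 weights plus graph metadata) is to inspect the tensor dtypes inside the .tflite file; a small diagnostic sketch, assuming the pNETlite16float.tflite produced above:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='pNETlite16float.tflite')
interpreter.allocate_tensors()

# Count how many tensors are stored in each dtype; weight buffers should show up as float16
dtype_counts = {}
for detail in interpreter.get_tensor_details():
    name = np.dtype(detail['dtype']).name
    dtype_counts[name] = dtype_counts.get(name, 0) + 1
print(dtype_counts)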

How to preprocess input for a Keras H5 model converted into a pb file

I successfully converted a Keras H5 model into a TensorFlow pb file, but I get totally different results when making a prediction.
In Python I use 2 Keras modules to preprocess the data before feeding the network:
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
Here is how I preprocess the data in my Python code:
# extract the object ROI, convert it from BGR to RGB channel
# ordering, resize it to 224x224, and preprocess it
moving_object = img_orig[startY:endY, startX:endX]
moving_object = cv2.cvtColor(moving_object, cv2.COLOR_BGR2RGB)
moving_object = cv2.resize(moving_object, (224, 224))
moving_object = img_to_array(moving_object)
moving_object = preprocess_input(moving_object)
objects.append(moving_object)
Then I make batch predictions via the Keras predict method:
# only make predictions if at least one object was detected
if len(objects) > 0:
    objects = np.array(objects, dtype="float32")
    preds = wine_plant_model.predict(objects)
Here is how I preprocess the data in C++:
vector<Mat> detected_objects;
//extract the object ROI
Mat image_roi = img_orig(roi);
detected_objects.push_back(image_roi);
and how I make batch predictions in C++:
if (detected_objects.size() > 0) {
    vector<Mat> preds;
    Mat inputBlobs = cv::dnn::blobFromImages(detected_objects, 1.0, Size(224, 224));
    net.setInput(inputBlobs);
    Mat outputs = net.forward();
}
It seems that I am not preprocessing the image the right way in C++, and therefore I am not getting the same results. But I cannot find an equivalent for the Keras preprocess_input() method in C++.
Looking at the Keras documentation, the Python preprocess_input() method scales the data between -1 and 1. So I do not know whether I should normalize the data using the cv::normalize method or do something with the blobFromImages scale factor. I am a bit confused here.
Could you please tell me how to preprocess the data the same way in C++, even though Keras does not seem to be available in C++.
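For reference, tf.keras.applications.mobilenet_v2.preprocess_input simply maps pixel values from [0, 255] to [-1, 1]; the small check below (a sketch, not part of the original question) makes the arithmetic explicit, which is what would have to be reproduced on the C++ side, e.g. via the scalefactor and mean arguments of blobFromImages together with its swapRB flag for the BGR-to-RGB swap:

import numpy as np
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

rgb = np.random.randint(0, 256, size=(224, 224, 3)).astype("float32")

keras_out = preprocess_input(rgb.copy())
manual_out = rgb / 127.5 - 1.0  # the same scaling done by hand

print(np.allclose(keras_out, manual_out))  # expected: True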

How to load a tflite model in a script?

I have converted the .pb file to a tflite file using bazel. Now I want to load this tflite model in my Python script just to test whether it gives me the correct output.
You can use the TensorFlow Lite Python interpreter to load the tflite model in a Python shell and test it with your input data.
The code will be like this:
import numpy as np
import tensorflow as tf
# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Test model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
The above code is from the TensorFlow Lite official guide; for more detailed information, read this.
Using TensorFlow lite models in Python:
TensorFlow Lite's verbosity is powerful because it gives you more control, but in many cases you just want to pass an input and get an output, so I made a class that wraps this logic:
The following works with classification models from tfhub.dev, for example: https://tfhub.dev/tensorflow/lite-model/mobilenet_v2_1.0_224/1/metadata/1
# Usage
model = TensorflowLiteClassificationModel("path/to/model.tflite", labels)
label_probabilities = model.run_from_filepath("path/to/image.jpeg")  # [label, probability] pairs, sorted ascending by probability

import tensorflow as tf
import numpy as np
from PIL import Image


class TensorflowLiteClassificationModel:
    def __init__(self, model_path, labels, image_size=224):
        self.interpreter = tf.lite.Interpreter(model_path=model_path)
        self.interpreter.allocate_tensors()
        self._input_details = self.interpreter.get_input_details()
        self._output_details = self.interpreter.get_output_details()
        self.labels = labels
        self.image_size = image_size

    def run_from_filepath(self, image_path):
        input_data_type = self._input_details[0]["dtype"]
        image = np.array(Image.open(image_path).resize((self.image_size, self.image_size)), dtype=input_data_type)
        if input_data_type == np.float32:
            image = image / 255.

        # grayscale images: replicate the single channel so the model sees 3 channels
        if image.ndim == 2:
            image = np.stack([image] * 3, axis=-1)

        # add the batch dimension expected by the interpreter
        image = image[np.newaxis, ...]
        return self.run(image)

    def run(self, image):
        """
        args:
          image: a (1, image_size, image_size, 3) np.array

        Returns list of [Label, Probability], of type List<str, float>
        """
        self.interpreter.set_tensor(self._input_details[0]["index"], image)
        self.interpreter.invoke()
        tflite_interpreter_output = self.interpreter.get_tensor(self._output_details[0]["index"])
        probabilities = np.array(tflite_interpreter_output[0])

        # create list of ["label", probability], ordered by ascending probability
        label_to_probabilities = []
        for i, probability in enumerate(probabilities):
            label_to_probabilities.append([self.labels[i], float(probability)])
        return sorted(label_to_probabilities, key=lambda element: element[1])
Caution
However, you'll need to modify this class for other use cases, since it takes images as input and returns classification ([label, probability]) output; other models need text input (NLP), or produce different outputs (object detection returns bounding boxes, labels and probabilities; plain classification returns just labels; and so on).
Also, if you expect image inputs of different sizes, you have to change the input size and reallocate the model's tensors (self.interpreter.allocate_tensors()), which is slow (inefficient). It's better to use the platform's resizing functionality (e.g. the Android graphics library) instead of a TensorFlow Lite model to do the resizing. Alternatively, you could do the resizing with a separate, dedicated model, which would be much quicker to allocate_tensors() for.
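If you do need a different input size at runtime, the interpreter can also be resized explicitly before reallocating, instead of rebuilding the whole wrapper; a minimal sketch, assuming a hypothetical model.tflite whose graph supports resizing its spatial dimensions:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
input_index = interpreter.get_input_details()[0]["index"]

# Resize the input tensor to the new shape, then reallocate once
interpreter.resize_tensor_input(input_index, (1, 320, 320, 3))
interpreter.allocate_tensors()

interpreter.set_tensor(input_index, np.zeros((1, 320, 320, 3), dtype=np.float32))
interpreter.invoke()
output_index = interpreter.get_output_details()[0]["index"]
print(interpreter.get_tensor(output_index).shape)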

Using pre-trained inception_resnet_v2 with Tensorflow

I have been trying to use the pre-trained inception_resnet_v2 model released by Google. I am using their model definition (https://github.com/tensorflow/models/blob/master/slim/nets/inception_resnet_v2.py) and the given checkpoint (http://download.tensorflow.org/models/inception_resnet_v2_2016_08_30.tar.gz) to load the model in TensorFlow as below [download and extract the checkpoint file, and download the sample images dog.jpg and panda.jpg to test this code]:
import tensorflow as tf
slim = tf.contrib.slim
from PIL import Image
from inception_resnet_v2 import *
import numpy as np

checkpoint_file = 'inception_resnet_v2_2016_08_30.ckpt'
sample_images = ['dog.jpg', 'panda.jpg']

# Load the model
sess = tf.Session()
input_tensor = tf.placeholder(tf.float32, shape=(None, 299, 299, 3))  # placeholder for the input images
arg_scope = inception_resnet_v2_arg_scope()
with slim.arg_scope(arg_scope):
    logits, end_points = inception_resnet_v2(input_tensor, is_training=False)
saver = tf.train.Saver()
saver.restore(sess, checkpoint_file)

for image in sample_images:
    im = Image.open(image).resize((299, 299))
    im = np.array(im)
    im = im.reshape(-1, 299, 299, 3)
    predict_values, logit_values = sess.run([end_points['Predictions'], logits], feed_dict={input_tensor: im})
    print(np.max(predict_values), np.max(logit_values))
    print(np.argmax(predict_values), np.argmax(logit_values))
However, this code does not give the expected results (class 918 is predicted irrespective of the input image). Can someone help me understand where I am going wrong?
The Inception networks expect the input image to have color channels scaled to [-1, 1], as seen here.
You could either use the existing preprocessing, or in your example just scale the images yourself with im = 2*(im/255.0)-1.0 before feeding them to the network.
Without scaling, the input range [0, 255] is much larger than the network expects, and the biases all work to very strongly predict category 918 (comic books).
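Applied to the loop from the question (a sketch; the session, graph and input_tensor defined above are unchanged), the scaling would look roughly like this:

for image in sample_images:
    im = Image.open(image).resize((299, 299))
    im = np.array(im, dtype=np.float32)
    im = 2 * (im / 255.0) - 1.0  # scale [0, 255] -> [-1, 1]
    im = im.reshape(-1, 299, 299, 3)
    predict_values, logit_values = sess.run(
        [end_points['Predictions'], logits], feed_dict={input_tensor: im})
    print(np.argmax(predict_values), np.max(predict_values))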

How to speed up a Caffe classifier in Python

I am using Python with a Caffe classifier. I grab images from my camera and predict them against my training set. It works well, but the problem is that it is very slow: I get only about 4 frames/second. Could you suggest some ways to reduce the computation time in my code?
The problem can be explained as follows. I have to load a network model, age_net.caffemodel, whose size is about 80 MB, with the following code:
age_net_pretrained = './age_net.caffemodel'
age_net_model_file = './deploy_age.prototxt'
age_net = caffe.Classifier(age_net_model_file, age_net_pretrained,
                           mean=mean,
                           channel_swap=(2, 1, 0),
                           raw_scale=255,
                           image_dims=(256, 256))
And for each input image (caffe_input) I call the predict function:
prediction = age_net.predict([caffe_input])
I think that because the network is very large, the predict function takes a long time to classify an image; I believe that is where the slowness comes from.
This is my full reference code, which I have modified:
from conv_net import *
import matplotlib.pyplot as plt
import numpy as np
import cv2
import glob
import os

caffe_root = './caffe'
import sys
sys.path.insert(0, caffe_root + 'python')
import caffe

DATA_PATH = './face/'
cnn_params = './params/gender_5x5_5_5x5_10.param'
face_params = './params/haarcascade_frontalface_alt.xml'

def format_frame(frame):
    img = frame.astype(np.float32) / 255.
    img = img[..., ::-1]
    return img

if __name__ == '__main__':
    files = glob.glob(os.path.join(DATA_PATH, '*.*'))

    # This is the configuration of the full convolutional part of the CNN
    # `d` is a list of dicts, where each dict represents a convolution-maxpooling
    # layer.
    # Eg c1 - first layer, convolution window size
    #    p1 - first layer pooling window size
    #    f_in1 - first layer no. of input feature arrays
    #    f_out1 - first layer no. of output feature arrays
    d = [{'c1': (5, 5),
          'p1': (2, 2),
          'f_in1': 1, 'f_out1': 5},
         {'c2': (5, 5),
          'p2': (2, 2),
          'f_in2': 5, 'f_out2': 10}]

    # This is the configuration of the mlp part of the CNN
    # first tuple has the fan_in and fan_out of the input layer
    # of the mlp and so on.
    nnet = [(800, 256), (256, 2)]

    c = ConvNet(d, nnet, (45, 45))
    c.load_params(cnn_params)
    face_cascade = cv2.CascadeClassifier(face_params)
    cap = cv2.VideoCapture(0)
    cv2.namedWindow("Image", cv2.WINDOW_NORMAL)

    plt.rcParams['figure.figsize'] = (10, 10)
    plt.rcParams['image.interpolation'] = 'nearest'
    plt.rcParams['image.cmap'] = 'gray'

    mean_filename = './mean.binaryproto'
    proto_data = open(mean_filename, "rb").read()
    a = caffe.io.caffe_pb2.BlobProto.FromString(proto_data)
    mean = caffe.io.blobproto_to_array(a)[0]

    age_net_pretrained = './age_net.caffemodel'
    age_net_model_file = './deploy_age.prototxt'
    age_net = caffe.Classifier(age_net_model_file, age_net_pretrained,
                               mean=mean,
                               channel_swap=(2, 1, 0),
                               raw_scale=255,
                               image_dims=(256, 256))

    age_list = ['(0, 2)', '(4, 6)', '(8, 12)', '(15, 20)', '(25, 32)', '(38, 43)', '(48, 53)', '(60, 100)']

    while True:
        val, image = cap.read()
        if image is None:
            break
        image = cv2.resize(image, (320, 240))
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.3, 5, minSize=(30, 30))
        for f in faces:
            x, y, w, h = f
            cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 255))
            face_image_rgb = image[y:y + h, x:x + w]
            caffe_input = cv2.resize(face_image_rgb, (256, 256)).astype(np.float32)
            prediction = age_net.predict([caffe_input])
            print('predicted age:', age_list[prediction[0].argmax()])
        cv2.imshow('Image', image)
        ch = 0xFF & cv2.waitKey(1)
        if ch == 27:
            break
Try calling age_net.predict([caffe_input]) with oversample=False:
prediction = age_net.predict([caffe_input], oversample=False)
The default behavior of predict is to create 10 slightly different crops of the input image and feed them all to the network to classify; by disabling this option you should get roughly a 10x speedup.
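If that is still too slow, another option (sketched below under assumptions: the input blob is named 'data', the output blob 'prob', and mean, caffe_input and age_list come from the script in the question) is to skip caffe.Classifier entirely and drive caffe.Net directly, so there is no oversampling and no extra resizing beyond what you do yourself:

import caffe
import numpy as np

net = caffe.Net('./deploy_age.prototxt', './age_net.caffemodel', caffe.TEST)

# Reproduce the preprocessing Classifier would apply (mean, BGR order, 0-255 scale)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))        # HWC -> CHW
transformer.set_mean('data', mean.mean(1).mean(1))  # per-channel mean from mean.binaryproto
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2, 1, 0))     # RGB -> BGR

net.blobs['data'].data[...] = transformer.preprocess('data', caffe_input / 255.0)
output = net.forward()
print('predicted age:', age_list[output['prob'][0].argmax()])  # assumes the output blob is named 'prob'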
For all of you who still use Caffe, I'd recommend trying OpenVINO to decrease inference time. OpenVINO optimizes your model by converting it to Intermediate Representation (IR), performing graph pruning and fusing some operations into others while preserving accuracy. It then uses vectorization at runtime. OpenVINO is optimized for Intel hardware, but it should work with any CPU.
Some snippets are below.
Install OpenVINO
The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.
pip install openvino-dev[caffe]
Use the Model Optimizer to convert the Caffe model
The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the Caffe model to IR, the default format for OpenVINO. You can also try FP16 precision, which should give you better performance without a significant accuracy drop (change data_type). Run in the command line:
mo --input_model "age_net.caffemodel" --data_type FP32 --source_layout "[n,c,h,w]" --target_layout "[n,h,w,c]" --output_dir "model_ir"
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device, e.g. CPU or GPU (integrated graphics such as Intel HD Graphics). If you don't know what the best choice is, use AUTO. If you care about latency or throughput, I suggest adding a performance hint (as shown below) to use the device that fulfills your requirement.
from openvino.runtime import Core  # runtime API from the openvino-dev package
import cv2
import numpy as np

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/age_net.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="AUTO", config={"PERFORMANCE_HINT":"LATENCY"}) # alternatively THROUGHPUT or CUMULATIVE_THROUGHPUT
# Get input and output layers
input_layer_ir = compiled_model_ir.input(0)
output_layer_ir = compiled_model_ir.output(0)
# Resize and reshape input image
height, width = list(input_layer_ir.shape)[1:3]
input_image = cv2.resize(input_image, (width, height))[np.newaxis, ...]
# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
Disclaimer: I work on OpenVINO.
