Accuracy drop between TensorFlow model and converted TFLite - Python

I'm running into an issue where I convert my Keras model into TensorFlow Lite format, but once I do, the accuracy of the converted model drops significantly. The model is a fairly simple natural language processing model. Before conversion the model has an accuracy of around 96%, but once it is converted into the TensorFlow Lite format (without any optimizations) it drops to around 20%. This is a ridiculous drop in performance, so I was wondering: is this something that can happen, or am I doing something wrong here? I am running the TFLite model on a BeagleBone SBC running Debian, and running inference in Python.
My tflite conversion code:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
My model code:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 128, input_length=maxlen),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

I encountered the same problem. I solved it with post-training quantization: I applied quantization to my trained model and retrained it. That reduced the gap significantly, so that there was no more than roughly a 2-10% difference between Keras and TFLite.
It seems that when a Keras model is converted to TFLite, a sort of quantization is also applied and the float parameters are converted to integers, which results in the accuracy drop. By quantizing the model first, we train the model with integers. I think this is more or less what happened. Correct me if I'm wrong.
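A minimal sketch of the post-training quantization route, applied to the converter code from the question (the Optimize.DEFAULT flag enables dynamic-range quantization of the weights; the references below also cover the quantization-aware retraining variant):
converter = tf.lite.TFLiteConverter.from_keras_model(model)
# enable default post-training (dynamic-range) quantization of the weights
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
with open('model_quant.tflite', 'wb') as f:
    f.write(tflite_quant_model)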
References
https://www.tensorflow.org/model_optimization/guide/quantization/training
https://www.tensorflow.org/lite/performance/model_optimization


WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op while saving in tensorflow

I just trained a CNN with 99% accuracy on the MNIST dataset. My model is working fine and giving accurate results, but when I converted my .h5 model to a TFLite model, I'm getting only one result every time, i.e.:
Code to convert my .h5 model into a TFLite model:
tf_lite_interpreter = tf.lite.TFLiteConverter.from_keras_model(model)
with open("mnist_tflite.tflite", "wb") as f:
    f.write(tf_lite_interpreter.convert())
I noticed that I get this warning when converting:
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op while saving (showing 1 of 1). These functions will not be directly callable after loading.
Also, when I removed the Conv2D and MaxPooling2D layers, the warning was gone.
The model's structure:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), input_shape=(28, 28, 1), activation=tf.nn.relu),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax),
])
Why is this happening?
Any help would be appreciated.

Accuracy drops around 10% after exporting from PyTorch to ONNX

I've been training an EfficientNetV2 network using this repository.
The training process goes well and I reach around 93-95% validation accuracy. After that I run inference over a test set which contains new images, with acceptable accuracy, around 88% (for example).
After checking that the model works fine in PyTorch, I need to convert it to ONNX and then to a TensorRT engine. I have a script that runs inference with the ONNX model to check whether I'm having problems with the conversion process.
I'm using this code to convert the model:
import torch
from timm.models import create_model
import os

# create model
base_model = create_model(
    model_arch,
    num_classes=num_classes,
    in_chans=3,
    checkpoint_path=model_path)
model = torch.nn.Sequential(
    base_model,
    torch.nn.Softmax(dim=1)
)
model.cpu()
model.eval()
dummy_input = torch.randn(1, 3, 224, 224, requires_grad=True)
torch.onnx.export(model,
                  dummy_input,
                  model_export,
                  verbose=False,
                  export_params=True,
                  do_constant_folding=True
                  )
I've tried several tutorials like this one but unfortunately I'm getting the same result.
I've tried different option combinations, with and without do_constant_folding; I've even trained another model with a parameter called 'exportable', which is a bool that tells the train script whether the model is exportable or not (an experimental feature according to the repository's documentation).
Do you have any idea about this issue?
Thanks in advance.
Hard to guess which bug you're hitting; here are a few typical ones:
- some layers may not have been converted properly, even after eval()
- you may need to write timm.create_model(...scriptable=True, exportable=True)
- different preprocessing, e.g. the timm model expects input normalized to specific values, but after conversion it isn't.
Do those models output nearly the same values on model(dummy_input)?
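A minimal sketch of that check (assuming onnxruntime is installed and model_export is the path of the exported .onnx file from the question's script), comparing the PyTorch and ONNX Runtime outputs on the same dummy input:
import numpy as np
import onnxruntime as ort
import torch

model.eval()
dummy_input = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    torch_out = model(dummy_input).numpy()

session = ort.InferenceSession(model_export)
input_name = session.get_inputs()[0].name
onnx_out = session.run(None, {input_name: dummy_input.numpy()})[0]

# if the export is faithful, this difference should be tiny (e.g. below 1e-4)
print(np.max(np.abs(torch_out - onnx_out)))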

Model accuracy reduction in .tflite model

I'm currently developing some model optimization using TensorFlow by trying different features (quantization, weight pruning...) on some of my company's models.
My problem comes when I convert an .h5 model to .tflite, without any special optimization, with the following code:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
The model (MobileNet v2) size went from 18.5 MB to 8.9 MB and accuracy from 99.48% to 98.51%.
I can't explain this change, which is also happening in other models; for example, this ResNet50 went from 94.9 MB to 94 MB and from 98.51% to 94.51%.
Why is the conversion to .tflite reducing weight and accuracy without any type of extra optimization?

Saving a model in PyTorch and Keras

I have trained a model with Keras and saved it with the help of PyTorch. Will this cause any problems in the future? As far as I know, the only difference between them is that Keras saves its model weights as doubles while PyTorch saves its weights as floats.
You can convert your model to double by doing
model.double()
Note that after this, you will need your input to be a DoubleTensor.
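A minimal self-contained sketch of that (the Linear layer here is just a placeholder for illustration):
import torch

model = torch.nn.Linear(10, 1)                  # placeholder model
model.double()                                  # cast parameters and buffers to float64
x = torch.randn(4, 10, dtype=torch.float64)     # input must now be a DoubleTensor
y = model(x)
print(y.dtype)                                  # torch.float64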

Quantize a Keras neural network model

Recently, I've started creating neural networks with Tensorflow + Keras and I would like to try the quantization feature available in Tensorflow. So far, experimenting with examples from TF tutorials worked just fine and I have this basic working example (from https://www.tensorflow.org/tutorials/keras/basic_classification):
import tensorflow as tf
from tensorflow import keras
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# fashion mnist data labels (indexes related to their respective labelling in the data set)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
# preprocess the train and test images
train_images = train_images / 255.0
test_images = test_images / 255.0
# settings variables
input_shape = (train_images.shape[1], train_images.shape[2])
# create the model layers
model = keras.Sequential([
    keras.layers.Flatten(input_shape=input_shape),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
# compile the model with added settings
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# train the model
epochs = 3
model.fit(train_images, train_labels, epochs=epochs)
# evaluate the accuracy of model on test data
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)
Now, I would like to employ quantization in the learning and classification process. The quantization documentation (https://www.tensorflow.org/performance/quantization) (the page is no longer available as of circa September 15, 2018) suggests using this piece of code:
loss = tf.losses.get_total_loss()
tf.contrib.quantize.create_training_graph(quant_delay=2000000)
optimizer = tf.train.GradientDescentOptimizer(0.00001)
optimizer.minimize(loss)
However, it does not contain any information about where this code should be used or how it should be connected to TF code (not to mention a high-level model created with Keras). I have no idea how this quantization part relates to the previously created neural network model. Just inserting it after the neural network code results in the following error:
Traceback (most recent call last):
  File "so.py", line 41, in <module>
    loss = tf.losses.get_total_loss()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/util.py", line 112, in get_total_loss
    return math_ops.add_n(losses, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 2119, in add_n
    raise ValueError("inputs must be a list of at least one Tensor with the "
ValueError: inputs must be a list of at least one Tensor with the same dtype and shape
Is it possible to quantize a Keras NN model in this way or am I missing something basic?
A possible solution that crossed my mind could be using low level TF API instead of Keras (needing to do quite a bit of work to construct the model), or maybe trying to extract some of the lower level methods from the Keras models.
As mentioned in other answers, TensorFlow Lite can help you with network quantization.
TensorFlow Lite provides several levels of support for quantization. TensorFlow Lite post-training quantization quantizes weights and activations post-training easily. Quantization-aware training allows for training of networks that can be quantized with minimal accuracy drop; this is only available for a subset of convolutional neural network architectures.
So first, you need to decide whether you need post-training quantization or quantization-aware training. For example, if you already saved the model as *.h5 files, you would probably want to follow @Mitiku's instruction and do the post-training quantization.
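For reference, a minimal post-training quantization sketch (assuming a trained model saved as model.h5 and the TF 1.x-style from_keras_model_file loader shown later in this thread; Optimize.DEFAULT enables dynamic-range weight quantization):
converter = tf.lite.TFLiteConverter.from_keras_model_file("model.h5")
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # post-training (dynamic-range) quantization
tflite_quant_model = converter.convert()
with open("quantized_model.tflite", "wb") as f:
    f.write(tflite_quant_model)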
If you prefer to achieve higher performance by simulating the effect of quantization during training (using the method you quoted in the question), and your model is in the subset of CNN architectures supported by quantization-aware training, this example may help you in terms of the interaction between Keras and TensorFlow. Basically, you only need to add this code between the model definition and its fitting:
sess = tf.keras.backend.get_session()
tf.contrib.quantize.create_training_graph(sess.graph)
sess.run(tf.global_variables_initializer())
As your network looks quite simple, you can maybe use TensorFlow Lite.
TensorFlow Lite can be used to quantize a Keras model.
The following code was written for tensorflow 1.14. It might not work for earlier versions.
First, after training the model, save it to an .h5 file:
model.fit(train_images, train_labels, epochs=epochs)
# evaluate the accuracy of model on test data
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)
model.save("model.h5")
To load the Keras model, use tf.lite.TFLiteConverter.from_keras_model_file:
# load the previously saved model
converter = tf.lite.TFLiteConverter.from_keras_model_file("model.h5")
tflite_model = converter.convert()
# Save the model to file
with open("tflite_model.tflite", "wb") as output_file:
output_file.write(tflite_model)
The saved model can be loaded in a Python script or on other platforms and in other languages. To use a saved TFLite model, tensorflow.lite provides the Interpreter. The following example from here shows how to load a TFLite model from a local file using a Python script.
import numpy as np
import tensorflow as tf
# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="tflite_model.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Test model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
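As a follow-up sketch (assuming the Fashion-MNIST test_images and test_labels from the training snippet above, already normalized to [0, 1]), this measures the converted model's accuracy so any drop after conversion can be quantified:
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="tflite_model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

correct = 0
for image, label in zip(test_images, test_labels):
    # add a batch dimension and cast to the float32 dtype the model expects
    x = np.expand_dims(image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_details[0]['index'], x)
    interpreter.invoke()
    probs = interpreter.get_tensor(output_details[0]['index'])
    correct += int(np.argmax(probs) == label)
print('TFLite test accuracy:', correct / len(test_images))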
