I'm currently working on model optimization with TensorFlow, trying different features (quantization, weight pruning, ...) on some of my company's models.
My problem appears when I convert an .h5 model to .tflite, without any special optimization, using the following code:
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
The model (MobileNet v2) size went from 18.5 MB to 8.9 MB, and accuracy from 99.48% to 98.51%.
I can't explain this change, which also happens with other models, for example this ResNet50:
94.9 MB to 94 MB and 98.51% to 94.51%.
Why does the conversion to .tflite reduce model size and accuracy without any extra optimization?
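For reference, here is a minimal sketch (not from the question) of how the .tflite model's accuracy could be measured with the TFLite interpreter, so that the Keras and TFLite numbers come from exactly the same data; x_test and y_test are placeholders for the evaluation set, with y_test assumed to hold integer class indices:
import numpy as np
import tensorflow as tf

# Load the converted model and look up its input/output tensor indices.
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]['index']
output_index = interpreter.get_output_details()[0]['index']

# Run inference one sample at a time and count correct predictions.
correct = 0
for image, label in zip(x_test, y_test):
    interpreter.set_tensor(input_index, np.expand_dims(image, 0).astype(np.float32))
    interpreter.invoke()
    prediction = np.argmax(interpreter.get_tensor(output_index)[0])
    correct += int(prediction == label)
print('TFLite accuracy:', correct / len(x_test))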
I have a simple PyTorch model which I converted to ONNX and eventually to tflite.
When I load the model and run inference with tf.lite, all goes well.
However, when I try to use tflite_runtime to load the model and run inference, I get the following error:
RuntimeError: external/org_tensorflow/tensorflow/lite/kernels/add.cc:385 Type INT64 is unsupported by op Add. Node number 70 (ADD) failed to invoke.
Here is the conversion code I'm currently using with TF2.6:
converter = tf.lite.TFLiteConverter.from_saved_model(path)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter.allow_custom_ops = True  # if omitted, conversion fails
tflite_rep = converter.convert()
open('exports/deep_snore.tflite', 'wb').write(tflite_rep)
I have checked many TF blog posts but I can't figure out where the issue is.
The only solution I can think of is to rewrite the model directly in TF, retrain it, and convert it to tflite.
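In case it helps with debugging, here is a minimal sketch (an assumption, not part of the original code) that uses the full TF interpreter, which does load the model, to list every tensor whose type is INT64; that usually points at the node feeding the failing ADD (node 70):
import numpy as np
import tensorflow as tf

# Load the converted model and list all tensors with dtype int64.
interpreter = tf.lite.Interpreter(model_path='exports/deep_snore.tflite')
interpreter.allocate_tensors()
for detail in interpreter.get_tensor_details():
    if detail['dtype'] == np.int64:
        print(detail['index'], detail['name'], detail['shape'])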
I am pretty new to deep learning. I have a custom dataset which is quite large. How do I convert the .h5 model to a .tflite model, and how do I generate all the labels without doing it manually?
From the TensorFlow documentation:
Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir) # path to the SavedModel directory
tflite_model = converter.convert()
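A minimal sketch of how this could be adapted to an .h5 file, plus one way to generate a labels file automatically from the training folder names; the paths 'model.h5' and 'dataset/train' are assumptions, not taken from the question:
import os
import tensorflow as tf

# Load the trained Keras model from the .h5 file and convert it to TFLite.
model = tf.keras.models.load_model('model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

# If the dataset has one sub-folder per class, the labels can be written
# out from the folder names instead of typing them by hand.
labels = sorted(os.listdir('dataset/train'))
with open('labels.txt', 'w') as f:
    f.write('\n'.join(labels))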
I'm running into an issue where I convert my Keras model into TensorFlow Lite format, but once I do, the converted model's accuracy drops significantly. The model is a fairly simple natural language processing model. Before conversion the model has an accuracy of around 96%, but once it is converted (without any optimizations) it drops to around 20%. This is a huge drop in performance, so I was wondering: is this something that can happen, or am I doing something wrong here? I am running the tflite model on a BeagleBone SBC running Debian, with the inferences done in Python.
My tflite conversion code:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
My model code:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 128, input_length=maxlen),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
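One way to narrow this down (a sketch under the assumption that x_test is the already-tokenized and padded integer test array) is to feed the exact same inputs to both the Keras model and the TFLite interpreter; if the outputs match, the drop is more likely coming from preprocessing on the BeagleBone than from the conversion itself:
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

# Predictions from the original Keras model.
keras_preds = model.predict(x_test)

# Feed the identical inputs to the TFLite model, one sample at a time.
tflite_preds = []
for sample in x_test:
    batch = np.expand_dims(sample, 0).astype(input_detail['dtype'])
    interpreter.set_tensor(input_detail['index'], batch)
    interpreter.invoke()
    tflite_preds.append(interpreter.get_tensor(output_detail['index'])[0])

print('max difference:', np.max(np.abs(keras_preds - np.array(tflite_preds))))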
I encountered the same problem. I solved it by applying quantization to my already-trained model and then retraining it (quantization-aware training). That reduced the gap significantly, to no more than roughly a 2-10% accuracy difference between Keras and TFLite.
It seems that when a Keras model is converted to TFLite, a sort of quantization is also applied and the float parameters are converted to integers, which results in the accuracy drop. By quantizing the model first, we train it with those integer representations already in place, so it adapts to them. I think this is more or less what happened; correct me if I'm wrong.
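For reference, a minimal sketch of quantization-aware training with the TensorFlow Model Optimization toolkit, assuming a trained Keras model and training data x_train/y_train (the optimizer, loss and epoch count are placeholders):
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Wrap the trained float model so quantization is simulated during training.
q_aware_model = tfmot.quantization.keras.quantize_model(model)
q_aware_model.compile(optimizer='adam',
                      loss='binary_crossentropy',
                      metrics=['accuracy'])

# Fine-tune for a few epochs so the weights adapt to quantization.
q_aware_model.fit(x_train, y_train, epochs=2, validation_split=0.1)

# Convert the quantization-aware model to TFLite with the default optimization.
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()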
References
https://www.tensorflow.org/model_optimization/guide/quantization/training
https://www.tensorflow.org/lite/performance/model_optimization
I'm trying to compile a .py file to a binary that simply reads a model from a JSON file and predicts the output with the loaded model.
The issue is that when I compile the program with PyInstaller, the resulting file is around 290 MB because it bundles the whole TensorFlow package and its unwanted dependencies. It is also very slow to start up, as it has to extract those files first.
As you can see, it is just simple code that runs through a folder of images and classifies each one as either meme or non-meme content, to clean up my WhatsApp folder.
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import model_from_json
import shutil
BATCH_SIZE = 8
SHAPE = (150,150,3)
p = input("Enter path of folder to extract memes from")
p = p.replace("'","")
p = p.replace('"','')
json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
model = model_from_json(loaded_model_json)
model.load_weights('weights.best.hdf5')
test_datagen = ImageDataGenerator(rescale=1./255)
generator = test_datagen.flow_from_directory(
    p,
    target_size=SHAPE[:-1],
    batch_size=BATCH_SIZE,
    class_mode=None,
    shuffle=False)
pred = model.predict_generator(generator)
# print(pred)
# print(model.summary())
pred[pred>0.5] = 1
pred[pred<=0.5] = 0
d = ["garbage","good"]
for i in range(len(pred)):
    filename = generator.filenames[i].split('\\')[1]
    if(pred[i][0] == 0):
        shutil.move(generator.filepaths[i], os.path.join(os.path.join('test', str(int(pred[i][0]))), filename))
So my question is: is there an alternative to model.predict that is much lighter than the one TensorFlow ships, since I do not want to include the whole 600 MB TensorFlow package in my distribution?
Why don't you use quantization on your TensorFlow models? Model size can be reduced by up to 75%. This is the processing that enables TensorFlow Lite to make predictions on pictures in real time, on nothing more than a mobile phone CPU.
Essentially, weights can be converted to types with reduced precision, such as 16-bit floats or 8-bit integers. TensorFlow generally recommends 16-bit floats for GPU acceleration and 8-bit integers for CPU execution. Read the guide here.
Quantization brings improvements via model compression and latency reduction. With the API defaults, the model size shrinks by 4x, and we typically see between 1.5x and 4x improvements in CPU latency in the tested backends. Eventually, latency improvements can also be seen on compatible machine learning accelerators, such as the EdgeTPU and NNAPI.
Post-training quantization includes general techniques to reduce CPU and hardware accelerator latency, processing, power, and model size with little degradation in model accuracy. These techniques can be performed on an already-trained float TensorFlow model and applied during TensorFlow Lite conversion.
Read more about post-training model quantization here.
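As an illustration only (a sketch with assumed file names, reusing the Keras model loaded in the question), the conversion with dynamic range quantization and inference through the much smaller tflite_runtime package could look like this:
import numpy as np
import tensorflow as tf  # only needed on the machine that does the conversion

# Convert the loaded Keras model with post-training dynamic range quantization.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
with open('model.tflite', 'wb') as f:
    f.write(converter.convert())

# On the target machine, only the small tflite_runtime wheel is needed.
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

def predict(image_batch):
    # image_batch: float32 array shaped like the model input, e.g. (1, 150, 150, 3).
    interpreter.set_tensor(input_detail['index'], image_batch.astype(np.float32))
    interpreter.invoke()
    return interpreter.get_tensor(output_detail['index'])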
Is there any way to convert a data-00000-of-00001 checkpoint file to a TensorFlow Lite model?
The file structure is like this:
|-semantic_model.data-00000-of-00001
|-semantic_model.index
|-semantic_model.meta
Using TensorFlow Version: 1.15
The following 2 steps will convert it to a .tflite model.
1. Generate a TensorFlow Model for Inference (a frozen graph .pb file) using the answer posted here
What you currently have is a model checkpoint (a TensorFlow 1 model saved in three files: .data..., .meta and .index; this model can be trained further if needed). You need to convert it to a frozen graph (a TensorFlow 1 model saved in a single .pb file; it cannot be trained further and is optimized for inference/prediction).
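The linked answer is not reproduced here, but a minimal sketch of that freezing step in TF 1.15 could look like the following; 'output_node_name' and the file paths are placeholders you would replace, for example after inspecting the graph in Netron:
import tensorflow as tf  # TF 1.15

with tf.Session() as sess:
    # Restore the checkpoint (.meta holds the graph, .data/.index hold the weights).
    saver = tf.train.import_meta_graph('semantic_model.meta')
    saver.restore(sess, 'semantic_model')

    # Bake the variables into constants, keeping only what the output needs.
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ['output_node_name'])

    with tf.gfile.GFile('frozen_graph.pb', 'wb') as f:
        f.write(frozen_graph_def.SerializeToString())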
2. Generate a TensorFlow lite model ( .tflite file)
A. Initialize the TFLiteConverter: the .from_frozen_graph API can be called as shown below, and the attributes that can be set are listed in the API documentation. To find the names of the input and output arrays, visualize the .pb file in Netron.
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file='....path/to/frozen_graph.pb',
    input_arrays=...,
    output_arrays=....,
    input_shapes={'...' : [_, _,....]}
)
B. Optional: Perform the simplest optimization, known as post-training dynamic range quantization. You can refer to the same document for other types of optimizations/quantization methods.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
C. Convert it to a .tflite file and save it
tflite_model = converter.convert()
tflite_model_size = open('model.tflite', 'wb').write(tflite_model)
print('TFLite Model is %d bytes' % tflite_model_size)
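Optionally, a quick sanity check (a minimal sketch) that the exported file loads and reports the expected input/output shapes:
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
print(interpreter.get_input_details()[0]['shape'])
print(interpreter.get_output_details()[0]['shape'])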