I want to quantization-aware train with my keras model. I have tried like below. I'm using tensorflow 1.14.0
train_graph = tf.Graph()
train_sess = tf.compat.v1.Session(graph=train_graph)
tf.compat.v1.keras.backend.set_session(train_sess)
with train_graph.as_default():
tf.keras.backend.set_learning_phase(1)
model = my_keras_model()
tf.contrib.quantize.create_training_graph(input_graph = train_graph, quant_delay=5)
train_sess.run(tf.global_variables_initializer())
model.compile(...)
model.fit_generator(...)
saver = tf.compat.v1.train.Saver()
saver.save(train_sess, checkpoint_path)
It works without errors.
However, size of saved model(h5 and ckpt) is completely same as the model without quantization.
Is it the right way? How I can check whether it is quantized well?
Or, is there better way to quantize?
When you finish the quantization-aware-training and save your model to disk, it is actually not already quantized. In other words, it is "prepared" for quantization, but the weights are still float32. You have to further convert your model to TFLite for it to actually be quantized. You can do so with the following piece of code:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()
This will quantize your model with int8 weights and uint8 activations.
Have a look at the official example for further reference.
Related
What I did:
convert the .weights to SavedModel (.pb), using tf.keras.models.save()
load the new model file in two ways below:
# 1st way: tf.saved_model.load
model = tf.saved_model.load(model_path, tags=[tag_constants.SERVING])
infer = model.signatures['serving_default']
res = infer(tf.constant(batch_data))
# 2nd way: tf.keras.models.load
model = tf.keras.models.load_model(model_path, compile=False)
res = model.predict(batch_data)
The first runs at 15 FPS, while the second runs at 10 FPS, 1.5x slower...
My ultimate goal is to output both intermediate layer outputs and final predictions. And the only (simple) way to achieve this in TF2 is by tf.keras.Modul(model.inputs, [<layer>.output, model.output]). I need that loaded model to be an keras object to implement this.
So how could I use keras way and maintain the same inference speed?
I'm working with neuronal networks and I need to use a Raspberry Pi v2.
When I want to install tensorflow 2.X it fails, and I just can install tensorflow 1.14. For this reason I found the tflite library, that theoretically helps me, with a lite version of tf.
Here an image that shows I can't install it.
First of all, I convert my keras model (model.h5) into .tflite model.
# Convert the model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
# Save the model.
with open('model.tflite', 'wb') as f:
f.write(tflite_model)
Since here, all OK. The problem is when I want to use this model. With tensorflow I know how to do it,
from tensorflow import keras
def importModel(myPath):
file = open(myPath+'model/model.json', 'r')
model_json = file.read(); file.close()
model = keras.models.model_from_json(model_json)
model.load_weights(myPath+'model/model.h5')
return model, scaler
But I really don't understand how to do it with tflite, can somebody help me, please??
You can find this in the official documentation
import numpy as np
import tensorflow as tf
# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
And if you have trouble to install TensorFlow 2.x on your raspberry it's maybe because you are not using the latest version of Python3
I trained a Pipeline model, which uses CountVectorizer, TfidfTransformer, OneVsRestClassifier and also a GridSearchCV.
Now I want to save it into a tflite file, to use it on my Android app.
For a Sequential model (where my tflite file was created successfully), I did:
sequential_model = Sequential()
...
# train and fit the model
...
h5_file = "h5_model.h5"
tflite_file = "tflite_model.tflite"
sequential_model.save(h5_file)
converter = tf.lite.TFLiteConverter.from_keras_model_file(h5_file)
tflite_model = converter.convert()
open(tflite_file, "wb").write(tflite_model)
All good to save Sequential model into a tflite file.
Well, Pipeline has no attribute "save", unlike a Sequential model, so I tried saving the Pipeline model with joblib and then with pickle, but none of them worked.
Let's say that pipeline_model is my trained model (the one described in the first sentence).
pb_file = 'pipeline_model.pb'
# I also tried with other extensions, like h5, hdf5, sav, pkl
joblib.dump(pipeline_model, filename)
# or with pickle equivalent and pkl extension
# pickle.dump(pipeline_model, open(pb_file, 'wb'))
Now the pb file is created and I want to create a tflite one. Since it's not a Keras model, I can't use from_keras_model_file, so I tried instead with from_saved_model.
pb_file = 'pipeline_model.pb'
tflite_file = "tflite_model.tflite"
converter = tf.lite.TFLiteConverter.from_saved_model(pb_file)
tflite_model = converter.convert()
open(tflite_file, "wb").write(tflite_model)
It generates the error on line of converter = ...:
OSError: SavedModel file does not exist at: pb_file.pb/{saved_model.pbtxt|saved_model.pb}
I tried running it on Kaggle, Colab, PyCharm IDE, with both versions of tensorflow (1 and 2), with different file extensions and nothing seems to work.
I also noticed that TFLiteConverter contains the methods from_frozen_graph and from_session, but these two requires an extra parameter, so I don't think these could be the solution.
So, how can I obtain my tflite file from the trained Pipeline model? Please, if you find any solution, tell me the library versions that you used, since there could be a different behaviour on different libs.
PS. Please dont point me to converting Keras model directly to tflite as my .h5 file would fail to convert to .tflite directly. I somehow managed to convert my .h5 file to .pb
I have followed this Jupyter notebook for face recognition using Keras. I then saved my model to a model.h5 file, then converted it to a frozen graph, model.pb using this.
Now I want to use my tensorflow file in Android. For this I will need to have Tensorflow Lite, which requires me to convert my model into a .tflite format.
For this, I'm trying to follow the official guidelines for it here. As you can see there, it requires input_array and output_array arrays. How do I obtain details of these things from my model.pb file?
input arrays and output arrays are the arrays which store input and output tensors respectively.
They intend to inform the TFLiteConverter about the input and output tensors which will be used at the time of inference.
For a Keras model,
The input tensor is the placeholder tensor of the first layer.
input_tensor = model.layers[0].input
The output tensor may relate with a activation function.
output_tensor = model.layers[ LAST_LAYER_INDEX ].output
For a Frozen Graph,
import tensorflow as tf
gf = tf.GraphDef()
m_file = open('model.pb','rb')
gf.ParseFromString(m_file.read())
We get the names of the nodes,
for n in gf.node:
print( n.name )
To get the tensor,
tensor = n.op
The input tensor may be a placeholder tensor. Output tensor is the tensor which you run using session.run()
For conversion, we get,
input_array =[ input_tensor ]
output_array = [ output_tensor ]
I trained my model with estimator of TensorFlow. It seems that export_savedmodel should be used to make .pb file, but I don't really know how to construct the serving_input_receiver_fn. Anybody any ideas?
Example code is welcomed.
Extra questions:
Is .pb the only file I need when I want to reload the model? Variable unnecessary?
How much will .pb reduced the model file size compared with .ckpt with adam optimizer?
You can use freeze_graph.py to produce a .pb from .ckpt + .pbtxt
if you're using tf.estimator.Estimator, then you'll find these two files in the model_dir
python freeze_graph.py \
--input_graph=graph.pbtxt \
--input_checkpoint=model.ckpt-308 \
--output_graph=output_graph.pb
--output_node_names=<output_node>
Is .pb the only file I need when I want to reload the model? Variable unnecessary?
Yes, You'll have to know you're model's input nodes and output node names too. Then use import_graph_def to load the .pb file and get the input and output operations using get_operation_by_name
How much will .pb reduced the model file size compared with .ckpt with adam optimizer?
A .pb file is not a compressed .ckpt file, so there is no "compression rate".
However, there is a way to optimize your .pb file for inference, and this optimization may reduce the file size as it removes parts of the graph that are training only operations (see the complete description here).
[comment] how can I get the input and output node names?
You can set the input and output node names using the op name parameter.
To list the node names in your .pbtxt file, use the following script.
import tensorflow as tf
from google.protobuf import text_format
with open('graph.pbtxt') as f:
graph_def = text_format.Parse(f.read(), tf.GraphDef())
print [n.name for n in graph_def.node]
[comment] I found that there is a tf.estimator.Estimator.export_savedmodel(), is that the function to store model in .pb directly? And I'm struggling in it's parameter serving_input_receiver_fn. Any ideas?
export_savedmodel() generates a SavedModel which is a universal serialization format for TensorFlow models. It should contain everything's needed to fit with TensorFlow Serving APIs
serving_input_receiver_fn() is a part of those needed things you have to provide in order to generate a SavedModel, it determines the input signature of your model by adding placeholders to the graph.
From the doc
This function has the following purposes:
To add placeholders to the graph that the serving system will feed
with inference requests.
To add any additional ops needed to convert
data from the input format into the feature Tensors expected by the
model.
If you're receiving your inference requests in the form of serialized tf.Examples (which is a typical pattern) then you can use the example provided in the doc.
feature_spec = {'foo': tf.FixedLenFeature(...),
'bar': tf.VarLenFeature(...)}
def serving_input_receiver_fn():
"""An input receiver that expects a serialized tf.Example."""
serialized_tf_example = tf.placeholder(dtype=tf.string,
shape=[default_batch_size],
name='input_example_tensor')
receiver_tensors = {'examples': serialized_tf_example}
features = tf.parse_example(serialized_tf_example, feature_spec)
return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
[comment] Any idea to list the node names in '.pb'?
It depends on how it was generated.
if it's a SavedModel the use:
import tensorflow as tf
with tf.Session() as sess:
meta_graph_def = tf.saved_model.loader.load(
sess,
[tf.saved_model.tag_constants.SERVING],
'./saved_models/1519232535')
print [n.name for n in meta_graph_def.graph_def.node]
if it's a MetaGraph then use:
import tensorflow as tf
from tensorflow.python.platform import gfile
with tf.Session() as sess:
with gfile.FastGFile('model.pb', 'rb') as f:
graph_def = tf.GraphDef()
graph_def.ParseFromString(f.read())
sess.graph.as_default()
tf.import_graph_def(graph_def, name='')
print [n.name for n in graph_def.node]