How to Optimize a Trained Frozen Model for Inference?

How to Optimize a Trained Frozen Model for Inference? - python

My goal is to decrease the size and complexity of a pre-trained Model (a Tensorflow Frozen Graph as Protobuf .pb file) as far as possible to make Inference (in my case realtime object detection using Webcams) as fast as possible.
(See my project repo for more information: https://github.com/GustavZ/realtime_object_detection)
Let's take a look at the pre-trained ssd_mobilenet_v1_coco provided by the tensorflow object detection API:
link to ssd_mobilenet graph
Which Layers are not necessary for inference (so only for the already completed training) and thus can be removed (from the config file to export a new frozen model using the export_inference_graph.py script)?
It would be very nice to get a general answer on how to optimize Models for inference as well as an answer on my special case as I think this could be of interest for others.
EDIT: I know now about the optimize_for_inference.py script provided py tensorflow. But i have no experience using it, for example how do i know which are the really necessary Input and Output nodes, or how do i read them from tensorboard?

Related

What is the difference between freeze_graph and write_graph?

Google says:
But as far as I know if I would like to use tensorflow inference i need protobuf file (.pb) which I can get using freeze_graph method. So what is the difference between those two?

As a heads-up, freeze_graph is generally deprecated in TensorFlow 2.x. You should be using Saved Models for the same functionality in Tensorflow 2.x. I'll be answering this question from the perspective of TensorFlow 1.x.
Before understanding the difference between the two, you need to know how a TensorFlow model is shaped.
Each TensorFlow model is composed of a graph data structure, which contains the Operation objects, which are the units of computation, and the Tensor objects, which are the units of data that flow between them.
However, a graph alone is not enough to do anything like inference. When you train a model, it learns and optimizes a unique set of parameters for the different parts of your graph.
A PB file, then, includes both of these parts - the graph that represents the structure of the model, and the parameters that the model has learned through training.
So back to the original question - what's the difference between write_graph and freeze_graph?
write_graph writes out the graph of the model (the structure) into the PB file.
This does not require any training, so it doesn't include any parameters the model may have learned.
freeze_graph takes the trained parameters of the model from a training checkpoint and saves that as well to the PB file.

Correct pb file to move Tensorflow model into ML.NET

I have a TensorFlow model that I built (a 1D CNN) that I would now like to implement into .NET.
In order to do so I need to know the Input and Output nodes.
When I uploaded the model on Netron I get a different graph depending on my save method and the only one that looks correct comes from an h5 upload. Here is the model.summary():
If I save the model as an h5 model.save("Mn_pb_model.h5") and load that into the Netron to graph it, everything looks correct:
However, ML.NET will not accept h5 format so it needs to be saved as a pb.
In looking through samples of adopting TensorFlow in ML.NET, this sample shows a TensorFlow model that is saved in a similar format to the SavedModel format - recommended by TensorFlow (and also recommended by ML.NET here "Download an unfrozen [SavedModel format] ..."). However when saving and loading the pb file into Netron I get this:
And zoomed in a little further (on the far right side),
As you can see, it looks nothing like it should.
Additionally the input nodes and output nodes are not correct so it will not work for ML.NET (and I think something is wrong).
I am using the recommended way from TensorFlow to determine the Input / Output nodes:
When I try to obtain a frozen graph and load it into Netron, at first it looks correct, but I don't think that it is:
There are four reasons I do not think this is correct.
it is very different from the graph when it was uploaded as an h5 (which looks correct to me).
as you can see from earlier, I am using 1D convolutions throughout and this is showing that it goes to 2D (and remains that way).
this file size is 128MB whereas the one in the TensorFlow to ML.NET example is only 252KB. Even the Inception model is only 56MB.
if I load the Inception model in TensorFlow and save it as an h5, it looks the same as from the ML.NET resource, yet when I save it as a frozen graph it looks different. If I take the same model and save it in the recommended SavedModel format, it shows up all messed up in Netron. Take any model you want and save it in the recommended SavedModel format and you will see for yourself (I've tried it on a lot of different models).
Additionally in looking at the model.summary() of Inception with it's graph, it is similar to its graph in the same way my model.summary() is to the h5 graph.
It seems like there should be an easier way (and a correct way) to save a TensorFlow model so it can be used in ML.NET.
Please show that your suggested solution works: In the answer that you provide, please check that it works (load the pb model [this should also have a Variables folder in order to work for ML.NET] into Netron and show that it is the same as the h5 model, e.g., screenshot it). So that we are all trying the same thing, here is a link to a MNIST ML crash course example. It takes less than 30s to run the program and produces a model called my_model. From here you can save it according to your method and upload it to see the graph on Netron. Here is the h5 model upload:

This answer is made of 3 parts:
going through other programs
NOT going through other programs
Difference between op-level graph and conceptual graph (and why Netron show you different graphs)
1. Going through other programs:
ML.net needs an ONNX model, not a pb file.
There is several ways to convert your model from TensorFlow to an ONNX model you could load in ML.net :
With WinMLTools tools: https://learn.microsoft.com/en-us/windows/ai/windows-ml/convert-model-winmltools
With MMdnn: https://github.com/microsoft/MMdnn
With tf2onnx: https://github.com/onnx/tensorflow-onnx
If trained with Keras, with keras2onnx: https://github.com/onnx/keras-onnx
This SO post could help you too: Load model with ML.NET saved with keras
And here you will find more informations on the h5 and pb files formats, what they contain, etc.: https://www.tensorflow.org/guide/keras/save_and_serialize#weights_only_saving_in_savedmodel_format
2. But you are asking "TensorFlow -> ML.NET without going through other programs":
2.A An overview of the problem:
First, the pl file format you made using the code you provided from seems, from what you say, to not be the same as the one used in the example you mentionned in comment (https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/text-classification-tf)
Could to try to use the pb file that will be generated via tf.saved_model.save ? Is it working ?
A thought about this microsoft blog post:
From this page we can read:
In ML.NET you can load a frozen TensorFlow model .pb file (also called
“frozen graph def” which is essentially a serialized graph_def
protocol buffer written to disk)
and:
That TensorFlow .pb model file that you see in the diagram (and the
labels.txt codes/Ids) is what you create/train in Azure Cognitive
Services Custom Vision then exporte as a frozen TensorFlow model file
to be used by ML.NET C# code.
So, this pb file is a type of file generated from Azure Cognitive Services Custom Vision.
Perharps you could try this way too ?
2.B Now, we'll try to provide the solution:
In fact, in TensorFlow 1.x you could save a frozen graph easily, using freeze_graph.
But TensorFlow 2.x does not support freeze_graph and converter_variables_to_constants.
You could read some usefull informations here too: Tensorflow 2.0 : frozen graph support
Some users are wondering how to do in TF 2.x: how to freeze graph in tensorflow 2.0 (https://github.com/tensorflow/tensorflow/issues/27614)
There are some solutions however to create the pb file you could load in ML.net as you want:
https://leimao.github.io/blog/Save-Load-Inference-From-TF2-Frozen-Graph/
How to save Keras model as frozen graph? (already linked in your question though)
Difference between op-level graph and conceptual graph (and why Netron show you different graphs):
As #mlneural03 said in a comment to you question, Netron shows a different graph depending on what file format you give:
If you load a h5 file, Netron wil display the conceptual graph
If you load a pb file, Netron wil display the op-level graph
What is the difference between a op-level graph and a conceptual graph ?
In TensorFlow, the nodes of the op-level graph represent the operations ("ops"), like tf.add , tf.matmul , tf.linalg.inv, etc.
The conceptual graph will show you your your model's structure.
That's completely different things.
"ops" is an abbreviation for "operations".
Operations are nodes that perform the computations.
So, that's why you get a very large graph with a lot of nodes when you load the pb fil in Netron: you see all the computation nodes of the graph.
but when you load the h5 file in Netron, you "just" see your model's tructure, the design of your model.
In TensorFlow, you can view your graph with TensorBoard:
By default, TensorBoard displays the op-level graph.
To view the coneptual graph, in TensorBoard, select the "keras" tag.
There is a Jupyter Notebook that explains very clearly the difference between the op-level graph and the coneptual graph here: https://colab.research.google.com/github/tensorflow/tensorboard/blob/master/docs/graphs.ipynb
You can also read this "issue" on the TensorFlow Github too, related to your question: https://github.com/tensorflow/tensorflow/issues/39699
In a nutshell:
In fact there is no problem, just a little misunderstanding (and that's OK, we can't know everything).
You would like to see the same graphs when loading the h5 file and the pb file in Netron, but it has to be unsuccessful, because the files does not contains the same graphs. These graphs are two ways of displaying the same model.
The pb file created with the method we described will be the correct pb file to load whith ML.NET, as described in the Microsoft's tutorial we talked about. SO, if you load you correct pb file as described in these tutorials, you wil load your real/true model.

Tensorflow Object Detection API Untrained Faster-RCNN Model

I am currently trying to build an Object Detector using the the Tensorflow Object Detection API with python. I have managed to retrain the faster-rcnn model by following the instructions posted here and here
However, training time is considerably long as I understand that I am. I understand that I am using transfer learning as opposed to training a faster-rcnn model from scratch. I am wondering if there is anyway to download an untrained faster-rcnn model and train it from scratch (end-to-end) instead of having to recourse to transfer-learning.
I am familiar with the advantages of transfer learning, however, my object detector is aimed at being quickly trainable, narrow in scope, and trained on letters as opposed to objects, so I do not think transfer learning is the best route.
I beleive solving this will have something to do with the pipeline.config file, particulary in this part:
fine_tune_checkpoint: "PATH/TO/PRETRAINED/model.ckpt"
from_detection_checkpoint: true
num_steps: 200000
But I am not sure how to specify that there is no fine_tune_checkpoint

To train your own model from scratch do the following:
Comment out the following lines
# fine_tune_checkpoint: <YOUR PATH>
# from_detection_checkpoint: true
Remove your downloaded pretrained model or rename its path in case you followed the tutorial.
You don't have to download an "empty" model. Instead you can specify your own weight initialization in the config file, e.g., as done here: How to initialize weight for convolution layers in Tensorflow Object Detection API?

Improve accuracy with Tensorflow Object detection pretrained model

I am working on building an object detection model which I would like to create with 22 new classes (most of them are not in COCO or PETS datasets)
What I've already done is:
Prepared images with multiple labels using LabelIMG.
Decrease image size in 2 for images that are bigger than 500k
Convert XML to CSV file
Convert CSV and images to TFRecord
Using the Tensorflow sample config files I've trained with several pretrained checkpoints.
Results: SSD_Mobilenet and SSD_Inception resulted in no classes
found(loss ~10.0) while faster RCNN Inception did succeed to detect some of the
objects(loss ~0.7).
My questions are:
What is the difference between train.py from Object detection, which I used in the above, to retrain.py from image_retraining to train_image_classifier.py from Slim?
Which is better for my task? Or should I do it in a different way?
While running the train.py on FRCNN inception I found that the loss was around 0.7 and not going lower even after 100k steps. Is there any goal in terms of loss to achieve?
How do you suggest to change the config file to improve this?
I found other models for instance Inception V4 etc... which doesn't have sample config files - TF slim. Should I try them and if so how can I use them?
I am pretty new in this field and I need some support in understanding the terms and actions.
BTW: I am using GTX 1060 (GPU) for training but eval is not working in parallel so I can't get the mAP for validation. I tried to force eval for CPU but with no success.
Thanks.

1) What is the difference between train.py from Object detection, which I used in the above, to retrain.py from image_retraining to train_image_classifier.py from Slim
Ans : To what i know, none. Because train.py imports trainer.py which imports slim.learning.train(the same class which is used in train_image_classifier.py) to train.
2) Which is better for my task? Or should I do it in a different way?
Ans: The above answer answers this question too.
3) While running the train.py on FRCNN inception I found that the loss was around 0.7 and not going lower even after 100k steps. Is there any goal in terms of loss to achieve?
Ans: If you use tensorboard to visualize your results, you will find that when your classification loss graph is not changing a lot(has converged), your model is trained. Regarding the loss of 0.7, thats high after training for so many steps. Just check your pipeline config file parameters.
4) How you suggest to change the config file to improve this?
Ans: Learning rate value can be a good start
5) I found other models for instance Inception V4 etc... which doesn't have sample config files - TF slim ? Should I try them and if som how can I use them?
Ans: currently, I dont have an answer for this. but will get back to you.

(Not a complete answer, but I hope it helps in some way!)
Are your annotated objects small relative to the image size?
I had the same problems with no or few detections with SSD and found that model is very sensitive to the setting which determines the size of the box proposals (anchor generator). Here is a link with some details
Further, having an active eval job running is very important when debugging and tuning a model. TotalLoss or any of the parameters returned from the train job does not inform you of the performance of the actual model, only whether it is converging. The eval job gives you e.g. mAP which is a real measure of performance.
A simple way to force an eval job on cpu is by doing the following:
a) install a virtual environment dedicated for the eval job, instructions here
b) activate the virtual environment and install tensorflow cpu in the virtual environment (yes, you should install tensorflow again, and without gpu support)
c) start the train job as usual on your tensorflow-gpu (in whatever way you have installed it)
d) run the eval job in the virtual environment (this will force it to run on the cpu and works great! I also run tensorboard from this installation to minimise risk of interference with the train job)

Retrain is used to add a level in top of pretrained model... You can win time like this.. Useful for thousand of picture, useless for million labelised picture... Less efficient than train from skratch. There is template for config file. If thereis not config file create your own.. Look at tensorflow github explainations...

TensorFlow restore/deploy network without the model?

I've built and trained some networks with TensorFlow and successfully managed to save and restore the model's parameters.
However, for some scenarios - e.g. like deploying a trained network in a customer's infrastructure - it is not the best solution to ship the full code/model. Thus, I am wondering if there is any way to restore/run a trained network without the original code/model used for training?
I guess this leads to the question if TensorFlow is able to save a (compressed?) version of the network architecture into the checkpoint files in addition to the weights of the variables.
Is this somehow possible?

If you really need to restore just from the graphdef file (*.pb), to load it from another application for instance, you will need to use the freeze_graph.py script from here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py
This script takes a graphdef (.pb) and a checkpoint (.ckpt) file as input and outputs a graphdef file which contains the weights in the form of constants (you can read the docs on the script for more details).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.