Performance of Tensorflow vs Tensorflow Lite - python

Is there a performance loss when converting TensorFlow models to the TensorFlow Lite format?
I ask because I got these results from different edge devices:
Does it make sense that the Nvidia Jetson reaches higher accuracy with the TensorFlow model (TensorRT optimized) than the Raspberry Pi does with the TensorFlow Lite model?

Normally there is a performance loss, but not a significant one: roughly a 3% drop in accuracy for some models. You still have to test it yourself to check the accuracy for your particular model.
Models converted with TensorRT and models converted with TensorFlow Lite do not go through the same conversion steps (otherwise they would produce identical results), so a difference between the two is to be expected.
To conclude: the gain in speed usually matters more than the (at most ~3%) loss in accuracy, but every such assumption should be verified with your own tests.
This article is also a good read: https://www.hackster.io/news/benchmarking-tensorflow-and-tensorflow-lite-on-the-raspberry-pi-43f51b796796
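If you want to quantify the drop for your own model, a minimal sketch of the usual workflow is below: convert the SavedModel to TensorFlow Lite and evaluate both versions on the same test set. This assumes a classification model; the model path and the x_test/y_test arrays are placeholders for your own data.
import numpy as np
import tensorflow as tf
# Convert a SavedModel to TensorFlow Lite (directory name is a placeholder).
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
tflite_model = converter.convert()
# Run the converted model with the TFLite interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]
def tflite_predict(sample):
    # sample: one input with a leading batch dimension of 1
    interpreter.set_tensor(input_details["index"], sample.astype(np.float32))
    interpreter.invoke()
    return interpreter.get_tensor(output_details["index"])
# Compare accuracy on the same test set (x_test, y_test are your own data,
# with y_test holding integer class labels).
keras_model = tf.keras.models.load_model("saved_model_dir")
tf_acc = np.mean(np.argmax(keras_model.predict(x_test), axis=1) == y_test)
lite_acc = np.mean([np.argmax(tflite_predict(x[None, ...])) == y
                    for x, y in zip(x_test, y_test)])
print("TF accuracy: %.4f  TFLite accuracy: %.4f" % (tf_acc, lite_acc))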

Related

Do I need gpu while working with pretrained model?

I am already using Google Colab to train my model, so I will not use my own GPU for training. I want to ask: is there a performance difference between GPU and CPU when working with a pre-trained model? I already trained a model on a Google Colab GPU and then used it on my own local CPU. Should I use a GPU for testing?
It depends on how many predictions you need to make. Training involves a huge number of calculations, so GPU parallelisation shortens the overall training time. When using a trained model, you typically only need occasional predictions, and a CPU is fine for that. However, if you need to make as many predictions as during training, a GPU would be beneficial. This is particularly true in reinforcement learning, where the model must adapt to continuously changing environmental input.
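A rough way to decide for your own model is simply to time inference on both devices. A minimal sketch using tf.device follows; the model path and input shape are placeholders.
import time
import numpy as np
import tensorflow as tf
model = tf.keras.models.load_model("my_model")              # placeholder path
batch = np.random.rand(32, 224, 224, 3).astype(np.float32)  # placeholder input shape
devices = ["/CPU:0"] + (["/GPU:0"] if tf.config.list_physical_devices("GPU") else [])
for device in devices:
    with tf.device(device):
        model.predict(batch)                                # warm-up run
        start = time.time()
        for _ in range(20):
            model.predict(batch)
        print(device, "%.4f s per batch" % ((time.time() - start) / 20))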

Neural network inference with one image: Why is GPU utilization not 100%?

First of all: this question is connected to neural network inference and not training.
I have discovered that when running inference with a trained neural network on a single image, over and over, on a GPU (e.g. a P100), TensorFlow does not reach 100% utilization of the compute capacity, but only around 70%. This is also the case when the image does not have to be transferred to the GPU, so the issue must be connected to constraints on the parallelization of the calculations. My best guesses for the reasons are:
TensorFlow can only exploit the parallelization capabilities of a GPU up to a certain level (the higher utilization of the same model when run as a TensorRT model also suggests this). In this case, the question is: what is the reason for that?
The inherent neural network structure, with several sequential layers, prevents higher usage. In that case the problem is not framework overhead but lies in the general design of neural networks. Then the question is: what are the restrictions that cause this?
Both of the above combined.
Thanks for your ideas on the issue!
Why do you expect GPU utilization to reach 100% when you run the neural network prediction for a single image?
GPU utilization is measured per time unit (e.g. one second). If the neural network finishes before that time unit has elapsed (e.g. within 0.5 s), the GPU may spend the rest of the time serving other programs or sitting idle. If no other program uses it during that remaining time, the reported utilization will not reach 100%.
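One way to check whether the ~70% figure is a real parallelization limit or simply idle time between short kernels is to increase the amount of work per step, e.g. by batching copies of the same image and watching utilization (for instance in nvidia-smi) as the batch grows. A minimal sketch, assuming a Keras model and a preprocessed image array (path and shape are placeholders):
import time
import numpy as np
import tensorflow as tf
model = tf.keras.models.load_model("my_model")             # placeholder path
image = np.random.rand(1, 224, 224, 3).astype(np.float32)  # placeholder input
for batch_size in (1, 8, 32, 128):
    batch = np.repeat(image, batch_size, axis=0)  # same image repeated batch_size times
    model.predict(batch)                          # warm-up
    start = time.time()
    for _ in range(50):
        model.predict(batch)
    # Watch nvidia-smi in a second terminal: utilization typically rises with batch size.
    print("batch %4d: %.4f s per step" % (batch_size, (time.time() - start) / 50))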

how to speedup tensorflow RNN inference time

We've trained a tf-seq2seq model for question answering. The main framework is from google/seq2seq. We use a bidirectional RNN (GRU encoders/decoders with 128 units) with a soft attention mechanism.
We limit the maximum length to 100 words; it mostly generates only 10~20 words.
For model inference, we try two cases:
normal (greedy decoding): inference time is about 40 ms~100 ms
beam search with beam width 5: inference time is about 400 ms~1000 ms
We want to try beam width 3; the time may decrease, but it may also affect the quality of the results.
So, are there any suggestions to decrease the inference time in our case? Thanks.
You can do network compression.
Cut the sentences into pieces with byte-pair encoding or a unigram language model etc., and then try a TreeLSTM.
Try a faster softmax such as adaptive softmax (https://arxiv.org/pdf/1609.04309.pdf).
Try cuDNN-backed RNN cells (cudnnLSTM/cudnnGRU); see the sketch after this list.
Try a dilated RNN.
Switch to a CNN, e.g. a dilated CNN, or to BERT for better parallelization and more efficient GPU support.
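As an illustration of the cuDNN suggestion: in TF 2.x the standard Keras GRU/LSTM layers already dispatch to the fused cuDNN kernel when run on a GPU with default activations, no recurrent dropout and no unrolling. A minimal sketch of an encoder matching the 128-unit bidirectional GRU above (the vocabulary size is a placeholder):
import tensorflow as tf
# With the default activation/recurrent_activation, recurrent_dropout=0 and
# unroll=False, tf.keras.layers.GRU selects the fused cuDNN implementation
# on a GPU, which is usually much faster than the generic implementation.
encoder = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=30000, output_dim=128),  # placeholder vocab size
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(128, return_sequences=True)),
])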
If you need better performance, I'd suggest using OpenVINO. It reduces inference time by pruning the graph and fusing some operations. Although OpenVINO is optimized for Intel hardware, it should work with any CPU.
Here are some performance benchmarks for an NLP model (BERT) on various CPUs.
It's rather straightforward to convert the TensorFlow model to OpenVINO unless you have fancy custom layers. The full tutorial on how to do it can be found here. Some snippets are below.
Install OpenVINO
The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.
pip install openvino-dev[tensorflow2]
Use Model Optimizer to convert SavedModel model
The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the TensorFlow model to IR, the default format for OpenVINO. You can also try FP16 precision, which should give you better performance without a significant accuracy drop (just change data_type). Run in the command line:
mo --saved_model_dir "model" --input_shape "[1, 3, 224, 224]" --data_type FP32 --output_dir "model_ir"
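For example, the FP16 variant mentioned above only differs in the data_type flag:
mo --saved_model_dir "model" --input_shape "[1, 3, 224, 224]" --data_type FP16 --output_dir "model_ir"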
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device, e.g. CPU or GPU (the graphics integrated into your CPU, like Intel HD Graphics). If you don't know what the best choice for you is, just use AUTO.
# Load the network (Core is provided by the OpenVINO runtime)
from openvino.runtime import Core
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")
# Get the output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
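For completeness, input_image above is assumed to be an array already matching the shape given to the Model Optimizer ([1, 3, 224, 224]); a minimal sketch of preparing it with OpenCV (the image path is a placeholder):
import cv2
import numpy as np
image = cv2.imread("image.jpg")                             # placeholder path
image = cv2.resize(image, (224, 224))                       # match the --input_shape above
image = image.transpose(2, 0, 1)                            # HWC -> CHW
input_image = np.expand_dims(image, 0).astype(np.float32)   # add batch dim -> [1, 3, 224, 224]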
Disclaimer: I work on OpenVINO.

Keras model behaves very differently between two computers

I'm running a Keras model for binary classification on two separate computers: the first runs Python 2.7.5 with TensorFlow 1.0.1 and Keras 2.0.2 on the CPU; the second runs Python 2.7.5 with TensorFlow 1.2.1 and Keras 2.0.6 and uses the GPU.
My model is modified from the siamese network model at https://gist.github.com/mmmikael/0a3d4fae965bdbec1f9d. I added regularization (activity_regularizer=keras.regularizers.l1 in the Dense layers), but for everything else I'm using the same structure as the mnist example.
I use the exact same code and training data on both computers, but the first one gives me 86% classification accuracy and 88% recall on the test set, while the other gives 52% accuracy and 100% recall (it classifies every test sample as "positive"). These results are reproducible across separate initializations.
I'm not even sure where to start looking for why the performance is so vastly different. I've been reading through the keras/tensorflow release notes to see if any of the changes pertain to something in my model, but I don't see anything that looks helpful. It doesn't make sense that a version change in tensorflow/keras would cause that much of a difference in the performance. Any sort of help figuring this out would be greatly appreciated.

Cannot train a CNN (or other neural network) to fit (or even over-fit) on Keras + Theano

For learning purposes, I am trying to implement a CNN from scratch, but the results do not seem to improve from random guessing. I know this is not the best approach on home hardware, and following course.fast.ai I have obtained much better results via transfer learning, but for a deeper understanding I would like to see, at least in theory, how one could do it otherwise.
Testing on CIFAR-10 posed no issues - a small CNN trained from scratch in a matter of minutes with an error of less than 0.5%.
However, when testing against the Cats vs. Dogs Kaggle dataset, the results did not budge from 50% accuracy. The architecture is basically a copy of AlexNet, including the non-state-of-the-art choices (large filters, histogram equalization, Nesterov SGD optimizer). For more details, I put the code in a notebook on GitHub:
https://github.com/mspinaci/deep-learning-examples/blob/master/dogs_vs_cats_with_AlexNet.ipynb
(I also tried different architectures, more VGG-like and using Adam optimizer, but the result was the same; the reason why I followed the structure above was to match as closely as possible the Caffe procedure described here:
https://github.com/adilmoujahid/deeplearning-cats-dogs-tutorial
and that seems to converge quickly enough, according to the author's description here: http://adilmoujahid.com/posts/2016/06/introduction-deep-learning-python-caffe/).
I was expecting some fitting to happen quickly, possibly flattening out due to the many suboptimal choices made (e.g. small dataset, no data augmentation). Instead, I saw no improvement at all, as the notebook shows.
So I thought that maybe I was simply overestimating my GPU and my patience, and that the model was too complicated to even overfit my data in a few hours (I ran 70 epochs, each with roughly 360 batches of 64 images). Therefore I tried to overfit as hard as I could, running these other models:
https://github.com/mspinaci/deep-learning-examples/blob/master/Ridiculously%20overfitting%20models...%20or%20maybe%20not.ipynb
The purely linear model started showing some overfitting, around 53.5% training accuracy vs. 52% validation accuracy (which I guess is thus my best result). That matched my expectations. However, to try to overfit as hard as I could, the second model is a simple two-layer feedforward neural network without any regularization, trained on just 2000 images with batch sizes up to 500. I was expecting the NN to overfit wildly, quickly reaching 100% training accuracy (after all, it has 77M parameters for 2k pictures!). Instead, nothing happened, and the accuracy flattened out at 50% fairly quickly.
Any tip about why none of the "multi-layer" models seems able to pick up any feature (be it "true" or due to overfitting) would be very much appreciated!
Note on versions etc.: the notebooks were run on Python 2.7, Keras 2.0.8, Theano 0.9.0. The OS is Windows 10, and the GPU is a GeForce GTX 960M, which is not particularly powerful but should be sufficient for basic tasks.
