How to deploy and serve predictions using TensorFlow from an API? - python

From the Google tutorial we know how to train a model in TensorFlow. But what is the best way to save a trained model and then serve predictions through a basic, minimal Python API on a production server?
My question is essentially about TensorFlow best practices for saving the model and serving predictions on a live server without compromising speed or running into memory issues, since the API server will be running in the background indefinitely.
A small snippet of Python code would be appreciated.

TensorFlow Serving is a high-performance, open-source serving system for machine learning models, designed for production environments and optimized for TensorFlow. The initial release contains a C++ server and Python client examples based on gRPC.
To get started quickly, check out the tutorial.
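For a minimal end-to-end sketch (assuming TF 2.x with Keras; the model name, export path, input shape, and ports are placeholders, and TensorFlow Serving is assumed to be already running, e.g. from the tensorflow/serving Docker image):

```python
import json

import requests
import tensorflow as tf

# Stand-in for a real trained model: a tiny Keras network.
model = tf.keras.Sequential([tf.keras.Input(shape=(3,)), tf.keras.layers.Dense(1)])

# Export in the SavedModel format. TensorFlow Serving expects a numeric
# version sub-directory under the model's base directory.
tf.saved_model.save(model, "/models/my_model/1")

# With TensorFlow Serving pointed at /models/my_model, predictions can be
# requested over its REST API (port 8501 by default):
payload = {"instances": [[1.0, 2.0, 5.0]]}  # shape must match the model input
resp = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",
    data=json.dumps(payload),
)
print(resp.json()["predictions"])
```

The model server process stays resident and keeps the model loaded, so there is no per-request loading cost; your API layer only forwards requests to it.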

Related

How to run your own Python code on Amazon SageMaker

I have Python code which uses Keras with the TensorFlow backend. My system can't train this model due to limited memory, so I want to make use of Amazon SageMaker.
However, all the tutorials I find are about deploying an already-trained model in Docker containers. My model isn't trained yet, and I want to train it on Amazon SageMaker.
Is there a way to do this?
EDIT: Also, can I turn my Python code into a script and run it on AWS SageMaker?
SageMaker lets users bring their own custom training scripts and train their algorithms on SageMaker using one of the pre-built containers for frameworks like TensorFlow, MXNet, and PyTorch.
Please take a look at https://github.com/aws/amazon-sagemaker-examples/blob/master/frameworks/tensorflow/get_started_mnist_train.ipynb
It walks through how to bring in your TensorFlow training script and train it on SageMaker.
There are several other examples in the repository which will help you answer other questions you might have as you progress on with your SageMaker journey.
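In script mode this looks roughly like the sketch below (assuming the SageMaker Python SDK v2; the entry-point script, IAM role ARN, S3 path, instance type, and framework versions are all placeholders to adapt to your setup):

```python
from sagemaker.tensorflow import TensorFlow

# Your existing Keras/TensorFlow training code goes into train.py,
# which SageMaker runs inside its pre-built TensorFlow container.
estimator = TensorFlow(
    entry_point="train.py",                                   # placeholder script
    role="arn:aws:iam::111122223333:role/MySageMakerRole",    # placeholder role
    instance_count=1,
    instance_type="ml.m5.xlarge",                             # pick a larger/GPU type if needed
    framework_version="2.4.1",
    py_version="py37",
)

# The dict keys become input channels; data is pulled from S3 onto the
# training instance before your script starts.
estimator.fit({"training": "s3://my-bucket/training-data"})
```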

TF object detection API deployment on Django

I am currently trying to serve my model in a web app built with Django, however I am facing some difficulties.
1. I have successfully trained my model and I have the precious frozen_inference_graph.
2. I am creating the web app in Django.
I would like to call my model directly from my web app folder. However, when I use the inference method from my web app folder, no inference is done: there is no bug, the script simply does nothing. From my TF folder, the same script does the proper inference.
Do you have any clues? Also, I have not found any tutorial for serving TensorFlow on Django; are you aware of any?
Thanks,
Have a nice day!
You could serve your model with TensorFlow Serving.
TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.
https://www.tensorflow.org/tfx/guide/serving
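A sketch of what calling it from a Django view could look like (assuming TensorFlow Serving runs alongside the app with its REST API on the default port 8501; the model name, endpoint, and payload format are placeholders):

```python
import json

import requests
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

# Placeholder endpoint; TensorFlow Serving's REST API listens on 8501 by default.
TF_SERVING_URL = "http://localhost:8501/v1/models/detector:predict"

@csrf_exempt
def predict(request):
    # Keep the heavy lifting inside TensorFlow Serving; the Django process
    # only forwards the payload and relays the result back to the client.
    payload = {"instances": json.loads(request.body)["instances"]}
    resp = requests.post(TF_SERVING_URL, data=json.dumps(payload), timeout=10)
    return JsonResponse(resp.json())
```

This also sidesteps the working-directory issue: the model path is configured once in the model server, not resolved relative to whichever folder Django happens to run from.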
good luck!

Is it possible to make predictions with a Keras/TensorFlow model without downloading TensorFlow and all its dependencies?

I'm trying to use a custom Keras model I trained with tensorflow-gpu on my desktop with Python on a mobile phone (Android), and I need to run it with Python on the phone as well. I looked up TensorFlow Lite, however that appears to be written for Java.
Is there any lite (Python) version of TensorFlow, some kind of barebones package that's just set up for making predictions from a TensorFlow/Keras model file? I'm trying to focus on saving space, so a solution under 50 MB would be desired.
Thanks
TensorFlow Serving was built for the specific purpose of serving pre-trained models. I'm not sure whether it runs on Android (or how difficult it would be to make it run there), or what its compiled footprint is and whether it comes in under 50 MB. If you can make it work, please do report back here!
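For what it's worth, the TensorFlow Lite route mentioned in the question also has a Python path: the standalone tflite-runtime package exposes the interpreter without pulling in full TensorFlow. A sketch, assuming the Keras model has already been converted to a .tflite file (the file name is a placeholder):

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter  # pip install tflite-runtime

# Load a Keras model previously converted to the .tflite format
# (placeholder file name); tf.lite.Interpreter in full TensorFlow
# exposes the same interface.
interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input matching the model's expected shape and dtype.
data = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], data)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]["index"]))
```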

Compiling model as executable for faster inference?

Is there a way to compile the entire Python script together with my trained model for faster inference? It seems like loading the Python interpreter, all of TensorFlow, NumPy, etc. takes a non-trivial amount of time. When this has to happen on a server responding to a non-trivial frequency of requests, it seems slow.
Edit
I know I can use TensorFlow Serving, but I don't want to because of the costs associated with it.
How do you set up your server? If you are setting it up with a Python framework like Django, Flask, or Tornado, you just need to preload your model, keep it as a global variable, and then use that global variable to predict (a sketch follows below).
If you are using some other server, you can also wrap the entire Python script you use for prediction as a local server of its own, and pass requests and responses between the Python server and the web server.
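A sketch of the preloading approach with Flask (the model path, endpoint, and input handling are placeholders):

```python
import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the model once at startup; every request reuses this global object,
# so the interpreter, TensorFlow, and the weights stay resident in memory.
MODEL = tf.keras.models.load_model("/models/my_model")  # placeholder path

@app.route("/predict", methods=["POST"])
def predict():
    instances = np.array(request.get_json()["instances"], dtype=np.float32)
    predictions = MODEL.predict(instances)
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    # For production, run behind a WSGI server such as gunicorn instead.
    app.run(host="0.0.0.0", port=5000)
```

The startup cost (interpreter, TensorFlow, model load) is then paid once per worker process rather than once per request.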
Do you want to only serve the TensorFlow model, or are you doing any work outside of TensorFlow?
For just the TensorFlow model, you could use TensorFlow Serving. If you are comfortable with gRPC, this will serve you quite well.
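A sketch of a gRPC client against TensorFlow Serving (assuming the tensorflow-serving-api package and the default gRPC port 8500; the model name, signature name, input key, and input shape are placeholders that depend on your exported model):

```python
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Requires the tensorflow-serving-api package and a running model server.
channel = grpc.insecure_channel("localhost:8500")  # default gRPC port
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"                  # placeholder model name
request.model_spec.signature_name = "serving_default"
request.inputs["input"].CopyFrom(                     # input key depends on the model
    tf.make_tensor_proto(np.zeros((1, 3), dtype=np.float32))
)

response = stub.Predict(request, timeout=10.0)
print(response.outputs)
```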

TensorFlow in production for real time predictions in high traffic app - how to use?

What is the right way to use TensorFlow for real-time predictions in a high-traffic application?
Ideally I would have a server/cluster running TensorFlow and listening on a port (or ports), where I can connect from app servers and get predictions in a similar way to how databases are used.
Training would be done by cron jobs feeding the training data through the network to the same server/cluster.
How does one actually use TensorFlow in production? Should I build a setup where Python runs as a server and use Python scripts to get predictions? I'm still new to this, but I feel that such a script would need to open sessions etc., which is not scalable. (I'm talking about hundreds of predictions per second.)
Any pointers to relevant information will be highly appreciated; I could not find any.
This morning, our colleagues released TensorFlow Serving on GitHub, which addresses some of the use cases that you mentioned. It is a distributed wrapper for TensorFlow that is designed to support high-performance serving of multiple models. It supports both bulk processing and interactive requests from app servers.
For more information, see the basic and advanced tutorials.
