Compiling model as executable for faster inference?

Is there a way to compile the entire Python script with my trained model for faster inference? Loading the Python interpreter, all of TensorFlow, NumPy, etc. takes a non-trivial amount of time, and when that cost is paid on a server handling a steady stream of requests, it adds up.
Edit
I know I can use TensorFlow Serving, but I don't want to because of the costs associated with it.

How do you set up the server? If you are setting it up with a Python framework like Django, Flask, or Tornado, you just need to preload your model and keep it as a global variable, then use that global variable to predict.
If you are using some other server, you can also turn the prediction script itself into a local server, and translate requests and responses between the Python server and the web server.
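A minimal sketch of that preloaded-global-model pattern with Flask; the model path, route, and JSON payload shape here are assumptions:

```python
from flask import Flask, jsonify, request
import tensorflow as tf

app = Flask(__name__)

# Loaded once at startup; every request reuses the same object instead of
# paying the import/load cost again. "model.h5" is a placeholder path.
model = tf.keras.models.load_model("model.h5")

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"inputs": [[...feature values...]]}
    inputs = request.get_json()["inputs"]
    preds = model.predict(inputs)
    return jsonify(predictions=preds.tolist())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```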

Do you want to only serve the TensorFlow model, or are you doing any work outside of TensorFlow?
For just the TensorFlow model, you could use TensorFlow Serving. If you are comfortable with gRPC, this will serve you quite well.
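For reference, a hedged sketch of what a gRPC client for TensorFlow Serving can look like, using the tensorflow-serving-api package; the host/port, model name, and input tensor name are assumptions:

```python
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Assumes a TensorFlow Serving instance on localhost:8500 serving a model
# exported under the name "my_model" with an input tensor named "input".
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"
request.inputs["input"].CopyFrom(
    tf.make_tensor_proto(np.random.rand(1, 4).astype(np.float32)))

response = stub.Predict(request, timeout=5.0)
print(response.outputs)
```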

Related

How to use TensorFlow in a local Python project

I have a model created in TensorFlow that is already trained and accurate. How do I get it to run in a project? I can load the model, but I can't figure out how to feed it a single image that my software is generating.
Also, if this was a transfer-learning project, do I have to create the model before loading the weights?
All the tutorials cover setting it up in the cloud or with a local server, which I would like to avoid. I am tempted to save the data and then run on it, but that is a lot slower.
Addition:
The environment I am building this for is a Google Colab Jupyter notebook. The idea is to require no installation for users, which is why it must be self-contained.
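For what it's worth, a minimal sketch of both points; the file name and input shape are assumptions. A model saved whole with model.save() reloads in one call, while a weights-only checkpoint (common in transfer learning) requires rebuilding the architecture first; either way, a single image needs a leading batch dimension:

```python
import numpy as np
import tensorflow as tf

# If the whole model was saved with model.save(), one call restores it.
# If only weights were saved, rebuild the architecture first and call
# model.load_weights() instead.
model = tf.keras.models.load_model("my_model.h5")

# A single image must still be passed as a batch of one:
# (1, height, width, channels).
image = np.zeros((224, 224, 3), dtype=np.float32)  # stand-in for the generated image
prediction = model.predict(image[np.newaxis, ...])
print(prediction)
```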

Is it possible to make predictions with a Keras/TensorFlow model without downloading TensorFlow and all its dependencies?

I'm trying to use a custom Keras model that I trained with tensorflow-gpu on my desktop, but I need to run it with Python on a mobile phone (Android) as well. I looked up TensorFlow Lite, but that appears to be aimed at Java.
Is there any lite (Python) version of TensorFlow, some kind of barebones package that's just set up for making predictions from a TensorFlow/Keras model file? I'm trying to save space, so a solution under 50 MB would be ideal.
Thanks
TensorFlow Serving was built for the specific purpose of serving pre-trained models. I'm not sure whether it runs on Android (or how difficult it would be to make it run there), or what its compiled footprint is and whether that is under 50 MB. If you can make it work, please do report back here!
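On the TensorFlow Lite point specifically: TensorFlow Lite also ships a standalone Python interpreter package (tflite-runtime), which is far smaller than the full TensorFlow wheel. A minimal inference sketch, assuming the Keras model has already been converted to a .tflite file:

```python
import numpy as np
import tflite_runtime.interpreter as tflite  # pip install tflite-runtime

interpreter = tflite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input shaped to whatever the converted model expects.
input_data = np.zeros(input_details[0]["shape"], dtype=np.float32)
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
print(prediction)
```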

Speed up Python startup or connect it with VB.NET

I've got the following setup:
A VB.NET Web-Service is running, and it needs to regularly call a Python script with a machine learning model to predict some values. To do this, my Web-Service generates a file with the input for Python and runs the Python script as a subprocess. The script makes its predictions and returns them, via standard output, back to the Web-Service.
The problem is that the script requires a few seconds to import all the machine learning libraries and load the saved model from disk, which is much longer than the actual prediction takes. During this time the Web-Service is blocked by the running subprocess. I have to reduce this time drastically.
What I need is a solution that either:
1. Improves library and model loading time, or
2. Keeps Python running all the time, with the imports done and the ML model loaded, and lets the Python script communicate with the VB.NET Web-Service.
Not sure I understood the question, but here are some things I can think of.
If this is a network issue, you could have Python compress the code before sending it over the web.
Or, if you're able to, use multithreading while reading the file from the web.
Maybe you should post some more code so we can help you better.
I've found what I needed.
I used web.py to turn the Python script into a Web-Service, and now the VB.NET and Python Web-Services can communicate with each other. Python runs all the time, so there is no delay for loading libraries and data each time a calculation has to be done.
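A hedged sketch of that web.py setup; the pickle file, feature format, and port are all assumptions, and the model object stands in for whatever was previously loaded inside the subprocess:

```python
import json
import pickle

import web  # pip install web.py

urls = ("/predict", "Predict")

# The expensive part happens exactly once, when the process starts.
with open("model.pkl", "rb") as f:  # placeholder path for the saved model
    model = pickle.load(f)

class Predict:
    def POST(self):
        features = json.loads(web.data())["features"]
        prediction = model.predict([features])  # assumes a scikit-learn-style API
        return json.dumps({"prediction": prediction[0].tolist()})

if __name__ == "__main__":
    web.application(urls, globals()).run()  # listens on port 8080 by default
```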

TensorFlow in production for real-time predictions in a high-traffic app - how to use?

What is the right way to use TensorFlow for real-time predictions in a high-traffic application?
Ideally I would have a server/cluster running TensorFlow, listening on a port (or ports), that app servers can connect to for predictions, similar to the way databases are used.
Training would be done by cron jobs feeding the training data through the network to the same server/cluster.
How does one actually use TensorFlow in production? Should I build a setup where Python runs as a server and use Python scripts to get predictions? I'm still new to this, but I feel that such a script would need to open sessions, etc., which is not scalable. (I'm talking about hundreds of predictions per second.)
Any pointer to relevant information would be highly appreciated. I could not find any.
This morning, our colleagues released TensorFlow Serving on GitHub, which addresses some of the use cases that you mentioned. It is a distributed wrapper for TensorFlow that is designed to support high-performance serving of multiple models. It supports both bulk processing and interactive requests from app servers.
For more information, see the basic and advanced tutorials.

How to deploy and serve predictions using TensorFlow from an API?

From the Google tutorials we know how to train a model in TensorFlow. But what is the best way to save a trained model and then serve predictions using a basic, minimal Python API on a production server?
My question is basically about TensorFlow best practices for saving the model and serving predictions on a live server without compromising speed or running into memory issues, since the API server will be running in the background forever.
A small snippet of Python code would be appreciated.
TensorFlow Serving is a high-performance, open-source serving system for machine learning models, designed for production environments and optimized for TensorFlow. The initial release contains a C++ server and Python client examples based on gRPC.
To get started quickly, check out the tutorial.
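As a hedged sketch of the saving half (the toy model and paths are assumptions): exporting in the SavedModel format, with a numeric version subdirectory, produces the layout TensorFlow Serving loads from its model base path:

```python
import tensorflow as tf

# Placeholder for an already-trained model.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# TensorFlow Serving expects versioned subdirectories under the model
# base path, e.g. /models/my_model/1, /models/my_model/2, ...
tf.saved_model.save(model, "/models/my_model/1")
```

The server can then be pointed at the base path /models/my_model and will pick up the highest version it finds there.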
