I want to implement a neural network model in a Django application so that it can communicate via a REST API with another application. The Django application iteratively (1) collects a batch of training data from the other application, (2) retrains the model on the data aggregated so far and (3) gives predictions on demand from that other application. Time is a crucial factor here. How and where can I store an instance of the trained model between those steps?
If you don't want to use a (SQL) database you can also use Django's caching framework to store nearly any kind of data that is somehow serializable. It offers a quite simple and convenient API (cache.set()/cache.get()) and you can use backends like memcached and redis (which can also persist to disk). For more complicated use cases you might look into using redis with its own API, which lets you do more advanced things than what's possible through the caching API. Using these possibilities you can also share data between multiple processes/workers.
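A minimal sketch of that idea, assuming a configured cache backend in settings.CACHES and a picklable scikit-learn model (the cache key and model type here are just placeholders):

```python
# views.py -- sketch only; the cache framework pickles values for you,
# so any picklable trained model can be stored and shared this way.
from django.core.cache import cache
from sklearn.neural_network import MLPRegressor

MODEL_KEY = "trained-model"  # hypothetical cache key

def retrain(X, y):
    model = cache.get(MODEL_KEY) or MLPRegressor(warm_start=True)
    model.fit(X, y)                    # retrain on the aggregated data
    cache.set(MODEL_KEY, model, None)  # timeout=None -> never expire
    return model

def predict(X):
    model = cache.get(MODEL_KEY)
    if model is None:
        raise RuntimeError("no trained model in the cache yet")
    return model.predict(X)
```

Note that with the default local-memory backend the cache is per-process, so for sharing between workers you'd want memcached or redis.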
I've trained 10 different TensorFlow models for style transfer; basically, each model is responsible for applying a filter to an image based on a style image. Every model functions independently, and I want to integrate this into an application. Is there any way to deploy these models using AWS?
I've tried deploying these models using AWS SageMaker, then using the endpoint with AWS Lambda, and finally creating an API using API Gateway. The catch here is that this way I could only deploy a single model on SageMaker, but in my case I want to deploy 10 different models.
I expect to provide a link to each model in my application, so that the selected filter will trigger the corresponding model on AWS and apply the filter.
What I did for something similar was create my own docker container with API code capable of loading and predicting with multiple models. When the API starts, it copies a model.tar.gz from an S3 bucket; inside that tar.gz are the weights for all my models. My code then scans the contents and loads all the models. If your models are too big (RAM consumption) you might need to handle this differently, as it's said here that it loads a model only when you call predict. I load all the models at the beginning to get faster predictions. That is not actually a big change in code.
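A rough sketch of that pattern (the bucket name, paths, route and input format are made up, and I'm assuming one Keras SavedModel directory per style inside the archive):

```python
# inference_api.py -- sketch of an API container loading several models at startup
import glob
import os
import tarfile

import boto3
import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request

app = Flask(__name__)
MODELS = {}

def load_all_models(bucket="my-style-models", key="model.tar.gz",
                    workdir="/opt/ml/models"):
    """Download the archive once at startup and load every model it contains."""
    os.makedirs(workdir, exist_ok=True)
    archive = os.path.join(workdir, "model.tar.gz")
    boto3.client("s3").download_file(bucket, key, archive)
    with tarfile.open(archive) as tar:
        tar.extractall(workdir)
    # assume one SavedModel directory per style, e.g. /opt/ml/models/style_1/
    for path in glob.glob(os.path.join(workdir, "style_*")):
        MODELS[os.path.basename(path)] = tf.keras.models.load_model(path)

@app.route("/predict/<style>", methods=["POST"])
def predict(style):
    model = MODELS.get(style)
    if model is None:
        return jsonify(error=f"unknown style {style}"), 404
    image = np.array(request.get_json()["instances"])  # hypothetical input format
    return jsonify(predictions=model.predict(image).tolist())

load_all_models()
```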
Another approach that I'm trying right now is to have the API Gateway call multiple Sagemaker endpoints, although I did not find good documentation for that.
There are a couple of options, and the final choice depends on your priorities in terms of cost, latency, reliability and simplicity.
Different SageMaker endpoints per model - one benefit of this is better robustness, because models are isolated from one another. If one model gets called a lot, it won't bring the whole fleet down. Each endpoint lives its own life, and they can also be hosted on separate types of machines to achieve better economics. Note that to achieve high availability it is even recommended to double the hardware backend (2+ servers per SageMaker endpoint) so that endpoints are multi-zone, as SageMaker does its best to host endpoint backends in different availability zones when an endpoint has two or more instances. (A client-side sketch of calling per-model endpoints follows this list.)
One SageMaker TFServing multi-model endpoint - If all your models are TensorFlow models and if their artifacts are compatible with TFServing, you may be able to host all of them in a single SageMaker TFServing endpoint. See this section of the docs: Deploying more than one model to your endpoint
One SageMaker Multi-Model Endpoint, a feature that was released end of 2019 and that enables hosting of multiple models in the same container.
Serverless deployment in AWS Lambda - this can be cost-effective: models generate charges only when called. This is limited to {DL model; DL framework} pairs that fit within Lambda memory and storage limits and that do not require a GPU. It's been documented a couple of times in the past, notably with TensorFlow and MXNet.
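For the first option (one endpoint per model), the client side stays simple: your application just picks the endpoint that matches the selected filter and invokes it. A hedged sketch with made-up endpoint names and payload format:

```python
# invoke.py -- sketch only; endpoint names and payload shape are assumptions
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# hypothetical mapping from the filter chosen in the app to a SageMaker endpoint
ENDPOINTS = {"mosaic": "style-mosaic-endpoint", "candy": "style-candy-endpoint"}

def apply_style(style, image_payload):
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINTS[style],
        ContentType="application/json",
        Body=json.dumps({"instances": [image_payload]}),
    )
    return json.loads(response["Body"].read())
```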
I am new to machine learning. I'm done with k-means clustering and the ML model is trained. My question is how to pass input to my trained model?
Example:
Consider a Google image-processing ML model: we pass an image to it and it gives the proper output, such as the emotion in that picture.
Now my doubt is how to do the same here. I've used k-means to predict which mall customers spend more money on products, and I want to call or pass input to my trained model.
I am using Python and scikit-learn.
What you want here is an API where you can send requests/input and get responses/predictions.
You can create a Flask server, save your trained model as a pickle file and load it when making predictions. This requires a bit of work.
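A minimal sketch of that approach, assuming a scikit-learn KMeans model already saved to model.pkl (the file name, route and input format are placeholders):

```python
# app.py -- sketch only: load a pickled model and serve predictions over HTTP
import pickle

import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:      # the trained KMeans model
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # expects e.g. {"features": [[annual_income, spending_score], ...]}
    features = np.array(request.get_json()["features"])
    clusters = model.predict(features)
    return jsonify(clusters=clusters.tolist())

if __name__ == "__main__":
    app.run()  # development server only; use uWSGI/gunicorn in production
```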
Please refer to these:
https://towardsdatascience.com/deploying-a-machine-learning-model-as-a-rest-api-4a03b865c166
https://hackernoon.com/deploy-a-machine-learning-model-using-flask-da580f84e60c
Note: The Flask built-in server is not production ready. You might want to look into uWSGI + nginx.
In case you are using Docker, https://hub.docker.com/r/tiangolo/uwsgi-nginx-flask/ will be a great help.
Since the question was asked in 2019, many Python libraries now exist that allow users to quickly deploy machine learning models without having to learn Flask, containerization, or web hosting. The best solution depends on factors like how long you need to deploy the model for and whether it needs to be able to handle heavy traffic.
For the use case that the user described, it sounds like the gradio library could be helpful (http://www.gradio.app/), which allows users to soft-deploy models with public links and user interfaces with a few lines of Python code, like below:
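Something along these lines; the predict function here is just a placeholder for calling your own model:

```python
# sketch only: wrap an existing predict function in a shareable gradio UI
import gradio as gr

def predict(text):
    # call your trained model here; echoing the input is just a placeholder
    return text

gr.Interface(fn=predict, inputs="text", outputs="text").launch(share=True)
```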
Let's say all you know is how to train and save a model, and you want some way of using it in a real app or presenting it to the world.
Here's what you'll need to do:
Create an API (e.g. using Flask, FastAPI, Starlette, etc.) which will serve your model, i.e. it will receive inputs, run your model on them, and send back outputs.
Set up a web server (e.g. uvicorn) that will host your app and serve as a bridge between the host machine and your app. (A minimal sketch of these two steps follows this list.)
Deploy the whole thing with a cloud provider (e.g. Netlify, GCP, AWS, etc.). This will give you a URL that can be used to call your API.
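A minimal sketch of steps 1 and 2, assuming FastAPI served by uvicorn and a pickled scikit-learn model (all names here are placeholders):

```python
# main.py -- sketch of an API (step 1) served by uvicorn (step 2)
import pickle
from typing import List

import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

with open("model.pkl", "rb") as f:   # hypothetical saved model
    model = pickle.load(f)

class Inputs(BaseModel):
    features: List[List[float]]

@app.post("/predict")
def predict(inputs: Inputs):
    return {"predictions": model.predict(inputs.features).tolist()}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```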
Then there are other optional things like:
Docker, which lets you package your model, its dependencies and your app together inside a docker image that can be easily deployed on different platforms thanks to that consistency. Your app will then run as a docker container. This solves environment-consistency problems.
Kubernetes, which lets you make sure your app always stays available by spinning up a new docker container every time something goes wrong in the one that's up. This solves availability and scalability problems.
There are multiple tools that ease or automate different parts of this process. You can also check out mia, which lets you do all the above and also gives a nice frontend UI to your model web app. It's a no-code, low-code tool, so you can go from a saved model to a deployed web app and an API endpoint within minutes.
(Edit - Disclaimer: I'm part of the team responsible for building mia)
I am serving a model trained using the object detection API. Here is how I did it:
Create a TensorFlow Serving service on port 9000 as described in the basic tutorial
Create python code calling this service using predict_pb2 from tensorflow_serving.apis, similar to this (sketched after this list)
Call this code inside a Flask server to make the service available with HTTP
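For reference, step 2 roughly looks like this; the model name, signature and input shape are placeholders, and I'm assuming TF Serving's gRPC port is 9000 as in the tutorial:

```python
# client.py -- sketch of a gRPC call to TensorFlow Serving
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:9000")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "detector"                    # hypothetical model name
request.model_spec.signature_name = "serving_default"

image = np.zeros((1, 300, 300, 3), dtype=np.uint8)      # placeholder input image
request.inputs["inputs"].CopyFrom(tf.make_tensor_proto(image))

result = stub.Predict(request, timeout=10.0)            # returns a PredictResponse
```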
Still, I could have done things much more easily the following way:
Create python code for inference, as in the example in the object detection repo
Call this code inside a Flask server to make the service available with HTTP
As you can see, I could have skipped the use of Tensorflow serving.
So, is there any good reason to use TensorFlow Serving in my case? If not, in what cases should I use it?
I believe most of the reasons why you would prefer Tensorflow Serving over Flask are related to performance:
Tensorflow Serving makes use of gRPC and Protobuf while a regular Flask web service uses REST and JSON. JSON relies on HTTP 1.1 while gRPC uses HTTP/2 (there are important differences). In addition, Protobuf is a binary format used to serialize data, and it is more efficient than JSON.
TensorFlow Serving can batch requests to the same model, which uses hardware (e.g. GPUs) more efficiently.
TensorFlow Serving can manage model versioning
As with almost everything, it depends a lot on your use case and scenario, so it's important to think about the pros and cons and your requirements. TensorFlow Serving has great features, but these features could also be implemented to work with Flask with some effort (for instance, you could build your own batching mechanism).
Flask is used to handle requests/responses, whereas TensorFlow Serving is built specifically for serving flexible ML models in production.
Let's take some scenarios where you want to:
Serve multiple models to multiple products (many-to-many relations) at the same time.
Look which model is making an impact on your product (A/B Testing).
Update model weights in production, which is as easy as saving a new model to a folder.
Have a performance equal to code written in C/C++.
And you can always use all those advantages for free by sending requests to TF Serving from Flask.
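For example, a Flask route can simply forward inference requests to TF Serving's REST API; the host, port 8501 and model name here are assumptions:

```python
# proxy.py -- sketch of Flask forwarding inference to TensorFlow Serving
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
TF_SERVING_URL = "http://localhost:8501/v1/models/my_model:predict"  # hypothetical

@app.route("/predict", methods=["POST"])
def predict():
    payload = {"instances": request.get_json()["instances"]}
    response = requests.post(TF_SERVING_URL, json=payload)
    return jsonify(response.json())
```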
I'm building a system with two servers and an API interface between them. One is a normal Django web server and the other is a calculation server (also Django-powered) which performs complex calculations on specific inputs. I've split the website and the calculation server to decouple the components.
I'm using the Django REST framework and I've created a serialization class on the web server. This is for the inputs that get sent to the calculation server, and it is populated from various DB entries. I then pass the serialized data as parameters in a GET request to the calc server. I then copy that same serialization class to the calculation server to de-serialize/decode the data and perform the calculation.
Is it normal to use this approach, where I'm copying the serialization class between the two servers? Usually when I copy something I'm doing it wrong.
The calculated results are then just returned to my web server using built-in python and django functions. I don't see a need for the django rest framework during this step.
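For context, the copied class is a plain DRF serializer along these lines (the field names are made up), defined on the web server and duplicated verbatim on the calculation server:

```python
# serializers.py -- hypothetical serializer currently copied to both servers
from rest_framework import serializers

class CalculationInputSerializer(serializers.Serializer):
    load_case = serializers.CharField()
    span = serializers.FloatField()
    loads = serializers.ListField(child=serializers.FloatField())
```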
We have this open source app we built with Django, http://map.ninux.org/, which is used by our wireless community network and is hosted outside of our network with Hetzner in Nuremberg.
We'd like to have a mirror inside our network for internal use only.
I would like to set up the mirror so that it performs write queries on the database hosted outside the network.
Ideally, the mirror would perform write queries both on its local DB and on the one outside the network.
Any suggestion?
I'm also wondering if there are any articles about developing distributed / redundant / decentralized applications with Django.
Thanks!
The multiple databases documentation shows you how to set up two databases and how to select a database when saving, which is what write operations do.
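A minimal sketch of that setup, assuming the external database gets the alias "remote" (names, engine and host are placeholders):

```python
# settings.py -- two database connections, one local and one outside the network
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "map_local",
    },
    "remote": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "map",
        "HOST": "db.example.org",   # the externally hosted database
    },
}

# elsewhere in your code: write to both databases explicitly
# ("node" is a hypothetical model instance)
node.save(using="default")
node.save(using="remote")
```

For routing writes automatically rather than passing using= everywhere, the same documentation also covers database routers.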