I've trained 10 different TensorFlow models for style transfer; each model applies a filter to an image based on a style image. Every model works independently, and I want to integrate them into an application. Is there any way to deploy these models using AWS?
I've tried deploying these models with AWS SageMaker, calling the endpoint from AWS Lambda, and finally creating an API with API Gateway. The catch is that I can only deploy a single model on SageMaker, whereas I want to deploy 10 different models.
I expect to provide a link to each model in my application, so that the selected filter triggers the corresponding model on AWS and applies the filter.
What I did for something similar was to build my own Docker container with API code capable of loading and predicting with multiple models. When the API starts, it copies a model.tar.gz from an S3 bucket; inside that tar.gz are the weights for all my models. My code then scans the contents and loads all the models. If your models are too big (RAM consumption) you might need to handle this differently, as described here: load a model only when you call predict. I load all the models at startup to get faster predictions; that is not actually a big change in the code.
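As a rough sketch of that idea (the bucket name, archive key, and paths below are made up, and it assumes each model was saved in Keras SavedModel format):

```python
import os
import tarfile

import boto3
import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request

# Placeholders for illustration only.
BUCKET = "my-style-models-bucket"
ARCHIVE_KEY = "model.tar.gz"
MODEL_DIR = "/opt/ml/models"

app = Flask(__name__)
models = {}


def load_all_models():
    """Download the archive once at startup and load every model found inside."""
    os.makedirs(MODEL_DIR, exist_ok=True)
    archive_path = os.path.join(MODEL_DIR, "model.tar.gz")
    boto3.client("s3").download_file(BUCKET, ARCHIVE_KEY, archive_path)
    with tarfile.open(archive_path) as tar:
        tar.extractall(MODEL_DIR)
    # Assume each top-level directory in the archive is one saved Keras model.
    for name in os.listdir(MODEL_DIR):
        path = os.path.join(MODEL_DIR, name)
        if os.path.isdir(path):
            models[name] = tf.keras.models.load_model(path)


load_all_models()


@app.route("/predict/<model_name>", methods=["POST"])
def predict(model_name):
    model = models.get(model_name)
    if model is None:
        return jsonify({"error": f"unknown model {model_name}"}), 404
    instances = np.array(request.get_json()["instances"])
    predictions = model.predict(instances)
    return jsonify({"predictions": predictions.tolist()})
```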
Another approach that I'm trying right now is to have API Gateway call multiple SageMaker endpoints, although I did not find good documentation for that.
There are a couple of options, and the final choice depends on your priorities in terms of cost, latency, reliability, and simplicity.
Different SageMaker endpoints per model - one benefit of this is better robustness, because the models are isolated from one another: if one model gets called a lot, it won't bring the whole fleet down. Each endpoint lives its own life and can be hosted on a different instance type to achieve better economics. Note that for high availability it is even recommended to double the hardware backend (2+ instances per SageMaker endpoint) so that endpoints are multi-zone; SageMaker does its best to host the endpoint backend in different availability zones when an endpoint has two or more instances.
One SageMaker TFServing multi-model endpoint - If all your models are TensorFlow models and if their artifacts are compatible with TFServing, you may be able to host all of them in a single SageMaker TFServing endpoint. See this section of the docs: Deploying more than one model to your endpoint
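As an illustration of how a request would then pick one of the bundled models, assuming the TFServing container's tfs-model-name custom attribute and made-up endpoint/model names:

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Made-up endpoint and model names; all models are bundled in one model.tar.gz
# behind a single TFServing endpoint, and the custom attribute selects one.
response = runtime.invoke_endpoint(
    EndpointName="style-transfer-tfs",
    ContentType="application/json",
    CustomAttributes="tfs-model-name=style_model_3",
    Body=json.dumps({"instances": [[0.1, 0.2, 0.3]]}),
)
print(json.loads(response["Body"].read()))
```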
One SageMaker multi-model endpoint - a feature released at the end of 2019 that enables hosting multiple models in the same container.
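For comparison, a multi-model endpoint is invoked by naming the artifact to load via TargetModel; a minimal sketch with made-up names:

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Made-up names; TargetModel points at one of the model artifacts stored under
# the endpoint's S3 model-data prefix and is loaded on demand by the container.
response = runtime.invoke_endpoint(
    EndpointName="style-transfer-mme",
    TargetModel="style_model_7.tar.gz",
    ContentType="application/json",
    Body=json.dumps({"instances": [[0.1, 0.2, 0.3]]}),
)
print(json.loads(response["Body"].read()))
```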
Serverless deployment in AWS Lambda - this can be cost-effective: models generate charges only when called. It is limited to {DL model, DL framework} pairs that fit within Lambda's memory and storage limits and that do not require a GPU. It has been documented a couple of times in the past, notably with TensorFlow and MXNet.
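A hedged sketch of that Lambda pattern (assuming TensorFlow fits into the deployment package or container image; the bucket, key, and paths are placeholders), with the model loaded outside the handler so warm invocations reuse it:

```python
import json
import tarfile

import boto3
import numpy as np
import tensorflow as tf

# Bucket, key, and paths are placeholders. The model is loaded once per Lambda
# container, so warm invocations skip the download entirely.
boto3.client("s3").download_file("my-style-models-bucket", "style_model_1.tar.gz", "/tmp/model.tar.gz")
with tarfile.open("/tmp/model.tar.gz") as tar:
    tar.extractall("/tmp/model")
model = tf.keras.models.load_model("/tmp/model")


def handler(event, context):
    instances = np.array(json.loads(event["body"])["instances"])
    predictions = model.predict(instances)
    return {
        "statusCode": 200,
        "body": json.dumps({"predictions": predictions.tolist()}),
    }
```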
Related
I have successfully deployed a Django app to Heroku using Postgres. The only problem is that some of the python functions I have written can take up to several minutes to run (data scraping many pages with selenium and generating 50 different deep learning models with keras). If it takes longer than 30 seconds the app crashes. I ultimately plan on using this Heroku app as an API that I will connect to a frontend using React on netlify. Is there a way to automatically run these functions behind the scenes somehow? If not, how can I deploy a website that runs time consuming python functions in the backend and uses React for the frontend?
Okay, I think we can divide the problem into two parts:
1- The Heroku free tier (assuming that's what you're on) "kills" the server after 30 minutes of inactivity (source), so it is basically very difficult to host a backend on Heroku. Besides that, since you're training a lot of deep learning models, you could also run out of memory.
2- You might want to redesign your architecture. What about having one server that periodically trains these machine learning models, while another one just consumes them and makes inferences? You could also separate the scraping part from the actual server and just pull the data from a database.
Since you didn't add constraints to your problem, that's how I see it.
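One common way to implement the "train in the background" idea from point 2 is a task queue such as Celery with a Redis broker and a Heroku worker dyno; this is only a rough sketch with made-up names:

```python
# tasks.py - hypothetical Celery worker so slow training never blocks a web request.
import time

from celery import Celery

app = Celery("training", broker="redis://localhost:6379/0")


@app.task
def retrain_models():
    # Placeholder for the slow work (Selenium scraping + fitting the Keras models);
    # the real task would save the resulting models somewhere the web dynos can load them.
    time.sleep(5)
    return "done"
```

A Django view (or a scheduled job via Celery beat) would then call retrain_models.delay(), which returns immediately while the worker dyno does the slow part, so the 30-second request timeout never hits.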
I'm trying out SageMaker and I've created a model using Autopilot. The thing is that SageMaker only lets you deploy directly to an endpoint. But since I'll only be using the model a couple of times a day, what is the most direct way to trigger deployments from events (for example, when new CSVs are loaded into an S3 directory or when I see messages in an SQS queue), or at least to schedule them periodically?
The other answer is incorrect: Boto3 is part of the Lambda Python environment, so all you need to do is create a SageMaker client and invoke the appropriate API.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html
You can use a trigger (e.g. CloudWatch Events/EventBridge, S3 event, etc.) to run a Lambda function that deploys your SageMaker model. The Lambda function, however, requires a runtime that can call SageMaker APIs. You will have to create a custom runtime (via Layers) for that. If you're using Python, use this as a reference: https://dev.to/vealkind/getting-started-with-aws-lambda-layers-4ipk.
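For reference, a minimal sketch of such a Lambda handler using boto3's SageMaker client; all names are made up, and the SageMaker model itself is assumed to already exist (e.g. created from the Autopilot job's best candidate):

```python
import boto3

sm = boto3.client("sagemaker")

# Made-up names; the SageMaker model is assumed to exist already.
MODEL_NAME = "autopilot-best-model"
CONFIG_NAME = "autopilot-best-model-config"
ENDPOINT_NAME = "autopilot-best-model-endpoint"


def handler(event, context):
    sm.create_endpoint_config(
        EndpointConfigName=CONFIG_NAME,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": MODEL_NAME,
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
        }],
    )
    sm.create_endpoint(EndpointName=ENDPOINT_NAME, EndpointConfigName=CONFIG_NAME)
    return {"deploying": ENDPOINT_NAME}
```

A second scheduled Lambda could call delete_endpoint once the day's predictions are done, so you only pay while the endpoint exists.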
I am new to machine learning. I have finished k-means clustering and the ML model is trained. My question is: how do I pass input to my trained model?
Example:
Consider a Google image-processing ML model: you pass it an image, and it gives the proper output, such as the emotion in that picture.
Now my doubt is how to do the same with my model: I have used k-means on mall-customer data to predict which customers spend more money on products, and I want to call the trained model, i.e. pass new input to it and get a prediction.
I am using Python and scikit-learn.
What you want here is an API to which you can send a request/input and get back a response/prediction.
You can create a Flask server, save your trained model as a pickle file, and load it when making predictions (a rough sketch follows below the links). This takes a bit of work.
Please refer to these:
https://towardsdatascience.com/deploying-a-machine-learning-model-as-a-rest-api-4a03b865c166
https://hackernoon.com/deploy-a-machine-learning-model-using-flask-da580f84e60c
Note: the Flask built-in server is not production-ready; you might want to look at uWSGI + nginx.
In case you are using Docker, https://hub.docker.com/r/tiangolo/uwsgi-nginx-flask/ will be a great help.
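A rough sketch of that Flask server for the k-means case (the file name and feature layout are made up; it assumes the trained model was saved with pickle):

```python
import pickle

import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumes the trained scikit-learn KMeans model was saved earlier, e.g.:
#   pickle.dump(kmeans, open("kmeans_mall_customers.pkl", "wb"))
with open("kmeans_mall_customers.pkl", "rb") as f:
    model = pickle.load(f)


@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [[annual_income, spending_score], ...]}
    features = np.array(request.get_json()["features"])
    clusters = model.predict(features)
    return jsonify({"clusters": clusters.tolist()})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```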
Since the question was asked in 2019, many Python libraries now exist that let users quickly deploy machine learning models without having to learn Flask, containerization, or web hosting. The best solution depends on factors such as how long you need the model deployed and whether it needs to be able to handle heavy traffic.
For the use case the user described, it sounds like the gradio library (http://www.gradio.app/) could be helpful; it allows users to soft-deploy models with public links and user interfaces in a few lines of Python code, like below:
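(The snippet below is only an illustrative sketch; the stub prediction function stands in for whatever trained model you have.)

```python
import gradio as gr


def predict(annual_income, spending_score):
    # Stand-in for the trained model; replace with e.g. kmeans.predict([[...]]).
    return "high spender" if spending_score > 50 else "regular spender"


# Builds a small web UI and, with share=True, a temporary public link.
gr.Interface(fn=predict, inputs=["number", "number"], outputs="text").launch(share=True)
```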
Let's say all you know is how to train and save a model, and you want some way of using it in a real app or presenting it to the world.
Here's what you'll need to do:
Create an API (e.g. using Flask, FastAPI, Starlette, etc.) that will serve your model, i.e. receive inputs, run your model on them, and send back outputs (see the sketch after this list).
Set up an application server (e.g. uvicorn for FastAPI/Starlette, gunicorn for Flask) that will host your app and serve as a bridge between the host machine and your app.
Deploy the whole thing on a cloud provider (Netlify, GCP, AWS, etc.). This will give you a URL that can be used to call your API.
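A minimal sketch of steps 1 and 2 with FastAPI and uvicorn (the pickle file name and input schema are placeholders for however you saved your model):

```python
# app.py - run with: uvicorn app:app --host 0.0.0.0 --port 8000
import pickle
from typing import List

import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Placeholder: load whatever model you trained and saved earlier.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)


class PredictRequest(BaseModel):
    features: List[List[float]]


@app.post("/predict")
def predict(req: PredictRequest):
    # Receive inputs, run the model, send back outputs.
    predictions = model.predict(np.array(req.features))
    return {"predictions": predictions.tolist()}
```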
Then there are other optional things like:
Docker, which lets you package your model, its dependencies, and your app together inside a Docker image that can easily be deployed on different platforms thanks to its consistency. Your app then runs as a Docker container. This solves environment-consistency problems.
Kubernetes, which keeps your app available by spinning up a new container whenever something goes wrong with the one that's running. This addresses availability and scalability.
There are multiple tools that ease or automate different parts of this process. You can also check out mia, which lets you do all of the above and also gives your model web app a nice frontend UI. It's a no-code/low-code tool, so you can go from a saved model to a deployed web app and an API endpoint within minutes.
(Edit - Disclaimer: I'm part of the team responsible for building mia)
I want to implement a neural network model in a Django application so that it can communicate via a REST API with another application. The Django application iteratively (1) collects a batch of training data from the other application, (2) retrains the model on the data aggregated so far, and (3) gives predictions on demand to that other application. Time is a crucial factor here. How and where can I store an instance of the trained model between those steps?
If you don't want to use a (SQL) database, you can also use Django's caching framework to store nearly any kind of data that is somehow serializable. It offers a quite simple and convenient API (cache.set()/cache.get()), and you can use backends like memcached and Redis (which can also persist to disk). For more complicated use cases you might look into using Redis through its own API, which lets you do more than the caching API allows. Using these possibilities you can also share data between multiple processes/workers.
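A rough sketch of that pattern inside a Django project (the cache key is made up; Django pickles cached values, so any picklable fitted model works, subject to the backend's size limits):

```python
from django.core.cache import cache

MODEL_CACHE_KEY = "latest_trained_model"  # made-up key name


def store_model(model):
    # Django pickles the value; timeout=None keeps it until it is overwritten or evicted.
    cache.set(MODEL_CACHE_KEY, model, timeout=None)


def predict(features):
    model = cache.get(MODEL_CACHE_KEY)
    if model is None:
        raise RuntimeError("No trained model in the cache yet")
    return model.predict(features)
```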
I am serving a model trained using object detection API. Here is how I did it:
Create a TensorFlow Serving service on port 9000 as described in the basic tutorial
Create Python code calling this service using predict_pb2 from tensorflow_serving.apis, similar to this
Call this code inside a Flask server to make the service available with HTTP
Still, I could have done things much easier the following way :
Create Python code for inference like in the example in the object detection repo
Call this code inside a Flask server to make the service available with HTTP
As you can see, I could have skipped the use of TensorFlow Serving.
So, is there any good reason to use TensorFlow Serving in my case? If not, what are the cases where I should use it?
I believe most of the reasons why you would prefer TensorFlow Serving over Flask are related to performance:
TensorFlow Serving makes use of gRPC and Protobuf, while a regular Flask web service uses REST and JSON. JSON relies on HTTP 1.1, while gRPC uses HTTP/2 (there are important differences). In addition, Protobuf is a binary format used to serialize data, and it is more efficient than JSON.
TensorFlow Serving can batch requests to the same model, which uses hardware (e.g. GPUs) more efficiently.
TensorFlow Serving can manage model versioning
As with almost everything, it depends a lot on your use case and scenario, so it's important to think about the pros and cons and your requirements. TensorFlow Serving has great features, but those features could also be implemented to work with Flask with some effort (for instance, you could create your own batching mechanism).
Flask is used to handle requests/responses, whereas TensorFlow Serving is built specifically for serving flexible ML models in production.
Let's take some scenarios where you want to:
Serve multiple models to multiple products (many-to-many relations) at the same time.
See which model is making an impact on your product (A/B testing).
Update model weights in production, which is as easy as saving a new model to a folder.
Have performance equal to code written in C/C++.
And you can always use all those advantages for FREE by sending requests to TF Serving using Flask.
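For instance, a tiny sketch of that combination, assuming TF Serving is running locally on its default REST port 8501 and serving a model named my_model:

```python
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumes TF Serving is reachable locally on its default REST port and serves
# a model named "my_model"; adjust the host and model name to your setup.
TF_SERVING_URL = "http://localhost:8501/v1/models/my_model:predict"


@app.route("/predict", methods=["POST"])
def predict():
    # Forward the client's instances to TF Serving and relay its predictions.
    payload = {"instances": request.get_json()["instances"]}
    response = requests.post(TF_SERVING_URL, json=payload)
    return jsonify(response.json())


if __name__ == "__main__":
    app.run(port=5000)
```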