How to deal with time consuming python functions in Heroku? - python

I have successfully deployed a Django app to Heroku using Postgres. The only problem is that some of the python functions I have written can take up to several minutes to run (data scraping many pages with selenium and generating 50 different deep learning models with keras). If it takes longer than 30 seconds the app crashes. I ultimately plan on using this Heroku app as an API that I will connect to a frontend using React on netlify. Is there a way to automatically run these functions behind the scenes somehow? If not, how can I deploy a website that runs time consuming python functions in the backend and uses React for the frontend?

Okay, I think we can divide the problems in TWO parts:\
1- Heroku free Tier (assuming it is) "kills" the server after 30min of absence (source), so basically its very difficult to host a backend in heroku. And besides that, since you're training A LOT of deep learning models you could go out of memory and things like that.
2- You might want to redesign your architecture. What about creating a server that once in a while train this machine learning models and the other one just consume and make inferences on those models? You could also separate the scrapping part from the actual server, and just pull data from db.
Since you didn't added constraints to your problem, I see it that way.

Related

Minimal reproducible example of a django project using appengine task queue (dev_appserver.py)

I am trying to create a django project which uses AppEngine's task queue, and would like to test it locally before deploying (using gclouds dev_appserver.py`).
I can't seem to find resources that help in local development, and the closest thing was a Medium article that helps setting up django with Datastore (https://medium.com/#bcrodrigues/quick-start-django-datastore-app-engine-standard-3-7-dev-appserver-py-way-56a0f90c53a3).
Does anyone have an example that I could look into for understanding how to start my implementation?
I don't think you can test it locally. I'd create a new app engine project and test it there. You should be able to stay within the free quota.
Once you have it working, you can write unit tests with mocks of task queue API calls.

How to deploy our ML trained model?

I am new to machine learning. I'm done with k-means clustering and the ml model is trained. My question is how to pass input for my trained model?
Example:
Consider a google image processing ML model. For that we pass an image that gives the proper output like emotion from that picture.
Now my doubt is how to do like that I'm done the k-means to predict mall_customer who spending more money to buy a product for this I want to call or pass the input to the my trained model.
I am using python and sci-kit learn.
What you want here is an API where you can send request/input and get response/predictions.
You can create a Flask server, save your trained model as a pickle file and load it when making predictions. This might be some work to do.
Please refer these :
https://towardsdatascience.com/deploying-a-machine-learning-model-as-a-rest-api-4a03b865c166
https://hackernoon.com/deploy-a-machine-learning-model-using-flask-da580f84e60c
Note: The Flask inbuilt server is not production ready. You might want to refer uwsgi + ngnix
In case you are using docker : https://hub.docker.com/r/tiangolo/uwsgi-nginx-flask/ this will be a great help.
Since the question was asked in 2019, many Python libraries exist that allow users to quickly deploy machine learning models without having to learn Flask, containerization, and getting a web hosting solution. The best solution depends on factors like how long you need to deploy the model for, and whether it needs to be able handle heavy traffic.
For the use case that the user described, it sounds like the gradio library could be helpful (http://www.gradio.app/), which allows users to soft-deploy models with public links and user interfaces with a few lines of Python code, like below:
Let's say all you know is how to train and save a model and want some way of using it in a real app or some way of presenting it to the world.
Here's what you'll need to do:
Create an API (eg: using Flask, FastAPI, Starlette etc) which will serve your model, ie, it will receive inputs, run your model on them, and send back outputs.
Need to setup a webserver (eg: uvicorn), that will host your Flask App and serve as a bridge between host machine and your Flask App.
Deploy the whole thing behind a cloud provider like (netlify, GCP, AWS etc). This will give you a url that can be used to call your API.
Then there are other optional things like:
Docker, which let you package your model, it's dependencies and your Flask App together inside a docker image, which can be easily deployed on different platforms due to consistency. Your app will then run as a docker container. This solves the environment consistency problems.
Kubernetes, which lets you make sure your Flask App always stays available by spinning up new docker container every time something goes wrong in the one that's up. This solves availability and scalability problem.
There are multiple tools that ease or automate different parts of this process. You can also check out mia which lets you do all the above and also give a nice frontend UI to your model web app. Its a no-code, low-code tool so you can go from a saved model to a deployed Web App and an API endpoint within minutes.
(Edit - Disclaimer: I'm part of the team responsible for building mia)

Sklearn Model (Python) with NodeJS (Express): how to connect both?

I have a web server using NodeJS - Express and I have a Scikit-Learn (machine learning) model pickled (dumped) in the same machine.
What I need is to demonstrate the model by sending/receiving data from it to the server. I want to load the model on startup of the web server and keep "listening" for data inputs. When receive data, executes a prediction and send it back.
I am relatively new to Python. From what I've seen I could use a "Child Process" to execute that. I also saw some modules that run Python script from Node.
The problem is I want to load the model once and let it be for as long as the server is on. I don't want to keep loading the model every time due to it's size. How is the best way to perform that?
The idea is running everything in a AWS machine.
Thank you in advance.
My recommendation: write a simple python web service (personally recommend flask) and deploy your ML model. Then you can easily send requests to your python web service from your node back-end. You wouldn't have a problem with the initial model loading. it is done once in the app startup, and then you're good to go
DO NOT GO FOR SCRIPT EXECUTIONS AND CHILD PROCESSES!!! I just wrote it in bold-italic all caps so to be sure you wouldn't do that. Believe me... it potentially go very very south, with all that zombie processes upon job termination and other stuff. let's just simply say it's not the standard way to do that.
You need to think about multi-request handling. I think flask now has it by default
I am just giving you general hints because your problem has been generally introduced.

Python Web Backend

I am an experienced Python developer starting to work on web service
backend system. The system feeds data (constantly) from the web to a
MySQL database. This data is later displayed by a frontend side (there
is no connection between the frontend and the backend). The backend
system constantly downloads flight information from the web (some of
the data is fetched via APIs, and some by downloading and parsing
text / xls files). I already have a script that downloads the data,
parses it, and inserts it to the MySQL db - all in a big loop. The
frontend side is just a bunch of php pages that properly display the
data by querying the MySQL server.
It is crucial that this web service be robust, strong and reliable.
Therefore, I have been looking into the proper ways to design it, and came across the following parts to comprise my system:
1) django as a framework (for HTTP connections and for using Piston)
2) Piston as an API provider (this is great because then my front-end can use the API instead of actually running queries)
3) SQLAlchemy as the DB layer (I don't like the little control you get when using django ORM, I want to be able to run a more complex DB framework)
4) Apache with mod_wsgi to run everything
5) And finally, Celery (or django-cron) to actually run my infinite loop that pulls the data off the web - hopefully in some sort of organized tasks format). This is the part I am least sure of, and any pointers are appreciated.
This all sounds great. I used django before to write websites (aka
request handlers that return data). However, other than using Celery or django-cron I can't really see how it fits a role of a constant data feeding backend.
I just wanted to run this by you guys to hear your ideas / comments. Any input you have / pointers to documentation and/or other libraries would be greatly greatly appreciated!
If You are about to use SQLAlchemy, I would refrain from using Django: Django is fine if You are using the whole stack, but as You are about to rip Models off, I do not see much value in using it and I would take a look at another option (perhaps Pylons or pure old CherryPy would do).
Even more so if FEs will not run queries, but only ask API providers.
As for robustness, I am more satisfied with starting separate fcgi processess with supervise and using more lightweight web server (ligty / nginx), but that's a matter of taste.
For the "infinite loop" part, it depends on what behavior you want: if there is a problem with the source, would you just like to skip the step or repeat it multiple times when source is back up?
Periodic Tasks might be good for former, while cron that would just spawn scraping tasks is better for latter.

fastcgi, cherrypy, and python

So I'm trying to do more web development in python, and I've picked cherrypy, hosted by lighttpd w/ fastcgi. But my question is a very basic one: why do I need to restart lighttpd (or apache) every time I change my application code, or the code for an underlying library?
I realize this question extends from a basic mis(i.e. poor)understanding of the fastcgi model, so I'm open to any schooling here, but I'm used to just changing a PHP file and it showing up, versus having to bounce the web server.
Any elucidation/useful mockery appreciated.
This is because of performance. For development, autoreloading is helpful. But for production, you don't want to autoreload. This is actually a decently-sized bottleneck in say PHP. Every time you access a PHP webpage, the server has to parse and load each page from scratch. With Python, the script is already loaded and running after the first access.
As has been pointed out, CherryPy has a autoreload setting. I'd recommend using the CherryPy built-in server for development and using lighttpd for production. That will likely save you some time. The tutorial shows you how to do this.
From a system-software-writer's pointer of view: This all depends on how the meta-data about the server process is organized within your daemon (lighttpd or fcgi). Some programs are designed for one time only initialization -- MOSTLY this allows a much simpler and better performing internal programming model.
Often it is very hard to program a server process reload config data in a easy way. You might have to introduce locks and external event objects (signals in UNIX). When you can synchronize the data structures by design -- i.e., only initializing once .... why complicate things by making the data model modifiable multiple times ?

Categories