I have a web server using Node.js (Express), and I have a scikit-learn (machine learning) model pickled (dumped) on the same machine.
What I need is to demonstrate the model by sending data to it and receiving the results on the server. I want to load the model when the web server starts and keep "listening" for data inputs; when data is received, run a prediction and send the result back.
I am relatively new to Python. From what I've seen, I could use a "child process" to execute that. I also saw some modules that run Python scripts from Node.
The problem is that I want to load the model once and keep it in memory for as long as the server is running; I don't want to reload the model on every request because of its size. What is the best way to do that?
The idea is to run everything on an AWS machine.
Thank you in advance.
My recommendation: write a simple Python web service (I personally recommend Flask) and deploy your ML model behind it. Then you can easily send requests to that Python service from your Node back-end. You won't have a problem with the initial model loading: it is done once at app startup, and then you're good to go.
DO NOT GO FOR SCRIPT EXECUTIONS AND CHILD PROCESSES!!! I wrote that in all caps just to be sure you wouldn't do it. Believe me, it can go very, very south, with zombie processes lingering after job termination and other problems. Let's just say it's not the standard way to do this.
You also need to think about multi-request handling; I think Flask handles that by default now.
I am only giving general hints because your problem was described in general terms.
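To make that concrete, here is a minimal sketch of such a Flask service (the model.pkl path, the /predict route and the "features" field are assumptions; adapt them to your model):

```python
# Minimal sketch: load the pickled scikit-learn model once at startup,
# then serve predictions over HTTP. File name and JSON shape are assumptions.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Loaded once when the process starts, kept in memory for the server's lifetime.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [[5.1, 3.5, 1.4, 0.2]]}
    payload = request.get_json(force=True)
    prediction = model.predict(payload["features"])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

Your Node/Express back-end then just POSTs JSON to http://localhost:5000/predict (with axios, fetch, etc.) and relays the response.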
The project that I am working on is a bit confidential, but I will try to explain my issues and be as clear as possible because I need your opinion.
Project:
They asked me to set up a local ELK environment and to use Python scripts to communicate with the stack: store data, retrieve it, analyse it, visualise it with Kibana, and finally make decisions based on that data (AI). So, as you can see, it is a data engineering project with some AI for the decision-making process. The issues I am facing are:
I don't know how to use Python to communicate with the stack; I didn't find resources about it.
Since the data is confidential, how can I ensure a high level of security?
How many instances should I use?
I am lost because I am new to ELK and my team is not dev-oriented.
I am new to ELK, so any advice would be really helpful!
I don't know how to use Python to communicate with the stack; I didn't find resources about it.
To learn how to interact with your stack, use the official Elasticsearch Python client.
You can install it with pip3 install elasticsearch, and the following links contain a wealth of tutorials on almost anything you would need to do.
https://kb.objectrocket.com/category/elasticsearch?filter=python
Suggest you start with these two:
https://kb.objectrocket.com/elasticsearch/how-to-parse-lines-in-a-text-file-and-index-as-elasticsearch-documents-using-python-641
https://kb.objectrocket.com/elasticsearch/how-to-query-elasticsearch-documents-in-python-268
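As a rough sketch of what the client looks like in practice (the index name, fields and localhost URL are assumptions, and the exact keyword arguments vary a bit between client versions):

```python
# Index one document and query it back with the official Python client.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Store a document in the "flights" index (names/fields are just examples).
es.index(index="flights", id=1, document={"origin": "AMS", "delay_minutes": 12})

# Retrieve matching documents.
result = es.search(index="flights", query={"match": {"origin": "AMS"}})
for hit in result["hits"]["hits"]:
    print(hit["_source"])
```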
Since the data is confidential, how can I ensure a high level of security?
You can mask the data or restrict index access.
https://www.elastic.co/guide/en/elasticsearch/reference/current/authorization.html
https://nl.devoteam.com/expert-view/field-level-security-and-data-masking-in-elasticsearch/
How many instances should I use?
I am lost because I am new to ELK and my team is not dev-oriented.
I suggest you start with one Elasticsearch node; if you're on AWS, use a t3a.large or equivalent and run Elasticsearch, Kibana and Logstash all on the same machine.
For setting it up: https://www.elastic.co/guide/en/elastic-stack-get-started/current/get-started-stack-docker.html#run-docker-secure
If you want to use Python as your integration tool for Elasticsearch, you can use the Elasticsearch Python client.
The other option is to use Python to produce the results and save them to a log file (or insert them into a database), and then let Logstash pick the data up.
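For the log-file option, something along these lines would work (the file path and fields are made up; point Logstash's file input with a JSON codec at the same path):

```python
# Append results as JSON Lines so Logstash can tail and parse the file.
import json
import time

def log_result(result: dict, path: str = "/var/log/myapp/results.log") -> None:
    record = {"@timestamp": time.strftime("%Y-%m-%dT%H:%M:%S%z"), **result}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_result({"sensor": "A1", "score": 0.87})
```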
For security, ELK has good built-in features, from API authorization and user authentication to cluster security; see Secure the Elastic Stack.
I just use one instance, but feel free to separate Kibana, Elasticsearch and Logstash (if you use it) onto their own instances, or use Docker to separate them.
Based on my experience, if you are going to load a lot of data in a short time, it is wise to separate them so the processes don't interfere with each other.
I have successfully deployed a Django app to Heroku using Postgres. The only problem is that some of the Python functions I have written can take up to several minutes to run (scraping many pages with Selenium and generating 50 different deep learning models with Keras). If a request takes longer than 30 seconds, the app crashes. I ultimately plan on using this Heroku app as an API that I will connect to a React frontend on Netlify. Is there a way to run these functions behind the scenes automatically? If not, how can I deploy a website that runs time-consuming Python functions in the backend and uses React for the frontend?
Okay, I think we can divide the problem into two parts:
1- The Heroku free tier (assuming that's what you're on) "kills" the server after 30 minutes of inactivity (source), so it's quite difficult to host this kind of backend on Heroku. Besides that, since you're training a lot of deep learning models, you could run out of memory and hit similar limits.
2- You might want to redesign your architecture. What about having one process that retrains the machine learning models every once in a while, and another that only consumes those saved models and makes inferences? You could also separate the scraping part from the actual server and just pull the data from the DB, as in the sketch below.
Since you didn't add more constraints to your problem, that's how I see it.
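A very rough sketch of that split, with placeholder training code: one scheduled script retrains and writes the model to disk, and the web process only ever loads the saved file for inference:

```python
# train_job.py -- run on a schedule (cron, Heroku Scheduler, a worker dyno),
# completely separate from the web process. Scraping/training bodies are placeholders.
import os

import joblib
from sklearn.linear_model import LogisticRegression

def fetch_training_data():
    # Placeholder for the Selenium scraping step.
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0, 0, 1, 1]
    return X, y

def retrain():
    X, y = fetch_training_data()
    model = LogisticRegression().fit(X, y)
    os.makedirs("models", exist_ok=True)
    # The web app only ever reads this file; it never trains anything itself.
    joblib.dump(model, "models/latest.joblib")

if __name__ == "__main__":
    retrain()
```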
I am new to machine learning. I'm done with k-means clustering and the ML model is trained. My question is: how do I pass input to my trained model?
Example:
Consider a Google image-processing ML model: we pass it an image and it gives the proper output, such as the emotion in that picture.
Now my doubt is how to do something like that. I have finished the k-means model that predicts which mall customers spend more money on products, and I want to call it, i.e. pass input to my trained model.
I am using Python and scikit-learn.
What you want here is an API to which you can send requests/inputs and get back responses/predictions.
You can create a Flask server, save your trained model as a pickle file, and load it when making predictions. This takes a bit of work.
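The core of it, independent of Flask, is just persisting the fitted estimator and calling predict() on new rows (a sketch; the file name and the [annual income, spending score] feature layout are assumptions based on the usual mall-customers dataset):

```python
# Save the trained KMeans model once, then reuse it for new inputs.
import joblib

# --- right after training ---
# kmeans = KMeans(n_clusters=5).fit(X_train)
# joblib.dump(kmeans, "kmeans_mall.joblib")

# --- at prediction time ---
kmeans = joblib.load("kmeans_mall.joblib")
new_customer = [[60000, 75]]            # [annual income, spending score]
cluster = kmeans.predict(new_customer)  # apply the same scaling you used for training!
print(cluster)
```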
Please refer to these:
https://towardsdatascience.com/deploying-a-machine-learning-model-as-a-rest-api-4a03b865c166
https://hackernoon.com/deploy-a-machine-learning-model-using-flask-da580f84e60c
Note: the built-in Flask server is not production-ready. You might want to look into uWSGI + nginx.
In case you are using Docker, https://hub.docker.com/r/tiangolo/uwsgi-nginx-flask/ will be a great help.
Since the question was asked in 2019, many Python libraries now exist that allow users to quickly deploy machine learning models without having to learn Flask, containerization, or web hosting. The best solution depends on factors like how long you need to deploy the model for and whether it needs to be able to handle heavy traffic.
For the use case the user described, it sounds like the gradio library (http://www.gradio.app/) could be helpful: it allows users to soft-deploy models with public links and user interfaces in a few lines of Python code, like below:
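A rough sketch (the gradio API changes a little between versions; the model file and feature names are the same hypothetical ones as above):

```python
# Wrap the trained model in a small web UI with a shareable public link.
import gradio as gr
import joblib

model = joblib.load("kmeans_mall.joblib")  # your trained estimator

def predict(annual_income, spending_score):
    cluster = model.predict([[annual_income, spending_score]])[0]
    return f"Cluster {cluster}"

# share=True gives you a temporary public URL for demoing the model.
gr.Interface(fn=predict, inputs=["number", "number"], outputs="text").launch(share=True)
```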
Let's say all you know is how to train and save a model and want some way of using it in a real app or some way of presenting it to the world.
Here's what you'll need to do:
Create an API (e.g. using Flask, FastAPI, Starlette, etc.) which will serve your model, i.e. it will receive inputs, run your model on them, and send back outputs.
Set up an application server (e.g. uvicorn for FastAPI/Starlette, or gunicorn for Flask) that will host your app and act as the bridge between the host machine and your app.
Deploy the whole thing with a cloud provider (Netlify, GCP, AWS, etc.). This will give you a URL that can be used to call your API; a minimal sketch of the first two steps follows below.
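A minimal sketch of those first two steps with FastAPI + uvicorn (the model path and the input schema are assumptions):

```python
# main.py -- tiny inference API. Run with:  uvicorn main:app --host 0.0.0.0 --port 8000
from typing import List

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # loaded once at startup

class PredictRequest(BaseModel):
    features: List[List[float]]

@app.post("/predict")
def predict(req: PredictRequest):
    # Run the model on the incoming rows and return plain JSON.
    return {"prediction": model.predict(req.features).tolist()}
```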
Then there are other optional things like:
Docker, which lets you package your model, its dependencies and your app together inside a Docker image that can be easily deployed across different platforms. Your app then runs as a Docker container. This solves environment-consistency problems.
Kubernetes, which lets you make sure your app always stays available by spinning up a new container whenever something goes wrong with the one that's running. This addresses availability and scalability.
There are multiple tools that ease or automate parts of this process. You can also check out mia, which lets you do all of the above and also gives your model web app a nice frontend UI. It's a no-code/low-code tool, so you can go from a saved model to a deployed web app and an API endpoint within minutes.
(Edit - Disclaimer: I'm part of the team responsible for building mia)
I am an experienced Python developer starting to work on a web service backend system. The system constantly feeds data from the web into a MySQL database, and this data is later displayed by a frontend (there is no connection between the frontend and the backend). The backend constantly downloads flight information from the web (some of the data is fetched via APIs, and some by downloading and parsing text/xls files). I already have a script that downloads the data, parses it, and inserts it into the MySQL DB, all in one big loop. The frontend is just a bunch of PHP pages that display the data by querying the MySQL server.
It is crucial that this web service be robust, strong and reliable.
Therefore, I have been looking into the proper ways to design it, and came across the following parts to comprise my system:
1) django as a framework (for HTTP connections and for using Piston)
2) Piston as an API provider (this is great because then my front-end can use the API instead of actually running queries)
3) SQLAlchemy as the DB layer (I don't like how little control you get with the Django ORM; I want to be able to run a more complex DB setup)
4) Apache with mod_wsgi to run everything
5) And finally, Celery (or django-cron) to actually run my infinite loop that pulls the data off the web, hopefully in some sort of organized task format. This is the part I am least sure of, and any pointers are appreciated.
This all sounds great. I have used Django before to write websites (i.e. request handlers that return data). However, other than using Celery or django-cron, I can't really see how it fits the role of a constantly data-feeding backend.
I just wanted to run this by you to hear your ideas/comments. Any input you have, or pointers to documentation and/or other libraries, would be greatly appreciated!
If you are going to use SQLAlchemy, I would refrain from using Django: Django is fine if you use the whole stack, but since you are about to rip the models layer out, I don't see much value in it, and I would look at other options (perhaps Pylons or plain old CherryPy would do).
Even more so if the frontends will not run queries but only talk to the API provider.
As for robustness, I prefer starting separate FastCGI processes with supervise and using a more lightweight web server (lighttpd / nginx), but that's a matter of taste.
For the "infinite loop" part, it depends on what behaviour you want: if there is a problem with the source, would you rather just skip that step, or retry it multiple times once the source is back up?
Celery's periodic tasks might be good for the former, while a cron job that just spawns scraping tasks is better for the latter.
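If you go the Celery route, the periodic-task version looks roughly like this (the broker URL, schedule and task body are placeholders; download_and_parse / insert_into_mysql stand in for your existing loop code):

```python
# tasks.py -- sketch of a Celery beat periodic task with retries on failure.
from celery import Celery

from myproject.feeds import download_and_parse, insert_into_mysql  # your existing code

app = Celery("feeder", broker="redis://localhost:6379/0")

app.conf.beat_schedule = {
    "pull-flight-data-every-5-minutes": {
        "task": "tasks.pull_flight_data",
        "schedule": 300.0,  # seconds
    },
}

@app.task(bind=True, max_retries=3)
def pull_flight_data(self):
    try:
        insert_into_mysql(download_and_parse())
    except Exception as exc:
        # Retry later instead of hammering a source that is down.
        raise self.retry(exc=exc, countdown=60)
```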
So I'm trying to do more web development in Python, and I've picked CherryPy, hosted by lighttpd with FastCGI. But my question is a very basic one: why do I need to restart lighttpd (or Apache) every time I change my application code, or the code of an underlying library?
I realize this question stems from a basic (i.e. poor) understanding of the FastCGI model, so I'm open to any schooling here, but I'm used to just changing a PHP file and having the change show up, versus having to bounce the web server.
Any elucidation/useful mockery appreciated.
This is a performance trade-off. For development, autoreloading is helpful, but for production you don't want to autoreload. This is actually a decently sized bottleneck in, say, PHP: every time you access a PHP page, the server has to parse and load it from scratch. With Python, the application is already loaded and running after the first access.
As has been pointed out, CherryPy has an autoreload setting. I'd recommend using CherryPy's built-in server for development and lighttpd for production; that will likely save you some time. The tutorial shows you how to do this.
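For reference, toggling that behaviour is a one-line config change in CherryPy 3+ (the app below is just a placeholder):

```python
# Leave autoreload on while developing; switch it off when serving production
# traffic behind lighttpd.
import cherrypy

class Hello:
    @cherrypy.expose
    def index(self):
        return "Hello, world"

cherrypy.config.update({"engine.autoreload.on": False})  # True (the default) during development

cherrypy.quickstart(Hello())
```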
From a system-software writer's point of view: this all depends on how the metadata about the server process is organized within your daemon (lighttpd or FastCGI). Some programs are designed for one-time-only initialization; mostly this allows a much simpler and better-performing internal programming model.
It is often very hard to make a server process reload configuration data in an easy way: you might have to introduce locks and external event objects (signals on UNIX). When you can synchronize the data structures by design, i.e. by initializing only once, why complicate things by making the data model modifiable multiple times?