Can Flask run on Gunicorn alone?

I'm currently developing an HTTP REST API server using Flask and Gunicorn. For various reasons, it is not possible to put a reverse proxy server in front of Gunicorn. I don't have any static media, and all URLs are served by @app.route handlers in Flask. Can Flask run on Gunicorn alone?

It could, but it is a very bad idea. Gunicorn does not work well without a proxy in front of it that buffers requests and responses for slow clients.
Without buffering, the Gunicorn worker has to wait until the client has sent the whole request, and then wait again until the client has read the whole response.
This can be a serious problem if, for example, there are clients on a slow network.
http://docs.gunicorn.org/en/latest/deploy.html?highlight=buffering
see also: http://blog.etianen.com/blog/2014/01/19/gunicorn-heroku-django/
Because Gunicorn has a relatively small (2x CPU cores) pool of workers, it can only handle a small number of concurrent requests. If all the worker processes become tied up waiting for network traffic, the entire server becomes unresponsive. To the outside world, your web application will cease to exist.
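If a buffering proxy genuinely cannot be deployed, the gunicorn docs suggest an async worker class for this situation, so slow clients do not pin down a sync worker. A minimal sketch of a gunicorn config file (the file name and worker counts are illustrative, and gevent is an extra dependency):

    # gunicorn.conf.py -- a minimal sketch; values are illustrative
    import multiprocessing

    bind = "0.0.0.0:8000"
    workers = multiprocessing.cpu_count() * 2 + 1
    # async workers stop slow clients from monopolizing a sync worker;
    # "gevent" must be installed separately (pip install gevent)
    worker_class = "gevent"

You would then start it with something like gunicorn -c gunicorn.conf.py myapp:app, where myapp:app is your Flask module and application object.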

Related

heroku timeout error H12 when calling API

I'm receiving a Heroku timeout error with code H12 when I'm calling an API from my Flask app. The API usually responds within 2 minutes. I'm calling the API from a different thread so that the main Flask app thread keeps running.
    from concurrent.futures import ThreadPoolExecutor

    with ThreadPoolExecutor(max_workers=5) as executor:
        future = executor.submit(shub_api, website, merchant.id)
        # .result() blocks until the call completes, so the web request
        # still waits the full ~2 minutes
        result = future.result()
There is some documentation on Heroku about running background tasks, but the Python examples use Redis, which I know nothing about. Are there other solutions to this problem?
This is not working because of the way Heroku is architected.
When your web application is deployed to Heroku, it runs on dynos. Dynos are "ephemeral webservers" that only live for a short time, so a user's request will be handled by a dyno that may disappear shortly afterwards.
Heroku dynos are constantly starting, stopping, and being moved around to other physical hosts. This means that web dynos should not be used to run tasks that take a long time to complete (there are different worker dynos for that).
Furthermore, every web request that is served by a Heroku dyno has a 30-second timeout. What this means is that if someone makes an HTTP request to your app on Heroku, your app must start responding to the client within 30 seconds, otherwise, Heroku's routing layer will issue an H12 TIMEOUT error to you because it thinks your app has frozen or gotten stuck in a loop somewhere.
To sum it up: Heroku is designed from the ground up to follow web best practices, which means having your HTTP requests finish quickly (< 30 seconds) and not relying on your web servers being permanent fixtures where you can just run code on them all the time.
What you should do to resolve this issue instead is to use a background worker process (essentially it's just a second type of dyno you can run some code on that will process long-running tasks) and have your web application send a notification to your worker process to start running your task code.
This is typically done via a message queue like Redis, AWS SQS, etc. This Heroku article explains the concept in more detail.
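A minimal sketch of that pattern using RQ, one of the queue libraries the Heroku docs cover (the route and arguments are illustrative; shub_api is the slow call from the question, assumed to live in a module the worker dyno can also import):

    # web dyno: enqueue the slow call and answer immediately
    from flask import Flask, jsonify
    from redis import Redis
    from rq import Queue

    from tasks import shub_api  # slow API call, importable by the worker too

    app = Flask(__name__)
    q = Queue(connection=Redis())  # point this at your Redis add-on

    @app.route("/merchants/<int:merchant_id>/check", methods=["POST"])
    def start_job(merchant_id):
        # runs on a worker dyno instead of inside this web request
        job = q.enqueue(shub_api, "https://example.com", merchant_id)
        return jsonify({"job_id": job.get_id()}), 202  # client polls with this

The worker dyno then just runs rq worker against the same Redis instance, as described in the linked Heroku article.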

How to interpose RabbitMQ between REST client and (Python) REST server?

Suppose I develop a REST service hosted in Apache with a Python plugin that services GET, PUT, DELETE and PATCH, and this service is consumed by an Angular client (or other REST-interacting browser technology). How do I make it scalable with RabbitMQ (AMQP)?
Potential Solution #1
Multiple Apache instances still face off against the browser's HTTP calls.
Each Apache instance uses an AMQP plugin and posts a message to a queue.
Python microservices monitor the queue, pull a message, service it, and return a response.
The response is passed back to the Apache plugin, and Apache in turn generates the HTTP response.
Does this mean the Python microservice no longer has any HTTP server code at all? That would change the component a lot, so it seems best to decide upfront whether you want this pattern; ripping out the HTTP server code afterwards would be a real task.
Are there other potential solutions? I am genuinely puzzled as to how we're supposed to take a classic REST server component and make it scalable with RabbitMQ/AMQP with minimal disruption.
I would recommend switching from WSGI to ASGI (nginx can help here). I'm not sure why you think RabbitMQ is the solution to your problem, as nothing you described would be solved by that approach.
ASGI is not supported by Apache as far as I know, but it allows the server to go off and do work while continuing to service new requests that come in (a gross oversimplification).
If for whatever reason you really do want job workers (RabbitMQ, etc.), then I would suggest returning a "token" to the user (really just the job_id); the client then calls back with that token, and you report either the current job status or the result.
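A sketch of that token pattern with RabbitMQ via the pika client (the queue name, route, and Flask app are assumptions for illustration; the worker that consumes the queue and stores results is omitted):

    import json
    import uuid

    import pika
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/jobs", methods=["POST"])
    def submit_job():
        job_id = str(uuid.uuid4())  # the "token" handed back to the client
        conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        channel = conn.channel()
        channel.queue_declare(queue="jobs", durable=True)
        channel.basic_publish(exchange="", routing_key="jobs",
                              body=json.dumps({"job_id": job_id}))
        conn.close()
        # a separate worker consumes "jobs" and stores results under job_id;
        # the client polls e.g. GET /jobs/<job_id> for status or the result
        return jsonify({"job_id": job_id}), 202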

Regarding the GIL in Python

I know the GIL blocks Python from running its threads across multiple cores. If it does so, why is Python used in web servers, and how do companies like YouTube and Instagram handle it?
PS: I know alternatives like multiprocessing can solve it, but it would be great if someone could describe a scenario they actually handled with one of them.
Python is used for the server-side handling in web servers, but not (usually) as the web server itself.
In a normal setup we have Apache or another web server managing many processes on the server side (Python usually speaks WSGI). Note that Apache usually serves "static" files directly. So we have one Apache server, many parallel Apache processes (to handle connections and basic HTTP), and many Python processes, each handling one connection at a time.
Each of these processes is independent of the others (they just share the same resources), so you can program your server-side part easily, without worrying about deadlocks. It is mostly a trade-off: raw code performance against code that is easy and quick to produce without huge problems. But web servers with Python usually scale very well (also on large sites), and servers are cheaper than programmers.
Note: security is also increased by having just one request per process.
The GIL exists in CPython (the interpreter written in C and by far the most used); other implementations such as Jython or IronPython don't have this problem, because they have no GIL.
Even so, you can still get concurrency with CPython: do the heavy work in C and then "link it" into your Python code, just as NumPy and similar libraries do, since C extensions can release the GIL.
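To make the GIL's effect concrete, a small sketch (worker counts and numbers are illustrative): the same CPU-bound function gains nothing from threads under the GIL, but does gain from processes, which each have their own interpreter and GIL:

    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def cpu_bound(n):
        # pure-Python arithmetic holds the GIL the whole time
        return sum(i * i for i in range(n))

    def timed(executor_cls):
        start = time.perf_counter()
        with executor_cls(max_workers=4) as ex:
            list(ex.map(cpu_bound, [10_000_000] * 4))
        return time.perf_counter() - start

    if __name__ == "__main__":
        print("threads:  ", timed(ThreadPoolExecutor))   # serialized by the GIL
        print("processes:", timed(ProcessPoolExecutor))  # runs truly in parallel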
Another thing is that even though your site uses Flask or Django, when you set it up on a production server you have Apache or Nginx, etc. acting as a real load balancer that can serve the page to many people at the same time.
Take it from the Flask docs (link):
Flask’s built-in server is not suitable for production as it doesn’t scale well and by default serves only one request at a time.
[...]
If you want to deploy your Flask application to a WSGI server not listed here, look up the server documentation about how to use a WSGI app with it. Just remember that your Flask application object is the actual WSGI application.
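That last sentence can be demonstrated directly: the Flask application object implements the WSGI callable interface, so any WSGI server can host it. A quick sketch using the standard library's reference server (fine for experiments, not for production):

    from wsgiref.simple_server import make_server
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def index():
        return "served through plain WSGI"

    # the Flask object itself is the WSGI application
    make_server("127.0.0.1", 8000, app).serve_forever()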
Although a bit late, I will try to give a generic and useful answer.
@Giacomo Catenazzi's answer is a good one, but part of it is factually incorrect.
API requests (or other forms of web request) are served by an already running process. The creation of these "already running" processes is handled by a WSGI server like gunicorn, which on startup creates the specified number of processes running your web application's code, continuously waiting to serve incoming requests.
Needless to say, each of these processes is limited by the GIL to running only one thread at a time. But one process handles more than one (normally many) requests over its lifetime. Here it helps to understand the flow of a request.
We will take Flask as an example, but this applies to most web frameworks. When a request comes in from Nginx, it is handed over to gunicorn, which interacts with your web application via WSGI. When the request reaches the framework, an app context is created and some variables are pushed into it. Then the request follows the path most people are familiar with: routing, DB calls, response creation, and so on. The response is handed back to gunicorn via WSGI, and at that point the app context is torn down. So it's the app context, not the process, that is created on every new request.
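A small sketch of that per-request lifecycle (sqlite3 stands in for whatever per-request resource you attach): state hung on the context object g is created and torn down around each request, while the process itself persists:

    import sqlite3
    from flask import Flask, g

    app = Flask(__name__)

    @app.before_request
    def attach_db():
        # a per-request resource, created when the app context is set up
        g.db = sqlite3.connect("example.db")

    @app.teardown_appcontext
    def release_db(exc):
        # runs as the app context is torn down after every request
        db = g.pop("db", None)
        if db is not None:
            db.close()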
Also, I have only talked about gunicorn's sync worker here, but it also offers async workers that can handle multiple requests concurrently via coroutines. But that's a separate topic.
So answering your question:
Nginx (Capable of handling multiple requests at a time)
Gunicorn creates a pool of n processes at startup and also manages the pool, in the sense that if a process exits or gets stuck, it kills/recreates it and adds it back to the pool.
Each process handles one request at a time.
Read more about gunicorn's design and how it can be used to help you achieve your requirements. This is a good thread about understanding gunicorn with Flask, and this is a great resource for understanding the Flask app context.

How to pass long-polling requests or WebSocket requests to a Heroku worker dyno?

I have a web dyno that handles all my web requests. If a long-polling request comes in, how do you pass that connection off to a worker dyno?
E.g., a request to example.com/long-polling handled by the web dyno needs to be passed to a worker dyno to free up the web dyno for more requests.
Thanks.
You don't exactly pass the request on. It sounds like you want a queueing library such as RQ or Celery: you queue up the long-running work, and on the client side you poll for updates.
https://devcenter.heroku.com/articles/python-rq
https://github.com/celery/celery/
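To sketch the polling half of that pattern with RQ (the route and field names are illustrative, and this assumes jobs were enqueued as in the RQ docs linked above):

    from flask import Flask, jsonify
    from redis import Redis
    from rq.job import Job

    app = Flask(__name__)

    @app.route("/status/<job_id>")
    def job_status(job_id):
        # look up the queued job by the id returned at enqueue time
        job = Job.fetch(job_id, connection=Redis())
        if job.is_finished:
            # newer RQ versions expose this as job.return_value()
            return jsonify({"status": "done", "result": job.result})
        return jsonify({"status": "in progress"}), 202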

Differentiate nginx, haproxy, varnish and uWSGI/Gunicorn [closed]

I am really new to sysadmin stuff, and have only provisioned a VPS with nginx (serving the static files) and gunicorn as the web server.
I have lately been reading about other tools and came across these:
nginx : high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server
haproxy : high performance load balancer
varnish : caching HTTP reverse proxy
gunicorn : python WSGI http server
uwsgi : another python WSGI server
I have been reading about all five tools above and have confused myself as to which one is used for what purpose. Could someone explain in layman's terms what each tool is used for, how they fit together, and which specific concern each one addresses?
Let's say you plan to host a few websites on your new VPS. Let's look at the tools you might need for each site.
HTTP Servers
Website 'Alpha' consists of just some pure HTML, CSS and JavaScript. The content is static.
When someone visits website Alpha, their browser will issue an HTTP request. You have configured (via DNS and name server configuration) that request to be directed to the IP address of your VPS. Now you need your VPS to be able to accept that HTTP request, decide what to do with it, and issue a response that the visitor's browser can understand. You need an HTTP server, such as Apache httpd or NGINX, and let's say you do some research and eventually decide on NGINX.
Application Servers
Website 'Beta' is dynamic, written using the Django Web Framework.
WSGI is a protocol that describes the interface between a Python application (the Django app) and an application server. So what you need now is a WSGI app server, which will be able to understand web requests, make appropriate 'calls' to the application's various objects, and return the results. You have many options here, including gunicorn and uWSGI. Let's say you do some research and eventually decide on uWSGI.
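For reference, the application side of the WSGI protocol is just a single callable; a minimal sketch (the server invocations in the comments are examples, not required commands):

    # the smallest possible WSGI application
    def application(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello from WSGI\n"]

    # an app server is then pointed at this callable, e.g.:
    #   uwsgi --http :8000 --wsgi-file app.py
    #   gunicorn app:application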
uWSGI can accept and handle HTTP requests for static content as well, so if you wanted to, you could have website Alpha served entirely by NGINX and website Beta served entirely by uWSGI. And that would be that.
Reverse Proxy Servers
But uWSGI performs poorly when serving static content, so you would rather use NGINX for static content like images, even on website Beta. But then something would have to distinguish between the two kinds of requests and send each to the right place. Is that possible?
It turns out NGINX is not just an HTTP server but also a reverse proxy server: it is capable of redirecting incoming requests to another place, like your uWSGI application server, or many other places, collecting the response(s) and sending them back to the original requester. Awesome! So you configure all incoming requests to go to NGINX, which will serve up static content or, when required, redirect it to the app server.
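A sketch of that split as NGINX site configuration (the domain, paths, and port are assumptions for illustration):

    # serve static files from disk; proxy everything else to uWSGI
    server {
        listen 80;
        server_name beta.example.com;

        location /static/ {
            alias /var/www/beta/static/;
        }

        location / {
            include uwsgi_params;
            uwsgi_pass 127.0.0.1:3031;  # the uWSGI application server
        }
    }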
Load Balancing with multiple web servers
You are also hosting Website Gamma, which is a blog that is popular internationally and receives a ton of traffic.
For Gamma you decide to set up multiple web servers. All incoming requests go to your original VPS with NGINX, and you configure NGINX to redirect each request to one of several other web servers in round-robin fashion, returning the response to the original requester.
HAProxy is a server that specializes in balancing loads for high-traffic sites. In this case, you were able to use NGINX to handle traffic for site Gamma. In other scenarios, one may choose to set up a high-availability cluster: e.g., send all requests to a server like HAProxy, which intelligently redirects traffic to a cluster of NGINX servers similar to your original VPS.
Cache Server
Website Gamma exceeded the capacity of your VPS due to the sheer volume of traffic. Let's say you instead hosted website Delta, and the reason your web server is unable to handle Delta is due to a popular feature that is very content-heavy.
A cache server is able to understand what media content is being frequently requested and store this content differently, such that it can be served more quickly. This is achieved by reducing disk IO operations; the popular content can be stored in memory or virtual memory instead. You might decide to combine your existing NGINX stack with a technology like Varnish or Memcached to achieve this type of optimization and serve website Delta more effectively.
I will put a very concise (very informal) description for each one, in the order they would be hit when you make a request from your web browser:
HAProxy balances your traffic load. If your webpage is receiving 5000 hits per second, you can't handle that with only one web server, so HAProxy balances the hits among the web servers behind it.
Varnish is a cache server: it sits in front of your web servers and behind HAProxy, so if a resource is already cached, Varnish serves the request itself instead of passing it to the web servers behind it.
nginx, gunicorn and uWSGI are web servers that would be behind Varnish and get the requests that Varnish lets through. These web servers use optimized designs to handle high loads (requests per second).
First, gunicorn and uWSGI are both app servers. In other words, they are responsible for running your Python code in a stable and performant manner, usually as a backend to a regular web server.
The web server would be nginx; it excels at serving static assets and passing requests for dynamic content on to the app servers.
If the above doesn't give enough performance, you add Varnish between nginx and the client; it should speed up repeated requests for the same thing.
HAProxy is a load balancer: if you have several servers for the same content, this software will attempt to distribute requests among them optimally.
so basically:
your Python code lives in the app server (uWSGI or gunicorn)
your static web assets live in nginx
haproxy and varnish are software that allow you to better serve very large numbers of requests
