How many concurrent requests does a single Flask process receive?

I'm building an app with Flask, but I don't know much about WSGI or its HTTP base, Werkzeug. When I start serving a Flask application with gunicorn and 4 worker processes, does this mean that I can handle 4 concurrent requests?
I do mean concurrent requests, and not requests per second or anything else.

When running the development server, which is what you get by calling app.run(), you get a single synchronous process, which means at most 1 request is being processed at a time.
By sticking Gunicorn in front of it in its default configuration and simply increasing the number of --workers, what you get is essentially a number of processes (managed by Gunicorn) that each behave like the app.run() development server. 4 workers == 4 concurrent requests. This is because Gunicorn uses its included sync worker type by default.
It is important to note that Gunicorn also includes asynchronous workers, namely eventlet and gevent (and also tornado, but that's best used with the Tornado framework, it seems). By specifying one of these async workers with the --worker-class flag, what you get is Gunicorn managing a number of async processes, each of which manages its own concurrency. These processes don't use threads, but instead coroutines. Basically, within each process, still only 1 thing can be happening at a time (1 thread), but a coroutine can be 'paused' while it waits on external operations to finish (think database queries or network I/O).
This means, if you're using one of Gunicorn's async workers, each worker can handle many more than a single request at a time. Just how much concurrency is best depends on the nature of your app, its environment, the hardware it runs on, etc. More details can be found on Gunicorn's design page and in the notes on how gevent works on its intro page.
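For concreteness, here is a minimal gunicorn.conf.py sketch of the two setups described above; the module path myapp:app, the addresses, and the numbers are illustrative assumptions, not recommendations:

# gunicorn.conf.py -- started with e.g.: gunicorn -c gunicorn.conf.py myapp:app
bind = "0.0.0.0:8000"

# Default sync workers: concurrency == number of worker processes (4 here).
workers = 4
# worker_class = "sync"          # the default, shown for clarity

# Or switch to an async worker: each process then multiplexes many requests
# via coroutines (requires the gevent package to be installed).
# worker_class = "gevent"
# worker_connections = 1000      # max simultaneous clients per async worker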

Currently there is a far simpler solution than the ones already provided. When running your application you just have to pass along the threaded=True parameter to the app.run() call, like:
app.run(host="your.host", port=4321, threaded=True)
Another option, as per what we can see in the werkzeug docs, is to use the processes parameter, which takes a number greater than 1 indicating the maximum number of concurrent processes used to handle requests:
threaded – should the process handle each request in a separate thread?
processes – if greater than 1 then handle each request in a new process up to this maximum number of concurrent processes.
Something like:
app.run(host="your.host", port=4321, processes=3)  # up to 3 processes
More info on the run() method is here, along with the blog post that led me to find the solution and the API references.
Note: the Flask docs on the run() method indicate that using it in a production environment is discouraged because (quote): "While lightweight and easy to use, Flask’s built-in server is not suitable for production as it doesn’t scale well."
However, they do point to their Deployment Options page for the recommended ways to do this when going for production.

Flask will process one request per thread at a time. If you have 2 processes with 4 threads each, that's 8 concurrent requests.
Flask doesn't spawn or manage threads or processes. That's the responsibility of the WSGI gateway (e.g. gunicorn).
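As a hedged illustration of that arithmetic, a gunicorn.conf.py along these lines would give 2 x 4 = 8 requests in flight at once (the numbers are only an example):

# gunicorn.conf.py -- 2 processes x 4 threads each = 8 concurrent requests
workers = 2
threads = 4   # with threads > 1, gunicorn uses its threaded "gthread" worker class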

No, you can definitely handle more than that.
It's important to remember that deep, deep down, assuming you are running a single-core machine, the CPU really only runs one instruction* at a time.
Namely, the CPU can only execute a very limited set of instructions, and it can't execute more than one instruction per clock tick (many instructions even take more than 1 tick).
Therefore, most concurrency we talk about in computer science is software concurrency.
In other words, there are layers of software implementation that abstract the bottom level CPU from us and make us think we are running code concurrently.
These "things" can be processes, which are units of code that get run concurrently in the sense that each process thinks its running in its own world with its own, non-shared memory.
Another example is threads, which are units of code inside processes that allow concurrency as well.
The reason your 4 worker processes will be able to handle more than 4 requests is that they will fire off threads to handle more and more requests.
The actual request limit depends on the HTTP server chosen, I/O, OS, hardware, network connection, etc.
Good luck!
*Instructions are the very basic commands the CPU can run. Examples: add two numbers, jump from one instruction to another.

Related

How does Waitress handle concurrent tasks?

I'm trying to build a python webserver using Django and Waitress, but I'd like to know how Waitress handles concurrent requests, and when blocking may occur.
While the Waitress documentation mentions that multiple worker threads are available, it doesn't provide a lot of information on how they are implemented and how the python GIL affects them (emphasis my own):
When a channel determines the client has sent at least one full valid HTTP request, it schedules a "task" with a "thread dispatcher". The thread dispatcher maintains a fixed pool of worker threads available to do client work (by default, 4 threads). If a worker thread is available when a task is scheduled, the worker thread runs the task. The task has access to the channel, and can write back to the channel's output buffer. When all worker threads are in use, scheduled tasks will wait in a queue for a worker thread to become available.
There doesn't seem to be much information on Stack Overflow either. From the question "Is Gunicorn's gthread async worker analogous to Waitress?":
Waitress has a master async thread that buffers requests, and enqueues each request to one of its sync worker threads when the request I/O is finished.
These statements don't address the GIL (at least from my understanding) and it'd be great if someone could elaborate more on how worker threads work for Waitress. Thanks!
Here's how event-driven asynchronous servers generally work:
Start a process and listen for incoming requests. Utilizing the event notification API of the operating system makes it very easy to serve thousands of clients from a single thread/process.
Since there's only one process managing all the connections, you don't want to perform any slow (or blocking) tasks in this process, because that would block the program for every client.
To perform blocking tasks, the server delegates the tasks to "workers". Workers can be threads (running in the same process) or separate processes (or subprocesses). Now the main process can keep on serving clients while workers perform the blocking tasks.
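Here is a toy sketch of that general shape: a single selector loop accepts and reads, and hands finished reads to a small thread pool. This is not Waitress's actual implementation, and it naively assumes each request arrives in a single recv(); handle_request and the numbers are illustrative.

import selectors
import socket
from concurrent.futures import ThreadPoolExecutor

sel = selectors.DefaultSelector()
workers = ThreadPoolExecutor(max_workers=4)   # Waitress's default pool size is also 4

def handle_request(conn, request_bytes):
    # Blocking application work runs here, on a worker thread.
    conn.setblocking(True)
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    conn.close()

listener = socket.socket()
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 8080))
listener.listen()
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ)

while True:   # the single event-loop thread: accepts and reads, never runs app code
    for key, _mask in sel.select():
        if key.fileobj is listener:
            conn, _addr = listener.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            conn = key.fileobj
            data = conn.recv(65536)
            sel.unregister(conn)
            if data:
                workers.submit(handle_request, conn, data)   # hand off to the pool
            else:
                conn.close()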
How does Waitress handle concurrent tasks?
Pretty much the same way I just described above. And for workers it creates threads, not processes.
how the python GIL affects them
Waitress uses threads for workers. So, yes, they are affected by the GIL in that they aren't truly parallel, though they seem to be. "Asynchronous" is the correct term.
Threads in Python run inside a single process and, because of the GIL, don't execute Python code in parallel. A thread acquires the GIL for a very small amount of time and executes its code, and then the GIL is acquired by another thread.
But since the GIL is released on network I/O, the main thread can always acquire the GIL whenever there's a network event (such as an incoming request), and this way you can stay assured that the GIL will not affect network-bound operations (like receiving requests or sending responses).
On the other hand, Python processes can actually run in parallel on multiple cores. But Waitress doesn't use processes.
Should you be worried?
If you're just doing small blocking tasks like database reads/writes and serving only a few hundred users per second, then using threads isn't really that bad.
For serving a large volume of users or doing long running blocking tasks, you can look into using external task queues like Celery. This will be much better than spawning and managing processes yourself.
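A minimal Celery sketch of that suggestion, assuming a Redis broker on its default local port; all names here are illustrative:

from celery import Celery

celery_app = Celery("tasks", broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/0")

@celery_app.task
def long_job(payload):
    ...   # the long-running blocking work goes here

# In a request handler: enqueue the work and return immediately.
result = long_job.delay({"user_id": 42})
# result.get(timeout=30)   # only if the caller really has to wait for the outcome

The workers are then started as a separate process group (with the celery worker command), which keeps the long-running work out of the web server's threads entirely.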
Hint: Those were my comments to the accepted answer and the conversation below, moved to a separate answer for space reasons.
Wait... The 5th request will stay in the queue until one of the 4 threads is done with its previous request and has therefore gone back to the pool. One thread will only ever serve one request at a time. "I/O bound" tasks only help in that a thread waiting for I/O will implicitly (e.g. by calling time.sleep) tell the scheduler (Python's internal one) that it can pass the GIL along to another thread, since there's currently nothing for it to do, so the others get more CPU time for their stuff. On the thread level this is fully sequential, which is still concurrent and asynchronous on the process level, just not parallel. Just to get some wording straight.
Also, Python threads are "standard" OS threads (like those in C). So they will use all CPU cores and make full use of them. The only thing restricting them is that they need to hold the GIL when calling Python C-API functions, because the whole API in general is not thread-safe. On the other hand, calls to non-Python functions, i.e. functions in C extensions like numpy for example, but also many database APIs, including anything loaded via ctypes, do not hold the GIL while running. Why should they? They are running external C binaries which don't know anything about the Python interpreter running in the parent process. Therefore, such tasks will run truly in parallel when called from a WSGI app hosted by Waitress. And if you've got more cores available, turn the thread count up to that amount (the threads=X kwarg on waitress.create_server).
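For reference, a minimal example of setting that thread count via Waitress's serve() helper; the app import and the numbers are assumptions:

from waitress import serve
from myproject import app   # hypothetical WSGI/Flask app object

serve(app, host="0.0.0.0", port=8080, threads=8)   # the default is threads=4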

Is every flask request run on a separate thread? [duplicate]

What exactly does passing threaded = True to app.run() do?
My application processes input from the user, and takes a bit of time to do so. During this time, the application is unable to handle other requests. I have tested my application with threaded=True and it allows me to handle multiple requests concurrently.
As of Flask 1.0, the WSGI server included with Flask is run in threaded mode by default.
Prior to 1.0, or if you disable threading, the server is run in single-threaded mode, and can only handle one request at a time. Any parallel requests will have to wait until they can be handled, which can lead to issues if you try to contact your own server from within a request.
With threaded=True requests are each handled in a new thread. How many threads your server can handle concurrently depends entirely on your OS and what limits it sets on the number of threads per process. The implementation uses the SocketServer.ThreadingMixIn class, which sets no limits to the number of threads it can spin up.
Note that the Flask server is designed for development only. It is not a production-ready server. Don't rely on it to run your site on the wider web. Use a proper WSGI server (like gunicorn or uWSGI) instead.
How many requests will my application be able to handle concurrently with this statement?
This depends drastically on your application. Each new request has a thread launched for it, so it depends on how many threads your machine can handle. I don't see an option to limit the number of threads (like uWSGI offers in a production deployment).
What are the downsides to using this? If i'm not expecting more than a few requests concurrently, can I just continue to use this?
Switching from a single thread to multi-threaded can lead to concurrency bugs... if you use this, be careful about how you handle global objects (see the g object in the documentation!) and state.
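A minimal sketch of the kind of shared-state hazard that warning is about; the counter here is an illustrative global, not anything Flask provides:

import threading
from flask import Flask

app = Flask(__name__)
counter = 0
counter_lock = threading.Lock()

@app.route("/hit")
def hit():
    global counter
    with counter_lock:     # without the lock, threads can interleave the
        counter += 1       # read-modify-write and silently lose updates
        current = counter
    return str(current)

if __name__ == "__main__":
    app.run(threaded=True)   # threaded is the default as of Flask 1.0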

ThreadPoolExecutor on long running process

I want to use a ThreadPoolExecutor in a web app (Django).
All examples that I saw are using the thread pool like that:
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=1) as executor:
    ...   # submit work here
I tried to store the thread pool as a class member and to use the map function, but I got a memory leak; the only way I could use it is with the with notation.
So I have 2 questions:
Each time I run with ThreadPoolExecutor, does it create the threads again and then release them? In other words, is this operation expensive?
If I avoid using with, how can I release the memory of the threads?
Thanks
Normally, web applications are stateless. That means every object you create should live in a request and die at the end of the request. That includes your ThreadPoolExecutor. Having an executor at the application level may work, but it will be embedded into your web application instead of running as a separate group of processes.
So if you want to take the workers down or restart them, your web app will have to restart as well.
And there will be stability concerns, since there is no main process watching over the child processes and detecting which one has gone stale, so it requires a lot of code to get multiprocessing right.
Alternatively, if you want a persistent group of processes to listen to a job queue and run your tasks, there are several projects that do that for you. All you need to do is to set up a server that takes care of queueing and locking, such as Redis or RabbitMQ, then point your project at that server and start the workers. Some projects even let you use the database as a job queue backend.
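To make the executor-at-application-level option concrete, here is a minimal sketch of a pool created once at module level and shut down explicitly; names are illustrative. (With the with block, by contrast, a fresh pool and fresh threads are created and torn down on every use, which answers the first question.)

import atexit
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)   # threads are created lazily and reused

def process(item):
    ...   # the actual work

def enqueue_jobs(items):
    # Reusing the module-level pool avoids paying thread startup on every request.
    return [executor.submit(process, item) for item in items]

atexit.register(executor.shutdown)   # release the worker threads when the process exits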


how to process long-running requests in python workers?

I have a python (well, it's php now but we're rewriting) function that takes some parameters (A and B) and computes some results (finds the best path from A to B in a graph; the graph is read-only). In a typical scenario one call takes 0.1s to 0.9s to complete. This function is accessed by users as a simple REST web service (GET bestpath.php?from=A&to=B). The current implementation is quite stupid: it's a simple php script + apache + mod_php + APC, and every request needs to load all the data (over 12MB in php arrays), create all the structures, compute a path and exit. I want to change it.
I want a setup with N independent workers (X per server with Y servers), where each worker is a python app running in a loop (get request -> process -> send reply -> get request...), and each worker can process one request at a time. I need something that will act as a frontend: get requests from users, manage a queue of requests (with a configurable timeout) and feed my workers one request at a time.
How should I approach this? Can you propose some setup? nginx + fcgi or wsgi or something else? haproxy? As you can see, I'm a newbie in python, reverse proxies, etc. I just need a starting point for the architecture (and data flow).
By the way, the workers use read-only data, so there is no need to maintain locking and communication between them.
The typical way to handle this sort of arrangement using threads in Python is to use the standard library module Queue. An example of using the Queue module for managing workers can be found here: Queue Example
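A minimal sketch of that Queue worker pattern (queue.Queue in Python 3); find_best_path is an assumed placeholder for the graph computation:

import queue
import threading

jobs = queue.Queue()

def find_best_path(src, dst):
    ...   # the 0.1-0.9s graph computation

def worker():
    while True:
        src, dst, reply = jobs.get()
        try:
            reply.put(find_best_path(src, dst))
        finally:
            jobs.task_done()

for _ in range(4):   # N workers, each handling one request at a time
    threading.Thread(target=worker, daemon=True).start()

# The frontend enqueues a request and waits (with a timeout) for the answer:
reply_box = queue.Queue(maxsize=1)
jobs.put(("A", "B", reply_box))
print(reply_box.get(timeout=5))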
Looks like you need the "workers" to be separate processes (at least some of them, and therefore you might as well make them all separate processes rather than bunches of threads divided into several processes). The multiprocessing module in Python 2.6 and later's standard library offers good facilities to spawn a pool of processes and communicate with them via FIFO "queues"; if for some reason you're stuck with Python 2.5 or even earlier, there are versions of multiprocessing on PyPI that you can download and use with those older versions of Python.
The "frontend" can and should be pretty easily made to run with WSGI (with either Apache or Nginx), and it can deal with all communications to/from worker processes via multiprocessing, without the need to use HTTP, proxying, etc, for that part of the system; only the frontend would be a web app per se, the workers just receive, process and respond to units of work as requested by the frontend. This seems the soundest, simplest architecture to me.
There are other distributed processing approaches available in third party packages for Python, but multiprocessing is quite decent and has the advantage of being part of the standard library, so, absent other peculiar restrictions or constraints, multiprocessing is what I'd suggest you go for.
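A minimal multiprocessing sketch of that arrangement; find_best_path and the pool size are illustrative assumptions:

from multiprocessing import Pool

def find_best_path(pair):
    src, dst = pair
    ...   # read-only graph lookup, 0.1-0.9s
    return "%s->%s" % (src, dst)

if __name__ == "__main__":
    pool = Pool(processes=4)   # each worker process loads the graph once
    result = pool.apply_async(find_best_path, (("A", "B"),))
    print(result.get(timeout=5))   # the frontend waits, with a timeout
    pool.close()
    pool.join()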
There are many FastCGI modules with a preforked mode and a WSGI interface for Python around; the best known is flup. My personal preference for such a task is superfcgi with nginx. Both will launch several processes and dispatch requests to them. 12MB is not that much to load separately in each process, but if you'd like to share data among workers you need threads, not processes. Note that heavy math in Python with a single process and many threads won't use several CPUs/cores efficiently due to the GIL. Probably the best approach is to use several processes (as many as you have cores), each running several threads (the default mode in superfcgi).
The simplest solution in this case is to use the webserver to do all the heavy lifting. Why should you handle threads and/or processes when the webserver will do all that for you?
The standard arrangement in deployments of Python is:
1. The webserver starts a number of processes, each running a complete Python interpreter and loading all your data into memory.
2. An HTTP request comes in and gets dispatched off to some process.
3. The process does your calculation and returns the result directly to the webserver and user.
4. When you need to change your code or the graph data, you restart the webserver and go back to step 1.
This is the architecture used by Django and other popular web frameworks.
I think you can configure mod_wsgi/Apache so it will have several "hot" Python interpreters in separate processes ready to go at all times, and also reuse them for new accesses (and spawn a new one if they are all busy). In this case you could load all the preprocessed data as module globals, and they would only get loaded once per process and get reused for each new access. In fact I'm not sure this isn't the default configuration for mod_wsgi/Apache.
The main problem here is that you might end up consuming a lot of "core" memory (but that may not be a problem either). I think you can also configure mod_wsgi for single process/multiple threads, but in that case you may only be using one CPU because of the Python Global Interpreter Lock (the infamous GIL), I think.
Don't be afraid to ask at the mod_wsgi mailing list; they are very responsive and friendly.
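A minimal WSGI sketch of that "module globals loaded once per process" idea; load_graph and find_best_path are assumed helpers, not part of any library:

from urllib.parse import parse_qs

def load_graph():
    ...   # parse the ~12MB dataset; this runs once per process, at import time

def find_best_path(graph, src, dst):
    ...   # the 0.1-0.9s computation

GRAPH = load_graph()   # module global, reused by every request this process serves

def application(environ, start_response):
    qs = parse_qs(environ.get("QUERY_STRING", ""))
    src = qs.get("from", ["?"])[0]
    dst = qs.get("to", ["?"])[0]
    body = str(find_best_path(GRAPH, src, dst)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]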
You could use an nginx load balancer to proxy to PythonPaste's paster (which serves WSGI, for example for Pylons), which launches each request in a separate thread anyway.
Another option is a queue table in the database.
The worker processes run in a loop or off cron and poll the queue table for new jobs.
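A rough sketch of that queue-table idea using sqlite3 from the standard library; the table layout and column names are made up for illustration:

import sqlite3
import time

db = sqlite3.connect("jobs.db", isolation_level=None)
db.execute("CREATE TABLE IF NOT EXISTS jobs ("
           "id INTEGER PRIMARY KEY, src TEXT, dst TEXT, "
           "status TEXT DEFAULT 'new', result TEXT)")

def process(src, dst):
    ...   # the path computation

while True:   # each worker process polls for new jobs in a loop
    row = db.execute("SELECT id, src, dst FROM jobs WHERE status = 'new' "
                     "ORDER BY id LIMIT 1").fetchone()
    if row is None:
        time.sleep(1)
        continue
    job_id, src, dst = row
    db.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (job_id,))
    db.execute("UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
               (str(process(src, dst)), job_id))

A real multi-worker setup would also need to claim jobs atomically (for example an UPDATE guarded by WHERE status = 'new') so two workers don't pick up the same row.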
