Let's say that we have a rather typical Django web application:
there is an Nginx in front of the app doing proxy stuff and serving static content
there is gunicorn starting workers to handle Django requests
there is a Django-based web app doing all kinds of fun stuff
there is a Redis server for sessions/cache
there is a MySQL database serving queries from Django
Some URLs are basically just a rendered Django template with almost no queries, and some pages incorporate some info from Redis. But there are a few pages that run rather involved database queries, which can (after all possible optimizations) take several seconds to execute on the MySQL side.
And here is my problem: each time a gunicorn worker gets a request for such a heavy URL, it no longer serves other requests for a while - it just sits there idle, waiting for the database to reply. If there are enough such queries, eventually all workers sit idle waiting on the heavy URLs, leaving none to serve the other, faster pages.
Is there a way to either allow a worker to do other work while it is waiting on a database reply? Or to somehow scale up the worker pool in such a situation (preferably without also scaling RAM usage and database connection count :))? At the very least, is there a way to get statistics on how many workers in a gunicorn pool are busy and for how long each of them has been processing a request?
A simple approach that might work in your case would be to increase the number of workers. The recommended number of workers is 2-4 x {NUM CPUS}. Depending on the load and the type of requests to the site, this might be enough.
If increasing the number of workers isn't enough, the next step would be to look into using async workers (see the Gunicorn documentation on worker types; more detailed configuration options are described in the settings documentation). Note that depending on which type of async worker you choose, you will have to install either eventlet, gevent or tornado.
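For illustration, here is a minimal gunicorn.conf.py sketch of what switching to gevent workers might look like - the module path myproject.wsgi and the numbers are placeholders, not values from your setup:

# gunicorn.conf.py - illustrative sketch; adjust the numbers for your hardware
# run with: gunicorn -c gunicorn.conf.py myproject.wsgi:application
import multiprocessing

workers = multiprocessing.cpu_count() * 2 + 1   # a common starting point
worker_class = "gevent"       # requires: pip install gevent
worker_connections = 1000     # simultaneous connections each gevent worker may hold
timeout = 120                 # give the slow MySQL pages room to finish

With an async worker class, a worker waiting on a slow MySQL reply can yield to other requests instead of sitting idle. One caveat: gevent's monkey patching only makes pure-Python I/O cooperative, so this helps with a driver like PyMySQL but not with a C driver such as mysqlclient.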
Related
Running django-plotly-dash, I have multiple Python pages. The issue is that when one of the pages is loading and running some calculations, I cannot load the same page or other pages from a different session, and the web server does not respond for other users. If I look at the runserver output, it is busy rendering the first request only.
If I look at the runserver output, it is busy rendering the first request only.
If I understand correctly, this means you use Django's development server, and thus that you are in development (if you use django-admin runserver in production, that's a serious issue).
Now’s a good time to note: don’t use this server in anything resembling a production environment. It’s intended only for use while developing. (We’re in the business of making web frameworks, not web servers.)
Django's development server is supposed to be multithreaded and support concurrent requests. However, in my experience I have also noticed that it can handle only one request at a time. I didn't dig too much into it, but I assume it might be caused by an app that overrides the runserver command and disables multithreading.
In development this shouldn't be too much of an issue. And in production you won't suffer this kind of blocking, as real WSGI servers such as gunicorn will be able to handle several concurrent requests (provided it is configured to use the available resources correctly, and the hardware is able to handle the load).
However, if your pages are actually slow to respond, this can be an issue for the user loading the page, and will also require more resources to handle more concurrent requests. It all depends on whether "slow" means 2 seconds, 5 seconds, 30 seconds or even more. Reducing the response time will depend a lot on where the bottleneck in your code is and could include:
Optimizing the algorithms
Reducing and optimizing SQL queries (See Database access optimization)
Offloading to Celery the calculations that do not affect the response (a minimal sketch follows this list)
Using websockets to stream and display the data as it gets calculated, without blocking the client until the whole page is computed. (See django-channels)
Using asyncio to avoid staying idle while waiting for I/O operations.
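For the Celery option above, a minimal sketch of what offloading a slow calculation could look like - the task, module and field names are made up for illustration, not taken from your project:

# tasks.py - hypothetical task that runs the heavy calculation off the request path
from celery import shared_task

@shared_task
def recompute_report(report_id):
    # the slow work happens in a Celery worker, not in the web process
    ...

# views.py - the view only enqueues the work and responds immediately
from django.http import JsonResponse
from .tasks import recompute_report

def report_view(request, report_id):
    recompute_report.delay(report_id)   # returns as soon as the task is queued
    return JsonResponse({"status": "queued"})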
I am working on a Python web app that uses Celery to schedule and execute user job requests.
Most of the time the requests submitted by a user can't be resolved immediately and thus it makes sense to me to schedule them in a queue.
However, now that I have the whole queuing architecture in place, I'm confused about whether I should delegate all the request processing logic to the queue/workers or if I should leave some of the work to the webserver itself.
For example, apart from the job scheduling, there are times where a user only needs to perform a simple database query, or retrieve a static JSON file. Should I also delegate these "synchronous" requests to the queue/workers?
Right now, my webserver controllers don't do anything except validating incoming JSON request schemas and forwarding them to the queue. What are the pros and cons of having a dumb webserver like this?
I believe the way you have it right now, with the workers also handling the small jobs, is good. That way the workers would be overloaded first in the event of an attack or a huge request influx. :)
I am creating a robot that has a Flask and React (running on raspberry pi zero) based interface for users to request it to perform tasks. When a user requests a task I want the backend to put it in a queue, and have the backend constantly looking at the queue and processing it on a one-by-one basis. Each tasks can take anywhere from 15-60 seconds so they are pretty lengthy.
Currently I just immediately do the task in the same Python process that is running the Flask server, and from testing locally it seems like I can go to the React app in two different browsers and request tasks at the same time, and it looks like the Raspberry Pi is trying to run them in parallel (from what I'm seeing in the printed logs).
What is the best way to allow multiple users to go to the front-end and queue up tasks? When multiple users go to the React app I assume they all connect to the same instance of the back-end. So is it enough just to add a deque to the back-end and protect it with a mutex lock (what is the Pythonic way to use mutexes?)? Or is this too simple? Do I need some other process or method to implement the task queue (such as writing/reading an external file to act as the queue)?
In general, the most popular way to run tasks in Python is using Celery. It is a Python framework that runs on a separate process, continuously checking a queue (like Redis or AMQP) for tasks. When it finds one, it executes it, and logs the result to a "result backend" (like a database or Redis again). Then you have the Flask servers just push the tasks to the queue.
In order to notify the users, you could use polling from the React app, which is just requesting an update every 5 seconds until you see from the result backend that the task has completed successfully. As soon as you see that, stop polling and show the user the notification.
You can easily have multiple worker processes running in parallel if the app becomes large enough to need it. In general, you just need to remember to have every process do what it is meant to do: Flask servers should answer web requests, and Celery workers should process tasks. Not the other way around.
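A minimal sketch of that split, assuming Redis as both broker and result backend (every name below is a placeholder, not something from your project):

# tasks.py - the Celery side; start it with: celery -A tasks worker --concurrency=1
# (--concurrency=1 makes the worker process tasks one by one, as you described)
from celery import Celery

celery_app = Celery("tasks",
                    broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/0")

@celery_app.task
def perform_robot_task(task_name):
    # the 15-60 second job runs here
    ...
    return "done"

# app.py - the Flask side only enqueues work and reports status
from flask import Flask, jsonify
from tasks import perform_robot_task, celery_app

app = Flask(__name__)

@app.route("/tasks/<task_name>", methods=["POST"])
def enqueue(task_name):
    result = perform_robot_task.delay(task_name)
    return jsonify({"task_id": result.id}), 202

@app.route("/tasks/status/<task_id>")
def status(task_id):
    # the React app polls this every few seconds until state == "SUCCESS"
    result = celery_app.AsyncResult(task_id)
    return jsonify({"state": result.state})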
I'm doing some metric analysis on my web app, which makes extensive use of Celery. I have one metric which measures the full trip from a post_save signal through a Celery task (which itself calls a number of different Celery tasks) to the end of that task. I've been hitting the server with up to 100 requests in 5 seconds.
What I find interesting is that when I hit the server with hundreds of requests (which entails thousands of Celery tasks being queued), the time it takes for the trip from post_save to the end of the main Celery task increases significantly, even though I never do any additional database calls, and none of the Celery tasks should be blocking the main task.
Could the fact that there are so many celery tasks in the queue when I make a bunch of requests really quickly be slowing down the logic in my post_save function and main celery task? That is, could the processing associated with getting the sub-tasks that the main celery task creates onto a crowded queue be having a significant impact on the time it takes to reach the end of the main celery task?
It's impossible to really answer your question without an in-depth analysis of your actual code AND benchmark protocol, and while I have some working experience with Python, Django and Celery, I wouldn't be able to do such an in-depth analysis. Now there are a couple of very obvious points:
if your workers are running on the same computer as your Django instance, they will compete with Django process(es) for CPU, RAM and IO.
if the benchmark "client" is also running on the same computer then you have a "heisenbench" case - bombing a server with hundreds of HTTP requests per second also uses a serious amount of resources...
To make a long story short: concurrent / parallel programming won't give you more processing power, it will only allow you to (more or less) easily scale horizontally.
I'm not sure about slowing down, but it can cause your application to hang. I've had this problem where one application would back up several other queues with no workers. My application could then no longer queue messages.
If you open up a Django shell, try to queue a task, and then hit Ctrl+C, you'll get a stack trace. I can't quite remember what the stack trace should be, but if you post it here I could confirm it.
I'm building an app with Flask, but I don't know much about WSGI and its HTTP base, Werkzeug. When I start serving a Flask application with gunicorn and 4 worker processes, does this mean that I can handle 4 concurrent requests?
I do mean concurrent requests, and not requests per second or anything else.
When running the development server - which is what you get by running app.run() - you get a single synchronous process, which means at most 1 request is being processed at a time.
By sticking Gunicorn in front of it in its default configuration and simply increasing the number of --workers, what you get is essentially a number of processes (managed by Gunicorn) that each behave like the app.run() development server. 4 workers == 4 concurrent requests. This is because Gunicorn uses its included sync worker type by default.
It is important to note that Gunicorn also includes asynchronous workers, namely eventlet and gevent (and also tornado, but that's best used with the Tornado framework, it seems). By specifying one of these async workers with the --worker-class flag, what you get is Gunicorn managing a number of async processes, each of which manages its own concurrency. These processes don't use threads, but instead coroutines. Basically, within each process, still only 1 thing can be happening at a time (1 thread), but objects can be 'paused' when they are waiting on external processes to finish (think database queries or waiting on network I/O).
This means, if you're using one of Gunicorn's async workers, each worker can handle many more than a single request at a time. Just how many workers is best depends on the nature of your app, its environment, the hardware it runs on, etc. More details can be found on Gunicorn's design page and notes on how gevent works on its intro page.
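To make the "paused while waiting" idea concrete, here is a toy sketch (not Gunicorn itself, just an illustration of how a gevent-style worker overlaps waiting time; the 1-second sleep stands in for a slow database or network call):

import gevent
from gevent import monkey
monkey.patch_all()   # make blocking calls such as time.sleep cooperative

import time

def fake_request(n):
    time.sleep(1)    # pretend we are waiting on a database or network reply
    return "response %d" % n

start = time.time()
jobs = [gevent.spawn(fake_request, n) for n in range(4)]
gevent.joinall(jobs)
# prints the four responses and an elapsed time of roughly 1 second, not 4,
# because the waits overlap instead of running back to back
print([job.value for job in jobs], round(time.time() - start, 1))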
Currently there is a far simpler solution than the ones already provided. When running your application you just have to pass along the threaded=True parameter to the app.run() call, like:
app.run(host="your.host", port=4321, threaded=True)
Another option, as per what we can see in the Werkzeug docs, is to use the processes parameter, which takes a number > 1 indicating the maximum number of concurrent processes to handle:
threaded – should the process handle each request in a separate thread?
processes – if greater than 1 then handle each request in a new process up to this maximum number of concurrent processes.
Something like:
app.run(host="your.host", port=4321, processes=3) #up to 3 processes
More info on the run() method is in the Flask API reference, along with the blog post that led me to find the solution.
Note: on the Flask docs on the run() methods it's indicated that using it in a Production Environment is discouraged because (quote): "While lightweight and easy to use, Flask’s built-in server is not suitable for production as it doesn’t scale well."
However, they do point to their Deployment Options page for the recommended ways to do this when going for production.
Flask will process one request per thread at a time. If you have 2 processes with 4 threads each, that's 8 concurrent requests.
Flask doesn't spawn or manage threads or processes. That's the responsibility of the WSGI gateway (e.g. gunicorn).
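To illustrate that arithmetic, a small gunicorn.conf.py sketch (myapp:app is a placeholder for your Flask application object):

# gunicorn.conf.py - 2 processes x 4 threads = up to 8 concurrent requests
# run with: gunicorn -c gunicorn.conf.py myapp:app
workers = 2    # processes
threads = 4    # threads per process (threads > 1 switches Gunicorn to its gthread worker)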
No - you can definitely handle more than that.
It's important to remember that deep, deep down, assuming you are running a single-core machine, the CPU really only runs one instruction* at a time.
Namely, the CPU can only execute a very limited set of instructions, and it can't execute more than one instruction per clock tick (many instructions even take more than 1 tick).
Therefore, most concurrency we talk about in computer science is software concurrency.
In other words, there are layers of software implementation that abstract the bottom level CPU from us and make us think we are running code concurrently.
These "things" can be processes, which are units of code that get run concurrently in the sense that each process thinks it's running in its own world with its own, non-shared memory.
Another example is threads, which are units of code inside processes that allow concurrency as well.
The reason your 4 worker processes will be able to handle more than 4 requests is that they will fire off threads to handle more and more requests.
The actual request limit depends on HTTP server chosen, I/O, OS, hardware, network connection etc.
Good luck!
*Instructions are the very basic commands the CPU can run. Examples: add two numbers, jump from one instruction to another.