Python Django Celery AsyncResult Memory Leak

Python Django Celery AsyncResult Memory Leak - python

The problem is a very serious memory leak until the server crashes (or you could recover by killing the celery worker service, which releases all the RAM used)
There seems to be a bunch of reported bugs on this matter, but very little attention is paid to this warning, In the celery API docs, here
Warning:
Backends use resources to store and transmit results. To ensure that resources are released, you must eventually call get() or forget() on EVERY AsyncResult instance returned after calling a task.
And it is reasonable to assume that the leak is related to this warning.
But the conceptual problem is, based on my understanding of celery, that AsyncResult instances are created across multiple Django views within a user session: some are created as you initiate/spawn new tasks in one view, and some you may create later manually (using task_id saved in the user session) to check on the progress (state) of those tasks in another view.
Therefore, AsynResult objects will eventually go out of scope across multiple Views in a real world Django application, and you don't want to call get() in ANY of these views, because you don't want to slow down the Django (or the apache2) daemon process.
Is the solution to never let AsyncResult Objects go out of scope before calling their get() method?
CELERY_RESULT_BACKEND = 'django-db' #backend is a mysql DB
BROKER_URL = 'pyamqp://localhost' #rabbitMQ

We also faced multiple issues with celery in production, and also tackled a memory leak issue. I'm not sure if our problem scope is the same, but if you don't mind you could try out our solution.
You see we had multiple tasks running on a couple of workers managed by supervisor (all workers were on the same Queue). Now, what we saw that when there were a lot of tasks being queued, the broker (in our case rabbitmq) was sending the amount of tasks our celery workers could process and keeping the rest in memory. This resulted in our memory overflowing and the broker started paginating in our hard drive. We found out from reading the docs that if we allow our broker to not wait for worker results, this issue could be resolved. Thus, in our tasks we used the option,
#task(time_limit=10, ignore_result=True)
def ggwp():
# do sth
Here, the time limit would close the task after a certain amount of time, and the ignore_result option would allow the broker to just send the task in celery workers as soon as a worker is freed.

Related

Persisting all job results in a separate db in Celery

We are running an API server where users submit jobs for calculation, which take between 1 second and 1 hour. They then make requests to check the status and get their results, which could be (much) later, or even never.
Currently jobs are added to a pub/sub queue, and processed by various worker processes. These workers then send pub/sub messages back to a listener, which stores the status/results in a postgres database.
I am looking into using Celery to simplify things and allow for easier scaling.
Submitting jobs and getting results isn't a problem in Celery, using celery_app.send_task. However, I am not sure how to best ensure the results are stored when, particularly for long-running or possibly abandoned jobs.
Some solutions I considered include:
Give all workers access to the database and let them handle updates. The main limitation to this seems to be the db connection pool limit, as worker processes can scale to 50 replicas in some cases.
Listen to celery events in a separate pod, and write changes based on this to the jobs db. Only 1 connection needed, but as far as I understand, this would miss out on events while this pod is redeploying.
Only check job results when the user asks for them. It seems this could lead to lost results when the user takes too long, or slowly clog the results cache.
As in (3), but periodically check on all jobs not marked completed in the db. A tad complicated, but doable?
Is there a standard pattern for this, or am I trying to do something unusual with Celery? Any advice on how to tackle this is appreciated.

In the past I solved similar problem by modifying tasks to not only return result of the computation, but also store it into a cache server (Redis) right before it returns. I had a task that periodically (every 5min) collects these results and writes data (in bulk, so quite effective) to a relational database. It was quite effective until we started filling the cache with hundreds of thousands of results, so we implemented a tiny service that does this instead of task that runs periodically.

how to profile memory usage of a celery task?

I have a django application that runs background tasks using the celery lib and I need to obtain and store the max memory usage of a task.
I've tried memory_usage from memory_profiler library, but I can not use this function inside a task because I get the error: "daemonic processes not allowed have children". I've also tried the memory_usage function outside the task, to monitor the task.async call, but for some reason the task is triggered twice.
All the other ways I found out there consist of checking the memory usage in different places of the code and then getting the maximum, but I have the feeling that it is very inaccurate and there are probably some calls that have a high memory usage that is left out because of garbage collection before I manage to check the current memory usage.
the official documentation has some useful functions but it would have to rely on the method above. https://docs.celeryproject.org/en/latest/reference/celery.utils.debug.html
Thanks in advance!

Why not a controller tasks?
Celery infrastructure let to query the current status of all workers:
from celery import Celery
app = Celery(...)
app.control.inspect().active()
This can be used inside a task to poll every # sec the cluster and understand what's happening.
I've used a similar approach to identify and send the kill() command between tasks. My tasks are killable so each of them know how to handle the soft kill.

ThreadPoolExecutor on long running process

I want to use ThreadPoolExecutor on a webapp (django),
All examples that I saw are using the thread pool like that:
with ThreadPoolExecutor(max_workers=1) as executor:
code
I tried to store the thread pool as a class member of a class and to use map fucntion
but I got memory leak, the only way I could use it is by the with notation
so I have 2 questions:
Each time I run with ThreadPoolExecutor does it creates threads again and then release them, in other word is this operation is expensive?
If I avoid using with how can I release the memory of the threads
thanks

Normally, web applications are stateless. That means every object you create should live in a request and die at the end of the request. That includes your ThreadPoolExecutor. Having an executor at the application level may work, but it will be embedded into your web application instead of running as a separate group of processes.
So if you want to take the workers down or restart them, your web app will have to restart as well.
And there will be stability concerns, since there is no main process watching over child processes detecting which one has gotten stale, so requires a lot of code to get multiprocessing right.
Alternatively, If you want a persistent group of processes to listen to a job queue and run your tasks, there are several projects that do that for you. All you need to do is to set up a server that takes care of queueing and locking such as redis or rabbitmq, then point your project at that server and start the workers. Some projects even let you use the database as a job queue backend.

Does Django Block When Celery Queue Fills?

I'm doing some metric analysis on on my web app, which makes extensive use of celery. I have one metric which measures the full trip from a post_save signal through a celery task (which itself calls a number of different celery tasks) to the end of that task. I've been hitting the server with up to 100 requests in 5 seconds.
What I find interesting is that when I hit the server with hundreds of requests (which entails thousands of celery worker processes being queued), the time it takes for the trip from post save to the end of the main celery task increases significantly, even though I never do any additional database calls, and none of the celery tasks should be blocking the main task.
Could the fact that there are so many celery tasks in the queue when I make a bunch of requests really quickly be slowing down the logic in my post_save function and main celery task? That is, could the processing associated with getting the sub-tasks that the main celery task creates onto a crowded queue be having a significant impact on the time it takes to reach the end of the main celery task?

It's impossible to really answer your question without an in-depth analysis of your actual code AND benchmark protocol, and while having some working experience with Python, Django and Celery I wouldn't be able to do such an in-depth analysis. Now there are a couple very obvious points :
if your workers are running on the same computer as your Django instance, they will compete with Django process(es) for CPU, RAM and IO.
if the benchmark "client" is also running on the same computer then you have a "heisenbench" case - bombing a server with 100s of HTTP request per second also uses a serious amount of resources...
To make a long story short: concurrent / parallel programming won't give you more processing power, it will only allow you to (more or less) easily scale horizontally.

I'm not sure about slowing down, but it can cause your application to hang. I've had this problem where one application would backup several other queues with no workers. My application could then no longer queue messages.
If you open up a django shell and try to queue a task. Then hit ctrl+c. I can't quite remember what the stack trace should be, but if you post it here I could confirm it.

how to track revoked tasks in across multiple celeryd processes

I have a reminder type app that schedules tasks in celery using the "eta" argument. If the parameters in the reminder object changes (e.g. time of reminder), then I revoke the task previously sent and queue a new task.
I was wondering if there's any good way of keeping track of revoked tasks across celeryd restarts. I'd like to have the ability to scale celeryd processes up/down on the fly, and it seems that any celeryd processes started after the revoke command was sent will still execute that task.
One way of doing it is to keep a list of revoked task ids, but this method will result in the list growing arbitrarily. Pruning this list requires guarantees that the task is no longer in the RabbitMQ queue, which doesn't seem to be possible.
I've also tried using a shared --statedb file for each of the celeryd workers, but it seems that the statedb file is only updated on termination of the workers and thus not suitable for what I would like to accomplish.
Thanks in advance!

Interesting problem, I think it should be easy to solve using broadcast commands.
If when a new worker starts up it requests all the other workers to dump its revoked
tasks to the new worker. Adding two new remote control commands,
you can easily add new commands by using #Panel.register,
Module control.py:
from celery.worker import state
from celery.worker.control import Panel
#Panel.register
def bulk_revoke(panel, ids):
state.revoked.update(ids)
#Panel.register
def broadcast_revokes(panel, destination):
panel.app.control.broadcast("bulk_revoke", arguments={
"ids": list(state.revoked)},
destination=destination)
Add it to CELERY_IMPORTS:
CELERY_IMPORTS = ("control", )
The only missing problem now is to connect it so that the new worker
triggers broadcast_revokes at startup. I guess you could use the worker_ready
signal for this:
from celery import current_app as celery
from celery.signals import worker_ready
def request_revokes_at_startup(sender=None, **kwargs):
celery.control.broadcast("broadcast_revokes",
destination=sender.hostname)

I had to do something similar in my project and used celerycam with django-admin-monitor. The monitor takes a snapshot of tasks and saves them in the database periodically. And there is a nice user interface to browse and check the status of all tasks. And you can even use it even if your project is not Django based.

I implemented something similar to this some time ago, and the solution I came up with was very similar to yours.
The way I solved this problem was to have the worker fetch the Task object from the database when the job ran (by passing it the primary key, as the documentation recommends). In your case, before the reminder is sent the worker should perform a check to ensure that the task is "ready" to be run. If not, it should simply return without doing any work (assuming that the ETA has changed and another worker will pick up the new job).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.