How to cancel conflicting / old tasks in Celery? - python

I'm using Celery + RabbitMQ.
When a Celery worker isn't available all the tasks are waiting in RabbitMQ.
Just as it becomes online all this bunch of tasks is executed immediately.
Can I somehow prevent it happening?
For example there are 100 tasks (the same) waiting for a Celery worker, can I execute only 1 of them when a Celery worker comes online?

Since all the tasks in your queue are the same, a better way to do this is to send the task only once. To do this you need to be able to track that the task was published, for example:
Using a lock, as in Ensuring a task is only executed one at a time
Using a custom task ID and a custom state set after the task is published
To add a custom state when the task is published:
from celery import current_app
from celery.signals import after_task_publish

@after_task_publish.connect
def add_sent_state(sender=None, body=None, **kwargs):
    """Track published tasks with a custom SENT state."""
    # get the task instance from its name
    task = current_app.tasks.get(sender)
    # if there is no task.backend, fall back to app.backend
    backend = task.backend if task else current_app.backend
    # store the task state
    backend.store_result(body['id'], None, 'SENT')
When you want to send the task you can first check whether it has already been published. Since we're using a custom state, the task's state won't be PENDING once it has been published (PENDING can also mean the task is unknown), so we can check using:
from celery import states

# the task has a custom ID
task = task_func.AsyncResult('CUSTOM_ID')
if task.state != states.PENDING:
    # the task already exists (it was published or has run)
    pass
else:
    # send the task
    task_func.apply_async(args, kwargs, task_id='CUSTOM_ID')
I'm using this approach in my app and it's working great. My tasks may be triggered multiple times, but since they are identified by their IDs, each one is sent only once.
If you still want to cancel all the tasks in the queue you can use:
# import your Celery instance
from project.celery import app
app.control.purge()
Check the Celery FAQ: How do I purge all waiting tasks?

There are two ways to do this.
First, run only one worker with a concurrency of one.
celery worker -A your_app -l info -c 1
This command starts a worker with a concurrency of one, so only one task will be executed at a time. This is the preferred way to do it.
The second method is a bit more complicated: you need to acquire a lock and release it to make sure only one task is executed at a time.
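For illustration, here is a minimal sketch of that lock approach using Redis (the redis client, the lock key name, and the timeout are my assumptions, not part of this answer):

import redis
from celery import shared_task

# assumed Redis connection; adjust host/port/db for your setup
r = redis.Redis(host='localhost', port=6379, db=0)

def do_the_actual_work():
    ...  # placeholder for the real work

@shared_task(bind=True)
def exclusive_task(self):
    # nx=True: set the key only if it doesn't exist yet; ex: expire the lock
    # so a crashed worker can't hold it forever
    got_lock = r.set('exclusive_task_lock', self.request.id, nx=True, ex=600)
    if not got_lock:
        return 'another instance is already running'
    try:
        do_the_actual_work()
    finally:
        r.delete('exclusive_task_lock')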
Alternatively, if you want, you can remove all the tasks from the queue using the purge command.
celery -A your_app purge

Related

Synchronous celery queues

I have an app where each user is able to create tasks, and each task the user creates is added to a dynamic queue for the specific user. So all tasks from User1 are added to User1_queue, User2 to User2_queue, etc.
What I need to happen is when User1 adds Task1, Task2, and Task3 to their queue, Task1 is executed and Celery waits until it is finished before it executes Task2, and so on.
Having them execute alongside each other from multiple queues is fine, e.g. Task1 from User1_queue together with Task1 from User2_queue. It's just a matter of limiting Celery so it synchronously executes the tasks in a queue in the order they were added.
Is it possible to have Celery have a concurrency of 1 per queue so that tasks are not executed alongside each other in the same queue?
If it helps anyone who visits this question: I solved my problem by setting up multiple workers with a concurrency of 1, each on a unique queue. I then used some logic in my Django app to store a queue name per session for the active user (a sketch follows below).
Down the line I'll add extra logic to select the 'least busy' worker, and try to evenly spread users across the active workers. But for now it is working perfectly.
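For reference, the sketch mentioned above; the task name, session key, and queue-naming scheme here are mine, not from the original post:

# hypothetical Django view: send each user's tasks to their own queue
from myapp.tasks import process_item  # assumed task

def submit_task(request, item_id):
    # queue name stored per session for the active user, e.g. "User1_queue"
    queue_name = request.session.setdefault(
        'celery_queue', 'User%d_queue' % request.user.id)
    process_item.apply_async(args=[item_id], queue=queue_name)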
Flower, the monitoring tool for Celery, was also a huge help while trying to figure this out.
You can select a queue with the -Q option for each worker, combined with --concurrency=1:
celery -A proj worker --concurrency=1 -n user1@%h -Q User1_queue
@JamesFoley: although the following could be implemented on the Celery side, for now it is not (https://github.com/celery/celery/issues/1599).
Some ideas (a rough sketch of the second follows below):
dynamically spawn/control Celery workers
a singular beat task that decides which tasks can be spawned to run (via a lock/mutex, or a DB table monitoring the tasks)
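A rough sketch of that second idea, a beat task backed by a DB table (the model, its fields, and the task names are all hypothetical):

from celery import shared_task

@shared_task
def run_user_task(task_pk):
    ...  # hypothetical task that does the actual work and clears the running flag

@shared_task
def dispatch_pending_tasks():
    # hypothetical dispatcher, scheduled via celery beat: for each queue,
    # start the oldest pending task only if nothing is running there yet
    from myapp.models import QueuedTask  # assumed model tracking queued work
    for queue_name in QueuedTask.objects.values_list('queue', flat=True).distinct():
        if QueuedTask.objects.filter(queue=queue_name, running=True).exists():
            continue
        pending = (QueuedTask.objects
                   .filter(queue=queue_name, running=False)
                   .order_by('created')
                   .first())
        if pending:
            pending.running = True
            pending.save()
            run_user_task.apply_async(args=[pending.pk], queue=queue_name)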

How to rate limit Celery tasks by task name?

I'm using Celery to process asynchronous tasks from a Django app. Most tasks are short and run in a few seconds, but I have one task that can take a few hours.
Due to processing restrictions on my server, Celery is configured to only run 2 tasks at once. That means if someone launches two of these long-running tasks, it effectively blocks all other Celery processing site wide for several hours, which is very bad.
Is there any way to configure Celery so that no more than one instance of a given type of task runs at a time? Something like:
@task(max_running_instances=1)
def my_really_long_task():
    for i in range(1000000000):
        time.sleep(6000)
Note: I don't want to cancel all other launches of my_really_long_task; I just don't want them to start right away, but only begin once all other tasks of the same name have finished.
Since this doesn't seem to be supported by Celery, my current hacky solution is to query the other tasks from within the task, and if we find other running instances, reschedule ourselves to run later, e.g.:
import random
import time
from collections import defaultdict

from celery import task
from celery.task.control import inspect  # in newer Celery: app.control.inspect()

def get_all_active_celery_task_names(ignore_id=None):
    """
    Returns {task_name: count} for all currently running tasks.
    """
    i = inspect()
    task_names = defaultdict(int)  # {name: count}
    if i:
        active = i.active()
        if active is not None:
            for worker_name, tasks in active.items():
                for task_info in tasks:
                    if ignore_id and task_info['id'] == ignore_id:
                        continue
                    task_names[task_info['name']] += 1
    return task_names

@task
def my_really_long_task():
    # ignore our own ID, otherwise the running instance always finds itself
    all_names = get_all_active_celery_task_names(
        ignore_id=my_really_long_task.request.id)
    if 'my_really_long_task' in all_names:
        my_really_long_task.retry(max_retries=100, countdown=random.randint(10, 300))
        return
    for i in range(1000000000):
        time.sleep(6000)
Is there a better way to do this?
I'm aware of other hacky solutions like this one, but setting up a separate memcached server to track task uniqueness is even less reliable, and more complicated, than the method above.
An alternative solution is to route my_really_long_task to a separate queue:
my_really_long_task.apply_async(args, queue='foo')
Then start a worker with a concurrency of 1 to consume these tasks, so that only 1 task gets executed at a time:
celery -A foo worker -l info -Q foo -c 1

How to limit the maximum number of running Celery tasks by name

How do you limit the number of instances of a specific Celery task that can be run simultaneously?
I have a task that processes large files. I'm running into a problem where a user may launch several tasks, causing the server to run out of CPU and memory as it tries to process too many files at once. I want to ensure that only N instances of this one type of task are run at any given time, and that other tasks will sit queued in the scheduler until the others complete.
I see there's a rate_limit option in the task decorator, but I don't think this does what I want. If I'm understanding the docs correctly, it will just limit how quickly the tasks are launched, but it won't restrict the overall number of tasks running, so it will only make my server crash more slowly...but it will still crash nonetheless.
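For reference, this is what rate_limit looks like; as the question suspects, it only throttles how often a worker starts the task, not how many instances run at once:

from celery import task

@task(rate_limit='2/h')  # at most 2 task starts per hour, per worker
def my_really_long_task():
    ...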
You have to set up an extra queue and set the desired concurrency level for it. From Routing Tasks:
# Old config style
CELERY_ROUTES = {
    'app.tasks.limited_task': {'queue': 'limited_queue'}
}
or
from kombu import Exchange, Queue

default_exchange = Exchange('default', type='direct')

celery.conf.task_queues = (
    Queue('default', default_exchange, routing_key='default'),
    Queue('limited_queue', default_exchange, routing_key='limited_queue'),
)
And start an extra worker, serving only limited_queue:
$ celery -A celery_app worker -Q limited_queue --loglevel=info -c 1 -n limited_queue
Then you can check that everything is running smoothly using Flower or the inspect command:
$ celery -A celery_app inspect --help
What you can do is push these tasks to a specific queue and have X workers processing them. Having two workers on a queue with 100 items will ensure that only two tasks are processed at the same time.
I am not sure you can do that within Celery itself. What you can do is check how many tasks of that name are currently running when a request arrives, and if it exceeds the maximum, either return an error or add a mechanism that periodically checks whether there are open slots for the tasks and runs them. (If you add such a mechanism, you don't need to double-check; just add each request to its queue.)
In order to check running tasks, you can use the inspect command.
In short:
app = Celery(...)
i = app.control.inspect()
i.active()
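Building on that, a small sketch of counting the running instances of one task by name (the task name and the limit of 2 are just examples):

def count_running(task_name):
    # active() returns {worker_name: [task_info, ...]}, or None if no worker replies
    active = app.control.inspect().active() or {}
    return sum(1 for tasks in active.values()
               for t in tasks if t['name'] == task_name)

# e.g. refuse to enqueue when the limit is reached
if count_running('app.tasks.limited_task') >= 2:
    raise RuntimeError('too many instances already running')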

How to delete tasks from celery task queue?

How can I delete all tasks in a queue, right after a task has ended?
I want something like this (Deleting all pending tasks in celery / rabbitmq) but for Celery 3.0.
Thanks
EDIT:
From celery documentation:
http://docs.celeryproject.org/en/latest/faq.html#how-do-i-purge-all-waiting-tasks
My code looks like:
from celery import current_app as celery
@task
def task_a():
    celery.control.purge()
I was expecting that, if I issued 5 tasks, only the first would run. Somehow, it's not doing that.
Thanks
Those tasks might have already been prefetched by the workers. To find out whether this is so, try to run more tasks than the number of active workers multiplied by the prefetch multiplier (see below), and check what result is returned by celery.control.purge(). You can control the number of prefetched tasks using the config parameters CELERYD_PREFETCH_MULTIPLIER and CELERY_ACKS_LATE.
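For example, with the Celery 3.x setting names used above, a configuration that stops workers from reserving extra tasks might look like:

# celeryconfig.py (Celery 3.x style names)
CELERYD_PREFETCH_MULTIPLIER = 1  # each worker process reserves at most one extra task
CELERY_ACKS_LATE = True          # ack after execution, so unstarted tasks remain purgeable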

How can I view the enqueued tasks in RabbitMQ?

I'm using RabbitMQ as my message broker and my workers are Celery tasks. I'm trying to diagnose an issue where I enqueue tasks to RabbitMQ but Celery doesn't pick them up.
Is there a way I can check what tasks are enqueued in RabbitMQ? I'd like to see the date and time when they were enqueued, whether an ETA is specified, the arguments, and the task name.
I haven't been able to find this information in the docs — maybe I've overlooked it — and was hoping that some of you might know an easy way to inspect the task queue. Thanks.
You can use Flower to monitor tasks in real time.
https://github.com/mher/flower
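Assuming Flower is installed (pip install flower), it can be launched against your app like this:
$ celery -A proj flower --port=5555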
Check out also the rabbitmqctl command, which inspects RabbitMQ server status:
http://www.rabbitmq.com/man/rabbitmqctl.1.man.html
rabbitmqctl list_queues
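For example, to also see how many messages per queue are ready versus unacknowledged (these are standard rabbitmqctl column names):
rabbitmqctl list_queues name messages_ready messages_unacknowledged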
Also some celery tasks to monitor the queue:
http://docs.celeryproject.org/en/latest/userguide/monitoring.html
Check out these commands:
# Show status of all worker nodes
celery status
# List active tasks
celery inspect active
# Show worker statistics (call counts etc.)
celery inspect stats
I believe the command you are looking for is:
celery inspect reserved
The documentation[1] has the following description:
Reserved tasks are tasks that have been received, but are still waiting to be executed.
[1] http://docs.celeryproject.org/en/latest/userguide/workers.html?highlight=inspect%20reserved
As long as the management plugin is enabled, an arbitrary number of messages can be consumed from the queue and optionally requeued:
rabbitmqadmin get queue=queue_name requeue=true count=100
