How to remove a task from Celery with a Redis broker? - python

I have added some wrong tasks to Celery with a Redis broker,
but now I want to remove the incorrect tasks and I can't find any way to do this.
Is there some command or API to do this?

I know two ways of doing so:
1) Delete the queue directly from the broker. In your case it's Redis. There are two commands that can help you: llen (to find the right queue) and del (to delete it).
2) Start the celery worker with the --purge or --discard option. Here is the help text:
--purge, --discard Purges all waiting tasks before the daemon is started.
**WARNING**: This is unrecoverable, and the tasks will
be deleted from the messaging server.
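For option 1, here is a minimal sketch of the Redis side, assuming the broker runs on localhost and you use Celery's default queue name celery (adjust host, port, db and queue name to your setup):

import redis

r = redis.Redis(host='localhost', port=6379, db=0)
print(r.llen('celery'))  # number of messages still waiting in the queue
r.delete('celery')       # drop the whole queue, wrong tasks included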

The simplest way is to use celery control revoke [id1 [id2 [... [idN]]]] (do not forget to pass the -A project.application flag too), where id1 through idN are task IDs. However, it is not guaranteed to succeed every time you run it, for valid reasons...
Sure, Celery has an API for it. Here is an example of how to do it from a script: res = app.control.revoke(task_id, terminate=True)
In the example above, app is an instance of the Celery application.
On some rare occasions the control command above will not work, in which case you have to instruct the Celery worker to kill the process running the task: res = app.control.revoke(task_id, terminate=True, signal='SIGKILL')
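Putting it together, a small sketch of revoking from a script; proj.celery and the task ID below are placeholders, not values from your project:

from proj.celery import app  # your Celery application instance

task_id = 'd9078da5-9915-40a0-bfa1-392c7bde42ed'  # the ID of the wrong task
app.control.revoke(task_id)                        # discard it if it has not started yet
app.control.revoke(task_id, terminate=True)        # also abort it if it is already running
app.control.revoke(task_id, terminate=True, signal='SIGKILL')  # last resort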

I just had this problem, so for future readers:
http://celery.readthedocs.org/en/latest/faq.html#i-ve-purged-messages-but-there-are-still-messages-left-in-the-queue
To properly purge the queue of waiting tasks you have to stop all the workers, and then purge the tasks using celery.control.purge().
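In script form that amounts to something like the following sketch (proj.celery is a placeholder for wherever your application instance lives); stop all workers first, then:

from proj.celery import app

discarded = app.control.purge()  # returns the number of messages discarded
print('%d waiting tasks were purged' % discarded)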

1. To properly purge the queue of waiting tasks you have to stop all the workers (http://celery.readthedocs.io/en/latest/faq.html#i-ve-purged-messages-but-there-are-still-messages-left-in-the-queue)
2. ... and then purge the tasks from a specific queue:
$ cd <source_dir>
$ celery amqp queue.purge <queue name>
3. Start the workers again

Try removing the .state file, and if you are using a beat worker (celery worker -B), remove the schedule file as well.

Related

Celery: How to separate the logic of Publisher and Consumer?

I am new to Celery. In this example, I am unable to figure out how to separate the logic of the publisher and the consumer. Is the command celery -A tasks worker --loglevel=INFO used to start a worker for publishing or for consuming?
If add.delay(4, 4) pushes data into a queue, how do I connect to the same queue in a separate code file and consume it?
Publishers are typically either Celery beat (the scheduler), custom scripts that you develop, or other tasks executed by Celery workers in your cluster.
Consumers are EXCLUSIVELY Celery workers. Unless you dig really deep into Celery/Kombu and implement your own consumer, you cannot easily write one yourself.
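As a rough sketch of both sides, based on the add(4, 4) example from the question (the broker URL is an assumption, substitute your own):

# tasks.py - the shared task definition
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def add(x, y):
    return x + y

# publish.py - the publisher: this only puts a message on the queue and returns
from tasks import add

result = add.delay(4, 4)
print(result.id)

# The consumer is the worker process started with
#   celery -A tasks worker --loglevel=INFO
# It picks the message up from the queue and executes add(4, 4).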

Stop Celery workers from consuming from all queues

Cheers,
I have a Celery setup running in a production environment (on Linux) where I need to consume two different task types from two dedicated queues (one for each). The problem is that all workers are always bound to both queues, even when I specify that they should only consume from one of them.
TL;DR
Celery running with 2 queues
Messages are published to the correct queue as designed
Workers keep consuming from both queues
This leads to a deadlock
General Information
Think of my two different task types as a hierarchical setup:
A task is a regular celery task that may take quite some time, because it dynamically dispatches other celery tasks and may need to chain through their respective results.
A node is a dynamically dispatched sub-task, which is also a regular celery task but can itself be considered an atomic unit.
My tasks can thus be a more complex arrangement of nodes, where the results of one or more nodes serve as input for one or more subsequent nodes, and so on. Since my tasks can take longer and only finish when all of their nodes have been processed, it is essential that they are handled by dedicated workers, so that a sufficient number of workers stays free to consume the nodes. Otherwise the system could get stuck when many tasks are dispatched, each consumed by another worker, while their respective nodes are only queued and never consumed, because all workers are blocked.
If this is a bad design in general, please make any suggestions on how I can improve it. I have not yet managed to build one of these processes using Celery's built-in canvas primitives. Help me, if you can?!
Configuration/Setup
I run celery with amqp and have set up the following queues and routes in the celery configuration:
CELERY_QUEUES = (
    Queue('prod_nodes', Exchange('prod'), routing_key='prod.node'),
    Queue('prod_tasks', Exchange('prod'), routing_key='prod.task'),
)
CELERY_ROUTES = {
    'deploy_node': {'queue': 'prod_nodes', 'routing_key': 'prod.node'},
    'deploy_task': {'queue': 'prod_tasks', 'routing_key': 'prod.task'},
}
When I launch my workers, I issue a call similar to the following:
celery multi start w_task_01 w_node_01 w_node_02 -A my.deployment.system \
-E -l INFO -P gevent -Q:1 prod_tasks -Q:2-3 prod_nodes -c 4 --autoreload \
--logfile=/my/path/to/log/%N.log --pidfile=/my/path/to/pid/%N.pid
The Problem
My queue and routing setup seems to work properly, as I can see messages being correctly queued in the RabbitMQ Management web UI.
However, all workers always consume tasks from both queues. I can see this when I open the Flower web UI and inspect one of the dispatched tasks: e.g. w_node_01 starts consuming messages from the prod_tasks queue, even though it shouldn't.
The RabbitMQ Management web UI furthermore shows that all started workers are set up as consumers for both queues.
Thus, I ask you...
... what did I do wrong?
Where is the issue with my setup or worker start call? How can I work around the problem of workers always consuming from both queues? Do I really have to make additional settings at runtime (which I certainly do not want to do)?
Thanks for your time and answers!
You can create 2 separate workers, one for each queue, and for each of them define which queue it should get tasks from using the -Q command-line argument.
If you want to keep the number of processes the same (by default a process is opened for each core for each worker), you can use the --concurrency flag (see the Celery docs for more info).
Celery allows configuring a worker with a specific queue.
1) Specify the name of the queue with 'queue' attribute for different types of jobs
celery.send_task('job_type1', args=[], kwargs={}, queue='queue_name_1')
celery.send_task('job_type2', args=[], kwargs={}, queue='queue_name_2')
2) Add the following entry in configuration file
CELERY_CREATE_MISSING_QUEUES = True
3) On starting the worker, pass -Q 'queue_name' as an argument to consume from the desired queue.
celery -A proj worker -l info -Q queue_name_1 -n worker1
celery -A proj worker -l info -Q queue_name_2 -n worker2

How to cancel conflicting / old tasks in Celery?

I'm using Celery + RabbitMQ.
When a Celery worker isn't available, all the tasks wait in RabbitMQ.
As soon as the worker comes back online, this whole bunch of tasks is executed immediately.
Can I somehow prevent this from happening?
For example, if there are 100 identical tasks waiting for a Celery worker, can I execute only 1 of them when a worker comes online?
Since all the tasks in your queue are the same, a better way to do this is to send the task only once. To do this you need to be able to track that the task was published, for example:
Using a lock, example: Ensuring a task is only executed one at a time
Using a custom task ID and a custom state after the task is published, for example:
To add a custom state when the task is published:
from celery import current_app
from celery.signals import after_task_publish

@after_task_publish.connect
def add_sent_state(sender=None, body=None, **kwargs):
    """Track published tasks."""
    # get the task instance from its name
    task = current_app.tasks.get(sender)
    # if there is no task.backend, fall back to app.backend
    backend = task.backend if task else current_app.backend
    # store the task state
    backend.store_result(body['id'], None, 'SENT')
When you want to send the task you can check whether it has already been published. Since we're using a custom state, the task's state won't be PENDING once it has been published (PENDING could also simply mean unknown), so we can check using:
from celery import states

# the task has a custom ID
task = task_func.AsyncResult('CUSTOM_ID')
if task.state != states.PENDING:
    # the task already exists
    pass
else:
    # send the task
    task_func.apply_async(args, kwargs, task_id='CUSTOM_ID')
I'm using this approach in my app and it's working great; my tasks can be sent multiple times, but because they are identified by their IDs, each task is only sent once.
If you still want to cancel all the tasks in the queue you can use:
# import your Celery instance
from project.celery import app
app.control.purge()
Check the Celery FAQ: How do I purge all waiting tasks?
There are two ways to do this.
First, run only one worker with a concurrency of one.
celery worker -A your_app -l info -c 1
This command starts a worker with a concurrency of one. So only one task will be executed at a time. This is the preferred way to do it.
The second method is a bit more complicated: you need to acquire a lock and release it to make sure only one task is executed at a time; a rough sketch follows below.
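A rough sketch of the lock idea using Redis (this is not the exact recipe from the Celery docs; the lock key, timeout and placeholder work below are made up for illustration):

import redis
from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')
locks = redis.Redis(host='localhost', port=6379, db=1)

@app.task(bind=True)
def singleton_task(self):
    # nx=True sets the key only if it does not exist yet; ex adds an expiry
    # as a safety net in case the worker dies while holding the lock
    if not locks.set('singleton_task_lock', self.request.id, nx=True, ex=600):
        return 'skipped: another copy is already running'
    try:
        ...  # the real work goes here
    finally:
        locks.delete('singleton_task_lock')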
Alternatively, if you want, you can remove all the tasks from the queue using the purge command.
celery -A your_app purge

How can I view the enqueued tasks in RabbitMQ?

I'm using RabbitMQ as my message broker and my workers are Celery tasks. I'm trying to diagnose an issue where I enqueue tasks to RabbitMQ but Celery doesn't pick them up.
Is there a way I can check what tasks are enqueued in RabbitMQ? I'd like to see the date and time when they were enqueued, whether an ETA is specified, the arguments and the task name.
I haven't been able to find this information in the docs — maybe I've overlooked it — and was hoping that some of you might know an easy way to inspect the task queue. Thanks.
You can use Flower to monitor tasks in real time.
https://github.com/mher/flower
Check out also the rabbitmqctl command, which inspects the RabbitMQ server status:
http://www.rabbitmq.com/man/rabbitmqctl.1.man.html
rabbitmqctl list_queues
Celery also ships with commands to monitor workers and queues:
http://docs.celeryproject.org/en/latest/userguide/monitoring.html
Check out these commands:
# Show status of all worker nodes
celery status
# List active tasks
celery inspect active
# Show worker statistics (call counts etc.)
celery inspect stats
I believe the command you are looking for is:
celery inspect reserved
The documentation[1] has the following description:
Reserved tasks are tasks that have been received, but are still waiting to be executed.
[1] http://docs.celeryproject.org/en/latest/userguide/workers.html?highlight=inspect%20reserved
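You can get the same information programmatically through the inspect API; proj.celery below is a placeholder for your own application module:

from proj.celery import app

i = app.control.inspect()
print(i.active())     # tasks currently being executed
print(i.reserved())   # tasks prefetched by workers but not started yet
print(i.scheduled())  # tasks with an ETA/countdown waiting to run

Note that inspect queries the workers, so messages still sitting in RabbitMQ before prefetch are only visible through the RabbitMQ tools (rabbitmqctl, rabbitmqadmin).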
As long as the management plugin is enabled, an arbitrary number of messages can be consumed from the queue and optionally requeued:
rabbitmqadmin get queue=queue_name requeue=true count=100

How to make celery retry using the same worker?

I'm just starting out with celery in a Django project, and am kind of stuck on this particular problem: basically, I need to distribute a long-running task to different workers. The task is actually broken into several steps, each of which takes considerable time to complete. Therefore, if some step fails, I'd like celery to retry this task using the same worker to reuse the results from the completed steps. I understand that celery uses routing to distribute tasks to certain servers, but I can't find anything about this particular problem. I use RabbitMQ as my broker.
You could have every celeryd instance consume from a queue named after the hostname of the worker:
celeryd -l info -n worker1.example.com -Q celery,worker1.example.com
This sets the hostname to worker1.example.com, and the worker will consume from a queue with the same name, as well as from the default queue (named celery).
Then to direct a task to a specific worker you can use:
task.apply_async(args, kwargs, queue="worker1.example.com")
Similarly, to direct a retry:
task.retry(queue="worker1.example.com")
or to direct the retry to the same worker:
task.retry(queue=task.request.hostname)
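For example, a sketch of a task that always retries on its own worker's queue, assuming each worker also consumes from a queue named after its hostname as shown above (the task name and argument are made up):

from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task(bind=True, max_retries=3)
def long_pipeline(self, doc_id):
    try:
        ...  # the multi-step work goes here
    except Exception as exc:
        # route the retry to the queue named after this worker's hostname,
        # so the same worker (and its cached intermediate results) runs it again
        raise self.retry(exc=exc, queue=self.request.hostname)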
