Celery Tasks with eta get removed from RabbitMQ - python

I'm using Django 1.6, RabbitMQ 3.5.6, celery 3.1.19.
There is a periodic task which runs every 30 seconds and creates 200 tasks with given eta parameter. After I run the celery worker, slowly the queue gets created in RabbitMQ and I see around 1200 scheduled tasks waiting to be fired. Then, I restart the celery worker and all of the waiting 1200 scheduled tasks get removed from RabbitMQ.
How I create tasks:
my_task.apply_async((arg1, arg2), eta=my_object.time_in_future)
I run the worker like this:
python manage.py celery worker -Q my_tasks_1 -A my_app -l
CELERY_ACKS_LATE is set to True in Django settings. I couldn't find any possible reason.
Should I run the worker with a different configuration/flag/parameter? Any idea?

As far as I know Celery does not rely on RabbitMQ's scheduled queues. It implements ETA/Countdown internally.
It seems that you have enough workers that are able to fetch enough messages and schedule them internally.
Mind that you don't need 200 workers. You have the prefetch multiplier set to the default value so you need less.

Related

Django Celery & Django-Celery-Beat

I'm new to asynchronous tasks and I'm using django-celery and was hoping to use django-celery-beat to schedule periodic tasks.
However it looks like celery-beat doesn't pick up one-off tasks. Do I need two Celery instances, one as a worker for one off tasks and one as beat for scheduled tasks for this to work?
Pass -B parameter to your worker, it is a parameter to run beat schedule. This worker will do all other tasks, the ones sent from beat, and the "one-off" ones, it really doesn't matter for worker.
So the full command looks like:
celery -A flock.celery worker -l DEBUG -BE.
If you have multiple periodic tasks executing for example every 10 seconds, then they should all point to the same schedule object. please refer here

Synchronous celery queues

I have an app where each user is able to create tasks, and each task the user creates is added to a dynamic queue for the specific user. So all tasks from User1 are added to User1_queue, User2 to User2_queue, etc.
What I need to happen is when User1 adds Task1, Task2, and Task3 to their queue, Task1 is executed and Celery waits until it is finished before it executes Task2, and so on.
Having them execute along side each other from multiple queues is fine, so Task1 from both User1_queue, and Task1 from User2_queue. Its just limiting Celery to synchronously execute tasks in a queue in the order they're added.
Is it possible to have Celery have a concurrency of 1 per queue so that tasks are not executed alongside each other in the same queue?
If it helps anyone that visits this question, I solved my problem by setting up multiple workers with a concurrency of 1, each on a unique queue. I then used some logic in my Django app to store a queue name per session for the active user.
Down the line I'll add extra logic to select the 'least busy' worker, and try to evenly spread users across the active workers. But for now it is working perfectly.
Flower, the monitoring tool for Celery, was also a huge help while trying to figure this out.
You can select a queue -Q option for each worker to work on with --concurrency=1
celery -A proj worker --concurrency=1 -n user1#%h -Q User1_queue
#JamesFoley Although following could be implemented on celery side, for now it is not ( https://github.com/celery/celery/issues/1599 )
Some of ideas:
dynamically spawn/control celery workers
singular beat task to decide which tasks can be spawned to run ( lock/mutex or DB table monitoring tasks)

Stop celery workers to consume from all queues

Cheers,
I have a celery setup running in a production environment (on Linux) where I need to consume two different task types from two dedicated queues (one for each). The problem that arises is, that all workers are always bound to both queues, even when I specify them to only consume from one of them.
TL;DR
Celery running with 2 queues
Messages are published in correct queue as designed
Workers keep consuming both queues
Leads to deadlock
General Information
Think of my two different task types as a hierarchical setup:
A task is a regular celery task that may take quite some time, because it dynamically dispatches other celery tasks and may be required to chain through their respective results
A node is a dynamically dispatched sub-task, which also is a regular celery task but itself can be considered an atomic unit.
My task thus can be a more complex setup of nodes where the results of one or more nodes serves as input for one or more subsequent nodes, and so on. Since my tasks can take longer and will only finish when all their nodes have been deployed, it is essential that they are handled by dedicated workers to keep a sufficient number of workers free to consume the nodes. Otherwise, this could lead to the system being stuck, when a lot of tasks are dispatched, each consumed by another worker, and their respective nodes are only queued but will never be consumed, because all workers are blocked.
If this is a bad design in general, please make any propositions on how I can improve it. I did not yet manage to build one of these processes using celery's built-in canvas primitives. Help me, if you can?!
Configuration/Setup
I run celery with amqp and have set up the following queues and routes in the celery configuration:
CELERY_QUERUES = (
Queue('prod_nodes', Exchange('prod'), routing_key='prod.node'),
Queue('prod_tasks', Exchange('prod'), routing_key='prod.task')
)
CELERY_ROUTES = (
'deploy_node': {'queue': 'prod_nodes', 'routing_key': 'prod.node'},
'deploy_task': {'queue': 'prod_tasks', 'routing_key': 'prod.task'}
)
When I launch my workers, I issue a call similar to the following:
celery multi start w_task_01 w_node_01 w_node_02 -A my.deployment.system \
-E -l INFO -P gevent -Q:1 prod_tasks -Q:2-3 prod_nodes -c 4 --autoreload \
--logfile=/my/path/to/log/%N.log --pidfile=/my/path/to/pid/%N.pid
The Problem
My queue and routing setup seems to work properly, as I can see messages being correctly queued in the RabbitMQ Management web UI.
However, all workers always consume celery tasks from both queues. I can see this when I start and open up the flower web UI and inspect one of the deployed tasks, where e.g. w_node_01 starts consuming messages from the prod_tasks queue, even though it shouldn't.
The RabbitMQ Management web UI furthermore tells me, that all started workers are set up as consumers for both queues.
Thus, I ask you...
... what did I do wrong?
Where is the issue with my setup or worker start call; How can I circumvent the problem of workers always consuming from both queues; Do I really have to make additional settings during runtime (what I certainly do not want)?
Thanks for your time and answers!
You can create 2 separate workers for each queue and each one's define what queue it should get tasks from using the -Q command line argument.
If you want to keep the number processes the same, by default a process is opened for each core for each worker you can use the --concurrency flag (See Celery docs for more info)
Celery allows configuring a worker with a specific queue.
1) Specify the name of the queue with 'queue' attribute for different types of jobs
celery.send_task('job_type1', args=[], kwargs={}, queue='queue_name_1')
celery.send_task('job_type2', args=[], kwargs={}, queue='queue_name_2')
2) Add the following entry in configuration file
CELERY_CREATE_MISSING_QUEUES = True
3) On starting the worker, pass -Q 'queue_name' as argument, for consuming from that desired queue.
celery -A proj worker -l info -Q queue_name_1 -n worker1
celery -A proj worker -l info -Q queue_name_2 -n worker2

How to limit the maximum number of running Celery tasks by name

How do you limit the number of instances of a specific Celery task that can be ran simultaneously?
I have a task that processes large files. I'm running into a problem where a user may launch several tasks, causing the server to run out of CPU and memory as it tries to process too many files at once. I want to ensure that only N instances of this one type of task are ran at any given time, and that other tasks will sit queued in the scheduler until the others complete.
I see there's a rate_limit option in the task decorator, but I don't think this does what I want. If I'm understanding the docs correctly, this will just limit how quickly the tasks are launched, but it won't restrict the overall number of tasks running, so this will make my server will crash more slowly...but it will still crash nonetheless.
You have to setup extra queue and set desired concurrency level for it. From Routing Tasks:
# Old config style
CELERY_ROUTES = {
'app.tasks.limited_task': {'queue': 'limited_queue'}
}
or
from kombu import Exchange, Queue
celery.conf.task_queues = (
Queue('default', default_exchange, routing_key='default'),
Queue('limited_queue', default_exchange, routing_key='limited_queue')
)
And start extra worker, serving only limited_queue:
$ celery -A celery_app worker -Q limited_queue --loglevel=info -c 1 -n limited_queue
Then you can check everything running smoothly using Flower or inspect command:
$ celery -A celery_app worker inspect --help
What you can do is to push these tasks to a specific queue and have X number of workers processing them. Having two workers on a queue with 100 items will ensure that there will only be two tasks processed at the same time.
I am not sure you can do that in Celery, what you can do is check how many tasks of that name are currently running when a request arrives and if it exceeds the maximum either return an error or add a mechanism that periodically checks if there are open slots for the tasks and runs it (if you add such a mechanism, you don't need to double check, just at each request add it to it's queue.
In order to check running tasks, you can use the inspect command.
In short:
app = Celery(...)
i = app.control.inspect()
i.active()

How to make celery retry using the same worker?

I'm just starting out with celery in a Django project, and am kinda stuck at this particular problem: Basically, I need to distribute a long-running task to different workers. The task is actually broken into several steps, each of which takes considerable time to complete. Therefore, if some step fails, I'd like celery to retry this task using the same worker to reuse the results from the completed steps. I understand that celery uses routing to distribute tasks to certain server, but I can't find anything about this particular problem. I use RabbitMQ as my broker.
You could have every celeryd instance consume from a queue named after the hostname of the worker:
celeryd -l info -n worker1.example.com -Q celery,worker1.example.com
sets the hostname to worker1.example.com and will consume from a queue named the same, as well as the default queue (named celery).
Then to direct a task to a specific worker you can use:
task.apply_async(args, kwargs, queue="worker1.example.com")
similary to direct a retry:
task.retry(queue="worker1.example.com")
or to direct the retry to the same worker:
task.retry(queue=task.request.hostname)

Categories