Django Celery & Django-Celery-Beat - python

I'm new to asynchronous tasks; I'm using django-celery and was hoping to use django-celery-beat to schedule periodic tasks.
However, it looks like celery beat doesn't pick up one-off tasks. Do I need two Celery instances, one as a worker for one-off tasks and one as beat for scheduled tasks, for this to work?

Pass the -B parameter to your worker; it tells the worker to also run the beat schedule. That worker will handle all the other tasks too, both the ones sent from beat and the "one-off" ones; it really doesn't matter to the worker.
So the full command looks like:
celery -A flock.celery worker -l DEBUG -B -E

If you have multiple periodic tasks executing every 10 seconds, for example, then they should all point to the same schedule object (see the django-celery-beat documentation).
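If you are using django-celery-beat, the database-backed way to do this is to create one schedule row and point several PeriodicTask rows at it. A minimal sketch (the myapp.tasks.* task paths are placeholders, not from the question):

from django_celery_beat.models import IntervalSchedule, PeriodicTask

# one shared schedule object: every 10 seconds
schedule, _ = IntervalSchedule.objects.get_or_create(
    every=10,
    period=IntervalSchedule.SECONDS,
)

# several periodic tasks pointing at the same schedule
PeriodicTask.objects.get_or_create(
    interval=schedule,
    name='refresh cache',
    task='myapp.tasks.refresh_cache',  # placeholder task path
)
PeriodicTask.objects.get_or_create(
    interval=schedule,
    name='send digest',
    task='myapp.tasks.send_digest',    # placeholder task path
)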

Related

Setup celery periodic task

How do I set up a periodic task with Celerybeat and Flask that queries a database every hour?
The environment looks like this:
/
|-app
|  |-__init__.py
|  |-jobs
|  |  |-task.py
|-celery-beat.sh
|-celery-worker.sh
|-manage.py
I currently have a query function called run_query() located in task.py
I want the scheduler to kick in once the application initiates, so I have the following lines in my /app/__init__.py file:
celery = Celery()

@celery.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(1, app.jobs.task.run_query())
(For simplicity's sake, I've set it up so that if it runs, it will run every minute. No such luck yet.)
When I launch celery-worker.sh, it recognizes my function under the [tasks] heading, but the scheduled function never runs. I can manually force the function to run by issuing the following at the command prompt:
>> from app.jobs import task
>> task.run_query.delay()
EDIT: Added celerybeat.sh
As a follow-up: if the database is accessed through a Flask context, is it wise to create a new Flask context to access the database during my async function call? Use the existing Flask context? Or forget contexts altogether and just open a connection to the database? My worry is that if I just open a new connection, it may interfere with the existing context's connection.
To run periodic tasks you need some kind of scheduler (e.g. celery beat).
celery beat is a scheduler; it kicks off tasks at regular intervals, which are then executed by available worker nodes in the cluster.
You have to ensure only a single scheduler is running for a schedule at a time, otherwise you'd end up with duplicate tasks. Using a centralized approach means the schedule doesn't have to be synchronized, and the service can operate without using locks.
Reference: periodic-tasks
You can invoke the scheduler with the command:
$ celery -A proj beat #different process from your worker
You can also embed beat inside the worker by enabling the worker's -B option. This is convenient if you'll never run more than one worker node, but it's not commonly used and for that reason isn't recommended for production use. Starting the worker with the embedded scheduler:
$ celery -A proj worker -B
Reference: celery starting scheduler
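For reference, a minimal sketch of how that periodic registration usually looks with on_after_configure: the task is passed as a signature (.s()) rather than being called, and the interval is given in seconds. The module path app.jobs.task comes from the question above; everything else here is an illustrative assumption, not the asker's actual code.

from celery import Celery
from app.jobs.task import run_query  # path from the question; assumed to be a registered task

celery = Celery()

@celery.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # pass a signature, don't call the task; 60.0 seconds = once a minute
    sender.add_periodic_task(60.0, run_query.s(), name='run query every minute')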

Stop celery workers from consuming from all queues

I have a Celery setup running in a production environment (on Linux) where I need to consume two different task types from two dedicated queues (one for each). The problem is that all workers are always bound to both queues, even when I specify that they should only consume from one of them.
TL;DR
Celery running with 2 queues
Messages are published in correct queue as designed
Workers keep consuming both queues
Leads to deadlock
General Information
Think of my two different task types as a hierarchical setup:
A task is a regular celery task that may take quite some time, because it dynamically dispatches other celery tasks and may be required to chain through their respective results
A node is a dynamically dispatched sub-task, which also is a regular celery task but itself can be considered an atomic unit.
My task thus can be a more complex setup of nodes, where the results of one or more nodes serve as input for one or more subsequent nodes, and so on. Since my tasks can take longer and only finish when all of their nodes have been deployed, it is essential that they are handled by dedicated workers so that a sufficient number of workers remains free to consume the nodes. Otherwise the system could get stuck: when a lot of tasks are dispatched, each is consumed by another worker, while their respective nodes are only queued and never consumed, because all workers are blocked.
If this is a bad design in general, please suggest how I can improve it. I did not yet manage to build one of these processes using Celery's built-in canvas primitives. Help me, if you can?!
Configuration/Setup
I run celery with amqp and have set up the following queues and routes in the celery configuration:
CELERY_QUERUES = (
    Queue('prod_nodes', Exchange('prod'), routing_key='prod.node'),
    Queue('prod_tasks', Exchange('prod'), routing_key='prod.task')
)
CELERY_ROUTES = (
    'deploy_node': {'queue': 'prod_nodes', 'routing_key': 'prod.node'},
    'deploy_task': {'queue': 'prod_tasks', 'routing_key': 'prod.task'}
)
When I launch my workers, I issue a call similar to the following:
celery multi start w_task_01 w_node_01 w_node_02 -A my.deployment.system \
-E -l INFO -P gevent -Q:1 prod_tasks -Q:2-3 prod_nodes -c 4 --autoreload \
--logfile=/my/path/to/log/%N.log --pidfile=/my/path/to/pid/%N.pid
The Problem
My queue and routing setup seems to work properly, as I can see messages being correctly queued in the RabbitMQ Management web UI.
However, all workers always consume Celery tasks from both queues. I can see this when I open the flower web UI and inspect one of the deployed tasks: e.g. w_node_01 starts consuming messages from the prod_tasks queue, even though it shouldn't.
The RabbitMQ Management web UI furthermore tells me, that all started workers are set up as consumers for both queues.
Thus, I ask you... what did I do wrong?
Where is the issue with my setup or worker start call? How can I prevent the workers from always consuming from both queues? Do I really have to apply additional settings at runtime (which I certainly do not want to)?
Thanks for your time and answers!
You can create 2 separate workers, one for each queue, and for each one define which queue it should get tasks from using the -Q command-line argument.
If you want to keep the number of processes the same (by default one process is started per CPU core for each worker), you can use the --concurrency flag (see the Celery docs for more info).
Celery allows configuring a worker with a specific queue.
1) Specify the name of the queue with the 'queue' argument for different types of jobs:
celery.send_task('job_type1', args=[], kwargs={}, queue='queue_name_1')
celery.send_task('job_type2', args=[], kwargs={}, queue='queue_name_2')
2) Add the following entry to the configuration file:
CELERY_CREATE_MISSING_QUEUES = True
3) When starting the worker, pass -Q queue_name as an argument so it consumes only from the desired queue:
celery -A proj worker -l info -Q queue_name_1 -n worker1
celery -A proj worker -l info -Q queue_name_2 -n worker2
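For reference, in Celery 3.x-style settings the kind of routing described above is usually declared with CELERY_QUEUES as a tuple of kombu Queue objects and CELERY_ROUTES as a dict. A rough sketch reusing the queue and task names from the question above:

from kombu import Exchange, Queue

CELERY_QUEUES = (
    Queue('prod_nodes', Exchange('prod'), routing_key='prod.node'),
    Queue('prod_tasks', Exchange('prod'), routing_key='prod.task'),
)

# mapping of task name -> routing options
CELERY_ROUTES = {
    'deploy_node': {'queue': 'prod_nodes', 'routing_key': 'prod.node'},
    'deploy_task': {'queue': 'prod_tasks', 'routing_key': 'prod.task'},
}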

Celery Tasks with eta get removed from RabbitMQ

I'm using Django 1.6, RabbitMQ 3.5.6, celery 3.1.19.
There is a periodic task which runs every 30 seconds and creates 200 tasks with a given eta parameter. After I run the celery worker, the queue slowly builds up in RabbitMQ and I see around 1200 scheduled tasks waiting to be fired. Then I restart the celery worker, and all of the 1200 waiting scheduled tasks get removed from RabbitMQ.
How I create tasks:
my_task.apply_async((arg1, arg2), eta=my_object.time_in_future)
I run the worker like this:
python manage.py celery worker -Q my_tasks_1 -A my_app -l
CELERY_ACKS_LATE is set to True in Django settings. I couldn't find any possible reason.
Should I run the worker with a different configuration/flag/parameter? Any idea?
As far as I know, Celery does not rely on RabbitMQ's scheduled queues; it implements ETA/countdown internally.
It seems that you have enough workers to fetch all of those messages and schedule them internally.
Mind that you don't need 200 workers: with the prefetch multiplier at its default value, each worker reserves several messages at once, so you need fewer.
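The two settings mentioned in this thread are plain configuration values; a small sketch of how they are typically set in a Celery 3.1-era Django settings module (the multiplier value of 1 is an illustrative choice, not something stated in the answer):

# settings.py (Celery 3.x setting names)
CELERY_ACKS_LATE = True            # acknowledge messages only after the task has run
CELERYD_PREFETCH_MULTIPLIER = 1    # limit how many messages each worker process reserves ahead of time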

How can I configure celery task

I am using Celery in a project, where I am using it as a scheduler (for periodic tasks).
My Celery task looks like:
@periodic_task(run_every=timedelta(seconds=300))
def update_all_feed():
    feed_1()
    feed_2()
    ...........
    feed_n()
But as the number of feeds increases, it takes a long time to get to the later feeds (e.g. when Celery is working on feed number n, it takes a long time to get to the next feed, n+1). I want to use Celery's concurrency to start multiple feeds.
After going through the docs, I found I can call a celery task like below:
feed.delay()
How can I configure Celery so that it gets all the feed ids and aggregates them (for example, 5 feeds at a time)? I realize that to achieve this I will have to run Celery as a daemon.
N.B.: I am using MongoDB as the broker; all I did was install it and add the URL to Celery's config.
You can schedule all your feeds like this:
@periodic_task(run_every=timedelta(seconds=300))
def update_all_feed():
    feed_1.delay()
    feed_2.delay()
    .......
    feed_n.delay()
or you can use a group to simplify it:
from celery import group

@periodic_task(run_every=timedelta(seconds=300))
def update_all_feed():
    group(feed.s(i) for i in range(10))()
Now, to run the tasks, start a worker with the beat scheduler enabled:
celery worker -A your_app -l info --beat
This starts executing your task every five minutes. The default concurrency is equal to the number of CPU cores, but you can change it; if you want to execute 10 tasks at a time concurrently:
celery worker -A your_app -l info --beat -c 10
From the Celery documentation:
from celery.task.sets import TaskSet
from .tasks import feed, get_feed_ids

job = TaskSet(tasks=[
    feed.subtask((feed_id,)) for feed_id in get_feed_ids()
])

result = job.apply_async()
results = result.join()  # There's more in the documentation
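TaskSet is the legacy API; on Celery 3.x and later the same pattern is usually written as a group of signatures. A rough equivalent, assuming the same feed task and get_feed_ids helper as in the snippet above:

from celery import group
from .tasks import feed, get_feed_ids

job = group(feed.s(feed_id) for feed_id in get_feed_ids())
result = job.apply_async()
results = result.join()  # blocks until every feed task has finished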

How to delete tasks from celery task queue?

How can I delete all tasks in a queue right after a task has ended?
I want something like this (Deleting all pending tasks in celery / rabbitmq) but for celery 3.0.
Thanks
EDIT:
From the Celery documentation:
http://docs.celeryproject.org/en/latest/faq.html#how-do-i-purge-all-waiting-tasks
My code looks like:
from celery import current_app as celery

@task
def task_a():
    celery.control.purge()
I was expecting that, if I issued 5 tasks, only the first would run. Somehow, it's not doing that.
Thanks
Those tasks might already have been prefetched by the workers. To find out whether this is the case, try running more tasks than the number of active workers multiplied by the prefetch multiplier, and check what result celery.control.purge() returns. You can control the number of prefetched tasks using the config parameters CELERYD_PREFETCH_MULTIPLIER and CELERY_ACKS_LATE.
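Note that celery.control.purge() only discards messages that are still sitting in the broker queue; messages a worker has already prefetched (reserved) are not affected. A hedged sketch of how you might revoke those as well, reusing the current_app handle from the question (the helper function name is made up for illustration):

from celery import current_app as celery

def purge_pending_and_reserved():
    # drop messages still waiting in the broker queue
    purged = celery.control.purge()

    # tasks already prefetched by workers are not touched by purge();
    # revoking them by id makes the workers skip them instead of running them
    reserved = celery.control.inspect().reserved() or {}
    for worker_name, tasks in reserved.items():
        for task in tasks:
            celery.control.revoke(task['id'])

    return purged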
