I am using Celery in a project as a scheduler (for periodic tasks).
My Celery task looks like:
@periodic_task(run_every=timedelta(seconds=300))
def update_all_feed():
    feed_1()
    feed_2()
    ...........
    feed_n()
But as the number of feeds increases, it takes a long time to get to the later feeds (e.g. while Celery is working on feed n, it takes a long time to get to feed n+1). I want to use Celery's concurrency to start multiple feeds at once.
After going through the docs, I found that I can call a Celery task as follows:
feed.delay()
How can I configure Celery so that it gets all the feed ids and processes them concurrently (for example, 5 feeds at a time)? I realize that to achieve this I will have to run Celery as a daemon.
N.B.: I am using MongoDB as the broker; all I did was install it and add its URL to Celery's config.
You can schedule all your feeds like this:
@periodic_task(run_every=timedelta(seconds=300))
def update_all_feed():
    feed_1.delay()
    feed_2.delay()
    .......
    feed_n.delay()
or you can use a group to simplify it:
from celery import group

@periodic_task(run_every=timedelta(seconds=300))
def update_all_feed():
    group(feed.s(i) for i in range(10)).apply_async()
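For context, a minimal sketch of what the feed task assumed above might look like (the task body is a placeholder, not part of the original answer):

from celery import shared_task

@shared_task
def feed(feed_id):
    """Update a single feed; many of these can run concurrently across workers."""
    # placeholder for the real fetch/parse/save logic
    print("updating feed %s" % feed_id)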
Now, to execute the tasks, you can start a worker:
celery worker -A your_app -l info --beat
This starts executing your task every five minutes. However, the default concurrency is equal to the number of CPU cores. You can also change the concurrency; if you want to execute 10 tasks concurrently, then:
celery worker -A your_app -l info --beat -c 10
From the Celery documentation:
from celery.task.sets import TaskSet
from .tasks import feed, get_feed_ids

job = TaskSet(tasks=[
    feed.subtask((feed_id,)) for feed_id in get_feed_ids()
])

result = job.apply_async()
results = result.join()  # There's more in the documentation
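Note that TaskSet comes from older Celery releases; in current versions the rough equivalent is a group:

from celery import group
from .tasks import feed, get_feed_ids

job = group(feed.s(feed_id) for feed_id in get_feed_ids())

result = job.apply_async()
results = result.join()  # same join semantics as above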
I'm new to asynchronous tasks. I'm using django-celery and was hoping to use django-celery-beat to schedule periodic tasks.
However, it looks like celery-beat doesn't pick up one-off tasks. Do I need two Celery instances, one as a worker for one-off tasks and one as beat for scheduled tasks, for this to work?
Pass the -B parameter to your worker; it tells the worker to also run the beat schedule. That worker will then handle all tasks, both the ones sent from beat and the "one-off" ones; it really doesn't matter to the worker.
So the full command looks like:
celery -A flock.celery worker -l DEBUG -B -E
If you have multiple periodic tasks executing, for example, every 10 seconds, then they should all point to the same schedule object; see the Celery documentation on periodic tasks and the sketch below.
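A minimal sketch of what "pointing to the same schedule object" can look like in the beat_schedule config (the app and task names here are hypothetical):

from datetime import timedelta
from celery import Celery

app = Celery('myapp')

# one shared schedule object reused by several periodic tasks
every_ten_seconds = timedelta(seconds=10)

app.conf.beat_schedule = {
    'task-one-every-10s': {
        'task': 'myapp.tasks.task_one',  # hypothetical task
        'schedule': every_ten_seconds,
    },
    'task-two-every-10s': {
        'task': 'myapp.tasks.task_two',  # hypothetical task
        'schedule': every_ten_seconds,
    },
}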
I have two workers:
celery worker -l info --concurrency=2 -A o_broker -n main_worker
celery worker -l info --concurrency=2 -A o_broker -n second_worker
I am using Flower to monitor these workers and to receive API requests:
flower -A o_broker
To launch tasks on these workers from an API, I use Flower per the docs:
curl -X POST -d '{"args":[1,2]}' 'http://localhost:5555/api/task/async-apply/o_broker.add'
However, with this POST request the task runs on either one of the workers. I need to choose a specific worker to run the task.
How do I specify or set this up so I can choose which worker to use for the add task? If you have a solution using another API without Flower, that would also work.
The easiest way to achieve this is with separate queues. Start the first worker with -Q first_worker,celery and the second worker with -Q second_worker,celery. celery is the default queue name in Celery.
Now, when you want to send a task to just the first worker, you route the task to the first_worker queue using Celery's task_routes setting; routing tasks to the second_worker queue works the same way. You can also manually route a particular task call to a certain queue when using apply_async, e.g.:
add.apply_async(args=(1, 2), queue='first_worker')
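For reference, a hedged sketch of the task_routes approach, assuming the Celery app lives in o_broker and the task is registered as o_broker.add:

from celery import Celery

app = Celery('o_broker')

# send o_broker.add to the queue consumed by the first worker
app.conf.task_routes = {
    'o_broker.add': {'queue': 'first_worker'},
}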
N.B.: last I checked, Flower will only monitor one of your queues (by default it's the celery queue).
How do I set up a periodic task with Celerybeat and Flask that queries a database every hour?
The environment looks like this:
/
|-app
|  |-__init__.py
|  |-jobs
|  |  |-task.py
|-celery-beat.sh
|-celery-worker.sh
|-manage.py
I currently have a query function called run_query() located in task.py
I want the scheduler to kick in once the application starts, so I have the following lines in my /app/__init__.py file:
celery = Celery()

@celery.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(1, app.jobs.task.run_query())
(For simplicity's sake, I've set it up so that if it runs, it will run every minute. No such luck yet.)
When I launch celery-worker.sh, it recognizes my function under the [tasks] heading, but the scheduled function never runs. I can manually force the function to run by issuing the following at the command prompt:
>>> from app.jobs import task
>>> task.run_query.delay()
EDIT: Added celerybeat.sh
As a follow-up: if the database is accessed through a Flask application context, is it wise, during my async task, to create a new Flask context to access the database? Should I use the existing context? Or should I forget contexts altogether and just open a new connection to the database? My worry is that a new connection might interfere with the existing context's connection.
To run periodic tasks you need some kind of scheduler (e.g. celery beat).
celery beat is a scheduler; it kicks off tasks at regular intervals, which are then executed by available worker nodes in the cluster.
You have to ensure only a single scheduler is running for a schedule at a time, otherwise you'd end up with duplicate tasks. Using a centralized approach means the schedule doesn't have to be synchronized, and the service can operate without using locks.
Reference: periodic-tasks
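For the hourly database query in the question, a minimal beat_schedule sketch (assuming the app object celery from the question and that run_query is registered as app.jobs.task.run_query):

from celery.schedules import crontab

celery.conf.beat_schedule = {
    'hourly-run-query': {
        'task': 'app.jobs.task.run_query',  # assumed registered task name
        'schedule': crontab(minute=0),      # at the top of every hour
    },
}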
You can invoke the scheduler with the command:
$ celery -A proj beat  # a different process from your worker
You can also embed beat inside the worker by enabling the worker's -B option. This is convenient if you'll never run more than one worker node, but it's not commonly used and for that reason isn't recommended for production use. Starting the scheduler:
$ celery -A proj worker -B
Reference: celery starting scheduler
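As an aside on the signal-based setup from the question, note that the task must be passed as a signature (run_query.s()) rather than called; a hedged sketch:

from app.jobs.task import run_query

@celery.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # pass a signature, don't call the task here
    sender.add_periodic_task(3600.0, run_query.s(), name='run query every hour')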
How do you limit the number of instances of a specific Celery task that can be run simultaneously?
I have a task that processes large files. I'm running into a problem where a user may launch several tasks, causing the server to run out of CPU and memory as it tries to process too many files at once. I want to ensure that only N instances of this one type of task are run at any given time, and that other tasks will sit queued in the scheduler until the others complete.
I see there's a rate_limit option in the task decorator, but I don't think this does what I want. If I'm understanding the docs correctly, it will just limit how quickly the tasks are launched, but it won't restrict the overall number of tasks running, so it will just make my server crash more slowly...but it will still crash nonetheless.
You have to set up an extra queue and set the desired concurrency level for it. From Routing Tasks:
# Old config style
CELERY_ROUTES = {
    'app.tasks.limited_task': {'queue': 'limited_queue'}
}
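In the newer lowercase config style, the same route would look roughly like this:

celery.conf.task_routes = {
    'app.tasks.limited_task': {'queue': 'limited_queue'}
}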
or
from kombu import Exchange, Queue

default_exchange = Exchange('default', type='direct')

celery.conf.task_queues = (
    Queue('default', default_exchange, routing_key='default'),
    Queue('limited_queue', default_exchange, routing_key='limited_queue')
)
And start an extra worker, serving only limited_queue:
$ celery -A celery_app worker -Q limited_queue --loglevel=info -c 1 -n limited_queue
Then you can check that everything is running smoothly using Flower or the inspect command:
$ celery -A celery_app inspect --help
What you can do is push these tasks to a specific queue and have X workers processing them. Having two workers (each with a concurrency of one) on a queue with 100 items ensures that only two tasks are processed at the same time.
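A hedged sketch of that setup (the queue and app names are placeholders):

celery -A your_app worker -Q file_queue -c 1 -n file_worker1@%h
celery -A your_app worker -Q file_queue -c 1 -n file_worker2@%h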
I am not sure you can do that in Celery itself. What you can do is check how many tasks of that name are currently running when a request arrives, and if it exceeds the maximum, either return an error or add a mechanism that periodically checks whether there are open slots for the task and runs it. (If you add such a mechanism, you don't need to double-check; just add each request to its queue.)
In order to check running tasks, you can use the inspect command.
In short:
app = Celery(...)
i = app.control.inspect()
i.active()
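Building on that, a minimal sketch that counts the currently running instances of a given task across all workers (the task name below is a placeholder):

def count_running(app, task_name):
    """Return the number of currently executing tasks with the given name."""
    active = app.control.inspect().active() or {}
    return sum(
        1
        for worker_tasks in active.values()
        for t in worker_tasks
        if t['name'] == task_name
    )

# e.g. count_running(app, 'app.tasks.process_large_file')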
I'm using Celery + RabbitMQ.
When a Celery worker isn't available, all the tasks wait in RabbitMQ.
As soon as the worker comes back online, this whole bunch of tasks is executed immediately.
Can I somehow prevent that from happening?
For example, if there are 100 identical tasks waiting for a Celery worker, can I execute only 1 of them when a worker comes online?
Since all the tasks in your queue are the same, a better way to do this is to send the task only once. To do this, you need to be able to track that the task was published, for example:
Using a lock; see, for example, "Ensuring a task is only executed one at a time" in the Celery docs.
Using a custom task ID and a custom state after the task is published, for example:
To add a custom state when the task is published:
from celery import current_app
from celery.signals import after_task_publish

@after_task_publish.connect
def add_sent_state(sender=None, body=None, **kwargs):
    """Track published tasks."""
    # get the task instance from its name
    task = current_app.tasks.get(sender)
    # if there is no task.backend, fall back to app.backend
    backend = task.backend if task else current_app.backend
    # store the task state
    backend.store_result(body['id'], None, 'SENT')
When you want to send the task, you can check whether it has already been published. Since we're using a custom state, the task's state won't be PENDING once it has been published (PENDING can also mean unknown), so we can check using:
from celery import states

# the task has a custom ID
task = task_func.AsyncResult('CUSTOM_ID')

if task.state != states.PENDING:
    # the task already exists
    pass
else:
    # send the task
    task_func.apply_async(args, kwargs, task_id='CUSTOM_ID')
I'm using this approach in my app and it's working great; my tasks could be sent multiple times, but since they are identified by their IDs, each task is sent only once.
If you still want to cancel all the tasks in the queue, you can use:
# import your Celery instance
from project.celery import app
app.control.purge()
Check the Celery FAQ: "How do I purge all waiting tasks?"
There are two ways to do this.
First, run only one worker with a concurrency of one.
celery worker -A your_app -l info -c 1
This command starts a worker with a concurrency of one. So only one task will be executed at a time. This is the preferred way to do it.
The second method is a bit more complicated: you need to acquire and release a lock to make sure only one task is executed at a time.
Alternatively, if you want, you can remove all the tasks from queue using purge command.
celery -A your_app purge