I am new to Python and Celery-Redis, so please correct me if my understanding is incorrect.
I have been debugging a code base which has a structure like this:
TaskClass -> Celery Task
HandlerClass1, HandlerClass2 -> Python classes extending object
The application creates a TaskClass instance, say dummyTask, and dummyTask creates Celery subtasks (I believe these subtasks are unique), say dummySubTask1 and dummySubTask2, by taking signatures of the handlers.
What I am not able to understand:
1) How does Celery manage the results of dummySubTask1, dummySubTask2 and dummyTask? I mean, the results of dummySubTask1 and dummySubTask2 should be aggregated and given as the result of dummyTask. How does Celery-Redis manage this?
2) Once a task is executed, how does Celery store task results in the backend? I mean, will the results of dummySubTask1 and dummySubTask2 be stored in the backend and then returned to dummyTask, and will dummyTask then return its result to the queue (please correct me if I am wrong)?
3) Does Celery maintain tasks and subtasks as a STACK? Please see the snapshot: Task-SubTask Tree
Any guidance is highly appreciated. Thanks.
A Celery worker can invoke tasks. A task can have subtasks, which can be chained together, i.e. invoked sequentially; 'chain' is the term used in the Celery canvas guide. The result is then written back to the result backend in Redis.
Celery workers are used to invoke independent tasks, mostly for network-bound use cases, e.g. sending an email or hitting a URL.
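To make the aggregation in question 1 concrete, here is a minimal sketch (not from the original code base; the broker/backend URLs and task names are assumptions) using a chord, Celery's canvas primitive for collecting parallel subtask results into one callback:
from celery import Celery, chord

app = Celery(__name__,
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/0")

@app.task
def dummy_subtask(n):
    # each subtask result is stored individually in the Redis result backend
    return n * 10

@app.task
def aggregate(results):
    # the chord body receives the list of subtask results once all of them finish
    return sum(results)

# the header subtasks run in parallel; Celery's chord bookkeeping in Redis
# fires the body once both results are stored
job = chord([dummy_subtask.s(1), dummy_subtask.s(2)])(aggregate.s())
# job.get() would return 30 after both subtasks complete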
You need to get it from the celery instance with
task = app_celery.AsyncResult(task_id)
Full example below.
My celery_worker.py file is:
import os
import time
from celery import Celery
from dotenv import load_dotenv
load_dotenv(".env")
celery = Celery(__name__)
celery.conf.broker_url = os.environ.get("CELERY_BROKER_URL")
celery.conf.result_backend = os.environ.get("CELERY_RESULT_BACKEND")
@celery.task(name="create_task")
def create_task(a, b, c):
    print(f"Executing create_task it will take {a}")
    [print(i) for i in range(100)]
    time.sleep(a)
    return b + c
I'm using FastAPI; my endpoints are:
# To execute the task
@app.get("/sum")
async def root(sleep_time: int, first_number: int, second_number: int):
    process = create_task.delay(sleep_time, first_number, second_number)
    return {"process_id": process.task_id, "result": process.result}
# To get the task status and result
from celery_worker import create_task, celery

@app.get("/task/{task_id}")
async def check_task_status(task_id: str):
    task = celery.AsyncResult(task_id)
    return {"status": task.status, "result": task.result}
My .env file has:
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/0
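To run this example, you would also start a worker against the same module, e.g. with celery -A celery_worker.celery worker --loglevel=info (assuming the file above is saved as celery_worker.py and the Redis instance from the .env is reachable).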
I would like to use Django signals to trigger a celery task like so:
def delete_content(sender, instance, **kwargs):
    task_id = uuid()
    task = delete_libera_contents.apply_async(kwargs={"instance": instance}, task_id=task_id)
    task.wait(timeout=300, interval=2)
But I'm always running into kombu.exceptions.EncodeError: Object of type MusicTracks is not JSON serializable
Now I'm not sure how to treat the MusicTracks instance, as it's a model class instance. How can I properly pass such instances to my task?
At my tasks.py I have the following:
#app.task(name="Delete Libera Contents", queue='high_priority_tasks')
def delete_libera_contents(instance, **kwargs):
libera_backend = instance.file.libera_backend
...
Never send an instance in a Celery task; you should only send plain values, for example the instance's primary key, and then, inside the Celery task, look the instance up by that pk and do your logic.
your code should be like this:
views.py
def delete_content(sender, instance, **kwargs):
    task_id = uuid()
    task = delete_libera_contents.apply_async(kwargs={"instance_pk": instance.pk}, task_id=task_id)
    task.wait(timeout=300, interval=2)
tasks.py
@app.task(name="Delete Libera Contents", queue='high_priority_tasks')
def delete_libera_contents(instance_pk, **kwargs):
    instance = Instance.objects.get(pk=instance_pk)
    libera_backend = instance.file.libera_backend
    ...
You can find this rule in the Celery documentation (I can't find the link). One reason: imagine the following situation:
you send your instance to a Celery task (and it is delayed, for whatever reason, for 5 minutes),
then your project applies some logic to this instance before the task has finished,
then the task's time comes and it works with the old version of the instance, so it ends up using stale, effectively corrupted data.
(This is the reason as I understand it, not taken from the documentation.)
First off, sorry for making the question a bit confusing, especially for the people that have already written an answer.
In my case, the delete_content signal can be triggered from three different models, so it actually looks like this:
@receiver(pre_delete, sender=MusicTracks)
@receiver(pre_delete, sender=Movies)
@receiver(pre_delete, sender=TvShowEpisodes)
def delete_content(sender, instance, **kwargs):
    delete_libera_contents.delay(instance_pk=instance.pk)
So every time one of these models triggers a delete action, this signal will also trigger a celery task to actually delete the stuff in the background (all stored on S3).
As I cannot and should not pass instances around directly, as pointed out by @oruchkin, I pass instance.pk to the Celery task, where I then have to look the instance up, since the task doesn't know which model triggered the delete action:
#app.task(name="Delete Libera Contents", queue='high_priority_tasks')
def delete_libera_contents(instance_pk, **kwargs):
if Movies.objects.filter(pk=instance_pk).exists():
instance = Movies.objects.get(pk=instance_pk)
elif MusicTracks.objects.filter(pk=instance_pk).exists():
instance = MusicTracks.objects.get(pk=instance_pk)
elif TvShowEpisodes.objects.filter(pk=instance_pk).exists():
instance = TvShowEpisodes.objects.get(pk=instance_pk)
else:
raise logger.exception("Task: 'Delete Libera Contents', reports: No instance found (code: JFN4LK) - Warning")
libera_backend = instance.file.libera_backend
You might ask why I do not simply pass the sender from the signal to the Celery task. I tried that as well and, as already pointed out, I cannot pass classes or instances, so it fails with:
kombu.exceptions.EncodeError: Object of type ModelBase is not JSON serializable
So it really seems I have to obtain the instance the hard way, using the if-elif-else clauses in the Celery task.
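As an aside, a hedged alternative sketch (not from the original answers) that avoids the if-elif-else chain by passing the model identity as plain strings via Django's ContentType framework; the model, task and queue names are reused from above, while the wiring itself is an assumption:
from django.contrib.contenttypes.models import ContentType


@receiver(pre_delete, sender=MusicTracks)
@receiver(pre_delete, sender=Movies)
@receiver(pre_delete, sender=TvShowEpisodes)
def delete_content(sender, instance, **kwargs):
    ct = ContentType.objects.get_for_model(sender)
    delete_libera_contents.delay(app_label=ct.app_label,
                                 model=ct.model,
                                 instance_pk=instance.pk)


@app.task(name="Delete Libera Contents", queue='high_priority_tasks')
def delete_libera_contents(app_label, model, instance_pk, **kwargs):
    # resolve the concrete model class from the string pair, then fetch the row
    model_cls = ContentType.objects.get(app_label=app_label, model=model).model_class()
    instance = model_cls.objects.get(pk=instance_pk)
    libera_backend = instance.file.libera_backend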
I am using Celery to execute my asynchronous tasks, and what I'm trying to achieve is to get the name and the id of each task in the workflow after I execute it.
exec_workflow = chain(
    task1.si(),
    task2.si(),
    task3.si()
)

result = exec_workflow.apply_async()

tasks = []
for t in result._parents():
    tasks.append({"id": t.id, "name": t.name})
But it seems that AsyncResult does not have a name property, for some strange reason. Any idea what the appropriate way to do this would be?
A different approach might be to force an id on each task before I execute apply_async; this would solve my problem, because I would be able to match each id to a task name. But I'm not sure if that's possible.
Thanks.
Not the best solution but it works.
result = signature.apply_async()
result._cache['task_name']
#'procedures.tasks.stop'
There is a configuration option result_extended in Celery for this purpose (it is set to False by default).
Enables extended task result attributes (name, args, kwargs, worker, retries, queue, delivery_info) to be written to backend.
Ref.:
https://docs.celeryproject.org/en/master/userguide/configuration.html#result-extended
Consumer example (Worker)
from typing import Final
from celery import Celery
app: Final = Celery(
    broker="amqp://...",
    result_backend="redis://...",
    result_extended=True,
)

@app.task(
    name="foo-service:bar"
)
def _() -> int:
    return 42
Producer example (Client)
from pprint import pprint
from typing import Final
from celery import Celery
from celery.result import AsyncResult
app: Final = Celery(broker="amqp://...", result_backend="redis://...")
result: AsyncResult = app.send_task("foo-service:bar")
assert result.get() == 42
assert result.name == "foo-service:bar"
assert result.queue == ...
assert result.args == ...
assert result.kwargs == ...
assert result.worker == ...
pprint(result.__dict__)
Alright, so I've solved my problem. What I did eventually was to just set the id property of each task.
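A minimal sketch of that approach (assuming the same task1/task2/task3 signatures as in the question; the Signature.set(task_id=...) wiring is an illustration, not the original code):
import uuid
from celery import chain

signatures = [task1.si(), task2.si(), task3.si()]
tasks = []
for sig in signatures:
    task_id = str(uuid.uuid4())
    sig.set(task_id=task_id)   # pre-assign the id so it can be matched to the name later
    tasks.append({"id": task_id, "name": sig.task})

result = chain(*signatures).apply_async()
# 'tasks' now maps each pre-assigned id to its task name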
I'm a newcomer to Celery and I'm trying to integrate this task queue into my project, but I still can't figure out how Celery handles failed tasks; I'd like to keep all of those in an AMQP dead-letter queue.
According to the docs, it seems that raising Reject in a Task with acks_late enabled produces the same effect as acking the message, and then there are a few words about dead-letter queues.
So I added a custom default queue to my celery config
celery_app.conf.update(CELERY_ACCEPT_CONTENT=['application/json'],
                       CELERY_TASK_SERIALIZER='json',
                       CELERY_QUEUES=[CELERY_QUEUE,
                                      CELERY_DLX_QUEUE],
                       CELERY_DEFAULT_QUEUE=CELERY_QUEUE_NAME,
                       CELERY_DEFAULT_EXCHANGE=CELERY_EXCHANGE
                       )
and my kombu objects look like this:
CELERY_DLX_EXCHANGE = Exchange(CELERY_DLX_EXCHANGE_NAME, type='direct')
CELERY_DLX_QUEUE = Queue(CELERY_DLX_QUEUE_NAME, exchange=CELERY_DLX_EXCHANGE,
                         routing_key='celery-dlq')

DEAD_LETTER_CELERY_OPTIONS = {'x-dead-letter-exchange': CELERY_DLX_EXCHANGE_NAME,
                              'x-dead-letter-routing-key': 'celery-dlq'}

CELERY_EXCHANGE = Exchange(CELERY_EXCHANGE_NAME,
                           arguments=DEAD_LETTER_CELERY_OPTIONS,
                           type='direct')

CELERY_QUEUE = Queue(CELERY_QUEUE_NAME,
                     exchange=CELERY_EXCHANGE,
                     routing_key='celery-q')
And the task I'm executing is:
class HookTask(Task):
    acks_late = True

    def run(self, ctx, data):
        logger.info('{0} starting {1.name}[{1.request.id}]'.format(self.__class__.__name__.upper(), self))
        self.hook_process(ctx, data)

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        logger.error('task_id %s failed, message: %s', task_id, exc.message)

    def hook_process(self, t_ctx, body):
        # Build context
        ctx = TaskContext(self.request, t_ctx)
        logger.info('Task_id: %s, handling request %s', ctx.task_id, ctx.req_id)
        raise Reject('no_reason', requeue=False)
I ran a little test with this, but got no result when raising a Reject exception.
Now I'm wondering if it's a good idea to force the failed task to route to the dead-letter queue by overriding Task.on_failure. I think this would work, but I also think this solution is not so clean, because according to what I read, Celery should do this all on its own.
Thanks for your help.
I think you should not add arguments=DEAD_LETTER_CELERY_OPTIONS in CELERY_EXCHANGE. You should add it to CELERY_QUEUE with queue_arguments=DEAD_LETTER_CELERY_OPTIONS.
The following example is what I did and it works fine:
from celery import Celery
from kombu import Exchange, Queue
from celery.exceptions import Reject

app = Celery(
    'tasks',
    broker='amqp://guest@localhost:5672//',
    backend='redis://localhost:6379/0')

dead_letter_queue_option = {
    'x-dead-letter-exchange': 'dlx',
    'x-dead-letter-routing-key': 'dead_letter'
}

default_exchange = Exchange('default', type='direct')
dlx_exchange = Exchange('dlx', type='direct')

default_queue = Queue(
    'default',
    default_exchange,
    routing_key='default',
    queue_arguments=dead_letter_queue_option)

dead_letter_queue = Queue(
    'dead_letter', dlx_exchange, routing_key='dead_letter')

app.conf.task_queues = (default_queue, dead_letter_queue)

app.conf.task_default_queue = 'default'
app.conf.task_default_exchange = 'default'
app.conf.task_default_routing_key = 'default'

@app.task
def add(x, y):
    return x + y

@app.task(acks_late=True)
def div(x, y):
    try:
        z = x / y
        return z
    except ZeroDivisionError as exc:
        raise Reject(exc, requeue=False)
After the queue is created, you should see in the 'Features' column that it shows the DLX (dead-letter-exchange) and DLK (dead-letter-routing-key) labels.
NOTE: You should delete the previous queues if you have already created them in RabbitMQ, because Celery won't delete an existing queue and re-create a new one.
I have a similar case and faced the same problems. I also wanted a solution that was based on configuration and not on hard-coded values. The proposed solution of Hengfeng Li was very helpful and helped me understand the mechanism and the concepts. But there was a problem with the declaration of dead-letter queues: specifically, if you injected the DLQ into task_queues, Celery consumed the queue and it was always empty. So a manual way of declaring the DL(X/Q) was needed.
I used Celery's Bootsteps, as they provide good control over the stage at which the code is run. My initial experiment was to create them right after the app creation, but this left a stalled connection after the forking of processes and raised an ugly exception. With a bootstep that runs exactly after the Pool step, you can be guaranteed that it runs at the beginning of each worker, after it is forked and the connection pool is ready.
Finally, I created a decorator that converts uncaught exceptions to task rejections by re-raising with Celery's Reject. Special care is taken for cases where a task has already decided how it should be handled, such as retries.
Here is a full working example. Try to run the task div.delay(1, 0) and see how it works.
from celery import Celery
from celery.exceptions import Reject, TaskPredicate
from functools import wraps
from kombu import Exchange, Queue
from celery import bootsteps


class Config(object):
    APP_NAME = 'test'

    task_default_queue = '%s_celery' % APP_NAME
    task_default_exchange = "%s_celery" % APP_NAME
    task_default_exchange_type = 'direct'
    task_default_routing_key = task_default_queue
    task_create_missing_queues = False
    task_acks_late = True

    # Configuration for DLQ support
    dead_letter_exchange = '%s_dlx' % APP_NAME
    dead_letter_exchange_type = 'direct'
    dead_letter_queue = '%s_dlq' % APP_NAME
    dead_letter_routing_key = dead_letter_queue


class DeclareDLXnDLQ(bootsteps.StartStopStep):
    """
    Celery Bootstep to declare the DL exchange and queues before the worker starts
    processing tasks
    """
    requires = {'celery.worker.components:Pool'}

    def start(self, worker):
        app = worker.app

        # Declare DLX and DLQ
        dlx = Exchange(
            app.conf.dead_letter_exchange,
            type=app.conf.dead_letter_exchange_type)

        dead_letter_queue = Queue(
            app.conf.dead_letter_queue,
            dlx,
            routing_key=app.conf.dead_letter_routing_key)

        with worker.app.pool.acquire() as conn:
            dead_letter_queue.bind(conn).declare()


app = Celery('tasks', broker='pyamqp://guest@localhost//')
app.config_from_object(Config)

# Declare default queues
# We bypass the default mechanism that creates queues in order to declare
# special queue arguments for DLX support
default_exchange = Exchange(
    app.conf.task_default_exchange,
    type=app.conf.task_default_exchange_type)

default_queue = Queue(
    app.conf.task_default_queue,
    default_exchange,
    routing_key=app.conf.task_default_routing_key,
    queue_arguments={
        'x-dead-letter-exchange': app.conf.dead_letter_exchange,
        'x-dead-letter-routing-key': app.conf.dead_letter_routing_key
    })

# Inject the default queue in celery application
app.conf.task_queues = (default_queue,)

# Inject extra bootstep that declares DLX and DLQ
app.steps['worker'].add(DeclareDLXnDLQ)


def onfailure_reject(requeue=False):
    """
    When a task has failed it will raise a Reject exception so
    that the message will be requeued or marked for insertion in the Dead Letter Exchange
    """
    def _decorator(f):
        @wraps(f)
        def _wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except TaskPredicate:
                raise  # Do not handle TaskPredicate like Retry or Reject
            except Exception as e:
                print("Rejecting")
                raise Reject(str(e), requeue=requeue)
        return _wrapper
    return _decorator


@app.task()
@onfailure_reject()
def div(x, y):
    return x / y
Edit: I updated the code to use the new configuration schema of celery (lower-case) as I found some compatibility issues in Celery 4.1.0.
I'm trying to get the state of a task as follows:
__init__.py
celery = Celery(app.name,backend='amqp',broker=app.config['CELERY_BROKER_URL'])
celery.conf.update(app.config)
foo.py
class Foo(object):
    def bar(self):
        task = self._bar_async.apply_async()
        return task.id

    @celery.task(filter=task_method, bind=True)
    def _bar_async(task, self):
        for i in range(0, 100):
            task.update_state(state='PROGRESS', meta={'progress': i})
            time.sleep(2)
taskstatus.py
def taskstatus(task_id):
    task = celery.AsyncResult(id=task_id)
Is this the recommended way to use update_state with bind?
Also, when I try to get the state of the task using taskstatus, I always get NoneType for the task. What is the problem?
There are two issues in your code.
First, pass self as an argument to the apply_async method:
def bar(self):
    task = self._bar_async.apply_async([self])
This change will fix the "NoneType for task" issue. The reason is that the task fails in the worker, so you cannot get the result.
Second, you should use app.backend.get_result in taskstatus() to see the progress instead of AsyncResult, since AsyncResult.get() will block until the task status becomes ready.
from apps import celery

app = celery.app
r = app.backend.get_result(task_id)
print(r)
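For completeness, a small polling sketch built on the same idea (assuming the same apps module and the task id returned by Foo.bar() above); get_result returns the meta dict written by update_state while the state is PROGRESS, without blocking the way AsyncResult.get() does:
from apps import celery

app = celery.app

def taskstatus(task_id):
    # e.g. {'progress': 42} while the task is running, or the return value once finished
    return app.backend.get_result(task_id)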
The Celery documentation mentions testing Celery within Django but doesn't explain how to test a Celery task if you are not using Django. How do you do this?
It is possible to test tasks synchronously using any unit test library out there. I normally do two different test sessions when working with Celery tasks. The first one (as I'm suggesting below) is completely synchronous and should be the one that makes sure the algorithm does what it should do. The second session uses the whole system (including the broker) and makes sure I'm not having serialization issues or any other distribution or communication problem.
So:
from celery import Celery

celery = Celery()

@celery.task
def add(x, y):
    return x + y
And your test:
from nose.tools import eq_

def test_add_task():
    rst = add.apply(args=(4, 4)).get()
    eq_(rst, 8)
Here is an update to my seven years old answer:
You can run a worker in a separate thread via a pytest fixture:
https://docs.celeryq.dev/en/v5.2.6/userguide/testing.html#celery-worker-embed-live-worker
According to the docs, you should not use "always_eager" (see the top of the page of the above link).
Old answer:
I use this:
with mock.patch('celeryconfig.CELERY_ALWAYS_EAGER', True, create=True):
    ...
Docs: https://docs.celeryq.dev/en/3.1/configuration.html#celery-always-eager
CELERY_ALWAYS_EAGER lets you run your task synchronously, and you don't need a celery server.
It depends on what exactly you want to be testing.
1) Test the task code directly: don't call task.delay(...), just call task(...) from your unit tests.
2) Use CELERY_ALWAYS_EAGER. This will cause your tasks to be called immediately at the point where you say task.delay(...), so you can test the whole path (but not any asynchronous behavior).
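For illustration, a minimal sketch of both options (task_always_eager is the Celery 4+ name of this setting; the app and add task below are placeholders, not from any of the answers above):
from celery import Celery

app = Celery()

@app.task
def add(x, y):
    return x + y

# Option 1: call the task function directly; no broker or worker involved
def test_add_directly():
    assert add(2, 3) == 5

# Option 2: eager mode makes .delay() run the task inline and return an EagerResult
def test_add_eager():
    app.conf.task_always_eager = True
    assert add.delay(2, 3).get() == 5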
For those on Celery 4 it's:
@override_settings(CELERY_TASK_ALWAYS_EAGER=True)
because the setting names have changed and need updating if you choose to upgrade; see
https://docs.celeryproject.org/en/latest/history/whatsnew-4.0.html?highlight=what%20is%20new#lowercase-setting-names
unittest
import unittest
from myproject.myapp import celeryapp

class TestMyCeleryWorker(unittest.TestCase):
    def setUp(self):
        celeryapp.conf.update(CELERY_ALWAYS_EAGER=True)
py.test fixtures
# conftest.py
import pytest
from myproject.myapp import celeryapp

@pytest.fixture(scope='module')
def celery_app(request):
    celeryapp.conf.update(CELERY_ALWAYS_EAGER=True)
    return celeryapp

# test_tasks.py
def test_some_task(celery_app):
    ...
Addendum: make send_task respect eager
from celery import current_app

def send_task(name, args=(), kwargs={}, **opts):
    # https://github.com/celery/celery/issues/581
    task = current_app.tasks[name]
    return task.apply(args, kwargs, **opts)

current_app.send_task = send_task
As of Celery 3.0, one way to set CELERY_ALWAYS_EAGER in Django is:
from django.test import TestCase, override_settings
from .foo import foo_celery_task
class MyTest(TestCase):

    @override_settings(CELERY_ALWAYS_EAGER=True)
    def test_foo(self):
        self.assertTrue(foo_celery_task.delay())
Since Celery v4.0, py.test fixtures are provided to start a celery worker just for the test and are shut down when done:
def test_myfunc_is_executed(celery_session_worker):
    # celery_session_worker: <Worker: gen93553@mymachine.local (running)>
    assert myfunc.delay().wait(3)
Among other fixtures described on http://docs.celeryproject.org/en/latest/userguide/testing.html#py-test, you can change the celery default options by redefining the celery_config fixture this way:
@pytest.fixture(scope='session')
def celery_config():
    return {
        'accept_content': ['json', 'pickle'],
        'result_serializer': 'pickle',
    }
By default, the test worker uses an in-memory broker and result backend. No need to use a local Redis or RabbitMQ if not testing specific features.
reference
Using pytest:
def test_add(celery_worker):
    mytask.delay()
If you use Flask, set the app config:
CELERY_BROKER_URL = 'memory://'
CELERY_RESULT_BACKEND = 'cache+memory://'
and in conftest.py
@pytest.fixture
def app():
    yield app  # Your actual Flask application

@pytest.fixture
def celery_app(app):
    from celery.contrib.testing import tasks  # need it
    yield celery_app  # Your actual Flask-Celery application
In my case (and I assume many others), all I wanted was to test the inner logic of a task using pytest.
TL;DR; ended up mocking everything away (OPTION 2)
Example Use Case:
proj/tasks.py
@shared_task(bind=True)
def add_task(self, a, b):
    return a + b
tests/test_tasks.py
from proj import add_task

def test_add():
    assert add_task(1, 2) == 3, '1 + 2 should equal 3'
But since the shared_task decorator does a lot of Celery-internal logic, it isn't really a unit test.
So, for me, there were 2 options:
OPTION 1: Separate internal logic
proj/tasks_logic.py
def internal_add(a, b):
    return a + b
proj/tasks.py
from .tasks_logic import internal_add

@shared_task(bind=True)
def add_task(self, a, b):
    return internal_add(a, b)
This looks very odd, and besides making it less readable, it requires manually extracting and passing attributes that are part of the request, for instance the task_id in case you need it, which makes the logic less pure.
OPTION 2: mocks
mocking away celery internals
tests/__init__.py
# noinspection PyUnresolvedReferences
from celery import shared_task
from mock import patch


def mock_signature(**kwargs):
    return {}


def mocked_shared_task(*decorator_args, **decorator_kwargs):
    def mocked_shared_decorator(func):
        func.signature = func.si = func.s = mock_signature
        return func
    return mocked_shared_decorator


patch('celery.shared_task', mocked_shared_task).start()
This then allows me to mock the request object (again, in case you need things from the request, like the id or the retries counter).
tests/test_tasks.py
from proj import add_task


class MockedRequest:
    def __init__(self, id=None):
        self.id = id or 1


class MockedTask:
    def __init__(self, id=None):
        self.request = MockedRequest(id=id)


def test_add():
    mocked_task = MockedTask(id=3)
    assert add_task(mocked_task, 1, 2) == 3, '1 + 2 should equal 3'
This solution is much more manual, but it gives me the control I need to actually unit test, without repeating myself and without losing the Celery scope.
I see a lot of CELERY_ALWAYS_EAGER = True in unit test methods as a solution, but since version 5.0.5 became available there have been a lot of changes that make most of the old answers deprecated (and, for me, a time-consuming dead end), so for everyone here searching for a solution: go to the docs and read the well-documented unit test examples for the new version:
https://docs.celeryproject.org/en/stable/userguide/testing.html
And regarding eager mode with unit tests, here is a quote from the actual docs:
Eager mode
The eager mode enabled by the task_always_eager setting is by
definition not suitable for unit tests.
When testing with eager mode you are only testing an emulation of what
happens in a worker, and there are many discrepancies between the
emulation and what happens in reality.
Another option is to mock the task if you do not need the side effects of running it.
from unittest import mock

@mock.patch('module.module.task')
def test_name(self, mock_task): ...
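For example, a hedged sketch (the myproject.tasks module, the send_report task and the code under test are all hypothetical) showing how the mocked task can be asserted on without running Celery at all:
from unittest import mock

import myproject.tasks  # hypothetical module defining a send_report task


def generate_report():
    # hypothetical code under test: it only queues the task
    myproject.tasks.send_report.delay("monthly")


@mock.patch('myproject.tasks.send_report')
def test_generate_report_queues_task(mock_task):
    generate_report()
    mock_task.delay.assert_called_once_with("monthly")  # queued, never executed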