I am using Celery version 3.1.17.
Normally you can build static workflows in Celery with canvas primitives such as chain, group and chord, or simply by linking tasks. You can access the result or any property (such as the task id) of any task in the workflow, but the tasks have to be predefined.
I am doing dynamic subtasking by calling subtasks from inside my workflow. For example, I call a task (or a canvas primitive) that decides the logic at runtime and calls subtasks according to that decision. With this approach, however, there is no parent/child relation between my static workflow tasks and the dynamic subtasks, so I cannot track them. This is really frustrating. Here is my current, unusable approach:
class ParentTask(Task):
    def run(self, *args, **kwargs):
        SubTask().subtask(args=(1, 2), countdown=1).apply_async()

class SubTask(Task):
    def run(self, x, y, *args, **kwargs):
        return x + y

non_tracable_for_subtask_result = ParentTask().delay()
I need a canvas primitive (group, chord, etc.) that can be extended dynamically from within a task in my workflow. Can I link new subtasks to my current workflow (chord, group, etc.) at runtime?
I want something like:
# THIS CODE DOES NOT WORK, IT ONLY ILLUSTRATES THE REQUIREMENT
class ParentTask(Task):
    def run(self, *args, **kwargs):
        count = get_count()
        sub_task = SubTask().subtask(args=(1, 2), countdown=1)
        for i in range(count):
            # It could look like this. THIS IS THE PART I AM LOOKING FOR
            self.link(sub_task)

class SubTask(Task):
    def run(self, x, y, *args, **kwargs):
        return x + y

>>> tracable_for_subtask_result = ParentTask().delay()
>>> tracable_for_subtask_result.children.get()
3
>>> tracable_for_subtask_result.children.id
.....
You can express dynamic workflows using task.replace.
There are some examples here: https://github.com/celery/celery/issues/3437
You can write something like:
from celery import Task, group

class ParentTask(Task):
    def run(self, *args, **kwargs):
        count = get_count()
        sub_tasks = [subtask.s(i) for i in range(count)]
        g = group(*sub_tasks) | after_sub_tasks.s(self.request.id).on_error(after_sub_tasks_error.s(self.request.id))
        # The current task is replaced by the subtasks without blocking the current worker.
        return self.replace(g)

@app.task(bind=True, name='after_sub_task')
def after_sub_tasks(self, results, main_task_id):
    print(f"results {results}")
    print(f"main_task_id {main_task_id}")
    return True

@app.task(bind=True, name='nx.delayed_task_error')
def after_sub_tasks_error(self, failed_task_id, main_task_id):
    # no results available here... sigh
    print(f"main_task_id {main_task_id}")
    raise ValueError("Part of subtasks failed")
I have a requirement that all my Celery tasks must be called with a specific keyword argument. I want to check and use the value of the keyword before my task is executed.
For instance, suppose I have the following:
@shared_task
def my_task(*args, **kwargs):
    foo = kwargs.get('bar')  # -> I don't want to copy this to all my tasks
    # Do stuff here
How can I create a new decorator called my_special_shared_task so that the below is equivalent to the above:
@my_special_shared_task
def my_task(*args, **kwargs):
    # Do stuff here
What about task inheritance?
Something like:
class BaseTask(celery.Task):
    foo = "some_value"

@app.task(base=BaseTask, bind=True)  # bind=True passes the task instance as self
def my_task(self, *args, **kwargs):
    # Do stuff here
    print(self.foo)
here is the documentation.
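As an alternative to inheritance, the decorator the question asks for could be sketched in plain Python. This is only a sketch under assumptions: the name require_bar is hypothetical, Celery is not imported here, and in a real project you would presumably stack @shared_task on top of it. It checks for the 'bar' keyword and hands its value to the task as the first argument.

```python
from functools import wraps


def require_bar(func):
    """Hypothetical decorator: pull the required 'bar' keyword out before the body runs."""
    @wraps(func)  # keep the original function name and docstring
    def wrapper(*args, **kwargs):
        if "bar" not in kwargs:
            raise TypeError(f"{func.__name__}() requires the 'bar' keyword argument")
        foo = kwargs.pop("bar")
        # Hand the checked value to the task as its first positional argument.
        return func(foo, *args, **kwargs)
    return wrapper


@require_bar
def my_task(foo, *args, **kwargs):
    return f"foo={foo}, args={args}, kwargs={kwargs}"


print(my_task(1, 2, bar="value"))  # foo=value, args=(1, 2), kwargs={}
```

Because the wrapper uses functools.wraps, the decorated function keeps the name my_task, which matters if another decorator derives a name from it.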
I am trying to run multiple tasks in a queue. The tasks are created from user input. What I tried was creating a singleton class with a ThreadPoolExecutor attribute and adding tasks to it. The tasks are added fine, but it looks like only the first batch of tasks is executed. Batches added afterwards are queued but never run.
class WebsiteTagScrapper:
    class __WebsiteTagScrapper:
        def __init__(self):
            self.executor = ThreadPoolExecutor(max_workers=5)

    instance = None

    def __new__(cls):  # __new__ always a classmethod
        if not WebsiteTagScrapper.instance:
            WebsiteTagScrapper.instance = WebsiteTagScrapper.__WebsiteTagScrapper()
        return WebsiteTagScrapper.instance
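For comparison, here is a minimal sketch of one shared executor that accepts several batches over time. The helper submit_batch and the stand-in scrape function are hypothetical names, not part of the question. One possible cause of the symptom above, which the question does not confirm, is that the executor was shut down after the first batch (for example by using it in a with block); a shut-down executor raises RuntimeError on later submit() calls.

```python
from concurrent.futures import ThreadPoolExecutor

# One executor for the whole process; a module-level object already acts as a singleton.
_executor = ThreadPoolExecutor(max_workers=5)


def submit_batch(func, items):
    """Queue one call of func per item and return the futures."""
    return [_executor.submit(func, item) for item in items]


def scrape(url):
    # Hypothetical stand-in for the real scraping work.
    return len(url)


first = submit_batch(scrape, ["a", "bb"])
second = submit_batch(scrape, ["ccc"])  # a later batch reuses the same pool

print([f.result() for f in first + second])  # [1, 2, 3]
```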
I used multiprocessing in one of my projects without Celery, because I thought Celery was overkill for my use case.
Maybe you could do something like this:
from multiprocessing import Process

class MyQueuProcess(Process):
    def __init__(self):
        super(MyQueuProcess, self).__init__()
        self.tasks = []

    def add_task(self, task):
        self.tasks.append(task)

    def run(self):
        for task in self.tasks:
            task()  # do your task; this assumes each queued task is a callable
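You just have to create an instance in your view, add your tasks and then call start(), which executes run() in the child process (calling run() directly would execute the tasks in the current process). Also, if you need to access your database, you will need to import django in the child process and then call django.setup().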
I have a task that looks like this:
from mybasetask_module import MyBaseTask

@task(base=MyBaseTask)
@my_custom_decorator
def my_task(*args, **kwargs):
    pass
and my base task looks like this
from celery import task, Task

class MyBaseTask(Task):
    abstract = True
    default_retry_delay = 10
    max_retries = 3
    acks_late = True
The problem I'm running into is that the celery worker is registering the task with the name
'mybasetask_module.__inner'
The task is registered under the expected name (package + module + function) when I remove @my_custom_decorator from the task, or if I provide an explicit name for the task like this:
from mybasetask_module import MyBaseTask

@task(base=MyBaseTask, name='an_explicit_task_name')
@my_custom_decorator
def my_task(*args, **kwargs):
    pass
Is this behavior expected? Do I need to do something so that, in the first case (multiple decorators but no explicit task name), my tasks are registered with the default automatically generated name?
Thanks,
Use the functools.wraps() decorator to ensure that the wrapper returned by my_custom_decorator has the correct name:
from functools import wraps

def my_custom_decorator(func):
    @wraps(func)
    def __inner(*args, **kwargs):
        # Forward all arguments, since my_task accepts *args and **kwargs.
        return func(*args, **kwargs)
    return __inner
The task name is taken from the function that the task decorator wraps, but by inserting a decorator in between, you gave task your __inner wrapper function instead. The functools.wraps() decorator copies the necessary metadata (such as __name__ and __module__) over from func to the wrapper so that task() can pick up the proper name.
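The effect on the function name can be checked without Celery. The following sketch uses two hypothetical decorators, plain_decorator and wrapping_decorator, that differ only in whether they apply functools.wraps:

```python
from functools import wraps


def plain_decorator(func):
    # No @wraps: the wrapper keeps its own name, "__inner".
    def __inner(*args, **kwargs):
        return func(*args, **kwargs)
    return __inner


def wrapping_decorator(func):
    # @wraps copies __name__, __module__, __doc__ and more from func.
    @wraps(func)
    def __inner(*args, **kwargs):
        return func(*args, **kwargs)
    return __inner


@plain_decorator
def first_task(x):
    return x * 2


@wrapping_decorator
def second_task(x):
    return x * 2


print(first_task.__name__)   # __inner
print(second_task.__name__)  # second_task
```

Anything that derives a name from `__name__`, as the task decorator does according to the answer above, sees the second form as if the decorator were not there.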
I've been using testbed, webtest, and nose to test my Python GAE app, and it is a great setup. I'm now implementing something similar to Nick's great example of using the deferred library, but I can't figure out a good way to test the parts of the code triggered by DeadlineExceededError.
Since this is in the context of a taskqueue, it would be painful to construct a test that took more than 10 minutes to run. Is there a way to temporarily set the taskqueue time limit to a few seconds for the purpose of testing? Or perhaps some other way to elegantly test the execution of code in the except DeadlineExceededError block?
Abstract the "GAE context" away from your code. In production, provide the real GAE implementation; for testing, provide a mock one that raises DeadlineExceededError. The test should not depend on any timeout and should be fast.
Sample abstraction (just glue):
class AbstractGAETaskContext(object):
    def task_expired(self): pass  # this will throw the exception in the mock implementation
    # here you define any method that calls into GAE, so that it can be mocked
    def deferred(self, *args, **kwargs): pass
If you don't like the abstraction, you can use monkey patching for testing only; you still need to define the task_expired function as your hook for testing.
task_expired should be called from within your task implementation function.
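A minimal sketch of that idea, runnable outside GAE. Everything here is an assumption for illustration: DeadlineExceededError is a local stand-in for the GAE runtime exception, and ExpiringContext and process_items are hypothetical names. The mock context raises after a fixed number of calls, so the except branch is exercised without waiting for a real deadline.

```python
class DeadlineExceededError(Exception):
    """Local stand-in for the GAE runtime exception, so the sketch runs anywhere."""


class ExpiringContext(object):
    """Mock context whose task_expired() hook raises after N successful calls."""
    def __init__(self, calls_before_deadline):
        self.remaining = calls_before_deadline

    def task_expired(self):
        if self.remaining == 0:
            raise DeadlineExceededError()
        self.remaining -= 1


def process_items(items, context):
    """Process items until done or until the context signals a deadline."""
    done = []
    try:
        for item in items:
            context.task_expired()  # hook: raises in the mock when time is "up"
            done.append(item)
    except DeadlineExceededError:
        return done, True   # interrupted; the caller would re-queue the rest
    return done, False


print(process_items([1, 2, 3], ExpiringContext(2)))  # ([1, 2], True)
```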
*UPDATED* This is the third solution:
First, I want to mention that Nick's sample implementation is not so great: the Mapper class has too many responsibilities (deferring, querying data, updating in batches). This makes it hard to test, because a lot of mocks need to be defined. So I extracted the deferring responsibility into a separate class. You only want to test the deferring mechanism; what actually happens (the update, the query, etc.) should be covered by other tests.
Here is the deferring class, which no longer depends on GAE:
class DeferredCall(object):
    def __init__(self, deferred):
        self.deferred = deferred

    def run(self, long_execution_call, context, *args, **kwargs):
        '''long_execution_call should return a tuple (next_context, timed_out): the context where the operation was abandoned, and whether it ended because of a timeout.'''
        next_context, timeouted = long_execution_call(context, *args, **kwargs)
        if timeouted:
            # Re-queue the same call so that it resumes from next_context.
            self.deferred(self.run, long_execution_call, next_context, *args, **kwargs)
Here is the test module:
import unittest

class Test(unittest.TestCase):
    def test_defer(self):
        calls = []

        def mock_deferrer(callback, *args, **kwargs):
            calls.append((callback, args, kwargs))

        def interrupted(context):
            return "new_context", True

        d = DeferredCall(mock_deferrer)
        d.run(interrupted, "init_context")
        self.assertEqual(1, len(calls), 'a deferred call should be made')

    def test_no_defer(self):
        calls = []

        def mock_deferrer(callback, *args, **kwargs):
            calls.append((callback, args, kwargs))

        def completed(context):
            return None, False

        d = DeferredCall(mock_deferrer)
        d.run(completed, "init_context")
        self.assertEqual(0, len(calls), 'no deferred call should be made')
This is how Nick's Mapper implementation will look:
class Mapper:
    ...
    def _continue(self, start_key, batch_size):
        ...  # same code as before, nothing was changed
        except DeadlineExceededError:
            # Write any unfinished updates to the datastore.
            self._batch_write()
            # Queue a new task to pick up where we left off.
            ##deferred.defer(self._continue, start_key, batch_size)
            return start_key, True  ## make it compatible with DeferredCall
        self.finish()
        return None, False  ## make it compatible with DeferredCall

    runner = _continue
This is the code where you register the long-running task; it depends only on the GAE deferred library.
import DeferredCall
import PersonMapper # this inherits the Mapper
from google.appengine.ext import deferred
mapper = PersonMapper()
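# Pass the callable deferred.defer rather than the module, since DeferredCall calls self.deferred(...).
DeferredCall(deferred.defer).run(mapper.run)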
I'm working on a project using Django and Celery (django-celery). Our team decided to wrap all data access code in (app-name)/manager.py (NOT in Managers, as is the Django way), and to let the code in (app-name)/task.py deal only with assembling and performing tasks with Celery (so that we have no Django ORM dependency in this layer).
In my manager.py, I have something like this:
def get_tag(tag_name):
    ctype = ContentType.objects.get_for_model(Photo)
    try:
        tag = Tag.objects.get(name=tag_name)
    except ObjectDoesNotExist:
        return Tag.objects.none()
    return tag

def get_tagged_photos(tag):
    ctype = ContentType.objects.get_for_model(Photo)
    return TaggedItem.objects.filter(content_type__pk=ctype.pk, tag__pk=tag.pk)

def get_tagged_photos_count(tag):
    return get_tagged_photos(tag).count()
In my task.py, I'd like to wrap them into tasks (and then maybe use these tasks to build more complicated tasks), so I wrote this decorator:
import manager  # the module within the same app containing the data access functions

class mfunc_to_task(object):
    def __init__(self, mfunc_type='get'):
        self.mfunc_type = mfunc_type

    def __call__(self, f):
        def wrapper_f(*args, **kwargs):
            callback = kwargs.pop('callback', None)
            mfunc = getattr(manager, f.__name__)
            result = mfunc(*args, **kwargs)
            if callback:
                if self.mfunc_type == 'get':
                    subtask(callback).delay(result)
                elif self.mfunc_type == 'get_or_create':
                    subtask(callback).delay(result[0])
                else:
                    subtask(callback).delay()
            return result
        return wrapper_f
then (still in task.py):
#@task
@mfunc_to_task()
def get_tag():
    pass

#@task
@mfunc_to_task()
def get_tagged_photos():
    pass

#@task
@mfunc_to_task()
def get_tagged_photos_count():
    pass
Things work fine without @task.
But after applying the @task decorator (at the top, as the Celery documentation instructs), things start to fall apart. Apparently, every time mfunc_to_task.__call__ gets called, the same task.get_tag function gets passed as f. So I end up with the same wrapper_f every time, and now the only thing I can do is get a single tag.
I'm new to decorators. Can anyone help me understand what went wrong here, or point out other ways to achieve this? I would really hate to write the same task-wrapping code for each of my data access functions.
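One plausible explanation, offered here as an assumption rather than something confirmed in this thread, is a name collision: if the task decorator registers tasks under the function's `__name__`, every function returned by the decorator above is called wrapper_f, so all three share one registry entry. The sketch below simulates this with a plain dict standing in for the task registry (make_registry and register are hypothetical helpers, and "first entry wins" is also an assumption chosen to match the symptom).

```python
from functools import wraps


def without_wraps(f):
    def wrapper_f(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper_f


def with_wraps(f):
    @wraps(f)  # copies f.__name__ onto the wrapper
    def wrapper_f(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper_f


def make_registry(decorator):
    """Register two decorated functions in a dict keyed by __name__."""
    registry = {}

    def register(func):
        # Assumption: registration is keyed by name and the first entry wins.
        registry.setdefault(func.__name__, func)
        return func

    @register
    @decorator
    def get_tag():
        return "tag"

    @register
    @decorator
    def get_tagged_photos():
        return "photos"

    return registry


collided = make_registry(without_wraps)
distinct = make_registry(with_wraps)
print(sorted(collided))  # ['wrapper_f']
print(sorted(distinct))  # ['get_tag', 'get_tagged_photos']
```

If that is the cause, adding functools.wraps(f) to wrapper_f inside mfunc_to_task.__call__, or passing an explicit name to the task decorator, should give each task its own entry.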
Not quite sure why passing arguments won't work.
If you use this example:
@task()
def add(x, y):
    return x + y
let's add some logging to MyCoolTask:
from celery import task
from celery.registry import tasks
import logging
import celery

logger = logging.getLogger(__name__)

class MyCoolTask(celery.Task):
    def __call__(self, *args, **kwargs):
        """In a Celery task this method calls the run method; here you can
        set some environment variable before the task runs"""
        logger.info("Starting to run")
        return self.run(*args, **kwargs)

    def after_return(self, status, retval, task_id, args, kwargs, einfo):
        # exit point of the task, whatever the state
        logger.info("Ending run")
        pass
and create an extended class (extending MyCoolTask, but now with arguments):
class AddTask(MyCoolTask):
    def run(self, x, y):
        if x and y:
            result = add(x, y)
            logger.info('result = %d' % result)
            return result
        else:
            logger.error('No x or y in arguments')

tasks.register(AddTask)
and make sure you pass the kwargs as json data:
{"x":8,"y":9}
I get the result:
[2013-03-05 17:30:25,853: INFO/MainProcess] Starting to run
[2013-03-05 17:30:25,855: INFO/MainProcess] result = 17
[2013-03-05 17:30:26,739: INFO/MainProcess] Ending run
[2013-03-05 17:30:26,741: INFO/MainProcess] Task iamscheduler.tasks.AddTask[6a62641d-16a6-44b6-a1cf-7d4bdc8ea9e0] succeeded in 0.888684988022s: 17
Instead of using a decorator, why not create a base class that extends celery.Task?
This way all your tasks can extend your customized task class, where you can implement your own behavior using the __call__ and after_return methods.
You can also define common methods and objects for all your tasks.
class MyCoolTask(celery.Task):
    def __call__(self, *args, **kwargs):
        """In a Celery task this method calls the run method; here you can
        set some environment variable before the task runs"""
        return self.run(*args, **kwargs)

    def after_return(self, status, retval, task_id, args, kwargs, einfo):
        # exit point of the task, whatever the state
        pass