Celery: custom Task/Request attribute shared with Queue - python

There is a tracker class that simply counts successful, failed, pending, and started tasks via Redis.
The goal is to extend Celery so its workers can access the group_id and keep statistics for the group. I expect an interface similar to:
def on_important_event(...):
    group_id = uuid4()
    for _ in range(count_of_jobs):
        my_task.apply_async(..., group_id=group_id)
The custom Task class would look like:
class MyTask(Task):
    # declaring group_id somehow

    def apply_async(...):
        get_tracker(self.request.group_id).task_pending()
        ...

    def before_start(...):
        get_tracker(self.request.group_id).task_started()
        ...

    def on_success(...):
        get_tracker(self.request.group_id).task_success()
        ...

    def on_failure(...):
        get_tracker(self.request.group_id).task_failed()
        ...
I could not find a way to implement the class so that it properly saves and receives a custom attribute through AMQP.
UPD, to clarify:
The problem is marking certain task calls as participants of a group, so that I can track the group rather than the task in general or a single call.
It seems to me there must be a way to add an attribute to a Task that is saved into the queue and then received by a Celery worker, so I can access it at the Task class layer.

I would recommend a different approach: write a custom monitor (see the Monitoring API document in the official Celery docs). A good starting point is the Real-time processing example.
This is basically how Flower and Leek work.
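For reference, here is a minimal sketch of such a monitor, following the Real-time processing example from the Celery docs (the broker URL, the event types handled, and the print statements are assumptions to be replaced with the tracker calls):

from celery import Celery

app = Celery(broker='redis://localhost:6379/0')  # assumed broker URL

def group_monitor(app):
    state = app.events.State()

    def on_task_event(event):
        state.event(event)
        task = state.tasks.get(event['uuid'])
        # The group_id would have to travel with the task (e.g. in headers or kwargs);
        # here we only demonstrate dispatching on the event type.
        if event['type'] == 'task-succeeded':
            print('succeeded: %s' % task.name)
        elif event['type'] == 'task-failed':
            print('failed: %s' % task.name)

    with app.connection() as connection:
        recv = app.events.Receiver(connection, handlers={
            'task-succeeded': on_task_event,
            'task-failed': on_task_event,
            '*': state.event,
        })
        recv.capture(limit=None, timeout=None, wakeup=True)

if __name__ == '__main__':
    group_monitor(app)

Note that task events are only emitted when the workers are started with events enabled (celery worker -E, or the worker_send_task_events setting).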

Related

Huey not calling tasks in Django

I have a Django REST framework app that calls two huey tasks in succession in a serializer's create method, like so:
...
def create(self, validated_data):
    user = self.context['request'].user
    player_ids = validated_data.get('players', [])
    game = Game.objects.create()
    tasks.make_players_friends_task(player_ids)
    tasks.send_notification_task(user.id, game.id)
    return game
# tasks.py
@db_task()
def make_players_friends_task(ids):
    players = User.objects.filter(id__in=ids)
    # process players

@db_task()
def send_notification_task(user_id, game_id):
    user = User.objects.get(id=user_id)
    game = Game.objects.get(id=game_id)
    # send notifications
When running the huey process in the terminal and hitting this endpoint, I can see that only one or the other of the tasks is ever called, but never both. I am running huey with the default settings (Redis with one worker thread).
If I alter the code so that I am passing in the objects themselves as parameters, rather than the ids, and remove the Django queries from the @db_task methods, things seem to work alright.
The reason I initially used ids as parameters is that I assumed (or read somewhere) that huey uses JSON serialization by default, but after looking into it, pickle is actually the default serializer.
One theory is that since I am only running one worker, and also have a @db_periodic_task method in the app, the process can only handle listening for tasks or executing them at any one time, but not both. This is the way Celery seems to work, where you need a separate process each for the scheduler and the worker, but this isn't mentioned in huey's documentation.
If you run the huey consumer, it actually spawns a separate scheduler alongside the number of workers you've specified, so that's not going to be your problem.
You're not giving enough information to properly see what's going wrong, so check the following:
If you run the huey consumer in the terminal, observe whether all your tasks show up as properly registered so that the consumer is actually capable of consuming them.
Check whether your redis process is running.
Try performing the tasks with a blocking call to see which task fails:
task_result = tasks.make_players_friends_task(player_ids)
task_result.get(blocking=True)
task_result = tasks.send_notification_task(user.id, game.id)
task_result.get(blocking=True)
Do this with a debugger or print statements to see whether it makes it to the end of your function or where it gets stuck.
Make sure to always restart your consumer when you change code. It doesn't automatically pick up new code the way the Django dev server does. The fact that your code works as intended when pickling whole objects instead of passing ids could point to this, as it would be very strange for that change alone to break things. On the other hand, you shouldn't pass Django ORM objects into tasks anyway; your id approach makes much more sense.
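To illustrate the last point, here is a minimal sketch of the id-based pattern combined with a blocking call for debugging (the model import path is an assumption, not from the original post):

# tasks.py -- id-based version, assuming huey.contrib.djhuey and these models
from huey.contrib.djhuey import db_task

from myapp.models import Game, User  # hypothetical import path


@db_task()
def send_notification_task(user_id, game_id):
    # Re-fetch inside the task so only primitive ids cross the queue.
    user = User.objects.get(id=user_id)
    game = Game.objects.get(id=game_id)
    # ... send notifications ...


# While debugging, block on the result so exceptions surface immediately:
# result = send_notification_task(user.id, game.id)
# result.get(blocking=True)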

Elegant way to handle queuing up multiple tasks into single bulkadd/batch

I use task queue quite extensively in an application.
Most of the time it's using the following pattern:
yield (add_foo_task_async(), add_bar_task_async(), add_baz_task_async())

# add_foo_task_async() etc. are defined like this
@classmethod
@ndb.tasklet
def add_foo_task_async(cls, param):
    queue = taskqueue.Queue("foo")
    # perform various modifications on params etc...
    params = {
        "param": param,
    }
    task = taskqueue.Task(url=uri_for("tasks/foo_worker"), params=params)
    result = yield queue.add_async(task)
    raise ndb.Return(result)
The problem is that this seems to create a "ladder" of bulkAdd calls.
I'd like to improve the performance so that there aren't all these ladders.
One solution I'm considering is creating a class in which the tasks are created and stored in a list. The class would also have an "add_tasks_to_taskqueue" method that queues them all to the actual task queue (see the sketch below). One issue, however, is that quite a lot of the tasks I use are queued up in _post_put_hook (so I'd need a way to pass this class around everywhere). Another concern is that I use multiple queues at the moment, so I assume I'd need to change that?
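For illustration only, a minimal sketch of such a batching class (the class and method names are made up; it relies on Queue.add_async() accepting a list of tasks, so each queue gets a single bulkAdd instead of one per task, subject to the 100-tasks-per-call limit):

from google.appengine.api import taskqueue
from google.appengine.ext import ndb


class TaskBatcher(object):
    """Collects tasks per queue and flushes each queue with one add_async call."""

    def __init__(self):
        self._pending = {}  # queue name -> list of taskqueue.Task

    def add(self, queue_name, url, params):
        task = taskqueue.Task(url=url, params=params)
        self._pending.setdefault(queue_name, []).append(task)

    @ndb.tasklet
    def flush_async(self):
        # Start one bulkAdd RPC per queue so they run concurrently...
        rpcs = [taskqueue.Queue(name).add_async(tasks)
                for name, tasks in self._pending.items()]
        results = []
        # ...then wait for each of them to finish.
        for rpc in rpcs:
            result = yield rpc
            results.append(result)
        self._pending.clear()
        raise ndb.Return(results)

The batcher would be created per request, passed to (or made reachable from) the _post_put_hook code, and flushed once at the end, e.g. yield batcher.flush_async().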
Update
I've seen that the ndb context has some auto-batching code for memcache and urlfetch. Could the proposed solution somehow use a similar approach, where we extend the ndb context (is that possible?) and use something like get_context().add_task_to_batch_queue(task)?
Is there a better/more elegant way to handle what I'm trying to achieve?
Thanks

Viewflow Signal for New tasks?

I am looking to announce in my Slack channels whenever a new task becomes available.
Looking at the source, it seems there is only a signal for when a task is started.
How can I create a signal when a task becomes available?
Generally, using signals to interact within your own application is a bad design decision.
You can implement the same functionality more explicitly with a custom node that performs a callback on create:
class MyFlow(Flow):
    ...
    approve = (
        MyView(flow_views.UpdateProcessView, fields=['approved'])
        .onCreate(this.send_notification)
        .Next(this.check_approve)
    )
    ...
You can handle the create action by overriding the activate method of the node's activation class.
The viewflow custom node sample could be helpful as a reference for a custom node implementation: https://github.com/viewflow/viewflow/blob/master/demo/customnode/nodes.py
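A hedged sketch of what the send_notification callback could look like, assuming viewflow's usual convention of callbacks receiving the activation; the Slack webhook call and URL are purely illustrative:

import requests
from viewflow.base import Flow

SLACK_WEBHOOK_URL = 'https://hooks.slack.com/services/...'  # assumed webhook URL


class MyFlow(Flow):
    ...

    def send_notification(self, activation):
        # Announce the newly created task in Slack.
        requests.post(SLACK_WEBHOOK_URL, json={
            'text': 'New task available: {}'.format(activation.task),
        })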

Celery routing to multiple tasks rather than hosts

I am working on porting an application I wrote from Golang (using Redis) to Python, and I would love to use Celery to accomplish my task queuing, but I have a question regarding routing...
My application receives "events" via REST POSTs, where each "event" can be of a different type. I then want to have workers in the background wait for events of certain types. The caveat here is that ONE event could result in MORE than ONE task handling the event. For example:
some/lib/a/tasks.py
@task
def handle_event_typeA(a, b, c):
    # handles event...
    pass

@task
def handle_event_typeB(a, b, c):
    # handles other event...
    pass
some/lib/b/tasks.py
@task
def handle_event_typeA(a, b, c):
    # handles event slightly differently... but still same event...
    pass
In summary... I want to be able to run N workers (across X machines), each of which has Y tasks registered, such as a.handle_event_typeA, b.handle_event_typeA, etc., and I want to be able to insert a task into a queue and have one worker pick up the task and route it to more than one task in the worker (i.e. to both a.handle_event_typeA and b.handle_event_typeA).
I have read over the documentation of Kombu here and Celery's routing documentation here and I can't seem to figure out how to configure this correctly.
I have been using Celery for some time now for more traditional workflows and I am very happy with its feature set, performance, and stability. I would implement what I need using Kombu directly or some homebrew solution, but I would like to use Celery if at all possible.
Thanks guys! I hope I'm not wasting anyone's time with this question.
Edit 1
After some more time thinking about this issue, I have come up with a workaround that implements what I want with Celery. It's not the most elegant solution, but it works well. I am using Django and its caching abstraction (you can use something like memcached or Redis directly instead). Here's the snippet I came up with:
from django.core.cache import cache
from celery.execute import send_task

SUBSCRIBERS_KEY = 'task_subscribers.{0}'

def subscribe_task(key, task):
    # get current list of subscribers
    cache_key = SUBSCRIBERS_KEY.format(key)
    subscribers = cache.get(cache_key) or []
    # get task name
    if hasattr(task, 'delay'):
        name = task.name
    else:
        name = task
    # add to list
    if name not in subscribers:
        subscribers.append(name)
    # set cache
    cache.set(cache_key, subscribers)

def publish_task(key, *kargs):
    # get current list of subscribers
    cache_key = SUBSCRIBERS_KEY.format(key)
    subscribers = cache.get(cache_key) or []
    # iterate through all subscribers and execute task
    for task in subscribers:
        # send celery task
        send_task(task, args=kargs, kwargs={})
What I then do is subscribe to tasks in different modules by doing the following:
subscribe_task('typeA', 'some.lib.b.tasks.handle_event_typeA')
Then I can call the publish_task method when handling the REST events.
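For example, a hypothetical Django view handling the incoming REST event might look like this (the view and field names are illustrative; only publish_task comes from the snippet above):

from django.http import HttpResponse

def handle_event(request):
    event_type = request.POST['type']          # e.g. 'typeA'
    payload = request.POST.get('payload', '')
    publish_task(event_type, payload)           # fans out to every subscribed task
    return HttpResponse(status=202)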

Google App Engine: Add task to queue from a task

I need to track data from another website. Since it's spread over 60+ pages, I intend to use a daily cron job to add a task to the queue. This task should then take care of one page and, depending on some checks, put another instance of itself on the queue for the next page.
Now a simple
taskqueue.add(url='/path/to_self', params=control)
in the get method of my webapp.RequestHandler class for this task leads to a
"POST /path/to_self HTTP/1.1" 405 -
Is there a way to get this to work, or is it simply not possible to add tasks to the queue from within tasks?
It's possible to add tasks from within tasks. I'm doing it in my application.
It's very useful when you want to migrate a large set of entities: one task processes a small chunk of entities, then adds itself to the queue to process the rest, until the migration is over.
I am not sure what the problem with your code is.
Have you implemented the post(self) method in your RequestHandler class? Task queue calls default to the POST method.
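For illustration, a minimal sketch of a self-chaining handler along these lines (the handler name, the 'page' parameter, and the stopping check are assumptions; '/path/to_self' comes from the question):

from google.appengine.api import taskqueue
from google.appengine.ext import webapp


class ToSelfHandler(webapp.RequestHandler):
    def post(self):  # task queue requests arrive as POST by default
        page = int(self.request.get('page', '0'))
        # ... fetch and process this page ...
        if page < 60:  # hypothetical check deciding whether to continue
            taskqueue.add(url='/path/to_self', params={'page': page + 1})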
