My app gathers a bunch of phone numbers on a page. Once the user hits the submit button, I create a Celery task to call each number and give a reminder message, then redirect the user to a page where they can see live updates about the calls. I am using WebSockets to live-update the status of each call, and I need the tasks to execute synchronously, one after another, since I can only dial out from one number.
So once the first call/task is completed, I want the next one to fire off.
I took a look at the CELERY_ALWAYS_EAGER setting, but it just went through the first iteration and stopped.
@task
def reminder(number):
    # CODE THAT CALLS NUMBER HERE....

def make_calls(request):
    for number in phone_numbers:
        reminder.delay(number)
    return redirect('live_call_updates')
If you look at the Celery docs on tasks, you'll see that to call a task synchronously you use the apply() method as opposed to the apply_async() method.
So in your case you could use:
reminder.apply(args=[number])
The docs also note that:
If the CELERY_ALWAYS_EAGER setting is set, it will be replaced by a local apply() call instead.
Thanks to @JivanAmara, who reiterated in the comments that when using apply(), the task runs locally (on the server/computer from which it is called). This can have ramifications if you intended to run your tasks across multiple servers/machines.
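Putting that together, a minimal sketch of the view using apply(), assuming the asker's reminder task and phone_numbers list:
def make_calls(request):
    # apply() blocks until each reminder finishes, so the next call only
    # starts after the previous one has completed. Note that this runs in
    # the web process itself, so the redirect won't happen until all calls
    # are done.
    for number in phone_numbers:
        reminder.apply(args=[number])
    return redirect('live_call_updates')
That blocking behaviour is also why the answers below suggest wrapping the loop in a single task or using a chain instead.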
If you want to fire each call one after another, why don't you wrap all the calls in one task?
@task
def make_a_lot_of_calls(numbers):
    for num in numbers:
        # Assuming that reminder blocks till the call finishes
        reminder(num)

def make_calls(request):
    make_a_lot_of_calls.delay(phone_numbers)
    return redirect('live_call_updates')
You can use a Celery chain, which runs each task only after the previous one has finished. Use immutable signatures (si()) so each reminder receives only its own number rather than the previous task's return value:
from celery import chain
tasks = [reminder.si(number) for number in phone_numbers]
chain(*tasks).apply_async()
How can I run another method, action(), automatically when a set of Celery tasks is finished? Is there any simple way to trigger another function call on completion?
# tasks.py
@app.task
def rank(item):
    # Update database
    ...

# main.py
import tqdm

from tasks import rank

def action():
    print('Tasks have been finished.')

ans = list()
for item in tqdm.tqdm(all_items):
    rank.apply_async(([{"_id": item["_id"], "max": item["max"]}]))
In the previous message, which was very similar to this one and which you deleted, I explained how to do this without using the Chord workflow primitive that you for some reason decided to avoid... You even left behind parts of that code that do nothing here (ans = list()). I will put that part of the answer here, as it explains how what you need can be accomplished:
Without some code changes your code will not work. For starters, apply_async() does not return the task's result; it returns an AsyncResult. So after you modify the code to ans.append(rank.apply_async(([{"_id": item["_id"], "max": item["max"]}])).get()) it should work as you want, but unfortunately it will not distribute tasks (which is why we use Celery!), because .get() blocks on each task in turn. To emulate the logic that Chord provides, you would instead call apply_async() as you do, store the task IDs, and periodically poll for their state; whenever a task is finished, collect its result, and repeat until all are finished.
Solution B would be to use the Group primitive: schedule your tasks to be executed as a group, obtain the GroupResult object, and do the same as described above, periodically polling for individual results.
If you do this polling in a loop, then you can simply call action() after the loop, as it will then run only after all tasks are finished. Once you implement this you will understand why many experienced Celery users use Chord instead...
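A minimal sketch of Solution B, assuming the asker's rank task and all_items, and assuming a result backend is configured; the one-second sleep is just an illustrative polling interval:
import time

from celery import group

from tasks import rank

# Schedule all rank tasks as a group and keep the GroupResult for polling.
job = group(
    rank.s({"_id": item["_id"], "max": item["max"]}) for item in all_items
)
group_result = job.apply_async()

# Poll until every task in the group has finished.
while not group_result.ready():
    time.sleep(1)

ans = group_result.get()  # collect the individual results
action()                  # runs only after all tasks are finished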
I'm building a basic ETL pipeline that hits a main endpoint which holds a list of IDs (variable amounts on each call) for processing. My current thinking is to use RabbitMQ as a queue system and have three tasks (Extract, Transform, Load) consume from RabbitMQ. Most tutorials I've seen online showcase a simple sequential execution of tasks before they exit. I've tried to construct a DAG that does this sequential action on each ID we receive. But I've run into problems trying to figure out how to schedule all these tasks through airflow, when I don't know how many IDs exist.
(The general Tree view of the DAG in question was shown as an image here.)
I've begun the process of using RabbitMQ to push these IDs to a queue and having Celery spool up a variable number of workers to handle the load. The problem I've run into is that I don't know how to break out of the "consumption" loop. For example (I'm using pseudocode for some abstraction over RabbitMQ):
def extract():
    # callback function when messages are sent to this worker
    def _extract(channel, url, rest):
        resp = requests.get(url)
        channel.publish('transform_queue', resp)

    # attach the callback to the queue
    channel.basic_consume('extract_queue', callback=_extract)
    channel.start_consuming()  # runs a pseudo loop waiting for messages here
Just as a note, some of the variables (such as the channel used inside _extract) are implicit, but will most likely be wrapped in a custom operator.
The Load and Transform functions work similarly. The problem I've run into is that once the function starts consuming, it doesn't stop until it's shut down. I've been able to send sentinel messages to allow the function to "exit"; however, this causes the task to be marked as failed and sent to retry. For example, here's the code for the sentinel shutdown:
def extract():
    # callback function when messages are sent to this worker
    def _extract(channel, message, rest):
        if message == SHUTDOWN:
            exit()
        resp = requests.get(message.url)
        channel.publish('transform_queue', resp)

    # attach the callback to the queue
    channel.basic_consume('extract_queue', callback=_extract)
    channel.start_consuming()  # runs a pseudo loop waiting for messages here
There's also the option to selectively cancel consumers; however, this would just add more complexity, since there is still the issue of polling for cancellation, and that task would end up with the same issue as above.
The main questions I have are:
Is there a way to exit with success in this setup?
Is this the best way to approach this problem? I imagine this is a common use case for Airflow, so there must be some best practices or common setups; however, I haven't been able to find them.
Here is what I could understand from your question, along with my suggestions to explore below.
You are not sure of the number of inputs, and hence the number of times you want to run the flow.
You can create a custom operator (say FindIDs) which starts by finding out how many IDs you need to execute for and pushes the values to XCom. These values can then be used by your other tasks (say extractor, transformer, and loader), which can be set in a sequence as below:
start >> findIds >> extractor >> transformer >> loader >> end
Check https://airflow.apache.org/docs/apache-airflow/stable/concepts.html?#xcoms
You need to skip certain executions in case there are no inputs (or IDs, in your case).
I would use ShortCircuitOperator in this case and conditionally skip the execution for the DAG.
Check: https://github.com/apache/airflow/blob/master/airflow/example_dags/example_short_circuit_operator.py
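A minimal sketch combining both suggestions, assuming Airflow 2.x; the DAG id, task names, and the find_ids body are placeholders rather than your actual code:
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator, ShortCircuitOperator


def find_ids(ti):
    # Placeholder: call the main endpoint here and collect the IDs.
    ids = []
    ti.xcom_push(key="ids", value=ids)


def has_ids(ti):
    # Returning False makes the ShortCircuitOperator skip everything downstream.
    return bool(ti.xcom_pull(task_ids="find_ids", key="ids"))


def extract(ti):
    # Pull the IDs from XCom and work through them sequentially.
    for item_id in ti.xcom_pull(task_ids="find_ids", key="ids"):
        pass  # placeholder: fetch and hand off each ID


with DAG("etl_by_id", start_date=datetime(2021, 1, 1),
         schedule_interval=None, catchup=False) as dag:
    find = PythonOperator(task_id="find_ids", python_callable=find_ids)
    gate = ShortCircuitOperator(task_id="has_ids", python_callable=has_ids)
    extractor = PythonOperator(task_id="extract", python_callable=extract)
    # transformer and loader would follow the same PythonOperator pattern
    find >> gate >> extractor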
I have a Django rest framework app that calls 2 huey tasks in succession in a serializer create method like so:
...
def create(self, validated_data):
user = self.context['request'].user
player_ids = validated_data.get('players', [])
game = Game.objects.create()
tasks.make_players_friends_task(player_ids)
tasks.send_notification_task(user.id, game.id)
return game
# tasks.py
@db_task()
def make_players_friends_task(ids):
    players = User.objects.filter(id__in=ids)
    # process players

@db_task()
def send_notification_task(user_id, game_id):
    user = User.objects.get(id=user_id)
    game = Game.objects.get(id=game_id)
    # send notifications
When running the huey process in the terminal and hitting this endpoint, I can see that only one or the other of the tasks is ever called, but never both. I am running huey with the default settings (Redis, with one worker thread).
If I alter the code so that I am passing in the objects themselves as parameters, rather than the ids, and remove the Django queries in the @db_task methods, things seem to work all right.
The reason I initially used the ids as parameters is that I assumed (or read somewhere) that huey uses JSON serialization by default, but after looking into it, pickle is actually the default serializer.
One theory is that since I am only running one worker, and also have a @db_periodic_task method in the app, the process can only handle either listening for tasks or executing them at any given time, but not both. This is the way Celery seems to work, where you need a separate process each for the scheduler and the worker, but this isn't mentioned in huey's documentation.
If you run the huey consumer, it will actually spawn a separate scheduler together with the number of workers you've specified, so that's not going to be your problem.
You're not giving enough information to properly see what's going wrong, so check the following:
If you run the huey consumer in the terminal, observe whether all your tasks show up as properly registered so that the consumer is actually capable of consuming them.
Check whether your redis process is running.
Try performing the tasks with a blocking call to see on which tasks it fails:
task_result = tasks.make_players_friends_task(player_ids)
task_result.get(blocking=True)
task_result = tasks.send_notification_task(user.id, game.id)
task_result.get(blocking=True)
Do this with a debugger or print statements to see whether it makes it to the end of your function or where it gets stuck.
Make sure to always restart your consumer when you change code. It doesn't automatically pick up new code the way the Django dev server does. The fact that your code works as intended while pickling whole objects instead of passing ids could point to this, as it would be really weird for that change alone to fix it. On the other hand, you shouldn't pass in Django ORM objects anyway; it makes much more sense to use your id approach.
My friends are always talking about doing time-consuming tasks with Celery. Since I don't have a computer science background, I can't figure out exactly when a Celery task gets executed. The Celery documentation talks about a daemon when calling .delay(), but I couldn't find out what a daemon is. So, finally: when exactly will a Celery task be executed if we call it with .delay()? :)
For example, if I have the code below, when will my_task be executed? function.py:
def test():
    my_task.delay()
    second = 0
    while second < 10:
        second += 1  # assume this part takes a second
1. Exactly when the test() function finishes (about 10 seconds after test() is called)
2. In the middle of the while loop
3. After test() finishes, and only when there aren't too many requests and the server has the time and resources to do the task!! (maybe Celery is intelligent and knows the best time to execute a task)
4. Whenever it wants :)
5. The correct answer, which I haven't listed :)
If it depends on the configuration, I should mention that I used the default configuration from the Celery documentation. Thank you.
Imagine that you do not have this task alone but several of them. Each time you invoke a task with my_task.delay(), it is put on a queue. There are then one or more workers that just pick up the next open task and execute it.
So the right answer would be:
"Whenever the responsible worker is free." This could be immediately, just before you go into your while second < 10: loop, but it could also take several seconds or minutes if the worker is currently busy.
I'm trying to use Celery to handle background tasks. I currently have the following setup:
@app.task
def test_subtask(id):
    print('test_st:', id)

@app.task
def test_maintask():
    print('test_maintask')
    g = group(test_subtask.s(id) for id in range(10))
    g.delay()
test_maintask is scheduled to execute every n seconds, which works (I see the print statement appearing in the command line window where I started the worker). What I'm trying to do is have this scheduled task spawn a series of subtasks, which I've grouped here using group().
It seems, however, like none of the test_subtask tasks are being executed. What am I doing wrong? I don't have any timing/result constraints for these subtasks and just want them to happen some time from now, asynchronously, in no particular order. n seconds later, test_maintask will fire again (and again) but with none of the subtasks executing.
I'm using one worker, one beat, and AMQP as a broker (on a separate machine).
EDIT: For what it's worth, the problem seems to be purely because of one task calling another (and not something because of the main task being scheduled). If I call the main task manually:
celery_funcs.test_maintask.delay()
I see the main task's print statement but -- again -- not the subtasks. Calling a subtask directly does work however:
celery_funcs.test_subtask.delay(10)
Sigh... I just found out the answer. I had used the following to configure my Celery app:
app = Celery('celery_app', broker='<my_broker_here>')
Strangely enough, this is not being picked up in the task itself... that is,
print('test_maintask using broker', app.conf.BROKER_URL, current_app.conf.BROKER_URL)
This gives back '<my_broker_here>' and None respectively, causing the group to be sent off to... some default broker (I guess?).
Adding BROKER_URL to app.conf.update does the trick, though I'm still not completely clear on what's going on in Celery's internals here...
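A minimal sketch of the fix described above; '<my_broker_here>' is a placeholder for the actual broker URL:
from celery import Celery

app = Celery('celery_app', broker='<my_broker_here>')

# Setting BROKER_URL explicitly in app.conf as well ensures the group's
# subtasks are sent to the same broker as the scheduled main task.
app.conf.update(
    BROKER_URL='<my_broker_here>',
)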