My friends talk all the time about doing time-consuming tasks with Celery. Since I don't have a computer science background, I can't figure out exactly when a Celery task gets executed. The Celery documentation talks about a daemon when calling .delay(), but I couldn't find out what that daemon is. So, finally: when exactly will a Celery task be executed if we call it with .delay()? :)
For example, with the code below, when will my_task be executed? function.py:
def test():
    my_task.delay()
    second = 0
    while second < 10:
        second += 1  # assume each iteration takes a second
1. Exactly when the test() function finishes (about 10 seconds after test() is called)?
2. In the middle of the while loop?
3. After test() finishes, when requests aren't too many and the server has the time and resources to do the task (maybe Celery is intelligent and knows the best time to execute a task)?
4. Whenever it wants? :)
5. The correct answer, which I haven't listed. :)
If it depends on the configuration, I should mention that I used the default configuration from the Celery documentation. Thank you.
Imagine that you do not have just this one task but several of them. When you invoke my_task.delay(), the task is put on a queue. There are several workers that simply pick the first open task off the queue and execute it.
So the right answer would be:
"Whenever the responsible worker is free." This could be immediately, just before you enter your while second < 10: loop, but it could also take several seconds or minutes if the workers are currently busy.
I have one main function which I want to execute with different arguments. It's a function that plays video on a Raspberry Pi using omxplayer.
I would like to use a scheduler that lets me plan the execution of a specific task: it should define the time when a task will be executed and/or maintain a queue, so that when I execute this main function, the scheduler places the task at the end of the queue.
I have tried Python-RQ and it's good, but the problem is that I don't know how I can add a new task at the end of the queue if I don't know the name of the previous job.
I have a function which should add jobs to the queue:
def add_movie(path):
    q.enqueue(run_movie2, '{0}'.format(path))
Which executes:
import subprocess

def run_movie2(path):
    subprocess.Popen(['omxplayer', '-o', 'hdmi', '/home/bart/FlaskApp/movies/{0}'.format(path)])
    return "Playing {0}".format(path)
Do you know a scheduler that meets these requirements?
What can you advise with Python-RQ? Is there any way to run the jobs one by one? How can I always add jobs at the end of the queue?
Thank you.
I'm trying to use Celery to handle background tasks. I currently have the following setup:
from celery import group

@app.task
def test_subtask(id):
    print('test_st:', id)

@app.task
def test_maintask():
    print('test_maintask')
    g = group(test_subtask.s(id) for id in range(10))
    g.delay()
test_maintask is scheduled to execute every n seconds, which works (I see the print statement appearing in the command line window where I started the worker). What I'm trying to do is have this scheduled task spawn a series of subtasks, which I've grouped here using group().
It seems, however, like none of the test_subtask tasks are being executed. What am I doing wrong? I don't have any timing/result constraints for these subtasks and just want them to happen some time from now, asynchronously, in no particular order. n seconds later, test_maintask fires again (and again), but none of the subtasks execute.
I'm using one worker, one beat, and AMQP as a broker (on a separate machine).
EDIT: For what it's worth, the problem seems to be purely due to one task calling another (and not due to the main task being scheduled). If I call the main task manually:
celery_funcs.test_maintask.delay()
I see the main task's print statement but -- again -- not the subtasks. Calling a subtask directly does work however:
celery_funcs.test_subtask.delay(10)
Sigh... just found out the answer. I used the following to configure my Celery app:
app = Celery('celery_app', broker='<my_broker_here>')
Strangely enough, this is not being picked up in the task itself... that is,
print('test_maintask using broker', app.conf.BROKER_URL, current_app.conf.BROKER_URL)
Gives back '<my_broker_here>' and None respectively, causing the group to be sent off to... some default broker (I guess?).
Adding BROKER_URL to app.conf.update does the trick, though I'm still not completely clear on what's going on in Celery's internals here...
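A sketch of the fix described above (the broker URL is the same placeholder used in the question):

app = Celery('celery_app', broker='<my_broker_here>')
# Explicitly repeating the broker in the config is what made the group dispatch correctly:
app.conf.update(BROKER_URL='<my_broker_here>')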
Celery will send tasks to idle workers.
I have a task that runs every 5 seconds, and I want this task to be sent to only one specific worker.
The other tasks can share the remaining workers.
Can Celery do this?
I also want to know what this parameter means: CELERY_TASK_RESULT_EXPIRES
Does it mean that the task will not be sent to a worker in the queue?
Or does it stop the task if it runs too long?
Sure, you can. The best way to do it is to separate Celery workers using different queues. You just need to make sure that the task you need goes to a separate queue, and that your worker listens to that particular queue.
The long story is here: http://docs.celeryproject.org/en/latest/userguide/routing.html
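A minimal sketch of that approach (the task and queue names are illustrative):

# Route one task to its own queue; everything else stays on the default queue.
app.conf.CELERY_ROUTES = {
    'tasks.my_periodic_task': {'queue': 'dedicated'},
}

Then start one worker that consumes only that queue, for example with celery -A tasks worker -Q dedicated --concurrency=1, and let the remaining workers consume the default queue.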
Just to answer your second question CELERY_TASK_RESULT_EXPIRES is the time in seconds that the result of the task is persisted. So after a task is over, its result is saved into your result backend. The result is kept there for the amount of time specified by that parameter. That is used when a task result might be accessed by different callers.
This probably has nothing to do with your problem. As for the first question, as already stated, you have to use multiple queues. However, be aware that you cannot assign the task to a specific worker process, only to a specific worker, which will then assign it to one of its worker processes.
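For completeness, a one-line sketch of the result-expiry setting (the value is only an example):

# Keep task results in the result backend for one hour before they are cleaned up.
app.conf.CELERY_TASK_RESULT_EXPIRES = 3600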
How can I configure Celery so that one worker always runs the same task, and once the task ends it starts again on the same worker?
It looks like you will need to take two steps:
1. Create a separate queue for this task and route the task to that queue.
2a. Create an infinite loop that calls your particular task, such as in this answer (see the sketch after this list),
OR
2b. Have a recursive task that calls itself on completion (this could get messy).
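A rough sketch of option 2a, assuming the task has already been routed to a dedicated queue with a single worker (the module, task, and queue names are hypothetical):

from tasks import my_repeating_task  # hypothetical module containing the task

while True:
    # Submit the task to its dedicated queue and block until this run finishes,
    # so that worker is always running exactly one instance of the task.
    # Waiting on the result requires a result backend to be configured.
    result = my_repeating_task.apply_async(queue='dedicated')
    result.get(timeout=3600)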
By code:
@celery.task()
def some_recursive_task():
    # Do some stuff and schedule the task to run again later.
    # Note that the next run is not scheduled on a fixed basis, like crontabs,
    # but based on the history of some object.
    # The actual task is found here:
    # https://github.com/rafaelsierra/cheddar/blob/master/src/feeds/tasks.py#L39
    # Then it calls itself again:
    countdown = bla.get_countdown()
    some_recursive_task.apply_async(countdown=countdown)
This task will run within the next 10 minutes to 12 hours, but it also calls other tasks that should run now: one for downloading stuff and another for parsing it.
The problem is that the main function is called for every single record in the database. Let's assume a few hundred tasks running; but, considering that those tasks run on average only every few hours, the number of tasks is not a big deal.
The problem starts when I try to run this with a single worker. When I start the worker, I set it to consume all queues with 8 concurrent processes, and it begins to acknowledge the tasks. But it seems that, no matter how far in the future a task is scheduled, a worker will grab it and wait for its scheduled run, meaning that the worker is locked until then.
I know that I can just split the two other functions into different queues, which I already did, but my concern is that workers will acknowledge tasks scheduled to run 12 hours ahead and will not run the ones that are due in 30 minutes.
Shouldn't workers ignore scheduled tasks until their time comes and just run the ones that are delayed without an ETA?
I don't think periodic tasks are a solution, or at least I don't know how they would be.
See points 5 & 6 there. Please keep in mind that countdown is no different from the eta argument of the task.
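For illustration, these two calls are roughly equivalent (the task name is a placeholder):

from datetime import datetime, timedelta

# countdown is just a convenience: it is converted into an absolute eta.
some_task.apply_async(countdown=600)
some_task.apply_async(eta=datetime.utcnow() + timedelta(seconds=600))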
In short, you're right. A single worker (or any number of workers) should not block on scheduled (eta or countdown) tasks.
How can you tell that workers are locked? The scheduled tasks are prefetched from the queue, but not acknowledged until they are executed.
Also, please keep in mind that all scheduled tasks are kept in RAM until they're executed, so you want them to be as light as possible. From what I understand, the scheduled task doesn't pass around big chunks of data, probably only some URI, so this shouldn't be a problem.
The links you've pasted return 404. Are you sure cheddar isn't a private repository?