I have a Django application that needs to run an optimization algorithm. This algorithm is composed of two parts. The first part is an evolutionary algorithm and this algorithm calls a certain number of tasks of the second part which is a simulated annealing algorithm.
The problem is that Celery doesn't allow a task to call an asynchronous task.
I have tried this code below:
sa_list = []
for cromossomo in self._populacao:
    sa_list.append(simulated_annealing_t.s(cromossomo.to_JSON(), self._NR, self._T, get_mutacao_str(self._mutacao_SA), self._argumentos))
job = group(sa_list)
result = job.apply_async()
resultados = result.get()
This code is part of the evolutionary algorithm, which is itself a Celery task.
When I try to run it, Celery shows this message:
[2015-12-02 16:20:15,970: WARNING/Worker-1] /home/arthur/django-user/local/lib/python2.7/site-packages/celery/result.py:45: RuntimeWarning: Never call result.get() within a task!
See http://docs.celeryq.org/en/latest/userguide/tasks.html#task-synchronous-subtasks
In Celery 3.2 this will result in an exception being
raised instead of just being a warning.
Despite this being only a warning, Celery seems to fill up with tasks and lock up.
I searched for a lot of solutions but none of them worked.
One way to deal with this is to use a two-stage pipeline:
def first_task():
    sa_list = []
    for cromossomo in self._populacao:
        sa_list.append(simulated_annealing_t.s(cromossomo.to_JSON(), self._NR, self._T, get_mutacao_str(self._mutacao_SA), self._argumentos))
    job = group(sa_list)
    result = job.apply_async()
    result.save()
    return result.id
then call it like this:
from path.to.tasks import app, first_task
result_1 = first_task.apply_async()
result_2_id = result_1.get()
result_2 = app.GroupResult.restore(result_2_id)
resultados = result_2.get()
There are other ways to do this that involve more work - you could use a chord to gather the results of the group.
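For completeness, a minimal sketch of that chord approach, assuming sa_list is the list of signatures built above; process_resultados is a hypothetical callback task (not in the original code) that you would define to receive the list of results:

from celery import chord

# The callback runs once every simulated annealing subtask in the group has
# finished; it receives the list of their results as its first argument.
chord(sa_list)(process_resultados.s())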
The problem is not that Celery doesn't allow the execution of async tasks in your example, but that you'll run into a deadlock, hence the warning:
Let's assume you have a task A that spawns a number of subtasks B through apply_async(). Every one of those tasks is executed by a worker. The problem is that if the number of B tasks is larger than the number of available workers, task A is still waiting for their results (in your example, at least - waiting is not the default behaviour). While task A is still running, the workers that executed a B task will not pick up another one; they are blocked until task A is finished. (I don't know exactly why, but I ran into this problem just a few weeks ago.)
This means that Celery can't execute anything else until you manually shut down the workers.
Solutions
This depends entirely on what you want to do with your task results. If you need them to execute the next subtask, you can chain them through linking with callbacks, or by hardcoding it into the respective tasks (so that the first calls the second, and so on).
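A minimal sketch of the linking option; the task names are illustrative and app is assumed to be your existing Celery instance:

@app.task
def step_one(x):
    return x + 1

@app.task
def step_two(result):
    # Receives step_one's return value once step_one has completed.
    print('got', result)

# step_two is linked as a callback, so no task ever blocks on result.get().
step_one.apply_async((1,), link=step_two.s())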
If you only need to see whether the tasks executed and whether they succeeded, you can use Flower to monitor them.
If you need to process the output of all the subtasks further, I recommend writing the results to an XML file: have task A call all the B tasks, and once they are done, execute a task C that processes the results. Maybe there are more elegant solutions, but this avoids the deadlock for sure.
Related
We have started one Celery worker reading from RabbitMQ (one queue):
celery -A tasks worker -c 1 (one process)
We send 2 chains (3 tasks in each chain) to RabbitMQ:
chain(*tasks_to_chain1).apply_async() (let's call it C1 and its tasks C1.1, C1.2, C1.3)
chain(*tasks_to_chain2).apply_async() (let's call it C2 and its tasks C2.1, C2.2, C2.3)
We expected the tasks to be run in this order: C1.1, C1.2, C1.3, C2.1, C2.2, C2.3.
However we are seeing this instead: C1.1, C2.1, C1.2, C2.2, C1.3, C2.3.
We don't understand why. Can someone shed some light on what's happening?
Many thanks,
Edit: more generally speaking we observe that chain 2 starts before chain 1 ends.
Without access to the full code it is not possible to test, but a plausible explanation is that your asynchronous sender just happens to behave that way. I would also assume the order is not deterministic: you will likely get somewhat different results if you keep repeating this.
When you execute apply_async(), an asynchronous task is created that will (probably - I can't be sure without seeing the code) start submitting those tasks to the queue, but as the call is not blocking, your main program immediately proceeds to the second apply_async(), which creates another background task to submit things to the queue.
These two background tasks are handled by a scheduler. What you see in your output is that each task submits one item to the queue and then passes control to the other task, which in turn submits one and then hands control back.
If you do not want this to happen asynchronously, use apply instead of apply_async. This is a blocking call, and your main program does not proceed until the first three tasks have been submitted. With asynchronous execution you can never be sure of the exact order of execution between tasks: you know C1.1 will happen before C1.2, but you cannot guarantee how the C1 and C2 tasks are interleaved.
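If the C2 tasks must only start once chain 1 has finished completely, one option (not from the original answer, just a hedged sketch) is to submit everything as a single combined chain:

from celery import chain

# Runs C1.1, C1.2, C1.3, then C2.1, C2.2, C2.3, in that order.
combined = chain(*(tasks_to_chain1 + tasks_to_chain2))
combined.apply_async()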
I am using Dask to run a pool of tasks, retrieving their results in the order they complete using the as_completed method, and potentially submitting new tasks to the pool each time one returns:
# Initial set of jobs
futures = [client.submit(job.run_simulation) for job in jobs]
pool = as_completed(futures, with_results=True)
while True:
    # Wait for a job to finish
    f, result = next(pool)
    # Exit condition
    if result == 'STOP':
        break
    # Do processing and maybe submit more jobs
    more_jobs = process_result(f, result)
    more_futures = [client.submit(job.run_simulation) for job in more_jobs]
    pool.update(more_futures)
Here's my problem: The function job.run_simulation that I am submitting can sometimes hang for a long time, and I want to time out this function - kill the task and move on if the run time exceeds a certain time limit.
Ideally, I'd like to do something like client.submit(job.run_simulation, timeout=10), and have next(pool) return None if the task ran longer than the timeout.
Is there any way that Dask can help me time out jobs like this?
What I've tried so far
My first instinct was to handle the timeout independently of Dask within the job.run_simulation function itself. I've seen two types of suggestions (e.g. here) for generic Python timeouts.
1) Use two threads, one for the function itself and one for a timer. My impression is this doesn't actually work because you can't kill threads. Even if the timer runs out, both threads have to finish before the task is completed.
2) Use two separate processes (with the multiprocessing module), one for the function and one for the timer. This would work, but since I'm already in a daemon subprocess spawned by Dask, I'm not allowed to create new subprocesses.
A third possibility is to move the code block to a separate script that I run with subprocess.run, and use subprocess.run's built-in timeout. I could do this, but it feels like a worst-case fallback, because it would take a lot of cumbersome passing of data to and from the subprocess.
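For reference, a hedged sketch of that fallback; run_simulation.py is a hypothetical wrapper script around the simulation code:

import subprocess

try:
    # Kill the whole subprocess if it runs longer than 10 seconds.
    subprocess.run(['python', 'run_simulation.py'], timeout=10, check=True)
except subprocess.TimeoutExpired:
    pass  # treat this as a timed-out job and move on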
So it feels like I have to accomplish the timeout at the level of Dask. My one idea here is to create a timer as a subprocess at the same time as I submit the task to Dask. Then if the timer runs out, use Client.cancel() to stop the task. The problem with this plan is that Dask might wait for workers to free up before starting the task, and I don't want the timer running before the task is actually running.
Your assessment of the problem seems correct to me and the solutions you went through are the same that I would consider. Some notes:
Client.cancel is unable to stop a function from running if it has already started. These functions are running in a thread pool and so you run into the "can't stop threads" limitation. Dask workers are just Python processes and have the same abilities and limitations.
You say that you can't use processes from within a daemon process. One solution to this would be to change how you're using processes in one of the following ways:
If you're using dask.distributed on a single machine then just don't use processes
client = Client(processes=False)
Don't use Dask's default nanny processes; your dask worker will then be a normal process capable of using multiprocessing
Set dask's multiprocessing-context config to "spawn" rather than fork or forkserver
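A hedged sketch of that last option; the exact config key can differ between dask/distributed versions, so treat the key shown here as an assumption to verify against your installed version:

import dask

# Ask distributed's workers to start subprocesses with "spawn" instead of
# "fork" or "forkserver".
dask.config.set({"distributed.worker.multiprocessing-method": "spawn"})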
The clean way to solve this problem though is to solve it inside of your function job.run_simulation. Ideally you would be able to push this timeout logic down to that code and have it raise cleanly.
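A minimal sketch of what that could look like, assuming run_simulation has an inner loop you control; converged() and step() are hypothetical methods standing in for the real simulation:

import time

def run_simulation(self, timeout=10):
    # Check the elapsed time on every iteration and raise cleanly on timeout.
    start = time.monotonic()
    while not self.converged():
        if time.monotonic() - start > timeout:
            raise TimeoutError('simulation exceeded %d seconds' % timeout)
        self.step()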
I am using Celery to run some tasks that take a long time to complete. There is an initial task that needs to complete before two sub-tasks can run. The tasks that I created are file system operations and don't return a result.
I would like the subtasks to run at the same time, but when I use a group for these tasks they run sequentially and not in parallel.
I have tried:
g = group([secondary_task(), secondary_tasks2()])
chain(initial_task(),g)
I've also tried running the group directly in the first task, but that doesn't seem to work either.
Is what I'm trying to accomplish doable with Celery?
          First Task
         /          \
Second Task        Third Task

Not:

First Task
    |
Second Task
    |
Third Task
The chain is definitely the right approach.
I would expect this to work: chain(initial_task.s(), g)()
Do you have more than one Celery worker process running, so that more than one task can actually run at the same time?
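A hedged sketch of how the full pattern could look with the task names from the question; immutable signatures (.si()) are used here on the assumption that these filesystem tasks don't take the previous task's result as an argument:

from celery import chain, group

# .si() makes the signatures immutable, so the group members don't receive
# initial_task's return value (the question says the tasks return nothing).
g = group(secondary_task.si(), secondary_tasks2.si())
chain(initial_task.si(), g).apply_async()

Note that for the two group members to actually run in parallel, the worker also needs a concurrency of at least 2 (e.g. celery -A tasks worker -c 2).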
I'm trying to use Celery to handle background tasks. I currently have the following setup:
@app.task
def test_subtask(id):
    print('test_st:', id)

@app.task
def test_maintask():
    print('test_maintask')
    g = group(test_subtask.s(id) for id in range(10))
    g.delay()
test_maintask is scheduled to execute every n seconds, which works (I see the print statement appearing in the command line window where I started the worker). What I'm trying to do is have this scheduled task spawn a series of subtasks, which I've grouped here using group().
It seems, however, like none of the test_subtask tasks are being executed. What am I doing wrong? I don't have any timing/result constraints for these subtasks and just want them to happen some time from now, asynchronously, in no particular order. n seconds later, test_maintask will fire again (and again) but with none of the subtasks executing.
I'm using one worker, one beat, and AMQP as a broker (on a separate machine).
EDIT: For what it's worth, the problem seems to be purely down to one task calling another (and not because the main task is scheduled). If I call the main task manually:
celery_funcs.test_maintask.delay()
I see the main task's print statement but -- again -- not the subtasks'. Calling a subtask directly does work, however:
celery_funcs.test_subtask.delay(10)
Sigh... just found out the answer. I used the following to configure my Celery app:
app = Celery('celery_app', broker='<my_broker_here>')
Strangely enough, this is not being picked up in the task itself... that is,
print('test_maintask using broker', app.conf.BROKER_URL, current_app.conf.BROKER_URL)
gives back '<my_broker_here>' and None respectively, causing the group to be sent off to... some default broker (I guess?).
Adding BROKER_URL to app.conf.update does the trick, though I'm still not completely clear on what's going on in Celery's internals here...
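A hedged sketch of that fix as described; '<my_broker_here>' is the same placeholder used in the question:

from celery import Celery

app = Celery('celery_app', broker='<my_broker_here>')
# Also expose the broker via app.conf so that code reading the configuration
# inside tasks sees the same URL.
app.conf.update(
    BROKER_URL='<my_broker_here>',
)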
From reading the Celery documentation, it looks like I should be able to use the following python code to list tasks on the queue that have not yet been picked up:
from celery.task.control import inspect
i = inspect()
tasks = i.reserved()
However, when running this code, the list of tasks is empty, even if there are items waiting in the queue (I have verified they are definitely in the queue using django-admin). The same is true for using the command line equivalent:
$ celeryctl inspect reserved
So I'm guessing this is not in fact what this command is for? If not, what is the accepted way to retrieve a list of tasks that have not yet started? Do I have to maintain my own list of task IDs in the code in order to query them?
The reason I ask is because I am trying to handle a situation where two tasks are queued which perform a write operation on the same object in the database. If both tasks execute in parallel and task 1 takes longer than task 2, it will overwrite the output from task 2, but I want the output from the most recent task i.e. task 2. So my plan was to cancel any pending tasks that operate on an object each time a new task is added which will write to the same object.
Thanks
Tom
You can see pending tasks using scheduled instead of reserved.
$ celeryctl inspect scheduled
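The Python equivalent, mirroring the inspect() usage from the question:

from celery.task.control import inspect

# scheduled() lists tasks the workers have received but are holding back
# (e.g. with an ETA/countdown), rather than tasks already reserved for
# immediate execution.
i = inspect()
tasks = i.scheduled()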