With Airflow 1.8.1, I am using the LocalExecutor with max_active_runs_per_dag=16, and I use a for loop to dynamically create ~100 tasks with PythonOperators. Most of the time the tasks complete without any issues. Occasionally, however, a task sits in the queued state and the scheduler seems to forget about it. I can clear the task and rerun it successfully, but I would like to know how to avoid getting stuck in the queue in the first place.
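For context, a minimal sketch of the dynamic-task pattern described above (the dag and task names are illustrative, with Airflow 1.8-era import paths):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def do_work(task_number, **context):
    print("running task %d" % task_number)

dag = DAG(
    dag_id="dynamic_tasks",
    start_date=datetime(2017, 1, 1),
    schedule_interval="@daily",
)

# Dynamically create ~100 tasks in a loop.
for i in range(100):
    PythonOperator(
        task_id="task_%d" % i,
        python_callable=do_work,
        op_kwargs={"task_number": i},
        provide_context=True,
        dag=dag,
    )
```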
I have been trying to revoke some Celery tasks by using
app.control.revoke(task_id)
As I am currently using the gevent pool, I understand that you would not be able to revoke a task that is currently executing, as seen here.
However, it seems like I can't revoke tasks that are queued either (based on what I can see in RabbitMQ, these tasks are unacked). Correct me if I am wrong: these queued tasks have yet to be executed, so they should have nothing to do with the execution pool. The way revoke works is that workers keep a copy of each task_id that is supposed to be revoked, save it (persistent revokes), and when such a task reaches a worker, the worker skips it.
I am using Docker to coordinate everything and have included --statedb=/celery/celery.state. This file exists in the directory of the Docker worker container, so that should be fine.
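For reference, a sketch of the revoke calls involved, assuming a Celery app named `app`; the task ids are placeholders:

```python
from celery import Celery

app = Celery("demo", broker="amqp://guest@rabbitmq//")

# Mark a queued task as revoked; a worker skips it when it fetches it.
# Revokes persist across restarts only if workers run with --statedb.
app.control.revoke("some-task-id")

# For a task that is already executing, terminate=True is required
# (and, as noted above, is not supported by the gevent pool).
app.control.revoke("another-task-id", terminate=True)
```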
I'm running Airflow in a Celery cluster and I'm having trouble getting cleared tasks to run after their dag is already marked as success.
Say I have a dag in the following state:
I want to run everything up to dummy_task_2 again for some reason. I run clear upstream on the task and everything before it is cleared:
In this state the scheduler will not schedule tasks 1 and 6 for execution, unless I also clear tasks 5 or 4:
Basically, it seems as if the scheduler simply does not care about tasks in a dag if all the 'final' tasks are already marked as success. If I clear those, then it goes back and picks up all the cleared tasks.
I've googled the hell out of this and it doesn't seem like intended behavior. I've checked the scheduler logs and there's nothing unusual; it simply reports "No tasks to send to the executor" until I clear one of the last tasks in the dag.
Is this a bug? Are there any logs I could check to see what's happening? I've even tried manually changing the state of the dag run in the metadata database from 'success' to 'running', but the scheduler just reverts it to 'success' immediately.
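For reference, here is roughly what that manual state change looks like using Airflow's ORM. This is an illustrative sketch only, with Airflow 1.x module paths and a placeholder dag_id; as noted above, the scheduler immediately reverts it:

```python
from airflow import settings
from airflow.models import DagRun
from airflow.utils.state import State

session = settings.Session()

# Fetch the most recent DagRun for the dag ("my_dag" is a placeholder).
dag_run = (
    session.query(DagRun)
    .filter(DagRun.dag_id == "my_dag")
    .order_by(DagRun.execution_date.desc())
    .first()
)

# Flip it back to RUNNING; the scheduler only schedules tasks for running DagRuns.
dag_run.state = State.RUNNING
session.commit()
session.close()
```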
I'm creating Celery tasks in a situation where there are more task producers than consumers (workers). Since my queues are filling up and the workers consume in FCFS order, can I execute a specific task (given its task_id) immediately?
For example: my tasks are queued in the following order: [1,2,3,4,5,6,7,8,9,0], and tasks are fetched from the zeroth index. Now a situation arises where I want to execute task 8 before all the others. How can I do this?
The worker need not execute that task (because there can be situations where all workers are already occupied). It could be run directly from the application. And when the task is completed (either by a worker or directly by the application), it should be deleted from the queue.
I know how to forcefully revoke a task (given a task_id), but how can I execute a task given an id?
how can I execute a task given an id?

The short answer is: you can't. Celery workers pull tasks off the broker backend as they become available.
Why not?
Note that this is not a limitation of Celery as such; rather, it is a characteristic of message queuing systems (MQS) in general. The point of an MQS is to decouple an application's components, so that the producer can go on to do other work while workers execute the tasks asynchronously. In other words, once a task has been sent off it cannot be modified (but it can be removed, as long as it has not been started yet).
What options are there?
Celery offers you several options for dealing with lower- vs. higher-priority or short- and long-running tasks at task submission time (see the sketch after this list):
Routing - tasks can be routed to different workers. So if your tasks [0 .. 9] are all long-running, except for task 8, you could route task 8 to a worker, or a set of workers, that deal with short-running tasks.
Timed execution - specify a countdown or estimated time of arrival (eta) for each task. That's a good option if you know that some tasks can be delayed for later execution i.e. when the system will be less busy. This leaves workers ready for those tasks that need to be executed immediately.
Task expiry - specify an expires countdown or time, with a callback. This way the task will be revoked if it didn't execute within the time allotted to it, and the callback can start an alternative course of action.
Check on task results periodically, revoke a task if it didn't start executing within some time. Note this is different from task expiry where the revoking only happens once a worker has fetched the task from the queue - if the queue is full the revoking may happen too late for your use case. Checking results periodically means you have another component in your system that does this and determines an alternate course of action.
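A hedged sketch of those submission-time options side by side, assuming a Celery app named `app` and a task called `process` (both names are illustrative, not from the question):

```python
from datetime import datetime, timedelta, timezone

from celery import Celery

app = Celery("demo", broker="redis://localhost:6379/0")

@app.task
def process(item):
    return item

# Routing: send this call to a dedicated queue served by short-task workers.
process.apply_async(args=[8], queue="short_tasks")

# Timed execution: delay by 10 minutes (countdown) or until an absolute eta.
process.apply_async(args=[3], countdown=600)
process.apply_async(args=[4], eta=datetime.now(timezone.utc) + timedelta(hours=1))

# Task expiry: the task is revoked if no worker starts it within 2 minutes.
process.apply_async(args=[5], expires=120)
```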
I've been learning about Celery and haven't been able to find the answer to a conceptual question, and have had odd results experimenting.
When there are scheduled tasks (by scheduled, I don't mean periodic but scheduled to run in the future using eta=x) submitted to Celery, they seem to be consumed from the queue by a worker right away (rather than staying in the Redis default celery key/queue). Presumably, the worker will actually execute the tasks at eta.
What happens if that worker were to be shut down or restarted (to update its registered tasks, for example)? Would those scheduled tasks be lost? They are not "running", so a warm shutdown wouldn't wait for them to finish, of course.
Is there a way to force those tasks to be returned to the queue and consumed by the next available worker?
I suppose, manually, one could dump the tasks before shutting down a worker:
http://celery.readthedocs.org/en/latest/userguide/workers.html#inspecting-workers
and resubmit them when a new worker is back up... but is this supposed to happen automatically?
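The inspect API from that page can at least show the eta tasks each worker is currently holding; a rough sketch, assuming a Celery app named `app`:

```python
from celery import Celery

app = Celery("demo", broker="redis://localhost:6379/0")

# eta/countdown tasks show up in each worker's "scheduled" bucket.
scheduled = app.control.inspect().scheduled() or {}
for worker, tasks in scheduled.items():
    for entry in tasks:
        print(worker, entry["request"]["id"], entry["eta"])
```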
Would really appreciate any help with this
Thanks
Take a look at acks_late
http://celery.readthedocs.org/en/latest/reference/celery.app.task.html#celery.app.task.Task.acks_late
If set to true, Celery will keep the task in the queue until it has been successfully executed.
Update: Celery 5.1
Even with acks_late enabled, workers will acknowledge (and thus remove) the message if the worker process executing the task is lost. This is the default and intentional behavior set forth by the library. [Ref]
To change that default and have your unfinished tasks re-queued, you can use the task_reject_on_worker_lost config. [Ref]
Keep in mind, though, that this could lead to a message loop and can cause unintended effects if your tasks are not idempotent.
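A minimal config sketch combining the two settings above (Celery 5.x setting names; `app` is an assumed instance):

```python
from celery import Celery

app = Celery("demo", broker="redis://localhost:6379/0")

app.conf.task_acks_late = True               # ack only after the task finishes
app.conf.task_reject_on_worker_lost = True   # re-queue the message if the worker dies mid-task
# Caution: only safe for idempotent tasks, or you risk redelivery loops.
```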
Specifically for eta tasks, queues wait for workers to acknowledge the tasks before deleting them. With the default settings, Celery workers ack right before the task is executed; with acks_late, they ack when the task has finished executing.
So when workers fail to ack the tasks, probably because of a shutdown/restart/lost connection, or because a Redis/SQS visibility_timeout was exceeded [ref], the queue will redeliver the message to any available worker.
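For Redis/SQS, that redelivery window is the transport's visibility timeout; a sketch of raising it so long-eta tasks are not redelivered prematurely (the value here is an arbitrary assumption):

```python
from celery import Celery

app = Celery("demo", broker="redis://localhost:6379/0")

# Give workers up to 12 hours to ack before the broker redelivers the message.
app.conf.broker_transport_options = {"visibility_timeout": 43200}  # seconds
```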
Is there any way to make Celery recheck if there are any tasks in the main queue ready to be started? Will the remote command add_consumer() do the job?
The reason: I am running several concurrent tasks, which spawn multiple sub-processes. When the tasks are done, the sub-processes sometimes take a few seconds to finish, and because the concurrency limit is maxed out by those sub-processes, a new task from the queue is never started. And because Celery does not check again when the sub-processes finish, the queue gets stalled with no active tasks. Therefore I want to add a periodic task that tells Celery to recheck the queue and start the next task. How do I tell Celery this?
From the docs:
The add_consumer control command will tell one or more workers to start consuming from a queue. This operation is idempotent.
Yes, add_consumer does what you want. You could also combine that with a periodic task to "recheck the queue and start the next task" every so often, depending on your needs.
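A sketch of the control command, assuming a Celery app named `app`; the queue and worker names are placeholders:

```python
from celery import Celery

app = Celery("demo", broker="redis://localhost:6379/0")

# Tell every worker to (re)start consuming from the "default" queue...
app.control.add_consumer("default", reply=True)

# ...or target a specific worker node.
app.control.add_consumer("default", destination=["worker1@example.com"])
```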