I have 8 Celery workers, all running on different machines, that run very long tasks (multiple hours).
These tasks can be stopped with app.control.revoke(task_id, terminate=True) and are visible in app.control.inspect().
What happens is that some sequence of events causes the workers to disconnect from RabbitMQ, and they are no longer visible in app.control.inspect(), but the underlying code continues to run within the worker.
These tasks can then no longer be terminated. What would cause something like this to happen, and is there a way to prevent the disconnect?
The worker is started with celery -A project worker -l info -P eventlet --concurrency 5 > /dev/null
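For reference, a minimal sketch of how the tasks are inspected and revoked; the broker URL and task id are placeholders:

    from celery import Celery

    app = Celery("project", broker="amqp://rabbit-host//")  # placeholder broker URL

    # List the tasks each connected worker is currently executing
    active = app.control.inspect().active() or {}
    for worker_name, tasks in active.items():
        print(worker_name, [t["id"] for t in tasks])

    # Terminate a specific long-running task by id
    app.control.revoke("some-task-id", terminate=True)  # placeholder task id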
How do I make the celery -A app worker command consume only a single task and then exit?
I want to run Celery workers as a Kubernetes Job that finishes after handling a single task.
I'm using KEDA for autoscaling workers according to queue messages.
I want to run Celery workers as Jobs for long-running tasks, as suggested in the documentation:
KEDA long running execution
There's not really anything specific for this. You would have to hack in your own driver program, probably via a custom concurrency module. Are you trying to use KEDA ScaledJobs or something? You would just use a ScaledObject instead.
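That said, a minimal sketch of one possible workaround (not mentioned in the answer above), assuming the worker runs with --concurrency=1 and uses the default node name celery@<hostname>: connect the task_postrun signal and ask the worker to shut itself down after its first task, so the Kubernetes Job completes and KEDA can scale up a fresh one.

    import socket

    from celery import Celery
    from celery.signals import task_postrun

    app = Celery("app", broker="amqp://rabbit-host//")  # placeholder broker URL

    @task_postrun.connect
    def shut_down_after_first_task(sender=None, **kwargs):
        # Ask only this worker to shut down gracefully once the task finishes;
        # the Job then exits and KEDA can start a new one for the next message.
        app.control.shutdown(destination=[f"celery@{socket.gethostname()}"])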
I have two servers, with one Celery worker on each. I use Redis as the broker to coordinate the workers.
My question is: how can I have only one worker running most of the time and, once that worker breaks, have the other worker turn on as a backup?
Basically, one worker just acts as a backup.
I know how to route a task to a specific worker via a dedicated queue on that worker, after reading the doc [http://docs.celeryproject.org/en/latest/userguide/routing.html#redis-message-priorities]
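For context, a sketch of the queue-based routing I mean; the broker URL, task name, and queue names are placeholders:

    from celery import Celery

    app = Celery("proj", broker="redis://redis-host:6379/0")  # placeholder broker URL

    # Send this task only to the queue that server A's worker consumes
    app.conf.task_routes = {"proj.tasks.heavy_job": {"queue": "server_a"}}

    # Server A: celery -A proj worker -Q server_a -l info
    # Server B: celery -A proj worker -Q server_b -l info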
This is, in my humble opinion, completely against the point of having a distributed system: off-loading CPU-heavy or long-running tasks, or running thousands of small tasks that you can't run elsewhere...
- You are running two servers anyway, so why keep the other one idle? More workers mean you will be able to process more tasks concurrently.
If you are not convinced and still want to do this, you need to write a tiny service on the machine with the idle Celery worker. This service will periodically check the health of the active worker, and if that check fails, it will start the Celery worker on the backup server.
Here is a question for you - why doesn't this service simply restart the Celery worker on the active server? It is entirely possible to do that, so again, I see no justification for having a completely idle machine doing nothing. If you are on a cloud platform, you can easily spin up a new instance from an existing image of your Celery worker. This is the scenario I use in production.
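If you do go the backup-worker route anyway, here is a minimal sketch of such a watchdog, assuming the backup machine has the project code; the broker URL and poll intervals are placeholders, and app.control.ping() is used as the health check:

    import subprocess
    import time

    from celery import Celery

    app = Celery("proj", broker="redis://redis-host:6379/0")  # placeholder broker URL

    backup_proc = None
    while True:
        # ping() returns one reply per responding worker; an empty list means none replied
        if not app.control.ping(timeout=5.0):
            if backup_proc is None or backup_proc.poll() is not None:
                # Active worker looks dead: start the backup worker on this machine
                backup_proc = subprocess.Popen(["celery", "-A", "proj", "worker", "-l", "info"])
        time.sleep(30)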
Say I am running Celery on a bank of 50 machines, all using a distributed RabbitMQ cluster.
If I have a task that is running and I know the task id, how in the world can Celery figure out which machine it's running on in order to terminate it?
Thanks.
I am not sure you can actually do it. When you spawn a task, some worker somewhere in your 50 boxes executes it, and you technically have no control over it since it is a separate process; the only things you can control are the AsyncResult and the AMQP message on the queue.
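For what it's worth, a sketch of the AsyncResult handle mentioned above; the broker URL and task id are placeholders:

    from celery import Celery
    from celery.result import AsyncResult

    app = Celery("proj", broker="amqp://rabbit-cluster//")  # placeholder broker URL

    # revoke() does not need to know which of the 50 boxes runs the task: it is
    # published as a broadcast control message, every worker receives it, and the
    # worker that is actually executing the task is the one that terminates it.
    AsyncResult("some-task-id", app=app).revoke(terminate=True)  # placeholder task id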
I am using Airflow 1.7.1.3.
I have an issue with DAG/task concurrency. When a DAG is running, the scheduler no longer launches other DAGs. It seems that the scheduler is totally frozen (no more logs) ... until the running DAG is finished. Then the new DAG run is triggered. My tasks are long-running ECS tasks (~10 minutes).
I use the LocalExecutor and left the default config of parallelism=32 and dag_concurrency=16. I run airflow scheduler -n 20, restart it automatically, and I set 'depends_on_past': False in all my DAG declarations.
For information, I deployed Airflow in containers running in an ECS cluster, with max_threads = 2 and only 2 CPUs available.
Any ideas? Thanks.
I ran into this issue as well using the LocalExecutor. It seems to be a limitation of how the LocalExecutor works. The scheduler ends up spawning child processes (32 in your case). In addition, your scheduler performs 20 iterations per execution, so when it reaches the end of its 20 runs, it waits for its child processes to terminate before it can exit. If there is a long-running child process, the scheduler will be blocked on its execution.
For us, the resolution was to switch to the CeleryExecutor. Of course, this requires a bit more infrastructure, management, and overall complexity for the Celery backend.
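For reference, a sketch of the airflow.cfg changes involved in that switch; the broker and result-backend URLs are placeholders, and exact key names can differ between Airflow versions:

    [core]
    executor = CeleryExecutor

    [celery]
    broker_url = redis://redis-host:6379/0
    celery_result_backend = db+postgresql://airflow:airflow@postgres-host/airflow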
I'm running Celery on multiple servers, each with a concurrency of 2 or more, and I want to load balance Celery tasks so that the server with the lowest CPU usage processes them.
For example, let's say I have 2 servers (A and B), each with a concurrency of 2. If I have 2 tasks in the queue, I want A to process one task and B to process the other. But currently it's possible that the first process on A will execute one task and the second process on A will execute the second task while B sits idle.
Is there a simple way, by means of celery extensions or config, that I can route tasks to the server with lowest CPU usage?
The best option is to use celery.send_task from the producing server, then deploy the workers onto n instances. The workers can then be run as #ealeon mentioned, using celery -A proj worker -l info -Ofair.
This way, load will be distributed across all servers without having to have the codebase present on the producing server.
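A sketch of what that looks like on the producing side; the broker URL, task name, arguments, and queue name are placeholders:

    from celery import Celery

    app = Celery("proj", broker="amqp://rabbit-host//")  # placeholder broker URL

    # send_task dispatches by task name, so the producer does not import the task code;
    # any worker consuming this queue (started with -Ofair) picks the task up when it
    # has a free process rather than prefetching it early.
    app.send_task("proj.tasks.process_video", args=["s3://bucket/video.mp4"], queue="default")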
Try http://docs.celeryproject.org/en/latest/userguide/optimizing.html#guide-optimizing:
"You can disable this prefetching behavior by enabling the -Ofair worker option:
$ celery -A proj worker -l info -Ofair"