How can I properly kill a Celery task in a Kubernetes environment?

How can I properly kill celery tasks running on containers inside a kubernetes environment? The structure of the whole application (all written in Python) is as follows:
An SDK that makes requests to our API;
A Kubernetes structure with one pod running the API and other pods running celery containers to deal with some long-running tasks that can be triggered by the API. These celery containers autoscale.
Suppose we call an SDK method that in turn makes a request to the API, which triggers a task to be run on a Celery container. What would be the correct/graceful way to kill this task if need be? I am aware that Celery tasks have a revoke() method, but I tried this approach and it did not work, even with terminate=True and signal=signal.SIGKILL (maybe this has something to do with the fact that I am using Azure Service Bus as a broker?).
Perhaps a mapping between a Celery task and the name of the container it runs in would help, but I could not find a way to get this information either.
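For reference, the revoke attempt described above looks roughly like this (a sketch; the broker URL and task id are placeholders, and revoke with terminate=True relies on Celery's remote control commands, which not every broker transport supports):

from celery import Celery

# Placeholder broker URL; the question uses Azure Service Bus as the broker.
app = Celery('api', broker='azureservicebus://...')

# The id returned by apply_async()/delay() when the task was triggered.
task_id = '<task-id>'

# Plain revoke only stops tasks that have not started yet; terminate=True
# additionally signals the worker process currently running the task.
app.control.revoke(task_id, terminate=True, signal='SIGKILL')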
Any help and/or ideas would be deeply appreciated.

The solution I found was to write to a file shared by both the API and the Celery containers. In this file, whenever an interruption is captured, a flag is set to true. Inside the Celery containers I periodically check the contents of that file; if the flag is set to true, I gracefully clean things up and raise an error.
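A minimal sketch of this approach, assuming both containers mount the same volume (the path, broker URL and task body are placeholders):

import json
import time

from celery import Celery

app = Celery('worker', broker='<your-broker-url>')  # placeholder broker

CANCEL_FLAG = '/shared/cancel_flag.json'  # hypothetical shared file


def cancellation_requested():
    # The API container writes {"cancel": true} here when an interruption
    # is captured.
    try:
        with open(CANCEL_FLAG) as f:
            return json.load(f).get('cancel', False)
    except (FileNotFoundError, ValueError):
        return False


@app.task
def long_running_job():
    for step in range(1000):
        if cancellation_requested():
            # Clean up partial work here, then surface the interruption.
            raise RuntimeError('Task cancelled via shared flag file')
        time.sleep(1)  # placeholder for the real work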

Related

Apache Airflow without celery or kubernetes

Is there any way to run workflows without using Celery or Kubernetes? The documentation specifies only these two ways to run Airflow in multi-node mode. Can't I just use multiple EC2 instances to run my workers for the computations (without Celery or Kubernetes)?
Let's assume you have a number of EC2 instances. How would you manage them from Airflow? How would you distribute the load among those EC2 instances? Celery and Kubernetes take care of exactly these tasks.
If, for some reason, you cannot use Celery or Kubernetes, you can install Airflow on a single instance and scale up its resources as needed.
The only way to accomplish what you want is to write your own Executor (EC2Executor?) that fulfils your requirements.
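A rough skeleton of what such an executor might look like, assuming Airflow's BaseExecutor interface (start/execute_async/sync/end); the exact method signatures vary between Airflow versions, so treat this purely as a sketch:

from airflow.executors.base_executor import BaseExecutor


class EC2Executor(BaseExecutor):
    """Hypothetical executor that would dispatch task commands to EC2 hosts."""

    def start(self):
        # Open connections to your EC2 fleet here (e.g. SSH clients).
        self.remote_jobs = {}

    def execute_async(self, key, command, queue=None, executor_config=None):
        # Launch the Airflow task command on one of your EC2 hosts and
        # remember the remote process under `key` (left unimplemented here).
        raise NotImplementedError

    def sync(self):
        # Poll the remote processes and report their final state back to the
        # scheduler, e.g. self.success(key) or self.fail(key).
        pass

    def end(self):
        # Wait for outstanding tasks to finish and tear down connections.
        pass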

Using RQ with a Docker container hosted on Heroku

A client wants to use Heroku for a project, but I'm stuck on how to make it work.
Basically, I need to run a function that takes around 15 seconds and depends on a custom repo, SciPy, and some other dependencies that are typically NOT available on Heroku. So I turned the app into a Docker container and pushed it to Heroku. So far so good.
The results of that function need to be returned via an API response, so I planned to use RQ for a task queue and set up a worker process. Since my app is already using Docker, I had to stick with another Docker container for the worker.
I can submit the task via the main app, and the worker takes over. However, the worker runs in a separate Docker container and cannot import the function. If I move the function entirely to the worker, I have the same problem in reverse: I am unable to import the function into the main app when I call enqueue.
Does anybody have an idea how to resolve this? I feel like it is a complete mess right now.
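One thing that may help with the shared-import problem: RQ's enqueue also accepts a dotted-path string instead of an imported callable (a sketch; the module, function and host names below are hypothetical):

from redis import Redis
from rq import Queue

q = Queue(connection=Redis(host='redis-host'))  # placeholder Redis host

# 'analysis.tasks' only needs to be importable inside the worker container,
# not in the web app container that enqueues the job.
job = q.enqueue('analysis.tasks.run_heavy_function', 42)
print(job.get_id())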

Celery tasks functions - web server vs remote server

I want to send tasks from a web server (running Django) to a remote machine that hosts a RabbitMQ server and some workers that I implemented with Celery.
If I follow the Celery way of doing things, it seems I have to share the code between both machines, which means replicating the workers' logic in the web app code.
So:
Is there a best practice for this? Since the code would be redundant, I am thinking about using a git submodule (i.e. replicated in both the web app repo and the workers repo).
Or should I rather use something other than Celery?
Am I missing something?
One way to manage this is to store your workers in your Django project. Django and Celery play nicely with each other, allowing you to use parts of your Django project in your Celery app: http://celery.readthedocs.org/en/latest/django/first-steps-with-django.html
Deploying this means that your web application simply never uses the modules involved with your Celery workers, and on your Celery machine your Django views and such are never used. This usually only results in a couple of megabytes of unused Django application code...
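The layout from the linked guide looks roughly like this (a sketch; 'proj' stands in for your actual project package, and newer Celery versions use a slightly different configuration call):

# proj/celery.py
import os

from celery import Celery
from django.conf import settings

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

app = Celery('proj')
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)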
You can use send_task. It takes the same parameters as apply_async, but you only have to give the task name. This lets you send tasks without loading the task module in Django:
app.send_task('tasks.add', args=[2, 2], kwargs={})
http://celery.readthedocs.org/en/latest/reference/celery.html#celery.Celery.send_task
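A slightly fuller sketch of the send_task approach from the Django side (broker and backend URLs are placeholders; a result backend is only needed if you want to fetch the result):

from celery import Celery

app = Celery('web', broker='amqp://guest@rabbitmq-host//', backend='rpc://')

# 'tasks.add' only has to exist on the worker machines.
result = app.send_task('tasks.add', args=[2, 2], kwargs={})
print(result.get(timeout=10))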

Celery on a different server [duplicate]

I am new to Celery. I know how to install and run it on one server, but I need to distribute tasks to multiple machines.
My project uses Celery to assign user requests coming in through a web framework to different machines and then return the result.
I read the documentation, but it doesn't mention how to set up multiple machines.
What am I missing?
My understanding is that your app will push requests into a queueing system (e.g. RabbitMQ), and then you can start any number of workers on different machines (with access to the same code as the app which submitted the task). They will pick tasks out of the message queue and then get to work on them. Once they're done, they will update the tombstone (result backend) database.
The upshot of this is that you don't have to do anything special to start multiple workers. Just start them on separate identical (same source tree) machines.
The server which has the message queue need not be the same as the one with the workers and needn't be the same as the machines which submit jobs. You just need to put the location of the message queue in your celeryconfig.py and all the workers on all the machines can pick up jobs from the queue to perform tasks.
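For example, a minimal celeryconfig.py shared by all worker machines might look like this (hostnames and the 'tasks' module name are placeholders; newer Celery versions use lowercase setting names such as broker_url):

# celeryconfig.py
BROKER_URL = 'amqp://user:password@rabbitmq-host:5672//'
CELERY_RESULT_BACKEND = 'rpc://'
CELERY_IMPORTS = ('tasks',)  # modules containing the task definitions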
The way I deployed it is like this:
clone your Django project onto a Heroku instance (this will run the frontend)
add RabbitMQ as an add-on and configure it
clone your Django project into another Heroku instance (call it something like "worker") where you will run the Celery tasks

Running celery in django not as an external process?

I want to give Celery a try. I'm interested in a simple way to schedule crontab-like tasks, similar to Spring's Quartz.
I see from Celery's documentation that it requires running celeryd as a daemon process. Is there a way to avoid running another external process and simply run this embedded in my Django instance? Since I'm not interested in distributing the work at the moment, I'd rather keep it simple.
Add the CELERY_ALWAYS_EAGER = True option to your Django settings file and all your tasks will be executed locally. It seems that for periodic tasks you still have to run celery beat as well.
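In settings terms that looks like this (a sketch; the second setting is optional but makes task errors surface immediately when running eagerly):

# settings.py
CELERY_ALWAYS_EAGER = True
CELERY_EAGER_PROPAGATES_EXCEPTIONS = True  # optional: raise task errors in-process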
