I am new to Celery. I know how to install it and run it on one server, but I need to distribute tasks across multiple machines.
My project uses Celery to hand user requests that come in through a web framework to different machines and then return the result.
I read the documentation, but it doesn't mention how to set up multiple machines.
What am I missing?
My understanding is that your app will push requests into a queueing system (e.g. RabbitMQ) and then you can start any number of workers on different machines (with access to the same code as the app which submitted the task). They will pick tasks off the message queue and then get to work on them. Once they're done, they will update the tombstone database.
The upshot of this is that you don't have to do anything special to start multiple workers. Just start them on separate identical (same source tree) machines.
The server which has the message queue need not be the same as the one with the workers and needn't be the same as the machines which submit jobs. You just need to put the location of the message queue in your celeryconfig.py and all the workers on all the machines can pick up jobs from the queue to perform tasks.
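As a rough illustration of the point above, a shared celeryconfig.py might look like the sketch below. The host name, credentials and the choice of result backend are placeholders I made up, not values from the question, and the lowercase setting names assume Celery 4 or newer (older releases used uppercase names such as BROKER_URL).

```python
# celeryconfig.py: the same file is deployed to the submitting app
# and to every worker machine, so they all talk to one broker.

# Hypothetical broker location; replace user, password and host with your own.
broker_url = "amqp://user:password@mq.example.internal:5672//"

# Assumed result backend; any backend reachable from all machines would do.
result_backend = "rpc://"
```

Every machine that loads this module points at the same queue, which is why no further per-worker setup is needed.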
The way I deployed it is like this:
clone your Django project on a Heroku instance (this will run the frontend)
add RabbitMQ as an add-on and configure it
clone your Django project into another Heroku instance (call it something like worker) where you will run the Celery tasks
How can I properly kill celery tasks running on containers inside a kubernetes environment? The structure of the whole application (all written in Python) is as follows:
An SDK that makes requests to our API;
A Kubernetes structure with one pod running the API and other pods running celery containers to deal with some long-running tasks that can be triggered by the API. These celery containers autoscale.
Suppose we call an SDK method that in turn makes a request to the API, which triggers a task to be run on a celery container. What would be the correct/graceful way to kill this task if need be? I am aware that celery tasks have a revoke() method, but I tried this approach and it did not work, even with terminate=True and signal=signal.SIGKILL (maybe this has something to do with the fact that I am using Azure Service Bus as a broker?)
Perhaps a mapping between a celery task and its corresponding container name would help, but I could not find a way to get this information as well.
Any help and/or ideas would be deeply appreciated.
The solution I found was to write to a file shared by both the API and the Celery containers. Whenever an interruption is requested, a flag in this file is set to true. Inside the Celery containers I periodically check the contents of the file. If the flag is set to true, I gracefully clean things up and raise an error.
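A minimal sketch of that flag-file idea, assuming both containers mount the same volume. The names request_interrupt, interrupt_requested, run_long_task and TaskInterrupted are made up for illustration, and the JSON layout of the file is my own choice rather than what the author used.

```python
import json
import os
import tempfile


class TaskInterrupted(Exception):
    """Raised inside the worker when the shared flag is set."""


def request_interrupt(flag_path):
    # API side: write to a temp file and rename it into place, so the
    # worker never reads a half-written flag file.
    directory = os.path.dirname(os.path.abspath(flag_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump({"interrupt": True}, tmp)
    os.replace(tmp_path, flag_path)


def interrupt_requested(flag_path):
    # Worker side: a missing file simply means "keep going".
    try:
        with open(flag_path) as fh:
            return bool(json.load(fh).get("interrupt", False))
    except FileNotFoundError:
        return False


def run_long_task(items, flag_path, cleanup=lambda: None):
    # Check the flag between units of work, clean up, then raise.
    results = []
    for item in items:
        if interrupt_requested(flag_path):
            cleanup()
            raise TaskInterrupted(f"stopped after {len(results)} items")
        results.append(item * 2)  # placeholder for the real work
    return results
```

In a real task the body of run_long_task would live inside the Celery task function, and a per-task flag file (for example one named after the task id) would let you stop one task without affecting the others.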
The title basically says it all. I have gunicorn running my app with 5 workers. I have a data structure that all the workers need access to, and it is updated on a schedule by apscheduler. Currently apscheduler runs once per worker, but I want it to run just once, period. Is there a way to do this? I've tried using the --preload option, which lets me load the shared data structure just once, but it doesn't seem to give all the workers access to it when it updates. I'm open to switching to uWSGI if that helps.
I'm not aware of any way to do this with either, at least not without some sort of RPC. That is, run APScheduler in a separate process and then connect to it from each worker. You may want to look up projects like RPyC and Execnet to do that.
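To make the shape of that concrete, here is a small sketch. It swaps in the standard library's xmlrpc modules instead of RPyC or Execnet, purely to stay dependency-free, and it leaves APScheduler out: refresh() stands in for whatever the scheduled job does. The class and function names are hypothetical.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class SharedData:
    """Lives in the single scheduler process and owns the data."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {"version": 0}

    def refresh(self):
        # The scheduled job would call this to update the data.
        with self._lock:
            self._data["version"] += 1

    def snapshot(self):
        # Return a copy so callers never see a partial update.
        with self._lock:
            return dict(self._data)


def start_server(shared, host="127.0.0.1", port=0):
    # port=0 asks the OS for a free port; a real deployment would fix one.
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(shared.snapshot, "snapshot")
    # Served from a background thread here only to keep the sketch short.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_snapshot(port, host="127.0.0.1"):
    # Each gunicorn worker would call this when it needs current data.
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        return proxy.snapshot()
```

The scheduler then runs exactly once, in the process that owns SharedData, and the gunicorn workers only ever read from it over the RPC call.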
I want to send tasks from a web server (running Django) to a remote machine that hosts a RabbitMQ server and some workers that I implemented with Celery.
If I follow the usual Celery approach, it seems I have to share the code between both machines, which means replicating the workers' logic in the web app code.
So:
Is there a best practice to do that? Since code is redundant, I am thinking about using a git submodule (=> replicated in the web app code repo, and in the workers code repo)
Should I better use something else than Celery then?
Am I missing something?
One way to manage this is to keep your worker code in your Django project. Django and Celery play nicely together, allowing you to use parts of your Django project in your Celery app. http://celery.readthedocs.org/en/latest/django/first-steps-with-django.html
Deploying this would mean that your web application would not use the modules involved with your celery workers, and on your celery machine your django views and such would never be used. This usually only results in a couple of megs of unused django application code...
You can use send_task. It takes the same parameters as apply_async, but you only have to give the task name. This lets you send tasks without loading the task module in Django:
app.send_task('tasks.add', args=[2, 2], kwargs={})
http://celery.readthedocs.org/en/latest/reference/celery.html#celery.Celery.send_task
So I have a web service (flask + MySQL + celery) and I'm trying to figure out the proper way to deploy it on Elastic Beanstalk into separate Web Server and Worker environments/tiers. I currently have it working by launching the worker (using this answer) on the same instance as the web server, but obviously I want to have the worker(s) running in a separately auto-scaled environment. Note that the celery tasks rely on the main server code (e.g. making queries, etc) so they cannot be separated. Essentially it's an app with two entry points.
The only way I can think to do this is by having the code/config-script examine some env variable (e.g. ENV_TYPE = "worker" or "server") to determine whether to launch the standard flask app, or the celery worker.
The other caveat here is that I would have to "eb deploy" my code to two separate environments (server and worker), when I'd like/expect them to be deployed simultaneously since both use the same code base.
Apologies if this has been asked before, but I've looked around a lot and couldn't find anything, which I find surprising since this seems like a common use case.
Edit: Just found this answer, which addresses my concern for deploying twice (I guess it's technically deploy once and then update two environments, easily scriptable). But my question regarding how to bootstrap the application into server vs worker mode still stands.
Regarding the bootstrapping, if you set up an environment variable for an Elastic Beanstalk environment (docs here), then you never have to touch it again when you re-deploy your code with your script. You only need to add the environment variable if you create a new environment.
Thus when starting up, you can just check in Python for that ENV variable and then bootstrap from there and load what you need.
My preference is, instead of creating an enum by specifying "worker" or "server", to just use a boolean env variable like ENV_WORKER=1 or something. It'll remove the possibility of typing mistakes and be easier to read.
import os

if os.environ.get('ENV_WORKER') is not None:
    # Bootstrap worker stuff here
    pass
else:
    # Specific stuff for server here
    pass
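One thing to watch with a plain "is set" check is that ENV_WORKER=0 or an empty value would still count as worker mode. A slightly stricter variant, as a sketch: the helper name is_worker and the set of accepted values are my own assumptions, not part of the answer.

```python
import os


def is_worker(environ=os.environ):
    # Treat only explicit truthy strings as "worker"; unset, empty,
    # "0" and anything else fall back to server mode.
    value = environ.get("ENV_WORKER", "")
    return value.strip().lower() in {"1", "true", "yes"}
```

The entry point can then branch on is_worker() to start either the Celery worker or the Flask app.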