Is it possible to have multiple Celery instances, possibly on different machines, consuming from a single queue of tasks, preferably working with Django and using django-orm as the backend?
How can I implement this if it is possible? I can't seem to find any documentation for this.
Yes, it's possible; they just have to use the same broker. For instance, if you are using AMQP, the configs on your servers must share the same
BROKER_URL = 'amqp://user:password@localhost:5672//'
See the routing page for more details. For instance, let's say you want a common queue for the two servers, plus one specific to each of them; you could do:
On server 1:
CELERY_ROUTES = {'your_app.your_specific_tasks1': {'queue': 'server1'}}
user@server1:/$ celery -A your_celery_app worker -Q server1,default
On server 2:
CELERY_ROUTES = {'your_app.your_specific_tasks2': {'queue': 'server2'}}
user@server2:/$ celery -A your_celery_app worker -Q server2,default
Of course the routing is optional; by default all tasks are routed to the queue named celery. (For the commands above to pick up a queue literally named default, set CELERY_DEFAULT_QUEUE = 'default', as in the sketch below.)
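Putting it together, a minimal sketch of the shared Django settings might look like this (the broker host, app, and task names are placeholders, not the original poster's actual project):
# settings.py -- shared by every server; names below are placeholders
BROKER_URL = 'amqp://user:password@broker-host:5672//'

CELERY_DEFAULT_QUEUE = 'default'  # so the shared queue is literally named 'default'

CELERY_ROUTES = {
    'your_app.your_specific_tasks1': {'queue': 'server1'},
    'your_app.your_specific_tasks2': {'queue': 'server2'},
    # anything not listed here falls back to the default queue
}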
Related
I have my API, and some endpoints need to forward requests to Celery. The idea is to have a specific API service that basically only instantiates the Celery client and uses the send_task() method, and a separate service (workers) that consumes tasks. The code for the task definitions should live in that worker service. Basically, this separates the celery app (API) and the celery worker into two separate services.
I don't want my API to know about any celery task definitions; endpoints only need to use celery_client.send_task('some_task', (some_arguments)). So on one service I have my API, and on another service/host I have the celery code base where my celery worker will execute the tasks.
I came across this great article that describes what I want to do.
https://medium.com/@tanchinhiong/separating-celery-application-and-worker-in-docker-containers-f70fedb1ba6d
and this post Celery - How to send task from remote machine?
I need help on how to create routes for tasks from the API. I was expecting celery_client.send_task() to have a queue= keyword, but it does not. I need to have 2 queues, and two workers that will consume content from those two queues.
Commands for my workers:
celery -A <path_to_my_celery_file>.celery_client worker --loglevel=info -Q queue_1
celery -A <path_to_my_celery_file>.celery_client worker --loglevel=info -Q queue_2
I have also visited celery "Routing Tasks" documentation, but it is still unclear to me how to establish this communication.
Your API side should hold the router. I guess it's not an issue because it is only a map of task -> queue (aka send task1 to queue1).
In other words, your celery_client should have task_routes like:
task_routes = {
    'mytasks.some_task': 'queue_1',
    'mytasks.some_other_task': 'queue_2',
}
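For instance, a minimal sketch of the API-side client (the broker URL, task names, and queue names here are placeholders). The API publishes by name only and the router picks the queue; note that send_task() also forwards routing options such as queue= to apply_async, so an explicit queue can be given if you want to bypass the routes:
# celery_client.py -- API side; broker URL, task and queue names are placeholders
from celery import Celery

celery_client = Celery('api', broker='amqp://user:password@broker-host:5672//')
celery_client.conf.task_routes = {
    'mytasks.some_task': 'queue_1',
    'mytasks.some_other_task': 'queue_2',
}

# The API never imports the worker's task code; it publishes by name and the
# router above decides the queue.
result = celery_client.send_task('mytasks.some_task', args=(1, 2))

# send_task() also forwards routing options to apply_async, so an explicit
# queue can be passed to override the routes.
other = celery_client.send_task('mytasks.some_other_task', kwargs={'x': 3}, queue='queue_2')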
I have a celery task that mangles some variable. It works perfectly if I run a single celery worker, but when I use concurrency it all gets messed up. How can I lock the critical section where the variable is mangled?
inb4: using Python 3.6, Redis both as broker and result backend. threading.Lock doesn't help here.
As long as celery runs on multiple workers (processes), a thread lock will not help, because it only works inside a single process. Moreover, a threading lock is only useful when you control the overall process, and with celery there is no way to achieve that.
This means celery requires a distributed lock. For Django I always use the cache framework for this (a sketch follows below). If you need more generic locks, especially Redis-based ones that work for any Python app, you can use sherlock.
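A minimal sketch of the cache-based lock pattern, assuming the Django cache backend is shared by all workers (e.g. Redis or memcached); the task and lock names are placeholders:
# tasks.py -- distributed lock via Django's cache (shared backend assumed)
from contextlib import contextmanager

from celery import shared_task
from django.core.cache import cache

LOCK_EXPIRE = 60  # seconds; should exceed the longest expected critical section

@contextmanager
def cache_lock(lock_id):
    # cache.add is atomic: it only succeeds if the key does not exist yet
    acquired = cache.add(lock_id, 'locked', LOCK_EXPIRE)
    try:
        yield acquired
    finally:
        if acquired:
            cache.delete(lock_id)

@shared_task(bind=True, max_retries=None)
def mangle_variable(self):
    with cache_lock('mangle-variable-lock') as acquired:
        if not acquired:
            # another worker holds the lock; retry shortly instead of blocking
            raise self.retry(countdown=1)
        ...  # critical section: mutate the shared state here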
I know this question is 2+ years old, but right now I'm tuning my celery configs and I came across this topic.
I am using python 2.7 with Django 1.11 and celery 4 in a linux machine. I am using rabbitmq as a broker.
My configs implies to have celery running as daemon and celery beat to handle scheduled tasks.
And so, by giving a task its own dedicated queue, you can configure the worker (process) that consumes that queue with concurrency=1 (a single subprocess).
This solves concurrency problems when celery runs the task, but if your code calls the task function directly, without celery, it won't respect any concurrency constraints.
Example code:
from datetime import timedelta

from kombu import Exchange, Queue

default_exchange = Exchange('celery_periodic', type='direct')
celery_task_1_exchange = Exchange('celery_task_1', type='direct')

CELERY_TASK_QUEUES = (
    Queue('celery_periodic', default_exchange, routing_key='celery_periodic'),
    Queue('celery_task_1', celery_task_1_exchange, routing_key='celery_task_1'),
)

CELERY_BEAT_SCHEDULE = {
    'celery-task-1': {
        'task': 'tasks.celery_task_1',
        'schedule': timedelta(minutes=15),
        'options': {'queue': 'celery_task_1'},
    },
}
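For reference, the task referenced by the schedule might look roughly like this (a sketch only; the module path is assumed from the 'tasks.celery_task_1' name above):
# tasks.py -- name matches the 'tasks.celery_task_1' entry in the schedule
from celery import shared_task

@shared_task(name='tasks.celery_task_1')
def celery_task_1():
    # only one instance runs at a time because its dedicated worker
    # (worker2 below) is started with --concurrency=1
    ...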
and finally, in /etc/default/celeryd (docs here: https://docs.celeryproject.org/en/latest/userguide/daemonizing.html#example-configuration):
CELERYD_NODES="worker1 worker2"
CELERYD_OPTS="--concurrency=1 --time-limit=600 -Q:worker1 celery_periodic -Q:worker2 celery_task_1"
--concurrency N means you will have exactly N worker subprocesses for your worker instance (meaning the worker instance can handle N concurrent tasks) (from here: https://stackoverflow.com/a/44903753/9412892).
Read more here: https://docs.celeryproject.org/en/stable/userguide/workers.html#concurrency
BR,
Eduardo
Celery seems to be a great tool, but I have hard time understanding how the various Celery components work together:
The workers
The apps
The tasks
The message Broker (like RabbitMQ)
From what I understand, the command line:
celery -A not-clear-what-this-option-is worker
should run some sort of celery "worker server" which would itself need to connect to a broker server (I'm not so sure why so many servers are needed).
Then in any python code, some task may be sent to the worker by instantiating an app:
app = Celery('my_module', broker='pyamqp://guest@localhost//')
and then by decorating functions with this app in the following way:
@app.task
def my_func():
...
so that "my_func()" can now be called as "my_func.delay()" to be ran in an asynchronuous way.
Here are my questions:
What happens when my_func.delay() is called? Which server talks to which first, and what is sent where?
What is the option to put behind the -A of the celery command? Is it really needed?
Suppose I have a process X which instantiates a Celery app to launch a task A, and suppose I have another process Y that wants to know the status of task A launched by X. I assume there is a way for Y to do so, but I don't know how. I suppose that Y should create its own instance of a Celery app. But then:
What function should be called in the celery app of Y to get this information (and what is the "identifier" of task A inside the process Y)?
How does this work in terms of communication, that is, when does the request go through the broker, and when does it go to the worker(s)?
If anyone has some information about these questions, I would be grateful. I intend to use Celery in a Django project, where some requests to the server can trigger various time consuming tasks, and/or inquire about the status of previously launched tasks (pending, finished, error, etc...).
About the broker:
The main role of the broker is to mediate communication between the client and the worker
basically a lot of information is being generated and processed while your worker is running
taking care of this information is the broker's role
e.g. you can configure redis so that no information is lost if the server is shut down while running a process
The worker:
you can think of the worker as an instance independent of your application, which will only execute those tasks that you delegate to it
About the state of a task:
there are ways to consult celery to find out the status of a task (see the sketch after this list), but I would not recommend building your application logic around it
if you want to get the output of one task and turn it into the input of another, I would recommend you use a queue
run task A, and before it finishes, insert its result objects into the queue
task B will listen to the queue and process whatever comes up
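That said, to answer the question about process Y: below is a minimal sketch of checking a task's state from another process. The broker/backend URLs are placeholders, and it assumes both processes are configured with the same result backend. The state lookup reads the result backend, not the worker itself; the worker writes the state/result there.
# process Y -- checks the state of a task started by process X
from celery import Celery
from celery.result import AsyncResult

app = Celery('my_module',
             broker='pyamqp://guest@localhost//',
             backend='redis://localhost:6379/0')

task_id = '...'  # the id process X got from my_func.delay().id and shared with Y

result = AsyncResult(task_id, app=app)
print(result.state)   # PENDING, STARTED, SUCCESS, FAILURE, ...
print(result.result)  # return value (or the exception) once the task finished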
The command:
on the terminal you can see in more detail what each argument means by running celery -h or celery --help
but the -A argument basically specifies which Celery app instance you intend to run, so normally it points to the module where the instance you have configured can be found
usage: celery [-h] [-A APP] [-b BROKER] [--result-backend RESULT_BACKEND]
[--loader LOADER] [--config CONFIG] [--workdir WORKDIR]
[--no-color] [--quiet]
I hope this can provide an initial overview for those who get here
Celery is used to make functions run in the background. Imagine you have a web API that does a job and returns a response. You know that job would seriously affect the response time of the API. So you transfer that particular job to Celery, and your API responds instantly. Examples of jobs that affect the performance of an API are:
Routing to email servers
Routing to SMS Gateways
Database backup
Chained database operations
File conversion
Now, let's cover each component of celery.
The workers
Celery workers execute the job (function). They are asynchronous; a common rule of thumb is to run about twice as many worker processes as you have processor cores. You can assign a name and tasks to a celery worker.
The apps
The app is the name of the project you're working on. You'll have to specify that name in the celery instance.
The tasks
The functions you need to be executed in the background. Every task Celery executes will have a task id, a state (and more). You can get these by inspecting a particular task.
The message Broker
The tasks which will be executed in the background have to be moved from your python project to the Celery workers. Message brokers act as a medium here. So functions with their arguments are transferred to the broker, and from the broker the Celery workers fetch them to execute.
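A minimal sketch of that flow (the module name, broker URL, and the task itself are placeholders):
# a rough sketch -- the caller only publishes a message to the broker
from celery import Celery

app = Celery('project_name', broker='pyamqp://guest@localhost//')

@app.task
def send_welcome_email(user_id):
    ...  # slow work happens here, inside a worker process

# .delay() serializes the task name and arguments, hands them to the broker,
# and returns immediately; a worker picks the message up and runs the function
send_welcome_email.delay(42)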
Some codes
celery -A project_name worker -n worker_name --loglevel=info
celery -A project_name inspect active
More in documentation
docs.celeryproject.org
Cheers,
I have a celery setup running in a production environment (on Linux) where I need to consume two different task types from two dedicated queues (one for each). The problem that arises is that all workers are always bound to both queues, even when I specify that they should only consume from one of them.
TL;DR
Celery running with 2 queues
Messages are published in correct queue as designed
Workers keep consuming both queues
Leads to deadlock
General Information
Think of my two different task types as a hierarchical setup:
A task is a regular celery task that may take quite some time, because it dynamically dispatches other celery tasks and may be required to chain through their respective results
A node is a dynamically dispatched sub-task, which also is a regular celery task but itself can be considered an atomic unit.
My task thus can be a more complex setup of nodes where the results of one or more nodes serves as input for one or more subsequent nodes, and so on. Since my tasks can take longer and will only finish when all their nodes have been deployed, it is essential that they are handled by dedicated workers to keep a sufficient number of workers free to consume the nodes. Otherwise, this could lead to the system being stuck, when a lot of tasks are dispatched, each consumed by another worker, and their respective nodes are only queued but will never be consumed, because all workers are blocked.
If this is a bad design in general, please make any propositions on how I can improve it. I did not yet manage to build one of these processes using celery's built-in canvas primitives. Help me, if you can?!
Configuration/Setup
I run celery with amqp and have set up the following queues and routes in the celery configuration:
CELERY_QUERUES = (
Queue('prod_nodes', Exchange('prod'), routing_key='prod.node'),
Queue('prod_tasks', Exchange('prod'), routing_key='prod.task')
)
CELERY_ROUTES = {
    'deploy_node': {'queue': 'prod_nodes', 'routing_key': 'prod.node'},
    'deploy_task': {'queue': 'prod_tasks', 'routing_key': 'prod.task'},
}
When I launch my workers, I issue a call similar to the following:
celery multi start w_task_01 w_node_01 w_node_02 -A my.deployment.system \
-E -l INFO -P gevent -Q:1 prod_tasks -Q:2-3 prod_nodes -c 4 --autoreload \
--logfile=/my/path/to/log/%N.log --pidfile=/my/path/to/pid/%N.pid
The Problem
My queue and routing setup seems to work properly, as I can see messages being correctly queued in the RabbitMQ Management web UI.
However, all workers always consume celery tasks from both queues. I can see this when I start and open up the flower web UI and inspect one of the deployed tasks, where e.g. w_node_01 starts consuming messages from the prod_tasks queue, even though it shouldn't.
The RabbitMQ Management web UI furthermore tells me, that all started workers are set up as consumers for both queues.
Thus, I ask you...
... what did I do wrong?
Where is the issue with my setup or worker start call? How can I prevent the workers from always consuming from both queues? Do I really have to make additional settings at runtime (which I certainly do not want)?
Thanks for your time and answers!
You can create 2 separate workers, one for each queue, and for each one define which queue it should get tasks from using the -Q command line argument.
If you want to keep the number of processes the same (by default a process is opened per core for each worker), you can use the --concurrency flag (see the Celery docs for more info). For example:
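Using the queue names from the question (a sketch; the node names and concurrency values are arbitrary):
celery -A my.deployment.system worker -n w_tasks@%h -Q prod_tasks -c 2 -l INFO
celery -A my.deployment.system worker -n w_nodes@%h -Q prod_nodes -c 4 -l INFO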
Celery allows configuring a worker with a specific queue.
1) Specify the name of the queue with 'queue' attribute for different types of jobs
celery.send_task('job_type1', args=[], kwargs={}, queue='queue_name_1')
celery.send_task('job_type2', args=[], kwargs={}, queue='queue_name_2')
2) Add the following entry in configuration file
CELERY_CREATE_MISSING_QUEUES = True
3) On starting the worker, pass -Q 'queue_name' as argument, for consuming from that desired queue.
celery -A proj worker -l info -Q queue_name_1 -n worker1
celery -A proj worker -l info -Q queue_name_2 -n worker2
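On the worker side, the task names simply have to match the strings passed to send_task(). A minimal sketch (the module, broker URL, and function bodies are placeholders):
# worker side -- tasks registered under the names used by send_task()
from celery import Celery

app = Celery('proj', broker='redis://localhost:6379/0')

@app.task(name='job_type1')
def handle_job_type1():
    ...

@app.task(name='job_type2')
def handle_job_type2():
    ...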
I have two django apps using the same database. Both use celery and have their own CELERYBEAT_SCHEDULE. Is there a way to distinguish which tasks should be run by which celery worker? Right now tasks from both apps are scheduled on the same worker.
The reason for that is (I suspect) that CELERYBEAT_SCHEDULER is set to 'djcelery.schedulers.DatabaseScheduler'. I couldn't find any approach/scheduler that doesn't use the database. As a broker I'm using redis (local/different for each app), so tasks from outside of CELERYBEAT_SCHEDULE work properly.
You can define two queues in the CELERY_QUEUES setting and have each app's worker consume only its own queue using the --queues=queue1 parameter.
You can assign a scheduled task to a specific queue using its options:
'options': {'queue': 'queue1'},
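A minimal sketch for one of the apps (the queue, exchange, task name, and schedule are placeholders; the second app mirrors it with queue2):
# app1 settings -- routes this app's scheduled task to its own queue
from celery.schedules import crontab
from kombu import Exchange, Queue

CELERY_QUEUES = (
    Queue('queue1', Exchange('queue1'), routing_key='queue1'),
)

CELERYBEAT_SCHEDULE = {
    'app1-periodic-task': {
        'task': 'app1.tasks.do_work',
        'schedule': crontab(minute='*/5'),
        'options': {'queue': 'queue1'},
    },
}
Then start each app's worker with --queues=queue1 (respectively --queues=queue2) so every worker only consumes its own app's scheduled tasks.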