Celery send_task() method - python

I have an API, and some endpoints need to forward requests to Celery. The idea is to have a dedicated API service that only instantiates a Celery client and uses the send_task() method, and a separate service (the workers) that consumes tasks. The code for the task definitions should live in that worker service. Basically, I am separating the Celery app (API) and the Celery worker into two separate services.
I don't want my API to know about any Celery task definitions; endpoints only need to call celery_client.send_task('some_task', (some_arguments)). So on one service I have my API, and on another service/host I have the Celery code base where my Celery worker will execute tasks.
I came across this great article that describes what I want to do.
https://medium.com/@tanchinhiong/separating-celery-application-and-worker-in-docker-containers-f70fedb1ba6d
and this post Celery - How to send task from remote machine?
I need help creating routes for tasks from the API. I was expecting celery_client.send_task() to have a queue= keyword, but it does not. I need to have two queues, and two workers that will consume content from these two queues.
Commands for my workers:
celery -A <path_to_my_celery_file>.celery_client worker --loglevel=info -Q queue_1
celery -A <path_to_my_celery_file>.celery_client worker --loglevel=info -Q queue_2
I have also visited celery "Routing Tasks" documentation, but it is still unclear to me how to establish this communication.

Your API side should hold the router. I guess that's not an issue, because it is only a map of task -> queue (i.e. send task1 to queue1).
In other words, your celery_client should have task_routes like:
task_routes = {
    'mytasks.some_task': 'queue_1',
    'mytasks.some_other_task': 'queue_2',
}

Related

Celery: How to separate the logic of Publisher and Consumer?

I am new to Celery. In this example, I am unable to figure out how to separate the logic of publisher and consumer. Is the command celery -A tasks worker --loglevel=INFO used to start working for publishing or consuming?
If add.delay(4, 4) is to push data into a queue, how do I connect to the same queue in a separate code file and consume it?
Publishers are typically either Celery beat (scheduler), custom scripts that you develop, or other tasks executed by Celery workers in your cluster.
Consumers are EXCLUSIVELY Celery workers. Unless you dig really deep into Celery/Kombu and implement your own consumer, you cannot easily write one yourself.

Specify Worker in Celery

I have two workers:
celery worker -l info --concurrency=2 -A o_broker -n main_worker
celery worker -l info --concurrency=2 -A o_broker -n second_worker
I am using flower to monitor and receive API requests for these workers:
flower -A o_broker
to launch these celery workers from an API I use flower per the docs:
curl -X POST -d '{"args":[1,2]}' 'http://localhost:5555/api/task/async-apply/o_broker.add'
However, with this POST request the task runs on either one of the workers. I need to be able to choose a specific worker to complete the task.
How do I specify or set this up so I can choose what worker to use for the add task? If you have a solution using another API without flower, that would also work.
The easiest way to achieve this is with separate queues. Start the first worker with -Q first_worker,celery and the second worker with -Q second_worker,celery. celery is the default queue name in Celery.
Now, when you want to send a task to just the first worker, you can route the task to the first_worker queue using celery's task_routes setting. You treat routing tasks to the second_worker queue symmetrically. You can also manually route a particular task call to a certain queue when using apply_async, e.g.:
add.apply_async(args=(1, 2), queue='first_worker')
n.b., last I checked, flower will only monitor one of your queues (by default it's the celery queue).

Celery : understanding the big picture

Celery seems to be a great tool, but I have hard time understanding how the various Celery components work together:
The workers
The apps
The tasks
The message Broker (like RabbitMQ)
From what I understand, the command line:
celery -A not-clear-what-this-option-is worker
should run some sort of celery "worker server" which would itself need to connect to a broker server (I'm not so sure why so many servers are needed).
Then in any python code, some task may be sent to the worker by instantiating an app:
app = Celery('my_module', broker='pyamqp://guest@localhost//')
and then by decorating functions with this app in the following way:
@app.task
def my_func():
    ...
so that "my_func()" can now be called as "my_func.delay()" to be run asynchronously.
Here are my questions:
What happens when my_func.delay() is called? Which server talks to which first, and what is sent where?
What is the option to put after the "-A" of the celery command? Is it really needed?
Suppose I have a process X which instantiates a Celery app to launch task A, and suppose I have another process Y that wants to know the status of task A launched by X. I assume there is a way for Y to do so, but I don't know how. I suppose that Y should create its own instance of a Celery app. But then:
What function should be called in the Celery app of Y to get this information (and what is the "identifier" of task A inside process Y)?
How does this work in terms of communication, that is, when does the request go through the broker, and when does it go to the worker(s)?
If anyone has some information about these questions, I would be grateful. I intend to use Celery in a Django project, where some requests to the server can trigger various time consuming tasks, and/or inquire about the status of previously launched tasks (pending, finished, error, etc...).
About the broker:
The main role of the broker is to mediate communication between the client and the worker
basically a lot of information is being generated and processed while your worker is running
taking care of this information is the broker's role
e.g. you can configure redis so that no information is lost if the server is shut down while running a process
The worker:
you can think of the worker as an instance independent of your application, which will only execute those tasks that you delegate to it
About the state of a task:
there are ways to consult celery to find out the status of a task, but I would not recommend building your application logic depending on this
if you want to take the output of one process and use it as the input of another, using tasks, I would recommend using a queue
run task A and, before it finishes, insert its result objects into the queue
task B listens to the queue and processes whatever comes up
The command:
on the terminal you can see in more detail what each argument means by running celery -h or celery --help
but the argument basically specifies which instance of celery you intend to run. So normally this argument will indicate where the instance you have configured and intend to execute can be found
usage: celery [-h] [-A APP] [-b BROKER] [--result-backend RESULT_BACKEND]
[--loader LOADER] [--config CONFIG] [--workdir WORKDIR]
[--no-color] [--quiet]
I hope this can provide an initial overview for those who get here
Celery is used to run functions in the background. Imagine you have a web API that does a job and returns a response, and that job would seriously hurt the API's response time. You hand that particular job over to Celery, and your API can respond immediately. Examples of jobs that affect the performance of an API are:
Routing to email servers
Routing to SMS Gateways
Database backup
Chained database operations
File conversion
Now, let's cover each component of Celery.
The workers
Celery workers execute the jobs (functions), asynchronously with respect to your application. By default a worker starts one process per CPU core, which you can change with the --concurrency option. You can give each worker a name (-n) and tell it which queues to consume from (-Q).
The apps
The app is the Celery instance for the project you're working on. You give it a name (usually the project or module name) when you create it, and you point the celery command at it with -A.
The tasks
The functions you need to be executed in the background. Every task Celery executes has a task id, a state, and more. You can get those by inspecting a particular task.
The message Broker
The tasks that will be executed in the background have to be moved from your Python project to the Celery workers. Message brokers act as the medium here: the task name and its arguments are sent to the broker, and the Celery workers fetch them from the broker and execute them.
Some commands
celery -A project_name worker
celery -A project_name inspect active
More in documentation
docs.celeryproject.org

Multiple Celery instances consuming a single queue

Is it possible to have multiple Celery instances on different machines consuming tasks from a single queue, working with Django and preferably using django-orm as the backend?
How can I implement this, if it is possible? I can't seem to find any documentation for this.
Yes it's possible, they just have to use the same broker. For instance, if you are using AMQP, the configs on your servers must share the same
BROKER_URL = 'amqp://user:password@localhost:5672//'
See the routing page for more details. For instance let's say you want to have a common queue for two servers, then one specific to each of them, you could do
On server 1:
CELERY_ROUTES = {'your_app.your_specific_tasks1': {'queue': 'server1'}}
user@server1:/$ celery -A your_celery_app worker -Q server1,default
On server 2:
CELERY_ROUTES = {'your_app.your_specific_tasks2': {'queue': 'server2'}}
user@server2:/$ celery -A your_celery_app worker -Q server2,default
Of course it's optional, by default all the tasks will be routed to the queue named celery.

How to make celery retry using the same worker?

I'm just starting out with celery in a Django project, and am kinda stuck at this particular problem: Basically, I need to distribute a long-running task to different workers. The task is actually broken into several steps, each of which takes considerable time to complete. Therefore, if some step fails, I'd like celery to retry this task using the same worker so it can reuse the results from the completed steps. I understand that celery uses routing to distribute tasks to a certain server, but I can't find anything about this particular problem. I use RabbitMQ as my broker.
You could have every celeryd instance consume from a queue named after the hostname of the worker:
celeryd -l info -n worker1.example.com -Q celery,worker1.example.com
sets the hostname to worker1.example.com and will consume from a queue named the same, as well as the default queue (named celery).
Then to direct a task to a specific worker you can use:
task.apply_async(args, kwargs, queue="worker1.example.com")
Similarly, to direct a retry:
task.retry(queue="worker1.example.com")
or to direct the retry to the same worker:
task.retry(queue=task.request.hostname)
