How to execute tasks on a different server using Python Celery?

Let's say I have three servers A, B & C. Server C will have the celery task code, and I need to execute those tasks from servers A & B.
From the Celery documentation, I see that there's a tasks.py file which is run as a celery worker:
from celery import Celery

app = Celery('tasks', broker='pyamqp://guest@localhost//')

@app.task
def add(x, y):
    return x + y
And then we have another python file (let's say client.py) which calls these tasks.
from tasks import add
add.delay(4, 4)
Here I can see that client.py depends on tasks.py, since it imports the task from it. If we are to run these two files on separate servers, we need to decouple them and somehow call the tasks without importing the code. I can't figure out how to achieve that. How can it be done?

In general you do not do that. You deploy the same code (containing the tasks) to both producers (clients) and consumers (workers). However, Celery is a cool piece of software and it does allow you to schedule a task without distributing the code to the producer side. For that you use the send_task() method. You have to configure the producer with the same parameters as your workers (the same broker, naturally, and the same serialization), and you must know the task's registered name and its parameters in order to schedule its execution correctly.
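A minimal producer-side sketch of that idea, assuming the worker registers the task above under the name 'tasks.add' and that a result backend is configured if you want to read the return value:

from celery import Celery

# Producer side: same broker (and serializer settings) as the workers,
# but no import of the task code.
app = Celery(broker='pyamqp://guest@localhost//', backend='rpc://')

# send_task only needs the registered task name and the arguments.
result = app.send_task('tasks.add', args=(4, 4))
print(result.get(timeout=10))  # works only because a result backend was configured above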

Related

Ensuring one task of a kind is being executed at a time (multiple tasks of different kinds can be executed concurrently) - celery python

Let's say I have created two shared tasks:
from celery import shared_task

@shared_task
def taskA():
    # do something
    pass

@shared_task
def taskB():
    # do something else
    pass
I am using celery to perform certain tasks that will be invoked by the users of my Django project.
I have no issue with taskA and taskB being executed at the same time.
But, if taskA is already being executed, and another user tries to invoke taskA again, I want to show them an error message.
Is there a way to do that?
The only reliable way to do this that I can think of is to have a Celery worker with concurrency set to 1, subscribed to a dedicated queue. Then you send taskA to this particular queue.
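A rough sketch of that setup (the module and queue names here are assumptions, not taken from the question):

# In the Celery configuration: route taskA to its own queue, so only the
# worker dedicated to that queue will ever pick it up.
app.conf.task_routes = {
    'myproject.tasks.taskA': {'queue': 'taskA_queue'},
}

Then start a dedicated worker with celery -A myproject worker -Q taskA_queue --concurrency=1, so at most one taskA runs at a time. Showing the user an error message for a duplicate request would still need an extra check (e.g. a lock or a flag) on the application side.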

I want to run a task belonging to Project A from Project B using celery. The two projects have different code bases. Is it possible to do so?

I am using celery in both projects. Redis is used as the broker and backend. Suppose Project A has this task:
@shared_task()
def add(x, y):
    return x + y
Now I want to send this task from Project B and execute it there. Is there any possible way to do so?
Both projects have a configuration like this:
from celery import Celery
app = Celery('task', broker='redis://localhost:6379/1', backend='rpc://')

Enqueue celery task from other project

I have a project using celery to process tasks, and a second project which is an API that might need to enqueue tasks to be processed by celery workers.
However, these two projects are separate and I can't import the tasks into the API project.
I've used Sidekiq - Celery's equivalent in Ruby - in the past, and for example it is possible to push jobs by storing data in Redis from other languages/apps/processes if using the same format/payload.
Is something similar possible with Celery? I couldn't find anything related.
Yes, this is possible in celery using send_task or signatures. Assuming fetch_data is the function in a separate code base, you can invoke it using one of the methods below.
send_task
celery_app.send_task('fetch_data', kwargs={'url': request.json['url']})
app.signature
celery_app.signature('fetch_data', kwargs={'url': request.json['url']}).delay()
You just specify the function name as a string and do not need to import it into your codebase.
You can read about this in more detail from https://www.distributedpython.com/2018/06/19/call-celery-task-outside-codebase/
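For this to work, the worker-side code base has to register the task under the exact name the producer uses. A minimal sketch of what that side might look like (the app name and broker URL here are assumptions):

from celery import Celery

app = Celery('worker_project', broker='redis://localhost:6379/0')

# The explicit name is what producers use in send_task() / signature();
# without it, the name defaults to the module path, e.g. 'tasks.fetch_data'.
@app.task(name='fetch_data')
def fetch_data(url):
    ...  # fetch and process the URL here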

Celery: understanding the big picture

Celery seems to be a great tool, but I have a hard time understanding how the various Celery components work together:
The workers
The apps
The tasks
The message Broker (like RabbitMQ)
From what I understand, the command line:
celery -A not-clear-what-this-option-is worker
should run some sort of celery "worker server" which would itself need to connect to a broker server (I'm not so sure why so many servers are needed).
Then in any python code, some task may be sent to the worker by instantiating an app:
app = Celery('my_module', broker='pyamqp://guest@localhost//')
and then by decorating functions with this app in the following way:
@app.task
def my_func():
    ...
so that "my_func()" can now be called as "my_func.delay()" to be run in an asynchronous way.
Here are my questions:
What happens when my_func.delay() is called? Which server talks to which first, and what is sent where?
What is the option to put behind the "-A" of the celery command? Is it really needed?
Suppose I have a process X which instantiates a Celery app to launch task A, and suppose I have another process Y that wants to know the status of task A launched by X. I assume there is a way for Y to do so, but I don't know how. I suppose that Y should create its own instance of a Celery app. But then:
What function should Y call on its Celery app to get this information (and what is the "identifier" of task A inside process Y)?
How does this work in terms of communication? That is, when does the request go through the broker, and when does it go to the worker(s)?
If anyone has some information about these questions, I would be grateful. I intend to use Celery in a Django project, where some requests to the server can trigger various time-consuming tasks and/or inquire about the status of previously launched tasks (pending, finished, error, etc.).
About the broker:
The main role of the broker is to mediate communication between the client and the worker
basically a lot of information is being generated and processed while your worker is running
taking care of this information is the broker's role
e.g. you can configure redis so that no information is lost if the server is shut down while running a process
The worker:
you can think of the worker as an instance independent of your application, which will only execute those tasks that you delegate to it
About the state of a task:
there are ways to consult celery to find out the status of a task, but I would not recommend building your application logic depending on this
if you want to take the output of one task and turn it into the input of another, I would recommend using a queue
run task A and, before it finishes, insert its result objects into the queue
task B will listen to the queue and process whatever comes up
The command:
on the terminal you can see in more detail what each argument means by running celery -h or celery --help
but the -A argument basically specifies which Celery instance you intend to run; normally it indicates where the instance you have configured and intend to execute can be found
usage: celery [-h] [-A APP] [-b BROKER] [--result-backend RESULT_BACKEND]
[--loader LOADER] [--config CONFIG] [--workdir WORKDIR]
[--no-color] [--quiet]
I hope this can provide an initial overview for those who get here
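To make the "consult celery to find out the status of a task" part concrete, here is a minimal sketch for the X/Y scenario from the question. It reuses my_func and the app from the question's code and assumes both processes share the same broker and that a result backend is configured (without one, states and return values are not stored):

from celery.result import AsyncResult

# Process X: schedule the task and keep its id (e.g. store it somewhere process Y can read).
async_result = my_func.delay()
task_id = async_result.id

# Process Y: with its own Celery app configured against the same broker and
# result backend, look the task up by id.
res = AsyncResult(task_id, app=app)
print(res.state)   # PENDING, STARTED, SUCCESS, FAILURE, ...
print(res.result)  # the return value (or the exception) once the task has finished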
Celery is used to make functions run in the background. Imagine you have a web API that does a job and returns a response. That job would seriously affect the response time of the API. So you transfer that particular job to Celery, and your API will respond instantly. Examples of jobs that affect the performance of an API are:
Routing to email servers
Routing to SMS Gateways
Database backup
Chained database operations
File conversion
Now, let's cover each component of Celery.
The workers
Celery workers execute the jobs (functions). They run asynchronously, so you might run as many as twice the number of your processor cores as worker processes. You can assign a name and tasks (via queues) to a celery worker.
The apps
The app is the name of the project you're working on. You have to specify that name in the Celery instance.
The tasks
The functions you need to be executed in the background. Every task Celery executes has a task id, a state (and more). You can get these by inspecting a particular task.
The message Broker
The tasks which will be executed in the background have to be moved from your Python project to the Celery workers. Message brokers act as a medium here. So functions, along with their arguments, are transferred to the broker, and Celery fetches them from the broker to execute.
Some commands
celery -A project_name worker -n worker_name
celery -A project_name inspect active
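As a small illustration of the inspecting part, the same information is also reachable from Python through the app's control interface (this assumes app is a Celery instance configured with the same broker as the workers):

# Ask the running workers what they are doing.
insp = app.control.inspect()
print(insp.active())      # tasks currently being executed, per worker
print(insp.scheduled())   # tasks waiting for an ETA/countdown
print(insp.registered())  # task names each worker knows about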
More in documentation
docs.celeryproject.org

Setup celery periodic task

How do I set up a periodic task with Celerybeat and Flask that queries a database every hour?
The environment looks like this:
/
|-app
|  |-__init__.py
|  |-jobs
|     |-task.py
|-celery-beat.sh
|-celery-worker.sh
|-manage.py
I currently have a query function called run_query() located in task.py
I want the scheduler to kick in once the application initiates, so I have the following lines in my /app/__init__.py file:
celery = Celery()

@celery.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(1, app.jobs.task.run_query())
(For simplicity's sake, I've set it up so that if it runs, it will run every minute. No such luck yet.)
When I launch the celery-worker.sh it recognizes my function under the [tasks] heading. But the scheduled function never runs. I can manually force the function to run by issuing the following at the command prompt:
>> from app.jobs import task
>> task.run_query.delay()
EDIT: Added celerybeat.sh
As a follow-up: if the database is accessed through a Flask context, is it wise, during my async function call, to create a new Flask context to access the database? Use the existing Flask context? Or forget contexts altogether and just initiate a connection to the database? My worry is that if I just initiate a new connection it may interfere with the existing context's connection.
To run periodic tasks you need some kind of scheduler (e.g. celery beat).
celery beat is a scheduler; it kicks off tasks at regular intervals, which are then executed by available worker nodes in the cluster.
You have to ensure only a single scheduler is running for a schedule at a time, otherwise you'd end up with duplicate tasks. Using a centralized approach means the schedule doesn't have to be synchronized, and the service can operate without using locks.
Reference: periodic-tasks
You can invoke the scheduler with the command:
$ celery -A proj beat  # a separate process from your worker
You can also embed beat inside the worker by enabling the worker's -B option. This is convenient if you'll never run more than one worker node, but it's not commonly used and for that reason isn't recommended for production use. Starting the scheduler embedded in the worker:
$ celery -A proj worker -B
Reference: celery starting scheduler
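As a concrete sketch for the hourly query from the question (the import path, app name and broker URL are assumptions based on the layout above): note that add_periodic_task expects a signature such as run_query.s(), not the value returned by calling run_query() at import time.

from celery import Celery
from celery.schedules import crontab

from app.jobs.task import run_query  # assumed import path, based on the layout above

celery = Celery('app', broker='redis://localhost:6379/0')

@celery.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(
        crontab(minute=0),   # every hour, on the hour
        run_query.s(),       # a signature, not the result of calling run_query()
        name='hourly database query',
    )

Run both the worker and the beat process (celery -A app worker and celery -A app beat, or a single worker started with -B) and beat will enqueue run_query on that schedule.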
