running code when celery starts in worker mode - python

I am looking to run some code when a Celery worker starts, but not when, say, the tasks module is imported to be used from a client-type application.
from celery import Celery
from sqlalchemy import create_engine

celery_app = Celery(__name__)
# I want to only create the engine if this file is used by a worker
engine = create_engine(str(POSTGRES_URL))

You are looking for worker signals (https://docs.celeryproject.org/en/latest/userguide/signals.html?highlight=worker_ready#worker-signals). It is all nicely explained there. I am guessing worker_ready is the one you should look at first.
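For example, here is a minimal sketch of that idea. It uses the worker_process_init signal (which fires in each worker process after it is forked) rather than worker_ready; the SQLAlchemy engine and POSTGRES_URL are assumptions carried over from the question, with a placeholder URL so the snippet stands alone:

from celery import Celery
from celery.signals import worker_process_init
from sqlalchemy import create_engine

POSTGRES_URL = "postgresql://localhost/mydb"  # placeholder for the question's POSTGRES_URL setting

celery_app = Celery(__name__)
engine = None  # nothing is created at import time, so client code importing tasks pays no cost

@worker_process_init.connect
def init_engine(**kwargs):
    # Runs only inside a worker process, right after it starts.
    global engine
    engine = create_engine(POSTGRES_URL)

This way a client application that merely imports the task module never opens a database connection.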

Related

How to execute tasks in different server using Python Celery?

Let's say I have three servers A, B & C. Server C will have the celery task code, and I need to execute them from servers A & B.
From the Celery documentation, I see that there's a tasks.py file which is run as a celery worker:
from celery import Celery

app = Celery('tasks', broker='pyamqp://guest@localhost//')

@app.task
def add(x, y):
    return x + y
And then we have another python file (let's say client.py) which calls these tasks.
from tasks import add
add.delay(4, 4)
Here I can see that client.py depends on tasks.py, since it imports the task from it. If we are to run these two files on separate servers, we need to decouple them and somehow call the tasks without having to import the code. I am not able to figure out how to achieve that, so how can it be done?
In general you do not do that. You deploy the same code (containing the tasks) to both producers (clients) and consumers (workers). However, Celery is a cool piece of software, and it does allow you to schedule a task without distributing the task code to the producer side. For that you use the send_task() method. You have to configure the producer with the same parameters as your workers (the same broker, naturally, and the same serialization), and you must know the name of the task and its parameters in order to schedule its execution correctly.
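A minimal sketch of the producer side (servers A and B), assuming the worker on server C has registered the task under the name 'tasks.add' as in the question's tasks.py, and reusing the placeholder broker URL from above:

from celery import Celery

# Producer: same broker and serializer settings as the worker, but no task code imported.
app = Celery(broker='pyamqp://guest@localhost//')

# Schedule the task by name; the worker resolves 'tasks.add' to the decorated function.
result = app.send_task('tasks.add', args=[4, 4])
print(result.id)  # the task id; reading result.get() additionally requires a result backend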

Celery : understanding the big picture

Celery seems to be a great tool, but I have a hard time understanding how the various Celery components work together:
The workers
The apps
The tasks
The message Broker (like RabbitMQ)
From what I understand, the command line:
celery -A not-clear-what-this-option-is worker
should run some sort of celery "worker server" which would itself need to connect to a broker server (I'm not so sure why so many servers are needed).
Then in any python code, some task may be sent to the worker by instantiating an app:
app = Celery('my_module', broker='pyamqp://guest@localhost//')
and then by decorating functions with this app in the following way:
@app.task
def my_func():
    ...
so that "my_func()" can now be called as "my_func.delay()" to be ran in an asynchronuous way.
Here are my questions:
What happens when my_func.delay() is called? Which server talks to which first, and what is sent where?
What is the option to put behind the "-A" of the celery command? Is it really needed?
Suppose I have a process X which instantiates a Celery app to launch the task A, and suppose I have another process Y which wants to know the status of task A launched by X. I assume there is a way for Y to do so, but I don't know how. I suppose that Y should create its own instance of a Celery app. But then:
What function should Y call on its Celery app to get this information (and what is the "identifier" of task A inside the process Y)?
How does this work in terms of communication? That is, when does a request go through the broker, and when does it go to the worker(s)?
If anyone has some information about these questions, I would be grateful. I intend to use Celery in a Django project, where some requests to the server can trigger various time-consuming tasks and/or inquire about the status of previously launched tasks (pending, finished, error, etc.).
About the broker:
The main role of the broker is to mediate communication between the client and the worker
basically a lot of information is being generated and processed while your worker is running
taking care of this information is the broker's role
e.g. you can configure redis so that no information is lost if the server is shut down while running a process
The worker:
you can think of the worker as an instance independent of your application, which will only execute those tasks that you delegate to it
About the state of a task:
there are ways to consult celery to find out the status of a task (see the sketch just below), but I would not recommend building your application logic around this
if you want to take the output of one task and turn it into the input of another, I would recommend using a queue
run task A and, before it finishes, insert its result objects into the queue
task B listens to the queue and processes whatever comes up
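As a sketch of the first point (and of the question about process Y checking on a task launched by process X): assuming a result backend is configured and the task id has been shared between the two processes (for example stored in the database by X), Y could do something like the following; the broker URL and the rpc:// backend are placeholders:

from celery import Celery
from celery.result import AsyncResult

# Y builds its own app instance with the same broker and result backend as X and the workers.
app = Celery('my_module',
             broker='pyamqp://guest@localhost//',
             backend='rpc://')  # some result backend is required to read task state

task_id = '...'  # the id returned by my_func.delay() in X, shared out of band
result = AsyncResult(task_id, app=app)
print(result.state)   # PENDING, STARTED, SUCCESS, FAILURE, ...
print(result.result)  # the return value, once the task has finished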
The command:
on the terminal you can see in more detail what each argument means by running celery -h or celery --help
but basically the -A argument specifies which Celery application instance you intend to run; it tells the command where to find the app you have configured (e.g. -A proj makes celery look for the app in the proj module)
usage: celery [-h] [-A APP] [-b BROKER] [--result-backend RESULT_BACKEND]
[--loader LOADER] [--config CONFIG] [--workdir WORKDIR]
[--no-color] [--quiet]
I hope this can provide an initial overview for those who get here
Celery is used to run functions in the background. Imagine you have a web API that does a job and returns a response, and that job would seriously affect the response time of the API. So you hand that particular job over to Celery, and your API responds instantly (a small sketch of this pattern follows the list below). Examples of jobs that affect the performance of an API are:
Routing to email servers
Routing to SMS Gateways
Database backup
Chained database operations
File conversion
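Here is the pattern as a short sketch; the task name send_welcome_email and the signup view are made-up illustrations rather than something from the questions, and the broker URL is a placeholder:

from celery import Celery

app = Celery('api_project', broker='pyamqp://guest@localhost//')

@app.task
def send_welcome_email(user_email):
    ...  # imagine slow SMTP work here; the web request does not wait for it

# In the API view: enqueue the job and return immediately.
def signup_view(user_email):
    send_welcome_email.delay(user_email)
    return {"status": "ok"}  # responds instantly while a worker sends the email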
Now, let's cover each component of Celery.
The workers
Celery workers execute the job (function). They work asynchronously, outside your request/response cycle. By default a worker starts one child process per CPU core (tunable with the --concurrency option), and you can assign a name and specific queues to a worker.
The apps
The app is the name of the project you're working on; you have to specify that name when creating the Celery instance.
The tasks
The functions you need to be executed in the background. Every task Celery executes has a task id, a state (and more). You can get these by inspecting a particular task.
The message Broker
The tasks to be executed in the background have to be moved from your Python project to the Celery workers; message brokers act as the medium here. A function together with its arguments is serialized into a message and sent to the broker, and workers fetch messages from the broker and execute them.
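A minimal sketch of that wiring, with placeholder Redis URLs (any supported broker, such as RabbitMQ or Redis, works the same way):

from celery import Celery

app = Celery(
    'project_name',
    broker='redis://localhost:6379/0',   # where .delay() calls publish task messages
    backend='redis://localhost:6379/1',  # optional: stores task state/results for inspection
)

@app.task
def slow_job(payload):
    ...  # a worker fetches the message from the broker and runs this function

# slow_job.delay({"id": 1}) serializes the task name and arguments into a message on the broker.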
Some commands
celery -A project_name worker -n worker_name
celery -A project_name inspect active
More in documentation
docs.celeryproject.org

Setup celery periodic task

How do I set up a periodic task with Celerybeat and Flask that queries a database every hour?
The environment looks like this:
/
|-app
|  |-__init__.py
|  |-jobs
|  |  |-task.py
|-celery-beat.sh
|-celery-worker.sh
|-manage.py
I currently have a query function called run_query() located in task.py
I want the scheduler to kick in once the application initiates, so I have the following lines in my /app/__init__.py file:
celery = Celery()

@celery.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(1, app.jobs.task.run_query())
(For simplicity's sake, I've set it up so that if it runs, it will run every minute. No such luck yet.)
When I launch the celery-worker.sh it recognizes my function under the [tasks] heading. But the scheduled function never runs. I can manually force the function to run by issuing the following at the command prompt:
>>> from app.jobs import task
>>> task.run_query.delay()
EDIT: Added celerybeat.sh
As a follow-up: if the database is accessed through a Flask context, is it wise to create a new Flask context inside my async task to access the database? Should I use the existing Flask context? Or forget contexts altogether and just open a connection to the database directly? My worry is that if I just open a new connection, it may interfere with the existing context's connection.
To run periodic tasks you need some kind of scheduler (e.g. celery beat).
celery beat is a scheduler; It kicks off tasks at regular intervals, that are then executed by available worker nodes in the
cluster.
You have to ensure only a single scheduler is running for a schedule
at a time, otherwise you’d end up with duplicate tasks. Using a
centralized approach means the schedule doesn’t have to be
synchronized, and the service can operate without using locks.
Reference: periodic-tasks
You can invoke the scheduler with this command:
$ celery -A proj beat #different process from your worker
You can also embed beat inside the worker by enabling the worker's -B
option. This is convenient if you'll never run more than one worker
node, but it's not commonly used and for that reason isn't recommended
for production use. Starting the scheduler:
$ celery -A proj worker -B
Reference: celery starting scheduler
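Tying this back to the question, a sketch of the periodic-task registration; note that add_periodic_task expects a task signature such as run_query.s(), not the return value of calling run_query(). The import path follows the question's layout and the broker URL is a placeholder:

from celery import Celery

from app.jobs.task import run_query  # the task from the question's app/jobs/task.py

celery = Celery('app', broker='pyamqp://guest@localhost//')

@celery.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Pass a signature (.s()); run_query() would call the task at registration time instead.
    sender.add_periodic_task(3600.0, run_query.s(), name='query the database every hour')

Then run both processes, e.g. celery -A app worker and celery -A app beat (or a single celery -A app worker -B during development).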

How to determine which queues a Celery worker is consuming at runtime?

As part of a sanity check, I want to write some code to make sure the worker has started with a correct set of queues based on the settings given.
Celery is run like so:
celery worker -A my_app -l INFO -Q awesome_mode
I would like to work out, after the app is initialised, which queues Celery is consuming.
e.g., I made up app.queues:
app = Celery('my_app')
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
if 'awesome_mode' in app.queues:
    ...
After a bit of interactive debugging I found app.amqp.queues which is a dictionary where the key is the name of the queue and the value is a Queue.
Unfortunately the dictionary is not populated immediately after initialisation, but it is by the time the worker_ready signal fires.
Placing this code after app initialisation seems to work. It could probably be placed elsewhere of course.
from celery.signals import worker_ready

@worker_ready.connect
def worker_ready_handler(sender=None, **kwargs):
    print(app.amqp.queues.keys())
The worker logs:
[2015-04-22 07:41:01,147: WARNING/MainProcess] ['celery', 'awesome_mode']
[2015-04-22 07:41:01,148: WARNING/MainProcess] celery#zaptop ready.

Would starting APScheduler in a uwsgi app end up with one scheduler for each worker?

I have a flask application in which I need the scheduling feature of APScheduler. The question is:
Where do I start the scheduler instance?
I use uwsgi+nginx to serve this application with multiple workers; wouldn't I end up with multiple Scheduler instances that are oblivious to each other? If so, a single job would be triggered multiple times, wouldn't it?
What is the best strategy in this case so I end up with just one Scheduler instance and still be able to access the application's context from within the scheduled jobs?
This question has the same problem albeit with gunicorn instead of uwsgi, but the answer could be similar.
Below is the code defining "app" as a uwsgi callable application object.
The file containing this code is called wsgi.py (not that it matters).
app = create_app(config=ProductionConfig())

def job_listener(event):
    get_ = "msg from job '%s'" % (event.job)
    logging.info(get_)

# This code below never gets invoked when I check with worker_id() == 1
# The only time it is run is with worker_id() value of 0
app.sched = Scheduler()
app.sched.add_jobstore(ShelveJobStore('/tmp/apsched_%d' % uwsgi.worker_id()), 'file')
app.sched.add_listener(job_listener,
                       events.EVENT_JOB_EXECUTED |
                       events.EVENT_JOB_MISSED |
                       events.EVENT_JOB_ERROR)
app.sched.start()
uWSGI has a feature called mules (see http://uwsgi-docs.readthedocs.org/en/latest/Mules.html): you can use them to start a script under the master which is not accessible via a socket. Mules are designed to offload work from the main app, for example for schedulers and signal handling, so they seem perfect for running a scheduler inside the uwsgi stack.
uWSGI has a function uwsgi.worker_id(). If you start the scheduler conditionally in one specific worker, you won't end up with multiple scheduler instances.
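A minimal sketch of that approach. Because the question observes that module-level code runs with worker_id() == 0 (i.e. in the master, before forking), this sketch uses the postfork hook from uwsgidecorators so the check happens inside each worker; postfork is my addition rather than part of the original answer, and create_app, ProductionConfig and the APScheduler 2.x Scheduler are taken from the question's code:

import uwsgi  # only importable when running under uWSGI
from uwsgidecorators import postfork
from apscheduler.scheduler import Scheduler  # APScheduler 2.x, as used in the question

app = create_app(config=ProductionConfig())  # as in the question's wsgi.py

@postfork
def start_scheduler_in_one_worker():
    # Runs in every worker right after fork; only worker 1 starts a scheduler,
    # so each job is scheduled exactly once across all uWSGI workers.
    if uwsgi.worker_id() == 1:
        app.sched = Scheduler()
        app.sched.start()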
