Logging with python/celery/rabbitMQ

Logging with python/celery/rabbitMQ - python

I'm looking for the best way to keep track of my workers and queues and I'm looking into logging.
I've seen examples in the celery documentation that suggests setting up logging as follows:
from celery.utils.log import get_task_logger
logger = get_task_logger(__name__)
#app.task
def add(x, y):
logger.info('Adding {0} + {1}'.format(x, y))
return x + y
Where does the logging file go? Also what information is stored in the log file? Is it just information that is contained in the logger.info function?
Does the logfile store the results returned by the workers, or is that separate?

Where does the logging file go?
As I can see you don't have any FileHandlers. It means logger write messages to console.
Let's check it. Here example of tasks.py:
# celery 4.0.2
#celery.task(name='add')
def add(x, y):
logger.info('Adding {0} + {1}'.format(x, y))
return x + y
app = celery.Celery(
__name__,
broker='redis://localhost:6379/0',
backend='redis://localhost:6379/0',
)
app.conf.beat_schedule = {
# run task each 2 seconds
'add-every-2-seconds': {
'task': 'add',
'schedule': 2.0,
'args': (1, 2)
},
}
Run Celery(celery worker -A tasks.app --loglevel=info --beat) and check console. You will see something like that:
[2017-04-08 18:18:55,924: INFO/Beat] Scheduler: Sending due task add-every-2-seconds (add)
[2017-04-08 18:18:55,930: INFO/MainProcess] Received task: add[44a6877c-84a2-4a26-815e-1f637fdf9c0c]
[2017-04-08 18:18:55,932: INFO/PoolWorker-2] add[44a6877c-84a2-4a26-815e-1f637fdf9c0c]: Adding 1 + 2
[2017-04-08 18:18:55,934: INFO/PoolWorker-2] Task add[44a6877c-84a2-4a26-815e-1f637fdf9c0c] succeeded in 0.00191404699945s: 3
[2017-04-08 18:18:57,924: INFO/Beat] Scheduler: Sending due task add-every-2-seconds (add)
[2017-04-08 18:18:57,928: INFO/MainProcess] Received task: add[c386d360-57d3-4352-8a89-f86bb2376e4e]
[2017-04-08 18:18:57,930: INFO/PoolWorker-3] add[c386d360-57d3-4352-8a89-f86bb2376e4e]: Adding 1 + 2
[2017-04-08 18:18:57,931: INFO/PoolWorker-3] Task add[c386d360-57d3-4352-8a89-f86bb2376e4e] succeeded in 0.00146738500007s: 3
It means logger works good and write our messages. Now let's try to add FileHandler for our tasks:
logger = get_task_logger(__name__)
task_handler = FileHandler('task.log')
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
task_handler.setFormatter(formatter)
logger.addHandler(task_handler)
Run Celery and check folder where stored tasks.py. You should see new file(tasks.log). Example of content:
2017-04-08 18:35:02,052 - tasks - INFO - Adding 1 + 2
...
Does the logfile store the results returned by the workers?
By default information just print to console. But you can register specific loggers, handlers and customize behavior using signals, custom Task/Loader class.
Also you can set -f LOGFILE, --logfile=LOGFILE argument when run Celery.
Hope this helps.

Related

prevent duplicate celery logging

How do I prevent duplicate celery logs in an application like this?
# test.py
from celery import Celery
import logging
app = Celery('tasks', broker='redis://localhost:6379/0')
app.logger = logging.getLogger("new_logger")
file_handler = logging.handlers.RotatingFileHandler("app.log", maxBytes=1024*1024, backupCount=1)
file_handler.setFormatter(logging.Formatter('custom_format %(message)s'))
app.logger.addHandler(file_handler)
#app.task
def foo(x, y):
app.logger.info("log info from foo")
I start the application with: celery -A test worker --loglevel=info --logfile celery.log
Then I cause foo to be run with python -c "from test import foo; print foo.delay(4, 4)"
This results in the "log info from foo" being displayed in both celery.log and app.log.
Here is app.log contents:
custom_format log info from foo
And here is celery.log contents:
[2017-07-26 21:17:24,962: INFO/MainProcess] Connected to redis://localhost:6379/0
[2017-07-26 21:17:24,967: INFO/MainProcess] mingle: searching for neighbors
[2017-07-26 21:17:25,979: INFO/MainProcess] mingle: all alone
[2017-07-26 21:17:25,991: INFO/MainProcess] celery#jd-t430 ready.
[2017-07-26 21:17:38,224: INFO/MainProcess] Received task: test.foo[e2c5e6aa-0d2d-4a16-978c-388a5e3cf162]
[2017-07-26 21:17:38,225: INFO/ForkPoolWorker-4] log info from foo
[2017-07-26 21:17:38,226: INFO/ForkPoolWorker-4] Task test.foo[e2c5e6aa-0d2d-4a16-978c-388a5e3cf162] succeeded in 0.000783085000876s: None
I considered removing the custom logger handler from the python code, but I don't want to just use celery.log because it doesn't support rotating files. I considered starting celery with --logfile /dev/null but then I would loose the mingle and other logs that don't show up in app.log.
Can I prevent "log info from foo" from showing up in celery.log? Given that I created the logger from scratch and only setup logging to app.log why is "log info from foo" showing up in celery.log anyway?
Is it possible to get the celery MainProcess and Worker logs (e.g. Connected to redis://localhost:6379/0) to be logged by a RotatingFileHandler (e.g. go in my app.log)?

Why is "log info from foo" showing up in celery.log?
The logging system is basically a tree of logging.Logger objects with main logging.Logger in the root of the tree (you get the root with call logging.getLogger() without parameters).
When you call logging.getLogger("child") you get reference to the logging.Logger processing the "child" logs. The problem is when you call logging.getLogger("child").info() the info message is delivered to the "child" but also to the parent of the "child" and to its parent until it arrives to the root.
To avoid sending logs to the parent you have to setup the logging.getLogger("child").propagate = False.

Setting up a result backend (rpc) with Celery in Django

I am attempting to get a result backend working on my local machine for a project I'm working on but I am running into an issue.
Currently I am trying to create a queue system in order for my lab to create cases. This is to prevent duplicate sequence numbers from being used. I am already using Celery for our printing so I figured I would create a new Celery queue and use that to handle the case. The front-end also needs to get the results of the case creations to display the case number that was created.
http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html#rabbitmq
I was following the above tutorial on getting my Celery configured. Below is the source:
celeryconfig.py:
from kombu import Queue
CELERY_DEFAULT_QUEUE = 'celery'
CELERY_DEFAULT_EXCHANGE = 'celery'
CELERY_DEFAULT_EXCHANGE_TYPE = 'direct'
CELERY_RESULT_BACKEND = 'rpc://'
CELERY_RESULT_PERSISTENT = False
CELERY_QUEUES = (
Queue('celery', routing_key="celery"),
Queue('case_creation', routing_key='create.#')
)
CELERY_ROUTES = {
'case.tasks.create_case': {
'queue': 'case_creation',
'routing_key': 'create.1'
},
'print.tasks.connect_and_serve': {
'queue': 'celery',
'routing_key': 'celery'
}
}
celery.py:
import os
from celery import Celery
from django.conf import settings
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings.local')
app = Celery('proj', broker='amqp://guest#localhost//')
app.config_from_object('proj.celeryconfig')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
tasks.py:
import celery
from django.db import IntegrityError
from case.case_create import CaseCreate
#celery.task(bind=True)
def create_case(self, data, user, ip):
try:
acc = CaseCreate(data, user, ip)
return acc.begin()
except IntegrityError as e:
self.retry(exc=e, countdown=2)
Here is my view that calls the above task:
#require_authentication()
#requires_api_signature()
#csrf_exempt
#require_http_methods(['POST'])
def api_create_case(request):
result = create_case.delay(json.loads(request.body.decode('utf-8')), request.user, get_ip_address(request))
print(str(result)) # Prints the Task ID
print(str(result.get(timeout=1))) # Throws error
return HttpResponse(json.dumps({'result': str(result)}), status=200)
I start my celery queue with the following command:
celery -A proj worker -Q case_creation -n case_worker -c 1
When I run the celery worker I do see results show up under config:
-------------- celery#case_worker v3.1.16 (Cipater)
---- **** -----
--- * *** * -- Windows-8-6.2.9200
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: proj:0x32a2990
- ** ---------- .> transport: amqp://guest:**#localhost:5672//
- ** ---------- .> results: rpc://
- *** --- * --- .> concurrency: 1 (prefork)
-- ******* ----
--- ***** ----- [queues]
-------------- .> case_creation exchange=celery(direct) key=create.#
When I run the program and submit a new case this is the error message that I get:
No result backend configured. Please see the documentation for more information.
I have attempted every single thing I can find online. Is there anyone out there that can point me in the right direction? I'm so very close and so very tired of looking at this code.

If you want to keep your result, try this Keeping Results
app = Celery('proj', backend='amqp', broker='amqp://guest#localhost//')
EDIT
Make sure the client is configured with the right backend.
If for some reason the client is configured to use a different backend than the worker, you will not be able to receive the result, so make sure the backend is correct by inspecting it:
Try this to see the output:
>>> result = task.delay(…)
>>> print(result.backend)
other solutions will be instead of
app = Celery('proj',
backend='amqp',
broker='amqp://',
include=['proj.tasks'])
Try:
app = Celery('proj',
broker='amqp://',
include=['proj.tasks'])
app.conf.update(
CELERY_RESULT_BACKEND='amqp'
)

How do I setup logging when using aiohttp and aiopg with Gunicorn?

aiohttp is great, but setting up logging has been a nightmare, both locally and in production, when using Gunicorn.
Most of the examples and documentation I find for setting up logging are for running in native server mode, where you use make_handler()
As recommended in the documentation, I'm using Gunicorn as a Web Server to deploy, so I don't call make_handler explicitly.
I am not seeing aiohttp.access logs, nor the aiohttp.server logs, nor the aiopg logs, all of which should be set up by default
This is what I've got in a root level app.py:
import logging
import aiopg
from aiohttp import web
async def some_handler(request):
id = request.match_info["id"]
# perform some SA query
return web.json_response({"foo": id})
async def close_postgres(app):
app['postgres'].close()
await app['postgres'].wait_closed
async def init(loop, logger, config):
app = web.Application(
loop=loop,
logger=logger
)
app['postgres'] = await aiopg.sa.create_engine(loop=loop, echo=True) # other args ommitted
app.on_cleanup.append(close_postgres)
app.router.add_route('GET', '/', some_handler, 'name')
return app
def run():
config = parse_yaml('config.yml') # => turns config.yml to dict
logging.config.dictConfig(config['logging'])
logger = logging.getLogger("api")
loop = asyncio.get_event_loop()
app = run_until_complete(init(loop, logger, config))
return app
My config.yml file
logging:
version: 1
formatters:
simple:
format: '[%(asctime)s] [%(process)d] [%(levelname)s] %(message)s'
datefmt: '%Y-%m-%d %H:%M:%S %z'
handlers:
console:
class: logging.StreamHandler
formatter: simple
level: DEBUG
stream: ext://sys.stdout
loggers:
api:
handlers:
- console
level: DEBUG
I launch gunicorn with the following:
gunicorn 'app:run()' --worker-class aiohttp.worker.GunicornWebWorker
I only see the following logs no matter what query I make:
[2016-08-22 11:26:46 -0400] [41993] [INFO] Starting gunicorn 19.6.0
[2016-08-22 11:26:46 -0400] [41993] [INFO] Listening at: http://127.0.0.1:8000 (41993)
[2016-08-22 11:26:46 -0400] [41993] [INFO] Using worker: aiohttp.worker.GunicornWebWorker
[2016-08-22 11:26:46 -0400] [41996] [INFO] Booting worker with pid: 41996
What I want:
aiopg logs (which queries ran)
access logs
server logs
Thanks

Documentation don't ultimately recommend to use Gunicorn for deployment but have instructions for running under Gunicorn.
Perhaps it should be upgraded to passing correct format for access logger.
From my perspective the easiest way to run aiohttp server is just running it (by using web.run_app() handler or building own runner on top of it).
If you need several aiohttp instances -- use nginx in reverse proxy mode (most likely you already have it in your tool chain) and supervisord for controlling servers.
The combination just works without the need for intermediate layer. Just like people starts tornado or twisted.

Can't see my celery logs when running beat

I am starting celery via supervisord, see the entry below.
[program:celery]
user = foobar
autostart = true
autorestart = true
directory = /opt/src/slicephone/cloud
command = /opt/virtenvs/django_slice/bin/celery beat --app=cloud -l DEBUG -s /home/foobar/run/celerybeat-schedule --pidfile=/home/foobar/run/celerybeat.pid
priority = 100
stdout_logfile_backups = 0
stderr_logfile_backups = 0
stdout_logfile_maxbytes = 10MB
stderr_logfile_maxbytes = 10MB
stdout_logfile = /opt/logs/celery.stdout.log
stderr_logfile = /opt/logs/celery.stderr.log
pip freeze | grep celery
celery==3.1.0
But any usage of:
#celery.task
def test_rabbit_running():
import logging
from celery.utils.log import get_task_logger
logger = get_task_logger(__name__)
logger.setLevel(logging.DEBUG)
logger.info("foobar")
doesn't show up in the logs. Instead I get entries like the following.
celery.stdout.log
celery beat v3.1.0 (Cipater) is starting.
__ - ... __ - _
Configuration ->
. broker -> redis://localhost:6379//
. loader -> celery.loaders.app.AppLoader
. scheduler -> celery.beat.PersistentScheduler
. db -> /home/foobar/run/celerybeat-schedule
. logfile -> [stderr]#%DEBUG
. maxinterval -> now (0s)
celery.stderr.log
[2013-11-12 05:42:39,539: DEBUG/MainProcess] beat: Waking up in 2.00 seconds.
INFO Scheduler: Sending due task test_rabbit_running (retail.tasks.test_rabbit_running)
[2013-11-12 05:42:41,547: INFO/MainProcess] Scheduler: Sending due task test_rabbit_running (retail.tasks.test_rabbit_running)
DEBUG retail.tasks.test_rabbit_running sent. id->34268340-6ffd-44d0-8e61-475a83ab3481
[2013-11-12 05:42:41,550: DEBUG/MainProcess] retail.tasks.test_rabbit_running sent. id->34268340-6ffd-44d0-8e61-475a83ab3481
DEBUG beat: Waking up in 6.00 seconds.
What do I have to do to make my logging calls appear in the log files?

It doesn't log anything because it doesn't execute any tasks (and it's ok).
See also Celerybeat not executing periodic tasks

I'd try to put the call to log inside a task as the name of the util function implies get_task_logger, or just start with a simple print, or have your own log set up as suggested in Django Celery Logging Best Practice (best way to go IMO)

How to dynamically add / remove periodic tasks to Celery (celerybeat)

If I have a function defined as follows:
def add(x,y):
return x+y
Is there a way to dynamically add this function as a celery PeriodicTask and kick it off at runtime? I'd like to be able to do something like (pseudocode):
some_unique_task_id = celery.beat.schedule_task(add, run_every=crontab(minute="*/30"))
celery.beat.start(some_unique_task_id)
I would also want to stop or remove that task dynamically with something like (pseudocode):
celery.beat.remove_task(some_unique_task_id)
or
celery.beat.stop(some_unique_task_id)
FYI I am not using djcelery, which lets you manage periodic tasks via the django admin.

This question was answered on google groups.
I AM NOT THE AUTHOR, all credit goes to Jean Mark
Here's a proper solution for this. Confirmed working, In my scenario,
I sub-classed Periodic Task and created a model out of it since I can
add other fields to the model as I need and also so I could add the
"terminate" method. You have to set the periodic task's enabled
property to False and save it before you delete it. The whole
subclassing is not a must, the schedule_every method is the one that
really does the work. When you're ready to terminate you task (if you
didn't subclass it) you can just use
PeriodicTask.objects.filter(name=...) to search for your task, disable
it, then delete it.
Hope this helps!
from djcelery.models import PeriodicTask, IntervalSchedule
from datetime import datetime
class TaskScheduler(models.Model):
periodic_task = models.ForeignKey(PeriodicTask)
#staticmethod
def schedule_every(task_name, period, every, args=None, kwargs=None):
""" schedules a task by name every "every" "period". So an example call would be:
TaskScheduler('mycustomtask', 'seconds', 30, [1,2,3])
that would schedule your custom task to run every 30 seconds with the arguments 1,2 and 3 passed to the actual task.
"""
permissible_periods = ['days', 'hours', 'minutes', 'seconds']
if period not in permissible_periods:
raise Exception('Invalid period specified')
# create the periodic task and the interval
ptask_name = "%s_%s" % (task_name, datetime.datetime.now()) # create some name for the period task
interval_schedules = IntervalSchedule.objects.filter(period=period, every=every)
if interval_schedules: # just check if interval schedules exist like that already and reuse em
interval_schedule = interval_schedules[0]
else: # create a brand new interval schedule
interval_schedule = IntervalSchedule()
interval_schedule.every = every # should check to make sure this is a positive int
interval_schedule.period = period
interval_schedule.save()
ptask = PeriodicTask(name=ptask_name, task=task_name, interval=interval_schedule)
if args:
ptask.args = args
if kwargs:
ptask.kwargs = kwargs
ptask.save()
return TaskScheduler.objects.create(periodic_task=ptask)
def stop(self):
"""pauses the task"""
ptask = self.periodic_task
ptask.enabled = False
ptask.save()
def start(self):
"""starts the task"""
ptask = self.periodic_task
ptask.enabled = True
ptask.save()
def terminate(self):
self.stop()
ptask = self.periodic_task
self.delete()
ptask.delete()

This was finally made possible by a fix included in celery v4.1.0. Now, you just need to change the schedule entries in the database backend, and celery-beat will act according to the new schedule.
The docs vaguely describe how this works. The default scheduler for celery-beat, PersistentScheduler, uses a shelve file as its schedule database. Any changes to the beat_schedule dictionary in the PersistentScheduler instance are synced with this database (by default, every 3 minutes), and vice-versa. The docs describe how to add new entries to the beat_schedule using app.add_periodic_task. To modify an existing entry, just add a new entry with the same name. Delete an entry as you would from a dictionary: del app.conf.beat_schedule['name'].
Suppose you want to monitor and modify your celery beat schedule using an external app. Then you have several options:
You can open the shelve database file and read its contents like a dictionary. Write back to this file for modifications.
You can run another instance of the Celery app, and use that one to modify the shelve file as described above.
You can use the custom scheduler class from django-celery-beat to store the schedule in a django-managed database, and access the entries there.
You can use the scheduler from celerybeat-mongo to store the schedule in a MongoDB backend, and access the entries there.

No, I'm sorry, this is not possible with the regular celerybeat.
But it's easily extensible to do what you want, e.g. the django-celery
scheduler is just a subclass reading and writing the schedule to the database
(with some optimizations on top).
Also you can use the django-celery scheduler even for non-Django projects.
Something like this:
Install django + django-celery:
$ pip install -U django django-celery
Add the following settings to your celeryconfig:
DATABASES = {
'default': {
'NAME': 'celerybeat.db',
'ENGINE': 'django.db.backends.sqlite3',
},
}
INSTALLED_APPS = ('djcelery', )
Create the database tables:
$ PYTHONPATH=. django-admin.py syncdb --settings=celeryconfig
Start celerybeat with the database scheduler:
$ PYTHONPATH=. django-admin.py celerybeat --settings=celeryconfig \
-S djcelery.schedulers.DatabaseScheduler
Also there's the djcelerymon command which can be used for non-Django projects
to start celerycam and a Django Admin webserver in the same process, you can
use that to also edit your periodic tasks in a nice web interface:
$ djcelerymon
(Note for some reason djcelerymon can't be stopped using Ctrl+C, you
have to use Ctrl+Z + kill %1)

There is a library called django-celery-beat which provides the models one needs. To make it dynamically load new periodic tasks one has to create its own Scheduler.
from django_celery_beat.schedulers import DatabaseScheduler
class AutoUpdateScheduler(DatabaseScheduler):
def tick(self, *args, **kwargs):
if self.schedule_changed():
print('resetting heap')
self.sync()
self._heap = None
new_schedule = self.all_as_schedule()
if new_schedule:
to_add = new_schedule.keys() - self.schedule.keys()
to_remove = self.schedule.keys() - new_schedule.keys()
for key in to_add:
self.schedule[key] = new_schedule[key]
for key in to_remove:
del self.schedule[key]
super(AutoUpdateScheduler, self).tick(*args, **kwargs)
#property
def schedule(self):
if not self._initial_read and not self._schedule:
self._initial_read = True
self._schedule = self.all_as_schedule()
return self._schedule

You can check out this flask-djcelery which configures flask and djcelery and also provides browseable rest api

The answer from #asksol is what's needed if in a Django application.
For non-django applications, you can use celery-sqlalchemy-scheduler which is modeled like django-celery-beat for Django since it also uses database instead of the file celerybeat-schedule.
https://pypi.org/project/celery-sqlalchemy-scheduler/
https://github.com/AngelLiang/celery-sqlalchemy-scheduler
Here is an example with runtime addition of a new task.
tasks.py
from celery import Celery
celery = Celery('tasks')
beat_dburi = 'sqlite:///schedule.db'
celery.conf.update(
{'beat_dburi': beat_dburi}
)
#celery.task
def my_task(arg1, arg2, be_careful):
print(f"{arg1} {arg2} be_careful {be_careful}")
Logs (Producer)
$ celery --app=tasks beat --scheduler=celery_sqlalchemy_scheduler.schedulers:DatabaseScheduler --loglevel=INFO
celery beat v5.1.2 (sun-harmonics) is starting.
[2021-08-20 15:20:20,927: INFO/MainProcess] beat: Starting...
Logs (Consumer)
$ celery --app=tasks worker --queues=celery --loglevel=INFO
-------------- celery#ubuntu20 v5.1.2 (sun-harmonics)
[2021-08-20 15:20:02,287: INFO/MainProcess] Connected to amqp://guest:**#127.0.0.1:5672//
Database schedules
$ sqlite3 schedule.db
sqlite> .databases
main: /home/nponcian/Documents/Program/1/db/schedule.db
sqlite> .tables
celery_crontab_schedule celery_periodic_task_changed
celery_interval_schedule celery_solar_schedule
celery_periodic_task
sqlite> select * from celery_periodic_task;
1|celery.backend_cleanup|celery.backend_cleanup||1||[]|{}|||||2021-08-20 19:20:20.955246|0||1||0|2021-08-20 07:20:20|
Now, while those workers are already running, let's update the schedules by adding a new scheduled task. Note that this is at runtime, without the need to restart the workers.
$ python3
>>> # Setup the session.
>>> from celery_sqlalchemy_scheduler.models import PeriodicTask, IntervalSchedule
>>> from celery_sqlalchemy_scheduler.session import SessionManager
>>> from tasks import beat_dburi
>>> session_manager = SessionManager()
>>> engine, Session = session_manager.create_session(beat_dburi)
>>> session = Session()
>>>
>>> # Setup the schedule (executes every 10 seconds).
>>> schedule = session.query(IntervalSchedule).filter_by(every=10, period=IntervalSchedule.SECONDS).first()
>>> if not schedule:
... schedule = IntervalSchedule(every=10, period=IntervalSchedule.SECONDS)
... session.add(schedule)
... session.commit()
...
>>>
>>> # Create the periodic task
>>> import json
>>> periodic_task = PeriodicTask(
... interval=schedule, # we created this above.
... name='My task', # simply describes this periodic task.
... task='tasks.my_task', # name of task.
... args=json.dumps(['arg1', 'arg2']),
... kwargs=json.dumps({
... 'be_careful': True,
... }),
... )
>>> session.add(periodic_task)
>>> session.commit()
Database schedules (updated)
We can now see that the newly added schedule has reflected to the database which is continuously read by the celery beat scheduler. So should there be any updates with the values of the args or kwargs, we can easily perform SQL updates on the database and it should reflect in realtime with the running workers (without the need of restart).
sqlite> select * from celery_periodic_task;
1|celery.backend_cleanup|celery.backend_cleanup||1||[]|{}|||||2021-08-20 19:20:20.955246|0||1||0|2021-08-20 07:20:20|
2|My task|tasks.my_task|1|||["arg1", "arg2"]|{"be_careful": true}||||||0||1||0|2021-08-20 07:26:49|
Logs (Producer)
Now, the new task is being enqueued every 10 seconds
[2021-08-20 15:26:51,768: INFO/MainProcess] DatabaseScheduler: Schedule changed.
[2021-08-20 15:26:51,768: INFO/MainProcess] Writing entries...
[2021-08-20 15:27:01,789: INFO/MainProcess] Scheduler: Sending due task My task (tasks.my_task)
[2021-08-20 15:27:11,776: INFO/MainProcess] Scheduler: Sending due task My task (tasks.my_task)
[2021-08-20 15:27:21,791: INFO/MainProcess] Scheduler: Sending due task My task (tasks.my_task)
Logs (Consumer)
The newly added task is correctly executed on time every 10 seconds
[2021-08-20 15:27:01,797: INFO/MainProcess] Task tasks.my_task[04dcb40c-0a77-437b-a129-57eb52850a51] received
[2021-08-20 15:27:01,798: WARNING/ForkPoolWorker-4] arg1 arg2 be_careful True
[2021-08-20 15:27:01,799: WARNING/ForkPoolWorker-4]
[2021-08-20 15:27:01,799: INFO/ForkPoolWorker-4] Task tasks.my_task[04dcb40c-0a77-437b-a129-57eb52850a51] succeeded in 0.000763321000704309s: None
[2021-08-20 15:27:11,783: INFO/MainProcess] Task tasks.my_task[e8370a6b-085f-4bd5-b7ad-8f85f4b61908] received
[2021-08-20 15:27:11,786: WARNING/ForkPoolWorker-4] arg1 arg2 be_careful True
[2021-08-20 15:27:11,786: WARNING/ForkPoolWorker-4]
[2021-08-20 15:27:11,787: INFO/ForkPoolWorker-4] Task tasks.my_task[e8370a6b-085f-4bd5-b7ad-8f85f4b61908] succeeded in 0.0006725780003762338s: None
[2021-08-20 15:27:21,797: INFO/MainProcess] Task tasks.my_task[c14d875d-7f6c-45c2-a76b-4e9483273185] received
[2021-08-20 15:27:21,799: WARNING/ForkPoolWorker-4] arg1 arg2 be_careful True
[2021-08-20 15:27:21,799: WARNING/ForkPoolWorker-4]
[2021-08-20 15:27:21,800: INFO/ForkPoolWorker-4] Task tasks.my_task[c14d875d-7f6c-45c2-a76b-4e9483273185] succeeded in 0.0006371149993356084s: None

I was looking for the same solution for Celery + Redis that can be flexible add/remove. Check out this one, redbeat, same guy from Heroku, even they put as well the Redis + Sentinel.
Hope helps :)

Celery can realize the dynamic periodic task with databases and calling itself.
But APSchedule is better.
Because dynamic periodic task always means long countdown or eta. Too many of these periodic tasks can take up a lot of memory, making it time-consuming to restart and execute non-delayed tasks.
tasks.py
import sqlite3
from celery import Celery
from celery.utils.log import get_task_logger
logger = get_task_logger(__name__)
app = Celery(
'tasks',
broker='redis://localhost:6379/0',
backend='redis://localhost:6379/1',
imports=['tasks'],
)
conn = sqlite3.connect('database.db', check_same_thread=False)
c = conn.cursor()
sql = '''
CREATE TABLE IF NOT EXISTS `tasks`
(
`id` INTEGER UNIQUE PRIMARY KEY AUTOINCREMENT,
`name` TEXT,
`countdown` INTEGER
);
'''
c.execute(sql)
def create(name='job', countdown=5):
sql = 'INSERT INTO `tasks` (`name`, `countdown`) VALUES (?, ?)'
c.execute(sql, (name, countdown))
conn.commit()
return c.lastrowid
def read(id=None, verbose=False):
sql = 'SELECT * FROM `tasks` '
if id:
sql = 'SELECT * FROM `tasks` WHERE `id`={}'.format(id)
all_rows = c.execute(sql).fetchall()
if verbose:
print(all_rows)
return all_rows
def update(id, countdown):
sql = 'UPDATE `tasks` SET `countdown`=? WHERE `id`=?'
c.execute(sql, (countdown, id))
conn.commit()
def delete(id, verbose=False):
sql = 'DELETE FROM `tasks` WHERE `id`=?'
affected_rows = c.execute(sql, (id,)).rowcount
if verbose:
print('deleted {} rows'.format(affected_rows))
conn.commit()
#app.task
def job(id):
id = read(id)
if id:
id, name, countdown = id[0]
else:
logger.info('stop')
return
logger.warning('id={}'.format(id))
logger.warning('name={}'.format(name))
logger.warning('countdown={}'.format(countdown))
job.apply_async(args=(id,), countdown=countdown)
main.py
from tasks import *
id = create(name='job', countdown=5)
job(id)
# job.apply_async((id,), countdown=5) # wait 5s
print(read())
input('enter to update')
update(id, countdown=1)
input('enter to delete')
delete(id, verbose=True)

Some time ago I needed to dynamically update periodic tasks in Celery and Django, and I wrote an article about my approach (code for article).
I was using django-celery-beat package. It provides database models for PeriodicTask and IntervalSchedule. By manipulating PeriodicTask objects, you can add/remove/update/pause periodic tasks in Celery.
Create periodic task
from django_celery_beat.models import IntervalSchedule, PeriodicTask
schedule, created = IntervalSchedule.objects.get_or_create(
every=instance.interval,
period=IntervalSchedule.SECONDS,
)
task = PeriodicTask.objects.create(
interval=schedule,
name=f"Monitor: {instance.endpoint}",
task="monitors.tasks.task_monitor",
kwargs=json.dumps(
{
"monitor_id": instance.id,
}
),
)
Remove periodic task
PeriodicTask.objects.get(pk=task_id).delete()
Change interval in a periodic task
task = PeriodicTask.objects.get(pk=your_id)
schedule, created = IntervalSchedule.objects.get_or_create(
every=new_interval,
period=IntervalSchedule.SECONDS,
)
task.interval = schedule
task.save()
Pause periodic task
task = PeriodicTask.objects.get(pk=your_id)
task.enabled = false
task.save()
Beat service
When using django-celery-beat you need to pass scheduler argument when starting beat service:
celery -A backend beat -l INFO --scheduler django_celery_beat.schedulers:DatabaseScheduler --max-interval 10

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Logging with python/celery/rabbitMQ - python

Related

prevent duplicate celery logging

Setting up a result backend (rpc) with Celery in Django

How do I setup logging when using aiohttp and aiopg with Gunicorn?

Can't see my celery logs when running beat

How to dynamically add / remove periodic tasks to Celery (celerybeat)

Categories

Resources