Django+Celery integration with periodic tasks - python

I am a bit confused about how to configure Django + Celery.
I have followed what is reported in this guide.
Here is the configuration I have to write:
# Celery settings (broker, queues, beat schedule, routes)
from datetime import timedelta

from kombu import Queue

BROKER_URL = 'amqp://...'

queue_arguments = {'x-max-length': 1}

CELERY_QUEUES = (
    Queue('queue1', routing_key='queue1', queue_arguments=queue_arguments),
    Queue('queue2', routing_key='queue2', queue_arguments=queue_arguments),
)

CELERYBEAT_SCHEDULE = {
    'task1': {
        'task': 'MyProject.tasks.this_is_task_1',
        'schedule': timedelta(seconds=1)
    },
    'task2': {
        'task': 'MyProject.tasks.this_is_task_2',
        'schedule': timedelta(seconds=1)
    }
}

CELERY_ROUTES = {
    'MyProject.tasks.this_is_task_1': {
        'queue': 'queue1',
        'routing_key': 'queue1',
    },
    'MyProject.tasks.this_is_task_2': {
        'queue': 'queue2',
        'routing_key': 'queue2',
    }
}

# 'app' variable creation and initialization
import os

from celery import Celery
from django.conf import settings

app = Celery('MyProject')
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'MyProject.settings')
app = Celery('MyProject')
app.conf.update(
    CELERY_TASK_RESULT_EXPIRES=30,
    CELERY_IGNORE_RESULT=True,
)
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
Based on the guide, I should put:
The tasks in MyProject/tasks.py
The 'app' variable creation and initialization in MyProject/celery.py
The configuration variables in MyProject/settings.py
If I do so, I receive the following:
Couldn't apply scheduled task check_block_height: Queue.declare: (406) PRECONDITION_FAILED - inequivalent arg 'x-max-length' for queue 'queue2' in vhost '...': received the value '1' of type 'signedint' but current is none
[2015-11-04 00:30:12,899: DEBUG/MainProcess] beat: Waking up now.
It behaves as if the queue had already been created, but without any options.
If I keep just CELERYBEAT_SCHEDULE and CELERY_ROUTES in settings.py, everything seems to work. The truth is that the queue configuration is ignored, that is, the CELERY_QUEUES setting is not used.
Thanks

I solved the problem.
The issue was that I had already created the queues earlier, without the length limit; since the arguments of an existing queue cannot be changed by a new declaration, the x-max-length argument was rejected.
By deleting the queues and restarting, everything worked perfectly by putting:
1. all the configuration in settings.py
2. all the Celery creation in celery.py
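For reference, here is a minimal sketch (not part of the original fix; queue names taken from the configuration above) of how the stale queues could be deleted programmatically with kombu, so that Celery can re-declare them with the new x-max-length argument:
from kombu import Connection

BROKER_URL = 'amqp://...'  # same (elided) broker URL as in settings.py

with Connection(BROKER_URL) as conn:
    channel = conn.channel()
    for name in ('queue1', 'queue2'):
        # AMQP queue.delete removes the existing queue and its old arguments
        channel.queue_delete(queue=name)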

Related

How to dynamically change the schedule of celery beat?

I am using Celery 4.3.0. I am trying to update the schedule of celery beat every 5 seconds based on a schedule in a JSON file, so that when I manually edit, add or delete scheduled tasks in that JSON file, the changes are picked up by the celery beat scheduler without restarting it.
What I tried is creating a task that updates this schedule by assigning to app.conf['CELERYBEAT_SCHEDULE']. The task successfully runs every 5 seconds, but celery beat does not pick up the new schedule, even though I set beat_max_loop_interval to 1 second.
tasks.py
import json

from celery import Celery

app = Celery("tasks", backend='redis://', broker='redis://')
app.config_from_object('celeryconfig')

@app.task
def hello_world():
    return "Hello World!"

@app.task
def update_schedule():
    # Re-read the JSON file and replace the beat schedule in the app config.
    with open("path_to_scheduler.json", "r") as f:
        app.conf['CELERYBEAT_SCHEDULE'] = json.load(f)
celeryconfig.py
beat_max_loop_interval = 1  # maximum number of seconds beat can sleep between checking the schedule

beat_schedule = {
    "greet-every-10-seconds": {
        "task": "tasks.hello_world",
        "schedule": 10.0
    },
    'update_schedule': {
        'task': 'tasks.update_schedule',
        'schedule': 5.0
    },
}
schedule.json
{
    "greet-every-2-seconds": {
        "task": "tasks.hello_world",
        "schedule": 2.0
    },
    "update_schedule": {
        "task": "tasks.update_schedule",
        "schedule": 5.0
    }
}
Any help would be appreciated. If you know a better way to reload the beat schedule from a file I’m also keen to hear it.

RabbitMQ or Redis exploding Celery queues with Django 2.0

I am encountering a problem with celery and Django 2. I have two running environments:
Production: requirements.txt => No Issue
amqp==2.2.2
django==1.11.6
celery==4.1.0
django-celery-beat==1.0.1
django-celery-monitor==1.1.2
kombu==4.1.0
redis==2.10.6
Development: requirements.txt => Issue Present
amqp==2.2.2
django==2.0.3
celery==4.1.0
django-celery-beat==1.1.1
django-celery-monitor==1.1.2
kombu==4.1.0
redis==2.10.6
The production environment should be migrated to Django 2.0 as soon as possible.
However, I can't do it without fixing this issue with Celery. My development environment is there to ensure that everything runs fine before upgrading the production servers.
The question
What changed with Django 2 to make a system, which was stable with Django 1.11, unstable with exploding queue sizes, with the same effect in RabbitMQ and Redis?
If any task is not consumed, how could it be automatically deleted by Redis/RabbitMQ?
Celery is launched as follows.
The exact same commands are used for both environments.
celery beat -A application --loglevel=info --detach
celery events -A application --loglevel=info --camera=django_celery_monitor.camera.Camera --frequency=2.0 --detach
celery worker -A application -l info --events
Application Settings
Since I migrated my development environment to Django 2, my RabbitMQ or Redis queues are literally exploding in size and my database instances keep scaling up. It seems like the tasks are no longer removed from the queues.
I have to manually clean up the celery queue, which after a few days contains more than 250k tasks. It seems that the TTL is set to "-1", but I can't figure out how to set it from Django.
After a few hours, I have more than 220k tasks waiting to be processed, and growing.
I use the following settings, available in the file settings.py.
Warning: the names used might not be the correct ones for Celery; a remap is done in celery.py to correctly assign the values (see the sketch after the settings block below).
# Celery Configuration
from datetime import timedelta

from celery.schedules import crontab
from kombu import Exchange, Queue

broker_url = "broker_url"  # Redis or RabbitMQ, it doesn't change anything.
broker_use_ssl = True

accept_content = ['application/json']
worker_concurrency = 3

result_serializer = 'json'
result_expires = 7 * 24 * 30 * 30

task_serializer = 'json'
task_acks_late = True  # acknowledge the message only once the task is over
task_reject_on_worker_lost = True
task_time_limit = 90
task_soft_time_limit = 60
task_always_eager = False
task_queues = [
    Queue(
        'celery',
        Exchange('celery'),
        routing_key='celery',
        queue_arguments={
            'x-message-ttl': 60 * 1000  # 60 000 ms = 60 secs.
        }
    )
]

event_queue_expires = 60
event_queue_ttl = 5

beat_scheduler = 'django_celery_beat.schedulers:DatabaseScheduler'
beat_max_loop_interval = 10
beat_sync_every = 1

monitors_expire_success = timedelta(hours=1)
monitors_expire_error = timedelta(days=3)
monitors_expire_pending = timedelta(days=5)

beat_schedule = {
    'refresh_all_rss_subscribers_count': {
        'task': 'feedcrunch.tasks.refresh_all_rss_subscribers_count',
        'schedule': crontab(hour=0, minute=5),  # every day at midnight + 5 mins
        'options': {'expires': 20 * 60}  # 20 minutes
    },
    'clean_unnecessary_rss_visits': {
        'task': 'feedcrunch.tasks.clean_unnecessary_rss_visits',
        'schedule': crontab(hour=0, minute=20),  # every day at midnight + 20 mins
        'options': {'expires': 20 * 60}  # 20 minutes
    },
    'celery.backend_cleanup': {
        'task': 'celery.backend_cleanup',
        'schedule': crontab(minute='30'),  # every hour at minute 30
        'options': {'expires': 50 * 60}  # 50 minutes
    },
    'refresh_all_rss_feeds': {
        'task': 'feedcrunch.tasks.refresh_all_rss_feeds',
        'schedule': crontab(minute='40'),  # every hour at minute 40
        'options': {'expires': 30 * 60}  # 30 minutes
    },
}
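For completeness, a minimal sketch of what that remap in celery.py might look like; the module name 'application' is an assumption taken from the worker commands above, and this is not necessarily the poster's actual file. The names above are Celery 4 lowercase setting names, and since django.conf.settings only exposes uppercase attributes, one way to pick them up is to load the settings module directly as the Celery config object:
# celery.py -- hypothetical sketch
import os

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'application.settings')

app = Celery('application')
# Reads broker_url, task_queues, beat_schedule, ... straight from the module.
app.config_from_object('application.settings')
app.autodiscover_tasks()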
Worker logs examples
(Screenshot of the worker logs not reproduced here.) One more thought: is it normal that the "expires" and "timelimit" settings are shown as None in those logs?
I think I have found a temporary solution; it seems that the celery library has some bugs. We have to wait for the 4.2.0 release.
I recommend having a look at https://github.com/celery/celery/issues/4041.
As a temporary bug fix, I recommend using the following commit, which seems to have fixed the issue: https://github.com/celery/celery/commit/be55de6
git+https://github.com/celery/celery.git@be55de6#egg=celery

celery tasks queue not working with rabbitmq

Celery tasks are executing successfully without queues.
Setup:
BROKER_URL = "amqp://user:pass#localhost:5672/test"
# Celery Data Format
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERYD_TASK_SOFT_TIME_LIMIT = 60
CELERY_IGNORE_RESULT = True
#app.task
def test(a,b,c):
print("doing something here...")
command
celery worker -A proj -E -l INFO
With the above setup, the worker executes the task successfully.
I have now introduced queues for the celery tasks
and added the following configuration to the previous setup:
from kombu.entity import Exchange, Queue

CELERY_QUEUES = (
    Queue('high', Exchange('high'), routing_key='high'),
    Queue('normal', Exchange('normal'), routing_key='normal'),
    Queue('low', Exchange('low'), routing_key='low'),
)
CELERY_DEFAULT_QUEUE = 'normal'
CELERY_DEFAULT_EXCHANGE = 'normal'
CELERY_DEFAULT_ROUTING_KEY = 'normal'
CELERY_ROUTES = {
    'myapp.tasks.test': {'queue': 'high'},
}
command
celery worker -A proj -E -l INFO -n worker.high -Q high
call
test.delay(1, 2, 3)
When I execute the task with this queue setup, the worker does not run it. Did I miss any configuration?
Change CELERY_ROUTES to CELERY_TASK_ROUTES - the setting was renamed in version 4.
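For illustration, a minimal sketch of the renamed setting, assuming the app loads its configuration with app.config_from_object('django.conf:settings', namespace='CELERY'); with plain lowercase Celery settings the name would be task_routes instead:
# hypothetical settings snippet using the Celery 4 name of the routing setting
CELERY_TASK_ROUTES = {
    'myapp.tasks.test': {'queue': 'high'},
}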
First, make sure that the connection is established in both the RabbitMQ logs and the high worker's logs.
Then, try to change your CELERY_ROUTES to:
CELERY_ROUTES = {
    'myapp.tasks.test': {
        'exchange': 'high',
        'exchange_type': 'high',
        'routing_key': 'high'
    }
}
or call the task with the queue specified, for example:
test_task = test.signature(args=(1, 2, 3), queue='high', immutable=True)
test_task.apply_async()
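Alternatively (not part of the original answer, but a standard Celery call), the queue can be passed directly to apply_async:
test.apply_async(args=(1, 2, 3), queue='high')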

Setting up periodic tasks in Celery (celerybeat) dynamically using add_periodic_task

I'm using Celery 4.0.1 with Django 1.10 and I have trouble scheduling tasks (running a task works fine). Here is the celery configuration:
import os

from celery import Celery
from django.conf import settings
from kombu import Queue

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myapp.settings')

app = Celery('myapp')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
app.conf.BROKER_URL = 'amqp://{}:{}@{}'.format(settings.AMQP_USER, settings.AMQP_PASSWORD, settings.AMQP_HOST)
app.conf.CELERY_DEFAULT_EXCHANGE = 'myapp.celery'
app.conf.CELERY_DEFAULT_QUEUE = 'myapp.celery_default'
app.conf.CELERY_TASK_SERIALIZER = 'json'
app.conf.CELERY_ACCEPT_CONTENT = ['json']
app.conf.CELERY_IGNORE_RESULT = True
app.conf.CELERY_DISABLE_RATE_LIMITS = True
app.conf.BROKER_POOL_LIMIT = 2
app.conf.CELERY_QUEUES = (
    Queue('myapp.celery_default'),
    Queue('myapp.queue1'),
    Queue('myapp.queue2'),
    Queue('myapp.queue3'),
)
Then in tasks.py I have:
@app.task(queue='myapp.queue1')
def my_task(some_id):
    print("Doing something with", some_id)
In views.py I want to schedule this task:
def my_view(request, id):
    app.add_periodic_task(10, my_task.s(id))
Then I execute the commands:
sudo systemctl start rabbitmq.service
celery -A myapp.celery_app beat -l debug
celery worker -A myapp.celery_app
But the task is never scheduled. I don't see anything in the logs. The task is working because if in my view I do:
def my_view(request, id):
    my_task.delay(id)
The task is executed.
If, in my configuration file, I schedule the task manually like this, it works:
app.conf.CELERYBEAT_SCHEDULE = {
    'add-every-30-seconds': {
        'task': 'tasks.my_task',
        'schedule': 10.0,
        'args': (66,)
    },
}
I just can't schedule the task dynamically. Any idea?
EDIT (13/01/2018):
The latest release, 4.1.0, has addressed this subject in ticket #3958, which has been merged.
Actually, you cannot define a periodic task at the view level, because the beat schedule setting is loaded first and cannot be rescheduled at runtime:
The add_periodic_task() function will add the entry to the beat_schedule setting behind the scenes, and the same setting can also be used to set up periodic tasks manually:
app.conf.CELERYBEAT_SCHEDULE = {
    'add-every-30-seconds': {
        'task': 'tasks.my_task',
        'schedule': 10.0,
        'args': (66,)
    },
}
which means that if you want to use add_periodic_task() it should be wrapped within an on_after_configure handler at the celery app level, and any modification at runtime will not take effect:
app = Celery()

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(10, my_task.s(66))
As mentioned in the docs, the regular celerybeat scheduler simply keeps track of task execution:
The default scheduler is the celery.beat.PersistentScheduler, that simply keeps track of the last run times in a local shelve database file.
In order to be able to dynamically manage periodic tasks and reschedule celerybeat at runtime:
There’s also the django-celery-beat extension that stores the schedule in the Django database, and presents a convenient admin interface to manage periodic tasks at runtime.
The tasks will be persisted in the Django database and the schedule can be updated through the task model at the db level. Whenever you update a periodic task, a counter in the tasks table is incremented, which tells the celery beat service to reload the schedule from the database.
A possible solution for you could be as follows:
import json

from django_celery_beat.models import IntervalSchedule, PeriodicTask

schedule = IntervalSchedule.objects.create(every=10, period=IntervalSchedule.SECONDS)
task = PeriodicTask.objects.create(interval=schedule, name='any name', task='tasks.my_task', args=json.dumps([66]))
views.py
def update_task_view(request, id):
    task = PeriodicTask.objects.get(name="task name")  # if we suppose names are unique
    task.args = json.dumps([id])
    task.save()
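One more note (not from the original answer): for the PeriodicTask rows to drive the schedule, the beat service has to be started with the database scheduler, for example:
celery -A myapp.celery_app beat -l debug --scheduler django_celery_beat.schedulers:DatabaseScheduler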

Python + Celery manual routing

I've been working on getting manual routing set up with Celery, but can't seem to get specific tasks into specific queues. Here's what I've got going on so far pretty much:
CELERY_QUEUES = {
    "default": {"binding_key": "default"},
    "medium": {"binding_key": "medium"},
    "heavy": {"binding_key": "heavy"},
}
with the routes defined like
CELERY_ROUTES = ({
    "tasks.some_heavy_task": {
        "queue": "heavy",
        "routing_key": "tasks.heavy"
    }
},)
and the daemons started like
celeryd -l INFO -c 3 -Q heavy
The "some_heavy_task"'s never get run though. When I remove the routing and just have a default queue I can get them to run. What am I doing wrong here, any suggestions?
I created a separate celeryconfig file for each task, so every task is stored in its own dedicated queue.
Here is an example:
from datetime import timedelta

CELERY_IMPORTS = ('cleaner_on_celery.tasks',)

CELERYBEAT_SCHEDULE = {
    'cleaner': {
        "task": "cleaner_on_celery.tasks.cleaner",
        "schedule": timedelta(seconds=CLEANER_TIMEOUT),  # CLEANER_TIMEOUT is defined elsewhere in the project
    },
}

CELERY_QUEUES = {
    "cleaner": {"exchange": "cleaner", "binding_key": "cleaner"}
}
CELERY_DEFAULT_QUEUE = "cleaner"

from celeryconfig import *
You can see at the bottom that I import the common celeryconfig module. With this approach you can start several celeryd instances. I also recommend running them under supervisord; after creating a supervisord.conf entry for each task you can easily manage them as:
supervisorctl start cleaner
supervisorctl stop cleaner
