I am encountering a problem with celery and Django 2. I have two running environments:
Production: requirements.txt => No Issue
amqp==2.2.2
django==1.11.6
celery==4.1.0
django-celery-beat==1.0.1
django-celery-monitor==1.1.2
kombu==4.1.0
redis==2.10.6
Development: requirements.txt => Issue Present
amqp==2.2.2
django==2.0.3
celery==4.1.0
django-celery-beat==1.1.1
django-celery-monitor==1.1.2
kombu==4.1.0
redis==2.10.6
The production environment should be migrated to Django 2.0 as soon as possible.
However, I can't do it without fixing this issue with Celery. My development environment is there to ensure that everything runs fine before upgrading the production servers.
The question
What changed with Django 2.0 that makes a system which was stable under Django 1.11 unstable, with exploding queue sizes? The effect is the same with both RabbitMQ and Redis.
If a task is not consumed, how can it be automatically deleted by Redis/RabbitMQ?
Celery Worker is launched as follows
The exact same commands are used for both environments.
celery beat -A application --loglevel=info --detach
celery events -A application --loglevel=info --camera=django_celery_monitor.camera.Camera --frequency=2.0 --detach
celery worker -A application -l info --events
Application Settings
Since I migrated my development environment to Django 2.0, my RabbitMQ (or Redis) queues are literally exploding in size and my database instances keep scaling up. It seems that tasks are no longer removed from the queues.
I have to manually clean up the Celery queue, which after a few days contains more than 250k tasks. The TTL seems to be set to "-1", but I can't figure out how to set it from Django.
After a few hours, I have more than 220k tasks waiting to be processed, and the number keeps growing.
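For reference, the manual cleanup mentioned above can be done with Celery's built-in purge command, run against the same application instance as the worker commands below (be aware that it discards every waiting message in the task queues):
celery -A application purge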
I use the following settings, defined in settings.py.
Warning: the names used might not be the exact Celery setting names; a remapping is done in celery.py to assign the values correctly (a sketch of that remap follows the settings block below).
# Celery Configuration
from datetime import timedelta

from celery.schedules import crontab
from kombu import Exchange, Queue
broker_url = "borker_url" # Redis or RabbitMQ, it doesn't change anything.
broker_use_ssl=True
accept_content = ['application/json']
worker_concurrency = 3
result_serializer = 'json'
result_expires=7*24*30*30
task_serializer = 'json'
task_acks_late=True # Acknowledge the message only after the task has finished
task_reject_on_worker_lost=True
task_time_limit=90
task_soft_time_limit=60
task_always_eager = False
task_queues=[
Queue(
'celery',
Exchange('celery'),
routing_key = 'celery',
queue_arguments = {
'x-message-ttl': 60 * 1000 # 60 000 ms = 60 secs.
}
)
]
event_queue_expires=60
event_queue_ttl=5
beat_scheduler = 'django_celery_beat.schedulers:DatabaseScheduler'
beat_max_loop_interval=10
beat_sync_every=1
monitors_expire_success = timedelta(hours=1)
monitors_expire_error = timedelta(days=3)
monitors_expire_pending = timedelta(days=5)
beat_schedule = {
'refresh_all_rss_subscribers_count': {
'task': 'feedcrunch.tasks.refresh_all_rss_subscribers_count',
'schedule': crontab(hour=0, minute=5), # Every day at 00:05
'options': {'expires': 20 * 60} # 20 minutes
},
'clean_unnecessary_rss_visits': {
'task': 'feedcrunch.tasks.clean_unnecessary_rss_visits',
'schedule': crontab(hour=0, minute=20), # Every day at 00:20
'options': {'expires': 20 * 60} # 20 minutes
},
'celery.backend_cleanup': {
'task': 'celery.backend_cleanup',
'schedule': crontab(minute='30'), # Every hour at minute 30
'options': {'expires': 50 * 60} # 50 minutes
},
'refresh_all_rss_feeds': {
'task': 'feedcrunch.tasks.refresh_all_rss_feeds',
'schedule': crontab(minute='40'), # Every hour at minute 40
'options': {'expires': 30 * 60} # 30 minutes
},
}
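The actual celery.py is not included here; the following is only a minimal sketch of how such a remap can look. The package name "application" is taken from the worker commands above, and loading the settings module directly is an assumption, not the author's actual file:
# application/celery.py -- minimal sketch of the remap mentioned above
import os

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'application.settings')

app = Celery('application')

# django.conf.settings only exposes UPPERCASE names, so the lowercase Celery
# options defined in settings.py are loaded from the settings module directly.
app.config_from_object('application.settings')

app.autodiscover_tasks()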
Worker Logs examples
One observation: is it normal that the "expires" and "timelimit" settings are reported as None in the worker logs shown above?
I think I have found a temporary solution: it seems that the Celery library has some bugs, and we have to wait for the 4.2.0 release.
I recommend having a look at https://github.com/celery/celery/issues/4041.
As a temporary fix, I recommend installing from the following commit, which seems to resolve the issue: https://github.com/celery/celery/commit/be55de6
git+https://github.com/celery/celery.git@be55de6#egg=celery
Related
I am using Celery 4.3.0. I am trying to update the schedule of celery beat every 5 seconds based on a schedule in a json file, so that when I’m manually editing, adding or deleting scheduled tasks in that json file, the changes are picked up by the celery beat scheduler without restarting it.
What I tried is creating a task that updates this schedule by updating app.conf['CELERYBEAT_SCHEDULE']. The task successfully runs every 5 seconds, but celery beat doesn't pick up the new schedule, even though I set beat_max_loop_interval to 1 sec.
tasks.py
import json

from celery import Celery

app = Celery("tasks", backend='redis://', broker='redis://')
app.config_from_object('celeryconfig')
@app.task
def hello_world():
    return "Hello World!"

@app.task
def update_schedule():
    with open("path_to_scheduler.json", "r") as f:
        app.conf['CELERYBEAT_SCHEDULE'] = json.load(f)
celeryconfig.py
beat_max_loop_interval = 1 # Maximum number of seconds beat can sleep between checking the schedule
beat_schedule = {
"greet-every-10-seconds": {
"task": "tasks.hello_world",
"schedule": 10.0
},
'update_schedule': {
'task': 'tasks.update_schedule',
'schedule': 5.0
},
}
schedule.json
{
"greet-every-2-seconds": {
"task": "tasks.hello_world",
"schedule": 2.0
},
"update_schedule": {
"task": "tasks.update_schedule",
"schedule": 5.0
}
}
Any help would be appreciated. If you know a better way to reload the beat schedule from a file I’m also keen to hear it.
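One thing to keep in mind is that beat runs in its own process, so a worker task that mutates app.conf cannot change the schedule the beat process has already loaded. A possible workaround, sketched here without testing, is to persist the schedule somewhere beat re-reads it, for example in django-celery-beat's database scheduler (celery beat -S django), and have update_schedule sync the JSON file into its tables:
# Untested sketch: sync path_to_scheduler.json into django-celery-beat's tables
# so the DatabaseScheduler picks up changes at runtime. Assumes Django is
# configured and django_celery_beat is installed and migrated.
import json

from django_celery_beat.models import IntervalSchedule, PeriodicTask

@app.task
def update_schedule():
    with open("path_to_scheduler.json", "r") as f:
        entries = json.load(f)
    for name, entry in entries.items():
        interval, _ = IntervalSchedule.objects.get_or_create(
            every=int(entry["schedule"]), period=IntervalSchedule.SECONDS)
        PeriodicTask.objects.update_or_create(
            name=name,
            defaults={"task": entry["task"], "interval": interval},
        )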
I am trying to schedule a task that runs every 10 minutes using Django 1.9.8, Celery 4.0.2, RabbitMQ 2.1.4, Redis 2.10.5. These are all running within Docker containers in Linux (Fedora 25). I have tried many combinations of things that I found in Celery docs and from this site. The only combination that has worked thus far is below. However, it only runs the periodic task initially when the application starts, but the schedule is ignored thereafter. I have absolutely confirmed that the scheduled task does not run again after the initial time.
My (almost-working) setup that only runs one-time:
settings.py:
INSTALLED_APPS = (
...
'django_celery_beat',
...
)
BROKER_URL = 'amqp://{user}:{password}@{hostname}/{vhost}/'.format(
    user=os.environ['RABBIT_USER'],
    password=os.environ['RABBIT_PASS'],
    hostname=RABBIT_HOSTNAME,
    vhost=os.environ.get('RABBIT_ENV_VHOST', ''))
# We don't want dead connections stored on RabbitMQ, so we negotiate using heartbeats
BROKER_HEARTBEAT = '?heartbeat=30'
if not BROKER_URL.endswith(BROKER_HEARTBEAT):
BROKER_URL += BROKER_HEARTBEAT
BROKER_POOL_LIMIT = 1
BROKER_CONNECTION_TIMEOUT = 10
# Celery configuration
# configure queues, currently we have only one
CELERY_DEFAULT_QUEUE = 'default'
from kombu import Exchange, Queue

CELERY_QUEUES = (
Queue('default', Exchange('default'), routing_key='default'),
)
# Sensible settings for celery
CELERY_ALWAYS_EAGER = False
CELERY_ACKS_LATE = True
CELERY_TASK_PUBLISH_RETRY = True
CELERY_DISABLE_RATE_LIMITS = False
# By default we will ignore result
# If you want to see results and try out tasks interactively, change it to False
# Or change this setting on tasks level
CELERY_IGNORE_RESULT = True
CELERY_SEND_TASK_ERROR_EMAILS = False
CELERY_TASK_RESULT_EXPIRES = 600
# Set redis as celery result backend
CELERY_RESULT_BACKEND = 'redis://%s:%d/%d' % (REDIS_HOST, REDIS_PORT, REDIS_DB)
CELERY_REDIS_MAX_CONNECTIONS = 1
# Don't use pickle as serializer, json is much safer
CELERY_TASK_SERIALIZER = "json"
CELERY_RESULT_SERIALIZER = "json"
CELERY_ACCEPT_CONTENT = ['application/json']
CELERYD_HIJACK_ROOT_LOGGER = False
CELERYD_PREFETCH_MULTIPLIER = 1
CELERYD_MAX_TASKS_PER_CHILD = 1000
celeryconf.py
# coding=UTF8
from __future__ import absolute_import
import os
from celery import Celery
from django.conf import settings
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "web_portal.settings")
app = Celery('web_portal')
CELERY_TIMEZONE = 'UTC'
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
tasks.py
from celery.schedules import crontab
from .celeryconf import app as celery_app
@celery_app.on_after_finalize.connect
def setup_periodic_tasks(sender, **kwargs):
# Calls email_scanner every 10 minutes
sender.add_periodic_task(
crontab(hour='*',
minute='*/10',
second='*',
day_of_week='*',
day_of_month='*'),
email_scanner.delay(),
)
@app.task
def email_scanner():
dispatch_list = scanning.email_scan()
for dispatch in dispatch_list:
validate_dispatch.delay(dispatch)
return
run_celery.sh -- Used to start celery tasks from docker-compose.yml
#!/bin/sh
# wait for RabbitMQ server to start
sleep 10
cd web_portal
# run Celery worker for our project myproject with Celery configuration stored in Celeryconf
su -m myuser -c "celery beat -l info --pidfile=/tmp/celerybeat-web_portal.pid -s /tmp/celerybeat-schedule &"
su -m myuser -c "celery worker -A web_portal.celeryconf -Q default -n default#%h"
I have also tried using CELERYBEAT_SCHEDULE in settings.py in lieu of the @celery_app.on_after_finalize.connect decorator and block in tasks.py, but the scheduler never ran even once.
settings.py (not working at all scenario)
(same as before except also including the following)
CELERYBEAT_SCHEDULE = {
'email-scanner-every-5-minutes': {
'task': 'tasks.email_scanner',
'schedule': timedelta(minutes=10)
},
}
The Celery 4.0.2 documentation online presumes that I should instinctively know many givens, but I am new to this environment. If anybody knows where I can find a tutorial OTHER THAN docs.celeryproject.org and http://django-celery-beat.readthedocs.io/en/latest/, which both assume that I am already a Django master, I would be grateful. Or let me know of course if you see something obviously wrong in my setup. Thanks!
I found a solution that works. I could not get CELERYBEAT_SCHEDULE or the celery task decorators to work, and I suspect that it may be at least partially due to the manner in which I started the Celery beat task.
The working solution goes the whole 9 yards to utilize Django Database Scheduler. I downloaded the GitHub project "https://github.com/celery/django-celery-beat" and incorporated all of the code as another "app" in my project. This enabled Django-Admin access to maintain the cron / interval / periodic task(s) tables via a browser. I also modified my run_celery.sh as follows:
#!/bin/sh
# wait for RabbitMQ server to start
sleep 10
# run Celery worker for our project myproject with Celery configuration stored in Celeryconf
celery beat -A web_portal.celeryconf -l info --pidfile=/tmp/celerybeat-web_portal.pid -S django --detach
su -m myuser -c "celery worker -A web_portal.celeryconf -Q default -n default@%h -l info "
After adding a scheduled task via the django-admin web interface, the scheduler started working fine.
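For reference, the same setup is usually achieved without copying the project into your codebase; roughly, assuming a standard pip-based install:
pip install django-celery-beat

# settings.py
INSTALLED_APPS += ('django_celery_beat',)

# create the periodic-task tables used by the admin interface
python manage.py migrate

# start beat with the database scheduler (the -S django alias used above)
celery beat -A web_portal.celeryconf -l info -S django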
I'm using Celery 4.0.1 with Django 1.10 and I have trouble scheduling tasks (running a task works fine). Here is the celery configuration:
import os

from celery import Celery
from django.conf import settings
from kombu import Queue

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myapp.settings')
app = Celery('myapp')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
app.conf.BROKER_URL = 'amqp://{}:{}@{}'.format(settings.AMQP_USER, settings.AMQP_PASSWORD, settings.AMQP_HOST)
app.conf.CELERY_DEFAULT_EXCHANGE = 'myapp.celery'
app.conf.CELERY_DEFAULT_QUEUE = 'myapp.celery_default'
app.conf.CELERY_TASK_SERIALIZER = 'json'
app.conf.CELERY_ACCEPT_CONTENT = ['json']
app.conf.CELERY_IGNORE_RESULT = True
app.conf.CELERY_DISABLE_RATE_LIMITS = True
app.conf.BROKER_POOL_LIMIT = 2
app.conf.CELERY_QUEUES = (
Queue('myapp.celery_default'),
Queue('myapp.queue1'),
Queue('myapp.queue2'),
Queue('myapp.queue3'),
)
Then in tasks.py I have:
@app.task(queue='myapp.queue1')
def my_task(some_id):
print("Doing something with", some_id)
In views.py I want to schedule this task:
def my_view(request, id):
app.add_periodic_task(10, my_task.s(id))
Then I execute the commands:
sudo systemctl start rabbitmq.service
celery -A myapp.celery_app beat -l debug
celery worker -A myapp.celery_app
But the task is never scheduled. I don't see anything in the logs. The task itself works, because if in my view I do:
def my_view(request, id):
my_task.delay(id)
The task is executed.
If I schedule the task manually in my configuration file, like this, it works:
app.conf.CELERYBEAT_SCHEDULE = {
'add-every-30-seconds': {
'task': 'tasks.my_task',
'schedule': 10.0,
'args': (66,)
},
}
I just can't schedule the task dynamically. Any idea?
EDIT (13/01/2018):
The latest release, 4.1.0, has addressed this subject in ticket #3958, which has been merged.
Actually, you cannot define a periodic task at the view level, because the beat schedule setting is loaded first and cannot be rescheduled at runtime:
The add_periodic_task() function will add the entry to the beat_schedule setting behind the scenes, and the same setting can also be used to set up periodic tasks manually:
app.conf.CELERYBEAT_SCHEDULE = {
'add-every-30-seconds': {
'task': 'tasks.my_task',
'schedule': 10.0,
'args': (66,)
},
}
This means that if you want to use add_periodic_task(), it should be wrapped within an on_after_configure handler at the Celery app level, and any modification at runtime will not take effect:
app = Celery()
@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
sender.add_periodic_task(10, my_task.s(66))
As mentioned in the docs, the regular celery beat scheduler simply keeps track of task execution:
The default scheduler is the celery.beat.PersistentScheduler, that simply keeps track of the last run times in a local shelve database file.
In order to be able to dynamically manage periodic tasks and reschedule celerybeat at runtime:
There’s also the django-celery-beat extension that stores the schedule in the Django database, and presents a convenient admin interface to manage periodic tasks at runtime.
The tasks will be persisted in the Django database, and the schedule can be updated through the task model at the database level. Whenever you update a periodic task, a counter in the periodic task table is incremented; this tells the celery beat service to reload the schedule from the database.
A possible solution for you could be as follows:
import json

from django_celery_beat.models import PeriodicTask, IntervalSchedule

schedule = IntervalSchedule.objects.create(every=10, period=IntervalSchedule.SECONDS)
task = PeriodicTask.objects.create(interval=schedule, name='any name', task='tasks.my_task', args=json.dumps([66]))
views.py
import json

from django_celery_beat.models import PeriodicTask

def update_task_view(request, id):
    task = PeriodicTask.objects.get(name="task name")  # if we suppose names are unique
    task.args = json.dumps([id])
    task.save()
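For this to take effect, beat must be started with the database scheduler; for example, reusing the beat command from the question:
celery -A myapp.celery_app beat -l info -S django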
I'm not good at English, so if you cannot understand my sentences, please leave a comment.
I use Celery for periodic tasks on Django.
CELERYBEAT_SCHEDULE = {
'send_sms_one_pm': {
'task': 'tasks.send_one_pm',
'schedule': crontab(minute=0, hour=13),
},
'send_sms_ten_am': {
'task': 'tasks.send_ten_am',
'schedule': crontab(minute=0, hour=10),
},
'night_proposal_noti': {
'task': 'tasks.night_proposal_noti',
'schedule': crontab(minute=0, hour=10)
},
}
This is my Celery schedule, and I use Redis for the Celery queue.
The problem is that when the biggest task starts, the other tasks are put on hold.
The biggest task runs for about 10 hours, and the other tasks only start after those 10 hours.
My task looks like
@app.task(name='tasks.send_one_pm')
def send_one_pm():
I found that Celery provides task.apply_async(), but I couldn't find out whether periodic tasks can run asynchronously.
So, I want to know whether Celery's periodic tasks can work as async tasks. My setup has 8 Celery workers.
Did you assign CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler' in your settings as well?
If you want one task to start running after another one finishes, you should use apply_async() with the link keyword argument; it looks like this:
res = [signature(your_task.name, args=(...), options=kwargs, immutable=True), ...]
task.apply_async(args, link=res)
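A concrete, untested sketch of that, using the task names from the question above:
# assuming both tasks are importable from tasks.py, as in the question
from tasks import send_one_pm, send_ten_am

# run send_ten_am only after send_one_pm has finished; .si() builds an
# immutable signature, so send_one_pm's return value is not passed along
send_one_pm.apply_async(link=send_ten_am.si())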
Question
After running tasks via celery's periodic task scheduler, beat, why do I have so many unconsumed queues remaining in RabbitMQ?
Setup
Django web app running on Heroku
Tasks scheduled via celery beat
Tasks run via celery worker
Message broker is RabbitMQ from CloudAMQP
Procfile
web: gunicorn --workers=2 --worker-class=gevent --bind=0.0.0.0:$PORT project_name.wsgi:application
scheduler: python manage.py celery worker --loglevel=ERROR -B -E --maxtasksperchild=1000
worker: python manage.py celery worker -E --maxtasksperchild=1000 --loglevel=ERROR
settings.py
CELERYBEAT_SCHEDULE = {
'do_some_task': {
'task': 'project_name.apps.appname.tasks.some_task',
'schedule': datetime.timedelta(seconds=60 * 15),
'args': ''
},
}
tasks.py
@celery.task
def some_task():
# Get some data from external resources
# Save that data to the database
# No return value specified
Result
Every time the task runs, I get (via the RabbitMQ web interface):
An additional message in the "Ready" state under my "Queued Messages"
An additional queue with a single message in the "ready" state
This queue has no listed consumers
It ended up being my setting for CELERY_RESULT_BACKEND.
Previously, it was:
CELERY_RESULT_BACKEND = 'amqp'
I no longer had unconsumed messages / queues in RabbitMQ after I changed it to:
CELERY_RESULT_BACKEND = 'database'
What was happening, it would appear, is that after a task was executed, Celery was sending info about that task back via RabbitMQ, but there was nothing set up to consume these response messages, hence a bunch of unread ones ending up in queues.
NOTE: This means that celery would be adding database entries recording the outcomes of tasks. To keep my database from getting loaded up with useless messages, I added:
# Delete result records ("tombstones") from database after 4 hours
# http://docs.celeryproject.org/en/latest/configuration.html#celery-task-result-expires
CELERY_TASK_RESULT_EXPIRES = 14400
Relevant parts from Settings.py
########## CELERY CONFIGURATION
import djcelery
# https://github.com/celery/django-celery/
djcelery.setup_loader()
INSTALLED_APPS = INSTALLED_APPS + (
'djcelery',
)
# Compress all the messages using gzip
# http://celery.readthedocs.org/en/latest/userguide/calling.html#compression
CELERY_MESSAGE_COMPRESSION = 'gzip'
# See: http://docs.celeryproject.org/en/latest/configuration.html#broker-transport
BROKER_TRANSPORT = 'amqplib'
# Set this number to the amount of allowed concurrent connections on your AMQP
# provider, divided by the amount of active workers you have.
#
# For example, if you have the 'Little Lemur' CloudAMQP plan (their free tier),
# they allow 3 concurrent connections. So if you run a single worker, you'd
# want this number to be 3. If you had 3 workers running, you'd lower this
# number to 1, since 3 workers each maintaining one open connection = 3
# connections total.
#
# See: http://docs.celeryproject.org/en/latest/configuration.html#broker-pool-limit
BROKER_POOL_LIMIT = 3
# See: http://docs.celeryproject.org/en/latest/configuration.html#broker-connection-max-retries
BROKER_CONNECTION_MAX_RETRIES = 0
# See: http://docs.celeryproject.org/en/latest/configuration.html#broker-url
BROKER_URL = os.environ.get('CLOUDAMQP_URL')
# Previously, this was set to 'amqp', which resulted in many unread / unconsumed
# queues and messages in RabbitMQ
# See: http://docs.celeryproject.org/en/latest/configuration.html#celery-result-backend
CELERY_RESULT_BACKEND = 'database'
# Delete result records ("tombstones") from database after 4 hours
# http://docs.celeryproject.org/en/latest/configuration.html#celery-task-result-expires
CELERY_TASK_RESULT_EXPIRES = 14400
########## END CELERY CONFIGURATION
Looks like you are getting back responses from your consumed tasks.
You can avoid that by doing:
@celery.task(ignore_result=True)
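Applied to the task from the question above, that would look roughly like this:
@celery.task(ignore_result=True)
def some_task():
    # get data from external resources and save it to the database;
    # with ignore_result=True, no result message is published back to the broker
    ...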