I don't know exactly why, but I am getting duplicated tasks. I think this may be related to last weekend's clock change (the system clock was set back by one hour).
The first task should not have been executed, since I explicitly set hour=2. Any idea why this happens?
[2017-11-01 01:00:00,001: INFO/Beat] Scheduler: Sending due task every-first-day_month (app.users.views.websites_down)
[2017-11-01 02:00:00,007: INFO/Beat] Scheduler: Sending due task every-first-day_month (app.users.views.websites_down)
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    'every-first-day_month': {
        'task': 'app.users.views.websites_down',
        'schedule': crontab(hour=2, minute=0, day_of_month=1),
    }
}
CELERY_TIMEZONE = "Europe/Lisbon"
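A quick way to check whether the clock change could be involved is to compare the UTC offset of Europe/Lisbon before and after the switch. This is only a diagnostic sketch using the standard-library zoneinfo module (Python 3.9+, newer than the setup in the question). The helper name utc_offset_hours and the two dates are assumptions based on the log timestamps above, and the code does not reproduce Celery's own scheduling logic.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

LISBON = ZoneInfo("Europe/Lisbon")


def utc_offset_hours(naive_local: datetime) -> float:
    """Return the UTC offset, in hours, that applies to a Lisbon wall-clock time."""
    return naive_local.replace(tzinfo=LISBON).utcoffset().total_seconds() / 3600


# A date shortly before the clocks went back (summer time still in effect).
before = utc_offset_hours(datetime(2017, 10, 28, 2, 0))
# The day of the duplicated run in the log above (winter time).
after = utc_offset_hours(datetime(2017, 11, 1, 2, 0))

print(before, after)
```

If a scheduler keeps working with the offset it computed before the switch, a job meant for 02:00 can be considered due one hour off, which would match the extra 01:00 entry in the log. Whether that is what happened here remains an assumption.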
Related
I've been having trouble getting crontab tasks to run in my local time, so I created 24 tasks like the following, one for each hour of the day.
app.conf.beat_schedule = {
    'crontab-test-8am': {
        'task': 'celery.tasks.test_crontab',
        'schedule': crontab(minute="0", hour="8"),
        'args': ('8am',)
    },
}
@app.task()
def test_crontab(message):
    print('CRONTAB HAS RUN - ' + message)
I eventually figured out how to make crontab run correctly in my local time with the following:
app.conf.enable_utc = False
app.conf.update(timezone = "Australia/Perth") #8am AWST = 12am UTC
The problem is it's now running the tasks in both local time and UTC time.
[2022-07-22 08:00:00,000: INFO/MainProcess] Scheduler: Sending due task crontab-test-8am (celery.tasks.test_crontab)
[2022-07-22 08:00:00,010: INFO/MainProcess] Scheduler: Sending due task crontab-test-12am (celery.tasks.test_crontab)
[2022-07-22 08:00:00,015: WARNING/ForkPoolWorker-51] CRONTAB HAS RUN - 8am
[2022-07-22 08:00:00,223: WARNING/ForkPoolWorker-51] CRONTAB HAS RUN - 12am
From what I could find, celery beat stores the schedules in the celerybeat-schedule file. I deleted it and restarted celery beat but it did the same thing.
I'd like any suggestions as to how I could try to fix this.
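The pairing in the log (the 8am and 12am entries firing at the same moment) is consistent with one schedule being evaluated in local time and the other in UTC. As a sanity check of the comment in the configuration, this sketch uses the standard-library zoneinfo module (Python 3.9+), not any Celery API, to convert 08:00 in Australia/Perth to UTC. The date is taken from the log above.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

perth = ZoneInfo("Australia/Perth")

# 08:00 local time on the day shown in the log.
local_8am = datetime(2022, 7, 22, 8, 0, tzinfo=perth)

# The same instant expressed in UTC.
as_utc = local_8am.astimezone(timezone.utc)
print(as_utc.isoformat())
```

The result falls on midnight UTC of the same day, so a "12am" entry interpreted in UTC and an "8am" entry interpreted in Perth time would become due together.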
I am encountering a problem with celery and Django 2. I have two running environments:
Production: requirements.txt => No Issue
amqp==2.2.2
django==1.11.6
celery==4.1.0
django-celery-beat==1.0.1
django-celery-monitor==1.1.2
kombu==4.1.0
redis==2.10.6
Development: requirements.txt =>Issue Present
amqp==2.2.2
django==2.0.3
celery==4.1.0
django-celery-beat==1.1.1
django-celery-monitor==1.1.2
kombu==4.1.0
redis==2.10.6
The production environment should be migrated to Django 2.0 as soon as possible.
However, I can't do it without fixing this issue with Celery. My development environment is here to ensure that everything is running fine before upgrading production servers.
The question
What changed with Django 2 to make a system, which was stable with Django 1.11, unstable with exploding queue sizes, with the same effect in RabbitMQ and Redis?
If a task is not consumed, how could it be automatically deleted by Redis/RabbitMQ?
Celery Worker is launched as follows
The exact same commands are used for both environments.
celery beat -A application --loglevel=info --detach
celery events -A application --loglevel=info --camera=django_celery_monitor.camera.Camera --frequency=2.0 --detach
celery worker -A application -l info --events
Application Settings
Since I migrated my development environment to Django 2, my RabbitMQ or Redis queues have been literally exploding in size and my database instances keep scaling up. It seems that tasks are no longer removed from the queues.
I have to manually clean up the celery queue, which after a few days contains more than 250k tasks. It seems that the TTL is set to "-1", but I can't figure out how to set it from Django.
After a few hours, I have more than 220k tasks waiting to be processed and growing.
I use the following settings, available in the file settings.py:
Warning: The names used might not be the correct ones for Celery; a remap is done in the file celery.py to assign the values correctly.
# Celery Configuration
broker_url = "broker_url" # Redis or RabbitMQ, it doesn't change anything.
broker_use_ssl=True
accept_content = ['application/json']
worker_concurrency = 3
result_serializer = 'json'
result_expires=7*24*30*30
task_serializer = 'json'
task_acks_late=True # Acknowledge the message when the task is over
task_reject_on_worker_lost=True
task_time_limit=90
task_soft_time_limit=60
task_always_eager = False
task_queues=[
    Queue(
        'celery',
        Exchange('celery'),
        routing_key = 'celery',
        queue_arguments = {
            'x-message-ttl': 60 * 1000 # 60 000 ms = 60 secs.
        }
    )
]
event_queue_expires=60
event_queue_ttl=5
beat_scheduler = 'django_celery_beat.schedulers:DatabaseScheduler'
beat_max_loop_interval=10
beat_sync_every=1
monitors_expire_success = timedelta(hours=1)
monitors_expire_error = timedelta(days=3)
monitors_expire_pending = timedelta(days=5)
beat_schedule = {
    'refresh_all_rss_subscribers_count': {
        'task': 'feedcrunch.tasks.refresh_all_rss_subscribers_count',
        'schedule': crontab(hour=0, minute=5), # Every day at midnight + 5 mins
        'options': {'expires': 20 * 60} # 20 minutes
    },
    'clean_unnecessary_rss_visits': {
        'task': 'feedcrunch.tasks.clean_unnecessary_rss_visits',
        'schedule': crontab(hour=0, minute=20), # Every day at midnight + 20 mins
        'options': {'expires': 20 * 60} # 20 minutes
    },
    'celery.backend_cleanup': {
        'task': 'celery.backend_cleanup',
        'schedule': crontab(minute='30'), # Every hour when minutes = 30
        'options': {'expires': 50 * 60} # 50 minutes
    },
    'refresh_all_rss_feeds': {
        'task': 'feedcrunch.tasks.refresh_all_rss_feeds',
        'schedule': crontab(minute='40'), # Every hour when minutes = 40
        'options': {'expires': 30 * 60} # 30 minutes
    },
}
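One value in the settings above may be worth a second look. Celery interprets a numeric result_expires as a number of seconds, and 7*24*30*30 does not correspond to a whole number of days. Assuming the intent was seven days (an assumption, since the post does not say), the arithmetic can be checked with timedelta. The variable names as_written and seven_days are only illustrative.

```python
from datetime import timedelta

# Value copied from the settings above, interpreted as seconds.
as_written = 7 * 24 * 30 * 30

# Seven days in seconds, i.e. 7 * 24 * 60 * 60 (assumed intent).
seven_days = int(timedelta(days=7).total_seconds())

# How long results are actually kept with the value as written.
print(timedelta(seconds=as_written))
print(as_written, seven_days)
```

This does not explain the growing queue by itself, since result_expires concerns stored results rather than pending messages, but it is a discrepancy to rule out.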
Worker Logs examples
One idea: is it normal that the "expires" and "timelimit" settings are set to None (see image above)?
I think I have found a temporary solution. It seems that the celery library has some bugs, and we have to wait for the 4.2.0 release.
I recommend having a look at: https://github.com/celery/celery/issues/#4041.
As a temporary fix, I recommend using the following commit: https://github.com/celery/celery/commit/be55de6, which seems to have fixed the issue:
git+https://github.com/celery/celery.git@be55de6#egg=celery
I want to monitor the status of a Celery task.
For example:
I have a Celery task that reads a table every minute. When the task runs at minute 2, I do not want to re-read what the first run already read.
app.conf.beat_schedule = {
    'add-every-1-minute': {
        'task': 'read_data_base',
        'schedule': 60.0  # a numeric schedule is in seconds; 1.0 would run every second
    }
}
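One common way to avoid re-reading rows is to remember the highest id processed so far and request only newer rows on the next run. The sketch below illustrates that idea with an in-memory SQLite database. The table name events, its columns, and the helper read_new_rows are all hypothetical, and in a real Celery task the marker would need to be stored somewhere persistent (a database row or a cache key) rather than in a local variable.

```python
import sqlite3


def read_new_rows(conn, last_seen_id):
    """Return the rows with id > last_seen_id and the updated marker."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    if rows:
        # Advance the marker to the last id that was just read.
        last_seen_id = rows[-1][0]
    return rows, last_seen_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)", [("a",), ("b",)])

# First run: nothing has been read yet, so both rows are returned.
first_batch, marker = read_new_rows(conn, 0)

# A new row arrives between two runs of the periodic task.
conn.execute("INSERT INTO events (payload) VALUES ('c')")

# Second run: only the row added since the previous run is returned.
second_batch, marker = read_new_rows(conn, marker)
print(first_batch, second_batch, marker)
```

This relies on ids that only ever increase; a timestamp column could play the same role if the table has one.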
I run celery:
celery multi start --app=myapp fast_worker slow_worker \
    -Q:fast_worker fast-queue \
    -Q:slow_worker slow-queue \
    -c:fast_worker 1 -c:slow_worker 1 \
    --logfile=%n.log --pidfile=%n.pid
And celerybeat:
celery beat -A myapp
Task:
@task.periodic_task(run_every=timedelta(seconds=5), ignore_result=True)
def test_log_task_queue():
    import time
    time.sleep(10)
    print("test_log_task_queue")
Routing:
CELERY_ROUTES = {
    'myapp.tasks.test_log_task_queue': {
        'queue': 'slow-queue',
        'routing_key': 'slow-queue',
    },
}
I use RabbitMQ. When I open the RabbitMQ admin panel, I see that my tasks are in slow-queue, but when I open the logs I see task output from both workers. Why do both workers execute my tasks, even when the task is not in that worker's queue?
It looks like celery multi creates something like shared queues. To fix this problem, I added the -X option:
celery multi start --app=myapp fast_worker slow_worker \
    -Q:fast_worker fast-queue \
    -Q:slow_worker slow-queue \
    -X:fast_worker slow-queue \
    -X:slow_worker fast-queue \
    -c:fast_worker 1 -c:slow_worker 1 \
    --logfile=%n.log --pidfile=%n.pid
I'm not good at English, so if you cannot understand something, please leave a comment.
I use Celery for periodic tasks in Django.
CELERYBEAT_SCHEDULE = {
    'send_sms_one_pm': {
        'task': 'tasks.send_one_pm',
        'schedule': crontab(minute=0, hour=13),
    },
    'send_sms_ten_am': {
        'task': 'tasks.send_ten_am',
        'schedule': crontab(minute=0, hour=10),
    },
    'night_proposal_noti': {
        'task': 'tasks.night_proposal_noti',
        'schedule': crontab(minute=0, hour=10)
    },
}
This is my Celery schedule, and I use Redis as the Celery queue.
The problem is that when the biggest task starts, the other tasks are put on hold.
The biggest task takes about 10 hours to process, and the other tasks start only after those 10 hours.
My task looks like this:
@app.task(name='tasks.send_one_pm')
def send_one_pm():
I found that Celery provides task.apply_async(), but I couldn't find out whether periodic tasks can run asynchronously.
So I want to know whether Celery's periodic tasks can run as async tasks. I have 8 Celery workers.
Did you assign CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler' in your settings as well?
If you want one task to start running after another one, you should use apply_async() with the link kwarg; it looks like this:
res=[signature(your_task.name, args=(...), options=kwargs, immutable=True),..]
task.apply_async((args), link=res)
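For the original symptom, short tasks waiting behind a 10-hour one, a frequently suggested mitigation is to limit how many messages each worker process reserves in advance and to give the long task its own queue. The fragment below is a sketch using the old-style uppercase setting names that match CELERYBEAT_SCHEDULE. The queue name long-queue is hypothetical, and treating tasks.send_one_pm as the long-running task is an assumption, not something stated in the question.

```python
# settings.py (sketch; old-style uppercase Celery setting names)

# Limit how many messages each worker process reserves ahead of time,
# so fewer short tasks sit behind a process that is busy with a long one.
CELERYD_PREFETCH_MULTIPLIER = 1

# Send the (assumed) long-running task to a dedicated queue.
# 'long-queue' is a hypothetical name chosen for this example.
CELERY_ROUTES = {
    'tasks.send_one_pm': {'queue': 'long-queue'},
}
```

A worker started with -Q long-queue would then handle only that task, while the remaining workers keep consuming the default queue.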