Django celerybeat periodic task only runs once - python

I am trying to schedule a task that runs every 10 minutes using Django 1.9.8, Celery 4.0.2, RabbitMQ 2.1.4, Redis 2.10.5. These are all running within Docker containers in Linux (Fedora 25). I have tried many combinations of things that I found in Celery docs and from this site. The only combination that has worked thus far is below. However, it only runs the periodic task initially when the application starts, but the schedule is ignored thereafter. I have absolutely confirmed that the scheduled task does not run again after the initial time.
My (almost-working) setup that only runs one-time:
settings.py:
INSTALLED_APPS = (
    ...
    'django_celery_beat',
    ...
)
BROKER_URL = 'amqp://{user}:{password}@{hostname}/{vhost}/'.format(
    user=os.environ['RABBIT_USER'],
    password=os.environ['RABBIT_PASS'],
    hostname=RABBIT_HOSTNAME,
    vhost=os.environ.get('RABBIT_ENV_VHOST', ''))
# We don't want to have dead connections stored on rabbitmq, so we have to negotiate using heartbeats
BROKER_HEARTBEAT = '?heartbeat=30'
if not BROKER_URL.endswith(BROKER_HEARTBEAT):
    BROKER_URL += BROKER_HEARTBEAT
BROKER_POOL_LIMIT = 1
BROKER_CONNECTION_TIMEOUT = 10
# Celery configuration
# Configure queues; currently we have only one
from kombu import Exchange, Queue  # needed for the queue definitions below

CELERY_DEFAULT_QUEUE = 'default'
CELERY_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
)
# Sensible settings for celery
CELERY_ALWAYS_EAGER = False
CELERY_ACKS_LATE = True
CELERY_TASK_PUBLISH_RETRY = True
CELERY_DISABLE_RATE_LIMITS = False
# By default we will ignore result
# If you want to see results and try out tasks interactively, change it to False
# Or change this setting on tasks level
CELERY_IGNORE_RESULT = True
CELERY_SEND_TASK_ERROR_EMAILS = False
CELERY_TASK_RESULT_EXPIRES = 600
# Set redis as celery result backend
CELERY_RESULT_BACKEND = 'redis://%s:%d/%d' % (REDIS_HOST, REDIS_PORT, REDIS_DB)
CELERY_REDIS_MAX_CONNECTIONS = 1
# Don't use pickle as serializer, json is much safer
CELERY_TASK_SERIALIZER = "json"
CELERY_RESULT_SERIALIZER = "json"
CELERY_ACCEPT_CONTENT = ['application/json']
CELERYD_HIJACK_ROOT_LOGGER = False
CELERYD_PREFETCH_MULTIPLIER = 1
CELERYD_MAX_TASKS_PER_CHILD = 1000
celeryconf.py
# coding=UTF8
from __future__ import absolute_import
import os
from celery import Celery
from django.conf import settings
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "web_portal.settings")
app = Celery('web_portal')
CELERY_TIMEZONE = 'UTC'
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
tasks.py
from celery.schedules import crontab
from .celeryconf import app as celery_app

@celery_app.on_after_finalize.connect
def setup_periodic_tasks(sender, **kwargs):
    # Calls email_scanner every 10 minutes
    sender.add_periodic_task(
        crontab(hour='*',
                minute='*/10',
                second='*',
                day_of_week='*',
                day_of_month='*'),
        email_scanner.delay(),
    )

@app.task
def email_scanner():
    dispatch_list = scanning.email_scan()
    for dispatch in dispatch_list:
        validate_dispatch.delay(dispatch)
    return
run_celery.sh -- Used to start celery tasks from docker-compose.yml
#!/bin/sh
# wait for RabbitMQ server to start
sleep 10
cd web_portal
# run Celery worker for our project myproject with Celery configuration stored in Celeryconf
su -m myuser -c "celery beat -l info --pidfile=/tmp/celerybeat-web_portal.pid -s /tmp/celerybeat-schedule &"
su -m myuser -c "celery worker -A web_portal.celeryconf -Q default -n default@%h"
I have also tried using a CELERYBEAT_SCHEDULE in settings.py in lieu of the @celery_app.on_after_finalize.connect decorator and block in tasks.py, but the scheduler never ran even once.
settings.py (not working at all scenario)
(same as before except also including the following)
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    'email-scanner-every-5-minutes': {
        'task': 'tasks.email_scanner',
        'schedule': timedelta(minutes=10)
    },
}
The Celery 4.0.2 documentation online presumes that I should instinctively know many givens, but I am new to this environment. If anybody knows where I can find a tutorial OTHER THAN docs.celeryproject.org and http://django-celery-beat.readthedocs.io/en/latest/, which both assume that I am already a Django master, I would be grateful. Or let me know, of course, if you see something obviously wrong in my setup. Thanks!

I found a solution that works. I could not get CELERYBEAT_SCHEDULE or the celery task decorators to work, and I suspect that it may be at least partially due to the manner in which I started the Celery beat task.
The working solution goes the whole 9 yards to utilize Django Database Scheduler. I downloaded the GitHub project "https://github.com/celery/django-celery-beat" and incorporated all of the code as another "app" in my project. This enabled Django-Admin access to maintain the cron / interval / periodic task(s) tables via a browser. I also modified my run_celery.sh as follows:
#!/bin/sh
# wait for RabbitMQ server to start
sleep 10
# run Celery worker for our project myproject with Celery configuration stored in Celeryconf
celery beat -A web_portal.celeryconf -l info --pidfile=/tmp/celerybeat-web_portal.pid -S django --detach
su -m myuser -c "celery worker -A web_portal.celeryconf -Q default -n default@%h -l info"
After adding a scheduled task via the django-admin web interface, the scheduler started working fine.
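For anyone following along, here is a minimal sketch of what the settings side of that approach typically looks like. This is my reading of the django-celery-beat docs rather than the exact file, so treat the values as assumptions:
# settings.py -- hypothetical excerpt for the database scheduler
INSTALLED_APPS = (
    # ...
    'django_celery_beat',   # provides the periodic-task tables and admin pages
)

# Old-style (non-namespaced) setting name, matching the rest of the config above;
# equivalent to passing -S django on the beat command line.
CELERYBEAT_SCHEDULER = 'django_celery_beat.schedulers.DatabaseScheduler'
Remember to run python manage.py migrate django_celery_beat once so the schedule tables exist before starting beat.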

Related

Celery jobs not running on heroku (python/django app)

I have a Django app set up with some scheduled tasks. The app is deployed on Heroku with Redis. The task runs if invoked synchronously in the console, or locally when I also have redis and celery running. However, the scheduled jobs are not running on Heroku.
My task:
@shared_task(name="send_emails")
def send_emails():
    .....
celery.py:
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
from celery.schedules import crontab
# set the default Django settings module for the 'celery' program.
# this is also used in manage.py
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'my_app.settings')
# Get the base REDIS URL, default to redis' default
BASE_REDIS_URL = os.environ.get('REDIS_URL', 'redis://localhost:6379')
app = Celery('my_app')
# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
# should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')
# Load task modules from all registered Django app configs.
app.autodiscover_tasks()
app.conf.broker_url = BASE_REDIS_URL
# this allows you to schedule items in the Django admin.
app.conf.beat_scheduler = 'django_celery_beat.schedulers.DatabaseScheduler'
# These are the scheduled jobs
app.conf.beat_schedule = {
    'send_emails_crontab': {
        'task': 'send_emails',
        'schedule': crontab(hour=9, minute=0),
        'args': (),
    }
}
In Procfile:
worker: celery -A my_app worker --beat -S django -l info
I've spun up the worker with heroku ps:scale worker=1 -a my-app.
I can see the registered tasks under [tasks] in the worker logs.
However, the scheduled tasks are not running at their scheduled time. Calling send_emails.delay() in the production console does work.
How do I get the worker to stay alive and / or run the job at the scheduled time?
I have a workaround using a management command and Heroku Scheduler. I'm just unsure if that's the best way to do it.
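If it helps, the usual shape of that workaround is a Django management command that Heroku Scheduler invokes on its own clock. A minimal sketch, with the module path and task import being assumptions for illustration:
# myapp/management/commands/send_scheduled_emails.py -- hypothetical path and name
from django.core.management.base import BaseCommand
from myapp.tasks import send_emails  # the @shared_task shown above; import path is assumed

class Command(BaseCommand):
    help = "Trigger the send_emails task; meant to be run by Heroku Scheduler."

    def handle(self, *args, **options):
        # Run synchronously so the one-off scheduler dyno does the work itself;
        # use send_emails.delay() instead to hand it off to the Celery worker.
        send_emails()
Heroku Scheduler would then run python manage.py send_scheduled_emails at the chosen interval.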
If you're on the free tier, you should know that Heroku dynos sleep, and if your scheduled task becomes due while the dyno is sleeping, it won't run.
I'll share a few ideas.
Run a console and check the dyno's datetime; the dyno uses US local time.
A free dyno sleeps after 30 minutes of inactivity and gets only 450 hours per month.
Try switching from Celery to an APScheduler clock process; you need to add a clock.py script like this:
from myapp import myfunction
from datetime import datetime
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.schedulers.blocking import BlockingScheduler
from time import monotonic, sleep, ctime
import os

sched = BlockingScheduler()
hour = int(os.environ.get("SEARCH_HOUR"))
minutes = int(os.environ.get("SEARCH_MINUTES"))

@sched.scheduled_job('cron', day_of_week='mon-sun', hour=hour, minute=minutes)
def scheduled_job():
    print('This job: Execute myfunction every at ', hour, ':', minutes)
    # My function
    myfunction()

sched.start()
In Procfile:
clock: python clock.py
and run:
heroku ps:scale clock=1 --app thenameapp
Regards.

Reload celery beat config

I'm using celery and celery-beat without Django, and I have a task which needs to modify the celery-beat schedule when it runs.
Now I have the following code (module called celery_tasks):
# __init__.py
from .celery import app as celery_app
__all__ = ['celery_app']
# celery.py
from celery import Celery
import config

celery_config = config.get_celery_config()

app = Celery(
    __name__,
    include=[
        'celery_tasks.tasks',
    ],
)
app.conf.update(celery_config)
# tasks.py
from celery_tasks import celery_app
from celery import shared_task
@shared_task
def start_game():
    celery_app.conf.beat_schedule = {
        'process_round': {
            'task': 'celery_tasks.tasks.process_round',
            'schedule': 5,
        },
    }
I start celery with the following command:
celery worker -A celery_tasks -E -l info --beat
start_game executes and exits normally, but the beat process_round task never runs.
How can I force-reload the beat schedule (restarting all workers doesn't seem like a good idea)?
The problem is with the default celery beat backend: when you start the celerybeat process, it writes all of the tasks and schedules into a local file, so the schedule cannot be changed dynamically.
You can use the package celerybeat-sqlalchemy-scheduler, which lets you edit the schedule in a database so that celerybeat picks up the new schedule from the DB itself.
There is also another package, celery-redbeat, which uses a Redis server as the backend.
You can refer to those as well.
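For example, a minimal celery-redbeat wiring might look like the sketch below (assuming pip install celery-redbeat; the Redis URL is a placeholder, and the runtime-entry part follows the redbeat README rather than anything in the question):
# celery.py -- hypothetical excerpt
from celery import Celery

app = Celery('celery_tasks', include=['celery_tasks.tasks'])
app.conf.redbeat_redis_url = 'redis://localhost:6379/1'  # where RedBeat keeps the schedule

# Start beat with RedBeat as the scheduler:
#   celery beat -A celery_tasks -S redbeat.RedBeatScheduler -l info

# Adding/updating a schedule entry at runtime, e.g. inside start_game():
from celery.schedules import schedule
from redbeat import RedBeatSchedulerEntry

entry = RedBeatSchedulerEntry('process_round', 'celery_tasks.tasks.process_round',
                              schedule(run_every=5), app=app)
entry.save()  # beat picks this up from Redis without a restart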
Using the schedule config also seems like a bad idea. What if process_round is active from the start, and when it checks that the game has not started, the task simply does nothing?

Celery - Flower service in CentOS 7 doesn't show data after a restart

My service is as below :
[Service]
User=vagrant
Group=vagrant
WorkingDirectory=/home/vagrant/
ExecStart=/usr/bin/celery -A tasks /usr/bin/flower --port=5000 --basic_auth=sun:flower --persistent=true --/var/log/supervisord.log
Restart=on-failure
Type=simple
I have created a simple task for testing purposes after setting up Redis master-slave replication and pointing Celery and Flower to the master...
celery.ini for Supervisord
[program:SPSimpleTasks]
directory = /home/vagrant/
command = celery -A tasks worker --loglevel=DEBUG -n SimpleTasks
user=vagrant
autostart=true
autorestart=true
stdout_logfile=/var/log/supervisord.log
stderr_logfile=/var/log/supervisord.log
celery_app.py
from celery import Celery

app = Celery('tasks', backend='redis://:<PASSWORD>@192.168.100.20:6379/0')
# app.config_from_object('celeryconfig')
app.conf.update(
    enable_utc=True,
    broker_url='redis://:<PASSWORD>@192.168.100.20:6379/0',
    result_backend='redis://:<PASSWORD>@192.168.100.20:6379/1',
    result_persistent=True,
    task_serializer='json',
    result_serializer='json',
    accept_content=['json']
)
Again, the workers are working fine and Flower shows the task results. But once I restart the service using
sudo service flower restart
Flower loads as if it were a brand-new install, without any data...
Am I doing something wrong with the broker / backend urls?
Well, it required an additional change to the Flower configuration, and I changed the approach a bit!
The required config property was db.
I updated the service to:
/etc/systemd/system/flower.service
[Unit]
Description=Flower Celery Service
[Service]
User=vagrant
Group=vagrant
WorkingDirectory=/home/vagrant/
ExecStart=/usr/bin/flower --/var/log/supervisord.log
Restart=on-failure
Type=simple
[Install]
WantedBy=multi-user.target
/home/vagrant/flowerconfig.py
port = 5000
broker = 'redis://:<password>@192.168.100.20:6379/0'
broker_api = 'redis://:<password>@192.168.100.20:6379/0'
basic_auth = ['sun:flower']
persistent = True
db = '/home/vagrant/flower.db'
After providing the db property, it was all fine, except for one issue: the Flower dashboard was not showing data, but the Tasks page was loading the task list from before the restart!
Learnt from http://flower.readthedocs.io/en/latest/config.html#db
A database file to use if persistent mode is enabled (by default,
db=flower)
I've submitted the same as issue / observation in flower -- https://github.com/mher/flower/issues/787
Hope this helps others facing a similar issue!

Setting up periodic tasks in Celery (celerybeat) dynamically using add_periodic_task

I'm using Celery 4.0.1 with Django 1.10 and I have trouble scheduling tasks (running a task works fine). Here is the celery configuration:
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myapp.settings')
app = Celery('myapp')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
app.conf.BROKER_URL = 'amqp://{}:{}@{}'.format(settings.AMQP_USER, settings.AMQP_PASSWORD, settings.AMQP_HOST)
app.conf.CELERY_DEFAULT_EXCHANGE = 'myapp.celery'
app.conf.CELERY_DEFAULT_QUEUE = 'myapp.celery_default'
app.conf.CELERY_TASK_SERIALIZER = 'json'
app.conf.CELERY_ACCEPT_CONTENT = ['json']
app.conf.CELERY_IGNORE_RESULT = True
app.conf.CELERY_DISABLE_RATE_LIMITS = True
app.conf.BROKER_POOL_LIMIT = 2
app.conf.CELERY_QUEUES = (
    Queue('myapp.celery_default'),
    Queue('myapp.queue1'),
    Queue('myapp.queue2'),
    Queue('myapp.queue3'),
)
Then in tasks.py I have:
@app.task(queue='myapp.queue1')
def my_task(some_id):
    print("Doing something with", some_id)
In views.py I want to schedule this task:
def my_view(request, id):
    app.add_periodic_task(10, my_task.s(id))
Then I execute the commands:
sudo systemctl start rabbitmq.service
celery -A myapp.celery_app beat -l debug
celery worker -A myapp.celery_app
But the task is never scheduled. I don't see anything in the logs. The task is working because if in my view I do:
def my_view(request, id):
    my_task.delay(id)
The task is executed.
If I schedule the task manually in my configuration file, like this, it works:
app.conf.CELERYBEAT_SCHEDULE = {
    'add-every-30-seconds': {
        'task': 'tasks.my_task',
        'schedule': 10.0,
        'args': (66,)
    },
}
I just can't schedule the task dynamically. Any idea?
EDIT (13/01/2018):
The latest release, 4.1.0, has addressed this subject in ticket #3958, which has been merged.
Actually, you cannot define a periodic task at the view level, because the beat schedule setting will be loaded first and cannot be rescheduled at runtime:
The add_periodic_task() function will add the entry to the beat_schedule setting behind the scenes, and the same setting can also be used to set up periodic tasks manually:
app.conf.CELERYBEAT_SCHEDULE = {
    'add-every-30-seconds': {
        'task': 'tasks.my_task',
        'schedule': 10.0,
        'args': (66,)
    },
}
which means that if you want to use add_periodic_task(), it should be wrapped within an on_after_configure handler at the Celery app level, and any modification at runtime will not take effect:
app = Celery()

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(10, my_task.s(66))
As mentioned in the docs, the regular celerybeat scheduler simply keeps track of task execution:
The default scheduler is the celery.beat.PersistentScheduler, that simply keeps track of the last run times in a local shelve database file.
In order to be able to dynamically manage periodic tasks and reschedule celerybeat at runtime:
There’s also the django-celery-beat extension that stores the schedule in the Django database, and presents a convenient admin interface to manage periodic tasks at runtime.
The tasks will be persisted in the Django database, and the schedule can be updated through the task model at the DB level. Whenever you update a periodic task, a counter in the periodic-tasks table is incremented, which tells the celery beat service to reload the schedule from the database.
A possible solution for you could be as follows:
import json

from django_celery_beat.models import PeriodicTask, IntervalSchedule

schedule = IntervalSchedule.objects.create(every=10, period=IntervalSchedule.SECONDS)
task = PeriodicTask.objects.create(interval=schedule, name='any name',
                                   task='tasks.my_task', args=json.dumps([66]))
views.py
def update_task_view(request, id):
    task = PeriodicTask.objects.get(name="task name")  # if we suppose names are unique
    task.args = json.dumps([id])
    task.save()
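Along the same lines, an existing entry can be paused or resumed at runtime without touching beat. A small sketch (it assumes the named task already exists in the table):
from django_celery_beat.models import PeriodicTask

task = PeriodicTask.objects.get(name="task name")
task.enabled = False   # saving bumps the change counter, so beat stops scheduling it
task.save()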

Celery-RabbitMQ Distributed Queue Test Message

I keep getting [ERROR/MainProcess] consumer: Cannot connect to amqp://ec2celeryuser when I run celery -A tasks worker in the terminal.
Basically, what I'm trying to do is get Celery/RabbitMQ working properly across two EC2 instances, so I can pass a trivial task in tasks.py to RabbitMQ for processing.
Instance 1 - Houses rabbitMQ
This currently runs RabbitMQ fine. If I run sudo rabbitmqctl status it outputs:
Status of node 'rabbit@ip-xx-xxx-xxx-xx' ...
[{pid,786},
Instance 2 - Houses Celery
I'm trying to run celery on instance 2 against Instance 1 using the following in terminal:
celery -A tasks worker
I have a file celeryconfig.py:
BROKER_URL = 'amqp://ec2celeryuser:mypasshere@xx.xxx.xx.xx:5672/celeryserver1/'
#CELERY SETTINGS
CELERY_IMPORTS = ("tasks",)
CELERY_RESULT_BACKEND = "amqp"
I have a file client.py:
from tasks import add
result = add.delay(4, 4) # call task
result_sum = result.get(timeout=5) # wait to get result for a maximum of 5 seconds
I have a file tasks.py:
from celery import Celery

app = Celery('tasks', broker='amqp://ec2celeryuser:mypasshere@xx.xxx.xx.xx:5672/celeryserver1/')

@app.task
def add(x, y):
    return x + y
I've properly setup a vhost, a user ec2celeryuser, and gave this user permissions of:
sudo rabbitmqctl set_permissions -p /celeryserver1 ec2celeryuser ".*" ".*" ".*"
if I do: sudo rabbitmqctl list_users on RabbitMQ (instance 1) it shows:
ec2celeryuser []
guest [administrator]
I've tried both usernames with their passwords, but no change.
I've been following the Celery Guide, and a tutorial without much luck.
What am I doing wrong here? Clearly there is a connection issue, but what am I doing wrong?
Thank you!
Thanks to user natdempk for helping me fix the configuration syntax of the queues.
The issue was creating a vhost in rabbitmq like:
sudo rabbitmqctl add_vhost /celeryserver1
when it should have been:
sudo rabbitmqctl add_vhost celeryserver1
I then had to reset the permissions for my user ec2celeryuser like:
sudo rabbitmqctl set_permissions -p celeryserver1 ec2celeryuser ".*" ".*" ".*"
The way I realized this was the issue: I visited /var/log/rabbitmq/<last log file.log>
and saw:
=INFO REPORT==== 30-Apr-2014::12:45:58 ===
accepted TCP connection on [::]:5672 from xx.xxx.xxx.xxx:45964
=INFO REPORT==== 30-Apr-2014::12:45:58 ===
starting TCP connection <x.xxx.x> from xx.xxx.xxx.xxx:45964
=ERROR REPORT==== 30-Apr-2014::12:46:01 ===
exception on TCP connection <x.xxx.x> from xx.xxx.xxx.xxx:45964
{channel0_error,opening,
{amqp_error,access_refused,
"access to vhost 'celeryserver1/' refused for user 'ec2celeryuser'",
'connection.open'}}
Since fixing the vhost, I now pleasantly see:
[2014-04-30 13:08:10,101: WARNING/MainProcess] celery@ip-xx-xxx-xx-xxx ready.
So I see a few things wrong here. First, your broker URL for RabbitMQ in tasks.py doesn't seem correct. It should read something like the below.
app = Celery('tasks', broker='amqp://ec2celeryuser:ec2celerypassword@xx.xxx.xx.xx/celeryserver1/')
Also you might want to specify the app you want celery to serve when you run the worker process. You can do this by running celery -A tasks worker from the directory tasks.py is located in.
Another thing is your code in client.py to call your task seems incorrect. From the celery documentation, you can call the task as follows:
from tasks import add
result = add.delay(4, 4) # call task
result_sum = result.get(timeout=5) # wait to get result for a maximum of 5 seconds
Fixing these might solve your issue, or at least get you closer.
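One quick way to verify a broker URL before involving Celery at all is to open a connection with kombu directly. A sketch, reusing the placeholder host and credentials from the question:
from kombu import Connection

# Note: no leading slash in the vhost segment, matching the fix above.
with Connection('amqp://ec2celeryuser:mypasshere@xx.xxx.xx.xx:5672/celeryserver1') as conn:
    conn.ensure_connection(max_retries=3)  # raises if the credentials or vhost are wrong
    print('Broker reachable:', conn.as_uri())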
