Celery Daily Scheduled Tasks with Crontab - python

I have a problem with daily scheduled tasks using crontab.
Here is my celery.py:
from celery.schedules import crontab

app.conf.beat_schedule = {
    'run-cache-updater': {
        'task': 'tasks.run_cache_updater',
        'schedule': crontab(
            minute=0,
            hour='1-4'
        ),
    }
}
Below is my tasks.py.
What I am doing there is getting all records from the DB, then triggering other jobs to update my caches on Redis.
@app.task
def run_cache_updater():
    batch_size = 1000
    cache_records = models.CacheRecords.objects.all()

    def _chunk_list(all_records, size_of_batch):
        for i in range(0, len(all_records), size_of_batch):
            yield [item.id for item in all_records[i: i + size_of_batch]]

    for items in _chunk_list(cache_records, batch_size):
        update_cache.delay(items)
@app.task
def update_cache(ids_in_chunks):
    for id in ids_in_chunks:
        # Some calls are done here. Then sleep for 200 ms.
        time.sleep(0.2)
My tasks run fine. However, they start running between 1 and 4 as expected, and then start again every 4 hours: 8-11, 15-18, and so on.
What am I doing wrong here, and how can I fix it?

This sounds like a Celery bug; it's probably worth raising on their GitHub repo.
However, as a workaround, you could try the more explicit notation, hour='1,2,3,4', in case the issue is in the parsing of that specific crontab interval style.
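For example, the schedule entry would then look like this (same task name as above):

from celery.schedules import crontab

app.conf.beat_schedule = {
    'run-cache-updater': {
        'task': 'tasks.run_cache_updater',
        # Explicit hour list instead of the '1-4' range, in case the
        # range syntax is what beat is mis-parsing.
        'schedule': crontab(minute=0, hour='1,2,3,4'),
    }
}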

Related

Django Celery Periodic task is not running on given crontab

I am using the below packages.
celery==5.1.2
Django==3.1
I have 2 periodic celery tasks, where I want the first task to run every 15 minutes and the second to run every 20 minutes. The problem is that the first task runs on time, while the second runs at random times.
Although I'm getting this message on the console on time for both tasks:
Scheduler: Sending due task <task_name> (<task_name>)
Please find the following files,
celery.py
from celery import Celery, Task

app = Celery('settings')
...

class PeriodicTask(Task):
    @classmethod
    def on_bound(cls, app):
        app.conf.beat_schedule[cls.name] = {
            "schedule": cls.run_every,
            "task": cls.name,
            "args": cls.args if hasattr(cls, "args") else (),
            "kwargs": cls.kwargs if hasattr(cls, "kwargs") else {},
            "options": cls.options if hasattr(cls, "options") else {}
        }
tasks.py
from celery.schedules import crontab
from settings.celery import app, PeriodicTask
...

@app.task(
    base=PeriodicTask,
    run_every=crontab(minute='*/15'),
    name='task1',
    options={'queue': 'queue_name'}
)
def task1():
    logger.info("task1 called")

@app.task(
    base=PeriodicTask,
    run_every=crontab(minute='*/20'),
    name='task2'
)
def task2():
    logger.info("task2 called")
Please help me to find the bug here. Thanks!
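For reference, a sketch of the static configuration the on_bound hook above should be producing; if beat runs both tasks on time with this but not with the hook, the dynamic registration is the likely culprit:

from celery.schedules import crontab

# Equivalent static beat_schedule, built by hand (same names as above).
app.conf.beat_schedule = {
    'task1': {
        'task': 'task1',
        'schedule': crontab(minute='*/15'),
        'options': {'queue': 'queue_name'},
    },
    'task2': {
        'task': 'task2',
        'schedule': crontab(minute='*/20'),
    },
}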

Run task from another periodic task with celery

I have a periodic task which should trigger another task. The expected behavior: the first task collects some data from an external service, then loops over this data (a list) and calls another task for each item, passing the current item as an argument. I want the tasks inside the loop to run asynchronously.
I wrote code that runs the first task periodically, but I can't figure out how this task should call the other task, because when I do it with the .delay() method, nothing happens.
Here is some simplified code that I want to run:
@celery_app.task(name="Hello World")
def hello_world():
    print("HELLO WORLD PRINT")
    add.delay(2, 2)
    return 'Hello'

@celery_app.task
def add(x, y):
    with open(f"./{str(datetime.datetime.now())}.txt", 'w') as file:
        file.write(str(x + y))
    print(f"x + y = {x + y}")
    return x + y
For now hello_world() runs every 30 seconds and, as a result, I see HELLO WORLD PRINT in the logs, but the add task does not run: I can see neither its print output nor the file it should create.
Update for a comment, here is how I use the queue:
celery_app.conf.task_routes = {
    "project.app.hello_world": {
        "queue": 'test_queue'
    },
    "project.app.add": {
        "queue": 'test_queue'
    },
}
There are a few ways to solve the problem.
The obvious one is to pass the queue name to .apply_async, for example add.apply_async((10, 10), queue="test_queue").
Another solution is to put the queue into the task decorator, i.e. @celery_app.task(queue="test_queue").
I have never configured task_routes, but I believe it is possible to specify it there like you tried...
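A quick sketch of the first two options, using the task from the question (note that positional arguments to apply_async go in a tuple):

# Option 1: name the queue at call time.
add.apply_async((10, 10), queue="test_queue")

# Option 2: pin the queue on the task itself.
@celery_app.task(queue="test_queue")
def add(x, y):
    return x + y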

Scheduling Django method with Celery

I have this method:
from urllib.request import urlopen
import json
import logging

def getExchangeRates():
    """ Here we have the function that will retrieve the latest rates from fixer.io """
    rates = {}
    response = urlopen('http://data.fixer.io/api/latest?access_key=c2f5070ad78b0748111281f6475c0bdd')
    data = response.read()
    rdata = json.loads(data.decode(), parse_float=float)
    rates_from_rdata = rdata.get('rates', {})
    for rate_symbol in ['USD', 'GBP', 'HKD', 'AUD', 'JPY', 'SEK', 'NOK']:
        try:
            rates[rate_symbol] = rates_from_rdata[rate_symbol]
        except KeyError:
            logging.warning('rate for {} not found in rdata'.format(rate_symbol))
    return rates
@require_http_methods(['GET', 'POST'])
def index(request):
    rates = getExchangeRates()
    fixerio_rates = [Fixerio_rates(currency=currency, rate=rate)
                     for currency, rate in rates.items()]
    Fixerio_rates.objects.bulk_create(fixerio_rates)
    return render(request, 'index.html')
I want to schedule this, let's say, for every day at 9am, except for weekends.
I haven't found a comprehensive tutorial on scheduling at such a specific datetime, and I don't know whether I can schedule this method directly, or need to create another method in my tasks file that wraps this one and runs at the specified times.
I do have the celery.py file in my project root, and the tasks.py file in my app folder.
Or, maybe celery isn't the way to go for this situation?
Any ideas?
There are some Django packages that let you manage cron-like jobs through the Django admin interface. In the past I used both django-chronograph and django-chroniker (https://github.com/chrisspen/django-chroniker). There is also django-cron (https://django-cron.readthedocs.io/en/latest/installation.html), but I never used it.
All of them take a similar approach: you create a single crontab entry running something like python manage.py runcrons every minute, and in your settings.py you add the package so it shows up in the admin.
Take a look at the documentation of either Chroniker or django-cron for more info on how to set it up.
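As a rough sketch of the django-cron approach for your 9am-except-weekends requirement (the class name and code string are made up; check the docs for the exact API):

from datetime import date
from django_cron import CronJobBase, Schedule

class FetchRatesCronJob(CronJobBase):
    RUN_AT_TIMES = ['09:00']
    schedule = Schedule(run_at_times=RUN_AT_TIMES)
    code = 'myapp.fetch_rates_cron'  # unique identifier for this job

    def do(self):
        if date.today().weekday() < 5:  # 5 and 6 are Saturday and Sunday
            getExchangeRates()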
Also, you can use Celery Beat to schedule the tasks that you need.
Tasks can be scheduled with Celery Beat.
Celery Beat must be launched as a separate process. This beat process kicks scheduled tasks over to a celery worker process, which runs them like any other celery asynchronous task. To orchestrate these two processes, it is usually a good idea to use something like supervisord in production and honcho in development.
The scheduled tasks can be defined in code, or stored in a database and managed through the Django admin with the django-celery-beat extension.
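If you go the database route, the wiring is roughly this (assuming your project module is named proj; see the django-celery-beat docs for details):

# settings.py
INSTALLED_APPS = [
    # ...
    'django_celery_beat',
]

# Then run the migrations and start beat with the database scheduler:
#   python manage.py migrate
#   celery -A proj beat --scheduler django_celery_beat.schedulers:DatabaseScheduler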
To add it in code, the easiest way is to create another method in the tasks.py file. For your requirement of every day at 9am except weekends, it could look like this:
@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Executes every Monday morning at 9 a.m.
    sender.add_periodic_task(
        crontab(hour=9, minute=0, day_of_week=1),
        test.s('Happy Mondays!'),
    )
    sender.add_periodic_task(
        crontab(hour=9, minute=0, day_of_week=2),
        test.s('Happy Tuesday!'),
    )
    sender.add_periodic_task(
        crontab(hour=9, minute=0, day_of_week=3),
        test.s('Happy Wednesday!'),
    )
    sender.add_periodic_task(
        crontab(hour=9, minute=0, day_of_week=4),
        test.s('Happy Thursday!'),
    )
    sender.add_periodic_task(
        crontab(hour=9, minute=0, day_of_week=5),
        test.s('Happy Friday!'),
    )

@app.task
def test(arg):
    print(arg)
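Since crontab fields accept ranges, the five calls above can also be collapsed into one; in Celery's crontab numbering, day_of_week 1-5 is Monday through Friday:

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Every weekday at 9 a.m.; Saturday and Sunday are excluded.
    sender.add_periodic_task(
        crontab(hour=9, minute=0, day_of_week='1-5'),
        test.s('Happy weekday!'),
    )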

Set up a while loop in celery task

I want to have a while loop running continuously in the background on my web server, while still being able to turn the loop on and off from Flask by sending a command to my celery worker. The while loop in the celery task seems to run only once.
from celery import Celery

app = Celery('tasks')

@app.task
def count(i):
    if i == 1:  # turn on command
        while True:  # a while loop to achieve what I want to do
            i = i + 1
        return i
    elif i == 0:  # turn off command given by flask
        return i
I also tried celery beat, but it requires the arguments to be given in advance rather than accepting a command from another source.
app.conf.update(
    CELERYBEAT_SCHEDULE={
        'add-every-1-seconds': {
            'task': 'tasks.count',
            'schedule': timedelta(seconds=1),
            # 'args': (1)
        },
    })
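One detail worth noting: if you ever uncomment the args line, it must be a tuple; (1) is just the integer 1 in parentheses. A corrected sketch of that entry:

CELERYBEAT_SCHEDULE = {
    'add-every-1-seconds': {
        'task': 'tasks.count',
        'schedule': timedelta(seconds=1),
        'args': (1,),  # one-element tuple; (1) is not a tuple
    },
}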
Thanks for @dim's answer. The code I have now is:
@app.task
def count(i):
    if i == 1:
        while True:  # a while loop to achieve what I want to do
            i = i + 1
            time.sleep(1)
            print(i)
            print('i am counting')
To start the worker:
$ celery -A tasks worker -l info
And call it from Python:
>>> from tasks import count
>>> result = count.delay(1)
To stop the loop from Python:
>>> result.revoke(terminate=True)
Hope this will be useful for people wanting to have a loop in their celery task.

Where do I register an rq-scheduler job in a Django app?

I'd like to use django_rq and rq-scheduler for offline tasks, but I'm unsure of where to call rq-scheduler's ability to schedule repeating tasks. Right now, I've added my scheduling code to a tasks.py module in my app and import that in __init__.py. There has to be a better way to do this, though, right?
Thanks in advance.
I created a custom management command which modifies and replaces the rqscheduler command included in django_rq. An example is provided here: https://github.com/rq/rq-scheduler/issues/51#issuecomment-362352497
The best place I've found to run it is from your AppConfig in apps.py.
def ready(self):
    scheduler = django_rq.get_scheduler('default')
    # Delete any existing jobs in the scheduler when the app starts up
    for job in scheduler.get_jobs():
        job.delete()
    # Have 'mytask' run every 5 minutes
    scheduler.schedule(datetime.utcnow(), 'mytask', interval=60*5)
I've added the scheduling to an __init__ module of one of my project's applications (in Django terms), but wrapped it in a small function which prevents queueing jobs twice or more. The scheduling strategy may depend on your specific needs (i.e. you may need additional checks on a job's arguments).
Code that works for me and fit my needs:
import django_rq
from collections import defaultdict
import tasks
scheduler = django_rq.get_scheduler('default')
jobs = scheduler.get_jobs()
functions = defaultdict(lambda: list())
map(lambda x: functions[x.func].append(x.meta.get('interval')), jobs)
now = datetime.datetime.now()
def schedule_once(func, interval):
"""
Schedule job once or reschedule when interval changes
"""
if not func in functions or not interval in functions[func]\
or len(functions[func])>1:
# clear all scheduled jobs for this function
map(scheduler.cancel, filter(lambda x: x.func==func, jobs))
# schedule with new interval
scheduler.schedule(now+datetime.timedelta(seconds=interval), func,
interval=interval)
schedule_once(tasks.some_task_a, interval=60*5)
schedule_once(tasks.some_task_b, interval=120)
Also, I've wrapped this snippet in a function to avoid imports at the package level:
def init_scheduler():
    # paste the initialization code here
    ...

init_scheduler()
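A reasonable place to call it, echoing the apps.py suggestion above, is AppConfig.ready (the module and class names here are placeholders):

# apps.py -- a sketch; MyAppConfig, 'myapp' and the scheduling module are placeholders
from django.apps import AppConfig

class MyAppConfig(AppConfig):
    name = 'myapp'

    def ready(self):
        from . import scheduling  # module that defines init_scheduler()
        scheduling.init_scheduler()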
You should use a Django management command to run the scheduled jobs (https://docs.djangoproject.com/en/3.2/howto/custom-management-commands/), like this:
import django_rq
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    def handle(self, *args, **options):
        scheduler = django_rq.get_scheduler('crontab_job')
        for job in scheduler.get_jobs():
            scheduler.cancel(job)
        # Scheduled job example 1
        scheduler.cron(
            "*/3 * * * *",  # "0 0 * * 0" runs weekly at midnight; "*/3 * * * *" (every 3 minutes) is handy for testing
            func=gong_an_job,  # Function to be queued
            kwargs={'msg': 'I am Wang Longfei 1, I like cultivating immortality', 'number': 1},  # Keyword arguments passed into the function when executed
            repeat=None,  # Repeat this number of times (None means repeat forever)
            queue_name='crontab_job',  # In which queue the job should be put
            use_local_timezone=False  # Interpret hours in the local timezone
        )
        # Scheduled job example 2
        scheduler.cron(
            "*/5 * * * *",
            func=gong_an_job,
            kwargs={'msg': 'I am Wang Longfei 222222, I like cultivating immortality', 'number': 22222},
            repeat=None,
            queue_name='crontab_job',
            use_local_timezone=False
        )
# create the crontab jobs
python manage.py rq_crontab_job
# check the crontab jobs and put them on the queue
python manage.py rqscheduler --queue crontab_job
# run the crontab jobs
python manage.py rqworker crontab_job
I think the first answer is great, but in a multi-process environment it may cause problems; you should run the command that creates the crontab jobs only once!
