Related
I am trying out celery and rabbitMQ combinations for asynchronous task scheduling, below is the sample program I tried on Pycharm IDE....Everything seems to be working but I am not able to see the return value from the tasks.
I am also monitoring the rabbitMQ management console , but still not able to see the return value from the tasks.
I dont understand where I am going wrong,this is my first attempt at celery and RabbitMQ
I have created a tasks.py file with 2 sample tasks(with proper decorators assigned) and returning a value for each task.
I have also started teh RabbitMQ server (using {rabbitmq-server start} command).
Then I have started the celery worker, command used : {celery -A tasks --loglevel=info}
Now, when I am trying to execute these tasks using delay() method, the command ({reverse.delay('andy')}) is running and I am getting , something like this, but I am not able to see the returned value.
from celery import Celery
app = Celery('tasks', broker= 'amqp://localhost//', backend='rpc://')
#app.task
def reverse(string):
return string[::-1]
#app.task
def add(x, y):
return x + y
Well, I have figured out the issue... It seams the latest versions of celery don't go well with windows. To fix this issue, I have installed 'eventlet' package and it takes care of the rest.
One thing to note is that, we need to start the celery worker using eventlet support, PFB the command:
celery -A <module_name> worker -loglevel=info -P eventlet
You can check the return value from task by using the result property. The syntax below is in python3 with type hinting so that you can follow along with what is going on.
import time
from celery.result import AsyncResult
result: AsyncResult = reverse.delay('andy')
task_id: str = result.id
while not result.ready():
time.sleep(1)
result = AsyncResult(task_id)
if result.successful():
return_value = result.result
n.b., the above is a naive example because in a production environment, you typically don't busy wait like we did here. Instead the client issues periodic polls checking whether the status is ready and, if not, returns control to some other portion of the code (like a UI with a spinner).
I've been working on a web app using Django, and I'm curious if there is a way to schedule a job to run periodically.
Basically I just want to run through the database and make some calculations/updates on an automatic, regular basis, but I can't seem to find any documentation on doing this.
Does anyone know how to set this up?
To clarify: I know I can set up a cron job to do this, but I'm curious if there is some feature in Django that provides this functionality. I'd like people to be able to deploy this app themselves without having to do much config (preferably zero).
I've considered triggering these actions "retroactively" by simply checking if a job should have been run since the last time a request was sent to the site, but I'm hoping for something a bit cleaner.
One solution that I have employed is to do this:
1) Create a custom management command, e.g.
python manage.py my_cool_command
2) Use cron (on Linux) or at (on Windows) to run my command at the required times.
This is a simple solution that doesn't require installing a heavy AMQP stack. However there are nice advantages to using something like Celery, mentioned in the other answers. In particular, with Celery it is nice to not have to spread your application logic out into crontab files. However the cron solution works quite nicely for a small to medium sized application and where you don't want a lot of external dependencies.
EDIT:
In later version of windows the at command is deprecated for Windows 8, Server 2012 and above. You can use schtasks.exe for same use.
**** UPDATE ****
This the new link of django doc for writing the custom management command
Celery is a distributed task queue, built on AMQP (RabbitMQ). It also handles periodic tasks in a cron-like fashion (see periodic tasks). Depending on your app, it might be worth a gander.
Celery is pretty easy to set up with django (docs), and periodic tasks will actually skip missed tasks in case of a downtime. Celery also has built-in retry mechanisms, in case a task fails.
We've open-sourced what I think is a structured app. that Brian's solution above alludes too. We would love any / all feedback!
https://github.com/tivix/django-cron
It comes with one management command:
./manage.py runcrons
That does the job. Each cron is modeled as a class (so its all OO) and each cron runs at a different frequency and we make sure the same cron type doesn't run in parallel (in case crons themselves take longer time to run than their frequency!)
If you're using a standard POSIX OS, you use cron.
If you're using Windows, you use at.
Write a Django management command to
Figure out what platform they're on.
Either execute the appropriate "AT" command for your users, or update the crontab for your users.
Interesting new pluggable Django app: django-chronograph
You only have to add one cron entry which acts as a timer, and you have a very nice Django admin interface into the scripts to run.
Look at Django Poor Man's Cron which is a Django app that makes use of spambots, search engine indexing robots and alike to run scheduled tasks in approximately regular intervals
See: http://code.google.com/p/django-poormanscron/
I had exactly the same requirement a while ago, and ended up solving it using APScheduler (User Guide)
It makes scheduling jobs super simple, and keeps it independent for from request-based execution of some code. Following is a simple example.
from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()
job = None
def tick():
print('One tick!')\
def start_job():
global job
job = scheduler.add_job(tick, 'interval', seconds=3600)
try:
scheduler.start()
except:
pass
Hope this helps somebody!
Django APScheduler for Scheduler Jobs. Advanced Python Scheduler (APScheduler) is a Python library that lets you schedule your Python code to be executed later, either just once or periodically. You can add new jobs or remove old ones on the fly as you please.
note: I'm the author of this library
Install APScheduler
pip install apscheduler
View file function to call
file name: scheduler_jobs.py
def FirstCronTest():
print("")
print("I am executed..!")
Configuring the scheduler
make execute.py file and add the below codes
from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()
Your written functions Here, the scheduler functions are written in scheduler_jobs
import scheduler_jobs
scheduler.add_job(scheduler_jobs.FirstCronTest, 'interval', seconds=10)
scheduler.start()
Link the File for Execution
Now, add the below line in the bottom of Url file
import execute
You can check the full code by executing
[Click here]
https://github.com/devchandansh/django-apscheduler
Brian Neal's suggestion of running management commands via cron works well, but if you're looking for something a little more robust (yet not as elaborate as Celery) I'd look into a library like Kronos:
# app/cron.py
import kronos
#kronos.register('0 * * * *')
def task():
pass
RabbitMQ and Celery have more features and task handling capabilities than Cron. If task failure isn't an issue, and you think you will handle broken tasks in the next call, then Cron is sufficient.
Celery & AMQP will let you handle the broken task, and it will get executed again by another worker (Celery workers listen for the next task to work on), until the task's max_retries attribute is reached. You can even invoke tasks on failure, like logging the failure, or sending an email to the admin once the max_retries has been reached.
And you can distribute Celery and AMQP servers when you need to scale your application.
I personally use cron, but the Jobs Scheduling parts of django-extensions looks interesting.
Although not part of Django, Airflow is a more recent project (as of 2016) that is useful for task management.
Airflow is a workflow automation and scheduling system that can be used to author and manage data pipelines. A web-based UI provides the developer with a range of options for managing and viewing these pipelines.
Airflow is written in Python and is built using Flask.
Airflow was created by Maxime Beauchemin at Airbnb and open sourced in the spring of 2015. It joined the Apache Software Foundation’s incubation program in the winter of 2016. Here is the Git project page and some addition background information.
Put the following at the top of your cron.py file:
#!/usr/bin/python
import os, sys
sys.path.append('/path/to/') # the parent directory of the project
sys.path.append('/path/to/project') # these lines only needed if not on path
os.environ['DJANGO_SETTINGS_MODULE'] = 'myproj.settings'
# imports and code below
I just thought about this rather simple solution:
Define a view function do_work(req, param) like you would with any other view, with URL mapping, return a HttpResponse and so on.
Set up a cron job with your timing preferences (or using AT or Scheduled Tasks in Windows) which runs curl http://localhost/your/mapped/url?param=value.
You can add parameters but just adding parameters to the URL.
Tell me what you guys think.
[Update] I'm now using runjob command from django-extensions instead of curl.
My cron looks something like this:
#hourly python /path/to/project/manage.py runjobs hourly
... and so on for daily, monthly, etc'. You can also set it up to run a specific job.
I find it more managable and a cleaner. Doesn't require mapping a URL to a view. Just define your job class and crontab and you're set.
after the part of code,I can write anything just like my views.py :)
#######################################
import os,sys
sys.path.append('/home/administrator/development/store')
os.environ['DJANGO_SETTINGS_MODULE']='store.settings'
from django.core.management impor setup_environ
from store import settings
setup_environ(settings)
#######################################
from
http://www.cotellese.net/2007/09/27/running-external-scripts-against-django-models/
You should definitely check out django-q!
It requires no additional configuration and has quite possibly everything needed to handle any production issues on commercial projects.
It's actively developed and integrates very well with django, django ORM, mongo, redis. Here is my configuration:
# django-q
# -------------------------------------------------------------------------
# See: http://django-q.readthedocs.io/en/latest/configure.html
Q_CLUSTER = {
# Match recommended settings from docs.
'name': 'DjangoORM',
'workers': 4,
'queue_limit': 50,
'bulk': 10,
'orm': 'default',
# Custom Settings
# ---------------
# Limit the amount of successful tasks saved to Django.
'save_limit': 10000,
# See https://github.com/Koed00/django-q/issues/110.
'catch_up': False,
# Number of seconds a worker can spend on a task before it's terminated.
'timeout': 60 * 5,
# Number of seconds a broker will wait for a cluster to finish a task before presenting it again. This needs to be
# longer than `timeout`, otherwise the same task will be processed multiple times.
'retry': 60 * 6,
# Whether to force all async() calls to be run with sync=True (making them synchronous).
'sync': False,
# Redirect worker exceptions directly to Sentry error reporter.
'error_reporter': {
'sentry': RAVEN_CONFIG,
},
}
Yes, the method above is so great. And I tried some of them. At last, I found a method like this:
from threading import Timer
def sync():
do something...
sync_timer = Timer(self.interval, sync, ())
sync_timer.start()
Just like Recursive.
Ok, I hope this method can meet your requirement. :)
A more modern solution (compared to Celery) is Django Q:
https://django-q.readthedocs.io/en/latest/index.html
It has great documentation and is easy to grok. Windows support is lacking, because Windows does not support process forking. But it works fine if you create your dev environment using the Windows for Linux Subsystem.
I had something similar with your problem today.
I didn't wanted to have it handled by the server trhough cron (and most of the libs were just cron helpers in the end).
So i've created a scheduling module and attached it to the init .
It's not the best approach, but it helps me to have all the code in a single place and with its execution related to the main app.
I use celery to create my periodical tasks. First you need to install it as follows:
pip install django-celery
Don't forget to register django-celery in your settings and then you could do something like this:
from celery import task
from celery.decorators import periodic_task
from celery.task.schedules import crontab
from celery.utils.log import get_task_logger
#periodic_task(run_every=crontab(minute="0", hour="23"))
def do_every_midnight():
#your code
I am not sure will this be useful for anyone, since I had to provide other users of the system to schedule the jobs, without giving them access to the actual server(windows) Task Scheduler, I created this reusable app.
Please note users have access to one shared folder on server where they can create required command/task/.bat file. This task then can be scheduled using this app.
App name is Django_Windows_Scheduler
ScreenShot:
If you want something more reliable than Celery, try TaskHawk which is built on top of AWS SQS/SNS.
Refer: http://taskhawk.readthedocs.io
For simple dockerized projects, I could not really see any existing answer fit.
So I wrote a very barebones solution without the need of external libraries or triggers, which runs on its own. No external os-cron needed, should work in every environment.
It works by adding a middleware: middleware.py
import threading
def should_run(name, seconds_interval):
from application.models import CronJob
from django.utils.timezone import now
try:
c = CronJob.objects.get(name=name)
except CronJob.DoesNotExist:
CronJob(name=name, last_ran=now()).save()
return True
if (now() - c.last_ran).total_seconds() >= seconds_interval:
c.last_ran = now()
c.save()
return True
return False
class CronTask:
def __init__(self, name, seconds_interval, function):
self.name = name
self.seconds_interval = seconds_interval
self.function = function
def cron_worker(*_):
if not should_run("main", 60):
return
# customize this part:
from application.models import Event
tasks = [
CronTask("events", 60 * 30, Event.clean_stale_objects),
# ...
]
for task in tasks:
if should_run(task.name, task.seconds_interval):
task.function()
def cron_middleware(get_response):
def middleware(request):
response = get_response(request)
threading.Thread(target=cron_worker).start()
return response
return middleware
models/cron.py:
from django.db import models
class CronJob(models.Model):
name = models.CharField(max_length=10, primary_key=True)
last_ran = models.DateTimeField()
settings.py:
MIDDLEWARE = [
...
'application.middleware.cron_middleware',
...
]
Simple way is to write a custom shell command see Django Documentation and execute it using a cronjob on linux. However i would highly recommend using a message broker like RabbitMQ coupled with celery. Maybe you can have a look at
this Tutorial
One alternative is to use Rocketry:
from rocketry import Rocketry
from rocketry.conds import daily, after_success
app = Rocketry()
#app.task(daily.at("10:00"))
def do_daily():
...
#app.task(after_success(do_daily))
def do_after_another():
...
if __name__ == "__main__":
app.run()
It also supports custom conditions:
from pathlib import Path
#app.cond()
def file_exists(file):
return Path(file).exists()
#app.task(daily & file_exists("myfile.csv"))
def do_custom():
...
And it also supports Cron:
from rocketry.conds import cron
#app.task(cron('*/2 12-18 * Oct Fri'))
def do_cron():
...
It can be integrated quite nicely with FastAPI and I think it could be integrated with Django as well as Rocketry is essentially just a sophisticated loop that can spawn, async tasks, threads and processes.
Disclaimer: I'm the author.
Another option, similar to Brian Neal's answer it to use RunScripts
Then you don't need to set up commands. This has the advantage of more flexible or cleaner folder structures.
This file must implement a run() function. This is what gets called when you run the script. You can import any models or other parts of your django project to use in these scripts.
And then, just
python manage.py runscript path.to.script
I have a Django application that I'm running on Docker. I'm trying to launch an APScheduler scheduler when I run the docker container.
I created a scheduler and I simply added it a job that I called test1, and that sends an email to my address.
This is the Python script that is launched when I run the container.
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.schedulers.background import BackgroundScheduler
#scheduler = BlockingScheduler()
scheduler = BackgroundScheduler()
def test1():
... (code to send email)
scheduler.add_job(test1, 'interval', seconds = 20)
scheduler.start()
This is the results I obtained with each of the two kind of schedulers:
BlockingScheduler: the scheduler works, I receive an email every 20 seconds. However I can't access the app. I presume this is normal due to the very nature of the BlockingScheduler.
screenshot1
screenshot2
BackgroundScheduler: no problem to access the application. However, I receive no email.
Since the emails were sent in one of the two cases I guess the problem is neither Django nor Docker related, but purely about APScheduler. I did my research but I couldn't find why the BackgroundScheduler didn't work as in the tutorials I read, the developper set up the scheduler the same way I did.
Any help would be much appreciated, thanks!
UPDATE 1
I tried the two following things, both made the BackgroundScheduler behave like a BlockingScheduler (which is not what I want)
1) Setting the daemon option to False when initialising the scheduler instance:
scheduler = BackgroundScheduler(daemon = False)
2) "Trying to keep the main thread alive", as explained in these:
how-do-i-schedule-an-interval-job-with-apscheduler
apscheduler-inside-a-class-object
I added this right after scheduler.starts():
while True:
time.sleep(1)
scheduler.shutdown()
UPDATE 2
When I try to setup a BackgroundScheduler in a single Python file (outside of any application context), it works very well:
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.schedulers.blocking import BlockingScheduler
def test1():
print('issou')
scheduler = BackgroundScheduler()
scheduler.start()
scheduler.add_job(test1, 'interval', seconds=5)
print('yatangaki')
'yatangaki' is first printed, and then 'issou' every 5 seconds, so everything seems fine.
UPDATE 3
Now I've tried to start the scheduler on a Django app that I ran locally with python manage.py runserver, without using Docker.
It works perfectly: the emails are sent and I can access the main view of the application.
Note: the BackgroundScheduler is started by a function called start_test1. In this app, I run start_test1 in the top-level urls.py file. On the other app - the one that I run with Docker, which is the one I want to use in the end - start_test1 is started in a Python script, that is itself triggered in a .sh file, which I run via the CMD Docker command.
It appears it was all about where to start the scheduler and to add the job.
In what I did initially (putting the code in a .sh file), the BackgroundScheduler started but the Python script immediately ended after being ran, as it didn't have a blocking behaviour and the sh. file wasn't really part of the app (it's used by the Dockerfile, not by the app).
I ended up finding the solution here:
execute-code-when-django-starts-once-only
There was no apps.py file inside my application so I created one and followed the instruction in this thread.
It works fine now.
I have a similar problem but it got solved only by not blocking the worker:
# working:
def worker():
phone_elm = ....
thread = threading.Thread(target=work, args=(phone_elm,))
thread.start()
# not working:
def worker():
phone_elm = ....
work(phone_elm)
scheduler2 = BackgroundScheduler(timezone="Asia/Kolkata")
# schedule scanning running of folder's running file
scheduler2.add_job(worker, 'interval', seconds=15, max_instances=5000)
scheduler1.start()
I mean the not working, is that after 10 ~ time it triggered it was stopped for no reason, started again after 1 of the 10 stopped (work exited)
It is also part of Django, but it is clear that start is not exiting...
I have an application written in Web2Py that contains some modules. I need to call some functions out of a module on a periodic basis, say once daily. I have been trying to get a scheduler working for that purpose but am not sure how to get it working properly. I have referred to this and this to get started.
I have got a scheduler.py class in the models directory, which contains code like this:
from gluon.scheduler import Scheduler
from Module1 import Module1
def daily_task():
module1 = Module1()
module1.action1(arg1, arg2, arg3)
daily_task_scheduler = Scheduler(db, tasks=dict(my_daily_task=daily_task))
In default.py I have following code for the scheduler:
def daily_periodic_task():
daily_task_scheduler.queue_task('daily_running_task', repeats=0, period=60)
[for testing I am running it after 60 seconds, otherwise for daily I plan to use period=86400]
In my Module1.py class, I have this kind of code:
def action1(self, arg1, arg2, arg3):
for row in db().select(db.table1.ALL):
row.processed = 'processed'
row.update_record()
One of the issues I am facing is that I don't understand clearly how to make this scheduler work to automatically handle the execution of action1 on daily basis.
When I launch my application using syntax similar to: python web2py.py -K my_app it shows this in the console:
web2py Web Framework
Created by Massimo Di Pierro, Copyright 2007-2015
Version 2.11.2-stable+timestamp.2015.05.30.16.33.24
Database drivers available: sqlite3, imaplib, pyodbc, pymysql, pg8000
starting single-scheduler for "my_app"...
However, when I see the browser at:
http://127.0.0.1:8000/my_app/default/daily_periodic_task
I just see "None" as text displayed on the screen and I don't see any changes produced by the scheduled task in my database table.
While when I see the browser at:
http://127.0.0.1:8000/my_app/default/index
I get an error stating This web page is not available, basically indicating my application never got started.
When I start my application normally using python web2py.py my application loads fine but I don't see any changes produced by the scheduled task in my database table.
I am unable to figure out what I am doing wrong here and how to properly use the scheduler with Web2Py. Basically, I need to know how can I start my application normally alongwith the scheduled tasks properly running in background.
Any help in this regard would be highly appreciated.
Running python web2py.py starts the built-in web server, enabling web2py to respond to HTTP requests (i.e., serving web pages to a browser). This has nothing to do with the scheduler and will not result in any scheduled tasks being run.
To run scheduled tasks, you must start one or more background workers via:
python web2py.py -K myapp
The above does not start the built-in web server and therefore does not enable you to visit web pages. It simply starts a worker process that will be available to execute scheduled tasks.
Also, note that the above does not actually result in any tasks being scheduled. To schedule a task, you must insert a record in the db.scheduler_task table, which you can do via any of the usual methods of inserting records (including using appadmin) or programmatically via the scheduler.queue_task method (which is what you use in your daily_periodic_task action).
Note, you can simultaneously start the built-in web server and a scheduler worker process via:
python web2py.py -a yourpassword -K myapp -X
So, to schedule a daily task and have it actually executed, you need to (a) start a scheduler worker and (b) schedule the task. You can schedule the task by visiting your daily_periodic_task action, but note that you only need to visit that action once, as once the task has been scheduled, it remains in effect indefinitely (given that you have set repeats=0).
If the task does not appear to be working, it is possible there is something wrong with the task itself that is resulting in an error.
I've been working on a web app using Django, and I'm curious if there is a way to schedule a job to run periodically.
Basically I just want to run through the database and make some calculations/updates on an automatic, regular basis, but I can't seem to find any documentation on doing this.
Does anyone know how to set this up?
To clarify: I know I can set up a cron job to do this, but I'm curious if there is some feature in Django that provides this functionality. I'd like people to be able to deploy this app themselves without having to do much config (preferably zero).
I've considered triggering these actions "retroactively" by simply checking if a job should have been run since the last time a request was sent to the site, but I'm hoping for something a bit cleaner.
One solution that I have employed is to do this:
1) Create a custom management command, e.g.
python manage.py my_cool_command
2) Use cron (on Linux) or at (on Windows) to run my command at the required times.
This is a simple solution that doesn't require installing a heavy AMQP stack. However there are nice advantages to using something like Celery, mentioned in the other answers. In particular, with Celery it is nice to not have to spread your application logic out into crontab files. However the cron solution works quite nicely for a small to medium sized application and where you don't want a lot of external dependencies.
EDIT:
In later version of windows the at command is deprecated for Windows 8, Server 2012 and above. You can use schtasks.exe for same use.
**** UPDATE ****
This the new link of django doc for writing the custom management command
Celery is a distributed task queue, built on AMQP (RabbitMQ). It also handles periodic tasks in a cron-like fashion (see periodic tasks). Depending on your app, it might be worth a gander.
Celery is pretty easy to set up with django (docs), and periodic tasks will actually skip missed tasks in case of a downtime. Celery also has built-in retry mechanisms, in case a task fails.
We've open-sourced what I think is a structured app. that Brian's solution above alludes too. We would love any / all feedback!
https://github.com/tivix/django-cron
It comes with one management command:
./manage.py runcrons
That does the job. Each cron is modeled as a class (so its all OO) and each cron runs at a different frequency and we make sure the same cron type doesn't run in parallel (in case crons themselves take longer time to run than their frequency!)
If you're using a standard POSIX OS, you use cron.
If you're using Windows, you use at.
Write a Django management command to
Figure out what platform they're on.
Either execute the appropriate "AT" command for your users, or update the crontab for your users.
Interesting new pluggable Django app: django-chronograph
You only have to add one cron entry which acts as a timer, and you have a very nice Django admin interface into the scripts to run.
Look at Django Poor Man's Cron which is a Django app that makes use of spambots, search engine indexing robots and alike to run scheduled tasks in approximately regular intervals
See: http://code.google.com/p/django-poormanscron/
I had exactly the same requirement a while ago, and ended up solving it using APScheduler (User Guide)
It makes scheduling jobs super simple, and keeps it independent for from request-based execution of some code. Following is a simple example.
from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()
job = None
def tick():
print('One tick!')\
def start_job():
global job
job = scheduler.add_job(tick, 'interval', seconds=3600)
try:
scheduler.start()
except:
pass
Hope this helps somebody!
Django APScheduler for Scheduler Jobs. Advanced Python Scheduler (APScheduler) is a Python library that lets you schedule your Python code to be executed later, either just once or periodically. You can add new jobs or remove old ones on the fly as you please.
note: I'm the author of this library
Install APScheduler
pip install apscheduler
View file function to call
file name: scheduler_jobs.py
def FirstCronTest():
print("")
print("I am executed..!")
Configuring the scheduler
make execute.py file and add the below codes
from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()
Your written functions Here, the scheduler functions are written in scheduler_jobs
import scheduler_jobs
scheduler.add_job(scheduler_jobs.FirstCronTest, 'interval', seconds=10)
scheduler.start()
Link the File for Execution
Now, add the below line in the bottom of Url file
import execute
You can check the full code by executing
[Click here]
https://github.com/devchandansh/django-apscheduler
Brian Neal's suggestion of running management commands via cron works well, but if you're looking for something a little more robust (yet not as elaborate as Celery) I'd look into a library like Kronos:
# app/cron.py
import kronos
#kronos.register('0 * * * *')
def task():
pass
RabbitMQ and Celery have more features and task handling capabilities than Cron. If task failure isn't an issue, and you think you will handle broken tasks in the next call, then Cron is sufficient.
Celery & AMQP will let you handle the broken task, and it will get executed again by another worker (Celery workers listen for the next task to work on), until the task's max_retries attribute is reached. You can even invoke tasks on failure, like logging the failure, or sending an email to the admin once the max_retries has been reached.
And you can distribute Celery and AMQP servers when you need to scale your application.
I personally use cron, but the Jobs Scheduling parts of django-extensions looks interesting.
Although not part of Django, Airflow is a more recent project (as of 2016) that is useful for task management.
Airflow is a workflow automation and scheduling system that can be used to author and manage data pipelines. A web-based UI provides the developer with a range of options for managing and viewing these pipelines.
Airflow is written in Python and is built using Flask.
Airflow was created by Maxime Beauchemin at Airbnb and open sourced in the spring of 2015. It joined the Apache Software Foundation’s incubation program in the winter of 2016. Here is the Git project page and some addition background information.
Put the following at the top of your cron.py file:
#!/usr/bin/python
import os, sys
sys.path.append('/path/to/') # the parent directory of the project
sys.path.append('/path/to/project') # these lines only needed if not on path
os.environ['DJANGO_SETTINGS_MODULE'] = 'myproj.settings'
# imports and code below
I just thought about this rather simple solution:
Define a view function do_work(req, param) like you would with any other view, with URL mapping, return a HttpResponse and so on.
Set up a cron job with your timing preferences (or using AT or Scheduled Tasks in Windows) which runs curl http://localhost/your/mapped/url?param=value.
You can add parameters but just adding parameters to the URL.
Tell me what you guys think.
[Update] I'm now using runjob command from django-extensions instead of curl.
My cron looks something like this:
#hourly python /path/to/project/manage.py runjobs hourly
... and so on for daily, monthly, etc'. You can also set it up to run a specific job.
I find it more managable and a cleaner. Doesn't require mapping a URL to a view. Just define your job class and crontab and you're set.
after the part of code,I can write anything just like my views.py :)
#######################################
import os,sys
sys.path.append('/home/administrator/development/store')
os.environ['DJANGO_SETTINGS_MODULE']='store.settings'
from django.core.management impor setup_environ
from store import settings
setup_environ(settings)
#######################################
from
http://www.cotellese.net/2007/09/27/running-external-scripts-against-django-models/
You should definitely check out django-q!
It requires no additional configuration and has quite possibly everything needed to handle any production issues on commercial projects.
It's actively developed and integrates very well with django, django ORM, mongo, redis. Here is my configuration:
# django-q
# -------------------------------------------------------------------------
# See: http://django-q.readthedocs.io/en/latest/configure.html
Q_CLUSTER = {
# Match recommended settings from docs.
'name': 'DjangoORM',
'workers': 4,
'queue_limit': 50,
'bulk': 10,
'orm': 'default',
# Custom Settings
# ---------------
# Limit the amount of successful tasks saved to Django.
'save_limit': 10000,
# See https://github.com/Koed00/django-q/issues/110.
'catch_up': False,
# Number of seconds a worker can spend on a task before it's terminated.
'timeout': 60 * 5,
# Number of seconds a broker will wait for a cluster to finish a task before presenting it again. This needs to be
# longer than `timeout`, otherwise the same task will be processed multiple times.
'retry': 60 * 6,
# Whether to force all async() calls to be run with sync=True (making them synchronous).
'sync': False,
# Redirect worker exceptions directly to Sentry error reporter.
'error_reporter': {
'sentry': RAVEN_CONFIG,
},
}
Yes, the method above is so great. And I tried some of them. At last, I found a method like this:
from threading import Timer
def sync():
do something...
sync_timer = Timer(self.interval, sync, ())
sync_timer.start()
Just like Recursive.
Ok, I hope this method can meet your requirement. :)
A more modern solution (compared to Celery) is Django Q:
https://django-q.readthedocs.io/en/latest/index.html
It has great documentation and is easy to grok. Windows support is lacking, because Windows does not support process forking. But it works fine if you create your dev environment using the Windows for Linux Subsystem.
I had something similar with your problem today.
I didn't wanted to have it handled by the server trhough cron (and most of the libs were just cron helpers in the end).
So i've created a scheduling module and attached it to the init .
It's not the best approach, but it helps me to have all the code in a single place and with its execution related to the main app.
I use celery to create my periodical tasks. First you need to install it as follows:
pip install django-celery
Don't forget to register django-celery in your settings and then you could do something like this:
from celery import task
from celery.decorators import periodic_task
from celery.task.schedules import crontab
from celery.utils.log import get_task_logger
#periodic_task(run_every=crontab(minute="0", hour="23"))
def do_every_midnight():
#your code
I am not sure will this be useful for anyone, since I had to provide other users of the system to schedule the jobs, without giving them access to the actual server(windows) Task Scheduler, I created this reusable app.
Please note users have access to one shared folder on server where they can create required command/task/.bat file. This task then can be scheduled using this app.
App name is Django_Windows_Scheduler
ScreenShot:
If you want something more reliable than Celery, try TaskHawk which is built on top of AWS SQS/SNS.
Refer: http://taskhawk.readthedocs.io
For simple dockerized projects, I could not really see any existing answer fit.
So I wrote a very barebones solution without the need of external libraries or triggers, which runs on its own. No external os-cron needed, should work in every environment.
It works by adding a middleware: middleware.py
import threading
def should_run(name, seconds_interval):
from application.models import CronJob
from django.utils.timezone import now
try:
c = CronJob.objects.get(name=name)
except CronJob.DoesNotExist:
CronJob(name=name, last_ran=now()).save()
return True
if (now() - c.last_ran).total_seconds() >= seconds_interval:
c.last_ran = now()
c.save()
return True
return False
class CronTask:
def __init__(self, name, seconds_interval, function):
self.name = name
self.seconds_interval = seconds_interval
self.function = function
def cron_worker(*_):
if not should_run("main", 60):
return
# customize this part:
from application.models import Event
tasks = [
CronTask("events", 60 * 30, Event.clean_stale_objects),
# ...
]
for task in tasks:
if should_run(task.name, task.seconds_interval):
task.function()
def cron_middleware(get_response):
def middleware(request):
response = get_response(request)
threading.Thread(target=cron_worker).start()
return response
return middleware
models/cron.py:
from django.db import models
class CronJob(models.Model):
name = models.CharField(max_length=10, primary_key=True)
last_ran = models.DateTimeField()
settings.py:
MIDDLEWARE = [
...
'application.middleware.cron_middleware',
...
]
Simple way is to write a custom shell command see Django Documentation and execute it using a cronjob on linux. However i would highly recommend using a message broker like RabbitMQ coupled with celery. Maybe you can have a look at
this Tutorial
One alternative is to use Rocketry:
from rocketry import Rocketry
from rocketry.conds import daily, after_success
app = Rocketry()
#app.task(daily.at("10:00"))
def do_daily():
...
#app.task(after_success(do_daily))
def do_after_another():
...
if __name__ == "__main__":
app.run()
It also supports custom conditions:
from pathlib import Path
#app.cond()
def file_exists(file):
return Path(file).exists()
#app.task(daily & file_exists("myfile.csv"))
def do_custom():
...
And it also supports Cron:
from rocketry.conds import cron
#app.task(cron('*/2 12-18 * Oct Fri'))
def do_cron():
...
It can be integrated quite nicely with FastAPI and I think it could be integrated with Django as well as Rocketry is essentially just a sophisticated loop that can spawn, async tasks, threads and processes.
Disclaimer: I'm the author.
Another option, similar to Brian Neal's answer it to use RunScripts
Then you don't need to set up commands. This has the advantage of more flexible or cleaner folder structures.
This file must implement a run() function. This is what gets called when you run the script. You can import any models or other parts of your django project to use in these scripts.
And then, just
python manage.py runscript path.to.script