Running background thread in GAE flexible environment with python-compat

I am working on migrating an existing Python GAE (Google App Engine) standard environment app to the flexible environment. I read through the guide and decided to try out the python-compat runtime, as it's always good to reuse as much code as possible.
In the standard environment app, we use background_thread.start_new_background_thread() to spawn a bunch of infinite-loop threads that do background work forever. However, I couldn't get start_new_background_thread working in the flexible environment, even for a really simple app like this sample:
github.com/GoogleCloudPlatform/python-docs-samples/tree/master/appengine/background
I keep getting the following error while running the app in the cloud (it works fine locally, though).
I debugged it using the Cloud Debugger, but no error message was available at the point where the exception was raised in background_thread.py.
Any idea how I can run a long-lived background thread in the flexible environment with the python-compat runtime? Thanks!

One of the differences between App Engine standard and App Engine flexible is that with Flex we're really just running a Docker container. I can think of two approaches to try.
1. Just use Python multiprocessing
App Engine standard enforces a sandbox that mostly means no direct use of threads or processes. With Flex, you should be able to just use the standard Python library to start a new subprocess:
https://docs.python.org/3/library/subprocess.html
2. Use supervisord and Docker
If that doesn't work, another approach is to customize the Docker image you're using in Flex and use supervisord to start multiple processes. First, generate the Dockerfile by cd-ing into the folder with your sources and running:
gcloud preview app gen-config --custom
This will create a Dockerfile that you can customize. Now you will want to start two processes: the process we were already starting (I think for python-compat it's gunicorn) and your background process. The easiest way to do that with Docker is to use supervisord:
https://docs.docker.com/engine/admin/using_supervisord/
After modifying your Dockerfile and adding a supervisord.conf, you can deploy your app as you normally would with gcloud preview app deploy.
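For the first approach, a minimal sketch of starting a long-running worker as a separate process with the standard library might look like the following. The helper names and the idea of keeping the worker in its own script file are assumptions made for illustration, not something the Flex runtime requires.

```python
import subprocess
import sys


def start_background_worker(script_path, *script_args):
    """Start script_path as a separate Python process and return immediately."""
    # sys.executable reuses the interpreter that is running the web app.
    return subprocess.Popen([sys.executable, script_path, *script_args])


def stop_background_worker(process, timeout=5):
    """Ask the worker to exit and wait for it; kill it if it does not stop in time."""
    process.terminate()
    try:
        process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        process.wait()
    return process.returncode
```

The app would call start_background_worker once at startup (with a hypothetical worker script path) and keep the returned Popen object around so the worker can be stopped on shutdown.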
Hope this helps!

I wish the documentation said that background_thread was not a supported API.
Anyway, I've found some hacks to help with some thread incompatibilities. App Engine uses os.environ to read a lot of settings. The "real" threads in your application will have a bunch of environment variables set there. The background threads you start will have none. One hack I've used is to copy some of the environment variables. For example, I needed to set the SERVER_SOFTWARE variable in the background threads in order to get the App Engine cloud storage library to work. We use something like:
import os
import threading
_global_server_software = None
_SERVER_SOFTWARE = 'SERVER_SOFTWARE'
def environ_wrapper(function, args):
    if _global_server_software is not None:
        os.environ[_SERVER_SOFTWARE] = _global_server_software
    function(*args)
def start_thread_with_app_engine_environ(function, *args):
    # HACK: Required for the cloudstorage API on Flexible environment threads to work
    # App Engine relies on a lot of environment variables to work correctly. New threads get none
    # of those variables. cloudstorage uses SERVER_SOFTWARE to determine if it is a test instance
    global _global_server_software
    if _global_server_software is None and os.environ.get(_SERVER_SOFTWARE) is not None:
        _global_server_software = os.environ[_SERVER_SOFTWARE]
    t = threading.Thread(target=environ_wrapper, args=(function, args))
    t.start()

Related

Why firebase_admin doesn't resolve when running multiprocesses

I have a Tornado app which is using python firebase_admin SDK.
When I run in single process:
console_server = tornado.httpserver.HTTPServer(ConsoleApplication())
console_server.listen(options.console_port, options.bind_addr)
tornado.ioloop.IOLoop.instance().start()
firebase_admin works fine. But when I change it to run in multiprocess:
console_server = tornado.httpserver.HTTPServer(ConsoleApplication())
console_server.bind(options.console_port, options.bind_addr)
console_server.start(4)
tornado.ioloop.IOLoop.instance().start()
The last line here is getting stuck:
if (not len(firebase_admin._apps)):
    cred = ...
    self.app = firebase_admin.initialize_app(cred)
self.app = firebase_admin.get_app()
self.db = firestore.client()
...
ref = self.db.document(USER_DOC.format(org, value))
user_ref = ref.get()
It seems that get() never resolves, and I don't get any exception.
Does anyone have an idea why this is happening, or at least how I can debug it?

The multiprocess fork (i.e. the start(4) call) must come very early in the life of your application. In particular, most things that touch the network must come after the fork (bind() is one of the few exceptions, and must come before the fork in this case).
You (probably) need to reorganize things so that you're creating the firebase app after the fork. This can be annoying if you're using the HTTPServer.start method, so you may want to switch to calling tornado.process.fork_processes() directly instead (this is documented as the "advanced multi-process" pattern).

I know it's an old question, but I want to share my experience with this issue to help future visitors.
I recently developed a script with multiprocessing that uses the Firebase Admin Python SDK. Everything worked fine on my local Windows machine, but when I deployed it to production on a Linux server, I noticed the script got stuck in the get() function.
After hours of searching, I found out that the default start method of a Python process differs between Windows and Unix environments: Windows uses spawn as the default start method, whereas Linux has traditionally used fork. You can learn more about start methods in the documentation.
So to make it work on my Linux server, I just changed the start method to spawn:
import multiprocessing
if __name__ == '__main__':
    multiprocessing.set_start_method('spawn') # <-- Set spawn as start_method
    # The rest of your script here
    # ...
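A variation on the same idea, sketched here under the assumption that you would rather not change the start method for the whole program: multiprocessing.get_context("spawn") returns a context object whose Pool and Process use spawn regardless of the platform default. The function names below are made up for the example.

```python
import multiprocessing


def make_spawn_context():
    """Return a multiprocessing context that always uses the 'spawn' start method."""
    return multiprocessing.get_context("spawn")


def square(n):
    # Must be a top-level function so that spawned children can import it.
    return n * n


def run_in_spawned_workers(values, processes=2):
    """Map square() over values in freshly spawned worker processes."""
    ctx = make_spawn_context()
    with ctx.Pool(processes=processes) as pool:
        return pool.map(square, values)
```

As with set_start_method, run_in_spawned_workers should be called from under an if __name__ == '__main__': guard in a script file, because spawned children re-import the main module.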

What is the best way to run python scripts in AWS?

I have three python scripts, 1.py, 2.py, and 3.py, each having 3 runtime arguments to be passed.
All three Python programs are independent of each other. All three may run sequentially in a batch, or only two of them may run, depending on some configuration.
Manual approach:
Create an EC2 instance, run the Python script, shut it down.
Repeat the above step for the next Python script.
The automated way would be to trigger the above process through Lambda and replicate it using some combination of services.
What is the best way to implement this in AWS?

AWS Batch has a DAG scheduler, so technically you could define job1, job2, and job3 and tell AWS Batch to run them in that order. But I wouldn't recommend that route.
For the above to work you would basically need to create three Docker images (image1, image2, image3) and put them in ECR (Docker Hub can also work if you're not using the Fargate launch type).
I don't think that makes sense unless each job is bulky and has its own runtime that's different from the others.
Instead I would write a Python program that calls 1.py, 2.py, and 3.py, put that in a Docker image, and run an AWS Batch job or just an ECS Fargate task.
main.py:
import subprocess
exit_code = subprocess.call("python3 /path/to/1.py", shell=True)
# decide whether you want to call 2.py and so on ...
# 1.py will see the same stdout and stderr as main.py
# with Batch and Fargate you can retrieve these from CloudWatch Logs ...
Now you have a Docker image that just needs to run somewhere. Fargate is fast to start up, a bit pricey, and has a 10 GB limit on temporary storage. AWS Batch is slow to start from a cold start, but can use spot instances in your account. You might need to make a custom AMI for AWS Batch to work, e.g. if you want more storage.
Note: for anyone who wants to scream at shell=True, both main.py and 1.py come from the same codebase. It's a batch job, not an internet-facing API that takes that string from a user request.
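To make the "decide if you want to call 2.py" step concrete, here is one possible sketch of such a main.py. It passes argument lists instead of a shell string and stops at the first failing script; the helper names and the shape of the configuration (a list of enabled scripts plus a dict of their three arguments) are assumptions for illustration.

```python
import subprocess
import sys


def build_commands(enabled_scripts, script_args):
    """Build one argument list per enabled script (script paths are hypothetical)."""
    return [[sys.executable, script, *script_args[script]] for script in enabled_scripts]


def run_in_sequence(commands):
    """Run each command in order and stop at the first non-zero exit code."""
    for command in commands:
        exit_code = subprocess.call(command)  # argument list, so no shell is involved
        if exit_code != 0:
            return exit_code
    return 0
```

A configuration that enables only two of the three scripts would simply leave the third out of enabled_scripts; the exit code returned by run_in_sequence can be passed to sys.exit so that Batch or ECS marks the task as failed.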

You can start your EC2 instance from a Python script using the AWS boto3 library (https://aws.amazon.com/sdk-for-python/). So a possible solution would be to trigger a Lambda function periodically (you can use Amazon CloudWatch for periodic events), and inside that function boot up your EC2 instance.
On the instance, you can configure the OS to run a Python script every time it boots. I would suggest using crontab (see https://www.instructables.com/id/Raspberry-Pi-Launch-Python-script-on-startup/).
At the end of your script, you can send an Amazon SQS event to a function that shuts down the first instance and then calls another function that starts the second script.

You could use meadowrun. Disclaimer: I am one of the maintainers, so obviously biased.
Meadowrun is a Python library/tool that manages EC2 instances for you, moves Python code and environment dependencies to them, and runs a function without any hassle.
For example, you could put your scripts in a Git repo and run them like so:
import asyncio
from meadowrun import AllocCloudInstance, Deployment, run_function
from script_1 import run_1
async def main():
    results = await run_function(
        # the function to run on the EC2 instance
        lambda: run_1(arguments),
        # properties of the VM that runs the function
        AllocCloudInstance(
            logical_cpu_required=2,
            memory_gb_required=16,
            interruption_probability_threshold=15,
            cloud_provider="EC2"),
        # code+env to deploy on the VM, there are other options here
        Deployment.git_repo(
            "https://github.com/someuser/somerepo",
            conda_yml_file="env.yml",
        )
    )
asyncio.run(main())
It will then create an EC2 instance with the given requirements for you (or reuse one if it's already there, which could be useful for running your scripts in sequence), set up the Python code and environment there, run the function, and return any results and output.

For 2022, depending on your infrastructure constraints, I'd say the easiest way would be to put the scripts on Lambda and then call them from CloudWatch with the required parameters (create a rule):
https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html
That way you can configure them to run independently or sequentially without having to worry about setting up the infrastructure and turning it on and off.
This applies to scripts that are not too resource intensive and that don't run for more than 15 minutes at a time (the Lambda time limit).

How to write python script to run automatically at 11:30 pm everyday? [duplicate]

I've been working on a web app using Django, and I'm curious if there is a way to schedule a job to run periodically.
Basically I just want to run through the database and make some calculations/updates on an automatic, regular basis, but I can't seem to find any documentation on doing this.
Does anyone know how to set this up?
To clarify: I know I can set up a cron job to do this, but I'm curious if there is some feature in Django that provides this functionality. I'd like people to be able to deploy this app themselves without having to do much config (preferably zero).
I've considered triggering these actions "retroactively" by simply checking if a job should have been run since the last time a request was sent to the site, but I'm hoping for something a bit cleaner.

One solution that I have employed is to do this:
1) Create a custom management command, e.g.
python manage.py my_cool_command
2) Use cron (on Linux) or at (on Windows) to run my command at the required times.
This is a simple solution that doesn't require installing a heavy AMQP stack. However there are nice advantages to using something like Celery, mentioned in the other answers. In particular, with Celery it is nice to not have to spread your application logic out into crontab files. However the cron solution works quite nicely for a small to medium sized application and where you don't want a lot of external dependencies.
EDIT:
In later versions of Windows (Windows 8, Server 2012 and above) the at command is deprecated. You can use schtasks.exe for the same purpose.
**** UPDATE ****
This is the new link to the Django docs for writing a custom management command

Celery is a distributed task queue, built on AMQP (RabbitMQ). It also handles periodic tasks in a cron-like fashion (see periodic tasks). Depending on your app, it might be worth a gander.
Celery is pretty easy to set up with django (docs), and periodic tasks will actually skip missed tasks in case of a downtime. Celery also has built-in retry mechanisms, in case a task fails.

We've open-sourced what I think is a structured app that Brian's solution above alludes to. We would love any and all feedback!
https://github.com/tivix/django-cron
It comes with one management command:
./manage.py runcrons
That does the job. Each cron is modeled as a class (so it's all OO), each cron runs at a different frequency, and we make sure the same cron type doesn't run in parallel (in case a cron takes longer to run than its frequency!).

If you're using a standard POSIX OS, you use cron.
If you're using Windows, you use at.
Write a Django management command to
Figure out what platform they're on.
Either execute the appropriate "AT" command for your users, or update the crontab for your users.

Interesting new pluggable Django app: django-chronograph
You only have to add one cron entry which acts as a timer, and you have a very nice Django admin interface into the scripts to run.

Look at Django Poor Man's Cron, which is a Django app that makes use of spambots, search engine indexing robots and the like to run scheduled tasks at approximately regular intervals.
See: http://code.google.com/p/django-poormanscron/

I had exactly the same requirement a while ago, and ended up solving it using APScheduler (User Guide)
It makes scheduling jobs super simple, and keeps it independent from request-based execution of some code. Following is a simple example.
from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()
job = None
def tick():
    print('One tick!')
def start_job():
    global job
    job = scheduler.add_job(tick, 'interval', seconds=3600)
    try:
        scheduler.start()
    except Exception:
        pass
Hope this helps somebody!

Django APScheduler for scheduled jobs. Advanced Python Scheduler (APScheduler) is a Python library that lets you schedule your Python code to be executed later, either just once or periodically. You can add new jobs or remove old ones on the fly as you please.
Note: I'm the author of this library.
Install APScheduler
pip install apscheduler
Write the function to call
File name: scheduler_jobs.py
def FirstCronTest():
    print("")
    print("I am executed..!")
Configure the scheduler
Create an execute.py file and add the code below:
from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()
Your own functions go here; in this example the scheduled functions are written in scheduler_jobs:
import scheduler_jobs
scheduler.add_job(scheduler_jobs.FirstCronTest, 'interval', seconds=10)
scheduler.start()
Link the file for execution
Now add the line below at the bottom of your URLs file:
import execute
You can check the full code here:
https://github.com/devchandansh/django-apscheduler

Brian Neal's suggestion of running management commands via cron works well, but if you're looking for something a little more robust (yet not as elaborate as Celery) I'd look into a library like Kronos:
# app/cron.py
import kronos
@kronos.register('0 * * * *')
def task():
    pass

RabbitMQ and Celery have more features and task handling capabilities than Cron. If task failure isn't an issue, and you think you will handle broken tasks in the next call, then Cron is sufficient.
Celery & AMQP will let you handle the broken task, and it will get executed again by another worker (Celery workers listen for the next task to work on), until the task's max_retries attribute is reached. You can even invoke tasks on failure, like logging the failure, or sending an email to the admin once the max_retries has been reached.
And you can distribute Celery and AMQP servers when you need to scale your application.

I personally use cron, but the Jobs Scheduling parts of django-extensions looks interesting.

Although not part of Django, Airflow is a more recent project (as of 2016) that is useful for task management.
Airflow is a workflow automation and scheduling system that can be used to author and manage data pipelines. A web-based UI provides the developer with a range of options for managing and viewing these pipelines.
Airflow is written in Python and is built using Flask.
Airflow was created by Maxime Beauchemin at Airbnb and open sourced in the spring of 2015. It joined the Apache Software Foundation’s incubation program in the winter of 2016. Here is the Git project page and some additional background information.

Put the following at the top of your cron.py file:
#!/usr/bin/python
import os, sys
sys.path.append('/path/to/') # the parent directory of the project
sys.path.append('/path/to/project') # these lines only needed if not on path
os.environ['DJANGO_SETTINGS_MODULE'] = 'myproj.settings'
# imports and code below

I just thought of this rather simple solution:
Define a view function do_work(req, param) like you would any other view, with a URL mapping, returning an HttpResponse and so on.
Set up a cron job with your timing preferences (or use AT or Scheduled Tasks on Windows) that runs curl http://localhost/your/mapped/url?param=value.
You can pass parameters just by adding them to the URL.
Tell me what you guys think.
[Update] I'm now using the runjob command from django-extensions instead of curl.
My cron looks something like this:
@hourly python /path/to/project/manage.py runjobs hourly
... and so on for daily, monthly, etc. You can also set it up to run a specific job.
I find it more manageable and cleaner. It doesn't require mapping a URL to a view. Just define your job class and crontab and you're set.

After this part of the code, I can write anything just like in my views.py :)
#######################################
import os, sys
sys.path.append('/home/administrator/development/store')
os.environ['DJANGO_SETTINGS_MODULE'] = 'store.settings'
from django.core.management import setup_environ
from store import settings
setup_environ(settings)
#######################################
From http://www.cotellese.net/2007/09/27/running-external-scripts-against-django-models/

You should definitely check out django-q!
It requires no additional configuration and has quite possibly everything needed to handle any production issues on commercial projects.
It's actively developed and integrates very well with django, django ORM, mongo, redis. Here is my configuration:
# django-q
# -------------------------------------------------------------------------
# See: http://django-q.readthedocs.io/en/latest/configure.html
Q_CLUSTER = {
    # Match recommended settings from docs.
    'name': 'DjangoORM',
    'workers': 4,
    'queue_limit': 50,
    'bulk': 10,
    'orm': 'default',
    # Custom Settings
    # ---------------
    # Limit the amount of successful tasks saved to Django.
    'save_limit': 10000,
    # See https://github.com/Koed00/django-q/issues/110.
    'catch_up': False,
    # Number of seconds a worker can spend on a task before it's terminated.
    'timeout': 60 * 5,
    # Number of seconds a broker will wait for a cluster to finish a task before presenting it again. This needs to be
    # longer than `timeout`, otherwise the same task will be processed multiple times.
    'retry': 60 * 6,
    # Whether to force all async() calls to be run with sync=True (making them synchronous).
    'sync': False,
    # Redirect worker exceptions directly to Sentry error reporter.
    'error_reporter': {
        'sentry': RAVEN_CONFIG,
    },
}

Yes, the methods above are great, and I tried some of them. In the end, I settled on a method like this:
from threading import Timer
INTERVAL = 60  # seconds between runs
def sync():
    # do something...
    sync_timer = Timer(INTERVAL, sync, ())
    sync_timer.start()
It works like recursion: each run schedules the next one.
OK, I hope this method meets your requirements. :)
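Since the question asks about running at 11:30 pm every day, here is a hedged sketch of how the Timer idea could be adapted: compute the delay until the next 23:30 and re-arm the timer after each run. The function names are made up for this example, it uses naive local time (so it ignores daylight-saving shifts), and the schedule is lost whenever the process restarts.

```python
from datetime import datetime, timedelta
from threading import Timer


def seconds_until(hour, minute, now=None):
    """Seconds from `now` until the next occurrence of hour:minute (local time)."""
    now = now or datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # today's slot has passed, use tomorrow's
    return (target - now).total_seconds()


def run_daily(hour, minute, job):
    """Run `job` at the next hour:minute, then re-arm the timer for the following day."""
    def fire():
        try:
            job()
        finally:
            run_daily(hour, minute, job)  # schedule the next run even if job() raised

    timer = Timer(seconds_until(hour, minute), fire)
    timer.daemon = True  # do not keep the process alive just for the timer
    timer.start()
    return timer
```

Calling run_daily(23, 30, some_function) once at startup would be enough; the returned Timer can be cancelled with its cancel() method before it fires.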

A more modern solution (compared to Celery) is Django Q:
https://django-q.readthedocs.io/en/latest/index.html
It has great documentation and is easy to grok. Windows support is lacking, because Windows does not support process forking. But it works fine if you create your dev environment using the Windows Subsystem for Linux.

I had a problem similar to yours today.
I didn't want to have it handled by the server through cron (and most of the libs were just cron helpers in the end).
So I created a scheduling module and attached it to the __init__.
It's not the best approach, but it helps me keep all the code in a single place, with its execution tied to the main app.

I use celery to create my periodical tasks. First you need to install it as follows:
pip install django-celery
Don't forget to register django-celery in your settings and then you could do something like this:
from celery import task
from celery.decorators import periodic_task
from celery.task.schedules import crontab
from celery.utils.log import get_task_logger
@periodic_task(run_every=crontab(minute="0", hour="23"))
def do_every_midnight():
    pass  # your code

I am not sure whether this will be useful for anyone. I had to let other users of the system schedule jobs without giving them access to the actual server's (Windows) Task Scheduler, so I created this reusable app.
Please note that users have access to one shared folder on the server where they can create the required command/task/.bat file. That task can then be scheduled using this app.
The app name is Django_Windows_Scheduler

If you want something more reliable than Celery, try TaskHawk which is built on top of AWS SQS/SNS.
Refer: http://taskhawk.readthedocs.io

For simple dockerized projects, I could not really see any existing answer fit.
So I wrote a very barebones solution without the need of external libraries or triggers, which runs on its own. No external os-cron needed, should work in every environment.
It works by adding a middleware: middleware.py
import threading
def should_run(name, seconds_interval):
    from application.models import CronJob
    from django.utils.timezone import now
    try:
        c = CronJob.objects.get(name=name)
    except CronJob.DoesNotExist:
        CronJob(name=name, last_ran=now()).save()
        return True
    if (now() - c.last_ran).total_seconds() >= seconds_interval:
        c.last_ran = now()
        c.save()
        return True
    return False
class CronTask:
    def __init__(self, name, seconds_interval, function):
        self.name = name
        self.seconds_interval = seconds_interval
        self.function = function
def cron_worker(*_):
    if not should_run("main", 60):
        return
    # customize this part:
    from application.models import Event
    tasks = [
        CronTask("events", 60 * 30, Event.clean_stale_objects),
        # ...
    ]
    for task in tasks:
        if should_run(task.name, task.seconds_interval):
            task.function()
def cron_middleware(get_response):
    def middleware(request):
        response = get_response(request)
        threading.Thread(target=cron_worker).start()
        return response
    return middleware
models/cron.py:
from django.db import models
class CronJob(models.Model):
    name = models.CharField(max_length=10, primary_key=True)
    last_ran = models.DateTimeField()
settings.py:
MIDDLEWARE = [
    ...
    'application.middleware.cron_middleware',
    ...
]
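The gating logic in should_run can be tried out without Django. The sketch below is an in-memory stand-in, not the answer's implementation: it keeps the last-run times in a dict instead of the CronJob table, and the IntervalGate name, the lock, and the injectable clock are all assumptions added for illustration. The lock only protects threads within one process; several server processes would still need the database (or another shared store) to coordinate.

```python
import threading
import time


class IntervalGate:
    """Decide whether a named job is due, based on when it last ran."""

    def __init__(self, clock=time.monotonic):
        self._last_ran = {}
        self._lock = threading.Lock()
        self._clock = clock  # injectable so the logic can be tested without waiting

    def should_run(self, name, seconds_interval):
        # Check and update under one lock so two threads cannot both claim the same run.
        with self._lock:
            now = self._clock()
            last = self._last_ran.get(name)
            if last is None or now - last >= seconds_interval:
                self._last_ran[name] = now
                return True
            return False
```

As in the middleware above, the first call for a given name returns True and records the time; later calls return True again only once the interval has elapsed.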

A simple way is to write a custom management command (see the Django documentation) and execute it using a cron job on Linux. However, I would highly recommend using a message broker like RabbitMQ coupled with Celery. Maybe you can have a look at this tutorial.

One alternative is to use Rocketry:
from rocketry import Rocketry
from rocketry.conds import daily, after_success
app = Rocketry()
@app.task(daily.at("10:00"))
def do_daily():
    ...
@app.task(after_success(do_daily))
def do_after_another():
    ...
if __name__ == "__main__":
    app.run()
It also supports custom conditions:
from pathlib import Path
@app.cond()
def file_exists(file):
    return Path(file).exists()
@app.task(daily & file_exists("myfile.csv"))
def do_custom():
    ...
And it also supports Cron:
from rocketry.conds import cron
@app.task(cron('*/2 12-18 * Oct Fri'))
def do_cron():
    ...
It can be integrated quite nicely with FastAPI, and I think it could be integrated with Django as well, as Rocketry is essentially just a sophisticated loop that can spawn async tasks, threads and processes.
Disclaimer: I'm the author.

Another option, similar to Brian Neal's answer, is to use RunScripts.
Then you don't need to set up commands. This has the advantage of a more flexible or cleaner folder structure.
This file must implement a run() function. This is what gets called when you run the script. You can import any models or other parts of your django project to use in these scripts.
And then, just
python manage.py runscript path.to.script

How to deal with environment variables and concurrency in Django

I'm working on a web application (using Django) that uses another piece of software to do some processing. This software needs its working directory to be set in an environment variable. When a client makes a request, the app creates the working directory (and the data to be used by the external software), then sets the environment variable used by the external software to the created directory. Finally, we call the external software and get the result.
Here's a summary of what the app is doing:
def request(data):
    path = create_working_directory(data)
    os.environ['WORKING_DIRECTORY'] = path
    result = call_the_external_software()
I haven't tested this yet (in reality it's not as simple as this example). I'm thinking of executing this function in a new process. Will I have problems when multiple clients make simultaneous requests? If so, what should I do to fix them?
PS: I can't change anything in the external program.

See https://docs.python.org/2/library/subprocess.html#subprocess.Popen. Note that Popen takes an "env" argument that you can use to define environment variables for the child process, so the parent's os.environ never has to change.
import os
import subprocess
def request(data):
    path = create_working_directory(data)
    # copy the parent's environment so PATH etc. remain available to the child
    env = dict(os.environ, WORKING_DIRECTORY=path)
    result = subprocess.call([ext_script] + ext_args, env=env)
    return result # presumably
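To see why this avoids the concurrency problem, here is a small self-contained sketch. The external software is replaced by a stand-in Python one-liner that prints the variable it received, and the helper name and paths are hypothetical. Each call gets its own copy of the environment, so simultaneous requests cannot overwrite each other's WORKING_DIRECTORY.

```python
import os
import subprocess
import sys


def run_with_working_directory(path, command):
    """Run command with WORKING_DIRECTORY=path set only for that child process."""
    child_env = dict(os.environ, WORKING_DIRECTORY=path)  # a copy; the parent is untouched
    completed = subprocess.run(
        command, env=child_env, capture_output=True, text=True, check=True
    )
    return completed.stdout


# Stand-in for the external software: it just prints the variable it received.
PRINT_WORKDIR = [sys.executable, "-c", "import os; print(os.environ['WORKING_DIRECTORY'])"]
```

Calling run_with_working_directory("/tmp/job-a", PRINT_WORKDIR) and then the same with "/tmp/job-b" gives each child its own value, while os.environ in the web process stays exactly as it was.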

Set up a scheduled job?

I've been working on a web app using Django, and I'm curious if there is a way to schedule a job to run periodically.
Basically I just want to run through the database and make some calculations/updates on an automatic, regular basis, but I can't seem to find any documentation on doing this.
Does anyone know how to set this up?
To clarify: I know I can set up a cron job to do this, but I'm curious if there is some feature in Django that provides this functionality. I'd like people to be able to deploy this app themselves without having to do much config (preferably zero).
I've considered triggering these actions "retroactively" by simply checking if a job should have been run since the last time a request was sent to the site, but I'm hoping for something a bit cleaner.

One solution that I have employed is to do this:
1) Create a custom management command, e.g.
python manage.py my_cool_command
2) Use cron (on Linux) or at (on Windows) to run my command at the required times.
This is a simple solution that doesn't require installing a heavy AMQP stack. However there are nice advantages to using something like Celery, mentioned in the other answers. In particular, with Celery it is nice to not have to spread your application logic out into crontab files. However the cron solution works quite nicely for a small to medium sized application and where you don't want a lot of external dependencies.
EDIT:
In later version of windows the at command is deprecated for Windows 8, Server 2012 and above. You can use schtasks.exe for same use.
**** UPDATE ****
This the new link of django doc for writing the custom management command

Celery is a distributed task queue, built on AMQP (RabbitMQ). It also handles periodic tasks in a cron-like fashion (see periodic tasks). Depending on your app, it might be worth a gander.
Celery is pretty easy to set up with django (docs), and periodic tasks will actually skip missed tasks in case of a downtime. Celery also has built-in retry mechanisms, in case a task fails.

We've open-sourced what I think is a structured app. that Brian's solution above alludes too. We would love any / all feedback!
https://github.com/tivix/django-cron
It comes with one management command:
./manage.py runcrons
That does the job. Each cron is modeled as a class (so its all OO) and each cron runs at a different frequency and we make sure the same cron type doesn't run in parallel (in case crons themselves take longer time to run than their frequency!)

If you're using a standard POSIX OS, you use cron.
If you're using Windows, you use at.
Write a Django management command to
Figure out what platform they're on.
Either execute the appropriate "AT" command for your users, or update the crontab for your users.

Interesting new pluggable Django app: django-chronograph
You only have to add one cron entry which acts as a timer, and you have a very nice Django admin interface into the scripts to run.

Look at Django Poor Man's Cron which is a Django app that makes use of spambots, search engine indexing robots and alike to run scheduled tasks in approximately regular intervals
See: http://code.google.com/p/django-poormanscron/

I had exactly the same requirement a while ago, and ended up solving it using APScheduler (User Guide)
It makes scheduling jobs super simple, and keeps it independent for from request-based execution of some code. Following is a simple example.
from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()
job = None
def tick():
print('One tick!')\
def start_job():
global job
job = scheduler.add_job(tick, 'interval', seconds=3600)
try:
scheduler.start()
except:
pass
Hope this helps somebody!

Django APScheduler for Scheduler Jobs. Advanced Python Scheduler (APScheduler) is a Python library that lets you schedule your Python code to be executed later, either just once or periodically. You can add new jobs or remove old ones on the fly as you please.
note: I'm the author of this library
Install APScheduler
pip install apscheduler
View file function to call
file name: scheduler_jobs.py
def FirstCronTest():
print("")
print("I am executed..!")
Configuring the scheduler
make execute.py file and add the below codes
from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()
Your written functions Here, the scheduler functions are written in scheduler_jobs
import scheduler_jobs
scheduler.add_job(scheduler_jobs.FirstCronTest, 'interval', seconds=10)
scheduler.start()
Link the File for Execution
Now, add the below line in the bottom of Url file
import execute
You can check the full code by executing
[Click here]
https://github.com/devchandansh/django-apscheduler

Brian Neal's suggestion of running management commands via cron works well, but if you're looking for something a little more robust (yet not as elaborate as Celery) I'd look into a library like Kronos:
# app/cron.py
import kronos
#kronos.register('0 * * * *')
def task():
pass

RabbitMQ and Celery have more features and task handling capabilities than Cron. If task failure isn't an issue, and you think you will handle broken tasks in the next call, then Cron is sufficient.
Celery & AMQP will let you handle the broken task, and it will get executed again by another worker (Celery workers listen for the next task to work on), until the task's max_retries attribute is reached. You can even invoke tasks on failure, like logging the failure, or sending an email to the admin once the max_retries has been reached.
And you can distribute Celery and AMQP servers when you need to scale your application.
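The retry behaviour described above can be illustrated without Celery at all. The following is a plain-Python sketch of the idea only, not Celery's API: `run_with_retries`, `on_failure`, and `flaky_task` are hypothetical names, and a real Celery worker would re-queue the task through the broker rather than loop in-process.

```python
def run_with_retries(task, max_retries, on_failure=None):
    """Call task() until it succeeds or max_retries retries are used up.

    Hypothetical helper for illustration; this is not part of Celery.
    """
    attempts = 0
    while True:
        try:
            return task()
        except Exception as exc:
            if attempts >= max_retries:
                # Retries exhausted: run the failure hook (for example,
                # log the error or notify an admin), then re-raise.
                if on_failure is not None:
                    on_failure(exc, attempts)
                raise
            attempts += 1


# Hypothetical task that fails twice before succeeding.
calls = {"count": 0}


def flaky_task():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "done"


result = run_with_retries(flaky_task, max_retries=3)
```

With plain cron there is no equivalent of the `on_failure` hook: a failed run is simply picked up again (or not) at the next scheduled invocation.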

I personally use cron, but the Jobs Scheduling part of django-extensions looks interesting.

Although not part of Django, Airflow is a more recent project (as of 2016) that is useful for task management.
Airflow is a workflow automation and scheduling system that can be used to author and manage data pipelines. A web-based UI provides the developer with a range of options for managing and viewing these pipelines.
Airflow is written in Python and is built using Flask.
Airflow was created by Maxime Beauchemin at Airbnb and open sourced in the spring of 2015. It joined the Apache Software Foundation’s incubation program in the winter of 2016. Here is the Git project page and some additional background information.

Put the following at the top of your cron.py file:
#!/usr/bin/python
import os, sys
sys.path.append('/path/to/') # the parent directory of the project
sys.path.append('/path/to/project') # these lines only needed if not on path
os.environ['DJANGO_SETTINGS_MODULE'] = 'myproj.settings'
# imports and code below

I just thought about this rather simple solution:
Define a view function do_work(req, param) like you would with any other view, with a URL mapping, returning an HttpResponse and so on.
Set up a cron job with your timing preferences (or use AT or Scheduled Tasks on Windows) that runs curl http://localhost/your/mapped/url?param=value.
You can pass parameters by just adding them to the URL.
Tell me what you guys think.
[Update] I'm now using the runjob command from django-extensions instead of curl.
My cron looks something like this:
@hourly python /path/to/project/manage.py runjobs hourly
... and so on for daily, monthly, etc. You can also set it up to run a specific job.
I find it more manageable and cleaner. It doesn't require mapping a URL to a view. Just define your job class and crontab and you're set.

After this piece of code, I can write anything just like in my views.py :)
#######################################
import os, sys
sys.path.append('/home/administrator/development/store')
os.environ['DJANGO_SETTINGS_MODULE'] = 'store.settings'
# Note: setup_environ was removed in Django 1.6; newer versions use django.setup() instead.
from django.core.management import setup_environ
from store import settings
setup_environ(settings)
#######################################
From:
http://www.cotellese.net/2007/09/27/running-external-scripts-against-django-models/

You should definitely check out django-q!
It requires no additional configuration and has quite possibly everything needed to handle any production issues on commercial projects.
It's actively developed and integrates very well with django, django ORM, mongo, redis. Here is my configuration:
# django-q
# -------------------------------------------------------------------------
# See: http://django-q.readthedocs.io/en/latest/configure.html
Q_CLUSTER = {
    # Match recommended settings from docs.
    'name': 'DjangoORM',
    'workers': 4,
    'queue_limit': 50,
    'bulk': 10,
    'orm': 'default',
    # Custom Settings
    # ---------------
    # Limit the amount of successful tasks saved to Django.
    'save_limit': 10000,
    # See https://github.com/Koed00/django-q/issues/110.
    'catch_up': False,
    # Number of seconds a worker can spend on a task before it's terminated.
    'timeout': 60 * 5,
    # Number of seconds a broker will wait for a cluster to finish a task before presenting it again. This needs to be
    # longer than `timeout`, otherwise the same task will be processed multiple times.
    'retry': 60 * 6,
    # Whether to force all async() calls to be run with sync=True (making them synchronous).
    'sync': False,
    # Redirect worker exceptions directly to Sentry error reporter.
    'error_reporter': {
        'sentry': RAVEN_CONFIG,
    },
}

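Yes, the methods above are great, and I tried some of them. In the end, I settled on a method like this:
from threading import Timer
INTERVAL = 60  # seconds between runs; placeholder value
def sync():
    # do something...
    # Re-arm the timer so that sync() runs again after INTERVAL seconds.
    sync_timer = Timer(INTERVAL, sync)
    sync_timer.start()
sync()  # the first call starts the cycle
It works like recursion: each run schedules the next one.
OK, I hope this method meets your requirements. :)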
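One thing the self-rescheduling Timer pattern lacks is a way to stop it. A minimal sketch of how that might be added follows; `RepeatingTimer` is a hypothetical helper class rather than anything from the standard library, which only provides the one-shot `threading.Timer`.

```python
import threading


class RepeatingTimer:
    """Re-arm a threading.Timer after each run (hypothetical helper)."""

    def __init__(self, interval, function):
        self.interval = interval
        self.function = function
        self._timer = None
        self._stopped = threading.Event()

    def _run(self):
        if self._stopped.is_set():
            return
        self.function()
        self._schedule()

    def _schedule(self):
        if self._stopped.is_set():
            return
        self._timer = threading.Timer(self.interval, self._run)
        # Daemon timers do not keep the process alive at shutdown.
        self._timer.daemon = True
        self._timer.start()

    def start(self):
        self._stopped.clear()
        self._schedule()

    def stop(self):
        # Prevent further re-arming and cancel the pending timer, if any.
        self._stopped.set()
        if self._timer is not None:
            self._timer.cancel()
```

Usage would look something like `timer = RepeatingTimer(3600, sync); timer.start()`, with `timer.stop()` called at shutdown. A run that is already in progress when `stop()` is called still finishes.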

A more modern solution (compared to Celery) is Django Q:
https://django-q.readthedocs.io/en/latest/index.html
It has great documentation and is easy to grok. Windows support is lacking, because Windows does not support process forking. But it works fine if you create your dev environment using the Windows Subsystem for Linux.

I ran into a problem similar to yours today.
I didn't want to have it handled by the server through cron (and most of the libs were just cron helpers in the end).
So I created a scheduling module and attached it to the __init__.
It's not the best approach, but it helps me to have all the code in a single place, with its execution tied to the main app.
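The answer does not show its scheduling module, so the following is only a guess at what such a module could look like: a single daemon thread started at import time. The names `start`, `stop`, and the thread name `app-scheduler` are all hypothetical.

```python
import logging
import threading

logger = logging.getLogger(__name__)
_lock = threading.Lock()
_stop = threading.Event()
_thread = None


def start(job, interval_seconds):
    """Start one daemon thread that calls job() every interval_seconds.

    Safe to call more than once: later calls return the existing thread.
    """
    global _thread
    with _lock:
        if _thread is not None and _thread.is_alive():
            return _thread
        _stop.clear()

        def loop():
            # Event.wait() returns False on timeout and True after stop().
            while not _stop.wait(interval_seconds):
                try:
                    job()
                except Exception:
                    # A failing job should not kill the scheduler thread.
                    logger.exception("scheduled job failed")

        _thread = threading.Thread(target=loop, name="app-scheduler", daemon=True)
        _thread.start()
        return _thread


def stop():
    """Ask the scheduler thread to exit and wait briefly for it."""
    _stop.set()
    if _thread is not None:
        _thread.join(timeout=1)
```

The app's `__init__` would then call something like `start(my_job, 3600)`. Note that when the server runs several worker processes, each process imports the package and therefore starts its own thread.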

I use Celery to create my periodic tasks. First you need to install it as follows:
pip install django-celery
Don't forget to register django-celery in your settings, and then you could do something like this:
from celery import task
from celery.decorators import periodic_task
from celery.task.schedules import crontab
from celery.utils.log import get_task_logger
# As written, this fires daily at 23:00; use hour="0" to run at midnight.
@periodic_task(run_every=crontab(minute="0", hour="23"))
def do_every_midnight():
    # your code
    pass

I am not sure whether this will be useful to anyone, but since I had to let other users of the system schedule jobs without giving them access to the actual server's (Windows) Task Scheduler, I created this reusable app.
Please note that users have access to one shared folder on the server, where they can create the required command/task/.bat file. This task can then be scheduled using this app.
The app name is Django_Windows_Scheduler.

If you want something more reliable than Celery, try TaskHawk which is built on top of AWS SQS/SNS.
Refer: http://taskhawk.readthedocs.io

For simple dockerized projects, I could not really see any existing answer fit.
So I wrote a very barebones solution without the need of external libraries or triggers, which runs on its own. No external os-cron needed, should work in every environment.
It works by adding a middleware: middleware.py
import threading
def should_run(name, seconds_interval):
    from application.models import CronJob
    from django.utils.timezone import now
    try:
        c = CronJob.objects.get(name=name)
    except CronJob.DoesNotExist:
        CronJob(name=name, last_ran=now()).save()
        return True
    if (now() - c.last_ran).total_seconds() >= seconds_interval:
        c.last_ran = now()
        c.save()
        return True
    return False
class CronTask:
    def __init__(self, name, seconds_interval, function):
        self.name = name
        self.seconds_interval = seconds_interval
        self.function = function
def cron_worker(*_):
    if not should_run("main", 60):
        return
    # customize this part:
    from application.models import Event
    tasks = [
        CronTask("events", 60 * 30, Event.clean_stale_objects),
        # ...
    ]
    for task in tasks:
        if should_run(task.name, task.seconds_interval):
            task.function()
def cron_middleware(get_response):
    def middleware(request):
        response = get_response(request)
        threading.Thread(target=cron_worker).start()
        return response
    return middleware
models/cron.py:
from django.db import models
class CronJob(models.Model):
    name = models.CharField(max_length=10, primary_key=True)
    last_ran = models.DateTimeField()
settings.py:
MIDDLEWARE = [
    ...
    'application.middleware.cron_middleware',
    ...
]
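The core of this approach is the interval gate in should_run. To make its behaviour easier to see, here is a sketch of the same logic with the CronJob table swapped for an in-memory dict; the `_last_ran` dict and the `now` parameter are assumptions added for illustration and testing, not part of the answer's code.

```python
import time

# In-memory stand-in for the CronJob table: name -> last run timestamp.
_last_ran = {}


def should_run(name, seconds_interval, now=None):
    """Return True if `name` has never run or its interval has elapsed."""
    now = time.time() if now is None else now
    last = _last_ran.get(name)
    if last is None or now - last >= seconds_interval:
        # Record the run time before returning, as the original does.
        _last_ran[name] = now
        return True
    return False
```

As in the database-backed version, the check and the update are two separate steps, so two requests arriving at the same moment could both pass the gate. Also note that nothing runs while the site receives no requests, since the worker is only triggered by the middleware.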

A simple way is to write a custom shell command (see the Django documentation) and execute it using a cron job on Linux. However, I would highly recommend using a message broker like RabbitMQ coupled with Celery. Maybe you can have a look at this tutorial.

One alternative is to use Rocketry:
from rocketry import Rocketry
from rocketry.conds import daily, after_success
app = Rocketry()
@app.task(daily.at("10:00"))
def do_daily():
    ...
@app.task(after_success(do_daily))
def do_after_another():
    ...
if __name__ == "__main__":
    app.run()
It also supports custom conditions:
from pathlib import Path
@app.cond()
def file_exists(file):
    return Path(file).exists()
@app.task(daily & file_exists("myfile.csv"))
def do_custom():
    ...
And it also supports cron:
from rocketry.conds import cron
@app.task(cron('*/2 12-18 * Oct Fri'))
def do_cron():
    ...
It can be integrated quite nicely with FastAPI, and I think it could be integrated with Django as well, as Rocketry is essentially just a sophisticated loop that can spawn async tasks, threads and processes.
Disclaimer: I'm the author.

Another option, similar to Brian Neal's answer, is to use RunScripts.
Then you don't need to set up commands. This has the advantage of more flexible or cleaner folder structures.
This file must implement a run() function. This is what gets called when you run the script. You can import any models or other parts of your django project to use in these scripts.
And then, just
python manage.py runscript path.to.script
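A script for runscript might look like the sketch below. The file name `scripts/cleanup.py`, the `dry-run` argument, and the commented-out model import are all hypothetical; the only part taken from the answer is that the file must expose a run() function.

```python
# scripts/cleanup.py  (hypothetical file name and location)


def run(*args):
    """Entry point that runscript calls when the script is executed.

    Values passed on the command line via --script-args arrive here
    as positional strings.
    """
    # Hypothetical option: treat a first argument of "dry-run" as a flag.
    dry_run = bool(args) and args[0] == "dry-run"

    # Models would normally be imported inside run(), for example:
    # from myapp.models import Event   # hypothetical model

    if dry_run:
        return "dry run: nothing changed"
    return "cleanup finished"
```

A crontab entry can then call `python manage.py runscript` with the script's name on whatever schedule is needed, in the same way as a management command.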
