Restarting the jobs in APScheduler (Python) when the WSGI server is restarted - python

I'm using Python APScheduler to schedule my jobs. All my jobs are stored as cron jobs and use the BackgroundScheduler. I have the following code:
def letschedule():
    jobstores = {
        'default': SQLAlchemyJobStore(url=app_jobs_store)
    }
    executors = {
        'default': ThreadPoolExecutor(20),
        'processpool': ProcessPoolExecutor(5)
    }
    job_defaults = {
        'coalesce': False,
        'max_instances': 1,
        'misfire_grace_time': 1200
    }
    scheduler = BackgroundScheduler(jobstores=jobstores, executors=executors,
                                    job_defaults=job_defaults, timezone=utc)
    return scheduler
And I start the job scheduler as follows in the app:
sch = letschedule()
sch.start()
log.info('the scheduler started')
And I have the following function for adding jobs:
def addjobs():
    jobs = []
    try:
        sch.add_job(forecast_jobs, 'cron', day_of_week=os.environ.get("FORECAST_WEEKOFDAY"),
                    id="forecast", replace_existing=False, week='1-53',
                    hour=os.environ.get("FORECAST_HOUR"),
                    minute=os.environ.get("FORECAST_MINUTE"), timezone='UTC')
        jobs.append({'job_id': 'forecast', 'type': 'weekly'})
        log.info('the forecast added to the scheduler')
    except BaseException as e:
        log.info(e)
    try:
        sch.add_job(insertcwhstock, 'cron',
                    id="cwhstock_data", day_of_week='0-6', replace_existing=False,
                    hour=os.environ.get("CWHSTOCK_HOUR"),
                    minute=os.environ.get("CWHSTOCK_MINUTE"),
                    week='1-53', timezone='UTC')
        jobs.append({'job_id': 'cwhstock_data', 'type': 'daily'})
        log.info('the cwhstock job added to the scheduler')
    except BaseException as e:
        log.info(e)
    return json.dumps({'data': jobs})
I use this in the Flask application: when I call /activatejobs the jobs are added to the scheduler and everything works fine. However, when I restart the WSGI server the jobs aren't started again; I have to remove the .sqlite file and add the jobs again. What I want is for the jobs to be restarted automatically once the scheduler is started (if there are already jobs in the database).
I've tried a few ways to get this result, but couldn't. Any help would be greatly appreciated. Thanks in advance.

I also had the same problem using the FastAPI framework. I was able to solve it after adding this code to my app.py:
scheduler = BackgroundScheduler()
pg_job_store = SQLAlchemyJobStore(engine=my_engine)
scheduler.add_jobstore(jobstore=pg_job_store, alias='sqlalchemy')
scheduler.start()
After adding this code and restarting the application server, I could see APScheduler logs looking for jobs:
2021-10-20 14:37:53,433 - apscheduler.scheduler - INFO => Scheduler started
2021-10-20 14:37:53,433 - apscheduler.scheduler - DEBUG => Looking for jobs to run
Jobstore default:
No scheduled jobs
Jobstore sqlalchemy:
remove_from_db_job (trigger: date[2021-10-20 14:38:00 -03], next run at: 2021-10-20 14:38:00 -03)
2021-10-20 14:37:53,443 - apscheduler.scheduler - DEBUG => Next wakeup is due at 2021-10-20 14:38:00-03:00 (in 6.565892 seconds)
It works for me.
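For the Flask setup in the original question, the same pattern should carry over: create the scheduler, attach the persistent jobstore explicitly, and only then start it, so that jobs already saved in the .sqlite file are loaded again on every restart. A minimal sketch, assuming the same app_jobs_store URL and log object from the question:
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore

sch = BackgroundScheduler(timezone='UTC')
# Attach the jobstore explicitly; any jobs already persisted in it are
# restored when the scheduler starts.
sch.add_jobstore(SQLAlchemyJobStore(url=app_jobs_store), alias='default')
sch.start()
log.info('the scheduler started with the persistent jobstore attached')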

Related

APScheduler does not run scheduled tasks: Flask + uWSGI

I have a Flask + uWSGI application with a jobstore in SQLite. I start the scheduler along with the application, and add new tasks through add_job when some URL is visited.
I can see that the tasks are saved correctly in the jobstore and I can view them through the API, but they do not execute at the appointed time.
A few important details:
uwsgi.ini
processes = 1
enable-threads = true
__init__.py
scheduler = APScheduler()
scheduler.init_app(app)
with app.app_context():
    scheduler.start()
main.py
scheduler.add_job(
    id='{}{}'.format(test.id, g.user.id),
    func=pay_day,
    args=[test.id, g.user.id],
    trigger='interval',
    minutes=test.timer
)
in service.py
def pay_day(tid, uid):
    with scheduler.app.app_context():
        *some code here*
Interesting behavior: if you create a task by going to the URL and restart the application after that, the task will be executed. But if the application is running and one of the users creates a task by going to the URL, then this task will not be completed until the application is restarted.
I don't get any errors or exceptions, even in the scheduler logs.
I'm out of ideas on how to make it work and don't know what I did wrong. I need a hint.
uWSGI employs some tricks which disable the Global Interpreter Lock and with it, the use of threads which are vital to the operation of APScheduler. To fix this, you need to re-enable the GIL using the --enable-threads switch. See the uWSGI documentation for more details.
I know that you had enable-threads = true in uwsgi.ini, but try enabling it from the command line as well.
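For example, the switch can be passed directly when launching uWSGI (a sketch, assuming your existing uwsgi.ini is in the working directory):
uwsgi --ini uwsgi.ini --enable-threads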

Celery jobs not running on Heroku (Python/Django app)

I have a Django app setup with some scheduled tasks. The app is deployed on Heroku with Redis. The task runs if invoked synchronously in the console, or locally when I also have redis and celery running. However, the scheduled jobs are not running on Heroku.
My task:
@shared_task(name="send_emails")
def send_emails():
    .....
celery.py:
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery
from celery.schedules import crontab
# set the default Django settings module for the 'celery' program.
# this is also used in manage.py
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'my_app.settings')
# Get the base REDIS URL, default to redis' default
BASE_REDIS_URL = os.environ.get('REDIS_URL', 'redis://localhost:6379')
app = Celery('my_app')
# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
# should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')
# Load task modules from all registered Django app configs.
app.autodiscover_tasks()
app.conf.broker_url = BASE_REDIS_URL
# this allows you to schedule items in the Django admin.
app.conf.beat_scheduler = 'django_celery_beat.schedulers.DatabaseScheduler'
# These are the scheduled jobs
app.conf.beat_schedule = {
    'send_emails_crontab': {
        'task': 'send_emails',
        'schedule': crontab(hour=9, minute=0),
        'args': (),
    }
}
In Procfile:
worker: celery -A my_app worker --beat -S django -l info
I've spun up the worker with heroku ps:scale worker=1 -a my-app.
I can see the registered tasks under [tasks] in the worker logs.
However, the scheduled tasks are not running at their scheduled time. Calling send_emails.delay() in the production console does work.
How do I get the worker to stay alive and / or run the job at the scheduled time?
I have a workaround using a command and heroku scheduler. Just unsure if that's the best way to do it.
If you're on the free plan, you should know that Heroku dynos sleep, and if your scheduled task becomes due while your dyno is sleeping, it won't run.
I'll share a few ideas.
Run the console and check the dyno's datetime; the dyno uses US local time.
A free dyno sleeps after 30 minutes of inactivity and only gets 450 hours per month.
Try switching from Celery to APScheduler; you need to add a clock.py script like this:
from myapp import myfunction
from apscheduler.schedulers.blocking import BlockingScheduler
import os

sched = BlockingScheduler()
hour = int(os.environ.get("SEARCH_HOUR"))
minutes = int(os.environ.get("SEARCH_MINUTES"))

@sched.scheduled_job('cron', day_of_week='mon-sun', hour=hour, minute=minutes)
def scheduled_job():
    print('This job executes myfunction every day at', hour, ':', minutes)
    # my function
    myfunction()

sched.start()
In Procfile:
clock: python clock.py
and run:
heroku ps:scale clock=1 --app thenameapp
Regards.

Jobs are not being stored in the database after add_job() has been called

I'm trying to use Flask-APScheduler to run some delayed jobs. After setup and running the app, it successfully creates the jobs table in the Postgres database, but it does not save jobs there after add_job() has been successfully called.
config.py
(I tried using the default tableschema and tablename, but still no jobs were stored.)
SCHEDULER_TIMEZONE = utc
SCHEDULER_API_ENABLED = False
SCHEDULER_JOBSTORES = {
    'default': SQLAlchemyJobStore(
        url=SQLALCHEMY_DATABASE_URI,
        tableschema=DB_SCHEMA,
        tablename='jobs'
    )
}
SCHEDULER_EXECUTORS = {
    'default': ThreadPoolExecutor(20),
    'processpool': ProcessPoolExecutor(5)
}
SCHEDULER_JOB_DEFAULTS = {
    'coalesce': False,
    'max_instances': 3
}
The code that adds a job
(Both log messages are displayed and the except body is never reached. I also tried explicitly setting jobstore='default', but the issue is the same.)
date = datetime.strptime(date_string, DATETIME_FORMAT)
try:
    app.logger.info('adding job')
    scheduler.add_job(
        job_id,
        publish_post,
        name=job_name,
        trigger='date',
        next_run_time=date,
        kwargs={
            'channel': channel,
            'post': post
        }
    )
    app.logger.info('job added')
except Exception as e:
    app.logger.error(e)
    sentry.captureException()
    return {
        'status': 'FAILED',
        'message': 'Can not add job ({})'.format(job_name)
    }
job.py
(Test job to run)
def publish_post(channel, post):
    app.logger.info('{}: {}'.format(channel, post.text))
UPD:
With DEBUG mode on, at startup I have the following messages:
gunicorn_scheduler_1 | 2019-04-04 07:07:21 - INFO - Scheduler started
gunicorn_scheduler_1 | 2019-04-04 07:07:21 - DEBUG - No jobs; waiting until a job is added
but after the add_job() call, between 'adding job' and 'job added', I get an additional message from APScheduler:
2019-04-04 07:09:46 - INFO - Adding job tentatively -- it will be properly scheduled when the scheduler starts
But the first message says that the scheduler has already started.
I had the same problem, and I solved it by using one line to set up the database.
Example:
# You don't need the jobstores parameter here:
# scheduler = BlockingScheduler(jobstores=jobstores, executors=executors, job_defaults=job_defaults, timezone='Etc/GMT+3')
scheduler = BlockingScheduler(executors=executors, job_defaults=job_defaults, timezone='Etc/GMT+3')
# Use this line to set up your database:
scheduler.add_jobstore('sqlalchemy', url='sqlite:///jobs.sqlite')
That's it; your jobs will start to be saved in the database.
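Applied to the question's setup, the same one-line jobstore registration might look like the sketch below. It uses a plain BackgroundScheduler, as the answer above does, and reuses the SCHEDULER_EXECUTORS, SCHEDULER_JOB_DEFAULTS, SQLALCHEMY_DATABASE_URI and DB_SCHEMA names from the question's config.py; treat it as an illustration rather than a drop-in replacement:
from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler(executors=SCHEDULER_EXECUTORS,
                                job_defaults=SCHEDULER_JOB_DEFAULTS,
                                timezone='UTC')
# Register the persistent jobstore in a separate call; jobs added later are
# written to the Postgres table and reloaded when the scheduler restarts.
scheduler.add_jobstore('sqlalchemy',
                       url=SQLALCHEMY_DATABASE_URI,
                       tableschema=DB_SCHEMA,
                       tablename='jobs')
scheduler.start()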

Celery multi workers unexpected task execution order

I run celery:
celery multi start --app=myapp fast_worker slow_worker \
    -Q:fast_worker fast-queue \
    -Q:slow_worker slow-queue \
    -c:fast_worker 1 -c:slow_worker 1 \
    --logfile=%n.log --pidfile=%n.pid
And celerybeat:
celery beat -A myapp
Task:
#task.periodic_task(run_every=timedelta(seconds=5), ignore_result=True)
def test_log_task_queue():
import time
time.sleep(10)
print "test_log_task_queue"
Routing:
CELERY_ROUTES = {
'myapp.tasks.test_log_task_queue': {
'queue': 'slow-queue',
'routing_key': 'slow-queue',
},
}
I use RabbitMQ. When I open the RabbitMQ admin panel, I see that my tasks are in slow-queue, but when I open the logs I see task output from both workers. Why do both workers execute my tasks, even when a task is not in that worker's queue?
It looks like celery multi creates something like shared queues. To fix this problem, I added the -X option:
celery multi start --app=myapp fast_worker slow_worker \
    -Q:fast_worker fast-queue \
    -Q:slow_worker slow-queue \
    -X:fast_worker slow-queue \
    -X:slow_worker fast-queue \
    -c:fast_worker 1 -c:slow_worker 1 \
    --logfile=%n.log --pidfile=%n.pid
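If celery multi keeps getting in the way, an equivalent setup is to start the two workers as separate processes, each bound to exactly one queue with -Q. A sketch, assuming the same myapp module and queue names:
celery -A myapp worker -n fast_worker -Q fast-queue -c 1 --logfile=fast_worker.log
celery -A myapp worker -n slow_worker -Q slow-queue -c 1 --logfile=slow_worker.log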

Setting up periodic tasks in Celery (celerybeat) dynamically using add_periodic_task

I'm using Celery 4.0.1 with Django 1.10 and I'm having trouble scheduling tasks (running a task works fine). Here is the Celery configuration:
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myapp.settings')
app = Celery('myapp')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
app.conf.BROKER_URL = 'amqp://{}:{}@{}'.format(settings.AMQP_USER, settings.AMQP_PASSWORD, settings.AMQP_HOST)
app.conf.CELERY_DEFAULT_EXCHANGE = 'myapp.celery'
app.conf.CELERY_DEFAULT_QUEUE = 'myapp.celery_default'
app.conf.CELERY_TASK_SERIALIZER = 'json'
app.conf.CELERY_ACCEPT_CONTENT = ['json']
app.conf.CELERY_IGNORE_RESULT = True
app.conf.CELERY_DISABLE_RATE_LIMITS = True
app.conf.BROKER_POOL_LIMIT = 2
app.conf.CELERY_QUEUES = (
    Queue('myapp.celery_default'),
    Queue('myapp.queue1'),
    Queue('myapp.queue2'),
    Queue('myapp.queue3'),
)
Then in tasks.py I have:
@app.task(queue='myapp.queue1')
def my_task(some_id):
    print("Doing something with", some_id)
In views.py I want to schedule this task:
def my_view(request, id):
    app.add_periodic_task(10, my_task.s(id))
Then I execute the commands:
sudo systemctl start rabbitmq.service
celery -A myapp.celery_app beat -l debug
celery worker -A myapp.celery_app
But the task is never scheduled. I don't see anything in the logs. The task is working because if in my view I do:
def my_view(request, id):
    my_task.delay(id)
The task is executed.
If I schedule the task manually in my configuration file, like this, it works:
app.conf.CELERYBEAT_SCHEDULE = {
    'add-every-30-seconds': {
        'task': 'tasks.my_task',
        'schedule': 10.0,
        'args': (66,)
    },
}
I just can't schedule the task dynamically. Any idea?
EDIT (13/01/2018): The latest release, 4.1.0, has addressed this subject in ticket #3958, which has been merged.
Actually, you cannot define a periodic task at the view level, because the beat schedule setting is loaded first and cannot be rescheduled at runtime:
The add_periodic_task() function will add the entry to the beat_schedule setting behind the scenes, and the same setting can also be used to set up periodic tasks manually:
app.conf.CELERYBEAT_SCHEDULE = {
    'add-every-30-seconds': {
        'task': 'tasks.my_task',
        'schedule': 10.0,
        'args': (66,)
    },
}
which means that if you want to use add_periodic_task(), it should be wrapped within an on_after_configure handler at the Celery app level, and any modification at runtime will not take effect:
app = Celery()

@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(10, my_task.s(66))
As mentioned in the docs, the regular celerybeat simply keeps track of task execution:
The default scheduler is the celery.beat.PersistentScheduler, that simply keeps track of the last run times in a local shelve database file.
In order to be able to dynamically manage periodic tasks and reschedule celerybeat at runtime:
There’s also the django-celery-beat extension that stores the schedule in the Django database, and presents a convenient admin interface to manage periodic tasks at runtime.
The tasks will be persisted in the Django database, and the schedule can be updated via the task model at the DB level. Whenever you update a periodic task, a counter in the tasks table is incremented, which tells the celery beat service to reload the schedule from the database.
A possible solution for you could be as follow:
import json

from django_celery_beat.models import PeriodicTask, IntervalSchedule

schedule = IntervalSchedule.objects.create(every=10, period=IntervalSchedule.SECONDS)
task = PeriodicTask.objects.create(interval=schedule, name='any name',
                                   task='tasks.my_task', args=json.dumps([66]))
views.py
def update_task_view(request, id):
    task = PeriodicTask.objects.get(name="task name")  # assuming names are unique
    task.args = json.dumps([id])
    task.save()
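For the database-backed schedule to actually be picked up, beat also has to run with the django-celery-beat scheduler. A sketch, assuming the same myapp.celery_app module used in the question's commands:
celery -A myapp.celery_app beat -l debug --scheduler django_celery_beat.schedulers:DatabaseScheduler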
