APScheduler cron job stops running on restart - python

I have cron jobs scheduled to run every minute. When I run the program for the first time, everything works fine. If I close and restart the program, the jobs never run again, even though scheduler.print_jobs() shows the jobs are scheduled, printing this:
Pending jobs:
daily_check (trigger: cron[second='0'], next run at: 2023-02-16 15:26:08 UTC)
daily_check (trigger: cron[second='30'], pending)
daily_check (trigger: cron[second='50'], pending)
This is the scheduling code that runs when the program starts:
def schedule_daily_check():
    try:
        scheduler.add_job(daily_check, 'cron', second=0, id='daily_kek', jitter=10, replace_existing=False, misfire_grace_time=1, coalesce=True)
        scheduler.add_job(daily_check, 'cron', second=30, id='daily_kek2', jitter=10, replace_existing=False, misfire_grace_time=1, coalesce=True)
        scheduler.add_job(daily_check, 'cron', second=50, id='daily_kek3', jitter=10, replace_existing=False, misfire_grace_time=1, coalesce=True)
        #scheduler.add_job(daily_check,'cron',second=43,second=random.randint(0,59),id='daily_kek4',jitter=60*60,replace_existing=False)
        scheduler.start()
    except ConflictingIdError:
        scheduler.print_jobs()

Related

How to perform scheduling using python?

I am trying to schedule a few jobs inside my Python app. Supposedly, the text from the logging should appear every 1 minute and every 5 minutes from the jobs.py file inside my Docker container. However, the text is appearing every 2 minutes inside the Docker container. Is there a clash between the Python schedule and the cron jobs?
Current output inside the Docker container:
13:05:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:05:00] "GET /reminder/send_reminders HTTP/1.1" 200 -
13:06:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:06:00] "GET /feeds/update_feeds HTTP/1.1" 200 -
13:07:00 [D] schedule Running job Job(interval=1, unit=minutes, do=job_feeds_update, args=(), kwargs={})
13:07:00 [I] jobs job_feeds_update
13:07:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:07:00] "GET /feeds/update_feeds HTTP/1.1" 200 -
13:08:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:08:00] "GET /feeds/update_feeds HTTP/1.1" 200 -
13:09:00 [D] schedule Running job Job(interval=1, unit=minutes, do=job_feeds_update, args=(), kwargs={})
13:09:00 [I] jobs job_feeds_update
13:09:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:09:00] "GET /feeds/update_feeds HTTP/1.1" 200 -
13:10:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:10:00] "GET /feeds/update_feeds HTTP/1.1" 200 -
13:10:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:10:00] "GET /reminder/send_reminders HTTP/1.1" 200 -
13:11:00 [D] schedule Running job Job(interval=1, unit=minutes, do=job_feeds_update, args=(), kwargs={})
13:11:00 [I] jobs job_feeds_update
13:11:00 [D] schedule Running job Job(interval=5, unit=minutes, do=job_send_reminders, args=(), kwargs={})
13:11:00 [I] jobs job_send_reminders
server.py
# Cron Job
@app.route('/feeds/update_feeds')
def update_feeds():
    schedule.run_pending()
    return 'OK UPDATED FEED!'

@app.route('/reminder/send_reminders')
def send_reminders():
    schedule.run_pending()
    return 'OK UPDATED STATUS!'
jobs.py
def job_feeds_update():
    update_feed()
    update_feed_eng()
    logger.info("job_feeds_update")

schedule.every(1).minutes.do(job_feeds_update)

# send email reminders
def job_send_reminders():
    send_reminders()
    logger.info("job_send_reminders")

schedule.every(5).minutes.do(job_send_reminders)
Dockerfile
FROM alpine:latest
# Install curl
RUN apk add --no-cache curl
# Copy Scripts to Docker Image
COPY reminders.sh /usr/local/bin/reminders.sh
COPY feeds.sh /usr/local/bin/feeds.sh
RUN echo ' */5 * * * * /usr/local/bin/reminders.sh' >> /etc/crontabs/root
RUN echo ' * * * * * /usr/local/bin/feeds.sh' >> /etc/crontabs/root
# Run crond -f for Foreground
CMD ["/usr/sbin/crond", "-f"]
I think you're running into a couple of issues:
As you suspected, your schedule runs on a different interval than your cron job. They're out of sync (and you can't ever expect them to be in sync, for the following reason): the moment your jobs.py script is executed is the starting point from which schedule counts its intervals.
For example, if you're running something every minute but the jobs.py script starts 30 seconds into the current minute (i.e. at 01:00:30, 1:00 am and 30 seconds), then the scheduler will run the job at 01:01:30, then 01:02:30, then 01:03:30, and so on.
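You can see this anchoring directly by inspecting a job's next_run right after registering it (a minimal sketch against the schedule API; the exact times obviously depend on when you run it):
import schedule
from datetime import datetime

def noop():
    pass

# schedule anchors the interval to the moment .do() is called
schedule.every(1).minutes.do(noop)

print("registered at:", datetime.now())
print("next run at:", schedule.jobs[0].next_run)  # roughly registration time + 60 seconds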
schedule also doesn't guarantee precise execution frequency. When the scheduler runs a job, the job's execution time is not taken into account. So if you schedule something like your feeds/reminders jobs, they could take a little while to process. Once a job finishes running, the scheduler decides that the next run will only happen 1 minute after the end of the previous one. This means your execution time can throw off the schedule.
Try running this example in a Python script to see what I'm talking about:
# Schedule library imported
import schedule
import time
from datetime import datetime

def geeks():
    now = datetime.now()  # current date and time
    date_time = now.strftime("%m/%d/%Y, %H:%M:%S")
    time.sleep(5)  # pretend there's a blocking call that takes a while
    print(date_time + " - Look at the timestamp")

geeks()

# Task scheduling
# geeks() is scheduled to run every second.
schedule.every(1).seconds.do(geeks)

# Loop so that the scheduling task
# keeps on running all the time.
while True:
    # Checks whether a scheduled task
    # is pending to run or not
    schedule.run_pending()
    time.sleep(0.1)
We've scheduled the geeks function to run every second. But if you look at the geeks function, I've added a time.sleep(5) to pretend that there may be some blocking API call here that can take 5 seconds. Then observe the timestamps logged - you'll notice they're not always consistent with the schedule we originally wanted!
Now onto how your cron job and scheduler are out of sync
Look at the following logs:
13:07:00 [D] schedule Running job Job(interval=1, unit=minutes, do=job_feeds_update, args=(), kwargs={})
13:07:00 [I] jobs job_feeds_update
13:07:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:07:00] "GET /feeds/update_feeds HTTP/1.1" 200 -
# minute 8 doesn't trigger the schedule for feeds
13:09:00 [D] schedule Running job Job(interval=1, unit=minutes, do=job_feeds_update, args=(), kwargs={})
13:09:00 [I] jobs job_feeds_update
13:09:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:09:00] "GET /feeds/update_feeds HTTP/1.1" 200 -
What's likely happening here is the following:
At 13:07:00, your cron job sends the request to update feed items.
At 13:07:00, the schedule has a pending job for feed items, so it runs.
At roughly 13:07:01, the job finishes (let's assume the feed items update took about 1 second), and schedule decides the next run can only happen 1 minute after that, i.e. roughly 13:08:01.
At 13:08:00, your cron job triggers the request asking schedule to run pending jobs.
At 13:08:00, however, there are no pending jobs to run, because the next time feed items can run is 13:08:01, which is not right now.
At 13:09:00, your cron job triggers the request again.
At 13:09:00, there is a pending job available that should have run at 13:08:01, so it gets executed now.
I hope this illustrates the issue you're running into: cron and schedule being out of sync. It will only get worse in a production environment. You can read more about Parallel execution for schedule as a way to keep jobs off the main thread, but that will only go so far. Let's talk about...
Possible Solutions
1. Use run_all from schedule instead of run_pending to force jobs to trigger, regardless of when they're actually scheduled for.
But if you think about it, this is no different from simply calling job_feeds_update straight from your API route itself. That isn't a bad idea by itself, but it's still not super clean, since it will block the main thread of your API server until job_feeds_update is complete, which might not be ideal if you have other routes that users need.
You could combine this with the next suggestion:
2. Use a job queue and threads (see the sketch after these options).
Check out the second example on the Parallel Execution page of schedule's docs. It shows you how to use a job queue and threads to offload jobs.
Because you run schedule.run_pending(), the main thread of your server is blocked until the jobs run. By using threads (plus the job queue), you can keep scheduling jobs in the queue and avoid blocking the main server with your jobs. This should optimize things a little further by letting jobs continue to be scheduled.
3. Use ischedule instead, since it takes the job execution time into account and provides precise scheduling: https://pypi.org/project/ischedule/. This might be the simplest solution for you in case 1 + 2 end up being a headache!
4. Don't use schedule at all and simply have your cron jobs hit a route that runs the actual function directly (basically the opposite of the advice in 1 + 2 above). The problem with this is that if your feed update takes longer than a minute to run, you may end up with multiple overlapping cron jobs doing feed updates at the same time. So I'd recommend not doing this and instead relying on a mechanism that queues/schedules your requests with threads and a job queue. Only mentioning it as a potential scenario of what else you could do.
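For reference, here is a minimal sketch of the job queue + worker thread pattern, along the lines of the Parallel Execution example in schedule's docs (the job body below is a placeholder standing in for your real feed/reminder functions):
import queue
import threading
import time

import schedule

jobqueue = queue.Queue()

def job_feeds_update():
    print("updating feeds")  # placeholder for the real work

def worker_main():
    # Pull queued job functions and run them off the main thread
    while True:
        job_func = jobqueue.get()
        job_func()
        jobqueue.task_done()

# The scheduled "job" only enqueues the real work, so run_pending() returns quickly
schedule.every(1).minutes.do(jobqueue.put, job_feeds_update)

worker_thread = threading.Thread(target=worker_main, daemon=True)
worker_thread.start()

while True:
    schedule.run_pending()
    time.sleep(1)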

Can I mark an Airflow SSHOperator task execution_timeout as Success

I have tasks in a DAG that execute bash scripts over SSH on more than one machine. I have configured execution_timeouts for these tasks as well. But when the execution_timeout occurs, it marks the current task as failed and the downstream tasks as upstream_failed, so one failed task causes other tasks to fail as well.
I have already used AirflowTaskTimeout with PythonOperator and it works just fine; I have pasted the example below. But with SSHOperator I can't call a Python function to execute bash scripts on a remote machine and catch the exception.
def func():
    try:
        time.sleep(40)
    except AirflowTaskTimeout:
        return 'success'

t1 = PythonOperator(
    task_id="task1",
    python_callable=func,
    execution_timeout=timedelta(seconds=30),
    dag=dag
)

t3 = SSHOperator(
    task_id='task3',
    ssh_hook=sshHook5,
    command="sleep 40",
    execution_timeout=timedelta(seconds=30),
    dag=dag
)
I want to know whether there is a workaround (in the case of SSHOperator/BashOperator) that I can use to mark a timed-out task as successful.

How can I prevent my bot from falling asleep/idling on Heroku? Cron job is not executing after the bot starts polling

Hope y'all are doing well.
I have a Telegram bot that I deployed on Heroku, but the bot falls asleep after 20-30 minutes because I'm using Heroku's free dyno. I tried to prevent this by creating a cron job which only prints something to the console to keep the bot awake.
As you can see below, I have 2 functions, start_polling & cronjob, but since I execute start_polling first, cronjob never gets to run.
Is there any trick I can use here to prevent my bot from falling asleep?
import os
import django

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'telega.settings')
django.setup()

from main.bot import bot
from crontab import CronTab
from datetime import datetime
from apscheduler.schedulers.blocking import BlockingScheduler

def cronjob():
    """ Main cron job to prevent bot fall asleep. """
    print("Cron job is running, bot wont fall asleep")
    print("Tick! The time is: %s" % datetime.now())

def start_polling():
    """ Starts the bot """
    try:
        bot.skip_pending = True
        print(f'Bot {bot.get_me().username} started')
        bot.polling()
    except Exception as e:
        print(e)

# 1. Start polling
start_polling()

# 2. Start the scheduler ==> Prevent bot to fall asleep
scheduler = BlockingScheduler()
scheduler.add_job(cronjob, "interval", seconds=300)
scheduler.start()
A web dyno will go to sleep after 30 minutes without incoming HTTP requests. You cannot prevent that from inside the app (e.g. by running some background code).
You have 2 options:
keep it alive by having an external scheduler, like Kaffeine, send it a request every x minutes
convert the dyno to a worker. If you don't need to receive incoming requests (for example, your bot only polls), this is a good solution: the dyno will not sleep and you won't need to use external tools.
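If you go with the worker option, the usual approach (a minimal sketch; bot_main.py is a hypothetical name for the script above) is to declare a worker process type in the Procfile instead of a web one, so Heroku runs the bot as a background process:
worker: python bot_main.py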

Restarting the Jobs in APScheduler (Python) When the WSGI Server Is Restarted

I'm using Python APScheduler to schedule my jobs. All my jobs are stored as cron jobs and use the BackgroundScheduler. I have the following code:
def letschedule():
    jobstores = {
        'default': SQLAlchemyJobStore(url=app_jobs_store)
    }
    executors = {
        'default': ThreadPoolExecutor(20),
        'processpool': ProcessPoolExecutor(5)
    }
    job_defaults = {
        'coalesce': False,
        'max_instances': 1,
        'misfire_grace_time': 1200
    }
    scheduler = BackgroundScheduler(jobstores=jobstores, executors=executors, job_defaults=job_defaults, timezone=utc)
    #jobstores=jobstores, executors=executors, job_defaults=job_defaults, timezone=utc
    return scheduler
And I start the scheduler as follows in the app:
sch = letschedule()
sch.start()
log.info('the scheduler started')
And I have the following function to add the jobs:
def addjobs():
    jobs = []
    try:
        sch.add_job(forecast_jobs, 'cron', day_of_week=os.environ.get("FORECAST_WEEKOFDAY"),
                    id="forecast",
                    replace_existing=False, week='1-53', hour=os.environ.get("FORECAST_HOUR"),
                    minute=os.environ.get("FORECAST_MINUTE"), timezone='UTC')
        jobs.append({'job_id': 'forecast', 'type': 'weekly'})
        log.info('the forecast added to the scheduler')
    except BaseException as e:
        log.info(e)
        pass
    try:
        sch.add_job(insertcwhstock, 'cron',
                    id="cwhstock_data", day_of_week='0-6', replace_existing=False, hour=os.environ.get("CWHSTOCK_HOUR"),
                    minute=os.environ.get("CWHSTOCK_MINUTE"),
                    week='1-53', timezone='UTC')
        jobs.append({'job_id': 'cwhstock_data', 'type': 'daily'})
        log.info('the cwhstock job added to the scheduler')
    except BaseException as e:
        log.info(e)
        pass
    return json.dumps({'data': jobs})
I use this in a Flask application: I call /activatejobs, the jobs are added to the scheduler, and it works fine. However, when I restart the WSGI server, the jobs aren't started again; I have to remove the .sqlite file and add the jobs again. What I want is for the jobs to be restarted automatically once the scheduler is started (if there are already jobs in the database).
I tried a few ways to get this result, but couldn't. Any help would be greatly appreciated. Thanks in advance.
I also had the same problem using the FastAPI framework. I could solve it by adding this code to my app.py:
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore

scheduler = BackgroundScheduler()
pg_job_store = SQLAlchemyJobStore(engine=my_engine)  # my_engine is an existing SQLAlchemy engine
scheduler.add_jobstore(jobstore=pg_job_store, alias='sqlalchemy')
scheduler.start()
After adding this code and restarting the application server, I could see APScheduler logs searching for jobs:
2021-10-20 14:37:53,433 - apscheduler.scheduler - INFO => Scheduler started
2021-10-20 14:37:53,433 - apscheduler.scheduler - DEBUG => Looking for jobs to run
Jobstore default:
No scheduled jobs
Jobstore sqlalchemy:
remove_from_db_job (trigger: date[2021-10-20 14:38:00 -03], next run at: 2021-10-20 14:38:00 -03)
2021-10-20 14:37:53,443 - apscheduler.scheduler - DEBUG => Next wakeup is due at 2021-10-20 14:38:00-03:00 (in 6.565892 seconds)
It works for me.
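A minimal sketch of the same idea applied to the original letschedule() setup (assuming APScheduler 3.x and that app_jobs_store is the database URL from the question): attaching a persistent SQLAlchemy jobstore before start() means any jobs already saved in that database are loaded and resumed on startup, so addjobs() only needs to be called once.
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore

scheduler = BackgroundScheduler()
# Attach the persistent jobstore before starting; jobs stored in this
# database by previous runs are picked up again here.
scheduler.add_jobstore(SQLAlchemyJobStore(url=app_jobs_store), alias='default')
scheduler.start()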

simple celery test with Print doesn't go to Terminal

EDIT 1:
Actually, print statements output to the Celery terminal, instead of the terminal where the Python program is run - as @PatrickAllen indicated.
OP
I've recently started to use Celery, but can't even get a simple test going where I print a line to the terminal after a 30 second wait.
In my tasks.py:
from celery import Celery

celery = Celery(__name__, broker='amqp://guest@localhost//', backend='amqp://guest@localhost//')

@celery.task
def test_message():
    print("schedule task says hello")
in the main module for my package, I have:
import tasks.py
if __name__ == '__main__':
<do something>
tasks.test_message.apply_async(countdown=30)
I run it from the terminal:
celery -A tasks worker --loglevel=info
The task runs correctly, but nothing shows up in the terminal of the main program. Celery output:
[2016-03-06 17:49:46,890: INFO/MainProcess] Received task: tasks.test_message[4282fa1a-8b2f-4fa2-82be-d8f90288b6e2] eta:[2016-03-06 06:50:16.785896+00:00]
[2016-03-06 17:50:17,890: WARNING/Worker-2] schedule task says hello
[2016-03-06 17:50:17,892: WARNING/Worker-2] The client is not currently connected.
[2016-03-06 17:50:18,076: INFO/MainProcess] Task tasks.test_message[4282fa1a-8b2f-4fa2-82be-d8f90288b6e2] succeeded in 0.18711688100120227s: None
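If the goal is just to see the message reliably in the worker output, a common alternative (a sketch, not taken from the original post) is to use Celery's task logger instead of print:
from celery import Celery
from celery.utils.log import get_task_logger

celery = Celery(__name__, broker='amqp://guest@localhost//', backend='amqp://guest@localhost//')
logger = get_task_logger(__name__)

@celery.task
def test_message():
    # This appears in the worker's log output, not in the calling process's terminal
    logger.info("schedule task says hello")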
