apscheduler doesn't work normally - python

This is the code:
#coding=utf-8
from apscheduler.scheduler import Scheduler
import logging

logging.basicConfig(filename='/tmp/log', level=logging.DEBUG,
                    format='%(levelname)s[%(asctime)s]: %(message)s')

sched = Scheduler()
sched.start()

##sched.interval_schedule(seconds=3)
def job_function():
    logging.debug('hello world')

sched.add_interval_job(job_function, seconds=3)
If I switch to the decorator, it still doesn't work. The log looks like this:
DEBUG[2011-10-09 11:02:45,175]: Looking for jobs to run
DEBUG[2011-10-09 11:02:45,176]: No jobs; waiting until a job is added
INFO[2011-10-09 11:02:45,176]: Added job "job_function (trigger: interval[0:00:03], next run at: 2011-10-09 11:02:48.176444)" to job store "default"
INFO[2011-10-09 11:02:45,177]: Shutting down thread pool
The job job_function is added, but it is never triggered. Why?

If this is all your code, then it's clear why it's not working -- the application exits before the job is scheduled to be executed. See the examples provided at https://bitbucket.org/agronholm/apscheduler/src/tip/examples .

As mentioned in the documentation, if you want the Scheduler to block, you need to set the standalone flag to True.
s = Scheduler(standalone=True)
<add jobs here>
s.start()
Make sure you add signal handlers or catch interrupt exceptions :-)
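For reference, here is a minimal sketch of the standalone approach with the interrupt handling mentioned above, using the legacy APScheduler 2.x API from the question (Scheduler, add_interval_job):

import logging
from apscheduler.scheduler import Scheduler

logging.basicConfig(level=logging.DEBUG)

def job_function():
    logging.debug('hello world')

# standalone=True makes start() block, so the process stays alive and jobs fire
sched = Scheduler(standalone=True)
sched.add_interval_job(job_function, seconds=3)

try:
    sched.start()  # blocks here; the interval job runs every 3 seconds
except (KeyboardInterrupt, SystemExit):
    pass  # allow a clean exit on Ctrl+C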

Related

Python: logging object loses its file handler when passed to an RQ task queue

Problem
I pass a logging (logger) object, which is supposed to add lines to test.log, to a function background_task() that is run by the rq utility (a task queue manager). The logger has a FileHandler assigned to it to allow logging to test.log. Until background_task() is run, you can see the file handler present in logger.handlers, but when the logger is passed to background_task() and background_task() is run by the rq worker, logger.handlers ends up empty.
But if I ditch rq (and Redis) and just run background_task() right away, the contents of logger.handlers are preserved. So it has something to do with rq (and probably with task queuing in general; it's a new topic for me).
Steps to reproduce
Run add_job.py: python3 add_job.py. You'll see the output of print(logger.handlers) called from within add_job(): there will be a handlers list containing the FileHandler added in get_job_logger().
Run the command rq worker to start executing the queued task. You'll see the output of print(logger.handlers) once again, but this time called from within background_task(), and the list will be empty! The handlers of the logging (logger) object somehow get lost when the function that accepts the logger as an argument is run by rq (rq worker). What gives?
Here's how it looks in the terminal:
$ python3 add_job.py
[<FileHandler /home/user/project/test.log (INFO)>]
$ rq worker
17:44:45 Worker rq:worker:2bbad3623e95438f81396c662cb01284: started, version 1.10.1
17:44:45 Subscribing to channel rq:pubsub:2bbad3623e95438f81396c662cb01284
17:44:45 *** Listening on default...
17:44:45 default: tasks.background_task(<RootLogger root (INFO)>) (5a5301be-efc3-49a7-ab0c-f7cf0a4bd3e5)
[]
Source code
add_job.py
import logging
from logging import FileHandler

from redis import Redis
from rq import Queue

from tasks import background_task


def add_job():
    r = Redis()
    qu = Queue(connection=r)
    logger = get_job_logger()
    print(logger.handlers)
    job = qu.enqueue(background_task, logger)


def get_job_logger():
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    logger_file_handler = FileHandler('test.log')
    logger_file_handler.setLevel(logging.INFO)

    logger.addHandler(logger_file_handler)

    return logger


if __name__ == '__main__':
    add_job()
tasks.py
def background_task(logger):
    print(logger.handlers)
Answered here.
FileHandler does not get carried over into other threads. You start the FileHandler in the main thread and rq worker starts other threads. Memory is not shared like that.
Hm, I see... Thanks!
I assumed the FileHandler was being serialized or whatnot when written to Redis as a part of the whole logger object and then reinitialized when popping out of the queue.
Anyway, I'll try passing a file path to the function and initializing a logger from within. That way, the FileHandler object stays in one thread.
EDIT: yeah, it works
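A minimal sketch of that workaround, assuming the task receives the log file path instead of a logger object (the helper name get_worker_logger() is made up for illustration):

# tasks.py
import logging
from logging import FileHandler


def get_worker_logger(log_path):
    # Build the logger inside the worker process, so the FileHandler
    # never has to survive being enqueued through Redis.
    logger = logging.getLogger('job')
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = FileHandler(log_path)
        handler.setLevel(logging.INFO)
        logger.addHandler(handler)
    return logger


def background_task(log_path):
    logger = get_worker_logger(log_path)
    logger.info('hello from the rq worker')


# add_job.py would then enqueue the path instead of the logger, e.g.:
#     qu.enqueue(background_task, 'test.log')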

How to redirect logs from secondary threads in Azure Functions using Python

I am using Azure functions to run a Python script that launches multiple threads (for performance reasons). Everything is working as expected, except for the fact that only the info logs from the main() thread appear on the Azure Functions log.
All the logs that I am using in the "secondary" threads that I start in main() do not appear in the Azure Functions logs.
Is there a way to ensure that the logs from the secondary threads show on the Azure Functions log?
The modules that I am using are "logging" and "threading".
I am using Python 3.6; I have already tried to lower the logging level in the secondary threads, but this did not help unfortunately.
The various secondary thread functions are in different modules.
My function has a structure similar to the following pseudo-code:
import logging
import threading


def main() -> None:
    logging.basicConfig(level=logging.INFO)
    logging.info("Starting the process...")

    thread1 = threading.Thread(target=foo, args=("one arg",))
    thread2 = threading.Thread(target=foo, args=("another arg",))
    thread3 = threading.Thread(target=foo, args=("yet another arg",))

    thread1.start()
    thread2.start()
    thread3.start()

    logging.info("All threads started successfully!")
    return


# in another module
def foo(st: str) -> None:
    logging.basicConfig(level=logging.INFO)
    logging.info(f"Starting thread for arg {st}")
The current Azure log output is:
INFO: Starting the process...
INFO: "All threads started successfully!"
I would like it to be something like:
INFO: Starting the process...
INFO: Starting thread for arg one arg
INFO: Starting thread for arg another arg
INFO: Starting thread for arg yet another arg
INFO: All threads started successfully!
(of course the order of the secondary threads could be anything)
The Azure Functions Python worker framework sets AsyncLoggingHandler as a handler on the root logger. On the way from this handler to its destination, logs appear to be filtered by an invocation_id.
An invocation_id is set if the framework starts the threads itself, as it does for the main sync function. On the other hand, if we start threads ourselves from the main function, we must set the invocation_id in each started thread for the logs to reach their destination.
The azure_functions_worker.dispatcher.get_current_invocation_id function checks whether the current thread has a running event loop. If no running loop is found, it just checks azure_functions_worker.dispatcher._invocation_id_local, which is thread-local storage, for an attribute named v holding the value of the invocation_id.
Because the threads we start don't have a running event loop, we have to get the invocation_id from the context and set it on azure_functions_worker.dispatcher._invocation_id_local.v in every thread we start.
The invocation_id is made available by the framework in the context parameter of the main function.
Tested on Ubuntu 18.04, azure-functions-core-tools-4 and Python 3.8.
import sys
import azure.functions as func
import logging
import threading

# import thread local storage
from azure_functions_worker.dispatcher import (
    _invocation_id_local as tls,
)


def main(req: func.HttpRequest, context: func.Context) -> func.HttpResponse:
    logging.info("Starting the process...")

    thread1 = threading.Thread(
        target=foo,
        args=(
            context,
            "one arg",
        ),
    )
    thread2 = threading.Thread(
        target=foo,
        args=(
            context,
            "another arg",
        ),
    )
    thread3 = threading.Thread(
        target=foo,
        args=(
            context,
            "yet another arg",
        ),
    )

    thread1.start()
    thread2.start()
    thread3.start()

    logging.info("All threads started successfully!")

    name = req.params.get("name")
    if not name:
        try:
            req_body = req.get_json()
        except ValueError:
            pass
        else:
            name = req_body.get("name")

    if name:
        return func.HttpResponse(
            f"Hello, {name}. This HTTP triggered function executed successfully."
        )
    else:
        return func.HttpResponse(
            "This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response.",
            status_code=200,
        )


# in another module
def foo(context, st: str) -> None:
    # invocation_id_local = sys.modules[
    #     "azure_functions_worker.dispatcher"
    # ]._invocation_id_local
    # invocation_id_local.v = context.invocation_id
    tls.v = context.invocation_id

    logging.info(f"Starting thread for arg {st}")
https://github.com/Azure/azure-functions-python-worker/blob/81b84102dc14b7d209ad7e00be68f25c37987c1e/azure_functions_worker/dispatcher.py
This must be something in your Azure setup: in a non-Azure setup, it works as expected. You should add join() calls for your threads. And basicConfig() should be called only once, from a main entry point.
Are your threads I/O bound? Due to the GIL, having multiple compute-bound threads doesn't give your code any performance advantages. It might be better to structure your code around concurrent.futures.ProcessPoolExecutor or multiprocessing.
Here is a Repl which shows a slightly modified version of your code working as expected.
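Setting the Azure-specific logging question aside, here is a minimal sketch of the concurrent.futures.ProcessPoolExecutor alternative suggested above, for the case where foo() is compute-bound (the function bodies are only illustrative):

import logging
from concurrent.futures import ProcessPoolExecutor


def foo(st: str) -> None:
    logging.basicConfig(level=logging.INFO)
    logging.info(f"Starting work for arg {st}")
    # compute-bound work would go here


def main() -> None:
    logging.basicConfig(level=logging.INFO)
    logging.info("Starting the process...")
    with ProcessPoolExecutor() as pool:
        # map() distributes the arguments over worker processes;
        # list() forces the iterator so all work finishes before the next log line
        list(pool.map(foo, ["one arg", "another arg", "yet another arg"]))
    logging.info("All work finished!")


if __name__ == "__main__":
    main()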
I may be wrong, but I suspect Azure runs your main function in a daemon thread.
Quoting https://docs.python.org/3/library/threading.html: The entire Python program exits when no alive non-daemon threads are left.
When daemon is not set in the Thread constructor, it reuses the value of the parent thread.
You can check whether this is your issue by printing thread1.daemon before starting your child threads.
Anyway, I can reproduce the issue on my PC (without any Azure, just plain Python 3) with:
import logging
import threading


def main():
    logging.basicConfig(level=logging.INFO)
    logging.info("Starting the process...")

    thread1 = threading.Thread(target=foo, args=("one arg",), daemon=True)
    thread2 = threading.Thread(target=foo, args=("another arg",), daemon=True)
    thread3 = threading.Thread(target=foo, args=("yet another arg",), daemon=True)

    thread1.start()
    thread2.start()
    thread3.start()

    logging.info("All threads started successfully!")
    return


def foo(st):
    for i in range(2000):  # giving a bit of time for the race condition to happen
        print('tamere', file=open('/dev/null', 'w'))
    logging.basicConfig(level=logging.INFO)
    logging.info(f"Starting thread for arg {st}")


main()
If I force daemon to False or leave it unset, it works. So I guess your issue is that Azure starts your main function in a daemon thread, and since you don't override the daemon flag to False, the whole process exits as soon as main() returns.
PS: I know nothing about Azure; there is a possibility that you are indeed trying to do something the wrong way and that there is another interface to do exactly what you want in the way Azure expects you to. So this answer is potentially just an explanation of what happens rather than real guidance.
Azure Functions is an async environment.
If you define an async def, it'll be run with asyncio.
Otherwise it'll be run in a concurrent.futures.ThreadPoolExecutor.
It's better to define your functions as async.
Threading works, and you don't need to start threads manually: the thread pool executes your blocking code. You just have to make it work for you.
https://learn.microsoft.com/en-us/azure/azure-functions/functions-app-settings#python_threadpool_thread_count
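A minimal sketch of the async pattern this answer points at, assuming an HTTP-triggered function; run_in_executor() hands the blocking foo() calls to the worker's thread pool (whether their log lines are attributed to the invocation depends on the worker version):

import asyncio
import logging

import azure.functions as func


def foo(st: str) -> None:
    logging.info(f"Starting work for arg {st}")  # blocking work goes here


async def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info("Starting the process...")
    loop = asyncio.get_running_loop()
    # Offload the blocking calls to the executor and wait for all of them.
    await asyncio.gather(
        loop.run_in_executor(None, foo, "one arg"),
        loop.run_in_executor(None, foo, "another arg"),
        loop.run_in_executor(None, foo, "yet another arg"),
    )
    logging.info("All work finished!")
    return func.HttpResponse("done")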

Can we add jobs to a running scheduler in APScheduler?

I started a BackgroundScheduler in one file and ran it. Then, from another file, I accessed the scheduler instance and added a job. My thought was that the instance would add the job and it would run. I am new to these scheduling mechanisms. What I did is:
In one file, Main.py:
import time
from apscheduler.schedulers.background import BackgroundScheduler


class Main:
    a = 2

    sched = BackgroundScheduler()
    sched.start()

    while True:
        time.sleep(5)
From the other file, Bm.py:
from Main import Main


class Bm(Main):
    def timed_job():
        print 'aa'

    Main.sched.add_job(timed_job, 'interval', seconds=1)
I thought this would work, but it did not. I need to do this from a separate file because I am building a task manager that runs jobs, and I need to be able to add or remove jobs at any time. So how can we add and remove jobs to/from a running APScheduler?
UPDATE :
This is confusing. I added a function printme to Main.py and did sched.add_job(printme, 'interval', seconds=5); it prints 'me' as expected, but when I run Bm.py it also prints 'me', when it was supposed to print 'aa'.
def printme():
    print 'me'


while True:
    # time.sleep(5)
    sched.add_job(printme, 'interval', seconds=5)
    if (input() is 'q'):
        sched.shutdown()
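For what it's worth, a minimal sketch of adding and removing jobs on an already running BackgroundScheduler. Note that this assumes everything runs in the same Python process; importing Main from a separately launched script creates a second, independent scheduler rather than reaching the one that is already running.

import time
from apscheduler.schedulers.background import BackgroundScheduler

sched = BackgroundScheduler()
sched.start()  # returns immediately; the scheduler runs in a background thread


def timed_job():
    print('aa')


# Jobs can be added to and removed from the scheduler while it is running.
sched.add_job(timed_job, 'interval', seconds=1, id='timed_job')
time.sleep(5)                   # the job fires a few times in the meantime
sched.remove_job('timed_job')   # no further executions after this
sched.shutdown()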

Python - Apscheduler not stopping a job even after using 'remove_job'

This is my code
I'm using the remove_job and the shutdown functions of the scheduler to stop a job, but it keeps on executing.
What is the correct way to stop a job from executing any further?
from apscheduler.schedulers.background import BlockingScheduler


def job_function():
    print "job executing"


scheduler = BlockingScheduler(standalone=True)

scheduler.add_job(job_function, 'interval', seconds=1, id='my_job_id')
scheduler.start()

scheduler.remove_job('my_job_id')
scheduler.shutdown()
Simply ask the scheduler to remove the job from inside job_function using remove_job(), since, as @Akshay Pratap Singh correctly pointed out, control never returns from start().
from apscheduler.schedulers.background import BlockingScheduler

count = 0


def job_function():
    print "job executing"
    global count, scheduler

    # Execute the job till the count of 5
    count = count + 1
    if count == 5:
        scheduler.remove_job('my_job_id')


scheduler = BlockingScheduler()
scheduler.add_job(job_function, 'interval', seconds=1, id='my_job_id')
scheduler.start()
As you are using BlockingScheduler, you should first understand its nature.
Basically, BlockingScheduler is a scheduler that runs in the foreground (i.e. start() blocks the program). In layman's terms: when you call start(), the call never returns. That's why none of the lines that follow start() are ever executed, which is why your scheduler never stopped.
BlockingScheduler can be useful if you want to use APScheduler as a standalone scheduler (e.g. to build a daemon).
Solution
If you want to stop your scheduler after running some code, then you should opt for one of the other scheduler types listed in the APScheduler docs.
I recommend BackgroundScheduler if you want the scheduler to run in the background inside your application, where you can pause, resume and remove jobs at any time you need.
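A minimal sketch of that BackgroundScheduler approach (the 5-second sleep is just an illustrative stand-in for the rest of your application):

import time
from apscheduler.schedulers.background import BackgroundScheduler


def job_function():
    print("job executing")


scheduler = BackgroundScheduler()
scheduler.add_job(job_function, 'interval', seconds=1, id='my_job_id')
scheduler.start()  # returns immediately; jobs run in a background thread

time.sleep(5)                       # let the job fire a few times
scheduler.remove_job('my_job_id')   # stop further executions of this job
scheduler.shutdown()                # stop the scheduler itself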
The scheduler needs to be stopped from another thread. The thread in which scheduler.start() is called gets blocked by the scheduler, so the lines you've written after scheduler.start() are unreachable code.
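For example, a minimal sketch (assuming APScheduler 3.x) that shuts a BlockingScheduler down from a separate timer thread:

import threading
from apscheduler.schedulers.blocking import BlockingScheduler


def job_function():
    print("job executing")


scheduler = BlockingScheduler()
scheduler.add_job(job_function, 'interval', seconds=1, id='my_job_id')

# Shut the scheduler down 5 seconds later, from a separate timer thread.
threading.Timer(5, scheduler.shutdown).start()

scheduler.start()  # blocks here until the timer thread calls shutdown()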
This is how I solved the problem. Pay attention to where the schedule.shutdown() call is located!
from apscheduler.schedulers.blocking import BlockingScheduler


def do_something():
    global schedule
    print("schedule execute")
    # schedule.remove_job(id='rebate')
    schedule.shutdown(wait=False)


if __name__ == '__main__':
    global schedule
    schedule = BlockingScheduler()
    schedule.add_job(do_something, 'cron', id='rebate', month=12, day=5, hour=17, minute=47, second=35)
    schedule.start()
    print('over')

How do I schedule an interval job with APScheduler?

I'm trying to schedule an interval job with APScheduler (v3.0.0).
I've tried:
from apscheduler.schedulers.blocking import BlockingScheduler

sched = BlockingScheduler()


def my_interval_job():
    print 'Hello World!'


sched.add_job(my_interval_job, 'interval', seconds=5)
sched.start()
and
from apscheduler.schedulers.blocking import BlockingScheduler

sched = BlockingScheduler()


@sched.scheduled_job('interval', id='my_job_id', seconds=5)
def my_interval_job():
    print 'Hello World!'


sched.start()
Either should work according to the docs, but the job never fires...
UPDATE:
It turns out there was something else, environment-related, preventing the task from running. This morning, the task is working fine without any modifications to the code from yesterday.
UPDATE 2:
After further testing, I've found that 'interval' jobs seem to be generally flaky... The above code now works in my dev environment, but not when I deploy to a staging env (I'm using a heroku app for staging). I have other apscheduler 'cron' jobs that work just fine in the staging/production envs.
When I turn on DEBUG logging for the "apscheduler.schedulers" logger, the log indicates that the interval job is added:
Added job "my_cron_job1" to job store "default"
Added job "my_cron_job2" to job store "default"
Added job "my_interval_job" to job store "default"
Scheduler started
Adding job tentatively -- it will be properly scheduled when the scheduler starts
Adding job tentatively -- it will be properly scheduled when the scheduler starts
Looking for jobs to run
Next wakeup is due at 2015-03-24 15:05:00-07:00 (in 254.210542 seconds)
How can the next wakeup be due 254 seconds from now when the interval job is set to 5 seconds??
You need to keep the thread alive. Here is an example of how I used it.
from subprocess import call
import time
import os
from pytz import utc
from apscheduler.schedulers.background import BackgroundScheduler


def job():
    print("In job")
    call(['python', 'scheduler/main.py'])


if __name__ == '__main__':
    scheduler = BackgroundScheduler()
    scheduler.configure(timezone=utc)
    scheduler.add_job(job, 'interval', seconds=10)
    scheduler.start()

    print('Press Ctrl+{0} to exit'.format('Break' if os.name == 'nt' else 'C'))

    try:
        # This is here to simulate application activity (which keeps the main thread alive).
        while True:
            time.sleep(5)
    except (KeyboardInterrupt, SystemExit):
        # Not strictly necessary if daemonic mode is enabled but should be done if possible
        scheduler.shutdown()
I haven't figured out what caused the original issue, but I got around it by swapping the order in which the jobs are scheduled, so that the 'interval' job is scheduled BEFORE the 'cron' jobs.
i.e. I switched from this:
def my_cron_job1():
    print "cron job 1"


def my_cron_job2():
    print "cron job 2"


def my_interval_job():
    print "interval job"


if __name__ == '__main__':
    from apscheduler.schedulers.blocking import BlockingScheduler
    sched = BlockingScheduler(timezone='MST')

    sched.add_job(my_cron_job1, 'cron', id='my_cron_job1', minute=10)
    sched.add_job(my_cron_job2, 'cron', id='my_cron_job2', minute=20)
    sched.add_job(my_interval_job, 'interval', id='my_job_id', seconds=5)
to this:
def my_cron_job1():
    print "cron job 1"


def my_cron_job2():
    print "cron job 2"


def my_interval_job():
    print "interval job"


if __name__ == '__main__':
    from apscheduler.schedulers.blocking import BlockingScheduler
    sched = BlockingScheduler(timezone='MST')

    sched.add_job(my_interval_job, 'interval', id='my_job_id', seconds=5)
    sched.add_job(my_cron_job1, 'cron', id='my_cron_job1', minute=10)
    sched.add_job(my_cron_job2, 'cron', id='my_cron_job2', minute=20)
and now both the cron jobs and the interval jobs run without a problem in both environments.
How can the next wakeup be due 254 seconds from now when the interval job is set to 5 seconds?
It's simple: you have many pending executions, because most of your jobs didn't complete within their interval window.
You could use the following parameters in order to sort this out:
misfire_grace_time: the maximum time, in seconds, that the job's execution is allowed to be delayed before it is considered a misfire
coalesce: roll several pending executions of a job into one
To read more, check the documentation here.
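A minimal sketch (assuming APScheduler 3.x) of passing these two options to add_job():

from apscheduler.schedulers.blocking import BlockingScheduler


def my_interval_job():
    print("interval job")


sched = BlockingScheduler()
sched.add_job(
    my_interval_job,
    'interval',
    seconds=5,
    coalesce=True,           # collapse a backlog of missed runs into a single run
    misfire_grace_time=30,   # still run a job that is up to 30 seconds late
)
sched.start()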
The documentation had an error there. I've fixed it now.
That first line should be:
from apscheduler.schedulers.blocking import BlockingScheduler
It would've raised an ImportError though, but you didn't mention any.
Did you try any of the provided examples?
Ok, I've looked at the updated question.
The reason you're having problems may be that you're using the wrong timezone. Your country is currently using daylight saving time in most locations, so the correct timezone would probably be MDT (Mountain Daylight Time). But that will break again when you move back to standard time, so I advise you to use a timezone like "America/Denver". That will take care of the DST switches.
Question: Are you using CentOS? So far it's the only known operating system where automatic detection of the local timezone is impossible.
