APScheduler task not firing due to eventlet monkey_patch - python

I have some Python code in which an APScheduler job is not firing. As context, I also have a handler that watches a directory for file modifications; in addition, I'm using eventlet/GreenPool for multi-threading. Based on some troubleshooting, it seems there's some sort of conflict between APScheduler and eventlet.
My output looks as follows:
2016-12-26 02:30:30 UTC (+0000): Finished Download Pass
2016-12-26 02:46:07 UTC (+0000): EXITING due to control-C or other exit signal
Jobstore default:
Time-Activated Download (trigger: interval[0:05:00], next run at: 2016-12-25 18:35:00 PST)
2016-12-26 02:46:07 UTC (+0000): 1
(18:35 PST = 02:35 UTC)...so it should have fired 11 minutes before I pressed control-C
from apscheduler import events ## pip install apscheduler
from apscheduler.schedulers.background import BackgroundScheduler

# Threading
from eventlet import patcher, GreenPool ## pip install eventlet
patcher.monkey_patch(all = True)

def setSchedule(scheduler, cfg, minutes = 60*2, hours = 0):
    """Set up the schedule of how frequently a download should be attempted.
    scheduler object must already be declared.
    will accept either minutes or hours for the period between downloads"""
    if hours > 0:
        minutes = 60*hours if minutes == 60 else 60*hours+minutes
    handle = scheduler.add_job(processAllQueues,
                               trigger='interval',
                               kwargs={'cfg': cfg},
                               id='RQmain',
                               name='Time-Activated Download',
                               coalesce=True,
                               max_instances=1,
                               minutes=minutes,
                               start_date=dt.datetime.strptime('2016-10-10 00:15:00', '%Y-%m-%d %H:%M:%S') # computer's local time
                               )
    return handle
def processAllQueues(cfg):
    SQSpool = GreenPool(size=int(cfg.get('GLOBAL','Max_AWS_Connections')))
    FHpool = GreenPool(size=int(cfg.get('GLOBAL','Max_Raw_File_Process')))
    arSects = []
    dGlobal = dict(cfg.items('GLOBAL'))
    for sect in filter(lambda x: iz.notEqualz(x,'GLOBAL','RUNTIME'), cfg.sections()):
        dSect = dict(cfg.items(sect)) # changes all key names to lowercase
        n = dSect['sqs_queue_name']
        nn = dSect['node_name']
        fnbase = "{}_{}".format(nn,n)
        dSect["no_ext_file_name"] = os.path.normpath(os.path.join(cfg.get('RUNTIME','Data_Directory'),fnbase))
        arSects.append(mergeTwoDicts(dGlobal,dSect)) # section overrides global
    arRes = []
    for (que_data,spec_section) in SQSpool.imap(doQueueDownload,arSects):
        if que_data: fileResult = FHpool.spawn(outputQueueToFiles,spec_section,que_data).wait()
        else: fileResult = (False,spec_section['sqs_queue_name'])
        arRes.append(fileResult)
    SQSpool.waitall()
    FHpool.waitall()
    pr.ts_print("Finished Download Pass")
    return None
def main():
    cfgglob = readConfigs(cfgdir, datdir)
    sched = BackgroundScheduler()
    cron_job = setSchedule(sched, cfgglob, 5)
    sched.start(paused=True)
    try:
        change_handle = win32file.FindFirstChangeNotification(cfgdir, 0, win32con.FILE_NOTIFY_CHANGE_FILE_NAME | win32con.FILE_NOTIFY_CHANGE_LAST_WRITE)
        processAllQueues(cfgglob)
        sched.resume() # turn the scheduler back on and monitor both wallclock and config directory.
        cron_job.resume()
        while 1:
            SkipDownload = False
            result = win32event.WaitForSingleObject(change_handle, 500)
            if result == win32con.WAIT_OBJECT_0: # If the WaitForSO returned because of a notification rather than error/timing out
                sched.pause() # make sure we don't run the job as a result of timestamp AND file modification
                while 1:
                    try:
                        win32file.FindNextChangeNotification(change_handle) # rearm - done at start because of the loop structure here
                        cfgglob = None
                        cfgglob = readConfigs(cfgdir, datdir)
                        cron_job.modify(kwargs={'cfg': cfgglob}) # job_id="RQmain",
                        change_handle = win32file.FindFirstChangeNotification(cfgdir, 0, win32con.FILE_NOTIFY_CHANGE_FILE_NAME | win32con.FILE_NOTIFY_CHANGE_LAST_WRITE) # refresh handle
                        if not SkipDownload: processAllQueues(cfgglob)
                        sched.resume()
                        cron_job.resume()
                        break
                    except Exception: # except body elided in the original post
                        pass
    except KeyboardInterrupt:
        if VERBOSE | DEBUG: pr.ts_print("EXITING due to control-C or other exit signal")
    finally:
        sched.print_jobs()
        pr.ts_print(sched.state)
        sched.shutdown(wait=False)
If I comment out most of the processAllQueues function along with the eventlet imports at the top, it fires appropriately. If I keep the
from eventlet import patcher, GreenPool ## pip install eventlet
patcher.monkey_patch(all = True)
but comment out processAllQueues up to the print line second from the end, the APScheduler job fails to fire, indicating that the problem lies either with importing patcher and GreenPool or with the monkey_patch statement. Commenting out only patcher.monkey_patch(all = True) makes it "work" again.
Does anyone know what an alternate monkey_patch statement would be that would work in my circumstances?

You have an explicit event loop watching for file changes. That blocks the eventlet event loop from running. You have two options (a sketch of the first follows):
Wrap blocking calls (such as win32event.WaitForSingleObject()) in eventlet.tpool.execute().
Run eventlet.sleep() before/after blocking calls and make sure you don't block for too long.
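For example, option 1 applied to the main() loop above might look like this; a sketch, assuming the same change_handle and 500 ms timeout as in the question:

from eventlet import tpool
import win32event

# Run the blocking Win32 wait in a real OS thread from eventlet's pool;
# the calling green thread yields in the meantime, so eventlet's hub
# (and anything scheduled on it) keeps running.
result = tpool.execute(win32event.WaitForSingleObject, change_handle, 500)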
eventlet.monkey_patch(thread=False) is a shorter alternative to listing every other module as true. Generally you want thread=True when using locks, thread-local storage, or the threading API to spawn green threads. You may want thread=False if you genuinely use OS threads, as with certain GUI frameworks.
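Applied to the snippet in the question, that would be, roughly:

import eventlet

# Patch everything except the thread module, so real OS threads
# (such as APScheduler's internal timer thread) are left untouched.
eventlet.monkey_patch(thread=False)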
You shouldn't really consider Eventlet on Windows for running important projects. Performance is much inferior to POSIX, and I haven't run tests on Windows since 0.17. It's rather intended for ease of development on a popular desktop platform.

Related

Python / rq - How to pass information from the caller to the worker?

I want to use rq to run tasks on a separate worker to gather data from a measuring instrument. The end of the task will be signaled by a user pressing a button on a dash app.
The problem is that the task itself does not know when to terminate, since it doesn't have access to the dash app's context.
I already use meta to pass information from the worker back to the caller, but can I pass information from the caller to the worker?
Example task:
from rq import get_current_job
from time import time, sleep
import numpy as np

def mock_measurement():
    job = get_current_job()
    t_start = time()
    # Run the measurement
    t = []
    i = []
    job.meta['should_stop'] = False # I want to use this tag to tell the job to stop
    while not job.meta['should_stop']:
        t.append(time() - t_start)
        i.append(np.random.random())
        job.meta['data'] = (t, i)
        job.save_meta()
        sleep(5)
    print("Job Finished")
From the console, I can start a job as such
queue = rq.Queue('test-app', connection=Redis('localhost', 6379))
job = queue.enqueue('tasks.mock_measurement')
and I would like to be able to do this from the console to signal to the worker that it can stop running:
job.meta['should_stop'] = True
job.save_meta()
job.refresh()
However, while the commands above return without an error, they do not actually update the meta dictionary.
Because you didn't fetch the updated meta. But don't do this!
Invoking save_meta and refresh in both the caller and the worker will lose data.
Instead, use job.connection.set(job.id + ':should_stop', 1, ex=300) to set the flag, and job.connection.get(job.id + ':should_stop') to check whether it is set.
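A minimal sketch of that approach; the ':should_stop' key suffix is just a naming convention here, not an rq feature:

from redis import Redis
from rq import Queue, get_current_job

# Caller side: enqueue the task, then set a stop flag keyed off the job id.
queue = Queue('test-app', connection=Redis('localhost', 6379))
job = queue.enqueue('tasks.mock_measurement')
job.connection.set(job.id + ':should_stop', 1, ex=300)

# Worker side, inside the task: poll the flag instead of job.meta.
def mock_measurement():
    job = get_current_job()
    while not job.connection.get(job.id + ':should_stop'):
        pass # take a reading, record it, etc.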

How to prevent Python from opening too many files while threading? [duplicate]

I want to repeatedly execute a function in Python every 60 seconds forever (just like an NSTimer in Objective C or setTimeout in JS). This code will run as a daemon and is effectively like calling the python script every minute using a cron, but without requiring that to be set up by the user.
In this question about a cron implemented in Python, the solution appears to effectively just sleep() for x seconds. I don't need such advanced functionality, so perhaps something like this would work:
while True:
    # Code executed here
    time.sleep(60)
Are there any foreseeable problems with this code?
If your program doesn't have an event loop already, use the sched module, which implements a general-purpose event scheduler.
import sched, time

def do_something(scheduler):
    # schedule the next call first
    scheduler.enter(60, 1, do_something, (scheduler,))
    print("Doing stuff...")
    # then do your stuff

my_scheduler = sched.scheduler(time.time, time.sleep)
my_scheduler.enter(60, 1, do_something, (my_scheduler,))
my_scheduler.run()
If you're already using an event loop library like asyncio, trio, tkinter, PyQt5, gobject, kivy, and many others - just schedule the task using your existing event loop library's methods, instead.
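For instance, with asyncio the same periodic call is, roughly:

import asyncio

async def do_something_forever():
    while True:
        print("Doing stuff...")
        await asyncio.sleep(60) # yields to the event loop between runs

asyncio.run(do_something_forever())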
Lock your time loop to the system clock like this:
import time

starttime = time.time()
while True:
    print("tick")
    time.sleep(60.0 - ((time.time() - starttime) % 60.0))
If you want a non-blocking way to execute your function periodically, instead of a blocking infinite loop I'd use a threaded timer. This way your code can keep running and perform other tasks and still have your function called every n seconds. I use this technique a lot for printing progress info on long, CPU/Disk/Network intensive tasks.
Here's the code I've posted in a similar question, with start() and stop() control:
from threading import Timer

class RepeatedTimer(object):
    def __init__(self, interval, function, *args, **kwargs):
        self._timer = None
        self.interval = interval
        self.function = function
        self.args = args
        self.kwargs = kwargs
        self.is_running = False
        self.start()

    def _run(self):
        self.is_running = False
        self.start()
        self.function(*self.args, **self.kwargs)

    def start(self):
        if not self.is_running:
            self._timer = Timer(self.interval, self._run)
            self._timer.start()
            self.is_running = True

    def stop(self):
        self._timer.cancel()
        self.is_running = False
Usage:
from time import sleep

def hello(name):
    print("Hello %s!" % name)

print("starting...")
rt = RepeatedTimer(1, hello, "World") # it auto-starts, no need of rt.start()
try:
    sleep(5) # your long-running job goes here...
finally:
    rt.stop() # better in a try/finally block to make sure the program ends!
Features:
Standard library only, no external dependencies
start() and stop() are safe to call multiple times even if the timer has already started/stopped
function to be called can have positional and named arguments
You can change interval anytime; it will be effective after the next run. Same for args, kwargs, and even function!
You might want to consider Twisted, which is a Python networking library that implements the reactor pattern.
from twisted.internet import task, reactor

timeout = 60.0 # Sixty seconds

def doWork():
    # do work here
    pass

l = task.LoopingCall(doWork)
l.start(timeout) # call every sixty seconds
reactor.run()
While "while True: sleep(60)" will probably work Twisted probably already implements many of the features that you will eventually need (daemonization, logging or exception handling as pointed out by bobince) and will probably be a more robust solution
Here's an update to the code from MestreLion that avoids drifting over time.
The RepeatedTimer class here calls the given function every "interval" seconds as requested by the OP; the schedule doesn't depend on how long the function takes to execute. I like this solution since it doesn't have external library dependencies; it is just pure Python.
import threading
import time

class RepeatedTimer(object):
    def __init__(self, interval, function, *args, **kwargs):
        self._timer = None
        self.interval = interval
        self.function = function
        self.args = args
        self.kwargs = kwargs
        self.is_running = False
        self.next_call = time.time()
        self.start()

    def _run(self):
        self.is_running = False
        self.start()
        self.function(*self.args, **self.kwargs)

    def start(self):
        if not self.is_running:
            self.next_call += self.interval
            self._timer = threading.Timer(self.next_call - time.time(), self._run)
            self._timer.start()
            self.is_running = True

    def stop(self):
        self._timer.cancel()
        self.is_running = False
Sample usage (copied from MestreLion's answer):
from time import sleep

def hello(name):
    print("Hello %s!" % name)

print("starting...")
rt = RepeatedTimer(1, hello, "World") # it auto-starts, no need of rt.start()
try:
    sleep(5) # your long-running job goes here...
finally:
    rt.stop() # better in a try/finally block to make sure the program ends!
import time, traceback

def every(delay, task):
    next_time = time.time() + delay
    while True:
        time.sleep(max(0, next_time - time.time()))
        try:
            task()
        except Exception:
            traceback.print_exc()
            # in production code you might want to have this instead of course:
            # logger.exception("Problem while executing repetitive task.")
        # skip tasks if we are behind schedule:
        next_time += (time.time() - next_time) // delay * delay + delay

def foo():
    print("foo", time.time())

every(5, foo)
If you want to do this without blocking your remaining code, you can use this to let it run in its own thread:
import threading
threading.Thread(target=lambda: every(5, foo)).start()
This solution combines several features rarely found combined in the other solutions:
Exception handling: As far as possible on this level, exceptions are handled properly, i.e. logged for debugging purposes without aborting the program.
No chaining: The common chain-like implementation (for scheduling the next event) you find in many answers is brittle in that if anything goes wrong within the scheduling mechanism (threading.Timer or whatever), this terminates the chain. No further executions happen then, even if the reason for the problem is already fixed. A simple loop and waiting with a simple sleep() is much more robust in comparison.
No drift: My solution keeps exact track of the times it is supposed to run at. There is no drift depending on the execution time (as in many other solutions).
Skipping: My solution will skip tasks if one execution took too much time (e.g. do X every five seconds, but X took 6 seconds). This is the standard cron behavior (and for a good reason). Many other solutions then simply execute the task several times in a row without any delay. For most cases (e.g. cleanup tasks) this is not desired. If it is desired, simply use next_time += delay instead.
The easiest way, I believe, is:
import time

def executeSomething():
    # code here
    time.sleep(60)

while True:
    executeSomething()
This way your code is executed, then it waits 60 seconds, then it executes again, waits, executes, etc.
No need to complicate things :D
I ended up using the schedule module. The API is nice.
import schedule
import time

def job():
    print("I'm working...")

schedule.every(10).minutes.do(job)
schedule.every().hour.do(job)
schedule.every().day.at("10:30").do(job)
schedule.every(5).to(10).minutes.do(job)
schedule.every().monday.do(job)
schedule.every().wednesday.at("13:15").do(job)
schedule.every().minute.at(":17").do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
An alternative, flexible solution is APScheduler.
pip install apscheduler
from apscheduler.schedulers.blocking import BlockingScheduler

def print_t():
    pass

sched = BlockingScheduler()
sched.add_job(print_t, 'interval', seconds=60) # runs print_t every 60 seconds
sched.start()
Also, APScheduler provides many kinds of schedulers, as follows (an AsyncIOScheduler sketch follows the list).
BlockingScheduler: use when the scheduler is the only thing running in your process
BackgroundScheduler: use when you’re not using any of the frameworks below, and want the scheduler to run in the background inside your application
AsyncIOScheduler: use if your application uses the asyncio module
GeventScheduler: use if your application uses gevent
TornadoScheduler: use if you’re building a Tornado application
TwistedScheduler: use if you’re building a Twisted application
QtScheduler: use if you’re building a Qt application
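For example, a minimal AsyncIOScheduler sketch along the lines of the BlockingScheduler example above:

import asyncio
from apscheduler.schedulers.asyncio import AsyncIOScheduler

def tick():
    print('tick')

sched = AsyncIOScheduler()
sched.add_job(tick, 'interval', seconds=60)
sched.start()
asyncio.get_event_loop().run_forever() # the scheduler rides the asyncio loop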
I faced a similar problem some time back. Maybe http://cronus.readthedocs.org might help?
For v0.2, the following snippet works
import cronus.beat as beat

beat.set_rate(2) # run twice per second
while beat.true():
    # do some time consuming work here
    beat.sleep() # total loop duration would be 0.5 sec
The main difference between that and cron is that an exception will kill the daemon for good. You might want to wrap with an exception catcher and logger.
If drift is not a concern:
import threading, time

def print_every_n_seconds(n=2):
    while True:
        print(time.ctime())
        time.sleep(n)

thread = threading.Thread(target=print_every_n_seconds, daemon=True)
thread.start()
Which asynchronously outputs:
#Tue Oct 16 17:29:40 2018
#Tue Oct 16 17:29:42 2018
#Tue Oct 16 17:29:44 2018
If the task being run takes an appreciable amount of time, then the interval becomes 2 seconds plus the task time, so if you need precise scheduling then this is not for you.
Note that the daemon=True flag means this thread won't block the app from shutting down. For example, I had an issue where pytest would hang indefinitely after running tests, waiting for this thread to cease.
Simply use
import time

while True:
    print("this will run after every 30 sec")
    # Your code here
    time.sleep(30)
One possible answer:
import time

t = time.time()
while True:
    if time.time() - t > 10:
        # run your task here
        t = time.time()
I use the Tkinter after() method, which doesn't "steal the game" (like the sched module that was presented earlier), i.e. it allows other things to run in parallel:
import tkinter

def do_something1():
    global n1
    n1 += 1
    if n1 == 6: # (Optional condition)
        print("* do_something1() is done *"); return
    # Do your stuff here
    # ...
    print("do_something1() " + str(n1))
    tk.after(1000, do_something1)

def do_something2():
    global n2
    n2 += 1
    if n2 == 6: # (Optional condition)
        print("* do_something2() is done *"); return
    # Do your stuff here
    # ...
    print("do_something2() " + str(n2))
    tk.after(500, do_something2)

tk = tkinter.Tk()
n1 = 0; n2 = 0
do_something1()
do_something2()
tk.mainloop()
do_something1() and do_something2() can run in parallel and at whatever interval speeds. Here, the second one will be executed twice as fast. Note also that I have used a simple counter as a condition to terminate either function. You can use whatever other condition you like, or none if you want a function to run until the program terminates (e.g. a clock).
Here's an adapted version of the code from MestreLion.
In addition to the original function, this code:
1) adds first_interval, used to fire the timer at a specific time (the caller needs to calculate first_interval and pass it in);
2) solves a race condition in the original code. In the original code, if the control thread failed to cancel the running timer ("Stop the timer, and cancel the execution of the timer's action. This will only work if the timer is still in its waiting stage." quoted from https://docs.python.org/2/library/threading.html), the timer will run endlessly.
from threading import Timer

class RepeatedTimer(object):
    def __init__(self, first_interval, interval, func, *args, **kwargs):
        self.timer = None
        self.first_interval = first_interval
        self.interval = interval
        self.func = func
        self.args = args
        self.kwargs = kwargs
        self.running = False
        self.is_started = False

    def first_start(self):
        try:
            # no race-condition here because only control thread will call this method
            # if already started will not start again
            if not self.is_started:
                self.is_started = True
                self.timer = Timer(self.first_interval, self.run)
                self.running = True
                self.timer.start()
        except Exception as e:
            log_print(syslog.LOG_ERR, "timer first_start failed %s %s" % (e, traceback.format_exc()))
            raise

    def run(self):
        # if not stopped start again
        if self.running:
            self.timer = Timer(self.interval, self.run)
            self.timer.start()
        self.func(*self.args, **self.kwargs)

    def stop(self):
        # cancel current timer in case failed it's still OK
        # if already stopped doesn't matter to stop again
        if self.timer:
            self.timer.cancel()
        self.running = False
Here is another solution without using any extra libraries.
import time

def delay_until(condition_fn, interval_in_sec, timeout_in_sec):
    """Delay using a boolean callable function.

    `condition_fn` is invoked every `interval_in_sec` until `timeout_in_sec`.
    It can break early if condition is met.

    Args:
        condition_fn - a callable boolean function
        interval_in_sec - wait time between calling `condition_fn`
        timeout_in_sec - maximum time to run

    Returns: None
    """
    start = last_call = time.time()
    while time.time() - start < timeout_in_sec:
        if (time.time() - last_call) > interval_in_sec:
            if condition_fn() is True:
                break
            last_call = time.time()
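Hypothetical usage, polling for a marker file every 2 seconds for at most 30 seconds (the path is made up for illustration):

import os

delay_until(lambda: os.path.exists('/tmp/ready.flag'), 2, 30)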
I use this to cause 60 events per hour with most events occurring at the same number of seconds after the whole minute:
import math
import time
import random

TICK = 60 # one minute tick size
TICK_TIMING = 59 # execute on 59th second of the tick
TICK_MINIMUM = 30 # minimum catch up tick size when lagging

def set_timing():
    now = time.time()
    elapsed = now - info['begin']
    minutes = math.floor(elapsed/TICK)
    tick_elapsed = now - info['completion_time']
    if (info['tick']+1) > minutes:
        wait = max(0, (TICK_TIMING - (time.time() % TICK)))
        print('standard wait: %.2f' % wait)
        time.sleep(wait)
    elif tick_elapsed < TICK_MINIMUM:
        wait = TICK_MINIMUM - tick_elapsed
        print('minimum wait: %.2f' % wait)
        time.sleep(wait)
    else:
        print('skip set_timing(); no wait')
    drift = ((time.time() - info['begin']) - info['tick']*TICK -
             TICK_TIMING + info['begin'] % TICK)
    print('drift: %.6f' % drift)

info = {} # shared state: set_timing() reads it, the main loop updates it
info['tick'] = 0
info['begin'] = time.time()
info['completion_time'] = info['begin'] - TICK

while 1:
    set_timing()
    print('hello world')
    # random real world event
    time.sleep(random.random()*TICK_MINIMUM)
    info['tick'] += 1
    info['completion_time'] = time.time()
Depending upon actual conditions you might get ticks of length:
60,60,62,58,60,60,120,30,30,60,60,60,60,60...etc.
but at the end of 60 minutes you'll have 60 ticks; and most of them will occur at the correct offset to the minute you prefer.
On my system I get typical drift of < 1/20th of a second until the need for correction arises.
The advantage of this method is resolution of clock drift, which can cause issues if you're doing things like appending one item per tick and expecting 60 items appended per hour. Failure to account for drift can cause secondary indications like moving averages to consider data too deep into the past, resulting in faulty output.
e.g., Display current local time
import datetime
import glib
import logger

def get_local_time():
    current_time = datetime.datetime.now().strftime("%H:%M")
    logger.info("get_local_time(): %s", current_time)
    return str(current_time)

def display_local_time():
    logger.info("Current time is: %s", get_local_time())
    return True

# call every minute
glib.timeout_add(60*1000, display_local_time)
timed-count can do that to high precision (i.e. < 1 ms), as it's synchronized to the system clock. It won't drift over time and isn't affected by the length of the code execution time (provided that's less than the interval period, of course).
A simple, blocking example:
from timed_count import timed_count

for count in timed_count(60):
    # Execute code here exactly every 60 seconds
    ...
You could easily make it non-blocking by running it in a thread:
from threading import Thread
from timed_count import timed_count

def periodic():
    for count in timed_count(60):
        # Execute code here exactly every 60 seconds
        ...

thread = Thread(target=periodic)
thread.start()
''' tracking number of times it prints '''
import threading

count = 0

def printit():
    threading.Timer(timeInterval, printit).start()
    print("Hello, World!")
    global count
    count = count + 1
    print(count)

if __name__ == "__main__":
    timeInterval = int(input('Enter Time in Seconds:'))
    printit()
I think it depends on what you want to do, and your question didn't specify lots of details.
For me, I want to do an expensive operation in one of my already multithreaded processes. So I have the leader process check the time, and only it does the expensive op (checkpointing a deep learning model). To do this, I increment a counter to make sure 5, then 10, then 15 seconds have passed, saving every 5 seconds (or use modular arithmetic with math.floor):
import argparse
import math
import time

def print_every_5_seconds_have_passed_exit_eventually():
    """
    https://stackoverflow.com/questions/3393612/run-certain-code-every-n-seconds
    https://stackoverflow.com/questions/474528/what-is-the-best-way-to-repeatedly-execute-a-function-every-x-seconds
    :return:
    """
    opts = argparse.Namespace(start=time.time())
    next_time_to_print = 0
    while True:
        current_time_passed = time.time() - opts.start
        if current_time_passed >= next_time_to_print:
            next_time_to_print += 5
            print(f'worked and {current_time_passed=}')
            print(f'{current_time_passed % 5=}')
            print(f'{math.floor(current_time_passed % 5) == 0}')
starting __main__ at __init__
worked and current_time_passed=0.0001709461212158203
current_time_passed % 5=0.0001709461212158203
True
worked and current_time_passed=5.0
current_time_passed % 5=0.0
True
worked and current_time_passed=10.0
current_time_passed % 5=0.0
True
worked and current_time_passed=15.0
current_time_passed % 5=0.0
True
To me, the check in the if statement is what I need. Having threads and schedulers in my already complicated multiprocessing, multi-GPU code is not a complexity I want to add if I can avoid it, and it seems I can. Checking the worker id makes it easy to ensure only one process is doing this.
Note that I used the True print statements to really make sure the modular arithmetic trick worked, since checking for an exact time is obviously not going to work! But to my pleasant surprise, the floor did the trick.

Python BackgroundScheduler program crashing when ran from another module

I am trying to build an application that will run a bash script every 10 minutes. I am using apscheduler to accomplish this, and when I run my code from the terminal it works like clockwork. However, when I try to run the code from another module it crashes. I suspect that the calling module is waiting for the "schedule" module to finish and then crashes when that never happens.
Error code
/bin/bash: line 1: 13613 Killed ( python ) < /tmp/vIZsEfp/26
shell returned 137
Function that calls schedule
def shedual_toggled(self, widget):
    prosessSchedular.start_background_checker()
Schedule Program
def schedul_check():
    """set up to call prosess checker every 10 mins"""
    print("%s check ran" % counter)
    counter += 1
    app = prosessCheckerv3.call_bash() # calls the bash file
    if app == False:
        print("error with bash")
        return False
    else:
        prosessCheckerv3.build_snap_shot(app)

def start_background_checker():
    scheduler = BackgroundScheduler()
    scheduler.add_job(schedul_check, 'interval', minutes=10)
    scheduler.start()
    while True:
        time.sleep(2)

if __name__ == '__main__':
    start_background_checker()
This program simply calls another every 10 minutes. As a side note, I have been trying to stay as far away from multi-threading as possible, but if that is required, so be it.
Well, I managed to figure it out myself. The issue is that GTK+ is not thread safe, so the timed module needs to either be run in another thread, or you can release/enter the GTK thread lock before/after calling the module.
I just did it like this.
def shedual_toggeld(self, widget):
    onOffSwitch = widget.get_active()
    """ After main GTK has logicly finished all GUI work run thread on toggel button """
    thread = threading.Thread(target=self.call_schedual, args=(onOffSwitch,))
    thread.daemon = True
    thread.start()

def call_schedual(self, onOffSwitch):
    if onOffSwitch == True:
        self.sch.start_background_checker()
    else:
        self.sch.stop_background_checker()
This article goes through it in more detail; hopefully someone else will find it useful.
http://blogs.operationaldynamics.com/andrew/software/gnome-desktop/gtk-thread-awareness

Periodic python thread that can be asynchronously poked into action

Making a thread that does something periodically is as simple as setting its target to a function that looks something like this:
minute = 60
nextTime = time.time()
while True:
    Do_Some_Stuff()
    nextTime += minute
    sleep_time = nextTime - time.time()
    if sleep_time > 0:
        time.sleep(sleep_time)
But... what if I want the ability for another thread to poke into action before the period is expired in some cases?
I came up with the following:
Trigger = threading.Condition()

def loop():
    while True:
        Do_Some_Stuff()
        with Trigger:
            Trigger.wait(timeout=60)
If I launch a Thread with loop as its target, it will run once a minute, unless I execute a
Trigger.notify()
from elsewhere, in which case it runs right away. Is there a better way to do this? I played with both Semaphore and Event implementations, but they both took off once I poked them asynchronously.
And what definitely eludes me is how I might not just poke it, but poke it at some future point. IOW, whatever the current wait, I'd like the next one to be 5 seconds from now (I guess I could fork another short-lived thread that delayed 5 seconds and then drove the Trigger).
Thought I'd at least share what I came up with. I used a Queue from the queue module. Here's some example code (meant to be run interactively):
import threading
import queue
import datetime
import time

Trigger = queue.Queue()

def periodicAction():
    while True:
        print(datetime.datetime.now()) # arbitrary action
        try:
            delta = Trigger.get(timeout=20)
            if delta is not None:
                time.sleep(delta)
        except queue.Empty:
            pass

loop = threading.Thread(target=periodicAction)
loop.start()

# now dance with the user
while True:
    response = input('Wakeup how soon? (Enter for immediate)')
    try:
        delta = float(response)
    except ValueError:
        delta = None
    Trigger.put(delta)

Python Watchdog issue - missing events

I am using Python Watchdog to monitor a folder on Ubuntu. It's working fine with 1 or 2 files, but when I moved 50 files with the command mv *.xml dest_folder, it received only 2 events and processed only 2 files. Below is the code.
def on_moved(self, event):
    try:
        logger.debug("on_moved event :" + str(event))
        self._validate_xml(event.dest_path)
    except Exception as ex:
        logger.exception(ex)
If I comment out the _validate_xml call, then I receive all 45 events.
Can anyone tell me what exactly happened in Watchdog, and what is the best solution for this?
I haven't used Python Watchdog, but from a generic real-time systems perspective:
processing XML with _validate_xml can be slow and make you miss events;
an event is similar to an interrupt, so handling should be as fast as possible.
The more you do while handling an event, the less "real-time" your system becomes. What you can do is offload the XML validity check to another process and exchange messages with a Queue (the message would be event.dest_path) for the paths you have seen moving. Your event handling will be as simple as putting messages on a queue, and the files can be processed in batch by the consumer of the queue.
In short (a sketch follows the list):
instantiate a Queue
fork() a process
in the on_moved handler, put messages on the queue
in the forked process, pop messages from the queue and call _validate_xml
you may optionally leverage multiprocessing.Pool to validate XML files in parallel.
Good luck.
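A minimal sketch of that hand-off; the names validate_worker and xml_queue are made up for illustration, and _validate_xml stands in for the question's validator:

import multiprocessing as mp

def _validate_xml(path):
    # stand-in for the question's validator
    print('validating', path)

def validate_worker(q):
    # Consumer process: drain paths and do the slow XML validation here,
    # out of the watchdog event thread's way.
    while True:
        dest_path = q.get()
        if dest_path is None: # sentinel to shut the worker down
            break
        _validate_xml(dest_path)

xml_queue = mp.Queue()
mp.Process(target=validate_worker, args=(xml_queue,), daemon=True).start()

# The watchdog handler then only enqueues, which is fast:
# def on_moved(self, event):
#     xml_queue.put(event.dest_path)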
EDIT: tested on my system; most of the comments above seem not to apply, because watchdog's code seems to handle threading just fine.
#!/usr/bin/env python
import time
from watchdog.observers import Observer, api
from watchdog.events import LoggingEventHandler, FileSystemEventHandler, FileMovedEvent
import logging

def counter_gen():
    count = 0
    while True:
        count += 1
        yield count

class XmlValidatorHandler(FileSystemEventHandler):
    sleep_time = 0.1
    COUNTER = counter_gen()

    def on_moved(self, event):
        if isinstance(event, FileMovedEvent):
            print('%s - event %d; validate: %s' % (
                type(self).__name__, next(self.COUNTER), event.dest_path))
            time.sleep(self.sleep_time)

class SlowXmlValidatorHandler(XmlValidatorHandler):
    sleep_time = 2
    COUNTER = counter_gen()

def get_observer(handler):
    observer = Observer(timeout=0.5)
    observer.event_queue.maxsize = 10
    observer.schedule(handler, path='.', recursive=True)
    return observer

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    event_handler = LoggingEventHandler()
    observer1 = get_observer(XmlValidatorHandler())
    observer2 = get_observer(SlowXmlValidatorHandler())
    observer1.start()
    observer2.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer1.stop()
        observer2.stop()
    observer1.join()
    observer2.join()
I wasn't able to reproduce your issue. Some pointers:
check the queue maxsize; if you already have items in there and they don't get handled in a timely fashion, then my guess is that the timeout kicks in and the event is lost. You may want to resize it in that case.
check the timeout; if it is configured, you may want to tune that parameter.
Maybe a more complete snippet would help us help you.
