Python Watchdog issue - missing events

I am using Python Watchdog to monitor a folder on Ubuntu. It works fine with 1 or 2 files, but when I moved 50 files with the command mv *.xml dest_folder, it received only 2 events and processed only 2 files. Below is the code.
def on_moved(self, event):
    try:
        logger.debug("on_moved event: " + str(event))
        self._validate_xml(event.dest_path)
    except Exception as ex:
        logger.exception(ex)
If I comment out the _validate_xml call, then I receive all 45 events.
Can anyone tell me what exactly is happening in Watchdog, and what is the best solution for this?

I haven't used Python Watchdog, but from a generic real-time systems perspective:
processing XML with _validate_xml can be slow and make you miss events.
An event is similar to an interrupt: handling should be as fast as possible.
The more you do while handling an event, the less "real-time" your system becomes. What you can do is offload the XML validity check to another process and exchange messages with it via a Queue (the message would be event.dest_path, i.e. the path you saw moving). Your event handling then becomes as simple as putting a message on the queue, and the files can be processed in batches by the consumer of the queue.
In short:
- instantiate a Queue
- fork() a worker process
- in the on_moved handler, put messages on the queue
- in the worker process, pop messages from the queue and call _validate_xml
- optionally, use a multiprocessing.Pool to validate the XML files in parallel (a minimal sketch of the basic pattern follows)
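A rough sketch of that pattern, using multiprocessing.Process rather than a raw fork(); the handler class and function names here are placeholders, not the original poster's code:

import multiprocessing
from watchdog.events import FileSystemEventHandler

def validate_xml(path):
    print("validating", path)      # placeholder for the real, slow XML check

def validate_worker(queue):
    while True:
        path = queue.get()
        if path is None:           # sentinel: stop the worker cleanly
            break
        validate_xml(path)         # the slow work happens off the event thread

class EnqueueOnMoved(FileSystemEventHandler):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue

    def on_moved(self, event):
        # keep the handler as fast as an interrupt: just record the path
        self.queue.put(event.dest_path)

queue = multiprocessing.Queue()
worker = multiprocessing.Process(target=validate_worker, args=(queue,))
worker.start()
# schedule EnqueueOnMoved(queue) on your Observer as usual; put None on the
# queue (and join the worker) when shutting down.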
good luck.
EDIT: I tested this on my system; most of my comments above seem not to apply, because watchdog's code seems to handle threading just fine.
#!/usr/bin/env python
import time
from watchdog.observers import Observer, api
from watchdog.events import LoggingEventHandler, FileSystemEventHandler, FileMovedEvent
import logging


def counter_gen():
    count = 0
    while True:
        count += 1
        yield count


class XmlValidatorHandler(FileSystemEventHandler):
    sleep_time = 0.1
    COUNTER = counter_gen()

    def on_moved(self, event):
        if isinstance(event, FileMovedEvent):
            print '%s - event %d; validate: %s' % (
                type(self).__name__, self.COUNTER.next(), event.dest_path)
            time.sleep(self.sleep_time)


class SlowXmlValidatorHandler(XmlValidatorHandler):
    sleep_time = 2
    COUNTER = counter_gen()


def get_observer(handler):
    observer = Observer(timeout=0.5)
    observer.event_queue.maxsize = 10
    observer.schedule(handler, path='.', recursive=True)
    return observer


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    event_handler = LoggingEventHandler()
    observer1 = get_observer(XmlValidatorHandler())
    observer2 = get_observer(SlowXmlValidatorHandler())
    observer1.start()
    observer2.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer1.stop()
        observer2.stop()
    observer1.join()
    observer2.join()
I wasn't able to reproduce your issue. Some pointers:
- Check the queue maxsize. If items already sit in the queue and they aren't handled in a timely fashion, my guess is that the timeout kicks in and the event is lost. You may want to enlarge the queue in that case.
- Check the timeout. If it is configured, you may want to tune that parameter; a small tuning sketch follows.
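A rough tuning sketch (assuming watchdog's event queue follows the standard queue.Queue convention where maxsize <= 0 means unbounded; XmlValidatorHandler is reused from the snippet above):

from watchdog.observers import Observer

handler = XmlValidatorHandler()        # any handler from the snippet above
observer = Observer(timeout=2.0)       # blocking timeout used when dispatching events
observer.event_queue.maxsize = 0       # <= 0 means no bound for standard Python queues
observer.schedule(handler, path='.', recursive=True)
observer.start()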
Maybe a more complete snippet would help us help you.


Python - improving logging to file and console using multiprocessing

I am trying to download a file over a CAN bus using python-can. It involves sending data very quickly (on the order of 2-3 messages per millisecond). I am trying to log these messages to a file without impacting the sending speed, but the file I/O slows down the sending because of the logging overhead. I tried various approaches to improve this (including using queues and reading the queue from another thread, but this was not much better, possibly due to the GIL). Most of these tests started with the Python logging module and tried various handlers (QueueHandler/QueueListener, MemoryHandler, etc.).
I've managed to make some significant improvements by moving the file I/O into a separate process. I initially ran into an issue with the overhead of sending data from one process to another, so I now buffer it. Instead of taking 150% longer, as with direct file I/O in the main process, I now see only a ~20% increase in time.
I thought that, since the logging runs in another process, I could also print() the data to the console (which I know is relatively expensive), but I see a huge increase in the file download time.
What is happening that makes the print() affect the main process even though it runs in a child process?
Code below:
file_logger_mp() is called from the main process and starts the child process that does the logging. The main process then uses the log_hdl function to add messages to a buffer. When the buffer reaches a certain size (100 messages), it is sent to the child process for logging to file or printing to the console.
Device: rpi4. The main process uses asyncio, in case that affects things.
import atexit
import multiprocessing


def file_logger_mp(logger_name: str, log_file_pth: str):
    conn_rec, conn_send = multiprocessing.Pipe()
    log_hdl_c = MyLogger(conn_send)
    log_hdl = log_hdl_c.log_hdl  # This is used by main code to provide log messages to the child process
    listener = MyProcess(conn_rec, log_file_pth)
    atexit.register(log_hdl_c.final_flush, listener)
    listener.start()  # Start the child process
    return log_hdl, listener


class MyLogger():
    def __init__(self, conn_send) -> None:
        self.buffer = []
        self.conn_send = conn_send

    def log_hdl(self, msg):
        self.buffer.append(msg)
        if len(self.buffer) > 100:
            self.conn_send.send(self.buffer)
            self.buffer.clear()

    def final_flush(self, listener):
        self.conn_send.send(self.buffer)
        listener.terminate()


class MyProcess(multiprocessing.Process):
    def __init__(self, queue, f_hdl):
        multiprocessing.Process.__init__(self)
        self.exit = multiprocessing.Event()
        self.queue = queue
        self.f_hdl = f_hdl

    def run(self):
        f = open(self.f_hdl, "w+")
        while not self.exit.is_set():
            try:
                record = self.queue.recv()
                for msg in record:
                    output = str(msg)
                    f.write(output + '\n')
                    print(output)  # This `print()` causes large delays to the main process?!
                record.clear()
            except Exception:
                import sys, traceback
                print('Whoops! Problem:', file=sys.stderr)
                traceback.print_exc(file=sys.stderr)
        for msg in record:  # Flush any pending records before finishing
            f.write(str(msg) + '\n')
        f.close()

    def terminate(self):
        self.exit.set()
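For reference, a hypothetical usage of the snippet above (path and messages are made up, not from the real program):

log_hdl, listener = file_logger_mp("can_logger", "/tmp/can_messages.log")

for i in range(1000):
    log_hdl("message %d" % i)   # buffered; forwarded to the child process in batches of 100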

Should I use Events, Semaphores, Locks, Conditions, or a combination thereof to manage safely exiting my multithreaded Python program?

I am writing a multithreaded Python program in which the main thread and the other threads it spawns run as daemons (but not with Thread.daemon=True) that look for certain files in certain directories and perform actions on them when they exist. It is possible for an error to occur in any of the threads that would require the whole program to exit. However, I need the other threads to finish their current job before exiting.
From what I understand, if I set myThread.daemon=True for my spawned threads, they will automatically exit immediately when the main thread exits. However, I want the other threads to finish their current job before exiting (unless the error is some sort of catastrophic failure, in which case I'll probably just exit everything anyway, safely or not). Therefore, I am not setting the daemon property to True for the threads.
Looking at the threading module documentation and the various objects available to me such as Events, Semaphores, Conditions, and Locks, I'm unsure of the best way to handle my situation. Additionally, I'm unsure how to handle this scenario when the program needs to terminate due to SIGTERM/SIGINT signals.
Some code that illustrates a simplified version of the structure of my program:
import glob
import signal
import sys
import threading
import time


class MyThread1(threading.Thread):
    def __init__(self, name='MyThread1'):
        threading.Thread.__init__(self)
        self.name = name

    def run(self):
        while True:
            filePathList = glob.glob(thisThreadDir + '/*.txt')
            for file in filePathList:
                try:
                    doSomeProcessing(file)
                    # Then move the file to another thread's dir
                    # or potentially create a new file that will
                    # be picked up by another thread
                except:
                    # Need to potentially tell all other threads
                    # to finish task and exit depending on error
                    pass
            # I assume this would be the place to check for some kind of
            # flag or other indication to terminate the thread?
            time.sleep(30)

# Now imagine a few more custom threads with the same basic structure,
# except that what is happening in doSomeProcessing() will obviously vary


# Main Thread/Script
def sigintHandler(SIGINT, frame):
    # How do I handle telling all threads to finish their current loop
    # and then exit safely when I encounter this signal?
    sys.exit(1)


def sigtermHandler(SIGTERM, frame):
    # Same question for this signal handler
    sys.exit(1)


signal.signal(signal.SIGINT, sigintHandler)
signal.signal(signal.SIGTERM, sigtermHandler)

myOtherThread1 = MyThread1()
myOtherThreadN = MyThreadN()
myOtherThread1.start()
myOtherThreadN.start()

while True:
    filePathList = glob.glob(mainDir + '/*.txt')
    for file in filePathList:
        try:
            doMainProcessing(file)
            # Move file or write a new one in another thread's dir
        except:
            # Again, potentially need to exit the whole program, but want
            # the other threads to finish their current loop first
            pass
    # Check if another thread told us we need to exit?
    time.sleep(30)
I would use an Event to signal to a thread that it should exit:
- create an Event in __init__
- use the event's wait() in run(), both for sleeping and for checking when to exit
- set the event from outside to stop the thread
To handle exceptions within a thread, I would have a try/except block around everything it does. When something is caught, store the exception (and/or any other info you need), clean up, and exit the thread.
Outside, in the main thread, check all threads for stored exceptions; if any exception is found, signal to all threads that they should exit.
To handle exceptions in the main thread (which also covers SIGINT), have a try/except block there and signal all threads to stop.
All together, with dummy exceptions and debug prints:
import threading
import time


class MyThread(threading.Thread):
    def __init__(self):
        super().__init__()
        self.stop_requested = threading.Event()
        self.exception = None

    def run(self):
        try:
            # sleep for 1 second, or until stop is requested; stop if stop is requested
            while not self.stop_requested.wait(1):
                # do your thread thing here
                print('in thread {}'.format(self))
                # simulate a random exception:
                import random
                if random.randint(0, 50) == 42:
                    1 / 0
        except Exception as e:
            self.exception = e
        # clean up here
        print('clean up thread {}'.format(self))

    def stop(self):
        # set the event to signal stop
        self.stop_requested.set()


# create and start some threads
threads = [MyThread(), MyThread(), MyThread(), MyThread()]
for t in threads:
    t.start()

# main thread looks at the status of all threads
try:
    while True:
        for t in threads:
            if t.exception:
                # there was an error in a thread - raise it in the main thread too
                # this will stop the loop
                raise t.exception
        time.sleep(0.2)
except Exception as e:
    # handle exceptions any way you like, or don't
    # This includes exceptions in the main thread as well as those in other threads
    # (because of "raise t.exception" above)
    print(e)
finally:
    print('clean up everything')
    for t in threads:
        # threads will know how to clean up when stopped
        t.stop()
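The question also asks about SIGTERM/SIGINT. One hedged way to fold that into the same pattern (not part of the answer above) is to have the signal handler set each thread's stop event instead of calling sys.exit(), so threads finish their current loop before the process exits. A sketch, assuming the MyThread class defined above:

import signal

threads = [MyThread() for _ in range(4)]
for t in threads:
    t.start()

def request_shutdown(signum, frame):
    for t in threads:
        t.stop()          # each thread's wait() returns True and run() exits

signal.signal(signal.SIGTERM, request_shutdown)
signal.signal(signal.SIGINT, request_shutdown)

for t in threads:
    t.join()              # returns once every thread has cleaned up and exited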

Multiple threads with Autobahn, ApplicationRunner and ApplicationSession

python-running-autobahnpython-asyncio-websocket-server-in-a-separate-subproce
can-an-asyncio-event-loop-run-in-the-background-without-suspending-the-python-in
I was trying to solve my issue with the two links above, but I have not managed to.
I get the following error: RuntimeError: There is no current event loop in thread 'Thread-1'.
Here is the code sample (Python 3):
from autobahn.asyncio.wamp import ApplicationSession
from autobahn.asyncio.wamp import ApplicationRunner
from asyncio import coroutine
import time
import threading


class PoloniexWebsocket(ApplicationSession):
    def onConnect(self):
        self.join(self.config.realm)

    @coroutine
    def onJoin(self, details):
        def on_ticker(*args):
            print(args)
        try:
            yield from self.subscribe(on_ticker, 'ticker')
        except Exception as e:
            print("Could not subscribe to topic:", e)


def poloniex_worker():
    runner = ApplicationRunner("wss://api.poloniex.com:443", "realm1")
    runner.run(PoloniexWebsocket)


def other_worker():
    while True:
        print('Thank you')
        time.sleep(2)


if __name__ == "__main__":
    polo_worker = threading.Thread(None, poloniex_worker, None, (), {})
    thank_worker = threading.Thread(None, other_worker, None, (), {})
    polo_worker.start()
    thank_worker.start()
    polo_worker.join()
    thank_worker.join()
So, my final goal is to have 2 threads launched at the start. Only one of them needs to use ApplicationSession and ApplicationRunner. Thank you.
A separate thread must have its own event loop. So if poloniex_worker needs to listen to a websocket, it needs its own event loop:
def poloniex_worker():
    asyncio.set_event_loop(asyncio.new_event_loop())
    runner = ApplicationRunner("wss://api.poloniex.com:443", "realm1")
    runner.run(PoloniexWebsocket)
But if you're on a Unix machine, you will face another error if you try this. Autobahn's asyncio integration uses Unix signals, but those Unix signals only work in the main thread. You can simply turn off the signal handling if you don't plan on using it. To do that, go to the file where ApplicationRunner is defined. On my machine that is wamp.py in python3.5 > site-packages > autobahn > asyncio. You can comment out the signal handling section of the code like so:
# try:
#     loop.add_signal_handler(signal.SIGTERM, loop.stop)
# except NotImplementedError:
#     # signals are not available on Windows
#     pass
All this is a lot of work. If you don't absolutely need to run your ApplicationSession in a separate thread from the main thread, it's better to just run the ApplicationSession in the main thread.
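A hedged sketch of that simpler arrangement, reusing PoloniexWebsocket and other_worker from the question: the WAMP session stays in the main thread and the periodic worker moves to a background daemon thread.

import threading
from autobahn.asyncio.wamp import ApplicationRunner

if __name__ == "__main__":
    # daemon=True so the worker does not keep the process alive on its own
    thank_worker = threading.Thread(target=other_worker, daemon=True)
    thank_worker.start()

    runner = ApplicationRunner("wss://api.poloniex.com:443", "realm1")
    runner.run(PoloniexWebsocket)  # blocks and runs the asyncio loop in the main thread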

The thread hangs using FTP LIST with Python

I'm using ftplib to connect to an FTP server and get its file list.
The problem I have is that the connection hangs from time to time, and I don't know why. I'm running the Python script as a daemon, using threads.
See what I mean:
def main():
    signal.signal(signal.SIGINT, signal_handler)
    app.db = MySQLWrapper()
    try:
        app.opener = FTP_Opener()
        mainloop = MainLoop()
        while not app.terminate:
            # suspend main thread until the queue terminates
            # this lets the queue restart automatically in case of unexpected shutdown
            mainloop.join(10)
            while (not app.terminate) and (not mainloop.isAlive()):
                time.sleep(script_timeout)
                print time.ctime(), "main: trying to restart the queue"
                try:
                    mainloop = MainLoop()
                except Exception:
                    time.sleep(60)
    finally:
        app.db.close()
        app.db = None
        app.opener = None
        mainloop = None
        try:
            os.unlink(PIDFILE)
        except:
            pass
        # give other threads time to terminate
        time.sleep(1)
        print time.ctime(), "main: main thread terminated"
MainLoop() has some functions for connecting to the FTP server, downloading specific files, and disconnecting.
Here's how I get the file list:
file_list = app.opener.load_list()
And here is what the FTP_Opener.load_list() function looks like:
def load_list(self):
    attempts = 0
    while attempts <= ftp_config.load_max_attempts:
        attempts += 1
        filelist = []
        try:
            self._connect()
            self._chdir()
            # retrieve file list to 'filelist' var
            self.FTP.retrlines('LIST', lambda s: filelist.append(s))
            filelist = self._filter_filelist(self._parse_filelist(filelist))
            return filelist
        except Exception:
            print sys.exc_info()
            self._disconnect()
            sleep(0.1)
    print time.ctime(), "FTP Opener: can't load file list"
    return []
Why does the FTP connection sometimes hang, and how can I monitor this? If it happens, I would like to terminate the thread somehow and start a new one.
Thanks
If you are building for robustness, I would highly recommend that you look into using an event-driven approach. One such library with FTP support is Twisted (API).
The advantage is that you don't block the thread while waiting for I/O, and you can create simple timer functions to monitor your connections if you prefer. It also scales a lot better. It is slightly more complicated to code using event-driven patterns, so if this is just a simple script it may or may not be worth the extra effort; but since you write that you are writing a daemon, it might be worth looking into.
Here is an example of an FTP client: ftpclient.py
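Separately from the event-driven route (and not part of the answer above): if you stay with ftplib, passing a timeout when connecting is a common mitigation, since a stalled LIST then raises socket.timeout instead of hanging the thread forever. A minimal sketch with placeholder host and credentials:

import ftplib

# Placeholder host/credentials; the timeout is in seconds and, in the default
# passive mode, also applies to the data connection used by retrlines('LIST', ...).
ftp = ftplib.FTP(host="ftp.example.com", user="anonymous", passwd="", timeout=30)
ftp.cwd("/incoming")
filelist = []
ftp.retrlines("LIST", filelist.append)
ftp.quit()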

Is there a python library for notification and waiting?

I'm using python-zookeeper for locking, and I'm trying to figure out a way to make execution wait for a notification while it's watching a file, because zookeeper.exists() returns immediately rather than blocking.
Basically, I have the code listed below, but I'm unsure of the best way to implement the notify() and wait_for_notification() functions. It could be done with os.kill() and signal.pause(), but I'm sure that's likely to cause problems if I later have multiple locks in one program - is there a specific Python library that is good for this sort of thing?
def get_lock(zh):
    lockfile = zookeeper.create(zh, lockdir + '/guid-lock-', 'lock',
                                [ZOO_OPEN_ACL_UNSAFE],
                                zookeeper.EPHEMERAL | zookeeper.SEQUENCE)
    while True:
        # this won't work for more than one waiting process, fix later
        children = zookeeper.get_children(zh, lockdir)
        if len(children) == 1 and children[0] == basename(lockfile):
            return lockfile
        # yeah, there's a problem here, I'll fix it later
        for child in children:
            if child < basename(lockfile):
                break
        # exists will call notify when the watched file changes
        if zookeeper.exists(zh, lockdir + '/' + child, notify):
            # Process should wait here until notify() wakes it
            wait_for_notification()


def drop_lock(zh, lockfile):
    zookeeper.delete(zh, lockfile)


def notify(zh, unknown1, unknown2, lockfile):
    pass


def wait_for_notification():
    pass
The Condition variables from Python's threading module are probably a very good fit for what you're trying to do:
http://docs.python.org/library/threading.html#condition-objects
I've extended the example to make it a little more obvious how you would adapt it for your purposes:
#!/usr/bin/env python
from collections import deque
from threading import Thread, Condition

QUEUE = deque()


def an_item_is_available():
    return bool(QUEUE)


def get_an_available_item():
    return QUEUE.popleft()


def make_an_item_available(item):
    QUEUE.append(item)


def consume(cv):
    cv.acquire()
    while not an_item_is_available():
        cv.wait()
    print 'We got an available item', get_an_available_item()
    cv.release()


def produce(cv):
    cv.acquire()
    make_an_item_available('an item to be processed')
    cv.notify()
    cv.release()


def main():
    cv = Condition()
    Thread(target=consume, args=(cv,)).start()
    Thread(target=produce, args=(cv,)).start()


if __name__ == '__main__':
    main()
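A hedged sketch of how a Condition could be wired into the question's notify()/wait_for_notification() placeholders; the watcher keeps the question's four-argument signature, but the argument names are assumptions and this has not been tested against python-zookeeper:

from threading import Condition

_cv = Condition()
_notified = False

def notify(zh, event_type, state, path):
    # called by zookeeper from its watcher thread
    global _notified
    with _cv:
        _notified = True
        _cv.notify()

def wait_for_notification():
    global _notified
    with _cv:
        while not _notified:
            _cv.wait()
        _notified = False   # reset so the next wait blocks again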
My answer may not be relevant to your question, but it is relevant to the question title.
from threading import Thread, Event

locker = Event()


def MyJob(locker):
    while True:
        #
        # do some logic here
        #
        locker.clear()  # Set event state to 'False'
        locker.wait()   # suspend the thread until the event state is 'True'


worker_thread = Thread(target=MyJob, args=(locker,))
worker_thread.start()

#
# some main thread logic here
#
locker.set()  # This sets the event state to 'True' and thus resumes the worker_thread
More information here: https://docs.python.org/3/library/threading.html#event-objects
