Python Watchdog with Slurm Output

I'm trying to use python-watchdog to monitor the output of SLURM jobs on a supercomputer. For some reason, the watchdog program isn't detecting changes in the files, even though a tail -f shows that the file is indeed being changed. Here's my watchdog program:
import logging
import socket
import sys
import time

from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    filename="/work/ollie/pgierz/PISM/pindex_vostok_ds50/scripts/pindex_vostok_ds50.watchdog",
)


def on_created(event):
    logging.info(f"hey, {event.src_path} has been created!")


def on_deleted(event):
    logging.info(f"what the f**k! Someone deleted {event.src_path}!")


def on_modified(event):
    logging.info(f"hey buddy, {event.src_path} has been modified")


def on_moved(event):
    logging.info(f"ok ok ok, someone moved {event.src_path} to {event.dest_path}")


if __name__ == "__main__":
    if "ollie" in socket.gethostname():
        logging.info("Not watching on login node...")
        sys.exit()
    # Only do this on compute node:
    patterns = ["*"]  # must be lists; a bare string is iterated character by character
    ignore_patterns = ["*.watchdog"]
    ignore_directories = False
    case_sensitive = True
    my_event_handler = PatternMatchingEventHandler(
        patterns, ignore_patterns, ignore_directories, case_sensitive
    )
    my_event_handler.on_created = on_created
    my_event_handler.on_deleted = on_deleted
    my_event_handler.on_modified = on_modified
    my_event_handler.on_moved = on_moved
    path = "/work/ollie/pgierz/PISM/pindex_vostok_ds50/scripts"
    # path = "/work/ollie/pgierz/PISM/pindex_vostok_ds30/"
    go_recursively = True
    my_observer = Observer()
    my_observer.schedule(my_event_handler, path, recursive=go_recursively)
    my_observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        my_observer.stop()
    my_observer.join()
This is just a suspicion, but could it be that the filesystem doesn't actually register the file as "changed" while it is still held open by the batch job? An ls -l or stat on the output files shows they were last "modified" when the job started. Do I need to tell SLURM to "flush" the file?
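A likely culprit: on most clusters, /work is a network filesystem (NFS, Lustre, GPFS), and watchdog's default Observer relies on inotify, which only reports changes made through the local kernel; writes from a compute node never generate inotify events on the node where the watcher runs. The usual workaround is watchdog's polling observer. A minimal sketch, assuming the same handler and path as in the question:
from watchdog.observers.polling import PollingObserver

# PollingObserver stats files on a timer instead of relying on inotify,
# so it also sees changes that arrive over NFS/Lustre.
my_observer = PollingObserver(timeout=5)  # compare snapshots every 5 seconds
my_observer.schedule(my_event_handler, path, recursive=True)
my_observer.start()
Separately, SLURM job output is buffered, so writes may reach the file late; running the step with srun --unbuffered (or -u) can reduce that delay.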

Related

Detect file changes in python

I want to detect changes in certain directories, and when I see changes in the files, for example many changes at the same time, print something and perform certain actions. (I'm programming for Linux.)
import sys
import time
import logging
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler

if __name__ == '__main__':
    # Set the format for logging info
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
    # Directory to watch (could also come from the command line)
    # path = sys.argv[1] if len(sys.argv) > 1 else '.'
    path = '/home/kali/Downloads'
    event_handler = LoggingEventHandler()
    observer = Observer()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
I'm using this base code to perform the actions, but how can I add conditions for several changes on top of this base?
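One way to act only on "many changes at the same time" (a sketch, not from the original thread; the threshold and window values are made-up numbers): subclass FileSystemEventHandler, keep timestamps of recent events, and fire your action once enough events land inside a short window:
import time
import threading
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler


class BurstHandler(FileSystemEventHandler):
    """Fires an action once 'many' events land inside a short time window."""

    def __init__(self, threshold=5, window=2.0):
        super().__init__()
        self.threshold = threshold  # events that count as "many changes"
        self.window = window        # seconds
        self._times = []
        self._lock = threading.Lock()

    def on_any_event(self, event):
        now = time.time()
        with self._lock:
            # keep only timestamps still inside the window, then add this one
            self._times = [t for t in self._times if now - t < self.window]
            self._times.append(now)
            if len(self._times) >= self.threshold:
                self._times.clear()
                print(f"Burst of changes detected near {event.src_path}")
                # ...perform certain actions here...


observer = Observer()
observer.schedule(BurstHandler(), '/home/kali/Downloads', recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()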

python inotify to monitor for IN_CLOSE_WRITE and IN_MOVED_TO events

I am monitoring a directory for new files being moved into it or created.
Upon detecting a new file I call another Python script to process it.
#!/usr/bin/python
import os
import signal
import sys
import logging
import subprocess

import inotify.adapters

_DEFAULT_LOG_FORMAT = ''
_LOGGER = logging.getLogger(__name__)


def _configure_logging():
    _LOGGER.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    formatter = logging.Formatter(_DEFAULT_LOG_FORMAT)
    ch.setFormatter(formatter)
    _LOGGER.addHandler(ch)


def exit_gracefully(signum, frame):
    signal.signal(signal.SIGINT, original_sigint)
    sys.exit(1)


def main():
    i = inotify.adapters.Inotify()
    i.add_watch(b'/home/sort/tmp')
    try:
        for event in i.event_gen():
            if event is None:
                continue
            (header, type_names, watch_path, filename) = event
            if 'IN_MOVED_TO' in type_names or 'IN_CLOSE_WRITE' in type_names:
                _LOGGER.info("%s FILENAME=%s/%s", type_names,
                             watch_path.decode('utf-8'), filename.decode('utf-8'))
                fnp = os.path.join(watch_path.decode('utf-8'),
                                   filename.decode('utf-8'))
                print(fnp)
                proc = subprocess.Popen([orgpath, fnp], stderr=subprocess.STDOUT)
                # proc.communicate()
    finally:
        i.remove_watch(b'/home/sort/tmp')


if __name__ == '__main__':
    _configure_logging()
    orgdir = os.path.dirname(os.path.realpath(sys.argv[0]))
    orgpath = os.path.join(orgdir, "organize.py")
    # Capture the original handler before installing our own, otherwise
    # exit_gracefully would restore itself and recurse.
    original_sigint = signal.getsignal(signal.SIGINT)
    signal.signal(signal.SIGINT, exit_gracefully)
    print("Watching /home/sort/tmp for new files")
    main()
The end goal is to only process one file at a time, as I call an API to scrape for metadata. Too many calls to the API in a short period of time could get the API key banned or temporarily blocked.
Right now, when I copy more than a single file into the monitored directory, the script gets called on each file at the same time.
Try putting a for loop around the code that runs the Python file:
for files in directory:
    ...code that runs the python file...
If it is still running too fast, you can add a timer to throttle the API calls:
import time
for files in directory:
    ...code that runs the python file...
    time.sleep(5)
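A more reliable way to guarantee one-file-at-a-time processing (a sketch under the question's setup, with organize.py as the processing script and an arbitrary 5-second throttle): let the inotify loop only enqueue paths, and have a single worker thread drain the queue sequentially:
import queue
import subprocess
import threading
import time

work_q = queue.Queue()


def worker():
    # Single consumer thread: files are processed strictly one at a time.
    while True:
        fnp = work_q.get()
        subprocess.run(["./organize.py", fnp])  # blocks until this file is done
        time.sleep(5)  # arbitrary throttle between API calls
        work_q.task_done()


threading.Thread(target=worker, daemon=True).start()

# In the inotify event loop, replace the Popen call with:
#     work_q.put(fnp)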

Store logging info as variable to use in email message alert

I'm only a couple of weeks into learning Python with no previous programming background, so I apologize for my ignorance.
I'm trying to use a combination of modules to monitor a folder for new files (watchdog), alert on any event (logging module), and then ship the alert to my email (smtplib).
I've found a really good example here: How to run an function when anything changes in a dir with Python Watchdog?
However, I'm stuck trying to save the logging info as a variable to use in my email message. I'm wondering if I'll need to write the logging information to a file and then read the line back in to use as a variable.
Anyway, this is what I have. Any help is appreciated; in the meantime I'll continue to Google.
import sys
import time
import logging
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler
import smtplib


class Event(LoggingEventHandler):
    def on_any_event(self, event):
        logMsg = logging.basicConfig(level=logging.INFO,
                                     format='%(asctime)s - %(message)s',
                                     datefmt='%Y-%m-%d %H:%M:%S')
        sender = 'NoReply@myDomain.com'
        receiver = 'test.user@myDomain.com'
        message = """From: No Reply <NoReply@myDomain.com>
To: Test User <test.user@myDomain.com>
Subject: Folder Modify Detected

The following change was detected: """ + str(logMsg)
        mail = smtplib.SMTP('mailServer.myDomain.com', 25)
        mail.ehlo()
        mail.starttls()
        mail.sendmail(sender, receiver, message)
        mail.close()


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    event_handler = Event()
    observer = Observer()
    observer.schedule(event_handler, path, recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
What you need is an SMTPHandler, so that every time the folder changes (a new log record is created), an email is sent.
import sys
import time
import logging
import logging.handlers
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler


class Event(LoggingEventHandler):
    def on_any_event(self, event):
        # do stuff
        pass


if __name__ == "__main__":
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    formatter = logging.Formatter('%(asctime)s - %(message)s',
                                  '%Y-%m-%d %H:%M:%S')
    mail_handler = logging.handlers.SMTPHandler(mailhost='mailserver',
                                                fromaddr='noreply@example.com',
                                                toaddrs=['me@example.com'],
                                                subject='The log',
                                                credentials=('user', 'pwd'),
                                                secure=None)
    mail_handler.setLevel(logging.INFO)
    mail_handler.setFormatter(formatter)  # formatters belong on handlers, not loggers
    root.addHandler(mail_handler)  # Add this handler to the root logger
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    event_handler = Event()
    observer = Observer()
    observer.schedule(event_handler, path, recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
I was able to obtain a working example here and tailor it to my needs:
https://www.michaelcho.me/article/using-pythons-watchdog-to-monitor-changes-to-a-directory
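For the original question of getting the log text into a variable: the event object itself already carries that information, so no logging call is needed. A minimal sketch using the placeholder addresses and mail server from the question:
import smtplib
from watchdog.events import LoggingEventHandler


class Event(LoggingEventHandler):
    def on_any_event(self, event):
        # The event already carries what basicConfig was supposed to capture.
        logMsg = f"{event.event_type}: {event.src_path}"  # the "variable" to email
        message = ("From: No Reply <NoReply@myDomain.com>\r\n"
                   "To: Test User <test.user@myDomain.com>\r\n"
                   "Subject: Folder Modify Detected\r\n\r\n"
                   "The following change was detected: " + logMsg)
        with smtplib.SMTP('mailServer.myDomain.com', 25) as mail:
            mail.ehlo()
            mail.starttls()
            mail.sendmail('NoReply@myDomain.com',
                          'test.user@myDomain.com', message)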

Block until file in directory changes

I want to use watchdog to block until a file changes in a directory. What I'm doing is sleeping while a variable is False. The problem with this is that I can't interrupt the sleep; there's still up to a one-second delay before the loop notices the change. How can I break out of the sleep and continue to the point after it? Or, more generically, block until a file changes? Here's my code:
import sys
import time
import os
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

if __name__ == "__main__":
    observer = Observer()

    class FileChangeHandler(FileSystemEventHandler):
        done = False

        def on_any_event(self, event):
            print('Got event')
            FileChangeHandler.done = True

    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    event_handler = FileChangeHandler()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    while not event_handler.done:
        time.sleep(1)
    print('Done')
    observer.stop()
    observer.join()
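A common fix (a sketch, not from the original post): wait on a threading.Event instead of polling a flag. event.wait() blocks without any sleep loop and returns the moment the handler sets it:
import sys
import threading
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

changed = threading.Event()


class FileChangeHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        changed.set()  # wakes the waiting thread immediately


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    observer = Observer()
    observer.schedule(FileChangeHandler(), path, recursive=True)
    observer.start()
    changed.wait()  # blocks here: no polling, no one-second lag
    print('Done')
    observer.stop()
    observer.join()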

python watchdog runs more than once

I am trying to learn python-watchdog, but I am somewhat confused about why the job I set up runs more than once. Here is my setup:
# handler.py
import os
from watchdog.events import FileSystemEventHandler
from actions import run_something


def getext(filename):
    return os.path.splitext(filename)[-1].lower()


class ChangeHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        if event.is_directory:
            return
        if getext(event.src_path) == '.done':
            run_something()
        else:
            print("event not directory.. exiting...")
the observer is set up like so:
# observer.py
import os
import time
from watchdog.observers import Observer
from handler import ChangeHandler

BASEDIR = "/path/to/some/directory/bin"


def main():
    while 1:
        event_handler = ChangeHandler()
        observer = Observer()
        observer.schedule(event_handler, BASEDIR, recursive=True)
        observer.start()
        try:
            while True:
                time.sleep(1)
        except KeyboardInterrupt:
            observer.stop()
        observer.join()


if __name__ == '__main__':
    main()
and finally, the actions like so:
# actions.py
import os
import subprocess


def run_something():
    output = subprocess.check_output(['./run.sh'])
    print(output)
    return None
..where ./run.sh is just a shell script I would like to run when a file with the extension .done is found in /path/to/some/directory/bin:
#!/bin/bash
# run.sh
echo "Job Start: $(date)"
rm -rf /path/to/some/directory/bin/job.done  # remove the .done file
echo "Job Done: $(date)"
However, when I run python observer.py and then touch job.done in /path/to/some/directory/bin, I see that my shell script ./run.sh runs three times, not once.
I am confused why this runs thrice and not just once (I do delete the job.done file in my bash script).
To debug watchdog scripts, it is useful to print the events watchdog sees. One file edit or CLI command, such as touch, can result in multiple watchdog events. For example, if you insert a print statement:
class ChangeHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        print(event)
to log every event, running
% touch job.done
generates
2014-12-24 13:11:02 - <FileCreatedEvent: src_path='/home/unutbu/tmp/job.done'>
2014-12-24 13:11:02 - <DirModifiedEvent: src_path='/home/unutbu/tmp'>
2014-12-24 13:11:02 - <FileModifiedEvent: src_path='/home/unutbu/tmp/job.done'>
Above there were two events with src_path ending in job.done. Thus,
if getext(event.src_path) == '.done':
    run_something()
runs twice because there is a FileCreatedEvent and a FileModifiedEvent.
You might be better off only monitoring FileModifiedEvents.
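A minimal sketch of that suggestion, reusing getext and run_something from the question: override on_modified instead of on_any_event, so created and deleted events no longer trigger the script:
from watchdog.events import FileSystemEventHandler
from actions import run_something
from handler import getext  # helpers from the question


class ChangeHandler(FileSystemEventHandler):
    def on_modified(self, event):
        # Only modification events reach here; creations/deletions are ignored.
        if not event.is_directory and getext(event.src_path) == '.done':
            run_something()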
