I want to restart my Python web application if its code changes. But a large number of files could change, since files in any imported module might change ...
How do I get the actual file names of imported packages / modules?
How can modified Python files be detected efficiently? Is there a library for that?
Shameless plug. There's also http://github.com/gorakhargosh/watchdog that I'm working on to do exactly this.
HTH.
gamin is another option which is slightly less Linux-specific.
I'm not sure how you would implement the 'reload application' operation in your circumstance; reloading a changed module with the reload built-in probably won't cut it.
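A quick illustration of why reload alone often isn't enough (a throwaway sketch; the module name demo_mod and the temp-dir setup are just for the demo): references saved before the reload keep pointing at the old code.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale .pyc complications in this demo

# Write a throwaway module and import it, keeping a direct reference.
tmpdir = tempfile.mkdtemp()
sys.path.insert(0, tmpdir)
with open(os.path.join(tmpdir, 'demo_mod.py'), 'w') as f:
    f.write("def answer():\n    return 1\n")

import demo_mod
old_answer = demo_mod.answer

# Change the source and reload: the module object is updated in place...
with open(os.path.join(tmpdir, 'demo_mod.py'), 'w') as f:
    f.write("def answer():\n    return 2\n")
importlib.reload(demo_mod)

print(demo_mod.answer())  # 2: attribute lookups see the new code
print(old_answer())       # 1: references saved before the reload still run the old code
```

Any object created from the old module (instances, callbacks, bound methods) behaves like old_answer here, which is why a full process restart is usually the safer "reload" operation.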
But as far as detecting whether or not there was a change, the following would be one way to approach it.
Most python modules have a __file__ attribute.
All loaded modules are stored in sys.modules.
We can walk through sys.modules at some interval, and look for changes on disk for each module in turn
Sometimes __file__ points to a .pyc file instead of a .py file, so you might have to chop off the trailing c. Sometimes a .pyc file exists but a .py doesn't exist; in a robust system you'd have to allow for this.
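A small helper along these lines (my own sketch, not part of the original answer; the name source_file is made up) handles both the trailing-c case and the missing-.py case:

```python
import os

def source_file(module):
    """Best-effort path to a module's .py source, or None if there isn't one."""
    filename = getattr(module, '__file__', None)
    if filename is None:
        return None  # built-in modules and some C extensions have no file
    if filename.endswith(('.pyc', '.pyo')):
        filename = filename[:-1]  # chop the trailing c/o to get the .py name
    return filename if os.path.exists(filename) else None
```

For example, source_file(json) returns the path to json/__init__.py, while source_file(sys) returns None because sys is built in.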
A proof-of-concept (not robust):
import os
import sys
import threading
import time

_module_timestamps = {}
_checking = False

def run_checker():
    global _checking
    _checking = True
    while _checking:
        # Copy the items: sys.modules can change size during iteration.
        for name, module in list(sys.modules.items()):
            if hasattr(module, '__file__') and module.__file__:
                filename = module.__file__
                if filename.endswith('.pyc'):
                    filename = filename[:-1]
                if not os.path.exists(filename):
                    continue  # e.g. a .pyc without a corresponding .py
                mtime = os.stat(filename).st_mtime
                if name not in _module_timestamps:
                    _module_timestamps[name] = mtime
                elif mtime > _module_timestamps[name]:
                    do_reload(name)
            else:
                print('module %r has no file attribute' % (name,))
        time.sleep(1)

def do_reload(modname):
    print('I would reload now, because of %r' % (modname,))

check_thread = threading.Thread(target=run_checker)
check_thread.daemon = True
check_thread.start()

try:
    while 1:
        time.sleep(0.1)
except KeyboardInterrupt:
    print('\nexiting...')
Here's an example of how this could be implemented using pyinotify (i.e., on Linux).
from importlib import import_module, reload

class RestartingLauncher:
    def __init__(self, module_name, start_function, stop_function, path="."):
        self._module_name = module_name
        self._filename = '%s.py' % module_name
        self._start_function = start_function
        self._stop_function = stop_function
        self._path = path
        self._setup()

    def _setup(self):
        import pyinotify
        self._wm = pyinotify.WatchManager()
        self._notifier = pyinotify.ThreadedNotifier(
            self._wm, self._on_file_modified)
        self._notifier.start()

        # We monitor the directory (instead of just the file) because
        # otherwise inotify gets confused by editors such as Vim.
        flags = pyinotify.EventsCodes.OP_FLAGS['IN_MODIFY']
        self._wm.add_watch(self._path, flags)

    def _on_file_modified(self, event):
        if event.name == self._filename:
            print("File modification detected. Restarting application...")
            self._reload_request = True
            getattr(self._module, self._stop_function)()

    def run(self):
        self._module = import_module(self._module_name)
        self._reload_request = True
        while self._reload_request:
            self._reload_request = False
            reload(self._module)
            getattr(self._module, self._start_function)()
        print('Bye!')
        self._notifier.stop()

def launch_app(module_name, start_func, stop_func):
    try:
        import pyinotify
    except ImportError:
        print('Pyinotify not found. Launching app anyway...')
        m = import_module(module_name)
        getattr(m, start_func)()
    else:
        RestartingLauncher(module_name, start_func, stop_func).run()

if __name__ == '__main__':
    launch_app('example', 'main', 'force_exit')
The parameters in the launch_app call are the module name (the filename without the ".py"), the function that starts execution, and a function that somehow stops the execution.
Here's a stupid example of an "app" that could be (re-)launched using the previous code:
run = True

def main():
    print('in...')
    while run: pass
    print('out')

def force_exit():
    global run
    run = False
In a typical application where you'd want to use this, you'd probably have a main loop of some sort. Here's a more real example, for a GLib/GTK+ based application:
from gi.repository import GLib

GLib.threads_init()
loop = GLib.MainLoop()

def main():
    print("running...")
    loop.run()

def force_exit():
    print("stopping...")
    loop.quit()
The same concept works for most other loops (Clutter, Qt, etc).
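For instance, an asyncio-based app could expose the same pair of functions (my own sketch; asyncio isn't covered in the original answer, and force_exit uses call_soon_threadsafe because the launcher calls it from the notifier thread):

```python
import asyncio

loop = asyncio.new_event_loop()

def main():
    print("running...")
    loop.run_forever()

def force_exit():
    print("stopping...")
    # Safe to call from another thread, unlike loop.stop() directly.
    loop.call_soon_threadsafe(loop.stop)
```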
Monitoring several code files (i.e. all files that are part of the application) and error resilience (e.g. printing exceptions and waiting in an idle loop until the code is fixed, then launching it again) are left as exercises for the reader :).
Note: All code in this answer is released under the ISC License (in addition to Creative Commons).
This is operating system specific. For Linux, there is inotify, see e.g. http://github.com/rvoicilas/inotify-tools/
I wrote a package that is using multiprocessing.Pool inside one of its functions.
For this reason, it is mandatory (as specified here under "Safe importing of main module") that the outermost calling module can be imported safely, i.e. without starting a new process. This is usually achieved with the if __name__ == "__main__": guard, as explained at the link above.
My understanding (please correct me if I'm wrong) is that multiprocessing imports the outermost calling module. So, if that module is not "import-safe", importing it starts a new process, which imports the outermost module again, and so on recursively, until everything crashes.
If the outermost module is not "import-safe", launching the main function usually just hangs, without printing any warning, error, or message.
Since the if __name__ == "__main__": guard is not otherwise mandatory, and users are not always aware of all the modules used inside a package, I would like to check at the beginning of my function whether the user complied with this requirement and, if not, raise a warning/error.
Is this possible? How can I do this?
To show this, consider the following example.
Let's say I developed my_module.py and I share it online/in my company.
# my_module.py
from multiprocessing import Pool

def f(x):
    return x*x

def my_function(x_max):
    with Pool(5) as p:
        print(p.map(f, range(x_max)))
If a user (not me) writes his own script as:
# Script_of_a_good_user.py
from my_module import my_function

if __name__ == '__main__':
    my_function(10)
all is good and the output is printed as expected.
However, if a careless user writes his script as:
# Script_of_a_careless_user.py
from my_module import my_function
my_function(10)
then the process hangs, no output is produced, and no error message or warning is issued to the user.
Is there a way, inside my_function and BEFORE opening the Pool, to check whether the user used the if __name__ == '__main__': guard in their script and, if not, raise an error saying they should?
NOTE: I think this behavior is only a problem on Windows machines where fork() is not available, as explained here.
You can use the traceback module to inspect the stack and find the information you're looking for. Parse the top frame, and look for the main shield in the code.
I assume this will fail when you're working with a .pyc file and don't have access to the source code, but developers normally test their code in the regular fashion before doing any kind of packaging, so I think it's safe to assume your error message will get printed when needed.
Version with verbose messages:
import traceback
import re

def called_from_main_shield():
    print("Calling introspect")
    tb = traceback.extract_stack()
    print(traceback.format_stack())
    print(f"line={tb[0].line} lineno={tb[0].lineno} file={tb[0].filename}")
    try:
        with open(tb[0].filename, mode="rt") as f:
            found_main_shield = False
            # tb[0].lineno is 1-based, so enumerate from 1.
            for i, line in enumerate(f, start=1):
                if re.search(r"__name__.*['\"]__main__['\"]", line):
                    found_main_shield = True
                if i == tb[0].lineno:
                    print(f"found_main_shield={found_main_shield}")
                    return found_main_shield
    except Exception:
        print("Couldn't inspect stack, let's pretend the code is OK...")
        return True

print(called_from_main_shield())

if __name__ == "__main__":
    print(called_from_main_shield())
In the output, we see that the first call to called_from_main_shield returns False, while the second returns True:
$ python3 introspect.py
Calling introspect
['  File "introspect.py", line 24, in <module>\n    print(called_from_main_shield())\n', '  File "introspect.py", line 7, in called_from_main_shield\n    print(traceback.format_stack())\n']
line=print(called_from_main_shield()) lineno=24 file=introspect.py
found_main_shield=False
False
Calling introspect
['  File "introspect.py", line 27, in <module>\n    print(called_from_main_shield())\n', '  File "introspect.py", line 7, in called_from_main_shield\n    print(traceback.format_stack())\n']
line=print(called_from_main_shield()) lineno=27 file=introspect.py
found_main_shield=True
True
More concise version:
import re
import traceback

def called_from_main_shield():
    tb = traceback.extract_stack()
    try:
        with open(tb[0].filename, mode="rt") as f:
            found_main_shield = False
            for i, line in enumerate(f, start=1):  # lineno is 1-based
                if re.search(r"__name__.*['\"]__main__['\"]", line):
                    found_main_shield = True
                if i == tb[0].lineno:
                    return found_main_shield
    except Exception:
        return True
Now, it's not super elegant to use re.search() like I did, but it should be reliable enough. Warning: since I defined this function in my main script, I had to make sure that line didn't match itself, which is why I used ['\"] to match the quotes instead of using a simpler RE like __name__.*__main__. Whatever you choose, just make sure it's flexible enough to match all legal variants of that code, which is what I aimed for.
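One caveat worth checking yourself (a quick self-contained sketch, not from the original answer): the RE requires __name__ to appear before __main__, so a reversed ("Yoda-style") comparison is not recognized and would be reported as a missing guard.

```python
import re

GUARD_RE = r"__name__.*['\"]__main__['\"]"

checks = [
    ('if __name__ == "__main__":', True),
    ("if __name__ == '__main__':", True),
    ("if '__main__' == __name__:", False),  # reversed comparison is not matched
]
for line, expected in checks:
    assert bool(re.search(GUARD_RE, line)) == expected
```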
I think the best you can do is to try execute the code and provide a hint if it fails. Something like this:
# my_module.py
import sys  # Use sys.stderr to print to the error stream.
from multiprocessing import Pool

def f(x):
    return x*x

def my_function(x_max):
    try:
        with Pool(5) as p:
            print(p.map(f, range(x_max)))
    except RuntimeError as e:
        print("Whoops! Did you perhaps forget to put the code in `if __name__ == '__main__'`?", file=sys.stderr)
        raise e
This is of course not a 100% solution, as there might be several other reasons the code throws a RuntimeError.
If it doesn't raise a RuntimeError, an ugly solution would be to explicitly force the user to pass in the name of the module.
# my_module.py
from multiprocessing import Pool

def f(x):
    return x*x

def my_function(x_max, module):
    """`module` must be set to `__name__`, for example `my_function(10, __name__)`"""
    if module == '__main__':
        with Pool(5) as p:
            print(p.map(f, range(x_max)))
    else:
        raise Exception("This can only be called from the main module.")
And call it as:
# Script_of_a_careless_user.py
from my_module import my_function
my_function(10, __name__)
This makes it very explicit to the user.
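A related mitigation (my own sketch, not from the original answers; the name my_function_guarded is made up) is to detect being inside a multiprocessing worker and bail out early, which breaks the re-import recursion even if the user forgot the guard:

```python
from multiprocessing import Pool, current_process

def f(x):
    return x * x

def my_function_guarded(x_max):
    # In a worker the process name is e.g. 'SpawnPoolWorker-1', not
    # 'MainProcess'; returning early there prevents infinite re-spawning
    # on platforms that use the spawn start method (such as Windows).
    if current_process().name != 'MainProcess':
        return None
    with Pool(2) as p:
        return p.map(f, range(x_max))
```

This silently no-ops in children rather than raising, so it complements (rather than replaces) the explicit error messages above.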
Does the interpreter somehow keep a timestamp of when a module is imported? Or is there an easy way of hooking into the import machinery to do this?
The scenario is a long-running Python process that at various points imports user-provided modules. I would like the process to be able to check "should I restart to load the latest code changes?" by checking the module file's timestamps against the time the module was imported.
Here's a way to automatically have an attribute (named _loadtime in the example code below) added to modules when they're imported. The code is based on Recipe 10.12 titled "Patching Modules on Import" in the book Python Cookbook, by David Beazley and Brian Jones, O'Reilly, 2013, which shows a technique that I adapted to do what you want.
For testing purposes I created this trivial target_module.py file:
print('in target_module')
Here's the example code:
import importlib
import sys
import time

class PostImportFinder:
    def __init__(self):
        self._skip = set()  # To prevent recursion.

    def find_module(self, fullname, path=None):
        if fullname in self._skip:  # Prevent recursion.
            return None
        self._skip.add(fullname)
        return PostImportLoader(self)

class PostImportLoader:
    def __init__(self, finder):
        self._finder = finder

    def load_module(self, fullname):
        importlib.import_module(fullname)
        module = sys.modules[fullname]
        # Add a custom attribute to the module object.
        module._loadtime = time.time()
        self._finder._skip.remove(fullname)
        return module

sys.meta_path.insert(0, PostImportFinder())

if __name__ == '__main__':
    try:
        print('importing target_module')
        import target_module
    except Exception as e:
        print('Import failed:', e)
        raise

    loadtime = time.localtime(target_module._loadtime)
    print('module loadtime: {} ({})'.format(
        target_module._loadtime,
        time.strftime('%Y-%b-%d %H:%M:%S', loadtime)))
Sample output:
importing target_module
in target_module
module loadtime: 1604683023.2491636 (2020-Nov-06 09:17:03)
I don't think there's any way to get around how hacky this is, but how about something like this every time you import? (I don't know exactly how you're importing):
import importlib
import time
from types import ModuleType

# Create a dictionary to keep track of import times.
# Filter globals to exclude things that aren't modules and dunder names.
MODULE_TIMES = {k: None for k, v in globals().items()
                if not k.startswith("__") and not k.endswith("__")
                and type(v) == ModuleType}

for module_name in user_module_list:
    MODULE_TIMES[module_name] = time.time()
    # Note: eval() can't execute an import *statement*; use importlib instead.
    globals()[module_name] = importlib.import_module(module_name)
And then you can reference this dictionary in a similar way later.
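A later check against that dictionary might look roughly like this (a sketch; the function name modules_changed is an assumption, and it compares each module's source mtime to the recorded import time):

```python
import importlib
import os

def modules_changed(module_times):
    """Return names of modules whose source file changed after the recorded time."""
    changed = []
    for name, imported_at in module_times.items():
        module = importlib.import_module(name)  # already cached in sys.modules
        filename = getattr(module, '__file__', None)
        if filename and os.path.exists(filename):
            if os.path.getmtime(filename) > imported_at:
                changed.append(name)
    return changed
```

Anything this returns is a candidate for triggering the "restart to load the latest code" decision.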
When using multiprocessing in Python and importing a module, why is it that instance variables in the module are passed by copy to the child process, whereas arguments passed via the args parameter are passed by reference?
Does this have to do with thread safety perhaps?
foo.py
class User:
    def __init__(self, name):
        self.name = name

foo_user = User('foo')
main.py
import multiprocessing
from foo import User, foo_user

def worker(main_foo):
    print(main_foo.name)  # prints 'main user'
    print(foo_user.name)  # prints 'foo user', why doesn't it print 'override'?

if __name__ == '__main__':
    main_foo = User('main user')
    foo_user.name = 'override'
    p = multiprocessing.Process(target=worker, args=(main_foo,))
    p.start()
    p.join()
EDIT: I'm an idiot, self.name = None should have been self.name = name. I made the correction in my code and forgot to copy it back over.
Actually, it does print override. Look at this:
$ python main.py
None
override
But! This only happens on *Nix. My guess is that you are running on Windows. The difference being that, in Windows, a fresh copy of the interpreter is spawned to just run your function, and the change you made to foo_user.name is not made, because in this new instance, __name__ is not __main__, so that bit of code is not executed. This is done to prevent infinite recursion.
You'll see the difference if you add this line to your function:
def worker(main_foo):
    print(__name__)
    ...
This prints __main__ on *Nix. However, it will not be __main__ for Windows.
You'll want to move that line out of the if __name__ == '__main__': block if you want it to work.
I've got a script that imports modules dynamically based on configuration. I'm trying to implement a daemon context (using the python-daemon module) on the script, and it seems to be interfering with python's ability to find the modules in question.
Inside mymodule/__init__.py, in setup(), I do this:
load_modules(args, config, logger)
try:
    with daemon.DaemonContext(
        files_preserve = getLogfileHandlers(logger)
    ):
        main_loop(config)
I've got a call to setup() inside mymodule/__main__.py and I'm loading the whole thing this way:
PYTHONPATH=. python -m mymodule
This works fine, but a listening port that gets set up inside load_modules() is closed by the newly added daemon context, so I want to move that function call inside the daemon context like so:
try:
    with daemon.DaemonContext(
        files_preserve = getLogfileHandlers(logger)
    ):
        load_modules(args, config, logger)
        main_loop(config)
Modules are loaded inside load_modules() this way:
for mysubmodule in modules:
    try:
        i = importlib.import_module("mymodule.{}".format(mysubmodule))
    except ImportError as err:
        logger.error("import of mymodule.{} failed: {}".format(
            mysubmodule, err))
With load_modules() outside the daemon context this works fine. When I move it inside the daemon context it seems to be unable to find the modules it's looking for. I get this:
import of mymodule.submodule failed: No module named submodule
It looks like some sort of namespace problem -- I note that the exception only refers to the submodule portion of the module name I try to import -- but I've compared everything I can think of inside and outside the daemon context, and I can't find the important difference. sys.path is unchanged, and the daemon context isn't clearing the environment or chrooting. The cwd changes to / of course, but that shouldn't affect Python's ability to find modules, since the absolute path to . appears in sys.path.
What am I missing here?
EDIT: I'm adding an SSCCE to make the situation more clear. The following three files create a module called "mymodule" that can be run from the command line as PYTHONPATH=. python -m mymodule. There are two calls to load_module() in __init__.py, one commented out. You can demonstrate the problem by swapping which one is commented.
mymodule/__main__.py
from mymodule import setup
import sys

if __name__ == "__main__":
    sys.exit(setup())
mymodule/__init__.py
import daemon
import importlib
import logging

def main_loop():
    logger = logging.getLogger('loop')
    logger.debug("Code runs here.")

def load_module():
    logger = logging.getLogger('load_module')
    submodule = 'foo'
    try:
        i = importlib.import_module("mymodule.{}".format(submodule))
    except ImportError as e:
        logger.error("import of mymodule.{} failed: {}".format(
            submodule, e))

def setup_logging():
    logfile = 'mymodule.log'
    fh = logging.FileHandler(logfile)
    root_logger = logging.getLogger()
    root_logger.addHandler(fh)
    root_logger.setLevel(logging.DEBUG)

def get_logfile_handlers(logger):
    handlers = []
    for handler in logger.handlers:
        handlers.append(handler.stream.fileno())
    return handlers

def setup():
    setup_logging()
    logger = logging.getLogger()
    # load_module()
    with daemon.DaemonContext(
        files_preserve = get_logfile_handlers(logger)
    ):
        load_module()
        main_loop()
mymodule/foo.py
import logging

logger = logging.getLogger('foo')
logger.debug("Inside foo.py")
I spent a good 4 hours trying to work this one out when I hit it in my own project. The clue is here:
If the module being imported is supposed to be contained within a package then the second argument passed to find_module(), __path__ on the parent package, is used as the source of paths.
(From https://docs.python.org/2/reference/simple_stmts.html#import)
Once you have successfully imported mymodule, python2 no longer uses sys.path to search for the submodules, it uses sys.modules["mymodule"].__path__. When you import mymodule, python2 unhelpfully sets its __path__ to the relative directory it was stored in:
mymodule.__path__ = ['mymodule']
After daemonizing, python's CWD is set to / and the only place the import internals search for mysubmodule is in /mymodule.
I worked around this by using os.chdir() to change CWD back to the old dir after daemonizing:
import os

oldcwd = os.getcwd()
with DaemonizeContext():
    os.chdir(oldcwd)
    # ... daemon things
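An alternative sketch (my own workaround, not from the original answer; the helper name is made up) is to rewrite every loaded package's __path__ to absolute paths before daemonizing, so the chdir to / can't break relative entries:

```python
import os
import sys

def absolutize_package_paths():
    """Make each loaded package's __path__ entries absolute before chdir."""
    for module in list(sys.modules.values()):
        paths = getattr(module, '__path__', None)
        if paths:
            try:
                module.__path__ = [os.path.abspath(p) for p in paths]
            except (AttributeError, TypeError):
                pass  # some namespace-package __path__ objects reject assignment
```

Calling absolutize_package_paths() just before entering the DaemonContext leaves the working directory change in place, unlike the chdir workaround above.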
This works fine, but a listening port that gets set up inside load_modules() is closed by the newly added daemon context, so
No. load_modules() should load modules. It should not open ports.
If you need to preserve a file or socket opened outside the context, pass it to files_preserve. If possible, it is preferred to simply open files and such inside the context instead, as I suggest above.
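For example, a listening socket opened before the context could be kept alive roughly like this (a sketch; the actual daemon.DaemonContext usage is shown only in a comment, since it detaches the process):

```python
import socket

# Open the listening socket *before* daemonizing...
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
listener.listen(5)

# ...then hand its file descriptor to the context so it isn't closed:
#
#   with daemon.DaemonContext(files_preserve=[listener.fileno()]):
#       main_loop(config)

preserved = [listener.fileno()]
```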
I'm using Python 2.7 and py2exe to create a DLL from my Python script.
I successfully created the DLL, registered an entry for my icon overlay status, and then restarted the Windows Explorer process through Task Manager.
I verified that my entry is in the registry, and yes, it was there.
But when I set my status through a Python test-app script for a specific folder location, I expected all the files and folders in the selected path to be overlaid with my overlay icon. But no, the icon overlay is not happening at all.
However, when I test by registering the entry through a Python script (without creating a DLL) and set my icon overlay through my test-app script, it works perfectly.
I am confused why it is not happening when tried with my DLL.
Below is my Python script to register a status entry:
import os
import win32traceutil
import pythoncom
import winerror
from win32com.shell import shell, shellcon
from multiprocessing.connection import Client

REG_PATH = r'Software\Microsoft\Windows\CurrentVersion\Explorer\ShellIconOverlayIdentifiers'
REG_KEY = "IconOverlayTest"

class IconOverlay:
    _reg_desc_ = 'Icon Overlay COM Server'
    _public_methods_ = ['GetOverlayInfo', 'GetPriority', 'IsMemberOf']
    _com_interfaces_ = [shell.IID_IShellIconOverlayIdentifier, pythoncom.IID_IDispatch]

    def __init__(self, *_args, **_kwargs):
        self._icon = None
        self._icon_id = None
        raise NotImplementedError

    def GetOverlayInfo(self):
        return self._icon, 0, shellcon.ISIOI_ICONFILE

    def GetPriority(self):
        return 0

    def IsMemberOf(self, path, _attrs):
        if is_member(path, self._icon_id):
            return winerror.S_OK
        return winerror.E_FAIL

class IconOverlay_test(IconOverlay):
    _reg_progid_ = 'a.TestServer1'
    _reg_clsid_ = '{8B19F050-8354-11E1-A0FE-5C260A5D15E4}'

    def __init__(self):
        self._icon = "C:\\Users\\Administrator\\mat\\icon_overlay\\icons\\1.ico"
        self._icon_id = 101

classes = [IconOverlay_test,]

def is_member(path, icon_id):
    try:
        conn = None
        conn = Client("\\\\.\\pipe\\test.listener", "AF_PIPE")
        conn.send(path)
        if conn.poll(3):
            reply = conn.recv()
            return reply == icon_id
    except Exception:
        pass
    finally:
        conn and conn.close()
    return False

def DllRegisterServer():
    print("Registering %s ......." % IconOverlay._reg_desc_)
    import winreg
    for view in [winreg.KEY_WOW64_64KEY, winreg.KEY_WOW64_32KEY]:
        for cls in classes:
            with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, r"%s\%s" %
                                    (REG_PATH, cls._reg_progid_), 0,
                                    winreg.KEY_ALL_ACCESS | view) as hkey:
                print("  %s" % cls)
                winreg.SetValueEx(hkey, None, 0, winreg.REG_SZ, cls._reg_clsid_)
    print("Registration complete: %s" % IconOverlay._reg_desc_)

def DllUnregisterServer():
    print("Unregistering %s ......." % IconOverlay._reg_desc_)
    import winreg
    for view in [winreg.KEY_WOW64_64KEY, winreg.KEY_WOW64_32KEY]:
        for cls in classes:
            try:
                _key = winreg.DeleteKeyEx(winreg.HKEY_LOCAL_MACHINE, r"%s\%s"
                                          % (REG_PATH, cls._reg_progid_),
                                          winreg.KEY_ALL_ACCESS | view)
            except WindowsError as err:
                if err.errno != 2:
                    raise
    print("Unregistration complete: %s" % IconOverlay._reg_desc_)

if __name__ == '__main__':
    from win32com.server import register
    register.UseCommandLine(*classes,
                            finalize_register=DllRegisterServer,
                            finalize_unregister=DllUnregisterServer)
This is really painful to get working right. Good luck!
I believe Windows will only allow 10 different icons to be registered, and it will only use the first 10 registered alphabetically. Do you already have 10 registered? It's quite easy to exceed 10 if you have Dropbox, TortoiseSVN etc. installed, as each image counts as an entry. If that's the case, try putting an underscore or a 0 before the name to make sure it gets priority, although it will mean another icon loses out; I don't think there's a way around this.
Also, sometimes Windows doesn't know to refresh the status of the icon. Which version of Windows are you running? Some are worse than others; I seem to remember XP isn't very good at this. There are some tricks to get it to update, though. You can refresh the window through the Windows API, but it looks horrible and the whole of Explorer flashes. A better way I found is to change an attribute on the file. This is the trick I used:
import stat, os

# Toggle the write bit and restore it, which nudges Explorer into
# re-querying the overlay for this file.
file_att = os.stat(path)[0]
if file_att & stat.S_IWRITE:
    os.chmod(path, stat.S_IREAD)
    os.chmod(path, stat.S_IWRITE)
else:
    os.chmod(path, stat.S_IWRITE)
    os.chmod(path, stat.S_IREAD)