Deep Reload Modules from Only One Package - Python

How do you deep reload a package without importing modules from any other outside packages?
For example, for reloading the following:
# example_pkg.py
import logging # do not reload this stdlib package
import example_pkg.ex_mod # reload this module
IPython's deepreload module cannot be given a whitelist of imports to reload, and it uses module-level variables during reloads, which makes it unreliable in threaded environments.
A similar question has been asked before, but it focused on discovering dependencies (as was mentioned in a comment), not on restricting the reload exclusively to a single package.

Using the sys and importlib modules, a function can be written to remove the package and its modules from Python's import cache. The package's child modules are then loaded fresh when it is re-imported.
import sys
import importlib
from types import ModuleType

def deep_reload(m: ModuleType):
    name = m.__name__  # the key used in sys.modules
    name_ext = name + '.'  # prefix matching submodules and subpackages
    del m

    def compare(loaded: str):
        return (loaded == name) or loaded.startswith(name_ext)

    all_mods = tuple(sys.modules)  # snapshot, since the dict is mutated below
    sub_mods = filter(compare, all_mods)
    for pkg in sub_mods:
        del sys.modules[pkg]  # drop the package and its submodules from the import cache
    return importlib.import_module(name)
This code can be extended with a Lock to make it thread-safe as well:
from threading import Lock

sys_mod_lock = Lock()  # all accesses to sys.modules must acquire this lock first
# This means: once the following function is in use, do not use any builtin
# import mechanism such as the 'import' statement; instead use importlib's
# import_module function while sys_mod_lock is acquired.

def tsafe_reload(m: ModuleType):
    with sys_mod_lock:
        return deep_reload(m)
Note: these functions share one caveat with the standard library's reload. Any references elsewhere in the program that still point to the old module objects are kept and are not replaced automatically. For that, you can look at globalsub, which can replace all references to an object in the interpreter with a different object (usually for debugging purposes).
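For example, a minimal sketch of that caveat, using the hypothetical example_pkg from the question:
import example_pkg.ex_mod  # hypothetical package from the question

handle = example_pkg.ex_mod           # keep a direct reference to the submodule
example_pkg = deep_reload(example_pkg)

print(handle is example_pkg.ex_mod)   # False: `handle` still points at the old module
handle = example_pkg.ex_mod           # rebind manually (or reach for globalsub)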

Related

List project modules imported both directly and indirectly

I've been caught out by circular imports on a large project. So I'm seeking to find a way to test my code to see which of the modules in the project (and only in the project) are imported when an import statement is run. This is to inform refactoring and make sure there isn't an import somewhere deep within a package that's causing a problem.
Suppose I import the project package 'agent'; I want to know which project modules also get imported as a result. For instance, if 'environment' and 'policy' are imported because modules deep within the agent package contain those import statements, then I want to see just those listed. So no numpy modules listed, for example, as they are outside the project and thus not relevant to circular dependencies.
So far I have this:
import sys
import agent # project module
for k, v in sys.modules.items():
    print(f"key: {k} value: {v}")
example rows:
key: numpy.random value: <module 'numpy.random' from '/home/robin/Python/anaconda3/envs/rl/lib/python3.9/site-packages/numpy/random/__init__.py'>
key: environment value: <module 'environment' from '/home/robin/Python/Projects/RL_Sutton/Cliff/environment/__init__.py'>
This does return the modules imported both directly and indirectly, but it also includes a lot else, such as all the components of numpy, builtins, and so on. If I could filter this dictionary, that would solve it.
k is a str, v is <class 'module'>.
The module's __str__ method does include the module's file path, so I suppose that could be parsed, but it's not a clean solution. I've tried looking at the documentation for sys.modules and module_type, but nothing there gives a way to filter modules to the current project (that I could see).
I tried to modify the solutions for each of these without success:
How to list imported modules?
List imported modules from an imported module in Python 3
ModuleFinder also looked promising but from the limited example I couldn't see how to make path or excludes solve the problem.
Update
I didn't specify this in the original question but I'm importing modules that often look like this:
from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
import environment
import policy
ModuleFinder will find environment and policy even though they won't be imported at runtime and don't matter for cyclic imports. So I adapted the accepted answer below to find only runtime imports.
import agent
import sys
app_dir = '/path/to/projects_folder'
imported_module_names = []
for module_name, mod in sys.modules.items():
    file = getattr(mod, '__file__', '')
    if str(file).startswith(app_dir) and module_name != '__main__':
        imported_module_names.append(module_name)

for module_name in sorted(imported_module_names):
    print(module_name)
You can use modulefinder to run a script and inspect the imported modules. These can be filtered by using the __file__ attribute (given that you actually import these modules from the file system; don't worry about the dunder attribute, it's for consistency with the builtin module type):
from modulefinder import ModuleFinder
finder = ModuleFinder()
finder.run_script('test.py')
appdir = '/path/to/project'
modules = {name: mod for name, mod in finder.modules.items()
           if mod.__file__ is not None
           and mod.__file__.startswith(appdir)}

for name in modules:
    print(name)
You can invoke Python with the -v command line option which will print a message each time a module is initialized.
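If you would rather capture the same information from inside the program, here is a small sketch of an alternative (my addition, assuming Python 3.8+ for sys.addaudithook):
import sys

def trace_imports(event, args):
    # the 'import' audit event fires once per actual import;
    # args[0] is the name of the module being imported
    if event == 'import':
        print('importing:', args[0])

sys.addaudithook(trace_imports)

import json  # prints "importing: json" (nothing if json is already cached)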

Register global import hooks in python

I want to add library access control to my installation of Python, and I would like to know whether there is a way to hook into the import system so that I can check if a Python program is allowed to import a library, blocking untrusted modules from importing dangerous native modules, like os, that could leak information about my system.
While researching by myself, I found out about PEP 302, which sounds like what I am looking for, but I couldn't find how to register those hooks installation-wide.
Would someone be able to tell me if there is a way in python to add such an import hook to all imports on the system rather than only on the currently executing program?
You can change the import of modules by implementing your own custom import loader object. A starting point in the documentation can be found here: https://docs.python.org/3/library/importlib.html
What you need to do is create a loader that acts on the packages you want to check, and then either loads them or raises the desired exception. For modules that are not in your access control list, you should return None; this makes the import machinery load them normally. I have created a minimal example of this type of functionality that you can start from and extend to build your desired functionality.
import sys
import importlib
import importlib.abc  # importing importlib alone does not expose importlib.abc

class ImportInterceptor(importlib.abc.Loader):
    def __init__(self, package_permissions):
        self.package_permissions = package_permissions

    def find_module(self, fullname, path=None):
        if fullname in self.package_permissions:
            if self.package_permissions[fullname]:
                return self
            else:
                raise ImportError("Package import was not allowed")
        # implicitly returns None for unlisted modules, so the normal
        # import machinery handles them

    def load_module(self, fullname):
        # step aside so the real import machinery loads the module
        sys.meta_path = [x for x in sys.meta_path if x is not self]
        module = importlib.import_module(fullname)
        sys.meta_path = [self] + sys.meta_path
        return module

if not hasattr(sys, 'frozen'):
    sys.meta_path = [ImportInterceptor({'textwrap': True, 'pathlib': False})] + sys.meta_path

import textwrap
print(textwrap.dedent('    test'))
# Works fine

from pathlib import Path
# Raises ImportError
Note that the loader removes itself from sys.meta_path when loading the package. This is to avoid an infinite loop where it keeps calling itself every time it tries to load the module "for real".
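Note that find_module and load_module are the legacy PEP 302 interface; they are deprecated and were removed in Python 3.12. As a rough sketch (my adaptation, not part of the original answer), the same blocking idea can be expressed with the newer find_spec API, where raising ImportError vetoes the import and returning None defers to the remaining finders:
import sys
from importlib.abc import MetaPathFinder

class BlockingFinder(MetaPathFinder):
    """Sketch: veto imports of disallowed top-level packages."""

    def __init__(self, blocked):
        self.blocked = set(blocked)

    def find_spec(self, fullname, path=None, target=None):
        if fullname.split('.')[0] in self.blocked:
            raise ImportError(f"import of {fullname!r} is not allowed")
        return None  # defer to the remaining finders on sys.meta_path

sys.meta_path.insert(0, BlockingFinder({'os'}))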

Reloading packages (and their submodules) recursively in Python

In Python you can reload a module as follows...
import foobar
import importlib
importlib.reload(foobar)
This works for .py files, but for Python packages it will only reload the package and not any of the nested sub-modules.
With a package:
foobar/__init__.py
foobar/spam.py
foobar/eggs.py
Python Script:
import foobar
# assume `foobar/__init__.py` is importing `.spam`
# so we don't need an explicit import.
print(foobar.spam) # ok
import importlib
importlib.reload(foobar)
# foobar.spam WON'T be reloaded.
Not to suggest this is a bug, but there are times when it's useful to reload a package and all its submodules (if you want to edit a module while a script runs, for example).
What are some good ways to recursively reload a package in Python?
Notes:
For the purpose of this question assume the latest Python3.x
(currently using importlib)
Allowing that this may require some edits to the modules themselves.
Assume that wildcard imports aren't used (from foobar import *), since they may complicate reload logic.
Here's a function that recursively reloads a package.
I double-checked that the reloaded modules are updated in the modules where they are used, and guarded against infinite recursion.
One restriction is that it needs to run on a package (which only makes sense for packages anyway).
import os
import types
import importlib

def reload_package(package):
    assert hasattr(package, "__package__")
    fn = package.__file__
    fn_dir = os.path.dirname(fn) + os.sep
    module_visit = {fn}
    del fn

    def reload_recursive_ex(module):
        importlib.reload(module)
        for module_child in vars(module).values():
            if isinstance(module_child, types.ModuleType):
                fn_child = getattr(module_child, "__file__", None)
                if (fn_child is not None) and fn_child.startswith(fn_dir):
                    if fn_child not in module_visit:
                        # print("reloading:", fn_child, "from", module)
                        module_visit.add(fn_child)
                        reload_recursive_ex(module_child)

    return reload_recursive_ex(package)
# example use
import os
reload_package(os)
I'll offer another answer for the case in which you want to reload only a specific nested module. I found this useful in situations where I was editing a single nested submodule, and reloading all nested submodules via a solution like ideasman42's approach or deepreload would produce undesired behavior.
Assuming you want to reload a module into the workspace below:
my_workspace.ipynb
import importlib
import my_module
import my_other_module_that_I_dont_want_to_reload
print(my_module.test()) #old result
importlib.reload(my_module)
print(my_module.test()) #new result
but my_module.py looks like this:
import my_nested_submodule

def test():
    my_nested_submodule.do_something()
and you just made an edit in my_nested_submodule.py:
def do_something():
    print('look at this cool new functionality!')
You can manually force my_nested_submodule, and only my_nested_submodule, to be reloaded by adjusting my_module.py so it looks like the following:
import my_nested_submodule
import importlib
importlib.reload(my_nested_submodule)

def test():
    my_nested_submodule.do_something()
I've updated the answer from @ideasman42 to always reload modules from the bottom of the dependency tree first. Note that it will raise an error if the dependency graph is not a tree (i.e. contains cycles), as I don't think all modules can be cleanly reloaded in that case.
import importlib
import os
import types
import pathlib

def get_package_dependencies(package):
    assert hasattr(package, "__package__")
    fn = package.__file__
    fn_dir = os.path.dirname(fn) + os.sep
    node_set = {fn}  # set of module filenames
    node_depth_dict = {fn: 0}  # tracks the greatest depth that we've seen for each node
    node_pkg_dict = {fn: package}  # mapping of module filenames to module objects
    link_set = set()  # tuples of (parent module filename, child module filename)
    del fn

    def dependency_traversal_recursive(module, depth):
        for module_child in vars(module).values():
            # skip anything that isn't a module
            if not isinstance(module_child, types.ModuleType):
                continue
            fn_child = getattr(module_child, "__file__", None)
            # skip anything without a filename or outside the package
            if (fn_child is None) or (not fn_child.startswith(fn_dir)):
                continue
            # have we seen this module before? if not, add it to the database
            if fn_child not in node_set:
                node_set.add(fn_child)
                node_depth_dict[fn_child] = depth
                node_pkg_dict[fn_child] = module_child
            # set the depth to be the deepest depth we've encountered the node
            node_depth_dict[fn_child] = max(depth, node_depth_dict[fn_child])
            # have we visited this child module from this parent module before?
            if (module.__file__, fn_child) not in link_set:
                link_set.add((module.__file__, fn_child))
                dependency_traversal_recursive(module_child, depth + 1)
            else:
                raise ValueError("Cycle detected in dependency graph!")

    dependency_traversal_recursive(package, 1)
    return (node_pkg_dict, node_depth_dict)
# example use
import collections
node_pkg_dict, node_depth_dict = get_package_dependencies(collections)
for (d, v) in sorted([(d, v) for v, d in node_depth_dict.items()], reverse=True):
    print("Reloading %s" % pathlib.Path(v).name)
    importlib.reload(node_pkg_dict[v])

How to make a copy of a python module at runtime?

I need to make a copy of the socket module, so I can use it normally and also have a second socket module that is monkey-patched and used differently.
Is this possible?
I mean really copying a module, namely getting the same result at runtime as if I had copied socketmodule.c, changed the initsocket() function to initmy_socket(), and installed it as the my_socket extension.
You can always do tricks like importing a module then deleting it from sys.modules or trying to copy a module. However, Python already provides what you want in its Standard Library.
import imp  # standard module for doing exactly this kind of thing

# We can import any module, including standard ones:
os1 = imp.load_module('os1', *imp.find_module('os'))
# Here is another one:
os2 = imp.load_module('os2', *imp.find_module('os'))
# This returns True:
id(os1) != id(os2)
Python 3.3+
imp.load_module is deprecated since Python 3.3, and the documentation recommends using importlib instead:
#!/usr/bin/env python3
import sys
import importlib.util
SPEC_OS = importlib.util.find_spec('os')
os1 = importlib.util.module_from_spec(SPEC_OS)
SPEC_OS.loader.exec_module(os1)
sys.modules['os1'] = os1
os2 = importlib.util.module_from_spec(SPEC_OS)
SPEC_OS.loader.exec_module(os2)
sys.modules['os2'] = os2
del SPEC_OS
assert os1 is not os2, "Module `os` instancing failed"
Here, we import the same module twice, but as two completely different module objects. If you check sys.modules, you can see the two names that were passed as the first parameters to the load_module calls. Take a look at the documentation for details.
UPDATE:
To make the main difference of this approach obvious: when you import the same module this way, both versions are globally accessible to every other module you import at runtime, which is exactly what the asker needs, as I understood it.
Below is another example to emphasize this point.
These two statements do exactly the same thing:
import my_socket_module as socket_imported
socket_imported = imp.load_module('my_socket_module',
                                  *imp.find_module('my_socket_module'))
On the second line, we repeat the string 'my_socket_module' twice, and that is how the import statement works; but the two occurrences are, in fact, used for two different purposes.
The second occurrence, passed to find_module, is used as the file name to locate on the system. The first occurrence, passed to load_module, is used as the system-wide identifier of the loaded module.
So we can use different names for the two, which means we can make it work exactly as if we had copied the module's Python source file and loaded the copy.
socket = imp.load_module('socket_original', *imp.find_module('my_socket_module'))
socket_monkey = imp.load_module('socket_patched',*imp.find_module('my_socket_module'))
def alternative_implementation(blah1, blah2):  # distinct parameter names (duplicates would be a SyntaxError)
    return 'Happiness'

socket_monkey.original_function = alternative_implementation
import my_sub_module
Then, in my_sub_module, I can import 'socket_patched', which does not exist on the system! Here we are in my_sub_module.py:
import socket_patched
socket_patched.original_function('foo', 'bar')
# This call brings us 'Happiness'
This is pretty disgusting, but this might suffice:
import sys
# if socket was already imported, get rid of it and save a copy
save = sys.modules.pop('socket', None)
# import socket again (it's not in sys.modules, so it will be reimported)
import socket as mysock
if save is None:
    # if we didn't have a saved copy, remove my version of 'socket'
    del sys.modules['socket']
else:
    # if we did have a saved copy, overwrite my socket with the original
    sys.modules['socket'] = save
Here's some code that creates a new module with the functions and variables of the old:
def copymodule(old):
    new = type(old)(old.__name__, old.__doc__)
    new.__dict__.update(old.__dict__)
    return new
Note that this does a fairly shallow copy of the module. The dictionary is newly created, so basic monkey patching will work, but any mutables in the original module will be shared between the two.
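A quick sketch of that caveat (my illustration, using the stdlib json module):
import json

json2 = copymodule(json)
json2.dumps = lambda obj, **kwargs: 'patched'  # rebinding a name affects only the copy
assert json.dumps({}) == '{}'                  # the original module is untouched

# objects reachable through the module dict are shared, not copied:
assert json2.encoder is json.encoder           # same submodule object in both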
Edit: According to the comment, a deep copy is needed. I tried messing around with monkey-patching the copy module to support deep copies of modules, but that didn't work. Next I tried importing the module twice, but since modules are cached in sys.modules, that gave me the same module twice. Finally, the solution I hit upon was removing the modules from sys.modules after importing it the first time, then importing it again.
from imp import find_module, load_module
from sys import modules
def loadtwice(name, path=None):
    """Import two copies of a module.

    The name and path arguments are as for `find_module` in the `imp` module.
    Note that future imports of the module will return the same object as
    the second of the two returned by this function.
    """
    startingmods = modules.copy()
    foundmod = find_module(name, path)
    mod1 = load_module(name, *foundmod)
    newmods = set(modules) - set(startingmods)
    for m in newmods:
        del modules[m]
    mod2 = load_module(name, *foundmod)
    return mod1, mod2
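A quick usage sketch (my addition; it needs a Python version that still ships the imp module, and the module name is arbitrary):
mod1, mod2 = loadtwice('json')
assert mod1 is not mod2

mod1.dumps = lambda *args, **kwargs: 'patched'
assert mod2.dumps({}) == '{}'  # the second copy keeps the original function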
Physically copy the socket module to socket_monkey and go from there? I don't feel you need any "clever" work-around... but I might well be over simplifying!

Creating aliases for Python packages?

I have a directory, let's call it Storage full of packages with unwieldy names like mypackage-xxyyzzww, and of course Storage is on my PYTHONPATH. Since packages have long unmemorable names, all of the packages are symlinked to friendlier names, such as mypackage.
Now, I don't want to rely on file system symbolic links to do this, instead I tried mucking around with sys.path and sys.modules. Currently I'm doing something like this:
import imp
imp.load_package('mypackage', 'Storage/mypackage-xxyyzzww')
How bad is it to do things this way, and is there a chance this will break in the future? One funny thing is that there's no mention of the imp.load_package function in the docs at all.
EDIT: besides not relying on symbolic links, I can't use PYTHONPATH variable anymore.
Instead of using imp, you can assign different names to imported modules.
import mypackage_xxyyzzww as mypackage
If you then create a __init__.py file inside of Storage, you can add several of the above lines to make importing easier.
Storage/__init__.py:
import mypackage_xxyyzzww as mypackage
import otherpackage_xxyyzzww as otherpackage
Interpreter:
>>> from Storage import mypackage, otherpackage
importlib may be more appropriate, as it uses/implements the PEP 302 mechanism.
Follow the DictImporter example, but override find_module to find the real filename and store it in the dict, then override load_module to get the code from the found file.
You shouldn't need to use sys.path once you've created your Storage module
#from importlib import abc
import imp
import os
import sys
import logging
logging.basicConfig(level=logging.DEBUG)
dprint = logging.debug
class MyImporter(object):
    def __init__(self, path):
        self.path = path
        self.names = {}

    def find_module(self, fullname, path=None):
        dprint("find_module({fullname},{path})".format(**locals()))
        ml = imp.find_module(fullname, path)
        dprint(repr(ml))
        raise ImportError

    def load_module(self, fullname):
        dprint("load_module({fullname})".format(**locals()))
        return imp.load_module(fullname)
        raise ImportError

def load_storage(path, modname=None):
    if modname is None:
        modname = os.path.basename(path)
    mod = imp.new_module(modname)
    sys.modules[modname] = mod
    assert mod.__name__ == modname
    mod.__path__ = [path]
    #sys.meta_path.append(MyImporter(path))
    mod.__loader__ = MyImporter(path)
    return mod

if __name__ == "__main__":
    load_storage("arbitrary-path-to-code/Storage")
    from Storage import plain
    from Storage import mypkg
Then when you import Storage.mypackage, python will immediately use your importer without bothering to look on sys.path
That doesn't work. The code above does work to import ordinary modules under Storage without requiring Storage to be on sys.path, but both 3.1 and 2.6 seem to ignore the __loader__ attribute mentioned in PEP 302.
If I uncomment the sys.meta_path line, 3.1 dies with a stack overflow, and 2.6 dies with ImportError. Hmm... I'm out of time now, but may look at it later.
Packages are just entries in the namespace. You should not name your path components with anything that is not a legal Python variable name.
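That said, if you are stuck with the hyphenated directory names, a minimal symlink-free sketch (my addition; it assumes Storage has been put on sys.path, and uses the hypothetical names from the question) is to register the alias in sys.modules yourself:
import importlib
import sys

sys.path.insert(0, '/path/to/Storage')

# import_module accepts names that aren't legal identifiers, so the
# hyphenated directory can be loaded directly and then aliased:
pkg = importlib.import_module('mypackage-xxyyzzww')
sys.modules['mypackage'] = pkg

import mypackage  # now resolved straight from the sys.modules cache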
