Python: force every import to reload - python

Is there a way to force import x to always reload x in Python (i.e., as if I had called reload(x), or imp.reload(x) for Python 3)? Or in general, is there some way to force some code to be run every time I run import x? I'm OK with monkey patching or hackery.
I've tried moving the code into a separate module and deleting x from sys.modules in that separate file. I dabbled a bit with import hooks, but I didn't try too hard because according to the documentation, they are only called after the sys.modules cache is checked. I also tried monkeypatching sys.modules with a custom dict subclass, but whenever I do that, from module import submodule raises KeyError (I'm guessing sys.modules is not a real dictionary).
Basically, I'm trying to write a debugging tool (which is why some hackery is OK here). My goal is simply that import x is shorter to type than import x;x.y.

If you really want to change the semantics of the import statement, you will have to patch the interpreter. import checks whether the named module already is loaded and if so it does nothing more. You would have to change exactly that, and that is hard-wired in the interpreter.
Maybe you can live with patching the Python sources to use myImport('modulename') instead of import modulename? That would make it possible within Python itself.

Taking a lead from Alfe's answer, I got it to work like this. This goes at the module level.
def custom_logic():
# Put whatever you want to run when the module is imported here
# This version is run on the first import
custom_logic()
def __myimport__(name, *args, **kwargs):
if name == 'x': # Replace with the name of this module
# This version is run on all subsequent imports
custom_logic()
return __origimport__(name, *args, **kwargs)
# Will only be run on first import
__builtins__['__origimport__'] = __import__
__builtins__['__import__'] = __myimport__
We are monkeypatching __builtins__, which is why __origimport__ is defined when __myimport__ is run.

Related

How does import and reload actually work when dealing with packages

I have issues understanding some subtleties of the Python import system. I have condensed my doubts around a minimal example and a number of concrete and related questions detailed below.
I have defined a package in a folder called modules, whose content is an __init__.py and two regular modules, one with general functionality for the package and other with the definitions for the end user. The content is as simple as:
init.py
from .base import *
from .implementation import *
base.py
class FactoryClass():
registry = {}
#classmethod
def add_to_registry(cls, newclass):
cls.registry[newclass.__name__] = newclass
#classmethod
def getobject(cls, classname, *args, **kwargs):
return cls.registry[classname](*args, **kwargs)
class BaseClass():
def hello(self):
print(f"Hello from instance of class {type(self).__name__}")
implementation.py
from .base import BaseClass, FactoryClass
class First(BaseClass):
pass
class Second(BaseClass):
pass
FactoryClass.add_to_registry(First)
FactoryClass.add_to_registry(Second)
The user of the package will use it as:
import modules
a = modules.FactoryClass.getobject("First")
b = modules.FactoryClass.getobject("Second")
a.hello()
b.hello()
This works. The problem comes because I'm developing this, and my workflow includes adding functionality in implementation.py and then continaully test it by reloading the module. But I can not understand/predict what module I have to reload to have the functions updated. I'm making changes that have no effect and it drives me crazy (until yesterday I was working on a large .py file with all code lumped together, so I had none of these problems).
Here are some test I have done, and I'd like to understand what's happening and why.
First, I start commenting out all mentions to Second class in implementation.py (to pretend it was not yet developed):
from importlib import reload
import modules
modules.base.FactoryClass is modules.FactoryClass # returns True
modules.FactoryClass.registry # just First class is in registry
a = modules.FactoryClass.getobject("First")
b = modules.FactoryClass.getobject("Second") # raises KeyError as expected
This code and its output is pretty clear. The only thing that puzzles me is why there is a modules.base module at all (I did not import it!). Further, it is redundant as their classes point to the same objects. Why importing modules also imports modules.base and modules.implementation as separate but essentially identical objects?
Now things become interesting as I comment out, i.e. I finish developing Second, and I'd like to test it without having to restart the Python session. I have tried 3 different reloads:
reload (modules)
This does absolutely nothing. I'd expect some sort of recursivity, but as I have found in many other threats, this is the expected behavior.
Now I try to manually reload one of those "unexpected" modules:
reload (modules.implementation)
modules.base.FactoryClass is modules.FactoryClass # True
modules.FactoryClass.registry # First and Second
a = modules.FactoryClass.getobject("First")
b = modules.FactoryClass.getobject("Second") # Works as expected
This seems to be the right way to go. It updates the module contents as expected and the new functionality is usable. What puzzles me is why modules.FactoryClass has been updated (its registry) despite the fact that I did not reload the modules.base module. I'd expect this function to stay "outdated".
Finally, and starting from the just freshly uncommented version, I have tried
reload (modules.base)
modules.base.FactoryClass is modules.FactoryClass # False
modules.FactoryClass.registry # just First class is in registry
modules.base.FactoryClass.registry # empty
a = modules.base.FactoryClass.getobject("First")
b = modules.base.FactoryClass.getobject("Second") # raises KeyError
This is very odd. modules.FactoryClass is outdated (Second is unknown). modules.base.Factory is empty. Why are now modules.FactoryClass and modules.base.FactoryClass different objects?
Could someone explain why the three different versions of reload a package have so different behaviour?
You are confused about how the Python import system works, so I strongly recommend you read the corresponding documentations : the import system and importlib.reload.
A foreword : code hot-reloading in Python is tricky. I recommend to not do that if it is not required. You have seen it yourself : bugs are very tricky.
Then to your questions :
Why importing modules also imports modules.base and modules.implementation as separate but essentially identical objects?
As #Kemp answered as a comment (and I upvoted), imports are transitive. When you import a, Python will parse/compile/execute the corresponding library file. If the module does import b then Python will do it again for the b library file, and again and again. You don't see it, but when your program starts there is already a lot of things that have been imported.
Given this file :
print("nothing else")
When I set my debugger to pause before executing the print line, if I look into sys.modules I already have 338 different libraries imported : builtins (where print came from), sys, itertools, enum, json, ...
Understand that "no visible import statement" does not mean "nothing have been imported".
When you execute import a, Python will start by checking its sys.modules cache to determine if the library have already been read from disk, parsed, compiled and executed into a module object. If this library was not yet imported during this program, then Python will take the time to do all that. But because it is slow, Python optimize with a cache.
The result is a module object, that gets bind the current namespace, so that you can access it.
We can summerize it like that :
def import_library(name: str) -> Module:
if name not in sys.modules:
# cache miss
filepath = locate_library(name)
bytecode = compile_library(filepath)
module = execute(bytecode)
sys.modules[name] = module
# in any case, at this point, the module is available
return sys.modules[name]
You are thus confusing module objects with variables.
In any module you can declare variables with whatever name (but allowed by Python's grammar). And some of them will reference modules.
here is an example :
# file: main.py
import lib # create a variable named `lib` referencing the `lib` library
import lib as horse # create a variable named `horse` referencing the `lib` library
print(lib.a.number) # 14
print(horse.a.number) # 14
print(lib is horse) # True
print(lib.a.sublib.__name__) # lib.sublib
import lib.sublib
from lib import sublib
import lib.sublib as lib_sublib
print((lib.sublib is sublib, sublib is lib_sublib, lib.a.zebra is sublib)) # (True, True, True)
import sys
print(sys.modules["lib"] is lib) # True
print(sys.modules["lib.sublib"] is sublib) # True
print(lib.sublib.package_color) # blue
print(lib.sublib.color) # AttributeError: module 'lib.sublib' has no attribute 'color'
# file: lib/__init__.py
from . import a
# file: lib/a.py
from . import sublib
from . import sublib as zebra
number = 14
# file: lib/sublib/__init__.py
from .b import color as package_color
# file: lib/sublib/b.py
color = "blue"
Python offers a lot of flexibility about how to import things, what to expose, how to access. But I admit it is confusing.
Also take a look at the role of __all__ in __init__.py. Given that, you should now understand your question on subpackage naming/visibility.
reload (modules) This does absolutely nothing. I'd expect some sort of recursivity, but as I have found in many other threats, this is the expected behavior.
Given what I explained, can you now understand what it does ? And why what it does is not what you want it to do ?
Because what you want is to get modules.implementations hot-reloaded, but you ask for modules.
>>> from importlib import reload
>>> import lib
>>> lib.sublib.package_color
'blue'
>>> # I edit the "b.py" file so that the color is "red"
>>> old_lib = lib
>>> new_lib = reload(lib)
>>> lib is new_lib, lib is old_lib
(True, True)
>>> lib.sublib.package_color
'blue'
>>> lib.sublib.b.color
'red'
>>> import sys
>>> sys.modules["lib.sublib.b"].color
'red'
First, the top-level reload did not work, because what the file only did was import sublib, which hit the cache, so nothing really gets done.
You have to reload the actual module for its content to takes effect. But it does not work magically : it will create new objects (module-level definitions) and put them into the same module object, but it can't update references that may exist on the preceding module's content. That is why we see a "blue" even after the module has been reloaded : the package_color is a reference to the first version's color variable, it does not get updated when the module is reloaded. This is dangerous : there may be different copies of similar things lying around.
why modules.FactoryClass has been updated
You are reloading modules.implementation in this case. What happens is that it reloads the whole file to populate the module object, I highlighted the perceived effects :
from .base import BaseClass, FactoryClass # relative library "base" already in the `sys.modules` cache
class First(BaseClass): # was already defined, redefined
pass
class Second(BaseClass): # was not defined yed, created
pass
FactoryClass.add_to_registry(First) # overwritting the registry for "First" with the redefinition of class `First`
FactoryClass.add_to_registry(Second) # registering for "Second" the definition of class `Second`
You can see it another way :
>>> import modules
>>> from importlib import reload
>>> First_before = modules.implementation.First
>>> reload(modules.implementation)
<module 'modules.implementation' from 'C:\\PycharmProjects\\stack_overflow\\68069397\\modules\\implementation.py'>
>>> First_after = modules.implementation.First
>>> First_before_reload is First_after_reload
False
When you are reloading a module, Python will re-execute all the code, in a way that may be different than the previous time(s). Here, each time you are doing FactoryClass.add_to_registry so the FactoryClass gets updated with the (re)definitions.
Why are now modules.FactoryClass and modules.base.FactoryClass different objects?
Because you reloaded modules.base, creating a new FactoryClass class object, but the from .base import BaseClass, FactoryClass does not get reloaded, so it is still using the class object from before the reload.
Because you reloaded, you got yourselves copy of everything. The problem is that you still have lingering references to versions of before the reload.
I hope it answers your questions.
Import is not easy, but reloading is notably tricky.
If you truly desire to reload your code, then you will have to take extra extra extra care to correctly re-import everything, in the correct order (if such an order exist, if there is not too much side-effects). You need a proper script to update everything correctly, and in case it does not work perfectly, you will frequently have horrible, sad and mind-bending bugs.
But if you prefer keep your sanity, if your workflow does not require you to reload parts of the program, just close it and restart it. There is nothing wrong with that. It is the simple solution. The sane solution.
TL;DR: don't use importlib.reload if you don't know how it works exactly and understand the risks.

Pass-through/export whole third party module (using __all__?)

I have a module that wraps another module to insert some shim logic in some functions. The wrapped module uses a settings module mod.settings which I want to expose, but I don't want the users to import it from there, in case I would like to shim something there as well in the future. I want them to import wrapmod.settings.
Importing the module and exporting it works, but is a bit verbose on the client side. It results in having to write settings.thing instead of just thing.
I want the users to be able to do from wrapmod.settings import * and get the same results as if they did from mod.settings import * but right now, only from wrapmod import settings is available. How to I work around this?
If I understand the situation correctly, you're writing a module wrapmod that is intended to transform parts of an existing package mod. The specific part you're transforming is the submodule mod.settings. You've imported the settings module and made your changes to it, but even though it is available as wrapmod.settings, you can't use that module name in an from ... import ... statement.
I think the best way to fix that is to insert the modified module into sys.modules under the new dotted name. This makes Python accept that name as valid even though wrapmod isn't really a package.
So wrapmod would look something like:
import sys
from mod import settings
# modify settings here
sys.modules['wrapmod.settings'] = settings # add this line!
I ended up making a code-generator for a thin wrapper module instead, since the sys.module hacking broke all IDE integration.
from ... import mod
# this is just a pass-through wrapper around mod.settings
__all__ = mod.__all__
# generate pass-through wrapper around mod.settings; doesn't break IDE integration, unlike manual sys.modules editing.
if __name__ == "__main__":
for thing in settings.__all__:
print(thing + " = mod." + thing)
which when run as a script, outputs code that can then be appended to the end of this file.

How to make a copy of a python module at runtime?

I need to make a copy of a socket module to be able to use it and to have one more socket module monkey-patched and use it differently.
Is this possible?
I mean to really copy a module, namely to get the same result at runtime as if I've copied socketmodule.c, changed the initsocket() function to initmy_socket(), and installed it as my_socket extension.
You can always do tricks like importing a module then deleting it from sys.modules or trying to copy a module. However, Python already provides what you want in its Standard Library.
import imp # Standard module to do such things you want to.
# We can import any module including standard ones:
os1=imp.load_module('os1', *imp.find_module('os'))
# Here is another one:
os2=imp.load_module('os2', *imp.find_module('os'))
# This returns True:
id(os1)!=id(os2)
Python3.3+
imp.load_module is deprecated in python3.3+, and recommends the use of importlib
#!/usr/bin/env python3
import sys
import importlib.util
SPEC_OS = importlib.util.find_spec('os')
os1 = importlib.util.module_from_spec(SPEC_OS)
SPEC_OS.loader.exec_module(os1)
sys.modules['os1'] = os1
os2 = importlib.util.module_from_spec(SPEC_OS)
SPEC_OS.loader.exec_module(os2)
sys.modules['os2'] = os2
del SPEC_OS
assert os1 is not os2, \
"Module `os` instancing failed"
Here, we import the same module twice but as completely different module objects. If you check sys.modules, you can see two names you entered as first parameters to load_module calls. Take a look at the documentation for details.
UPDATE:
To make the main difference of this approach obvious, I want to make this clearer: When you import the same module this way, you will have both versions globally accessible for every other module you import in runtime, which is exactly what the questioner needs as I understood.
Below is another example to emphasize this point.
These two statements do exactly the same thing:
import my_socket_module as socket_imported
socket_imported = imp.load_module('my_socket_module',
*imp.find_module('my_socket_module')
)
On second line, we repeat 'my_socket_module' string twice and that is how import statement works; but these two strings are, in fact, used for two different reasons.
Second occurrence as we passed it to find_module is used as the file name that will be found on the system. The first occurrence of the string as we passed it to load_module method is used as system-wide identifier of the loaded module.
So, we can use different names for these which means we can make it work exactly like we copied the python source file for the module and loaded it.
socket = imp.load_module('socket_original', *imp.find_module('my_socket_module'))
socket_monkey = imp.load_module('socket_patched',*imp.find_module('my_socket_module'))
def alternative_implementation(blah, blah):
return 'Happiness'
socket_monkey.original_function = alternative_implementation
import my_sub_module
Then in my_sub_module, I can import 'socket_patched' which does not exist on system! Here we are in my_sub_module.py.
import socket_patched
socket_patched.original_function('foo', 'bar')
# This call brings us 'Happiness'
This is pretty disgusting, but this might suffice:
import sys
# if socket was already imported, get rid of it and save a copy
save = sys.modules.pop('socket', None)
# import socket again (it's not in sys.modules, so it will be reimported)
import socket as mysock
if save is None:
# if we didn't have a saved copy, remove my version of 'socket'
del sys.modules['socket']
else:
# if we did have a saved copy overwrite my socket with the original
sys.modules['socket'] = save
Here's some code that creates a new module with the functions and variables of the old:
def copymodule(old):
new = type(old)(old.__name__, old.__doc__)
new.__dict__.update(old.__dict__)
return new
Note that this does a fairly shallow copy of the module. The dictionary is newly created, so basic monkey patching will work, but any mutables in the original module will be shared between the two.
Edit: According to the comment, a deep copy is needed. I tried messing around with monkey-patching the copy module to support deep copies of modules, but that didn't work. Next I tried importing the module twice, but since modules are cached in sys.modules, that gave me the same module twice. Finally, the solution I hit upon was removing the modules from sys.modules after importing it the first time, then importing it again.
from imp import find_module, load_module
from sys import modules
def loadtwice(name, path=None):
"""Import two copies of a module.
The name and path arguments are as for `find_module` in the `imp` module.
Note that future imports of the module will return the same object as
the second of the two returned by this function.
"""
startingmods = modules.copy()
foundmod = find_module(name, path)
mod1 = load_module(name, *foundmod)
newmods = set(modules) - set(startingmods)
for m in newmods:
del modules[m]
mod2 = load_module(name, *foundmod)
return mod1, mod2
Physically copy the socket module to socket_monkey and go from there? I don't feel you need any "clever" work-around... but I might well be over simplifying!

How to tell if a Python modules I being reload()ed from within the module

When writing a Python module, is there a way to tell if the module is being imported or reloaded?
I know I can create a class, and the __init__() will only be called on the first import, but I hadn't planning on creating a class. Though, I will if there isn't an easy way to tell if we are being imported or reloaded.
The documentation for reload() actually gives a code snippet that I think should work for your purposes, at least in the usual case. You'd do something like this:
try:
reloading
except NameError:
reloading = False # means the module is being imported
else:
reloading = True # means the module is being reloaded
What this really does is detect whether the module is being imported "cleanly" (e.g. for the first time) or is overwriting a previous instance of the same module. In the normal case, a "clean" import corresponds to the import statement, and a "dirty" import corresponds to reload(), because import only really imports the module once, the first time it's executed (for each given module).
If you somehow manage to force a subsequent execution of the import statement into doing something nontrivial, or if you somehow manage to import your module for the first time using reload(), or if you mess around with the importing mechanism (through the imp module or the like), all bets are off. In other words, don't count on this always working in every possible situation.
P.S. The fact that you're asking this question makes me wonder if you're doing something you probably shouldn't be doing, but I won't ask.
>>> import os
>>> os.foo = 5
>>> os.foo
5
>>> import os
>>> os.foo
5

How do I override a Python import?

I'm working on pypreprocessor which is a preprocessor that takes c-style directives and I've been able to make it work like a traditional preprocessor (it's self-consuming and executes postprocessed code on-the-fly) except that it breaks library imports.
The problem is: The preprocessor runs through the file, processes it, outputs to a temporary file, and exec() the temporary file. Libraries that are imported need to be handled a little different, because they aren't executed, but rather they are loaded and made accessible to the caller module.
What I need to be able to do is: Interrupt the import (since the preprocessor is being run in the middle of the import), load the postprocessed code as a tempModule, and replace the original import with the tempModule to trick the calling script with the import into believing that the tempModule is the original module.
I have searched everywhere and so far and have no solution.
This Stack Overflow question is the closest I've seen so far to providing an answer:
Override namespace in Python
Here's what I have.
# Remove the bytecode file created by the first import
os.remove(moduleName + '.pyc')
# Remove the first import
del sys.modules[moduleName]
# Import the postprocessed module
tmpModule = __import__(tmpModuleName)
# Set first module's reference to point to the preprocessed module
sys.modules[moduleName] = tmpModule
moduleName is the name of the original module, and tmpModuleName is the name of the postprocessed code file.
The strange part is this solution still runs completely normal as if the first module completed loaded normally; unless you remove the last line, then you get a module not found error.
Hopefully someone on Stack Overflow know a lot more about imports than I do, because this one has me stumped.
Note: I will only award a solution, or, if this is not possible in Python; the best, most detailed explanation of why this is not impossible.
Update: For anybody who is interested, here is the working code.
if imp.lock_held() is True:
del sys.modules[moduleName]
sys.modules[tmpModuleName] = __import__(tmpModuleName)
sys.modules[moduleName] = __import__(tmpModuleName)
The 'imp.lock_held' part detects whether the module is being loaded as a library. The following lines do the rest.
Does this answer your question? The second import does the trick.
Mod_1.py
def test_function():
print "Test Function -- Mod 1"
Mod_2.py
def test_function():
print "Test Function -- Mod 2"
Test.py
#!/usr/bin/python
import sys
import Mod_1
Mod_1.test_function()
del sys.modules['Mod_1']
sys.modules['Mod_1'] = __import__('Mod_2')
import Mod_1
Mod_1.test_function()
To define a different import behavior or to totally subvert the import process you will need to write import hooks. See PEP 302.
For example,
import sys
class MyImporter(object):
def find_module(self, module_name, package_path):
# Return a loader
return self
def load_module(self, module_name):
# Return a module
return self
sys.meta_path.append(MyImporter())
import now_you_can_import_any_name
print now_you_can_import_any_name
It outputs:
<__main__.MyImporter object at 0x009F85F0>
So basically it returns a new module (which can be any object), in this case itself. You may use it to alter the import behavior by returning processe_xxx on import of xxx.
IMO: Python doesn't need a preprocessor. Whatever you are accomplishing can be accomplished in Python itself due to it very dynamic nature, for example, taking the case of the debug example, what is wrong with having at top of file
debug = 1
and later
if debug:
print "wow"
?
In Python 2 there is the imputil module that seems to provide the functionality you are looking for, but has been removed in python 3. It's not very well documented but contains an example section that shows how you can replace the standard import functions.
For Python 3 there is the importlib module (introduced in Python 3.1) that contains functions and classes to modify the import functionality in all kinds of ways. It should be suitable to hook your preprocessor into the import system.

Categories