How to modify imported source code on-the-fly?

How to modify imported source code on-the-fly? - python

Suppose I have a module file like this:
# my_module.py
print("hello")
Then I have a simple script:
# my_script.py
import my_module
This will print "hello".
Let's say I want to "override" the print() function so it returns "world" instead. How could I do this programmatically (without manually modifying my_module.py)?
What I thought is that I need somehow to modify the source code of my_module before or while importing it. Obvisouly, I cannot do this after importing it so solution using unittest.mock are impossible.
I also thought I could read the file my_module.py, perform modification, then load it. But this is ugly, as it will not work if the module is located somewhere else.
The good solution, I think, is to make use of importlib.
I read the doc and found a very intersecting method: get_source(fullname). I thought I could just override it:
def get_source(fullname):
source = super().get_source(fullname)
source = source.replace("hello", "world")
return source
Unfortunately, I am a bit lost with all these abstract classes and I do not know how to perform this properly.
I tried vainly:
spec = importlib.util.find_spec("my_module")
spec.loader.get_source = mocked_get_source
module = importlib.util.module_from_spec(spec)
Any help would be welcome, please.

Here's a solution based on the content of this great talk. It allows any arbitrary modifications to be made to the source before importing the specified module. It should be reasonably correct as long as the slides did not omit anything important. This will only work on Python 3.5+.
import importlib
import sys
def modify_and_import(module_name, package, modification_func):
spec = importlib.util.find_spec(module_name, package)
source = spec.loader.get_source(module_name)
new_source = modification_func(source)
module = importlib.util.module_from_spec(spec)
codeobj = compile(new_source, module.__spec__.origin, 'exec')
exec(codeobj, module.__dict__)
sys.modules[module_name] = module
return module
So, using this you can do
my_module = modify_and_import("my_module", None, lambda src: src.replace("hello", "world"))

This doesn't answer the general question of dynamically modifying the source code of an imported module, but to "Override" or "monkey-patch" its use of the print() function can be done (since it's a built-in function in Python 3.x). Here's how:
#!/usr/bin/env python3
# my_script.py
import builtins
_print = builtins.print
def my_print(*args, **kwargs):
_print('In my_print: ', end='')
return _print(*args, **kwargs)
builtins.print = my_print
import my_module # -> In my_print: hello

I first needed to better understand the import operation. Fortunately, this is well explained in the importlib documentation and scratching through the source code helped too.
This import process is actually split in two parts. First, a finder is in charge of parsing the module name (including dot-separated packages) and instantiating an appropriate loader. Indeed, built-in are not imported as local modules for example. Then, the loader is called based on what the finder returned. This loader get the source from a file or from a cache, and executed the code if the module was not previously loaded.
This is very simple. This explains why I actually did not need to use abstract classes from importutil.abc: I do not want to provide my own import process. Instead, I could create a subclass inherited from one of the classes from importuil.machinery and override get_source() from SourceFileLoader for example. However, this is not the way to go because the loader is instantiated by the finder so I do not have the hand on which class is used. I cannot specify that my subclass should be used.
So, the best solution is to let the finder do its job, and then replace the get_source() method of whatever Loader has been instantiated.
Unfortunately, by looking trough the code source I saw that the basic Loaders are not using get_source() (which is only used by the the inspect module). So my whole idea could not work.
In the end, I guess get_source() should be called manually, then the returned source should be modified, and finally the code should be executed. This is what Martin Valgur detailed in his answer.
If compatibility with Python 2 is needed, I see no other way than reading the source file:
import imp
import sys
import types
module_name = "my_module"
file, pathname, description = imp.find_module(module_name)
with open(pathname) as f:
source = f.read()
source = source.replace('hello', 'world')
module = types.ModuleType(module_name)
exec(source, module.__dict__)
sys.modules[module_name] = module

If importing the module before the patching it is okay, then a possible solution would be
import inspect
import my_module
source = inspect.getsource(my_module)
new_source = source.replace('"hello"', '"world"')
exec(new_source, my_module.__dict__)
If you're after a more general solution, then you can also take a look at the approach I used in another answer a while ago.

My solution updates the source file, which works for the inner import situation. The inner import means that transformers.models.albert import modeling_albert from the source file. In such case, even if I use the solution from Martin Valgur, it won't work. So I update the source file. Hope it help the people who have the same trouble with me.
import inspect
from transformers.models.albert import modeling_albert
# Get source
source = inspect.getsource(modeling_albert)
source_before = "AlbertModel(config, add_pooling_layer=False)"
source_after = "AlbertModel(config, add_pooling_layer=True)"
new_source = source.replace(source_before, source_after)
# Update file
file_path = modeling_albert.__spec__.origin
with open(file_path, 'w') as f:
f.write(new_source)

Not elegant, but works for me (may have to add a path):
with open ('my_module.py') as aFile:
exec (aFile.read () .replace (<something>, <something else>))

Related

How does import and reload actually work when dealing with packages

I have issues understanding some subtleties of the Python import system. I have condensed my doubts around a minimal example and a number of concrete and related questions detailed below.
I have defined a package in a folder called modules, whose content is an __init__.py and two regular modules, one with general functionality for the package and other with the definitions for the end user. The content is as simple as:
init.py
from .base import *
from .implementation import *
base.py
class FactoryClass():
registry = {}
#classmethod
def add_to_registry(cls, newclass):
cls.registry[newclass.__name__] = newclass
#classmethod
def getobject(cls, classname, *args, **kwargs):
return cls.registry[classname](*args, **kwargs)
class BaseClass():
def hello(self):
print(f"Hello from instance of class {type(self).__name__}")
implementation.py
from .base import BaseClass, FactoryClass
class First(BaseClass):
pass
class Second(BaseClass):
pass
FactoryClass.add_to_registry(First)
FactoryClass.add_to_registry(Second)
The user of the package will use it as:
import modules
a = modules.FactoryClass.getobject("First")
b = modules.FactoryClass.getobject("Second")
a.hello()
b.hello()
This works. The problem comes because I'm developing this, and my workflow includes adding functionality in implementation.py and then continaully test it by reloading the module. But I can not understand/predict what module I have to reload to have the functions updated. I'm making changes that have no effect and it drives me crazy (until yesterday I was working on a large .py file with all code lumped together, so I had none of these problems).
Here are some test I have done, and I'd like to understand what's happening and why.
First, I start commenting out all mentions to Second class in implementation.py (to pretend it was not yet developed):
from importlib import reload
import modules
modules.base.FactoryClass is modules.FactoryClass # returns True
modules.FactoryClass.registry # just First class is in registry
a = modules.FactoryClass.getobject("First")
b = modules.FactoryClass.getobject("Second") # raises KeyError as expected
This code and its output is pretty clear. The only thing that puzzles me is why there is a modules.base module at all (I did not import it!). Further, it is redundant as their classes point to the same objects. Why importing modules also imports modules.base and modules.implementation as separate but essentially identical objects?
Now things become interesting as I comment out, i.e. I finish developing Second, and I'd like to test it without having to restart the Python session. I have tried 3 different reloads:
reload (modules)
This does absolutely nothing. I'd expect some sort of recursivity, but as I have found in many other threats, this is the expected behavior.
Now I try to manually reload one of those "unexpected" modules:
reload (modules.implementation)
modules.base.FactoryClass is modules.FactoryClass # True
modules.FactoryClass.registry # First and Second
a = modules.FactoryClass.getobject("First")
b = modules.FactoryClass.getobject("Second") # Works as expected
This seems to be the right way to go. It updates the module contents as expected and the new functionality is usable. What puzzles me is why modules.FactoryClass has been updated (its registry) despite the fact that I did not reload the modules.base module. I'd expect this function to stay "outdated".
Finally, and starting from the just freshly uncommented version, I have tried
reload (modules.base)
modules.base.FactoryClass is modules.FactoryClass # False
modules.FactoryClass.registry # just First class is in registry
modules.base.FactoryClass.registry # empty
a = modules.base.FactoryClass.getobject("First")
b = modules.base.FactoryClass.getobject("Second") # raises KeyError
This is very odd. modules.FactoryClass is outdated (Second is unknown). modules.base.Factory is empty. Why are now modules.FactoryClass and modules.base.FactoryClass different objects?
Could someone explain why the three different versions of reload a package have so different behaviour?

You are confused about how the Python import system works, so I strongly recommend you read the corresponding documentations : the import system and importlib.reload.
A foreword : code hot-reloading in Python is tricky. I recommend to not do that if it is not required. You have seen it yourself : bugs are very tricky.
Then to your questions :
Why importing modules also imports modules.base and modules.implementation as separate but essentially identical objects?
As #Kemp answered as a comment (and I upvoted), imports are transitive. When you import a, Python will parse/compile/execute the corresponding library file. If the module does import b then Python will do it again for the b library file, and again and again. You don't see it, but when your program starts there is already a lot of things that have been imported.
Given this file :
print("nothing else")
When I set my debugger to pause before executing the print line, if I look into sys.modules I already have 338 different libraries imported : builtins (where print came from), sys, itertools, enum, json, ...
Understand that "no visible import statement" does not mean "nothing have been imported".
When you execute import a, Python will start by checking its sys.modules cache to determine if the library have already been read from disk, parsed, compiled and executed into a module object. If this library was not yet imported during this program, then Python will take the time to do all that. But because it is slow, Python optimize with a cache.
The result is a module object, that gets bind the current namespace, so that you can access it.
We can summerize it like that :
def import_library(name: str) -> Module:
if name not in sys.modules:
# cache miss
filepath = locate_library(name)
bytecode = compile_library(filepath)
module = execute(bytecode)
sys.modules[name] = module
# in any case, at this point, the module is available
return sys.modules[name]
You are thus confusing module objects with variables.
In any module you can declare variables with whatever name (but allowed by Python's grammar). And some of them will reference modules.
here is an example :
# file: main.py
import lib # create a variable named `lib` referencing the `lib` library
import lib as horse # create a variable named `horse` referencing the `lib` library
print(lib.a.number) # 14
print(horse.a.number) # 14
print(lib is horse) # True
print(lib.a.sublib.__name__) # lib.sublib
import lib.sublib
from lib import sublib
import lib.sublib as lib_sublib
print((lib.sublib is sublib, sublib is lib_sublib, lib.a.zebra is sublib)) # (True, True, True)
import sys
print(sys.modules["lib"] is lib) # True
print(sys.modules["lib.sublib"] is sublib) # True
print(lib.sublib.package_color) # blue
print(lib.sublib.color) # AttributeError: module 'lib.sublib' has no attribute 'color'
# file: lib/__init__.py
from . import a
# file: lib/a.py
from . import sublib
from . import sublib as zebra
number = 14
# file: lib/sublib/__init__.py
from .b import color as package_color
# file: lib/sublib/b.py
color = "blue"
Python offers a lot of flexibility about how to import things, what to expose, how to access. But I admit it is confusing.
Also take a look at the role of __all__ in __init__.py. Given that, you should now understand your question on subpackage naming/visibility.
reload (modules) This does absolutely nothing. I'd expect some sort of recursivity, but as I have found in many other threats, this is the expected behavior.
Given what I explained, can you now understand what it does ? And why what it does is not what you want it to do ?
Because what you want is to get modules.implementations hot-reloaded, but you ask for modules.
>>> from importlib import reload
>>> import lib
>>> lib.sublib.package_color
'blue'
>>> # I edit the "b.py" file so that the color is "red"
>>> old_lib = lib
>>> new_lib = reload(lib)
>>> lib is new_lib, lib is old_lib
(True, True)
>>> lib.sublib.package_color
'blue'
>>> lib.sublib.b.color
'red'
>>> import sys
>>> sys.modules["lib.sublib.b"].color
'red'
First, the top-level reload did not work, because what the file only did was import sublib, which hit the cache, so nothing really gets done.
You have to reload the actual module for its content to takes effect. But it does not work magically : it will create new objects (module-level definitions) and put them into the same module object, but it can't update references that may exist on the preceding module's content. That is why we see a "blue" even after the module has been reloaded : the package_color is a reference to the first version's color variable, it does not get updated when the module is reloaded. This is dangerous : there may be different copies of similar things lying around.
why modules.FactoryClass has been updated
You are reloading modules.implementation in this case. What happens is that it reloads the whole file to populate the module object, I highlighted the perceived effects :
from .base import BaseClass, FactoryClass # relative library "base" already in the `sys.modules` cache
class First(BaseClass): # was already defined, redefined
pass
class Second(BaseClass): # was not defined yed, created
pass
FactoryClass.add_to_registry(First) # overwritting the registry for "First" with the redefinition of class `First`
FactoryClass.add_to_registry(Second) # registering for "Second" the definition of class `Second`
You can see it another way :
>>> import modules
>>> from importlib import reload
>>> First_before = modules.implementation.First
>>> reload(modules.implementation)
<module 'modules.implementation' from 'C:\\PycharmProjects\\stack_overflow\\68069397\\modules\\implementation.py'>
>>> First_after = modules.implementation.First
>>> First_before_reload is First_after_reload
False
When you are reloading a module, Python will re-execute all the code, in a way that may be different than the previous time(s). Here, each time you are doing FactoryClass.add_to_registry so the FactoryClass gets updated with the (re)definitions.
Why are now modules.FactoryClass and modules.base.FactoryClass different objects?
Because you reloaded modules.base, creating a new FactoryClass class object, but the from .base import BaseClass, FactoryClass does not get reloaded, so it is still using the class object from before the reload.
Because you reloaded, you got yourselves copy of everything. The problem is that you still have lingering references to versions of before the reload.
I hope it answers your questions.
Import is not easy, but reloading is notably tricky.
If you truly desire to reload your code, then you will have to take extra extra extra care to correctly re-import everything, in the correct order (if such an order exist, if there is not too much side-effects). You need a proper script to update everything correctly, and in case it does not work perfectly, you will frequently have horrible, sad and mind-bending bugs.
But if you prefer keep your sanity, if your workflow does not require you to reload parts of the program, just close it and restart it. There is nothing wrong with that. It is the simple solution. The sane solution.
TL;DR: don't use importlib.reload if you don't know how it works exactly and understand the risks.

Injecting variables into an import namespace

To illustrate what I am trying to do, let's say I have a module testmod that lives in ./testmod.py. The entire contents of this module is
x = test
I would like to be able to successfully import this module into Python, using any of the tools available in importlib or any other built in library.
Obviously doing a simple import testmod statement from the current directory results in an error: NameError: name 'test' is not defined.
I thought that maybe passing either globals or locals to __import__ correctly would modify the environment inside the script being run, but it does not:
>>> testmod = __import__('testmod', globals={'test': 'globals'}, locals={'test': 'locals'})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jfoxrabi/testmod.py", line 1, in <module>
x = test
NameError: name 'test' is not defined
I was setting the value of test differently so I could see which dict testmod.x came from if this worked.
Since neither of these seems to work, I am stuck. Is it even possible to accomplish what I am trying to do? I would guess that yes, since this is Python, not Sparta.
I am using Python 3.5 on Anaconda. I would very much prefer not to use external libraries.
Update: The Why
I am importing a module into my program as a configuration file. The reason that I am not using JSON or INI is that I would like to have the full scope of Python's interpreter available to compute the values in the config from expressions. I would like to have certain values that I compute before-hand in the program available to do those calculations.
While I am aware of the fact that this is about as bad as calling eval (I do that too in my program), I am not concerned with the security aspect for the time being. I am, however, quite willing to entertain better solutions should this indeed turn out to be a case of XY.

I came up with a solution based on this answer and the importlib docs. Basically, I have access to the module object before it is loaded by using the correct sequence of calls to importlib:
from importlib.util import spec_from_file_location, module_from_spec
from os.path import splitext, basename
def loadConfig(fileName):
test = 'This is a test'
name = splitext(basename(fileName))[0]
spec = spec_from_file_location(name, fileName)
config = module_from_spec(spec)
config.test = test
spec.loader.exec_module(config)
return config
testmod = loadConfig('./testmod.py')
This is a bit better than modifying builtins, which may have unintended consequences in other parts of the program, and may also restrict the names I can pass in to the module.
I decided to put all the configuration items into a single field accessible at load time, which I named config. This allows me to do the following in testmod:
if 'test' in config:
x = config['test']
The loader now looks like this:
from importlib.util import spec_from_file_location, module_from_spec
from os.path import splitext, basename
def loadConfig(fileName, **kwargs):
name = splitext(basename(fileName))[0]
spec = spec_from_file_location(name, fileName)
config = module_from_spec(spec)
config.config = kwargs
spec.loader.exec_module(config)
return config
testmod = loadConfig('./testmod.py', test='This is a test')
After finding myself using this a bunch of times, I finally ended up adding this functionality to the utility library I maintain, haggis. haggis.load.load_module loads a text file as a module with injection, while haggis.load.module_as_dict does a more advanced version of the same that loads it as a potentially nested configuration file into a dict.

You could screw with Python's builtins to inject your own fake built-in test variable:
import builtins # __builtin__, no s, in Python 2
builtins.test = 5 # or whatever other placeholder value
import testmod
del builtins.test # clean up after ourselves

How to make a copy of a python module at runtime?

I need to make a copy of a socket module to be able to use it and to have one more socket module monkey-patched and use it differently.
Is this possible?
I mean to really copy a module, namely to get the same result at runtime as if I've copied socketmodule.c, changed the initsocket() function to initmy_socket(), and installed it as my_socket extension.

You can always do tricks like importing a module then deleting it from sys.modules or trying to copy a module. However, Python already provides what you want in its Standard Library.
import imp # Standard module to do such things you want to.
# We can import any module including standard ones:
os1=imp.load_module('os1', *imp.find_module('os'))
# Here is another one:
os2=imp.load_module('os2', *imp.find_module('os'))
# This returns True:
id(os1)!=id(os2)
Python3.3+
imp.load_module is deprecated in python3.3+, and recommends the use of importlib
#!/usr/bin/env python3
import sys
import importlib.util
SPEC_OS = importlib.util.find_spec('os')
os1 = importlib.util.module_from_spec(SPEC_OS)
SPEC_OS.loader.exec_module(os1)
sys.modules['os1'] = os1
os2 = importlib.util.module_from_spec(SPEC_OS)
SPEC_OS.loader.exec_module(os2)
sys.modules['os2'] = os2
del SPEC_OS
assert os1 is not os2, \
"Module `os` instancing failed"
Here, we import the same module twice but as completely different module objects. If you check sys.modules, you can see two names you entered as first parameters to load_module calls. Take a look at the documentation for details.
UPDATE:
To make the main difference of this approach obvious, I want to make this clearer: When you import the same module this way, you will have both versions globally accessible for every other module you import in runtime, which is exactly what the questioner needs as I understood.
Below is another example to emphasize this point.
These two statements do exactly the same thing:
import my_socket_module as socket_imported
socket_imported = imp.load_module('my_socket_module',
*imp.find_module('my_socket_module')
)
On second line, we repeat 'my_socket_module' string twice and that is how import statement works; but these two strings are, in fact, used for two different reasons.
Second occurrence as we passed it to find_module is used as the file name that will be found on the system. The first occurrence of the string as we passed it to load_module method is used as system-wide identifier of the loaded module.
So, we can use different names for these which means we can make it work exactly like we copied the python source file for the module and loaded it.
socket = imp.load_module('socket_original', *imp.find_module('my_socket_module'))
socket_monkey = imp.load_module('socket_patched',*imp.find_module('my_socket_module'))
def alternative_implementation(blah, blah):
return 'Happiness'
socket_monkey.original_function = alternative_implementation
import my_sub_module
Then in my_sub_module, I can import 'socket_patched' which does not exist on system! Here we are in my_sub_module.py.
import socket_patched
socket_patched.original_function('foo', 'bar')
# This call brings us 'Happiness'

This is pretty disgusting, but this might suffice:
import sys
# if socket was already imported, get rid of it and save a copy
save = sys.modules.pop('socket', None)
# import socket again (it's not in sys.modules, so it will be reimported)
import socket as mysock
if save is None:
# if we didn't have a saved copy, remove my version of 'socket'
del sys.modules['socket']
else:
# if we did have a saved copy overwrite my socket with the original
sys.modules['socket'] = save

Here's some code that creates a new module with the functions and variables of the old:
def copymodule(old):
new = type(old)(old.__name__, old.__doc__)
new.__dict__.update(old.__dict__)
return new
Note that this does a fairly shallow copy of the module. The dictionary is newly created, so basic monkey patching will work, but any mutables in the original module will be shared between the two.
Edit: According to the comment, a deep copy is needed. I tried messing around with monkey-patching the copy module to support deep copies of modules, but that didn't work. Next I tried importing the module twice, but since modules are cached in sys.modules, that gave me the same module twice. Finally, the solution I hit upon was removing the modules from sys.modules after importing it the first time, then importing it again.
from imp import find_module, load_module
from sys import modules
def loadtwice(name, path=None):
"""Import two copies of a module.
The name and path arguments are as for `find_module` in the `imp` module.
Note that future imports of the module will return the same object as
the second of the two returned by this function.
"""
startingmods = modules.copy()
foundmod = find_module(name, path)
mod1 = load_module(name, *foundmod)
newmods = set(modules) - set(startingmods)
for m in newmods:
del modules[m]
mod2 = load_module(name, *foundmod)
return mod1, mod2

Physically copy the socket module to socket_monkey and go from there? I don't feel you need any "clever" work-around... but I might well be over simplifying!

Analyse python project imports

In general I would like to understand what exactly code the code in my projects is actually using from a big framework.
First I want to know what are all the imports (possibly with static analysis), and then if possible which of these imports are actually used.
For the first problem I could use a regexp of course, but I would like to find a cleaner way.
but I don't see how with ast/inspect/parser.
And about the second problem I should be able to find out automatically if some of the imports are actually unused, but how can I do that?
EDIT:
about the second issue maybe the best way is a simple import hook, which just records everything it was imported and then call the default importing mechanisms.
So I tried something like:
class MyLoader(object):
"""
Loader object
"""
def __init__(self):
self.loaded = set()
def find_module(self, module_name, package=None):
print("requesting %s" % module_name)
self.loaded.add(module_name)
return self
def load_module(self, fullname):
fp, pathname, stuff = imp.find_module(fullname)
imp.load_module(fullname, fp, pathname, stuff)
But trying to import "random" I get
from future import division
ImportError: No module named future
Which I think means I'm missing something..
I haven't found any simple example of using imp to do some import introspection, any hints?

I'm pleased to say that listing out the imports is actually quite simple.
I need a minimal implementation of the Importer protocol (defined by PEP 302), where if find_module returns None it will just fallback to the next one.
This simple script can actually show the imports done by the program passed in:
import sys
class ImportInspector(object):
def find_module(self, module, path):
print("importing module %s" % module)
if __name__ == '__main__':
progname = sys.argv[0]
# shift by one position
sys.argv = sys.argv[1:]
sys.meta_path.append(ImportInspector())
code = compile(open(progname, 'rb').read(), progname, 'exec')
exec(code)
Given this, any kind of trick can be implemented on top of it.
For example we can keep track of the imports in a set and store them all when the program quits.
I think we might even get the hiearchy of the imports and produce a graph similar to what gprof2dot does, but only based on the analysis of the imports.

Problem with such an analysis would be the dynamic nature of python. In fact the set of modules that are used may be dependent on the runtime variable (i.e. some modules could be imported and used under certain runtime conditions only).
May be not the best way, but if you have pretty decent test coverage for your code, you can use coverage.py output to check what modules were loaded during the test execution.

How do I override a Python import?

I'm working on pypreprocessor which is a preprocessor that takes c-style directives and I've been able to make it work like a traditional preprocessor (it's self-consuming and executes postprocessed code on-the-fly) except that it breaks library imports.
The problem is: The preprocessor runs through the file, processes it, outputs to a temporary file, and exec() the temporary file. Libraries that are imported need to be handled a little different, because they aren't executed, but rather they are loaded and made accessible to the caller module.
What I need to be able to do is: Interrupt the import (since the preprocessor is being run in the middle of the import), load the postprocessed code as a tempModule, and replace the original import with the tempModule to trick the calling script with the import into believing that the tempModule is the original module.
I have searched everywhere and so far and have no solution.
This Stack Overflow question is the closest I've seen so far to providing an answer:
Override namespace in Python
Here's what I have.
# Remove the bytecode file created by the first import
os.remove(moduleName + '.pyc')
# Remove the first import
del sys.modules[moduleName]
# Import the postprocessed module
tmpModule = __import__(tmpModuleName)
# Set first module's reference to point to the preprocessed module
sys.modules[moduleName] = tmpModule
moduleName is the name of the original module, and tmpModuleName is the name of the postprocessed code file.
The strange part is this solution still runs completely normal as if the first module completed loaded normally; unless you remove the last line, then you get a module not found error.
Hopefully someone on Stack Overflow know a lot more about imports than I do, because this one has me stumped.
Note: I will only award a solution, or, if this is not possible in Python; the best, most detailed explanation of why this is not impossible.
Update: For anybody who is interested, here is the working code.
if imp.lock_held() is True:
del sys.modules[moduleName]
sys.modules[tmpModuleName] = __import__(tmpModuleName)
sys.modules[moduleName] = __import__(tmpModuleName)
The 'imp.lock_held' part detects whether the module is being loaded as a library. The following lines do the rest.

Does this answer your question? The second import does the trick.
Mod_1.py
def test_function():
print "Test Function -- Mod 1"
Mod_2.py
def test_function():
print "Test Function -- Mod 2"
Test.py
#!/usr/bin/python
import sys
import Mod_1
Mod_1.test_function()
del sys.modules['Mod_1']
sys.modules['Mod_1'] = __import__('Mod_2')
import Mod_1
Mod_1.test_function()

To define a different import behavior or to totally subvert the import process you will need to write import hooks. See PEP 302.
For example,
import sys
class MyImporter(object):
def find_module(self, module_name, package_path):
# Return a loader
return self
def load_module(self, module_name):
# Return a module
return self
sys.meta_path.append(MyImporter())
import now_you_can_import_any_name
print now_you_can_import_any_name
It outputs:
<__main__.MyImporter object at 0x009F85F0>
So basically it returns a new module (which can be any object), in this case itself. You may use it to alter the import behavior by returning processe_xxx on import of xxx.
IMO: Python doesn't need a preprocessor. Whatever you are accomplishing can be accomplished in Python itself due to it very dynamic nature, for example, taking the case of the debug example, what is wrong with having at top of file
debug = 1
and later
if debug:
print "wow"
?

In Python 2 there is the imputil module that seems to provide the functionality you are looking for, but has been removed in python 3. It's not very well documented but contains an example section that shows how you can replace the standard import functions.
For Python 3 there is the importlib module (introduced in Python 3.1) that contains functions and classes to modify the import functionality in all kinds of ways. It should be suitable to hook your preprocessor into the import system.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to modify imported source code on-the-fly? - python

Not elegant, but works for me (may have to add a path): with open ('my_module.py') as aFile: exec (aFile.read () .replace (<something>, <something else>))

Related

How does import and reload actually work when dealing with packages

Injecting variables into an import namespace

How to make a copy of a python module at runtime?

Analyse python project imports

How do I override a Python import?

Categories

Resources