To illustrate what I am trying to do, let's say I have a module testmod that lives in ./testmod.py. The entire contents of this module is
x = test
I would like to be able to successfully import this module into Python, using any of the tools available in importlib or any other built in library.
Obviously doing a simple import testmod statement from the current directory results in an error: NameError: name 'test' is not defined.
I thought that maybe passing either globals or locals to __import__ correctly would modify the environment inside the script being run, but it does not:
>>> testmod = __import__('testmod', globals={'test': 'globals'}, locals={'test': 'locals'})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jfoxrabi/testmod.py", line 1, in <module>
x = test
NameError: name 'test' is not defined
I was setting the value of test differently so I could see which dict testmod.x came from if this worked.
Since neither of these seems to work, I am stuck. Is it even possible to accomplish what I am trying to do? I would guess that yes, since this is Python, not Sparta.
I am using Python 3.5 on Anaconda. I would very much prefer not to use external libraries.
Update: The Why
I am importing a module into my program as a configuration file. The reason that I am not using JSON or INI is that I would like to have the full scope of Python's interpreter available to compute the values in the config from expressions. I would like to have certain values that I compute before-hand in the program available to do those calculations.
While I am aware of the fact that this is about as bad as calling eval (I do that too in my program), I am not concerned with the security aspect for the time being. I am, however, quite willing to entertain better solutions should this indeed turn out to be a case of XY.
I came up with a solution based on this answer and the importlib docs. Basically, I have access to the module object before it is loaded by using the correct sequence of calls to importlib:
from importlib.util import spec_from_file_location, module_from_spec
from os.path import splitext, basename
def loadConfig(fileName):
test = 'This is a test'
name = splitext(basename(fileName))[0]
spec = spec_from_file_location(name, fileName)
config = module_from_spec(spec)
config.test = test
spec.loader.exec_module(config)
return config
testmod = loadConfig('./testmod.py')
This is a bit better than modifying builtins, which may have unintended consequences in other parts of the program, and may also restrict the names I can pass in to the module.
I decided to put all the configuration items into a single field accessible at load time, which I named config. This allows me to do the following in testmod:
if 'test' in config:
x = config['test']
The loader now looks like this:
from importlib.util import spec_from_file_location, module_from_spec
from os.path import splitext, basename
def loadConfig(fileName, **kwargs):
name = splitext(basename(fileName))[0]
spec = spec_from_file_location(name, fileName)
config = module_from_spec(spec)
config.config = kwargs
spec.loader.exec_module(config)
return config
testmod = loadConfig('./testmod.py', test='This is a test')
After finding myself using this a bunch of times, I finally ended up adding this functionality to the utility library I maintain, haggis. haggis.load.load_module loads a text file as a module with injection, while haggis.load.module_as_dict does a more advanced version of the same that loads it as a potentially nested configuration file into a dict.
You could screw with Python's builtins to inject your own fake built-in test variable:
import builtins # __builtin__, no s, in Python 2
builtins.test = 5 # or whatever other placeholder value
import testmod
del builtins.test # clean up after ourselves
Related
Suppose I have a module file like this:
# my_module.py
print("hello")
Then I have a simple script:
# my_script.py
import my_module
This will print "hello".
Let's say I want to "override" the print() function so it returns "world" instead. How could I do this programmatically (without manually modifying my_module.py)?
What I thought is that I need somehow to modify the source code of my_module before or while importing it. Obvisouly, I cannot do this after importing it so solution using unittest.mock are impossible.
I also thought I could read the file my_module.py, perform modification, then load it. But this is ugly, as it will not work if the module is located somewhere else.
The good solution, I think, is to make use of importlib.
I read the doc and found a very intersecting method: get_source(fullname). I thought I could just override it:
def get_source(fullname):
source = super().get_source(fullname)
source = source.replace("hello", "world")
return source
Unfortunately, I am a bit lost with all these abstract classes and I do not know how to perform this properly.
I tried vainly:
spec = importlib.util.find_spec("my_module")
spec.loader.get_source = mocked_get_source
module = importlib.util.module_from_spec(spec)
Any help would be welcome, please.
Here's a solution based on the content of this great talk. It allows any arbitrary modifications to be made to the source before importing the specified module. It should be reasonably correct as long as the slides did not omit anything important. This will only work on Python 3.5+.
import importlib
import sys
def modify_and_import(module_name, package, modification_func):
spec = importlib.util.find_spec(module_name, package)
source = spec.loader.get_source(module_name)
new_source = modification_func(source)
module = importlib.util.module_from_spec(spec)
codeobj = compile(new_source, module.__spec__.origin, 'exec')
exec(codeobj, module.__dict__)
sys.modules[module_name] = module
return module
So, using this you can do
my_module = modify_and_import("my_module", None, lambda src: src.replace("hello", "world"))
This doesn't answer the general question of dynamically modifying the source code of an imported module, but to "Override" or "monkey-patch" its use of the print() function can be done (since it's a built-in function in Python 3.x). Here's how:
#!/usr/bin/env python3
# my_script.py
import builtins
_print = builtins.print
def my_print(*args, **kwargs):
_print('In my_print: ', end='')
return _print(*args, **kwargs)
builtins.print = my_print
import my_module # -> In my_print: hello
I first needed to better understand the import operation. Fortunately, this is well explained in the importlib documentation and scratching through the source code helped too.
This import process is actually split in two parts. First, a finder is in charge of parsing the module name (including dot-separated packages) and instantiating an appropriate loader. Indeed, built-in are not imported as local modules for example. Then, the loader is called based on what the finder returned. This loader get the source from a file or from a cache, and executed the code if the module was not previously loaded.
This is very simple. This explains why I actually did not need to use abstract classes from importutil.abc: I do not want to provide my own import process. Instead, I could create a subclass inherited from one of the classes from importuil.machinery and override get_source() from SourceFileLoader for example. However, this is not the way to go because the loader is instantiated by the finder so I do not have the hand on which class is used. I cannot specify that my subclass should be used.
So, the best solution is to let the finder do its job, and then replace the get_source() method of whatever Loader has been instantiated.
Unfortunately, by looking trough the code source I saw that the basic Loaders are not using get_source() (which is only used by the the inspect module). So my whole idea could not work.
In the end, I guess get_source() should be called manually, then the returned source should be modified, and finally the code should be executed. This is what Martin Valgur detailed in his answer.
If compatibility with Python 2 is needed, I see no other way than reading the source file:
import imp
import sys
import types
module_name = "my_module"
file, pathname, description = imp.find_module(module_name)
with open(pathname) as f:
source = f.read()
source = source.replace('hello', 'world')
module = types.ModuleType(module_name)
exec(source, module.__dict__)
sys.modules[module_name] = module
If importing the module before the patching it is okay, then a possible solution would be
import inspect
import my_module
source = inspect.getsource(my_module)
new_source = source.replace('"hello"', '"world"')
exec(new_source, my_module.__dict__)
If you're after a more general solution, then you can also take a look at the approach I used in another answer a while ago.
My solution updates the source file, which works for the inner import situation. The inner import means that transformers.models.albert import modeling_albert from the source file. In such case, even if I use the solution from Martin Valgur, it won't work. So I update the source file. Hope it help the people who have the same trouble with me.
import inspect
from transformers.models.albert import modeling_albert
# Get source
source = inspect.getsource(modeling_albert)
source_before = "AlbertModel(config, add_pooling_layer=False)"
source_after = "AlbertModel(config, add_pooling_layer=True)"
new_source = source.replace(source_before, source_after)
# Update file
file_path = modeling_albert.__spec__.origin
with open(file_path, 'w') as f:
f.write(new_source)
Not elegant, but works for me (may have to add a path):
with open ('my_module.py') as aFile:
exec (aFile.read () .replace (<something>, <something else>))
I need to make a copy of a socket module to be able to use it and to have one more socket module monkey-patched and use it differently.
Is this possible?
I mean to really copy a module, namely to get the same result at runtime as if I've copied socketmodule.c, changed the initsocket() function to initmy_socket(), and installed it as my_socket extension.
You can always do tricks like importing a module then deleting it from sys.modules or trying to copy a module. However, Python already provides what you want in its Standard Library.
import imp # Standard module to do such things you want to.
# We can import any module including standard ones:
os1=imp.load_module('os1', *imp.find_module('os'))
# Here is another one:
os2=imp.load_module('os2', *imp.find_module('os'))
# This returns True:
id(os1)!=id(os2)
Python3.3+
imp.load_module is deprecated in python3.3+, and recommends the use of importlib
#!/usr/bin/env python3
import sys
import importlib.util
SPEC_OS = importlib.util.find_spec('os')
os1 = importlib.util.module_from_spec(SPEC_OS)
SPEC_OS.loader.exec_module(os1)
sys.modules['os1'] = os1
os2 = importlib.util.module_from_spec(SPEC_OS)
SPEC_OS.loader.exec_module(os2)
sys.modules['os2'] = os2
del SPEC_OS
assert os1 is not os2, \
"Module `os` instancing failed"
Here, we import the same module twice but as completely different module objects. If you check sys.modules, you can see two names you entered as first parameters to load_module calls. Take a look at the documentation for details.
UPDATE:
To make the main difference of this approach obvious, I want to make this clearer: When you import the same module this way, you will have both versions globally accessible for every other module you import in runtime, which is exactly what the questioner needs as I understood.
Below is another example to emphasize this point.
These two statements do exactly the same thing:
import my_socket_module as socket_imported
socket_imported = imp.load_module('my_socket_module',
*imp.find_module('my_socket_module')
)
On second line, we repeat 'my_socket_module' string twice and that is how import statement works; but these two strings are, in fact, used for two different reasons.
Second occurrence as we passed it to find_module is used as the file name that will be found on the system. The first occurrence of the string as we passed it to load_module method is used as system-wide identifier of the loaded module.
So, we can use different names for these which means we can make it work exactly like we copied the python source file for the module and loaded it.
socket = imp.load_module('socket_original', *imp.find_module('my_socket_module'))
socket_monkey = imp.load_module('socket_patched',*imp.find_module('my_socket_module'))
def alternative_implementation(blah, blah):
return 'Happiness'
socket_monkey.original_function = alternative_implementation
import my_sub_module
Then in my_sub_module, I can import 'socket_patched' which does not exist on system! Here we are in my_sub_module.py.
import socket_patched
socket_patched.original_function('foo', 'bar')
# This call brings us 'Happiness'
This is pretty disgusting, but this might suffice:
import sys
# if socket was already imported, get rid of it and save a copy
save = sys.modules.pop('socket', None)
# import socket again (it's not in sys.modules, so it will be reimported)
import socket as mysock
if save is None:
# if we didn't have a saved copy, remove my version of 'socket'
del sys.modules['socket']
else:
# if we did have a saved copy overwrite my socket with the original
sys.modules['socket'] = save
Here's some code that creates a new module with the functions and variables of the old:
def copymodule(old):
new = type(old)(old.__name__, old.__doc__)
new.__dict__.update(old.__dict__)
return new
Note that this does a fairly shallow copy of the module. The dictionary is newly created, so basic monkey patching will work, but any mutables in the original module will be shared between the two.
Edit: According to the comment, a deep copy is needed. I tried messing around with monkey-patching the copy module to support deep copies of modules, but that didn't work. Next I tried importing the module twice, but since modules are cached in sys.modules, that gave me the same module twice. Finally, the solution I hit upon was removing the modules from sys.modules after importing it the first time, then importing it again.
from imp import find_module, load_module
from sys import modules
def loadtwice(name, path=None):
"""Import two copies of a module.
The name and path arguments are as for `find_module` in the `imp` module.
Note that future imports of the module will return the same object as
the second of the two returned by this function.
"""
startingmods = modules.copy()
foundmod = find_module(name, path)
mod1 = load_module(name, *foundmod)
newmods = set(modules) - set(startingmods)
for m in newmods:
del modules[m]
mod2 = load_module(name, *foundmod)
return mod1, mod2
Physically copy the socket module to socket_monkey and go from there? I don't feel you need any "clever" work-around... but I might well be over simplifying!
How can a Python program easily import a Python module from a file with an arbitrary name?
The standard library import mechanism doesn't seem to help. An important constraint is that I don't want an accompanying bytecode file to appear; if I use imp.load_module on a source file named foo, a file named fooc appears, which is messy and confusing.
The Python import mechanism expects that it knows best what the filename will be: module files are found at specific filesystem locations, and in particular that the filenames have particular suffixes (foo.py for Python source code, etc.) and no others.
That clashes with another convention, at least on Unix: files which will be executed as a command should be named without reference to the implementation language. The command to do “foo”, for example, should be in a program file named foo with no suffix.
Unit-testing such a program file, though, requires importing that file. I need the objects from the program file as a Python module object, ready for manipulation in the unit test cases, exactly as import would give me.
What is the Pythonic way to import a module, especially from a file whose name doesn't end with .py, without a bytecode file appearing for that import?
import os
import imp
py_source_open_mode = "U"
py_source_description = (".py", py_source_open_mode, imp.PY_SOURCE)
module_filepath = "foo/bar/baz"
module_name = os.path.basename(module_filepath)
with open(module_filepath, py_source_open_mode) as module_file:
foo_module = imp.load_module(
module_name, module_file, module_filepath, py_source_description)
My best implementation so far is (using features only in Python 2.6 or later):
import os
import sys
import imp
import contextlib
#contextlib.contextmanager
def preserve_value(namespace, name):
""" A context manager to preserve, then restore, the specified binding.
:param namespace: The namespace object (e.g. a class or dict)
containing the name binding.
:param name: The name of the binding to be preserved.
:yield: None.
When the context manager is entered, the current value bound to
`name` in `namespace` is saved. When the context manager is
exited, the binding is re-established to the saved value.
"""
saved_value = getattr(namespace, name)
yield
setattr(namespace, name, saved_value)
def make_module_from_file(module_name, module_filepath):
""" Make a new module object from the source code in specified file.
:param module_name: The name of the resulting module object.
:param module_filepath: The filesystem path to open for
reading the module's Python source.
:return: The module object.
The Python import mechanism is not used. No cached bytecode
file is created, and no entry is placed in `sys.modules`.
"""
py_source_open_mode = 'U'
py_source_description = (".py", py_source_open_mode, imp.PY_SOURCE)
with open(module_filepath, py_source_open_mode) as module_file:
with preserve_value(sys, 'dont_write_bytecode'):
sys.dont_write_bytecode = True
module = imp.load_module(
module_name, module_file, module_filepath,
py_source_description)
return module
def import_program_as_module(program_filepath):
""" Import module from program file `program_filepath`.
:param program_filepath: The full filesystem path to the program.
This name will be used for both the source file to read, and
the resulting module name.
:return: The module object.
A program file has an arbitrary name; it is not suitable to
create a corresponding bytecode file alongside. So the creation
of bytecode is suppressed during the import.
The module object will also be added to `sys.modules`.
"""
module_name = os.path.basename(program_filepath)
module = make_module_from_file(module_name, program_filename)
sys.modules[module_name] = module
return module
This is a little too broad: it disables bytecode file generation during the entire import process for the module, which means other modules imported during that process will also not have bytecode files generated.
I'm still looking for a way to disable bytecode file generation for only the specified module file.
I ran into this problem, and after failing to find a solution I really liked, solved it by making a symlink with a .py extension. So, the executable script might be called command, the symbolic link that points to it is command.py, and the unit tests are in command_test.py. The symlink is happily accepted when importing the file as a module and the executable script follows standard naming conventions.
This has the added advantage of making it easy to keep the unit tests in a different directory than the executable.
I am having a problem that may be quite a basic thing, but as a Python learner I've been struggling with it for hours. The documentation has not provided me with an answer so far.
The problem is that an import statement included in a module does not seem to be executed when I import this module from a python script. What I have is as follows:
I have a file project.py (i.e. python library) that looks like this:
import datetime
class Project:
""" This class is a container for project data """
title = ""
manager = ""
date = datetime.datetime.min
def __init__( self, title="", manager="", date=datetime.datetime.min ):
""" Init function with some defaults """
self.title = title
self.manager = manager
self.date = date
This library is later used in a script (file.py) that imports project, it starts like this:
import project
print datetime.datetime.min
The problem then arises when I try to execute this script with Python file.py. Python then complains with the folliwing NameError:
Traceback (most recent call last):
File "file.py", line 3, in <module>
print datetime.datetime.min
NameError: name 'datetime' is not defined
This actually happens also if I try to make the same statements (import and print) directly from the Python shell.
Shouldn't the datetime module be automatically imported in the precise moment that I call import project?
Thanks a lot in advance.
The datetime module is only imported into the project namespace. So you could access it as project.datetime.datetime.min, but really you should import it into your script directly.
Every symbol (name) that you create in your project.py file (like your Project class) ends up in the project namespace, which includes things you import from other modules. This isn't as inefficient as it might seem however - the actual datetime module is still only imported once, no matter how many times you do it. Every time you import it subsequent to the first one it's just importing the names into the current namespace, but not actually doing all the heavy lifting of reading and importing the module.
Try thinking of the import statement as roughly equivalent to:
project = __import__('project')
Effectively an import statement is simply an assignment to a variable. There may be some side effects as the module is loaded, but from inside your script all you see is a simple assignment to a name.
You can pull in all the names from a module using from project import *, but don't do that because it makes your code much more brittle and harder to maintain. Instead either just import the module or exactly the names you want.
So for your code something like:
import datetime
from project import Project
is the sort of thing you should be doing.
I have a directory, let's call it Storage full of packages with unwieldy names like mypackage-xxyyzzww, and of course Storage is on my PYTHONPATH. Since packages have long unmemorable names, all of the packages are symlinked to friendlier names, such as mypackage.
Now, I don't want to rely on file system symbolic links to do this, instead I tried mucking around with sys.path and sys.modules. Currently I'm doing something like this:
import imp
imp.load_package('mypackage', 'Storage/mypackage-xxyyzzww')
How bad is it to do things this way, and is there a chance this will break in the future? One funny thing is that there's even no mention of imp.load_package function in the docs.
EDIT: besides not relying on symbolic links, I can't use PYTHONPATH variable anymore.
Instead of using imp, you can assign different names to imported modules.
import mypackage_xxyyzzww as mypackage
If you then create a __init__.py file inside of Storage, you can add several of the above lines to make importing easier.
Storage/__init__.py:
import mypackage_xxyyzzww as mypackage
import otherpackage_xxyyzzww as otherpackage
Interpreter:
>>> from Storage import mypackage, otherpackage
importlib may be more appropriate, as it uses/implements the PEP302 mechanism.
Follow the DictImporter example, but override find_module to find the real filename and store it in the dict, then override load_module to get the code from the found file.
You shouldn't need to use sys.path once you've created your Storage module
#from importlib import abc
import imp
import os
import sys
import logging
logging.basicConfig(level=logging.DEBUG)
dprint = logging.debug
class MyImporter(object):
def __init__(self,path):
self.path=path
self.names = {}
def find_module(self,fullname,path=None):
dprint("find_module({fullname},{path})".format(**locals()))
ml = imp.find_module(fullname,path)
dprint(repr(ml))
raise ImportError
def load_module(self,fullname):
dprint("load_module({fullname})".format(**locals()))
return imp.load_module(fullname)
raise ImportError
def load_storage( path, modname=None ):
if modname is None:
modname = os.path.basename(path)
mod = imp.new_module(modname)
sys.modules[modname] = mod
assert mod.__name__== modname
mod.__path__=[path]
#sys.meta_path.append(MyImporter(path))
mod.__loader__= MyImporter(path)
return mod
if __name__=="__main__":
load_storage("arbitrary-path-to-code/Storage")
from Storage import plain
from Storage import mypkg
Then when you import Storage.mypackage, python will immediately use your importer without bothering to look on sys.path
That doesn't work. The code above does work to import ordinary modules under Storage without requiring Storage to be on sys.path, but both 3.1 and 2.6 seem to ignore the loader attribute mentioned in PEP302.
If I uncomment the sys.meta_path line, 3.1 dies with StackOverflow, and 2.6 dies with ImportError. hmmm... I'm out of time now, but may look at it later.
Packages are just entries in the namespace. You should not name your path components with anything that is not a legal python variable name.