How to replace usages of deprecated imp.load_dynamic?

The imp module is deprecated (since version 3.4), but some parts of the infrastructure (e.g. pyximport) still use imp.load_dynamic, which leads to deprecation warnings with newer Python versions.
Internally, imp.load_dynamic uses importlib machinery:
from importlib._bootstrap import _load

def load_dynamic(name, path, file=None):
    """**DEPRECATED**

    Load an extension module.
    """
    import importlib.machinery
    loader = importlib.machinery.ExtensionFileLoader(name, path)

    # Issue #24748: Skip the sys.modules check in _load_module_shim;
    # always load new extension
    spec = importlib.machinery.ModuleSpec(
        name=name, loader=loader, origin=path)
    return _load(spec)
But it feels silly to duplicate this code (which at least once had to be improved) for all kinds of projects.
importlib's documentation proposes the following:
import importlib.util
import sys

def alternative_load_dynamic(name, path, file=None):
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return sys.modules[name]
While it has the advantage over the first implementation of not using private API (i.e. _load), and gets the behavior at the core of issue #24748 right, it is not a drop-in replacement for imp.load_dynamic:
this alternative does more: it can also load pure Python modules (*.py or *.pyc);
this alternative does less: it can only load dynamic modules whose filenames end in *.so (or *.pyd on Windows), but not, for example, foo.my_ending.
What are other options to replace imp.load_dynamic?
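For reference, here is a sketch of one such option, staying close to the original implementation but using only public importlib APIs; module_from_spec plus exec_module stand in for the private _load. This is an illustration, not a tested replacement:

import importlib.machinery
import importlib.util
import sys

def load_dynamic(name, path):
    # Force an ExtensionFileLoader regardless of the file's suffix,
    # mirroring imp.load_dynamic but avoiding the private _load().
    loader = importlib.machinery.ExtensionFileLoader(name, path)
    spec = importlib.machinery.ModuleSpec(name=name, loader=loader, origin=path)
    module = importlib.util.module_from_spec(spec)
    # Per issue #24748, a *new* module is loaded rather than reusing
    # whatever may already be in sys.modules.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module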

How about importlib.import_module?
The documentation also notes:
If you are dynamically importing a module that was created since the interpreter began execution (e.g., created a Python source file), you may need to call invalidate_caches() in order for the new module to be noticed by the import system.
https://docs.python.org/3/library/importlib.html#importlib.import_module
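A minimal sketch of that approach (the module name is illustrative; this only works if the extension sits on sys.path under a standard suffix such as .so or .pyd):

import importlib

importlib.invalidate_caches()            # notice files created after interpreter startup
mod = importlib.import_module('my_ext')  # hypothetical my_ext.so / my_ext.pyd on sys.path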

Related

ModuleSpec not found during reload for programmatically imported file in different directory

While implementing a reload function in a CLI, I stumbled across this odd error which I cannot seem to solve. To recreate this, first, create a dummy file in a different directory to your CWD. For this example, I'll call it test.py.
Using the recipe here, import it programmatically, but ensure that the CWD is separate and the path to test.py cannot be reached using the CWD as a 'stem'.
spec = importlib.util.spec_from_file_location('test', <path>)
module = importlib.util.module_from_spec(spec)
sys.modules['test'] = module
spec.loader.exec_module(module)
The error occurs when trying to reload the module with importlib.reload:
>>> importlib.reload(module)
  File "C:\Program Files\Python38\lib\importlib\__init__.py", line 168, in reload
    raise ModuleNotFoundError(f"spec not found for the module {name!r}", name=name)
ModuleNotFoundError: spec not found for the module 'test'
I had a look inside importlib.__init__.reload, which seems to call _frozen_importlib._bootstrap._find_spec. The error is raised because the module spec was not found, even though the attribute __spec__ is present before the reload. The spec is set to None due to the nature of the code:
spec = module.__spec__ = _bootstrap._find_spec(name, pkgpath, target)
if spec is None:
    raise ModuleNotFoundError(f"spec not found for the module {name!r}", name=name)
Is there a way for me to reload the programmatically imported module? I do need to keep the file imported using a file path, since the program it's a part of allows the user to write 'plugins' which can be placed anywhere in the filesystem. This is a simplified version of what's mainly going on in the import mechanics of my code.
With a suitably recent version of python (>=3.4), you can add a MetaPathFinder to sys.meta_path to tell python where to find the module spec. If you do this, you avoid the explicit module load recipe and importlib.reload should work correctly. Example:
import importlib.abc
import importlib.util
import sys

class ModuleFinder(importlib.abc.MetaPathFinder):
    def __init__(self, path_map: dict):
        self.path_map = path_map

    def find_spec(self, fullname, path, target=None):
        if fullname not in self.path_map:
            return None
        return importlib.util.spec_from_file_location(fullname, self.path_map[fullname])

    def find_module(self, fullname, path):
        return None  # No need to implement, backward compatibility only

sys.meta_path.append(ModuleFinder(my_module_paths))
Where my_module_paths is some dict mapping module names to locations.
For more details: https://docs.python.org/3/library/importlib.html#setting-up-an-importer
(actually this is just below where you found the recipe)
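A usage sketch under the question's setup (the name and path are illustrative): once the finder is registered, both the initial import and the reload resolve through it:

import importlib
import sys

my_module_paths = {'plugin': '/anywhere/on/disk/plugin.py'}  # illustrative path
sys.meta_path.append(ModuleFinder(my_module_paths))

plugin = importlib.import_module('plugin')  # resolved via our finder
importlib.reload(plugin)                    # _find_spec now succeeds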
I would like to give some clarity to the situation.
What happens is that importlib.reload(module) calls _bootstrap._find_spec(name, pkgpath, target). The bootstrap part of importlib is a frozen module, so unless we run Python from source we can't step into it with a debugger. We are able, however, to step through reload() up to the failing call. See c:/my/python/lib/importlib/__init__.py(166)reload() and c:/my/python/Lib/importlib/_bootstrap.py(890)_find_spec().
Comparing the inputs passed from reload() to _find_spec(), we see that the name passed in is "test", the path is the pkgpath, which is None, and the target is the module object for test. The name is found in sys.modules, so is_reload is True. Then, for each finder on the meta_path, its find_spec() function is called. Specifically, it's called with path, which corresponds to pkgpath and is equal to None. Hence, the spec is always None.
The pkgpath is ultimately determined by the package's parent. The module name is simply test and there is no parent. Since the test module lives outside the current working directory, this will always fail.
When we add the ModuleFinder to the meta_path, its find_spec gets called within the _bootstrap._find_spec(name, pkgpath, target) line. It then calls importlib.util.spec_from_file_location, which is how the module was loaded the first time.

In Python how can one tell if a module comes from a C extension?

What is the correct or most robust way to tell from Python if an imported module comes from a C extension as opposed to a pure Python module? This is useful, for example, if a Python package has a module with both a pure Python implementation and a C implementation, and you want to be able to tell at runtime which one is being used.
One idea is to examine the file extension of module.__file__, but I'm not sure all the file extensions one should check for and if this approach is necessarily the most reliable.
tl;dr
See the "In Search of Perfection" subsection below for the well-tested answer.
As a pragmatic counterpoint to abarnert's helpful analysis of the subtlety involved in portably identifying C extensions, Stack Overflow Productions™ presents... an actual answer.
The capacity to reliably differentiate C extensions from non-C extensions is incredibly useful, without which the Python community would be impoverished. Real-world use cases include:
Application freezing, converting one cross-platform Python codebase into multiple platform-specific executables. PyInstaller is the standard example here. Identifying C extensions is critical to robust freezing. If a module imported by the codebase being frozen is a C extension, all external shared libraries transitively linked to by that C extension must be frozen with that codebase as well. Shameful confession: I contribute to PyInstaller.
Application optimization, either statically to native machine code (e.g., Cython) or dynamically in a just-in-time manner (e.g., Numba). For self-evident reasons, Python optimizers necessarily differentiate already compiled C extensions from uncompiled pure-Python modules.
Dependency analysis, inspecting external shared libraries on behalf of end users. In our case, we analyze a mandatory dependency (Numpy) to detect local installations of this dependency linking against non-parallelized shared libraries (e.g., the reference BLAS implementation) and inform end users when this is the case. Why? Because we don't want the blame when our application underperforms due to improper installation of dependencies over which we have no control. Bad performance is your fault, hapless user!
Probably other essential low-level stuff. Profiling, maybe?
We can all agree that freezing, optimization, and minimizing end user complaints are useful. Ergo, identifying C extensions is useful.
The Disagreement Deepens
I also disagree with abarnert's penultimate conclusion that:
The best heuristics anyone has come up with for this are the ones implemented in the inspect module, so the best thing to do is to use that.
No. The best heuristics anyone has come up with for this are those given below. All stdlib modules (including but not limited to inspect) are useless for this purpose. Specifically:
The inspect.getsource() and inspect.getsourcefile() functions ambiguously return None for both C extensions (which understandably have no pure-Python source) and other types of modules that also have no pure-Python source (e.g., bytecode-only modules). Useless.
importlib machinery only applies to modules loadable by PEP 302-compliant loaders and hence visible to the default importlib import algorithm. Useful, but hardly generally applicable. The assumption of PEP 302 compliance breaks down when the real world hits your package in the face repeatedly. For example, did you know that the __import__() built-in is actually overriddable? This is how we used to customize Python's import mechanism – back when the Earth was still flat.
abarnert's ultimate conclusion is also contentious:
…there is no perfect answer.
There is a perfect answer. Much like the oft-doubted Triforce of Hyrulean legend, a perfect answer exists for every imperfect question.
Let's find it.
In Search of Perfection
The pure-Python function that follows returns True only if the passed previously imported module object is a C extension. For simplicity, Python 3.x is assumed.
import inspect, os
from importlib.machinery import ExtensionFileLoader, EXTENSION_SUFFIXES
from types import ModuleType

def is_c_extension(module: ModuleType) -> bool:
    '''
    `True` only if the passed module is a C extension implemented as a
    dynamically linked shared library specific to the current platform.

    Parameters
    ----------
    module : ModuleType
        Previously imported module object to be tested.

    Returns
    ----------
    bool
        `True` only if this module is a C extension.
    '''
    assert isinstance(module, ModuleType), '"{}" not a module.'.format(module)

    # If this module was loaded by a PEP 302-compliant CPython-specific loader
    # loading only C extensions, this module is a C extension.
    if isinstance(getattr(module, '__loader__', None), ExtensionFileLoader):
        return True

    # Else, fallback to filetype matching heuristics.
    #
    # Absolute path of the file defining this module.
    module_filename = inspect.getfile(module)

    # "."-prefixed filetype of this path if any or the empty string otherwise.
    module_filetype = os.path.splitext(module_filename)[1]

    # This module is only a C extension if this path's filetype is that of a
    # C extension specific to the current platform.
    return module_filetype in EXTENSION_SUFFIXES
If it looks long, that's because docstrings, comments, and assertions are good. It's actually only six lines. Eat your elderly heart out, Guido.
Proof in the Pudding
Let's unit test this function with four portably importable modules:
The stdlib pure-Python os.__init__ module. Hopefully not a C extension.
The stdlib pure-Python importlib.machinery submodule. Hopefully not a C extension.
The stdlib _elementtree C extension.
The third-party numpy.core.multiarray C extension.
To wit:
>>> import os
>>> import importlib.machinery as im
>>> import _elementtree as et
>>> import numpy.core.multiarray as ma
>>> for module in (os, im, et, ma):
... print('Is "{}" a C extension? {}'.format(
... module.__name__, is_c_extension(module)))
Is "os" a C extension? False
Is "importlib.machinery" a C extension? False
Is "_elementtree" a C extension? True
Is "numpy.core.multiarray" a C extension? True
All's well that ends.
How to do this?
The details of our code are quite inconsequential. Very well, where do we begin?
If the passed module was loaded by a PEP 302-compliant loader (the common case), the PEP 302 specification requires the module object assigned on importation to define a special __loader__ attribute whose value is the loader object that loaded this module. Hence:
If this value for this module is an instance of the CPython-specific importlib.machinery.ExtensionFileLoader class, this module is a C extension.
Else, either (A) the active Python interpreter is not the official CPython implementation (e.g., PyPy) or (B) the active Python interpreter is CPython but this module was not loaded by a PEP 302-compliant loader, typically due to the default __import__() machinery being overridden (e.g., by a low-level bootloader running this Python application as a platform-specific frozen binary). In either case, fall back to testing whether this module's filetype is that of a C extension specific to the current platform.
Eight-line functions with twenty-page explanations. That's just how we rolls.
First, I don't think this is at all useful. It's very common for modules to be pure-Python wrappers around a C extension module—or, in some cases, pure-Python wrappers around a C extension module if it's available, or a pure Python implementation if not.
For some popular third-party examples: numpy is pure Python, even though everything important is implemented in C; bintrees is pure Python, even though its classes may all be implemented either in C or in Python depending on how you build it; etc.
And this is true in most of the stdlib from 3.2 on. For example, if you just import pickle, the implementation classes will be built in C (what you used to get from cPickle in 2.7) in CPython, while they'll be pure-Python versions in PyPy, but either way pickle itself is pure Python.
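You can see this split from the REPL; on CPython the wrapper module is pure Python while its classes come from the _pickle accelerator (PyPy reports differently):

import pickle

print(pickle.__file__)            # .../pickle.py -- the wrapper is pure Python
print(pickle.Pickler.__module__)  # '_pickle' on CPython (the C accelerator)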
But if you do want to do this, you actually need to distinguish three things:
Built-in modules, like sys.
C extension modules, like 2.x's cPickle.
Pure Python modules, like 2.x's pickle.
And that's assuming you only care about CPython; if your code runs in, say, Jython, or IronPython, the implementation could be JVM or .NET rather than native code.
You can't distinguish perfectly based on __file__, for a number of reasons:
Built-in modules have no __file__ at all. (This is documented in a few places—e.g., the Types and members table in the inspect docs.) Note that if you're using something like py2app or cx_freeze, what counts as "built-in" may be different from a standalone installation.
A pure-Python module may have a .pyc/.pyo file without having a .py file in a distributed app.
A module in a package installed as a single-file egg (which is common with easy_install, less so with pip) will have either a blank or useless __file__.
If you build a binary distribution, there's a good chance your whole library will be packed in a zip file, causing the same problem as single-file eggs.
In 3.1+, the import process has been massively cleaned up, mostly rewritten in Python, and mostly exposed to the Python layer.
So, you can use the importlib module to see the chain of loaders used to load a module, and ultimately you'll get to BuiltinImporter (builtins), ExtensionFileLoader (.so/.pyd/etc.), SourceFileLoader (.py), or SourcelessFileLoader (.pyc/.pyo).
You can also see the suffixes assigned to each of the four, on the current target platform, as constants in importlib.machinery. So, you could check any(pathname.endswith(suffix) for suffix in importlib.machinery.EXTENSION_SUFFIXES), but that won't actually help in, e.g., the egg/zip case unless you've already traveled up the chain anyway.
The best heuristics anyone has come up with for this are the ones implemented in the inspect module, so the best thing to do is to use that.
The best choice will be one or more of getsource, getsourcefile, and getfile; which is best depends on which heuristics you want.
A built-in module will raise a TypeError for any of them.
An extension module ought to return an empty string for getsourcefile. This seems to work in all the 2.5-3.4 versions I have, but I don't have 2.4 around. For getsource, at least in some versions, it returns the actual bytes of the .so file, even though it should be returning an empty string or raising an IOError. (In 3.x, you will almost certainly get a UnicodeError or SyntaxError, but you probably don't want to rely on that…)
Pure Python modules may return an empty string for getsourcefile if in an egg/zip/etc. They should always return a non-empty string for getsource if source is available, even inside an egg/zip/etc., but if they're sourceless bytecode (.pyc/etc.) they will return an empty string or raise an IOError.
The best bet is to experiment with the version you care about on the platform(s) you care about in the distribution/setup(s) you care about.
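To illustrate the built-in/extension/source/sourceless distinction described above, here is a rough loader-based sketch for CPython 3.x; zip imports, frozen apps, and custom importers can still defeat it:

import importlib.machinery as machinery

def classify(module):
    loader = getattr(module, '__loader__', None)
    if loader is machinery.BuiltinImporter:
        return 'built-in'                     # e.g. sys
    if isinstance(loader, machinery.ExtensionFileLoader):
        return 'C extension'                  # .so / .pyd
    if isinstance(loader, machinery.SourceFileLoader):
        return 'pure Python (source)'         # .py
    if isinstance(loader, machinery.SourcelessFileLoader):
        return 'pure Python (bytecode only)'  # .pyc without .py
    return 'unknown'  # egg/zip imports, frozen modules, custom importers...

import sys, os, _elementtree
print(classify(sys), classify(os), classify(_elementtree))
# e.g. on a standard CPython install: built-in, pure Python (source), C extension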
@Cecil Curry's function is excellent. Two minor comments: firstly, the _elementtree example raises a TypeError with my copy of Python 3.5.6. Secondly, as @crld points out, it's also helpful to know if a module contains C extensions, but a more portable version might help. More generic versions (with Python 3.6+ f-string syntax) may therefore be:
from importlib.machinery import ExtensionFileLoader, EXTENSION_SUFFIXES
import inspect
import logging
import os
import os.path
import pkgutil
from types import ModuleType
from typing import List

log = logging.getLogger(__name__)


def is_builtin_module(module: ModuleType) -> bool:
    """
    Is this module a built-in module, like ``os``?
    Method is as per :func:`inspect.getfile`.
    """
    return not hasattr(module, "__file__")


def is_module_a_package(module: ModuleType) -> bool:
    assert inspect.ismodule(module)
    return os.path.basename(inspect.getfile(module)) == "__init__.py"


def is_c_extension(module: ModuleType) -> bool:
    """
    Modified from
    https://stackoverflow.com/questions/20339053/in-python-how-can-one-tell-if-a-module-comes-from-a-c-extension.

    ``True`` only if the passed module is a C extension implemented as a
    dynamically linked shared library specific to the current platform.

    Args:
        module: Previously imported module object to be tested.

    Returns:
        bool: ``True`` only if this module is a C extension.

    Examples:

    .. code-block:: python

        from cardinal_pythonlib.modules import is_c_extension

        import os
        import _elementtree as et
        import numpy
        import numpy.core.multiarray as numpy_multiarray

        is_c_extension(os)  # False
        is_c_extension(numpy)  # False
        is_c_extension(et)  # False on my system (Python 3.5.6). True in the original example.
        is_c_extension(numpy_multiarray)  # True
    """  # noqa
    assert inspect.ismodule(module), f'"{module}" not a module.'

    # If this module was loaded by a PEP 302-compliant CPython-specific loader
    # loading only C extensions, this module is a C extension.
    if isinstance(getattr(module, '__loader__', None), ExtensionFileLoader):
        return True

    # If it's built-in, it's not a C extension.
    if is_builtin_module(module):
        return False

    # Else, fallback to filetype matching heuristics.
    #
    # Absolute path of the file defining this module.
    module_filename = inspect.getfile(module)

    # "."-prefixed filetype of this path if any or the empty string otherwise.
    module_filetype = os.path.splitext(module_filename)[1]

    # This module is only a C extension if this path's filetype is that of a
    # C extension specific to the current platform.
    return module_filetype in EXTENSION_SUFFIXES


def contains_c_extension(module: ModuleType,
                         import_all_submodules: bool = True,
                         include_external_imports: bool = False,
                         seen: List[ModuleType] = None,
                         verbose: bool = False) -> bool:
    """
    Extends :func:`is_c_extension` by asking: is this module, or any of its
    submodules, a C extension?

    Args:
        module: Previously imported module object to be tested.
        import_all_submodules: explicitly import all submodules of this module?
        include_external_imports: check modules in other packages that this
            module imports?
        seen: used internally for recursion (to deal with recursive modules);
            should be ``None`` when called by users
        verbose: show working via log?

    Returns:
        bool: ``True`` only if this module or one of its submodules is a C
        extension.

    Examples:

    .. code-block:: python

        import logging

        import _elementtree as et
        import os

        import arrow
        import alembic
        import django
        import numpy
        import numpy.core.multiarray as numpy_multiarray

        log = logging.getLogger(__name__)
        logging.basicConfig(level=logging.DEBUG)  # be verbose

        contains_c_extension(os)  # False
        contains_c_extension(et)  # False

        contains_c_extension(numpy)  # True -- different from is_c_extension()
        contains_c_extension(numpy_multiarray)  # True

        contains_c_extension(arrow)  # False

        contains_c_extension(alembic)  # False
        contains_c_extension(alembic, include_external_imports=True)  # True
        # ... this example shows that Alembic imports hashlib, which can import
        # _hashlib, which is a C extension; however, that doesn't stop us (for
        # example) installing Alembic on a machine with no C compiler

        contains_c_extension(django)
    """  # noqa
    assert inspect.ismodule(module), f'"{module}" not a module.'

    if seen is None:  # only true for the top-level call
        seen = []  # type: List[ModuleType]
    if module in seen:  # modules can "contain" themselves
        # already inspected; avoid infinite loops
        return False
    seen.append(module)

    # Check the thing we were asked about
    is_c_ext = is_c_extension(module)
    if verbose:
        log.info(f"Is module {module!r} a C extension? {is_c_ext}")
    if is_c_ext:
        return True
    if is_builtin_module(module):
        # built-in, therefore we stop searching it
        return False

    # Now check any children, in a couple of ways

    top_level_module = seen[0]
    top_path = os.path.dirname(top_level_module.__file__)

    # Recurse using dir(). This picks up modules that are automatically
    # imported by our top-level module. But it won't pick up all submodules;
    # try e.g. for django.
    for candidate_name in dir(module):
        candidate = getattr(module, candidate_name)
        # noinspection PyBroadException
        try:
            if not inspect.ismodule(candidate):
                # not a module
                continue
        except Exception:
            # e.g. a Django module that won't import until we configure its
            # settings
            log.error(f"Failed to test ismodule() status of {candidate!r}")
            continue
        if is_builtin_module(candidate):
            # built-in, therefore we stop searching it
            continue

        candidate_fname = getattr(candidate, "__file__")
        if not include_external_imports:
            if os.path.commonpath([top_path, candidate_fname]) != top_path:
                if verbose:
                    log.debug(f"Skipping, not within the top-level module's "
                              f"directory: {candidate!r}")
                continue
        # Recurse:
        if contains_c_extension(
                module=candidate,
                import_all_submodules=False,  # only done at the top level, below  # noqa
                include_external_imports=include_external_imports,
                seen=seen):
            return True

    if import_all_submodules:
        if not is_module_a_package(module):
            if verbose:
                log.debug(f"Top-level module is not a package: {module!r}")
            return False

        # Otherwise, for things like Django, we need to recurse in a different
        # way to scan everything.
        # See https://stackoverflow.com/questions/3365740/how-to-import-all-submodules.  # noqa
        log.debug(f"Walking path: {top_path!r}")
        try:
            for loader, module_name, is_pkg in pkgutil.walk_packages([top_path]):  # noqa
                if not is_pkg:
                    log.debug(f"Skipping, not a package: {module_name!r}")
                    continue
                log.debug(f"Manually importing: {module_name!r}")
                # noinspection PyBroadException
                try:
                    candidate = loader.find_module(module_name)\
                        .load_module(module_name)  # noqa
                except Exception:
                    # e.g. Alembic "autogenerate" gives: "ValueError: attempted
                    # relative import beyond top-level package"; or Django
                    # "django.core.exceptions.ImproperlyConfigured"
                    log.error(f"Package failed to import: {module_name!r}")
                    continue
                if contains_c_extension(
                        module=candidate,
                        import_all_submodules=False,  # only done at the top level  # noqa
                        include_external_imports=include_external_imports,
                        seen=seen):
                    return True
        except Exception:
            log.error("Unable to walk packages further; no C extensions "
                      "detected so far!")
            raise

    return False


# noinspection PyUnresolvedReferences,PyTypeChecker
def test() -> None:
    import _elementtree as et

    import arrow
    import alembic
    import django
    import django.conf
    import numpy
    import numpy.core.multiarray as numpy_multiarray

    log.info(f"contains_c_extension(os): "
             f"{contains_c_extension(os)}")  # False
    log.info(f"contains_c_extension(et): "
             f"{contains_c_extension(et)}")  # False

    log.info(f"is_c_extension(numpy): "
             f"{is_c_extension(numpy)}")  # False
    log.info(f"contains_c_extension(numpy): "
             f"{contains_c_extension(numpy)}")  # True
    log.info(f"contains_c_extension(numpy_multiarray): "
             f"{contains_c_extension(numpy_multiarray)}")  # True  # noqa

    log.info(f"contains_c_extension(arrow): "
             f"{contains_c_extension(arrow)}")  # False

    log.info(f"contains_c_extension(alembic): "
             f"{contains_c_extension(alembic)}")  # False
    log.info(f"contains_c_extension(alembic, include_external_imports=True): "
             f"{contains_c_extension(alembic, include_external_imports=True)}")  # True  # noqa
    # ... this example shows that Alembic imports hashlib, which can import
    # _hashlib, which is a C extension; however, that doesn't stop us (for
    # example) installing Alembic on a machine with no C compiler

    django.conf.settings.configure()
    log.info(f"contains_c_extension(django): "
             f"{contains_c_extension(django)}")  # False


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)  # be verbose
    test()
While Cecil Curry's answer works (and was very informative, as was abarnert's, I might add), it will return False for the "top level" of a module even if it includes submodules that use a C extension (e.g. numpy vs. numpy.core.multiarray).
While probably not as robust as it could be, the following works for my current use cases:
import inspect
import os

def is_c(module):
    # If module is part of the main python library (e.g. os) or is a plain
    # module rather than a package, it won't have a __path__, and we fall
    # through to the AttributeError branch below.
    try:
        is_c = False  # initialize, in case no .so file is found
        for path, subdirs, files in os.walk(module.__path__[0]):
            for f in files:
                ftype = f.split('.')[-1]
                if ftype == 'so':
                    is_c = True
                    break
        return is_c
    except AttributeError:
        path = inspect.getfile(module)
        suffix = path.split('.')[-1]
        return suffix == 'so'

is_c(os), is_c(im), is_c(et), is_c(ma), is_c(numpy)
# (False, False, True, True, True)
If you, like me, saw @Cecil Curry's great answer and thought, how could I do this for an entire requirements file in a super lazy way without @Rudolf Cardinal's complex child library traversal, look no further!
First, dump all your installed requirements (assuming you did this in a virtual env and don't have other stuff in here) into a file with pip freeze > requirements.txt.
Then run the following script to check each of those requirements.
Note: this is super lazy and WILL NOT work for many libraries whose import names don't match their pip names.
import inspect, os
import importlib
from importlib.machinery import ExtensionFileLoader, EXTENSION_SUFFIXES
from types import ModuleType

# function from Cecil Curry's answer:
def is_c_extension(module: ModuleType) -> bool:
    '''
    `True` only if the passed module is a C extension implemented as a
    dynamically linked shared library specific to the current platform.

    Parameters
    ----------
    module : ModuleType
        Previously imported module object to be tested.

    Returns
    ----------
    bool
        `True` only if this module is a C extension.
    '''
    assert isinstance(module, ModuleType), '"{}" not a module.'.format(module)

    # If this module was loaded by a PEP 302-compliant CPython-specific loader
    # loading only C extensions, this module is a C extension.
    if isinstance(getattr(module, '__loader__', None), ExtensionFileLoader):
        return True

    # Else, fallback to filetype matching heuristics.
    #
    # Absolute path of the file defining this module.
    module_filename = inspect.getfile(module)

    # "."-prefixed filetype of this path if any or the empty string otherwise.
    module_filetype = os.path.splitext(module_filename)[1]

    # This module is only a C extension if this path's filetype is that of a
    # C extension specific to the current platform.
    return module_filetype in EXTENSION_SUFFIXES


with open('requirements.txt') as f:
    lines = f.readlines()

for line in lines:
    # super lazy pip name to library name conversion
    # there is probably a better way to do this.
    lib = line.split("=")[0].replace("python-", "").replace("-", "_").lower()
    try:
        mod = importlib.import_module(lib)
        print(f"is {lib} a c extension? : {is_c_extension(mod)}")
    except Exception:
        print(f"could not check {lib}, perhaps the name for imports is different?")

Import arbitrary-named file as a Python module, without generating bytecode file

How can a Python program easily import a Python module from a file with an arbitrary name?
The standard library import mechanism doesn't seem to help. An important constraint is that I don't want an accompanying bytecode file to appear; if I use imp.load_module on a source file named foo, a file named fooc appears, which is messy and confusing.
The Python import mechanism expects that it knows best what the filename will be: module files are found at specific filesystem locations, and in particular the filenames must have particular suffixes (foo.py for Python source code, etc.) and no others.
That clashes with another convention, at least on Unix: files which will be executed as a command should be named without reference to the implementation language. The command to do “foo”, for example, should be in a program file named foo with no suffix.
Unit-testing such a program file, though, requires importing that file. I need the objects from the program file as a Python module object, ready for manipulation in the unit test cases, exactly as import would give me.
What is the Pythonic way to import a module, especially from a file whose name doesn't end with .py, without a bytecode file appearing for that import?
import os
import imp

py_source_open_mode = "U"
py_source_description = (".py", py_source_open_mode, imp.PY_SOURCE)

module_filepath = "foo/bar/baz"
module_name = os.path.basename(module_filepath)
with open(module_filepath, py_source_open_mode) as module_file:
    foo_module = imp.load_module(
        module_name, module_file, module_filepath, py_source_description)
My best implementation so far is (using features only in Python 2.6 or later):
import os
import sys
import imp
import contextlib


@contextlib.contextmanager
def preserve_value(namespace, name):
    """ A context manager to preserve, then restore, the specified binding.

        :param namespace: The namespace object (e.g. a class or dict)
            containing the name binding.
        :param name: The name of the binding to be preserved.
        :yield: None.

        When the context manager is entered, the current value bound to
        `name` in `namespace` is saved. When the context manager is
        exited, the binding is re-established to the saved value.

        """
    saved_value = getattr(namespace, name)
    yield
    setattr(namespace, name, saved_value)


def make_module_from_file(module_name, module_filepath):
    """ Make a new module object from the source code in specified file.

        :param module_name: The name of the resulting module object.
        :param module_filepath: The filesystem path to open for
            reading the module's Python source.
        :return: The module object.

        The Python import mechanism is not used. No cached bytecode
        file is created, and no entry is placed in `sys.modules`.

        """
    py_source_open_mode = 'U'
    py_source_description = (".py", py_source_open_mode, imp.PY_SOURCE)

    with open(module_filepath, py_source_open_mode) as module_file:
        with preserve_value(sys, 'dont_write_bytecode'):
            sys.dont_write_bytecode = True
            module = imp.load_module(
                module_name, module_file, module_filepath,
                py_source_description)

    return module


def import_program_as_module(program_filepath):
    """ Import module from program file `program_filepath`.

        :param program_filepath: The full filesystem path to the program.
            This name will be used for both the source file to read, and
            the resulting module name.
        :return: The module object.

        A program file has an arbitrary name; it is not suitable to
        create a corresponding bytecode file alongside. So the creation
        of bytecode is suppressed during the import.

        The module object will also be added to `sys.modules`.

        """
    module_name = os.path.basename(program_filepath)
    module = make_module_from_file(module_name, program_filepath)
    sys.modules[module_name] = module

    return module
This is a little too broad: it disables bytecode file generation during the entire import process for the module, which means other modules imported during that process will also not have bytecode files generated.
I'm still looking for a way to disable bytecode file generation for only the specified module file.
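On Python 3, one per-module approach is possible; this is a sketch relying on the documented importlib SourceLoader protocol rather than any dedicated API. importlib writes the cached bytecode through the loader's set_data() method, so a loader whose set_data() does nothing should suppress the .pyc for just that one module:

import importlib.util
from importlib.machinery import SourceFileLoader

class NoBytecodeLoader(SourceFileLoader):
    """ Source loader that never writes a cached bytecode file. """
    def set_data(self, path, data, *args, **kwargs):
        # importlib calls set_data() to write the .pyc; doing nothing
        # here suppresses bytecode for this module only.
        pass

def make_module_from_file(module_name, module_filepath):
    loader = NoBytecodeLoader(module_name, module_filepath)
    spec = importlib.util.spec_from_loader(module_name, loader)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module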
I ran into this problem, and after failing to find a solution I really liked, solved it by making a symlink with a .py extension. So, the executable script might be called command, the symbolic link that points to it is command.py, and the unit tests are in command_test.py. The symlink is happily accepted when importing the file as a module and the executable script follows standard naming conventions.
This has the added advantage of making it easy to keep the unit tests in a different directory than the executable.

Creating aliases for Python packages?

I have a directory, let's call it Storage full of packages with unwieldy names like mypackage-xxyyzzww, and of course Storage is on my PYTHONPATH. Since packages have long unmemorable names, all of the packages are symlinked to friendlier names, such as mypackage.
Now, I don't want to rely on file system symbolic links to do this; instead, I tried mucking around with sys.path and sys.modules. Currently I'm doing something like this:
import imp
imp.load_package('mypackage', 'Storage/mypackage-xxyyzzww')
How bad is it to do things this way, and is there a chance this will break in the future? One funny thing is that there's no mention of the imp.load_package function in the docs at all.
EDIT: besides not relying on symbolic links, I can't use PYTHONPATH variable anymore.
Instead of using imp, you can assign different names to imported modules.
import mypackage_xxyyzzww as mypackage
If you then create a __init__.py file inside of Storage, you can add several of the above lines to make importing easier.
Storage/__init__.py:
import mypackage_xxyyzzww as mypackage
import otherpackage_xxyyzzww as otherpackage
Interpreter:
>>> from Storage import mypackage, otherpackage
importlib may be more appropriate, as it uses/implements the PEP302 mechanism.
Follow the DictImporter example, but override find_module to find the real filename and store it in the dict, then override load_module to get the code from the found file.
You shouldn't need to use sys.path once you've created your Storage module:
#from importlib import abc
import imp
import os
import sys
import logging

logging.basicConfig(level=logging.DEBUG)
dprint = logging.debug


class MyImporter(object):
    def __init__(self, path):
        self.path = path
        self.names = {}

    def find_module(self, fullname, path=None):
        dprint("find_module({fullname},{path})".format(**locals()))
        ml = imp.find_module(fullname, path)
        dprint(repr(ml))
        raise ImportError

    def load_module(self, fullname):
        dprint("load_module({fullname})".format(**locals()))
        return imp.load_module(fullname)
        raise ImportError


def load_storage(path, modname=None):
    if modname is None:
        modname = os.path.basename(path)
    mod = imp.new_module(modname)
    sys.modules[modname] = mod
    assert mod.__name__ == modname
    mod.__path__ = [path]
    #sys.meta_path.append(MyImporter(path))
    mod.__loader__ = MyImporter(path)
    return mod


if __name__ == "__main__":
    load_storage("arbitrary-path-to-code/Storage")
    from Storage import plain
    from Storage import mypkg
Then when you import Storage.mypackage, python will immediately use your importer without bothering to look on sys.path
That doesn't work. The code above does work to import ordinary modules under Storage without requiring Storage to be on sys.path, but both 3.1 and 2.6 seem to ignore the __loader__ attribute mentioned in PEP 302.
If I uncomment the sys.meta_path line, 3.1 dies with a stack overflow, and 2.6 dies with an ImportError. Hmmm... I'm out of time now, but may look at it later.
Packages are just entries in the namespace. You should not name your path components with anything that is not a legal python variable name.
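With today's importlib, a symlink-free alias can be built explicitly. This is a sketch; it assumes Storage/mypackage-xxyyzzww is a regular package with an __init__.py:

import importlib.util
import os
import sys

def alias_package(alias, package_dir):
    # Load package_dir/__init__.py under the friendly name `alias`;
    # submodule_search_locations makes "import alias.sub" work too.
    spec = importlib.util.spec_from_file_location(
        alias, os.path.join(package_dir, "__init__.py"),
        submodule_search_locations=[package_dir])
    module = importlib.util.module_from_spec(spec)
    sys.modules[alias] = module
    spec.loader.exec_module(module)
    return module

mypackage = alias_package("mypackage", "Storage/mypackage-xxyyzzww")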

Import a python module without the .py extension

I have a file called foobar (without .py extension). In the same directory I have another python file that tries to import it:
import foobar
But this only works if I rename the file to foobar.py. Is it possible to import a python module that doesn't have the .py extension?
Update: the file has no extension because I also use it as a standalone script, and I don't want to type the .py extension to run it.
Update2: I will go for the symlink solution mentioned below.
You can use the imp.load_source function (from the imp module) to load a module dynamically from a given file-system path.
import imp
foobar = imp.load_source('foobar', '/path/to/foobar')
This SO discussion also shows some interesting options.
Here is a solution for Python 3.4+:
from importlib.util import spec_from_loader, module_from_spec
from importlib.machinery import SourceFileLoader
spec = spec_from_loader("foobar", SourceFileLoader("foobar", "/path/to/foobar"))
foobar = module_from_spec(spec)
spec.loader.exec_module(foobar)
Using spec_from_loader and explicitly specifying a SourceFileLoader will force the machinery to load the file as source, without trying to figure out the type of the file from the extension. This means that you can load the file even though it is not listed in importlib.machinery.SOURCE_SUFFIXES.
If you want to keep importing the file by name after the first load, add the module to sys.modules:
sys.modules['foobar'] = foobar
You can find an implementation of this function in a utility library I maintain called haggis. haggis.load.load_module has options for adding the module to sys.modules, setting a custom name, and injecting variables into the namespace for the code to use.
Like others have mentioned, you could use imp.load_source, but it will make your code more difficult to read. I would really only recommend it if you need to import modules whose names or paths aren't known until run-time.
What is your reason for not wanting to use the .py extension? The most common case for not wanting to use the .py extension, is because the python script is also run as an executable, but you still want other modules to be able to import it. If this is the case, it might be beneficial to move functionality into a .py file with a similar name, and then use foobar as a wrapper.
imp.load_source(module_name, path) should do it, or you can take the more verbose imp.load_module(module_name, file_handle, ...) route if you have a file handle instead.
importlib helper function
Here is a convenient, ready-to-use helper to replace imp, with an example, based on what was mentioned at: https://stackoverflow.com/a/43602645/895245
main.py
#!/usr/bin/env python3

import os
import importlib.machinery
import importlib.util
import sys

def import_path(path):
    module_name = os.path.basename(path).replace('-', '_')
    spec = importlib.util.spec_from_loader(
        module_name,
        importlib.machinery.SourceFileLoader(module_name, path)
    )
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    sys.modules[module_name] = module
    return module

notmain = import_path('not-main')
print(notmain)
print(notmain.x)
not-main
x = 1
Run:
python3 main.py
Output:
<module 'not_main' from 'not-main'>
1
I replace - with _ because my importable Python executables without extension have hyphens. This is not mandatory, but produces better module names.
This pattern is also mentioned in the docs at: https://docs.python.org/3.7/library/importlib.html#importing-a-source-file-directly
I ended up moving to it because after updating to Python 3.7, import imp prints:
DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
and I don't know how to turn that off; this was asked at:
The imp module is deprecated
How to ignore deprecation warnings in Python
Tested in Python 3.7.3.
If you install the script with a package manager (deb or similar), another option would be to use setuptools:
"...there’s no easy way to have a script’s filename match local conventions on both Windows and POSIX platforms. For another, you often have to create a separate file just for the “main” script, when your actual “main” is a function in a module somewhere... setuptools fixes all of these problems by automatically generating scripts for you with the correct extension, and on Windows it will even create an .exe file..."
https://pythonhosted.org/setuptools/setuptools.html#automatic-script-creation
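A minimal sketch of that approach (package and function names are placeholders): keep "main" in an importable module and let setuptools generate the extensionless executable wrapper.

# setup.py
from setuptools import setup

setup(
    name="mypackage",
    packages=["mypackage"],
    entry_points={
        "console_scripts": [
            # installs an executable named "foo" that runs mypackage.cli:main;
            # the module mypackage/cli.py stays a normal, importable .py file
            "foo = mypackage.cli:main",
        ],
    },
)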
import imp has been deprecated.
The following is clean and minimal for me:
import sys
import types
import typing
import pathlib


def importFileAs(
        modAsName: str,
        importedFilePath: typing.Union[str, pathlib.Path],
) -> types.ModuleType:
    """ Import importedFilePath as modAsName, return the imported module,
    by loading importedFilePath and registering modAsName in sys.modules.
    importedFilePath can be any file and does not have to be a .py file.
    modAsName should be a valid Python name.
    Raises ImportError if the file cannot be imported, or any exception
    occurring during loading.
    Refs:
        Similar to: https://stackoverflow.com/questions/19009932/import-arbitrary-python-source-file-python-3-3
        but allows for other than .py files as well through
        importlib.machinery.SourceFileLoader.
    """
    import importlib.util
    import importlib.machinery
    # spec_from_loader() does not enforce .py, but
    # importlib.util.spec_from_file_location() does.
    spec = importlib.util.spec_from_loader(
        modAsName,
        importlib.machinery.SourceFileLoader(modAsName, importedFilePath),
    )
    if spec is None:
        raise ImportError(
            f"Could not load spec for module '{modAsName}' at: {importedFilePath}")
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except FileNotFoundError as e:
        raise ImportError(f"{e.strerror}: {importedFilePath}") from e
    sys.modules[modAsName] = module
    return module
And then I would use it like so:
aasMarmeeManage = importFileAs('aasMarmeeManage', '/bisos/bpip/bin/aasMarmeeManage.cs')
def g_extraParams(): aasMarmeeManage.g_extraParams()
