Creating aliases for Python packages? - python

I have a directory, let's call it Storage full of packages with unwieldy names like mypackage-xxyyzzww, and of course Storage is on my PYTHONPATH. Since packages have long unmemorable names, all of the packages are symlinked to friendlier names, such as mypackage.
Now, I don't want to rely on file system symbolic links to do this, instead I tried mucking around with sys.path and sys.modules. Currently I'm doing something like this:
import imp
imp.load_package('mypackage', 'Storage/mypackage-xxyyzzww')
How bad is it to do things this way, and is there a chance this will break in the future? One funny thing is that there's even no mention of imp.load_package function in the docs.
EDIT: besides not relying on symbolic links, I can't use PYTHONPATH variable anymore.

Instead of using imp, you can assign different names to imported modules.
import mypackage_xxyyzzww as mypackage
If you then create a __init__.py file inside of Storage, you can add several of the above lines to make importing easier.
Storage/__init__.py:
import mypackage_xxyyzzww as mypackage
import otherpackage_xxyyzzww as otherpackage
Interpreter:
>>> from Storage import mypackage, otherpackage

importlib may be more appropriate, as it uses/implements the PEP302 mechanism.
Follow the DictImporter example, but override find_module to find the real filename and store it in the dict, then override load_module to get the code from the found file.
You shouldn't need to use sys.path once you've created your Storage module
#from importlib import abc
import imp
import os
import sys
import logging
logging.basicConfig(level=logging.DEBUG)
dprint = logging.debug
class MyImporter(object):
def __init__(self,path):
self.path=path
self.names = {}
def find_module(self,fullname,path=None):
dprint("find_module({fullname},{path})".format(**locals()))
ml = imp.find_module(fullname,path)
dprint(repr(ml))
raise ImportError
def load_module(self,fullname):
dprint("load_module({fullname})".format(**locals()))
return imp.load_module(fullname)
raise ImportError
def load_storage( path, modname=None ):
if modname is None:
modname = os.path.basename(path)
mod = imp.new_module(modname)
sys.modules[modname] = mod
assert mod.__name__== modname
mod.__path__=[path]
#sys.meta_path.append(MyImporter(path))
mod.__loader__= MyImporter(path)
return mod
if __name__=="__main__":
load_storage("arbitrary-path-to-code/Storage")
from Storage import plain
from Storage import mypkg
Then when you import Storage.mypackage, python will immediately use your importer without bothering to look on sys.path
That doesn't work. The code above does work to import ordinary modules under Storage without requiring Storage to be on sys.path, but both 3.1 and 2.6 seem to ignore the loader attribute mentioned in PEP302.
If I uncomment the sys.meta_path line, 3.1 dies with StackOverflow, and 2.6 dies with ImportError. hmmm... I'm out of time now, but may look at it later.

Packages are just entries in the namespace. You should not name your path components with anything that is not a legal python variable name.

Related

List project modules imported both directly and indirectly

I've been caught out by circular imports on a large project. So I'm seeking to find a way to test my code to see which of the modules in the project (and only in the project) are imported when an import statement is run. This is to inform refactoring and make sure there isn't an import somewhere deep within a package that's causing a problem.
Suppose I import project package 'agent', I want to know which project modules also get imported as a result. For instance if 'environment' and 'policy' are imported due to modules deep within the agent package containing those import statements, then I want to see just those listed. So not numpy modules listed for example as they are outside the project and so not relevant for circular dependencies.
So far I have this:
import sys
import agent # project module
for k, v in sys.modules.items():
print(f"key: {k} value: {v}")
example rows:
key: numpy.random value: <module 'numpy.random' from '/home/robin/Python/anaconda3/envs/rl/lib/python3.9/site-packages/numpy/random/__init__.py'>
key: environment value: <module 'environment' from '/home/robin/Python/Projects/RL_Sutton/Cliff/environment/__init__.py'>
This does return the modules imported both directly and indirectly but also includes a lot else such as all the components of numpy and builtins etc... If I could filter this dictionary that would solve it.
k is a str, v is <class 'module'>.
The module's __str__ method does return the module file path within it so I suppose that could be used but it's not a clean solution. I've tried looking at the documentation for sys.modules and module_type but nothing there gives a way to filter modules to the current project (that I could see).
I tried to modify the solutions for each of these without success:
How to list imported modules?
List imported modules from an imported module in Python 3
ModuleFinder also looked promising but from the limited example I couldn't see how to make path or excludes solve the problem.
Update
I didn't specify this in the original question but I'm importing modules that often look like this:
from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
import environment
import policy
ModuleFinder will find environment and policy even though they won't be imported at runtime and don't matter for cyclic imports. So I adapted the accepted answer below to find only runtime imports.
import agent
import sys
app_dir = '/path/to/projects_folder'
imported_module_names = []
for module_name, mod in sys.modules.items():
file = getattr(mod, '__file__', '')
if str(file).startswith(app_dir) and module_name != '__main__':
imported_module_names.append(module_name)
for module_name in sorted(imported_module_names):
print(module_name)
You can use modulefinder to run a script and inspect the imported modules. These can be filtered by using the __file__ attribute (given that you actually import these modules from the file system; don't worry about the dunder attribute, it's for consistency with the builtin module type):
from modulefinder import ModuleFinder
finder = ModuleFinder()
finder.run_script('test.py')
appdir = '/path/to/project'
modules = {name: mod for name, mod in finder.modules.items()
if mod.__file__ is not None
and mod.__file__.startswith(appdir)}
for name in modules.keys():
print(f"{name}")
You can invoke Python with the -v command line option which will print a message each time a module is initialized.

Python 3.5+: How to dynamically import a module given the full file path (in the presence of implicit sibling imports)?

Question
The standard library clearly documents how to import source files directly (given the absolute file path to the source file), but this approach does not work if that source file uses implicit sibling imports as described in the example below.
How could that example be adapted to work in the presence of implicit sibling imports?
I already checked out this and this other Stackoverflow questions on the topic, but they do not address implicit sibling imports within the file being imported by hand.
Setup/Example
Here's an illustrative example
Directory structure:
root/
- directory/
- app.py
- folder/
- implicit_sibling_import.py
- lib.py
app.py:
import os
import importlib.util
# construct absolute paths
root = os.path.abspath(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))
isi_path = os.path.join(root, 'folder', 'implicit_sibling_import.py')
def path_import(absolute_path):
'''implementation taken from https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly'''
spec = importlib.util.spec_from_file_location(absolute_path, absolute_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module
isi = path_import(isi_path)
print(isi.hello_wrapper())
lib.py:
def hello():
return 'world'
implicit_sibling_import.py:
import lib # this is the implicit sibling import. grabs root/folder/lib.py
def hello_wrapper():
return "ISI says: " + lib.hello()
#if __name__ == '__main__':
# print(hello_wrapper())
Running python folder/implicit_sibling_import.py with the if __name__ == '__main__': block commented out yields ISI says: world in Python 3.6.
But running python directory/app.py yields:
Traceback (most recent call last):
File "directory/app.py", line 10, in <module>
spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
File "/Users/pedro/test/folder/implicit_sibling_import.py", line 1, in <module>
import lib
ModuleNotFoundError: No module named 'lib'
Workaround
If I add import sys; sys.path.insert(0, os.path.dirname(isi_path)) to app.py, python app.py yields world as intended, but I would like to avoid munging the sys.path if possible.
Answer requirements
I'd like python app.py to print ISI says: world and I'd like to accomplish this by modifying the path_import function.
I'm not sure of the implications of mangling sys.path. Eg. if there was directory/requests.py and I added the path to directory to the sys.path, I wouldn't want import requests to start importing directory/requests.py instead of importing the requests library that I installed with pip install requests.
The solution MUST be implemented as a python function that accepts the absolute file path to the desired module and returns the module object.
Ideally, the solution should not introduce side-effects (eg. if it does modify sys.path, it should return sys.path to its original state). If the solution does introduce side-effects, it should explain why a solution cannot be achieved without introducing side-effects.
PYTHONPATH
If I have multiple projects doing this, I don't want to have to remember to set PYTHONPATH every time I switch between them. The user should just be able to pip install my project and run it without any additional setup.
-m
The -m flag is the recommended/pythonic approach, but the standard library also clearly documents How to import source files directly. I'd like to know how I can adapt that approach to cope with implicit relative imports. Clearly, Python's internals must do this, so how do the internals differ from the "import source files directly" documentation?
The easiest solution I could come up with is to temporarily modify sys.path in the function doing the import:
from contextlib import contextmanager
#contextmanager
def add_to_path(p):
import sys
old_path = sys.path
sys.path = sys.path[:]
sys.path.insert(0, p)
try:
yield
finally:
sys.path = old_path
def path_import(absolute_path):
'''implementation taken from https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly'''
with add_to_path(os.path.dirname(absolute_path)):
spec = importlib.util.spec_from_file_location(absolute_path, absolute_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module
This should not cause any problems unless you do imports in another thread concurrently. Otherwise, since sys.path is restored to its previous state, there should be no unwanted side effects.
Edit:
I realize that my answer is somewhat unsatisfactory but, digging into the code reveals that, the line spec.loader.exec_module(module) basically results in exec(spec.loader.get_code(module.__name__),module.__dict__) getting called. Here spec.loader.get_code(module.__name__) is simply the code contained in lib.py.
Thus a better answer to the question would have to find a way to make the import statement behave differently by simply injecting one or more global variables through the second argument of the exec-statement. However, "whatever you do to make the import machinery look in that file's folder, it'll have to linger beyond the duration of the initial import, since functions from that file might perform further imports when you call them", as stated by #user2357112 in the question comments.
Unfortunately the only way to change the behavior of the import statement seems to be to change sys.path or in a package __path__. module.__dict__ already contains __path__ so that doesn't seem to work which leaves sys.path (Or trying to figure out why exec does not treat the code as a package even though it has __path__ and __package__ ... - But I don't know where to start - Maybe it has something to do with having no __init__.py file).
Furthermore this issue does not seem to be specific to importlib but rather a general problem with sibling imports.
Edit2: If you don't want the module to end up in sys.modules the following should work (Note that any modules added to sys.modules during the import are removed):
from contextlib import contextmanager
#contextmanager
def add_to_path(p):
import sys
old_path = sys.path
old_modules = sys.modules
sys.modules = old_modules.copy()
sys.path = sys.path[:]
sys.path.insert(0, p)
try:
yield
finally:
sys.path = old_path
sys.modules = old_modules
add to the PYTHONPATH environment variable the path your application is on
Augment the default search path for module files. The format is the same as the shell’s PATH: one or more directory pathnames
separated by os.pathsep (e.g. colons on Unix or semicolons on
Windows). Non-existent directories are silently ignored.
on bash its like this:
export PYTHONPATH="./folder/:${PYTHONPATH}"
or run directly:
PYTHONPATH="./folder/:${PYTHONPATH}" python directory/app.py
The OP's idea is great, this work only for this example by adding sibling modules with proper name to the sys.modules, I would say it is the SAME as adding PYTHONPATH. tested and working with version 3.5.1.
import os
import sys
import importlib.util
class PathImport(object):
def get_module_name(self, absolute_path):
module_name = os.path.basename(absolute_path)
module_name = module_name.replace('.py', '')
return module_name
def add_sibling_modules(self, sibling_dirname):
for current, subdir, files in os.walk(sibling_dirname):
for file_py in files:
if not file_py.endswith('.py'):
continue
if file_py == '__init__.py':
continue
python_file = os.path.join(current, file_py)
(module, spec) = self.path_import(python_file)
sys.modules[spec.name] = module
def path_import(self, absolute_path):
module_name = self.get_module_name(absolute_path)
spec = importlib.util.spec_from_file_location(module_name, absolute_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return (module, spec)
def main():
pathImport = PathImport()
root = os.path.abspath(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))
isi_path = os.path.join(root, 'folder', 'implicit_sibling_import.py')
sibling_dirname = os.path.dirname(isi_path)
pathImport.add_sibling_modules(sibling_dirname)
(lib, spec) = pathImport.path_import(isi_path)
print (lib.hello())
if __name__ == '__main__':
main()
Try:
export PYTHONPATH="./folder/:${PYTHONPATH}"
or run directly:
PYTHONPATH="./folder/:${PYTHONPATH}" python directory/app.py
Make sure your root is in a folder that is explicitly searched in the PYTHONPATH. Use an absolute import:
from root.folder import implicit_sibling_import #called from app.py
Make sure your root is in a folder that is explicitly searched in the PYTHONPATH
Use an absolute import:
from root.folder import implicit_sibling_import # called from app.py

How to import members of all modules within a package?

I am developing a package that has a file structure similar to the following:
test.py
package/
__init__.py
foo_module.py
example_module.py
If I call import package in test.py, I want the package module to appear similar to this:
>>> vars(package)
mapping_proxy({foo: <function foo at 0x…}, {example: <function example at 0x…})
In other words, I want the members of all modules in package to be in package's namespace, and I do not want the modules themselves to be in the namespace. package is not a sub-package.
Let's say my files look like this:
foo_module.py:
def foo(bar):
return bar
example_module.py:
def example(arg):
return foo(arg)
test.py:
print(example('derp'))
How do I structure the import statements in test.py, example_module.py, and __init__.py to work from outside the package directory (i.e. test.py) and within the package itself (i.e. foo_module.py and example_module.py)? Everything I try gives Parent module '' not loaded, cannot perform relative import or ImportError: No module named 'module_name'.
Also, as a side-note (as per PEP 8): "Relative imports for intra-package imports are highly discouraged. Always use the absolute package path for all imports. Even now that PEP 328 is fully implemented in Python 2.5, its style of explicit relative imports is actively discouraged; absolute imports are more portable and usually more readable."
I am using Python 3.3.
I want the members of all modules in package to be in package's
namespace, and I do not want the modules themselves to be in the
namespace.
I was able to do that by adapting something I've used in Python 2 to automatically import plug-ins to also work in Python 3.
In a nutshell, here's how it works:
The package's __init__.py file imports all the other Python files in the same package directory except for those whose names start with an '_' (underscore) character.
It then adds any names in the imported module's namespace to that of __init__ module's (which is also the package's namespace). Note I had to make the example_module module explicitly import foo from the .foo_module.
One important aspect of doing things this way is realizing that it's dynamic and doesn't require the package module names to be hardcoded into the __init__.py file. Of course this requires more code to accomplish, but also makes it very generic and able to work with just about any (single-level) package — since it will automatically import new modules when they're added and no longer attempt to import any removed from the directory.
test.py:
from package import *
print(example('derp'))
__init__.py:
def _import_all_modules():
""" Dynamically imports all modules in this package. """
import traceback
import os
global __all__
__all__ = []
globals_, locals_ = globals(), locals()
# Dynamically import all the package modules in this file's directory.
for filename in os.listdir(__name__):
# Process all python files in directory that don't start
# with underscore (which also prevents this module from
# importing itself).
if filename[0] != '_' and filename.split('.')[-1] in ('py', 'pyw'):
modulename = filename.split('.')[0] # Filename sans extension.
package_module = '.'.join([__name__, modulename])
try:
module = __import__(package_module, globals_, locals_, [modulename])
except:
traceback.print_exc()
raise
for name in module.__dict__:
if not name.startswith('_'):
globals_[name] = module.__dict__[name]
__all__.append(name)
_import_all_modules()
foo_module.py:
def foo(bar):
return bar
example_module.py:
from .foo_module import foo # added
def example(arg):
return foo(arg)
I think you can get the values you need without cluttering up your namespace, by using from module import name style imports. I think these imports will work for what you are asking for:
Imports for example_module.py:
from package.foo_module import foo
Imports for __init__.py:
from package.foo_module import foo
from package.example_module import example
__all__ = [foo, example] # not strictly necessary, but makes clear what is public
Imports for test.py:
from package import example
Note that this only works if you're running test.py (or something else at the same level of the package hierarchy). Otherwise you'd need to make sure the folder containing package is in the python module search path (either by installing the package somewhere Python will look for it, or by adding the appropriate folder to sys.path).

Python dependencies?

Is it possible to programmatically detect dependencies given a python project residing in SVN?
Here is a twist which adds some precision, and which might be useful if you find you're frequently checking dependencies of miscellaneous code:
Catches only import statements executed by the code being analyzed.
Automatically excludes all system-loaded modules, so you don't have to weed through it.
Also reports the symbols imported from each module.
Code:
import __builtin__
import collections
import sys
IN_USE = collections.defaultdict(set)
_IMPORT = __builtin__.__import__
def _myimport(name, globs=None, locs=None, fromlist=None, level=-1):
global IN_USE
if fromlist is None:
fromlist = []
IN_USE[name].update(fromlist)
return _IMPORT(name, globs, locs, fromlist, level)
# monkey-patch __import__
setattr(__builtin__, '__import__', _myimport)
# import and run the target project here and run the routine
import foobar
foobar.do_something()
# when it finishes running, dump the imports
print 'modules and symbols imported by "foobar":'
for key in sorted(IN_USE.keys()):
print key
for name in sorted(IN_USE[key]):
print ' ', name
Example foobar module:
import byteplay
import cjson
def _other():
from os import path
from sys import modules
def do_something():
import hashlib
import lxml
_other()
Output:
modules and symbols imported by "foobar":
_hashlib
array
array
byteplay
cStringIO
StringIO
cjson
dis
findlabels
foobar
hashlib
itertools
lxml
opcode
*
__all__
operator
os
path
sys
modules
types
warnings
Absolutely! If you are working from a UNIX or Linux shell, a simple combination of grep and awk would work; basically, all you want to do is search for lines containing the "import" keyword.
However, if you are working from any environment, you could just write a small Python script to do the searching for you (don't forget that strings are treated as immutable sequences, so you can do something like if "import" in line: ....
The one sticky spot, would be associating those imported modules to their package name (the first one that comes to mind is the PIL module, in Ubuntu it's provided by the python-imaging package).
Python code can import modules using runtime-constructed strings, so the only surefire way would be to run the code. Real-world example: when you open a database with SQLAlchemy's dbconnect, the library will load one or more db-api modules depending on the content of your database string.
If you're willing to run the code, here is a relatively simple way to do this by examining sys.modules when it finishes:
>>> from sys import modules
>>> import codeofinterest
>>> execute_code_of_interest()
>>> print modules
[ long, list, of, loaded, modules ]
Here, too, you should keep in mind that this could theoretically fail if execute_code_of_interest() modifies sys.modules, but I believe that's quite rare in production code.

Import a python module without the .py extension

I have a file called foobar (without .py extension). In the same directory I have another python file that tries to import it:
import foobar
But this only works if I rename the file to foobar.py. Is it possible to import a python module that doesn't have the .py extension?
Update: the file has no extension because I also use it as a standalone script, and I don't want to type the .py extension to run it.
Update2: I will go for the symlink solution mentioned below.
You can use the imp.load_source function (from the imp module), to load a module dynamically from a given file-system path.
import imp
foobar = imp.load_source('foobar', '/path/to/foobar')
This SO discussion also shows some interesting options.
Here is a solution for Python 3.4+:
from importlib.util import spec_from_loader, module_from_spec
from importlib.machinery import SourceFileLoader
spec = spec_from_loader("foobar", SourceFileLoader("foobar", "/path/to/foobar"))
foobar = module_from_spec(spec)
spec.loader.exec_module(foobar)
Using spec_from_loader and explicitly specifying a SourceFileLoader will force the machinery to load the file as source, without trying to figure out the type of the file from the extension. This means that you can load the file even though it is not listed in importlib.machinery.SOURCE_SUFFIXES.
If you want to keep importing the file by name after the first load, add the module to sys.modules:
sys.modules['foobar'] = foobar
You can find an implementation of this function in a utility library I maintain called haggis. haggis.load.load_module has options for adding the module to sys.modules, setting a custom name, and injecting variables into the namespace for the code to use.
Like others have mentioned, you could use imp.load_source, but it will make your code more difficult to read. I would really only recommend it if you need to import modules whose names or paths aren't known until run-time.
What is your reason for not wanting to use the .py extension? The most common case for not wanting to use the .py extension, is because the python script is also run as an executable, but you still want other modules to be able to import it. If this is the case, it might be beneficial to move functionality into a .py file with a similar name, and then use foobar as a wrapper.
imp.load_source(module_name, path) should do or you can do the more verbose imp.load_module(module_name, file_handle, ...) route if you have a file handle instead
importlib helper function
Here is a convenient, ready-to-use helper to replace imp, with an example, based on what was mentioned at: https://stackoverflow.com/a/43602645/895245
main.py
#!/usr/bin/env python3
import os
import importlib
import sys
def import_path(path):
module_name = os.path.basename(path).replace('-', '_')
spec = importlib.util.spec_from_loader(
module_name,
importlib.machinery.SourceFileLoader(module_name, path)
)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
sys.modules[module_name] = module
return module
notmain = import_path('not-main')
print(notmain)
print(notmain.x)
not-main
x = 1
Run:
python3 main.py
Output:
<module 'not_main' from 'not-main'>
1
I replace - with _ because my importable Python executables without extension have hyphens. This is not mandatory, but produces better module names.
This pattern is also mentioned in the docs at: https://docs.python.org/3.7/library/importlib.html#importing-a-source-file-directly
I ended up moving to it because after updating to Python 3.7, import imp prints:
DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
and I don't know how to turn that off, this was asked at:
The imp module is deprecated
How to ignore deprecation warnings in Python
Tested in Python 3.7.3.
If you install the script with package manager (deb or alike) another option would be to use setuptools:
"...there’s no easy way to have a script’s filename match local conventions on both Windows and POSIX platforms. For another, you often have to create a separate file just for the “main” script, when your actual “main” is a function in a module somewhere... setuptools fixes all of these problems by automatically generating scripts for you with the correct extension, and on Windows it will even create an .exe file..."
https://pythonhosted.org/setuptools/setuptools.html#automatic-script-creation
import imp has been deprecated.
The following is clean and minimal for me:
import sys
import types
import pathlib
def importFileAs(
modAsName: str,
importedFilePath: typing.Union[str, pathlib.Path],
) -> types.ModuleType:
""" Import importedFilePath as modAsName, return imported module
by loading importedFilePath and registering modAsName in sys.modules.
importedFilePath can be any file and does not have to be a .py file. modAsName should be python valid.
Raises ImportError: If the file cannot be imported or any Exception: occuring during loading.
Refs:
Similar to: https://stackoverflow.com/questions/19009932/import-arbitrary-python-source-file-python-3-3
but allows for other than .py files as well through importlib.machinery.SourceFileLoader.
"""
import importlib.util
import importlib.machinery
# from_loader does not enforce .py but importlib.util.spec_from_file_location() does.
spec = importlib.util.spec_from_loader(
modAsName,
importlib.machinery.SourceFileLoader(modAsName, importedFilePath),
)
if spec is None:
raise ImportError(f"Could not load spec for module '{modAsName}' at: {importedFilePath}")
module = importlib.util.module_from_spec(spec)
try:
spec.loader.exec_module(module)
except FileNotFoundError as e:
raise ImportError(f"{e.strerror}: {importedFilePath}") from e
sys.modules[modAsName] = module
return module
And then I would use it as so:
aasMarmeeManage = importFileAs('aasMarmeeManage', '/bisos/bpip/bin/aasMarmeeManage.cs')
def g_extraParams(): aasMarmeeManage.g_extraParams()

Categories