How to import members of all modules within a package?

I am developing a package that has a file structure similar to the following:
test.py
package/
    __init__.py
    foo_module.py
    example_module.py
If I call import package in test.py, I want the package module to appear similar to this:
>>> vars(package)
mappingproxy({'foo': <function foo at 0x…>, 'example': <function example at 0x…>, …})
In other words, I want the members of all modules in package to be in package's namespace, and I do not want the modules themselves to be in the namespace. package is not a sub-package.
Let's say my files look like this:
foo_module.py:
def foo(bar):
    return bar
example_module.py:
def example(arg):
    return foo(arg)
test.py:
print(example('derp'))
How do I structure the import statements in test.py, example_module.py, and __init__.py to work from outside the package directory (i.e. test.py) and within the package itself (i.e. foo_module.py and example_module.py)? Everything I try gives "Parent module '' not loaded, cannot perform relative import" or "ImportError: No module named 'module_name'".
Also, as a side-note (as per PEP 8): "Relative imports for intra-package imports are highly discouraged. Always use the absolute package path for all imports. Even now that PEP 328 is fully implemented in Python 2.5, its style of explicit relative imports is actively discouraged; absolute imports are more portable and usually more readable."
I am using Python 3.3.

I want the members of all modules in package to be in package's namespace, and I do not want the modules themselves to be in the namespace.
I was able to do that by adapting something I've used in Python 2 to automatically import plug-ins to also work in Python 3.
In a nutshell, here's how it works:
The package's __init__.py file imports all the other Python files in the same package directory except for those whose names start with an '_' (underscore) character.
It then adds any names in each imported module's namespace to that of the __init__ module (which is also the package's namespace). Note that I had to make example_module explicitly import foo from .foo_module.
One important aspect of doing things this way is realizing that it's dynamic and doesn't require the package module names to be hardcoded into the __init__.py file. Of course this requires more code to accomplish, but also makes it very generic and able to work with just about any (single-level) package — since it will automatically import new modules when they're added and no longer attempt to import any removed from the directory.
test.py:
from package import *
print(example('derp'))
__init__.py:
def _import_all_modules():
    """ Dynamically imports all modules in this package. """
    import traceback
    import os
    global __all__
    __all__ = []
    globals_, locals_ = globals(), locals()

    # Use this file's directory (not the bare package name) so the listing
    # works no matter what the current working directory is.
    package_dir = os.path.dirname(os.path.abspath(__file__))

    # Dynamically import all the package modules in this directory.
    for filename in os.listdir(package_dir):
        # Process all python files in directory that don't start
        # with underscore (which also prevents this module from
        # importing itself).
        if filename[0] != '_' and filename.split('.')[-1] in ('py', 'pyw'):
            modulename = filename.split('.')[0]  # Filename sans extension.
            package_module = '.'.join([__name__, modulename])
            try:
                module = __import__(package_module, globals_, locals_, [modulename])
            except Exception:
                traceback.print_exc()
                raise
            for name in module.__dict__:
                if not name.startswith('_'):
                    globals_[name] = module.__dict__[name]
                    __all__.append(name)

_import_all_modules()
foo_module.py:
def foo(bar):
    return bar
example_module.py:
from .foo_module import foo  # added

def example(arg):
    return foo(arg)

I think you can get the values you need without cluttering up your namespace, by using from module import name style imports. I think these imports will work for what you are asking for:
Imports for example_module.py:
from package.foo_module import foo
Imports for __init__.py:
from package.foo_module import foo
from package.example_module import example
__all__ = ['foo', 'example']  # not strictly necessary, but makes clear what is public
Imports for test.py:
from package import example
Note that this only works if you're running test.py (or something else at the same level of the package hierarchy). Otherwise you'd need to make sure the folder containing package is in the python module search path (either by installing the package somewhere Python will look for it, or by adding the appropriate folder to sys.path).
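The sys.path approach mentioned above can be sketched end-to-end. Everything below (the temp directory, the generated package, its example function) is fabricated purely for the demonstration: the point is that the folder *containing* the package, not the package folder itself, goes on the search path.

```python
import os
import sys
import tempfile

# Fabricate a throwaway `package` directory so the sketch is self-contained.
parent = tempfile.mkdtemp()
pkg_dir = os.path.join(parent, 'package')
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, '__init__.py'), 'w') as f:
    f.write("def example(arg):\n    return arg\n")

# The actual technique: put the folder containing the package on sys.path.
if parent not in sys.path:
    sys.path.insert(0, parent)

from package import example
print(example('derp'))  # -> derp
```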

Related

List project modules imported both directly and indirectly

I've been caught out by circular imports on a large project. So I'm seeking to find a way to test my code to see which of the modules in the project (and only in the project) are imported when an import statement is run. This is to inform refactoring and make sure there isn't an import somewhere deep within a package that's causing a problem.
Suppose I import project package 'agent', I want to know which project modules also get imported as a result. For instance if 'environment' and 'policy' are imported due to modules deep within the agent package containing those import statements, then I want to see just those listed. So not numpy modules listed for example as they are outside the project and so not relevant for circular dependencies.
So far I have this:
import sys
import agent  # project module

for k, v in sys.modules.items():
    print(f"key: {k} value: {v}")
example rows:
key: numpy.random value: <module 'numpy.random' from '/home/robin/Python/anaconda3/envs/rl/lib/python3.9/site-packages/numpy/random/__init__.py'>
key: environment value: <module 'environment' from '/home/robin/Python/Projects/RL_Sutton/Cliff/environment/__init__.py'>
This does return the modules imported both directly and indirectly, but it also includes a lot else, such as all the components of numpy, builtins, etc. If I could filter this dictionary, that would solve it.
k is a str, v is <class 'module'>.
The module's __str__ method does include the module's file path, so I suppose that could be used, but it's not a clean solution. I've tried looking at the documentation for sys.modules and the module type, but nothing there gives a way to filter modules to the current project (that I could see).
I tried to modify the solutions for each of these without success:
How to list imported modules?
List imported modules from an imported module in Python 3
ModuleFinder also looked promising but from the limited example I couldn't see how to make path or excludes solve the problem.
Update
I didn't specify this in the original question but I'm importing modules that often look like this:
from __future__ import annotations
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    import environment
    import policy
ModuleFinder will find environment and policy even though they won't be imported at runtime and don't matter for cyclic imports. So I adapted the accepted answer below to find only runtime imports.
import agent
import sys

app_dir = '/path/to/projects_folder'
imported_module_names = []
for module_name, mod in sys.modules.items():
    file = getattr(mod, '__file__', '')
    if str(file).startswith(app_dir) and module_name != '__main__':
        imported_module_names.append(module_name)

for module_name in sorted(imported_module_names):
    print(module_name)
You can use modulefinder to run a script and inspect the imported modules. These can be filtered by using the __file__ attribute (given that you actually import these modules from the file system; don't worry about the dunder attribute, it's for consistency with the builtin module type):
from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script('test.py')

appdir = '/path/to/project'
modules = {name: mod for name, mod in finder.modules.items()
           if mod.__file__ is not None
           and mod.__file__.startswith(appdir)}
for name in modules.keys():
    print(name)
You can invoke Python with the -v command line option which will print a message each time a module is initialized.

Load package but package with same name is already loaded

I have two versions of the same Python package. From a module in a subpackage of the current version, I need to be able to call a function inside the old version of the package (which copied itself in the past).
Where I am now:
now/
    package/
        __init__.py
        subpackage/
            __init__.py
            module.py -> "import package.subpackage.... <HERE>"
        subpackage2/
            ...
        ...
The old version:
past/
    package/
        __init__.py
        subpackage/
            __init__.py
            module.py -> "import package.subpackage; from . import module2; .... def f(x) ..."
            module2.py
        subpackage2/
            ...
        ...
I need to import in <HERE> the "old" f and run it.
Ideally
the function f should live its life inside the old package without knowing anything about the new version of the package
the module in the new package should call it, let it live its life, get the results and then forget altogether about the existence of the old package (so calling "import package.subpackage2" after letting f do her thing should run the "new" version)
doing that should not be terribly complex
The underlying idea is to improve reproducibility by saving the code that I used for some task along with the output data, and then being able to run parts of it.
Sadly, I understood this is not a simple task with Python 3, so I am prepared to accept some sort of compromise. I am prepared to accept, for example that after running the old f(x) the name package in the "new" code will be bound to the old.
EDIT
I tried two approaches using importlib. The idea was to create a module object mod and then do f = getattr(mod, "f"), but neither works:
Changing sys.path to ['.../past/package/subpackage'] and then calling importlib.import_module('package.subpackage.module'). The problem is that it loads the one in "now" even with the changed sys.path, probably because the name package is already in sys.modules.
spec = importlib.util.spec_from_file_location("module", "path..to..past..module.py")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
In that case relative imports (from . import module2) won't work, giving the error "attempted relative import with no known parent package".
There is one way this could work quite simply, but you will have to make a few modifications to your old package.
You can simply create a file in now/package/old/__init__.py containing:
__path__ = ['/absolute/path/to/old/package']
In the new package, you can then do:
from package.old.package.subpackage.module import f as old_f
The catch here is that if the old package tries to import its own modules using absolute imports, it is going to load stuff from the new package instead. So the old package will have to use only relative imports when importing stuff from its own package, or you'll have to prepend package.old to all absolute imports that the old package was doing.
If you are fine with modifying the old packages in this way, then that should be fine. If that limitation would not work for you, then read on.
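The __path__ redirection above can be sketched in a self-contained way. Every name and path below is fabricated for the demonstration: a stub package's __init__.py points __path__ at a directory living somewhere else entirely, and submodule lookups follow it there.

```python
import os
import sys
import tempfile

# Fabricate an "old" package directory containing one module.
root = tempfile.mkdtemp()
old_dir = os.path.join(root, 'somewhere_else', 'old_package')
os.makedirs(old_dir)
with open(os.path.join(old_dir, 'module.py'), 'w') as f:
    f.write("def f(x):\n    return x * 2\n")

# The stub package's __init__.py redirects __path__ to that directory.
stub_dir = os.path.join(root, 'stub_pkg')
os.makedirs(stub_dir)
with open(os.path.join(stub_dir, '__init__.py'), 'w') as f:
    f.write("__path__ = [%r]\n" % old_dir)

sys.path.insert(0, root)
from stub_pkg.module import f  # resolved inside the redirected directory
print(f(21))  # -> 42
```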
If you are really, really sure that for some reason you don't want to modify the old packages, then let's do some black magic: you'd want to replace builtins.__import__ with your own version that returns different modules depending on who is doing the importing. You can figure out who is doing the importing by inspecting the call stack.
For example, this is how you might do it (tested on Python 3.6):
import builtins
import inspect

import package.old

old_package_path = package.old.__path__[0]

OUR_PACKAGE_NAME = 'package'
OUR_PACKAGE_NAME_WITH_DOT = OUR_PACKAGE_NAME + '.'

def import_module(name, globs=None, locs=None, fromlist=(), level=0):
    # Only intercept imports for our own package coming from our old module.
    if not name.startswith(OUR_PACKAGE_NAME_WITH_DOT) or \
            not inspect.stack()[1].filename.startswith(old_package_path):
        return real_import(name, globs, locs, fromlist, level)
    new_name = OUR_PACKAGE_NAME + '.old.' + name[len(OUR_PACKAGE_NAME_WITH_DOT):]
    mod = real_import(new_name, globs, locs, fromlist, level)
    return mod.old

# Save the original __import__ since we'll need it to do the actual import.
real_import = builtins.__import__
builtins.__import__ = import_module
builtins.__import__ gets called on any import statements encountered by the interpreter, and the call is not cached so you can return different things every time it is called even when they use the same name.
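That behavior is easy to verify with a minimal, self-contained interception that only logs which names are requested and then delegates to the real importer. The names here (logging_import, requested) are made up for the sketch; json stands in for any already-cached module.

```python
import builtins

real_import = builtins.__import__
requested = []

def logging_import(name, globs=None, locs=None, fromlist=(), level=0):
    requested.append(name)  # every import statement lands here, cached or not
    return real_import(name, globs, locs, fromlist, level)

builtins.__import__ = logging_import
try:
    import json  # already in sys.modules, yet still routed through our hook
finally:
    builtins.__import__ = real_import  # always restore the original

print('json' in requested)  # -> True
```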
The following is my old answer, here for historical purpose only
I don't quite get what you're trying to do, but this is likely possible to do in Python 3 by using importlib.
You would just create a module loader that loads your module from an explicit filepath.
There's also an invalidate_caches() and reload() function which may be useful, though you may not need them.

Relative import of package __init__.py

Suppose I have a package containing two submodules and also a substantial amount of code in __init__.py itself:
pkg/__init__.py
pkg/foo.py
pkg/bar.py
and, to make planned future refactorings easier, I want components of the package to exclusively use relative imports to refer to each other. In particular, import pkg should never appear.
From foo.py I can do
from __future__ import absolute_import
from . import bar
to get access to the bar.py module, and vice versa.
The question is, what do I write to import __init__.py in this manner? I want exactly the same effect as import pkg as local_name, only without having to specify the absolute name pkg.
#import pkg as local_name
from . import ??? as local_name
UPDATE: Inspired by maxymoo's answer, I tried
from . import __init__ as local_name
This does not set local_name to the module defined by __init__.py; it instead gets what appears to be a bound method wrapper for the __init__ method of that module. I suppose I could do
from . import __init__ as local_name
local_name = local_name.__self__
to get the thing I want, but (a) yuck, and (b) this makes me worry that the module hasn't been fully initialized.
Answers need to work on both Python 2.7 and Python 3.4+.
Yes, it would probably be better to hollow out __init__.py and just have it reexport stuff from the submodules, but that can't happen just yet.
There's nothing special about the dunders (they're just discouraged when writing your own module/function names); you should just be able to do
from .__init__ import my_function as local_name
python2 and python3 (uses the discouraged __import__):
from 1st level module (pkg.foo, pgk.bar, ...):
local_name = __import__("", globals(), locals(), [], 1)
from module in subpackage (pkg.subpkg.foo, ...):
local_name = __import__("", globals(), locals(), [], 2)
python3 only*:
From pkg.foo or pkg.bar:
import importlib
local_name = importlib.import_module("..", __name__)
From pkg.subpkg.baz:
import importlib
local_name = importlib.import_module("...", __name__)
*import_module on python2 tries to load 'pkg.' in this case, unfortunately.
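The relative-name resolution that import_module performs can be demonstrated with a stdlib package standing in for pkg: here 'collections.abc' plays the role of a submodule's __name__ (i.e. 'pkg.foo'), and '..' resolves to its parent package.

```python
import importlib

# 'collections.abc' stands in for the __name__ of a module inside a package;
# '..' relative to it resolves to the enclosing top-level package.
submodule_name = 'collections.abc'
parent = importlib.import_module('..', submodule_name)
print(parent.__name__)  # -> collections
```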

How to reference to the top-level module in Python inside a package?

In the below hierarchy, is there a convenient and universal way to reference the top_package using a generic term in all .py files below? I would like to have a consistent way to import other modules, so that even when "top_package" changes name nothing breaks.
I am not in favour of using a relative import like "..level_one_a", as the relative path will be different for each python file below. I am looking for a way such that:
Each python file can have the same import statement for the same module in the package.
A decoupled reference to "top_package" in any .py file inside the package, so that whatever name "top_package" changes to, nothing breaks.
top_package/
    __init__.py
    level_one_a/
        __init__.py
        my_lib.py
        level_two/
            __init__.py
            hello_world.py
    level_one_b/
        __init__.py
        my_lib.py
main.py
This should do the job:
top_package = __import__(__name__.split('.')[0])
The trick here is that for every module the __name__ variable contains the full path to the module separated by dots such as, for example, top_package.level_one_a.my_lib. Hence, if you want to get the top package name, you just need to get the first component of the path and import it using __import__.
Although the variable used to access the package is still called top_package, you can rename the package and it will still work.
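The one-liner can be exercised with a stdlib module standing in for the package: here 'collections.abc' plays the role of __name__ inside a deeply nested module.

```python
# 'collections.abc' stands in for __name__ inside a nested module; splitting
# off the first dotted component and importing it yields the top package.
name = 'collections.abc'
top_package = __import__(name.split('.')[0])
print(top_package.__name__)  # -> collections
```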
Put your package and the main script into an outer container directory, like this:
container/
    main.py
    top_package/
        __init__.py
        level_one_a/
            __init__.py
            my_lib.py
            level_two/
                __init__.py
                hello_world.py
        level_one_b/
            __init__.py
            my_lib.py
When main.py is run, its parent directory (container) will be automatically added to the start of sys.path. And since top_package is now in the same directory, it can be imported from anywhere within the package tree.
So hello_world.py could import level_one_b/my_lib.py like this:
from top_package.level_one_b import my_lib
No matter what the name of the container directory is, or where it is located, the imports will always work with this arrangement.
But note that, in your original example, top_package could easily function as the container directory itself. All you would have to do is remove top_package/__init__.py, and you would be left with effectively the same arrangement.
The previous import statement would then change to:
from level_one_b import my_lib
and you would be free to rename top_package however you wished.
You could use a combination of the __import__() function and the __path__ attribute of a package.
For example, suppose you wish to import <whatever>.level_one_a.level_two.hello_world from somewhere else in the package. You could do something like this:
import os
_temp = __import__(__path__[0].split(os.sep)[0] + ".level_one_a.level_two.hello_world")
my_hello_world = _temp.level_one_a.level_two.hello_world
This code is independent of the name of the top level package and can be used anywhere in the package. It's also pretty ugly.
This works from within a library module:
import __main__ as main_package
TOP_PACKAGE = main_package.__package__.split('.')[0]
I believe #2 is impossible without using relative imports or the named package. You have to specify which module to import, either by explicitly calling its name or by using a relative import. Otherwise, how would the interpreter know what you want?
If you make your application launcher one level above top_level/ and have it import top_level you can then reference top_level.* from anywhere inside the top_level package.
(I can show you an example from software I'm working on: http://github.com/toddself/beerlog/)

Creating aliases for Python packages?

I have a directory, let's call it Storage, full of packages with unwieldy names like mypackage-xxyyzzww, and of course Storage is on my PYTHONPATH. Since the packages have long, unmemorable names, all of them are symlinked to friendlier names, such as mypackage.
Now, I don't want to rely on file system symbolic links to do this, instead I tried mucking around with sys.path and sys.modules. Currently I'm doing something like this:
import imp
imp.load_package('mypackage', 'Storage/mypackage-xxyyzzww')
How bad is it to do things this way, and is there a chance this will break in the future? One funny thing is that there's no mention of the imp.load_package function in the docs.
EDIT: besides not relying on symbolic links, I can't use PYTHONPATH variable anymore.
Instead of using imp, you can assign different names to imported modules.
import mypackage_xxyyzzww as mypackage
If you then create a __init__.py file inside of Storage, you can add several of the above lines to make importing easier.
Storage/__init__.py:
import mypackage_xxyyzzww as mypackage
import otherpackage_xxyyzzww as otherpackage
Interpreter:
>>> from Storage import mypackage, otherpackage
importlib may be more appropriate, as it uses/implements the PEP302 mechanism.
Follow the DictImporter example, but override find_module to find the real filename and store it in the dict, then override load_module to get the code from the found file.
You shouldn't need to use sys.path once you've created your Storage module
#from importlib import abc
import imp
import os
import sys

import logging
logging.basicConfig(level=logging.DEBUG)
dprint = logging.debug

class MyImporter(object):
    def __init__(self, path):
        self.path = path
        self.names = {}

    def find_module(self, fullname, path=None):
        dprint("find_module({fullname},{path})".format(**locals()))
        ml = imp.find_module(fullname, path)
        dprint(repr(ml))
        raise ImportError

    def load_module(self, fullname):
        dprint("load_module({fullname})".format(**locals()))
        return imp.load_module(fullname)
        raise ImportError

def load_storage(path, modname=None):
    if modname is None:
        modname = os.path.basename(path)
    mod = imp.new_module(modname)
    sys.modules[modname] = mod
    assert mod.__name__ == modname
    mod.__path__ = [path]
    #sys.meta_path.append(MyImporter(path))
    mod.__loader__ = MyImporter(path)
    return mod

if __name__ == "__main__":
    load_storage("arbitrary-path-to-code/Storage")
    from Storage import plain
    from Storage import mypkg
Then when you import Storage.mypackage, python will immediately use your importer without bothering to look on sys.path
That doesn't work. The code above does work to import ordinary modules under Storage without requiring Storage to be on sys.path, but both 3.1 and 2.6 seem to ignore the __loader__ attribute mentioned in PEP 302.
If I uncomment the sys.meta_path line, 3.1 dies with StackOverflow, and 2.6 dies with ImportError. hmmm... I'm out of time now, but may look at it later.
Packages are just entries in the namespace. You should not name your path components with anything that is not a legal python variable name.
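One concrete way to create such a namespace entry without filesystem symlinks is to register the alias directly in sys.modules. This is a hedged sketch: 'json' stands in for a real mypackage_xxyyzzww living under Storage, and 'mypackage' is the friendly alias.

```python
import importlib
import sys

# Register the real module under a friendlier key; later import statements
# for that name are satisfied straight from the sys.modules cache.
sys.modules['mypackage'] = importlib.import_module('json')

import mypackage  # resolves via sys.modules, no file lookup needed
print(mypackage.dumps({'a': 1}))  # -> {"a": 1}
```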
