I've been caught out by circular imports on a large project, so I'm looking for a way to test which modules in the project (and only in the project) get imported when an import statement runs. This is to inform refactoring and to make sure there isn't an import deep within a package that's causing a problem.
Suppose I import the project package 'agent'; I want to know which project modules also get imported as a result. For instance, if 'environment' and 'policy' are imported because modules deep within the agent package contain those import statements, then I want to see just those listed. numpy modules, for example, should not be listed, as they are outside the project and so not relevant for circular dependencies.
So far I have this:
import sys
import agent # project module
for k, v in sys.modules.items():
    print(f"key: {k} value: {v}")
example rows:
key: numpy.random value: <module 'numpy.random' from '/home/robin/Python/anaconda3/envs/rl/lib/python3.9/site-packages/numpy/random/__init__.py'>
key: environment value: <module 'environment' from '/home/robin/Python/Projects/RL_Sutton/Cliff/environment/__init__.py'>
This does return the modules imported both directly and indirectly but also includes a lot else such as all the components of numpy and builtins etc... If I could filter this dictionary that would solve it.
k is a str, v is <class 'module'>.
The module's __str__ method does return the module file path within it so I suppose that could be used but it's not a clean solution. I've tried looking at the documentation for sys.modules and module_type but nothing there gives a way to filter modules to the current project (that I could see).
I tried to modify the solutions for each of these without success:
How to list imported modules?
List imported modules from an imported module in Python 3
ModuleFinder also looked promising but from the limited example I couldn't see how to make path or excludes solve the problem.
Update
I didn't specify this in the original question but I'm importing modules that often look like this:
from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    import environment
    import policy
ModuleFinder will find environment and policy even though they won't be imported at runtime and don't matter for cyclic imports. So I adapted the accepted answer below to find only runtime imports.
import agent
import sys
app_dir = '/path/to/projects_folder'
imported_module_names = []
for module_name, mod in sys.modules.items():
    file = getattr(mod, '__file__', '')
    if str(file).startswith(app_dir) and module_name != '__main__':
        imported_module_names.append(module_name)
for module_name in sorted(imported_module_names):
    print(module_name)
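One caveat with the prefix test above: str.startswith(app_dir) also matches sibling directories such as /path/to/projects_folder_old. Here is a minimal sketch of the same filter using pathlib instead. The function name project_modules and the throwaway demo_agent / demo_policy modules are made up for illustration, and the temporary directory only stands in for the real project folder.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def project_modules(project_dir):
    """Return the sorted names of loaded modules whose file is under project_dir."""
    root = Path(project_dir).resolve()
    names = []
    for name, module in list(sys.modules.items()):
        file = getattr(module, "__file__", None)
        if not file or name == "__main__":
            continue  # built-ins and namespace packages have no __file__
        if root in Path(file).resolve().parents:
            names.append(name)
    return sorted(names)


# Demo with a throwaway "project" (hypothetical module names).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_policy.py").write_text("import json\n")
    Path(tmp, "demo_agent.py").write_text("import demo_policy\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import demo_agent
        found = project_modules(tmp)
    finally:
        sys.path.remove(tmp)

print(found)
```

json is imported by demo_policy but lives outside the project directory, so it is filtered out. In real use you would import agent and call project_modules on your project folder.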
You can use modulefinder to run a script and inspect the imported modules. These can be filtered by using the __file__ attribute (given that you actually import these modules from the file system; don't worry about the dunder attribute, it's for consistency with the builtin module type):
from modulefinder import ModuleFinder
finder = ModuleFinder()
finder.run_script('test.py')
appdir = '/path/to/project'
modules = {name: mod for name, mod in finder.modules.items()
           if mod.__file__ is not None
           and mod.__file__.startswith(appdir)}
for name in modules.keys():
    print(f"{name}")
You can invoke Python with the -v command line option which will print a message each time a module is initialized.
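The -v output is very verbose, so it usually needs filtering. Below is a rough sketch that runs a statement in a child interpreter and extracts the module names. It assumes CPython 3, where -v writes lines of the form import 'name' # ... to stderr; the helper name modules_initialised is made up, and import json is just a stand-in for import agent.

```python
import re
import subprocess
import sys


def modules_initialised(statement):
    """Run statement under `python -v` and return the module names it reports."""
    result = subprocess.run(
        [sys.executable, "-v", "-c", statement],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    # Assumed line format (CPython 3): import 'json.decoder' # <loader ...>
    pattern = re.compile(r"^import '([^']+)' #", re.MULTILINE)
    return pattern.findall(result.stderr)


names = modules_initialised("import json")
print([n for n in names if n.startswith("json")])
```

This lists every module initialised in the child process, including the ones Python loads at startup, so you would still filter by name prefix or by path to narrow it to the project.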
Related
I'm baffled by the importing dynamics in __init__.py.
Say I have this structure:
package
├── __init__.py
└── subpackage
    ├── __init__.py
    └── dostuff.py
I would like to import things in dostuff.py. I could do it like this: from package.subpackage.dostuff import thefunction, but I would like to remove the subpackage level in the import statement, so it would look like this:
from package.dostuff import thefunction
I tried putting this in package/__init__.py:
from .subpackage import dostuff
And what I don't understand is this:
# doing this works:
from package import dostuff
dostuff.thefunction()
# but this doesn't work:
from package.dostuff import thefunction
# ModuleNotFoundError: No module named 'package.dostuff'
Why is that, and how can I make from package.dostuff import thefunction work?
The only way I see to do what you intend is to actually create a package/dostuff.py module and import everything you need into it, e.g. from .subpackage.dostuff import thefunction.
The point is that using from .subpackage import dostuff in package/__init__.py does not rename the original module; it only binds the name dostuff in package's namespace.
To be more explicit, here is an example of use with both your import and a package/dostuff.py file:
# We import the dostuff link from package
>>> from package import dostuff
>>> dostuff
<module 'package.subpackage.dostuff' from '/tmp/test/package/subpackage/dostuff.py'>
# We use our custom package.dostuff
>>> from package.dostuff import thefunction
>>> package.dostuff
<module 'package.dostuff' from '/tmp/test/package/dostuff.py'>
>>> from package import dostuff
>>> dostuff
<module 'package.dostuff' from '/tmp/test/package/dostuff.py'>
# The loaded function is the same
>>> dostuff.thefunction
<function thefunction at 0x7f95403d2730>
>>> package.dostuff.thefunction
<function thefunction at 0x7f95403d2730>
A clearer way of putting this is:
from X import Y only works when X is an actual module path.
Y, on the contrary, can be any name defined or imported in that module.
This also applies to packages and anything declared in their __init__.py. Here you bind the module package.subpackage.dostuff to a name in package, hence you can import it from package and use it.
But if you want to use package.dostuff as the module path of an import, the import system must be able to find a module with that name, so it has to exist on the filesystem.
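To make this concrete, here is a small self-contained sketch that builds the layout from the question in a temporary directory and tries all three imports. The package is named demo_package here purely to avoid clashing with anything real; everything else mirrors the question.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "demo_package")
    sub = pkg / "subpackage"
    sub.mkdir(parents=True)
    (pkg / "__init__.py").write_text("from .subpackage import dostuff\n")
    (sub / "__init__.py").write_text("")
    (sub / "dostuff.py").write_text("def thefunction():\n    return 'done'\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Y can be any attribute of X: this works.
        from demo_package import dostuff
        attribute_import = dostuff.__name__

        # X must be importable as a module: this fails.
        try:
            from demo_package.dostuff import thefunction
            direct_import_error = None
        except ModuleNotFoundError as exc:
            direct_import_error = str(exc)

        # Adding a real demo_package/dostuff.py makes the path importable.
        (pkg / "dostuff.py").write_text(
            "from .subpackage.dostuff import thefunction\n"
        )
        importlib.invalidate_caches()
        from demo_package.dostuff import thefunction
        shim_result = thefunction()
    finally:
        sys.path.remove(tmp)

print(attribute_import)
print(direct_import_error)
print(shim_result)
```

The first import succeeds because dostuff is just a name in demo_package's namespace; the second fails until a module really exists at demo_package/dostuff.py.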
Resources:
Python documentation about module management in the import system:
https://docs.python.org/3/reference/import.html#submodules.
Python import system search behavior:
https://docs.python.org/3/reference/import.html#searching
https://docs.python.org/3/glossary.html#term-qualified-name
https://docs.python.org/2.0/ref/import.html
I hope that makes it clearer
You can in fact fake this quite easily by fiddling with Python's sys.modules dict. The question is whether you do really need this or whether it might be good to spend a second thought on your package structure.
Personally, I would consider this bad style, because it applies magic to the module and package names and people who might use and extend your package will have a hard time figuring out what's going on there.
Following your structure above, add the following code to your package/__init__.py:
import sys
from .subpackage import dostuff
# This will be package.dostuff; just avoiding to hard-code it.
_pkg_name = f"{__name__}.{dostuff.__name__.rsplit('.', 1)[1]}"
if _pkg_name not in sys.modules.keys():
    dostuff.__name__ = _pkg_name  # Will have no effect; see below
    sys.modules[_pkg_name] = dostuff
This imports the dostuff module from your subpackage to the scope of package, changes its module path and adds it to the imported modules. Essentially, this just copies the binding of your module to another import path where member memory addresses remain the same. You just duplicate the references:
import package
print(package.dostuff)
print(package.subpackage.dostuff)
print(package.dostuff.something_to_do)
print(package.subpackage.dostuff.something_to_do)
... yields
<module 'package.subpackage.dostuff' from '/path/package/subpackage/dostuff.py'>
<module 'package.subpackage.dostuff' from '/path/package/subpackage/dostuff.py'>
<function something_to_do at 0x1029b8ae8>
<function something_to_do at 0x1029b8ae8>
Note that
The module name package.subpackage.dostuff has not changed, despite being updated in package/__init__.py
The function reference is the same: 0x1029b8ae8
Now, you can also go
from package.dostuff import something_to_do
something_to_do()
However, be cautious. Changing the imported modules during import of a module might have unintended side effects (the order of updating sys.modules and importing other subpackages or submodules from within package might also be relevant). Usually, you buy extra work and extra complexity by applying this kind of "improvement". Better yet, set up a proper package structure and stick to it.
I am developing a package that has a file structure similar to the following:
test.py
package/
    __init__.py
    foo_module.py
    example_module.py
If I call import package in test.py, I want the package module to appear similar to this:
>>> vars(package)
mapping_proxy({foo: <function foo at 0x…}, {example: <function example at 0x…})
In other words, I want the members of all modules in package to be in package's namespace, and I do not want the modules themselves to be in the namespace. package is not a sub-package.
Let's say my files look like this:
foo_module.py:
def foo(bar):
    return bar
example_module.py:
def example(arg):
    return foo(arg)
test.py:
print(example('derp'))
How do I structure the import statements in test.py, example_module.py, and __init__.py to work from outside the package directory (i.e. test.py) and within the package itself (i.e. foo_module.py and example_module.py)? Everything I try gives Parent module '' not loaded, cannot perform relative import or ImportError: No module named 'module_name'.
Also, as a side-note (as per PEP 8): "Relative imports for intra-package imports are highly discouraged. Always use the absolute package path for all imports. Even now that PEP 328 is fully implemented in Python 2.5, its style of explicit relative imports is actively discouraged; absolute imports are more portable and usually more readable."
I am using Python 3.3.
I want the members of all modules in package to be in package's namespace, and I do not want the modules themselves to be in the namespace.
I was able to do that by adapting something I've used in Python 2 to automatically import plug-ins to also work in Python 3.
In a nutshell, here's how it works:
The package's __init__.py file imports all the other Python files in the same package directory except for those whose names start with an '_' (underscore) character.
It then adds any names in the imported module's namespace to that of __init__ module's (which is also the package's namespace). Note I had to make the example_module module explicitly import foo from the .foo_module.
One important aspect of doing things this way is realizing that it's dynamic and doesn't require the package module names to be hardcoded into the __init__.py file. Of course this requires more code to accomplish, but also makes it very generic and able to work with just about any (single-level) package — since it will automatically import new modules when they're added and no longer attempt to import any removed from the directory.
test.py:
from package import *
print(example('derp'))
__init__.py:
def _import_all_modules():
    """ Dynamically imports all modules in this package. """
    import traceback
    import os
    global __all__
    __all__ = []
    globals_, locals_ = globals(), locals()
    # Dynamically import all the package modules in this file's directory.
    # (os.listdir(__name__) would only work when the current working
    # directory is the one containing the package.)
    for filename in os.listdir(os.path.dirname(os.path.abspath(__file__))):
        # Process all python files in directory that don't start
        # with underscore (which also prevents this module from
        # importing itself).
        if filename[0] != '_' and filename.split('.')[-1] in ('py', 'pyw'):
            modulename = filename.split('.')[0]  # Filename sans extension.
            package_module = '.'.join([__name__, modulename])
            try:
                module = __import__(package_module, globals_, locals_, [modulename])
            except:
                traceback.print_exc()
                raise
            for name in module.__dict__:
                if not name.startswith('_'):
                    globals_[name] = module.__dict__[name]
                    __all__.append(name)
_import_all_modules()
foo_module.py:
def foo(bar):
    return bar
example_module.py:
from .foo_module import foo  # added
def example(arg):
    return foo(arg)
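For comparison, here is a sketch of the same idea using the standard library's pkgutil.iter_modules and importlib.import_module rather than a directory listing and __import__. The helper name public_names is made up, and email.mime from the standard library merely stands in for your own package; inside a real __init__.py you would more likely run the loop directly over __path__ and update globals().

```python
import importlib
import inspect
import pkgutil


def public_names(package):
    """Collect public, non-module names found in package's direct submodules."""
    names = {}
    for info in pkgutil.iter_modules(package.__path__):
        if info.name.startswith("_"):
            continue  # skip private modules, as the answer above does
        module = importlib.import_module(f"{package.__name__}.{info.name}")
        for name, value in vars(module).items():
            if name.startswith("_") or inspect.ismodule(value):
                continue  # skip private names and modules the submodule imported
            names[name] = value
    return names


import email.mime

exported = public_names(email.mime)
print(sorted(exported))
```

Skipping module objects is a deliberate choice here: otherwise anything a submodule imports (os, sys and so on) would leak into the package namespace, which is exactly what the question wanted to avoid.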
I think you can get the values you need without cluttering up your namespace, by using from module import name style imports. I think these imports will work for what you are asking for:
Imports for example_module.py:
from package.foo_module import foo
Imports for __init__.py:
from package.foo_module import foo
from package.example_module import example
__all__ = ['foo', 'example']  # not strictly necessary, but makes clear what is public
Imports for test.py:
from package import example
Note that this only works if you're running test.py (or something else at the same level of the package hierarchy). Otherwise you'd need to make sure the folder containing package is in the python module search path (either by installing the package somewhere Python will look for it, or by adding the appropriate folder to sys.path).
Is it possible in python to get a list of modules from a folder/package and import them?
I would like to be able to do this from a function inside a class, so that the entire class has access to them (possibly done from the __init__ method).
Any help would be greatly appreciated.
See the modules document.
The only solution is for the package author to provide an explicit index of the package. The import statement uses the following convention: if a package’s __init__.py code defines a list named __all__, it is taken to be the list of module names that should be imported when from package import * is encountered. It is up to the package author to keep this list up-to-date when a new version of the package is released. Package authors may also decide not to support it, if they don’t see a use for importing * from their package. For example, the file sound/effects/__init__.py could contain the following code:
__all__ = ["echo", "surround", "reverse"]
This would mean that from sound.effects import * would import the three named submodules of the sound package.
Yes, you could find a way to do this by doing a directory listing for the files in the directory and import them manually. But there isn't built-in syntax for what you're asking.
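A minimal sketch of that manual approach, done from a class's __init__ as the question asks. It uses pkgutil.iter_modules rather than a raw directory listing; the class name ModuleRegistry is made up, and the standard library's html package is only used so the example runs anywhere.

```python
import importlib
import pkgutil


class ModuleRegistry:
    """Imports the direct submodules of a package and keeps them on the instance."""

    def __init__(self, package_name):
        package = importlib.import_module(package_name)
        self.modules = {}
        for info in pkgutil.iter_modules(package.__path__):
            if info.name.startswith("_"):
                continue  # leave private modules alone
            full_name = f"{package_name}.{info.name}"
            self.modules[info.name] = importlib.import_module(full_name)


registry = ModuleRegistry("html")
print(sorted(registry.modules))
```

Every method of the class can then reach the imported modules through self.modules, for example self.modules["parser"].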
You can list the names defined in a module with the dir function:
import module
dir(module)
Later in the program, you can import a single function:
from module import function
The pkg_resources module (shipped with distribute, now part of setuptools) provides a mechanism that does much of this. First, you might start by listing the Python files in a package using pkg_resources.resource_listdir:
>>> import os
>>> import pkg_resources
>>> module_names = set(os.path.splitext(r)[0]
...     for r
...     in pkg_resources.resource_listdir("sqlalchemy", "/")
...     if os.path.splitext(r)[1] in ('.py', '.pyc', '.pyo', '')
...     ) - set(('__init__',))
>>> module_names
set(['engine', 'util', 'exc', 'pool', 'processors', 'interfaces',
'databases', 'ext', 'topological', 'queue', 'test', 'connectors',
'orm', 'log', 'dialects', 'sql', 'types', 'schema'])
You could then import each module in a loop:
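modules = {}
for module in module_names:
    # str.join takes a single iterable; a non-empty fromlist makes
    # __import__ return the submodule instead of the top-level package.
    modules[module] = __import__('.'.join(('sqlalchemy', module)), fromlist=[module])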
Is it possible to programmatically detect dependencies given a python project residing in SVN?
Here is a twist which adds some precision, and which might be useful if you find you're frequently checking dependencies of miscellaneous code:
Catches only import statements executed by the code being analyzed.
Automatically excludes all system-loaded modules, so you don't have to weed through it.
Also reports the symbols imported from each module.
Code:
import __builtin__
import collections
import sys
IN_USE = collections.defaultdict(set)
_IMPORT = __builtin__.__import__
def _myimport(name, globs=None, locs=None, fromlist=None, level=-1):
    global IN_USE
    if fromlist is None:
        fromlist = []
    IN_USE[name].update(fromlist)
    return _IMPORT(name, globs, locs, fromlist, level)
# monkey-patch __import__
setattr(__builtin__, '__import__', _myimport)
# import and run the target project here and run the routine
import foobar
foobar.do_something()
# when it finishes running, dump the imports
print 'modules and symbols imported by "foobar":'
for key in sorted(IN_USE.keys()):
    print key
    for name in sorted(IN_USE[key]):
        print ' ', name
Example foobar module:
import byteplay
import cjson
def _other():
    from os import path
    from sys import modules
def do_something():
    import hashlib
    import lxml
    _other()
Output:
modules and symbols imported by "foobar":
_hashlib
array
   array
byteplay
cStringIO
   StringIO
cjson
dis
   findlabels
foobar
hashlib
itertools
lxml
opcode
   *
   __all__
operator
os
   path
sys
   modules
types
warnings
Absolutely! If you are working from a UNIX or Linux shell, a simple combination of grep and awk would work; basically, all you want to do is search for lines containing the "import" keyword.
However, if you are working from any environment, you could just write a small Python script to do the searching for you (don't forget that strings are treated as immutable sequences, so you can do something like if "import" in line: ...).
The one sticky spot would be associating those imported modules with their package names (the first one that comes to mind is the PIL module; in Ubuntu it's provided by the python-imaging package).
Python code can import modules using runtime-constructed strings, so the only surefire way would be to run the code. Real-world example: when you open a database with SQLAlchemy's dbconnect, the library will load one or more db-api modules depending on the content of your database string.
If you're willing to run the code, here is a relatively simple way to do this by examining sys.modules when it finishes:
>>> from sys import modules
>>> import codeofinterest
>>> execute_code_of_interest()
>>> print modules
[ long, list, of, loaded, modules ]
Here, too, you should keep in mind that this could theoretically fail if execute_code_of_interest() modifies sys.modules, but I believe that's quite rare in production code.
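Printing all of sys.modules is noisy, so one refinement is to snapshot it before running the code and report only the difference. A small sketch under that assumption follows; the helper modules_loaded_by and the throwaway codeofinterest_demo module are made up for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def modules_loaded_by(action):
    """Run action() and return the names newly added to sys.modules."""
    before = set(sys.modules)
    action()
    return sorted(set(sys.modules) - before)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "codeofinterest_demo.py").write_text("import json\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        loaded = modules_loaded_by(
            lambda: importlib.import_module("codeofinterest_demo")
        )
    finally:
        sys.path.remove(tmp)

print(loaded)
```

Modules that were already loaded before the snapshot do not show up in the result, so this under-reports dependencies shared with whatever ran earlier in the same process.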
I have a directory, let's call it Storage full of packages with unwieldy names like mypackage-xxyyzzww, and of course Storage is on my PYTHONPATH. Since packages have long unmemorable names, all of the packages are symlinked to friendlier names, such as mypackage.
Now, I don't want to rely on file system symbolic links to do this, instead I tried mucking around with sys.path and sys.modules. Currently I'm doing something like this:
import imp
imp.load_package('mypackage', 'Storage/mypackage-xxyyzzww')
How bad is it to do things this way, and is there a chance this will break in the future? One funny thing is that there's even no mention of imp.load_package function in the docs.
EDIT: besides not relying on symbolic links, I can't use PYTHONPATH variable anymore.
Instead of using imp, you can assign different names to imported modules.
import mypackage_xxyyzzww as mypackage
If you then create an __init__.py file inside of Storage, you can add several of the above lines to make importing easier.
Storage/__init__.py:
import mypackage_xxyyzzww as mypackage
import otherpackage_xxyyzzww as otherpackage
Interpreter:
>>> from Storage import mypackage, otherpackage
importlib may be more appropriate, as it uses/implements the PEP302 mechanism.
Follow the DictImporter example, but override find_module to find the real filename and store it in the dict, then override load_module to get the code from the found file.
You shouldn't need to use sys.path once you've created your Storage module
#from importlib import abc
import imp
import os
import sys
import logging
logging.basicConfig(level=logging.DEBUG)
dprint = logging.debug
class MyImporter(object):
    def __init__(self,path):
        self.path=path
        self.names = {}
    def find_module(self,fullname,path=None):
        dprint("find_module({fullname},{path})".format(**locals()))
        ml = imp.find_module(fullname,path)
        dprint(repr(ml))
        raise ImportError
    def load_module(self,fullname):
        dprint("load_module({fullname})".format(**locals()))
        return imp.load_module(fullname)
        raise ImportError
def load_storage( path, modname=None ):
    if modname is None:
        modname = os.path.basename(path)
    mod = imp.new_module(modname)
    sys.modules[modname] = mod
    assert mod.__name__== modname
    mod.__path__=[path]
    #sys.meta_path.append(MyImporter(path))
    mod.__loader__= MyImporter(path)
    return mod
if __name__=="__main__":
    load_storage("arbitrary-path-to-code/Storage")
    from Storage import plain
    from Storage import mypkg
Then when you import Storage.mypackage, python will immediately use your importer without bothering to look on sys.path
That doesn't work. The code above does work to import ordinary modules under Storage without requiring Storage to be on sys.path, but both 3.1 and 2.6 seem to ignore the loader attribute mentioned in PEP302.
If I uncomment the sys.meta_path line, 3.1 dies with StackOverflow, and 2.6 dies with ImportError. hmmm... I'm out of time now, but may look at it later.
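For readers on current Python 3, where the imp module is deprecated (and removed in 3.12), a package directory with an unimportable name can be loaded under a friendlier alias with importlib.util. This is a sketch of that approach, not the code from the answer above: the helper name load_package_as is made up, and the temporary Storage/mypackage-xxyyzzww directory only imitates the layout from the question.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_package_as(alias, package_dir):
    """Load the package in package_dir and register it in sys.modules as alias."""
    package_dir = Path(package_dir)
    spec = importlib.util.spec_from_file_location(
        alias,
        str(package_dir / "__init__.py"),
        submodule_search_locations=[str(package_dir)],
    )
    module = importlib.util.module_from_spec(spec)
    sys.modules[alias] = module  # register first so intra-package imports resolve
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    real_dir = Path(tmp, "Storage", "mypackage-xxyyzzww")
    real_dir.mkdir(parents=True)
    (real_dir / "__init__.py").write_text("VALUE = 42\n")
    (real_dir / "helper.py").write_text("def greet():\n    return 'hello'\n")

    mypackage = load_package_as("mypackage", real_dir)
    from mypackage import helper

    greeting = helper.greet()

print(mypackage.VALUE, helper.__name__, greeting)
```

Neither symlinks nor PYTHONPATH are involved: the alias lives only in sys.modules, and submodules are found through the package's __path__, which submodule_search_locations sets.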
Packages are just entries in the namespace. You should not name your path components with anything that is not a legal python variable name.