Is it possible to programmatically detect dependencies given a python project residing in SVN?
Here is a twist which adds some precision, and which might be useful if you find you're frequently checking dependencies of miscellaneous code:
Catches only import statements executed by the code being analyzed.
Automatically excludes all system-loaded modules, so you don't have to weed through it.
Also reports the symbols imported from each module.
Code:
import __builtin__
import collections
import sys
IN_USE = collections.defaultdict(set)
_IMPORT = __builtin__.__import__
def _myimport(name, globs=None, locs=None, fromlist=None, level=-1):
global IN_USE
if fromlist is None:
fromlist = []
IN_USE[name].update(fromlist)
return _IMPORT(name, globs, locs, fromlist, level)
# monkey-patch __import__
setattr(__builtin__, '__import__', _myimport)
# import and run the target project here and run the routine
import foobar
foobar.do_something()
# when it finishes running, dump the imports
print 'modules and symbols imported by "foobar":'
for key in sorted(IN_USE.keys()):
print key
for name in sorted(IN_USE[key]):
print ' ', name
Example foobar module:
import byteplay
import cjson
def _other():
from os import path
from sys import modules
def do_something():
import hashlib
import lxml
_other()
Output:
modules and symbols imported by "foobar":
_hashlib
array
array
byteplay
cStringIO
StringIO
cjson
dis
findlabels
foobar
hashlib
itertools
lxml
opcode
*
__all__
operator
os
path
sys
modules
types
warnings
Absolutely! If you are working from a UNIX or Linux shell, a simple combination of grep and awk would work; basically, all you want to do is search for lines containing the "import" keyword.
However, if you are working from any environment, you could just write a small Python script to do the searching for you (don't forget that strings are treated as immutable sequences, so you can do something like if "import" in line: ....
The one sticky spot, would be associating those imported modules to their package name (the first one that comes to mind is the PIL module, in Ubuntu it's provided by the python-imaging package).
Python code can import modules using runtime-constructed strings, so the only surefire way would be to run the code. Real-world example: when you open a database with SQLAlchemy's dbconnect, the library will load one or more db-api modules depending on the content of your database string.
If you're willing to run the code, here is a relatively simple way to do this by examining sys.modules when it finishes:
>>> from sys import modules
>>> import codeofinterest
>>> execute_code_of_interest()
>>> print modules
[ long, list, of, loaded, modules ]
Here, too, you should keep in mind that this could theoretically fail if execute_code_of_interest() modifies sys.modules, but I believe that's quite rare in production code.
Related
I've been caught out by circular imports on a large project. So I'm seeking to find a way to test my code to see which of the modules in the project (and only in the project) are imported when an import statement is run. This is to inform refactoring and make sure there isn't an import somewhere deep within a package that's causing a problem.
Suppose I import project package 'agent', I want to know which project modules also get imported as a result. For instance if 'environment' and 'policy' are imported due to modules deep within the agent package containing those import statements, then I want to see just those listed. So not numpy modules listed for example as they are outside the project and so not relevant for circular dependencies.
So far I have this:
import sys
import agent # project module
for k, v in sys.modules.items():
print(f"key: {k} value: {v}")
example rows:
key: numpy.random value: <module 'numpy.random' from '/home/robin/Python/anaconda3/envs/rl/lib/python3.9/site-packages/numpy/random/__init__.py'>
key: environment value: <module 'environment' from '/home/robin/Python/Projects/RL_Sutton/Cliff/environment/__init__.py'>
This does return the modules imported both directly and indirectly but also includes a lot else such as all the components of numpy and builtins etc... If I could filter this dictionary that would solve it.
k is a str, v is <class 'module'>.
The module's __str__ method does return the module file path within it so I suppose that could be used but it's not a clean solution. I've tried looking at the documentation for sys.modules and module_type but nothing there gives a way to filter modules to the current project (that I could see).
I tried to modify the solutions for each of these without success:
How to list imported modules?
List imported modules from an imported module in Python 3
ModuleFinder also looked promising but from the limited example I couldn't see how to make path or excludes solve the problem.
Update
I didn't specify this in the original question but I'm importing modules that often look like this:
from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
import environment
import policy
ModuleFinder will find environment and policy even though they won't be imported at runtime and don't matter for cyclic imports. So I adapted the accepted answer below to find only runtime imports.
import agent
import sys
app_dir = '/path/to/projects_folder'
imported_module_names = []
for module_name, mod in sys.modules.items():
file = getattr(mod, '__file__', '')
if str(file).startswith(app_dir) and module_name != '__main__':
imported_module_names.append(module_name)
for module_name in sorted(imported_module_names):
print(module_name)
You can use modulefinder to run a script and inspect the imported modules. These can be filtered by using the __file__ attribute (given that you actually import these modules from the file system; don't worry about the dunder attribute, it's for consistency with the builtin module type):
from modulefinder import ModuleFinder
finder = ModuleFinder()
finder.run_script('test.py')
appdir = '/path/to/project'
modules = {name: mod for name, mod in finder.modules.items()
if mod.__file__ is not None
and mod.__file__.startswith(appdir)}
for name in modules.keys():
print(f"{name}")
You can invoke Python with the -v command line option which will print a message each time a module is initialized.
I simply want to take all my .py files from a single folder (I don't care about the sub-folders for now) and put them into a single module.
The use case I'm having here is that I'm writing some pretty standard object-oriented code and I'm using a single file for every class, and I don't want to have to write from myClass import myClass for every class into my __init__.py. I can't use Python3, so I'm still working with impand reloadand such.
At the moment I'm using
# this is __init__.py
import pkgutil
for loader, name, is_pkg in pkgutil.walk_packages(__path__):
if not is_pkg:
__import__(__name__ + "." + name)
and it doesn't seem to work, it includes the packages but it includes them as modules, so that I have to write MyClass.MyClass for a class that is defined in a file with it's own name. That's silly and I don't like it.
I've been searching forever and I'm just getting more confused how complicated this seemingly standard use case seems to be. Do python devs just write everything into a single file? Or do they always have tons of imports?
Is this something that should be approached in an entirely different way?
What you really want to do
To do the job you need to bind your class names to namespace of your __init__.py script.
After this step you will be able to just from YourPackageName import * and just use your classes directly. Like this:
import YourPackageName
c = YourPackageName.MyClass()
or
from YourPackageName import *
c = MyClass()
Ways to achieve this
You have multiple ways to import modules dynamically: __import__(), __all__.
But.
The only way to bind names into namespace of current module is to use from myClass import myClass statement. Static statement.
In other words, content of each of your __init__.py scripts should be looking like that:
#!/usr/bin/env python
# coding=utf-8
from .MySubPackage import *
from .MyAnotherSubPackage import *
from .my_pretty_class import myPrettyClass
from .my_another_class import myAnotherClass
...
And you should know that even for a dynamic __all__:
It is up to the package author to keep this list up-to-date when a new version of the package is released.
(https://docs.python.org/2/tutorial/modules.html#importing-from-a-package)
So a clear answers to your questions:
Do python devs just write everything into a single file?
No, they don't.
Or do they always have tons of imports?
Almost. But definitely not tons. You need to import each of your modules just once (into an appropriate __init__.py scripts). And then just import whole package or sub-package at once.
Example
Let's assume that there is next package structure:
MyPackage
|---MySubPackage
| |---__init__.py
| |---pretty_class_1.py
| |---pretty_class_2.py
|---__init__.py
|---sleepy_class_1.py
|---sleepy_class_2.py
Content of the MyPackage/MySubPackage/__init__.py:
#!/usr/bin/env python
# coding=utf-8
from .pretty_class_1 import PrettyClass1
from .pretty_class_2 import PrettyClass2
Content of the MyPackage/__init__.py:
#!/usr/bin/env python
# coding=utf-8
from .MySubPackage import *
from .sleepy_class_1 import SleepyClass1
from .sleepy_class_2 import SleepyClass2
As result, now we are able to write next code in our application:
import MyPackage
p = MyPackage.PrettyClass1()
s = MyPackage.SleepyClass2()
or
from MyPackage import *
p = PrettyClass1()
s = SleepyClass2()
I'm building a dependency graph in python3 using the ast module. How do I know what file(s) will be imported if that import statement were to be executed?
Not a complete answer, but here are some bits you should be aware of:
Imports might happen in conditional or try-catch blocks. So depending on a setting of an environment variable, module A might or might not import module B.
There's a wide variety of import syntax: import A, from A import B, from A import *, from . import A, from .. import A, from ..A import B as well as their versions with A replaced with sub-modules.
Imports can happen in any executable context - the top-level of the file, in a function, in a class definition etc.
eval can evaluate code with imports. Up to you if you consider such code to be a dependency.
The standard library modulefinder module might help.
As suggested in a comment: the other answers are valid, but one of the fundamental problems is that your examples only work for 'simple' scripts or files: A lot of more complex code will use things like dynamic imports: consider the following:
path, task_name = "module.function".rsplit(".", 1);
module = importlib.import_module(path);
real_func = getattr(module, task_name);
real_func();
The actual original string could be obfuscated, or pulled from a DB, or a file or...
There are alternatives to importlib, but this is on top of the exec type stuff you might see in #horia's good answer.
in myModule.py I am importing environ from os , like
from os import environ since I am only using environ, but when I do dir(myModule) it shows environ as publicly visible , how ever should it be imported as protected assuming some other project may also have its own environ function ?
If you're doing from os import environ, then you'll reference it as environ.
If you do import os, it's os.environ.
So depending on your needs, the second option might be better. The first will look better and read easier, whereas the second avoids namespace pollution.
Expanding on #mgilson's comment - when you do dir(somemodule), everything you see is namespaced to that module. In other words, you have to use the . (name resolution operator) to "reach" those items.
So, in myModule.py you have the following lines:
from os import environ
a = 4
In some other module, or the Python prompt, you have the following statements:
import myModule
dir(myModule)
Now, in order to get to a or environ that is inside myModule, you'd have to explicitly define its scope:
print(a) # this won't work
print(myModule.a) # this will print 4
In Python as a general rule, there is no explicit hiding/protecting. Python expects its users to be consenting adults and "know what they are doing".
However, developers can control what happens when someone tries to import everything from a module (from myModule import *), but this isn't strictly enforced. You can still get to everything inside myModule by prefixing the module name.
I have a directory, let's call it Storage full of packages with unwieldy names like mypackage-xxyyzzww, and of course Storage is on my PYTHONPATH. Since packages have long unmemorable names, all of the packages are symlinked to friendlier names, such as mypackage.
Now, I don't want to rely on file system symbolic links to do this, instead I tried mucking around with sys.path and sys.modules. Currently I'm doing something like this:
import imp
imp.load_package('mypackage', 'Storage/mypackage-xxyyzzww')
How bad is it to do things this way, and is there a chance this will break in the future? One funny thing is that there's even no mention of imp.load_package function in the docs.
EDIT: besides not relying on symbolic links, I can't use PYTHONPATH variable anymore.
Instead of using imp, you can assign different names to imported modules.
import mypackage_xxyyzzww as mypackage
If you then create a __init__.py file inside of Storage, you can add several of the above lines to make importing easier.
Storage/__init__.py:
import mypackage_xxyyzzww as mypackage
import otherpackage_xxyyzzww as otherpackage
Interpreter:
>>> from Storage import mypackage, otherpackage
importlib may be more appropriate, as it uses/implements the PEP302 mechanism.
Follow the DictImporter example, but override find_module to find the real filename and store it in the dict, then override load_module to get the code from the found file.
You shouldn't need to use sys.path once you've created your Storage module
#from importlib import abc
import imp
import os
import sys
import logging
logging.basicConfig(level=logging.DEBUG)
dprint = logging.debug
class MyImporter(object):
def __init__(self,path):
self.path=path
self.names = {}
def find_module(self,fullname,path=None):
dprint("find_module({fullname},{path})".format(**locals()))
ml = imp.find_module(fullname,path)
dprint(repr(ml))
raise ImportError
def load_module(self,fullname):
dprint("load_module({fullname})".format(**locals()))
return imp.load_module(fullname)
raise ImportError
def load_storage( path, modname=None ):
if modname is None:
modname = os.path.basename(path)
mod = imp.new_module(modname)
sys.modules[modname] = mod
assert mod.__name__== modname
mod.__path__=[path]
#sys.meta_path.append(MyImporter(path))
mod.__loader__= MyImporter(path)
return mod
if __name__=="__main__":
load_storage("arbitrary-path-to-code/Storage")
from Storage import plain
from Storage import mypkg
Then when you import Storage.mypackage, python will immediately use your importer without bothering to look on sys.path
That doesn't work. The code above does work to import ordinary modules under Storage without requiring Storage to be on sys.path, but both 3.1 and 2.6 seem to ignore the loader attribute mentioned in PEP302.
If I uncomment the sys.meta_path line, 3.1 dies with StackOverflow, and 2.6 dies with ImportError. hmmm... I'm out of time now, but may look at it later.
Packages are just entries in the namespace. You should not name your path components with anything that is not a legal python variable name.