How can I find out what file is importing a particular file in python?
Consider the following example:
#a.py
import cmn
....
#b.py
import cmn
...
#cmn.py
#Here, I want to know which file (a.py or b.py)
#is importing this one.
#Is it possible to do this?
...
All the files a.py, b.py and cmn.py are in the same directory.
Why do I want to do this?
In C/C++, they have include feature. What i want to do can illuminate by the C/C++ code.
//a.cpp
....
#define SOME_STUFF ....
#include "cmn.h"
//b.cpp
...
#define SOME_STUFF ....
#include "cmn.h"
//cmn.h
//Here, I'll define some functions/classes that will use the symbol define
//in the a.cpp or b.cpp
...
....code refer to the SOME_STUFF.....
In C/C++, we can use this method to reuse sourecode.
Now return to my python code.
When a.py import cmn.py, i hope to run cmn.py and the cmn.py will refer to the symbol defined in the a.py.
When b.py import cmn.py, i hope to run cmn.py and the cmn.py will refer to the symbol defined in the b.py.
The namedtuple code in the collections module has an example of how (and when) to do this:
#cmn.py
import sys
print 'I am being imported by', sys._getframe(1).f_globals.get('__name__')
One limitation of this approach is that the outermost module is always named __main__. If that is the case, the name of the outermost module can be determined from sys.argv[0].
A second limitation is that if the code using sys._getframe is in the module scope it is only executed on the first import of cmn.py. You'd need to call a function of some sort after imports if you want to monitor all imports of the module.
Well, this is a kind of bizarre thing to do. You haven't explained why you want to know what is importing your module, so I can't actually help you solve your problem. You also haven't explained how or when you want to know the importing module.
def who_imports(studied_module):
for loaded_module in sys.modules.values():
for module_attribute in dir(loaded_module):
if getattr(loaded_module, module_attribute) is studied_module:
yield loaded_module
This will give you an iterator over all the modules which use your module as a top-level object. It won't find modules that do from cmn import *, and the list will change over time.
>>> import os
>>> for m in who_imports(os):
... print m.__name__
...
site
__main__
posixpath
genericpath
posixpath
linecache
You'd need to install an import hook that tracks all imports. See PEP 302 and http://docs.python.org/dev/py3k/library/importlib.html. However, as the comments above point out, there is probably a better way to structure your code.
Related
I would like a way to detect if my module was executed directly, as in import module or from module import * rather than by import module.submodule (which also executes module), and have this information accessible in module's __init__.py.
Here is a use case:
In Python, a common idiom is to add import statement in a module's __init__.py file, such as to "flatten" the module's namespace and make its submodules accessible directly. Unfortunately, doing so can make loading a specific submodule very slow, as all other siblings imported in __init__.py will also execute.
For instance:
module/
__init__.py
submodule/
__init__.py
...
sibling/
__init__.py
...
By adding to module/__init__.py:
from .submodule import *
from .sibling import *
It is now possible for users of the module to access definitions in submodules without knowing the details of the package structure (i.e. from module import SomeClass, where SomeClass is defined somewhere in submodule and exposed in its own __init__.py file).
However, if I now run submodule directly (as in import module.submodule, by calling python3 -m module.submodule, or even indirectly via pytest) I will also, unavoidably, execute sibling! If sibling is large, this can slow things down for no reason.
I would instead like to write module/__init__.py something like:
if __???__ == 'module':
from .submodule import *
from .sibling import *
Where __???__ gives me the fully qualified name of the import. Any similar mechanism would also work, although I'm mostly interested in the general case (detecting direct executing) rather than this specific example.
What is being desired is will result in undefined behavior (in the sense whether or not the flattened names be importable from module) when we consider how the import system actually works, if it were actually possible.
Hypothetically, if what you want to achieve is possible, where some __dunder__ that will disambiguate which import statement was used to import module/__init__.py (e.g. import module and from module import *, vs import module.submodule. For the first case, module may trigger the subsequent (slow) import to produce a "flattened" version of the desired imports, while the latter case (import module.submodule) will avoid that and thus module will not contain any assignments of the "flattened" imports.
To illustrate the example a bit more, say one may import SiblingClass from module.sibling.SiblingClass by simply doing from module import SiblingClass as the module/__init__.py file executes from .sibling import * statement to create that binding. But then, if executing import module.submodule resulting in the avoidance of that flatten import, we get the following scenario:
import module.submodule
# module.submodule gets imported
from module import SiblingClass
# ImportError will occur
Why is that? This is simply due to how Python imports a file - the source file is executed in its entirety once to assign imports, function and class declarations to the designated names, and be registered to sys.modules under its import name. Importing the module again will not execute the file again, thus if the from .sibling import * statement was not executed during its initial import (i.e. import module.submodule), it will never be executed again during subsequent import of the same module, as the copy produced by the initial import assigned to its module entry in sys.module is returned (unless the module was reloaded manually, the code for the module will be executed again).
You may verify this fact by putting in a print statement into a file, import the corresponding module to see the output produced, and see that no further output will be produced on subsequent import of that module (related: What happens when a module is imported twice?).
Effectively, the desired functionality as described in the question cannot be implemented in Python.
A related thread on this topic: How to only import sub module without exec __init__.py in the package
This is not a complete solution, but standalone py.test (ignore __init__.py files) proposes setting a global flag to detect when in test. This corrects the problem for tests at least, provided the concerned modules don't call each other.
Having already use flat packages, I was not expecting the issue I encountered with nested packages. Here is…
Directory layout
dir
|
+-- test.py
|
+-- package
|
+-- __init__.py
|
+-- subpackage
|
+-- __init__.py
|
+-- module.py
Content of init.py
Both package/__init__.py and package/subpackage/__init__.py are empty.
Content of module.py
# file `package/subpackage/module.py`
attribute1 = "value 1"
attribute2 = "value 2"
attribute3 = "value 3"
# and as many more as you want...
Content of test.py (3 versions)
Version 1
# file test.py
from package.subpackage.module import *
print attribute1 # OK
That's the bad and unsafe way of importing things (import all in a bulk), but it works.
Version 2
# file test.py
import package.subpackage.module
from package.subpackage import module # Alternative
from module import attribute1
A safer way to import, item by item, but it fails, Python don't want this: fails with the message: "No module named module". However …
# file test.py
import package.subpackage.module
from package.subpackage import module # Alternative
print module # Surprise here
… says <module 'package.subpackage.module' from '...'>. So that's a module, but that's not a module /-P 8-O ... uh
Version 3
# file test.py v3
from package.subpackage.module import attribute1
print attribute1 # OK
This one works. So you are either forced to use the overkill prefix all the time or use the unsafe way as in version #1 and disallowed by Python to use the safe handy way? The better way, which is safe and avoid unecessary long prefix is the only one which Python reject? Is this because it loves import * or because it loves overlong prefixes (which does not help to enforce this practice)?.
Sorry for the hard words, but that's two days I trying to work around this stupid‑like behavior. Unless I was totally wrong somewhere, this will leave me with a feeling something is really broken in Python's model of package and sub‑packages.
Notes
I don't want to rely on sys.path, to avoid global side effects, nor on *.pth files, which are just another way to play with sys.path with the same global effets. For the solution to be clean, it has to be local only. Either Python is able to handle subpackage, either it's not, but it should not require to play with global configuration to be able to handle local stuff.
I also tried use imports in package/subpackage/__init__.py, but it solved nothing, it do the same, and complains subpackage is not a known module, while print subpackage says it's a module (weird behavior, again).
May be I'm entirely wrong tough (the option I would prefer), but this make me feel a lot disappointed about Python.
Any other known way beside of the three I tried? Something I don't know about?
(sigh)
----- %< ----- edit ----- >% -----
Conclusion so far (after people's comments)
There is nothing like real sub‑package in Python, as all package references goes to a global dictionnary, only, which means there's no local dictionary, which implies there's is no way to manage local package reference.
You have to either use full prefix or short prefix or alias. As in:
Full prefix version
from package.subpackage.module import attribute1
# An repeat it again an again
# But after that, you can simply:
use_of (attribute1)
Short prefix version (but repeated prefix)
from package.subpackage import module
# Short but then you have to do:
use_of (module.attribute1)
# and repeat the prefix at every use place
Or else, a variation of the above.
from package.subpackage import module as m
use_of (m.attribute1)
# `m` is a shorter prefix, but you could as well
# define a more meaningful name after the context
Factorized version
If you don't mind about importing multiple entity all at once in a batch, you can:
from package.subpackage.module import attribute1, attribute2
# and etc.
Not in my first favorite taste (I prefer to have one import statement per imported entity), but may be the one I will personally favor.
Update (2012-09-14):
Finally appears to be OK in practice, except with a comment about the layout. Instead of the above, I used:
from package.subpackage.module import (
attribute1,
attribute2,
attribute3,
...) # and etc.
You seem to be misunderstanding how import searches for modules. When you use an import statement it always searches the actual module path (and/or sys.modules); it doesn't make use of module objects in the local namespace that exist because of previous imports. When you do:
import package.subpackage.module
from package.subpackage import module
from module import attribute1
The second line looks for a package called package.subpackage and imports module from that package. This line has no effect on the third line. The third line just looks for a module called module and doesn't find one. It doesn't "re-use" the object called module that you got from the line above.
In other words from someModule import ... doesn't mean "from the module called someModule that I imported earlier..." it means "from the module named someModule that you find on sys.path...". There is no way to "incrementally" build up a module's path by importing the packages that lead to it. You always have to refer to the entire module name when importing.
It's not clear what you're trying to achieve. If you only want to import the particular object attribute1, just do from package.subpackage.module import attribute1 and be done with it. You need never worry about the long package.subpackage.module once you've imported the name you want from it.
If you do want to have access to the module to access other names later, then you can do from package.subpackage import module and, as you've seen you can then do module.attribute1 and so on as much as you like.
If you want both --- that is, if you want attribute1 directly accessible and you want module accessible, just do both of the above:
from package.subpackage import module
from package.subpackage.module import attribute1
attribute1 # works
module.someOtherAttribute # also works
If you don't like typing package.subpackage even twice, you can just manually create a local reference to attribute1:
from package.subpackage import module
attribute1 = module.attribute1
attribute1 # works
module.someOtherAttribute #also works
The reason #2 fails is because sys.modules['module'] does not exist (the import routine has its own scope, and cannot see the module local name), and there's no module module or package on-disk. Note that you can separate multiple imported names by commas.
from package.subpackage.module import attribute1, attribute2, attribute3
Also:
from package.subpackage import module
print module.attribute1
If all you're trying to do is to get attribute1 in your global namespace, version 3 seems just fine. Why is it overkill prefix ?
In version 2, instead of
from module import attribute1
you can do
attribute1 = module.attribute1
I'm building a dependency graph in python3 using the ast module. How do I know what file(s) will be imported if that import statement were to be executed?
Not a complete answer, but here are some bits you should be aware of:
Imports might happen in conditional or try-catch blocks. So depending on a setting of an environment variable, module A might or might not import module B.
There's a wide variety of import syntax: import A, from A import B, from A import *, from . import A, from .. import A, from ..A import B as well as their versions with A replaced with sub-modules.
Imports can happen in any executable context - the top-level of the file, in a function, in a class definition etc.
eval can evaluate code with imports. Up to you if you consider such code to be a dependency.
The standard library modulefinder module might help.
As suggested in a comment: the other answers are valid, but one of the fundamental problems is that your examples only work for 'simple' scripts or files: A lot of more complex code will use things like dynamic imports: consider the following:
path, task_name = "module.function".rsplit(".", 1);
module = importlib.import_module(path);
real_func = getattr(module, task_name);
real_func();
The actual original string could be obfuscated, or pulled from a DB, or a file or...
There are alternatives to importlib, but this is on top of the exec type stuff you might see in #horia's good answer.
I want to make use of an existing python module (called "module.py"). I'm only interested in one function from that module ("my_function()"). The module also contains a lot of other functions, which I'm not using. These other functions cause the module to have a lot of imports that are not used in my_function.
"""module.py"""
import useful_import
import useless_import1
import useless_import2
def my_function():
return useful_import.do()
def other_function1():
return useless_import1.do()
def other_function2():
return useless_import2.do()
The code I've written (main.py) imports only my_function, but it still requires me to include/install the other useless modules. I've checked and none of the useless modules run any code on import, so I should be able to safely remove them.
"""main.py"""
from module import my_function
print my_function()
How do I best deal with this?
Should I included the useless imports in my project anyway?
Should I make a copy of module.py and edit it so that it only contains my_function and the right imports?
Should I copy my_function and its imports into main.py?
(some other option I didn't think/know of)?
It kind of depends on the context, e.g. how will this code be used later, who will maintain it, what kind of code is it realy etc etc.
But my suggestion under most circumstances would be:
refactor my_function and its needed imports into a new_module.py
use this module in main.py
Either remove module.py from your code base, or have it import from new_module
I'm trying to do the following in python 2.6.
my_module.py:-
from another_module import another_factory
def my_factory(name):
pass
another_module.py:-
from my_module import my_factory
def another_factory(name):
pass
Both modules in the same folder.
It gives me the error:
Error: cannot import name my_factory
As seen from the comments, you are trying to do a circle import which is impossible.
If in your module A you try to import something from the module B, and when loading the module B (to satisfy this dependency) you are trying to import something from the module A, you are where you started and you got a circle import: A needs B and B needs A!!, it is somehow like saying that A needs A, which is quite unlogic.
For instance:
# moduleA
from moduleB import functionB
...
So the interpreter tries to load the moduleB, which looks like the following:
# moduleB
from moduleA import functionA
...
And goes back to the moduleA, which tries again to import B, and, etc. Therefore python just raises the error and stops the insanity for a greater good.
Dependencies don't work like this. Define what module needs the other one, and just do a simple import. In your example, it seems that another_module needs my_module, so change my_module and eliminate the dependency on another_module.
If both modules actually need each other, it is a clear sign that they belong to the same logical concept, and should be merged.
PD: in some cases to avoid huge files, you can split a logical unit in two, and to avoid the circle dependencies, you write your imports inside of the functions (which are not executed at load time), so that there is not a circle. This is however in general something to avoid.
The real question is... do you consider each file as a module or are they part of a package ?
Trying to import modules outside a package is sometimes painful. You should rather build a package by simply creating an empty __init__.py module in the directory. Though, if you have
__init__.py
my_module.py
another_module.py
If you have te following function in my_module.py,
def my_factory(x):
return x * x
You should be able to access the my_factory() function from another_module.py by writing this :
from my_module import my_factory
But, if you don't have the __init__.py file/module, the import function will be (somehow) lost and will only use the sys.path for searching other modules. You may then add the following lines (before the import) in the another_module.py file :
sys.path.append(os.path.dirname(os.path.expanduser('.')))
You may also use the various packages available to help importing modules, like imp or import_file (see the documentation). Or you can decide to use load_source (also see the doc : https://docs.python.org/2/library/imp.html)