Given only a source file, what files does it import? - python

I'm building a dependency graph in Python 3 using the ast module. How do I know what file(s) will be imported if a given import statement were executed?

Not a complete answer, but here are some bits you should be aware of:
Imports might happen in conditional or try/except blocks, so depending on, say, the value of an environment variable, module A might or might not import module B.
There's a wide variety of import syntax: import A, from A import B, from A import *, from . import A, from .. import A, from ..A import B as well as their versions with A replaced with sub-modules.
Imports can happen in any executable context - the top level of the file, inside a function, inside a class definition, etc.
eval can evaluate code with imports. It's up to you whether you consider such code a dependency.
The standard library modulefinder module might help.
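To make those caveats concrete, here is a minimal static sketch using only the ast module from the question. It just collects names; mapping names to actual files (e.g. with importlib.util.find_spec) would be a separate step:

import ast

def imported_modules(source):
    # Collect module names from import statements anywhere in the file,
    # including inside functions, classes and conditional blocks.
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.level > 0 marks a relative import (from . import x);
            # node.module is None for a bare "from . import x"
            modules.add('.' * node.level + (node.module or ''))
    return modules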

As suggested in a comment: the other answers are valid, but one of the fundamental problems is that they only work for 'simple' scripts or files. A lot of more complex code uses things like dynamic imports; consider the following:

import importlib

path, task_name = "module.function".rsplit(".", 1)
module = importlib.import_module(path)
real_func = getattr(module, task_name)
real_func()
The original string could be obfuscated, pulled from a database, read from a file, or...
There are alternatives to importlib, but this is on top of the exec-style code you might see in @horia's good answer.
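Static analysis can still catch the easy cases of this pattern. A best-effort sketch - it only sees string literals passed directly to importlib.import_module; anything computed at runtime stays invisible:

import ast

def literal_dynamic_imports(source):
    # Find importlib.import_module(...) calls whose first argument
    # is a string literal, and return those module names.
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == 'import_module'
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == 'importlib'
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append(node.args[0].value)
    return found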

Related

How to replace from x import * to import x in python code? [duplicate]

I was recently tasked with maintaining a bunch of code that uses from module import * fairly heavily.
This codebase has gotten big enough that import conflicts/naming ambiguity/"where the heck did this function come from, there are like eight imported modules that have one with the same name?!"ism have become more and more common.
Moving forward, I've been using explicit members (i.e. import module ... module.object.function()) to make the maintenance work I do more readable.
But I was wondering: is there an IDE or utility which robustly parses Python code and refactors * import statements into module import statements, and then prepends the full module path onto all references to members of that module?
We're not using metaprogramming/reflection/inspect/monkeypatching heavily, so if the aforementioned IDE/utility behaves poorly with such things, that is OK.
Not a perfect solution, but what I usually do is this:
Open PyDev
Remove all * imports
Use the optimize imports command (ctrl+shift+o) to re-add all the imports
Roughly solves the problem :)
If you want to build a solution yourself, try http://docs.python.org/library/modulefinder.html
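As a rough sketch of what modulefinder gives you ('myscript.py' is a placeholder path):

from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script('myscript.py')  # placeholder path
for name, mod in finder.modules.items():
    # mod.__file__ is None for built-in modules
    print(name, mod.__file__)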
Here are the other related tools mentioned:
working with AST directly, which is very low-level for your use.
working with modulefinder which may have a lot of the boilerplate code you are looking for,
rope, a refactoring library (@Lucas Graf),
the Bicycle Repair Man, a refactoring library
the logilab-astng library used in pylint
More about pylint
pylint is a very good tool built on top of ast that is already able to tell you where in your code there are from somemodule import * statements, as well as telling you which imports are not necessary.
example:
# next is what's on line 32
from re import *
this will complain:
W: 32,0: Wildcard import re
W: 32,0: Unused import finditer from wildcard import
W: 32,0: Unused import LOCALE from wildcard import
... # this is a long list ...
Towards a solution?
Note that in the above output pylint gives you the line numbers. It might take some effort, but a refactoring tool can look at those particular warnings, get the line number, import the module, and look at the __all__ list, or use a sandboxed execfile() to see the module's global names (would modulefinder help with that? maybe...). With the list of global names from __all__ and the names that pylint complains about, you can build two set()s and take the difference, then replace the line featuring the wildcard import with specific imports.
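For the __all__ step, a sketch like this computes the set of names a wildcard import would bind, without needing execfile() (importing the module still executes it, so sandbox accordingly):

import importlib

def star_import_names(modname):
    # The names that `from modname import *` would bind.
    mod = importlib.import_module(modname)
    if hasattr(mod, '__all__'):
        return set(mod.__all__)
    return {name for name in dir(mod) if not name.startswith('_')}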
I wrote some refactoring tools to do just that. Star Namer will go through all of your wildcard * imports for a script and replace them with the actual functions to be imported.
Usage: ./star_namer.py module_filename script_filename
Once you've converted all of your star imports to actual names you can use from_to_import.py to fix them. This is how it works:
Running your script through pylint and counting up all of the currently undefined words.
Removing all of the from modname import lines from the script.
Running the script through pylint again and comparing the difference in undefined words.
Going through the JSON output of pylint (in reverse order), it determines the exact position of each replacement to be made and inserts the modname. in the correct place.
I thought this approach would be a little more robust, offloading the syntax processing to an advanced utility that's designed for it, instead of trying to grep through all the text myself with regexes.
Usage: from_to_import.py script_name modname
It will show you what changes are to be made before making them. Press y to save. The main issues I've found so far are alignment problems caused by inserting the modname. text, which leaves comments misaligned, and that it doesn't deal well with aliased function names (from ... import quickrun as qrun).
Full documentation here: https://github.com/SurpriseDog/Star-Wrangler
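As a hedged sketch of the pylint half of that workflow ('script.py' is a placeholder), the undefined-name messages can be collected from the JSON output like this:

import json
import subprocess

result = subprocess.run(
    ['pylint', '--output-format=json', 'script.py'],  # placeholder path
    capture_output=True, text=True)
undefined = [msg for msg in json.loads(result.stdout)
             if msg['symbol'] == 'undefined-variable']
# process bottom-up so earlier edits don't shift later line numbers
for msg in sorted(undefined, key=lambda m: (m['line'], m['column']), reverse=True):
    print(msg['line'], msg['column'], msg['message'])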


Python: Importing everything from a python namespace / package

I'd think that this could be answered easily, but it isn't. As long as I've been searching for an answer, I keep thinking that I'm overlooking something simple.
I have a python workspace with the following package structure:
MyTestProject
/src
/TestProjectNamespace
__init__.py
Module_A.py
Module_B.py
SecondTestProject
/src
/SecondTestProjectNamespace
__init__.py
Module_1.py
Module_2.py
...
Module_10.py
Note that TestProjectNamespace has a reference to SecondTestProjectNamespace.
In MyTestProjectNamespace, I need to import everything in SecondTestProjectNamespace. I could import one module at a time with the following statement(s):
from SecondTestProjectNamespace.Module_1 import *
from SecondTestProjectNamespace.Module_2 import *
...but this isn't practical if the SecondTestProject has 50 modules in it.
Does Python support a way to import everything in a namespace / package? Any help would be appreciated.
Thanks in advance.
Yes, you can roll this using pkgutil.
Here's an example that lists all packages under twisted (except tests), and imports them:
import pkgutil
import twisted

for importer, modname, ispkg in pkgutil.walk_packages(
        path=twisted.__path__,
        prefix=twisted.__name__ + '.',
        onerror=lambda x: None):
    # skip tests
    if modname.find('test') > -1:
        continue
    print(modname)
    # gloss over import errors
    try:
        __import__(modname)
    except Exception:
        print('Failed importing', modname)

# show that we actually imported all these, by showing one subpackage is imported
print(twisted.python)
I have to agree with the other posters that star imports are a bad idea.
No. It is possible to set up SecondTestProject to automatically import everything in its submodules, by putting code in __init__.py to do the from ... import * you mention. It's also possible to automate this to some extent using the __import__ function and/or the imp module. But there is no quick and easy way to take a package that isn't set up this way and make it work this way.
It's probably not a good idea anyway. If you have 50 modules, importing everything from all of them into your global namespace is going to cause a proliferation of names, and very likely conflicts among those names.
As others have put it - it might not be a good idea. But there are ways of keeping your namespaces, and therefore avoiding naming conflicts, while still having all the modules/sub-packages in a package available to the package user with a single import.
Let's suppose I have a package named "pack", and within it a module named "a.py" defining a variable "b". All I want to do is:
>>> import pack
>>> pack.a.b
1
One way of doing this is to put in pack/__init__.py a line that says
from . import a (plain import a relied on Python 2's implicit relative imports) - thus in your case you'd need fifty such lines, and you'd have to keep them up to date.
Not that bad.
However, the documentation at http://docs.python.org/tutorial/modules.html#importing-from-a-package says that if you have a list of strings named __all__ in your __init__.py file, all module/sub-package names in that list are imported when one does from pack import *.
That alone would only half-work, though - it would require users of your package to use the not-recommended from pack import * form.
But you can do the "... import *" inside __init__.py itself, after defining the __all__ variable - so all you have to do is keep __all__ up to date:
With SecondTestProjectNamespace/__init__.py being like this:
__all__ = ["Module_1", "Module_2", ...]
from SecondTestProjectNamespace import *
your users would have
SecondTestProjectNamespace.Module_1 (and the others) available upon import of SecondTestProjectNamespace.
And, of course - you could automate the creation of __all__ - it is just a variable, after all - but I would not recommend that.
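For completeness, a sketch of that automation using pkgutil inside the package's own __init__.py (again, not something the paragraph above recommends):

# SecondTestProjectNamespace/__init__.py
import importlib
import pkgutil

__all__ = []
for _finder, _name, _ispkg in pkgutil.iter_modules(__path__):
    importlib.import_module('.' + _name, __name__)
    __all__.append(_name)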
Does Python support a way to import everything in a namespace / package?
No. A package is not a super-module -- it's a collection of modules grouped together.
At least part of the reason is that it's not trivial to determine what 'everything' means inside a folder: there are problems like network drives, soft links, hard links, ...

Self import of subpackages or not?

Suppose you have the following
b
b/__init__.py
b/c
b/c/__init__.py
b/c/d
b/c/d/__init__.py
In some python packages, if you import b, you only get the symbols defined in b. To access b.c, you have to explicitly import b.c or from b import c. In other words, you have to
import b
import b.c
import b.c.d
print(b.c.d)
In other cases I saw an automatic import of all the subpackages. This means that the following code does not produce an error
import b
print(b.c.d)
because b/__init__.py takes care of importing its subpackages.
I tend to prefer the first (explicit better than implicit), and I always used it, but are there cases where the second one is preferred to the first?
I like namespaces -- so I think that import b should only get what's in b itself (presumably in b/__init__.py). If there's a reason to segregate other functionality in b.c, b.c.d, or whatever, then just import b should not drag it all in -- if the "drag it all in" does happen, I think that suggests that the namespace separation was probably a bogus one to start with. Of course, there are examples even in the standard library (import os, then you can use os.path.join and the like), but they're ancient, by now essentially "grandfathered" things from way before the Python packaging system was mature and stable. In new code, I'd strongly recommend that a package should not drag its subpackages along for the ride when you import it. (Do import this at the Python prompt and contemplate the very last line it shows;-).
__all__ = ["your_vars", "functions", "classes"]
Use the syntax above in package b's __init__.py; everything named in that list is then loaded by from b import *. :)
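If you do want the second behaviour, the conventional way is a one-line relative import per level, sketched here:

# b/__init__.py
from . import c   # makes b.c available after a plain `import b`

# b/c/__init__.py
from . import d   # likewise makes b.c.d available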

What are good rules of thumb for Python imports?

I am a little confused by the multitude of ways in which you can import modules in Python.
import X
import X as Y
from A import B
I have been reading up about scoping and namespaces, but I would like some practical advice on what is the best strategy, under which circumstances and why. Should imports happen at a module level or a method/function level? In the __init__.py or in the module code itself?
My question is not really answered by "Python packages - import by class, not file" although it is obviously related.
In production code in our company, we try to follow the following rules.
We place imports at the beginning of the file, right after the main file's docstring, e.g.:
"""
Registry related functionality.
"""
import wx
# ...
Now, if we import a class that is one of few in the imported module, we import the name directly, so that in the code we only have to use the last part, e.g.:
from RegistryController import RegistryController
from ui.windows.lists import ListCtrl, DynamicListCtrl
There are modules, however, that contain dozens of classes, e.g. a list of all possible exceptions. Then we import the module itself and refer to it in the code:
from main.core import Exceptions
# ...
raise Exceptions.FileNotFound()
We use the import X as Y as rarely as possible, because it makes searching for usage of a particular module or class difficult. Sometimes, however, you have to use it if you wish to import two classes that have the same name, but exist in different modules, e.g.:
from Queue import Queue
from main.core.MessageQueue import Queue as MessageQueue
As a general rule, we don't do imports inside methods -- they simply make code slower and less readable. Some may find this a good way to easily resolve cyclic import problems, but a better solution is code reorganization.
Let me just paste part of a conversation from the django-dev mailing list, started by Guido van Rossum:
[...]
For example, it's part of the Google Python style guides[1] that all
imports must import a module, not a class or function from that
module. There are way more classes and functions than there are
modules, so recalling where a particular thing comes from is much
easier if it is prefixed with a module name. Often multiple modules
happen to define things with the same name -- so a reader of the code
doesn't have to go back to the top of the file to see from which
module a given name is imported.
Source: http://groups.google.com/group/django-developers/browse_thread/thread/78975372cdfb7d1a
1: http://code.google.com/p/soc/wiki/PythonStyleGuide#Module_and_package_imports
I would normally use import X on module level. If you only need a single object from a module, use from X import Y.
Only use import X as Y in case you're otherwise confronted with a name clash.
I only use imports on function level to import stuff I need when the module is used as the main module, like:
def main():
    import sys
    if len(sys.argv) > 1:
        pass
HTH
Someone above said that
from X import A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P
is equivalent to
import X
It isn't: import X allows direct access to modifications of A-P through X, while from X import ... binds new names to the current objects. With from X import A..P, you do not see updates if X later rebinds those names; and if you rebind them yourself, you only change your own binding - X does not know about your modifications.
If A-P are functions, you won't know the difference.
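A tiny sketch of the difference, with a hypothetical config module:

# config.py (hypothetical)
value = 1

# main.py
import config
from config import value

config.value = 2      # rebinds the attribute on the module object
print(value)          # 1 -- this name was bound once, at import time
print(config.value)   # 2 -- attribute lookup sees the rebinding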
Others have covered most of the ground here but I just wanted to add one case where I will use import X as Y (temporarily), when I'm trying out a new version of a class or module.
So if we were migrating to a new implementation of a module, but didn't want to cut the whole code base over at once, we might write an xyz_new module and do this in the source files that we had migrated:
import xyz_new as xyz
Then, once we cut over the entire code base, we'd just replace the xyz module with xyz_new and change all of the imports back to
import xyz
DON'T do this:
from X import *
unless you are absolutely sure that you will use each and every thing in that module. And even then, you should probably consider a different approach.
Other than that, it's just a matter of style.
from X import Y
is good and saves you lots of typing. I tend to use that when I'm using something in it fairly frequently. But if you're importing a lot from that module, you could end up with an import statement that looks like this:
from X import A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P
You get the idea. That's when imports like
import X
become useful. Either that or if I'm not really using anything in X very frequently.
I generally try to use the regular import modulename, unless the module name is long, or used often..
For example, I would do..
from BeautifulSoup import BeautifulStoneSoup as BSS
..so I can do soup = BSS(html) instead of BeautifulSoup.BeautifulStoneSoup(html)
Or..
from xmpp import XmppClientBase
..instead of importing the entire of xmpp when I only use the XmppClientBase
Using import x as y is handy if you want to shorten a very long name, or to prevent clobbering an existing import/variable/class/method (something you should try to avoid completely, but it's not always possible).
Say I want to run a main() function from another script, but I already have a main() function..
from my_other_module import main as other_module_main
..wouldn't replace my main function with my_other_module's main
Oh, one thing - don't do from x import * - it makes your code very hard to understand, as you cannot easily see where a method came from (from x import *; from y import *; my_func() - where is my_func defined?)
In all cases, you could just do import modulename and then do modulename.subthing1.subthing2.method("test")...
The from x import y as z stuff is purely for convenience - use it whenever it'll make your code easier to read or write!
When you have a well-written library, which is sometimes the case in Python, you ought to just import it and use it as is. A well-written library tends to take on a life and language of its own, resulting in pleasant-to-read code where you rarely reference the library. When a library is well written, you ought not need renaming or anything else too often.
import gat
node = gat.Node()
child = node.children()
Sometimes it's not possible to write it this way, or you want to lift things out of the library you imported:
from gat import Node, SubNode
node = Node()
child = SubNode(node)
Sometimes you do this for a lot of things; if your import line overflows 80 columns, it's a good idea to do this:
from gat import (
Node, SubNode, TopNode, SuperNode, CoolNode,
PowerNode, UpNode
)
The best strategy is to keep all of these imports at the top of the file, preferably ordered alphabetically, with import statements first, then from ... import statements.
Now I'll tell you why this is the best convention.
Python could perfectly well have had an automatic import that looked among the main imports for a value whenever it couldn't be found in the global namespace. But this is not a good idea, and I'll explain briefly why. Aside from being more complicated to implement than a simple import, it would mean programmers thought less about dependencies, and finding out where something was imported from would have to be done some way other than just looking at the imports.
The need to find out dependencies is one reason people hate "from ... import *". Some bad examples where you need to do this exist, though - for example, OpenGL wrappings.
So the import definitions are actually valuable in defining the dependencies of the program. That is the way you should exploit them. From them you can quickly check where some weird function is imported from.
The import X as Y is useful if you have different implementations of the same module/class.
With some nested try: ... import ... except ImportError: imports, you can hide the implementation from your code. See this lxml etree import example:
try:
    from lxml import etree
    print("running with lxml.etree")
except ImportError:
    try:
        # Python 2.5
        import xml.etree.cElementTree as etree
        print("running with cElementTree on Python 2.5+")
    except ImportError:
        try:
            # Python 2.5
            import xml.etree.ElementTree as etree
            print("running with ElementTree on Python 2.5+")
        except ImportError:
            try:
                # normal cElementTree install
                import cElementTree as etree
                print("running with cElementTree")
            except ImportError:
                try:
                    # normal ElementTree install
                    import elementtree.ElementTree as etree
                    print("running with ElementTree")
                except ImportError:
                    print("Failed to import ElementTree from any known place")
I'm with Jason on not using
from X import *
But in my case (I'm not an expert programmer, so my code doesn't meet the coding style too well) I usually keep a file in my programs with all the constants, like the program version, authors, error messages and all that stuff, so the file is just definitions, and then I do the import:
from const import *
That saves me a lot of time. But it's the only file that has that import, and that's because everything inside that file is just variable declarations.
Doing that kind of import in a file with classes and function definitions might be useful, but when you have to read that code you spend lots of time locating functions and classes.
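A hypothetical const.py in that spirit - nothing but flat names, which is why the wildcard import stays manageable:

# const.py -- constants only, no classes or functions (all values made up)
VERSION = "1.0.0"
AUTHORS = ("Alice", "Bob")
ERR_FILE_NOT_FOUND = "File not found: {}"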
