Pythonic way to import modules in a packaged .py file [duplicate] - python
PEP 8 states:
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
However, if the class/method/function that I am importing is only used in rare cases, surely it is more efficient to do the import only when it is needed?
Isn't this:
class SomeClass(object):
    def not_often_called(self):
        from datetime import datetime
        self.datetime = datetime.now()
more efficient than this?
from datetime import datetime
class SomeClass(object):
    def not_often_called(self):
        self.datetime = datetime.now()
Module importing is quite fast, but not instant. This means that:
Putting the imports at the top of the module is fine, because it's a trivial cost that's only paid once.
Putting the imports within a function will cause calls to that function to take longer.
So if you care about efficiency, put the imports at the top. Only move them into a function if your profiling shows that would help (you did profile to see where best to improve performance, right??)
The best reasons I've seen to perform lazy imports are:
Optional library support. If your code has multiple paths that use different libraries, don't break if an optional library is not installed.
In the __init__.py of a plugin, which might be imported but not actually used. Examples are Bazaar plugins, which use bzrlib's lazy-loading framework.
Putting the import statement inside of a function can prevent circular dependencies.
For example, if you have two modules, X.py and Y.py, and they both need to import each other, this will cause a circular dependency when you import one of the modules, leading to an infinite loop. If you move the import statement into a function in one of the modules, it won't try to import the other module until that function is called, and by then that module will already be imported, so there is no infinite loop. Read here for more - effbot.org/zone/import-confusion.htm
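As a minimal sketch of that fix (module contents invented for illustration), keeping one of the two imports inside a function breaks the cycle:
# x.py
import y

GREETING = "hello"

def call_y():
    return y.provide()

# y.py -- no top-level "import x" here; that would be circular
def provide():
    import x  # deferred: by the time this runs, x is fully initialized
    return x.GREETING.upper()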
I have adopted the practice of putting all imports in the functions that use them, rather than at the top of the module.
The benefit I get is the ability to refactor more reliably. When I move a function from one module to another, I know that the function will continue to work with all of its legacy of testing intact. If I have my imports at the top of the module, when I move a function, I find that I end up spending a lot of time getting the new module's imports complete and minimal. A refactoring IDE might make this irrelevant.
There is a speed penalty as mentioned elsewhere. I have measured this in my application and found it to be insignificant for my purposes.
It is also nice to be able to see all module dependencies up front without resorting to search (e.g. grep). However, the reason I care about module dependencies is generally because I'm installing, refactoring, or moving an entire system comprising multiple files, not just a single module. In that case, I'm going to perform a global search anyway to make sure I have the system-level dependencies. So I have not found global imports to aid my understanding of a system in practice.
I usually put the import of sys inside the if __name__=='__main__' check and then pass arguments (like sys.argv[1:]) to a main() function. This allows me to use main in a context where sys has not been imported.
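A minimal sketch of that pattern (main() here is just a stand-in):
def main(args):
    # main() receives plain arguments, so callers never need sys
    for arg in args:
        print(arg)

if __name__ == '__main__':
    import sys
    main(sys.argv[1:])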
Most of the time, importing at the top is useful for clarity and sensible to do, but it's not always the case. Below are a couple of examples of circumstances where module imports might live elsewhere.
Firstly, you could have a module with a unit test of the form:
if __name__ == '__main__':
    import foo
    aa = foo.xyz()  # initiate something for the test
Secondly, you might have a requirement to conditionally import some different module at runtime.
if [condition]:
    import foo as plugin_api
else:
    import bar as plugin_api
xx = plugin_api.Plugin()
[...]
There are probably other situations where you might place imports in other parts in the code.
The first variant is indeed more efficient than the second when the function is called either zero or one times. With the second and subsequent invocations, however, the "import every call" approach is actually less efficient. See this link for a lazy-loading technique that combines the best of both approaches by doing a "lazy import".
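One rough way to approximate such a lazy import (a sketch, not the linked recipe) is to cache the module at module level on first use:
_datetime = None  # filled in on first call

class SomeClass(object):
    def not_often_called(self):
        # Pay the import cost once; later calls only do a cheap None check.
        global _datetime
        if _datetime is None:
            import datetime
            _datetime = datetime
        self.datetime = _datetime.datetime.now()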
But there are reasons other than efficiency why you might prefer one over the other. One approach makes it much clearer to someone reading the code which dependencies the module has. They also have very different failure characteristics: the second will fail at load time if there's no datetime module, while the first won't fail until the method is called.
Added Note: In IronPython, imports can be quite a bit more expensive than in CPython because the code is basically being compiled as it's being imported.
Curt makes a good point: the second version is clearer and will fail at load time rather than later, and unexpectedly.
Normally I don't worry about the efficiency of loading modules, since it's (a) pretty fast, and (b) mostly only happens at startup.
If you have to load heavyweight modules at unexpected times, it probably makes more sense to load them dynamically with the __import__ function, and be sure to catch ImportError exceptions, and handle them in a reasonable manner.
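For example, a rough sketch of that dynamic-loading approach (the module name and fallback behaviour are only illustrative):
def load_optional(name):
    # Load a heavyweight module by name at runtime; return None if missing.
    try:
        return __import__(name)
    except ImportError:
        return None

plotting = load_optional("matplotlib")
if plotting is None:
    print("plotting disabled: matplotlib not installed")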
I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.
In most cases you want to load the modules at the top of the source file. For somebody reading your code, it makes it much easier to tell what function or object came from what module.
One good reason to import a module elsewhere in the code is if it's used in a debugging statement.
For example:
do_something_with_x(x)
I could debug this with:
from pprint import pprint
pprint(x)
do_something_with_x(x)
Of course, the other reason to import modules elsewhere in the code is if you need to dynamically import them. This is because you pretty much don't have any choice.
Here's an updated summary of the answers to this and related questions.
PEP 8 recommends putting imports at the top.
It's often more convenient to get ImportErrors when you first run your program, rather than when your program first calls your function.
Putting imports in the function scope can help avoid issues with circular imports.
Putting imports in the function scope helps maintain a clean module namespace, so that the name does not appear among tab-completion suggestions.
Start-up time: imports in a function won't run until (if) that function is called. This can become significant with heavyweight libraries.
Even though import statements are very fast on subsequent runs, they still incur a speed penalty, which can be significant if the function is trivial but frequently used.
Imports under the __name__ == "__main__" guard seem very reasonable.
Refactoring might be easier if the imports are located in the function where they're used (this facilitates moving it to another module). It can also be argued that this is good for readability. However, most would argue the contrary: imports at the top enhance readability, since you can see all your dependencies at a glance.
It seems unclear whether dynamic or conditional imports favour one style over the other.
I was surprised not to see actual cost numbers for the repeated load-checks posted already, although there are many good explanations of what to expect.
If you import at the top, you take the load hit no matter what. That's pretty small, but commonly in the milliseconds, not nanoseconds.
If you import within a function (or functions), then you only take the hit for loading if and when one of those functions is first called. As many have pointed out, if that doesn't happen at all, you save the load time. But if the functions get called a lot, you take a repeated though much smaller hit (for checking that the module has been loaded; not for actually re-loading it). On the other hand, as @aaronasterling pointed out, you also save a little because importing within a function lets the function use slightly-faster local variable lookups to identify the name later (http://stackoverflow.com/questions/477096/python-import-coding-style/4789963#4789963).
Here are the results of a simple test that imports a few things from inside a function. The times reported (in Python 2.7.14 on a 2.3 GHz Intel Core i7) are shown below (the 2nd call taking more than later calls seems consistent, though I don't know why).
0 foo: 14429.0924 µs
1 foo: 63.8962 µs
2 foo: 10.0136 µs
3 foo: 7.1526 µs
4 foo: 7.8678 µs
0 bar: 9.0599 µs
1 bar: 6.9141 µs
2 bar: 7.1526 µs
3 bar: 7.8678 µs
4 bar: 7.1526 µs
The code:
from __future__ import print_function
from time import time

def foo():
    import collections
    import re
    import string
    import math
    import subprocess
    return

def bar():
    import collections
    import re
    import string
    import math
    import subprocess
    return

t0 = time()
for i in xrange(5):
    foo()
    t1 = time()
    print(" %2d foo: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
    t0 = t1
for i in xrange(5):
    bar()
    t1 = time()
    print(" %2d bar: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
    t0 = t1
It's a tradeoff that only the programmer can decide to make.
Case 1 saves some memory and startup time by not importing the datetime module (and doing whatever initialization it might require) until needed. Note that doing the import 'only when called' also means doing it 'every time when called', so each call after the first one still incurs the additional overhead of doing the import.
Case 2 saves some execution time and latency by importing datetime beforehand, so that not_often_called() will return more quickly when it is called, and also by not incurring the overhead of an import on every call.
Besides efficiency, it's easier to see module dependencies up front if the import statements are ... up front. Hiding them down in the code can make it more difficult to easily find what modules something depends on.
Personally I generally follow the PEP except for things like unit tests and such that I don't want always loaded because I know they aren't going to be used except for test code.
Here's an example where all the imports are at the very top (this is the only time I've needed to do this). I want to be able to terminate a subprocess on both Un*x and Windows.
import os
# ...
try:
    kill = os.kill  # will raise AttributeError on Windows
    from signal import SIGTERM
    def terminate(process):
        kill(process.pid, SIGTERM)
except (AttributeError, ImportError):
    try:
        from win32api import TerminateProcess  # use win32api if available
        def terminate(process):
            TerminateProcess(int(process._handle), -1)
    except ImportError:
        def terminate(process):
            raise NotImplementedError  # define a dummy function
(On review: what John Millikin said.)
This is like many other optimizations - you sacrifice some readability for speed. As John mentioned, if you've done your profiling homework and found this to be a significantly useful enough change and you need the extra speed, then go for it. It'd probably be good to put a note up with all the other imports:
from foo import bar
from baz import qux
# Note: datetime is imported in SomeClass below
Module initialization only occurs once - on the first import. If the module in question is from the standard library, then you will likely import it from other modules in your program as well. For a module as prevalent as datetime, it is also likely a dependency for a slew of other standard libraries. The import statement would then cost very little, since the module initialization would have happened already. All it is doing at this point is binding the existing module object to the local scope.
Couple that information with the argument for readability and I would say that it is best to have the import statement at module scope.
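That caching is easy to observe directly, as a small sketch:
import sys
import datetime  # first import: runs datetime's one-time initialization

assert 'datetime' in sys.modules  # the module object is now cached

def stamp():
    # This import is now just a sys.modules lookup plus a name binding.
    import datetime
    return datetime.datetime.now()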
Just to complete Moe's answer and the original question:
When we have to deal with circular dependencies, we can use some "tricks". Assume we're working with modules a.py and b.py that contain x() and y(), respectively. Then:
We can move one of the from imports to the bottom of the module.
We can move one of the from imports inside the function or method that actually requires the import (this isn't always possible, as you may use it from several places).
We can change one of the two from imports to a plain import that looks like: import a
So, to conclude: if you aren't dealing with circular dependencies and using some kind of trick to avoid them, it's better to put all your imports at the top, for the reasons already explained in other answers to this question. And please, when using these "tricks", include a comment; it's always welcome! :)
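A small sketch of the third trick (module contents invented for illustration): a plain import defers the attribute lookup to call time, when module a is fully initialized:
# b.py
import a  # NOT "from a import x": a may be only partially initialized here

def y():
    # a.x is looked up at call time, after both modules have finished loading
    return a.x() * 2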
In addition to the excellent answers already given, it's worth noting that the placement of imports is not merely a matter of style. Sometimes a module has implicit dependencies that need to be imported or initialized first, and a top-level import could lead to violations of the required order of execution.
This issue often comes up in Apache Spark's Python API, where you need to initialize the SparkContext before importing any pyspark packages or modules. It's best to place pyspark imports in a scope where the SparkContext is guaranteed to be available.
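As a hedged sketch of that ordering (the helper module is hypothetical; only SparkContext is real pyspark API):
def run_analysis(rows):
    # Create the context first...
    from pyspark import SparkContext
    sc = SparkContext.getOrCreate()
    # ...then import code that assumes an active SparkContext at import time.
    from my_project.spark_helpers import summarize  # hypothetical module
    return summarize(sc.parallelize(rows))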
I do not aspire to provide a complete answer, because others have already done this very well. I just want to mention one use case where I find it especially useful to import modules inside functions. My application uses Python packages and modules stored in a certain location as plugins. During application startup, the application walks through all the modules in that location and imports them; then it looks inside the modules, and if it finds some mounting points for the plugins (in my case it is a subclass of a certain base class having a unique ID) it registers them. The number of plugins is large (now dozens, but maybe hundreds in the future) and each of them is used quite rarely. Having imports of third-party libraries at the top of my plugin modules was a big penalty during application startup. In particular, some third-party libraries are heavy to import (e.g. importing plotly even tries to connect to the internet and download something, which was adding about one second to startup). By optimizing imports (calling them only in the functions where they are used) in the plugins, I managed to shrink the startup from 10 seconds to some 2 seconds. That is a big difference for my users.
So my answer is no, do not always put the imports at the top of your modules.
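A sketch of what such a plugin module can look like (BasePlugin and the registration scheme belong to the application described above, so they are assumed here):
# chart_plugin.py -- discovered and imported at application startup
from my_app.plugins import BasePlugin  # hypothetical host-app base class

class ChartPlugin(BasePlugin):
    plugin_id = "chart"  # unique ID used by the mounting-point scan

    def render(self, data):
        # Heavyweight import deferred to the rare call that needs it,
        # keeping application startup fast.
        import plotly.graph_objects as go
        return go.Figure(data=go.Scatter(y=data))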
It's interesting that not a single answer has mentioned parallel processing so far, where it might be REQUIRED that the imports are in the function, because the serialized function code is what gets pushed around to other cores, e.g. in the case of ipyparallel.
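For example, a hedged sketch with ipyparallel, where the function body (imports included) is what gets shipped to each engine:
def task(n):
    # Imported on the remote engine; a top-level import on the client
    # would not exist in the engine's namespace.
    import math
    return math.factorial(n)

# Usage sketch, assuming a running ipyparallel cluster:
# import ipyparallel as ipp
# view = ipp.Client()[:]
# results = view.map_sync(task, range(10))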
Readability
In addition to startup performance, there is a readability argument to be made for localizing import statements. For example, take lines 1283 through 1296 in my current first Python project:
listdata.append(['tk font version', font_version])
listdata.append(['Gtk version', str(Gtk.get_major_version())+"."+
                 str(Gtk.get_minor_version())+"."+
                 str(Gtk.get_micro_version())])

import xml.etree.ElementTree as ET
xmltree = ET.parse('/usr/share/gnome/gnome-version.xml')
xmlroot = xmltree.getroot()
result = []
for child in xmlroot:
    result.append(child.text)
listdata.append(['Gnome version', result[0]+"."+result[1]+"."+
                 result[2]+" "+result[3]])
If the import statement were at the top of the file, I would have to scroll up a long way, or press Home, to find out what ET was. Then I would have to navigate back to line 1283 to continue reading the code.
Indeed even if the import statement was at the top of the function (or class) as many would place it, paging up and back down would be required.
Displaying the Gnome version number will rarely be done so the import at top of file introduces unnecessary startup lag.
There can be a performance gain from importing names into a function's local scope. Whether it helps depends on how the imported object is used inside the function. If you are looping many times and accessing a module-level global object, binding it to a local name can help.
test.py
X = 10
Y = 11
Z = 12

def add(i):
    i = i + 10
runlocal.py
from test import add, X, Y, Z

def callme():
    x = X
    y = Y
    z = Z
    ladd = add
    for i in range(100000000):
        ladd(i)
        x + y + z

callme()
run.py
from test import add, X, Y, Z

def callme():
    for i in range(100000000):
        add(i)
        X + Y + Z

callme()
Timing on Linux shows a small gain:
/tmp/test$ /usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python run.py
0:17.80 real, 17.77 user, 0.01 sys
/tmp/test$ /usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python runlocal.py
0:14.23 real, 14.22 user, 0.01 sys
real is wall clock. user is time in program. sys is time for system calls.
https://docs.python.org/3.5/reference/executionmodel.html#resolution-of-names
I would like to mention a use case of mine, very similar to those mentioned by @John Millikin and @V.K.:
Optional Imports
I do data analysis with Jupyter Notebook, and I use the same IPython notebook as a template for all analyses. On some occasions, I need to import Tensorflow to do some quick model runs, but sometimes I work in places where tensorflow isn't set up / is slow to import. In those cases, I encapsulate my Tensorflow-dependent operations in a helper function, import tensorflow inside that function, and bind it to a button.
This way, I can do "restart-and-run-all" without having to wait for the import, or having to resume the rest of the cells when it fails.
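A rough sketch of that button pattern, assuming a Jupyter environment with ipywidgets available:
import ipywidgets as widgets

def run_tf_model(_button):
    import tensorflow as tf  # deferred: only paid when the button is clicked
    model_input = tf.constant([1.0, 2.0, 3.0])
    print(tf.reduce_mean(model_input))

button = widgets.Button(description="Run TF model")
button.on_click(run_tf_model)
button  # display the button in the notebook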
While PEP 8 encourages importing at the top of a module, it isn't an error to import at other levels. That suggests imports should usually be at the top, but that there are exceptions.
Loading modules only when they are used is a micro-optimization. Code that is sluggish to import can be optimized later if it makes a sizable difference.
Still, you might introduce flags to import conditionally, as near to the top as possible, allowing a user to use configuration to import only the modules they need, while still importing everything as early as possible.
Importing as soon as possible means the program will fail early if any imports (or imports of imports) are missing or have syntax errors. If all imports occur at the top of all modules, then Python works in two steps: compile, then run.
Built-in modules work wherever they are imported because they are well designed. Modules you write should be the same. Moving your imports to the top, or to their first use, can help ensure there are no side effects and that the code is injecting its dependencies.
Whether you put imports at the top or not, your code should still work when the imports are at the top. So start by importing immediately, then optimize as needed.
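A sketch of that configuration-flag idea (the environment variable is invented for illustration):
import os

# Hypothetical flag: users who never plot skip the heavy import entirely,
# but the decision still happens near the top of the module.
if os.environ.get("MYAPP_PLOTTING") == "1":
    import matplotlib.pyplot as plt
else:
    plt = None

def plot(data):
    if plt is None:
        raise RuntimeError("plotting disabled; set MYAPP_PLOTTING=1")
    plt.plot(data)
    plt.show()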
This is a fascinating discussion. Like many others, I had never even considered this topic. I got cornered into having the imports in the functions because I wanted to use the Django ORM in one of my libraries. I had to call django.setup() before importing my model classes, and because this was at the top of the file, it was being dragged into completely non-Django library code because of the IoC injector construction.
I kind of hacked around a bit and ended up putting the django.setup() call in the singleton constructor and the relevant import at the top of each class method. This worked fine, but it made me uneasy because the imports weren't at the top, and I also started worrying about the extra time hit of the imports. Then I came here and read with great interest everybody's take on this.
I have a long C++ background and now use Python/Cython. My take on this is: why not put the imports in the function, unless doing so causes you a profiled bottleneck? It's only like declaring space for variables just before you need them. The trouble is I have thousands of lines of code with all the imports at the top! So I think I will do it from now on and change the odd file here and there when I'm passing through and have the time.
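A hedged sketch of the workaround described above (the app and model names are invented; only django.setup() is real Django API):
class OrmGateway(object):
    """Singleton-ish wrapper keeping Django out of non-Django callers."""

    def __init__(self):
        # Assumes DJANGO_SETTINGS_MODULE is set in the environment.
        import django
        django.setup()  # must run before any model import

    def open_orders(self):
        # Import deferred until after setup() has configured Django.
        from shop.models import Order  # hypothetical app/model
        return list(Order.objects.filter(status="open"))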
Related
How do I connect different namespaces and modules in Python
I'm developing a physical model using Python. I right now have four files (I suppose I can call them modules, but I really am no Python expert) in one folder. Right now it works just fine, but after reading a bit it turned out that the way I do it could lead to errors when further expanding my model (by using the from ... import * statement).
(1) initialize: Does nothing except call the plot() module with some information about what to plot. Here I would like to set some initial parameters which won't change during the computation and pass them to the compute module. Imports plot.
(2) compute: Contains initial values (which I actually want to set in the initialize module) and all the computation algorithm. During computation, functions stored in the functions module are called a lot. Imports functions.
(3) functions: Contains the actual physical functions.
(4) plot: Contains all the plots. Imports everything from compute (from compute import *).
Originally I only had the compute and functions modules. When I started plotting, however, the compute module became too big (in my eyes), so I wanted to split it. I realized I would need to pass all the variables I calculated in compute (quite a few) as arguments to the plot module. Alternatively, I could import the compute module with a prefix (e.g. import compute as com --> com.variable). I didn't like either option too much. So I decided to use the from compute import * statement, which allows me to use the same variable names as originally defined in compute. This looks a lot cleaner to me and makes the variable names easier to read. The problem is that when importing compute into my plot module (to get all the variable names), the code in compute is executed. But I would like to call compute from my initialize module to set up the initial parameters. However, if I do so, I would need to call compute twice, which takes a lot more time. Besides that, from ... import * apparently is not a good choice. I'm grateful for any suggestions.
On the import statements
The main reason to avoid from my_mod import * statements is that they obscure which variables are imported (and available in the scope after the import) and where the variables come from. For example, if you have:
from mod1 import *
from mod2 import *
...
print(my_var)
...
it is not immediately clear whether my_var comes from mod1 or mod2. Furthermore, say we define my_var in mod1 and then, during development, also define my_var in mod2 (perhaps without even meaning to export it); suddenly my_var is something else, which may lead to weird errors. And if you are using chained imports, it is really hard to keep track of which import gets executed first and which last, so generally you need to make sure you never define variables with clashing names. This problem is somewhat mitigated by modern IDEs, which can point out clashing names (personally, I still think star imports are worth avoiding for that reason alone), however it is still subject to oversight.
Speaking of IDEs: if you know the function you need is in a specific module, and you imported the module as import my_mod, you can type my_mod. and the IDE shows you the options; if you import it via from my_mod import *, it's not so easy. For these reasons it is generally recommended to use a plain import my_mod, or an alias, import my_mod as mm, which have none of these problems. I know you mentioned you don't like this approach, but maybe it is worth reconsidering. Nevertheless, if you still don't like it, there is a middle ground: from my_mod import var_1, var_2, func. Then you don't have to write mm. every time you need the variables. It has the drawback that every time you want to use a new variable defined in my_mod you'll need to add it to the import statement as well; on the bright side, this approach makes sure you don't import clashing variables (while adding a name to the import statement you can cross-check against your other imports) and that you don't end up importing variables you don't want. Nonetheless, if you are building your own small project you might indeed end up using from my_mod import *; just be aware of the drawbacks mentioned above.
On the execution while importing
The fact that some (non-trivial) code gets executed during an import statement suggests bad code design. During the import, the only code that gets "executed" is the code written in the "top scope". That is, if there are function definitions, those functions (naturally) don't get executed, just... defined. However, if you have some "direct" code in the top scope, for example print(42), this code will get executed (and print 42) when the first appropriate import statement is encountered. In most cases the top scope should contain only class/function definitions and occasionally some global variables; all other code should be enclosed in a function, class, or method, exactly so it won't be executed by the import statement. So if you have some code which you wish to execute at a specific moment, wrap that part of the code in a function and call that function (from the other module), rather than relying on the import statement to execute the code.
Note that there are some exceptions to the aforementioned "rule". Most notably, if you want some code to be executed when you run the file via python3 my_file.py, you need to put that code in the top level. Even in this case, however, the code should be put within an if __name__ == '__main__': guard, for exactly this reason: so it is not executed on import (only when "calling the file" directly). Additionally, it is good practice to keep the code within this guard very short (for example, just calling some function).
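Putting those last two points together, a minimal sketch:
# my_file.py
def main():
    print(42)  # "direct" work lives in a function, not the top scope

if __name__ == '__main__':
    # Runs for "python3 my_file.py", but not when my_file is imported.
    main()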
For namespaces, you can check the Python documentation. To install a module, put the command pip install (module name) in a terminal. Example: pip install flask - the flask module is then automatically downloaded to your system.
What is the benefit of putting an import statement inside of a class? [duplicate]
I created a module named util that provides classes and functions I often use in Python. Some of them need imported features. What are the pros and cons of importing needed things inside a class/function definition? Is it better than importing at the beginning of the module file? Is it a good idea?
It's the most common style to put every import at the top of the file. PEP 8 recommends it, which is a good reason to do it to start with. But that's not a whim; it has advantages (although not critical enough to make everything else a crime). It allows finding all imports at a glance, as opposed to looking through the whole file. It also ensures everything is imported before any other code (which may depend on some imports) is executed. NameErrors are usually easy to resolve, but they can be annoying. There's no (significant) namespace pollution to be avoided by keeping the module in a smaller scope, since all you add is the actual module name (no, import * doesn't count, and probably shouldn't be used anyway). Inside functions, you'd import again on every call (not really harmful, since everything is only imported once, but uncalled for).
PEP 8, the Python style guide, states that:
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
Of course this is no hard and fast rule, and imports can go anywhere you want them to. But putting them at the top is the best way to go about it. You can of course import within functions or a class. But note you cannot do this:
def foo():
    from os import *
Because:
SyntaxWarning: import * only allowed at module level
Like flying sheep's answer, I agree that the others are right, but I put imports in other places, like in __init__() routines and function calls, when I am DEVELOPING code. After my class or function has been tested and proven to work with the import inside it, I normally give it its own module with the import following PEP 8 guidelines. I do this because sometimes I forget to delete imports after refactoring code or removing old code with bad ideas. By keeping the imports inside the class or function under development, I am specifying its dependencies, should I want to copy it elsewhere or promote it to its own module.
Only move imports into a local scope, such as inside a function definition, if it's necessary to solve a problem such as avoiding a circular import, or to reduce the initialization time of a module. This technique is especially helpful if many of the imports are unnecessary depending on how the program executes. You may also want to move imports into a function if the modules are only ever used in that function. Note that loading a module the first time may be expensive because of the one-time initialization of the module, but loading a module multiple times is virtually free, costing only a couple of dictionary lookups. Even if the module name has gone out of scope, the module is probably available in sys.modules. https://docs.python.org/3/faq/programming.html#what-are-the-best-practices-for-using-import-in-a-module
I believe it's best practice (according to some PEPs) to keep import statements at the beginning of a module. You can add import statements to an __init__.py file, which will import those modules into all modules inside the package. So... it's certainly something you can do the way you're doing it, but it's discouraged and actually unnecessary.
While the other answers are mostly right, there is a reason why Python allows this. It is not smart to import redundant stuff that isn't needed. So, if you want to e.g. parse XML into an element tree, but don't want to use the slow built-in XML parser if lxml is available, you would need to check this at the moment you need to invoke the parser. And instead of memorizing the availability of lxml at the beginning, I would prefer to try importing and using lxml and, if it's not there, fall back to the built-in xml module.
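A small sketch of that fallback (both parsers expose a compatible parse() for this purpose):
def parse_tree(path):
    try:
        from lxml import etree  # fast parser, if installed
    except ImportError:
        import xml.etree.ElementTree as etree  # stdlib fallback
    return etree.parse(path)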
Most Performant Way To Do Imports
From a performance point of view (time or memory), is it better to do import pandas as pd or from pandas import DataFrame, TimeSeries? Does the best choice depend on how many classes I'm importing from the package? Similarly, I've seen people do things like:
def foo(bar):
    from numpy import array
Why would I ever want to do an import inside a function or method definition? Wouldn't this mean that the import is being performed every time that the function is called? Or is this just to avoid namespace collisions?
This is micro-optimising, and you should not worry about it. Modules are loaded once per Python process. All code that then imports only needs to bind a name to the module or to objects defined in the module. That binding is extremely cheap. Moreover, the top-level code in your module only runs once too, so the binding takes place just once. An import in a function does the binding each time the function is run, but again, this is so cheap as to be negligible. Importing in a function makes a difference for two reasons: it won't put that name in the global namespace for the module (so no namespace pollution), and because the name is now local, using that name is slightly faster than using a global. If you want to improve performance, focus on code that is being repeated many, many times. Importing is not it.
Answering the more general question of when to import: imports are dependencies. They are code that may or may not exist, and that is required for the functioning of the program. It is therefore a very good idea to import that code as soon as possible, to prevent dumb errors from cropping up in the middle of execution. This is particularly true as PyPy becomes more popular, when the import might exist but isn't usable via PyPy. Far better to fail early than potentially hours into the execution of the code. As for import pandas as pd vs from pandas import DataFrame, TimeSeries, this question has multiple concerns (as all questions do), with some far more important than others. There's the question of namespace, there's the question of readability, and there's the question of performance. Performance, as Martijn states, should contribute to about 0.0001% of the decision. Readability should contribute about 90%. Namespace only 10%, as it can be mitigated so easily. Personally, in my opinion, both import X as Y and from X import Y are bad practice, because explicit is better than implicit. You don't want to be on line 2000 trying to remember which package calculate_mean comes from, because it isn't referenced anywhere else in the code. When I first started using numpy I was copy/pasting code from the internet, and couldn't figure out why I couldn't pip install np. This obviously isn't a problem if you have pre-existing knowledge that np is Python for numpy, but it's a stupid and pointless confusion for the three letters it saves. It came from numpy. Use numpy.
There is an advantage of importing a module inside a function that hasn't been mentioned yet: doing so gives you some control over when the module is loaded. In fact, even though @J.J's answer recommends importing all modules as early as possible, this control allows you to postpone loading the module. Why would you want to do that? Well, while it doesn't improve the actual performance of your program, doing so can improve the perceived performance, and by virtue of this, the user experience: In part, users perceive whether your app is fast or slow based on how long it takes to start up. (MSDN: Best practices for your app's startup performance.) Loading every module at the beginning of your main script can take some time. For example, one of my apps uses the Qt framework, Pandas, Numpy, and Matplotlib. If all these modules are imported right at the beginning of the app, the appearance of the user interface is delayed by several seconds. Users don't like to wait, and they are likely to perceive your app as generally slow because of this wait. But if, for example, Matplotlib is imported only from within those functions that are called whenever the user issues a plot command, the startup time is notably reduced. The user doesn't perceive your app as being that sluggish anymore, which may result in a better user experience.
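A sketch of that deferral (the handler name is invented; the pattern is what matters):
def on_plot_command(values):
    # Deferred import: startup stays snappy, and the several-second
    # Matplotlib cost is paid only on the first plot request.
    import matplotlib.pyplot as plt
    plt.plot(values)
    plt.show()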
What benefits or disadvantages would importing a module that contains 'import' commands?
If I were to create a module called, for example, imp_mod.py, and it contained all the (subjectively) relevant modules that I frequently use, would importing this module into my main program allow me access to the imports contained inside imp_mod.py? If so, what disadvantages would this bring? I guess a major advantage would be a reduction of time spent importing, even though it's only a couple of seconds saved...
Yes, it would allow you to access them. If you place these imports in imp_mod.py:
from os import listdir
from collections import defaultdict
from copy import deepcopy
Then, you could do this in another file, say, myfile.py:
import imp_mod
imp_mod.listdir
imp_mod.defaultdict
imp_mod.deepcopy
You're wrong about the reduction of importing time, as what happens is the opposite: Python will need to import imp_mod and then import the other modules afterwards, while that first import would not be needed if you imported these modules in myfile.py itself. If you do the same imports in another file, they will already be in the cache, so virtually no time is spent on the next import. The real disadvantage here is less readability. Whoever looks at imp_mod.listdir, for example, will wonder what the heck this method is and why it has the same name as that os module method. When he has to open imp_mod.py just to find out that it's the same method, well, he probably won't be happy. I wouldn't be.
As lucasnadalutti mentioned, you can access them by importing your module. In terms of advantages, it can make your main program care less about where the imports are coming from if imp_mod handles them all. However, as your program gets more complex and starts to include more namespaces, this approach can get messy. You can handle a bit of this by using __init__.py within directories to manage imports in a similar way, but as things get more complex, personally, I feel it adds more complexity. I'd rather just know where a module came from so I can look it up.