Related
PEP 8 states:
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
However if the class/method/function that I am importing is only used in rare cases, surely it is more efficient to do the import when it is needed?
Isn't this:
class SomeClass(object):
def not_often_called(self)
from datetime import datetime
self.datetime = datetime.now()
more efficient than this?
from datetime import datetime
class SomeClass(object):
def not_often_called(self)
self.datetime = datetime.now()
Module importing is quite fast, but not instant. This means that:
Putting the imports at the top of the module is fine, because it's a trivial cost that's only paid once.
Putting the imports within a function will cause calls to that function to take longer.
So if you care about efficiency, put the imports at the top. Only move them into a function if your profiling shows that would help (you did profile to see where best to improve performance, right??)
The best reasons I've seen to perform lazy imports are:
Optional library support. If your code has multiple paths that use different libraries, don't break if an optional library is not installed.
In the __init__.py of a plugin, which might be imported but not actually used. Examples are Bazaar plugins, which use bzrlib's lazy-loading framework.
Putting the import statement inside of a function can prevent circular dependencies.
For example, if you have 2 modules, X.py and Y.py, and they both need to import each other, this will cause a circular dependency when you import one of the modules causing an infinite loop. If you move the import statement in one of the modules then it won't try to import the other module till the function is called, and that module will already be imported, so no infinite loop. Read here for more - effbot.org/zone/import-confusion.htm
I have adopted the practice of putting all imports in the functions that use them, rather than at the top of the module.
The benefit I get is the ability to refactor more reliably. When I move a function from one module to another, I know that the function will continue to work with all of its legacy of testing intact. If I have my imports at the top of the module, when I move a function, I find that I end up spending a lot of time getting the new module's imports complete and minimal. A refactoring IDE might make this irrelevant.
There is a speed penalty as mentioned elsewhere. I have measured this in my application and found it to be insignificant for my purposes.
It is also nice to be able to see all module dependencies up front without resorting to search (e.g. grep). However, the reason I care about module dependencies is generally because I'm installing, refactoring, or moving an entire system comprising multiple files, not just a single module. In that case, I'm going to perform a global search anyway to make sure I have the system-level dependencies. So I have not found global imports to aid my understanding of a system in practice.
I usually put the import of sys inside the if __name__=='__main__' check and then pass arguments (like sys.argv[1:]) to a main() function. This allows me to use main in a context where sys has not been imported.
Most of the time this would be useful for clarity and sensible to do but it's not always the case. Below are a couple of examples of circumstances where module imports might live elsewhere.
Firstly, you could have a module with a unit test of the form:
if __name__ == '__main__':
import foo
aa = foo.xyz() # initiate something for the test
Secondly, you might have a requirement to conditionally import some different module at runtime.
if [condition]:
import foo as plugin_api
else:
import bar as plugin_api
xx = plugin_api.Plugin()
[...]
There are probably other situations where you might place imports in other parts in the code.
The first variant is indeed more efficient than the second when the function is called either zero or one times. With the second and subsequent invocations, however, the "import every call" approach is actually less efficient. See this link for a lazy-loading technique that combines the best of both approaches by doing a "lazy import".
But there are reasons other than efficiency why you might prefer one over the other. One approach is makes it much more clear to someone reading the code as to the dependencies that this module has. They also have very different failure characteristics -- the first will fail at load time if there's no "datetime" module while the second won't fail until the method is called.
Added Note: In IronPython, imports can be quite a bit more expensive than in CPython because the code is basically being compiled as it's being imported.
Curt makes a good point: the second version is clearer and will fail at load time rather than later, and unexpectedly.
Normally I don't worry about the efficiency of loading modules, since it's (a) pretty fast, and (b) mostly only happens at startup.
If you have to load heavyweight modules at unexpected times, it probably makes more sense to load them dynamically with the __import__ function, and be sure to catch ImportError exceptions, and handle them in a reasonable manner.
I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.
In most cases you want to load the modules at the top of the source file. For somebody reading your code, it makes it much easier to tell what function or object came from what module.
One good reason to import a module elsewhere in the code is if it's used in a debugging statement.
For example:
do_something_with_x(x)
I could debug this with:
from pprint import pprint
pprint(x)
do_something_with_x(x)
Of course, the other reason to import modules elsewhere in the code is if you need to dynamically import them. This is because you pretty much don't have any choice.
I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.
Here's an updated summary of the answers to this
and
related
questions.
PEP 8
recommends putting imports at the top.
It's often more convenient to get
ImportErrors
when you first run your program
rather than when your program first calls your function.
Putting imports in the function scope
can help avoid issues with circular imports.
Putting imports in the function scope
helps keep maintain a clean module namespace,
so that it does not appear among tab-completion suggestions.
Start-up time:
imports in a function won't run until (if) that function is called.
Might get significant with heavy-weight libraries.
Even though import statements are super fast on subsequent runs,
they still incur a speed penalty
which can be significant if the function is trivial but frequently in use.
Imports under the __name__ == "__main__" guard seem very reasonable.
Refactoring
might be easier if the imports are located in the function
where they're used (facilitates moving it to another module).
It can also be argued that this is good for readability.
However, most would argue the contrary, i.e.
Imports at the top enhance readability,
since you can see all your dependencies at a glance.
It seems unclear if dynamic or conditional imports favour one style over another.
I was surprised not to see actual cost numbers for the repeated load-checks posted already, although there are many good explanations of what to expect.
If you import at the top, you take the load hit no matter what. That's pretty small, but commonly in the milliseconds, not nanoseconds.
If you import within a function(s), then you only take the hit for loading if and when one of those functions is first called. As many have pointed out, if that doesn't happen at all, you save the load time. But if the function(s) get called a lot, you take a repeated though much smaller hit (for checking that it has been loaded; not for actually re-loading). On the other hand, as #aaronasterling pointed out you also save a little because importing within a function lets the function use slightly-faster local variable lookups to identify the name later (http://stackoverflow.com/questions/477096/python-import-coding-style/4789963#4789963).
Here are the results of a simple test that imports a few things from inside a function. The times reported (in Python 2.7.14 on a 2.3 GHz Intel Core i7) are shown below (the 2nd call taking more than later calls seems consistent, though I don't know why).
0 foo: 14429.0924 µs
1 foo: 63.8962 µs
2 foo: 10.0136 µs
3 foo: 7.1526 µs
4 foo: 7.8678 µs
0 bar: 9.0599 µs
1 bar: 6.9141 µs
2 bar: 7.1526 µs
3 bar: 7.8678 µs
4 bar: 7.1526 µs
The code:
from __future__ import print_function
from time import time
def foo():
import collections
import re
import string
import math
import subprocess
return
def bar():
import collections
import re
import string
import math
import subprocess
return
t0 = time()
for i in xrange(5):
foo()
t1 = time()
print(" %2d foo: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
t0 = t1
for i in xrange(5):
bar()
t1 = time()
print(" %2d bar: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
t0 = t1
It's a tradeoff, that only the programmer can decide to make.
Case 1 saves some memory and startup time by not importing the datetime module (and doing whatever initialization it might require) until needed. Note that doing the import 'only when called' also means doing it 'every time when called', so each call after the first one is still incurring the additional overhead of doing the import.
Case 2 save some execution time and latency by importing datetime beforehand so that not_often_called() will return more quickly when it is called, and also by not incurring the overhead of an import on every call.
Besides efficiency, it's easier to see module dependencies up front if the import statements are ... up front. Hiding them down in the code can make it more difficult to easily find what modules something depends on.
Personally I generally follow the PEP except for things like unit tests and such that I don't want always loaded because I know they aren't going to be used except for test code.
Here's an example where all the imports are at the very top (this is the only time I've needed to do this). I want to be able to terminate a subprocess on both Un*x and Windows.
import os
# ...
try:
kill = os.kill # will raise AttributeError on Windows
from signal import SIGTERM
def terminate(process):
kill(process.pid, SIGTERM)
except (AttributeError, ImportError):
try:
from win32api import TerminateProcess # use win32api if available
def terminate(process):
TerminateProcess(int(process._handle), -1)
except ImportError:
def terminate(process):
raise NotImplementedError # define a dummy function
(On review: what John Millikin said.)
This is like many other optimizations - you sacrifice some readability for speed. As John mentioned, if you've done your profiling homework and found this to be a significantly useful enough change and you need the extra speed, then go for it. It'd probably be good to put a note up with all the other imports:
from foo import bar
from baz import qux
# Note: datetime is imported in SomeClass below
Module initialization only occurs once - on the first import. If the module in question is from the standard library, then you will likely import it from other modules in your program as well. For a module as prevalent as datetime, it is also likely a dependency for a slew of other standard libraries. The import statement would cost very little then since the module intialization would have happened already. All it is doing at this point is binding the existing module object to the local scope.
Couple that information with the argument for readability and I would say that it is best to have the import statement at module scope.
Just to complete Moe's answer and the original question:
When we have to deal with circular dependences we can do some "tricks". Assuming we're working with modules a.py and b.py that contain x() and b y(), respectively. Then:
We can move one of the from imports at the bottom of the module.
We can move one of the from imports inside the function or method that is actually requiring the import (this isn't always possible, as you may use it from several places).
We can change one of the two from imports to be an import that looks like: import a
So, to conclude. If you aren't dealing with circular dependencies and doing some kind of trick to avoid them, then it's better to put all your imports at the top because of the reasons already explained in other answers to this question. And please, when doing this "tricks" include a comment, it's always welcome! :)
In addition to the excellent answers already given, it's worth noting that the placement of imports is not merely a matter of style. Sometimes a module has implicit dependencies that need to be imported or initialized first, and a top-level import could lead to violations of the required order of execution.
This issue often comes up in Apache Spark's Python API, where you need to initialize the SparkContext before importing any pyspark packages or modules. It's best to place pyspark imports in a scope where the SparkContext is guaranteed to be available.
I do not aspire to provide complete answer, because others have already done this very well. I just want to mention one use case when I find especially useful to import modules inside functions. My application uses python packages and modules stored in certain location as plugins. During application startup, the application walks through all the modules in the location and imports them, then it looks inside the modules and if it finds some mounting points for the plugins (in my case it is a subclass of a certain base class having a unique ID) it registers them. The number of plugins is large (now dozens, but maybe hundreds in the future) and each of them is used quite rarely. Having imports of third party libraries at the top of my plugin modules was a bit penalty during application startup. Especially some thirdparty libraries are heavy to import (e.g. import of plotly even tries to connect to internet and download something which was adding about one second to startup). By optimizing imports (calling them only in the functions where they are used) in the plugins I managed to shrink the startup from 10 seconds to some 2 seconds. That is a big difference for my users.
So my answer is no, do not always put the imports at the top of your modules.
It's interesting that not a single answer mentioned parallel processing so far, where it might be REQUIRED that the imports are in the function, when the serialized function code is what is being pushed around to other cores, e.g. like in the case of ipyparallel.
Readability
In addition to startup performance, there is a readability argument to be made for localizing import statements. For example take python line numbers 1283 through 1296 in my current first python project:
listdata.append(['tk font version', font_version])
listdata.append(['Gtk version', str(Gtk.get_major_version())+"."+
str(Gtk.get_minor_version())+"."+
str(Gtk.get_micro_version())])
import xml.etree.ElementTree as ET
xmltree = ET.parse('/usr/share/gnome/gnome-version.xml')
xmlroot = xmltree.getroot()
result = []
for child in xmlroot:
result.append(child.text)
listdata.append(['Gnome version', result[0]+"."+result[1]+"."+
result[2]+" "+result[3]])
If the import statement was at the top of file I would have to scroll up a long way, or press Home, to find out what ET was. Then I would have to navigate back to line 1283 to continue reading code.
Indeed even if the import statement was at the top of the function (or class) as many would place it, paging up and back down would be required.
Displaying the Gnome version number will rarely be done so the import at top of file introduces unnecessary startup lag.
There can be a performance gain by importing variables/local scoping inside of a function. This depends on the usage of the imported thing inside the function. If you are looping many times and accessing a module global object, importing it as local can help.
test.py
X=10
Y=11
Z=12
def add(i):
i = i + 10
runlocal.py
from test import add, X, Y, Z
def callme():
x=X
y=Y
z=Z
ladd=add
for i in range(100000000):
ladd(i)
x+y+z
callme()
run.py
from test import add, X, Y, Z
def callme():
for i in range(100000000):
add(i)
X+Y+Z
callme()
A time on Linux shows a small gain
/usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python run.py
0:17.80 real, 17.77 user, 0.01 sys
/tmp/test$ /usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python runlocal.py
0:14.23 real, 14.22 user, 0.01 sys
real is wall clock. user is time in program. sys is time for system calls.
https://docs.python.org/3.5/reference/executionmodel.html#resolution-of-names
I would like to mention a usecase of mine, very similar to those mentioned by #John Millikin and #V.K.:
Optional Imports
I do data analysis with Jupyter Notebook, and I use the same IPython notebook as a template for all analyses. In some occasions, I need to import Tensorflow to do some quick model runs, but sometimes I work in places where tensorflow isn't set up / is slow to import. In those cases, I encapsulate my Tensorflow-dependent operations in a helper function, import tensorflow inside that function, and bind it to a button.
This way, I could do "restart-and-run-all" without having to wait for the import, or having to resume the rest of the cells when it fails.
While PEP encourages importing at the top of a module, it isn't an error to import at other levels. That indicates imports should be at the top, however there are exceptions.
It is a micro-optimization to load modules when they are used. Code that is sluggish importing can be optimized later if it makes a sizable difference.
Still, you might introduce flags to conditionally import at as near to the top as possible, allowing a user to use configuration to import the modules they need while still importing everything immediately.
Importing as soon as possible means the program will fail if any imports (or imports of imports) are missing or have syntax errors. If all imports occur at the top of all modules then python works in two steps. Compile. Run.
Built in modules work anywhere they are imported because they are well designed. Modules you write should be the same. Moving around your imports to the top or to their first use can help ensure there are no side effects and the code is injecting dependencies.
Whether you put imports at the top or not, your code should still work when the imports are at the top. So start by importing immediately then optimize as needed.
This is a fascinating discussion. Like many others I had never even considered this topic. I got cornered into having to have the imports in the functions because of wanting to use the Django ORM in one of my libraries. I was having to call django.setup() before importing my model classes and because this was at the top of the file it was being dragged into completely non-Django library code because of the IoC injector construction.
I kind of hacked around a bit and ended up putting the django.setup() in the singleton constructor and the relevant import at the top of each class method. Now this worked fine but made me uneasy because the imports weren't at the top and also I started worrying about the extra time hit of the imports. Then I came here and read with great interest everybody's take on this.
I have a long C++ background and now use Python/Cython. My take on this is that why not put the imports in the function unless it causes you a profiled bottleneck. It's only like declaring space for variables just before you need them. The trouble is I have thousands of lines of code with all the imports at the top! So I think I will do it from now on and change the odd file here and there when I'm passing through and have the time.
I created a module named util that provides classes and functions I often use in Python.
Some of them need imported features. What are the pros and the cons of importing needed things inside class/function definition? Is it better than import at the beginning of a module file? Is it a good idea?
It's the most common style to put every import at the top of the file. PEP 8 recommends it, which is a good reason to do it to start with. But that's not a whim, it has advantages (although not critical enough to make everything else a crime). It allows finding all imports at a glance, as opposed to looking through the whole file. It also ensures everything is imported before any other code (which may depend on some imports) is executed. NameErrors are usually easy to resolve, but they can be annoying.
There's no (significant) namespace pollution to be avoided by keeping the module in a smaller scope, since all you add is the actual module (no, import * doesn't count and probably shouldn't be used anyway). Inside functions, you'd import again on every call (not really harmful since everything is imported once, but uncalled for).
PEP8, the Python style guide, states that:
Imports are always put at the top of
the file, just after any module
comments and docstrings, and before module globals and constants.
Of course this is no hard and fast rule, and imports can go anywhere you want them to. But putting them at the top is the best way to go about it. You can of course import within functions or a class.
But note you cannot do this:
def foo():
from os import *
Because:
SyntaxWarning: import * only allowed at module level
Like flying sheep's answer, I agree that the others are right, but I put imports in other places like in __init__() routines and function calls when I am DEVELOPING code. After my class or function has been tested and proven to work with the import inside of it, I normally give it its own module with the import following PEP8 guidelines. I do this because sometimes I forget to delete imports after refactoring code or removing old code with bad ideas. By keeping the imports inside the class or function under development, I am specifying its dependencies should I want to copy it elsewhere or promote it to its own module...
Only move imports into a local scope, such as inside a function definition, if it’s necessary to solve a problem such as avoiding a circular import or are trying to reduce the initialization time of a module. This technique is especially helpful if many of the imports are unnecessary depending on how the program executes. You may also want to move imports into a function if the modules are only ever used in that function. Note that loading a module the first time may be expensive because of the one time initialization of the module, but loading a module multiple times is virtually free, costing only a couple of dictionary lookups. Even if the module name has gone out of scope, the module is probably available in sys.modules.
https://docs.python.org/3/faq/programming.html#what-are-the-best-practices-for-using-import-in-a-module
I believe that it's best practice (according to some PEP's) that you keep import statements at the beginning of a module. You can add import statements to an __init__.py file, which will import those module to all modules inside the package.
So...it's certainly something you can do the way you're doing it, but it's discouraged and actually unnecessary.
While the other answers are mostly right, there is a reason why python allows this.
It is not smart to import redundant stuff which isn’t needed. So, if you want to e.g. parse XML into an element tree, but don’t want to use the slow builtin XML parser if lxml is available, you would need to check this the moment you need to invoke the parser.
And instead of memorizing the availability of lxml at the beginning, I would prefer to try importing and using lxml, except it’s not there, in which case I’d fallback to the builtin xml module.
This question already has answers here:
Should import statements always be at the top of a module?
(22 answers)
Closed 8 years ago.
What are the pros and cons of importing a Python module and/or function inside of a function, with respect to efficiency of speed and of memory?
Does it re-import every time the function is run, or perhaps just once at the beginning whether or not the function is run?
Does it re-import every time the function is run?
No; or rather, Python modules are essentially cached every time they are imported, so importing a second (or third, or fourth...) time doesn't actually force them to go through the whole import process again. 1
Does it import once at the beginning whether or not the function is run?
No, it is only imported if and when the function is executed. 2, 3
As for the benefits: it depends, I guess. If you may only run a function very rarely and don't need the module imported anywhere else, it may be beneficial to only import it in that function. Or if there is a name clash or other reason you don't want the module or symbols from the module available everywhere, you may only want to import it in a specific function. (Of course, there's always from my_module import my_function as f for those cases.)
In general practice, it's probably not that beneficial. In fact, most Python style guides encourage programmers to place all imports at the beginning of the module file.
The very first time you import goo from anywhere (inside or outside a function), goo.py (or other importable form) is loaded and sys.modules['goo'] is set to the module object thus built. Any future import within the same run of the program (again, whether inside or outside a function) just look up sys.modules['goo'] and bind it to barename goo in the appropriate scope. The dict lookup and name binding are very fast operations.
Assuming the very first import gets totally amortized over the program's run anyway, having the "appropriate scope" be module-level means each use of goo.this, goo.that, etc, is two dict lookups -- one for goo and one for the attribute name. Having it be "function level" pays one extra local-variable setting per run of the function (even faster than the dictionary lookup part!) but saves one dict lookup (exchanging it for a local-variable lookup, blazingly fast) for each goo.this (etc) access, basically halving the time such lookups take.
We're talking about a few nanoseconds one way or another, so it's hardly a worthwhile optimization. The one potentially substantial advantage of having the import within a function is when that function may well not be needed at all in a given run of the program, e.g., that function deals with errors, anomalies, and rare situations in general; if that's the case, any run that does not need the functionality will not even perform the import (and that's a saving of microseconds, not just nanoseconds), only runs that do need the functionality will pay the (modest but measurable) price.
It's still an optimization that's only worthwhile in pretty extreme situations, and there are many others I would consider before trying to squeeze out microseconds in this way.
It imports once when the function executes first time.
Pros:
imports related to the function they're used in
easy to move functions around the package
Cons:
couldn't see what modules this module might depend on
Might I suggest in general that instead of asking, "Will X improve my performance?" you use profiling to determine where your program is actually spending its time and then apply optimizations according to where you'll get the most benefit?
And then you can use profiling to assure that your optimizations have actually benefited you, too.
Importing inside a function will effectively import the module once.. the first time the function is run.
It ought to import just as fast whether you import it at the top, or when the function is run. This isn't generally a good reason to import in a def. Pros? It won't be imported if the function isn't called.. This is actually a reasonable reason if your module only requires the user to have a certain module installed if they use specific functions of yours...
If that's not he reason you're doing this, it's almost certainly a yucky idea.
It imports once when the function is called for the first time.
I could imagine doing it this way if I had a function in an imported module that is used very seldomly and is the only one requiring the import. Looks rather far-fetched, though...
PEP 8 states:
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
However if the class/method/function that I am importing is only used in rare cases, surely it is more efficient to do the import when it is needed?
Isn't this:
class SomeClass(object):
def not_often_called(self)
from datetime import datetime
self.datetime = datetime.now()
more efficient than this?
from datetime import datetime
class SomeClass(object):
def not_often_called(self)
self.datetime = datetime.now()
Module importing is quite fast, but not instant. This means that:
Putting the imports at the top of the module is fine, because it's a trivial cost that's only paid once.
Putting the imports within a function will cause calls to that function to take longer.
So if you care about efficiency, put the imports at the top. Only move them into a function if your profiling shows that would help (you did profile to see where best to improve performance, right??)
The best reasons I've seen to perform lazy imports are:
Optional library support. If your code has multiple paths that use different libraries, don't break if an optional library is not installed.
In the __init__.py of a plugin, which might be imported but not actually used. Examples are Bazaar plugins, which use bzrlib's lazy-loading framework.
Putting the import statement inside of a function can prevent circular dependencies.
For example, if you have 2 modules, X.py and Y.py, and they both need to import each other, this will cause a circular dependency when you import one of the modules causing an infinite loop. If you move the import statement in one of the modules then it won't try to import the other module till the function is called, and that module will already be imported, so no infinite loop. Read here for more - effbot.org/zone/import-confusion.htm
I have adopted the practice of putting all imports in the functions that use them, rather than at the top of the module.
The benefit I get is the ability to refactor more reliably. When I move a function from one module to another, I know that the function will continue to work with all of its legacy of testing intact. If I have my imports at the top of the module, when I move a function, I find that I end up spending a lot of time getting the new module's imports complete and minimal. A refactoring IDE might make this irrelevant.
There is a speed penalty as mentioned elsewhere. I have measured this in my application and found it to be insignificant for my purposes.
It is also nice to be able to see all module dependencies up front without resorting to search (e.g. grep). However, the reason I care about module dependencies is generally because I'm installing, refactoring, or moving an entire system comprising multiple files, not just a single module. In that case, I'm going to perform a global search anyway to make sure I have the system-level dependencies. So I have not found global imports to aid my understanding of a system in practice.
I usually put the import of sys inside the if __name__=='__main__' check and then pass arguments (like sys.argv[1:]) to a main() function. This allows me to use main in a context where sys has not been imported.
Most of the time this would be useful for clarity and sensible to do but it's not always the case. Below are a couple of examples of circumstances where module imports might live elsewhere.
Firstly, you could have a module with a unit test of the form:
if __name__ == '__main__':
import foo
aa = foo.xyz() # initiate something for the test
Secondly, you might have a requirement to conditionally import some different module at runtime.
if [condition]:
import foo as plugin_api
else:
import bar as plugin_api
xx = plugin_api.Plugin()
[...]
There are probably other situations where you might place imports in other parts in the code.
The first variant is indeed more efficient than the second when the function is called either zero or one times. With the second and subsequent invocations, however, the "import every call" approach is actually less efficient. See this link for a lazy-loading technique that combines the best of both approaches by doing a "lazy import".
But there are reasons other than efficiency why you might prefer one over the other. One approach is makes it much more clear to someone reading the code as to the dependencies that this module has. They also have very different failure characteristics -- the first will fail at load time if there's no "datetime" module while the second won't fail until the method is called.
Added Note: In IronPython, imports can be quite a bit more expensive than in CPython because the code is basically being compiled as it's being imported.
Curt makes a good point: the second version is clearer and will fail at load time rather than later, and unexpectedly.
Normally I don't worry about the efficiency of loading modules, since it's (a) pretty fast, and (b) mostly only happens at startup.
If you have to load heavyweight modules at unexpected times, it probably makes more sense to load them dynamically with the __import__ function, and be sure to catch ImportError exceptions, and handle them in a reasonable manner.
I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.
In most cases you want to load the modules at the top of the source file. For somebody reading your code, it makes it much easier to tell what function or object came from what module.
One good reason to import a module elsewhere in the code is if it's used in a debugging statement.
For example:
do_something_with_x(x)
I could debug this with:
from pprint import pprint
pprint(x)
do_something_with_x(x)
Of course, the other reason to import modules elsewhere in the code is if you need to dynamically import them. This is because you pretty much don't have any choice.
I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.
Here's an updated summary of the answers to this
and
related
questions.
PEP 8
recommends putting imports at the top.
It's often more convenient to get
ImportErrors
when you first run your program
rather than when your program first calls your function.
Putting imports in the function scope
can help avoid issues with circular imports.
Putting imports in the function scope
helps keep maintain a clean module namespace,
so that it does not appear among tab-completion suggestions.
Start-up time:
imports in a function won't run until (if) that function is called.
Might get significant with heavy-weight libraries.
Even though import statements are super fast on subsequent runs,
they still incur a speed penalty
which can be significant if the function is trivial but frequently in use.
Imports under the __name__ == "__main__" guard seem very reasonable.
Refactoring
might be easier if the imports are located in the function
where they're used (facilitates moving it to another module).
It can also be argued that this is good for readability.
However, most would argue the contrary, i.e.
Imports at the top enhance readability,
since you can see all your dependencies at a glance.
It seems unclear if dynamic or conditional imports favour one style over another.
I was surprised not to see actual cost numbers for the repeated load-checks posted already, although there are many good explanations of what to expect.
If you import at the top, you take the load hit no matter what. That's pretty small, but commonly in the milliseconds, not nanoseconds.
If you import within a function(s), then you only take the hit for loading if and when one of those functions is first called. As many have pointed out, if that doesn't happen at all, you save the load time. But if the function(s) get called a lot, you take a repeated though much smaller hit (for checking that it has been loaded; not for actually re-loading). On the other hand, as #aaronasterling pointed out you also save a little because importing within a function lets the function use slightly-faster local variable lookups to identify the name later (http://stackoverflow.com/questions/477096/python-import-coding-style/4789963#4789963).
Here are the results of a simple test that imports a few things from inside a function. The times reported (in Python 2.7.14 on a 2.3 GHz Intel Core i7) are shown below (the 2nd call taking more than later calls seems consistent, though I don't know why).
0 foo: 14429.0924 µs
1 foo: 63.8962 µs
2 foo: 10.0136 µs
3 foo: 7.1526 µs
4 foo: 7.8678 µs
0 bar: 9.0599 µs
1 bar: 6.9141 µs
2 bar: 7.1526 µs
3 bar: 7.8678 µs
4 bar: 7.1526 µs
The code:
from __future__ import print_function
from time import time
def foo():
import collections
import re
import string
import math
import subprocess
return
def bar():
import collections
import re
import string
import math
import subprocess
return
t0 = time()
for i in xrange(5):
foo()
t1 = time()
print(" %2d foo: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
t0 = t1
for i in xrange(5):
bar()
t1 = time()
print(" %2d bar: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
t0 = t1
It's a tradeoff, that only the programmer can decide to make.
Case 1 saves some memory and startup time by not importing the datetime module (and doing whatever initialization it might require) until needed. Note that doing the import 'only when called' also means doing it 'every time when called', so each call after the first one is still incurring the additional overhead of doing the import.
Case 2 save some execution time and latency by importing datetime beforehand so that not_often_called() will return more quickly when it is called, and also by not incurring the overhead of an import on every call.
Besides efficiency, it's easier to see module dependencies up front if the import statements are ... up front. Hiding them down in the code can make it more difficult to easily find what modules something depends on.
Personally I generally follow the PEP except for things like unit tests and such that I don't want always loaded because I know they aren't going to be used except for test code.
Here's an example where all the imports are at the very top (this is the only time I've needed to do this). I want to be able to terminate a subprocess on both Un*x and Windows.
import os
# ...
try:
kill = os.kill # will raise AttributeError on Windows
from signal import SIGTERM
def terminate(process):
kill(process.pid, SIGTERM)
except (AttributeError, ImportError):
try:
from win32api import TerminateProcess # use win32api if available
def terminate(process):
TerminateProcess(int(process._handle), -1)
except ImportError:
def terminate(process):
raise NotImplementedError # define a dummy function
(On review: what John Millikin said.)
This is like many other optimizations - you sacrifice some readability for speed. As John mentioned, if you've done your profiling homework and found this to be a significantly useful enough change and you need the extra speed, then go for it. It'd probably be good to put a note up with all the other imports:
from foo import bar
from baz import qux
# Note: datetime is imported in SomeClass below
Module initialization only occurs once - on the first import. If the module in question is from the standard library, then you will likely import it from other modules in your program as well. For a module as prevalent as datetime, it is also likely a dependency for a slew of other standard libraries. The import statement would cost very little then since the module intialization would have happened already. All it is doing at this point is binding the existing module object to the local scope.
Couple that information with the argument for readability and I would say that it is best to have the import statement at module scope.
Just to complete Moe's answer and the original question:
When we have to deal with circular dependences we can do some "tricks". Assuming we're working with modules a.py and b.py that contain x() and b y(), respectively. Then:
We can move one of the from imports at the bottom of the module.
We can move one of the from imports inside the function or method that is actually requiring the import (this isn't always possible, as you may use it from several places).
We can change one of the two from imports to be an import that looks like: import a
So, to conclude. If you aren't dealing with circular dependencies and doing some kind of trick to avoid them, then it's better to put all your imports at the top because of the reasons already explained in other answers to this question. And please, when doing this "tricks" include a comment, it's always welcome! :)
In addition to the excellent answers already given, it's worth noting that the placement of imports is not merely a matter of style. Sometimes a module has implicit dependencies that need to be imported or initialized first, and a top-level import could lead to violations of the required order of execution.
This issue often comes up in Apache Spark's Python API, where you need to initialize the SparkContext before importing any pyspark packages or modules. It's best to place pyspark imports in a scope where the SparkContext is guaranteed to be available.
I do not aspire to provide complete answer, because others have already done this very well. I just want to mention one use case when I find especially useful to import modules inside functions. My application uses python packages and modules stored in certain location as plugins. During application startup, the application walks through all the modules in the location and imports them, then it looks inside the modules and if it finds some mounting points for the plugins (in my case it is a subclass of a certain base class having a unique ID) it registers them. The number of plugins is large (now dozens, but maybe hundreds in the future) and each of them is used quite rarely. Having imports of third party libraries at the top of my plugin modules was a bit penalty during application startup. Especially some thirdparty libraries are heavy to import (e.g. import of plotly even tries to connect to internet and download something which was adding about one second to startup). By optimizing imports (calling them only in the functions where they are used) in the plugins I managed to shrink the startup from 10 seconds to some 2 seconds. That is a big difference for my users.
So my answer is no, do not always put the imports at the top of your modules.
It's interesting that not a single answer mentioned parallel processing so far, where it might be REQUIRED that the imports are in the function, when the serialized function code is what is being pushed around to other cores, e.g. like in the case of ipyparallel.
Readability
In addition to startup performance, there is a readability argument to be made for localizing import statements. For example take python line numbers 1283 through 1296 in my current first python project:
listdata.append(['tk font version', font_version])
listdata.append(['Gtk version', str(Gtk.get_major_version())+"."+
str(Gtk.get_minor_version())+"."+
str(Gtk.get_micro_version())])
import xml.etree.ElementTree as ET
xmltree = ET.parse('/usr/share/gnome/gnome-version.xml')
xmlroot = xmltree.getroot()
result = []
for child in xmlroot:
result.append(child.text)
listdata.append(['Gnome version', result[0]+"."+result[1]+"."+
result[2]+" "+result[3]])
If the import statement was at the top of file I would have to scroll up a long way, or press Home, to find out what ET was. Then I would have to navigate back to line 1283 to continue reading code.
Indeed even if the import statement was at the top of the function (or class) as many would place it, paging up and back down would be required.
Displaying the Gnome version number will rarely be done so the import at top of file introduces unnecessary startup lag.
There can be a performance gain by importing variables/local scoping inside of a function. This depends on the usage of the imported thing inside the function. If you are looping many times and accessing a module global object, importing it as local can help.
test.py
X=10
Y=11
Z=12
def add(i):
i = i + 10
runlocal.py
from test import add, X, Y, Z
def callme():
x=X
y=Y
z=Z
ladd=add
for i in range(100000000):
ladd(i)
x+y+z
callme()
run.py
from test import add, X, Y, Z
def callme():
for i in range(100000000):
add(i)
X+Y+Z
callme()
A time on Linux shows a small gain
/usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python run.py
0:17.80 real, 17.77 user, 0.01 sys
/tmp/test$ /usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python runlocal.py
0:14.23 real, 14.22 user, 0.01 sys
real is wall clock. user is time in program. sys is time for system calls.
https://docs.python.org/3.5/reference/executionmodel.html#resolution-of-names
I would like to mention a usecase of mine, very similar to those mentioned by #John Millikin and #V.K.:
Optional Imports
I do data analysis with Jupyter Notebook, and I use the same IPython notebook as a template for all analyses. In some occasions, I need to import Tensorflow to do some quick model runs, but sometimes I work in places where tensorflow isn't set up / is slow to import. In those cases, I encapsulate my Tensorflow-dependent operations in a helper function, import tensorflow inside that function, and bind it to a button.
This way, I could do "restart-and-run-all" without having to wait for the import, or having to resume the rest of the cells when it fails.
While PEP encourages importing at the top of a module, it isn't an error to import at other levels. That indicates imports should be at the top, however there are exceptions.
It is a micro-optimization to load modules when they are used. Code that is sluggish importing can be optimized later if it makes a sizable difference.
Still, you might introduce flags to conditionally import at as near to the top as possible, allowing a user to use configuration to import the modules they need while still importing everything immediately.
Importing as soon as possible means the program will fail if any imports (or imports of imports) are missing or have syntax errors. If all imports occur at the top of all modules then python works in two steps. Compile. Run.
Built in modules work anywhere they are imported because they are well designed. Modules you write should be the same. Moving around your imports to the top or to their first use can help ensure there are no side effects and the code is injecting dependencies.
Whether you put imports at the top or not, your code should still work when the imports are at the top. So start by importing immediately then optimize as needed.
This is a fascinating discussion. Like many others I had never even considered this topic. I got cornered into having to have the imports in the functions because of wanting to use the Django ORM in one of my libraries. I was having to call django.setup() before importing my model classes and because this was at the top of the file it was being dragged into completely non-Django library code because of the IoC injector construction.
I kind of hacked around a bit and ended up putting the django.setup() in the singleton constructor and the relevant import at the top of each class method. Now this worked fine but made me uneasy because the imports weren't at the top and also I started worrying about the extra time hit of the imports. Then I came here and read with great interest everybody's take on this.
I have a long C++ background and now use Python/Cython. My take on this is that why not put the imports in the function unless it causes you a profiled bottleneck. It's only like declaring space for variables just before you need them. The trouble is I have thousands of lines of code with all the imports at the top! So I think I will do it from now on and change the odd file here and there when I'm passing through and have the time.
I am relatively new to Python having used C# for many years and I'm hoping someone can help me with this question. I have a module called actuators.py that contains a number of classes for defining properties and methods for the servos I use in a robot project. In another module called robot.py, I instantiate my actuators like this:
import actuators as Actuators
myActuators = Actuators.AllActuators()
This allows me to access the properties of my servos as long as I am in the robot.py module. For example, I can write:
print myActuators.HeadTilt.MinPosition
to get the minimum value allowed for the servo that tilts the robot's head. So far so good.
Now I want to access these same values in a separate thread that is defined by a different module called tilt_head.py. I assume I need to import a reference to the robot.py module, but doing this ends up re-executing all the code in robot.py whereas all I really want is a static reference to the myActuators object. And I can't use
from robot.py import myActuators
because myActuators is not a module.
In C#, I would do this using a declaration like this:
public static Actuators myActuators;
which then allows me to reference myActuators in any other file within my project. Is there a way to do something similar in Python? If you need my actual code, I will be happy to post it.
Thanks!
And I can't use
from robot.py import myActuators
because myActuators is not a module.
But myActuators doesn't need to be a module. You can do exactly that. (Though you'll want to use just robot rather than robot.py)
http://docs.python.org/reference/simple_stmts.html#import
As well:
http://docs.python.org/tutorial/modules.html#more-on-modules
"A module can contain executable statements as well as function definitions. These statements are intended to initialize the module. They are executed only the first time the module is imported somewhere."
So using a script that imports robot and then tilt_head, the executable stuff in robot.py will NOT be run multiple times.
Of course, if robot.py is intended to be the main module of the program, then I'd suggest going with A. Levy's answer.
You should be able to simply use:
from robot import myActuators
You can't say "from robot.py" because the name there is a module name, not a file name.
BTW: I'm not sure why you are doing import actuators as Actuators. Why do you want to change the case? Modules are most commonly lowercase.
As others have been saying, you can actually import myActuators from the robot module. To solve your problem of the code in robot being re-executed, the standard Python approach is to wrap the stand-alone code in an if __name__ == '__main__': so that it will only be executed when the module is used as a standalone app, but not when it is imported.
E.G.
# Things that are defined when robot.py is used as a module
# and when used standalone.
import actuators as Actuators
myActuators = Actuators.AllActuators()
if __name__ == '__main__':
# Stuff that you only want executed when running robot.py
Then when you want to use robot.py in another module, you should be able to do this:
from robot import myActuators
EDIT:
Actually, as JAB pointed out, I was a little confused in my explanation of conditional execution. your code won't be re-executed if you reload the module. The module loading machinery handles reloading of modules to make sure that they only get loaded and defined one time. All subsequent loads are references to the previously loaded module. If, however, you do have some steps that you only want to happen when you are running the module as a script, then you can place them in the if block and they won't be executed when you load robots from another module.
I guess it doesn't really solve your problem, though it seems like your problem was really an incorrect import syntax.
I would guess that you can access that object as robot.myActuators because it already exists in the process, inside the robot.py namespace.
Many thanks for all the answers and pointers. As it turns out, my problem was a recursive import that ended up trying to create an object more than once. By sorting out the instance creation a little more logically, I was finally able to make the problem go away.