Python: imports at the beginning of the main program & PEP 8

PEP 8 recommends that modules be imported at the beginning of programs.
Now, I feel that importing some of them at the beginning of the main program (i.e., after if __name__ == '__main__') makes sense. For instance, if the main program reads arguments from the command line, I tend to do import sys at the beginning of the main program: this way, sys does not have to be imported when the code is used as a module, since there is, in that case, no need for command-line argument access.
How bad is this infringement of PEP 8? Should I refrain from doing it? Or would it be reasonable to amend PEP 8?
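To make the pattern concrete, here is a minimal sketch of what I mean (main is a hypothetical entry point):

def main(args):
    # Real work lives here, so the file is usable as an importable module.
    print(args)

if __name__ == '__main__':
    import sys  # only needed for command-line use
    main(sys.argv[1:])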

I can't really tell you how bad this is to do.
However, I've greatly improved performance (response time, load) for a web app by importing certain libraries only at the first usage.
BTW, the following is also from PEP 8:
But most importantly: know when to be inconsistent -- sometimes the style guide just doesn't apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask!

In general I don't think there's much harm in late importing for modules that may not be needed.
However sys I would definitely import early, at the top. It's such a common module that it's quite likely you might use sys elsewhere in your script and not notice that it's not always imported. sys is also one of the modules that always gets loaded by Python itself, so you are not saving any module startup time by avoiding the import (not that there's much startup for sys anyway).

I would recommend doing what you feel is most appropriate when PEP 8 says nothing about your concern.

Importing sys doesn't take long enough that I would worry about it. Some modules do take longer, however.
I don't think sys really clogs up the namespace very much. I wouldn't use a variable or class called sys regardless.
If you think having it at the top does more harm than good, by all means do it however you like. PEP 8 is just a guideline, and lots of code you see does not conform to it.

The issue isn't performance.
The issue is clarity.
Your "main" program is only a main program today. Tomorrow, it may be a library included in some higher-level main program. Later, it will be just one module in a bigger package.
Since your "main" program's life may change, you have two responses.
Isolate the "main" things inside if __name__ == "__main__". This is not a grotesque violation of PEP-8. This is a reasonable way to package things.
Try to limit the number of features in your "main" program scripts. Try to keep them down to imports and the if __name__ == "__main__" stuff. If your main script is small, then your import question goes away.
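As an illustrative sketch of that second point (mylib is a hypothetical module holding the real functionality), a main script can shrink to little more than:

import mylib  # hypothetical module with the actual logic

if __name__ == "__main__":
    mylib.run()

If the script stays this small, the question of where to import barely arises.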

Related

python - import at top of file vs inside a function [duplicate]

PEP 8 states:
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
However if the class/method/function that I am importing is only used in rare cases, surely it is more efficient to do the import when it is needed?
Isn't this:
class SomeClass(object):
    def not_often_called(self):
        from datetime import datetime
        self.datetime = datetime.now()
more efficient than this?
from datetime import datetime

class SomeClass(object):
    def not_often_called(self):
        self.datetime = datetime.now()
Module importing is quite fast, but not instant. This means that:
Putting the imports at the top of the module is fine, because it's a trivial cost that's only paid once.
Putting the imports within a function will cause calls to that function to take longer.
So if you care about efficiency, put the imports at the top. Only move them into a function if your profiling shows that would help (you did profile to see where best to improve performance, right??)
The best reasons I've seen to perform lazy imports are:
Optional library support. If your code has multiple paths that use different libraries, don't break if an optional library is not installed.
In the __init__.py of a plugin, which might be imported but not actually used. Examples are Bazaar plugins, which use bzrlib's lazy-loading framework.
Putting the import statement inside of a function can prevent circular dependencies.
For example, if you have 2 modules, X.py and Y.py, and they both need to import each other, importing one of them triggers a circular import (Python's module cache prevents an actual infinite loop, but one module ends up seeing a partially initialized version of the other, and you typically get an ImportError or AttributeError). If you move the import statement into a function in one of the modules, it won't try to import the other module until the function is called, and by then that module will already be imported, so there is no problem. Read here for more - effbot.org/zone/import-confusion.htm
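A minimal sketch of that fix, using two hypothetical modules:

# Y.py
def y_helper():
    return 0

def y_uses_x():
    import X  # deferred: X is fully initialized by the time this is called
    return X.x_helper()

# X.py
import Y  # safe: Y's top level does not import X

def x_helper():
    return Y.y_helper() + 1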
I have adopted the practice of putting all imports in the functions that use them, rather than at the top of the module.
The benefit I get is the ability to refactor more reliably. When I move a function from one module to another, I know that the function will continue to work with all of its legacy of testing intact. If I have my imports at the top of the module, when I move a function, I find that I end up spending a lot of time getting the new module's imports complete and minimal. A refactoring IDE might make this irrelevant.
There is a speed penalty as mentioned elsewhere. I have measured this in my application and found it to be insignificant for my purposes.
It is also nice to be able to see all module dependencies up front without resorting to search (e.g. grep). However, the reason I care about module dependencies is generally because I'm installing, refactoring, or moving an entire system comprising multiple files, not just a single module. In that case, I'm going to perform a global search anyway to make sure I have the system-level dependencies. So I have not found global imports to aid my understanding of a system in practice.
I usually put the import of sys inside the if __name__=='__main__' check and then pass arguments (like sys.argv[1:]) to a main() function. This allows me to use main in a context where sys has not been imported.
Most of the time this would be useful for clarity and sensible to do but it's not always the case. Below are a couple of examples of circumstances where module imports might live elsewhere.
Firstly, you could have a module with a unit test of the form:
if __name__ == '__main__':
    import foo
    aa = foo.xyz()  # initiate something for the test
Secondly, you might have a requirement to conditionally import some different module at runtime.
if [condition]:
    import foo as plugin_api
else:
    import bar as plugin_api
xx = plugin_api.Plugin()
[...]
There are probably other situations where you might place imports in other parts in the code.
The first variant is indeed more efficient than the second when the function is called either zero or one times. With the second and subsequent invocations, however, the "import every call" approach is actually less efficient. See this link for a lazy-loading technique that combines the best of both approaches by doing a "lazy import".
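The link itself isn't preserved here, but one common "lazy import" pattern looks roughly like this (a sketch, not necessarily the linked technique):

_datetime = None  # module-level cache for the lazily imported name

def _get_datetime():
    global _datetime
    if _datetime is None:  # the real import happens at most once
        from datetime import datetime
        _datetime = datetime
    return _datetime

class SomeClass(object):
    def not_often_called(self):
        self.datetime = _get_datetime().now()

The first call pays the import; later calls pay only a cheap None check.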
But there are reasons other than efficiency why you might prefer one over the other. One approach makes it much clearer to someone reading the code what dependencies this module has. They also have very different failure characteristics -- the second will fail at load time if there's no "datetime" module, while the first won't fail until the method is called.
Added Note: In IronPython, imports can be quite a bit more expensive than in CPython because the code is basically being compiled as it's being imported.
Curt makes a good point: the second version is clearer and will fail at load time rather than later, and unexpectedly.
Normally I don't worry about the efficiency of loading modules, since it's (a) pretty fast, and (b) mostly only happens at startup.
If you have to load heavyweight modules at unexpected times, it probably makes more sense to load them dynamically with the __import__ function, and be sure to catch ImportError exceptions, and handle them in a reasonable manner.
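As a hedged sketch of that approach (load_optional is a made-up helper name):

def load_optional(module_name):
    # __import__ takes the module name as a string, unlike the import statement.
    try:
        return __import__(module_name)
    except ImportError:
        return None  # let the caller degrade gracefully

numpy = load_optional('numpy')
if numpy is None:
    print('numpy unavailable; falling back to pure-Python math')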
I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.
In most cases you want to load the modules at the top of the source file. For somebody reading your code, it makes it much easier to tell what function or object came from what module.
One good reason to import a module elsewhere in the code is if it's used in a debugging statement.
For example:
do_something_with_x(x)
I could debug this with:
from pprint import pprint
pprint(x)
do_something_with_x(x)
Of course, the other reason to import modules elsewhere in the code is if you need to dynamically import them. This is because you pretty much don't have any choice.
Here's an updated summary of the answers to this and related questions.
PEP 8 recommends putting imports at the top.
It's often more convenient to get ImportErrors when you first run your program, rather than when your program first calls your function.
Putting imports in the function scope can help avoid issues with circular imports.
Putting imports in the function scope helps maintain a clean module namespace, so that imported names do not appear among tab-completion suggestions.
Start-up time: imports in a function won't run until (if) that function is called. This can become significant with heavyweight libraries.
Even though import statements are very fast on subsequent runs, they still incur a speed penalty, which can be significant if the function is trivial but frequently used.
Imports under the __name__ == "__main__" guard seem very reasonable.
Refactoring might be easier if the imports are located in the function where they're used (this facilitates moving it to another module). It can also be argued that this is good for readability. However, most would argue the contrary: imports at the top enhance readability, since you can see all your dependencies at a glance.
It seems unclear whether dynamic or conditional imports favour one style over the other.
I was surprised not to see actual cost numbers for the repeated load-checks posted already, although there are many good explanations of what to expect.
If you import at the top, you take the load hit no matter what. That's pretty small, but commonly in the milliseconds, not nanoseconds.
If you import within a function(s), then you only take the hit for loading if and when one of those functions is first called. As many have pointed out, if that doesn't happen at all, you save the load time. But if the function(s) get called a lot, you take a repeated though much smaller hit (for checking that it has been loaded; not for actually re-loading). On the other hand, as @aaronasterling pointed out, you also save a little because importing within a function lets the function use slightly-faster local variable lookups to identify the name later (http://stackoverflow.com/questions/477096/python-import-coding-style/4789963#4789963).
Here are the results of a simple test that imports a few things from inside a function. The times reported (in Python 2.7.14 on a 2.3 GHz Intel Core i7) are shown below (the 2nd call taking more than later calls seems consistent, though I don't know why).
0 foo: 14429.0924 µs
1 foo: 63.8962 µs
2 foo: 10.0136 µs
3 foo: 7.1526 µs
4 foo: 7.8678 µs
0 bar: 9.0599 µs
1 bar: 6.9141 µs
2 bar: 7.1526 µs
3 bar: 7.8678 µs
4 bar: 7.1526 µs
The code:
from __future__ import print_function
from time import time

def foo():
    import collections
    import re
    import string
    import math
    import subprocess
    return

def bar():
    import collections
    import re
    import string
    import math
    import subprocess
    return

t0 = time()
for i in xrange(5):
    foo()
    t1 = time()
    print(" %2d foo: %12.4f \xC2\xB5s" % (i, (t1 - t0) * 1E6))
    t0 = t1
for i in xrange(5):
    bar()
    t1 = time()
    print(" %2d bar: %12.4f \xC2\xB5s" % (i, (t1 - t0) * 1E6))
    t0 = t1
It's a tradeoff that only the programmer can decide to make.
Case 1 saves some memory and startup time by not importing the datetime module (and doing whatever initialization it might require) until needed. Note that doing the import 'only when called' also means doing it 'every time when called', so each call after the first one still incurs the additional overhead of doing the import.
Case 2 saves some execution time and latency by importing datetime beforehand, so that not_often_called() will return more quickly when it is called, and also by not incurring the overhead of an import on every call.
Besides efficiency, it's easier to see module dependencies up front if the import statements are ... up front. Hiding them down in the code can make it more difficult to easily find what modules something depends on.
Personally I generally follow the PEP except for things like unit tests and such that I don't want always loaded because I know they aren't going to be used except for test code.
Here's an example where all the imports are at the very top (this is the only time I've needed to do this). I want to be able to terminate a subprocess on both Un*x and Windows.
import os
# ...
try:
    kill = os.kill  # will raise AttributeError on Windows
    from signal import SIGTERM
    def terminate(process):
        kill(process.pid, SIGTERM)
except (AttributeError, ImportError):
    try:
        from win32api import TerminateProcess  # use win32api if available
        def terminate(process):
            TerminateProcess(int(process._handle), -1)
    except ImportError:
        def terminate(process):
            raise NotImplementedError  # define a dummy function
(On review: what John Millikin said.)
This is like many other optimizations - you sacrifice some readability for speed. As John mentioned, if you've done your profiling homework and found this to be a significantly useful enough change and you need the extra speed, then go for it. It'd probably be good to put a note up with all the other imports:
from foo import bar
from baz import qux
# Note: datetime is imported in SomeClass below
Module initialization only occurs once - on the first import. If the module in question is from the standard library, then you will likely import it from other modules in your program as well. For a module as prevalent as datetime, it is also likely a dependency for a slew of other standard libraries. The import statement would then cost very little, since the module initialization would have happened already. All it is doing at this point is binding the existing module object to the local scope.
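A quick demonstration of that caching behaviour:

import sys
import datetime               # first import: runs the module's initialization
assert 'datetime' in sys.modules

def f():
    import datetime           # later imports: just bind the cached module object
    return datetime.datetime.now()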
Couple that information with the argument for readability and I would say that it is best to have the import statement at module scope.
Just to complete Moe's answer and the original question:
When we have to deal with circular dependencies we can do some "tricks". Assuming we're working with modules a.py and b.py that contain x() and y(), respectively. Then:
We can move one of the from imports to the bottom of the module.
We can move one of the from imports inside the function or method that is actually requiring the import (this isn't always possible, as you may use it from several places).
We can change one of the two from imports to be a plain module import that looks like: import a (see the sketch below).
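A sketch of that last trick, with the hypothetical a.py and b.py from above:

# a.py
from b import y

def x():
    return 42

# b.py
import a  # plain module import instead of: from a import x

def y():
    return a.x()  # a.x is looked up at call time, when a is fully loaded

The plain import only binds the module name, so a.x isn't touched until y() actually runs.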
So, to conclude: if you aren't dealing with circular dependencies and using some kind of trick to avoid them, then it's better to put all your imports at the top, for the reasons already explained in other answers to this question. And please, when doing these "tricks", include a comment - it's always welcome! :)
In addition to the excellent answers already given, it's worth noting that the placement of imports is not merely a matter of style. Sometimes a module has implicit dependencies that need to be imported or initialized first, and a top-level import could lead to violations of the required order of execution.
This issue often comes up in Apache Spark's Python API, where you need to initialize the SparkContext before importing any pyspark packages or modules. It's best to place pyspark imports in a scope where the SparkContext is guaranteed to be available.
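A hedged sketch of that pattern (summarize is a made-up name; the point is only that the pyspark import is deferred until a live session exists):

def summarize(spark, rows):
    # Deferred so that merely importing this module never touches pyspark.
    from pyspark.sql import Row
    return spark.createDataFrame([Row(**r) for r in rows]).describe()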
I do not aspire to provide a complete answer, because others have already done this very well. I just want to mention one use case in which I find it especially useful to import modules inside functions. My application uses Python packages and modules stored in a certain location as plugins. During application startup, the application walks through all the modules in that location and imports them; then it looks inside the modules, and if it finds mounting points for the plugins (in my case, a subclass of a certain base class having a unique ID), it registers them. The number of plugins is large (now dozens, but maybe hundreds in the future) and each of them is used quite rarely. Having imports of third-party libraries at the top of my plugin modules imposed a noticeable penalty during application startup. In particular, some third-party libraries are heavy to import (e.g. importing plotly even tries to connect to the internet and download something, which was adding about one second to startup). By optimizing imports (calling them only in the functions where they are used) in the plugins, I managed to shrink the startup from 10 seconds to some 2 seconds. That is a big difference for my users.
So my answer is no, do not always put the imports at the top of your modules.
It's interesting that not a single answer has mentioned parallel processing so far, where it might be REQUIRED that the imports are inside the function, because the serialized function code is what gets shipped to other cores, e.g. as in the case of ipyparallel.
Readability
In addition to startup performance, there is a readability argument for localizing import statements. For example, take Python line numbers 1283 through 1296 in my current first Python project:
listdata.append(['tk font version', font_version])
listdata.append(['Gtk version', str(Gtk.get_major_version())+"."+
                 str(Gtk.get_minor_version())+"."+
                 str(Gtk.get_micro_version())])

import xml.etree.ElementTree as ET
xmltree = ET.parse('/usr/share/gnome/gnome-version.xml')
xmlroot = xmltree.getroot()
result = []
for child in xmlroot:
    result.append(child.text)
listdata.append(['Gnome version', result[0]+"."+result[1]+"."+
                 result[2]+" "+result[3]])
If the import statement was at the top of the file, I would have to scroll up a long way, or press Home, to find out what ET was. Then I would have to navigate back to line 1283 to continue reading the code.
Indeed, even if the import statement was at the top of the function (or class), as many would place it, paging up and back down would be required.
Displaying the Gnome version number will rarely be done, so the import at the top of the file introduces unnecessary startup lag.
There can be a performance gain from binding imported names (or other globals) to local variables inside a function. This depends on how the imported thing is used inside the function. If you are looping many times and accessing a module-global object, binding it to a local can help.
test.py
X = 10
Y = 11
Z = 12
def add(i):
    i = i + 10
runlocal.py
from test import add, X, Y, Z

def callme():
    x = X
    y = Y
    z = Z
    ladd = add
    for i in range(100000000):
        ladd(i)
        x + y + z

callme()
run.py
from test import add, X, Y, Z

def callme():
    for i in range(100000000):
        add(i)
        X + Y + Z

callme()
Timing on Linux shows a small gain:
/usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python run.py
0:17.80 real, 17.77 user, 0.01 sys
/tmp/test$ /usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python runlocal.py
0:14.23 real, 14.22 user, 0.01 sys
real is wall clock. user is time in program. sys is time for system calls.
https://docs.python.org/3.5/reference/executionmodel.html#resolution-of-names
I would like to mention a use case of mine, very similar to those mentioned by @John Millikin and @V.K.:
Optional Imports
I do data analysis with Jupyter Notebook, and I use the same IPython notebook as a template for all analyses. On some occasions, I need to import Tensorflow to do some quick model runs, but sometimes I work in places where Tensorflow isn't set up or is slow to import. In those cases, I encapsulate my Tensorflow-dependent operations in a helper function, import tensorflow inside that function, and bind it to a button.
This way, I could do "restart-and-run-all" without having to wait for the import, or having to resume the rest of the cells when it fails.
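A minimal sketch of that helper-function pattern (run_quick_model and the data argument are illustrative):

def run_quick_model(data):
    import tensorflow as tf  # deferred: only paid when the button is pressed
    # ... build and run a quick model with tf here ...
    return tf.reduce_mean(data)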
While PEP 8 encourages importing at the top of a module, it isn't an error to import at other levels. That indicates imports should be at the top; however, there are exceptions.
Loading modules only when they are used is a micro-optimization. Code that is sluggish because of imports can be optimized later if it makes a sizable difference.
Still, you might introduce flags to conditionally import as near to the top as possible, allowing a user to use configuration to import only the modules they need while still importing everything immediately.
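For example, a hedged sketch of such a flag (the environment variable is made up for illustration):

import os

if os.environ.get('MYAPP_WITH_PLOTTING') == '1':  # hypothetical config flag
    import plotly  # heavy optional dependency, still imported near the top
else:
    plotly = None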
Importing as soon as possible means the program will fail if any imports (or imports of imports) are missing or have syntax errors. If all imports occur at the top of all modules then python works in two steps. Compile. Run.
Built in modules work anywhere they are imported because they are well designed. Modules you write should be the same. Moving around your imports to the top or to their first use can help ensure there are no side effects and the code is injecting dependencies.
Whether you put imports at the top or not, your code should still work when the imports are at the top. So start by importing immediately, then optimize as needed.
This is a fascinating discussion. Like many others I had never even considered this topic. I got cornered into having to have the imports in the functions because of wanting to use the Django ORM in one of my libraries. I was having to call django.setup() before importing my model classes and because this was at the top of the file it was being dragged into completely non-Django library code because of the IoC injector construction.
I kind of hacked around a bit and ended up putting the django.setup() in the singleton constructor and the relevant import at the top of each class method. Now this worked fine but made me uneasy because the imports weren't at the top and also I started worrying about the extra time hit of the imports. Then I came here and read with great interest everybody's take on this.
I have a long C++ background and now use Python/Cython. My take on this is: why not put the imports in the function, unless profiling shows they cause a bottleneck? It's just like declaring space for variables right before you need them. The trouble is I have thousands of lines of code with all the imports at the top! So I think I will do it from now on, and change the odd file here and there when I'm passing through and have the time.

What is the benefit of putting an import statement inside of a class? [duplicate]

I created a module named util that provides classes and functions I often use in Python.
Some of them need imported features. What are the pros and the cons of importing needed things inside class/function definition? Is it better than import at the beginning of a module file? Is it a good idea?
It's the most common style to put every import at the top of the file. PEP 8 recommends it, which is a good reason to do it to start with. But that's not a whim, it has advantages (although not critical enough to make everything else a crime). It allows finding all imports at a glance, as opposed to looking through the whole file. It also ensures everything is imported before any other code (which may depend on some imports) is executed. NameErrors are usually easy to resolve, but they can be annoying.
There's no (significant) namespace pollution to be avoided by keeping the module in a smaller scope, since all you add is the actual module (no, import * doesn't count and probably shouldn't be used anyway). Inside functions, you'd import again on every call (not really harmful since everything is imported once, but uncalled for).
PEP8, the Python style guide, states that:
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
Of course this is no hard and fast rule, and imports can go anywhere you want them to. But putting them at the top is the best way to go about it. You can of course import within functions or a class.
But note you cannot do this:
def foo():
    from os import *
Because:
SyntaxWarning: import * only allowed at module level
(In Python 3 this is a SyntaxError rather than just a warning.)
Like flying sheep's answer, I agree that the others are right, but I put imports in other places like in __init__() routines and function calls when I am DEVELOPING code. After my class or function has been tested and proven to work with the import inside of it, I normally give it its own module with the import following PEP8 guidelines. I do this because sometimes I forget to delete imports after refactoring code or removing old code with bad ideas. By keeping the imports inside the class or function under development, I am specifying its dependencies should I want to copy it elsewhere or promote it to its own module...
Only move imports into a local scope, such as inside a function definition, if it’s necessary to solve a problem such as avoiding a circular import or are trying to reduce the initialization time of a module. This technique is especially helpful if many of the imports are unnecessary depending on how the program executes. You may also want to move imports into a function if the modules are only ever used in that function. Note that loading a module the first time may be expensive because of the one time initialization of the module, but loading a module multiple times is virtually free, costing only a couple of dictionary lookups. Even if the module name has gone out of scope, the module is probably available in sys.modules.
https://docs.python.org/3/faq/programming.html#what-are-the-best-practices-for-using-import-in-a-module
I believe that it's best practice (according to some PEPs) to keep import statements at the beginning of a module. You can add import statements to an __init__.py file, which will import those modules to all modules inside the package.
So...it's certainly something you can do the way you're doing it, but it's discouraged and actually unnecessary.
While the other answers are mostly right, there is a reason why python allows this.
It is not smart to import redundant stuff which isn’t needed. So, if you want to e.g. parse XML into an element tree, but don’t want to use the slow builtin XML parser if lxml is available, you would need to check this the moment you need to invoke the parser.
And instead of memorizing the availability of lxml at the beginning, I would prefer to try importing and using lxml, except it’s not there, in which case I’d fallback to the builtin xml module.
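A sketch of that fallback (lxml.etree is largely compatible with the builtin ElementTree API):

try:
    from lxml import etree  # fast C implementation, if installed
except ImportError:
    import xml.etree.ElementTree as etree  # slower builtin fallback

tree = etree.parse('data.xml')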

Pythonic way to import modules in a packaged .py file [duplicate]


Importing modules in Python - recommended position

This is a question regarding good python coding and positioning.
I have a somewhat large code base for which I use lots of external modules/packages/functions. Currently I load all of them at the very top of the code because that's how I've seen it done. This is bothersome, for example, when I need to comment out a block of the code for testing, because then I need to go up, look for the modules that block was using, and comment them out too. I know I don't have to do this last part, but I do it for consistency, since I don't like to import things I won't be using.
If the imported modules were listed right above the block that makes use of them, this process would be easier and the code would be easier to follow, at least for me.
My question is whether it is recommended to import all modules at the beginning of the code or should I do it throughout the code as necessary?
It's officially recommended to import at the beginning; see PEP 8:
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
Imports should be grouped in the following order:
standard library imports
related third party imports
local application/library specific imports
You should put a blank line between each group of imports.
Put any relevant __all__ specification after the imports.
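For example (the specific modules are placeholders):

# standard library imports
import os
import sys

# related third party imports
import requests

# local application/library specific imports
import myapp.utils  # hypothetical local package

__all__ = ['main']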
Well, I would say do it however is easiest for you. But I think following the PEP8 recommendation (already referenced in another answer) is generally the best method. It's easier to see everything your module references this way all in one place, and that's where new programmers will look to add their imports to your code later.
As an aside, this page gives one reason why you might violate this convention, https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Import_Statement_Overhead:
Note that putting an import in a function can speed up the initial loading of the module, especially if the imported module might not be required. This is generally a case of a "lazy" optimization -- avoiding work (importing a module, which can be very expensive) until you are sure it is required.
If the import of a module is expensive (it's huge, or has side effects), then you might want to put it later. In short, if you want to follow this part of 'The Zen of Python' with regard to your import placement:
Although practicality beats purity.
then do so.

Should I use a main() method in a simple Python script?

I have a lot of simple scripts that calculate some stuff or so. They consist of just a single module.
Should I write main methods for them and call them with the if __name__ construct, or just dump it all right in there?
What are the advantages of either method?
I always write a main() function (appropriately named), and put nothing but command-line parsing and a call to main() in the if __name__ == '__main__' block. That's because no matter how silly, trivial, or single-purpose I originally expect that script to be, I always end up wanting to call it from another module at some later date.
Either I take the time to make it an importable module today, or spend extra time to refactor it months later when I want to reuse it for something else.
Always.
Every time.
I've stopped fighting it and started writing my code with that expectation from the start.
Well, if you do this:
# your code
Then import your_module will execute your code. On the contrary, with this:
if __name__ == '__main__':
    # your code
The import won't run the code, but targeting the interpreter at that file will.
If the only way the script will ever run is by pointing the interpreter at the file directly, there's absolutely no difference.
This becomes important when you have a library (or when reusing the definitions in the script).
Adding code to a library outside a definition, or outside the protection of if __name__, runs that code when importing, letting you initialize stuff that the library needs.
Maybe you want your library to also have some runnable functionality. Maybe testing, or maybe something like Python's SimpleHTTPServer (it comes with some classes, but you can also run the module and it will start a server). You can have that dual behaviour with the if __name__ clause.
Tools like epydoc import the module to access the docstrings, so running the code when you just want to generate HTML documentation is not really the intent.
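A tiny sketch of that dual behaviour (greeting.py is a hypothetical module name):

# greeting.py
def greet(name):
    return 'Hello, %s!' % name

if __name__ == '__main__':
    # Runs only via `python greeting.py`, never on `import greeting`.
    print(greet('world'))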
The other answers are good, but I wanted to add an example of why you might want to be able to import it: unit testing. If you have a few functions and then the if __name__=="__main__":, you can import the module for unit testing. Maybe you're better at this than me, but I find that my "simple scripts that couldn't possibly have any bugs" tend to have bugs that I find with unit testing.
The if __name__ construct will let you easily re-use the functions and classes in the module in other Python scripts. If you don't do that, then everything in the module will run when it is imported.
If there's nothing in the script you want to reuse, then sure, just dump it all in there. I do that sometimes. If you later decide you want to reuse some code, I have found Python to be just about the easiest language in which to refactor code without breaking it.
For a good explanation of the purpose of Python's "main guard" idiom:
What does if __name__ == "__main__": do?
