Difference between importing python library within function versus importing globally? - python

Suppose I want to import a python library for use inside a function. Is it better to import the library within the function or import it globally?
Do this
def test_func:
import pandas as pd
# code implementation
or have the line below at the top of the python file to import globally?
import pandas as pd
What are the pros and cons of each approach? Which is the best practice in python?
I am using python v3.6
EDIT: Some clarifications to make.
Suppose I have 2 functions.
def func1:
import pandas as pd
# code implementation
def func2:
import pandas as pd
# code implementation
The python script runs both functions. Will the library be imported twice or is the python compiler smart enough to import it only once? This has performance implications.

It's a difference in name-visibility and execution time-point. The module-level import is imported when the file you are loading is imported or run itself. The function local one obviously only if the function is run. The imported names are either visible to all things in the file, or just within the function the import is executed in.
As there is a cost for hitting the import statement (albeit a small one, but still), the local one will always execute, not just once. It will not fully re-import the module though, python caches modules once they are imported the first time (see reload and sys.modules).
The best practice clearly is to use module level imports, and that's what you see in 99.999% of code. A huge reason is maintainability - if you want to understand what dependencies a module has, it's convenient to just look at the top, instead of having to comb through all code.
So when to use function local imports?
There are three scenarios:
you can't use the import earlier. This happens when e.g. a backend for a db or other system/functionality is chosen at runtime through configuration or system inspection.
you otherwise have circular imports. This is a rare case and also a code-smell, so if that is necessary, consider refactoring.
reducing startup-time by deferring module imports. This is very rarely useful though.
So for your case, the answer is a quick and simple "don't do it".

The module will be loaded when you import it, so if you need to import a rarely used module but cost a lot of time to initialize, you should import it when you need it.
Actually, if we just care about performance but not readability, it maybe always better to import module when we really need it.
But we need to keep our program maintainable. Importing all modules on the top is the most explicit way to tell others and author himself which modules are used.
To sum up, if you really have a very costly but rarely used module, you should import it locally. Otherwise you should import them on the top.

Related

python - import at top of file vs inside a function [duplicate]

PEP 8 states:
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
However if the class/method/function that I am importing is only used in rare cases, surely it is more efficient to do the import when it is needed?
Isn't this:
class SomeClass(object):
def not_often_called(self)
from datetime import datetime
self.datetime = datetime.now()
more efficient than this?
from datetime import datetime
class SomeClass(object):
def not_often_called(self)
self.datetime = datetime.now()
Module importing is quite fast, but not instant. This means that:
Putting the imports at the top of the module is fine, because it's a trivial cost that's only paid once.
Putting the imports within a function will cause calls to that function to take longer.
So if you care about efficiency, put the imports at the top. Only move them into a function if your profiling shows that would help (you did profile to see where best to improve performance, right??)
The best reasons I've seen to perform lazy imports are:
Optional library support. If your code has multiple paths that use different libraries, don't break if an optional library is not installed.
In the __init__.py of a plugin, which might be imported but not actually used. Examples are Bazaar plugins, which use bzrlib's lazy-loading framework.
Putting the import statement inside of a function can prevent circular dependencies.
For example, if you have 2 modules, X.py and Y.py, and they both need to import each other, this will cause a circular dependency when you import one of the modules causing an infinite loop. If you move the import statement in one of the modules then it won't try to import the other module till the function is called, and that module will already be imported, so no infinite loop. Read here for more - effbot.org/zone/import-confusion.htm
I have adopted the practice of putting all imports in the functions that use them, rather than at the top of the module.
The benefit I get is the ability to refactor more reliably. When I move a function from one module to another, I know that the function will continue to work with all of its legacy of testing intact. If I have my imports at the top of the module, when I move a function, I find that I end up spending a lot of time getting the new module's imports complete and minimal. A refactoring IDE might make this irrelevant.
There is a speed penalty as mentioned elsewhere. I have measured this in my application and found it to be insignificant for my purposes.
It is also nice to be able to see all module dependencies up front without resorting to search (e.g. grep). However, the reason I care about module dependencies is generally because I'm installing, refactoring, or moving an entire system comprising multiple files, not just a single module. In that case, I'm going to perform a global search anyway to make sure I have the system-level dependencies. So I have not found global imports to aid my understanding of a system in practice.
I usually put the import of sys inside the if __name__=='__main__' check and then pass arguments (like sys.argv[1:]) to a main() function. This allows me to use main in a context where sys has not been imported.
Most of the time this would be useful for clarity and sensible to do but it's not always the case. Below are a couple of examples of circumstances where module imports might live elsewhere.
Firstly, you could have a module with a unit test of the form:
if __name__ == '__main__':
import foo
aa = foo.xyz() # initiate something for the test
Secondly, you might have a requirement to conditionally import some different module at runtime.
if [condition]:
import foo as plugin_api
else:
import bar as plugin_api
xx = plugin_api.Plugin()
[...]
There are probably other situations where you might place imports in other parts in the code.
The first variant is indeed more efficient than the second when the function is called either zero or one times. With the second and subsequent invocations, however, the "import every call" approach is actually less efficient. See this link for a lazy-loading technique that combines the best of both approaches by doing a "lazy import".
But there are reasons other than efficiency why you might prefer one over the other. One approach is makes it much more clear to someone reading the code as to the dependencies that this module has. They also have very different failure characteristics -- the first will fail at load time if there's no "datetime" module while the second won't fail until the method is called.
Added Note: In IronPython, imports can be quite a bit more expensive than in CPython because the code is basically being compiled as it's being imported.
Curt makes a good point: the second version is clearer and will fail at load time rather than later, and unexpectedly.
Normally I don't worry about the efficiency of loading modules, since it's (a) pretty fast, and (b) mostly only happens at startup.
If you have to load heavyweight modules at unexpected times, it probably makes more sense to load them dynamically with the __import__ function, and be sure to catch ImportError exceptions, and handle them in a reasonable manner.
I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.
In most cases you want to load the modules at the top of the source file. For somebody reading your code, it makes it much easier to tell what function or object came from what module.
One good reason to import a module elsewhere in the code is if it's used in a debugging statement.
For example:
do_something_with_x(x)
I could debug this with:
from pprint import pprint
pprint(x)
do_something_with_x(x)
Of course, the other reason to import modules elsewhere in the code is if you need to dynamically import them. This is because you pretty much don't have any choice.
I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.
Here's an updated summary of the answers to this
and
related
questions.
PEP 8
recommends putting imports at the top.
It's often more convenient to get
ImportErrors
when you first run your program
rather than when your program first calls your function.
Putting imports in the function scope
can help avoid issues with circular imports.
Putting imports in the function scope
helps keep maintain a clean module namespace,
so that it does not appear among tab-completion suggestions.
Start-up time:
imports in a function won't run until (if) that function is called.
Might get significant with heavy-weight libraries.
Even though import statements are super fast on subsequent runs,
they still incur a speed penalty
which can be significant if the function is trivial but frequently in use.
Imports under the __name__ == "__main__" guard seem very reasonable.
Refactoring
might be easier if the imports are located in the function
where they're used (facilitates moving it to another module).
It can also be argued that this is good for readability.
However, most would argue the contrary, i.e.
Imports at the top enhance readability,
since you can see all your dependencies at a glance.
It seems unclear if dynamic or conditional imports favour one style over another.
I was surprised not to see actual cost numbers for the repeated load-checks posted already, although there are many good explanations of what to expect.
If you import at the top, you take the load hit no matter what. That's pretty small, but commonly in the milliseconds, not nanoseconds.
If you import within a function(s), then you only take the hit for loading if and when one of those functions is first called. As many have pointed out, if that doesn't happen at all, you save the load time. But if the function(s) get called a lot, you take a repeated though much smaller hit (for checking that it has been loaded; not for actually re-loading). On the other hand, as #aaronasterling pointed out you also save a little because importing within a function lets the function use slightly-faster local variable lookups to identify the name later (http://stackoverflow.com/questions/477096/python-import-coding-style/4789963#4789963).
Here are the results of a simple test that imports a few things from inside a function. The times reported (in Python 2.7.14 on a 2.3 GHz Intel Core i7) are shown below (the 2nd call taking more than later calls seems consistent, though I don't know why).
0 foo: 14429.0924 µs
1 foo: 63.8962 µs
2 foo: 10.0136 µs
3 foo: 7.1526 µs
4 foo: 7.8678 µs
0 bar: 9.0599 µs
1 bar: 6.9141 µs
2 bar: 7.1526 µs
3 bar: 7.8678 µs
4 bar: 7.1526 µs
The code:
from __future__ import print_function
from time import time
def foo():
import collections
import re
import string
import math
import subprocess
return
def bar():
import collections
import re
import string
import math
import subprocess
return
t0 = time()
for i in xrange(5):
foo()
t1 = time()
print(" %2d foo: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
t0 = t1
for i in xrange(5):
bar()
t1 = time()
print(" %2d bar: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
t0 = t1
It's a tradeoff, that only the programmer can decide to make.
Case 1 saves some memory and startup time by not importing the datetime module (and doing whatever initialization it might require) until needed. Note that doing the import 'only when called' also means doing it 'every time when called', so each call after the first one is still incurring the additional overhead of doing the import.
Case 2 save some execution time and latency by importing datetime beforehand so that not_often_called() will return more quickly when it is called, and also by not incurring the overhead of an import on every call.
Besides efficiency, it's easier to see module dependencies up front if the import statements are ... up front. Hiding them down in the code can make it more difficult to easily find what modules something depends on.
Personally I generally follow the PEP except for things like unit tests and such that I don't want always loaded because I know they aren't going to be used except for test code.
Here's an example where all the imports are at the very top (this is the only time I've needed to do this). I want to be able to terminate a subprocess on both Un*x and Windows.
import os
# ...
try:
kill = os.kill # will raise AttributeError on Windows
from signal import SIGTERM
def terminate(process):
kill(process.pid, SIGTERM)
except (AttributeError, ImportError):
try:
from win32api import TerminateProcess # use win32api if available
def terminate(process):
TerminateProcess(int(process._handle), -1)
except ImportError:
def terminate(process):
raise NotImplementedError # define a dummy function
(On review: what John Millikin said.)
This is like many other optimizations - you sacrifice some readability for speed. As John mentioned, if you've done your profiling homework and found this to be a significantly useful enough change and you need the extra speed, then go for it. It'd probably be good to put a note up with all the other imports:
from foo import bar
from baz import qux
# Note: datetime is imported in SomeClass below
Module initialization only occurs once - on the first import. If the module in question is from the standard library, then you will likely import it from other modules in your program as well. For a module as prevalent as datetime, it is also likely a dependency for a slew of other standard libraries. The import statement would cost very little then since the module intialization would have happened already. All it is doing at this point is binding the existing module object to the local scope.
Couple that information with the argument for readability and I would say that it is best to have the import statement at module scope.
Just to complete Moe's answer and the original question:
When we have to deal with circular dependences we can do some "tricks". Assuming we're working with modules a.py and b.py that contain x() and b y(), respectively. Then:
We can move one of the from imports at the bottom of the module.
We can move one of the from imports inside the function or method that is actually requiring the import (this isn't always possible, as you may use it from several places).
We can change one of the two from imports to be an import that looks like: import a
So, to conclude. If you aren't dealing with circular dependencies and doing some kind of trick to avoid them, then it's better to put all your imports at the top because of the reasons already explained in other answers to this question. And please, when doing this "tricks" include a comment, it's always welcome! :)
In addition to the excellent answers already given, it's worth noting that the placement of imports is not merely a matter of style. Sometimes a module has implicit dependencies that need to be imported or initialized first, and a top-level import could lead to violations of the required order of execution.
This issue often comes up in Apache Spark's Python API, where you need to initialize the SparkContext before importing any pyspark packages or modules. It's best to place pyspark imports in a scope where the SparkContext is guaranteed to be available.
I do not aspire to provide complete answer, because others have already done this very well. I just want to mention one use case when I find especially useful to import modules inside functions. My application uses python packages and modules stored in certain location as plugins. During application startup, the application walks through all the modules in the location and imports them, then it looks inside the modules and if it finds some mounting points for the plugins (in my case it is a subclass of a certain base class having a unique ID) it registers them. The number of plugins is large (now dozens, but maybe hundreds in the future) and each of them is used quite rarely. Having imports of third party libraries at the top of my plugin modules was a bit penalty during application startup. Especially some thirdparty libraries are heavy to import (e.g. import of plotly even tries to connect to internet and download something which was adding about one second to startup). By optimizing imports (calling them only in the functions where they are used) in the plugins I managed to shrink the startup from 10 seconds to some 2 seconds. That is a big difference for my users.
So my answer is no, do not always put the imports at the top of your modules.
It's interesting that not a single answer mentioned parallel processing so far, where it might be REQUIRED that the imports are in the function, when the serialized function code is what is being pushed around to other cores, e.g. like in the case of ipyparallel.
Readability
In addition to startup performance, there is a readability argument to be made for localizing import statements. For example take python line numbers 1283 through 1296 in my current first python project:
listdata.append(['tk font version', font_version])
listdata.append(['Gtk version', str(Gtk.get_major_version())+"."+
str(Gtk.get_minor_version())+"."+
str(Gtk.get_micro_version())])
import xml.etree.ElementTree as ET
xmltree = ET.parse('/usr/share/gnome/gnome-version.xml')
xmlroot = xmltree.getroot()
result = []
for child in xmlroot:
result.append(child.text)
listdata.append(['Gnome version', result[0]+"."+result[1]+"."+
result[2]+" "+result[3]])
If the import statement was at the top of file I would have to scroll up a long way, or press Home, to find out what ET was. Then I would have to navigate back to line 1283 to continue reading code.
Indeed even if the import statement was at the top of the function (or class) as many would place it, paging up and back down would be required.
Displaying the Gnome version number will rarely be done so the import at top of file introduces unnecessary startup lag.
There can be a performance gain by importing variables/local scoping inside of a function. This depends on the usage of the imported thing inside the function. If you are looping many times and accessing a module global object, importing it as local can help.
test.py
X=10
Y=11
Z=12
def add(i):
i = i + 10
runlocal.py
from test import add, X, Y, Z
def callme():
x=X
y=Y
z=Z
ladd=add
for i in range(100000000):
ladd(i)
x+y+z
callme()
run.py
from test import add, X, Y, Z
def callme():
for i in range(100000000):
add(i)
X+Y+Z
callme()
A time on Linux shows a small gain
/usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python run.py
0:17.80 real, 17.77 user, 0.01 sys
/tmp/test$ /usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python runlocal.py
0:14.23 real, 14.22 user, 0.01 sys
real is wall clock. user is time in program. sys is time for system calls.
https://docs.python.org/3.5/reference/executionmodel.html#resolution-of-names
I would like to mention a usecase of mine, very similar to those mentioned by #John Millikin and #V.K.:
Optional Imports
I do data analysis with Jupyter Notebook, and I use the same IPython notebook as a template for all analyses. In some occasions, I need to import Tensorflow to do some quick model runs, but sometimes I work in places where tensorflow isn't set up / is slow to import. In those cases, I encapsulate my Tensorflow-dependent operations in a helper function, import tensorflow inside that function, and bind it to a button.
This way, I could do "restart-and-run-all" without having to wait for the import, or having to resume the rest of the cells when it fails.
While PEP encourages importing at the top of a module, it isn't an error to import at other levels. That indicates imports should be at the top, however there are exceptions.
It is a micro-optimization to load modules when they are used. Code that is sluggish importing can be optimized later if it makes a sizable difference.
Still, you might introduce flags to conditionally import at as near to the top as possible, allowing a user to use configuration to import the modules they need while still importing everything immediately.
Importing as soon as possible means the program will fail if any imports (or imports of imports) are missing or have syntax errors. If all imports occur at the top of all modules then python works in two steps. Compile. Run.
Built in modules work anywhere they are imported because they are well designed. Modules you write should be the same. Moving around your imports to the top or to their first use can help ensure there are no side effects and the code is injecting dependencies.
Whether you put imports at the top or not, your code should still work when the imports are at the top. So start by importing immediately then optimize as needed.
This is a fascinating discussion. Like many others I had never even considered this topic. I got cornered into having to have the imports in the functions because of wanting to use the Django ORM in one of my libraries. I was having to call django.setup() before importing my model classes and because this was at the top of the file it was being dragged into completely non-Django library code because of the IoC injector construction.
I kind of hacked around a bit and ended up putting the django.setup() in the singleton constructor and the relevant import at the top of each class method. Now this worked fine but made me uneasy because the imports weren't at the top and also I started worrying about the extra time hit of the imports. Then I came here and read with great interest everybody's take on this.
I have a long C++ background and now use Python/Cython. My take on this is that why not put the imports in the function unless it causes you a profiled bottleneck. It's only like declaring space for variables just before you need them. The trouble is I have thousands of lines of code with all the imports at the top! So I think I will do it from now on and change the odd file here and there when I'm passing through and have the time.

What benefits or disadvantages would importing a module that contains 'import' commands?

If I were to create a module that was called for example imp_mod.py and inside it contained all (subjectively used) relevant modules that I frequently used.
Would importing this module into my main program allow me access to the imports contained inside imp_mod.py?
If so, what disadvantages would this bring?
I guess a major advantage would be a reduction of time spent importing even though its only a couple of seconds saved...
Yes, it would allow you to access them. If you place these imports in imp_mod.py:
from os import listdir
from collections import defaultdict
from copy import deepcopy
Then, you could do this in another file, say, myfile.py:
import imp_mod
imp_mod.listdir
imp_mod.defaultdict
imp_mod.deepcopy
You're wrong about reduction of importing time, as what happens is the opposite. Python will need to import imp_mod and then import the other modules afterwards, while the first import would not be needed if you were importing these modules in myfile.py itself. If you do the same imports in another file, they will already be in cache, so virtually no time is spent in the next import.
The real disadvantage here is less readability. Whoever looks at imp_mod.listdir, for example, will ask himself what the heck is this method and why it has the same name as that os module's method. When he had to open imp_mod.py just to find out that it's the same method, well, he probably wouldn't be happy. I wouldn't.
As lucasnadalutti mentioned, you can access them by importing your module.
In terms of advantages, it can make your main program care less about where the imports are coming from if the imp_mod handles all imports, however, as your program gets more complex and starts to include more namespaces, this approach can get more messy. You can start to handle a bit of this by using __init__.py within directories to handle imports to do a similar thing, but as things get more complex, personally, I feel it add a little more complexity. I'd rather just know where a module came from to look it up.

Pythonic way to import modules in a packaged .py file [duplicate]

PEP 8 states:
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
However if the class/method/function that I am importing is only used in rare cases, surely it is more efficient to do the import when it is needed?
Isn't this:
class SomeClass(object):
def not_often_called(self)
from datetime import datetime
self.datetime = datetime.now()
more efficient than this?
from datetime import datetime
class SomeClass(object):
def not_often_called(self)
self.datetime = datetime.now()
Module importing is quite fast, but not instant. This means that:
Putting the imports at the top of the module is fine, because it's a trivial cost that's only paid once.
Putting the imports within a function will cause calls to that function to take longer.
So if you care about efficiency, put the imports at the top. Only move them into a function if your profiling shows that would help (you did profile to see where best to improve performance, right??)
The best reasons I've seen to perform lazy imports are:
Optional library support. If your code has multiple paths that use different libraries, don't break if an optional library is not installed.
In the __init__.py of a plugin, which might be imported but not actually used. Examples are Bazaar plugins, which use bzrlib's lazy-loading framework.
Putting the import statement inside of a function can prevent circular dependencies.
For example, if you have 2 modules, X.py and Y.py, and they both need to import each other, this will cause a circular dependency when you import one of the modules causing an infinite loop. If you move the import statement in one of the modules then it won't try to import the other module till the function is called, and that module will already be imported, so no infinite loop. Read here for more - effbot.org/zone/import-confusion.htm
I have adopted the practice of putting all imports in the functions that use them, rather than at the top of the module.
The benefit I get is the ability to refactor more reliably. When I move a function from one module to another, I know that the function will continue to work with all of its legacy of testing intact. If I have my imports at the top of the module, when I move a function, I find that I end up spending a lot of time getting the new module's imports complete and minimal. A refactoring IDE might make this irrelevant.
There is a speed penalty as mentioned elsewhere. I have measured this in my application and found it to be insignificant for my purposes.
It is also nice to be able to see all module dependencies up front without resorting to search (e.g. grep). However, the reason I care about module dependencies is generally because I'm installing, refactoring, or moving an entire system comprising multiple files, not just a single module. In that case, I'm going to perform a global search anyway to make sure I have the system-level dependencies. So I have not found global imports to aid my understanding of a system in practice.
I usually put the import of sys inside the if __name__=='__main__' check and then pass arguments (like sys.argv[1:]) to a main() function. This allows me to use main in a context where sys has not been imported.
Most of the time this would be useful for clarity and sensible to do but it's not always the case. Below are a couple of examples of circumstances where module imports might live elsewhere.
Firstly, you could have a module with a unit test of the form:
if __name__ == '__main__':
import foo
aa = foo.xyz() # initiate something for the test
Secondly, you might have a requirement to conditionally import some different module at runtime.
if [condition]:
import foo as plugin_api
else:
import bar as plugin_api
xx = plugin_api.Plugin()
[...]
There are probably other situations where you might place imports in other parts in the code.
The first variant is indeed more efficient than the second when the function is called either zero or one times. With the second and subsequent invocations, however, the "import every call" approach is actually less efficient. See this link for a lazy-loading technique that combines the best of both approaches by doing a "lazy import".
But there are reasons other than efficiency why you might prefer one over the other. One approach is makes it much more clear to someone reading the code as to the dependencies that this module has. They also have very different failure characteristics -- the first will fail at load time if there's no "datetime" module while the second won't fail until the method is called.
Added Note: In IronPython, imports can be quite a bit more expensive than in CPython because the code is basically being compiled as it's being imported.
Curt makes a good point: the second version is clearer and will fail at load time rather than later, and unexpectedly.
Normally I don't worry about the efficiency of loading modules, since it's (a) pretty fast, and (b) mostly only happens at startup.
If you have to load heavyweight modules at unexpected times, it probably makes more sense to load them dynamically with the __import__ function, and be sure to catch ImportError exceptions, and handle them in a reasonable manner.
I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.
In most cases you want to load the modules at the top of the source file. For somebody reading your code, it makes it much easier to tell what function or object came from what module.
One good reason to import a module elsewhere in the code is if it's used in a debugging statement.
For example:
do_something_with_x(x)
I could debug this with:
from pprint import pprint
pprint(x)
do_something_with_x(x)
Of course, the other reason to import modules elsewhere in the code is if you need to dynamically import them. This is because you pretty much don't have any choice.
I wouldn't worry about the efficiency of loading the module up front too much. The memory taken up by the module won't be very big (assuming it's modular enough) and the startup cost will be negligible.
Here's an updated summary of the answers to this
and
related
questions.
PEP 8
recommends putting imports at the top.
It's often more convenient to get
ImportErrors
when you first run your program
rather than when your program first calls your function.
Putting imports in the function scope
can help avoid issues with circular imports.
Putting imports in the function scope
helps keep maintain a clean module namespace,
so that it does not appear among tab-completion suggestions.
Start-up time:
imports in a function won't run until (if) that function is called.
Might get significant with heavy-weight libraries.
Even though import statements are super fast on subsequent runs,
they still incur a speed penalty
which can be significant if the function is trivial but frequently in use.
Imports under the __name__ == "__main__" guard seem very reasonable.
Refactoring
might be easier if the imports are located in the function
where they're used (facilitates moving it to another module).
It can also be argued that this is good for readability.
However, most would argue the contrary, i.e.
Imports at the top enhance readability,
since you can see all your dependencies at a glance.
It seems unclear if dynamic or conditional imports favour one style over another.
I was surprised not to see actual cost numbers for the repeated load-checks posted already, although there are many good explanations of what to expect.
If you import at the top, you take the load hit no matter what. That's pretty small, but commonly in the milliseconds, not nanoseconds.
If you import within a function(s), then you only take the hit for loading if and when one of those functions is first called. As many have pointed out, if that doesn't happen at all, you save the load time. But if the function(s) get called a lot, you take a repeated though much smaller hit (for checking that it has been loaded; not for actually re-loading). On the other hand, as #aaronasterling pointed out you also save a little because importing within a function lets the function use slightly-faster local variable lookups to identify the name later (http://stackoverflow.com/questions/477096/python-import-coding-style/4789963#4789963).
Here are the results of a simple test that imports a few things from inside a function. The times reported (in Python 2.7.14 on a 2.3 GHz Intel Core i7) are shown below (the 2nd call taking more than later calls seems consistent, though I don't know why).
0 foo: 14429.0924 µs
1 foo: 63.8962 µs
2 foo: 10.0136 µs
3 foo: 7.1526 µs
4 foo: 7.8678 µs
0 bar: 9.0599 µs
1 bar: 6.9141 µs
2 bar: 7.1526 µs
3 bar: 7.8678 µs
4 bar: 7.1526 µs
The code:
from __future__ import print_function
from time import time
def foo():
import collections
import re
import string
import math
import subprocess
return
def bar():
import collections
import re
import string
import math
import subprocess
return
t0 = time()
for i in xrange(5):
foo()
t1 = time()
print(" %2d foo: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
t0 = t1
for i in xrange(5):
bar()
t1 = time()
print(" %2d bar: %12.4f \xC2\xB5s" % (i, (t1-t0)*1E6))
t0 = t1
It's a tradeoff, that only the programmer can decide to make.
Case 1 saves some memory and startup time by not importing the datetime module (and doing whatever initialization it might require) until needed. Note that doing the import 'only when called' also means doing it 'every time when called', so each call after the first one is still incurring the additional overhead of doing the import.
Case 2 save some execution time and latency by importing datetime beforehand so that not_often_called() will return more quickly when it is called, and also by not incurring the overhead of an import on every call.
Besides efficiency, it's easier to see module dependencies up front if the import statements are ... up front. Hiding them down in the code can make it more difficult to easily find what modules something depends on.
Personally I generally follow the PEP except for things like unit tests and such that I don't want always loaded because I know they aren't going to be used except for test code.
Here's an example where all the imports are at the very top (this is the only time I've needed to do this). I want to be able to terminate a subprocess on both Un*x and Windows.
import os
# ...
try:
kill = os.kill # will raise AttributeError on Windows
from signal import SIGTERM
def terminate(process):
kill(process.pid, SIGTERM)
except (AttributeError, ImportError):
try:
from win32api import TerminateProcess # use win32api if available
def terminate(process):
TerminateProcess(int(process._handle), -1)
except ImportError:
def terminate(process):
raise NotImplementedError # define a dummy function
(On review: what John Millikin said.)
This is like many other optimizations - you sacrifice some readability for speed. As John mentioned, if you've done your profiling homework and found this to be a significantly useful enough change and you need the extra speed, then go for it. It'd probably be good to put a note up with all the other imports:
from foo import bar
from baz import qux
# Note: datetime is imported in SomeClass below
Module initialization only occurs once - on the first import. If the module in question is from the standard library, then you will likely import it from other modules in your program as well. For a module as prevalent as datetime, it is also likely a dependency for a slew of other standard libraries. The import statement would cost very little then since the module intialization would have happened already. All it is doing at this point is binding the existing module object to the local scope.
Couple that information with the argument for readability and I would say that it is best to have the import statement at module scope.
Just to complete Moe's answer and the original question:
When we have to deal with circular dependences we can do some "tricks". Assuming we're working with modules a.py and b.py that contain x() and b y(), respectively. Then:
We can move one of the from imports at the bottom of the module.
We can move one of the from imports inside the function or method that is actually requiring the import (this isn't always possible, as you may use it from several places).
We can change one of the two from imports to be an import that looks like: import a
So, to conclude. If you aren't dealing with circular dependencies and doing some kind of trick to avoid them, then it's better to put all your imports at the top because of the reasons already explained in other answers to this question. And please, when doing this "tricks" include a comment, it's always welcome! :)
In addition to the excellent answers already given, it's worth noting that the placement of imports is not merely a matter of style. Sometimes a module has implicit dependencies that need to be imported or initialized first, and a top-level import could lead to violations of the required order of execution.
This issue often comes up in Apache Spark's Python API, where you need to initialize the SparkContext before importing any pyspark packages or modules. It's best to place pyspark imports in a scope where the SparkContext is guaranteed to be available.
I do not aspire to provide complete answer, because others have already done this very well. I just want to mention one use case when I find especially useful to import modules inside functions. My application uses python packages and modules stored in certain location as plugins. During application startup, the application walks through all the modules in the location and imports them, then it looks inside the modules and if it finds some mounting points for the plugins (in my case it is a subclass of a certain base class having a unique ID) it registers them. The number of plugins is large (now dozens, but maybe hundreds in the future) and each of them is used quite rarely. Having imports of third party libraries at the top of my plugin modules was a bit penalty during application startup. Especially some thirdparty libraries are heavy to import (e.g. import of plotly even tries to connect to internet and download something which was adding about one second to startup). By optimizing imports (calling them only in the functions where they are used) in the plugins I managed to shrink the startup from 10 seconds to some 2 seconds. That is a big difference for my users.
So my answer is no, do not always put the imports at the top of your modules.
It's interesting that not a single answer mentioned parallel processing so far, where it might be REQUIRED that the imports are in the function, when the serialized function code is what is being pushed around to other cores, e.g. like in the case of ipyparallel.
Readability
In addition to startup performance, there is a readability argument to be made for localizing import statements. For example take python line numbers 1283 through 1296 in my current first python project:
listdata.append(['tk font version', font_version])
listdata.append(['Gtk version', str(Gtk.get_major_version())+"."+
str(Gtk.get_minor_version())+"."+
str(Gtk.get_micro_version())])
import xml.etree.ElementTree as ET
xmltree = ET.parse('/usr/share/gnome/gnome-version.xml')
xmlroot = xmltree.getroot()
result = []
for child in xmlroot:
result.append(child.text)
listdata.append(['Gnome version', result[0]+"."+result[1]+"."+
result[2]+" "+result[3]])
If the import statement was at the top of file I would have to scroll up a long way, or press Home, to find out what ET was. Then I would have to navigate back to line 1283 to continue reading code.
Indeed even if the import statement was at the top of the function (or class) as many would place it, paging up and back down would be required.
Displaying the Gnome version number will rarely be done so the import at top of file introduces unnecessary startup lag.
There can be a performance gain by importing variables/local scoping inside of a function. This depends on the usage of the imported thing inside the function. If you are looping many times and accessing a module global object, importing it as local can help.
test.py
X=10
Y=11
Z=12
def add(i):
i = i + 10
runlocal.py
from test import add, X, Y, Z
def callme():
x=X
y=Y
z=Z
ladd=add
for i in range(100000000):
ladd(i)
x+y+z
callme()
run.py
from test import add, X, Y, Z
def callme():
for i in range(100000000):
add(i)
X+Y+Z
callme()
A time on Linux shows a small gain
/usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python run.py
0:17.80 real, 17.77 user, 0.01 sys
/tmp/test$ /usr/bin/time -f "\t%E real,\t%U user,\t%S sys" python runlocal.py
0:14.23 real, 14.22 user, 0.01 sys
real is wall clock. user is time in program. sys is time for system calls.
https://docs.python.org/3.5/reference/executionmodel.html#resolution-of-names
I would like to mention a usecase of mine, very similar to those mentioned by #John Millikin and #V.K.:
Optional Imports
I do data analysis with Jupyter Notebook, and I use the same IPython notebook as a template for all analyses. In some occasions, I need to import Tensorflow to do some quick model runs, but sometimes I work in places where tensorflow isn't set up / is slow to import. In those cases, I encapsulate my Tensorflow-dependent operations in a helper function, import tensorflow inside that function, and bind it to a button.
This way, I could do "restart-and-run-all" without having to wait for the import, or having to resume the rest of the cells when it fails.
While PEP encourages importing at the top of a module, it isn't an error to import at other levels. That indicates imports should be at the top, however there are exceptions.
It is a micro-optimization to load modules when they are used. Code that is sluggish importing can be optimized later if it makes a sizable difference.
Still, you might introduce flags to conditionally import at as near to the top as possible, allowing a user to use configuration to import the modules they need while still importing everything immediately.
Importing as soon as possible means the program will fail if any imports (or imports of imports) are missing or have syntax errors. If all imports occur at the top of all modules then python works in two steps. Compile. Run.
Built in modules work anywhere they are imported because they are well designed. Modules you write should be the same. Moving around your imports to the top or to their first use can help ensure there are no side effects and the code is injecting dependencies.
Whether you put imports at the top or not, your code should still work when the imports are at the top. So start by importing immediately then optimize as needed.
This is a fascinating discussion. Like many others I had never even considered this topic. I got cornered into having to have the imports in the functions because of wanting to use the Django ORM in one of my libraries. I was having to call django.setup() before importing my model classes and because this was at the top of the file it was being dragged into completely non-Django library code because of the IoC injector construction.
I kind of hacked around a bit and ended up putting the django.setup() in the singleton constructor and the relevant import at the top of each class method. Now this worked fine but made me uneasy because the imports weren't at the top and also I started worrying about the extra time hit of the imports. Then I came here and read with great interest everybody's take on this.
I have a long C++ background and now use Python/Cython. My take on this is that why not put the imports in the function unless it causes you a profiled bottleneck. It's only like declaring space for variables just before you need them. The trouble is I have thousands of lines of code with all the imports at the top! So I think I will do it from now on and change the odd file here and there when I'm passing through and have the time.

Importing Modules - How Much is Too Much?

As a newbie, I've grown fond of importing modules almost by default when I start writing something just because I call on certain functions within them quite a bit (modules like random, time, os, and sys). However, does this impact performance at all? If I only want a specific function in the module, am I better served by asking for it explicitly or should I not care?
When you do from module import name, Python still has to run module, so there is no difference performance-wise.
Generally, if you are only using one thing from a module, the using from x import y is fine, but otherwise, using import x and then x.y results in a lower chance of conflicting names.
The reason for this is that Python doesn't treat functions and classes specially. When you import from a module, all you are doing is taking objects from that module and using them. This is useful, as it makes modules (as well as classes and functions) extremely flexible, but it does mean that Python has to run the whole script for a module before it can import from it (naturally, module writers can work around this by using the if name == "main": idiom to insert code that shouldn't be run on import).

Is there any possible point to reloading a Python module immediately?

Is there any conceivable point to reloading these modules immediately after importing them? This is the code that I was reviewing which made me wonder:
import time
import sys
import os
import string
import pp
import numpy
import nrrd
reload(nrrd)
import smooth as sm
reload(sm)
import TensorEval2C as tensPP
reload(tensPP)
import TrackFiber4C as trackPP
reload(trackPP)
import cmpV
reload(cmpV)
import vectors as vects
reload(vects)
Edit: I suggested that this might make the creation of .pyc files more likely, but several people pointed out that this happens this first time, every time.
I note that the standard modules are just imported: it's the other modules that are reloaded. I expect whoever wrote this code wanted to be able to easily reload the whole package (so as to get their latest edits). After putting in all these redundant reload calls, the programmer only had to write
>>> reload(package)
to bring things up to date in the interpreter, instead of having to type
>>> reload(package.nrrd)
>>> reload(package.sm)
>>> reload(package.tensPP)
etc. So please ignore the suggestion that you commit violence against the programmer who wrote this: they are far from the only programmer who's had trouble with reloading of dependencies. Just encourage them to move the reloads to a convenience function.
It is possible that this does cause something to happen; the obvious example is side-effects that happen on import. For instance, a module could log to a file the time and date of every time it is imported.
There is probably no good reason for this, however.
The .pyc files would be created on the first import, so even that's not a very good reason for this.
What's the execution environment for this code? There exists at least one Python web framework that makes different reload decisions than standard python does, which leads to frustration and confusion when you make a change that doesn't 'take'.

Categories