Is there a performance cost to putting Python imports inside functions? - python

I build quite complex python apps, often with Django. To simplify inter-application interfaces I sometimes use service.py modules that abstract away from the models.
As these 'aggregate functionality', they frequently end up with circular imports which are easily eliminated by placing the import statements inside the service functions.
Is there a significant performance or memory cost associated with generally moving imports as close to their point of use as possible? For example, if I only use a particular imported name in one function in a file, it seems natural to place the import in that particular function rather than at the top of the file in its conventional place.
This issue is subtly different from this question because each import is in the function namespace.

The point at which you import a module is not expected to cause a performance penalty, if that's what you're worried about. Modules are singletons and will not be imported every single time an import statement is encountered. However, how you do the import, and subsequent attribute lookups, does have an impact.
For example, if you import math and then every time you need to use the sin(...) function you have to do math.sin(...), this will generally be slower than doing from math import sin and using sin(...) directly, because the latter avoids repeatedly looking up the function name within the module.
This lookup-penalty applies to anything that is accessed using the dot . and will be particularly noticeable in a loop. It's therefore advisable to get a local reference to something you might need to use/invoke frequently in a performance critical loop/section.
For example, using the original import math example, right before a critical loop, you could do something like this:
# ... within some function
sin = math.sin
for i in range(0, REALLY_BIG_NUMBER):
    x = sin(i)  # faster than: x = math.sin(i)
# ...
This is a trivial example, but note that you could do something similar with methods on other objects (e.g. lists, dictionaries, etc).
I'm probably a bit more concerned about the circular imports you mention. If your intention is to "fix" circular imports by moving the import statements into more "local" places (e.g. within a specific function, or block of code, etc) you probably have a deeper issue that you need to address.
Personally, I'd keep the imports at the top of the module as it's normally done. Straying from that pattern for no good reason is likely to make your code more difficult to go through, because the dependencies of your module will not be immediately apparent (i.e. there are import statements scattered throughout the code instead of in a single location).
It might also make the circular dependency issue you seem to be having more difficult to debug and easier to fall into. After all, if the module is not listed at the top, someone might happily think your module A has no dependency on module B and then end up adding an import A in B when A already has an import B hidden in some deep dark corner.
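To make this concrete, here's a minimal sketch of the failure mode, using two hypothetical modules a.py and b.py that import each other at the top level; running import a in a fresh interpreter fails because b tries to use a name that a has not defined yet:
# a.py
import b          # runs b.py before anything below is defined

def f():
    return "f"

# b.py
import a          # a is in sys.modules, but only partially initialized
print(a.f())      # AttributeError: module 'a' has no attribute 'f'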
Benchmark Sample
Here's a benchmark using the lookup notation:
>>> timeit('for i in range(0, 10000): x = math.sin(i)', setup='import math', number=50000)
89.7203312900001
And another benchmark not using the lookup notation:
>>> timeit('for i in range(0, 10000): x = sin(i)', setup='from math import sin', number=50000)
78.27029322999988
Here there's a 10+ second difference.
Note that your gain depends on how much time the program spends running this code, i.e. whether it is a performance-critical section rather than a handful of sporadic function calls.

See this question.
Basically, whenever you import a module, Python first checks its module cache (sys.modules); if the module has been imported before, the cached module object is reused.
This means you only pay the loading cost the first time the module is imported; once it's been loaded, subsequent import statements are just a cache lookup.
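A minimal sketch of the cache at work (assuming json hasn't been pulled in by anything else yet):
import sys

print('json' in sys.modules)  # False in a fresh interpreter
import json                   # first import: actually loads and executes the module
print('json' in sys.modules)  # True: the module object is now cached
import json                   # later imports are just a cache lookup plus a name binding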

As ray said, importing specific names directly is slightly faster:
1.62852311134 for sin()
1.89815092087 for math.sin()
using the following code:
import math
from time import time

sin = math.sin

t1 = time()
for i in range(10000000):
    x = sin(i)
t2 = time()
for i in range(10000000):
    z = math.sin(i)
t3 = time()

print(t2 - t1)
print(t3 - t2)

As per timeit, there is a significant cost to an import statement, even when the module is already imported in the same namespace:
$ python -m timeit -s 'import sys
def foo():
    import sys
    assert sys is not None
' -- 'foo()'
500000 loops, best of 5: 824 nsec per loop
$ python -m timeit -s 'import sys
def foo():
    assert sys is not None
' -- 'foo()'
2000000 loops, best of 5: 96.3 nsec per loop
(Timing figures from Python 3.10.6 on Termux running on a phone.)
Instead of imports within functions, I've found that I can take advantage of Python's support for partially initialized modules and do a "tail import", pushing the import statement to the very bottom of the file (with a # isort:skip comment to get isort to leave it alone). This allows circular imports, as long as the tail-imported module is needed only at function or method level, not at module or class level.
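For illustration, a minimal sketch of the pattern with two hypothetical modules a.py and b.py that call into each other only at function level:
# a.py
def a_helper():
    return "a"

def use_b():
    return b.b_helper()   # b is only needed at call time, not while a.py is executing

import b  # isort:skip -- tail import: everything above is already defined by now

# b.py
import a  # safe even while a.py is mid-import: a's functions exist before its tail import runs

def b_helper():
    return a.a_helper() + "b"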

Related

Python - Performance difference between importing a function and locally declaring it?

Is there a significant difference in performance between importing a function versus declaring it in the current file in Python?
I have a small function (one-liner) that I use often in several .py files in my program. I wish to instead define it once so that changes I make to it are reflected everywhere. However, I am not sure whether using it as an imported function will add additional overhead when calling it...
I doubt there is a difference between calling a locally declared function and calling an imported one. There is, however, a small difference between executing a line of code inline and calling a function that executes the same code, due to function-call overhead. I hope that helps in case my wording was confusing.
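For example (a minimal sketch of that difference, using a hypothetical one-liner function):
import timeit

# the same expression, executed inline vs. through a tiny function call
print(timeit.timeit('x * 2', setup='x = 3'))
print(timeit.timeit('f(x)', setup='x = 3\ndef f(v): return v * 2'))
The second timing is slower purely because of the function-call overhead.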
Hi Jet Blue, for a better understanding have a look at the Python wiki's PerformanceTips page:
import statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.
Consider the following two snippets of code (originally from Greg McFarlane, I believe - I found it unattributed in a comp.lang.python python-list#python.org posting and later attributed to him in another source):
def doit1():
    import string  ###### import statement inside function
    string.lower('Python')

for num in range(100000):
    doit1()

or:

import string  ###### import statement outside function

def doit2():
    string.lower('Python')

for num in range(100000):
    doit2()
doit2 will run much faster than doit1, even though the reference to the string module is global in doit2. Here's a Python interpreter session run using Python 2.3 and the new timeit module, which shows how much faster the second is than the first:
def doit1():
    import string
    string.lower('Python')

import string
def doit2():
    string.lower('Python')

import timeit
t = timeit.Timer(setup='from __main__ import doit1', stmt='doit1()')
t.timeit()
11.479144930839539
t = timeit.Timer(setup='from __main__ import doit2', stmt='doit2()')
t.timeit()
4.6661689281463623
String methods were introduced to the language in Python 2.0. These provide a version that avoids the import completely and runs even faster:
def doit3():
    'Python'.lower()

for num in range(100000):
    doit3()
Here's the proof from timeit:
def doit3():
    'Python'.lower()

t = timeit.Timer(setup='from __main__ import doit3', stmt='doit3()')
t.timeit()
2.5606080293655396
The above example is obviously a bit contrived, but the general principle holds.
Note that putting an import in a function can speed up the initial loading of the module, especially if the imported module might not be required. This is generally a case of a "lazy" optimization -- avoiding work (importing a module, which can be very expensive) until you are sure it is required.
This is only a significant saving in cases where the module wouldn't have been imported at all (from any module) -- if the module is already loaded (as will be the case for many standard modules, like string or re), avoiding an import doesn't save you anything. To see what modules are loaded in the system look in sys.modules.
A good way to do lazy imports is:
email = None

def parse_email():
    global email
    if email is None:
        import email
This way the email module will only be imported once, on the first invocation of parse_email().
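An alternative sketch (my addition, not from the wiki) avoids the global statement by going through importlib; after the first call this is just a sys.modules lookup:
import importlib

def parse_email(raw):
    email = importlib.import_module('email')  # cached in sys.modules after the first call
    return email.message_from_string(raw)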

Is there any way to speed up an import?

I have a CLI application that requires sympy. The speed of the CLI application matters - it's used a lot in a user feedback loop.
However, simply doing import sympy takes a full second. This gets incredibly annoying in a tight feedback loop. Is there any way to 'preload' or optimize a module when a script is run again without a change to the module?
Obviously sympy does a lot when being imported. It could be initialization of internal data structures or similar. You could call this a flaw in the design of the sympy library.
Your only choice in this case would be to avoid redoing this initialization.
I assume that you find this behavior annoying because you intend to do it often. I propose to avoid doing it often. A way to achieve this could be to create a server which is started just once, imports sympy upon its startup, and then offers a service (via interprocess communication) which allows you to do whatever you want to do with sympy.
If this could be an option for you, I could elaborate on how to do this.
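To give the idea some shape, here's a minimal sketch of such a server using multiprocessing.connection; the address, authkey, and the idea of sending expression strings to sympy.sympify are illustrative assumptions, not a prescribed design:
# sympy_server.py -- started once; pays the sympy import cost a single time
from multiprocessing.connection import Listener
import sympy

with Listener(('localhost', 6000), authkey=b'sympy') as listener:
    while True:
        with listener.accept() as conn:
            expr = conn.recv()                   # e.g. the string "x**2 + x"
            conn.send(str(sympy.sympify(expr)))  # evaluate and send the result back

# client.py -- the fast CLI side; no sympy import needed here
from multiprocessing.connection import Client

with Client(('localhost', 6000), authkey=b'sympy') as conn:
    conn.send('x**2 + x')
    print(conn.recv())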
I took a look at what happens when you run import sympy, and it imports all of sympy.
https://github.com/sympy/sympy/blob/master/sympy/__init__.py
If you are only using certain parts of sympy, then only import those parts that you need.
It would be nice if you could do this:
import sympy.sets
But (as you point out) that imports sympy and then sets.
One solution is to write your own importer. You can do this with the help of the imp module (deprecated since Python 3.4 in favor of importlib).
import imp
sets = imp.load_module("sets", open("sympy/sets/__init__.py"), "sympy/sets/__init__.py", ('.py', 'U', 1))
But, even that may not optimize enough. Taking a look at sympy/sets/__init__.py I see that it does this:
from .sets import (Set, Interval, Union, EmptySet, FiniteSet, ProductSet,
                   Intersection, imageset, Complement, SymmetricDifference)
from .fancysets import TransformationSet, ImageSet, Range, ComplexRegion
from .contains import Contains
from .conditionset import ConditionSet
Maybe you can import only the sets module from the sympy.sets namespace?
import imp
sets = imp.load_module("sets", open("sympy/sets/sets.py"), "sympy/sets/sets.py", ('.py', 'U', 1))
You should test whether importing only the modules that you are using in the code improves the loading time, i.e.:
from sympy import mod1, mod2, mod3
vs
import sympy
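A minimal sketch for such a test (run each variant in a fresh interpreter, since modules are cached after the first import):
import time

t0 = time.perf_counter()
import sympy  # swap in the variant under test, e.g. "from sympy import sets"
print(f"import took {time.perf_counter() - t0:.3f}s")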
You should read these previous questions:
Python import X or from X import Y? (performance)
improving speed of Python module import
'import module' vs. 'from module import function'

Python import modules in another file

I'm currently refactoring a project (formerly one big file) into several separate Python files, each of which runs a specific part of my application.
E.g., GUIthread.py runs the GUI, Computethread.py does some maths, etc.
Each thread includes the use of functions from imported modules like math, time, numpy, etc etc.
I already have a file globalClasses.py containing class definitions for my datatypes etc, which each .py file imports at the start, as per the recommendation here: http://effbot.org/pyfaq/how-do-i-share-global-variables-across-modules.htm . This is working well.
What I would like to do is have all my 3rdparty module imports in the globals file as well, so that I can write, for example, import math once but have all of my project files able to use math functions.
Questions:
1. Is this possible?
2. Is it a good idea/is it good Python practice?
My current solution is just to put
import math
import time
import numpy
...
(plus imports for all the other modules I'm using as well)
at the top of every file in my project... But that doesn't seem very tidy, and it's easy to forget to move a dependency's import statement when moving code-chunks from file to file...
Yeah, I guess there is a more elegant way of doing this which will save redundant lines of code. Suppose you want to import some modules, say math, time, and numpy. You can then create a file, say importing_modules.py, and import the various modules there as from module_name import *, so importing_modules.py may look something like this:
importing_modules.py
from math import *
from numpy import *
from time import *
main.py
from importing_modules import *
#Now you can call the methods of that module directly
print(sqrt(25))  # Now we can call sqrt() directly in place of math.sqrt()
The other answer shows that what you want is (sort of) possible, but didn't address your second question about good practice.
Using import * is almost invariably considered bad practice. See "Why is import * bad?" and "Importing * from a package" from the docs.
Remember from PEP 20 that explicit is better than implicit. With explicit, specific imports (e.g. from math import sqrt) in every module, there is never confusion about where a name came from, your module's namespace includes only what it needs, and bugs are prevented.
The downside of having to write a couple import statements per module does not outweigh the potential problems introduced by trying to get around writing them.

Struggling with Python timeit

I'm struggling with the timeit function in Python, and, on a deeper level, I find myself very frustrated by the quirks of this function. I'm hoping I can get some help with both issues here.
I have a script (call it my_script.py) with a lot of different function definitions, and then a lot of other stuff being calculated below them all. I want to time only one of these functions in particular - let's call it level_99_function(x). I have a big array stored in my_input. My first attempt:
timeit.timeit('f1(x)', setup = 'my_input')
Python returns the error: NameError: global name 'angle' is not defined.
Now my second attempt is to do the following:
print timeit.timeit('level_99_function(x)', setup = 'import numpy as np; import my_script.py; x= np.linspace(0,100)')
This doesn't generate any errors, but the problem is two-fold. First, and most importantly, it still doesn't time level_99_function (or maybe it just doesn't print the timer's output for whatever reason?). Second, the import statement seems to run the entire script on import, which takes forever because of all the stuff I've got in this script aside from my level_99_function.
How do I get the timing of the function in question here? And on a more philosophical level, why is this such a struggle in Python? I've already got a variable and a function defined; all I want to do is time that function call with that variable. It would be nice to not have to write a super long line of code, or write multiple lines of code, or have to import things or any of that stuff. It's as easy as tic and toc in Matlab. I guess the corresponding Python commands would be to use 'time.clock()' before and after the function call, but I've read that this can be inaccurate and misleading.
You don't need to import numpy every time within setup; instead you can import the function and variables you want from the current script with from __main__ import ..., as shown in the example below.
import timeit
import numpy as np

def func1(x):
    pass

def func2(x):
    pass

def func3(x):
    return np.array(x > 1000)

if __name__ == '__main__':
    x = np.arange(10000)
    time = timeit.timeit('func3(x)', setup='from __main__ import func3, x', number=1000)
    print(time)
The if __name__ == '__main__' block will prevent the code within the if statement from being run if you import the code from another script, meaning you won't accidentally run your timing tests when you import your functions.
This code only imports func3 and x. I'm only interested in func3 (not func1 and func2) and I've defined a value to test with (I call it x but it's equivalent to your my_input). You don't need to import numpy in this case.
I would however completely and utterly advise you to take roippi's comment into consideration and use IPython. The %timeit magic command is very, very useful.
As an FYI for the future:
I recently submitted a patch against issue2527, which was committed a few days ago to the default branch. So whenever 3.5 is publicly released, you can do this:
timeit.timeit('level_99_function(x)', globals=globals())
Not quite as awesome as IPython's %timeit, I know, but far better than the from __main__ import ... nonsense that you have to do right now. More info in the docs.
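For example (a sketch; the function body here is a hypothetical stand-in for whatever you actually want to time):
import timeit

def level_99_function(x):
    return sum(i * i for i in range(x))  # hypothetical stand-in

x = 1000

# Python 3.5+: timeit resolves names against the globals you pass in
print(timeit.timeit('level_99_function(x)', globals=globals(), number=1000))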

What are good rules of thumb for Python imports?

I am a little confused by the multitude of ways in which you can import modules in Python.
import X
import X as Y
from A import B
I have been reading up about scoping and namespaces, but I would like some practical advice on what is the best strategy, under which circumstances and why. Should imports happen at a module level or a method/function level? In the __init__.py or in the module code itself?
My question is not really answered by "Python packages - import by class, not file" although it is obviously related.
In production code in our company, we try to follow the following rules.
We place imports at the beginning of the file, right after the main file's docstring, e.g.:
"""
Registry related functionality.
"""
import wx
# ...
Now, if we import a class that is one of few in the imported module, we import the name directly, so that in the code we only have to use the last part, e.g.:
from RegistryController import RegistryController
from ui.windows.lists import ListCtrl, DynamicListCtrl
There are modules, however, that contain dozens of classes, e.g. list of all possible exceptions. Then we import the module itself and reference to it in the code:
from main.core import Exceptions
# ...
raise Exceptions.FileNotFound()
We use the import X as Y as rarely as possible, because it makes searching for usage of a particular module or class difficult. Sometimes, however, you have to use it if you wish to import two classes that have the same name, but exist in different modules, e.g.:
from Queue import Queue
from main.core.MessageQueue import Queue as MessageQueue
As a general rule, we don't do imports inside methods -- they simply make code slower and less readable. Some may find this a good way to easily resolve cyclic import problems, but a better solution is code reorganization.
Let me just paste a part of a conversation on the django-dev mailing list started by Guido van Rossum:
[...]
For example, it's part of the Google Python style guides[1] that all
imports must import a module, not a class or function from that
module. There are way more classes and functions than there are
modules, so recalling where a particular thing comes from is much
easier if it is prefixed with a module name. Often multiple modules
happen to define things with the same name -- so a reader of the code
doesn't have to go back to the top of the file to see from which
module a given name is imported.
Source: http://groups.google.com/group/django-developers/browse_thread/thread/78975372cdfb7d1a
1: http://code.google.com/p/soc/wiki/PythonStyleGuide#Module_and_package_imports
I would normally use import X on module level. If you only need a single object from a module, use from X import Y.
Only use import X as Y in case you're otherwise confronted with a name clash.
I only use imports on function level to import stuff I need when the module is used as the main module, like:
def main():
    import sys
    if len(sys.argv) > 1:
        pass
HTH
Someone above said that
from X import A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P
is equivalent to
import X
import X lets you see later rebindings of A-P, because every use goes through the module object. from X import ... copies the current bindings of A-P into your namespace: you do not see updates if the names are rebound in X, and if you rebind them yourself, you only change your own binding -- X does not know about your modifications.
If A-P are functions that are never rebound, you won't notice the difference.
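You can see the difference with any module attribute; a minimal sketch using math.pi:
import math
from math import pi as pi_at_import_time

math.pi = 3.0                # rebind the attribute on the module object
print(math.pi)               # 3.0 -- visible to everyone who looks through the module
print(pi_at_import_time)     # 3.141592653589793 -- the earlier binding is unaffected
math.pi = pi_at_import_time  # restore the original value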
Others have covered most of the ground here but I just wanted to add one case where I will use import X as Y (temporarily), when I'm trying out a new version of a class or module.
So if we were migrating to a new implementation of a module, but didn't want to cut the code base over all at one time, we might write a xyz_new module and do this in the source files that we had migrated:
import xyz_new as xyz
Then, once we cut over the entire code base, we'd just replace the xyz module with xyz_new and change all of the imports back to
import xyz
DON'T do this:
from X import *
unless you are absolutely sure that you will use each and every thing in that module. And even then, you should probably reconsider using a different approach.
Other than that, it's just a matter of style.
from X import Y
is good and saves you lots of typing. I tend to use that when I'm using something from the module fairly frequently. But if you're importing a lot from that module, you could end up with an import statement that looks like this:
from X import A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P
You get the idea. That's when imports like
import X
become useful. Either that or if I'm not really using anything in X very frequently.
I generally try to use the regular import modulename, unless the module name is long, or used often..
For example, I would do..
from BeautifulSoup import BeautifulStoneSoup as BSS
..so I can do soup = BSS(html) instead of BeautifulSoup.BeautifulStoneSoup(html)
Or..
from xmpp import XmppClientBase
..instead of importing the entire of xmpp when I only use the XmppClientBase
Using import x as y is handy if you want to shorten very long module or class names, or to prevent clobbering an existing import/variable/class/method (something you should try to avoid entirely, but it's not always possible).
Say I want to run a main() function from another script, but I already have a main() function..
from my_other_module import main as other_module_main
..wouldn't replace my main function with my_other_module's main
Oh, one thing - don't do from x import * - it makes your code very hard to understand, as you cannot easily see where a method came from (from x import *; from y import *; my_func() - where is my_func defined?)
In all cases, you could just do import modulename and then do modulename.subthing1.subthing2.method("test")...
The from x import y as z stuff is purely for convenience - use it whenever it'll make your code easier to read or write!
When you have a well-written library, which is sometimes the case in Python, you ought to just import it and use it as it is. A well-written library tends to take on a life and language of its own, resulting in pleasant-to-read code where you rarely reference the library. When a library is well-written, you ought not need renaming or anything else too often.
import gat
node = gat.Node()
child = node.children()
Sometimes it's not possible to write code this way, or you want to lift names out of the library you imported.
from gat import Node, SubNode
node = Node()
child = SubNode(node)
Sometimes you do this for a lot of things. If your import line overflows 80 columns, it's a good idea to do this:
from gat import (
    Node, SubNode, TopNode, SuperNode, CoolNode,
    PowerNode, UpNode
)
The best strategy is to keep all of these imports at the top of the file, preferably ordered alphabetically, import statements first, then from ... import statements.
Now I tell you why this is the best convention.
Python could perfectly well have had an automatic import that looked among the main imports for a value when it can't be found in the global namespace. But this is not a good idea, and I'll briefly explain why. Aside from being more complicated to implement than a simple import, it would mean programmers don't think as much about their dependencies, and finding out where something was imported from would have to be done some other way than just looking at the imports.
The need to find out dependencies is one reason why people hate "from ... import *". Some cases where you genuinely need it do exist though, for example the OpenGL wrappings.
So the import statements are actually valuable as a definition of the program's dependencies, and that is how you should exploit them: from them you can quickly check where some unfamiliar function is imported from.
The import X as Y is useful if you have different implementations of the same module/class.
With some nested try: import ... except ImportError: blocks you can hide the implementation choice from your code. See this lxml etree import example:
try:
    from lxml import etree
    print("running with lxml.etree")
except ImportError:
    try:
        # Python 2.5
        import xml.etree.cElementTree as etree
        print("running with cElementTree on Python 2.5+")
    except ImportError:
        try:
            # Python 2.5
            import xml.etree.ElementTree as etree
            print("running with ElementTree on Python 2.5+")
        except ImportError:
            try:
                # normal cElementTree install
                import cElementTree as etree
                print("running with cElementTree")
            except ImportError:
                try:
                    # normal ElementTree install
                    import elementtree.ElementTree as etree
                    print("running with ElementTree")
                except ImportError:
                    print("Failed to import ElementTree from any known place")
I'm with Jason on not using
from X import *
But in my case (I'm not an expert programmer, so my code doesn't follow the coding style too well) I usually have a file in my programs with all the constants, like the program version, authors, error messages and all that stuff, so the file contains just definitions. Then I make the import
from const import *
That saves me a lot of time. But it's the only file that has that import, and that's because everything inside that file is just variable declarations. Doing that kind of import in a file with classes and function definitions might be useful, but when you have to read that code you spend lots of time locating functions and classes.
