I'm struggling with the timeit function in Python, and, on a deeper level, I find myself very frustrated by the quirks of this function. I'm hoping I can get some help with both issues here.
I have a script (call it my_script.py) with a lot of different function definitions, and then a lot of other stuff being calculated below them all. I want to time only one of these functions in particular - let's call it level_99_function(x). I have a big array stored in my_input. My first attempt:
timeit.timeit('level_99_function(x)', setup='my_input')
Python returns the error: NameError: global name 'my_input' is not defined.
Now my second attempt is to do the following:
print timeit.timeit('level_99_function(x)', setup = 'import numpy as np; import my_script; x = np.linspace(0,100)')
This doesn't generate any errors, but the problem is two-fold. First, and most importantly, it still doesn't seem to time level_99_function (or maybe it just doesn't print the timer's output, for whatever reason?). Second, the import statement seems to run the entire script on import, which takes forever because of all the stuff I've got in this script aside from my level_99_function.
How do I get the timing of the function in question here? And on a more philosophical level, why is this such a struggle in Python? I've already got a variable and a function defined; all I want to do is time that function call with that variable. It would be nice to not have to write a super long line of code, or write multiple lines of code, or have to import things or any of that stuff. It's as easy as tic and toc in Matlab. I guess the corresponding Python commands would be to use 'time.clock()' before and after the function call, but I've read that this can be inaccurate and misleading.
You don't need to import numpy every time within setup; instead, you can import the function and variables you want from the current script with from __main__ import ..., as shown in the example below.
import timeit
import numpy as np

def func1(x):
    pass

def func2(x):
    pass

def func3(x):
    return np.array(x > 1000)

if __name__ == '__main__':
    x = np.arange(10000)
    time = timeit.timeit('func3(x)', setup='from __main__ import func3, x', number=1000)
    print(time)
The if __name__ == '__main__' block will prevent the code within the if statement from being run if you import the code from another script, meaning you won't accidentally run your timing tests when you import your functions.
This code only imports func3 and x. I'm only interested in func3 (not func1 and func2) and I've defined a value to test with (I call it x but it's equivalent to your my_input). You don't need to import numpy in this case.
I would, however, thoroughly advise you to take roippi's comment into consideration and use IPython. The %timeit magic is very, very useful.
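For illustration, here is a minimal sketch of an IPython or Jupyter session, assuming level_99_function and my_input are already defined in the session (the names simply mirror the question):

In [1]: %timeit level_99_function(my_input)
In [2]: %timeit -n 1000 -r 5 level_99_function(my_input)

The optional -n and -r flags control the number of loops and repeats if the defaults aren't what you want.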
As an FYI for the future:
I recently submitted a patch against issue2527, which was committed a few days ago to the default branch. So whenever 3.5 is publicly released, you can do this:
timeit.timeit('level_99_function(x)', globals=globals())
Not quite as awesome as IPython's %timeit, I know, but far better than the from __main__ import ... nonsense that you have to do right now. More info in the docs.
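As a rough, self-contained sketch of how that looks on 3.5+ (the function body below is just a stand-in for the real level_99_function):

import timeit
import numpy as np

def level_99_function(x):
    # Placeholder body standing in for the real, expensive function.
    return np.sum(x ** 2)

my_input = np.linspace(0, 100)

# globals() makes level_99_function and my_input visible to the timed
# statement, so no 'from __main__ import ...' setup string is needed.
elapsed = timeit.timeit('level_99_function(my_input)', globals=globals(), number=1000)
print(elapsed)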
I am working in a Google Colab notebook. There is one particular, computationally intensive piece of code that I'm implementing with Cython in the same notebook. Within this piece of code, I want to call a function (defined in another cell of the same notebook, in Python).
Now, that function is heavily integrated with the rest of my pure Python code and rewriting and redefining things for Cython would not be possible.
My question is: How do I call that function written in Python, from another cell that is getting compiled in Cython?
Link I have already looked at:
Call python file with python imports from C using cython
Normally, you would put the whole functionality into a module and import it in the %%cython-cell.
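For example, a rough sketch of that module approach, assuming the Python part is saved as a file the notebook can import (the file name helpers.py is made up):

# helpers.py -- lives next to the notebook
def foo():
    print("I'm module foo")

and then in the notebook:

[1]: %%cython
     from helpers import foo
     def usefoo():
         foo()

[2]: usefoo()
I'm module foo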
Another, less clean (but in the case of a notebook probably acceptable) way would be to import from __main__, e.g.:
[1]: def foo():
        print("I'm main foo")
and then:
[2]: %%cython
     def usefoo():
         from __main__ import foo
         foo()
and now:
[3]: usefoo()
I'm main foo
Another variant would be to import foo from __main__ slightly differently:
[2]: %%cython
     from __main__ import foo
     def usefoo2():
         foo()
There are two main differences:
If foo isn't (yet) defined in __main__, the second %%cython-cell will fail. The first version will fail if foo is not defined, or is no longer defined, at the time usefoo is called.
If foo is changed in __main__, the first version will use the current version, while the second version will always use the version from the moment the %%cython-cell was built (which might not be the same time the %%cython-cell is run, due to caching). This can be quite confusing.
In the long run, this approach is quite confusing and puzzling, so after a short try-out phase I would switch to a more sustainable approach using dedicated modules.
I know that when we do 'import module_name', then it gets loaded only once, irrespective of the number of times the code passes through the import statement.
But if we move the import statement into a function, then does the module get re-loaded on each function call? If not, then why is it good practice to import a module at the top of the file instead of inside a function?
Does this behavior change for a multi threaded or multi process app?
It does not get loaded every time.
Proof:
file.py:
print('hello')
file2.py:
def a():
    import file
a()
a()
Output:
hello
Then why put it at the top?
Because writing the imports inside a function will cause calls to that function to take longer.
I know that when we do 'import module_name', then it gets loaded only once, irrespective of the number of times the code passes through the import statement.
Right!
But if we move the import statement into a function, then for each function call does the module get re-loaded?
No. But if you want to, you can explicitly reload a module with something like this:
import importlib
importlib.reload(target_module)
If not, then why is it a good practice to import a module at the top of the file, instead of in function?
When Python imports a module, it first checks the module registry (sys.modules) to see if the module is already imported. If that’s the case, Python uses the existing module object as is.
Even though the module does not get reloaded, Python still has to check whether it has already been imported. So there is some extra, unnecessary work done each time the function is called.
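A small sketch that makes the sys.modules cache visible, using the standard math module:

import sys
import math

print('math' in sys.modules)        # True: the module object is now cached

import math                         # re-importing is just a dict lookup; the module code does not run again
print(sys.modules['math'] is math)  # True: same singleton module object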
It doesn't get reloaded after every function call and threading does not change this behavior. Here's how I tested it:
test.py:
print("Loaded")
testing.py:
import _thread
def call():
    import test

for i in range(10):
    call()

_thread.start_new_thread(call, ())
_thread.start_new_thread(call, ())
OUTPUT:
Loaded
To answer your second question: if you import the module at the top of the file, it is available to all functions within that Python file. This saves you from having to import the same module in each function that uses it.
I am a Python beginner, and am currently learning how to import modules in Python.
So my question is:
Suppose I currently have three Python files: module1.py, module2.py, and module3.py.
In module1.py:
def function1():
    print('Hello')
In module2.py, in order to use those functions in module1.py:
import module1

# Also, I have some other public functions in this .py file
def function2():
    print('Goodbye')

# Use the function in module1.py
if __name__ == '__main__':
    module1.function1()
    function2()
In module3.py, I would like to use both the functions from module1.py and module2.py.
import module1
import module2

def function3():
    print('Nice to meet you')

if __name__ == '__main__':
    module1.function1()
    function3()
    module2.function2()
It seems to work. But my questions are mainly about module3.py. In module3.py I imported both module1 and module2; however, module1 is already imported by module2. I am just wondering: is this a good way to code? Is it effective? Should I do this, or should I avoid it, and why?
Thank you so much. I am just a beginner, so if I ask stupid questions, please forgive me. Thank you!!
There will be no problem if you avoid circular imports, that is, you never import a module that itself imports the module doing the importing.
A module does not see the importer's namespace, so imports in the importing code don't become globals in the imported module.
Also, a module's top-level code will run on the first import only.
Edit 1:
I am answering Filipe's comments here because it's easier.
"There will be no problem if you avoid circular imports" -> This is incorrect, python is fine with circular imports for the most part."
The fact that you sensed some misconception of mine doesn't make that particular statement incorrect. It is correct, and it is good advice.
(Saying it's fine for the most part looks a bit like saying something will run fine most of the time...)
I see what you mean. I avoid it so much that I even thought your first example would give an error right away (it doesn't). You mean there is no need to avoid it because most of the time (actually, given certain conditions) Python handles it fine. I am also certain that there are cases where circular imports would be the easiest solution. That doesn't mean we should use them when we have a choice. That would promote a bad architecture, where every module starts depending on every other.
It also means the coder has to be aware of the caveats.
This link I found here on SO states some of the worries about circular imports.
The previous link is somewhat old, so the info may be outdated by newer Python versions, but import confusion is even older and still applies to 3.6.2.
The example you give works well because the relevant or initialization module code is wrapped in a function and will not run at import time. Protecting code with an if __name__ == "__main__": also keeps it from running when imported.
Something simple like this (the same example from effbot.org) won't work (remember OP says he is a beginner):
# file y.py
import x
x.func1()

# file x.py
import y

def func1():
    print('printing from x.func1')
On your second comment you say:
"This is also incorrect. An imported module will become part of the namespace"
Yes. But I didn't mention that, nor its contrary. I just said that an imported module's code doesn't know the namespace of the code making the import.
To eliminate the ambiguity I just meant this:
# w.py
def funcw():
    print(z_var)

# z.py
import w

z_var = 'foo'
w.funcw()  # error: z_var undefined in w module namespace
Now going further, to get the access we want, we go circular...
# w.py
import z  # go circular

def funcw():
    '''Notice that we gain access not to the z module that imported
    us but to the z module we import (yes, it's the same thing, but it
    carries a different namespace). So the reference we obtain
    points to a different object, because it really is in a
    different namespace.'''
    print(z.z_var, id(z.z_var))
...and we protect some code from running with the import:
# z.py
import w

z_var = ['foo']

if __name__ == '__main__':
    print(z_var, id(z_var))
    w.funcw()
By running z.py we confirm the objects are different (they can be the same with immutables, but that is Python interning, an internal optimization or implementation detail, at work):
['foo'] 139791984046856
['foo'] 139791984046536
Finally I agree with your third comment about being explicit with imports.
Anyway, thank you for your comments. I actually improved my understanding of the problem because of them (we don't learn much about something by just avoiding it).
Is there a significant difference in performance between importing a function versus declaring it in the current file in Python?
I have a small function (one-liner) that I use often in several .py files in my program. I wish to instead define it once so that changes I make to it are reflected everywhere. However, I am not sure whether using it as an imported function will add additional overhead when calling it...
I doubt there should be a difference between calling a locally declared function and calling an imported one. There is, however, a small difference between a line of code being executed directly and a function being called to execute the same code. This should help in case my wording was a bit confusing.
Hi Jet Blue, for a better understanding, have a look at the Python wiki's PerformanceTips:
import statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.
Consider the following two snippets of code (originally from Greg McFarlane, I believe - I found it unattributed in a comp.lang.python python-list@python.org posting and later attributed to him in another source):
def doit1():
    import string  ###### import statement inside function
    string.lower('Python')

for num in range(100000):
    doit1()
or:
import string  ###### import statement outside function

def doit2():
    string.lower('Python')

for num in range(100000):
    doit2()
doit2 will run much faster than doit1, even though the reference to the string module is global in doit2. Here's a Python interpreter session run using Python 2.3 and the new timeit module, which shows how much faster the second is than the first:
>>> def doit1():
...     import string
...     string.lower('Python')
...
>>> import string
>>> def doit2():
...     string.lower('Python')
...
>>> import timeit
>>> t = timeit.Timer(setup='from __main__ import doit1', stmt='doit1()')
>>> t.timeit()
11.479144930839539
>>> t = timeit.Timer(setup='from __main__ import doit2', stmt='doit2()')
>>> t.timeit()
4.6661689281463623
String methods were introduced to the language in Python 2.0. These provide a version that avoids the import completely and runs even faster:
def doit3():
    'Python'.lower()

for num in range(100000):
    doit3()
Here's the proof from timeit:
>>> def doit3():
...     'Python'.lower()
...
>>> t = timeit.Timer(setup='from __main__ import doit3', stmt='doit3()')
>>> t.timeit()
2.5606080293655396
The above example is obviously a bit contrived, but the general principle holds.
Note that putting an import in a function can speed up the initial loading of the module, especially if the imported module might not be required. This is generally a case of a "lazy" optimization -- avoiding work (importing a module, which can be very expensive) until you are sure it is required.
This is only a significant saving in cases where the module wouldn't have been imported at all (from any module) -- if the module is already loaded (as will be the case for many standard modules, like string or re), avoiding an import doesn't save you anything. To see what modules are loaded in the system look in sys.modules.
A good way to do lazy imports is:
email = None

def parse_email():
    global email
    if email is None:
        import email
This way the email module will only be imported once, on the first invocation of parse_email().
I build quite complex python apps, often with Django. To simplify inter-application interfaces I sometimes use service.py modules that abstract away from the models.
As these 'aggregate functionality', they frequently end up with circular imports which are easily eliminated by placing the import statements inside the service functions.
Is there a significant performance or memory cost associated with generally moving imports as close to their point of use as possible? For example, if I only use a particular imported name in one function in a file, it seems natural to place the import in that particular function rather than at the top of the file in its conventional place.
This issue is subtly different to this question because each import is in the function namespace.
The point at which you import a module is not expected to cause a performance penalty, if that's what you're worried about. Modules are singletons and will not be imported every single time an import statement is encountered. However, how you do the import, and subsequent attribute lookups, does have an impact.
For example, if you import math, then every time you need the sin(...) function you have to write math.sin(...). This will generally be slower than doing from math import sin and using sin(...) directly, because the interpreter does not have to keep looking up the function name within the module.
This lookup-penalty applies to anything that is accessed using the dot . and will be particularly noticeable in a loop. It's therefore advisable to get a local reference to something you might need to use/invoke frequently in a performance critical loop/section.
For example, using the original import math example, right before a critical loop, you could do something like this:
# ... within some function
sin = math.sin
for i in range(0, REALLY_BIG_NUMBER):
    x = sin(i)  # faster than: x = math.sin(i)
# ...
This is a trivial example, but note that you could do something similar with methods on other objects (e.g. lists, dictionaries, etc).
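For instance, a minimal sketch of the same trick with a list method (the names are made up):

results = []
append = results.append       # bind the bound method once, outside the loop
for i in range(1000000):
    append(i * i)             # avoids the repeated results.append attribute lookup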
I'm probably a bit more concerned about the circular imports you mention. If your intention is to "fix" circular imports by moving the import statements into more "local" places (e.g. within a specific function, or block of code, etc) you probably have a deeper issue that you need to address.
Personally, I'd keep the imports at the top of the module, as is normally done. Straying from that pattern for no good reason is likely to make your code more difficult to read, because the dependencies of your module will not be immediately apparent (i.e. there are import statements scattered throughout the code instead of in a single location).
It might also make the circular dependency issue you seem to be having more difficult to debug and easier to fall into. After all, if the module is not listed at the top, someone might happily think your module A has no dependency on module B and then end up adding an import A in B, when A already has an import B hidden in some deep dark corner.
Benchmark Sample
Here's a benchmark using the lookup notation:
>>> timeit('for i in range(0, 10000): x = math.sin(i)', setup='import math', number=50000)
89.7203312900001
And another benchmark not using the lookup notation:
>>> timeit('for i in range(0, 10000): x = sin(i)', setup='from math import sin', number=50000)
78.27029322999988
Here there's a 10+ second difference.
Note that your gain depends on how much time the program spends running this code, i.e. in a performance-critical section as opposed to sporadic function calls.
See this question.
Basically whenever you import a module, if it's been imported before it will use a cached value.
This means there will be a performance hit the first time the module is loaded, but once it has been loaded, it is cached and future imports of it are cheap.
As ray said, importing specific functions is slightly faster:
1.62852311134 for sin()
1.89815092087 for math.sin()
using the following code
from time import time
import math

sin = math.sin

t1 = time()
for i in xrange(10000000):
    x = sin(i)
t2 = time()
for i in xrange(10000000):
    z = math.sin(i)
t3 = time()

print(t2 - t1)
print(t3 - t2)
As per timeit, there is a significant cost to an import statement, even when the module is already imported in the same namespace:
$ python -m timeit -s 'import sys
def foo():
    import sys
    assert sys is not None
' -- 'foo()'
500000 loops, best of 5: 824 nsec per loop

$ python -m timeit -s 'import sys
def foo():
    assert sys is not None
' -- 'foo()'
2000000 loops, best of 5: 96.3 nsec per loop
(Timing figures from Python 3.10.6 on Termux running on a phone.)
Instead of imports within functions, I've found that I can take advantage of Python's support for partially initialized modules and do a "tail import", pushing the import statement to the very bottom of the file (with a # isort:skip comment so isort leaves it alone). This allows circular imports, as long as the tail-imported module is not needed at module or class level, only at function or method level.
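A minimal two-file sketch of that tail-import pattern, with made-up module names a and b:

# a.py
def use_b():
    # b is only needed when this function is actually called, by which
    # time the import at the bottom of this file has completed.
    return b.helper()

# Tail import: placed at the very bottom so that b can itself
# 'import a' while a is still only partially initialized.
import b  # isort:skip

# b.py
import a

def helper():
    return 'helper result'

Whichever of the two modules is imported first, calling a.use_b() at runtime works, because by then both modules have finished executing.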