Calling function using Timeit - python

I'm trying to time several things in Python, including upload time to Amazon's S3 cloud storage, and am having a little trouble. I can time my hash and a few other things, but not the upload. I thought this post would finally get me there, but I can't seem to find salvation. Any help would be appreciated. I'm very new to Python, thanks!
import timeit
import boto
from boto.s3.key import Key
accKey = r"xxxxxxxxxxx"
secKey = r"yyyyyyyyyyyyyyyyyyyyyyyyy"
bucket_name = 'sweet_data'
c = boto.connect_s3(accKey, secKey)
b = c.get_bucket(bucket_name)
k = Key(b)
p = '/my/aws.path'
f = 'C:\\my.file'
def upload_data(p, f):
    k.key = p
    k.set_contents_from_filename(f)
    return
t = timeit.Timer(lambda: upload_data(p, f), "from aws_lib import upload_data; p=%r; f = %r" % (p,f))
# Just calling the function works fine
#upload_data(p, f)

I know this is heresy in the Python community, but I actually recommend not to use timeit, especially for something like this. For your purposes, I believe it will be good enough (and possibly even better than timeit!) if you simply use time.time() to time things. In other words, do something like
from time import time
t0 = time()
myfunc()
t1 = time()
print t1 - t0
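Note that depending on your platform, you might want to try time.clock() instead (see Stack Overflow questions such as this and this). If you're on Python 3.3 or later, you have better options thanks to PEP 418, such as time.perf_counter(); time.clock() itself was deprecated in Python 3.3 and removed in Python 3.8.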
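As a rough sketch of the manual-timing approach on Python 3.3+, the helper below wraps a single call with time.perf_counter(). The names timed_call and fake_upload are made up for illustration; fake_upload only sleeps and stands in for the real upload_data(p, f).

```python
import time

def timed_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def fake_upload(path, filename):
    # Hypothetical stand-in for upload_data(p, f); sleeps instead of talking to S3.
    time.sleep(0.05)
    return "%s <- %s" % (path, filename)

result, seconds = timed_call(fake_upload, "/my/aws.path", "C:\\my.file")
print("upload took %.3f s" % seconds)
```

Because an upload has side effects and is dominated by network time, a single wall-clock measurement like this is usually more meaningful than many repetitions.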

You can use the command line interface to timeit.
Just save your code as a module without the timing stuff. For example:
# file: test.py
data = range(5)
def foo(l):
    return sum(l)
Then you can run the timing code from the command line, like this:
$ python -mtimeit -s 'import test;' 'test.foo(test.data)'
See also:
http://docs.python.org/2/library/timeit.html#command-line-interface
http://docs.python.org/2/library/timeit.html#examples
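The same kind of measurement can also be made from inside Python by handing timeit a callable, which avoids the setup string and its imports altogether. This is only a sketch: slow_operation is a hypothetical stand-in for a call with side effects (such as an upload), and number=1 is chosen so that each measurement runs the call exactly once.

```python
import timeit

def slow_operation():
    # Hypothetical stand-in for a call with side effects, such as an upload.
    return sum(range(100000))

# A callable can be passed directly, so no setup string or import is needed.
timer = timeit.Timer(slow_operation)

# repeat=3, number=1: run the call once per measurement, three measurements.
timings = timer.repeat(repeat=3, number=1)
print("best of 3: %.6f s" % min(timings))
```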

Related

Accurate timing for imports in Python

The timeit module is great for measuring the execution time of small code snippets, but when the code changes global state (as an import does) it's really hard to get accurate timings.
For example, if I want to time how long it takes to import a module, the first import will take much longer than subsequent imports, because by then the submodules and dependencies are already imported and the files are already cached. So using a bigger number of repeats, like in:
>>> import timeit
>>> timeit.timeit('import numpy', number=1)
0.2819331711316805
>>> # Start a new Python session:
>>> timeit.timeit('import numpy', number=1000)
0.3035142574359181
doesn't really work, because the time for one execution is almost the same as for 1000 rounds. I could execute the command to "reload" the package:
>>> timeit.timeit('imp.reload(numpy)', 'import importlib as imp; import numpy', number=1000)
3.6543283935557156
But the fact that this is only about 10 times slower than the first import suggests it's not accurate either.
It also seems impossible to unload a module entirely ("Unload a module in Python").
So the question is: what would be an appropriate way to accurately measure the import time?
Since it's nearly impossible to fully unload a module, this answer starts a fresh interpreter for each measurement instead.
You could run a loop in a Python script that runs, x times, one Python command importing numpy and another one doing nothing, then subtract the two totals and average:
import subprocess,time
n=100
python_load_time = 0
numpy_load_time = 0
for i in range(n):
    s = time.time()
    subprocess.call(["python","-c","import numpy"])
    numpy_load_time += time.time()-s
    s = time.time()
    subprocess.call(["python","-c","pass"])
    python_load_time += time.time()-s
print("average numpy load time = {}".format((numpy_load_time-python_load_time)/n))
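A possible variant of the loop above, wrapped in reusable functions. The assumptions here are mine, not the answer's: it launches sys.executable so the same interpreter is measured, it uses time.perf_counter() (Python 3.3+), and it uses the standard-library json module as the example so that it runs without numpy installed. The function names are hypothetical.

```python
import subprocess
import sys
import time

def interpreter_time(code, runs):
    """Total wall-clock seconds to run `code` in `runs` fresh interpreters."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.check_call([sys.executable, "-c", code])
        total += time.perf_counter() - start
    return total

def average_import_time(module_name, runs=5):
    """Average import cost with bare interpreter start-up subtracted."""
    with_import = interpreter_time("import " + module_name, runs)
    baseline = interpreter_time("pass", runs)
    return (with_import - baseline) / runs

print("average json import time = %.4f s" % average_import_time("json"))
```

For a cheap module the result can be noisy or even slightly negative, because process start-up time varies more than the import itself; a larger runs value smooths this out.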

Debugging: Get filename and line number from which a function is called?

I'm currently building quite a complex system in Python, and when I'm debugging I often put simple print statements in several scripts. To keep an overview I often also want to print out the file name and line number where the print statement is located. I can of course do that manually, or with something like this:
from inspect import currentframe, getframeinfo
print getframeinfo(currentframe()).filename + ':' + str(getframeinfo(currentframe()).lineno) + ' - ', 'what I actually want to print out here'
Which prints something like:
filenameX.py:273 - what I actually want to print out here
To make it more simple, I want to be able to do something like:
print debuginfo(), 'what I actually want to print out here'
So I put it into a function somewhere and tried doing:
from debugutil import debuginfo
print debuginfo(), 'what I actually want to print out here'
print debuginfo(), 'and something else here'
Unfortunately, I get:
debugutil.py:3 - what I actually want to print out here
debugutil.py:3 - and something else here
It prints out the file name and line number on which I defined the function, instead of the line on which I call debuginfo(). This is obvious, because the code is located in the debugutil.py file.
So my question is actually: How can I get the filename and line number from which this debuginfo() function is called?
The function inspect.stack() returns a list of frame records, starting with the current frame and moving outward through its callers, which you can use to get the information you want:
from inspect import getframeinfo, stack

def debuginfo(message):
    caller = getframeinfo(stack()[1][0])
    print("%s:%d - %s" % (caller.filename, caller.lineno, message)) # python3 syntax print

def grr(arg):
    debuginfo(arg) # <-- stack()[1][0] for this line

grr("aargh") # <-- stack()[2][0] for this line
Output:
example.py:8 - aargh
If you put your trace code in another function, and call that from your main code, then you need to make sure you get the stack information from the grandparent, not the parent or the trace function itself.
Below is an example of a 3-level-deep system to further clarify what I mean. My main function calls a trace function, which calls yet another function to do the work.
######################################
import sys, os, inspect, time
time_start = 0.0 # initial start time
def trace_library_init():
    global time_start
    time_start = time.time() # when the program started
def trace_library_do(relative_frame, msg=""):
    global time_start
    time_now = time.time()
    # relative_frame is 0 for current function (this one),
    # 1 for direct parent, or 2 for grand parent..
    total_stack = inspect.stack() # total complete stack
    total_depth = len(total_stack) # length of total stack
    frameinfo = total_stack[relative_frame][0] # info on rel frame
    relative_depth = total_depth - relative_frame # length of stack there
    # Information on function at the relative frame number
    func_name = frameinfo.f_code.co_name
    filename = os.path.basename(frameinfo.f_code.co_filename)
    line_number = frameinfo.f_lineno # of the call
    func_firstlineno = frameinfo.f_code.co_firstlineno
    fileline = "%s:%d" % (filename, line_number)
    time_diff = time_now - time_start
    print("%13.6f %-20s %-24s %s" % (time_diff, fileline, func_name, msg))
################################
def trace_do(msg=""):
    trace_library_do(1, "trace within interface function")
    trace_library_do(2, msg)
    # any common tracing stuff you might want to do...
################################
def main(argc, argv):
    rc=0
    trace_library_init()
    for i in range(3):
        trace_do("this is at step %i" %i)
        time.sleep((i+1) * 0.1) # in 1/10's of a second
    return rc
rc=main(len(sys.argv), sys.argv)
sys.exit(rc)
This will print something like:
$ python test.py
0.000005 test.py:39 trace_do trace within interface func
0.001231 test.py:49 main this is at step 0
0.101541 test.py:39 trace_do trace within interface func
0.101900 test.py:49 main this is at step 1
0.302469 test.py:39 trace_do trace within interface func
0.302828 test.py:49 main this is at step 2
The trace_library_do() function at the top is an example of something that you can drop into a library, and then call from other tracing functions. The relative_frame argument controls which entry in the Python stack gets printed.
I showed pulling out a few other interesting values in that function, like the line number of start of the function, the total stack depth, and the full path to the file. I didn't show it, but the global and local variables in the function are also available in inspect, as well as the full stack trace to all other functions below yours. There is more than enough information with what I am showing above to make hierarchical call/return timing traces. It's actually not that much further to creating the main parts of your own source level debugger from here -- and it's all mostly just sitting there waiting to be used.
I'm sure someone will object that I'm using internal fields of the data returned by the inspect structures, as there may well be access functions that do the same thing for you. But I found them by stepping through this type of code in a Python debugger, and they work at least here. I'm running Python 2.7.12; your results might vary if you are running a different version.
In any case, I strongly recommend that you import the inspect module into some Python code of your own and look at what it can provide you, especially if you can single-step through your code in a good Python debugger. You will learn a lot about how Python works, and get to see both the benefits of the language and what is going on behind the curtain to make that possible.
Full source level tracing with timestamps is a great way to enhance your understanding of what your code is doing, especially in more of a dynamic real time environment. The great thing about this type of trace code is that once it's written, you don't need debugger support to see it.
An update to the accepted answer using string interpolation and displaying the caller's function name.
import inspect
def debuginfo(message):
    caller = inspect.getframeinfo(inspect.stack()[1][0])
    print(f"{caller.filename}:{caller.function}:{caller.lineno} - {message}")
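If building the whole stack list feels heavy for a debug print, one lighter-weight sketch is to step back a single frame with inspect.currentframe().f_back. This swaps in a different technique from the answers above, and it assumes CPython, since currentframe() may return None on implementations without frame support. Here debuginfo returns the formatted string instead of printing it, purely to make it easy to check.

```python
import inspect
import os

def debuginfo(message):
    """Format message prefixed with the caller's file name and line number."""
    caller = inspect.currentframe().f_back  # frame that called debuginfo()
    filename = os.path.basename(caller.f_code.co_filename)
    return "%s:%d - %s" % (filename, caller.f_lineno, message)

def grr(arg):
    return debuginfo(arg)

print(grr("aargh"))
```

The reported line is the one inside grr where debuginfo is called, not the line where debuginfo is defined.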
The traceprint package can now do that for you:
import traceprint
def func():
    print(f'Hello from func')
func()
# File "/traceprint/examples/example.py", line 6, in <module>
# File "/traceprint/examples/example.py", line 4, in func
# Hello from func
PyCharm will automatically make the file link clickable / followable.
Install via pip install traceprint.
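Just put the code you posted into a function, taking the frame of the caller (f_back) rather than the current frame:
from inspect import currentframe, getframeinfo
def my_custom_debuginfo(message):
    caller = getframeinfo(currentframe().f_back) # f_back is the caller's frame, not this function's
    print caller.filename + ':' + str(caller.lineno) + ' - ', message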
and then use it as you want:
# ... some code here ...
my_custom_debuginfo('what I actually want to print out here')
# ... more code ...
I recommend you put that function in a separate module, that way you can reuse it every time you need it.
Discovered this question for a somewhat related problem, but I wanted more details re: the execution (and I didn't want to install an entire call graph package).
If you want more detailed information, you can retrieve a full traceback with the standard library module traceback, and either stash the stack object (a list of tuples) with traceback.extract_stack() or print it out with traceback.print_stack(). This was more suitable for my needs, hope it helps someone else!
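A minimal sketch of the traceback approach, assuming you only want the immediate caller. The function names are made up for illustration. Each entry returned by traceback.extract_stack() can be indexed as (filename, line number, function name, source text), and the last entry is the frame that called extract_stack() itself.

```python
import traceback

def where_am_i_called_from():
    """Return (filename, lineno, function_name) of the caller."""
    stack = traceback.extract_stack()
    # stack[-1] is this function; stack[-2] is the frame that called it.
    caller = stack[-2]
    return caller[0], caller[1], caller[2]

def worker():
    return where_am_i_called_from()

filename, lineno, func_name = worker()
print("%s:%d in %s" % (filename, lineno, func_name))
```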

Is it REALLY true that Python code runs faster in a function?

I saw a comment that led me to the question Why does Python code run faster in a function?.
I got to thinking, and figured I would try it myself using the timeit library; however, I got very different results:
(note: 10**8 was changed to 10**7 to make things a little bit speedier to time)
>>> from timeit import repeat
>>> setup = """
def main():
    for i in xrange(10**7):
        pass
"""
>>> stmt = """
for i in xrange(10**7):
    pass
"""
>>> min(repeat('main()', setup, repeat=7, number=10))
1.4399558753975725
>>> min(repeat(stmt, repeat=7, number=10))
1.4410973942722194
>>> 1.4410973942722194 / 1.4399558753975725
1.000792745732109
Did I use timeit correctly?
Why are these results less than 0.1% different from each other, while the results from the other question were nearly 250% different?
Does it only make a difference when using CPython compiled versions of Python (like Cython)?
Ultimately: is Python code really faster in a function, or does it just depend on how you time it?
The flaw in your test is the way timeit compiles the code of your stmt. It's actually compiled within the following template:
template = """
def inner(_it, _timer):
    %(setup)s
    _t0 = _timer()
    for _i in _it:
        %(stmt)s
    _t1 = _timer()
    return _t1 - _t0
"""
Thus stmt is actually running in a function, where the loop variable is a local stored in the fastlocals array (i.e. STORE_FAST) instead of being stored by name in the global dictionary.
Here's a test with your function in the question as f_opt versus the unoptimized compiled stmt executed in the function f_no_opt (timeit and types need to be imported first):
>>> code = compile(stmt, '<string>', 'exec')
>>> f_no_opt = types.FunctionType(code, globals())
>>> t_no_opt = min(timeit.repeat(f_no_opt, repeat=10, number=10))
>>> t_opt = min(timeit.repeat(f_opt, repeat=10, number=10))
>>> t_opt / t_no_opt
0.4931101445632647
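To see the difference directly rather than through timings, the following Python 3 sketch disassembles the same loop compiled once as module-level code and once inside a function, and lists the store opcodes each version uses. The helper name store_opcodes is hypothetical. On CPython the module-level version is expected to use STORE_NAME and the function version STORE_FAST, though the exact opcode names are an implementation detail and can differ between versions.

```python
import dis

LOOP_SOURCE = "for i in range(10):\n    pass\n"

# Compiled as module-level code, the way a script's top level is compiled.
module_code = compile(LOOP_SOURCE, "<module-level>", "exec")

# The same loop inside a function body.
def loop_in_function():
    for i in range(10):
        pass

def store_opcodes(code):
    """Names of the opcodes that store a variable in the given code object."""
    return sorted({ins.opname for ins in dis.get_instructions(code)
                   if ins.opname.startswith("STORE_")})

print("module level:", store_opcodes(module_code))
print("in function: ", store_opcodes(loop_in_function.__code__))
```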
It comes down to compiler optimization algorithms. When performing Just-in-time compilation, it is much easier to identify frequently used chunks of code if they're found in functions.
The efficiency gains really would depend on the nature of the tasks being performed. In the example you gave, you aren't really doing anything computationally intensive, leaving fewer opportunities to achieve gains in efficiency through optimization.
As others have pointed out, however, CPython does not do just-in-time compilation. When code is compiled, however, C compilers will often execute them faster.
Check out this document on the GCC compiler: http://gcc.gnu.org/onlinedocs/gcc/Inline.html

Python: how to run several scripts (or functions) at the same time under windows 7 multicore processor 64bit

Sorry for this question, because there are already several examples on Stack Overflow. I am writing in order to clarify some of my doubts, because I am quite new to the Python language.
I wrote a function:
def clipmyfile(inFile,poly,outFile):
    ... # doing something with inFile and poly and return outFile
Normally I do this:
clipmyfile(inFile="File1.txt",poly="poly1.shp",outFile="res1.txt")
clipmyfile(inFile="File2.txt",poly="poly2.shp",outFile="res2.txt")
clipmyfile(inFile="File3.txt",poly="poly3.shp",outFile="res3.txt")
......
clipmyfile(inFile="File21.txt",poly="poly21.shp",outFile="res21.txt")
I had read in this example, Run several python programs at the same time, that I can use (but I am probably wrong)
from multiprocessing import Pool
p = Pool(21) # like in your example, running 21 separate processes
to run the function calls at the same time and speed up my analysis.
To be honest, I didn't understand the next step.
Thanks in advance for any help and suggestions
Gianni
The map that is used in the example you provided only works for functions that receive one argument. You can see a solution to this here: Python multiprocessing pool.map for multiple arguments
In your case what you would do is (assuming you have 3 lists with files, polies, outs):
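from multiprocessing import Pool

def expand_args(f_p_o):
    # Unpack one (inFile, poly, outFile) tuple into clipmyfile's arguments
    return clipmyfile(*f_p_o)

if __name__ == '__main__': # required on Windows, where each worker re-imports this module
    files = ["file1.txt", "file2.txt"]
    polis = ["poli1.txt", "poly2.txt"]
    outis = ["out1.txt", "out2.txt"]
    p = Pool()
    p.map(expand_args, zip(files, polis, outis))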
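On Python 3.3 or later, Pool.starmap unpacks each argument tuple itself, so the expand_args wrapper is not needed. The sketch below rests on a few assumptions: clipmyfile is a placeholder that only formats a string, build_jobs is a hypothetical helper, and the file names follow the File1.txt / poly1.shp / res1.txt pattern from the question.

```python
from multiprocessing import Pool

def clipmyfile(inFile, poly, outFile):
    # Hypothetical stand-in for the real clipping function from the question.
    return "%s + %s -> %s" % (inFile, poly, outFile)

def build_jobs(count):
    """Build the (inFile, poly, outFile) argument tuples for files 1..count."""
    return [("File%d.txt" % i, "poly%d.shp" % i, "res%d.txt" % i)
            for i in range(1, count + 1)]

if __name__ == "__main__":  # guard needed on Windows so workers can import this file
    jobs = build_jobs(21)
    with Pool() as pool:  # worker count defaults to the number of CPUs
        results = pool.starmap(clipmyfile, jobs)
    print(results[0])
```

Leaving Pool() at its default size rather than Pool(21) is a deliberate choice here: for CPU-bound work, more processes than cores rarely helps.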

timeit module hangs with bigger values of pow()

I am trying to measure the time taken by the pow function to compute a modular exponentiation. With the values of g, x, p stored in variables, the code gives an error, and with the values placed directly in the pow call, the code hangs. The same piece of code works fine when I use time() and clock() to measure the time taken.
I wanted accuracy, and for that I have now moved to the timeit module after testing with the clock() and time() functions.
The code works fine with small values such as pow(2, 3, 5), which makes sense. How can I improve the efficiency of timing this with the timeit module?
Also, I am a beginner in Python, so forgive me if there is any stupid mistake in the code.
import math
import random
import hashlib
import time
from timeit import Timer
g = 141802876407053547664378835005750805370737584038368838959151050908654130616798415530564917923311706921535439557793280725844349256960807398107370211978304
x = 1207729835787890214
p = 4870352607375058055471602136317178172283784073796673298937466544646468718314482464390112574915498953621226853454222898392076852427324057496200810018794472
t = Timer('pow(g,x,p)', 'import math')
z = t.timeit()
print ('the value of z is: '), z
Thanks
There are two issues here:
1. You can't directly access globals from timeit: see this question. You can use this to fix the error:
   t = Timer('pow(g,x,p)', 'from __main__ import g,x,p')
   Or just put the numerical values directly in the string.
2. By default, the timeit module runs 1000000 iterations, which will take much too long here. You can change the number of iterations, for example:
   z = t.timeit(1000)
   This will prevent what seems like a hang (but is actually just a very long calculation).
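Putting both fixes together, a sketch for Python 3.5 or later might look like the following. It uses the globals argument of timeit.Timer as an alternative to the from __main__ import setup string, and the iteration count of 1000 is just the example value from above.

```python
import timeit

# Values copied from the question.
g = 141802876407053547664378835005750805370737584038368838959151050908654130616798415530564917923311706921535439557793280725844349256960807398107370211978304
x = 1207729835787890214
p = 4870352607375058055471602136317178172283784073796673298937466544646468718314482464390112574915498953621226853454222898392076852427324057496200810018794472

# globals=globals() (Python 3.5+) lets the statement see g, x and p directly.
timer = timeit.Timer('pow(g, x, p)', globals=globals())

number = 1000  # far fewer than the default of 1000000 iterations
total = timer.timeit(number)
print('%d calls took %.6f s (%.3e s per call)' % (number, total, total / number))
```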
