How to use Python timeit to get the median runtime

I want to benchmark a bit of Python code (not the language I am used to, but I have to do some comparisons in Python). I have understood that timeit is a good tool for this, and I have code like this:

    n = 10
    duration = timeit.Timer(my_func).timeit(number=n)
    duration / n

to measure the mean runtime of the function. Now I want the median time instead (the reason is that the thing I am comparing against is reported as a median, and it would be good to use the same measure in all cases). However, timeit only seems to return the total runtime, not the time of each individual run, so I am not sure how to find the median runtime. What is the best way to get it?

You can use the repeat method instead, which gives you the individual times as a list:

    import timeit
    from statistics import median

    def my_func():
        for _ in range(1000000):
            pass

    n = 10
    durations = timeit.Timer(my_func).repeat(repeat=n, number=1)
    print(median(durations))
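If a single call is too fast to time reliably, you can run the function several times per repetition and divide each total back out; a small sketch of that (repeat() returns one total per repetition, covering number calls):

    # each entry is the total for 100 calls, so normalize before taking the median
    per_call = [t / 100 for t in timeit.Timer(my_func).repeat(repeat=n, number=100)]
    print(median(per_call))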

Related

Python sortedcontainers is too slow

This is my code:

    from sortedcontainers import SortedList
    import timeit
    import random

    def test_speed1(data):
        SortedList(data)

    def test_speed2(data):
        sorted_data = SortedList()
        for val in data:
            sorted_data.add(val)

    data = []
    numpts = 10 ** 5
    for i in range(numpts):
        data.append(random.random())
    print(f'Num of pts: {len(data)}')

    n_runs = 10
    result = timeit.timeit(stmt='test_speed1(data)', globals=globals(), number=n_runs)
    print(f'Speed1 is {1000 * result / n_runs:0.0f} ms')

    n_runs = 10
    result = timeit.timeit(stmt='test_speed2(data)', globals=globals(), number=n_runs)
    print(f'Speed2 is {1000 * result / n_runs:0.0f} ms')
The code for test_speed2 is supposed to take ~12 ms (I checked the setup they report). Why does it take 123 ms (10x slower)?
test_speed1 runs in 15 ms (which makes sense).
I am running in Conda.
This is where they outlined the performance: https://grantjenks.com/docs/sortedcontainers/performance.html
You are presumably not executing your benchmark under the same conditions as they do:
you are not using the same benchmark code,
you don't use the same computer with the same performance characteristics,
you are not using the same Python version and environment,
you are not running the same OS,
etc.
Hence, the benchmark results are not comparable and you cannot conclude anything about the performance (and certainly not that "sortedcontainers is too slow").
Performance is only relative to a given execution context, and they only stated that their solution is faster relative to competing solutions.
If you really wish to execute the benchmark on your computer, follow the instructions they give in the documentation.
"init() uses Python’s highly optimized sorted() function while add() cannot.". This is why the speed2 is faster than the speed3.
This is the answer I got from the developers on the sortedcontainers library.
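To see that init()-vs-add() gap in isolation, here is a minimal sketch using timeit.repeat and the median, along the lines of the first question above; the absolute numbers will of course vary by machine:

    import random
    import timeit
    from statistics import median
    from sortedcontainers import SortedList

    data = [random.random() for _ in range(10 ** 5)]

    def build_at_once():
        SortedList(data)      # init() path: one call to sorted()

    def add_one_by_one():
        sl = SortedList()
        for val in data:      # add() path: per-element insertion
            sl.add(val)

    for func in (build_at_once, add_one_by_one):
        t = median(timeit.Timer(func).repeat(repeat=5, number=1))
        print(f'{func.__name__}: {1000 * t:.1f} ms')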

How to omit methods in cProfile

When I profile code which heavily uses the all/any builtins I find that the call graph as produced by profiling tools (like gprof2dot) can be more difficult to interpret, because there are many edges from different callers leading to the all/any node, and many edges leading away. I'm looking for a way of essentially omitting the all/any nodes from the call graph such that in this very simplified example the two code paths would not converge and then split.
    import cProfile
    import random
    import time

    def bad_true1(*args, **kwargs):
        time.sleep(random.random() / 100)
        return True

    def bad_true2(*args, **kwargs):
        time.sleep(random.random() / 100)
        return True

    def code_path_one():
        nums = [random.random() for _ in range(100)]
        return all(bad_true1(x) for x in nums)

    def code_path_two():
        nums = [random.random() for _ in range(100)]
        return all(bad_true2(x) for x in nums)

    def do_the_thing():
        code_path_one()
        code_path_two()

    def main():
        # OmittingProfile is the profiler I wish existed: a cProfile.Profile
        # that skips the all/any builtins while collecting.
        profile = OmittingProfile()
        profile.enable()
        do_the_thing()
        profile.disable()
        profile.dump_stats("foo.prof")

    if "__main__" == __name__:
        main()
I don't think cProfile provides a way to filter out functions while collecting. You can probably filter functions out manually after you have collected the stats (a rough sketch of that follows), but that's more work than you may want to do.
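For what it's worth, here is a minimal sketch of that manual route, assuming the stats were dumped to foo.prof as above. Note that deleting a node this way drops its edges entirely rather than splicing callers through to callees, so treat it as a starting point rather than a finished solution:

    import pstats

    stats = pstats.Stats("foo.prof")
    omit = {"<built-in method builtins.all>", "<built-in method builtins.any>"}

    # stats.stats maps (filename, lineno, funcname) -> (cc, nc, tt, ct, callers)
    for func in [f for f in stats.stats if f[2] in omit]:
        del stats.stats[func]

    # also drop the omitted functions from every remaining callers dict
    for _, _, _, _, callers in stats.stats.values():
        for func in [f for f in list(callers) if f[2] in omit]:
            del callers[func]

    stats.dump_stats("foo_filtered.prof")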
Also, in my experience, when there are a lot of nested calls, cProfile is only really helpful for finding the "most time-consuming function", and that's it; there's no extra context, since cProfile only records a function's direct caller, not the whole call stack. For complicated programs, that may not be super helpful.
For the reasons above, I would recommend trying some other profiling tools, for example viztracer. VizTracer draws out your entire program execution so you know what happens in your program, and it happens to have the ability to filter out builtin functions.
    pip install viztracer
    # --ignore_c_function is the optional filter
    viztracer --ignore_c_function your_program.py
    vizviewer result.json
Of course, there are other statistical profilers that produce flame graphs, which contain less information but also introduce less overhead, like py-spy and Scalene.
cProfile is a good tool for some simple profiling, but it's definitely not the best tool on the market.

Accurate timing for imports in Python

The timeit module is great for measuring the execution time of small code snippets, but when the code changes global state (as an import does) it's really hard to get accurate timings.
For example, if I want to time how long it takes to import a module, the first import will take much longer than subsequent imports, because the submodules and dependencies are already imported and the files are already cached. So using a bigger number of repeats, as in:
    >>> import timeit
    >>> timeit.timeit('import numpy', number=1)
    0.2819331711316805
    >>> # Start a new Python session:
    >>> timeit.timeit('import numpy', number=1000)
    0.3035142574359181
doesn't really work, because the time for one execution is almost the same as for 1000 rounds. I could execute the command to "reload" the package:

    >>> timeit.timeit('imp.reload(numpy)', 'import importlib as imp; import numpy', number=1000)
    3.6543283935557156

But the fact that 1000 reloads take only about 10 times longer than the single first import suggests this isn't accurate either.
It also seems impossible to unload a module entirely ("Unload a module in Python").
So the question is: what would be an appropriate way to accurately measure the import time?
Since it's nearly impossible to fully unload a module, the idea behind this answer is to measure each import in a fresh interpreter process instead.
You could run a loop in a Python script that spawns, x times, a Python subprocess importing numpy and another one doing nothing, then subtract the two totals and average:
    import subprocess
    import time

    n = 100
    python_load_time = 0
    numpy_load_time = 0
    for i in range(n):
        s = time.time()
        subprocess.call(["python", "-c", "import numpy"])
        numpy_load_time += time.time() - s

        s = time.time()
        subprocess.call(["python", "-c", "pass"])
        python_load_time += time.time() - s

    print("average numpy load time = {}".format((numpy_load_time - python_load_time) / n))

Optimizing a multithreaded numpy array function

Given 2 large arrays of 3D points (I'll call the first "source" and the second "destination"), I needed a function that returns, for each element of "source", the index of its closest match in "destination", with this limitation: I can only use numpy... so no scipy, pandas, numexpr, cython...
To do this I wrote a function based on the "brute force" answer to this question. I iterate over the elements of source, find the closest element in destination, and return its index. Due to performance concerns, and again because I can only use numpy, I tried multithreading to speed it up. Here are both the threaded and unthreaded functions and how they compare in speed on an 8-core machine.
    import timeit
    import numpy as np
    from numpy.core.umath_tests import inner1d  # older NumPy; np.einsum('ij,ij->i', a, b) is the modern equivalent
    from multiprocessing.pool import ThreadPool

    def threaded(sources, destinations):
        # Define worker function
        def worker(point):
            dlt = (destinations - point)  # delta between destinations and given point
            d = inner1d(dlt, dlt)         # get squared distances
            return np.argmin(d)           # return closest index
        # Multithread!
        p = ThreadPool()
        return p.map(worker, sources)

    def unthreaded(sources, destinations):
        results = []
        for i in range(len(sources)):
            dlt = (destinations - sources[i])  # delta between destinations and given point
            d = inner1d(dlt, dlt)              # get squared distances
            results.append(np.argmin(d))       # append closest index
        return results

    # Set up the data
    n_destinations = 10000  # 10k random destinations
    n_sources = 10000       # 10k random sources
    destinations = np.random.rand(n_destinations, 3) * 100
    sources = np.random.rand(n_sources, 3) * 100

    # Compare!
    print('threaded: %s' % timeit.Timer(lambda: threaded(sources, destinations)).repeat(1, 1)[0])
    print('unthreaded: %s' % timeit.Timer(lambda: unthreaded(sources, destinations)).repeat(1, 1)[0])
Results:

    threaded: 0.894030461056
    unthreaded: 1.97295164054

Multithreading seems beneficial, but I was hoping for more than a 2x speedup, given that the real-life datasets I deal with are much larger.
Any recommendations to improve performance (within the limitations described above) will be greatly appreciated!
OK, I've been reading the Maya documentation on Python and I came to these conclusions/guesses:
They're probably using CPython inside (several references to that documentation and not any other).
They're not fond of threads (lots of non-thread-safe methods).
Given the above, I'd say it's better to avoid threads. Because of the GIL, this is a common problem, and there are several ways around it:
Try to build a C/C++ extension. Once that is done, use threads in C/C++. Personally, I'd only try to get SIP to work, and then move on.
Use multiprocessing (a sketch follows below). Even if your custom Python distribution doesn't include it, you can get to a working version since it's all pure Python code. multiprocessing is not affected by the GIL since it spawns separate processes.
The above should work out for you. If not, try another parallel tool (after some serious praying).
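Here is a minimal sketch of the multiprocessing route under the same numpy-only constraint. The worker must live at module level so it can be pickled; I've swapped inner1d for np.einsum, which computes the same row-wise dot products on current NumPy versions (note that shipping the destinations array inside the partial adds pickling overhead per chunk):

    import numpy as np
    from functools import partial
    from multiprocessing import Pool

    def worker(point, destinations):
        dlt = destinations - point           # deltas to every destination
        d = np.einsum('ij,ij->i', dlt, dlt)  # squared distances (same as inner1d)
        return np.argmin(d)                  # index of the closest destination

    def multiprocessed(sources, destinations):
        with Pool() as p:
            return p.map(partial(worker, destinations=destinations), list(sources))

    if __name__ == '__main__':
        destinations = np.random.rand(10000, 3) * 100
        sources = np.random.rand(10000, 3) * 100
        print(multiprocessed(sources, destinations)[:5])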
On a side note, if you're using outside modules, be mindful of matching Maya's Python version. This may be the reason you couldn't build scipy. Of course, scipy has a huge codebase, and the Windows platform is not the most resilient for building stuff.

timeit module hangs with bigger values of pow()

I am trying to measure the time taken by the pow function to calculate a modular exponentiation. With the values of g, x, p hardcoded the code gives an error, and with the values placed in the pow function, the code hangs. The same piece of code works fine when I use time() and clock() to measure the time taken.
I wanted accuracy, and for that I have now moved to the timeit module after testing with the clock() and time() functions.
The code works fine with small values such as pow(2, 3, 5), which makes sense. How can I get timeit to measure this efficiently?
I am also a beginner at Python, so forgive me if there is a stupid mistake in the code.
    import math
    import random
    import hashlib
    import time
    from timeit import Timer

    g = 141802876407053547664378835005750805370737584038368838959151050908654130616798415530564917923311706921535439557793280725844349256960807398107370211978304
    x = 1207729835787890214
    p = 4870352607375058055471602136317178172283784073796673298937466544646468718314482464390112574915498953621226853454222898392076852427324057496200810018794472

    t = Timer('pow(g,x,p)', 'import math')
    z = t.timeit()
    print('the value of z is:', z)
Thanks
There are two issues here:
You can't directly access globals from inside timeit (see this question). You can use this to fix the error:

    t = Timer('pow(g,x,p)', 'from __main__ import g,x,p')

Or just put the numerical values directly in the string.
By default, the timeit module runs 1,000,000 iterations, which will take much too long here. You can change the number of iterations, for example:

    z = t.timeit(1000)
This will prevent what seems like a hang (but is actually just a very long calculation).
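Putting both fixes together, a minimal self-contained sketch that reports the average time per call (the shortened values here are placeholders; use the real g, x, p):

    from timeit import Timer

    g = 141802876407053547664378835005750  # placeholder; use the full value
    x = 1207729835787890214
    p = 487035260737505805547160213631717  # placeholder; use the full value

    t = Timer('pow(g, x, p)', 'from __main__ import g, x, p')
    n = 1000
    print('average time per call:', t.timeit(number=n) / n)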
