In a slightly contrived experiment I wanted to compare some of Python's built-in functions to those of numpy. When I started timing these though, I found something bizarre.
When I wrote the following:
import timeit
timeit.timeit('import math; math.e**2', number=1000000)
I would get two different results, alternating almost at random but with a clearly significant gap between them.
The timing alternates between about 2 seconds and 0.5 seconds.
This confused me, so I ran some experiments to figure out what was going on, and ended up only more confused. I tried the following:
[timeit.timeit('import math; math.e**2', number=1000000) for i in xrange(100)]
which produced only the 0.5 second result. I then tried driving it from a generator expression:
test = (timeit.timeit('import math; math.e**2', number=1000000) for i in xrange(100))
[item for item in test]
which produced a list consisting entirely of the 2.0 second result.
On the suggestion of alecxe I changed my timeit statement to:
timeit.timeit('math.e**2', 'import math', number=1000000)
which similarly alternated between about 0.1 and 0.4 seconds. However, when I reran the experiment comparing generators and list comprehensions, the results were flipped: the generator expression regularly produced the 0.1 second result, while the list comprehension returned a list full of the 0.4 second result.
Direct console output:
>>> test = (timeit.timeit('math.e**2', 'import math', number=1000000) for i in xrange(100))
>>> test.next()
0.15114784240722656
>>> timeit.timeit('math.e**2', 'import math', number=1000000)
0.44176197052001953
>>>
Edit: I'm using Ubuntu 12.04 running dwm, and I've seen these results in both xterm and gnome-terminal. I'm using Python 2.7.3.
Does anybody know what's going on here? This seems really bizarre to me.
It turns out there were a couple of things happening here. Some of these quirks may be specific to my machine, but I figure it's worth posting them in case someone is puzzled by the same thing.
Firstly, there's a difference between the two timeit calls, in that in:
timeit.timeit('math.e**2', 'import math', number=1000000)
the import statement is lazily loaded. This becomes obvious if you try the following experiment:
timeit.timeit('1+1', 'import math', number=1000000)
versus:
timeit.timeit('1+1', number=1000000)
So when it was directly run in the list comprehension it looks like this import statement was being loaded for every entry. (Exact reasons for this are probably related to my configuration).
Past that, going back to the original question, it looks like 3/4 of the time was actually spent importing math, so I'm guessing that when the expression was run through the generator there was no caching between iterations, while the import was cached within the list comprehension (again, the exact reason for this is probably configuration-specific).
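A minimal sketch of how to separate the cost of the import statement from the cost of the expression, assuming Python 3 (the question used 2.7). timeit runs the setup string once per call and the statement string `number` times, so an import placed in the statement is re-executed on every loop iteration (as a lookup in sys.modules after the first time), while one placed in setup is not. The helper name `best_of` is hypothetical, and the loop count is smaller than in the question so the sketch runs quickly.

```python
import timeit

NUMBER = 100_000  # loop iterations per measurement (the question used 1000000)


def best_of(stmt, setup="pass", repeat=5, number=NUMBER):
    """Return the fastest of `repeat` measurements, in seconds."""
    return min(timeit.repeat(stmt, setup, repeat=repeat, number=number))


# The import statement is part of the timed loop here.
with_import = best_of("import math; math.e**2")

# The import runs once in setup; only the expression is timed.
without_import = best_of("math.e**2", setup="import math")

print(f"import inside stmt: {with_import:.4f} s")
print(f"import in setup:    {without_import:.4f} s")
```

Taking the minimum over several repeats reduces the influence of other processes, which may help distinguish a real difference from the kind of alternation described in the question.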
#This is my code
from sortedcontainers import SortedList, SortedSet, SortedDict
import timeit
import random
def test_speed1(data):
    SortedList(data)
def test_speed2(data):
    sorted_data = SortedList()
    for val in data:
        sorted_data.add(val)
data = []
numpts = 10 ** 5
for i in range(numpts):
    data.append(random.random())
print(f'Num of pts:{len(data)}')
sorted_data = SortedList()
n_runs=10
result = timeit.timeit(stmt='test_speed1(data)', globals=globals(), number=n_runs)
print(f'Speed1 is {1000*result/n_runs:0.0f}ms')
n_runs=10
result = timeit.timeit(stmt='test_speed2(data)', globals=globals(), number=n_runs)
print(f'Speed2 is {1000*result/n_runs:0.0f}ms')
The code for test_speed2 is supposed to take ~12 ms (I checked the setup they report). Why does it take 123 ms (10x slower)?
test_speed1 runs in 15 ms (which makes sense)
I am running in Conda.
This is where they outlined the performance
https://grantjenks.com/docs/sortedcontainers/performance.html
You are presumably not executing your benchmark in the same conditions as they do:
you are not using the same benchmark code,
you don't use the same computer with the same performance characteristics,
you are not using the same Python version and environment,
you are not running the same OS,
etc.
Hence, the benchmark results are not comparable and you cannot conclude anything about the performance (and certainly not that "sortedcontainers is too slow").
Performance is only meaningful relative to a given execution context, and they only stated that their solution is faster relative to other competing solutions.
If you really wish to execute the benchmark on your computer, follow the instructions they give in the documentation.
"init() uses Python’s highly optimized sorted() function while add() cannot." This is why test_speed1 is faster than test_speed2.
This is the answer I got from the developers of the sortedcontainers library.
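The difference between building a sorted collection in one call and building it by repeated insertion can be sketched with the standard library alone. This is only a stand-in, under the assumption that the contrast carries over: `sorted()` plays the role of the bulk constructor and `bisect.insort` on a plain list plays the role of `add()`. SortedList uses a different internal structure, so the absolute numbers here say nothing about sortedcontainers itself. The function names `build_bulk` and `build_incremental` are hypothetical.

```python
import bisect
import random
import timeit

random.seed(0)
data = [random.random() for _ in range(10**4)]


def build_bulk(values):
    """Sort everything in one call, as a bulk constructor can."""
    return sorted(values)


def build_incremental(values):
    """Insert one value at a time, keeping the list sorted after each step."""
    result = []
    for value in values:
        bisect.insort(result, value)
    return result


n_runs = 5
bulk = timeit.timeit(lambda: build_bulk(data), number=n_runs)
incremental = timeit.timeit(lambda: build_incremental(data), number=n_runs)
print(f"bulk:        {1000 * bulk / n_runs:.1f} ms")
print(f"incremental: {1000 * incremental / n_runs:.1f} ms")
```

Both functions return the same sorted list; only the route differs, which is the point of the developers' remark.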
I want to benchmark a bit of python code (not the language I am used to, but I have to do some comparisons in python). I have understood that timeit is a good tool for this, and I have a code like this:
n = 10
duration = timeit.Timer(my_func).timeit(number=n)
duration/n
to measure the mean runtime of the function. Now, I want to instead have the median time (the reason is that I want to make a comparison to something I get in median time, and it would be good to use the same measure in all cases). Now, timeit only seems to return the full runtime, and not the time of each individual run, so I am not sure how to find the median runtime. What is the best way to get this?
You can use the repeat method instead, which gives you the individual times as a list:
import timeit
from statistics import median
def my_func():
    for _ in range(1000000):
        pass
n = 10
durations = timeit.Timer(my_func).repeat(repeat=n, number=1)
print(median(durations))
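For a function that runs very quickly, a single call per measurement may be close to the timer's resolution. A possible variation, sketched here with a placeholder workload, is to run several calls per measurement and divide. Note that this yields the median of per-batch averages, not the median of individual calls; whether that is acceptable depends on what you are comparing against.

```python
import timeit
from statistics import median


def my_func():
    # Placeholder workload; substitute the function you want to measure.
    sum(range(1000))


repeat = 15   # number of measurements
number = 100  # calls per measurement
totals = timeit.Timer(my_func).repeat(repeat=repeat, number=number)
per_call = [total / number for total in totals]

print(f"median: {median(per_call):.3e} s")
print(f"min:    {min(per_call):.3e} s")
```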
The timeit module is great for measuring the execution time of small code snippets but when the code changes global state (like timeit) it's really hard to get accurate timings.
For example, if I want to time how long it takes to import a module, the first import will take much longer than subsequent imports, because by then the submodules and dependencies are already imported and the files are already cached. So using a larger number of repeats, like in:
>>> import timeit
>>> timeit.timeit('import numpy', number=1)
0.2819331711316805
>>> # Start a new Python session:
>>> timeit.timeit('import numpy', number=1000)
0.3035142574359181
doesn't really work, because the time for one execution is almost the same as for 1000 rounds. I could execute the command to "reload" the package:
>>> timeit.timeit('imp.reload(numpy)', 'import importlib as imp; import numpy', number=1000)
3.6543283935557156
But the fact that 1000 reloads take only about 10 times as long as the first import suggests this is not accurate either.
It also seems impossible to unload a module entirely ("Unload a module in Python").
So the question is: What would be an appropriate way to accurately measure the import time?
Since it's nearly impossible to fully unload a module, this answer takes a different approach: start a fresh interpreter for every measurement.
You could run a loop in a Python script that, x times, runs one python command importing numpy and another one doing nothing, then subtract the two totals and average:
import subprocess,time
n=100
python_load_time = 0
numpy_load_time = 0
for i in range(n):
    s = time.time()
    subprocess.call(["python","-c","import numpy"])
    numpy_load_time += time.time()-s
    s = time.time()
    subprocess.call(["python","-c","pass"])
    python_load_time += time.time()-s
print("average numpy load time = {}".format((numpy_load_time-python_load_time)/n))
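A variation on the same idea, as a sketch rather than a definitive method: it uses `sys.executable` so the child process is the same interpreter as the parent, `time.perf_counter()` for a higher-resolution clock, and the median instead of the mean to damp outliers. The helper names `time_command` and `import_time` are hypothetical. The OS file cache is still warm after the first run, so this measures a warm-cache import in a fresh process.

```python
import statistics
import subprocess
import sys
import time


def time_command(code, runs=5):
    """Median wall-clock time of running `code` in a fresh interpreter."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def import_time(module, runs=5):
    """Estimated import cost: fresh-process import minus bare startup."""
    return time_command(f"import {module}", runs) - time_command("pass", runs)


# `json` is used so the sketch runs without third-party packages;
# replace it with "numpy" to reproduce the measurement above.
estimate = import_time("json")
print(f"estimated import time: {estimate:.4f} s")
```

On Python 3.7 and later, running `python -X importtime -c "import numpy"` prints a per-module breakdown of import times to stderr, which may be more informative than a single number.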
While searching for some numpy stuff, I came across a question discussing the rounding accuracy of numpy.dot():
Numpy: Difference between dot(a,b) and (a*b).sum()
Since I happen to have two (different) computers with Haswell CPUs sitting on my desk, which should provide FMA and everything, I thought I'd test the example given by Ophion in the first answer, and I got a result that somewhat surprised me:
After updating/installing/fixing lapack/blas/atlas/numpy, I get the following on both machines:
>>> a = np.ones(1000, dtype=np.float128)+1e-14
>>> (a*a).sum()
1000.0000000000199999
>>> np.dot(a,a)
1000.0000000000199948
>>> a = np.ones(1000, dtype=np.float64)+1e-14
>>> (a*a).sum()
1000.0000000000198
>>> np.dot(a,a)
1000.0000000000176
So the standard multiplication + sum() is more precise than np.dot(). timeit however confirmed that the .dot() version is faster (but not much) for both float64 and float128.
Can anyone provide an explanation for this?
edit: I accidentally deleted the info on numpy versions: same results for 1.9.0 and 1.9.3 with python 3.4.0 and 3.4.1.
It looks like they recently added a special Pairwise Summation to ndarray.sum for improved numerical stability.
From PR 3685, this affects:
all add.reduce calls that go over float_add with IS_BINARY_REDUCE true
so this also improves mean/std/var and anything else that uses sum.
See here for code changes.
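To illustrate what pairwise summation means, here is a pure-Python sketch. It is not NumPy's actual implementation (which works on blocks of the array in C), and it does not reproduce what np.dot does internally, since that depends on the BLAS library in use. It only contrasts left-to-right accumulation with recursive halving, using `math.fsum` as a correctly rounded reference.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation; rounding error can grow with len(values)."""
    total = 0.0
    for value in values:
        total += value
    return total


def pairwise_sum(values, lo=0, hi=None):
    """Sum the two halves recursively; error grows roughly with log(len(values))."""
    if hi is None:
        hi = len(values)
    if hi - lo == 0:
        return 0.0
    if hi - lo == 1:
        return values[lo]
    mid = (lo + hi) // 2
    return pairwise_sum(values, lo, mid) + pairwise_sum(values, mid, hi)


x = 1.0 + 1e-14
squares = [x * x] * 1000        # same shape as the float64 example above
reference = math.fsum(squares)  # correctly rounded sum, for comparison

for name, func in (("naive", naive_sum), ("pairwise", pairwise_sum)):
    result = func(squares)
    print(f"{name:8s} {result!r}  error {abs(result - reference):.3e}")
```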
I have a piece of code which computes the Helmholtz-Hodge Decomposition.
I had been running it on my Mac (OS X Yosemite) and it was working just fine. A month ago, however, my Mac got pretty slow (it was really old), and I opted to buy a new notebook (a Dell running Windows 8.1).
After installing all Python libs and so on, I continued my work running this same code (versioned in Git). And then the result was pretty weird, completely different from the one obtained in the old notebook.
For instance, what I do is construct two matrices a and b (a really long calculation) and then call the solver:
s = numpy.linalg.solve(a, b)
This was returning a result that was wrong, and different from the one obtained on my Mac, which was right.
Then, I tried to use:
s = scipy.linalg.solve(a, b)
And the program exits with code 0, but in the middle of execution.
Then, I just made a simple test of:
print 'here1'
s = scipy.linalg.solve(a, b)
print 'here2'
And here2 is never printed.
I tried:
print 'here1'
x, info = numpy.linalg.cg(a, b)
print 'here2'
And the same happens.
I also tried to check the solution after using numpy.linalg.solve:
print numpy.allclose(numpy.dot(a, s), b)
And I got a False (?!).
I don't know what is happening or how to find a solution. I just know that the same code runs on my Mac, and it would be very good if I could run it on other platforms. Now I'm stuck on this problem (I don't have a Mac anymore), with no clue about the cause.
The weirdest thing is that I don't receive any error or runtime warning, no feedback at all.
Thank you for any help.
EDIT:
Numpy Test Suite Results:
Scipy Test Suite Results:
Download Anaconda package manager
http://continuum.io/downloads
When you download this it will already have all the dependencies for numpy worked out for you. It installs locally and will work on most platforms.
This is not really an answer, but this blog discusses at length the problems of having a numpy ecosystem that evolves fast, at the expense of reproducibility.
By the way, which version of numpy are you using? The documentation for the latest 1.9 does not report any method called cg as the one you use...
I suggest the use of this example so that you (and others) can check the results.
>>> import numpy as np
>>> import scipy.linalg
>>> np.random.seed(123)
>>> a = np.random.random(size=(10000, 10000))
>>> b = np.random.random(size=(10000,))
>>> s_np = np.linalg.solve(a, b)
>>> s_sc = scipy.linalg.solve(a, b)
>>> np.allclose(s_np,s_sc)
>>> s_np
array([-15.59186559, 7.08345804, 4.48174646, ..., -16.43310046,
-8.81301553, -10.77509242])
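A smaller variant of the same check, sketched under the assumption of Python 3 and a recent NumPy, that also reports two diagnostics: the relative residual of the computed solution and the condition number of the matrix. A large condition number would mean that small rounding differences between platforms can legitimately change the solution; a large residual on a well-conditioned matrix would instead point toward a problem with the installation. This is one way to narrow things down, not a diagnosis of the original problem.

```python
import numpy as np

rng = np.random.default_rng(123)
n = 200  # small enough to run quickly; the example above uses 10000
a = rng.random((n, n))
b = rng.random(n)

s = np.linalg.solve(a, b)

# Relative residual: how well does s actually satisfy a @ s = b?
residual = np.linalg.norm(a @ s - b) / np.linalg.norm(b)

# Condition number: large values mean small input or rounding
# differences can change the solution a lot.
cond = np.linalg.cond(a)

print(f"relative residual: {residual:.2e}")
print(f"condition number:  {cond:.2e}")
print("allclose:", np.allclose(a @ s, b))
```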
I hope you can find the answer - one option in the future is to create a virtual machine for each of your projects, using Docker. This allows easy portability.
See a great article here discussing Docker for research.