I benchmarked these two functions (they unzip pairs back into source lists, came from here):
n = 10**7
a = list(range(n))
b = list(range(n))
pairs = list(zip(a, b))
def f1(a, b, pairs):
    a[:], b[:] = zip(*pairs)
def f2(a, b, pairs):
    for i, (a[i], b[i]) in enumerate(pairs):
        pass
Results with timeit.timeit (five rounds, numbers are seconds):
f1 1.06 f2 1.57
f1 0.96 f2 1.69
f1 1.00 f2 1.85
f1 1.11 f2 1.64
f1 0.95 f2 1.63
So clearly f1 is a lot faster than f2, right?
But then I also measured with timeit.default_timer and got a completely different picture:
f1 7.28 f2 1.92
f1 5.34 f2 1.66
f1 6.46 f2 1.70
f1 6.82 f2 1.59
f1 5.88 f2 1.63
So clearly f2 is a lot faster, right?
Sigh. Why do the timings totally differ like that, and which timing method should I believe?
Full benchmark code:
from timeit import timeit, default_timer
n = 10**7
a = list(range(n))
b = list(range(n))
pairs = list(zip(a, b))
def f1(a, b, pairs):
    a[:], b[:] = zip(*pairs)
def f2(a, b, pairs):
    for i, (a[i], b[i]) in enumerate(pairs):
        pass
print('timeit')
for _ in range(5):
    for f in f1, f2:
        t = timeit(lambda: f(a, b, pairs), number=1)
        print(f.__name__, '%.2f' % t, end=' ')
    print()
print('default_timer')
for _ in range(5):
    for f in f1, f2:
        t0 = default_timer()
        f(a, b, pairs)
        t = default_timer() - t0
        print(f.__name__, '%.2f' % t, end=' ')
    print()
As Martijn commented, the difference is Python's garbage collection, which timeit.timeit disables during its run. And zip creates 10 million iterator objects, one for each of the 10 million iterables it's given.
So, garbage-collecting 10 million objects simply takes a lot of time, right? Mystery solved!
Well... no. That's not really what happens, and it's way more interesting than that. And there's a lesson to be learned to make such code faster in real life.
Python's main way to discard objects no longer needed is reference counting. The garbage collector, which is being disabled here, is for reference cycles, which the reference counting won't catch. And there aren't any cycles here, so it's all discarded by reference counting and the garbage collector doesn't actually collect any garbage.
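To make the split between reference counting and the cycle collector concrete, here is a minimal sketch. It assumes CPython, and the Node class and variable names are made up for illustration. With the collector disabled, an object without cycles is still freed immediately, while two objects that reference each other stay alive until the collector runs:

```python
import gc
import weakref

class Node:
    """Plain object that supports weak references (hypothetical helper)."""

gc.disable()  # from here on, only reference counting frees objects

# No cycle: the object is freed as soon as its last reference goes away.
plain = Node()
plain_ref = weakref.ref(plain)
del plain
freed_without_gc = plain_ref() is None

# A cycle: the two objects keep each other's reference count above zero.
left, right = Node(), Node()
left.other, right.other = right, left
cycle_ref = weakref.ref(left)
del left, right
cycle_survives_refcounting = cycle_ref() is not None

gc.enable()
gc.collect()  # the cycle collector finds the unreachable pair and frees it
cycle_freed_by_gc = cycle_ref() is None

print(freed_without_gc, cycle_survives_refcounting, cycle_freed_by_gc)
```

The weak references are only there to observe whether the objects still exist without keeping them alive.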
Let's look at a few things. First, let's reproduce the much faster time by disabling the garbage collector ourselves.
Common setup code (all further blocks of code should be run directly after this in a fresh run, don't combine them):
import gc
from timeit import default_timer as timer
n = 10**7
a = list(range(n))
b = list(range(n))
pairs = list(zip(a, b))
Timing with garbage collection enabled (the default):
t0 = timer()
a[:], b[:] = zip(*pairs)
t1 = timer()
print(t1 - t0)
I ran it three times, took 7.09, 7.03 and 7.09 seconds.
Timing with garbage collection disabled:
t0 = timer()
gc.disable()
a[:], b[:] = zip(*pairs)
gc.enable()
t1 = timer()
print(t1 - t0)
Took 0.96, 1.02 and 0.99 seconds.
So now we know it's indeed the garbage collection that somehow takes most of the time, even though it's not collecting anything.
Here's something interesting: Already just the creation of the zip iterator is responsible for most of the time:
t0 = timer()
z = zip(*pairs)
t1 = timer()
print(t1 - t0)
That took 6.52, 6.51 and 6.50 seconds.
Note that I kept the zip iterator in a variable, so there isn't even anything to discard yet, neither by reference counting nor by garbage collecting!
What?! Where does the time go, then?
Well... as I said, there are no reference cycles, so the garbage collector won't actually collect any garbage. But the garbage collector doesn't know that! In order to figure that out, it needs to check!
Since the iterators could become part of a reference cycle, they're registered for garbage collection tracking. Let's see how many more objects get tracked due to the zip creation (doing this just after the common setup code):
gc.collect()
tracked_before = len(gc.get_objects())
z = zip(*pairs)
print(len(gc.get_objects()) - tracked_before)
The output: 10000003 new objects tracked. I believe that's the zip object itself, its internal tuple to hold the iterators, its internal result holder tuple, and the 10 million iterators.
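Whether a single object is tracked can also be checked with gc.is_tracked. The following is a small sketch, assuming CPython; the pair variable is just an example value:

```python
import gc

pair = (1, 2)

tracked = {
    "int": gc.is_tracked(42),                      # atomic, cannot be part of a cycle
    "list": gc.is_tracked([]),                     # container, tracked
    "tuple iterator": gc.is_tracked(iter(pair)),   # what zip creates per iterable
    "zip object": gc.is_tracked(zip(pair, pair)),  # the zip iterator itself
}

for name, is_tracked in tracked.items():
    print(name, is_tracked)
```

Objects that cannot hold references to other objects, like ints, are not tracked, which is why the 10 million ints in the lists don't add to the collector's workload.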
Ok, so the garbage collector tracks all these objects. But what does that mean? Well, every now and then, after a certain number of new object creations, the collector goes through the tracked objects to see whether some are garbage and can be discarded. The collector keeps three "generations" of tracked objects. New objects go into generation 0. If they survive a collection run there, they're moved into generation 1. If they survive a collection there, they're moved into generation 2. If they survive further collection runs there, they remain in generation 2. Let's check the generations before and after:
gc.collect()
print('collections:', [stats['collections'] for stats in gc.get_stats()])
print('objects:', [len(gc.get_objects(i)) for i in range(3)])
z = zip(*pairs)
print('collections:', [stats['collections'] for stats in gc.get_stats()])
print('objects:', [len(gc.get_objects(i)) for i in range(3)])
Output (each line shows values for the three generations):
collections: [13111, 1191, 2]
objects: [17, 0, 13540]
collections: [26171, 2378, 20]
objects: [317, 2103, 10011140]
The 10011140 shows that most of the 10 million iterators were not just registered for tracking, but are already in generation 2. So they were part of at least two garbage collection runs. And the number of generation 2 collections went up from 2 to 20, so our millions of iterators were part of up to 20 garbage collection runs (two to get into generation 2, and up to 18 more while already in generation 2). We can also register a callback to count more precisely:
checks = 0
def count(phase, info):
    if phase == 'start':
        global checks
        checks += len(gc.get_objects(info['generation']))
gc.callbacks.append(count)
z = zip(*pairs)
gc.callbacks.remove(count)
print(checks)
That told me 63,891,314 checks total (i.e., on average, each iterator was part of over 6 garbage collection runs). That's a lot of work. And all this just to create the zip iterator, before even using it.
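How often collections get triggered is governed by thresholds that can be inspected with gc.get_threshold. As a rough sketch at a much smaller scale (200,000 empty lists instead of 10 million iterators, and the exact counts depend on the Python version), this shows collection runs happening just because many tracked objects are created and kept alive:

```python
import gc

gc.enable()  # make sure the collector is active for this experiment
print("thresholds:", gc.get_threshold())

def collections_per_generation():
    # One 'collections' counter per generation, as reported by gc.get_stats().
    return [stats["collections"] for stats in gc.get_stats()]

before = collections_per_generation()
keep_alive = [[] for _ in range(200_000)]  # many tracked objects, no garbage
after = collections_per_generation()

print("collections before:", before)
print("collections after: ", after)
```

None of those lists are garbage, yet the collector still runs repeatedly to check them.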
Meanwhile, the loop
for i, (a[i], b[i]) in enumerate(pairs):
    pass
creates almost no new objects at all. Let's check how much tracking enumerate causes:
gc.collect()
tracked_before = len(gc.get_objects())
e = enumerate(pairs)
print(len(gc.get_objects()) - tracked_before)
Output: 3 new objects tracked (the enumerate iterator object itself, the single iterator it creates for iterating over pairs, and the result tuple it'll use (code here)).
I'd say that answers the question "Why do the timings totally differ like that?". The zip solution creates millions of objects that go through multiple garbage collection runs, while the loop solution doesn't. So disabling the garbage collector helps the zip solution tremendously, while the loop solution doesn't care.
Now about the second question: "Which timing method should I believe?". Here's what the documentation has to say about it (emphasis mine):
By default, timeit() temporarily turns off garbage collection during the timing. The advantage of this approach is that it makes independent timings more comparable. The disadvantage is that GC may be an important component of the performance of the function being measured. If so, GC can be re-enabled as the first statement in the setup string. For example:
timeit.Timer('for i in range(10): oct(i)', 'gc.enable()').timeit()
In our case here, the cost of garbage collection doesn't stem from some other unrelated code. It's directly caused by the zip call. And you do pay this price in reality, when you run that. So in this case, I do consider it an "important component of the performance of the function being measured". To directly answer the question as asked: Here I'd believe the default_timer method, not the timeit method. Or put differently: Here the timeit method should be used with garbage collection enabled, as suggested in the documentation.
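Applied to the zip case, that could look like the following sketch. It uses a much smaller n than the benchmark so it finishes quickly, which also means the difference will be far less dramatic; the unzip function is a stand-in for f1:

```python
import gc
from timeit import timeit

n = 10**5  # far smaller than the 10**7 above, just to keep the run short
pairs = list(zip(range(n), range(n)))

def unzip():
    # Stand-in for f1: unzip the pairs into two tuples.
    a, b = zip(*pairs)
    return a, b

# Default behaviour: timeit disables the collector while timing.
t_gc_off = timeit(unzip, number=3)

# Re-enable the collector in the setup statement, as the documentation suggests.
t_gc_on = timeit(unzip, setup="gc.enable()", number=3)

print("gc disabled: %.4f s" % t_gc_off)
print("gc enabled:  %.4f s" % t_gc_on)
```

The setup string works without an extra import because timeit evaluates it in its own module namespace, where gc is available, just like in the documentation's example.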
Or... alternatively, we could actually disable garbage collection as part of the solution (not just for benchmarking):
def f1(a, b, pairs):
    gc.disable()
    a[:], b[:] = zip(*pairs)
    gc.enable()
But is that a good idea? Here's what the gc documentation says:
Since the collector supplements the reference counting already used in Python, you can disable the collector if you are sure your program does not create reference cycles.
Sounds like it's an ok thing to do. But I'm not sure I don't create reference cycles elsewhere in my program, so I finish with gc.enable() to turn garbage collection back on after I'm done. At that point, all those temporary objects have already been discarded thanks to reference counting. So all I'm doing is avoiding lots of pointless garbage collection checks. I find this a valuable lesson and I might actually do that in the future, if I know I only temporarily create a lot of objects.
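One possible way to package this is a small context manager. This is only a sketch: the name gc_paused is made up, and unlike the function above it restores the previous collector state instead of always enabling it, and it does so even if the body raises:

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    """Temporarily disable the cyclic garbage collector (hypothetical helper)."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Only re-enable if the collector was on before we paused it.
        if was_enabled:
            gc.enable()

pairs = list(zip(range(1000), range(1000)))
a = list(range(1000))
b = list(range(1000))

with gc_paused():
    enabled_inside = gc.isenabled()
    a[:], b[:] = zip(*pairs)

enabled_after = gc.isenabled()
print(enabled_inside, enabled_after)
```

Restoring the earlier state matters if the caller had already disabled the collector on purpose.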
Finally, I highly recommend reading the gc module documentation and the Design of CPython’s Garbage Collector in Python's developer guide. Most of it is easy to understand, and I found it quite interesting and enlightening.
I'm working with very large numbers such as: 632382 to the power of 518061.
When I try calculating it directly using Python (632382**518061), it takes a really long time.
However, when I compare 2 very large numbers:
>>> 632382**518061 > 519432**525806
True
Python does it very quickly.
I assumed that in order to compare both numbers, Python would calculate them beforehand. But since the comparison is much faster than its actual calculation, Python is doing something different.
How is Python able to perform the comparison much faster (apparently without calculating the exact values)?
What takes so long is printing the values.
If I enter
>>> x = 632382**518061
in an interactive Python session, it takes about a second.
If I then enter
>>> x
it takes at least half a minute (I aborted it before it generated any output).[1]
Evaluating and printing the result of the expression 632382**518061 > 519432**525806 does not require printing the two large numbers, therefore it takes less time.
It still takes longer than evaluating the two numbers (without printing), as expected:
>>> from timeit import timeit
>>> timeit('632382**518061', number=1)
1.312588474999984
>>> timeit('519432**525806', number=1)
1.281405287000041
>>> timeit('632382**518061 > 519432**525806', number=1)
2.685868804999984
[1] After all, the decimal representation of x has 3005262 digits, which we can calculate much more quickly than with len(str(x)) by using logarithms:
>>> from math import log10, ceil
>>> ceil(518061 * log10(632382))
3005262
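The same effect can be seen at a smaller scale with a sketch like the one below. The exponent 20000 is an arbitrary choice to keep the run short. Note that recent Python versions limit int-to-str conversion length by default, so the sketch lifts that limit where the function for it exists:

```python
import math
import sys
from timeit import timeit

# Newer Python versions cap the number of digits in int <-> str conversions.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)  # 0 removes the limit

base, exp = 632382, 20000  # smaller exponent than in the question

t_pow = timeit(lambda: base**exp, number=1)  # computing the number
x = base**exp
t_str = timeit(lambda: str(x), number=1)     # converting it to decimal

digits = len(str(x))
estimated = math.floor(exp * math.log10(base)) + 1  # digit count via logarithm

print("power: %.4f s, str: %.4f s" % (t_pow, t_str))
print("digits:", digits, "estimated:", estimated)
```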
I was just experimenting with some code and found something that makes no sense to me:
>>> import timeit
>>> timeit.timeit("524288000/1024/1024")
0.05489620000000173
>>> timeit.timeit("524288000//1024//1024")
0.030612500000017917
>>>
Using // in the calculation is faster than using /,
but when I repeated it, these were the results:
>>> timeit.timeit("524288000//1024//1024")
0.02494899999999234
>>> timeit.timeit("524288000/1024/1024")
0.02480830000001788
And now / is faster than //, which makes no sense to me.
Why is this?
Edit:
After repeating the experiment 10000 times, these are the average results:
avg for /: 0.0261193088
avg for //: 0.025788395899999896
When you time a function, you measure the difference between the clock reading taken when it finished and the one taken when it started. A lot more happens under the hood during that interval than just the code you're timing, so individual measurements vary.
Try to read some books about Operating Systems and you'll understand better.
For this kind of experiment, you should repeat the measurement many times so that these variations average out.
Try the code below, but if you want to do real experiments, change the loop count to something greater:
import timeit
loops = 100
oneSlashAvg = 0
for i in range(loops):
    oneSlashAvg += timeit.timeit("524288000/1024/1024")
print(oneSlashAvg/loops)
doubleSlashAvg = 0
for i in range(loops):
    doubleSlashAvg += timeit.timeit("524288000//1024//1024")
print(doubleSlashAvg/loops)
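A variant of the same idea uses timeit.repeat and takes the minimum of the runs, which the timeit documentation suggests as the most meaningful number because higher values are usually caused by other activity on the machine. In this sketch the operands are put into variables in the setup string, on the assumption that the compiler may otherwise pre-compute an expression made only of literals; the names x and d and the repeat counts are arbitrary:

```python
import timeit

setup = "x = 524288000; d = 1024"  # variables, so the division happens at run time
number = 200_000                   # executions per measurement
repeat = 5                         # independent measurements

true_div = min(timeit.repeat("x / d / d", setup=setup, repeat=repeat, number=number))
floor_div = min(timeit.repeat("x // d // d", setup=setup, repeat=repeat, number=number))

print("/  best of %d: %.6f s" % (repeat, true_div))
print("// best of %d: %.6f s" % (repeat, floor_div))
```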
I have observed that calling gevent.idle() multiple times makes each successive call slower and slower. The same behaviour can be observed with gevent.sleep(0).
With 100 calls:
>>> timeit.timeit(setup='import gevent', stmt='gevent.idle()', number=100)
0.0005408697757047776
With 100000 calls:
>>> timeit.timeit(setup='import gevent', stmt='gevent.idle()', number=100000)
0.5255624202554827
I thought gevent.sleep/gevent.idle would basically just check if there is something else to do or return immediately.
Why are the calls getting slower and slower?
I did this test
import time
def test1():
    a=100
    b=200
    start=time.time()
    if (a>b):
        c=a
    else:
        c=b
    end=time.time()
    print(end-start)
def test2():
    a="amisetertzatzaz1111reaet"
    b="avieatzfzatzr333333ts"
    start=time.time()
    if (a>b):
        c=a
    else:
        c=b
    end=time.time()
    print(end-start)
def test3():
    a="100"
    b="200"
    start=time.time()
    if (a>b):
        c=a
    else:
        c=b
    end=time.time()
    print(end-start)
And obtained these results:
1.9073486328125e-06 #test1()
9.5367431640625e-07 #test2()
1.9073486328125e-06 #test3()
Execution times are similar. It's true that using an integer instead of a string reduces the storage space, but what about the execution time?
Timing a single execution of a short piece of code doesn't tell you very much at all. In particular, if you look at the timing numbers from your test1 and test3, you'll see that the numbers are identical. That ought to be a warning sign that, in fact, all that you're seeing here is the resolution of the timer:
>>> 2.0 / 2 ** 20
1.9073486328125e-06
>>> 1.0 / 2 ** 20
9.5367431640625e-07
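Python can also report what it knows about its clocks. As a quick sketch, time.get_clock_info shows the reported resolution of the clock behind time.time and of time.perf_counter, which is usually the better choice for measuring short durations (the reported values depend on the platform):

```python
import time

time_info = time.get_clock_info("time")          # clock used by time.time()
perf_info = time.get_clock_info("perf_counter")  # clock used by time.perf_counter()

print("time.time resolution:         %g s" % time_info.resolution)
print("time.perf_counter resolution: %g s" % perf_info.resolution)
print("perf_counter is monotonic:", perf_info.monotonic)
```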
For better results, you need to run the code many times, and measure and subtract the timing overhead. Python has a built-in module timeit for doing exactly this. Let's time 100 million executions of each kind of comparison:
>>> from timeit import timeit
>>> timeit('100 > 200', number=10**8)
5.98881983757019
>>> timeit('"100" > "200"', number=10**8)
7.528342008590698
so you can see that the difference is not really all that much (string comparison only about 25% slower in this case). So why is string comparison slower? Well, the way to find out is to look at the implementation of the comparison operation.
In Python 2.7, comparison is implemented by the do_cmp function in object.c. (Please open this code in a new window to follow the rest of my analysis.) On line 817, you'll see that if the objects being compared are the same type and if they have a tp_compare function in their class structure, then that function is called. In the case of integer objects, this is what happens, the function being int_compare in intobject.c, which you'll see is very simple.
But strings don't have a tp_compare function, so do_cmp proceeds to call try_rich_to_3way_compare which then calls try_rich_compare_bool up to three times (trying the three comparison operators EQ, LT and GT in turn). This calls try_rich_compare which calls string_richcompare in stringobject.c.
So string comparison is slower because it has to use the complicated "rich comparison" infrastructure, whereas integer comparison is more direct. But even so, it doesn't make all that much difference.
Huh? Since the storage space is reduced, the number of bits that need to be compared is also reduced. Comparing bits is work, doing less work means it goes faster.