Why is inspect.currentframe slower than sys._getframe? - python

Following up on this answer: https://stackoverflow.com/a/17366561/1982118
On my 2015 MacBook Pro (2.8 GHz Intel Core i7) with Python 3.6, I get:
python3 -m timeit -s 'import inspect' 'inspect.currentframe().f_code.co_name'
>>> 1000000 loops, best of 3: 0.428 usec per loop
python3 -m timeit -s 'import sys' 'sys._getframe().f_code.co_name'
>>> 10000000 loops, best of 3: 0.114 usec per loop
Using sys._getframe() is almost 4 times faster than inspect.currentframe().
How come?

Assuming that the question is about CPython, you can see the implementation of inspect.currentframe here:
def currentframe():
    """Return the frame of the caller or None if this is not possible."""
    return sys._getframe(1) if hasattr(sys, "_getframe") else None
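The function calls hasattr in addition to sys._getframe, so it has to be slower.
hasattr works by attempting to get the attribute and catching the AttributeError if that fails. Here the _getframe attribute exists, so it is looked up once by hasattr and then a second time for the actual call, which adds to the overhead. On top of that, inspect.currentframe is itself a Python-level function, so calling it costs one more function call than calling sys._getframe directly.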
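To see where the time goes, here is a minimal sketch that copies the wrapper shown above under a made-up name (`my_currentframe`) and times it next to a direct `sys._getframe()` call. The helper names and the loop count are chosen for illustration only, and absolute numbers depend on the machine and Python version, so none are shown.

```python
import sys
import timeit


def my_currentframe():
    """Same body as the CPython inspect.currentframe quoted above."""
    # Depth 1 skips this wrapper's own frame and returns the caller's frame.
    return sys._getframe(1) if hasattr(sys, "_getframe") else None


def caller_name_direct():
    # One C-level call, no extra Python frame.
    return sys._getframe().f_code.co_name


def caller_name_wrapped():
    # Extra Python call plus the hasattr lookup inside the wrapper.
    return my_currentframe().f_code.co_name


if __name__ == "__main__":
    for func in (caller_name_direct, caller_name_wrapped):
        seconds = timeit.timeit(func, number=200_000)
        print(f"{func.__name__}: {seconds:.3f} s")
```

Both helpers return the name of the function they are called from, so the only difference being timed is the wrapper overhead.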

Related

Creating list in Python, list(a) vs [*a] [duplicate]

Recently, I had to convert the values of a dictionary to a list in Python 3.6, in a use case where this is supposed to happen a lot.
Trying to be a good guy I wanted to use a solution which is close to the PEP. Now, PEP 3106 suggests
list(d.keys())
which obviously works fine, but using timeit on my Windows 7 machine I see
>python -m timeit "[*{'a': 1, 'b': 2}.values()]"
1000000 loops, best of 3: 0.249 usec per loop
>python -m timeit "list({'a': 1, 'b': 2}.values())"
1000000 loops, best of 3: 0.362 usec per loop
I assume that there is an advantage in the latter version, because why else should the PEP suggest the slower one.
So here comes my question: What's the advantage of the latter version compared to the first one?
The answer is that the faster syntax was first introduced by PEP 448 in 2013, while PEP 3106, which you reference, was written in 2006. So even if it really is faster now, it didn't exist when the PEP was written.
As noted by others, the role of PEPs is not to provide a template for the fastest possible code. In general, code in PEPs aims to be as simple and clear as possible, because the examples are about understanding concepts rather than achieving the best possible results. So even if the syntax had existed at the time, and were reliably faster, it still might not have been used.
A bit of further testing with larger values:
python -m timeit -s "x = [1]*10000" "[*x]"
10000 loops, best of 3: 44.6 usec per loop
python -m timeit -s "x = [1]*10000" "list(x)"
10000 loops, best of 3: 44.8 usec per loop
This shows the difference isn't really a multiplicative factor, but rather a flat cost. I would guess it's the cost of looking up the list() built-in function. This is negligible in most real cases.
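The same comparison can be run from a script for several input sizes. This is only a sketch: the helper name `time_both`, the sizes and the loop counts are arbitrary choices, and it needs Python 3.5+ for the `[*data]` syntax. If the gap is a flat cost, the difference between the two columns should stay roughly constant while the totals grow with the input size.

```python
import timeit


def time_both(size, number=2_000):
    """Return best-of-3 timings (seconds) for list(data) and [*data]."""
    data = list(range(size))
    namespace = {"data": data}
    t_call = min(timeit.repeat("list(data)", globals=namespace,
                               number=number, repeat=3))
    t_star = min(timeit.repeat("[*data]", globals=namespace,
                               number=number, repeat=3))
    return t_call, t_star


for size in (2, 100, 10_000):
    t_call, t_star = time_both(size)
    print(f"n={size:>6}: list(data) {t_call:.5f} s   [*data] {t_star:.5f} s")
```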

Why is max slower than sort?

I've found that max is slower than the sort function in Python 2 and 3.
Python 2
$ python -m timeit -s 'import random;a=range(10000);random.shuffle(a)' 'a.sort();a[-1]'
1000 loops, best of 3: 239 usec per loop
$ python -m timeit -s 'import random;a=range(10000);random.shuffle(a)' 'max(a)'
1000 loops, best of 3: 342 usec per loop
Python 3
$ python3 -m timeit -s 'import random;a=list(range(10000));random.shuffle(a)' 'a.sort();a[-1]'
1000 loops, best of 3: 252 usec per loop
$ python3 -m timeit -s 'import random;a=list(range(10000));random.shuffle(a)' 'max(a)'
1000 loops, best of 3: 371 usec per loop
Why is max (O(n)) slower than the sort function (O(nlogn))?
You have to be very careful when using the timeit module in Python.
python -m timeit -s 'import random;a=range(10000);random.shuffle(a)' 'a.sort();a[-1]'
Here the initialisation code runs once to produce a randomised array a. Then the rest of the code is run several times. The first time it sorts the array, but every other time you are calling the sort method on an already sorted array. Only the fastest time is returned, so you are actually timing how long it takes Python to sort an already sorted array.
Part of Python's sort algorithm is to detect when the array is already partly or completely sorted. When completely sorted it simply has to scan once through the array to detect this and then it stops.
If instead you tried:
python -m timeit -s 'import random;a=range(100000);random.shuffle(a)' 'sorted(a)[-1]'
then the sort happens on every timing loop and you can see that the time for sorting an array is indeed much longer than to just find the maximum value.
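The effect of pre-sorted input can also be seen without timing anything, by counting comparisons. The sketch below uses a small wrapper class (`Counted`, a name invented here) whose `__lt__` increments a counter, then sorts a shuffled list and an already sorted list of the same size.

```python
import random


class Counted:
    """Integer wrapper that counts how often it is compared."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_sort_comparisons(values):
    """Sort a fresh list of wrapped values and return the comparison count."""
    items = [Counted(v) for v in values]
    Counted.comparisons = 0
    items.sort()
    return Counted.comparisons


n = 10_000
shuffled = list(range(n))
random.shuffle(shuffled)

print("shuffled input:", count_sort_comparisons(shuffled))
print("sorted input:  ", count_sort_comparisons(sorted(shuffled)))
```

The sorted input should need on the order of n comparisons, while the shuffled input needs many more, which is why repeated `a.sort()` calls in a timeit loop mostly measure the cheap case.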
Edit: @skyking's answer explains the part I left unexplained: a.sort() knows it is working on a list so can directly access the elements. max(a) works on any arbitrary iterable so has to use generic iteration.
First off, note that max() uses the iterator protocol, while list.sort() uses code written specifically for lists. Going through an iterator adds significant overhead, which is why you are observing that difference in timings.
However, apart from that, your tests are not fair. You are running a.sort() on the same list more than once. The algorithm used by Python is specifically designed to be fast for already (partially) sorted data. Your tests are saying that the algorithm is doing its job well.
These are fair tests:
$ python3 -m timeit -s 'import random;a=list(range(10000));random.shuffle(a)' 'max(a[:])'
1000 loops, best of 3: 227 usec per loop
$ python3 -m timeit -s 'import random;a=list(range(10000));random.shuffle(a)' 'a[:].sort()'
100 loops, best of 3: 2.28 msec per loop
Here I'm creating a copy of the list every time. As you can see, the results differ by an order of magnitude: microseconds vs. milliseconds, as we would expect.
And remember: big-Oh specifies an upper bound! The lower bound for Python's sorting algorithm is Ω(n). Being O(n log n) does not automatically imply that every run takes a time proportional to n log n. It does not even imply that it needs to be slower than an O(n) algorithm, but that's another story. What's important to understand is that in some favorable cases, an O(n log n) algorithm may run in O(n) time or less.
This could be because l.sort is a method of list while max is a generic function. This means that l.sort can rely on the internal representation of list, while max has to go through the generic iterator protocol.
As a result, each element fetch in l.sort is faster than each element fetch in max.
I assume that if you use sorted(a) instead, you will get a slower result than with max(a).
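The "generic function" point is easy to check: max() accepts anything it can iterate over once, whereas sort() is a method that only lists have. A small sketch, with variable names chosen here for illustration:

```python
import random

values = list(range(10_000))
random.shuffle(values)

# max() only needs an iterable; it never assumes a list.
largest_from_list = max(values)
largest_from_generator = max(v for v in values)
largest_from_set = max(set(values))

# list.sort() exists only on lists; sorted() is the generic counterpart
# and always builds a new list before sorting it.
largest_from_sorted = sorted(values)[-1]

print(largest_from_list, largest_from_generator,
      largest_from_set, largest_from_sorted)
```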

Class instantiation slower in Python 3 than Python 2

I noticed by chance that a simple program generating a class from a large datafile ran a lot faster in Python 2.7 vs. 3.5. I read here that the use of "infinite precision" integers was to blame for slowdown in simple enumeration, but even when I tried a simple test instantiating this class I found that Python 3 was significantly slower:
class Benchmark(object):
    def __init__(self):
        self.members = ['a', 'b', 'c', 'd']

def test():
    test = Benchmark()

if __name__ == '__main__':
    import timeit
    print(timeit.timeit("test()", setup="from __main__ import test"))
I thought perhaps it had something to do with the size of each class instance, but the Python 3 instance was smaller than the Python 2 one (56 vs. 64).
$python3 benchmarks.py
0.7017288669958361
$python benchmarks.py
0.508942842484
I have tried many variations on this theme, including with 3.4 on a different machine, and still get the same results. Any ideas what's going on?
You are not measuring class instantiation time, you are measuring class instantiation, plus assignment, plus list creation, ...
Here's a correct benchmark:
$ python -m timeit -s 'class C(object): pass' 'C()'
10000000 loops, best of 3: 0.0639 usec per loop
$ python3 -m timeit -s 'class C(object): pass' 'C()'
10000000 loops, best of 3: 0.0622 usec per loop
As you can see, Python 3 is slightly faster.
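To separate the pieces inside one script, each part can be timed on its own. This is a rough sketch for Python 3.5+ (it relies on the `globals` argument of timeit); the `Empty` class and the `time_statement` helper are names introduced here, and no results are shown because they vary by interpreter and machine.

```python
import timeit


class Empty(object):
    pass


class Benchmark(object):
    def __init__(self):
        self.members = ['a', 'b', 'c', 'd']


def time_statement(stmt, number=200_000):
    """Best-of-3 time in seconds for running stmt `number` times."""
    return min(timeit.repeat(stmt, globals=globals(),
                             number=number, repeat=3))


for stmt in ("Empty()",               # bare instantiation
             "['a', 'b', 'c', 'd']",  # list creation only
             "Benchmark()"):          # instantiation + __init__ + list + assignment
    print(f"{stmt:<24} {time_statement(stmt):.4f} s")
```

Running the same file under different interpreters then shows which part, if any, accounts for the difference.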

Python: default argument values vs global variables

I saw this use of default values in Python's Queue module:
def _put(self, item, heappush=heapq.heappush):
    heappush(self.queue, item)

def _get(self, heappop=heapq.heappop):
    return heappop(self.queue)
I wonder why the variables are used as function arguments here? Is it just a matter of taste or some kind of optimization?
It's a micro-optimization. Default values are evaluated only once, at function definition time, and locals (including parameters) are a bit faster to access than globals: they're implemented as a C array lookup instead of a dict lookup. It also avoids repeatedly looking up the heappush and heappop attributes of heapq, without polluting the namespace by importing them directly.
Timeit snippets:
python -mtimeit --setup "import heapq" --setup "def f(q,x,p=heapq.heappush): p(q,x)" "f([], 1)"
1000000 loops, best of 3: 0.538 usec per loop
python -mtimeit --setup "import heapq" --setup "def f(q,p=heapq.heappop): p(q)" "f([1])"
1000000 loops, best of 3: 0.386 usec per loop
python -mtimeit --setup "import heapq" --setup "def f(q,x): heapq.heappush(q,x)" "f([], 1)"
1000000 loops, best of 3: 0.631 usec per loop
python -mtimeit --setup "import heapq" --setup "def f(q): heapq.heappop(q)" "f([1])"
1000000 loops, best of 3: 0.52 usec per loop
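The local-versus-global difference is also visible on the function objects themselves, without timing. In this sketch `put_default` and `put_global` are illustrative names, not functions from the Queue module.

```python
import heapq


def put_default(queue, item, heappush=heapq.heappush):
    heappush(queue, item)


def put_global(queue, item):
    heapq.heappush(queue, item)


# The default was evaluated once, when the def statement ran.
print(put_default.__defaults__[0] is heapq.heappush)

# Names the body treats as fast locals (parameters included).
print(put_default.__code__.co_varnames)

# Names the body has to resolve as globals or attributes on every call.
print(put_default.__code__.co_names)
print(put_global.__code__.co_names)
```

The body of the first function resolves no global or attribute names at all, while the second has to look up both heapq and heappush each time it runs.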

Is there an up-to-date fast YAML parser with python bindings?

What's the latest and greatest for fast YAML parsing in Python? Syck is out of date and recommends using PyYaml, yet PyYaml is pretty slow, and suffers from the GIL problem:
>>> def xit(f, x):
...     import threading
...     for i in xrange(x):
...         threading.Thread(target=f).start()
...
>>> def stressit():
...     start = time.time()
...     res = yaml.load(open(path_to_11000_byte_yaml_file))
...     print "Took %.2fs" % (time.time() - start,)
...
>>> xit(stressit, 1)
Took 0.37s
>>> xit(stressit, 2)
Took 1.40s
Took 1.41s
>>> xit(stressit, 4)
Took 2.98s
Took 2.98s
Took 2.99s
Took 3.00s
Given my use case I can cache the parsed objects, but I'd still prefer a faster solution even for that.
The linked wiki page states, after the warning, "Use libyaml (c), and PyYaml (python)", although the note does have a bad wikilink (it should be PyYAML, not PyYaml).
As for performance, depending on how you installed PyYAML you should have the CParser class available which implements a YAML parser written in optimized C. While I don't think this gets around the GIL issue, it is markedly faster. Here are a few cursory benchmarks I ran on my machine (AMD Athlon II X4 640, 3.0GHz, 8GB RAM):
First with the default pure-Python parser:
$ /usr/bin/python2 -m timeit -s 'import yaml; y=file("large.yaml", "r").read()' \
'yaml.load(y)'
10 loops, best of 3: 405 msec per loop
With the CParser:
$ /usr/bin/python2 -m timeit -s 'import yaml; y=file("large.yaml", "r").read()' \
'yaml.load(y, Loader=yaml.CLoader)'
10 loops, best of 3: 59.2 msec per loop
And, for comparison, with PyPy using the pure-Python parser:
$ pypy -m timeit -s 'import yaml; y=file("large.yaml", "r").read()' \
'yaml.load(y)'
10 loops, best of 3: 101 msec per loop
For large.yaml I just googled for "large yaml file" and came across this:
https://gist.github.com/nrh/667383/raw/1b3ba75c939f2886f63291528df89418621548fd/large.yaml
(I had to remove the first couple of lines to make it a single-doc YAML file otherwise yaml.load complains.)
EDIT:
Another thing to consider is using the multiprocessing module instead of threads. This gets around GIL problems, but does require a bit more boilerplate code to communicate between the processes. There are a number of good libraries available, though, that make multiprocessing easier. There's a pretty good list of them here.
