What is the disassembler function in Python and how is it used?

I just ran across the disassembler function (dis.dis) in Python, but I couldn't make out what it does. Can anyone explain how it works and what it is used for, based on the results for the factorial function (written with recursion and with a loop)?
The recursive code and the corresponding dis output:
>>> def fact(n):
...     if n==1:
...         return 1
...     return n*fact(n-1)
...
>>> dis.dis(fact)
2 0 LOAD_FAST 0 (n)
3 LOAD_CONST 1 (1)
6 COMPARE_OP 2 (==)
9 POP_JUMP_IF_FALSE 16
3 12 LOAD_CONST 1 (1)
15 RETURN_VALUE
4 >> 16 LOAD_FAST 0 (n)
19 LOAD_GLOBAL 0 (fact)
22 LOAD_FAST 0 (n)
25 LOAD_CONST 1 (1)
28 BINARY_SUBTRACT
29 CALL_FUNCTION 1
32 BINARY_MULTIPLY
33 RETURN_VALUE
And the factorial function using a loop gives the following result:
>>> def factor(n):
...     f=1
...     while n>1:
...         f*=n
...         n-=1
...
>>> dis.dis(factor)
2 0 LOAD_CONST 1 (1)
3 STORE_FAST 1 (f)
3 6 SETUP_LOOP 36 (to 45)
>> 9 LOAD_FAST 0 (n)
12 LOAD_CONST 1 (1)
15 COMPARE_OP 4 (>)
18 POP_JUMP_IF_FALSE 44
4 21 LOAD_FAST 1 (f)
24 LOAD_FAST 0 (n)
27 INPLACE_MULTIPLY
28 STORE_FAST 1 (f)
5 31 LOAD_FAST 0 (n)
34 LOAD_CONST 1 (1)
37 INPLACE_SUBTRACT
38 STORE_FAST 0 (n)
41 JUMP_ABSOLUTE 9
>> 44 POP_BLOCK
>> 45 LOAD_CONST 0 (None)
48 RETURN_VALUE
Can anyone tell me how to determine which one is faster?

To measure how fast something is running, use the timeit module, which comes with Python.
The dis module is used to get some idea of what the bytecode may look like, and it's very specific to CPython.
One use of it is to see what, when and how storage is assigned for variables in a loop or method. However, this is a specialized module that is not normally used for efficiency calculations; use timeit to figure out how fast something is, and then dis to get an understanding of what is going on under the hood, to arrive at a possible why.
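For instance, a minimal timeit comparison of the two factorial versions above might look like the sketch below (the argument 20 and the iteration count are arbitrary, and a return f is added so the loop version actually produces a value):
import timeit

setup = """
def fact(n):
    if n == 1:
        return 1
    return n * fact(n - 1)

def factor(n):
    f = 1
    while n > 1:
        f *= n
        n -= 1
    return f  # added here; the version in the question returns nothing
"""

# Time 100,000 calls of each implementation.
print(timeit.timeit("fact(20)", setup=setup, number=100000))
print(timeit.timeit("factor(20)", setup=setup, number=100000))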

It's impossible to determine which one will be faster simply by looking at the bytecode; each VM has a different cost associated with each opcode and so runtimes can vary widely.

The dis.dis() function disassembles a function into its bytecode representation.
Timing
As stated by Ignacio, the raw length of the bytecode does not accurately represent the running time, because of differences in how Python interpreters actually execute opcodes; the timeit module is what you want to use for that.
Actual Purpose
There are several uses of this function, but they are not things that most people would end up doing. You can look at the output as part of the process of optimizing or debugging speed issues. It would also likely prove useful when working directly on the Python interpreter, or when writing your own. You can look at the documentation here to see a full list of the opcodes (though, just as that page states, they are perfectly likely to change between versions of Python).
Overall, this is not something you'd really use much in a production application (unless your application is a Python disassembler!), but when you really, really need to optimize your code and debug at the lowest level, this is where the function comes in handy.
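As a quick illustration of typical usage, dis.dis() accepts a function object, and in Python 3 it also accepts a plain string of source code (the names below are arbitrary examples):
import dis

def is_even(n):
    return n % 2 == 0

# Disassemble a function object...
dis.dis(is_even)

# ...or, in Python 3, a string of source code compiled on the fly.
dis.dis("total = price * quantity")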

Related

What happens if you send() to a generator *expression* in Python?

I was surprised to find that, say,
ge=(x*x for x in [1,2,3])
accepts the .send method. The argument of the first call must be None, as with any other generator, but the behaviour of further calls, say ans=ge.send(99), seems identical to ans=next(ge).
Where does my 99 go? There are no yield expressions within ge, nothing for it to be assigned to. Is the injected value simply discarded (as I suspect), or is there some mystery involved?
Has anybody seen that?
Same thing as if you send to the equivalent generator created with a generator function:
def genfunc(outer_iterable):
    for x in outer_iterable:
        yield x*x

ge = genfunc([1, 2, 3])
which is to say, the send argument gets discarded.
We can disassemble the bytecode for further confirmation:
import dis
ge=(x*x for x in [1,2,3])
print('Genexp:')
dis.dis(ge)
def genfunc(outer_iterable):
    for x in outer_iterable:
        yield x*x
ge = genfunc([1, 2, 3])
print()
print('Generator function:')
dis.dis(ge)
Output:
Genexp:
3 0 LOAD_FAST 0 (.0)
>> 3 FOR_ITER 15 (to 21)
6 STORE_FAST 1 (x)
9 LOAD_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 BINARY_MULTIPLY
16 YIELD_VALUE
17 POP_TOP
18 JUMP_ABSOLUTE 3
>> 21 LOAD_CONST 0 (None)
24 RETURN_VALUE
Generator function:
9 0 SETUP_LOOP 23 (to 26)
3 LOAD_FAST 0 (outer_iterable)
6 GET_ITER
>> 7 FOR_ITER 15 (to 25)
10 STORE_FAST 1 (x)
10 13 LOAD_FAST 1 (x)
16 LOAD_FAST 1 (x)
19 BINARY_MULTIPLY
20 YIELD_VALUE
21 POP_TOP
22 JUMP_ABSOLUTE 7
>> 25 POP_BLOCK
>> 26 LOAD_CONST 0 (None)
29 RETURN_VALUE
The genexp and the generator created through the generator function have very similar disassemblies, and in both, the YIELD_VALUE is immediately followed by a POP_TOP that discards any value sent in from send.
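A quick interactive check bears this out (the values passed to send here are arbitrary):
>>> ge = (x*x for x in [1, 2, 3])
>>> ge.send(None)   # the first call must be None, just like next(ge)
1
>>> ge.send(99)     # the 99 is discarded by the POP_TOP shown above
4
>>> ge.send('ignored as well')
9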
Thank you, all. So yes, the arg of send is discarded, but the fact that send is accepted seems to be an anomaly.
Another, related bug has already been commented on here (page 32139885): the yield expression should be forbidden in genexps, but it isn't. The form ge=((yield x*x) for x in [1,2,3]) is accepted, and .send() works.
The values returned by send are then a mixture of elements from the internal iterable and the arguments of send... If I am not mistaken, GvR wrote that in Python 3.8 this (the yield expression in a genexp) will be treated as an error, and that in 3.7 it should be flagged as deprecated. (People agreed that it was confusing.)
But I tested that in Python 3.7 (Anaconda, Windows 64), and I got no deprecation warning. Anyway, this seems to be a real bug, not a feature to be deprecated. I believe that for the moment there is nothing more to say...
JK

Where did I go wrong in this time complexity calculation?

I have this piece of code in Python:
digit_sum = 0
while number > 0:
    digit_sum += (number % 10)
    number = number // 10
For determining the time complexity, I applied the following logic:
Line 1: 1 basic operation (assignment), gets executed 1 time so gets a value of 1
Line 2: 2 basic operations (reading the variable 'number' and comparing against zero), gets executed n+1 times so gets a value of 2*(n+1)
Line 3: 4 basic operations (reading the variable 'number', %10, computing the sum, and assignment), gets executed n times so gets a value of 4*n
Line 4: 3 basic operations (reading the variable 'number', //10 and assignment), gets executed n times so gets a value of 3*n
This brings me to a total of 1 + 2n+2 + 4n + 3n = 9n+3
But my textbook has a solution of 8n+3. Where did I go wrong in my logic?
Thanks,
Alex
When talking about complexity, all you really care about is asymptotic complexity: here, O(n). Whether the constant is 8, 9 or 42 doesn't really matter, especially as there is no way for you to know it.
Thus counting "operations" is pointless. It ties you to the architectural details of the underlying processor (be it an actual hardware processor or an interpreter). The only way to actually get the "real" count of operations would be to look at a specific implementation (say, CPython 3.6.0) and see the bytecode it generates for your program.
Here is what my CPython 2.7.12 does:
>>> def test(number):
...     digit_sum = 0
...     while number > 0:
...         digit_sum += (number % 10)
...         number = number // 10
...
>>> import dis
>>> dis.dis(test)
2 0 LOAD_CONST 1 (0)
3 STORE_FAST 1 (digit_sum)
3 6 SETUP_LOOP 40 (to 49)
>> 9 LOAD_FAST 0 (number)
12 LOAD_CONST 1 (0)
15 COMPARE_OP 4 (>)
18 POP_JUMP_IF_FALSE 48
4 21 LOAD_FAST 1 (digit_sum)
24 LOAD_FAST 0 (number)
27 LOAD_CONST 2 (10)
30 BINARY_MODULO
31 INPLACE_ADD
32 STORE_FAST 1 (digit_sum)
5 35 LOAD_FAST 0 (number)
38 LOAD_CONST 2 (10)
41 BINARY_FLOOR_DIVIDE
42 STORE_FAST 0 (number)
45 JUMP_ABSOLUTE 9
>> 48 POP_BLOCK
>> 49 LOAD_CONST 0 (None)
52 RETURN_VALUE
I'll let you draw your own conclusions as to what you want to count as a basic operation. The Python interpreter executes bytecodes one after the other, so arguably you have 15 "basic operations" inside your loop. That's the closest you can get to a meaningful number. Still, every operation in there has a different runtime, so that 15 carries little valuable information.
Also, keep in mind this is specific to CPython 2.7.12. It's very likely another version will generate something else, taking advantage of new bytecodes that might make it possible to express some operations in a simpler way.
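If you want to count the emitted instructions mechanically rather than by eye, dis can also be driven from code; here is a small Python 3 sketch (a return is added so the function yields its result):
import dis

def test(number):
    digit_sum = 0
    while number > 0:
        digit_sum += (number % 10)
        number = number // 10
    return digit_sum  # added so the function returns something

# List the opcode names the compiler emitted; the count still says nothing
# about how long each individual opcode takes to execute.
ops = [instr.opname for instr in dis.get_instructions(test)]
print(len(ops), ops)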

Why doesn't Python optimize away temporary variables?

Fowler's Extract Variable refactoring method, formerly Introduce Explaining Variable, says to use a temporary variable to make code clearer for humans. The idea is to elucidate complex code by introducing an otherwise unneeded local variable, and to name that variable for exposition purposes. It also advocates this kind of explanation over comments. Other languages optimize away temporary variables, so there's no cost in time or space resources. Why doesn't Python do this?
In [3]: def multiple_of_six_fat(n):
   ...:     multiple_of_two = n%2 == 0
   ...:     multiple_of_three = n%3 == 0
   ...:     return multiple_of_two and multiple_of_three
   ...:
In [4]: def multiple_of_six_lean(n):
   ...:     return n%2 == 0 and n%3 == 0
   ...:
In [5]: import dis
In [6]: dis.dis(multiple_of_six_fat)
2 0 LOAD_FAST 0 (n)
3 LOAD_CONST 1 (2)
6 BINARY_MODULO
7 LOAD_CONST 2 (0)
10 COMPARE_OP 2 (==)
13 STORE_FAST 1 (multiple_of_two)
3 16 LOAD_FAST 0 (n)
19 LOAD_CONST 3 (3)
22 BINARY_MODULO
23 LOAD_CONST 2 (0)
26 COMPARE_OP 2 (==)
29 STORE_FAST 2 (multiple_of_three)
4 32 LOAD_FAST 1 (multiple_of_two)
35 JUMP_IF_FALSE_OR_POP 41
38 LOAD_FAST 2 (multiple_of_three)
>> 41 RETURN_VALUE
In [7]: dis.dis(multiple_of_six_lean)
2 0 LOAD_FAST 0 (n)
3 LOAD_CONST 1 (2)
6 BINARY_MODULO
7 LOAD_CONST 2 (0)
10 COMPARE_OP 2 (==)
13 JUMP_IF_FALSE_OR_POP 29
16 LOAD_FAST 0 (n)
19 LOAD_CONST 3 (3)
22 BINARY_MODULO
23 LOAD_CONST 2 (0)
26 COMPARE_OP 2 (==)
>> 29 RETURN_VALUE
Because Python is a highly dynamic language, and references can influence behaviour.
Compare the following, for example:
>>> id(object()) == id(object())
True
>>> ob1 = object()
>>> ob2 = object()
>>> id(ob1) == id(ob2)
False
Had Python 'optimised' the ob1 and ob2 variables away, behaviour would have changed.
Python object lifetime is governed by reference counts. Add weak references into the mix plus threading, and you'll see that optimising away variables (even local ones) can lead to undesirable behaviour changes.
Besides, in Python, removing those variables would hardly have changed anything from a performance perspective. The local namespace is already highly optimised (values are looked up by index in an array); if you are worried about the speed of dereferencing local variables, you are using the wrong programming language for that time critical section of your project.
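To make the lifetime point concrete, here is a small sketch (CPython-specific, since it relies on reference counting; the Thing class is made up for illustration) showing that a local name on its own keeps an object alive, which is exactly the kind of behaviour that optimising the name away would change:
import weakref

class Thing:
    pass

def with_local():
    t = Thing()               # the local name 't' holds the only strong reference
    r = weakref.ref(t)
    return r() is not None    # True: 't' still exists, so the object is alive

def without_local():
    r = weakref.ref(Thing())  # no name keeps the instance alive
    return r() is not None    # False: the object is collected immediately

print(with_local(), without_local())  # True False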
Issue 2181 (optimize out local variables at end of function) has some interesting points:
It can make debugging harder since the symbols no longer exist. Guido says only do it for -O.
Might break some usages of inspect or sys._getframe().
Changes the lifetime of objects. For example, myfunc in the following example might fail after optimization, because at the moment Python guarantees that the file object is not closed before the function exits (bad style, but still):
def myfunc():
    f = open('somewhere', 'r')
    fd = f.fileno()
    return os.fstat(fd)
cannot be rewritten as:
def bogus():
    fd = open('somewhere', 'r').fileno()
    # the file is auto-closed here and fd becomes invalid
    return os.fstat(fd)
A core developer says that "it is unlikely to give any speedup in real-world code, I don't think we should add complexity to the compiler."

Can simple calculations on variable length iterables be made faster in Python?

I'm calculating the euclidean distance between two vectors represented by tuples.
(u[0]-v[0])**2 + (u[1]-v[1])**2 + (u[3]-v[3])**2 ...
The hard-coded way of doing this is pretty fast. However, I would like to make no assumptions about the length of these vectors. That results in solutions like:
sum([(a-b)**2 for a, b in izip(u, v)]) # Faster without generator
or
sum = 0
for i in xrange(len(u)):
    sum += (u[i]-v[i])**2
which turn out to be much (at least twice) slower than the first version. Is there some smart way of optimizing this, without resorting to NumPy/SciPy? I'm aware that those packages offer the fastest way of doing such things, but at the moment, I'm more trying to get experience with optimizing "bare Python". What I found works fast is to dynamically build a string that defines the function and exec() it, but that's really a last resort, I would say...
The requirements:
CPython 2.7
Standard library only
"Real" (e.g. no exec()), pure Python
Even though my question is about the matter of small operations in general, you may assume in your solution that one of the vectors remains the same over several function calls.
mysum = 0
for a, b in izip(u, v):
    mysum += (a-b)**2
About 35% faster than #3
PS: have you tried Cython (not CPython) or Shedskin?
What I'm understanding is that you don't really need to make the code faster, you just want to know why it's slower. To answer that, let's look at the disassembly. For the purposes of this discussion, I'm going to wrap each method in a function call; the loading of u and v and the return instruction can be ignored in each disassembly.
def test1(u, v):
    return (u[0]-v[0])**2 + (u[1]-v[1])**2 + (u[3]-v[3])**2
dis.dis(test1)
0 LOAD_FAST 0 (u)
3 LOAD_CONST 1 (0)
6 BINARY_SUBSCR
7 LOAD_FAST 1 (v)
10 LOAD_CONST 1 (0)
13 BINARY_SUBSCR
14 BINARY_SUBTRACT
15 LOAD_CONST 2 (2)
18 BINARY_POWER
19 LOAD_FAST 0 (u)
22 LOAD_CONST 3 (1)
25 BINARY_SUBSCR
26 LOAD_FAST 1 (v)
29 LOAD_CONST 3 (1)
32 BINARY_SUBSCR
33 BINARY_SUBTRACT
34 LOAD_CONST 2 (2)
37 BINARY_POWER
38 BINARY_ADD
39 LOAD_FAST 0 (u)
42 LOAD_CONST 4 (3)
45 BINARY_SUBSCR
46 LOAD_FAST 1 (v)
49 LOAD_CONST 4 (3)
52 BINARY_SUBSCR
53 BINARY_SUBTRACT
54 LOAD_CONST 2 (2)
57 BINARY_POWER
58 BINARY_ADD
59 RETURN_VALUE
I cut the first example off at a length of 3 because it would just repeat the same pattern over and over. You can quickly see that there is no function call overhead and that the interpreter is doing pretty much the minimum possible work on these operands to achieve your result.
def test2(u, v):
    return sum((a-b)**2 for a, b in izip(u, v))
dis.dis(test2)
0 LOAD_GLOBAL 0 (sum)
3 LOAD_CONST 1 (<code object <genexpr> at 02C6F458, file "<pyshell#10>", line 2>)
6 MAKE_FUNCTION 0
9 LOAD_GLOBAL 1 (izip)
12 LOAD_FAST 0 (u)
15 LOAD_FAST 1 (v)
18 CALL_FUNCTION 2
21 GET_ITER
22 CALL_FUNCTION 1
25 CALL_FUNCTION 1
28 RETURN_VALUE
What we see here is that we create a function out of the generator expression and load two globals (sum and izip; global lookups are slower than local lookups, and while we can't avoid making them once, people often bind them to a local name such as _izip or _sum if they will be hit in a tight loop). We then suffer four expensive bytecode operations in a row: calling izip, getting an iterator from its result, calling the function created for the generator expression, and then calling sum (which consumes the iterator and adds up each item before returning).
def test3(u, v):
    sum = 0
    for i in xrange(len(u)):
        sum += (u[i]-v[i])**2
dis.dis(test3)
0 LOAD_CONST 1 (0)
3 STORE_FAST 2 (sum)
6 SETUP_LOOP 52 (to 61)
9 LOAD_GLOBAL 0 (xrange)
12 LOAD_GLOBAL 1 (len)
15 LOAD_FAST 0 (u)
18 CALL_FUNCTION 1
21 CALL_FUNCTION 1
24 GET_ITER
25 FOR_ITER 32 (to 60)
28 STORE_FAST 3 (i)
31 LOAD_FAST 2 (sum)
34 LOAD_FAST 0 (u)
37 LOAD_FAST 3 (i)
40 BINARY_SUBSCR
41 LOAD_FAST 1 (v)
44 LOAD_FAST 3 (i)
47 BINARY_SUBSCR
48 BINARY_SUBTRACT
49 LOAD_CONST 2 (2)
52 BINARY_POWER
53 INPLACE_ADD
54 STORE_FAST 2 (sum)
57 JUMP_ABSOLUTE 25
60 POP_BLOCK
61 LOAD_CONST 0 (None)
64 RETURN_VALUE
What we see here is a more straightforward version of what is happening in test2. There is no generator expression or call to sum, but we've traded that function call overhead for unnecessary calls introduced by writing xrange(len(u)) instead of the faster solution suggested by @Lucas Malor.
def test4(u, v):
    mysum = 0
    for a, b in izip(u, v):
        mysum += (a-b)**2
    return mysum
dis.dis(test4)
0 LOAD_CONST 1 (0)
3 STORE_FAST 2 (mysum)
6 SETUP_LOOP 47 (to 56)
9 LOAD_GLOBAL 0 (izip)
12 LOAD_FAST 0 (u)
15 LOAD_FAST 1 (v)
18 CALL_FUNCTION 2
21 GET_ITER
22 FOR_ITER 30 (to 55)
25 UNPACK_SEQUENCE 2
28 STORE_FAST 3 (a)
31 STORE_FAST 4 (b)
34 LOAD_FAST 2 (mysum)
37 LOAD_FAST 3 (a)
40 LOAD_FAST 4 (b)
43 BINARY_SUBTRACT
44 LOAD_CONST 2 (2)
47 BINARY_POWER
48 INPLACE_ADD
49 STORE_FAST 2 (mysum)
52 JUMP_ABSOLUTE 22
55 POP_BLOCK
56 LOAD_FAST 2 (mysum)
59 RETURN_VALUE
The above represents @Lucas Malor's contribution, and it's faster in a few ways. It replaces subscript operations with unpacking while reducing the number of function calls to one. This is, in many cases, as fast as you're going to get under the constraints you've given us.
Note that it would only be worth evaluating a run-time generated string similar to the function in test1 if you were going to call the function enough times to merit the overhead. Note also that as the lengths of u and v grow (which is typically how algorithms of this type are evaluated), the function call overhead of the other solutions becomes increasingly small, and therefore, in most cases, the most readable solution is vastly superior. At the same time, even though it's slower in small cases, if your sequences u and v may be very long, I recommend a generator expression rather than a list comprehension: the memory savings will lead to much faster execution in most cases (and faster garbage collection).
Overall, my recommendation is that the tiny speedup for short sequences is just not worth the increase in code size and the inconsistent behaviour across other Python implementations that you invite by micro-optimizing. The "best" solution is almost certainly test2.
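For completeness, here is the kind of timeit harness you could use to check these claims yourself; it's a Python 3 sketch, so zip and range stand in for izip and xrange, and the three-element vectors are arbitrary:
import timeit

setup = """
u = (1.0, 2.0, 3.0)
v = (4.0, 6.0, 8.0)

def test1(u, v):
    return (u[0]-v[0])**2 + (u[1]-v[1])**2 + (u[2]-v[2])**2

def test2(u, v):
    return sum((a-b)**2 for a, b in zip(u, v))

def test4(u, v):
    mysum = 0
    for a, b in zip(u, v):
        mysum += (a-b)**2
    return mysum
"""

# Time 100,000 calls of each variant and print the totals in seconds.
for name in ("test1", "test2", "test4"):
    t = timeit.timeit(name + "(u, v)", setup=setup, number=100000)
    print(name, round(t, 4))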

Is it better to save the length of a list that I use several times?

I know about inlining, and from what I checked it is not done by Python's compiler.
My question is: are there any optimizations in the Python compiler which would transform:
print myList.__len__()
for i in range(0, myList.__len__()):
    print i + myList.__len__()
to
l = myList.__len__()
print l
for i in range(0, l):
    print i + l
So, is it done by the compiler?
If it is not: is it worth doing it myself?
Bonus question (not so related): I like to have a lot of functions (better for readability, IMHO)... since there is no inlining in Python, is this something to avoid (having lots of functions)?
No, there isn't. You can check what Python does by looking at the bytecode it compiles, using the dis module:
>>> def test():
...     print myList.__len__()
...     for i in range(0, myList.__len__()):
...         print i + myList.__len__()
...
>>> import dis
>>> dis.dis(test)
2 0 LOAD_GLOBAL 0 (myList)
3 LOAD_ATTR 1 (__len__)
6 CALL_FUNCTION 0
9 PRINT_ITEM
10 PRINT_NEWLINE
3 11 SETUP_LOOP 44 (to 58)
14 LOAD_GLOBAL 2 (range)
17 LOAD_CONST 1 (0)
20 LOAD_GLOBAL 0 (myList)
23 LOAD_ATTR 1 (__len__)
26 CALL_FUNCTION 0
29 CALL_FUNCTION 2
32 GET_ITER
>> 33 FOR_ITER 21 (to 57)
36 STORE_FAST 0 (i)
4 39 LOAD_FAST 0 (i)
42 LOAD_GLOBAL 0 (myList)
45 LOAD_ATTR 1 (__len__)
48 CALL_FUNCTION 0
51 BINARY_ADD
52 PRINT_ITEM
53 PRINT_NEWLINE
54 JUMP_ABSOLUTE 33
>> 57 POP_BLOCK
>> 58 LOAD_CONST 0 (None)
61 RETURN_VALUE
As you can see, the __len__ attribute is looked up and called each time.
Python cannot know what a given method will return between calls, and the __len__ method is no exception. If Python were to try to optimize this by assuming the returned value is the same between calls, you'd run into countless different problems, and we haven't even brought multithreading into the picture yet.
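A contrived sketch of that point (Python 3 syntax; the Growing class is made up for illustration): nothing stops __len__ from returning a different value every time it is called, so the compiler cannot treat the result as a constant.
class Growing(list):
    def __len__(self):
        self.append(0)            # mutate on every length query
        return super().__len__()

g = Growing([1, 2, 3])
print(len(g))  # 4 -- the append happens before the elements are counted
print(len(g))  # 5
print(len(g))  # 6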
Note that you would be much better off using len(myList) rather than calling the __len__() hook directly:
print len(myList)
for i in xrange(len(myList)):
    print i + len(myList)
No, the optimization you're asking about is not done by the CPython compiler. In fact hardly any optimizations are done by the CPython compiler.
To see for yourself, import dis and disassemble a function with code like you're asking about: dis.dis(func).
The reason this isn't optimized is that it is entirely possible that an attribute (even a method like __len__) will be a completely different object the next time it is accessed. This rarely happens, of course, but Python supports it.
Attribute access does consume time, so storing a reference to an attribute you will be using repeatedly (especially in a local variable) can make your code run faster. However, it decreases readability, so I'd wait until you know that a given piece of code is a bottleneck before applying it. In your case, the time spent printing is easily going to overwhelm the attribute access.
In the final analysis, if performance were paramount you'd be using something other than Python in the first place, no?
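If you do decide to cache the length, a rough timeit sketch of the kind of measurement that should come first might look like this (Python 3 here, and the list size and iteration counts are arbitrary):
import timeit

setup = """
myList = list(range(1000))

def uncached():
    total = 0
    for i in range(len(myList)):
        total += i + len(myList)
    return total

def cached():
    n = len(myList)
    total = 0
    for i in range(n):
        total += i + n
    return total
"""

# Compare re-querying the length on every iteration with caching it once.
print(timeit.timeit("uncached()", setup=setup, number=1000))
print(timeit.timeit("cached()", setup=setup, number=1000))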
