I was surprised to find that, say,
ge=(x*x for x in [1,2,3])
accepts the .send method. The argument of the first call must be None, as with any other generator , but the behaviour of further calls, say, ans=ge.send(99) seems identical to ans=next(ge).
Where goes my 99? There are no yield expressions within ge, nothing to be assigned. Is the value injected simply discarded (as I suspect), or there is some Mystery involved?
Has anybody seen that?
Same thing as if you send to the equivalent generator created with a generator function:
def genfunc(outer_iterable):
for x in outer_iterable:
yield x*x
ge = genfunc([1, 2, 3])
which is to say, the send argument gets discarded.
We can disassemble the bytecode for further confirmation:
import dis
ge=(x*x for x in [1,2,3])
print('Genexp:')
dis.dis(ge)
def genfunc(outer_iterable):
for x in outer_iterable:
yield x*x
ge = genfunc([1, 2, 3])
print()
print('Generator function:')
dis.dis(ge)
Output:
Genexp:
3 0 LOAD_FAST 0 (.0)
>> 3 FOR_ITER 15 (to 21)
6 STORE_FAST 1 (x)
9 LOAD_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 BINARY_MULTIPLY
16 YIELD_VALUE
17 POP_TOP
18 JUMP_ABSOLUTE 3
>> 21 LOAD_CONST 0 (None)
24 RETURN_VALUE
Generator function:
9 0 SETUP_LOOP 23 (to 26)
3 LOAD_FAST 0 (outer_iterable)
6 GET_ITER
>> 7 FOR_ITER 15 (to 25)
10 STORE_FAST 1 (x)
10 13 LOAD_FAST 1 (x)
16 LOAD_FAST 1 (x)
19 BINARY_MULTIPLY
20 YIELD_VALUE
21 POP_TOP
22 JUMP_ABSOLUTE 7
>> 25 POP_BLOCK
>> 26 LOAD_CONST 0 (None)
29 RETURN_VALUE
The genexp and the generator created through the generator function have very similar disassemblies, and in both, the YIELD_VALUE is immediately followed by a POP_TOP that discards any value sent in from send.
Thank you, all. So yes, the arg of send is discarded, but the fact that send is accepted seems to be an anomaly.
Another, related bug has been already commented here (page 32139885), the yield expression should be forbidden in genexps, but it isn't. The form ge=((yield x*x) for x in [1,2,3]) is accepted, and .send() works.
The answer returned by send is then a mixture of elements in the internal iterable, and the args of send... If I am not mistaken, GvR wrote that in Python 3.8 this (the yield expression) will be treated as an error, and in 3.7 it should signal that it is deprecated. (People agreed that it was confusing.)
But I tested that in Python 3.7 (Anaconda, Windows 64), and I got no deprecation warning. Anyway, this seems to be a real bug, not a feature to be deprecated. I believe that for the moment there is nothing more to say...
JK
Related
I found out something weird.
I defined two test functions as such:
def with_brackets(n=10000):
d = dict()
for i in range(n):
d["hello"] = i
def with_setitem(n=10000):
d = dict()
st = d.__setitem__
for i in range(n):
st("hello", i)
One would expect the two functions to be roughly the same execution speed. However:
>>> timeit(with_brackets, number=1000)
0.6558860000222921
>>> timeit(with_setitem, number=1000)
0.9857697170227766
There is possibly something I missed, but it does seem like setitem is almost twice as long, and I don't really understand why. Isn't dict[key] = x supposed to call __setitem__?
(Using CPython 3.9)
Edit: Using timeit instead of time
Isn't dict[key] = x supposed to call __setitem__?
Strictly speaking, no. Running both your functions through dis.dis, we get (I am only including the for loop):
>>> dis.dis(with_brackets)
...
>> 22 FOR_ITER 12 (to 36)
24 STORE_FAST 3 (i)
5 26 LOAD_FAST 0 (n)
28 LOAD_FAST 1 (d)
30 LOAD_CONST 1 ('hello')
32 STORE_SUBSCR
34 JUMP_ABSOLUTE 22
...
Vs
>>> dis.dis(with_setitem)
...
>> 28 FOR_ITER 14 (to 44)
30 STORE_FAST 4 (i)
6 32 LOAD_FAST 2 (setitem)
34 LOAD_CONST 1 ('hello')
36 LOAD_FAST 0 (n)
38 CALL_FUNCTION 2
40 POP_TOP
42 JUMP_ABSOLUTE 28
...
The usage of __setitem__ involves a function call (see the usage of CALL_FUNCTION and POP_TOP instead of just STORE_SUBSCR - that's the difference underneath the hood), and function calls do add some amount of overhead, so using the bracket accessor leads to more optimised opcode.
I know what a try-else block is, but consider the following two functions:
# Without else
def number_of_foos1(x):
try:
number = x['foo_count']
except:
return 0
return number
# With else
def number_of_foos2(x):
try:
number = x['foo_count']
except:
return 0
else:
return number
x_with_foo = dict(foo_count=5)
x_without_foo = 3
Unlike this try-else question we're not adding extra lines to the try block. In both cases the try block is a single line, and the principle of keeping error handling "close" to the errors that caused it is not violated.
The difference is in where we go after the successful try block.
In the first block, the code continues after the except block, and in the second, the code continues at the else.
They obviously give the same output:
In [138]: number_of_foos1(x_with_foo)
Out[139]: 5
In [140]: number_of_foos1(x_without_foo)
Out[140]: 0
In [141]: number_of_foos2(x_with_foo)
Out[141]: 5
In [142]: number_of_foos2(x_without_foo)
Out[142]: 0
Is either preferred? Are they even any different as far as the interpreter is concerned? Should you always have an else when continuing after a successful try or is it OK just to carry on unindented, as in number_of_foos1?
I would say that the case where you enter in the exception block must be rare (that's what we call that an exception). So using else gives too much importance to that block, which isn't supposed to happen in normal operation.
So if an exception occurs, handle the error and return, and forget about it.
Using else here add more complexity, and you can confirm this by disassembling both functions:
>>> dis.dis(number_of_foos1)
4 0 SETUP_EXCEPT 14 (to 17)
5 3 LOAD_FAST 0 (x)
6 LOAD_CONST 1 ('foo_count')
9 BINARY_SUBSCR
10 STORE_FAST 1 (number)
13 POP_BLOCK
14 JUMP_FORWARD 12 (to 29)
6 >> 17 POP_TOP
18 POP_TOP
19 POP_TOP
7 20 LOAD_CONST 2 (0)
23 RETURN_VALUE
24 POP_EXCEPT
25 JUMP_FORWARD 1 (to 29)
28 END_FINALLY
8 >> 29 LOAD_FAST 1 (number)
32 RETURN_VALUE
>>> dis.dis(number_of_foos2)
<exactly the same beginning then:>
15 20 LOAD_CONST 2 (0)
23 RETURN_VALUE
24 POP_EXCEPT
25 JUMP_FORWARD 5 (to 33)
28 END_FINALLY
17 >> 29 LOAD_FAST 1 (number)
32 RETURN_VALUE
>> 33 LOAD_CONST 0 (None)
36 RETURN_VALUE
>>>
As you see in the second example, addresses 24, 25, 28, 33 and 36 aren't reachable, that's because Python inserts jumps to the end of the code, and also a default return None in the main branch. All this code is useless, and would vouch for snippet #1 which is simpler and returns the result in the main branch.
I am looking in to the performance issues of the loop like structures in Python and found the following statements:
Besides the syntactic benefit of list comprehensions, they are often
as fast or faster than equivalent use of map.
(Performance Tips)
List comprehensions run a bit faster than equivalent for-loops (unless
you're just going to throw away the result).
(Python Speed)
I am wondering what difference under the hood gives list comprehension this advantage. Thanks.
Test one: throwing away the result.
Here's our dummy function:
def examplefunc(x):
pass
And here are our challengers:
def listcomp_throwaway():
[examplefunc(i) for i in range(100)]
def forloop_throwaway():
for i in range(100):
examplefunc(i)
I won't do an analysis of its raw speed, only why, per the OP's question. Lets take a look at the diffs of the machine code.
--- List comprehension
+++ For loop
## -1,15 +1,16 ##
- 55 0 BUILD_LIST 0
+ 59 0 SETUP_LOOP 30 (to 33)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (100)
9 CALL_FUNCTION 1
12 GET_ITER
- >> 13 FOR_ITER 18 (to 34)
+ >> 13 FOR_ITER 16 (to 32)
16 STORE_FAST 0 (i)
- 19 LOAD_GLOBAL 1 (examplefunc)
+
+ 60 19 LOAD_GLOBAL 1 (examplefunc)
22 LOAD_FAST 0 (i)
25 CALL_FUNCTION 1
- 28 LIST_APPEND 2
- 31 JUMP_ABSOLUTE 13
- >> 34 POP_TOP
- 35 LOAD_CONST 0 (None)
- 38 RETURN_VALUE
+ 28 POP_TOP
+ 29 JUMP_ABSOLUTE 13
+ >> 32 POP_BLOCK
+ >> 33 LOAD_CONST 0 (None)
+ 36 RETURN_VALUE
The race is on. Listcomp's first move is to build an empty list, while for loop's is to setup a loop. Both of them then proceed to load global range(), the constant 100, and call the range function for a generator. Then they both get the current iterator and get the next item, and store it into the variable i. Then they load examplefunc and i and call examplefunc. Listcomp appends it to the list and starts the loop over again. For loop does the same in three instructions instead of two. Then they both load None and return it.
So who seems better in this analysis? Here, list comprehension does some redundant operations such as building the list and appending to it, if you don't care about the result. For loop is pretty efficient too.
If you time them, using a for loop is about one-third faster than a list comprehension. (In this test, examplefunc divided its argument by five and threw it away instead of doing nothing at all.)
Test two: Keeping the result like normal.
No dummy function this test. So here are our challengers:
def listcomp_normal():
l = [i*5 for i in range(100)]
def forloop_normal():
l = []
for i in range(100):
l.append(i*5)
The diff isn't any use to us today. It's just the two machine codes in two blocks.
List comp's machine code:
55 0 BUILD_LIST 0
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (100)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 16 (to 32)
16 STORE_FAST 0 (i)
19 LOAD_FAST 0 (i)
22 LOAD_CONST 2 (5)
25 BINARY_MULTIPLY
26 LIST_APPEND 2
29 JUMP_ABSOLUTE 13
>> 32 STORE_FAST 1 (l)
35 LOAD_CONST 0 (None)
38 RETURN_VALUE
For loop's machine code:
59 0 BUILD_LIST 0
3 STORE_FAST 0 (l)
60 6 SETUP_LOOP 37 (to 46)
9 LOAD_GLOBAL 0 (range)
12 LOAD_CONST 1 (100)
15 CALL_FUNCTION 1
18 GET_ITER
>> 19 FOR_ITER 23 (to 45)
22 STORE_FAST 1 (i)
61 25 LOAD_FAST 0 (l)
28 LOAD_ATTR 1 (append)
31 LOAD_FAST 1 (i)
34 LOAD_CONST 2 (5)
37 BINARY_MULTIPLY
38 CALL_FUNCTION 1
41 POP_TOP
42 JUMP_ABSOLUTE 19
>> 45 POP_BLOCK
>> 46 LOAD_CONST 0 (None)
49 RETURN_VALUE
As you can probably already tell, the list comprehension has fewer instructions than for loop does.
List comprehension's checklist:
Build an anonymous empty list.
Load range.
Load 100.
Call range.
Get the iterator.
Get the next item on that iterator.
Store that item onto i.
Load i.
Load the integer five.
Multiply times five.
Append the list.
Repeat steps 6-10 until range is empty.
Point l to the anonymous empty list.
For loop's checklist:
Build an anonymous empty list.
Point l to the anonymous empty list.
Setup a loop.
Load range.
Load 100.
Call range.
Get the iterator.
Get the next item on that iterator.
Store that item onto i.
Load the list l.
Load the attribute append on that list.
Load i.
Load the integer five.
Multiply times five.
Call append.
Go to the top.
Go to absolute.
(Not including these steps: Load None, return it.)
The list comprehension doesn't have to do these things:
Load append of the list every time, since it's pre-bound as a local variable.
Load i twice per loop
Spend two instructions going to the top
Directly append to the list instead of calling a wrapper that appens the list
In conclusion, listcomp is a lot faster if you are going to use the values, but if you don't it's pretty slow.
Real speeds
Test one: for loop is faster by about one-third*
Test two: list comprehension is faster by about two-thirds*
*About -> second decimal place acurrate
I just ran across the dissembler function in python. But i couldn't make out what it means. Can anyone explain the working and use, based on the results of the factorial function (based on recursion and loop)
The recursive code and the corresponding dis code:
>>> def fact(n):
... if n==1:
... return 1
... return n*fact(n-1)
...
>>> dis.dis(fact)
2 0 LOAD_FAST 0 (n)
3 LOAD_CONST 1 (1)
6 COMPARE_OP 2 (==)
9 POP_JUMP_IF_FALSE 16
3 12 LOAD_CONST 1 (1)
15 RETURN_VALUE
4 >> 16 LOAD_FAST 0 (n)
19 LOAD_GLOBAL 0 (fact)
22 LOAD_FAST 0 (n)
25 LOAD_CONST 1 (1)
28 BINARY_SUBTRACT
29 CALL_FUNCTION 1
32 BINARY_MULTIPLY
33 RETURN_VALUE
And the factorial function using loop gives the following result:
def factor(n):
... f=1
... while n>1:
... f*=n
... n-=1
...
>>> dis.dis(factor)
2 0 LOAD_CONST 1 (1)
3 STORE_FAST 1 (f)
3 6 SETUP_LOOP 36 (to 45)
>> 9 LOAD_FAST 0 (n)
12 LOAD_CONST 1 (1)
15 COMPARE_OP 4 (>)
18 POP_JUMP_IF_FALSE 44
4 21 LOAD_FAST 1 (f)
24 LOAD_FAST 0 (n)
27 INPLACE_MULTIPLY
28 STORE_FAST 1 (f)
5 31 LOAD_FAST 0 (n)
34 LOAD_CONST 1 (1)
37 INPLACE_SUBTRACT
38 STORE_FAST 0 (n)
41 JUMP_ABSOLUTE 9
>> 44 POP_BLOCK
>> 45 LOAD_CONST 0 (None)
48 RETURN_VALUE
Can anyone tell me how to determine which one is faster?
To measure how fast something is running, use the timeit module, which comes with Python.
The dis module is used to get some idea of what the bytecode may look like; and its very specific to cpython.
One use of it is to see what, when and how storage is assigned for variables in a loop or method. However, this is a specialized module that is not normally used for efficiency calculations; use timeit to figure out how fast something is, and then dis to get an understanding of what is going on under the hood - to arrive at a possible why.
It's impossible to determine which one will be faster simply by looking at the bytecode; each VM has a different cost associated with each opcode and so runtimes can vary widely.
The dis.dis() function disassembles a function into its bytecode interpretation.
Timing
As stated by Ignacio, the pure length of the bytecode does not accurately represent the running time due to differences in how python interpreters actually run opcode and the timeit module would be what you want to use there.
Actual Purpose
There are several uses of this function, but they are not things that most people would end up doing. You can look at the output to help as part of the process of optimizing or debugging speed issues. It would also likely prove useful in working directly on the python interpreter, or writing your own. You can look at the documentation here to see a full list of the opcodes (though, just as that page will state, it's perfectly likely to change between versions of python).
Overall, this is not something you'd really use much in a production application (unless your application is a python disassembler!) but when you really, really need to optimize your code and debug at the lowest level, this is where the function would come in handy.
I know about inlining, and from what I checked it is not done by the Python's compiler.
My question is : is there any optimizations with the python's compiler which transforms :
print myList.__len__()
for i in range(0, myList.__len__()):
print i + myList.__len__()
to
l = myList.__len__()
print l
for i in range(0, l):
print i + l
So is it done by the compiler ?
If it is not : is it worth it to do it by myself ?
Bonus question (not so related) : I like to have a lot of functions (better for readability IMHO)... like there is no inlining in Python is this something to avoid (lots of functions) ?
No, there isn't. You can check what Python does by compiling the code to byte-code using the dis module:
>>> def test():
... print myList.__len__()
... for i in range(0, myList.__len__()):
... print i + myList.__len__()
...
>>> import dis
>>> dis.dis(test)
2 0 LOAD_GLOBAL 0 (myList)
3 LOAD_ATTR 1 (__len__)
6 CALL_FUNCTION 0
9 PRINT_ITEM
10 PRINT_NEWLINE
3 11 SETUP_LOOP 44 (to 58)
14 LOAD_GLOBAL 2 (range)
17 LOAD_CONST 1 (0)
20 LOAD_GLOBAL 0 (myList)
23 LOAD_ATTR 1 (__len__)
26 CALL_FUNCTION 0
29 CALL_FUNCTION 2
32 GET_ITER
>> 33 FOR_ITER 21 (to 57)
36 STORE_FAST 0 (i)
4 39 LOAD_FAST 0 (i)
42 LOAD_GLOBAL 0 (myList)
45 LOAD_ATTR 1 (__len__)
48 CALL_FUNCTION 0
51 BINARY_ADD
52 PRINT_ITEM
53 PRINT_NEWLINE
54 JUMP_ABSOLUTE 33
>> 57 POP_BLOCK
>> 58 LOAD_CONST 0 (None)
61 RETURN_VALUE
As you can see, the __len__ attribute is looked up and called each time.
Python cannot know what a given method will return between calls, the __len__ method is no exception. If python were to try to optimize that by assuming the value returned would be the same between calls, you'd run into countless different problems, and we haven't even tried to use multi-threading yet.
Note that you would be much better off using len(myList), and not call the __len__() hook directly:
print len(myList)
for i in xrange(len(myList):
print i + len(myList)
No, the optimization you're asking about is not done by the CPython compiler. In fact hardly any optimizations are done by the CPython compiler.
To see for yourself, import dis and disassemble a function with code like you're asking about: dis.dis(func).
The reason this isn't optimized is that it is entirely possible that an attribute (even a method like __len__) will be a completely different object the next time it is accessed. This rarely happens, of course, but Python supports it.
Attribute access does consume time, so storing a reference to an attribute you will be using repeatedly (especially in a local variable) can make your code run faster. However, it decreases readability, so I'd wait until you know that a given piece of code is a bottleneck before applying it. In your case, the time spent printing is easily going to overwhelm the attribute access.
In the final analysis, if performance were paramount you'd be using something other than Python in the first place, no?