I do this:
>>> dis.dis(lambda: 1 + 1)
0 LOAD_CONST 2 (2)
3 RETURN_VALUE
I was expecting a BINARY_ADD opcode to perform the addition. How was the sum computed?
This is the work of Python's peephole optimizer. It evaluates simple expressions involving only constants at compile time and stores the result as a constant in the generated bytecode.
Quoting from the Python 2.7.9 source code:
/* Fold binary ops on constants.
LOAD_CONST c1 LOAD_CONST c2 BINOP --> LOAD_CONST binop(c1,c2) */
case BINARY_POWER:
case BINARY_MULTIPLY:
case BINARY_TRUE_DIVIDE:
case BINARY_FLOOR_DIVIDE:
case BINARY_MODULO:
case BINARY_ADD:
case BINARY_SUBTRACT:
case BINARY_SUBSCR:
case BINARY_LSHIFT:
case BINARY_RSHIFT:
case BINARY_AND:
case BINARY_XOR:
case BINARY_OR:
if (lastlc >= 2 &&
ISBASICBLOCK(blocks, i-6, 7) &&
fold_binops_on_constants(&codestr[i-6], consts)) {
i -= 2;
assert(codestr[i] == LOAD_CONST);
cumlc = 1;
}
break;
Basically, it looks for an instruction sequence like this
LOAD_CONST c1
LOAD_CONST c2
BINARY_OPERATION
evaluates the operation at compile time, and replaces those three instructions with a single LOAD_CONST of the result. Quoting the comment in the fold_binops_on_constants function:
/* Replace LOAD_CONST c1. LOAD_CONST c2 BINOP
with LOAD_CONST binop(c1,c2)
The consts table must still be in list form so that the
new constant can be appended.
Called with codestr pointing to the first LOAD_CONST.
Abandons the transformation if the folding fails (i.e. 1+'a').
If the new constant is a sequence, only folds when the size
is below a threshold value. That keeps pyc files from
becoming large in the presence of code like: (None,)*1000.
*/
The actual evaluation of this particular code happens in this block,
case BINARY_ADD:
newconst = PyNumber_Add(v, w);
break;
The Python interpreter works from the inside out: it reads 1 + 1, folds it to 2 at compile time, and then creates a function object that returns the constant 2 (notice the order here!). Finally, the dis function disassembles the newly created lambda function object, which simply returns that 2.
Thus, the 1 + 1 has already been computed by the time the lambda function object is created, and dis.dis() knows nothing about the addition that took place when the interpreter compiled 1 + 1 down to 2.
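You can also see the folded result directly in the code object's constants. A quick check (the exact contents of co_consts differ slightly between Python 2 and Python 3):
f = lambda: 1 + 1
print(f.__code__.co_consts)   # the folded 2 appears here, e.g. (2,) on Python 3
print(f())                    # 2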
If you do something like:
>>> dis.dis(lambda: x + 1)
1 0 LOAD_GLOBAL 0 (x)
3 LOAD_CONST 1 (1)
6 BINARY_ADD
7 RETURN_VALUE
You'll notice that a BINARY_ADD instruction is used, since x + 1 can't be further simplified by itself.
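The size threshold mentioned in the quoted comment is also easy to observe: a constant expression whose result would be a large sequence is left unfolded. A quick sketch (exact opcode names vary across CPython versions):
import dis

dis.dis(lambda: (2,) * 3)        # small result: typically folded into one LOAD_CONST
dis.dis(lambda: (None,) * 1000)  # large result: the multiply opcode is kept instead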
Related
I'm fairly new to Python, so this might seem like a trivial question to some. But I'm curious about how Python works internally when you bind a new object to a variable while referring to the previous object bound to the same variable name. Please see the code below as an example. I understand that Python breaks the bond with the original object 'hello' and binds the name to the new object, but what is the sequence of events here? How does Python break the bond with the original object but also refer to it?
greeting = 'hello'
greeting = f'y{greeting[1:len(greeting)]}'
In addition to the explanation, I would also very much appreciate some context. I understand that strings are immutable, but what about other types like floats and integers?
Does it matter whether I understand how Python operates internally? And if it does, where would be a good place to learn more about Python's internals?
Hope I'm being clear with my questions.
An explanation through the medium of the disassembly:
>>> dis.dis('''greeting = 'hello'
... greeting = f'y{greeting[1:len(greeting)]}'
... ''')
1 0 LOAD_CONST 0 ('hello')
2 STORE_NAME 0 (greeting)
2 4 LOAD_CONST 1 ('y')
6 LOAD_NAME 0 (greeting)
8 LOAD_CONST 2 (1)
10 LOAD_NAME 1 (len)
12 LOAD_NAME 0 (greeting)
14 CALL_FUNCTION 1
16 BUILD_SLICE 2
18 BINARY_SUBSCR
20 FORMAT_VALUE 0
22 BUILD_STRING 2
24 STORE_NAME 0 (greeting)
26 LOAD_CONST 3 (None)
28 RETURN_VALUE
The number on the far left is the source line number, marking where the bytecode for that line begins. Line 1 is pretty self-explanatory, so I'll explain line 2.
As you might notice, your f-string doesn't survive compilation; it becomes a series of raw opcodes that interleave loading constant segments with evaluating the formatting placeholders, so that by the end the stack holds all the fragments that will make up the final string. Once they're all on the stack, BUILD_STRING 2 puts the fragments together (it says "take the top two values off the stack and combine them into a single string").
greeting is just a name holding a binding. It doesn't actually hold a value, just a reference to whatever object it's currently bound to. And the original reference is pushed onto the stack (with LOAD_NAME) entirely before the STORE_NAME that pops the top of the stack and rebinds greeting.
In short, the reason it works is that the value of greeting is no longer needed by the time it's replaced; it's used to make the new string, then discarded in favor of the new string.
In your second line, Python evaluates the right side of the assignment statement, which creates a string that uses the old binding for greeting. Only after evaluating that expression does it handle the assignment operator, which binds that string to the name. It's all very linear.
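To make the ordering concrete, here is a small sketch (the behaviour is as described above; the id values will of course differ on your machine):
greeting = 'hello'
old_id = id(greeting)

# The right-hand side runs first, reading the old binding of greeting...
greeting = f'y{greeting[1:len(greeting)]}'

# ...and only then is the name rebound to the freshly built string.
print(greeting)                 # yello
print(id(greeting) == old_id)   # False: a brand-new string object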
Floats and integers are also immutable. Among the common built-in types, it is the containers, such as lists, dictionaries and sets, that are mutable. Actually, it's not clear how you would modify an integer object in any case: you can't refer to the inside of the object. It's important to remember that in this case:
i = 3
j = 4
i = i + j
the last line just creates a new integer/float object and binds it to i. None of this attempts to modify the integer object 3.
I wrote this article that tries to delineate the difference between Python objects and the names we use:
https://github.com/timrprobocom/documents/blob/main/UnderstandingPythonObjects.md
Following the question about Chaining *= += operators and the good comment from Tom Wojcik ("Why would you assume aaa *= 200 is faster than aaa = aaa * 200?"), I tested it in a Jupyter notebook:
%%timeit aaa = np.arange(1,101,1)
aaa*=100
%%timeit aaa = np.arange(1,101,1)
aaa=aaa*100
And I was surprised, because the first test takes longer than the second one: about 1530 ns versus 952 ns. Why are these values so different?
TL;DR: this question boils down to the performance difference between inplace_binop (the INPLACE_* opcodes, aaa *= 100) and binop (the BINARY_* opcodes, aaa = aaa * 100). The difference can be seen with the dis module:
import numpy as np
import dis
aaa = np.arange(1,101,1)
dis.dis('''
for i in range(1000000):
aaa*=100
''')
3 14 LOAD_NAME 2 (aaa)
16 LOAD_CONST 1 (100)
18 INPLACE_MULTIPLY
20 STORE_NAME 2 (aaa)
22 JUMP_ABSOLUTE 10
>> 24 POP_BLOCK
>> 26 LOAD_CONST 2 (None)
28 RETURN_VALUE
dis.dis('''
for i in range(1000000):
aaa=aaa*100
''')
3 14 LOAD_NAME 2 (aaa)
16 LOAD_CONST 1 (100)
18 BINARY_MULTIPLY
20 STORE_NAME 2 (aaa)
22 JUMP_ABSOLUTE 10
>> 24 POP_BLOCK
>> 26 LOAD_CONST 2 (None)
28 RETURN_VALUE
Then, back to your question: which one is actually faster?
Unfortunately, it's hard to say, and here's why:
You can check compile.c in the CPython source directly. If you trace into the code a bit, here is the difference in the call chains:
inplace_binop -> compiler_augassign -> compiler_visit_stmt
binop -> compiler_visit_expr1 -> compiler_visit_expr -> compiler_visit_kwonlydefaults
Since the call paths and the logic are different, there are many other factors (including your input size (*), CPU, etc.) that can affect performance as well, so you'll need to profile your code and optimize it based on your own use case.
(*): as noted in the comments, you can check this post to see how performance varies with input size.
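If you want to reproduce the measurement outside a notebook, a plain timeit script works as well (a sketch; the absolute numbers depend on your machine, NumPy version and array size):
import timeit

setup = "import numpy as np; aaa = np.arange(1, 101, 1)"

# In-place multiply: INPLACE_MULTIPLY, dispatched to ndarray.__imul__
print(timeit.timeit("aaa *= 100", setup=setup, number=100000))

# Rebinding multiply: BINARY_MULTIPLY dispatched to ndarray.__mul__, then the name is rebound
print(timeit.timeit("aaa = aaa * 100", setup=setup, number=100000))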
The += symbol appeared in the C language in the 1970s and, in keeping with the C idea of a "smart assembler", corresponds to a clearly different machine instruction and addressing mode.
a = a * 100 and a *= 100 produce the same effect, but at a low level they correspond to different ways the processor works.
a *= 100 means:
- find the place identified by a
- multiply it by 100

a = a * 100 means:
- evaluate a * 100:
  - find the place identified by a
  - copy a into an accumulator
  - multiply the accumulator by 100
- store the result in a:
  - find the place identified by a
  - copy the accumulator to it
Python is written in C and inherited this syntax from it, but since there is no translation/optimization step before execution in an interpreted language, the two forms are not necessarily so intimately related (there is one less parsing step). Still, an interpreter can dispatch to different execution routines for the different forms of the expression, taking advantage of different machine code depending on how the expression is written and on the evaluation context.
Couldn't find much on this. I'm trying to compare two values that can't be equal; in my case, they can be (and often are) either greater than or less than each other.
Should I use:
if a <> b:
dostuff
or
if a != b:
dostuff
This page says they're similar, which implies there's at least something different about them.
Quoting from the Python language reference:
The comparison operators <> and != are alternate spellings of the same operator. != is the preferred spelling; <> is obsolescent.
So, they both are one and the same, but != is preferred over <>.
I tried disassembling the code in Python 2.7.8
from dis import dis
form_1 = compile("'Python' <> 'Python'", "string", 'exec')
form_2 = compile("'Python' != 'Python'", "string", 'exec')
dis(form_1)
dis(form_2)
And got the following
1 0 LOAD_CONST 0 ('Python')
3 LOAD_CONST 0 ('Python')
6 COMPARE_OP 3 (!=)
9 POP_TOP
10 LOAD_CONST 1 (None)
13 RETURN_VALUE
1 0 LOAD_CONST 0 ('Python')
3 LOAD_CONST 0 ('Python')
6 COMPARE_OP 3 (!=)
9 POP_TOP
10 LOAD_CONST 1 (None)
13 RETURN_VALUE
Both <> and != generate the same bytecode:
6 COMPARE_OP 3 (!=)
So they both are one and the same.
Note:
<> is removed in Python 3.x, as per the Python 3 Language Reference.
Quoting official documentation,
!= can also be written <>, but this is an obsolete usage kept for backwards compatibility only. New code should always use !=.
Conclusion
Since <> is removed in 3.x and, as per the documentation, != is the preferred spelling, it's better not to use <> at all.
Just stick to !=.
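For completeness, trying the old spelling on Python 3 simply fails to compile (the exact wording varies by version; recent interpreters even suggest != instead):
>>> 1 <> 2
SyntaxError: invalid syntax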
<> is outdated! Please check the current Python reference manual.
In Python, for a comparison like this, does Python create a temporary object for the string constant "help" and then continue with the equality comparison? Would that object be GC'ed at some point?
s1 = "nohelp"
if s1 == "help":
# Blah Blah
String literals, like all Python constants, are created during compile time, when the source code is translated to byte code. And because all Python strings are immutable the interpreter can re-use the same string object if it encounters the same string literal in multiple places. It can even do that if the literal string is created via concatenation of literals, but not if the string is built by concatenating a string literal to an existing string object.
Here's a short demo that creates a few identical strings inside and outside of functions. It also dumps the disassembled byte code of one of the functions.
from __future__ import print_function
from dis import dis
def f1(s):
a = "help"
print('f1', id(s), id(a))
return s > a
def f2(s):
a = "help"
print('f2', id(s), id(a))
return s > a
a = "help"
print(id(a))
print(f1("he" + "lp"))
b = "h"
print(f2(b + "elp"))
print("\nf1")
dis(f1)
Typical output on a 32-bit machine running Python 2.6.6:
3073880672
f1 3073880672 3073880672
False
f2 3073636576 3073880672
False
f1
26 0 LOAD_CONST 1 ('help')
3 STORE_FAST 1 (a)
27 6 LOAD_GLOBAL 0 (print)
9 LOAD_CONST 2 ('f1')
12 LOAD_GLOBAL 1 (id)
15 LOAD_FAST 0 (s)
18 CALL_FUNCTION 1
21 LOAD_GLOBAL 1 (id)
24 LOAD_FAST 1 (a)
27 CALL_FUNCTION 1
30 CALL_FUNCTION 3
33 POP_TOP
28 34 LOAD_FAST 0 (s)
37 LOAD_FAST 1 (a)
40 COMPARE_OP 4 (>)
43 RETURN_VALUE
Note that the ids of all the "help" strings are identical, apart from the one constructed with b + "elp".
(BTW, Python will concatenate adjacent string literals, so instead of writing "he" + "lp" I could've written "he" "lp", or even "he""lp").
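A small illustration of that point (CPython implementation detail; the identity results are typical but not guaranteed by the language):
a = "help"
b = "he" "lp"          # adjacent literals are joined at compile time
c = "he" + "lp"        # constant folding also joins these at compile time
prefix = "he"
d = prefix + "lp"      # built at run time from an existing string object
print(a is b, a is c)  # typically: True True
print(a is d)          # typically: False (a distinct object with equal value)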
The string literals themselves are not freed until the process is cleaning itself up at termination; however, a string like b would be GC'ed once it goes out of scope.
Note that in CPython (standard Python) when objects are GC'ed their memory is returned to Python's allocation system for recycling, not to the OS. Python does return unneeded memory to the OS, but only in special circumstances. See Releasing memory in Python and Why doesn't memory get released to system after large queries (or series of queries) in django?
Another question that discusses this topic: Why strings object are cached in python
I have found a few links talking about switch cases being faster in C++ than if/else because they can be optimized at compile time. I then found some suggestions that using a dictionary may be faster than an if statement. However, most of those conversations are about someone's particular code and just end up concluding that you should optimize other parts of the code first, and that it won't matter unless you're doing millions of if/else checks. Can anyone explain why that is?
Say I have 100 unique numbers that are going to be streamed into a Python program constantly. I want to check which number it is, then execute something. So I could either write a ton of if/else statements, or I could put each number in a dictionary. For argument's sake, let's say it's a single thread.
Can someone who understands the layer between Python and the low-level execution explain how this works?
Thanks :)
However, most of those conversations are about someone's particular code and just end up concluding that you should optimize other parts of the code first, and that it won't matter unless you're doing millions of if/else checks. Can anyone explain why that is?
Generally, you should only bother to optimize code if you really need to, i.e. if the program's performance is unusably slow.
If this is the case, you should use a profiler to determine which parts are actually causing the most problems. For Python, the cProfile module is pretty good for this.
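For example, a minimal cProfile run looks something like this (work() here is just a hypothetical stand-in for the code you actually care about):
import cProfile

def work():
    # stand-in for the real workload you want to measure
    return sum(i * i for i in range(100000))

cProfile.run("work()", sort="cumulative")   # prints per-function timing statistics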
Can someone who understands the layer between Python and the low-level
execution explain how this works?
If you want to get an idea of how your code executes, take a look at the dis module.
A quick example...
import dis
# Here are the things we might want to do
def do_something_a():
print 'I did a'
def do_something_b():
print 'I did b'
def do_something_c():
print 'I did c'
# Case 1
def f1(x):
if x == 1:
do_something_a()
elif x == 2:
do_something_b()
elif x == 3:
do_something_c()
# Case 2
FUNC_MAP = {1: do_something_a, 2: do_something_b, 3: do_something_c}
def f2(x):
FUNC_MAP[x]()
# Show how the functions execute
print 'Case 1'
dis.dis(f1)
print '\n\nCase 2'
dis.dis(f2)
...which outputs...
Case 1
18 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (1)
6 COMPARE_OP 2 (==)
9 POP_JUMP_IF_FALSE 22
19 12 LOAD_GLOBAL 0 (do_something_a)
15 CALL_FUNCTION 0
18 POP_TOP
19 JUMP_FORWARD 44 (to 66)
20 >> 22 LOAD_FAST 0 (x)
25 LOAD_CONST 2 (2)
28 COMPARE_OP 2 (==)
31 POP_JUMP_IF_FALSE 44
21 34 LOAD_GLOBAL 1 (do_something_b)
37 CALL_FUNCTION 0
40 POP_TOP
41 JUMP_FORWARD 22 (to 66)
22 >> 44 LOAD_FAST 0 (x)
47 LOAD_CONST 3 (3)
50 COMPARE_OP 2 (==)
53 POP_JUMP_IF_FALSE 66
23 56 LOAD_GLOBAL 2 (do_something_c)
59 CALL_FUNCTION 0
62 POP_TOP
63 JUMP_FORWARD 0 (to 66)
>> 66 LOAD_CONST 0 (None)
69 RETURN_VALUE
Case 2
29 0 LOAD_GLOBAL 0 (FUNC_MAP)
3 LOAD_FAST 0 (x)
6 BINARY_SUBSCR
7 CALL_FUNCTION 0
10 POP_TOP
11 LOAD_CONST 0 (None)
14 RETURN_VALUE
...so it's pretty easy to see which function has to execute the most instructions.
As for which is actually faster, that's something you'd have to check by profiling the code.
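For instance, a rough timeit comparison of the two dispatch styles might look like this (a sketch only; the numbers depend heavily on the Python version, the number of branches, and which values you look up; the print calls are dropped so the timing isn't dominated by I/O):
import timeit

setup = """
def do_something_a(): pass
def do_something_b(): pass
def do_something_c(): pass

def f1(x):
    if x == 1:
        do_something_a()
    elif x == 2:
        do_something_b()
    elif x == 3:
        do_something_c()

FUNC_MAP = {1: do_something_a, 2: do_something_b, 3: do_something_c}

def f2(x):
    FUNC_MAP[x]()
"""

# The worst case for the if/elif chain is a key that only matches the last branch.
print(timeit.timeit("f1(3)", setup=setup, number=1000000))
print(timeit.timeit("f2(3)", setup=setup, number=1000000))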
The if/elif/else structure compares the key it was given to a sequence of possible values one by one until it finds a match in the condition of some if statement, then reads what it is supposed to execute from inside the if block. This can take a long time, because so many checks (n/2 on average, for n possible values) have to be made for every lookup.
The reason that a sequence of if statements is more difficult to optimize than a switch statement is that the condition checks (what's inside the parens in C++) might conceivably change the state of some variable that's involved in the next check, so you have to do them in order. The restrictions on switch statements remove that possibility, so the order doesn't matter (I think).
Python dictionaries are implemented as hash tables. The idea is this: if you could deal with arbitrarily large numbers and had infinite RAM, you could create a huge array of function pointers that is indexed just by casting whatever your lookup value is to an integer and using that as the index. Lookup would be virtually instantaneous.
You can't do that, of course, but you can create an array of some manageable length, pass the lookup value to a hash function (which generates some integer, depending on the lookup value), then % your result with the length of your array to get an index within the bounds of that array. That way, lookup takes as much time as is needed to call the hash function once, take the modulus, and jump to an index. If the amount of different possible lookup values is large enough, the overhead of the hash function becomes negligible compared to those n/2 condition checks.
(Actually, since many different lookup values will inevitably map to the same index, it's not quite that simple. You have to check for and resolve possible conflicts, which can be done in a number of ways. Still, the gist of it is as described above.)
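To make the idea concrete, here is a toy version of that scheme (purely illustrative; CPython's real dict uses open addressing, perturbation, and automatic resizing rather than anything this simple):
table_size = 8
slots = [None] * table_size

def put(key, value):
    index = hash(key) % table_size      # hash, then fold into the table's range
    slots[index] = (key, value)         # collisions simply overwrite here

def get(key):
    entry = slots[hash(key) % table_size]
    if entry is not None and entry[0] == key:
        return entry[1]                 # one hash + one index, no linear scan
    raise KeyError(key)

put(42, "do_something_a")
print(get(42))                          # do_something_a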