how does "is" identity operator work in python?


Here is my code:
x = 5
y = 5
print(x is y)
print(id(x))
print(id(y))
and the output is
True
1903991482800
1903991482800
I don't know why x and y have the same location here.
Please help me understand this problem!
Thanks!

Your issue is technically a complicated concept, but I will try to explain it to you in simple terms.
Let's say a number, say '3', is stored in your memory. When you declare a = 3, what the Python interpreter actually does is make the variable 'a' refer to the int object 3 stored somewhere in memory. So if the number 3 is stored at an address like 'xxyyzz', then the moment you declare a = 3, the variable a points to the memory address 'xxyyzz'. Similarly, when you declare another variable b = 3, the variable 'b' also points to the memory location 'xxyyzz'. That sharing happens because CPython (the standard interpreter) pre-allocates the int objects from -5 to 256 and reuses them; for larger numbers, two equal values are not guaranteed to be the same object. The 'is' operator compares the identities of the objects the variables refer to (in CPython, id() is the memory address), so here id(a) == id(b) and a is b is True.
Hope this helps!
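A minimal sketch of the difference between is and ==, assuming CPython, where ints from -5 to 256 are cached. Building the larger value with int("1000") (an illustrative choice, not from the question) creates it at runtime, so the two names cannot share one compile-time constant.

```python
# Small ints (-5..256) are cached by CPython, so equal values share one object.
a = 5
b = 5
print(a == b)          # True: same value
print(a is b)          # True on CPython: the same cached object
print(id(a) == id(b))  # True: 'is' compares identities, which id() exposes

# Build larger ints at runtime so no constant can be shared between them.
c = int("1000")
d = int("1000")
print(c == d)          # True: same value
print(c is d)          # False on CPython: two separate int objects
```

Use == to compare values; reserve is for identity checks such as x is None.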

Related

Why doesn't my "formula" variable update automatically, like in a spreadsheet? How can I re-compute the value?

I have noticed that it's common for beginners to have the following simple logical error. Since they genuinely don't understand the problem, a) their questions can't really be said to be caused by a typo (a full explanation would be useful); b) they lack the understanding necessary to create a proper example, explain the problem with proper terminology, and ask clearly. So, I am asking on their behalf, to make a canonical duplicate target.
Consider this code example:
x = 1
y = x + 2
for _ in range(5):
    x = x * 2 # so it will be 2 the first time, then 4, then 8, then 16, then 32
    print(y)
Each time through the loop, x is doubled. Since y was defined as x + 2, why doesn't it change when x changes? How can I make it so that the value is automatically updated, and I get the expected output
4
6
10
18
34
?

Declarative programming
Many beginners expect Python to work this way, but it does not. Worse, they may inconsistently expect it to work that way. Carefully consider this line from the example:
x = x * 2
If assignments were like mathematical formulas, we'd have to solve for x here. The only possible (numeric) value for x would be zero, since any other number is not equal to twice that number. And how should we account for the fact that the code previously says x = 1? Isn't that a contradiction? Should we get an error message for trying to define x two different ways? Or expect x to blow up to infinity, as the program keeps trying to double the old value of x?
Of course, none of those things happen. Like most programming languages in common use, Python is an imperative language, meaning that lines of code describe actions that occur in a defined order. Where there is a loop, the code inside the loop is repeated; where there is something like if/else, some code might be skipped; but in general, code within the same "block" simply happens in the order that it's written.
In the example, first x = 1 happens, so x is equal to 1. Then y = x + 2 happens, which makes y equal to 3 for the time being. y has that value because of the assignment, not because of any ongoing connection to x. Thus, when x changes later on in the code, that does not cause y to change.
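A quick way to see that y = x + 2 stores a value rather than a formula is to change x afterwards and look at y again. A minimal sketch using the names from the question:

```python
x = 1
y = x + 2    # y is bound to the int 3, computed right now
x = x * 2    # x is now 2; nothing links y back to x
print(x, y)  # 2 3
```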
Going with the (control) flow
So, how do we make y change? The simplest answer is: the same way that we gave it this value in the first place - by assignment, using =. In fact, thinking about the x = x * 2 code again, we already have seen how to do this.
In the example code, we want y to change multiple times - once each time through the loop, since that is where print(y) happens. What value should be assigned? It depends on x - the current value of x at that point in the process, which is determined by using... x. Just like how x = x * 2 checks the existing value of x, doubles it, and changes x to that doubled result, so we can write y = x + 2 to check the existing value of x, add two, and change y to be that new value.
Thus:
x = 1
for _ in range(5):
    x = x * 2
    y = x + 2
    print(y)
All that changed is that the line y = x + 2 is now inside the loop. We want that update to happen every time that x = x * 2 happens, immediately after that happens (i.e., so that the change is made in time for the print(y)). So, that directly tells us where the code needs to go.
Defining relationships
Suppose there were multiple places in the program where x changes:
x = x * 2
y = x + 2
print(y)
x = 24
y = x + 2
print(y)
Eventually, it will get annoying to remember to update y after every line of code that changes x. It's also a potential source of bugs that will get worse as the program grows.
In the original code, the idea behind writing y = x + 2 was to express a relationship between x and y: we want the code to treat y as if it meant the same thing as x + 2, anywhere that it appears. In mathematical terms, we want to treat y as a function of x.
In Python, like most other programming languages, we express the mathematical concept of a function using something called... a function. In Python specifically, we use the def statement to write functions. It looks like:
def y(z):
    return z + 2
We can write whatever code we like inside the function, and when the function is "called", that code will run, much like our existing "top-level" code runs. When Python first encounters the block starting with def, though, it only creates a function from that code - it doesn't run the code yet.
So, now we have something named y, which is a function that takes in some z value and gives back (i.e., returns) the result of calculating z + 2. We can call it by writing something like y(x), which will give it our existing x value and evaluate to the result of adding 2 to that value.
Notice that the z here is the function's own name for the value that was passed in, and it does not have to match our own name for that value. In fact, we don't have to have our own name for that value at all: for example, we can write y(1), and the function will compute 3.
What do we mean by "evaluating to", or "giving back", or "returning"? Simply, the code that calls the function is an expression, just like 1 + 2, and when the value is computed, it gets used in place, in the same way. So, for example, a = y(1) will make a be equal to 3:
The function receives a value 1, calling it z internally.
The function computes z + 2, i.e. 1 + 2, getting a result of 3.
The function returns the result of 3.
That means that y(1) evaluated to 3; thus, the code proceeds as if we had put 3 where the y(1) is.
Now we have the equivalent of a = 3.
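The steps above can be traced with the y function from this section. A minimal sketch; the extra call with x = 5 is an illustrative value, not from the original:

```python
def y(z):
    # z is the function's own name for whatever value the caller passes in
    return z + 2

a = y(1)     # y(1) evaluates to 3, so this behaves like a = 3
print(a)     # 3

x = 5
print(y(x))  # 7: the caller's name (x) does not have to match z
```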
For more about using functions, see How do I get a result (output) from a function? How can I use the result later?.
Going back to the beginning of this section, we can therefore use calls to y directly for our prints:
x = x * 2
print(y(x))
x = 24
print(y(x))
We don't need to "update" y when x changes; instead, we determine the value when and where it is used. Of course, we technically could have done that anyway: it only matters that y is "correct" at the points where it's actually used for something. But by using the function, the logic for the x + 2 calculation is wrapped up, given a name, and put in a single place. We don't need to write x + 2 every time. It looks trivial in this example, but y(x) would do the trick no matter how complicated the calculation is, as long as x is the only needed input. The calculation only needs to be written once: inside the function definition, and everything else just says y(x).
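Applied to the original loop, the function version computes the value right where it is printed. This sketch combines the question's loop with the y function; the results list is only there so the output can be checked.

```python
def y(z):
    return z + 2

x = 1
results = []
for _ in range(5):
    x = x * 2
    value = y(x)  # computed from the current x, at the point it is needed
    results.append(value)
    print(value)  # 4, 6, 10, 18, 34 on successive iterations
```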
It's also possible to make the y function use the x value directly from our "top-level" code, rather than passing it in explicitly. This can be useful, but in the general case it gets complicated and can make code much harder to understand and prone to bugs. For a proper understanding, please read Using global variables in a function and Short description of the scoping rules?.
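For comparison, here is a minimal sketch of the global-lookup variant, using a hypothetical function name y_from_global. The body looks up x when the function is called, not when it is defined, so it always sees the current top-level value. That hidden dependency is exactly what the scoping-rules reading warns about.

```python
x = 1

def y_from_global():
    # No parameter: 'x' is looked up in the module's global namespace at call time
    return x + 2

print(y_from_global())  # 3
x = 24
print(y_from_global())  # 26
```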

Named conditions in Python and a silly textbook - and how to prove they're wrong

Python's abstraction is often seen as magic by many. Coming from a C background, I know very well there is no such thing as magic, only cold hard code made up of simple components that produces abstraction.
So, when a textbook and my teacher say that we can "store conditions" or use "named conditions" for readability, and say that assigning a boolean expression to a variable suddenly makes it a dynamic condition akin to a macro, I lose it.
EDIT 1: They don't explicitly say it's like a macro (direct quotes are placed within quotes), since we aren't expected to know any other language beforehand.
The way they say that "the variable stores the condition unevaluated" is like saying it is a macro, and this is my opinion. They imply it to be practically the equivalent of a macro by their articulation, just without saying the word 'macro'.
Here's the claim in code form:
x,y = 1,2
less = x < y
more = x > y
'''
less/ more claimed to store not boolean True/False but some magical way of storing the
expression itself (unevaluated, say like a macro) and apparently
'no value is being stored to less and more'.
'''
It is being represented as though one was doing :
// C-style
#define less (x < y)
#define more (x > y)
Of course, this is not true, because all that less and more store in the so-called 'named conditions' is the return value of the operator applied to x and y.
This is obvious since <, >, ==, <=, >= all return booleans for built-in numbers like these, as per the documentation and the spec, and less or more only store the True or False return value, which we may prove by calling print() and/or type() on them.
Also, changing the values of x and y, say by doing x,y = y,x, does not change the values of less or more, because they store not a dynamic expression but the static return value of the > or < operator on the initial x and y values.
The question isn't why this claim is a misunderstanding of the purported abstraction (it's not actually an abstraction; similar storage can be achieved in asm or C too), but rather how to clearly and efficiently articulate to my teacher that it is not working like a C macro but statically storing the boolean return value of > or <.
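The print()/type() argument above can be checked directly. A minimal sketch using the same names as the question:

```python
x, y = 1, 2
less = x < y
more = x > y
print(type(less), less, more)  # <class 'bool'> True False

x, y = y, x        # swap the inputs
print(less, more)  # still True False: nothing was re-evaluated
print(x < y)       # False: only a new comparison sees the new values
```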

Obviously less = x < y just looks at the current values of x and y and stores either True or False into the variable less.
If I understand where you and your teacher disagree, you two have a different idea of what the following code will print out:
x, y = 1, 2
less = x < y
print(less)
x, y = 2, 1
print(less)

"Macros" could be implemented as text strings that can be evaluated, like this (a bad example, not the recommended solution):
less = "({0}) < ({1})"
and use them like:
x = 1
y = 3
outcome = eval(less.format("x", "y"))
But this is really a silly thing to do, and eval() is prone to security issues, since it runs arbitrary code.
Perhaps your teacher meant to use lambda expressions, which are nameless, ad-hoc functions:
less = lambda a, b: a < b
x = 1
y = 3
outcome = less(x, y)
Note:
There is already a function for lambda a, b: a < b available in the standard library operator module: operator.lt.
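To get the "stored condition" behaviour the teacher describes, the comparison has to be deferred until call time. The sketch below shows both operator.lt and a zero-argument lambda. The name condition is hypothetical; it reads x and y from the enclosing scope each time it is called.

```python
import operator

less = operator.lt         # same behaviour as lambda a, b: a < b
print(less(1, 3))          # True

x, y = 1, 3
condition = lambda: x < y  # hypothetical name; evaluates x < y on every call
print(condition())         # True
x, y = 3, 1
print(condition())         # False: the comparison is re-run with the new values
```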

Why does adding a semicolon in Python change the result? [duplicate]

This question already has answers here:
The `is` operator behaves unexpectedly with non-cached integers
(2 answers)
Python3 multiple assignment and memory address [duplicate]
(3 answers)
Closed 4 years ago.
I found a strange behavior with the semicolon ";" in Python.
>>> x=20000;y=20000
>>> x is y
True
>>> x=20000
>>> y=20000
>>> x is y
False
>>> x=20000;
>>> y=20000
>>> x is y
False
Why does the first test return "True", and the others return "False"? My Python version is 3.6.5.

In the interactive interpreter, the first semicolon line is read and evaluated in one pass. As such, the interpreter recognizes that 20000 is the same immutable int value in each assignment, and so can (it doesn't have to, but does) make x and y references to the same object.
The important point is that this is simply an optimization that the interactive interpreter chooses to make; it's not something guaranteed by the language or some special property of the ; that joins two statements into one.
In the following two examples, by the time y=20000 is read and evaluated, x=20000 (with or without the semicolon) has already been evaluated and forgotten. Since 20000 isn't in the range (-5 to 256) of pre-allocated int values, CPython doesn't try to find another instance of 20000 already in memory; it just creates a new one for y.
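The same effect can be reproduced without the interactive prompt, assuming CPython. compile() turns a source string into one code block, so two equal literals in that string share one constant whether or not a semicolon joins them. The namespace dicts ns and ns2 are only for illustration.

```python
# Both assignments are compiled together, so the 20000 literal is stored once.
ns = {}
exec(compile("x = 20000; y = 20000", "<string>", "exec"), ns)
print(ns["x"] is ns["y"])    # True on CPython: one shared constant

# Same result without the semicolon, as long as it is one compiled block.
ns2 = {}
exec(compile("x = 20000\ny = 20000", "<string>", "exec"), ns2)
print(ns2["x"] is ns2["y"])  # True on CPython
```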

The is operator checks whether two values are the same object in memory. It's not meant to be used for checking equality. For what it's worth, you could consider the fact that it sometimes returns True and sometimes False to be just a matter of luck (even if it isn't).
For example, the results are different in an interactive session and in a standalone program:
$ cat test.py
x = 200000; y = 200000
print(x is y)
xx = 200000
yy = 200000
print(xx is yy)
$ python test.py
True
True
Or you have this other example:
>>> x = 50 + 50; y = 50 + 50
>>> x is y
True
>>> x = 5000 + 5000; y = 5000 + 5000
>>> x is y
False
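This happens because the interpreter caches small numbers, so they are always the same object, but it doesn't cache large numbers, so each addition in the second case creates a separate 10000 object. It has nothing to do with the semicolon. (The result for the larger values depends on the CPython version: newer versions fold 5000 + 5000 into a constant at compile time and may then share it between the two assignments.)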
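To see the caching without the compiler folding the additions into constants, the operands can be produced at runtime. A sketch assuming CPython; calling int() on a string is just an illustrative way to hide the values from the compiler.

```python
small = int("50")
big = int("5000")

a, b = small + small, small + small  # 100 is in the cached range
c, d = big + big, big + big          # 10000 is not
print(a is b)  # True on CPython: both refer to the cached 100
print(c is d)  # False on CPython: two separate 10000 objects
print(c == d)  # True: equal values, which is what == checks
```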

Odd Python ID assignment for Int values ==> Inconsistent 'is' operation [duplicate]

This question already has answers here:
"is" operator behaves unexpectedly with integers
(11 answers)
The `is` operator behaves unexpectedly with non-cached integers
(2 answers)
Closed 5 years ago.
So Python 3.6.2 has some weird behavior in how it assigns IDs to integer values.
For any integer value in the range [-5, 256], any variable assigned a given value will also be assigned the same ID as any other variable with the same value. This effect can be seen below.
>>> a, b = -5, -5
>>> id(a), id(b)
(1355597296, 1355597296)
>>> a, b = -6, -6
>>> id(a), id(b)
(2781041259312, 2781041260912)
In fact, to see the ID pairs in action, you can just run this simple program that prints out the number and id in the range that I'm talking about...
for val in range(-6, 258):
    print(format(val, ' 4d'), ':', format(id(val), '11x'))
If you add some other variables with values outside this range, you will see the IDs of the boundary values (-6 and 257) change within the Python interpreter, but never those of the values in this range.
This means (at least to me) that Python has taken the liberty to hardcode the addresses of variables that hold values in a seemingly arbitrary range of numbers.
In practice, this can be a little dangerous for a beginning Python learner: since the IDs assigned are the same within what is a normal range of operation for beginners, they may be inclined to use logic that might get them in trouble, even though it seemingly works, and makes sense...
One possible (though a bit odd) problem might be printing an incrementing number:
a = 0
b = 10
while a is not b:
    a = a + 1
    print(a)
This logic, though not the standard Pythonic way, works and is fine as long as b is in the range of statically defined numbers [-5, 256].
However, as soon as b is raised out of this range, we see the same strange behavior. In this case, it actually throws the code into an infinite loop.
I know that using 'is' to compare values is really not a good idea, but this produces inconsistent results when using the 'is' operator, and it is not immediately obvious to someone new to the language, and it would be especially confusing for new programmers that mistakenly used this method.
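The fix for the loop above is to compare values with != instead of identities with is not. A minimal sketch; b = 300 is an arbitrary value outside the cached range, and the seen list exists only so the result can be checked.

```python
a = 0
b = 300          # outside [-5, 256]; 'is not' could loop forever here
seen = []
while a != b:    # compare values, not object identity
    a = a + 1
    seen.append(a)
print(seen[-1])  # 300
```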
So my question is...
a) Why (was Python written to behave this way), and
b) Should it be changed?
p.s. In order to properly demonstrate the range in a usable script, I had to do some odd tweaks that really are improper code. However, I still hold my argument, since my method would not show any results if this odd glitch didn't exist.
for val in range(-6, 300):
    a = int(float(val))
    b = int(float(val))
    print(format(a, ' 4d'), format(id(a), '11x'), ':', format(b, ' 4d'), format(id(b), '11x'), ':', a is b)
The int(float(val)) is necessary to force Python to give each value a new address/id rather than the pointer to the object that it is accessing.

This is documented behavior of Python:
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object.
source
It helps to save memory and to make operations a bit faster.
It is implementation-specific. For example, IronPython has a range between -1000 and 1000 in which it re-uses integers.
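A sketch that probes the CPython cache boundaries. The helper name is_cached is hypothetical. Each candidate is built twice from a string at runtime, so no compile-time constant is shared, and is checks whether the two results are the same object. On another implementation such as IronPython the boundaries would differ.

```python
def is_cached(n):
    # Two independent int objects for the same value, unless CPython hands
    # back its pre-allocated small int for both.
    return int(str(n)) is int(str(n))

for n in (-6, -5, 256, 257):
    print(n, is_cached(n))
# On CPython: -6 False, -5 True, 256 True, 257 False
```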

Integer object identity test: inconsistent behavior between large positive and small negative integers

I am using Anaconda (Python 3.6).
In the interactive mode, I did object identity test for positive integers >256:
# Interactive test 1
>>> x = 1000
>>> y = 1000
>>> x is y
False
Clearly, large integers (>256) written on separate lines are not reused in interactive mode.
But if we write the assignment in one line, the large positive integer object is reused:
# Interactive test 2
>>> x, y = 1000, 1000
>>> x is y
True
That is, in interactive mode, writing the integer assignments on one line or on separate lines makes a difference for reusing the integer objects (>256). For integers in [-5, 256] (as described at https://docs.python.org/2/c-api/int.html), the caching mechanism ensures that only one object is created, whether the assignments are on the same line or on different lines.
Now let's consider negative integers less than -5 (any negative integer outside the range [-5, 256] would serve the purpose). Surprising results come out:
# Interactive test 3
>>> x, y = -6, -6
>>> x is y
False # inconsistent with the large positive integer 1000
>>> -6 is -6
False
>>> id(-6), id(-6), id(-6)
(2280334806256, 2280334806128, 2280334806448)
>>> a = b =-6
>>> a is b
True # different result from a, b = -6, -6
Clearly, this demonstrates an inconsistency in the object identity test between large positive integers (>256) and small negative integers (<-5). And for small negative integers (<-5), writing in the form a, b = -6, -6 or a = b =-6 also makes a difference (in contrast, it doesn't matter which form is used for large integers). Any explanations for these strange behaviors?
For comparison, let's move on to IDE run (I am using PyCharm with the same Python 3.6 interpreter), I run the following script
# IDE test case
x = 1000
y = 1000
print(x is y)
It prints True, different from the interactive run. Thanks to @Ahsanul Haque, who already gave a nice explanation of the inconsistency between the IDE run and the interactive run. But it still remains to answer my question on the inconsistency between large positive integers and small negative integers in the interactive run.

Only one copy of a particular constant is created for a particular piece of source code, and it is reused if needed again. So, in PyCharm, x is y evaluates to True.
But in the interpreter, things are different. Here, only one line/statement runs at a time. A particular constant is created for each new line. It is not reused in the next line. So, x is not y here.
But if you initialize them on the same line, you get the same behavior (reusing the same constant).
>>> x,y = 1000, 1000
>>> x is y
True
>>> x = 1000
>>> y = 1000
>>> x is y
False
>>>
Edit:
A block is a piece of Python program text that is executed as a unit.
In an IDE, the whole module gets executed at once, i.e. the whole module is a block. But in interactive mode, each top-level statement you enter is a block of code that is executed at once.
As I said earlier, a particular constant is created once for a block of code and reused if it reappears in that block of code.
This is the main difference between the IDE and the interpreter.
Then why does the interpreter give the same output as the IDE for smaller numbers? This is where integer caching comes into consideration.
If numbers are small, they are cached and reused in the next code block. So we get the same id in the interpreter as in the IDE.
But if they are bigger, they are not cached. Rather, a new copy is created. So, as expected, the id is different.
Hope this makes sense now.

When you run 1000 is 1000 in the interactive shell or as part of the bigger script, CPython generates the bytecode like
In [3]: dis.dis('1000 is 1000')
...:
1 0 LOAD_CONST 0 (1000)
2 LOAD_CONST 0 (1000)
4 COMPARE_OP 8 (is)
6 RETURN_VALUE
What it does is:
Loads two constants (LOAD_CONST pushes co_consts[consti] onto the stack -- docs)
Compares them using is (True if operands refer to the same object; False otherwise)
Returns the result
As CPython only creates one Python object for a constant used in a code block, 1000 is 1000 will result in a single integer constant being created:
In [4]: code = compile('1000 is 1000', '<string>', 'single') # code object
In [5]: code.co_consts # constants used by the code object
Out[5]: (1000, None)
According to the bytecode above, Python will load that same object twice and compare it with itself, so the expression will evaluate to True:
In [6]: eval(code)
Out[6]: True
The results are different for -6, because -6 is not immediately recognized as a constant:
In [7]: ast.dump(ast.parse('-6'))
Out[7]: 'Module(body=[Expr(value=UnaryOp(op=USub(), operand=Num(n=6)))])'
-6 is an expression negating the value of the integer literal 6.
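The parse step can be checked with the ast module. A sketch; on current Python versions the literal appears as a Constant node rather than the older Num shown above, but the expression is still a unary minus applied to 6.

```python
import ast

tree = ast.parse("-6", mode="eval")
node = tree.body
print(type(node).__name__)     # UnaryOp
print(type(node.op).__name__)  # USub
print(ast.literal_eval("-6"))  # -6: the value once the expression is evaluated
```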
Nevertheless, the bytecode for -6 is -6 is virtually the same as the first bytecode sample:
In [8]: dis.dis('-6 is -6')
1 0 LOAD_CONST 1 (-6)
2 LOAD_CONST 2 (-6)
4 COMPARE_OP 8 (is)
6 RETURN_VALUE
So Python loads two -6 constants and compares them using is.
How does the -6 expression become a constant? CPython has a peephole optimizer, capable of optimizing simple expressions involving constants by evaluating them right after the compilation, and storing the results in the table of constants.
As of CPython 3.6, folding unary operations is handled by fold_unaryops_on_constants in Python/peephole.c. In particular, - (unary minus) is evaluated by PyNumber_Negative that returns a new Python object (-6 is not cached). After that, the newly created object is inserted into the consts table. However, the optimizer does not check whether the result of the expression can be reused, so the results of identical expressions end up being distinct Python objects (again, as of CPython 3.6).
To illustrate this, I'll compile the -6 is -6 expression:
In [9]: code = compile('-6 is -6', '<string>', 'single')
There are two -6 constants in the co_consts tuple
In [10]: code.co_consts
Out[10]: (6, None, -6, -6)
and they have different memory addresses
In [11]: [id(const) for const in code.co_consts if const == -6]
Out[11]: [140415435258128, 140415435258576]
Of course, this means that -6 is -6 evaluates to False:
In [12]: eval(code)
Out[12]: False
For the most part, the explanation above remains valid in the presence of variables. When executed in the interactive shell, these three lines
>>> x = 1000
>>> y = 1000
>>> x is y
False
are parts of three different code blocks, so the 1000 constant won't be reused. However, if you put them all in one code block (like a function body) the constant will be reused.
In contrast, the x, y = 1000, 1000 line is always executed in one code block (even in the interactive shell), and therefore CPython always reuses the constant. In x, y = -6, -6, -6 isn't reused for the reasons explained in the first part of my answer.
x = y = -6 is trivial. Since there's exactly one Python object involved, x is y would return True even if you replaced -6 with something else.
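Both claims in the last two paragraphs can be tried directly. A sketch assuming CPython; the function name same_block is hypothetical.

```python
def same_block():
    # One code block: the compiler stores the 1000 literal once in co_consts,
    # so both names end up bound to the same object.
    x = 1000
    y = 1000
    return x is y

print(same_block())  # True on CPython

x = y = -6           # one object, bound to two names
print(x is y)        # True on any implementation
```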

To complement Ahsanul Haque's answer, try this in any IDE:
x = 1000
y = 1000
print (x is y)
print('\ninitial id x: ',id(x))
print('initial id y: ',id(y))
x=2000
print('\nid x after change value: ',id(x))
print('id y after change x value: ', id(y))
Example output:
initial id x: 139865953872336
initial id y: 139865953872336
id x after change value: 139865953872304
id y after change x value: 139865953872336
You will very likely see the same ID for x and y. Then run the code in the interactive interpreter, and the IDs will be different:
>x=1000
>y=1000
>id(x)
=> 139865953870576
>id(y)
=> 139865953872368

We Keep Coding
