Comparing strings using '==' and 'is' [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicates:
Types for which “is” keyword may be equivalent to equality operator in Python
Python “is” operator behaves unexpectedly with integers
Hi.
I have a question which perhaps might enlighten me on more than what I am asking.
Consider this:
>>> x = 'Hello'
>>> y = 'Hello'
>>> x == y
True
>>> x is y
True
I have always used the comparison operator ==. I also read that is compares memory addresses and hence, in this case, returns True.
So my question is, is this another way to compare variables in Python? If yes, then why is this not used?
Also I noticed that in C++, if the variables have the same value, their memory addresses are different.
{ int x = 40; int y = 40; cout << &x << ", " << &y; }
0xbfe89638, 0xbfe89634
What is the reason for Python having the same memory addresses?

This is an implementation detail and absolutely not to be relied upon. is compares identities, not values. Short strings are interned, so they map to the same memory address, but this doesn't mean you should compare them with is. Stick to ==.
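To make the difference concrete, here is a minimal sketch that builds two equal strings at run time (so the compiler cannot simply share one literal) and compares them both ways. The variable names are made up for illustration, and the identity result is deliberately left without an expected value because it depends on the interpreter.

```python
# Build two equal strings at run time so they are not one shared literal.
first = "".join(["Hel", "lo"])
second = "".join(["Hel", "lo"])

# == compares the characters, so this is True for equal strings.
values_match = first == second

# is compares object identity; the result is implementation-dependent
# and should not be relied upon for strings.
same_object = first is second

print(values_match)
print(same_object)
```

Whatever the second line prints on your interpreter, only the == result is something your program can depend on.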

There are two ways to check for equality in Python: == and is. == will check the value, while is will check the identity. In almost every case, if is is true, then == must be true.
Sometimes, Python (specifically, CPython) will optimize values together so that they have the same identity. This is especially true for short strings. Python realizes that 'Hello' is the same as 'Hello' and since strings are immutable, they become the same through string interning / string pooling.
See a related question: Python: Why does ("hello" is "hello") evaluate as True?

This is because of a Python feature called String interning which is a method of storing only one copy of each distinct string value.

In Python, both strings and integers are immutable, so they can be cached. In CPython, integers in the range -5 to 256 and small strings (I don't know the exact size at the moment) get cached, so they are the same object. x and y are only names that refer to these objects.
Also, == compares for equal values, while is compares for object identity. None, True and False are global singleton objects; in Python 2 you could even rebind False to True, whereas in Python 3 True and False are keywords and cannot be reassigned.
The following shows that not everything is cached:
>>> x = 'Test' * 2000
>>> y = 'Test' * 2000
>>> x == y
True
>>> x is y
False
>>> x = 10000000000000
>>> y = 10000000000000
>>> x == y
True
>>> x is y
False

In Python, variables are just names that point to some object (and they can point to the same object). In C++, variables also define the actual memory that is reserved for them; this is why they have distinct memory addresses.
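A small sketch of the "names point to objects" idea, using a list because its mutation makes the sharing visible. The names x, y and z are just for illustration.

```python
x = [1, 2]
y = x          # y is a second name for the same list, not a copy
y.append(3)    # the change is visible through both names

z = list(x)    # an explicit copy: equal value, different object

print(x)         # [1, 2, 3]
print(x is y)    # True: one object, two names
print(z == x)    # True: same contents
print(z is x)    # False: a separate object
```

Unlike the interning examples, this behaviour is part of the language itself: assignment binds a name and never copies the object.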
About Python string interning and differences between the two comparison operators, see carl's response.

Related

`is` vs `==` for comparing primitives [duplicate]

I've started learning Python (python 3.3) and I was trying out the is operator. I tried this:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
>>> c = 'isitthespace'
>>> d = 'isitthespace'
>>> c is d
True
>>> e = 'isitthespace?'
>>> f = 'isitthespace?'
>>> e is f
False
It seems like the space and the question mark make the is behave differently. What's going on?
EDIT: I know I should be using ==, I just wanted to know why is behaves like this.

Warning: this answer is about the implementation details of a specific Python interpreter. Comparing strings with is is a bad idea.
Well, at least for CPython 3.4/2.7.3, the answer is "no, it is not the whitespace". Not only the whitespace:
Two string literals will share memory if they are either alphanumeric or reside in the same block (file, function, class or single interpreter command).
An expression that evaluates to a string will result in an object that is identical to the one created using a string literal, if and only if it is created using constants and binary/unary operators, and the resulting string is shorter than 21 characters.
Single characters are unique.
Examples
Alphanumeric string literals always share memory:
>>> x='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> y='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> x is y
True
Non-alphanumeric string literals share memory if and only if they share the enclosing syntactic block:
(interpreter)
>>> x='`!##$%^&*() \][=-. >:"?<a'; y='`!##$%^&*() \][=-. >:"?<a';
>>> z='`!##$%^&*() \][=-. >:"?<a';
>>> x is y
True
>>> x is z
False
(file)
x='`!##$%^&*() \][=-. >:"?<a';
y='`!##$%^&*() \][=-. >:"?<a';
z=(lambda : '`!##$%^&*() \][=-. >:"?<a')()
print(x is y)
print(x is z)
Output: True and False
For simple binary operations, the compiler does very simple constant propagation (see peephole.c), but with strings it does so only if the resulting string is shorter than 21 characters. If this is the case, the rules mentioned earlier are in force:
>>> 'a'*10+'a'*10 is 'a'*20
True
>>> 'a'*21 is 'a'*21
False
>>> 'aaaaaaaaaaaaaaaaaaaaa' is 'aaaaaaaa' + 'aaaaaaaaaaaaa'
False
>>> t=2; 'a'*t is 'aa'
False
>>> 'a'.__add__('a') is 'aa'
False
>>> x='a' ; x+='a'; x is 'aa'
False
Single characters always share memory, of course:
>>> chr(0x20) is ' '
True

To expand on Ignacio’s answer a bit: The is operator is the identity operator. It is used to compare object identity. If you construct two objects with the same contents, then the identity comparison usually does not yield true. It works for some small strings because CPython, the reference implementation of Python, stores the contents separately, making all those objects refer to the same string content. So the is operator returns true for those.
This, however, is an implementation detail of CPython and is generally guaranteed neither for CPython nor for any other implementation. So relying on this fact is a bad idea, as it can break any day.
To compare strings, you use the == operator, which compares the equality of objects. Two string objects are considered equal when they contain the same characters. So this is the correct operator to use when comparing strings, and is should generally be avoided unless you explicitly want object identity (example: a is False).
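For a case where object identity really is what you want, one common pattern is a private sentinel object. The sketch below is only an illustration: the names _MISSING and lookup are hypothetical and do not come from any library.

```python
# Hypothetical sentinel: a unique object that no caller can pass by accident.
_MISSING = object()


def lookup(mapping, key, default=_MISSING):
    """Return mapping[key]; fall back to default, or raise KeyError if none was given."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:            # identity check: was the key absent?
        if default is _MISSING:      # identity check: was a default supplied?
            raise KeyError(key)
        return default
    return value


print(lookup({"a": None}, "a"))   # None is a legitimate stored value here
print(lookup({}, "a", 0))         # falls back to the default
```

Here == would be the wrong tool, because a stored value could compare equal to almost anything, while is can only match the sentinel itself.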
If you are really interested in the details, you can find the implementation of CPython’s strings here. But again: This is implementation detail, so you should never require this to work.

The is operator compares object identity, which is what the id function reports; ids are guaranteed to be unique among simultaneously existing objects. In CPython specifically, id returns the object's memory address. It seems that CPython has consistent memory addresses for strings containing only characters a-z and A-Z.
However, this seems to only be the case when the string has been assigned to a variable:
Here, the id of "foo" and the id of a are the same. a has been set to "foo" prior to checking the id.
>>> a = "foo"
>>> id(a)
4322269384
>>> id("foo")
4322269384
However, the id of "bar" and the id of a are different when checking the id of "bar" prior to setting a equal to "bar".
>>> id("bar")
4322269224
>>> a = "bar"
>>> id(a)
4322268984
Checking the id of "bar" again after setting a equal to "bar" returns the same id.
>>> id("bar")
4322268984
So it seems that CPython keeps consistent memory addresses for strings containing only a-zA-Z when those strings are assigned to a variable. It's also entirely possible that this is version dependent: I'm running Python 2.7.3 on a MacBook. Others might get entirely different results.
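The "simultaneously existing" part of the id guarantee matters when reading id values off temporaries, as in the session above. A rough sketch, with illustrative names and no claim about what the last line prints on any given interpreter:

```python
# Two objects that are alive at the same time always have different ids.
first = object()
second = object()
distinct_while_alive = id(first) != id(second)
print(distinct_while_alive)   # True

# Each temporary below may be discarded before the next one is created,
# so an interpreter is free to reuse the same id. The result is not
# guaranteed either way.
temporaries_collide = id(object()) == id(object())
print(temporaries_collide)
```

So matching ids printed at different moments do not by themselves prove that the same object was involved.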

In fact, your code amounts to comparing the objects' ids (in CPython, their memory addresses). So instead of your is comparison:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
You can do:
>>> id(a) == id(b)
False
But note that if the two literals are used directly in the comparison, the result is True:
>>> id('is it the space?') == id('is it the space?')
True
In fact, within a single expression, identical string literals are shared. But at program scale, only word-like strings (no spaces or punctuation) are shared.
You should not rely on this behavior, as it is not documented anywhere and is an implementation detail.

Two or more identical strings of consecutive alphanumeric (only) characters are stored in one structure, thus they share their memory reference. There have been posts about this phenomenon all over the internet since the 1990s. It has evidently always been that way. I have never seen a reasonable guess as to why that's the case. I only know that it is. Furthermore, if you split and re-join alphanumeric strings to remove spaces between words, the resulting identical alphanumeric strings do NOT share a reference, which I find odd. See below:
Add any non-alphanumeric value identically to both strings, and they instantly become copies, but not shared references.
a = "abbacca"; b = "abbacca"; a is b => True
a = "abbacca "; b = "abbacca "; a is b => False
a = "abbacca?"; b = "abbacca?"; a is b => False
~Dr. C.

The 'is' operator compares the actual objects.
c is d should also be False. My guess is that Python makes some optimization, and in that case it is the same object.

Checking Objects and Values using Is Operator in Python [duplicate]

This question already has answers here:
Understanding the "is" operator [duplicate]
(11 answers)
Why does comparing strings using either '==' or 'is' sometimes produce a different result?
(15 answers)
Closed 1 year ago.
Why is this:
x = str(input("Enter a string: ")) #input "cat"
y = str(input("Enter another string: ")) #input "cat"
print(x is y) #Outputs False
Not the same as this:
x = "cat"
y = "cat"
print(x is y) #Outputs True

From this Real Python article:
The == operator compares the value or equality of two objects, whereas the Python is operator checks whether two variables point to the same object in memory. In the vast majority of cases, this means you should use the equality operators == and !=, except when you’re comparing to None.
>>> x = None
>>> y = None
>>> id(x); id(y)
4389651888
4389651888
>>> x is y
True
== calls the __eq__ method of an object, is checks whether the id() of two objects is equal (memory address).
The rest of the article I linked is really informative; it talks about how Python will give the same id to small integers by default, and that you can use the sys.intern() method to ensure string variables point to the same object in memory as well.
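As a rough sketch of the sys.intern() point: the two strings below are assembled at run time to stand in for user input (that stand-in is an assumption made so the example runs without a prompt), and interning them yields one shared object.

```python
import sys

# Stand-ins for two separate input() results: equal text, built at run time.
raw_x = "".join(["c", "a", "t"])
raw_y = "".join(["c", "a", "t"])

# sys.intern returns the canonical interned copy for a given string value.
interned_x = sys.intern(raw_x)
interned_y = sys.intern(raw_y)

print(raw_x == raw_y)            # True: same characters
print(interned_x is interned_y)  # True: both refer to the interned copy
```

Interning is mainly a memory and dictionary-lookup optimisation; for ordinary comparisons == is still the right operator.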

The is operator in Python checks whether two names refer to the same object in memory.
In the second case, both literals are known when Python compiles the code, so for optimization purposes x and y point to the same object. In the first case, however, x and y are not defined until the user inputs a value at run time, so each is allocated as a separate object.

Python: if elif loop [duplicate]

This question already has answers here:
Why does comparing strings using either '==' or 'is' sometimes produce a different result?
(15 answers)
Closed 9 years ago.
I noticed a Python script I was writing was acting squirrelly, and traced it to an infinite loop, where the loop condition was while line is not ''. Running through it in the debugger, it turned out that line was in fact ''. When I changed it to !='' rather than is not '', it worked fine.
Also, is it generally considered better to just use '==' by default, even when comparing int or Boolean values? I've always liked to use 'is' because I find it more aesthetically pleasing and pythonic (which is how I fell into this trap...), but I wonder if it's intended to just be reserved for when you care about finding two objects with the same id.

"For all built-in Python objects (like strings, lists, dicts, functions, etc.), if x is y, then x==y is also True."
Not always. NaN is a counterexample. But usually, identity (is) implies equality (==). The converse is not true: Two distinct objects can have the same value.
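The NaN counterexample can be checked directly. A minimal sketch:

```python
nan = float("nan")

identical = nan is nan   # the same object compared with itself
equal = nan == nan       # NaN never compares equal, not even to itself

print(identical)   # True
print(equal)       # False
```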
"Also, is it generally considered better to just use '==' by default, even when comparing int or Boolean values?"
You use == when comparing values and is when comparing identities.
When comparing ints (or immutable types in general), you pretty much always want the former. There's an optimization that allows small integers to be compared with is, but don't rely on it.
For boolean values, you shouldn't be doing comparisons at all. Instead of:
if x == True:
    pass  # do something
write:
if x:
    pass  # do something
For comparing against None, is None is preferred over == None.
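One reason is None is preferred: == can be overridden by the object on the left, while is cannot. The class below is a deliberately contrived, hypothetical example to show the difference.

```python
class AlwaysEqual:
    """Contrived class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

equals_none = obj == None   # calls AlwaysEqual.__eq__, which returns True
is_none = obj is None       # identity check, cannot be overridden

print(equals_none)   # True, which is misleading
print(is_none)       # False, which is correct
```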
"I've always liked to use 'is' because I find it more aesthetically pleasing and pythonic (which is how I fell into this trap...), but I wonder if it's intended to just be reserved for when you care about finding two objects with the same id."
Yes, that's exactly what it's for.
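Applied to the loop from the question, a minimal sketch might look like the following. The io.StringIO object is an assumption standing in for whatever file or stream the original script was reading.

```python
import io

# Stand-in for the real input source used by the original script.
stream = io.StringIO("first\nsecond\n")

lines = []
line = stream.readline()
while line != "":                    # value comparison; readline() returns "" at end of input
    lines.append(line.rstrip("\n"))
    line = stream.readline()

print(lines)   # ['first', 'second']
```

Because the condition compares values, the loop ends as soon as readline() returns an empty string, regardless of which string object that happens to be.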

I would like to show a little example of how is and == behave with immutable types. Try this:
>>> a = 19998989890
>>> b = 19998989889 + 1
>>> a is b
False
>>> a == b
True
is compares the identities of two objects in memory; == compares their values. For example, you can see that small integers are cached by Python:
>>> c = 1
>>> b = 1
>>> b is c
True
You should use == when comparing values and is when comparing identities. (Also, from an English point of view, "equals" is different from "is".)

The logic is not flawed. The statement
if x is y then x==y is also True
should never be read to mean
if x==y then x is y
It is a logical error on the part of the reader to assume that the converse of a logic statement is true. See http://en.wikipedia.org/wiki/Converse_(logic)

See This question
Your logic in reading "For all built-in Python objects (like strings, lists, dicts, functions, etc.), if x is y, then x==y is also True." is slightly flawed.
If is applies then == will be True, but it does NOT apply in reverse. == may yield True while is yields False.

Python values actually aren't equivalent [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Python “is” operator behaves unexpectedly with integers
I'm learning Python, and am curious as to why:
x = 500
x is 500
returns False, but:
y = 100
y is 100
returns True?

Python reuses small integers. That is, all 1s (for example) are the same 1 object. The range is -5 to 256, though this is a CPython implementation detail that should not be relied upon. I am pretty sure Jython and IronPython, for example, handle this differently.
The reason this works out fine is that ints are immutable. That is, you can't change a 4 to a 5 in-place. If a has a value of 4, a = 5 actually points a to a different object rather than changing the value a contains. Python doesn't share any mutable types (such as lists) where unexpectedly having multiple references to the same object might cause problems.
You should use == for comparing most things. is is for checking to see whether two references point to the same object; it is roughly equivalent to id(x) == id(y).
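The "roughly equivalent to id(x) == id(y)" remark holds as long as the objects being compared are alive at the same time. A short sketch with illustrative names:

```python
x = []
y = []      # equal to x, but a separate list
z = x       # another name for the same list as x

pairs = [(x, y), (x, z), (y, z)]

# For objects that exist at the same time, `a is b` and id(a) == id(b) agree.
agree = all((a is b) == (id(a) == id(b)) for a, b in pairs)

print(agree)     # True
print(x is z)    # True
print(x is y)    # False
print(x == y)    # True
```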

is tests for identity - x is y asks if they are the same object, not if they are simply 'equivalent'. So you also have, eg:
>>> x = []
>>> y = []
>>> z = x
>>> x is y
False
>>> x is z
True
For equivalence, you want to test equality:
>>> x = 500
>>> x == 500
True
Python (or, at least, CPython, the major implementation) does some optimisations so that certain immutable objects only exist once throughout the lifetime of the interpreter. So, every 5 throughout your program will be the same integer object. The same thing happens with string literals, for example.

"is" compares object IDs and "==" compares object values. So, if you need to compare values, go with "==", and if you want to compare object identity, go with "is".
Since everything in Python is an object, "is" only has to compare object IDs; it is faster, but sometimes unpredictable. You need to be very sure of what you are doing to use "is" for simple comparison.
About the situation above, I found here: http://docs.python.org/c-api/int.html the following remark:
"The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)"
So, you can do the following test and see this behaviour:
>>> a = 256
>>> id(a)
19707932
>>> id(256)
19707932
>>> a = 257
>>> id(a)
26286076
>>> id(257)
26286064
So, for integers above 256, "is" will not work reliably. Be careful using "is" for comparison.
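The same test can be written as a small script. This sketch parses the numbers from text at run time so that the compiler cannot merge equal literals; the identity results are CPython behaviour and are printed without expected values because other implementations may differ.

```python
import platform

# Parse at run time so equal literals are not merged by the compiler.
small_a, small_b = int("256"), int("256")
large_a, large_b = int("257"), int("257")

print(platform.python_implementation())

# Values are equal in both cases, on every implementation.
print(small_a == small_b, large_a == large_b)

# Identity depends on the interpreter's caching; do not rely on it.
print(small_a is small_b)
print(large_a is large_b)
```

On CPython the cached range makes the first identity check succeed, but code that needs to compare numbers should use == throughout.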
