Related
I read the Python 2 docs and noticed the id() function:
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
So, I experimented by using id() with a list:
>>> list = [1,2,3]
>>> id(list[0])
31186196
>>> id(list[1])
31907092 // increased by 896
>>> id(list[2])
31907080 // decreased by 12
What is the integer returned from the function? Is it synonymous to memory addresses in C? If so, why doesn't the integer correspond to the size of the data type?
When is id() used in practice?
Your post asks several questions:
What is the number returned from the function?
It is "an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime." (Python Standard Library - Built-in Functions) A unique number. Nothing more, and nothing less. Think of it as a social-security number or employee id number for Python objects.
Is it the same with memory addresses in C?
Conceptually, yes, in that they are both guaranteed to be unique in their universe during their lifetime. And in one particular implementation of Python, it actually is the memory address of the corresponding C object.
If yes, why doesn't the number increase instantly by the size of the data type (I assume that it would be int)?
Because a list is not an array, and a list element is a reference, not an object.
When do we really use id( ) function?
Hardly ever. You can test if two references are the same by comparing their ids, but the is operator has always been the recommended way of doing that. id( ) is only really useful in debugging situations.
That's the identity of the location of the object in memory...
This example might help you understand the concept a little more.
foo = 1
bar = foo
baz = bar
fii = 1
print id(foo)
print id(bar)
print id(baz)
print id(fii)
> 1532352
> 1532352
> 1532352
> 1532352
These all point to the same location in memory, which is why their values are the same. In the example, 1 is only stored once, and anything else pointing to 1 will reference that memory location.
Rob's answer (most voted above) is correct. I would like to add that in some situations using IDs is useful as it allows for comparison of objects and finding which objects refer to your objects.
The later usually helps you for example to debug strange bugs where mutable objects are passed as parameter to say classes and are assigned to local vars in a class. Mutating those objects will mutate vars in a class. This manifests itself in strange behavior where multiple things change at the same time.
Recently I had this problem with a Python/Tkinter app where editing text in one text entry field changed the text in another as I typed :)
Here is an example on how you might use function id() to trace where those references are. By all means this is not a solution covering all possible cases, but you get the idea. Again IDs are used in the background and user does not see them:
class democlass:
classvar = 24
def __init__(self, var):
self.instancevar1 = var
self.instancevar2 = 42
def whoreferencesmylocalvars(self, fromwhere):
return {__l__: {__g__
for __g__ in fromwhere
if not callable(__g__) and id(eval(__g__)) == id(getattr(self,__l__))
}
for __l__ in dir(self)
if not callable(getattr(self, __l__)) and __l__[-1] != '_'
}
def whoreferencesthisclassinstance(self, fromwhere):
return {__g__
for __g__ in fromwhere
if not callable(__g__) and id(eval(__g__)) == id(self)
}
a = [1,2,3,4]
b = a
c = b
democlassinstance = democlass(a)
d = democlassinstance
e = d
f = democlassinstance.classvar
g = democlassinstance.instancevar2
print( 'My class instance is of', type(democlassinstance), 'type.')
print( 'My instance vars are referenced by:', democlassinstance.whoreferencesmylocalvars(globals()) )
print( 'My class instance is referenced by:', democlassinstance.whoreferencesthisclassinstance(globals()) )
OUTPUT:
My class instance is of <class '__main__.democlass'> type.
My instance vars are referenced by: {'instancevar2': {'g'}, 'classvar': {'f'}, 'instancevar1': {'a', 'c', 'b'}}
My class instance is referenced by: {'e', 'd', 'democlassinstance'}
Underscores in variable names are used to prevent name colisions. Functions use "fromwhere" argument so that you can let them know where to start searching for references. This argument is filled by a function that lists all names in a given namespace. Globals() is one such function.
id() does return the address of the object being referenced (in CPython), but your confusion comes from the fact that python lists are very different from C arrays. In a python list, every element is a reference. So what you are doing is much more similar to this C code:
int *arr[3];
arr[0] = malloc(sizeof(int));
*arr[0] = 1;
arr[1] = malloc(sizeof(int));
*arr[1] = 2;
arr[2] = malloc(sizeof(int));
*arr[2] = 3;
printf("%p %p %p", arr[0], arr[1], arr[2]);
In other words, you are printing the address from the reference and not an address relative to where your list is stored.
In my case, I have found the id() function handy for creating opaque handles to return to C code when calling python from C. Doing that, you can easily use a dictionary to look up the object from its handle and it's guaranteed to be unique.
I am starting out with python and I use id when I use the interactive shell to see whether my variables are assigned to the same thing or if they just look the same.
Every value is an id, which is a unique number related to where it is stored in the memory of the computer.
If you're using python 3.4.1 then you get a different answer to your question.
list = [1,2,3]
id(list[0])
id(list[1])
id(list[2])
returns:
1705950792
1705950808 # increased by 16
1705950824 # increased by 16
The integers -5 to 256 have a constant id, and on finding it multiple times its id does not change, unlike all other numbers before or after it that have different id's every time you find it.
The numbers from -5 to 256 have id's in increasing order and differ by 16.
The number returned by id() function is a unique id given to each item stored in memory and it is analogy wise the same as the memory location in C.
The is operator uses it to check whether two objects are identical (as opposed to equal). The actual value that is returned from id() is pretty much never used for anything because it doesn't really have a meaning, and it's platform-dependent.
The answer is pretty much never. IDs are mainly used internally to Python.
The average Python programmer will probably never need to use id() in their code.
It is the address of the object in memory, exactly as the doc says. However, it has metadata attached to it, properties of the object and location in the memory is needed to store the metadata. So, when you create your variable called list, you also create metadata for the list and its elements.
So, unless you an absolute guru in the language you can't determine the id of the next element of your list based on the previous element, because you don't know what the language allocates along with the elements.
I have an idea to use value of id() in logging.
It's cheap to get and it's quite short.
In my case I use tornado and id() would like to have an anchor to group messages scattered and mixed over file by web socket.
I'm a little bit late and i will talk about Python3. To understand what id() is and how it (and Python) works, consider next example:
>>> x=1000
>>> y=1000
>>> id(x)==id(y)
False
>>> id(x)
4312240944
>>> id(y)
4312240912
>>> id(1000)
4312241104
>>> x=1000
>>> id(x)
4312241104
>>> y=1000
>>> id(y)
4312241200
You need to think about everything on the right side as objects. Every time you make assignment - you create new object and that means new id. In the middle you can see a "wild" object which is created only for function - id(1000). So, it's lifetime is only for that line of code. If you check the next line - you see that when we create new variable x, it has the same id as that wild object. Pretty much it works like memory address.
As of in python 3 id is assigned to a value not a variable. This means that if you create two functions as below, all the three id's are the same.
>>> def xyz():
... q=123
... print(id(q))
...
>>> def iop():
... w=123
... print(id(w))
>>> xyz()
1650376736
>>> iop()
1650376736
>>> id(123)
1650376736
Be carefull (concerning the answer just below)...That's only true because 123 is between -5 and 256...
In [111]: q = 257
In [112]: id(q)
Out[112]: 140020248465168
In [113]: w = 257
In [114]: id(w)
Out[114]: 140020274622544
In [115]: id(257)
Out[115]: 140020274622768
Two string variables are set to the same value. s1 == s2 always returns True, but s1 is s2 sometimes returns False.
If I open my Python interpreter and do the same is comparison, it succeeds:
>>> s1 = 'text'
>>> s2 = 'text'
>>> s1 is s2
True
Why is this?
is is identity testing, and == is equality testing. What happens in your code would be emulated in the interpreter like this:
>>> a = 'pub'
>>> b = ''.join(['p', 'u', 'b'])
>>> a == b
True
>>> a is b
False
So, no wonder they're not the same, right?
In other words: a is b is the equivalent of id(a) == id(b)
Other answers here are correct: is is used for identity comparison, while == is used for equality comparison. Since what you care about is equality (the two strings should contain the same characters), in this case the is operator is simply wrong and you should be using == instead.
The reason is works interactively is that (most) string literals are interned by default. From Wikipedia:
Interned strings speed up string
comparisons, which are sometimes a
performance bottleneck in applications
(such as compilers and dynamic
programming language runtimes) that
rely heavily on hash tables with
string keys. Without interning,
checking that two different strings
are equal involves examining every
character of both strings. This is
slow for several reasons: it is
inherently O(n) in the length of the
strings; it typically requires reads
from several regions of memory, which
take time; and the reads fills up the
processor cache, meaning there is less
cache available for other needs. With
interned strings, a simple object
identity test suffices after the
original intern operation; this is
typically implemented as a pointer
equality test, normally just a single
machine instruction with no memory
reference at all.
So, when you have two string literals (words that are literally typed into your program source code, surrounded by quotation marks) in your program that have the same value, the Python compiler will automatically intern the strings, making them both stored at the same memory location. (Note that this doesn't always happen, and the rules for when this happens are quite convoluted, so please don't rely on this behavior in production code!)
Since in your interactive session both strings are actually stored in the same memory location, they have the same identity, so the is operator works as expected. But if you construct a string by some other method (even if that string contains exactly the same characters), then the string may be equal, but it is not the same string -- that is, it has a different identity, because it is stored in a different place in memory.
The is keyword is a test for object identity while == is a value comparison.
If you use is, the result will be true if and only if the object is the same object. However, == will be true any time the values of the object are the same.
One last thing to note is you may use the sys.intern function to ensure that you're getting a reference to the same string:
>>> from sys import intern
>>> a = intern('a')
>>> a2 = intern('a')
>>> a is a2
True
As pointed out in previous answers, you should not be using is to determine equality of strings. But this may be helpful to know if you have some kind of weird requirement to use is.
Note that the intern function used to be a built-in on Python 2, but it was moved to the sys module in Python 3.
is is identity testing and == is equality testing. This means is is a way to check whether two things are the same things, or just equivalent.
Say you've got a simple person object. If it is named 'Jack' and is '23' years old, it's equivalent to another 23-year-old Jack, but it's not the same person.
class Person(object):
def __init__(self, name, age):
self.name = name
self.age = age
def __eq__(self, other):
return self.name == other.name and self.age == other.age
jack1 = Person('Jack', 23)
jack2 = Person('Jack', 23)
jack1 == jack2 # True
jack1 is jack2 # False
They're the same age, but they're not the same instance of person. A string might be equivalent to another, but it's not the same object.
This is a side note, but in idiomatic Python, you will often see things like:
if x is None:
# Some clauses
This is safe, because there is guaranteed to be one instance of the Null Object (i.e., None).
If you're not sure what you're doing, use the '=='.
If you have a little more knowledge about it you can use 'is' for known objects like 'None'.
Otherwise, you'll end up wondering why things doesn't work and why this happens:
>>> a = 1
>>> b = 1
>>> b is a
True
>>> a = 6000
>>> b = 6000
>>> b is a
False
I'm not even sure if some things are guaranteed to stay the same between different Python versions/implementations.
From my limited experience with Python, is is used to compare two objects to see if they are the same object as opposed to two different objects with the same value. == is used to determine if the values are identical.
Here is a good example:
>>> s1 = u'public'
>>> s2 = 'public'
>>> s1 is s2
False
>>> s1 == s2
True
s1 is a Unicode string, and s2 is a normal string. They are not the same type, but they are the same value.
I think it has to do with the fact that, when the 'is' comparison evaluates to false, two distinct objects are used. If it evaluates to true, that means internally it's using the same exact object and not creating a new one, possibly because you created them within a fraction of 2 or so seconds and because there isn't a large time gap in between it's optimized and uses the same object.
This is why you should be using the equality operator ==, not is, to compare the value of a string object.
>>> s = 'one'
>>> s2 = 'two'
>>> s is s2
False
>>> s2 = s2.replace('two', 'one')
>>> s2
'one'
>>> s2 is s
False
>>>
In this example, I made s2, which was a different string object previously equal to 'one' but it is not the same object as s, because the interpreter did not use the same object as I did not initially assign it to 'one', if I had it would have made them the same object.
The == operator tests value equivalence. The is operator tests object identity, and Python tests whether the two are really the same object (i.e., live at the same address in memory).
>>> a = 'banana'
>>> b = 'banana'
>>> a is b
True
In this example, Python only created one string object, and both a and b refers to it. The reason is that Python internally caches and reuses some strings as an optimization. There really is just a string 'banana' in memory, shared by a and b. To trigger the normal behavior, you need to use longer strings:
>>> a = 'a longer banana'
>>> b = 'a longer banana'
>>> a == b, a is b
(True, False)
When you create two lists, you get two objects:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False
In this case we would say that the two lists are equivalent, because they have the same elements, but not identical, because they are not the same object. If two objects are identical, they are also equivalent, but if they are equivalent, they are not necessarily identical.
If a refers to an object and you assign b = a, then both variables refer to the same object:
>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True
Reference: Think Python 2e by Allen B. Downey
I believe that this is known as "interned" strings. Python does this, so does Java, and so do C and C++ when compiling in optimized modes.
If you use two identical strings, instead of wasting memory by creating two string objects, all interned strings with the same contents point to the same memory.
This results in the Python "is" operator returning True because two strings with the same contents are pointing at the same string object. This will also happen in Java and in C.
This is only useful for memory savings though. You cannot rely on it to test for string equality, because the various interpreters and compilers and JIT engines cannot always do it.
Actually, the is operator checks for identity and == operator checks for equality.
From the language reference:
Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed. E.g., after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation, but after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists. (Note that c = d = [] assigns the same object to both c and d.)
So from the above statement we can infer that the strings, which are immutable types, may fail when checked with "is" and may succeed when checked with "is".
The same applies for int and tuple which are also immutable types.
is will compare the memory location. It is used for object-level comparison.
== will compare the variables in the program. It is used for checking at a value level.
is checks for address level equivalence
== checks for value level equivalence
is is identity testing and == is equality testing (see the Python documentation).
In most cases, if a is b, then a == b. But there are exceptions, for example:
>>> nan = float('nan')
>>> nan is nan
True
>>> nan == nan
False
So, you can only use is for identity tests, never equality tests.
The basic concept, we have to be clear, while approaching this question, is to understand the difference between is and ==.
"is" is will compare the memory location. if id(a)==id(b), then a is b returns true else it returns false.
So, we can say that is is used for comparing memory locations. Whereas,
== is used for equality testing which means that it just compares only the resultant values. The below shown code may acts as an example to the above given theory.
Code
In the case of string literals (strings without getting assigned to variables), the memory address will be same as shown in the picture. so, id(a)==id(b). The remaining of this is self-explanatory.
Two string variables are set to the same value. s1 == s2 always returns True, but s1 is s2 sometimes returns False.
If I open my Python interpreter and do the same is comparison, it succeeds:
>>> s1 = 'text'
>>> s2 = 'text'
>>> s1 is s2
True
Why is this?
is is identity testing, and == is equality testing. What happens in your code would be emulated in the interpreter like this:
>>> a = 'pub'
>>> b = ''.join(['p', 'u', 'b'])
>>> a == b
True
>>> a is b
False
So, no wonder they're not the same, right?
In other words: a is b is the equivalent of id(a) == id(b)
Other answers here are correct: is is used for identity comparison, while == is used for equality comparison. Since what you care about is equality (the two strings should contain the same characters), in this case the is operator is simply wrong and you should be using == instead.
The reason is works interactively is that (most) string literals are interned by default. From Wikipedia:
Interned strings speed up string
comparisons, which are sometimes a
performance bottleneck in applications
(such as compilers and dynamic
programming language runtimes) that
rely heavily on hash tables with
string keys. Without interning,
checking that two different strings
are equal involves examining every
character of both strings. This is
slow for several reasons: it is
inherently O(n) in the length of the
strings; it typically requires reads
from several regions of memory, which
take time; and the reads fills up the
processor cache, meaning there is less
cache available for other needs. With
interned strings, a simple object
identity test suffices after the
original intern operation; this is
typically implemented as a pointer
equality test, normally just a single
machine instruction with no memory
reference at all.
So, when you have two string literals (words that are literally typed into your program source code, surrounded by quotation marks) in your program that have the same value, the Python compiler will automatically intern the strings, making them both stored at the same memory location. (Note that this doesn't always happen, and the rules for when this happens are quite convoluted, so please don't rely on this behavior in production code!)
Since in your interactive session both strings are actually stored in the same memory location, they have the same identity, so the is operator works as expected. But if you construct a string by some other method (even if that string contains exactly the same characters), then the string may be equal, but it is not the same string -- that is, it has a different identity, because it is stored in a different place in memory.
The is keyword is a test for object identity while == is a value comparison.
If you use is, the result will be true if and only if the object is the same object. However, == will be true any time the values of the object are the same.
One last thing to note is you may use the sys.intern function to ensure that you're getting a reference to the same string:
>>> from sys import intern
>>> a = intern('a')
>>> a2 = intern('a')
>>> a is a2
True
As pointed out in previous answers, you should not be using is to determine equality of strings. But this may be helpful to know if you have some kind of weird requirement to use is.
Note that the intern function used to be a built-in on Python 2, but it was moved to the sys module in Python 3.
is is identity testing and == is equality testing. This means is is a way to check whether two things are the same things, or just equivalent.
Say you've got a simple person object. If it is named 'Jack' and is '23' years old, it's equivalent to another 23-year-old Jack, but it's not the same person.
class Person(object):
def __init__(self, name, age):
self.name = name
self.age = age
def __eq__(self, other):
return self.name == other.name and self.age == other.age
jack1 = Person('Jack', 23)
jack2 = Person('Jack', 23)
jack1 == jack2 # True
jack1 is jack2 # False
They're the same age, but they're not the same instance of person. A string might be equivalent to another, but it's not the same object.
This is a side note, but in idiomatic Python, you will often see things like:
if x is None:
# Some clauses
This is safe, because there is guaranteed to be one instance of the Null Object (i.e., None).
If you're not sure what you're doing, use the '=='.
If you have a little more knowledge about it you can use 'is' for known objects like 'None'.
Otherwise, you'll end up wondering why things doesn't work and why this happens:
>>> a = 1
>>> b = 1
>>> b is a
True
>>> a = 6000
>>> b = 6000
>>> b is a
False
I'm not even sure if some things are guaranteed to stay the same between different Python versions/implementations.
From my limited experience with Python, is is used to compare two objects to see if they are the same object as opposed to two different objects with the same value. == is used to determine if the values are identical.
Here is a good example:
>>> s1 = u'public'
>>> s2 = 'public'
>>> s1 is s2
False
>>> s1 == s2
True
s1 is a Unicode string, and s2 is a normal string. They are not the same type, but they are the same value.
I think it has to do with the fact that, when the 'is' comparison evaluates to false, two distinct objects are used. If it evaluates to true, that means internally it's using the same exact object and not creating a new one, possibly because you created them within a fraction of 2 or so seconds and because there isn't a large time gap in between it's optimized and uses the same object.
This is why you should be using the equality operator ==, not is, to compare the value of a string object.
>>> s = 'one'
>>> s2 = 'two'
>>> s is s2
False
>>> s2 = s2.replace('two', 'one')
>>> s2
'one'
>>> s2 is s
False
>>>
In this example, I made s2, which was a different string object previously equal to 'one' but it is not the same object as s, because the interpreter did not use the same object as I did not initially assign it to 'one', if I had it would have made them the same object.
The == operator tests value equivalence. The is operator tests object identity, and Python tests whether the two are really the same object (i.e., live at the same address in memory).
>>> a = 'banana'
>>> b = 'banana'
>>> a is b
True
In this example, Python only created one string object, and both a and b refers to it. The reason is that Python internally caches and reuses some strings as an optimization. There really is just a string 'banana' in memory, shared by a and b. To trigger the normal behavior, you need to use longer strings:
>>> a = 'a longer banana'
>>> b = 'a longer banana'
>>> a == b, a is b
(True, False)
When you create two lists, you get two objects:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False
In this case we would say that the two lists are equivalent, because they have the same elements, but not identical, because they are not the same object. If two objects are identical, they are also equivalent, but if they are equivalent, they are not necessarily identical.
If a refers to an object and you assign b = a, then both variables refer to the same object:
>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True
Reference: Think Python 2e by Allen B. Downey
I believe that this is known as "interned" strings. Python does this, so does Java, and so do C and C++ when compiling in optimized modes.
If you use two identical strings, instead of wasting memory by creating two string objects, all interned strings with the same contents point to the same memory.
This results in the Python "is" operator returning True because two strings with the same contents are pointing at the same string object. This will also happen in Java and in C.
This is only useful for memory savings though. You cannot rely on it to test for string equality, because the various interpreters and compilers and JIT engines cannot always do it.
Actually, the is operator checks for identity and == operator checks for equality.
From the language reference:
Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed. E.g., after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation, but after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists. (Note that c = d = [] assigns the same object to both c and d.)
So from the above statement we can infer that the strings, which are immutable types, may fail when checked with "is" and may succeed when checked with "is".
The same applies for int and tuple which are also immutable types.
is will compare the memory location. It is used for object-level comparison.
== will compare the variables in the program. It is used for checking at a value level.
is checks for address level equivalence
== checks for value level equivalence
is is identity testing and == is equality testing (see the Python documentation).
In most cases, if a is b, then a == b. But there are exceptions, for example:
>>> nan = float('nan')
>>> nan is nan
True
>>> nan == nan
False
So, you can only use is for identity tests, never equality tests.
The basic concept, we have to be clear, while approaching this question, is to understand the difference between is and ==.
"is" is will compare the memory location. if id(a)==id(b), then a is b returns true else it returns false.
So, we can say that is is used for comparing memory locations. Whereas,
== is used for equality testing which means that it just compares only the resultant values. The below shown code may acts as an example to the above given theory.
Code
In the case of string literals (strings without getting assigned to variables), the memory address will be same as shown in the picture. so, id(a)==id(b). The remaining of this is self-explanatory.
None in Python is a reserved word, just a question crossed my mind about the exact value of None in memory. What I'm holding in my mind is this, None's representation in memory is either 0 or a pointer pointing to a heap. But neither the pointer pointing to an empty area in the memory will make sense neither the zero value. Because when I tested the following:
>>> None.__sizeof__()
16
It turns out that None consumes 16 bytes and that's actually too much for simply an empty value.
So what does None actually represents in memory?
None is singleton object which doesn't provide (almost) none methods and attributes and its only purpose is to signify the fact that there is no value for some specific operation.
As a real object it still needs to have some headers, some reflection information and things like that so it takes the minimum memory occupied by every python object.
None references some Python object of the NoneType, actually the only instance of its kind. Since it’s a Python object—like literally every other thing—it’s not just the raw content what is stored about also the additional object information like its type.
Every object has this overhead; that’s why integers start with a size of 28 for example. You cannot change this, it’s an implementation detail of Python and its type system.
None is a special constant in Python that represents the absence of a value or a null value. It is an object of its own datatype, the NoneType. We cannot create multiple None objects but can assign it to variables. These variables will be equal to one another. We must take special care that None does not imply False, 0 or any empty list, dictionary, string etc. For example:
>>> None == 0
False
>>> None == []
False
>>> None == False
False
>>> x = None
>>> y = None
>>> x == y
True
Void functions that do not return anything will return a None object automatically. None is also returned by functions in which the program flow does not encounter a return statement. For example:
def a_void_function():
a = 1
b = 2
c = a + b
x = a_void_function()
print(x)
Output:
None
This program has a function that does not return a value, although it does some operations inside. So when we print x, we get None which is returned automatically (implicitly). Similarly, here is another example:
def improper_return_function(a):
if (a % 2) == 0:
return True
x = improper_return_function(3)
print(x)
Output:
None
Although this function has a return statement, it is not reached in every case. The function will return True only when the input is even. So, if we give the function an odd number, None is returned implicitly.
It turns out that None consumes 16 bytes and that's actually too much for simply an empty value.
None ist not an empty value. It is an NoneType object which represents the absence of a value as you can read here: Python Built-in constants Doc. You can assign None to variables, but you can't assign to None, as it would raise an syntax error.
Two string variables are set to the same value. s1 == s2 always returns True, but s1 is s2 sometimes returns False.
If I open my Python interpreter and do the same is comparison, it succeeds:
>>> s1 = 'text'
>>> s2 = 'text'
>>> s1 is s2
True
Why is this?
is is identity testing, and == is equality testing. What happens in your code would be emulated in the interpreter like this:
>>> a = 'pub'
>>> b = ''.join(['p', 'u', 'b'])
>>> a == b
True
>>> a is b
False
So, no wonder they're not the same, right?
In other words: a is b is the equivalent of id(a) == id(b)
Other answers here are correct: is is used for identity comparison, while == is used for equality comparison. Since what you care about is equality (the two strings should contain the same characters), in this case the is operator is simply wrong and you should be using == instead.
The reason is works interactively is that (most) string literals are interned by default. From Wikipedia:
Interned strings speed up string
comparisons, which are sometimes a
performance bottleneck in applications
(such as compilers and dynamic
programming language runtimes) that
rely heavily on hash tables with
string keys. Without interning,
checking that two different strings
are equal involves examining every
character of both strings. This is
slow for several reasons: it is
inherently O(n) in the length of the
strings; it typically requires reads
from several regions of memory, which
take time; and the reads fills up the
processor cache, meaning there is less
cache available for other needs. With
interned strings, a simple object
identity test suffices after the
original intern operation; this is
typically implemented as a pointer
equality test, normally just a single
machine instruction with no memory
reference at all.
So, when you have two string literals (words that are literally typed into your program source code, surrounded by quotation marks) in your program that have the same value, the Python compiler will automatically intern the strings, making them both stored at the same memory location. (Note that this doesn't always happen, and the rules for when this happens are quite convoluted, so please don't rely on this behavior in production code!)
Since in your interactive session both strings are actually stored in the same memory location, they have the same identity, so the is operator works as expected. But if you construct a string by some other method (even if that string contains exactly the same characters), then the string may be equal, but it is not the same string -- that is, it has a different identity, because it is stored in a different place in memory.
The is keyword is a test for object identity while == is a value comparison.
If you use is, the result will be true if and only if the object is the same object. However, == will be true any time the values of the object are the same.
One last thing to note is you may use the sys.intern function to ensure that you're getting a reference to the same string:
>>> from sys import intern
>>> a = intern('a')
>>> a2 = intern('a')
>>> a is a2
True
As pointed out in previous answers, you should not be using is to determine equality of strings. But this may be helpful to know if you have some kind of weird requirement to use is.
Note that the intern function used to be a built-in on Python 2, but it was moved to the sys module in Python 3.
is is identity testing and == is equality testing. This means is is a way to check whether two things are the same things, or just equivalent.
Say you've got a simple person object. If it is named 'Jack' and is '23' years old, it's equivalent to another 23-year-old Jack, but it's not the same person.
class Person(object):
def __init__(self, name, age):
self.name = name
self.age = age
def __eq__(self, other):
return self.name == other.name and self.age == other.age
jack1 = Person('Jack', 23)
jack2 = Person('Jack', 23)
jack1 == jack2 # True
jack1 is jack2 # False
They're the same age, but they're not the same instance of person. A string might be equivalent to another, but it's not the same object.
This is a side note, but in idiomatic Python, you will often see things like:
if x is None:
# Some clauses
This is safe, because there is guaranteed to be one instance of the Null Object (i.e., None).
If you're not sure what you're doing, use the '=='.
If you have a little more knowledge about it you can use 'is' for known objects like 'None'.
Otherwise, you'll end up wondering why things doesn't work and why this happens:
>>> a = 1
>>> b = 1
>>> b is a
True
>>> a = 6000
>>> b = 6000
>>> b is a
False
I'm not even sure if some things are guaranteed to stay the same between different Python versions/implementations.
From my limited experience with Python, is is used to compare two objects to see if they are the same object as opposed to two different objects with the same value. == is used to determine if the values are identical.
Here is a good example:
>>> s1 = u'public'
>>> s2 = 'public'
>>> s1 is s2
False
>>> s1 == s2
True
s1 is a Unicode string, and s2 is a normal string. They are not the same type, but they are the same value.
I think it has to do with the fact that, when the 'is' comparison evaluates to false, two distinct objects are used. If it evaluates to true, that means internally it's using the same exact object and not creating a new one, possibly because you created them within a fraction of 2 or so seconds and because there isn't a large time gap in between it's optimized and uses the same object.
This is why you should be using the equality operator ==, not is, to compare the value of a string object.
>>> s = 'one'
>>> s2 = 'two'
>>> s is s2
False
>>> s2 = s2.replace('two', 'one')
>>> s2
'one'
>>> s2 is s
False
>>>
In this example, I made s2, which was a different string object previously equal to 'one' but it is not the same object as s, because the interpreter did not use the same object as I did not initially assign it to 'one', if I had it would have made them the same object.
The == operator tests value equivalence. The is operator tests object identity, and Python tests whether the two are really the same object (i.e., live at the same address in memory).
>>> a = 'banana'
>>> b = 'banana'
>>> a is b
True
In this example, Python only created one string object, and both a and b refers to it. The reason is that Python internally caches and reuses some strings as an optimization. There really is just a string 'banana' in memory, shared by a and b. To trigger the normal behavior, you need to use longer strings:
>>> a = 'a longer banana'
>>> b = 'a longer banana'
>>> a == b, a is b
(True, False)
When you create two lists, you get two objects:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False
In this case we would say that the two lists are equivalent, because they have the same elements, but not identical, because they are not the same object. If two objects are identical, they are also equivalent, but if they are equivalent, they are not necessarily identical.
If a refers to an object and you assign b = a, then both variables refer to the same object:
>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True
Reference: Think Python 2e by Allen B. Downey
I believe that this is known as "interned" strings. Python does this, so does Java, and so do C and C++ when compiling in optimized modes.
If you use two identical strings, instead of wasting memory by creating two string objects, all interned strings with the same contents point to the same memory.
This results in the Python "is" operator returning True because two strings with the same contents are pointing at the same string object. This will also happen in Java and in C.
This is only useful for memory savings though. You cannot rely on it to test for string equality, because the various interpreters and compilers and JIT engines cannot always do it.
Actually, the is operator checks for identity and == operator checks for equality.
From the language reference:
Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed. E.g., after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation, but after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists. (Note that c = d = [] assigns the same object to both c and d.)
So from the above statement we can infer that the strings, which are immutable types, may fail when checked with "is" and may succeed when checked with "is".
The same applies for int and tuple which are also immutable types.
is will compare the memory location. It is used for object-level comparison.
== will compare the variables in the program. It is used for checking at a value level.
is checks for address level equivalence
== checks for value level equivalence
is is identity testing and == is equality testing (see the Python documentation).
In most cases, if a is b, then a == b. But there are exceptions, for example:
>>> nan = float('nan')
>>> nan is nan
True
>>> nan == nan
False
So, you can only use is for identity tests, never equality tests.
The basic concept, we have to be clear, while approaching this question, is to understand the difference between is and ==.
"is" is will compare the memory location. if id(a)==id(b), then a is b returns true else it returns false.
So, we can say that is is used for comparing memory locations. Whereas,
== is used for equality testing which means that it just compares only the resultant values. The below shown code may acts as an example to the above given theory.
Code
In the case of string literals (strings without getting assigned to variables), the memory address will be same as shown in the picture. so, id(a)==id(b). The remaining of this is self-explanatory.