Does comparing using `==` compare identities before comparing values? - python

If I compare two variables using ==, does Python compare the identities, and, if they're not the same, then compare the values?
For example, I have two strings which point to the same string object:
>>> a = 'a sequence of chars'
>>> b = a
Does this compare the values, or just the ids?:
>>> b == a
True
It would make sense to compare identity first, and I guess that is the case, but I haven't yet found anything in the documentation to support this. The closest I've got is this:
x==y calls x.__eq__(y)
which doesn't tell me whether anything is done before calling x.__eq__(y).

For user-defined class instances, is is used as a fallback - where the default __eq__ isn't overridden, a == b is evaluated as a is b. This ensures that the comparison will always have a result (except in the NotImplemented case, where comparison is explicitly forbidden).
This is (somewhat obliquely - good spot Sven Marnach) referred to in the data model documentation (emphasis mine):
User-defined classes have __eq__() and __hash__() methods by
default; with them, all objects compare unequal (except with
themselves) and x.__hash__() returns an appropriate value such
that x == y implies both that x is y and hash(x) == hash(y).
You can demonstrate it as follows:
>>> class Unequal(object):
def __eq__(self, other):
return False
>>> ue = Unequal()
>>> ue is ue
True
>>> ue == ue
False
so __eq__ must be called before id, but:
>>> class NoEqual(object):
pass
>>> ne = NoEqual()
>>> ne is ne
True
>>> ne == ne
True
so id must be invoked where __eq__ isn't defined.
You can see this in the CPython implementation, which notes:
/* If neither object implements it, provide a sensible default
for == and !=, but raise an exception for ordering. */
The "sensible default" implemented is a C-level equality comparison of the pointers v and w, which will return whether or not they point to the same object.

In addition to the answer by #jonrsharpe: if the objects being compared implement __eq__, it would be wrong for Python to check for identity first.
Look at the following example:
>>> x = float('nan')
>>> x is x
True
>>> x == x
False
NaN is a specific thing that should never compare equal to itself; however, even in this case x is x should return True, because of the semantics of is.

Related

Check if object is in an iterable using "is" identity instead of "==" equality

if object in lst:
#do something
As far as I can tell, when you execute this statement it is internally checking == between object and every element in lst, which will refer to the __eq__ methods of these two objects. This can have the implication of two distinct objects being "equal", which is usually desired if all of their attributes are the same.
However, is there a way to Pythonically achieve a predicate such as in where the underlying equality check is is - i.e. we're actually checking if the two references are to the same object?
3list membership in python is dictated by the __contains__ dunder method. You can choose to overwrite this for a custom implementation if you want to use the normal "in" syntax:
class my_list(list):
def __contains__(self, x):
for y in self:
if x is y:
return True
return False
4 in my_list([4, [3,2,1]])
>> True
[3,2,1] in my_list([4, [3,2,1]]) # Because while the lists are "==" equal, they have different pointers.
>>> False
Otherwise, I'd suggest kaya3's answer of using a generator check.
Use the any function:
if any(x is object for x in lst):
# ...
if you want to specifically use is then just use filter like:
filtered_list = filter(lambda n: n is object, list)

Comparison operators vs “rich comparison” methods in Python

Can someone explain me the differences between the two. Are those normally equivalent ? Maybe I'm completely wrong here, but I thought that each comparison operator was necessarily related to one “rich comparison” method. This is from the documentation:
The correspondence between operator symbols and method names is as
follows:
x<y calls x.__lt__(y), x<=y calls x.__le__(y), x==y calls x.__eq__(y), x!=y calls x.__ne__(y), x>y calls x.__gt__(y), and x>=y calls x.__ge__(y).
Here is an example that demonstrates my confusion.
Python 3.x:
dict1 = {1:1}
dict2 = {2:2}
>>> dict1 < dict2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'dict' and 'dict'
>>> dict1.__lt__(dict2)
NotImplemented
Python 2.x:
dict1 = {1:1}
dict2 = {2:2}
>>> dict1 < dict2
True
>>> dict1.__lt__(dict2)
NotImplemented
From the python 3 example, it seems logic that calling dict1 < dict2 is not supported. But what about Python 2 example ? Why is it accepted ?
I know that unlike Python 2, in Python 3, not all objects supports comparison operators. At my surprise though, both version return the NotImplemented singleton when calling __lt__().
This is relying on the __cmp__ magic method, which is what the rich-comparison operators were meant to replace:
>>> dict1 = {1:1}
>>> dict2 = {2:2}
>>> dict1.__cmp__
<method-wrapper '__cmp__' of dict object at 0x10f075398>
>>> dict1.__cmp__(dict2)
-1
As to the ordering logic, here is the Python 2.7 documentation:
Mappings (instances of dict) compare equal if and only if they have
equal (key, value) pairs. Equality comparison of the keys and values
enforces reflexivity.
Outcomes other than equality are resolved consistently, but are not
otherwise defined.
With a footnote:
Earlier versions of Python used lexicographic comparison of the sorted
(key, value) lists, but this was very expensive for the common case of
comparing for equality. An even earlier version of Python compared
dictionaries by identity only, but this caused surprises because
people expected to be able to test a dictionary for emptiness by
comparing it to {}.
And, in Python 3.0, ordering has been simplified. This is from the documentation:
The ordering comparison operators (<, <=, >=, >) raise a TypeError
exception when the operands don’t have a meaningful natural ordering.
builtin.sorted() and list.sort() no longer accept the cmp argument
providing a comparison function. Use the key argument instead.
The cmp() function should be treated as gone, and the __cmp__() special method
is no longer supported. Use __lt__() for sorting, __eq__() with
__hash__(), and other rich comparisons as needed. (If you really need the cmp() functionality, you could use the expression (a > b) - (a <> b) as the equivalent for cmp(a, b).)
So, to be explicit, in Python 2, since the rich comparison operators are not implemented, dict objects will fall-back to __cmp__, from the data-model documentation:
object.__cmp__(self, other)
Called by comparison operations if rich
comparison (see above) is not defined. Should return a negative
integer if self < other, zero if self == other, a positive integer
if self > other.
Note for operator < versus __lt__:
import types
class A:
def __lt__(self, other): return True
def new_lt(self, other): return False
a = A()
print(a < a, a.__lt__(a)) # True True
a.__lt__ = types.MethodType(new_lt, a)
print(a < a, a.__lt__(a)) # True False
A.__lt__ = types.MethodType(new_lt, A)
print(a < a, a.__lt__(a)) # False False
< calls __lt__ defined on class; __lt__ calls __lt__ defined on object.
It's usually the same :) And it is totally delicious to use: A.__lt__ = new_lt

Python 3 == operator

I'm confused as to how the == operator works in Python 3. From the docs, eq(a, b) is equivalent to a == b. Also eq and __eq__ are equivalent.
Take the following example:
class Potato:
def __eq__(self, other):
print("In Potato's __eq__")
return True
>> p = Potato()
>> p == "hello"
In Potato's __eq__ # As expected, p.__eq__("hello") is called
True
>> "hello" == p
In Potato's __eq__ # Hmm, I expected this to be false because
True # this should call "hello".__eq__(p)
>> "hello".__eq__(p)
NotImplemented # Not implemented? How does == work for strings then?
AFAIK, the docs only talk about the == -> __eq__ mapping, but don't say anything about what happens either one of the arguments is not an object (e.g. 1 == p), or when the first object's __eq__ is NotImplemented, like we saw with "hello".__eq(p).
I'm looking for the general algorithm that is employed for equality... Most, if not all other SO answers, refer to Python 2's coercion rules, which don't apply anymore in Python 3.
You're mixing up the functions in the operator module and the methods used to implement those operators. operator.eq(a, b) is equivalent to a == b or operator.__eq__(a, b), but not to a.__eq__(b).
In terms of the __eq__ method, == and operator.eq work as follows:
def eq(a, b):
if type(a) is not type(b) and issubclass(type(b), type(a)):
# Give type(b) priority
a, b = b, a
result = a.__eq__(b)
if result is NotImplemented:
result = b.__eq__(a)
if result is NotImplemented:
result = a is b
return result
with the caveat that the real code performs method lookup for __eq__ in a way that bypasses instance dicts and custom __getattribute__/__getattr__ methods.
When you do this:
"hello" == potato
Python first calls "hello".__eq__(potato). That return NotImplemented, so Python tries it the other way: potato.__eq__("hello").
Returning NotImplemented doesn't mean there's no implementation of .__eq__ on that object. It means that the implementation didn't know how to compare to the value that was passed in. From https://docs.python.org/3/library/constants.html#NotImplemented:
Note When a binary (or in-place) method returns NotImplemented the
interpreter will try the reflected operation on the other type (or
some other fallback, depending on the operator). If all attempts
return NotImplemented, the interpreter will raise an appropriate
exception. Incorrectly returning NotImplemented will result in a
misleading error message or the NotImplemented value being returned to
Python code. See Implementing the arithmetic operations for examples.
I'm confused as to how the == operator works in Python 3. From the docs, eq(a, b) is equivalent to a == b. Also eq and __eq__ are equivalent.
No that is only the case in the operator module. The operator module is used to pass an == as a function for instance. But operator has not much to do with vanilla Python itself.
AFAIK, the docs only talk about the == -> eq mapping, but don't say anything about what happens either one of the arguments is not an object (e.g. 1 == p), or when the first object's.
In Python everything is an object: an int is an object, a "class" is an object", a None is an object, etc. We can for instance get the __eq__ of 0:
>>> (0).__eq__
<method-wrapper '__eq__' of int object at 0x55a81fd3a480>
So the equality is implemented in the "int class". As specified in the documentation on the datamodel __eq__ can return several values: True, False but any other object (for which the truthiness will be calculated). If on the other hand NotImplemented is returned, Python will fallback and call the __eq__ object on the object on the other side of the equation.

What does "return a in b" mean?

I want to understand how the return statement works. I am familiar with the return statement but not aware of the return in statement. Below is an example of a class method that uses it and I would like to know what it does.
def a(self, argv):
some = self.fnc("Format Specifier")
return argv in some
value in values means "True if value is in values otherwise False"
a simple example:
In [1]: "foo" in ("foo", "bar", "baz")
Out[1]: True
In [2]: "foo" in ("bar", "baz")
Out[2]: False
So in your case return argv in some means "return True if argv is in some otherwise return False"
It means whether argv is an element of some(in Boolean value). some could be list, tuple, dict etc.
It may be more clear if you know what happens in the background. When you use x in y, that is a shortcut for y.__contains__(x)1. When you define a class, you can define your own __contains__ method that can actually return anything you want. Usually, it returns either True or False. Therefore, argv in some will be the result of argv.__contains__(some): either True or False. You then return that.
1If y does not have the __contains__ method, it is converted to an iterator and each item in it is checked for equality with x.
The return itself is of no importance here: you can interpret this as:
return (argv in some)
Now the in keyword means:
The operators in and not in test for collection membership. x in s evaluates to true if x is a member of the collection s, and false otherwise. x not in s returns the negation of x in s. The collection membership test has traditionally been bound to sequences; an object is a member of a collection if the collection is a sequence and contains an element equal to that object. However, it make sense for many other object types to support membership tests without being a sequence. In particular, dictionaries (for keys) and sets support membership testing.
Python uses a fallback mechanism where it will check whether some (in this case) supports one of the following methods:
First it checks whether some is a list or tuple:
For the list and tuple types, x in y is true if and only if there exists an index i such that either x is y[i] or x == y[i] is true.
Next it checks whether both argv and some are strings:
For the Unicode and string types, x in y is true if and only if x is a substring of y. An equivalent test is y.find(x) != -1. Note, x and y need not be the same type; consequently, u'ab' in 'abc' will return True. Empty strings are always considered to be a substring of any other string, so "" in "abc" will return True.
Now every object that implements a __contains__ method supports such in and not in test as is further described in the documentation:
For user-defined classes which define the __contains__() method, x in y is true if and only if y.__contains__(x) is true.
So besides implemented usages for dictionaries, tuples and lists, you can define your own __contains__ method for arbitrary objects.
Another way to support this functionality is the following:
For user-defined classes which do not define __contains__() but do define __iter__(), x in y is true if some value z with x == z is produced while iterating over y. If an exception is raised during the iteration, it is as if in raised that exception.
And finally:
Lastly, the old-style iteration protocol is tried: if a class defines __getitem__(), x in y is true if and only if there is a non-negative integer index i such that x == y[i], and all lower integer indices do not raise IndexError exception. (If any other exception is raised, it is as if in raised that exception).

Does a Python object which doesn't override comparison operators equals itself?

class A(object):
def __init__(self, value):
self.value = value
x = A(1)
y = A(2)
q = [x, y]
q.remove(y)
I want to remove from the list a specific object which was added before to it and to which I still have a reference. I do not want an equality test. I want an identity test. This code seems to work in both CPython and IronPython, but does the language guarantee this behavior or is it just a fluke?
The list.remove method documentation is this: same as del s[s.index(x)], which implies that an equality test is performed.
So will an object be equal to itself if you don't override __cmp__, __eq__ or __ne__?
Yes. In your example q.remove(y) would remove the first occurrence of an object which compares equal with y. However, the way the class A is defined, you shouldn't† ever have a variable compare equal with y - with the exception of any other names which are also bound to the same y instance.
The relevant section of the docs is here:
If no __cmp__(), __eq__() or __ne__() operation is defined, class
instances are compared by object identity ("address").
So comparison for A instances is by identity (implemented as memory address in CPython). No other object can have an identity equal to id(y) within y's lifetime, i.e. for as long as you hold a reference to y (which you must, if you're going to remove it from a list!)
† Technically, it is still possible to have objects at other memory locations which are comparing equal - mock.ANY is one such example. But these objects need to override their comparison operators to force the result.
In python, by default an object is always equal to itself (the only exception I can think of is float("nan"). An object of a user-defined class will not be equal to any other object unless you define a comparison function.
See also http://docs.python.org/reference/expressions.html#notin
The answer is yes and no.
Consider the following example
>>> class A(object):
def __init__(self, value):
self.value = value
>>> x = A(1)
>>> y = A(2)
>>> z = A(3)
>>> w = A(3)
>>> q = [x, y,z]
>>> id(y) #Second element in the list and y has the same reference
46167248
>>> id(q[1]) #Second element in the list and y has the same reference
46167248
>>> q.remove(y) #So it just compares the id and removes it
>>> q
[<__main__.A object at 0x02C19AB0>, <__main__.A object at 0x02C19B50>]
>>> q.remove(w) #Fails because though z and w contain the same value yet they are different object
Traceback (most recent call last):
File "<pyshell#11>", line 1, in <module>
q.remove(w)
ValueError: list.remove(x): x not in list
It will remove from the list iff they are the same object. If they are different object with same value it won;t remove it.

Categories