Python 3 == operator - python

I'm confused as to how the == operator works in Python 3. From the docs, eq(a, b) is equivalent to a == b. Also eq and __eq__ are equivalent.
Take the following example:
class Potato:
def __eq__(self, other):
print("In Potato's __eq__")
return True
>> p = Potato()
>> p == "hello"
In Potato's __eq__ # As expected, p.__eq__("hello") is called
True
>> "hello" == p
In Potato's __eq__ # Hmm, I expected this to be false because
True # this should call "hello".__eq__(p)
>> "hello".__eq__(p)
NotImplemented # Not implemented? How does == work for strings then?
AFAIK, the docs only talk about the == -> __eq__ mapping, but don't say anything about what happens either one of the arguments is not an object (e.g. 1 == p), or when the first object's __eq__ is NotImplemented, like we saw with "hello".__eq(p).
I'm looking for the general algorithm that is employed for equality... Most, if not all other SO answers, refer to Python 2's coercion rules, which don't apply anymore in Python 3.

You're mixing up the functions in the operator module and the methods used to implement those operators. operator.eq(a, b) is equivalent to a == b or operator.__eq__(a, b), but not to a.__eq__(b).
In terms of the __eq__ method, == and operator.eq work as follows:
def eq(a, b):
if type(a) is not type(b) and issubclass(type(b), type(a)):
# Give type(b) priority
a, b = b, a
result = a.__eq__(b)
if result is NotImplemented:
result = b.__eq__(a)
if result is NotImplemented:
result = a is b
return result
with the caveat that the real code performs method lookup for __eq__ in a way that bypasses instance dicts and custom __getattribute__/__getattr__ methods.

When you do this:
"hello" == potato
Python first calls "hello".__eq__(potato). That return NotImplemented, so Python tries it the other way: potato.__eq__("hello").
Returning NotImplemented doesn't mean there's no implementation of .__eq__ on that object. It means that the implementation didn't know how to compare to the value that was passed in. From https://docs.python.org/3/library/constants.html#NotImplemented:
Note When a binary (or in-place) method returns NotImplemented the
interpreter will try the reflected operation on the other type (or
some other fallback, depending on the operator). If all attempts
return NotImplemented, the interpreter will raise an appropriate
exception. Incorrectly returning NotImplemented will result in a
misleading error message or the NotImplemented value being returned to
Python code. See Implementing the arithmetic operations for examples.

I'm confused as to how the == operator works in Python 3. From the docs, eq(a, b) is equivalent to a == b. Also eq and __eq__ are equivalent.
No that is only the case in the operator module. The operator module is used to pass an == as a function for instance. But operator has not much to do with vanilla Python itself.
AFAIK, the docs only talk about the == -> eq mapping, but don't say anything about what happens either one of the arguments is not an object (e.g. 1 == p), or when the first object's.
In Python everything is an object: an int is an object, a "class" is an object", a None is an object, etc. We can for instance get the __eq__ of 0:
>>> (0).__eq__
<method-wrapper '__eq__' of int object at 0x55a81fd3a480>
So the equality is implemented in the "int class". As specified in the documentation on the datamodel __eq__ can return several values: True, False but any other object (for which the truthiness will be calculated). If on the other hand NotImplemented is returned, Python will fallback and call the __eq__ object on the object on the other side of the equation.

Related

Comparison operators vs “rich comparison” methods in Python

Can someone explain me the differences between the two. Are those normally equivalent ? Maybe I'm completely wrong here, but I thought that each comparison operator was necessarily related to one “rich comparison” method. This is from the documentation:
The correspondence between operator symbols and method names is as
follows:
x<y calls x.__lt__(y), x<=y calls x.__le__(y), x==y calls x.__eq__(y), x!=y calls x.__ne__(y), x>y calls x.__gt__(y), and x>=y calls x.__ge__(y).
Here is an example that demonstrates my confusion.
Python 3.x:
dict1 = {1:1}
dict2 = {2:2}
>>> dict1 < dict2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'dict' and 'dict'
>>> dict1.__lt__(dict2)
NotImplemented
Python 2.x:
dict1 = {1:1}
dict2 = {2:2}
>>> dict1 < dict2
True
>>> dict1.__lt__(dict2)
NotImplemented
From the python 3 example, it seems logic that calling dict1 < dict2 is not supported. But what about Python 2 example ? Why is it accepted ?
I know that unlike Python 2, in Python 3, not all objects supports comparison operators. At my surprise though, both version return the NotImplemented singleton when calling __lt__().
This is relying on the __cmp__ magic method, which is what the rich-comparison operators were meant to replace:
>>> dict1 = {1:1}
>>> dict2 = {2:2}
>>> dict1.__cmp__
<method-wrapper '__cmp__' of dict object at 0x10f075398>
>>> dict1.__cmp__(dict2)
-1
As to the ordering logic, here is the Python 2.7 documentation:
Mappings (instances of dict) compare equal if and only if they have
equal (key, value) pairs. Equality comparison of the keys and values
enforces reflexivity.
Outcomes other than equality are resolved consistently, but are not
otherwise defined.
With a footnote:
Earlier versions of Python used lexicographic comparison of the sorted
(key, value) lists, but this was very expensive for the common case of
comparing for equality. An even earlier version of Python compared
dictionaries by identity only, but this caused surprises because
people expected to be able to test a dictionary for emptiness by
comparing it to {}.
And, in Python 3.0, ordering has been simplified. This is from the documentation:
The ordering comparison operators (<, <=, >=, >) raise a TypeError
exception when the operands don’t have a meaningful natural ordering.
builtin.sorted() and list.sort() no longer accept the cmp argument
providing a comparison function. Use the key argument instead.
The cmp() function should be treated as gone, and the __cmp__() special method
is no longer supported. Use __lt__() for sorting, __eq__() with
__hash__(), and other rich comparisons as needed. (If you really need the cmp() functionality, you could use the expression (a > b) - (a <> b) as the equivalent for cmp(a, b).)
So, to be explicit, in Python 2, since the rich comparison operators are not implemented, dict objects will fall-back to __cmp__, from the data-model documentation:
object.__cmp__(self, other)
Called by comparison operations if rich
comparison (see above) is not defined. Should return a negative
integer if self < other, zero if self == other, a positive integer
if self > other.
Note for operator < versus __lt__:
import types
class A:
def __lt__(self, other): return True
def new_lt(self, other): return False
a = A()
print(a < a, a.__lt__(a)) # True True
a.__lt__ = types.MethodType(new_lt, a)
print(a < a, a.__lt__(a)) # True False
A.__lt__ = types.MethodType(new_lt, A)
print(a < a, a.__lt__(a)) # False False
< calls __lt__ defined on class; __lt__ calls __lt__ defined on object.
It's usually the same :) And it is totally delicious to use: A.__lt__ = new_lt

Does comparing using `==` compare identities before comparing values?

If I compare two variables using ==, does Python compare the identities, and, if they're not the same, then compare the values?
For example, I have two strings which point to the same string object:
>>> a = 'a sequence of chars'
>>> b = a
Does this compare the values, or just the ids?:
>>> b == a
True
It would make sense to compare identity first, and I guess that is the case, but I haven't yet found anything in the documentation to support this. The closest I've got is this:
x==y calls x.__eq__(y)
which doesn't tell me whether anything is done before calling x.__eq__(y).
For user-defined class instances, is is used as a fallback - where the default __eq__ isn't overridden, a == b is evaluated as a is b. This ensures that the comparison will always have a result (except in the NotImplemented case, where comparison is explicitly forbidden).
This is (somewhat obliquely - good spot Sven Marnach) referred to in the data model documentation (emphasis mine):
User-defined classes have __eq__() and __hash__() methods by
default; with them, all objects compare unequal (except with
themselves) and x.__hash__() returns an appropriate value such
that x == y implies both that x is y and hash(x) == hash(y).
You can demonstrate it as follows:
>>> class Unequal(object):
def __eq__(self, other):
return False
>>> ue = Unequal()
>>> ue is ue
True
>>> ue == ue
False
so __eq__ must be called before id, but:
>>> class NoEqual(object):
pass
>>> ne = NoEqual()
>>> ne is ne
True
>>> ne == ne
True
so id must be invoked where __eq__ isn't defined.
You can see this in the CPython implementation, which notes:
/* If neither object implements it, provide a sensible default
for == and !=, but raise an exception for ordering. */
The "sensible default" implemented is a C-level equality comparison of the pointers v and w, which will return whether or not they point to the same object.
In addition to the answer by #jonrsharpe: if the objects being compared implement __eq__, it would be wrong for Python to check for identity first.
Look at the following example:
>>> x = float('nan')
>>> x is x
True
>>> x == x
False
NaN is a specific thing that should never compare equal to itself; however, even in this case x is x should return True, because of the semantics of is.

Why don't methods have reference equality?

I had a bug where I was relying on methods being equal to each other when using is. It turns out that's not the case:
>>> class What:
... def meth(self):
... pass
>>> What.meth is What.meth
True
>>> inst = What()
>>> inst.meth is inst.meth
False
Why is that the case? It works for regular functions:
>>> def func(): pass
>>> func is func
True
Method objects are created each time you access them. Functions act as descriptors, returning a method object when their .__get__ method is called:
>>> What.__dict__['meth']
<function What.meth at 0x10a6f9c80>
>>> What.__dict__['meth'].__get__(What(), What)
<bound method What.meth of <__main__.What object at 0x10a6f7b10>>
If you're on Python 3.8 or later, you can use == equality testing instead. On Python 3.8 and later, two methods are equal if their .__self__ and .__func__ attributes are identical objects (so if they wrap the same function, and are bound to the same instance, both tested with is).
Before 3.8, method == behaviour is inconsistent based on how the method was implemented - Python methods and one of the two C method types compare __self__ for equality instead of identity, while the other C method type compares __self__ by identity. See Python issue 1617161.
If you need to test that the methods represent the same underlying function, test their __func__ attributes:
>>> What.meth == What.meth # functions (or unbound methods in Python 2)
True
>>> What().meth == What.meth # bound method and function
False
>>> What().meth == What().meth # bound methods with *different* instances
False
>>> What().meth.__func__ == What().meth.__func__ # functions
True
Martijn is right that a new Methods are objects generated by .__get__ so their address pointers don't equate with an is evaluation. Note that using == will evaluate as intended in Python 2.7.
Python2.7
class Test(object):
def tmethod(self):
pass
>>> Test.meth is Test.meth
False
>>> Test.meth == Test.meth
True
>>> t = Test()
>>> t.meth is t.meth
False
>>> t.meth == t.meth
True
Note however that methods referenced from an instance do not equate to those referenced from class because of the self reference carried along with the method from an instance.
>>> t = Test()
>>> t.meth is Test.meth
False
>>> t.meth == Test.meth
False
In Python 3.3 the is operator for methods more often behaves the same as the == so you get the expected behavior instead in this example. This results from both __cmp__ disappearing and a cleaner method object representation in Python 3; methods now have __eq__ and references are not built-on-the-fly objects, so the behavior follows as one might expect without Python 2 expectations.
Python3.3
>>> Test.meth is Test.meth
True
>>> Test.meth == Test.meth
True
>>> Test.meth.__eq__(Test.meth)
True

Set "in" operator: uses equality or identity?

class A(object):
def __cmp__(self):
print '__cmp__'
return object.__cmp__(self)
def __eq__(self, rhs):
print '__eq__'
return True
a1 = A()
a2 = A()
print a1 in set([a1])
print a1 in set([a2])
Why does first line prints True, but second prints False? And neither enters operator eq?
I am using Python 2.6
Set __contains__ makes checks in the following order:
'Match' if hash(a) == hash(b) and (a is b or a==b) else 'No Match'
The relevant C source code is in Objects/setobject.c::set_lookkey() and in Objects/object.c::PyObject_RichCompareBool().
You need to define __hash__ too. For example
class A(object):
def __hash__(self):
print '__hash__'
return 42
def __cmp__(self, other):
print '__cmp__'
return object.__cmp__(self, other)
def __eq__(self, rhs):
print '__eq__'
return True
a1 = A()
a2 = A()
print a1 in set([a1])
print a1 in set([a2])
Will work as expected.
As a general rule, any time you implement __cmp__ you should implement a __hash__ such that for all x and y such that x == y, x.__hash__() == y.__hash__().
Sets and dictionaries gain their speed by using hashing as a fast approximation of full equality checking. If you want to redefine equality, you usually need to redefine the hash algorithm so that it is consistent.
The default hash function uses the identity of the object, which is pretty useless as a fast approximation of full equality, but at least allows you to use an arbitrary class instance as a dictionary key and retrieve the value stored with it if you pass exactly the same object as a key. But it means if you redefine equality and don't redefine the hash function, your objects will go into a dictionary/set without complaining about not being hashable, but still won't actually work the way you expect them to.
See the official python docs on __hash__ for more details.
A tangential answer, but your question and my testing made me curious. If you ignore the set operator which is the source of your __hash__ problem, it turns out your question is still interesting.
Thanks to the help I got on this SO question, I was able to chase the in operator through the source code to it's root. Near the bottom I found the PyObject_RichCompareBool function which indeed tests for identity (see the comment about "Quick result") before testing for equality.
So unless I misunderstand the way things work, the technical answer to your question is first identity and then equality, through the equality test itself. Just to reiterate, that is not the source of the behavior you were seeing but just the technical answer to your question.
If I misunderstood the source, somebody please set me straight.
int
PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
{
PyObject *res;
int ok;
/* Quick result when objects are the same.
Guarantees that identity implies equality. */
if (v == w) {
if (op == Py_EQ)
return 1;
else if (op == Py_NE)
return 0;
}
res = PyObject_RichCompare(v, w, op);
if (res == NULL)
return -1;
if (PyBool_Check(res))
ok = (res == Py_True);
else
ok = PyObject_IsTrue(res);
Py_DECREF(res);
return ok;
}
Sets seem to use hash codes, then identity, before comparing for equality. The following code:
class A(object):
def __eq__(self, rhs):
print '__eq__'
return True
def __hash__(self):
print '__hash__'
return 1
a1 = A()
a2 = A()
print 'set1'
set1 = set([a1])
print 'set2'
set2 = set([a2])
print 'a1 in set1'
print a1 in set1
print 'a1 in set2'
print a1 in set2
outputs:
set1
__hash__
set2
__hash__
a1 in set1
__hash__
True
a1 in set2
__hash__
__eq__
True
What happens seems to be:
The hash code is computed when an element is inserted into a hash. (To compare with the existing elements.)
The hash code for the object you're checking with the in operator is computed.
Elements of the set with the same hash code are inspected by first checking whether they're the same object as the one you're looking for, or if they're logically equal to it.

How is __eq__ handled in Python and in what order?

Since Python does not provide left/right versions of its comparison operators, how does it decide which function to call?
class A(object):
def __eq__(self, other):
print "A __eq__ called"
return self.value == other
class B(object):
def __eq__(self, other):
print "B __eq__ called"
return self.value == other
>>> a = A()
>>> a.value = 3
>>> b = B()
>>> b.value = 4
>>> a == b
"A __eq__ called"
"B __eq__ called"
False
This seems to call both __eq__ functions.
I am looking for the official decision tree.
The a == b expression invokes A.__eq__, since it exists. Its code includes self.value == other. Since int's don't know how to compare themselves to B's, Python tries invoking B.__eq__ to see if it knows how to compare itself to an int.
If you amend your code to show what values are being compared:
class A(object):
def __eq__(self, other):
print("A __eq__ called: %r == %r ?" % (self, other))
return self.value == other
class B(object):
def __eq__(self, other):
print("B __eq__ called: %r == %r ?" % (self, other))
return self.value == other
a = A()
a.value = 3
b = B()
b.value = 4
a == b
it will print:
A __eq__ called: <__main__.A object at 0x013BA070> == <__main__.B object at 0x013BA090> ?
B __eq__ called: <__main__.B object at 0x013BA090> == 3 ?
When Python2.x sees a == b, it tries the following.
If type(b) is a new-style class, and type(b) is a subclass of type(a), and type(b) has overridden __eq__, then the result is b.__eq__(a).
If type(a) has overridden __eq__ (that is, type(a).__eq__ isn't object.__eq__), then the result is a.__eq__(b).
If type(b) has overridden __eq__, then the result is b.__eq__(a).
If none of the above are the case, Python repeats the process looking for __cmp__. If it exists, the objects are equal iff it returns zero.
As a final fallback, Python calls object.__eq__(a, b), which is True iff a and b are the same object.
If any of the special methods return NotImplemented, Python acts as though the method didn't exist.
Note that last step carefully: if neither a nor b overloads ==, then a == b is the same as a is b.
From https://eev.ee/blog/2012/03/24/python-faq-equality/
Python 3 Changes/Updates for this algorithm
How is __eq__ handled in Python and in what order?
a == b
It is generally understood, but not always the case, that a == b invokes a.__eq__(b), or type(a).__eq__(a, b).
Explicitly, the order of evaluation is:
if b's type is a strict subclass (not the same type) of a's type and has an __eq__, call it and return the value if the comparison is implemented,
else, if a has __eq__, call it and return it if the comparison is implemented,
else, see if we didn't call b's __eq__ and it has it, then call and return it if the comparison is implemented,
else, finally, do the comparison for identity, the same comparison as is.
We know if a comparison isn't implemented if the method returns NotImplemented.
(In Python 2, there was a __cmp__ method that was looked for, but it was deprecated and removed in Python 3.)
Let's test the first check's behavior for ourselves by letting B subclass A, which shows that the accepted answer is wrong on this count:
class A:
value = 3
def __eq__(self, other):
print('A __eq__ called')
return self.value == other.value
class B(A):
value = 4
def __eq__(self, other):
print('B __eq__ called')
return self.value == other.value
a, b = A(), B()
a == b
which only prints B __eq__ called before returning False.
Note that I also correct a small error in the question where self.value is compared to other instead of other.value - in this comparison, we get two objects (self and other), usually of the same type since we are doing no type-checking here (but they can be of different types), and we need to know if they are equal. Our measure of whether or not they are equal is to check the value attribute, which must be done on both objects.
How do we know this full algorithm?
The other answers here seem incomplete and out of date, so I'm going to update the information and show you how how you could look this up for yourself.
This is handled at the C level.
We need to look at two different bits of code here - the default __eq__ for objects of class object, and the code that looks up and calls the __eq__ method regardless of whether it uses the default __eq__ or a custom one.
Default __eq__
Looking __eq__ up in the relevant C api docs shows us that __eq__ is handled by tp_richcompare - which in the "object" type definition in cpython/Objects/typeobject.c is defined in object_richcompare for case Py_EQ:.
case Py_EQ:
/* Return NotImplemented instead of False, so if two
objects are compared, both get a chance at the
comparison. See issue #1393. */
res = (self == other) ? Py_True : Py_NotImplemented;
Py_INCREF(res);
break;
So here, if self == other we return True, else we return the NotImplemented object. This is the default behavior for any subclass of object that does not implement its own __eq__ method.
How __eq__ gets called
Then we find the C API docs, the PyObject_RichCompare function, which calls do_richcompare.
Then we see that the tp_richcompare function, created for the "object" C definition is called by do_richcompare, so let's look at that a little more closely.
The first check in this function is for the conditions the objects being compared:
are not the same type, but
the second's type is a subclass of the first's type, and
the second's type has an __eq__ method,
then call the other's method with the arguments swapped, returning the value if implemented. If that method isn't implemented, we continue...
if (!Py_IS_TYPE(v, Py_TYPE(w)) &&
PyType_IsSubtype(Py_TYPE(w), Py_TYPE(v)) &&
(f = Py_TYPE(w)->tp_richcompare) != NULL) {
checked_reverse_op = 1;
res = (*f)(w, v, _Py_SwappedOp[op]);
if (res != Py_NotImplemented)
return res;
Py_DECREF(res);
Next we see if we can lookup the __eq__ method from the first type and call it.
As long as the result is not NotImplemented, that is, it is implemented, we return it.
if ((f = Py_TYPE(v)->tp_richcompare) != NULL) {
res = (*f)(v, w, op);
if (res != Py_NotImplemented)
return res;
Py_DECREF(res);
Else if we didn't try the other type's method and it's there, we then try it, and if the comparison is implemented, we return it.
if (!checked_reverse_op && (f = Py_TYPE(w)->tp_richcompare) != NULL) {
res = (*f)(w, v, _Py_SwappedOp[op]);
if (res != Py_NotImplemented)
return res;
Py_DECREF(res);
}
Finally, we get a fallback in case it isn't implemented for either one's type.
The fallback checks for the identity of the object, that is, whether it is the same object at the same place in memory - this is the same check as for self is other:
/* If neither object implements it, provide a sensible default
for == and !=, but raise an exception for ordering. */
switch (op) {
case Py_EQ:
res = (v == w) ? Py_True : Py_False;
break;
Conclusion
In a comparison, we respect the subclass implementation of comparison first.
Then we attempt the comparison with the first object's implementation, then with the second's if it wasn't called.
Finally we use a test for identity for comparison for equality.

Categories