Why/When in Python does `x==y` call `y.__eq__(x)`?

The Python docs clearly state that x==y calls x.__eq__(y). However, it seems that under many circumstances the opposite is true. Where is it documented when or why this happens, and how can I work out for sure whether my object's __cmp__ or __eq__ methods are going to get called?
Edit: Just to clarify, I know that __eq__ is called in preference to __cmp__, but I'm not clear why y.__eq__(x) is called in preference to x.__eq__(y), when the latter is what the docs state will happen.
>>> class TestCmp(object):
...     def __cmp__(self, other):
...         print "__cmp__ got called"
...         return 0
...
>>> class TestEq(object):
...     def __eq__(self, other):
...         print "__eq__ got called"
...         return True
...
>>> tc = TestCmp()
>>> te = TestEq()
>>>
>>> 1 == tc
__cmp__ got called
True
>>> tc == 1
__cmp__ got called
True
>>>
>>> 1 == te
__eq__ got called
True
>>> te == 1
__eq__ got called
True
>>>
>>> class TestStrCmp(str):
...     def __new__(cls, value):
...         return str.__new__(cls, value)
...
...     def __cmp__(self, other):
...         print "__cmp__ got called"
...         return 0
...
>>> class TestStrEq(str):
...     def __new__(cls, value):
...         return str.__new__(cls, value)
...
...     def __eq__(self, other):
...         print "__eq__ got called"
...         return True
...
>>> tsc = TestStrCmp("a")
>>> tse = TestStrEq("a")
>>>
>>> "b" == tsc
False
>>> tsc == "b"
False
>>>
>>> "b" == tse
__eq__ got called
True
>>> tse == "b"
__eq__ got called
True
Edit: From Mark Dickinson's answer and comment it would appear that:
1. Rich comparison overrides __cmp__
2. __eq__ is its own __rop__ to its __op__ (and similar for __lt__, __ge__, etc)
3. If the left object is a builtin or new-style class, and the right is a subclass of it, the right object's __rop__ is tried before the left object's __op__
This explains the behaviour in the TestStrCmp examples. TestStrCmp is a subclass of str but doesn't implement its own __eq__, so the __eq__ of str takes precedence in both cases (i.e. tsc == "b" calls "b".__eq__(tsc) as an __rop__ because of rule 1).
In the TestStrEq examples, tse.__eq__ is called in both instances because TestStrEq is a subclass of str and so it is called in preference.
In the TestEq examples, TestEq implements __eq__ and int doesn't, so __eq__ gets called both times (rule 1).
But I still don't understand the very first example with TestCmp. tc is not a subclass of int, so AFAICT 1.__cmp__(tc) should be called, but isn't.

You're missing a key exception to the usual behaviour: when the right-hand operand is an instance of a subclass of the class of the left-hand operand, the special method for the right-hand operand is called first.
See the documentation at:
http://docs.python.org/reference/datamodel.html#coercion-rules
and in particular, the following two paragraphs:
For objects x and y, first x.__op__(y) is tried. If this is not implemented or returns NotImplemented, y.__rop__(x) is tried. If this is also not implemented or returns NotImplemented, a TypeError exception is raised. But see the following exception:
Exception to the previous item: if the left operand is an instance of a built-in type or a new-style class, and the right operand is an instance of a proper subclass of that type or class and overrides the base’s __rop__() method, the right operand’s __rop__() method is tried before the left operand’s __op__() method.
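The subclass exception quoted above can be checked directly. Here is a minimal Python 3 sketch, assuming two hypothetical classes, Base and Derived (they are not from the question), whose __eq__ methods record that they ran:

```python
calls = []  # records which __eq__ implementations run, in order


class Base:
    def __eq__(self, other):
        calls.append("Base.__eq__")
        return True


class Derived(Base):
    def __eq__(self, other):
        calls.append("Derived.__eq__")
        return True


# Left operand is a Base, right operand is an instance of a proper subclass:
# the right operand's method is tried first.
Base() == Derived()
first_call_subclass_on_right = calls[0]

calls.clear()

# Right operand of an unrelated type: the left operand's method runs as usual.
Base() == 1
first_call_unrelated_right = calls[0]
```

With the subclass on the right, Derived.__eq__ should be the first (and only) method recorded; with an unrelated right operand, Base.__eq__ should be.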

Actually, in the docs, it states:
[__cmp__ is c]alled by comparison operations if rich comparison (see above) is not defined.
__eq__ is a rich comparison method and, in the case of TestCmp, is not defined, hence the calling of __cmp__

As far as I know, __eq__() is a so-called “rich comparison” method, and is called for comparison operators in preference to __cmp__(). __cmp__() is called if "rich comparison" is not defined.
So in A == B:
If __eq__() is defined in A, it will be called
Else __cmp__() will be called
__eq__() is defined in str, so your __cmp__() function was not called.
The same rule applies to the __ne__(), __gt__(), __ge__(), __lt__() and __le__() "rich comparison" methods.

Is this not documented in the Language Reference? Just from a quick look there, it looks like __cmp__ is ignored when __eq__, __lt__, etc are defined. I'm understanding that to include the case where __eq__ is defined on a parent class. str.__eq__ is already defined so __cmp__ on its subclasses will be ignored. object.__eq__ etc are not defined so __cmp__ on its subclasses will be honored.
In response to the clarified question:
I know that __eq__ is called in preference to __cmp__, but I'm not clear why y.__eq__(x) is called in preference to x.__eq__(y), when the latter is what the docs state will happen.
Docs say x.__eq__(y) will be called first, but it has the option to return NotImplemented in which case y.__eq__(x) is called. I'm not sure why you're confident something different is going on here.
Which case are you specifically puzzled about? I'm understanding you just to be puzzled about the "b" == tsc and tsc == "b" cases, correct? In either case, str.__eq__(onething, otherthing) is being called. Since you don't override the __eq__ method in TestStrCmp, eventually you're just relying on the base string method and it's saying the objects aren't equal.
Without knowing the implementation details of str.__eq__, I don't know whether ("b").__eq__(tsc) will return NotImplemented and give tsc a chance to handle the equality test. But even if it did, the way you have TestStrCmp defined, you're still going to get a false result.
So it's not clear what you're seeing here that's unexpected.
Perhaps what's happening is that Python is preferring __eq__ to __cmp__ if it's defined on either of the objects being compared, whereas you were expecting __cmp__ on the leftmost object to have priority over __eq__ on the righthand object. Is that it?

Related

Equality operator for functions?

I am building up my understanding of Python, and recently I understood that functions must be classes(?), and that a def func(): just instantiates an object of class function. I was mindblown when I created an attribute of func honestly.
Lurking in dir(func) I noticed that indeed all the special methods such as .__eq__ are inherited, and I wanted to play around with it:
def func(n):
    print(n)
    def __eq__(self,func2):
        print('hello')
However, it does not work:
>>> func.__eq__(print)
NotImplemented
What would be the proper way to overload the equality operator for a function? I don't see how to overload it without having a proper class definition.
Because python treats all functions as objects, it might be worth thinking of creating functions as creating instances of the class function (which isn't documented) with the __call__ method being the body of the function created with def. The actual C source of the function class is on Github if you want to know implementation details.
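A small Python 3 sketch of that mental model, using types.FunctionType from the standard library; the function name func and the attribute name note are only illustrative:

```python
import types


def func(n):
    return n * 2


# A def statement produces an instance of the built-in function type.
is_function_instance = isinstance(func, types.FunctionType)

# Functions accept arbitrary attributes, like most objects.
func.note = "attached after definition"

# Calling the function and invoking its __call__ method give the same result.
same_result = func(3) == func.__call__(3)
```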
With returning NotImplemented:
In Python, if a class does not override __eq__ or __hash__, it inherits default implementations from object: __hash__ is derived from id(), and __eq__ behaves roughly like lambda self, other: True if self is other else NotImplemented. When a comparison method returns NotImplemented, the runtime tries the reflected method on the other operand, and for == it finally falls back to an identity check. That is why test == test2 is False while test.__eq__(test2) is NotImplemented.
>>> def test(a):
...     return a
...
>>> def test2(a):
...     return a
...
>>> test == test2
False
>>> test.__eq__(test2)
NotImplemented
You can also test this by creating a dummy class that doesn't override __eq__ (like how the function class doesn't):
>>> class testcls:
...     pass
...
>>> t1 = testcls()
>>> t2 = testcls()
>>> t1.__eq__(t2)
NotImplemented
>>> t1.__eq__(t1)
True
>>> t1 == t2
False
No, functions don't have to be classes (yet they are in Python), actually. But that is another story.
It seems you misunderstand that __eq__ is a method, i.e. you need a class that it belongs to in order to overload it. What your code does is define a custom function within another function, which is valid Python code but has nothing to do with overloading __eq__.
Since you can't inherit from function, you can't override its __eq__. And even if you could do that, it wouldn't work due to some purely theoretical reasons.
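One way around this is to wrap the function in a class that defines both __call__ and __eq__. The following is only a sketch: ComparableFunction is a hypothetical name, and the equality rule (two wrappers are equal when they wrap the same function object) is an assumption chosen for illustration, not a standard definition of function equality.

```python
class ComparableFunction:
    """Hypothetical wrapper: callable like the wrapped function, with custom ==."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __eq__(self, other):
        if not isinstance(other, ComparableFunction):
            return NotImplemented  # let Python try the other operand
        # Assumed rule for this sketch: equal when wrapping the same function.
        return self.func is other.func

    def __hash__(self):
        # Defining __eq__ disables the inherited __hash__, so restore one.
        return hash(self.func)


def double(n):
    return n * 2


first = ComparableFunction(double)
second = ComparableFunction(double)
third = ComparableFunction(print)
```

Here first == second should hold, first == third should not, and first(4) still calls double. Because the class takes a single function argument, it can also be applied as a decorator.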

Inconsistent implementation of collections.abc

I'm trying to understand the collections.abc source code.
Let's take a look at the Hashable class's __subclasshook__ implementation:
@classmethod
def __subclasshook__(cls, C):
    if cls is Hashable:
        for B in C.__mro__:
            if "__hash__" in B.__dict__:
                if B.__dict__["__hash__"]:
                    return True
                break
    return NotImplemented
Here we first check that there is a __hash__ attribute and then check that it has a non-false value. The same logic appears in the Awaitable class.
And the AsyncIterable class's __subclasshook__:
@classmethod
def __subclasshook__(cls, C):
    if cls is AsyncIterable:
        if any("__aiter__" in B.__dict__ for B in C.__mro__):
            return True
    return NotImplemented
Here we just check that there is an __aiter__ attribute, and the same logic appears in the other classes in this package.
Is there any reason for this logic difference?
The __hash__ protocol explicitly allows flagging a class as unhashable by setting __hash__ = None.
If a class [...] wishes to suppress hash support, it should include __hash__ = None in the class definition.
The reason is that a == b always requires hash(a) == hash(b). Otherwise, dict, set and similar data structures break. If a child class changes __eq__ explicitly or otherwise, this may no longer hold true. Thus, __hash__ can be flagged as not applicable.
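The effect of that flag on the Hashable check can be seen in a short Python 3 sketch. Point and HashablePoint are hypothetical classes made up for this illustration:

```python
from collections.abc import Hashable


class Point:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self.x == other.x
    # No __hash__ is defined here, so Python sets Point.__hash__ to None.


class HashablePoint(Point):
    # Restores hashing in a way that agrees with the inherited __eq__.
    def __hash__(self):
        return hash(self.x)


point_hash_attr = Point.__hash__
point_is_hashable = isinstance(Point(1), Hashable)
hashable_point_is_hashable = isinstance(HashablePoint(1), Hashable)
```

Point has a __hash__ entry in its class dictionary, but its value is None, so a check that only tested for the name would wrongly report Point as hashable.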

Python: case where x==y and x.__eq__(y) return different things. Why?

I'm taking my first computing science course, and we just learned about class implementation and inheritance. In particular, we just covered method overriding and how classes we define inherit from the object superclass by default. As one of my examples trying out this particular case of inheritance, I used the following code:
class A:
    def __init__(self, i):
        self.i = i
    def __str__(self):
        return "A"
    # Commenting out these two lines to not override __eq__(), just use the
    # default from our superclass, object
    #def __eq__(self, other):
        #return self.i == other.i
x = A(2)
y = A(2)
>>> print(x == y)
False
>>> print(x.__eq__(y))
NotImplemented
I expected the result from (x == y), because as I understand it the default for __eq__() is to check if they're the same objects or not, not worrying about the contents. Which is False, x and y have the same contents but are different objects. The second one surprised me though.
So my questions: I thought (x==y) and x.__eq__(y) were synonymous and made exactly the same call. Why do these produce differing output? And why does the second conditional return NotImplemented?
The == operator is equivalent to the eq function, which will internally call the __eq__ method of the left operand if it exists to try to determine equality. This is not the only thing it will do, and if __eq__ does not exist, as is the case here, it will do other checks, such as checking whether the two are the same object, or __cmp__ pre-Python 3.
So in a nutshell, your confusion arises from this assumption, which is incorrect:
I thought (x==y) and x.__eq__(y) were synonymous and made exactly the same call
In fact, (x==y) and operator.eq(x, y) are synonymous, and x.__eq__(y) is one of the things eq(x, y) will try to check.
The NotImplemented value you're seeing returned from your inherited __eq__ method is a special builtin value used as a sentinel in Python. It can be returned by __magic__ methods that implement mathematical or comparison operators to indicate that the class does not support the operator that was attempted (with the provided arguments).
This can be more useful than raising an exception, as it allows Python to fall back to other options to resolve the operator use. For instance, if you do x + y, Python will first try to run x.__add__(y). If that returns NotImplemented, it will next try the "reverse" version, y.__radd__(x), which may work if y is a more sophisticated type than x is.
In the case you're asking about, x == y, Python first tries x.__eq__(y), then y.__eq__(x), and finally x is y (which will always evaluate to a Boolean value). Since object.__eq__ returns NotImplemented whenever the two operands are not the same object, your class falls back to the identity comparison when you use the real operator, but shows you the NotImplemented sentinel when you call __eq__ directly.
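The order of those attempts can be observed with a minimal Python 3 sketch. Left and Right are hypothetical classes whose __eq__ methods record the call and then decline by returning NotImplemented:

```python
calls = []  # records which __eq__ implementations run, in order


class Left:
    def __eq__(self, other):
        calls.append("Left.__eq__")
        return NotImplemented


class Right:
    def __eq__(self, other):
        calls.append("Right.__eq__")
        return NotImplemented


left, right = Left(), Right()

# Both methods decline, so == falls back to the identity check.
different_objects_equal = (left == right)
order_tried = list(calls)

# Same object on both sides: the identity fallback yields True.
same_object_equal = (left == left)

# A direct method call skips the fallback and exposes the sentinel.
direct_call_result = left.__eq__(right)
```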
If you have implemented the __eq__() function for a class, it gets called when you use x == y. Otherwise x == y relies on a default comparison logic. __eq__() does not get implemented automatically when you define a class.

python bug with __le__, __ge__?

Is it me or python that is confused with the following code ? I would expect __le__ to be called by a <= ab, not __ge__:
#!/usr/bin/env python2
class B(object):
    def __ge__(self, other):
        print("__ge__ unexpectedly called")
class A(object):
    def __le__(self, other):
        print("__le__ called")
class AB(A, B):
    pass
a = A()
ab = AB()
a <= ab # --> __ge__ unexpectedly called
ab <= a # --> __le__ called
I get the same behavior with python 2.7, 3.2 and pypy 1.9.
What can I do to get __le__ called instead of __ge__ ??
The short answer is that they wanted to allow AB to override the behavior from A. Python can't call AB.__lt__(a, ab), because a may not be a valid self for an AB method, so instead, it calls AB.__gt__(ab, a), which is valid.
The long answer is a bit more complicated.
According to the docs for rich comparison operators:
There are no swapped-argument versions of these methods (to be used when the left argument does not support the operation but the right argument does); rather, __lt__() and __gt__() are each other’s reflection, __le__() and __ge__() are each other’s reflection, and __eq__() and __ne__() are their own reflection.
In other words, x <= y will call y.__ge__(x) in exactly the same cases where x+y would call y.__radd__(x). To compare:
>>> class X(object):
...     def __add__(self, other):
...         print('X.add')
>>> class Y(object):
...     def __radd__(self, other):
...         print('Y.radd')
>>> class XY(X, Y):
...     pass
>>> x, xy = X(), XY()
>>> x + xy
Y.radd
According to the docs for reflected operators:
These methods are called to implement the binary arithmetic operations… with reflected (swapped) operands. These functions are only called if the left operand does not support the corresponding operation and the operands are of different types…
Note: If the right operand’s type is a subclass of the left operand’s type and that subclass provides the reflected method for the operation, this method will be called before the left operand’s non-reflected method. This behavior allows subclasses to override their ancestors’ operations.
So, because XY is a subclass of X, XY.__radd__ gets preference over X.__add__. And, likewise, because AB is a subclass of A, AB.__ge__ gets preference over A.__le__.
This probably should be documented better. To figure it out, you have to ignore the parenthetical "to be used when the left argument does not support the operation but the right argument does", guess that you need to look up the normal swapped operators (there's no link, or even mention, here), then ignore the wording that says "These functions are only called if the left operand does not support the corresponding operation", and see the "Note", which contradicts what came above… Also notice that the docs explicitly say, "There are no implied relationships among the comparison operators", only a paragraph before describing the swapped cases, which imply exactly such relationships…
Finally, this case seems odd, because AB, rather than overriding __ge__ itself, just inherited it from B, which knows nothing about A and is unrelated to it. Presumably B didn't intend to have its subclasses override A's behavior. But if B were meant to be used as a mixin for A-derived classes, maybe it would intend exactly such an override. And at any rate, the rule is probably already complicated enough without getting into where each method came from in the MRO. Whatever the reasoning, where the __ge__ comes from is irrelevant; if it's there on the subclass, it gets called.
For your added final question, "What can I do to get __le__ called instead of __ge__ ??"… well, you really can't, any more than you can get X.__add__ called instead of XY.__radd__. Of course you can always implement an AB.__ge__ (or XY.__radd__) that calls A.__le__ (or X.__add__), but it's presumably easier to just implement AB.__ge__ in such a way that it works with an A as its other argument in the first place. Alternatively, you could remove the inheritance and find some other way to model whatever you were modeling that way. Or you could explicitly call a.__le__(ab) instead of a<=ab. But otherwise, if you designed your classes in a way that takes advantage of the "no implied relationships" to do something weird, you were misled by the docs, and will have to redesign them somehow.
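The first workaround, an AB.__ge__ that hands the comparison back to A.__le__, might look roughly like this in Python 3. The class names follow the question; recording calls in a list instead of printing, and the isinstance check, are choices made for this sketch:

```python
calls = []  # records which comparison methods run, in order


class B:
    def __ge__(self, other):
        calls.append("B.__ge__")
        return True


class A:
    def __le__(self, other):
        calls.append("A.__le__")
        return True


class AB(A, B):
    def __ge__(self, other):
        # Reflected call for "other <= self": hand it back to A.__le__
        # when the left operand is an A; otherwise keep B's behaviour.
        if isinstance(other, A):
            return A.__le__(other, self)
        return B.__ge__(self, other)


a, ab = A(), AB()

a <= ab
calls_for_a_le_ab = list(calls)

calls.clear()
ab <= a
calls_for_ab_le_a = list(calls)
```

Python still calls AB.__ge__ first for a <= ab, since the subclass rule is unchanged, but the override now routes the work to A.__le__.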

Python NotImplemented constant

Looking through decimal.py, it uses NotImplemented in many special methods. e.g.
class A(object):
    def __lt__(self, a):
        return NotImplemented
    def __add__(self, a):
        return NotImplemented
The Python docs say:
NotImplemented
Special value which can be returned by the “rich comparison” special methods (__eq__(), __lt__(), and friends), to indicate that the comparison is not implemented with respect to the other type.
It doesn't talk about other special methods and neither does it describe the behavior.
It seems to be a magic object which if returned from other special methods raises TypeError, and in “rich comparison” special methods does nothing.
e.g.
print A() < A()
prints True, but
print A() + 1
raises TypeError, so I am curious as to what's going on and what is the usage/behavior of NotImplemented.
NotImplemented allows you to indicate that a comparison between the two given operands has not been implemented (rather than indicating that the comparison is valid, but yields False, for the two operands).
From the Python Language Reference:
For objects x and y, first x.__op__(y) is tried. If this is not implemented or returns NotImplemented, y.__rop__(x) is tried. If this is also not implemented or returns NotImplemented, a TypeError exception is raised. But see the following exception:
Exception to the previous item: if the left operand is an instance of a built-in type or a new-style class, and the right operand is an instance of a proper subclass of that type or class and overrides the base's __rop__() method, the right operand's __rop__() method is tried before the left operand's __op__() method. This is done so that a subclass can completely override binary operators. Otherwise, the left operand's __op__() method would always accept the right operand: when an instance of a given class is expected, an instance of a subclass of that class is always acceptable.
It actually has the same meaning when returned from __add__ as from __lt__; the difference is that Python 2.x tries other ways of comparing the objects before giving up, whereas Python 3.x raises a TypeError. In fact, Python can try other things for __add__ as well: look at __radd__ and (though I'm fuzzy on it) __coerce__.
# 2.6
>>> class A(object):
...     def __lt__(self, other):
...         return NotImplemented
>>> A() < A()
True
# 3.1
>>> class A(object):
...     def __lt__(self, other):
...         return NotImplemented
>>> A() < A()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: A() < A()
See Ordering Comparisons (3.0 docs) for more info.
If you return it from __add__ it will behave like the object has no __add__ method, and raise a TypeError.
If you return NotImplemented from a rich comparison function, Python will behave like the method wasn't implemented, that is, it will defer to using __cmp__.
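For the __add__ case, a short Python 3 sketch shows both outcomes: a reflected method that rescues the operation, and the TypeError raised when nothing does. Meters and Feet are hypothetical classes invented for this example:

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        return NotImplemented  # decline; Python may try other.__radd__


class Feet:
    def __init__(self, value):
        self.value = value

    def __radd__(self, other):
        # Reached only after the left operand's __add__ has declined.
        if isinstance(other, Meters):
            return Meters(other.value + self.value * 0.3048)
        return NotImplemented


# Meters.__add__ declines, then Feet.__radd__ supplies the result.
total = Meters(1) + Feet(10)

try:
    Meters(1) + 1  # neither operand handles this pairing
    raised_type_error = False
except TypeError:
    raised_type_error = True
```

Returning NotImplemented rather than raising inside Meters.__add__ is what gives Feet.__radd__ its chance to run.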
