Python History and Design: Why issubclass() instead of rich comparisons?

In Python, the comparison operators -- <, <=, ==, !=, >, >= -- can be implemented to mean whatever is relevant to the implementing class. In Python 2 that was done by overriding __cmp__, and in Python 3 by overriding __lt__ and friends. What is the advantage of having an issubclass() built-in function instead of allowing for expressions such as bool < int (true), int < object (true), int <= int (true), or int < float (false)? In particular, I'll note that classes ordered by issubclass() constitute a partially ordered set in the mathematical sense.
The Python 3 equivalent of what I'm thinking would look like what's below. This code doesn't replace issubclass() (though looping over the MRO would accomplish that, right?). However, wouldn't this be more intuitive?
import functools

@functools.total_ordering
class Type(type):
    "Metaclass whose instances (which are classes) can use <= instead of issubclass()"
    __hash__ = type.__hash__        # keep classes hashable even though __eq__ is overridden
    def __lt__(self, other):
        try:
            return issubclass(self, other) and self != other
        except TypeError:           # other isn't a type or tuple of types
            return NotImplemented
    def __eq__(self, other):
        if isinstance(other, tuple):    # For compatibility with __lt__
            for other_type in other:
                if self is other_type:
                    return True
            return False
        else:
            return self is other
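For illustration, here is a quick sketch (with made-up class names) of how classes built on this metaclass would compare; it only exercises < and ==, the operators defined directly above:

class Animal(metaclass=Type):
    pass

class Dog(Animal):
    pass

print(Dog < Animal)    # True  -- Dog is a proper subclass of Animal
print(Animal < Dog)    # False
print(Dog == Dog)      # True  -- a class equals itself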
Actual Question: What is the advantage of having an issubclass() built-in function instead of allowing for expressions such as bool < int (true), int < object (true), int <= int (true), or int < float (false)?

Because it would be against the Zen of Python: http://www.python.org/dev/peps/pep-0020/
Explicit is better than implicit.
If you look at the following line of code in isolation:
issubclass(a, b)
It's perfectly obvious that a and b are variables containing classes and we are checking if a is a subclass of b. And if they happen to not contain classes, you'll know.
But looking at this
a < b
Would not tell you anything. You need to examine the surrounding code to determine that they contain classes before you know that we are checking whether the class in a is a subclass of the class in b. And if, say, a=5 and b=6, it will still run "fine".
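To make that concrete, here is a small sketch (purely illustrative values): the < version runs happily on plain integers, while issubclass() refuses non-classes immediately.

a, b = 5, 6
print(a < b)              # True -- runs "fine" even though no classes are involved

a, b = bool, int
print(issubclass(a, b))   # True -- the call itself tells the reader classes are expected

try:
    issubclass(5, 6)
except TypeError as exc:
    print(exc)            # e.g. "issubclass() arg 1 must be a class" -- wrong types fail loudly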
But Python is flexible, so if you really want this, you can implement a metaclass with such behaviour, as you've shown.
Actually -- as an aside -- the prevalence of operator overloading in C++, for example, is a significant drawback of the language (at least in my eyes), because when you see a + b it might as well launch a nuclear missile for all you know... until you check the types of a and b, look up the class implementation and its + operator overload (if any; and if not, whether the parent class has one; and if not, the parent's parent, and so on).

One advantage, highlighted in the documentation for issubclass():
issubclass(class, classinfo)
Return true if class is a subclass (direct, indirect or virtual) of classinfo. A class is considered a subclass of itself. classinfo may be a tuple of class objects, in which case every entry in classinfo will be checked. In any other case, a TypeError exception is raised.
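For instance (a small illustrative sketch), the tuple form covers several classes in one call -- something a single < expression could not express as directly:

print(issubclass(bool, (int, str)))    # True  -- bool is a subclass of int
print(issubclass(float, (int, str)))   # False -- float subclasses neither
print(issubclass(int, int))            # True  -- a class counts as a subclass of itself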
Another is that it's descriptive; not everyone using Python is a mathematician.

I would say the advantage is non-functional. The only technical difference is that < is infix.
But this question isn't about technical stuff. It seems to be about semantics and ease of reading.
Using < would denote an ordering. Although a class hierarchy can be interpreted as "orderable", it will always be an approximation, and a non-obvious one for many people.
Using issubclass is clearer, still simple, and doesn't lend itself to any interpretation other than what it actually does: check whether class is a subclass of classinfo.
Plain, simple, unambiguous, effective. Those are the advantages. Maybe you don't or can't take advantage of them, but at that point it's a matter of personal taste.

Related

Why are __sub__ and __rsub__ implemented, and in this way, for numbers.Complex?

I was looking at the implementation of Complex in the numbers module and noticed that the implementations of __sub__ and __rsub__ looked like this:
def __sub__(self, other):
    """self - other"""
    return self + -other

def __rsub__(self, other):
    """other - self"""
    return -self + other
This confused me.
Firstly, I'm not really sure why these were implemented (guessing all subclasses of Complex can fall back on them?) and, secondly, I can't understand why they chose to use the unary - like this for the implementation.
Any ideas?
This is a generic implementation that subclasses can use, yes, if they so desire. This is an added convenience; the primary goal of these ABC types is to allow duck-typing of numeric types (see PEP 3141 – A Type Hierarchy for Numbers).
The implementation uses unary minus to avoid recursion; if it used self - other, then Python would call self.__sub__(other) or self.__rsub__(other) again, and the method would recurse forever.
Because subtraction can be cast as addition with a unary minus operation, the authors of the ABC were able to provide you these methods as a bonus; the alternative would be to declare them as @abstractmethod methods instead, forcing subclasses to provide a concrete implementation. Your subclass can now, optionally, implement those methods in a different way if that is more efficient, but it doesn't have to.
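As a small sketch of the same pattern outside the numbers module (the class and attribute names here are made up for illustration): only __add__ and __neg__ are written by hand, and subtraction falls out of them.

class Meters:
    """Toy value type whose subtraction is derived from addition and negation."""
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        return Meters(self.value + other.value)
    def __neg__(self):
        return Meters(-self.value)
    def __sub__(self, other):
        # Same trick as numbers.Complex: this calls __add__ and __neg__,
        # not __sub__ again, so there is no recursion.
        return self + -other

print((Meters(5) - Meters(2)).value)   # 3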
This is a pattern used in all ABCs the standard library provides. If you take a look at the documentation for the collections.abc module you'll notice a Mixin Methods column; these are all methods the respective ABC has provided as concrete implementations that may or may not rely on the abstract methods defined by that ABC or its base classes.
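You can see the same thing with, say, collections.abc.Sequence (a quick sketch with a made-up class): implement just the abstract __getitem__ and __len__, and mixin methods such as __contains__ and index() come along for free.

import collections.abc

class Deck(collections.abc.Sequence):
    """Toy sequence: only the two abstract methods are written by hand."""
    def __init__(self, cards):
        self._cards = list(cards)
    def __getitem__(self, index):
        return self._cards[index]
    def __len__(self):
        return len(self._cards)

deck = Deck(["A", "K", "Q"])
print("K" in deck)        # True -- __contains__ is a mixin method
print(deck.index("Q"))    # 2    -- index() is a mixin method too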
Also see the general PEP 3119 – Introducing Abstract Base Classes upon which PEP 3141 is built:
Some ABCs also provide concrete (i.e. non-abstract) methods; for example, the Iterator class has an __iter__ method returning itself, fulfilling an important invariant of iterators (which in Python 2 has to be implemented anew by each iterator class). These ABCs can be considered "mix-in" classes.

Implementing __eq__ via __hash__?

Reading How to implement a good __hash__ function in python -- can I not write __eq__ as
def __eq__(self, other):
    return isinstance(other, self.__class__) and hash(other) == hash(self)

def __ne__(self, other):
    return not self.__eq__(other)

def __hash__(self):
    return hash((self.firstfield, self.secondfield, totuple(self.thirdfield)))
? Of course I am going to implement __hash__(self) as well. I have rather clearly defined class members. I am going to turn them all into tuples and make a total tuple out of those and hash that.
Generally speaking, a hash function will have collisions. If you define equality in terms of a hash, you're running the risk of entirely dissimilar items comparing as equal, simply because they ended up with the same hash code. The only way to avoid this would be if your class only had a small, fixed number of possible instances, and you somehow ensured that each one of those had a distinct hash. If your class is simple enough for that to be practical, then it is almost certainly simple enough for you to just compare instance variables to determine equality directly. Your __hash__() implementation would have to examine all the instance variables anyway, in order to calculate a meaningful hash.
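To make the risk concrete, here is a deliberately broken sketch (relying on a CPython detail: hash(-1) and hash(-2) are both -2, because -1 is reserved as an error code in the C API):

class Point:
    """Deliberately broken: equality is defined via the hash, as in the question."""
    def __init__(self, x):
        self.x = x
    def __hash__(self):
        return hash(self.x)
    def __eq__(self, other):
        return isinstance(other, self.__class__) and hash(other) == hash(self)

print(hash(-1) == hash(-2))     # True in CPython: both are -2
print(Point(-1) == Point(-2))   # True -- a false positive caused purely by the collision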
Of course you can define __eq__ and __ne__ the way you have in your example, but unless you also explicitly define __hash__, you will get a TypeError: unhashable type exception any time you compare two objects of that type for equality, because defining __eq__ without __hash__ sets __hash__ to None in Python 3, and your __eq__ calls hash() on both objects.
Since you're defining some semblance of value for your object by defining an __eq__ method, a more important question is: what do you consider the value of your object to be? By looking at your code, your answer to that is, "the value of this object is its hash". Without knowing the contents of your __hash__ method, it's impossible to evaluate the quality or validity of your __eq__ method.

Python: case where x == y and x.__eq__(y) return different things. Why?

I'm taking my first computing science course, and we just learned about class implementation and inheritance. In particular, we just covered method overriding and how classes we define inherit from the object superclass by default. As one of my examples trying out this particular case of inheritance, I used the following code:
class A:
    def __init__(self, i):
        self.i = i

    def __str__(self):
        return "A"

    # Commenting out these two lines to not override __eq__(), just use the
    # default from our superclass, object
    #def __eq__(self, other):
    #    return self.i == other.i

x = A(2)
y = A(2)

>>> print(x == y)
False
>>> print(x.__eq__(y))
NotImplemented
I expected the result from (x == y), because as I understand it the default for __eq__() is to check whether they're the same object, not worrying about the contents. That gives False here: x and y have the same contents but are different objects. The second one surprised me though.
So my questions: I thought (x==y) and x.__eq__(y) were synonymous and made exactly the same call. Why do these produce differing output? And why does the second conditional return NotImplemented?
The == operator is equivalent to the operator.eq() function, which will internally call the __eq__ method of the left operand, if one is defined, to try to determine equality. This is not the only thing it will do; if that __eq__ does not exist or returns NotImplemented, as is the case here, it falls back to other checks, such as checking whether the two are the same object, or __cmp__ before Python 3.
So in a nutshell, your confusion arises from this assumption, which is incorrect:
I thought (x==y) and x.__eq__(y) were synonymous and made exactly the same call
In fact, (x==y) and operator.eq(x, y) are synonymous, and x.__eq__(y) is one of the things eq(x, y) will try to check.
The NotImplemented value you're seeing returned from your inherited __eq__ method is a special builtin value used as a sentinel in Python. It can be returned by __magic__ methods that implement mathematical or comparison operators to indicate that the class does not support the operator that was attempted (with the provided arguments).
This can be more useful than raising an exception, as it allows Python to fall back to other options to resolve the operator use. For instance, if you do x + y, Python will first try to run x.__add__(y). If that returns NotImplemented, it will next try the "reverse" version, y.__radd__(x), which may work if y is a more sophisticated type than x is.
In the case you're asking about, x == y, Python first tries x.__eq__(y), then y.__eq__(x), and finally x is y (which will always evaluate to a Boolean value). Since object.__eq__ returns NotImplemented in all cases, your class falls back to the identity comparison when you use the real operator, but shows you the NotImplemented sentinel when you call __eq__ directly.
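You can see both behaviours side by side with a class like the one in the question (a minimal sketch):

class A:
    def __init__(self, i):
        self.i = i

x, y = A(2), A(2)
print(x.__eq__(y))   # NotImplemented -- object.__eq__ can't compare the two by value
print(x == y)        # False -- the operator falls back to identity after both
                     # __eq__ calls return NotImplemented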
If you have implemented __eq__() for a class, it gets called when you use x == y. Otherwise x == y falls back to the default comparison inherited from object, which compares identity; a value-based __eq__() is not implemented automatically when you define a class.

How to eliminate a python3 deprecation warning for the equality operator?

Although the title can be interpreted as three questions, the actual problem is simple to describe. On a Linux system I have python 2.7.3 installed, and want to be warned about python 3 incompatibilities. Therefore, my code snippet (tester.py) looks like:
#!/usr/bin/python -3

class MyClass(object):
    def __eq__(self, other):
        return False
When I execute this code snippet (meant only to show the problem, not an actual piece of code I am using in my project) as
./tester.py
I get the following deprecation warning:
./tester.py:3: DeprecationWarning: Overriding __eq__ blocks inheritance of __hash__ in 3.x
class MyClass(object):
My question: How do I change this code snippet to get rid of the warning, i.e. to make it compatible to version 3? I want to implement the equality operator in the correct way, not just suppressing the warning or something similar.
From the documentation page for Python 3.4:
If a class does not define an __eq__() method it should not define a __hash__() operation either; if it defines __eq__() but not __hash__(), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).
Basically, you need to define a __hash__() method.
The background is that for user-defined classes, __eq__() and __hash__() are provided automatically, with the documented guarantee that:
x.__hash__() returns an appropriate value such that x == y implies
both that x is y and hash(x) == hash(y).
If you define just __eq__(), then __hash__ is set to None, and you will hit that wall.
The simpler way out, if you don't want to bother implementing __hash__() and you know for certain that your objects will never be hashed, is to explicitly declare __hash__ = None, which takes care of the warning.
Alex: python's -3 option is warning you about a potential problem; it doesn't know that you aren't using instances of MyClass in sets or as keys in mappings, so it warns that something that you might have been relying on wouldn't work, if you were. If you aren't using MyClass that way, just ignore the warning. It's a dumb tool to help you catch potential problems; in the end, you're expected to be the one with the actual intelligence to work out which warnings actually matter.
If you really care about suppressing the warning -- or, indeed, if a class is mutable and you want to make sure it isn't used in sets or as the key in any mapping -- the simple assignment __hash__ = None (as Sudipta pointed out) in the class body will do that for you. Since None isn't callable, this makes instances non-hashable.
class MyClass(object):
    def __eq__(self, other):
        return self is other
    __hash__ = None
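If, on the other hand, you do want instances to remain hashable (which seems to be what the question is after), the usual approach is to derive __eq__ and __hash__ from the same data; a sketch with a made-up attribute:

class MyClass(object):
    def __init__(self, name):
        self.name = name        # hypothetical attribute, purely for illustration

    def __eq__(self, other):
        if not isinstance(other, MyClass):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        # Hash the same data that __eq__ compares, so x == y implies
        # hash(x) == hash(y); defining __hash__ also silences the -3 warning.
        return hash(self.name)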

python bug with __le__, __ge__?

Is it me or Python that is confused by the following code? I would expect __le__ to be called by a <= ab, not __ge__:
#!/usr/bin/env python2

class B(object):
    def __ge__(self, other):
        print("__ge__ unexpectedly called")

class A(object):
    def __le__(self, other):
        print("__le__ called")

class AB(A, B):
    pass

a = A()
ab = AB()
a <= ab  # --> __ge__ unexpectedly called
ab <= a  # --> __le__ called
I get the same behavior with python 2.7, 3.2 and pypy 1.9.
What can I do to get __le__ called instead of __ge__ ??
The short answer is that they wanted to allow AB to override the behavior inherited from A. Python can't call AB.__le__(a, ab), because a may not be a valid self for an AB method, so instead it calls AB.__ge__(ab, a), which is valid.
The long answer is a bit more complicated.
According to the docs for rich comparison operators:
There are no swapped-argument versions of these methods (to be used when the left argument does not support the operation but the right argument does); rather, __lt__() and __gt__() are each other’s reflection, __le__() and __ge__() are each other’s reflection, and __eq__() and __ne__() are their own reflection.
In other words, x <= y will call y.__ge__(x) in exactly the same cases where x+y would call y.__radd__(x). To compare:
>>> class X(object):
...     def __add__(self, other):
...         print('X.add')
...
>>> class Y(object):
...     def __radd__(self, other):
...         print('Y.radd')
...
>>> class XY(X, Y):
...     pass
...
>>> x, xy = X(), XY()
>>> x + xy
Y.radd
According to the docs for reflected operators:
These methods are called to implement the binary arithmetic operations… with reflected (swapped) operands. These functions are only called if the left operand does not support the corresponding operation and the operands are of different types…
Note: If the right operand’s type is a subclass of the left operand’s type and that subclass provides the reflected method for the operation, this method will be called before the left operand’s non-reflected method. This behavior allows subclasses to override their ancestors’ operations.
So, because XY is a subclass of X, XY.__radd__ gets preference over X.__add__. And, likewise, because AB is a subclass of A, AB.__ge__ gets preference over A.__le__.
This probably should be documented better. To figure it out, you have to ignore the parenthetical "to be used when the left argument does not support the operation but the right argument does", guess that you need to look up the normal swapped operators (there's no link, or even mention, here), then ignore the wording that says "These functions are only called if the left operand does not support the corresponding operation", and see the "Note", which contradicts what came above… Also notice that the docs explicitly say, "There are no implied relationships among the comparison operators", only a paragraph before describing the swapped cases, which imply exactly such relationships…
Finally, this case seems odd, because AB, rather than overriding __ge__ itself, just inherited it from B, which knows nothing about A and is unrelated to it. Presumably B didn't intend to have its subclasses override A's behavior. But if B were meant to be used as a mixin for A-derived classes, maybe it would intend exactly such an override. And at any rate, the rule is probably already complicated enough without getting into where each method came from in the MRO. Whatever the reasoning, where the __ge__ comes from is irrelevant; if it's there on the subclass, it gets called.
As for your final, added question, "What can I do to get __le__ called instead of __ge__??"… well, you really can't, any more than you can get X.__add__ called instead of XY.__radd__. Of course you can always implement an AB.__ge__ (or XY.__radd__) that calls A.__le__ (or X.__add__), but it's presumably easier to just implement AB.__ge__ in such a way that it works with an A as its other argument in the first place. Alternatively, you could remove the inheritance and find some other way to model whatever you were modeling. Or you could explicitly call a.__le__(ab) instead of a <= ab. But otherwise, if you designed your classes in a way that takes advantage of the "no implied relationships" wording to do something weird, you were misled by the docs, and will have to redesign them somehow.
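For example, the first workaround mentioned above, as a sketch that repeats the question's classes: give AB its own __ge__ that simply defers back to the left operand's __le__.

class B(object):
    def __ge__(self, other):
        print("__ge__ unexpectedly called")

class A(object):
    def __le__(self, other):
        print("__le__ called")

class AB(A, B):
    def __ge__(self, other):
        # Defer to the other operand's __le__ so that `a <= ab`
        # behaves as if the reflected method were not there at all.
        return other.__le__(self)

a = A()
ab = AB()
a <= ab   # --> __le__ called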
