Why is NotImplemented truthy in Python 3?

This question is spurred from the answers and discussions of this question. The following snippet shows the crux of the question:
>>> bool(NotImplemented)
True
The questions I have are the following:
Why was it decided that the bool value of NotImplemented should be True? It feels unpythonic.
Is there a good reason I am unaware of? The documentation seems to just say, "because it is".
Are there any examples where this is used in a reasonable manner?
Reasoning behind why I believe it's unintuitive (please disregard the lack of best practice):
>>> class A:
...     def something(self):
...         return NotImplemented
...
>>> a = A()
>>> a.something()
NotImplemented
>>> if a.something():
...     print("this is unintuitive")
...
this is unintuitive
It seems an odd behavior that something with such a negative connotation (lack of implementation) would be considered truthy.
Relevant text from the Python docs on NotImplemented:
Special value which should be returned by the binary special methods (e.g. __eq__(), __lt__(), __add__(), __rsub__(), etc.) to indicate that the operation is not implemented with respect to the other type; may be returned by the in-place binary special methods (e.g. __imul__(), __iand__(), etc.) for the same purpose. Its truth value is true.
Edit 1
To clarify my position, I feel that NotImplemented being able to evaluate to a boolean is an anti-pattern by itself. I feel like an Exception makes more sense, but the prevailing idea is that the constant singleton was chosen for performance reasons when evaluating comparisons between different objects. I suppose I'm looking for convincing reasons as to why this is "the way" that was chosen.

By default, an object is considered truthy (bool(obj) == True) unless its class provides a way to override its truthiness. In the case of NotImplemented, no one has ever provided a compelling use-case for bool(NotImplemented) to return False, and so <class 'NotImplementedType'> has never provided an override.

As the accepted answer already explains, all objects in Python are considered truthy (bool(obj) returns True) unless their class specifically changes that via truth value testing. Overriding that makes sense in some cases, like an empty list, 0, or False (the documentation on truth value testing has a good list).
However, there is no compelling case for NotImplemented to be falsy. It's a special value used by the interpreter; it should only be returned by the binary special methods, and it shouldn't reach regular Python code.
Special value which should be returned by the binary special methods (e.g. __eq__(), __lt__(), __add__(), __rsub__(), etc.) to indicate that the operation is not implemented with respect to the other type.
Incorrectly returning NotImplemented will result in a misleading error message or the NotImplemented value being returned to Python code.
It's used by the interpreter to choose between methods, or to otherwise influence behaviour, as is the case with the comparison operator == (if __eq__ returns NotImplemented, the interpreter tries the reflected operation on the other operand) or bool() itself (which checks __bool__ first and falls back to __len__ when __bool__ is not defined).
Note that a NotImplementedError exception exists, presumably for when it actually is an error that an operation isn't implemented. In your specific example, something of class A should probably raise this exception instead.
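To make that dispatch concrete, here is a minimal sketch of how returning NotImplemented from __eq__ lets the interpreter fall back to the other operand's method; the Meters/Feet classes are purely illustrative, not from the question:
class Meters:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        if isinstance(other, Meters):
            return self.value == other.value
        return NotImplemented  # "I don't know how": let Python try other.__eq__

class Feet:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        if isinstance(other, Meters):
            return abs(self.value * 0.3048 - other.value) < 1e-9
        return NotImplemented

# Meters.__eq__ returns NotImplemented for a Feet operand, so the
# interpreter silently retries with Feet.__eq__ and gets True:
print(Meters(0.3048) == Feet(1))  # True
Had Meters.__eq__ raised NotImplementedError instead, the fallback would never happen; that exception is for operations that are genuinely errors, which is why it fits the a.something() example better.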

Related

What is the canonical way of checking if two objects are of one specific class

Is there a pythonic way to check if two objects are of one specific type? I suppose this is more of academic interest than actual code necessity.
Obviously using two isinstance() calls would work, and the chained == comparison does the trick as well, but I was wondering if there is a one-liner that would work and be pythonic as well.
if type(obj1) == type(obj2) == MyClass:
    # DoSomething
since this is equivalent to
if (type(obj1) == type(obj2)) and (type(obj2) == MyClass):
    # DoSomething
the left check will trigger a PEP 8 warning from linters telling you to use isinstance()
You could do this if you wanted, which may be useful as the number of objects grows (disclaimer: not worth it for 2 objects, probably worth it for 4+ objects):
>>> a = [1,2]
>>> b = [1,2]
>>> all(map(lambda x: isinstance(x, list), [a,b]))
True
Docs: all, map, lambda
That depends on whether you want to consider two objects to be of the same class if one is an instance of a subclass of the class you're looking for.
So, for instance, if you have one = classA() and two = classB(), where classB is a subclass of classA, your first approach,
type(one) == type(two) == classA
will evaluate to False. But using isinstance like so:
isinstance(one, classA) and isinstance(two, classA)
will evaluate to True.
What that PEP 8 warning is about is a gotcha so common that the Python folks decided the programmer needed to be warned about it: it is almost always more useful to compare classes via isinstance(), to the point where, if a linter sees type(one) == type(two), it assumes you're not intending the unusual exact-type check; you're intending the more commonly useful isinstance() comparison, but doing it wrong.
Long story short, you're most likely going to want to use isinstance(), but without knowing your situation I can't say for absolutely certain that you don't want type().
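Here is a runnable version of the classA/classB scenario sketched above (the class names are illustrative):
class ClassA:
    pass

class ClassB(ClassA):
    pass

one, two = ClassA(), ClassB()

# Exact-type check: the subclass instance fails it.
print(type(one) == type(two) == ClassA)                      # False

# isinstance accepts subclasses, so both pass.
print(isinstance(one, ClassA) and isinstance(two, ClassA))   # True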

When comparing types/classes, is it "safe" to use the 'is' operator instead of the '==' operator? [duplicate]

This question already has answers here: Are class objects singletons? (2 answers). Closed 5 years ago.
I don't think this is a duplicate, as I'm asking about is vs == in the specific case of comparing types, but please let me know and I will remove the question.
I know that in Python, a is b is equivalent to id(a) == id(b), so type(a) is <type> compares id(type(a)) == id(<type>); and so far, type(a) is <type> seems to give predictable results. My question is: would using the is operator ever yield an unexpected result (i.e. something like type('foo') is str returning False)? Or does Python store classes as single objects, so that is will always give the same result as ==? I find is somewhat more readable in this context.
Note this case is if I'm not dealing with inherited classes/subclasses (in which case isinstance would be suitable).
No, using is always does the same thing: it compares object identity (in CPython, the addresses of the objects you've supplied). is cannot be redefined, so you always get the same behavior.
If two objects a and b have the same object as their type, type(a) is type(b) will always return True. Conversely, if type(a) is type(b) returns True, the two objects are guaranteed to have the exact same type.
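For instance, a quick REPL check of the identity comparison described above:
>>> type('foo') is str          # one str class object, so identity holds
True
>>> id(type('foo')) == id(str)  # what the is check boils down to
True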
== on the other hand can lead to unexpected results if someone comes around and defines an __eq__ that does silly things:
class MetaFoo(type):
    def __eq__(self, other):
        return False

class Foo(metaclass=MetaFoo):
    pass

f1, f2 = Foo(), Foo()
Now:
>>> type(f1) == type(f2)
False
but:
>>> type(f1) is type(f2)
True
People won't do something like that, though; it's plain silly. So type(f1) == type(f2) doesn't guarantee anything (for non-built-ins, that is).
In general, if you care that they are the exact same object (as noted in a comment) use is, if you care that they have been designed to behave the same way (for which you'd expect an equivalent __eq__ to be implemented) use ==.

Can some operators in Python not be overloaded properly?

I am studying Scott Meyers' More Effective C++. Item 7 advises to never overload && and ||, because their short-circuit behavior cannot be replicated when the operators are turned into function calls (or is this no longer the case?).
As operators can also be overloaded in Python, I am curious whether this situation exists there as well. Is there any operator in Python (2.x, 3.x) that, when overridden, cannot be given its original meaning?
Here is an example of 'original meaning':
class MyInt {
public:
    MyInt operator+(const MyInt &m) {
        return MyInt(this->val + m.val);
    }
    int val;
    MyInt(int v) : val(v) {}
};
Exactly the same rationale applies to Python. You shouldn't (and can't) overload and and or, because their short-circuiting behavior cannot be expressed in terms of functions. not isn't permitted either - I guess this is because there's no guarantee that it will be invoked at all.
As pointed out in the comments, the proposal to allow the overloading of logical and and or (PEP 335) was officially rejected.
The assignment operator can also not be overloaded.
class Thing: ...
thing = Thing()
thing = 'something else'
There is nothing you can override in Thing to change the behavior of the = operator.
(You can overload property assignment though.)
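As a sketch of that last point: what you can hook is attribute assignment on an instance, via a property setter (or __setattr__); the value attribute here is just an illustrative name:
class Thing:
    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        # runs on every `thing.value = ...` assignment
        print('assigning', new)
        self._value = new

thing = Thing()
thing.value = 42   # prints: assigning 42
# Rebinding the *name* itself, `thing = 'something else'`, remains unhookable.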
In Python, all object methods that represent operators are treated "equal": their precedences are described in the language model, and there is no conflict with overriding any.
But C++'s "&&" and "||" (Python's and and or) are not available in Python as object methods to start with. Instead, they test the operands' truthiness, which is defined by __bool__. If __bool__ is not implemented, Python checks for a __len__ method and considers the object false if its result is zero; in all other cases the object's truth value is True. That does away with the semantic problems that would arise from combining overriding with the short-circuiting behavior.
Note one can override & and | by implementing __and__ and __or__ with no problems.
As for the other operators: although not directly related, one should just take care with __getattribute__, the method called when retrieving any attribute from an object (we normally don't mention it as an operator), including calls from within itself. __getattr__ also exists, and is invoked only at the end of the attribute-search chain, when an attribute is not found.
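A small sketch of the two points above, with an illustrative Box class: truthiness falls back to __len__ when __bool__ is absent, and & (unlike and) can be overloaded via __and__:
class Box:
    def __init__(self, items):
        self.items = items

    def __len__(self):          # no __bool__, so truthiness falls back to __len__
        return len(self.items)

    def __and__(self, other):   # overloads `&`, not `and`
        return Box([x for x in self.items if x in other.items])

a, b = Box([1, 2, 3]), Box([2, 3, 4])
print(bool(Box([])))    # False: __len__ returned 0
print((a & b).items)    # [2, 3], via __and__
print((a and b) is b)   # True: `and` only tests truthiness and returns an operand as-is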

How to write __getitem__ cleanly?

In Python, when implementing a sequence type, I often (relatively speaking) find myself writing code like this:
class FooSequence(collections.abc.Sequence):
    # Snip other methods

    def __getitem__(self, key):
        if isinstance(key, int):
            ...  # Get a single item
        elif isinstance(key, slice):
            ...  # Get a whole slice
        else:
            raise TypeError('Index must be int, not {}'.format(type(key).__name__))
The code checks the type of its argument explicitly with isinstance(). This is regarded as an antipattern within the Python community. How do I avoid it?
I cannot use functools.singledispatch, because that's quite deliberately incompatible with methods (it will attempt to dispatch on self, which is entirely useless since we're already dispatching on self via OOP polymorphism). It works with @staticmethod, but what if I need to get stuff out of self?
Casting to int() and then catching the TypeError, checking for a slice, and possibly re-raising is still ugly, though perhaps slightly less so.
It might be cleaner to convert integers into one-element slices and handle both situations with the same code, but that has its own problems (return 0 or [0]?).
As much as it seems odd, I suspect that the way you have it is the best way to go about things. Patterns generally exist to encompass common use cases, but that doesn't mean that they should be taken as gospel when following them makes life more difficult. The main reason that PEP 443 gives for balking at explicit typechecking is that it is "brittle and closed to extension". However, that mainly applies to custom functions that take a number of different types at any time. From the Python docs on __getitem__:
For sequence types, the accepted keys should be integers and slice objects. Note that the special interpretation of negative indexes (if the class wishes to emulate a sequence type) is up to the __getitem__() method. If key is of an inappropriate type, TypeError may be raised; if of a value outside the set of indexes for the sequence (after any special interpretation of negative values), IndexError should be raised. For mapping types, if key is missing (not in the container), KeyError should be raised.
The Python documentation explicitly states the two types that should be accepted, and what to do if an item that is not of those two types is provided. Given that the types are provided by the documentation itself, it's unlikely to change (doing so would break far more implementations than just yours), so it's likely not worth the trouble to go out of your way to code against Python itself potentially changing.
If you're set on avoiding explicit typechecking, I would point you toward this SO answer. It contains a concise implementation of a #methdispatch decorator (not my name, but i'll roll with it) that lets #singledispatch work with methods by forcing it to check args[1] (arg) rather than args[0] (self). Using that should allow you to use custom single dispatch with your __getitem__ method.
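(As a side note: since Python 3.8 the standard library covers this itself with functools.singledispatchmethod, which dispatches on the first argument after self. A minimal sketch, where the _data backing list is an assumption for illustration:
from functools import singledispatchmethod

class FooSequence:
    def __init__(self, data):
        self._data = list(data)   # hypothetical backing store

    @singledispatchmethod
    def __getitem__(self, key):
        raise TypeError('Index must be int or slice, not {}'.format(type(key).__name__))

    @__getitem__.register
    def _(self, key: int):
        return self._data[key]    # get a single item

    @__getitem__.register
    def _(self, key: slice):
        return self._data[key]    # get a whole slice

seq = FooSequence('abcd')
print(seq[1], seq[1:3])   # b ['b', 'c']
)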
Whether or not you consider either of these "pythonic" is up to you, but remember that while The Zen of Python notes that "Special cases aren't special enough to break the rules", it then immediately notes that "practicality beats purity". In this case, just checking for the two types that the documentation explicitly states are the only things __getitem__ should support seems like the practical way to me.
The antipattern is for code to do explicit type checking, which means using the type() function. Why? Because then a subclass of the target type will no longer work. For instance, __getitem__ can use an int, but using type() to check for one means an int subclass, which would work fine, fails only because type() does not return int.
When a type-check is necessary, isinstance is the appropriate way to do it as it does not exclude subclasses.
When writing __dunder__ methods, type checking is necessary and expected -- using isinstance().
In other words, your code is perfectly Pythonic, and its only problem is the error message (it doesn't mention slices).
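For instance, a message that covers both accepted types could read (same format string, just mentioning slices):
raise TypeError('Index must be int or slice, not {}'.format(type(key).__name__))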
I'm not aware of a way to avoid doing it once. That's just the tradeoff of using a dynamically-typed language in this way. However, that doesn't mean you have to do it over and over again. I would solve it once by creating an abstract class with split-out method names, then inherit from that class instead of directly from Sequence, like:
class UnannoyingSequence(collections.abc.Sequence):
    def __getitem__(self, key):
        if isinstance(key, int):
            return self.getitem(key)
        elif isinstance(key, slice):
            return self.getslice(key)
        else:
            raise TypeError('Index must be int, not {}'.format(type(key).__name__))

    # default implementation in terms of getitem
    def getslice(self, key):
        ...  # Get a whole slice

class FooSequence(UnannoyingSequence):
    def getitem(self, key):
        ...  # Get a single item

    # optional efficient, type-specific implementation not in terms of getitem
    def getslice(self, key):
        ...  # Get a whole slice
This cleans up FooSequence enough that I might even do it this way if I only had the one derived class. I'm sort of surprised the standard library doesn't already work that way.
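To make the pattern concrete, here's one way the two classes might be fleshed out; the list-backed _data attribute and the slice-to-indices default are assumptions, not part of the original answer:
import collections.abc

class UnannoyingSequence(collections.abc.Sequence):
    def __getitem__(self, key):
        if isinstance(key, int):
            return self.getitem(key)
        elif isinstance(key, slice):
            return self.getslice(key)
        else:
            raise TypeError('Index must be int, not {}'.format(type(key).__name__))

    # default slice support built on top of single-item access
    def getslice(self, key):
        return [self.getitem(i) for i in range(*key.indices(len(self)))]

class FooSequence(UnannoyingSequence):
    def __init__(self, data):
        self._data = list(data)

    def __len__(self):
        return len(self._data)

    def getitem(self, key):
        return self._data[key]

seq = FooSequence('abc')
print(seq[1])    # b
print(seq[0:2])  # ['a', 'b']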
To stay pythonic, you have to work with the semantics rather than the type of the objects. So if you have some parameter as an accessor to a sequence, just use it like that. Use the abstraction for a parameter as long as possible. If you expect a set of user identifiers, do not expect a set, but rather some data structure with an add method. If you expect some text, do not expect a unicode object, but rather some container for characters featuring encode and decode methods.
I assume in general you want to do something like "use the behavior of the base implementation unless some special value is provided". If you want to implement __getitem__, you can use a case distinction where something different happens if one special value is provided. I'd use the following pattern:
class FooSequence(collections.abc.Sequence):
    # Snip other methods

    def __getitem__(self, key):
        try:
            if key == SPECIAL_VALUE:
                return SOMETHING_SPECIAL
            else:
                return self.our_baseclass_instance[key]
        except AttributeError:
            raise TypeError('Wrong type: {}'.format(type(key).__name__))
If you want to distinguish between a single value (in Perl terminology a "scalar") and a sequence (in Java terminology a "collection"), then it is pythonically fine to determine whether an iterator is implemented. You can either use a try/except pattern or hasattr, as I do now:
>>> a = 42
>>> b = [1, 3, 5, 7]
>>> c = slice(1, 42)
>>> hasattr(a, "__iter__")
False
>>> hasattr(b, "__iter__")
True
>>> hasattr(c, "__iter__")
False
Applied to our example:
class FooSequence(collections.abc.Sequence):
    # Snip other methods

    def __getitem__(self, key):
        try:
            if hasattr(key, "__iter__"):
                return map(lambda x: WHATEVER(x), key)
            else:
                return self.our_baseclass_instance[key]
        except AttributeError:
            raise TypeError('Wrong type: {}'.format(type(key).__name__))
Dynamic programming languages like Python and Ruby use duck typing. And a duck is an animal that walks like a duck, swims like a duck and quacks like a duck - not because somebody calls it a "duck".

boolean and type checking in python vs numpy

I ran into unexpected results in a python if clause today:
import numpy
if numpy.allclose(6.0, 6.1, rtol=0, atol=0.5):
    print 'close enough'  # works as expected (prints message)
if numpy.allclose(6.0, 6.1, rtol=0, atol=0.5) is True:
    print 'close enough'  # does NOT work as expected (prints nothing)
After some poking around (i.e., this question, and in particular this answer), I understand the cause: the type returned by numpy.allclose() is numpy.bool_ rather than plain old bool, and apparently if foo = numpy.bool_(1), then if foo will evaluate to True while if foo is True will evaluate to False. This appears to be the work of the is operator.
My questions are: why does numpy have its own boolean type, and what is best practice in light of this situation? I can get away with writing if foo: to get expected behavior in the example above, but I like the more stringent if foo is True: because it excludes things like 2 and [2] from returning True, and sometimes the explicit type check is desirable.
You're doing something which is considered an anti-pattern. Quoting PEP 8:
Don't compare boolean values to True or False using ==.
Yes: if greeting:
No: if greeting == True:
Worse: if greeting is True:
The fact that numpy wasn't designed to facilitate your non-pythonic code isn't a bug in numpy. In fact, it's a perfect example of why your personal idiom is an anti-pattern.
As PEP 8 says, using is True is even worse than == True. Why? Because you're checking object identity: not only must the result be truthy in a boolean context (which is usually all you need), and equal to the boolean True value, it has to actually be the constant True. It's hard to imagine any situation in which this is what you want.
And you specifically don't want it here:
>>> np.True_ == True
True
>>> np.True_ is True
False
So, all you're doing is explicitly making your code incompatible with numpy, and various other C extension libraries (conceivably a pure-Python library could return a custom value that's equal to True, but I don't know of any that do so).
In your particular case, there is no reason to exclude 2 and [2]. If you read the docs for numpy.allclose, it clearly isn't going to return them. But consider some other function, like many of those in the standard library that just say they evaluate to true or to false. That means they're explicitly allowed to return one of their truthy arguments, and often will do so. Why would you want to consider that false?
Finally, why would numpy, or any other C extension library, define such bool-compatible-but-not-bool types?
In general, it's because they're wrapping a C int or a C++ bool or some other such type. In numpy's case, it's wrapping a value that may be stored in a fastest-machine-word type or a single byte (maybe even a single bit in some cases) as appropriate for performance, and your code doesn't have to care which, because all representations look the same, including being truthy and equal to the True constant.
why does numpy have its own boolean type
Space and speed. Numpy stores things in compact arrays; if it can fit a boolean into a single byte it'll try. You can't easily do this with Python objects, as you have to store references which slows calculations down significantly.
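A quick check of the storage claim (the exact sizes below are from a typical 64-bit CPython/NumPy build and can vary):
import numpy as np
import sys

a = np.zeros(1000, dtype=bool)
print(a.nbytes)              # 1000: one byte per element in the array
print(type(a[0]))            # <class 'numpy.bool_'>, not the builtin bool
print(sys.getsizeof(True))   # ~28 bytes: a full Python object is far bigger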
I can get away with writing if foo: to get expected behavior in the example above, but I like the more stringent if foo is True: because it excludes things like 2 and [2] from returning True, and sometimes the explicit type check is desirable.
Well, don't do that.
