Inconsistent implementation of collections.abc - python

I'm trying to understand the collections.abc source code.
Let's take a look at the Hashable class's __subclasshook__ implementation:
@classmethod
def __subclasshook__(cls, C):
    if cls is Hashable:
        for B in C.__mro__:
            if "__hash__" in B.__dict__:
                if B.__dict__["__hash__"]:
                    return True
                break
    return NotImplemented
Here we first check that a __hash__ attribute is present in the class dictionary and then check that it has a non-false value. The same logic also appears in the Awaitable class.
And here is the AsyncIterable class's __subclasshook__:
@classmethod
def __subclasshook__(cls, C):
    if cls is AsyncIterable:
        if any("__aiter__" in B.__dict__ for B in C.__mro__):
            return True
    return NotImplemented
Here we just check that an __aiter__ attribute is present, and this same logic appears in all the other classes in this package.
Is there any reason for this difference in logic?

The __hash__ protocol explicitly allows flagging a class as unhashable by setting __hash__ = None.
If a class [...] wishes to suppress hash support, it should include __hash__ = None in the class definition.
The reason is that a == b always requires hash(a) == hash(b). Otherwise, dict, set and similar data structures break. If a child class changes __eq__ explicitly or otherwise, this may no longer hold true. Thus, __hash__ can be flagged as not applicable.
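Both halves of that check can be seen in action with a small Python 3 sketch: defining __eq__ without __hash__ implicitly sets __hash__ to None, and the subclass hook then reports the class as unhashable:

```python
from collections.abc import Hashable

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    # In Python 3, defining __eq__ without __hash__ implicitly sets
    # __hash__ = None, flagging the class as unhashable.
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

print(Point.__hash__)                     # None
print(issubclass(Point, Hashable))        # False: the hook sees the None
print(isinstance(Point(1, 2), Hashable))  # False
```

This is exactly why Hashable's hook needs the extra truthiness check: `"__hash__" in B.__dict__` alone would be True here, but the value is None.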

How to check a class/type is iterable (uninstantiated)

I am inspecting type hints such as list[int] which is a GenericAlias.
If I get the origin using typing.get_origin(list[int]) or list[int].__origin__ it returns the class type list, as expected: <class 'list'>
How can I check if the class is iterable without instantiating it, or is that the only way?
The usual iter() and isinstance(object, collections.abc.Iterable) obviously don't work as they expect the instantiated object, not the class.
I saw this answer, but it doesn't seem to work correctly in Python 3.10 (even when the i_type variable is substituted for t).
This depends a bit on what you define as iterable.
The Collections Abstract Base Classes module considers a class to implement the Iterable protocol once it defines the __iter__ method. Note that you do not need to define the __next__ method. That is only needed if you want to implement an Iterator. (Those two often get confused.)
A slightly broader definition, in accordance with the general notion of an iterable in the documentation, also includes classes that implement __getitem__ (with integer indexes starting at 0, as Sequences do).
In practice this means that you have an iterable class, if and only if you can call the built-in iter() function with an instance of that class. That function merely calls the instance's __iter__ method, if it finds one.
If that is what you consider to be iterable as well, the most reliable way to check that I can think of is the following. We first find out if one of the classes in the method resolution order implements the desired instance method:
(Thanks @user2357112 for reminding me to check inherited methods.)
def _implements_instance_method(cls: type, name: str) -> type | None:
    """
    Checks whether a class implements a certain instance method.

    Args:
        cls: Class to check; superclasses (except `object`) are also checked
        name: The name of the instance method that `cls` should have

    Returns:
        The earliest class in the MRO of `cls` implementing the instance
        method with the provided `name` or `None` if none of them do.
    """
    for base in cls.__mro__[:-1]:  # do not check `object`
        if name in base.__dict__ and callable(base.__dict__[name]):
            return base
    return None
That first check is self-explanatory; if it fails, we obviously don't have that method. But this is where it gets a little pedantic.
The second check actually does more than one thing. First off, it ensures that the name on our cls is defined as a method, i.e. callable. But it also guards against descriptor shenanigans (to an extent). This is why we check callable(cls.__dict__[name]) and not simply callable(getattr(cls, name)).
If someone were to (for whatever reason) have a @classmethod or a @property called name, that would not fly here.
Next we write our actual iterable checking function:
def is_iterable_class(cls: type, strict: bool = True) -> bool:
    """
    Returns `True` only if `cls` implements the iterable protocol.

    Args:
        cls:
            The class to check for being iterable
        strict (optional):
            If `True` (default), only classes that implement (or inherit)
            the `__iter__` instance method are considered iterable;
            if `False`, classes supporting `__getitem__` subscripting
            will be considered iterable.
            -> https://docs.python.org/3/glossary.html#term-iterable

    Returns:
        `True` if `cls` is to be considered iterable; `False` otherwise.
    """
    if not isinstance(cls, type):
        return False
    if _implements_instance_method(cls, "__iter__") is None:
        if strict:
            return False
        return _implements_instance_method(cls, "__getitem__") is not None
    return True
There are still a number of pitfalls here though.
A little demo:
from collections.abc import Iterable, Iterator
from typing import Generic, TypeVar

T = TypeVar("T")

class MyIter(Iterable[T]):
    def __init__(self, *items: T) -> None:
        self._items = items

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)

class SubIter(MyIter[T]):
    pass

class IdxIter(Generic[T]):
    def __init__(self, *items: T) -> None:
        self._items = items

    def __getitem__(self, idx: int) -> T:
        return self._items[idx]

class Foo:
    __iter__ = "bar"

class Bar:
    @classmethod
    def __iter__(cls) -> Iterator[int]:
        return iter(range(5))

class Baz:
    def __iter__(self) -> int:
        return 1

def _implements_instance_method(cls: type, name: str) -> type | None:
    """
    Checks whether a class implements a certain instance method.

    Args:
        cls: Class to check; base classes (except `object`) are also checked
        name: The name of the instance method that `cls` should have

    Returns:
        The earliest class in the MRO of `cls` implementing the instance
        method with the provided `name` or `None` if none of them do.
    """
    for base in cls.__mro__[:-1]:  # do not check `object`
        if name in base.__dict__ and callable(base.__dict__[name]):
            return base
    return None

def is_iterable_class(cls: type, strict: bool = True) -> bool:
    """
    Returns `True` only if `cls` implements the iterable protocol.

    Args:
        cls:
            The class to check for being iterable
        strict (optional):
            If `True` (default), only classes that implement (or inherit)
            the `__iter__` instance method are considered iterable;
            if `False`, classes supporting `__getitem__` subscripting
            will be considered iterable.
            -> https://docs.python.org/3/glossary.html#term-iterable

    Returns:
        `True` if `cls` is to be considered iterable; `False` otherwise.
    """
    if not isinstance(cls, type):
        return False
    if _implements_instance_method(cls, "__iter__") is None:
        if strict:
            return False
        return _implements_instance_method(cls, "__getitem__") is not None
    return True

if __name__ == '__main__':
    import numpy as np

    print(f"{is_iterable_class(MyIter)=}")
    print(f"{is_iterable_class(SubIter)=}")
    print(f"{is_iterable_class(IdxIter)=}")
    print(f"{is_iterable_class(IdxIter, strict=False)=}")
    print(f"{is_iterable_class(Foo)=}")
    print(f"{is_iterable_class(Bar)=}")
    print(f"{is_iterable_class(Baz)=}")
    print(f"{is_iterable_class(np.ndarray)=}")
    try:
        iter(np.array(1))
    except TypeError as e:
        print(repr(e))
The output:
is_iterable_class(MyIter)=True
is_iterable_class(SubIter)=True
is_iterable_class(IdxIter)=False
is_iterable_class(IdxIter, strict=False)=True
is_iterable_class(Foo)=False
is_iterable_class(Bar)=False
is_iterable_class(Baz)=True
is_iterable_class(np.ndarray)=True
TypeError('iteration over a 0-d array')
You should immediately notice that my function returns True for Baz even though it clearly messes up and delivers an integer instead of an Iterator. This is to demonstrate that the contract of the Iterable protocol ends at the definition of __iter__ and does not cover what it returns. Even though one would reasonably assume that it must return an Iterator, it is technically still an Iterable even if it doesn't.
Another great practical example of this was pointed out by @user2357112: numpy.ndarray is certainly iterable, by contract and in practice in most situations. However, when it is a 0-d array (i.e. a scalar), the __iter__ method raises a TypeError because iterating over a scalar makes little sense.
The non-strict version of the function is even less practical since a class could easily and sensibly implement __getitem__, but not in the way expected by iter().
I see no way around those issues and even the Python documentation tells you that
the only reliable way to determine whether an object is iterable is to call iter(obj).
If it is actually the Iterator that you are interested in, you could of course expand the function to do the same checks done for the __iter__ method also for the __next__ method. But keep in mind that this will immediately exclude all the built-in collection types like list, dict etc. because they actually don't implement __next__. Again, referring to collections.abc, you can see that all Collection subtypes only inherit from Iterable, not from Iterator.
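The Iterable/Iterator distinction mentioned above can be illustrated quickly with the built-in list:

```python
from collections.abc import Iterable, Iterator

# Built-in collections implement __iter__ but not __next__:
print(issubclass(list, Iterable))  # True
print(issubclass(list, Iterator))  # False

# The object returned by iter() is the Iterator:
it = iter([1, 2, 3])
print(isinstance(it, Iterator))    # True
print(next(it))                    # 1
```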
Hope this helps.

How to change object's type in python

From the Python docs:
Like its identity, an object’s type is also unchangeable. [1]
From the footnote:
It is possible in some cases to change an object’s type, under certain controlled conditions. It generally isn’t a good idea though, since it can lead to some very strange behaviour if it is handled incorrectly.
What are the cases when we can change an object's type, and how do we change it?
class A:
    pass

class B:
    pass

a = A()
isinstance(a, A)  # True
isinstance(a, B)  # False
a.__class__       # __main__.A

# changing the class
a.__class__ = B
isinstance(a, A)  # False
isinstance(a, B)  # True
a.__class__       # __main__.B
However, I can't recall real-world examples where it can be helpful. Usually, class manipulations are done at the moment of creating the class (not an object of the class) by decorators or metaclasses. For example, dataclasses.dataclass is a decorator that takes a class and constructs another class on the base of it (see the source code).
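One contrived but illustrative sketch of where __class__ reassignment occasionally shows up is runtime state switching, i.e. swapping an instance's behavior without rebuilding it (the class names here are made up for the example):

```python
class Door:
    def use(self):
        return "opening"

class LockedDoor(Door):
    # Same attribute layout as Door, so __class__ reassignment is allowed
    def use(self):
        return "locked"

d = Door()
assert d.use() == "opening"

# Swap the instance's behavior without constructing a new object:
d.__class__ = LockedDoor
assert d.use() == "locked"
assert isinstance(d, LockedDoor)
```

In most codebases, composition or an explicit state attribute is the clearer choice; this trick trades readability for avoiding a copy.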

How to force a python object to be of some specific other type for isinstance?

Assume
class A(object):
    def __init__(self):
        pass
and
o = object()
I want to force o to be of type A such that
isinstance(o, A) == True
is truthy.
Can this be done?
Note: I am interested in both 2.7 and 3+ solutions.
You need to change the class of your object. This is not usually recommended, as the documentation points out:
It is possible in some cases to change an object’s type, under certain controlled conditions. It generally isn’t a good idea though, since it can lead to some very strange behaviour if it is handled incorrectly
If you had to "upcast" objects (upcasting doesn't really exist in Python) using two classes that you created yourself,

class A:
    pass

class B(A):
    pass
it would be easy to do what you're asking for:
a = A()
isinstance(a, B) # False
a.__class__ = B
isinstance(a, B) # True
However, this can't work with objects of type object. The documentation clearly says that
object does not have a __dict__, so you can’t assign arbitrary attributes to an instance of the object class.
where __dict__ is a dictionary or other mapping object used to store an object’s writable attributes.
Indeed:
o = object()
o.__class__ = A
# TypeError: __class__ assignment only supported for heap types or ModuleType subclasses
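To illustrate the boundary, here is a small sketch: the assignment works when both classes are ordinary user-defined (heap) types with compatible layouts, but fails for a bare object() instance:

```python
class Base:
    pass

class A(Base):
    pass

b = Base()       # instance of a user-defined (heap) type
b.__class__ = A  # works: both classes are heap types with compatible layouts
assert isinstance(b, A)

o = object()
try:
    o.__class__ = A  # fails: `object` itself is not a heap type
except TypeError as e:
    print(e)
```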

How does automatic inheritance from collections.Callable (and other classes) work?

This is more of a Python 2 question, but I'm curious about whether there are any differences in Python 3 as well.
I noticed that when creating certain methods on class (whether they are new-style or not), Python automatically decides that those classes are instances of some classes from the collections module. My example below demonstrates this with collections.Callable.
>>> import collections
>>> class A:
...     def __call__(self):
...         print "A"
...
>>> a = A()
>>> isinstance(a, collections.Callable)
True
>>> class A(object):
...     def __call__(self):
...         print "A"
...
>>> a = A()
>>> isinstance(a, collections.Callable)
True
>>> class A(object):
...     pass
...
>>> a = A()
>>> isinstance(a, collections.Callable)
False
>>> class A:
...     pass
...
>>> a = A()
>>> isinstance(a, collections.Callable)
False
You'll notice that I haven't explicitly made any of those classes inherit from collections.Callable, but for some reason they all do as long as they define a __call__ method. Is this done for a specific purpose, and is it well defined somewhere? Is Python automatically giving classes certain base classes just for defining methods, or is something else going on?
You'll get similar results for collections.Iterable and the __iter__ method and some other special methods as well.
The ABCMeta class implements hooks to customize instance and subclass checks; both __instancecheck__() and __subclasscheck__() are provided.
These hooks delegate to the ABCMeta.__subclasshook__() method:
__subclasshook__(subclass)
Check whether subclass is considered a subclass of this ABC. This means that you can customize the behavior of issubclass further without the need to call register() on every class you want to consider a subclass of the ABC.
This hook then can check if a given subclass implements the expected methods; for Callable the implementation simply has to see if there is a __call__ method present:
@classmethod
def __subclasshook__(cls, C):
    if cls is Callable:
        if _hasattr(C, "__call__"):
            return True
    return NotImplemented
This is restricted to the specific Callable class; subclasses will have to implement their own version as you most likely would add methods.
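In Python 3 (where these ABCs now live in collections.abc), the effect of this hook can be observed directly, with no registration or explicit inheritance:

```python
from collections.abc import Callable

class Greeter:
    def __call__(self):
        return "hi"

class Plain:
    pass

# The __subclasshook__ recognizes any class defining __call__:
print(issubclass(Greeter, Callable))    # True
print(isinstance(Greeter(), Callable))  # True
print(issubclass(Plain, Callable))      # False
```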

Why/When in Python does `x==y` call `y.__eq__(x)`?

The Python docs clearly state that x==y calls x.__eq__(y). However it seems that under many circumstances, the opposite is true. Where is it documented when or why this happens, and how can I work out for sure whether my object's __cmp__ or __eq__ methods are going to get called.
Edit: Just to clarify, I know that __eq__ is called in preference to __cmp__, but I'm not clear why y.__eq__(x) is called in preference to x.__eq__(y), when the latter is what the docs state will happen.
>>> class TestCmp(object):
...     def __cmp__(self, other):
...         print "__cmp__ got called"
...         return 0
...
>>> class TestEq(object):
...     def __eq__(self, other):
...         print "__eq__ got called"
...         return True
...
>>> tc = TestCmp()
>>> te = TestEq()
>>>
>>> 1 == tc
__cmp__ got called
True
>>> tc == 1
__cmp__ got called
True
>>>
>>> 1 == te
__eq__ got called
True
>>> te == 1
__eq__ got called
True
>>>
>>> class TestStrCmp(str):
...     def __new__(cls, value):
...         return str.__new__(cls, value)
...
...     def __cmp__(self, other):
...         print "__cmp__ got called"
...         return 0
...
>>> class TestStrEq(str):
...     def __new__(cls, value):
...         return str.__new__(cls, value)
...
...     def __eq__(self, other):
...         print "__eq__ got called"
...         return True
...
>>> tsc = TestStrCmp("a")
>>> tse = TestStrEq("a")
>>>
>>> "b" == tsc
False
>>> tsc == "b"
False
>>>
>>> "b" == tse
__eq__ got called
True
>>> tse == "b"
__eq__ got called
True
Edit: From Mark Dickinson's answer and comment it would appear that:
Rich comparison overrides __cmp__
__eq__ is its own __rop__ to its __op__ (and similarly for __lt__, __ge__, etc.)
If the left object is a builtin or new-style class, and the right is a subclass of it, the right object's __rop__ is tried before the left object's __op__
This explains the behaviour in the TestStrCmp examples. TestStrCmp is a subclass of str but doesn't implement its own __eq__, so the __eq__ of str takes precedence in both cases (i.e. tsc == "b" calls "b".__eq__(tsc) as an __rop__ because of rule 1).
In the TestStrEq examples, tse.__eq__ is called in both instances because TestStrEq is a subclass of str and so it is called in preference.
In the TestEq examples, TestEq implements __eq__ and int doesn't so __eq__ gets called both times (rule 1).
But I still don't understand the very first example with TestCmp. tc is not a subclass of int, so AFAICT 1.__cmp__(tc) should be called, but isn't.
You're missing a key exception to the usual behaviour: when the right-hand operand is an instance of a subclass of the class of the left-hand operand, the special method for the right-hand operand is called first.
See the documentation at:
http://docs.python.org/reference/datamodel.html#coercion-rules
and in particular, the following two paragraphs:
For objects x and y, first x.__op__(y) is tried. If this is not implemented or returns NotImplemented, y.__rop__(x) is tried. If this is also not implemented or returns NotImplemented, a TypeError exception is raised. But see the following exception:
Exception to the previous item: if the left operand is an instance of a built-in type or a new-style class, and the right operand is an instance of a proper subclass of that type or class and overrides the base’s __rop__() method, the right operand’s __rop__() method is tried before the left operand’s __op__() method.
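The subclass exception can be demonstrated with a small Python 3 sketch (the class names are made up for the example; a side-effect list records which method actually runs):

```python
calls = []

class Base:
    def __eq__(self, other):
        calls.append("Base")
        return NotImplemented

class Sub(Base):
    def __eq__(self, other):
        calls.append("Sub")
        return True

b, s = Base(), Sub()

# The right operand is an instance of a proper subclass that overrides
# __eq__, so its (reflected) method is tried first:
result = (b == s)
print(calls, result)  # ['Sub'] True
```

Since Sub's method answers immediately, Base's __eq__ is never consulted, even though Base appears on the left of the operator.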
Actually, in the docs, it states:
[__cmp__ is c]alled by comparison operations if rich comparison (see above) is not defined.
__eq__ is a rich comparison method and, in the case of TestCmp, is not defined, hence the calling of __cmp__
As far as I know, __eq__() is a so-called "rich comparison" method and is called for comparison operators in preference to __cmp__(). __cmp__() is called only if rich comparison is not defined.
So in A == B:
If __eq__() is defined in A it will be called
Else __cmp__() will be called
__eq__() is defined in str, so your __cmp__() function was not called.
The same rule is for __ne__(), __gt__(), __ge__(), __lt__() and __le__() "rich comparison" methods.
Is this not documented in the Language Reference? Just from a quick look there, it looks like __cmp__ is ignored when __eq__, __lt__, etc are defined. I'm understanding that to include the case where __eq__ is defined on a parent class. str.__eq__ is already defined so __cmp__ on its subclasses will be ignored. object.__eq__ etc are not defined so __cmp__ on its subclasses will be honored.
In response to the clarified question:
I know that __eq__ is called in preference to __cmp__, but I'm not clear why y.__eq__(x) is called in preference to x.__eq__(y), when the latter is what the docs state will happen.
Docs say x.__eq__(y) will be called first, but it has the option to return NotImplemented in which case y.__eq__(x) is called. I'm not sure why you're confident something different is going on here.
Which case are you specifically puzzled about? I'm understanding you just to be puzzled about the "b" == tsc and tsc == "b" cases, correct? In either case, str.__eq__(onething, otherthing) is being called. Since you don't override the __eq__ method in TestStrCmp, eventually you're just relying on the base string method and it's saying the objects aren't equal.
Without knowing the implementation details of str.__eq__, I don't know whether ("b").__eq__(tsc) will return NotImplemented and give tsc a chance to handle the equality test. But even if it did, the way you have TestStrCmp defined, you're still going to get a false result.
So it's not clear what you're seeing here that's unexpected.
Perhaps what's happening is that Python is preferring __eq__ to __cmp__ if it's defined on either of the objects being compared, whereas you were expecting __cmp__ on the leftmost object to have priority over __eq__ on the righthand object. Is that it?
