What's the difference between namedtuple and NamedTuple? - python

The typing module documentation says that the two code snippets below are equivalent.
from typing import NamedTuple
class Employee(NamedTuple):
    name: str
    id: int
and
from collections import namedtuple
Employee = namedtuple('Employee', ['name', 'id'])
Are they the exact same thing or, if not, what are the differences between the two implementations?

The type generated by subclassing typing.NamedTuple is equivalent to a collections.namedtuple, but with __annotations__, _field_types and _field_defaults attributes added. The generated code will behave the same, for all practical purposes, since nothing in Python currently acts on those typing related attributes (your IDE might use them, though).
As a developer, using the typing module for your namedtuples allows a more natural declarative interface:
You can easily specify default values for the fields (edit: in Python 3.7, collections.namedtuple got a new defaults keyword so this is no longer an advantage)
You don't need to repeat the type name twice ("Employee")
You can customize the type directly (e.g. adding a docstring or some methods)
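The points above can be sketched roughly as follows. The badge method, the default of 0 for id, and the EmployeeFn name are made up for illustration; only NamedTuple, namedtuple and its defaults keyword (Python 3.7+) come from the standard library.

```python
from collections import namedtuple
from typing import NamedTuple


class Employee(NamedTuple):
    """An employee record (docstring attached directly to the class)."""

    name: str
    id: int = 0  # default value declared inline (illustrative choice)

    def badge(self) -> str:
        # A custom method (hypothetical), defined like on any other class.
        return f"{self.name} #{self.id}"


# Functional form; defaults apply to the rightmost fields, so only `id` gets one.
EmployeeFn = namedtuple("Employee", ["name", "id"], defaults=[0])

typed = Employee("guido")
plain = EmployeeFn("guido")
print(typed.badge())             # guido #0
print(typed == plain)            # True: both compare as ordinary tuples
print(Employee._field_defaults)  # {'id': 0}
```

Adding the same method or docstring to the functional form would need a separate subclass or attribute assignment after the fact.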
As before, your class will be a subclass of tuple, and instances will be instances of tuple as usual. Interestingly, your class will not be a subclass of NamedTuple. If you want to know why, read on for more info about the implementation detail.
from typing import NamedTuple
class Employee(NamedTuple):
    name: str
    id: int
Behaviour in Python <= 3.8
>>> issubclass(Employee, NamedTuple)
False
>>> isinstance(Employee(name='guido', id=1), NamedTuple)
False
typing.NamedTuple is a class. It uses metaclasses and a custom __new__ to handle the annotations, and then it delegates to collections.namedtuple to build and return the type. As you may have guessed from the lowercased name convention, collections.namedtuple is not a type/class; it's a factory function. It works by building up a string of Python source code and then calling exec on this string. The generated constructor is plucked out of a namespace and included in a 3-argument invocation of the metaclass type to build and return your class. This explains the weird inheritance breakage seen above: NamedTuple uses a metaclass in order to use a different metaclass to instantiate the class object.
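A quick way to see the result of that machinery, as a small sketch that should behave the same on the Python versions discussed here:

```python
import inspect
from collections import namedtuple
from typing import NamedTuple


class Employee(NamedTuple):
    name: str
    id: int


emp = Employee(name="guido", id=1)

print(inspect.isfunction(namedtuple))  # True: a factory function, not a class
print(Employee.__bases__)              # (<class 'tuple'>,)
print(isinstance(emp, tuple))          # True
print(Employee._fields)                # ('name', 'id')
```

The only real base class is tuple; NamedTuple itself never appears in the generated class's bases.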
Behaviour in Python >= 3.9
typing.NamedTuple is changed from a type (class) to a function (def)
>>> issubclass(Employee, NamedTuple)
TypeError: issubclass() arg 2 must be a class or tuple of classes
>>> isinstance(Employee(name="guido", id=1), NamedTuple)
TypeError: isinstance() arg 2 must be a type or tuple of types
Multiple inheritance using NamedTuple is now disallowed (it did not work properly in the first place).
See bpo40185 / GH-19371 for the change.
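Since neither isinstance nor issubclass against NamedTuple is useful on any version, code that needs to detect named tuples usually falls back to duck typing. The helper below is a hypothetical sketch, not a standard-library function, and it assumes that "a tuple subclass with a _fields attribute" is a good enough test for your purposes.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    name: str
    id: int


def looks_like_namedtuple(obj) -> bool:
    # Hypothetical helper: named tuples are tuple subclasses whose class
    # carries a `_fields` attribute listing the field names.
    return isinstance(obj, tuple) and hasattr(type(obj), "_fields")


print(looks_like_namedtuple(Employee("guido", 1)))  # True
print(looks_like_namedtuple(("guido", 1)))          # False
```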

Related

Defining an interface in Python

I'm wondering whether we can use the typing package to produce the definition of an "interface", that is, a class/object in Python 3.
It seems that the usual way to define an "interface" in Python is to use an abstract class defined using ABC, and use that as your type parameter. However, since Python is dynamically typed, a fully abstract type is an interface that is nothing more than a typing hint for Python. At runtime, I would expect to have zero impact from said interface. A base class can have methods that are inherited, and that's not what I want.
I'm basing a lot of this on my experience with TypeScript: it enables us to very easily define object types through the interface or type keyword, but those are only used by the type checker.
Let me make my use case clearer with an example:
Let's say I'm defining a function foo as below:
def foo(bar):
    nums = [i for i in range(10)]
    result = bar.oogle(nums)
    return result
foo is, therefore, a method that expects to receive an instance of an object that must have a method oogle that accepts a list of integers. I want to make it clear to callers that this is what foo expects from bar, but bar can be of any type.
PEP544 introduced Protocol classes, which can be used to define interfaces.
from typing import Any, List, Protocol
class Bar(Protocol):
    def oogle(self, quz: List[int]) -> Any:
        ...
def foo(bar: Bar):
    nums = [i for i in range(10)]
    result = bar.oogle(nums)
    return result
If you execute your script using Python you will not see any difference, though. To have the protocol enforced you need to check your scripts with Mypy, which is a static type checker that supports protocol classes.
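To illustrate the structural matching, here is a minimal sketch. The Summer class is made up for the example, and the runtime_checkable decorator (Python 3.8+) is an optional extra that the answer above does not use. It only makes isinstance check that the method exists, not that the signature matches.

```python
from typing import Any, List, Protocol, runtime_checkable


@runtime_checkable  # optional: enables isinstance() checks on method presence
class Bar(Protocol):
    def oogle(self, quz: List[int]) -> Any:
        ...


class Summer:
    # Hypothetical class: it does not inherit from Bar, it just has `oogle`.
    def oogle(self, quz: List[int]) -> int:
        return sum(quz)


def foo(bar: Bar) -> Any:
    nums = list(range(10))
    return bar.oogle(nums)


print(foo(Summer()))              # 45
print(isinstance(Summer(), Bar))  # True
print(isinstance(object(), Bar))  # False
```

A static checker would accept foo(Summer()) and reject foo(object()) without any of this code being run.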

Reusing Dataclass Type Hints

I'm trying to reuse type hints from a dataclass in my function signature - that is, without having to type the signature out again.
What would be the best way of going about this?
from dataclasses import dataclass
from typing import Set, Tuple, Type
@dataclass
class MyDataClass:
    force: Set[Tuple[str, float, bool]]
# I've had to write the same type annotation in the dataclass and the
# function signature - yuck
def do_something(force: Set[Tuple[str, float, bool]]):
    print(force)
# I want to do something like this, where I reference the type annotation from
# the dataclass. But, doing it this way, pycharm thinks `force` is type `Any`
def do_something_2(force: Type["MyDataClass.force"]):
    print(force)
PEP 484 gives one clear option for this case
Type aliases
Type aliases are defined by simple variable assignments:
(...)
Type aliases may be as complex as type hints in annotations -- anything that is acceptable as a type hint is acceptable in a type alias:
Applied to your example this would amount to (Mypy confirms this as correct)
from dataclasses import dataclass
Your_Type = set[tuple[str, float, bool]]
@dataclass
class MyDataClass:
    force: Your_Type
def do_something(force: Your_Type):
    print(force)
The above is written using Python 3.9 onward Generic Alias Type. The syntax is more concise and modern since typing.Set and typing.Tuple have been deprecated.
Now, fully understanding this in terms of the Python Data Model is more complicated than it may seem:
3.1. Objects, values and types
Every object has an identity, a type and a value.
Your first attempt of using Type would give an astonishing result
>>> type(MyDataClass.force)
AttributeError: type object 'MyDataClass' has no attribute 'force'
This is because force is only annotated in the class body and never assigned a value, so the class object MyDataClass has no attribute named force for type() to inspect. The annotation is recorded in MyDataClass.__annotations__, and the attribute itself only exists on instances once __init__ has set it. Note carefully the distinction the Data Model draws between the two:
Classes
These objects normally act as factories for new instances of themselves
Class Instances
Instances of arbitrary classes
If instead you checked the type on an instance you would get the following result
>>> init_values: set = {(True, "the_str", 1.2)}
>>> a_var = MyDataClass(init_values)
>>> type(a_var)
<class '__main__.MyDataClass'>
>>> type(a_var.force)
<class 'set'>
Now let's recover the type object (not the type hints) on force by applying type() to its entry in __annotations__ on the class object. Here we see the Generic Alias type mentioned earlier, since this checks the type of the annotation declared for force on the class.
>>> type(MyDataClass.__annotations__['force'])
<class 'types.GenericAlias'>
Or we could check the annotations on the Class instance, and recover the type hints as we are used to seeing them.
>>> init_values: set = {(True, "the_str", 1.2)}
>>> a_var = MyDataClass(init_values)
>>> a_var.__annotations__
{'force': set[tuple[str, float, bool]]}
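If the goal is only to read the declared annotation back at runtime, typing.get_type_hints is the usual tool. A minimal sketch, assuming Python 3.9+ and using ForceType as an illustrative alias name:

```python
from dataclasses import dataclass
from typing import get_type_hints

# Illustrative alias name; any variable name works.
ForceType = set[tuple[str, float, bool]]


@dataclass
class MyDataClass:
    force: ForceType


hints = get_type_hints(MyDataClass)
print(hints["force"])               # set[tuple[str, float, bool]]
print(hints["force"] == ForceType)  # True
```

This works at runtime only; a static checker or IDE cannot follow a lookup like this inside a function signature, which is why the type alias remains the practical answer.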
I've had to write the same type annotation in the dataclass and the function signature -
For tuples, annotations tend to become long literals, and that justifies creating a dedicated variable for conciseness. But in general explicit signatures are more descriptive, and that is what most APIs go for.
The typing Module
Fundamental building blocks:
Tuple, used by listing the element types, for example Tuple[int, int, str]. The empty tuple can be typed as Tuple[()]. Arbitrary-length homogeneous tuples can be expressed using one type and ellipsis, for example Tuple[int, ...]. (The ... here are part of the syntax, a literal ellipsis.)

How is the list class a subclass of collections.abc.Sequence in Python?

In Python, the list class seems to be a subclass of collections.abc.Sequence (which makes total sense):
from collections.abc import Sequence
issubclass(list, Sequence)
# returns True
But the list type doesn't seem to inherit from Sequence:
list.__mro__
# returns (list, object)
So, how does issubclass(list, Sequence) work? How does it return True?
The collections.abc types overload the isinstance() and issubclass() operators to check if the object has the expected interface, without requiring true subclassing. See PEP 3119 which introduced abstract base classes; in particular the section on overloading those two functions for more. This is sort of the point of the collections.abc module. For example, if you define your own class that provides the expected attributes, Python will say that it is an instance/subclass of that type:
>>> class MyClass:
...     __contains__ = 0
...
>>> from collections import abc
>>> issubclass(MyClass, abc.Container)
True
The point of checking if something is a subclass or instance of some other type generally is to check that it has the expected interface. In this case, MyClass does have a __contains__ attribute, so it's assumed to conform to the interface. Of course in this example it doesn't actually conform to the interface: MyClass.__contains__ is not a method, so your program will fail if you try to use a MyClass() object as a container. However, issubclass() and isinstance() will tell you that it is a container object. The usual duck-typing rules apply.
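The two mechanisms an abstract base class can use to accept classes that do not inherit from it are a __subclasshook__ (a structural check, as Container does) and explicit registration with register(). As far as I know, list falls in the second group: CPython's standard library registers it on MutableSequence, which is itself a subclass of Sequence. The sketch below uses made-up classes (Quacker, Duck, Robot) to show both paths.

```python
from abc import ABC
from collections.abc import Sequence


class Quacker(ABC):
    @classmethod
    def __subclasshook__(cls, subclass):
        # Structural check: any class defining `quack` counts as a subclass.
        if cls is Quacker:
            if any("quack" in base.__dict__ for base in subclass.__mro__):
                return True
        return NotImplemented


class Duck:
    def quack(self):
        return "Quack!"


class Robot:
    pass


print(issubclass(Duck, Quacker))   # True, via __subclasshook__
print(issubclass(Robot, Quacker))  # False

# Explicit registration: no structural check is performed at all.
Quacker.register(Robot)
print(issubclass(Robot, Quacker))  # True

print(Sequence in list.__mro__)    # False: no real inheritance
print(issubclass(list, Sequence))  # True: virtual subclass
```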

collections.Iterable vs typing.Iterable in type annotation and checking for Iterable

I found that in Python both collections.Iterable and typing.Iterable can be used in type annotations and for checking whether an object is iterable, i.e., both isinstance(obj, collections.Iterable) and isinstance(obj, typing.Iterable) work. My question is, what are the differences between them? And which one is preferred in which situations?
Due to PEP 585 - Type Hinting Generics In Standard Collections, Python's standard library container types are also able to accept a generic argument for type annotations. This includes the collections.abc.Iterable class.
When supporting only Python 3.9 or later, there is no longer any reason to use the typing.Iterable at all and importing any of these container types from typing is deprecated.
For older Python versions:
The typing.Iterable is generic, so you can say what it's an iterable of in your type annotations, e.g. Iterable[int] for an iterable of ints.
The collections iterable is an abstract base class. These can include extra mixin methods to make the interface easier to implement when you create your own subclasses.
Now it so happens that Iterable doesn't include any of these mixins, but it is part of the interface of other abstract base classes that do.
Theoretically, the typing iterable works for either, but it uses some weird metaclass magic to do it, so they don't behave exactly the same way in all cases. You really don't need generics at runtime, so there's no need to ever use it outside of type annotations and such. The collections iterable is less likely to cause problems as a superclass.
So in short, you should use the typing iterable in type annotations, but the collections iterable as a superclass.
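On Python 3.9 or later, a single import from collections.abc can cover all three uses: annotation, superclass, and isinstance check. A small sketch, with Countdown and total as made-up names:

```python
from collections.abc import Iterable, Iterator


class Countdown(Iterable):
    # Subclassing the ABC: __iter__ is the one abstract method to implement.
    def __init__(self, start: int) -> None:
        self.start = start

    def __iter__(self) -> Iterator[int]:
        return iter(range(self.start, 0, -1))


def total(values: Iterable[int]) -> int:
    # collections.abc.Iterable is subscriptable in annotations on 3.9+.
    return sum(values)


print(total(Countdown(3)))                 # 6
print(isinstance(Countdown(3), Iterable))  # True
print(isinstance(42, Iterable))            # False
```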

Custom type hint annotation

I just wrote a simple @autowired decorator for Python that instantiates classes based on type annotations.
To enable lazy initialization of the class, the package provides a lazy(type_annotation: (Type, str)) function so that the caller can use it like this:
@autowired
def foo(bla, *, dep: lazy(MyClass)):
    ...
This works very well. Under the hood, this lazy function just returns a function that returns the actual type and that has a lazy_init property set to True. Also, this does not break IDEs' (e.g., PyCharm) code completion feature.
But I want to enable the use of a subscriptable Lazy type instead of the lazy function.
Like this:
@autowired
def foo(bla, *, dep: Lazy[MyClass]):
    ...
This would behave very much like typing.Union. And while I'm able to implement the subscriptable type, IDEs' code completion feature will be rendered useless as it will present suggestions for attributes in the Lazy class, not MyClass.
I've been working with this code:
class LazyMetaclass(type):
    def __getitem__(lazy_type, type_annotation):
        return lazy_type(type_annotation)
class Lazy(metaclass=LazyMetaclass):
    def __init__(self, type_annotation):
        self.type_annotation = type_annotation
I tried redefining Lazy.__dict__ as a property to forward to the subscripted type's __dict__ but this seems to have no effect on the code completion feature of PyCharm.
I strongly believe that what I'm trying to achieve is possible as typing.Union works well with IDEs' code completion. I've been trying to decipher what in the source code of typing.Union makes it to behave well with code completion features but with no success so far.
For the Container[Type] notation to work you would want to create a user-defined generic type:
from typing import TypeVar, Generic
T = TypeVar('T')
class Lazy(Generic[T]):
    pass
You then use
def foo(bla, *, dep: Lazy[MyClass]):
and Lazy is seen as a container that holds the class.
Note: this still means the IDE sees dep as an object of type Lazy. Lazy is a container type here, holding an object of type MyClass. Your IDE won't auto-complete for the MyClass type, you can't use it that way.
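The notation also doesn't create an instance of the Lazy class. In Python 3.6 and earlier it creates a subclass instead, via the GenericMeta metaclass; from Python 3.7 onward it returns a generic alias object, and the issubclass() call shown here raises TypeError. In both cases the result has a special attribute __args__ to let you introspect the subscription arguments: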
>>> a = Lazy[str]
>>> issubclass(a, Lazy)
True
>>> a.__args__
(<class 'str'>,)
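On Python 3.8 and later, the same introspection can be done through the public helpers typing.get_origin and typing.get_args rather than by reading __args__ directly. A minimal sketch (MyClass is a placeholder):

```python
from typing import Generic, TypeVar, get_args, get_origin

T = TypeVar("T")


class Lazy(Generic[T]):
    pass


class MyClass:
    pass


alias = Lazy[MyClass]
print(get_origin(alias) is Lazy)      # True
print(get_args(alias) == (MyClass,))  # True
```

A decorator could use get_origin to recognise a Lazy[...] annotation and get_args to find the wrapped class.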
If all you wanted was to reach into the type annotations at runtime but resolve the name lazily, you could just support a string value:
def foo(bla, *, dep: 'MyClass'):
This is a valid type annotation, and your decorator could resolve the name at runtime by using the typing.get_type_hints() function (at a deferred time, not at decoration time), or by wrapping strings in your lazy() callable at decoration time.
If lazy() is meant to flag a type to be treated differently from other type hints, then you are trying to overload the type hint annotations with some other meaning. Type hinting simply doesn't support such use cases, and using a Lazy[...] container can't make it work.
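A rough sketch of the deferred resolution, assuming the referenced class lives in the same module as the function (get_type_hints looks names up in the function's globals by default):

```python
from typing import get_type_hints


def foo(bla, *, dep: "MyClass"):
    return dep


# MyClass is defined after foo; the string is not evaluated until requested.
class MyClass:
    pass


print(foo.__annotations__["dep"])             # MyClass (still a plain string)
print(get_type_hints(foo)["dep"] is MyClass)  # True
```

Calling get_type_hints before MyClass exists would raise NameError, which is why the resolution has to happen at call time rather than at decoration time.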
