mypy - Type-hint new attributes - python

I often use dict to group and namespace related data. Two drawbacks are:
I cannot type-hint individual entries (e.g. x['s']: str = ''). Accessing union-typed values (e.g. x: dict[str, str | None] = {}) later needs assert statements to please mypy.
Spelling entries is verbose. Values mapped to str keys need four extra characters (i.e. ['']); attributes only need one (i.e. .).
I've considered types.SimpleNamespace. However, like with classes, I run into this mypy error:
import types
x = types.SimpleNamespace()
x.s: str = ''
# 3 col 2 error| Type cannot be declared in assignment to non-self attribute [python/mypy]
Is there a way to type-hint attributes added after instantiation?
If not, what other structures should I consider? As with dict and unlike collections.namedtuple, I require mutability.

There is no way to type-hint attributes that are not defined inside the class body or __init__.
You need to declare some sort of structure with known fields or keys and then use it. You have a whole bunch of options. The first things to consider (as most similar to your existing attempt) are TypedDict and dataclass. TypedDict does no runtime validation and is just a plain dictionary during code execution (no key/value restrictions apply). dataclass will create an __init__ for you, but at runtime you will still be able to set arbitrary attributes later (unannotated, and unknown to mypy). With dataclass(slots=True) (Python 3.10+), setting an undeclared attribute fails at runtime as well.
Let me show some examples:
from typing import TypedDict

class MyStructure(TypedDict):
    foo: str

data: MyStructure = {'foo': 'bar'}
reveal_type(data['foo'])  # N: Revealed type is "builtins.str"
data['foo'] = 'baz'  # OK, mutable
data['foo'] = 1  # E: Value of "foo" has incompatible type "int"; expected "str" [typeddict-item]
data['bar']  # E: TypedDict "MyStructure" has no key "bar" [typeddict-item]
# Second option
from dataclasses import dataclass

@dataclass
class MyStructure2:
    foo: str

data2 = MyStructure2(foo='bar')
reveal_type(data2.foo)  # N: Revealed type is "builtins.str"
data2.foo = 'baz'  # OK, mutable
data2.foo = 1  # E: Incompatible types in assignment (expression has type "int", variable has type "str") [assignment]
data2.bar  # E: "MyStructure2" has no attribute "bar" [attr-defined]

Related

Infer type hints for dictionary structure from class attributes definition

I'm trying to figure out a way to create precise types for a function that returns class attributes as an immutable dictionary.
For the sake of example, let's say we have a foreign class called "A". By foreign I mean it comes from a library that I don't own, but it has typing stubs such as:
class A:
    name: str
    content: str
    age: int
Now, I want to create a function that returns a dictionary whose shape exactly matches that class's available attributes. I am able to get a list of fields and their corresponding types through typing.get_type_hints(), such as:
>>> from typing import get_type_hints
>>> obj = A()
>>> get_type_hints(obj)
{'name': <class 'str'>, 'content': <class 'str'>, 'age': <class 'int'>}
Now, what I'd like to do is to create a function that is fully and strongly typed:
def convert_object_to_dict(_obj: A) -> frozendict[str, Any]:
    ...  # Let's ignore the implementation detail here, this is just an example
(I know frozendict is not a type, but let's assume it is)
If I call it with an object that has all the fields defined, it should return a dictionary with three keys whose names exactly match the attributes of class A defined earlier. For the sake of argument, let's assume that it will always return all the attributes defined in the class.
The obvious return type that I can use is frozendict[str, Any]; however, this tells neither mypy nor the IDE what keys to expect. I could perhaps make it more precise by using a Union of all the types I expect, but again, that means I have to maintain it, and it feels redundant if get_type_hints is able to tell me exactly what is there. An alternative solution could be making a TypedDict of the same shape as that class, but this would be a nightmare to maintain and would make typing more annoying than useful.
What I'd like to do is, something similar to this:
def convert_object_to_dict(_obj: A) -> TypedDict[A]:
    ...
So that if I call this function, mypy will be able to tell what shape the dictionary has, and therefore which keys can be used with other functions later on. It should raise an error if I try to use a key which doesn't exist in some later, fully typed context.
TypedDict[A], in this case, should create a TypedDict whose fields are exactly those available in class A.
Is it possible to do with MyPy / Python typing at all?
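For reference, the runtime half of the idea could be sketched as follows. This is only an illustration under stated assumptions: the class A below is a stand-in with an assumed constructor (the real one comes from the foreign library), and types.MappingProxyType is swapped in for the hypothetical frozendict. It does not give mypy the per-key shape the question asks for; the return type is still a plain Mapping[str, Any].

```python
from types import MappingProxyType
from typing import Any, Mapping, get_type_hints


class A:  # stand-in for the foreign class; the constructor is an assumption
    name: str
    content: str
    age: int

    def __init__(self, name: str, content: str, age: int) -> None:
        self.name = name
        self.content = content
        self.age = age


def convert_object_to_dict(obj: A) -> Mapping[str, Any]:
    # Read the annotated attribute names from the class, then expose the
    # current values through a read-only mapping view.
    hints = get_type_hints(type(obj))
    return MappingProxyType({key: getattr(obj, key) for key in hints})


result = convert_object_to_dict(A("n", "c", 3))

try:
    result["age"] = 4  # type: ignore[index]
except TypeError:
    read_only = True
else:
    read_only = False
```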

Reusing Dataclass Type Hints

I'm trying to reuse type hints from a dataclass in my function signature - that is, without having to type the signature out again.
What would be the best way of going about this?
from dataclasses import dataclass
from typing import Set, Tuple, Type

@dataclass
class MyDataClass:
    force: Set[Tuple[str, float, bool]]

# I've had to write the same type annotation in the dataclass and the
# function signature - yuck
def do_something(force: Set[Tuple[str, float, bool]]):
    print(force)

# I want to do something like this, where I reference the type annotation from
# the dataclass. But, doing it this way, PyCharm thinks `force` is type `Any`
def do_something_2(force: Type["MyDataClass.force"]):
    print(force)
PEP 484 gives one clear option for this case
Type aliases
Type aliases are defined by simple variable assignments:
(...)
Type aliases may be as complex as type hints in annotations -- anything that is acceptable as a type hint is acceptable in a type alias:
Applied to your example this would amount to (Mypy confirms this as correct)
from dataclasses import dataclass

Your_Type = set[tuple[str, float, bool]]

@dataclass
class MyDataClass:
    force: Your_Type

def do_something(force: Your_Type):
    print(force)
The above is written using the generic alias types available from Python 3.9 onward. The syntax is more concise and modern, since typing.Set and typing.Tuple have been deprecated.
Now, fully understanding this in terms of the Python Data Model is more complicated than it may seem:
3.1. Objects, values and types
Every object has an identity, a type and a value.
Your first attempt of using Type would give an astonishing result
>>> type(MyDataClass.force)
AttributeError: type object 'MyDataClass' has no attribute 'force'
This is because a bare annotation such as force: Your_Type (with no default value) does not create a class attribute; it is only recorded in MyDataClass.__annotations__. The AttributeError is raised while evaluating MyDataClass.force, before type() is ever called. An instance, by contrast, does get the attribute, because the generated __init__ sets it. The Data Model distinguishes these two kinds of object:
Classes
These objects normally act as factories for new instances of themselves
Class Instances
Instances of arbitrary classes
If instead you checked the type on an instance you would get the following result
>>> init_values: set = {(True, "the_str", 1.2)}
>>> a_var = MyDataClass(init_values)
>>> type(a_var)
<class '__main__.MyDataClass'>
>>> type(a_var.force)
<class 'set'>
Now let's recover the type object (not the type hints) of force by applying type() to the __annotations__ of the class object. Here we see the Generic Alias type mentioned earlier, because we are checking the type of the annotation stored for the class attribute force.
>>> type(MyDataClass.__annotations__['force'])
<class 'types.GenericAlias'>
Or we could check the annotations on the Class instance, and recover the type hints as we are used to seeing them.
>>> init_values: set = {(True, "the_str", 1.2)}
>>> a_var = MyDataClass(init_values)
>>> a_var.__annotations__
{'force': set[tuple[str, float, bool]]}
I've had to write the same type annotation in the dataclass and the function signature -
For tuples, annotations tend to become long literals, and that justifies creating a dedicated alias for conciseness. In general, though, explicit signatures are more descriptive, and that is what most APIs go for.
The typing Module
Fundamental building blocks:
Tuple, used by listing the element types, for example Tuple[int, int, str]. The empty tuple can be typed as Tuple[()]. Arbitrary-length homogeneous tuples can be expressed using one type and ellipsis, for example Tuple[int, ...]. (The ... here are part of the syntax, a literal ellipsis.)

How to give a Pydantic list field a default value?

I want to create a Pydantic model in which there is a list field, which left uninitialized has a default value of an empty list. Is there an idiomatic way to do this?
For Python's built-in dataclass objects you can use field(default_factory=list), however in my own experiments this seems to prevent my Pydantic models from being pickled. A naive implementation might be, something like this:
from typing import Sequence
from pydantic import BaseModel

class Foo(BaseModel):
    defaulted_list_field: Sequence[str] = []  # Bad!
But we all know not to use a mutable value like the empty-list literal as a default.
So what's the correct way to give a Pydantic list-field a default value?
With pydantic you can use a mutable default value, like:
from typing import List
from pydantic import BaseModel

class Foo(BaseModel):
    defaulted_list_field: List[str] = []

f1, f2 = Foo(), Foo()
f1.defaulted_list_field.append("hey!")
print(f1)  # defaulted_list_field=['hey!']
print(f2)  # defaulted_list_field=[]
It will be handled correctly (deep copy) and each model instance will have its own empty list.
Pydantic also has a default_factory parameter. For an empty list the result is identical; default_factory is more useful when the default value should be dynamic (i.e. different for each model instance).
from typing import List
from pydantic import BaseModel, Field
from uuid import UUID, uuid4

class Foo(BaseModel):
    defaulted_list_field: List[str] = Field(default_factory=list)
    uid: UUID = Field(default_factory=uuid4)
While reviewing a colleague's merge request I saw a mutable object used as a default value and pointed it out. To my surprise, it works as if a deepcopy of the object had been made. I found an example in the project's readme, but without any clarification, and then realized that the developers have left this question unanswered for a long time (see links at the bottom).
Indeed, you can write something like this and expect correct behavior:
from typing import List
from pydantic import BaseModel

class Foo(BaseModel):
    defaulted_list_field: List[str] = []
But what happens under the hood?
We need to go deeper...
After a quick search through the source code I found this:
class ModelField(Representation):
    ...
    def get_default(self) -> Any:
        return smart_deepcopy(self.default) if self.default_factory is None else self.default_factory()
While smart_deepcopy function is:
def smart_deepcopy(obj: Obj) -> Obj:
    """
    Return type as is for immutable built-in types
    Use obj.copy() for built-in empty collections
    Use copy.deepcopy() for non-empty collections and unknown objects
    """
    obj_type = obj.__class__
    if obj_type in IMMUTABLE_NON_COLLECTIONS_TYPES:
        return obj  # fastest case: obj is immutable and not collection therefore will not be copied anyway
    try:
        if not obj and obj_type in BUILTIN_COLLECTIONS:
            # faster way for empty collections, no need to copy its members
            return obj if obj_type is tuple else obj.copy()  # type: ignore  # tuple doesn't have copy method
    except (TypeError, ValueError, RuntimeError):
        # do we really dare to catch ALL errors? Seems a bit risky
        pass
    return deepcopy(obj)  # slowest way when we actually might need a deepcopy
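The effect of copying the default on every instantiation can be illustrated with a toy class. This is not pydantic code: TinyModel and its defaults mapping are hypothetical names, and plain copy.deepcopy stands in for smart_deepcopy.

```python
import copy
from typing import Any, Dict


class TinyModel:
    """Toy stand-in (not pydantic): copies class-level defaults per instance."""

    defaults: Dict[str, Any] = {"tags": []}

    def __init__(self) -> None:
        for name, default in self.defaults.items():
            # Each instance receives its own copy, so the shared default
            # object is never mutated through an instance.
            setattr(self, name, copy.deepcopy(default))


m1, m2 = TinyModel(), TinyModel()
m1.tags.append("hey!")
print(m1.tags, m2.tags, TinyModel.defaults["tags"])
```

Only m1.tags changes; m2 and the class-level default keep their empty lists, which mirrors the behavior observed with the pydantic model above.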
Also, as mentioned in the comments, you cannot use mutable defaults directly in dataclass attribute declarations (use default_factory instead). So this example is not valid:
from pydantic.dataclasses import dataclass

@dataclass
class Foo:
    bar: list = []
And gives:
ValueError: mutable default <class 'list'> for field bar is not allowed: use default_factory
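The standard-library dataclasses module behaves the same way, so the fix can be sketched without pydantic. The class names Broken and Fixed are hypothetical; the sketch assumes only the stdlib dataclass and field helpers.

```python
from dataclasses import dataclass, field
from typing import List

try:
    @dataclass
    class Broken:  # hypothetical name
        bar: List[str] = []  # rejected when the class is created
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = ""


@dataclass
class Fixed:  # hypothetical name
    # default_factory is called once per instance, so lists are not shared.
    bar: List[str] = field(default_factory=list)


f1, f2 = Fixed(), Fixed()
f1.bar.append("hey!")
print(error_message)
print(f1, f2)
```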
Links to open discussions (no answers so far):
Why isn't mutable default value (field = List[int] = []) a documented feature?
How does pydantic.BaseModel handle mutable default args?

Define a custom Type that behaves like typing.Any

I need to create a Type that behaves like typing.Any when looked at by the type checker (mypy), but is distinguishable from typing.Any.
The use case is some pretty "meta" code that needs to find the variable that is annotated with this type out of a set of other variables that could be annotated with typing.Any.
Note that I will never have to actually make an instance of this Type, I just need it for type annotations in the context of dataclasses.
Example:
from dataclasses import dataclass, fields
from typing import Any

MyAny = ...  # What to put here?

@dataclass()
class Test:
    a: Any
    b: MyAny = None

for field in fields(Test):
    if field.type == MyAny:
        print(f"It's {field.name}")  # This should print "It's b"
Things I have tried:
Doesn't work, because you can't subclass Any: TypeError: Cannot subclass <class 'typing._SpecialForm'>
class MyAny(Any):
    pass
Doesn't work, because it is not distinguishable from the normal Any (the result of the code snippet above is It's a\nIt's b):
MyAny = Any
Works at runtime, but mypy complains about the default value:
Mypy: Incompatible types in assignment (expression has type "None", variable has type "MyAny")
class MyAny:
    pass
Works at runtime, but mypy can't tell that this should behave like Any. It complains about the definition:
Mypy: Argument 2 to NewType(...) must be subclassable (got "Any")
and it complains about the default value:
Mypy: Incompatible types in assignment (expression has type "None", variable has type "MyAny")
from typing import NewType
MyAny = NewType("MyAny", Any)
So is there a way to make this work?
You can use conditionals to trick mypy into interpreting one piece of code while having your runtime execute another one.
from dataclasses import dataclass, fields
from typing import Any

if False:
    MyAny = Any
else:
    class MyAny:  # type: ignore
        pass

@dataclass()
class Test:
    a: Any
    b: MyAny = None

for field in fields(Test):
    if field.type == MyAny:
        print(f"It's {field.name}")  # This should print "It's b"
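A variation on the same trick, sketched here as an assumption rather than as part of the original answer, is to branch on typing.TYPE_CHECKING instead of a literal False. The constant is False at runtime, while type checkers treat it as true. The dataclass is named Sample here (hypothetical) and mirrors the Test class from the question.

```python
from dataclasses import dataclass, fields
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    MyAny = Any  # what the type checker sees
else:
    class MyAny:  # what exists at runtime: a distinct marker class
        pass


@dataclass
class Sample:  # hypothetical name, mirrors the Test class in the question
    a: Any
    b: MyAny = None


# Only the field annotated with the marker class is selected.
marked = [f.name for f in fields(Sample) if f.type is MyAny]
print(marked)
```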
You could use a TypeVar.
# foo.py
from dataclasses import dataclass, fields
from typing import Any, TypeVar, Generic, Optional

MyAny = TypeVar('MyAny')

@dataclass()
class Test(Generic[MyAny]):
    a: Any
    b: Optional[MyAny] = None

for field in fields(Test):
    if field.type == Optional[MyAny]:
        print(f"It's {field.name}")
Output
$ python3 foo.py
It's b
$ mypy foo.py
Success: no issues found in 1 source file
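Another option worth trying, offered as a sketch rather than a verified answer to the question, is typing.Annotated (Python 3.9+). Type checkers are expected to treat Annotated[Any, ...] as plain Any, while at runtime the alias compares unequal to Any. The marker string and the class name Sample are arbitrary, hypothetical choices.

```python
from dataclasses import dataclass, fields
from typing import Annotated, Any

# The metadata value is arbitrary; it only makes the alias distinguishable.
MyAny = Annotated[Any, "my-any-marker"]


@dataclass
class Sample:  # hypothetical name, mirrors the Test class in the question
    a: Any
    b: MyAny = None


# Annotated aliases compare equal only to aliases with the same origin
# and metadata, so the plain Any field is not matched.
marked = [f.name for f in fields(Sample) if f.type == MyAny]
print(marked)
```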

What the code '_T = TypeVar('_T')' means in a *.pyi file?

I'm new to Python annotations (type hints). I noticed that many of the class definitions in pyi files inherit from Generic[_T], with _T = TypeVar('_T').
I am confused, what does the _T mean here?
from typing import Generic, TypeVar
_T = TypeVar('_T')
class Base(Generic[_T]): pass
I recommend reading through the entire built-in typing module documentation.
typing.TypeVar
Basic Usage
Specifically, typing.TypeVar is used to specify that multiple possible types are allowed. If no specific types are specified, then any type is valid.
from typing import TypeVar
T = TypeVar('T') # <-- 'T' can be any type
A = TypeVar('A', str, int) # <-- 'A' will be either str or int
But, if T can be any type, then why create a typing.TypeVar like that, when you could just use typing.Any for the type hint?
The reason is so you can ensure that particular input and output arguments have the same type, like in the following examples.
A Dict Lookup Example
from typing import TypeVar, Dict

Key = TypeVar('Key')
Value = TypeVar('Value')

def lookup(input_dict: Dict[Key, Value], key_to_lookup: Key) -> Value:
    return input_dict[key_to_lookup]
This appears to be a trivial example at first, but these annotations require that the types of the keys in input dictionary are the same as the key_to_lookup argument, and that the type of the output matches the type of the values in the dict as well.
The keys and values as a whole could be different types, and for any particular call to this function, they could be different (because Key and Value do not restrict the types), but for a given call, the keys of the dict must match the type of the lookup key, and the same for the values and the return type.
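A short usage sketch of the lookup function shows how the type variables are solved per call. The dictionaries ages and names are made-up sample data, and the comments describe what a checker such as mypy is expected to infer.

```python
from typing import Dict, TypeVar

Key = TypeVar("Key")
Value = TypeVar("Value")


def lookup(input_dict: Dict[Key, Value], key_to_lookup: Key) -> Value:
    return input_dict[key_to_lookup]


ages: Dict[str, int] = {"alice": 30}  # sample data
names: Dict[int, str] = {1: "bob"}    # sample data

age = lookup(ages, "alice")  # this call: Key is str, Value is int
name = lookup(names, 1)      # this call: Key is int, Value is str
print(age, name)
```

With Any in place of the type variables, both calls would still run, but the checker would know nothing about the type of age or name.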
An Addition Example
If you create a new TypeVar and limit the types to float and int:
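B = TypeVar('B', float, int)

def add_x_and_y(x: B, y: B) -> B:
    return x + y
This function requires that x and y either both be float, or both be int, and the same type must be returned. Note that type checkers accept an int wherever a float is expected, so a mixed call with one float and one int may still pass with B solved as float; constraints such as str and int show the restriction more clearly, because mixing them is rejected.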
typing.Generic
I'm a little more sketchy on this one, but the typing.Generic abstract base class allows setting up a class that is parameterized by one or more type variables. The official docs have a good example.
In this case they are creating a completely generic type class. If I understand correctly, this allows using Base[AnyName] as a type hint elsewhere in the code and then one can reuse AnyName to represent the same type elsewhere within the same definition (i.e. within the same code scope).
I suppose this would be useful to avoid having to use TypeVar repeatedly, you can basically create new TypeVars at will by just using the Base class as a type hint, as long as you just need it for the scope of that local definition.
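A minimal sketch of a user-defined generic class may make this more concrete. Box is a hypothetical class, not something from the stubs in the question. Inside the class body, _T stands for whatever type the class is parameterized with, so Box[int] and Box[str] share one definition.

```python
from typing import Generic, TypeVar

_T = TypeVar("_T")


class Box(Generic[_T]):  # hypothetical class, for illustration only
    def __init__(self, item: _T) -> None:
        self.item = item

    def get(self) -> _T:
        # The return type follows the parameter: int for Box[int], etc.
        return self.item


int_box: Box[int] = Box(1)
str_box: Box[str] = Box("a")
print(int_box.get(), str_box.get())
```

A checker would infer int for int_box.get() and str for str_box.get(), which is the same linking of input and output types that TypeVar provides for functions.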
