I want to create a Pydantic model with a list field which, when left uninitialized, defaults to an empty list. Is there an idiomatic way to do this?
For Python's built-in dataclass objects you can use field(default_factory=list); however, in my own experiments this seems to prevent my Pydantic models from being pickled. A naive implementation might look something like this:
from typing import Sequence
from pydantic import BaseModel

class Foo(BaseModel):
    defaulted_list_field: Sequence[str] = []  # Bad!
But we all know not to use a mutable value like the empty-list literal as a default.
So what's the correct way to give a Pydantic list-field a default value?
For Pydantic you can safely use a mutable default value, like:
from typing import List
from pydantic import BaseModel

class Foo(BaseModel):
    defaulted_list_field: List[str] = []

f1, f2 = Foo(), Foo()
f1.defaulted_list_field.append("hey!")
print(f1)  # defaulted_list_field=['hey!']
print(f2)  # defaulted_list_field=[]
It will be handled correctly (deep copy) and each model instance will have its own empty list.
Pydantic also has a default_factory parameter. In the case of an empty list the result will be identical; it is rather meant for defaults that need to be dynamic (i.e. different for each model instance), such as the uid field below:
from typing import List
from pydantic import BaseModel, Field
from uuid import UUID, uuid4
class Foo(BaseModel):
    defaulted_list_field: List[str] = Field(default_factory=list)
    uid: UUID = Field(default_factory=uuid4)
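A quick check that the factory runs once per instance (a sketch using the Foo model above):

f1, f2 = Foo(), Foo()
assert f1.uid != f2.uid                # each instance gets its own UUID
f1.defaulted_list_field.append("x")
assert f2.defaulted_list_field == []   # the lists are independent too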
While reviewing my colleague's merge request I saw a mutable object used as a default argument and pointed that out. To my surprise, it works as if a deepcopy of the object had been made. I found an example in the project's readme, but without any clarification. And I suddenly realized that the developers have been ignoring this question for a long time (see the links at the bottom).
Indeed, you can write something like this and expect correct behavior:
from typing import List
from pydantic import BaseModel

class Foo(BaseModel):
    defaulted_list_field: List[str] = []
But what happens under the hood?
We need to go deeper...
After a quick search through the source code I found this:
class ModelField(Representation):
    ...
    def get_default(self) -> Any:
        return smart_deepcopy(self.default) if self.default_factory is None else self.default_factory()
where the smart_deepcopy function is:
def smart_deepcopy(obj: Obj) -> Obj:
    """
    Return type as is for immutable built-in types
    Use obj.copy() for built-in empty collections
    Use copy.deepcopy() for non-empty collections and unknown objects
    """
    obj_type = obj.__class__
    if obj_type in IMMUTABLE_NON_COLLECTIONS_TYPES:
        return obj  # fastest case: obj is immutable and not collection therefore will not be copied anyway
    try:
        if not obj and obj_type in BUILTIN_COLLECTIONS:
            # faster way for empty collections, no need to copy its members
            return obj if obj_type is tuple else obj.copy()  # type: ignore  # tuple doesn't have copy method
    except (TypeError, ValueError, RuntimeError):
        # do we really dare to catch ALL errors? Seems a bit risky
        pass
    return deepcopy(obj)  # slowest way when we actually might need a deepcopy
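So a non-empty mutable default is also safe; it simply takes the slower deepcopy path. A minimal sketch demonstrating this (pydantic v1 semantics, matching the source above):

from typing import List
from pydantic import BaseModel

class Bar(BaseModel):
    items: List[str] = ["a"]  # non-empty default: handled by deepcopy

b1, b2 = Bar(), Bar()
b1.items.append("b")
assert b1.items == ["a", "b"]
assert b2.items == ["a"]  # b2 received its own deep copy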
Also, as mentioned in the comments, you cannot use mutable defaults in dataclass attribute declarations directly (use default_factory instead). So this example is not valid:
from pydantic.dataclasses import dataclass

@dataclass
class Foo:
    bar: list = []
And gives:
ValueError: mutable default <class 'list'> for field bar is not allowed: use default_factory
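The working equivalent uses default_factory; a sketch (pydantic dataclasses accept the standard dataclasses.field):

from dataclasses import field
from pydantic.dataclasses import dataclass

@dataclass
class Foo:
    bar: list = field(default_factory=list)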
Links to open discussions (no answers so far):
Why isn't mutable default value (field = List[int] = []) a documented feature?
How does pydantic.BaseModel handle mutable default args?
This is what I have. I believe there are two problems here - the Literal and the None.
from typing import Literal
from attrs import frozen, field
from attrs.validators import instance_of

OK_ARGS = ['a', 'b']

@frozen
class MyClass:
    my_field: Literal[OK_ARGS] | None = field(validator=instance_of((Literal[OK_ARGS], None)))
Error:
TypeError: Subscripted generics cannot be used with class and instance checks
Edit: I've made a workaround with a custom validator. Not that pretty, however:
def _validator_literal_or_none(literal_type):
    def inner(instance, attribute, value):
        if (isinstance(value, str) and (value in literal_type)) or (value is None):
            pass
        else:
            raise ValueError(f'You need to provide a None, or a string in this list: {literal_type}')
    return inner
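Wired into the class it would look something like this (a sketch reusing the names above; the annotation is relaxed to str | None, since Literal[OK_ARGS] is not valid typing):

@frozen
class MyClass:
    my_field: str | None = field(default=None, validator=_validator_literal_or_none(OK_ARGS))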
You can't do isinstance() checks on Literals/None, and that's what the instance_of validator uses internally (it predates those typing features by far).
While we've resisted adding a complete implementation of the typing language due to its complexity, having a validator dedicated to such cases might be worth exploring if you'd like to open an issue.
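That said, attrs ships validators that cover this case without isinstance(): in_ checks membership, and optional additionally allows None. A minimal sketch (modern attrs API assumed):

from attrs import frozen, field
from attrs.validators import in_, optional

OK_ARGS = ['a', 'b']

@frozen
class MyClass:
    my_field: str | None = field(default=None, validator=optional(in_(OK_ARGS)))

MyClass('a')     # ok
MyClass(None)    # ok
MyClass('nope')  # raises ValueError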
I often use dict to group and namespace related data. Two drawbacks are:
I cannot type-hint individual entries (e.g. x['s']: str = ''). Accessing union-typed values (e.g. x: dict[str, str | None] = {}) later needs assert statements to please mypy.
Spelling entries is verbose. Values mapped to str keys need four extra characters (i.e. ['']); attributes only need one (i.e. .).
I've considered types.SimpleNamespace. However, like with classes, I run into this mypy error:
import types
x = types.SimpleNamespace()
x.s: str = ''
# error: Type cannot be declared in assignment to non-self attribute  [mypy]
Is there a way to type-hint attributes added after instantiation?
If not, what other structures should I consider? As with dict and unlike collections.namedtuple, I require mutability.
There is no way to type-hint attributes that are not defined inside the class body or __init__.
You need to declare some sort of structure with known fields or keys and then use it. You have a whole bunch of options. The first things to consider (as most similar to your existing attempt) are TypedDict and dataclass. A TypedDict does no runtime validation and is just a plain dictionary during code execution (no key/value restrictions apply). A dataclass will create an __init__ for you, but you'll still be able to set arbitrary attributes afterwards (without annotation, invisible to mypy). With dataclass(slots=True), even that becomes impossible (see the sketch after the examples below).
Let me show some examples:
from typing import TypedDict
class MyStructure(TypedDict):
    foo: str

data: MyStructure = {'foo': 'bar'}
reveal_type(data['foo'])  # N: Revealed type is "builtins.str"
data['foo'] = 'baz'  # OK, mutable
data['foo'] = 1  # E: Value of "foo" has incompatible type "int"; expected "str" [typeddict-item]
data['bar']  # E: TypedDict "MyStructure" has no key "bar" [typeddict-item]
# Second option
from dataclasses import dataclass

@dataclass
class MyStructure2:
    foo: str

data2 = MyStructure2(foo='bar')
reveal_type(data2.foo)  # N: Revealed type is "builtins.str"
data2.foo = 'baz'  # OK, mutable
data2.foo = 1  # E: Incompatible types in assignment (expression has type "int", variable has type "str") [assignment]
data2.bar  # E: "MyStructure2" has no attribute "bar" [attr-defined]
Here's a playground link.
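To illustrate the slots=True remark above, a minimal sketch (Python 3.10+ for the slots parameter):

from dataclasses import dataclass

@dataclass(slots=True)
class MyStructure3:
    foo: str

data3 = MyStructure3(foo='bar')
data3.foo = 'baz'  # OK, declared field
data3.bar = 1      # AttributeError at runtime; mypy flags it as well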
Consider a Python class with attributes (i.e. a dataclass, pydantic, attrs, django model, ...) whose type is a union, e.g. of None and a state.
Now I have a complex checking function that checks some of those values.
If I use this checking function, I want to tell the type checker that some of my class attributes are narrowed.
For instance see this simplified example:
import dataclasses
from typing import TypeGuard

@dataclasses.dataclass
class SomeDataClass:
    state: tuple[int, int] | None
    name: str
    # Assume many more data attributes

class SomeDataClassWithSetState(SomeDataClass):
    state: tuple[int, int]

def complex_check(data: SomeDataClass) -> TypeGuard[SomeDataClassWithSetState]:
    # Assume some complex checks here, for simplicity it is only:
    return data.state is not None and data.name.startswith("SPECIAL")

def get_sum(data: SomeDataClass) -> int:
    if complex_check(data):
        return data.state[0] + data.state[1]
    return 0
Explore on mypy Playground
As shown, it is possible to do this with subclasses, which for various reasons is not an option for me:
it introduces a lot of duplication
some of the libraries used for dataclass-like classes do not play well with being subclassed without side effects
there could be some metaclass or __subclasses__ magic that handles all subclasses specially, e.g. creating a database table for each dataclass
So is there a way to type-narrow an attribute (or several) of a class without introducing an entirely new class, as proposed here?
TL;DR: You cannot narrow the type of an attribute. You can only narrow the type of an object.
As I already mentioned in my comment, for typing.TypeGuard to be useful, it relies on two distinct types T and S. Depending on the returned bool, the type guard function tells the type checker to assume the object is either T or S.
You say you don't want another class/subclass alongside SomeDataClass for various (vaguely valid) reasons. But if you don't have another type, then TypeGuard is useless, so that is not the route to take here.
I understand that you want to reduce the type-safety checks like if obj.state is None because you may need to access the state attribute in multiple different places in your code. You must have some place in your code, where you create/mutate a SomeDataClass instance in a way that ensures its state attribute is not None. One solution then is to have a getter for that attribute that performs the type-safety check and only ever returns the narrower type or raises an error. I typically do this via @property for improved readability. Example:
from dataclasses import dataclass

@dataclass
class SomeDataClass:
    name: str
    optional_state: tuple[int, int] | None = None

    @property
    def state(self) -> tuple[int, int]:
        if self.optional_state is None:
            raise RuntimeError("or some other appropriate exception")
        return self.optional_state

def set_state(obj: SomeDataClass, value: tuple[int, int]) -> None:
    obj.optional_state = value

if __name__ == "__main__":
    foo = SomeDataClass(optional_state=(1, 2), name="foo")
    bar = SomeDataClass(name="bar")
    baz = SomeDataClass(name="baz")
    set_state(bar, (2, 3))
    print(foo.state)
    print(bar.state)
    try:
        print(baz.state)
    except RuntimeError:
        print("baz has no state")
I realize you mean there are many more checks happening in complex_check, but either that function changes the type of data or it doesn't. If the type remains the same, you need to introduce type-safety for attributes like state in some other place, which is why I suggest a getter method.
Another option is obviously to have a separate class, which is what is typically done with FastAPI/Pydantic/SQLModel for example and use clever inheritance to reduce code duplication. You mentioned this may cause problems because of subclassing magic. Well, if it does, use the other approach, but I can't think of an example that would cause the problems you mentioned. Maybe you can be more specific and show a case where subclassing would lead to problems.
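For completeness, a sketch of that inheritance pattern (hypothetical names; shared fields live in a base class, so little is duplicated):

from dataclasses import dataclass

@dataclass
class ItemBase:
    name: str
    # ...all other shared fields go here

@dataclass
class ItemDraft(ItemBase):
    state: tuple[int, int] | None = None  # state may still be missing

@dataclass
class Item(ItemBase):
    state: tuple[int, int] = (0, 0)  # state guaranteed present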
Consider the following class:
from dataclasses import dataclass

@dataclass
class C:
    a: int = 1
    b: int
Trying to execute this yields TypeError: non-default argument 'b' follows default argument
Now consider this:
from dataclasses import dataclass, field
from dataclasses_json import config

@dataclass
class C:
    a: int = field(metadata=config(encoder=lambda x: x, decoder=lambda x: x))
    b: int
This executes without error.
The question is: how does the field function "cheat" the Python interpreter and avoid being considered a default value? Can I replicate this behavior in my own function?
The @dataclass decorator is not "interpreted by the interpreter" in the sense that the interpreter "knows" it has to raise errors like TypeError: non-default argument 'b' follows default argument. Instead, @dataclass is a regular Python function that inspects the class object and explicitly raises the error.
The high-level description of this mechanism is that field returns a Field object containing the metadata passed to field. The @dataclass code checks whether class attribute values are Field objects, not whether they were created by field – one can write a custom function to construct a Field instance if needed.
Of course, the easiest approach is just to have a function that calls field in order to create a Field.
from dataclasses import field, MISSING

def auto_field(*, default=MISSING, default_factory=MISSING, init=True, metadata=None):
    """Field that inspects defaults to decide whether it is repr/hash'able"""
    if default is MISSING and default_factory is MISSING:
        return field(init=init, metadata=metadata)
    test_default = default if default is not MISSING else default_factory()
    return field(
        default=default, default_factory=default_factory, init=init, metadata=metadata,
        repr=type(test_default).__repr__ is not object.__repr__,
        hash=getattr(test_default, '__hash__', None) is not None,
        compare=getattr(test_default, '__hash__', None) is not None,
    )
from dataclasses import dataclass

@dataclass(frozen=True)
class Foo:
    a: int = auto_field()  # not counted as a default
    b: int
    c: list = auto_field(default_factory=list)

print(Foo(12, 42), hash(Foo(12, 42)))  # works because c is ignored for hashing
Note that conceptually, one is still restricted to the logic of dataclass and its Fields. For example, this means one cannot create a "field which has a default but is not considered a default value" – depending on how one approaches it, dataclass would either ignore it or still raise an error when preparing the actual class.
I want to create my own parameterized type in Python for use in type hinting:
from typing import TypeVar, Tuple, Union

class MaybeWrapped:
    # magic goes here
    ...

T = TypeVar('T')
assert MaybeWrapped[T] == Union[T, Tuple[T]]
Never mind the contrived example; how can I implement this? I looked at the source for Union and Optional, but it looks like some fairly low-level hackery that I'd like to avoid.
The only suggestion in the documentation comes from an example re-implementation of Mapping[KT,VT] that inherits from Generic. But that example is more about the __getitem__ method than about the class itself.
If you're just trying to create generic classes or functions, try taking a look at the documentation on mypy-lang.org about generic types -- it's fairly comprehensive, and more detailed than the standard library typing docs.
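For reference, a minimal generic class of the kind those docs cover (a sketch; Box is an illustrative name):

from typing import Generic, TypeVar

T = TypeVar('T')

class Box(Generic[T]):
    def __init__(self, item: T) -> None:
        self.item = item

    def get(self) -> T:
        return self.item

b = Box(1)            # inferred as Box[int]
reveal_type(b.get())  # N: Revealed type is "builtins.int"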
If you're trying to implement your specific example, it's worth pointing out that type aliases work with typevars -- you can simply do:
from typing import Union, TypeVar, Tuple

T = TypeVar('T')
MaybeWrapped = Union[T, Tuple[T]]

def foo(x: int) -> MaybeWrapped[str]:
    if x % 2 == 0:
        return "hi"
    else:
        return ("bye",)

# When running mypy, the output of this line is:
# test.py:13: error: Revealed type is 'Union[builtins.str, Tuple[builtins.str]]'
reveal_type(foo(3))
However, if you're trying to construct a generic type with genuinely new semantics, you're very likely out of luck. Your remaining options are to:
Construct some kind of custom class/metaclass thing that PEP 484-compliant type checkers can understand and use that.
Modify the type checker you're using somehow (mypy has an experimental "plugin" system, for example)
Petition to modify PEP 484 to include your new, custom type (you can do this by opening an issue in the typing module repo).
It is exactly the __getitem__ method that does all the magic.
That is the method called when you subscript a name with [ and ] brackets.
So you need a __getitem__ method on the class of your class - that is, its metaclass - which will receive whatever is within the brackets as its parameter. That method is responsible for dynamically creating (or retrieving a cached copy of) whatever you want to generate, and returning it.
I just can't possibly imagine how you would want this for type hinting, since the typing library seems to cover all reasonable cases (I can't think of an example it doesn't already cover). But let's suppose you want a class to return a copy of itself, but with the parameter annotated as its type attribute:
import types

class MyMeta(type):
    def __getitem__(cls, key):
        new_cls = types.new_class(
            f"{cls.__name__}_{key.__name__}", (cls,), {},
            lambda ns: ns.__setitem__("type", key)
        )
        return new_cls

class Base(metaclass=MyMeta):
    pass
And on trying this in interactive mode, one can do:
In [27]: Base[int]
Out[27]: types.Base_int
update: As of Python 3.7, there is also the special method __class_getitem__, which was created just for this purpose: it acts as a classmethod and avoids the need for a metaclass just for this case. Whatever would be written in a metaclass.__getitem__ can be put in the cls.__class_getitem__ method directly. Defined in PEP 560.
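A sketch of the same example rewritten with __class_getitem__ (it is treated as an implicit classmethod, so no metaclass is needed):

import types

class Base:
    def __class_getitem__(cls, key):
        return types.new_class(
            f"{cls.__name__}_{key.__name__}", (cls,), {},
            lambda ns: ns.__setitem__("type", key)
        )

assert Base[int].__name__ == "Base_int"
assert Base[int].type is int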
I'd like to propose an improved solution, based on @jsbueno's answer. Now our "generics" can be used in comparisons and identity checks, and they behave like "true" generics from typing. We can also forbid instantiation of the non-typed class itself. Moreover, we get isinstance checking for free!
Also, meet the BaseMetaMixin class for clean static type checking!
import types
from typing import Type, Optional, TypeVar, Union

T = TypeVar('T')

class BaseMetaMixin:
    type: Type

class BaseMeta(type):
    cache = {}

    def __getitem__(cls: T, key: Type) -> Union[T, Type[BaseMetaMixin]]:
        if key not in BaseMeta.cache:
            BaseMeta.cache[key] = types.new_class(
                f"{cls.__name__}_{key.__name__}",
                (cls,),
                {},
                lambda ns: ns.__setitem__("type", key)
            )
        return BaseMeta.cache[key]

    def __call__(cls, *args, **kwargs):
        assert getattr(cls, 'type', None) is not None, "Can not instantiate Base[] generic"
        return super().__call__(*args, **kwargs)

class Base(metaclass=BaseMeta):
    def __init__(self, some: int):
        self.some = some

# identity checking
assert Base[int] is Base[int]
assert Base[int] == Base[int]
assert Base[int].type is int
assert Optional[int] is Optional[int]

# instantiation
# noinspection PyCallByClass
b = Base[int](some=1)
assert b.type is int
assert b.some == 1

try:
    b = Base(1)
except AssertionError as e:
    assert str(e) == 'Can not instantiate Base[] generic'

# isinstance checking
assert isinstance(b, Base)
assert isinstance(b, Base[int])
assert not isinstance(b, Base[float])

exit(0)

# type hinting in IDE
assert b.type2 is not None  # Cannot find reference 'type2' in 'Base | BaseMetaMixin'
b2 = Base[2]()  # Expected type 'type', got 'int' instead