Create a pydantic.BaseModel definition with external class or dictionary - python

I have a (dynamic) definition of a simple class, like so:
class Simple:
    val: int = 1
I intend to use this definition to build a pydantic.BaseModel, so it can be defined from the Simple class; basically doing the following, but via type, inside a metaclass structure from which the Simple class is retrieved.
from pydantic import BaseModel

class SimpleModel(Simple, BaseModel):
    pass
# Actual ways tried:
SimpleModel = type('SimpleModel', (Simple, BaseModel), {})
# or
SimpleModel = type('SimpleModel', (BaseModel, ), Simple.__annotations__)
However, neither approach returned a model class with the fields from the Simple class.
I understand that BaseModel already uses a rather complex metaclass under the hood; however, my intended implementation also sits under a metaclass, where I intend to dynamically turn the Simple class into a pydantic BaseModel.
Your suggestions will be kindly appreciated.

I managed to get this working by first casting my Simple class to a pydantic dataclass, and then getting a pydantic model from it.
I am not an expert in pydantic, so I would not mind your views on the approach.
from pydantic.dataclasses import dataclass
SimpleModel = dataclass(Simple).__pydantic_model__
The trouble I did find, however (same with the answer provided by @jsbueno), is that when declaring an annotation such as pathlib.Path on a BaseModel directly, a string value provided for it gets coerced to the annotated type. But with my approach or @jsbueno's, the value keeps its original type (no coercion).
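To illustrate the kind of coercion meant here, a minimal sketch (PathModel is just an illustrative name):

from pathlib import Path
from pydantic import BaseModel

class PathModel(BaseModel):
    p: Path

m = PathModel(p="some/dir")
print(type(m.p))  # a pathlib.Path subclass - the plain string was coerced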

You can simply call type passing a dictionary made from the Simple class's __dict__ attribute - that will contain your fields' default values and the __annotations__ attribute, which is enough information for Pydantic to do its thing.
I would just take the extra step of deleting the __weakref__ attribute that is created by default on the plain Simple class before doing that - to avoid it pointing to the wrong class.
from pydantic import BaseModel

class Simple:
    val: int = 1

new_namespace = dict(Simple.__dict__)  # copies the class dict-proxy into a plain dictionary
del new_namespace["__weakref__"]

SimpleModel = type("SimpleModel", (BaseModel,), new_namespace)
and we have
In [58]: SimpleModel.schema()
Out[58]:
{'title': 'SimpleModel',
 'type': 'object',
 'properties': {'val': {'title': 'Val',
   'default': 1,
   'type': 'integer'}}}
That works - but since Pydantic is complex, to make it more futureproof it might be better to use the namespace object supplied by Pydantic's metaclass instead of a plain dictionary - the formal way to do that is by using the helper functions in the types module:
import types

from pydantic import BaseModel

class Simple:
    val: int = 1

SimpleModel = types.new_class(
    "SimpleModel",
    (BaseModel,),
    exec_body=lambda ns: ns.update(
        {key: val for key, val in Simple.__dict__.items()
         if not key.startswith("_")}
    )
)
The types.new_class call computes the appropriate metaclass and passes the correct namespace object to the callback given as the exec_body argument. There, we just fill it with the contents of the __dict__ of your dynamic class.
Here, I opted to update the namespace and filter out all the "_"-prefixed keys in a single line, but you can define the function passed to exec_body as a full multiline function and filter the contents more carefully.
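For instance, that more explicit variant could look roughly like this (fill_namespace is a made-up name; the annotations are copied over explicitly so Pydantic definitely sees the field types):

import types
from pydantic import BaseModel

class Simple:
    val: int = 1

def fill_namespace(ns: dict) -> None:
    # Copy the annotations explicitly, then every public attribute (the defaults).
    ns["__annotations__"] = dict(Simple.__annotations__)
    for key, value in Simple.__dict__.items():
        if not key.startswith("_"):
            ns[key] = value

SimpleModel = types.new_class("SimpleModel", (BaseModel,), exec_body=fill_namespace)
print(SimpleModel(val=2))  # val=2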

Related

Why are attributes defined outside __init__ in popular packages like SQLAlchemy or Pydantic?

I'm modifying an app, trying to use Pydantic for my application models and SQLAlchemy for my database models.
I have existing classes, where I defined attributes inside the __init__ method as I was taught to do:
import pandas as pd

class Measure:
    def __init__(
        self,
        t_received: int,
        mac_address: str,
        data: pd.DataFrame,
        battery_V: float = 0
    ):
        self.t_received = t_received
        self.mac_address = mac_address
        self.data = data
        self.battery_V = battery_V
In both Pydantic and SQLAlchemy, following the docs, I have to define those attributes outside the __init__ method, for example in Pydantic:
import pandas as pd
import pydantic

class Measure(pydantic.BaseModel):
    t_received: int
    mac_address: str
    data: pd.DataFrame
    battery_V: float
Why is it the case? Isn't this bad practice? Is there any impact on other methods (classmethods, staticmethods, properties ...) of that class?
Note that this is also rather inconvenient, because when I instantiate an object of that class, I don't get suggestions for the parameters expected by the constructor!
Defining attributes of a class in the class namespace directly is totally acceptable and is not special per se for the packages you mentioned. Since the class namespace is (among other things) essentially a blueprint for instances of that class, defining attributes there can actually be useful, when you want to e.g. provide all public attributes with type annotations in a single place in a consistent manner.
Consider also that a public attribute does not necessarily need to be reflected by a parameter in the constructor of the class. For example, this is entirely reasonable:
class Foo:
    a: list[int]
    b: str

    def __init__(self, b: str) -> None:
        self.a = []
        self.b = b
In other words, just because something is a public attribute, that does not mean it should have to be provided by the user upon initialization. To say nothing of protected/private attributes.
What is special about Pydantic (to take your example), is that the metaclass of BaseModel as well as the class itself does a whole lot of magic with the attributes defined in the class namespace. Pydantic refers to a model's typical attributes as "fields" and one bit of magic allows special checks to be done during initialization based on those fields you defined in the class namespace. For example, the constructor must receive keyword arguments that correspond to the non-optional fields you defined.
from pydantic import BaseModel

class MyModel(BaseModel):
    field_a: str
    field_b: int = 1

obj = MyModel(
    field_a="spam",  # required
    field_b=2,       # optional
    field_c=3.14,    # unexpected/ignored
)
If I were to omit field_a during construction of a MyModel instance, an error would be raised. Likewise, if I had tried to pass field_b="eggs", an error would be raised.
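For illustration, a quick sketch of that behaviour (the exact error text depends on the Pydantic version):

from pydantic import BaseModel, ValidationError

class MyModel(BaseModel):
    field_a: str
    field_b: int = 1

try:
    MyModel(field_b=2)  # field_a omitted
except ValidationError as exc:
    print(exc)  # reports that field_a is a required field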
So the fact that you don't write your own __init__ method is a feature Pydantic provides you. You only define the fields and an appropriate constructor is "magically" there for you already.
As for the drawback you mentioned, where you don't get any auto-suggestions, that is true by default for all IDEs. Static type checkers cannot see through that dynamically generated constructor, so they cannot simply infer which arguments are expected. Currently this is solved via extensions, such as the mypy plugin and the PyCharm plugin. Maybe soon the @dataclass_transform decorator from PEP 681 will standardize this for similar packages and thus improve support by static type checkers.
It is also worth noting that even the standard library's dataclasses only work via special extensions in type checkers.
To your other question: there is obviously some impact on methods of such classes (by design), though the specifics are not always obvious. You should of course not simply write your own __init__ method without being careful to call the superclass's __init__ properly inside it. Also, @property setters currently don't work as you would expect (though it is debatable whether it even makes sense to use properties on Pydantic models).
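To illustrate the __init__ point, a minimal sketch (Point is a made-up model) of a custom constructor that still delegates to the parent:

from pydantic import BaseModel

class Point(BaseModel):
    x: int
    y: int

    def __init__(self, **data) -> None:
        # Hand the data to BaseModel.__init__ first, otherwise validation
        # and field assignment never happen; then run any extra logic.
        super().__init__(**data)
        print("initialized:", self.x, self.y)

Point(x=1, y=2)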
To wrap up, this approach is not only not bad practice, it is a great idea to reduce boilerplate code and it is extremely common these days, as evidenced by the fact that hugely popular and established packages (like the aforementioned Pydantic, as well as e.g. SQLAlchemy, Django and others) use this pattern to a certain extent.
Pydantic has its own (rewriting) magic, but SQLAlchemy is a bit easier to explain.
An SQLAlchemy model looks like this:
>>> from sqlalchemy import Column, Integer, String
>>> class User(Base):
...
...     id = Column(Integer, primary_key=True)
...     name = Column(String)
Column, Integer and String are part of SQLAlchemy's declarative machinery; the class attributes defined with Column behave as descriptors. A descriptor is a class that implements the __get__ and __set__ methods. In practice, this means the class can control how data is accessed and stored.
For example this assignment would now use the __set__ method from Column:
class User(Base):
    id = Column(Integer, primary_key=True)
    name = Column(String)

user = User()
user.name = 'John'
This is roughly equivalent to User.__dict__['name'].__set__(user, 'John'): the attribute lookup along the class's MRO finds an object that defines __set__ (a data descriptor), so that method is used instead of writing to the instance dictionary directly. In a simplified version, Column looks something like this:
class Column:
    def __init__(self, field=""):
        self.field = field

    def __get__(self, obj, type):
        return obj.__dict__.get(self.field)

    def __set__(self, obj, val):
        if validate_field(val):
            obj.__dict__[self.field] = val
        else:
            print('not a valid value')
(This is similar to using @property. A descriptor is a re-usable @property.)
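For comparison, roughly the same validation written as a per-attribute @property (a sketch; the isinstance check stands in for validate_field):

class User:
    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, val):
        # Same idea as Column.__set__, but tied to this one attribute.
        if not isinstance(val, str):
            raise ValueError('not a valid value')
        self._name = val

user = User()
user.name = 'John'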

Inherit only a Subset of Fields of Pydantic Model

I would like to generate a Pydantic model that inherits from a parent class, but only has a subset of that parent model's fields.
E.g. ModelB should inherit only field_b from ModelA:
from pydantic import BaseModel

class ModelA(BaseModel):
    field_a: str
    field_b: str

class ModelB(ModelA):
    pass
As far as I know, there is no built-in mechanism for this in Pydantic.
Difficult solutions
You could start messing around with the internals like __fields__
and __fields_set__, but I would strongly advise against it. I think this may be less than trivial, because you need to take into account validators that are already registered and maybe a bunch of other stuff that happens internally once a field is defined on a model.
You could also go the route of e.g. defining your own __init_subclass__ on ModelA or even subclassing ModelMetaclass, but this will likely lead to the same difficulties. Unless you are very familiar with the intricacies of Pydantic models and are prepared to rework your code, if something fundamentally changes on their end, I would not recommend this.
I can think of a few workarounds though.
Potential workarounds
The simplest one in my opinion is simply factoring out the fields that you want to share into their own model:
from pydantic import BaseModel

class ModelWithB(BaseModel):
    field_b: str

class ModelA(ModelWithB):
    field_a: str

class ModelB(ModelWithB):
    pass
This obviously doesn't work if you have no control over ModelA. It also may mess up the order of the fields on ModelA (in this case field_b would come before field_a), which may or may not be important to you. Validation, for example, depends on the order in which fields were defined.
Another possible workaround would be to override the unneeded field in ModelB and make it optional with a None default and exclude it from dict and json exports:
from pydantic import BaseModel, Field

class ModelA(BaseModel):
    field_a: str
    field_b: str

class ModelB(ModelA):
    field_a: str | None = Field(default=None, exclude=True)

b = ModelB(field_b="foo")
print(b.json())
Output:
{"field_b": "foo"}
Note that this does not actually get rid of the field. It is still there and by default still visible in the model's string representation for example, as well as in the model schema. But at least you never need to pass a value for field_a and it is not present, when calling dict or json by default.
Note also that you may run into additional problems, if you have custom validators for field_a that don't work with a None value.
If you provide more details, I might amend this answer, but so far I hope this helps a little.
It was enough for me to hard-copy the field and adjust the extras I had defined. Here is a snippet from my code:
import copy
from pydantic import BaseModel

def copy_primary_field(
    model_from: BaseModel,
    model_to: BaseModel,
    primary_key: str,
) -> BaseModel:
    new_field_name = f"{model_from.__name__}" + "_" + primary_key
    model_to.__fields__[new_field_name] = copy.deepcopy(
        model_from.__fields__[primary_key]
    )
    model_to.__fields__[new_field_name].name = new_field_name
    model_to.__fields__[new_field_name].field_info.extra["references"] = (
        f"{model_from.__name__}" + ":" + primary_key
    )
    return model_to
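A hypothetical usage sketch of the snippet above (the model names are made up; this relies on Pydantic v1 internals such as __fields__ and field_info):

from pydantic import BaseModel

class ModelA(BaseModel):
    id: int

class ModelB(BaseModel):
    name: str

# Copies ModelA's 'id' field onto ModelB under the name 'ModelA_id'.
ModelB = copy_primary_field(ModelA, ModelB, primary_key="id")
print(list(ModelB.__fields__))  # ['name', 'ModelA_id']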

Python Dynamic Type Hints (like Dataclasses)

I have a dataclass, and a function which will create an instance of that dataclass using all the kwargs passed to it.
If I try to create an instance of that dataclass, I can see the type hints/autocomplete for the __init__ method. I just need the similar type hints for a custom function that I want to create.
from dataclasses import dataclass

@dataclass
class Model:
    attr1: str
    attr2: int

def my_func(**kwargs):
    model = Model(**kwargs)
    ...  # Do something else
    ret = [model]
    return ret

# my_func should show that it needs 'attr1' & 'attr2'
my_func(attr1='hello', attr2=65535)
If your IDE isn't sophisticated enough to infer that kwargs isn't used for anything other than creating a Model instance (I'm not sure there is such an IDE), then it has no way of knowing that attr1 and attr2 are the required arguments, and the obvious solution would be to list them explicitly.
I would refactor the function so that it takes a Model instance as argument instead.
Then you would call it like my_func(Model(...)) and the IDE could offer the autocompletion for Model.
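A minimal sketch of that refactor (reusing the Model dataclass from the question; the return type is just illustrative):

from dataclasses import dataclass

@dataclass
class Model:
    attr1: str
    attr2: int

def my_func(model: Model) -> list[Model]:
    ...  # Do something else
    return [model]

# The IDE can now autocomplete the Model(...) constructor arguments.
my_func(Model(attr1='hello', attr2=65535))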

How do you stop Pydantic from accepting additional attributes in a nested model

I find the following Pydantic behaviour surprising.
First I create Item, and then extend it with ItemExtended to include an additional attribute. Now I create ItemContainer which can take a list of Items.
from typing import List
from pydantic import BaseModel

class Item(BaseModel):
    thing: int

class ItemExtended(Item):
    extra_thing: int = 456

class ItemContainer(BaseModel):
    items: List[Item]
For some reason, if I now create an instance of ItemContainer passing in an ItemExtended, then rather than converting the ItemExtended to an Item, it is simply allowed as-is, along with the unwanted default value.
i.e. this
ItemContainer(items=[ItemExtended(thing=123)])
becomes
ItemContainer(items=[ItemExtended(thing=123, extra_thing=456)])
Is there a way to strictly enforce the items type such that this
ItemContainer(items=[ItemExtended(thing=123)])
becomes
ItemContainer(items=[Item(thing=123)])
What you're asking to do goes against how Python thinks about typing. Pydantic is doing something that makes intuitive sense: you're asking for it to be an Item, and it is, in fact, an Item.
If you want to be strict about what's provided, create a stricter subclass:
from typing import List
from pydantic import BaseModel

class Item(BaseModel):
    thing: int

class StrictItem(Item):
    pass

class ItemExtended(Item):
    extra_thing: int = 456

class ItemContainer(BaseModel):
    items: List[StrictItem]
I think you're asking for automatic conversion, however, which you could create a validator for.
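A sketch of such a validator against the Pydantic v1 API (downcast_to_item is a made-up name):

from typing import List
from pydantic import BaseModel, validator

class Item(BaseModel):
    thing: int

class ItemExtended(Item):
    extra_thing: int = 456

class ItemContainer(BaseModel):
    items: List[Item]

    @validator("items", each_item=True)
    def downcast_to_item(cls, v):
        # Rebuild any subclass instance as a plain Item, dropping extra fields.
        if type(v) is not Item:
            return Item(**{name: getattr(v, name) for name in Item.__fields__})
        return v

print(ItemContainer(items=[ItemExtended(thing=123)]))
# items=[Item(thing=123)]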

Python dataclasses: What type to use if __post_init__ performs type conversion?

I have a Python class with a field which can be passed one of several sequence types. To simplify, I'll stick with tuples and lists. __post_init__ converts the parameter to MyList.
from typing import Union
from dataclasses import dataclass, InitVar, field

class MyList(list):
    pass

@dataclass
class Struct:
    field: Union[tuple, list, MyList]

    def __post_init__(self):
        self.field = MyList(self.field)
What type should I use for the field declaration?
If I supply a union of all possible input types, the code does not document that field is always a MyList when accessed.
If I only supply the final MyList type, PyCharm complains when I pass Struct() a list.
I could instead use:
_field: InitVar[Union[tuple, list, MyList]] = None
field: MyList = field(init=False)

def __post_init__(self, _field):
    self.field = MyList(_field)
but this is tremendously ugly, especially when repeated across 3 fields. Additionally I have to construct a struct like Struct(_field=field) instead of Struct(field=field).
In April 2018, "tm" commented on this issue on PyCharm's announcement: https://blog.jetbrains.com/pycharm/2018/04/python-37-introducing-data-class/#comment-323957
You are conflating assigning a value to the attribute with the code that produces the value to assign to the attribute. I would use a separate class method to keep the two pieces of code separate.
from dataclasses import dataclass

class MyList(list):
    pass

@dataclass
class Struct:
    field: MyList

    @classmethod
    def from_iterable(cls, x):
        return cls(MyList(x))

s1 = Struct(MyList([1, 2, 3]))
s2 = Struct.from_iterable((4, 5, 6))
Now, you only pass an existing value of MyList to Struct.__init__. Tuples, lists, and whatever else MyList can accept are passed to Struct.from_iterable instead, which will take care of constructing the MyList instance to pass to Struct.
Have you tried a Pydantic BaseModel instead of a dataclass?
With the following code, my Pycharm does not complain:
from pydantic import BaseModel

class MyList(list):
    pass

class PydanticStruct(BaseModel):
    field: MyList

    def __post_init__(self):
        self.field = MyList(self.field)

a = PydanticStruct(field=['a', 'b'])
dataclasses work best for straightforward data containers; advanced utilities like conversion were consciously omitted (see here for a complete write-up of this and similar features). Implementing this would be a fair bit of work, since it would also need support from the PyCharm plugin to recognize which conversions are now supported.
A much better approach would be to use one of the 3rd-party libraries that already did this, the most popular one being pydantic, probably because it has the easiest migration path for dataclasses.
A native pydantic solution could look like this, where the conversion code is part of MyList. Handling it that way makes the __post_init__ unnecessary, leading to cleaner model definitions:
import pydantic

class MyList(list):
    @classmethod
    def __get_validators__(cls):
        """Validators handle data validation, as well as data conversion.

        This function yields validator functions, with the last-yielded
        result being the final value of a pydantic field annotated with
        this class's type.

        Since we inherit from 'list', our constructor already supports
        building 'MyList' instances from iterables - if we didn't, we
        would need to write that code by hand and yield it instead.
        """
        yield cls

class Struct(pydantic.BaseModel):
    field: MyList  # accepts any iterable as input
print(Struct(field=(1, 2, 3)))
# prints: field=[1, 2, 3]
