I would like to generate a Pydantic model that inherits from a parent class, but only has a subset of that parent model's fields.
E.g. ModelB should inherit only field_b from ModelA:
from pydantic import BaseModel

class ModelA(BaseModel):
    field_a: str
    field_b: str

class ModelB(ModelA):
    pass
As far as I know, there is no built-in mechanism for this in Pydantic.
Difficult solutions
You could start messing around with internals like __fields__ and __fields_set__, but I would strongly advise against it. I think this may be less than trivial, because you need to take into account validators that are already registered and maybe a bunch of other stuff that happens internally once a field is defined on a model.
You could also go the route of e.g. defining your own __init_subclass__ on ModelA or even subclassing ModelMetaclass, but this will likely lead to the same difficulties. Unless you are very familiar with the intricacies of Pydantic models and are prepared to rework your code, if something fundamentally changes on their end, I would not recommend this.
I can think of a few workarounds though.
Potential workarounds
The simplest one in my opinion is simply factoring out the fields that you want to share into their own model:
from pydantic import BaseModel

class ModelWithB(BaseModel):
    field_b: str

class ModelA(ModelWithB):
    field_a: str

class ModelB(ModelWithB):
    pass
This obviously doesn't work, if you have no control over ModelA. It also may mess up the order of the fields on ModelA (in this case field_b would come before field_a), which may or may not be important to you. Validation for example depends on the order in which fields were defined.
Another possible workaround would be to override the unneeded field in ModelB and make it optional with a None default and exclude it from dict and json exports:
from pydantic import BaseModel, Field

class ModelA(BaseModel):
    field_a: str
    field_b: str

class ModelB(ModelA):
    field_a: str | None = Field(default=None, exclude=True)

b = ModelB(field_b="foo")
print(b.json())
Output:
{"field_b": "foo"}
Note that this does not actually get rid of the field. It is still there and, by default, still visible in the model's string representation, for example, as well as in the model schema. But at least you never need to pass a value for field_a, and it is not present by default when calling dict or json.
Note also that you may run into additional problems if you have custom validators for field_a that don't work with a None value.
If you provide more details, I might amend this answer, but so far I hope this helps a little.
It was enough for me to hard copy the field and to adjust the extras I had defined. Here is a snippet from my code:
import copy
from typing import Type

from pydantic import BaseModel

def copy_primary_field(
    model_from: Type[BaseModel],
    model_to: Type[BaseModel],
    primary_key: str,
) -> Type[BaseModel]:
    # Both arguments are model classes, not instances (hence Type[...]).
    new_field_name = f"{model_from.__name__}_{primary_key}"
    # Deep-copy the field so that the source model is left untouched.
    new_field = copy.deepcopy(model_from.__fields__[primary_key])
    new_field.name = new_field_name
    new_field.field_info.extra["references"] = f"{model_from.__name__}:{primary_key}"
    model_to.__fields__[new_field_name] = new_field
    return model_to
Related
I'm modifying an app, trying to use Pydantic for my application models and SQLAlchemy for my database models.
I have existing classes, where I defined attributes inside the __init__ method as I was taught to do:
class Measure:
    def __init__(
        self,
        t_received: int,
        mac_address: str,
        data: pd.DataFrame,
        battery_V: float = 0
    ):
        self.t_received = t_received
        self.mac_address = mac_address
        self.data = data
        self.battery_V = battery_V
In both Pydantic and SQLAlchemy, following the docs, I have to define those attributes outside the __init__ method, for example in Pydantic:
import pydantic

class Measure(pydantic.BaseModel):
    t_received: int
    mac_address: str
    data: pd.DataFrame
    battery_V: float
Why is it the case? Isn't this bad practice? Is there any impact on other methods (classmethods, staticmethods, properties ...) of that class?
Note that this is also very unhandy because when I instantiate an object of that class, I don't get suggestions on what parameters are expected by the constructor!
Defining attributes of a class in the class namespace directly is totally acceptable and is not special per se for the packages you mentioned. Since the class namespace is (among other things) essentially a blueprint for instances of that class, defining attributes there can actually be useful, when you want to e.g. provide all public attributes with type annotations in a single place in a consistent manner.
Consider also that a public attribute does not necessarily need to be reflected by a parameter in the constructor of the class. For example, this is entirely reasonable:
class Foo:
    a: list[int]
    b: str

    def __init__(self, b: str) -> None:
        self.a = []
        self.b = b
In other words, just because something is a public attribute, that does not mean it should have to be provided by the user upon initialization. To say nothing of protected/private attributes.
What is special about Pydantic (to take your example), is that the metaclass of BaseModel as well as the class itself does a whole lot of magic with the attributes defined in the class namespace. Pydantic refers to a model's typical attributes as "fields" and one bit of magic allows special checks to be done during initialization based on those fields you defined in the class namespace. For example, the constructor must receive keyword arguments that correspond to the non-optional fields you defined.
from pydantic import BaseModel

class MyModel(BaseModel):
    field_a: str
    field_b: int = 1

obj = MyModel(
    field_a="spam",  # required
    field_b=2,       # optional
    field_c=3.14,    # unexpected/ignored
)
If I were to omit field_a during construction of a MyModel instance, an error would be raised. Likewise, if I had tried to pass field_b="eggs", an error would be raised.
So the fact that you don't write your own __init__ method is a feature Pydantic provides you. You only define the fields and an appropriate constructor is "magically" there for you already.
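To make that "magic" a bit less mysterious, here is a deliberately simplified, standard-library-only sketch of how a base class could build instances from the annotations in the class namespace. This is not how Pydantic is implemented; AutoInit and _MISSING are names I made up for illustration, and no type validation or coercion is attempted.

```python
import typing

_MISSING = object()  # sentinel: no value supplied and no default defined


class AutoInit:
    """Toy base class (hypothetical): fills instances from annotated fields."""

    def __init__(self, **kwargs):
        # Collect the annotations of the whole class hierarchy.
        hints = typing.get_type_hints(type(self))
        for name in hints:
            # A class attribute with the same name acts as the default value.
            default = getattr(type(self), name, _MISSING)
            value = kwargs.pop(name, default)
            if value is _MISSING:
                raise TypeError(f"missing required field: {name!r}")
            setattr(self, name, value)
        # Unknown keyword arguments are silently dropped in this sketch.


class MyModel(AutoInit):
    field_a: str
    field_b: int = 1


obj = MyModel(field_a="spam", field_c=3.14)
print(obj.field_a, obj.field_b)  # spam 1
```

The point is only that everything the constructor needs (names, types, defaults) is already available in the class namespace, so a library can generate the constructor for you.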
As for the drawback you mentioned, where you don't get any auto-suggestions: that is true by default for all IDEs. Static type checkers cannot understand such a dynamically generated constructor and simply infer which arguments are expected. Currently this is solved via extensions, such as the mypy plugin and the PyCharm plugin. Maybe soon the @dataclass_transform decorator from PEP 681 will standardize this for similar packages and thus improve support by static type checkers.
It is also worth noting that even the standard library's dataclasses only work via special extensions in type checkers.
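For comparison, this is what the same class-namespace style looks like with the standard library's dataclasses: the decorator reads the annotations and generates __init__ for you. The sketch mirrors the Measure class from the question, with the data attribute left out to keep it free of third-party dependencies.

```python
from dataclasses import dataclass, fields


@dataclass
class Measure:
    t_received: int
    mac_address: str
    battery_V: float = 0.0  # a default makes the constructor argument optional


# The generated __init__ accepts the annotated names as arguments.
m = Measure(t_received=1700000000, mac_address="AA:BB:CC:DD:EE:FF")
print(m)
print([f.name for f in fields(Measure)])
```

Unlike Pydantic, dataclasses does not validate or coerce the values you pass in; the annotations are only used to discover the fields.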
To your other question, there is obviously some impact on methods of such classes (by design), though the specifics are not always obvious. You should of course not simply write your own __init__ method without being careful to call the superclass' __init__ properly inside it. Also, @property setters currently don't work as you would expect (though it is debatable whether it even makes sense to use properties on Pydantic models).
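If you do need a custom __init__ on a Pydantic model, a minimal sketch could look like the following. The fields are a trimmed-down version of the Measure class from the question, and lower-casing the MAC address is a made-up example of pre-processing; the important part is that all keyword arguments are forwarded to the superclass, which performs the actual validation.

```python
from pydantic import BaseModel


class Measure(BaseModel):
    t_received: int
    mac_address: str
    battery_V: float = 0.0

    def __init__(self, **data):
        # Hypothetical pre-processing step before validation runs.
        if isinstance(data.get("mac_address"), str):
            data["mac_address"] = data["mac_address"].lower()
        # Hand everything to BaseModel.__init__, which validates the fields.
        super().__init__(**data)


m = Measure(t_received=1, mac_address="AA:BB:CC:DD:EE:FF")
print(m.mac_address)  # aa:bb:cc:dd:ee:ff
```

For this kind of normalisation a validator is usually the more idiomatic choice; the override is shown only to illustrate the super().__init__ call.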
To wrap up, this approach is not only not bad practice, it is a great idea to reduce boilerplate code and it is extremely common these days, as evidenced by the fact that hugely popular and established packages (like the aforementioned Pydantic, as well as e.g. SQLAlchemy, Django and others) use this pattern to a certain extent.
Pydantic has its own (rewriting) magic, but SQLAlchemy is a bit easier to explain.
An SA model looks like this:
>>> from sqlalchemy import Column, Integer, String
>>> class User(Base):
...
...     id = Column(Integer, primary_key=True)
...     name = Column(String)
Attributes declared with Column behave as descriptors on the mapped class (Integer and String are the column types passed to it). A descriptor is an object whose class defines __get__ and/or __set__. In practice, this means the class can control how data is accessed and stored.
For example this assignment would now use the __set__ method from Column:
class User(Base):
    id = Column(Integer, primary_key=True)
    name = Column(String)

user = User()
user.name = 'John'
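This is roughly the same as type(user).__dict__['name'].__set__(user, 'John'): Python looks up name on the class first, finds an object that defines __set__, and calls that method instead of writing straight into the instance's __dict__. In a simplified version the Column looks something like this: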
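def validate_field(val):
    # Placeholder check for this sketch: accept any non-empty value.
    return bool(val)

class Column:
    def __init__(self, field=""):
        self.field = field

    def __get__(self, obj, owner=None):
        if obj is None:  # accessed on the class itself
            return self
        return obj.__dict__.get(self.field)

    def __set__(self, obj, val):
        if validate_field(val):
            obj.__dict__[self.field] = val
        else:
            print('not a valid value')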
(This is similar to using @property. A descriptor is a reusable @property.)
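Here is a small runnable version of the same idea that does not need SQLAlchemy. Validated is a hypothetical descriptor, not SQLAlchemy's actual implementation; it uses __set_name__ so that the attribute name does not have to be passed in by hand.

```python
class Validated:
    """Reusable descriptor that only accepts values of one type."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        # Called once, when the owning class is created.
        self.name = name

    def __get__(self, obj, owner=None):
        if obj is None:  # accessed on the class, not on an instance
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(f"{self.name} must be {self.expected_type.__name__}")
        obj.__dict__[self.name] = value


class User:
    id = Validated(int)
    name = Validated(str)


user = User()
user.name = "John"                                 # goes through Validated.__set__
type(user).__dict__["name"].__set__(user, "Jane")  # the same call, spelled out
print(user.name)  # Jane
```

Assigning a value of the wrong type, e.g. user.id = "abc", raises a TypeError here instead of printing a message.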
I have a (dynamic) definition of a simple class, like so:
class Simple:
val: int = 1
I intend to use this definition to build a pydantic.BaseModel, so it can be defined from the Simple class; basically doing this, but via type, under a metaclass structure where the Simple class is retrieved from.
from pydantic import BaseModel

class SimpleModel(Simple, BaseModel):
    pass

# Actual ways tried:
SimpleModel = type('SimpleModel', (Simple, BaseModel), {})
# or
SimpleModel = type('SimpleModel', (BaseModel, ), Simple.__annotations__)
However, this approach was not returning a model class with the parameters from the Simple class.
I understand that the BaseModel already uses a rather complex metaclass under the hood, however, my intended implementation is also under a metaclass, where I intend to dynamically transfer the Simple class into a BaseModel from pydantic.
Your suggestions will be kindly appreciated.
I managed to get this working by first, casting my Simple class to be a dataclass from pydantic, then getting a pydantic model from it.
I am not an expert in pydantic, so would not mind your views on the approach.
from pydantic.dataclasses import dataclass
SimpleModel = dataclass(Simple).__pydantic_model__
The trouble I did find, however (the same applies to the answer provided by @jsbueno), is this: when an annotation such as pathlib.Path is declared on a BaseModel directly, a string value provided for that field gets coerced to the annotated type. With my approach or @jsbueno's, the value keeps its original type (no coercion).
You can simply call type, passing a dictionary made from Simple's __dict__ attribute. That will contain your fields' default values and the __annotations__ attribute, which is enough information for Pydantic to do its thing.
I would just take the extra step of deleting the __weakref__ attribute that is created by default in the plain Simple class before doing that, to avoid it pointing to the wrong class.
from pydantic import BaseModel

class Simple:
    val: int = 1

new_namespace = dict(Simple.__dict__)  # copies the class dictproxy into a plain dictionary
del new_namespace["__weakref__"]

SimpleModel = type("SimpleModel", (BaseModel,), new_namespace)
and we have
In [58]: SimpleModel.schema()
Out[58]:
{'title': 'Simple',
'type': 'object',
'properties': {'one_val': {'title': 'One Val',
'default': 1,
'type': 'integer'}}}
That works, but since Pydantic is complex, it might be more future-proof to use the namespace object supplied by Pydantic's metaclass instead of a plain dictionary. The formal way to do that is by using the helper functions in the types module:
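import types
from pydantic import BaseModel

class Simple:
    val: int = 1

SimpleModel = types.new_class(
    "SimpleModel",
    (BaseModel,),
    exec_body=lambda ns: ns.update({
        # Keep the annotations explicitly; the "_" filter below would drop them.
        "__annotations__": dict(Simple.__annotations__),
        **{key: val for key, val in Simple.__dict__.items()
           if not key.startswith("_")},
    })
)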
The new_class call computes the appropriate metaclass and passes the correct namespace object to the callback given in the exec_body argument. There, we just fill it with the contents of the dict of your dynamic class.
Here, I opted to update the namespace and filter out all "_" names in a single expression, but you can define the function passed to exec_body as a full multiline function and filter the contents more carefully.
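To see what types.new_class does without involving Pydantic at all, here is a standard-library-only sketch. RecordingMeta and Base are hypothetical stand-ins for a library's metaclass and base class; the exec_body callback is written as a regular function, as suggested above.

```python
import types


class RecordingMeta(type):
    """Stand-in metaclass that remembers which names a class body defined."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        cls.defined_names = [k for k in namespace if not k.startswith("__")]
        return cls


class Base(metaclass=RecordingMeta):
    pass


class Simple:
    val: int = 1


def fill_namespace(ns):
    # Copy the annotations and every public attribute of Simple.
    ns["__annotations__"] = dict(Simple.__annotations__)
    for key, value in vars(Simple).items():
        if not key.startswith("_"):
            ns[key] = value


# new_class derives the metaclass from the bases and calls fill_namespace
# with the namespace object that the metaclass prepared.
SimpleModel = types.new_class("SimpleModel", (Base,), exec_body=fill_namespace)

print(type(SimpleModel).__name__)  # RecordingMeta
print(SimpleModel.defined_names)   # ['val']
```

Note that the metaclass was never named in the new_class call; it was picked up from Base, which is exactly what you want when the base class is BaseModel.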
I find the following Pydantic behaviour surprising.
First I create Item, and then extend it with ItemExtended to include an additional attribute. Now I create ItemContainer which can take a list of Items.
from typing import List
from pydantic import BaseModel

class Item(BaseModel):
    thing: int

class ItemExtended(Item):
    extra_thing: int = 456

class ItemContainer(BaseModel):
    items: List[Item]
For some reason if I now create an instance of ItemContainer passing in an ItemExtended then rather than converting the ItemExtended to an Item it is simply allowed, along with the unwanted default value.
i.e. this
ItemContainer(items=[ItemExtended(thing=123)])
becomes
ItemContainer(items=[ItemExtended(thing=123, extra_thing=456)])
Is there a way to strictly enforce the items type such that this
ItemContainer(items=[ItemExtended(thing=123)])
becomes
ItemContainer(items=[Item(thing=123)])
What you're asking to do violates how Python thinks about typing. Pydantic is doing something that makes intuitive sense: you're asking for an Item, and an ItemExtended is, in fact, an Item.
If you want to be strict about what's provided, create a stricter subclass:
from typing import List
from pydantic import BaseModel

class Item(BaseModel):
    thing: int

class StrictItem(Item):
    pass

class ItemExtended(Item):
    extra_thing: int = 456

class ItemContainer(BaseModel):
    items: List[StrictItem]
I think you're asking for automatic conversion, however, which you could create a validator for.
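A minimal sketch of such a validator, assuming the Pydantic v1 validator decorator and the .dict() method used elsewhere on this page. The name downcast_to_item is my own, and the approach relies on Item ignoring unknown keys, which is the default configuration.

```python
from typing import List

from pydantic import BaseModel, validator


class Item(BaseModel):
    thing: int


class ItemExtended(Item):
    extra_thing: int = 456


class ItemContainer(BaseModel):
    items: List[Item]

    @validator("items", pre=True)
    def downcast_to_item(cls, value):
        # Rebuild subclass instances as plain Items; the extra fields are
        # dropped because Item ignores unknown keys by default.
        return [
            Item(**v.dict()) if isinstance(v, Item) and type(v) is not Item else v
            for v in value
        ]


container = ItemContainer(items=[ItemExtended(thing=123)])
print(container)
```

Plain Item instances and raw dictionaries are passed through untouched and validated as usual.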
From the pydantic docs I understand this:
import pydantic

class User(pydantic.BaseModel):
    id: int
    name: str

class Student(pydantic.BaseModel):
    semester: int

# this works as expected
class Student_User(User, Student):
    building: str

print(Student_User.__fields__.keys())
#> dict_keys(['semester', 'id', 'name', 'building'])
However, when I want to create a similar object dynamically (following the section dynamic-model-creation):
# this results in a TypeError
pydantic.create_model("Student_User2", __base__=(User, Student))
I get:
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
Question: How to dynamically create a class like Student_User
It's not the answer to the original question, but if you are like me and all you care about is having a model which holds the fields of other models, this should be a solution.
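Student_User = pydantic.create_model("Student_User", **{
    # Keep required fields required: Ellipsis (...) marks "no default".
    **{key: (value.outer_type_, ... if value.required else value.default)
       for key, value in User.__fields__.items()},
    **{key: (value.outer_type_, ... if value.required else value.default)
       for key, value in Student.__fields__.items()},
    "building": (str, ""),
})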
Essentially, we are dynamically creating a new pydantic model and we are setting its fields to be the fields of our other models plus an additional custom field.
Note:
OP included these lines in his question:
print(Student_User.__fields__.keys())
#> dict_keys(['semester', 'id', 'name', 'building'])
So, my guess is that his end goal was copying the fields from the other models and having a model created from multiple bases was just a method of achieving it.
As of pydantic==1.9.2,
Student_User2 = pydantic.create_model("Student_User2", __base__=(User, Student), building=(str, ...))
runs successfully and
print(Student_User2.__fields__.keys())
returns
dict_keys(['semester', 'id', 'name', 'building'])
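For context, the error message from the question is a plain Python error that can be reproduced without Pydantic. One way to trigger it is to hand type a bases tuple whose single element is itself a tuple of classes. Whether older Pydantic versions hit exactly this path when __base__ is a tuple is an assumption on my part, but it would be consistent with newer versions accepting a tuple there.

```python
class User:
    pass


class Student:
    pass


# Correct: each base class is its own element of the bases tuple.
Ok = type("Ok", (User, Student), {})

# Mistake: the bases tuple contains one element that is itself a tuple.
try:
    type("Broken", ((User, Student),), {})
except TypeError as exc:
    message = str(exc)
    print(message)
```

The "base" in the failing call is a tuple instance, whose type (tuple) is not a metaclass at all, hence the complaint about conflicting metaclasses.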
Your problem is not with pydantic but with how Python handles multiple inheritance. I am assuming that in the above code you want a class which has the fields of both User and Student, so a better way to do that is
class User(pydantic.BaseModel):
    id: int
    name: str

class Student(User):
    semester: int

class Student_User(Student):
    building: str
This gets your job done. So, now if you want to create these models dynamically, you would do
pydantic.create_model("Student_User2", building=(str, ...), __base__=Student)
Obviously, building is the new model's field, so you can change that as you want.
So the final complete code would look something like this:
import pydantic

class User(pydantic.BaseModel):
    id: int
    name: str

class Student(User):
    semester: int

class Student_User(Student):
    building: str

print(Student_User.__fields__.keys())

model = pydantic.create_model("Student_User2", building=(str, ...), __base__=Student)
I've read some parts of the Pydantic library and done some tests but I can't figure out what is the added benefit of using Field(...) (with no extra options) in a schema definition instead of simply not adding a default value.
So what is added here:
from pydantic import BaseModel, Field

class Model(BaseModel):
    a: int = Field(...)
that is not here:
from pydantic import BaseModel

class Model(BaseModel):
    a: int
Is there any special behaviour that I'm missing?
These are basically the same. The reason you might want to use Field(...) is that it lets you supply other settings for the field via keyword arguments to Field().
If you have no other settings for the field, using Field() is unnecessary.
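As a small illustration of when Field(...) does earn its keep, here is a sketch where the field stays required but also gets a constraint and a description. The particular settings (gt and description) are just examples I picked.

```python
from pydantic import BaseModel, Field, ValidationError


class Model(BaseModel):
    # Still required (the Ellipsis), but now with a constraint and metadata.
    a: int = Field(..., gt=0, description="A strictly positive integer")


print(Model(a=3).a)  # 3

try:
    Model(a=0)  # violates gt=0
except ValidationError as exc:
    print(len(exc.errors()), "validation error")
```

Omitting a entirely still fails, exactly as it would with the bare annotation a: int.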