I'm using Pydantic to define hierarchical data in which there are models with identical attributes.
However, when I save and load these models, Pydantic can no longer distinguish which model was used and picks the first one in the field type annotation.
I understand that this is expected behavior based on the documentation.
However, the class type information is important to my application.
What is the recommended way to distinguish between different classes in Pydantic? One hack is to simply add an extraneous field to one of the models, but I'd like to find a more elegant solution.
See the simplified example below: container is initialized with data of type DataB, but after exporting and loading, the new container has data of type DataA as it's the first element in the type declaration of container.data.
Thanks for your help!
from abc import ABC
from pydantic import BaseModel #pydantic 1.8.2
from typing import Union
class Data(BaseModel, ABC):
""" base class for a Member """
number: float
class DataA(Data):
""" A type of Data"""
pass
class DataB(Data):
""" Another type of Data """
pass
class Container(BaseModel):
""" container holds a subclass of Data """
data: Union[DataA, DataB]
# initialize container with DataB
data = DataB(number=1.0)
container = Container(data=data)
# export container to string and load new container from string
string = container.json()
new_container = Container.parse_raw(string)
# look at type of container.data
print(type(new_container.data).__name__)
# >>> DataA
As correctly noted in the comments, without storing additional information, the models cannot be distinguished when parsing.
As of today (pydantic v1.8.2), the most canonical way to distinguish models when parsing a Union (in case of ambiguity) is to explicitly add a type-specifier Literal field. It will look like this:
from abc import ABC
from pydantic import BaseModel
from typing import Union, Literal
class Data(BaseModel, ABC):
""" base class for a Member """
number: float
class DataA(Data):
""" A type of Data"""
tag: Literal['A'] = 'A'
class DataB(Data):
""" Another type of Data """
tag: Literal['B'] = 'B'
class Container(BaseModel):
""" container holds a subclass of Data """
data: Union[DataA, DataB]
# initialize container with DataB
data = DataB(number=1.0)
container = Container(data=data)
# export container to string and load new container from string
string = container.json()
new_container = Container.parse_raw(string)
# look at type of container.data
print(type(new_container.data).__name__)
# >>> DataB
This method can be automated, but use it at your own risk, since it breaks static typing and relies on internal objects that may change in future versions:
from pydantic.fields import ModelField
class Data(BaseModel, ABC):
""" base class for a Member """
number: float
def __init_subclass__(cls, **kwargs):
name = 'tag'
value = cls.__name__
annotation = Literal[value]
tag_field = ModelField.infer(name=name, value=value, annotation=annotation, class_validators=None, config=cls.__config__)
cls.__fields__[name] = tag_field
cls.__annotations__[name] = annotation
class DataA(Data):
""" A type of Data"""
pass
class DataB(Data):
""" Another type of Data """
pass
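A quick sanity check of the automated variant (a minimal sketch; Container is re-declared against the new DataA/DataB, and the injected tag values are the subclass names 'DataA' and 'DataB'):
class Container(BaseModel):
    """ container holds a subclass of Data """
    data: Union[DataA, DataB]

container = Container(data=DataB(number=1.0))
new_container = Container.parse_raw(container.json())
print(type(new_container.data).__name__)
# >>> DataB  (the auto-injected Literal tag disambiguates the Union)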
Just wanted to take the opportunity to list another possible alternative here to pydantic, which already supports this use case very well, as per the other answer.
I am the creator and maintainer of a relatively newer and lesser-known JSON serialization library, the Dataclass Wizard, which relies on the Python dataclasses module to perform its magic. As of the latest version, 0.14.0, dataclass-wizard now supports dataclasses within Union types. Previously it did not support them at all, which was kind of a glaring omission, and something on my "to-do" list of things to (eventually) add support for.
The reason it did not generally work before is that the data being de-serialized is often a JSON object, which only knows simple types such as arrays and dictionaries. A dict would therefore not match any of the Union[Data1, Data2] types, even if the object had all the correct dataclass fields as keys, simply because the dict object is not compared against each of the dataclass fields in the Union types, though that might change in a future release.
So in any case, here is a simple example to demonstrate the usage of dataclasses in Union types, using a class inheritance model with the JSONWizard mixin class:
With Class Inheritance
from abc import ABC
from dataclasses import dataclass
from typing import Union
from dataclass_wizard import JSONWizard
@dataclass
class Data(ABC):
""" base class for a Member """
number: float
class DataA(Data, JSONWizard):
""" A type of Data"""
class _(JSONWizard.Meta):
"""
This defines a custom tag that uniquely identifies the dataclass.
"""
tag = 'A'
class DataB(Data, JSONWizard):
""" Another type of Data """
class _(JSONWizard.Meta):
"""
This defines a custom tag that uniquely identifies the dataclass.
"""
tag = 'B'
@dataclass
class Container(JSONWizard):
""" container holds a subclass of Data """
data: Union[DataA, DataB]
The usage is shown below and is again pretty straightforward. It relies on a special __tag__ key set in a dictionary or JSON object to marshal it into the correct dataclass, based on the Meta.tag value for each class that we set up above.
print('== Load with DataA ==')
input_dict = {
'data': {
'number': '1.0',
'__tag__': 'A'
}
}
# De-serialize the `dict` object to a `Container` instance.
container = Container.from_dict(input_dict)
print(repr(container))
# prints:
# Container(data=DataA(number=1.0))
# Show the prettified JSON representation of the instance.
print(container)
# Assert we load the correct dataclass from the annotated `Union` types
assert type(container.data) == DataA
print()
print('== Load with DataB ==')
# initialize container with DataB
data_b = DataB(number=2.0)
container = Container(data=data_b)
print(repr(container))
# prints:
# Container(data=DataB(number=2.0))
# Show the prettified JSON representation of the instance.
print(container)
# Assert we load the correct dataclass from the annotated `Union` types
assert type(container.data) == DataB
# Assert we end up with the same instance when serializing and de-serializing
# our data.
string = container.to_json()
assert container == Container.from_json(string)
Without Class Inheritance
Here is the same example as above, but with relying solely on dataclasses, without using any special class inheritance model:
from abc import ABC
from dataclasses import dataclass
from typing import Union
from dataclass_wizard import asdict, fromdict, LoadMeta
@dataclass
class Data(ABC):
""" base class for a Member """
number: float
class DataA(Data):
""" A type of Data"""
class DataB(Data):
""" Another type of Data """
@dataclass
class Container:
""" container holds a subclass of Data """
data: Union[DataA, DataB]
# Setup tags for the dataclasses. This can be passed into either
# `LoadMeta` or `DumpMeta`.
#
# Note that I'm not a fan of this syntax either, so it might change. I was
# thinking of something more explicit, like `LoadMeta(...).bind_to(class)`
LoadMeta(DataA, tag='A')
LoadMeta(DataB, tag='B')
# The rest is the same as before.
# initialize container with DataB
data = DataB(number=2.0)
container = Container(data=data)
print(repr(container))
# prints:
# Container(data=DataB(number=2.0))
# Assert we load the correct dataclass from the annotated `Union` types
assert type(container.data) == DataB
# Assert we end up with the same data when serializing and de-serializing.
out_dict = asdict(container)
assert container == fromdict(Container, out_dict)
I'm trying to hack something together in the meantime using custom validators.
Basically the class decorator adds a class_name: str field, which is added to the json string. The validator then looks up the correct subclass based on its value.
import json

def register_distinct_subclasses(fields: tuple):
    """ fields is a tuple of subclasses that we want to be registered as distinct """
    field_map = {field.__name__: field for field in fields}

    def _register_distinct_subclasses(cls):
        """ cls is the superclass of fields, to which we add a new validator """
        orig_init = cls.__init__

        class _class(cls):
            class_name: str

            def __init__(self, **kwargs):
                kwargs["class_name"] = type(self).__name__
                orig_init(self, **kwargs)

            @classmethod
            def __get_validators__(cls):
                yield cls.validate

            @classmethod
            def validate(cls, v):
                if isinstance(v, dict):
                    class_name = v.get("class_name")
                    json_string = json.dumps(v)
                else:
                    class_name = v.class_name
                    json_string = v.json()
                cls_type = field_map[class_name]
                return cls_type.parse_raw(json_string)

        return _class

    return _register_distinct_subclasses
which is called as follows
Data = register_distinct_subclasses((DataA, DataB))(Data)
I have Input and Output pandera SchemaModels, and the Output inherits from the Input, which accurately represents that all attributes of the Input schema are in the scope of the Output schema.
What I want to avoid is inheriting all attributes as required (non-Optional), as they rightly are in the Input schema. Instead, I want to keep them required for the Input schema but define which of them remain required for the Output schema, while the other inherited attributes become optional.
This pydantic question is similar and has a solution based on defining an __init_subclass__ method in the parent class. However, this doesn't work out of the box for pandera classes, and I'm not sure whether it is even implementable or the right approach.
import pandas as pd
import pandera as pa
from typing import Optional
from pandera.typing import Index, DataFrame, Series, Category
class InputSchema(pa.SchemaModel):
reporting_date: Series[pa.DateTime] = pa.Field(coerce=True)
def __init_subclass__(cls, optional_fields=None, **kwargs):
super().__init_subclass__(**kwargs)
if optional_fields:
for field in optional_fields:
cls.__fields__[field].outer_type_ = Optional
cls.__fields__[field].required = False
class OutputSchema(InputSchema, optional_fields=['reporting_date']):
test: Series[str] = pa.Field()
@pa.check_types
def func(inputs: DataFrame[InputSchema]) -> DataFrame[OutputSchema]:
inputs = inputs.drop(columns=['reporting_date'])
inputs['test'] = 'a'
return inputs
data = pd.DataFrame({'reporting_date': ['2023-01-11', '2023-01-12']})
func(data)
Error:
---> 18 class OutputSchema(InputSchema, optional_fields=['reporting_date']):
KeyError: 'reporting_date'
Edit:
The desired outcome is to be able to set which fields from the inherited schema remain required, while the remaining ones become optional:
class InputSchema(pa.SchemaModel):
reporting_date: Series[pa.DateTime] = pa.Field(coerce=True)
other_field: Series[str] = pa.Field()
class OutputSchema(InputSchema, required=['reporting_date']):
test: Series[str] = pa.Field()
The resulting OutputSchema has reporting_date and test as required while other_field as optional.
A similar question was asked on pandera's issue tracker, with a docs update on track for the next pandera release. There is no clean solution, but the simplest one is to exclude columns by overloading to_schema:
import pandera as pa
from pandera.typing import Series
class InputSchema(pa.SchemaModel):
reporting_date: Series[pa.DateTime] = pa.Field(coerce=True)
class OutputSchema(InputSchema):
test: Series[str]
@classmethod
def to_schema(cls) -> pa.DataFrameSchema:
return super().to_schema().remove_columns(["reporting_date"])
This runs without SchemaError against your check function.
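A quick check (a sketch, assuming the two schemas above) confirms that the excluded column is gone from the generated schema:
# "reporting_date" no longer appears in the schema derived from OutputSchema
print(list(OutputSchema.to_schema().columns))
# >>> ['test']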
Here is a solution that reuses the existing type annotation from the input schema:
import pandera as pa
import pandas as pd
from typing import Optional
from pandera.typing import Index, DataFrame, Series, Category
from pydantic import Field, BaseModel
from typing import Annotated, Type
def copy_field(from_model: Type[BaseModel], fname: str, annotations: dict[str, ...]):
annotations[fname] = from_model.__annotations__[fname]
class InputSchema(pa.SchemaModel):
reporting_date: Series[pa.DateTime] = pa.Field(coerce=True)
not_inherit: Series[str]
class OutputSchema(pa.SchemaModel):
test: Series[str] = pa.Field()
copy_field(InputSchema, "reporting_date", __annotations__)
# reporting_date: Series[pa.DateTime] = pa.Field(coerce=True)
# not_inherit: Optional[Series[str]]
data = pd.DataFrame({
'reporting_date': ['2023-01-11', '2023-01-12'],
'not_inherit': ['a','a']
})
@pa.check_types
def func(
inputs: DataFrame[InputSchema]
) -> DataFrame[OutputSchema]:
inputs = inputs.drop(columns=['not_inherit'])
inputs['test'] = 'a'
return inputs
func(data)
I am using pydantic to manage settings for an app that supports different datasets. Each has a set of overridable defaults, but they differ per dataset. Currently, I have all of the logic correctly implemented via validators:
from pydantic import BaseModel, validator
class DatasetSettings(BaseModel):
dataset_name: str
table_name: str
@validator("table_name", always=True)
def validate_table_name(cls, v, values):
if isinstance(v, str):
return v
if values["dataset_name"] == "DATASET_1":
return "special_dataset_1_default_table"
if values["dataset_name"] == "DATASET_2":
return "special_dataset_2_default_table"
return "default_table"
class AppSettings(BaseModel):
dataset_settings: DatasetSettings
app_url: str
This way, I get different defaults based on dataset_name, but the user can override them if necessary. This is the desired behavior. The trouble is that once there are more than a handful of such fields and names, it gets to be a mess to read and to maintain. It seems like inheritance/polymorphism would solve this problem but the pydantic factory logic seems too hardcoded to make it feasible, especially with nested models.
class Dataset1Settings(DatasetSettings):
dataset_name: str = "DATASET_1"
table_name: str = "special_dataset_1_default_table"
class Dataset2Settings(DatasetSettings):
dataset_name: str = "DATASET_2"
table_name: str = "special_dataset_2_default_table"
def dataset_settings_factory(dataset_name, table_name=None):
    # NB: pydantic models only accept keyword arguments
    overrides = {} if table_name is None else {"table_name": table_name}
    if dataset_name == "DATASET_1":
        return Dataset1Settings(dataset_name=dataset_name, **overrides)
    if dataset_name == "DATASET_2":
        return Dataset2Settings(dataset_name=dataset_name, **overrides)
    return DatasetSettings(dataset_name=dataset_name, **overrides)
class AppSettings(BaseModel):
dataset_settings: DatasetSettings
app_url: str
Options I've considered:
Create a new set of default dataset settings models, override __init__ of DatasetSettings, instantiate the subclass and copy its attributes into the parent class. Kind of clunky.
Override __init__ of AppSettings using the dataset_settings_factory to set the dataset_settings attribute of AppSettings. Not so good because the default behavior doesn't work in the DatasetSettings at all, only when instantiated as a nested model in AppSettings.
I was hoping Field(default_factory=dataset_settings_factory) would work, but default_factory is only for actual defaults, so it takes zero arguments. Is there some other way to intercept the args of a particular pydantic field and use a custom factory?
Another option would be to use Discriminated/Tagged Unions.
But your solution (without looking at it in detail) looks fine too.
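For reference, here is a minimal sketch of the discriminated-union approach (it requires pydantic 1.9+; the model and field names mirror the question, and the app_url value is just a placeholder):
from typing import Literal, Union
from pydantic import BaseModel, Field

class Dataset1Settings(BaseModel):
    dataset_name: Literal["DATASET_1"] = "DATASET_1"
    table_name: str = "special_dataset_1_default_table"

class Dataset2Settings(BaseModel):
    dataset_name: Literal["DATASET_2"] = "DATASET_2"
    table_name: str = "special_dataset_2_default_table"

class AppSettings(BaseModel):
    # pydantic picks the right model based on the value of dataset_name
    dataset_settings: Union[Dataset1Settings, Dataset2Settings] = Field(
        ..., discriminator="dataset_name"
    )
    app_url: str

app_settings = AppSettings(
    dataset_settings={"dataset_name": "DATASET_2"}, app_url="https://example.com"
)
assert app_settings.dataset_settings.table_name == "special_dataset_2_default_table"
The user can still override table_name explicitly; only the defaults differ per dataset_name.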
I ended up solving the problem following the first option, as follows. Code is runnable with pydantic 1.8.2 and pydantic 1.9.1.
from typing import Optional
from pydantic import BaseModel, Field
class DatasetSettings(BaseModel):
dataset_name: Optional[str] = Field(default="DATASET_1")
table_name: Optional[str] = None
def __init__(self, **data):
factory_dict = {"DATASET_1": Dataset1Settings, "DATASET_2": Dataset2Settings}
dataset_name = (
data["dataset_name"]
if "dataset_name" in data
else self.__fields__["dataset_name"].default
)
if dataset_name in factory_dict:
data = factory_dict[dataset_name](**data).dict()
super().__init__(**data)
class Dataset1Settings(BaseModel):
dataset_name: str = "DATASET_1"
table_name: str = "special_dataset_1_default_table"
class Dataset2Settings(BaseModel):
dataset_name: str = "DATASET_2"
table_name: str = "special_dataset_2_default_table"
class AppSettings(BaseModel):
dataset_settings: DatasetSettings = Field(default_factory=DatasetSettings)
app_url: Optional[str]
app_settings = AppSettings(dataset_settings={"dataset_name": "DATASET_1"})
assert app_settings.dataset_settings.table_name == "special_dataset_1_default_table"
app_settings = AppSettings(dataset_settings={"dataset_name": "DATASET_2"})
assert app_settings.dataset_settings.table_name == "special_dataset_2_default_table"
# bonus: no args mode
app_settings = AppSettings()
assert app_settings.dataset_settings.table_name == "special_dataset_1_default_table"
A couple of gotchas I discovered along the way:
If Dataset1Settings inherits from DatasetSettings, it enters a recursive loop, calling __init__ on __init__ ad infinitum. This could be broken with some introspection, but I opted for the duck-typing approach.
The current solution destroys any validators on DatasetSettings. I'm sure there's a way to call the validation logic anyway, but the current solution effectively sidesteps whatever class-level validation you have by only initializing with super().__init__.
The same thing works for BaseSettings objects, but you have to drag along their cumbersome __init__ args:
def __init__(
self,
_env_file: Union[Path, str, None] = None,
_env_file_encoding: Optional[str] = None,
_secrets_dir: Union[Path, str, None] = None,
**values: Any
):
...
I wonder if there is a way to implement subclasses of a base class for different types. Each subclass should have individual input and output types while providing same behaviour as the base class.
Background: I want to process voltage and temperature samples. 100 voltage samples form a VoltageDataset. 100 temperature samples form a TemperatureDataset. Multiple VoltageDatasets form a VoltageDataCluster. Same for temperature. The processing of Datasets depends on their physical quantity. To ensure that voltage related processing can't be applied to temperature samples I'd like to add type hints.
So it would be nice if there were a way to define that VoltageDataCluster's append_dataset method allows only VoltageDataset as its input type. Same for temperature.
Is there a way to implement this behaviour without copy&pasting?
# base class
class DataCluster:
def __init__(self, name):
self.name = name
self.datasets = list()
def append_dataset(self, dataset: Dataset) -> None:
self.datasets.append(dataset)
# subclass that should allow VoltageDataset input only.
class VoltageDataCluster(DataCluster):
pass
# subclass that should allow TemperatureDataset input only.
class TemperatureDataCluster(DataCluster):
pass
Thanks!
Niklas
You could use pydantic generic models.
from typing import Generic, TypeVar, List
from pydantic.generics import GenericModel
DataT = TypeVar('DataT')
class DataCluster(GenericModel, Generic[DataT]):
name: str
datasets: List[DataT] = []
def append_dataset(self, dataset: DataT) -> None:
self.datasets.append(dataset)
voltage_cluster = DataCluster[VoltageDataset](name="name")
voltage_cluster.append_dataset(some_voltage_dataset)
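A self-contained sketch of the same idea (VoltageDataset and TemperatureDataset below are placeholders for your actual sample models):
from typing import Generic, List, TypeVar
from pydantic import BaseModel
from pydantic.generics import GenericModel

DataT = TypeVar('DataT')

class VoltageDataset(BaseModel):
    samples: List[float] = []

class TemperatureDataset(BaseModel):
    samples: List[float] = []

class DataCluster(GenericModel, Generic[DataT]):
    name: str
    datasets: List[DataT] = []

    def append_dataset(self, dataset: DataT) -> None:
        self.datasets.append(dataset)

# Parametrized aliases give you the per-quantity cluster types
VoltageDataCluster = DataCluster[VoltageDataset]
TemperatureDataCluster = DataCluster[TemperatureDataset]

voltage_cluster = VoltageDataCluster(name="run-1")
voltage_cluster.append_dataset(VoltageDataset(samples=[1.0, 2.0]))

# A static type checker such as mypy flags this line, since append_dataset on
# DataCluster[VoltageDataset] only accepts VoltageDataset:
# voltage_cluster.append_dataset(TemperatureDataset(samples=[20.0]))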
When you inherit from a class, the subclass automatically inherits its functionality, so there is no need to copy and paste. I'll illustrate this with an example.
# DataCluster.py
class DataCluster:
def __init__(self, name):
self.name = name
def printHello(self):
print("Hello")
# This will work in sub classes that have a "data" attribute
def printData(self):
print(self.data)
# VoltageDataCluster.py
from superclasses.DataCluster import DataCluster
class VoltageDataCluster(DataCluster):
def __init__(self, differentInput):
self.differentInput = differentInput
self.data = "someotherdata"
# mainclass.py
from superclasses.DataCluster import DataCluster
from superclasses.VoltageDataCluster import VoltageDataCluster
dc = DataCluster("mark")
dc.printHello()
# The input for this class is not name
vdc = VoltageDataCluster("Some Other Input")
# These methods are only defined in DataCluster
vdc.printHello()
vdc.printData()
As you can see, even though we only defined the printHello method in the super class, the other class inherited this method while using different inputs. So no copying and pasting required. The comments at the top of each snippet above indicate which file each class lives in.
EDIT: Added a data attribute so it's more relevant to your example.
I have a class like:
import uuid
import jsonpickle

class Pathology:
"""
Represents a pathology, which is initialized with a name and description.
"""
def __init__(self, name: str, description: str):
self.id = str(uuid.uuid4())
self.name = name
self.description = description
self.phases = []
def to_json(self):
return jsonpickle.encode(self, make_refs=False, unpicklable=False)
In this class, I do not ever want a user to pass in a value for id, I always wish to generate it upon construction.
When deserializing from JSON, I wish to do something like:
with open('data/test_case_1.json', 'r') as test_case_1_file:
test_case_1 = test_case_1_file.read()
# parse file
obj = jsonpickle.decode(test_case_1)
assert pathology == Pathology(**obj)
However, I run into an error TypeError: __init__() got an unexpected keyword argument 'id'
I suspect this is because the init constructor does not have the field id available.
What is the pythonic way to support this behavior?
In this class, I do not ever want a user to pass in a value for id; I always wish to generate it upon construction.
Based on the above desired result, my recommendation is to define id as a (read-only) property. The benefit of defining it as a property is that it won't be treated as an instance attribute, and consequently it won't accept a value via the constructor; the main drawback is that it won't show up in the class's __repr__ value (assuming we use the generated one we get from dataclasses) or in the dataclasses.asdict helper function.
I've also made a few additional changes to the implementation (hopefully for the better):
Re-declare the class as a dataclass, which I personally prefer as it reduces a bit of boilerplate code such as an __init__ constructor, or the need to define an __eq__ method for example (the latter to check if two class objects are equal via ==). The dataclasses module also provides a helpful asdict function which we can make use of in the serialization process.
Use built-in JSON (de)serialization via the json module. Part of the reason for this decision is I have personally never used the jsonpickle module, and I only have a rudimentary understanding of how pickling works in general. I feel that converting class objects to/from JSON is more natural, and likely also performs better in any case.
Add a from_json_file helper method, which we can use to load a new class object from a local file path.
import json
import uuid
from dataclasses import dataclass, asdict, field, fields
from functools import cached_property
from typing import List
@dataclass
class Pathology:
"""
Represents a pathology, which is initialized with a name and description.
"""
name: str
description: str
phases: List[str] = field(init=False, default_factory=list)
@cached_property
def id(self) -> str:
return str(uuid.uuid4())
def to_json(self):
return json.dumps(asdict(self))
@classmethod
def from_json_file(cls, file_name: str):
# A list of only the fields that can be passed in to the constructor.
# Note: maybe it's worth caching this for repeated runs.
init_fields = tuple(f.name for f in fields(cls) if f.init)
if not file_name.endswith('.json'):
file_name += '.json'
with open(file_name, 'r') as in_file:
test_case_1 = json.load(in_file)
# parse file
return cls(**{k: v for k, v in test_case_1.items() if k in init_fields})
And here's some quick code I put together, to confirm that everything is as expected:
def main():
p1 = Pathology('my-name', 'my test description.')
print('P1:', p1)
p_id = p1.id
print('P1 -> id:', p_id)
assert p1.id == p_id, 'expected id value to be cached'
print('Serialized JSON:', p1.to_json())
# Save JSON to file
with open('my_file.json', 'w') as out_file:
out_file.write(p1.to_json())
# De-serialize object from file
p2 = Pathology.from_json_file('my_file')
print('P2:', p2)
# assert both objects are same
assert p2 == p1
# IDs should be unique, since it's automatically generated each time (we
# don't pass in an ID to the constructor or store it in JSON file)
assert p1.id != p2.id, 'expected IDs to be unique'
if __name__ == '__main__':
main()
Scroll all the way down for a tl;dr; I provide context which I think is important, but it is not directly relevant to the question asked.
A bit of context
I'm in the making of an API for a webapp and some values are computed based on the values of others in a pydantic BaseModel. These are used for user validation, data serialization and definition of database (NoSQL) documents.
Specifically, I have nearly all resources inheriting from a OwnedResource class, which defines, amongst irrelevant other properties like creation/last-update dates:
object_key -- The key of the object using a nanoid of length 6 with a custom alphabet
owner_key -- This key references the user that owns that object -- a nanoid of length 10.
_key -- this one is where I'm bumping into some problems, and I'll explain why.
So ArangoDB -- the database I'm using -- imposes _key as the name of the property by which resources are identified.
Since, in my webapp, all resources are only accessed by the users who created them, they can be identified in URLs with just the object's key (e.g. /subject/{object_key}). However, as _key must be unique, I intend to construct the value of this field using f"{owner_key}/{object_key}", to store the objects of every user in the database and potentially allow for cross-user resource sharing in the future.
The goal is to have the shortest per-user unique identifier, since the owner_key part of the full _key used to actually access and act upon the document stored in the database is always the same: the currently-logged-in user's _key.
My attempt
My thought was then to define the _key field as a @property-decorated function in the class. However, Pydantic does not seem to register those as model fields.
Moreover, the attribute must actually be named key and use an alias (with Field(..., alias="_key")), as pydantic treats underscore-prefixed fields as internal and does not expose them.
Here is the definition of OwnedResource:
class OwnedResource(BaseModel):
"""
Base model for resources owned by users
"""
object_key: ObjectBareKey = nanoid.generate(ID_CHARSET, OBJECT_KEY_LEN)
owner_key: UserKey
updated_at: Optional[datetime] = None
created_at: datetime = datetime.now()
@property
def key(self) -> ObjectKey:
return objectkey(self.owner_key)
class Config:
fields = {"key": "_key"} # [1]
[1] Since Field(..., alias="...") cannot be used, I use this property of the Config subclass (see pydantic's documentation)
However, this does not work, as shown in the following example:
@router.post("/subjects/")
def create_a_subject(subject: InSubject):
print(subject.dict(by_alias=True))
with InSubject defining properties proper to Subject, and Subject being an empty class inheriting from both InSubject and OwnedResource:
class InSubject(BaseModel):
name: str
color: Color
weight: Union[PositiveFloat, Literal[0]] = 1.0
goal: Primantissa # This is just a float constrained in a [0, 1] range
room: str
class Subject(InSubject, OwnedResource):
pass
When I perform a POST /subjects/, the following is printed in the console:
{'name': 'string', 'color': Color('cyan', rgb=(0, 255, 255)), 'weight': 0, 'goal': 0.0, 'room': 'string'}
As you can see, _key or key are nowhere to be seen.
Please ask for details and clarification, I tried to make this as easy to understand as possible, but I'm not sure if this is clear enough.
tl;dr
A more generic, context-free example:
With the following class:
from pydantic import BaseModel
class SomeClass(BaseModel):
spam: str
@property
def eggs(self) -> str:
return self.spam + " bacon"
class Config:
fields = {"eggs": "_eggs"}
I would like the following to be true:
a = SomeClass(spam="I like")
d = a.dict(by_alias=True)
d.get("_eggs") == "I like bacon"
Pydantic does not support serializing properties; there is an issue on GitHub requesting this feature.
Based on this comment by ludwig-weiss, one workaround is to subclass BaseModel and override the dict method to include the properties.
class PropertyBaseModel(BaseModel):
"""
Workaround for serializing properties with pydantic until
https://github.com/samuelcolvin/pydantic/issues/935
is solved
"""
@classmethod
def get_properties(cls):
return [prop for prop in dir(cls) if isinstance(getattr(cls, prop), property) and prop not in ("__values__", "fields")]
def dict(
self,
*,
include: Union['AbstractSetIntStr', 'MappingIntStrAny'] = None,
exclude: Union['AbstractSetIntStr', 'MappingIntStrAny'] = None,
by_alias: bool = False,
skip_defaults: bool = None,
exclude_unset: bool = False,
exclude_defaults: bool = False,
exclude_none: bool = False,
) -> 'DictStrAny':
attribs = super().dict(
include=include,
exclude=exclude,
by_alias=by_alias,
skip_defaults=skip_defaults,
exclude_unset=exclude_unset,
exclude_defaults=exclude_defaults,
exclude_none=exclude_none
)
props = self.get_properties()
# Include and exclude properties
if include:
props = [prop for prop in props if prop in include]
if exclude:
props = [prop for prop in props if prop not in exclude]
# Update the attribute dict with the properties
if props:
attribs.update({prop: getattr(self, prop) for prop in props})
return attribs
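Usage with the tl;dr example might then look like this (a sketch; note the property ends up in the output under its own name, eggs, since this workaround does not apply aliases to properties):
class SomeClass(PropertyBaseModel):
    spam: str

    @property
    def eggs(self) -> str:
        return self.spam + " bacon"

a = SomeClass(spam="I like")
print(a.dict())
# >>> {'spam': 'I like', 'eggs': 'I like bacon'}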
You might be able to serialize your _key field using a pydantic validator with the always option set to True.
Using your example:
from typing import Optional
from pydantic import BaseModel, Field, validator
class SomeClass(BaseModel):
spam: str
eggs: Optional[str] = Field(alias="_eggs")
@validator("eggs", always=True)
def set_eggs(cls, v, values, **kwargs):
"""Set the eggs field based upon a spam value."""
return v or values.get("spam") + " bacon"
a = SomeClass(spam="I like")
my_dictionary = a.dict(by_alias=True)
print(my_dictionary)
> {'spam': 'I like', '_eggs': 'I like bacon'}
print(my_dictionary.get("_eggs"))
> "I like bacon"
So to serialize your _eggs field, instead of appending a string, you'd insert your serialization function there and return the output of that.
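Applied back to the OwnedResource example from the question, that could look roughly like this (a sketch with the key types simplified to str; the composed value follows the owner_key/object_key scheme described above):
from typing import Optional
from pydantic import BaseModel, Field, validator

class OwnedResource(BaseModel):
    object_key: str
    owner_key: str
    key: Optional[str] = Field(default=None, alias="_key")

    @validator("key", always=True)
    def set_key(cls, v, values):
        # Compose the ArangoDB document key from the owner and object keys
        return v or f"{values['owner_key']}/{values['object_key']}"

r = OwnedResource(object_key="abc123", owner_key="user-0001")
print(r.dict(by_alias=True))
# >>> {'object_key': 'abc123', 'owner_key': 'user-0001', '_key': 'user-0001/abc123'}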