Changing frozen dataclass to support dynamic fields - python

I have a Python dataclass that looks like
@dataclass(frozen=True)
class BaseDataclass:
    immutable_arg_1 = field1
    immutable_arg_2 = field2
It's preferable not to change the base class much, since it's already used in many places in the codebase, but now there's a use case where a dynamically constructed dataclass is much more convenient. Is there any way of constructing a new dataclass that extends this one with some dynamically chosen arguments (the subclass itself may be dynamically defined)?
def dataclass_factory(kwargs: Dict) -> BaseDataclass:
    @dataclass(frozen=True)
    class DerivedDataclass(BaseDataclass):
        **kwargs  # not valid Python syntax here
    return DerivedDataclass(**kwargs)
so that I can do something like
new_dataclass = dataclass_factory({'immutable_arg_3': field3, 'immutable_arg_4': field4})
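For reference, one way such a factory could be sketched is with dataclasses.make_dataclass, which accepts bases= and frozen=. This sketch assumes BaseDataclass is a proper frozen dataclass, infers each new field's type from the value passed in, and notes that the base class's own fields still have to be supplied (or have defaults):

from dataclasses import make_dataclass, fields
from typing import Any, Dict

def dataclass_factory(kwargs: Dict[str, Any]) -> BaseDataclass:
    base_names = {f.name for f in fields(BaseDataclass)}
    # Only the genuinely new keys become new fields; inferring their types
    # from the passed values is an assumption made for this sketch.
    new_fields = [(name, type(value)) for name, value in kwargs.items()
                  if name not in base_names]
    DerivedDataclass = make_dataclass(
        "DerivedDataclass",
        new_fields,
        bases=(BaseDataclass,),
        frozen=True,
    )
    # The generated __init__ still expects values for the base fields,
    # unless those have defaults.
    return DerivedDataclass(**kwargs)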


Calling a dataclass constructor with decorator given only a type object

I have a dataclass which inherits from an abstract class that implements some boilerplate, and also uses the @validate_arguments decorator to immediately cast strings back into numbers on object creation. The dataclass is a series of figures, some of which are calculated in __post_init__.
report.py:
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pydantic import validate_arguments

@dataclass
class Report(ABC):
    def __post_init__(self):
        self.process_attributes()

    @abstractmethod
    def process_attributes(self):
        pass

@validate_arguments
@dataclass
class SpecificReport(Report):
    some_number: int
    some_other_number: float
    calculated_field: float = field(init=False)

    def process_attributes(self):
        self.calculated_field = self.some_number * self.some_other_number
I then have another class which is initialized with a class of type Report, gathers some metadata about that class on creation, and then has methods which perform operations with these objects, including taking some content and constructing new objects of this type from a dictionary. We determine which fields are set explicitly with inspect.signature, then unpack our dictionary and call the constructor.
report_editor.py:
from inspect import signature
from report import Report, SpecificReport

class ReportEditor:
    def __init__(self, report_type: type[Report], content=None):
        self.content = content
        self.report_type = report_type
        self.explicit_fields = list(signature(report_type).parameters.keys())

    def process(self):
        initializable_dict = {key: val for key, val in self.content.items() if key in self.explicit_fields}
        report = self.report_type(**initializable_dict)
        print(report)
However, this produces an error when hitting process_attributes, because the validate_arguments step is not performed. Aside from that, the object is initialized as I'd expect, but since the values are strings, they remain strings and only raise an exception once an operation is attempted.
This works just fine and produces the desired behavior:
def process(self):
    initializable_dict = {key: val for key, val in self.content.items() if key in self.explicit_fields}
    report = SpecificReport(**initializable_dict)
    print(report)
but, of course, the intent is to abstract that away and allow this ReportEditor class to be able to do these operations without knowing what kind of Report it is.
Here is main.py to run the reproducible example:
from report import SpecificReport
from report_editor import ReportEditor

def example():
    new_report = SpecificReport(1, 1.0)
    report_editor = ReportEditor(type(new_report), {
        "some_number": "1",
        "some_other_number": "1.0",
        "calculated_field": "1.0"
    })
    report_editor.process()

if __name__ == '__main__':
    example()
I tried putting @validate_arguments on both the parent and child classes, as well as only on the parent Report class. Both resulted in TypeError: cannot create 'cython_function_or_method' instances. I can't find any other way to call the constructor from outside using just the type object.
Why is the constructor called properly, but not the decorator function in this instance? Is it possible to maybe cast a type object to a Callable in order to get the full constructor somehow? What am I missing? Or is this just not possible (maybe with generics)?
Here is the fundamental problem:
In [1]: import report
In [2]: new_report = report.SpecificReport(1, 1.0)
In [3]: type(new_report) is report.SpecificReport
Out[3]: False
This is happening because the pydantic.validate_arguments decorator returns a cythonized function:
In [4]: report.SpecificReport
Out[4]: <cyfunction SpecificReport at 0x1103bb370>
The function does the validation. The class constructor doesn't. It looks like this decorator is experimental, and at least for now, is not designed to work on classes (it just happens to work since a class is just a callable with .__annotations__ after all).
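To make the difference concrete, here is a sketch continuing the example above, with the original report.py (the comments describe the expected behavior):

report.SpecificReport("1", "1.0")   # the decorated cyfunction: strings are parsed
                                    # to int/float before the object is built
type(new_report)("1", "1.0")        # the raw class: nothing parses the strings, so
                                    # __post_init__ multiplies two strings and raises a TypeError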
EDIT:
However, if you do want validation, you can use pydantic.dataclasses, which is a "drop-in" replacement for the standard library dataclasses (not quite drop-in, but very close; they made a real effort at compatibility). You can change report.py to the following:
from abc import ABC, abstractmethod
import dataclasses
import pydantic

@pydantic.dataclasses.dataclass
class Report(ABC):
    def __post_init_post_parse__(self, *args, **kwargs):
        self.process_attributes()

    @abstractmethod
    def process_attributes(self, *args, **kwargs):
        pass

@pydantic.dataclasses.dataclass
class SpecificReport(Report):
    some_number: int
    some_other_number: float
    calculated_field: dataclasses.InitVar[float] = dataclasses.field(init=False)

    def process_attributes(self, *args, **kwargs):
        self.calculated_field = self.some_number * self.some_other_number
Some subtleties:
- In __post_init__, the arguments haven't been parsed and validated yet; use __post_init_post_parse__ if you want them validated/parsed. We do, or else self.some_number * self.some_other_number will raise a TypeError.
- Have to use dataclasses.InitVar along with dataclasses.field(init=False), because without InitVar the validation fails if __post_init__ didn't set calculated_field (so we can't use the parsed fields in __post_init_post_parse__, because the missing attribute is checked earlier). There might be a way to prevent it from enforcing that, but this is what I found for now. I'm not very comfortable with it. Hopefully someone can find a better way.
- Had to use *args, **kwargs in __post_init_post_parse__ and in process_attributes because InitVar will pass an extra argument; extenders of this class might want to do the same, so keep it generic.
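With this version of report.py, SpecificReport is a real class again, so the type-object path from the question should work too. A quick sketch of what to expect, per the points above:

from report import SpecificReport

new_report = SpecificReport("1", "1.0")    # strings should be parsed to int/float
print(type(new_report) is SpecificReport)  # expected: True, unlike with validate_arguments
print(new_report.calculated_field)         # expected: 1.0

# so ReportEditor(type(new_report), {...}) from main.py should now construct
# and validate through the type object as well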

Why are attributes defined outside __init__ in popular packages like SQLAlchemy or Pydantic?

I'm modifying an app, trying to use Pydantic for my application models and SQLAlchemy for my database models.
I have existing classes, where I defined attributes inside the __init__ method as I was taught to do:
class Measure:
    def __init__(
        self,
        t_received: int,
        mac_address: str,
        data: pd.DataFrame,
        battery_V: float = 0
    ):
        self.t_received = t_received
        self.mac_address = mac_address
        self.data = data
        self.battery_V = battery_V
In both Pydantic and SQLAlchemy, following the docs, I have to define those attributes outside the __init__ method, for example in Pydantic:
import pydantic

class Measure(pydantic.BaseModel):
    t_received: int
    mac_address: str
    data: pd.DataFrame
    battery_V: float
Why is it the case? Isn't this bad practice? Is there any impact on other methods (classmethods, staticmethods, properties ...) of that class?
Note that this is also very unhandy because when I instantiate an object of that class, I don't get suggestions on what parameters are expected by the constructor!
Defining attributes of a class in the class namespace directly is totally acceptable and is not special per se for the packages you mentioned. Since the class namespace is (among other things) essentially a blueprint for instances of that class, defining attributes there can actually be useful, when you want to e.g. provide all public attributes with type annotations in a single place in a consistent manner.
Consider also that a public attribute does not necessarily need to be reflected by a parameter in the constructor of the class. For example, this is entirely reasonable:
class Foo:
    a: list[int]
    b: str

    def __init__(self, b: str) -> None:
        self.a = []
        self.b = b
In other words, just because something is a public attribute, that does not mean it should have to be provided by the user upon initialization. To say nothing of protected/private attributes.
What is special about Pydantic (to take your example), is that the metaclass of BaseModel as well as the class itself does a whole lot of magic with the attributes defined in the class namespace. Pydantic refers to a model's typical attributes as "fields" and one bit of magic allows special checks to be done during initialization based on those fields you defined in the class namespace. For example, the constructor must receive keyword arguments that correspond to the non-optional fields you defined.
from pydantic import BaseModel

class MyModel(BaseModel):
    field_a: str
    field_b: int = 1

obj = MyModel(
    field_a="spam",  # required
    field_b=2,       # optional
    field_c=3.14,    # unexpected/ignored
)
If I were to omit field_a during construction of a MyModel instance, an error would be raised. Likewise, if I had tried to pass field_b="eggs", an error would be raised.
So the fact that you don't write your own __init__ method is a feature Pydantic provides you. You only define the fields and an appropriate constructor is "magically" there for you already.
As for the drawback you mentioned, where you don't get any auto-suggestions, that is true by default for all IDEs. Static type checkers cannot understand that dynamic constructor and thus cannot simply infer what arguments are expected. Currently this is solved via extensions, such as the mypy plugin and the PyCharm plugin. Maybe soon the @dataclass_transform decorator from PEP 681 will standardize this for similar packages and thus improve support by static type checkers.
It is also worth noting that even the standard library's dataclasses only work via special extensions in type checkers.
To your other question, there is obviously some impact on methods of such classes (by design), though the specifics are not always obvious. You should of course not simply write your own __init__ method without being careful to call the superclass's __init__ properly inside it. Also, @property setters currently don't work as you would expect (though it is debatable whether it even makes sense to use properties on Pydantic models).
To wrap up, this approach is not only not bad practice, it is a great idea to reduce boilerplate code and it is extremely common these days, as evidenced by the fact that hugely popular and established packages (like the aforementioned Pydantic, as well as e.g. SQLAlchemy, Django and others) use this pattern to a certain extent.
Pydantic has its own (rewriting) magic, but SQLAlchemy is a bit easier to explain.
An SA model looks like this:
>>> from sqlalchemy import Column, Integer, String
>>> class User(Base):
...
...     id = Column(Integer, primary_key=True)
...     name = Column(String)
Column, Integer and String are descriptors. A descriptor is a class that overrides the __get__ and __set__ methods. In practice, this means the class can control how data is accessed and stored.
For example this assignment would now use the __set__ method from Column:
class User(Base):
    id = Column(Integer, primary_key=True)
    name = Column(String)

user = User()
user.name = 'John'
This assignment is routed through Column's __set__ method: the attribute lookup on the class (via the MRO) finds a descriptor with a __set__ method, so that method is used instead of a plain instance assignment. In a simplified version, Column looks something like this:
class Column:
    def __init__(self, field=""):
        self.field = field

    def __get__(self, obj, type):
        return obj.__dict__.get(self.field)

    def __set__(self, obj, val):
        if validate_field(val):  # hypothetical validation hook
            obj.__dict__[self.field] = val
        else:
            print('not a valid value')
(This is similar to using @property. A descriptor is a reusable @property.)
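For comparison, here is a rough sketch of the same per-attribute validation written with @property instead of a descriptor; the property has to be repeated for every column, which is exactly what the descriptor avoids (validate_field is the same hypothetical hook as above):

class User:
    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, val):
        if validate_field(val):  # hypothetical validation hook
            self._name = val
        else:
            print('not a valid value')

user = User()
user.name = 'John'  # goes through the setter, much like Column.__set__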

Python Dynamic Type Hints (like Dataclasses)

I have a dataclass, and a function which will create an instance of that dataclass using all the kwargs passed to it.
If I try to create an instance of that dataclass, I can see the type hints/autocomplete for the __init__ method. I just need the similar type hints for a custom function that I want to create.
from dataclasses import dataclass

@dataclass
class Model:
    attr1: str
    attr2: int

def my_func(**kwargs):
    model = Model(**kwargs)
    ...  # Do something else
    ret = [model]
    return ret
# my_func should show that it needs 'attr1' & 'attr2'
my_func(attr1='hello', attr2=65535)
If your IDE isn't sophisticated enough to infer that kwargs isn't used for anything other than creating a Model instance (I'm not sure such an IDE exists), then it has no way of knowing that attr1 and attr2 are the required arguments, and the obvious solution would be to list them explicitly.
I would refactor the function so that it takes a Model instance as argument instead.
Then you would call it like my_func(Model(...)) and the IDE could offer the autocompletion for Model.
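A rough sketch of that refactoring, keeping the question's Model and return shape:

def my_func(model: Model):
    # The caller builds the Model, so the IDE autocompletes Model(...) instead.
    ...  # Do something else
    return [model]

my_func(Model(attr1='hello', attr2=65535))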

Can a method in a python class be annotated with a type that is defined by a subclass?

I have a superclass that has a method which is shared by its subclasses. However, this method should return an object with a type that is defined on the subclass. I'd like the return type for the method to be statically annotated (not a dynamic type) so code using the subclasses can benefit from mypy type checking on the return values. But I don't want to have to redefine the common method on the subclass just to provide its type annotation. Is this possible with python type annotations and mypy?
Something like this:
from typing import Type

class AbstractModel:
    pass

class OrderModel(AbstractModel):
    def do_order_stuff(self):
        pass

class AbstractRepository:
    model: Type[AbstractModel]

    def get(self) -> model:
        return self.model()

class OrderRepository(AbstractRepository):
    model = OrderModel

repo = OrderRepository()
order = repo.get()

# Type checkers (like mypy) should recognize that this is valid
order.do_order_stuff()

# Type checkers should complain about this, because `OrderModel`
# does not define `foo`
order.foo()
The tricky move here is that get() is defined on the superclass AbstractRepository, which doesn't yet know the type of model. (And the -> model annotation fails, since the value of model hasn't been specified yet).
The value of model is specified by the subclass, but the subclass doesn't (re)define get() in order to provide the annotation. It seems like this should be statically analyzable; though it's a little tricky, since it would require the static analyzer to trace the model reference from the superclass to the subclass.
Any way to accomplish both a shared superclass implementation and a precise subclass return type?
Define AbstractRepository as a generic class.
from typing import TypeVar, Generic, Type, ClassVar

T = TypeVar('T')

class AbstractRepository(Generic[T]):
    model: ClassVar[Type[T]]

    @classmethod
    def get(cls) -> T:
        return cls.model()
(get only makes use of a class attribute, so can--and arguably should--be a class method.)
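A sketch of how the question's subclass could then specialize the generic base, assuming the definitions above; the return type of get should now be precise per subclass:

class OrderRepository(AbstractRepository[OrderModel]):
    model = OrderModel

repo = OrderRepository()
order = repo.get()        # should be inferred as OrderModel
order.do_order_stuff()    # OK
order.foo()               # should be flagged: OrderModel has no attribute "foo"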

Python dataclasses: What type to use if __post_init__ performs type conversion?

I have a Python class with a field which can be passed one of several sequence types. To simplify, I'll stick with tuples and lists. __post_init__ converts the parameter to MyList.
from typing import Union
from dataclasses import dataclass, InitVar, field

class MyList(list):
    pass

@dataclass
class Struct:
    field: Union[tuple, list, MyList]

    def __post_init__(self):
        self.field = MyList(self.field)
What type should I use for the field declaration?
If I supply a union of all possible input types, the code does not document that field is always a MyList when accessed.
If I only supply the final MyList type, PyCharm complains when I pass Struct() a list.
I could instead use:
_field: InitVar[Union[tuple, list, MyList]] = None
field: MyList = field(init=False)

def __post_init__(self, _field):
    self.field = MyList(_field)
but this is tremendously ugly, especially when repeated across 3 fields. Additionally I have to construct a struct like Struct(_field=field) instead of Struct(field=field).
In April 2018, "tm" commented on this issue on PyCharm's announcement: https://blog.jetbrains.com/pycharm/2018/04/python-37-introducing-data-class/#comment-323957
You are conflating assigning a value to the attribute with the code that produces the value to assign to the attribute. I would use a separate class method to keep the two pieces of code separate.
from dataclasses import dataclass

class MyList(list):
    pass

@dataclass
class Struct:
    field: MyList

    @classmethod
    def from_iterable(cls, x):
        return cls(MyList(x))

s1 = Struct(MyList([1, 2, 3]))
s2 = Struct.from_iterable((4, 5, 6))
Now, you only pass an existing value of MyList to Struct.__init__. Tuples, lists, and whatever else MyList can accept are passed to Struct.from_iterable instead, which will take care of constructing the MyList instance to pass to Struct.
Have you tried a Pydantic BaseModel instead of a dataclass?
With the following code, my PyCharm does not complain:
from pydantic import BaseModel

class MyList(list):
    pass

class PydanticStruct(BaseModel):
    field: MyList

    def __post_init__(self):
        self.field = MyList(self.field)

a = PydanticStruct(field=['a', 'b'])
dataclasses works best for straightforward data containers; advanced utilities like conversion were consciously omitted (see here for a complete writeup of this and similar features). Implementing this yourself is a fair bit of work, since it should also include a PyCharm plugin that understands to what extent conversion is supported.
A much better approach would be to use one of the 3rd-party packages that already did this, the most popular being pydantic, probably because it offers the easiest migration from dataclasses.
A native pydantic solution could look like this, where the conversion code is part of MyList. Handling it that way makes the __post_init__ unnecessary, leading to cleaner model definitions:
import pydantic

class MyList(list):
    @classmethod
    def __get_validators__(cls):
        """Validators handle data validation, as well as data conversion.

        This function yields validator functions, with the last-yielded
        result being the final value of a pydantic field annotated with
        this class's type.

        Since we inherit from 'list', our constructor already supports
        building 'MyList' instances from iterables - if we didn't, we
        would need to write that code by hand and yield it instead.
        """
        yield cls

class Struct(pydantic.BaseModel):
    field: MyList  # accepts any iterable as input

print(Struct(field=(1, 2, 3)))
# prints: field=[1, 2, 3]
