staticmethod vs classmethod for factory pattern when using pydantic - python

Reading most questions about @classmethod vs @staticmethod, the replies state that staticmethod is almost useless in Python except for logically grouping functions, and that a module could do that job anyway.
But I ended up with a factory pattern, using pydantic, where I cannot see how the static method could be replaced with a class method.
Without pydantic, one would put the logic for object instantiation in the __init__() constructor.
Using pydantic, I ended up creating a factory like build_static() (which calls some logic, init3x() here):
from pydantic import BaseModel

class A(BaseModel):
    text: str
    uid: int

    @staticmethod
    def build_static(text, uid):
        a = A(text=text, uid=uid)
        a.init3x()
        a.show()
        return a

    def init3x(self):
        self.text *= 3
        print(self.text)

    @classmethod
    def show(cls):
        print(cls.text, cls.uid)

    @classmethod
    def build(cls, text, uid):
        cls.uid = uid
        cls.text = text
        cls.init3x(cls)
        cls.show()
        return cls
My question is: is build_static() a good pattern, using a static method? If not, what is the right way?
Note 1: the class method build() seems like a wrong pattern in this case, since there is logic that should only apply to each instance independently (init3x() in my example):
a = A.build('test ', 1)
b = A.build('test B ', 2)
c = A.build_static('test C ', 3)
d = A.build_static('test D ', 4)
a.text, b.text, c.text, d.text
> ('test B test B test B ',
   'test B test B test B ',
   'test C test C test C ',
   'test D test D test D ')
Which is not what you want with a factory for instances a and b.
Note 2: cls.init3x(cls) looks highly anti-pythonic to me.
UPDATE:
It seems that a staticmethod factory is presented as the default pattern in the book Python 3 Patterns & Idioms.

If your goal is to have a constructor for A that takes the provided initial text field value and modifies it (like multiplying it by 3), three distinct simple approaches come to mind.
A) Override the __init__ method
The first option is just to put that modification logic into the __init__ method.
from pydantic import BaseModel

class A(BaseModel):
    text: str
    uid: int

    def __init__(self, text: str, uid: int) -> None:
        super().__init__(text=text * 3, uid=uid)
Pros
Straightforward and fairly transparent
Cons
Harder to maintain if model fields change.
Unexpected behavior, because a user passing a text argument to the __init__ method expects that value to end up on the model instance (and not something else).
No validation of the text argument before calling the parent class's __init__, which means the modification operation might fail or cause unexpected results if some other type is passed.
Usage
# As expected:
a = A(text="foo", uid=123)
print(a) # text='foofoofoo' uid=123
# Oops:
a = A(text=1, uid=123)
print(a) # text='3' uid=123
B) Alternative constructor
Alternatively, you can write a custom constructor that looks essentially the same as the __init__ method and exists alongside it. Here we actually decorate it as a classmethod because we are using the class inside the method.
from __future__ import annotations

from pydantic import BaseModel

class A(BaseModel):
    text: str
    uid: int

    @classmethod
    def custom_constructor(cls, text: str, uid: int) -> A:
        return cls(text=3 * text, uid=uid)
Pros
Clear to the user that the behavior is different from what they would expect of the regular constructor (the __init__ method).
Cons
Hard to maintain (same as the previous solution).
No pre-validation (same as above)
Usage
a = A.custom_constructor(text="foo", uid=123)
print(a) # text='foofoofoo' uid=123
C) Custom validator
We can simply define a regular validator for the text field that does the modification during initialization.
from pydantic import BaseModel, validator

class A(BaseModel):
    text: str
    uid: int

    @validator("text")
    def text_times_3(cls, v: str) -> str:
        return v * 3
Pros
Concise (utilizing Pydantic capabilities)
Very convenient to maintain because it is unrelated to other model fields
Default validation of the initial text value ensures our custom validator will really deal with a str
Cons
Potentially unexpected for users because the text argument will be modified before ending up on the model field (same as the first solution).
Usage
# As expected:
a = A(text="foo", uid=123)
print(a) # text='foofoofoo' uid=123
# Also as expected:
a = A(text=1, uid=123)
print(a) # text='111' uid=123
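As an aside, if you happen to be on Pydantic v2, where validator is deprecated, the same idea is spelled with field_validator; a minimal sketch of that variant (not from the original answer):
from pydantic import BaseModel, field_validator

class A(BaseModel):
    text: str
    uid: int

    @field_validator("text")
    @classmethod
    def text_times_3(cls, v: str) -> str:
        # same modification as above, just the v2 decorator
        return v * 3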

Related

Abstract classes and metaclasses with dataclasses in python

I'm trying to isolate a field and a method from classes to work with mongodb.
Example of the working class:
@dataclass
class Article(Mongodata):
    name: str
    quantity: int
    description: str
    _id: Optional[int] = None

    def __getdict__(self):
        result = asdict(self)
        result.pop("_id")
        return result
How can I isolate _id and __getdict__ into an abstract class so that everything still works?
@dataclass
class Article(Mongodata):
    name: str
    quantity: int
    description: str

@dataclass
class Mongodata(ABCMeta):
    @property
    @abstractmethod
    def _id(self) -> Optional[int]:
        return None

    def __getdict__(self):
        result = asdict(self)
        result.pop("_id")
        return result
Can you explain how abstract classes and metaclasses differ? I came from Java, and after reading about them I didn't understand anything.
Since you mentioned you're on Python 3.9, you can set it up the same way you had it above. However, if you declare the fields in Article as above and add a field definition in the superclass like below:
@dataclass
class Mongodata(ABC):
    _id: Optional[int] = None
Then if you actually try to run the code, you would run into a TypeError as below:
TypeError: non-default argument 'name' follows default argument
The reason for this is the order in which dataclasses resolves the fields for a dataclass when inheritance is involved. In this case, it adds the _id field from the superclass first, and then all the fields in the Article dataclass next. Since the first param that it adds has a default value, but the params that follow it don't have a default value, it'll raise a TypeError as you might expect.
Note that you'd actually run into the same behavior if you had decided to manually generate an __init__ method for the Article class in the same way:
def __init__(self, _id: Optional[int] = None, name: str, quantity: int, description: str):
^
SyntaxError: non-default argument follows default argument
The best approach in Python 3.9 seems to be to declare the dataclasses this way, so that all fields in the subclass have default values:
from abc import ABC
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Mongodata(ABC):
    _id: Optional[int] = None

    def __getdict__(self):
        result = asdict(self)
        result.pop("_id")
        return result

@dataclass
class Article(Mongodata):
    name: str = None
    quantity: int = None
    description: str = None
But then creating an Article object with positional arguments will be a problem, because the first argument passed to the constructor will be assigned to _id:
a = Article('123', 321, 'desc')
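To make the mix-up concrete, the positional call above would presumably assign the values like this (a sketch based on the field order _id, name, quantity, description):
print(a)
# Article(_id='123', name=321, quantity='desc', description=None)
# i.e. everything is shifted by one field -- '123' lands on _id instead of name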
So you could instead pass None as the first positional argument, and that'll get assigned to _id. Another approach that works is to pass keyword arguments into the constructor instead:
a = Article(name='123', quantity=321, description='desc')
This actually feels more natural with the kw_only param that was introduced to dataclasses in Python 3.10 as a means to resolve this same issue, but more on that below.
A Metaclass Approach
Another option is to declare a function which can be used as a metaclass, as below:
from dataclasses import asdict
from typing import Optional

def add_id_and_get_dict(name: str, bases: tuple[type, ...], cls_dict: dict):
    """Metaclass to add an `_id` field and a `get_dict` method."""
    # Get class annotations
    cls_annotations = cls_dict['__annotations__']
    # This assigns the `_id: Optional[int]` annotation
    cls_annotations['_id'] = Optional[int]
    # This assigns the `_id = None` assignment
    cls_dict['_id'] = None

    def get_dict(self):
        result = asdict(self)
        result.pop('_id')
        return result

    # add get_dict() method to the class
    cls_dict['get_dict'] = get_dict

    # create and return a new class
    cls = type(name, bases, cls_dict)
    return cls
Then you can simplify your dataclass definition a little. Also you technically don't need to define a get_dict method here, but it's useful so that an IDE knows that such a method exists on the class.
from dataclasses import dataclass
from typing import Any

@dataclass
class Article(metaclass=add_id_and_get_dict):
    name: str
    quantity: int
    description: str

    # Added for type hinting, so the IDE knows such a method exists.
    def get_dict(self) -> dict[str, Any]:
        ...
And now it's a bit more intuitive when you want to create new Article objects:
a = Article('abc', 123, 'desc')
print(a) # Article(name='abc', quantity=123, description='desc', _id=None)
print(a._id) # None
print(a.get_dict()) # {'name': 'abc', 'quantity': 123, 'description': 'desc'}
a2 = Article('abc', 321, 'desc', _id=12345)
print(a2) # Article(name='abc', quantity=321, description='desc', _id=12345)
print(a2._id) # 12345
print(a2.get_dict()) # {'name': 'abc', 'quantity': 321, 'description': 'desc'}
Keyword-only Arguments
In Python 3.10, if you don't want to assign default values to all the fields in a subclass, another option is to decorate the superclass with @dataclass(kw_only=True), so that fields defined in that class are then required to be keyword-only arguments by default.
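For example, a minimal sketch of that kw_only=True variant (Python 3.10+, reusing the classes from above; the __getdict__ method is omitted for brevity):
from abc import ABC
from dataclasses import dataclass
from typing import Optional

@dataclass(kw_only=True)
class Mongodata(ABC):
    _id: Optional[int] = None

@dataclass
class Article(Mongodata):   # subclass fields stay positional
    name: str
    quantity: int
    description: str

a = Article('abc', 123, 'desc', _id=42)   # _id must now be passed by keyword
print(a)  # Article(_id=42, name='abc', quantity=123, description='desc')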
You can also use the KW_ONLY sentinel value as a type annotation, which dataclasses provides in Python 3.10, as shown below; this should also make things much simpler and more intuitive to work with.
from abc import ABC
from dataclasses import dataclass, asdict, KW_ONLY
from typing import Optional

@dataclass
class Mongodata(ABC):
    _: KW_ONLY
    _id: Optional[int] = None

    @property
    def dict(self):
        result = asdict(self)
        result.pop("_id")
        return result

# noinspection PyDataclass
@dataclass
class Article(Mongodata):
    name: str
    quantity: int
    description: str
Essentially, any fields defined after the _: KW_ONLY then become keyword-only arguments to the constructor.
Now the usage should be exactly as desired. You can pass both keyword and positional arguments to the constructor, and it appears to work as intended:
a = Article(name='123', quantity=123, description='desc')
print(a) # Article(_id=None, name='123', quantity=123, description='desc')
print(a._id) # None
print(a.dict) # {'name': '123', 'quantity': 123, 'description': 'desc'}
a2 = Article('123', 321, 'desc', _id=112233)
print(a2) # Article(_id=112233, name='123', quantity=321, description='desc')
print(a2._id) # 112233
print(a2.dict) # {'name': '123', 'quantity': 321, 'description': 'desc'}
Also, here is a quick explanation I've been able to come up with for why this appears to work as it does. Since only the superclass's fields are marked keyword-only, all this accomplishes is making _id a keyword-only argument to the constructor. The fields in the subclass are allowed as either keyword or positional arguments, since we didn't specify kw_only for them.
An easier way to think about this, is to imagine that the signature of the __init__() method that dataclasses generates, actually looks like this:
def __init__(self, name: str, quantity: int, description: str, *, _id: Optional[int] = None):
In Python (not just in 3.10), the appearance of a bare * in a function signature signifies that all the parameters that follow it are keyword-only arguments. Note that the _id argument in this case is added as a keyword-only argument after all the positional arguments from the subclass. This means that the method signature is valid, since it's certainly possible for keyword-only arguments to have default values, as we do here.
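A minimal standalone illustration of that rule, independent of dataclasses:
def f(a, *, b=0):
    return a + b

f(1, b=2)  # ok -> 3
f(1, 2)    # TypeError: f() takes 1 positional argument but 2 were given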

How to use TypeVar for input and output of multiple generic Protocols in python?

I want to use multiple generic protocols and ensure they're compatible:
from typing import TypeVar, Protocol, Generic
from dataclasses import dataclass

# checking fails as below and with contravariant=True or covariant=True:
A = TypeVar("A")

class C(Protocol[A]):
    def f(self, a: A) -> None: pass

class D(Protocol[A]):
    def g(self) -> A: pass

# Just demonstrates my use case; doesn't have errors:
@dataclass
class CompatibleThings(Generic[A]):
    c: C[A]
    d: D[A]
Mypy gives the following error:
Invariant type variable 'A' used in protocol where contravariant one is expected
Invariant type variable 'A' used in protocol where covariant one is expected
I know this can be done by making C and D generic ABC classes, but I want to use protocols.
The short explanation is that your approach breaks subtype transitivity; see this section of PEP 544 for more information. It gives a pretty clear explanation of why your D protocol (and, implicitly, your C protocol) run into this problem, and why it requires different types of variance for each to solve it. You can also look on Wikipedia for info on type variance.
Here's the workaround: use covariant and contravariant protocols, but make your generic dataclass invariant. The big hurdle here is inheritance, which you have to handle in order to use Protocols, but is kind of tangential to your goal. I'm going to switch naming here to highlight the inheritance at play, which is what this is all about:
A = TypeVar("A") # Invariant type
A_cov = TypeVar("A_cov", covariant=True) # Covariant type
A_contra = TypeVar("A_contra", contravariant=True) # Contravariant type
# Give Intake its contravariance
class Intake(Protocol[A_contra]):
def f(self, a: A_contra) -> None: pass
# Give Output its covariance
class Output(Protocol[A_cov]):
def g(self) -> A_cov: pass
# Just tell IntakeOutput that the type needs to be the same
# Since a is invariant, it doesn't care that
# Intake and Output require contra / covariance
#dataclass
class IntakeOutput(Generic[A]):
intake: Intake[A]
output: Output[A]
You can see that this works with the following tests:
class Animal:
    ...

class Cat(Animal):
    ...

class Dog(Animal):
    ...

class IntakeCat:
    def f(self, a: Cat) -> None: pass

class IntakeDog:
    def f(self, a: Dog) -> None: pass

class OutputCat:
    def g(self) -> Cat: pass

class OutputDog:
    def g(self) -> Dog: pass

compat_cat: IntakeOutput[Cat] = IntakeOutput(IntakeCat(), OutputCat())
compat_dog: IntakeOutput[Dog] = IntakeOutput(IntakeDog(), OutputDog())

# This is gonna error in mypy
compat_fail: IntakeOutput[Dog] = IntakeOutput(IntakeDog(), OutputCat())
which gives the following error:
main.py:48: error: Argument 2 to "IntakeOutput" has incompatible type "OutputCat"; expected "Output[Dog]"
main.py:48: note: Following member(s) of "OutputCat" have conflicts:
main.py:48: note:     Expected:
main.py:48: note:         def g(self) -> Dog
main.py:48: note:     Got:
main.py:48: note:         def g(self) -> Cat
So what's the catch? What are you giving up? Namely, inheritance in IntakeOutput. Here's what you can't do:
class IntakeAnimal:
    def f(self, a: Animal) -> None: pass

class OutputAnimal:
    def g(self) -> Animal: pass

# Ok, as expected
ok1: IntakeOutput[Animal] = IntakeOutput(IntakeAnimal(), OutputAnimal())

# Ok, because Output is covariant
ok2: IntakeOutput[Animal] = IntakeOutput(IntakeAnimal(), OutputDog())

# Both fail, because Intake is contravariant
fails1: IntakeOutput[Animal] = IntakeOutput(IntakeDog(), OutputDog())
fails2: IntakeOutput[Animal] = IntakeOutput(IntakeDog(), OutputAnimal())

# Ok, because Intake is contravariant
ok3: IntakeOutput[Dog] = IntakeOutput(IntakeAnimal(), OutputDog())

# This fails, because Output is covariant
fails3: IntakeOutput[Dog] = IntakeOutput(IntakeAnimal(), OutputAnimal())
fails4: IntakeOutput[Dog] = IntakeOutput(IntakeDog(), OutputAnimal())
So. There it is. You can play around with this more here.
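As an aside, on Python 3.12+ with a type checker that supports PEP 695, the variance can be inferred for you, so the explicit covariant/contravariant TypeVars may not be needed; a rough sketch, assuming your mypy version supports the new type-parameter syntax:
from dataclasses import dataclass
from typing import Protocol

class Intake[A](Protocol):   # variance of A is inferred (contravariant here)
    def f(self, a: A) -> None: ...

class Output[A](Protocol):   # inferred covariant here
    def g(self) -> A: ...

@dataclass
class IntakeOutput[A]:       # inferred invariant here
    intake: Intake[A]
    output: Output[A]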

Annotating return types for methods returning self in mixins

I am using a builder pattern where most methods on a (big) class return their identity (self) and are thus annotated to return the type of the class they're a member of:
class TextBuilder:
    parts: List[str]  # omitted
    render: Callable[[], str]  # for brevity

    def text(self, val: str) -> "TextBuilder":
        self.parts.append(val)
        return self

    def bold(self, val: str) -> "TextBuilder":
        self.parts.append(f"<b>{val}</b>")
        return self

    ...
Example usage:
joined_text = TextBuilder().text("a ").bold("bold").text(" text").render()
# a <b>bold</b> text
Now as this class is growing large I would like to split and group related methods up into mixins:
class BaseBuilder:
    parts: List[str]  # omitted
    render: Callable[[], str]  # for brevity

class TextBuilder(BaseBuilder):
    def text(self, val: str):
        self.parts.append(val)
        return self
    ...

class HtmlBuilder(BaseBuilder):
    def bold(self, val: str):
        self.parts.append(f"<b>{val}</b>")
        return self
    ...

class FinalBuilder(TextBuilder, HtmlBuilder):
    pass
However, I do not see a way to properly annotate the mixin classes' return types so that, on the resulting class FinalBuilder, mypy always believes the methods return FinalBuilder and not one of the mixin classes. All that, of course, assumes I want to actually annotate self and the return types, because they may not be inferred from what goes on inside those methods.
I have tried making the mixin classes generic and marking them explicitly as returning a type T bound to BaseBuilder, but that did not satisfy mypy. Any ideas? For now I am just going to skip all these shenanigans and omit the return types everywhere as they should be properly inferred when using the FinalBuilder, but I'm still curious if there is a general way to approach this.
If you want the return type to always be what self is, just annotate the self parameter like so:
from typing import List, Callable, TypeVar

T = TypeVar('T', bound='BaseBuilder')

class BaseBuilder:
    parts: List[str]  # omitted
    render: Callable[[], str]  # for brevity

class TextBuilder(BaseBuilder):
    def text(self: T, val: str) -> T:
        self.parts.append(val)
        return self
    ...

class HtmlBuilder(BaseBuilder):
    def bold(self: T, val: str) -> T:
        self.parts.append(f"<b>{val}</b>")
        return self
    ...

class FinalBuilder(TextBuilder, HtmlBuilder):
    pass

# Type checks
f = FinalBuilder().text("foo").bold("bar")

# Mypy states this is type 'FinalBuilder'
reveal_type(f)
A few notes:
If we don't annotate self, mypy will normally assume it's the type of whatever class we're currently contained in. However, it's actually fine to give it a custom type hint if you want, so long as that type hint is compatible with the class. (For example, it wouldn't be legal to add a def foo(self: int) -> None to HtmlBuilder since int isn't a supertype of HtmlBuilder.)
We take advantage of this by making self generic so we can specify a more specific return type.
See the mypy docs for more details: https://mypy.readthedocs.io/en/stable/generics.html#generic-methods-and-generic-self
I bounded the TypeVar to BaseBuilder so that both functions would be able to see the parts and render fields. If you want your text(...) and bold(...) functions to also see fields defined within TextBuilder and HtmlBuilder respectively, you'll need to create two TypeVars bound to these more specific child classes.
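For example, a rough sketch of that two-TypeVar variant (the separator and tag fields are hypothetical, only there to show mixin-specific state):
from typing import Callable, List, TypeVar

# Each mixin gets its own TypeVar, bound to that mixin (forward references as strings),
# so its methods can see fields defined on that specific mixin.
TB = TypeVar("TB", bound="TextBuilder")
HB = TypeVar("HB", bound="HtmlBuilder")

class BaseBuilder:
    parts: List[str]
    render: Callable[[], str]

class TextBuilder(BaseBuilder):
    separator: str = ""  # hypothetical field only TextBuilder defines

    def text(self: TB, val: str) -> TB:
        self.parts.append(self.separator + val)  # TextBuilder-only field is visible
        return self

class HtmlBuilder(BaseBuilder):
    tag: str = "b"  # hypothetical field only HtmlBuilder defines

    def bold(self: HB, val: str) -> HB:
        self.parts.append(f"<{self.tag}>{val}</{self.tag}>")
        return self

class FinalBuilder(TextBuilder, HtmlBuilder):
    pass

f = FinalBuilder().text("foo").bold("bar")  # still revealed as FinalBuilder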

how to define constructor for Python's new NamedTuple type?

As you may know, this is the most recent way of defining named tuples in Python:
from typing import NamedTuple

class MyType(NamedTuple):
    id: int = 0
    name: str = 0
After defining the type, the Python interpreter defines a default constructor taking id and name, and you can instantiate a new object using those fields.
Now I want to initialise a new object from a string that I parse inside the function. How can I define another constructor without spoiling the good default ones?
How can I define another constructor without spoiling the good default ones?
You can't. Python classes can't have multiple __new__ methods (or, if you meant "initializer", __init__ methods), just one.
But there's an easy way to work around this: the alternate constructor idiom, where you write a @classmethod that provides an alternate way to construct instances. There are plenty of examples in the standard library, like datetime.now and datetime.utcfromtimestamp. There are even a few examples in the basic builtin types, like int.from_bytes.
Here's how that works:
class MyType(NamedTuple):
    id: int = 0
    name: str = 0

    @classmethod
    def from_string(cls, string_to_parse):
        id, name = … your parsing code here …
        return cls(id, name)
This is, of course, the same thing you'd do with a collections.namedtuple subclass, a @dataclass, or a plain-old class that had too many different ways to construct it.
If you really want to, the other way to do it is to provide an ugly constructor with either keyword-only parameters, or parameters that have different meanings depending on what you pass. With NamedTuple, you'll have to either insert an extra class in the way, or monkeypatch the class after creation, because otherwise there's no documented way of getting at the default constructor implementation.
So:
class _MyType(NamedTuple):
    id: int = 0
    name: str = 0

class MyType(_MyType):
    def __new__(cls, id: int = None, name: str = None, *, parseything: str = None):
        if parseything:
            if id is not None or name is not None:
                raise TypeError("don't provide both")
            id, name = … your parsing code here …
        return super().__new__(cls, id, name)
… or, if you prefer monkeypatching:
class MyType(NamedTuple):
    id: int = 0
    name: str = 0

_new = MyType.__new__

def __new__(cls, id=None, name=None, *, parseything=None):
    if parseything:
        if id is not None or name is not None:
            raise TypeError("don't provide both")
        id, name = … your parsing code here …
    return _new(cls, id, name)

MyType.__new__ = __new__
# note: _new has to stay around, since the patched __new__ looks it up at call time
del __new__
… or, if you want more of a range-style ugly API you can do either of the above with:
def __new__(cls, id_or_parsey_thing: Union[int, str] = None,
            name: str = None):
    if isinstance(id_or_parsey_thing, str):
        if name is not None:
            raise TypeError("don't provide both")
        id, name = … your parsing code here …
    else:
        id = id_or_parsey_thing
    # super().__new__ or _new here
Yes, since Python 3.6 there is a new alternative to namedtuple: NamedTuple. Thanks to variable annotations, this is now possible. So, if you previously wrote something like:
MyType = namedtuple('MyType', ('a', 'b', 'c'))
Now you can define it as follows.
To add a new constructor, just define a classmethod:
from typing import NamedTuple

class MyType(NamedTuple):
    a: str
    b: int
    c: float

    @classmethod
    def from_string(cls, s):
        a, b, c = s.split()
        return cls(a, int(b), float(c))

print(MyType.from_string('1 2 3'))

Overriding __contains__ method for a class

I need to simulate enums in Python, and did it by writing classes like:
class Spam(Enum):
    k = 3
    EGGS = 0
    HAM = 1
    BAKEDBEANS = 2
Now I'd like to test if some constant is a valid choice for a particular Enum-derived class, with the following syntax:
if (x in Foo):
    print("seems legit")
Therefore I tried to create an "Enum" base class where I override the __contains__ method like this:
class Enum:
    """
    Simulates an enum.
    """
    k = 0  # overwrite in subclass with number of constants

    @classmethod
    def __contains__(cls, x):
        """
        Test for valid enum constant x:
        x in Enum
        """
        return (x in range(cls.k))
However, when using the in keyword on the class (like the example above), I get the error:
TypeError: argument of type 'type' is not iterable
Why that? Can I somehow get the syntactic sugar I want?
Why that?
When you use special syntax like a in Foo, the __contains__ method is looked up on the type of Foo. However, your __contains__ implementation exists on Foo itself, not its type. Foo's type is type, which doesn't implement this (or iteration), thus the error.
The same situation occurs if you instantiate an object and then, after it is created, add a __contains__ function to the instance variables. That function won't be called:
>>> class Empty: pass
...
>>> x = Empty()
>>> x.__contains__ = lambda: True
>>> 1 in x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: argument of type 'Empty' is not iterable
Can I somehow get the syntactic sugar I want?
Yes. As mentioned above, the method is looked up on Foo's type. The type of a class is called a metaclass, so you need a new metaclass that implements __contains__.
Try this one:
class MetaEnum(type):
    def __contains__(cls, x):
        return x in range(cls.k)
As you can see, the methods on a metaclass take the metaclass instance -- the class -- as their first argument. This should make sense. It's very similar to a classmethod, except that the method lives on the metaclass and not the class.
Inheritance from a class with a custom metaclass also inherits the metaclass, so you can create a base class like so:
class BaseEnum(metaclass=MetaEnum):
    pass

class MyEnum(BaseEnum):
    k = 3

print(1 in MyEnum)  # True
My usecase was to test on the names of the members of my Enum.
With a slight modification to this solution:
from enum import Enum, EnumMeta, auto

class MetaEnum(EnumMeta):
    def __contains__(cls, item):
        return item in cls.__members__.keys()

class BaseEnum(Enum, metaclass=MetaEnum):
    pass

class LogSections(BaseEnum):
    configuration = auto()
    debug = auto()
    errors = auto()
    component_states = auto()
    alarm = auto()

if __name__ == "__main__":
    print('configuration' in LogSections)
    print('b' in LogSections)
True
False
