I'm trying to isolate a field and a method from classes to work with mongodb.
Example of the working class:
@dataclass
class Article(Mongodata):
name: str
quantity: int
description: str
_id: Optional[int] = None
def __getdict__(self):
result = asdict(self)
result.pop("_id")
return result
How can I isolate _id and __getdict__ into an abstract class so that everything still works?
@dataclass
class Article(Mongodata):
name: str
quantity: int
description: str
@dataclass
class Mongodata(ABCMeta):
@property
@abstractmethod
def _id(self) -> Optional[int]:
return None
def __getdict__(self):
result = asdict(self)
result.pop("_id")
return result
Can you also explain how abstract classes and metaclasses differ? I come from Java, and after reading about them I still didn't understand anything.
As you mentioned you're on Python 3.9, you can set it up much the same way as above. However, if you declare the fields in Article as shown and add a field definition in the superclass like below:
@dataclass
class Mongodata(ABC):
_id: Optional[int] = None
Then if you actually try to run the code, you would run into a TypeError as below:
TypeError: non-default argument 'name' follows default argument
The reason for this is the order in which dataclasses resolves the fields for a dataclass when inheritance is involved. In this case, it adds the _id field from the superclass first, and then all the fields in the Article dataclass next. Since the first param that it adds has a default value, but the params that follow it don't have a default value, it'll raise a TypeError as you might expect.
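To see it in action, here's a minimal sketch (dropping the ABC base just to keep it short) that reproduces the error at class-creation time:
from dataclasses import dataclass
from typing import Optional

@dataclass
class Mongodata:
    _id: Optional[int] = None

try:
    @dataclass
    class Article(Mongodata):  # fields resolve as: _id (has default), name, quantity, description
        name: str
        quantity: int
        description: str
except TypeError as e:
    print(e)  # non-default argument 'name' follows default argument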
Note that you'd actually run into the same behavior if you had decided to manually generate an __init__ method for the Article class in the same way:
def __init__(self, _id: Optional[int] = None, name: str, quantity: int, description: str):
^
SyntaxError: non-default argument follows default argument
The best approach in Python 3.9 seems to be to declare the dataclasses this way, so that all fields in the subclass have default values:
from abc import ABC
from dataclasses import dataclass, asdict
from typing import Optional
@dataclass
class Mongodata(ABC):
_id: Optional[int] = None
def __getdict__(self):
result = asdict(self)
result.pop("_id")
return result
@dataclass
class Article(Mongodata):
name: str = None
quantity: int = None
description: str = None
But then positional arguments when creating an Article object will be a problem, because the first argument passed to the constructor gets assigned to _id:
a = Article('123', 321, 'desc')
So you could instead pass None as the first positional argument, and that'll get assigned to _id. Another approach that works is to pass keyword arguments to the constructor instead:
a = Article(name='123', quantity=321, description='desc')
This actually feels more natural with the kw_only param that was introduced to dataclasses in Python 3.10 as a means to resolve this same issue, but more on that below.
A Metaclass Approach
Another option is to declare a function which can be used as a metaclass, as below:
from dataclasses import asdict
from typing import Optional
def add_id_and_get_dict(name: str, bases: tuple[type, ...], cls_dict: dict):
"""Metaclass to add an `_id` field and a `get_dict` method."""
# Get class annotations
cls_annotations = cls_dict['__annotations__']
# This assigns the `_id: Optional[int]` annotation
cls_annotations['_id'] = Optional[int]
# This assigns the `_id = None` assignment
cls_dict['_id'] = None
def get_dict(self):
result = asdict(self)
result.pop('_id')
return result
# add get_dict() method to the class
cls_dict['get_dict'] = get_dict
# create and return a new class
cls = type(name, bases, cls_dict)
return cls
Then you can simplify your dataclass definition a little. Also you technically don't need to define a get_dict method here, but it's useful so that an IDE knows that such a method exists on the class.
from dataclasses import dataclass
from typing import Any
@dataclass
class Article(metaclass=add_id_and_get_dict):
name: str
quantity: int
description: str
# Add for type hinting, so the IDE knows such a method exists.
def get_dict(self) -> dict[str, Any]:
...
And now it's a bit more intuitive when you want to create new Article objects:
a = Article('abc', 123, 'desc')
print(a) # Article(name='abc', quantity=123, description='desc', _id=None)
print(a._id) # None
print(a.get_dict()) # {'name': 'abc', 'quantity': 123, 'description': 'desc'}
a2 = Article('abc', 321, 'desc', _id=12345)
print(a2) # Article(name='abc', quantity=321, description='desc', _id=12345)
print(a2._id) # 12345
print(a2.get_dict()) # {'name': 'abc', 'quantity': 321, 'description': 'desc'}
Keyword-only Arguments
In Python 3.10, if you don't want to assign default values to all the fields in a subclass, another option is to decorate the superclass with @dataclass(kw_only=True), so that fields defined in that class are then required to be keyword-only arguments by default.
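For example, a minimal sketch of what that first variant might look like (assuming Python 3.10+, with only the superclass marked kw_only):
from abc import ABC
from dataclasses import dataclass
from typing import Optional

@dataclass(kw_only=True)
class Mongodata(ABC):
    _id: Optional[int] = None

@dataclass
class Article(Mongodata):
    name: str
    quantity: int
    description: str

a = Article('abc', 123, 'desc', _id=42)
print(a)  # Article(_id=42, name='abc', quantity=123, description='desc')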
You can also use the KW_ONLY sentinel value, provided in dataclasses in Python 3.10, as a type annotation as shown below, which should also make things much simpler and more intuitive to work with.
from abc import ABC
from dataclasses import dataclass, asdict, KW_ONLY
from typing import Optional
@dataclass
class Mongodata(ABC):
_: KW_ONLY
_id: Optional[int] = None
@property
def dict(self):
result = asdict(self)
result.pop("_id")
return result
# noinspection PyDataclass
@dataclass
class Article(Mongodata):
name: str
quantity: int
description: str
Essentially, any fields defined after the _: KW_ONLY then become keyword-only arguments to the constructor.
Now the usage should be exactly as desired. You can pass both keyword and positional arguments to the constructor, and it appears to work as intended:
a = Article(name='123', quantity=123, description='desc')
print(a) # Article(_id=None, name='123', quantity=123, description='desc')
print(a._id) # None
print(a.dict) # {'name': '123', 'quantity': 123, 'description': 'desc'}
a2 = Article('123', 321, 'desc', _id=112233)
print(a2) # Article(_id=112233, name='123', quantity=321, description='desc')
print(a2._id) # 112233
print(a2.dict) # {'name': '123', 'quantity': 321, 'description': 'desc'}
Also, a quick explanation of why this appears to work as it does: since you've only decorated the superclass with kw_only=True, all this accomplishes is making _id a keyword-only argument to the constructor. The fields in the subclass are allowed as either keyword or positional arguments, since we didn't specify kw_only for them.
An easier way to think about this is to imagine that the signature of the __init__() method that dataclasses generates actually looks like this:
def __init__(self, name: str, quantity: int, description: str, *, _id: Optional[int] = None):
In Python (not just in 3.10), the appearance of * in a function signature signifies that all the parameters that follow it are keyword-only arguments. Note that the _id argument, in this case, is added as a keyword-only argument after all the positional arguments from the subclass. This means that the method signature is valid, since keyword-only arguments are allowed to have default values, as we do here.
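As a plain-function illustration of that rule (a hypothetical helper, nothing dataclass-specific):
from typing import Optional

def make_article(name: str, quantity: int, *, _id: Optional[int] = None):
    return {'name': name, 'quantity': quantity, '_id': _id}

make_article('abc', 123)          # OK; _id defaults to None
make_article('abc', 123, _id=42)  # OK; _id passed by keyword
# make_article('abc', 123, 42)    # TypeError: takes 2 positional arguments but 3 were given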
I'm currently trying my hand at the new dataclass constructions introduced in Python 3.7. I am currently stuck trying to do some inheritance from a parent class. It looks like the order of the arguments is botched by my current approach, such that the bool parameter in the child class is passed before the other parameters. This is causing a TypeError.
from dataclasses import dataclass
@dataclass
class Parent:
name: str
age: int
ugly: bool = False
def print_name(self):
print(self.name)
def print_age(self):
print(self.age)
def print_id(self):
print(f'The Name is {self.name} and {self.name} is {self.age} year old')
@dataclass
class Child(Parent):
school: str
ugly: bool = True
jack = Parent('jack snr', 32, ugly=True)
jack_son = Child('jack jnr', 12, school = 'havard', ugly=True)
jack.print_id()
jack_son.print_id()
When I run this code I get this TypeError:
TypeError: non-default argument 'school' follows default argument
How do I fix this?
The way dataclasses combines attributes prevents you from being able to use attributes with defaults in a base class and then use attributes without a default (positional attributes) in a subclass.
That's because the attributes are combined by starting from the bottom of the MRO, and building up an ordered list of the attributes in first-seen order; overrides are kept in their original location. So Parent starts out with ['name', 'age', 'ugly'], where ugly has a default, and then Child adds ['school'] to the end of that list (with ugly already in the list). This means you end up with ['name', 'age', 'ugly', 'school'] and because school doesn't have a default, this results in an invalid argument listing for __init__.
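You can see that combined ordering for yourself with a quick sketch that gives school a default just so the class definition succeeds, and then inspects dataclasses.fields():
from dataclasses import dataclass, fields

@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = False

@dataclass
class Child(Parent):
    school: str = ''   # default added only so the class can be created
    ugly: bool = True

print([f.name for f in fields(Child)])  # ['name', 'age', 'ugly', 'school']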
This is documented in PEP-557 Dataclasses, under inheritance:
When the Data Class is being created by the @dataclass decorator, it looks through all of the class's base classes in reverse MRO (that is, starting at object) and, for each Data Class that it finds, adds the fields from that base class to an ordered mapping of fields. After all of the base class fields are added, it adds its own fields to the ordered mapping. All of the generated methods will use this combined, calculated ordered mapping of fields. Because the fields are in insertion order, derived classes override base classes.
and under Specification:
TypeError will be raised if a field without a default value follows a field with a default value. This is true either when this occurs in a single class, or as a result of class inheritance.
You do have a few options here to avoid this issue.
The first option is to use separate base classes to force fields with defaults into a later position in the MRO order. At all costs, avoid setting fields directly on classes that are to be used as base classes, such as Parent.
The following class hierarchy works:
# base classes with fields; fields without defaults separate from fields with.
@dataclass
class _ParentBase:
name: str
age: int
@dataclass
class _ParentDefaultsBase:
ugly: bool = False
@dataclass
class _ChildBase(_ParentBase):
school: str
@dataclass
class _ChildDefaultsBase(_ParentDefaultsBase):
ugly: bool = True
# public classes, deriving from base-with, base-without field classes
# subclasses of public classes should put the public base class up front.
@dataclass
class Parent(_ParentDefaultsBase, _ParentBase):
def print_name(self):
print(self.name)
def print_age(self):
print(self.age)
def print_id(self):
print(f"The Name is {self.name} and {self.name} is {self.age} year old")
@dataclass
class Child(_ChildDefaultsBase, Parent, _ChildBase):
pass
By pulling out fields into separate base classes with fields without defaults and fields with defaults, and a carefully selected inheritance order, you can produce an MRO that puts all fields without defaults before those with defaults. The reversed MRO (ignoring object) for Child is:
_ParentBase
_ChildBase
_ParentDefaultsBase
Parent
_ChildDefaultsBase
Note that while Parent doesn't set any new fields, it does inherit the fields from _ParentDefaultsBase and should not end up 'last' in the field listing order; the above order puts _ChildDefaultsBase last so its fields 'win'. The dataclass rules are also satisfied; the classes with fields without defaults (_ParentBase and _ChildBase) precede the classes with fields with defaults (_ParentDefaultsBase and _ChildDefaultsBase).
The result is Parent and Child classes with a sane field order, while Child is still a subclass of Parent:
>>> from inspect import signature
>>> signature(Parent)
<Signature (name: str, age: int, ugly: bool = False) -> None>
>>> signature(Child)
<Signature (name: str, age: int, school: str, ugly: bool = True) -> None>
>>> issubclass(Child, Parent)
True
and so you can create instances of both classes:
>>> jack = Parent('jack snr', 32, ugly=True)
>>> jack_son = Child('jack jnr', 12, school='havard', ugly=True)
>>> jack
Parent(name='jack snr', age=32, ugly=True)
>>> jack_son
Child(name='jack jnr', age=12, school='havard', ugly=True)
Another option is to only use fields with defaults; you can still make it an error not to supply a school value, by raising one in __post_init__:
_no_default = object()
@dataclass
class Child(Parent):
school: str = _no_default
ugly: bool = True
def __post_init__(self):
if self.school is _no_default:
raise TypeError("__init__ missing 1 required argument: 'school'")
but this does alter the field order; school ends up after ugly:
<Signature (name: str, age: int, ugly: bool = True, school: str = <object object at 0x1101d1210>) -> None>
and a type hint checker will complain about _no_default not being a string.
You can also use the attrs project, which was the project that inspired dataclasses. It uses a different inheritance merging strategy; it pulls overridden fields in a subclass to the end of the fields list, so ['name', 'age', 'ugly'] in the Parent class becomes ['name', 'age', 'school', 'ugly'] in the Child class; by overriding the field with a default, attrs allows the override without needing to do an MRO dance.
attrs supports defining fields without type hints, but let's stick to the supported type hinting mode by setting auto_attribs=True:
import attr
@attr.s(auto_attribs=True)
class Parent:
name: str
age: int
ugly: bool = False
def print_name(self):
print(self.name)
def print_age(self):
print(self.age)
def print_id(self):
print(f"The Name is {self.name} and {self.name} is {self.age} year old")
@attr.s(auto_attribs=True)
class Child(Parent):
school: str
ugly: bool = True
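A quick usage sketch with the attrs classes above, where school now slots in before ugly and can be passed positionally:
jack_son = Child('jack jnr', 12, 'havard', ugly=True)
print(jack_son)  # Child(name='jack jnr', age=12, school='havard', ugly=True)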
You can use attributes with defaults in parent classes if you exclude them from the init function. If you need the possibility to override the default at init, extend the code with Praveen Kulkarni's answer.
from dataclasses import dataclass, field
@dataclass
class Parent:
name: str
age: int
ugly: bool = field(default=False, init=False)
@dataclass
class Child(Parent):
school: str
jack = Parent('jack snr', 32)
jack_son = Child('jack jnr', 12, school = 'havard')
jack_son.ugly = True
Or even
@dataclass
class Child(Parent):
school: str
ugly = True
# This does not work
# ugly: bool = True
jack_son = Child('jack jnr', 12, school = 'havard')
assert jack_son.ugly
Note that with Python 3.10, it is now possible to do it natively with dataclasses.
Python 3.10 added the kw_only parameter to dataclasses (similar to attrs).
It allows you to specify which fields are keyword-only, and thus will be set at the end of the init, not causing an inheritance problem.
Taking directly from Eric Smith's blog post on the subject:
There are two reasons people [were asking for] this feature:
When a dataclass has many fields, specifying them by position can become unreadable. It also requires that for backward compatibility, all new fields are added to the end of the dataclass. This isn't always desirable.
When a dataclass inherits from another dataclass, and the base class has fields with default values, then all of the fields in the derived class must also have defaults.
What follows is the simplest way to do it with this new argument, but there are multiple ways you can use it to get inheritance working with default values in the parent class:
from dataclasses import dataclass
@dataclass(kw_only=True)
class Parent:
name: str
age: int
ugly: bool = False
@dataclass(kw_only=True)
class Child(Parent):
school: str
ch = Child(name="Kevin", age=17, school="42")
print(ch.ugly)
Take a look at the blogpost linked above for a more thorough explanation of kw_only.
Cheers!
PS: As this is fairly new, note that your IDE might still flag a possible error, but it works at runtime.
The approach below deals with this problem while using pure python dataclasses and without much boilerplate code.
The ugly: dataclasses.InitVar[bool] serves as a pseudo-field just to help us do initialization, and will be lost once the instance is created. _ugly: bool = field(init=False), on the other hand, is an instance member that will not be initialized by the __init__ method, but can instead be initialized in the __post_init__ method (you can find more here).
from dataclasses import dataclass, field, InitVar
@dataclass
class Parent:
name: str
age: int
ugly: InitVar[bool]
_ugly: bool = field(init=False)
def __post_init__(self, ugly: bool):
self._ugly = ugly
def print_name(self):
print(self.name)
def print_age(self):
print(self.age)
def print_id(self):
print(f'The Name is {self.name} and {self.name} is {self.age} year old')
@dataclass
class Child(Parent):
school: str
jack = Parent('jack snr', 32, ugly=True)
jack_son = Child('jack jnr', 12, school='havard', ugly=True)
jack.print_id()
jack_son.print_id()
Note that this makes the field ugly mandatory. To make it optional, you can define a class method on the Parent that includes ugly as an optional parameter:
from dataclasses import dataclass, field, InitVar
@dataclass
class Parent:
name: str
age: int
ugly: InitVar[bool]
_ugly: bool = field(init=False)
def __post_init__(self, ugly: bool):
self._ugly = ugly
@classmethod
def create(cls, ugly=True, **kwargs):
return cls(ugly=ugly, **kwargs)
def print_name(self):
print(self.name)
def print_age(self):
print(self.age)
def print_id(self):
print(f'The Name is {self.name} and {self.name} is {self.age} year old')
@dataclass
class Child(Parent):
school: str
jack = Parent.create(name='jack snr', age=32, ugly=False)
jack_son = Child.create(name='jack jnr', age=12, school='harvard')
jack.print_id()
jack_son.print_id()
Now you can use the create(...) class method as a factory method for creating Parent/Child classes with a default value for ugly. Note you must use named parameters for this approach to work.
You're seeing this error because an argument without a default value is being added after an argument with a default value. The insertion order of inherited fields into the dataclass is the reverse of the Method Resolution Order, which means that the Parent fields come first, even if they are overwritten later by their children.
An example from PEP-557 - Data Classes:
@dataclass
class Base:
x: Any = 15.0
y: int = 0
@dataclass
class C(Base):
z: int = 10
x: int = 15
The final list of fields is, in order, x, y, z. The final type of x is int, as specified in class C.
Unfortunately, I don't think there's any way around this. My understanding is that if the parent class has a default argument, then no child class can have non-default arguments.
Based on Martijn Pieters' solution I did the following:
1) Create a mixin implementing __post_init__:
from dataclasses import dataclass
no_default = object()
@dataclass
class NoDefaultAttributesPostInitMixin:
def __post_init__(self):
for key, value in self.__dict__.items():
if value is no_default:
raise TypeError(
f"__init__ missing 1 required argument: '{key}'"
)
2) Then in the classes with the inheritance problem:
from src.utils import no_default, NoDefaultAttributesPostInitMixin
@dataclass
class MyDataclass(DataclassWithDefaults, NoDefaultAttributesPostInitMixin):
attr1: str = no_default
EDIT:
After a while I also found problems with this solution with mypy; the following code fixes the issue.
from dataclasses import dataclass
from typing import TypeVar, Generic, Union
T = TypeVar("T")
class NoDefault(Generic[T]):
...
NoDefaultVar = Union[NoDefault[T], T]
no_default: NoDefault = NoDefault()
@dataclass
class NoDefaultAttributesPostInitMixin:
def __post_init__(self):
for key, value in self.__dict__.items():
if value is no_default:
raise TypeError(f"__init__ missing 1 required argument: '{key}'")
@dataclass
class Parent(NoDefaultAttributesPostInitMixin):
a: str = ""
@dataclass
class Child(Parent):
b: NoDefaultVar[str] = no_default
If you are using Python 3.10+, then you can utilize keyword-only arguments for the dataclass as discussed in this answer and in the python docs.
If you're using < Python 3.10, then you can utilize dataclasses.field with a default_factory that throws. Since the attribute will be declared with field(), it gets treated as if it has a default; but if a user attempts to create an instance without providing the value for that field, it will use the factory, which will error.
This technique isn't equivalent to keyword only, because you could still provide all the arguments positionally. However, this does solve the problem, and is simpler than messing around with various dataclass dunder methods.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, TypeVar
T = TypeVar("T")
def required() -> T:
f: T
def factory() -> T:
# mypy treats a Field as a T, even though it has attributes like .name, .default, etc
field_name = f.name # type: ignore[attr-defined]
raise ValueError(f"field '{field_name}' required")
f = field(default_factory=factory)
return f
@dataclass
class Event:
id: str
created_at: datetime
updated_at: Optional[datetime] = None
@dataclass
class NamedEvent(Event):
name: str = required()
event = NamedEvent(name="Some Event", id="ab13c1a", created_at=datetime.now())
print("created event:", event)
event2 = NamedEvent("ab13c1a", datetime.now(), name="Some Other Event")
print("created event:", event2)
event3 = NamedEvent("ab13c1a", datetime.now())
Output:
created event: NamedEvent(id='ab13c1a', created_at=datetime.datetime(2022, 7, 23, 19, 22, 17, 944550), updated_at=None, name='Some Event')
created event: NamedEvent(id='ab13c1a', created_at=datetime.datetime(2022, 7, 23, 19, 22, 17, 944588), updated_at=None, name='Some Other Event')
Traceback (most recent call last):
File ".../gist.py", line 39, in <module>
event3 = NamedEvent("ab13c1a", datetime.now())
File "<string>", line 6, in __init__
File ".../gist.py", line 14, in factory
raise ValueError(f"field '{field_name}' required")
ValueError: field 'name' required
You can also find this code on this github gist.
A possible work-around is to use monkey-patching to append the parent fields
import dataclasses as dc
def add_args(parent):
def decorator(orig):
"Append parent's fields AFTER orig's fields"
# Aggregate fields
ff = [(f.name, f.type, f) for f in dc.fields(dc.dataclass(orig))]
ff += [(f.name, f.type, f) for f in dc.fields(dc.dataclass(parent))]
new = dc.make_dataclass(orig.__name__, ff)
new.__doc__ = orig.__doc__
return new
return decorator
class Animal:
age: int = 0
@add_args(Animal)
class Dog:
name: str
noise: str = "Woof!"
@add_args(Animal)
class Bird:
name: str
can_fly: bool = True
Dog("Dusty", 2) # --> Dog(name='Dusty', noise=2, age=0)
b = Bird("Donald", False, 40) # --> Bird(name='Donald', can_fly=False, age=40)
It's also possible to prepend non-default fields, by checking if f.default is dc.MISSING, but this is probably too dirty.
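If you did want to go that route, a rough sketch of such a variant might look like the following (a hypothetical add_args_reordered, which also accounts for default_factory; not tested beyond the happy path):
import dataclasses as dc

def add_args_reordered(parent):
    def decorator(orig):
        "Append parent's fields, then move fields without defaults to the front"
        ff = [(f.name, f.type, f) for f in dc.fields(dc.dataclass(orig))]
        ff += [(f.name, f.type, f) for f in dc.fields(dc.dataclass(parent))]
        # Stable sort: fields that have a default (or default_factory) sort last.
        ff.sort(key=lambda spec: spec[2].default is not dc.MISSING
                                 or spec[2].default_factory is not dc.MISSING)
        new = dc.make_dataclass(orig.__name__, ff)
        new.__doc__ = orig.__doc__
        return new
    return decorator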
While monkey-patching lacks some features of inheritance, it can still be used to add methods to all pseudo-child classes. For more fine-grained control, set the default values using dc.field(compare=False, repr=True, ...).
You can use a modified version of dataclasses, which will generate a keyword-only __init__ method:
import dataclasses
def _init_fn(fields, frozen, has_post_init, self_name):
# fields contains both real fields and InitVar pseudo-fields.
globals = {'MISSING': dataclasses.MISSING,
'_HAS_DEFAULT_FACTORY': dataclasses._HAS_DEFAULT_FACTORY}
body_lines = []
for f in fields:
line = dataclasses._field_init(f, frozen, globals, self_name)
# line is None means that this field doesn't require
# initialization (it's a pseudo-field). Just skip it.
if line:
body_lines.append(line)
# Does this class have a post-init function?
if has_post_init:
params_str = ','.join(f.name for f in fields
if f._field_type is dataclasses._FIELD_INITVAR)
body_lines.append(f'{self_name}.{dataclasses._POST_INIT_NAME}({params_str})')
# If no body lines, use 'pass'.
if not body_lines:
body_lines = ['pass']
locals = {f'_type_{f.name}': f.type for f in fields}
return dataclasses._create_fn('__init__',
[self_name, '*'] + [dataclasses._init_param(f) for f in fields if f.init],
body_lines,
locals=locals,
globals=globals,
return_type=None)
def add_init(cls, frozen):
fields = getattr(cls, dataclasses._FIELDS)
# Does this class have a post-init function?
has_post_init = hasattr(cls, dataclasses._POST_INIT_NAME)
# Include InitVars and regular fields (so, not ClassVars).
flds = [f for f in fields.values()
if f._field_type in (dataclasses._FIELD, dataclasses._FIELD_INITVAR)]
dataclasses._set_new_attribute(cls, '__init__',
_init_fn(flds,
frozen,
has_post_init,
# The name to use for the "self"
# param in __init__. Use "self"
# if possible.
'__dataclass_self__' if 'self' in fields
else 'self',
))
return cls
# a dataclass with a constructor that only takes keyword arguments
def dataclass_keyword_only(_cls=None, *, repr=True, eq=True, order=False,
unsafe_hash=False, frozen=False):
def wrap(cls):
cls = dataclasses.dataclass(
cls, init=False, repr=repr, eq=eq, order=order, unsafe_hash=unsafe_hash, frozen=frozen)
return add_init(cls, frozen)
# See if we're being called as @dataclass or @dataclass().
if _cls is None:
# We're called with parens.
return wrap
# We're called as @dataclass without parens.
return wrap(_cls)
(also posted as a gist, tested with Python 3.6 backport)
This will require defining the child class as
@dataclass_keyword_only
class Child(Parent):
school: str
ugly: bool = True
And it would generate __init__(self, *, name: str, age: int, ugly: bool = True, school: str) (which is valid Python). The only caveat here is not allowing objects to be initialized with positional arguments, but otherwise it's a completely regular dataclass with no ugly hacks.
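A quick usage sketch, assuming Parent is the plain dataclass from the question:
jack_son = Child(name='jack jnr', age=12, school='havard')
print(jack_son)  # Child(name='jack jnr', age=12, ugly=True, school='havard')
# Child('jack jnr', 12, school='havard') would raise a TypeError, since positional arguments aren't accepted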
A quick and dirty solution:
from typing import Optional
@dataclass
class Child(Parent):
school: Optional[str] = None
ugly: bool = True
def __post_init__(self):
assert self.school is not None
Then go back and refactor once (hopefully) the language is extended.
I came back to this question after discovering that dataclasses may be getting a decorator parameter that allows fields to be reordered. This is certainly a promising development, though progress on this feature seems to have stalled somewhat.
Right now, you can get this behaviour, plus some other niceties, by using dataclassy, my reimplementation of dataclasses that overcomes frustrations like this. Using from dataclassy in place of from dataclasses in the original example means it runs without errors.
Using inspect to print the signature of Child makes what is going on clear; the result is (name: str, age: int, school: str, ugly: bool = True). Fields are always reordered so that fields with default values come after fields without them in the parameters to the initializer. Both lists (fields without defaults, and those with them) are still ordered in definition order.
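For example, a short sketch (assuming dataclassy is installed and used as the drop-in replacement described above):
from inspect import signature
from dataclassy import dataclass

@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = False

@dataclass
class Child(Parent):
    school: str
    ugly: bool = True

print(signature(Child))  # (name: str, age: int, school: str, ugly: bool = True)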
Coming face to face with this issue was one of the factors that prompted me to write a replacement for dataclasses. The workarounds detailed here, while helpful, require code to be contorted to such an extent that they completely negate the readability advantage dataclasses' naive approach (whereby field ordering is trivially predictable) offers.
When you use Python inheritance to create dataclasses, you cannot guarantee that all fields with default values will appear after all fields without default values.
An easy solution is to avoid using multiple inheritance to construct a "merged" dataclass. Instead, we can build a merged dataclass just by filtering and sorting on the fields of your parent dataclasses.
Try out this merge_dataclasses() function:
import dataclasses
import functools
from typing import Iterable, Type
def merge_dataclasses(
cls_name: str,
*,
merge_from: Iterable[Type],
**kwargs,
):
"""
Construct a dataclass by merging the fields
from an arbitrary number of dataclasses.
Args:
cls_name: The name of the constructed dataclass.
merge_from: An iterable of dataclasses
whose fields should be merged.
**kwargs: Keyword arguments are passed to
:py:func:`dataclasses.make_dataclass`.
Returns:
Returns a new dataclass
"""
# Merge the fields from the dataclasses,
# with field names from later dataclasses overwriting
# any conflicting predecessor field names.
each_base_fields = [d.__dataclass_fields__ for d in merge_from]
merged_fields = functools.reduce(
lambda x, y: {**x, **y}, each_base_fields
)
# We have to reorder all of the fields from all of the dataclasses
# so that *all* of the fields without defaults appear
# in the merged dataclass *before* all of the fields with defaults.
fields_without_defaults = [
(f.name, f.type, f)
for f in merged_fields.values()
if isinstance(f.default, dataclasses._MISSING_TYPE)
]
fields_with_defaults = [
(f.name, f.type, f)
for f in merged_fields.values()
if not isinstance(f.default, dataclasses._MISSING_TYPE)
]
fields = [*fields_without_defaults, *fields_with_defaults]
return dataclasses.make_dataclass(
cls_name=cls_name,
fields=fields,
**kwargs,
)
And then you can merge dataclasses as follows. Note that we can merge A and B and the default fields b and d are moved to the end of the merged dataclass.
@dataclasses.dataclass
class A:
a: int
b: int = 0
@dataclasses.dataclass
class B:
c: int
d: int = 0
C = merge_dataclasses(
"C",
merge_from=[A, B],
)
# Note that a and c (no defaults) come first; b and d (with defaults) are moved to the end
print(C(1, 2).__dict__)
# {'a': 1, 'c': 2, 'b': 0, 'd': 0}
Of course, the pitfall of this solution is that C doesn't actually inherit from A and B, which means that you cannot use isinstance() or other type assertions to verify C's parentage.
Complementing Martijn Pieters' solution that uses attrs: it is possible to create the inheritance without replicating the default attributes, with:
import attr
@attr.s(auto_attribs=True)
class Parent:
name: str
age: int
ugly: bool = attr.ib(default=False, kw_only=True)
@attr.s(auto_attribs=True)
class Child(Parent):
school: str
ugly: bool = True
More about the kw_only parameter can be found here
How about defining the ugly field like this, instead of the default way?
ugly: bool = field(metadata=dict(required=False, missing=False))
An experimental but interesting solution would be to use metaclasses. The solution below enables the usage of Python dataclasses with simple inheritance without using the dataclass decorator at all. Moreover, it makes it possible to inherit the fields of the parent base classes without complaining about the order of positional arguments (non-default fields).
from collections import OrderedDict
import typing as ty
import dataclasses
from itertools import takewhile
class DataClassTerm:
def __new__(cls, *args, **kwargs):
return super().__new__(cls)
class DataClassMeta(type):
def __new__(cls, clsname, bases, clsdict):
fields = {}
# Get list of base classes including the class to be produced(initialized without its original base classes as those have already become dataclasses)
bases_and_self = [dataclasses.dataclass(super().__new__(cls, clsname, (DataClassTerm,), clsdict))] + list(bases)
# Whatever is a subclass of DataClassTerm will become a DataClassTerm.
# Following block will iterate and create individual dataclasses and collect their fields
for base in bases_and_self[::-1]: # Ensure that last fields in last base is prioritized
if issubclass(base, DataClassTerm):
to_dc_bases = list(takewhile(lambda c: c is not DataClassTerm, base.__mro__))
for dc_base in to_dc_bases[::-1]: # Ensure that last fields in last base in MRO is prioritized(same as in dataclasses)
if dataclasses.is_dataclass(dc_base):
valid_dc = dc_base
else:
valid_dc = dataclasses.dataclass(dc_base)
for field in dataclasses.fields(valid_dc):
fields[field.name] = (field.name, field.type, field)
# Following block will reorder the fields so that fields without default values are first in order
reordered_fields = OrderedDict()
for n, t, f in fields.values():
if f.default is dataclasses.MISSING and f.default_factory is dataclasses.MISSING:
reordered_fields[n] = (n, t, f)
for n, t, f in fields.values():
if n not in reordered_fields.keys():
reordered_fields[n] = (n, t, f)
# Create a new dataclass using `dataclasses.make_dataclass`, which ultimately calls type.__new__, which is the same as super().__new__ in our case
fields = list(reordered_fields.values())
full_dc = dataclasses.make_dataclass(cls_name=clsname, fields=fields, init=True, bases=(DataClassTerm,))
# Discard the created dataclass class and create new one using super but preserve the dataclass specific namespace.
return super().__new__(cls, clsname, bases, {**full_dc.__dict__,**clsdict})
class DataClassCustom(DataClassTerm, metaclass=DataClassMeta):
def __new__(cls, *args, **kwargs):
if len(args)>0:
raise RuntimeError("Do not use positional arguments for initialization.")
return super().__new__(cls, *args, **kwargs)
Now let's create a sample dataclass with a parent dataclass and a sample mixin class:
class DataClassCustomA(DataClassCustom):
field_A_1: int = dataclasses.field()
field_A_2: ty.AnyStr = dataclasses.field(default=None)
class SomeOtherClass:
def methodA(self):
print('print from SomeOtherClass().methodA')
class DataClassCustomB(DataClassCustomA,SomeOtherClass):
field_B_1: int = dataclasses.field()
field_B_2: ty.Dict = dataclasses.field(default_factory=dict)
The result is
result_b = DataClassCustomB(field_A_1=1, field_B_1=2)
result_b
# DataClassCustomB(field_A_1=1, field_B_1=2, field_A_2=None, field_B_2={})
result_b.methodA()
# print from SomeOtherClass().methodA
An attempt to do the same with the @dataclass decorator on each parent class would have raised an exception in the child class, like TypeError('non-default argument <field-name> follows default argument'). The above solution prevents this from happening because the fields are first reordered. However, since the order of fields is modified, preventing *args usage in DataClassCustom.__new__ is mandatory, as the original order is no longer valid.
Although the kw_only feature introduced in Python >= 3.10 essentially makes inheritance in dataclasses much more reliable, the above example can still be used as a way to make dataclasses inheritable without using the @dataclass decorator at all.
I need a class that will accept a number of parameters. I know that all parameters will be provided, but some may be passed as None, in which case my class will have to provide default values.
I want to set up a simple dataclass with some default values like so:
@dataclass
class Specs1:
a: str
b: str = 'Bravo'
c: str = 'Charlie'
I would like to be able to get the default value for the second field but still set a value for the third one. I cannot do this with None because it is happily accepted as a value for my string:
r1 = Specs1('Apple', None, 'Cherry') # Specs1(a='Apple', b=None, c='Cherry')
I have come up with the following solution:
@dataclass
class Specs2:
def_b: ClassVar = 'Bravo'
def_c: ClassVar = 'Charlie'
a: str
b: str = def_b
c: str = def_c
def __post_init__(self):
self.b = self.def_b if self.b is None else self.b
self.c = self.def_c if self.c is None else self.c
Which seems to behave as intended:
r2 = Specs2('Apple', None, 'Cherry') # Specs2(a='Apple', b='Bravo', c='Cherry')
However, I feel it is quite ugly and that I am maybe missing something here. My actual class will have more fields so it will only get uglier.
The parameters passed to the class contain None and I do not have control over this aspect.
The simple solution is to just implement the default arguments in __post_init__() only!
@dataclass
class Specs2:
a: str
b: str
c: str
def __post_init__(self):
if self.b is None:
self.b = 'Bravo'
if self.c is None:
self.c = 'Charlie'
(Code is not tested. If I got some detail wrong, it wouldn't be the first time)
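For what it's worth, a quick usage sketch with the class above:
r2 = Specs2('Apple', None, 'Cherry')
print(r2)  # Specs2(a='Apple', b='Bravo', c='Cherry')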
I know this is a little late, but inspired by MikeSchneeberger's answer I made a small adaptation to the __post_init__ function that allows you to keep the defaults in the standard format:
import dataclasses
from dataclasses import dataclass, fields
def __post_init__(self):
# Loop through the fields
for field in fields(self):
# If there is a default and the value of the field is none we can assign a value
if not isinstance(field.default, dataclasses._MISSING_TYPE) and getattr(self, field.name) is None:
setattr(self, field.name, field.default)
Adding this to your dataclass should then ensure that the default values are enforced without requiring a new default class.
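Put together, a sketch of how that might look with the field names from the question (using the public dataclasses.MISSING sentinel for the check):
from dataclasses import MISSING, dataclass, fields

@dataclass
class Specs2:
    a: str
    b: str = 'Bravo'
    c: str = 'Charlie'

    def __post_init__(self):
        # Replace any None value with the field's declared default, if it has one.
        for field in fields(self):
            if field.default is not MISSING and getattr(self, field.name) is None:
                setattr(self, field.name, field.default)

print(Specs2('Apple', None, 'Cherry'))  # Specs2(a='Apple', b='Bravo', c='Cherry')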
Here is another solution.
Define DefaultVal and NoneRefersDefault types:
from dataclasses import dataclass, fields
from typing import Any
@dataclass
class DefaultVal:
val: Any
@dataclass
class NoneRefersDefault:
def __post_init__(self):
for field in fields(self):
# if a field of this data class defines a default value of type
# `DefaultVal`, then use its value in case the field after
# initialization has either not changed or is None.
if isinstance(field.default, DefaultVal):
field_val = getattr(self, field.name)
if isinstance(field_val, DefaultVal) or field_val is None:
setattr(self, field.name, field.default.val)
Usage:
@dataclass
class Specs3(NoneRefersDefault):
a: str
b: str = DefaultVal('Bravo')
c: str = DefaultVal('Charlie')
r3 = Specs3('Apple', None, 'Cherry') # Specs3(a='Apple', b='Bravo', c='Cherry')
EDIT #1: Rewritten NoneRefersDefault such that the following is possible as well:
r3 = Specs3('Apple', None) # Specs3(a='Apple', b='Bravo', c='Charlie')
EDIT #2: Note that if no class inherits from Spec, it might be better to have no default values in the dataclass and a "constructor" function create_spec instead:
@dataclass
class Specs4:
a: str
b: str
c: str
def create_spec(
a: str,
b: str = None,
c: str = None,
):
if b is None:
b = 'Bravo'
if c is None:
c = 'Charlie'
return Specs4(a=a, b=b, c=c)
also see dataclass-abc/example
In data classes you can access the default value of a class attribute directly: Specs1.b.
You can check for None and pass the default value if needed.
Code for this:
@dataclasses.dataclass()
class Specs1:
a: str
b: str = 'Bravo'
c: str = 'Charlie'
a = 'Apple'
b = None
c = 'Potato'
specs = Specs1(a=a, b=b or Specs1.b, c=c or Specs1.c)
>>> specs
Specs1(a='Apple', b='Bravo', c='Potato')
Use keyword-based parameters. You can just do r2 = Specs1('Apple', c='Cherry'). You don't have to use None. Refer here.
Output:
Specs1(a='Apple', b='Bravo', c='Cherry')
from dataclasses import dataclass, field
@dataclass
class Specs1:
a: str
b: str = field(default='Bravo')
c: str = field(default='Charlie')
I understand that you just want positional arguments. This can be accomplished with in-line conditionals (for code readability).
class Specs():
def __init__(self, a=None,b=None,c=None):
self.a = a if a is not None else 'Apple'
self.b = b if b is not None else 'Bravo'
self.c = c if c is not None else 'Cherry'
example = Specs('Apple', None, 'Cherry')
This approach can be done without an init method, if you prefer it that way.
However, you may consider an __init__() method with named arguments.
class Specs():
def __init__(self, a = 'Apple', b = 'Bravo', c = 'Cherry'):
self.a = a
self.b = b
self.c = c
example = Specs('Apple', c = 'Cherry')
Perhaps the most efficient and convenient approach that I can think of for this task involves using metaclasses in Python to automatically generate a __post_init__() method for the class, which will set the default value specified for a field if a None value is passed in for that field to __init__().
Assume we have these contents in a module metaclasses.py:
import logging
LOG = logging.getLogger(__name__)
logging.basicConfig(level='DEBUG')
def apply_default_values(name, bases, dct):
"""
Metaclass to generate a __post_init__() for the class, which sets the
default values for any fields that are passed in a `None` value in the
__init__() method.
"""
# Get class annotations, which `dataclasses` uses to determine which
# fields to add to the __init__() method.
cls_annotations = dct['__annotations__']
# This is a dict which will contain: {'b': 'Bravo', 'c': 'Charlie'}
field_to_default_val = {field: dct[field] for field in cls_annotations
if field in dct}
# Now we generate the lines of the __post_init()__ method
body_lines = []
for field, default_val in field_to_default_val.items():
body_lines.append(f'if self.{field} is None:')
body_lines.append(f' self.{field} = {default_val!r}')
# Then create the function, and add it to the class
fn = _create_fn('__post_init__',
('self', ),
body_lines)
dct['__post_init__'] = fn
# Return new class with the __post_init__() added
cls = type(name, bases, dct)
return cls
def _create_fn(name, args, body, *, globals=None):
"""
Create a new function. Adapted from `dataclasses._create_fn`, so we
can also log the function definition for debugging purposes.
"""
args = ','.join(args)
body = '\n'.join(f' {b}' for b in body)
# Compute the text of the entire function.
txt = f'def {name}({args}):\n{body}'
# Log the function declaration
LOG.debug('Creating new function:\n%s', txt)
ns = {}
exec(txt, globals, ns)
return ns[name]
Now in our main module, we can import and use the metaclass we just defined:
from dataclasses import dataclass
from metaclasses import apply_default_values
@dataclass
class Specs1(metaclass=apply_default_values):
a: str
b: str = 'Bravo'
c: str = 'Charlie'
r1 = Specs1('Apple', None, 'Cherry')
print(r1)
Output:
DEBUG:metaclasses:Creating new function:
def __post_init__(self):
if self.b is None:
self.b = 'Bravo'
if self.c is None:
self.c = 'Charlie'
Specs1(a='Apple', b='Bravo', c='Cherry')
To confirm that this approach is actually as efficient as stated, I've set up a small test case to create a lot of Specs objects, in order to time it against the version in @Lars's answer, which essentially does the same thing.
from dataclasses import dataclass
from timeit import timeit
from metaclasses import apply_default_values
@dataclass
class Specs1(metaclass=apply_default_values):
a: str
b: str = 'Bravo'
c: str = 'Charlie'
@dataclass
class Specs2:
a: str
b: str
c: str
def __post_init__(self):
if self.b is None:
self.b = 'Bravo'
if self.c is None:
self.c = 'Charlie'
n = 100_000
print('Manual: ', timeit("Specs2('Apple', None, 'Cherry')",
globals=globals(), number=n))
print('Metaclass: ', timeit("Specs1('Apple', None, 'Cherry')",
globals=globals(), number=n))
Timing n=100,000 runs, the results show the two are close enough that it doesn't really matter:
Manual: 0.059566365
Metaclass: 0.053688744999999996
Not too clear what you are trying to do with your Class. Should these defaults not rather be properties?
Maybe you need a definition used by your class that has default parameters such as:
def printMessage(name, msg = "My name is "):
print("Hello! ",msg + name)
printMessage("Jack")
Same thing applies to Classes.
Similar debate about "None" can be found here: Call function without optional arguments if they are None
I'm currently trying my hands on the new dataclass constructions introduced in Python 3.7. I am currently stuck on trying to do some inheritance of a parent class. It looks like the order of the arguments are botched by my current approach such that the bool parameter in the child class is passed before the other parameters. This is causing a type error.
from dataclasses import dataclass
#dataclass
class Parent:
name: str
age: int
ugly: bool = False
def print_name(self):
print(self.name)
def print_age(self):
print(self.age)
def print_id(self):
print(f'The Name is {self.name} and {self.name} is {self.age} year old')
#dataclass
class Child(Parent):
school: str
ugly: bool = True
jack = Parent('jack snr', 32, ugly=True)
jack_son = Child('jack jnr', 12, school = 'havard', ugly=True)
jack.print_id()
jack_son.print_id()
When I run this code I get this TypeError:
TypeError: non-default argument 'school' follows default argument
How do I fix this?
The way dataclasses combines attributes prevents you from being able to use attributes with defaults in a base class and then use attributes without a default (positional attributes) in a subclass.
That's because the attributes are combined by starting from the bottom of the MRO, and building up an ordered list of the attributes in first-seen order; overrides are kept in their original location. So Parent starts out with ['name', 'age', 'ugly'], where ugly has a default, and then Child adds ['school'] to the end of that list (with ugly already in the list). This means you end up with ['name', 'age', 'ugly', 'school'] and because school doesn't have a default, this results in an invalid argument listing for __init__.
This is documented in PEP-557 Dataclasses, under inheritance:
When the Data Class is being created by the #dataclass decorator, it looks through all of the class's base classes in reverse MRO (that is, starting at object) and, for each Data Class that it finds, adds the fields from that base class to an ordered mapping of fields. After all of the base class fields are added, it adds its own fields to the ordered mapping. All of the generated methods will use this combined, calculated ordered mapping of fields. Because the fields are in insertion order, derived classes override base classes.
and under Specification:
TypeError will be raised if a field without a default value follows a field with a default value. This is true either when this occurs in a single class, or as a result of class inheritance.
You do have a few options here to avoid this issue.
The first option is to use separate base classes to force fields with defaults into a later position in the MRO order. At all cost, avoid setting fields directly on classes that are to be used as base classes, such as Parent.
The following class hierarchy works:
# base classes with fields; fields without defaults separate from fields with.
#dataclass
class _ParentBase:
name: str
age: int
#dataclass
class _ParentDefaultsBase:
ugly: bool = False
#dataclass
class _ChildBase(_ParentBase):
school: str
#dataclass
class _ChildDefaultsBase(_ParentDefaultsBase):
ugly: bool = True
# public classes, deriving from base-with, base-without field classes
# subclasses of public classes should put the public base class up front.
#dataclass
class Parent(_ParentDefaultsBase, _ParentBase):
def print_name(self):
print(self.name)
def print_age(self):
print(self.age)
def print_id(self):
print(f"The Name is {self.name} and {self.name} is {self.age} year old")
#dataclass
class Child(_ChildDefaultsBase, Parent, _ChildBase):
pass
By pulling out fields into separate base classes with fields without defaults and fields with defaults, and a carefully selected inheritance order, you can produce an MRO that puts all fields without defaults before those with defaults. The reversed MRO (ignoring object) for Child is:
_ParentBase
_ChildBase
_ParentDefaultsBase
Parent
_ChildDefaultsBase
Note that while Parent doesn't set any new fields, it does inherit the fields from _ParentDefaultsBase and should not end up 'last' in the field listing order; the above order puts _ChildDefaultsBase last so its fields 'win'. The dataclass rules are also satisfied; the classes with fields without defaults (_ParentBase and _ChildBase) precede the classes with fields with defaults (_ParentDefaultsBase and _ChildDefaultsBase).
The result is Parent and Child classes with a sane field older, while Child is still a subclass of Parent:
>>> from inspect import signature
>>> signature(Parent)
<Signature (name: str, age: int, ugly: bool = False) -> None>
>>> signature(Child)
<Signature (name: str, age: int, school: str, ugly: bool = True) -> None>
>>> issubclass(Child, Parent)
True
and so you can create instances of both classes:
>>> jack = Parent('jack snr', 32, ugly=True)
>>> jack_son = Child('jack jnr', 12, school='havard', ugly=True)
>>> jack
Parent(name='jack snr', age=32, ugly=True)
>>> jack_son
Child(name='jack jnr', age=12, school='havard', ugly=True)
Another option is to only use fields with defaults; you can still make in an error to not supply a school value, by raising one in __post_init__:
_no_default = object()
#dataclass
class Child(Parent):
school: str = _no_default
ugly: bool = True
def __post_init__(self):
if self.school is _no_default:
raise TypeError("__init__ missing 1 required argument: 'school'")
but this does alter the field order; school ends up after ugly:
<Signature (name: str, age: int, ugly: bool = True, school: str = <object object at 0x1101d1210>) -> None>
and a type hint checker will complain about _no_default not being a string.
You can also use the attrs project, which was the project that inspired dataclasses. It uses a different inheritance merging strategy; it pulls overridden fields in a subclass to the end of the fields list, so ['name', 'age', 'ugly'] in the Parent class becomes ['name', 'age', 'school', 'ugly'] in the Child class; by overriding the field with a default, attrs allows the override without needing to do a MRO dance.
attrs supports defining fields without type hints, but lets stick to the supported type hinting mode by setting auto_attribs=True:
import attr
#attr.s(auto_attribs=True)
class Parent:
name: str
age: int
ugly: bool = False
def print_name(self):
print(self.name)
def print_age(self):
print(self.age)
def print_id(self):
print(f"The Name is {self.name} and {self.name} is {self.age} year old")
#attr.s(auto_attribs=True)
class Child(Parent):
school: str
ugly: bool = True
You can use attributes with defaults in parent classes if you exclude them from the init function. If you need the possibility to override the default at init, extend the code with the answer of Praveen Kulkarni.
from dataclasses import dataclass, field
#dataclass
class Parent:
name: str
age: int
ugly: bool = field(default=False, init=False)
#dataclass
class Child(Parent):
school: str
jack = Parent('jack snr', 32)
jack_son = Child('jack jnr', 12, school = 'havard')
jack_son.ugly = True
Or even
#dataclass
class Child(Parent):
school: str
ugly = True
# This does not work
# ugly: bool = True
jack_son = Child('jack jnr', 12, school = 'havard')
assert jack_son.ugly
Note that with Python 3.10, it is now possible to do it natively with dataclasses.
Dataclasses 3.10 added the kw_only attribute (similar to attrs).
It allows you to specify which fields are keyword_only, thus will be set at the end of the init, not causing an inheritance problem.
Taking directly from Eric Smith's blog post on the subject:
There are two reasons people [were asking for] this feature:
When a dataclass has many fields, specifying them by position can become unreadable. It also requires that for backward compatibility, all new fields are added to the end of the dataclass. This isn't always desirable.
When a dataclass inherits from another dataclass, and the base class has fields with default values, then all of the fields in the derived class must also have defaults.
What follows is the simplest way to do it with this new argument, but there are multiple ways you can use it to use inheritance with default values in the parent class:
from dataclasses import dataclass
#dataclass(kw_only=True)
class Parent:
name: str
age: int
ugly: bool = False
#dataclass(kw_only=True)
class Child(Parent):
school: str
ch = Child(name="Kevin", age=17, school="42")
print(ch.ugly)
Take a look at the blogpost linked above for a more thorough explanation of kw_only.
Cheers !
PS: As it is fairly new, note that your IDE might still raise a possible error, but it works at runtime
The approach below deals with this problem while using pure python dataclasses and without much boilerplate code.
The ugly: dataclasses.InitVar[bool] serves as a pseudo-field just to help us do initialization and will be lost once the instance is created. While _ugly: bool = field(init=False) is an instance member which will not be initialized by __init__ method but can be alternatively initialized using __post_init__ method (you can find more here.).
from dataclasses import dataclass, field, InitVar
#dataclass
class Parent:
name: str
age: int
ugly: InitVar[bool]
_ugly: bool = field(init=False)
def __post_init__(self, ugly: bool):
self._ugly = ugly
def print_name(self):
print(self.name)
def print_age(self):
print(self.age)
def print_id(self):
print(f'The Name is {self.name} and {self.name} is {self.age} year old')
#dataclass
class Child(Parent):
school: str
jack = Parent('jack snr', 32, ugly=True)
jack_son = Child('jack jnr', 12, school='havard', ugly=True)
jack.print_id()
jack_son.print_id()
Note that this makes field ugly mandatory to make it optional you can define a class method on the Parent that includes ugly as an optional parameter:
from dataclasses import dataclass, field, InitVar
#dataclass
class Parent:
name: str
age: int
ugly: InitVar[bool]
_ugly: bool = field(init=False)
def __post_init__(self, ugly: bool):
self._ugly = ugly
#classmethod
def create(cls, ugly=True, **kwargs):
return cls(ugly=ugly, **kwargs)
def print_name(self):
print(self.name)
def print_age(self):
print(self.age)
def print_id(self):
print(f'The Name is {self.name} and {self.name} is {self.age} year old')
#dataclass
class Child(Parent):
school: str
jack = Parent.create(name='jack snr', age=32, ugly=False)
jack_son = Child.create(name='jack jnr', age=12, school='harvard')
jack.print_id()
jack_son.print_id()
Now you can use the create(...) class method as a factory method for creating Parent/Child classes with a default value for ugly. Note you must use named parameters for this approach to work.
You're seeing this error because an argument without a default value is being added after an argument with a default value. The insertion order of inherited fields into the dataclass is the reverse of Method Resolution Order, which means that the Parent fields come first, even if they are over written later by their children.
An example from PEP-557 - Data Classes:
#dataclass
class Base:
x: Any = 15.0
y: int = 0
#dataclass
class C(Base):
z: int = 10
x: int = 15
The final list of fields is, in order,x, y, z. The final type of x is int, as specified in class C.
Unfortunately, I don't think there's any way around this. My understanding is that if the parent class has a default argument, then no child class can have non-default arguments.
based on Martijn Pieters solution I did the following:
1) Create a mixing implementing the post_init
from dataclasses import dataclass
no_default = object()
#dataclass
class NoDefaultAttributesPostInitMixin:
def __post_init__(self):
for key, value in self.__dict__.items():
if value is no_default:
raise TypeError(
f"__init__ missing 1 required argument: '{key}'"
)
2) Then in the classes with the inheritance problem:
from src.utils import no_default, NoDefaultAttributesChild
#dataclass
class MyDataclass(DataclassWithDefaults, NoDefaultAttributesPostInitMixin):
attr1: str = no_default
EDIT:
After a time I also find problems with this solution with mypy, the following code fix the issue.
from dataclasses import dataclass
from typing import TypeVar, Generic, Union
T = TypeVar("T")
class NoDefault(Generic[T]):
...
NoDefaultVar = Union[NoDefault[T], T]
no_default: NoDefault = NoDefault()
#dataclass
class NoDefaultAttributesPostInitMixin:
def __post_init__(self):
for key, value in self.__dict__.items():
if value is NoDefault:
raise TypeError(f"__init__ missing 1 required argument: '{key}'")
#dataclass
class Parent(NoDefaultAttributesPostInitMixin):
a: str = ""
#dataclass
class Child(Foo):
b: NoDefaultVar[str] = no_default
If you are using Python 3.10+, then you can utilize keyword-only arguments for the dataclass as discussed in this answer and in the python docs.
If you're using < Python 3.10, then you can utilize dataclasses.field with a default_factory that throws. Since the attribute will be declared with field(), it gets treated as if it has a default; but if a user attempts to create an instance without providing the value for that field, it will use the factory, which will error.
This technique isn't equivalent to keyword only, because you could still provide all the arguments positionally. However, this does solve the problem, and is simpler than messing around with various dataclass dunder methods.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, TypeVar
T = TypeVar("T")
def required() -> T:
f: T
def factory() -> T:
# mypy treats a Field as a T, even though it has attributes like .name, .default, etc
field_name = f.name # type: ignore[attr-defined]
raise ValueError(f"field '{field_name}' required")
f = field(default_factory=factory)
return f
#dataclass
class Event:
id: str
created_at: datetime
updated_at: Optional[datetime] = None
#dataclass
class NamedEvent(Event):
name: str = required()
event = NamedEvent(name="Some Event", id="ab13c1a", created_at=datetime.now())
print("created event:", event)
event2 = NamedEvent("ab13c1a", datetime.now(), name="Some Other Event")
print("created event:", event2)
event3 = NamedEvent("ab13c1a", datetime.now())
Output:
created event: NamedEvent(id='ab13c1a', created_at=datetime.datetime(2022, 7, 23, 19, 22, 17, 944550), updated_at=None, name='Some Event')
created event: NamedEvent(id='ab13c1a', created_at=datetime.datetime(2022, 7, 23, 19, 22, 17, 944588), updated_at=None, name='Some Other Event')
Traceback (most recent call last):
File ".../gist.py", line 39, in <module>
event3 = NamedEvent("ab13c1a", datetime.now())
File "<string>", line 6, in __init__
File ".../gist.py", line 14, in factory
raise ValueError(f"field '{field_name}' required")
ValueError: field 'name' required
You can also find this code in a GitHub gist.
A possible work-around is to use monkey-patching to append the parent's fields:
import dataclasses as dc
def add_args(parent):
def decorator(orig):
"Append parent's fields AFTER orig's fields"
# Aggregate fields
ff = [(f.name, f.type, f) for f in dc.fields(dc.dataclass(orig))]
ff += [(f.name, f.type, f) for f in dc.fields(dc.dataclass(parent))]
new = dc.make_dataclass(orig.__name__, ff)
new.__doc__ = orig.__doc__
return new
return decorator
class Animal:
age: int = 0
#add_args(Animal)
class Dog:
name: str
noise: str = "Woof!"
#add_args(Animal)
class Bird:
name: str
can_fly: bool = True
Dog("Dusty", 2) # --> Dog(name='Dusty', noise=2, age=0)
b = Bird("Donald", False, 40) # --> Bird(name='Donald', can_fly=False, age=40)
While monkey-patching lacks some features of inheritance, it can still be used to add methods to all the pseudo-child classes. For more fine-grained control, set the default values using dc.field(compare=False, repr=True, ...). It's also possible to prepend the non-default fields instead, by checking whether f.default is dc.MISSING, though this is arguably too dirty; a sketch of that variant follows below.
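Sketched below is that "prepend the non-default fields" variant. The decorator name and the example classes are made up for illustration; the only change from add_args above is the partitioning on dc.MISSING:
import dataclasses as dc

def add_args_sorted(parent):
    """Like add_args, but moves all no-default fields ahead of defaulted ones."""
    def decorator(orig):
        combined = [
            (f.name, f.type, f)
            for f in (*dc.fields(dc.dataclass(orig)), *dc.fields(dc.dataclass(parent)))
        ]

        def has_default(f):
            return f.default is not dc.MISSING or f.default_factory is not dc.MISSING

        # Required fields first, defaulted fields after; order is preserved within each group.
        ordered = (
            [t for t in combined if not has_default(t[2])]
            + [t for t in combined if has_default(t[2])]
        )
        new = dc.make_dataclass(orig.__name__, ordered)
        new.__doc__ = orig.__doc__
        return new
    return decorator

class Animal:
    age: int = 0

@add_args_sorted(Animal)
class Cat:
    name: str
    sound: str = "Meow!"

print(Cat("Whiskers", age=3))  # Cat(name='Whiskers', sound='Meow!', age=3)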
You can use a modified version of dataclasses, which will generate a keyword-only __init__ method:
import dataclasses
def _init_fn(fields, frozen, has_post_init, self_name):
# fields contains both real fields and InitVar pseudo-fields.
globals = {'MISSING': dataclasses.MISSING,
'_HAS_DEFAULT_FACTORY': dataclasses._HAS_DEFAULT_FACTORY}
body_lines = []
for f in fields:
line = dataclasses._field_init(f, frozen, globals, self_name)
# line is None means that this field doesn't require
# initialization (it's a pseudo-field). Just skip it.
if line:
body_lines.append(line)
# Does this class have a post-init function?
if has_post_init:
params_str = ','.join(f.name for f in fields
if f._field_type is dataclasses._FIELD_INITVAR)
body_lines.append(f'{self_name}.{dataclasses._POST_INIT_NAME}({params_str})')
# If no body lines, use 'pass'.
if not body_lines:
body_lines = ['pass']
locals = {f'_type_{f.name}': f.type for f in fields}
return dataclasses._create_fn('__init__',
[self_name, '*'] + [dataclasses._init_param(f) for f in fields if f.init],
body_lines,
locals=locals,
globals=globals,
return_type=None)
def add_init(cls, frozen):
fields = getattr(cls, dataclasses._FIELDS)
# Does this class have a post-init function?
has_post_init = hasattr(cls, dataclasses._POST_INIT_NAME)
# Include InitVars and regular fields (so, not ClassVars).
flds = [f for f in fields.values()
if f._field_type in (dataclasses._FIELD, dataclasses._FIELD_INITVAR)]
dataclasses._set_new_attribute(cls, '__init__',
_init_fn(flds,
frozen,
has_post_init,
# The name to use for the "self"
# param in __init__. Use "self"
# if possible.
'__dataclass_self__' if 'self' in fields
else 'self',
))
return cls
# a dataclass with a constructor that only takes keyword arguments
def dataclass_keyword_only(_cls=None, *, repr=True, eq=True, order=False,
unsafe_hash=False, frozen=False):
def wrap(cls):
cls = dataclasses.dataclass(
cls, init=False, repr=repr, eq=eq, order=order, unsafe_hash=unsafe_hash, frozen=frozen)
return add_init(cls, frozen)
# See if we're being called as #dataclass or #dataclass().
if _cls is None:
# We're called with parens.
return wrap
# We're called as #dataclass without parens.
return wrap(_cls)
(also posted as a gist, tested with Python 3.6 backport)
This requires defining the child class as
#dataclass_keyword_only
class Child(Parent):
school: str
ugly: bool = True
And it would generate __init__(self, *, name: str, age: int, ugly: bool = True, school: str), which is valid Python. The only caveat is that objects can no longer be initialized with positional arguments; otherwise it's a completely regular dataclass with no ugly hacks.
A quick and dirty solution:
from typing import Optional
#dataclass
class Child(Parent):
school: Optional[str] = None
ugly: bool = True
def __post_init__(self):
assert self.school is not None
Then go back and refactor once (hopefully) the language is extended.
I came back to this question after discovering that dataclasses may be getting a decorator parameter that allows fields to be reordered. This is certainly a promising development, though progress on this feature seems to have stalled somewhat.
Right now, you can get this behaviour, plus some other niceties, by using dataclassy, my reimplementation of dataclasses that overcomes frustrations like this. Using from dataclassy in place of from dataclasses in the original example means it runs without errors.
Using inspect to print the signature of Child makes what is going on clear; the result is (name: str, age: int, school: str, ugly: bool = True). Fields are always reordered so that fields with default values come after fields without them in the parameters to the initializer. Both lists (fields without defaults, and those with them) are still ordered in definition order.
Coming face to face with this issue was one of the factors that prompted me to write a replacement for dataclasses. The workarounds detailed here, while helpful, require code to be contorted to such an extent that they completely negate the readability advantage dataclasses' naive approach (whereby field ordering is trivially predictable) offers.
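For completeness, a minimal sketch of what that looks like, assuming the third-party dataclassy package is installed (pip install dataclassy); the resulting signature is the one quoted above:
# Third-party package: pip install dataclassy
from dataclassy import dataclass

@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = False

@dataclass
class Child(Parent):
    school: str
    ugly: bool = True

# dataclassy reorders defaulted fields to the end:
# Child(name: str, age: int, school: str, ugly: bool = True)
jack_son = Child('jack jnr', 12, 'havard')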
When you use Python inheritance to create dataclasses, you cannot guarantee that all fields with default values will appear after all fields without default values.
An easy solution is to avoid using multiple inheritance to construct a "merged" dataclass. Instead, we can build a merged dataclass just by filtering and sorting on the fields of your parent dataclasses.
Try out this merge_dataclasses() function:
import dataclasses
import functools
from typing import Iterable, Type
def merge_dataclasses(
cls_name: str,
*,
merge_from: Iterable[Type],
**kwargs,
):
"""
Construct a dataclass by merging the fields
from an arbitrary number of dataclasses.
Args:
cls_name: The name of the constructed dataclass.
merge_from: An iterable of dataclasses
whose fields should be merged.
**kwargs: Keyword arguments are passed to
:py:func:`dataclasses.make_dataclass`.
Returns:
Returns a new dataclass
"""
# Merge the fields from the dataclasses,
# with field names from later dataclasses overwriting
# any conflicting predecessor field names.
each_base_fields = [d.__dataclass_fields__ for d in merge_from]
merged_fields = functools.reduce(
lambda x, y: {**x, **y}, each_base_fields
)
# We have to reorder all of the fields from all of the dataclasses
# so that *all* of the fields without defaults appear
# in the merged dataclass *before* all of the fields with defaults.
fields_without_defaults = [
(f.name, f.type, f)
for f in merged_fields.values()
if isinstance(f.default, dataclasses._MISSING_TYPE)
]
fields_with_defaults = [
(f.name, f.type, f)
for f in merged_fields.values()
if not isinstance(f.default, dataclasses._MISSING_TYPE)
]
fields = [*fields_without_defaults, *fields_with_defaults]
return dataclasses.make_dataclass(
cls_name=cls_name,
fields=fields,
**kwargs,
)
And then you can merge dataclasses as follows. Note that when we merge A and B, the defaulted fields b and d are moved to the end of the merged dataclass.
#dataclasses.dataclass
class A:
a: int
b: int = 0
#dataclasses.dataclass
class B:
c: int
d: int = 0
C = merge_dataclasses(
"C",
merge_from=[A, B],
)
# Fields without defaults (a and c) come first, so c must be supplied:
print(C(a=1, c=2, d=1).__dict__)
# {'a': 1, 'c': 2, 'b': 0, 'd': 1}
Of course, the pitfall of this solution is that C doesn't actually inherit from A and B, which means that you cannot use isinstance() or other type assertions to verify C's parentage.
Complementing Martijn Pieters' attrs-based solution: it is possible to set up the inheritance without having to replicate the default attributes, by using kw_only:
import attr
#attr.s(auto_attribs=True)
class Parent:
name: str
age: int
ugly: bool = attr.ib(default=False, kw_only=True)
#attr.s(auto_attribs=True)
class Child(Parent):
school: str
ugly: bool = True
More about the kw_only parameter can be found in the attrs documentation.
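A short usage sketch for the classes above (keyword arguments keep the calls unambiguous regardless of field order):
dad = Parent(name='jack snr', age=32)                   # ugly is keyword-only and defaults to False
son = Child(name='jack jnr', age=12, school='havard')   # Child overrides the default, so son.ugly is True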
How about defining the ugly field like this, instead of the default way?
ugly: bool = field(metadata=dict(required=False, missing=False))
Note that plain dataclasses ignore field metadata, so this only has an effect if a third-party library that reads the metadata (for example, a marshmallow-based schema generator) is in play.
An experimental but interesting solution is to use a metaclass. The approach below enables the use of Python dataclasses with simple inheritance without applying the dataclass decorator at all. Moreover, it makes it possible to inherit the fields of the parent base classes without complaints about the order of positional (non-default) arguments.
from collections import OrderedDict
import typing as ty
import dataclasses
from itertools import takewhile
class DataClassTerm:
def __new__(cls, *args, **kwargs):
return super().__new__(cls)
class DataClassMeta(type):
def __new__(cls, clsname, bases, clsdict):
fields = {}
# Get list of base classes including the class to be produced(initialized without its original base classes as those have already become dataclasses)
bases_and_self = [dataclasses.dataclass(super().__new__(cls, clsname, (DataClassTerm,), clsdict))] + list(bases)
# Whatever is a subclass of DataClassTerm will become a DataClassTerm.
# Following block will iterate and create individual dataclasses and collect their fields
for base in bases_and_self[::-1]: # Ensure that last fields in last base is prioritized
if issubclass(base, DataClassTerm):
to_dc_bases = list(takewhile(lambda c: c is not DataClassTerm, base.__mro__))
for dc_base in to_dc_bases[::-1]: # Ensure that last fields in last base in MRO is prioritized(same as in dataclasses)
if dataclasses.is_dataclass(dc_base):
valid_dc = dc_base
else:
valid_dc = dataclasses.dataclass(dc_base)
for field in dataclasses.fields(valid_dc):
fields[field.name] = (field.name, field.type, field)
# Following block will reorder the fields so that fields without default values are first in order
reordered_fields = OrderedDict()
for n, t, f in fields.values():
if f.default is dataclasses.MISSING and f.default_factory is dataclasses.MISSING:
reordered_fields[n] = (n, t, f)
for n, t, f in fields.values():
if n not in reordered_fields.keys():
reordered_fields[n] = (n, t, f)
# Create a new dataclass using `dataclasses.make_dataclass`, which ultimately calls type.__new__, which is the same as super().__new__ in our case
fields = list(reordered_fields.values())
full_dc = dataclasses.make_dataclass(cls_name=clsname, fields=fields, init=True, bases=(DataClassTerm,))
# Discard the created dataclass class and create new one using super but preserve the dataclass specific namespace.
return super().__new__(cls, clsname, bases, {**full_dc.__dict__,**clsdict})
class DataClassCustom(DataClassTerm, metaclass=DataClassMeta):
def __new__(cls, *args, **kwargs):
if len(args)>0:
raise RuntimeError("Do not use positional arguments for initialization.")
return super().__new__(cls, *args, **kwargs)
Now let's create a sample dataclass with a parent dataclass and a sample mixin class:
class DataClassCustomA(DataClassCustom):
field_A_1: int = dataclasses.field()
field_A_2: ty.AnyStr = dataclasses.field(default=None)
class SomeOtherClass:
def methodA(self):
print('print from SomeOtherClass().methodA')
class DataClassCustomB(DataClassCustomA,SomeOtherClass):
field_B_1: int = dataclasses.field()
field_B_2: ty.Dict = dataclasses.field(default_factory=dict)
The result is
result_b = DataClassCustomB(field_A_1=1, field_B_1=2)
result_b
# DataClassCustomB(field_A_1=1, field_B_1=2, field_A_2=None, field_B_2={})
result_b.methodA()
# print from SomeOtherClass().methodA
An attempt to do the same with the #dataclass decorator on each parent class would have raised an exception in the child class, like TypeError: non-default argument '<field-name>' follows default argument. The above solution prevents this from happening because the fields are reordered first. However, since the field order is modified, blocking *args usage in DataClassCustom.__new__ is mandatory, as the original order is no longer valid.
Although Python >= 3.10 introduced the kw_only feature, which makes inheritance in dataclasses much more reliable, the above example can still be used as a way to make dataclasses inheritable without requiring the #dataclass decorator at all.
I'm currently trying my hand at the new dataclass constructs introduced in Python 3.7. I am currently stuck on trying to do some inheritance from a parent class. It looks like the order of the arguments is botched by my current approach, such that the bool parameter in the child class is passed before the other parameters. This is causing a TypeError.
from dataclasses import dataclass
#dataclass
class Parent:
name: str
age: int
ugly: bool = False
def print_name(self):
print(self.name)
def print_age(self):
print(self.age)
def print_id(self):
print(f'The Name is {self.name} and {self.name} is {self.age} year old')
#dataclass
class Child(Parent):
school: str
ugly: bool = True
jack = Parent('jack snr', 32, ugly=True)
jack_son = Child('jack jnr', 12, school = 'havard', ugly=True)
jack.print_id()
jack_son.print_id()
When I run this code I get this TypeError:
TypeError: non-default argument 'school' follows default argument
How do I fix this?
The way dataclasses combines attributes prevents you from being able to use attributes with defaults in a base class and then use attributes without a default (positional attributes) in a subclass.
That's because the attributes are combined by starting from the bottom of the MRO, and building up an ordered list of the attributes in first-seen order; overrides are kept in their original location. So Parent starts out with ['name', 'age', 'ugly'], where ugly has a default, and then Child adds ['school'] to the end of that list (with ugly already in the list). This means you end up with ['name', 'age', 'ugly', 'school'] and because school doesn't have a default, this results in an invalid argument listing for __init__.
This is documented in PEP-557 Dataclasses, under inheritance:
When the Data Class is being created by the #dataclass decorator, it looks through all of the class's base classes in reverse MRO (that is, starting at object) and, for each Data Class that it finds, adds the fields from that base class to an ordered mapping of fields. After all of the base class fields are added, it adds its own fields to the ordered mapping. All of the generated methods will use this combined, calculated ordered mapping of fields. Because the fields are in insertion order, derived classes override base classes.
and under Specification:
TypeError will be raised if a field without a default value follows a field with a default value. This is true either when this occurs in a single class, or as a result of class inheritance.
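Note that the TypeError is raised at class-definition time (when the dataclass decorator runs), not when the class is instantiated; a quick check:
from dataclasses import dataclass

@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = False

try:
    @dataclass
    class Child(Parent):
        school: str
except TypeError as exc:
    print(exc)  # non-default argument 'school' follows default argument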
You do have a few options here to avoid this issue.
The first option is to use separate base classes to force fields with defaults into a later position in the MRO. At all costs, avoid setting fields directly on classes that are meant to be used as base classes, such as Parent.
The following class hierarchy works:
# base classes with fields; fields without defaults separate from fields with.
#dataclass
class _ParentBase:
name: str
age: int
#dataclass
class _ParentDefaultsBase:
ugly: bool = False
#dataclass
class _ChildBase(_ParentBase):
school: str
#dataclass
class _ChildDefaultsBase(_ParentDefaultsBase):
ugly: bool = True
# public classes, deriving from base-with, base-without field classes
# subclasses of public classes should put the public base class up front.
#dataclass
class Parent(_ParentDefaultsBase, _ParentBase):
def print_name(self):
print(self.name)
def print_age(self):
print(self.age)
def print_id(self):
print(f"The Name is {self.name} and {self.name} is {self.age} year old")
#dataclass
class Child(_ChildDefaultsBase, Parent, _ChildBase):
pass
By pulling the fields out into separate base classes, one set for fields without defaults and one for fields with defaults, and by choosing the inheritance order carefully, you can produce an MRO that puts all fields without defaults before those with defaults. The reversed MRO (ignoring object) for Child is:
_ParentBase
_ChildBase
_ParentDefaultsBase
Parent
_ChildDefaultsBase
Note that while Parent doesn't set any new fields, it does inherit the fields from _ParentDefaultsBase and should not end up 'last' in the field listing order; the above order puts _ChildDefaultsBase last so its fields 'win'. The dataclass rules are also satisfied; the classes with fields without defaults (_ParentBase and _ChildBase) precede the classes with fields with defaults (_ParentDefaultsBase and _ChildDefaultsBase).
The result is Parent and Child classes with a sane field order, while Child is still a subclass of Parent:
>>> from inspect import signature
>>> signature(Parent)
<Signature (name: str, age: int, ugly: bool = False) -> None>
>>> signature(Child)
<Signature (name: str, age: int, school: str, ugly: bool = True) -> None>
>>> issubclass(Child, Parent)
True
and so you can create instances of both classes:
>>> jack = Parent('jack snr', 32, ugly=True)
>>> jack_son = Child('jack jnr', 12, school='havard', ugly=True)
>>> jack
Parent(name='jack snr', age=32, ugly=True)
>>> jack_son
Child(name='jack jnr', age=12, school='havard', ugly=True)
Another option is to only use fields with defaults; you can still make it an error not to supply a school value, by raising one in __post_init__:
_no_default = object()
#dataclass
class Child(Parent):
school: str = _no_default
ugly: bool = True
def __post_init__(self):
if self.school is _no_default:
raise TypeError("__init__ missing 1 required argument: 'school'")
but this does alter the field order; school ends up after ugly:
<Signature (name: str, age: int, ugly: bool = True, school: str = <object object at 0x1101d1210>) -> None>
and a type hint checker will complain about _no_default not being a string.
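If you want to keep this sentinel approach but quiet the type checker, one possibility (my suggestion, not part of the original answer) is to cast the sentinel to the annotated type; the identity check in __post_init__ still works because cast() returns the object unchanged:
from dataclasses import dataclass
from typing import cast

_no_default = object()

@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = False

@dataclass
class Child(Parent):
    school: str = cast(str, _no_default)  # typed as str, but still the sentinel object at runtime
    ugly: bool = True

    def __post_init__(self):
        if self.school is _no_default:
            raise TypeError("__init__ missing 1 required argument: 'school'")

Child('jack jnr', 12, school='havard')  # ok
# Child('jack jnr', 12)                 # raises TypeError at runtime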
You can also use the attrs project, which was the project that inspired dataclasses. It uses a different inheritance merging strategy; it pulls overridden fields in a subclass to the end of the fields list, so ['name', 'age', 'ugly'] in the Parent class becomes ['name', 'age', 'school', 'ugly'] in the Child class; by overriding the field with a default, attrs allows the override without needing to do a MRO dance.
attrs supports defining fields without type hints, but let's stick to the supported type-hinting mode by setting auto_attribs=True:
import attr
#attr.s(auto_attribs=True)
class Parent:
name: str
age: int
ugly: bool = False
def print_name(self):
print(self.name)
def print_age(self):
print(self.age)
def print_id(self):
print(f"The Name is {self.name} and {self.name} is {self.age} year old")
#attr.s(auto_attribs=True)
class Child(Parent):
school: str
ugly: bool = True
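To see the attrs field order described above for yourself, a small sketch that assumes attrs is installed and continues from the Child class just defined:
import attr

print([a.name for a in attr.fields(Child)])
# ['name', 'age', 'school', 'ugly']

jack_son = Child('jack jnr', 12, 'havard', ugly=True)
print(jack_son)  # Child(name='jack jnr', age=12, school='havard', ugly=True)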
You can use attributes with defaults in parent classes if you exclude them from the init function. If you need to be able to override the default at init time, combine this with Praveen Kulkarni's answer.
from dataclasses import dataclass, field
#dataclass
class Parent:
name: str
age: int
ugly: bool = field(default=False, init=False)
#dataclass
class Child(Parent):
school: str
jack = Parent('jack snr', 32)
jack_son = Child('jack jnr', 12, school = 'havard')
jack_son.ugly = True
Or even
#dataclass
class Child(Parent):
school: str
ugly = True
# This does not work
# ugly: bool = True
jack_son = Child('jack jnr', 12, school = 'havard')
assert jack_son.ugly
Note that with Python 3.10, it is now possible to do it natively with dataclasses.
Python 3.10 added the kw_only parameter to dataclasses (similar to attrs).
It allows you to specify which fields are keyword-only; those are placed at the end of the __init__ signature and therefore no longer cause the inheritance problem.
Taking directly from Eric Smith's blog post on the subject:
There are two reasons people [were asking for] this feature:
When a dataclass has many fields, specifying them by position can become unreadable. It also requires that for backward compatibility, all new fields are added to the end of the dataclass. This isn't always desirable.
When a dataclass inherits from another dataclass, and the base class has fields with default values, then all of the fields in the derived class must also have defaults.
What follows is the simplest way to do it with this new argument, but there are multiple ways to use it to combine inheritance with default values in the parent class:
from dataclasses import dataclass
#dataclass(kw_only=True)
class Parent:
name: str
age: int
ugly: bool = False
#dataclass(kw_only=True)
class Child(Parent):
school: str
ch = Child(name="Kevin", age=17, school="42")
print(ch.ugly)
Take a look at the blogpost linked above for a more thorough explanation of kw_only.
Cheers!
PS: As it is fairly new, note that your IDE might still flag a possible error, but it works at runtime.
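As a variant on the above, Python 3.10 also lets you mark just the defaulted field as keyword-only with dataclasses.field(kw_only=True), so the other fields stay positional (a sketch, not from the answer above):
from dataclasses import dataclass, field

@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = field(default=False, kw_only=True)

@dataclass
class Child(Parent):
    school: str  # fine: keyword-only fields are moved after the positional ones

ch = Child('jack jnr', 12, 'havard')
print(ch)  # Child(name='jack jnr', age=12, ugly=False, school='havard')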
The approach below deals with this problem using pure Python dataclasses and without much boilerplate code.
The ugly: dataclasses.InitVar[bool] serves as a pseudo-field that is only used during initialization and is discarded once the instance is created, while _ugly: bool = field(init=False) is an instance member that is not set by the generated __init__ method but can instead be initialized in __post_init__ (see the dataclasses documentation for more on both).
from dataclasses import dataclass, field, InitVar
#dataclass
class Parent:
name: str
age: int
ugly: InitVar[bool]
_ugly: bool = field(init=False)
def __post_init__(self, ugly: bool):
self._ugly = ugly
def print_name(self):
print(self.name)
def print_age(self):
print(self.age)
def print_id(self):
print(f'The Name is {self.name} and {self.name} is {self.age} year old')
#dataclass
class Child(Parent):
school: str
jack = Parent('jack snr', 32, ugly=True)
jack_son = Child('jack jnr', 12, school='havard', ugly=True)
jack.print_id()
jack_son.print_id()
Note that this makes the ugly field mandatory. To make it optional, you can define a class method on Parent that includes ugly as an optional parameter:
from dataclasses import dataclass, field, InitVar
#dataclass
class Parent:
name: str
age: int
ugly: InitVar[bool]
_ugly: bool = field(init=False)
def __post_init__(self, ugly: bool):
self._ugly = ugly
#classmethod
def create(cls, ugly=True, **kwargs):
return cls(ugly=ugly, **kwargs)
def print_name(self):
print(self.name)
def print_age(self):
print(self.age)
def print_id(self):
print(f'The Name is {self.name} and {self.name} is {self.age} year old')
#dataclass
class Child(Parent):
school: str
jack = Parent.create(name='jack snr', age=32, ugly=False)
jack_son = Child.create(name='jack jnr', age=12, school='harvard')
jack.print_id()
jack_son.print_id()
Now you can use the create(...) class method as a factory for creating Parent/Child instances with a default value for ugly. Note that you must use named parameters for this approach to work.