I'm currently writing some code for an option pricer and at the same time I've been experimenting with Python dataclasses. Here I have two classes, Option() and Option2(), with the former written in dataclass syntax and the latter in conventional class syntax.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Option:
    is_american: Optional[bool] = field(default=False)
    is_european: Optional[bool] = not is_american

class Option2:
    def __init__(self, is_american=False):
        self.is_european = not is_american

if __name__ == "__main__":
    eu_option1 = Option()
    print(f"{eu_option1.is_european = }")
    eu_option2 = Option2()
    print(f"{eu_option2.is_european = }")
The output gives
eu_option1.is_european = False
eu_option2.is_european = True
However, something very strange happened. Notice how in the Option2() case, is_american is set to False by default, and hence is_european must be True and it indeed is, so this is expected behaviour.
But in the dataclass Option() case, is_american is also set to False by default. However, for whatever reason, the dataclass did not trigger the is_european: Optional[bool] = not is_american and hence is_european is still False when it is supposed to be True.
What is going on here? Did I use my dataclass incorrectly?
It is likely that the dataclass machinery is struggling with the order in which the fields are evaluated: the default for is_european is computed before any instance exists, so it cannot see the runtime value of is_american.
There is a built-in mechanism to make sure that fields which depend on other fields are processed in the correct order. What you need to do is flag the dependent field with init=False and move its computation into a __post_init__() method.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Option:
    is_american: Optional[bool] = field(default=False)
    is_european: Optional[bool] = field(init=False)

    def __post_init__(self):
        self.is_european = not self.is_american
Personally I'd get rid of is_european altogether and use a get() to fetch the value if it's called. There's no need to hold the extra value if it's always going to be directly related to another value. Just calculate it on the fly when it's called.
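As a minimal sketch of that approach, using a trimmed-down version of the Option class from the question:

```python
from dataclasses import dataclass

@dataclass
class Option:
    is_american: bool = False

    @property
    def is_european(self) -> bool:
        # Computed on the fly from is_american; no second stored value to keep in sync
        return not self.is_american
```

Since is_european has no annotated default of its own, @dataclass leaves the property alone, and the derived value can never drift out of sync with is_american.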
With many languages, you wouldn't access attributes directly; you'd access them through control functions (get, set, etc.) like get_is_american() or get_country(). Python has an excellent way of handling this through decorators. This allows the use of direct access when first setting up a class, then moving to managed access without having to change the code calling the attribute, by using the @property decorator. Examples:
# Rename is_american to _is_american to stop direct access

# Get is the default action, therefore it does not need to be specified
@property
def is_american(self):
    return self._is_american

@property
def is_european(self):
    return not self._is_american

# Allow the values to be set
@is_american.setter
def is_american(self, america_based: bool):
    self._is_american = america_based

@is_european.setter
def is_european(self, europe_based: bool):
    self._is_american = not europe_based
This could then be called as follows:
print(my_object.is_american)
my_object.is_american = False
print(my_object.is_european)
Did you see how flexible that approach is? If you have more countries than just US or European ones, or if you think the process might expand, you can change the storage to a string or an enum and define the return values using the accessor. Example:
# Imagine country is now stored as a string

@property
def is_american(self):
    return self.country == 'US'

@property
def is_european(self):
    return self.country == 'EU'

@property
def country(self):
    return self._country

@country.setter
def country(self, new_country: str):
    self._country = new_country

@is_american.setter
def is_american(self, america_check: bool):
    if america_check:
        self._country = "US"
    else:
        self._country = "EU"

@is_european.setter
def is_european(self, europe_check: bool):
    if europe_check:
        self._country = "EU"
    else:
        self._country = "US"
Notice how, if you already have existing code that calls is_american, none of the calling code has to change even though country is now stored - and available - as a string.
Your problem is:
is_european: Optional[bool] = not is_american
not is_american is evaluated at class-definition time. At that point, is_american is a Field object, and all Fields are truthy, so not is_american is simply False. If you want one field defined in terms of another, you'll want to use post-initialization processing to set the value of is_european after is_american is initialized, or make is_european a @property that computes its value live from the value of is_american (assuming it's impossible to be both at once).
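Both the failure mode and the __post_init__ fix can be seen in a few lines (a sketch based on the question's class):

```python
from dataclasses import dataclass, field

@dataclass
class Broken:
    is_american: bool = field(default=False)
    # `is_american` here is a Field object, which is truthy, so
    # `not is_american` is just False, baked in at class-definition time
    is_european: bool = not is_american

@dataclass
class Fixed:
    is_american: bool = False
    is_european: bool = field(init=False)

    def __post_init__(self):
        # Runs after __init__, when self.is_american holds its real value
        self.is_european = not self.is_american
```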
Related
Consider the following class:
from dataclasses import dataclass

@dataclass
class Test:
    result: int = None

@dataclass
class Test2:
    nested = Test()
    result: int = None

class Mainthing:
    def __init__(self, value):
        self.value = value
        self.results = Test2()

    def Main(self):
        value = self.value
        self.results.result = value
        self.results.nested.result = value
If I make an instance of the class:
x = Mainthing(1)
And call the Main() function:
x.Main()
The results are as they should be,
x.results.result
Out[0]: 1
x.results.nested.result
Out[1]: 1
If I then delete the instance
del x
and make it again
x = Mainthing(1)
The x.results.result is now None as I would expect, but the nested one is not:
x.results.nested.result
Out[]: 1
Why is that?
Without a type annotation, nested is a class attribute, unused by @dataclass, which means it's shared between all instances of Test2 (dataclasses.fields(Test2) won't report it, because it's not an instance attribute).
Making it an instance attribute alone, with:
nested: Test = Test()
might seem like it works (it does make it a field), but it's got the same problem using mutable default arguments has everywhere; it ends up being a shared default, and mutations applied through one instance change the value shared with everyone else.
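The class-attribute sharing can be verified directly (a sketch using the classes from the question):

```python
from dataclasses import dataclass, fields

@dataclass
class Test:
    result: int = None

@dataclass
class Test2:
    nested = Test()          # no annotation: a plain class attribute, not a field
    result: int = None

# @dataclass only sees `result`; `nested` is not a field
assert [f.name for f in fields(Test2)] == ['result']

a, b = Test2(), Test2()
assert a.nested is b.nested  # one Test instance, shared by every Test2
a.nested.result = 1
assert b.nested.result == 1  # the mutation shows up everywhere
```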
If you want to use a new Test() for each instance, make it a field with a default_factory, adding field to the imports from dataclasses, and changing the definition to, e.g.:
nested: Test = field(default_factory=Test)
If you always want to create your own instance, never accept one from the user, make it:
nested: Test = field(default_factory=Test, init=False)
so the generated __init__ does not accept it (accepting only result).
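Putting it together, a sketch of the fixed Test2:

```python
from dataclasses import dataclass, field

@dataclass
class Test:
    result: int = None

@dataclass
class Test2:
    # default_factory runs once per instance; init=False keeps it out of __init__
    nested: Test = field(default_factory=Test, init=False)
    result: int = None

a, b = Test2(), Test2()
assert a.nested is not b.nested  # each Test2 gets its own Test
a.nested.result = 1
assert b.nested.result is None   # no cross-instance leakage
```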
Scroll all the way down for a tl;dr; I provide context which I think is important but is not directly relevant to the question asked.
A bit of context
I'm in the making of an API for a webapp and some values are computed based on the values of others in a pydantic BaseModel. These are used for user validation, data serialization and definition of database (NoSQL) documents.
Specifically, I have nearly all resources inheriting from a OwnedResource class, which defines, amongst irrelevant other properties like creation/last-update dates:
object_key -- The key of the object using a nanoid of length 6 with a custom alphabet
owner_key -- This key references the user that owns that object -- a nanoid of length 10.
_key -- this one is where I'm bumping into some problems, and I'll explain why.
So arangodb -- the database I'm using -- imposes _key as the name of the property by which resources are identified.
Since, in my webapp, all resources are only accessed by the users who created them, they can be identified in URLs with just the object's key (eg. /subject/{object_key}). However, as _key must be unique, I intend to construct the value of this field using f"{owner_key}/{object_key}", to store the objects of every user in the database and potentially allow for cross-user resource sharing in the future.
The goal is to have the shortest per-user unique identifier, since the owner_key part of the full _key used to actually access and act upon the document stored in the database is always the same: the currently-logged-in user's _key.
My attempt
My thought was then to define the _key field as a @property-decorated function in the class. However, Pydantic does not seem to register those as model fields.
Moreover, the attribute must actually be named key and use an alias (with Field(..., alias="_key")), as pydantic treats underscore-prefixed fields as internal and does not expose them.
Here is the definition of OwnedResource:
class OwnedResource(BaseModel):
    """
    Base model for resources owned by users
    """
    object_key: ObjectBareKey = nanoid.generate(ID_CHARSET, OBJECT_KEY_LEN)
    owner_key: UserKey
    updated_at: Optional[datetime] = None
    created_at: datetime = datetime.now()

    @property
    def key(self) -> ObjectKey:
        return objectkey(self.owner_key)

    class Config:
        fields = {"key": "_key"}  # [1]
[1] Since Field(..., alias="...") cannot be used, I use this property of the Config subclass (see pydantic's documentation)
However, this does not work, as shown in the following example:
@router.post("/subjects/")
def create_a_subject(subject: InSubject):
    print(subject.dict(by_alias=True))
with InSubject defining properties proper to Subject, and Subject being an empty class inheriting from both InSubject and OwnedResource:
class InSubject(BaseModel):
    name: str
    color: Color
    weight: Union[PositiveFloat, Literal[0]] = 1.0
    goal: Primantissa  # This is just a float constrained to a [0, 1] range
    room: str

class Subject(InSubject, OwnedResource):
    pass
When I perform a POST /subjects/, the following is printed in the console:
{'name': 'string', 'color': Color('cyan', rgb=(0, 255, 255)), 'weight': 0, 'goal': 0.0, 'room': 'string'}
As you can see, _key or key are nowhere to be seen.
Please ask for details and clarification, I tried to make this as easy to understand as possible, but I'm not sure if this is clear enough.
tl;dr
A more generic example, without the context above:
With the following class:
from pydantic import BaseModel

class SomeClass(BaseModel):
    spam: str

    @property
    def eggs(self) -> str:
        return self.spam + " bacon"

    class Config:
        fields = {"eggs": "_eggs"}
I would like the following to be true:
a = SomeClass(spam="I like")
d = a.dict(by_alias=True)
d.get("_eggs") == "I like bacon"
Pydantic does not support serializing properties, there is an issue on GitHub requesting this feature.
In a comment on that issue, ludwig-weiss suggests subclassing BaseModel and overriding the dict method to include the properties.
class PropertyBaseModel(BaseModel):
    """
    Workaround for serializing properties with pydantic until
    https://github.com/samuelcolvin/pydantic/issues/935
    is solved
    """
    @classmethod
    def get_properties(cls):
        return [
            prop for prop in dir(cls)
            if isinstance(getattr(cls, prop), property)
            and prop not in ("__values__", "fields")
        ]

    def dict(
        self,
        *,
        include: Union['AbstractSetIntStr', 'MappingIntStrAny'] = None,
        exclude: Union['AbstractSetIntStr', 'MappingIntStrAny'] = None,
        by_alias: bool = False,
        skip_defaults: bool = None,
        exclude_unset: bool = False,
        exclude_defaults: bool = False,
        exclude_none: bool = False,
    ) -> 'DictStrAny':
        attribs = super().dict(
            include=include,
            exclude=exclude,
            by_alias=by_alias,
            skip_defaults=skip_defaults,
            exclude_unset=exclude_unset,
            exclude_defaults=exclude_defaults,
            exclude_none=exclude_none
        )
        props = self.get_properties()
        # Include and exclude properties
        if include:
            props = [prop for prop in props if prop in include]
        if exclude:
            props = [prop for prop in props if prop not in exclude]
        # Update the attribute dict with the properties
        if props:
            attribs.update({prop: getattr(self, prop) for prop in props})
        return attribs
You might be able to serialize your _key field using a pydantic validator with the always option set to True.
Using your example:
from typing import Optional
from pydantic import BaseModel, Field, validator

class SomeClass(BaseModel):
    spam: str
    eggs: Optional[str] = Field(alias="_eggs")

    @validator("eggs", always=True)
    def set_eggs(cls, v, values, **kwargs):
        """Set the eggs field based upon a spam value."""
        return v or values.get("spam") + " bacon"

a = SomeClass(spam="I like")
my_dictionary = a.dict(by_alias=True)

print(my_dictionary)
> {'spam': 'I like', '_eggs': 'I like bacon'}

print(my_dictionary.get("_eggs"))
> "I like bacon"
So to serialize your _eggs field, instead of appending a string, you'd insert your serialization function there and return the output of that.
I have a class like the following:
class Invoice:
    def __init__(self, invoice_id):
        self.invoice_id = invoice_id
        self._amount = None

    @property
    def amount(self):
        return self._amount

    @amount.setter
    def amount(self, amount):
        self._amount = amount
In the above example, whenever I try to get the invoice amount without setting its value, I get None, like the following:
invoice = Invoice(invoice_id='asdf234')
invoice.amount
>> None
But in this situation, None is not the correct default value for amount. I should be able to differentiate between amount being None and amount not having been set at all. So the question is the following:
How do we handle cases when a class property doesn't have a right default value? In the above example, if I remove self._amount = None from __init__, I get an AttributeError for self._amount, and self.amount returns a valid value only after I call invoice.amount = 5. Is this the right way to handle it? But this also leads to inconsistency in object state, as the application would be changing instance properties at runtime.
I've kept amount as a property for better understanding and readability of Invoice and its attributes. Should class properties only be used when we're aware of their init / default values?
None is conventionally used for the absence of a value, but sometimes you need it to actually be an allowable value. If you want to be able to distinguish between None and an un-set value, simply define your own singleton for this purpose.
class Undefined:
__str__ = __repr__ = lambda self: "Undefined"
Undefined = Undefined()
class Invoice:
    def __init__(self, invoice_id):
        self.invoice_id = invoice_id
        self._amount = Undefined

    @property
    def amount(self):
        return self._amount

    @amount.setter
    def amount(self, amount):
        if amount is Undefined:
            raise ValueError("amount must be an actual value")
        self._amount = amount
Of course, you may now need to test for Undefined in other methods to make sure they're not being used before the instance is properly initialized. A better approach might be to set the attribute during initialization and require its value to be passed in to __init__(). That way, you avoid having an Invoice in an invalid (incompletely initialized) state. Someone could still set _amount to an invalid value, but they'd simply get the trouble they were asking for. We're all adults here.
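A sketch of that "require it at construction" variant:

```python
class Invoice:
    def __init__(self, invoice_id, amount):
        # amount is mandatory, so an Invoice can never exist
        # in a half-initialized state
        self.invoice_id = invoice_id
        self.amount = amount

inv = Invoice('asdf234', amount=5)
assert inv.amount == 5
```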
Don't use a property in this case. Your getters and setters don't do anything except return and set the value. Just use a normal attribute. If you later want to control access, you can just add that property later without changing the interface of your class!
As for dealing with non-set values, just create your own object to represent a value that has never been set. It can be as simple as:
>>> NOT_SET = object()
>>> class Invoice:
... def __init__(self, invoice_id):
... self.invoice_id = invoice_id
... self.amount = NOT_SET
...
>>> inv = Invoice(42)
>>> if inv.amount is NOT_SET:
... inv.amount = 1
...
You could also use an enum if you want better support for typing.
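A sketch of the enum variant (the _Missing and NOT_SET names here are illustrative, not from the original):

```python
from enum import Enum

class _Missing(Enum):
    NOT_SET = 'NOT_SET'  # a one-member enum works well as a typed sentinel

NOT_SET = _Missing.NOT_SET

class Invoice:
    def __init__(self, invoice_id):
        self.invoice_id = invoice_id
        self.amount = NOT_SET

inv = Invoice(42)
assert inv.amount is NOT_SET  # never set
inv.amount = None
assert inv.amount is None     # explicitly set to None: a distinct state
```

The typing advantage is that checkers can narrow a `Union[int, None, _Missing]` annotation via `Literal[_Missing.NOT_SET]`, which a bare `object()` sentinel does not support.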
You can make a sentinel class and use the class object itself to determine whether a value has been set at all.
class AmountNotSet(object):
    pass

class Invoice(object):
    def __init__(self, invoice_id):
        self.invoice_id = invoice_id
        self._amount = AmountNotSet
    # ...etc...
Then you can check whether the invoice is set or not like so:
invoice1 = Invoice(1)
invoice2 = Invoice(2)
invoice2.amount = None
invoice1.amount is AmountNotSet # => True
invoice2.amount is None # => True
I want to pass a default argument to an instance method using the value of an attribute of the instance:
class C:
    def __init__(self, format):
        self.format = format

    def process(self, formatting=self.format):
        print(formatting)
When trying that, I get the following error message:
NameError: name 'self' is not defined
I want the method to behave like this:
C("abc").process() # prints "abc"
C("abc").process("xyz") # prints "xyz"
What is the problem here, why does this not work? And how could I make this work?
You can't really define this as the default value, since the default value is evaluated when the method is defined which is before any instances exist. The usual pattern is to do something like this instead:
class C:
    def __init__(self, format):
        self.format = format

    def process(self, formatting=None):
        if formatting is None:
            formatting = self.format
        print(formatting)
self.format will only be used if formatting is None.
To demonstrate the point of how default values work, see this example:
def mk_default():
    print("mk_default has been called!")

def myfun(foo=mk_default()):
    print("myfun has been called.")

print("about to test functions")
myfun("testing")
myfun("testing again")
And the output here:
mk_default has been called!
about to test functions
myfun has been called.
myfun has been called.
Notice how mk_default was called only once, and that happened before the function was ever called!
In Python, the name self is not special. It's just a convention for the parameter name, which is why there is a self parameter in __init__. (Actually, __init__ is not very special either, and in particular it does not actually create the object... that's a longer story)
C("abc").process() creates a C instance, looks up the process method in the C class, and calls that method with the C instance as the first parameter. So it will end up in the self parameter if you provided it.
Even if you had that parameter, though, you would not be allowed to write something like def process(self, formatting=self.format), because self is not in scope yet at the point where the default value is set. In Python, the default value for a parameter is calculated when the def statement is executed, and "stuck" to the function. (This is the same reason why, if you use a default like [], that list will remember changes between calls to the function.)
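The []-default behaviour mentioned in passing can be shown in two calls:

```python
def append_to(item, target=[]):  # the default list is built once, when `def` runs
    target.append(item)
    return target

assert append_to(1) == [1]
assert append_to(2) == [1, 2]  # the same list again, not a fresh one
```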
How could I make this work?
The traditional way is to use None as a default, and check for that value and replace it inside the function. You may find it is a little safer to make a special value for the purpose (an object instance is all you need, as long as you hide it so that the calling code does not use the same instance) instead of None. Either way, you should check for this value with is, not ==.
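A sketch of the "special value" variant, which keeps None available as a legitimate argument (the _MISSING name is illustrative):

```python
_MISSING = object()  # module-private sentinel; calling code never passes this

class C:
    def __init__(self, format):
        self.format = format

    def process(self, formatting=_MISSING):
        if formatting is _MISSING:  # compare with `is`, never ==
            formatting = self.format
        return formatting

assert C("abc").process() == "abc"
assert C("abc").process("xyz") == "xyz"
assert C("abc").process(None) is None  # None now passes through untouched
```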
Since you want to use self.format as a default argument this implies that the method needs to be instance specific (i.e. there is no way to define this at class level). Instead you can define the specific method during the class' __init__ for example. This is where you have access to instance specific attributes.
One approach is to use functools.partial in order to obtain an updated (specific) version of the method:
from functools import partial

class C:
    def __init__(self, format):
        self.format = format
        self.process = partial(self.process, formatting=self.format)

    def process(self, formatting):
        print(formatting)

c = C('default')
c.process()
# c.process('custom')  # Doesn't work!
c.process(formatting='custom')
Note that with this approach you can only pass the corresponding argument by keyword, since if you provided it by position, this would create a conflict in partial.
Another approach is to define and set the method in __init__:
from types import MethodType

class C:
    def __init__(self, format):
        self.format = format

        def process(self, formatting=self.format):
            print(formatting)

        self.process = MethodType(process, self)

c = C('test')
c.process()
c.process('custom')
c.process(formatting='custom')
This also allows passing the argument by position, however the method resolution becomes less apparent (which can affect IDE inspection, for example, but I suppose there are IDE-specific workarounds for that).
Another approach would be to create a custom type for this kind of "instance attribute default", together with a special decorator that performs the corresponding getattr argument filling:
import inspect

class Attribute:
    def __init__(self, name):
        self.name = name

def decorator(method):
    signature = inspect.signature(method)

    def wrapper(self, *args, **kwargs):
        bound = signature.bind(*((self,) + args), **kwargs)
        bound.apply_defaults()
        bound.arguments.update({k: getattr(self, v.name)
                                for k, v in bound.arguments.items()
                                if isinstance(v, Attribute)})
        return method(*bound.args, **bound.kwargs)

    return wrapper

class C:
    def __init__(self, format):
        self.format = format

    @decorator
    def process(self, formatting=Attribute('format')):
        print(formatting)

c = C('test')
c.process()
c.process('custom')
c.process(formatting='custom')
You can't access self in the method definition. My workaround is this -
class Test:
    def __init__(self):
        self.default_v = 20

    def test(self, v=None):
        v = v or self.default_v
        print(v)

Test().test()
> 20
Test().test(10)
> 10
"self" need to be pass as the first argument to any class functions if you want them to behave as non-static methods.
it refers to the object itself. You could not pass "self" as default argument as it's position is fix as first argument.
In your case instead of "formatting=self.format" use "formatting=None" and then assign value from code as below:
[EDIT]
class c:
    def __init__(self, cformat):
        self.cformat = cformat

    def process(self, formatting=None):
        print("Formatting---", formatting)
        if formatting is None:
            formatting = self.cformat
        print(formatting)
        return formatting

c("abc").process()  # prints "abc"
c("abc").process("xyz")  # prints "xyz"
Note: do not use "format" as a variable name, because it is a built-in function in Python.
Instead of creating a chain of if-thens spanning your default arguments, one can make use of a 'defaults' dictionary and create new instances of a class by using eval():
class foo():
    def __init__(self, arg):
        self.arg = arg

class bar():
    def __init__(self, *args, **kwargs):
        # default values are given in a dictionary
        defaults = {'foo1': 'foo()', 'foo2': 'foo()'}
        for key in defaults.keys():
            # if the key is passed through kwargs, use that value
            if key in kwargs:
                setattr(self, key, kwargs[key])
            # otherwise create a new instance of the default value
            else:
                setattr(self, key, eval(defaults[key]))
I throw this at the beginning of every class that instantiates another class as a default argument. It avoids Python evaluating the default at definition time... I would love a cleaner, more Pythonic approach, but lo'.
What must I do to use my objects of a custom type as keys in a Python dictionary (where I don't want the "object id" to act as the key), e.g.
class MyThing:
def __init__(self,name,location,length):
self.name = name
self.location = location
self.length = length
I'd want to use MyThing's as keys that are considered the same if name and location are the same.
From C#/Java I'm used to having to override and provide an equals and hashcode method, and promise not to mutate anything the hashcode depends on.
What must I do in Python to accomplish this ? Should I even ?
(In a simple case, like here, perhaps it'd be better to just place a (name,location) tuple as key - but consider I'd want the key to be an object)
You need to add two methods, namely __hash__ and __eq__:
class MyThing:
    def __init__(self, name, location, length):
        self.name = name
        self.location = location
        self.length = length

    def __hash__(self):
        return hash((self.name, self.location))

    def __eq__(self, other):
        return (self.name, self.location) == (other.name, other.location)

    def __ne__(self, other):
        # Not strictly necessary, but to avoid having both x==y and x!=y
        # True at the same time
        return not (self == other)
The Python dict documentation defines these requirements on key objects, i.e. they must be hashable.
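With those two methods in place, distinct instances that agree on name and location act as the same key:

```python
class MyThing:
    def __init__(self, name, location, length):
        self.name = name
        self.location = location
        self.length = length

    def __hash__(self):
        return hash((self.name, self.location))

    def __eq__(self, other):
        return (self.name, self.location) == (other.name, other.location)

d = {MyThing("a", "here", 10): "first"}
# A different instance (even with another length) hits the same entry
assert d[MyThing("a", "here", 99)] == "first"
d[MyThing("a", "here", 0)] = "updated"
assert len(d) == 1 and d[MyThing("a", "here", 10)] == "updated"
```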
An alternative in Python 2.6 or above is to use collections.namedtuple() -- it saves you writing any special methods:
from collections import namedtuple

MyThingBase = namedtuple("MyThingBase", ["name", "location"])

class MyThing(MyThingBase):
    def __new__(cls, name, location, length):
        obj = MyThingBase.__new__(cls, name, location)
        obj.length = length
        return obj

a = MyThing("a", "here", 10)
b = MyThing("a", "here", 20)
c = MyThing("c", "there", 10)

a == b
# True
hash(a) == hash(b)
# True
a == c
# False
You override __hash__ if you want special hash semantics, and __cmp__ or __eq__ in order to make your class usable as a key. Objects that compare equal need to have the same hash value.
Python expects __hash__ to return an integer, returning Banana() is not recommended :)
User defined classes have __hash__ by default that calls id(self), as you noted.
There are some extra tips in the documentation:
Classes which inherit a __hash__() method from a parent class but change the meaning of __cmp__() or __eq__() such that the hash value returned is no longer appropriate (e.g. by switching to a value-based concept of equality instead of the default identity-based equality) can explicitly flag themselves as being unhashable by setting __hash__ = None in the class definition. Doing so means that not only will instances of the class raise an appropriate TypeError when a program attempts to retrieve their hash value, but they will also be correctly identified as unhashable when checking isinstance(obj, collections.Hashable) (unlike classes which define their own __hash__() to explicitly raise TypeError).
I noticed that in Python 3.8.8 (and maybe even earlier) you no longer need to explicitly declare __eq__() and __hash__() in order to use your own class as a key in a dict.
class Apple:
    def __init__(self, weight):
        self.weight = weight

    def __repr__(self):
        return f'Apple({self.weight})'

apple_a = Apple(1)
apple_b = Apple(1)
apple_c = Apple(2)

apple_dictionary = {apple_a: 3, apple_b: 4, apple_c: 5}
print(apple_dictionary[apple_a])  # 3
print(apple_dictionary)  # {Apple(1): 3, Apple(1): 4, Apple(2): 5}
I assume that at some point Python started managing this on its own, but I could be wrong.
The answer for today, as I know other people may end up here like me, is to use dataclasses in Python >= 3.7. They come with both hash and eq functions.
@dataclass(frozen=True) example (Python 3.7)
@dataclass had been previously mentioned at: https://stackoverflow.com/a/69313714/895245 but here's an example.
This awesome new feature, among other good things, automatically defines a __hash__ and __eq__ method for you, making it just work as usually expected in dicts and sets:
dataclass_cheat.py
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class MyClass1:
    n: int
    s: str

@dataclass(frozen=True)
class MyClass2:
    n: int
    my_class_1: MyClass1

d = {}
d[MyClass1(n=1, s='a')] = 1
d[MyClass1(n=2, s='a')] = 2
d[MyClass1(n=2, s='b')] = 3
d[MyClass2(n=1, my_class_1=MyClass1(n=1, s='a'))] = 4
d[MyClass2(n=2, my_class_1=MyClass1(n=1, s='a'))] = 5
d[MyClass2(n=2, my_class_1=MyClass1(n=2, s='a'))] = 6

assert d[MyClass1(n=1, s='a')] == 1
assert d[MyClass1(n=2, s='a')] == 2
assert d[MyClass1(n=2, s='b')] == 3
assert d[MyClass2(n=1, my_class_1=MyClass1(n=1, s='a'))] == 4
assert d[MyClass2(n=2, my_class_1=MyClass1(n=1, s='a'))] == 5
assert d[MyClass2(n=2, my_class_1=MyClass1(n=2, s='a'))] == 6

# Due to `frozen=True`
o = MyClass1(n=1, s='a')
try:
    o.n = 2
except FrozenInstanceError:
    pass
else:
    raise AssertionError('expected FrozenInstanceError')
As we can see in this example, the hashes are being calculated based on the contents of the objects, and not simply on the addresses of instances. This is why something like:
d = {}
d[MyClass1(n=1, s='a')] = 1
assert d[MyClass1(n=1, s='a')] == 1
works even though the second MyClass1(n=1, s='a') is a completely different instance from the first with a different address.
frozen=True is mandatory: without it the class is not hashable, since otherwise users could inadvertently make containers inconsistent by modifying objects after they are used as keys. Further documentation: https://docs.python.org/3/library/dataclasses.html
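The "not hashable without frozen=True" point can be checked directly:

```python
from dataclasses import dataclass

@dataclass  # eq=True (the default) with frozen=False sets __hash__ to None
class Mutable:
    n: int

assert Mutable.__hash__ is None  # dataclass explicitly marks it unhashable
try:
    {Mutable(1): 'x'}
    raise AssertionError("expected TypeError")
except TypeError:
    pass
```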
Tested on Python 3.10.7, Ubuntu 22.10.