Python dataclasses_json: can I store many references to one object?

I want to use the @dataclass_json decorator to store my @dataclass instances.
I also want to have many references to one object among the instances, and I want this reference structure preserved when saving (so that I can modify one settings object and have the modification apply to every object that uses those settings).
This is easy while the dataclass objects live in memory, but when I store them as JSON, a copy of the instance is saved instead of a reference to it. Can I somehow deal with this?
P.S. Here's my code example:
from dataclasses import dataclass
from dataclasses_json import dataclass_json
from typing import List

@dataclass_json
@dataclass
class RadarSettings:
    freq: float = 10e9
    prf: float = 1e-3

@dataclass_json
@dataclass
class Radar:
    name: str = ""
    preset_settings: RadarSettings = None  # Here should be a reference to some boilerplate preset settings shared by many radars
    custom_settings: RadarSettings = None  # And here the settings specific to this radar

@dataclass_json
@dataclass
class RadarScene:
    name: str = ""
    radars: List["Radar"] = None

preset = RadarSettings()
radar1 = Radar(name="mega search mode radar from hell", preset_settings=preset)
radar2 = Radar(name="satanic sensor array radar", preset_settings=preset)
# The preset_settings is one and the same object for both radars! If I modify it, the modification applies to both.
print(id(radar1.preset_settings), id(radar2.preset_settings))

scene_to_save = RadarScene(name="Infernal scene", radars=[radar1, radar2])
loaded_scene = RadarScene.from_json(scene_to_save.to_json())
print(id(loaded_scene.radars[0].preset_settings), id(loaded_scene.radars[1].preset_settings))
# Alas! Two separate copies of preset_settings end up here. I need one =(

The problem you have described is expected behavior: when you save your data in JSON format you get a plain-text string representation of the data, and plain text has no notion of shared references.
You can work around the issue with at least a couple of approaches.
Method 1.
Load the RadarScene data, create preset = RadarSettings(), then iterate over all Radars in the RadarScene and update the preset_settings attribute: radar.preset_settings = preset. This logic can be encapsulated in the RadarScene class so you can call it right after loading the data; a sketch follows.
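A minimal sketch of Method 1, assuming the classes from the question (the relink_presets helper is a made-up name, not part of dataclasses_json):
def relink_presets(scene: RadarScene, preset: RadarSettings) -> None:
    # point every radar at the single shared settings object again
    for radar in scene.radars:
        radar.preset_settings = preset

loaded_scene = RadarScene.from_json(scene_to_save.to_json())
shared_preset = loaded_scene.radars[0].preset_settings  # reuse one of the loaded copies
relink_presets(loaded_scene, shared_preset)

# all radars now share one settings object again
assert loaded_scene.radars[0].preset_settings is loaded_scene.radars[1].preset_settings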
Method 2.
Create a new singleton class RadarSettingsDefault that inherits from RadarSettings, and change the Radar class accordingly: preset_settings: RadarSettingsDefault = None.
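A rough sketch of how Method 2 might look, assuming the classes from the question; the __new__-based singleton below is just one possible implementation:
@dataclass_json
@dataclass
class RadarSettingsDefault(RadarSettings):
    _instance = None  # class-level cache holding the single shared instance

    def __new__(cls, *args, **kwargs):
        # every construction (including the ones dataclasses_json performs
        # during from_json) returns the same object, so all radars share it
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

@dataclass_json
@dataclass
class Radar:
    name: str = ""
    preset_settings: RadarSettingsDefault = None
    custom_settings: RadarSettings = None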


Trying to set a superclass field in a subclass using validator

I am trying to set a superclass field in a subclass using a validator as follows:
Approach 1
from typing import List
from pydantic import BaseModel, validator, root_validator

class ClassSuper(BaseModel):
    field1: int = 0

class ClassSub(ClassSuper):
    field2: List[int]

    @validator('field1')
    def validate_field1(cls, v, values):
        return len(values["field2"])

sub = ClassSub(field2=[1, 2, 3])
print(sub.field1)  # It prints 0, but expected it to print 3
If I run the code above it prints 0, but I expected it to print 3 (which is basically len(field2)). However, if I use @root_validator() instead, I get the expected result.
Approach 2
from typing import List
from pydantic import BaseModel, validator, root_validator

class ClassSuper(BaseModel):
    field1: int = 0

class ClassSub(ClassSuper):
    field2: List[int]

    @root_validator()
    def validate_field1(cls, values):
        values["field1"] = len(values["field2"])
        return values

sub = ClassSub(field2=[1, 2, 3])
print(sub.field1)  # This prints 3, as expected
I am new to pydantic and a bit puzzled about what I am doing wrong in Approach 1. Thank you for your help.
The reason your Approach 1 does not work is that, by default, validators for a field are not called when no value is supplied for that field (see docs).
Your validate_field1 is never even called. If you add always=True to your @validator, the method is called even if you don't provide a value for field1.
However, if you try that, you'll see that it will still not work, but instead throw an error about the key "field2" not being present in values.
This in turn is due to the fact that validators are called in the order they were defined. In this case, field1 is defined before field2, which means that field2 is not yet validated by the time validate_field1 is called. And values only contains previously-validated fields (see docs). Thus, at the time validate_field1 is called, values is simply an empty dictionary.
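To illustrate the point, here is a small sketch (assuming pydantic v1) of what happens once always=True is added:
from typing import List
from pydantic import BaseModel, validator

class ClassSuper(BaseModel):
    field1: int = 0

class ClassSub(ClassSuper):
    field2: List[int]

    # always=True forces the validator to run even when field1 is not supplied
    @validator("field1", always=True)
    def validate_field1(cls, v, values):
        # field1 is defined (and therefore validated) before field2, so at this
        # point `values` does not contain "field2" yet and the lookup fails
        return len(values["field2"])

ClassSub(field2=[1, 2, 3])  # fails with an error about the missing "field2" key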
Using the @root_validator is the correct approach here, because it receives the entire model's data, regardless of whether the field values were supplied explicitly or by default.
And just as a side note: if you don't need to specify any parameters for it, you can use @root_validator without the parentheses.
And as another side note: If you are using Python 3.9+, you can use the regular list class as the type annotation. (See standard generic alias types) That means field2: list[int] without the need for typing.List.
Hope this helps.

Automatically indent Python dataclasses with comments, Go-struct style

Python dataclasses are really great. They allow you to define classes in a very beautiful way.
from dataclasses import dataclass

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
Moreover, lots of useful tools reuse Python annotations in the same way and let you define classes (which are more like structures in other languages) similarly. One example is Pydantic.
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: Optional[datetime] = None
    friends: List[int] = []
I myself use pydantic quite a lot these days. Look at an example from my recent practice:
class G6A(BaseModel):
    transaction_id: items.TransactionReference # Transaction Id
    mpan_core: items.MPAN # MPAN Core
    registration_date: items.CallistoDate # Registration Date
    action_required: G6AAction # Action Required
I parse a very inconvenient API, and that's why I want to leave a comment on every line; this way the code serves as self-documentation. The problem is that, at least for me, this looks very ugly. It's hard to scan the lines, because they look like a table with broken columns. Let's try to fix this with careful indentation:
class G6A(BaseModel):
    transaction_id: items.TransactionReference    # Transaction Id
    mpan_core: items.MPAN                         # MPAN Core
    registration_date: items.CallistoDate         # Registration Date
    action_required: G6AAction                    # Action Required
I'm sure it is much more readable this way. By doing so we define a structure, like an actual table, where the first column is the attribute name, the second is the attribute type, and the last one is the comment. It is actually inspired by Go structs:
type T struct {
    name  string // name of the object
    value int    // its value
}
So, my question is: are there any automatic tools (linters/formatters) that will reformat dataclasses/pydantic models the way I described above? I looked through autopep8 and the black formatter and found nothing. I also googled around and still found nothing. Any ideas how to achieve this with existing tools?
I think yapf has something like that for comments. Check the SPACES_BEFORE_COMMENT "knob":
The number of spaces required before a trailing comment. This can be a single value (representing the number of spaces before each trailing comment) or a list of values (representing alignment column values; trailing comments within a block will be aligned to the first column value that is greater than the maximum line length within the block)
.style.yapf:
[style]
based_on_style = pep8
spaces_before_comment = 10,20,30,40,50,60,70,80
Configures alignment columns: 10, 20, ..., 80.
foo.py:
class G6A(BaseModel):
    transaction_id: items.TransactionReference # Transaction Id
    mpan_core: items.MPAN # MPAN Core
    registration_date: items.CallistoDate # Registration Date
    action_required: G6AAction # Action Required
Output of yapf foo.py:
class G6A(BaseModel):
    transaction_id: items.TransactionReference   # Transaction Id
    mpan_core: items.MPAN                        # MPAN Core
    registration_date: items.CallistoDate        # Registration Date
    action_required: G6AAction                   # Action Required

Dataclass in Python when the attribute doesn't respect naming rules

If you have data like this (from a YAML file):
items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
  ...
How would you load that into a dataclass that is explicit about the keys and types it has?
Ideally I would have:
@dataclasses.dataclass
class X:
    C>A/G>T: str
    C>G/G>C: str
    ...
Update:
SBS_Mutations = TypedDict(
    "SBS_Mutations",
    {
        "C>A/G>T": str,
        "C>G/G>C": str,
        "C>T/G>A": str,
        "T>A/A>T": str,
        "T>C/A>G": str,
        "T>G/A>C": str,
    },
)

my_data = {....}
SBS_Mutations(my_data)  # not sure how to use it here
If you want symbols like that, they obviously can't be Python identifiers, and so it is meaningless to want the facilities that a dataclass, with attribute access, gives you.
Just keep your data in dictionaries, or in Pandas dataframes, where such names can be column titles.
Otherwise, post a proper code snippet with a minimal example of where you are getting the data from; then an answer can show a proper place to translate your original names into valid Python attribute names and help build a dynamic dataclass with them, as sketched below.
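A hedged sketch of that idea using dataclasses.make_dataclass; the sample data, the to_identifier helper, and the generated class name are assumptions for illustration:
import dataclasses
import re

raw = {"C>A/G>T": "#string", "C>G/G>C": "#string"}

def to_identifier(name: str) -> str:
    # replace every character that is not valid in an identifier with "_"
    return re.sub(r"\W", "_", name).lower()

field_names = {key: to_identifier(key) for key in raw}  # original name -> valid name
SBSMutations = dataclasses.make_dataclass(
    "SBSMutations", [(attr, str) for attr in field_names.values()]
)

instance = SBSMutations(**{field_names[k]: v for k, v in raw.items()})
print(instance.c_a_g_t)  # '#string'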
This sounds like a good use case for my dotwiz library, which I have recently published. This provides a dict subclass which enables attribute-style dot access for nested keys.
As of the recent release, it offers a DotWizPlus implementation (a wrapper around a dict object) that also case-transforms keys so that they are valid, lower-cased Python identifier names, as shown below.
# requires the following dependencies:
#   pip install PyYAML dotwiz
import yaml
from dotwiz import DotWizPlus

yaml_str = """
items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
"""

yaml_dict = yaml.safe_load(yaml_str)
print(yaml_dict)

dw = DotWizPlus(yaml_dict)
print(dw)

assert dw.items.c_a_g_t == '#string'  # True

print(dw.to_attr_dict())
Output:
{'items': {'C>A/G>T': '#string', 'C>G/G>C': '#string'}}
✪(items=✪(c_a_g_t='#string', c_g_g_c='#string'))
{'items': {'c_a_g_t': '#string', 'c_g_g_c': '#string'}}
NB: This currently fails when accessing the key items from a plain DotWiz instance, as the key name conflicts with the builtin attribute dict.items(). I've submitted a bug report and will hopefully work through this particular edge case.
Type Hinting
If you want type-hinting or auto-suggestions for field names, you can try something like this where you subclass from DotWizPlus:
import yaml
from dotwiz import DotWizPlus

class Item(DotWizPlus):
    c_a_g_t: str
    c_g_g_c: str

    @classmethod
    def from_yaml(cls, yaml_string: str, loader=yaml.safe_load):
        yaml_dict = loader(yaml_string)
        return cls(yaml_dict['items'])

yaml_str = """
items:
  C>A/G>T: "#string1"
  C>G/G>C: "#string2"
"""

dw = Item.from_yaml(yaml_str)
print(dw)
# ✪(c_a_g_t='#string1', c_g_g_c='#string2')

assert dw.c_a_g_t == '#string1'  # True

# auto-completion will work, as the IDE knows the type is a `str`
# dw.c_a_g_t.
Dataclasses
If you would still prefer dataclasses for type-hinting purposes, there is another library you can also check out called dataclass-wizard, which can help to simplify this task as well.
More specifically, YAMLWizard makes it easier to load/dump a class object with YAML. Note that this uses the PyYAML library behind the scenes by default.
Note that I couldn't get the case transform to work in this case; I guess it's a bug in the underlying to_snake_case() implementation, and I'm going to submit a bug report to look into this edge case. However, for now it should work if the key name in YAML is specified a bit more explicitly:
from dataclasses import dataclass
from dataclass_wizard import YAMLWizard, json_field

yaml_str = """
items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
"""

@dataclass
class Container(YAMLWizard):
    items: 'Item'

@dataclass
class Item:
    c_a_g_t: str = json_field('C>A/G>T')
    c_g_g_c: str = json_field('C>G/G>C')

c = Container.from_yaml(yaml_str)
print(c)

# True
assert c.items.c_g_g_c == c.items.c_a_g_t == '#string'
Output:
Container(items=Item(c_a_g_t='#string', c_g_g_c='#string'))

Check for extra keys in marshmallow.Schema.dump()

I want to be able to take some Python object (more precisely, a dataclass) and dump it to its dict representation using a schema. Let me give you an example:
import dataclasses

from marshmallow import Schema, fields

@dataclasses.dataclass
class Foo:
    x: int
    y: int
    z: int

class FooSchema(Schema):
    x = fields.Int()
    y = fields.Int()

FooSchema().dump(Foo(1, 2, 3))
As you can see, the schema differs from the Foo definition. I want to somehow recognize that when dumping, so that I get some ValidationError with an explanation that there's an extra field z. It doesn't really have to be .dump(); I also looked at .load() and .validate(), but only .dump() seems to accept objects rather than just dicts.
Is there a way to do this in marshmallow? For now, when I do this dump, I just get the dictionary {"x": 1, "y": 2} without z, of course, but no errors whatsoever. I would want the same behavior in the opposite case, when a key is missing from the dumped object (like a z that is in the schema but not in Foo). This would basically serve as a sanity check on changes made to the classes themselves. If it's not possible in marshmallow, maybe you know some lib/technique that achieves this?
So I had this problem today and did some digging. Based on https://github.com/marshmallow-code/marshmallow/issues/1545, it's something people are considering, but the current implementation of dump iterates through the fields listed in the schema definition, so it won't work.
The best I could get to work was:
from marshmallow import EXCLUDE, INCLUDE, RAISE, Schema, fields

class FooSchema(Schema):
    class Meta:
        unknown = INCLUDE

    x = fields.Int()
    y = fields.Int()
Which at least sort of displays as a dict.
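For what it's worth, a quick sketch of how the unknown setting behaves; note that it only affects load(), not dump():
# with unknown = INCLUDE, extra keys survive load() instead of being dropped;
# the default (RAISE) would make load() raise a ValidationError on them
print(FooSchema().load({"x": 1, "y": 2, "z": 3}))
# -> {'x': 1, 'y': 2, 'z': 3}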

Python: Factory Boy to generate List of length specified on object creation

I'm trying to use Factory Boy to create an object containing a list whose length is specified at creation time.
I can create the list, but every attempt to create a list with the length specified causes issues due to the lazy nature of the provided length/size.
This is what I have so far:
class FooFactory(factory.Factory):
    class Meta:
        model = command.Foo

    foo_uuid = factory.Faker("uuid4")
    bars = factory.List([
        factory.LazyAttribute(lambda o: BarFactory())
        for _ in range(3)
    ])
This will create a list of 3 random Bars. I have tried using a combination of Params and exclude, but because range expects an int, and that int won't be lazily loaded until later, it causes an error.
I would like something similar to how one-to-many relationships are generated with post_generation, i.e.:
foo = FooFactory(number_of_bars=5)
Anyone had any luck with this?
Main solution
Two things are needed for this: parameters and LazyAttribute (the links point to their documentation, for more detail).
Parameters are like factory attributes that are not passed to the instance that will be created.
In this case, they provide a way to parametrize the length of the list of Bars.
But in order to use parameters to customize a field in the factory, we need to have access to self,
that is, the instance being built.
We can achieve that with LazyAttribute, which is a declaration that takes a function with one argument:
the object being built.
Just what we needed.
So the snippet in the question could be re-written as follows:
class FooFactory(factory.Factory):
    class Meta:
        model = command.Foo

    class Params:
        number_of_bars = 1

    foo_uuid = factory.Faker("uuid4")
    bars = factory.LazyAttribute(lambda self: [BarFactory()] * self.number_of_bars)
And used like this:
foo = FooFactory(number_of_bars=3)
If the number_of_bars argument is not provided, the default of 1 is used.
Drawbacks
Sadly, there are some limitations to what we can do here.
The preferred way to use a factory in the definition of another factory is via
SubFactory.
That is preferred for two reasons:
it respects the build strategy used for the parent factory
it collects extra keyword arguments to customize the subfactory
The first one means that if we used SubFactory to build a Bar in FooFactory
and called FooFactory with FooFactory.create or FooFactory.build,
the Bar subfactory would respect that and use the same strategy.
In summary, the build strategy only builds an instance,
while the create strategy builds and saves the instance to the persistent storage being used,
for example a database, so respecting this choice is important.
See the docs
for more details.
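As a rough illustration of the two strategies (assuming BarFactory wraps an ORM-backed model):
bar = BarFactory.build()   # instantiate a Bar without persisting it
bar = BarFactory.create()  # instantiate it and save it, e.g. to the database
bar = BarFactory()         # uses the factory's default strategy (create)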
The second one means that we can directly customize attributes of Bar when calling FooFactory.
For example:
foo = FooFactory(bar__id=2)
would set the id of the bar of foo to be 2 instead of what the Bar subfactory would generate by default.
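For reference, a minimal sketch of that SubFactory pattern; the single bar attribute here is an assumed example, not part of the question's factory:
class FooFactory(factory.Factory):
    class Meta:
        model = command.Foo

    foo_uuid = factory.Faker("uuid4")
    bar = factory.SubFactory(BarFactory)  # one related Bar, built with the parent's strategy

# keyword arguments prefixed with bar__ are forwarded to the subfactory
foo = FooFactory(bar__id=2)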
But I could not find a way to use SubFactory and a dynamic length via Params.
There is no way, as far as I know, to access the value of a parameter in a context where FactoryBoy expects a SubFactory.
The problem is that the declarations that give us access to the object being built always expect a final value to be returned,
not another factory to be called later.
This means that, in the example above, if we write instead:
class FooFactory(factory.Factory):
    # ... rest of the factory
    bars = factory.LazyAttribute(lambda self: [factory.SubFactory(BarFactory)] * self.number_of_bars)
then calling it like
foo = FooFactory(number_of_bars=3)
would result in a foo that has a list of 3 BarFactory in foo.bars instead of a list of 3 Bars.
And using SelfAttribute,
which is a way to reference another attribute of the instance being built, doesn't work either
because it is not evaluated before the rest of the expression in a declaration like this:
class FooFactory(factory.Factory):
    # ... rest of the factory
    bars = factory.List([factory.SubFactory(BarFactory)] * SelfAttribute("number_of_bars"))
That raises TypeError: can't multiply sequence by non-int of type 'SelfAttribute'.
A possible workaround is to call BarFactory beforehand and pass it to FooFactory:
number_of_bars = 3
bars = BarFactory.create_batch(number_of_bars)
foo = FooFactory(bars=bars)
But that's certainly not as nice.
Another one that I found out recently is RelatedFactoryList.
But that's still experimental and it doesn't seem to have a way to access parameters.
Additionally, since it's generated after the base factory, it also might not work if the instance constructor
expects that attribute as an argument.
There is a way to pass the length of a list and retain the ability to set additional properties on the subfactory. It requires creating a post_generation method.
class FooFactory(factory.Factory):
    class Meta:
        model = command.Foo

    foo_uuid = factory.Faker("uuid4")
    bars__count = 5  # Optional: default number of bars to create

    @factory.post_generation
    def bars(self, create, extracted, **kwargs):
        if not create:
            return
        num_bars = kwargs.get('count', 0)
        color = kwargs.get('color')
        if num_bars > 0:
            self.bars = [BarFactory(color=color)] * num_bars
        elif extracted:
            self.bars = extracted
Any parameter of the form attributename__paramname (here, bars__count or bars__color) will be passed to the post_generation method as paramname in kwargs.
You can then call the FooFactory as:
FooFactory.create(bars__color='blue')
and it will create Foo with 5 Bars (the default value).
You can also call FooFactory and tell it to create 10 Bars.
FooFactory.create(bars__color='blue', bars__count=10)
