Dictionary vs Marshmallow dataclass schemas for webargs - python

The webargs module allows you to describe argument schemas either as a plain dictionary or as a marshmallow dataclass schema:
from typing import Any, Dict
from dataclasses import field
from marshmallow_dataclass import dataclass
from webargs import fields, validate
from webargs.flaskparser import use_args  # assuming the Flask parser

# Dictionary variant
@use_args({'field1': fields.Int(required=True, validate=validate.Range(min=1))}, location='json')
def post(args: Dict[str, Any]):
    controller.post(args)

# Marshmallow dataclass schema
@dataclass()
class Arg:
    field1: int = field(metadata=dict(required=True, validate=validate.Range(min=1)))

@use_args(Arg.Schema(), location='json')
def post(arg: Arg):
    controller.post(arg)
The first variant looks shorter and faster to write, but we lose syntax highlighting and type checks in the IDE (because it's a dict), and it leads to longer access expressions, i.e. args['field1'] instead of arg.field1.
Which variant do you use in your big projects? Are there some best practices when to use the first or second variant?

There is no best practice. It is a matter of preference, really.
Sometimes, people think that a schema is too much code for just one or two query args and that a dict is enough.
I like to use schemas everywhere. I find it more consistent and it allows all schemas to derive from a base schema to inherit Meta parameters.
I don't use marshmallow dataclass, just pure marshmallow, so I always get a dict anyway.
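For illustration, here is a minimal sketch of that base-schema pattern (names are made up; assuming marshmallow 3, where subclasses that don't define their own Meta inherit the parent's):

from marshmallow import RAISE, Schema, fields

class BaseSchema(Schema):
    class Meta:
        # Meta options shared by every schema deriving from BaseSchema
        unknown = RAISE
        ordered = True

class PostArgsSchema(BaseSchema):
    field1 = fields.Int(required=True)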

Related

What's the difference between 'normal' Python classes and Pydantic classes?

I would like to know the difference between classes built normally in python and those built with the Pydantic lib, for example:
e.g. normal:
class Node:
    def __init__(self, chave=None, esquerda=None, direita=None):
        self.chave = chave        # key
        self.esquerda = esquerda  # left child
        self.direita = direita    # right child
e.g. pydantic:
from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: Optional[datetime] = None
    friends: List[int] = []
There are a few main differences.
Firstly, purpose. Quoting the docs, pydantic models are designed for:
Data validation and settings management using python type annotations.
pydantic enforces type hints at runtime, and provides user friendly errors when data is invalid.
When you use type annotations you receive a lot of validators and some useful methods out of the box.
As Ahmed and John say, in your example you can't assign "hello" to id in a BaseModel (pydantic) because id is typed as an int. But you can pass the string "1" (it must represent an integer, not a float) and it will be coerced to an int. In this case:
pydantic uses int(v) to coerce types to an int; see this warning on loss of information during data conversion
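A minimal sketch of that coercion behaviour (pydantic v1, matching the example above):

from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int

print(User(id='1'))   # the numeric string is coerced: User(id=1)
try:
    User(id='hello')  # not numeric, so validation fails at runtime
except ValidationError as err:
    print(err)        # id: value is not a valid integer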
Pydantic models also allow you to use many more types than the standard Python types, such as URLs. This makes it easy to validate additional kinds of data.
You can easily create complex models using composition.
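For example, a small sketch of such composition (the Friend/User models here are made up):

from typing import List
from pydantic import BaseModel

class Friend(BaseModel):
    id: int
    name: str

class User(BaseModel):
    id: int
    friends: List[Friend] = []  # nested dicts are parsed into Friend models

user = User(id=1, friends=[{'id': 2, 'name': 'Ann'}])
print(user.friends[0].name)  # Ann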
Pydantic also has some integration with ORMs: docs
There are a lot of other features, much more than I can describe in a single answer. I strongly recommend reading the documentation, it is very clear and useful.
Pydantic models are very useful, for example, in building microservices, where you can share your interfaces as pydantic models. All models can also easily generate a JSON schema. See: Schema, exporting models.
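For instance, a tiny sketch of that export (pydantic v1 API):

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name = 'John Doe'

print(User.schema_json(indent=2))  # JSON Schema with properties, types, defaults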
Pydantic is also a big part of FastAPI, a Python web framework that is growing in popularity.

Dataclass in python when the attribute doesn't respect naming rules

If you have data like this (from a YAML file):
items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
  ...
How would you load that into a dataclass that is explicit about the keys and types it has?
Ideally I would have:
@dataclasses.dataclass
class X:
    C>A/G>T: str
    C>G/G>C: str
    ...
Update:
from typing import TypedDict

SBS_Mutations = TypedDict(
    "SBS_Mutations",
    {
        "C>A/G>T": str,
        "C>G/G>C": str,
        "C>T/G>A": str,
        "T>A/A>T": str,
        "T>C/A>G": str,
        "T>G/A>C": str,
    },
)

my_data = {....}
SBS_Mutations(my_data)  # not sure how to use it here
If you want symbols like that, they obviously can't be Python identifiers, so it is meaningless to want the attribute access that a dataclass gives you.
Just keep your data in dictionaries, or in Pandas dataframes, where such names can be column titles.
Otherwise, post a proper code snippet with a minimal example of where you get the data from; an answer can then include a proper place to translate your original names into valid Python attribute names, and help build a dynamic dataclass with them.
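If you do need a dataclass, one workaround (just a sketch; KEY_MAP and the class name are made up) is to translate the raw keys to valid identifiers yourself:

import dataclasses

# hypothetical mapping from raw YAML keys to valid Python identifiers
KEY_MAP = {
    "C>A/G>T": "c_a_g_t",
    "C>G/G>C": "c_g_g_c",
}

@dataclasses.dataclass
class SBSMutations:
    c_a_g_t: str
    c_g_g_c: str

raw = {"C>A/G>T": "#string", "C>G/G>C": "#string"}
data = SBSMutations(**{KEY_MAP[k]: v for k, v in raw.items()})
print(data.c_a_g_t)  # '#string'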
This sounds like a good use case for my dotwiz library, which I recently published. It provides a dict subclass that enables attribute-style dot access for nested keys.
As of the latest release, it offers a DotWizPlus implementation (a wrapper around a dict object) that also case-transforms keys so that they are valid, lower-cased Python identifier names, as shown below.
# requires the following dependencies:
#   pip install PyYAML dotwiz
import yaml
from dotwiz import DotWizPlus

yaml_str = """
items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
"""

yaml_dict = yaml.safe_load(yaml_str)
print(yaml_dict)

dw = DotWizPlus(yaml_dict)
print(dw)

assert dw.items.c_a_g_t == '#string'  # True

print(dw.to_attr_dict())
Output:
{'items': {'C>A/G>T': '#string', 'C>G/G>C': '#string'}}
✪(items=✪(c_a_g_t='#string', c_g_g_c='#string'))
{'items': {'c_a_g_t': '#string', 'c_g_g_c': '#string'}}
NB: This currently fails when accessing the key items from a plain DotWiz instance, as the key name conflicts with the builtin dict.items() method. I've submitted a bug report and hope to work through this particular edge case.
Type Hinting
If you want type-hinting or auto-suggestions for field names, you can try something like this where you subclass from DotWizPlus:
import yaml
from dotwiz import DotWizPlus

class Item(DotWizPlus):
    c_a_g_t: str
    c_g_g_c: str

    @classmethod
    def from_yaml(cls, yaml_string: str, loader=yaml.safe_load):
        yaml_dict = loader(yaml_string)
        return cls(yaml_dict['items'])

yaml_str = """
items:
  C>A/G>T: "#string1"
  C>G/G>C: "#string2"
"""

dw = Item.from_yaml(yaml_str)
print(dw)
# ✪(c_a_g_t='#string1', c_g_g_c='#string2')

assert dw.c_a_g_t == '#string1'  # True

# auto-completion will work, as the IDE knows the type is a `str`
# dw.c_a_g_t.
Dataclasses
If you would still prefer dataclasses for type-hinting purposes, there is another library called dataclass-wizard that can help simplify this task as well.
More specifically, YAMLWizard makes it easier to load/dump a class object with YAML. Note that it uses the PyYAML library behind the scenes by default.
I couldn't get the case transform to work in this case; I suspect it's a bug in the underlying to_snake_case() implementation, and I'm going to file a bug report for that edge case too. For now, though, it works if the key names in the YAML are mapped a bit more explicitly:
from dataclasses import dataclass
from dataclass_wizard import YAMLWizard, json_field

yaml_str = """
items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
"""

@dataclass
class Container(YAMLWizard):
    items: 'Item'

@dataclass
class Item:
    c_a_g_t: str = json_field('C>A/G>T')
    c_g_g_c: str = json_field('C>G/G>C')

c = Container.from_yaml(yaml_str)
print(c)

assert c.items.c_g_g_c == c.items.c_a_g_t == '#string'  # True
Output:
Container(items=Item(c_a_g_t='#string', c_g_g_c='#string'))

Check for extra keys in marshmallow.Schema.dump()

I want to be able to take some Python object (more precisely, a dataclass) and dump it to its dict representation using a schema. Let me give you an example:
from marshmallow import Schema, fields
import dataclasses

@dataclasses.dataclass
class Foo:
    x: int
    y: int
    z: int

class FooSchema(Schema):
    x = fields.Int()
    y = fields.Int()

FooSchema().dump(Foo(1, 2, 3))
As you can see, the schema differs from the Foo definition. I want to be able to recognize that when dumping, so I would get some ValidationError explaining that there's an extra field z. It doesn't have to be .dump(); I looked at .load() and .validate(), but only .dump() seems to accept objects rather than just dicts.
Is there a way to do this in marshmallow? For now when I dump, I just get the dictionary {"x": 1, "y": 2} without z, but no errors whatsoever. I would want the same behavior in the opposite case, when a key is missing from the dumped object (like z being in the schema but not in Foo). This would basically serve as a sanity check for changes made to the classes themselves. If it's not possible in marshmallow, maybe you know some lib/technique that makes it so?
So I had this problem today and did some digging. Based on https://github.com/marshmallow-code/marshmallow/issues/1545, it's something people are considering, but the current implementation of dump iterates through the fields listed in the schema definition, so it won't work.
The best I could get to work was:
from marshmallow import INCLUDE, Schema, fields

class FooSchema(Schema):
    class Meta:
        unknown = INCLUDE

    x = fields.Int()
    y = fields.Int()

which at least sort of displays the data as a dict.
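Alternatively, since the goal is a sanity check between the schema and the dataclass, you could compare the two sets of field names directly. This is just a sketch, not marshmallow API, using the Foo/FooSchema from the question:

import dataclasses
from marshmallow import Schema

def assert_fields_match(schema: Schema, obj) -> None:
    schema_fields = set(schema.fields)
    obj_fields = {f.name for f in dataclasses.fields(obj)}
    extra = obj_fields - schema_fields    # on the object but not in the schema
    missing = schema_fields - obj_fields  # in the schema but not on the object
    if extra or missing:
        raise ValueError(f"schema mismatch: extra={extra}, missing={missing}")

assert_fields_match(FooSchema(), Foo(1, 2, 3))  # raises: extra={'z'}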

"str" is not a dataclass error with marshmallow and marshmallow-dataclass

Using Python, I am trying to create data schemas for my dataclasses using marshmallow and marshmallow-dataclass. I believe I have followed the docs, adding the decorator to the relevant dataclasses and NewTypes to the fields that are not standard Python objects. However, I am getting an error before the program even loads.
The error says a str is not a dataclass, and I have no idea how the decorator ends up processing a string instead of a dataclass.
I am sure I have missed something simple, so apologies in advance if that is the case.
A summarised version of the code is:
from typing import Any, ClassVar, Type
from dataclasses import field
from marshmallow import Schema
from marshmallow_dataclass import dataclass as m_dataclass, NewType

ProjectileDataType = NewType("ProjectileDataType", Any)

@m_dataclass
class ProjectileData:
    Schema: ClassVar[Type[Schema]] = Schema

    # what created it?
    skill_name: str = field(default="None")
    # what does it look like?
    sprite: str = field(default="None")
definitions.py: https://pastebin.com/tHnVE2Gc
Error traceback: https://pastebin.com/htuqhKSU
Docs: https://github.com/lovasoa/marshmallow_dataclass , https://marshmallow.readthedocs.io/en/stable/quickstart.html

How to apply 'load_from' and 'dump_to' to every field in a marshmallow schema?

I've been trying to implement an 'abstract' schema class that will automatically convert keys from camelCase (serialized) to snake_case (deserialized).
class CamelCaseSchema(marshmallow.Schema):
    @marshmallow.pre_load
    def camel_to_snake(self, data):
        return {
            utils.CaseConverter.camel_to_snake(key): value
            for key, value in data.items()
        }

    @marshmallow.post_dump
    def snake_to_camel(self, data):
        return {
            utils.CaseConverter.snake_to_camel(key): value
            for key, value in data.items()
        }
While using something like this works nicely, it does not achieve everything applying load_from and dump_to to a field does. Namely, it fails to provide correct field names when there's an issue with deserialization. For instance, I get:
{'something_id': [u'Not a valid integer.']} instead of {'somethingId': [u'Not a valid integer.']}.
While I can post-process these emitted errors, this seems like an unnecessary coupling that I wish to avoid if I'm to make the use of schema fully transparent.
Any ideas? I tried tackling the metaclasses involved, but the complexity was a bit overwhelming and everything seemed exceptionally ugly.
You're using marshmallow 2. Marshmallow 3 is now out and I recommend using it. My answer will apply to marshmallow 3.
In marshmallow 3, load_from / dump_to have been replaced by a single attribute: data_key.
You'd need to alter data_key in each field when instantiating the schema. This will happen after field instantiation but I don't think it matters.
You want to do that as soon as the schema is instantiated to avoid inconsistency issues. The ideal moment would be in the middle of Schema._init_fields, before the data_key attributes are checked for consistency, but duplicating that method would be a pity. Besides, due to the nature of the camel/snake case conversion, the consistency checks can be applied before the conversion anyway.
And since _init_fields is private API, I'd recommend doing the modification at the end of __init__.
class CamelCaseSchema(Schema):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        for field_name, field_obj in self.fields.items():
            field_obj.data_key = utils.CaseConverter.snake_to_camel(field_name)
I didn't try that but I think it should work.
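For what it's worth, marshmallow 3 also has an on_bind_field hook that runs as each field is bound to the schema, which avoids touching __init__ at all. The docs show essentially this recipe (camelcase here stands in for utils.CaseConverter.snake_to_camel):

from marshmallow import Schema, fields

def camelcase(s):
    parts = iter(s.split("_"))
    return next(parts) + "".join(p.title() for p in parts)

class CamelCaseSchema(Schema):
    """Schema that uses camelCase externally and snake_case internally."""
    def on_bind_field(self, field_name, field_obj):
        field_obj.data_key = camelcase(field_obj.data_key or field_name)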
