Python dataclasses are really great. They let you define classes in a clean, readable way.
from dataclasses import dataclass

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
Moreover, lots of useful tools reuse Python annotations in the same way and let you define classes (which are more like structs in other languages) in the same style. One example is Pydantic.
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: Optional[datetime] = None
    friends: List[int] = []
I use pydantic quite a lot these days. Here is an example from my recent work:
class G6A(BaseModel):
    transaction_id: items.TransactionReference  # Transaction Id
    mpan_core: items.MPAN  # MPAN Core
    registration_date: items.CallistoDate  # Registration Date
    action_required: G6AAction  # Action Required
I am parsing a very inconvenient API, which is why I want to leave a comment on every line, so that the model works as self-documentation. The problem is that, at least to me, this looks very ugly. It is hard to scan the lines, because they look like a table with broken columns. Let's try to fix this with careful alignment:
class G6A(BaseModel):
    transaction_id:    items.TransactionReference  # Transaction Id
    mpan_core:         items.MPAN                  # MPAN Core
    registration_date: items.CallistoDate          # Registration Date
    action_required:   G6AAction                   # Action Required
I'm sure it is much more readable this way. We get a structure like an actual table, where the first column is the attribute name, the second is the attribute type, and the last is the comment. It is actually inspired by Go structs:
type T struct {
    name  string // name of the object
    value int    // its value
}
So my question is: are there any automatic tools (linters or formatters) that will reformat dataclasses and pydantic models the way I described above? I looked through autopep8 and black and found nothing. I also searched online, still with no luck. Any ideas how to achieve this with existing tools?
I think yapf has something like that for comments. Check the SPACES_BEFORE_COMMENT "knob":
The number of spaces required before a trailing comment. This can be a single value (representing the number of spaces before each trailing comment) or a list of values (representing alignment column values; trailing comments within a block will be aligned to the first column value that is greater than the maximum line length within the block)
.style.yapf:
[style]
based_on_style = pep8
spaces_before_comment = 10,20,30,40,50,60,70,80
Configures alignment columns: 10, 20, ..., 80.
foo.py:
class G6A(BaseModel):
    transaction_id: items.TransactionReference  # Transaction Id
    mpan_core: items.MPAN  # MPAN Core
    registration_date: items.CallistoDate  # Registration Date
    action_required: G6AAction  # Action Required
Output of yapf foo.py:
class G6A(BaseModel):
    transaction_id: items.TransactionReference   # Transaction Id
    mpan_core: items.MPAN                        # MPAN Core
    registration_date: items.CallistoDate        # Registration Date
    action_required: G6AAction                   # Action Required
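To make the alignment rule quoted from the yapf documentation more concrete, here is a minimal sketch in plain Python. It is a toy model of the rule, not yapf's implementation: `align_trailing_comments` is a hypothetical helper, it treats the column values as 0-based offsets, and it assumes that `#` never appears inside a string literal.

```python
def align_trailing_comments(lines, columns):
    """Align trailing '#' comments in a block of lines to one shared column.

    Toy model of the documented rule, not yapf's implementation: the comment
    column is the first value in `columns` that is greater than the longest
    code part in the block. Assumes '#' never occurs inside a string literal.
    """
    parts = []
    for line in lines:
        code, sep, comment = line.partition("#")
        if sep and code.strip():
            parts.append((code.rstrip(), sep + comment))
        else:
            parts.append((line, ""))  # no trailing comment: keep the line as is

    widths = [len(code) for code, comment in parts if comment]
    if not widths:
        return list(lines)
    longest = max(widths)
    # Fall back to two spaces after the longest line if no column is large enough.
    target = next((c for c in sorted(columns) if c > longest), longest + 2)
    return [code.ljust(target) + comment if comment else code for code, comment in parts]


block = [
    "class G6A(BaseModel):",
    "    transaction_id: items.TransactionReference  # Transaction Id",
    "    mpan_core: items.MPAN  # MPAN Core",
]
for line in align_trailing_comments(block, range(10, 90, 10)):
    print(line)
```

The longest code part here is 46 characters, so with columns 10, 20, ..., 80 the sketch picks 50 as the shared comment column. Whether yapf counts columns in exactly the same way is something to check against its actual output.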
I am trying to set a super-class field in a subclass using validator as follows:
Approach 1
from typing import List
from pydantic import BaseModel, validator, root_validator

class ClassSuper(BaseModel):
    field1: int = 0

class ClassSub(ClassSuper):
    field2: List[int]

    @validator('field1')
    def validate_field1(cls, v, values):
        return len(values["field2"])

sub = ClassSub(field2=[1, 2, 3])
print(sub.field1)  # It prints 0, but expected it to print 3
If I run the code above it prints 0, but I expected it to print 3 (which is basically len(field2)). However, if I use @root_validator() instead, I get the expected result.
Approach 2
from typing import List
from pydantic import BaseModel, validator, root_validator

class ClassSuper(BaseModel):
    field1: int = 0

class ClassSub(ClassSuper):
    field2: List[int]

    @root_validator()
    def validate_field1(cls, values):
        values["field1"] = len(values["field2"])
        return values

sub = ClassSub(field2=[1, 2, 3])
print(sub.field1)  # This prints 3, as expected
I am new to pydantic and a bit puzzled about what I am doing wrong in Approach 1. Thank you for your help.
Your Approach 1 does not work because, by default, validators for a field are not called when no value is supplied for that field (see docs).
Your validate_field1 is never even called. If you add always=True to your @validator, the method is called even if you don't provide a value for field1.
However, if you try that, you'll see that it still does not work; instead it throws an error about the key "field2" not being present in values.
This in turn is because fields are validated in the order they are defined. In this case, field1 is defined before field2, which means that field2 has not yet been validated by the time validate_field1 is called. And values only contains previously validated fields (see docs). Thus, at the time validate_field1 is called, values is simply an empty dictionary.
Using @root_validator is the correct approach here because it receives the entire model's data, regardless of whether field values were supplied explicitly or by default.
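The ordering issue can be illustrated without pydantic at all. The following is a toy model under stated assumptions, not pydantic's code: `run_validators` and `root_validate` are hypothetical helpers. Fields are processed in definition order, each per-field validator only sees the values validated before it (as if always=True were set), and the root-style hook runs once everything is available. The validator uses `.get` so the sketch does not raise the missing-key error described above.

```python
def run_validators(field_order, validators, data, defaults):
    """Validate fields in definition order (toy model, not pydantic's code).

    Each validator receives the raw value and a dict holding only the
    fields that were validated before it.
    """
    values = {}
    for name in field_order:
        raw = data.get(name, defaults.get(name))
        check = validators.get(name)
        values[name] = check(raw, dict(values)) if check else raw
    return values


seen = {}


def validate_field1(v, values):
    seen.update(values)  # record which fields were visible at call time
    return len(values.get("field2", []))


def root_validate(values):
    """Runs after all fields, so field2 is available here."""
    values["field1"] = len(values["field2"])
    return values


# field1 comes from the superclass, so it is defined before field2.
result = run_validators(
    field_order=["field1", "field2"],
    validators={"field1": validate_field1},
    data={"field2": [1, 2, 3]},
    defaults={"field1": 0},
)
fixed = root_validate(dict(result))
print(seen, result["field1"], fixed["field1"])
```

In this model the per-field validator sees an empty dictionary and computes 0, while the hook that runs after all fields computes 3.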
And just as a side note: if you don't need to specify any parameters for it, you can use @root_validator without the parentheses.
And as another side note: If you are using Python 3.9+, you can use the regular list class as the type annotation. (See standard generic alias types) That means field2: list[int] without the need for typing.List.
Hope this helps.
So, I'm building an API to interact with my personal wine label collection database.
From what I understand, the purpose of a pydantic model is to serve as a "verifier" of the schema that is sent to the API. So, my pydantic schema for adding a label is the following:
from pydantic import BaseModel
from typing import Optional

class WineLabels(BaseModel):
    name: Optional[str]
    type: Optional[str]
    year: Optional[int]
    grapes: Optional[str]
    country: Optional[str]
    region: Optional[str]
    price: Optional[float]
    id: Optional[str]
None of the fields is to be updated automatically. This mirrors the SQLAlchemy model, since I want to add all the fields manually.
So my question is: let's say I want to create one call to search by ID and another to search by name. I do not believe this schema should be used for that. Should I create another schema? Should I create one like this?
class SearchWineLabel(WineLabels):
    id: str
Should a schema be created for each purpose that cannot be fulfilled by an already existing schema?
Sorry, but I can't understand the logic behind it.
Thanks!!
If you want to search by id or name, I'm not sure you even need a schema: one or more GET parameters would usually be enough in those cases (and are usually better semantically).
In any case, a schema should be written for what the endpoint is expected to receive, rather than reusing a general schema that happens to contain the field. Think of the schemas as the input/output definitions for given resources and endpoints.
You usually want different schemas for adding and updating (since adding will require certain fields to be present, while updating may allow null or a missing field in any location).
Pydantic schemas let you express these differences without writing validation code by hand, and they will be reflected in your generated API docs under /docs.
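As a rough illustration of the "one schema per purpose" idea, here is a sketch that uses plain standard-library dataclasses in place of Pydantic models, so it runs without extra dependencies. The class names, the choice of which fields are required, and the `apply_update` helper are all assumptions made for the example. Unlike Pydantic, dataclasses do no validation.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass
class WineLabelCreate:
    """Payload for adding a label: name and year are required (an assumption)."""
    name: str
    year: int
    country: Optional[str] = None


@dataclass
class WineLabelUpdate:
    """Payload for updating a label: every field may be omitted."""
    name: Optional[str] = None
    year: Optional[int] = None
    country: Optional[str] = None


def apply_update(label: WineLabelCreate, patch: WineLabelUpdate) -> WineLabelCreate:
    """Return a copy of `label` with only the fields set in `patch` changed."""
    changes = {key: value for key, value in asdict(patch).items() if value is not None}
    return replace(label, **changes)


label = WineLabelCreate(name="Example Red", year=2015)
updated = apply_update(label, WineLabelUpdate(country="Portugal"))
print(updated)
```

With Pydantic the same split would be two BaseModel subclasses, one with required fields and one with every field Optional and defaulting to None.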
If you have data like this (from a YAML file):
items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
  ...
How would I load that into a dataclass that is explicit about the keys and types it has?
Ideally I would have:
@dataclasses.dataclass
class X:
    C>A/G>T: str
    C>G/G>C: str
    ...
Update:
SBS_Mutations = TypedDict(
    "SBS_Mutations",
    {
        "C>A/G>T": str,
        "C>G/G>C": str,
        "C>T/G>A": str,
        "T>A/A>T": str,
        "T>C/A>G": str,
        "T>G/A>C": str,
    },
)
my_data = {....}
SBS_Mutations(my_data)  # not sure how to use it here
If you want symbols like that, they obviously can't be Python identifiers, so there is little point in wanting the facilities that a dataclass, with its attribute access, gives you.
Just keep your data in dictionaries, or in Pandas dataframes, where such names can be column titles.
Otherwise, post a proper code snippet with a minimal example of where you are getting the data from; then an answer can show a proper place to translate each original name into a valid Python attribute name and help build a dynamic dataclass from it.
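For what it's worth, the translation step mentioned above could look roughly like the sketch below, using only the standard library. `to_identifier` and `dataclass_from_mapping` are hypothetical helpers invented for this example, and the renaming scheme (non-word characters become underscores, everything lower-cased) is just one possible choice.

```python
import keyword
import re
from dataclasses import make_dataclass


def to_identifier(key: str) -> str:
    """Turn an arbitrary key into a lower-case Python identifier."""
    name = re.sub(r"\W+", "_", key).strip("_").lower()
    if not name or name[0].isdigit() or keyword.iskeyword(name):
        name = f"f_{name}"  # prefix names that would otherwise be invalid
    return name


def dataclass_from_mapping(class_name: str, data: dict):
    """Build a dataclass whose fields mirror the keys of `data`, then fill it."""
    renamed = {to_identifier(key): value for key, value in data.items()}
    if len(renamed) != len(data):
        raise ValueError("two keys map to the same attribute name")
    cls = make_dataclass(class_name, [(name, type(value)) for name, value in renamed.items()])
    return cls(**renamed)


raw = {"C>A/G>T": "#string", "C>G/G>C": "#string"}
mutations = dataclass_from_mapping("SBSMutations", raw)
print(mutations.c_a_g_t)
```

Because the class is built at runtime, an IDE or static type checker will not know its fields; if that matters, the fields have to be written out by hand.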
This sounds like a good use case for my dotwiz library, which I have recently published. This provides a dict subclass which enables attribute-style dot access for nested keys.
As of the recent release, it offers a DotWizPlus implementation (a wrapper around a dict object) that also case transforms keys so that they are valid lower-cased, python identifier names, as shown below.
# requires the following dependencies:
# pip install PyYAML dotwiz
import yaml
from dotwiz import DotWizPlus
yaml_str = """
items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
"""
yaml_dict = yaml.safe_load(yaml_str)
print(yaml_dict)
dw = DotWizPlus(yaml_dict)
print(dw)
assert dw.items.c_a_g_t == '#string' # True
print(dw.to_attr_dict())
Output:
{'items': {'C>A/G>T': '#string', 'C>G/G>C': '#string'}}
✪(items=✪(c_a_g_t='#string', c_g_g_c='#string'))
{'items': {'c_a_g_t': '#string', 'c_g_g_c': '#string'}}
NB: This currently fails when accessing the key items from just a DotWiz instance, as the key name conflicts with the builtin method dict.items(). I've submitted a bug report and hope to work through this one edge case in particular.
Type Hinting
If you want type-hinting or auto-suggestions for field names, you can try something like this where you subclass from DotWizPlus:
import yaml
from dotwiz import DotWizPlus

class Item(DotWizPlus):
    c_a_g_t: str
    c_g_g_c: str

    @classmethod
    def from_yaml(cls, yaml_string: str, loader=yaml.safe_load):
        yaml_dict = loader(yaml_string)
        return cls(yaml_dict['items'])

yaml_str = """
items:
  C>A/G>T: "#string1"
  C>G/G>C: "#string2"
"""

dw = Item.from_yaml(yaml_str)
print(dw)
# ✪(c_a_g_t='#string1', c_g_g_c='#string2')
assert dw.c_a_g_t == '#string1'  # True
# auto-completion will work, as IDE knows the type is a `str`
# dw.c_a_g_t.
Dataclasses
If you would still prefer dataclasses for type-hinting purposes, there is another library you can also check out called dataclass-wizard, which can help to simplify this task as well.
More specifically, YAMLWizard makes it easier to load/dump a class object with YAML. Note that this uses the PyYAML library behind the scenes by default.
Note that I couldn't get the case-transform to work in this case, since I guess it's a bug in the underlying to_snake_case() implementation. I'm also going to submit a bug request to look into this edge case. However, for now it should work if the key name in YAML is specified a bit more explicitly:
from dataclasses import dataclass
from dataclass_wizard import YAMLWizard, json_field

yaml_str = """
items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
"""

@dataclass
class Container(YAMLWizard):
    items: 'Item'

@dataclass
class Item:
    c_a_g_t: str = json_field('C>A/G>T')
    c_g_g_c: str = json_field('C>G/G>C')

c = Container.from_yaml(yaml_str)
print(c)
# True
assert c.items.c_g_g_c == c.items.c_a_g_t == '#string'
Output:
Container(items=Item(c_a_g_t='#string', c_g_g_c='#string'))
I am new to the Python language.
My question is certainly a naive one and concerns Python syntax.
I am at the stage where I must go from theory to practice.
Here is a class (a TypeScript one) that I want to translate to Python.
class Category {
    id: number;
    type: 'shop'|'blog';
    name: string;
    slug: string;
    path: string;
    image: string|null;
    items: number;
    customFields: CustomFields;
    parents?: Category[]|null;
    children?: Category[]|null;
}
As Python is a dynamically typed language, I have doubts about how to translate:
the optional property: '?'
the associated class: customFields: CustomFields;
the arrays of the associated class (which are self-referencing) that are nullable: children?: Category[]|null;
I have always worked with typed languages until now, and it goes against my habits to just write nothing.
Would it look like this (it is a model for a django.db migration)?
from django.db import models

class Category(models.Model):
    id = models.AutoField(primary_key=True)
    type: 'shop'|'blog'
    name = models.CharField(max_length=100)
    slug = models.CharField(max_length=100)
    path = models.CharField(max_length=250)
and then ... ?
Could you also point me to some tutorials, docs, or examples for learning Python in practice?
Thanks to all of you!
When you define a function in Python, you can add type hints, but they are not required. Note that the interpreter does not enforce them at runtime; a static type checker such as mypy can verify them. If you want typed code, you can do something like this:
# For functions
def addition(a: int, b: int) -> int:
    return a + b

addition(4, 10)

# For variables or attributes
name: str = 'test'
age: int = 10
rating: float = 1.11
is_exist: bool = True
The Python documentation covers more about typing; refer to it if you need details.
It is best to learn the language by following its documentation, which has plenty of examples: https://docs.python.org/3/
If you want to learn Django, there are quite a lot of tutorials, but first of all look here: https://docs.djangoproject.com/en/3.2/
As for your example with the Category class:
The id is determined automatically, so it is not necessary to specify it explicitly in the model; see https://docs.djangoproject.com/en/3.2/topics/db/models/
If you must enforce typing in your Python code, take a look at isinstance.
For optional Class attributes in Python, you can use keyword arguments as shown in this answer.
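To tie the typing remarks back to the TypeScript class in the question, here is one possible translation using a plain dataclass with type hints. It is a sketch, not a Django model: a Django model would declare database field classes instead. `CustomFields` and its `values` attribute are placeholders, since the real class is not shown in the question, and `customFields` is renamed to `custom_fields` to follow Python naming conventions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Literal, Optional


@dataclass
class CustomFields:
    """Placeholder for the associated class; its real fields are not shown."""
    values: Dict[str, str] = field(default_factory=dict)


@dataclass
class Category:
    id: int
    type: Literal["shop", "blog"]  # 'shop' | 'blog'
    name: str
    slug: str
    path: str
    image: Optional[str]  # string | null
    items: int
    custom_fields: CustomFields  # the associated class
    # `parents?` and `children?`: optional, nullable, self-referencing lists.
    # The class name is quoted because Category is not fully defined yet.
    parents: Optional[List["Category"]] = None
    children: Optional[List["Category"]] = None


root = Category(1, "shop", "Tools", "tools", "/tools", None, 0, CustomFields())
child = Category(2, "shop", "Drills", "drills", "/tools/drills", None, 3,
                 CustomFields(), parents=[root])
print(child.parents[0].name)
```

These hints are not checked at runtime; a static type checker is needed to catch, say, a misspelled `type` value.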
I want to be able to take some Python object (more precisely, a dataclass) and dump it to its dict representation using a schema. Let me give you an example:
from marshmallow import Schema, fields
import dataclasses

@dataclasses.dataclass
class Foo:
    x: int
    y: int
    z: int

class FooSchema(Schema):
    x = fields.Int()
    y = fields.Int()

FooSchema().dump(Foo(1, 2, 3))
As you can see, the schema differs from the Foo definition. I want to somehow detect this when dumping, so that I get a ValidationError explaining that there is an extra field z. It doesn't really have to be .dump(); I looked at .load() and .validate(), but only .dump() seems to accept objects rather than just dicts.
Is there a way to do this in marshmallow? For now, when I do this dump, I just get a dictionary: {"x": 1, "y": 2}, without z of course, but with no errors whatsoever. I would want the same behavior when a key is missing from the dumped object (for example, if z were in the schema but not in Foo). This would basically serve as a sanity check on changes made to the classes themselves. If it is not possible in marshmallow, maybe you know some library or technique that makes it so?
So I had this problem today and did some digging. Based on https://github.com/marshmallow-code/marshmallow/issues/1545 it is something people are considering, but the current implementation of dump iterates through the fields listed in the schema definition, so it won't work.
The best I could get to work was:
from marshmallow import EXCLUDE, INCLUDE, RAISE, Schema, fields

class FooSchema(Schema):
    class Meta:
        unknown = INCLUDE

    x = fields.Int()
    y = fields.Int()
Which at least sort of displays as a dict.
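If the goal is only a sanity check that a dataclass and a schema agree on field names, one library-agnostic option is to compare the names yourself before dumping. The sketch below uses only the standard library; `check_fields_match` is a hypothetical helper and the schema is represented simply by a list of names. With marshmallow I believe the names could be taken from the keys of a schema instance's `fields` mapping, but treat that as an assumption to verify.

```python
import dataclasses


def check_fields_match(obj, schema_field_names):
    """Raise ValueError if a dataclass instance and a schema disagree on field names."""
    object_names = {f.name for f in dataclasses.fields(obj)}
    schema_names = set(schema_field_names)
    extra = sorted(object_names - schema_names)    # on the object, not in the schema
    missing = sorted(schema_names - object_names)  # in the schema, not on the object
    if extra or missing:
        raise ValueError(f"extra fields: {extra}; missing fields: {missing}")


@dataclasses.dataclass
class Foo:
    x: int
    y: int
    z: int


try:
    check_fields_match(Foo(1, 2, 3), ["x", "y"])
except ValueError as exc:
    message = str(exc)
    print(message)
```

This raises a plain ValueError rather than marshmallow's ValidationError, and it only compares names, not types.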