How to pass dynamic strings to pydantic fields - python

I have this code in my framework on Python 3.10 + Pydantic:
class DataInclude(BaseModel):
    currencyAccount: Literal["CNY счет", "AMD счет", "RUB счет", "USD счет", "EUR счет", "GBP счет", "CHF счет"]
I want to learn the right way to build those string values dynamically, something like:
name = ("CNY", "AMD", "RUB", "USD", "EUR", "GBP", "CHF")
class DataInclude(BaseModel):
    currencyAccount: Literal[f"{name}счет"]
I couldn't manage it with the help of a regex either.

As I mentioned in my comment already, you cannot dynamically specify a typing.Literal type.
Instead of doing that, you could just create your own enum.Enum to represent the valid currency options. Pydantic plays nicely with those. And the Enum functional API allows you to set it up dynamically.
from enum import Enum

from pydantic import BaseModel, ValidationError

CURRENCIES = (
    "CNY",
    "AMD",
    "RUB",
    "USD",
    "EUR",
    "GBP",
    "CHF",
)

Currency = Enum(
    "Currency",
    ((curr, f"{curr} счет") for curr in CURRENCIES),
    type=str,
)

class Data(BaseModel):
    currency: Currency

if __name__ == "__main__":
    obj = Data(currency="CNY счет")
    print(obj, "\n")
    try:
        Data(currency="foo")
    except ValidationError as e:
        print(e)
Output:
currency=<Currency.CNY: 'CNY счет'>
1 validation error for Data
currency
value is not a valid enumeration member ...
The drawback of the Enum functional API is that your average static type checker will not be able to infer any enumeration members. Thus, your IDE will probably not provide auto-suggestions if you want to use a member like Currency.AMD, for example. If that bothers you, consider using the regular class definition for the Currency enum.
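For reference, a class-based definition of the same enum (names and values taken from the functional version above) is fully visible to static type checkers:

```python
from enum import Enum

class Currency(str, Enum):
    """Same members as the functional definition, but checker-friendly."""
    CNY = "CNY счет"
    AMD = "AMD счет"
    RUB = "RUB счет"
    USD = "USD счет"
    EUR = "EUR счет"
    GBP = "GBP счет"
    CHF = "CHF счет"

# Lookup by value works exactly as with the functional API:
assert Currency("AMD счет") is Currency.AMD
# The str mixin makes members compare equal to plain strings:
assert Currency.AMD == "AMD счет"
```

Pydantic accepts this class exactly like the dynamically built one; the only cost is writing the members out by hand.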

Related

Using cattrs / attrs where attr name does not match keys to create an object

I am looking at moving to cattrs / attrs from a completely manual process of typing out all my classes, but I need some help understanding how to achieve the following.
This is a single example, but the data returned will vary, and sometimes not all the fields will be populated.
data = {
    "data": [
        {
            "broadcaster_id": "123",
            "broadcaster_login": "Sam",
            "language": "en",
            "subscriber_id": "1234",
            "subscriber_login": "Dave",
            "moderator_id": "12345",
            "moderator_login": "Tom",
            "delay": "0",
            "title": "Weekend Events"
        }
    ]
}

@attrs.define
class PartialUser:
    id: int
    login: str

@attrs.define
class Info:
    language: str
    title: str
    delay: int
    broadcaster: PartialUser
    subscriber: PartialUser
    moderator: PartialUser
So I understand how you would construct this, and it works perfectly fine with 1:1 mappings, as expected; but how would you create the PartialUser objects dynamically, since the names are not identical to the JSON response from the API?
instance = cattrs.structure(data["data"][0], Info)
Is there some trick to using a converter?
This would need to be done for around 70 classes which is why I thought maybe cattrs could modernise and simplify what I'm trying to do.
thanks
Here's one possible solution.
This is the strategy: we will customize the structuring hook by wrapping it. The default hook expects the keys in the input dictionary to match the structure of the class, but here this is not the case. So we'll substitute our own structuring hook that does a little preprocessing and then calls into the default hook.
The default hook for an attrs class cls can be retrieved like this:
from cattrs import Converter
from cattrs.gen import make_dict_structure_fn
c = Converter()
handler = make_dict_structure_fn(cls, c)
Knowing this, we can implement a helper function thusly:
from typing import Any

def group_by_prefix(cls: type, c: Converter, *prefixes: str) -> None:
    handler = make_dict_structure_fn(cls, c)

    def prefix_grouping_hook(val: dict[str, Any], _) -> Any:
        by_prefix: dict[str, Any] = {}
        for key in val:
            if "_" in key and (prefix := (parts := key.split("_", 1))[0]) in prefixes:
                by_prefix.setdefault(prefix, {})[parts[1]] = val[key]
        return handler(val | by_prefix, _)

    c.register_structure_hook(cls, prefix_grouping_hook)
This function takes an attrs class cls, a converter, and a list of prefixes. Then it creates a hook and registers it with the converter for the class cls. Inside, it does a little bit of preprocessing to beat the data into the shape cattrs expects.
Here's how you'd use it for the Info class:
>>> c = Converter()
>>> group_by_prefix(Info, c, "broadcaster", "subscriber", "moderator")
>>> print(c.structure(data["data"][0], Info))
Info(language='en', title='Weekend Events', delay=0, broadcaster=PartialUser(id=123, login='Sam'), subscriber=PartialUser(id=1234, login='Dave'), moderator=PartialUser(id=12345, login='Tom'))
You can use this approach to make the solution more elaborate as needed.
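As an aside, the preprocessing at the heart of prefix_grouping_hook is plain dictionary manipulation, so it can be exercised on its own without cattrs. A stdlib-only sketch (the helper name is made up for illustration):

```python
from typing import Any

def group_keys_by_prefix(val: dict[str, Any], *prefixes: str) -> dict[str, Any]:
    """Copy `val`, adding one nested dict per prefix:
    'broadcaster_id' -> out['broadcaster']['id'], etc."""
    out: dict[str, Any] = dict(val)
    for key, value in val.items():
        if "_" in key:
            prefix, rest = key.split("_", 1)
            if prefix in prefixes:
                out.setdefault(prefix, {})[rest] = value
    return out

row = {"broadcaster_id": "123", "broadcaster_login": "Sam", "language": "en"}
grouped = group_keys_by_prefix(row, "broadcaster")
assert grouped["broadcaster"] == {"id": "123", "login": "Sam"}
assert grouped["language"] == "en"
```

The registered hook does exactly this grouping and then hands the augmented dict to the default structuring function, which ignores the leftover flat keys.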

Customize JSON representation of Pydantic model

I have a Pydantic model defined as follows:
class IntOrString(BaseModel):
    int_value: Optional[StrictInt] = None
    string_value: Optional[StrictStr] = None
Is there a way I can customize json() to make the output as follows:
p = IntOrString(int_value=123)
print(p.json())
#> 123
p = IntOrString(string_value="Hello World")
print(p.json())
#> "Hello World"
Note: IntOrString can be a nested attribute of another Pydantic model.
In addition to objects (e.g. {"id": 123}), strings, numbers, and booleans are also valid JSON types. In other words, the question is: can a Pydantic model be serialized to a string, number, or boolean instead of an object?
I know it's a weird requirement. Just want to know if that's possible.
Thank you.
For such a simple thing as excluding None-valued fields in the JSON representation, you can simply use the built-in exclude_none parameter:
from typing import Optional

from pydantic import BaseModel, StrictInt, StrictStr

class Dummy(BaseModel):
    id: Optional[StrictInt] = None
    name: Optional[StrictStr] = None

class Other(BaseModel):
    dummy: Dummy

if __name__ == '__main__':
    p = Dummy(id=123)
    print(p.json(exclude_none=True))
    p = Dummy(name="Hello World")
    print(p.json(exclude_none=True))
    o = Other(dummy=Dummy(id=123))
    print(o.json(exclude_none=True))
Output:
{"id": 123}
{"name": "Hello World"}
{"dummy": {"id": 123}}
If you want more complex stuff, you may want to provide your own custom JSON encoder either via the encoder parameter on a call-by-call basis or in the model config via json_dumps or json_encoders.
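If the goal really is to collapse the wrapper to a bare JSON scalar, a custom encoder hook is one route. Here is a minimal stdlib-only sketch of the idea, with a plain dataclass standing in for the Pydantic model (in Pydantic v1, a similar function could be wired in via the encoder parameter or the json_dumps config option mentioned above):

```python
import json
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class IntOrString:
    int_value: Optional[int] = None
    string_value: Optional[str] = None

def unwrap(obj: Any) -> Any:
    # Encoder hook: collapse the wrapper to whichever value is set.
    if isinstance(obj, IntOrString):
        return obj.int_value if obj.int_value is not None else obj.string_value
    raise TypeError(f"Cannot serialize {type(obj).__name__}")

assert json.dumps(IntOrString(int_value=123), default=unwrap) == "123"
assert json.dumps(IntOrString(string_value="Hello World"), default=unwrap) == '"Hello World"'
# Works when the wrapper is nested inside another structure, too:
assert json.dumps({"value": IntOrString(int_value=7)}, default=unwrap) == '{"value": 7}'
```

Because json.dumps calls the default hook for any object it cannot serialize, nesting "just works": each wrapper is replaced by its bare value wherever it appears.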

Conditional call of a FastAPI Model

I have a multilang FastAPI connected to MongoDB. My document in MongoDB is duplicated in the two languages available and structured this way (simplified example):
{
    "_id": xxxxxxx,
    "en": {
        "title": "Drinking Water Composition",
        "description": "Drinking water composition expressed in... with pesticides.",
        "category": "Water",
        "tags": ["water", "pesticides"]
    },
    "fr": {
        "title": "Composition de l'eau de boisson",
        "description": "Composition de l'eau de boisson exprimée en... présence de pesticides....",
        "category": "Eau",
        "tags": ["eau", "pesticides"]
    }
}
I therefore implemented two models, DatasetFR and DatasetEN, each one making references to language-specific external models (Enum) for category and tags.
class DatasetFR(BaseModel):
    title: str
    description: str
    category: CategoryFR
    tags: Optional[List[TagsFR]]
# same for DatasetEN, changing the lang tag to EN
In the route definitions I forced the language parameter to declare the corresponding model and get the corresponding validation.
@router.post("?lang=fr", response_description="Add a dataset")
async def create_dataset(request: Request, dataset: DatasetFR = Body(...), lang: str = "fr"):
    ...
    return JSONResponse(status_code=status.HTTP_201_CREATED, content=created_dataset)

@router.post("?lang=en", response_description="Add a dataset")
async def create_dataset(request: Request, dataset: DatasetEN = Body(...), lang: str = "en"):
    ...
    return JSONResponse(status_code=status.HTTP_201_CREATED, content=created_dataset)
But this seems to contradict the DRY principle. So I wonder if someone knows an elegant solution to, given the parameter lang, dynamically call the corresponding model.
Or, alternatively, whether we can create a parent model Dataset that takes the lang argument and retrieves the child Dataset model.
This would incredibly ease building my API routes and calling my models, and mathematically divide the writing by two...
There are two parts to the answer (the API call and the data structure).
For the API call, you could separate them into two routes like /api/v1/fr/... and /api/v1/en/... (separating resource representations!) and use fastapi.APIRouter to declare the same route twice, swapping in the validation schema you want for each route.
You could start by declaring a common BaseModel as an ABC, as well as an ABCEnum:
from abc import ABC

from pydantic import BaseModel

class MyModelABC(ABC, BaseModel):
    attribute1: MyEnumABC

class MyModelFr(MyModelABC):
    attribute1: MyEnumFR

class MyModelEn(MyModelABC):
    attribute1: MyEnumEn
Then you can select the accurate model for the routes through a class factory:
my_class_factory: dict[str, type[MyModelABC]] = {
    "fr": MyModelFr,
    "en": MyModelEn,
}
Finally, you can create your routes through a route factory:
def generate_language_specific_router(language: str, ...) -> APIRouter:
    router = APIRouter(prefix=f"/{language}")
    MySelectedModel: type[MyModelABC] = my_class_factory[language]

    @router.post("/")
    def post_something(my_model_data: MySelectedModel):
        # My internal logic
        ...

    return router
About the second part (internal computation and data storage), internationalisation is often done through hashmaps.
The standard Python library gettext could be investigated.
Otherwise, the original language can be explicitly used as the key/hash, mapping translations to it (also including the original language if you want consistency in your calls).
It can look like:
dictionnary_of_babel = {
    "word1": {
        "en": "word1",
        "fr": "mot1",
    },
    "word2": {
        "en": "word2",
    },
    "Drinking Water Composition": {
        "en": "Drinking Water Composition",
        "fr": "Composition de l'eau de boisson",
    },
}

my_arbitrary_object = {
    "attribute1": "word1",
    "attribute2": "word2",
    "attribute3": "Drinking Water Composition",
}

my_translated_object = {}
for attribute, english_sentence in my_arbitrary_object.items():
    if "fr" in dictionnary_of_babel[english_sentence].keys():
        my_translated_object[attribute] = dictionnary_of_babel[english_sentence]["fr"]
    else:
        my_translated_object[attribute] = dictionnary_of_babel[english_sentence]["en"]  # or without "en"

expected_translated_object = {
    "attribute1": "mot1",
    "attribute2": "word2",
    "attribute3": "Composition de l'eau de boisson",
}
assert expected_translated_object == my_translated_object
This code should run as is
A proposal for the MongoDB representation, if we don't want a separate table for translations, could be a data structure such as:
# normal:
my_attribute: "sentence"

# internationalized:
my_attribute_internationalized: {
    sentence: {
        original_lang: "sentence",
        lang1: "sentence_lang1",
        lang2: "sentence_lang2",
    }
}
A simple tactic to generalize string translation is to define a small helper function _() that embeds the translation, like:
CURRENT_MODULE_LANG = "fr"

def _(original_string: str) -> str:
    """Switch from original_string to its translation"""
    return dictionnary_of_babel[original_string][CURRENT_MODULE_LANG]
Then call it everywhere a translation is needed:
>>> _("word1")
'mot1'
You can find a reference to this practice in the django documentation about internationalization-in-python-code.
For static translation (for example a website or a documentation), you can use .po files and editors like poedit (See the french translation of python docs for a practical usecase)!
Option 1
A solution would be the following. Define lang as a Query parameter and add a regular expression that the parameter should match. In your case, that would be ^(fr|en)$, meaning that only fr or en would be valid inputs. Thus, if no match is found, the request stops there and the client receives a "string does not match regex..." error.
Next, define the body parameter as a generic dict and declare it as a Body field, thus instructing FastAPI to expect a JSON body.
Following, create a dictionary of your models that you can use to look up a model by the lang attribute. Once you find the corresponding model, try to parse the JSON body using models[lang].parse_obj(body) (equivalent to using models[lang](**body)). If no ValidationError is raised, you know the resulting model instance is valid. Otherwise, return an HTTP_422_UNPROCESSABLE_ENTITY error, including the errors, which you can handle as desired.
If you would also like FR and EN to be valid lang values, adjust the regex to ignore case using ^(?i:fr|en)$ instead (the scoped inline flag, which, unlike (?i) in mid-pattern, remains valid on Python 3.11+), and make sure to convert lang to lower case when looking up the model (i.e., models[lang.lower()].parse_obj(body)).
import pydantic
from fastapi import FastAPI, Response, status, Body, Query
from fastapi.responses import JSONResponse
from fastapi.encoders import jsonable_encoder
models = {"fr": DatasetFR, "en": DatasetEN}

@router.post("/", response_description="Add a dataset")
async def create_dataset(body: dict = Body(...), lang: str = Query(..., regex="^(fr|en)$")):
    try:
        model = models[lang].parse_obj(body)
    except pydantic.ValidationError as e:
        return Response(content=e.json(), status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, media_type="application/json")
    return JSONResponse(content=jsonable_encoder(model.dict()), status_code=status.HTTP_201_CREATED)
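The case-insensitive pattern suggested above can be checked quickly with the standard re module (the scoped (?i:...) form is valid wherever it appears in a pattern):

```python
import re

# Scoped inline flag: case-insensitivity applies only inside the group.
pattern = re.compile(r"^(?i:fr|en)$")

assert pattern.match("fr")
assert pattern.match("FR")
assert pattern.match("En")
assert pattern.match("de") is None
```

FastAPI applies the same kind of pattern via the regex argument of Query, so anything the pattern rejects never reaches the route body.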
Update
Since the two models share attributes (i.e., title and description), you could define a parent model (e.g., Dataset) with those two attributes and have the DatasetFR and DatasetEN models inherit them.
class Dataset(BaseModel):
    title: str
    description: str

class DatasetFR(Dataset):
    category: CategoryFR
    tags: Optional[List[TagsFR]]

class DatasetEN(Dataset):
    category: CategoryEN
    tags: Optional[List[TagsEN]]
Additionally, it might be a better approach to move the logic from inside the route into a dependency function and have it return the model if it passes validation; otherwise, raise an HTTPException, as also demonstrated by @tiangolo. You can use jsonable_encoder, which is used internally by FastAPI, to encode the validation errors (i.e., e.errors()); the same function can also be used when returning the JSONResponse.
from fastapi import Depends
from fastapi.exceptions import HTTPException

models = {"fr": DatasetFR, "en": DatasetEN}

async def checker(body: dict = Body(...), lang: str = Query(..., regex="^(fr|en)$")):
    try:
        model = models[lang].parse_obj(body)
    except pydantic.ValidationError as e:
        raise HTTPException(detail=jsonable_encoder(e.errors()), status_code=status.HTTP_422_UNPROCESSABLE_ENTITY)
    return model

@router.post("/", response_description="Add a dataset")
async def create_dataset(model: Dataset = Depends(checker)):
    return JSONResponse(content=jsonable_encoder(model.dict()), status_code=status.HTTP_201_CREATED)
Option 2
A further approach would be to have a single Pydantic model (let's say Dataset) and customise the validators for the category and tags fields. You can also define lang as part of Dataset, so there is no need to have it as a query parameter. You can use a set, as described here, to keep the values of each Enum class, so that you can efficiently check whether a value exists in the Enum; and have dictionaries to quickly look up the right set using the lang attribute. In the case of tags, to verify that every element in the list is valid, use set.issubset, as described here. If an attribute is not valid, you can raise ValueError, as shown in the documentation, "which will be caught and used to populate ValidationError" (see the "Note" section here). Again, if you need the lang codes written in uppercase to be valid inputs, adjust the regex pattern, as described earlier.
P.S. You don't even need to use Enum with this approach. Instead, populate each set below with the permitted values. For instance:
categories_FR = {"Eau"}
categories_EN = {"Water"}
tags_FR = {"eau", "pesticides"}
tags_EN = {"water", "pesticides"}
Additionally, if you would rather not use a regex, but have a custom validation error for the lang attribute as well, you could add it in the same validator decorator and perform validation similar (and previous) to the other two fields.
from pydantic import validator

categories_FR = set(item.value for item in CategoryFR)
categories_EN = set(item.value for item in CategoryEN)
tags_FR = set(item.value for item in TagsFR)
tags_EN = set(item.value for item in TagsEN)
cats = {"fr": categories_FR, "en": categories_EN}
tags = {"fr": tags_FR, "en": tags_EN}

def raise_error(values):
    raise ValueError(f'value is not a valid enumeration member; permitted: {values}')

class Dataset(BaseModel):
    lang: str = Body(..., regex="^(fr|en)$")
    title: str
    description: str
    category: str
    tags: List[str]

    @validator("category", "tags")
    def validate_atts(cls, v, values, field):
        lang = values.get('lang')
        if lang:
            if field.name == "category":
                if v not in cats[lang]: raise_error(cats[lang])
            elif field.name == "tags":
                if not set(v).issubset(tags[lang]): raise_error(tags[lang])
        return v

@router.post("/", response_description="Add a dataset")
async def create_dataset(model: Dataset):
    return JSONResponse(content=jsonable_encoder(model.dict()), status_code=status.HTTP_201_CREATED)
Option 3
Another approach would be to use Discriminated Unions, as described in this answer.
As per the documentation:
When Union is used with multiple submodels, you sometimes know exactly which submodel needs to be checked and validated and want to enforce this. To do that you can set the same field - let's call it my_discriminator - in each of the submodels with a discriminated value, which is one (or many) Literal value(s). For your Union, you can set the discriminator in its value: Field(discriminator='my_discriminator').
Setting a discriminated union has many benefits:
validation is faster since it is only attempted against one model
only one explicit error is raised in case of failure
the generated JSON schema implements the associated OpenAPI specification
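A sketch of what Option 3 could look like for this use case, with lang as the discriminator. The models below are simplified stand-ins (plain str placeholders instead of the asker's Category/Tags enums); the point is only to show the dispatch mechanism:

```python
from typing import Literal, Union

from pydantic import BaseModel, Field

class DatasetFR(BaseModel):
    lang: Literal["fr"]
    title: str
    category: str  # placeholder; the real model would use CategoryFR

class DatasetEN(BaseModel):
    lang: Literal["en"]
    title: str
    category: str  # placeholder; the real model would use CategoryEN

class DatasetIn(BaseModel):
    # Pydantic inspects `lang` and validates against exactly one submodel.
    dataset: Union[DatasetFR, DatasetEN] = Field(discriminator="lang")

obj = DatasetIn(dataset={"lang": "fr", "title": "Composition de l'eau de boisson", "category": "Eau"})
assert isinstance(obj.dataset, DatasetFR)
```

With this in place a single route taking DatasetIn replaces the per-language routes, and an invalid or missing lang produces a single clear validation error.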

How to define Python Enum properties if MySQL ENUM values have space in their names?

I have Python Enum class like this:
from enum import Enum

class Seniority(Enum):
    Intern = "Intern"
    Junior_Engineer = "Junior Engineer"
    Medior_Engineer = "Medior Engineer"
    Senior_Engineer = "Senior Engineer"
In the MySQL database, the seniority ENUM column has the values "Intern", "Junior Engineer", "Medior Engineer", "Senior Engineer".
The problem is that I get an error:
LookupError: "Junior Engineer" is not among the defined enum values
This error occurs when I call a query like:
UserProperty.query.filter_by(full_name='John Doe').first()
seniority is an enum property on the UserProperty model.
class UserProperty(db.Model):
    ...
    seniority = db.Column(db.Enum(Seniority), nullable=True)
    ...
For this class I've defined a schema class using marshmallow's Schema and EnumField from the marshmallow_enum package:
class UserPropertySchema(Schema):
    ...
    seniority = EnumField(Seniority, by_value=True)
    ...
What can I do in this situation, given that I can't define a Python class property name with a space in it? How do I force Python to use the values of the defined properties instead of the property names?
As Shenanigator stated in the comments on my question, we can use aliases to solve this problem. In the Enum functional API, a member whose value repeats an earlier one becomes an alias, so each underscore name below is an alias for the corresponding spaced name:
Seniority = Enum(
    value='Seniority',
    names=[
        ('Intern', 'Intern'),
        ('Junior Engineer', 'Junior Engineer'),
        ('Junior_Engineer', 'Junior Engineer'),
        ('Medior Engineer', 'Medior Engineer'),
        ('Medior_Engineer', 'Medior Engineer'),
        ('Senior Engineer', 'Senior Engineer'),
        ('Senior_Engineer', 'Senior Engineer'),
    ]
)
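A quick self-contained check of the alias behaviour (note that the underscore name reuses the spaced value, which is what makes it an alias rather than a separate member):

```python
from enum import Enum

Seniority = Enum(
    value="Seniority",
    names=[
        ("Intern", "Intern"),
        ("Junior Engineer", "Junior Engineer"),
        ("Junior_Engineer", "Junior Engineer"),  # alias: same value as above
    ],
)

# Lookup by the database value succeeds:
assert Seniority("Junior Engineer").value == "Junior Engineer"
# The identifier-style name is just another way to reach the same member:
assert Seniority.Junior_Engineer is Seniority["Junior Engineer"]
```

This gives SQLAlchemy the spaced names it expects while keeping attribute-friendly names available in Python code.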

Prohibit unknown values?

Can I raise an error with colander, if values are in the payload that are not in the schema? Thus, allowing only whitelisted fields?
This is a sample:
# coding=utf-8
from colander import Int, Length, MappingSchema, SchemaNode, String

class SamplePayload(MappingSchema):
    name = SchemaNode(String())
    foo = SchemaNode(Int())

class Sample(MappingSchema):
    type = SchemaNode(String(), validator=Length(max=32))
    payload = SamplePayload()

# This json should not be accepted (and should yield something like: Unknown field in payload: bar)
{
    "type": "foo",
    "payload": {
        "name": "a name",
        "foo": 123,
        "bar": false
    }
}
Yes, see the docs of colander.Mapping
Creating a mapping with colander.Mapping(unknown='raise') will cause a colander.Invalid exception to be raised when unknown keys are present in the cstruct during deserialization.
According to issue 116 in the tracker, the way to apply this to a Schema object is to override the schema_type method:
import colander
from colander import Int, MappingSchema, SchemaNode, String

class StrictMappingSchema(MappingSchema):
    def schema_type(self, **kw):
        return colander.Mapping(unknown='raise')

class SamplePayload(StrictMappingSchema):
    name = SchemaNode(String())
    foo = SchemaNode(Int())
