Using python, I am trying to create data schema for my dataclasses using marshmallow and marshmallow-dataclass. I believe I have followed the docs, adding the decorator to my relevant dataclasses and NewTypes to my fields that are not standard python objects. However, I am getting an error before the program even loads.
The error relates to a str not being a dataclass and I have no idea how the decorator is processing a string instead of a dataclass.
I am sure I have missed something simple, so apologies in advance if that is the case.
A summarised version of the code is:
from marshmallow_dataclass import dataclass as m_dataclass, NewType
ProjectileDataType = NewType("ProjectileDataType", Any)
#m_dataclass
class ProjectileData:
Schema: ClassVar[Type[Schema]] = Schema
# what created it?
skill_name: str = field(default="None")
# what does it look like?
sprite: str = field(default="None")
defintions.py : https://pastebin.com/tHnVE2Gc
Error traceback: https://pastebin.com/htuqhKSU
Docs: https://github.com/lovasoa/marshmallow_dataclass , https://marshmallow.readthedocs.io/en/stable/quickstart.html
Related
I would like to know the difference between classes built normally in python and those built with the Pydantic lib, for example:
eg normal;
class Node:
def __init__(self, chave=None, esquerda=None, direita=None):
self.chave = chave
self.esquerda = esquerda
self.direita = direita
eg pydantic;
from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel
class User(BaseModel):
id: int
name = 'John Doe'
signup_ts: Optional[datetime] = None
friends: List[int] = []
There are a few main differences.
Firstly, purpose. Pydantic models are designed to:
Data validation and settings management using python type annotations.
pydantic enforces type hints at runtime, and provides user friendly errors when data is invalid.
When you use type annotations you receive a lot of validators and some useful methods out of the box.
As Ahmed and John says, in your example you can't assign “hello” to id in BaseModel (pydantic) because you type id as an int. But you can pass a string “1” (must be a numerical, not float) and it will be mapped to int. In this case:
pydantic uses int(v) to coerce types to an int; see this warning on loss of information during data conversion
Also Pydantic models allows you to use many more types than standard python types, like urls and much more. It means that you can easily validate more data types.
You can easily create complex models using composition.
Pydantic has some kind of integration with orms: docs
There are a lot of other features, much more than I can describe in a single answer. I strongly recommend reading the documentation, it is very clear and useful.
The pydantic models are very useful for example in building microservices where you can share your interfaces as pydantic models. Also all models can easily generate the json schema. See: Schema, exporting models.
Pydantic is also a big part of a growing in popularity python web framework fastapi.
If you have data like this (from a yaml file):
items:
C>A/G>T: "#string"
C>G/G>C: "#string"
...
How would load that in a dataclass that is explicit about the keys and type it has?
Ideally I would have:
#dataclasses.dataclass
class X:
C>A/G>T: str
C>G/G>C: str
...
Update:
SBS_Mutations = TypedDict(
"SBS_Mutations",
{
"C>A/G>T": str,
"C>G/G>C": str,
"C>T/G>A": str,
"T>A/A>T": str,
"T>C/A>G": str,
"T>G/A>C": str,
},
)
my_data = {....}
SBS_Mutations(my_data) # not sure how to use it here
if you want symbols like that, they obviously can't be Python identifiers, and then, it is meaningless to want to use the facilities that a dataclass, with attribute access, gives you.
Just keep your data in dictionaries, or in Pandas dataframes, where such names can be column titles.
Otherwise, post a proper code snippet with a minimum example of where you are getting the data from, and then, one can add in an answer, a proper place to translate your orignal name into a valid Python attribute name, and help building a dynamic data class with it.
This sounds like a good use case for my dotwiz library, which I have recently published. This provides a dict subclass which enables attribute-style dot access for nested keys.
As of the recent release, it offers a DotWizPlus implementation (a wrapper around a dict object) that also case transforms keys so that they are valid lower-cased, python identifier names, as shown below.
# requires the following dependencies:
# pip install PyYAML dotwiz
import yaml
from dotwiz import DotWizPlus
yaml_str = """
items:
C>A/G>T: "#string"
C>G/G>C: "#string"
"""
yaml_dict = yaml.safe_load(yaml_str)
print(yaml_dict)
dw = DotWizPlus(yaml_dict)
print(dw)
assert dw.items.c_a_g_t == '#string' # True
print(dw.to_attr_dict())
Output:
{'items': {'C>A/G>T': '#string', 'C>G/G>C': '#string'}}
✪(items=✪(c_a_g_t='#string', c_g_g_c='#string'))
{'items': {'c_a_g_t': '#string', 'c_g_g_c': '#string'}}
NB: This currently fails when accessing the key items from just a DotWiz instance, as the key name conflicts with the builtin attribute dict.items(). I've currently submitted a bug request and hopefully work through this one edge case in particular.
Type Hinting
If you want type-hinting or auto-suggestions for field names, you can try something like this where you subclass from DotWizPlus:
import yaml
from dotwiz import DotWizPlus
class Item(DotWizPlus):
c_a_g_t: str
c_g_g_c: str
#classmethod
def from_yaml(cls, yaml_string: str, loader=yaml.safe_load):
yaml_dict = loader(yaml_str)
return cls(yaml_dict['items'])
yaml_str = """
items:
C>A/G>T: "#string1"
C>G/G>C: "#string2"
"""
dw = Item.from_yaml(yaml_str)
print(dw)
# ✪(c_a_g_t='#string1', c_g_g_c='#string2')
assert dw.c_a_g_t == '#string1' # True
# auto-completion will work, as IDE knows the type is a `str`
# dw.c_a_g_t.
Dataclasses
If you would still prefer dataclasses for type-hinting purposes, there is another library you can also check out called dataclass-wizard, which can help to simplify this task as well.
More specifically, YAMLWizard makes it easier to load/dump a class object with YAML. Note that this uses the PyYAML library behind the scenes by default.
Note that I couldn't get the case-transform to work in this case, since I guess it's a bug in the underlying to_snake_case() implementation. I'm also going to submit a bug request to look into this edge case. However, for now it should work if the key name in YAML is specified a bit more explicitly:
from dataclasses import dataclass
from dataclass_wizard import YAMLWizard, json_field
yaml_str = """
items:
C>A/G>T: "#string"
C>G/G>C: "#string"
"""
#dataclass
class Container(YAMLWizard):
items: 'Item'
#dataclass
class Item:
c_a_g_t: str = json_field('C>A/G>T')
c_g_g_c: str = json_field('C>G/G>C')
c = Container.from_yaml(yaml_str)
print(c)
# True
assert c.items.c_g_g_c == c.items.c_a_g_t == '#string'
Output:
Container(items=Item(c_a_g_t='#string', c_g_g_c='#string'))
So I've been trying to make a class using pydantic that is created through a config json file. I've been running into an issue where I am trying to set a default value.
The basic idea is that there is a step type, that can be annotated with a "type" field:
from typing import Literal, Union, List
from pydantic import Field
from typing_extensions import Annotated
import pydantic
import sys
class Type1Step(pydantic.BaseModel):
step_type: Literal["type_1"]
class Type2Step(pydantic.BaseModel):
step_type: Literal["type_2"]
StepT = Annotated[
Union[Type1Step, Type2Step],
Field(discriminator="step_type"),
]
class Plan(pydantic.BaseModel):
pre_steps: List[StepT] = ()
post_steps: List[StepT] = ()
but I get this error:
E ValueError: Field default cannot be set in Annotated for 'post_steps_0'
I think I am misunderstanding how the Annotated type works. Does anyone have any idea on what I am doing wrong? Thanks.
Edit: Issue has been solved. This was a bug solved in pydantic version 1.9.1
I was using pydantic version 1.9.0
This was a bug that was fixed in pydantic version 1.9.1:
https://github.com/samuelcolvin/pydantic/pull/4067
The webargs module allows to describe argument schemas as a pure dictionary or marshmallow dataclass schema:
# Dictionary variant
#use_args({'field1': field.Int(required=True, validate=validate.Range(min=1))}, location='json')
def post(args: Dict[str, any]):
controller.post(args)
# Marshmallow dataclass schema
#dataclass()
class Arg:
field1: int = field(metadata=dict(required=True, validate=validate.Range(min=1)))
#use_args(Arg.Schema(), location='json')
def post(arg: Arg):
controller.post(arg)
The first variant looks shorter and faster but we lose syntax highlight and type checks in IDE (cuz it's dict) and also it leads to longer calls, i.e. args['field1'] instead of arg.field1.
Which variant do you use in your big projects? Are there some best practices when to use the first or second variant?
There is no best practice. It is a matter of preference, really.
Sometimes, people think that a schema is too much code for just just one or two query args and a dict is enough.
I like to use schemas everywhere. I find it more consistent and it allows all schemas to derive from a base schema to inherit Meta parameters.
I don't use marshmallow dataclass, just pure marshmallow, so I always get a dict anyway.
Okay, so pardon me if I don't make much sense. I face this 'ObjectId' object is not iterable whenever I run the collections.find() functions. Going through the answers here, I'm not sure where to start. I'm new to programming, please bear with me.
Every time I hit the route which is supposed to fetch me data from Mongodb, I getValueError: [TypeError("'ObjectId' object is not iterable"), TypeError('vars() argument must have __dict__ attribute')].
Help
Exclude the "_id" from the output.
result = collection.find_one({'OpportunityID': oppid}, {'_id': 0})
I was having a similar problem to this myself. Not having seen your code I am guessing the traceback similarly traces the error to FastAPI/Starlette not being able to process the "_id" field - what you will therefore need to do is change the "_id" field in the results from an ObjectId to a string type and rename the field to "id" (without the underscore) on return to avoid incurring issues with Pydantic.
First of all, if we had some examples of your code, this would be much easier. I can only assume that you are not mapping your MongoDb collection data to your Pydantic BaseModel correctly.
Read this:
MongoDB stores data as BSON. FastAPI encodes and decodes data as JSON strings. BSON has support for additional non-JSON-native data types, including ObjectId which can't be directly encoded as JSON. Because of this, we convert ObjectIds to strings before storing them as the _id.
I want to draw attention to the id field on this model. MongoDB uses _id, but in Python, underscores at the start of attributes have special meaning. If you have an attribute on your model that starts with an underscore, pydantic—the data validation framework used by FastAPI—will assume that it is a private variable, meaning you will not be able to assign it a value! To get around this, we name the field id but give it an alias of _id. You also need to set allow_population_by_field_name to True in the model's Config class.
Here is a working example:
First create the BaseModel:
class PyObjectId(ObjectId):
""" Custom Type for reading MongoDB IDs """
#classmethod
def __get_validators__(cls):
yield cls.validate
#classmethod
def validate(cls, v):
if not ObjectId.is_valid(v):
raise ValueError("Invalid object_id")
return ObjectId(v)
#classmethod
def __modify_schema__(cls, field_schema):
field_schema.update(type="string")
class Student(BaseModel):
id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
first_name: str
last_name: str
class Config:
allow_population_by_field_name = True
arbitrary_types_allowed = True
json_encoders = {ObjectId: str}
Now just unpack everything:
async def get_student(student_id) -> Student:
data = await collection.find_one({'_id': student_id})
if data is None:
raise HTTPException(status_code=404, detail='Student not found.')
student: Student = Student(**data)
return student
Use the response model inside app decorator Here is the sample example
from pydantic import BaseModel
class Todo(BaseModel):
title:str
details:str
main.py
#app.get("/{title}",response_model=Todo)
async def get_todo(title:str):
response=await fetch_one_todo(title)
if not response:
raise
HTTPException(status_code=status.HTTP_404_NOT_FOUND,detail='not found')
return response
use db.collection.find(ObjectId:"12348901384918")
here db.collection is database name and use double quotes for the string .
I was trying to iterate through all the documents and what worked for me was this solution https://github.com/tiangolo/fastapi/issues/1515#issuecomment-782835977
These lines just needed to be added after the child of ObjectID class. An example is given in the following link.
https://github.com/tiangolo/fastapi/issues/1515#issuecomment-782838556
I had this issue until I upgraded from mongodb version 5.0.9 to version 6.0.0 so mongodb made some changes on their end to handle this if you have the ability to upgrade! I ran into this issue when creating a test server and when I created a new test server that was 6.0.0, it fixed the error.