If you have data like this (from a YAML file):

items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
  ...

How would you load that into a dataclass that is explicit about the keys and types it has?
Ideally I would have:
@dataclasses.dataclass
class X:
    C>A/G>T: str
    C>G/G>C: str
    ...
Update:
from typing import TypedDict

SBS_Mutations = TypedDict(
    "SBS_Mutations",
    {
        "C>A/G>T": str,
        "C>G/G>C": str,
        "C>T/G>A": str,
        "T>A/A>T": str,
        "T>C/A>G": str,
        "T>G/A>C": str,
    },
)
my_data = {....}
SBS_Mutations(my_data) # not sure how to use it here
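For what it's worth, a TypedDict does not convert anything at runtime; it only tells the type checker which keys and value types the dict has, so it would be used roughly like this (assuming the YAML above lives in a hypothetical data.yaml):

import yaml

with open("data.yaml") as f:
    raw = yaml.safe_load(f)

my_data: SBS_Mutations = raw["items"]   # just an annotation; no runtime conversion happens
print(my_data["C>A/G>T"])               # keys are accessed by subscripting, not as attributes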
If you want keys like that, they obviously can't be Python identifiers, and so it is pointless to reach for the facilities that a dataclass, with attribute access, gives you.
Just keep your data in dictionaries, or in Pandas dataframes, where such names can be column titles.
Otherwise, post a proper code snippet with a minimal example of where you are getting the data from; an answer can then show where to translate your original names into valid Python attribute names and help build a dynamic dataclass from them, as sketched below.
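For illustration, a hedged sketch of that approach using dataclasses.make_dataclass (the to_identifier helper is made up here and does not handle every corner case, e.g. keys that start with a digit):

import dataclasses
import re

def to_identifier(name: str) -> str:
    # replace every character that is not valid in an identifier with "_"
    return re.sub(r"\W", "_", name).strip("_").lower()

raw = {"C>A/G>T": "#string", "C>G/G>C": "#string"}

# build a dataclass dynamically from the sanitized key names
SBSMutations = dataclasses.make_dataclass(
    "SBSMutations", [(to_identifier(k), str) for k in raw]
)
instance = SBSMutations(**{to_identifier(k): v for k, v in raw.items()})
print(instance.c_a_g_t)  # '#string'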
This sounds like a good use case for my dotwiz library, which I have recently published. This provides a dict subclass which enables attribute-style dot access for nested keys.
As of the recent release, it offers a DotWizPlus implementation (a wrapper around a dict object) that also case-transforms keys so that they are valid, lower-cased Python identifier names, as shown below.
# requires the following dependencies:
# pip install PyYAML dotwiz
import yaml
from dotwiz import DotWizPlus
yaml_str = """
items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
"""
yaml_dict = yaml.safe_load(yaml_str)
print(yaml_dict)
dw = DotWizPlus(yaml_dict)
print(dw)
assert dw.items.c_a_g_t == '#string' # True
print(dw.to_attr_dict())
Output:
{'items': {'C>A/G>T': '#string', 'C>G/G>C': '#string'}}
✪(items=✪(c_a_g_t='#string', c_g_g_c='#string'))
{'items': {'c_a_g_t': '#string', 'c_g_g_c': '#string'}}
NB: This currently fails when accessing the key items from a plain DotWiz instance, as the key name conflicts with the built-in dict.items() method. I've filed a bug report and will hopefully work through this particular edge case.
Type Hinting
If you want type-hinting or auto-suggestions for field names, you can try something like this where you subclass from DotWizPlus:
import yaml
from dotwiz import DotWizPlus
class Item(DotWizPlus):
    c_a_g_t: str
    c_g_g_c: str

    @classmethod
    def from_yaml(cls, yaml_string: str, loader=yaml.safe_load):
        yaml_dict = loader(yaml_string)
        return cls(yaml_dict['items'])
yaml_str = """
items:
  C>A/G>T: "#string1"
  C>G/G>C: "#string2"
"""
dw = Item.from_yaml(yaml_str)
print(dw)
# ✪(c_a_g_t='#string1', c_g_g_c='#string2')
assert dw.c_a_g_t == '#string1' # True
# auto-completion will work, as IDE knows the type is a `str`
# dw.c_a_g_t.
Dataclasses
If you would still prefer dataclasses for type-hinting purposes, there is another library you can also check out called dataclass-wizard, which can help to simplify this task as well.
More specifically, YAMLWizard makes it easier to load/dump a class object with YAML. Note that this uses the PyYAML library behind the scenes by default.
Note that I couldn't get the case transform to work in this case; I suspect a bug in the underlying to_snake_case() implementation, and I'm going to file a bug report to look into this edge case. For now, it should work if each key name in the YAML is mapped a bit more explicitly:
from dataclasses import dataclass
from dataclass_wizard import YAMLWizard, json_field
yaml_str = """
items:
  C>A/G>T: "#string"
  C>G/G>C: "#string"
"""

@dataclass
class Container(YAMLWizard):
    items: 'Item'

@dataclass
class Item:
    c_a_g_t: str = json_field('C>A/G>T')
    c_g_g_c: str = json_field('C>G/G>C')

c = Container.from_yaml(yaml_str)
print(c)

assert c.items.c_g_g_c == c.items.c_a_g_t == '#string'  # True
Output:
Container(items=Item(c_a_g_t='#string', c_g_g_c='#string'))
Related
Python dataclasses are really great. They allow you to define classes in a very elegant way.
from dataclasses import dataclass
@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
Moreover, lots of useful tools re-use Python annotations and let you define classes (that are more like structs in other languages) in the same way. One example is pydantic.
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: Optional[datetime] = None
    friends: List[int] = []
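For instance, following the usual pydantic pattern, input data is validated and coerced when the model is constructed (the values here are purely illustrative):

external_data = {'id': '123', 'signup_ts': '2019-06-01 12:22', 'friends': [1, '2', b'3']}
user = User(**external_data)

print(user.id)         # 123 -- the string was coerced to an int
print(user.signup_ts)  # datetime.datetime(2019, 6, 1, 12, 22)
print(user.friends)    # [1, 2, 3]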
I myself use pydantic quite a lot these days. Look at an example from my recent practice:
class G6A(BaseModel):
    transaction_id: items.TransactionReference  # Transaction Id
    mpan_core: items.MPAN  # MPAN Core
    registration_date: items.CallistoDate  # Registration Date
    action_required: G6AAction  # Action Required
I parse a very inconvenient API, so I want to leave a comment on every line; that way the model works as self-documentation. The problem is that, at least for me, this looks very ugly. It's hard to scan the lines, because they look like a table with broken columns. Let's try to fix this with careful alignment:
class G6A(BaseModel):
    transaction_id: items.TransactionReference     # Transaction Id
    mpan_core: items.MPAN                          # MPAN Core
    registration_date: items.CallistoDate          # Registration Date
    action_required: G6AAction                     # Action Required
I'm sure this is much more readable. By doing so we define a structure, like an actual table, where the first column is the attribute name, the second is the attribute type, and the last one is the comment. It is actually inspired by Go structs:
type T struct {
    name  string // name of the object
    value int    // its value
}
So, my question is: are there any automatic tools (linters/formatters) that will reformat dataclasses/pydantic models the way I described above? I looked through autopep8 and black and found nothing, and googling didn't turn anything up either. Any ideas how to achieve this with existing tools?
I think yapf has something like that for comments. Check the SPACES_BEFORE_COMMENT "knob":
The number of spaces required before a trailing comment. This can be a single value (representing the number of spaces before each trailing comment) or a list of values (representing alignment column values; trailing comments within a block will be aligned to the first column value that is greater than the maximum line length within the block)
.style.yapf:
[style]
based_on_style = pep8
spaces_before_comment = 10,20,30,40,50,60,70,80
Configures alignment columns: 10, 20, ..., 80.
foo.py:
class G6A(BaseModel):
    transaction_id: items.TransactionReference  # Transaction Id
    mpan_core: items.MPAN  # MPAN Core
    registration_date: items.CallistoDate  # Registration Date
    action_required: G6AAction  # Action Required
Output of yapf foo.py:
class G6A(BaseModel):
    transaction_id: items.TransactionReference     # Transaction Id
    mpan_core: items.MPAN                          # MPAN Core
    registration_date: items.CallistoDate          # Registration Date
    action_required: G6AAction                     # Action Required
Okay, so pardon me if I don't make much sense. I face this 'ObjectId' object is not iterable error whenever I run the collection.find() functions. Going through the answers here, I'm not sure where to start. I'm new to programming, please bear with me.
Every time I hit the route which is supposed to fetch data from MongoDB, I get ValueError: [TypeError("'ObjectId' object is not iterable"), TypeError('vars() argument must have __dict__ attribute')].
Help
Exclude the "_id" from the output.
result = collection.find_one({'OpportunityID': oppid}, {'_id': 0})
I was having a similar problem myself. Not having seen your code, I am guessing the traceback similarly traces the error to FastAPI/Starlette not being able to process the "_id" field. What you therefore need to do is change the "_id" field in the results from an ObjectId to a string and rename the field to "id" (without the underscore) on return, to avoid issues with Pydantic.
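A rough sketch of that idea, reusing the find_one call from the answer above (the serialize helper name is made up for illustration):

def serialize(doc: dict) -> dict:
    # expose the BSON ObjectId as a plain string under "id",
    # so FastAPI/pydantic can encode the result as JSON
    doc["id"] = str(doc.pop("_id"))
    return doc

result = collection.find_one({'OpportunityID': oppid})
data = serialize(result) if result is not None else None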
First of all, if we had some examples of your code, this would be much easier. I can only assume that you are not mapping your MongoDB collection data to your Pydantic BaseModel correctly.
Read this:
MongoDB stores data as BSON. FastAPI encodes and decodes data as JSON strings. BSON has support for additional non-JSON-native data types, including ObjectId which can't be directly encoded as JSON. Because of this, we convert ObjectIds to strings before storing them as the _id.
I want to draw attention to the id field on this model. MongoDB uses _id, but in Python, underscores at the start of attributes have special meaning. If you have an attribute on your model that starts with an underscore, pydantic—the data validation framework used by FastAPI—will assume that it is a private variable, meaning you will not be able to assign it a value! To get around this, we name the field id but give it an alias of _id. You also need to set allow_population_by_field_name to True in the model's Config class.
Here is a working example:
First create the BaseModel:
from bson import ObjectId
from pydantic import BaseModel, Field

class PyObjectId(ObjectId):
    """ Custom Type for reading MongoDB IDs """
    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v):
        if not ObjectId.is_valid(v):
            raise ValueError("Invalid object_id")
        return ObjectId(v)

    @classmethod
    def __modify_schema__(cls, field_schema):
        field_schema.update(type="string")

class Student(BaseModel):
    id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
    first_name: str
    last_name: str

    class Config:
        allow_population_by_field_name = True
        arbitrary_types_allowed = True
        json_encoders = {ObjectId: str}
Now just unpack everything:
from fastapi import HTTPException

async def get_student(student_id) -> Student:
    data = await collection.find_one({'_id': student_id})
    if data is None:
        raise HTTPException(status_code=404, detail='Student not found.')
    student: Student = Student(**data)
    return student
Use the response_model argument inside the app decorator. Here is a sample example:
from pydantic import BaseModel

class Todo(BaseModel):
    title: str
    details: str
main.py
@app.get("/{title}", response_model=Todo)
async def get_todo(title: str):
    response = await fetch_one_todo(title)
    if not response:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail='not found')
    return response
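The example above assumes a fetch_one_todo helper; a rough sketch of what it might look like with the async motor driver (the connection details and database/collection names are made up):

import motor.motor_asyncio

client = motor.motor_asyncio.AsyncIOMotorClient("mongodb://localhost:27017")
collection = client.todo_db.todos  # hypothetical database/collection names

async def fetch_one_todo(title: str):
    # exclude the raw ObjectId so the response matches the Todo model
    return await collection.find_one({"title": title}, {"_id": 0})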
Use db.collection.find(ObjectId: "12348901384918"). Here db is the database and collection is the collection name, and use double quotes for the string.
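For reference, written out in Python/pymongo syntax that query would usually look something like the following (the database/collection names and the id value are placeholders; a real ObjectId string is 24 hex characters):

from bson import ObjectId
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["mydb"]  # hypothetical connection/database
doc = db.collection.find_one({"_id": ObjectId("62a1b2c3d4e5f6a7b8c9d0e1")})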
I was trying to iterate through all the documents and what worked for me was this solution https://github.com/tiangolo/fastapi/issues/1515#issuecomment-782835977
These lines just need to be added after the subclass of the ObjectId class. An example is given at the following link:
https://github.com/tiangolo/fastapi/issues/1515#issuecomment-782838556
I had this issue until I upgraded from MongoDB version 5.0.9 to version 6.0.0, so MongoDB made some changes on their end to handle this, if you have the ability to upgrade. I ran into this issue when creating a test server, and when I created a new test server on 6.0.0, the error was fixed.
Using Python, I am trying to create data schemas for my dataclasses using marshmallow and marshmallow-dataclass. I believe I have followed the docs, adding the decorator to my relevant dataclasses and NewTypes to the fields that are not standard Python objects. However, I am getting an error before the program even loads.
The error relates to a str not being a dataclass and I have no idea how the decorator is processing a string instead of a dataclass.
I am sure I have missed something simple, so apologies in advance if that is the case.
A summarised version of the code is:
from dataclasses import field
from typing import Any, ClassVar, Type

from marshmallow import Schema
from marshmallow_dataclass import dataclass as m_dataclass, NewType

ProjectileDataType = NewType("ProjectileDataType", Any)

@m_dataclass
class ProjectileData:
    Schema: ClassVar[Type[Schema]] = Schema

    # what created it?
    skill_name: str = field(default="None")
    # what does it look like?
    sprite: str = field(default="None")
defintions.py : https://pastebin.com/tHnVE2Gc
Error traceback: https://pastebin.com/htuqhKSU
Docs: https://github.com/lovasoa/marshmallow_dataclass , https://marshmallow.readthedocs.io/en/stable/quickstart.html
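For reference, once the decorator and imports are in place, the generated schema is normally used like this (a minimal sketch based on the ProjectileData class above):

projectile = ProjectileData(skill_name="fireball", sprite="fireball.png")
schema = ProjectileData.Schema()

print(schema.dump(projectile))   # {'skill_name': 'fireball', 'sprite': 'fireball.png'}
print(schema.load({"skill_name": "arrow", "sprite": "arrow.png"}))  # ProjectileData(skill_name='arrow', sprite='arrow.png')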
I want to use the @dataclass_json decorator to store my @dataclass instances.
I want to have many references to one object among the instances, and I want this reference structure preserved on save (so that I could modify one settings object and have the modification apply to every object that uses those settings).
This is easy while the dataclass objects live in memory, but when I store them as JSON, a copy of the instance is saved instead of a reference to it. Can I somehow deal with this?
P.S. Here's my code example:
from dataclasses import dataclass
from typing import List

from dataclasses_json import dataclass_json

@dataclass_json
@dataclass
class RadarSettings:
    freq: float = 10e9
    prf: float = 1e-3

@dataclass_json
@dataclass
class Radar:
    name: str = ""
    preset_settings: RadarSettings = None  # references to boilerplate preset settings shared by many radars
    custom_settings: RadarSettings = None  # the custom settings for this particular radar

@dataclass_json
@dataclass
class RadarScene:
    name: str = ""
    radars: List["Radar"] = None
preset = RadarSettings()
radar1 = Radar(name="mega search mode radar from hell", preset_settings=preset)
radar2 = Radar(name="satanic sensor array radar", preset_settings=preset)
# The preset_settings is one same object for both radars! If I modify it, the modifications will be applied to both radars
print(id(radar1.preset_settings), id(radar2.preset_settings))
scene_to_save = RadarScene(name="Infernal scene", radars=[radar1, radar2])
loaded_scene = RadarScene.from_json(scene_to_save.to_json())
print(id(loaded_scene.radars[0]), id(loaded_scene.radars[1]))
# Alas! Here will be two instances of preset_settings saved. I need one =(
The problem you have described is expected behavior. When you save your data in JSON format, you get a plain-text string representation of the data, with no notion of shared references.
You may fix the issue with at least a couple of approaches.
Method 1.
Load the RadarScene data, create preset = RadarSettings(), iterate over all Radars in the RadarScene, and update the preset_settings attribute: radar.preset_settings = preset. This method can be encapsulated in the RadarScene class so you can call it right after loading the data; see the sketch below.
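(The snippet below reuses the names from the question; where exactly you hook this in, e.g. as a post-load method on RadarScene, is up to you.)

# after loading, re-link every radar to one shared settings object
loaded_scene = RadarScene.from_json(scene_to_save.to_json())
preset = RadarSettings()
for radar in loaded_scene.radars:
    radar.preset_settings = preset

# both radars now share the same settings object again
assert loaded_scene.radars[0].preset_settings is loaded_scene.radars[1].preset_settings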
Method 2.
Create a new singleton class RadarSettingsDefault inheriting from RadarSettings and modify the Radar class: preset_settings: RadarSettingsDefault = None.
I use PyYAML's dump to print complex data structures, but there is one class of objects that cannot, and also should not, be dumped.
Currently I get:
yaml.representer.RepresenterError: cannot represent an object
I would like yaml.dump to completely ignore this particular class or just render the classname and continue as usual.
If this is possible, how can I do that?
You'll have to provide a representer for the object. There are multiple ways to do that, some involving changing the object.
When you explicitly register a representer, the object doesn't have to be changed:
import sys
from ruamel import yaml
class Secret():
    def __init__(self, user, password):
        self.user = user
        self.password = password

def secret_representer(dumper, data):
    return dumper.represent_scalar(u'!secret', u'unknown')

yaml.add_representer(Secret, secret_representer)

data = dict(a=1, b=2, c=[42, Secret(user='cary', password='knoop')])
yaml.dump(data, sys.stdout)
In secret_representer, data is the instantiated Secret(); since the function doesn't use it, no "secrets" are leaked. You could also e.g. return the user name, but not the password. The represent_scalar function expects a tag (here I used !secret) and a scalar (here the string unknown).
The output of the above:
a: 1
b: 2
c: [42, !secret unknown]
I am using ruamel.yaml in the above which is an upgraded version of PyYAML (disclaimer: I am the author of that package). The above should work with PyYAML as well.
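If you would rather have the dump show the class name (as the question suggests), a small variation of the same idea works; the sketch below registers it with plain PyYAML, and the !opaque tag name is just made up for illustration:

import sys
import yaml

def opaque_representer(dumper, data):
    # represent any "un-dumpable" object by its class name only
    return dumper.represent_scalar(u'!opaque', type(data).__name__)

yaml.add_representer(Secret, opaque_representer)
yaml.dump(dict(a=1, b=2, c=[42, Secret(user='cary', password='knoop')]), sys.stdout)
# the Secret entry now dumps as something like: !opaque Secret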