Pydantic: how to create a model with required fields and dynamic fields? - python

I have a dynamic JSON like:
{
    "ts": 1111111, // this field is required
    ... // others are dynamic
}
The fields are all dynamic except ts. E.g.:
{"ts": 111, "f1": "aa", "f2": "bb"}
{"ts": 222, "f3": "cc", "f4": "dd"}
How do I declare this model with Pydantic?
class JsonData(BaseModel):
    ts: int
    ... # and then?
Equivalent TypeScript:
interface JsonData {
    ts: number;
    [key: string]: any;
}
Thanks.

Without Type Validation
If type validation doesn't matter, you can use the extra option "allow":
'allow' will assign the attributes to the model.
class JsonData(BaseModel):
    ts: int

    class Config:
        extra = "allow"
This will give:
data = JsonData.parse_raw("""
{
    "ts": 222,
    "f3": "cc",
    "f4": "dd"
}
""")
repr(data)
# "JsonData(ts=222, f4='dd', f3='cc')"
And the fields can be accessed via:
print(data.f3)
# cc
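If you only need the dynamic fields, here is a small sketch (pydantic v1; it relies on the model's __fields__ attribute holding the declared fields):
# Filter out the declared fields to keep only the dynamic ones:
extras = {k: v for k, v in data.dict().items() if k not in data.__fields__}
print(extras)
# {'f3': 'cc', 'f4': 'dd'}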
With Type Validation
However, if you could change the request body to contain an object holding the "dynamic fields", like the following:
{
    "ts": 111,
    "fields": {
        "f1": "aa",
        "f2": "bb"
    }
}
or
{
    "ts": 222,
    "fields": {
        "f3": "cc",
        "f4": "dd"
    }
}
you could use a Pydantic model like this one:
from pydantic import BaseModel

class JsonData(BaseModel):
    ts: int
    fields: dict[str, str] = {}
That way, any number of fields could be processed, while the field type would be validated, e.g.:
# No "extra" fields; yields an empty dict:
data = JsonData.parse_raw("""
{
    "ts": "222"
}
""")
repr(data)
# "JsonData(ts=222, fields={})"
# One extra field:
data = JsonData.parse_raw("""
{
    "ts": 111,
    "fields": {
        "f1": "aa"
    }
}
""")
repr(data)
# "JsonData(ts=111, fields={'f1': 'aa'})"
# Several extra fields:
data = JsonData.parse_raw("""
{
    "ts": 222,
    "fields": {
        "f2": "bb",
        "f3": "cc",
        "f4": "dd"
    }
}
""")
repr(data)
# "JsonData(ts=222, fields={'f2': 'bb', 'f3': 'cc', 'f4': 'dd'})"
The fields can be accessed like this:
print(data.fields["f2"])
# bb
In this context, you might also want to consider using StrictStr instead of str for the field values. With plain str, values of other types get coerced into strings, for example when an integer or float is passed in; that doesn't happen with StrictStr.
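For instance, a minimal sketch of the stricter variant:
from pydantic import BaseModel, StrictStr

class JsonData(BaseModel):
    ts: int
    fields: dict[str, StrictStr] = {}

# A non-string value now fails instead of being silently coerced:
JsonData.parse_raw('{"ts": 1, "fields": {"f1": 123}}')
# pydantic.error_wrappers.ValidationError: 1 validation error for JsonData
# fields -> f1
#   str type expected (type=type_error.str)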
See also: this workaround.

Related

Validating nested dict with Pydantic `create_model`

I am using create_model to validate a config file which contains many nested dicts. In the example below, I can validate everything except the last nesting level with sunrise and sunset.
class System(BaseModel):
    data: Optional[create_model('Data', type=(str, ...), daytime=(dict, ...))] = None

try:
    p = System.parse_obj({
        'data': {
            'type': 'solar',
            'daytime': {
                'sunrise': 5,
                'sunset': 10
            }
        }})
    print(p.dict())
    vals = p.dict()
    print(vals['data']['daytime'], type(vals['data']['daytime']['sunrise']))
except ValidationError as e:
    print(e)
How can I incorporate a nested dict in create_model and ensure sunrise and sunset are validated, or is there another way to validate this?
Thanks for your feedback.
It's not entirely clear what you are trying to achieve, so here's my best guess:
Pydantic's create_model() is meant to be used when the shape of a model is not known until runtime.
From your code it's not clear whether or not this is really necessary.
Here's how I would approach this.
(The code below is using Python 3.9 and above type hints. If you're using an earlier version, you might have to replace list with typing.List and dict with typing.Dict.)
Using a "Static" Model
from pydantic import BaseModel, create_model
from typing import Optional
class Data(BaseModel):
    type: str
    daytime: dict[str, int]  # <- explicit types in the dict, values will be coerced

class System(BaseModel):
    data: Optional[Data]

system = {
    "data": {
        "type": "solar",
        "daytime": {
            "sunrise": 1,
            "sunset": 10
        }
    }
}
p = System.parse_obj(system)
print(repr(p))
# System(data=Data(type='solar', daytime={'sunrise': 1, 'sunset': 10}))
This will accept integers in daytime but not other types, like strings.
That means, something like this:
system = {
    "data": {
        "type": "solar",
        "daytime": {
            "sunrise": "some string",
            "sunset": 10
        }
    }
}
p = System.parse_obj(system)
will fail with:
pydantic.error_wrappers.ValidationError: 1 validation error for System
data -> daytime -> sunrise
  value is not a valid integer (type=type_error.integer)
If you want more control over how "daytime" is being parsed, you could create an additional model for it, e.g.:
class Daytime(BaseModel):
    sunrise: int
    sunset: int

class Data(BaseModel):
    type: str
    daytime: Daytime

class System(BaseModel):
    data: Optional[Data]
This will work as above; however, only the parameters sunrise and sunset will be parsed, and anything else that might be inside "daytime" will be ignored (by default).
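If silently ignoring extras is not what you want, here is a brief sketch (pydantic v1 config) that forbids unknown keys so they raise a validation error instead:
class Daytime(BaseModel):
    sunrise: int
    sunset: int

    class Config:
        extra = "forbid"  # unknown keys inside "daytime" now raise a ValidationError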
Using a "Dynamic" Model
If you really need to create the model dynamically, you can use a similar approach:
class System(BaseModel):
    data: Optional[
        create_model(
            "Data",
            type=(str, ...),
            daytime=(dict[str, int], ...),  # <- explicit types in dict
        )
    ] = None

system = {
    "data": {
        "type": "solar",
        "daytime": {
            "sunrise": 1,
            "sunset": 10
        }
    }
}
p = System.parse_obj(system)
print(repr(p))
will work, while
system = {
    "data": {
        "type": "solar",
        "daytime": {
            "sunrise": "abc",
            "sunset": 10
        }
    }
}
p = System.parse_obj(system)
will fail with
pydantic.error_wrappers.ValidationError: 1 validation error for System
data -> daytime -> sunrise
  value is not a valid integer (type=type_error.integer)
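If the nested part needs to be dynamic as well, a sketch: nest one create_model() call inside another, so sunrise and sunset are still validated as integers:
Daytime = create_model("Daytime", sunrise=(int, ...), sunset=(int, ...))
Data = create_model("Data", type=(str, ...), daytime=(Daytime, ...))

class System(BaseModel):
    data: Optional[Data] = None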

Cannot construct an Explanation object

Trying to construct an Explanation object for a unit test, but can't seem to get it to work. Here's what I'm trying:
from google.cloud import aiplatform
aiplatform.compat.types.explanation_v1.Explanation(
    attributions=aiplatform.compat.types.explanation_v1.Attribution(
        {
            "approximation_error": 0.010399332817679649,
            "baseline_output_value": 0.9280818700790405,
            "feature_attributions": {
                "feature_1": -0.0410824716091156,
                "feature_2": 0.01155053575833639,
            },
            "instance_output_value": 0.6717480421066284,
            "output_display_name": "true",
            "output_index": [0],
            "output_name": "scores",
        }
    )
)
which gives:
".venv/lib/python3.7/site-packages/proto/message.py", line 521, in __init__
super().__setattr__("_pb", self._meta.pb(**params))
TypeError: Value must be iterable
I found this on github, but I'm not sure how to apply that workaround here.
As the error mentions, the value passed to attributions should be iterable. See the Explanation attributes documentation.
I tried your code and placed the Attribution object in a list, and the error went away. I assigned your objects to variables just to keep the code readable.
See code and testing below:
from google.cloud import aiplatform
test = {
    "approximation_error": 0.010399332817679649,
    "baseline_output_value": 0.9280818700790405,
    "feature_attributions": {
        "feature_1": -0.0410824716091156,
        "feature_2": 0.01155053575833639,
    },
    "instance_output_value": 0.6717480421066284,
    "output_display_name": "true",
    "output_index": [0],
    "output_name": "scores",
}

attributions = aiplatform.compat.types.explanation_v1.Attribution(test)
x = aiplatform.compat.types.explanation_v1.Explanation(
    attributions=[attributions]
)
print(x)
Output:
attributions {
  baseline_output_value: 0.9280818700790405
  instance_output_value: 0.6717480421066284
  feature_attributions {
    struct_value {
      fields {
        key: "feature_1"
        value {
          number_value: -0.0410824716091156
        }
      }
      fields {
        key: "feature_2"
        value {
          number_value: 0.01155053575833639
        }
      }
    }
  }
  output_index: 0
  output_display_name: "true"
  approximation_error: 0.010399332817679649
  output_name: "scores"
}
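As a side note, proto-plus message fields can usually coerce plain dicts, so the following sketch (untested against this exact client version) may also work without constructing the Attribution explicitly:
# proto-plus repeated message fields typically accept plain dicts
# and coerce them into the message type:
x = aiplatform.compat.types.explanation_v1.Explanation(attributions=[test])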

How can I define custom output types for mutations with graphene-django?

Create/read/update/delete (CRUD) mutations usually return the corresponding database model instance as the output type of the mutation. However, for non-CRUD mutations I'd like to define business-logic-specific mutation output types, e.g. returning the count of list elements plus a list of IDs, which cannot be mapped 1-to-1 between GraphQL types and DB models. How can I achieve this with graphene-django?
List not related to Models
As you want to return both a count and a list of elements, you can create a custom type:
import graphene

class ListWithCountType(graphene.Scalar):
    @staticmethod
    def serialize(some_argument):
        # make computation here
        count = ...
        some_list = ...
        return {"count": count, "list": some_list}
Then on your mutation you use it like this:
class MyMutation(graphene.Mutation):
    list_with_count = graphene.Field(ListWithCountType)

    @classmethod
    def mutate(cls, root, info, **kwargs):
        some_argument = kwargs.pop("some_argument")
        return cls(list_with_count=some_argument)
Add it to your schema's mutation type:
class Mutation(graphene.ObjectType):
    my_mutation = MyMutation.Field()
Should return something like:
{
    "data": {
        "list_with_count": {
            "count": <COUNT VALUE>,
            "list": <SOME_LIST VALUE>
        }
    }
}
PS: if this type is only used as an output, that's all you need. But if you want it to be usable as an argument as well, you should also implement parse_literal and parse_value, besides serialize.
Here is an example with a custom ErrorType used with forms.
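Alternatively, here is a sketch (with assumed names) that avoids a custom scalar altogether: define a plain ObjectType and set it as the mutation's Output, so the count and the list stay typed in the schema:
import graphene

class ListWithCount(graphene.ObjectType):
    count = graphene.Int()
    items = graphene.List(graphene.ID)

class MyMutation(graphene.Mutation):
    class Arguments:
        ids = graphene.List(graphene.ID)

    Output = ListWithCount  # graphene uses this as the mutation's return type

    def mutate(root, info, ids):
        return ListWithCount(count=len(ids), items=ids)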
List related to Models
From the docs:
# cookbook/ingredients/schema.py
import graphene
from graphene_django.types import DjangoObjectType
from cookbook.ingredients.models import Category

class CategoryType(DjangoObjectType):
    class Meta:
        model = Category

class Query(object):
    all_categories = graphene.List(CategoryType)

    def resolve_all_categories(self, info, **kwargs):
        return Category.objects.all()
On your schema:
import graphene
import cookbook.ingredients.schema

class Query(cookbook.ingredients.schema.Query, graphene.ObjectType):
    pass

schema = graphene.Schema(query=Query)
Then you can query like:
query {
    allCategories {
        id
    }
}
Should return something like:
{
    "data": {
        "allCategories": [
            { "id": "1" },
            { "id": "2" },
            { "id": "3" },
            { "id": "4" }
        ]
    }
}
Here is an example with user model.

Graphene-django - How to catch response of query?

I use Django and graphene-django to build a GraphQL API.
In the view layer of my application, I use ReactJS and react-bootstrap-table. react-bootstrap-table expects to be passed an array of objects and does not support nested objects.
I created query in my schema.py:
class ApplicationNode(DjangoObjectType):
    class Meta:
        model = Application
        filter_fields = ['name', 'sonarQube_URL']
        interfaces = (relay.Node,)

class Query(ObjectType):
    application = relay.Node.Field(ApplicationNode)
    all_applications = DjangoFilterConnectionField(ApplicationNode)
The responses to these queries are nested JSON objects like this:
{
    "data": {
        "allApplications": {
            "edges": [
                {
                    "node": {
                        "id": "QXBwbGljYXRpb25Ob2RlOjE=",
                        "name": "foo",
                        "sonarQubeUrl": "foo.com",
                        "flow": {
                            "id": "QYBwbGljYXRpb45Ob2RlOjE=",
                            "name": "flow_foo"
                        }
                    }
                },
                {
                    "node": {
                        "id": "QXBwbGljYXRpb25Ob2RlOjI=",
                        "name": "bar",
                        "sonarQubeUrl": "bar.com",
                        "flow": {
                            "id": "QXBwbGljYXRpb26Ob2RlOjA=",
                            "name": "flow_bar"
                        }
                    }
                }
            ]
        }
    }
}
I have to flatten them before passing them to react-bootstrap-table.
Which is the better approach: intercepting the results of the graphene-django queries to flatten them, or doing this job in the ReactJS view?
If the first approach is better, how do I intercept the results of graphene-django queries and flatten them?
The best thing to do is to wrap react-bootstrap-table in a new component. In that component, massage the relay props into the flat structure that react-bootstrap-table needs.
For example:
const MyReactTable = ({allApplications}) => {
    let flatApplications = allApplications.edges.map(({node: app}) => {
        return {
            name: app.name,
            sonarQubeUrl: app.sonarQubeUrl,
            flowName: app.flow.name
        };
    });
    return (
        <BootstrapTable data={flatApplications} striped={true} hover={true}>
            <TableHeaderColumn dataField="name" isKey={true} dataAlign="center" dataSort={true}>Name</TableHeaderColumn>
            <TableHeaderColumn dataField="sonarQubeUrl" dataSort={true}>Sonar Qube Url</TableHeaderColumn>
            <TableHeaderColumn dataField="flowName">Flow Name</TableHeaderColumn>
        </BootstrapTable>
    );
};

define parent in elasticsearch-dsl-py

I'm trying to use elasticsearch-dsl-py to index some data from a JSONL file with many fields. Ignoring the less general parts, the code looks like this:
import json
from elasticsearch import Elasticsearch

es = Elasticsearch()
for id, line in enumerate(open(jsonlfile)):
    jline = json.loads(line)
    children = jline.pop('allChildrenOfTypeX')
    res = es.index(index="mydocs", doc_type='fatherdoc', id=id, body=jline)
    for ch in children:
        res = es.index(index="mydocs", doc_type='childx', parent=id, body=ch)
Trying to run this ends with the error:
RequestError: TransportError(400, u'illegal_argument_exception', u"Can't specify parent if no parent field has been configured")
I guess I need to tell ES in advance that childx has a parent. However, what I don't want is to map ALL the fields of both types just to do that.
Any help is greatly appreciated!
When creating your mydocs index, in the definition of your childx mapping type, you need to specify the _parent field with the value fatherdoc:
PUT mydocs
{
    "mappings": {
        "fatherdoc": {
            "properties": {
                ... parent type fields ...
            }
        },
        "childx": {
            "_parent": {          <---- add this
                "type": "fatherdoc"
            },
            "properties": {
                ... child type fields ...
            }
        }
    }
}
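Note that you don't have to map every field to achieve this; dynamic mapping can take care of the rest. A minimal sketch in Python (assuming an Elasticsearch 5.x-era cluster, where _parent mappings are still supported):
from elasticsearch import Elasticsearch

es = Elasticsearch()
# Only pin down the parent/child link; all other fields get mapped dynamically.
mapping = {
    "mappings": {
        "fatherdoc": {},
        "childx": {
            "_parent": {"type": "fatherdoc"}
        }
    }
}
es.indices.create(index="mydocs", body=mapping)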
