Prohibit unknown values? - python

Can I make colander raise an error if the payload contains values that are not in the schema, thus allowing only whitelisted fields?
This is a sample:
# coding=utf-8
from colander import MappingSchema, SchemaNode, String, Int, Length

class SamplePayload(MappingSchema):
    name = SchemaNode(String())
    foo = SchemaNode(Int())

class Sample(MappingSchema):
    type = SchemaNode(String(), validator=Length(max=32))
    payload = SamplePayload()
# This JSON should not be accepted (and should yield something like: Unknown field in payload: bar)
{
  "type": "foo",
  "payload": {
    "name": "a name",
    "foo": 123,
    "bar": false
  }
}

Yes, see the docs of colander.Mapping
Creating a mapping with colander.Mapping(unknown='raise') will cause a colander.Invalid exception to be raised when unknown keys are present in the cstruct during deserialization.
According to issue 116 in the tracker, the way to apply this to a Schema object is to override the schema_type method:
import colander

class StrictMappingSchema(MappingSchema):
    def schema_type(self, **kw):
        return colander.Mapping(unknown='raise')

class SamplePayload(StrictMappingSchema):
    name = SchemaNode(String())
    foo = SchemaNode(Int())
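To see the strict schema in action (a minimal sketch, assuming Sample is rebuilt on top of the strict SamplePayload above):

import colander

schema = Sample()
try:
    schema.deserialize({
        "type": "foo",
        "payload": {"name": "a name", "foo": 123, "bar": False},
    })
except colander.Invalid as e:
    # colander reports something like: Unrecognized keys in mapping: "{'bar': False}"
    print(e.asdict())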

Related

Customize JSON representation of Pydantic model

I have a Pydantic model defined as follows:
class IntOrString(BaseModel):
    int_value: Optional[StrictInt] = None
    string_value: Optional[StrictStr] = None
Is there a way I can customize json() to make the output as follows:
p = IntOrString(int_value=123)
print(p.json())
#> 123
p = IntOrString(string_value="Hello World")
print(p.json())
#> "Hello World"
Note: IntOrString can be a nested attribute of another Pydantic model.
In addition to object (e.g. {"id": 123}), string, number, and boolean are also valid JSON types. In other words, the question is: can a pydantic model be serialized to a string, number, or boolean instead of an object?
I know it's a weird requirement. Just want to know if that's possible.
Thank you.
For such a simple thing as excluding None-valued fields in the JSON representation, you can simply use the built-in exclude_none parameter:
from typing import Optional
from pydantic import BaseModel, StrictInt, StrictStr

class Dummy(BaseModel):
    id: Optional[StrictInt] = None
    name: Optional[StrictStr] = None

class Other(BaseModel):
    dummy: Dummy

if __name__ == '__main__':
    p = Dummy(id=123)
    print(p.json(exclude_none=True))
    p = Dummy(name="Hello World")
    print(p.json(exclude_none=True))
    o = Other(dummy=Dummy(id=123))
    print(o.json(exclude_none=True))
Output:
{"id": 123}
{"name": "Hello World"}
{"dummy": {"id": 123}}
If you want more complex stuff, you may want to provide your own custom JSON encoder either via the encoder parameter on a call-by-call basis or in the model config via json_dumps or json_encoders.
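For the bare-value output asked about above, one possibility (a sketch against pydantic v1, not an official feature) is a custom json_dumps in the model config that unwraps a single set field; note it does not handle the nested-attribute case:

import json
from typing import Optional
from pydantic import BaseModel, StrictInt, StrictStr

def unwrap_dumps(data, *, default, **kwargs):
    # If exactly one field is set, serialize just its value instead of the object.
    if isinstance(data, dict):
        set_values = [v for v in data.values() if v is not None]
        if len(set_values) == 1:
            return json.dumps(set_values[0], default=default, **kwargs)
    return json.dumps(data, default=default, **kwargs)

class IntOrString(BaseModel):
    int_value: Optional[StrictInt] = None
    string_value: Optional[StrictStr] = None

    class Config:
        json_dumps = unwrap_dumps

print(IntOrString(int_value=123).json())               #> 123
print(IntOrString(string_value="Hello World").json())  #> "Hello World"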

Conditional call of a FastAPI Model

I have a multilang FastAPI application connected to MongoDB. My document in MongoDB is duplicated in the two available languages and structured this way (simplified example):
{
  "_id": xxxxxxx,
  "en": {
    "title": "Drinking Water Composition",
    "description": "Drinking water composition expressed in... with pesticides.",
    "category": "Water",
    "tags": ["water", "pesticides"]
  },
  "fr": {
    "title": "Composition de l'eau de boisson",
    "description": "Composition de l'eau de boisson exprimée en... présence de pesticides....",
    "category": "Eau",
    "tags": ["eau", "pesticides"]
  }
}
I therefore implemented two models, DatasetFR and DatasetEN; each one references specific external Enum models for category and tags in its language.

class DatasetFR(BaseModel):
    title: str
    description: str
    category: CategoryFR
    tags: Optional[List[TagsFR]]

# same for DatasetEN, changing the lang tag to EN
In the routes definition I forced the language parameter to declare the corresponding Model and get the corresponding validation.
@router.post("?lang=fr", response_description="Add a dataset")
async def create_dataset(request: Request, dataset: DatasetFR = Body(...), lang: str = "fr"):
    ...
    return JSONResponse(status_code=status.HTTP_201_CREATED, content=created_dataset)

@router.post("?lang=en", response_description="Add a dataset")
async def create_dataset(request: Request, dataset: DatasetEN = Body(...), lang: str = "en"):
    ...
    return JSONResponse(status_code=status.HTTP_201_CREATED, content=created_dataset)
But this seems to contradict the DRY principle. So I wonder if someone knows an elegant solution: given the parameter lang, dynamically call the corresponding model.
Or, whether we can create a parent model Dataset that takes the lang argument and retrieves the corresponding child model.
This would greatly ease building my API routes and calling my models, and halve the amount of code to write.
There are 2 parts to the answer (API call and data structure).
For the API call, you could separate them into 2 routes like /api/v1/fr/... and /api/v1/en/... (separating resource representation!) and use fastapi.APIRouter to declare the same route twice, changing the validation schema for each route.
You could start by declaring a common BaseModel as an ABC, as well as an ABCEnum.
from abc import ABC
from pydantic import BaseModel

class MyModelABC(ABC, BaseModel):
    attribute1: MyEnumABC

class MyModelFr(MyModelABC):
    attribute1: MyEnumFR

class MyModelEn(MyModelABC):
    attribute1: MyEnumEn

Then you can select the appropriate model for the routes through a class factory:

my_class_factory: dict[str, MyModelABC] = {
    "fr": MyModelFr,
    "en": MyModelEn,
}
Finally you can create your routes through a route factory:
def generate_language_specific_router(language: str, ...) -> APIRouter:
    router = APIRouter(prefix=f"/{language}")  # APIRouter prefixes must start with "/"
    MySelectedModel: MyModelABC = my_class_factory[language]

    @router.post("/")
    def post_something(my_model_data: MySelectedModel):
        # My internal logic
        ...

    return router
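Wiring it all together might look like this (a sketch, assuming the factory above takes only the language argument):

from fastapi import FastAPI

app = FastAPI()
for language in ("fr", "en"):
    app.include_router(generate_language_specific_router(language))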
About the second part (internal computation and data storage): internationalisation is often done through hashmaps.
The standard-library gettext module could be investigated.
Otherwise, the original language can be explicitly used as the key/hash, with translations mapped to it (also including the original language if you want consistency in your calls).
It can look like:
dictionnary_of_babel = {
    "word1": {
        "en": "word1",
        "fr": "mot1",
    },
    "word2": {
        "en": "word2",
    },
    "Drinking Water Composition": {
        "en": "Drinking Water Composition",
        "fr": "Composition de l'eau de boisson",
    },
}

my_arbitrary_object = {
    "attribute1": "word1",
    "attribute2": "word2",
    "attribute3": "Drinking Water Composition",
}
my_translated_object = {}
for attribute, english_sentence in my_arbitrary_object.items():
    if "fr" in dictionnary_of_babel[english_sentence]:
        my_translated_object[attribute] = dictionnary_of_babel[english_sentence]["fr"]
    else:
        my_translated_object[attribute] = dictionnary_of_babel[english_sentence]["en"]  # or without the "en" fallback

expected_translated_object = {
    "attribute1": "mot1",
    "attribute2": "word2",
    "attribute3": "Composition de l'eau de boisson",
}
assert expected_translated_object == my_translated_object
This code should run as is.
A proposal for MongoDB representation, if you don't want a separate table for translations, could be a data structure such as:
# normal:
my_attribute: "sentence"

# internationalized:
my_attribute_internationalized: {
    sentence: {
        original_lang: "sentence",
        lang1: "sentence_lang1",
        lang2: "sentence_lang2",
    }
}
A simple tactic to generalize string translation is to define a small helper function _() that wraps the translation lookup, like:
CURRENT_MODULE_LANG = "fr"

def _(original_string: str) -> str:
    """Switch from original_string to its translation"""
    return dictionnary_of_babel[original_string][CURRENT_MODULE_LANG]
Then call it everywhere a translation is needed:
>>> print(_("word1"))
mot1
You can find a reference to this practice in the Django documentation about internationalization in Python code.
For static translation (for example a website or documentation), you can use .po files and editors like Poedit (see the French translation of the Python docs for a practical use case)!
Option 1
A solution would be the following. Define lang as a query parameter and add a regular expression that the parameter should match. In your case, that would be ^(fr|en)$, meaning that only fr or en are valid inputs. Thus, if no match is found, the request stops there and the client receives a "string does not match regex..." error.
Next, define the body parameter as a generic dict and declare it as a Body field, thus instructing FastAPI to expect a JSON body.
Following that, create a dictionary of your models that you can use to look up a model by the lang attribute. Once you find the corresponding model, try to parse the JSON body using models[lang].parse_obj(body) (equivalent to using models[lang](**body)). If no ValidationError is raised, you know the resulting model instance is valid. Otherwise, return an HTTP_422_UNPROCESSABLE_ENTITY error, including the errors, which you can handle as desired.
If you would also like FR and EN to be valid lang values, adjust the regex to ignore case using ^(?i)(fr|en)$ instead, and make sure to convert lang to lower case when looking up a model (i.e., models[lang.lower()].parse_obj(body)).
import pydantic
from fastapi import FastAPI, Response, status, Body, Query
from fastapi.responses import JSONResponse
from fastapi.encoders import jsonable_encoder

models = {"fr": DatasetFR, "en": DatasetEN}

@router.post("/", response_description="Add a dataset")
async def create_dataset(body: dict = Body(...), lang: str = Query(..., regex="^(fr|en)$")):
    try:
        model = models[lang].parse_obj(body)
    except pydantic.ValidationError as e:
        return Response(content=e.json(), status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, media_type="application/json")
    return JSONResponse(content=jsonable_encoder(model.dict()), status_code=status.HTTP_201_CREATED)
Update
Since the two models have identical attributes (i.e., title and description), you could define a parent model (e.g., Dataset) with those two attributes and have the DatasetFR and DatasetEN models inherit them.
class Dataset(BaseModel):
    title: str
    description: str

class DatasetFR(Dataset):
    category: CategoryFR
    tags: Optional[List[TagsFR]]

class DatasetEN(Dataset):
    category: CategoryEN
    tags: Optional[List[TagsEN]]
Additionally, it might be a better approach to move the logic from inside the route to a dependency function and have it return the model if it passes validation; otherwise, raise an HTTPException, as also demonstrated by @tiangolo. You can use jsonable_encoder, which is used internally by FastAPI, to encode the validation errors (i.e., e.errors()); the same function can also be used when returning the JSONResponse.
from fastapi.exceptions import HTTPException
from fastapi import Depends

models = {"fr": DatasetFR, "en": DatasetEN}

async def checker(body: dict = Body(...), lang: str = Query(..., regex="^(fr|en)$")):
    try:
        model = models[lang].parse_obj(body)
    except pydantic.ValidationError as e:
        raise HTTPException(detail=jsonable_encoder(e.errors()), status_code=status.HTTP_422_UNPROCESSABLE_ENTITY)
    return model

@router.post("/", response_description="Add a dataset")
async def create_dataset(model: Dataset = Depends(checker)):
    return JSONResponse(content=jsonable_encoder(model.dict()), status_code=status.HTTP_201_CREATED)
Option 2
A further approach would be to have a single Pydantic model (let's say Dataset) and customise the validators for the category and tags fields. You can also define lang as part of Dataset, so there is no need to have it as a query parameter. You can use a set, as described here, to keep the values of each Enum class, so that you can efficiently check whether a value exists in the Enum, and use dictionaries to quickly look up a set by the lang attribute. In the case of tags, to verify that every element in the list is valid, use set.issubset, as described here. If an attribute is not valid, you can raise ValueError, as shown in the documentation, "which will be caught and used to populate ValidationError" (see the "Note" section here). Again, if you need the lang codes written in uppercase to be valid inputs, adjust the regex pattern, as described earlier.
P.S. You don't even need to use Enum with this approach. Instead, populate each set below with the permitted values. For instance:
categories_FR = {"Eau"}
categories_EN = {"Water"}
tags_FR = {"eau", "pesticides"}
tags_EN = {"water", "pesticides"}
Additionally, if you would like not to use regex, but rather have a custom validation error for the lang attribute as well, you could add it to the same validator decorator and perform validation similar (and prior) to the other two fields.
from typing import List
from pydantic import BaseModel, Field, validator

categories_FR = set(item.value for item in CategoryFR)
categories_EN = set(item.value for item in CategoryEN)
tags_FR = set(item.value for item in TagsFR)
tags_EN = set(item.value for item in TagsEN)
cats = {"fr": categories_FR, "en": categories_EN}
tags = {"fr": tags_FR, "en": tags_EN}

def raise_error(values):
    raise ValueError(f'value is not a valid enumeration member; permitted: {values}')

class Dataset(BaseModel):
    lang: str = Field(..., regex="^(fr|en)$")
    title: str
    description: str
    category: str
    tags: List[str]

    @validator("category", "tags")
    def validate_atts(cls, v, values, field):
        lang = values.get('lang')
        if lang:
            if field.name == "category":
                if v not in cats[lang]:
                    raise_error(cats[lang])
            elif field.name == "tags":
                if not set(v).issubset(tags[lang]):
                    raise_error(tags[lang])
        return v

@router.post("/", response_description="Add a dataset")
async def create_dataset(model: Dataset):
    return JSONResponse(content=jsonable_encoder(model.dict()), status_code=status.HTTP_201_CREATED)
Option 3
Another approach would be to use Discriminated Unions, as described in this answer.
As per the documentation:
When Union is used with multiple submodels, you sometimes know exactly which submodel needs to be checked and validated and want to enforce this. To do that you can set the same field - let's call it my_discriminator - in each of the submodels with a discriminated value, which is one (or many) Literal value(s). For your Union, you can set the discriminator in its value: Field(discriminator='my_discriminator').
Setting a discriminated union has many benefits:
validation is faster since it is only attempted against one model
only one explicit error is raised in case of failure
the generated JSON schema implements the associated OpenAPI specification
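Applied to this case, a discriminated union might look like the following (a sketch; requires pydantic 1.9+, and reuses the Category/Tags enums assumed from the question):

from typing import List, Literal, Optional, Union
from pydantic import BaseModel, Field

class DatasetFR(BaseModel):
    lang: Literal["fr"]  # the discriminator value
    title: str
    description: str
    category: CategoryFR
    tags: Optional[List[TagsFR]] = None

class DatasetEN(BaseModel):
    lang: Literal["en"]
    title: str
    description: str
    category: CategoryEN
    tags: Optional[List[TagsEN]] = None

class DatasetBody(BaseModel):
    # pydantic validates against exactly one submodel, chosen by "lang"
    dataset: Union[DatasetFR, DatasetEN] = Field(..., discriminator="lang")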

graphQL and graphene: How to extract information from a graphQL query using graphene?

I want to extract the line 'Unique protein chains: 1' from this entry, using a graphQL query.
I know this is the query I want to use:
{
  entry(entry_id: "5O6C") {
    rcsb_entry_info {
      polymer_entity_count_protein
    }
  }
}
and I can see the output if I use the graphQL interface here:
{
  "data": {
    "entry": {
      "rcsb_entry_info": {
        "polymer_entity_count_protein": 1
      }
    }
  }
}
which has the information I want: "polymer_entity_count_protein": 1.
I want to run this query through python so it can be fed into other pipelines (and also process multiple IDs).
I found graphene to be one library that will do graphQL queries, and this is the hello world example, which I can get to work on my machine:
import graphene

class Query(graphene.ObjectType):
    hello = graphene.String(name=graphene.String(default_value="world"))

    def resolve_hello(self, info, name):
        return name

schema = graphene.Schema(query=Query)
result = schema.execute('{ hello }')
print(result.data['hello'])  # "world"
I don't understand how to combine the two. Can someone show me how to edit my Python code with the query of interest, so that what's printed at the end is:
'506C 1'
I have seen some other examples/queries about graphene/graphQL, e.g. here, but I can't understand how to make my specific example work.
Based on the answer below, I ran:

import graphene

class Query(graphene.ObjectType):
    # ResponseType needs to be the type of your response
    # the following line defines the return value of your query (ResponseType)
    # and the inputType (graphene.String())
    entry = graphene.String(entry_id=graphene.String(default_value=''))

    def resolve_entry(self, info, **kwargs):
        id = kwargs.get('entry_id')
        # as you already have a working query you should enter the logic here

schema = graphene.Schema(query=Query)
# not totally sure if the query needs to look like this, it also depends heavily on your response type
query = '{ entry(entry_id="506C"){rcsb_entry_info}'
result = schema.execute(query)
print("506C" + str(result.data.entry.rcsb_entry_info.polymer_entity_count_protein))
However, I get:
Traceback (most recent call last):
  File "graphene_query_for_rcsb.py", line 18, in <module>
    print("506C" + str(result.data.entry.rcsb_entry_info.polymer_entity_count_protein))
AttributeError: 'NoneType' object has no attribute 'entry'
Did you write the logic of the already working query you have in your question? Is it not using python/graphene?
I'm not sure if I understood the question correctly but here's a general idea:
import graphene

class Query(graphene.ObjectType):
    # ResponseType needs to be the type of your response
    # the following line defines the return value of your query (ResponseType)
    # and the inputType (graphene.String())
    entry = graphene.Field(ResponseType, entry_id=graphene.String())

    def resolve_entry(self, info, **kwargs):
        id = kwargs.get('entry_id')
        # as you already have a working query you should enter the logic here

schema = graphene.Schema(query=Query)
# not totally sure if the query needs to look like this, it also depends heavily on your response type
query = '{ entry(entry_id="506C"){rcsb_entry_info}}'
result = schema.execute(query)
print("506C" + str(result.data.entry.rcsb_entry_info.polymer_entity_count_protein))
Here is an example of a response type. If you have the query:

# here TeamType is my ResponseType
get_team = graphene.Field(TeamType, id=graphene.Int())

def resolve_get_team(self, info, **kwargs):
    id = kwargs.get('id')
    if id is not None:
        return Team.objects.get(pk=id)
    else:
        raise Exception()
the responseType is defined as:

class TeamType(DjangoObjectType):
    class Meta:
        model = Team
but you can also define a response type that is not based on a model:
class DeleteResponse(graphene.ObjectType):
    numberOfDeletedObject = graphene.Int(required=True)
    numberOfDeletedTeams = graphene.Int(required=False)
And your response type should look something like this:

class Polymer(graphene.ObjectType):
    polymer_entity_count_protein = graphene.Int()

class myResponse(graphene.ObjectType):
    rcsb_entry_info = graphene.Field(Polymer)

Again, this is not tested or anything, and I don't really know what your response really is.

How to get the type of a variable defined in a protobuf message?

I'm trying to do some 'translation' from protobuf files to Objective-C classes using Python. For example, given the protobuf message:
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}
I want to translate it into an objc class:
@interface Person : NSObject
@property (nonatomic, copy) NSString *name;
@property (nonatomic, assign) int ID;
@property (nonatomic, copy) NSString *email;
@end
The key point is to acquire every property's name and type. For example, 'optional string email' in the protobuf message, its name is 'email', type is 'string', so it should be NSString *email in objective-c. I followed the official tutorial, wrote an addressbook.proto just the same as the one in the tutorial and compiled it. Then I wrote my python code:
import addressbook_pb2 as addressbook

p = addressbook.Person()
all_fields = p.DESCRIPTOR.fields_by_name
# print("all fields: %s" % all_fields)
field_keys = all_fields.keys()
# print("all keys: %s" % field_keys)
for key in field_keys:
    one_field = all_fields[key]
    print(one_field.label)
This just gave me:
1
2
3
2
So I guess label is not what I need, while field_keys is just the list of names that I expect. I tried some other words, and did some search on the web, but didn't find the right answer.
If there's no way to acquire the type, I have another thought, which is to read and analyze every line of the protobuf source file in a pure 'Pythonic' way, but I really don't want to do this if it's not necessary.
Can anybody help me?
The FieldDescriptor class has a message_type member which, if a composite field, is a descriptor of the message type contained in this field. Otherwise, this is None.
Combining this with iterating over the DESCRIPTORS dictionary means you can get the name and type of both composite and non-composite (raw) fields.
import addressbook_pb2 as addressbook

DESCRIPTORS = addressbook.Person.DESCRIPTOR.fields_by_name
for (field_name, field_descriptor) in DESCRIPTORS.items():
    if field_descriptor.message_type:
        # Composite field
        print(field_name, field_descriptor.message_type.name)
    else:
        # Raw type
        print(field_name, field_descriptor.type)
# TYPE_DOUBLE
# TYPE_FLOAT
# TYPE_INT64
# TYPE_UINT64
# TYPE_INT32
# TYPE_FIXED64
# TYPE_FIXED32
# TYPE_BOOL
# TYPE_STRING
# TYPE_GROUP
# TYPE_MESSAGE
# TYPE_BYTES
# TYPE_UINT32
# TYPE_ENUM
# TYPE_SFIXED32
# TYPE_SFIXED64
# TYPE_SINT32
# TYPE_SINT64
# MAX_TYPE
The raw types are class attributes of FieldDescriptor; see https://github.com/protocolbuffers/protobuf/blob/master/python/google/protobuf/descriptor.py
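For example, you can compare field_descriptor.type against those constants directly (a small sketch reusing the DESCRIPTORS dict above):

from google.protobuf.descriptor import FieldDescriptor

for field_name, field_descriptor in DESCRIPTORS.items():
    if field_descriptor.type == FieldDescriptor.TYPE_STRING:
        print(field_name, "is a string field")
    elif field_descriptor.type == FieldDescriptor.TYPE_INT32:
        print(field_name, "is a 32-bit int field")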
Thanks to Marc's answer, I figured out a solution. It is just a rough sketch, but it's a huge step for me.
Python code:
import addressbook_pb2 as addressbook

typeDict = {"1": "CGFloat", "2": "CGFloat", "3": "NSInteger", "4": "NSUInteger", "5": "NSInteger", "8": "BOOL", "9": "NSString", "13": "NSUInteger", "17": "NSInteger", "18": "NSInteger"}
attrDict = {"CGFloat": "assign", "NSInteger": "assign", "NSUInteger": "assign", "BOOL": "assign", "NSString": "copy"}

p = addressbook.Person()
all_fields = p.DESCRIPTOR.fields_by_name
field_keys = all_fields.keys()
for key in field_keys:
    one_field = all_fields[key]
    typeNumStr = str(one_field.type)
    className = typeDict.get(typeNumStr, "NSObject")
    attrStr = attrDict.get(className, "retain")
    propertyStr = "@property (nonatomic, %s) %s *%s" % (attrStr, className, key)
    print(propertyStr)
For the addressbook example, it prints:
@property (nonatomic, copy) NSString *email
@property (nonatomic, copy) NSString *name
@property (nonatomic, retain) NSObject *phone
@property (nonatomic, assign) NSInteger *id
Not the final solution, but it means a lot. Thank you, Marc!

ndb.Key filter for MapReduce input_reader

Playing with the new Google App Engine MapReduce library filters for input_reader, I would like to know how I can filter by ndb.Key.
I read this post and played with datetime, string, int, and float in filter tuples, but how can I filter by ndb.Key?
When I try to filter by a ndb.Key I get this error:
BadReaderParamsError: Expected Key, got u"Key('Clients', 406)"
Or this error:
TypeError: Key('Clients', 406) is not JSON serializable
I tried to pass a ndb.Key object and string representation of the ndb.Key.
Here are my two filter tuples:
Sample 1:
'input_reader': {
    'input_reader': 'mapreduce.input_readers.DatastoreInputReader',
    'entity_kind': 'model.Sales',
    'filters': [("client", "=", ndb.Key('Clients', 406))]
}
Sample 2:
'input_reader': {
    'input_reader': 'mapreduce.input_readers.DatastoreInputReader',
    'entity_kind': 'model.Sales',
    'filters': [("client", "=", "%s" % ndb.Key('Clients', 406))]
}
This is a bit tricky.
If you look at the code on Google Code you can see that mapreduce.model defines a JSON_DEFAULTS dict which determines the classes that get special-case handling in JSON serialization/deserialization: by default, just datetime. So, you can monkey-patch the ndb.Key class into there, and provide it with functions to do that serialization/deserialization - something like:
from google.appengine.ext import ndb
from mapreduce import model

def _JsonEncodeKey(o):
    """Json encode an ndb.Key object."""
    return {'key_string': o.urlsafe()}

def _JsonDecodeKey(d):
    """Json decode an ndb.Key object."""
    return ndb.Key(urlsafe=d['key_string'])

model.JSON_DEFAULTS[ndb.Key] = (_JsonEncodeKey, _JsonDecodeKey)
model._TYPE_IDS['Key'] = ndb.Key
You may also need to repeat those last two lines to patch mapreduce.lib.pipeline.util as well.
Also note if you do this, you'll need to ensure that this gets run on any instance that runs any part of a mapreduce: the easiest way to do this is to write a wrapper script that imports the above registration code, as well as mapreduce.main.APP, and override the mapreduce URL in your app.yaml to point to your wrapper.
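Such a wrapper can be tiny (a sketch; key_json_patch is a hypothetical module name holding the registration code above):

# wrapper.py -- point the mapreduce URL in app.yaml at wrapper.app
import key_json_patch  # hypothetical: registers ndb.Key in JSON_DEFAULTS as above
from mapreduce.main import APP

app = APP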
Make your own input reader based on DatastoreInputReader, which knows how to decode key-based filters:
from google.appengine.ext import db
from mapreduce import input_readers

class DatastoreKeyInputReader(input_readers.DatastoreKeyInputReader):
    """Augment the base input reader to accommodate ReferenceProperty filters"""
    def __init__(self, *args, **kwargs):
        try:
            filters = kwargs['filters']
            decoded = []
            for f in filters:
                value = f[2]
                if isinstance(value, list):
                    value = db.Key.from_path(*value)
                decoded.append((f[0], f[1], value))
            kwargs['filters'] = decoded
        except KeyError:
            pass
        super(DatastoreKeyInputReader, self).__init__(*args, **kwargs)
Run this function on your filters before passing them in as options:
def encode_filters(filters):
    if filters is not None:
        encoded = []
        for f in filters:
            value = f[2]
            if isinstance(value, db.Model):
                value = value.key()
            if isinstance(value, db.Key):
                value = value.to_path()
            entry = (f[0], f[1], value)
            encoded.append(entry)
        filters = encoded
    return filters
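For illustration, the round trip might look like this (a sketch; the module path of the subclassed reader is hypothetical):

from google.appengine.ext import db

mapper_params = {
    'input_reader': {
        'input_reader': 'myapp.readers.DatastoreKeyInputReader',  # hypothetical path
        'entity_kind': 'model.Sales',
        # keys are flattened to JSON-safe path lists before being passed in
        'filters': encode_filters([("client", "=", db.Key.from_path('Clients', 406))]),
    }
}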
Are you aware of the to_old_key() and from_old_key() methods?
I had the same problem and came up with a workaround using computed properties.
You can add a new ndb.ComputedProperty holding the Key id to your Sales model. Ids are just strings, so you won't have any JSON problems.
client_id = ndb.ComputedProperty(lambda self: self.client.id())
And then add that condition to your mapreduce query filters:
'input_reader': {
    'input_reader': 'mapreduce.input_readers.DatastoreInputReader',
    'entity_kind': 'model.Sales',
    'filters': [("client_id", "=", "406")]
}
The only drawback is that computed properties are not indexed and stored until you call the put() method, so you will have to traverse all the Sales entities and save them:

for sale in Sales.query().fetch():
    sale.put()
