Python Cerberus: problem validating different schemas using the 'anyof_schema' rule

I am trying to use Cerberus to validate a list that contains strings or dictionaries, using the anyof_schema rule as proposed in this post:
from cerberus import Validator

A = {'type': 'dict',
     'schema': {'name': {'type': 'string', 'required': True},
                'run': {'type': 'string', 'required': True}}}
B = {'type': 'string', 'empty': False}

schema = {
    'some_field': {
        'type': 'list',
        'anyof_schema': [A, B]
    }
}

v = Validator(schema)
challenge = {
    'some_field': ['simple string 1', {'name': 'some name', 'run': 'some command'}]
}
print(v.validate(challenge))
print(v.errors)
But validation fails, output:
False
{'some_field': ['no definitions validate', {'anyof definition 0': [{0: ['must be of dict type']}], 'anyof definition 1': [{1: ['must be of string type']}]}]}
It seems that the anyof_schema rule works only if all schemas in the provided set describe the same data type (e.g. dictionaries).
Why does the anyof_schema rule fail in my case, and how can I resolve this problem?
I am using Python 3.5.3 and Cerberus 1.3.1.

The thing is that your schema looks expanded like this:
{"some_field": {
"anyof" : [
{"schema": …},
{"schema": …},
]
}}
which means that whole list is validated against only one of the variants per rules set contained by anyof.
Thus you just need to swap anyof and schema in your hierarchy:
{"some_field": {
"schema" : {
"anyof":[
{"type": "dict", …},
{"type": "str", …},
]
}
}}
This validates each item of the list against the allowed variants and these can therefore be of various 'shape'.
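Putting it together, here is a minimal sketch of the corrected schema that reuses the A and B rule sets from the question; note that anyof takes complete rule sets, so each variant carries its own type constraint:

from cerberus import Validator

# Each variant is a complete rule set for a single list item.
A = {'type': 'dict',
     'schema': {'name': {'type': 'string', 'required': True},
                'run': {'type': 'string', 'required': True}}}
B = {'type': 'string', 'empty': False}

schema = {
    'some_field': {
        'type': 'list',
        'schema': {'anyof': [A, B]}  # anyof is applied per item, not to the whole list
    }
}

v = Validator(schema)
challenge = {
    'some_field': ['simple string 1', {'name': 'some name', 'run': 'some command'}]
}
print(v.validate(challenge))  # expected: True
print(v.errors)               # expected: {}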

Related

RequestError: RequestError(400, 'search_phase_execution_exception', 'failed to create query: For input string') in Elasticsearch

Below is my list of dictionaries:
abc = [
    {'id': "1", 'name': 'cristiano ronaldo', 'description': 'portugal#fifa.com'},
    {'id': "2", 'name': 'lionel messi', 'description': 'argentina#fifa.com'},
    {'id': "3", 'name': 'Lionel Jr', 'description': 'brazil#fifa.com'}
]
I ingested the players into Elasticsearch:
for i in abc:
    es.index(index="players", body=i, id=i['id'])
Below is the DSL query:
resp = es.search(index="players", body={
    "query": {
        "query_string": {
            "fields": ["id^12", "description^2", "name^2"],
            "query": "brazil#fifa.com"
        }
    }
})
resp
Issue 1:
If "fields": ["id^12", "description^2", "name^2"], then I am getting the error RequestError(400, 'search_phase_execution_exception', 'failed to create query: For input string: "brazil#fifa.com"').
Issue 2:
If my fields are ["description^2", "name^2"], I am expecting one document containing brazil#fifa.com, but all 3 documents are returned.
Edited: Following sagar's comment, my id field was mapped as long, which I have now changed. The mapping is below, and issue 1 is resolved.
{'players': {'mappings': {'properties': {
    'description': {'type': 'text',
                    'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
    'id': {'type': 'text',
           'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
    'name': {'type': 'text',
             'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}}}}}
Issue 1: if "fields": ["id^12","description^2", "name^2"] then i am
getting error RequestError: RequestError(400,
'search_phase_execution_exception', 'failed to create query: For input
string: "brazil#fifa.com"'
Above issue you are getting because your id field is define as integer or flot type of field (other then text type of field). You need to provide "lenient": true in your query and it will not return any exception.
Issue 2: if my fields are ["description^2", "name^2"] I am expecting one document which contains brazil#fifa.com but all 3 documents are returned
The above issue is happening because you are searching on a text type of field, to which the default standard analyzer is applied when you search.
So when you search for brazil#fifa.com, it is split into the two tokens brazil and fifa.com. fifa.com matches all 3 of your documents, so all of them are returned in the result. To resolve this issue, you can use the description.keyword field.
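You can see this tokenization for yourself with the _analyze API; here is a small sketch using the same Python client as in the question (assuming es is your Elasticsearch client):

# Show how the standard analyzer splits the search text into tokens.
resp = es.indices.analyze(body={"analyzer": "standard", "text": "brazil#fifa.com"})
print([t["token"] for t in resp["tokens"]])  # expected: ['brazil', 'fifa.com']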
The query below will resolve both of your issues:
{
    "query": {
        "query_string": {
            "lenient": true,
            "fields": [
                "id^12",
                "description.keyword^2",
                "name^2"
            ],
            "query": "brazil#fifa.com"
        }
    }
}
Updated:
Based on the comment: if you want to be able to match on fifa as well, then you need to keep description as the field, but when you search for brazil#fifa.com you need to put it in double quotes for an exact match. Please see the example below:
{
    "query": {
        "query_string": {
            "lenient": true,
            "fields": [
                "id^12",
                "description^2",
                "name^2"
            ],
            "query": "\"brazil#fifa.com\""
        }
    }
}
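For completeness, a sketch of running the updated query with the Python client from the question (note the Python True instead of JSON true; which documents match depends on your data):

resp = es.search(index="players", body={
    "query": {
        "query_string": {
            "lenient": True,
            "fields": ["id^12", "description^2", "name^2"],
            "query": "\"brazil#fifa.com\""
        }
    }
})
for hit in resp["hits"]["hits"]:
    print(hit["_source"])  # only the brazil#fifa.com document should match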

How to access a MongoDB array in which key-value pairs are stored, by key name

I am working with pymongo, and after writing the aggregate query
db.collection.aggregate([{'$project': {'Id': '$ResultData.Id','data' : '$Results.Data'}}])
I received the object:
{'data': [{'key': 'valid', 'value': 'true'},
{'key': 'number', 'value': '543543'},
{'key': 'name', 'value': 'Saturdays cx'},
{'key': 'message', 'value': 'it is valid.'},
{'key': 'city', 'value': 'London'},
{'key': 'street', 'value': 'Bigeye'},
{'key': 'pc', 'value': '3566'}],
Is there a way I can access the values by key name, like '$Results.Data.city', and receive London? I would like to do that at the level of the MongoDB aggregate query, meaning I want to write a query like this:
db.collection.aggregate([{'$project':
    {'Id': '$ResultData.Id',
     'data': '$Results.Data',
     'city': '$Results.Data.city',
     'name': '$Results.Data.name',
     'street': '$Results.Data.street',
     'pc': '$Results.Data.pc',
    }}])
And receive all the values of the provided keys.
Using the $elemMatch projection operator in the following query from the mongo shell:
db.collection.find(
    { _id: <some_value> },
    { _id: 0, data: { $elemMatch: { key: "city" } } }
)
The output:
{ "data" : [ { "key" : "city", "value" : "London" } ] }
Using PyMongo (gets the same output):
collection.find_one(
    { '_id': <some_value> },
    { '_id': 0, 'data': { '$elemMatch': { 'key': 'city' } } }
)
Using PyMongo aggregate method (gets the same result):
import pprint

INPUT_KEY = 'city'
pipeline = [
    {
        '$project': {
            '_id': 0,
            'data': {
                '$filter': {
                    'input': '$data', 'as': 'dat',
                    'cond': { '$eq': [ '$$dat.key', INPUT_KEY ] }
                }
            }
        }
    }
]
pprint.pprint(list(collection.aggregate(pipeline)))
Calling the received object "result": if result['data'] is always a list of dictionaries with the 2 keys key and value, you can convert the whole list into a dictionary, using the key entries as keys and the value entries as values. Since that statement is somewhat confusing, here's the code:
data = {pair['key']: pair['value'] for pair in result['data']}
From here, data['city'] will give you 'London', data['street'] will be 'Bigeye', and so on. Obviously, this assumes that there are no conflicting keys in result['data']. Note that this dictionary will (just like the original result['data']) only contain strings, so don't expect data['number'] to be an integer.
Another approach would be to dynamically create an object holding each key-value pair as an attribute, allowing you to use the syntax data.city, data.street, and so on. But this would require more complicated code and is a less common and less robust approach.
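If you do want attribute access, here is a minimal sketch using the standard library's types.SimpleNamespace; it assumes the same result object as above and that every key is a valid Python identifier:

from types import SimpleNamespace

# Build a plain dict first, then expose the key-value pairs as attributes.
data = {pair['key']: pair['value'] for pair in result['data']}
obj = SimpleNamespace(**data)
print(obj.city)    # 'London'
print(obj.street)  # 'Bigeye'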

How to check referential integrity in Cerberus?

Consider the following Cerberus schema:
{
    'employee': {
        'type': 'list',
        'schema': {
            'type': 'dict',
            'schema': {
                'id': {'required': True, 'type': 'integer'},
                'name': {'required': True, 'type': 'string'}
            }
        }
    },
    'ceo-employee-id': {'required': True, 'type': 'integer'}
}
1) How can I validate that the ceo-employee-id matches one of the id values in the employee list? (Referential integrity)
2) How can I validate that each id in the employee list is unique (i.e. no duplicate employee ids)?
I realize I can do this at run time, after validating and parsing the config, as suggested by @rafael below. I am wondering if I can do it with the Cerberus validation features.
You'll need to use a custom validator that implements check_with methods, use the document property inside them, and amend your schema to reference them:
from cerberus import Validator


class CustomValidator(Validator):
    def _check_with_ceo_employee(self, field, value):
        if value not in (x["id"] for x in self.document["employee"]):
            self._error(field, "ID is missing in employee list.")

    def _check_with_employee_id_uniqueness(self, field, value):
        all_ids = [x["id"] for x in self.document["employee"]]
        if len(all_ids) != len(set(all_ids)):
            self._error(field, "Employee IDs are not unique.")


validator = CustomValidator({
    'employee': {
        'type': 'list',
        'schema': {
            'type': 'dict',
            'schema': {
                'id': {'required': True, 'type': 'integer'},
                'name': {'required': True, 'type': 'string'}
            },
        },
        'check_with': 'employee_id_uniqueness'
    },
    'ceo-employee-id': {'required': True, 'type': 'integer', 'check_with': 'ceo_employee'}
})
The referenced document contains hints on all the parts used here.
(I apologize for any indentation error that might have slipped into the example.)
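For illustration, a quick sketch of exercising the validator above on a hypothetical document (the field names follow the schema from the question; the employee names are made up):

doc = {
    'employee': [
        {'id': 1, 'name': 'Alice'},
        {'id': 2, 'name': 'Bob'},
    ],
    'ceo-employee-id': 2,
}
print(validator.validate(doc))  # expected: True

doc['ceo-employee-id'] = 99     # an id that is not in the employee list
print(validator.validate(doc))  # expected: False
print(validator.errors)         # expected: {'ceo-employee-id': ['ID is missing in employee list.']}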
Assuming that you have already validated the schema of your JSON, you can easily check your two conditions like this.
Let doc be your JSON document.
employee_ids = [employee['id'] for employee in doc['employee']]
ceo_employee_id = doc['ceo-employee-id']
1) How can I validate that the ceo-employee-id matches one of the id values in the employee list? (Referential integrity)
ceo_id_exists_in_employees = any([employee_id == ceo_employee_id for employee_id in employee_ids])
2) How can I validate that each id in the employee list is unique (i.e. no duplicate employee ids)?
employee_id_is_unique = len(set(employee_ids)) == len(employee_ids)
3) Assert that both values are True
if ceo_id_exists_in_employees and employee_id_is_unique:
    print('passed')
else:
    print('failed')

Python Eve - Query Embedded Data Relation

I have the following resource defined:
item = {
    'wrapper': {
        'type': 'dict',
        'schema': {
            'element': {
                'type': 'objectid',
                'data_relation': {
                    'resource': 'code',
                    'field': '_id',
                    'embeddable': True,
                },
            },
        },
    },
}
When I try to query using the objectid, I get an empty list.
http://127.0.0.1:5000/item?where={"wrapper.element":"5834987589b0dc353b72c27d"}
5834987589b0dc353b72c27d is the valid _id for the element.
If I move the data relation out of the embedded document, I can query it as expected.
Is there any way to do this with an embedded data relation?
I have just tested with eve==0.7.1 and it works as expected by filtering with ?where={"wrapper.element" : "<your_objectid>"}, as you said.
I had a problem where the _id was being stored as a string rather than an ObjectId(); this broke the query.
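If you need to check for that, here is a rough PyMongo sketch (the collection name item and the wrapper.element field path follow the question; adjust both to your setup):

from bson import ObjectId

# Inspect how the reference is actually stored.
doc = db.item.find_one({}, {'wrapper.element': 1})
print(type(doc['wrapper']['element']))  # str here means the relation was stored as a string

# Convert string references to real ObjectIds so the where filter can match them.
for d in db.item.find({'wrapper.element': {'$type': 'string'}}):
    db.item.update_one(
        {'_id': d['_id']},
        {'$set': {'wrapper.element': ObjectId(d['wrapper']['element'])}}
    )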

Combination of two fields to be unique in Python Eve

In Python Eve framework, is it possible to have a condition which checks combination of two fields to be unique?
For example, the definition below only restricts firstname and lastname to each be unique among items in the resource.
people = {
    # 'title' tag used in item links.
    'item_title': 'person',
    'schema': {
        'firstname': {
            'type': 'string',
            'required': True,
            'unique': True
        },
        'lastname': {
            'type': 'string',
            'required': True,
            'unique': True
        }
    }
}
Instead, is there a way to restrict the combination of firstname and lastname to be unique?
Or is there a way to implement a CustomValidator for this?
You can probably achieve what you want by overloading _validate_unique and implementing custom logic there, taking advantage of self.document in order to retrieve the other field's value.
However, since _validate_unique is called for every unique field, you would end up performing your custom validation twice, once for firstname and then for lastname. Not really desirable. Of course the easy way out is setting up a fullname field, but I guess that's not an option in your case.
Have you considered going for a slightly different design? Something like:
{'name': {'first': 'John', 'last': 'Doe'}}
Then all you need is make sure that name is required and unique:
{
    'name': {
        'type': 'dict',
        'required': True,
        'unique': True,
        'schema': {
            'first': {'type': 'string'},
            'last': {'type': 'string'}
        }
    }
}
Inspired by Nicola and _validate_unique.
from eve.io.mongo import Validator
from eve.utils import config
from flask import current_app as app


class ExtendedValidator(Validator):
    def _validate_unique_combination(self, unique_combination, field, value):
        """ {'type': 'list'} """
        self._is_combination_unique(unique_combination, field, value, {})

    def _is_combination_unique(self, unique_combination, field, value, query):
        """ Test if the value combination is unique.
        """
        if unique_combination:
            query = {k: self.document[k] for k in unique_combination}
            query[field] = value

            resource_config = config.DOMAIN[self.resource]

            # exclude soft deleted documents if applicable
            if resource_config['soft_delete']:
                query[config.DELETED] = {'$ne': True}

            if self.document_id:
                id_field = resource_config['id_field']
                query[id_field] = {'$ne': self.document_id}

            datasource, _, _, _ = app.data.datasource(self.resource)

            if app.data.driver.db[datasource].find_one(query):
                key_names = ', '.join([k for k in query])
                self._error(field, "value combination of '%s' is not unique" % key_names)
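As a sketch of how the rule above could be wired into the people schema from the question (the rule name and the list of companion field names follow the _validate_unique_combination signature; the validator itself would be handed to Eve, e.g. app = Eve(validator=ExtendedValidator)):

people = {
    'item_title': 'person',
    'schema': {
        'firstname': {
            'type': 'string',
            'required': True
        },
        'lastname': {
            'type': 'string',
            'required': True,
            # the (lastname, firstname) combination must be unique
            'unique_combination': ['firstname']
        }
    }
}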
The way I solved this issue is by creating a dynamic field, using a combination of functions and lambdas to create a hash from whichever fields you provide.
import hashlib

import jmespath


def unique_record(fields):
    def is_lambda(field):
        # Test if a variable is a lambda
        return callable(field) and field.__name__ == "<lambda>"

    def default_setter(doc):
        # Generate the composite list
        r = [
            str(field(doc)
                # Check if it is a lambda
                if is_lambda(field)
                # jmespath is not required, but it enables using nested doc values
                else jmespath.search(field, doc))
            for field in fields
        ]
        # Generate an MD5 hash from the composite string (keep it clean)
        return hashlib.md5(''.join(r).encode()).hexdigest()

    return {
        'type': 'string',
        'unique': True,
        'default_setter': default_setter
    }
Practical Implementation
My use case was to create a collection that limits which key-value pairs a user can create within the collection:
domain = {
    'schema': {
        'key': {
            'type': 'string',
            'minlength': 1,
            'maxlength': 25,
            'required': True,
        },
        'value': {
            'type': 'string',
            'minlength': 1,
            'required': True
        },
        'hash': unique_record([
            'key',
            lambda doc: request.USER['_id']
        ]),
        'user': {
            'type': 'objectid',
            'default_setter': lambda doc: request.USER['_id']  # User tenant ID
        }
    }
}
The function receives a list of either strings or lambda functions, for dynamic value setting at request time; in my case, the user's "_id".
The function supports JSON queries via the JMESPath package. This isn't mandatory, but it leaves the door open for nested-doc flexibility in other use cases.
NOTE: This will only work with values that are set by the user at request time or injected into the request body using the pre_GET trigger pattern, like the USER object I inject in the pre_GET trigger, which represents the user currently making the request.
