Validate dicts in Python

I'm looking for a tool, or examples of how to validate dictionaries in Python.
For example, I have this dict:
test = {'foo' : 'bar', 'nested' : {'foo1' : 'bar1', 'foo2' : 'bar2'} }
Now I must validate it. Let's say the value for key foo must be boolean False or a non-empty string. Next, if key foo1 has the value bar1, then key foo2 must be an int in the range 1..10. I wrote a simple function to do this, but it is not exactly what I want. Sure, I can test every single item in the dict with if..else, but if the dict has more than 50 elements, that gets uncomfortable.
Is there a good tool or library to do this in Python? I'm not looking for parsers, only a fast and effective way to do this right.

Voluptuous is a nice tool that does this:
http://pypi.python.org/pypi/voluptuous

You can also try the link below:
https://github.com/sunlightlabs/validictory
It's a great package that makes validation easier.

I highly recommend Cerberus for its readability, or jsonschema because it uses the JSON Schema standard.
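For the two specific rules in the question, a handful of small predicate functions may already be enough before reaching for a library. The following is only a minimal sketch using the standard library; the helper names (is_false_or_nonempty_str, is_int_in_range, validate) are made up for illustration and do not come from any of the packages mentioned here.

```python
# Hypothetical helper names; only the standard library is used.

def is_false_or_nonempty_str(value):
    """Rule from the question: boolean False or a non-empty string."""
    return value is False or (isinstance(value, str) and value != "")


def is_int_in_range(value, low, high):
    """True for an int (but not a bool) within [low, high]."""
    return (
        isinstance(value, int)
        and not isinstance(value, bool)
        and low <= value <= high
    )


def validate(data):
    """Return a list of error messages; an empty list means the dict is valid."""
    errors = []
    if not is_false_or_nonempty_str(data.get("foo")):
        errors.append("foo must be False or a non-empty string")
    nested = data.get("nested", {})
    # Conditional rule: foo2 is only constrained when foo1 == 'bar1'.
    if nested.get("foo1") == "bar1" and not is_int_in_range(nested.get("foo2"), 1, 10):
        errors.append("nested.foo2 must be an int in 1..10 when nested.foo1 is 'bar1'")
    return errors


test = {"foo": "bar", "nested": {"foo1": "bar1", "foo2": "bar2"}}
print(validate(test))
```

Collecting messages in a list instead of raising on the first failure makes it easier to report every problem in a large dict at once. With more than a few dozen keys, a schema library is likely to scale better than hand-written rules.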

Webster is a PyPI package that does dictionary validation and regex validation of values. This lets you ensure that the dictionary has all the keys it is supposed to have and that the values are more or less what you would expect.
https://pypi.python.org/pypi/Webster

This dict-schema-validator package is a very simple way to validate python dictionaries.
Here is a simple schema representing a Customer:
{
"_id": "ObjectId",
"created": "date",
"is_active": "bool",
"fullname": "string",
"age": ["int", "null"],
"contact": {
"phone": "string",
"email": "string"
},
"cards": [{
"type": "string",
"expires": "date"
}]
}
Validation:
from datetime import datetime
import json
from dict_schema_validator import validator

# Load the schema shown above from a JSON file
with open('models/customer.json', 'r') as j:
    schema = json.loads(j.read())

# A customer record with a few deliberate mistakes
customer = {
    "_id": 123,
    "created": datetime.now(),
    "is_active": True,
    "fullname": "Jorge York",
    "age": 32,
    "contact": {
        "phone": "559-940-1435",
        "email": "york@example.com",
        "skype": "j.york123"
    },
    "cards": [
        {"type": "visa", "expires": "12/2029"},
        {"type": "visa"},
    ]
}

# Each error is a dict; print its 'msg' entry
errors = validator.validate(schema, customer)
for err in errors:
    print(err['msg'])
Output:
[*] "_id" has wrong type. Expected: "ObjectId", found: "int"
[+] Extra field: "contact.skype" having type: "str"
[*] "cards[0].expires" has wrong type. Expected: "date", found: "str"
[-] Missing field: "cards[1].expires"
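The idea behind this kind of schema can be sketched with the standard library alone: walk the schema and the data together and report wrong types, extra fields and missing fields. This is a simplified illustration, not the dict-schema-validator implementation; the TYPE_MAP table and the check function are hypothetical names, and only a few of the type names from the schema above are covered.

```python
from datetime import datetime

# Hypothetical mapping from schema type names to Python types.
# Note: bool is a subclass of int in Python, so True would pass an "int" check.
TYPE_MAP = {
    "string": str,
    "int": int,
    "bool": bool,
    "date": datetime,
    "null": type(None),
}


def check(schema, data, path=""):
    """Walk schema and data together, collecting human-readable messages."""
    errors = []
    for key, expected in schema.items():
        here = f"{path}{key}"
        if key not in data:
            errors.append(f"missing field: {here}")
        elif isinstance(expected, dict):
            # Nested object: recurse with a dotted path prefix.
            errors.extend(check(expected, data[key], here + "."))
        elif isinstance(expected, list) and expected and isinstance(expected[0], dict):
            # A list holding one dict describes an array of objects.
            for i, item in enumerate(data[key]):
                errors.extend(check(expected[0], item, f"{here}[{i}]."))
        else:
            # A list such as ["int", "null"] means any of the listed types.
            names = expected if isinstance(expected, list) else [expected]
            if not isinstance(data[key], tuple(TYPE_MAP[n] for n in names)):
                errors.append(f"wrong type: {here} (expected {names})")
    for key in data:
        if key not in schema:
            errors.append(f"extra field: {path}{key}")
    return errors


schema = {
    "fullname": "string",
    "age": ["int", "null"],
    "contact": {"email": "string"},
    "cards": [{"type": "string", "expires": "date"}],
}
customer = {
    "fullname": "Jorge York",
    "age": None,
    "contact": {"email": "york@example.com", "skype": "j.york123"},
    "cards": [{"type": "visa", "expires": "12/2029"}, {"type": "visa"}],
}
errors = check(schema, customer)
for message in errors:
    print(message)
```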

Related

JSONPath - Filter expression is not working as expected

Can't get a JSONPath to work.
JSON:
{
"data": [
{
"meta": {
"definition": {
"title": "ID",
"type": "text",
"key": "657876498"
}
},
"attributes": {
"id": "8606798",
"name": "ID",
"content": {
"value": "ABC"
}
}
}
]
}
Path:
$.data[*].attributes[?(@.name=='ID')]
Which returns no match on jsonpath.com or using jsonpath-ng in python.
What am I fundamentally missing that this filter isn't working?
Note: End goal would be to get name and content.value.
EDIT:
On https://jsonpath.herokuapp.com/ the path actually works. hm...implementation dependent?
Indeed, this seems to be an idiosyncrasy of the jsonpath-ng implementation. It looks like the "filter op can only be used on lists", i.e. an iterable.
As a workaround, we can filter on the data-array and pull the attributes afterward instead:
$.data[?(@.attributes.name="ID")].attributes
This gives the same result as Jayway's JsonPath using your path; however, this approach may not be satisfactory in all situations.
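When a path behaves differently across JSONPath implementations, the intended selection can be cross-checked in plain Python. The sketch below uses only the json module and mirrors the shape of the workaround (filter the items of the data list, then read from attributes); it is an illustration, not a replacement for jsonpath-ng.

```python
import json

document = json.loads("""
{
  "data": [
    {
      "meta": {"definition": {"title": "ID", "type": "text", "key": "657876498"}},
      "attributes": {"id": "8606798", "name": "ID", "content": {"value": "ABC"}}
    }
  ]
}
""")

# Filter on the items of the "data" list, then pull name and content.value
# from each matching item's "attributes" object.
matches = [
    (item["attributes"]["name"], item["attributes"]["content"]["value"])
    for item in document["data"]
    if item.get("attributes", {}).get("name") == "ID"
]
print(matches)
```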
Please try the paths below for name and content.value:
$.data[0].attributes.name
$.data[0].attributes.content.value

No enum error when validating JSON using jsonschema in python

First of all, I am not getting a proper error response on the web platform either (https://jsonschemalint.com). I am using jsonschema in Python, and have a proper JSON schema and JSON data that works.
The problem I'd like to solve is the following: before we deliver JSON files with example data, we need to run them through SoapUI to test whether they are valid. We are dealing with huge files and our devs may make errors when generating them, so we do a final check.
I'd like to create a script to automate this and avoid SoapUI. After googling, I came across jsonschema and tried to use it. I get the expected results, and I get errors when I delete certain elements as usual, but the biggest issues are the following:
Example :
I have a subsubsub object in my JSON schema, let's call it Test1, which contains the following :
**Schema**
{
"exname":"2",
"info":{},
"consumes":{},
"produces":{},
"schemes":{},
"tags":{},
"parameters":{},
"paths":{},
"definitions":{
"MainTest1":{
"description":"",
"minProperties":1,
"properties":{
"test1":{
"items":{
"$ref":"#//Test1"
},
"maxItems":10,
"minItems":1,
"type":"array"
},
"test2":{
"items":{
"$ref":"#//"
},
"maxItems":10,
"minItems":1,
"type":"array"
}
}
},
"Test1":{
"description":"test1des",
"minProperties":1,
"properties":{
"prop1":{
"description":"prop1des",
"example":"prop1exam",
"maxLength":10,
"minLength":2,
"type":"string"
},
"prop2":{
"description":"prop2des",
"example":"prop2example",
"maxLength":200,
"minLength":2,
"type":"string"
},
"prop3":{
"enum":[
"enum1",
"enum2",
"enum3"
],
"example":"enum1",
"type":"string"
}
},
"required":[
"prop3"
],
"type":"object"
}
}
}
**Proper example for Test1**
{
"Test1": [{
"prop1": "TestStr",
"prop2": "Test and Test",
"prop3": "enum1"
}]
}
**Improper example that still passes validation for Test1**
{
"test1": [{
"prop1": "TestStr123456", [wrong as it passes the max limit]
"prop2": "Test and Test",
"prop3": " enum1" [wrong as it has a whitespace char before enum1]
}]
}
The first issue I ran across is that the enum in prop3 isn't validated correctly. When I use " enum1" or "enumruwehrqweur" or "literally anything", the tests pass. In addition, the min and max lengths do not get checked anywhere in my JSON. No matter how many characters I use in any field, I do not get an error. Does anyone have an idea how to fix this, or has anyone found a better way to do what I would like to do? Thank you in advance!
There were a few issues with your schema. I'll address each of them.
In your schema, you have "Test1". In your JSON instance, you have "test1". Case is important. I would guess this is just an error in creating your example.
In your schema, you have "Test1" at the root level. Because this is not a schema keyword, it is ignored and has no effect on validation. You need to nest it inside a "properties" object, as you have done elsewhere.
{
"properties": {
"test1": {
Your validation would still not work correctly. If you want to validate each item in an array, you need to use the items keyword.
{
"properties": {
"test1": {
"items": {
"description": "test1des",
Finally, you'll need to nest the required and type keywords inside the items object.
Here's the complete schema:
{
"properties": {
"test1": {
"items": {
"description": "test1des",
"minProperties": 1,
"properties": {
"prop1": {
"description": "prop1des",
"example": "prop1exam",
"maxLength": 10,
"minLength": 2,
"type": "string"
},
"prop2": {
"description": "prop2des",
"example": "prop2example",
"maxLength": 200,
"minLength": 2,
"type": "string"
},
"prop3": {
"enum": [
"enum1",
"enum2",
"enum3"
],
"example": "enum1",
"type": "string"
}
},
"required": [
"prop3"
],
"type": "object"
}
}
}
}
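To see why the nesting matters, here is a hand-written check of only the keywords involved (properties, items, required, enum, minLength, maxLength). It is a simplified sketch, not the jsonschema library and not a complete validator: the type keyword is not checked, and enum is only applied to strings. The function name validate and the variable names are made up for illustration.

```python
def validate(instance, schema, path="$"):
    """Check a small subset of JSON Schema keywords; return error strings."""
    errors = []
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required property {name!r}")
        # Only names listed under "properties" are looked at; any other
        # key in the schema (such as "Test1") is silently ignored.
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema, f"{path}.{name}"))
    elif isinstance(instance, list):
        if "items" in schema:
            for i, item in enumerate(instance):
                errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    elif isinstance(instance, str):
        if "enum" in schema and instance not in schema["enum"]:
            errors.append(f"{path}: {instance!r} is not one of {schema['enum']}")
        if len(instance) > schema.get("maxLength", len(instance)):
            errors.append(f"{path}: longer than {schema['maxLength']} characters")
        if len(instance) < schema.get("minLength", 0):
            errors.append(f"{path}: shorter than {schema['minLength']} characters")
    return errors


item_schema = {
    "type": "object",
    "required": ["prop3"],
    "properties": {
        "prop1": {"type": "string", "minLength": 2, "maxLength": 10},
        "prop3": {"type": "string", "enum": ["enum1", "enum2", "enum3"]},
    },
}
# "Test1" at the root is not a keyword, so nothing below it is applied.
ignored = {"Test1": item_schema}
# Nested under "properties" and "items", the constraints reach each array element.
nested = {"properties": {"test1": {"type": "array", "items": item_schema}}}

bad = {"test1": [{"prop1": "TestStr123456", "prop3": " enum1"}]}
print(validate(bad, ignored))  # no errors are reported
for message in validate(bad, nested):
    print(message)
```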

How to define map of string to string JSON schema using JSL python library?

Let’s say one of the properties of my JSON document holds a collection of HTTP headers, which is simply a map of string keys to string values.
{
"property": "value",
"headers": {
"Content-Type": "text/css",
"Last-Modified": "Tue, 08 Aug 2017 18:57:23 GMT",
"Etag": "123456abc"
}
}
How can I define a JSON schema for such a document using the JSL Python library, ideally achieving something similar to this answer on how to define a map of string to integer?
I would also really like an explanation of the resulting JSON schema (similar to what was shown in the mentioned answer), as I am unable to fully comprehend it.
The JSL library provides a “DictField” class for cases where you wish to define an object (dictionary/map) and describe the type of its values via “additional_properties”.
For example:
>>> import jsl
...
... class PayloadSchema(jsl.Document):
...     ip_address = jsl.IPv4Field(required=True)
...     http_headers = jsl.DictField(required=True, additional_properties=jsl.StringField(), min_properties=1)
...
>>> PayloadSchema.get_schema()
This will produce the following JSON schema (draft 4):
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"ip_address": {
"type": "string",
"format": "ipv4"
},
"http_headers": {
"type": "object",
"additionalProperties": {
"type": "string"
},
"minProperties": 1
}
},
"required": [
"ip_address",
"http_headers"
],
"additionalProperties": false
}
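As for reading the generated schema: http_headers lists no named "properties", so every key in it counts as an additional property, and "additionalProperties": {"type": "string"} requires each of those values to be a string. "minProperties": 1 requires at least one entry. The same constraint can be expressed in plain Python as a rough sketch; the function name is_string_map is hypothetical and not part of JSL.

```python
def is_string_map(value, min_properties=1):
    """True if value is a dict of str keys to str values with enough entries."""
    return (
        isinstance(value, dict)
        and len(value) >= min_properties
        and all(isinstance(k, str) and isinstance(v, str) for k, v in value.items())
    )


headers = {
    "Content-Type": "text/css",
    "Last-Modified": "Tue, 08 Aug 2017 18:57:23 GMT",
    "Etag": "123456abc",
}
print(is_string_map(headers))                  # True
print(is_string_map({"Content-Length": 42}))   # False: value is not a string
print(is_string_map({}))                       # False: fewer than one property
```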

python querying a json objectpath

I have a nested JSON structure and I'm using objectpath (the Python API version), but I don't understand how to select and filter some of the information, more precisely the nested information in the structure.
For example, I want to select the "description" of the action "reading" for the user "John".
JSON:
{
"user":
{
"actions":
[
{
"name": "reading",
"description": "blablabla"
}
]
"name": "John"
}
}
CODE:
$.user[@.name is 'John' and @.actions.name is 'reading'].actions.description
but it doesn't work (I get an empty set, although my JSON does contain a match).
Any suggestion?
Is this what you are trying to do?
import objectpath

data = {
    "user": {
        "actions": {
            "name": "reading",
            "description": "blablabla"
        },
        "name": "John"
    }
}

tree = objectpath.Tree(data)
# Filter the user by name, then the action by name, then read its description
result = tree.execute("$.user[@.name is 'John'].actions[@.name is 'reading'].description")
# execute() returns a generator here, so iterate over it
for entry in result:
    print(entry)
Output
blablabla
I had to fix your JSON. Also, tree.execute returns a generator. You could replace the for loop with print(next(result)), but the for loop seemed clearer.
from objectpath import *
your_json = {"name": "felix", "last_name": "diaz"}
# This JSON path will bring back all the key-values of your JSON
your_json_path = '$.*'
my_key_values = Tree(your_json).execute(your_json_path)
# If you want to retrieve the name node, specify it.
my_name = Tree(your_json).execute('$.name')
# If you want to retrieve the last_name node, specify it.
last_name = Tree(your_json).execute('$.last_name')
I believe you're just missing a comma in JSON:
{
"user":
{
"actions": [
{
"name": "reading",
"description": "blablabla"
}
],
"name": "John"
}
}
Assuming there is only one "John", with only one "reading" activity, the following query works:
$.user[@.name is 'John'].actions[0][@.name is 'reading'][0].description
If there could be multiple "John"s, with multiple "reading" activities, the following query will almost work:
$.user.*[@.name is 'John'].actions..*[@.name is 'reading'].description
I say almost because the use of .. will be problematic if there are other nested dictionaries with "name" and "description" entries, such as
{
"user": {
"actions": [
{
"name": "reading",
"description": "blablabla",
"nested": {
"name": "reading",
"description": "broken"
}
}
],
"name": "John"
}
}
A fully correct query depends on an open issue about correctly implementing queries into arrays: https://github.com/adriank/ObjectPath/issues/60
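Until then, one way to avoid the recursive-descent problem is to do the selection in plain Python, which only looks at the top-level entries of actions and never descends into nested dictionaries. This is a minimal sketch rather than an ObjectPath feature; the function name descriptions is hypothetical.

```python
data = {
    "user": {
        "actions": [
            {
                "name": "reading",
                "description": "blablabla",
                "nested": {"name": "reading", "description": "broken"},
            }
        ],
        "name": "John",
    }
}


def descriptions(user, user_name, action_name):
    """Descriptions of the user's top-level actions with the given name."""
    if user.get("name") != user_name:
        return []
    return [
        action["description"]
        for action in user.get("actions", [])
        if action.get("name") == action_name
    ]


print(descriptions(data["user"], "John", "reading"))  # ['blablabla']
```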

Remove duplicates from a list of nested dictionaries

I'm writing my first Python program to manage users in Atlassian On Demand using their RESTful API. I call the users/search?username= API to retrieve lists of users, which returns JSON. The result is a list of complex dictionaries that look something like this:
[
{
"self": "http://www.example.com/jira/rest/api/2/user?username=fred",
"name": "fred",
"avatarUrls": {
"24x24": "http://www.example.com/jira/secure/useravatar?size=small&ownerId=fred",
"16x16": "http://www.example.com/jira/secure/useravatar?size=xsmall&ownerId=fred",
"32x32": "http://www.example.com/jira/secure/useravatar?size=medium&ownerId=fred",
"48x48": "http://www.example.com/jira/secure/useravatar?size=large&ownerId=fred"
},
"displayName": "Fred F. User",
"active": false
},
{
"self": "http://www.example.com/jira/rest/api/2/user?username=andrew",
"name": "andrew",
"avatarUrls": {
"24x24": "http://www.example.com/jira/secure/useravatar?size=small&ownerId=andrew",
"16x16": "http://www.example.com/jira/secure/useravatar?size=xsmall&ownerId=andrew",
"32x32": "http://www.example.com/jira/secure/useravatar?size=medium&ownerId=andrew",
"48x48": "http://www.example.com/jira/secure/useravatar?size=large&ownerId=andrew"
},
"displayName": "Andrew Anderson",
"active": false
}
]
I'm calling this multiple times and thus getting duplicate people in my results. I have been searching and reading but cannot figure out how to deduplicate this list. I figured out how to sort this list using a lambda function. I realize I could sort the list, then iterate and delete duplicates. I'm thinking there must be a more elegant solution.
Thank you!
The usernames are unique, right?
Does it have to be a list? Seems like an easy solution would be to make it a dict of dicts instead. Use the usernames as keys, and only the most recent version will be present.
If the values have to be ordered, there is an OrderedDict type you could look into: http://docs.python.org/2/library/collections.html#collections.OrderedDict
Let's say this is what you got:
JSON = [
    {
        "name": "fred",
        ...
    },
    {
        "name": "peter",
        ...
    },
    {
        "name": "fred",
        ...
    },
]
Converting this list of dicts to a dict of dicts removes the duplicates, like so:
r = {user['name']: user for user in JSON}
In r you will find only one record each for fred and peter.
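If a list is still needed afterwards, the dict's values can be turned back into one. The sketch below uses made-up sample records (only name and active are shown) and relies on dicts preserving insertion order, which is guaranteed from Python 3.7 on. It also shows a variant that keeps the first record per name instead of the last.

```python
# Made-up sample data; real records carry more fields.
users = [
    {"name": "fred", "active": False},
    {"name": "andrew", "active": False},
    {"name": "fred", "active": True},
]

# Last record per name wins; key order follows first appearance.
last_seen = {user["name"]: user for user in users}
deduped_last = list(last_seen.values())

# First record per name wins: setdefault only stores a name once.
first_seen = {}
for user in users:
    first_seen.setdefault(user["name"], user)
deduped_first = list(first_seen.values())

print(deduped_last)
print(deduped_first)
```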
