No enum error when validating JSON using jsonschema in python - python

First of all, I am not getting a proper error reponse on the web platform as well (https://jsonschemalint.com). I am using jsonschema in python, and have a proper json schema and json data that works.
The problem I'd like to solve is the following: Before we deliver JSON files with example data, we need to run them through SoapUI to test if they are proper, as we are dealing with huge files and usually our devs may make some errors in generating them, so we do the final check.
I'd like to create a script to automate this, avoiding SoapUI. So after googling, I came across jsonschema, and tried to use it. I get all the proper results,etc, I get errors when I delete certain elements as usual, but the biggest issues are the following:
Example :
I have a subsubsub object in my JSON schema, let's call it Test1, which contains the following :
**Schema**
{
"exname":"2",
"info":{},
"consumes":{},
"produces":{},
"schemes":{},
"tags":{},
"parameters":{},
"paths":{},
"definitions":{
"MainTest1":{
"description":"",
"minProperties":1,
"properties":{
"test1":{
"items":{
"$ref":"#//Test1"
},
"maxItems":10,
"minItems":1,
"type":"array"
},
"test2":{
"items":{
"$ref":"#//"
},
"maxItems":10,
"minItems":1,
"type":"array"
}
}
},
"Test1":{
"description":"test1des",
"minProperties":1,
"properties":{
"prop1":{
"description":"prop1des",
"example":"prop1exam",
"maxLength":10,
"minLength":2,
"type":"string"
},
"prop2":{
"description":"prop2des",
"example":"prop2example",
"maxLength":200,
"minLength":2,
"type":"string"
},
"prop3":{
"enum":[
"enum1",
"enum2",
"enum3"
],
"example":"enum1",
"type":"string"
}
},
"required":[
"prop3"
],
"type":"object"
}
}
}
**Proper example for Test1**
{
"Test1": [{
"prop1": "TestStr",
"prop2": "Test and Test",
"prop3": "enum1"
}]
}
**Improper example that still passes validation for Test1**
{
"test1": [{
"prop1": "TestStr123456", [wrong as it passes the max limit]
"prop2": "Test and Test",
"prop3": " enum1" [wrong as it has a whitespace char before enum1]
}]
}
The first issue I ran across is that enum in prop3 isn't validated correctly. So, when I use " enum1" or "enumruwehrqweur" or "literally anything", the tests pass. In addition, that min-max characters do not get checked throughout my JSON. No matter how many characters I use in any field, I do not get an error. Anyone has any idea how to fix this, or has anyone found a better workaround to do what I would like to do? Thank you in advance!

There were a few issues with your schema. I'll address each of them.
In your schema, you have "Test1". In your JSON instance, you have "test1". Case is important. I would guess this is just an error in creating your example.
In your schema, you have "Test1" at the root level. Because this is not a schema key word, it is ignored, and has no effect on validation. You need to nest it inside a "properties" object, as you have done elsewhere.
{
"properties": {
"test1": {
Your validation would still not work correctly. If you want to validate each item in an array, you need to use the items keyword.
{
"properties": {
"test1": {
"items": {
"description": "test1des",
Finally, you'll need to nest the required and type key words inside the items object.
Here's the complete schema:
{
"properties": {
"test1": {
"items": {
"description": "test1des",
"minProperties": 1,
"properties": {
"prop1": {
"description": "prop1des",
"example": "prop1exam",
"maxLength": 10,
"minLength": 2,
"type": "string"
},
"prop2": {
"description": "prop2des",
"example": "prop2example",
"maxLength": 200,
"minLength": 2,
"type": "string"
},
"prop3": {
"enum": [
"enum1",
"enum2",
"enum3"
],
"example": "enum1",
"type": "string"
}
},
"required": [
"prop3"
],
"type": "object"
}
}
}
}

Related

Best practices when pulling API data that is being updated/changed repeatedly

currently I'm pulling data from the Quickbooks online API, parsing it and storing it into a database. My problem is, right now with the small amount of data I am pulling I am just deleting my db tables and repopulating the tables with the updated - is there any way I can do this more optimally?
QuickBooks provides an API that is exactly what you're looking for. It's called Change Data Capture and is a common pattern for time-based updates like you're describing.
If you refer to Intuit's docs they tell you all about it:
https://developer.intuit.com/app/developer/qbo/docs/learn/explore-the-quickbooks-online-api/change-data-capture
Basically you make requests like this, providing a date/time you want data changed since:
https://quickbooks.api.intuit.com/v3/company/<realmId>/cdc?entities=<entityList>&changedSince=<dateTime>
And you get back a list of changed objects that you can then update your local database with:
{
"CDCResponse": [{
"QueryResponse": [{
"Customer": [{
...
"Id": "63",
...
},
{
...
"Id": "99",
...
}],
"startPosition": 1,
"maxResults": 2
},
{
"Estimate": [{
...
"Id": "34",
...
},
{
...
"Id": "123",
...
},
{
"domain": "QBO",
"status": "Deleted",
"Id": "979",
"MetaData": {
"LastUpdatedTime": "2015-12-23T12:55:50-08:00"
}
}],
"startPosition": 1,
"maxResults": 3,
"totalCount": 5
}]
}],
"time": "2015-12-23T10:00:01-07:00"
}

JSON EXtraction in Python

I am trying to extract a specific part of the JSON but I keep on getting errors.
I am interested in the following sections:
"field": "tag",
"value": "Wian",
I can extract the entire filter section using:
for i in range(0,values_num):
dedata[i]['filter']
But if I try to filter beyond that point I just get errors.
Could someone please assist me with this?
Here is the JSON output style:
{
"mod_time": 1594631137499,
"description": "",
"id": 82,
"name": "Wian",
"include_custom_devices": true,
"dynamic": true,
"field": null,
"value": null,
"filter": {
"rules": [
{
"field": "tag",
"operand": {
"value": "Wian",
"is_regex": false
},
"operator": "~"
}
],
"operator": "and"
}
}
You are probably trying to access the data in rules but since its an array, you have to specifically access that array by getting the [0] index.
You could simplistically just use .get('<name>') as shown below:
dedata['filter']['rules'][0].get('field'))
Likewise for value:
dedata[i]['filter']['rules'][0]['operand'].get('value')
comment out the for loop and try without it and [i] and see if it works

Is there a way to use JSON schemas to enforce values between fields?

I've recently started playing with JSON schemas to start enforcing API payloads. I'm hitting a bit of a roadblock with defining the schema for a legacy API that has some pretty kludgy design logic which has resulted (along with poor documentation) in clients misusing the endpoint.
Here's the schema so far:
{
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {
"type": "string"
},
"object_id": {
"type": "string"
},
"question_id": {
"type": "string",
"pattern": "^-1|\\d+$"
},
"question_set_id": {
"type": "string",
"pattern": "^-1|\\d+$"
},
"timestamp": {
"type": "string",
"format": "date-time"
},
"values": {
"type": "array",
"items": {
"type": "string"
}
}
},
"required": [
"type",
"object_id",
"question_id",
"question_set_id",
"timestamp",
"values"
],
"additionalProperties": false
}
}
Notice that for question_id and question_set_id, they both take a numeric string that can either be a -1 or some other non-negative integer.
My question: is there a way to enforce that if question_id is set to -1, that question_set_id is also set to -1 and vice-versa.
It would be awesome if I could have that be validated by the parser rather than having to do that check in application logic.
Just for additional context, I've been using python's jsl module to generate this schema.
You can achieve the desired behavior by adding the following to your items schema. It asserts that the schema must conform to at least one of the schemas in the list. Either both are "-1" or both are positive integers. (I assume you have good reason for representing integers as strings.)
"anyOf": [
{
"properties": {
"question_id": { "enum": ["-1"] },
"question_set_id": { "enum": ["-1"] }
}
},
{
"properties": {
"question_id": {
"type": "string",
"pattern": "^\\d+$"
},
"question_set_id": {
"type": "string",
"pattern": "^\\d+$"
}
}
}

Mongodb distinct count for multi fields With example

I am using pymongo MongoClient to do multiple fields distinct count.
I found a similar example here: Link
But it doesn't works for me.
For example, by given:
data = [{"name": random.choice(all_names),
"value": random.randint(1, 1000)} for i in range(1000)]
collection.insert(data)
I want to count how many name, value combination. So I followed the Link above, and write this just for a test (I know this solution is not what I want, I just follow the pattern of the Link, and trying to get understand how it works, at least this code can returns me somthing):
collection.aggregate([
{
"$group": {
"_id": {
"name": "$name",
"value": "$value",
}
}
},
{
"$group": {
"_id": {
"name": "$_id.name",
},
"count": {"$sum": 1},
},
}
])
But the console gives me this:
on namespace test.$cmd failed: exception: A pipeline stage
specification object must contain exactly one field.
So, what is the right code to do this? Thank you for all your helps.
Finally I found a solution: Group by Null
res = col.aggregate([
{
"$group": {
"_id": {
"name": "$name",
"value": "$value",
},
}
},
{
"$group": {"_id": None, "count": {"$sum": 1}}
},
])

python querying a json objectpath

I've a nested json structure, I'm using objectpath (python API version), but I don't understand how to select and filter some information (more precisely the nested information in the structure).
EG.
I want to select the "description" of the action "reading" for the user "John".
JSON:
{
"user":
{
"actions":
[
{
"name": "reading",
"description": "blablabla"
}
]
"name": "John"
}
}
CODE:
$.user[#.name is 'John' and #.actions.name is 'reading'].actions.description
but it doesn't work (empty set but in my JSON it isn't so).
Any suggestion?
Is this what you are trying to do?
import objectpath
data = {
"user": {
"actions": {
"name": "reading",
"description": "blablabla"
},
"name": "John"
}
}
tree = objectpath.Tree(data)
result = tree.execute("$.user[#.name is 'John'].actions[#.name is 'reading'].description")
for entry in result:
print entry
Output
blablabla
I had to fix your JSON. Also, tree.execute returns a generator. You could replace the for loop with print result.next(), but the for loop seemed more clear.
import objectpath import *
your_json = {"name": "felix", "last_name": "diaz"}
# This json path will bring all the key-values of your json
your_json_path='$.*'
my_key_values = Tree(your_json).execute(your_json_path)
# If you want to retrieve the name node...then specify it.
my_name= Tree(your_json).execute('$.name')
# If you want to retrieve a the last_name node...then specify it.
last_name= Tree(your_json).execute('$.last_name')
I believe you're just missing a comma in JSON:
{
"user":
{
"actions": [
{
"name": "reading",
"description": "blablabla"
}
],
"name": "John"
}
}
Assuming there is only one "John", with only one "reading" activity, the following query works:
$.user[#.name is 'John'].actions[0][#.name is 'reading'][0].description
If there could be multiple "John"s, with multiple "reading" activities, the following query will almost work:
$.user.*[#.name is 'John'].actions..*[#.name is 'reading'].description
I say almost because the use of .. will be problematic if there are other nested dictionaries with "name" and "description" entries, such as
{
"user": {
"actions": [
{
"name": "reading",
"description": "blablabla",
"nested": {
"name": "reading",
"description": "broken"
}
}
],
"name": "John"
}
}
To get a correct query, there is an open issue to correctly implement queries into arrays: https://github.com/adriank/ObjectPath/issues/60

Categories