How to require data in the JSON list? - python

How does one require data in a list? And how does one specify the type of that data, e.g. a dictionary?
Below are two JSON samples and a schema. Both JSON samples are valid according to the schema. The sample with the empty list should fail validation IMHO. How do I make that happen?
from jsonschema import validate
# this is ok per the schema
{
"mylist":[
{
"num_items":8,
"freq":8.5,
"other":2
},
{
"num_items":8,
"freq":8.5,
"other":4
}
]
}
# this should fail validation, but does not.
{
"mylist":[
]
}
# schema
{
"$schema": "http://json-schema.org/schema#",
"type": "object",
"properties": {
"mylist": {
"type": "array",
"items": {
"type": "object",
"properties": {
"num_items": {
"type": "integer"
},
"freq": {
"type": "number"
},
"other": {
"type": "integer"
}
},
"required": [
"freq",
"num_items",
"other"
]
}
}
},
"required": [
"mylist"
]
}
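One keyword that appears to cover this case is minItems, which sets a lower bound on the array length. The following is a minimal sketch, assuming the jsonschema package is installed; the choice of Draft7Validator and the variable names are illustrative and do not come from the question.

```python
from jsonschema import Draft7Validator

# Schema for one list element, copied from the question.
item_schema = {
    "type": "object",
    "properties": {
        "num_items": {"type": "integer"},
        "freq": {"type": "number"},
        "other": {"type": "integer"},
    },
    "required": ["freq", "num_items", "other"],
}

schema = {
    "type": "object",
    "properties": {
        "mylist": {
            "type": "array",
            "minItems": 1,         # reject the empty list
            "items": item_schema,  # every element must be a matching object
        }
    },
    "required": ["mylist"],
}

validator = Draft7Validator(schema)

filled = {"mylist": [{"num_items": 8, "freq": 8.5, "other": 2}]}
empty = {"mylist": []}

print(validator.is_valid(filled))  # True
print(validator.is_valid(empty))   # False
```

The element type is already enforced by "items" with "type": "object", so only the length constraint is new here.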

Related

Python: JSON to CSV

I am receiving a JSON file from a Docparser API, which I would like to convert to a CSV document.
The structure is shown below:
{
"type": "object",
"properties": {
"id": {
"type": "string"
},
"document_id": {
"type": "string"
},
"remote_id": {
"type": "string"
},
"file_name": {
"type": "string"
},
"page_count": {
"type": "integer"
},
"uploaded_at": {
"type": "string"
},
"processed_at": {
"type": "string"
},
"table_data": [
{
"type": "array",
"items": {
"type": "object",
"properties": {
"account_ref": {
"type": "string"
},
"client": {
"type": "string"
},
"transaction_type": {
"type": "string"
},
"key_4": {
"type": "string"
},
"date_yyyymmdd": {
"type": "string"
},
"amount_excl": {
"type": "string"
}
},
"required": [
"account_ref",
"client",
"transaction_type",
"key_4",
"date_yyyymmdd",
"amount_excl"
]
}
}
]
}
}
My first problem is how to work with only the table_data section.
My second problem is writing the actual code that allows me to put each section, i.e. account_ref, client, etc., into their own columns. I had so many changes to my code, the output varied from adding the properties into columns and dumping the table_data part into one cell, to only printing the headers into a single cell (as a list).
Here's my current code (which is not working correctly):
import pydocparser
import json
import pandas as pd
parser = pydocparser.Parser()
parser.login('API')
data2 = str(parser.fetch("Name of Parser", 'documentID'))
data2 = str(data2).replace("'", '"') # I had to put this in because it kept saying that it needs double quotes.
y = json.loads(str(data2))
json_file = open(r"C:\File.json", "w")
json_file.write(str(y))
json_file.close()
df1 = df = pd.DataFrame({str(y)})
df1.to_csv(r"C:\jsonCSV.csv")
Thanks for your help!
Pandas has a nice built-in function called pandas.json_normalize().
If you're using a pandas version lower than 1.0.0, use pandas.io.json.json_normalize() instead. It should split the columns nicely.
Read more about it here:
<1.0.0:
https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.io.json.json_normalize.html
>=1.0.0:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html
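As a rough illustration of how json_normalize could pull out only table_data, here is a sketch assuming pandas 1.0.0 or later. The document dictionary is a hypothetical payload shaped like the structure in the question; the real Docparser response may look different, and the sample values are made up.

```python
import pandas as pd

# Hypothetical payload; field names follow the structure in the question.
document = {
    "id": "abc123",
    "file_name": "statement.pdf",
    "page_count": 1,
    "table_data": [
        {"account_ref": "A1", "client": "Acme", "transaction_type": "debit",
         "key_4": "x", "date_yyyymmdd": "20200101", "amount_excl": "10.00"},
        {"account_ref": "A2", "client": "Globex", "transaction_type": "credit",
         "key_4": "y", "date_yyyymmdd": "20200102", "amount_excl": "25.50"},
    ],
}

# record_path selects the nested list, one row per element;
# meta copies the chosen top-level fields onto every row.
df = pd.json_normalize(document, record_path="table_data", meta=["id", "file_name"])

# With no path argument to_csv returns the CSV text; pass a file path to write to disk.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Each key of the table_data rows becomes its own column, which is the layout asked for in the question. Drop the meta argument if only the table columns are wanted.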

How do I apply a subschema to a value when the key is unknown with JSON Schema?

Got a bit of a puzzle here. I'm trying to build a schema to use in my Python app, but I can't figure out how to make this "we" field required while allowing its key to be an arbitrary string (e.g. "QWERT1").
{
"we": [
{
"finished": "01.23.2020 12:56:31",
"run": "02611",
"scenarios": [
{
"name": "name",
"status": "failed",
"run_id": "42",
"tests": [
{
"test_id": "7",
"name": "TC29",
"status": "success",
"finished": "01.23.2020 12:56:31"
}
]
}
]
}
]
}
The rest of the fields (name, status, etc.) should also be mandatory. If I exclude "we" from required, the rest of the fields are treated as non-mandatory, and if I make "we" mandatory, I can't use any other word in its place :/
This is the schema I've ended up with (with "we" mandatory):
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"we": {
"type": "array",
"items": [
{
"type": "object",
"properties": {
"finished": {
"type": "string"
},
"run": {
"type": "string"
},
"scenarios": {
"type": "array",
"items": [
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"status": {
"type": "string"
},
"run_id": {
"type": "string"
},
"tests": {
"type": "array",
"items": [
{
"type": "object",
"properties": {
"test_id": {
"type": "string"
},
"name": {
"type": "string"
},
"status": {
"type": "string"
},
"finished": {
"type": "string"
}
},
"required": [
"test_id",
"name",
"status",
"finished"
]
}
]
}
},
"required": [
"name",
"status",
"run_id",
"tests"
]
}
]
}
},
"required": [
"finished",
"run",
"scenarios"
]
}
]
}
},
"required": [
"we"
]
}
Any ideas?
If I understand correctly, the root object key could be any string.
First, you need to replace required with minProperties: 1. If you want exactly one property, you also need maxProperties: 1.
Next, you need to use additionalProperties rather than properties > we.
additionalProperties applies its subschema to the value of every property that is not matched by properties or patternProperties. With neither of those present, that means every property value of the object.
Here's a bare version of that schema...
{
"$schema": "http://json-schema.org/draft-07/schema#",
"minProperties": 1,
"additionalProperties": {}
}
You can test it with your schema and instance here: https://jsonschema.dev/s/2kE9y
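To see the idea working from Python, here is a small sketch, assuming the jsonschema package. The run_list_schema name is hypothetical, and the subschema is trimmed to the top-level fields of the question's example rather than the full nested schema.

```python
from jsonschema import Draft7Validator

# Subschema for the value stored under the unknown key (trimmed for brevity).
run_list_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "finished": {"type": "string"},
            "run": {"type": "string"},
            "scenarios": {"type": "array"},
        },
        "required": ["finished", "run", "scenarios"],
    },
}

schema = {
    "type": "object",
    "minProperties": 1,  # at least one key must be present
    "maxProperties": 1,  # and no more than one
    "additionalProperties": run_list_schema,  # applies whatever the key is called
}

validator = Draft7Validator(schema)

good = {"QWERT1": [{"finished": "01.23.2020 12:56:31", "run": "02611", "scenarios": []}]}
missing_field = {"QWERT1": [{"run": "02611", "scenarios": []}]}

print(validator.is_valid(good))           # True
print(validator.is_valid(missing_field))  # False
print(validator.is_valid({}))             # False
```

Note that "items" is written here as a single schema rather than a list. In drafts 4 through 7 the list form validates elements by position, so a one-element list would only check the first array item.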

JSON Schema throw validation error if invalid optional attribute

I have a JSON schema, shown below, with three optional properties: height, weight and volume. I want to add the following check:
If any attribute other than height, weight and volume is passed, validation should fail with an error.
I'm not sure how to achieve this, since these attributes are optional.
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"options": {
"type": "object",
"properties": {
"height": {
"type": "number"
},
"weight": {
"type": "number"
},
"volume": {
"type": "number"
}
}
}
}
}
What you're looking for is the additionalProperties keyword. From the JSON Schema docs:
The additionalProperties keyword is used to control the handling of extra stuff, that is, properties whose names are not listed in the properties keyword. By default any additional properties are allowed.
So, this would become:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"options": {
"type": "object",
"properties": {
"height": {
"type": "number"
},
"weight": {
"type": "number"
},
"volume": {
"type": "number"
}
},
"additionalProperties": false
}
}
}
From my understanding, this has been supported since draft 00, so it should be fine with draft 4. Just so you know, an 8th draft of the specification has since been published.
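A quick way to check the behavior is to run the schema through a validator. This is a sketch assuming the jsonschema package; the sample instances, including the "colour" attribute, are made up for illustration.

```python
from jsonschema import Draft4Validator

schema = {
    "type": "object",
    "properties": {
        "options": {
            "type": "object",
            "properties": {
                "height": {"type": "number"},
                "weight": {"type": "number"},
                "volume": {"type": "number"},
            },
            # Anything other than the three names above is rejected.
            "additionalProperties": False,
        }
    },
}

validator = Draft4Validator(schema)

print(validator.is_valid({"options": {"height": 1.5}}))                   # True
print(validator.is_valid({"options": {}}))                                # True
print(validator.is_valid({"options": {"height": 1.5, "colour": "red"}}))  # False

# iter_errors yields one error object per violation, with a readable message.
for error in validator.iter_errors({"options": {"colour": "red"}}):
    print(error.message)
```

The three properties stay optional because none of them is listed under required; additionalProperties only restricts which names may appear.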

JSON Schema: How to check if a field contains a value

I have a JSON schema validator where I need to check a specific field, email, to see if it's one of 4 possible emails. Let's call the possibilities ['test1', 'test2', 'test3', 'test4']. Sometimes the emails contain a \n newline separator, so I need to account for that as well. Is it possible to do a string "contains" check in JSON Schema?
Here is my schema without the email checks:
{
"type": "object",
"properties": {
"data": {
"type":"object",
"properties": {
"email": {
"type": "string"
}
},
"required": ["email"]
}
}
}
My input payload is:
{
"data": {
"email": "test3\njunktext"
}
}
I need the payload above to pass validation, since it has test3 in it. Thanks!
I can think of two ways:
Using enum you can define a list of valid emails. Note that enum requires an exact match, so a value with extra text after a newline will not pass:
{
"type": "object",
"properties": {
"data": {
"type": "object",
"properties": {
"email": {
"enum": [
"test1",
"test2",
"test3",
"test4"
]
}
},
"required": [
"email"
]
}
}
}
Or with pattern, which lets you use a regular expression to match a valid email. Patterns are not anchored by default, so this one matches any string that contains the text:
{
"type": "object",
"properties": {
"data": {
"type": "object",
"properties": {
"email": {
"pattern": "test"
}
},
"required": [
"email"
]
}
}
}
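The difference between the two approaches can be checked against the payload from the question. This is a sketch assuming the jsonschema package; the email_schema helper is hypothetical, and the pattern "test[1-4]" is a tightened variant of the one above, chosen here to match only the four listed values.

```python
from jsonschema import Draft7Validator

def email_schema(email_rule):
    """Build the question's schema with the given rule for the email field."""
    return {
        "type": "object",
        "properties": {
            "data": {
                "type": "object",
                "properties": {"email": email_rule},
                "required": ["email"],
            }
        },
    }

payload = {"data": {"email": "test3\njunktext"}}

# enum compares whole values; pattern searches anywhere in the string.
exact = Draft7Validator(email_schema({"enum": ["test1", "test2", "test3", "test4"]}))
contains = Draft7Validator(email_schema({"type": "string", "pattern": "test[1-4]"}))

print(exact.is_valid(payload))                           # False
print(contains.is_valid(payload))                        # True
print(contains.is_valid({"data": {"email": "nobody"}}))  # False
```

An unanchored pattern also accepts strings such as "xtest3", so add anchors like ^ if the email has to start with one of the four values.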

Is there a way to use JSON schemas to enforce values between fields?

I've recently started playing with JSON schemas to start enforcing API payloads. I'm hitting a bit of a roadblock with defining the schema for a legacy API that has some pretty kludgy design logic which has resulted (along with poor documentation) in clients misusing the endpoint.
Here's the schema so far:
{
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {
"type": "string"
},
"object_id": {
"type": "string"
},
"question_id": {
"type": "string",
"pattern": "^(-1|\\d+)$"
},
"question_set_id": {
"type": "string",
"pattern": "^(-1|\\d+)$"
},
"timestamp": {
"type": "string",
"format": "date-time"
},
"values": {
"type": "array",
"items": {
"type": "string"
}
}
},
"required": [
"type",
"object_id",
"question_id",
"question_set_id",
"timestamp",
"values"
],
"additionalProperties": false
}
}
Notice that question_id and question_set_id both take a numeric string that is either -1 or a non-negative integer.
My question: is there a way to enforce that if question_id is set to -1, then question_set_id is also set to -1, and vice versa?
It would be awesome if I could have that validated by the parser rather than having to do that check in application logic.
Just for additional context, I've been using Python's jsl module to generate this schema.
You can achieve the desired behavior by adding the following to your items schema. It asserts that each item must conform to at least one of the schemas in the list: either both values are "-1" or both are non-negative integers. (I assume you have a good reason for representing integers as strings.)
"anyOf": [
{
"properties": {
"question_id": { "enum": ["-1"] },
"question_set_id": { "enum": ["-1"] }
}
},
{
"properties": {
"question_id": {
"type": "string",
"pattern": "^\\d+$"
},
"question_set_id": {
"type": "string",
"pattern": "^\\d+$"
}
}
}
]
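The anyOf rule can be exercised from Python as follows. This is a sketch assuming the jsonschema package; the item schema is reduced to the two fields involved, and the NUMERIC name is introduced here only for illustration.

```python
from jsonschema import Draft7Validator

# Reusable rule for a string made of digits only.
NUMERIC = {"type": "string", "pattern": "^\\d+$"}

item_schema = {
    "type": "object",
    "properties": {
        "question_id": {"type": "string"},
        "question_set_id": {"type": "string"},
    },
    "required": ["question_id", "question_set_id"],
    "anyOf": [
        # Either both fields are exactly "-1" ...
        {"properties": {"question_id": {"enum": ["-1"]},
                        "question_set_id": {"enum": ["-1"]}}},
        # ... or both are digit strings.
        {"properties": {"question_id": NUMERIC,
                        "question_set_id": NUMERIC}},
    ],
}

validator = Draft7Validator(item_schema)

print(validator.is_valid({"question_id": "-1", "question_set_id": "-1"}))  # True
print(validator.is_valid({"question_id": "12", "question_set_id": "3"}))   # True
print(validator.is_valid({"question_id": "-1", "question_set_id": "3"}))   # False
```

The required list matters here: a properties subschema passes when a property is absent, so without required an item missing one of the two fields would satisfy the anyOf.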
