Python Cerberus: embed numeric config data in schema

I have a set of documents and schemas I am doing validation against (shocker).
These documents are JSON messages from a number of different clients that each use their own format, so a schema is defined for each document/message type received from these clients.
I want to use a dispatcher (dictionary with function calls as values) to help perform the mapping/formatting of a document after it is validated against a matching schema.
Once I know the schema a message is valid against, I can then create the desired message payload for my various consumer services by calling the requisite mapping function.
To this end I need a key in my dispatcher that uniquely maps to the mapping function for that schema; the same key also needs to identify the schema so the correct mapping function can be called.
My question is this: Is there a way to embed a config value like a numeric ID into a schema?
I want to take this schema:
schema = {
    "timestamp": {"type": "number"},
    "values": {
        "type": "list",
        "schema": {
            "type": "dict",
            "schema": {
                "id": {"required": True, "type": "string"},
                "v": {"required": True, "type": "number"},
                "q": {"type": "boolean"},
                "t": {"required": True, "type": "number"},
            },
        },
    },
}
And add a schema_id like this:
schema = {
    "schema_id": 1,
    "timestamp": {"type": "number"},
    "values": {
        "type": "list",
        "schema": {
            "type": "dict",
            "schema": {
                "id": {"required": True, "type": "string"},
                "v": {"required": True, "type": "number"},
                "q": {"type": "boolean"},
                "t": {"required": True, "type": "number"},
            },
        },
    },
}
So after successful validation, the schema_id links the message/document to its schema and, through the dispatcher, to the right mapping function.
Something like this:
mapping_dispatcher = {1: map_function_1, 2: map_function_2, ...}

v = Validator(schema)
if v.validate(document):  # validate() is an instance method in Cerberus
    schema_id = schema["schema_id"]
    formatted_message = mapping_dispatcher[schema_id](document)
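A runnable version of that try-each-schema-then-dispatch pattern might look like the sketch below. The `validate` function here is a stdlib stand-in (a required-key check) for `cerberus.Validator(schema).validate(document)`, and the map functions and schemas are illustrative, not from the post:

```python
def map_function_1(doc):
    return {"ts": doc["timestamp"]}

def map_function_2(doc):
    return {"when": doc["time"]}

# Stand-in "schemas": sets of required keys instead of real Cerberus schemas.
schemas = {
    1: {"timestamp"},
    2: {"time"},
}

mapping_dispatcher = {1: map_function_1, 2: map_function_2}

def validate(document, required_keys):
    # Placeholder for cerberus.Validator(schema).validate(document)
    return required_keys <= document.keys()

def dispatch(document):
    # Try each registered schema; the first match picks the mapping function.
    for schema_id, schema in schemas.items():
        if validate(document, schema):
            return mapping_dispatcher[schema_id](document)
    raise ValueError("no schema matched")

print(dispatch({"timestamp": 123}))  # {'ts': 123}
```

The point is that the schema_id never has to live inside the schema itself; it only has to be the shared key of the schema registry and the dispatcher.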
A last-ditch effort could be to simply stringify the JSON schemas and use those as keys, but I'm not sure how I feel about that (it feels clever but wrong)...
I could also be going about this all wrong and there's a smarter way to do it.
Thanks!
Small update
I've hacked around it by stringifying the schema, encoding it to bytes, then hex, and summing the integer values like so:
import codecs

schema_id = 0
bytes_schema = str(schema).encode()
hex_schema = codecs.encode(bytes_schema, "hex")
for char in hex_schema:  # iterating bytes yields ints in Python 3
    schema_id += int(char)
>>> schema_id
36832
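Note that summing character values like this can collide (two different schemas can sum to the same number). If a derived key is wanted, a standard content hash over a canonical serialization avoids that; a minimal sketch using only the stdlib:

```python
import hashlib
import json

def schema_fingerprint(schema):
    # sort_keys makes the serialization canonical, so equal schemas
    # hash to the same value regardless of dict insertion order.
    canonical = json.dumps(schema, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

print(schema_fingerprint({"timestamp": {"type": "number"}})[:12])
```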

So instead of a hash function I just embedded the schema in another JSON object that held the info like so:
[
    {
        "schema_id": "3",
        "schema": {
            "deviceName": {"type": "string"},
            "tagName": {"required": true, "type": "string"},
            "deviceID": {"type": "string"},
            "success": {"type": "boolean"},
            "datatype": {"type": "string"},
            "timestamp": {"required": true, "type": "number"},
            "value": {"required": true, "type": "number"},
            "registerId": {"type": "string"},
            "description": {"type": "string"}
        }
    }
]
Was overthinking it I guess.
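The wrapper keeps the schema_id outside the schema itself, so the dict handed to the validator stays clean. Building an id-to-schema index from such a list is then a one-liner; a small sketch (the registry entries here are abbreviated stand-ins for the full wrapper above):

```python
registry = [
    {"schema_id": "3", "schema": {"tagName": {"required": True, "type": "string"}}},
    {"schema_id": "4", "schema": {"value": {"required": True, "type": "number"}}},
]

# schema_id -> schema, ready to pair with a schema_id -> mapping-function dispatcher
schemas_by_id = {entry["schema_id"]: entry["schema"] for entry in registry}

print(sorted(schemas_by_id))  # ['3', '4']
```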

Related

How to define intra-dependencies within JSON elements?

Let's say I have the following JSON which defines an API:
{
    "contract_name": "create_user",
    "method": "POST",
    "url": "/users",
    "request_body": {
        "username": {
            "type": "email",
            "options": "None",
            "required": "true",
            "unique": "true"
        },
        "role": {
            "type": "string",
            "options": ["TESTER", "RA_DEVICE"],
            "required": "true"
        },
        "userProfile": {
            "type": "object",
            "options": {
                "TESTER": {
                    "firstName": {"type": "string", "options": "None", "required": "true"},
                    "lastName": {"type": "string", "options": "None", "required": "true"}
                },
                "RA_DEVICE": {
                    "deviceId": {"type": "long", "options": "None", "required": "true", "unique": "true"}
                }
            }
        }
    }
}
Now let's say I want to create API test-cases from a generic Python script (one that can take any API definition and generate test-cases with random values matching each element's 'type').
Problem: I want to define a condition between 'userProfile' and 'role' such that the 'userProfile' value should be based on the value of 'role'.
i.e. if my Python script chooses "RA_DEVICE" as the role, then the value of 'userProfile' should come from the "RA_DEVICE" entry in its options: a random number representing 'deviceId'.
Kindly suggest a way to define the relationship between 'role' and 'userProfile', keeping in mind that the test-case generator must stay generic (able to handle any API with intra-element dependencies).
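One way to read the definition above is that "options" on an object-typed field is keyed by the value of the discriminating field. A generator can then pick the role first and derive the profile from the matching branch; a minimal sketch, with the value generators being illustrative assumptions:

```python
import random

# The options branch for "userProfile", keyed by the chosen "role"
# (field names follow the API definition above).
user_profile_options = {
    "TESTER": {"firstName": "string", "lastName": "string"},
    "RA_DEVICE": {"deviceId": "long"},
}

# Hypothetical per-type value generators.
generators = {
    "string": lambda: "x",
    "long": lambda: random.randint(1, 10**9),
}

def generate_request():
    # Choose the discriminator first, then build the dependent field
    # from the branch that matches it.
    role = random.choice(list(user_profile_options))
    profile = {field: generators[ftype]()
               for field, ftype in user_profile_options[role].items()}
    return {"role": role, "userProfile": profile}

req = generate_request()
print(req["role"], sorted(req["userProfile"]))
```

Because the branch lookup is driven entirely by the definition, the same code handles any API whose dependent fields follow this options-keyed-by-value convention.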

ElasticSearch Parse Error

I am attempting to read JSON data from a network port scan and store the results in an ElasticSearch index as a document. However, whenever I try to do this, I get a MapperParsingException on the scan output results. In my mapping I even tried changing the analysis setting to not_analyzed and no, but the error doesn't go away. Then I figured that ES might be trying to interpret certain values as dates and attempted to set date_format to 0 or none. That led to a dead end as well, with the mapping throwing an Unsupported option exception.
I have a dump of the values that I want to index in ElasticSearch here:
{
    "protocol": "tcp",
    "service": "ssh",
    "state": "open",
    "script_out": [
        {
            "output": "\n 1024 de:4e:50:33:cd:f6:8a:d0:c4:5a:e9:7d:1e:7b:13:12 (DSA)\nssh-dss AAAAB3NzaC1kc3MAAACBANkPx1nphZwsN1SVPPQHwz93abIHuEC4wMEeZiXdBC8RoSUUeCmdgPfIh4or0LvZ1pqaZP/k0qzCLyVxFt/eI7n36Lb9sZdVMf1Ao7E9TSc7lj9wg5ffY58WbWob/GQs1llGZ2K9Gp7oWuwCjKP164MsxMvahoJAAaWfap48ZiXpAAAAFQCnRMwRp8wBzzQU6lia8NegIb5rswAAAIEAxvN66VMDxE5aU8SvwwVmcUNwtVQWZ6pxn2W0gzF6H7JL1BhcnbCwQ3J/S6WdtqL2Dscw8drdAvsrN4XC8RT6Jowsir4q4HSQCybll6fSpNEdlv/nLIlYsH5ZuZZUIMxbTQ9vT0oYvzpDHejIQ/Zl1inYnJ+6XJmOc0LPUsu5PEsAAACAQO+Tsd3inLGskrqyrWSDO0VDD3cApYW7C+uTWXBfIoh/sVw+X9+OPa833w/PQkpacm68kYPXKS7GK8lqhg93dwbUNYFKz9MMNY6WVOjeAX9HtUAbglgLyRIt0CBqmL4snoZeKab22Nlmaf4aU5cHFlG9gnFEcK0vVIwIWp2EM/I=\n 2048 94:5f:86:77:81:39:2e:03:e0:42:d8:7d:10:a5:60:f0 (RSA)\nssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDV9BKj+QSavAr4UcDaCoHADVIaMOpcI5/hx/X9CRLDTxmB/WvEiL42tziMZEx7ipHT28/hl4HOwK64eXZuK75JMrMDutCZ2gmvRmvFKl6mAVbUEOlVkMGZeNJxATCZyWQyrZ6wA9E2ns5+id6l9C8we+bdq39cIR/e+yR8Ht8sfaigDi0gcW67GrHDI/oIgTQ79l+T/xAqCVrtQxqn/6pCuaCWQUVCxgOPXmJPbsd+g+oqZtm0aEjIJvcDJocMkZ2qMMlgMPeJBN27FCTKB80UUbV57iHXHzZF+cD7v+Jlw0fmyMapMkkPH+aabOUy7Kkbty1mucrFxaisLsckEf47",
            "elements": {
                "null": [
                    {
                        "type": "ssh-dss",
                        "bits": "1024",
                        "key": "AAAAB3NzaC1kc3MAAACBANkPx1nphZwsN1SVPPQHwz93abIHuEC4wMEeZiXdBC8RoSUUeCmdgPfIh4or0LvZ1pqaZP/k0qzCLyVxFt/eI7n36Lb9sZdVMf1Ao7E9TSc7lj9wg5ffY58WbWob/GQs1llGZ2K9Gp7oWuwCjKP164MsxMvahoJAAaWfap48ZiXpAAAAFQCnRMwRp8wBzzQU6lia8NegIb5rswAAAIEAxvN66VMDxE5aU8SvwwVmcUNwtVQWZ6pxn2W0gzF6H7JL1BhcnbCwQ3J/S6WdtqL2Dscw8drdAvsrN4XC8RT6Jowsir4q4HSQCybll6fSpNEdlv/nLIlYsH5ZuZZUIMxbTQ9vT0oYvzpDHejIQ/Zl1inYnJ+6XJmOc0LPUsu5PEsAAACAQO+Tsd3inLGskrqyrWSDO0VDD3cApYW7C+uTWXBfIoh/sVw+X9+OPa833w/PQkpacm68kYPXKS7GK8lqhg93dwbUNYFKz9MMNY6WVOjeAX9HtUAbglgLyRIt0CBqmL4snoZeKab22Nlmaf4aU5cHFlG9gnFEcK0vVIwIWp2EM/I=",
                        "fingerprint": "de4e5033cdf68ad0c45ae97d1e7b1312"
                    },
                    {
                        "type": "ssh-rsa",
                        "bits": "2048",
                        "key": "AAAAB3NzaC1yc2EAAAADAQABAAABAQDV9BKj+QSavAr4UcDaCoHADVIaMOpcI5/hx/X9CRLDTxmB/WvEiL42tziMZEx7ipHT28/hl4HOwK64eXZuK75JMrMDutCZ2gmvRmvFKl6mAVbUEOlVkMGZeNJxATCZyWQyrZ6wA9E2ns5+id6l9C8we+bdq39cIR/e+yR8Ht8sfaigDi0gcW67GrHDI/oIgTQ79l+T/xAqCVrtQxqn/6pCuaCWQUVCxgOPXmJPbsd+g+oqZtm0aEjIJvcDJocMkZ2qMMlgMPeJBN27FCTKB80UUbV57iHXHzZF+cD7v+Jlw0fmyMapMkkPH+aabOUy7Kkbty1mucrFxaisLsckEf47",
                        "fingerprint": "945f867781392e03e042d87d10a560f0"
                    }
                ]
            },
            "id": "ssh-hostkey"
        }
    ],
    "banner": "product: OpenSSH version: 6.2 extrainfo: protocol 2.0",
    "port": "22"
}
Update
I am able to index the content in the "output" key. However, the error appears when I try to index the content in the "elements" key.
Update 2
There's a possibility that there's something wrong with my mapping. This is the python code that I am using for the mapping.
"scan_info": {
    "properties": {
        "protocol": {"type": "string", "index": "analyzed"},
        "service": {"type": "string", "index": "analyzed"},
        "state": {"type": "string", "index": "not_analyzed"},
        "banner": {"type": "string", "index": "analyzed"},
        "port": {"type": "string", "index": "not_analyzed"},
        "script_out": {  # is this the problem??
            "type": "object",
            "dynamic": True,
        },
    }
}
I am drawing a blank here. What do I need to do?
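One avenue worth checking (an assumption based on the dump above, not a confirmed diagnosis): if the "script_out" content only needs to be stored and returned, not searched, Elasticsearch lets you set "enabled": false on an object field, which keeps the JSON in _source without parsing it into mapped fields at all, sidestepping conflicts like the literal "null" key under "elements". A sketch of that variant of the mapping:

```python
# Sketch: same mapping as above, but "script_out" is stored without being
# indexed/parsed ("enabled": False is an Elasticsearch object-field option).
scan_info_mapping = {
    "scan_info": {
        "properties": {
            "protocol": {"type": "string", "index": "analyzed"},
            "service": {"type": "string", "index": "analyzed"},
            "state": {"type": "string", "index": "not_analyzed"},
            "banner": {"type": "string", "index": "analyzed"},
            "port": {"type": "string", "index": "not_analyzed"},
            "script_out": {
                "type": "object",
                "enabled": False,  # store in _source, do not map sub-fields
            },
        }
    }
}
```

The trade-off is that nothing under script_out becomes searchable, so this only fits if those results are opaque payload data.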

Is there a way to use JSON schemas to enforce values between fields?

I've recently started playing with JSON schemas to start enforcing API payloads. I'm hitting a bit of a roadblock with defining the schema for a legacy API that has some pretty kludgy design logic which has resulted (along with poor documentation) in clients misusing the endpoint.
Here's the schema so far:
{
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "type": {"type": "string"},
            "object_id": {"type": "string"},
            "question_id": {"type": "string", "pattern": "^-1|\\d+$"},
            "question_set_id": {"type": "string", "pattern": "^-1|\\d+$"},
            "timestamp": {"type": "string", "format": "date-time"},
            "values": {"type": "array", "items": {"type": "string"}}
        },
        "required": ["type", "object_id", "question_id", "question_set_id", "timestamp", "values"],
        "additionalProperties": false
    }
}
Notice that for question_id and question_set_id, they both take a numeric string that can either be a -1 or some other non-negative integer.
My question: is there a way to enforce that if question_id is set to -1, question_set_id is also set to -1, and vice versa?
It would be awesome if I could have that be validated by the parser rather than having to do that check in application logic.
Just for additional context, I've been using python's jsl module to generate this schema.
You can achieve the desired behavior by adding the following to your items schema. anyOf asserts that the data must match at least one of the schemas in the list: either both fields are "-1", or both are non-negative integers. (I assume you have good reason for representing integers as strings.)
"anyOf": [
    {
        "properties": {
            "question_id": {"enum": ["-1"]},
            "question_set_id": {"enum": ["-1"]}
        }
    },
    {
        "properties": {
            "question_id": {"type": "string", "pattern": "^\\d+$"},
            "question_set_id": {"type": "string", "pattern": "^\\d+$"}
        }
    }
]
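To make the anyOf semantics concrete, the same rule expressed as a plain Python check (a stdlib sketch, no jsonschema dependency; the function name is illustrative):

```python
import re

def pair_is_valid(question_id, question_set_id):
    # Branch 1 of the anyOf: both fields are exactly "-1".
    both_minus_one = question_id == "-1" and question_set_id == "-1"
    # Branch 2: both match ^\d+$, i.e. non-negative integer strings.
    both_numeric = bool(re.fullmatch(r"\d+", question_id)
                        and re.fullmatch(r"\d+", question_set_id))
    return both_minus_one or both_numeric

print(pair_is_valid("-1", "-1"))   # True
print(pair_is_valid("-1", "42"))   # False  (mixed pair fails both branches)
print(pair_is_valid("7", "42"))    # True
```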

Python Validictory (JSON schema validation) constraints depending on value given with the JSON string

I am trying to create a JSON Schema validator which decides if one value is valid based on another value in the given JSON:
schema = {
    "foo": {"type": "boolean"},
    "bar": {
        "type": "object",
        "properties": {
            "measurements": {
                "type": "object",
                "properties": {
                    "x": {"type": "number"},  # "required": ONLY IF foo is True
                    "y": {"type": "number"},
                },
            },
        },
    },
}
Is that somehow possible?
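As far as I can tell, validictory has no value-conditional "required" keyword (newer JSON Schema drafts express this with if/then), so one pragmatic option is a plain post-validation check. A sketch of the rule "x is required only when foo is true", with the function name being illustrative:

```python
def measurements_valid(doc):
    # Enforce the conditional requirement the schema comment describes:
    # when foo is true, measurements must contain "x".
    measurements = doc.get("bar", {}).get("measurements", {})
    if doc.get("foo") is True and "x" not in measurements:
        return False
    return True

print(measurements_valid({"foo": True, "bar": {"measurements": {"y": 1}}}))   # False
print(measurements_valid({"foo": False, "bar": {"measurements": {"y": 1}}}))  # True
```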

How to make json-schema to allow one but not another field?

Is it possible to make a JSON schema accept only one of two fields?
For example, imagine I want a JSON with either start_dt or end_dt, but not both of them at the same time, like this:
OK
{
    "name": "foo",
    "start_dt": "2012-10-10"
}
OK
{
    "name": "foo",
    "end_dt": "2012-10-10"
}
NOT OK
{
    "name": "foo",
    "start_dt": "2012-10-10",
    "end_dt": "2013-11-11"
}
What should I add to the schema:
{
    "title": "Request Schema",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "start_dt": {"type": "string", "format": "date"},
        "end_dt": {"type": "string", "format": "date"}
    }
}
You can express this using oneOf. This means that the data must match exactly one of the supplied sub-schemas, but not more than one.
Combining this with required, this schema says that instances must either define start_dt, OR define end_dt - but if they contain both, then it is invalid:
{
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "start_dt": {"type": "string", "format": "date"},
        "end_dt": {"type": "string", "format": "date"}
    },
    "oneOf": [
        {"required": ["start_dt"]},
        {"required": ["end_dt"]}
    ]
}
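The oneOf here reduces to "exactly one of the two keys is present", which you can sanity-check in plain Python against the three examples above (a stdlib sketch; a real validator such as the jsonschema package would evaluate the full schema):

```python
def dates_valid(doc):
    # oneOf over two required-lists: exactly one of the keys must be present
    # (neither present fails both branches; both present match both branches).
    return ("start_dt" in doc) ^ ("end_dt" in doc)

print(dates_valid({"name": "foo", "start_dt": "2012-10-10"}))  # True
print(dates_valid({"name": "foo", "end_dt": "2012-10-10"}))    # True
print(dates_valid({"name": "foo",
                   "start_dt": "2012-10-10",
                   "end_dt": "2013-11-11"}))                   # False
```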
