JSON Schema: Validate that exactly one property is present - python

I would like to validate a JSON structure in which either the userId key or the appUserId key must be present (exactly one of them, not both).
For example,
{ "userId": "X" }
{ "appUserId": "Y" }
are valid, but:
{ "userId": "X", "appUserId": "Y"}
{ }
are not.
How can I validate this condition using a JSON Schema? I have tried the oneOf keyword, but it works for values, not keys.

This works for me:
from jsonschema import validate

schema = {
    "type": "object",
    "properties": {
        "userId": {"type": "number"},
        "appUserId": {"type": "number"},
    },
    "oneOf": [
        {
            "type": "object",
            "required": ["userId"],
        },
        {
            "type": "object",
            "required": ["appUserId"],
        },
    ],
}
validate({'userId': 1}, schema) # Ok
validate({'appUserId': 1}, schema) # Ok
validate({'userId': 1, 'appUserId': 1}, schema) # ValidationError
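If it helps, here is a small sketch (using the same schema object as above) of catching the failing case explicitly; the exact error message text may vary between jsonschema versions:
from jsonschema import ValidationError, validate

try:
    validate({'userId': 1, 'appUserId': 1}, schema)
except ValidationError as err:
    # both branches of "oneOf" matched, so "exactly one" fails
    print(err.message)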

I would use a combination of minProperties/maxProperties and additionalProperties in the schema:
{
    "type": "object",
    "properties": {
        "userId": { "type": "string" },
        "appUserId": { "type": "string" }
    },
    "maxProperties": 1,
    "minProperties": 1,
    "additionalProperties": false
}
// invalid cases
{ }
{ "userId": "111", "appUserId": "222" }
{ "anotherUserId": "333" }
// valid cases
{ "userId": "111" }
{ "appUserId": "222" }

Related

ElasticSearch: Retrieve field and its normalization

I want to retrieve a field as well as its normalized version from Elasticsearch.
Here's my index definition and data:
PUT normalizersample
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "refresh_interval": "60s",
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "filter": [
            "lowercase",
            "german_normalization",
            "asciifolding"
          ],
          "type": "custom"
        }
      }
    }
  },
  "mappings": {
    "_source": {
      "enabled": true
    },
    "properties": {
      "myField": {
        "type": "text",
        "store": true,
        "fields": {
          "keyword": {
            "type": "keyword",
            "store": true
          },
          "normalized": {
            "type": "keyword",
            "store": true,
            "normalizer": "my_normalizer"
          }
        }
      }
    }
  }
}

POST normalizersample/_doc/1
{
  "myField": ["Andreas", "Ämdreas", "Anders"]
}
My first approach was to use script fields, like this:
GET /myIndex/_search
{
  "size": 100,
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "keyword": {
      "script": "doc['myField.keyword']"
    },
    "normalized": {
      "script": "doc['myField.normalized']"
    }
  }
}
However, since myField is an array, this returns two lists of strings per ES document, and each of them is sorted alphabetically. As a result, entries at the same position in the two lists do not necessarily correspond to each other, because the normalization changes the sort order.
"hits" : [
{
"_index" : "normalizersample",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"de" : [
"amdreas",
"anders",
"andreas"
],
"keyword" : [
"Anders",
"Andreas",
"Ämdreas"
]
}
}
]
Instead, I would like to retrieve [(Andreas, andreas), (Ämdreas, amdreas), (Anders, anders)] or a similar format in which I can match every entry to its normalization.
The only way I found was to call Term Vectors on both fields since they contain a position field, but this seems like a huge overhead to me. (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html)
Is there a simpler way to retrieve tuples with the keyword and the normalized field?
Thanks a lot!
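For reference, the Term Vectors request mentioned above could be issued from the Python client roughly like this (a sketch only; it assumes the elasticsearch package and a local cluster, and the position information in the response still has to be matched up client-side):
from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local cluster

# Request term vectors for both sub-fields; the positions allow aligning
# the original and the normalized terms afterwards.
tv = es.termvectors(
    index="normalizersample",
    id="1",
    body={
        "fields": ["myField.keyword", "myField.normalized"],
        "positions": True,
    },
)
print(tv["term_vectors"])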

How to require data in the JSON list?

How does one require data in a list? And how does one specify the type of the items, e.g. a dictionary?
Below are two JSON samples and a schema. Both JSON samples are valid according to the schema. The sample with the empty list should fail validation IMHO. How do I make that happen?
from jsonschema import validate
# this is ok per the schema
{
    "mylist": [
        {
            "num_items": 8,
            "freq": 8.5,
            "other": 2
        },
        {
            "num_items": 8,
            "freq": 8.5,
            "other": 4
        }
    ]
}
# this should fail validation, but does not.
{
    "mylist": []
}
# schema
{
    "$schema": "http://json-schema.org/schema#",
    "type": "object",
    "properties": {
        "mylist": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "num_items": {
                        "type": "integer"
                    },
                    "freq": {
                        "type": "number"
                    },
                    "other": {
                        "type": "integer"
                    }
                },
                "required": [
                    "freq",
                    "num_items",
                    "other"
                ]
            }
        }
    },
    "required": [
        "mylist"
    ]
}
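One common way to make the empty list fail is JSON Schema's minItems keyword; a minimal sketch with the jsonschema package (this is just the question's schema with minItems: 1 added to mylist):
from jsonschema import validate, ValidationError

schema = {
    "$schema": "http://json-schema.org/schema#",
    "type": "object",
    "properties": {
        "mylist": {
            "type": "array",
            "minItems": 1,  # reject the empty list
            "items": {
                "type": "object",
                "properties": {
                    "num_items": {"type": "integer"},
                    "freq": {"type": "number"},
                    "other": {"type": "integer"},
                },
                "required": ["freq", "num_items", "other"],
            },
        }
    },
    "required": ["mylist"],
}

try:
    validate({"mylist": []}, schema)  # now fails
except ValidationError as err:
    print(err.message)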

How to add data to a topic using AvroProducer

I have a topic with the following schema. Could someone help me out with how to add data to the different fields?
{
    "name": "Project",
    "type": "record",
    "namespace": "abcdefg",
    "fields": [
        {
            "name": "Object",
            "type": {
                "name": "Object",
                "type": "record",
                "fields": [
                    {
                        "name": "Number_ID",
                        "type": "int"
                    },
                    {
                        "name": "Accept",
                        "type": "boolean"
                    }
                ]
            }
        },
        {
            "name": "DataStructureType",
            "type": "string"
        },
        {
            "name": "ProjectID",
            "type": "string"
        }
    ]
}
I tried the following code. I get errors like "list is not iterable" or "list index out of range".
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

AvroProducerConf = {
    'bootstrap.servers': 'localhost:9092',
    'schema.registry.url': 'http://localhost:8081'
}

value_schema = avro.load('project.avsc')
avroProducer = AvroProducer(AvroProducerConf, default_value_schema=value_schema)

while True:
    avroProducer.produce(topic='my_topic',
                         value={['Object'][0]: "value", ['Object'][1]: "true",
                                ['DataStructureType']: "testvalue", ['ProjectID']: "123"})
    avroProducer.flush()
It's not clear what you're expecting an expression like ['Object'][0] to do, and the keys of a dict cannot be lists.
Try sending this instead, which matches your Avro schema:
value = {
    'Object': {
        'Number_ID': 1,
        'Accept': True
    },
    'DataStructureType': 'testvalue',
    'ProjectID': '123'
}
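Passed to the producer set up in the question, that would look roughly like this (a sketch reusing the avroProducer and topic name from above):
avroProducer.produce(topic='my_topic', value=value)
avroProducer.flush()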

JSON Schema: How to check if a field contains a value

I have a JSON schema validator where I need to check a specific field, email, to see whether it is one of 4 possible emails. Let's call the possibilities ['test1', 'test2', 'test3', 'test4']. Sometimes the emails contain a \n newline separator, so I need to account for that as well. Is it possible to do a string "contains" check in JSON Schema?
Here is my schema without the email checks:
{
    "type": "object",
    "properties": {
        "data": {
            "type": "object",
            "properties": {
                "email": {
                    "type": "string"
                }
            },
            "required": ["email"]
        }
    }
}
My input payload is:
{
    "data": {
        "email": "test3\njunktext"
    }
}
I would need this payload to pass validation since it has test3 in it. Thanks!
I can think of two ways:
Using enum you can define a list of valid emails:
{
    "type": "object",
    "properties": {
        "data": {
            "type": "object",
            "properties": {
                "email": {
                    "enum": [
                        "test1",
                        "test2",
                        "test3"
                    ]
                }
            },
            "required": [
                "email"
            ]
        }
    }
}
Or with pattern, which allows you to use a regular expression to match a valid email:
{
    "type": "object",
    "properties": {
        "data": {
            "type": "object",
            "properties": {
                "email": {
                    "pattern": "test"
                }
            },
            "required": [
                "email"
            ]
        }
    }
}
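Since pattern in JSON Schema is an unanchored regular-expression search, a value such as "test3\njunktext" passes as long as the expression matches somewhere in the string. A minimal check with the jsonschema package (the pattern test[1-4] is only an illustration of matching the four names from the question):
from jsonschema import validate

schema = {
    "type": "object",
    "properties": {
        "data": {
            "type": "object",
            "properties": {
                "email": {"pattern": "test[1-4]"}
            },
            "required": ["email"],
        }
    },
}

validate({"data": {"email": "test3\njunktext"}}, schema)  # passes: "test3" is found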

Add #timestamp field in ElasticSearch with Python

I'm using Python to add entries to a local Elasticsearch instance (localhost:9200).
Currently, I use this method:
from elasticsearch import Elasticsearch


def insertintoes(data):
    """
    Insert data into Elasticsearch.
    :param data: dict
    :return:
    """
    timestamp = data.get('#timestamp')
    logstashIndex = 'logstash-' + timestamp.strftime("%Y.%m.%d")
    es = Elasticsearch()
    if not es.indices.exists(logstashIndex):
        # Setting mappings for index
        mapping = '''
        {
          "mappings": {
            "_default_": {
              "_all": {
                "enabled": true,
                "norms": false
              },
              "dynamic_templates": [
                {
                  "message_field": {
                    "path_match": "message",
                    "match_mapping_type": "string",
                    "mapping": {
                      "norms": false,
                      "type": "text"
                    }
                  }
                },
                {
                  "string_fields": {
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {
                      "fields": {
                        "keyword": {
                          "type": "keyword"
                        }
                      },
                      "norms": false,
                      "type": "text"
                    }
                  }
                }
              ],
              "properties": {
                "#timestamp": {
                  "type": "date",
                  "include_in_all": true
                },
                "#version": {
                  "type": "keyword",
                  "include_in_all": true
                }
              }
            }
          }
        }
        '''
        es.indices.create(logstashIndex, ignore=400, body=mapping)
    es.index(index=logstashIndex, doc_type='system', timestamp=timestamp, body=data)
data is a dict with a valid #timestamp, defined like this: data['#timestamp'] = datetime.datetime.now()
The problem is that, even though there is a timestamp value in my data, Kibana doesn't show the entry in the «Discover» view. :(
Here is an example of a full entry in Elasticsearch:
{
    "_index": "logstash-2017.06.25",
    "_type": "system",
    "_id": "AVzf3QX3iazKBndbIkg4",
    "_score": 1,
    "_source": {
        "priority": 6,
        "uid": 0,
        "gid": 0,
        "systemd_slice": "system.slice",
        "cap_effective": "1fffffffff",
        "exe": "/usr/bin/bash",
        "hostname": "ns3003395",
        "syslog_facility": 9,
        "comm": "crond",
        "systemd_cgroup": "/system.slice/cronie.service",
        "systemd_unit": "cronie.service",
        "syslog_identifier": "CROND",
        "message": "(root) CMD (/usr/local/rtm/bin/rtm 14 > /dev/null 2> /dev/null)",
        "systemd_invocation_id": "9228b6c72e6a4624a1806e4c59af8d04",
        "syslog_pid": 26652,
        "pid": 26652,
        "#timestamp": "2017-06-25T17:27:01.734453"
    }
}
As you can see, there IS a #timestamp field, but it doesn't seem to be what Kibana expects, and I don't know what to do to make my entries visible in Kibana.
Any ideas?
Elasticsearch is not recognizing #timestamp as a date, but as a string. If your data['#timestamp'] is a datetime object, you can try converting it to an ISO string, which is automatically recognized. Try:
timestamp = data.get('#timestamp').isoformat()
timestamp should now be a string in ISO format.
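A minimal sketch of that change applied to the indexing call from the question (reusing the es client and data dict from insertintoes above; only the timestamp handling differs):
timestamp = data['#timestamp']               # a datetime object
data['#timestamp'] = timestamp.isoformat()   # e.g. "2017-06-25T17:27:01.734453"

logstashIndex = 'logstash-' + timestamp.strftime("%Y.%m.%d")
es.index(index=logstashIndex, doc_type='system', body=data)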
