I am developing a JSON Schema and trying to test whether files validate against it properly. I am new to JSON Schema (as of today), so apologies if my terminology is not correct.
I have different types of files, which differ with regard to their biomaterial_type. Each of them should be tested against "#/definitions/basic", some of them against "#/definitions/donor", and each type has its own unique fields to test for.
Here is a (shortened) example containing one biomaterial_type:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"definitions": {
"basic": {
"type": "object",
"description": "Objects shared across all samples",
"properties": {
"sample_ontology_uri" : {
"type": "array", "minItems": 1,
"items": {
"type": "string",
"format": "uri",
"description": "(Ontology: EFO) links to sample ontology information."}},
"disease_ontology_uri" : {
"type": "array", "minItems": 1,
"items": {
"type": "string",
"format": "uri",
"description": "(Ontology: NCIM)"}},
"disease" : {
"type": "array", "minItems": 1, "maxItems": 1,
"items": {
"type": "string",
"description": "Free form field "}},
"biomaterial_provider" : {
"type": "array", "minItems": 1, "maxItems": 1,
"items": {
"type": "string",
"description": "The name of the company, laboratory or person that provided the biological material."}},
"biomaterial_type" : {
"type": "array", "minItems": 1, "maxItems": 1,
"items": {
"type": "string",
"description": "The type of the biosample used (Cell Line, Primary Cell, Primary Cell Culture, Primary Tissue)",
"enum":["Cell Line", "Primary Cell", "Primary Cell Culture", "Primary Tissue"]}},
"treatment" : {
"type": "array", "minItems": 1, "maxItems": 1,
"items": {
"type": "string",
"description": "Any artificial modification (differentiation, activation, genome editing, etc)."}},
"biological_replicates": {
"type": "array",
"items": {
"type": "string",
"description": "List of biological replicate sample accessions"}}
},
"required": ["sample_ontology_curie", "disease_ontology_curie", "disease", "biomaterial_provider", "biomaterial_type", "treatment", "biological_replicates"]
},
"donor": {
"type": "object",
"description": "Additional set of properties for samples coming from a donor.",
"properties": {
"donor_id" : {
"type": "array", "minItems": 1, "maxItems": 1,
"items": {
"type": "string",
"description": "An identifying designation for the donor that provided the cells/tissues."}},
"donor_age" : {
"type": "array", "minItems": 1, "maxItems": 1,
"items": {
"description": "The age of the donor that provided the cells/tissues. NA if not available. If over 90 years enter as 90+. If entering a range of ages use the format “{age}-{age}”.",
"oneOf": [
{ "type": "number" },
{ "type": "string", "enum": ["90+", "NA"] },
{ "type": "string", "format": "uri" }
]
}},
"donor_age_unit" : {
"type": "array", "minItems": 1, "maxItems": 1,
"items": {
"type": "string",
"description": "The unit of measurement used to describe the age of the sample (year, month, week, day)",
"enum": ["year", "month", "week", "day"]}},
"donor_life_stage": {
"type": "array", "minItems": 1, "maxItems": 1,
"items": {
"type": "string",
"description": "The stage or phase of the donor when the sample was taken (embryonic, fetal, postnatal, newborn, child, adult, unknown)",
"enum": ["embryonic", "fetal", "postnatal", "newborn", "child", "adult", "unknown"]}},
"donor_health_status" : {
"type": "array", "minItems": 1, "maxItems": 1, "items": {
"type": "string",
"description": "The health status of the donor that provided the primary cell. NA if not available."}},
"donor_health_status_ontology_uri" : {
"type": "array", "minItems": 1,
"items": {
"type": "string",
"format": "uri",
"description": "(Ontology: NCIM) "}},
"donor_sex" : {"type": "array", "minItems": 1, "maxItems": 1, "items": {"type": "string", "enum": ["Male", "Female", "Unknown", "Mixed"], "description": "'Male', 'Female', 'Unknown', or 'Mixed' for pooled samples."}},
"donor_ethnicity" : {
"type": "array", "minItems": 1, "maxItems": 1,
"items": {
"type": "string",
"description": "The ethnicity of the donor that provided the primary cell. NA if not available. If dealing with small/vulnerable populations consider identifiability issues."}}
},
"required": ["donor_id", "donor_age", "donor_age_unit", "donor_life_stage", "donor_health_status_uri", "donor_health_status", "donor_sex", "donor_ethnicity"]
}
},
"type" : "object",
"if":
{"properties":
{ "biomaterial_type": {"const": "Primary Tissue"}},
"required": ["biomaterial_type"] },
"then": {
"allOf": [
{ "$ref": "#/definitions/donor" },
{
"properties": {
"tissue_type" : {
"type": "array", "minItems": 1, "maxItems": 1,
"items": {
"type": "string",
"description": "The type of tissue."}},
"tissue_depot" : {
"type": "array", "minItems": 1, "maxItems": 1,
"items": {
"type": "string",
"description": "Details about the anatomical location from which the primary tissue was collected."}},
"collection_method" : {
"type": "array", "minItems": 1, "maxItems": 1,
"items": {
"type": "string",
"description": "The protocol for collecting the primary tissue."}}
},
"required": ["tissue_type", "tissue_depot", "collection_method"]
}
]
}
}
Further biomaterial_type values will be added via additional if-conditions.
Here is an example JSON file:
{
"SAMPLE_SET": {
"SAMPLE": [
{
"TITLE": "Homo sapiens male embryo (108 days) small intestine tissue",
"SAMPLE_NAME": {
"TAXON_ID": "9606",
"SCIENTIFIC_NAME": "Homo sapiens",
"COMMON_NAME": "human"
},
"SAMPLE_ATTRIBUTES": {
"SAMPLE_ATTRIBUTE": [
{
"TAG": "SAMPLE_ONTOLOGY_URI",
"VALUE": "http://purl.obolibrary.org/obo/UBERON:0002108"
},
{
"TAG": "DISEASE_ONTOLOGY_URI",
"VALUE": "https://ncit.nci.nih.gov/ncitbrowser/ConceptReport.jsp?dictionary=NCI_Thesaurus&code=C115935"
},
{
"TAG": "DISEASE",
"VALUE": "Healthy"
},
{
"TAG": "BIOMATERIAL_PROVIDER",
"VALUE": "Ian Glass at Congenital Defects Lab, University of Washington"
},
{
"TAG": "BIOMATERIAL_TYPE",
"VALUE": "Primary Tissue"
},
{
"TAG": "TISSUE_TYPE",
"VALUE": "small intestine"
},
{
"TAG": "TISSUE_DEPOT",
"VALUE": "Ian Glass at Congenital Defects Lab, University of Washington"
},
{
"TAG": "COLLECTION_METHOD",
"VALUE": "unknown"
},
{
"TAG": "DONOR_ID",
"VALUE": "ENCDO119ASK"
},
{
"TAG": "DONOR_AGE",
"VALUE": "NA"
},
{
"TAG": "DONOR_AGE_UNIT",
"VALUE": "day"
},
{
"TAG": "DONOR_LIFE_STAGE",
"VALUE": "embryonic"
},
{
"TAG": "DONOR_HEALTH_STATUS_ONTOLOGY_URI",
"VALUE": "https://ncit.nci.nih.gov/ncitbrowser/ConceptReport.jsp?dictionary=NCI_Thesaurus&code=C115935"
},
{
"TAG": "DONOR_HEALTH_STATUS",
"VALUE": "Healthy"
},
{
"TAG": "DONOR_SEX",
"VALUE": "Male"
},
{
"TAG": "DONOR_ETHNICITY",
"VALUE": "NA"
}
]
},
"_accession": "ENCBS054KUO",
"_center_name": "ENCODE"
}
]
}
}
I am trying to test whether the schema makes sense using jsonschema with Python:
import json
import jsonschema
# Read the instance document and the schema from disk
with open('data.json', 'r') as file:
    data = file.read()
with open('schema.json', 'r') as file:
    schema = file.read()
try:
    jsonschema.validate(json.loads(data), json.loads(schema))
    print('ok')
except jsonschema.ValidationError as e:
    print(e.message)
except jsonschema.SchemaError as e:
    print(e)
I always get "ok", even if I provide json data with errors.
Is the problem with my Python script or with my schema?
Thanks for any pointers.
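One thing worth checking, offered as a guess rather than a confirmed diagnosis: the schema describes top-level properties such as biomaterial_type, while the example file nests everything under SAMPLE_SET / SAMPLE / SAMPLE_ATTRIBUTES as TAG/VALUE pairs. The schema places no constraint on SAMPLE_SET, and the "if" only applies when a top-level biomaterial_type is present, so a file shaped like the example would pass whatever the attributes contain. A minimal sketch, assuming each sample is flattened before validation (flatten_sample is a hypothetical helper, not part of jsonschema, and the sample below is a shortened copy of the example file):

```python
def flatten_sample(sample):
    """Map each TAG (lower-cased) to the list of its VALUEs."""
    flat = {}
    for attribute in sample["SAMPLE_ATTRIBUTES"]["SAMPLE_ATTRIBUTE"]:
        # The schema declares every field as an array, so collect values in lists.
        flat.setdefault(attribute["TAG"].lower(), []).append(attribute["VALUE"])
    return flat


# Shortened version of the example file from the question.
document = {
    "SAMPLE_SET": {
        "SAMPLE": [
            {
                "SAMPLE_ATTRIBUTES": {
                    "SAMPLE_ATTRIBUTE": [
                        {"TAG": "BIOMATERIAL_TYPE", "VALUE": "Primary Tissue"},
                        {"TAG": "TISSUE_TYPE", "VALUE": "small intestine"},
                    ]
                }
            }
        ]
    }
}

flat_samples = [flatten_sample(s) for s in document["SAMPLE_SET"]["SAMPLE"]]
print(flat_samples[0])
```

Each flattened dict could then be passed to jsonschema.validate(flat, schema). Note that the "if" compares biomaterial_type against "const": "Primary Tissue" while the definitions declare that field as an array, so the condition may still need adjusting (for instance with "contains"). The root schema also never references "#/definitions/basic".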
I have a geojson feature collection dataset with a lot of features. I want to add/update the properties of each feature with properties of a json file. The unique identifier of both datasets is the "uuid" value.
This is the geojson format:
mtx = {
"type": "FeatureCollection",
"crs": {
"type": "name",
"properties": {
"name": "EPSG:4326"
}
},
"features": [
{
"type": "Feature",
"id": 1,
"geometry": {
"type": "Point",
"coordinates": [
5.36516933279853,
51.5510854507331
]
},
"properties": {
"OBJECTID": 1,
"PK_UID": 1,
"uuid": "1efa8916-c854-465b-80f5-1f02fd25fb31",
"road": "A2",
"lane": 1,
"km": 134.96,
"bearing": 148.02261,
"locid": "A2134.96"
}
},
{
"type": "Feature",
"id": 2,
"geometry": {
"type": "Point",
"coordinates": [
5.05380200345974,
52.3264095459638
]
},
"properties": {
"OBJECTID": 2,
"PK_UID": 2,
"uuid": "73bf3758-6754-433f-9896-d03c0673ae55",
"road": "A1",
"lane": 3,
"km": 11.593,
"bearing": 113.404253,
"locid": "A111.593"
}
}
]
}
And this is the json format:
msi= [
{
"uuid": "1efa8916-c854-465b-80f5-1f02fd25fb31",
"road": "A2",
"carriageway": "R",
"lane": "1",
"km": "134.960",
"display": "blank",
"display_attrs": "{'flashing': 'false'}",
"speedlimit": "null"
},
{
"uuid": "73bf3758-6754-433f-9896-d03c0673ae55",
"road": "A1",
"carriageway": "R",
"lane": "3",
"km": "11.593",
"display": "blank",
"display_attrs": "{'flashing': 'false'}",
"speedlimit": "null"
}
]
How can I write a Python script that loops through the GeoJSON features and updates each feature's properties with the matching properties from the JSON, based on the "uuid" value?
I tried something like this, but it didn't give me the expected result:
# Loop over GeoJSON features and update the new properties from msi json
for feat in mtx['features']:
    for i in range(len(msi)):
        if mtx['features'][i]['properties']['uuid'] == msi[i]['uuid']:
            feat['properties'].update(msi[i])
Thanks for helping me out.
I'd do something like this (completely untested, so beware of mistakes...):
# Turn msi into a dictionary for easy access
msi_dict = {m['uuid']: m for m in msi}
for f in mtx['features']:
    f['properties'].update(msi_dict.get(f['properties']['uuid'], {}))
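The same lookup idea can be sketched as a small self-contained function. This is only an illustration under assumptions: merge_by_uuid is a hypothetical name, the data is a trimmed copy of the question's, and the second feature with the uuid "no-such-uuid" is made up to show what happens when there is no match.

```python
def merge_by_uuid(feature_collection, records, key="uuid"):
    """Update each feature's properties in place; return the uuids with no match."""
    # Build the lookup once so every feature needs a single dict access.
    lookup = {record[key]: record for record in records}
    unmatched = []
    for feature in feature_collection["features"]:
        properties = feature["properties"]
        record = lookup.get(properties.get(key))
        if record is None:
            unmatched.append(properties.get(key))
        else:
            properties.update(record)
    return unmatched


mtx = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"uuid": "1efa8916-c854-465b-80f5-1f02fd25fb31", "lane": 1}},
        {"type": "Feature",
         "properties": {"uuid": "no-such-uuid", "lane": 2}},  # made-up uuid
    ],
}
msi = [
    {"uuid": "1efa8916-c854-465b-80f5-1f02fd25fb31", "lane": "1", "display": "blank"},
]

unmatched = merge_by_uuid(mtx, msi)
print(mtx["features"][0]["properties"])
print(unmatched)
```

Be aware that update() overwrites existing keys, so the integer "lane" from the GeoJSON is replaced by the string "1" from the JSON. Convert the values first if the original types matter.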
I am using the Python 3 avro_validator library.
The schema I want to validate references other schemas in separate Avro files. The files are in the same folder. How do I compile all the referenced schemas using the library?
Python code as follows:
from avro_validator.schema import Schema
schema_file = 'basketEvent.avsc'
schema = Schema(schema_file)
parsed_schema = schema.parse()
data_to_validate = {"test": "test"}
parsed_schema.validate(data_to_validate)
The error I get back:
ValueError: Error parsing the field [contentBasket]: The type [ContentBasket] is not recognized by Avro
Example Avro files are below:
basketEvent.avsc
{
"type": "record",
"name": "BasketEvent",
"doc": "Indicates that a user action has taken place with a basket",
"fields": [
{
"default": "basket",
"doc": "Restricts this event to having type = basket",
"name": "event",
"type": {
"name": "BasketEventType",
"symbols": ["basket"],
"type": "enum"
}
},
{
"default": "create",
"doc": "What is being done with the basket. Note: create / delete / update will always follow a product event",
"name": "action",
"type": {
"name": "BasketEventAction",
"symbols": ["create","delete","update","view"],
"type": "enum"
}
},
{
"default": "ContentBasket",
"doc": "The set of values that are specific to a Basket event",
"name": "contentBasket",
"type": "ContentBasket"
},
{
"default": "ProductDetail",
"doc": "The set of values that are specific to a Product event",
"name": "productDetail",
"type": "ProductDetail"
},
{
"default": "Timestamp",
"doc": "The time stamp for the event being sent",
"name": "timestamp",
"type": "Timestamp"
}
]
}
contentBasket.avsc
{
"name": "ContentBasket",
"type": "record",
"doc": "The set of values that are specific to a Basket event",
"fields": [
{
"default": [],
"doc": "A range of details about product / basket availability",
"name": "availability",
"type": {
"type": "array",
"items": "Availability"
}
},
{
"default": [],
"doc": "A range of care pland applicable to the basket",
"name": "carePlan",
"type": {
"type": "array",
"items": "CarePlan"
}
},
{
"default": "Category",
"name": "category",
"type": "Category"
},
{
"default": "",
"doc": "Unique identfier for this basket",
"name": "id",
"type": "string"
},
{
"default": "Price",
"doc": "Overall pricing info about the basket as a whole - individual product pricings will be dealt with at a product level",
"name": "price",
"type": "Price"
}
]
}
availability.avsc
{
"name": "Availability",
"type": "record",
"doc": "A range of values relating to the availability of a product",
"fields": [
{
"default": [],
"doc": "A list of offers associated with the overall basket - product level offers will be dealt with on an individual product basis",
"name": "shipping",
"type": {
"type": "array",
"items": "Shipping"
}
},
{
"default": "",
"doc": "The status of the product",
"name": "stockStatus",
"type": {
"name": "StockStatus",
"symbols": ["in stock","out of stock",""],
"type": "enum"
}
},
{
"default": "",
"doc": "The ID for the store when the stock can be collected, if relevant",
"name": "storeId",
"type": "string"
},
{
"default": "",
"doc": "The status of the product",
"name": "type",
"type": {
"name": "AvailabilityType",
"symbols": ["collection","shipping",""],
"type": "enum"
}
}
]
}
maxDate.avsc
{
"type": "record",
"name": "MaxDate",
"doc": "Indicates the timestamp for latest day a delivery should be made",
"fields": [
{
"default": "Timestamp",
"doc": "The time stamp for the delivery",
"name": "timestamp",
"type": "Timestamp"
}
]
}
minDate.avsc
{
"type": "record",
"name": "MinDate",
"doc": "Indicates the timestamp for earliest day a delivery should be made",
"fields": [
{
"default": "Timestamp",
"doc": "The time stamp for the delivery",
"name": "timestamp",
"type": "Timestamp"
}
]
}
shipping.avsc
{
"name": "Shipping",
"type": "record",
"doc": "A range of values relating to shipping a product for delivery",
"fields": [
{
"default": "MaxDate",
"name": "maxDate",
"type": "MaxDate"
},
{
"default": "MinDate",
"name": "minDate",
"type": "minDate"
},
{
"default": 0,
"doc": "Revenue generated from shipping - note, once a specific shipping object is selected, the more detailed revenye data sits within the one of object in pricing - this is more just to define if shipping is free or not",
"name": "revenue",
"type": "int"
},
{
"default": "",
"doc": "The shipping supplier",
"name": "supplier",
"type": "string"
}
]
}
timestamp.avsc
{
"name": "Timestamp",
"type": "record",
"doc": "Timestamp for the action taking place",
"fields": [
{
"default": 0,
"name": "timestampMs",
"type": "long"
},
{
"default": "",
"doc": "Timestamp converted to a string in ISO format",
"name": "isoTimestamp",
"type": "string"
}
]
}
I'm not sure if that library supports what you are trying to do, but fastavro should.
If you put the first schema in a file called BasketEvent.avsc and the second schema in a file called ContentBasket.avsc (load_schema looks in the same directory for a file named after each referenced type), then you can do the following:
from fastavro.schema import load_schema
from fastavro import validate
schema = load_schema("BasketEvent.avsc")
validate({"test": "test"}, schema)
Note that when I tried to do this I got an error of fastavro._schema_common.UnknownType: Availability because it seems that there are other referenced schemas that you haven't posted here.
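To find out in advance which referenced schemas have no file of their own, something like the following could help. It is a rough sketch using only the standard library, not part of fastavro or avro_validator. It assumes one file per type, named after the type with the .avsc extension, in the same folder, and it ignores namespaces. The names collect and missing_schema_files are hypothetical, and the demo files are cut-down versions of the schemas in the question.

```python
import json
import tempfile
from pathlib import Path

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def collect(node, defined, referenced):
    """Record the named types a schema defines and the names it refers to."""
    if isinstance(node, str):
        if node not in PRIMITIVES:
            referenced.add(node)
    elif isinstance(node, list):  # a union: inspect every branch
        for branch in node:
            collect(branch, defined, referenced)
    elif isinstance(node, dict):
        kind = node.get("type")
        if kind in ("record", "error"):
            defined.add(node["name"])
            for field in node.get("fields", []):
                collect(field["type"], defined, referenced)
        elif kind in ("enum", "fixed"):
            defined.add(node["name"])
        elif kind == "array":
            collect(node["items"], defined, referenced)
        elif kind == "map":
            collect(node["values"], defined, referenced)
        else:  # for example {"type": "long", "logicalType": "..."}
            collect(kind, defined, referenced)


def missing_schema_files(schema_path):
    """Follow references from schema_path and list types that have no file."""
    folder = Path(schema_path).parent
    pending, seen, missing = [Path(schema_path)], set(), set()
    while pending:
        path = pending.pop()
        if path in seen:
            continue
        seen.add(path)
        defined, referenced = set(), set()
        collect(json.loads(path.read_text()), defined, referenced)
        for name in referenced - defined:
            candidate = folder / f"{name}.avsc"
            if candidate.exists():
                pending.append(candidate)
            else:
                missing.add(name)
    return sorted(missing)


# Demo with throwaway files: only two of the schemas are present.
with tempfile.TemporaryDirectory() as folder:
    root = Path(folder)
    (root / "BasketEvent.avsc").write_text(json.dumps({
        "type": "record", "name": "BasketEvent",
        "fields": [
            {"name": "contentBasket", "type": "ContentBasket"},
            {"name": "timestamp", "type": "Timestamp"},
        ],
    }))
    (root / "ContentBasket.avsc").write_text(json.dumps({
        "type": "record", "name": "ContentBasket",
        "fields": [
            {"name": "availability", "type": {"type": "array", "items": "Availability"}},
            {"name": "id", "type": "string"},
        ],
    }))
    missing = missing_schema_files(root / "BasketEvent.avsc")

print(missing)
```

Every name it reports would need its own .avsc file, named with matching capitalisation, before load_schema can resolve the whole tree.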
Here is the schema of the JSON output that I am trying to parse. I want to write specific fields from it (for example the CVE ID and description) into a CSV file.
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "JSON Schema for NVD Vulnerability Data Feed version 1.1",
"id": "https://scap.nist.gov/schema/nvd/feed/1.1/nvd_cve_feed_json_1.1.schema",
"definitions": {
"def_cpe_name": {
"description": "CPE name",
"type": "object",
"properties": {
"cpe22Uri": {
"type": "string"
},
"cpe23Uri": {
"type": "string"
},
"lastModifiedDate": {
"type": "string"
}
},
"required": [
"cpe23Uri"
]
},
"def_cpe_match": {
"description": "CPE match string or range",
"type": "object",
"properties": {
"vulnerable": {
"type": "boolean"
},
"cpe22Uri": {
"type": "string"
},
"cpe23Uri": {
"type": "string"
},
"versionStartExcluding": {
"type": "string"
},
"versionStartIncluding": {
"type": "string"
},
"versionEndExcluding": {
"type": "string"
},
"versionEndIncluding": {
"type": "string"
},
"cpe_name": {
"type": "array",
"items": {
"$ref": "#/definitions/def_cpe_name"
}
}
},
"required": [
"vulnerable",
"cpe23Uri"
]
},
"def_node": {
"description": "Defines a node or sub-node in an NVD applicability statement.",
"properties": {
"operator": {"type": "string"},
"negate": {"type": "boolean"},
"children": {
"type": "array",
"items": {"$ref": "#/definitions/def_node"}
},
"cpe_match": {
"type": "array",
"items": {"$ref": "#/definitions/def_cpe_match"}
}
}
},
"def_configurations": {
"description": "Defines the set of product configurations for a NVD applicability statement.",
"properties": {
"CVE_data_version": {"type": "string"},
"nodes": {
"type": "array",
"items": {"$ref": "#/definitions/def_node"}
}
},
"required": [
"CVE_data_version"
]
},
"def_subscore": {
"description": "CVSS subscore.",
"type": "number",
"minimum": 0,
"maximum": 10
},
"def_impact": {
"description": "Impact scores for a vulnerability as found on NVD.",
"type": "object",
"properties": {
"baseMetricV3": {
"description": "CVSS V3.x score.",
"type": "object",
"properties": {
"cvssV3": {"$ref": "cvss-v3.x.json"},
"exploitabilityScore": {"$ref": "#/definitions/def_subscore"},
"impactScore": {"$ref": "#/definitions/def_subscore"}
}
},
"baseMetricV2": {
"description": "CVSS V2.0 score.",
"type": "object",
"properties": {
"cvssV2": {"$ref": "cvss-v2.0.json"},
"severity": {"type": "string"},
"exploitabilityScore": {"$ref": "#/definitions/def_subscore"},
"impactScore": {"$ref": "#/definitions/def_subscore"},
"acInsufInfo": {"type": "boolean"},
"obtainAllPrivilege": {"type": "boolean"},
"obtainUserPrivilege": {"type": "boolean"},
"obtainOtherPrivilege": {"type": "boolean"},
"userInteractionRequired": {"type": "boolean"}
}
}
}
},
"def_cve_item": {
"description": "Defines a vulnerability in the NVD data feed.",
"properties": {
"cve": {"$ref": "CVE_JSON_4.0_min_1.1.schema"},
"configurations": {"$ref": "#/definitions/def_configurations"},
"impact": {"$ref": "#/definitions/def_impact"},
"publishedDate": {"type": "string"},
"lastModifiedDate": {"type": "string"}
},
"required": ["cve"]
}
},
"type": "object",
"properties": {
"CVE_data_type": {"type": "string"},
"CVE_data_format": {"type": "string"},
"CVE_data_version": {"type": "string"},
"CVE_data_numberOfCVEs": {
"description": "NVD adds number of CVE in this feed",
"type": "string"
},
"CVE_data_timestamp": {
"description": "NVD adds feed date timestamp",
"type": "string"
},
"CVE_Items": {
"description": "NVD feed array of CVE",
"type": "array",
"items": {"$ref": "#/definitions/def_cve_item"}
}
},
"required": [
"CVE_data_type",
"CVE_data_format",
"CVE_data_version",
"CVE_Items"
]
}
# -*- coding: utf-8 -*-
"""
Created on Thu Dec 3 17:08:51 2020
@author: Rajat Varshney
"""
import json
import requests
api_url = 'https://services.nvd.nist.gov/rest/json/cve/1.0/'
cveid = input('Enter CVE ID: ')
api_call = requests.get(api_url + cveid)
print(api_call.content)
# api_call.content is bytes, which json.dump cannot serialise; dump the decoded JSON instead
with open('cve details.txt', 'w') as outfile:
    json.dump(api_call.json(), outfile)
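Getting from the parsed JSON to a CSV file could look roughly like the sketch below. Several things here are assumptions: the fields inside "cve" (CVE_data_meta.ID and description.description_data[].value) come from the referenced CVE_JSON_4.0 schema, which is not shown above; the API response may wrap the feed in an outer key (possibly "result"), which would need to be unwrapped first; cve_rows and write_csv are hypothetical helper names; and the sample record is a placeholder, not real NVD data.

```python
import csv
import io

FIELDNAMES = ["cve_id", "description", "published", "last_modified"]


def cve_rows(feed):
    """Yield one flat dict per entry in feed["CVE_Items"]."""
    for item in feed.get("CVE_Items", []):
        cve = item.get("cve", {})
        descriptions = cve.get("description", {}).get("description_data", [])
        yield {
            "cve_id": cve.get("CVE_data_meta", {}).get("ID", ""),
            "description": " ".join(d.get("value", "") for d in descriptions),
            "published": item.get("publishedDate", ""),
            "last_modified": item.get("lastModifiedDate", ""),
        }


def write_csv(feed, outfile):
    """Write the selected fields to an open text file (or any file-like object)."""
    writer = csv.DictWriter(outfile, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in cve_rows(feed):
        writer.writerow(row)


# Placeholder feed shaped like the schema above; the values are made up.
sample_feed = {
    "CVE_Items": [
        {
            "cve": {
                "CVE_data_meta": {"ID": "CVE-0000-0000"},
                "description": {"description_data": [{"lang": "en", "value": "Placeholder description"}]},
            },
            "publishedDate": "2020-12-03T00:00Z",
            "lastModifiedDate": "2020-12-04T00:00Z",
        }
    ]
}

buffer = io.StringIO()
write_csv(sample_feed, buffer)
lines = buffer.getvalue().splitlines()
print(lines)
```

With a real file, open it with open('cve details.csv', 'w', newline='') and pass the file object to write_csv in place of the StringIO buffer.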
I get None when I validate my JSON response against my JSON Schema using jsonschema.validate, while https://www.jsonschemavalidator.net/ reports that it matches.
JSON Schema
{
"KPI": [{
"KPIDefinition": {
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"version": {
"type": "number"
},
"description": {
"type": "string"
},
"datatype": {
"type": "string"
},
"units": {
"type": "string"
}
},
"KPIGroups": [{
"id": {
"type": "number"
},
"name": {
"type": "string"
}
}]
}],
"response": [{
"Description": {
"type": "string"
}
}]
}
JSON Response
{
"KPI": [
{
"KPIDefinition": {
"id": "2",
"name": "KPI 2",
"version": 1,
"description": "This is KPI 2",
"datatype": "1",
"units": "perHour"
},
"KPIGroups": [
{
"id": 7,
"name": "Group 7"
}
]
},
{
"KPIDefinition": {
"id": "3",
"name": "Parameter 3",
"version": 1,
"description": "This is KPI 3",
"datatype": "1",
"units": "per Hour"
},
"KPIGroups": [
{
"id": 7,
"name": "Group 7"
}
]
}
],
"response": [
{
"Description": "RECORD Found"
}
]
}
Code
json_schema2 = {"KPI":[{"KPIDefinition":{"id_new":{"type":"number"},"name":{"type":"string"},"version":{"type":"number"},"description":{"type":"string"},"datatype":{"type":"string"},"units":{"type":"string"}},"KPIGroups":[{"id":{"type":"number"},"name":{"type":"string"}}]}],"response":[{"Description":{"type":"string"}}]}
json_resp = {"KPI":[{"KPIDefinition":{"id":"2","name":"Parameter 2","version":1,"description":"This is parameter 2 definition version 1","datatype":"1","units":"kN"},"KPIGroups":[{"id":7,"name":"Group 7"}]},{"KPIDefinition":{"id":"3","name":"Parameter 3","version":1,"description":"This is parameter 3 definition version 1","datatype":"1","units":"kN"},"KPIGroups":[{"id":7,"name":"Group 7"}]}],"response":[{"Description":"RECORD FETCHED"}]}
print(jsonschema.validate(instance=json_resp, schema=json_schema2))
Validation is not being done correctly: I changed the data type and a key name in my response, but it still does not raise an exception or error.
jsonschema.validate(..) is not supposed to return anything.
Your schema object and the JSON object are both okay and validation has passed if it didn't raise any exceptions -- which seems to be the case here.
That being said, you should wrap your call within a try-except block so as to be able to catch validation errors.
Something like:
try:
    jsonschema.validate(...)
    print("Validation passed!")
except jsonschema.ValidationError:
    print("Validation failed")
# similarly catch jsonschema.SchemaError too if needed.
Update: Your schema does not actually constrain anything. As it stands, it will validate almost any input, because its top-level keys ("KPI", "response") are not JSON Schema keywords and unknown keywords are ignored. A schema should be an object (dict) with keywords like "type" and, depending on the type, further keywords like "items" or "properties". Please read up on how to write JSON Schema.
Here's a schema I wrote for your JSON:
{
"type": "object",
"required": [
"KPI",
"response"
],
"properties": {
"KPI": {
"type": "array",
"items": {
"type": "object",
"required": ["KPIDefinition","KPIGroups"],
"properties": {
"KPIDefinition": {
"type": "object",
"required": ["id","name"],
"properties": {
"id": {"type": "string"},
"name": {"type": "string"},
"version": {"type": "integer"},
"description": {"type": "string"},
"datatype": {"type": "string"},
"units": {"type": "string"}
}
},
"KPIGroups": {
"type": "array",
"items": {
"type": "object",
"required": ["id", "name"],
"properties": {
"id": {"type": "integer"},
"name": {"type": "string"}
}
}
}
}
}
},
"response": {
"type": "array",
"items": {
"type": "object",
"required": ["Description"],
"properties": {
"Description": {"type": "string"}
}
}
}
}
}
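To see the difference between a schema without keywords and one that uses them, a small experiment along these lines may help. It is a sketch under assumptions: it needs the jsonschema package, both schemas are cut-down illustrations rather than the full ones above, and loose_schema, strict_schema, bad_response and error_messages are made-up names. Draft7Validator.iter_errors is used so that every problem is listed instead of stopping at the first.

```python
from jsonschema import Draft7Validator

# Shaped like the schema in the question: no JSON Schema keywords at the top level.
loose_schema = {"KPI": [{"KPIDefinition": {"id": {"type": "string"}}}]}

# Same intent, expressed with "type", "properties", "items" and "required".
strict_schema = {
    "type": "object",
    "required": ["KPI"],
    "properties": {
        "KPI": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["KPIDefinition"],
                "properties": {
                    "KPIDefinition": {
                        "type": "object",
                        "required": ["id"],
                        "properties": {"id": {"type": "string"}},
                    }
                },
            },
        }
    },
}

# "id" is a number here although the schema asks for a string.
bad_response = {"KPI": [{"KPIDefinition": {"id": 2}}]}


def error_messages(schema, instance):
    """Return the messages of all validation errors, sorted for stable output."""
    validator = Draft7Validator(schema)
    return sorted(error.message for error in validator.iter_errors(instance))


loose_errors = error_messages(loose_schema, bad_response)
strict_errors = error_messages(strict_schema, bad_response)
print(loose_errors)
print(strict_errors)
```

The loose schema reports no errors for the bad response, while the strict one flags the wrong type of "id".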
I have stored the data in ArangoDB in the following format:
{"data": [
{
"content": "maindb",
"type": "string",
"name": "db_name",
"key": "1745085839"
},
{
"type": "id",
"name": "rel",
"content": "1745085840",
"key": "1745085839"
},
{
"content": "user",
"type": "string",
"name": "rel_name",
"key": "1745085840"
},
{
"type": "id",
"name": "tuple",
"content": "174508584001",
"key": "1745085840"
},
{
"type": "id",
"name": "tuple",
"content": "174508584002",
"key": "1745085840"
},
{
"type": "id",
"name": "tuple",
"content": "174508584003",
"key": "1745085840"
},
{
"type": "id",
"name": "tuple",
"content": "174508584004",
"key": "1745085840"
},
{
"type": "id",
"name": "tuple",
"content": "174508584005",
"key": "1745085840"
},
{
"type": "id",
"name": "tuple",
"content": "174508584006",
"key": "1745085840"
},
{
"type": "id",
"name": "tuple",
"content": "174508584007",
"key": "1745085840"
},
{
"content": "dspclient",
"type": "varchar",
"name": "username",
"key": "174508584001"
},
{
"content": "12345",
"type": "varchar",
"name": "password",
"key": "174508584001"
},
{
"content": "12345",
"type": "varchar",
"name": "cpassword",
"key": "174508584001"
},
{
"content": "n",
"type": "varchar",
"name": "PostgreSQL",
"key": "174508584001"
},
{
"content": "n",
"name": "IBMDB2",
"type": "varchar",
"key": "174508584001"
},
{
"content": "n",
"name": "MySQL",
"type": "varchar",
"key": "174508584001"
},
{
"content": "n",
"type": "varchar",
"name": "SQLServer",
"key": "174508584001"
},
{
"content": "n",
"name": "Hadoop",
"type": "varchar",
"key": "174508584001"
},
{
"content": "None",
"name": "dir1",
"type": "varchar",
"key": "174508584001"
},
{
"content": "None",
"name": "dir2",
"type": "varchar",
"key": "174508584001"
},
{
"content": "None",
"name": "dir3",
"type": "varchar",
"key": "174508584001"
},
{
"content": "None",
"name": "dir4",
"type": "varchar",
"key": "174508584001"
},
{
"type": "inet",
"name": "ipaddr",
"content": "1921680103",
"key": "174508584001"
},
{
"content": "y",
"name": "status",
"type": "varchar",
"key": "174508584001"
},
{
"content": "None",
"type": "varchar",
"name": "logintime",
"key": "174508584001"
},
{
"content": "None",
"type": "varchar",
"name": "logindate",
"key": "174508584001"
},
{
"content": "None",
"type": "varchar",
"name": "logouttime",
"key": "174508584001"
},
{
"content": "client",
"type": "varchar",
"name": "user_type",
"key": "174508584001"
},
{
"content": "royal",
"type": "varchar",
"name": "username",
"key": "174508584002"
},
{
"content": "12345",
"type": "varchar",
"name": "password",
"key": "174508584002"
},
{
"content": "12345",
"type": "varchar",
"name": "cpassword",
"key": "174508584002"
},
{
"content": "n",
"type": "varchar",
"name": "PostgreSQL",
"key": "174508584002"
},
{
"content": "n",
"name": "IBMDB2",
"type": "varchar",
"key": "174508584002"
},
{
"content": "n",
"name": "MySQL",
"type": "varchar",
"key": "174508584002"
},
{
"content": "n",
"type": "varchar",
"name": "SQLServer",
"key": "174508584002"
},
{
"content": "n",
"name": "Hadoop",
"type": "varchar",
"key": "174508584002"
},
{
"content": "None",
"name": "dir1",
"type": "varchar",
"key": "174508584002"
},
{
"content": "None",
"name": "dir2",
"type": "varchar",
"key": "174508584002"
},
{
"content": "None",
"name": "dir3",
"type": "varchar",
"key": "174508584002"
},
{
"content": "None",
"name": "dir4",
"type": "varchar",
"key": "174508584002"
},
{
"type": "inet",
"name": "ipaddr",
"content": "1921680105",
"key": "174508584002"
},
{
"content": "y",
"name": "status",
"type": "varchar",
"key": "174508584002"
},
{
"content": "190835899000",
"type": "varchar",
"name": "logintime",
"key": "174508584002"
},
{
"content": "20151002",
"type": "varchar",
"name": "logindate",
"key": "174508584002"
},
{
"content": "None",
"type": "varchar",
"name": "logouttime",
"key": "174508584002"
},
{
"content": "client",
"type": "varchar",
"name": "user_type",
"key": "174508584002"
},
{
"content": "abc",
"type": "varchar",
"name": "username",
"key": "174508584003"
},
{
"content": "12345",
"type": "varchar",
"name": "password",
"key": "174508584003"
},
{
"content": "12345",
"type": "varchar",
"name": "cpassword",
"key": "174508584003"
},
{
"content": "n",
"type": "varchar",
"name": "PostgreSQL",
"key": "174508584003"
},
{
"content": "n",
"name": "IBMDB2",
"type": "varchar",
"key": "174508584003"
}]}
To perform a fulltext search, I created an index on the content attribute from a Python script using:
c.DSP.ensureFulltextIndex("content");
Here, c is the database and DSP is the collection name. Now I am trying to search the above data set using:
FOR doc IN FULLTEXT(DSP, "content", "username") RETURN doc
Then an error occurs:
[1571] in function 'FULLTEXT()': no suitable fulltext index found for fulltext query on 'DSP' (while executing)
Please tell me what the problem is, and what the syntax would be to run this query from a Python script.
Thanks...
Working from the 10-minute tutorial and the driver documentation, I got it working like this:
from pyArango.connection import *
c = Connection()
db = c.createDatabase(name = "testdb")
DSP= db.createCollection(name = "DSP")
DSP.ensureFulltextIndex(fields=["content"])
doc = DSP.createDocument({"content": "test bla"})
doc.save()
print db.AQLQuery('''FOR doc IN FULLTEXT(DSP, "content", "bla") RETURN doc''', 10)
Resulting in:
[{u'_key': u'1241175138503', u'content': u'test bla', u'_rev': u'1241175138503', u'_id': u'DSP/1241175138503'}]
I've used arangosh to revalidate the steps from the python prompt:
arangosh> db._useDatabase("testdb")
arangosh [testdb]> db.DSP.getIndexes()
[
{
"id" : "DSP/0",
"type" : "primary",
"fields" : [
"_key"
],
"selectivityEstimate" : 1,
"unique" : true,
"sparse" : false
},
{
"id" : "DSP/1241140928711",
"type" : "hash",
"fields" : [
"content"
],
"selectivityEstimate" : 1,
"unique" : false,
"sparse" : true
},
{
"id" : "DSP/1241142960327",
"type" : "fulltext",
"fields" : [
"content"
],
"unique" : false,
"sparse" : true,
"minLength" : 2
}
]
arangosh [testdb]> db.DSP.toArray()
[
{
"content" : "test bla",
"_id" : "DSP/1241175138503",
"_rev" : "1241175138503",
"_key" : "1241175138503"
}
]
db._query('FOR doc IN FULLTEXT(DSP, "content", "bla") RETURN doc')