Elasticsearch Parent-Child Mapping and Indexing - python

I was following the book "Elasticsearch: The Definitive Guide". The book is outdated, so whenever something did not work I searched online and adapted it to newer versions. But I can't find anything useful for parent-child mapping and indexing.
For example:
{
"mappings": {
"branch": {},
"employee": {
"_parent": {
"type": "branch"
}
}
}
}
How can I represent the above mapping in a recent version of Elasticsearch?
And how can I index the following parent:
{ "name": "London Westminster", "city": "London", "country": "UK" }
and the following child:
PUT company/employee/1?parent=London
{
"name": "Alice Smith",
"dob": "1970-10-24",
"hobby": "hiking"
}
Also, I am using the Elasticsearch Python client, so examples using it would be appreciated.

The _parent field has been removed in favor of the join field.
The join data type is a special field that creates parent/child
relation within documents of the same index. The relations section
defines a set of possible relations within the documents, each
relation being a parent name and a child name.
Consider company as the parent and employee as its child
Index Mapping:
{
"mappings": {
"properties": {
"my_join_field": {
"type": "join",
"relations": {
"company": "employee"
}
}
}
}
}
Parent document in the company context
PUT /index-name/_doc/1
{
"name": "London Westminster",
"city": "London",
"country": "UK",
"my_join_field": {
"name": "company"
}
}
Child document
PUT /index-name/_doc/2?routing=1&refresh
{
"name": "Alice Smith",
"dob": "1970-10-24",
"hobby": "hiking",
"my_join_field": {
"name": "employee",
"parent": "1"
}
}
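Since the question asks for the Python client, here is a minimal sketch using elasticsearch-py. The request bodies mirror the JSON above; the index name "company", a cluster at localhost:9200, and a 7.x client/server are assumptions.

```python
# Sketch of the same join-field mapping and indexing with elasticsearch-py.
# Assumptions: elasticsearch-py 7.x, a cluster at localhost:9200, index "company".

MAPPING = {
    "mappings": {
        "properties": {
            "my_join_field": {
                "type": "join",
                "relations": {"company": "employee"},
            }
        }
    }
}

PARENT_DOC = {
    "name": "London Westminster",
    "city": "London",
    "country": "UK",
    "my_join_field": {"name": "company"},
}

CHILD_DOC = {
    "name": "Alice Smith",
    "dob": "1970-10-24",
    "hobby": "hiking",
    "my_join_field": {"name": "employee", "parent": "1"},
}

def index_all(host="http://localhost:9200"):
    from elasticsearch import Elasticsearch  # pip install elasticsearch

    es = Elasticsearch(host)
    es.indices.create(index="company", body=MAPPING)
    es.index(index="company", id=1, body=PARENT_DOC)
    # Child documents must be routed to the parent's shard.
    es.index(index="company", id=2, body=CHILD_DOC, routing=1, refresh=True)
```

The `routing=1` keyword mirrors the `?routing=1` query parameter in the REST example above.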

Related

How to save polygon data on Elasticsearch through Django GeoShapeField inside NestedField?

The model looks like:
class Restaurant(models.Model):
zones = JSONField(default=dict)
The document looks like:
@registry.register_document
class RestaurantDocument(Document):
zone = fields.NestedField(properties={"slug": fields.KeywordField(), "polygon_zone": fields.GeoShapeField()})
class Index:
name = 'restaurant_data'
settings = {
'number_of_shards': 1,
'number_of_replicas': 0
}
class Django:
model = Restaurant
def prepare_zone(self, instance):
return instance.zone
After indexing, the mapping looks like:
"zone": {
"type": "nested",
"properties": {
"polygon_zone": {
"type": "geo_shape"
},
"slug": {
"type": "keyword"
}
}
}
But when I save data in the zones field with the following structure:
[{"slug":"dhaka","ploygon_zone":{"type":"polygon","coordinates":[[[89.84207153320312,24.02827811169503],[89.78233337402344,23.93040645231774],[89.82833862304688,23.78722976367578],[90.02197265625,23.801051951752406],[90.11329650878905,23.872024546162947],[90.11672973632812,24.00883517846163],[89.84207153320312,24.02827811169503]]]}}]
then the Elasticsearch mapping is automatically changed in the following way:
"zone": {
"type": "nested",
"properties": {
"ploygon_zone": {
"properties": {
"coordinates": {
"type": "float"
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"polygon_zone": {
"type": "geo_shape"
},
"slug": {
"type": "keyword"
}
}
}
That's why, when I try to search on the zone__polygon_zone field, it always returns empty results: the data was not indexed as polygon-type data.
So, how can I save polygon data to Elasticsearch through Django with a nested geo shape field?
There is a typo in the indexed data: instead of ploygon_zone, it should be polygon_zone. I believe fixing the typo will solve the issue you are facing.
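A lightweight guard against this class of bug, assuming the zones payload shape from the question: validate the keys before saving, so a stray key like ploygon_zone fails fast instead of being dynamically mapped by Elasticsearch.

```python
EXPECTED_ZONE_KEYS = {"slug", "polygon_zone"}

def validate_zones(zones):
    """Reject zone entries carrying unexpected keys (e.g. the 'ploygon_zone'
    typo), which Elasticsearch would otherwise dynamically map as a new,
    non-geo_shape field."""
    for entry in zones:
        unexpected = set(entry) - EXPECTED_ZONE_KEYS
        if unexpected:
            raise ValueError(f"unexpected zone keys: {sorted(unexpected)}")
    return zones
```

Calling this from the model's save path (or from prepare_zone) surfaces the typo as an exception rather than a silently broken mapping.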

How can I add database references in schema validation when creating a mongodb collection?

Say I have a collection "cities" with the following documents:
Document 1:
{
"_id": {
"$oid": "5e00979d7c21388869c2c048"
},
"cityName": "New York"
}
Document 2:
{
"_id": {
"$oid": "5e00979d7c21388869c2c432"
},
"cityName": "Los Angeles"
}
and I want to create another collection "students" with the following document:
{
"name": "John",
"citiesVisited": [
{
"$ref": "cities",
"$id": "5e00979d7c21388869c2c048"
},
{
"$ref": "cities",
"$id": "5e00979d7c21388869c2c432"
}
]
}
How should the schema validation be? I tried the following validation:
validator = {
"$jsonSchema": {
"bsonType": "object",
"required": ["name", "citiesVisited"],
"properties": {
"name": {
"bsonType": "string",
"description": "name of student."
},
"citiesVisited": {
"bsonType": ["array"],
"items": {
"bsonType": "object",
"required": ["$ref", "$id"],
"properties": {
"$ref": {
"bsonType": "string",
"description": "collection name"
},
"$id": {
"bsonType": "string",
"description": "document id of visited city"
}
}
},
"description": "cities visited by the student"
}
}
}}
but it gives the following error when I try to get a list of all collections in the database:
bson.errors.InvalidBSON: collection must be an instance of str
I tried creating the validation without the "$" in "$ref" and "$id", and it worked fine, but then document validation failed because the documents use database references.
I want to use dbrefs when storing the cities.

How to model a complex json file as a python class

Are there any Python helper libraries I can use to create models from which I can generate complex JSON files such as the one below? I've read about colander, but I'm not sure it does what I need. The tricky bit is that the trigger-rule section may contain nested match rules, as described at https://github.com/adnanh/webhook/wiki/Hook-Rules
[
{
"id": "webhook",
"execute-command": "/home/adnan/redeploy-go-webhook.sh",
"command-working-directory": "/home/adnan/go",
"pass-arguments-to-command":
[
{
"source": "payload",
"name": "head_commit.id"
},
{
"source": "payload",
"name": "pusher.name"
},
{
"source": "payload",
"name": "pusher.email"
}
],
"trigger-rule":
{
"and":
[
{
"match":
{
"type": "payload-hash-sha1",
"secret": "mysecret",
"parameter":
{
"source": "header",
"name": "X-Hub-Signature"
}
}
},
{
"match":
{
"type": "value",
"value": "refs/heads/master",
"parameter":
{
"source": "payload",
"name": "ref"
}
}
}
]
}
}
]
Define a class like this:
class AttributeDictionary(dict):
__getattr__ = dict.__getitem__
__setattr__ = dict.__setitem__
When you load your JSON, pass AttributeDictionary as the object_hook:
import json
data = json.loads(json_str, object_hook=AttributeDictionary)
Then you can access dict entries by specifying the key as an attribute:
print(data[0].id)
Output
webhook
Note: You will want to replace dashes in keys with underscores. If you don't, this approach won't work on those keys.
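Putting the pieces together as a runnable Python 3 sketch (the JSON string here is a trimmed stand-in for the webhook config above):

```python
import json

class AttributeDictionary(dict):
    # Route attribute access to dict item access.
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__

json_str = '[{"id": "webhook", "execute-command": "/tmp/redeploy.sh"}]'
# object_hook converts every decoded JSON object into an AttributeDictionary.
data = json.loads(json_str, object_hook=AttributeDictionary)

print(data[0].id)                  # -> webhook
print(data[0]["execute-command"])  # dashed keys still require item access
```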

python querying a json objectpath

I have a nested JSON structure. I'm using objectpath (the Python API version), but I don't understand how to select and filter the nested information in the structure.
EG.
I want to select the "description" of the action "reading" for the user "John".
JSON:
{
"user":
{
"actions":
[
{
"name": "reading",
"description": "blablabla"
}
]
"name": "John"
}
}
CODE:
$.user[#.name is 'John' and #.actions.name is 'reading'].actions.description
but it doesn't work (it returns an empty set, even though the data is there in my JSON).
Any suggestions?
Is this what you are trying to do?
import objectpath
data = {
"user": {
"actions": {
"name": "reading",
"description": "blablabla"
},
"name": "John"
}
}
tree = objectpath.Tree(data)
result = tree.execute("$.user[#.name is 'John'].actions[#.name is 'reading'].description")
for entry in result:
print(entry)
Output
blablabla
I had to fix your JSON. Also, tree.execute returns a generator. You could replace the for loop with print(next(result)), but the for loop seemed clearer.
from objectpath import *
your_json = {"name": "felix", "last_name": "diaz"}
# This json path will bring all the key-values of your json
your_json_path='$.*'
my_key_values = Tree(your_json).execute(your_json_path)
# If you want to retrieve the name node...then specify it.
my_name= Tree(your_json).execute('$.name')
# If you want to retrieve the last_name node...then specify it.
last_name= Tree(your_json).execute('$.last_name')
I believe you're just missing a comma in JSON:
{
"user":
{
"actions": [
{
"name": "reading",
"description": "blablabla"
}
],
"name": "John"
}
}
Assuming there is only one "John", with only one "reading" activity, the following query works:
$.user[#.name is 'John'].actions[0][#.name is 'reading'][0].description
If there could be multiple "John"s, with multiple "reading" activities, the following query will almost work:
$.user.*[#.name is 'John'].actions..*[#.name is 'reading'].description
I say almost because the use of .. will be problematic if there are other nested dictionaries with "name" and "description" entries, such as
{
"user": {
"actions": [
{
"name": "reading",
"description": "blablabla",
"nested": {
"name": "reading",
"description": "broken"
}
}
],
"name": "John"
}
}
A fully correct query is not currently possible; there is an open issue to properly implement queries into arrays: https://github.com/adriank/ObjectPath/issues/60
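For comparison, the same selection can be done in plain Python without objectpath (same data shape as the corrected JSON above):

```python
data = {
    "user": {
        "actions": [
            {"name": "reading", "description": "blablabla"}
        ],
        "name": "John",
    }
}

user = data["user"]
# Collect descriptions of the "reading" actions belonging to user "John".
descriptions = [
    action["description"]
    for action in user["actions"]
    if user.get("name") == "John" and action.get("name") == "reading"
]
print(descriptions)  # -> ['blablabla']
```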

Mongodb: updating fields in embedded document

If I have a document that looks like this:
{
"_id" : 1,
"name": "Homer J. Simpson",
"income" : 45000,
"address": {
"street": "742 Evergreen Terrace",
"city": "Springfield",
"state": "???",
"email": "homer#springfield.com",
"zipcode": "12345",
"country": "USA"
}
}
and I want to update some of the fields in the address document (leaving the others unchanged), inserting new fields if they do not already exist, like this:
{
"address": {
"email": "homer#gmail.com",
"zipcode": "77788",
"latitude" : 23.43545,
"longitude" : 123.45553
}
}
Is there a way to do an atomic update all at once, or do you need to loop over the key/values in the new data and do a .update() for each one?
Use dot notation with a $set to target multiple embedded fields in a single update:
{ "$set": {
"address.email": "homer#gmail.com",
"address.zipcode": "77788",
"address.latitude" : 23.43545,
"address.longitude" : 123.45553
} }
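The same update with PyMongo, as a sketch (a local MongoDB instance, database test, and collection people are assumptions; pymongo is a third-party dependency):

```python
SET_UPDATE = {
    "$set": {
        "address.email": "homer@gmail.com",
        "address.zipcode": "77788",
        "address.latitude": 23.43545,
        "address.longitude": 123.45553,
    }
}

def apply_update(uri="mongodb://localhost:27017"):
    from pymongo import MongoClient  # pip install pymongo

    client = MongoClient(uri)
    # A single atomic update_one touches all four embedded fields:
    # absent fields are created, unmentioned fields are left unchanged.
    client.test.people.update_one({"_id": 1}, SET_UPDATE)
```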
As Sergio mentioned, use a $set:
{ "$set": { "address.latitude": 23.43545 } }