ElasticSearch not respecting custom mapping while indexing data - python

I am using ElasticSearch 6.2.4. I'm currently learning it and writing code in Python. The following is my code. No matter whether I give age as an integer or as text, it still accepts it.
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

# index settings
settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "members": {
            "dynamic": "strict",
            "properties": {
                "name": {
                    "type": "text"
                },
                "age": {
                    "type": "integer"
                }
            }
        }
    }
}

if not es.indices.exists('family'):
    es.indices.create(index='family', ignore=400, body=settings)
    print('Created Index')

data = {'name': 'Maaz', 'age': "4"}
result = es.index(index='family', id=2, doc_type='members', body=data)
print(result)

You can index both 42 and "42" into a numeric field because Elasticsearch coerces numeric strings to numbers by default; this has no impact on how the field is stored or searched. You can't, however, index something like "42a" into a numeric field.
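If you want the index to reject numeric strings such as "4" outright, numeric fields support a coerce mapping parameter; with coercion disabled, indexing a string raises a 400 mapper parsing error instead of being silently converted. A sketch against the mapping from the question:

settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "members": {
            "dynamic": "strict",
            "properties": {
                "name": {"type": "text"},
                # With coerce disabled, age must be a real number;
                # indexing age="4" is rejected with a mapping error
                "age": {"type": "integer", "coerce": False}
            }
        }
    }
}

Note also that passing ignore=400 to indices.create silently swallows mapping errors, so it's worth dropping that argument while debugging.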


MongoDB: How do I reference an array?

I'm trying to create a MongoDB database that contains two collections: Students and Courses.
The first collection "students" contains:
from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.Database

student = [{"_id": "0",
            "firstname": "Bert",
            "lastname": "Holden"},
           {"_id": "1",
            "firstname": "Sam",
            "lastname": "Olsen"},
           {"_id": "2",
            "firstname": "James",
            "lastname": "Swan"}]

students = db.students
students.insert_many(student)
pprint.pprint(students.find_one())
The second collection "courses" contains:
from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.Database

course = [{"_id": "10",
           "coursename": "Databases",
           "grades": "[{student_id:0, grade:83.442}, {student_id:1, grade:45.323}, {student_id:2, grade:87.435}]"}]

courses = db.courses
courses.insert_many(course)
pprint.pprint(courses.find_one())
I then want to use aggregation to find a student and the corresponding courses with grade(s).
from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client["Database"]

pipeline = [
    {
        "$lookup": {
            "from": "courses",
            "localField": "_id",
            "foreignField": "student_id",
            "as": "student_course"
        }
    },
    {
        "$match": {
            "_id": "0"
        }
    }
]

pprint.pprint(list(db.students.aggregate(pipeline)))
I'm not sure if the student_id/grade is implemented correctly in the "courses" collection, so that might be one reason why my aggregation returns [].
The aggregation works if I create separate courses for each student, but that seems like a waste of memory, so I would like to have one course with all the student_ids and grades in an array.
Expected output:
[{'_id': '0',
  'firstname': 'Bert',
  'lastname': 'Holden',
  'student_course': [{'_id': '10',
                      'coursename': 'Databases',
                      'grade': '83.442',
                      'student_id': '0'}]}]
A couple of points worth mentioning:

1. Your example code in file "courses.py" is inserting grades as a string that represents an array, not an actual array. This was pointed out by Matt in the comments, and you requested an explanation. Here is my attempt to explain: if you insert a string that looks like an array, you cannot perform $unwind or $lookup on sub-elements, because they aren't sub-elements, they are just part of a string.

2. You have array data in courses that holds student grades, which are the datapoints you actually want, but you start the aggregation on the students collection. Instead, perhaps change your perspective a bit and come at it from the courses collection rather than from the student side. If you do, you may re-qualify the requirement as: "show me all courses and student grades where student id is 0".

3. Your array data seems to have a datatype mismatch. The student id is an integer in your string variable "array", but the students collection has the student id as a string. These need to be consistent for the $lookup to work properly (unless you want to perform a bunch of casting).
But, nonetheless, here is a possible solution to your problem. I have revised the Python code, including a redefinition of the aggregation...
The name of my test database is pythontest, as seen in this code example.
This database must exist prior to running the code, otherwise you will get an error.
File students.py
from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.pythontest

student = [{"_id": "0",
            "firstname": "Bert",
            "lastname": "Holden"},
           {"_id": "1",
            "firstname": "Sam",
            "lastname": "Olsen"},
           {"_id": "2",
            "firstname": "James",
            "lastname": "Swan"}]

students = db.students
students.insert_many(student)
pprint.pprint(students.find_one())
Then the courses file. Notice that the field grades is no longer a string but a valid array object, and that the student id is a string, not an integer. (In reality, a stronger datatype such as a UUID or an int would likely be preferable.)
File courses.py
from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.pythontest

course = [{"_id": "10",
           "coursename": "Databases",
           "grades": [{"student_id": "0", "grade": 83.442}, {"student_id": "1", "grade": 45.323}, {"student_id": "2", "grade": 87.435}]}]

courses = db.courses
courses.insert_many(course)
pprint.pprint(courses.find_one())
... and finally, the aggregation file with the changed aggregation pipeline...
File aggregation.py
from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.pythontest

pipeline = [
    { "$match": { "grades.student_id": "0" } },
    { "$unwind": "$grades" },
    { "$project": { "coursename": 1, "student_id": "$grades.student_id", "grade": "$grades.grade" } },
    {
        "$lookup": {
            "from": "students",
            "localField": "student_id",
            "foreignField": "_id",
            "as": "student"
        }
    },
    { "$unwind": "$student" },
    { "$project": { "student._id": 0 } },
    { "$match": { "student_id": "0" } }
]

pprint.pprint(list(db.courses.aggregate(pipeline)))
Output of running program
> python3 aggregation.py
[{'_id': '10',
  'coursename': 'Databases',
  'grade': 83.442,
  'student': {'firstname': 'Bert', 'lastname': 'Holden'},
  'student_id': '0'}]
The format of the data at the end of the program may not be as desired, but can be tweaked by manipulating the aggregation.
** EDIT **
So if you want to approach this aggregation from the student rather than from the course, you can still do that, but because the array lives in courses the aggregation will be a bit more complicated: the $lookup must use a pipeline itself to prepare the foreign data structures:
Aggregation from Student perspective
db.students.aggregate([
    { $match: { _id: "0" } },
    { $addFields: { "colStudents._id": "$_id" } },
    {
        $lookup: {
            from: "courses",
            let: { varStudentId: "$colStudents._id" },
            pipeline: [
                { $unwind: "$grades" },
                { $match: { $expr: { $eq: [ "$grades.student_id", "$$varStudentId" ] } } },
                { $project: { course_id: "$_id", coursename: 1, grade: "$grades.grade", _id: 0 } }
            ],
            as: "student_course"
        }
    },
    { $project: { _id: 0, student_id: "$_id", firstname: 1, lastname: 1, student_course: 1 } }
])
Output
> python3 aggregation.py
[{'firstname': 'Bert',
  'lastname': 'Holden',
  'student_course': [{'course_id': '10',
                      'coursename': 'Databases',
                      'grade': 83.442}],
  'student_id': '0'}]
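Note that the aggregation above is written in mongo shell syntax, even though the output comes from aggregation.py; in PyMongo the stage names and field paths must be quoted strings. A sketch of the same pipeline translated for PyMongo (reusing the pythontest database from above; $lookup with let/pipeline requires MongoDB 3.6+):

from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.pythontest

pipeline = [
    {"$match": {"_id": "0"}},
    {"$addFields": {"colStudents._id": "$_id"}},
    {"$lookup": {
        "from": "courses",
        "let": {"varStudentId": "$colStudents._id"},
        "pipeline": [
            {"$unwind": "$grades"},
            # Correlate the unwound grade entries with the outer student id
            {"$match": {"$expr": {"$eq": ["$grades.student_id", "$$varStudentId"]}}},
            {"$project": {"course_id": "$_id", "coursename": 1, "grade": "$grades.grade", "_id": 0}},
        ],
        "as": "student_course",
    }},
    {"$project": {"_id": 0, "student_id": "$_id", "firstname": 1, "lastname": 1, "student_course": 1}},
]

pprint.pprint(list(db.students.aggregate(pipeline)))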
I was finally able to take a look at this...
TL;DR: see Mongo Playground
This solution requires you to store grades as an actual array of objects rather than a string.
Consider the following database structure:
db={
  // Collection
  "students": [
    {
      "_id": "0",
      "firstname": "Bert",
      "lastname": "Holden"
    },
    {
      "_id": "1",
      "firstname": "Sam",
      "lastname": "Olsen"
    },
    {
      "_id": "2",
      "firstname": "James",
      "lastname": "Swan"
    }
  ],
  // Collection
  "courses": [
    {
      "_id": "10",
      "coursename": "Databases",
      "grades": [
        { student_id: "0", grade: 83.442 },
        { student_id: "1", grade: 45.325 },
        { student_id: "2", grade: 87.435 }
      ]
    }
  ]
}
You can achieve what you want using the following query:
db.students.aggregate([
    {
        $match: { _id: "0" }
    },
    {
        $lookup: {
            from: "courses",
            pipeline: [
                { $unwind: "$grades" },
                {
                    $match: { "grades.student_id": "0" }
                },
                {
                    $group: {
                        "_id": "$_id",
                        "coursename": { $first: "$coursename" },
                        "grade": { $first: "$grades.grade" },
                        "student_id": { $first: "$grades.student_id" }
                    }
                }
            ],
            as: "student_course"
        }
    }
])

How can I fix 'failed to find geo_point field [pin.location]'?

So I have an index for my map points and I need to put some data in it, but it seems it does not register my data as valid input for pin.location.
I have tried everything I could get from https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-geo-distance-query.html
and it still does not work.
This is where I set up the index:
mappings = {
    "mappings": {
        "properties": {
            "pin": {
                "properties": {
                    "location": {
                        "type": "geo_point"
                    }
                }
            },
            "index.mapping.single_type": False
        }
    }
}

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
if not es.indices.exists(index="groups_map"):
    es.indices.create(index='groups_map', body=mappings)
es.index(index='groups_map', id=data["id"], doc_type='groups_map', body=data, request_timeout=30)
Here is the data:
data = {
    "pin": {
        "properties": {
            "location": {
                "type": "geo_point",
                'lat': request.POST['source_lat'],
                'lon': request.POST['source_lon']
            }
        }
    },
    "id": instance.id,
}
And this is my query; data here is just a dictionary with lat and lon values:
query = {
    "bool": {
        "must": {
            "match_all": {}
        },
        "filter": {
            "geo_distance": {
                "distance": "12km",
                "pin.location": {
                    "lat": data["lat"],
                    "lon": data["lon"]
                }
            }
        }
    }
}

return es.search(index="groups_map", body={"query": query}, size=20)
This is the full error I get:
elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'failed to find geo_point field [pin.location]')
The problem is that your data is not correct: you need to remove the properties key. Your data should look like this:
data = {
    "pin": {
        "location": {
            'lat': request.POST['source_lat'],
            'lon': request.POST['source_lon']
        }
    },
    "id": instance.id,
}
Note: You need to delete and recreate your index before indexing new data.
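An existing field's mapping can't be changed in place, which is why the delete is needed. A minimal sketch with elasticsearch-py (reusing the mappings dict from above; ignore=[404] makes the delete a no-op if the index doesn't exist yet):

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

# Drop the old index (if any), then recreate it with the corrected mapping
es.indices.delete(index='groups_map', ignore=[404])
es.indices.create(index='groups_map', body=mappings)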

Add #timestamp field in ElasticSearch with Python

I'm using Python to add entries to a local Elasticsearch instance (localhost:9200).
Currently, I use this method:
def insertintoes(data):
    """
    Insert data into Elasticsearch
    :param data: dict
    :return:
    """
    timestamp = data.get('#timestamp')
    logstashIndex = 'logstash-' + timestamp.strftime("%Y.%m.%d")
    es = Elasticsearch()
    if not es.indices.exists(logstashIndex):
        # Setting mappings for index
        mapping = '''
        {
          "mappings": {
            "_default_": {
              "_all": {
                "enabled": true,
                "norms": false
              },
              "dynamic_templates": [
                {
                  "message_field": {
                    "path_match": "message",
                    "match_mapping_type": "string",
                    "mapping": {
                      "norms": false,
                      "type": "text"
                    }
                  }
                },
                {
                  "string_fields": {
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {
                      "fields": {
                        "keyword": {
                          "type": "keyword"
                        }
                      },
                      "norms": false,
                      "type": "text"
                    }
                  }
                }
              ],
              "properties": {
                "#timestamp": {
                  "type": "date",
                  "include_in_all": true
                },
                "#version": {
                  "type": "keyword",
                  "include_in_all": true
                }
              }
            }
          }
        }
        '''
        es.indices.create(logstashIndex, ignore=400, body=mapping)
    es.index(index=logstashIndex, doc_type='system', timestamp=timestamp, body=data)
data is a dict structure with a valid #timestamp defined like this: data['#timestamp'] = datetime.datetime.now()
The problem is, even if there is a timestamp value in my data, Kibana doesn't show the entry in the «Discover» view. :(
Here is an example of a full entry in Elasticsearch:
{
  "_index": "logstash-2017.06.25",
  "_type": "system",
  "_id": "AVzf3QX3iazKBndbIkg4",
  "_score": 1,
  "_source": {
    "priority": 6,
    "uid": 0,
    "gid": 0,
    "systemd_slice": "system.slice",
    "cap_effective": "1fffffffff",
    "exe": "/usr/bin/bash",
    "hostname": "ns3003395",
    "syslog_facility": 9,
    "comm": "crond",
    "systemd_cgroup": "/system.slice/cronie.service",
    "systemd_unit": "cronie.service",
    "syslog_identifier": "CROND",
    "message": "(root) CMD (/usr/local/rtm/bin/rtm 14 > /dev/null 2> /dev/null)",
    "systemd_invocation_id": "9228b6c72e6a4624a1806e4c59af8d04",
    "syslog_pid": 26652,
    "pid": 26652,
    "#timestamp": "2017-06-25T17:27:01.734453"
  }
}
As you can see, there IS a #timestamp field, but it doesn't seem to be what Kibana expects.
I don't know what to do to make my entries visible in Kibana.
Any idea?
Elasticsearch is not recognizing #timestamp as a date, but as a string. If your data['#timestamp'] is a datetime object, you can try converting it to an ISO string, which is automatically recognized. Try:
timestamp = data.get('#timestamp').isoformat()
timestamp should now be a string, but in ISO format
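For example, a minimal sketch of the fix applied to the data dict from the question:

import datetime

# Convert the datetime to an ISO 8601 string before indexing;
# Elasticsearch's default date detection recognizes this format
data['#timestamp'] = datetime.datetime.now().isoformat()
# e.g. '2017-06-25T17:27:01.734453'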

Elasticsearch querying nested objects returning no results

I have created an Elasticsearch index, and one of the nested fields has the following mapping:
"groups": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"value": {
"type": "text"
}
}
}
As for the ES version, it's 5.0, and I am using the official Python client elasticsearch-py on the client side. I want to query this nested field based on its value.
Let's say there is another field called name, which is a text field. I want to find all names starting with A that fall under a specified group.
Some sample data:
Groups - HR(name=HR, value=hr), Marketing(name=Marketing, value=marketing)
Names - Andrew, Alpha, Barry, John
Andrew and Alpha belong to group HR.
Based on this I tried the following query:
{
    'query': {
        'bool': {
            'must': [{
                'match_phrase_prefix': {
                    'title': 'A'
                }
            }]
        },
        'nested': {
            'path': 'groups',
            'query': {
                'bool': {
                    'must': [{
                        'match': {
                            'groups.value': 'hr'
                        }
                    }]
                }
            }
        }
    }
}
For this query I referred to the ES docs, but it does not return anything. It would be great if someone could point out what is wrong with this query or with the mapping itself.
You're almost there, you simply need to move the nested query inside the bool/must query:
{
    'query': {
        'bool': {
            'must': [
                {
                    'match_phrase_prefix': {
                        'title': 'A'
                    }
                },
                {
                    'nested': {
                        'path': 'groups',
                        'query': {
                            'bool': {
                                'must': [{
                                    'match': {
                                        'groups.value': 'hr'
                                    }
                                }]
                            }
                        }
                    }
                }
            ]
        }
    }
}
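For completeness, a hedged sketch of running the corrected query through elasticsearch-py (the index name people is a placeholder, not from the question):

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

query = {
    'query': {
        'bool': {
            'must': [
                # Prefix match on the text field
                {'match_phrase_prefix': {'title': 'A'}},
                # Nested clause now lives inside bool/must
                {
                    'nested': {
                        'path': 'groups',
                        'query': {
                            'bool': {
                                'must': [{'match': {'groups.value': 'hr'}}]
                            }
                        }
                    }
                }
            ]
        }
    }
}

result = es.search(index='people', body=query)
for hit in result['hits']['hits']:
    print(hit['_source'])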

Elasticsearch insensitive search accents

I'm using Elasticsearch with Python. I can't find a way to make searches insensitive to accents.
For example:
I have two words: "Camión" and "Camion".
When a user searches for "camion", I'd like both results to show up.
Creating index:
es = Elasticsearch([{u'host': u'127.0.0.1', u'port': b'9200'}])
es.indices.create(index='name', ignore=400)
es.index(
    index="name",
    doc_type="producto",
    id=p.pk,
    body={
        'title': p.titulo,
        'slug': p.slug,
        'summary': p.summary,
        'description': p.description,
        'image': foto,
        'price': p.price,
        'wholesale_price': p.wholesale_price,
        'reference': p.reference,
        'ean13': p.ean13,
        'rating': p.rating,
        'quantity': p.quantity,
        'discount': p.discount,
        'sales': p.sales,
        'active': p.active,
        'encilleria': p.encilleria,
        'brand': marca,
        'brand_title': marca_titulo,
        'sellos': sellos_str,
        'certificados': certificados_str,
        'attr_naturales': attr_naturales_str,
        'soluciones': soluciones_str,
        'categories': categories_str,
        'delivery': p.delivery,
        'stock': p.stock,
        'consejos': p.consejos,
        'ingredientes': p.ingredientes,
        'es_pack': p.es_pack,
        'temp': p.temp,
        'relevancia': p.relevancia,
        'descontinuado': p.descontinuado,
    }
)
Search:
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': '127.0.0.1', 'port': '9200'}])
resul = es.search(
    index="name",
    body={
        "query": {
            "query_string": {
                "query": "(title:" + search + " OR description:" + search + " OR summary:" + search + ") AND (active:true)",
                "analyze_wildcard": False
            }
        },
        "size": "9999",
    }
)
print resul
I've searched on Google, Stack Overflow, and elastic.co, but I didn't find anything that works.
You need to change the mapping of the fields you reference in the query. Changing the mapping requires re-indexing, so that the fields are analyzed differently and the query can work.
Basically, you need something like the example below. The field called text is just an example; you need to apply the same settings to the other fields as well. Note that I used fields so that the root field keeps the original text analyzed by default, while text.folded strips the accented characters and makes your query work. I have also changed the query a bit so that you search both versions of the field (camion will match, but so will camión).
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "text": {
          "type": "string",
          "fields": {
            "folded": {
              "type": "string",
              "analyzer": "folding"
            }
          }
        }
      }
    }
  }
}
And the query:
"query": {
"query_string": {
"query": "\\*.folded:camion"
}
}
Also, I strongly suggest reading this section of the documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html
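Since the question indexes from Python, here is a hedged sketch of applying the same settings through elasticsearch-py. The index name and the producto doc type mirror the question, the title field stands in for the fields you want folded, and the query names both field variants explicitly; treat this as illustrative rather than a drop-in fix:

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])

body = {
    "settings": {
        "analysis": {
            "analyzer": {
                # lowercase + asciifolding turns "Camión" into "camion" at index time
                "folding": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"]
                }
            }
        }
    },
    "mappings": {
        "producto": {
            "properties": {
                "title": {
                    "type": "string",
                    "fields": {
                        "folded": {"type": "string", "analyzer": "folding"}
                    }
                }
            }
        }
    }
}

es.indices.create(index='name', body=body)

# Query both the raw and the folded variant so "camion" matches "Camión"
resul = es.search(index='name', body={
    "query": {
        "query_string": {
            "query": "title:camion OR title.folded:camion"
        }
    }
})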
