I'm using Elasticsearch with Python. I can't find a way to make an accent-insensitive search.
For example:
I have two words, "Camión" and "Camion".
When a user searches for "camion", I'd like both results to show up.
Creating the index:
es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])
es.indices.create(index='name', ignore=400)
es.index(
    index="name",
    doc_type="producto",
    id=p.pk,
    body={
        'title': p.titulo,
        'slug': p.slug,
        'summary': p.summary,
        'description': p.description,
        'image': foto,
        'price': p.price,
        'wholesale_price': p.wholesale_price,
        'reference': p.reference,
        'ean13': p.ean13,
        'rating': p.rating,
        'quantity': p.quantity,
        'discount': p.discount,
        'sales': p.sales,
        'active': p.active,
        'encilleria': p.encilleria,
        'brand': marca,
        'brand_title': marca_titulo,
        'sellos': sellos_str,
        'certificados': certificados_str,
        'attr_naturales': attr_naturales_str,
        'soluciones': soluciones_str,
        'categories': categories_str,
        'delivery': p.delivery,
        'stock': p.stock,
        'consejos': p.consejos,
        'ingredientes': p.ingredientes,
        'es_pack': p.es_pack,
        'temp': p.temp,
        'relevancia': p.relevancia,
        'descontinuado': p.descontinuado,
    }
)
Search:
from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])
resul = es.search(
    index="name",
    body={
        "query": {
            "query_string": {
                "query": "(title:" + search + " OR description:" + search + " OR summary:" + search + ") AND (active:true)",
                "analyze_wildcard": False
            }
        },
        "size": "9999",
    }
)
print(resul)
I've searched Google, Stack Overflow and elastic.co, but I couldn't find anything that works.
You need to change the mapping of the fields you use in the query. Changing the mapping requires re-indexing, so that the fields are analyzed differently and the query can work.
Basically, you need something like the following. The field called text is just an example; you need to apply the same settings to your other fields as well. Note that I used fields (a multi-field) so that the root field keeps the original text analyzed with the default analyzer, while text.folded strips the accented characters and makes your query work. I have also changed the query a bit so that both variants are matched (camion will match, but so will camión).
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "text": {
          "type": "string",
          "fields": {
            "folded": {
              "type": "string",
              "analyzer": "folding"
            }
          }
        }
      }
    }
  }
}
And the query:
"query": {
"query_string": {
"query": "\\*.folded:camion"
}
}
Also, I strongly suggest reading this section of the documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html
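Since the question uses elasticsearch-py, here is a rough sketch of applying the same settings from Python. It is only illustrative: the index name name, the doc type producto and the title field are taken from the question, the folding sub-field is applied to one field only (repeat for description and summary), and the index has to be deleted and re-created before re-indexing the documents:
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])

# Create the index with the folding analyzer and a .folded sub-field
# on the fields used in the query (title shown here; repeat for the others).
es.indices.create(index='name', body={
    "settings": {
        "analysis": {
            "analyzer": {
                "folding": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"]
                }
            }
        }
    },
    "mappings": {
        "producto": {
            "properties": {
                "title": {
                    "type": "string",
                    "fields": {
                        "folded": {"type": "string", "analyzer": "folding"}
                    }
                }
            }
        }
    }
})

# After re-indexing the documents, query both the original and folded fields.
resul = es.search(index='name', body={
    "query": {
        "query_string": {
            "query": "title:camion OR title.folded:camion"
        }
    }
})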
Say I have this:
search_object = {
    'query': {
        'bool': {
            'must': {
                'simple_query_string': {
                    'query': search_text,
                    'fields': ['french_no_accents', 'def_no_accents'],
                },
            },
            'filter': [
                {'term': {'def_no_accents': 'court'}},
                {'term': {'def_no_accents': 'bridge'}},
            ],
        },
    },
    'highlight': {
        'encoder': 'html',
        'fields': {
            'french_no_accents': {},
            'def_no_accents': {},
        },
        'number_of_fragments': 0,
    },
}
... whatever search string I enter as search_text, its constituent terms are highlighted, but so are "court" and "bridge". I don't want "court" or "bridge" to be highlighted.
I've tried putting the "highlight" key-value pair in different spots in the structure, but nothing seems to work (a syntax exception is thrown).
More generally, is there a formal grammar anywhere specifying what you can and can't do with ES (v7) queries?
You could add a highlight query to limit what should and shouldn't get highlighted:
{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "abc",
          "fields": [
            "french_no_accents",
            "def_no_accents"
          ]
        }
      },
      "filter": [
        { "term": { "def_no_accents": "court" } },
        { "term": { "def_no_accents": "bridge" } }
      ]
    }
  },
  "highlight": {
    "encoder": "html",
    "fields": {
      "*_no_accents": {                 <--
        "highlight_query": {
          "simple_query_string": {
            "query": "abc",
            "fields": [ "french_no_accents", "def_no_accents" ]
          }
        }
      }
    },
    "number_of_fragments": 0
  }
}
I've used a wildcard for the two fields (*_no_accents) -- if that matches unwanted fields too, you'll need to duplicate the highlight query on two separate, non-wildcard highlight fields like you originally had. Though I can't think of a scenario where that'd happen, since your simple_query_string query targets two concrete fields.
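If it helps, here is a sketch of the same idea applied to the Python search_object from your question. Only the 'highlight' section changes; the 'query' part stays exactly as you had it:
search_object['highlight'] = {
    'encoder': 'html',
    'fields': {
        # highlight only what the full-text query matches,
        # not the filter terms "court" and "bridge"
        '*_no_accents': {
            'highlight_query': {
                'simple_query_string': {
                    'query': search_text,
                    'fields': ['french_no_accents', 'def_no_accents'],
                },
            },
        },
    },
    'number_of_fragments': 0,
}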
As to:
More generally, is there a formal grammar anywhere specifying what you can and can't do with ES (v7) queries?
what exactly are you looking for?
I am not able to work out how to write an Elasticsearch query that uses a synonym table. With a plain query I don't have any search problems, but incorporating synonyms has become an issue for me.
es.search(index='data_inex', body={
    "query": {
        "match": {"inex": "tren"}
    },
    "settings": {
        "filter": {
            "synonym": {
                "type": "synonym",
                "lenient": True,
                "synonyms": ["foo, baz", "tren, hut"]
            }
        }
    }
})
Also, is it possible to use a file instead of this array?
Check the synonym token filter documentation. You can configure a synonyms file as well:
PUT /test_index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": [ "synonym" ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms_path": "analysis/synonym.txt" // <======== location of synonym file
          }
        }
      }
    }
  }
}
Please note:
Changes to the synonyms file are not reflected in documents indexed before the change; those documents have to be re-indexed.
You cannot change the mapping (including the analyzer) of an existing field. If you want to change the mapping of existing documents, you need to reindex them into another index that has the updated mapping.
The search request does not support "settings"; analysis settings belong to the index, not to the query.
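Putting this together in Python, a minimal sketch could look like the following. Assumptions: the index name data_inex, the field inex and the synonym list come from the question; the filter and analyzer names (my_synonyms, synonym_analyzer) are made up; and the typeless mapping format assumes ES 7.x. The analysis settings go into index creation, and an ordinary match query then picks up the synonyms through the field's analyzer:
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

# Analysis settings and mapping are defined when the index is created,
# not in the search request.
es.indices.create(index='data_inex', body={
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {
                    "type": "synonym",
                    "lenient": True,
                    "synonyms": ["foo, baz", "tren, hut"]
                    # or: "synonyms_path": "analysis/synonym.txt"
                }
            },
            "analyzer": {
                "synonym_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "my_synonyms"]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "inex": {"type": "text", "analyzer": "synonym_analyzer"}
        }
    }
})

# A plain match query now sees the synonyms via the field's analyzer.
result = es.search(index='data_inex', body={
    "query": {"match": {"inex": "tren"}}
})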
I am using Elasticsearch 6.2.4, currently learning it and writing code in Python. Following is my code. Whether I give age as an integer or as a string, it still accepts it.
from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
# index settings
settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "members": {
            "dynamic": "strict",
            "properties": {
                "name": {
                    "type": "text"
                },
                "age": {
                    "type": "integer"
                },
            }
        }
    }
}
if not es.indices.exists('family'):
    es.indices.create(index='family', ignore=400, body=settings)
    print('Created Index')
data = {'name': 'Maaz', 'age': "4"}
result = es.index(index='family', id=2, doc_type='members', body=data)
print(result)
You can pass 42 or "42" to a numeric field because Elasticsearch coerces numeric strings to numbers by default; it has no impact on how the field is stored or searched. But you can't pass something like "42a" to any numeric field.
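If you want string values such as "4" to be rejected as well, Elasticsearch has a coerce mapping parameter that can be disabled per field. A rough sketch against the index from the question (only the age property changes, the rest of the settings stay as above):
settings = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "members": {
            "dynamic": "strict",
            "properties": {
                "name": {"type": "text"},
                # with coercion disabled, string values like "4" are rejected;
                # only real numbers such as 4 are accepted
                "age": {"type": "integer", "coerce": False}
            }
        }
    }
}
es.indices.create(index='family', ignore=400, body=settings)

# This should now fail with a mapping/parsing error because age is a string:
es.index(index='family', id=2, doc_type='members', body={'name': 'Maaz', 'age': "4"})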
I have created an Elasticsearch index, and one of the nested fields has the following mapping.
"groups": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"value": {
"type": "text"
}
}
}
As for the ES version, it is 5.0, and I am using the official Python client, elasticsearch-py, on the client side. I want to query this nested field based on its value.
Let's say there is another field called name, which is a text field. I want to find all names starting with "A" that fall under the specified group.
Some sample data:
Groups - HR(name=HR, value=hr), Marketing(name=Marketing, value=marketing)
Names - Andrew, Alpha, Barry, John
Andrew and Alpha belong to group HR.
Based on this I tried the following query:
{
  'query': {
    'bool': {
      'must': [{
        'match_phrase_prefix': {
          'title': 'A'
        }
      }]
    },
    'nested': {
      'path': 'groups',
      'query': {
        'bool': {
          'must': [{
            'match': {
              'groups.value': 'hr'
            }
          }]
        }
      }
    }
  }
}
For this query I referred to the ES docs, but it does not return anything. It would be great if someone could point out what is wrong with this query or with the mapping itself.
You're almost there, you simply need to move the nested query inside the bool/must query:
{
  'query': {
    'bool': {
      'must': [
        {
          'match_phrase_prefix': {
            'title': 'A'
          }
        },
        {
          'nested': {
            'path': 'groups',
            'query': {
              'bool': {
                'must': [{
                  'match': {
                    'groups.value': 'hr'
                  }
                }]
              }
            }
          }
        }
      ]
    }
  }
}
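Since you're on elasticsearch-py, running the corrected query is just a matter of passing it as the request body. A small sketch (the index name my_index is a placeholder; use your own):
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

resp = es.search(index='my_index', body={
    'query': {
        'bool': {
            'must': [
                {'match_phrase_prefix': {'title': 'A'}},
                {
                    'nested': {
                        'path': 'groups',
                        'query': {
                            'bool': {
                                'must': [{'match': {'groups.value': 'hr'}}]
                            }
                        }
                    }
                }
            ]
        }
    }
})

for hit in resp['hits']['hits']:
    print(hit['_source'])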
What I assumed would be a pretty trivial query turned out to be difficult for me.
I have a collection of event data for users of a website. I am trying to find the landing page for each user. Unique users have the field anonymousId, and a user's current page path is in the nested field context.page.path.
I can find the first date on which each user visits, but I'm unsure how to extract the context.page.path that goes along with that date in the same query. I used the $first operator on the page path as well, but I'm fairly certain this is incorrect...
Here is the code I'm using to create the cursor in Python (pymongo):
cursor = db.events.aggregate([
    {
        '$group': {
            '_id': '$anonymousId',
            'date': {'$first': '$timestamp'},
            'page': {'$first': '$context.page.path'}
        }
    }
])
Edit:
Here is the document structure for this collection (redacted)...
{
  "anonymousId": "...",
  "timestamp": "2016-04-05T13:05:06.076Z",
  "context": {
    "page": {
      "path": "...",
      "referrer": "...",
      "title": "...",
      "url": "..."
    }
  },
  ... more fields here but not relevant to this question
}
I think I may have figured it out... but I'm still not 100% positive.
cursor = db.events.aggregate([
    {
        '$group': {
            '_id': '$anonymousId',
            'date': {'$first': '$timestamp'},
            'page': {'$push': '$context.page.path'}
        }
    },
    {
        '$unwind': '$page'
    }
])
forgive the poor indentation...
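One caveat worth noting: $first only returns the earliest event if the documents reach the $group stage in timestamp order, which isn't guaranteed without an explicit sort. A hedged sketch of sorting first (assuming timestamp is the ordering field, as in the document above):
cursor = db.events.aggregate([
    # Sort by user and timestamp so that $first reliably picks the
    # earliest event (and therefore the landing page) per anonymousId.
    {'$sort': {'anonymousId': 1, 'timestamp': 1}},
    {
        '$group': {
            '_id': '$anonymousId',
            'date': {'$first': '$timestamp'},
            'page': {'$first': '$context.page.path'}
        }
    }
])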