How to run a search query with synonyms in Elasticsearch from Python

I am not able to understand how to implement an Elasticsearch query together with a synonym table. With a general query I don't have any search problems, but incorporating synonyms has become an issue for me.
es.search(index='data_inex', body={
"query": {
"match": {"inex": "tren"}
},
"settings": {
"filter": {
"synonym": {
"type": "synonym",
"lenient": true,
"synonyms": [ "foo, baz", "tren, hut" ]
}
}
}
}
)
Also, is it possible to use a file instead of this array?

Check the Elasticsearch documentation on the synonym token filter.
You can configure a synonyms file as well:
PUT /test_index
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": [ "synonym" ]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "analysis/synonym.txt" // <======== location of synonym file
}
}
}
}
}
}
Please note:
Changes to the synonyms file are not reflected in documents indexed before the change; those documents must be re-indexed.
You cannot change the mapping (including the analyzer) of an existing field. To change the mapping of existing documents, reindex them into another index that has the updated mapping.
A search request body doesn't support "settings"; the analysis settings belong to the index definition, not to the query.
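Putting the notes above together in Python: the synonym filter goes into the index body at creation time, and the search body carries only the query. This is a minimal sketch, not a definitive setup. It assumes the field is called inex (taken from the question), that it is a text field mapped to the custom analyzer, and that the typeless "mappings" layout of Elasticsearch 7+ applies; the helper names are hypothetical.

```python
def build_index_body(field, synonyms=None, synonyms_path=None):
    """Index settings plus a mapping that applies a synonym filter to one text field."""
    synonym_filter = {"type": "synonym"}
    if synonyms_path is not None:
        # Path is resolved relative to the Elasticsearch config directory.
        synonym_filter["synonyms_path"] = synonyms_path
    else:
        synonym_filter["synonyms"] = list(synonyms or [])
    return {
        "settings": {"index": {"analysis": {
            "analyzer": {"synonym": {"tokenizer": "whitespace",
                                     "filter": ["synonym"]}},
            "filter": {"synonym": synonym_filter},
        }}},
        # Assumption: the searched field should use the synonym analyzer.
        "mappings": {"properties": {field: {"type": "text",
                                            "analyzer": "synonym"}}},
    }


def build_search_body(field, text):
    """A search body contains only the query, never "settings"."""
    return {"query": {"match": {field: text}}}


index_body = build_index_body("inex", synonyms=["foo, baz", "tren, hut"])
search_body = build_search_body("inex", "tren")

# With a running cluster and the elasticsearch client (assumed, not run here):
# es.indices.create(index="data_inex", body=index_body)
# es.search(index="data_inex", body=search_body)
```

Passing synonyms_path="analysis/synonym.txt" instead of the synonyms list produces the file-based variant shown in the answer.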

Related

Malformed query in elasticsearch

search = {
"from": str(start),
"size": str(size),
"query": {
"bool": {
"must": {
"multi_match": {
"query":query,
"fields":["name","description","tags","comments","created","creator","transaction","wallet"],
"operator":"or"}
},
"filter": { "term": { "channel": channel } } } } }
This is the python dict object. It gets the following error:
elasticsearch.BadRequestError: BadRequestError(400, 'parsing_exception', '[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]')
I'm not seeing it. Please help. Start, size, query, and channel are all variables.
I have looked at a lot of example Elasticsearch queries. Nothing I've tried has gotten past the syntax errors. I've also tried simple_query_string and a simple multi_match. I always need start and size, and I always need to filter on channel.
So the issue was that some of those clauses are arrays and need [] around them, specifically must and filter. Adding the appropriate brackets solved the issue for me. Here's the new format:
search = {
"from": start,
"query": {
"bool": {
"must": [
{ "multi_match": {
"query": query,
"fields": ["name","description","tags","comments","created","creator","transaction","wallet"]
} },
{ "match": {
"channel": channel
} }
]
}
}
}
Notice I've also dropped using the filter and just added another match term. I'm using size in the search call as one of its parameters.
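If the channel restriction should stay a filter (filter clauses restrict results without contributing to the score), the same shape can be built with a small helper. This is only a sketch: build_search is a hypothetical name, and the term clause assumes channel is indexed as an exact-value (keyword) field.

```python
import json

FIELDS = ["name", "description", "tags", "comments",
          "created", "creator", "transaction", "wallet"]


def build_search(query, channel, start=0, size=10):
    """Build a bool query with list-valued must and filter clauses."""
    return {
        "from": int(start),   # numeric paging values, not strings
        "size": int(size),
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": query,
                                     "fields": FIELDS,
                                     "operator": "or"}}
                ],
                "filter": [
                    {"term": {"channel": channel}}
                ],
            }
        },
    }


body = build_search("wallet transfer", "general", start="20", size="5")
print(json.dumps(body, indent=2))
```

Dumping the dict with json.dumps before sending it is a cheap way to inspect the exact structure Elasticsearch will receive.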

Analyzer to ignore accents and plural/singular in Elasticsearch

I am working on ignoring accents and plural/singular forms when I make a search query. I copied the Spanish analyzer from here and kept only the stemmer: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html
You can check my code in Python (I bulk-load the data from a CSV later):
settings={
"settings": {
"analysis": {
"filter": {
"spanish_stemmer": {
"type": "stemmer",
"language": "light_spanish"
}
},
"analyzer": {
"rebuilt_spanish": {
"tokenizer": "standard",
"filter": [
"lowercase",
"spanish_stemmer"
]
}
}
}
}
}
es.indices.create(index="activities", body=settings)
However, when I send a GET query from Insomnia for geometrico, geométrico, geométricos, or geometricos, I get 0 results, even though there is a doc with the title Cuerpos geométricos. It should match, since I want accents and plural/singular to make no difference. Any ideas?
The GET query I do:
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "geométricos",
"fields": [
"Descripcion",
"Nombre",
"Tags"
],
"analyzer":"rebuilt_spanish"
}
}
}
}
}
You will need to add the ASCII folding token filter (asciifolding) to your token filters; see the official documentation. Your analyzer should then look like this:
Analyzer:
"settings": {
"analysis": {
"filter": {
"spanish_stemmer": {
"type": "stemmer",
"language": "light_spanish"
}
},
"analyzer": {
"rebuilt_spanish": {
"tokenizer": "standard",
"filter": [
"asciifolding", // ASCII folding token filter
"lowercase",
"spanish_stemmer"
]
}
}
}
}
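One thing the snippets above do not show is a mapping: a custom analyzer only affects the fields that reference it (or an index where it is set as the default), and it is applied when documents are indexed. A hedged sketch of a create body that combines the analyzer with such a mapping follows. It assumes the three fields from the query are plain text fields and that the index is recreated and the data re-indexed afterwards; build_activities_index is a hypothetical helper.

```python
SEARCH_FIELDS = ["Descripcion", "Nombre", "Tags"]


def build_activities_index(fields=SEARCH_FIELDS):
    """Create-index body: accent-folding Spanish analyzer plus field mappings."""
    return {
        "settings": {"analysis": {
            "filter": {
                "spanish_stemmer": {"type": "stemmer",
                                    "language": "light_spanish"}
            },
            "analyzer": {
                "rebuilt_spanish": {
                    "tokenizer": "standard",
                    # Same filter order as in the answer above.
                    "filter": ["asciifolding", "lowercase", "spanish_stemmer"],
                }
            },
        }},
        # Assumption: these text fields should be analyzed with rebuilt_spanish.
        "mappings": {"properties": {
            name: {"type": "text", "analyzer": "rebuilt_spanish"}
            for name in fields
        }},
    }


index_body = build_activities_index()

# With a running cluster (assumed, not run here):
# es.indices.create(index="activities", body=index_body)
```

With the analyzer set on the fields, the same analysis runs at index time and at search time, so the explicit "analyzer" in the multi_match becomes optional.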

MongoDB: update a particular word in a string across multiple documents

I am updating a mongodb collection for a small project of mine and I'm stuck with updating a single word in an existing field.
Example:
{
"_id" : ObjectId("5faa46a6036e146f85a4afef"),
"name" : "Kubernetes_cluster_setup - kubernetes-cluster"
}
In the document I want to change the value to "name": "Kubernetes_cluster_config - kubernetes-cluster".
I want setup to be replaced by config, without removing - kubernetes-cluster, which is a constant value.
Method tried: $set replaces the entire field, but I want - kubernetes-cluster to stay in place.
Try using the $replaceOne operator (available from MongoDB 4.4).
You need an aggregation like this:
db.collection.aggregate([
{
"$match": {
"id": 0
}
},
{
"$set": {
"name": {
"$replaceOne": {
"input": "$name",
"find": "setup",
"replacement": "config"
}
}
}
}
])
The first stage finds the document (I've matched by id here), and the second stage replaces the value setup with config inside the name field.
Also, if you want to replace the string for every document, you can use this query:
db.collection.aggregate([
{
"$match": {
"name": {
"$regex": "setup"
}
}
},
{
"$set": {
"name": {
"$replaceOne": {
"input": "$name",
"find": "setup",
"replacement": "config"
}
}
}
}
])
Here the query looks for the documents whose name field contains the word setup and then replaces it with config.
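Note that aggregate() only returns the transformed documents; it does not write them back to the collection. To persist the change, the same $set stage can be passed as an update pipeline, which MongoDB supports from version 4.2. A minimal Python sketch follows, assuming a PyMongo collection object is supplied by the caller; both helper names are hypothetical.

```python
import re


def replace_word_update(find, replacement, field="name"):
    """Return (filter, pipeline) that swaps the first occurrence of `find`."""
    # Escape the search text so it is matched literally by $regex.
    query_filter = {field: {"$regex": re.escape(find)}}
    pipeline = [
        {"$set": {field: {"$replaceOne": {
            "input": f"${field}",      # e.g. "$name"
            "find": find,
            "replacement": replacement,
        }}}}
    ]
    return query_filter, pipeline


def apply_replace(collection, find, replacement, field="name"):
    """Run the update on every matching document in `collection`."""
    query_filter, pipeline = replace_word_update(find, replacement, field)
    return collection.update_many(query_filter, pipeline)


query_filter, pipeline = replace_word_update("setup", "config")

# With PyMongo and a live database (assumed, not run here):
# from pymongo import MongoClient
# coll = MongoClient()["mydb"]["collection"]
# result = apply_replace(coll, "setup", "config")
```

Because only the matched substring is rewritten, the constant - kubernetes-cluster suffix is left untouched.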

Elasticsearch - Boosting an individual term if it appears in the fields

I have the following search query that returns the documents that contain the word "apple", "mango" or "strawberry". Now I want to boost the scoring of the document whenever the word "cake" or "chips" (or both) is in the document (the word cake or chips doesn't have to be in the document but whenever it appears in "title" or "body" fields, the scoring should be boosted, so that the documents containing the "cake" or "chips" are ranked higher)
res = es.search(index='fruits', body={
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "(apple) OR (mango) OR (strawberry)"
}
},
{
"bool": {
"must_not": [{
"match_phrase": {
"body": "Don't match this phrase."
}
}
]
}
}
]
},
"match": {
"query": "(cake) OR (chips)",
"boost": 2
}
}
}
})
Any help would be greatly appreciated!
Just include the values you would want to be boosted in a should clause as shown in the below query:
Query:
POST <your_index_name>/_search
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":"(apple) OR (mango) OR (strawberry)"
}
},
{
"bool":{
"must_not":[
{
"match_phrase":{
"body":"Don't match this phrase."
}
}
]
}
}
],
"should":[ <----- Add this
{
"query_string":{
"query":"cake OR chips",
"fields": ["title","body"], <----- Specify fields
"boost":10 <----- Boost Field
}
}
]
}
}
}
Alternately, you can push your must_not clause to a level above in the query.
Updated Query:
POST <your_index_name>/_search
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":"(apple) OR (mango) OR (strawberry)"
}
}
],
"should":[
{
"query_string":{
"query":"cake OR chips",
"fields": ["title","body"],
"boost":10
}
}
],
"must_not":[ <----- Note this
{
"match_phrase":{
"body":"Don't match this phrase."
}
}
]
}
}
}
Basically, should acts as a logical OR while must acts as a logical AND in terms of Boolean operations.
Documents that also match the should clause get a higher relevance score and are ranked higher, while documents that match only the must clause are ranked lower.
Hope this helps!
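For the Python client used in the question, the second query can be assembled from its parts. This is only an illustrative sketch; build_boosted_query and its parameters are hypothetical names, and the boost value of 10 is simply the one used above.

```python
def build_boosted_query(required_terms, boost_terms, boost_fields,
                        excluded_phrase, boost=10):
    """must = required terms, should = optional boosted terms, must_not = exclusion."""
    return {"query": {"bool": {
        "must": [
            {"query_string": {
                "query": " OR ".join(f"({term})" for term in required_terms)}}
        ],
        "should": [
            {"query_string": {
                "query": " OR ".join(boost_terms),
                "fields": list(boost_fields),
                "boost": boost}}
        ],
        "must_not": [
            {"match_phrase": {"body": excluded_phrase}}
        ],
    }}}


body = build_boosted_query(
    required_terms=["apple", "mango", "strawberry"],
    boost_terms=["cake", "chips"],
    boost_fields=["title", "body"],
    excluded_phrase="Don't match this phrase.",
)

# With a running cluster (assumed, not run here):
# res = es.search(index="fruits", body=body)
```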

Elasticsearch match multiple fields

I recently started using Elasticsearch in a website. The scenario is that I have to search for a string in a field. So, if the field is named title, then my search query was:
"query" :{"match": {"title": my_query_string}}.
But now I need to add another field to it, let's say category. So I need to find the matches of my string that are in category: some_category and that have title: my_query_string. I tried multi_match, but it does not give me the result I am looking for. I am looking into query filters now. Is there a way of adding two fields with such criteria to my match query?
GET indice/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title": "title"
}
},
{
"match": {
"category": "category"
}
}
]
}
}
}
Replace should with must if desired.
Ok, so I think that what you need is something like this:
"query": {
"filtered": {
"query": {
"match": {
"title": YOUR_QUERY_STRING
}
},
"filter": {
"term": {
"category": YOUR_CATEGORY
}
}
}
}
If your category field is analyzed, then you will need to use match instead of term in the filter.
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{"match": {"title": "bold title"}},
{"match": {"body": "nice body"}}
]
}
},
"filter": {
"term": {
"category": "xxx"
}
}
}
}
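The filtered query used in the two snippets above was deprecated in Elasticsearch 2.0 and removed in 5.0; on current versions the same intent is expressed with a bool query that has a filter clause. A minimal Python sketch of that equivalent follows. The helper name is hypothetical, and the term clause assumes category is a not-analyzed (keyword) field; use match there if the field is analyzed.

```python
def title_in_category(title_text, category):
    """Match on title (scored) and restrict to one category (not scored)."""
    return {"query": {"bool": {
        "must": [
            {"match": {"title": title_text}}
        ],
        "filter": [
            {"term": {"category": category}}
        ],
    }}}


body = title_in_category("bold title", "xxx")

# With a running cluster (assumed, not run here):
# es.search(index="indice", body=body)
```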
