Match entire statement in Elasticsearch - Python

I am trying to match a whole statement when querying in Elasticsearch, but I am not able to get it right.
"query": {
"match_phrase": {"description": query_tokens}
}
I tried it with "Air Conditioning, Air Conditioner", but it gave me results like a general match query.
How should I achieve a complete-statement fetch?

Solution
Store these statements (the description field) in a keyword field.
Why it doesn't work
The match_phrase query is analyzed, which means the same analyzer that was used at index time is used at query time to create the search tokens. By default, a text field uses the standard analyzer, which breaks the text into tokens on whitespace and special characters such as the comma.
Statement from Elasticsearch doc
The match_phrase query analyzes the text and creates a phrase query
out of the analyzed text. For example:
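To see what the standard analyzer actually produces, you can run the statement through the _analyze API. A minimal sketch with the elasticsearch-py client (the connection URL is an assumption):
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # connection URL is an assumption

# Run the statement through the standard analyzer to see the tokens
# that a match_phrase query on a text field would actually search for
resp = es.indices.analyze(body={
    "analyzer": "standard",
    "text": "Air Conditioning, Air Conditioner"
})
print([t["token"] for t in resp["tokens"]])
# ['air', 'conditioning', 'air', 'conditioner']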
Index Def
{
  "mappings": {
    "properties": {
      "description": {
        "type": "keyword"
      }
    }
  }
}
Index sample doc
{
"description" : "Air Conditioning, Air Conditioner"
}
Search query
{
  "query": {
    "match_phrase": {
      "description": {
        "query": "Air Conditioning, Air Conditioner"
      }
    }
  }
}
Search result
"hits": [
{
"_index": "match",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"description": "Air Conditioning, Air Conditioner"
}
}
]
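Putting the whole flow together in Python, here is a minimal sketch with the elasticsearch-py client (the connection URL is an assumption; the index name "match" is taken from the search result above):
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # connection URL is an assumption

# Map description as keyword so the whole statement is indexed as one token
es.indices.create(index="match", body={
    "mappings": {"properties": {"description": {"type": "keyword"}}}
})

# refresh=True only so the document is searchable immediately in this demo
es.index(index="match", id=1, refresh=True,
         body={"description": "Air Conditioning, Air Conditioner"})

resp = es.search(index="match", body={
    "query": {"match_phrase": {"description": {"query": "Air Conditioning, Air Conditioner"}}}
})
print(resp["hits"]["hits"])  # only the exact, complete statement matches
With a keyword field the query text must match the stored value exactly, including case and punctuation; a term query would express the same intent even more directly.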

Related

MongoDB: update a particular word in a string across multiple documents

I am updating a MongoDB collection for a small project of mine and I'm stuck on updating a single word in an existing field.
Example:
{
"_id" : ObjectId("5faa46a6036e146f85a4afef"),
"name" : "Kubernetes_cluster_setup - kubernetes-cluster"
}
In this document I want to change the name to "Kubernetes_cluster_config - kubernetes-cluster".
I want config to replace setup, and it should not remove the "- kubernetes-cluster" part, which is a constant value.
The method I applied, $set, updates the entire field, but "- kubernetes-cluster" should be preserved.
Try using the $replaceOne operator (available from MongoDB 4.4).
You need an aggregation like this:
db.collection.aggregate([
  {
    "$match": {
      "id": 0
    }
  },
  {
    "$set": {
      "name": {
        "$replaceOne": {
          "input": "$name",
          "find": "setup",
          "replacement": "config"
        }
      }
    }
  }
])
The first stage finds the document (here, by id); the second one replaces the value setup with config in the name field.
Also, if you want to replace the string for every document, you can use this query:
db.collection.aggregate([
  {
    "$match": {
      "name": {
        "$regex": "setup"
      }
    }
  },
  {
    "$set": {
      "name": {
        "$replaceOne": {
          "input": "$name",
          "find": "setup",
          "replacement": "config"
        }
      }
    }
  }
])
Here the query looks for documents whose name field contains the word setup and then replaces it with config.
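The same update can be issued from Python. Here is a minimal pymongo sketch of the second variant (the connection string, database, and collection names are assumptions). Note that pipeline-style updates require MongoDB 4.2+ and $replaceOne requires 4.4+; unlike aggregate, which only returns transformed documents, update_many persists the change:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # connection string is an assumption
coll = client["mydb"]["mycollection"]              # db/collection names are assumptions

# update_many accepts an aggregation pipeline as the update document,
# so $replaceOne can rewrite part of the existing string in place
result = coll.update_many(
    {"name": {"$regex": "setup"}},
    [{"$set": {
        "name": {"$replaceOne": {"input": "$name", "find": "setup", "replacement": "config"}}
    }}]
)
print(result.modified_count)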

Elasticsearch: Regex for matching the longest string from a list of strings in the autocomplete suggester

I am very new to Elasticsearch and am trying to implement an autocomplete suggester with regex queries. Once I receive a query, I take the last 5 words of the query and form a list of tokens in this format:
query - i am trying regex for elasticsearch
tokens - [elasticsearch, for elasticsearch, regex for elasticsearch ...]
My requirement is to identify the indexed sentence that matches the longest string from the list of tokens.
I am having a tough time writing a regular expression for it. Can someone please help?
My mapping:
"mappings": {
"properties": {
"keywords": {
"type": "text",
"fields": {
"keywords_suggest": {
"type": "completion"
}
}
},
"sections": {
"type": "text",
"fields": {
"sections_suggest": {
"type": "completion"
}
}
},
"title": {
"type": "text",
"fields": {
"title_suggest": {
"type": "completion"
}
}
}
This is how I am making the search request:
body = {
    "from": 0, "size": size,
    "query": {
        "multi_match": {
            "query": query,
            # ignore these fields; I only pasted the mapping used for the completion type
            "fields": ["title^3", "searchResultPreview^1", "body^5"],
            "fuzziness": "AUTO"
        }
    },
    "suggest": {
        "title-suggest": {
            "regex": regex,
            "completion": {
                "field": "title.title_suggest",
                "skip_duplicates": True,
            }
        },
        "keyword-suggest": {
            "regex": regex,
            "completion": {
                "field": "keywords.keywords_suggest",
                "skip_duplicates": True,
            }
        },
        "section-suggest": {
            "regex": regex,
            "completion": {
                "field": "sections.sections_suggest",
                "skip_duplicates": True,
            }
        }
    }
}
search_result = self.es.search(index=index_name, body=body)
Indexed sentence1 - The Real purpose of elasticsearch is unknown
Indexed sentence2 - real function is not defined
query - i want to know the real
list of words - [ real, the real, know the real, to know the real]
I tried the following regular expression:
(to know the real|know the real|the real|real)
Required output - Indexed sentence1 needs to be matched, as "the real" is the longest string in the list that the sentence starts with; but it shows matches only for sentences that start with real.
Can someone please tell me where I am going wrong?
EDIT: I believe case sensitivity is not the issue, as the matches for the word real are case insensitive.
Regular expressions are case sensitive.
Use (?i) before your matching group and it should work.
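As a sketch of how this could look from Python: a plain alternation does not prefer longer tokens, so one way to get the longest match is to try the tokens longest-first and stop at the first one that returns suggestions. The index name and connection URL below are assumptions; the suggester field comes from the mapping above:
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # connection URL is an assumption

query = "i want to know the real"
words = query.split()[-5:]
# Suffixes of the last five words: ["want to know the real", ..., "real"]
tokens = [" ".join(words[i:]) for i in range(len(words))]

# Try the longest token first; the first one with suggestions wins
for token in sorted(tokens, key=len, reverse=True):
    body = {
        "suggest": {
            "title-suggest": {
                "regex": "(?i)" + token,  # (?i) makes the match case insensitive
                "completion": {"field": "title.title_suggest", "skip_duplicates": True}
            }
        }
    }
    resp = es.search(index="my_index", body=body)  # index name is an assumption
    options = resp["suggest"]["title-suggest"][0]["options"]
    if options:
        print(token, "->", [o["text"] for o in options])
        break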

How to search a query in Elasticsearch with synonyms in Python

I am not able to understand how to implement an Elasticsearch query along with a synonym table. With a general query I don't have any search problems, but incorporating synonyms has become an issue for me.
es.search(index='data_inex', body={
    "query": {
        "match": {"inex": "tren"}
    },
    "settings": {
        "filter": {
            "synonym": {
                "type": "synonym",
                "lenient": True,
                "synonyms": ["foo, baz", "tren, hut"]
            }
        }
    }
})
Also, is it possible to use a file instead of this array?
Check the documentation on the synonym token filter.
You can configure a synonyms file as well:
PUT /test_index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": [ "synonym" ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms_path": "analysis/synonym.txt" // <======== location of synonym file
          }
        }
      }
    }
  }
}
Please note:
Changes in the synonyms file are not reflected in documents indexed before the change; re-indexing is required for that.
You cannot change the mapping (including the analyzer) of an existing field. If you want to change the mapping of existing documents, you need to reindex those documents into another index with the updated mapping.
The search query body does not support "settings".
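In Python that means the synonym filter goes into the index settings at creation time, and the search body stays a plain query. A minimal sketch with elasticsearch-py (the connection URL and the wiring of the inex field to the synonym analyzer are assumptions):
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # connection URL is an assumption

es.indices.create(index="data_inex", body={
    "settings": {
        "analysis": {
            "analyzer": {
                "synonym": {"tokenizer": "whitespace", "filter": ["synonym"]}
            },
            "filter": {
                "synonym": {
                    "type": "synonym",
                    "lenient": True,
                    "synonyms": ["foo, baz", "tren, hut"]
                    # or point at a file instead: "synonyms_path": "analysis/synonym.txt"
                }
            }
        }
    },
    "mappings": {
        "properties": {"inex": {"type": "text", "analyzer": "synonym"}}
    }
})

# The search body contains only the query; synonym expansion happens in the analyzer
resp = es.search(index="data_inex", body={"query": {"match": {"inex": "tren"}}})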

Python: query a MongoDB database to find a specific value under an unknown field

I am using Python to generate a MongoDB collection and I need to find some specific values in the database. A document looks like this:
{
  "_id": ObjectId("215454541245"),
  "category": "food",
  "venues": {"Thai Restaurant": 251, "KFC": 124, "Chinese Restaurant": 21, .....}
}
My question is: I want to query this database and find all venues with a value smaller than 200, so in my example "KFC" and "Chinese Restaurant" would be returned.
Does anyone know how to do that?
If you can change your schema, it will be much easier to issue queries against your collection. As it is, having dynamic values as your keys is considered a bad design pattern with MongoDB, as such keys are extremely difficult to query.
A recommended approach would be to follow an embedded model like this:
{
  "_id": ObjectId("553799187174b8c402151d06"),
  "category": "food",
  "venues": [
    {
      "name": "Thai Restaurant",
      "value": 251
    },
    {
      "name": "KFC",
      "value": 124
    },
    {
      "name": "Chinese Restaurant",
      "value": 21
    }
  ]
}
With this structure you can then issue a query to find all venues with a value smaller than 200:
db.collection.findOne({ "venues.value": { "$lt": 200 } },
  {
    "venues": { "$elemMatch": { "value": { "$lt": 200 } } },
    "_id": 0
  });
This will return the result:
/* 0 */
{
  "venues": [
    {
      "name": "KFC",
      "value": 124
    }
  ]
}
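One caveat: the $elemMatch projection returns only the first matching array element, which is why the result above contains KFC but not Chinese Restaurant. Here is a pymongo sketch (the connection string, database, and collection names are assumptions) of the query above, plus an aggregation with $filter that returns every venue under 200:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # connection string is an assumption
coll = client["mydb"]["venues"]                    # db/collection names are assumptions

# Direct translation of the query above: returns the first matching element only
doc = coll.find_one(
    {"venues.value": {"$lt": 200}},
    {"venues": {"$elemMatch": {"value": {"$lt": 200}}}, "_id": 0}
)

# $filter keeps all array elements with a value under 200, not just the first
cursor = coll.aggregate([
    {"$match": {"venues.value": {"$lt": 200}}},
    {"$project": {
        "_id": 0,
        "venues": {"$filter": {
            "input": "$venues",
            "as": "v",
            "cond": {"$lt": ["$$v.value", 200]}
        }}
    }}
])
for doc in cursor:
    print(doc)  # both KFC and Chinese Restaurant appear in the venues array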

ElasticSearch: Finding documents with field value that is in an array

I have some customer documents that I want to retrieve using Elasticsearch, based on where the customers come from (the country field is IN an array of countries).
[
  {
    "name": "A1",
    "address": {
      "street": "1 Downing Street",
      "country": {
        "code": "GB",
        "name": "United Kingdom"
      }
    }
  },
  {
    "name": "A2",
    "address": {
      "street": "25 Gormut Street",
      "country": {
        "code": "FR",
        "name": "France"
      }
    }
  },
  {
    "name": "A3",
    "address": {
      "street": "Bonjour Street",
      "country": {
        "code": "FR",
        "name": "France"
      }
    }
  }
]
Now, I have an array in my Python code:
["DE", "FR", "IT"]
I'd like to obtain the two documents, A2 and A3.
How would I write this in PyES/Query DSL? Am I supposed to use an ExistsFilter or a TermQuery for this? ExistsFilter seems to only check whether the field exists or not, but doesn't care about the value.
In NoSQL-type document stores, all you get back is the document, not parts of the document.
Your requirement: "I'd like to obtain the two documents, A2 and A3." implies that you need to index each of those documents separately, not as an array inside another "parent" document.
If you need to match values of the parent document alongside country then you need to denormalize your data and store those values from the parent doc inside each sub-doc as well.
Once you've done the above, then the query is easy. I'm assuming that the country field is mapped as:
country: {
  type: "string",
  index: "not_analyzed"
}
To find docs with DE, you can do:
curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "country": "DE"
        }
      }
    }
  }
}
'
To find docs with either DE or FR:
curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "country": [
            "DE",
            "FR"
          ]
        }
      }
    }
  }
}
'
To combine the above with some other query terms:
curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
  "query": {
    "filtered": {
      "filter": {
        "terms": {
          "country": [
            "DE",
            "FR"
          ]
        }
      },
      "query": {
        "text": {
          "address.street": "bonjour"
        }
      }
    }
  }
}
'
Also see this answer for an explanation of how arrays of objects can be tricky, because of the way they are flattened:
Is it possible to sort nested documents in ElasticSearch?
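Note that the filtered and text queries above come from older Elasticsearch releases; later versions replaced them with bool and match. A hedged Python sketch of the combined query (the index name is an assumption, and the field path assumes the nested document structure above with the country code mapped as a non-analyzed/keyword field):
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # connection URL is an assumption

resp = es.search(index="customers", body={  # index name is an assumption
    "query": {
        "bool": {
            # filter clause: non-scoring, like the terms filter above
            "filter": {"terms": {"address.country.code": ["DE", "FR", "IT"]}},
            # scoring clause: modern equivalent of the old "text" query
            "must": {"match": {"address.street": "bonjour"}}
        }
    }
})
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["name"])  # A2, A3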
