How to generate queries and skip parts of queries in Elasticsearch? - python

I am using Python to query Elasticsearch with a custom query. Let's look at a very simple example that will search for a given term in the field 'name' and another one in the 'surname' field of the document:
from elasticsearch import Elasticsearch
import json

# read query from external JSON
with open('query.json') as data_file:
    read_query = json.load(data_file)

# search with Elasticsearch and show hits
es = Elasticsearch()
# set query through body parameter
res = es.search(index="test", doc_type="articles", body=read_query)
print("%d documents found" % res['hits']['total'])
for doc in res['hits']['hits']:
    print("%s) %s" % (doc['_id'], doc['_source']['content']))
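The print call above assumes res['hits']['total'] is an integer, which is what older Elasticsearch versions return. From 7.0 onward it is an object such as {"value": 42, "relation": "eq"}. A small helper that accepts both shapes, sketched here assuming res behaves like a dict (the name total_hits is hypothetical):

```python
def total_hits(res):
    """Return the hit count from a search response (hypothetical helper).

    Older Elasticsearch versions return hits.total as an int; 7.0+ returns
    an object like {"value": 42, "relation": "eq"}.
    """
    total = res["hits"]["total"]
    if isinstance(total, dict):
        # "relation" is "gte" when the value is only a lower bound
        return total["value"]
    return total


# Usage with the response from es.search(...):
# print("%d documents found" % total_hits(res))
```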
'query.json'
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "Star",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "surname": "Fox"
          }
        }
      ]
    }
  }
}
Now I expect the user to type in search words: the first word is used for the field 'name' and the second one for 'surname'. Suppose I use Python to replace the {$name} and {$surname} placeholders with the two words the user typed in:
'query.json'
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "{$name}",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "surname": "{$surname}"
          }
        }
      ]
    }
  }
}
The problem arises when the user enters only a name and no surname, so I end up with the following query:
'query.json'
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "Star",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "surname": ""
          }
        }
      ]
    }
  }
}
The field "surname" is now empty and Elasticsearch will look for hits where "surname" is an empty string, which is not what I want. I want the surname clause to be ignored if the input term is empty. Is there any mechanism in Elasticsearch to make part of a query be ignored when the given term is empty?
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "Star",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "surname": "",
            "ignore_if_empty" <--- this would be really cool
          }
        }
      ]
    }
  }
}
Or is there some other way of generating queries? I can't seem to find anything about query generation in Elasticsearch. How do you guys do it? Any input is welcome!

The Python DSL (elasticsearch-dsl) seems to be the proper way of doing it: https://github.com/elastic/elasticsearch-dsl-py/
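Even without the DSL, a common alternative to text substitution is to build the query as a plain Python dictionary and only add a clause when the user actually typed a term, so an empty surname never reaches Elasticsearch. A minimal sketch, assuming (as in the question) that the first word is the name and the second the surname; the function names are hypothetical:

```python
import json


def parse_user_input(text):
    """Split user input: first word -> name, second word -> surname."""
    words = text.split()
    name = words[0] if len(words) > 0 else None
    surname = words[1] if len(words) > 1 else None
    return name, surname


def build_name_query(name, surname=None):
    """Build a bool/should query, adding a clause only for non-empty terms."""
    should = []
    if name and name.strip():
        # Boosted match on 'name', mirroring query.json from the question
        should.append({"match": {"name": {"query": name.strip(), "boost": 2}}})
    if surname and surname.strip():
        should.append({"match": {"surname": surname.strip()}})
    if not should:
        # Nothing to search for; let the caller decide what to do
        raise ValueError("at least one search term is required")
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    # Only a name was typed, so no 'surname' clause is generated
    print(json.dumps(build_name_query(*parse_user_input("Star")), indent=2))
```

The resulting dict can be passed as body= to es.search(...) in place of the query loaded from query.json. Building dicts also avoids breaking the JSON when the user types characters such as quotes.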

Related

Elasticsearch - Boosting an individual term if it appears in the fields

I have the following search query that returns the documents containing the word "apple", "mango" or "strawberry". Now I want to boost the score of a document whenever the word "cake" or "chips" (or both) appears in it. A document doesn't have to contain cake or chips, but whenever either word appears in the "title" or "body" field, the score should be boosted, so that documents containing "cake" or "chips" are ranked higher.
res = es.search(index='fruits', body={
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "(apple) OR (mango) OR (strawberry)"
                    }
                },
                {
                    "bool": {
                        "must_not": [{
                            "match_phrase": {
                                "body": "Don't match this phrase."
                            }
                        }]
                    }
                }
            ]
        },
        "match": {
            "query": "(cake) OR (chips)",
            "boost": 2
        }
    }
})
Any help would be greatly appreciated!
Just include the values you want boosted in a should clause, as shown in the query below:
Query:
POST <your_index_name>/_search
{
  "query":{
    "bool":{
      "must":[
        {
          "query_string":{
            "query":"(apple) OR (mango) OR (strawberry)"
          }
        },
        {
          "bool":{
            "must_not":[
              {
                "match_phrase":{
                  "body":"Don't match this phrase."
                }
              }
            ]
          }
        }
      ],
      "should":[                          <----- Add this
        {
          "query_string":{
            "query":"cake OR chips",
            "fields": ["title","body"],   <----- Specify fields
            "boost":10                    <----- Boost Field
          }
        }
      ]
    }
  }
}
Alternatively, you can move your must_not clause up one level, into the outer bool query.
Updated Query:
POST <your_index_name>/_search
{
  "query":{
    "bool":{
      "must":[
        {
          "query_string":{
            "query":"(apple) OR (mango) OR (strawberry)"
          }
        }
      ],
      "should":[
        {
          "query_string":{
            "query":"cake OR chips",
            "fields": ["title","body"],
            "boost":10
          }
        }
      ],
      "must_not":[                        <----- Note this
        {
          "match_phrase":{
            "body":"Don't match this phrase."
          }
        }
      ]
    }
  }
}
In Boolean terms, must behaves like a logical AND and should like a logical OR.
When a bool query also has must clauses, its should clauses become optional: documents that match them get a higher relevance score and move up the results, while documents that only match the must clauses are still returned, with a lower relevance score.
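Since the question builds the body in Python, the updated query can also be generated from lists of terms. A sketch, assuming the field names title and body from the question; the helper name build_boosted_query is hypothetical, and joining with " OR " gives "apple OR mango OR strawberry", which query_string treats the same as the parenthesized version:

```python
import json


def build_boosted_query(required, boost_terms, excluded_phrase=None, boost=10):
    """bool query: 'required' terms must match, 'boost_terms' only raise the score."""
    query = {
        "bool": {
            # Documents must match at least one of the required terms
            "must": [{"query_string": {"query": " OR ".join(required)}}],
            # Optional clause: matching documents score higher, others still match
            "should": [{
                "query_string": {
                    "query": " OR ".join(boost_terms),
                    "fields": ["title", "body"],
                    "boost": boost,
                }
            }],
        }
    }
    if excluded_phrase:
        # Drop documents whose body contains this exact phrase
        query["bool"]["must_not"] = [{"match_phrase": {"body": excluded_phrase}}]
    return {"query": query}


body = build_boosted_query(["apple", "mango", "strawberry"], ["cake", "chips"],
                           excluded_phrase="Don't match this phrase.")
print(json.dumps(body, indent=2))
```

The dict can then be sent as in the question, e.g. es.search(index='fruits', body=body).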
Hope this helps!

Elasticsearch query with an array as search input

I'm trying to query some indexed data with an array of strings as search input.
The indexed data looks like this:
{
  "pubMedID": "21528671",
  "title": "Basic fibroblast [...] melanoma cells.",
  "abstract": "Human malignant [...] cell growth."
}
I would like to search within the 'title' and 'abstract' fields for multiple strings. For example:
queryString=['melanoma', 'dysplastic nevus syndrome']
I already tried with the following code:
queryString=['melanoma', 'dysplastic nevus syndrome']
payload={
    "query": {
        "bool": {
            "should": [
                {
                    "query_string": {
                        "query": queryString,
                        "fields": [
                            "title",
                            "abstract"
                        ]
                    }
                }
            ]
        }
    }
}
payload_json = (json.dumps(payload))
res = esclient.search(index='medicine',body=payload_json)
But I get the following error when running this:
RequestError: RequestError(400, 'parsing_exception', '[query_string] query does not support [query]')
The query works fine if I just put in a simple string value. Can someone tell me how to write this kind of query when the input is an array? Thank you in advance!
EDIT:
I was a bit unfamiliar with the query_string query, but it turns out you can do something like this with it too:
# queryStrings is the list of search terms, e.g. ['melanoma', 'dysplastic nevus syndrome']
qs = ' OR '.join(queryStrings)
payload={
    "query": {
        "bool": {
            "should": [
                {
                    "query_string": {
                        "query": qs,
                        "fields": [
                            "title",
                            "abstract"
                        ]
                    }
                }
            ]
        }
    }
}
The result is a query similar to the multiple-clause ones outlined below.
Docs here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
ORIGINAL:
This can be achieved with multiple clauses like so:
queryString=['melanoma', 'dysplastic nevus syndrome']
payload={
    "query": {
        "bool": {
            "should": [
                {
                    "query_string": {
                        "query": queryString[0],
                        "fields": [
                            "title",
                            "abstract"
                        ]
                    }
                },
                {
                    "query_string": {
                        "query": queryString[1],
                        "fields": [
                            "title",
                            "abstract"
                        ]
                    }
                }
            ]
        }
    }
}
If you have a variable number of queries, then you just need to dynamically build your "should" clauses like:
shoulds = []
for q in queryStrings:
    shoulds.append({
        "query_string": {
            "query": q,
            "fields": [
                "title",
                "abstract"
            ]
        }
    })
payload={
    "query": {
        "bool": {
            "should": shoulds
        }
    }
}
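One caveat with both versions: an unquoted entry such as dysplastic nevus syndrome is parsed by query_string as separate terms (combined with the default OR operator), so a document containing only "syndrome" can match. If each array entry should be matched as a whole phrase, one option not taken from the answers above is one multi_match clause of type phrase per entry. A sketch, with a hypothetical helper name:

```python
def build_phrase_shoulds(terms, fields=("title", "abstract")):
    """One multi_match phrase clause per non-empty search term (hypothetical helper)."""
    shoulds = [
        # type "phrase" matches the words of t in order, in any of the fields
        {"multi_match": {"query": t, "type": "phrase", "fields": list(fields)}}
        for t in terms
        if t.strip()  # skip empty entries
    ]
    return {"query": {"bool": {"should": shoulds}}}


queryString = ['melanoma', 'dysplastic nevus syndrome']
payload = build_phrase_shoulds(queryString)
```

The payload can be sent the same way as in the question, e.g. esclient.search(index='medicine', body=payload).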

Or in a Elasticsearch filter

I want to query Elasticsearch (using a Python library) and filter some of the documents. Since I don't need a score, I'm using only the filter and must_not clauses:
{
  "_source": ["entities"],
  "query": {
    "bool": {
      "must_not": [
        {"exists": {"field": "retweeted_status"}}
      ],
      "filter": [
        {"match": {"entities.urls.display_url": "blabla.com"}},
        {"match": {"entities.urls.display_url": "blibli.com"}}
      ]
    }
  }
}
This is my query, but the problem is that the clauses inside the same filter are apparently combined with AND. I would like them to be combined with OR. How can I change my query to get all the documents that contain "blibli.com" OR "blabla.com"?
You can nest a bool query inside another bool query, so you can write the query like this:
{
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "retweeted_status"
          }
        }
      ],
      "filter": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "entities.urls.display_url": "blabla.com"
                }
              },
              {
                "match": {
                  "entities.urls.display_url": "blibli.com"
                }
              }
            ]
          }
        }
      ]
    }
  }
}
Tested on ES 5.3; you can use the Explain API to check whether this also works in your version of Elasticsearch.
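The same nested structure can be generated in Python from a list of domains. A sketch with a hypothetical helper name; minimum_should_match is set explicitly to 1 so that at least one domain has to match regardless of how the Elasticsearch version treats should-only bool queries in filter context:

```python
def build_domain_filter_query(domains):
    """Unscored query: exclude retweets, keep docs matching ANY of the domains."""
    return {
        "_source": ["entities"],
        "query": {
            "bool": {
                "must_not": [{"exists": {"field": "retweeted_status"}}],
                # Clauses directly under "filter" are ANDed, so the OR
                # goes into a nested bool with "should" clauses
                "filter": [{
                    "bool": {
                        "should": [
                            {"match": {"entities.urls.display_url": d}}
                            for d in domains
                        ],
                        "minimum_should_match": 1,
                    }
                }],
            }
        },
    }


body = build_domain_filter_query(["blabla.com", "blibli.com"])
```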

Elasticsearch match multiple fields

I recently started using Elasticsearch in a website. The scenario is that I have to search for a string in a field. So, if the field is named title, then my search query was:
"query" :{"match": {"title": my_query_string}}.
But now I need to add another field to it, let's say category. So I need to find the matches of my string that have category: some_category and title: my_query_string. I tried multi_match, but it does not give me the result I am looking for. I am looking into query filters now. But is there a way to add two fields with such criteria to my match query?
GET indice/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "title"
          }
        },
        {
          "match": {
            "category": "category"
          }
        }
      ]
    }
  }
}
Replace should with must if both fields have to match.
Ok, so I think that what you need is something like this:
"query": {
  "filtered": {
    "query": {
      "match": {
        "title": YOUR_QUERY_STRING
      }
    },
    "filter": {
      "term": {
        "category": YOUR_CATEGORY
      }
    }
  }
}
If your category field is analyzed, then you will need to use match instead of term in the filter.
"query": {
  "filtered": {
    "query": {
      "bool": {
        "should": [
          {"match": {"title": "bold title"}},
          {"match": {"body": "nice body"}}
        ]
      }
    },
    "filter": {
      "term": {
        "category": "xxx"
      }
    }
  }
}
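The filtered query used in the two answers above was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions the same idea is written as a bool query with a must clause (scored) and a filter clause (unscored). A sketch in Python, with a hypothetical helper name and the term/match choice from the note above:

```python
def build_title_category_query(title_query, category, category_is_analyzed=False):
    """Scored match on title, unscored yes/no condition on category."""
    if category_is_analyzed:
        # Analyzed (text) field: match runs the input through the analyzer
        category_clause = {"match": {"category": category}}
    else:
        # Exact (keyword/not-analyzed) field: term compares the value as-is
        category_clause = {"term": {"category": category}}
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": title_query}}],  # contributes to score
                "filter": [category_clause],                  # no scoring
            }
        }
    }


body = build_title_category_query("my query string", "some_category")
```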

Elastic Search Function_Score Query with Query_String

I was searching with Elasticsearch using this code:
es.search(index="article-index", fields="url", body={
    "query": {
        "query_string": {
            "query": "keywordstr",
            "fields": [
                "text",
                "title",
                "tags",
                "domain"
            ]
        }
    }
})
Now I want to add another parameter to the search scoring: "recencyboost".
I was told function_score should solve the problem:
res = es.search(index="article-index", fields="url", body={
    "query": {
        "function_score": {
            "functions": {
                "DECAY_FUNCTION": {
                    "recencyboost": {
                        "origin": "0",
                        "scale": "20"
                    }
                }
            },
            "query": {
                {
                    "query_string": {
                        "query": keywordstr
                    }
                }
            },
            "score_mode": "multiply"
        }
    }
})
It gives me an error that the dictionary {"query_string": {"query": keywordstr}} is not hashable.
1) How can I fix the error?
2) How can I change the decay function so that it gives higher weight to a higher recency boost?
You appear to have an extra query in your search (giving a total of three), which is giving you an unwanted top level. You need to remove the top-level query and replace it with function_score as the top-level key.
res = es.search(index="article-index", fields="url", body={"function_score": {
    "query": {
        { "query_string": {"query": keywordstr} }
    },
    "functions": {
        "DECAY_FUNCTION": {
            "recencyboost": {
                "origin": "0",
                "scale": "20"
            }
        }
    },
    "score_mode": "multiply"
}})
Note: score_mode defaults to "multiply", as does the unused boost_mode, so it should be unnecessary to supply it.
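You can't use a dictionary as a dictionary key or as a set element, because dictionaries are not hashable. Wrapping a dictionary in an extra pair of braces creates a set literal containing it, which is what you are doing in the following segment of the code:
"query": {
    {"query_string": {"query": keywordstr}}
},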
The following should work fine:
"query": {
    "query_string": {"query": keywordstr}
},
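Putting the fixes together, a working Python body keeps "query" at the top level, puts function_score inside it, and passes "functions" as a list of function objects, with the DECAY_FUNCTION placeholder replaced by one of gauss, exp or linear. Decay functions give the highest score to values at origin and lower scores further away, so to favor documents with a high recencyboost one option is to set origin to the highest value you expect. A sketch, where the helper name and the origin of 100 and scale of 20 are assumptions for illustration:

```python
def build_recency_query(keywordstr, origin=100, scale=20,
                        fields=("text", "title", "tags", "domain")):
    """query_string relevance multiplied by an exp decay on 'recencyboost'."""
    return {
        "query": {
            "function_score": {
                # A plain dict here, not wrapped in extra braces (a set literal)
                "query": {
                    "query_string": {"query": keywordstr, "fields": list(fields)}
                },
                "functions": [
                    {
                        # 1.0 at origin, decaying as recencyboost moves away from it
                        "exp": {"recencyboost": {"origin": origin, "scale": scale}}
                    }
                ],
                "score_mode": "multiply",
            }
        }
    }
```

Usage would be along the lines of res = es.search(index="article-index", body=build_recency_query(keywordstr)).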
Use it like this:
query: {
  function_score: {
    query: {
      filtered: {
        query: {
          bool: {
            must: [
              {
                query_string: {
                  query: shop_search,
                  fields: ['shop_name'],
                  boost: 2.0
                }
              },
              {
                query_string: {
                  query: shop_search,
                  fields: ['shop_name'],
                  boost: 3.0
                }
              }
            ]
          }
        },
        filter: {
          // { term: { search_city: }}
        }
      }
    },
    exp: {
      location: {
        origin: { lat: 12.8748964, lon: 77.6413239 },
        scale: "10000m",
        offset: "0m",
        decay: "0.5"
      }
    }
    // score_mode: "sum"
  }
}
