elasticsearch + python substring search in array of strings - python

I have 10k+ records in elastic search. one of the fields(dept) holds data in form of array
eg records are
{
"username": "tom",
"dept": [
"cust_service",
"sales_rpr",
"store_in",
],
"location": "NY"
}
{
"username": "adam",
"dept": [
"cust_opr",
"floor_in",
"mg_cust_opr",
],
"location": "MA"
}
.
.
.
I want to do autocomplete on dept field, if user search for cus it should return
["cust_service", "cust_opr", "mg_cust_opr"]
With best match at the top
I have made the query
query = {
"_source": [],
"size": 0,
"min_score": 0.5,
"query": {
"bool": {
"must": [
{
"wildcard": {
"dept": {
"value": "*cus*"
}
}
}
],
"filter": [],
"should": [],
"must_not": []
}
},
"aggs": {
"auto_complete": {
"terms": {
"field": f"dept.raw",
"size": 20,
"order": {"max_score": 'desc'}
},
"aggs": {
"max_score": {
"avg": {"script": "_score"}
}
}
}
}
}
It is not giving ["cust_service", "cust_opr", "mg_cust_opr"] instead gives other answers which are irrelevant to search key(cus). but when field is just string instead of array it is giving the result as expected.
How do i solve this problem?
Thanks in advance!

Related

$elemMatch to fetch specific values inside array

I have a collection named 'attendance' that has an array:
[
{
"faculty": "20XX-XXXXX-XX-1",
"sections": [
{
"section": "XXXX 3-1",
"date": "04-11-2022",
"attendance": [
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
}
]
},
{
"section": "XXXX 3-2",
"date": "04-11-2022",
"attendance": [
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
}
]
}
]
}
]
I have been trying to query the values of the specific element in my array using $and and $elemMatch in:
db.attendance.find({$and:[{faculty:"20XX-XXXXX-XX-1"},{sections:{$elemMatch:{section:"XXXX 3-1",date:"04-11-2022"}}}]});
But it still prints the other section rather than one. I want to output to be:
{
"faculty": "20XX-XXXXX-XX-1",
"sections": [
{
"section": "XXXX 3-1",
"date": "04-11-2022",
"attendance": [
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
}
]
}
And I tried using the dot notation like:
db.attendance.find({"sections.section":"XXXX 3-1", "sections.date":"04-11-2022});
Still no luck. I'm not sure if what I'm doing is right or not. Thanks in advance!
Option 1: find/elemMatch-> You will need to add the $elemMatch also to the project section of the find query as follow:
db.collection.find({
"faculty": "20XX-XXXXX-XX-1",
sections: {
$elemMatch: {
section: "XXXX 3-1",
date: "04-11-2022"
}
}
},
{
sections: {
$elemMatch: {
section: "XXXX 3-1",
date: "04-11-2022"
}
}
})
Explained:
Find query has the following syntax:db.collection.find({query},{project})
Adding the project section allow you to filter the expected output.
playground option 1
Option 2: Via aggregation/$filter:
db.collection.aggregate([
{
"$addFields": {
"sections": {
"$filter": {
"input": "$sections",
"as": "s",
"cond": {
$and: [
{
$eq: [
"$$s.section",
"XXXX 3-1"
]
},
{
$eq: [
"$$s.date",
"04-11-2022"
]
}
]
}
}
}
}
}
])
Explaned:
Replace the original sections array with new ones where the array elements are filtered based on the provided criteria.
playground option 2

Elastic Search - Multiple index with Field map

I need to search multiple indices along with its field maps.
For example I want to query a string, in
field1 with index1
field2 with index2
from elasticsearch import Elasticsearch
es = Elasticsearch([eshost])
req_string = {
"size":1000,
"query": {
"query_string": {
"query": "string to be searched",
"fields": ["field1","field2"],
}
}
}
res = es.search(index='index1,index2', body=req_string)
Is it possible to do it ?
If yes please guide with some links. Thanks in Advance !
You can use _index field, when querying across multiple indexes.
The _index field allows matching on the index a document was indexed
into. Its value is accessible in certain queries and aggregations, and
when sorting or scripting
Adding a working example with index data,search query, and search result.
Index Data:
PUT/ index1/_doc/1
{
"name": "Hello"
}
PUT/ index2/_doc/1
{
"name": "Hello World"
}
Search Query:
{
"query": {
"bool": {
"filter": [
{
"terms": {
"_index": [
"index1",
"index2"
]
}
}
],
"must": [
{
"simple_query_string": {
"query": "hello",
"fields": [
"name",
"title"
]
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "index2",
"_type": "_doc",
"_id": "1",
"_score": 0.4700036,
"_source": {
"name": "Hello World"
}
},
{
"_index": "index1",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"title": "Hello"
}
}
]
Updated Search Query:
The below search query will search for title field only in index1 and name field only in index2
{
"query": {
"bool": {
"should": [
{
"bool": {
"filter": [
{
"terms": {
"_index": [
"index1"
]
}
}
],
"must": [
{
"query_string": {
"query": "hello",
"fields": [
"title"
]
}
}
]
}
},
{
"bool": {
"filter": [
{
"terms": {
"_index": [
"index2"
]
}
}
],
"must": [
{
"query_string": {
"query": "hello",
"fields": [
"name"
]
}
}
]
}
}
]
}
}
}

ElasticSearch: Retrieve field and it's normalization

I want to retrieve a field as well as it's normalized version from Elasticsearch.
Here's my index definition and data
PUT normalizersample
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"refresh_interval": "60s",
"analysis": {
"normalizer": {
"my_normalizer": {
"filter": [
"lowercase",
"german_normalization",
"asciifolding"
],
"type": "custom"
}
}
}
},
"mappings": {
"_source": {
"enabled": true
},
"properties": {
"myField": {
"type": "text",
"store": true,
"fields": {
"keyword": {
"type": "keyword",
"store": true
},
"normalized": {
"type": "keyword",
"store": true,
"normalizer": "my_normalizer"
}
}
}
}
}
}
POST normalizersample/_doc/1
{
"myField": ["Andreas", "Ämdreas", "Anders"]
}
My first approach was to use scripted fields like
GET /myIndex/_search
{
"size": 100,
"query": {
"match_all": {}
},
"script_fields": {
"keyword": {
"script": "doc['myField.keyword']"
},
"normalized": {
"script": "doc['myField.normalized']"
}
}
}
However, since myField is an array, this returns two lists of strings per ES document and each of them are sorted alphabetically. Hence, the corresponding entries might not match to each other due to the normalization.
"hits" : [
{
"_index" : "normalizersample",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"de" : [
"amdreas",
"anders",
"andreas"
],
"keyword" : [
"Anders",
"Andreas",
"Ämdreas"
]
}
}
]
While I would like to retrieve [(Andreas, andreas), (Ämdreas, amdreas) (Anders, anders)] or a similar format where I can match every entry to its normalization.
The only way I found was to call Term Vectors on both fields since they contain a position field, but this seems like a huge overhead to me. (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html)
Is there a simpler way to retrieve tuples with the keyword and the normalized field?
Thanks a lot!

Filter MongoDB query to find documents only if a field in a list of objects is not empty

I have a MongoDB document structure like following:
Structure
{
"stores": [
{
"items": [
{
"feedback": [],
"item_category": "101",
"item_id": "10"
},
{
"feedback": [],
"item_category": "101",
"item_id": "11"
}
]
},
{
"items": [
{
"feedback": [],
"item_category": "101",
"item_id": "10"
},
{
"feedback": ["A feedback"],
"item_category": "101",
"item_id": "11"
},
{
"feedback": [],
"item_category": "101",
"item_id": "12"
},
{
"feedback": [],
"item_category": "102",
"item_id": "13"
},
{
"feedback": [],
"item_category": "102",
"item_id": "14"
}
],
"store_id": 500
}
]
}
This is a single document in a collection. Some field are deleted to produce minimal representation of the data.
What I want is to get items only if the feedback field in the items array is not empty. The expected result is:
Expected result
{
"stores": [
{
"items": [
{
"feedback": ["A feedback"],
"item_category": "101",
"item_id": "11"
}
],
"store_id": 500
}
]
}
This is what I tried based on examples in this, which I think pretty same situation, but it didn't work. What's wrong with my query, isn't it the same situation in zipcode search example in the link? It returns everything like in the first JSON code, Structure:
What I tried
query = {
'date': {'$gte': since, '$lte': until},
'stores.items': {"$elemMatch": {"feedback": {"$ne": []}}}
}
Thanks.
Please try this :
db.yourCollectionName.aggregate([
{ $match: { 'date': { '$gte': since, '$lte': until }, 'stores.items': { "$elemMatch": { "feedback": { "$ne": [] } } } } },
{ $unwind: '$stores' },
{ $match: { 'stores.items': { "$elemMatch": { "feedback": { "$ne": [] } } } } },
{ $unwind: '$stores.items' },
{ $match: { 'stores.items.feedback': { "$ne": [] } } },
{ $group: { _id: { _id: '$_id', store_id: '$stores.store_id' }, items: { $push: '$stores.items' } } },
{ $project: { _id: '$_id._id', store_id: '$_id.store_id', items: 1 } },
{ $group: { _id: '$_id', stores: { $push: '$$ROOT' } } },
{ $project: { 'stores._id': 0 } }
])
We've all these stages as you need to operate on an array of arrays, this query is written assuming you're dealing with a large set of data, Since you're filtering on dates just in case if your documents size is way less after first $match then you can avoid following $match stage which is in between two $unwind's.
Ref 's :
$match,
$unwind,
$project,
$group
This aggregate query gets the needed result (using the provided sample document and run from the mongo shell):
db.stores.aggregate( [
{ $unwind: "$stores" },
{ $unwind: "$stores.items" },
{ $addFields: { feedbackExists: { $gt: [ { $size: "$stores.items.feedback" }, 0 ] } } },
{ $match: { feedbackExists: true } },
{ $project: { _id: 0, feedbackExists: 0 } }
] )

Unhashable type 'dict' when trying to send an Elasticsearch

I keep on getting the following error in Python
Exception has occurred: TypeError unhashable type: 'dict'
on line 92
"should": [],
"must_not": []
This is the query string
res = es.search(
scroll = '2m',
index = "logstash-*",
body = {
{
"aggs": {
"2": {
"terms": {
"field": "src_ip.keyword",
"size": 50,
"order": {
"1": "desc"
}
},
"aggs": {
"1": {
"cardinality": {
"field": "src_ip.keyword"
}
}
}
}
},
"size": 0,
"_source": {
"excludes": []
},
"stored_fields": [
"*"
],
"script_fields": {},
"docvalue_fields": [
{
"field": "#timestamp",
"format": "date_time"
},
{
"field": "flow.start",
"format": "date_time"
},
{
"field": "timestamp",
"format": "date_time"
},
{
"field": "tls.notafter",
"format": "date_time"
},
{
"field": "tls.notbefore",
"format": "date_time"
}
],
"query": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"gte": 1555777931992,
"lte": 1558369931992,
"format": "epoch_millis"
}
}
}
],
"filter": [
{
"match_all": {}
}
],
"should": [],
"must_not": []
}
}
}
})
the value of body is a set ({ } without key-value is a set literal, e.g., {1,2} is a set). Inside this set you have a dictionary.
Items in a set have to be hashable, and dictionary isn't.
As the comment from #Carcigenicate says, it seems like a typo of having {{ }} instead of { } for the value of body.
Elasticsearch documentation shows that body should be a dictionary.
More about sets from python docs

Categories