I want to retrieve a field as well as its normalized version from Elasticsearch.
Here's my index definition and data:
PUT normalizersample
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"refresh_interval": "60s",
"analysis": {
"normalizer": {
"my_normalizer": {
"filter": [
"lowercase",
"german_normalization",
"asciifolding"
],
"type": "custom"
}
}
}
},
"mappings": {
"_source": {
"enabled": true
},
"properties": {
"myField": {
"type": "text",
"store": true,
"fields": {
"keyword": {
"type": "keyword",
"store": true
},
"normalized": {
"type": "keyword",
"store": true,
"normalizer": "my_normalizer"
}
}
}
}
}
}
POST normalizersample/_doc/1
{
"myField": ["Andreas", "Ämdreas", "Anders"]
}
My first approach was to use script fields like this:
GET /normalizersample/_search
{
"size": 100,
"query": {
"match_all": {}
},
"script_fields": {
"keyword": {
"script": "doc['myField.keyword']"
},
"normalized": {
"script": "doc['myField.normalized']"
}
}
}
However, since myField is an array, this returns two lists of strings per ES document, and each list is sorted alphabetically on its own. Because of the normalization, corresponding entries can therefore end up at different positions in the two lists.
"hits" : [
{
"_index" : "normalizersample",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"de" : [
"amdreas",
"anders",
"andreas"
],
"keyword" : [
"Anders",
"Andreas",
"Ämdreas"
]
}
}
]
What I would like to retrieve instead is [(Andreas, andreas), (Ämdreas, amdreas), (Anders, anders)], or a similar format where I can match every entry to its normalization.
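To illustrate the mismatch, here is a minimal plain-Python sketch. It assumes a stand-in normalizer (lowercase plus diacritic stripping via unicodedata), which only approximates my_normalizer for these three sample values and is not Elasticsearch's implementation. The names simple_normalize, mismatched and aligned are made up for the example.

```python
import unicodedata


def simple_normalize(value: str) -> str:
    """Stand-in normalizer (assumption): lowercase, then strip diacritics."""
    decomposed = unicodedata.normalize("NFKD", value.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Values in the order they appear in _source.
source_values = ["Andreas", "Ämdreas", "Anders"]

# Each multi-valued keyword field is returned sorted on its own.
sorted_keyword = sorted(source_values)
sorted_normalized = sorted(simple_normalize(v) for v in source_values)

# Zipping the two independently sorted lists pairs unrelated entries.
mismatched = list(zip(sorted_keyword, sorted_normalized))

# Normalizing each value in source order keeps every pair aligned.
aligned = [(v, simple_normalize(v)) for v in source_values]

print(mismatched)
print(aligned)
```

The aligned list has the shape I am after, but it only works if the normalization can be reproduced outside Elasticsearch, which is exactly what I would like to avoid.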
The only way I found was to call Term Vectors on both fields since they contain a position field, but this seems like a huge overhead to me. (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html)
Is there a simpler way to retrieve tuples with the keyword and the normalized field?
Thanks a lot!
I have a collection named 'attendance' that has an array:
[
{
"faculty": "20XX-XXXXX-XX-1",
"sections": [
{
"section": "XXXX 3-1",
"date": "04-11-2022",
"attendance": [
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
}
]
},
{
"section": "XXXX 3-2",
"date": "04-11-2022",
"attendance": [
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
}
]
}
]
}
]
I have been trying to query the values of a specific element in my array using $and and $elemMatch:
db.attendance.find({$and:[{faculty:"20XX-XXXXX-XX-1"},{sections:{$elemMatch:{section:"XXXX 3-1",date:"04-11-2022"}}}]});
But it still returns the other section as well, rather than only the matching one. I want the output to be:
{
"faculty": "20XX-XXXXX-XX-1",
"sections": [
{
"section": "XXXX 3-1",
"date": "04-11-2022",
"attendance": [
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
},
{
"number": "XXXXX",
"status": "Present"
}
]
}
]
}
I also tried using dot notation:
db.attendance.find({"sections.section":"XXXX 3-1", "sections.date":"04-11-2022"});
Still no luck. I'm not sure if what I'm doing is right or not. Thanks in advance!
Option 1: find/$elemMatch. You also need to add the $elemMatch to the projection part of the find query, as follows:
db.collection.find({
"faculty": "20XX-XXXXX-XX-1",
sections: {
$elemMatch: {
section: "XXXX 3-1",
date: "04-11-2022"
}
}
},
{
sections: {
$elemMatch: {
section: "XXXX 3-1",
date: "04-11-2022"
}
}
})
Explained:
The find query has the following syntax: db.collection.find({query},{projection})
Adding the projection part lets you filter what is returned for each matching document.
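To make the effect of the projection concrete, here is a small plain-Python emulation of what an $elemMatch projection does to one matched document. It is only a sketch, not PyMongo code: the helper name elem_match_projection, the _id value and the shortened attendance list are assumptions made for the example.

```python
def elem_match_projection(doc: dict, field: str, criteria: dict) -> dict:
    """Emulate the projection {field: {$elemMatch: criteria}} on one document."""
    # Only the FIRST array element that satisfies all criteria is kept.
    first_match = next(
        (
            element
            for element in doc.get(field, [])
            if all(element.get(key) == value for key, value in criteria.items())
        ),
        None,
    )
    # Like an inclusion projection, only _id and the projected field survive.
    projected = {}
    if "_id" in doc:
        projected["_id"] = doc["_id"]
    if first_match is not None:
        projected[field] = [first_match]
    return projected


# Hypothetical, shortened version of the sample document.
doc = {
    "_id": 1,
    "faculty": "20XX-XXXXX-XX-1",
    "sections": [
        {"section": "XXXX 3-1", "date": "04-11-2022",
         "attendance": [{"number": "XXXXX", "status": "Present"}]},
        {"section": "XXXX 3-2", "date": "04-11-2022",
         "attendance": [{"number": "XXXXX", "status": "Present"}]},
    ],
}

result = elem_match_projection(
    doc, "sections", {"section": "XXXX 3-1", "date": "04-11-2022"}
)
print(result)
```

Note that in this emulation, as with an inclusion projection, faculty is not part of the result. If you need it in the output, add faculty: 1 to the projection as well.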
Option 2: Via aggregation/$filter:
db.collection.aggregate([
{
"$addFields": {
"sections": {
"$filter": {
"input": "$sections",
"as": "s",
"cond": {
$and: [
{
$eq: [
"$$s.section",
"XXXX 3-1"
]
},
{
$eq: [
"$$s.date",
"04-11-2022"
]
}
]
}
}
}
}
}
])
Explained:
The original sections array is replaced with a new one that contains only the elements matching the provided criteria.
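As a rough illustration, the $addFields/$filter stage behaves like the following plain-Python sketch on a single document. The function name filter_sections and the shortened sample document are assumptions for the example; this is an emulation, not something that runs against MongoDB.

```python
def filter_sections(doc: dict, section: str, date: str) -> dict:
    """Emulate $addFields + $filter: keep every field, overwrite 'sections'."""
    kept = [
        s
        for s in doc["sections"]
        if s.get("section") == section and s.get("date") == date
    ]
    # All other top-level fields (faculty, _id, ...) are carried over unchanged.
    return {**doc, "sections": kept}


# Hypothetical, shortened version of the sample document.
doc = {
    "faculty": "20XX-XXXXX-XX-1",
    "sections": [
        {"section": "XXXX 3-1", "date": "04-11-2022", "attendance": []},
        {"section": "XXXX 3-2", "date": "04-11-2022", "attendance": []},
    ],
}

filtered = filter_sections(doc, "XXXX 3-1", "04-11-2022")
print(filtered)
```

Unlike the $elemMatch projection, this keeps all matching elements (not just the first one) and leaves the other fields of the document in place.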
I have 10k+ records in Elasticsearch. One of the fields (dept) holds its data as an array.
Example records:
{
"username": "tom",
"dept": [
"cust_service",
"sales_rpr",
"store_in"
],
"location": "NY"
}
{
"username": "adam",
"dept": [
"cust_opr",
"floor_in",
"mg_cust_opr"
],
"location": "MA"
}
.
.
.
I want to do autocomplete on the dept field. If the user searches for cus, it should return
["cust_service", "cust_opr", "mg_cust_opr"]
with the best match at the top.
I have written this query:
query = {
"_source": [],
"size": 0,
"min_score": 0.5,
"query": {
"bool": {
"must": [
{
"wildcard": {
"dept": {
"value": "*cus*"
}
}
}
],
"filter": [],
"should": [],
"must_not": []
}
},
"aggs": {
"auto_complete": {
"terms": {
"field": f"dept.raw",
"size": 20,
"order": {"max_score": 'desc'}
},
"aggs": {
"max_score": {
"avg": {"script": "_score"}
}
}
}
}
}
It does not return ["cust_service", "cust_opr", "mg_cust_opr"]. Instead it returns other values that are irrelevant to the search key (cus). When the field is a plain string instead of an array, however, it returns the expected result.
How do I solve this problem?
Thanks in advance!
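To show what I think is happening, here is a plain-Python emulation of the behaviour (a sketch, not Elasticsearch code). The wildcard query selects whole documents, and the terms aggregation then buckets every dept value of those documents, including the ones that do not contain the search key. The ranking in suggestions (earlier position of the match first) is just my assumption of "best match".

```python
from collections import Counter

records = [
    {"username": "tom", "dept": ["cust_service", "sales_rpr", "store_in"], "location": "NY"},
    {"username": "adam", "dept": ["cust_opr", "floor_in", "mg_cust_opr"], "location": "MA"},
]
term = "cus"

# The wildcard query matches a document if ANY of its dept values contains the term.
matching_docs = [r for r in records if any(term in d for d in r["dept"])]

# The terms aggregation then counts ALL dept values of the matching documents.
all_buckets = Counter(d for r in matching_docs for d in r["dept"])

# Keeping only the values that contain the term gives the intended suggestions.
suggestions = sorted(
    (d for d in all_buckets if term in d),
    key=lambda d: (d.find(term), d),  # assumed ranking: earliest match first
)

print(sorted(all_buckets))
print(suggestions)
```

If this is right, the buckets themselves would have to be restricted, for example with the include parameter of the terms aggregation and a regular expression such as ".*cus.*". I have not verified that against my index.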
Generate unique id in nested document - PyMongo
My database looks like this:
{
"_id":"5ea661d6213894a6082af6d1",
"blog_id":"blog_one",
"comments": [
{
"user_id":"1",
"comment":"comment for blog one this is good"
},
{
"user_id":"2",
"comment":"other for blog one"
}
]
}
I want to add a unique id to each and every comment.
I want the output to look like this:
{
"_id":"5ea661d6213894a6082af6d1",
"blog_id":"blog_one",
"comments": [
{
"id" : "something" (auto generate unique),
"user_id":"1",
"comment":"comment for blog one this is good"
},
{
"id" : "something" (auto generate unique),
"user_id":"2",
"comment":"other for blog one"
}
]
}
I'm using PyMongo. Is there a way to update this kind of document?
Is it possible or not?
This update adds a unique id value to each nested document in the comments array. The id value is calculated from the current time in milliseconds and is incremented by one for each array element, so every nested document in the array gets its own id.
The code runs with MongoDB version 4.2 and PyMongo 3.10.
import datetime

pipeline = [
{
"$set": {
"comments": {
"$map": {
"input": { "$range": [ 0, { "$size": "$comments" } ] },
"in": {
"$mergeObjects": [
{ "id": { "$add": [ { "$toLong" : datetime.datetime.now() }, "$$this" ] } },
{ "$arrayElemAt": [ "$comments", "$$this" ] }
]
}
}
}
}
}
]
collection.update_one( { }, pipeline )
The updated document:
{
"_id" : "5ea661d6213894a6082af6d1",
"blog_id" : "blog_one",
"comments" : [
{
"id" : NumberLong("1588179349566"),
"user_id" : "1",
"comment" : "comment for blog one this is good"
},
{
"id" : NumberLong("1588179349567"),
"user_id" : "2",
"comment" : "other for blog one"
}
]
}
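The id arithmetic used by the pipeline can be mirrored in plain Python, which may make the result easier to follow. This is only a sketch of the calculation, not the update itself; the function name assign_sequential_ids and the base_ms parameter are made up for the example.

```python
import time


def assign_sequential_ids(comments, base_ms=None):
    """Give each comment the id base_ms + its array index."""
    if base_ms is None:
        # Current time in milliseconds, like $toLong applied to a date.
        base_ms = int(time.time() * 1000)
    # Same merge order as $mergeObjects: the new id first, then the comment fields.
    return [{"id": base_ms + index, **comment} for index, comment in enumerate(comments)]


comments = [
    {"user_id": "1", "comment": "comment for blog one this is good"},
    {"user_id": "2", "comment": "other for blog one"},
]

# Using the base value from the updated document above.
with_ids = assign_sequential_ids(comments, base_ms=1588179349566)
print(with_ids)
```

Since datetime.datetime.now() is evaluated once in Python when the pipeline is built, every document updated with the same pipeline object starts from the same base value, so the ids are unique within one document but not necessarily across documents.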
[ EDIT ADD ]
The following works from the mongo shell. It adds a unique id to the nested documents of the comments array; the ids are unique across documents.
db.blogs.aggregate( [
{
"$unwind": "$comments" },
{
"$group": {
"_id": null,
"count": { "$sum": 1 },
"docs": { "$push": "$$ROOT" },
"now": { $first: "$$NOW" }
}
},
{
"$addFields": {
"docs": {
"$map": {
"input": { "$range": [ 0, "$count" ] },
"in": {
"$mergeObjects": [
{ "comments_id": { "$add": [ { "$toLong" : "$now" }, "$$this" ] } },
{ "$arrayElemAt": [ "$docs", "$$this" ] }
]
}
}
}
}
},
{
"$unwind": "$docs"
},
{
"$addFields": {
"docs.comments.comments_id": "$docs.comments_id"
}
},
{
"$replaceRoot": { "newRoot": "$docs" }
},
{
"$group": {
"_id": { "_id": "$_id", "blog_id": "$blog_id" },
"comments": { "$push": "$comments" }
}
},
{
$project: {
"_id": "$_id._id",
"blog_id": "$_id.blog_id",
"comments": 1
}
}
] ).forEach(doc => db.blogs.updateOne( { _id: doc._id }, { $set: { comments: doc.comments } } ) )
You can use the ObjectId constructor to create the ids and place them in your nested documents.
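A minimal sketch of that idea in Python follows. To keep it runnable without a database driver, the id factory defaults to uuid4 hex strings, which is a stand-in and not ObjectId; the helper name add_comment_ids and its make_id parameter are hypothetical. With PyMongo installed you could pass make_id=ObjectId (from the bson package) instead.

```python
import uuid
from copy import deepcopy


def add_comment_ids(doc, make_id=lambda: uuid.uuid4().hex):
    """Return a copy of doc in which every comment without an id gets one."""
    updated = deepcopy(doc)
    for comment in updated.get("comments", []):
        if "id" not in comment:
            # make_id is called once per comment, so each id is generated separately.
            comment["id"] = make_id()
    return updated


blog = {
    "_id": "5ea661d6213894a6082af6d1",
    "blog_id": "blog_one",
    "comments": [
        {"user_id": "1", "comment": "comment for blog one this is good"},
        {"user_id": "2", "comment": "other for blog one"},
    ],
}

blog_with_ids = add_comment_ids(blog)
print(blog_with_ids["comments"])
```

The modified comments array would then be written back, for example with collection.update_one({"_id": blog["_id"]}, {"$set": {"comments": blog_with_ids["comments"]}}).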
I have a MongoDB document structure like following:
Structure
{
"stores": [
{
"items": [
{
"feedback": [],
"item_category": "101",
"item_id": "10"
},
{
"feedback": [],
"item_category": "101",
"item_id": "11"
}
]
},
{
"items": [
{
"feedback": [],
"item_category": "101",
"item_id": "10"
},
{
"feedback": ["A feedback"],
"item_category": "101",
"item_id": "11"
},
{
"feedback": [],
"item_category": "101",
"item_id": "12"
},
{
"feedback": [],
"item_category": "102",
"item_id": "13"
},
{
"feedback": [],
"item_category": "102",
"item_id": "14"
}
],
"store_id": 500
}
]
}
This is a single document in a collection. Some fields were removed to produce a minimal representation of the data.
What I want is to get items only if the feedback field in the items array is not empty. The expected result is:
Expected result
{
"stores": [
{
"items": [
{
"feedback": ["A feedback"],
"item_category": "101",
"item_id": "11"
}
],
"store_id": 500
}
]
}
This is what I tried, based on the examples in this link, which I think cover pretty much the same situation, but it didn't work. What's wrong with my query? Isn't it the same situation as the zipcode search example in the link? It returns everything, as in the first JSON block (Structure):
What I tried
query = {
'date': {'$gte': since, '$lte': until},
'stores.items': {"$elemMatch": {"feedback": {"$ne": []}}}
}
Thanks.
Please try this:
db.yourCollectionName.aggregate([
{ $match: { 'date': { '$gte': since, '$lte': until }, 'stores.items': { "$elemMatch": { "feedback": { "$ne": [] } } } } },
{ $unwind: '$stores' },
{ $match: { 'stores.items': { "$elemMatch": { "feedback": { "$ne": [] } } } } },
{ $unwind: '$stores.items' },
{ $match: { 'stores.items.feedback': { "$ne": [] } } },
{ $group: { _id: { _id: '$_id', store_id: '$stores.store_id' }, items: { $push: '$stores.items' } } },
{ $project: { _id: '$_id._id', store_id: '$_id.store_id', items: 1 } },
{ $group: { _id: '$_id', stores: { $push: '$$ROOT' } } },
{ $project: { 'stores._id': 0 } }
])
All of these stages are needed because you are operating on an array of arrays. The query is written on the assumption that you are dealing with a large data set. Since you are already filtering on dates, if the number of documents is much smaller after the first $match, you can drop the second $match stage, the one between the two $unwind stages.
Refs:
$match,
$unwind,
$project,
$group
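For a single document, the net effect of the pipeline above can be pictured with this plain-Python emulation. It is a sketch under assumptions: the function name stores_with_feedback is made up, and unlike the $group stage it keeps every store-level field rather than only store_id.

```python
def stores_with_feedback(doc: dict) -> dict:
    """Keep only items with non-empty feedback, and only stores that still have items."""
    stores = []
    for store in doc.get("stores", []):
        # Inner filter: the equivalent of unwinding items and matching feedback != [].
        items = [item for item in store.get("items", []) if item.get("feedback")]
        if items:
            # Regroup: the store is rebuilt with just its remaining items.
            stores.append({**store, "items": items})
    return {**doc, "stores": stores}


# Shortened version of the sample document from the question.
doc = {
    "stores": [
        {"items": [{"feedback": [], "item_category": "101", "item_id": "10"}]},
        {
            "items": [
                {"feedback": [], "item_category": "101", "item_id": "10"},
                {"feedback": ["A feedback"], "item_category": "101", "item_id": "11"},
            ],
            "store_id": 500,
        },
    ]
}

print(stores_with_feedback(doc))
```

This also shows why the original find query returns everything: a query on 'stores.items' selects whole documents, it does not remove non-matching elements from the nested arrays.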
This aggregate query gets the needed result (using the provided sample document and run from the mongo shell):
db.stores.aggregate( [
{ $unwind: "$stores" },
{ $unwind: "$stores.items" },
{ $addFields: { feedbackExists: { $gt: [ { $size: "$stores.items.feedback" }, 0 ] } } },
{ $match: { feedbackExists: true } },
{ $project: { _id: 0, feedbackExists: 0 } }
] )
I have an Elasticsearch index with documents like the one below:
"_index":"test",
"_type":"abc",
"_source":{
"file_name":"xyz.ex",
"metadata":{
"format":".ex",
"profile":[
{
"date_value" : "2018-05-30T00:00:00",
"key_id" : "1",
"type" : "date",
"value" : [ "30-05-2018" ]
},
{
"key_id" : "2",
"type" : "freetext",
"value" : [ "New york" ]
}
]
}
}
Now I need to search for documents by matching a key_id to its value. (key_id identifies a field whose value is stored in "value".)
E.g. for the field with key_id = '1', if its value is "30-05-2018", the query should match the above document.
I tried mapping this as a nested object, but I am not able to write a query that matches two or more key_ids to their respective values.
This is how I would do it. You need to AND together, via bool/filter (or bool/must), one nested query per condition pair, since you want to match two different nested elements of the same parent document.
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "metadata.profile",
"query": {
"bool": {
"filter": [
{
"term": {
"metadata.profile.f1": "a"
}
},
{
"term": {
"metadata.profile.f2": true
}
}
]
}
}
}
},
{
"nested": {
"path": "metadata.profile",
"query": {
"bool": {
"filter": [
{
"term": {
"metadata.profile.f1": "b"
}
},
{
"term": {
"metadata.profile.f2": false
}
}
]
}
}
}
}
]
}
}
}
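Applied to the fields from the question, the same structure can be generated for any number of pairs. The following Python helper is a sketch under assumptions: the name build_profile_query is made up, and it assumes metadata.profile is mapped as nested with key_id and value indexed as keyword so that term matches the exact strings.

```python
import json


def build_profile_query(pairs):
    """Build one nested clause per (key_id, value) pair, ANDed by the outer bool/filter."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "nested": {
                            "path": "metadata.profile",
                            "query": {
                                "bool": {
                                    # Both terms must hold for the SAME nested element.
                                    "filter": [
                                        {"term": {"metadata.profile.key_id": key_id}},
                                        {"term": {"metadata.profile.value": value}},
                                    ]
                                }
                            },
                        }
                    }
                    for key_id, value in pairs
                ]
            }
        }
    }


query = build_profile_query([("1", "30-05-2018"), ("2", "New york")])
print(json.dumps(query, indent=2))
```

Each pair gets its own nested clause because a single nested query only ever looks at one profile element at a time.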