I have an Elasticsearch index with documents like the one below:
"_index":"test",
"_type":"abc",
"_source":{
"file_name":"xyz.ex"
"metadata":{
"format":".ex"
"profile":[
{"date_value" : "2018-05-30T00:00:00",
"key_id" : "1",
"type" : "date",
"value" : [ "30-05-2018" ]
},
{
"key_id" : "2",
"type" : "freetext",
"value" : [ "New york" ]
}
}
Now I need to search for documents by matching a key_id to its value (key_id identifies a field whose value is stored in "value").
E.g. for the field with key_id='1', if its value is "30-05-2018", it should match the above document.
I tried mapping this as a nested object, but I am not able to write a query that searches with 2 or more key_ids each matching its respective value.
This is how I would do it. You need to AND together, via bool/filter (or bool/must), two nested queries, one per key_id/value condition pair, since you want to match two different nested elements of the same parent document.
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "metadata.profile",
"query": {
"bool": {
"filter": [
{
"term": {
"metadata.profile.f1": "a"
}
},
{
"term": {
"metadata.profile.f2": true
}
}
]
}
}
}
},
{
"nested": {
"path": "metadata.profile",
"query": {
"bool": {
"filter": [
{
"term": {
"metadata.profile.f1": "b"
}
},
{
"term": {
"metadata.profile.f2": false
}
}
]
}
}
}
}
]
}
}
}
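For reference, here is the same approach adapted to the sample document from the question, as a sketch using the Python client (index name, client setup, and keyword mappings for key_id and value are assumptions; metadata.profile must be mapped with "type": "nested" for the nested queries to work):

# Sketch only: assumes the index is named "test", that metadata.profile is
# mapped as "nested", and that key_id/value are keyword fields.
from elasticsearch import Elasticsearch

es = Elasticsearch()

def profile_condition(key_id, value):
    """One nested clause matching a single key_id/value pair."""
    return {
        "nested": {
            "path": "metadata.profile",
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"metadata.profile.key_id": key_id}},
                        {"term": {"metadata.profile.value": value}},
                    ]
                }
            },
        }
    }

res = es.search(
    index="test",
    body={
        "query": {
            "bool": {
                "filter": [
                    profile_condition("1", "30-05-2018"),
                    profile_condition("2", "New york"),
                ]
            }
        }
    },
)
print(res["hits"]["hits"])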
I want to retrieve a field as well as its normalized version from Elasticsearch.
Here's my index definition and data:
PUT normalizersample
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"refresh_interval": "60s",
"analysis": {
"normalizer": {
"my_normalizer": {
"filter": [
"lowercase",
"german_normalization",
"asciifolding"
],
"type": "custom"
}
}
}
},
"mappings": {
"_source": {
"enabled": true
},
"properties": {
"myField": {
"type": "text",
"store": true,
"fields": {
"keyword": {
"type": "keyword",
"store": true
},
"normalized": {
"type": "keyword",
"store": true,
"normalizer": "my_normalizer"
}
}
}
}
}
}
POST normalizersample/_doc/1
{
"myField": ["Andreas", "Ämdreas", "Anders"]
}
My first approach was to use script fields, like this:
GET /normalizersample/_search
{
"size": 100,
"query": {
"match_all": {}
},
"script_fields": {
"keyword": {
"script": "doc['myField.keyword']"
},
"normalized": {
"script": "doc['myField.normalized']"
}
}
}
However, since myField is an array, this returns two lists of strings per ES document, and each of them is sorted alphabetically. Hence, the corresponding entries might not match each other because of the normalization.
"hits" : [
{
"_index" : "normalizersample",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"de" : [
"amdreas",
"anders",
"andreas"
],
"keyword" : [
"Anders",
"Andreas",
"Ämdreas"
]
}
}
]
Instead, I would like to retrieve [(Andreas, andreas), (Ämdreas, amdreas), (Anders, anders)] or a similar format where I can match every entry to its normalization.
The only way I found was to call Term Vectors on both fields since they contain a position field, but this seems like a huge overhead to me. (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html)
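A rough sketch of what I mean, using the Python client (assuming the index above; it relies on the token positions lining up between the two sub-fields):

# Sketch of the term-vectors workaround; field and index names as above.
from elasticsearch import Elasticsearch

es = Elasticsearch()

tv = es.termvectors(
    index="normalizersample",
    id="1",
    fields=["myField.keyword", "myField.normalized"],
)

def terms_by_position(field):
    # Map token position -> term for one field of the term-vectors response.
    out = {}
    for term, info in tv["term_vectors"][field]["terms"].items():
        for token in info["tokens"]:
            out[token["position"]] = term
    return out

keyword = terms_by_position("myField.keyword")
normalized = terms_by_position("myField.normalized")
pairs = [(keyword[p], normalized[p]) for p in sorted(keyword)]
# e.g. [('Andreas', 'andreas'), ('Ämdreas', 'amdreas'), ('Anders', 'anders')]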
Is there a simpler way to retrieve tuples with the keyword and the normalized field?
Thanks a lot!
Generate unique id in nested document - PyMongo
My database looks like this:
{
"_id":"5ea661d6213894a6082af6d1",
"blog_id":"blog_one",
"comments": [
{
"user_id":"1",
"comment":"comment for blog one this is good"
},
{
"user_id":"2",
"comment":"other for blog one"
}
]
}
I want to add a unique id to each and every comment.
I want the output to look like this:
{
"_id":"5ea661d6213894a6082af6d1",
"blog_id":"blog_one",
"comments": [
{
"id" : "something" (auto generate unique),
"user_id":"1",
"comment":"comment for blog one this is good"
},
{
"id" : "something" (auto generate unique),
"user_id":"2",
"comment":"other for blog one"
}
]
}
I'm using PyMongo. Is there a way to update this kind of document?
Is it possible or not?
This update adds a unique id value to each nested document in the comments array. The id value is calculated from the present time in milliseconds and is incremented for each array element to get a new id value for each of the array's nested documents.
The code runs with MongoDB version 4.2 and PyMongo 3.10.
import datetime

pipeline = [
{
"$set": {
"comments": {
"$map": {
"input": { "$range": [ 0, { "$size": "$comments" } ] },
"in": {
"$mergeObjects": [
{ "id": { "$add": [ { "$toLong" : datetime.datetime.now() }, "$$this" ] } },
{ "$arrayElemAt": [ "$comments", "$$this" ] }
]
}
}
}
}
}
]
collection.update_one( { }, pipeline )
The updated document:
{
"_id" : "5ea661d6213894a6082af6d1",
"blog_id" : "blog_one",
"comments" : [
{
"id" : NumberLong("1588179349566"),
"user_id" : "1",
"comment" : "comment for blog one this is good"
},
{
"id" : NumberLong("1588179349567"),
"user_id" : "2",
"comment" : "other for blog one"
}
]
}
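Note that the empty filter { } in update_one targets only the first document in the collection; to update the specific blog post, filter by its _id (a sketch, using the _id string as stored in the question's sample document):

collection.update_one(
    { "_id": "5ea661d6213894a6082af6d1" },  # _id as it appears in the sample document
    pipeline
)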
[ EDIT ADD ]
The following works from the mongo shell. It adds a unique id to each of the comments array's nested documents, unique across documents.
db.collection.aggregate( [
{ "$unwind": "$comments" },
{
"$group": {
"_id": null,
"count": { "$sum": 1 },
"docs": { "$push": "$$ROOT" },
"now": { $first: "$$NOW" }
}
},
{
"$addFields": {
"docs": {
"$map": {
"input": { "$range": [ 0, "$count" ] },
"in": {
"$mergeObjects": [
{ "comments_id": { "$add": [ { "$toLong" : "$now" }, "$$this" ] } },
{ "$arrayElemAt": [ "$docs", "$$this" ] }
]
}
}
}
}
},
{
"$unwind": "$docs"
},
{
"$addFields": {
"docs.comments.comments_id": "$docs.comments_id"
}
},
{
"$replaceRoot": { "newRoot": "$docs" }
},
{
"$group": {
"_id": { "_id": "$_id", "blog_id": "$blog_id" },
"comments": { "$push": "$comments" }
}
},
{
$project: {
"_id": 0,
"_id": "$_id._id",
"blog_id": "$_id.blog_id",
"comments": 1
}
}
] ).forEach(doc => db.collection.updateOne( { _id: doc._id }, { $set: { comments: doc.comments } } ) )
You can use the ObjectId constructor to create the ids and place them in your nested documents.
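A sketch of that approach with PyMongo (the comments/id field names are taken from the question; the database and collection names in the client setup are assumptions):

# Assign a new ObjectId to every comment that does not have one yet.
from bson import ObjectId
from pymongo import MongoClient

client = MongoClient()
collection = client["blog_db"]["blogs"]  # assumed database/collection names

for doc in collection.find({}, {"comments": 1}):
    comments = doc.get("comments", [])
    for comment in comments:
        comment.setdefault("id", ObjectId())
    collection.update_one({"_id": doc["_id"]}, {"$set": {"comments": comments}})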
I want to search text inside fields.
I tried to fix my problem using this documentation.
One of my indices contains items whose structure is the following:
{
  "url": "https://exampleurl.com",
  "username": "some_username"
}
Here are my queries:
"query": {
"multi_match": {
"query": keyword,
"type": "phrase",
"fields": [ "username", "url" ]
}
}
Also a bool query:
"query": {
"bool": {
"must": {
"multi_match": {
"query": keyword,
"type": "phrase",
"fields": [ "username", "url" ]
}
}
}
And a third query with two separate match clauses:
"query": {
"bool": {
"must": [{
"match": {
"username": keyword,
}
}, {
"match": {
"url": keyword
}
}]
}
}
But the result is an empty array.
Please try the query below.
Create Index
PUT test
{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"properties" : {
"url" : { "type" : "text" },
"username" : { "type" : "text" }
}
}
}
Insert Document
PUT test/_doc/1
{
"url" : "https://exampleurl.com",
"username" : "Arjun Das"
}
Search
GET test/_search
{
"query": {
"multi_match": {
"query": "http",
"type": "best_fields",
"fields": [ "username", "url" ],
"fuzziness":"2"
}
}
}
I'd like to "translate" a string like:
A AND (C OR B) AND NOT D
into an Elasticsearch query like:
{
"query": {
"bool": {
"must": {
"term": {
"text": "A"
}
},
"must_not": {
"term": {
"text": "D"
}
},
"should": [
{
"term": {
"text": "B"
}
},
{
"term": {
"text": "C"
}
}
],
"minimum_should_match": 1,
"boost": 1
}
}
}
Does there exist some library which I can use for this?
Any help appreciated.
Thanks!
OK, according to:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
I can do a query like this:
{
"query": {
"query_string" : {
"default_field" : "text",
"query" : (this AND (submitted OR flowers) AND NOT blight"
}
}
}
which works great.
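From Python, the same thing is just a matter of passing the expression straight through to query_string (a sketch; the index name and client setup are assumptions):

from elasticsearch import Elasticsearch

es = Elasticsearch()

expression = "A AND (C OR B) AND NOT D"
res = es.search(
    index="my-index",  # assumed index name
    body={
        "query": {
            "query_string": {
                "default_field": "text",
                "query": expression,
            }
        }
    },
)
print(res["hits"]["total"])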
I was doing a search with Elasticsearch using this code:
es.search(index="article-index", fields="url", body={
"query": {
"query_string": {
"query": "keywordstr",
"fields": [
"text",
"title",
"tags",
"domain"
]
}
}
})
Now I want to insert another parameter into the search scoring: "recencyboost".
I was told function_score should solve the problem:
res = es.search(index="article-index", fields="url", body={
"query": {
"function_score": {
"functions": {
"DECAY_FUNCTION": {
"recencyboost": {
"origin": "0",
"scale": "20"
}
}
},
"query": {
{
"query_string": {
"query": keywordstr
}
}
},
"score_mode": "multiply"
}
}
})
It gives me an error that the dictionary {"query_string": {"query": keywordstr}} is not hashable.
1) How can I fix the error?
2) How can I change the decay function so that it gives higher weight to a higher recency boost?
You appear to have an extra query in your search (giving a total of three), which introduces an unwanted extra level of nesting. You need to remove the top-level query and make function_score the top-level key.
res = es.search(index="article-index", fields="url", body={"function_score": {
    "query": {"query_string": {"query": keywordstr}},
    "functions": {
        "DECAY_FUNCTION": {
            "recencyboost": {
                "origin": "0",
                "scale": "20"
            }
        }
    },
    "score_mode": "multiply"
}})
Note: score_mode defaults to "multiply", as does the unused boost_mode, so it should be unnecessary to supply it.
You can't use a dictionary as a key in a dictionary. You are doing this in the following segment of the code:
"query": {
{"query_string": {"query": keywordstr}}
},
The following should work fine:
"query": {
"query_string": {"query": keywordstr}
},
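Putting both fixes together, a minimal sketch of a working request might look like this (function_score sits inside the usual top-level query element; gauss stands in for the generic DECAY_FUNCTION placeholder, and recencyboost is assumed to be a numeric field in the mapping):

# keywordstr is the search term from the question; gauss/exp/linear are the
# real decay function names, DECAY_FUNCTION in the docs is just a placeholder.
res = es.search(
    index="article-index",
    body={
        "query": {
            "function_score": {
                "query": {"query_string": {"query": keywordstr}},
                "functions": [
                    {
                        "gauss": {
                            "recencyboost": {"origin": 0, "scale": 20}
                        }
                    }
                ],
                "score_mode": "multiply",
            }
        }
    },
)

As for the second question: decay functions score documents higher the closer the field value is to origin, so with origin 0 larger recencyboost values are actually penalized; to favour them, set origin to the largest expected value (or decay on a date field with "origin": "now").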
Use it like this:
query: {
  function_score: {
    query: {
      filtered: {
        query: {
          bool: {
            must: [
              {
                query_string: {
                  query: shop_search,
                  fields: [ 'shop_name' ],
                  boost: 2.0
                }
              },
              {
                query_string: {
                  query: shop_search,
                  fields: [ 'shop_name' ],
                  boost: 3.0
                }
              }
            ]
          }
        },
        filter: {
          // { term: { search_city: }}
        }
      }
    },
    exp: {
      location: {
        origin: {
          lat: 12.8748964,
          lon: 77.6413239
        },
        scale: "10000m",
        offset: "0m",
        decay: "0.5"
      }
    }
    // score_mode: "sum"
  }
}