I'm curious about the best approach to counting the instances of a particular field across all documents in a given Elasticsearch index.
For example, if I've got the following documents in index goober:
{
  '_id': 'foo',
  'field1': 'a value',
  'field2': 'a value'
},
{
  '_id': 'bar',
  'field1': 'a value',
  'field2': 'a value'
},
{
  '_id': 'baz',
  'field1': 'a value',
  'field3': 'a value'
}
I'd like to know something like the following:
{
  'index': 'goober',
  'field_counts': {
    'field1': 3,
    'field2': 2,
    'field3': 1
  }
}
Is this doable with a single query, or multiple? For what it's worth, I'm using the Python elasticsearch and elasticsearch-dsl clients.
I've successfully issued a GET request to /goober and retrieved the mappings, and am learning how to submit aggregation requests for each field, but I'm interested in learning how many times a particular field appears across all documents.
Coming from Solr, I'm still getting my bearings with ES. Thanks in advance for any suggestions.
The query below will return the count of docs that have "field2":
POST /INDEX/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "exists": {
          "field": "field2"
        }
      }
    }
  }
}
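If all you need is the number, the _count endpoint does the same job without returning hits. A minimal sketch from Python, assuming an Elasticsearch client pointed at localhost and the goober index from the question (on recent client versions the body may need to be passed as query= instead of body=):
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# _count runs the query and returns only the matching-document total
resp = es.count(index="goober", body={
    "query": {
        "bool": {
            "filter": {
                "exists": {"field": "field2"}
            }
        }
    }
})
print(resp["count"])  # e.g. 2 for the example documents above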
And here is an example using multiple aggregations (each comes back as its own bucket with a count), built on the same exists filter:
POST /INDEX/_search
{
  "size": 0,
  "aggs": {
    "field_has1": {
      "filter": {
        "exists": {
          "field": "field1"
        }
      }
    },
    "field_has2": {
      "filter": {
        "exists": {
          "field": "field2"
        }
      }
    }
  }
}
Each agg in the second example behaves just like the first query. In many cases you can take a regular search query and nest those lookups inside aggregation buckets.
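For reference, each filter aggregation is a single-bucket aggregation, so the response carries one doc_count per agg rather than a list of buckets; with the three example documents above it would look like:
"aggregations": {
  "field_has1": { "doc_count": 3 },
  "field_has2": { "doc_count": 2 }
}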
A quick time-saver based on the existing answer:
import requests

interesting_fields = ['field1', 'field2']
body = {
    'size': 0,
    'aggs': {f'has_{field_name}': {
        "filter": {
            "exists": {
                # the 'export.' prefix matches my own mapping, where these fields
                # sit under an 'export' object -- drop or adjust it for your index
                "field": f'export.{field_name}'
            }
        }
    } for field_name in interesting_fields},
}
print(requests.post('http://localhost:9200/INDEX/_search', json=body).json())
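The counts then come back as aggregations.has_field1.doc_count and so on, so stripping the has_ prefix from each key of the aggregations object gets you the field_counts mapping from the question.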
I am updating a MongoDB collection for a small project of mine and I'm stuck on updating a single word in an existing field.
Example:
{
"_id" : ObjectId("5faa46a6036e146f85a4afef"),
"name" : "Kubernetes_cluster_setup - kubernetes-cluster"
}
In the document I want to change the name to "Kubernetes_cluster_config -kubernetes-cluster".
I want config to replace setup, and it should not remove the -kubernetes-cluster part, which is a constant value.
The method I applied, $set, overwrites the entire field, but -kubernetes-cluster must not be removed.
Try the $replaceOne operator. You need an aggregation like this:
db.collection.aggregate([
  {
    "$match": {
      "id": 0
    }
  },
  {
    "$set": {
      "name": {
        "$replaceOne": {
          "input": "$name",
          "find": "setup",
          "replacement": "config"
        }
      }
    }
  }
])
The first stage finds the document (here matched by id) and the second replaces the value setup with config inside the name field.
Example here
Also, if you want to replace the string for every document, you can use this query:
db.collection.aggregate([
  {
    "$match": {
      "name": {
        "$regex": "setup"
      }
    }
  },
  {
    "$set": {
      "name": {
        "$replaceOne": {
          "input": "$name",
          "find": "setup",
          "replacement": "config"
        }
      }
    }
  }
])
Here the query looks for documents whose name field contains the word setup and then replaces it with config.
Example here
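If you are driving this from Python, the same pipeline works as a bulk update via pymongo's update_many, which accepts an aggregation pipeline as the update on MongoDB 4.2+ ($replaceOne itself needs 4.4+). A sketch, with the connection string and database/collection names as placeholder assumptions:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["clusters"]  # placeholder database/collection names

# Passing a pipeline (a list) as the update lets $set use aggregation
# expressions, so every matching document gets 'setup' replaced in place.
coll.update_many(
    {"name": {"$regex": "setup"}},
    [{"$set": {
        "name": {
            "$replaceOne": {
                "input": "$name",
                "find": "setup",
                "replacement": "config"
            }
        }
    }}]
)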
I am not able to understand how to implement an Elasticsearch query together with a synonym table. With a general query I don't have any search problems, but incorporating synonyms has become an issue for me.
es.search(index='data_inex', body={
    "query": {
        "match": {"inex": "tren"}
    },
    "settings": {
        "filter": {
            "synonym": {
                "type": "synonym",
                "lenient": True,
                "synonyms": ["foo, baz", "tren, hut"]
            }
        }
    }
})
Also, is it possible to use a file instead of this array?
Check the Elasticsearch documentation on the synonym token filter.
You can configure a synonyms file as well:
PUT /test_index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": [ "synonym" ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms_path": "analysis/synonym.txt" // <======== location of synonym file
          }
        }
      }
    }
  }
}
Please note:
Changes to the synonyms file are not reflected in documents indexed before the change; re-indexing those documents is required.
You cannot change the mapping (including the analyzer) of an existing field. If you want to change the mapping of existing documents, reindex them into another index with the updated mapping.
The search query does not support "settings".
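Putting the pieces together from Python, a sketch using the elasticsearch client: the index name test_index and the synonyms file path come from the answer above, while wiring the inex field from the question to the synonym analyzer in the mappings block is my assumption about the intended setup:
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# The synonym filter and analyzer belong in the index settings, not the search body.
es.indices.create(index="test_index", body={
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "synonym": {"tokenizer": "whitespace", "filter": ["synonym"]}
                },
                "filter": {
                    "synonym": {
                        "type": "synonym",
                        "synonyms_path": "analysis/synonym.txt"
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "inex": {"type": "text", "analyzer": "synonym"}
        }
    }
})

# A plain match query now expands synonyms through the field's analyzer,
# so searching for "tren" also matches documents containing "hut".
resp = es.search(index="test_index", body={"query": {"match": {"inex": "tren"}}})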
I have an Elasticsearch database that I upload entries to like this:
{"s_food": "bread", "s_store": "Safeway", "s_date" : "2020-06-30", "l_run" : 28900, "l_covered": 1}
When I upload it to Elasticsearch, it adds _id, _type, #timestamp, and _index fields. So the entries look sort of like this:
{"s_food": "bread", "s_store": "Safeway", "s_date" : "2020-06-30", "l_run" : 28900, "l_covered": 1, "_type": "_doc", "_index": "my_index", "_id": pe39u5hs874kee}
The way that I'm using the Elasticsearch database results in the same original entries getting uploaded multiple times. In this example, I only care about the s_food, s_date, and l_run fields forming a unique combination. Since I have so many entries, I'd like to use the Elasticsearch scroll tool to go through all the matches. So far in Elasticsearch, I've only seen people use aggregation to get buckets of each term and then iterate over each partition. I would like to use something like aggregation to get an entire entry (just one) for each unique combination of the three fields that I care about (food, date, run). Right now I use aggregation with a scroll like so:
GET /my-index/_search?scroll=25m
{
  "size": 10000,
  "aggs": {
    "foods": {
      "terms": {
        "field": "s_food"
      },
      "aggs": {
        "dates": {
          "terms": {
            "field": "s_date"
          },
          "aggs": {
            "runs": {
              "terms": {
                "field": "l_run"
              }
            }
          }
        }
      }
    }
  }
}
Unfortunately this is only giving me the usual bucketed structure that I don't want. Is there something else I should try?
All you need is to use the top_hits aggregation with size: 1 (see the Elasticsearch documentation on top_hits).
The query would look like this:
{
  "size": 10000,
  "aggs": {
    "foods": {
      "terms": {
        "field": "s_food"
      },
      "aggs": {
        "dates": {
          "terms": {
            "field": "s_date"
          },
          "aggs": {
            "runs": {
              "terms": {
                "field": "l_run"
              },
              "aggs": {
                "topOne": {
                  "top_hits": {
                    "size": 1
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
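Note that each terms level returns only its top 10 buckets by default, so you may need to raise size on the terms aggs to cover every combination. Reading one document per (food, date, run) combination out of the response in Python would then look roughly like this (assuming an es = Elasticsearch(...) client and agg_query holding the body above):
resp = es.search(index="my-index", body=agg_query)

# Walk the nested term buckets; top_hits with size 1 leaves exactly one
# hit in each innermost bucket.
for food in resp["aggregations"]["foods"]["buckets"]:
    for date in food["dates"]["buckets"]:
        for run in date["runs"]["buckets"]:
            doc = run["topOne"]["hits"]["hits"][0]["_source"]
            print(food["key"], date["key"], run["key"], doc)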
I'm performing a search with this aggregate and would like to get my total count (to deal with my pagination).
results = mongo.db.perfumes.aggregate(
[
{"$match": {"$text": {"$search": db_query}}},
{
"$lookup": {
"from": "users",
"localField": "author",
"foreignField": "username",
"as": "creator",
}
},
{"$unwind": "$creator"},
{
"$project": {
"_id": "$_id",
"perfumeName": "$name",
"perfumeBrand": "$brand",
"perfumeDescription": "$description",
"date_updated": "$date_updated",
"perfumePicture": "$picture",
"isPublic": "$public",
"perfumeType": "$perfume_type",
"username": "$creator.username",
"firstName": "$creator.first_name",
"lastName": "$creator.last_name",
"profilePicture": "$creator.avatar",
}
},
{"$sort": {"perfumeName": 1}},
]
)
How could I get the count of results in my route so I can pass it to my template?
I cannot use results.count() as it is a CommandCursor.
Help please? Thank you!!
Using len() on a list of the results would be easier, but if you still want the aggregation query itself to return the count and the actual docs at the same time, try $facet or $group:
Query 1 :
{
$facet: {
docs: [ { $match: {} } ], // passes all docs into an array field
count: [ { $count: "count" } ] // counts no.of docs
}
},
/** re-create count field from array of one object to just a number */
{
$addFields: { count: { $arrayElemAt: [ "$count.count", 0 ] } }
}
Test : mongoplayground
Query 2 :
/** Group all docs unconditionally, push each into an array field, and count them with $sum */
{
  $group: { _id: "", docs: { $push: "$$ROOT" }, count: { $sum: 1 } }
}
Test : mongoplayground
Note :
Add one of these stages at the end of your current aggregation pipeline, and remember: if no docs survive the $match or $unwind stages, the first query will return docs: [] without a count field, while the second will simply return []. Code accordingly.
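Applied to your route, the first variant would look roughly like this (a sketch; only the $facet/$addFields tail is new, the leading stages are your existing pipeline):
pipeline = [
    {"$match": {"$text": {"$search": db_query}}},
    # ... your existing $lookup / $unwind / $project / $sort stages ...
    {"$facet": {
        "docs": [{"$match": {}}],
        "count": [{"$count": "count"}],
    }},
    {"$addFields": {"count": {"$arrayElemAt": ["$count.count", 0]}}},
]
result = list(mongo.db.perfumes.aggregate(pipeline))[0]
perfumes = result["docs"]
total = result.get("count", 0)  # missing when no docs matched, per the note above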
If you look at the CommandCursor docs, it does not support count().
Convert the cursor to a list first; then you can use the length filter in your Jinja template:
{{ results | length }}
I hope the above helps.
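A minimal sketch of that route, assuming Flask and a template name of my own invention (pipeline is the aggregation from the question):
from flask import render_template

def search():
    results = list(mongo.db.perfumes.aggregate(pipeline))  # materialise the CommandCursor
    return render_template("perfumes.html", perfumes=results, total=len(results))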
I am using Python to generate a MongoDB collection and I need to find some specific values in the database. A document looks like this:
{
  "_id": ObjectId("215454541245"),
  "category": "food",
  "venues": {"Thai Restaurant": 251, "KFC": 124, "Chinese Restaurant": 21, .....}
}
My question is: I want to query this database and find all venues with a value smaller than 200, so in my example "KFC" and "Chinese Restaurant" would be returned.
Does anyone know how to do that?
If you can change your schema it would be much easier to issue queries against your collection. As it is, having dynamic values as your keys is considered a bad design pattern with MongoDB as they are extremely difficult to query.
A recommended approach would be to follow an embedded model like this:
{
"_id": ObjectId("553799187174b8c402151d06"),
"category": "food",
"venues": [
{
"name": "Thai Restaurant",
"value": 251
},
{
"name": "KFC",
"value": 124
},
{
"name": "Chinese Restaurant",
"value": 21
}
]
}
With this structure in place you can then issue a query to find all venues with a value smaller than 200:
db.collection.findOne({"venues.value": { "$lt": 200 } },
{
"venues": { "$elemMatch": { "value": { "$lt": 200 } } },
"_id": 0
});
This will return the result:
/* 0 */
{
"venues" : [
{
"name" : "KFC",
"value" : 124
}
]
}
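One caveat: the $elemMatch projection returns only the first matching array element, which is why the Chinese Restaurant is missing from the result above. If you need every venue under 200, a $filter projection does that. A sketch in Python with pymongo (client and collection names are placeholder assumptions; aggregation expressions in find projections need MongoDB 4.4+, otherwise use the same $filter inside an aggregate $project stage):
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["categories"]  # placeholder names

# $filter keeps every array element below the threshold, not just the first.
doc = coll.find_one(
    {"venues.value": {"$lt": 200}},
    {
        "_id": 0,
        "venues": {
            "$filter": {
                "input": "$venues",
                "as": "v",
                "cond": {"$lt": ["$$v.value", 200]},
            }
        },
    },
)
print(doc)  # {'venues': [{'name': 'KFC', ...}, {'name': 'Chinese Restaurant', ...}]}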