The following mapping needs to be aggregated on multiple levels: documents are grouped on one field and then bucketed by another.
Mapping:
{
"predictions": {
"properties": {
"Company": {"type": "string"},
"TxnsId": {"type": "string"},
"Emp": {"type": "string"},
"Amount": {"type": "float"},
"Cash/online": {"type": "string"},
"items": {"type": "float"},
"timestamp": {"type": "date"}
}
}
}
My requirement is a bit complex. I need to:
For each Emp (getting the distinct employees),
check whether it is an online or a cashed transaction,
group by items into ranges like 0-10, 11-20, 21-30, ...,
and sum the Amount.
The final output should look like:
>Emp-online-range-Amount
>a-online-(0-10)-1240$
>a-online-(21-30)-3543$
>b-online-(0-10)-2345$
>b-online-(11-20)-3456$
Something like this should do the job:
{
"size": 0,
"aggs": {
"by_emp": {
"terms": {
"field": "Emp"
},
"aggs": {
"cash_online": {
"filters": {
"filters": {
"cashed": {
"term": {
"Cash/online": "cached"
}
},
"online": {
"term": {
"Cash/online": "online"
}
}
}
},
"aggs": {
"ranges": {
"range": {
"field": "items",
"ranges": [
{
"from": 0,
"to": 11
},
{
"from": 11,
"to": 21
},
{
"from": 21,
"to": 31
}
]
},
"aggs": {
"total": {
"sum": {
"field": "Amount"
}
}
}
}
}
}
}
}
}
}
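Flattening the response back into the `Emp-online-range-Amount` rows from the question can be done with a small helper. This is a sketch: the dict shape follows Elasticsearch's keyed `filters` + `range` bucket format, and the sample response fragment is invented for illustration.

```python
def flatten_buckets(response):
    """Flatten by_emp -> cash_online -> ranges buckets into result rows."""
    rows = []
    for emp in response["aggregations"]["by_emp"]["buckets"]:
        # Keyed filters aggregations return buckets as a dict, not a list.
        for mode, filt in emp["cash_online"]["buckets"].items():
            for rng in filt["ranges"]["buckets"]:
                # "to" is exclusive in a range agg, so 0..11 prints as (0-10).
                lo, hi = int(rng["from"]), int(rng["to"]) - 1
                rows.append(f"{emp['key']}-{mode}-({lo}-{hi})-{rng['total']['value']}$")
    return rows

# Hypothetical response fragment for illustration:
sample = {
    "aggregations": {
        "by_emp": {
            "buckets": [
                {
                    "key": "a",
                    "cash_online": {
                        "buckets": {
                            "online": {
                                "ranges": {
                                    "buckets": [
                                        {"from": 0.0, "to": 11.0,
                                         "total": {"value": 1240}}
                                    ]
                                }
                            }
                        }
                    },
                }
            ]
        }
    }
}

print(flatten_buckets(sample))  # -> ['a-online-(0-10)-1240$']
```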
I have a MongoDB document structure like the following:
Structure
{
"stores": [
{
"items": [
{
"feedback": [],
"item_category": "101",
"item_id": "10"
},
{
"feedback": [],
"item_category": "101",
"item_id": "11"
}
]
},
{
"items": [
{
"feedback": [],
"item_category": "101",
"item_id": "10"
},
{
"feedback": ["A feedback"],
"item_category": "101",
"item_id": "11"
},
{
"feedback": [],
"item_category": "101",
"item_id": "12"
},
{
"feedback": [],
"item_category": "102",
"item_id": "13"
},
{
"feedback": [],
"item_category": "102",
"item_id": "14"
}
],
"store_id": 500
}
]
}
This is a single document in the collection. Some fields were removed to produce a minimal representation of the data.
What I want is to get items only if the feedback field in the items array is not empty. The expected result is:
Expected result
{
"stores": [
{
"items": [
{
"feedback": ["A feedback"],
"item_category": "101",
"item_id": "11"
}
],
"store_id": 500
}
]
}
This is what I tried, based on examples in this, which I think is pretty much the same situation, but it didn't work. What's wrong with my query? Isn't it the same situation as the zipcode search example in the link? It returns everything, just like the first JSON block (Structure):
What I tried
query = {
'date': {'$gte': since, '$lte': until},
'stores.items': {"$elemMatch": {"feedback": {"$ne": []}}}
}
Thanks.
Please try this:
db.yourCollectionName.aggregate([
{ $match: { 'date': { '$gte': since, '$lte': until }, 'stores.items': { "$elemMatch": { "feedback": { "$ne": [] } } } } },
{ $unwind: '$stores' },
{ $match: { 'stores.items': { "$elemMatch": { "feedback": { "$ne": [] } } } } },
{ $unwind: '$stores.items' },
{ $match: { 'stores.items.feedback': { "$ne": [] } } },
{ $group: { _id: { _id: '$_id', store_id: '$stores.store_id' }, items: { $push: '$stores.items' } } },
{ $project: { _id: '$_id._id', store_id: '$_id.store_id', items: 1 } },
{ $group: { _id: '$_id', stores: { $push: '$$ROOT' } } },
{ $project: { 'stores._id': 0 } }
])
All these stages are needed because you have to operate on an array of arrays. This query is written assuming you're dealing with a large set of data. Since you're filtering on dates, if your document count is much smaller after the first $match, you can drop the $match stage that sits between the two $unwind's.
Refs:
$match,
$unwind,
$project,
$group
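The effect of the pipeline can be mirrored in plain Python, which may help when checking its output against the sample document. This is only a sketch of the same filtering logic (keep items with non-empty feedback, drop stores left with no items), not a replacement for running it server-side on large collections.

```python
def filter_feedback(doc):
    """Keep only items whose feedback array is non-empty; drop empty stores."""
    out_stores = []
    for store in doc["stores"]:
        items = [i for i in store["items"] if i["feedback"]]
        if items:
            out_stores.append({**store, "items": items})
    return {"stores": out_stores}

# Abbreviated version of the sample document from the question:
doc = {
    "stores": [
        {"items": [
            {"feedback": [], "item_category": "101", "item_id": "10"},
            {"feedback": [], "item_category": "101", "item_id": "11"},
        ]},
        {"items": [
            {"feedback": [], "item_category": "101", "item_id": "10"},
            {"feedback": ["A feedback"], "item_category": "101", "item_id": "11"},
            {"feedback": [], "item_category": "101", "item_id": "12"},
        ], "store_id": 500},
    ]
}

print(filter_feedback(doc))
# -> {'stores': [{'items': [{'feedback': ['A feedback'],
#                            'item_category': '101', 'item_id': '11'}],
#                 'store_id': 500}]}
```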
This aggregate query gets the needed result (using the provided sample document and run from the mongo shell):
db.stores.aggregate( [
{ $unwind: "$stores" },
{ $unwind: "$stores.items" },
{ $addFields: { feedbackExists: { $gt: [ { $size: "$stores.items.feedback" }, 0 ] } } },
{ $match: { feedbackExists: true } },
{ $project: { _id: 0, feedbackExists: 0 } }
] )
I'm trying to perform a min aggregation using a nested aggregation in Elasticsearch, but I'm still getting null values.
GET /my_index/_search
{
"query": {
"match": {
"FirstName": "Cheryl"
}
},
"aggs": {
"art": {
"nested": {
"path": "art"
},
"aggs": {
"min_price": {
"min": {
"field": "art.Income"
}
}
}
}
}
}
Mappings:
{
"mappings": {
"properties": {
"art": {
"type": "nested",
"properties": {
"FirstName": {
"type": "text"
},
"Price": {
"type": "integer"
}
}
}
}
}
}
Here is the query I tried. My requirement is to get the total record count along with pagination limited to 15 results. I get results, but there seem to be duplicates. Please point out what I'm missing.
db.collections.aggregate([
{
"$match": {
"application_id": "some_app_id"
}
},
{
"$lookup": {
"localField": "_id",
"from": "collection1",
"foreignField": "some_id",
"as": "merged"
}
},
{
"$unwind": {"path": "$merged"}
},
{
"$group": {
"_id": null,
"count": {"$sum": 1},
"posts": {
"$push":"$$ROOT"
}
}
},
{
"$unwind": "$posts"
},
{
"$group": {
"_id": "$posts._id",
"data":{"$addToSet":"$posts"},
"count": {"$first" : "$count"},
"posts": {
"$push":"$$ROOT"
}
}
},
{
"$project":{
"_id": 1,
"count": 1,
"data": 1
}
}
])
ACTUAL OUTPUT
"count" : 89 // Here the count is actually 33 but i'm getting 89 somehow
EXPECTED OUTPUT
"count" : 33 with no duplicates
I have an elasticsearch query which returns the top 10 results for a given querystring. I now need to use the response to create a sum aggregation for each of the 10 top results. This is my query to return the top 10:
GET my_index/_search
{
"query": {
"match": {
"name": {
"query": "hello world",
"fuzziness": 2
}
}
}
}
With the response from the above request, I generate a list of the 10 org_ids and iterate over each of these ID. I have to make another request using the query below (where "org_id": "12345" is the first element in my array of IDs).
POST my_index/_search
{ "size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"org_id": "12345"
}
}
]
}
},
"aggs": {
"aggregation_1": {
"sum": {
"field": "dollar_amount"
}
},
"aggregation_2": {
"sum": {
"field": "employees"
}
}
}
}
However, I think that this approach is inefficient because I have to make a total of 11 requests which won't scale well. Ideally, I would like to make one request that can do all of this.
Is there any functionality in ES that would make this possible, or would I have to make individual requests for each search parameter? I've looked through the docs and can't find anything that involves iterating over the array of results.
EDIT: For simplicity, I think having 2 requests is fine for now. So I just need to figure out how to pass through an array of org_ids into the 2nd query and do all aggregations in that 2nd query.
E.g.
POST my_index/_search
{ "size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"org_id": ["12345", "67891", "98765"]
}
}
]
}
},
"aggs": {
"aggregation_1": {
"sum": {
"field": "dollar_amount"
}
},
"aggregation_2": {
"sum": {
"field": "employees"
}
}
}
}
To start, you can do the aggregations in one step (so 2 requests in total).
I'm still looking at the fuzziness part, but I don't see how to make it a one-shot query.
Edit: are your org_id values unique (i.e. document IDs)? Can you describe your data (how org_id is linked to the fuzziness query)?
{ "size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"org_id": "12 13 14 15 16 17 18...."
}
}
]
}
},
"aggs": {
"group_org_id": {
"terms": {
"field": "org_id"
}
},
"aggs": {
"aggregation_1": {
"sum": {
"field": "dollar_amount"
}
},
"aggregation_2": {
"sum": {
"field": "employees"
}
}
}
}
}
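Reading the per-org sums back out of such a response is then a single pass over the terms buckets. A sketch, assuming the standard terms-aggregation response layout; the sample fragment below is invented for illustration.

```python
def sums_per_org(response):
    """Map each org_id bucket to its (dollar_amount, employees) sums."""
    return {
        b["key"]: (b["aggregation_1"]["value"], b["aggregation_2"]["value"])
        for b in response["aggregations"]["group_org_id"]["buckets"]
    }

# Hypothetical response fragment for illustration:
sample = {
    "aggregations": {
        "group_org_id": {
            "buckets": [
                {"key": "12345", "doc_count": 3,
                 "aggregation_1": {"value": 900.0},
                 "aggregation_2": {"value": 42.0}},
                {"key": "67891", "doc_count": 1,
                 "aggregation_1": {"value": 150.0},
                 "aggregation_2": {"value": 7.0}},
            ]
        }
    }
}

print(sums_per_org(sample))
# -> {'12345': (900.0, 42.0), '67891': (150.0, 7.0)}
```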
Hello, I have an index in Elasticsearch with:
Plant, Department, Date, Value
I am trying to do the following queries in Elasticsearch.
1) Group by Plant and Date in specific departments and sum Value:
es = Elasticsearch('elasticsearch:9200')
body = Dict({"query": {
"bool": {
"must_not": {
"match": {
"Department": "Indirect*"}}}},
"aggs": {
"group_code": {
"terms": {
"field": "Plant.keyword", "size":10000},
"aggs": {
"group_date": {
"terms": {
"field": "Date"},
"aggs": {
"group_value": {
"sum":{
"field": "Value"}}}}}}}})
2) Group by Plant and Range of Dates, and get avg and median:
es = Elasticsearch('elasticsearch:9200')
body = Dict(
{"query": {
"bool": {
"must_not": {
"match": {
"Department_Substrate": "Indirect*"}}}},
"aggs": {
"group_code": {
"terms": {
"field": "Plant.keyword",
"size": 10000},
"aggs": {
"group_date": {
"range": {
"field": "Date",
"ranges": datelist},
"aggs": {
"Median": {
"percentiles": {
"field": "Value",
"percents": [25]}},
"Mean": {
"avg": {
"field":
"Value}}}}}}}})
This works too, but in this case I didn't do the grouping by Plant and Date first, so mixing both I have something like:
body = Dict({"query": {
"bool": {
"must_not": {
"match": {
"Department_Substrate": "Indirect*"}}}},
"aggs": {
"group_code": {
"terms": {
"field": "Plant.keyword", "size":10000},
"aggs": {
"group_date": {
"terms": {
"field": "Date"},
"aggs": {
"group_value": {
"sum":{
"field": "Value"},
"aggs": {
"group_date": {
"range": {
"field": "Date",
"ranges": datelist},
"aggs": {
"Median": {
"percentiles": {
"field": "Value",
"percents": [25]}},
"Mean": {
"avg": {
"field":
"Value"}}}}}}}}}}}})
res = es.search(index=self.index, doc_type='test', body=body)
I get this error:
TransportError: TransportError(500, 'aggregation_initialization_exception', 'Aggregator [group_value] of type [sum] cannot accept sub-aggregations')
So, is there a way to do this?
If it helps, my previous Python (pandas) code was:
data = test[~test.Department.str.startswith('Indirect')]
group1 = data.groupby(['Plant', 'Date'])['Value'].sum()
group2 = pd.DataFrame(group1.reset_index()).groupby(['Plant', pd.Grouper(key='Date', freq='W')])['Value'].median()
The error is clear: "Aggregator [group_value] of type [sum] cannot accept sub-aggregations".
When you do a sum aggregation, you can't split the result any further.
So you'd better move the sum aggs,
i.e.:
{
"query": {
"bool": {
"must_not": {
"match": {
"Department_Substrate": "Indirect*"
}
}
}
},
"aggs": {
"group_code": {
"terms": {
"field": "Plant.keyword",
"size": 10000
},
"aggs": {
"group_date": {
"terms": {
"field": "Date"
},
"aggs": {
"group_date": {
"range": {
"field": "Date",
"ranges": "sdf"
},
"aggs": {
"Median": {
"percentiles": {
"field": "Value",
"percents": [
25
]
}
},
"aggs": {
"group_value": {
"sum": {
"field": "Value"
}
}
}
}
}
}
}
}
}