Python pymongo: using aggregate with a match on the date from the DB

I need to sum all values from different time series.
yesterday = datetime.datetime.combine(datetime.datetime.now(), datetime.time.min) - datetime.timedelta(days=1)
test = db.aggregate([
{ "$match": { "date": yesterday, "location": location } },
{ '$group': { '_id': "$location", 'cost': { '$sum': '$cost' } } }
])
This works for getting yesterday's sum, but for flexibility I want to change it.
test = db.aggregate([
{ "$project": { "year": { "$year": "$date" }, "month": { "$month": "$date" } } },
{ "$match": { "month": 1, "year": 2023, "location": location } },
{ '$group': { '_id': "$location", 'cost': { '$sum': '$cost' } } }
])
When I change it like that, I don't get a result.
I noticed that when I run this code:
test = db.aggregate([{ "$project": { "year" : { "$year": "$posting_date" } } }])
I get output like {'_id': ObjectId('63b6e6c215161672a159417f'), 'year': 2023}, so the field year should be created correctly, shouldn't it? So why is it not working with the $match?
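A possible explanation, offered as a sketch rather than a confirmed diagnosis: a $project stage passes on only the fields it lists (plus _id), so after the projection above the documents no longer carry location or cost, and the later $match on location has nothing to match. One way around it is to skip $project and match on a date range instead. The helper name month_pipeline is hypothetical, and the field names date, location and cost are taken from the first snippet (the last snippet reads posting_date, so use whichever field the collection really has).

```python
import datetime


def month_pipeline(year, month, location, date_field="date"):
    """Build an aggregation pipeline that sums cost for one calendar month."""
    start = datetime.datetime(year, month, 1)
    # First day of the following month (rolls over to January after December).
    end = datetime.datetime(year + month // 12, month % 12 + 1, 1)
    return [
        # Half-open range: start <= date < end. No $project is needed,
        # so location and cost are still present for the later stages.
        {"$match": {date_field: {"$gte": start, "$lt": end}, "location": location}},
        {"$group": {"_id": "$location", "cost": {"$sum": "$cost"}}},
    ]


# Usage, assuming `db` is the collection object used in the question:
# test = list(db.aggregate(month_pipeline(2023, 1, location)))
```

Matching on a range also lets MongoDB use an index on the date field, which a match on a computed year or month cannot do.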

Related

Incorrect range value in Pymongo query

I have some data as shown below. I wrote a query that takes the latest data for certain regions, groups it according to date_name, and keeps only value1_gpu values between 2 and 8 (inclusive). But besides the values from 2 to 8, my query also returns out-of-range values such as 24. How can I write the correct query?
Data:
{ {
ID:0,
metadata: {"region": "eu-1"},
Date: 2011-05-02,
data:[
{
Name:'Chris',
value1_gpu:2,
created_date:2011-04-03,
Value3:10
},
{
Name:'Chris',
value1_gpu:2,
created_date:2011-04-01,
Value3:10
},
{
Name:'David',
value1_gpu:8,
created_date:2011-04-02,
Value3:30
},
{
Name:'Mary',
value1_gpu:12,
created_date:2011-04-03,
Value3:30
}
]
},
{
ID:1,
metadata: {"region": "eu-2"},
Date: 2011-05-01,
data:[
{
Name:'Chris',
value1_gpu:80,
created_date:2011-04-05,
Value3:100
},
{
Name:'David',
value1_gpu:60,
created_date:2011-04-05,
Value3:30
}
]
},
{
ID:2,
metadata: {"region": "eu-1"},
Date: 2011-04-29,
data:[
{
Name:'Chris1',
value1_gpu:2,
created_date:2011-04-03,
Value3:10
},
{
Name:'David',
value1_gpu:8,
created_date:2011-04-02,
Value3:30
},
{
Name:'Mary',
value1_gpu:12,
created_date:2011-04-03,
Value3:30
}
]
}}
My Query:
mongo.instances.aggregate([{"$match": {"$and": [{"metadata.region": 'eu-1'},
{"data.value1_gpu": { "$gte": 2, "$lte": 8}}]}},
{"$sort": {"Date": -1}},
{"$limit": 1},
{"$unwind": "$data"},
{"$group": {
"_id": "$data.name",
"created_date": {"$first": "$data.created"},
"value1_gpu": {"$first": "$data.value1_gpu"}
}}])
Example Output:
From your explanation, you just need to move the unwind stage to before the match stage that filters on value1_gpu. Matching before unwinding keeps every document that contains at least one in-range element, so the out-of-range elements of those documents come along too.
I removed the limit as I have no clue what you are trying to do with it.
You could also speed up your aggregation by moving the match for region to before the unwind stage, as well as adding a project stage before the unwind stage.
db.collection.aggregate([
{
"$unwind": "$data"
},
{
"$match": {
"$and": [
{
"metadata.region": "eu-1"
},
{
"data.value1_gpu": {
"$gte": 2,
"$lte": 8
}
}
]
}
},
{
"$sort": {
"Date": -1
}
},
{
"$group": {
"_id": "$data.Name",
"created_date": {
"$first": "$data.created_date"
},
"value1_gpu": {
"$first": "$data.value1_gpu"
}
}
}
])
Result from your dataset:
[
{
"_id": "David",
"created_date": "2011-04-02",
"value1_gpu": 8
},
{
"_id": "Chris1",
"created_date": "2011-04-03",
"value1_gpu": 2
},
{
"_id": "Chris",
"created_date": "2011-04-03",
"value1_gpu": 2
}
]
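The pipeline above keeps both conditions in a single match after the unwind. The faster variant mentioned earlier might look roughly like the following pymongo-style sketch. The function name gpu_range_pipeline and the choice of projected fields are assumptions for illustration; the field names come from the sample data.

```python
def gpu_range_pipeline(region, low, high):
    """Build a pipeline that filters by region first, then by unwound values."""
    return [
        # Cheap document-level filter first, so fewer documents get unwound.
        {"$match": {"metadata.region": region}},
        # Keep only the fields that the later stages need.
        {"$project": {"Date": 1, "data": 1}},
        # One output document per element of the data array.
        {"$unwind": "$data"},
        # Now the range check applies to each element individually.
        {"$match": {"data.value1_gpu": {"$gte": low, "$lte": high}}},
        {"$sort": {"Date": -1}},
        {"$group": {
            "_id": "$data.Name",
            "created_date": {"$first": "$data.created_date"},
            "value1_gpu": {"$first": "$data.value1_gpu"},
        }},
    ]


# Usage, assuming `mongo.instances` is the collection from the question:
# result = list(mongo.instances.aggregate(gpu_range_pipeline("eu-1", 2, 8)))
```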
MongoDB Playground

Pymongo Query with variable return empty object

I am new to Stack Overflow and I have the exact same issue as in this question. That topic is marked as answered, but the answer doesn't really help me solve my problem. Do you have a clue about the cause of this problem?
I have this piece of code that returns an empty object:
tmp = client['test']['Prenom'].aggregate([
{
'$match': {
'annais': 2014,
'preusuel': {
'$ne': '_PRENOMS_RARES'
}
}
}, {
'$group': {
'_id': '$preusuel',
'nombre': {
'$sum': '$nombre'
},
'occurence': {
'$sum': 1
}
}
}, {
'$sort': {
'nombre': -1
}
}, {
'$limit': 10
}
])
whereas :
tmp = client['test']['Prenom'].aggregate([
{
'$match': {
'annais': annee_j,
'preusuel': {
'$ne': '_PRENOMS_RARES'
}
}
}, {
'$group': {
'_id': '$preusuel',
'nombre': {
'$sum': '$nombre'
},
'occurence': {
'$sum': 1
}
}
}, {
'$sort': {
'nombre': -1
}
}, {
'$limit': 10
}
])
works perfectly and returns 10 names.
Thank you

How to get the sum of a value using an id condition over a date range with Elasticsearch?

I'm trying to write a query to get the sum of a value per month of documents with a particular Id. To do this I'm trying:
query = {
"size": 0,
"aggs" : {
"articles_over_time" : {
"date_histogram" : {
"field" : "timestamp",
"interval" : "month"
}
},
"value": {
"sum": {
"field": "generatedTotal"
}
}
}
}
This query will give me the sum of generatedTotal per month, but it is giving me the sum of generatedTotal for all documents. How can I specify to get the sum of generatedTotal per month for a particular generatorId?
Example of a document in the Elasticsearch index:
{'id': 0, 'timestamp': '2018-01-01', 'generatorId': '150', 'generatedTotal': 2166.8759558092734}
If you do it separately like that, it counts as 2 different aggregations. You first need to query for the specific generatorId that you want, then do the second aggs within the first aggs:
{
"size": 0,
"query": {
"term": {
"generatorId": "150"
}
},
"aggs": {
"articles_over_time": {
"date_histogram": {
"field": "timestamp",
"interval": "month"
},
"aggs": {
"monthlyGeneratedTotal": {
"sum": {
"field": "generatedTotal"
}
}
}
}
}
}
4 sample documents (1 has a different generatorId and will not be counted in the aggregations):
{"timestamp": "2018-02-01", "generatedTotal": 3, "generatorId": "150"}
{"timestamp": "2018-01-01", "generatedTotal": 1, "generatorId": "150"}
{"timestamp": "2018-01-01", "generatedTotal": 2, "generatorId": "150"}
{"timestamp": "2018-01-01", "generatedTotal": 2, "generatorId": "160"}
Then you will get the aggregations as follows:
{
"aggregations": {
"articles_over_time": {
"buckets": [
{
"key_as_string": "2018-01-01T00:00:00.000Z",
"key": 1514764800000,
"doc_count": 2,
"monthlyGeneratedTotal": {
"value": 3.0
}
},
{
"key_as_string": "2018-02-01T00:00:00.000Z",
"key": 1517443200000,
"doc_count": 1,
"monthlyGeneratedTotal": {
"value": 3.0
}
}
]
}
}
}
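If the query is built from Python, as in the question, the same structure can be produced by a small helper. This is only a sketch: the function name monthly_sum_query is hypothetical, and the field names are those of the example document. It keeps the "interval" key used above; newer Elasticsearch versions (7.x and later) expect "calendar_interval" in date_histogram instead.

```python
def monthly_sum_query(generator_id, value_field="generatedTotal"):
    """Build a search body that sums value_field per month for one generator."""
    return {
        "size": 0,  # only the aggregations are wanted, not the hits
        "query": {"term": {"generatorId": generator_id}},
        "aggs": {
            "articles_over_time": {
                "date_histogram": {"field": "timestamp", "interval": "month"},
                # Nested inside the histogram, so the sum is per bucket.
                "aggs": {
                    "monthlyGeneratedTotal": {"sum": {"field": value_field}}
                },
            }
        },
    }


query = monthly_sum_query("150")
```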
I hope this answers your question.

Sum for Multiple Ranges on GroupBy Aggregations in Elasticsearch

I need to aggregate documents with the following mapping on multiple levels, grouping on one field and then on another.
Mapping:
{
'predictions': {
'properties': {
'Company':{'type':'string'},
'TxnsId':{'type':'string'},
'Emp':{'type':'string'},
'Amount':{'type':'float'},
'Cash/online':{'type':'string'},
'items':{'type':'float'},
'timestamp':{'type':'date'}
}
}
}
My requirement is a bit complex. I need to:
For each Emp (getting the distinct employees)
Check whether it is an online or cashed transaction
Group by items with ranges like 0-10, 11-20, 21-30, ...
Sum the Amount
The final output should look like:
>Emp-online-range-Amount
>a-online-(0-10)-1240$
>a-online-(21-30)-3543$
>b-online-(0-10)-2345$
>b-online-(11-20)-3456$
Something like this should do the job:
{
"size": 0,
"aggs": {
"by_emp": {
"terms": {
"field": "Emp"
},
"aggs": {
"cash_online": {
"filters": {
"filters": {
"cashed": {
"term": {
"Cash/online": "cached"
}
},
"online": {
"term": {
"Cash/online": "online"
}
}
}
},
"aggs": {
"ranges": {
"range": {
"field": "items",
"ranges": [
{
"from": 0,
"to": 11
},
{
"from": 11,
"to": 21
},
{
"from": 21,
"to": 31
}
]
},
"aggs": {
"total": {
"sum": {
"field": "Amount"
}
}
}
}
}
}
}
}
}
}
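In a range aggregation "from" is inclusive and "to" is exclusive, which is why the buckets above run 0 to 11, 11 to 21 and 21 to 31. If more buckets are needed, the list can be generated rather than typed out. A minimal sketch; the helper name item_ranges and its parameters are hypothetical:

```python
def item_ranges(step=10, upper=30):
    """Build buckets for 0-step, (step+1)-(2*step), ... up to upper."""
    # The first bucket also contains 0, so it is one unit wider.
    ranges = [{"from": 0, "to": step + 1}]
    start = step + 1
    while start <= upper:
        # "to" is exclusive, so 11..21 covers the values 11 through 20.
        ranges.append({"from": start, "to": start + step})
        start += step
    return ranges


# The result can be used as the "ranges" list in the query above.
ranges = item_ranges(step=10, upper=30)
```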

ordering json in python mapping object

I am using Elasticsearch, where the query has to be posted as JSON in a specific order or else the result will be wrong. The problem is that Python changes the ordering of my JSON. My original JSON query is:
x= {
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*a*"
}
},
"filter": {
"and": {
"filters": [
{
"term": {
"city": "london"
}
},
{
"term": {
"industry.industry_not_analyed": "oil"
}
}
]
}
}
}
},
"facets": {
"industry": {
"terms": {
"field": "industry.industry_not_analyed"
}
},
"city": {
"terms": {
"field": "city.city_not_analyzed"
}
}
}
}
But the resulting Python object is as follows:
{
'query': {
'filtered': {
'filter': {
'and': {
'filters': [
{
'term': {
'city': 'london'
}
},
{
'term': {
'industry.industry_not_analyed': 'oil'
}
}
]
}
},
'query': {
'query_string': {
'query': '*a*'
}
}
}
},
'facets': {
'city': {
'terms': {
'field': 'city.city_not_analyzed'
}
},
'industry': {
'terms': {
'field': 'industry.industry_not_analyed'
}
}
}
}
The result is different from what I need. How do I solve this?
Use OrderedDict() instead of {}. Note that you can't simply use OrderedDict(query=...) because that would create an unordered dict in the background. Use this code instead:
from collections import OrderedDict
x = OrderedDict()
x['query'] = OrderedDict()
...
I suggest implementing a builder for this (and is a reserved word in Python, so the method needs another name such as and_):
x = Query().filtered().query_string("*a*").and_()....
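Spelled out for the query in the question, the OrderedDict approach could look like the sketch below. The variable names filtered, facets and body are illustrative. Note that since Python 3.7 a plain dict also preserves insertion order, so the reordering shown in the question should only appear on older interpreters.

```python
import json
from collections import OrderedDict

# Insert keys in exactly the order the request body should have.
filtered = OrderedDict()
filtered["query"] = {"query_string": {"query": "*a*"}}
filtered["filter"] = {
    "and": {
        "filters": [
            {"term": {"city": "london"}},
            {"term": {"industry.industry_not_analyed": "oil"}},
        ]
    }
}

facets = OrderedDict()
facets["industry"] = {"terms": {"field": "industry.industry_not_analyed"}}
facets["city"] = {"terms": {"field": "city.city_not_analyzed"}}

x = OrderedDict()
x["query"] = OrderedDict([("filtered", filtered)])
x["facets"] = facets

# json.dumps writes the keys in the order they were inserted.
body = json.dumps(x)
```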
