I have data like the sample below. I wrote a query that takes the latest document for certain regions, groups the entries by date_name, and restricts value1_gpu to values between 2 and 8 (including 2 and 8). But my query also returns out-of-range values such as 24. How can I write the correct query?
Data:
[ {
ID:0,
metadata: {"region": "eu-1"},
Date: 2011-05-02,
data:[
{
Name:'Chris',
value1_gpu:2,
created_date:2011-04-03,
Value3:10
},
{
Name:'Chris',
value1_gpu:2,
created_date:2011-04-01,
Value3:10
},
{
Name:'David',
value1_gpu:8,
created_date:2011-04-02,
Value3:30
},
{
Name:'Mary',
value1_gpu:12,
created_date:2011-04-03,
Value3:30
}
]
},
{
ID:1,
metadata: {"region": "eu-2"},
Date: 2011-05-01,
data:[
{
Name:'Chris',
value1_gpu:80,
created_date:2011-04-05,
Value3:100
},
{
Name:'David',
value1_gpu:60,
created_date:2011-04-05,
Value3:30
}
]
},
{
ID:2,
metadata: {"region": "eu-1"},
Date: 2011-04-29,
data:[
{
Name:'Chris1',
value1_gpu:2,
created_date:2011-04-03,
Value3:10
},
{
Name:'David',
value1_gpu:8,
created_date:2011-04-02,
Value3:30
},
{
Name:'Mary',
value1_gpu:12,
created_date:2011-04-03,
Value3:30
}
]
} ]
My Query:
mongo.instances.aggregate([{"$match": {"$and": [{"metadata.region": 'eu-1'},
{"data.value1_gpu": { "$gte": 2, "$lte": 8}}]}},
{"$sort": {"Date": -1}},
{"$limit": 1},
{"$unwind": "$data"},
{"$group": {
"_id": "$data.name",
"created_date": {"$first": "$data.created"},
"value1_gpu": {"$first": "$data.value1_gpu"}
}}])
From your explanation, you just need to move your $unwind stage before the $match stage that filters on value1_gpu.
I removed the $limit, as I have no clue what you are trying to do with it.
You could also speed up the aggregation by splitting the match: run the region match before the $unwind stage, and add a $project stage before the $unwind to drop fields you don't need.
db.collection.aggregate([
{
"$unwind": "$data"
},
{
"$match": {
"$and": [
{
"metadata.region": "eu-1"
},
{
"data.value1_gpu": {
"$gte": 2,
"$lte": 8
}
}
]
}
},
{
"$sort": {
"Date": -1
}
},
{
"$group": {
"_id": "$data.Name",
"created_date": {
"$first": "$data.created_date"
},
"value1_gpu": {
"$first": "$data.value1_gpu"
}
}
}
])
Result from your dataset:
[
{
"_id": "David",
"created_date": "2011-04-02",
"value1_gpu": 8
},
{
"_id": "Chris1",
"created_date": "2011-04-03",
"value1_gpu": 2
},
{
"_id": "Chris",
"created_date": "2011-04-03",
"value1_gpu": 2
}
]
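The underlying point can be sketched in plain Python: a $match on the whole document only checks that *some* element of data is in range, so the whole array comes back, while filtering after $unwind tests each element individually. (This is an illustrative sketch, not the pipeline itself.)

```python
def in_range_entries(doc, lo=2, hi=8):
    # Equivalent of matching after $unwind: test every array element,
    # instead of keeping the whole array because one element matches.
    return [d for d in doc["data"] if lo <= d["value1_gpu"] <= hi]

doc = {"metadata": {"region": "eu-1"},
       "data": [{"Name": "Chris", "value1_gpu": 2},
                {"Name": "David", "value1_gpu": 8},
                {"Name": "Mary", "value1_gpu": 12}]}
# Mary (12) is dropped; a document-level match would have kept her.
```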
Is it possible to sort an array by occurrences?
For Example, given
{
"_id": {
"$oid": "60d20d342c7951852a21s53a"
},
"site": "www.xyz.ie",
"A": ["mary", "jamie", "john", "mary", "mary", "john"],
}
return
{
"_id": {
"$oid": "60d20d342c7951852a21s53a"
},
"site": "www.xyz.ie",
"A": ["mary", "jamie", "john", "mary", "mary", "john"],
"sorted_A" : ["mary","john","jamie"]
}
I am able to get it most of the way there but I cannot figure out how to join them all back together in an array.
I have been using an aggregation pipeline
Starting with $match to find the site I want
Then $unwind on with path: "$A"
Next $sortByCount on "$A"
???? I can't figure out how to group it all back together.
Here is the pipeline:
[
{
'$match': {
'site': 'www.xyz.ie'
}
}, {
'$unwind': {
'path': '$A'
}
}, {
'$sortByCount': '$A'
}, {
????
}
]
$group by _id and A, take the first site, and count the elements
$sort by count in descending order
$group by only _id and get first site, and construct array of A
[
{ $match: { site: "www.xyz.ie" } },
{ $unwind: "$A" },
{
$group: {
_id: { _id: "$_id", A: "$A" },
site: { $first: "$site" },
count: { $sum: 1 }
}
},
{ $sort: { count: -1 } },
{
$group: {
_id: "$_id._id",
site: { $first: "$site" },
A: { $push: "$_id.A" }
}
}
]
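For intuition, the same grouping idea in plain Python: collections.Counter.most_common returns the distinct values ordered by descending count, which is exactly the sorted_A array.

```python
from collections import Counter

def sorted_by_occurrences(values):
    # Distinct values, most frequent first (ties keep first-seen order)
    return [v for v, _ in Counter(values).most_common()]

doc = {"site": "www.xyz.ie",
       "A": ["mary", "jamie", "john", "mary", "mary", "john"]}
doc["sorted_A"] = sorted_by_occurrences(doc["A"])
# → ["mary", "john", "jamie"]
```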
generate unique id in nested document - Pymongo
my database looks like this...
{
"_id":"5ea661d6213894a6082af6d1",
"blog_id":"blog_one",
"comments": [
{
"user_id":"1",
"comment":"comment for blog one this is good"
},
{
"user_id":"2",
"comment":"other for blog one"
},
]
}
I want to add unique id in each and every comment,
I want it to output like this,
{
"_id":"5ea661d6213894a6082af6d1",
"blog_id":"blog_one",
"comments": [
{
"id" : "something" (auto generate unique),
"user_id":"1",
"comment":"comment for blog one this is good"
},
{
"id" : "something" (auto generate unique),
"user_id":"2",
"comment":"other for blog one"
},
]
}
I'm using PyMongo. Is there a way to update this kind of document? Is it possible at all?
This update adds a unique id value to each nested document of the comments array. The id is calculated from the present time in milliseconds and incremented for each array element, so every nested document gets a distinct value.
The code runs with MongoDB version 4.2 and PyMongo 3.10.
import datetime

pipeline = [
{
"$set": {
"comments": {
"$map": {
"input": { "$range": [ 0, { "$size": "$comments" } ] },
"in": {
"$mergeObjects": [
{ "id": { "$add": [ { "$toLong" : datetime.datetime.now() }, "$$this" ] } },
{ "$arrayElemAt": [ "$comments", "$$this" ] }
]
}
}
}
}
}
]
collection.update_one( { }, pipeline )
The updated document:
{
"_id" : "5ea661d6213894a6082af6d1",
"blog_id" : "blog_one",
"comments" : [
{
"id" : NumberLong("1588179349566"),
"user_id" : "1",
"comment" : "comment for blog one this is good"
},
{
"id" : NumberLong("1588179349567"),
"user_id" : "2",
"comment" : "other for blog one"
}
]
}
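The id computation itself is easy to mirror client-side. A minimal sketch in plain Python, assuming the document is small enough to rewrite in one update:

```python
import datetime

def add_comment_ids(doc):
    # Same scheme as the pipeline: current time in milliseconds as a base,
    # incremented once per array element so every comment gets a distinct id
    base = int(datetime.datetime.now().timestamp() * 1000)
    for offset, comment in enumerate(doc["comments"]):
        comment["id"] = base + offset
    return doc
```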
[ EDIT ADD ]
The following works from the mongo shell. It adds a unique id to each nested document of the comments array, unique across all documents in the collection.
db.collection.aggregate( [
{ "$unwind": "$comments" },
{
"$group": {
"_id": null,
"count": { "$sum": 1 },
"docs": { "$push": "$$ROOT" },
"now": { $first: "$$NOW" }
}
},
{
"$addFields": {
"docs": {
"$map": {
"input": { "$range": [ 0, "$count" ] },
"in": {
"$mergeObjects": [
{ "comments_id": { "$add": [ { "$toLong" : "$now" }, "$$this" ] } },
{ "$arrayElemAt": [ "$docs", "$$this" ] }
]
}
}
}
}
},
{
"$unwind": "$docs"
},
{
"$addFields": {
"docs.comments.comments_id": "$docs.comments_id"
}
},
{
"$replaceRoot": { "newRoot": "$docs" }
},
{
"$group": {
"_id": { "_id": "$_id", "blog_id": "$blog_id" },
"comments": { "$push": "$comments" }
}
},
{
$project: {
"_id": "$_id._id",
"blog_id": "$_id.blog_id",
"comments": 1
}
}
] ).forEach(doc => db.blogs.updateOne( { _id: doc._id }, { $set: { comments: doc.comments } } ) )
You can use ObjectId constructor to create the ids and place them in your nested documents.
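A client-side sketch of that suggestion, using the standard library's uuid4 in place of bson.ObjectId so the snippet has no driver dependency; with PyMongo installed you would call bson.ObjectId() instead:

```python
import uuid

def assign_comment_ids(doc):
    # Give each comment a globally unique id
    # (uuid4 stands in for bson.ObjectId here)
    for comment in doc["comments"]:
        comment.setdefault("id", str(uuid.uuid4()))
    return doc
```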
I have a MongoDB document structure like following:
Structure
{
"stores": [
{
"items": [
{
"feedback": [],
"item_category": "101",
"item_id": "10"
},
{
"feedback": [],
"item_category": "101",
"item_id": "11"
}
]
},
{
"items": [
{
"feedback": [],
"item_category": "101",
"item_id": "10"
},
{
"feedback": ["A feedback"],
"item_category": "101",
"item_id": "11"
},
{
"feedback": [],
"item_category": "101",
"item_id": "12"
},
{
"feedback": [],
"item_category": "102",
"item_id": "13"
},
{
"feedback": [],
"item_category": "102",
"item_id": "14"
}
],
"store_id": 500
}
]
}
This is a single document in the collection. Some fields were removed to produce a minimal representation of the data.
What I want is to get items only if the feedback field in the items array is not empty. The expected result is:
Expected result
{
"stores": [
{
"items": [
{
"feedback": ["A feedback"],
"item_category": "101",
"item_id": "11"
}
],
"store_id": 500
}
]
}
This is what I tried, based on examples in this, which I think is pretty much the same situation, but it didn't work. What's wrong with my query? Isn't it the same situation as the zipcode search example in the link? It returns everything, like the first JSON code under Structure:
What I tried
query = {
'date': {'$gte': since, '$lte': until},
'stores.items': {"$elemMatch": {"feedback": {"$ne": []}}}
}
Thanks.
Please try this:
db.yourCollectionName.aggregate([
{ $match: { 'date': { '$gte': since, '$lte': until }, 'stores.items': { "$elemMatch": { "feedback": { "$ne": [] } } } } },
{ $unwind: '$stores' },
{ $match: { 'stores.items': { "$elemMatch": { "feedback": { "$ne": [] } } } } },
{ $unwind: '$stores.items' },
{ $match: { 'stores.items.feedback': { "$ne": [] } } },
{ $group: { _id: { _id: '$_id', store_id: '$stores.store_id' }, items: { $push: '$stores.items' } } },
{ $project: { _id: '$_id._id', store_id: '$_id.store_id', items: 1 } },
{ $group: { _id: '$_id', stores: { $push: '$$ROOT' } } },
{ $project: { 'stores._id': 0 } }
])
We need all these stages because you're operating on an array of arrays. This query is written assuming you're dealing with a large data set. Since you're already filtering on dates, if your documents are much fewer after the first $match, you can drop the $match stage that sits between the two $unwind's.
Refs:
$match,
$unwind,
$project,
$group
This aggregate query gets the needed result (using the provided sample document and run from the mongo shell):
db.stores.aggregate( [
{ $unwind: "$stores" },
{ $unwind: "$stores.items" },
{ $addFields: { feedbackExists: { $gt: [ { $size: "$stores.items.feedback" }, 0 ] } } },
{ $match: { feedbackExists: true } },
{ $project: { _id: 0, feedbackExists: 0 } }
] )
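For reference, the filtering both pipelines perform can be sketched in plain Python, which also documents the expected result shape:

```python
def items_with_feedback(doc):
    # Keep only items whose feedback array is non-empty;
    # drop stores that end up with no items at all
    stores = []
    for store in doc["stores"]:
        items = [i for i in store["items"] if i.get("feedback")]
        if items:
            stores.append({**store, "items": items})
    return {"stores": stores}
```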
Here is the query I tried to aggregate. My requirement is to get the total record count along with pagination (a limit of 15). I get results, but there seem to be duplicates. Please point out what I am missing.
db.collections.aggregate([
{
"$match": {
"application_id": "some_app_id"
}
},
{
"$lookup": {
"localField": "_id",
"from": "collection1",
"foreignField": "some_id",
"as": "merged"
}
},
{
"$unwind": {"path": "$merged"}
},
{
"$group": {
"_id": null,
"count": {"$sum": 1},
"posts": {
"$push":"$$ROOT"
}
}
},
{
"$unwind": "$posts"
},
{
"$group": {
"_id": "$posts._id",
"data":{"$addToSet":"$posts"},
"count": {"$first" : "$count"},
"posts": {
"$push":"$$ROOT"
}
}
},
{
"$project":{
"_id": 1,
"count": 1,
"data": 1
}
}
])
ACTUAL OUTPUT
"count" : 89 // Here the count is actually 33 but i'm getting 89 somehow
EXPECTED OUTPUT
"count" : 33 with no duplicates
I need to aggregate on multiple levels over one field, grouping documents using another field, with the following mapping.
Mapping:
{
'predictions': {
'properties': {
'Company':{'type':'string'},
'TxnsId':{'type':'string'},
'Emp':{'type':'string'},
'Amount':{'type':'float'},
'Cash/online':{'type':'string'},
'items':{'type':'float'},
'timestamp':{'type':'date'}
}
}
}
My requirement is a bit complex. I need to:
For each Emp (Getting the distinct employees)
Check whether it is online or cashed transaction
Group by items with the ranges like 0-10,11-20,21-30....
Sum the Amount
Final Output is like:
>Emp-online-range-Amount
>a-online-(0-10)-1240$
>a-online-(21-30)-3543$
>b-online-(0-10)-2345$
>b-online-(11-20)-3456$
Something like this should do the job:
{
"size": 0,
"aggs": {
"by_emp": {
"terms": {
"field": "Emp"
},
"aggs": {
"cash_online": {
"filters": {
"filters": {
"cashed": {
"term": {
"Cash/online": "cached"
}
},
"online": {
"term": {
"Cash/online": "online"
}
}
}
},
"aggs": {
"ranges": {
"range": {
"field": "items",
"ranges": [
{
"from": 0,
"to": 11
},
{
"from": 11,
"to": 21
},
{
"from": 21,
"to": 31
}
]
},
"aggs": {
"total": {
"sum": {
"field": "Amount"
}
}
}
}
}
}
}
}
}
}
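For intuition, the same grouping can be sketched in plain Python. Note the question asks for inclusive 0-10 / 11-20 buckets, while the Elasticsearch range aggregation's "to" bound is exclusive (hence "from": 0, "to": 11 above); the sketch assumes integer item counts.

```python
from collections import defaultdict

def bucket(items):
    # Inclusive ranges as in the question: 0-10, 11-20, 21-30, ...
    if items <= 10:
        return "(0-10)"
    upper = ((int(items) - 1) // 10 + 1) * 10
    return f"({upper - 9}-{upper})"

def summarize(txns):
    # Emp -> cash/online -> items-range -> summed Amount
    totals = defaultdict(float)
    for t in txns:
        totals[(t["Emp"], t["Cash/online"], bucket(t["items"]))] += t["Amount"]
    return dict(totals)
```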