How to aggregate on each item in a collection in MongoDB - Python

MongoDB noob here...
When I run db.students.find().pretty() in the shell I get a long list of documents from my collection, like so:
{
    "_id" : 19,
    "name" : "Gisela Levin",
    "scores" : [
        {
            "type" : "exam",
            "score" : 44.51211101958831
        },
        {
            "type" : "quiz",
            "score" : 0.6578497966368002
        },
        {
            "type" : "homework",
            "score" : 93.36341655949683
        },
        {
            "type" : "homework",
            "score" : 49.43132782777443
        }
    ]
}
Now I've got over 100 of these, and I need to run the following on each of them:
lowest_hw_score = db.students.aggregate(
    // Initial document match (uses index, if a suitable one is available)
    { $match: {
        _id : 0
    }},
    // Expand the scores array into a stream of documents
    { $unwind: '$scores' },
    // Filter to 'homework' scores
    { $match: {
        'scores.type': 'homework'
    }},
    // Sort ascending so the lowest score comes first
    { $sort: {
        'scores.score': 1
    }},
    { $limit: 1 }
)
So I can run something like this on each result:
for item in lowest_hw_score:
    print item
Right now lowest_hw_score works on only one item. I want to run this on all items in the collection... how do I do this?

> db.students.aggregate([
    { $match: { 'scores.type': 'homework' } },
    { $unwind: "$scores" },
    { $match: { "scores.type": "homework" } },
    { $group: {
        _id: "$_id",
        maxScore: { $max: "$scores.score" },
        minScore: { $min: "$scores.score" }
    }}
]);
You don't really need the first $match, but if "scores.type" is indexed, the index can be used before unwinding the scores. (I don't believe MongoDB can use the index after the $unwind.)
Result:
{
    "result" : [
        {
            "_id" : 19,
            "maxScore" : 93.36341655949683,
            "minScore" : 49.43132782777443
        }
    ],
    "ok" : 1
}
Edit: tested and updated in mongo shell
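Since the original question was about Python, here is a minimal pymongo sketch of the same pipeline. The connection string, database name, and use of Python 3 are assumptions; adjust them to your setup.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["school"]                              # assumed database name

pipeline = [
    # Optional pre-filter; lets MongoDB use an index on scores.type before $unwind
    {"$match": {"scores.type": "homework"}},
    # One document per element of the scores array
    {"$unwind": "$scores"},
    # Keep only the homework scores
    {"$match": {"scores.type": "homework"}},
    # One output document per student with their min/max homework score
    {"$group": {
        "_id": "$_id",
        "maxScore": {"$max": "$scores.score"},
        "minScore": {"$min": "$scores.score"},
    }},
]

for doc in db.students.aggregate(pipeline):
    print(doc["_id"], doc["minScore"], doc["maxScore"])
With a reasonably recent pymongo, aggregate() returns a cursor you can iterate directly, so no "result" key is involved on the Python side.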

Related

MongoDB find nested dict element

{
    "_id" : ObjectId("63920f965d15e98e3d7c450c"),
    "first_name" : "mymy",
    "last_activity" : 1669278303.4341061,
    "username" : null,
    "dates" : {
        "29.11.2022" : {
        },
        "30.11.2022" : {
        }
    },
    "user_id" : "1085116517"
}
How can I find all documents that contain 29.11.2022 as a key in dates? I tried many things, but in all of them the dots in the key are interpreted as field-path separators.
Use $getField in $expr.
db.collection.find({
    $expr: {
        $eq: [
            {},
            {
                "$getField": {
                    "field": "29.11.2022",
                    "input": "$dates"
                }
            }
        ]
    }
})
Mongo Playground
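Note that $getField requires MongoDB 5.0 or newer. For completeness, a hedged pymongo sketch of the same query; the connection string and database/collection names are assumptions, and the comparison against {} mirrors the sample data, where each date key holds an empty sub-document.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
coll = client["mydb"]["users"]                     # assumed db/collection names

date_key = "29.11.2022"
cursor = coll.find({
    "$expr": {
        "$eq": [
            {},  # the sample stores an empty object under each date key
            {"$getField": {"field": date_key, "input": "$dates"}},
        ]
    }
})
for doc in cursor:
    print(doc["user_id"])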

MongoDB - Pull and Update in a single query

I have the following schema -
{
    "_id" : ObjectId("60c3253f19862e6347bc9f4e"),
    "farm_id": "Gustavo-chainer",
    "first_ts" : ISODate("2021-05-18T09:53:00.000Z"),
    "last_ts" : ISODate("2021-05-18T12:53:00.000Z"),
    "sensor_data" : [
        {
            "data" : 76.0,
            "sensor": "temperature-sensor",
            "start_ts" : ISODate("2021-05-18T09:33:00.000Z"),
            "end_ts" : ISODate("2021-05-18T09:53:00.000Z")
        },
        {
            "data" : 74.0,
            "sensor": "temperature-sensor",
            "start_ts" : ISODate("2021-05-18T12:33:00.000Z"),
            "end_ts" : ISODate("2021-05-18T12:53:00.000Z")
        }
    ]
}
where first_ts is the minimum of all start_ts values in the sensor_data array and last_ts is the maximum of all end_ts values in the sensor_data array.
I want to delete a data point from the sensor_data array given its start_ts and end_ts, and after deletion update first_ts and last_ts accordingly.
Example -
Delete data point with "start_ts" : ISODate("2021-05-18T12:33:00.000Z") and "end_ts" : ISODate("2021-05-18T12:53:00.000Z"). After deletion, the document should look like -
{
    "_id" : ObjectId("60c3253f19862e6347bc9f4e"),
    "first_ts" : ISODate("2021-05-18T09:53:00.000Z"),
    "last_ts" : ISODate("2021-05-18T09:53:00.000Z"),
    "sensor_data" : [
        {
            "data" : 76.0,
            "sensor": "temperature-sensor",
            "start_ts" : ISODate("2021-05-18T09:33:00.000Z"),
            "end_ts" : ISODate("2021-05-18T09:53:00.000Z")
        }
    ]
}
I need to write a pymongo query that can do the above task in a single query.
You can try an update with an aggregation pipeline, available starting from MongoDB 4.2:
$filter to iterate over the sensor_data array, checking both date fields and wrapping the condition in $not so the matching element is excluded
$min to get the minimum start_ts date from sensor_data.start_ts
$max to get the maximum end_ts date from sensor_data.end_ts
collection.update(
    {
        sensor_data: {
            $elemMatch: {
                start_ts: ISODate("2021-05-18T12:33:00.000Z"),
                end_ts: ISODate("2021-05-18T12:53:00.000Z")
            }
        }
    },
    [
        {
            $set: {
                sensor_data: {
                    $filter: {
                        input: "$sensor_data",
                        cond: {
                            $not: {
                                $and: [
                                    { $eq: ["$$this.start_ts", ISODate("2021-05-18T12:33:00.000Z")] },
                                    { $eq: ["$$this.end_ts", ISODate("2021-05-18T12:53:00.000Z")] }
                                ]
                            }
                        }
                    }
                }
            }
        },
        {
            $set: {
                first_ts: { $min: "$sensor_data.start_ts" },
                last_ts: { $max: "$sensor_data.end_ts" }
            }
        }
    ],
    { multi: true }
)
Playground
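Since the question asks for pymongo, the same update-with-pipeline can be sent with update_many, which accepts an aggregation pipeline as the update document (MongoDB 4.2+). A minimal sketch, assuming a local server and made-up database/collection names; datetime values take the place of ISODate.
from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed connection string
coll = client["farm"]["readings"]                    # assumed db/collection names

start_ts = datetime(2021, 5, 18, 12, 33)  # element to remove (times stored as UTC)
end_ts = datetime(2021, 5, 18, 12, 53)

result = coll.update_many(
    {"sensor_data": {"$elemMatch": {"start_ts": start_ts, "end_ts": end_ts}}},
    [
        # Drop the matching element from the array
        {"$set": {
            "sensor_data": {
                "$filter": {
                    "input": "$sensor_data",
                    "cond": {"$not": {"$and": [
                        {"$eq": ["$$this.start_ts", start_ts]},
                        {"$eq": ["$$this.end_ts", end_ts]},
                    ]}},
                }
            }
        }},
        # Recompute the summary timestamps from what is left
        {"$set": {
            "first_ts": {"$min": "$sensor_data.start_ts"},
            "last_ts": {"$max": "$sensor_data.end_ts"},
        }},
    ],
)
print(result.modified_count)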

My code is working in MongoDB but not working in pymongo

I have documents in a collection and I want to find a document and update elements of a list.
Here is sample data:
{
    "_id" : ObjectId("5edd3faaf6c9d938e0bfd966"),
    "id" : 1,
    "status" : "XXX",
    "number" : [
        {
            "code" : "AAA"
        },
        {
            "code" : "CVB"
        },
        {
            "code" : "AAA"
        },
        {
            "code" : "BBB"
        }
    ]
},
{
    "_id" : ObjectId("asseffsfpo2dedefwef"),
    "id" : 2,
    "status" : "TUY",
    "number" : [
        {
            "code" : "PPP"
        },
        {
            "code" : "SSD"
        },
        {
            "code" : "HDD"
        },
        {
            "code" : "IOO"
        }
    ]
}
I planned to find documents where "id": 1 and the value of number.code is in ["AAA", "BBB"], and change number.code to "VVV". I did it with the following code:
db.test.update(
    {
        id: 1,
        "number.code": { $in: ["AAA", "BBB"] }
    },
    {
        $set: { "number.$[elem].code": "VVV" }
    },
    {
        "arrayFilters": [{ "elem.code": { $in: ["AAA", "BBB"] } }],
        "multi": true,
        "upsert": false
    }
)
It works in the MongoDB shell, but in Python (with pymongo) it fails with the following error:
raise TypeError("%s must be True or False" % (option,))
TypeError: upsert must be True or False
Please help me. What can I do?
pymongo just has syntax that's a tad different. It would look like this:
db.test.update_many(
    {
        "id": 1,
        "number.code": {"$in": ["AAA", "BBB"]}
    },
    {
        "$set": {"number.$[elem].code": "VVV"}
    },
    array_filters=[{"elem.code": {"$in": ["AAA", "BBB"]}}],
    upsert=False
)
The multi flag is not needed with update_many.
upsert is False by default, so it is also redundant.
You can find pymongo's docs here.
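If you want to confirm the call did something, update_many returns an UpdateResult; a small usage sketch (collection name as above):
result = db.test.update_many(
    {"id": 1, "number.code": {"$in": ["AAA", "BBB"]}},
    {"$set": {"number.$[elem].code": "VVV"}},
    array_filters=[{"elem.code": {"$in": ["AAA", "BBB"]}}],
)
# How many documents matched vs. were actually changed
print(result.matched_count, result.modified_count)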

How to group connected documents (nodes of a graph) in MongoDB

Assume that I have a directed (one-way) graph with nodes A, B, C, D, E such that
A->B->C is one connected component,
D->E is another connected component.
Nodes are saved in MongoDB as documents
{name:'A', child: 'B'}, {name:'B', child: 'C'}, {name:'C'},
{name:'D', child: 'E'}, {name:'E'}
How to get all connected components?
Expected result: 2 groups
[{name:'A'...},{name:'B'...}, {name:'C'...}],[{name:'D'...}, {name:'E'...}]
Use the $graphLookup pipeline stage.
$group - collects all possible values of the name and child fields.
$addFields - builds a roots array containing the name values that never appear as a child (the root documents).
$unwind - splits the roots array into separate documents.
$graphLookup - collects the documents reachable from each root by following the child -> name links.
$project - removes the unnecessary fields from the result documents.
Query:
db.getCollection('t').aggregate([
    {
        $group: {
            _id: null,
            names: { $addToSet: "$name" },
            childs: { $addToSet: "$child" }
        }
    },
    {
        $addFields: {
            roots: { $setDifference: ["$names", "$childs"] }
        }
    },
    { $unwind: "$roots" },
    {
        $graphLookup: {
            from: "t",
            startWith: "$roots",
            connectFromField: "child",
            connectToField: "name",
            as: "related"
        }
    },
    {
        $project: {
            "related": 1,
            "_id": 0
        }
    }
])
Result:
/* 1 */
{
    "related" : [
        {
            "_id" : ObjectId("5a29316a545eb40950c33bc8"),
            "name" : "C"
        },
        {
            "_id" : ObjectId("5a29316a545eb40950c33bc7"),
            "name" : "B",
            "child" : "C"
        },
        {
            "_id" : ObjectId("5a29316a545eb40950c33bc6"),
            "name" : "A",
            "child" : "B"
        }
    ]
}
/* 2 */
{
    "related" : [
        {
            "_id" : ObjectId("5a29316a545eb40950c33bca"),
            "name" : "E"
        },
        {
            "_id" : ObjectId("5a29316a545eb40950c33bc9"),
            "name" : "D",
            "child" : "E"
        }
    ]
}
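If you need to run this from Python, the same pipeline works unchanged with pymongo. A minimal sketch; the connection string and database name are assumptions, the collection is 't' as above.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
coll = client["graphdb"]["t"]                      # assumed database name

pipeline = [
    {"$group": {"_id": None,
                "names": {"$addToSet": "$name"},
                "childs": {"$addToSet": "$child"}}},
    {"$addFields": {"roots": {"$setDifference": ["$names", "$childs"]}}},
    {"$unwind": "$roots"},
    {"$graphLookup": {"from": "t", "startWith": "$roots",
                      "connectFromField": "child", "connectToField": "name",
                      "as": "related"}},
    {"$project": {"related": 1, "_id": 0}},
]

# One list of documents per connected component
components = [doc["related"] for doc in coll.aggregate(pipeline)]
print(len(components))  # 2 for the sample data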

Pymongo: Remove duplicates with a condition in MongoDB

Thanks for reading my question.
Please excuse any mistakes; I'm working on improving my English.
I have > 4000 records in my MongoDB collection; this is one of them:
{
    "_id" : ObjectId("5763821ffefb61074041477e"),
    "sessionId" : "5138A3B4A5966CE4B2203B8BFC90055F",
    "objects" : [
        {
            "id" : "334449673730",
            "point" : 0.5
        },
        {
            "id" : "790373008255",
            "point" : 0.5
        },
        {
            "id" : "790373008255",
            "point" : 1.0
        },
        {
            "id" : "572453522243",
            "point" : 0.5
        },
        {
            "id" : "572453522243",
            "point" : 1.0
        }
    ]
}
In my result, I want to delete the duplicate ids but keep the entry with point: 1.0.
Result:
{
    "_id" : ObjectId("5763821ffefb61074041477e"),
    "sessionId" : "5138A3B4A5966CE4B2203B8BFC90055F",
    "objects" : [
        {
            "id" : "334449673730",
            "point" : 0.5
        },
        {
            "id" : "790373008255",
            "point" : 1.0
        },
        {
            "id" : "572453522243",
            "point" : 1.0
        }
    ]
}
I followed this post: How to remove duplicates with a certain condition in mongodb?
It's similar to my question, but I don't know why the result is not what I want:
pipeline = [
    {
        "$group": {
            "_id": "$id",
            "count": { "$sum": 1 },
            # "uniqueIds": { "$addToSet": "$_id" },
            "Point": { "$max": "$point" }
        }
    },
    {
        "$match": {
            "count": { "$gte": 1 }
        }
    }
]

for test_item in collection_forTest.aggregate(pipeline):
    print(test_item)
Result :
{'Point': None, 'count': 1, '_id': None}
I could use Python code: load all records, find objects in the list that share the same id, keep the one with point = 1 and remove the duplicates with point != 1 (a rough sketch of this approach is shown below), but I think that is slower than aggregation.
Can you help me with my problem for all > 4000 records?
Thanks very much!
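As a side note, the pipeline above groups on a top-level "$id" field that these documents do not have (the ids live inside the objects array), which is why it returns {'Point': None, 'count': 1, '_id': None}. Here is a minimal sketch of the Python-side approach described above, keeping the object with the highest point per id and writing the deduplicated array back; the connection string, database, and collection names are assumptions.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed connection string
coll = client["mydb"]["sessions"]                   # assumed db/collection names

for doc in coll.find({}):
    best = {}  # id -> the object with the highest point seen so far
    for obj in doc["objects"]:
        kept = best.get(obj["id"])
        if kept is None or obj["point"] > kept["point"]:
            best[obj["id"]] = obj
    deduped = list(best.values())  # preserves first-seen order of ids (Python 3.7+)
    if len(deduped) != len(doc["objects"]):
        coll.update_one({"_id": doc["_id"]}, {"$set": {"objects": deduped}})
This reads and rewrites each document once, which should be acceptable for a few thousand records, though an update with an aggregation pipeline could do the same server-side.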
