How to get all documents with the max date? - python

I'm trying to get, from my MongoDB, all documents with the latest date.
My db looks like this:
_id:"1"
date:"21-12-20"
report:"some stuff"
_id:"2"
date:"11-11-11"
report:"qualcosa"
_id:5fe08735b5a28812866cbc8a
date:"21-12-20"
report:Object
_id:5fe0b35e2f465c2a2bbfc0fd
date:"20-12-20"
report:"ciao"
and I would like to get a result like this:
_id:"1"
date:"21-12-20"
report:"some stuff"
_id:5fe08735b5a28812866cbc8a
date:"21-12-20"
report:Object
I tried to run this script :
db.collection.find({}).sort([("date", -1)]).limit(1)
but it gives me only one document.
How can I get all the documents with the greatest date automatically?

Try to remove limit(1) and it's gonna work

If you add .limit(1) it's only ever going to give you one document.
Either use the result as a query to another .find(), or write an aggregate query. If your data set is a modest size, I prefer the former for clarity.
max_date = list(db.collection.find({}).sort([("date", -1)]).limit(1))
if len(max_date) > 0:
    results = db.collection.find({'date': max_date[0]['date']})
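One caveat with either approach: the dates in the question are stored as strings such as "21-12-20", so sorting on them compares text, not calendar dates. The sketch below works on plain Python dicts and assumes the format is day-month-year ("%d-%m-%y"); the question does not state the format, and the helper names are hypothetical.

```python
from datetime import datetime

# Sample documents mirroring the question. The "%d-%m-%y" format is an assumption.
docs = [
    {"_id": "1", "date": "21-12-20", "report": "some stuff"},
    {"_id": "2", "date": "11-11-11", "report": "qualcosa"},
    {"_id": "3", "date": "21-12-20", "report": {}},
    {"_id": "4", "date": "20-12-20", "report": "ciao"},
]

def parse_date(text, fmt="%d-%m-%y"):
    """Turn a date string into a datetime so comparisons are chronological."""
    return datetime.strptime(text, fmt)

def docs_with_latest_date(documents, fmt="%d-%m-%y"):
    """Return every document whose parsed date equals the latest parsed date."""
    if not documents:
        return []
    latest = max(parse_date(d["date"], fmt) for d in documents)
    return [d for d in documents if parse_date(d["date"], fmt) == latest]

latest_docs = docs_with_latest_date(docs)
```

If the dates were stored as real date values, or as strings in year-month-day order, the sort in the query itself would already be chronological.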

Use an aggregation pipeline like this:
db.collection.aggregate([
  { $group: { _id: null, data: { $push: "$$ROOT" } } },
  {
    $set: {
      data: {
        $filter: {
          input: "$data",
          cond: { $eq: [{ $max: "$data.date" }, "$$this.date"] }
        }
      }
    }
  },
  { $unwind: "$data" },
  { $replaceRoot: { newRoot: "$data" } }
])

Related

Mongodb update a particular word in a string in multiple documents

I am updating a mongodb collection for a small project of mine and I'm stuck with updating a single word in an existing field.
Example:
{
"_id" : ObjectId("5faa46a6036e146f85a4afef"),
"name" : "Kubernetes_cluster_setup - kubernetes-cluster"
}
In this document I want to change "name" to "Kubernetes_cluster_config - kubernetes-cluster".
I want setup to be replaced by config, without removing - kubernetes-cluster, which is a constant value.
The method I applied, $set, overwrites the entire field, but I don't want - kubernetes-cluster to be removed.
Try using the $replaceOne operator.
You need an aggregation like this.
db.collection.aggregate([
{
"$match": {
"id": 0
}
},
{
"$set": {
"name": {
"$replaceOne": {
"input": "$name",
"find": "setup",
"replacement": "config"
}
}
}
}
])
The first stage finds the document (I've matched by id) and the second replaces the value setup with config in the name field.
Also, if you want to replace the string for every document, you can use this query:
db.collection.aggregate([
{
"$match": {
"name": {
"$regex": "setup"
}
}
},
{
"$set": {
"name": {
"$replaceOne": {
"input": "$name",
"find": "setup",
"replacement": "config"
}
}
}
}
])
Here the query looks for the documents whose name field contains the word setup and then replaces it with config.
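Note that aggregate only returns the transformed documents; it does not write them back. One way to persist the change is a pipeline-style update, which needs MongoDB 4.2 or later (and 4.4 or later for $replaceOne). The sketch below is an assumption-laden illustration: the helper names are hypothetical, and `collection` is expected to be a PyMongo collection, whose update_many accepts a list of stages as the update.

```python
import re

def build_rename_update(find, replacement, field="name"):
    """Build the (filter, pipeline) pair for a pipeline-style update."""
    # Escape the search text so it is matched literally by $regex.
    query = {field: {"$regex": re.escape(find)}}
    pipeline = [
        {
            "$set": {
                field: {
                    "$replaceOne": {
                        "input": f"${field}",
                        "find": find,
                        "replacement": replacement,
                    }
                }
            }
        }
    ]
    return query, pipeline

def rename_in_collection(collection, find, replacement, field="name"):
    """Apply the replacement to every matching document in the collection."""
    query, pipeline = build_rename_update(find, replacement, field)
    return collection.update_many(query, pipeline)
```

With a real connection this would be called as rename_in_collection(db.collection, "setup", "config").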

Getting first entry of each unique combination of fields from elasticsearch

I have an elasticsearch database that I upload entries like this:
{"s_food": "bread", "s_store": "Safeway", "s_date" : "2020-06-30", "l_run" : 28900, "l_covered": 1}
When I upload it to elasticsearch, it adds _id, _type, @timestamp and _index fields. So the entries look something like this:
{"s_food": "bread", "s_store": "Safeway", "s_date" : "2020-06-30", "l_run" : 28900, "l_covered": 1, "_type": "_doc", "_index": "my_index", "_id": pe39u5hs874kee}
The way that I'm using the elasticsearch database results in the same original entries getting uploaded multiple times. In this example, I only care about the s_food, s_date, and l_run fields being a unique combination. Since I have so many entries, I'd like to use the elasticsearch scroll tool to go through all the matches. So far in elasticsearch, I've only seen people use aggregation to get buckets of each term and then they iterate over each partition. I would like to use something like aggregation to get an entire entry (just 1) for each unique combination of the three fields that I care about (food, date, run). Right now I use aggregation with a scroll like so:
GET /my-index/_search?scroll=25m
{
  size: 10000,
  aggs: {
    foods: {
      terms: {
        field: s_food
      },
      aggs: {
        dates: {
          terms: {
            field: s_date
          },
          aggs: {
            runs: {
              terms: {
                field: l_run
              }
            }
          }
        }
      }
    }
  }
}
Unfortunately this is only giving me the usual bucketed structure that I don't want. Is there something else I should try?
All you need is a top_hits aggregation with size: 1, nested under the innermost terms aggregation.
The query would look like this:
{
  "size": 10000,
  "aggs": {
    "foods": {
      "terms": {
        "field": "s_food"
      },
      "aggs": {
        "dates": {
          "terms": {
            "field": "s_date"
          },
          "aggs": {
            "runs": {
              "terms": {
                "field": "l_run"
              },
              "aggs": {
                "topOne": {
                  "top_hits": {
                    "size": 1
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
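To get from the bucketed response back to whole entries, the nested buckets can be walked in application code. This is a minimal sketch assuming the response has already been parsed into a Python dict and uses the aggregation names from the query above (foods, dates, runs, topOne); the sample response is made-up data and the function name is hypothetical.

```python
def first_hit_per_combination(response):
    """Yield one _source document per (food, date, run) bucket."""
    for food in response["aggregations"]["foods"]["buckets"]:
        for date in food["dates"]["buckets"]:
            for run in date["runs"]["buckets"]:
                hits = run["topOne"]["hits"]["hits"]
                if hits:
                    yield hits[0]["_source"]

# Made-up response fragment with a single (bread, 2020-06-30, 28900) bucket.
sample_response = {
    "aggregations": {
        "foods": {
            "buckets": [
                {
                    "key": "bread",
                    "dates": {
                        "buckets": [
                            {
                                "key_as_string": "2020-06-30",
                                "runs": {
                                    "buckets": [
                                        {
                                            "key": 28900,
                                            "doc_count": 3,
                                            "topOne": {
                                                "hits": {
                                                    "hits": [
                                                        {
                                                            "_source": {
                                                                "s_food": "bread",
                                                                "s_store": "Safeway",
                                                                "s_date": "2020-06-30",
                                                                "l_run": 28900,
                                                                "l_covered": 1,
                                                            }
                                                        }
                                                    ]
                                                }
                                            },
                                        }
                                    ]
                                },
                            }
                        ]
                    },
                }
            ]
        }
    }
}

unique_docs = list(first_hit_per_combination(sample_response))
```

Keep in mind that a terms aggregation returns only the top 10 buckets unless its own size is raised, so with many distinct values each terms block would likely need a larger size.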

Mongodb get count() of CommandCursor

I'm performing a search with this aggregate and would like to get my total count (to deal with my pagination).
results = mongo.db.perfumes.aggregate(
[
{"$match": {"$text": {"$search": db_query}}},
{
"$lookup": {
"from": "users",
"localField": "author",
"foreignField": "username",
"as": "creator",
}
},
{"$unwind": "$creator"},
{
"$project": {
"_id": "$_id",
"perfumeName": "$name",
"perfumeBrand": "$brand",
"perfumeDescription": "$description",
"date_updated": "$date_updated",
"perfumePicture": "$picture",
"isPublic": "$public",
"perfumeType": "$perfume_type",
"username": "$creator.username",
"firstName": "$creator.first_name",
"lastName": "$creator.last_name",
"profilePicture": "$creator.avatar",
}
},
{"$sort": {"perfumeName": 1}},
]
)
How could I get the count of results in my route so I can pass it to my template?
I cannot use results.count() as it is a CommandCursor.
Help please? Thank you!!
Calling len() on the results after converting the cursor to a list would be easier, but if you still want an aggregation query that returns the count and the actual docs at the same time, try $facet or $group:
Query 1 :
{
  $facet: {
    docs: [ { $match: {} } ], // passes all docs into an array field
    count: [ { $count: "count" } ] // counts no.of docs
  }
},
/** re-create count field from array of one object to just a number */
{
  $addFields: { count: { $arrayElemAt: [ "$count.count", 0 ] } }
}
Query 2 :
/** Group all docs without any condition & push all docs into an array field & count no.of docs flowing through iteration using `$sum` */
{
$group: { _id: "", docs: { $push: "$$ROOT" }, count: { $sum: 1 } }
}
Note:
Add one of these queries at the end of your current aggregation pipeline. Remember that if no docs remain after the $match or $unwind stages, the first query returns a document with docs: [] and no count field, while the second query returns nothing at all, so code accordingly.
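From the Python side, the first query could be wired in roughly as follows. This is a sketch under assumptions: both helper names are hypothetical, and the unpacking step treats a missing count field as zero, as described in the note above.

```python
def with_total_count(pipeline):
    """Return a copy of the pipeline with the $facet counting stages appended."""
    return list(pipeline) + [
        {
            "$facet": {
                "docs": [{"$match": {}}],        # pass every doc through
                "count": [{"$count": "count"}],  # count the docs
            }
        },
        # Turn count from a one-element array into a plain number.
        {"$addFields": {"count": {"$arrayElemAt": ["$count.count", 0]}}},
    ]

def unpack_counted(result_docs):
    """Split the single $facet result into (docs, total)."""
    first = next(iter(result_docs), None)
    if first is None:
        return [], 0
    # count is absent when no documents reached the $facet stage.
    return first.get("docs", []), first.get("count", 0)
```

In the route this might look like docs, total = unpack_counted(mongo.db.perfumes.aggregate(with_total_count(pipeline))). Since $facet packs everything into one document, the result is subject to the 16 MB document size limit, so this suits modest result sets.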
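The CommandCursor documentation shows that it does not support count().
You can use the length filter in the Jinja template instead, provided you pass the template a list (for example list(results)) rather than the cursor itself.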
{{ results | length }}
I hope the above helps.

Sorting nested array in mongodb

I have the following structure in a mongodb collection, as shown in the attached image.
I will have an index number, and inside that I will have records; records will be an array of arrays, as shown above.
I want to sort the data inside the records by score in the Math subject.
What is the best way to do this?
Thanks
You can use an aggregation with $unwind:
db.collection.aggregate([
  {
    $unwind: {
      path: "$Records",
      includeArrayIndex: "firstUnwind"
    }
  },
  {
    $unwind: {
      path: "$Records"
    }
  },
  // Assumes each inner record stores its score in a Math field
  { $sort: { "Records.Math": -1 } },
  {
    $group: {
      _id: {
        id: "$_id",
        firstUnwind: "$firstUnwind"
      },
      Index_Number: { $first: "$Index_Number" },
      Records: { $push: "$Records" }
    }
  },
  {
    $group: {
      _id: "$_id.id",
      Index_Number: { $first: "$Index_Number" },
      Records: { $push: "$Records" }
    }
  }
])

mongo db $push finding maximum number of something

I wanted to write pipeline code that gives me the 5 users with the most tweets. I tried to use $push; I looked up the MongoDB documentation and it also showed $sort. I get a syntax error on the text line, but at least to me it is not an obvious one.
It would be really nice if someone could point me in the right direction, since I watched some videos and read pages but did not find what is wrong with my code.
pipeline = [
{"$group" : {
"_id": "$user.screen_name",
{
"$push": {"texts" : "$text"}},
{
"$sort" : {"texts":-1}}},
{
"$limit" :5}}
]
The aggregation pipeline documentation gives a well-structured explanation of how aggregation works, with examples.
Also, you seem to be asking the same thing more than once.
Anyway, in your query $group should not contain $sort and $limit (they are separate stages), and $push is placed incorrectly: it has to be the value of a named field. So your aggregation query should be as below:
pipeline = [{
    "$group": {
        "_id": "$user.screen_name",
        "tweet_data": {
            "$push": {
                "texts": "$text"
            }
        }
    }
}, {
    "$sort": {
        "texts": -1
    }
}, {
    "$limit": 5
}]
"I wanted to write pipleline code, that gives me the 5 users with the most tweets"
I can't say if this is an improvement over @yogesh's answer, but given your description, you only need to count the tweets, not pass them all along your pipeline. At the very least, using $sum would be much more memory efficient:
pipeline = [{
    "$group": {
        "_id": "$user.screen_name",
        "count": {"$sum": 1}
    }
}, {
    "$sort": {
        "count": -1
    }
}, {
    "$limit": 5
}]
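To sanity-check what this pipeline computes, the same group, count, sort and limit steps can be reproduced on a small in-memory sample. This is only an illustrative sketch: the sample tweets are made up, the function name is hypothetical, and it assumes each tweet has a user.screen_name field as in the question.

```python
from collections import Counter

def top_users(tweets, n=5):
    """Count tweets per screen name and return the n most frequent."""
    counts = Counter(t["user"]["screen_name"] for t in tweets)
    return counts.most_common(n)

# Made-up sample data for illustration only.
sample_tweets = [
    {"user": {"screen_name": "alice"}, "text": "a"},
    {"user": {"screen_name": "bob"}, "text": "b"},
    {"user": {"screen_name": "alice"}, "text": "c"},
    {"user": {"screen_name": "carol"}, "text": "d"},
    {"user": {"screen_name": "alice"}, "text": "e"},
    {"user": {"screen_name": "bob"}, "text": "f"},
]

top_two = top_users(sample_tweets, n=2)
```

For a real collection the aggregation is still preferable, since the counting happens in the database rather than after loading every tweet into memory.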
