I have the following structure in a MongoDB collection, as shown in the attached image:
I will have an index number and, inside that, records; the records field is an array of arrays, as shown above.
I want to sort the data inside the records by the score in the Math subject.
What is the best way to do this?
thanks
You can use the $unwind aggregation stage: unwind the outer array (keeping its index with includeArrayIndex), unwind the inner array, sort by the Math score, then regroup twice to rebuild the nested structure:
db.collection.aggregate([
    {
        $unwind: {
            path: "$Records",
            includeArrayIndex: "firstUnwind"
        }
    },
    {
        $unwind: {
            path: "$Records"
        }
    },
    // sort by the Math score inside each record
    // (adjust the field path to match your document structure)
    { $sort: { "Records.Math": -1 } },
    {
        $group: {
            _id: {
                id: "$_id",
                firstUnwind: "$firstUnwind"
            },
            Index_Number: { $first: "$Index_Number" },
            Records: { $push: "$Records" }
        }
    },
    {
        $group: {
            _id: "$_id.id",
            Index_Number: { $first: "$Index_Number" },
            Records: { $push: "$Records" }
        }
    }
])
I'm trying to get, from my MongoDB, all documents with the highest date.
My db looks like:
_id:"1"
date:"21-12-20"
report:"some stuff"
_id:"2"
date:"11-11-11"
report:"qualcosa"
_id:5fe08735b5a28812866cbc8a
date:"21-12-20"
report:Object
_id:5fe0b35e2f465c2a2bbfc0fd
date:"20-12-20"
report:"ciao"
and I would like to have a result like:
_id:"1"
date:"21-12-20"
report:"some stuff"
_id:5fe08735b5a28812866cbc8a
date:"21-12-20"
report:Object
I tried to run this script:
db.collection.find({}).sort([("date", -1)]).limit(1)
but it gives me only one document.
How can I get all the documents with the greatest date automatically?
Try removing limit(1) and it should work.
If you add .limit(1) it's only ever going to give you one document.
Either use the answer as a query to another .find(), or write an aggregate query. If your data set is a modest size, I prefer the former for clarity.
max_date = list(db.collection.find({}).sort([("date", -1)]).limit(1))
if len(max_date) > 0:
    db.collection.find({'date': max_date[0]['date']})
Use an aggregation pipeline like this:
db.collection.aggregate([
    { $group: { _id: null, data: { $push: "$$ROOT" } } },
    {
        $set: {
            data: {
                $filter: {
                    input: "$data",
                    cond: { $eq: [{ $max: "$data.date" }, "$$this.date"] }
                }
            }
        }
    },
    { $unwind: "$data" },
    { $replaceRoot: { newRoot: "$data" } }
])
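If you're running this from Python (the sort syntax in the question looks like pymongo), here is a minimal sketch of the same pipeline; the connection string and db/collection names are assumptions:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
coll = client["mydb"]["collection"]                # assumed db/collection names

pipeline = [
    # gather every document into a single array
    {"$group": {"_id": None, "data": {"$push": "$$ROOT"}}},
    # keep only the documents whose date equals the maximum date
    {"$set": {"data": {"$filter": {
        "input": "$data",
        "cond": {"$eq": [{"$max": "$data.date"}, "$$this.date"]},
    }}}},
    {"$unwind": "$data"},
    {"$replaceRoot": {"newRoot": "$data"}},
]

for doc in coll.aggregate(pipeline):
    print(doc)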
I have an elasticsearch database that I upload entries like this:
{"s_food": "bread", "s_store": "Safeway", "s_date" : "2020-06-30", "l_run" : 28900, "l_covered": 1}
When I upload it to elasticsearch, it adds an _id, _type, #timestamp and _index fields. So the entries look sort of like this:
{"s_food": "bread", "s_store": "Safeway", "s_date" : "2020-06-30", "l_run" : 28900, "l_covered": 1, "_type": "_doc", "_index": "my_index", "_id": pe39u5hs874kee}
The way that I'm using the elasticsearch database results in the same original entries getting uploaded multiple times. In this example, I only care about the s_food, s_date, and l_run fields being a unique combination. Since I have so many entries, I'd like to use the elasticsearch scroll tool to go through all the matches. So far in elasticsearch, I've only seen people use aggregation to get buckets of each term and then they iterate over each partition. I would like to use something like aggregation to get an entire entry (just 1) for each unique combination of the three fields that I care about (food, date, run). Right now I use aggregation with a scroll like so:
GET /my-index/_search?scroll=25m
{
  "size": 10000,
  "aggs": {
    "foods": {
      "terms": { "field": "s_food" },
      "aggs": {
        "dates": {
          "terms": { "field": "s_date" },
          "aggs": {
            "runs": {
              "terms": { "field": "l_run" }
            }
          }
        }
      }
    }
  }
}
Unfortunately this is only giving me the usual bucketed structure that I don't want. Is there something else I should try?
All you need is to add a top_hits aggregation with size: 1 at the innermost level; see the Elasticsearch documentation on top_hits for details.
The query would look like this:
{
  "size": 10000,
  "aggs": {
    "foods": {
      "terms": { "field": "s_food" },
      "aggs": {
        "dates": {
          "terms": { "field": "s_date" },
          "aggs": {
            "runs": {
              "terms": { "field": "l_run" },
              "aggs": {
                "topOne": {
                  "top_hits": { "size": 1 }
                }
              }
            }
          }
        }
      }
    }
  }
}
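If it helps, here is a rough sketch of consuming those buckets from Python with the official elasticsearch client. The endpoint and client call style are assumptions; size is set to 0 since only the aggregation buckets are needed, and note that each terms level returns at most 10 buckets by default, so you may need to raise their size:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed endpoint

body = {
    "size": 0,  # no raw hits needed, only the aggregation buckets
    "aggs": {
        "foods": {
            "terms": {"field": "s_food"},  # raise "size" here if >10 foods
            "aggs": {
                "dates": {
                    "terms": {"field": "s_date"},
                    "aggs": {
                        "runs": {
                            "terms": {"field": "l_run"},
                            "aggs": {
                                # one representative document per unique combination
                                "topOne": {"top_hits": {"size": 1}}
                            }
                        }
                    }
                }
            }
        }
    }
}

resp = es.search(index="my_index", body=body)
# walk the nested buckets; each leaf carries exactly one representative hit
for food in resp["aggregations"]["foods"]["buckets"]:
    for date in food["dates"]["buckets"]:
        for run in date["runs"]["buckets"]:
            doc = run["topOne"]["hits"]["hits"][0]["_source"]
            print(doc["s_food"], doc["s_date"], doc["l_run"])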
I'm performing a search with this aggregate and would like to get my total count (to deal with my pagination).
results = mongo.db.perfumes.aggregate(
    [
        {"$match": {"$text": {"$search": db_query}}},
        {
            "$lookup": {
                "from": "users",
                "localField": "author",
                "foreignField": "username",
                "as": "creator",
            }
        },
        {"$unwind": "$creator"},
        {
            "$project": {
                "_id": "$_id",
                "perfumeName": "$name",
                "perfumeBrand": "$brand",
                "perfumeDescription": "$description",
                "date_updated": "$date_updated",
                "perfumePicture": "$picture",
                "isPublic": "$public",
                "perfumeType": "$perfume_type",
                "username": "$creator.username",
                "firstName": "$creator.first_name",
                "lastName": "$creator.last_name",
                "profilePicture": "$creator.avatar",
            }
        },
        {"$sort": {"perfumeName": 1}},
    ]
)
How could I get the count of results in my route so I can pass it to my template?
I cannot use results.count() as it is a CommandCursor.
Help please? Thank you!!
Converting the cursor to a list and using len() would be easier, but if you still want an aggregation query that returns the count and the actual docs at the same time, try using $facet or $group:
Query 1:
{
    $facet: {
        docs: [ { $match: {} } ],      // passes all docs into an array field
        count: [ { $count: "count" } ] // counts no. of docs
    }
},
/** re-create count field from an array of one object to just a number */
{
    $addFields: { count: { $arrayElemAt: [ "$count.count", 0 ] } }
}
Test: mongoplayground
Query 2:
/** Group all docs without any condition, push all docs into an array field & count the docs using `$sum` */
{
    $group: { _id: "", docs: { $push: "$$ROOT" }, count: { $sum: 1 } }
}
Test: mongoplayground
Note:
Add one of these at the end of your current aggregation pipeline, and remember that if no docs remain after the $match or $unwind stages, the first query will have docs: [] but no count field at all, while the second query will just return []; code for both cases accordingly.
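For example, here is a minimal sketch of wiring Query 1 onto the pipeline from the question (names such as mongo.db.perfumes and db_query are taken from the question):

pipeline = [
    {"$match": {"$text": {"$search": db_query}}},
    # ... the $lookup / $unwind / $project / $sort stages from the question ...
    {
        "$facet": {
            "docs": [{"$match": {}}],        # all surviving docs
            "count": [{"$count": "count"}],  # number of surviving docs
        }
    },
    {"$addFields": {"count": {"$arrayElemAt": ["$count.count", 0]}}},
]

result = list(mongo.db.perfumes.aggregate(pipeline))[0]
perfumes = result["docs"]
total = result.get("count", 0)  # the count field is absent when nothing matched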
If you look at the CommandCursor's docs, it does not support count()
You can use the length filter in a Jinja template, but convert the cursor to a list first (e.g. results = list(results)), since a CommandCursor has no length:
{{ results | length }}
I hope the above helps.
I have a collection where the objects have a structure similar to
{'_id': ObjectId('5e691cb9e73282f624362221'),
'created_at': 'Tue Mar 10 09:23:54 +0000 2020',
'id': 1237308186757120001,
'id_str': '1237308186757120001',
'full_text': 'See you in July'}
I am struggling to keep only objects which have a unique full_text. Using distinct only gives me a list of the distinct full_text field values, whereas I want to keep in the collection only the objects with unique full_text values.
One option is to deduplicate on the client side; the code could look like this:
#New clean dictionary keyed by full_text
unique = {}
#Go through the documents in the collection
for doc in collection.find({}):
    key = doc["full_text"]
    if key in unique:
        #The full_text already exists in the new dictionary
        continue
    else:
        #Otherwise keep the first document seen with this full_text
        unique[key] = doc
print(unique)
I hope this helps you!
There are 2 ways:
MongoDB way
We perform a MongoDB aggregation that groups records by full_text, keeps only the documents whose full_text occurs exactly once, and writes them into a new collection (run in the shell).
db.collection.aggregate([
    {
        $group: {
            _id: "$full_text",
            data: { $push: "$$ROOT" },
            count: { $sum: 1 }
        }
    },
    {
        $match: { count: { $eq: 1 } }
    },
    {
        $addFields: {
            data: { $arrayElemAt: ["$data", 0] }
        }
    },
    {
        $replaceRoot: { newRoot: "$data" }
    },
    {
        $out: "tmp"
    }
])
When you run this query, it will create a new collection tmp containing only the documents with unique full_text values. You can drop the old collection and rename the new one.
You may also put your collection name directly into the $out operator, like {$out: "collection"}, but then there is no going back.
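If you go the tmp route, the drop-and-rename step might look like this from pymongo (connection and collection names are assumptions):

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["mydb"]  # assumed connection/db names

db.collection.drop()         # drop the old collection
db.tmp.rename("collection")  # rename tmp to take its place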
Python way
We perform a MongoDB aggregation that groups by the full_text field, keeps only the duplicated groups, and collects all of their _id values into a single array. Once MongoDB returns the results, we execute a remove command for those duplicate documents.
db.collection.aggregate([
    {
        $group: {
            _id: "$full_text",
            data: { $push: "$_id" },
            count: { $sum: 1 }
        }
    },
    {
        $match: { count: { $gt: 1 } }
    },
    {
        $group: {
            _id: null,
            data: { $push: "$data" }
        }
    },
    {
        $addFields: {
            data: {
                $reduce: {
                    input: "$data",
                    initialValue: [],
                    in: { $concatArrays: ["$$value", "$$this"] }
                }
            }
        }
    }
])
MongoPlayground
Pseudocode
data = list(collection.aggregate(...))  # the pipeline above
if len(data) > 0:
    collection.delete_many({'_id': {'$in': data[0]['data']}})
I want to iterate over the array items (the TRANSACTION list) in a MongoDB document and remove a specific item from that list using pymongo.
I create the Mongo collection as above using pymongo. How can I iterate over the array items and remove only the final item in the list?
Data insert code using pymongo:
# added new method to create block chain structure
def addCoinWiseTransaction(self, senz, coin, format_date):
    self.collection = self.db.block_chain
    coinValexists = self.collection.find({"_id": str(coin)}).count()
    print('coin exists : ', coinValexists)
    if coinValexists > 0:
        print('coin hash exists')
        newTransaction = {"$push": {"TRANSACTION": {"SENDER": senz.attributes["#SENDER"],
                                                    "RECIVER": senz.attributes["#RECIVER"],
                                                    "T_NO_COIN": int(1),
                                                    "DATE": datetime.datetime.utcnow()
                                                    }}}
        self.collection.update({"_id": str(coin)}, newTransaction)
    else:
        flag = senz.attributes["#f"]
        print(flag)
        if flag == "ccb":
            print('new coin mined by other miner')
            root = {"_id": str(coin),
                    "S_ID": int(senz.attributes["#S_ID"]), "S_PARA": senz.attributes["#S_PARA"],
                    "FORMAT_DATE": format_date,
                    "NO_COIN": int(1),
                    "TRANSACTION": [{"MINER": senz.attributes["#M_S_ID"],
                                     "RECIVER": senz.attributes["#RECIVER"],
                                     "T_NO_COIN": int(1),
                                     "DATE": datetime.datetime.utcnow()
                                     }]
                    }
            self.collection.insert(root)
        else:
            print('new coin mined')
            root = {"_id": str(coin),
                    "S_ID": int(senz.attributes["#S_ID"]), "S_PARA": senz.attributes["#S_PARA"],
                    "FORMAT_DATE": format_date,
                    "NO_COIN": int(1),
                    "TRANSACTION": [{"MINER": "M_1",
                                     "RECIVER": senz.sender,
                                     "T_NO_COIN": int(1),
                                     "DATE": datetime.datetime.utcnow()
                                     }]
                    }
            self.collection.insert(root)
    return 'DONE'
To remove the last entry, the general idea (as you have mentioned) is to iterate the array and grab the index of the last element as denoted by its DATE field, then update the collection by removing it using $pull. So the crucial piece of data you need for this to work is the DATE value and the document's _id.
One approach you could take is to first use the aggregation framework to get this data. With this, you can run a pipeline where the first step is filtering the documents in the collection by using the $match operator, which accepts standard MongoDB queries.
The next stage after filtering the documents is to flatten the TRANSACTION array i.e. denormalise the documents in the list so that you can filter the final item i.e. get the last document by the DATE field. This is made possible with the $unwind operator, which for each input document, outputs n documents where n is the number of array elements and can be zero for an empty array.
After deconstructing the array, in order to get the last document, use the $group operator to regroup the flattened documents and, in the process, use the group accumulator operators to obtain the last TRANSACTION date by applying the $max operator to its embedded DATE field.
So in essence, run the following pipeline and use the results to update the collection. For example, you can run the following pipeline:
mongo shell
db.block_chain.aggregate([
    { "$match": { "_id": coin_id } },
    { "$unwind": "$TRANSACTION" },
    {
        "$group": {
            "_id": "$_id",
            "last_transaction_date": { "$max": "$TRANSACTION.DATE" }
        }
    }
])
You can then take the results of this aggregate operation, via the toArray() method or the aggregation cursor, and use them to update your collection:
var docs = db.block_chain.aggregate([
    { "$match": { "_id": coin_id } },
    { "$unwind": "$TRANSACTION" },
    {
        "$group": {
            "_id": "$_id",
            "LAST_TRANSACTION_DATE": { "$max": "$TRANSACTION.DATE" }
        }
    }
]).toArray()

db.block_chain.updateOne(
    { "_id": docs[0]._id },
    {
        "$pull": {
            "TRANSACTION": {
                "DATE": docs[0]["LAST_TRANSACTION_DATE"]
            }
        }
    }
)
python
def remove_last_transaction(self, coin):
    self.collection = self.db.block_chain
    pipe = [
        { "$match": { "_id": str(coin) } },
        { "$unwind": "$TRANSACTION" },
        {
            "$group": {
                "_id": "$_id",
                "LAST_TRANSACTION_DATE": { "$max": "$TRANSACTION.DATE" }
            }
        }
    ]
    # run aggregate pipeline
    cursor = self.collection.aggregate(pipeline=pipe)
    docs = list(cursor)
    # run update
    self.collection.update_one(
        { "_id": docs[0]["_id"] },
        {
            "$pull": {
                "TRANSACTION": {
                    "DATE": docs[0]["LAST_TRANSACTION_DATE"]
                }
            }
        }
    )
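As an aside, if "final item" simply means the last element of the TRANSACTION array rather than the one with the greatest DATE, MongoDB's $pop update operator removes it directly with no aggregation needed; a minimal sketch, reusing the method's self.collection and coin:

# $pop: 1 removes the last element of the array; -1 would remove the first
self.collection.update_one(
    {"_id": str(coin)},
    {"$pop": {"TRANSACTION": 1}}
)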
Alternatively, you can run a single aggregate operation that will also update your collection, using the $out pipeline stage, which writes the results of the pipeline back to the same collection:
If the collection specified by the $out operation already
exists, then upon completion of the aggregation, the $out stage atomically replaces the existing collection with the new results collection. The $out operation does not
change any indexes that existed on the previous collection. If the
aggregation fails, the $out operation makes no changes to
the pre-existing collection.
For example, you could run this pipeline:
mongo shell
db.block_chain.aggregate([
    { "$match": { "_id": coin_id } },
    { "$unwind": "$TRANSACTION" },
    { "$sort": { "TRANSACTION.DATE": 1 } },
    {
        "$group": {
            "_id": "$_id",
            "LAST_TRANSACTION": { "$last": "$TRANSACTION" },
            "FORMAT_DATE": { "$first": "$FORMAT_DATE" },
            "NO_COIN": { "$first": "$NO_COIN" },
            "S_ID": { "$first": "$S_ID" },
            "S_PARA": { "$first": "$S_PARA" },
            "TRANSACTION": { "$push": "$TRANSACTION" }
        }
    },
    {
        "$project": {
            "FORMAT_DATE": 1,
            "NO_COIN": 1,
            "S_ID": 1,
            "S_PARA": 1,
            "TRANSACTION": {
                "$setDifference": ["$TRANSACTION", ["$LAST_TRANSACTION"]]
            }
        }
    },
    { "$out": "block_chain" }
])
python
def remove_last_transaction(self, coin):
    self.db.block_chain.aggregate([
        { "$match": { "_id": str(coin) } },
        { "$unwind": "$TRANSACTION" },
        { "$sort": { "TRANSACTION.DATE": 1 } },
        {
            "$group": {
                "_id": "$_id",
                "LAST_TRANSACTION": { "$last": "$TRANSACTION" },
                "FORMAT_DATE": { "$first": "$FORMAT_DATE" },
                "NO_COIN": { "$first": "$NO_COIN" },
                "S_ID": { "$first": "$S_ID" },
                "S_PARA": { "$first": "$S_PARA" },
                "TRANSACTION": { "$push": "$TRANSACTION" }
            }
        },
        {
            "$project": {
                "FORMAT_DATE": 1,
                "NO_COIN": 1,
                "S_ID": 1,
                "S_PARA": 1,
                "TRANSACTION": {
                    "$setDifference": ["$TRANSACTION", ["$LAST_TRANSACTION"]]
                }
            }
        },
        { "$out": "block_chain" }
    ])
Whilst this approach can be more efficient than the first, it requires knowing the existing fields up front, so in some cases it may not be practical.