I'm performing a search with this aggregate and would like to get my total count (to deal with my pagination).
results = mongo.db.perfumes.aggregate(
[
{"$match": {"$text": {"$search": db_query}}},
{
"$lookup": {
"from": "users",
"localField": "author",
"foreignField": "username",
"as": "creator",
}
},
{"$unwind": "$creator"},
{
"$project": {
"_id": "$_id",
"perfumeName": "$name",
"perfumeBrand": "$brand",
"perfumeDescription": "$description",
"date_updated": "$date_updated",
"perfumePicture": "$picture",
"isPublic": "$public",
"perfumeType": "$perfume_type",
"username": "$creator.username",
"firstName": "$creator.first_name",
"lastName": "$creator.last_name",
"profilePicture": "$creator.avatar",
}
},
{"$sort": {"perfumeName": 1}},
]
)
How could I get the count of results in my route so I can pass it to my template?
I cannot use results.count() as it is a CommandCursor.
Help please? Thank you!!
Using len method to return no.of elements in an array would be easier but if you still wanted an aggregation query to return count and actual docs at the same time then try using $facet or $group :
Query 1 :
{
$facet: {
docs: [ { $match: {} } ], // passes all docs into an array field
count: [ { $count: "count" } ] // counts no.of docs
}
},
/** re-create count field from array of one object to just a number */
{
$addFields: { count: { $arrayElemAt: [ "$count.count", 0 ] } }
}
Test : mongoplayground
Query 2 :
/** Group all docs without any condition & push all docs into an array field & count no.of docs flowing through iteration using `$sum` */
{
$group: { _id: "", docs: { $push: "$$ROOT" }, count: { $sum: 1 } }
}
Test : mongoplayground
Note :
Add one of these queries at the end of your current aggregation pipeline and remember if there are no docs after $match or $unwind stages then first query would not have count field but has docs : [] but second query will just return [], code it accordingly.
If you look at the CommandCursor's docs, it does not support count()
You can use the length filter in jinja template.
{{ results | length }}
I hope the above helps.
Related
i'm trying to get, from my MongoDB, all documents with the higher date.
My db is look like :
_id:"1"
date:"21-12-20"
report:"some stuff"
_id:"2"
date:"11-11-11"
report:"qualcosa"
_id:5fe08735b5a28812866cbc8a
date:"21-12-20"
report:Object
_id:5fe0b35e2f465c2a2bbfc0fd
date:"20-12-20"
report:"ciao"
and i would like to have a result like :
_id:"1"
date:"21-12-20"
report:"some stuff"
_id:5fe08735b5a28812866cbc8a
date:"21-12-20"
report:Object
I tried to run this script :
db.collection.find({}).sort([("date", -1)]).limit(1)
but it gives me only one document.
How can I get all the documents with the greatest date automatically?
Try to remove limit(1) and it's gonna work
If you add .limit(1) it's only ever going to give you one document.
Either use the answer as a query to another .find(), or you can write an aggregate query. If you data set is a modest size, I prefer the former for clarity.
max_date = list(db.collection.find({}).sort([("date", -1)])).limit(1)
if len(max_date) > 0:
db.collection.find({'date': max_date[0]['date']})
Use an aggregation pipeline like this:
db.collection.aggregate([
{ $group: { _id: null, data: { $push: "$$ROOT" } } },
{
$set: {
data: {
$filter: {
input: "$data",
cond: { $eq: [{ $max: "$data.date" }, "$$this.date"] }
}
}
}
},
{ $unwind: "$data" },
{ $replaceRoot: { newRoot: "$data" } }
])
I'm curious about the best approach to count the instances of a particular field, across all documents, in a given ElasticSearch index.
For example, if I've got the following documents in index goober:
{
'_id':'foo',
'field1':'a value',
'field2':'a value'
},
{
'_id':'bar',
'field1':'a value',
'field2':'a value'
},
{
'_id':'baz',
'field1':'a value',
'field3':'a value'
}
I'd like to know something like the following:
{
'index':'goober',
'field_counts':
'field1':3,
'field2':2,
'field3':1
}
Is this doable with a single query? or multiple? For what it's worth, I'm using python elasticsearch and elasticsearch-dsl clients.
I've successfully issued a GET request to /goober and retrieved the mappings, and am learning how to submit requests for aggregations for each field, but I'm interested in learning how many times a particular field appears across all documents.
Coming from using Solr, still getting my bearings with ES. Thanks in advance for any suggestions.
The below will return you the count of docs with "field2":
POST /INDEX/_search
{
"size": 0,
"query": {
"bool": {
"filter": {
"exists": {
"field": "field2"
}
}
}
}
}
And here is an example using multiple aggregates (will return each agg in a bucket with a count), using field exist counts:
POST /INDEX/_search
{
"size": 0,
"aggs": {
"field_has1": {
"filter": {
"exists": {
"field": "field1"
}
}
},
"field_has2": {
"filter": {
"exists": {
"field": "field2"
}
}
}
}
}
The behavior within each agg on the second example will mimic the behavior of the first query. In many cases, you can take a regular search query and nest those lookups within aggregate buckets.
Quick time-saver based on existing answer:
interesting_fields = ['field1', 'field2']
body = {
'size': 0,
'aggs': {f'has_{field_name}': {
"filter": {
"exists": {
"field": f'export.{field_name}'
}
}
} for field_name in interesting_fields},
}
print(requests.post('http://localhost:9200/INDEX/_search', json=body).json())
I am using MongoDB 3.4 and PyMongo. I have a set of keywords:
keywords = [ 'bar', 'foo', ..., 'zoo' ]
I also have a collection:
docs = { 'data' : ' ... bar foo ... ',
'data' : ' ... foo ... ',
'data' : ' ... zoo ... ' }
I am looking for a PyMongo aggregation query which is going to give me a dict:
{ 'bar' : 0, 'foo' : 2, ..., 'zoo' : 0 }
There isn't anything language specific about this, as the only solutions are either all aggregate or using mapReduce, where the latter is defined in JavaScript functions
Just setting up some sample data:
db.wordstuff.insertMany([
{ 'data': "foo brick bar" },
{ 'data': "brick foo" },
{ 'data': "bar brick baz" },
{ 'data': "bax" },
{ 'data': "brin brok fu foo" }
])
Aggregation Framework
Then you can run the aggregation statement:
db.wordstuff.aggregate([
{ "$project": {
"_id": 0,
"split": {
"$filter": {
"input": { "$split": [ "$data", " " ] },
"cond": { "$in": [ "$$this", ["bar","foo","baz","blat"] ] }
}
}
}},
{ "$unwind": "$split" },
{ "$group": { "_id": "$split", "count": { "$sum": 1 } }},
{ "$group": {
"_id": null,
"data": { "$push": { "k": "$_id", "v": "$count" } }
}},
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": {
"$map": {
"input": ["bar","foo","baz","blat"],
"as": "d",
"in": {
"$cond": {
"if": { "$ne": [{ "$indexOfArray": ["$data.k","$$d"] },-1] },
"then": {
"$arrayElemAt": [
"$data",
{ "$indexOfArray": ["$data.k","$$d"] }
]
},
"else": { "k": "$$d", "v": 0 }
}
}
}
}
}
}}
])
In reality, all of the real work is done by this point:
db.wordstuff.aggregate([
{ "$project": {
"_id": 0,
"split": {
"$filter": {
"input": { "$split": [ "$data", " " ] },
"cond": { "$in": [ "$$this", ["bar","foo","baz","blat"] ] }
}
}
}},
{ "$unwind": "$split" },
{ "$group": { "_id": "$split", "count": { "$sum": 1 } }},
])
Which gives you output like:
{ "_id" : "baz", "count" : 1.0 }
{ "_id" : "bar", "count" : 2.0 }
{ "_id" : "foo", "count" : 3.0 }
So the real work here is being done by $split and that is the main dependency on using the aggregation framework, so you need MongoDB 3.4 at least in order to do this. The very simple premise is to $split the words out individually as array members, then $filter the content to match the input array of words to match.
That $filter uses $in, which is another addition as of MongoDB 3.4 to match against each listed word. There are other operators that can do this with longer syntax, but we know we already need MongoDB 3.4 so this is the shortest syntax.
All that is really done after that is to $unwind the matched array of words from each document, then $group to obtain those matched words as a distinct list, along with the count of the occurrences.
That really is all there is to it from the main perspective of the database.
The following parts are actually "optional" since these are easy to reproduce in code, and probably look a lot clearer and cleaner by doing so. But just to demonstrate the newer operators that would require MongoDB 3.4.4 at least for the introduction of $arrayToObject.
Again the basics are that the next $group "rolls up" the matched words from the cursor into an array within a single document. There is also a very specific key naming applied of "k" and "v" for later reasons.
Then you use a $replaceRoot stage since the content of the document returned is evaluated from an expression. This expression uses $map to iterate over the "input array" of words and matches those to the entries created from the aggregation. This matching is done using $indexOfArray do return the matched index of the compared value.
You use this within $cond as you either want to transform that value into a matched elment using $arrayElemAt, or alternately recognize the index was not a match. This either returns the aggregated entry ( obtained from earlier matches ) or a "default" value of 0 for the given word.
The final part uses $arrayToObject which transforms an array of objects with properties "k" and "v" in to "key/value" pairs as an object.
So you can ask MongoDB to do it, but the data is actually reduced by the minimal pipeline as shown, so you may as well do it in client code. It's pretty simple, and for JavaScript you just do:
var words = db.wordstuff.aggregate([
{ "$project": {
"_id": 0,
"split": {
"$filter": {
"input": { "$split": [ "$data", " " ] },
"cond": { "$in": [ "$$this", ["bar","foo","baz","blat"] ] }
}
}
}},
{ "$unwind": "$split" },
{ "$group": { "_id": "$split", "count": { "$sum": 1 } }},
]).toArray();
var result = ["bar","foo","baz","blat"].map(
w => ( words.map(wd => wd._id).indexOf(w) !== -1)
? words[words.map(wd => wd._id).indexOf(w)]
: { _id: w, count: 0 }
).reduce((acc,curr) => Object.assign(acc,{ [curr._id]: curr.count }),{})
So if there is anything that's language specific at all, then that would be the part. So if you choose to run the aggregation at it's basics and process the resulting cursor, then the python code would be:
input = ["bar","foo","baz","blat"]
words = list(db.wordstuff.aggregate([
{ "$project": {
"_id": 0,
"split": {
"$filter": {
"input": { "$split": [ "$data", " " ] },
"cond": { "$in": [ "$$this", input ] }
}
}
}},
{ "$unwind": "$split" },
{ "$group": { "_id": "$split", "count": { "$sum": 1 } }},
]))
result = reduce(
lambda x,y:
dict(x.items() + { y['_id']: y['count'] }.items()),
map(lambda w: words[map(lambda wd: wd['_id'],words).index(w)]
if w in map(lambda wd: wd['_id'],words)
else { '_id': w, 'count': 0 },
input
),
{}
)
And either method pulls out the same result:
{
"bar" : 2.0,
"foo" : 3.0,
"baz" : 1.0,
"blat" : 0.0
}
MapReduce
The alternate case where you don't even have the minimum MongoDB 3.4.0 available is to use mapReduce for the process instead. Again, this needs to be sent to the server as JavaScript, which is generally represented within "strings" in most language implementations ( other than JavaScript itself ):
db.wordstuff.mapReduce(
function() {
this.data.split(' ')
.filter( w => words.indexOf(w) !== -1 )
.forEach( w => emit(null,{ [w]: 1 }) );
},
function(key,values) {
return [].concat.apply([],
values.map(v => Object.keys(v).map(k => ({ k: k, v: v[k] })))
).reduce((acc,curr) => Object.assign(acc,{
[curr.k]: (acc.hasOwnProperty(curr.k))
? acc[curr.k] + curr.v : curr.v
}),{});
},
{
"out": { "inline": 1 },
"scope": { "words": ["bar","foo","baz","blat"] },
"finalize": function(key,value) {
return words.map( w => (value.hasOwnProperty(w))
? { [w]: value[w] } : { [w]: 0 }
).reduce((acc,curr) => Object.assign(acc,curr),{})
}
}
)
And that gives you the same results and really does exactly the same thing. Just a little slower because MongoDB needs to evaluate and process the JavaScript as compared to using it's own native coded methods with the aggregation framework.
I want to iterate Mongodb database Arraylist items(TRANSACTION list) and remove Arraylist specific(TRANSACTION List) item using pymongo ?
I create Mongo collection as above using python pymongo. I want to iterate array list item using pymongo and remove final item only in Arraylist?
Data insert query using Python pymongo
# added new method create block chain_structure
def addCoinWiseTransaction(self, senz, coin, format_date):
self.collection = self.db.block_chain
coinValexists = self.collection.find({"_id": str(coin)}).count()
print('coin exists : ', coinValexists)
if (coinValexists > 0):
print('coin hash exists')
newTransaction = {"$push": {"TRANSACTION": {"SENDER": senz.attributes["#SENDER"],
"RECIVER": senz.attributes["#RECIVER"],
"T_NO_COIN": int(1),
"DATE": datetime.datetime.utcnow()
}}}
self.collection.update({"_id": str(coin)}, newTransaction)
else:
flag = senz.attributes["#f"];
print flag
if (flag == "ccb"):
print('new coin mined othir minner')
root = {"_id": str(coin)
, "S_ID": int(senz.attributes["#S_ID"]), "S_PARA": senz.attributes["#S_PARA"],
"FORMAT_DATE": format_date,
"NO_COIN": int(1),
"TRANSACTION": [{"MINER": senz.attributes["#M_S_ID"],
"RECIVER": senz.attributes["#RECIVER"],
"T_NO_COIN": int(1),
"DATE": datetime.datetime.utcnow()
}
]
}
self.collection.insert(root)
else:
print('new coin mined')
root = {"_id": str(coin)
, "S_ID": int(senz.attributes["#S_ID"]), "S_PARA": senz.attributes["#S_PARA"],
"FORMAT_DATE": format_date,
"NO_COIN": int(1),
"TRANSACTION": [{"MINER": "M_1",
"RECIVER": senz.sender,
"T_NO_COIN": int(1),
"DATE": datetime.datetime.utcnow()
}
]
}
self.collection.insert(root)
return 'DONE'
To remove the last entry, the general idea (as you have mentioned) is to iterate the array and grab the index of the last element as denoted by its DATE field, then update the collection by removing it using $pull. So the crucial piece of data you need for this to work is the DATE value and the document's _id.
One approach you could take is to first use the aggregation framework to get this data. With this, you can run a pipeline where the first step if filtering the documents in the collection by using the $match operator which uses standard MongoDB queries.
The next stage after filtering the documents is to flatten the TRANSACTION array i.e. denormalise the documents in the list so that you can filter the final item i.e. get the last document by the DATE field. This is made possible with the $unwind operator, which for each input document, outputs n documents where n is the number of array elements and can be zero for an empty array.
After deconstructing the array, in order to get the last document, use the $group operator where you can regroup the flattened documents and in the process use the group accumulator operators to obtain
the last TRANSACTION date by using the $max operator applied to its embedded DATE field.
So in essence, run the following pipeline and use the results to update the collection. For example, you can run the following pipeline:
mongo shell
db.block_chain.aggregate([
{ "$match": { "_id": coin_id } },
{ "$unwind": "$TRANSACTION" },
{
"$group": {
"_id": "$_id",
"last_transaction_date": { "$max": "$TRANSACTION.DATE" }
}
}
])
You can then get the document with the update data from this aggregate operation using the toArray() method or the aggregate cursor and update your collection:
var docs = db.block_chain.aggregate([
{ "$match": { "_id": coin_id } },
{ "$unwind": "$TRANSACTION" },
{
"$group": {
"_id": "$_id",
"LAST_TRANSACTION_DATE": { "$max": "$TRANSACTION.DATE" }
}
}
]).toArray()
db.block_chain.updateOne(
{ "_id": docs[0]._id },
{
"$pull": {
"TRANSACTION": {
"DATE": docs[0]["LAST_TRANSACTION_DATE"]
}
}
}
)
python
def remove_last_transaction(self, coin):
self.collection = self.db.block_chain
pipe = [
{ "$match": { "_id": str(coin) } },
{ "$unwind": "$TRANSACTION" },
{
"$group": {
"_id": "$_id",
"last_transaction_date": { "$max": "$TRANSACTION.DATE" }
}
}
]
# run aggregate pipeline
cursor = self.collection.aggregate(pipeline=pipe)
docs = list(cursor)
# run update
self.collection.update_one(
{ "_id": docs[0]["_id"] },
{
"$pull": {
"TRANSACTION": {
"DATE": docs[0]["LAST_TRANSACTION_DATE"]
}
}
}
)
Alternatively, you can run a single aggregate operation that will also update your collection using the $out pipeline which writes the results of the pipeline to the same collection:
If the collection specified by the $out operation already
exists, then upon completion of the aggregation, the $out stage atomically replaces the existing collection with the new results collection. The $out operation does not
change any indexes that existed on the previous collection. If the
aggregation fails, the $out operation makes no changes to
the pre-existing collection.
For example, you could run this pipeline:
mongo shell
db.block_chain.aggregate([
{ "$match": { "_id": coin_id } },
{ "$unwind": "$TRANSACTION" },
{ "$sort": { "TRANSACTION.DATE": 1 } }
{
"$group": {
"_id": "$_id",
"LAST_TRANSACTION": { "$last": "$TRANSACTION" },
"FORMAT_DATE": { "$first": "$FORMAT_DATE" },
"NO_COIN": { "$first": "$NO_COIN" },
"S_ID": { "$first": "$S_ID" },
"S_PARA": { "$first": "$S_PARA" },
"TRANSACTION": { "$push": "$TRANSACTION" }
}
},
{
"$project": {
"FORMAT_DATE": 1,
"NO_COIN": 1,
"S_ID": 1,
"S_PARA": 1,
"TRANSACTION": {
"$setDifference": ["$TRANSACTION", ["$LAST_TRANSACTION"]]
}
}
},
{ "$out": "block_chain" }
])
python
def remove_last_transaction(self, coin):
self.db.block_chain.aggregate([
{ "$match": { "_id": str(coin) } },
{ "$unwind": "$TRANSACTION" },
{ "$sort": { "TRANSACTION.DATE": 1 } },
{
"$group": {
"_id": "$_id",
"LAST_TRANSACTION": { "$last": "$TRANSACTION" },
"FORMAT_DATE": { "$first": "$FORMAT_DATE" },
"NO_COIN": { "$first": "$NO_COIN" },
"S_ID": { "$first": "$S_ID" },
"S_PARA": { "$first": "$S_PARA" },
"TRANSACTION": { "$push": "$TRANSACTION" }
}
},
{
"$project": {
"FORMAT_DATE": 1,
"NO_COIN": 1,
"S_ID": 1,
"S_PARA": 1,
"TRANSACTION": {
"$setDifference": ["$TRANSACTION", ["$LAST_TRANSACTION"]]
}
}
},
{ "$out": "block_chain" }
])
Whilst this approach can be more efficient than the first, it requires knowledge of the existing fields first so in some cases the solution cannot be practical.
Let's say we have mongodb documents:
{'shop':'yes'}
{'shop':'ice_cream'}
{'shop':'grocery'}
{'amenity':'yes'}
{'amenity':'hotel'}
How do I write an aggregate query in pymongo which would return the values that are common for both keys? In that example it should return 'yes'.
Your aggregation pipeline would make use of the $setIntersection in the $project operator stage. This takes two or more arrays and returns an array that contains the elements that appear in every input array. Another aggregation operator that is useful is the $addToSet array operator which is used in creating the distinct list of values for each grouped field that can then be compared later on.
In mongoshell, inserting the documents
db.collection.insert([
{'shop':'yes'},
{'shop':'ice_cream'},
{'shop':'grocery'},
{'amenity':'yes'},
{'amenity':'hotel'}
])
You could try the following aggregation pipeline:
db.collection.aggregate([
{
"$group": {
"_id": null,
"shops": {
"$addToSet": "$shop"
},
"amenities": {
"$addToSet": "$amenity"
}
}
},
{
"$project": {
"_id": 0,
"commonToBoth": { "$setIntersection": [ "$shops", "$amenities" ] }
}
}
]);
Output:
/* 0 */
{
"result" : [
{
"commonToBoth" : [
"yes"
]
}
],
"ok" : 1
}
Pymongo:
>>> pipe = [
... {"$group": { "_id": None, "shops": {"$addToSet": "$shop"}, "amenities": {"$addToSet": "$amenity"}}},
... { "$project": {"_id": 0, "commonToBoth":{"$setIntersection": ["$shops", "$amenities"]}}}
... ]
>>>
>>> for doc in collection.aggregate(pipe):
... print(doc)
...
{u'commonToBoth': [u'yes']}