I am completely new to MongoDB, and I have the query below:
jds = jd.aggregate([
    {
        "$group": {
            "_id": {"house_NAME": "$house_NAME"},
            "count": {"$sum": 1}
        }
    },
    {"$match": {"count": {"$gt": 0}}}
])
which returns the count of each house name present in the collection.
My collection looks something like this:
record_id house_NAME status
1 Thomas Open
2 Panther Close
3 Thomas Close
What I want is to return counts only for documents whose status is "Open". I want to add an "and" clause to the above query so it returns the count of only those documents whose status is "Open", but I don't know exactly how to do it.
I am stuck on it; any help will be greatly appreciated!
Thanks in advance!
You can add a $match stage at the start of the pipeline, so that only documents with status "Open" ever reach the $group stage:
jds=jd.aggregate([
{ "$match": { "status": "Open" }},
{ "$group": {
"_id": { "house_NAME": "$house_NAME" },
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gt": 0 }}}
])
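In PyMongo, aggregate() returns a cursor, so (assuming jd is your collection handle, as in the question) the counts can be read with a plain loop:

for doc in jds:
    print(doc["_id"]["house_NAME"], doc["count"])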
With a schema like this:
{
    "doc1": {
        "items": [
            {"item_id": 1},
            {"item_id": 2},
            {"item_id": 3}
        ]
    },
    "doc2": {
        "items": [
            {"item_id": 1},
            {"item_id": 2},
            {"item_id": 1}
        ]
    }
}
I want to query for documents that contain a duplicate item in their items array field, where a duplicate means two items with the same item_id field.
So the result for the example above should be doc2 only, because it has two items with the same item_id.
Something like this?
qry = {
"items": {
"$size": {
"$ne": {
"items.unique_count" # obviously this doesn't exist, not sure how to do it
}
}
}
}
result = MyDocument.find(qry)
One option, similar to @rickhg12hs's suggestion and your own, is:
db.collection.aggregate([
  {$match: {
    $expr: {
      $ne: [
        {$size: "$items"},
        {$size: {
          $reduce: {
            input: "$items",
            initialValue: [],
            in: {$setUnion: ["$$value", ["$$this.item_id"]]}
          }
        }}
      ]
    }
  }}
])
See how it works on the playground example
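If you need to run this from Python rather than the shell, a rough PyMongo equivalent (assuming a collection handle named col) would be:

duplicates = col.aggregate([
    {"$match": {
        "$expr": {
            "$ne": [
                {"$size": "$items"},
                # size of the de-duplicated set of item_id values
                {"$size": {
                    "$reduce": {
                        "input": "$items",
                        "initialValue": [],
                        "in": {"$setUnion": ["$$value", ["$$this.item_id"]]}
                    }
                }}
            ]
        }
    }}
])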
I have a database collection that has objects like this:
{
    "_id": ObjectId("something"),
    "name_lower": "total",
    "name": "Total",
    "mounts": [
        ["mount1", "instance1"],
        ["mount2", "instance1"],
        ["mount1", "instance2"],
        ["mount2", "instance2"]
    ]
}
Say I want to remove every mount that has the instance instance2. How would I go about doing that? I have been searching for quite a while.
You can do something like this:
[
  { $unwind: "$mounts" },
  { $match: { "mounts": { $ne: "instance2" } } },
  { $group: {
      _id: "$_id",
      name: { $first: "$name" },
      mounts: { $push: "$mounts" }
  }}
]
Working Mongo playground
This answer is based on @varman's answer, but is more Pythonic and efficient.
The first stage should be a $match condition to filter out documents that don't need to be updated.
Since the mounts key is an array of arrays, we have to $unwind it, so that each element can be inspected individually.
We then apply the $match condition again, this time to drop the elements that have to be removed.
Finally, we $group the pipeline by the _id key, so that the documents that were $unwind-ed in the previous stage are grouped back into a single document.
from pymongo import MongoClient

client = MongoClient("<URI-String>")
col = client["<DB-Name>"]["<Collection-Name>"]

count = 0
for doc in col.aggregate([
    # pre-filter, as described above
    {"$match": {"mounts": {"$ne": "instance2"}}},
    # flatten the nested array so each element can be matched individually
    {"$unwind": "$mounts"},
    # drop the elements that contain "instance2"
    {"$match": {"mounts": {"$ne": "instance2"}}},
    # reassemble one document per _id with the surviving elements
    {"$group": {
        "_id": "$_id",
        "newMounts": {"$push": "$mounts"}
    }},
]):
    # print(doc)
    col.update_one(
        {"_id": doc["_id"]},
        {"$set": {"mounts": doc["newMounts"]}}
    )
    count += 1
    print("\r", count, end="")

print("\n\nDone!!!")
I'm trying to improve the performance of my app and my knowledge of MongoDB. I have been able to execute a fire-and-forget query that creates fields if they don't exist and otherwise increments a value, as follows:
date = "2018-6"
sid = "012345"
cid = "06789"
key = "MESSAGES.{}.{}.{}.{}".format(date, sid, cid, hour)
db.stats.update({}, { "$inc": { key : 1 }})
This produces a single document with the following structure:
document:
{
    "MESSAGES": {
        "2018-6": {
            "012345": {"06789": 1},
            "011111": {"06667": 5}
        },
        "2018-5": {
            "012345": {"06789": 20},
            "011111": {"06667": 15}
        }
    }
}
As you can probably imagine, it has become a bit of a nightmare to query this structure as the data grows. I'd like to achieve the same fire-and-forget query, but with a schema that indexes better. Something like:
documents:
[
    {
        "SID": "012345",
        "MESSAGES": {
            "MONTHS": [
                {
                    "KEY": "2018-6",
                    "CHANNELS": [{"KEY": "06789", "COUNT": 1}]
                },
                {
                    "KEY": "2018-5",
                    "CHANNELS": [{"KEY": "06667", "COUNT": 20}]
                }
            ]
        }
    },
    {
        "SID": "011111",
        "MESSAGES": {
            "MONTHS": [
                {
                    "KEY": "2018-6",
                    "CHANNELS": [{"KEY": "06667", "COUNT": 5}]
                },
                {
                    "KEY": "2018-5",
                    "CHANNELS": [{"KEY": "06667", "COUNT": 15}]
                }
            ]
        }
    }
]
I'm working with quite a large amount of data, and these queries can happen many times a second, so it's important that I execute just one operation if at all possible. Any advice you can give is very welcome; feel free to criticise anything you see here too, as my goal is to learn.
Thanks in advance!
UPDATED WITH ATTEMPT:
db.test.updateOne({"SERVER_ID": "23894723487sdf"}, {
"$addToSet" : {
"MESSAGES" : {
"DATE": "2018-6",
"CHANNELS": [{
"ID": "239048349",
"COUNT": NumberInt(1)
}]
}
},
"$inc" : {
"MESSAGES.CHANNELS.$.COUNT" : 1
}},
{upsert: true})
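One bucketing pattern worth sketching (the field names here are illustrative, not taken from your schema): keep one counter document per (server, month, channel), so that a single upsert with $inc both creates the counter on first sight and increments it afterwards:

# one counter document per (SID, month, channel); upsert creates it on demand
db.test.update_one(
    {"SID": "23894723487sdf", "MONTH": "2018-6", "CHANNEL": "239048349"},
    {"$inc": {"COUNT": 1}},
    upsert=True
)

A compound index on (SID, MONTH, CHANNEL) then keeps both the upsert and later reads cheap, and avoiding arrays entirely sidesteps the positional-$ problem in the attempt above.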
I want to iterate over a MongoDB array field (the TRANSACTION list) and remove a specific item from it using PyMongo.
I create the Mongo collection using the Python code below. How can I iterate over the array items with PyMongo and remove only the final item in the list?
Data insert code using Python and PyMongo:
# added new method to create the block chain structure
def addCoinWiseTransaction(self, senz, coin, format_date):
    self.collection = self.db.block_chain
    coinValexists = self.collection.find({"_id": str(coin)}).count()
    print('coin exists : ', coinValexists)
    if coinValexists > 0:
        print('coin hash exists')
        newTransaction = {"$push": {"TRANSACTION": {
            "SENDER": senz.attributes["#SENDER"],
            "RECIVER": senz.attributes["#RECIVER"],
            "T_NO_COIN": int(1),
            "DATE": datetime.datetime.utcnow()
        }}}
        self.collection.update({"_id": str(coin)}, newTransaction)
    else:
        flag = senz.attributes["#f"]
        print(flag)
        if flag == "ccb":
            print('new coin mined by another miner')
            root = {
                "_id": str(coin),
                "S_ID": int(senz.attributes["#S_ID"]),
                "S_PARA": senz.attributes["#S_PARA"],
                "FORMAT_DATE": format_date,
                "NO_COIN": int(1),
                "TRANSACTION": [{
                    "MINER": senz.attributes["#M_S_ID"],
                    "RECIVER": senz.attributes["#RECIVER"],
                    "T_NO_COIN": int(1),
                    "DATE": datetime.datetime.utcnow()
                }]
            }
            self.collection.insert(root)
        else:
            print('new coin mined')
            root = {
                "_id": str(coin),
                "S_ID": int(senz.attributes["#S_ID"]),
                "S_PARA": senz.attributes["#S_PARA"],
                "FORMAT_DATE": format_date,
                "NO_COIN": int(1),
                "TRANSACTION": [{
                    "MINER": "M_1",
                    "RECIVER": senz.sender,
                    "T_NO_COIN": int(1),
                    "DATE": datetime.datetime.utcnow()
                }]
            }
            self.collection.insert(root)
    return 'DONE'
To remove the last entry, the general idea (as you have mentioned) is to iterate the array, grab the index of the last element as denoted by its DATE field, then update the collection by removing that element using $pull. So the crucial pieces of data you need for this to work are the DATE value and the document's _id.
One approach you could take is to first use the aggregation framework to get this data. With this, you can run a pipeline where the first step is filtering the documents in the collection using the $match operator, which accepts standard MongoDB queries.
The next stage after filtering the documents is to flatten the TRANSACTION array, i.e. denormalise the documents in the list so that you can pick out the final item, i.e. the last document by the DATE field. This is made possible with the $unwind operator, which, for each input document, outputs n documents, where n is the number of array elements (and can be zero for an empty array).
After deconstructing the array, in order to get the last document, use the $group operator to regroup the flattened documents and, in the process, use the group accumulator operators to obtain the last TRANSACTION date, by applying the $max operator to its embedded DATE field.
So, in essence, run the following pipeline and use the results to update the collection:
mongo shell
db.block_chain.aggregate([
{ "$match": { "_id": coin_id } },
{ "$unwind": "$TRANSACTION" },
{
"$group": {
"_id": "$_id",
"last_transaction_date": { "$max": "$TRANSACTION.DATE" }
}
}
])
You can then take the document returned by this aggregate operation (via the toArray() method or the aggregation cursor) and use it to update your collection:
var docs = db.block_chain.aggregate([
{ "$match": { "_id": coin_id } },
{ "$unwind": "$TRANSACTION" },
{
"$group": {
"_id": "$_id",
"LAST_TRANSACTION_DATE": { "$max": "$TRANSACTION.DATE" }
}
}
]).toArray()
db.block_chain.updateOne(
{ "_id": docs[0]._id },
{
"$pull": {
"TRANSACTION": {
"DATE": docs[0]["LAST_TRANSACTION_DATE"]
}
}
}
)
python
def remove_last_transaction(self, coin):
    self.collection = self.db.block_chain
    pipe = [
        {"$match": {"_id": str(coin)}},
        {"$unwind": "$TRANSACTION"},
        {
            "$group": {
                "_id": "$_id",
                "last_transaction_date": {"$max": "$TRANSACTION.DATE"}
            }
        }
    ]
    # run aggregate pipeline
    cursor = self.collection.aggregate(pipeline=pipe)
    docs = list(cursor)
    # run update
    self.collection.update_one(
        {"_id": docs[0]["_id"]},
        {
            "$pull": {
                "TRANSACTION": {
                    "DATE": docs[0]["last_transaction_date"]
                }
            }
        }
    )
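Incidentally, if transactions are only ever appended, so that the last array element is always the newest, a much simpler alternative is a single update with $pop, which removes the last element of an array without any aggregation:

# $pop with 1 removes the last array element (-1 would remove the first)
self.collection.update_one(
    {"_id": str(coin)},
    {"$pop": {"TRANSACTION": 1}}
)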
Alternatively, you can run a single aggregate operation that also updates your collection using the $out pipeline stage, which writes the results of the pipeline to the same collection:
If the collection specified by the $out operation already exists, then upon completion of the aggregation, the $out stage atomically replaces the existing collection with the new results collection. The $out operation does not change any indexes that existed on the previous collection. If the aggregation fails, the $out operation makes no changes to the pre-existing collection.
For example, you could run this pipeline:
mongo shell
db.block_chain.aggregate([
{ "$match": { "_id": coin_id } },
{ "$unwind": "$TRANSACTION" },
{ "$sort": { "TRANSACTION.DATE": 1 } }
{
"$group": {
"_id": "$_id",
"LAST_TRANSACTION": { "$last": "$TRANSACTION" },
"FORMAT_DATE": { "$first": "$FORMAT_DATE" },
"NO_COIN": { "$first": "$NO_COIN" },
"S_ID": { "$first": "$S_ID" },
"S_PARA": { "$first": "$S_PARA" },
"TRANSACTION": { "$push": "$TRANSACTION" }
}
},
{
"$project": {
"FORMAT_DATE": 1,
"NO_COIN": 1,
"S_ID": 1,
"S_PARA": 1,
"TRANSACTION": {
"$setDifference": ["$TRANSACTION", ["$LAST_TRANSACTION"]]
}
}
},
{ "$out": "block_chain" }
])
python
def remove_last_transaction(self, coin):
    self.db.block_chain.aggregate([
        {"$match": {"_id": str(coin)}},
        {"$unwind": "$TRANSACTION"},
        {"$sort": {"TRANSACTION.DATE": 1}},
        {
            "$group": {
                "_id": "$_id",
                "LAST_TRANSACTION": {"$last": "$TRANSACTION"},
                "FORMAT_DATE": {"$first": "$FORMAT_DATE"},
                "NO_COIN": {"$first": "$NO_COIN"},
                "S_ID": {"$first": "$S_ID"},
                "S_PARA": {"$first": "$S_PARA"},
                "TRANSACTION": {"$push": "$TRANSACTION"}
            }
        },
        {
            "$project": {
                "FORMAT_DATE": 1,
                "NO_COIN": 1,
                "S_ID": 1,
                "S_PARA": 1,
                "TRANSACTION": {
                    "$setDifference": ["$TRANSACTION", ["$LAST_TRANSACTION"]]
                }
            }
        },
        {"$out": "block_chain"}
    ])
Whilst this approach can be more efficient than the first, it requires knowing all of the existing fields up front, so in some cases it may not be practical.
With PyMongo, grouping by one key seems to be OK:
results = collection.group(key={"scan_status":0}, condition={'date': {'$gte': startdate}}, initial={"count": 0}, reduce=reducer)
results:
{u'count': 215339.0, u'scan_status': u'PENDING'}
{u'count': 617263.0, u'scan_status': u'DONE'}
but when I try to group by multiple keys, I get an exception:
results = collection.group(key={"scan_status":0,"date":0}, condition={'date': {'$gte': startdate}}, initial={"count": 0}, reduce=reducer)
How can I do group by multiple fields correctly?
If you are trying to count over two keys then, while it is possible using .group(), your better option is .aggregate().
This uses "native code operators", rather than the JavaScript interpreted code required by .group(), to do the same basic "grouping" action you are trying to achieve.
The key piece here is the $group pipeline stage:
result = collection.aggregate([
    # Match the possible documents
    {"$match": {"date": {"$gte": startdate}}},
    # Group the documents and "count" via $sum on the values
    {"$group": {
        "_id": {
            "scan_status": "$scan_status",
            "date": "$date"
        },
        "count": {"$sum": 1}
    }}
])
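aggregate() returns a cursor, so the grouped counts can then be read by iterating it, for example:

for doc in result:
    print(doc["_id"]["scan_status"], doc["_id"]["date"], doc["count"])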
In fact, you probably want something that reduces the "date" into a distinct period. As in:
result = collection.aggregate([
    # Match the possible documents
    {"$match": {"date": {"$gte": startdate}}},
    # Group the documents and "count" via $sum on the values
    {"$group": {
        "_id": {
            "scan_status": "$scan_status",
            "date": {
                "year": {"$year": "$date"},
                "month": {"$month": "$date"},
                "day": {"$dayOfMonth": "$date"}
            }
        },
        "count": {"$sum": 1}
    }}
])
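Each result document then carries a composite _id, along these lines (illustrative values):

{"_id": {"scan_status": "DONE", "date": {"year": 2018, "month": 6, "day": 14}}, "count": 42}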
This uses the Date Aggregation Operators.
Or perhaps with basic "date math":
from datetime import datetime

# use "epoch" (1970-01-01) as a base to convert BSON dates to integers;
# note this must be a datetime, not a date, for PyMongo to encode it
epoch = datetime.utcfromtimestamp(0)

result = collection.aggregate([
    # Match the possible documents
    {"$match": {"date": {"$gte": startdate}}},
    # Group the documents and "count" via $sum on the values,
    # truncating each date to the start of its day (in milliseconds)
    {"$group": {
        "_id": {
            "scan_status": "$scan_status",
            "date": {
                "$subtract": [
                    {"$subtract": ["$date", epoch]},
                    {"$mod": [
                        {"$subtract": ["$date", epoch]},
                        1000 * 60 * 60 * 24
                    ]}
                ]
            }
        },
        "count": {"$sum": 1}
    }}
])
This returns integer values (milliseconds from "epoch" time) instead of a composite value object.
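To turn those millisecond values back into Python datetimes on the client side, one option (assuming the grouped value sits in doc["_id"]["date"]) is:

from datetime import datetime

for doc in result:
    # $subtract of two BSON dates yields milliseconds, so divide by 1000
    day = datetime.utcfromtimestamp(doc["_id"]["date"] / 1000.0)
    print(day.date(), doc["count"])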
But all of these options are better than .group(), as they use natively coded routines that perform their actions much faster than the JavaScript code you would otherwise need to supply.