MongoDB - Querying a nested boolean field - python

I have the following mongo shell query:
db = db.getSiblingDB("test-db");
hosts = db.getCollection("test-collection");
db.hosts.aggregate([
    {$match: {"ip_str": {$in: ["52.217.105.116"]}}}
]);
Which outputs this:
{
    "_id" : ObjectId("..."),
    "ip_str" : "52.217.105.116",
    "data" : [
        {"ssl" : {"cert" : {"expired" : "False"}}}
    ]
}
I'm trying to build the query so it returns a boolean True or False depending on the value of the ssl.cert.expired field. I'm not quite sure how to do this though. I've had a look into the $lookup and $where operators, but am not overly familiar with querying nested objects in Mongo yet.

Since data is an array, to get the (first) nested expired value you should use $arrayElemAt, passing an index of 0 to select the first element.
{
    $project: {
        ip_str: 1,
        expired: {
            $arrayElemAt: ["$data.ssl.cert.expired", 0]
        }
    }
}
Demo: Mongo Playground
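Since the question mentions Python, the same pipeline can be passed to pymongo. The sketch below reuses the names from the question and adds a $eq comparison so the result is a real boolean rather than the stored string; treating "expired" as the string "True"/"False" is an assumption based on the sample output above.

```python
# Pipeline mirroring the shell answer; the $eq step assumes the field
# is stored as the strings "True"/"False", as in the sample document.
pipeline = [
    {"$match": {"ip_str": {"$in": ["52.217.105.116"]}}},
    {"$project": {
        "ip_str": 1,
        # First element of the nested array path, converted to a boolean
        "expired": {
            "$eq": [
                {"$arrayElemAt": ["$data.ssl.cert.expired", 0]},
                "True",
            ]
        },
    }},
]

# Usage (assumes a running server and the names from the question):
# from pymongo import MongoClient
# client = MongoClient()
# results = list(client["test-db"]["test-collection"].aggregate(pipeline))
```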

How to scroll through elastic query results, python

I'm querying my Elasticsearch server and limiting the response to 100 results, but there could be 5000+ in total; for speed I don't want to overload the user's connection by sending everything in bulk.
data = es.search(index=case_to_view, size=100, body={
    "query": {
        "range": {
            "someRandomFIeld": {
                "gte": 1,
            }
        }
    }
})
This does two things: it only matches documents where that field exists, and only those where its value is greater than or equal to 1.
data['hits']['total'] # 5089
How do I let the user get the next batch of results from the same query, i.e. the next 100, the previous 100, etc.?
You'll want to utilize the "from" and "size" properties.
You can see it here in the 7.0 documentation.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html
ex:
{
    "from" : 0, "size" : 10,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
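In Python, paging then reduces to computing the offset for each page and passing it as from_ (the elasticsearch-py keyword for "from"). A minimal sketch, with the index name and query taken from the question and the page arithmetic factored into a reusable helper:

```python
def page_offset(page, page_size=100):
    """Return the 'from' offset for a zero-based page number."""
    if page < 0:
        raise ValueError("page must be >= 0")
    return page * page_size

# Usage with elasticsearch-py (assumes `es` and `case_to_view` from the question):
# data = es.search(index=case_to_view, size=100, from_=page_offset(2),
#                  body={"query": {"range": {"someRandomFIeld": {"gte": 1}}}})
```

Note that from + size is capped by the index.max_result_window setting (10,000 by default), so for very deep paging the scroll or search_after APIs are the usual alternatives.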

Use $cond within $match in mongoDB aggregation

I've tried to use $cond within $match in one of the stages of an aggregation, as shown below:
{ "$match" : {
    "field1" : {
        "$cond" : {
            "if" : { "$eq" : [ "$id", 1206 ] },
            "then" : 0,
            "else" : 1545001200
        }
    },
    "field2" : value2
}}
But I got this error:
Error:
Assert: command failed: {
    "ok" : 0,
    "errmsg" : "unknown operator: $cond",
    "code" : 2,
    "codeName" : "BadValue"
} : aggregate failed
The mongodb version is 3.4.4.
Any idea about this issue?
You just have to reword the logic a little bit.
{ $match: { $expr: {
    $or: [
        { $and: [
            { $eq: [ "$id", 1206 ] },
            { $eq: [ "$field1", 0 ] }
        ]},
        { $and: [
            { $ne: [ "$id", 1206 ] },
            { $eq: [ "$field1", 1545001200 ] }
        ]}
    ]
}}}
Logically, the two statements are equivalent:
Match the document by checking field1 == 0 if id == 1206; otherwise match it by checking field1 == 1545001200.
Match the document if either (id == 1206 and field1 == 0) or (id != 1206 and field1 == 1545001200).
For those coming across this later on down the road:
This won't work for 3.4.4. But in MongoDB 3.6 they introduced the $expr operator that lets you use $cond and other operations within a $match query.
https://docs.mongodb.com/manual/reference/operator/aggregation/match/
For an example see iamfrank's answer.
Also as mentioned in the comments you could do this later down the pipeline. But ideally you'll want to filter out results as early on in the pipeline using $match in order to improve processing times.
Unless important details were left out of the question, I think you are complicating something simple. Filtering is exactly what a $match aggregation stage is meant to do.
For this particular example, there are only two simple scenarios to match a document. There is no need to use any other operators, just define the two different mutually exclusive queries and put them in an $or logical operator:
{'$match': {'$or': [
    {'id': 1206, 'field1': 0},
    {'id': {'$ne': 1206}, 'field1': 1545001200},
]}}
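To sanity-check that the two formulations agree, the stage can be built as a Python dict alongside a pure-Python mirror of the conditional logic (a sketch; field2/value2 from the question are omitted since their values aren't given):

```python
# The two mutually exclusive branches as a single $match stage.
match_stage = {"$match": {"$or": [
    {"id": 1206, "field1": 0},
    {"id": {"$ne": 1206}, "field1": 1545001200},
]}}

def matches(doc):
    """Pure-Python mirror of the $cond intent: field1 == 0 when id == 1206,
    otherwise field1 == 1545001200."""
    if doc.get("id") == 1206:
        return doc.get("field1") == 0
    return doc.get("field1") == 1545001200
```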

pymongo date field converted to unknown date format

I am fetching data from one collection with Python, processing it, and storing the result in another collection. In the processed collection some of the date fields look odd, like Date(-61833715200000).
I use the code below to fetch and process the data, then bulk insert the values into the new collection.
fleet_managers = taximongo.users.aggregate([{ "$match": { "role" : "fleet_manager"}}])
fleet_managers = pd.DataFrame(list(fleet_managers))
fleet_managers['city_id'] = fleet_managers['region_id'].map({'57ff2e84f39e0f0444000004':'Chennai','57ff2e08f39e0f0444000003':'Hyderabad'})
pros_fleet_managers.insert_many(fleet_managers.to_dict('records'))
The collection looks like this:
{
    "_id" : ObjectId("58006678ee5e0e29c5000009"),
    "deleted_at" : NaN,
    "region_id" : "57ff2e84f39e0f0444000004",
    "reset_password_sent_at" : Date(-61833715200000),
    "current_sign_in_at" : ISODate("2016-10-14T06:07:55.568Z"),
    "last_sign_in_at" : ISODate("2016-10-14T06:07:45.574Z"),
    "remember_created_at" : Date(-61833715200000)
}
What did I do wrong here? Thanks in advance.
I found a solution by using $ifNull while projecting the fields.
fleet_managers = taximongo.users.aggregate([
    {"$match": {"role": "fleet_manager"}},
    {"$project": {
        '_id': 1,
        'deleted_at': {"$ifNull": ["$deleted_at", "null"]},
        'reset_password_sent_at': {"$ifNull": ["$reset_password_sent_at", "null"]},
        'region_id': 1,
        'current_sign_in_at': 1,
        'last_sign_in_at': 1,
        'remember_created_at': {"$ifNull": ["$remember_created_at", "null"]},
    }},
])
fleet_managers = pd.DataFrame(list(fleet_managers))
fleet_managers['city_id'] = fleet_managers['region_id'].map({'57ff2e84f39e0f0444000004': 'Chennai', '57ff2e08f39e0f0444000003': 'Hyderabad'})
pros_fleet_managers.insert_many(fleet_managers.to_dict('records'))
The above code works, but I need to handle null or missing dates dynamically, i.e. while fetching from the source collection. Help me out on this.
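One way to handle this dynamically is to build the $project stage from a list of date fields instead of writing each $ifNull by hand. A sketch using the field names from the question; which fields count as dates is an assumption you would adapt to your schema:

```python
def build_projection(date_fields, passthrough_fields):
    """Build a $project stage that replaces missing/None dates with "null"."""
    projection = {field: 1 for field in passthrough_fields}
    for field in date_fields:
        # $ifNull substitutes the fallback when the field is absent or null
        projection[field] = {"$ifNull": ["$" + field, "null"]}
    return {"$project": projection}

# Usage with the fields from the question:
# stage = build_projection(
#     ["deleted_at", "reset_password_sent_at", "remember_created_at"],
#     ["_id", "region_id", "current_sign_in_at", "last_sign_in_at"],
# )
# fleet_managers = taximongo.users.aggregate(
#     [{"$match": {"role": "fleet_manager"}}, stage])
```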

Remove duplicate values in mongodb

I am learning MongoDB using Python with Tornado. I have a MongoDB collection; when I do
db.cal.find()
{
    "Pid" : "5652f92761be0b14889d9854",
    "Registration" : "TN 56 HD 6766",
    "Vid" : "56543ed261be0b0a60a896c9",
    "Period" : "10-2015",
    "AOs" : [
        "14-10-2015",
        "15-10-2015",
        "18-10-2015",
        "14-10-2015",
        "15-10-2015",
        "18-10-2015"
    ],
    "Booked" : [
        "5-10-2015",
        "7-10-2015",
        "8-10-2015",
        "5-10-2015",
        "7-10-2015",
        "8-10-2015"
    ],
    "NA" : [
        "1-10-2015",
        "2-10-2015",
        "3-10-2015",
        "4-10-2015",
        "1-10-2015",
        "2-10-2015",
        "3-10-2015",
        "4-10-2015"
    ],
    "AOr" : [
        "23-10-2015",
        "27-10-2015",
        "23-10-2015",
        "27-10-2015"
    ]
}
I need an operation to remove the duplicate values from the Booked, NA, AOs, and AOr arrays. Finally it should be
{
    "Pid" : "5652f92761be0b14889d9854",
    "Registration" : "TN 56 HD 6766",
    "Vid" : "56543ed261be0b0a60a896c9",
    "AOs" : [
        "14-10-2015",
        "15-10-2015",
        "18-10-2015"
    ],
    "Booked" : [
        "5-10-2015",
        "7-10-2015",
        "8-10-2015"
    ],
    "NA" : [
        "1-10-2015",
        "2-10-2015",
        "3-10-2015",
        "4-10-2015"
    ],
    "AOr" : [
        "23-10-2015",
        "27-10-2015"
    ]
}
How do I achieve this in mongodb?
Working solution
I have created a working solution based on JavaScript, which is available on the mongo shell:
var codes = ["AOs", "Booked", "NA", "AOr"]
// Use bulk operations for efficiency
var bulk = db.dupes.initializeUnorderedBulkOp()
db.dupes.find().forEach(
    function(doc) {
        // Needed to prevent unnecessary operations
        changed = false
        codes.forEach(
            function(code) {
                var values = doc[code]
                var uniq = []
                for (var i = 0; i < values.length; i++) {
                    // If the current value cannot be found, it is unique
                    // in the "uniq" array after insertion
                    if (uniq.indexOf(values[i]) == -1) {
                        uniq.push(values[i])
                    }
                }
                doc[code] = uniq
                if (uniq.length < values.length) {
                    changed = true
                }
            }
        )
        // Update the document only if something was changed
        if (changed) {
            bulk.find({"_id": doc._id}).updateOne(doc)
        }
    }
)
// Apply all changes
bulk.execute()
Resulting document with your sample input:
replset:PRIMARY> db.dupes.find().pretty()
{
    "_id" : ObjectId("567931aefefcd72d0523777b"),
    "Pid" : "5652f92761be0b14889d9854",
    "Registration" : "TN 56 HD 6766",
    "Vid" : "56543ed261be0b0a60a896c9",
    "Period" : "10-2015",
    "AOs" : [
        "14-10-2015",
        "15-10-2015",
        "18-10-2015"
    ],
    "Booked" : [
        "5-10-2015",
        "7-10-2015",
        "8-10-2015"
    ],
    "NA" : [
        "1-10-2015",
        "2-10-2015",
        "3-10-2015",
        "4-10-2015"
    ],
    "AOr" : [
        "23-10-2015",
        "27-10-2015"
    ]
}
Using indices with dropDups
This simply does not work. First, as of version 3.0, this option no longer exists; since 3.2 has been released, we should find a portable way.
Second, even with dropDups, the documentation clearly states that:
dropDups boolean : MongoDB indexes only the first occurrence of a key and removes all documents from the collection that contain subsequent occurrences of that key.
So if there would be another document which has the same values in one of the billing codes as a previous one, the whole document would be deleted.
You can't use the "dropDups" syntax here, first because it was deprecated as of MongoDB 2.6 and removed in MongoDB 3.0, so it will not even work.
To remove the duplicates from each list you can use Python's set type.
import pymongo

fields = ['Booked', 'NA', 'AOs', 'AOr']
client = pymongo.MongoClient()
db = client.test
collection = db.cal
bulk = collection.initialize_ordered_bulk_op()
count = 0
for document in collection.find():
    update = dict(zip(fields, [list(set(document[field])) for field in fields]))
    bulk.find({'_id': document['_id']}).update_one({'$set': update})
    count = count + 1
    if count % 200 == 0:
        bulk.execute()
        bulk = collection.initialize_ordered_bulk_op()
# Execute any remaining operations (guard against an empty bulk)
if count % 200 != 0:
    bulk.execute()
MongoDB 3.2 deprecates Bulk() and its associated methods and provides the .bulkWrite() method, available in PyMongo as bulk_write(). The first thing to do with this method is to import the UpdateOne class.
from pymongo import UpdateOne

requests = []  # list of write operations
for document in collection.find():
    update = dict(zip(fields, [list(set(document[field])) for field in fields]))
    requests.append(UpdateOne({'_id': document['_id']}, {'$set': update}))
collection.bulk_write(requests)
The two queries give the same and expected result:
{'AOr': ['27-10-2015', '23-10-2015'],
'AOs': ['15-10-2015', '14-10-2015', '18-10-2015'],
'Booked': ['7-10-2015', '5-10-2015', '8-10-2015'],
'NA': ['1-10-2015', '4-10-2015', '3-10-2015', '2-10-2015'],
'Period': '10-2015',
'Pid': '5652f92761be0b14889d9854',
'Registration': 'TN 56 HD 6766',
'Vid': '56543ed261be0b0a60a896c9',
'_id': ObjectId('567f808fc6e11b467e59330f')}
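One caveat with list(set(...)): Python sets don't preserve order, which is why the dates above come back shuffled. If the original order matters (as it does in the expected output in the question), dict.fromkeys dedupes while keeping first-seen order (guaranteed in Python 3.7+). A sketch:

```python
def dedupe_keep_order(values):
    """Remove duplicates while preserving first-seen order."""
    # dict keys are unique and, in Python 3.7+, insertion-ordered
    return list(dict.fromkeys(values))

# Drop-in replacement inside either update loop above:
# update = {field: dedupe_keep_order(document[field]) for field in fields}
```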
Have you tried distinct()?
Link: https://docs.mongodb.org/v3.0/reference/method/db.collection.distinct/
Specify Query with distinct
The following example returns the distinct values for the field sku, embedded in the item field, from the documents whose dept is equal to "A":
db.inventory.distinct( "item.sku", { dept: "A" } )
The method returns the following array of distinct sku values:
[ "111", "333" ]
Assuming that you want to remove duplicate dates from the collection, you can add a unique index with the dropDups: true option:
db.bill_codes.ensureIndex({"fieldName":1}, {unique: true, dropDups: true})
For more reference:
db.collection.ensureIndex() - MongoDB Manual 3.0
Note: Back up your database first in case it doesn't do exactly as you're expecting.

pyMongo iterate over cursor object with subitems

The function below searches a collection whose documents contain a projects sub-array. If there is a sub-item with isManager set to 1 for the given project it should return True; otherwise it returns False.
def isMasterProject(self, pid, uid):
    masterProjects = False
    proj = self.collection.find({"_id": uid, "projects": {'$elemMatch': {"projectId": _byid(pid), "isManager": 1}}})
    for value in proj:
        if str(value['projects']['projectId']) == pid:
            if value['projects']['isManager'] == 1:
                masterProjects = True
    return masterProjects
_byid is equivalent to ObjectId
It always seems to return False. Here's an example of a document from the collection.
{
    "_id" : ObjectId("52cf683306bcfc7be96a4d89"),
    "firstName" : "Test",
    "lastName" : "User",
    "projects" : [
        {
            "projectId" : ObjectId("514f593c06bcfc1e96f619be"),
            "isManager" : 0
        },
        {
            "projectId" : ObjectId("511e3ed0909706a6a188953d"),
            "isManager" : 1
        },
        {
            "projectId" : ObjectId("51803baf06bcfc149116bf62"),
            "isManager" : 1
        },
        {
            "projectId" : ObjectId("514362bf121f92fb6867e58f"),
            "isManager" : 1
        }
    ],
    "user" : "test.user#example.com",
    "userType" : "Basic"
}
Would it be simpler to check for an empty cursor and if so how would I do that?
How about:
obj = next(proj, None)
if obj:
$elemMatch only returns results when the criteria match a document, so find should only return a cursor at all when your criteria are true.
Since you are using _id in the query and only ever expect one result, why not use findOne and shortcut a step?
Another gotcha for new initiates: be aware that you are getting back the whole document here, not a representation containing only the matching element of the array. The entries that did not match will still be there, and expecting different results while iterating over them will lead you to grief.
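Putting those points together: since the query itself already encodes the condition, checking whether anything came back is enough, and any Python-side inspection has to treat projects as a list. A sketch against the sample document (the pymongo call is commented out since it needs a live connection; the helper name is hypothetical):

```python
def is_master_project(doc, pid):
    """Check the projects array of an already-fetched document."""
    if doc is None:
        return False
    return any(
        str(entry["projectId"]) == str(pid) and entry["isManager"] == 1
        for entry in doc.get("projects", [])  # projects is a list, not a dict
    )

# With pymongo, the $elemMatch query makes the Python-side check redundant:
# doc = self.collection.find_one(
#     {"_id": uid,
#      "projects": {"$elemMatch": {"projectId": _byid(pid), "isManager": 1}}})
# return doc is not None
```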
