Pymongo find value in subdocuments - python

I'm using MongoDB 4 and Python 3. I have 3 collections. The first collection has 2 fields that reference documents in the other two collections.
Example :
User {
_id : ObjectId("5b866e8e06a77b30ce272ba6"),
name : "John",
pet : ObjectId("5b9248cc06a77b09a496bad0"),
car : ObjectId("5b214c044ds32f6bad7d2"),
}
Pet {
_id : ObjectId("5b9248cc06a77b09a496bad0"),
name : "Mickey",
}
Car {
_id : ObjectId("5b214c044ds32f6bad7d2"),
model : "Tesla"
}
So one User has one car and one pet. I need to query the User collection and find if there is a User who has a Pet with the name "Mickey" and a Car with the model "Tesla".
I tried this :
db.user.aggregate([{
$project : {"pet.name" : "Mickey", "car.model" : "Tesla" }
}])
But it returns a lot of data, while only one document should match. What am I doing wrong?

The answer posted by @AnthonyWinzlet has the downside that it needs to churn through all documents in the users collection and perform $lookups, which is relatively costly. So, depending on the size of your users collection, it may well be faster to do this:
Put an index on users.pet and users.car: db.users.createIndex({pet: 1, car: 1})
Put an index on cars.model: db.cars.createIndex({model: 1})
Put an index on pets.name: db.pets.createIndex({name: 1})
Then you could simply do this:
Get the list of all matching "Tesla" cars: db.cars.find({model: "Tesla"})
Get the list of all matching "Mickey" pets: db.pets.find({name: "Mickey"})
Find the users you are interested in: db.users.find({car: { $in: [<ids from cars query>] }, pet: { $in: [<ids from pets query>] }})
That is pretty easy to read and understand, and all three queries can use the indexes created above, so they can be expected to be fast.
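The three steps above could be sketched in PyMongo roughly as follows. This is a minimal sketch, assuming the collections are named users, pets and cars as in this answer and that db is a pymongo Database handle; the helper names build_user_filter and find_matching_users are made up for illustration.

```python
def build_user_filter(car_ids, pet_ids):
    """Filter for users whose car AND pet are among the given ids."""
    return {
        "car": {"$in": list(car_ids)},
        "pet": {"$in": list(pet_ids)},
    }


def find_matching_users(db, pet_name="Mickey", car_model="Tesla"):
    """Run the two id lookups, then the users query.

    `db` is assumed to be a pymongo Database (hypothetical handle).
    """
    # distinct("_id", filter) returns the matching ids as a plain list.
    car_ids = db.cars.distinct("_id", {"model": car_model})
    pet_ids = db.pets.distinct("_id", {"name": pet_name})
    if not car_ids or not pet_ids:
        return []  # no such car or pet, so no user can match
    return list(db.users.find(build_user_filter(car_ids, pet_ids)))
```

Returning early when either id list is empty avoids a third round trip that could not match anything.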

You need to use $lookup aggregation here.
Something like this
db.users.aggregate([
{ "$lookup": {
"from": Pet.collection.name,
"let": { "pet": "$pet" },
"pipeline": [
{ "$match": { "$expr": { "$eq": ["$_id", "$$pet"] }, "name" : "Mickey"}}
],
"as": "pet"
}},
{ "$lookup": {
"from": Car.collection.name,
"let": { "car": "$car" },
"pipeline": [
{ "$match": { "$expr": { "$eq": ["$_id", "$$car"] }, "model" : "Tesla"}}
],
"as": "car"
}},
{ "$match": { "pet": { "$ne": [] }, "car": { "$ne": [] } }},
{ "$project": { "name": 1 }}
])
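For PyMongo, the same pipeline can be written as plain Python dictionaries. The sketch below assumes the collections are literally named "pets" and "cars" (the Pet.collection.name expression above looks like it comes from an ODM, so plain strings are used here); build_lookup_pipeline and the variable name ref are illustrative only.

```python
def build_lookup_pipeline(pet_name, car_model, pets="pets", cars="cars"):
    """Build the two-$lookup pipeline as a list of PyMongo stage dicts."""

    def lookup(coll, local_field, condition):
        # Join on _id and filter the joined document in the same sub-pipeline.
        match = {"$expr": {"$eq": ["$_id", "$$ref"]}}
        match.update(condition)
        return {"$lookup": {
            "from": coll,
            "let": {"ref": "$" + local_field},
            "pipeline": [{"$match": match}],
            "as": local_field,
        }}

    return [
        lookup(pets, "pet", {"name": pet_name}),
        lookup(cars, "car", {"model": car_model}),
        # Keep only users for which both joins found something.
        {"$match": {"pet": {"$ne": []}, "car": {"$ne": []}}},
        {"$project": {"name": 1}},
    ]
```

It would be used as db.users.aggregate(build_lookup_pipeline("Mickey", "Tesla")).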

Related

Mongodb get count() of CommandCursor

I'm performing a search with this aggregate and would like to get my total count (to deal with my pagination).
results = mongo.db.perfumes.aggregate(
[
{"$match": {"$text": {"$search": db_query}}},
{
"$lookup": {
"from": "users",
"localField": "author",
"foreignField": "username",
"as": "creator",
}
},
{"$unwind": "$creator"},
{
"$project": {
"_id": "$_id",
"perfumeName": "$name",
"perfumeBrand": "$brand",
"perfumeDescription": "$description",
"date_updated": "$date_updated",
"perfumePicture": "$picture",
"isPublic": "$public",
"perfumeType": "$perfume_type",
"username": "$creator.username",
"firstName": "$creator.first_name",
"lastName": "$creator.last_name",
"profilePicture": "$creator.avatar",
}
},
{"$sort": {"perfumeName": 1}},
]
)
How could I get the count of results in my route so I can pass it to my template?
I cannot use results.count() as it is a CommandCursor.
Help please? Thank you!!
Converting the cursor to a list and calling len() on it (for example len(list(results))) would be the easier option, but if you want a single aggregation query to return the count and the actual docs at the same time, try using $facet or $group:
Query 1 :
{
$facet: {
docs: [ { $match: {} } ], // passes all docs into an array field
count: [ { $count: "count" } ] // counts no.of docs
}
},
/** re-create count field from array of one object to just a number */
{
$addFields: { count: { $arrayElemAt: [ "$count.count", 0 ] } }
}
Test : mongoplayground
Query 2 :
/** Group all docs without any condition & push all docs into an array field & count no.of docs flowing through iteration using `$sum` */
{
$group: { _id: "", docs: { $push: "$$ROOT" }, count: { $sum: 1 } }
}
Test : mongoplayground
Note :
Add one of these queries at the end of your current aggregation pipeline. Remember that if no docs remain after the $match or $unwind stages, the first query returns docs : [] without a count field, whereas the second query returns nothing at all ([]), so code it accordingly.
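From Python, Query 1 could be appended and unpacked along these lines. This is only a sketch: with_count and unpack are hypothetical helper names, and the default of 0 covers the missing count field described in the note.

```python
def with_count(pipeline):
    """Return a new pipeline that yields one document: {docs: [...], count: n}."""
    return list(pipeline) + [
        {"$facet": {
            "docs": [{"$match": {}}],
            "count": [{"$count": "count"}],
        }},
        {"$addFields": {"count": {"$arrayElemAt": ["$count.count", 0]}}},
    ]


def unpack(facet_result):
    """Split the single $facet document into (docs, count).

    `count` is missing when no documents matched, so default to 0.
    """
    rows = list(facet_result)
    if not rows:
        return [], 0
    return rows[0].get("docs", []), rows[0].get("count", 0)
```

In the route this would look like docs, total = unpack(mongo.db.perfumes.aggregate(with_count(pipeline))). Keep in mind that the $facet output is a single document, so it is subject to the 16 MB document size limit.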
If you look at the CommandCursor docs, you will see that it does not support count().
You can use the length filter in the Jinja template, provided results has been converted to a list first (a cursor has no length):
{{ results | length }}
I hope the above helps.

How to write match condition for array values?

I have values stored in multiple variables. Below are the input variables:
uid = ObjectId("5d518caed55bc00001d235c1")
disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
These values change dynamically. Below is my code:
user_posts.aggregate([{
"$match": {
"$or": [{ "userid": uid }, {
"userid": {
"$eq":
disuid
}
}]
}
},
{
"$lookup": {
"from": "user_profile",
"localField": "userid",
"foreignField": "_id",
"as": "details"
}
},
{ "$unwind": "$details" },
{
"$sort": { "created_ts": -1 }
},
{
"$project": {
"userid": 1,
"type": 1,
"location": 1,
"caption": 1
}
}
])
With the above code I only get the documents matching uid, but I also need the documents matching disuid.
The userid field stores ObjectId values only.
So my question is how to convert the values in disuid to ObjectId and how to write a match condition for both variables on the userid field.
OK, you can do it in two ways.
Since you have this:
uid = ObjectId("5d518caed55bc00001d235c1")
disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
you need to convert your list of strings to a list of ObjectIds in Python:
from bson.objectid import ObjectId
disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
my_list = []
for i in disuid:
    my_list.append(ObjectId(i))
It will look like this :
[ObjectId('5d76b2c847c8d3000184a090'),ObjectId('5d7abb7a97a90b0001326010')]
Then, using the new list my_list, you can query like this:
user_posts.aggregate([{"$match" : { "$or" : [{ "userid" : uid }, { "userid" : { "$in" : my_list }}]}}])
The other way, which I wouldn't prefer, does the conversion in the DB query. Converting a few values in code is cheaper than converting the userid field of every document in the collection to a string, but in case you want it:
user_posts.aggregate([{"$addFields" : {"userStrings" : {"$toString" : "$userid"}}}, {"$match" : { "$or" : [{ "userid" : uid }, { "userStrings" : { "$in" : disuid }}]}}])
Note : The bson package ships with PyMongo, so pip install pymongo is enough. Avoid pip install bson, which installs a separate third-party package that is not compatible with PyMongo.
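The first approach can be condensed into one helper. In this sketch, build_userid_match is a hypothetical name and to_object_id is an injected converter (in real code you would pass bson.ObjectId); since both conditions test the same field, the $or is folded into a single $in, which is a simplification and not part of the original answer.

```python
def build_userid_match(uid, id_strings, to_object_id):
    """Build a $match stage for userid equal to uid or any converted id.

    `to_object_id` is assumed to turn a hex string into an ObjectId.
    """
    ids = [uid] + [to_object_id(s) for s in id_strings]
    return {"$match": {"userid": {"$in": ids}}}
```

Usage would be something like user_posts.aggregate([build_userid_match(uid, disuid, ObjectId), ...]) with the remaining stages unchanged.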

Workaround for preserveNullAndEmptyArrays in MongoDB 2.6

I am using a Python script to query a MongoDB collection. The collection contains embedded documents with varying structures.
I am trying to simply "$unwind" an array contained in several documents. However, the array is not in ALL documents.
That means only the documents that contain the field are returned, the others are ignored. I am using PyMongo 2.6 so I am unable to use preserveNullAndEmptyArrays as mentioned in the documentation because it is new in MongoDB 3.2
Is there a workaround to this? Something along the lines of "if the field path exists, unwind".
The structure of documents and code in question is outlined in detail in this separate but related question I asked earlier.
ISSUE:
I am trying to "$unwind" the value of $hostnames.name. However, since the path doesn't exist in all documents, this results in several ignored documents.
Structure 1 Hostname stored as $hostnames.name
{
"_id" : "192.168.1.1",
"addresses" : {
"ipv4" : "192.168.1.1"
},
"hostnames" : [
{
"type" : "PTR",
"name" : "example.hostname.com"
}
]
}
Structure 2 Hostname stored as $hostname
{
"_id" : "192.168.2.1",
"addresses" : {
"ipv4" : "192.168.2.1"
},
"hostname" : "helloworld.com",
}
Script
cmp = db['computers'].aggregate([
{"$project": {
"u_hostname": {
"$ifNull": [
"$hostnames.name",
{ "$map": {
"input": {"$literal": ["A"]},
"as": "el",
"in": "$hostname"
}}
]
},
"_id": 0,
"u_ipv4": "$addresses.ipv4"
}},
{"$unwind": "$u_hostname"}
])
I am missing all documents that have an empty array for "hostnames".
Here is the structure of the documents that are still missing.
Structure 3
{
"_id" : "192.168.1.1",
"addresses" : { "ipv4" : "192.168.1.1" },
"hostnames" : []
}
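We can still preserve all the documents where the array field is missing or empty by combining the $ifNull operator with $cond to assign a value to the newly computed field.
The inner $cond uses $eq to test whether "hostnames" is an empty array and, if so, substitutes [None]; $ifNull does the same when "hostnames.name" is missing. The outer $cond then checks whether that value equals [None] and, if it does, falls back to a one-element array holding the value of "hostname" (or None).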
cmp = db['computers'].aggregate(
[
{"$project":{
"u_ipv4": "$addresses.ipv4",
"u_hostname": {
"$let": {
"vars": {
"hostnameName": {
"$cond": [
{"$eq": ["$hostnames", []]},
[None],
{"$ifNull": ["$hostnames.name", [None]]}
]
},
"hostname": {"$ifNull": ["$hostname", None]}
},
"in": {
"$cond": [
{"$eq": ["$$hostnameName", [None]]},
{"$map": {
"input": {"$literal": [None]},
"as": "el",
"in": "$$hostname"
}},
"$$hostnameName"
]
}
}
}
}},
{ "$unwind": "$u_hostname" }
]
)
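If the server-side expression becomes hard to maintain, the same normalization can also be done client-side after a plain find(). This is a different technique from the pipeline above, shown only as a sketch; hostname_rows is a hypothetical helper that mirrors the u_ipv4 / u_hostname output for the three document structures in the question.

```python
def hostname_rows(doc):
    """Yield one {u_ipv4, u_hostname} row per hostname, never dropping a doc."""
    ipv4 = doc.get("addresses", {}).get("ipv4")
    names = [h.get("name") for h in doc.get("hostnames") or []]
    if not names:
        # Missing or empty "hostnames": fall back to the scalar field (may be None).
        names = [doc.get("hostname")]
    for name in names:
        yield {"u_ipv4": ipv4, "u_hostname": name}
```

It could be applied as rows = [r for d in db['computers'].find() for r in hostname_rows(d)], at the cost of transferring whole documents to the client.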

limit results of child/sub document when using find on master document

First time exploring mongoDB and I've bumped into a pickle.
Assuming I have a table/collection called inventory.
This collection in turn has documents that look like:
{
"book" : "Harry Potter",
"users" : {
"Read_it" : {
"John" : <personal number>,
"Elise" : <personal number>
},
"Currently_reading" : { ... }
}
}
Now the dictionary "Read_it" can become quite large, and I'm limited by the amount of memory the querying client has, so I would like to somehow limit the number of returned items and perhaps page through them.
This is a function I found in the docs; I'm not sure how to convert it into what I need.
db.inventory.find( { "book": "Harry Potter" }, { item: 1, qty: 500 } )
Skipping the second parameter to find() gives me the complete dictionary, which works as long as the "Read_it" document/container doesn't grow too big.
One solution would be to pull back the structure so it becomes more flat, but that isn't optimal in terms of other aspects of this project.
Is it possible to work with find() here, or is there another function that can do this better?
You seem to be asking about projecting only specific elements of a nested structure.
Consider your document example (revised for use):
{
"book" : "Harry Potter",
"users" : {
"Read_it" : {
"John" : 1,
"Elise" : 2
},
"Currently_reading" : {
"Peter": 1
},
"More_information": 5
}
}
Then just issue as follows:
db.collection.find(
{ "book": "Harry Potter" },
{
"book": 1,
"users.Currently_reading": 1,
"users.More_information": 1
}
)
Returns the result with just the fields specified:
{
"_id" : ObjectId("5573b2beb67e246aba2b4b71"),
"book" : "Harry Potter",
"users" : {
"Currently_reading" : {
"Peter" : 1
},
"More_information" : 5
}
}
I'm not entirely sure, but this might not be supported in all MongoDB versions; it works in 3.x, though. If you find it is not supported, do this instead:
db.collection.aggregate([
{ "$match": { "book": "Harry Potter" } },
{ "$project": {
"book": 1,
"users": {
"Currently_reading": "$users.Currently_reading",
"More_information": "$users.More_information"
}
}}
])
The $project option of the .aggregate() method allows you to manipulate the document returned quite freely. So you don't even need to keep the same structure to return nested results and could change the result further if needed.
I would also strongly suggest using arrays with properties of sub-documents rather than nested dictionaries since that form is much easier to query and filter results than your current structure allows.
Additional to unclear question
As mentioned, it is better to use arrays rather than keys to represent the nested data. So if your intent is to actually just restrict the "Read_it" items to a number of entries then your data is best modelled as such:
{
"book" : "Harry Potter",
"users" : {
"Read_it" : [
{ "username": "John", "id": 1 },
{ "username": "Elise", "id": 2 }
],
"Currently_reading" : [
{ "username": "Peter", "id": 3 }
],
"More_information": 5
}
}
Then you can do a query to limit the number of items in "Read_it" using $slice :
db.collection.find(
{ "book": "Harry Potter" },
{ "users.Read_it": { "$slice": 1 } }
)
Which returns:
{
"_id" : ObjectId("5574118012ae33005f1fca17"),
"book" : "Harry Potter",
"users" : {
"Read_it" : [
{
"username" : "John",
"id" : 1
}
],
"Currently_reading" : [
{
"username" : "Peter",
"id" : 3
}
],
"More_information" : 5
}
}
Alternate options use the projection positional $ operator or even the aggregation framework for multiple matches in the array. But there are already many answers here that show you how to do that.
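To page through "Read_it" rather than only take the first entries, $slice also accepts a [skip, limit] pair. The sketch below assumes the array-based model shown above and zero-based page numbers; read_it_page is a hypothetical helper name.

```python
def read_it_page(page, page_size):
    """Projection returning one page of users.Read_it (pages start at 0)."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size > 0")
    # $slice: [skip, limit] skips the earlier pages and returns one page.
    return {"users.Read_it": {"$slice": [page * page_size, page_size]}}
```

With PyMongo this could be passed as the projection, for example db.inventory.find_one({"book": "Harry Potter"}, read_it_page(0, 50)).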

how to aggregate on each item in collection in mongoDB

MongoDB noob here...
when I do db.students.find().pretty() in the shell I get a long list from my collection...like so..
{
"_id" : 19,
"name" : "Gisela Levin",
"scores" : [
{
"type" : "exam",
"score" : 44.51211101958831
},
{
"type" : "quiz",
"score" : 0.6578497966368002
},
{
"type" : "homework",
"score" : 93.36341655949683
},
{
"type" : "homework",
"score" : 49.43132782777443
}
]
}
Now I've got over 100 of these, and I need to run the following on each of them:
lowest_hw_score =
db.students.aggregate(
// Initial document match (uses index, if a suitable one is available)
{ $match: {
_id : 0
}},
// Expand the scores array into a stream of documents
{ $unwind: '$scores' },
// Filter to 'homework' scores
{ $match: {
'scores.type': 'homework'
}},
// Sort in ascending order (lowest score first)
{ $sort: {
'scores.score': 1
}},
{ $limit: 1}
)
So I can run something like this on each result
for item in lowest_hw_score:
    print(item)
Right now "lowest_hw_score" works on only one item. I want to run this on all items in the collection. How do I do this?
> db.students.aggregate(
{ $match : { 'scores.type': 'homework' } },
{ $unwind: "$scores" },
{ $match:{"scores.type":"homework"} },
{ $group: {
_id : "$_id",
maxScore : { $max : "$scores.score"},
minScore: { $min:"$scores.score"}
}
});
You don't really need the first $match, but if "scores.type" is indexed, the index can be used before the scores are unwound. (I don't believe MongoDB can use the index after the $unwind.)
Result:
{
"result" : [
{
"_id" : 19,
"maxScore" : 93.36341655949683,
"minScore" : 49.43132782777443
}
],
"ok" : 1
}
Edit: tested and updated in mongo shell
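Since the question iterates over the result in Python, the same pipeline might be written for PyMongo as below. This is a sketch under assumptions: homework_extremes_pipeline and homework_extremes are hypothetical names, the optional first $match is left out, and the second function is a pure-Python equivalent for a single document that is handy for checking the aggregation output.

```python
def homework_extremes_pipeline():
    """Per-student min and max homework score, as PyMongo stage dicts."""
    return [
        {"$unwind": "$scores"},
        {"$match": {"scores.type": "homework"}},
        {"$group": {
            "_id": "$_id",
            "minScore": {"$min": "$scores.score"},
            "maxScore": {"$max": "$scores.score"},
        }},
    ]


def homework_extremes(student):
    """Pure-Python equivalent for one document."""
    scores = [s["score"] for s in student["scores"] if s["type"] == "homework"]
    if not scores:
        return None  # the pipeline emits no row for such a student
    return {"_id": student["_id"], "minScore": min(scores), "maxScore": max(scores)}
```

The loop from the question then becomes: for row in db.students.aggregate(homework_extremes_pipeline()): print(row).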
