limit results of child/sub document when using find on master document - python

This is my first time exploring MongoDB and I've run into a pickle.
Assume I have a table/collection called inventory.
This collection in turn has documents that look like:
{
    "book" : "Harry Potter",
    "users" : {
        "Read_it" : {
            "John" : <personal number>,
            "Elise" : <personal number>
        },
        "Currently_reading" : { ... }
    }
}
Now, the dictionary "Read_it" can become quite large, and I'm limited by the amount of memory the querying client has, so I would like to somehow limit the number of returned items and perhaps page them.
This is an example I found in the docs, but I'm not sure how to convert it into what I need.
db.inventory.find( { "book": "Harry Potter" }, { item: 1, qty: 500 } )
Skipping the second parameter to find() gives me the result in the form of a complete dictionary, which works as long as the "Read_it" document/container doesn't grow too big.
One solution would be to restructure the data so it becomes flatter, but that isn't optimal for other aspects of this project.
Is it possible to work with find() here, or is there another function that can do this better?

You seem to be asking about projecting only specific elements of a nested structure.
Consider your document example (revised for use):
{
    "book" : "Harry Potter",
    "users" : {
        "Read_it" : {
            "John" : 1,
            "Elise" : 2
        },
        "Currently_reading" : {
            "Peter": 1
        },
        "More_information": 5
    }
}
Then just query as follows:
db.collection.find(
    { "book": "Harry Potter" },
    {
        "book": 1,
        "users.Currently_reading": 1,
        "users.More_information": 1
    }
)
This returns the result with just the fields specified:
{
    "_id" : ObjectId("5573b2beb67e246aba2b4b71"),
    "book" : "Harry Potter",
    "users" : {
        "Currently_reading" : {
            "Peter" : 1
        },
        "More_information" : 5
    }
}
I'm not entirely sure, but that might not be supported in all MongoDB versions. It works in 3.x, though. If you find it is not supported, then do this instead:
db.collection.aggregate([
    { "$match": { "book": "Harry Potter" } },
    { "$project": {
        "book": 1,
        "users": {
            "Currently_reading": "$users.Currently_reading",
            "More_information": "$users.More_information"
        }
    }}
])
The $project stage of the .aggregate() method allows you to manipulate the returned document quite freely, so you don't even need to keep the same structure to return nested results, and you could reshape the result further if needed.
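Since the question is tagged python, here is a minimal PyMongo sketch of the same aggregation; the connection string and database/collection names are assumptions:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
inventory = client["test"]["inventory"]            # assumed db/collection names

pipeline = [
    {"$match": {"book": "Harry Potter"}},
    {"$project": {
        "book": 1,
        "users": {
            "Currently_reading": "$users.Currently_reading",
            "More_information": "$users.More_information",
        },
    }},
]
for doc in inventory.aggregate(pipeline):
    print(doc)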
I would also strongly suggest using arrays of sub-documents with named properties rather than nested dictionaries, since that form is much easier to query and filter than your current structure allows.
Addendum to the unclear question
As mentioned, it is better to use arrays rather than keys to represent the nested data. So if your intent is actually just to restrict the "Read_it" items to a number of entries, then your data is best modelled like this:
{
    "book" : "Harry Potter",
    "users" : {
        "Read_it" : [
            { "username": "John", "id": 1 },
            { "username": "Elise", "id": 2 }
        ],
        "Currently_reading" : [
            { "username": "Peter", "id": 3 }
        ],
        "More_information": 5
    }
}
Then you can run a query that limits the number of items in "Read_it" using $slice:
db.collection.find(
    { "book": "Harry Potter" },
    { "users.Read_it": { "$slice": 1 } }
)
Which returns:
{
    "_id" : ObjectId("5574118012ae33005f1fca17"),
    "book" : "Harry Potter",
    "users" : {
        "Read_it" : [
            {
                "username" : "John",
                "id" : 1
            }
        ],
        "Currently_reading" : [
            {
                "username" : "Peter",
                "id" : 3
            }
        ],
        "More_information" : 5
    }
}
Alternative options are the positional $ projection operator or even the aggregation framework for multiple matches in the array, but there are already many answers here that show you how to do that.
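Since the question also asks about paging, here is a minimal PyMongo sketch using the two-argument form of $slice (skip, limit) on the array-based model above; the connection string, database/collection names, and paging values are assumptions:

from pymongo import MongoClient

inventory = MongoClient("mongodb://localhost:27017")["test"]["inventory"]  # assumed names

page, page_size = 2, 10  # hypothetical paging parameters
doc = inventory.find_one(
    {"book": "Harry Potter"},
    {"users.Read_it": {"$slice": [page * page_size, page_size]}},
)
print(doc)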

Related

Elasticsearch: I want to count how many times a particular value has appeared in a particular field

I have a simple question that is becoming a dilemma. I want to count how many times a particular value has appeared in a particular field of my Elasticsearch index.
Let's say I have product.name and I want to know how many times the exact word "Shampoo" has appeared inside it.
I know there's the terms aggregation, which provides all the unique values along with the number of their occurrences. But given that I have billions of records, that's not good for me; I don't want an extra step to find my desired value among the keys.
Filter aggregation
Defines a single bucket of all the documents in the current document set context that match a specified filter. Often this will be used to narrow down the current aggregation context to a specific set of documents.
An aggregation like value_count can be applied to the bucket returned by the filter aggregation to get the total count of documents.
I have taken product to be of type object; if it is of the nested datatype, then a nested query needs to be used in place of the simple term query.
Mapping
{
    "index92" : {
        "mappings" : {
            "properties" : {
                "product" : {
                    "properties" : {
                        "name" : {
                            "type" : "text",
                            "fields" : {
                                "keyword" : {
                                    "type" : "keyword",
                                    "ignore_above" : 256
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
Query:
{
    "size": 0,
    "aggs": {
        "shampoo": {
            "filter": {
                "term": {
                    "product.name.keyword": "shampoo"
                }
            },
            "aggs": {
                "count": {
                    "value_count": {
                        "field": "product.name.keyword"
                    }
                }
            }
        }
    }
}
Result:
"aggregations" : {
"shampoo" : {
"doc_count" : 2,
"count" : {
"value" : 2
}
}
}

How to write match condition for array values?

I have stored values in multiple variables. Below are the input variables:
uid = ObjectId("5d518caed55bc00001d235c1")
disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
These values change dynamically, and below is my code:
user_posts.aggregate([
    {
        "$match": {
            "$or": [
                { "userid": uid },
                { "userid": { "$eq": disuid } }
            ]
        }
    },
    {
        "$lookup": {
            "from": "user_profile",
            "localField": "userid",
            "foreignField": "_id",
            "as": "details"
        }
    },
    { "$unwind": "$details" },
    { "$sort": { "created_ts": -1 } },
    {
        "$project": {
            "userid": 1,
            "type": 1,
            "location": 1,
            "caption": 1
        }
    }
])
In the above code, I am getting documents matched to uid only, but I need documents matched to disuid as well.
In the userid field, we have stored ObjectId values only.
So my concern is how to convert the "disuid" values to ObjectId and how to write a match condition for both variables on the userid field?
OK, you can do it in two ways.
As you have this:
uid = ObjectId("5d518caed55bc00001d235c1")
disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
You need to convert your list of strings to a list of ObjectIds using Python code:
from bson.objectid import ObjectId

disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
my_list = []
for i in disuid:
    my_list.append(ObjectId(i))
It will look like this:
[ObjectId('5d76b2c847c8d3000184a090'), ObjectId('5d7abb7a97a90b0001326010')]
Then, using the new list my_list, you can run the query like this:
user_posts.aggregate([{"$match" : { "$or" : [{ "userid" : uid }, { "userid" : { "$in" : my_list }}]}}])
Or the other way, which I wouldn't prefer, since converting just a few values in code is easier than converting the userid field for every document in the DB; but just in case you want it done with a DB query:
user_posts.aggregate([
    { "$addFields": { "userStrings": { "$toString": "$userid" } } },
    { "$match": { "$or": [ { "userid": uid }, { "userStrings": { "$in": disuid } } ] } }
])
Note: the bson module ships with PyMongo, so installing PyMongo (pip install pymongo) provides it; avoid installing the standalone bson package from PyPI, as it can conflict with PyMongo.
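Putting the preferred first approach together, a minimal end-to-end sketch might look like this; the connection string and database name are assumptions:

from bson.objectid import ObjectId
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed connection string
user_posts = client["mydb"]["user_posts"]           # assumed database name

uid = ObjectId("5d518caed55bc00001d235c1")
disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
my_list = [ObjectId(i) for i in disuid]             # convert strings to ObjectIds

pipeline = [
    {"$match": {"$or": [{"userid": uid}, {"userid": {"$in": my_list}}]}},
]
for post in user_posts.aggregate(pipeline):
    print(post["_id"])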

Pymongo find value in subdocuments

I'm using MongoDB 4 and Python 3. I have 3 collections. The first collection has 2 fields that reference the other collections.
Example :
User {
    _id : ObjectId("5b866e8e06a77b30ce272ba6"),
    name : "John",
    pet : ObjectId("5b9248cc06a77b09a496bad0"),
    car : ObjectId("5b214c044ds32f6bad7d2")
}
Pet {
    _id : ObjectId("5b9248cc06a77b09a496bad0"),
    name : "Mickey"
}
Car {
    _id : ObjectId("5b214c044ds32f6bad7d2"),
    model : "Tesla"
}
So one User has one car and one pet. I need to query the User collection and find if there is a User who has a Pet with the name "Mickey" and a Car with the model "Tesla".
I tried this:
db.user.aggregate([
    { $project : { "pet.name" : "Mickey", "car.model" : "Tesla" } }
])
But it returns a lot of data, while I have just one document with this data. What am I doing wrong?
The answer posted by @AnthonyWinzlet has the downside that it needs to churn through all documents in the users collection and perform $lookups, which is relatively costly. So depending on the size of your users collection, it may well be faster to do this:
Put an index on users.pet and users.car: db.users.createIndex({pet: 1, car: 1})
Put an index on cars.model: db.cars.createIndex({model: 1})
Put an index on pets.name: db.pets.createIndex({name: 1})
Then you could simply do this:
Get the list of all matching "Tesla" cars: db.cars.find({model: "Tesla"})
Get the list of all matching "Mickey" pets: db.pets.find({name: "Mickey"})
Find the users you are interested in: db.users.find({car: { $in: [<ids from cars query>] }, pet: { $in: [<ids from pets query>] }})
That is pretty easy to read and understand, plus all three queries are fully covered by indexes, so they can be expected to be as fast as things can get. A PyMongo sketch of this approach follows below.
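A minimal PyMongo sketch of the indexed three-query approach described above; the connection string and database name are assumptions:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["mydb"]                                # assumed database name

# _id values of matching cars and pets (covered by the indexes created above)
car_ids = [c["_id"] for c in db.cars.find({"model": "Tesla"}, {"_id": 1})]
pet_ids = [p["_id"] for p in db.pets.find({"name": "Mickey"}, {"_id": 1})]

# Users referencing any matching car AND any matching pet
for user in db.users.find({"car": {"$in": car_ids}, "pet": {"$in": pet_ids}}):
    print(user["name"])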
You need to use the $lookup aggregation stage here.
Something like this:
db.users.aggregate([
    { "$lookup": {
        "from": Pet.collection.name,
        "let": { "pet": "$pet" },
        "pipeline": [
            { "$match": { "$expr": { "$eq": ["$_id", "$$pet"] }, "name": "Mickey" } }
        ],
        "as": "pet"
    }},
    { "$lookup": {
        "from": Car.collection.name,
        "let": { "car": "$car" },
        "pipeline": [
            { "$match": { "$expr": { "$eq": ["$_id", "$$car"] }, "model": "Tesla" } }
        ],
        "as": "car"
    }},
    { "$match": { "pet": { "$ne": [] }, "car": { "$ne": [] } } },
    { "$project": { "name": 1 } }
])

Workaround for preserveNullAndEmptyArrays in MongoDB 2.6

I am using a Python script to query a MongoDB collection. The collection contains embedded documents with varying structures.
I am trying to simply "$unwind" an array contained in several documents. However, the array is not in ALL documents.
That means only the documents that contain the field are returned; the others are ignored. I am on MongoDB 2.6, so I am unable to use preserveNullAndEmptyArrays as mentioned in the documentation, because it is new in MongoDB 3.2.
Is there a workaround to this? Something along the lines of "if the field path exists, unwind".
The structure of documents and code in question is outlined in detail in this separate but related question I asked earlier.
ISSUE:
I am trying to "$unwind" the value of $hostnames.name. However, since the path doesn't exist in all documents, this results in several ignored documents.
Structure 1: Hostname stored as $hostnames.name
{
    "_id" : "192.168.1.1",
    "addresses" : {
        "ipv4" : "192.168.1.1"
    },
    "hostnames" : [
        {
            "type" : "PTR",
            "name" : "example.hostname.com"
        }
    ]
}
Structure 2: Hostname stored as $hostname
{
    "_id" : "192.168.2.1",
    "addresses" : {
        "ipv4" : "192.168.2.1"
    },
    "hostname" : "helloworld.com"
}
Script
cmp = db['computers'].aggregate([
    {"$project": {
        "u_hostname": {
            "$ifNull": [
                "$hostnames.name",
                {"$map": {
                    "input": {"$literal": ["A"]},
                    "as": "el",
                    "in": "$hostname"
                }}
            ]
        },
        "_id": 0,
        "u_ipv4": "$addresses.ipv4"
    }},
    {"$unwind": "$u_hostname"}
])
I am missing all documents that have an empty array for "hostnames".
Here is the structure of the documents that are still missing.
Structure 3
{
    "_id" : "192.168.1.1",
    "addresses" : { "ipv4" : "192.168.1.1" },
    "hostnames" : []
}
We can still preserve all the documents where the array field is missing or empty by combining the $ifNull operator with a logical $cond expression to assign a value to the newly computed field.
The $eq condition checks whether the computed "hostnames.name" value is [None] (which happens when the array is empty or the path is missing); if so, the flat "hostname" field is wrapped in an array instead, so $unwind keeps the document.
cmp = db['computers'].aggregate([
    {"$project": {
        "u_ipv4": "$addresses.ipv4",
        "u_hostname": {
            "$let": {
                "vars": {
                    "hostnameName": {
                        "$cond": [
                            {"$eq": ["$hostnames", []]},
                            [None],
                            {"$ifNull": ["$hostnames.name", [None]]}
                        ]
                    },
                    "hostname": {"$ifNull": ["$hostname", None]}
                },
                "in": {
                    "$cond": [
                        {"$eq": ["$$hostnameName", [None]]},
                        {"$map": {
                            "input": {"$literal": [None]},
                            "as": "el",
                            "in": "$$hostname"
                        }},
                        "$$hostnameName"
                    ]
                }
            }
        }
    }},
    {"$unwind": "$u_hostname"}
])
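For reference, on MongoDB 3.2 or newer this workaround is unnecessary, since $unwind accepts a document form with the option directly. A minimal sketch of that form; note it keeps documents with a missing or empty array, while the fallback to the flat "hostname" field of Structure 2 would still need an expression like the $ifNull above:

# Requires MongoDB 3.2+, where $unwind accepts a document with options.
cmp = db['computers'].aggregate([
    {"$project": {"_id": 0, "u_ipv4": "$addresses.ipv4", "u_hostname": "$hostnames.name"}},
    {"$unwind": {"path": "$u_hostname", "preserveNullAndEmptyArrays": True}}
])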

mongodb python, quick pipeline code check

I am a beginner with MongoDB and I have an assignment to write pipeline code. My goal is to find which region in India has the largest number of cities with longitude between 75 and 80. I hope somebody can help me point out my misconceptions and/or mistakes; it is very short code, so I am sure the pros will spot it right away.
Here is my code; I will post what the data structure looks like below it:
pipeline = [
{"$match" : {"lon": {"$gte":75, "$lte" : 80}},
{'country' : 'India'}},
{ '$unwind' : '$isPartOf'},
{ "$group":
{
"_id": "$name",
"count" :{"$sum":{"cityname":"$name"}} }},
{"$sort": {"count": -1}},
{"$limit": 1}
]
{
    "_id" : ObjectId("52fe1d364b5ab856eea75ebc"),
    "elevation" : 1855,
    "name" : "Kud",
    "country" : "India",
    "lon" : 75.28,
    "lat" : 33.08,
    "isPartOf" : [
        "Jammu and Kashmir",
        "Udhampur district"
    ],
    "timeZone" : [
        "Indian Standard Time"
    ],
    "population" : 1140
}
The following pipeline will give you the desired result. The first $match stage uses standard MongoDB queries to filter the documents (cities) whose longitude is between 75 and 80, as well as only the ones in India, based on the country field. Since each document represents a city, the $unwind operator on isPartOf deconstructs that array field from the filtered documents and outputs one document per element; each output document replaces the array with an element value. Thus, for each input document it outputs n documents, where n is the number of array elements. This is useful in the next $group stage, where you can calculate n with the $sum group accumulator operator. The remaining pipeline stages then transform the final document structure by introducing the replacement fields Region and NumberOfCities, sorting the documents in descending order, and returning the top 1 document, which is the region with the largest number of cities:
pipeline = [
    {
        "$match": {
            "lon": {"$gte": 75, "$lte": 80},
            "country": "India"
        }
    },
    {
        "$unwind": "$isPartOf"
    },
    {
        "$group": {
            "_id": "$isPartOf",
            "count": {"$sum": 1}
        }
    },
    {
        "$project": {
            "_id": 0,
            "Region": "$_id",
            "NumberOfCities": "$count"
        }
    },
    {
        "$sort": {"NumberOfCities": -1}
    },
    {"$limit": 1}
]
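To run it from Python, a minimal PyMongo sketch might look like this; the connection string, database, and collection names are assumptions:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
cities = client["examples"]["cities"]              # assumed db/collection names

result = list(cities.aggregate(pipeline))
if result:
    print(result[0])  # e.g. {"Region": ..., "NumberOfCities": ...}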
There are some syntax and logical errors in your pipeline.
{"$match" : {"lon": {"$gte":75, "$lte" : 80}},
{'country' : 'India'}},
The syntax here is wrong; you should just use a comma to separate key-value pairs in $match.
"_id": "$name",
You are grouping based on city name and not on the region.
{"$sum":{"cityname":"$name"}}
You need to pass the $sum operator a numeric value that results from applying a specified expression; {"cityname":"$name"} will be ignored.
The correct pipeline would be:
[
    {"$match": {"lon": {"$gte": 75, "$lte": 80}, "country": "India"}},
    {"$unwind": "$isPartOf"},
    {"$group": {
        "_id": "$isPartOf",
        "count": {"$sum": 1}
    }},
    {"$sort": {"count": -1}},
    {"$limit": 1}
]
If you want to get all the cities in that region satisfying your condition as well, you can add "cities": {'$push': '$name'} in the $group stage, as shown in the sketch below.
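A minimal sketch of that modified $group stage (the rest of the pipeline stays the same):

{"$group": {
    "_id": "$isPartOf",
    "count": {"$sum": 1},
    "cities": {"$push": "$name"}   # collect the matching city names per region
}}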
