Workaround for preserveNullAndEmptyArrays in MongoDB 2.6

I am using a Python script to query a MongoDB collection. The collection contains embedded documents with varying structures.
I am trying to simply "$unwind" an array contained in several documents. However, the array is not in ALL documents.
That means only the documents that contain the field are returned; the others are ignored. I am using MongoDB 2.6, so I am unable to use preserveNullAndEmptyArrays as mentioned in the documentation, because it is new in MongoDB 3.2.
Is there a workaround to this? Something along the lines of "if the field path exists, unwind".
The structure of documents and code in question is outlined in detail in this separate but related question I asked earlier.
ISSUE:
I am trying to "$unwind" the value of $hostnames.name. However, since the path doesn't exist in all documents, this results in several ignored documents.
Structure 1 Hostname stored as $hostnames.name
{
"_id" : "192.168.1.1",
"addresses" : {
"ipv4" : "192.168.1.1"
},
"hostnames" : [
{
"type" : "PTR",
"name" : "example.hostname.com"
}
]
}
Structure 2 Hostname stored as $hostname
{
"_id" : "192.168.2.1",
"addresses" : {
"ipv4" : "192.168.2.1"
},
"hostname" : "helloworld.com",
}
Script
cmp = db['computers'].aggregate([
{"$project": {
"u_hostname": {
"$ifNull": [
"$hostnames.name",
{ "$map": {
"input": {"$literal": ["A"]},
"as": "el",
"in": "$hostname"
}}
]
},
"_id": 0,
"u_ipv4": "$addresses.ipv4"
}},
{"$unwind": "$u_hostname"}
])
I am missing all documents that have an empty array for "hostnames".
Here is the structure of the documents that are still missing.
Structure 3
{
"_id" : "192.168.1.1",
"addresses" : { "ipv4" : "192.168.1.1" },
"hostnames" : []
}

We can still preserve all the documents where the array field is missing or empty by combining the $ifNull operator with $cond to assign a value to the newly computed field.
Both conditions are $eq comparisons. The first replaces an empty "hostnames" array with [None]. The second checks whether the intermediate value is [None], and if so uses the value of "hostname" (or None when that is missing too) instead.
cmp = db['computers'].aggregate(
[
{"$project":{
"u_ipv4": "$addresses.ipv4",
"u_hostname": {
"$let": {
"vars": {
"hostnameName": {
"$cond": [
{"$eq": ["$hostnames", []]},
[None],
{"$ifNull": ["$hostnames.name", [None]]}
]
},
"hostname": {"$ifNull": ["$hostname", None]}
},
"in": {
"$cond": [
{"$eq": ["$$hostnameName", [None]]},
{"$map": {
"input": {"$literal": [None]},
"as": "el",
"in": "$$hostname"
}},
"$$hostnameName"
]
}
}
}
}},
{ "$unwind": "$u_hostname" }
]
)
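To check what the pipeline above is expected to return without a server, the same fallback logic can be sketched in plain Python. This is only an illustration of the intended result, not a replacement for the aggregation; the helper name unwind_hostnames is made up for this example, and the three sample documents are the structures from the question.

```python
def unwind_hostnames(doc):
    """Emulate the $project + $unwind result for one document (illustrative helper)."""
    # Names found in the "hostnames" array (Structure 1). The list is empty
    # when the array is missing (Structure 2) or empty (Structure 3).
    names = [h["name"] for h in (doc.get("hostnames") or []) if "name" in h]
    if not names:
        # Fall back to the scalar "hostname" field, or None if it is absent too.
        names = [doc.get("hostname")]
    ipv4 = doc.get("addresses", {}).get("ipv4")
    # One output row per hostname, which is what $unwind produces.
    return [{"u_ipv4": ipv4, "u_hostname": name} for name in names]


docs = [
    {"_id": "192.168.1.1", "addresses": {"ipv4": "192.168.1.1"},
     "hostnames": [{"type": "PTR", "name": "example.hostname.com"}]},
    {"_id": "192.168.2.1", "addresses": {"ipv4": "192.168.2.1"},
     "hostname": "helloworld.com"},
    {"_id": "192.168.1.1", "addresses": {"ipv4": "192.168.1.1"},
     "hostnames": []},
]

rows = [row for doc in docs for row in unwind_hostnames(doc)]
for row in rows:
    print(row)
```

Every input document yields at least one row, so nothing is dropped the way a plain $unwind would drop Structures 2 and 3.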

Related

Mongodb find nested dict element

{
"_id" : ObjectId("63920f965d15e98e3d7c450c"),
"first_name" : "mymy",
"last_activity" : 1669278303.4341061,
"username" : null,
"dates" : {
"29.11.2022" : {
},
"30.11.2022" : {
}
},
"user_id" : "1085116517"
}
How can I find all documents with 29.11.2022 contained in dates? I tried many things, but in all of them the dot is interpreted as a path separator.

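Use $getField in $expr (available from MongoDB 5.0). The query below matches documents whose "29.11.2022" entry is an empty object, as in the sample document.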
db.collection.find({
$expr: {
$eq: [
{},
{
"$getField": {
"field": "29.11.2022",
"input": "$dates"
}
}
]
}
})

Mongodb update a particular word in a string in a multiple document

I am updating a mongodb collection for a small project of mine and I'm stuck with updating a single word in an existing field.
Example:
{
"_id" : ObjectId("5faa46a6036e146f85a4afef"),
"name" : "Kubernetes_cluster_setup - kubernetes-cluster"
}
In the document I want to change "name" to "Kubernetes_cluster_config - kubernetes-cluster".
That is, config should replace setup, while the constant suffix - kubernetes-cluster must be kept.
I tried $set, but it overwrites the entire field, and I do not want - kubernetes-cluster to be removed.

Try using the $replaceOne operator (available from MongoDB 4.4).
You need an aggregation like this.
db.collection.aggregate([
{
"$match": {
"id": 0
}
},
{
"$set": {
"name": {
"$replaceOne": {
"input": "$name",
"find": "setup",
"replacement": "config"
}
}
}
}
])
The first stage finds the document (I've matched by id) and the second replaces setup with config in the name field.
Also, if you want to replace the string for every document, you can use this query:
db.collection.aggregate([
{
"$match": {
"name": {
"$regex": "setup"
}
}
},
{
"$set": {
"name": {
"$replaceOne": {
"input": "$name",
"find": "setup",
"replacement": "config"
}
}
}
}
])
Here the query looks for the documents whose name field contains the word setup and then replaces it with config.
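Note that aggregate() only returns the transformed documents; it does not write them back. Since the question is about updating, here is a minimal Python sketch of how the same $set stage could be passed to update_many as a pipeline-style update. It assumes MongoDB 4.4 or later (pipeline updates need 4.2, $replaceOne needs 4.4); build_rename_update is a hypothetical helper name, and the update_many call is left as a comment so the snippet runs without a server.

```python
import re


def build_rename_update(find, replacement, field="name"):
    """Return (filter, pipeline) for a pipeline-style update (illustrative helper)."""
    # Only touch documents whose field actually contains the text to replace.
    query = {field: {"$regex": re.escape(find)}}
    # Same $set / $replaceOne stage as in the aggregation above.
    pipeline = [
        {"$set": {field: {"$replaceOne": {
            "input": "$" + field,
            "find": find,
            "replacement": replacement,
        }}}}
    ]
    return query, pipeline


query, pipeline = build_rename_update("setup", "config")
# With PyMongo this would be applied as:
#   db.collection.update_many(query, pipeline)

# What $replaceOne does to a single value, emulated with str.replace(..., 1):
before = "Kubernetes_cluster_setup - kubernetes-cluster"
after = before.replace("setup", "config", 1)
print(after)
```

Only the first occurrence of setup is replaced, so the - kubernetes-cluster suffix is left untouched.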

How to write match condition for array values?

I have stored values in multiple variables. Below are the input variables.
uid = ObjectId("5d518caed55bc00001d235c1")
disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
These values change dynamically. Below is my code:
user_posts.aggregate([{
"$match": {
"$or": [{ "userid": uid }, {
"userid": {
"$eq":
disuid
}
}]
}
},
{
"$lookup": {
"from": "user_profile",
"localField": "userid",
"foreignField": "_id",
"as": "details"
}
},
{ "$unwind": "$details" },
{
"$sort": { "created_ts": -1 }
},
{
"$project": {
"userid": 1,
"type": 1,
"location": 1,
"caption": 1
}
}
])
With the above code, I only get documents matching uid, but I also need the documents matching disuid.
The userid field stores ObjectId values only.
So my question is how to convert the values in disuid to ObjectId, and how to write a match condition for both variables using the userid field.

You can do it in two ways.
Given this:
uid = ObjectId("5d518caed55bc00001d235c1")
disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
you need to convert your list of strings to a list of ObjectIds in Python:
from bson.objectid import ObjectId
disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
# Convert each hex string to an ObjectId so it can match the stored userid values
my_list = [ObjectId(i) for i in disuid]
It will look like this:
[ObjectId('5d76b2c847c8d3000184a090'), ObjectId('5d7abb7a97a90b0001326010')]
Then, using the new list my_list, you can query like this:
user_posts.aggregate([{"$match" : { "$or" : [{ "userid" : uid }, { "userid" : { "$in" : my_list }}]}}])
The other way, which I wouldn't prefer (converting a few values in code is cheaper than converting the userid field of every document in the DB), is to do the conversion in the query itself. Note that $toString requires MongoDB 4.0 or later:
user_posts.aggregate([{"$addFields" : {"userStrings" : {"$toString": "$userid"}}},{"$match" : { "$or" : [{ "userid" : uid }, { "userStrings" : { "$in" : disuid }}]}}])
Note: the bson package ships with PyMongo, so pip install pymongo is all you need. Do not run pip install bson; that installs an unrelated third-party package that is incompatible with PyMongo.

How to Build a Keyword Histogram in MongoDB?

I am using MongoDB 3.4 and PyMongo. I have a set of keywords:
keywords = [ 'bar', 'foo', ..., 'zoo' ]
I also have a collection:
docs = { 'data' : ' ... bar foo ... ',
'data' : ' ... foo ... ',
'data' : ' ... zoo ... ' }
I am looking for a PyMongo aggregation query which is going to give me a dict:
{ 'bar' : 0, 'foo' : 2, ..., 'zoo' : 0 }

There isn't anything language-specific about this, as the only solutions are either pure aggregation or mapReduce, where the latter is defined in JavaScript functions.
Just setting up some sample data:
db.wordstuff.insertMany([
{ 'data': "foo brick bar" },
{ 'data': "brick foo" },
{ 'data': "bar brick baz" },
{ 'data': "bax" },
{ 'data': "brin brok fu foo" }
])
Aggregation Framework
Then you can run the aggregation statement:
db.wordstuff.aggregate([
{ "$project": {
"_id": 0,
"split": {
"$filter": {
"input": { "$split": [ "$data", " " ] },
"cond": { "$in": [ "$$this", ["bar","foo","baz","blat"] ] }
}
}
}},
{ "$unwind": "$split" },
{ "$group": { "_id": "$split", "count": { "$sum": 1 } }},
{ "$group": {
"_id": null,
"data": { "$push": { "k": "$_id", "v": "$count" } }
}},
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": {
"$map": {
"input": ["bar","foo","baz","blat"],
"as": "d",
"in": {
"$cond": {
"if": { "$ne": [{ "$indexOfArray": ["$data.k","$$d"] },-1] },
"then": {
"$arrayElemAt": [
"$data",
{ "$indexOfArray": ["$data.k","$$d"] }
]
},
"else": { "k": "$$d", "v": 0 }
}
}
}
}
}
}}
])
In reality, all of the real work is done by this point:
db.wordstuff.aggregate([
{ "$project": {
"_id": 0,
"split": {
"$filter": {
"input": { "$split": [ "$data", " " ] },
"cond": { "$in": [ "$$this", ["bar","foo","baz","blat"] ] }
}
}
}},
{ "$unwind": "$split" },
{ "$group": { "_id": "$split", "count": { "$sum": 1 } }},
])
Which gives you output like:
{ "_id" : "baz", "count" : 1.0 }
{ "_id" : "bar", "count" : 2.0 }
{ "_id" : "foo", "count" : 3.0 }
So the real work here is being done by $split, and that is the main dependency of the aggregation approach: you need at least MongoDB 3.4 to do this. The premise is very simple: $split the words out individually as array members, then $filter that array to keep only the words present in the input list.
That $filter uses $in, which is another addition as of MongoDB 3.4, to match against each listed word. There are other operators that can do this with longer syntax, but we already know we need MongoDB 3.4, so this is the shortest syntax.
All that is really done after that is to $unwind the matched array of words from each document, then $group to obtain those matched words as a distinct list, along with the count of the occurrences.
That really is all there is to it from the main perspective of the database.
The following parts are actually "optional", since they are easy to reproduce in code and probably look a lot clearer and cleaner that way. They are shown just to demonstrate the newer operators, which require at least MongoDB 3.4.4 for the introduction of $arrayToObject.
Again the basics are that the next $group "rolls up" the matched words from the cursor into an array within a single document. There is also a very specific key naming applied of "k" and "v" for later reasons.
Then you use a $replaceRoot stage since the content of the document returned is evaluated from an expression. This expression uses $map to iterate over the "input array" of words and matches those to the entries created from the aggregation. This matching is done using $indexOfArray to return the matched index of the compared value.
You use this within $cond as you either want to transform that value into a matched element using $arrayElemAt, or alternately recognize the index was not a match. This either returns the aggregated entry ( obtained from earlier matches ) or a "default" value of 0 for the given word.
The final part uses $arrayToObject, which transforms an array of objects with properties "k" and "v" into "key/value" pairs as an object.
So you can ask MongoDB to do it, but the data is actually reduced by the minimal pipeline as shown, so you may as well do it in client code. It's pretty simple, and for JavaScript you just do:
var words = db.wordstuff.aggregate([
{ "$project": {
"_id": 0,
"split": {
"$filter": {
"input": { "$split": [ "$data", " " ] },
"cond": { "$in": [ "$$this", ["bar","foo","baz","blat"] ] }
}
}
}},
{ "$unwind": "$split" },
{ "$group": { "_id": "$split", "count": { "$sum": 1 } }},
]).toArray();
var result = ["bar","foo","baz","blat"].map(
w => ( words.map(wd => wd._id).indexOf(w) !== -1)
? words[words.map(wd => wd._id).indexOf(w)]
: { _id: w, count: 0 }
).reduce((acc,curr) => Object.assign(acc,{ [curr._id]: curr.count }),{})
So if there is anything language-specific at all, it is that part. If you choose to run the aggregation at its basics and process the resulting cursor, then the Python code would be:
input = ["bar","foo","baz","blat"]
words = list(db.wordstuff.aggregate([
{ "$project": {
"_id": 0,
"split": {
"$filter": {
"input": { "$split": [ "$data", " " ] },
"cond": { "$in": [ "$$this", input ] }
}
}
}},
{ "$unwind": "$split" },
{ "$group": { "_id": "$split", "count": { "$sum": 1 } }},
]))
# Index the aggregated counts by word, then default to 0 for words that never matched
# (written for Python 3, where reduce is not a builtin and map returns an iterator)
counts = { w['_id']: w['count'] for w in words }
result = { k: counts.get(k, 0) for k in input }
And either method pulls out the same result:
{
"bar" : 2.0,
"foo" : 3.0,
"baz" : 1.0,
"blat" : 0.0
}
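To sanity check those numbers without a server, the same split, filter and count steps can be reproduced in plain Python on the sample documents. This is only a sketch of the logic, using collections.Counter in place of the $group stage; the variable names are chosen for this example.

```python
from collections import Counter

keywords = ["bar", "foo", "baz", "blat"]

# The sample documents inserted into "wordstuff" above.
docs = [
    {"data": "foo brick bar"},
    {"data": "brick foo"},
    {"data": "bar brick baz"},
    {"data": "bax"},
    {"data": "brin brok fu foo"},
]

# $split on spaces, $filter against the keyword list, then count per word.
counts = Counter(
    word
    for doc in docs
    for word in doc["data"].split(" ")
    if word in keywords
)

# Counter returns 0 for keys it has never seen, so unmatched keywords get 0.
result = {k: counts[k] for k in keywords}
print(result)
```

The counts come back as integers here, whereas the shell output above prints them as doubles.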
MapReduce
The alternate case where you don't even have the minimum MongoDB 3.4.0 available is to use mapReduce for the process instead. Again, this needs to be sent to the server as JavaScript, which is generally represented within "strings" in most language implementations ( other than JavaScript itself ):
db.wordstuff.mapReduce(
function() {
this.data.split(' ')
.filter( w => words.indexOf(w) !== -1 )
.forEach( w => emit(null,{ [w]: 1 }) );
},
function(key,values) {
return [].concat.apply([],
values.map(v => Object.keys(v).map(k => ({ k: k, v: v[k] })))
).reduce((acc,curr) => Object.assign(acc,{
[curr.k]: (acc.hasOwnProperty(curr.k))
? acc[curr.k] + curr.v : curr.v
}),{});
},
{
"out": { "inline": 1 },
"scope": { "words": ["bar","foo","baz","blat"] },
"finalize": function(key,value) {
return words.map( w => (value.hasOwnProperty(w))
? { [w]: value[w] } : { [w]: 0 }
).reduce((acc,curr) => Object.assign(acc,curr),{})
}
}
)
And that gives you the same results and really does exactly the same thing. It is just a little slower, because MongoDB needs to evaluate and process the JavaScript, as compared to using its own natively coded methods with the aggregation framework.

limit results of child/sub document when using find on master document

First time exploring mongoDB and I've bumped into a pickle.
Assume I have a table/collection called inventory.
This collection in turn has documents that look like:
{
"book" : "Harry Potter",
"users" : {
"Read_it" : {
"John" : <personal number>,
"Elise" : <personal number>
},
"Currently_reading" : { ... }
}
}
Now the dictionary "Read_it" can become quite large, and I'm limited by the amount of memory the querying client has, so I would like to somehow limit the number of returned items and perhaps page through them.
This is a function I found in the docs, not sure how to convert this into what I need.
db.inventory.find( { "book": "Harry Potter" }, { item: 1, qty: 500 } )
Skipping the second parameter to find() gives me a result in the form of a complete dictionary, which works as long as the "Read_it" document/container doesn't grow too big.
One solution would be to flatten the structure, but that isn't optimal in terms of other aspects of this project.
Is it possible to work with find() here, or is there another function that can do this better?

You seem to be asking about projecting only specific elements of a nested structure.
Consider your document example (revised for use):
{
"book" : "Harry Potter",
"users" : {
"Read_it" : {
"John" : 1,
"Elise" : 2
},
"Currently_reading" : {
"Peter": 1
},
"More_information": 5
}
}
Then just issue as follows:
db.collection.find(
{ "book": "Harry Potter" },
{
"book": 1,
"users.Currently_reading": 1,
"users.More_information": 1
}
)
Returns the result with just the fields specified:
{
"_id" : ObjectId("5573b2beb67e246aba2b4b71"),
"book" : "Harry Potter",
"users" : {
"Currently_reading" : {
"Peter" : 1
},
"More_information" : 5
}
}
Not entirely sure, but that might not be supported in all MongoDB versions. Works in 3.X though. If you find it is not supported then do this instead:
db.collection.aggregate([
{ "$match": { "book": "Harry Potter" } },
{ "$project": {
"book": 1,
"users": {
"Currently_reading": "$users.Currently_reading",
"More_information": "$users.More_information"
}
}}
])
The $project option of the .aggregate() method allows you to manipulate the document returned quite freely. So you don't even need to keep the same structure to return nested results and could change the result further if needed.
I would also strongly suggest using arrays with properties of sub-documents rather than nested dictionaries since that form is much easier to query and filter results than your current structure allows.
Additional to unclear question
As mentioned, it is better to use arrays rather than keys to represent the nested data. So if your intent is to actually just restrict the "Read_it" items to a number of entries then your data is best modelled as such:
{
"book" : "Harry Potter",
"users" : {
"Read_it" : [
{ "username": "John", "id": 1 },
{ "username": "Elise", "id": 2 }
],
"Currently_reading" : [
{ "username": "Peter", "id": 3 }
],
"More_information": 5
}
}
Then you can do a query to limit the number of items in "Read_it" using $slice :
db.collection.find(
{ "book": "Harry Potter" },
{ "users.Read_it": { "$slice": 1 } }
)
Which returns:
{
"_id" : ObjectId("5574118012ae33005f1fca17"),
"book" : "Harry Potter",
"users" : {
"Read_it" : [
{
"username" : "John",
"id" : 1
}
],
"Currently_reading" : [
{
"username" : "Peter",
"id" : 3
}
],
"More_information" : 5
}
}
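Since the question also asks about paging, note that the $slice projection accepts a [skip, limit] pair as well as a single number. Below is a minimal Python sketch of building such a projection per page. The helper name read_it_page is hypothetical, the find_one call is left as a comment so the snippet runs without a server, and the list slice at the end only emulates what the server would return for the remodelled document.

```python
def read_it_page(page, page_size):
    """Projection returning one page of users.Read_it; page is 0-based (illustrative helper)."""
    # $slice with two values means [number to skip, number to return].
    return {"users.Read_it": {"$slice": [page * page_size, page_size]}}


projection = read_it_page(page=1, page_size=1)
# With PyMongo this would be used as:
#   db.collection.find_one({"book": "Harry Potter"}, projection)

# Emulate the server-side slice on the sample array from the document above.
read_it = [
    {"username": "John", "id": 1},
    {"username": "Elise", "id": 2},
]
skip, limit = projection["users.Read_it"]["$slice"]
page_items = read_it[skip:skip + limit]
print(page_items)
```

Requesting page 0, then page 1, and so on keeps the amount of "Read_it" data held by the client bounded by page_size.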
Alternate options use the projection positional $ operator or even the aggregation framework for multiple matches in the array. But there are already many answers here that show you how to do that.
