Using PyMongo, how would one find documents in which an object nested inside an array matches a given string?
Given the following two Product JSON documents in a MongoDB collection:
[{
"_id" : ObjectId("5be1a1b2aa21bb3ceac339b0"),
"id" : "1",
"prod_attr" : [
{
"name" : "Branded X 1 Sneaker"
},
{
"hierarchy" : {
"dept" : "10",
"class" : "101",
"subclass" : "1011"
}
}
]
},
{
"_id" : ObjectId("7be1a1b2aa21bb3ceac339xx"),
"id" : "2",
"prod_attr" : [
{
"name" : "Branded Y 2 Sneaker"
},
{
"hierarchy" : {
"dept" : "10",
"class" : "101",
"subclass" : "2022"
}
}
]
}
]
I would like to
1. return all documents where prod_attr.hierarchy.subclass = "2022"
2. return all documents where prod_attr.name contains "Sneaker"
I appreciate the JSON could be structured differently, unfortunately that is not within my control to change.
1. Return all documents where prod_attr.hierarchy.subclass = "2022"
Following MongoDB's "Query an Array of Embedded Documents" documentation, you can use dot notation: join the name of the array field (prod_attr), a dot (.), and the path of the field inside the nested document (hierarchy.subclass):
collection.find({"prod_attr.hierarchy.subclass": "2022"})
2. Return all documents where prod_attr.name contains "Sneaker"
As before, you can use the dot notation to query a field of a nested element inside an array.
To perform the "contains" query, use the $regex operator. The match is case-sensitive unless you also pass "$options": "i":
collection.find({"prod_attr.name": {"$regex": "Sneaker"}})
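In PyMongo both filters are plain dictionaries. A minimal sketch that builds them follows; subclass_filter and name_contains_filter are hypothetical helper names made up for this example (not part of PyMongo), and re.escape is used on the assumption that the search text should be matched literally rather than as a pattern.

```python
import re


def subclass_filter(subclass):
    """Filter for documents with an array element whose hierarchy.subclass equals `subclass`."""
    return {"prod_attr.hierarchy.subclass": subclass}


def name_contains_filter(text):
    """Filter for documents with an array element whose name contains `text`.

    re.escape keeps characters such as "." or "+" from being read as regex syntax.
    """
    return {"prod_attr.name": {"$regex": re.escape(text)}}


print(subclass_filter("2022"))
print(name_contains_filter("Sneaker"))
```

Either dictionary can be passed straight to find, for example collection.find(name_contains_filter("Sneaker")).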
Another option is to use the MongoDB Aggregation framework:
collection.aggregate([
{"$unwind": "$prod_attr"},
{"$match": {"prod_attr.hierarchy.subclass": "2022"}}
])
The $unwind stage emits one document per element of the prod_attr array, so each output document holds a single nested document in prod_attr instead of an array (check the documentation for details).
The next step is the $match stage, which actually performs the query on the nested object.
This is a simple example, but the aggregation pipeline operators give you a lot of flexibility.
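To make the effect of the two stages concrete, here is a pure-Python imitation of $unwind followed by $match on the two sample documents. It is only an illustration of the data flow, not how MongoDB implements it; unwind and has_subclass are names invented for this sketch, and the _id fields are left out for brevity.

```python
import copy

docs = [
    {"id": "1", "prod_attr": [{"name": "Branded X 1 Sneaker"},
                              {"hierarchy": {"dept": "10", "class": "101", "subclass": "1011"}}]},
    {"id": "2", "prod_attr": [{"name": "Branded Y 2 Sneaker"},
                              {"hierarchy": {"dept": "10", "class": "101", "subclass": "2022"}}]},
]


def unwind(documents, field):
    """Yield one copy of each document per element of its `field` array."""
    for doc in documents:
        for element in doc.get(field, []):
            flat = copy.deepcopy(doc)
            flat[field] = copy.deepcopy(element)  # the array is replaced by one element
            yield flat


def has_subclass(doc, subclass):
    """Imitate {"prod_attr.hierarchy.subclass": subclass} on an unwound document."""
    return doc["prod_attr"].get("hierarchy", {}).get("subclass") == subclass


unwound = list(unwind(docs, "prod_attr"))
matched = [d for d in unwound if has_subclass(d, "2022")]
print(len(unwound), matched)
```

Note that the matched document carries only the matching array element in prod_attr, which is the main difference from the plain find query.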
Related
MongoDB provides a very useful $sort modifier to use with the $push operation, but unfortunately it looks like I can only ask it to sort my elements by a named key - which I don't have.
Another option that could serve as an alternative is the sort function, sort({'array.0': 1}), but what if I have an array inside an array?
My documents structure in the collection is as follows:
{
"_id" : 123456,
"v" : [
[
some id,
some score
],
[
"456456",
1.5
],
[
"654645",
43
],
...
]
}
And I want to sort the v array by the score (second element) of each child array.
So using $sort: I can't use a key name
'$push': {
'v': {
'$each': [[id, score]],
'$sort': {???: -1},
}
}
and using sort(): I can't use an array of array:
sort({'v.???.0': -1})
Is there something I'm missing, or is there another way to do this kind of sorting without fetching the document, sorting, and replacing? That would be a very time-consuming operation.
Thanks!
I'm trying to use $set to create an array/list/collection (not sure which is proper terminology), and I'm not sure how to do it. For example:
I have a document inserted into my database that looks like this:
"_id": (unique, auto-generated id)
"Grade": Sophomore
I want to insert a collection/list/array using update. So, basically I want this:
"_id": (unique, auto-generated id)
"Grade": Sophomore
"Information": {
"Class_Info": [
{"Class_Name": "Math"}
]
}
What I've been doing so far is using .update and dot notation. So, what I was trying to do was use $set like this:
collection.update({'_id': unique_id}, {'$set': {'Information.Class_Info.Class_Name': 'Math'}})
However, that makes Class_Info a document and not a list/collection/array, so the result is:
"_id": (unique id)
"Grade": Sophomore
"Information": {
"Class_Info": {
"Class_Name": "Math"
}
}
How do I specify that I want Class_Info to be a list? If for some reason I absolutely cannot use $set to do this, it is still very important that I can use dot notation because of the way the rest of my program works. So if I'm supposed to use something other than $set, can it take dot notation to specify where to insert the list? (I know $push is another option, but it doesn't use dot notation, so I can't really use it in my case.)
Thanks!
If you want to do it with a single instruction, starting from a document where none of the keys exist yet, this is the only way to do it ($set will never create an array unless you pass one explicitly, as in {$set: {"somekey": []}}):
db.test.update(
{ _id: "(unique id)" },
{ $push: {
"Information.Class_Info": { "Class_Name": "Math" }
}}
)
This query does the trick: it pushes the object onto the non-existent key Information.Class_Info, which MongoDB then creates as an array. This is the only working solution that uses a single instruction and dot notation.
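Since the question uses PyMongo, the same update could be written in Python roughly as follows. This is a sketch: class_push is a hypothetical helper that only builds the filter and update documents, and the id value is the placeholder from the shell example.

```python
def class_push(doc_id, class_name):
    """Return (filter, update) for appending a class to Information.Class_Info."""
    query = {"_id": doc_id}
    # $push accepts a dotted path; the array is created if the key does not exist yet.
    update = {"$push": {"Information.Class_Info": {"Class_Name": class_name}}}
    return query, update


query, update = class_push("(unique id)", "Math")
print(query, update)
```

With a real collection the pair would be passed on as collection.update_one(query, update).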
There is a way to do it with one instruction, $set, and dot notation, as follows:
db.test.updateOne(
{ _id: "my-unique-id" },
{ $set: {
"Information.Class_Info": [ { "Class_Name": "Math" } ]
}}
)
There is also a way to do it with two instructions and the array index in the dot notation, allowing you to use similar statements to add more array elements:
db.test.updateOne(
{ _id: "my-unique-id" },
{ $set: { "Information.Class_Info": [] }}
)
db.test.updateOne(
{ _id: "my-unique-id" },
{ $set: {
"Information.Class_Info.0": { "Class_Name": "Math" },
"Information.Class_Info.1": { "Class_AltName": "Mathematics" }
}}
)
Deviating from these options has interesting failure modes:
If you try to combine the second option into a single updateOne() call, which is usually possible, MongoDB will complain that "Updating the path 'Information.Class_Info.0' would create a conflict at 'Information.Class_Info'".
If you try to use the dot notation with the array index ("Information.Class_Info.0.Class_Name": "Math") without creating an empty array first, MongoDB will create an object with numeric keys ("0", "1", …). It really refuses to create an array unless told to explicitly using […] (as also noted in the answer by @Maximiliano).
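The second failure mode is easier to see with a toy model. The function below is a simplified pure-Python imitation of how $set walks a dotted path as described above; it is an assumption-laden sketch (set_path is a made-up name, and error handling for scalars in the middle of a path is omitted), not MongoDB's actual implementation.

```python
def set_path(doc, dotted_path, value):
    """Toy model of $set with dot notation, applied to a plain dict in place."""
    parts = dotted_path.split(".")
    node = doc
    for i, part in enumerate(parts):
        last = i == len(parts) - 1
        if isinstance(node, list):
            # An existing array: a numeric path component is treated as an index.
            index = int(part)
            while len(node) <= index:
                node.append(None)
            if last:
                node[index] = value
            else:
                if node[index] is None:
                    node[index] = {}
                node = node[index]
        elif last:
            node[part] = value
        else:
            # A missing intermediate key always becomes a document, never an array.
            node = node.setdefault(part, {})
    return doc


without_array = set_path({}, "Information.Class_Info.0.Class_Name", "Math")
with_array = set_path({"Information": {"Class_Info": []}},
                      "Information.Class_Info.0", {"Class_Name": "Math"})
print(without_array)
print(with_array)
```

In the first call the "0" ends up as an ordinary key of a nested document; in the second, the pre-existing empty list turns the same "0" into an array index.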
Alright so using Python and MongoDB I am trying to embed a subdocument within an array with a custom key value in the array. I was playing around with all sorts of different ways to do this and I couldn't figure out what I was doing wrong so I temporarily settled on the working code below. Numerous attempts always lead to the error:
in _check_write_command_response
raise OperationFailure(error.get("errmsg"), error.get("code"), error)
pymongo.errors.OperationFailure: The dotted field 'first.rule' in 'followedBy..first.rule' is not valid for storage.
Code:
citizens.update(
{"_id" : userPush},
{"$push": {"followedBy":[field[1], field[2], field[3], field[0]]}})
Produces:
"_id" : ObjectId("5…asfd"),
"uName" : "tim0",
"fName" : "tim",
"lName" : "lost",
"pic" : null,
"bio" : "I <3 MongoDB",
"followedBy" : [
[
"BobTheBomb",
"bobby",
"knight",
NumberInt(2)
],
[
"Robert",
"DROP",
"TABLE",
NumberInt(6)
]
This is what I want:
"_id" : ObjectId("5…asfd"),
"uName" : "tim0",
"fName" : "tim",
"lName" : "lost",
"pic" : null,
"bio" : "I <3 MongoDB",
"followedBy" : [
"BobTheBomb": {
"fName" : "bobby",
"lName" : "knight",
"uID" : NumberInt(2)
},
"Robert": {
"fName" : " DROP ",
"lName" : " TABLE ",
"uID" : NumberInt(6)
}
]
You will need to build that data structure yourself; currently you are saying that "followedBy" is only a list.
So try:
citizens.update(
{"_id" : userPush},
{"$push": {"followedBy":{field[1]: { "fName":field[2], "lName":field[3], "uID":field[0]}}}})
Remove the list and replace it with a dict.
I do hope this helps.
I have realised that I had not given you valid JSON. I have tested this:
citizens.update(
{"_id" : userPush},
{"$push":
{"followedBy":
[
{field[1]:
{ "fName": field[2], "lName": field[3], "uID": field[0]}
}
]
}
})
And it worked...
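For readability, the update document can be assembled in a small helper first. The sketch below follows the dict-based form of the first snippet in this answer; follower_push is a hypothetical name, the tuple order (uID, uName, fName, lName) is inferred from the indexing in the question, and the dot check is an assumption based on the "not valid for storage" error quoted there.

```python
def follower_push(field):
    """Build the $push update that appends one follower keyed by user name."""
    u_id, u_name, f_name, l_name = field
    if "." in u_name:
        # Assumption: a dotted key triggers the error shown in the question.
        raise ValueError("user name cannot be used as a field name: %r" % u_name)
    return {
        "$push": {
            "followedBy": {u_name: {"fName": f_name, "lName": l_name, "uID": u_id}}
        }
    }


print(follower_push((2, "BobTheBomb", "bobby", "knight")))
```

The result would then be passed as the second argument of the update call, with {"_id": userPush} as the first.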
You might find that the error is caused by the modifier you are using. I found the following on a blog:
MongoDB provides several different modifiers you can use to update documents in place, including the following (for more details see updates):
- $inc Increment a numeric field (generalized; can increment by any number)
- $set Set certain fields to new values
- $unset Remove a field from the document
- $push Append a value onto an array in the document
- $pushAll Append several values onto an array
- $addToSet Add a value to an array if and only if it does not already exist
- $pop Remove the last (or first) value of an array
- $pull Remove all occurrences of a value from an array
- $pullAll Remove all occurrences of any of a set of values from an array
- $rename Rename a field
- $bit Bitwise updates
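Because you are inserting several items, you might prefer $pushAll or $addToSet over $push, although that is just speculation. Note that $pushAll was deprecated in MongoDB 2.4 and removed in 3.6; on current servers, $push with the $each modifier does the same job.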
I am running a mongodb find query with an $in operator:
collection.find({name: {$in: [name1, name2, ...]}})
I would like the results to be sorted in the same order as my name array: [name1, name2, ...]. How do I achieve this?
Note: I am accessing MongoDb through pymongo, but I don't think that's of any importance.
EDIT: as it's impossible to achieve this natively in MongoDb, I ended up using a typical Python solution:
names = [name1, name2, ...]
results = list(collection.find({"name": {"$in": names}}))
results.sort(key=lambda x: names.index(x["name"]))
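A small variation on the snippet above, in case the name list is long: names.index() rescans the list for every document, whereas a position lookup built once avoids that. The sample names and documents below are stand-ins for a real query result.

```python
names = ["David", "Charlie", "Tess"]
# Stand-in for list(collection.find({"name": {"$in": names}})).
results = [{"name": "Tess"}, {"name": "David"}, {"name": "Charlie"}, {"name": "David"}]

# Map each name to its position once, instead of calling names.index() per document.
rank = {name: position for position, name in enumerate(names)}
results.sort(key=lambda doc: rank[doc["name"]])
print([doc["name"] for doc in results])
```

The sort is stable, so documents sharing a name keep the order in which the query returned them.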
You can achieve this with the aggregation framework starting with the upcoming version 3.4 (Nov 2016).
Assuming the order you want is the array order=["David", "Charlie", "Tess"] you do it via this pipeline:
m = { "$match" : { "name" : { "$in" : order } } };
a = { "$addFields" : { "__order" : { "$indexOfArray" : [ order, "$name" ] } } };
s = { "$sort" : { "__order" : 1 } };
db.collection.aggregate([m, a, s]);
The "$addFields" stage is new in 3.4 and it allows you to "$project" new fields onto existing documents without knowing all the other existing fields. The new "$indexOfArray" expression returns the position of a particular element in a given array.
The result of this aggregation will be the documents that match your condition, in the order specified in the input array order. The documents will include all original fields, plus an additional field called __order.
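Because the question mentions PyMongo, here is how the same three stages might be written as a Python list. in_order_pipeline is a hypothetical helper name; the stage and operator names are the ones used in the shell version above.

```python
def in_order_pipeline(order):
    """Stages that return documents whose name is in `order`, sorted by position in `order`."""
    return [
        {"$match": {"name": {"$in": order}}},
        # __order holds the index of the document's name within `order`.
        {"$addFields": {"__order": {"$indexOfArray": [order, "$name"]}}},
        {"$sort": {"__order": 1}},
    ]


pipeline = in_order_pipeline(["David", "Charlie", "Tess"])
print(pipeline)
```

The list would be handed to collection.aggregate(pipeline) on a server that supports $addFields (3.4 or later, per the answer).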
Impossible. The $in operator only checks for presence; the list is treated as a set.
Options:
1. Split the request into several queries, one each for name1 ... nameN, or filter the result the same way. More names mean more queries.
2. Use itertools groupby/ifilter. In that case, add a "sorting precedence" flag to every document, matching name1 to PREC1, name2 to PREC2, and so on; then sort by PREC and group by PREC.
If your collection has an index on the "name" field, option 1 is better.
If it does not have the index, or you cannot create one due to a high write/read ratio, option 2 is for you.
Vitaly is correct: it's impossible to do that with find, but it can be achieved with aggregates:
db.collection.aggregate([
{ $match: { name: { $in: [name1, name2, /* ... */] } } },
{
$project: {
name: 1,
name1: { $eq: [name1, '$name'] },
name2: { $eq: [name2, '$name'] },
},
},
// false sorts before true, so sort descending to put the name1 matches first
{ $sort: { name1: -1, name2: -1 } },
])
tested on 2.6.5
I hope this will hint other people in the right direction.
Good day everyone.
Suppose we have a collection and a document which looks something like this:
test_doc = {
"ID" : "123",
"a" : [
{
'x' : "/",
'y' : "2000",
'z' : "1000"
},
{
'x' : "/var",
'y' : "3500",
'z' : "3000"
}
]
}
What I need is to retrieve a single property, a.z.
In MongoDB I'm using the following query:
db.testcol.find({"ID":"123","a.x":"/"},{'a.z':1})
which returns this:
{ "_id" : ObjectId("skipped"), "a" : [ { "z" : "1000" }, { "z" : "3000" } ] }
As you can see, it returns all the z properties, but I need only the first one, or the second when the condition is {"ID":"123","a.x":"/var"}.
So, the question is: how do I get a single property in this situation? Is it just a matter of bad design, or should I somehow process the returned document in code (Python)? Any suggestions will be much appreciated.
In MongoDB 2.0 and older, this is not possible. What you want to do is return a specific element of the array - but that is not what your projection is actually doing, it will just return the whole array and then the z element of each one.
However, with 2.2 (rc2 as of writing this answer), things have gotten a bit better. You can now use $elemMatch as part of your projection (see SERVER-2238 for details) so that you only pull back the required array element. So, try something like this:
db.foo.find({"ID":"123",'a':{$elemMatch:{'x':"/"}}},{_id : 0, 'a.$': 1})
//returns
{ "a" : [ { "x" : "/", "y" : "2000", "z" : "1000" } ] }
Or, just use $elemMatch in the projection itself, which you may think is cleaner:
db.foo.find({"ID":"123"},{_id : 0, 'a':{$elemMatch:{'x':"/"}}})
//returns
{ "a" : [ { "x" : "/", "y" : "2000", "z" : "1000" } ] }
So now, at least, the returned array contains only the entries you want, and you can simply reference the relevant z element (elemMatch projections on a subdocument are not yet supported).
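On the Python side, the remaining step of picking z out of the returned document could look like the sketch below. elem_match_query and extract_z are hypothetical helper names, and the sample document stands in for what the $elemMatch projection above returns.

```python
def elem_match_query(doc_id, x):
    """Return (filter, projection) keeping only the first element of `a` with the given x."""
    return {"ID": doc_id}, {"_id": 0, "a": {"$elemMatch": {"x": x}}}


def extract_z(doc):
    """Pull z out of a document shaped like the $elemMatch results shown above."""
    elements = (doc or {}).get("a") or []
    return elements[0]["z"] if elements else None


# Stand-in for collection.find_one(*elem_match_query("123", "/")).
returned = {"a": [{"x": "/", "y": "2000", "z": "1000"}]}
print(extract_z(returned))
```

extract_z returns None both when no document was found and when no array element matched, so the caller has a single case to check.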
Last but not least, in 2.2 we have the aggregation framework, and one of the things it can do (with the $project operator) is reshape your documents and turn sub-documents and array elements into top-level fields. To get your desired result, you would do something like this:
db.foo.aggregate(
{$match : {"ID":"123"}},
{$unwind : "$a"},
{$match : {"a.x":"/"}},
{$project : {_id : 0, z : "$a.z"}}
)
The result looks like this:
{ "result" : [ { "z" : "1000" } ], "ok" : 1 }
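The same pipeline expressed as a Python list for PyMongo might look like this; single_z_pipeline is a hypothetical helper name, and the stages mirror the shell version above.

```python
def single_z_pipeline(doc_id, x):
    """Stages that return only the z value of the array element whose x matches."""
    return [
        {"$match": {"ID": doc_id}},
        {"$unwind": "$a"},          # one document per element of a
        {"$match": {"a.x": x}},     # keep the element with the requested x
        {"$project": {"_id": 0, "z": "$a.z"}},
    ]


pipeline = single_z_pipeline("123", "/")
print(pipeline)
```

With PyMongo 3 and later, collection.aggregate(pipeline) returns a cursor to iterate over rather than the {"result": ...} wrapper shown above.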