MongoDb: $sort by $in - python

I am running a mongodb find query with an $in operator:
collection.find({name: {$in: [name1, name2, ...]}})
I would like the results to be sorted in the same order as my name array: [name1, name2, ...]. How do I achieve this?
Note: I am accessing MongoDb through pymongo, but I don't think that's of any importance.
EDIT: as it's impossible to achieve this natively in MongoDb, I ended up using a typical Python solution:
names = [name1, name2, ...]
results = list(collection.find({"name": {"$in": names}}))
results.sort(key=lambda x: names.index(x["name"]))
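A possible refinement of the Python workaround above: names.index() scans the list on every comparison key, so for long name lists a precomputed position map may be cheaper. This is only a sketch; the sample documents stand in for the result of collection.find(), and it assumes the names are unique.

```python
# Made-up sample data standing in for list(collection.find({"name": {"$in": names}})).
names = ["David", "Charlie", "Tess"]
results = [
    {"name": "Tess", "age": 31},
    {"name": "David", "age": 25},
    {"name": "Charlie", "age": 40},
]

# Map each name to its position once, so each key lookup is a dict access
# instead of a linear scan with names.index(). Assumes names are unique.
position = {name: i for i, name in enumerate(names)}
results.sort(key=lambda doc: position[doc["name"]])

ordered_names = [doc["name"] for doc in results]
```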

You can achieve this with the aggregation framework starting with version 3.4 (released November 2016).
Assuming the order you want is the array order=["David", "Charlie", "Tess"] you do it via this pipeline:
m = { "$match" : { "name" : { "$in" : order } } };
a = { "$addFields" : { "__order" : { "$indexOfArray" : [ order, "$name" ] } } };
s = { "$sort" : { "__order" : 1 } };
db.collection.aggregate( [ m, a, s ] );
The "$addFields" stage is new in 3.4 and it allows you to "$project" new fields onto existing documents without knowing all the other existing fields. The new "$indexOfArray" expression returns the position of a particular element in a given array.
The result of this aggregation will be the documents that match your condition, in the order specified in the input array order. The documents will include all original fields, plus an additional field called __order.
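Since the question uses pymongo, the same three stages can be written as plain Python dictionaries. The helper name build_ordered_pipeline is hypothetical, and the sketch only builds the pipeline; running it would need a real collection.

```python
def build_ordered_pipeline(field, order):
    """Return stages that match `field` against `order` and sort by position in `order`."""
    return [
        {"$match": {field: {"$in": order}}},
        # $indexOfArray gives the position of the document's value inside `order`.
        {"$addFields": {"__order": {"$indexOfArray": [order, "$" + field]}}},
        {"$sort": {"__order": 1}},
    ]


pipeline = build_ordered_pipeline("name", ["David", "Charlie", "Tess"])
# With a PyMongo collection (assumed to exist) this would be run as:
#   results = list(collection.aggregate(pipeline))
```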

Impossible. The $in operator only checks for presence, so the list is treated as a set.
Options:
1. Split the query into several queries, one for each of name1 ... nameN, or filter the result the same way. More names mean more queries.
2. Use itertools groupby/ifilter. In that case, add a "sorting precedence" flag to every document, matching name1 to PREC1, name2 to PREC2, and so on, then sort by PREC and group by PREC.
If your collection has an index on the "name" field, option 1 is better.
If it does not have the index, or you cannot create it because of a high write/read ratio, option 2 is for you.

Vitaly is correct: it's impossible to do that with find, but it can be achieved with aggregates:
db.collection.aggregate([
{ $match: { name: { $in: [name1, name2, /* ... */] } } },
{
$project: {
name: 1,
name1: { $eq: ['name1', '$name'] },
name2: { $eq: ['name2', '$name'] },
},
},
{ $sort: { name1: -1, name2: -1 } }, // true sorts after false, so sort descending to put name1 matches first
])
Tested on 2.6.5.
I hope this points other people in the right direction.

Related

How to get "key" if value (i.e. data in string) in mongodb is known using python? [duplicate]

Is it possible to wildcard the key in a query? For instance, given the following record, I'd like to do a .find({'a.*': 4})
This was discussed here https://jira.mongodb.org/browse/SERVER-267 but it looks like it's not been resolved.
{
'a': {
'b': [1, 2],
'c': [3, 4]
}
}
As asked, this is not possible. The server issue you linked to is still under "issues we're not sure of".
MongoDB has some intelligence surrounding the use of arrays, and I think that's part of the complexity surrounding such a feature.
Take the following query db.foo.find({ 'a.b' : 4 } ). This query will match the following documents.
{ a: { b: 4 } }
{ a: [ { b: 4 } ] }
So what does "wildcard" do here? db.foo.find( { 'a.*' : 4 } ) Does it match the first document? What about the second?
Moreover, what does this mean semantically? As you've described, the query is effectively "find documents where any field in that document has a value of 4". That's a little unusual.
Is there a specific semantic that you're trying to achieve? Maybe a change in the document structure will get you the query you want.
I came across this question because I faced the same issue. The accepted answer explains why this is not supported, but it does not really solve the issue itself.
I ended up with a solution that makes the wildcard usage proposed here redundant, and I'm sharing it here in case someone finds this post some day.
Why did I want to use wildcards in my MongoDB queries?
In my case, I needed this "feature" in order to be able to find a match inside a dictionary (just as the question's code demonstrates).
What are the alternatives?
Use a reversed map (very similar to how DNS works) and simply use it. So, in our case we can use something similar to this:
{
"a": {
"map": {
"b": [1, 2, 3],
"c": [3, 4]
},
"reverse-map": {
"1": [ "b" ],
"2": [ "b" ],
"3": [ "b", "c" ],
"4": [ "c" ]
}
}
}
I know, it takes more memory, and insert / update operations must make sure the two maps always stay symmetric, and yet it solves the problem. Now, instead of making an imaginary query like
db.foo.find( { 'a.map.*' : 4 } )
I can make an actual query
db.foo.find( { 'a.reverse-map.4' : { $exists: true } } )
which will return all items that have a specific value (in our example, 4).
This approach also requires managing indexes properly if you want good performance (read the docs), but it is good enough for my use case. Hope this helps someone else someday as well.
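Keeping the two maps symmetric is easiest if the reverse map is always derived from the forward one just before writing. A minimal Python sketch of that derivation follows; the helper name build_reverse_map is made up for illustration, and values are converted to strings because document keys must be strings.

```python
from collections import defaultdict


def build_reverse_map(mapping):
    """Derive {str(value): [keys...]} from {key: [values...]}."""
    reverse = defaultdict(list)
    for key, values in mapping.items():
        for value in values:
            # Document keys must be strings, so the value is stringified.
            reverse[str(value)].append(key)
    return dict(reverse)


forward = {"b": [1, 2, 3], "c": [3, 4]}
reverse = build_reverse_map(forward)
# The document to store would then look like:
#   {"a": {"map": forward, "reverse-map": reverse}}
```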
Starting from MongoDB 3.4.4, you can use $objectToArray to convert a into an array of k-v tuples for querying.
db.collection.aggregate([
{
"$addFields": {
"a": {
"$objectToArray": "$a"
}
}
},
{
$match: {
"a.v": 4
}
},
{
"$addFields": {
// cosmetics to revert back to original structure
"a": {
"$arrayToObject": "$a"
}
}
}
])
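To make the k-v transformation concrete, here is a rough client-side imitation in plain Python. It is not how the server evaluates the pipeline, only an illustration of the shape of the data; the helper names are hypothetical.

```python
def object_to_array(obj):
    """Imitate $objectToArray: {"b": [1, 2]} becomes [{"k": "b", "v": [1, 2]}]."""
    return [{"k": key, "v": value} for key, value in obj.items()]


def keys_with_value(obj, target):
    """Return the keys whose value equals target or is a list containing it."""
    matches = []
    for entry in object_to_array(obj):
        value = entry["v"]
        if value == target or (isinstance(value, list) and target in value):
            matches.append(entry["k"])
    return matches


doc = {"a": {"b": [1, 2], "c": [3, 4]}}
found = keys_with_value(doc["a"], 4)
```

For small result sets, the same helper could also be applied to documents already fetched with find, which answers the "which key holds this value" part of the question on the client.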

Pull from nested Array in MongoDB

I have a document structure like this:
{
"name": "Example",
"description": "foo",
"vocabulary": [
["apple", "pomme"],
["hello", "bonjour"],
["bus", "bus"]
]
}
Now I want to pull an array inside the vocabulary array by specifying the first item, e.g.:
{"$pull": {"vocabulary.$": ["apple"]}}
Which should remove the array ["apple", "pomme"] from vocabulary, but this doesn't work.
I tried this ($pull from nested array), but it did not work, it threw
pymongo.errors.WriteError:
The positional operator did not find the match needed from the query., full error: {'index': 0, 'code': 2, 'errmsg': 'The positional operator did not find the match needed from the query.'}
Very tricky question.
I think the $ positional operator is not suitable for this case.
Instead, you need an update with an aggregation pipeline (available since MongoDB 4.2).
Use $filter to keep only the inner arrays that do not ($not) contain ($in) the word "apple", then $set the result as the new vocabulary field.
db.collection.update({},
[
{
"$set": {
"vocabulary": {
$filter: {
input: "$vocabulary",
cond: {
$not: {
$in: [
"apple",
"$$this"
]
}
}
}
}
}
}
])
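For pymongo, the same update pipeline can be built as Python data. The sketch below also includes a local list comprehension that mirrors what the $filter condition keeps, which makes it visible that an inner array is dropped when it contains the word anywhere, not only in first position. The helper names are hypothetical.

```python
def build_pull_pipeline(word):
    """Update pipeline that drops every inner array of `vocabulary` containing `word`."""
    return [
        {
            "$set": {
                "vocabulary": {
                    "$filter": {
                        "input": "$vocabulary",
                        "cond": {"$not": {"$in": [word, "$$this"]}},
                    }
                }
            }
        }
    ]


def filter_locally(vocabulary, word):
    """Plain-Python mirror of the $filter condition, for illustration only."""
    return [pair for pair in vocabulary if word not in pair]


vocabulary = [["apple", "pomme"], ["hello", "bonjour"], ["bus", "bus"]]
remaining = filter_locally(vocabulary, "apple")
pipeline = build_pull_pipeline("apple")
# With a recent PyMongo and MongoDB 4.2+ (both assumed), this could be applied as:
#   collection.update_many({}, pipeline)
```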

How to use PyMongo find() to search nested array attribute?

Using PyMongo, how would one find/search for the documents where the nested array json object matches a given string.
Given the following 2 Product JSON documents in a MongoDB collection..
[{
"_id" : ObjectId("5be1a1b2aa21bb3ceac339b0"),
"id" : "1",
"prod_attr" : [
{
"name" : "Branded X 1 Sneaker"
},
{
"hierarchy" : {
"dept" : "10",
"class" : "101",
"subclass" : "1011"
}
}
]
},
{
"_id" : ObjectId("7be1a1b2aa21bb3ceac339xx"),
"id" : "2",
"prod_attr" : [
{
"name" : "Branded Y 2 Sneaker"
},
{
"hierarchy" : {
"dept" : "10",
"class" : "101",
"subclass" : "2022"
}
}
]
}
]
I would like to
1. return all documents where prod_attr.hierarchy.subclass = "2022"
2. return all documents where prod_attr.name contains "Sneaker"
I appreciate the JSON could be structured differently, unfortunately that is not within my control to change.
1. Return all documents where prod_attr.hierarchy.subclass = "2022"
Based on the Query an Array of Embedded Documents documentation of MongoDB you can use dot notation concatenating the name of the array field (prod_attr), with a dot (.) and the name of the field in the nested document (hierarchy.subclass):
collection.find({"prod_attr.hierarchy.subclass": "2022"})
2. Return all documents where prod_attr.name contains "Sneaker"
As before, you can use the dot notation to query a field of a nested element inside an array.
To perform the "contains" query you have to use the $regex operator:
collection.find({"prod_attr.name": {"$regex": "Sneaker"}})
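If the search text comes from user input, it may contain characters that are special in regular expressions. One way to handle that, sketched here, is to escape the text before building the filter. The helper name contains_filter is made up, and the sketch assumes Python's re.escape output is acceptable to the server's regex engine for the characters involved.

```python
import re


def contains_filter(field, text):
    """Build a "contains" filter that treats `text` literally, not as a pattern."""
    return {field: {"$regex": re.escape(text)}}


query = contains_filter("prod_attr.name", "Sneaker")
tricky = contains_filter("prod_attr.name", "a+b")
# With a PyMongo collection (assumed to exist): collection.find(query)
```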
Another option is to use the MongoDB Aggregation framework:
collection.aggregate([
{"$unwind": "$prod_attr"},
{"$match": {"prod_attr.hierarchy.subclass": "2022"}}
])
The $unwind operator creates a new document for each element of the prod_attr array, so you will have only nested documents and no array (check the documentation for details).
The next step is the $match operator, which actually performs the query on the nested object.
This is a simple example, but the aggregation operators give you a lot of flexibility.

Add pattern inside $in, and aggregate

I am adding a pattern inside an $in in an aggregate function.
I know the values exist, but my query is returning nothing for the pattern.
Here is my query:
db.collection.aggregate([{"$unwind":"$tags"},
{'$match':
{
'tags.tag.name': {
"$in": ['AA', 'CS', '/Nie/i']},
'auditRun': 12345}},
{'$project': {
'tags.tag.name':1,
'_id': 0}},])
I am getting results for AA and CS but I am not getting anything back for the Nie.
I am expecting a few results for that as well because I have a bunch of names starting with Nie.
What am I doing wrong?
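Because the pattern is quoted, '/Nie/i' is matched as a literal string rather than as a regular expression. You could use the $or operator along with $regex to look up the items that need a regular expression search separately. A $regex operator expression cannot be placed inside the $in list; only regular expression literals (an unquoted /Nie/i in the shell) can.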
db.collection.aggregate([{
"$unwind": "$tags"
},
{
'$match': {
$or: [
{'tags.tag.name': {"$in": ['AA','CS']}},
{'tags.tag.name': { $regex: 'Nie', $options: 'i'}}
],
'auditRun': 12345
}
},
{
'$project': {
'tags.tag.name': 1,
'_id': 0
}
},
])
Also, if you only need the names that start with Nie, your regex should be ^Nie.
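For reference, here is a sketch of the same $or pipeline written for pymongo, using the anchored, case-insensitive pattern. The helper name build_tag_pipeline is hypothetical, and the sample names at the end are invented purely to show what the pattern accepts.

```python
import re


def build_tag_pipeline(exact_names, prefix, audit_run):
    """Match tags by exact name or by a case-insensitive prefix."""
    return [
        {"$unwind": "$tags"},
        {
            "$match": {
                "$or": [
                    {"tags.tag.name": {"$in": exact_names}},
                    {"tags.tag.name": {"$regex": "^" + re.escape(prefix), "$options": "i"}},
                ],
                "auditRun": audit_run,
            }
        },
        {"$project": {"tags.tag.name": 1, "_id": 0}},
    ]


pipeline = build_tag_pipeline(["AA", "CS"], "Nie", 12345)
pattern = pipeline[1]["$match"]["$or"][1]["tags.tag.name"]["$regex"]

# Local check of the pattern against invented names.
sample = ["Nielsen", "nielsen", "Daniel"]
starts_with_nie = [n for n in sample if re.search(pattern, n, re.IGNORECASE)]
```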

Selectively retrieving depending on existence of key in map

Is it possible to selectively retrieve depending on the existence of keys in a map in mongodb? And if so, how do you go about doing it?
Suppose I have a document that looks like this for example..
{ "_id": 1234,
"parentfield1" : {
"childfield1" : { ...},
"childfield2" : { ...},
"childfield5" : { ...}, // There might be many childfields.. > 50
},
}
How would I be able to selectively retrieve from the document a/some particular childfields given multiple options to choose from? Some of which may not exist in the document.
i.e.
input "childfield1", "childfield2", "childfield3"
-> output
{ "_id": 1234,
"parentfield1": {
"childfield1" : { ... },
"childfield2" : { ... },
},
}
Is it even doable? Is it possible to do efficiently also?
Any help would be great (python, go).
Yes, that's the purpose of the projection parameter of find:
db.collection.find({_id: 1234}, {
'parentfield1.childfield1': 1,
'parentfield1.childfield2': 1,
'parentfield1.childfield3': 1
});
If a specified field isn't present in a given doc, the other matching fields will still be included.
Build up the projection parameter object programmatically if you want it to be dynamic.
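Building the projection dynamically could look like the following in Python. The helper name build_projection is made up for illustration; the field names come from the question.

```python
def build_projection(parent, children):
    """Return a projection that includes parent.child for every requested child."""
    return {f"{parent}.{child}": 1 for child in children}


wanted = ["childfield1", "childfield2", "childfield3"]
projection = build_projection("parentfield1", wanted)
# With a PyMongo collection (assumed to exist):
#   doc = collection.find_one({"_id": 1234}, projection)
```

Children that are missing from a document, such as childfield3 here, are simply absent from the result.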
