MongoDB $sort array - python

MongoDB provides a very useful $sort modifier to use with the $push operation, but unfortunately it looks like I can only ask it to sort my elements by a named key - which I don't have.
Another option is the cursor's sort function, sort({'array.0': 1}), but what if I have an array inside an array?
My documents structure in the collection is as follows:
{
"_id" : 123456,
"v" : [
[
some id,
some score
],
[
"456456",
1.5
],
[
"654645",
43
],
...
]
}
And I want to sort the v array by the score (second element) of each child array.
So using $sort: I can't use a key name
'$push': {
'v': {
'$each': [[id, score]],
'$sort': {???: -1},
}
}
and using sort(): I can't address an array of arrays:
sort({'v.???.0': -1})
Is there something I'm missing, or is there another way to do this kind of sorting without fetching the document, sorting, and replacing? That would be a very time-consuming operation.
Thanks!
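One possible way around the missing key name, sketched here under the assumption that the schema can be changed, is to store each entry as a subdocument with named fields so that the $sort modifier has a key to sort on. The field names id and score and both helper names are hypothetical; the resulting update document is what would be passed to PyMongo's update_one.

```python
def pairs_to_docs(pairs):
    """Convert [[id, score], ...] into [{'id': ..., 'score': ...}, ...]."""
    return [{"id": entry_id, "score": score} for entry_id, score in pairs]


def build_sorted_push(entry_id, score):
    """Build an update that pushes one entry and keeps 'v' sorted by score."""
    return {
        "$push": {
            "v": {
                "$each": [{"id": entry_id, "score": score}],
                "$sort": {"score": -1},  # descending by the named key
            }
        }
    }


update = build_sorted_push("456456", 1.5)
# With a real collection (assumed to exist):
# collection.update_one({"_id": 123456}, update)
```

Existing documents would need a one-off migration with something like pairs_to_docs before the modifier could sort them.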

Related

Mongodb Python: How to use $push overwrites existing value rather than adds to the end of an array within a document

I have a document like this
{ "_id" : 23, "local_id" : 1234, "global_id" : [ "P123", "P345" ] }
If I want to $push a new value to the array under the key "global_id", I can do this:
collection.update_one({'local_id': l_pid}, {'$push': {'global_id': "P678"}})
and the document then looks something like this (P678 has been pushed to the array):
{ "_id" : 23, "local_id" : 1234, "global_id" : [ "P123", "P345", "P678" ] }
But the next time the same "global_id" value comes in, it is appended to the end of the array again (here the same P678 comes in a second time):
{ "_id" : 23, "local_id" : 1234, "global_id" : [ "P123", "P345", "P678", "P678" ] }
I want it to overwrite the existing value instead: the array must only hold unique values, with no duplicates.
How can I do it?
Thanks
Based on @rickhg12hs's answer, using $addToSet solved my issue:
collection.update_one({'local_id': l_pid}, {'$addToSet': {'global_id': "P678"}})
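As a minimal sketch of the same idea for several values at once, $addToSet can be combined with the $each modifier. The helper names below are hypothetical, and simulate_add_to_set is only a local illustration of the semantics on a Python list; it does not talk to MongoDB.

```python
def build_add_to_set(field, values):
    """Build an update that adds each value only if it is not already present."""
    return {"$addToSet": {field: {"$each": list(values)}}}


def simulate_add_to_set(array, values):
    """Local illustration only: mimic $addToSet semantics on a Python list."""
    result = list(array)
    for value in values:
        if value not in result:
            result.append(value)
    return result


incoming = ["P678", "P678", "P999"]
update = build_add_to_set("global_id", incoming)
after = simulate_add_to_set(["P123", "P345"], incoming)
print(after)
# With a real collection (assumed to exist):
# collection.update_one({"local_id": 1234}, update)
```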

How to extract info from the dictionaries within a list?

I'm new to Python, trying to gather data from a json file that consists of a list that contains info inside dictionaries as follows. How do I extract the "count" data from this? (Without using list comprehension)
{
"stats":[
{
"name":"Ron",
"count":98
},
{
"name":"Sandy",
"count":89
},
{
"name":"Sam",
"count":77
}
]
}
Index the list using the stats key then iterate through it
data = {
"stats":[
{
"name":"Ron",
"count":98
},
{
"name":"Sandy",
"count":89
},
{
"name":"Sam",
"count":77
}
]
}
for stat in data['stats']:
    count = stat['count']
Consider the dictionary data stored in a variable source.
source = {
"stats":[
{
"name":"Ron",
"count":98
},
{
"name":"Sandy",
"count":89
},
{
"name":"Sam",
"count":77
}
]
}
Now to access the count field inside of "stats" we use indexing.
For example, to view the count of "Ron" you would write:
print(source['stats'][0]['count'])
This will result in 98
Similarly, for "Sam" it will be
print(source['stats'][2]['count'])
And the result will be 77
In short, we first index the dictionary by its key, then the list by position, and finally the field whose value we want.
I hope it helped.
Simply append all those values to do calculations:
count_values = []
for dic in data['stats']:
    count_values.append(dic['count'])
# Do anything with count_values
print(count_values)
According to the Zen of Python, "Simple is better than complex."
Thus, list comprehension is actually the best way to extract the information you need and still have it available for further processing (in the form of a list of count values).
d = <your dict-list>
count_data_list = [ x['count'] for x in d['stats'] ]
If not, and your intention is to process the "count" data as it is extracted, I'd suggest a for-loop:
d = <your dict-list>
for x in d['stats']:
    count_data = x['count']
    <process "count_data">
Using the map function will do that in a single line:
>>> list(map(lambda x: x['count'], data['stats']))
[98, 89, 77]
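Since the question mentions a JSON file, here is a small end-to-end sketch that parses the JSON text first and then collects the counts with a plain loop. The JSON is inlined as a string so the example is self-contained; with an actual file, json.load on the open file object would take the place of json.loads.

```python
import json

# Inline stand-in for the contents of the JSON file
raw = (
    '{"stats": [{"name": "Ron", "count": 98},'
    ' {"name": "Sandy", "count": 89},'
    ' {"name": "Sam", "count": 77}]}'
)
data = json.loads(raw)

counts = []
for stat in data["stats"]:
    counts.append(stat["count"])

total = sum(counts)
print(counts, total)
```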

How to use PyMongo find() to search nested array attribute?

Using PyMongo, how would one find/search for the documents where the nested array json object matches a given string.
Given the following 2 Product JSON documents in a MongoDB collection..
[{
"_id" : ObjectId("5be1a1b2aa21bb3ceac339b0"),
"id" : "1",
"prod_attr" : [
{
"name" : "Branded X 1 Sneaker"
},
{
"hierarchy" : {
"dept" : "10",
"class" : "101",
"subclass" : "1011"
}
}
]
},
{
"_id" : ObjectId("7be1a1b2aa21bb3ceac339xx"),
"id" : "2",
"prod_attr" : [
{
"name" : "Branded Y 2 Sneaker"
},
{
"hierarchy" : {
"dept" : "10",
"class" : "101",
"subclass" : "2022"
}
}
]
}
]
I would like to
1. return all documents where prod_attr.hierarchy.subclass = "2022"
2. return all documents where prod_attr.name contains "Sneaker"
I appreciate the JSON could be structured differently, unfortunately that is not within my control to change.
1. Return all documents where prod_attr.hierarchy.subclass = "2022"
Based on the Query an Array of Embedded Documents documentation of MongoDB you can use dot notation concatenating the name of the array field (prod_attr), with a dot (.) and the name of the field in the nested document (hierarchy.subclass):
collection.find({"prod_attr.hierarchy.subclass": "2022"})
2. Return all documents where prod_attr.name contains "Sneaker"
As before, you can use the dot notation to query a field of a nested element inside an array.
To perform the "contains" query you have to use the $regex operator:
collection.find({"prod_attr.name": {"$regex": "Sneaker"}})
Another option is to use the MongoDB Aggregation framework:
collection.aggregate([
{"$unwind": "$prod_attr"},
{"$match": {"prod_attr.hierarchy.subclass": "2022"}}
])
The $unwind operator creates a new document for each element of the prod_attr array, so each resulting document holds a single nested document instead of the array (check the documentation for details).
The next step is the $match operator, which performs the actual query on the nested object.
This is a simple example, but the aggregation operators give you a lot of flexibility.
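The two filters can be wrapped in small helpers, shown here as a rough sketch. The function names are hypothetical. Escaping the search text with re.escape is an added precaution so that characters such as "." are matched literally; it assumes Python's escaping is acceptable to MongoDB's regex engine, which holds for common punctuation.

```python
import re


def subclass_query(subclass):
    """Dot-notation filter on a field nested inside the prod_attr array."""
    return {"prod_attr.hierarchy.subclass": subclass}


def name_contains_query(text, ignore_case=False):
    """'Contains' filter on prod_attr.name using the $regex operator."""
    condition = {"$regex": re.escape(text)}
    if ignore_case:
        condition["$options"] = "i"  # case-insensitive matching
    return {"prod_attr.name": condition}


by_subclass = subclass_query("2022")
by_name = name_contains_query("Sneaker", ignore_case=True)
# With a real collection (assumed to exist):
# docs = list(collection.find(by_name))
```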

MongoDB query in pymongo with sort feature

I am new to MongoDB and I am trying to create a query.
I have a list, for example: mylist = [a,b,c,d,e]
My dataset has one key with a similar list: mydatalist = [b,d,g,e]
I want to create a query that will return all the data that contains at least one from the mylist.
What I have done.
query = {'mydatalist': {'$in': mylist}}
selector = {'_id':1,'name':1}
mydata = collection.find(query,selector)
That works perfectly. The only thing I cannot do is sort the results by the number of mylist items each one has in its mydatalist. Is there any way to do this in the query, or do I have to do it manually afterwards on the cursor?
Update with an example:
mylist = [a,b,c,d,e,f,g]
#data from collection
data1[mydatalist] = [a,b,k,l] #2 items from mylist
data2[mydatalist] = [b,c,d,e,m] #4items from mylist
data3[mydatalist] = [a,u,i] #1 item from mylist
So, I want the results to be sorted as data2 -> data1 -> data3
So you want the results sorted by the number of matches to your array selection. Not a simple thing for a find but this can be done with the aggregation framework:
db.collection.aggregate([
// Match your selection to minimise the
{$match: {list: {$in: ['a','b','c','d','e','f','g']}}},
// Projection trick, keep the original document
{$project: {_id: {_id: "$_id", list: "$list" }, list: 1}},
// Unwind the array
{$unwind: "$list"},
// Match only the elements you want
{$match: {list: {$in: ['a','b','c','d','e','f','g']}}},
// Sum up the count of matches
{$group: {_id: "$_id", count: {$sum: 1}}},
// Order by count descending
{$sort: {count: -1 }},
// Clean up the response, however you want
{$project: { _id: "$_id._id", list: "$_id.list", count: 1 }}
])
And there you have your documents in the order you want:
{
"result" : [
{
"_id" : ObjectId("5305bc2dff79d25620079105"),
"count" : 4,
"list" : ["b","c","d","e","m"]
},
{
"_id" : ObjectId("5305bbfbff79d25620079104"),
"count" : 2,
"list" : ["a","b","k","l"]
},
{
"_id" : ObjectId("5305bc41ff79d25620079106"),
"count" : 1,
"list" : ["a","u","i"]
}
],
"ok" : 1
}
Also, it is worth mentioning that aggregate in all recent driver versions returns a cursor, just as find does. Currently this is emulated by the driver, but as of version 2.6 it will be a real cursor. This makes aggregate a valid drop-in replacement for find in your existing calls.
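On newer servers a shorter pipeline may be possible. The sketch below assumes MongoDB 3.4 or later (for $addFields) and counts matches with $setIntersection and $size instead of unwinding and regrouping. Note that $setIntersection treats the arrays as sets, so repeated values are counted once. The helper names and the match_count field are hypothetical, and local_match_order is only a plain-Python illustration of the intended ranking.

```python
def match_count_pipeline(field, wanted):
    """Build a pipeline that ranks documents by how many wanted items they hold."""
    return [
        {"$match": {field: {"$in": wanted}}},
        {"$addFields": {
            "match_count": {"$size": {"$setIntersection": ["$" + field, wanted]}}
        }},
        {"$sort": {"match_count": -1}},
    ]


def local_match_order(docs, field, wanted):
    """Local illustration only: the same ranking computed in plain Python."""
    wanted_set = set(wanted)
    matching = [d for d in docs if wanted_set & set(d[field])]
    return sorted(
        matching,
        key=lambda d: len(wanted_set & set(d[field])),
        reverse=True,
    )


mylist = ["a", "b", "c", "d", "e", "f", "g"]
docs = [
    {"name": "data1", "mydatalist": ["a", "b", "k", "l"]},
    {"name": "data2", "mydatalist": ["b", "c", "d", "e", "m"]},
    {"name": "data3", "mydatalist": ["a", "u", "i"]},
]
ordered_names = [d["name"] for d in local_match_order(docs, "mydatalist", mylist)]
pipeline = match_count_pipeline("mydatalist", mylist)
# With a real collection (assumed to exist):
# results = list(collection.aggregate(pipeline))
```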

MongoDb: $sort by $in

I am running a mongodb find query with an $in operator:
collection.find({name: {$in: [name1, name2, ...]}})
I would like the results to be sorted in the same order as my name array: [name1, name2, ...]. How do I achieve this?
Note: I am accessing MongoDb through pymongo, but I don't think that's of any importance.
EDIT: as it's impossible to achieve this natively in MongoDb, I ended up using a typical Python solution:
names = [name1, name2, ...]
results = list(collection.find({"name": {"$in": names}}))
results.sort(key=lambda x: names.index(x["name"]))
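For long name lists, names.index rescans the list for every result. A possible variant, sketched below with hypothetical helper and variable names, precomputes each name's position in a dictionary once. Like the original, it assumes every result's name appears in the list, which the $in filter guarantees.

```python
def sort_by_given_order(docs, names, key="name"):
    """Return docs sorted so their key values follow the order of names."""
    rank = {}
    for position, name in enumerate(names):
        rank.setdefault(name, position)  # keep the first position, like list.index
    return sorted(docs, key=lambda doc: rank[doc[key]])


names = ["David", "Charlie", "Tess"]
# Stand-in for list(collection.find({"name": {"$in": names}}))
results = [{"name": "Tess"}, {"name": "David"}, {"name": "Charlie"}]
ordered = sort_by_given_order(results, names)
```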
You can achieve this with aggregation framework starting with upcoming version 3.4 (Nov 2016).
Assuming the order you want is the array order=["David", "Charlie", "Tess"] you do it via this pipeline:
m = { "$match" : { "name" : { "$in" : order } } };
a = { "$addFields" : { "__order" : { "$indexOfArray" : [ order, "$name" ] } } };
s = { "$sort" : { "__order" : 1 } };
db.collection.aggregate( [ m, a, s ] );
The "$addFields" stage is new in 3.4 and it allows you to "$project" new fields to existing documents without knowing all the other existing fields. The new "$indexOfArray" expression returns position of particular element in a given array.
The result of this aggregation will be documents that match your condition, in order specified in the input array order, and the documents will include all original fields, plus an additional field called __order
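Since the question mentions pymongo, the same pipeline can be written as a list of Python dictionaries. This is a minimal sketch assuming MongoDB 3.4 or later; the helper name in_order_pipeline is hypothetical.

```python
def in_order_pipeline(order, field="name"):
    """Build a pipeline returning matching documents in the order of `order`."""
    return [
        {"$match": {field: {"$in": order}}},
        # Position of the document's value within the requested order
        {"$addFields": {"__order": {"$indexOfArray": [order, "$" + field]}}},
        {"$sort": {"__order": 1}},
    ]


pipeline = in_order_pipeline(["David", "Charlie", "Tess"])
# With a real collection (assumed to exist):
# results = list(collection.aggregate(pipeline))
```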
Impossible. $in operator checks the presence. The list is treated as set.
Options:
Split for several queries for name1 ... nameN or filter the result the same way.
More names - more queries.
Use itertools groupby/ifilter. In that case, add a "sorting precedence" flag to every document, matching name1 to PREC1, name2 to PREC2, and so on, then sort by PREC and group by PREC.
If your collection has an index on the "name" field, option 1 is better.
If it does not have the index, or you cannot create it due to a high write/read ratio, option 2 is for you.
Vitaly is correct it's impossible to do that with find but it can be achieved with aggregates:
db.collection.aggregate([
{ $match: { name: { $in: [name1, name2, /* ... */] } } },
{
$project: {
name: 1,
name1: { $eq: ['name1', '$name'] },
name2: { $eq: ['name2', '$name'] },
},
},
{ $sort: { name1: 1, name2: 1 } },
])
tested on 2.6.5
I hope this will hint other people in the right direction.
