How can I transpose a list of documents in MongoDB? - python

I have a document like:
{
  "_id": "6345e01473144cec0073ea95",
  "results": [
    {"total_cost": 10, "total_time": 20},
    {"total_cost": 30, "total_time": 40}
  ]
}
And I want to 'transpose' the list of documents to get:
{
  "total_cost": [10, 30],
  "total_time": [20, 40]
}
How can I find an object by ID, and then apply a transform on a list of documents with a mongo aggregation?
Every question/answer I have seen doing this has been for multiple documents; however, this is for a single document with a list field.
(I am using MongoEngine/Pymongo so I have included the python tag)

Simply access the fields with dot notation. You can think of the projection of each field as an individual array; e.g. results.total_cost is an array with contents [10, 30]:
db.collection.aggregate([
  {
    $project: {
      total_cost: "$results.total_cost",
      total_time: "$results.total_time"
    }
  }
])
Here is the Mongo Playground with your example, for reference.
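If you would rather reshape client-side after fetching the document with PyMongo, the same transpose is a short dict comprehension in plain Python (a sketch, assuming the document shape shown in the question):

```python
doc = {
    "_id": "6345e01473144cec0073ea95",
    "results": [
        {"total_cost": 10, "total_time": 20},
        {"total_cost": 30, "total_time": 40},
    ],
}

# Collect each field's values across the list of sub-documents.
transposed = {
    key: [item[key] for item in doc["results"]]
    for key in doc["results"][0]
}
print(transposed)  # {'total_cost': [10, 30], 'total_time': [20, 40]}
```

This assumes every sub-document has the same keys; for large result sets the server-side $project above is still the better choice.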

Related

is it possible to use wildcards for field names in mongodb?

I have a set of field names as follows:
"field0.registers.hilo"
"field0.registers.lllo"
...
"field1.registers.hilo"
"field1.registers.lllo"
...
"field2.registers.hilo"
"field2.registers.lllo"
...
"fieldn.registers.hilo"
"fieldn.registers.lllo"
...
Is there a way to indicate the fields in mongodb with the index to range from 0 to n succinctly without having to expand it all out beforehand?
something like this example for project:
{ $project: { "fieldn.registers.hilo": 1, "fieldn.registers.lllo": 1 } }
For now, I am fully expanding all the project fields from 0 to n in python before interfacing with the collection using pymongo.
is it possible to use wildcards for field names in mongodb?
No.
If your data is in this structure, refactor it to use lists. That's exactly what lists are designed for.
Taking the refactored example below, use $elemMatch to project only the array elements needed:
from pymongo import MongoClient

db = MongoClient()['mydatabase']
db.register.insert_many([{
    'registers': [
        {'field': 0, 'hilo': 1, 'lllo': 2},
        {'field': 1, 'hilo': 2, 'lllo': 3},
        {'field': 2, 'hilo': 3, 'lllo': 4}
    ]
}])

print(db.register.find_one({}, {'registers': {'$elemMatch': {'field': 1}}}))
prints:
{'_id': ObjectId('60b64e57c3214d73c390557b'), 'registers': [{'field': 1, 'hilo': 2, 'lllo': 3}]}
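If refactoring really isn't an option, the hand-written expansion described in the question can at least be generated succinctly in Python rather than typed out (a sketch; `n` and the register names are placeholders for whatever your schema actually uses):

```python
n = 3  # assumed number of fieldN prefixes to project

# Build the projection document for field0..field(n-1) programmatically.
projection = {
    f"field{i}.registers.{reg}": 1
    for i in range(n)
    for reg in ("hilo", "lllo")
}
print(projection)
# Pass it straight to PyMongo, e.g. db.collection.find({}, projection)
```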

How to aggregate through nested dictionaries, sum its values, and rank them accordingly?

I have a MongoDB database containing frequencies of words at the document level, as shown below. I have about 175k documents in the same format, totaling about 2.5GB.
{
  "_id": xxx,
  "title": "zzz",
  "vectors": {
    "word1": 28,
    "word2": 22,
    "word3": 12,
    "word4": 7,
    "word5": 4
  }
}
Now I want to iterate through all documents, calculate the sum of all frequencies for each word, and get a total ranking of the words in the vectors field based on their frequencies, like:
{
  "vectors": {
    "word1": 223458,
    "word2": 98562,
    "word3": 76433,
    "word4": 4570,
    "word5": 2599
  }
}
$unwind does not seem to work here as I have a nested dictionary. I'm relatively new to MongoDB, and I couldn't find answers specific to this. Any ideas?
You have to convert the sub-object's keys to values using $objectToArray and then $unwind the newly converted array ($unwind works only on array fields, which is why it didn't work for you).
Finally, group by $vectors.k, where the sub-object's keys have become values.
db.collection.aggregate([
  {
    "$project": {
      "vectors": {"$objectToArray": "$vectors"}
    }
  },
  {"$unwind": "$vectors"},
  {
    "$group": {
      "_id": "$vectors.k",
      "count": {"$sum": "$vectors.v"}
    }
  }
])
Here is a Mongo Playground sample execution.
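Note the ranking step the question asks for just needs a descending $sort on count appended to the pipeline above. The computation itself can be sketched in plain Python with collections.Counter (an illustration on made-up sample docs; for 175k documents the server-side pipeline is the right tool):

```python
from collections import Counter

# Sample documents in the question's shape (assumed data for the demo).
docs = [
    {"_id": 1, "title": "a", "vectors": {"word1": 28, "word2": 22}},
    {"_id": 2, "title": "b", "vectors": {"word1": 5, "word3": 7}},
]

totals = Counter()
for doc in docs:
    totals.update(doc["vectors"])  # adds each word's count to the running total

# most_common() gives the ranking by summed frequency, highest first.
print(totals.most_common())  # [('word1', 33), ('word2', 22), ('word3', 7)]
```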

Google sheets python script conditional formatting custom formula

I want to use a script to format the background color for cells in two separate ranges based on values in a single range.
Ranges to be formatted:
AU6:BL405
BN6:CE405
Based on value = 1 in range:
A6:R405
I'm hoping that using a script will make the file more efficient compared to setting up the conditional formatting using the custom formula formatting function in Sheets.
I would appreciate any help!
Here is the code I've come up with so far:
SHEET_ID = 'xxx'

myRangeA = {
    'startRowIndex': 6,
    'startColumnIndex': 47,
    'endColumnIndex': 64,
}
myRangeB = {
    'startRowIndex': 6,
    'startColumnIndex': 66,
    'endColumnIndex': 83,
}
reqs = [
    {'addConditionalFormatRule': {
        'index': 0,
        'rule': {
            'ranges': [myRangeA, myRangeB],
            'booleanRule': {
                'format': {
                    'backgroundColor': {'red': 0.8}
                },
                'condition': {
                    'type': 'CUSTOM_FORMULA',
                    'values': [{'userEnteredValue': '=A6=1'}]
                },
            },
        },
    }},
]
SHEETS.spreadsheets().batchUpdate(spreadsheetId=SHEET_ID,
                                  body={'requests': reqs}).execute()

MongoDB - Convert One record with an array to multiple records in a new collection

[MongoDB shell or PyMongo] I would like to know how to efficiently convert one record in a collection, with an array in one field, to multiple records in, say, a new collection. So far, the only solution I've been able to come up with is iterating the records one by one, then iterating the array in the field I want, and doing individual inserts. I'm hoping there's a more efficient way to do this.
Example:
I want to take a collection in MongoDB with structure similar to :
[
  {"_id": 1, "points": ["a", "b", "c"]},
  {"_id": 2, "points": ["d"]}
]
and convert it to something like this:
[
  {"_id": 1, "points": "a"},
  {"_id": 2, "points": "b"},
  {"_id": 3, "points": "c"},
  {"_id": 4, "points": "d"}
]
Assuming you're ok with auto-generated _id values in the new collection, you can do this with an aggregation pipeline that uses $unwind to unwind the points array and $out to output the results to a new collection:
db.test.aggregate([
  // Duplicate each doc, one per points array element
  {$unwind: '$points'},
  // Remove the _id field to prompt regeneration, as there are now duplicates
  {$project: {_id: 0}},
  // Output the resulting docs to a new collection named 'newtest'
  {$out: 'newtest'}
])
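What $unwind plus the _id-dropping $project do can be sketched in plain Python on the sample data (an illustration of the reshaping only, not a replacement for the server-side pipeline):

```python
docs = [
    {"_id": 1, "points": ["a", "b", "c"]},
    {"_id": 2, "points": ["d"]},
]

# One output doc per array element, with _id dropped so MongoDB regenerates it.
unwound = [{"points": p} for doc in docs for p in doc["points"]]
print(unwound)
# [{'points': 'a'}, {'points': 'b'}, {'points': 'c'}, {'points': 'd'}]
```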
Here's another version, which can be expected to perform worse than @JohnnyHK's solution (it needs a second $unwind and a potentially massive $group), but it generates integer IDs based on an order that you can specify in the $sort stage:
db.collection.aggregate([
  // flatten the "points" array to get individual documents
  { $unwind: { "path": "$points" } },
  // sort by some criterion
  { $sort: { "points": 1 } },
  // throw all sorted "points" into the very same massive array
  { $group: {
      _id: null,
      "points": { $push: "$points" }
  } },
  // flatten the massive array, making each document's position index its `_id` field
  { $unwind: {
      "path": "$points",
      includeArrayIndex: "_id"
  } },
  // write the results to a new "result" collection
  { $out: "result" }
], {
  // make sure we do not run into memory issues
  allowDiskUse: true
})
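The integer-_id variant amounts to sorting, flattening, and enumerating. In plain Python terms (an illustration only; note the indexes are 0-based, as produced by includeArrayIndex):

```python
docs = [
    {"_id": 1, "points": ["a", "b", "c"]},
    {"_id": 2, "points": ["d"]},
]

# Flatten all points, sort them, then use each element's position as its _id.
flat = sorted(p for doc in docs for p in doc["points"])
result = [{"_id": i, "points": p} for i, p in enumerate(flat)]
print(result)
# [{'_id': 0, 'points': 'a'}, {'_id': 1, 'points': 'b'},
#  {'_id': 2, 'points': 'c'}, {'_id': 3, 'points': 'd'}]
```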

Nested dictionary to database tables with sqlalchemy

Assuming a database with tables G, P, and C: a one-to-many relationship exists between G and P, as well as between P and C.
Having now a nested dictionary:
db_entry = {
    "G_key_1": {
        "G_col_1": 11,
        "G_col_2": 12,
        "G_Ps": [
            {
                "P_col_1": 21,
                "P_col_2": 22,
                "P_Cs": [
                    {
                        "C_col_1": 31,
                        "C_col_2": 32,
                    }
                ]
            }
        ]
    }
}
Is there a nice way to add the db_entry nested dictionary to the DB? At the moment I just loop over all the children, since there are only 3 levels in the dictionary, but that approach won't extend to dictionaries with more levels.
