How to match a mongodb subarray based on main document id? - python

I have the following pipeline in my pymongo script:
pipeline = [
{'$match': {'_id': '123456'}},
{'$lookup': {
'from': 'Case',
'localField': '_id',
'foreignField': 'RecordList.Record._id',
'as': 'CaseInfo'}
},
{'$unwind':'$CaseInfo'},
{'$unwind':'$CaseInfo.RecordList'},
{'$unwind':'$CaseInfo.RecordList.Record'},
{'$match': {'CaseInfo.RecordList.Record._id': '123456'}}
]
I need to change the last line of code so that I don't need to specify the document id manually, but take it from the initial document.
I have tried the following with no luck:
{'$match': {'CaseInfo.RecordList.Record._id': '_id'}}
{'$match': {'CaseInfo.RecordList.Record._id': '$_id'}}
Could you please help me?
Also, is this the most efficient way to accomplish this, or should I be using $project? I tried using it, but I don't know the structure of the document and I need every field in the documents. I'm not sure if there's a way to not specify a "1" in the $filter operator (since I don't know the key names)
Thanks in advance

In MongoDB 3.6 and later, you can change the last line to
{'$match': {'$expr': {'$eq': ['$CaseInfo.RecordList.Record._id', '$_id']}}}
Alternatively, you can rewrite the aggregation to use the pipeline variant of $lookup.
Something like
[
{"$match":{"_id":"123456"}},
{"$lookup":{
"from":"Case",
"let":{"_id":"$_id"},
"pipeline":[
{"$unwind":"$RecordList"},
{"$unwind":"$RecordList.Record"},
{"$match":{"$expr":{"$eq":["$RecordList.Record._id","$$_id"]}}}
],
"as":"CaseInfo"
}},
{"$unwind":"$CaseInfo"}
]

Related

How to get the rows extracted if few columns are blank or empty

I am new to this and need some help extracting only the records/rows where a few columns are blank. The code below ignores the blank records and returns the ones with a value. Can someone suggest a fix?
mongo_docs = mongo.db.user.find({"$and":[{"Param1": {"$ne":None}}, {"Param1": {"$ne": ""}}]})
The query you are using contains $ne, which stands for "not equal". Change it to $eq (equals), and combine the two conditions with $or instead of $and, then check whether you get the desired results.
mongo_docs = mongo.db.user.find({"$or":[{"Param1": {"$eq":None}}, {"Param1": {"$eq": ""}}]})
The same query can be written more simply with $in:
mongo_docs = mongo.db.user.find({"Param1": {"$in": [None, ""]}})
You can also use the $exists query if that better satisfies your requirement.
From the comments: You are absolutely right, the query also works with multiple parameters.
mongo_docs = mongo.db.user.find({"$or":[{"Param1": {"$in": [None, ""]}}, {"Param2": {"$in": [None, ""]}}, {"Param3": {"$in": [None, ""]}}]})
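A query over several fields, like the one above, can also be generated from a list of field names. This is a minimal sketch: `blank_filter` is a hypothetical helper, not part of pymongo, and it only builds the filter dictionary that would then be passed to find().

```python
def blank_filter(fields):
    """Build a filter matching documents where any listed field is None or ''."""
    if not fields:
        # MongoDB rejects an empty $or array, so fail early.
        raise ValueError("at least one field name is required")
    return {"$or": [{field: {"$in": [None, ""]}} for field in fields]}


query = blank_filter(["Param1", "Param2", "Param3"])
# The result would be passed to the collection, e.g. mongo.db.user.find(query)
print(query)
```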
Additionally, if you want to search over a larger set of fields, you can consider using text indexes. These allow you to search all the text fields at once, and the code should look something like this:
mongo.db.user.create_index([("Param1", "text"), ("Param2", "text"), ("Param3", "text")])
mongo.db.user.find({"$text": {"$search": ""}})
The above works for text content only. I have not come across an implementation for integer values yet, but I cannot see why the same should not work with some minor changes using wildcard indexes.
References:
Search multiple fields for multiple values in MongoDB
https://docs.mongodb.com/manual/text-search/
https://docs.mongodb.com/manual/core/index-text/#std-label-text-index-compound
https://docs.mongodb.com/manual/core/index-wildcard/

Python jsonpath-ng : How to build a json document from jsonpath and values?

Developing in python, I would like to create a json document (tree) from a list of jsonpath and corresponding values.
For example, from the jsonpath
"$.orders[0].client"
and a value "acme", it would create:
{'orders':[{'client':'acme'}]}
And then, I should be able to add a new node in the json document, at the location specified by another jsonpath with a given value.
For example adding jsonpath $.orders[0].orderid and value "123", this would result in an updated json document:
{'orders':[{'client':'acme', 'orderid' : 123}]}
I have tried to understand jsonpath-ng for that purpose, but I don't understand how I could use it for this (or even whether it is possible).
Anyone who could help?
Thanks in advance,
Best regards,
Laurent
If I understand your question correctly you are asking for a JSON to JSON transformation. JSONPath is not the right tool then. There are specialized libraries for this, e.g. JSONBender. You can transform some JSON like this:
import json
from jsonbender import bend, K, S
MAPPING = {
'client': S('orders', 0, 'client'),
'orderid': S('orders', 0, 'orderid')
}
source = {
"orders" : [
{
"orderid": 123,
"client" : "acme"
},
{
"orderid": 321,
"client" : "two"
}
]
}
result = bend(MAPPING, source)
print(json.dumps(result))
{"client": "acme", "orderid": 123}
You can certainly bend the result even further but simpler output might work even better.
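If the goal is to build the document from path/value pairs, as in the question, a small plain-Python helper may be enough. The following is only a sketch under stated assumptions: `set_path` is a hypothetical function, it uses neither jsonpath-ng nor JSONBender, and it understands only dotted names and integer indices such as `$.orders[0].client`.

```python
import re

# Matches either ".name" or "[index]"; anything else in the path is unsupported.
_TOKEN = re.compile(r"\.([A-Za-z_]\w*)|\[(\d+)\]")


def set_path(doc, path, value):
    """Set value at a simple JSONPath such as '$.orders[0].client', creating nodes."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    keys = [name if name else int(index) for name, index in _TOKEN.findall(path[1:])]
    node = doc
    for i, key in enumerate(keys):
        if isinstance(key, int):
            # Pad the list so that the requested index exists.
            while len(node) <= key:
                node.append(None)
        if i == len(keys) - 1:
            node[key] = value
            break
        child = node.get(key) if isinstance(node, dict) else node[key]
        if child is None:
            # Create a list when the next key is an index, otherwise a dict.
            child = [] if isinstance(keys[i + 1], int) else {}
            node[key] = child
        node = child
    return doc


doc = set_path({}, "$.orders[0].client", "acme")
doc = set_path(doc, "$.orders[0].orderid", 123)
print(doc)  # {'orders': [{'client': 'acme', 'orderid': 123}]}
```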

Delete randomly selected N documents in a collection (MongoDB)

Could anybody please tell me an elegant way to delete N randomly selected documents in a collection in a MongoDB database (ideally through Python)? I would like to use something concise like this
db.users.remove({ $sample: { size: N } })
But this one doesn't parse, and I couldn't find a working alternative anywhere else. Many thanks!
Use aggregation to get your sample and store the _id values in a list:
list_of_ids = [doc['_id'] for doc in db.users.aggregate([{'$sample': {'size': 10}}, {'$project': {'_id': 1}}])]
Then use delete_many to delete the sampled documents:
results = db.users.delete_many({'_id': {'$in': list_of_ids}})
(*) Make sure to check the MongoDB documentation for the limitations of $sample.

An efficient way to update dictionary in MongoDB?

I have this MongoDb schema:
tags:{
"image_uid":"",
"faces": [
{
"face_uid":"",
"age_real":""
}
]}
which I update with a list of dictionaries
feedbacks = [{
'face_uid': '02d42dee-3b66-11e2-b12e-e0cb4e12150c',
'age': 23
},
{
'face_uid': '02d42dee-3b66-11e2-b12e-e0cb4e12150d',
'age': 23
}]
in this way:
for feedback in feedbacks:
tags.update(
{'image_uid': image_uid, 'faces.face_uid': feedback['face_uid']},
{"$set": {'faces.$.age_real': feedback['age']}}, w=1
)
Is there a more efficient way than the for loop?
Currently MongoDB offers no support for updating multiple array elements at once. However, instead of performing several updates in sequence, you might choose to use the Update if Current pattern, or something similar, to update your document locally and then replace it on the DB.
Also, check the original jira, where you can find a couple work-arounds in the comments.
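The idea of updating the document locally and then writing it back only if it is still current can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: `apply_feedbacks` and `update_if_current` are hypothetical helpers, `tags` is assumed to be a pymongo collection, and each entry in `faces` is assumed to carry a `face_uid` field.

```python
def apply_feedbacks(faces, feedbacks):
    """Return a new faces list with age_real set for every face named in feedbacks."""
    ages = {fb["face_uid"]: fb["age"] for fb in feedbacks}
    return [
        {**face, "age_real": ages[face["face_uid"]]}
        if face.get("face_uid") in ages else face
        for face in faces
    ]


def update_if_current(tags, image_uid, feedbacks):
    """Apply all feedbacks with one write; return False if the document changed meanwhile."""
    doc = tags.find_one({"image_uid": image_uid})
    if doc is None:
        return False
    new_faces = apply_feedbacks(doc["faces"], feedbacks)
    # The filter repeats the faces array that was read, so the write only
    # matches if nobody modified the array in between.
    result = tags.update_one(
        {"_id": doc["_id"], "faces": doc["faces"]},
        {"$set": {"faces": new_faces}},
    )
    return result.matched_count == 1
```

When `update_if_current` returns False, the caller would typically re-read the document and try again.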

What is the best way to structure embedded docs in MongoDB?

I have a document layout like this:
Program = {
'_id':ObjectId('4321...'),
'Title':'The Title',
'Episodes':[
{
'Ep_ID':'234122', # this is unique
'Title': 'Ep1',
'Duration':45.2 },
{
'Ep_ID':'342343', # unique
'Title': 'Ep2',
'Duration':32.3 }
]
}
What I would like to do is add another embedded doc within each Episode, like this:
Program = {
'_id':ObjectId('4321...'),
'Title':'The Title',
'Episodes':[
{
'Ep_ID':'234122', # this is unique
'Title': 'Ep1',
'Duration':45.2,
'FileAssets':[
{ 'FileName':'video1.mov', 'FileSize':2348579234 },
{ 'FileName':'video2.mov', 'FileSize':32343233 }
]
},
{
'Ep_ID':'342343', # unique
'Title': 'Ep2',
'Duration':32.3,
'FileAssets':[
{ 'FileName':'video1.mov', 'FileSize':12423773 },
{ 'FileName':'video2.mov', 'FileSize':456322 }
]
}
]
}
However, I can't figure out how to add/mod/del a doc at that '3rd' level. Is it possible or even good design? I would dearly love to have all the data in one doc, but managing is starting to seem too complex.
The other thought I had was to use the unique values that happen to exist for the sub-docs as keys. I've thought about my sub-docs and they all have some kind of unique value. So I could do this:
Program = {
'_id':ObjectId('4321...'),
'Title':'The Title',
'Ep_ID_234122':{episode data},
'Ep_ID_342343':{episode data},
'FileAsset_BigRaid_Video1.mov':{'Ep_ID_234122', + other file asset data},
'FileAsset_BigRaid_video2.mov':{'Ep_ID_234122', + other file asset data}
}
Any thoughts would be great!
Yes, you can definitely structure your data with that kind of nesting. What's more, you shouldn't need to do anything special to accomplish it (at least using pymongo). If updating existing documents is your problem, just get your docs with a cursor and update them.
At least, that holds for your first idea. Your second idea for a schema is not a nice way to structure that data. For one, it will be impossible to easily iterate over subsets of a Program document without doing string matching on the keys, and that will get expensive.
That said, I'm currently dealing with some major MongoDB performance issues, so I would probably recommend keeping your file assets in a separate collection. It will make it easier to scale later, if you expect this data set to become large.
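For the nested layout from the question (the first idea), adding, removing and modifying a FileAssets entry could look roughly like this. It is a sketch under assumptions: the three helper functions are hypothetical and only build the arguments for pymongo's update_one, and the filtered positional operator used in the last one requires MongoDB 3.6 or later.

```python
def add_file_asset(program_id, ep_id, asset):
    """Filter and update that append an asset to one episode's FileAssets."""
    flt = {"_id": program_id, "Episodes.Ep_ID": ep_id}
    # The positional $ refers to the episode matched by the filter.
    return flt, {"$push": {"Episodes.$.FileAssets": asset}}


def remove_file_asset(program_id, ep_id, file_name):
    """Filter and update that remove an asset, by file name, from one episode."""
    flt = {"_id": program_id, "Episodes.Ep_ID": ep_id}
    return flt, {"$pull": {"Episodes.$.FileAssets": {"FileName": file_name}}}


def set_file_size(program_id, ep_id, file_name, size):
    """Filter, update and array filters that change one asset's FileSize."""
    update = {"$set": {"Episodes.$[ep].FileAssets.$[fa].FileSize": size}}
    array_filters = [{"ep.Ep_ID": ep_id}, {"fa.FileName": file_name}]
    return {"_id": program_id}, update, array_filters
```

With a pymongo collection (here assumed to be called `programs`), the first two would be used as `programs.update_one(flt, update)` and the third as `programs.update_one(flt, update, array_filters=array_filters)`.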
