An efficient way to update dictionary in MongoDB? - python

I have this MongoDb schema:
tags:{
"image_uid":"",
"faces": [
{
"image_uid":"",
"age_real":""
}
]}
which I update from a list of dictionaries
feedbacks = [{
'face_uid': '02d42dee-3b66-11e2-b12e-e0cb4e12150c',
'age': 23
},
{
'face_uid': '02d42dee-3b66-11e2-b12e-e0cb4e12150d',
'age': 23
}]
in this way:
for feedback in feedbacks:
    tags.update(
        {'image_uid': image_uid, 'faces.face_uid': feedback['face_uid']},
        {"$set": {'faces.$.age_real': feedback['age']}}, w=1
    )
Is there a more efficient way than the for loop?

Currently MongoDB offers no support for updating multiple array elements at once. However, instead of performing several updates in sequence, you might choose to use the Update if Current pattern, or something similar, to update your document locally and then replace it on the DB.
Also, check the original JIRA ticket, where you can find a couple of workarounds in the comments.
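A rough sketch of the "update locally, then replace" idea from the answer above. The helper name apply_feedbacks is hypothetical, and it assumes each entry in faces carries a face_uid key, as the query in the question implies. The write-back step appears only as a comment because it needs a live collection.

```python
import copy


def apply_feedbacks(doc, feedbacks):
    """Return a copy of doc with age_real set on each face listed in feedbacks."""
    # Index the feedback by face_uid so each face is looked up once.
    ages = {fb['face_uid']: fb['age'] for fb in feedbacks}
    updated = copy.deepcopy(doc)  # leave the caller's document untouched
    for face in updated.get('faces', []):
        if face.get('face_uid') in ages:
            face['age_real'] = ages[face['face_uid']]
    return updated


# Write-back ("update if current"), assuming PyMongo 3+ and that `tags`
# is the collection from the question:
#   original = tags.find_one({'image_uid': image_uid})
#   updated = apply_feedbacks(original, feedbacks)
#   tags.replace_one({'_id': original['_id'], 'faces': original['faces']}, updated)
# Filtering on the old 'faces' value means nothing is replaced if another
# writer changed the array in the meantime.
```

This trades several positional updates for one read and one write per document, at the cost of having to retry when the conditional replace matches nothing.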

Related

How to match a mongodb subarray based on main document id?

I have the following pipeline in my pymongo script:
pipeline = [
{'$match': {'_id': '123456'}},
{'$lookup': {
'from': 'Case',
'localField': '_id',
'foreignField': 'RecordList.Record._id',
'as': 'CaseInfo'}
},
{'$unwind':'$CaseInfo'},
{'$unwind':'$CaseInfo.RecordList'},
{'$unwind':'$CaseInfo.RecordList.Record'},
{'$match': {'CaseInfo.RecordList.Record._id': '123456'}}
]
I need to change the last line of code so that I don't need to specify the document id manually, but take it from the initial document.
I have tried the following with no luck:
{'$match': {'CaseInfo.RecordList.Record._id': '_id'}}
{'$match': {'CaseInfo.RecordList.Record._id': '$_id'}}
Could you please help me?
Also, is this the most efficient way to accomplish this, or should I be using $project? I tried using it, but I don't know the structure of the document and I need every field in the documents. I'm not sure if there's a way to not specify a "1" in the $filter operator (since I don't know the key names)
Thanks in advance
In version 3.6 and later, you can change the last line to
{'$match': {'$expr': {'$eq': ['$CaseInfo.RecordList.Record._id', '$_id']}}}
Alternatively, you can rewrite the aggregation to use the pipeline variant of $lookup.
Something like
[
    {"$match": {"_id": "123456"}},
    {"$lookup": {
        "from": "Case",
        # "let" variable names must start with a lowercase letter, so "_id" is renamed
        "let": {"record_id": "$_id"},
        "pipeline": [
            {"$unwind": "$RecordList"},
            {"$unwind": "$RecordList.Record"},
            {"$match": {"$expr": {"$eq": ["$RecordList.Record._id", "$$record_id"]}}}
        ],
        "as": "CaseInfo"
    }},
    {"$unwind": "$CaseInfo"}
]

Delete randomly selected N documents in a collection (MongoDB)

Could anybody please tell me an elegant way to delete N randomly selected documents from a collection in a MongoDB database (ideally through Python)? I would like to use something concise like this
db.users.remove({ $sample: { size: N } })
But this one doesn't parse, and I couldn't find a working alternative anywhere else. Many thanks!
Use aggregation to get your sample and store the _id values in a list:
list_of_ids = [doc['_id'] for doc in db.users.aggregate([{'$sample': {'size': 10}}, {'$project': {'_id': 1}}])]
Then use delete_many to drop the sampled documents:
results = db.users.delete_many({'_id': {'$in': list_of_ids}})
(*) Make sure to check the documentation for the limitations of $sample.

Multiple FOR loops in iterating over dictionary in Python

This is a simplified example of a dictionary created by json.load that I have to deal with:
{
"name": "USGS REST Services Query",
"queryInfo": {
"timePeriod": "PT4H",
"format": "json",
"data": {
"sites": [{
"id": "03198000",
"params": "[00060, 00065]"
},
{
"id": "03195000",
"params": "[00060, 00065]"
}]
}
}
}
Sometimes there may be 15-100 sites with unknown sets of parameters at each site. My goal is to either create two lists (one storing "site" IDs and the other storing "params") or a much simplified dictionary from this original dictionary. Is there a way to do this using nested for loops over key, value pairs with the iteritems() method?
What I have tried so far is this:
queryDict = {}
for key,value in WS_Req_dict.iteritems():
    if key == "queryInfo":
        if value == "data":
            for key, value in WS_Req_dict[key][value].iteritems():
                if key == "sites":
                    siteVal = key
                    if value == "params":
                        paramList = [value]
                        queryDict["sites"] = siteVal
                        queryDict["sites"]["params"] = paramList
I run into trouble getting the second FOR loop to work. I haven't looked into pulling out lists yet.
I think this may be an overall stupid way of doing it, but I can't see around it yet.
I think you can make your code much simpler by just indexing, when feasible, rather than looping over iteritems.
for site in WS_Req_dict['queryInfo']['data']['sites']:
    queryDict[site['id']] = site['params']
If some of the keys might be missing, dict's get method is your friend:
for site in WS_Req_dict.get('queryInfo',{}).get('data',{}).get('sites',[]):
would let you quietly ignore missing keys. But, this is much less readable, so, if I needed it, I'd encapsulate it into a function -- and often you may not need this level of precaution! (Another good alternative is a try/except KeyError encapsulation to ignore missing keys, if they are indeed possible in your specific use case).
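As a minimal sketch of the "encapsulate it into a function" suggestion above: the helper names site_params and site_lists are made up for illustration, and the code is written for Python 3, where iteritems() no longer exists. It assumes the layout shown in the question and silently skips anything that is missing.

```python
def site_params(ws_req):
    """Map each site id to its params, tolerating missing keys."""
    sites = ws_req.get('queryInfo', {}).get('data', {}).get('sites', [])
    # Sites without an "id" are skipped; a missing "params" becomes None.
    return {site['id']: site.get('params') for site in sites if 'id' in site}


def site_lists(ws_req):
    """Return two parallel lists: site ids and their params."""
    mapping = site_params(ws_req)
    return list(mapping.keys()), list(mapping.values())


example = {
    "name": "USGS REST Services Query",
    "queryInfo": {
        "timePeriod": "PT4H",
        "format": "json",
        "data": {
            "sites": [
                {"id": "03198000", "params": "[00060, 00065]"},
                {"id": "03195000", "params": "[00060, 00065]"},
            ]
        },
    },
}

ids, params = site_lists(example)
print(ids)     # ['03198000', '03195000']
print(params)  # ['[00060, 00065]', '[00060, 00065]']
```

Callers get either the simplified dictionary or the two lists the question asked for, and the chain of get calls stays in one place.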

MongoDB: How to change an element of a nested array?

From what I have read, it is impossible to update an element in a nested array using the positional operator $ in MongoDB. The $ only works one level deep. I see it is a requested feature for MongoDB 2.7.
Updating the whole document one level up is not an option because of write conflicts. I need to just be able to change the 'username' for a particular reward program for instance.
One idea would be to pull, modify, and push the entire 'reward_programs' element, but then I would lose the order. Order is important.
Consider this document:
{
"_id": 0,
"firstname":"Tom",
"profiles" : [
{
"profile_name": "tom",
"reward_programs": [
{
'program_name':'American',
'username':'tomdoe',
},
{
'program_name':'Delta',
'username':'tomdoe',
}
]
}
]
}
How would you go about specifically changing the 'username' of 'program_name'=Delta?
After doing more reading, it looks like this is unsupported in MongoDB at the moment. Positional updates are only supported one level deep. The feature might be added in MongoDB 2.7.
There are a couple of workarounds.
1) Flatten out your database structure. In this case, make 'reward_programs' its own collection and do your operation on that.
2) Instead of arrays of dicts, use dicts of dicts. That way you can have an absolute path down to the object you need to modify. This can reduce query flexibility.
3) It seems hacky to me, but you can also walk the nested array, find the element's index, and do something like this:
users.update({'_id': request._id, 'profiles.profile_name': profile_name}, {'$set': {'profiles.$.reward_programs.{}.username'.format(index): new_username}})
4) Read in the whole document, modify it, and write it back. However, this can cause write conflicts.
Setting up your database structure initially is extremely important. It really depends on how you are going to use it.
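Workaround 3 can be sketched roughly as follows. The function name build_username_update is hypothetical, and unlike the one-liner above it resolves both array indexes locally, so the resulting path contains no positional $ at all. It only builds the update document; sending it to the server is left as a comment.

```python
def build_username_update(doc, profile_name, program_name, new_username):
    """Return a $set update targeting one nested username, or None if not found."""
    for p_idx, profile in enumerate(doc.get('profiles', [])):
        if profile.get('profile_name') != profile_name:
            continue
        for r_idx, program in enumerate(profile.get('reward_programs', [])):
            if program.get('program_name') == program_name:
                # Absolute dotted path with numeric array indexes.
                path = 'profiles.{}.reward_programs.{}.username'.format(p_idx, r_idx)
                return {'$set': {path: new_username}}
    return None


doc = {
    '_id': 0,
    'firstname': 'Tom',
    'profiles': [{
        'profile_name': 'tom',
        'reward_programs': [
            {'program_name': 'American', 'username': 'tomdoe'},
            {'program_name': 'Delta', 'username': 'tomdoe'},
        ],
    }],
}

update = build_username_update(doc, 'tom', 'Delta', 'tom_delta')
print(update)  # {'$set': {'profiles.0.reward_programs.1.username': 'tom_delta'}}

# With a live collection (assumed to be called `users`):
#   users.update({'_id': doc['_id']}, update)
```

Because the indexes come from a copy read earlier, the update can hit the wrong element if the arrays are reordered between the read and the write.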
A simple way to do this:
doc = collection.find_one({'_id': 0})
doc['profiles'][0]["reward_programs"][1]['username'] = 'new user name'
# replace the whole document (collection.save() was removed in PyMongo 4)
collection.replace_one({'_id': doc['_id']}, doc)

What is the best way to structure embedded docs in MongoDB?

I have a document layout like this:
Program = {
'_id':ObjectId('4321...'),
'Title':'The Title',
'Episodes':[
{
'Ep_ID':'234122', # this is unique
'Title': 'Ep1',
'Duration':45.2 },
{
'Ep_ID':'342343', # unique
'Title': 'Ep2',
'Duration':32.3 }
]
}
What I would like to do is add another embedded doc within each Episode, like this:
Program = {
'_id':ObjectId('4321...'),
'Title':'The Title',
'Episodes':[
{
'Ep_ID':'234122', # this is unique
'Title': 'Ep1',
'Duration':45.2,
'FileAssets':[
{ 'FileName':'video1.mov', 'FileSize':2348579234 },
{ 'FileName':'video2.mov', 'FileSize':32343233 }
]
},
{
'Ep_ID':'342343', # unique
'Title': 'Ep2',
'Duration':32.3,
'FileAssets':[
{ 'FileName':'video1.mov', 'FileSize':12423773 },
{ 'FileName':'video2.mov', 'FileSize':456322 }
]
}
]
}
However, I can't figure out how to add/mod/del a doc at that '3rd' level. Is it possible or even good design? I would dearly love to have all the data in one doc, but managing is starting to seem too complex.
The other thought I had was to use the unique values that happen to exist for the sub-docs as keys. I've thought about my sub-docs and they all have some kind of unique value. So I could do this:
Program = {
'_id':ObjectId('4321...'),
'Title':'The Title',
'Ep_ID_234122':{episode data},
'Ep_ID_342343':{episode data},
'FileAsset_BigRaid_Video1.mov':{'Ep_ID_234122', + other file asset data},
'FileAsset_BigRaid_video2.mov':{'Ep_ID_234122', + other file asset data}
}
Any thoughts would be great!
Yes, you can definitely structure your data to have that kind of nesting. What's more, you shouldn't need to do anything special to accomplish it (at least using pymongo). Just fetch your docs with a cursor and update them if you need to change existing documents, if that's your problem.
At least, that holds for your first idea. Your second idea for a schema is not a nice way to structure that data. For one, it will be impossible to iterate easily over subsets of a Program document without doing string matching on the keys, and that will get expensive.
That said, I'm currently dealing with some major MongoDB performance issues, so I would probably recommend you keep your file assets in a separate collection. It will make it easier for you to scale later, if you plan for this data set to become large.
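For the first layout, adding a doc at the "3rd" level might look like the sketch below. The helper add_file_asset_update and the collection name programs are assumptions for illustration. The positional $ is enough here because only one array (Episodes) has to be matched before pushing into FileAssets.

```python
def add_file_asset_update(program_id, ep_id, file_name, file_size):
    """Build the (filter, update) pair that appends a file asset to one episode."""
    # The filter pins down the Program and the episode inside its Episodes array.
    query = {'_id': program_id, 'Episodes.Ep_ID': ep_id}
    # "$" stands for the index of the episode matched by the filter.
    update = {'$push': {'Episodes.$.FileAssets': {
        'FileName': file_name,
        'FileSize': file_size,
    }}}
    return query, update


query, update = add_file_asset_update('4321', '234122', 'video3.mov', 1024)
print(query)
print(update)

# With a live collection (assumed to be called `programs`, PyMongo 3+):
#   programs.update_one(query, update)
```

Removing an asset would follow the same shape with $pull in place of $push. Changing a field inside an existing asset needs a second array index, which is where the single positional $ stops being enough.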
