I am new to MongoDB and am using it to store a nested document, e.g. each document contains an array of students for each class. I am trying to update the information in each of the array entries. Is there a better way to do it than updating each array element one at a time?
Here is my record in the collection -
{
    "_id" : "23343",
    "class" : "Physics",
    "students" : [
        { "id" : "2412", "name" : "Alice", "mentor" : 0 },
        { "id" : "2413", "name" : "Bob", "mentor" : 0 }
    ]
}
There are multiple records like this in the collection.
I have a list of updates to apply, one per record. For example, for the record above I receive an updated array of students like this -
{
    "_id" : "23343",
    "class" : "Physics",
    "students" : [
        { "id" : "2412", "name" : "Alice", "mentor" : "Mark" },
        { "id" : "2413", "name" : "Bob", "mentor" : "Jackson" }
    ]
}
What is the best way to update the record?
I am using Python. Intuitively, I can do a find() on the collection for the course, which gives me a cursor, and then iterate over it. I believe MongoDB replaces the whole document on update().
for record in courseCollection.find():
    recordId = record['_id']
    updatedList = getUpdatedStudentList(record['students'])
    updatedRecord = prepareUpdatedRecord(updatedList)
    courseCollection.update({'_id': recordId}, updatedRecord)
The PyMongo documentation site does not talk about the $set option in the update function. Unless I use that, I believe MongoDB replaces the whole document.
Also, calling update with a query by passing in the _id seems unnecessary, because I just did the query and have a handle to the record. Can I somehow use the cursor to do the update and avoid querying again?
You can use the $ operator with $elemMatch in an update. Let's start by inserting your document:
collection.insert({
    "_id": "23343",
    "class": "Physics",
    "students": [
        {"id": "2412", "name": "Alice", "mentor": 0},
        {"id": "2413", "name": "Bob", "mentor": 0}]})
Now I'll run two update statements, first adding the mentor "Mark", then "Jackson":
collection.update(
    # Query portion.
    {"_id": "23343", "students": {"$elemMatch": {"id": "2412"}}},
    # Update portion; $ is the position of the matching student.
    {"$set": {"students.$.mentor": "Mark"}})

collection.update(
    # Query portion.
    {"_id": "23343", "students": {"$elemMatch": {"id": "2413"}}},
    {"$set": {"students.$.mentor": "Jackson"}})
Each update statement affects just the "mentor" field of one subdocument in the "students" array.
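If you receive a whole updated student list at once, newer PyMongo versions also let you send the same positional updates as a single batch via bulk_write. A minimal sketch, where updated_students is a hypothetical list like [{"id": "2412", "mentor": "Mark"}, ...]:
from pymongo import UpdateOne

# One positional $set per student, sent to the server in a single round trip.
requests = [
    UpdateOne(
        {"_id": "23343", "students": {"$elemMatch": {"id": s["id"]}}},
        {"$set": {"students.$.mentor": s["mentor"]}})
    for s in updated_students]
collection.bulk_write(requests)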
I am not so sure what the question is, exactly. In a nutshell: yes, you'll have to update the 'parent' object, and yes, you can either use $set or replace the entire document (the default behavior). The difference is mostly a matter of locking, concurrency and ownership, which is a bit complex. Here's a little more detail on some of your concerns:
Updating a nested mongodb collection using a cursor
Please note that there are no "nested collections", there are only embedded documents. That's important, because the first-class citizen in MongoDB is always the document itself. For instance, find() returns documents, never subsets of embedded documents alone. You can use projections, but those are only an output transformation, so to speak.
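For example, a projection can return just the students array, but what matches is still the whole document; a sketch:
# Projection: return only the students array (and _id) of the matching course.
doc = courseCollection.find_one({'_id': '23343'}, {'students': 1})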
I can do a find() on the collection for the course. I get a cursor for that.
You get a cursor, but since you're querying on the primary key there can only be a single match (primary keys are unique), i.e. you could use findOne() and skip iterating over a one-element result.
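In PyMongo, that would be (a sketch):
# find_one returns the single matching document (or None); no cursor loop needed.
record = courseCollection.find_one({'_id': '23343'})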
E.g. Each document contains an array of students for each class.
These should usually be references to the students, i.e. there should be a separate students collection, because you don't want to lose a student just because they are temporarily not assigned to any course.
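A hypothetical referenced layout could look like this (studentCollection is an assumed name):
# Courses only hold student ids; the students themselves live in their own
# collection and survive being unassigned from every course.
courseCollection.insert_one(
    {'_id': '23343', 'class': 'Physics', 'students': ['2412', '2413']})
studentCollection.insert_one(
    {'_id': '2412', 'name': 'Alice', 'mentor': 'Mark'})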
The pymongo documentation site does not talk about the set option in the update function. Unless I use that I believe mongodb updates the whole document.
That's true. You can do a $set on the students array of a document. That avoids overwriting any other fields, such as class. On the other hand, if somebody else changed the class while your user was editing the students, do the updates still make sense? Unclear ownership is my primary concern with embedded documents.
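In PyMongo, reusing the names from your snippet, that $set looks like this (a sketch):
# Replaces only the students array; other fields such as class are untouched.
courseCollection.update_one({'_id': recordId},
                            {'$set': {'students': updatedList}})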
Also calling update with a query option by passing in the _id seems unnecessary because I just did the query and have a handle to the record
...but what exactly is a handle to the record? A handle is an immutable, unique and usually short identifier. Just like the id. The _id is the handle. I don't know Python, but I guess you could write a method that takes a pointer to a database object and performs an update, knowing that every database object must have a field called _id. But from the database's perspective, a pointer in your code is not a handle; the id is.
Is it possible to update a document based on the condition that the value of a field in that document matches a value that I provide?
I can easily do this in two steps but was wondering if there was a way to do it in a single call to MongoDB Atlas.
It is a single step execution. You don't need two steps.
Reference
And the Python implementation uses the same syntax as plain queries:
db.collection.update_one(
    {
        # filter condition goes here
    },
    {
        '$set': {
            # update fields go here
        }
    },
    upsert=False)  # options
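For instance, a hypothetical filter that only touches the document when a field currently holds the value you provide:
# Matches on _id AND on the current field value; the $set is applied only
# if both conditions hold, all in one server-side operation.
db.collection.update_one(
    {'_id': 1, 'status': 'pending'},      # 'status' is a hypothetical field
    {'$set': {'status': 'approved'}},
    upsert=False)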
Using collection.insert_one(json_dict) inserts a new document into a collection.
Can I call collection.insert_one() on an already existing object so it updates it instead?
My object will look something like:
{
    "_id": 1,
    "name": "Bob",
    "Age": "57"
}
Then under "Age" I want to add "Location": "New York". How would I do that using PyMongo?
If you want to add a new field to an existing document, you need to update it.
There is a function collection.update_one(query, new_values). The first argument is a query to match the existing document and the second argument is the update document, which contains the update operation - in your case, $set. Read more about update_one here. The final operation will look like this:
collection.update_one({"_id": 1}, {"$set": {"Location": "New York"}})
It will find the document with _id 1 and set the Location field, or update it if it already exists.
You can use the update method with upsert: true to achieve this.
Normally, update changes an existing document, but with upsert: true, if no matching document is found for the update, a new document is created.
Docs: https://docs.mongodb.com/manual/reference/method/db.collection.update/#update-upsert
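In PyMongo, using the document from the question, that could look like this (a sketch):
# upsert=True: update the document if it exists, insert it otherwise.
collection.update_one(
    {'_id': 1},
    {'$set': {'name': 'Bob', 'Age': '57', 'Location': 'New York'}},
    upsert=True)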
I have a large JSON file with a collection of structured location data (coordinates, category, etc.) and I need to create key-value pairs in Redis for further querying. I have been able to find bits of information here and there, but I am struggling to put it all together into something usable.
This is the general structure of the json;
{
    "_id" : NumberInt(83412),
    "contact" : {
        "GooglePlaces" : null,
        "Foursquare" : "https://foursquare.com/v/caf%C3%A9-vavin/4adcda06f964a520eb3221e3"
    },
    "name" : "Café Vavin",
    "location" : {
        "city" : "Paris",
        "coord" : {
            "coordinates" : [
                2.3307001590729,
                48.843868593865
            ],
            "type" : "Point"
        },
        "address" : "18 rue Vavin"
    }
}
Given that your JSON data is nested/multi-level, you cannot store it in a Redis hash directly. You'd have to serialise the value and store it as a string (other formats would also work, but let's use a string for simplicity). This means that when you get the value from Redis, you'd have to deserialise the JSON (stored as a string) back into an appropriate object in your code. So the key would be the _id and the value would be the rest of the JSON as a string.
Now, moving on to loading the huge JSON into Redis, it is a two-step process:
Parse your input JSON file and break it into the format used for 'Redis Mass Insertion' - https://redis.io/topics/mass-insert. You have to extract the value of _id and use it as the key, and use the rest of the JSON as the value, per the format shown in the link above.
Run Redis mass insertion with the contents from the previous step as the input. Each entry from the previous step becomes one SET command in Redis.
Now you can run GET queries if you have the id. Remember, Redis is an in-memory store tuned for fast storage and retrieval; you cannot run complex queries on Redis unless you store/duplicate the data in different ways for different querying needs.
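If the file is small enough to parse in one go, a redis-py pipeline can achieve the same thing without generating the protocol file. A minimal sketch, assuming plain JSON input in a hypothetical data.json (shell-specific syntax like NumberInt(...) would have to be normalised first):
import json
import redis

r = redis.Redis(host='localhost', port=6379)

with open('data.json') as f:
    places = json.load(f)                 # assumes a JSON array of documents

pipe = r.pipeline(transaction=False)      # batch commands without MULTI/EXEC
for place in places:
    key = str(place.pop('_id'))           # _id becomes the Redis key
    pipe.set(key, json.dumps(place))      # the rest is serialised as the value
pipe.execute()

# Reading it back: deserialise the stored string again.
place = json.loads(r.get('83412'))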
Feel free to comment on this answer if you want me to add more details or address anything specific; there's only so much I could fit into the comments section of the question, hence consolidating everything here.
More references and samples for mass insert:
How to insert Billion of data to Redis efficiently?
https://gist.github.com/Squab/52d42652719cc28451d7
Let's say I have an aggregation pipeline that for now leads to a collection with documents built like this:
{'name': 'Paul',
 'football_position': 'Keeper',
 'basketball_position': 4, ...}
Obviously not everyone plays every sport, so some documents will be missing those fields. The document for such a person would then just be:
{'name': 'Louis'}
What I want to do is filter for people that play at least one sport, inside my aggregation pipeline.
I know that this is easy to check for one field with {'$match': {'football_position': {'$exists': True}}}, but I want to check if any of these fields exist.
I found an older, somewhat similar question (Check for existence of multiple fields in MongoDB document), but it checks for the existence of all fields - which, while bothersome, could be achieved by chaining multiple $match stages. Plus, maybe MongoDB now has a better way to handle this than writing a custom JavaScript function.
maybe MongoDB now has a better way to handle this
Yes, you can now utilise the aggregation operator $objectToArray (SERVER-23310) to turn keys into values. It can be used to count a 'dynamic' number of fields. Combining this operator with $addFields can be quite useful.
Both operators are available in MongoDB v3.4.4+
Using your documents above as example:
db.sports.aggregate([
    { $addFields: {
        "numFields": { $size: { $objectToArray: "$$ROOT" } }
    }},
    { $match: { numFields: { $gt: 2 } } }
])
The aggregation pipeline above first adds a field called numFields. Its value is the size of the array produced by $objectToArray, i.e. the number of fields in the document. The second stage then keeps only documents with more than two fields (more than two, because every document still carries the _id field plus name).
In PyMongo, the above aggregation pipeline would look like:
cursor = collection.aggregate([
    {"$addFields": {"numFields":
        {"$size": {"$objectToArray": "$$ROOT"}}}},
    {"$match": {"numFields": {"$gt": 2}}}
])
Having said the above, if possible for your use case, I would suggest reconsidering your data model for easier access, i.e. add a new field that keeps track of the number of sports whenever a new sport position is inserted/added.
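A hypothetical sketch of that remodel: bump a counter whenever a position is added, then match on it directly (all field names here are assumptions):
# Maintain num_sports at write time.
collection.update_one(
    {'name': 'Paul'},
    {'$set': {'hockey_position': 'Center'},   # hypothetical new sport field
     '$inc': {'num_sports': 1}})

# Filtering then becomes a plain query, no $objectToArray needed.
players = collection.find({'num_sports': {'$gte': 1}})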
Let's say I have a mongodb collection of the following layout:
{'number':1, '_id':...}
{'number':2, '_id':...}
{'number':4, '_id':...}
and so on. As demonstrated, not all the numbers currently present have to be consecutive.
I want to write code which (a) determines what is the highest value for number found in collection and then (b) inserts a new document whose value for number is 1 higher than the current largest.
So if this is the only code that operates on the collection, no particular value for number should be duplicated. The issue is that, done naively, this creates a race condition where two threads of this code running in parallel might find the same highest value and then insert the same next highest number twice.
So how would I do this atomically? I'm working in Python, so I would prefer a solution in that language, but I will accept an answer that explains the concept in a way that can be adapted to any language.
MongoEngine does what you're looking for in its SequenceField.
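A minimal MongoEngine sketch (database and class names are assumptions):
from mongoengine import Document, SequenceField, connect

connect('mydb')

class Ticket(Document):
    number = SequenceField()  # auto-increments via an internal counter collection

Ticket().save()  # gets number 1
Ticket().save()  # gets number 2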
Create a new collection called indexes. This collection will look like this:
[
{ '_id': 'mydata.number', 'next': 5 }
]
Whenever you'd like to get and set the next index, you simply use the following statement:
counter = collection.find_and_modify(
    query={'_id': 'mydata.number'},
    update={'$inc': {'next': 1}},
    new=True,
    upsert=True)
This finds and updates the sequence atomically in MongoDB and returns the next number. If the sequence doesn't exist yet, it is created.
Thus, whenever you want to insert a new value into your collection, call the code above. If you want to maintain multiple indexes across different collections and their fields, simply change 'mydata.number' to another string referencing your "index."
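Note that find_and_modify was removed in modern PyMongo; the equivalent today is find_one_and_update. A sketch combining the counter with the insert (database and collection names are assumptions):
from pymongo import MongoClient, ReturnDocument

db = MongoClient()['mydb']

def insert_with_next_number(doc):
    counter = db.indexes.find_one_and_update(
        {'_id': 'mydata.number'},
        {'$inc': {'next': 1}},
        upsert=True,
        return_document=ReturnDocument.AFTER)  # return the incremented value
    doc['number'] = counter['next']
    db.mydata.insert_one(doc)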
There is no clean transactional way to do this in MongoDB. This is why the ObjectId datatype exists: http://api.mongodb.org/python/current/api/bson/objectid.html
Or you can generate a unique key in Python using something like UUID: https://docs.python.org/2/library/uuid.html
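A quick sketch of both alternatives ('payload' is a placeholder field):
import uuid
from bson.objectid import ObjectId

# ObjectIds are generated client-side and are unique without coordination.
collection.insert_one({'_id': ObjectId(), 'payload': 'a'})
# A UUID string works the same way.
collection.insert_one({'_id': str(uuid.uuid4()), 'payload': 'b'})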