I am having a little problem updating a MongoDB document (using pymongo). I have found several answers for similar questions that didn't work out for me.
Background: I am crawling some websites and saving information to a MongoDB.
Assume I got the following document from a web page and stored it in a MongoDB collection:
original_doc = {
    'id': some_id,
    'data': {
        'key1': value1,
        'key2': value2
    }
}
After some time, I may want to crawl the same page again and get the following document:
new_doc = {
    'id': some_id,
    'data': {
        'key2': new_value2,
        'new_key3': new_value3
    }
}
Now I want to update the already existing MongoDB document in the collection so it looks like this:
updated_doc = {
    'id': some_id,
    'data': {
        'key1': value1,
        'key2': new_value2,
        'new_key3': new_value3
    }
}
So basically the old document should be overwritten with the new document, but without losing data from the original document that does not exist in the new document.
I first thought I could use $set to update the document, but then the (key1, value1) entry gets lost. I also do not know the keys of the new entries in advance, as I am not in control of the data returned by the website, so I can't just hard-code {'$set': {'data.new_key3': new_value3}} either.
Is there a solution for this?
Use a selector that identifies the document (for example its _id) and $set each changed field with dot notation, so that the other fields under data are left untouched. In the mongo shell the query looks like this:
db.collection.update(
    { "_id": ObjectId("55c789499dd5f5f78633da59") },  // the _id of the document to match
    { $set: { "data.key2": "new_value2", "data.new_key3": "new_value3" } }
)
This query updates the existing document with the new data. Its _id stays the same as before.
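Because the keys under data are not known in advance, the $set document can be built from the crawled document itself. The following is a minimal sketch, assuming the documents are matched on the question's id field, that the values under data are not nested further, and that collection is a pymongo Collection. The names build_merge_update and merge_crawl_result are hypothetical helpers, not part of pymongo.

```python
def build_merge_update(new_doc):
    """Build a $set update that touches only the keys present in new_doc['data'].

    Dot notation ("data.key2") makes MongoDB update individual sub-fields
    instead of replacing the whole embedded "data" document.
    """
    fields = {f"data.{key}": value for key, value in new_doc["data"].items()}
    return {"$set": fields}


def merge_crawl_result(collection, new_doc):
    """Merge new_doc into the stored document that has the same 'id'.

    `collection` is assumed to be a pymongo Collection; upsert=True inserts
    the document if the page has not been crawled before.
    """
    return collection.update_one(
        {"id": new_doc["id"]},
        build_merge_update(new_doc),
        upsert=True,
    )
```

Keys that contain a dot or start with a dollar sign would need special handling before being used this way.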
I'm working on a REST application in Python with Flask and the pymongo driver, but anyone who knows MongoDB well may be able to answer my question.
Suppose I'm inserting a new document into a collection, say students. I want to get the whole inserted document as soon as it is saved in the collection. Here is what I've tried so far:
res = db.students.insert_one({
    "name": args["name"],
    "surname": args["surname"],
    "student_number": args["student_number"],
    "course": args["course"],
    "mark": args["mark"]
})
If I call:
print(res.inserted_id)  # I get the id
How can I get something like:
{
    "name": "student1",
    "surname": "surname1",
    "mark": 78,
    "course": "ML",
    "student_number": 2
}
from the res object? If I print res, I get <pymongo.results.InsertOneResult object at 0x00000203F96DCA80>.
Put the data to be inserted into a dictionary variable; on insert, the variable will have the _id added by pymongo.
from pymongo import MongoClient
db = MongoClient()['mydatabase']
doc = {
    "name": "name"
}
db.students.insert_one(doc)
print(doc)
prints:
{'name': 'name', '_id': ObjectId('60ce419c205a661d9f80ba23')}
Unfortunately, the commenters are correct. The PyMongo pattern doesn't specifically allow for what you are asking. You are expected to use the inserted_id from the result and, if you need the full object from the collection later, run a regular query afterwards.
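The follow-up query mentioned above can be wrapped in a small helper. This is only a sketch, assuming collection is a pymongo Collection; insert_and_fetch is a hypothetical name.

```python
def insert_and_fetch(collection, doc):
    """Insert doc, then read the stored document back by its new _id."""
    result = collection.insert_one(doc)
    # inserted_id is the _id of the document that was just inserted.
    return collection.find_one({"_id": result.inserted_id})
```

This costs an extra round trip to the server. When the inserted dictionary is still in scope, printing it as in the previous answer avoids the second query.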
I want to update every field of my MongoDB collection using the field's own value to do so.
Example: if I have this document: "string": "foo", a possible update would do this: "string": $string.lower(). Here, $string would be "foo", but I don't know how to do this with PyMongo.
I've tried this:
user_collection.update_many({}, { "$set": { "word": my_func("$word")}})
This replaces everything with "$word".
I've been able to do it by iterating over each document, but that takes too long.
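As far as I know, you can't apply a Python function inside a single update statement, because the update runs on the server. You can either express the change in the MongoDB query language, passing the update as an aggregation pipeline (a list, supported since MongoDB 4.2) so that "$name" refers to the field's current value:
user_collection.update_many({}, [{"$set": {"name": {"$concat": ["$name", "_2"]}}}])
or read each document and update it from Python:
for obj in user_collection.find({}):  # put your query filter here
    user_collection.update_one({"_id": obj["_id"]}, {"$set": {"name": my_func(obj["name"])}})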
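For the lowercase case from the question, the server-side route can be sketched as follows. This assumes MongoDB 4.2 or later, which accepts an aggregation pipeline as the update, and uses the $toLower operator. lowercase_pipeline is a hypothetical helper name.

```python
def lowercase_pipeline(field):
    """Return an update pipeline that lowercases `field` in place.

    The update is a list (a pipeline), so "$<field>" is evaluated for each
    document on the server instead of being stored as a literal string.
    """
    return [{"$set": {field: {"$toLower": f"${field}"}}}]


# Usage, assuming user_collection is a pymongo Collection:
# user_collection.update_many({}, lowercase_pipeline("word"))
```

Transformations that have no server-side operator still need the per-document loop.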
I have a MongoDB collection with various documents in it. Every N seconds my Python script retrieves some data from an API, and I want to replace each document in the collection with its updated version, so the entire collection has to be updated.
result = db.main_tst.insert_one(dic)
This is how I insert the data. Now, instead of inserting dic, I should update it. How can I do that with Python in MongoDB? I know there is the update_many() method, but I've only found how to update a certain document rather than the entire collection.
It should be simple.
For example, the following updates every document whose name field is 'N/A', setting it to "No name":
filterQuery = { 'name': 'N/A'}
updateQuery = { "$set": { "name": "No name" } }
result = mycol.update_many(filterQuery, updateQuery)
Since you need to update all documents in the collection, pass an empty filter {} instead, which matches every document:
filterQuery = {}
updateQuery = { "$set": { "name": "No name" } }
result = mycol.update_many(filterQuery, updateQuery)
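If each record from the API should overwrite its stored counterpart, rather than set one shared value on all documents, one option is a replace_one call per record with upsert=True. This is a sketch under assumptions: every record carries a unique id field (a hypothetical key, since the question doesn't name one), and collection is a pymongo Collection. sync_records is a hypothetical helper name.

```python
def sync_records(collection, records, key="id"):
    """Replace each stored document with its fresh version from `records`.

    Documents are matched on `key`; upsert=True inserts records that are
    not in the collection yet. Returns the number of records processed.
    """
    count = 0
    for record in records:
        collection.replace_one({key: record[key]}, record, upsert=True)
        count += 1
    return count
```

For large collections, sending the same operations in batches with bulk_write would reduce the number of round trips.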
I'm attempting to create a web service using MongoDB and Flask (using the pymongo driver). A query to the database returns documents with the "_id" field included, of course. I don't want to send this to the client, so how do I remove it?
Here's a Flask route:
@app.route('/theobjects')
def index():
    objects = db.collection.find()
    return str(json.dumps({'results': list(objects)},
                          default=json_util.default,
                          indent=4))
This returns:
{
    "results": [
        {
            "whatever": {
                "field1": "value",
                "field2": "value"
            },
            "whatever2": {
                "field3": "value"
            },
            ...
            "_id": {
                "$oid": "..."
            },
            ...
        }
    ]
}
I thought it was a dictionary and I could just delete the element before returning it:
del objects['_id']
But that returns a TypeError:
TypeError: 'Cursor' object does not support item deletion
So it isn't a dictionary, but something I have to iterate over with each result as a dictionary. So I try to do that with this code:
for object in objects:
    del object['_id']
Each object dictionary looks the way I'd like it to now, but the objects cursor is empty. So I try to create a new dictionary and after deleting _id from each, add to a new dictionary that Flask will return:
new_object = {}
for object in objects:
    for key, item in objects.items():
        if key == '_id':
            del object['_id']
    new_object.update(object)
This just returns a dictionary with the first-level keys and nothing else.
So this is sort of a standard nested dictionaries problem, but I'm also shocked that MongoDB doesn't have a way to easily deal with this.
The MongoDB documentation explains that you can exclude _id with
{ _id : 0 }
But that does nothing with pymongo. The PyMongo documentation explains that you can list the fields you want returned, but "(“_id” will always be included)". Seriously? Is there no way around this? Is there something simple and stupid that I'm overlooking here?
To exclude the _id field in a find query in pymongo, you can use:
db.collection.find({}, {'_id': False})
The documentation is somewhat misleading on this, as it says the _id field is always included, but you can exclude it as shown above.
The above answer returns every other field. If you want only specific fields and still want to exclude _id, list them in the projection, which is the second argument to find:
db.collection.find({}, {'required_column_A': 1, 'required_col_B': 1, '_id': False})
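Both projection variants above can be produced by one small helper. This is a sketch; projection_without_id is a hypothetical name, and the field names in the usage comment are taken from the question's sample output.

```python
def projection_without_id(fields=()):
    """Build a find() projection that returns only `fields` and never _id.

    With no fields given, the projection excludes _id and keeps the rest.
    """
    projection = {name: 1 for name in fields}
    projection["_id"] = 0
    return projection


# Usage, assuming db is a pymongo Database:
# list(db.collection.find({}, projection_without_id(["whatever", "whatever2"])))
```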
You are calling
del objects['_id']
on the cursor object!
The cursor object is an iterable over the result set, not a single
document that you can manipulate.
for obj in objects:
    del obj['_id']
is likely what you want.
So your claim is completely wrong as the following code shows:
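import pymongo

client = pymongo.MongoClient()  # replaces pymongo.Connection, removed in PyMongo 3
db = client['mydb']
db.foo.delete_many({})  # start from an empty collection
db.foo.insert_one({'foo': 42})
for row in db.foo.find():
    del row['_id']
    print(row)
$ bin/python foo.py
> {'foo': 42}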
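The empty cursor described in the question has a simple cause: a cursor is consumed by the first loop over it, so nothing is left when it is passed to json.dumps afterwards. A sketch that collects the results into a list while dropping _id follows. strip_ids is a hypothetical helper and works on any iterable of dictionaries, including a cursor.

```python
def strip_ids(documents):
    """Return a list of copies of `documents` with the '_id' key removed.

    Nested values are kept as they are; only the top-level '_id' is dropped.
    """
    return [
        {key: value for key, value in doc.items() if key != "_id"}
        for doc in documents
    ]


# Usage in the question's route, assuming db is a pymongo Database:
# json.dumps({'results': strip_ids(db.collection.find())}, indent=4)
```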
I am working with couchdb-python ( http://code.google.com/p/couchdb-python/ ) and I was wondering whether there is a way to retrieve the full list of revisions that have occurred at the document level.
Suppose I have a database named "movies" that contains several documents.
Each of my documents has more than 3 revisions.
Can I retrieve my documents based on their revisions?
If yes, how? I didn't see any obvious way to do it using couchdb-python.
I am not sure about couchdb-python, however you can get the entire known revision history of a document via the HTTP API.
Learn all about it in the CouchDB Document API documentation.
A normal query:
$ curl jhs.couchone.com/db/doc
{ _id: 'doc',
  _rev: '3-825cb35de44c433bfb2df415563a19de' }
Add ?revs=true to see an array of old revisions.
$ curl jhs.couchone.com/db/doc?revs=true
{ _id: 'doc',
  _rev: '3-825cb35de44c433bfb2df415563a19de',
  _revisions:
   { start: 3,
     ids:
      [ '825cb35de44c433bfb2df415563a19de',
        '7051cbe5c8faecd085a3fa619e6e6337',
        '967a00dff5e02add41819138abb3284d' ] } }
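The _revisions object lists only the hash part of each revision. The full revision ids, as they appear in _rev, can be rebuilt by counting down from start. A minimal sketch, assuming the response has already been parsed into a Python dictionary; full_revision_ids is a hypothetical helper name.

```python
def full_revision_ids(revisions):
    """Expand a CouchDB `_revisions` object into full "N-hash" revision ids.

    `start` is the number of the newest revision; `ids` lists the hashes
    from newest to oldest, so the number decreases by one per entry.
    """
    start = revisions["start"]
    return [
        f"{start - offset}-{rev_hash}"
        for offset, rev_hash in enumerate(revisions["ids"])
    ]
```

Each resulting id can then be requested with the rev query parameter, provided that revision is still available.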
Also you can add ?revs_info=true for more details about the revisions, such as whether they are still available (i.e. they were added after the last compaction and you can fetch them).
$ curl jhs.couchone.com/db/doc?revs_info=true
{ _id: 'doc',
  _rev: '3-825cb35de44c433bfb2df415563a19de',
  _revs_info:
   [ { rev: '3-825cb35de44c433bfb2df415563a19de',
       status: 'available' },
     { rev: '2-7051cbe5c8faecd085a3fa619e6e6337',
       status: 'available' },
     { rev: '1-967a00dff5e02add41819138abb3284d',
       status: 'available' } ] }
The Database.revisions method may be what you want, http://code.google.com/p/couchdb-python/source/browse/couchdb/client.py#545.