I am using pymongo to insert documents into MongoDB.
Here is the code from my router.py file:
temp = db.admin_collection.find().sort([("_id", -1)]).limit(1)
for doc in temp:
    admin_id = str(int(doc['_id']) + 1)
    admin_doc = {
        '_id': admin_id,
        'question': ques,
        'answer': ans,
    }
    collection.insert(admin_doc)
What should I do so that on every insert the _id is incremented by 1?
It doesn't seem like a very good idea, but if you really want to go through with it, you can try a setup like the one below.
It should work well enough in a low-traffic application on a single server, but I wouldn't try anything like this in a replicated or sharded environment, or if you perform a large number of inserts.
Create a separate collection to hold the id sequences:
db.seqs.insert({
    'collection': 'admin_collection',
    'id': 0
})
Whenever you need to insert a new document, use something similar to this:
def insert_doc(doc):
    doc['_id'] = str(db.seqs.find_and_modify(
        query={'collection': 'admin_collection'},
        update={'$inc': {'id': 1}},
        fields={'id': 1, '_id': 0},
        new=True
    ).get('id'))
    try:
        db.admin_collection.insert(doc)
    except pymongo.errors.DuplicateKeyError as e:
        insert_doc(doc)
If you want to set the "_id" value manually, you can do so by changing the _id value in the returned document, in a manner similar to the one you proposed in your question. However, I do not think this approach is advisable.
curs = db.admin_collection.find().sort([("_id", -1)]).limit(1)
for document in curs:
    document['_id'] = str(int(document['_id']) + 1)
    collection.insert(document)
It is generally not a good idea to create your own _id values manually. They have to be unique, and there is no guarantee that str(int(document['_id']) + 1) will always be unique.
Instead, if you want to duplicate the document, you can delete the '_id' key and insert the document:
curs = db.admin_collection.find().sort([("_id", -1)]).limit(1)
for document in curs:
    document.pop('_id', None)
    collection.insert(document)
This inserts the document and allows mongo to generate the unique id.
Way late to this, but what about leaving the ObjectId alone and still adding a sequential id to use as a reference for fetching a particular document (or properties thereof) from a frontend API? I've been struggling to get the frontend to drop the "" around the ObjectId when fetching from the API.
I have been stuck on a problem for several hours and haven't found an answer yet. My problem is that I want to delete a key/value pair in a subdictionary.
The structure in MongoDB is the following:
{
    '_id': ObjectId,
    'title': 'title',
    'words': {
        'word1': [pos0, pos1, pos2, pos3],
        'word2': [pos0, pos1, pos2, pos3],
        ...
    }
}
When I run this query :
client = pymongo.MongoClient(f"MONGODB_CONNECTION_LINK")
db = client.database
cursor = db['templates'] #OR cursor = db['templates'][index]
query = { 'words': { f'{word}' : f'{pos}' }}
x = cursor.delete_many(query)
and print out the cursor, I get back the place where the DeleteResult object is located. Also, when I delete and print the return value, it says that something was deleted. But when I go to my database, the pair is still there.
Index is the object I want to delete from: Index[0] should be the first object in the database. But whether I try to delete with or without Index, the result is the same.
By the way, I also tried some other queries, but probably not the right one.
Thanks for the help, guys.
There were 2 problems in my code:
1. I didn't access the entries properly, per this post: MongoDB - finding entries using a nested dictionary.
2. I saved my word values with dots, i.e. I had allowed MongoDB to store keys containing dots. In that case it is not possible to access the entries correctly.
Now I just added a new regex to strip them out:
regex3 = re.compile(r'[,.]')
re.sub(regex3, '', words.decode_contents())
I think the cursor just points to the expected location of the data, whether or not that location actually exists; you don't get an error or anything like that back. The same goes for the DeleteResult.
I'm a beginner with MongoDB and pymongo, and I'm working on a project where I have a students MongoDB collection. What I want is to add a new field, specifically the address of each student, to every element in my collection (the field is initially added everywhere as null and will be filled in by me later).
However, when I try the following to add the new field, I get a syntax error:
client = MongoClient('mongodb://localhost:27017/') #connect to local mongodb
db = client['InfoSys'] #choose infosys database
students = db['Students']
students.update( { $set : {"address":1} } ) #set address field to every column (error happens here)
How can I fix this error?
You are using the update operation in the wrong manner. The update operation has the following syntax:
db.collection.update(
    <query>,
    <update>,
    <options>
)
The main parameter <query> is not mentioned at all; it has to be at least the empty document {}. In your case the following query will work:
db.students.update(
    {},                         // update all the documents
    {$set: {"address": 1}},     // update the address field
    {multi: true}               // do multiple updates; otherwise Mongo only updates the first matching document
)
So, in Python, you can use update_many to achieve this. It will be like:
students.update_many(
    {},
    {"$set": {"address": 1}}
)
You can read more about this operation here.
The previous answer here is spot on, but it looks like your question may relate more to PyMongo and how it manages updates to collections. https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html
According to the docs, it looks like you may want to use the 'update_many()' function. You will still need to make your query (all documents, in this case) as the first argument, and the second argument is the operation to perform on all records.
client = MongoClient('mongodb://localhost:27017/') #connect to local mongodb
db = client['InfoSys'] #choose infosys database
students = db['Students']
students.update_many({}, {"$set": {"address": 1}})
I solved my problem by iterating through every element in my collection and adding the address field to each one.
cursor = students.find({})
for student in cursor:
    students.update_one({'_id': student['_id']}, {'$set': {'address': '1'}})
I currently have a dictionary with data pulled from an API, where I have given each datapoint its own variable (job_id, jobtitle, company, etc.):
output = {
    'ID': job_id,
    'Title': jobtitle,
    'Employer': company,
    'Employment type': emptype,
    'Fulltime': tid,
    'Deadline': deadline,
    'Link': webpage
}
that I want to add to my database, easy enough:
db.jobs.insert_one(output)
But this is all in a for loop that will create around 30 unique new documents, with names, titles, links and whatnot, and this script will be run more than once. What I would like is for it to insert the output as a document only if it doesn't already exist in the database. All of these new documents do have their own unique IDs coming from the job_id variable; can I check against that?
You need to try two things:
1) Doing a .find() and writing to the DB only when no document is found for the given job_id takes two round trips. Instead, you can create a unique index on the job_id field, which will raise an error whenever an operation tries to insert a duplicate document. (A unique index is a much safer way to avoid duplicates, and it still protects you if your code logic fails.)
2) If you have 30 dicts, you don't need to iterate 30 times and make 30 database calls with insert_one; instead you can use insert_many, which takes a list of dicts and writes them to the database in one call.
Note: By default the dicts are written in the order they appear in the list, so if one fails because of a duplicate-key error, insert_many stops at that point without inserting the rest. To overcome this, pass the option ordered=False; that way all documents are inserted except the duplicates.
EDIT:
replace
db.jobs.insert_one(output)
with
db.jobs.replace_one({'ID': job_id}, output, upsert=True)
ORIGINAL ANSWER with worked example:
Use replace_one() with upsert=True. You can run this multiple times: it will insert if the ID isn't found, or replace the document if it is. This isn't quite what you asked for, since the data is always updated (newer data will overwrite any existing data).
from pymongo import MongoClient

db = MongoClient()['mydatabase']

for i in range(30):
    db.employer.replace_one(
        {'ID': i},
        {
            'ID': i,
            'Title': 'jobtitle',
            'Employer': 'company',
            'Employment type': 'emptype',
            'Fulltime': 'tid',
            'Deadline': 'deadline',
            'Link': 'webpage'
        },
        upsert=True)

# Should always print 30 regardless of number of times run.
print(db.employer.count_documents({}))
Let's take this simple collection col with 2 documents:
{
    "_id": ObjectId("5ca4bf475e7a8e4881ef9dd2"),
    "timestamp": 1551736800,
    "score": 10
}
{
    "_id": ObjectId("5ca4bf475e7a8e4881ef9dd3"),
    "timestamp": 1551737400,
    "score": 12
}
To access the last timestamp (the one of the second document), I first made this request:
a = db['col'].find({}).sort("_id", -1)
and then a[0]['timestamp']
But as there will be a lot of documents in this collection, I think it would be more efficient to request only the last one with the limit function, like:
a = db['col'].find({}).sort("_id", -1).limit(1)
and then
for doc in a:
    lastTimestamp = doc['timestamp']
As there will be only one document, I can declare the variable inside the loop.
So, three questions:
Do I have to worry about memory/speed issues if I keep using the first request and reading the first element from the cursor?
Is there a smarter way to access the first element of the cursor than using a loop, when using limit?
Is there another way to get that timestamp that I don't know about?
Thanks!
Python 3.6 / Pymongo 3.7
If you are using a field with a unique index in the selection criteria, you should use the find_one method, which returns the single document matching your query.
That being said, the find method returns a Cursor object and does not load the data into memory.
You might get better performance if you used a filter option; your query as it is now will do a collection scan.
If you are not using a filter and want to retrieve the last document, then the clean way is with the Python built-in next function:
cur = db["col"].find().sort("_id", -1).limit(1)
doc = next(cur, None)  # None when the collection is empty
find().sort() is fast, so don't worry about the speed; this is also the best way to access the first element of the cursor.
I am using pymongo's "insert_one",
and I want to prevent insertion of two documents with the same "name" attribute.
How do I prevent duplicates in general?
How do I configure this for a specific attribute like "name"?
Thanks!
My code:
client = MongoClient('mongodb://localhost:8888/db')
db = client[<db>]
heights = db.heights
post_id = heights.insert_one({"name": "Tom", "height": 2}).inserted_id
try:
    post_id2 = heights.insert_one({"name": "Tom", "height": 3}).inserted_id
except pymongo.errors.DuplicateKeyError as e:
    print(e.error_document)
print(post_id)
print(post_id2)
output:
56aa7ad84f9dcee972e15fb7
56aa7ad84f9dcee972e15fb8
There is an answer about preventing insertion of duplicate documents in MongoDB in general at How to stop insertion of Duplicate documents in a mongodb collection.
The idea is to use update with upsert=True instead of insert_one.
So, when inserting, the code for pymongo would be:
db[collection_name].update(document,document,upsert=True)
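Note that Collection.update was removed in PyMongo 4; a sketch of the equivalent with the current API (collection name and document are placeholders) uses replace_one:

```python
def upsert_document(db, collection_name, document):
    # Insert the document if no matching one exists; otherwise replace it.
    return db[collection_name].replace_one(document, document, upsert=True)
```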
You need to create an index that ensures the name is unique in that collection
e.g.
db.heights.create_index([('name', pymongo.ASCENDING)], unique=True)
Please see the official docs for further details and clarifying examples
This is your document
doc = {"key": val}
Then, use $set with your document to update
update = {"$set": doc} # it is important to use $set in your update
db[collection_name].update(document, update, upsert=True)