To find the _id of document other than during insertion

I have created several documents and inserted them into MongoDB. I am using Python for this. Is there a way I can get the _id of a particular record?
I know that we get the _id during insertion. But if I need it at a later point, is there a way to get it, say by using the find() command?

You can use a projection to return only a particular field from the document, like this:
db.collection.find({query}, {_id: 1})
This returns only the _id of each matching document.
http://docs.mongodb.org/manual/reference/method/db.collection.find/

In Python, you can use the find_one() method to get the document and access its _id field as follows:
def get_id(value):
    # Fetch only the _id of the first document whose 'field' equals value.
    document = client.db.collection.find_one({'field': value}, {'_id': True})
    # find_one() returns None when nothing matches.
    return document['_id'] if document else None
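If more than one document can match, the same projection works with find(). A minimal sketch, assuming `collection` is a PyMongo collection object (the function name get_ids is made up for illustration):

```python
def get_ids(collection, query):
    """Return the _id of every document matching query."""
    # The projection {'_id': 1} asks the server to send back only the _id field.
    cursor = collection.find(query, {'_id': 1})
    return [document['_id'] for document in cursor]
```

For example, get_ids(client.db.collection, {'field': value}) returns a list, which is empty when nothing matches.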

When you insert the record, you can specify the _id value yourself. This has added benefits when you're using replica sets as well.
from bson.objectid import ObjectId
client.db.collection.insert_one({'_id': ObjectId(), 'key1': value....})
You could store those ids in a list and use them later on if you need the _id immediately after the insert.

Related

Pymongo BulkWriteResult doesn't contain upserted_ids

OK, so currently I'm trying to upsert something into a local MongoDB using pymongo. (I check whether the document is in the db; if it is, I update it, otherwise I just insert it.)
I'm using bulk_write to do that, and everything is working OK. The data is inserted/updated.
However, I need the ids of the newly inserted/updated documents, but "upserted_ids" in the BulkWriteResult object is empty, even though it states that it inserted 14 documents.
I've added this screenshot with the variable. Is it a bug, or is there something I'm not aware of?
Finally, is there a way of getting the ids of the documents without actually searching for them in the db? (If possible, I would prefer to use bulk_write.)
Thank you for your time.
EDIT:
As suggested, I added part of the code so it's easier to get the general idea:
for name in input_list:
    if name not in stored_names:  # completely new entry (both name and package)
        operations.append(InsertOne({"name": name, "package": [package_name]}))
if len(operations) == 0:
    print("## No new permissions to insert")
    return
bulkWriteResult = _db_insert_bulk(collection_name, operations)
and the insert function:
def _db_insert_bulk(collection_name, operations_list):
    return db[collection_name].bulk_write(operations_list)

The upserted_ids field in the pymongo BulkWriteResult only contains the ids of the records that have been inserted as part of an upsert operation, e.g. an UpdateOne or ReplaceOne with the upsert=True parameter set.
As you are performing InsertOne, which doesn't have an upsert option, upserted_ids will be empty.
The lack of an inserted_ids field in pymongo's BulkWriteResult is an omission in the driver; technically it conforms to the CRUD specification mentioned in D. SM's answer, as the field is annotated with "Drivers may choose to not provide this property.".
But ... there is an answer. If you are only doing inserts as part of your bulk update (and not mixed bulk operations), just use insert_many(). It is just as efficient as a bulk write and, crucially, does provide the inserted_ids value in the InsertManyResult object.
from pymongo import MongoClient
db = MongoClient()['mydatabase']
inserts = [{'foo': 'bar'}]
result = db.test.insert_many(inserts, ordered=False)
print(result.inserted_ids)
Prints:
[ObjectId('5fb92cafbe8be8a43bd1bde0')]
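If the bulk write mixes inserts with other operations, so that insert_many() is not an option, one workaround is to assign the _id on the client before building the InsertOne operations; the documents are then stored with the ids you chose. A minimal sketch, assuming string ids from uuid are acceptable for the collection (the helper name assign_ids and the sample documents are hypothetical; bson's ObjectId() could be used in the same way):

```python
import uuid


def assign_ids(documents):
    """Give each document a client-side _id and return the ids in order."""
    ids = []
    for document in documents:
        # Keep an _id the caller already set; otherwise generate a unique string.
        document.setdefault('_id', uuid.uuid4().hex)
        ids.append(document['_id'])
    return ids


documents = [
    {'name': 'perm_a', 'package': ['pkg']},
    {'name': 'perm_b', 'package': ['pkg']},
]
new_ids = assign_ids(documents)
# Next step (not run here): wrap each document in InsertOne(document) and
# pass the list to bulk_write(); new_ids already holds the _id values.
```

This avoids a second round trip to look the documents up after the write.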

This functionality is part of the CRUD specification and should be implemented by compliant drivers, including pymongo. Refer to the pymongo documentation for correct usage.
Example in Ruby:
irb(main):003:0> c.bulk_write([insert_one:{a:1}])
=> #<Mongo::BulkWrite::Result:0x00005579c42d7dd0 #results={"n_inserted"=>1, "n"=>1, "inserted_ids"=>[BSON::ObjectId('5fb7e4b12c97a60f255eb590')]}>
Your output shows that zero documents were upserted, therefore there wouldn't be any ids associated with the upserted documents.
Your code doesn't appear to show any upserts at all, which again means you won't see any upserted ids.

Can't delete mongodb document using pymongo

I'm trying to delete one specific document by its _id with pymongo and I can't do it. Any ideas?
Thanks.
I have this code:
s = "ISODate('{0}')".format(nom_fitxer_clean)
#i generate the next string.. (ISODate('2018-11-07 00:00:00'))
myquery = { "_id": s }
#query string ({'_id': "ISODate('2018-10-07 00:00:00')"})
mycol.delete_one(myquery)
I do not get any errors, but the document is not deleted.
UPDATE:
Document

I think one possible solution could be to replace ISODate with ObjectId in your query string.
Moreover, delete_one deletes only the first document that matches your query. Is it possible that multiple documents match your query?
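Another thing to check: the query in the question passes the string "ISODate('...')" as the _id, and a string never equals a date value. If the _id is stored as a BSON date (an assumption, since the document screenshot is not available), PyMongo expects a datetime.datetime in the query. A minimal sketch, with a hypothetical helper name:

```python
from datetime import datetime


def delete_by_date_id(collection, date_string):
    """Delete the document whose _id is the date written in date_string."""
    # Parse '2018-11-07 00:00:00' into a datetime; PyMongo encodes it as a BSON date.
    date_id = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S')
    result = collection.delete_one({'_id': date_id})
    # deleted_count is 1 if a document matched and 0 otherwise.
    return result.deleted_count
```

Calling delete_by_date_id(mycol, '2018-11-07 00:00:00') returns 0 when nothing matched, which makes a silent miss visible.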

How to prevent duplicates in pymongo

I am using pymongo "insert_one",
I want to prevent insertion of two documents with the same "name" attribute.
How do I generally prevent duplicates?
How do I config it for a specific attribute like name?
Thanks!
My code:
client = MongoClient('mongodb://localhost:8888/db')
db = client[<db>]
heights=db.heights
post_id= heights.insert_one({"name":"Tom","height":2}).inserted_id
try:
    post_id2 = heights.insert_one({"name":"Tom","height":3}).inserted_id
except pymongo.errors.DuplicateKeyError, e:
    print e.error_document
print post_id
print post_id2
output:
56aa7ad84f9dcee972e15fb7
56aa7ad84f9dcee972e15fb8

There is an answer for preventing the addition of duplicate documents in MongoDB in general at How to stop insertion of Duplicate documents in a mongodb collection.
The idea is to use an update with upsert=True instead of insert_one.
So when inserting, the pymongo code would be (replace_one is the current name for this call; older PyMongo versions spelled it update):
db[collection_name].replace_one(document, document, upsert=True)

You need to create an index that ensures the name is unique in that collection
e.g.
db.heights.create_index([('name', pymongo.ASCENDING)], unique=True)
Please see the official docs for further details and clarifying examples

This is your document:
doc = {"key": val}
Then, use $set with your document to update:
update = {"$set": doc}  # it is important to use $set in your update
db[collection_name].update_one(doc, update, upsert=True)
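Matching on the whole document means that a changed height would still insert a second "Tom". A minimal sketch that matches on name only, assuming name is the field that must stay unique (upsert_by_name is a hypothetical helper, and `collection` is assumed to be a PyMongo 3+ collection):

```python
def upsert_by_name(collection, document):
    """Insert document, or update the existing one with the same name."""
    result = collection.update_one(
        {'name': document['name']},  # match on the unique attribute only
        {'$set': document},          # overwrite the listed fields, keep others
        upsert=True,                 # insert when no document matches
    )
    # upserted_id is the new _id on insert, or None when a document was updated.
    return result.upserted_id
```

With the data from the question, upsert_by_name(heights, {'name': 'Tom', 'height': 3}) would update Tom's height instead of adding a second document. An upsert alone does not protect against concurrent writers, so a unique index is still worth having.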

Python-Eve: Prevent inserting duplicates without using unique fields

I am trying to prevent inserting duplicate documents by the following approach:
Get a list of all documents from the desired endpoint which will contain all the documents in JSON-format. This list is called available_docs.
Use a pre_POST_<endpoint> hook in order to handle the request before inserting to the data. I am not using the on_insert hook since I need to do this before validation.
Since we can access the request object, use request.json to get the JSON-formatted payload.
Check if request.json is already contained in available_docs.
Insert the new document only if it's not a duplicate; abort otherwise.
Using this approach I got the following snippet:
def check_duplicate(request):
    if request.json not in available_docs:
        print('Not a duplicate')
    else:
        print('Duplicate')
        flask.abort(422, description='Document is a duplicate and already in database.')
The available_docs list looks like this:
available_docs = [{'foo': ObjectId('565e12c58b724d7884cd02bb'), 'bar': [ObjectId('565e12c58b724d7884cd02b9'), ObjectId('565e12c58b724d7884cd02ba')]}]
The payload request.json looks like this:
{'foo': '565e12c58b724d7884cd02bb', 'bar': ['565e12c58b724d7884cd02b9', '565e12c58b724d7884cd02ba']}
As you can see, the only difference between the document which was passed to the API and the document already stored in the DB is the datatype of the IDs. Due to that fact, the if-statement in my above snippet evaluates to True and judges the document to be inserted not being a duplicate whereas it definitely is a duplicate.
Is there a way to check if a passed document is already in the database? I am not able to use unique fields, since only the combination of all document fields needs to be unique. There is a unique identifier (which I left out in this example), but it is not suitable for the desired comparison since it is a kind of time stamp.
I think something like casting the given IDs at the keys foo and bar to ObjectIDs would do the trick, but I do not know how to do this since I do not know where to get the datatype ObjectID from.

Your approach would be much slower than setting a unique rule for the field.
Since, from your example, you are going to compare objectids, can't you simply use those as the _id field for the collection? In Mongo (and Eve of course) that field is unique by default. Actually, you typically don't even define it. You would not need to do anything at all, as a POST of a document with an already existing id would fail right away.
If you can't go that way (maybe you need to compare a different ObjectId field and still, for some reason, you can't simply set a unique rule for the field), I would query the db for the field value instead of getting all the documents from the db and then scanning them sequentially in code. Something like db.find_one({db_field: new_document_field_value}). If that returns a document, the new document is a duplicate. Make sure db_field is indexed (which is usually also the case for fields tagged with the unique rule).
EDIT after the comments. A trivial implementation would probably be something like this:
def pre_POST_callback(resource, request):
    # retrieve mongodb collection using eve connection
    docs = app.data.driver.db['docs']
    if docs.find_one({'foo': <value>}):
        flask.abort(422, description='Document is a duplicate and already in database.')
app = Eve()
app.run()
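For the type mismatch described in the question, one option that avoids casting is to compare string forms, since str() of an ObjectId gives the same 24-character hex string that arrives in the JSON payload. A minimal sketch (stringify_ids and is_duplicate are hypothetical helpers; any value that is not a plain JSON type is converted with str()):

```python
def stringify_ids(value):
    """Recursively replace non-JSON values (such as ObjectId) by their str()."""
    if isinstance(value, dict):
        return {key: stringify_ids(item) for key, item in value.items()}
    if isinstance(value, list):
        return [stringify_ids(item) for item in value]
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    return str(value)


def is_duplicate(payload, stored_documents):
    """True if payload equals one of the stored documents after normalizing."""
    return payload in [stringify_ids(document) for document in stored_documents]
```

This still scans every stored document, so it only addresses the comparison itself; querying the database as suggested above remains the faster route. Going the other way, ObjectId can be imported from bson.objectid and called on each incoming string.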

Here's my approach on preventing duplicate records:
def on_insert_subscription(items):
    c_subscription = app.data.driver.db['subscription']
    user = decode_token()
    if user:
        for item in items:
            if c_subscription.find_one({
                'topic': ObjectId(item['topic']),
                'client': ObjectId(user['user_id'])
            }):
                abort(422, description="Client already subscribed to this topic")
            else:
                item['client'] = ObjectId(user['user_id'])
    else:
        abort(401, description='Please provide proper credentials')
What I'm doing here is creating subscriptions for clients. If a client is already subscribed to a topic I throw 422.
Note: the client ID is decoded from the JWT token.

sqlalchemy core integrity error

I'm working on parsing a file and inserting it into a database, using sqlalchemy core. I had it set up with the orm originally but that doesn't meet the speed requirements for the project.
My database has 2 tables: Objects and Attributes. The Objects table has a primary key of obj_id. The primary key for Attributes is composite: attr_name, attr_class, and obj_id, which is also a foreign key from Objects.
The attributes are stored after parsing the file in a list of dictionaries, like so:
[
    {'obj_id': obj_id, 'attr_name': name, 'attr_class': class, etc...},
    { ETC ETC ETC}]
The data is being inserted by first bulk inserting the objects, then the attributes. The object insert works perfectly. When inserting the attributes however, I get an integrity error, saying I tried to insert a duplicate primary key.
Here is my insert code for attributes:
self.engine.execute(
    Attributes.__table__.insert(),
    [{'obj_id': attr['obj_id'],
      'attr_name': attr['attr_name'],
      'attr_class': attr['attr_class'],
      'attr_type': attr['attr_type'],
      'attr_size': attr['attr_size']} for attr in attrList])
While trying to work this error out, I printed the id, name, and class of each attribute in the list to a file to find the duplicate key. Nowhere in the list is there actually an identical primary key, so this leads me to believe it is a problem with the structure of my query.
Can anyone figure this out with the info I've given, or give me somewhere to look for more information? I've already checked the documentation pretty thoroughly and couldn't find anything helpful.
Edit:
I also tried executing each insert statement separately, as suggested by someone on sqlalchemy's google group. The results were the same. The code I used:
insert = Attributes.__table__.insert()
for attr in attrList:
    stmt = insert.values({'obj_id': attr['obj_id'], ...})
    self.engine.execute(stmt)
where ... was the rest of the values.
Edit 2:
The Integrity error is thrown as soon as I try to insert an attribute with the same name/class but a different object id. So for example:
In the format name-class-id:
By iteration 4, I've got:
Attr1-Class1-0
Attr2-Class2-0
Attr3-Class3-0
Attr4-Class4-0
On the next iteration, I try to insert Attr1-Class1-1, which fails.

I found the problem, completely unrelated to the insert code. When storing the data in the list, I was storing an Object as obj_id, which sqlalchemy didn't like. By fixing that I fixed the insertions.
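A check like the following could surface both possibilities before the insert: repeated composite keys, and obj_id values that are not plain scalars. It is a sketch under the assumption that obj_id should be an int or a str; the helper names are made up.

```python
from collections import Counter

# Composite primary key of the Attributes table, as described in the question.
KEY_FIELDS = ('attr_name', 'attr_class', 'obj_id')


def duplicate_keys(rows):
    """Return the composite keys that occur more than once in rows."""
    counts = Counter(tuple(row[field] for field in KEY_FIELDS) for row in rows)
    return [key for key, count in counts.items() if count > 1]


def non_scalar_obj_ids(rows):
    """Return rows whose obj_id is not an int or str (e.g. a whole object)."""
    return [row for row in rows if not isinstance(row['obj_id'], (int, str))]
```

Running non_scalar_obj_ids(attrList) first is the cheaper test, since a non-empty result points at the data rather than at the insert statement.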
