MongoDB: Fetching every document from a collection - python

I'm creating a Discord bot in Python and I'm using MongoDB Atlas (NoSQL).
I have a user document which looks like this:
"user": 12345,
"created_at": 2012-12-31 01:48:24
I want to fetch every document in a collection and then take its created_at.
How can I do this? I tried db.inv.find({}), but it didn't work. I checked MongoDB's documentation, but it only covers JavaScript. How can I fetch every document in my collection?

Make sure your db is a database object, not the MongoClient itself. If your db is the client's database (client.db), your code is right.
MONGODB_URI = "mongodb://user:password@host:port/"
client = pymongo.MongoClient(MONGODB_URI)
# select the database (here the database named "db"; substitute your own database name)
db = client.db
# db.collection.find({}) returns a cursor over every document in the collection
result = db.inv.find({})

db.inv.find() will give you a cursor over all the documents; you then need to iterate over the returned documents to get the specific field.
Make sure you are connected to the right collection.
result = db.inv.find()
for entry in result:
    print(entry["created_at"])


Pymongo BulkWriteResult doesn't contain upserted_ids

Okay, so currently I'm trying to upsert something in a local MongoDB using pymongo. (I check to see if the document is in the db and if it is, update it, otherwise just insert it.)
I'm using bulk_write to do that, and everything is working OK. The data is inserted/updated.
However, I need the ids of the newly inserted/updated documents, but the upserted_ids field in the BulkWriteResult object is empty, even though it states that it inserted 14 documents.
I've added a screenshot with the variable. Is it a bug? Or is there something I'm not aware of?
Finally, is there a way of getting the ids of the documents without actually searching for them in the db? (If possible, I would prefer to use bulk_write.)
Thank you for your time.
EDIT:
As suggested, I added a part of the code so it's easier to get the general idea:
for name in input_list:
    if name not in stored_names:  # completely new entry (both name and package)
        operations.append(InsertOne({"name": name, "package": [package_name]}))
if len(operations) == 0:
    print("## No new permissions to insert")
    return
bulkWriteResult = _db_insert_bulk(collection_name, operations)
and the insert function:
def _db_insert_bulk(collection_name, operations_list):
    return db[collection_name].bulk_write(operations_list)
The upserted_ids field in the pymongo BulkWriteResult only contains the ids of the records that have been inserted as part of an upsert operation, e.g. an UpdateOne or ReplaceOne with the upsert=True parameter set.
As you are performing InsertOne which doesn't have an upsert option, the upserted_ids list will be empty.
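For contrast, here is a minimal sketch (assuming a local MongoDB and a throwaway test collection) where upserted_ids is populated, because the operation is a genuine upsert:
from pymongo import MongoClient, UpdateOne

db = MongoClient()['mydatabase']
ops = [UpdateOne({'name': 'foo'}, {'$set': {'package': ['bar']}}, upsert=True)]
result = db.test.bulk_write(ops)
# upserted_ids maps the index of each upserting operation to the newly created _id.
print(result.upserted_ids)  # e.g. {0: ObjectId('...')}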
The lack of an inserted_ids field in pymongo's BulkWriteResult is an omission in the drivers; technically it conforms to the CRUD specification mentioned in D. SM's answer, as the field is annotated with "Drivers may choose to not provide this property."
But ... there is an answer. If you are only doing inserts as part of your bulk update (and not mixed bulk operations), just use insert_many(). It is just as efficient as a bulk write and, crucially, does provide the inserted_ids value in the InsertManyResult object.
from pymongo import MongoClient
db = MongoClient()['mydatabase']
inserts = [{'foo': 'bar'}]
result = db.test.insert_many(inserts, ordered=False)
print(result.inserted_ids)
Prints:
[ObjectId('5fb92cafbe8be8a43bd1bde0')]
This functionality is part of the CRUD specification and should be implemented by compliant drivers, including pymongo. See the pymongo documentation for correct usage.
Example in Ruby:
irb(main):003:0> c.bulk_write([insert_one:{a:1}])
=> #<Mongo::BulkWrite::Result:0x00005579c42d7dd0 #results={"n_inserted"=>1, "n"=>1, "inserted_ids"=>[BSON::ObjectId('5fb7e4b12c97a60f255eb590')]}>
Your output shows that zero documents were upserted, therefore there wouldn't be any ids associated with the upserted documents.
Your code doesn't appear to show any upserts at all, which again means you won't see any upserted ids.
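If you want to keep bulk_write with InsertOne and still know the ids without querying afterwards, one workaround (a sketch, not part of the answers above) is to generate the ObjectIds client-side before inserting:
from bson import ObjectId
from pymongo import InsertOne

operations, new_ids = [], []
for name in ["example_name"]:  # hypothetical input
    _id = ObjectId()           # generated locally, so it is known up front
    new_ids.append(_id)
    operations.append(InsertOne({"_id": _id, "name": name, "package": []}))
# after bulk_write(operations), new_ids holds the _ids of the inserted documents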

insert new field in mongodb database

I'm a beginner in MongoDB and pymongo, and I'm working on a project where I have a students MongoDB collection. What I want is to add a new field, specifically an address of a student, to each element in my collection (the field is obviously added everywhere as null and will be filled by me later).
However, when I try using this specific example to add a new field, I get the following syntax error:
client = MongoClient('mongodb://localhost:27017/') #connect to local mongodb
db = client['InfoSys'] #choose infosys database
students = db['Students']
students.update( { $set : {"address":1} } ) #set address field to every column (error happens here)
How can I fix this error?
You are using the update operation incorrectly. The update operation has the following syntax:
db.collection.update(
    <query>,
    <update>,
    <options>
)
The required <query> parameter is not mentioned at all. It has to be at least an empty document {}. In your case the following query will work:
db.students.update(
    {},                         // match all the documents
    {$set: {"address": 1}},     // update the address field
    {multi: true}               // update every match; otherwise Mongo only updates the first matching document
)
So, in Python, you can use update_many to achieve this. It will look like:
students.update_many(
    {},
    {"$set": {"address": 1}}
)
You can read more about this operation here.
The previous answer here is spot on, but it looks like your question may relate more to PyMongo and how it manages updates to collections. https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html
According to the docs, it looks like you may want to use the 'update_many()' function. You will still need to make your query (all documents, in this case) as the first argument, and the second argument is the operation to perform on all records.
client = MongoClient('mongodb://localhost:27017/')  # connect to local mongodb
db = client['InfoSys']  # choose the InfoSys database
students = db['Students']
students.update_many({}, {"$set": {"address": 1}})
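As a quick sanity check (not part of the original answer), you can count how many documents now carry the new field:
print(students.count_documents({"address": {"$exists": True}}))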
I solved my problem by iterating through every element in my collection and adding the address field to each one.
cursor = students.find({})
for student in cursor:
    students.update_one(student, {'$set': {'address': '1'}})

Python-Eve: Prevent inserting duplicates without using unique fields

I am trying to prevent inserting duplicate documents with the following approach:
Get a list of all documents from the desired endpoint, which will contain all the documents in JSON format. This list is called available_docs.
Use a pre_POST_<endpoint> hook in order to handle the request before inserting the data. I am not using the on_insert hook since I need to do this before validation.
Since we can access the request object, use request.json to get the payload in JSON format.
Check if request.json is already contained in available_docs.
Insert the new document only if it's not a duplicate; abort otherwise.
Using this approach I got the following snippet:
def check_duplicate(request):
    if request.json not in available_docs:
        print('Not a duplicate')
    else:
        print('Duplicate')
        flask.abort(422, description='Document is a duplicate and already in database.')
The available_docs list looks like this:
available_docs = [{'foo': ObjectId('565e12c58b724d7884cd02bb'), 'bar': [ObjectId('565e12c58b724d7884cd02b9'), ObjectId('565e12c58b724d7884cd02ba')]}]
The payload request.json looks like this:
{'foo': '565e12c58b724d7884cd02bb', 'bar': ['565e12c58b724d7884cd02b9', '565e12c58b724d7884cd02ba']}
As you can see, the only difference between the document that was passed to the API and the document already stored in the DB is the datatype of the IDs. Because of that, the if-statement in my snippet above evaluates to True and judges the document to be inserted as not a duplicate, whereas it definitely is a duplicate.
Is there a way to check if a passed document is already in the database? I am not able to use unique fields since the combination of all document fields needs to be unique only. There is an unique identifier (which I left out in this example), but this is not suitable for the desired comparison since it is kind of a time stamp.
I think something like casting the given IDs at the keys foo and bar as ObjectIds would do the trick, but I do not know how to do this since I do not know where to get the datatype ObjectId from.
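For reference, ObjectId lives in the bson package that ships with pymongo; a minimal sketch of the cast described above, using the payload values from the question:
from bson import ObjectId

payload = {'foo': '565e12c58b724d7884cd02bb',
           'bar': ['565e12c58b724d7884cd02b9', '565e12c58b724d7884cd02ba']}
# Convert the string ids back to ObjectId so they compare equal to the stored documents.
normalized = {'foo': ObjectId(payload['foo']),
              'bar': [ObjectId(x) for x in payload['bar']]}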
Your approach would be much slower than setting a unique rule for the field.
Since, from your example, you are going to compare ObjectIds, can't you simply use those as the _id field for the collection? In Mongo (and Eve, of course) that field is unique by default. Actually, you typically don't even define it. You would not need to do anything at all, as a POST of a document with an already existing id would fail right away.
If you can't go that way (maybe you need to compare a different ObjectId field and still, for some reason, you can't simply set a unique rule for it), I would look at querying the db for the field value rather than getting all the documents from the db and scanning them sequentially in code. Something like db.find({db_field: new_document_field_value}). If that returns a document, the new one is a duplicate. Make sure db_field is indexed (which usually holds true for fields tagged with the unique rule).
EDIT after the comments. A trivial implementation would probably be something like this:
def pre_POST_callback(resource, request):
    # retrieve the mongodb collection using the eve connection
    docs = app.data.driver.db['docs']
    if docs.find_one({'foo': <value>}):
        flask.abort(422, description='Document is a duplicate and already in database.')

app = Eve()
app.run()
Here's my approach to preventing duplicate records:
def on_insert_subscription(items):
    c_subscription = app.data.driver.db['subscription']
    user = decode_token()
    if user:
        for item in items:
            if c_subscription.find_one({
                'topic': ObjectId(item['topic']),
                'client': ObjectId(user['user_id'])
            }):
                abort(422, description="Client already subscribed to this topic")
            else:
                item['client'] = ObjectId(user['user_id'])
    else:
        abort(401, description='Please provide proper credentials')
What I'm doing here is creating subscriptions for clients. If a client is already subscribed to a topic I throw 422.
Note: the client ID is decoded from the JWT token.

flask python update operation

I update a collection in MongoDB, but it can't find any matches. This is my code.
collection = MongoClient()["blog"]["users"]
client = MongoClient()
db = client.blog
result = db.test.update_many({"_id": '12345'}, {"$set": {"email": "dmitry"}})
print(result.matched_count)
You are trying to update the _id field, which is immutable. You will need to create a new entry and delete the old one: store the document in a variable, save it with the new _id, and then remove the old document.
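A sketch of the copy-and-delete approach described above, assuming the same test collection as the question (the replacement _id is hypothetical):
doc = db.test.find_one({"_id": "12345"})
if doc is not None:
    doc["_id"] = "67890"                   # hypothetical new _id
    db.test.insert_one(doc)                # save the copy under the new _id
    db.test.delete_one({"_id": "12345"})   # remove the original document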

PYMONGO - How do I use the query $in operator with MongoIDs?

So I am trying to use the $in operator in Pymongo where I want to search with a bunch of MongoIDs.
First I have this query to find an array of MongoIDs:
findUsers = db.users.find_one({'_id':user_id},{'_id':0, 'f':1})
If I print the findUsers['f'] it looks like this:
[ObjectId('53b2dc0b24c4310292e6def5'), ObjectId('53b6dbb654a7820416a12767')]
These ObjectIds are user ids, and what I want to do is find all users in the users collection that match this array of ObjectIds. So my thought was this:
foundUsers = db.users.find({'_id':{'$in':findUsers['f']}})
However when I print the foundUsers the outcome is this:
<pymongo.cursor.Cursor object at 0x10d972c50>
which is not what I normally get when I print a query out :(
What am I doing wrong here?
Many thanks.
Also, just for your reference, I have queried in the mongo shell and it works as expected:
db.users.find({_id: {$in:[ObjectId('53b2dc0b24c4310292e6def5'), ObjectId('53b6dbb654a7820416a12767')]}})
You are encountering the difference between findOne() and find() in MongoDB. findOne returns a single document. find() returns a MongoDB cursor. Normally you have to iterate over the cursor to show the results. The reason your code works in the mongo shell is that the shell automatically iterates a cursor for you (printing up to the first 20 documents) when the result is not assigned to a variable:
Cursors
In the mongo shell, the primary method for the read operation is the db.collection.find() method. This method queries a collection and returns a cursor to the returning documents.
To access the documents, you need to iterate the cursor. However, in the mongo shell, if the returned cursor is not assigned to a variable using the var keyword, then the cursor is automatically iterated up to 20 times [1] to print up to the first 20 documents in the results.
http://docs.mongodb.org/manual/core/cursors/
The pymongo manual page on iterating over cursors would probably be a good place to start:
http://api.mongodb.org/python/current/api/pymongo/cursor.html
but here's a piece of code that should illustrate the basics for you. After your call to find() run this:
for doc in foundUsers:
    print(doc)
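If you would rather work with a plain list than a cursor, you can also materialise the results; a small variation on the same query:
found_users = list(db.users.find({'_id': {'$in': findUsers['f']}}))
print(found_users)  # a regular Python list of documents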
