Is it possible to have a bulk operation in MongoDB (with Python) where insert and update commands are mixed? The records to be updated may themselves have been inserted earlier in the same batch.
Yes. PyMongo 2.7 added a "Bulk API", which you can read about here. PyMongo 3.0 adds an alternative API for the same thing that is very similar to what you mention in a comment on another answer. See this commit for a preview.
I'm not super clear on what you're asking, but Mongo supports "upsert", which allows inserting a record if it does not already exist:
http://docs.mongodb.org/manual/reference/method/db.collection.update/#definition
upsert Optional. If set to true, creates a new document when no
document matches the query criteria. The default value is false, which
does not insert a new document when no match is found.
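As a rough sketch of what that looks like from Python (the field names and values are made up for illustration; `update_one` with `upsert=True` is the PyMongo 3.x form):

```python
# Build the filter and update documents for an upsert.
filter_doc = {"client_id": 42}
update_doc = {
    "$set": {"name": "Alice"},                  # applied on both insert and update
    "$setOnInsert": {"created": "2015-01-01"},  # applied only when a new doc is created
}

# With a live collection you would run:
# collection.update_one(filter_doc, update_doc, upsert=True)
```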
Related
I have a table with 30k clients, with the ClientID as primary key.
I'm getting data from API calls and inserting them into the table using python.
I'd like to find a way to insert rows for new clients and, if the ClientID that comes with the API call already exists in the table, update the existing record with this client's updated information.
Thanks!!
A snippet of code would be nice to show us what exactly you are doing right now. I presume you are using an ORM like SQLAlchemy? If so, then you are looking at an UPSERT-type operation.
That is already answered HERE
Alternatively, if you are executing raw queries without an ORM, you could write a custom procedure and pass it the required parameters. HERE is a good write-up on how that is done in MSSQL under high concurrency. You could use it as a starting point and then rewrite it for PostgreSQL.
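For illustration, here is the UPSERT pattern using the stdlib sqlite3 module so the sketch runs anywhere; PostgreSQL's INSERT ... ON CONFLICT is nearly identical, while MySQL spells it INSERT ... ON DUPLICATE KEY UPDATE. The table and column names are assumptions based on the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (ClientID INTEGER PRIMARY KEY, name TEXT)")

def upsert_client(conn, client_id, name):
    # Insert a new row, or update the existing one on a ClientID collision.
    conn.execute(
        """INSERT INTO clients (ClientID, name) VALUES (?, ?)
           ON CONFLICT(ClientID) DO UPDATE SET name = excluded.name""",
        (client_id, name),
    )

upsert_client(conn, 1, "Acme")      # inserts a new row
upsert_client(conn, 1, "Acme Ltd")  # same ClientID: updates the existing row
conn.commit()
```

SQLite has supported this syntax since 3.24; with SQLAlchemy you would reach for the dialect-specific `on_conflict_do_update` / `on_duplicate_key_update` constructs instead.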
I'm using Elasticsearch in Python, and I can't figure out how to get the IDs of the documents deleted by the delete_by_query() method! By default it only returns the number of documents deleted.
There is a parameter called _source that, if set to True, should return the source of the deleted documents. But nothing changes when I set it.
Is there a good way to know which documents were deleted?
The delete by query endpoint only returns a macro summary of what happened during the task, mainly how many documents were deleted and some other details.
If you want to know the IDs of the documents that are going to be deleted, you can run a search (with _source: false) before the delete by query operation, and you'll get the expected IDs.
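One way to package that two-step pattern, sketched against the elasticsearch-py 8.x keyword-argument style (the index name and query below are illustrative):

```python
def ids_matching(es, index, query, page_size=1000):
    """Return the _id of every document matching `query` (sources omitted)."""
    resp = es.search(index=index, query=query, _source=False, size=page_size)
    return [hit["_id"] for hit in resp["hits"]["hits"]]

# With a live client you would collect the IDs, then delete:
# doomed = ids_matching(es, "my-index", {"term": {"status": "stale"}})
# es.delete_by_query(index="my-index", query={"term": {"status": "stale"}})
```

Note that a plain search caps out at 10,000 hits; for larger result sets you would page with search_after or the scroll API instead. There is also an inherent race: documents matching the query can appear or disappear between the search and the delete.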
I am working on a kind of initialization routine for a MongoDB using mongoengine.
The documents we deliver to the user are read from several JSON files and written into the database at the start of the application using the above mentioned init routine.
Some of these documents have unique keys which would raise a mongoengine.errors.NotUniqueError error if a document with a duplicate key is passed to the DB. This is not a problem at all since I am able to catch those errors using try-except.
However, some other documents are just collections of values or parameters, so there is no unique key I can check to prevent them from being inserted into the DB twice.
I thought I could read all existing documents from the desired collection like this:
docs = MyCollection.objects()
and check whether the document to be inserted is already available in docs by using:
doc = MyCollection(parameter='foo')
print(doc in docs)
This prints False even if there is already a MyCollection(parameter='foo') document in the DB.
How can I achieve a duplicate detection without using unique keys?
The membership check fails because mongoengine compares documents by primary key, and an unsaved document has no pk yet. Query for the field values directly instead and check whether anything matches:
if not MyCollection.objects(parameter='foo'):
    MyCollection(parameter='foo').save()
I have a question regarding insert queries and Python's MySQL connection. I guess that I need to commit after every insert query I make.
Is there a different way to do that? I mean a fast way, like in PHP.
Second, is the same true for update queries?
Another problem: once I commit, it seems my connection is closed. Assume I have several different insert queries and need to run each one as I prepare it. How can I achieve that in Python? I am using the MySQLdb library.
Thanks for your answers.
You don't need to commit after each insert. You can perform many operations and commit on completion.
The executemany method of the DBAPI allows you to perform many inserts/updates in a single round trip.
There is no link between committing a transaction and disconnecting from the database. See the Connection object's methods for the details of commit() and close(); a connection stays open and usable across any number of commits until you explicitly close it.
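A runnable sketch of that pattern; it uses the stdlib sqlite3 module rather than MySQLdb so it works without a server, but the DBAPI calls (executemany, commit, close) are the same, and the table and column names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

# Many inserts in one round trip -- no commit needed between them.
cur.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [("a",), ("b",), ("c",)],
)

conn.commit()  # one commit for the whole batch

# The connection is still open and usable after the commit:
cur.execute("SELECT COUNT(*) FROM items")
count = cur.fetchone()[0]

conn.close()   # disconnect explicitly, only when you are done
```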
I ran the following query on a collection in my mongodb database.
db.coll.find({field_name:{$exists:true}}).count() and it returned 2437185. The total count reported by db.coll.find({}).count() is 2437228.
Now when I run the query db.coll.find({field_name:{$exists:false}}).count(), instead of returning 43 it returns 0.
I have the following two questions :
Does the above scenario mean that the data in my collection has become corrupt?
I had posted an earlier question about this (Updating records in MongoDB through pymongo leads to deletion of most of them). The person who replied said that updating data in MongoDB could blank out the data but not delete it. What does that mean?
Thank You
I believe you're running into the issue reported at SERVER-1587. What version of MongoDB are you using? If it is less than 1.9.0, you can use the following as a work-around:
db.coll.find({field_name: {$not: {$exists: true}}}).count()
As for the other question, "blanking out" in this case means that an update can change the value of, or unset, any or all fields in a document, but cannot remove the document itself. The only way to remove a document is with remove().