Use case: Right now I am using the Python Elasticsearch API for bulk inserts.
I have a try/except wrapped around my bulk insert code, but whenever an exception occurs, Elasticsearch doesn't roll back the documents that were already inserted.
It also doesn't try to insert the remaining documents beyond the corrupted document that throws the exception.
There is no such thing; a bulk request contains a list of operations to insert, update, or delete documents.
Elasticsearch is not a database in the transactional sense (no ACID, no transactions, etc.), so you must build a rollback feature yourself.
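A minimal sketch of one way to handle this, assuming the elasticsearch-py helpers module (the index name and documents below are made up): pass raise_on_error=False so every action is attempted, collect the per-item errors, and delete the already-indexed documents yourself if you need rollback-like behaviour.

    from elasticsearch import Elasticsearch, NotFoundError, helpers

    es = Elasticsearch()

    # Hypothetical index name and documents, purely for illustration.
    actions = [
        {"_index": "my-index", "_id": str(i), "_source": {"value": i}}
        for i in range(100)
    ]

    # raise_on_error=False attempts every action and returns the failures
    # instead of raising on the first bad document.
    ok_count, errors = helpers.bulk(es, actions, raise_on_error=False)

    if errors:
        # Elasticsearch has no transactions, so "rollback" here means deleting
        # whatever did get indexed (best effort, not atomic).
        for action in actions:
            try:
                es.delete(index=action["_index"], id=action["_id"])
            except NotFoundError:
                pass  # this one was never indexed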
Related
I currently populate a database from a third-party API; this involves downloading files containing multiple SQL INSERT/DELETE/UPDATE statements and then parsing them into SQLAlchemy ORM objects to load into my database.
These files can often contain errors, which I've tried to build some integrity checks for. The particular one I'm currently struggling with is duplicate records: basically, receiving a file to insert a record that already exists. To avoid this I put a unique index on the fields that form a composite primary key. However, this means I get an error when processing a file with an SQL statement that tries to duplicate a record and a flush or commit is subsequently issued.
I don't want to commit records to the database until all the SQL statements for a given file have been processed, so I can keep track of what's been processed. I was thinking that I could issue a flush at the end of processing every statement and then have some error handling if it fails because of a duplicate record, which would include bypassing the offending statement. However, as I understand the docs, issuing a rollback would cancel all the previous statements processed up to that point, when I only want to skip the duplicate one.
Is there an option to partially roll back in some way, or do I need to build an up-front check that queries the database to see whether executing an SQL statement would create a duplicate record?
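One way to get the "partial rollback" described above (a sketch only, assuming a configured Session named session and that records holds the ORM objects parsed from one file, both hypothetical names) is to wrap each statement in a nested transaction with Session.begin_nested(), which issues a SAVEPOINT; an IntegrityError then rolls back only that one statement while the outer transaction keeps everything else.

    from sqlalchemy.exc import IntegrityError

    for record in records:
        try:
            # begin_nested() issues a SAVEPOINT; on failure only this
            # statement is rolled back, not the whole file's work.
            with session.begin_nested():
                session.add(record)
                session.flush()
        except IntegrityError:
            # Duplicate record (unique index violation): skip it and move on.
            pass

    # Commit once the whole file has been processed.
    session.commit()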
I am trying to do a bulk insert of documents into a MongoDB collection in Python, using pymongo. This is what the code looks like:
collection_name.insert_many([logs[i] for i in range(len(logs))])
where logs is a list of dictionaries of variable length.
This works fine when there are no issues with any of the logs. However, if any one of the logs has some kind of issue and pymongo refuses to save it (say, the issue is something like the document fails to match the validation schema set for that collection), the entire bulk insert is rolled back and no documents are inserted in the database.
Is there any way I can retry the bulk insert by ignoring only the defective log?
You can ignore those types of errors by passing ordered=False as an option: collection.insert_many(logs, ordered=False). All operations are attempted before an exception is raised, which you can then catch.
See https://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.insert_many
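A short sketch of that (the database and collection names are placeholders); the rejected documents are reported in the exception's details dict:

    from pymongo import MongoClient
    from pymongo.errors import BulkWriteError

    collection = MongoClient().mydb.logs  # hypothetical database/collection
    logs = [{"level": "info"}, {"level": "error"}]  # the list of dicts from the question

    try:
        # ordered=False: every document is attempted even if some fail
        # (e.g. schema validation); failures are collected in the exception.
        collection.insert_many(logs, ordered=False)
    except BulkWriteError as exc:
        # exc.details["writeErrors"] lists the rejected documents and reasons.
        for err in exc.details.get("writeErrors", []):
            print("skipped document at index", err["index"], ":", err["errmsg"])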
Hello, I am using my second computer to gather some data and insert it into a SQL database. I have set up everything needed to read and write the database remotely, and I can insert new rows just by using normal SQL.
With pyodbc I can read tables, but when I insert new data, nothing happens. No error message, but also no new rows in the table.
I wonder if anyone has faced this issue before and knows what the solution is.
The cursor.execute() method runs the SQL statement, but the change is not persisted yet. Since this is an INSERT statement, you must call cursor.commit() (which commits on the underlying connection) for the records to actually populate your table. Likewise, for a DELETE statement you need to commit as well.
Without more context here, I can only assume that you are not committing the insert.
Notice, similarly, that when you run cursor.execute("""select * from yourTable"""), you need to run cursor.fetchall() or another fetch method to actually retrieve and view the results of your query.
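For example (a sketch; the connection string, table, and column names are placeholders):

    import pyodbc

    conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                          "SERVER=myserver;DATABASE=mydb;UID=user;PWD=secret")
    cursor = conn.cursor()

    cursor.execute("INSERT INTO readings (sensor_id, value) VALUES (?, ?)", 1, 42.5)

    # Nothing shows up in the table until the transaction is committed.
    conn.commit()

    # And for SELECTs, you still need to fetch the rows to see them.
    cursor.execute("SELECT * FROM readings")
    rows = cursor.fetchall()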
Is it possible to have a bulk operation in MongoDB (with Python) where insert and update commands are mixed? The records being updated can be ones that are inserted earlier in the same batch.
Yes. PyMongo 2.7 added a "Bulk API", which you can read about here. PyMongo 3.0 is adding an alternative API to do the same thing, very similar to what you mention in a comment on another answer. See this commit for a preview.
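With the PyMongo 3.x API this looks roughly like the following (collection and field names are made up); inserts and updates are mixed in one ordered bulk_write() call, so an update can target a document inserted earlier in the same batch:

    from pymongo import MongoClient, InsertOne, UpdateOne

    collection = MongoClient().mydb.items  # hypothetical names

    requests = [
        InsertOne({"_id": 1, "qty": 10}),
        # Updates the document inserted just above, in the same batch.
        UpdateOne({"_id": 1}, {"$inc": {"qty": 5}}),
        UpdateOne({"_id": 2}, {"$set": {"qty": 7}}, upsert=True),
    ]

    result = collection.bulk_write(requests)  # ordered=True by default
    print(result.inserted_count, result.modified_count, result.upserted_count)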
I'm not entirely clear on what you're asking, but Mongo supports "upsert", which allows inserting a record if it does not exist:
http://docs.mongodb.org/manual/reference/method/db.collection.update/#definition
upsert (Optional): If set to true, creates a new document when no document matches the query criteria. The default value is false, which does not insert a new document when no match is found.
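In PyMongo the flag is passed as a keyword argument, for example (collection and field names are made up):

    from pymongo import MongoClient

    collection = MongoClient().mydb.items  # hypothetical names

    # upsert=True: insert the document if nothing matches the filter,
    # otherwise apply the update to the matching document.
    collection.update_one(
        {"device": "sensor-7"},
        {"$set": {"status": "online"}},
        upsert=True,
    )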
I have a question regarding insert queries and the Python MySQL connection. I guess that I need to commit after every insert query I make.
Is there a different way to do that? I mean a fast way, like in PHP.
Second, is this the same for update queries as well?
Another problem here is that once you commit your query the connection is closed. Assume that I have several different insert queries, and every time I prepare one I need to insert it into the table. How can I achieve that with Python? I am using the MySQLdb library.
Thanks for your answers.
You don't need to commit after each insert. You can perform many operations and commit on completion.
The executemany method of the DBAPI allows you to perform many inserts/updates in a single round trip.
There is no link between committing a transaction and disconnecting from the database. See the Connection object's methods for the details of commit() and close().
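A sketch of that with MySQLdb (connection parameters, table, and column names are placeholders):

    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="mydb")
    cursor = conn.cursor()

    rows = [("alice", 10), ("bob", 20), ("carol", 30)]

    # Many inserts in one executemany call; nothing is committed yet.
    cursor.executemany("INSERT INTO scores (name, score) VALUES (%s, %s)", rows)

    # A single commit covers all the inserts above, and the connection stays open.
    conn.commit()

    # The same connection can keep being used for further statements.
    cursor.execute("UPDATE scores SET score = score + 1 WHERE name = %s", ("alice",))
    conn.commit()

    conn.close()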