inserting dataframe to mysql table efficiently

I want to insert a DataFrame into a MySQL table, but much faster than my current approach, which is extremely slow. Here is my code:
for i in range(len(mydf)):
    try:
        mydf.iloc[i:i+1].to_sql(save_table, con=my_engine, schema='my_schema', index=False, if_exists='append')
    except exc.IntegrityError as e:
        pass
    except Exception as e:
        logging.error("General exception was passed. Error was {}".format(e))
        pass
Previously I inserted the whole DataFrame in a single call, but if the table already contained one of the rows, the process got stuck in an endless loop on that duplicate, because the failure made it start the insert over. That version looked like this:
try:
    mydf.to_sql(save_table, con=engine, index=False, if_exists='append')
except exc.IntegrityError as e:
    logging.info("Bypassing duplicates")
except Exception as e:
    logging.info("General exception was passed. Error was {}".format(e))
Should I convert my DataFrame to a list and loop through that instead, or is there a better way?

This seemed to be the fastest thing.
list_values = [tuple(r) for r in self.main_sdn.to_numpy().tolist()]
try:
    cur.executemany(query_string, list_values)
    con.commit()
except exc.IntegrityError as e:
    pass
except Exception as e:
    con.rollback()
    logging.error("Exception was {}".format(e))
con.close()
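If the goal is only to skip rows that already exist, one option is to let MySQL discard duplicates itself with INSERT IGNORE and send the rows in batches through executemany. The following is a minimal sketch of that idea, not the poster's code: build_insert_ignore, chunked and insert_rows are hypothetical helper names, the table and column names are placeholders, and %s is the parameter style used by drivers such as PyMySQL and mysqlclient. Keep in mind that INSERT IGNORE also turns some other errors into warnings, so it is not appropriate everywhere.

```python
def build_insert_ignore(table, columns):
    """Return a MySQL INSERT IGNORE statement with one %s placeholder per column."""
    # Identifiers cannot be bound as parameters, so they are backtick-quoted here;
    # this assumes the names come from trusted code, not from user input.
    column_list = ", ".join("`{}`".format(c) for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT IGNORE INTO `{}` ({}) VALUES ({})".format(table, column_list, placeholders)


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_rows(con, table, columns, rows, batch_size=1000):
    """Insert rows in batches over a DB-API connection, skipping duplicates."""
    query = build_insert_ignore(table, columns)
    cur = con.cursor()
    try:
        for batch in chunked(rows, batch_size):
            cur.executemany(query, batch)
        con.commit()
    except Exception:
        # Undo the partial work and let the caller decide what to do next.
        con.rollback()
        raise
    finally:
        cur.close()
```

With a DataFrame, the rows could be produced with list(mydf.itertuples(index=False, name=None)) and the column names with list(mydf.columns).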

Related

Catch any of the errors in psycopg2 without listing them explicitly

I have a try/except block in which I would like to catch only the errors defined in psycopg2.errors and no other errors.
The explicit way would be:
try:
    # execute a query
    cur = connection.cursor()
    cur.execute(sql_query)
except (psycopg2.errors.SyntaxError, psycopg2.errors.GroupingError) as err:
    # handle in case of error
    pass
The query will always be some SELECT statement. If the execution fails, it should be handled. Any exception that does not belong to psycopg2, such as ZeroDivisionError, should not be caught by the except clause. However, I would like to avoid listing every error after the except clause. If you list the psycopg2 errors, you get quite an extensive list:
from psycopg2 import errors
dir(errors)
I have searched quite extensively and am not sure if this question has been asked already.

You can use the base class psycopg2.Error; it catches all psycopg2-related errors:
import psycopg2
try:
    cur = connection.cursor()
    cur.execute(sql_query)
except psycopg2.Error as err:
    # handle in case of error
    pass
See the official documentation.
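This works because psycopg2 follows the DB-API 2.0 (PEP 249) exception hierarchy, in which every driver error derives from a single Error class. The sketch below illustrates the same idea with the standard-library sqlite3 module as a stand-in, so that it runs without a PostgreSQL server. run_query is a hypothetical helper; with psycopg2 the except clause would name psycopg2.Error instead.

```python
import sqlite3


def run_query(connection, sql_query):
    """Run a query; return its rows, or None if the database driver raised an error."""
    try:
        cur = connection.cursor()
        cur.execute(sql_query)
        return cur.fetchall()
    except sqlite3.Error as err:
        # Only driver errors land here; unrelated exceptions propagate.
        print("database error:", err)
        return None


connection = sqlite3.connect(":memory:")
ok = run_query(connection, "SELECT 1 + 1")
bad = run_query(connection, "SELEC 1")  # malformed SQL raises sqlite3.OperationalError
```

An exception such as ZeroDivisionError raised inside the try block would not match sqlite3.Error and would reach the caller unchanged.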

In the meantime, I implemented this by catching a generic Exception and checking whether the exception belongs to the list returned by dir(errors). The solution proposed by Yannick looks simpler, though.
The function I use prints the error details and then checks, using the exception's class name err_type.__name__, whether it is one of the psycopg2 errors:
import sys
from psycopg2 import errors

def is_psycopg2_exception(_err):
    # Inspect the exception that is currently being handled
    err_type, err_obj, traceback = sys.exc_info()
    print("\npsycopg2 ERROR:", _err, "on line number:", traceback.tb_lineno)
    print("psycopg2 traceback:", traceback, "-- type:", err_type)
    # True if the class name is one of the names exported by psycopg2.errors
    return err_type.__name__ in dir(errors)
Then, I use this function in the try/except clause:
try:
    # execute a query
    cur = connection.cursor()
    cur.execute(sql_query)
except Exception as err:
    if is_psycopg2_exception(err):
        # handle in case of psycopg error
        pass
    else:
        # other type of error
        sys.exit(1)  # quit
For my very specific case, where I need to check for other exceptions as well, I can adapt Yannick's solution as follows:
try:
    # execute a query
    cur = connection.cursor()
    cur.execute(sql_query)
except psycopg2.OperationalError as err:
    # handle some connection-related error
    pass
except psycopg2.Error as err:
    # handle in case of other psycopg error
    pass
except Exception as err:
    # any other error
    sys.exit(1)  # quit
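The order of the clauses matters here: Python tries them from top to bottom and runs the first one that matches, so the more specific OperationalError has to come before the base Error class. A small sketch of that dispatch follows, again using sqlite3 as a stand-in for psycopg2. classify is a hypothetical helper, and which subclass a given failure raises differs between drivers.

```python
import sqlite3


def classify(action):
    """Call `action` and report which except clause handled its failure."""
    try:
        action()
        return "ok"
    except sqlite3.OperationalError:
        return "operational"  # most specific clause first
    except sqlite3.Error:
        return "other database error"
    except Exception:
        return "non-database error"


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
con.execute("INSERT INTO t VALUES (1)")

missing_table = classify(lambda: con.execute("SELECT * FROM no_such_table"))
duplicate_key = classify(lambda: con.execute("INSERT INTO t VALUES (1)"))
unrelated = classify(lambda: 1 / 0)
```

If the sqlite3.Error clause were listed first, it would also swallow the OperationalError case and the first branch would never run.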

Handling PyMySql exceptions - Best Practices

My question regards exception best practices.
I'll present my question on a specific case with PyMySQL but it regards errors handling in general.
I am using PyMySQL, and out of the many possible exceptions there is one I want to deal with in a specific manner: the "duplicate entry" exception.
PyMySQL maps MySQL errors to Python exceptions according to the following table:
_map_error(ProgrammingError, ER.DB_CREATE_EXISTS, ER.SYNTAX_ERROR,
           ER.PARSE_ERROR, ER.NO_SUCH_TABLE, ER.WRONG_DB_NAME,
           ER.WRONG_TABLE_NAME, ER.FIELD_SPECIFIED_TWICE,
           ER.INVALID_GROUP_FUNC_USE, ER.UNSUPPORTED_EXTENSION,
           ER.TABLE_MUST_HAVE_COLUMNS, ER.CANT_DO_THIS_DURING_AN_TRANSACTION)
_map_error(DataError, ER.WARN_DATA_TRUNCATED, ER.WARN_NULL_TO_NOTNULL,
           ER.WARN_DATA_OUT_OF_RANGE, ER.NO_DEFAULT, ER.PRIMARY_CANT_HAVE_NULL,
           ER.DATA_TOO_LONG, ER.DATETIME_FUNCTION_OVERFLOW)
_map_error(IntegrityError, ER.DUP_ENTRY, ER.NO_REFERENCED_ROW,
           ER.NO_REFERENCED_ROW_2, ER.ROW_IS_REFERENCED, ER.ROW_IS_REFERENCED_2,
           ER.CANNOT_ADD_FOREIGN, ER.BAD_NULL_ERROR)
_map_error(NotSupportedError, ER.WARNING_NOT_COMPLETE_ROLLBACK,
           ER.NOT_SUPPORTED_YET, ER.FEATURE_DISABLED, ER.UNKNOWN_STORAGE_ENGINE)
_map_error(OperationalError, ER.DBACCESS_DENIED_ERROR, ER.ACCESS_DENIED_ERROR,
           ER.CON_COUNT_ERROR, ER.TABLEACCESS_DENIED_ERROR,
           ER.COLUMNACCESS_DENIED_ERROR)
I want to specifically catch ER.DUP_ENTRY but I only know how to catch IntegrityError and that leads to redundant cases within my exception catch.
try:
    cur.execute(query, values)
except IntegrityError as e:
    if e.args and e.args[0] == PYMYSQL_DUPLICATE_ERROR:
        handle_duplicate_pymysql_exception(e, func_a)
    else:
        handle_unknown_pymysql_exception(e, func_b)
except Exception as e:
    handle_unknown_pymysql_exception(e, func_b)
Is there a way to catch only ER.DUP_ENTRY somehow?
I am looking for something like:
except IntegrityError.DUP_ENTRY as e:
    handle_duplicate_pymysql_exception(e, func_a)
Thanks in advance for your guidance,

There is a very generic way to handle PyMySQL errors, which I use in my sqlutil module. It catches every error raised by PyMySQL without having to think about its specific type.
try:
    connection.close()
    print("connection closed successfully")
except pymysql.Error as e:
    print("could not close connection error pymysql %d: %s" % (e.args[0], e.args[1]))

You cannot write an except clause that matches on an attribute of the exception instance, nor on an attribute of the exception class, for that matter; except clauses match on exception type only.
A solution to your problem is to have two nested try/except blocks, the inner one handling duplicate entries and re-raising other IntegrityErrors, the outer one being the generic case:
try:
    try:
        cur.execute(query, values)
    except IntegrityError as e:
        if e.args[0] == PYMYSQL_DUPLICATE_ERROR:
            handle_duplicate_pymysql_exception(e, func_a)
        else:
            raise
except Exception as e:
    handle_unknown_pymysql_exception(e, func_b)
Whether this is better than having a duplicate call to handle_unknown_pymysql_exception is up to you...
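A variation is to move the error-code check into a small helper so that the except clause stays short. The sketch below uses a stand-in IntegrityError class so that it runs without a database; with PyMySQL the class would be pymysql.err.IntegrityError, whose args hold the MySQL error number and message. 1062 is MySQL's ER_DUP_ENTRY code, and is_duplicate_entry and insert_once are hypothetical names.

```python
DUP_ENTRY = 1062  # MySQL error number for ER_DUP_ENTRY


class IntegrityError(Exception):
    """Stand-in for pymysql.err.IntegrityError (assumption: args = (errno, message))."""


def is_duplicate_entry(exc):
    """True if the exception carries MySQL's duplicate-entry error number."""
    return bool(exc.args) and exc.args[0] == DUP_ENTRY


def insert_once(execute):
    """Run `execute`; absorb duplicate-entry errors and re-raise everything else."""
    try:
        execute()
        return "inserted"
    except IntegrityError as e:
        if is_duplicate_entry(e):
            return "duplicate"
        raise  # other integrity errors go to the caller's generic handler


def raise_duplicate():
    raise IntegrityError(1062, "Duplicate entry '1' for key 'PRIMARY'")


def raise_foreign_key():
    raise IntegrityError(1452, "Cannot add or update a child row")
```

Calling insert_once(raise_duplicate) returns "duplicate", while insert_once(raise_foreign_key) re-raises the IntegrityError for an outer handler to deal with.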

Python 2.7 exception handling syntax

I am a bit confused about try/except usage in Python 2.7.
try:
    raise ValueError("sample value error")
except Exception as e:
    print str(e)

try:
    raise ValueError("sample value error")
except Exception,exception:
    print str(exception)

try:
    raise ValueError("sample value error")
except exception:
    print str(exception)

try:
    raise ValueError("sample value error")
except Exception:
    print str(Exception)  # it prints only the object reference
Can someone help me understand the usages above?

Some concepts to help you understand the differences between the variants of the except clause:
except Exception, e – This is the older Python 2 variant, equivalent to except Exception as e; it is no longer valid in Python 3
except Exception as e – Catch exceptions of the type Exception (or any subclass) and store them in the variable e for further processing, messaging or similar
except Exception – Catch exceptions of the type Exception (or any subclass), but ignore the value/information provided in the exception
except e – Here e is not a variable that receives the exception; it must already name an exception class (or a tuple of classes) to catch. If no such name exists, Python raises a NameError when an exception reaches the clause
except – Catch any exception, including SystemExit and KeyboardInterrupt, and ignore the exception information
Which one to use depends on many factors, but if you don't need the information provided in the exception, there is no need to bind a variable to it.
Regarding which type of exception to catch, take care to catch the most specific type. If you are writing a general catch-all, it could be correct to use except Exception, but in the example you've given I would opt for using except ValueError directly. This allows other exceptions to be handled properly at another level of your code. The point is: don't catch exceptions you are not ready to handle.
If you want, you can read more on Python 2.7 exception handling or the available Python 2.7 exceptions in the official documentation.
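To make the last point concrete, here is a small sketch (written for Python 3; parse_age is a hypothetical function) that catches only the ValueError it knows how to handle and lets anything else propagate to the caller:

```python
def parse_age(text):
    """Convert text to an int, falling back to None for malformed input."""
    try:
        return int(text)
    except ValueError as e:
        # int() raises ValueError for strings that are not valid integers.
        print("invalid age:", e)
        return None


good = parse_age("42")
bad = parse_age("forty-two")
```

Passing None instead of a string raises TypeError, which this function deliberately does not catch, so the caller sees the real problem instead of a silent None.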

For Python 3 (also works in Python 2.7):
try:
    raise ValueError("sample value error")
except Exception as e:
    print(e)
For Python 2 (will not work in Python 3):
try:
    raise ValueError("sample value error")
except Exception, e:
    print e

I use:
try:
    raise ValueError("sample value error")
except Exception as e:
    print str(e)
when I want to report the specific error, and
try:
    raise ValueError("sample value error")
except:
    print "Something unexpected happened"
when I don't really care about the details (or simply except: pass, except: return, etc.).

Use this format (Python 2 only; the comma form is a syntax error in Python 3):
try:
    raise ValueError("sample value error")
except Exception, e:
    print e

When PyMongo throws a DuplicateKeyError How can I tell what field caused the conflict? [duplicate]

In PyMongo, when a DuplicateKeyError is caught, what is the proper way to find out the duplicate value behind the exception?
Currently I do this
try:
    db.coll.insert({key: ['some_value', 'some_value_1']})
except pymongo.errors.DuplicateKeyError, e:
    dups = re.findall(r'\{\ +:\ +"(.*)"\ +\}$', e.message)
    if len(dups) == 1:
        print dups[0]
It seems to work, but is there any easier way, like
try:
    db.coll.insert({key: ['some_value', 'some_value_1']})
except pymongo.errors.DuplicateKeyError, e:
    print e.dup_val
EDIT
It's a concurrent app, so checking for duplicates before inserting might fail.
The field is an array, so it's hard to find out which element is the duplicate value.

In the development version of PyMongo (2.7), you can inspect the error_document property:
try:
    db.coll.insert({name: 'some_value'})
except pymongo.errors.DuplicateKeyError, e:
    print e.error_document
As far as I know, in 2.6 and earlier versions, all information except the error message and code is discarded.
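In more recent PyMongo releases the exception exposes the server's error document through its details attribute and, as far as I know, MongoDB servers from 4.2 onward include keyPattern and keyValue fields in that document for duplicate-key errors. The sketch below only shows how such a document could be read. duplicate_info is a hypothetical helper, the sample dictionary is made up for illustration, and the field names should be checked against your server version.

```python
def duplicate_info(details):
    """Extract the conflicting index and value from a duplicate-key error document.

    `details` is assumed to be the dict found in DuplicateKeyError.details;
    missing fields yield None rather than raising.
    """
    details = details or {}
    return {
        "code": details.get("code"),
        "index": details.get("keyPattern"),
        "value": details.get("keyValue"),
    }


# Illustrative document only; a real one would come from `e.details`
# inside the except block.
sample = {
    "code": 11000,
    "errmsg": "E11000 duplicate key error ...",
    "keyPattern": {"name": 1},
    "keyValue": {"name": "some_value"},
}
info = duplicate_info(sample)
```

On servers that do not report keyValue, the helper returns None for that field, and parsing the error message, as in the question, remains the fallback.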
