I have data in a CSV file that needs to be inserted into the DB. I cannot guarantee that the data provided will match my requirements, so if there is any exception, I want the previous inserts on the DB to be undone.
What I currently do is save the data until an exception is found, then raise the exception and print the line number where it occurred. What I want is to be able to insert the whole file again without ending up with duplicate rows. How can I do that?
Here is my code:
line = 0
for row in rows:
    line += 1
    try:
        obj = XYZ(col1=row[0], col2=row[1])
        obj.save()
    except Exception:
        raise Exception("Exception found at line %s" % line)
How should I undo all the previous inserts that were done before the exception was raised?
Sounds like what you're looking for is atomic(), to quote the docs (emphasis mine):
Atomicity is the defining property of database transactions. atomic
allows us to create a block of code within which the atomicity on the
database is guaranteed. If the block of code is successfully
completed, the changes are committed to the database. If there is an
exception, the changes are rolled back.
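A rough sketch of how that could look with the code from the question (same rows, XYZ model, and line counter); if any row fails, every save() issued inside the atomic() block is rolled back:

from django.db import transaction

line = 0
try:
    with transaction.atomic():
        for row in rows:
            line += 1
            XYZ(col1=row[0], col2=row[1]).save()
except Exception as e:
    # every save() above has already been rolled back at this point
    print("Exception found at line %s: %s" % (line, e))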
I am working on a Python 3.5 server project and using SQLAlchemy 1.0.12 with cx_Oracle 5.2.1 to insert data into Oracle 11g. I noticed that many of my multi-row table insertions are failing intermittently with "ORA-01458: invalid length inside variable character string" error.
I generally insert a few thousand to a few tens of thousands of rows at a time, and the data is mostly composed of strings, Pandas timestamps, and floating point numbers. I have made the following observations:
The error occurs on both Windows and Linux host OS for the Python server
The error always occurs intermittently, even when the data doesn't change
If I don't insert floating point numbers, or if I round them, the error happens less often but still happens
If I insert the rows one at a time I don't encounter the error (but this is unacceptable for me performance-wise)
Additionally, I have tried to insert again if I encountered the error. The first thing I tried was to put a try-except block around the call to execute on the sqlalchemy.engine.base.Connection object, like the following:
try:
    connection.execute(my_table.insert(), records)
except DatabaseError as e:
    connection.execute(my_table.insert(), records)
I noticed that using this method the second insertion still often fails. The second thing I tried was to try the same in the implementation of do_executemany of OracleDialect_cx_oracle in the sqlalchemy package (sqlalchemy\dialects\oracle\cx_oracle.py):
def do_executemany(self, cursor, statement, parameters, context=None):
    if isinstance(parameters, tuple):
        parameters = list(parameters)
    # original code
    # cursor.executemany(statement, parameters)
    # new code
    try:
        cursor.executemany(statement, parameters)
    except Exception as e:
        print('trying again')
        cursor.executemany(statement, parameters)
Strangely, when I do it this way the second executemany call will always work if the first one fails. I'm not certain what this means but I believe this points to the cx_Oracle driver being the cause of the issue instead of sqlalchemy.
I have searched everywhere online and have not seen any reports of the same problem. Any help would be greatly appreciated!
I had the same problem: the code would sometimes fail and sometimes go through with no error. Apparently my chunk size was too big for the buffer; the error did not occur anymore once I reduced the chunk size from 10K to 500 rows.
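For reference, a sketch of what that chunking could look like with the connection, my_table, and records from the question (the 500-row batch size is just an assumption to tune):

CHUNK_SIZE = 500  # assumed batch size; adjust to what the buffer tolerates

for start in range(0, len(records), CHUNK_SIZE):
    chunk = records[start:start + CHUNK_SIZE]
    connection.execute(my_table.insert(), chunk)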
We found out that if we replace all the float NaN values with None before we do the insertion, the error disappears completely. Very weird indeed!
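A sketch of that clean-up, assuming records is the same list of dicts passed to execute() in the question:

import math

# Replace float NaN values with None so the driver binds SQL NULLs instead.
cleaned = [
    {key: None if isinstance(value, float) and math.isnan(value) else value
     for key, value in record.items()}
    for record in records
]
connection.execute(my_table.insert(), cleaned)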
I need to create a lot of unique codes and insert them into the database.
Of course I can write something like this:
codes = set()
while len(codes) < codes_size:
    c = generate_code()
    if len(Codes.objects.filter(code=c)) == 0:
        codes.add(Codes(code=c))
Codes.objects.bulk_create(codes)
But when the database already contains a lot of codes this works very slowly.
Inserting each code right after it is generated is very slow too.
My best idea so far: don't verify the codes until bulk_create, and if bulk_create raises an exception, regenerate all the codes and try again. Exceptions are very rare now, but as the database gets bigger they will become more frequent.
The problem is that bulk_create doesn't say which code raised the exception.
My understanding is that bulk_create() performs its operation within a transaction which is not committed if an error occurs. This means that either all inserts succeeded, or none did.
For example, if a code is generated that is a duplicate of one that is already in the table, or a duplicate of another within the batch, an IntegrityError exception would be raised and none of the codes would have been inserted into the database.
In terms of exceptions, you'll likely get a subclass of django.db.utils.DatabaseError, e.g. django.db.utils.IntegrityError. Because you don't know which database error will be raised you should catch DatabaseError.
I don't think that there is any way to know from the exception which of the codes caused the problem.
One way to handle this is to generate all of the codes up front and then, within a transaction, test whether they already exist in the table in one go using filter() and insert them if there are no duplicates:
from django.db import transaction

codes = set()
while len(codes) < codes_size:
    codes.add(generate_code())

with transaction.atomic():
    # check for any duplicate codes in table...
    dups = Codes.objects.filter(code__in=codes)
    if len(dups):
        print 'Duplicate code(s) generated: {}'.format([dup.code for dup in dups])
        # remove dups from codes, and generate additional codes to make up the shortfall.
        # Note that this might also result in duplicates....
    else:
        Codes.objects.bulk_create(Codes(code=code) for code in codes)
You still need to handle database exceptions that are not due to duplicate values. For simplicity I've left that out.
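For completeness, a minimal sketch of that error handling (assuming the same Codes model and codes set as above); IntegrityError is caught before its parent DatabaseError so duplicates can be told apart from other failures:

from django.db import transaction
from django.db.utils import DatabaseError, IntegrityError

try:
    with transaction.atomic():
        Codes.objects.bulk_create(Codes(code=code) for code in codes)
except IntegrityError:
    # At least one code is a duplicate (in the table or within the batch).
    # Nothing was inserted, so the whole set can be regenerated and retried.
    pass
except DatabaseError:
    # Some other database problem; log or re-raise as appropriate.
    raise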
I have a MySQL table in which Django registers that a certain user is 'connected' to a unit. For this I have to check whether the unit is already connected to some other user (model: AppUnitSession).
In my function I get three objects (models) as input (user, usersession, vehicle).
The problem I have is that my query for AppUnitSession fails with a
Exception Type: DoesNotExist
Exception Value: AppUnitSession matching query does not exist.
This error occurs on the first line of this code:
sessions = AppUnitSession.objects.get(gps_unit_id=vehicle.gps_unit_id)
sessions = sessions.exclude(validation_date__isnull=True)
sessions = sessions.exclude(user_session_id=usersession.user_session_id)
From the call stack I can see that the value for vehicle.gps_unit_id is set:
{'gps_unit_id': 775L}
There are NO records in the AppUnitSession table that match this! All records in this table have gps_unit_id = NULL (i.e. the unit is available and the user can continue; after that there will be a record with the gps_unit_id set). If there are sessions found, I need to do some more work.
For a start I don't want the error. But I also want something I can iterate over, or whose length I can check (> 1), to do some more checks.
I'm a bit stuck on this one, so help and suggestions are welcome.
One way is to use filter instead of get: filter returns a QuerySet of matching objects (possibly empty), while get returns a single object and raises an exception if nothing matches.
Use sessions = AppUnitSession.objects.filter(gps_unit_id=vehicle.gps_unit_id)
instead of sessions = AppUnitSession.objects.get(gps_unit_id=vehicle.gps_unit_id)
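For example, a sketch of the full filter-based chain using the same fields as in the question; because sessions is now a QuerySet, the exclude() calls chain cleanly and the result can be counted or iterated:

sessions = (AppUnitSession.objects
            .filter(gps_unit_id=vehicle.gps_unit_id)
            .exclude(validation_date__isnull=True)
            .exclude(user_session_id=usersession.user_session_id))

if sessions.exists():
    for session in sessions:
        # do the extra checks on each matching session here
        pass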
Another approach, if you only want to get rid of the error, is to perform exception handling:
try:
    sessions = AppUnitSession.objects.get(gps_unit_id=vehicle.gps_unit_id)
    sessions = sessions.exclude(validation_date__isnull=True)
    sessions = sessions.exclude(user_session_id=usersession.user_session_id)
except AppUnitSession.DoesNotExist:
    # some code for when the object does not exist
    pass
But, it is better to use get rather than filter if you are trying to retrieve only one object. Read performance of get vs filter for one object here.
I'm trying to extract data from an API that gives me data back in JSON format. I'm using SQLalchemy and simplejson in a python script to achieve this. The database is PostgreSQL.
I have a class called Harvest, it specifies the details for the table harvest.
Here is the code I suspect is incorrect.
def process(self):
    req = urllib2.Request('https://api.fulcrumapp.com/api/v2/records/',
                          headers={"X-ApiToken": "****************************"})
    resp = urllib2.urlopen(req)
    data = simplejson.load(resp)
    for i, m in enumerate(data['harvest']):
        harvest = Harvest(m)
        self.session.add(harvest)
        self.session.commit()
Is there something wrong with this loop? Nothing is going through to the database.
I suspect that if there is anything wrong with the loop, it is that the loop is getting skipped. One thing you can do to verify this is:
ALTER USER application_user SET log_statement = 'all';
Then the statements will show up in your logs. When you are done:
ALTER USER application_user RESET log_statement;
This being said, one thing I see in your code that may cause trouble later is the fact that you are committing once per record. This will cause extra disk I/O. You probably want to commit once, after the loop.
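For example, a sketch of the loop with a single commit at the end (assuming the same data, Harvest, and self.session as in the question):

for m in data['harvest']:
    self.session.add(Harvest(m))
# one commit for the whole batch instead of one per record
self.session.commit()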
I have a PostgreSQL DB which I am updating with around 100,000 records. I use session.merge() to insert/update each record, and I commit after every 1000 records.
i = 0
for record in records:
    i += 1
    session.merge(record)
    if i % 1000 == 0:
        session.commit()
This code works fine. In my database I have a table with a UNIQUE field, and some of the records I insert into it are duplicates. An error is thrown when this happens, saying the field is not unique. Since I am inserting 1000 records at a time, a rollback will not help me skip these records. Is there any way I can skip the session.merge() for the duplicate records (other than parsing through all the records to find the duplicates, of course)?
I think you already know this, but let's start out with a piece of dogma: you specified that the field needs to be unique, so you have to either let the database check for uniqueness or deal with the errors that come from not doing so.
Checking for uniqueness:
if value not in database:
    session.add(value)
    session.commit()
Not checking for uniqueness and catching the exception:
from sqlalchemy.exc import IntegrityError

try:
    session.add(value)
    session.commit()
except IntegrityError:
    session.rollback()
The first one has a race condition. I tend to use the second pattern.
Now, bringing this back to your actual issue, if you want to assure uniqueness on a column in the database then obviously you're going to have to either let the db assure itself of the loaded value's actual uniqueness, or let the database give you an error and you handle it.
That's obviously a lot slower than adding 100k objects to the session and just committing them all, but that's how databases work.
You might want to consider massaging the data which you are loading OUTSIDE the database and BEFORE attempting to load it, to ensure uniqueness. That way, when you load it you can drop the need to check for uniqueness. Pretty easy to do with command line tools if for example you're loading from csv or text files.
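As a rough sketch of that pre-load de-duplication (unique_field here is a placeholder for whatever attribute carries the UNIQUE constraint, not a name from the question):

# Drop duplicates on the unique column before touching the database.
seen = set()
deduped = []
for record in records:
    key = record.unique_field  # placeholder attribute name
    if key not in seen:
        seen.add(key)
        deduped.append(record)

# The original merge/commit loop then runs over the de-duplicated records.
for i, record in enumerate(deduped, start=1):
    session.merge(record)
    if i % 1000 == 0:
        session.commit()
session.commit()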
You can get a "partial rollback" using SAVEPOINT, which SQLAlchemy exposes via begin_nested(). You could do it just like this:
for i, record in enumerate(records):
    try:
        with session.begin_nested():
            session.merge(record)
    except:
        print "Skipped record %s" % record
    if not i % 1000:
        session.commit()
Notes for the above:
In Python, we never do the "i = i + 1" thing; use enumerate().
with session.begin_nested(): is the same as saying begin_nested(), then commit() if no exception is raised, or rollback() if one is.
You might want to consider writing a function along the lines of this example from the PostgreSQL documentation.
This is the option which works best for me because the number of records with duplicate unique keys is minimal.
def update_exception(records, i, failed_records):
    # Record the pk of the record that just failed, roll back, and
    # re-merge the rest of the current 1000-record batch.
    failed_records.append(records[i]['pk'])
    session.rollback()
    start_range = int(round(i / 1000, 0) * 1000)
    for index in range(start_range, i + 1):
        if records[index]['pk'] not in failed_records:
            ins_obj = Model()  # rebuild the object for this record
            try:
                session.merge(ins_obj)
            except:
                failed_records.append(records[index]['pk'])
Say I hit an error at record 2375: I store the primary key 'pk' of record 2375 in failed_records and then re-commit from 2000 to 2375. It seems much faster than committing one by one.