IntegrityError: distinguish between unique constraint and not null violations - python

I have this code:
try:
    principal = cls.objects.create(
        user_id=user.id,
        email=user.email,
        path='something'
    )
except IntegrityError:
    principal = cls.objects.get(
        user_id=user.id,
        email=user.email
    )
It tries to create a user with the given id and email, and if one already exists, it tries to get the existing record instead.
I know this is a bad construction and it will be refactored anyway. But my question is this:
How do I determine which kind of IntegrityError has happened: the one related to a unique constraint violation (there is a unique key on (user_id, email)) or the one related to a not null constraint (path cannot be null)?

psycopg2 provides the SQLSTATE with the exception as the pgcode member, which gives you quite fine-grained error information to match on.
>>> import psycopg2
>>> conn = psycopg2.connect("dbname=regress")
>>> curs = conn.cursor()
>>> try:
...     curs.execute("INVALID;")
... except Exception as ex:
...     xx = ex
...
>>> xx.pgcode
'42601'
See Appendix A: Error Codes in the PostgreSQL manual for code meanings. Note that you can match coarsely on the first two chars for broad categories. In this case I can see that SQLSTATE 42601 is syntax_error in the Syntax Error or Access Rule Violation category.
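For illustration, matching coarsely on the two-character class prefix can be sketched like this (sqlstate_category is a made-up helper, not part of psycopg2; the category names follow PostgreSQL's error-class table):

```python
# Sketch: classify an SQLSTATE by its two-character class prefix.
def sqlstate_category(pgcode):
    classes = {
        '23': 'integrity_constraint_violation',
        '42': 'syntax_error_or_access_rule_violation',
    }
    return classes.get(pgcode[:2], 'unknown')

print(sqlstate_category('42601'))  # syntax_error_or_access_rule_violation
print(sqlstate_category('23505'))  # integrity_constraint_violation
```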
The codes you want are:
23505 unique_violation
23502 not_null_violation
so you could write:
try:
    principal = cls.objects.create(
        user_id=user.id,
        email=user.email,
        path='something'
    )
except IntegrityError as ex:
    if ex.pgcode == '23505':
        principal = cls.objects.get(
            user_id=user.id,
            email=user.email
        )
    else:
        raise
That said, this is a bad way to do an upsert or merge. #pr0gg3d is presumably correct in suggesting the right way to do it with Django; I don't do Django so I can't comment on that bit. For general info on upsert/merge see depesz's article on the topic.

Update as of 9-6-2017:
A pretty elegant way to do this is to try/except IntegrityError as exc, and then use some useful attributes on exc.__cause__ and exc.__cause__.diag (a diagnostic class that gives you some other super relevant information on the error at hand - you can explore it yourself with dir(exc.__cause__.diag)).
The first one you can use was described above. To make your code more future proof you can reference the psycopg2 codes directly, and you can even check the constraint that was violated using the diagnostic class I mentioned above:
except IntegrityError as exc:
    from psycopg2 import errorcodes as pg_errorcodes
    assert exc.__cause__.pgcode == pg_errorcodes.UNIQUE_VIOLATION
    assert exc.__cause__.diag.constraint_name == 'tablename_colA_colB_unique_constraint'
Edit for clarification: I have to use the __cause__ accessor because I'm using Django; to get at the underlying psycopg2 IntegrityError I have to call exc.__cause__.
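The __cause__ chain itself is ordinary Python 3 exception chaining; a self-contained sketch with stand-in exception classes (no database or Django needed, both class names are made up):

```python
# Stand-ins for django.db.IntegrityError and psycopg2's error class.
class DjangoIntegrityError(Exception):
    pass

class Psycopg2IntegrityError(Exception):
    def __init__(self, pgcode):
        super().__init__(pgcode)
        self.pgcode = pgcode

def create_record():
    # Django wraps the driver-level error, preserving it as __cause__
    # via "raise ... from ...".
    try:
        raise Psycopg2IntegrityError('23505')
    except Psycopg2IntegrityError as inner:
        raise DjangoIntegrityError('duplicate key') from inner

try:
    create_record()
except DjangoIntegrityError as exc:
    print(exc.__cause__.pgcode)  # 23505
```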

It could be better to use:
try:
    obj, created = cls.objects.get_or_create(user_id=user.id, email=user.email)
except IntegrityError:
    ...
as in https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create
The IntegrityError should then be raised only when there's a NOT NULL constraint violation.
Furthermore, you can use the created flag to know whether the object already existed.

Related

SQLAlchemy ORM Update for HSTORE fields

I have a problem when I try to update an hstore field. I have the following translation hybrid and database model.
translation_hybrid = TranslationHybrid(
    current_locale='en',
    default_locale='de'
)

class Book:
    __tablename__ = "Book"

    id = Column(UUID(as_uuid=True), primary_key=True)
    title_translations = Column(MutableDict.as_mutable(HSTORE), nullable=False)
    title = translation_hybrid(title_translations)
I want to update title with the current locale using a single orm query. When I try the following query
query(Book).filter(Book.id == id).update({"title": "new_title"})
ORM converts this to the following sql:
UPDATE "Book" SET coalesce(title_translations -> 'en', title_translations -> 'de') = "new_title" WHERE "Book".id = id
And this sql gives the syntax error. What is the best way to update it without fetching the model first and assigning the value to the field?
We got this to run eventually; documenting it here for the benefit of others that might run into this issue. Note that we're using the new select methodology and async.
As you already suggested, we solved this by assigning the updated values directly to the record object. We're basically implementing this solution from the SQLAlchemy documentation:
updated_record: models.Country = None  # type: ignore
try:
    # fetch current data from database and lock for update
    record = await session.execute(
        select(models.Country)
        .filter_by(id=str(country_id))
        .with_for_update(nowait=True)
    )
    updated_record = record.scalar_one()
    logger.debug(
        "update() - fetched current data from database",
        record=record,
        updated_record=vars(updated_record),
    )
    # merge country_dict (which holds the data to be updated) with the data in the DB
    for key, value in country_dict.items():
        setattr(updated_record, key, value)
    logger.debug(
        "update() - merged new data into record",
        updated_record=vars(updated_record),
    )
    # flush data to database
    await session.flush()
    # refresh updated_record and commit
    await session.refresh(updated_record)
    await session.commit()
except Exception as e:  # noqa: PIE786
    logger.error("update() - an error occurred", error=str(e))
    await session.rollback()
    raise ValueError("Record can not be updated.")
return updated_record
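The merge step in the middle is plain setattr on a mapped object; stripped of the ORM, the pattern looks like this (Record is a stand-in for the mapped class):

```python
class Record:
    """Stand-in for an ORM-mapped object."""
    def __init__(self, **fields):
        self.__dict__.update(fields)

record = Record(id='DE', name='Germany', population=0)
country_dict = {'population': 83000000}  # the data to be updated

# merge the update dict into the record, field by field
for key, value in country_dict.items():
    setattr(record, key, value)

print(record.population)  # 83000000
```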
I think I have solved a similar instance of the problem using the bulk update query variant.
In this case a PoC solution would look like this:
session.execute(update(Book), [{"id": id, "title": title}])
session.commit()
I am not sure why this does not trigger the coalesce() issue but it seems to be working. We should probably open an issue in SQLAlchemy as I don't have the time right now to debug it to its root cause.
UPDATE
I think that the original issue actually originates in sqlalchemy-utils, as the coalesce seems to arise from the expr_factory of the hybrid property here.

Pymongo get inserted id's even with duplicate key error

I am working on a flask app and using mongodb with it. In one endpoint I take CSV files and insert the contents into mongodb with insert_many(). Before inserting, I create a unique index to prevent duplication in mongodb. When there is no duplication I can access inserted_ids for that operation, but when it raises a duplication error I get None and can't get inserted_ids. I am using ordered=False as well. Is there any way to get inserted_ids even with a duplicate key error?
def createBulk():  # in controller
    identity = get_jwt_identity()
    try:
        csv_file = request.files['csv']
        insertedResult = ProductService(identity).create_product_bulk(csv_file)
        print(insertedResult)  # this result is None on a duplicate key error
        threading.Thread(
            target=ProductService(identity).sendInsertedItemsToEventCollector,
            args=(insertedResult,)
        ).start()
        return json_response(True, status=200)
    except Exception as e:
        print("insertedResultErr -> ", str(e))
        return json_response({'error': str(e)}, 400)

def create_product_bulk(self, products):  # in service
    data_frame = read_csv(products)
    data_json = data_frame.to_json(orient="records", force_ascii=False)
    try:
        return self.repo_client.create_bulk(loads(data_json))
    except bulkErr as e:
        print(str(e))
    except DuplicateKeyError as e:
        print(str(e))

def create_bulk(self, products):  # in repo
    self.checkCollectionName()
    self.db.get_collection(name=self.collection_name).create_index('barcode', unique=True)
    return self.db.get_collection(name=self.collection_name).insert_many(products, ordered=False)
Unfortunately, not in the way you have done it with the current pymongo drivers. As you have found, if you get errors in your insert_many() it will throw an exception, and the exception detail does not contain the inserted_ids.
It does contain details of the keys that fail (in e.details['writeErrors'][]['keyValue']), so you could try to work backwards from that using your original products list.
Your other workaround is to use insert_one() in a loop with a try ... except and check each insert. I know this is less efficient, but it's a workaround ...
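Working backwards from e.details could look roughly like this; a sketch in pure dict handling, with a made-up helper name and sample data shaped like a BulkWriteError payload:

```python
def ids_that_survived(products, error_details):
    """Return the barcodes from `products` that are not listed in writeErrors."""
    failed = {
        err['keyValue']['barcode']
        for err in error_details.get('writeErrors', [])
    }
    return [p['barcode'] for p in products if p['barcode'] not in failed]

products = [{'barcode': 'A1'}, {'barcode': 'B2'}, {'barcode': 'C3'}]
details = {'writeErrors': [{'keyValue': {'barcode': 'B2'}}]}
print(ids_that_survived(products, details))  # ['A1', 'C3']
```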

Python - mysql - logging queries that generate warnings

I have Python code that makes MySQL calls. It logs all MySQL errors (and sends me a notification on Google Chat). However, warnings such as this don't get reported, which makes sense since they are warnings, not errors. I would, however, like the MySQL statement logged when there is a warning so I can fix the underlying issue. What is the best way to find those warnings and get them into the log (with the offending MySQL statement)?
/usr/local/lib/python3.4/dist-packages/pymysql/cursors.py:170: Warning: (1366, "Incorrect string value: '\\xF3n Cha...' for column 'recipient-name' at row 1")
try:
    cursor.execute(query_string, field_split)
    db.commit()
except pymysql.err.InternalError as e:
    logger.warning('Mysql Error : %s', e)
    logger.warning('Statement : %s', cursor._last_executed)
    string_google = str(e.args[1] + ' - ' + cursor._last_executed)
    googlechat(string_google)
    return  # exit rather than marking report run good
Use catch_warnings from the warnings module. It's a context manager that provides you with a list of the warnings. The code would look something like this:
with warnings.catch_warnings(record=True) as w:
    function_that_triggers_warning()
if w:
    logging_function(w[-1])
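A runnable sketch of that pattern, with function_that_triggers_warning replaced by a stand-in that emits a Warning the way the pymysql cursor does:

```python
import warnings

def run_query():
    # stand-in for a pymysql call that emits a Warning
    warnings.warn('(1366, "Incorrect string value ...")', Warning)
    return 'result'

with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter("always")  # make sure the warning is recorded
    result = run_query()
if w:
    print(f"query warned: {w[-1].message}")
```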

What's the equivalent of peewee's DoesNotExist in SQLAlchemy?

I've been using peewee with SQLite for some time, and now I'm switching to SQLAlchemy with Postgres, and I can't find an equivalent of DoesNotExist (see example):
try:
    return models.User.get(models.User.id == userid)
except models.DoesNotExist:
    return None
Do you know how to achieve the same with SQLAlchemy? I've checked stuff which I can import from sqlalchemy.ext but nothing seemed right.
The closest is probably NoResultFound: http://docs.sqlalchemy.org/en/latest/orm/exceptions.html#sqlalchemy.orm.exc.NoResultFound
Code Sample:
from sqlalchemy.orm.exc import NoResultFound

try:
    user = session.query(User).one()
except NoResultFound:
    print("No users found")
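A self-contained sketch against an in-memory SQLite database (assuming SQLAlchemy 1.4+; get_user_or_none is a made-up helper mirroring the peewee pattern):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.orm.exc import NoResultFound

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

def get_user_or_none(session, userid):
    # .one() raises NoResultFound when no row matches
    try:
        return session.query(User).filter(User.id == userid).one()
    except NoResultFound:
        return None

with Session(engine) as session:
    session.add(User(id=1, name='alice'))
    session.commit()
    print(get_user_or_none(session, 1).name)  # alice
    print(get_user_or_none(session, 2))       # None
```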
Peewee does work with Postgresql, you know. ;)

How to get the record causing IntegrityError in Django

I have the following in my django model, which I am using with PostgreSQL:
class Business(models.Model):
    location = models.CharField(max_length=200, default="")
    name = models.CharField(max_length=200, default="", unique=True)
In my view I have:
for b in bs:
    try:
        p = Business(**b)
        p.save()
    except IntegrityError:
        pass
When the app is run and an IntegrityError is triggered I would like to grab the already inserted record and also the object (I assume 'p') that triggered the error and update the location field.
In pseudocode:
for b in bs:
    try:
        p = Business(**b)
        p.save()
    except IntegrityError:
        EXISTING_RECORD.location = EXISTING_RECORD.location + p.location
        EXISTING_RECORD.save()
How is this done in django?
This is the way I got the existing record that you are asking for.
In this case, I had MyModel with
unique_together = (("owner", "hsh"),)
I used regex to get the owner and hsh of the existing record that was causing the issue.
import re
from django.db import IntegrityError

try:
    # do something that might raise IntegrityError
    ...
except IntegrityError as e:
    # example error message (str(e)): 'duplicate key value violates unique
    # constraint "thingi_userfile_owner_id_7031f4ac5e4595e3_uniq"\nDETAIL:
    # Key (owner_id, hsh)=(66819, 4252d2eba0e567e471cb08a8da4611e2) already exists.\n'
    match = re.search(
        r'Key \(owner_id, hsh\)=\((?P<owner_id>\d+), (?P<hsh>\w+)\) already',
        str(e)  # e.message is Python 2 only; str(e) works in both
    )
    existing_record = MyModel.objects.get(
        owner_id=match.group('owner_id'),
        hsh=match.group('hsh')
    )
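The regex part can be tried standalone against a message of that shape (the sample string below just mimics the PostgreSQL duplicate-key error text):

```python
import re

# sample message shaped like the duplicate-key error above
msg = (
    'duplicate key value violates unique constraint '
    '"thingi_userfile_owner_id_7031f4ac5e4595e3_uniq"\n'
    'DETAIL:  Key (owner_id, hsh)=(66819, 4252d2eba0e567e471cb08a8da4611e2) '
    'already exists.\n'
)

match = re.search(
    r'Key \(owner_id, hsh\)=\((?P<owner_id>\d+), (?P<hsh>\w+)\) already',
    msg
)
print(match.group('owner_id'))  # 66819
print(match.group('hsh'))       # 4252d2eba0e567e471cb08a8da4611e2
```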
I tried get_or_create, but that doesn't quite work the way you want (if you do get_or_create with both the name and the location, you still get an integrity error; if you do what Joran suggested, unless you overload update, it will overwrite location rather than append).
This should work the way you want:
for b in bs:
    bobj, new_flag = Business.objects.get_or_create(name=b['name'])
    if new_flag:
        bobj.location = b['location']
    else:
        bobj.location += b['location']  # or += ',' + b['location'] if you want to separate them
    bobj.save()
It would be nice (and may be possible, but I haven't tried), in the case where you can have multiple unique constraints, to be able to inspect the IntegrityError (similar to the accepted answer in IntegrityError: distinguish between unique constraint and not null violations, which also has the downside of appearing to be Postgres-only) to determine which field(s) were violated. Note that if you wanted to follow your original framework, you could do collidedObject = Business.objects.get(name=b['name']) in your exception handler, but that only works when you know for sure it was a name collision.
for b in bs:
    Business.objects.get_or_create(name=b['name'])
    Business.objects.filter(name=b['name']).update(**b)
(Note that get_or_create returns an (object, created) tuple, and update() is a queryset method, so it has to be called on a filtered queryset rather than the instance.) I think anyway
