To lock or to catch IntegrityError? [duplicate] - python

I want to get an object from the database if it already exists (based on provided parameters) or create it if it does not.
Django's get_or_create (or source) does this. Is there an equivalent shortcut in SQLAlchemy?
I'm currently writing it out explicitly like this:
def get_or_create_instrument(session, serial_number):
instrument = session.query(Instrument).filter_by(serial_number=serial_number).first()
if instrument:
return instrument
else:
instrument = Instrument(serial_number)
session.add(instrument)
return instrument

Following the solution of #WoLpH, this is the code that worked for me (simple version):
def get_or_create(session, model, **kwargs):
instance = session.query(model).filter_by(**kwargs).first()
if instance:
return instance
else:
instance = model(**kwargs)
session.add(instance)
session.commit()
return instance
With this, I'm able to get_or_create any object of my model.
Suppose my model object is :
class Country(Base):
__tablename__ = 'countries'
id = Column(Integer, primary_key=True)
name = Column(String, unique=True)
To get or create my object I write :
myCountry = get_or_create(session, Country, name=countryName)

That's basically the way to do it, there is no shortcut readily available AFAIK.
You could generalize it ofcourse:
def get_or_create(session, model, defaults=None, **kwargs):
instance = session.query(model).filter_by(**kwargs).one_or_none()
if instance:
return instance, False
else:
params = {k: v for k, v in kwargs.items() if not isinstance(v, ClauseElement)}
params.update(defaults or {})
instance = model(**params)
try:
session.add(instance)
session.commit()
except Exception: # The actual exception depends on the specific database so we catch all exceptions. This is similar to the official documentation: https://docs.sqlalchemy.org/en/latest/orm/session_transaction.html
session.rollback()
instance = session.query(model).filter_by(**kwargs).one()
return instance, False
else:
return instance, True
2020 update (Python 3.9+ ONLY)
Here is a cleaner version with Python 3.9's the new dict union operator (|=)
def get_or_create(session, model, defaults=None, **kwargs):
instance = session.query(model).filter_by(**kwargs).one_or_none()
if instance:
return instance, False
else:
kwargs |= defaults or {}
instance = model(**kwargs)
try:
session.add(instance)
session.commit()
except Exception: # The actual exception depends on the specific database so we catch all exceptions. This is similar to the official documentation: https://docs.sqlalchemy.org/en/latest/orm/session_transaction.html
session.rollback()
instance = session.query(model).filter_by(**kwargs).one()
return instance, False
else:
return instance, True
Note:
Similar to the Django version this will catch duplicate key constraints and similar errors. If your get or create is not guaranteed to return a single result it can still result in race conditions.
To alleviate some of that issue you would need to add another one_or_none() style fetch right after the session.commit(). This still is no 100% guarantee against race conditions unless you also use a with_for_update() or serializable transaction mode.

I've been playing with this problem and have ended up with a fairly robust solution:
def get_one_or_create(session,
model,
create_method='',
create_method_kwargs=None,
**kwargs):
try:
return session.query(model).filter_by(**kwargs).one(), False
except NoResultFound:
kwargs.update(create_method_kwargs or {})
created = getattr(model, create_method, model)(**kwargs)
try:
session.add(created)
session.flush()
return created, True
except IntegrityError:
session.rollback()
return session.query(model).filter_by(**kwargs).one(), False
I just wrote a fairly expansive blog post on all the details, but a few quite ideas of why I used this.
It unpacks to a tuple that tells you if the object existed or not. This can often be useful in your workflow.
The function gives the ability to work with #classmethod decorated creator functions (and attributes specific to them).
The solution protects against Race Conditions when you have more than one process connected to the datastore.
EDIT: I've changed session.commit() to session.flush() as explained in this blog post. Note that these decisions are specific to the datastore used (Postgres in this case).
EDIT 2: I’ve updated using a {} as a default value in the function as this is typical Python gotcha. Thanks for the comment, Nigel! If your curious about this gotcha, check out this StackOverflow question and this blog post.

A modified version of erik's excellent answer
def get_one_or_create(session,
model,
create_method='',
create_method_kwargs=None,
**kwargs):
try:
return session.query(model).filter_by(**kwargs).one(), True
except NoResultFound:
kwargs.update(create_method_kwargs or {})
try:
with session.begin_nested():
created = getattr(model, create_method, model)(**kwargs)
session.add(created)
return created, False
except IntegrityError:
return session.query(model).filter_by(**kwargs).one(), True
Use a nested transaction to only roll back the addition of the new item instead of rolling back everything (See this answer to use nested transactions with SQLite)
Move create_method. If the created object has relations and it is assigned members through those relations, it is automatically added to the session. E.g. create a book, which has user_id and user as corresponding relationship, then doing book.user=<user object> inside of create_method will add book to the session. This means that create_method must be inside with to benefit from an eventual rollback. Note that begin_nested automatically triggers a flush.
Note that if using MySQL, the transaction isolation level must be set to READ COMMITTED rather than REPEATABLE READ for this to work. Django's get_or_create (and here) uses the same stratagem, see also the Django documentation.

This SQLALchemy recipe does the job nice and elegant.
The first thing to do is to define a function that is given a Session to work with, and associates a dictionary with the Session() which keeps track of current unique keys.
def _unique(session, cls, hashfunc, queryfunc, constructor, arg, kw):
cache = getattr(session, '_unique_cache', None)
if cache is None:
session._unique_cache = cache = {}
key = (cls, hashfunc(*arg, **kw))
if key in cache:
return cache[key]
else:
with session.no_autoflush:
q = session.query(cls)
q = queryfunc(q, *arg, **kw)
obj = q.first()
if not obj:
obj = constructor(*arg, **kw)
session.add(obj)
cache[key] = obj
return obj
An example of utilizing this function would be in a mixin:
class UniqueMixin(object):
#classmethod
def unique_hash(cls, *arg, **kw):
raise NotImplementedError()
#classmethod
def unique_filter(cls, query, *arg, **kw):
raise NotImplementedError()
#classmethod
def as_unique(cls, session, *arg, **kw):
return _unique(
session,
cls,
cls.unique_hash,
cls.unique_filter,
cls,
arg, kw
)
And finally creating the unique get_or_create model:
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
engine = create_engine('sqlite://', echo=True)
Session = sessionmaker(bind=engine)
class Widget(UniqueMixin, Base):
__tablename__ = 'widget'
id = Column(Integer, primary_key=True)
name = Column(String, unique=True, nullable=False)
#classmethod
def unique_hash(cls, name):
return name
#classmethod
def unique_filter(cls, query, name):
return query.filter(Widget.name == name)
Base.metadata.create_all(engine)
session = Session()
w1, w2, w3 = Widget.as_unique(session, name='w1'), \
Widget.as_unique(session, name='w2'), \
Widget.as_unique(session, name='w3')
w1b = Widget.as_unique(session, name='w1')
assert w1 is w1b
assert w2 is not w3
assert w2 is not w1
session.commit()
The recipe goes deeper into the idea and provides different approaches but I've used this one with great success.

The closest semantically is probably:
def get_or_create(model, **kwargs):
"""SqlAlchemy implementation of Django's get_or_create.
"""
session = Session()
instance = session.query(model).filter_by(**kwargs).first()
if instance:
return instance, False
else:
instance = model(**kwargs)
session.add(instance)
session.commit()
return instance, True
not sure how kosher it is to rely on a globally defined Session in sqlalchemy, but the Django version doesn't take a connection so...
The tuple returned contains the instance and a boolean indicating if the instance was created (i.e. it's False if we read the instance from the db).
Django's get_or_create is often used to make sure that global data is available, so I'm committing at the earliest point possible.

I slightly simplified #Kevin. solution to avoid wrapping the whole function in an if/else statement. This way there's only one return, which I find cleaner:
def get_or_create(session, model, **kwargs):
instance = session.query(model).filter_by(**kwargs).first()
if not instance:
instance = model(**kwargs)
session.add(instance)
return instance

There is a Python package that has #erik's solution as well as a version of update_or_create(). https://github.com/enricobarzetti/sqlalchemy_get_or_create

Depending on the isolation level you adopted, none of the above solutions would work.
The best solution I have found is a RAW SQL in the following form:
INSERT INTO table(f1, f2, unique_f3)
SELECT 'v1', 'v2', 'v3'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE f3 = 'v3')
This is transactionally safe whatever the isolation level and the degree of parallelism are.
Beware: in order to make it efficient, it would be wise to have an INDEX for the unique column.

One problem I regularly encounter is when a field has a max length (say, STRING(40)) and you'd like to perform a get or create with a string of large length, the above solutions will fail.
Building off of the above solutions, here's my approach:
from sqlalchemy import Column, String
def get_or_create(self, add=True, flush=True, commit=False, **kwargs):
"""
Get the an entity based on the kwargs or create an entity with those kwargs.
Params:
add: (default True) should the instance be added to the session?
flush: (default True) flush the instance to the session?
commit: (default False) commit the session?
kwargs: key, value pairs of parameters to lookup/create.
Ex: SocialPlatform.get_or_create(**{'name':'facebook'})
returns --> existing record or, will create a new record
---------
NOTE: I like to add this as a classmethod in the base class of my tables, so that
all data models inherit the base class --> functionality is transmitted across
all orm defined models.
"""
# Truncate values if necessary
for key, value in kwargs.items():
# Only use strings
if not isinstance(value, str):
continue
# Only use if it's a column
my_col = getattr(self.__table__.columns, key)
if not isinstance(my_col, Column):
continue
# Skip non strings again here
if not isinstance(my_col.type, String):
continue
# Get the max length
max_len = my_col.type.length
if value and max_len and len(value) > max_len:
# Update the value
value = value[:max_len]
kwargs[key] = value
# -------------------------------------------------
# Make the query...
instance = session.query(self).filter_by(**kwargs).first()
if instance:
return instance
else:
# Max length isn't accounted for here.
# The assumption is that auto-truncation will happen on the child-model
# Or directtly in the db
instance = self(**kwargs)
# You'll usually want to add to the session
if add:
session.add(instance)
# Navigate these with caution
if add and commit:
try:
session.commit()
except IntegrityError:
session.rollback()
elif add and flush:
session.flush()
return instance

Related

Return missing values in google app engine query

In google app engine, say I have a Parent and a Child Entity:
class Parent(ndb.Model):
pass
class Child(ndb.Model):
parent_key = ndb.KeyProperty(indexed = True)
... other properties I don't need to fetch ...
I have a list of parents' keys, say parents_list, and I'm trying to answer efficiently: what parent in parents_list has a child.
Ideally, I would run this query:
children_list = Child.query().filter(Child.parent_key = parents_list).fetch(projection = 'parent_key')
It does not work because of the projection property (parent_key) being in the equality filter. So I would have to retrieve all properties, which seems inefficient.
Is there a way to efficiently solve this?
Your child model should actually be
class Child(ndb.Model):
parent_key = ndb.KeyProperty(kind="Parent", indexed = True)
If you were doing this in Python2, you could use an ndb_tasklet (see code below; note that I haven't executed this code myself so it's not guaranteed to work; it's just here to serve as a guide but I have used tasklets in the past before). If python3, try and create async queries
class Parent(ndb.Model):
#classmethod
def parentsWithChildren(cls, parents_list):
#ndb.tasklet
def child_callback(parent_key):
q = Child.query(Child.parent_key == parent_key)
output = yield q.fetch_async(projection = 'parent_key')
raise ndb.Return ((parentKey, output))
# For each key in parents_list, invoke the callback - child_callback which returns a result only if the parent_key matches
# ndb.tasklet is asynchronous so code is run in parallel
final_output = [ child_callback(a) for a in parents_list]
return final_output

Conditionally adding several filters to a SQLAlchemy query without duplicating code

I have a SQLAlchemy model:
class Ticket(db.Model):
__tablename__ = 'ticket'
id = db.Column(INTEGER(unsigned=True), primary_key=True, nullable=False,
autoincrement=True)
cluster = db.Column(db.VARCHAR(128))
#classmethod
def get(cls, cluster=None):
query = db.session.query(Ticket)
if cluster is not None:
query = query.filter(Ticket.cluster==cluster)
return query.one()
If I add a new column and would like to extend the get method, I have to add one if xxx is not None like this below:
#classmethod
def get(cls, cluster=None, user=None):
query = db.session.query(Ticket)
if cluster is not None:
query = query.filter(Ticket.cluster==cluster)
if user is not None:
query = query.filter(Ticket.user==user)
return query.one()
Is there any way I could make this more efficient? If I have too many columns, the get method would become so ugly.
As always, if you don't want to write something repetitive, use a loop:
#classmethod
def get(cls, **kwargs):
query = db.session.query(Ticket)
for k, v in kwargs.items():
query = query.filter(getattr(table, k) == v)
return query.one()
Because we're no longer setting the cluster=None/user=None as defaults (but instead depending on things that weren't specified by the caller simply never being added to kwargs), we no longer need to prevent filters for null values from being added: The only way a null value will end up in the argument list is if the user actually asked to search for a value of None; so this new code is able to honor that request should it ever take place.
If you prefer to retain the calling convention where cluster and user can be passed positionally (but the user can't search for a value of None), see the initial version of this answer.

Tracking model changes in SQLAlchemy

I want to log every action what will be done with some SQLAlchemy-Models.
So, I have a after_insert, after_delete and before_update hooks, where I will save previous and current representation of model,
def keep_logs(cls):
#event.listens_for(cls, 'after_delete')
def after_delete_trigger(mapper, connection, target):
pass
#event.listens_for(cls, 'after_insert')
def after_insert_trigger(mapper, connection, target):
pass
#event.listens_for(cls, 'before_update')
def before_update_trigger(mapper, connection, target):
prev = cls.query.filter_by(id=target.id).one()
# comparing previous and current model
MODELS_TO_LOGGING = (
User,
)
for cls in MODELS_TO_LOGGING:
keep_logs(cls)
But there is one problem: when I'm trying to find model in before_update hook, SQLA returns modified (dirty) version.
How can I get previous version of model before updating it?
Is there a different way to keep model changes?
Thanks!
SQLAlchemy tracks the changes to each attribute. You don't need to (and shouldn't) query the instance again in the event. Additionally, the event is triggered for any instance that has been modified, even if that modification will not change any data. Loop over each column, checking if it has been modified, and store any new values.
#event.listens_for(cls, 'before_update')
def before_update(mapper, connection, target):
state = db.inspect(target)
changes = {}
for attr in state.attrs:
hist = attr.load_history()
if not hist.has_changes():
continue
# hist.deleted holds old value
# hist.added holds new value
changes[attr.key] = hist.added
# now changes map keys to new values
I had a similar problem but wanted to be able to keep track of the deltas as changes are made to sqlalchemy models instead of just the new values. I wrote this slight extension to davidism's answer to do that along with slightly better handling of before and after, since they are lists sometimes or empty tuples other times:
from sqlalchemy import inspect
def get_model_changes(model):
"""
Return a dictionary containing changes made to the model since it was
fetched from the database.
The dictionary is of the form {'property_name': [old_value, new_value]}
Example:
user = get_user_by_id(420)
>>> '<User id=402 email="business_email#gmail.com">'
get_model_changes(user)
>>> {}
user.email = 'new_email#who-dis.biz'
get_model_changes(user)
>>> {'email': ['business_email#gmail.com', 'new_email#who-dis.biz']}
"""
state = inspect(model)
changes = {}
for attr in state.attrs:
hist = state.get_history(attr.key, True)
if not hist.has_changes():
continue
old_value = hist.deleted[0] if hist.deleted else None
new_value = hist.added[0] if hist.added else None
changes[attr.key] = [old_value, new_value]
return changes
def has_model_changed(model):
"""
Return True if there are any unsaved changes on the model.
"""
return bool(get_model_changes(model))
If an attribute is expired (which sessions do by default on commit) the old value is not available unless it was loaded before being changed. You can see this with the inspection.
state = inspect(entity)
session.commit()
state.attrs.my_attribute.history # History(added=None, unchanged=None, deleted=None)
# Load history manually
state.attrs.my_attribute.load_history()
state.attrs.my_attribute.history # History(added=(), unchanged=['my_value'], deleted=())
In order for attributes to stay loaded you can not expire entities by settings expire_on_commit to False on the session.

What is the proper way to delineate modules and classes in Python?

I am new to Python, and I'm starting to learn the basics of the code structure. I've got a basic app that I'm working on up on my Github.
For my simple app, I'm create a basic "Evernote-like" service which allows the user to create and edit a list of notes. In the early design, I have a Note object and a Notepad object, which is effectively a list of notes. Presently, I have the following file structure:
Notes.py
|
|------ Notepad (class)
|------ Note (class)
From my current understanding and implementation, this translates into the "Notes" module having a Notepad class and Note class, so when I do an import, I'm saying "from Notes import Notepad / from Notes import Note".
Is this the right approach? I feel, out of Java habit, that I should have a folder for Notes and the two classes as individual files.
My goal here is to understand what the best practice is.
As long as the classes are rather small put them into one file.
You can still move them later, if necessary.
Actually, it is rather common for larger projects to have a rather deep hierarchy but expose a more flat one to the user. So if you move things later but would like still have notes.Note even though the class Note moved deeper, it would be simple to just import note.path.to.module.Note into notes and the user can get it from there. You don't have to do that but you can. So even if you change your mind later but would like to keep the API, no problem.
I've been working in a similar application myself. I can't say this is the best possible approach, but it served me well. The classes are meant to interact with the database (context) when the user makes a request (http request, this is a webapp).
# -*- coding: utf-8 -*-
import json
import datetime
class Note ():
"""A note. This class is part of the data model and is instantiated every
time there access to the database"""
def __init__(self, noteid = 0, note = "", date = datetime.datetime.now(), context = None):
self.id = noteid
self.note = note
self.date = date
self.ctx = context #context holds the db connection and some globals
def get(self):
"""Get the current object from the database. This function needs the
instance to have an id"""
if id == 0:
raise self.ctx.ApplicationError(404, ("No note with id 0 exists"))
cursor = self.ctx.db.conn.cursor()
cursor.execute("select note, date from %s.notes where id=%s" %
(self.ctx.db.DB_NAME, str(self.id)))
data = cursor.fetchone()
if not data:
raise self.ctx.ApplicationError(404, ("No note with id "
+ self.id + " was found"))
self.note = data[0]
self.date = data[1]
return self
def insert(self, user):
"""This function inserts the object to the database. It can be an empty
note. User must be authenticated to add notes (authentication handled
elsewhere)"""
cursor = self.ctx.db.conn.cursor()
query = ("insert into %s.notes (note, owner) values ('%s', '%s')" %
(self.ctx.db.DB_NAME, str(self.note), str(user['id'])))
cursor.execute(query)
return self
def put(self):
"""Modify the current note in the database"""
cursor = self.ctx.db.conn.cursor()
query = ("update %s.notes set note = '%s' where id = %s" %
(self.ctx.db.DB_NAME, str(self.note), str(self.id)))
cursor.execute(query)
return self
def delete(self):
"""Delete the current note, by id"""
if self.id == 0:
raise self.ctx.ApplicationError(404, "No note with id 0 exists")
cursor = self.ctx.db.conn.cursor()
query = ("delete from %s.notes where id = %s" %
(self.ctx.db.DB_NAME, str(self.id)))
cursor.execute(query)
def toJson(self):
"""Returns a json string of the note object's data attributes"""
return json.dumps(self.toDict())
def toDict(self):
"""Returns a dict of the note object's data attributes"""
return {
"id" : self.id,
"note" : self.note,
"date" : self.date.strftime("%Y-%m-%d %H:%M:%S")
}
class NotesCollection():
"""This class handles the notes as a collection"""
collection = []
def get(self, user, context):
"""Populate the collection object and return it"""
cursor = context.db.conn.cursor()
cursor.execute("select id, note, date from %s.notes where owner=%s" %
(context.db.DB_NAME, str(user["id"])))
note = cursor.fetchone()
while note:
self.collection.append(Note(note[0], note[1],note[2]))
note = cursor.fetchone()
return self
def toJson(self):
"""Return a json string of the current collection"""
return json.dumps([note.toDict() for note in self.collection])
I personally use python as a "get it done" language, and don't bother myself with details. This shows in the code above. However one piece of advice: There are no private variables nor methods in python, so don't bother trying to create them. Make your life easier, code fast, get it done
Usage example:
class NotesCollection(BaseHandler):
#tornado.web.authenticated
def get(self):
"""Retrieve all notes from the current user and return a json object"""
allNotes = Note.NotesCollection().get(self.get_current_user(), settings["context"])
json = allNotes.toJson()
self.write(json)
#protected
#tornado.web.authenticated
def post(self):
"""Handles all post requests to /notes"""
requestType = self.get_argument("type", "POST")
ctx = settings["context"]
if requestType == "POST":
Note.Note(note = self.get_argument("note", ""),
context = ctx).insert(self.get_current_user())
elif requestType == "DELETE":
Note.Note(id = self.get_argument("id"), context = ctx).delete()
elif requestType == "PUT":
Note.Note(id = self.get_argument("id"),
note = self.get_argument("note"),
context = ctx).put()
else:
raise ApplicationError(405, "Method not allowed")
By using decorators I'm getting user authentication and error handling out of the main code. This makes it clearer and easier to mantain.

Fastest Way to Create a New Object Only if it Doesn't Already Exist (SQLAlchemy) [duplicate]

I want to get an object from the database if it already exists (based on provided parameters) or create it if it does not.
Django's get_or_create (or source) does this. Is there an equivalent shortcut in SQLAlchemy?
I'm currently writing it out explicitly like this:
def get_or_create_instrument(session, serial_number):
instrument = session.query(Instrument).filter_by(serial_number=serial_number).first()
if instrument:
return instrument
else:
instrument = Instrument(serial_number)
session.add(instrument)
return instrument
Following the solution of #WoLpH, this is the code that worked for me (simple version):
def get_or_create(session, model, **kwargs):
instance = session.query(model).filter_by(**kwargs).first()
if instance:
return instance
else:
instance = model(**kwargs)
session.add(instance)
session.commit()
return instance
With this, I'm able to get_or_create any object of my model.
Suppose my model object is :
class Country(Base):
__tablename__ = 'countries'
id = Column(Integer, primary_key=True)
name = Column(String, unique=True)
To get or create my object I write :
myCountry = get_or_create(session, Country, name=countryName)
That's basically the way to do it, there is no shortcut readily available AFAIK.
You could generalize it ofcourse:
def get_or_create(session, model, defaults=None, **kwargs):
instance = session.query(model).filter_by(**kwargs).one_or_none()
if instance:
return instance, False
else:
params = {k: v for k, v in kwargs.items() if not isinstance(v, ClauseElement)}
params.update(defaults or {})
instance = model(**params)
try:
session.add(instance)
session.commit()
except Exception: # The actual exception depends on the specific database so we catch all exceptions. This is similar to the official documentation: https://docs.sqlalchemy.org/en/latest/orm/session_transaction.html
session.rollback()
instance = session.query(model).filter_by(**kwargs).one()
return instance, False
else:
return instance, True
2020 update (Python 3.9+ ONLY)
Here is a cleaner version with Python 3.9's the new dict union operator (|=)
def get_or_create(session, model, defaults=None, **kwargs):
instance = session.query(model).filter_by(**kwargs).one_or_none()
if instance:
return instance, False
else:
kwargs |= defaults or {}
instance = model(**kwargs)
try:
session.add(instance)
session.commit()
except Exception: # The actual exception depends on the specific database so we catch all exceptions. This is similar to the official documentation: https://docs.sqlalchemy.org/en/latest/orm/session_transaction.html
session.rollback()
instance = session.query(model).filter_by(**kwargs).one()
return instance, False
else:
return instance, True
Note:
Similar to the Django version this will catch duplicate key constraints and similar errors. If your get or create is not guaranteed to return a single result it can still result in race conditions.
To alleviate some of that issue you would need to add another one_or_none() style fetch right after the session.commit(). This still is no 100% guarantee against race conditions unless you also use a with_for_update() or serializable transaction mode.
I've been playing with this problem and have ended up with a fairly robust solution:
def get_one_or_create(session,
model,
create_method='',
create_method_kwargs=None,
**kwargs):
try:
return session.query(model).filter_by(**kwargs).one(), False
except NoResultFound:
kwargs.update(create_method_kwargs or {})
created = getattr(model, create_method, model)(**kwargs)
try:
session.add(created)
session.flush()
return created, True
except IntegrityError:
session.rollback()
return session.query(model).filter_by(**kwargs).one(), False
I just wrote a fairly expansive blog post on all the details, but a few quite ideas of why I used this.
It unpacks to a tuple that tells you if the object existed or not. This can often be useful in your workflow.
The function gives the ability to work with #classmethod decorated creator functions (and attributes specific to them).
The solution protects against Race Conditions when you have more than one process connected to the datastore.
EDIT: I've changed session.commit() to session.flush() as explained in this blog post. Note that these decisions are specific to the datastore used (Postgres in this case).
EDIT 2: I’ve updated using a {} as a default value in the function as this is typical Python gotcha. Thanks for the comment, Nigel! If your curious about this gotcha, check out this StackOverflow question and this blog post.
A modified version of erik's excellent answer
def get_one_or_create(session,
model,
create_method='',
create_method_kwargs=None,
**kwargs):
try:
return session.query(model).filter_by(**kwargs).one(), True
except NoResultFound:
kwargs.update(create_method_kwargs or {})
try:
with session.begin_nested():
created = getattr(model, create_method, model)(**kwargs)
session.add(created)
return created, False
except IntegrityError:
return session.query(model).filter_by(**kwargs).one(), True
Use a nested transaction to only roll back the addition of the new item instead of rolling back everything (See this answer to use nested transactions with SQLite)
Move create_method. If the created object has relations and it is assigned members through those relations, it is automatically added to the session. E.g. create a book, which has user_id and user as corresponding relationship, then doing book.user=<user object> inside of create_method will add book to the session. This means that create_method must be inside with to benefit from an eventual rollback. Note that begin_nested automatically triggers a flush.
Note that if using MySQL, the transaction isolation level must be set to READ COMMITTED rather than REPEATABLE READ for this to work. Django's get_or_create (and here) uses the same stratagem, see also the Django documentation.
This SQLALchemy recipe does the job nice and elegant.
The first thing to do is to define a function that is given a Session to work with, and associates a dictionary with the Session() which keeps track of current unique keys.
def _unique(session, cls, hashfunc, queryfunc, constructor, arg, kw):
cache = getattr(session, '_unique_cache', None)
if cache is None:
session._unique_cache = cache = {}
key = (cls, hashfunc(*arg, **kw))
if key in cache:
return cache[key]
else:
with session.no_autoflush:
q = session.query(cls)
q = queryfunc(q, *arg, **kw)
obj = q.first()
if not obj:
obj = constructor(*arg, **kw)
session.add(obj)
cache[key] = obj
return obj
An example of utilizing this function would be in a mixin:
class UniqueMixin(object):
#classmethod
def unique_hash(cls, *arg, **kw):
raise NotImplementedError()
#classmethod
def unique_filter(cls, query, *arg, **kw):
raise NotImplementedError()
#classmethod
def as_unique(cls, session, *arg, **kw):
return _unique(
session,
cls,
cls.unique_hash,
cls.unique_filter,
cls,
arg, kw
)
And finally creating the unique get_or_create model:
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
engine = create_engine('sqlite://', echo=True)
Session = sessionmaker(bind=engine)
class Widget(UniqueMixin, Base):
__tablename__ = 'widget'
id = Column(Integer, primary_key=True)
name = Column(String, unique=True, nullable=False)
#classmethod
def unique_hash(cls, name):
return name
#classmethod
def unique_filter(cls, query, name):
return query.filter(Widget.name == name)
Base.metadata.create_all(engine)
session = Session()
w1, w2, w3 = Widget.as_unique(session, name='w1'), \
Widget.as_unique(session, name='w2'), \
Widget.as_unique(session, name='w3')
w1b = Widget.as_unique(session, name='w1')
assert w1 is w1b
assert w2 is not w3
assert w2 is not w1
session.commit()
The recipe goes deeper into the idea and provides different approaches but I've used this one with great success.
The closest semantically is probably:
def get_or_create(model, **kwargs):
"""SqlAlchemy implementation of Django's get_or_create.
"""
session = Session()
instance = session.query(model).filter_by(**kwargs).first()
if instance:
return instance, False
else:
instance = model(**kwargs)
session.add(instance)
session.commit()
return instance, True
not sure how kosher it is to rely on a globally defined Session in sqlalchemy, but the Django version doesn't take a connection so...
The tuple returned contains the instance and a boolean indicating if the instance was created (i.e. it's False if we read the instance from the db).
Django's get_or_create is often used to make sure that global data is available, so I'm committing at the earliest point possible.
I slightly simplified #Kevin. solution to avoid wrapping the whole function in an if/else statement. This way there's only one return, which I find cleaner:
def get_or_create(session, model, **kwargs):
instance = session.query(model).filter_by(**kwargs).first()
if not instance:
instance = model(**kwargs)
session.add(instance)
return instance
There is a Python package that has #erik's solution as well as a version of update_or_create(). https://github.com/enricobarzetti/sqlalchemy_get_or_create
Depending on the isolation level you adopted, none of the above solutions would work.
The best solution I have found is a RAW SQL in the following form:
INSERT INTO table(f1, f2, unique_f3)
SELECT 'v1', 'v2', 'v3'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE f3 = 'v3')
This is transactionally safe whatever the isolation level and the degree of parallelism are.
Beware: in order to make it efficient, it would be wise to have an INDEX for the unique column.
One problem I regularly encounter is when a field has a max length (say, STRING(40)) and you'd like to perform a get or create with a string of large length, the above solutions will fail.
Building off of the above solutions, here's my approach:
from sqlalchemy import Column, String
def get_or_create(self, add=True, flush=True, commit=False, **kwargs):
"""
Get the an entity based on the kwargs or create an entity with those kwargs.
Params:
add: (default True) should the instance be added to the session?
flush: (default True) flush the instance to the session?
commit: (default False) commit the session?
kwargs: key, value pairs of parameters to lookup/create.
Ex: SocialPlatform.get_or_create(**{'name':'facebook'})
returns --> existing record or, will create a new record
---------
NOTE: I like to add this as a classmethod in the base class of my tables, so that
all data models inherit the base class --> functionality is transmitted across
all orm defined models.
"""
# Truncate values if necessary
for key, value in kwargs.items():
# Only use strings
if not isinstance(value, str):
continue
# Only use if it's a column
my_col = getattr(self.__table__.columns, key)
if not isinstance(my_col, Column):
continue
# Skip non strings again here
if not isinstance(my_col.type, String):
continue
# Get the max length
max_len = my_col.type.length
if value and max_len and len(value) > max_len:
# Update the value
value = value[:max_len]
kwargs[key] = value
# -------------------------------------------------
# Make the query...
instance = session.query(self).filter_by(**kwargs).first()
if instance:
return instance
else:
# Max length isn't accounted for here.
# The assumption is that auto-truncation will happen on the child-model
# Or directtly in the db
instance = self(**kwargs)
# You'll usually want to add to the session
if add:
session.add(instance)
# Navigate these with caution
if add and commit:
try:
session.commit()
except IntegrityError:
session.rollback()
elif add and flush:
session.flush()
return instance

Categories