sqlalchemy lookup tables - python

Hi I have a table in 3NF form
ftype_table = Table(
'FTYPE',
Column('ftypeid', Integer, primary_key=True),
Column('typename', String(50)),
base.metadata,
schema='TEMP')
file_table = Table(
'FILE',
base.metadata,
Column('fileid', Integer, primary_key=True),
Column('datatypeid', Integer, ForeignKey(ftype_table.c.datatypeid)),
Column('size', Integer),
schema='TEMP')
and mappers
class File(object): pass
class FileType(object): pass
mapper(File, file_table, properties={'filetype': relation(FileType)})
mapper(FileType, file_table)
suppose Ftype table contains 1:TXT 2:AVI 3:PPT
what i would like to do is the following if i create a File object like this:
file=File()
file.size=10
file.filetype= FileType('PPT')
Session.save(file)
Session.flush()
is that the File table contains fileid:xxx,size:10, datatypeid:3
Unfortunately an entry gets added to the FileType table and this id gets propagated to the File table.
Is there a smart way to do achieve the above with sqlalchemy witout the need to do a query on the FileType table to see if the entry exist or not
Thanks

the UniqueObject recipe is the standard answer here: http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject . The idea is to override the creation of File using either __metaclass__.call() or File.__new__() to return the already-existing object, from the DB or from cache (the initial DB lookup, if the object isn't already present, is obviously unavoidable unless something constructed around MySQL's REPLACE is used).
edit: since I've been working on the usage recipes, I've rewritten the unique object recipe to be more portable and updated for 0.5/0.6.

Just create a cache of FileType objects, so that the database lookup occurs only the first time you use a given file type:
class FileTypeCache(dict):
def __missing__(self, key):
obj = self[key] = Session.query(FileType).filter_by(typename=key).one()
return obj
filetype_cache = FileTypeCache()
file=File()
file.size=10
file.filetype= filetype_cache['PPT']
should work, modulo typos.

Since declarative_base and zzzeek code does not work with sqlalchemy 0.4, I
used the following cache so that new objects also stay unique if they are not present in the db
class FileTypeCache(dict):
def __missing__(self, key):
try:
obj = self[key] = Session.query(FileType).filter_by(typename=key).one()
return obj
except InvalidRequestError:
return obj=self[key]= FileType(key)
return obj
override eq of FileType
class FileType(object):
def __init__(self, typename)
self.typename=typename
def __eq__(self):
if isinstance(other, FileType):
return self.typename == other.typename
else:
return False

Related

Domain Model id with equality and SqlAlchemy autoincrement?

Say I have a domain model with an id field plus eq and hash. Then there is a simple SqlAlchemy ORM mapping.
#dataclass
class Foo:
_id: int
value: 50
def __eq__(self, other):
if not isinstance(other, Foo):
return False
return other._id == self._id
def __hash__(self):
return hash(self._id)
foo = Table(
"foo",
mapper_registry.metadata,
Column("_id", Integer, primary_key=True, autoincrement=True),
Column("value", Float, nullable=False)
mapper_registry.map_imperatively(Foo, foo)
This seems simple enough and follows the documentation for SqlAlchemy. My problem is the _id column/property. If I create a test to load Foo from the database:
def test_foo_mapper_can_load_foos(session):
with session:
session.execute(
'INSERT INTO foo (value)'
'VALUES (50)'
)
session.commit()
expected = [
Foo(_id=1, value=50),
]
assert session.query(Foo).all() == expected
this works, fine. The model is initialised with ids from the database.
But, what about initialising a model to commit to the database. If the client creates a new foo to try to write to the database, how should I approach the id for the model, before gets committed?
def test_foo_mapper_can_save_foos(session):
#option1 - manually set it (collides with auto_increment)
new_foo = Foo(_id=1, value: 50)
#option2 - set to None (collides with __eq__/hashing)
new foo = Foo(_id=None, value: 50)
#option3 - get rid of id from domain model and only have it in db
new_foo = Foo(value: 50)
session.add(new_foo)
session.commit()
rows = list(session.execute('SELECT * FROM "foo"'))
assert rows == [(1, 50)]
Each of the test options can work, but none of the implementations seem like good code.
In option 1, when the client creates a new foo an id must be set, the dataclass requires it in the constructor... but it seems to not really be in line with the auto_increment primary key idea on the table - the client can pass any id, whether it is the next in sequence or not and the mapper will try to use it. I feel the database should be responsible for setting primary key.
So, on to option 2. Set the id to None, and the database will take care of it on commit. However, the eq and hash functions rely on id for equality and the object becomes unhashable. This could also be done by setting _id: int = None as a default value on the domain model itself. But again, seems like a bad solution.
Finally option 3... remove the _id field from the domain model... which has popped up in a couple of articles, but also seem less than ideal as Foo now has no unique id for use in select statements, and use in other business logic etc...
I'm sure I'm thinking about this all wrong, I just can't figure out where.

SQLAlchemy insert new records with many to one relationship [duplicate]

Hi I have a table in 3NF form
ftype_table = Table(
'FTYPE',
Column('ftypeid', Integer, primary_key=True),
Column('typename', String(50)),
base.metadata,
schema='TEMP')
file_table = Table(
'FILE',
base.metadata,
Column('fileid', Integer, primary_key=True),
Column('datatypeid', Integer, ForeignKey(ftype_table.c.datatypeid)),
Column('size', Integer),
schema='TEMP')
and mappers
class File(object): pass
class FileType(object): pass
mapper(File, file_table, properties={'filetype': relation(FileType)})
mapper(FileType, file_table)
suppose Ftype table contains 1:TXT 2:AVI 3:PPT
what i would like to do is the following if i create a File object like this:
file=File()
file.size=10
file.filetype= FileType('PPT')
Session.save(file)
Session.flush()
is that the File table contains fileid:xxx,size:10, datatypeid:3
Unfortunately an entry gets added to the FileType table and this id gets propagated to the File table.
Is there a smart way to do achieve the above with sqlalchemy witout the need to do a query on the FileType table to see if the entry exist or not
Thanks
the UniqueObject recipe is the standard answer here: http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject . The idea is to override the creation of File using either __metaclass__.call() or File.__new__() to return the already-existing object, from the DB or from cache (the initial DB lookup, if the object isn't already present, is obviously unavoidable unless something constructed around MySQL's REPLACE is used).
edit: since I've been working on the usage recipes, I've rewritten the unique object recipe to be more portable and updated for 0.5/0.6.
Just create a cache of FileType objects, so that the database lookup occurs only the first time you use a given file type:
class FileTypeCache(dict):
def __missing__(self, key):
obj = self[key] = Session.query(FileType).filter_by(typename=key).one()
return obj
filetype_cache = FileTypeCache()
file=File()
file.size=10
file.filetype= filetype_cache['PPT']
should work, modulo typos.
Since declarative_base and zzzeek code does not work with sqlalchemy 0.4, I
used the following cache so that new objects also stay unique if they are not present in the db
class FileTypeCache(dict):
def __missing__(self, key):
try:
obj = self[key] = Session.query(FileType).filter_by(typename=key).one()
return obj
except InvalidRequestError:
return obj=self[key]= FileType(key)
return obj
override eq of FileType
class FileType(object):
def __init__(self, typename)
self.typename=typename
def __eq__(self):
if isinstance(other, FileType):
return self.typename == other.typename
else:
return False

To lock or to catch IntegrityError? [duplicate]

I want to get an object from the database if it already exists (based on provided parameters) or create it if it does not.
Django's get_or_create (or source) does this. Is there an equivalent shortcut in SQLAlchemy?
I'm currently writing it out explicitly like this:
def get_or_create_instrument(session, serial_number):
instrument = session.query(Instrument).filter_by(serial_number=serial_number).first()
if instrument:
return instrument
else:
instrument = Instrument(serial_number)
session.add(instrument)
return instrument
Following the solution of #WoLpH, this is the code that worked for me (simple version):
def get_or_create(session, model, **kwargs):
instance = session.query(model).filter_by(**kwargs).first()
if instance:
return instance
else:
instance = model(**kwargs)
session.add(instance)
session.commit()
return instance
With this, I'm able to get_or_create any object of my model.
Suppose my model object is :
class Country(Base):
__tablename__ = 'countries'
id = Column(Integer, primary_key=True)
name = Column(String, unique=True)
To get or create my object I write :
myCountry = get_or_create(session, Country, name=countryName)
That's basically the way to do it, there is no shortcut readily available AFAIK.
You could generalize it ofcourse:
def get_or_create(session, model, defaults=None, **kwargs):
instance = session.query(model).filter_by(**kwargs).one_or_none()
if instance:
return instance, False
else:
params = {k: v for k, v in kwargs.items() if not isinstance(v, ClauseElement)}
params.update(defaults or {})
instance = model(**params)
try:
session.add(instance)
session.commit()
except Exception: # The actual exception depends on the specific database so we catch all exceptions. This is similar to the official documentation: https://docs.sqlalchemy.org/en/latest/orm/session_transaction.html
session.rollback()
instance = session.query(model).filter_by(**kwargs).one()
return instance, False
else:
return instance, True
2020 update (Python 3.9+ ONLY)
Here is a cleaner version with Python 3.9's the new dict union operator (|=)
def get_or_create(session, model, defaults=None, **kwargs):
instance = session.query(model).filter_by(**kwargs).one_or_none()
if instance:
return instance, False
else:
kwargs |= defaults or {}
instance = model(**kwargs)
try:
session.add(instance)
session.commit()
except Exception: # The actual exception depends on the specific database so we catch all exceptions. This is similar to the official documentation: https://docs.sqlalchemy.org/en/latest/orm/session_transaction.html
session.rollback()
instance = session.query(model).filter_by(**kwargs).one()
return instance, False
else:
return instance, True
Note:
Similar to the Django version this will catch duplicate key constraints and similar errors. If your get or create is not guaranteed to return a single result it can still result in race conditions.
To alleviate some of that issue you would need to add another one_or_none() style fetch right after the session.commit(). This still is no 100% guarantee against race conditions unless you also use a with_for_update() or serializable transaction mode.
I've been playing with this problem and have ended up with a fairly robust solution:
def get_one_or_create(session,
model,
create_method='',
create_method_kwargs=None,
**kwargs):
try:
return session.query(model).filter_by(**kwargs).one(), False
except NoResultFound:
kwargs.update(create_method_kwargs or {})
created = getattr(model, create_method, model)(**kwargs)
try:
session.add(created)
session.flush()
return created, True
except IntegrityError:
session.rollback()
return session.query(model).filter_by(**kwargs).one(), False
I just wrote a fairly expansive blog post on all the details, but a few quite ideas of why I used this.
It unpacks to a tuple that tells you if the object existed or not. This can often be useful in your workflow.
The function gives the ability to work with #classmethod decorated creator functions (and attributes specific to them).
The solution protects against Race Conditions when you have more than one process connected to the datastore.
EDIT: I've changed session.commit() to session.flush() as explained in this blog post. Note that these decisions are specific to the datastore used (Postgres in this case).
EDIT 2: I’ve updated using a {} as a default value in the function as this is typical Python gotcha. Thanks for the comment, Nigel! If your curious about this gotcha, check out this StackOverflow question and this blog post.
A modified version of erik's excellent answer
def get_one_or_create(session,
model,
create_method='',
create_method_kwargs=None,
**kwargs):
try:
return session.query(model).filter_by(**kwargs).one(), True
except NoResultFound:
kwargs.update(create_method_kwargs or {})
try:
with session.begin_nested():
created = getattr(model, create_method, model)(**kwargs)
session.add(created)
return created, False
except IntegrityError:
return session.query(model).filter_by(**kwargs).one(), True
Use a nested transaction to only roll back the addition of the new item instead of rolling back everything (See this answer to use nested transactions with SQLite)
Move create_method. If the created object has relations and it is assigned members through those relations, it is automatically added to the session. E.g. create a book, which has user_id and user as corresponding relationship, then doing book.user=<user object> inside of create_method will add book to the session. This means that create_method must be inside with to benefit from an eventual rollback. Note that begin_nested automatically triggers a flush.
Note that if using MySQL, the transaction isolation level must be set to READ COMMITTED rather than REPEATABLE READ for this to work. Django's get_or_create (and here) uses the same stratagem, see also the Django documentation.
This SQLALchemy recipe does the job nice and elegant.
The first thing to do is to define a function that is given a Session to work with, and associates a dictionary with the Session() which keeps track of current unique keys.
def _unique(session, cls, hashfunc, queryfunc, constructor, arg, kw):
cache = getattr(session, '_unique_cache', None)
if cache is None:
session._unique_cache = cache = {}
key = (cls, hashfunc(*arg, **kw))
if key in cache:
return cache[key]
else:
with session.no_autoflush:
q = session.query(cls)
q = queryfunc(q, *arg, **kw)
obj = q.first()
if not obj:
obj = constructor(*arg, **kw)
session.add(obj)
cache[key] = obj
return obj
An example of utilizing this function would be in a mixin:
class UniqueMixin(object):
#classmethod
def unique_hash(cls, *arg, **kw):
raise NotImplementedError()
#classmethod
def unique_filter(cls, query, *arg, **kw):
raise NotImplementedError()
#classmethod
def as_unique(cls, session, *arg, **kw):
return _unique(
session,
cls,
cls.unique_hash,
cls.unique_filter,
cls,
arg, kw
)
And finally creating the unique get_or_create model:
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
engine = create_engine('sqlite://', echo=True)
Session = sessionmaker(bind=engine)
class Widget(UniqueMixin, Base):
__tablename__ = 'widget'
id = Column(Integer, primary_key=True)
name = Column(String, unique=True, nullable=False)
#classmethod
def unique_hash(cls, name):
return name
#classmethod
def unique_filter(cls, query, name):
return query.filter(Widget.name == name)
Base.metadata.create_all(engine)
session = Session()
w1, w2, w3 = Widget.as_unique(session, name='w1'), \
Widget.as_unique(session, name='w2'), \
Widget.as_unique(session, name='w3')
w1b = Widget.as_unique(session, name='w1')
assert w1 is w1b
assert w2 is not w3
assert w2 is not w1
session.commit()
The recipe goes deeper into the idea and provides different approaches but I've used this one with great success.
The closest semantically is probably:
def get_or_create(model, **kwargs):
"""SqlAlchemy implementation of Django's get_or_create.
"""
session = Session()
instance = session.query(model).filter_by(**kwargs).first()
if instance:
return instance, False
else:
instance = model(**kwargs)
session.add(instance)
session.commit()
return instance, True
not sure how kosher it is to rely on a globally defined Session in sqlalchemy, but the Django version doesn't take a connection so...
The tuple returned contains the instance and a boolean indicating if the instance was created (i.e. it's False if we read the instance from the db).
Django's get_or_create is often used to make sure that global data is available, so I'm committing at the earliest point possible.
I slightly simplified #Kevin. solution to avoid wrapping the whole function in an if/else statement. This way there's only one return, which I find cleaner:
def get_or_create(session, model, **kwargs):
instance = session.query(model).filter_by(**kwargs).first()
if not instance:
instance = model(**kwargs)
session.add(instance)
return instance
There is a Python package that has #erik's solution as well as a version of update_or_create(). https://github.com/enricobarzetti/sqlalchemy_get_or_create
Depending on the isolation level you adopted, none of the above solutions would work.
The best solution I have found is a RAW SQL in the following form:
INSERT INTO table(f1, f2, unique_f3)
SELECT 'v1', 'v2', 'v3'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE f3 = 'v3')
This is transactionally safe whatever the isolation level and the degree of parallelism are.
Beware: in order to make it efficient, it would be wise to have an INDEX for the unique column.
One problem I regularly encounter is when a field has a max length (say, STRING(40)) and you'd like to perform a get or create with a string of large length, the above solutions will fail.
Building off of the above solutions, here's my approach:
from sqlalchemy import Column, String
def get_or_create(self, add=True, flush=True, commit=False, **kwargs):
"""
Get the an entity based on the kwargs or create an entity with those kwargs.
Params:
add: (default True) should the instance be added to the session?
flush: (default True) flush the instance to the session?
commit: (default False) commit the session?
kwargs: key, value pairs of parameters to lookup/create.
Ex: SocialPlatform.get_or_create(**{'name':'facebook'})
returns --> existing record or, will create a new record
---------
NOTE: I like to add this as a classmethod in the base class of my tables, so that
all data models inherit the base class --> functionality is transmitted across
all orm defined models.
"""
# Truncate values if necessary
for key, value in kwargs.items():
# Only use strings
if not isinstance(value, str):
continue
# Only use if it's a column
my_col = getattr(self.__table__.columns, key)
if not isinstance(my_col, Column):
continue
# Skip non strings again here
if not isinstance(my_col.type, String):
continue
# Get the max length
max_len = my_col.type.length
if value and max_len and len(value) > max_len:
# Update the value
value = value[:max_len]
kwargs[key] = value
# -------------------------------------------------
# Make the query...
instance = session.query(self).filter_by(**kwargs).first()
if instance:
return instance
else:
# Max length isn't accounted for here.
# The assumption is that auto-truncation will happen on the child-model
# Or directtly in the db
instance = self(**kwargs)
# You'll usually want to add to the session
if add:
session.add(instance)
# Navigate these with caution
if add and commit:
try:
session.commit()
except IntegrityError:
session.rollback()
elif add and flush:
session.flush()
return instance

Fastest Way to Create a New Object Only if it Doesn't Already Exist (SQLAlchemy) [duplicate]

I want to get an object from the database if it already exists (based on provided parameters) or create it if it does not.
Django's get_or_create (or source) does this. Is there an equivalent shortcut in SQLAlchemy?
I'm currently writing it out explicitly like this:
def get_or_create_instrument(session, serial_number):
instrument = session.query(Instrument).filter_by(serial_number=serial_number).first()
if instrument:
return instrument
else:
instrument = Instrument(serial_number)
session.add(instrument)
return instrument
Following the solution of #WoLpH, this is the code that worked for me (simple version):
def get_or_create(session, model, **kwargs):
instance = session.query(model).filter_by(**kwargs).first()
if instance:
return instance
else:
instance = model(**kwargs)
session.add(instance)
session.commit()
return instance
With this, I'm able to get_or_create any object of my model.
Suppose my model object is :
class Country(Base):
__tablename__ = 'countries'
id = Column(Integer, primary_key=True)
name = Column(String, unique=True)
To get or create my object I write :
myCountry = get_or_create(session, Country, name=countryName)
That's basically the way to do it, there is no shortcut readily available AFAIK.
You could generalize it ofcourse:
def get_or_create(session, model, defaults=None, **kwargs):
instance = session.query(model).filter_by(**kwargs).one_or_none()
if instance:
return instance, False
else:
params = {k: v for k, v in kwargs.items() if not isinstance(v, ClauseElement)}
params.update(defaults or {})
instance = model(**params)
try:
session.add(instance)
session.commit()
except Exception: # The actual exception depends on the specific database so we catch all exceptions. This is similar to the official documentation: https://docs.sqlalchemy.org/en/latest/orm/session_transaction.html
session.rollback()
instance = session.query(model).filter_by(**kwargs).one()
return instance, False
else:
return instance, True
2020 update (Python 3.9+ ONLY)
Here is a cleaner version with Python 3.9's the new dict union operator (|=)
def get_or_create(session, model, defaults=None, **kwargs):
instance = session.query(model).filter_by(**kwargs).one_or_none()
if instance:
return instance, False
else:
kwargs |= defaults or {}
instance = model(**kwargs)
try:
session.add(instance)
session.commit()
except Exception: # The actual exception depends on the specific database so we catch all exceptions. This is similar to the official documentation: https://docs.sqlalchemy.org/en/latest/orm/session_transaction.html
session.rollback()
instance = session.query(model).filter_by(**kwargs).one()
return instance, False
else:
return instance, True
Note:
Similar to the Django version this will catch duplicate key constraints and similar errors. If your get or create is not guaranteed to return a single result it can still result in race conditions.
To alleviate some of that issue you would need to add another one_or_none() style fetch right after the session.commit(). This still is no 100% guarantee against race conditions unless you also use a with_for_update() or serializable transaction mode.
I've been playing with this problem and have ended up with a fairly robust solution:
def get_one_or_create(session,
model,
create_method='',
create_method_kwargs=None,
**kwargs):
try:
return session.query(model).filter_by(**kwargs).one(), False
except NoResultFound:
kwargs.update(create_method_kwargs or {})
created = getattr(model, create_method, model)(**kwargs)
try:
session.add(created)
session.flush()
return created, True
except IntegrityError:
session.rollback()
return session.query(model).filter_by(**kwargs).one(), False
I just wrote a fairly expansive blog post on all the details, but a few quite ideas of why I used this.
It unpacks to a tuple that tells you if the object existed or not. This can often be useful in your workflow.
The function gives the ability to work with #classmethod decorated creator functions (and attributes specific to them).
The solution protects against Race Conditions when you have more than one process connected to the datastore.
EDIT: I've changed session.commit() to session.flush() as explained in this blog post. Note that these decisions are specific to the datastore used (Postgres in this case).
EDIT 2: I’ve updated using a {} as a default value in the function as this is typical Python gotcha. Thanks for the comment, Nigel! If your curious about this gotcha, check out this StackOverflow question and this blog post.
A modified version of erik's excellent answer
def get_one_or_create(session,
model,
create_method='',
create_method_kwargs=None,
**kwargs):
try:
return session.query(model).filter_by(**kwargs).one(), True
except NoResultFound:
kwargs.update(create_method_kwargs or {})
try:
with session.begin_nested():
created = getattr(model, create_method, model)(**kwargs)
session.add(created)
return created, False
except IntegrityError:
return session.query(model).filter_by(**kwargs).one(), True
Use a nested transaction to only roll back the addition of the new item instead of rolling back everything (See this answer to use nested transactions with SQLite)
Move create_method. If the created object has relations and it is assigned members through those relations, it is automatically added to the session. E.g. create a book, which has user_id and user as corresponding relationship, then doing book.user=<user object> inside of create_method will add book to the session. This means that create_method must be inside with to benefit from an eventual rollback. Note that begin_nested automatically triggers a flush.
Note that if using MySQL, the transaction isolation level must be set to READ COMMITTED rather than REPEATABLE READ for this to work. Django's get_or_create (and here) uses the same stratagem, see also the Django documentation.
This SQLALchemy recipe does the job nice and elegant.
The first thing to do is to define a function that is given a Session to work with, and associates a dictionary with the Session() which keeps track of current unique keys.
def _unique(session, cls, hashfunc, queryfunc, constructor, arg, kw):
cache = getattr(session, '_unique_cache', None)
if cache is None:
session._unique_cache = cache = {}
key = (cls, hashfunc(*arg, **kw))
if key in cache:
return cache[key]
else:
with session.no_autoflush:
q = session.query(cls)
q = queryfunc(q, *arg, **kw)
obj = q.first()
if not obj:
obj = constructor(*arg, **kw)
session.add(obj)
cache[key] = obj
return obj
An example of utilizing this function would be in a mixin:
class UniqueMixin(object):
#classmethod
def unique_hash(cls, *arg, **kw):
raise NotImplementedError()
#classmethod
def unique_filter(cls, query, *arg, **kw):
raise NotImplementedError()
#classmethod
def as_unique(cls, session, *arg, **kw):
return _unique(
session,
cls,
cls.unique_hash,
cls.unique_filter,
cls,
arg, kw
)
And finally creating the unique get_or_create model:
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
engine = create_engine('sqlite://', echo=True)
Session = sessionmaker(bind=engine)
class Widget(UniqueMixin, Base):
__tablename__ = 'widget'
id = Column(Integer, primary_key=True)
name = Column(String, unique=True, nullable=False)
#classmethod
def unique_hash(cls, name):
return name
#classmethod
def unique_filter(cls, query, name):
return query.filter(Widget.name == name)
Base.metadata.create_all(engine)
session = Session()
w1, w2, w3 = Widget.as_unique(session, name='w1'), \
Widget.as_unique(session, name='w2'), \
Widget.as_unique(session, name='w3')
w1b = Widget.as_unique(session, name='w1')
assert w1 is w1b
assert w2 is not w3
assert w2 is not w1
session.commit()
The recipe goes deeper into the idea and provides different approaches but I've used this one with great success.
The closest semantically is probably:
def get_or_create(model, **kwargs):
"""SqlAlchemy implementation of Django's get_or_create.
"""
session = Session()
instance = session.query(model).filter_by(**kwargs).first()
if instance:
return instance, False
else:
instance = model(**kwargs)
session.add(instance)
session.commit()
return instance, True
not sure how kosher it is to rely on a globally defined Session in sqlalchemy, but the Django version doesn't take a connection so...
The tuple returned contains the instance and a boolean indicating if the instance was created (i.e. it's False if we read the instance from the db).
Django's get_or_create is often used to make sure that global data is available, so I'm committing at the earliest point possible.
I slightly simplified #Kevin. solution to avoid wrapping the whole function in an if/else statement. This way there's only one return, which I find cleaner:
def get_or_create(session, model, **kwargs):
instance = session.query(model).filter_by(**kwargs).first()
if not instance:
instance = model(**kwargs)
session.add(instance)
return instance
There is a Python package that has #erik's solution as well as a version of update_or_create(). https://github.com/enricobarzetti/sqlalchemy_get_or_create
Depending on the isolation level you adopted, none of the above solutions would work.
The best solution I have found is a RAW SQL in the following form:
INSERT INTO table(f1, f2, unique_f3)
SELECT 'v1', 'v2', 'v3'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE f3 = 'v3')
This is transactionally safe whatever the isolation level and the degree of parallelism are.
Beware: in order to make it efficient, it would be wise to have an INDEX for the unique column.
One problem I regularly encounter is when a field has a max length (say, STRING(40)) and you'd like to perform a get or create with a string of large length, the above solutions will fail.
Building off of the above solutions, here's my approach:
from sqlalchemy import Column, String
def get_or_create(self, add=True, flush=True, commit=False, **kwargs):
"""
Get the an entity based on the kwargs or create an entity with those kwargs.
Params:
add: (default True) should the instance be added to the session?
flush: (default True) flush the instance to the session?
commit: (default False) commit the session?
kwargs: key, value pairs of parameters to lookup/create.
Ex: SocialPlatform.get_or_create(**{'name':'facebook'})
returns --> existing record or, will create a new record
---------
NOTE: I like to add this as a classmethod in the base class of my tables, so that
all data models inherit the base class --> functionality is transmitted across
all orm defined models.
"""
# Truncate values if necessary
for key, value in kwargs.items():
# Only use strings
if not isinstance(value, str):
continue
# Only use if it's a column
my_col = getattr(self.__table__.columns, key)
if not isinstance(my_col, Column):
continue
# Skip non strings again here
if not isinstance(my_col.type, String):
continue
# Get the max length
max_len = my_col.type.length
if value and max_len and len(value) > max_len:
# Update the value
value = value[:max_len]
kwargs[key] = value
# -------------------------------------------------
# Make the query...
instance = session.query(self).filter_by(**kwargs).first()
if instance:
return instance
else:
# Max length isn't accounted for here.
# The assumption is that auto-truncation will happen on the child-model
# Or directtly in the db
instance = self(**kwargs)
# You'll usually want to add to the session
if add:
session.add(instance)
# Navigate these with caution
if add and commit:
try:
session.commit()
except IntegrityError:
session.rollback()
elif add and flush:
session.flush()
return instance

How to map one class against multiple tables with SQLAlchemy?

Lets say that I have a database structure with three tables that look like this:
items
- item_id
- item_handle
attributes
- attribute_id
- attribute_name
item_attributes
- item_attribute_id
- item_id
- attribute_id
- attribute_value
I would like to be able to do this in SQLAlchemy:
item = Item('item1')
item.foo = 'bar'
session.add(item)
session.commit()
item1 = session.query(Item).filter_by(handle='item1').one()
print item1.foo # => 'bar'
I'm new to SQLAlchemy and I found this in the documentation (http://www.sqlalchemy.org/docs/05/mappers.html#mapping-a-class-against-multiple-tables):
j = join(items, item_attributes, items.c.item_id == item_attributes.c.item_id). \
join(attributes, item_attributes.c.attribute_id == attributes.c.attribute_id)
mapper(Item, j, properties={
'item_id': [items.c.item_id, item_attributes.c.item_id],
'attribute_id': [item_attributes.c.attribute_id, attributes.c.attribute_id],
})
It only adds item_id and attribute_id to Item and its not possible to add attributes to Item object.
Is what I'm trying to achieve possible with SQLAlchemy? Is there a better way to structure the database to get the same behaviour of "dynamic columns"?
This is called the entity-attribute-value pattern. There is an example about this under the SQLAlchemy examples directory: vertical/.
If you are using PostgreSQL, then there is also the hstore contrib module that can store a string to string mapping. If you are interested then I have some code for a custom type that makes it possible to use that to store extended attributes via SQLAlchemy.
Another option to store custom attributes is to serialize them to a text field. In that case you will lose the ability to filter by attributes.
The link to vertical/vertical.py is broken. The example had been renamed to dictlike-polymorphic.py and dictlike.py.
I am pasting in the contents of dictlike.py:
"""Mapping a vertical table as a dictionary.
This example illustrates accessing and modifying a "vertical" (or
"properties", or pivoted) table via a dict-like interface. These are tables
that store free-form object properties as rows instead of columns. For
example, instead of::
# A regular ("horizontal") table has columns for 'species' and 'size'
Table('animal', metadata,
Column('id', Integer, primary_key=True),
Column('species', Unicode),
Column('size', Unicode))
A vertical table models this as two tables: one table for the base or parent
entity, and another related table holding key/value pairs::
Table('animal', metadata,
Column('id', Integer, primary_key=True))
# The properties table will have one row for a 'species' value, and
# another row for the 'size' value.
Table('properties', metadata
Column('animal_id', Integer, ForeignKey('animal.id'),
primary_key=True),
Column('key', UnicodeText),
Column('value', UnicodeText))
Because the key/value pairs in a vertical scheme are not fixed in advance,
accessing them like a Python dict can be very convenient. The example below
can be used with many common vertical schemas as-is or with minor adaptations.
"""
class VerticalProperty(object):
"""A key/value pair.
This class models rows in the vertical table.
"""
def __init__(self, key, value):
self.key = key
self.value = value
def __repr__(self):
return '<%s %r=%r>' % (self.__class__.__name__, self.key, self.value)
class VerticalPropertyDictMixin(object):
"""Adds obj[key] access to a mapped class.
This is a mixin class. It can be inherited from directly, or included
with multiple inheritence.
Classes using this mixin must define two class properties::
_property_type:
The mapped type of the vertical key/value pair instances. Will be
invoked with two positional arugments: key, value
_property_mapping:
A string, the name of the Python attribute holding a dict-based
relationship of _property_type instances.
Using the VerticalProperty class above as an example,::
class MyObj(VerticalPropertyDictMixin):
_property_type = VerticalProperty
_property_mapping = 'props'
mapper(MyObj, sometable, properties={
'props': relationship(VerticalProperty,
collection_class=attribute_mapped_collection('key'))})
Dict-like access to MyObj is proxied through to the 'props' relationship::
myobj['key'] = 'value'
# ...is shorthand for:
myobj.props['key'] = VerticalProperty('key', 'value')
myobj['key'] = 'updated value']
# ...is shorthand for:
myobj.props['key'].value = 'updated value'
print myobj['key']
# ...is shorthand for:
print myobj.props['key'].value
"""
_property_type = VerticalProperty
_property_mapping = None
__map = property(lambda self: getattr(self, self._property_mapping))
def __getitem__(self, key):
return self.__map[key].value
def __setitem__(self, key, value):
property = self.__map.get(key, None)
if property is None:
self.__map[key] = self._property_type(key, value)
else:
property.value = value
def __delitem__(self, key):
del self.__map[key]
def __contains__(self, key):
return key in self.__map
# Implement other dict methods to taste. Here are some examples:
def keys(self):
return self.__map.keys()
def values(self):
return [prop.value for prop in self.__map.values()]
def items(self):
return [(key, prop.value) for key, prop in self.__map.items()]
def __iter__(self):
return iter(self.keys())
if __name__ == '__main__':
from sqlalchemy import (MetaData, Table, Column, Integer, Unicode,
ForeignKey, UnicodeText, and_, not_)
from sqlalchemy.orm import mapper, relationship, create_session
from sqlalchemy.orm.collections import attribute_mapped_collection
metadata = MetaData()
# Here we have named animals, and a collection of facts about them.
animals = Table('animal', metadata,
Column('id', Integer, primary_key=True),
Column('name', Unicode(100)))
facts = Table('facts', metadata,
Column('animal_id', Integer, ForeignKey('animal.id'),
primary_key=True),
Column('key', Unicode(64), primary_key=True),
Column('value', UnicodeText, default=None),)
class AnimalFact(VerticalProperty):
"""A fact about an animal."""
class Animal(VerticalPropertyDictMixin):
"""An animal.
Animal facts are available via the 'facts' property or by using
dict-like accessors on an Animal instance::
cat['color'] = 'calico'
# or, equivalently:
cat.facts['color'] = AnimalFact('color', 'calico')
"""
_property_type = AnimalFact
_property_mapping = 'facts'
def __init__(self, name):
self.name = name
def __repr__(self):
return '<%s %r>' % (self.__class__.__name__, self.name)
mapper(Animal, animals, properties={
'facts': relationship(
AnimalFact, backref='animal',
collection_class=attribute_mapped_collection('key')),
})
mapper(AnimalFact, facts)
metadata.bind = 'sqlite:///'
metadata.create_all()
session = create_session()
stoat = Animal(u'stoat')
stoat[u'color'] = u'reddish'
stoat[u'cuteness'] = u'somewhat'
# dict-like assignment transparently creates entries in the
# stoat.facts collection:
print stoat.facts[u'color']
session.add(stoat)
session.flush()
session.expunge_all()
critter = session.query(Animal).filter(Animal.name == u'stoat').one()
print critter[u'color']
print critter[u'cuteness']
critter[u'cuteness'] = u'very'
print 'changing cuteness:'
metadata.bind.echo = True
session.flush()
metadata.bind.echo = False
marten = Animal(u'marten')
marten[u'color'] = u'brown'
marten[u'cuteness'] = u'somewhat'
session.add(marten)
shrew = Animal(u'shrew')
shrew[u'cuteness'] = u'somewhat'
shrew[u'poisonous-part'] = u'saliva'
session.add(shrew)
loris = Animal(u'slow loris')
loris[u'cuteness'] = u'fairly'
loris[u'poisonous-part'] = u'elbows'
session.add(loris)
session.flush()
q = (session.query(Animal).
filter(Animal.facts.any(
and_(AnimalFact.key == u'color',
AnimalFact.value == u'reddish'))))
print 'reddish animals', q.all()
# Save some typing by wrapping that up in a function:
with_characteristic = lambda key, value: and_(AnimalFact.key == key,
AnimalFact.value == value)
q = (session.query(Animal).
filter(Animal.facts.any(
with_characteristic(u'color', u'brown'))))
print 'brown animals', q.all()
q = (session.query(Animal).
filter(not_(Animal.facts.any(
with_characteristic(u'poisonous-part', u'elbows')))))
print 'animals without poisonous-part == elbows', q.all()
q = (session.query(Animal).
filter(Animal.facts.any(AnimalFact.value == u'somewhat')))
print 'any animal with any .value of "somewhat"', q.all()
# Facts can be queried as well.
q = (session.query(AnimalFact).
filter(with_characteristic(u'cuteness', u'very')))
print 'just the facts', q.all()
metadata.drop_all()

Categories