Say I have a domain model with an id field plus __eq__ and __hash__, and a simple SQLAlchemy imperative ORM mapping.
@dataclass
class Foo:
    _id: int
    value: float

    def __eq__(self, other):
        if not isinstance(other, Foo):
            return False
        return other._id == self._id

    def __hash__(self):
        return hash(self._id)

foo = Table(
    "foo",
    mapper_registry.metadata,
    Column("_id", Integer, primary_key=True, autoincrement=True),
    Column("value", Float, nullable=False),
)

mapper_registry.map_imperatively(Foo, foo)
This seems simple enough and follows the SQLAlchemy documentation. My problem is the _id column/property. If I write a test that loads Foo from the database:
def test_foo_mapper_can_load_foos(session):
    with session:
        session.execute(
            'INSERT INTO foo (value) '
            'VALUES (50)'
        )
        session.commit()
        expected = [
            Foo(_id=1, value=50),
        ]
        assert session.query(Foo).all() == expected
This works fine - the model is initialised with ids from the database.
But what about initialising a model to commit to the database? If the client creates a new Foo to write to the database, how should I handle the id for the model before it gets committed?
def test_foo_mapper_can_save_foos(session):
    # option 1 - manually set it (collides with autoincrement)
    new_foo = Foo(_id=1, value=50)
    # option 2 - set it to None (collides with __eq__/hashing)
    new_foo = Foo(_id=None, value=50)
    # option 3 - drop the id from the domain model and keep it only in the db
    new_foo = Foo(value=50)

    session.add(new_foo)
    session.commit()
    rows = list(session.execute('SELECT * FROM "foo"'))
    assert rows == [(1, 50)]
Each of the test options can work, but none of the implementations feels like good code.
In option 1, the client must set an id when creating a new foo, since the dataclass requires it in the constructor... but that does not really fit the autoincrement primary key on the table - the client can pass any id, whether or not it is the next in sequence, and the mapper will try to use it. I feel the database should be responsible for setting the primary key.
So, on to option 2: set the id to None and let the database take care of it on commit. However, the __eq__ and __hash__ functions rely on the id, so every unsaved object hashes to hash(None) and compares equal to every other unsaved object. This could also be done by setting _id: int = None as a default value on the domain model itself. But again, it seems like a bad solution.
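For reference, option 2 with a default of None can be sketched like this - a minimal sketch that keeps the hand-written __eq__/__hash__ from the question (the fields are reordered so the defaulted one comes last, as dataclasses require), and that makes the caveat concrete: unsaved instances all compare equal to each other.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Foo:
    value: float
    _id: Optional[int] = None  # assigned by the database on flush

    def __eq__(self, other):
        if not isinstance(other, Foo):
            return False
        return other._id == self._id

    def __hash__(self):
        return hash(self._id)

# the caveat discussed above: two unsaved instances compare equal
# even though their values differ, because both ids are None
print(Foo(value=1.0) == Foo(value=2.0))  # True
```

Because __eq__ and __hash__ are defined explicitly in the class body, @dataclass leaves both alone rather than generating its own.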
Finally, option 3... remove the _id field from the domain model entirely, which has popped up in a couple of articles, but it also seems less than ideal, as Foo then has no unique id for use in select statements, other business logic, etc...
I'm sure I'm thinking about this all wrong, I just can't figure out where.
Related
I have a SQLAlchemy model:
class Ticket(db.Model):
    __tablename__ = 'ticket'
    id = db.Column(INTEGER(unsigned=True), primary_key=True, nullable=False,
                   autoincrement=True)
    cluster = db.Column(db.VARCHAR(128))

    @classmethod
    def get(cls, cluster=None):
        query = db.session.query(Ticket)
        if cluster is not None:
            query = query.filter(Ticket.cluster == cluster)
        return query.one()
If I add a new column and would like to extend the get method, I have to add another "if xxx is not None" block, like this:
@classmethod
def get(cls, cluster=None, user=None):
    query = db.session.query(Ticket)
    if cluster is not None:
        query = query.filter(Ticket.cluster == cluster)
    if user is not None:
        query = query.filter(Ticket.user == user)
    return query.one()
Is there any way I could make this more concise? If I have too many columns, the get method becomes very ugly.
As always, if you don't want to write something repetitive, use a loop:
@classmethod
def get(cls, **kwargs):
    query = db.session.query(cls)
    for k, v in kwargs.items():
        query = query.filter(getattr(cls, k) == v)
    return query.one()
Because we're no longer declaring cluster=None/user=None as defaults (anything the caller doesn't specify simply never appears in kwargs), we no longer need to prevent filters for null values from being added: the only way a None ends up in the argument list is if the caller actually asked to search for a value of None, so this new code is able to honor that request should it ever take place.
If you prefer to retain the calling convention where cluster and user can be passed positionally (but the user can't search for a value of None), see the initial version of this answer.
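To illustrate why unspecified columns never add a filter, here is the same build-the-predicates-in-a-loop idea applied to plain dicts instead of a query object - an illustrative stand-in for query.filter, not SQLAlchemy itself:

```python
def get(rows, **kwargs):
    # each kwarg contributes exactly one predicate,
    # mirroring query = query.filter(getattr(cls, k) == v)
    matches = [row for row in rows
               if all(row.get(k) == v for k, v in kwargs.items())]
    if len(matches) != 1:
        # stands in for .one() raising on zero or multiple results
        raise LookupError("expected exactly one row, got %d" % len(matches))
    return matches[0]

tickets = [{"id": 1, "cluster": "a", "user": "bob"},
           {"id": 2, "cluster": "b", "user": "eve"}]
print(get(tickets, cluster="a"))  # only the cluster predicate is applied
```

A kwarg the caller omits never reaches kwargs.items(), so no predicate is built for it - the same reason the answer's version needs no None checks.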
Hi, I have a table in 3NF form:
ftype_table = Table(
    'FTYPE',
    base.metadata,
    Column('ftypeid', Integer, primary_key=True),
    Column('typename', String(50)),
    schema='TEMP')

file_table = Table(
    'FILE',
    base.metadata,
    Column('fileid', Integer, primary_key=True),
    Column('datatypeid', Integer, ForeignKey(ftype_table.c.ftypeid)),
    Column('size', Integer),
    schema='TEMP')
and mappers
class File(object): pass
class FileType(object): pass

mapper(File, file_table, properties={'filetype': relation(FileType)})
mapper(FileType, ftype_table)
Suppose the FTYPE table contains 1:TXT, 2:AVI, 3:PPT.
What I would like, if I create a File object like this:
file=File()
file.size=10
file.filetype= FileType('PPT')
Session.save(file)
Session.flush()
is for the FILE table to contain fileid:xxx, size:10, datatypeid:3.
Unfortunately a new entry gets added to the FTYPE table and that id gets propagated to the FILE table.
Is there a smart way to achieve this with SQLAlchemy without needing to query the FTYPE table to see whether the entry exists or not?
Thanks
the UniqueObject recipe is the standard answer here: http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject . The idea is to override the creation of File using either __metaclass__.call() or File.__new__() to return the already-existing object, from the DB or from cache (the initial DB lookup, if the object isn't already present, is obviously unavoidable unless something constructed around MySQL's REPLACE is used).
edit: since I've been working on the usage recipes, I've rewritten the unique object recipe to be more portable and updated for 0.5/0.6.
Just create a cache of FileType objects, so that the database lookup occurs only the first time you use a given file type:
class FileTypeCache(dict):
    def __missing__(self, key):
        obj = self[key] = Session.query(FileType).filter_by(typename=key).one()
        return obj

filetype_cache = FileTypeCache()
file=File()
file.size=10
file.filetype= filetype_cache['PPT']
should work, modulo typos.
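The trick this relies on is dict.__missing__, which runs only on a cache miss and memoizes the result. A self-contained sketch with a counted stand-in for the Session.query(...).one() lookup makes the "database hit happens once" behavior visible:

```python
class LookupCache(dict):
    def __init__(self, loader):
        super().__init__()
        self.loader = loader

    def __missing__(self, key):
        # called only when key is absent; the result is stored,
        # so subsequent accesses are plain dict hits
        obj = self[key] = self.loader(key)
        return obj

calls = []
def fake_db_lookup(key):  # hypothetical stand-in for the DB query
    calls.append(key)
    return key.lower()

cache = LookupCache(fake_db_lookup)
print(cache["PPT"])   # miss: loader runs
print(cache["PPT"])   # hit: loader is not called again
print(len(calls))     # 1
```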
Since declarative_base and zzzeek's code do not work with SQLAlchemy 0.4, I used the following cache so that new objects also stay unique if they are not present in the db:
class FileTypeCache(dict):
    def __missing__(self, key):
        try:
            obj = self[key] = Session.query(FileType).filter_by(typename=key).one()
        except InvalidRequestError:
            obj = self[key] = FileType(key)
        return obj
and override __eq__ of FileType:
class FileType(object):
    def __init__(self, typename):
        self.typename = typename

    def __eq__(self, other):
        if isinstance(other, FileType):
            return self.typename == other.typename
        else:
            return False
I made this statement using flask-sqlalchemy and I've chosen to keep it in its original form. Post.query is equivalent to session.query(Post)
I attempted to make a subquery that would filter out all posts in a database which are in the draft state and not made or modified by the current user. I made this query,
Post.query\
    .filter(sqlalchemy.and_(
        Post.post_status != Consts.PostStatuses["Draft"],
        sqlalchemy.or_(
            Post.modified_by_id == current_user.get_id(),
            Post.created_by_id == current_user.get_id())))
which created:
Where true AND ("Post".modified_by_id = :modified_by_id_1 OR
"Post".created_by_id = :created_by_id_1)
Expected outcome:
Where "Post".post_status != "Draft" AND (
"Post".modified_by_id = :modified_by_id_1 OR
"Post".created_by_id = :created_by_id_1)
I'm wondering why this is happening, and how I can increase the error level in SQLAlchemy. I think my project is silently failing and I would like to confirm my guess.
Update:
I used the wrong constants dictionary. One dictionary contains ints, the other contains strings (one for database queries, one for printing).
_post_status = db.Column(
    db.SmallInteger,
    default=Consts.post_status["Draft"])
post_status contains integers, Consts.PostStatuses contains strings. In hindsight, a really bad idea. I'm going to make a single dictionary that returns a tuple instead of two dictionaries.
@property
def post_status(self):
    return Consts.post_status.get(getattr(self, "_post_status", None))
the problem is that your post_status property isn't usable in an ORM-level query, as it is a plain Python descriptor which, at the class level, by default returns itself:
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class A(Base):
    __tablename__ = 'a'

    id = Column(Integer, primary_key=True)
    _post_status = Column(String)

    @property
    def post_status(self):
        return self._post_status

print(A.post_status)
print(A.post_status != 5678)
output:
$ python test.py
<property object at 0x10165bd08>
True
the type of usage you're looking for is that of a hybrid attribute, a SQLAlchemy-included extension to a "regular" Python descriptor which produces class-level behavior that's compatible with Core SQL expressions:
from sqlalchemy.ext.hybrid import hybrid_property

class A(Base):
    __tablename__ = 'a'

    id = Column(Integer, primary_key=True)
    _post_status = Column(String)

    @hybrid_property
    def post_status(self):
        return self._post_status

print(A.post_status)
print(A.post_status != 5678)
output:
$ python test.py
A._post_status
a._post_status != :_post_status_1
be sure to read the hybrid doc carefully, including how to establish the correct SQL expression behavior; descriptors that work at both the instance and class level are a somewhat advanced Python technique.
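The instance-vs-class split that hybrids rely on can be shown with a plain descriptor - a generic Python sketch, not SQLAlchemy code. __get__ receives obj=None when the attribute is accessed on the class, and that is the hook a hybrid uses to return a SQL expression instead of an instance value:

```python
class Hybridish:
    """Toy descriptor: class access yields an 'expression', instance access a value."""
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, objtype=None):
        if obj is None:
            # class-level access: a real hybrid would build a SQL expression here
            return "<expression for %s.%s>" % (objtype.__name__, self.fget.__name__)
        # instance-level access: plain Python evaluation
        return self.fget(obj)

class A:
    def __init__(self, status):
        self._post_status = status

    @Hybridish
    def post_status(self):
        return self._post_status

print(A.post_status)           # class-level: the "expression"
print(A("draft").post_status)  # instance-level: the stored value
```

A plain @property, by contrast, returns the property object itself on class access, which is why `A.post_status != 5678` silently evaluated to True in the question.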
I need to implement a "related items" feature, i.e. to allow items from the same table to be arbitrarily linked to each other in a many-to-many fashion. Something similar to how news websites show related articles.
Also, I need the relationship to be bi-directional, something like this:
a = Item()
b = Item()
a.related.append(b)
assert a in b.related # True
Now, on SQL level I imagine this could be solved by modifying the "standard" many-to-many relationship so 2 records are inserted into the association table each time an association is made, so (a -> b) and (b -> a) are two separate records.
Alternatively, the join condition for the many-to-many table could somehow check both sides of the association, so roughly instead of ... JOIN assoc ON a.id = assoc.left_id ... SQLAlchemy would produce something like ... JOIN assoc ON a.id = assoc.left_id OR a.id = assoc.right_id ...
Is there a way to configure this with SQLAlchemy so the relation works similar to a "normal" many-to-many relationship?
It's likely that I just don't know the correct terminology - everything I came up with ("self-referential", "bidirectional", "association") is used to describe something else in SQLAlchemy.
Using Attribute Events should do the job. See the sample code below, where a slightly ugly piece of code exists solely to avoid endless recursion:
class Item(Base):
    __tablename__ = "item"

    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)

    # relationships
    related = relationship('Item',
        secondary = t_links,
        primaryjoin = (id == t_links.c.from_id),
        secondaryjoin = (id == t_links.c.to_id),
    )
_OTHER_SIDE = set()

from sqlalchemy import event

def Item_related_append_listener(target, value, initiator):
    global _OTHER_SIDE
    if (target, value) not in _OTHER_SIDE:
        _OTHER_SIDE.add((value, target))
        if target not in value.related:
            value.related.append(target)
    else:
        _OTHER_SIDE.remove((target, value))

event.listen(Item.related, 'append', Item_related_append_listener)
# ...
a = Item()
b = Item()
a.related.append(b)
assert a in b.related # True
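The invariant the listener maintains - every append is mirrored on the other side, without recursing forever - can be shown in plain Python, with no SQLAlchemy events involved (a hypothetical Node class stands in for Item):

```python
class Node:
    def __init__(self):
        self.related = []

    def link(self, other):
        # mirror the link on both sides; the membership checks
        # play the role of the recursion guard in the listener
        if other not in self.related:
            self.related.append(other)
        if self not in other.related:
            other.related.append(self)

a, b = Node(), Node()
a.link(b)
assert a in b.related and b in a.related  # symmetric, as desired
```

In the SQLAlchemy version the guard set is needed because value.related.append(target) itself fires the same 'append' event again.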
For completeness' sake, here's the code I ended up with; the listener method is slightly different to avoid using a global variable, and there's also a listener for the remove event.
import sqlalchemy as sa

related_items = sa.Table(
    "related_items",
    Base.metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("from_id", sa.ForeignKey("items.id")),
    sa.Column("to_id", sa.ForeignKey("items.id")),
)
class Item(Base):
    __tablename__ = 'items'
    ...
    related = sa.orm.relationship('Item',
        secondary = related_items,
        primaryjoin = (id == related_items.c.from_id),
        secondaryjoin = (id == related_items.c.to_id),
    )
def item_related_append_listener(target, value, initiator):
    if not hasattr(target, "__related_to__"):
        target.__related_to__ = set()
    target.__related_to__.add(value)
    if target not in getattr(value, "__related_to__", set()):
        value.related.append(target)

sa.event.listen(Item.related, 'append', item_related_append_listener)

def item_related_remove_listener(target, value, initiator):
    if target in value.related:
        value.related.remove(target)

sa.event.listen(Item.related, 'remove', item_related_remove_listener)
I have two legacy tables that I would like to access using SQLAlchemy declarative.
Order:
    order_id
    is_processed

FooData:
    foo_id
    order_id
An order may or may not have FooData, and I would like to distinguish between the two order types using SQLAlchemy declarative models.
The problem I'm having trouble wrapping my head around is:
How do I set up such a relationship? Ideally I'd have two classes, Order and FooOrder, where Order has no FooData and FooOrder has FooData.
I have to query both types (Order and FooOrder) together based on is_processed and process them differently depending on whether each is an Order or a FooOrder. How do I go about querying in this case?
If you can change the DB, then simply add a discriminator column, set its value (order|foodata) depending on whether foodata exists for the row, make it NOT NULL, and configure simple Joined Table Inheritance.
If you cannot change the DB (add a discriminator column) and you only have the simple 2-table model as shown, then I would not use inheritance, but rather a 1-1 relationship.
Model Definition:
class Order(Base):
    __tablename__ = 'order'
    __table_args__ = {'autoload': True}

class FooData(Base):
    __tablename__ = 'foo_data'
    __table_args__ = {'autoload': True}
    # note: you need the next line only if your DB does not have the FK defined
    #__table_args__ = (ForeignKeyConstraint(['order_id'], ['order.order_id']), {'autoload': True})

# define a 1-[0..1] relationship from Order to FooData with eager loading
Order.foodata = relationship(FooData, uselist=False, lazy="joined", backref="order")
Adding new objects:
ord = Order(); ord.is_processed = False
session.add(ord)
ord = Order(); ord.is_processed = False
foo = FooData(); foo.someinfo = "test foo created from SA"
ord.foodata = foo
session.add(ord)
session.commit()
Query: all based on is_processed:
qry = session.query(Order).filter(Order.is_processed == False)
for ord in qry:
    print ord, ord.foodata
A la Polymorphism:
You can even implement methods on your Order and FooData in a way that makes it seem they are in fact using inheritance:
class Order(Base):
    # ...
    def process_order(self):
        if self.foodata:
            self.foodata.process_order()
        else:
            print "processing order: ", self

class FooData(Base):
    # ...
    def process_order(self):
        print "processing foo_data: ", self
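That delegation trick reduces to a few lines of plain Python (hypothetical minimal classes, detached from the ORM) - Order forwards to its optional FooData when one is attached, which reads just like polymorphic dispatch:

```python
class FooData:
    def process_order(self):
        return "processing foo_data"

class Order:
    def __init__(self, foodata=None):
        self.foodata = foodata  # None, or an attached FooData

    def process_order(self):
        # delegate when foodata exists, otherwise handle it here
        if self.foodata:
            return self.foodata.process_order()
        return "processing order"

print(Order().process_order())           # processing order
print(Order(FooData()).process_order())  # processing foo_data
```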