I have a set of classes defined like this:
class CrosstabCohort(Base):
    __table__ = crosstab_cohorts
    # has property is_column
    cohort = relationship(Cohort, innerjoin=True)

class Crosstab(Base):
    __table__ = crosstabs
    columns = relationship(
        CrosstabCohort,
        # secondary=crosstab_cohorts,
        primaryjoin=and_(
            crosstabs.c.crosstab_id == crosstab_cohorts.c.crosstab_id,
            crosstab_cohorts.c.is_column == True),
        order_by=crosstab_cohorts.c.created_at,
    )
    rows = relationship(
        CrosstabCohort,
        # secondary=crosstab_cohorts,
        primaryjoin=and_(
            crosstabs.c.crosstab_id == crosstab_cohorts.c.crosstab_id,
            crosstab_cohorts.c.is_column == False),
        order_by=crosstab_cohorts.c.created_at,
    )
When I add column or row objects by calling append on the relationship instance, I'd like for the is_column property to be automatically set to either True or False depending upon which relationship I append it to. Is this possible? At present, when I append to these relationships and try to commit, I receive an error from my database that the is_column property is not set. Can this be done automatically or must I set the is_column property on each Cohort object?
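For reference, SQLAlchemy's attribute `append` events can stamp the flag automatically as a child is appended to either collection. Below is a self-contained sketch using simplified declarative stand-ins for the models above (table and column names are assumptions); the two listeners at the bottom are the relevant part:

```python
from sqlalchemy import Boolean, Column, ForeignKey, Integer, and_, event
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

# Simplified stand-ins for the reflected tables in the question.
class CrosstabCohort(Base):
    __tablename__ = 'crosstab_cohorts'
    id = Column(Integer, primary_key=True)
    crosstab_id = Column(Integer, ForeignKey('crosstabs.id'))
    is_column = Column(Boolean, nullable=False)

class Crosstab(Base):
    __tablename__ = 'crosstabs'
    id = Column(Integer, primary_key=True)
    columns = relationship(
        CrosstabCohort,
        primaryjoin=and_(id == CrosstabCohort.crosstab_id,
                         CrosstabCohort.is_column == True),
        overlaps='rows')
    rows = relationship(
        CrosstabCohort,
        primaryjoin=and_(id == CrosstabCohort.crosstab_id,
                         CrosstabCohort.is_column == False),
        overlaps='columns')

# The append events stamp the flag as each child enters a collection.
@event.listens_for(Crosstab.columns, 'append')
def _mark_as_column(target, value, initiator):
    value.is_column = True

@event.listens_for(Crosstab.rows, 'append')
def _mark_as_row(target, value, initiator):
    value.is_column = False
```

With listeners like these in place, appending to `columns` or `rows` sets `is_column` before the flush, so the NOT NULL constraint is satisfied without setting the flag on each object by hand.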
Say I have a domain model with an id field plus __eq__ and __hash__. Then there is a simple SQLAlchemy ORM mapping.
@dataclass
class Foo:
    _id: int
    value: float

    def __eq__(self, other):
        if not isinstance(other, Foo):
            return False
        return other._id == self._id

    def __hash__(self):
        return hash(self._id)
foo = Table(
    "foo",
    mapper_registry.metadata,
    Column("_id", Integer, primary_key=True, autoincrement=True),
    Column("value", Float, nullable=False),
)

mapper_registry.map_imperatively(Foo, foo)
This seems simple enough and follows the documentation for SQLAlchemy. My problem is the _id column/property. If I create a test to load Foo from the database:
def test_foo_mapper_can_load_foos(session):
    with session:
        session.execute(
            'INSERT INTO foo (value) '
            'VALUES (50)'
        )
        session.commit()
        expected = [
            Foo(_id=1, value=50),
        ]
        assert session.query(Foo).all() == expected
this works fine. The model is initialised with ids from the database.
But what about initialising a model to commit to the database? If the client creates a new foo to write to the database, how should I approach the id for the model before it gets committed?
def test_foo_mapper_can_save_foos(session):
    # option 1 - manually set it (collides with autoincrement)
    new_foo = Foo(_id=1, value=50)
    # option 2 - set to None (collides with __eq__/hashing)
    new_foo = Foo(_id=None, value=50)
    # option 3 - get rid of the id in the domain model and only have it in the db
    new_foo = Foo(value=50)

    session.add(new_foo)
    session.commit()
    rows = list(session.execute('SELECT * FROM "foo"'))
    assert rows == [(1, 50)]
Each of the test options can work, but none of the implementations seem like good code.
In option 1, the client must set an id when creating a new foo, since the dataclass requires it in the constructor... but that seems at odds with the autoincrement primary key on the table: the client can pass any id, whether it is the next in the sequence or not, and the mapper will try to use it. I feel the database should be responsible for setting the primary key.
So, on to option 2: set the id to None and let the database take care of it on commit. However, __eq__ and __hash__ rely on the id for equality, and the object becomes effectively unhashable. This could also be done by setting _id: int = None as a default value on the domain model itself, but again, that seems like a bad solution.
Finally, option 3: remove the _id field from the domain model entirely. This has popped up in a couple of articles, but it also seems less than ideal, as Foo then has no unique id to use in select statements, other business logic, and so on.
I'm sure I'm thinking about this all wrong, I just can't figure out where.
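For what it's worth, a common compromise between options 1 and 2 is to default the id to None and make equality and hashing fall back to object identity while the id is unassigned. This is only a sketch, and it carries its own caveat: the hash changes once the database assigns an id, so unsaved instances should not live in sets or dicts across a flush.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)  # keep the hand-written __eq__/__hash__ below
class Foo:
    value: float
    _id: Optional[int] = None  # assigned by the database on flush

    def __eq__(self, other):
        if not isinstance(other, Foo):
            return NotImplemented
        if self._id is None or other._id is None:
            # unsaved instances are equal only to themselves
            return self is other
        return self._id == other._id

    def __hash__(self):
        # identity-based until the database assigns an id
        return hash(self._id) if self._id is not None else id(self)
```

New code can then write Foo(value=50) and let autoincrement fill _id, while rows loaded from the database still compare by id.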
This seems like a real beginner question, but I'm having trouble finding a simple answer. I have simplified this down to just the bare bones with a simple data model representing a one-to-many relationship:
class Room(db.Model):
    __tablename__ = 'rooms'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(128), unique=True)
    capacity = db.Column(db.Integer)
    events = db.relationship('Event', backref='room')

class Event(db.Model):
    __tablename__ = 'counts'
    id = db.Column(db.Integer, primary_key=True)
    unusedCapacity = db.Column(db.Integer)
    attendance = db.Column(db.Integer)
    room_id = db.Column(db.Integer, db.ForeignKey('rooms.id'))
Event.unusedCapacity is calculated as Room.capacity - Event.attendance, but I need to store the value in the column — Room.capacity may change over time, but the Event.unusedCapacity needs to reflect the actual unused capacity at the time of the Event.
I am currently querying the Room and then creating the Event:

room = Room.query.get(room_id)  # using Flask-SQLAlchemy
event = Event(unusedCapacity=room.capacity - attendance, ...etc)
My question is: is there a more efficient way to do this in one step?
As noted in the comments by @SuperShoot, a query on insert can calculate the unused capacity in the database without having to fetch first. An explicit constructor, such as shown by @tooTired, could pass a scalar subquery as unusedCapacity:
class Event(db.Model):
    ...

    def __init__(self, **kwgs):
        if 'unusedCapacity' not in kwgs:
            kwgs['unusedCapacity'] = \
                db.select([Room.capacity - kwgs['attendance']]).\
                where(Room.id == kwgs['room_id']).\
                as_scalar()
        super().__init__(**kwgs)
Though it is possible to use client-invoked SQL expressions as defaults, I'm not sure how one could refer to the values to be inserted in the expression without using a context-sensitive default function, and that did not quite work out: the scalar subquery was not inlined, and SQLAlchemy tried to pass it using placeholders instead.
A downside of the __init__ approach is that you cannot perform bulk inserts that would handle unused capacity using the table created for the model as is; you would have to perform a manual query that does the same.
Another thing to look out for is that until a flush takes place, the unusedCapacity attribute of a new Event object holds the SQL expression object, not the actual value. The solution by @tooTired is more transparent in this regard, since a new Event object holds the numeric value of unused capacity from the get-go.
SQLAlchemy adds an implicit constructor to all model classes which accepts keyword arguments for all its columns and relationships. You can override this and pass the kwargs without unusedCapacity and get the room capacity in the constructor:
class Event(db.Model):
    # ...

    # kwargs without unusedCapacity
    def __init__(self, **kwargs):
        room = Room.query.get(kwargs.get('room_id'))
        super(Event, self).__init__(
            unusedCapacity=room.capacity - kwargs.get('attendance'),
            **kwargs)

# Create a new event normally
event = Event(id=1, attendance=1, room_id=1)
I'm trying to model an entity that has one or more one-to-many relationships, such that its last_modified attribute is updated when
a child is added or removed
a child is modified
the entity itself is modified
I've put together the following minimal example:
class Config(Base):
    __tablename__ = 'config'

    ID = Column('ID', Integer, primary_key=True)
    name = Column('name', String)
    last_modified = Column('last_modified', DateTime, default=now, onupdate=now)

    params = relationship('ConfigParam', backref='config')

class ConfigParam(Base):
    __tablename__ = 'config_params'

    ID = Column('ID', Integer, primary_key=True)
    ConfigID = Column('ConfigID', Integer, ForeignKey('config.ID'), nullable=False)
    key = Column('key', String)
    value = Column('value', Float)
@event.listens_for(Config.params, 'append')
@event.listens_for(Config.params, 'remove')
def receive_append_or_remove(target, value, initiator):
    target.last_modified = now()

@event.listens_for(ConfigParam.key, 'set')
@event.listens_for(ConfigParam.value, 'set')
def receive_attr_change(target, value, oldvalue, initiator):
    if target.config:
        # don't act if the parent config isn't yet set
        # i.e. during __init__
        target.config.last_modified = now()
This seems to work, but I'm wondering if there's a better way to do this?
Specifically, this becomes very verbose, since my actual ConfigParam implementation has more attributes and I have multiple one-to-many relations configured on the parent Config class.
Take this with a huge grain of salt, it "seems" to work, could explode:
def rel_listener(t, v, i):
    t.last_modified = now()

def listener(t, v, o, i):
    if t.config:
        t.config.last_modified = now()

from sqlalchemy import inspect

for rel in inspect(Config).relationships:
    event.listen(rel, 'append', rel_listener)
    event.listen(rel, 'remove', rel_listener)

for col in inspect(ConfigParam).column_attrs:
    event.listen(col, 'set', listener)
The problem is that the inspections make no exceptions, and columns such as 'ID' and 'ConfigID' will be bound to event listeners as well.
Another perhaps slightly less tedious form would be to just use a list of attributes to bind events to in a similar fashion:
for attr in ['key', 'value']:
    event.listen(getattr(ConfigParam, attr), 'set', listener)
This gives you control over what is bound to events and what is not.
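To make the allow-list idea concrete, here is a self-contained sketch (with simplified columns, and `now` assumed to be `datetime.utcnow`) that binds the listeners from explicit per-class attribute lists, so key columns never get hooked by accident:

```python
from datetime import datetime
from sqlalchemy import Column, DateTime, Float, ForeignKey, Integer, String, event
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()
now = datetime.utcnow  # assumed definition of now()

class Config(Base):
    __tablename__ = 'config'
    ID = Column(Integer, primary_key=True)
    name = Column(String)
    last_modified = Column(DateTime, default=now, onupdate=now)
    params = relationship('ConfigParam', backref='config')

class ConfigParam(Base):
    __tablename__ = 'config_params'
    ID = Column(Integer, primary_key=True)
    ConfigID = Column(Integer, ForeignKey('config.ID'), nullable=False)
    key = Column(String)
    value = Column(Float)

def touch_parent(target, value, oldvalue, initiator):
    # stamp the parent Config when a tracked child attribute changes
    if target.config:
        target.config.last_modified = now()

def touch_collection(target, value, initiator):
    # stamp the Config itself when a child is appended or removed
    target.last_modified = now()

# explicit allow-lists keep ID/ConfigID out of the listeners
TRACKED = {ConfigParam: ('key', 'value')}
for cls, attrs in TRACKED.items():
    for name in attrs:
        event.listen(getattr(cls, name), 'set', touch_parent)

for name in ('params',):
    event.listen(getattr(Config, name), 'append', touch_collection)
    event.listen(getattr(Config, name), 'remove', touch_collection)
```

Adding further relationships or child classes is then a matter of extending the two lists rather than writing new listener functions.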
I need to implement a "related items" feature, i.e. to allow items from the same table to be arbitrarily linked to each other in a many-to-many fashion. Something similar to how news websites show related articles.
Also, I need the relationship to be bi-directional, something like this:
a = Item()
b = Item()
a.related.append(b)
assert a in b.related # True
Now, on SQL level I imagine this could be solved by modifying the "standard" many-to-many relationship so 2 records are inserted into the association table each time an association is made, so (a -> b) and (b -> a) are two separate records.
Alternatively, the join condition for the many-to-many table could somehow check both sides of the association, so roughly instead of ... JOIN assoc ON a.id = assoc.left_id ... SQLAlchemy would produce something like ... JOIN assoc ON a.id = assoc.left_id OR a.id = assoc.right_id ...
Is there a way to configure this with SQLAlchemy so the relation works similar to a "normal" many-to-many relationship?
It's likely that I just don't know the correct terminology: everything I came up with ("self-referential", "bidirectional", "association") is used to describe something else in SQLAlchemy.
Using Attribute Events should do the job. See the sample code below, where a slightly ugly piece of code exists solely to avoid endless recursion:
class Item(Base):
    __tablename__ = "item"

    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)

    # relationships
    related = relationship(
        'Item',
        secondary=t_links,
        primaryjoin=(id == t_links.c.from_id),
        secondaryjoin=(id == t_links.c.to_id),
    )
_OTHER_SIDE = set()

from sqlalchemy import event

def Item_related_append_listener(target, value, initiator):
    global _OTHER_SIDE
    if (target, value) not in _OTHER_SIDE:
        _OTHER_SIDE.add((value, target))
        if target not in value.related:
            value.related.append(target)
    else:
        _OTHER_SIDE.remove((target, value))

event.listen(Item.related, 'append', Item_related_append_listener)
# ...
a = Item()
b = Item()
a.related.append(b)
assert a in b.related # True
For completeness' sake, here's the code I ended up with; the listener method is slightly different to avoid using a global variable, and there is also a listener for the remove event.
import sqlalchemy as sa

related_items = sa.Table(
    "related_items",
    Base.metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("from_id", sa.ForeignKey("items.id")),
    sa.Column("to_id", sa.ForeignKey("items.id")),
)

class Item(Base):
    __tablename__ = 'items'
    ...

    related = sa.orm.relationship(
        'Item',
        secondary=related_items,
        primaryjoin=(id == related_items.c.from_id),
        secondaryjoin=(id == related_items.c.to_id),
    )

def item_related_append_listener(target, value, initiator):
    if not hasattr(target, "__related_to__"):
        target.__related_to__ = set()
    target.__related_to__.add(value)
    if target not in getattr(value, "__related_to__", set()):
        value.related.append(target)

sa.event.listen(Item.related, 'append', item_related_append_listener)

def item_related_remove_listener(target, value, initiator):
    if target in value.related:
        value.related.remove(target)

sa.event.listen(Item.related, 'remove', item_related_remove_listener)
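As an aside, the same bidirectional reads can be had without event listeners by keeping the link rows one-directional and merging the two directions in a plain property. A sketch (reusing the table layout above, but without the surrogate id column); the trade-off is that `related` becomes read-only and cannot be used directly in queries:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

related_items = Table(
    "related_items",
    Base.metadata,
    Column("from_id", ForeignKey("items.id"), primary_key=True),
    Column("to_id", ForeignKey("items.id"), primary_key=True),
)

class Item(Base):
    __tablename__ = 'items'
    id = Column(Integer, primary_key=True)
    name = Column(String(255))

    # one direction of the association; the backref is the other
    related_to = relationship(
        'Item',
        secondary=related_items,
        primaryjoin=id == related_items.c.from_id,
        secondaryjoin=id == related_items.c.to_id,
        backref='related_from',
    )

    @property
    def related(self):
        # merged read-only view of both directions
        return self.related_to + self.related_from
```

Appends go through related_to (or related_from), and the backref keeps both in-memory collections consistent, so a in b.related holds after a.related_to.append(b).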
I have two legacy tables that I would like to use SQLAlchemy declarative to access data from.
Order:
    order_id
    is_processed

FooData:
    foo_id
    order_id
An order may or may not have FooData, and I would like to distinguish between the two order types using SQLAlchemy declarative models.
The problem I'm having trouble wrapping my head around is this:
How do I set up such a relationship? Ideally I'd have two classes, Order and FooOrder, where Order has no FooData and FooOrder has FooData.
I have to query both types (Order and FooOrder) together based on is_processed and process them differently depending on whether each is an Order or a FooOrder. How do I go about querying in this case?
If you can change the DB, then simply add a discriminator column, set its value to the proper one (order|foodata) depending on whether foodata exists for the row, make it NOT NULL, and configure simple Joined Table Inheritance.
If you cannot change the DB (i.e. add a discriminator column) and you only have the simple two-table model as shown, then I would not use inheritance, but rather a 1-to-1 relationship.
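For illustration, assuming a new order_type discriminator column is added and foo_data can be keyed by order_id, the Joined Table Inheritance mapping might look roughly like this (all names are guesses based on the question):

```python
from sqlalchemy import Boolean, Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Order(Base):
    __tablename__ = 'order'
    order_id = Column(Integer, primary_key=True)
    is_processed = Column(Boolean, nullable=False, default=False)
    order_type = Column(String(16), nullable=False)  # the new discriminator

    __mapper_args__ = {
        'polymorphic_on': order_type,
        'polymorphic_identity': 'order',
    }

class FooOrder(Order):
    __tablename__ = 'foo_data'
    # joined inheritance: the child row is keyed by the parent's primary key
    order_id = Column(ForeignKey('order.order_id'), primary_key=True)
    foo_id = Column(Integer)

    __mapper_args__ = {'polymorphic_identity': 'foodata'}
```

A plain session.query(Order).filter(Order.is_processed == False) then returns a mix of Order and FooOrder instances, which can be dispatched on type.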
Model Definition:
class Order(Base):
    __tablename__ = 'order'
    __table_args__ = {'autoload': True}

class FooData(Base):
    __tablename__ = 'foo_data'
    __table_args__ = {'autoload': True}
    # @note: you need the next line only if your DB does not have the FK defined
    # __table_args__ = (ForeignKeyConstraint(['order_id'], ['order.order_id']), {'autoload': True})

# define a 1-[0..1] relationship from Order to FooData with eager loading
Order.foodata = relationship(FooData, uselist=False, lazy="joined", backref="order")
Adding new objects:
ord = Order(); ord.is_processed = False
session.add(ord)
ord = Order(); ord.is_processed = False
foo = FooData(); foo.someinfo = "test foo created from SA"
ord.foodata = foo
session.add(ord)
session.commit()
Query all based on is_processed:

qry = session.query(Order).filter(Order.is_processed == False)
for ord in qry:
    print(ord, ord.foodata)
A la Polymorphism:
You can even implement methods on your Order and FooData in a way that makes it seem they are in fact using inheritance:
class Order(Base):
    # ...
    def process_order(self):
        if self.foodata:
            self.foodata.process_order()
        else:
            print("processing order:", self)

class FooData(Base):
    # ...
    def process_order(self):
        print("processing foo_data:", self)