SQLAlchemy association table (association object pattern) raises IntegrityError - python

I'm using SQLAlchemy release 0.8.2 (tried python 2.7.5 and 3.3.2)
I've had to use the association object pattern (for a many-to-many relationship) in my code, but whenever I've been adding an association, it has been raising an IntegrityError exception. This is because instead of executing "INSERT INTO association (left_id, right_id, extra_data) [...]", it instead executes "INSERT INTO association (right_id, extra_data) [...]", which is going to raise an IntegrityError exception since it's missing a primary key.
After trying to narrow down the problem for a while and simplifying the code as much as possible, I found the culprit(s?), but I don't understand why it's behaving this way.
I included my complete code so the reader can test it as is. The class declarations are exactly the same as in the documentation (with backrefs).
#!/usr/bin/env python2
import sqlalchemy
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String
from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship, backref

Base = declarative_base()

class Association(Base):
    __tablename__ = 'association'
    left_id = Column(Integer, ForeignKey('left.id'), primary_key=True)
    right_id = Column(Integer, ForeignKey('right.id'), primary_key=True)
    extra_data = Column(String(50))
    child = relationship("Child", backref="parent_assocs")

class Parent(Base):
    __tablename__ = 'left'
    id = Column(Integer, primary_key=True)
    children = relationship("Association", backref="parent")

class Child(Base):
    __tablename__ = 'right'
    id = Column(Integer, primary_key=True)

def main():
    engine = sqlalchemy.create_engine('sqlite:///:memory:', echo=True)
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)
    session = Session()
    # populate old data
    session.add(Child())
    # new data
    p = Parent()
    session.add(p)  # Commenting this fixes the error.
    session.flush()
    # rest of new data
    a = Association(extra_data="some data")
    a.child = session.query(Child).one()
    # a.child = Child()  # Using this instead of the above line avoids the error - but that's not what I want.
    p.children.append(a)
    # a.parent = p  # Using this instead of the above line fixes the error! They're logically equivalent.
    session.add(p)
    session.commit()

if __name__ == '__main__':
    main()
So, as mentioned in the comments in the code above, there are three ways to fix/avoid the problem:
1. Don't add the parent to the session before declaring the association.
2. Create a new child for the association instead of selecting an already existing child.
3. Use the backref on the association.
I don't understand the behaviour in any of the three cases.
The second case does something different, so it's not a possible solution. I don't understand the behaviour however, and would appreciate an explanation of why the problem is avoided in this case.
I'm thinking the first case may have something to do with "Object States", but I don't know exactly what's causing it either. Oh, and adding session.autoflush=False just before the first occurrence of session.add(p) also fixes the problem which adds to my confusion.
For the third case, I'm drawing a complete blank since they should be logically equivalent.
Thanks for any insight!

What happens here is that when you call p.children.append(), SQLAlchemy can't append to a plain collection without loading it first. As it goes to load, autoflush kicks in. You know this because in your stack trace you will see a line like this:
    File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/orm/session.py", line 1183, in _autoflush
        self.flush()
Your Association object is then flushed here in an incomplete state. It's in the session in the first place because when you say a.child = some_persistent_child, an event appends a to the parent_assocs collection of Child, which then cascades the Association object into the session (see Controlling Cascade on Backrefs for some background on this, and one possible solution).
But without affecting any relationships, the easiest solution when you have this chicken/egg sort of problem is to temporarily disable autoflush using no_autoflush:
with session.no_autoflush:
    p.children.append(a)
By disabling autoflush while p.children is loaded, your pending object a is not flushed; it is then associated with the already persistent Parent (because you've added and flushed that already) and is ready for INSERT.
This allows your test program to succeed.
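For reference, here is the question's program with the no_autoflush fix applied, as a minimal sketch. It assumes SQLAlchemy 1.4 or newer (for the `sqlalchemy.orm.declarative_base` import) and an in-memory SQLite database; the `result` variable is only there for checking the outcome. Note that newer SQLAlchemy releases no longer cascade objects into the session through backrefs, so the original error may not reproduce there; the block is harmless either way.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class Association(Base):
    __tablename__ = "association"
    left_id = Column(Integer, ForeignKey("left.id"), primary_key=True)
    right_id = Column(Integer, ForeignKey("right.id"), primary_key=True)
    extra_data = Column(String(50))
    child = relationship("Child", backref="parent_assocs")


class Parent(Base):
    __tablename__ = "left"
    id = Column(Integer, primary_key=True)
    children = relationship("Association", backref="parent")


class Child(Base):
    __tablename__ = "right"
    id = Column(Integer, primary_key=True)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Existing data: one child row, plus a parent that is already flushed.
session.add(Child())
p = Parent()
session.add(p)
session.flush()

a = Association(extra_data="some data")
a.child = session.query(Child).one()

# Loading p.children inside this block does not trigger a flush, so the
# half-built Association is not INSERTed before it has a parent.
with session.no_autoflush:
    p.children.append(a)

session.commit()

row = session.query(Association).one()
result = (row.left_id, row.right_id, row.extra_data)
```

Outside the `with` block the session autoflushes as usual, so only the collection load is affected.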

Related

Why am I unable to generate a query using relationships?

I'm experimenting with relationship functionality within SQLAlchemy however I've not been able to crack it. The following is a simple MRE:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import relationship, sessionmaker

Base = declarative_base()

class Tournament(Base):
    __tablename__ = "tournament"
    __table_args__ = {"schema": "belgarath", "extend_existing": True}
    id_ = Column(Integer, primary_key=True)
    tournament_master_id = Column(Integer, ForeignKey("belgarath.tournament_master.id_"))
    tournament_master = relationship("TournamentMaster", back_populates="tournament")

class TournamentMaster(Base):
    __tablename__ = "tournament_master"
    __table_args__ = {"schema": "belgarath", "extend_existing": True}
    id_ = Column(Integer, primary_key=True)
    tour_id = Column(Integer, index=True)
    tournament = relationship("Tournament", back_populates="tournament_master")

engine = create_engine("mysql+mysqlconnector://root:root@localhost/")
Session = sessionmaker(bind=engine)
session = Session()
qry = session.query(Tournament.tournament_master.id_).limit(100)
I was hoping to be able to query the id_ field from the tournament_master table through a relationship specified in the tournament table. However I get the following error:
AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with Tournament.tournament_master has an attribute 'id_'
I've also tried replacing the two relationship lines with a single backref line in TournamentMaster:
tournament = relationship("Tournament", backref="tournament_master")
However I then get the error:
AttributeError: type object 'Tournament' has no attribute 'tournament_master'
Where am I going wrong?
(I'm using SQLAlchemy v1.3.18)
Your ORM classes look fine. It's the query that's incorrect.
In short you're getting that "InstrumentedAttribute" error because you are misusing the session.query method.
From the docs the session.query method takes as arguments, "SomeMappedClass" or "entities". You have 2 mapped classes defined, Tournament, and TournamentMaster. These "entities" are typically either your mapped classes (ORM objects) or a Column of these mapped classes.
However you are passing in Tournament.tournament_master.id_ which is not a "MappedClass" or a column and thus not an "entity" that session.query can consume.
Another way to look at it is that by calling Tournament.tournament_master.id_ you are trying to access a 'TournamentMaster' record (or instance) from the 'Tournament' class, which doesn't make sense.
It's not super clear to me what exactly you are hoping to return from the query. In any case, here's a start.
Instead of
qry = session.query(Tournament.tournament_master.id_).limit(100)
try
qry = session.query(Tournament, TournamentMaster).join(TournamentMaster).limit(100)
This may also work (I haven't tested it) to return only the id_ field, if that is your intention:
qry = session.query(Tournament).join(TournamentMaster).with_entities(TournamentMaster.id_).limit(100)
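To make the join-based approach concrete, here is a small self-contained sketch. It is not the asker's setup: it assumes SQLAlchemy 1.4 or newer, drops the `belgarath` schema, and uses an in-memory SQLite database with made-up rows so it can run anywhere. The point is that columns of the related class are selected from `TournamentMaster` itself, with the relationship used only to describe the join.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class TournamentMaster(Base):
    __tablename__ = "tournament_master"
    id_ = Column(Integer, primary_key=True)
    tour_id = Column(Integer, index=True)
    tournament = relationship("Tournament", back_populates="tournament_master")


class Tournament(Base):
    __tablename__ = "tournament"
    id_ = Column(Integer, primary_key=True)
    tournament_master_id = Column(Integer, ForeignKey("tournament_master.id_"))
    tournament_master = relationship("TournamentMaster", back_populates="tournament")


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Illustrative data: two tournaments sharing one master record.
master = TournamentMaster(tour_id=7)
session.add_all([
    Tournament(tournament_master=master),
    Tournament(tournament_master=master),
])
session.commit()

# Select columns from both classes; the relationship attribute is passed
# to join() to say how the two tables are connected.
rows = (
    session.query(Tournament.id_, TournamentMaster.id_)
    .join(Tournament.tournament_master)
    .order_by(Tournament.id_)
    .limit(100)
    .all()
)
pairs = [tuple(r) for r in rows]
```

On an instance (rather than the class), `some_tournament.tournament_master.id_` works as expected, because the relationship then refers to a loaded `TournamentMaster` object.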

Can a sqlalchemy relationship be accessed using a new session?

When using yield_per I am forced to use a separate session if I want to perform another query while the result from the yield_per query have not yet been all fetched.
Let's take these models for our example:
class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    children = relationship("Child", backref="parent")

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('parent.id'))
Here are three ways to achieve the same thing, but only the first one works:
query = session.query(Parent).yield_per(5)
for p in query:
    print(p.id)
    # 1: Using a new session (will work):
    c = newsession.query(Child).filter_by(parent_id=p.id).first()
    print(c.id)
    # 2: Using the same session (will not work):
    c = session.query(Child).filter_by(parent_id=p.id).first()
    print(c.id)
    # 3: Using the relationship (will not work):
    c = p.children[0]
    print(c.id)
Indeed (when using mysql) both 2 and 3 will throw an exception and stop execution with the following error: "Commands out of sync; you can't run this command now".
My question is: is there a way I can make the relationship lookup work in this context? Is there maybe a way to trick SQLAlchemy into using a new session when the first one is busy?
Try selectinload, as it supports eagerly loading with yield_per.
import sqlalchemy as sa

query = session.query(
    Parent
).options(
    sa.orm.selectinload(Parent.children)
).yield_per(5)
for parent in query:
    for child in parent.children:
        print(child.id)
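A runnable version of the selectinload suggestion, as a sketch: it assumes SQLAlchemy 1.4 or newer and uses in-memory SQLite with invented rows, so it shows the loading pattern rather than the MySQL "Commands out of sync" behaviour itself. The `child_counts` dictionary is only there to inspect the result.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import declarative_base, relationship, selectinload, sessionmaker

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    children = relationship("Child", backref="parent")


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Twelve parents with two children each, so yield_per(5) needs three batches.
session.add_all([Parent(children=[Child(), Child()]) for _ in range(12)])
session.commit()

# The children of each batch of parents are loaded eagerly with a second
# SELECT ... IN query, so no lazy load happens inside the loop body.
query = (
    session.query(Parent)
    .options(selectinload(Parent.children))
    .yield_per(5)
)

child_counts = {}
for parent in query:
    child_counts[parent.id] = len(parent.children)
```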

How to merge an object into the session based on ID using SQLAlchemy while keeping correct discriminator of child classes?

I am passing object IDs between threads, and would like to be able to merge an object into the session by ID. I could of course query objects across threads to accomplish this but would rather avoid the overhead of a database trip if the object is in the session already.
My code is working, but it is not correctly setting the discriminator value when I merge by the parent class. How can I ensure the discriminator is set correctly?
from sqlalchemy import create_engine, __version__
from sqlalchemy.orm import sessionmaker, scoped_session
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column
from sqlalchemy.types import Integer, String
Session = scoped_session(sessionmaker())
Base = declarative_base()
class Person(Base):
    __tablename__ = 'person'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    discriminator = Column(String, nullable=False)
    __mapper_args__ = {'polymorphic_identity': 'person',
                       'polymorphic_on': discriminator}

class Programmer(Person):
    __mapper_args__ = {'polymorphic_identity': 'programmer'}
    fav_language = Column(String)

engine = create_engine('postgresql+zxjdbc://mnaber:test123@localhost:5432/ajtest2', echo=True)
Session.configure(bind=engine)
Base.metadata.bind = engine
Base.metadata.drop_all(checkfirst=True)
Base.metadata.create_all()
s = Session()
michael = Programmer(name='Michael', fav_language='Python')
s.add(michael)
s.commit()
print "Sqlalchemy Version: %s" % __version__
print "INITIAL: %s %s" % (type(michael), michael.discriminator)
michael_merged = s.merge(Person(id=michael.id)) #Merge by parent class
print "MERGED: %s %s" % (type(michael_merged), michael_merged.discriminator)
print "FINAL: %s %s" % (type(michael), michael.discriminator)
michael_merged.fav_language = 'Jython'
s.add(michael_merged)
s.commit()
The interesting output of this is:
Sqlalchemy Version: 0.8.7
INITIAL: <class '__main__.Programmer'> programmer
MERGED: <class '__main__.Programmer'> person
FINAL: <class '__main__.Programmer'> person
site-packages/sqlalchemy/orm/persistence.py:154: SAWarning: Flushing object <Programmer at 0x18> with incompatible polymorphic identity 'person'; the object may not refresh and/or load correctly
mapper._validate_polymorphic_identity(mapper, state, dict_)
How should I ensure that MERGED and FINAL have a discriminator of programmer?
merge() assumes an object coming in looks the way you want it to look so you'd need to pass a Programmer here. Passing in Person(id=1) when there's really a Programmer in the database for that identity is not a supported pattern, the behavior is undefined. What happens at the moment is that your Person object has the "discriminator" of "person" set up front, so that's the value that gets flushed into the database; it overrides what's already in the session.
You can trick it into working like this:
p1 = Person(id=michael.id)
del p1.discriminator
michael_merged = s.merge(p1) #Merge by parent class
However, I can't guarantee that code like the above will always work in future SQLAlchemy versions. It would not, for example, work in earlier versions where the "discriminator" was chosen at flush time.
Your code example is such that the Programmer is already present in the Session; you can get this object in a class-agnostic way like this:
obj = session.query(Person).get(michael.id)
Then you have your Programmer object and you can modify it freely.
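The class-agnostic lookup can be sketched end to end as follows. This is an illustration under assumptions, not the asker's environment: it assumes SQLAlchemy 1.4 or newer, where `Session.get(Person, pk)` is the current spelling of `session.query(Person).get(pk)`, and it uses in-memory SQLite instead of PostgreSQL.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Person(Base):
    __tablename__ = "person"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    discriminator = Column(String, nullable=False)
    __mapper_args__ = {
        "polymorphic_identity": "person",
        "polymorphic_on": discriminator,
    }


class Programmer(Person):
    # Single-table inheritance: the extra column lives on the person table.
    __mapper_args__ = {"polymorphic_identity": "programmer"}
    fav_language = Column(String)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

michael = Programmer(name="Michael", fav_language="Python")
session.add(michael)
session.commit()

# Look the object up by the base class and primary key only. The identity
# map is keyed on the base class, so the existing Programmer comes back.
obj = session.get(Person, michael.id)
obj.fav_language = "Jython"
session.commit()

summary = (type(obj).__name__, obj.discriminator, obj.fav_language)
```

Because the object comes from the identity map (or from a polymorphic load), its discriminator is never overwritten with the base class value.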

Exclude soft deleted items in self referential relationship SQLAlchemy

I currently have a self-referential relationship on the Foo model:
parent_id = DB.Column(DB.Integer, DB.ForeignKey('foo.id'))
parent = DB.relation(
    'Foo',
    remote_side=[id],
    backref=DB.backref(
        'children',
        primaryjoin=('and_(foo.c.id==foo.c.parent_id, foo.c.is_deleted==False)')
    )
)
Now I am trying to exclude any children with is_deleted set as true. I'm pretty sure the problem is it is checking is_deleted against the parent, but I have no idea where to go from here.
How to modify the relationship so that children with is_deleted are not included in the result set?
I took a stab at answering this. My solution should work with SQLAlchemy>=0.8.
In effect nothing surprising is going on here, yet proper care has to be applied when using such patterns, as the state of the Session's identity map will not reflect the state of the DB all the time.
I used the post_update switch in the relationship to break the cyclical dependency which arises from this setup. For more information have a look at the SQLAlchemy documentation about this.
Warning: The fact that the Session does not always reflect the state of the DB may be a cause for nasty bugs and other confusions. In this example I use expire_all to show the real state of the DB, yet this is not a good solution because it reloads all objects and all un-flushed changes are lost. Use expire and expire_all with great care!
First we define the model
#!/usr/bin/env python
import sqlalchemy as sa
import sqlalchemy.orm as orm
from sqlalchemy.ext.declarative import declarative_base

engine = sa.create_engine('sqlite:///blah.db')
Base = declarative_base()
Base.bind = engine

class Obj(Base):
    __table__ = sa.Table(
        'objs', Base.metadata,
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('parent_id', sa.Integer, sa.ForeignKey('objs.id')),
        sa.Column('deleted', sa.Boolean),
    )
    # I used the remote() annotation function to make the whole thing more
    # explicit and readable.
    children = orm.relationship(
        'Obj',
        primaryjoin=sa.and_(
            orm.remote(__table__.c.parent_id) == __table__.c.id,
            orm.remote(__table__.c.deleted) == False,
        ),
        backref=orm.backref('parent',
                            remote_side=[__table__.c.id]),
        # This breaks the cyclical dependency which arises from my setup.
        # For more information see: http://stackoverflow.com/a/18284518/15274
        post_update=True,
    )

    def __repr__(self):
        return "<Obj id=%d children=%d>" % (self.id, len(self.children))
Then we try it out
def main():
    session = orm.sessionmaker(bind=engine)
    db = session()
    Base.metadata.create_all(engine)
    p1 = Obj()
    db.add(p1)
    db.flush()
    p2 = Obj()
    p2.deleted = True
    p1.children.append(p2)
    db.flush()
    # prints <Obj id=1 children=1>
    # This means the object is in the `children` collection, even though
    # it is deleted. If you want to prevent this you may want to use
    # custom collection classes (not for novices!).
    print p1
    # We let SQLAlchemy forget everything and fetch the state from the DB.
    db.expire_all()
    p3 = db.query(Obj).first()
    # prints <Obj id=1 children=0>
    # This indicates that the child which is still linked is not
    # loaded into the relationship, which is what we wanted.
    print p3
    db.rollback()

if __name__ == '__main__':
    main()
You should probably filter in the controller, not in the model.
This is not a perfect answer :-)
BTW, I want to say that this question is a perfect example of why ORMs and abstraction layers over SQL suck.
It looks like SQLAlchemy gets in the programmer's way instead of helping.
In SQL this is dead simple.
SELECT parent.*, child.*
FROM foo AS parent
JOIN foo AS child ON child.parent_id = parent.id
WHERE NOT child.is_deleted
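For comparison, the same self-join can also be written through SQLAlchemy using `aliased()`. This is a sketch under assumptions: SQLAlchemy 1.4 or newer, in-memory SQLite, a hypothetical minimal `Foo` model with only the three columns the SQL above mentions, and three made-up rows.

```python
from sqlalchemy import Boolean, Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import aliased, declarative_base, sessionmaker

Base = declarative_base()


class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("foo.id"))
    is_deleted = Column(Boolean, default=False, nullable=False)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# One parent with a live child (id 2) and a soft-deleted child (id 3).
session.add_all([
    Foo(id=1),
    Foo(id=2, parent_id=1),
    Foo(id=3, parent_id=1, is_deleted=True),
])
session.commit()

# Two aliases of the same table play the roles of parent and child.
parent = aliased(Foo, name="parent")
child = aliased(Foo, name="child")

rows = (
    session.query(parent.id, child.id)
    .select_from(parent)
    .join(child, child.parent_id == parent.id)
    .filter(child.is_deleted.is_(False))
    .all()
)
pairs = [tuple(r) for r in rows]
```

This filters at query time; it does not change what a `children` relationship collection contains.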

How to set up global connection to database?

I have a problem with setting up a database connection. I want to set up the connection once so that it is visible in all my controllers.
Now I use something like this in my controller:
db = create_engine('mysql://root:password@localhost/python')
metadata = MetaData(db)
email_list = Table('email',metadata,autoload=True)
In development.ini I have:
sqlalchemy.url = mysql://root:password@localhost/python
sqlalchemy.pool_recycle = 3600
How do I set up __init__.py?
I hope you got Pylons working; for anyone else who may later read this question, I'll present some pointers in the right direction.
First of all, you are only creating an engine and a metadata object. While you can use the engine to create connections directly, you would almost always use a Session to manage querying and updating your database.
Pylons automatically sets this up for you by creating an engine from your configuration file, then passing it to yourproject.model.__init__.py:init_model(), which binds it to a scoped_session object.
This scoped_session object is available from yourproject.model.meta and is the object you would use to query your database. For example:
record = meta.Session.query(model.MyTable).filter_by(id=42)
Because it is a scoped_session, it automatically creates a Session object and associates it with the current thread if one doesn't already exist. The scoped_session passes all actions (.query(), .add(), .delete()) down to the real Session object and thus gives you a simple way to interact with the database without having to manage the non-thread-safe Session object explicitly.
The scoped_session object, Session, from yourproject.model.meta is automatically associated with a metadata object created as either yourproject.model.meta:metadata (in Pylons 0.9.7 and below) or yourproject.model.meta:Base.metadata (in Pylons 1.0). Use this metadata object to define your tables. As you can see, in newer versions of Pylons the metadata is associated with a declarative_base() object named Base, which allows you to use SQLAlchemy's declarative style.
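The scoped_session behaviour described above can be tried outside Pylons. The following is a standalone sketch, assuming SQLAlchemy 1.4 or newer and an in-memory SQLite engine in place of the one Pylons would build from the configuration file; the `Email` model and the address are hypothetical.

```python
from sqlalchemy import Column, Integer, Unicode, create_engine
from sqlalchemy.orm import declarative_base, scoped_session, sessionmaker

# In Pylons the engine would come from the .ini file via init_model().
engine = create_engine("sqlite:///:memory:")
Session = scoped_session(sessionmaker(bind=engine))
Base = declarative_base()


class Email(Base):
    __tablename__ = "email"
    id = Column(Integer, primary_key=True)
    address = Column(Unicode, nullable=False)


Base.metadata.create_all(engine)

# Calls on the registry are proxied to the current thread's real Session.
Session.add(Email(address=u"someone@example.com"))
Session.commit()

# Calling the registry twice in one thread returns the same Session object.
same_session = Session() is Session()
count = Session.query(Email).count()

# Discard the thread-local Session, e.g. at the end of a request.
Session.remove()
```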
Using this from the controller
from yourproject import model
from yourproject.model import Session

class MyController(..):
    def resource(self):
        # Comparisons inside filter() use ==, not =
        result = Session.query(model.email_list).\
            filter(model.email_list.c.id == 42).one()
        return str(result)
Use real connections
If you really want to get a connection object simply use
from yourproject.model import Session
connection = Session.connection()
result = connection.execute("select 3+4;")
# more connection executions
Session.commit()
However this is all good, but what you should be doing is...
This leaves out that you are not really using SqlAlchemy much. The power of SqlAlchemy really shines when you start mapping your database tables to python classes. So anyone looking into using pylons with a database should take a serious look at what you can do with SqlAlchemy. If SqlAlchemy starts out intimidating simply start out with using its declarative approach, which should be enough for almost all pylons apps.
In your model instead of defining Table constructs, do this:
from sqlalchemy import Column, Integer, Unicode, ForeignKey
from sqlalchemy.orm import relation
from yourproject.model.meta import Base, Session

class User(Base):
    __tablename__ = 'users'
    # primary_key implies nullable=False
    id = Column(Integer, primary_key=True, index=True)
    # nullable defaults to True
    name = Column(Unicode, nullable=False)
    notes = relation("UserNote", backref="user")
    query = Session.query_property()

class UserNote(Base):
    __tablename__ = 'usernotes'
    # primary_key implies nullable=False
    id = Column(Integer, primary_key=True, index=True)
    # ForeignKey refers to the table name ('users'), not the class name,
    # and must come before the keyword arguments
    userid = Column(Integer, ForeignKey("users.id"), index=True)
    # nullable defaults to True
    text = Column(Unicode, nullable=False)
    query = Session.query_property()
Note the query objects. These are smart objects that live on the class and associate your classes with the scoped_session(), Session. This allows you to extract data from your database even more easily.
from sqlalchemy.orm import eagerload

def resource(self):
    user = User.query.filter(User.id==42).options(eagerload("notes")).one()
    return "\n".join([ x.text for x in user.notes ])
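Here is a self-contained sketch of the query_property pattern that runs without Pylons. Assumptions: SQLAlchemy 1.4 or newer, an in-memory SQLite engine, made-up rows, and `joinedload`, which is the current name for what older releases called `eagerload`.

```python
from sqlalchemy import Column, ForeignKey, Integer, Unicode, create_engine
from sqlalchemy.orm import (
    declarative_base,
    joinedload,
    relationship,
    scoped_session,
    sessionmaker,
)

engine = create_engine("sqlite:///:memory:")
Session = scoped_session(sessionmaker(bind=engine))
Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(Unicode, nullable=False)
    notes = relationship("UserNote", backref="user")
    # Class-level shortcut for Session.query(User).
    query = Session.query_property()


class UserNote(Base):
    __tablename__ = "usernotes"
    id = Column(Integer, primary_key=True)
    userid = Column(Integer, ForeignKey("users.id"), index=True)
    text = Column(Unicode, nullable=False)
    query = Session.query_property()


Base.metadata.create_all(engine)

Session.add(User(
    name=u"alice",
    notes=[UserNote(text=u"first"), UserNote(text=u"second")],
))
Session.commit()

# Load the user and its notes in one SELECT using a JOIN.
user = (
    User.query
    .filter(User.name == u"alice")
    .options(joinedload(User.notes))
    .one()
)
note_texts = sorted(note.text for note in user.notes)
```

`User.query` is evaluated against whatever Session the scoped_session holds for the current thread, so it needs no explicit session argument.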
Pylons 1.0 uses the declarative syntax.
In model/__init__.py you can write something like this:
from your_programm.model.meta import Session, Base
from sqlalchemy import *
from sqlalchemy.types import *

def init_model(engine):
    Session.configure(bind=engine)

class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    ...
What you want to do is modify the Globals class in your app_globals.py file to include a .engine (or whatever) attribute. Then, in your controllers, you use from pylons import app_globals and app_globals.engine to access the engine (or metadata, session, scoped_session, etc...).
