SQLAlchemy DetachedInstanceError with regular attribute (not a relation) - python

I just started using SQLAlchemy and get a DetachedInstanceError and can't find much information on this anywhere. I am using the instance outside a session, so it is natural that SQLAlchemy is unable to load any relations if they are not already loaded, however, the attribute I am accessing is not a relation, in fact this object has no relations at all. I found solutions such as eager loading, but I can't apply to this because this is not a relation. I even tried "touching" this attribute before closing the session, but it still doesn't prevent the exception. What could be causing this exception for a non-relational property even after it has been successfully accessed once before? Any help in debugging this issue is appreciated. I will meanwhile try to get a reproducible stand-alone scenario and update here.
Update: This is the actual exception message with a few stacks:
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/attributes.py", line 159, in __get__
return self.impl.get(instance_state(instance), instance_dict(instance))
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/attributes.py", line 377, in get
value = callable_(passive=passive)
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/state.py", line 280, in __call__
self.manager.deferred_scalar_loader(self, toload)
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/mapper.py", line 2323, in _load_scalar_attributes
(state_str(state)))
DetachedInstanceError: Instance <ReportingJob at 0xa41cd8c> is not bound to a Session; attribute refresh operation cannot proceed
The partial model looks like this:
metadata = MetaData()
ModelBase = declarative_base(metadata=metadata)
class ReportingJob(ModelBase):
__tablename__ = 'reporting_job'
job_id = Column(BigInteger, Sequence('job_id_sequence'), primary_key=True)
client_id = Column(BigInteger, nullable=True)
And the field client_id is what is causing this exception with a usage like the below:
Query:
jobs = session \
.query(ReportingJob) \
.filter(ReportingJob.job_id == job_id) \
.all()
if jobs:
# FIXME(Hari): Workaround for the attribute getting lazy-loaded.
jobs[0].client_id
return jobs[0]
This is what triggers the exception later out of the session scope:
msg = msg + ", client_id: %s" % job.client_id

I found the root cause while trying to narrow down the code that caused the exception. I placed the same attribute access code at different places after session close and found that it definitely doesn't cause any issue immediately after the close of query session. It turns out the problem starts appearing after closing a fresh session that is opened to update the object. Once I understood that the state of the object is unusable after a session close, I was able to find this thread that discussed this same issue. Two solutions that come out of the thread are:
Keep a session open (which is obvious)
Specify expire_on_commit=False to sessionmaker().
The 3rd option is to manually set expire_on_commit to False on the session once it is created, something like: session.expire_on_commit = False. I verified that this solves my issue.

We were getting similar errors, even with expire_on_commit set to False. In the end it was actually caused by having two sessionmakers that were both getting used to make sessions in different requests. I don't really understand what was going on, but if you see this exception with expire_on_commit=False, make sure you don't have two sessionmakers initialized.

I had a similar problem with the DetachedInstanceError: Instance <> is not bound to a Session;
The situation was quite simple, I pass the session and the record to be updated to my function and it would merge the record and commit it to the database. In the first sample I would get the error, as I was lazy and thought that I could just return the merged object so my operating record would be updated (ie its is_modified value would be false). It did return the updated record and is_modified was now false but subsequent uses threw the error. I think this was compounded because of related child records but not entirely sure of that.
def EditStaff(self, session, record):
try:
r = session.merge(record)
session.commit()
return r
except:
return False
After much googling and reading about sessions etc, I realized that since I had captured the instance r before the commit and returned it, when that same record was sent back to this function for another edit/commit it had lost its session.
So to fix this I just query the database for the record just updated and return it to keep it in session and mark its is_modified value back to false.
def EditStaff(self, session, record):
try:
session.merge(record)
session.commit()
r = self.GetStaff(session, record)
return r
except:
return False
Setting the expire_on_commit=False also avoided the error as mentioned above, but I don't think it actually addresses the error, and could lead to many other issues IMO.

To throw my cause & solution into the ring, I use flask and flask-sqlalchemy to manage all my session stuff. This is fine when I'm doing things via the site, but when doing things via command line and scripts, you have to ensure that anything that's doing flask-y things has to do it with the flask context.
So, in my situation, I needed to get things from a database (using flask-sqlalchemy), then render them to templates (using flask's render_template), then email them (using flask-mail).
In code, what I'd done was something like,
def render_obj(db_obj):
with app.app_context():
return render_template('template_for_my_db_obj.html', db_obj=db_obj
def get_renders():
my_db_objs = MyDbObj.query.all()
renders = []
for day, _db_objs in itertools.groupby(my_db_objs, MyDbObj.get_date):
renders.extend(list(map(render_obj, _db_obj)))
return renders
def email_report():
renders = get_renders()
report = '\n'.join(renders)
with app.app_context():
mail.send(Message('Subject', ['me#me.com'], html=report))
(this is basically pseudocode, I was doing other things in the grouping section)
And when I was running, I'd get through the first _db_obj, but then I'd get the error on any run after.
The culprit? with app.app_context().
Basically it does a few things when you come out of that context, including kinda freshening up the db connections. One of the things that comes from that is getting rid of the last session that was around, which was the session that all the my_db_objs were associated with.
There's a few different options for solutions, but I went with a variant of,
def render_obj(db_obj):
return render_template('template_for_my_db_obj.html', db_obj=db_obj
def get_renders():
my_db_objs = MyDbObj.query.all()
renders = []
for day, _db_objs in itertools.groupby(my_db_objs, MyDbObj.get_date):
renders.extend(list(map(render_obj, _db_obj)))
return renders
def email_report():
with app.app_context():
renders = get_renders()
report = '\n'.join(renders)
mail.send(Message('Subject', ['me#me.com'], html=report))
Only 1 with app.app_context() which wraps them all. The main thing you need to do (if you've a setup like mine) is ensure any dB object you're using to be "inside" any app_context you're using. If you do what I did in the first iteration, all your dB objects will lose their session, ending in DetachedInstanceError like me.

My solution was a simple oversight;
I created an object, added and ,committed it to the db and after that I tried to access on of the original object attributes without refreshing session session.refresh(object)
user = UserFactory()
session.add(user)
session.commit()
# missing session.refresh(user) and causing the problem
print(user.name)

As for me (newbie), I made a mistake on the indent and close the session inside my loop, in which I loop each row, do some operation and commit each time.
So for those newbie like me, check your code before setting things like expire_on_commit=False, it may lead your to another trap.

My solution to this error was also a simple oversight, which I don't think any of the other answers cover.
My function is fetching object x, modifying it, then returning the original x, because I would like the older version.
Before committing and returning x, I was calling expunge_all, but it was "too late", as the object was already marked dirty.
The solution was simply to expunge the object as early as possible.
# pseudo code
x = session.fetch_x()
# adding the following line fixed it
session.expunge(x)
y = session.update(x)
return x

I have a similar problem in my current project and this fix works for me. Please check in your DB relationship for options lazy=True and change it to lazy='dynamic'.

Related

Instance is not bound to a Session

Just like many other people I run into the problem of an instance not being bound to a Session. I have read the SQLAlchemy docs and the top-10 questions here on SO. Unfortunatly I have not found an explanation or solution to my error.
My guess is that the commit() closes the session rendering the object unbound. Does this mean that In need to create two objects, one to use with SQL Alchemy and one to use with the rest of my code?
I create an object something like this:
class Mod(Base):
__tablename__ = 'mod'
insert_timestamp = Column(DateTime, default=datetime.datetime.now())
name = Column(String, nullable=False)
Then I add it to the database using this function and afterwards the object is usesless, I cannot do anything with it anymore, always getting the error that it is not bound to a session. I have tried to keep the session open, keep it closed, copy the object, open two sessions, return the object, return a copy of the object.
def add(self, dataobjects: list[Base]) -> bool:
s = self.sessionmaker()
try:
s.add_all(dataobjects)
except TypeError:
s.rollback()
raise
else:
s.commit()
return True
This is my session setup:
self.engine = create_engine(f"sqlite:///{self.config.database['file']}")
self.sessionmaker = sessionmaker(bind=self.engine)
Base.metadata.bind = self.engine
My last resort would be to create every object twice, once for SQL Alchemy and once so I can actually use the object in my code. This defeats the purpose of SQL Alchemy for me.
ORM entities are expired when committed. By default, if an entity attribute is accessed after expiry, the session emits a SELECT to get the current value from the database. In this case, the session only exists in the scope of the add method, so subsequent attribute access raises the "not bound to a session" error.
There are a few ways to approach this:
Pass expire _on_commit=False when creating the session. This will prevent automatic expiry so attribute values will remain accessible after leaving the add function, but they may be stale.
Create the session outside of the add function and pass it as an argument (or return it from add, though that's rather ugly). As long as the session is not garbage collected the entities remain bound to it.
Create a new session and do mod = session.merge(mod) for each entity to bind it to the new session.
Which option you choose depends on your application.

Celery pickling not playing nice with Cassandra driver, can't figure out the root cause

I'm experiencing some behavior that I can't quite figure out. I'm using Cassandra to store message objects, and I'm using Celery for async pulls and pushes to the database. Everything is working fine, except for a single Celery task; the other tasks that use the same code/classes work. Here's a rough breakdown of the code logic:
db_manager = DBManager()
class User(object):
def __init__(self, user_id):
... normal init stuff ...
self.loader()
#run_async
def loader(self):
... loads from database if found, otherwise pulls from API ...
# THIS WORKS
#celery.task(name='user-to-db', filter=task_method)
def to_db(self):
# db_manager is a custom backend that handles relevant db reads, writes, etc.
db_manager.add('users', self.user_payload)
# THIS WORKS
#celery.task(name='load-friends', filter=task_method)
def load_friends(self):
# Checks secondary redis index for friends of user
friends = redis.srandmember('users:the-users-id:friends', self.id, 20)
if not friends:
profiles = load_friends_from_api(user_id=self.id)
else:
query = "SELECT * FROM keyspace.users WHERE id IN ({friends})".format(friends=friends)
# Init a User object for every friend
loaded_friends = [User(friend) for friend in profiles]
# Returns a class container with all the instances of User(friend), accessible through a class property
return FriendContainer(self.id, loaded_friends)
# THIS DOES NOT WORK
#celery.task(name='get-user-messages', filter=task_method)
def get_user_messages(self):
# THIS IS WHERE IT FAILS #
messages = db_manager.get("SELECT message FROM keyspace.message_timelines WHERE user_id = {user_id}".format(user_id=self.id))
# THAT LINE ABOVE #
# Init a message class object for every message payload in database
msgs = [Message(m, user=self) for m in messages]
# Returns a message container class holding all the message objects, accessible through a class property
return MessageContainer(msgs)
This last class method throws an error:
File "/usr/local/lib/python2.7/dist-packages/kombu/serialization.py", line 356, in pickle_dumps
return dumper(obj, protocol=pickle_protocol)
EncodeError: Can't pickle <class 'cassandra.io.eventletreactor.message'>: attribute lookup cassandra.io.eventletreactor.message failed
cassandra.io.eventletreactor.message points to a user-defined type in Cassandra that I use as a container for message objects per user. The line that throws this error is:
messages = db_manager.get("SELECT message FROM keyspace.message_timelines WHERE user_id = {user_id}".format(user_id=self.id))
This is the method from DBManager():
class DBManager(object):
... stuff ...
def get(self, query):
# I do some stuff to prepare the query, namely substituting `WHERE this = that` for `WHERE this = ?` to create a Cassandra prepared statement.
statement = cassandra.prepare(query_prepared)
# I want these messages as a dict, not the default namedtuple
cassandra.row_factory = dict_factory
# User id is parsed out of query
results = cassandra.execute(statement, (user_id,))
rows = results.current_rows
# rows is a list of dicts, no weird class references or anything in there
return rows
I've read that Celery tasks out of class methods is/was kind of experimental, but I can't figure out why all the other methods qua tasks that use the same instance of DBManager are working.
The problem seems to be localized to some issue with the user-defined type message that's not playing nice within the Cassandra driver; however, if I run the get method from DBManager within the Celery task itself, it works. That is, if I copy/paste the code that is throwing the error from DBManager.get into User.get_user_messages, it works fine. If I try to call DBManager.get from within User.get_user_messages, it breaks.
I just can't figure out where the problem is. I can do all the following just fine:
Run the get_user_messages method without Celery, and it works.
Run the get_user_messages method WITH Celery if I run the get method code right in the Celery task method itself.
I can run other methods registered as Celery tasks that point to other methods in DBManager that use the Cassandra driver, even ones that insert the same message user-defined type into the database.
I've tried pickling ALL THE THINGS all the way down myself, and in various combinations, and can't reproduce the error.
What I have not tried:
Change serializer to json or yaml. There are a few convenience items in the db payload that won't serialize with either of those two.
Use dill instead of pickle. It seems like this should work without having to switch serializers given that I can get various parts working separately.
I could just say screw it and run the query directly through the Cassandra driver instead of my DBManager class, but I feel like this should be solvable and I'm just missing something really, really obvious, so obvious that I'm not seeing it. Any suggestions on where to look would be greatly appreciated.
In case of relevance: Cassandra 3.3, CQL 3.4, DataStax python driver 3.1
Meh, I found the problem, and it WAS really obvious. I guess I didn't actually try pickling all the things, just most of the things, and I didn't catch this in my 4am debugging stupor.
At any rate, cassandra.row_factory = dict_factory, when called on a user defined type, doesn't actually return everything as a dict. It gives a dict of {'label': message(x='this', y='that')}, where message is a namedtuple. The Cassandra driver dynamically creates the namedtuple inside of a class instance, and so pickle couldn't find it.

How can I cache/memoize my SQLAlchemy functions?

I am using FLask-OAuthlib and want to do some caching/memoization using Flask-Cache. I've got caching setup on my views but I'm having trouble with caching this function:
#oauth.clientgetter
#cache.memoize(timeout=86400)
def load_client(client_id):
return DBSession.query(Client).filter_by(client_id=client_id).first()
The first time the function is run (not cached yet) it runs fine but when it gets it from cache something gets messed up somehow and says it's an invalid client. I don't know if it's caching it incorrectly or if having the #oauth.clientgetter decorator somehow screws up the caching. Everything works fine without caching and the client is valid. I've tried to move the function around like so, but get the same result:
class Client(Base):
__tablename__ = 'client'
__table_args__ = {'autoload': True}
user = relationship('User')
#classmethod
#cache.memoize(timeout=86400)
def get_client(cls,client_id):
return DBSession.query(cls).filter_by(client_id=client_id).first()
Then, in my view I have:
#oauth.clientgetter
def load_client(client_id):
return Client.get_client(client_id)
But this gives the same result. I am using redis as my cache backend and the keys/values I have are:
1) "flask_cache_Pwd2uVDVikMYMDNB+gVWlW"
2) "flask_cache_api.models.Client.get_client_memver"
3) "flask_cache_http://lvho.st:5000/me"
GET flask_cache_Pwd2uVDVikMYMDNB+gVWlW:
"!ccopy_reg\n_reconstructor\np1\n(capi.models\nClient\np2\nc__builtin__\nobject\np3\nNtRp4\n(dp5\nS'_sa_instance_state'\np6\ng1\n(csqlalchemy.orm.state\nInstanceState\np7\ng3\nNtRp8\n(dp9\nS'manager'\np10\ng1\n(csqlalchemy.orm.instrumentation\n_SerializeManager\np11\ng3\nNtRp12\n(dp13\nS'class_'\np14\ng2\nsbsS'class_'\np15\ng2\nsS'modified'\np16\nI00\nsS'committed_state'\np17\n(dp18\nsS'instance'\np19\ng4\nsS'callables'\np20\n(dp21\nsS'key'\np22\n(g2\n(S'Iu6copdawXIQIskY5kwPgxFgU7JoE9lTSqmlqw29'\np23\nttp24\nsS'expired'\np25\nI00\nsbsVuser_id\np26\nL4L\nsVname\np27\nS'Default'\np28\nsV_default_scopes\np29\nS'email'\np30\nsVclient_id\np31\ng23\nsV_redirect_uris\np32\nS'http://localhost:8000/authorized/'\np33\nsVactive\np34\nI1\nsVclient_secret\np35\nS'Vnw0YJjgNzR06KiwXWmYz7aSPu1ht7JnY1eRil4s5vXLM9N2ph'\np36\nsVdescription\np37\nNsb."
GET flask_cache_api.models.Client.get_client_memver:
"!S'+gVWlW'\np1\n."
Try reversing the order of your decorators:
#cache.memoize(timeout=86400)
#oauth.clientgetter
def load_client(client_id):
return DBSession.query(Client).filter_by(client_id=client_id).first()
EDIT
The problem seem to be that a Client object is not pickle-able, while cache.memoize relies on objects' pickle-ability. Therefor, in one case, you end up with an invalid-client error (the client object did not "survive" the picke-dump-then-pickle-load process), and in another case, with some kind of caching error which (silently) prevents the object from being cached (I'm not sure what mechanism causes this silent-handling).
In any case, it seems to me you shouldn't attempt to memoize your client object in the first place.

Parent instance is not bound to a Session; lazy load operation of attribute ’account’ cannot proceed

While trying to do the following operation:
for line in blines:
line.account = get_customer(line.AccountCode)
I am getting an error while trying to assign a value to line.account:
DetachedInstanceError: Parent instance <SunLedgerA at 0x16eda4d0> is not bound to a Session; lazy load operation of attribute 'account' cannot proceed
Am I doing something wrong??
"detached" means you're dealing with an ORM object that is not associated with a Session. The Session is the gateway to the relational database, so anytime you refer to attributes on the mapped object, the ORM will sometimes need to go back to the database to get the current value of that attribute. In general, you should only work with "attached" objects - "detached" is a temporary state used for caching and for moving objects between sessions.
See Quickie Intro to Object States, then probably read the rest of that document too ;).
I had the same problem with Celery. Adding lazy='subquery' to relationship solved my problem.
I encountered this type of DetachedInstanceError when I prematurely close the query session (that is, having code to deal with those SQLAlchemy model objects AFTER the session is closed). So that's one clue to double check no session closure until you absolutely don't need interact with model objects, I.E. some Lazy Loaded model attributes etc.
I had the same problem when unittesting.
The solution was to call everything within the "with" context:
with self.app.test_client() as c:
res = c.post('my_url/test', data=XYZ, content_type='application/json')
Then it worked.
Adding the lazy attribute didn't work for me.
To access the attribute connected to other table, you should call it within session.
#contextmanager
def get_db_session(engine):
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
db = SessionLocal()
try:
yield db
except Exception:
db.rollback()
raise
finally:
db.close()
with get_db_session(engine) as sess:
data = sess.query(Groups).all()
# `group_users` is connected to other table
print([x.group_users for x in data]) # sucess
print([x.group_users for x in data]) # fail

When should I be calling flush() on SQLAlchemy?

I'm new to SQLAlchemy and have inherited a somewhat messy codebase without access to the original author.
The code is litered with calls to DBSession.flush(), seemingly any time the author wanted to make sure data was being saved. At first I was just following patterns I saw in this code, but as I'm reading docs, it seems this is unnecessary - that autoflushing should be in place. Additionally, I've gotten into a few cases with AJAX calls that generate the error "InvalidRequestError: Session is already flushing".
Under what scenarios would I legitimately want to keep a call to flush()?
This is a Pyramid app, and SQLAlchemy is being setup with:
DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension(), expire_on_commit=False))
Base = declarative_base()
The ZopeTransactionExtension on the DBSession in conjunction with the pyramid_tm being active on your project will handle all commits for you. The situations where you need to flush are:
You want to create a new object and get back the primary key.
DBSession.add(obj)
DBSession.flush()
log.info('look, my new object got primary key %d', obj.id)
You want to try to execute some SQL in a savepoint and rollback if it fails without invalidating the entire transaction.
sp = transaction.savepoint()
try:
foo = Foo()
foo.id = 5
DBSession.add(foo)
DBSession.flush()
except IntegrityError:
log.error('something already has id 5!!')
sp.rollback()
In all other cases involving the ORM, the transaction will be aborted for you upon exception, or committed upon success automatically by pyramid_tm. If you execute raw SQL, you will need to execute transaction.commit() yourself or mark the session as dirty via zope.sqlalchemy.mark_changed(DBSession) otherwise there is no way for the ZTE to know the session has changed.
Also you should leave expire_on_commit at the default of True unless you have a really good reason.

Categories