SQLAlchemy still able to get object from session after deletion - python

I have an endpoint to delete an object from my database. I delete it with the following code:
my_object = Object.query.get(id)
db.session.delete(my_object)
db.session.commit()
return json.dumps({'success': True})
I have an API test for the endpoint where I create an object and then use the endpoint to delete it. I am trying to assert that after the deletion happens, the object isn't in the database.
my_object = MyObject()
db.session.add(my_object)
db.session.commit()
response = requests.delete('{}/my-object/{}'.format(
    os.environ.get('MY_URL'),
    my_object.id
))
self.assertTrue(response.json()['success'])  # this passes
self.assertEqual(200, response.status_code)  # this passes
result = db.session.query(MyObject).get(my_object.id)
print(result.id)  # prints the id of the object even though it is deleted from the database
I think this is related to SQLAlchemy's session caching. I have tried db.session.flush() and db.session.expire_all() to no avail. The object is actually being deleted from the database, so I would expect the query result to be None.
I see this in the docs but haven't fully wrapped my head around it: when to commit and close a session
Thanks for any help.

So in your test code, you add the object to a session and commit it. It gets saved to the db and is in your session's identity map.
Then you hit your app, which has its own session. It deletes the object and commits; now it's gone from the db. But...
Your previous session doesn't know anything about this, and when you use .get(), it will give back what's in its identity map: a Python object with an ID. It won't refetch unless you close the session or force a refresh from the DB (Session.expire() / Session.refresh() - it's in the docs). If you used a clean third session, it would have a fresh identity map and would not be holding onto a reference to the Python object, so you'd get what you expect, i.e. no result. This is by design, because the identity map is what lets SQLAlchemy accumulate a bunch of changes and flush them to the database together when you commit.
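For reference, forcing that refresh looks roughly like this - a minimal sketch, assuming the Flask-SQLAlchemy db.session and my_object from the question:
db.session.expire(my_object)   # mark this one instance stale; the next
                               # attribute access emits a fresh SELECT
db.session.refresh(my_object)  # or re-select immediately; raises
                               # ObjectDeletedError if the row is gone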
So yeah, you're seeing the fetch from the Identity Map which is still alive. (You can even pop it open in the interactive interpreter and poke around) And it makes sense, because say you have two threads of different web requests and one is part way doing some longer lived stuff with an object when another request deletes the object. The first thread shouldn't barf on the Python code working with the object, because that would just trigger random exceptions wherever you were in the code. It should just think that it can do its thing, and then fail on commit, triggering a rollback.
HTH

db.session.expunge_all()
"Remove all object instances from this Session..."
http://docs.sqlalchemy.org/en/latest/orm/session_api.html#sqlalchemy.orm.session.Session.expunge_all
Or simply trigger db.session.remove() after each request.
For example in Flask with SQLAlchemy scoped_session:
@app.teardown_appcontext
def shutdown_session(exception=None):
    db.session.remove()
"The scoped_session.remove() method, as always, removes the current Session associated with the thread, if any. However, one advantage of the threading.local() object is that if the application thread itself ends, the “storage” for that thread is also garbage collected. So it is in fact “safe” to use thread local scope with an application that spawns and tears down threads, without the need to call scoped_session.remove(). However, the scope of transactions themselves, i.e. ending them via Session.commit() or Session.rollback(), will usually still be something that must be explicitly arranged for at the appropriate time, unless the application actually ties the lifespan of a thread to the lifespan of a transaction."
http://docs.sqlalchemy.org/en/latest/orm/contextual.html#thread-local-scope
http://docs.sqlalchemy.org/en/latest/orm/contextual.html#using-thread-local-scope-with-web-applications

After a fair amount of reading and testing I found creating a new session to be the easiest solution. I wasn't able to figure out how to refresh the record from the database even though the record was stale.
Here is some reading I did:
when to commit and close a session
is the session a cache
is the session thread safe
understanding sqlalchemy session
Here is how I solved the problem by creating a new db connection and session using Flask-SQLAlchemy:
[my other imports...]
from flask_sqlalchemy import SQLAlchemy  # the flask.ext.* import style is long deprecated
[my other code...]
def test_it_should_delete_my_object(self):
    my_object = MyObject()
    db.session.add(my_object)
    db.session.commit()
    response = requests.delete('{}/my-object/{}'.format(
        os.environ.get('MY_URL'),
        my_object.id
    ))
    self.assertTrue(response.json()['success'])
    self.assertEqual(200, response.status_code)
    new_db = SQLAlchemy(app)  # establish a new connection and session
    result = new_db.session.query(MyObject).get(my_object.id)  # the new session's identity map is empty, so this hits the db
    self.assertIsNone(result)
Thank you all for the help. Gonna be doing more research on this.

Related

Pyramid dbsession always does rollback

I'm trying to use pyramid's transaction manager to commit the changes. Unfortunately every time, they're rolled back regardless of what I do.
I tried the simple:
def handle(conn):
    conn.execute('''ALTER TABLE ....''')

with bootstrap(sys.argv[1]) as env:
    with env['request'].tm:
        handle(env['request'].dbsession)
As well as dropping down to connection and creating explicit transaction:
def handle(conn):
    with conn.begin() as tran:
        conn.execute('''ALTER TABLE ....''')
        tran.commit()

with bootstrap(sys.argv[1]) as env:
    with env['request'].tm:
        handle(env['request'].dbsession.connection())
and a few other ways, but every time, I'm getting a ROLLBACK instead of a COMMIT.
Doing a simple commit at the end of the first case results in:
Error: Transaction must be committed using the transaction manager
I'm quite lost as to what SQLAlchemy is actually doing in this case - why do I get a "success" with a rollback? What should I do to commit? What would it look like in the case of a nested, explicit transaction inside handle?
As ilja said in the comments, the right answer is that when you're operating on the connection directly rather than on the ORM session via ORM operations, zope.sqlalchemy has no way of knowing whether you changed anything. By default, zope.sqlalchemy requires you to either use the ORM or mark the session changed manually:
from zope.sqlalchemy import mark_changed
mark_changed(env['request'].dbsession)
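Applied to the first snippet from the question, the fix might look like this (a sketch; on newer SQLAlchemy versions the raw SQL string may need to be wrapped in sqlalchemy.text()):
from zope.sqlalchemy import mark_changed

def handle(dbsession):
    dbsession.connection().execute('''ALTER TABLE ....''')
    mark_changed(dbsession)  # tell the transaction manager real work happened

with bootstrap(sys.argv[1]) as env:
    with env['request'].tm:
        handle(env['request'].dbsession)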
Alternatively, if this is a common pattern for you then you can configure zope.sqlalchemy to just always assume the session was changed and thus issue commits instead of rollbacks by default.
zope.sqlalchemy.register(..., initial_state='changed')
You already have a call like this somewhere in your code; you just need to add the initial_state argument.

Using requests.Session() in Python as a context manager for a variable number of consecutive requests

I would like to use an object as a context manager, but control the moment when the __exit__ method is called.
In particular, I am using the Session() object offered by the Python requests module, which can be used as a context manager. I understand that by simply using e.g. requests.get() a new Session() object is created and destroyed each time, whereas persisting a session should give a shorter response time and keep headers, etc. for all requests within the session. I would like to make x requests with a Session(), then close the session and make y requests with a new Session() object, and so on until my code has finished running. However, since I have various calls to Session().get() in my code, I do not want to have it all contained within a with block. What are my options?
This is probably a simple problem for more experienced Python programmers, but I find myself stuck in figuring out how to organise my code. Any ideas about how this could be implemented? Happy to give any clarifications if needed. Thanks.
Note that calling Session() always creates a new Python object, here a requests session. This means that if you have, as you say
various calls to Session().get() in my code
then you are necessarily creating a new Session object each and every time. There is no way to wrap your code as written with something magic to reuse the same session. You have two main solutions.
1. Create one session object and reuse it.
Since your code probably looks something like this (where three independent sessions are created):
Session().get(...)
# ...
Session().get(...)
# ...
Session().get(...)
Instead, create one session and reuse it each time. You would need to modify your code like so:
mysession = Session() # create a new session
mysession.get(...)
# ...
mysession.get(...) # session is reused
# ...
mysession.get(...) # session is reused
# finished with session, make sure to close it!
mysession.close()
This avoids putting everything in a with block.
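If remembering the close() call is a concern, a variant worth considering is to wrap the work in try/finally, which guarantees cleanup while still avoiding a with block:
mysession = Session()
try:
    mysession.get(...)  # all your requests, wherever they live in your code
finally:
    mysession.close()   # runs even if a request raises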
2. Place your code in a with block.
with requests.Session() as s:
    s.get(...)
    s.get(...)
    s.get(...)
# session s automatically closed here
As noted in the documentation, the with context manager
will make sure the session is closed as soon as the with block is exited, even if unhandled exceptions occurred.
That is the main benefit - you don't have to remember to close the session, which is very easy to forget particularly when you are making a large number of requests.
You note that you do not want to have all your session calls contained within a with block. One recommendation is to make all of your requests inside a with block, then process the resulting responses outside it. This would separate your code into "get remote data" and "process remote data" sections, making it a little easier to read; however, without looking at your code/logic, this might not be possible.
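For illustration, that split might look like this (a sketch with hypothetical URLs):
urls = ['https://example.com/a', 'https://example.com/b']  # hypothetical

with requests.Session() as s:
    responses = [s.get(u) for u in urls]  # all network I/O happens here
# the session is closed, but the Response objects remain usable
for r in responses:
    print(r.status_code, len(r.content))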

SSL syscall error bad file descriptor using sqlalchemy and postgres

So I have a daemon process that talks to Postgres via sqlalchemy. The daemon does something like this:
while True:
    oEngine = setup_new_engine()
    with oEngine.connect() as conn:
        Logger.debug("connection established")
        DBSession = sessionmaker(bind=conn)()
        Logger.debug('DBSession created. id={0}'.format(id(DBSession)))
        # do a bunch of stuff with DBSession
        DBSession.commit()
        Logger.debug('DBSession committed. id={0}'.format(id(DBSession)))
On the first iteration of the forever loop everything works great. For a while. The DBSession successfully makes a few queries to the database. But then one query fails with the error:
OperationalError: (OperationalError) SSL SYSCALL error: Bad file descriptor
This speaks to me of a closed connection or file descriptor being used. But the connections are created and maintained by the daemon so I don't know what this means.
In other words what happens is:
create engine
open connection
setup dbsession
query dbsession => works great
query dbsession => ERROR
The query in question looks like:
(DBSession.query(Login)
    .filter(Login.LFTime == oLineTime)
    .filter(Login.success == self.success)
    .count())
which seems perfectly reasonable to me.
My question is: What kinds of reasons could there be for this kind of behaviour and how can I fix it or isolate the problem?
Let me know if you need more code. There is a heck of a lot of it so I went for the minimalist approach here...
I fixed this by thinking about the session scope instead of the transaction scope.
def do_stuff():
    oEngine = setup_new_engine()
    with oEngine.connect() as conn:
        Logger.debug("connection established")
        DBSession = sessionmaker(bind=conn)()
        # do a bunch of stuff with DBSession
        DBSession.commit()
        DBSession.close()

while True:
    do_stuff()
I would still like to know why this fixed things though...
You are creating the session inside your while loop, which is very ill-advised. With the code the way you had it the first time, you would spawn off a new connection at every iteration and leave it open. Before too long, you would be bound to hit some kind of limit and be unable to open yet another new session. (What kind of limit? Hard to say, but it could be a memory condition since DB connections are pretty weighty; it could be a DB-server limit where it will only accept a certain number of simultaneous user connections for performance reasons; hard to know and it doesn't really matter, because whatever the limit was, it has prevented you from using a very wasteful approach and hence has worked as intended!)
The solution you have hit upon fixes the problem because, as you open a new connection with each loop, so you also close it with each loop, freeing up the resources and allowing additional loops to create sessions of their own and succeed. However, this is still a lot of unnecessary busyness and a waste of processing resources on both the server and the client. I suspect it would work just as well-- and potentially be a lot faster-- if you move the sessionmaker outside the while loop.
def main():
    oEngine = setup_new_engine()
    with oEngine.connect() as conn:
        Logger.debug("connection established")
        DBSession = sessionmaker(bind=conn)()
        apparently_infinite_loop(DBSession)
        # close only after we are done and have somehow exited the infinite loop
        DBSession.close()

def apparently_infinite_loop(DBSession):
    while True:
        # do a bunch of stuff with DBSession
        DBSession.commit()
I don't currently have a working sqlalchemy setup, so there are likely some syntax errors in there, but anyway I hope it makes the point about the fundamental underlying issue.
More detail is available here: http://docs.sqlalchemy.org/en/rel_0_9/orm/session.html#session-faq-whentocreate
Some points from the docs to note:
"The Session will begin a new transaction if it is used again". So this is why you don't need to be constantly opening new sessions in order to get transaction scope; a commit is all it takes.
"As a general rule, the application should manage the lifecycle of the session externally to functions that deal with specific data." So your fundamental problem originally (and still) is all of that session management going on right down there inside the while loop right alongside your data processing code.

About refreshing objects in sqlalchemy session

I have a question about SQLAlchemy and object refreshing!
I am in a situation where I have two sessions, and the same object has been queried in both sessions! For particular reasons I cannot close one of the sessions.
I have modified the object and committed the changes in session A, but in session B the attributes are still the initial ones, without modifications!
Should I implement a notification system to communicate changes, or is there a built-in way to do this in SQLAlchemy?
Sessions are designed to work like this. The attributes of the object in Session B WILL keep the values they had when the object was first queried in Session B. Additionally, SQLAlchemy will not attempt to automatically refresh objects in other sessions when they change, nor do I think it would be wise to try to create something like this.
You should be actively thinking of the lifespan of each session as a single transaction in the database. How and when sessions need to deal with the fact that their objects might be stale is not a technical problem that can be solved by an algorithm built into SQLAlchemy (or any extension for SQLAlchemy): it is a "business" problem whose solution you must determine and code yourself. The "correct" response might be to say that this isn't a problem: the logic that occurs with Session B could be valid if it used the data at the time that Session B started. Your "problem" might not actually be a problem. The docs actually have an entire section on when to use sessions, but it gives a pretty grim response if you are hoping for a one-size-fits-all solution...
A Session is typically constructed at the beginning of a logical
operation where database access is potentially anticipated.
The Session, whenever it is used to talk to the database, begins a
database transaction as soon as it starts communicating. Assuming the
autocommit flag is left at its recommended default of False, this
transaction remains in progress until the Session is rolled back,
committed, or closed. The Session will begin a new transaction if it
is used again, subsequent to the previous transaction ending; from
this it follows that the Session is capable of having a lifespan
across many transactions, though only one at a time. We refer to these
two concepts as transaction scope and session scope.
The implication here is that the SQLAlchemy ORM is encouraging the
developer to establish these two scopes in his or her application,
including not only when the scopes begin and end, but also the expanse
of those scopes, for example should a single Session instance be local
to the execution flow within a function or method, should it be a
global object used by the entire application, or somewhere in between
these two.
The burden placed on the developer to determine this scope is one area
where the SQLAlchemy ORM necessarily has a strong opinion about how
the database should be used. The unit of work pattern is specifically
one of accumulating changes over time and flushing them periodically,
keeping in-memory state in sync with what’s known to be present in a
local transaction. This pattern is only effective when meaningful
transaction scopes are in place.
That said, there are a few things you can do to change how the situation works:
First, you can reduce how long your session stays open. Session B queries the object, and then later you do something with that object (in the same session) for which you want the attributes to be up to date. One solution is to perform this second operation in a separate session.
Another is to use the expire/refresh methods, as the docs show...
# immediately re-load attributes on obj1, obj2
session.refresh(obj1)
session.refresh(obj2)
# expire objects obj1, obj2, attributes will be reloaded
# on the next access:
session.expire(obj1)
session.expire(obj2)
You can use session.refresh() to immediately get an up-to-date version of the object, even if the session already queried the object earlier.
Run this to force the session to fetch the latest values from the database:
session.expire_all()
Excellent doc about the default behavior and lifespan of the session
I just had this issue and the existing solutions didn't work for me for some reason. What did work was to call session.commit(). After calling that, the object had the updated values from the database.
TL;DR Rather than working on Session synchronization, see if your task can be reasonably easily coded with SQLAlchemy Core syntax, directly on the Engine, without the use of (multiple) Sessions
For someone coming from SQL and JDBC experience, one critical thing to learn about SQLAlchemy, which, unfortunately, I didn't clearly come across in months of reading through the multiple documents, is that SQLAlchemy consists of two fundamentally different parts: the Core and the ORM. As the ORM documentation is listed first on the website and most examples use the ORM-like syntax, one gets thrown into working with it and sets oneself up for errors and confusion - if thinking about the ORM in terms of SQL/JDBC. The ORM uses its own abstraction layer that takes complete control over how and when actual SQL statements are executed. The rule of thumb is that a Session is cheap to create and kill, and it should never be re-used for anything in the program's flow and logic that may cause re-querying, synchronization or multi-threading. On the other hand, the Core is direct, no-frills SQL, very much like a JDBC driver. There is one place in the docs I found that "suggests" using Core over ORM:
it is encouraged that simple SQL operations take place here, directly on the Connection, such as incrementing counters or inserting extra rows within log
tables. When dealing with the Connection, it is expected that Core-level SQL
operations will be used; e.g. those described in SQL Expression Language Tutorial.
Although, it appears that using a Connection causes the same side effect as using a Session: a re-query of a specific record returns the same result as the first query, even if the record's content in the DB was changed. So, apparently, Connections are as "unreliable" as Sessions for reading DB content in "real time", but a direct Engine execution seems to work fine, as it picks a Connection object from the pool (assuming that the retrieved Connection would never be in the same "reuse" state relative to the query as the specific open Connection). The Result object should be closed explicitly, as per the SA docs.
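A sketch of that engine-level execution (1.x-era API, since that is what the answer describes; engine.execute() was removed in SQLAlchemy 2.0 in favor of engine.connect(); the table name is hypothetical):
from sqlalchemy import create_engine, text

engine = create_engine('mysql://<connection_string>')  # placeholder URL

# "connectionless" execution: the Engine checks a Connection out of the
# pool just for this statement, so no long-lived identity map is involved
result = engine.execute(text('SELECT * FROM my_table WHERE id = :id'), id=1)
rows = result.fetchall()
result.close()  # close the Result explicitly, per the SA docs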
What is your isolation level set to?
SHOW GLOBAL VARIABLES LIKE 'transaction_isolation';
By default, MySQL InnoDB's transaction_isolation is set to REPEATABLE-READ.
+-----------------------+-----------------+
| Variable_name | Value |
+-----------------------+-----------------+
| transaction_isolation | REPEATABLE-READ |
+-----------------------+-----------------+
Consider setting it to READ-COMMITTED.
You can set this for your sqlalchemy engine only via:
create_engine("mysql://<connection_string>", isolation_level="READ COMMITTED")
I think another option is:
engine = create_engine("mysql://<connection_string>")
engine = engine.execution_options(isolation_level="READ COMMITTED")  # execution_options() returns a copy of the engine, so keep the return value
Or set it globally in the DB via:
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;
https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html
and
https://docs.sqlalchemy.org/en/14/orm/session_transaction.html#setting-transaction-isolation-levels-dbapi-autocommit
If you had added an incorrect model to the session, you can do:
db.session.rollback()

sqlalchemy session not recognizing changes in mysql database (done by other processes)

The application consists of:
a main process (python+sqlalchemy) that periodically checks the db (sleeps most of the time)
child processes that write to the db
a web app that writes to the db
The problem is that the main process session doesn't seem to register changes made to the db outside that session. How do I ensure it does? (As of now, I am closing and reopening the session every time the process awakes and does its check.)
I am closing and reopening the session every time the process awakes and does its check
SQLAlchemy will not work like this. Changes are tracked in the session.
someobj = Session.query(SomeClass).first()
puts someobj into the Session's internal cache (the identity map). When you do someobj.attr = val, the change is tracked by the Session associated with someobj.
So if you pulled object1 from some session1, then closed session1, object1 is not associated with any session anymore and is not tracked.
The best solution would be to commit right upon the change:
newsession.add(object1)
newsession.commit()
Otherwise you will have to manage your own object cache and merge all of the objects upon each check.
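A sketch of what that check loop could look like - assuming a long-lived session and the SomeClass model from the answer, and using the expire_all() approach from the earlier answer to force re-reads:
def check_db(session):
    session.expire_all()  # mark cached instances stale so the next access re-selects
    someobj = session.query(SomeClass).first()
    # ... inspect someobj, and commit right upon any change:
    someobj.attr = 'val'
    session.commit()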
