The application consists of:
- a main process (Python + SQLAlchemy) that periodically checks the db (it sleeps most of the time)
- child processes that write to the db
- a web app that writes to the db
The problem is that the main process's session doesn't seem to register changes made to the db outside that session. How do I ensure that it does? (As of now, I am closing and reopening the session every time the process wakes up and does its check.)
"I am closing and reopening the session every time the process awakes and does its check"
SQLAlchemy will not work like this. Changes are tracked in the session.
someobj = Session.query(SomeClass).first()
puts someobj into the Session's internal cache (the identity map). When you do someobj.attr = val, the change is marked in the Session associated with someobj.
So if you pulled object1 from some session1 and then closed session1, object1 is no longer associated with any session and is not tracked.
The best solution would be to commit right after the change:
newsession.add(object1)
newsession.commit()
Otherwise you will have to manage your own object cache and merge all of the objects back into the session on each check.
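For illustration, here is a minimal sketch of the watcher loop under that approach (the processed filter, handle(), and the 60-second interval are hypothetical stand-ins; SomeClass is the class from above). Session.expire_all() marks every cached object stale so the next access re-SELECTs, and commit() itself expires the session's objects by default (expire_on_commit=True):
import time

def watch_loop(session):
    while True:
        session.expire_all()  # drop cached state so reads see other processes' writes
        for obj in session.query(SomeClass).filter_by(processed=False):  # hypothetical filter
            handle(obj)            # hypothetical per-row work
            obj.processed = True
        session.commit()           # persists changes and expires the cache again
        time.sleep(60)             # hypothetical polling interval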
I have a Flask application with the following app setup:
game_dict = load_game_dict()
app.run(host="0.0.0.0", debug=False, threaded=False, processes=3)
Here game_dict is some metadata loaded from the db. We have a requirement to refresh this game_dict once a week, so I defined another GET method for an admin to refresh it:
@app.route('/api/admin/reload/dict', methods=['GET'])
def api_admin_reload_dict():
""" API for reloading game dict from database at runtime """
global game_dict
game_dict = load_game_dict()
I found that it doesn't work: when I debugged, game_dict still kept its initial value after this call. I guess it is because I am using processes=3 here.
Does anyone have an idea of how to proceed in this situation?
Thanks
The multiple processes will not have access to each other's memory.
However, two processes can read data from a shared entity, where an entity could be a database, an object in memory, or another process.
If game_dict doesn't hold much data you could hit your db before every request and reload the game_dict variable with the new data.
If hitting the database for all the game_dict data is slow, you could keep a counter flag in the db that is updated whenever the game_dict metadata is updated, then compare the current process's counter to the db's counter and reload the current process's game_dict variable only when the counters differ; see the sketch below.
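For example, a sketch of that counter idea (load_dict_version() and its backing meta table are hypothetical; load_game_dict() is from the question):
_local_version = None

def get_game_dict():
    global _local_version, game_dict
    db_version = load_dict_version()  # hypothetical: e.g. SELECT version FROM game_dict_meta
    if db_version != _local_version:  # cheap check on each request
        game_dict = load_game_dict()  # full reload only when the db version changed
        _local_version = db_version
    return game_dict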
Alternatively, you could use Python's built-in multiprocessing Manager object for sharing state across processes by creating a separate server process that manages the shared state.
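A minimal sketch of the Manager approach (load_game_dict() stands in for the question's loader and is assumed to return a plain dict). The Manager runs a separate server process, and the dict proxy it returns is shared by every process forked after it is created:
from multiprocessing import Manager

from flask import Flask, jsonify

app = Flask(__name__)

def load_game_dict():
    return {'example': 1}  # stand-in for the question's db loader

manager = Manager()
game_dict = manager.dict(load_game_dict())  # proxy backed by the manager process

@app.route('/api/admin/reload/dict', methods=['GET'])
def api_admin_reload_dict():
    # updates through the proxy are visible to every worker process
    game_dict.clear()
    game_dict.update(load_game_dict())
    return jsonify(success=True)

if __name__ == '__main__':
    app.run(host='0.0.0.0', debug=False, threaded=False, processes=3)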
I use Amazon RDS, Lambda, Python, and SQLAlchemy. When I checked Amazon RDS Performance Insights, I found rollbacks being invoked, and they are still being invoked now.
But when I examine the other queries in Insights, there are no errors.
How can I find where the rollback is invoked, or why it is invoked?
I suspected a bad query, so I tried sending the same queries I found in Performance Insights, but there were no rollbacks.
I suspected a traffic issue, so I tried sending the same query many times (about 1,000,000) using a for loop in 5 terminals at the same time, then checked SHOW PROCESSLIST, but there were no rollbacks.
I heard that sqlalchemy.create_engine uses a connection pool, and that when a connection is closed SQLAlchemy invokes a rollback. But I don't know how to check this, or whether it explains the problem.
[screenshot of my RDS Performance Insights]
Rollbacks can originate either from rolling back a transaction to unwind its queries, or from returning a connection to the pool.
One way that you could get a feel for what your app is doing would be to hook into those rollback actions through the event system to enable some tracking.
There are two events that you'd need to look at:
ConnectionEvents.rollback:
Intercept rollback() events, as initiated by a Transaction.
PoolEvents.reset:
Called before the “reset” action occurs for a pooled connection.
You could set listeners on these events that increment some counters, or perform some logging that is specific to counting the number of rollbacks. Then you'd be able to get a feel for the relative weight of transaction rollbacks vs pool rollbacks.
E.g. using some crude global counters, but you can add whatever logic you need:
import logging

from sqlalchemy import event

POOL_ROLLBACKS = 0
TXN_ROLLBACKS = 0

@event.listens_for(YourEngine, 'reset')
def receive_reset(dbapi_connection, connection_record):
    # track a pool rollback (connection being returned to the pool)
    global POOL_ROLLBACKS
    POOL_ROLLBACKS += 1
    logging.debug(f"Pool rollback count: {POOL_ROLLBACKS}")

@event.listens_for(YourEngine, 'rollback')
def receive_rollback(conn):
    # track a transaction-based rollback
    global TXN_ROLLBACKS
    TXN_ROLLBACKS += 1
    logging.debug(f"Transaction rollback count: {TXN_ROLLBACKS}")
I am using TurboGears2 with a MySQL db. With the same code, the single-threaded case can update/write to the tables, but the multi-threaded case raises no error and yet no write is successful.
Outside TurboGears2, multiple threads can write to the tables with no problems.
There are no errors or complaints with multiple threads under tg2, just no successful writes to the table.
I would be very grateful if anyone using tg2 could advise.
With default configuration settings, in a regular request/response cycle, TurboGears2 enables a transaction manager to automatically commit changes to the database when a controller has finished processing a request.
This is introduced in the Wiki in 20 Minutes tutorial:
[...] you would usually need to flush the SQLAlchemy Unit of Work and commit the currently running transaction, those are operations that TurboGears2 transaction management will automatically do for us.
You don’t have to do anything to use this transaction management system, it should just work.
However, for everything that is outside a regular request/response cycle, for example a stream, or a different thread like a scheduler, manually flushing the session and committing the transaction is required. This is performed with DBSession.flush() and transaction.commit().
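For example, a minimal sketch of a write performed from a scheduler thread (the project and model names are hypothetical; DBSession and the transaction package are the standard TG2 pieces):
import transaction

from myproject.model import DBSession, MyEntity  # hypothetical project and model

def scheduled_job():
    try:
        DBSession.add(MyEntity(name='example'))
        DBSession.flush()     # flush the SQLAlchemy unit of work
        transaction.commit()  # commit through the transaction manager
    except Exception:
        transaction.abort()   # roll back on failure
        raise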
I have an endpoint to delete an object from my database. I delete it with the following code:
my_object = Object.query.get(id)
db.session.delete(my_object)
db.session.commit()
return json.dumps({'success': True})
I have an API test for the endpoint where I create an object and then use the endpoint to delete it. I am trying to assert that, after the deletion happens, the object isn't in the database.
my_object = MyObject()
db.session.add(my_object)
db.session.commit()
response = requests.delete('{}/my-object/{}'.format(
os.environ.get('MY_URL'),
my_object.id
))
self.assertTrue(response.json()['success'])  # this passes
self.assertEqual(200, response.status_code)  # this passes
result = db.session.query(MyObject).get(my_object.id)
print(result.id)  # prints the id even though the row is deleted from the database
I think this is related to some SQLAlchemy session caching. I have tried db.session.flush() and db.session.expire_all() to no avail. The object is actually being deleted from the database, so I would expect the query result to be None.
I see this in the docs but haven't fully wrapped my head around it: when to commit and close a session
Thanks for any help.
So in your test code, you add the object to a session and commit it. It gets saved to the db and is in your session's identity map.
Then you hit your app, which has its own session. It deletes the object and commits; now it's gone from the db. But...
Your previous session doesn't know anything about this, and when you use .get(), it will give back what's in its identity map: a Python object with an ID. It won't refetch unless you close the session or force a refresh from the DB (I can't remember offhand how to do this, but you can; it's in the docs somewhere, and there's a sketch below). If you used a clean third session, it would have a fresh identity map and would not be holding a reference to the Python object, so you'd get what you expect, i.e. no result. This is by design, because the identity map allows SQLAlchemy to chain a bunch of changes together into one optimal SQL query that is only fired when you commit.
So yeah, you're seeing the fetch from the identity map, which is still alive (you can even pop it open in the interactive interpreter and poke around). And it makes sense: say you have two threads serving different web requests, and one is partway through some longer-lived work with an object when another request deletes that object. The first thread shouldn't barf in the Python code working with the object, because that would just trigger random exceptions wherever you were in the code. It should just think it can do its thing, and then fail on commit, triggering a rollback.
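For reference, a short sketch (reusing the question's names) of forcing a round trip to the DB instead of trusting the identity map:
db.session.expire(my_object)   # mark just this instance stale; the next attribute access reloads it
db.session.refresh(my_object)  # or reload immediately; raises ObjectDeletedError if the row is gone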
HTH
db.session.expunge_all()
"Remove all object instances from this Session..."
http://docs.sqlalchemy.org/en/latest/orm/session_api.html#sqlalchemy.orm.session.Session.expunge_all
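In the test that would look something like this (sketch reusing the question's names):
db.session.expunge_all()  # empty this session's identity map
result = db.session.query(MyObject).get(my_object.id)  # emits a fresh SELECT; None once the row is deleted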
Or simply trigger db.session.remove() after each request.
For example in Flask with SQLAlchemy scoped_session:
@app.teardown_appcontext
def shutdown_session(exception=None):
db.session.remove()
"The scoped_session.remove() method, as always, removes the current Session associated with the thread, if any. However, one advantage of the threading.local() object is that if the application thread itself ends, the “storage” for that thread is also garbage collected. So it is in fact “safe” to use thread local scope with an application that spawns and tears down threads, without the need to call scoped_session.remove(). However, the scope of transactions themselves, i.e. ending them via Session.commit() or Session.rollback(), will usually still be something that must be explicitly arranged for at the appropriate time, unless the application actually ties the lifespan of a thread to the lifespan of a transaction."
http://docs.sqlalchemy.org/en/latest/orm/contextual.html#thread-local-scope
http://docs.sqlalchemy.org/en/latest/orm/contextual.html#using-thread-local-scope-with-web-applications
After a fair amount of reading and testing, I found creating a new session to be the easiest solution. I wasn't able to figure out how to refresh the record from the database even though the record was stale.
Here is some reading I did:
when to commit and close a session
is the session a cache
is the session thread safe
understanding sqlalchemy session
Here is how I solved the problem by creating a new db connection and session using Flask-SQLAlchemy:
# [my other imports...]
from flask_sqlalchemy import SQLAlchemy
# [my other code...]

def test_it_should_delete_my_object(self):
    my_object = MyObject()
    db.session.add(my_object)
    db.session.commit()
    response = requests.delete('{}/my-object/{}'.format(
        os.environ.get('MY_URL'),
        my_object.id
    ))
    self.assertTrue(response.json()['success'])
    self.assertEqual(200, response.status_code)
    new_db = SQLAlchemy(app)  # establish a new connection and session
    result = new_db.session.query(MyObject).get(my_object.id)  # query via the fresh session
    self.assertIsNone(result)
Thank you all for the help. Gonna be doing more research on this.
I am using Django models with Oracle, and there are two processes, each of which should have its own DB connection.
In order for both processes to have their own DB connection, I close the connection in the main process before forking, so that the child process doesn't inherit a copy of the DB connection:
from django.db import connection

connection.close()     # close before forking so the child doesn't inherit the connection
childProcess.start()   # the child lazily re-opens its own connection
As a result, each process re-opens its own DB connection when it first tries to access the DB through the Django model.
In this circumstance, the main process works fine: django.db.connection.queries returns query information in that process. But in the child process it always returns an empty list.
How can I get query information in the child process using django.db.connection.queries as well?
I found out what I got wrong in the problem I asked about.
The child process creates a new thread.
This thread only monitors the queries run by the main thread (of the child process).
I thought the connection object offered by Django was a global resource within a process (that is why I split the roles between threads: one runs the queries and the other monitors them).
However, after a series of tests, I found that a connection object only collects query information for the queries run by its own thread.
This means that django.db.connection works per thread, not per process.
As a result, I had mistaken the cause for a multiprocessing problem.
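A minimal sketch demonstrating the finding (the model name is hypothetical, and note that connection.queries is only populated when DEBUG=True):
import threading

from django.db import connection
from myapp.models import MyModel  # hypothetical model

def worker():
    list(MyModel.objects.all())                     # runs on this thread's own connection
    print('worker sees:', len(connection.queries))  # >= 1 here

t = threading.Thread(target=worker)
t.start()
t.join()
print('main sees:', len(connection.queries))        # 0: the worker's queries are invisible here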