Querying objects added to a non committed session in SQLAlchemy

Querying objects added to a non committed session in SQLAlchemy - python

So I placed this question without too much context, and got downvoted, let's try again...
For one, I don't follow the logic behind SQLAlchemy's session.add. I understand that it queues the object for insertion, and I understand that session.query looks in the connected database rather than in the session, but is it at all possible, within SQLAlchemy, to query the session without first doing session.flush? My expectation from something which reads session.query is that it queries the session...
I am now manually looking in session.new after a None comes out of session.query().first().
There's two reasons why I don't want to do session.flush before my session.query,
one based on efficiency fears (why should I write to the database, and query the database if I am still within a session which the user may want to rollback?);
two is because I've adopted a fairly large program, and it manages to define its own Session whose instances causes flush to also commit.
So really the core of this question is who's helping me find an error in a GPL program on github!
This is a code snippet with a surprising behaviour in bauble/ghini:
# setting up things in ghini
# <replace-later>
import bauble
import bauble.db as db
db.open('sqlite:///:memory:', verify=False)
from bauble.prefs import prefs
import bauble.pluginmgr as pluginmgr
prefs.init()
prefs.testing = True
pluginmgr.load()
db.create(True)
Session = bauble.db.Session
from bauble.plugins.garden import Location
# </replace-later>
# now just plain straightforward usage
session = Session()
session.query(Location).delete()
session.commit()
u0 = session.query(Location).filter_by(code=u'mario').first()
print u0
u1 = Location(code=u'mario')
session.add(u1)
session.flush()
u2 = session.query(Location).filter_by(code=u'mario').one()
print u1, u2, u1==u2
session.rollback()
u3 = session.query(Location).filter_by(code=u'mario').first()
print u3
the output here is:
None
mario mario True
mario
here you have what I think is just standard simple code to set up a database:
from sqlalchemy import Column, Unicode
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Location(Base):
__tablename__ = 'location'
code = Column(Unicode(64), index=True, primary_key=True)
def __init__(self, code=None):
self.code = code
def __repr__(self):
return self.code
from sqlalchemy import create_engine
engine = create_engine('sqlite:///joindemo.db')
Base.metadata.create_all(engine)
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine, autoflush=False)
with this, the output of the same above code snippet is less surprising:
None
mario mario True
None

The reason why flushes in bauble end up emitting a COMMIT is the line 133 in db.py where they handle their history table:
table.insert(dict(table_name=mapper.local_table.name,
table_id=instance.id, values=str(row),
operation=operation, user=user,
timestamp=datetime.datetime.today())).execute()
Instead of issuing the additional SQL in the event handler using the passed in transactional connection, as they should, they execute the statement itself as is, which means it ends up using the engine as the bind (found through the table's metadata). Executing using the engine has autocommit behaviour. Since bauble always uses a SingletonThreadPool, there's just one connection per thread, and so that statement ends up committing the flushed changes as well. I wonder if this bug is the reason why bauble disables autoflush...
The fix is to change the event handling to use the transactional connection:
class HistoryExtension(orm.MapperExtension):
"""
HistoryExtension is a
:class:`~sqlalchemy.orm.interfaces.MapperExtension` that is added
to all clases that inherit from bauble.db.Base so that all
inserts, updates, and deletes made to the mapped objects are
recorded in the `history` table.
"""
def _add(self, operation, mapper, connection, instance):
"""
Add a new entry to the history table.
"""
... # a ton of code here
table = History.__table__
stmt = table.insert(dict(table_name=mapper.local_table.name,
table_id=instance.id, values=str(row),
operation=operation, user=user,
timestamp=datetime.datetime.today()))
connection.execute(stmt)
def after_update(self, mapper, connection, instance):
self._add('update', mapper, connection, instance)
def after_insert(self, mapper, connection, instance):
self._add('insert', mapper, connection, instance)
def after_delete(self, mapper, connection, instance):
self._add('delete', mapper, connection, instance)
It's worth a note that MapperExtension has been deprecated since version 0.7.
Regarding your views about the session I quote "Session Basics", which you really should read through:
In the most general sense, the Session establishes all conversations with the database and represents a “holding zone” for all the objects which you’ve loaded or associated with it during its lifespan. It provides the entrypoint to acquire a Query object, which sends queries to the database using the Session object’s current database connection, ...
and "Is the Session a cache?":
Yeee…no. It’s somewhat used as a cache, in that it implements the identity map pattern, and stores objects keyed to their primary key. However, it doesn’t do any kind of query caching. This means, if you say session.query(Foo).filter_by(name='bar'), even if Foo(name='bar') is right there, in the identity map, the session has no idea about that. It has to issue SQL to the database, get the rows back, and then when it sees the primary key in the row, then it can look in the local identity map and see that the object is already there. It’s only when you say query.get({some primary key}) that the Session doesn’t have to issue a query.
So:
My expectation from something which reads session.query is that it queries the session...
Your expectations are wrong. The Session handles talking to the DB – among other things.
There's two reasons why I don't want to do session.flush before my session.query,
one based on efficiency fears (why should I write to the database, and query the database if I am still within a session which the user may want to rollback?);
Because your DB may do validation, have triggers, and generate values for some columns – primary keys, timestamps, and the like. The data you thought you're inserting may end up something else in the DB and the Session has absolutely no way to know about that.
Also, why should SQLAlchemy implement a sort of an in-memory DB in itself, with its own query engine, and all the problems that come with synchronizing 2 databases? How would SQLAlchemy support all the different operations and functions of different DBs you query against? Your simple equality predicate example just scratches the surface.
When you rollback, you roll back the DB's transaction (along with the session's unflushed changes).
two is because I've adopted a fairly large program, and it manages to define its own Session whose instances causes flush to also commit.
Caused by the event handling bug.

Related

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation XXXXXXXX does not exist ------- Error using SQLAlchemy ORM [duplicate]

We host a multitenant app with SQLAlchemy and postgres. I am looking at moving from having separate databases for each tenant to a single database with multiple schemas. Does SQLAlchemy support this natively? I basically just want every query that comes out to be prefixed with a predetermined schema... e.g
select * from client1.users
instead of just
select * from users
Note that I want to switch the schema for all tables in a particular request/set of requests, not just a single table here and there.
I imagine that this could be accomplished with a custom query class as well but I can't imagine that something hasn't been done in this vein already.

well there's a few ways to go at this and it depends on how your app is structured. Here is the most basic way:
meta = MetaData(schema="client1")
If the way your app runs is one "client" at a time within the whole application, you're done.
But what may be wrong with that here is, every Table from that MetaData is on that schema. If you want one application to support multiple clients simultaneously (usually what "multitenant" means), this would be unwieldy since you'd need to create a copy of the MetaData and dupe out all the mappings for each client. This approach can be done, if you really want to, the way it works is you'd access each client with a particular mapped class like:
client1_foo = Client1Foo()
and in that case you'd be working with the "entity name" recipe at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/EntityName in conjunction with sometable.tometadata() (see http://docs.sqlalchemy.org/en/latest/core/metadata.html#sqlalchemy.schema.Table.tometadata).
So let's say the way it really works is multiple clients within the app, but only one at a time per thread. Well actually, the easiest way to do that in Postgresql would be to set the search path when you start working with a connection:
# start request
# new session
sess = Session()
# set the search path
sess.execute("SET search_path TO client1")
# do stuff with session
# close it. if you're using connection pooling, the
# search path is still set up there, so you might want to
# revert it first
sess.close()
The final approach would be to override the compiler using the #compiles extension to stick the "schema" name in within statements. This is doable, but would be tricky as there's not a consistent hook for everywhere "Table" is generated. Your best bet is probably setting the search path on each request.

If you want to do this at the connection string level then use the following:
dbschema='schema1,schema2,public' # Searches left-to-right
engine = create_engine(
'postgresql+psycopg2://dbuser#dbhost:5432/dbname',
connect_args={'options': '-csearch_path={}'.format(dbschema)})
But, a better solution for a multi-client (multi-tenant) application is to configure a different db user for each client, and configure the relevant search_path for each user:
alter role user1 set search_path = "$user", public

It can now be done using schema translation map in Sqlalchemy 1.1.
class User(Base):
__tablename__ = 'user'
id = Column(Integer, primary_key=True)
__table_args__ = {'schema': 'per_user'}
On each request, the Session can be set up to refer to a different schema each time:
session = Session()
session.connection(execution_options={
"schema_translate_map": {"per_user": "account_one"}})
# will query from the ``account_one.user`` table
session.query(User).get(5)
Referred it from the SO answer here.
Link to the Sqlalchemy docs.

You may be able to manage this using the sqlalchemy event interface. So before you create the first connection, set up a listener along the lines of
from sqlalchemy import event
from sqlalchemy.pool import Pool
def set_search_path( db_conn, conn_proxy ):
print "Setting search path..."
db_conn.cursor().execute('set search_path=client9, public')
event.listen(Pool,'connect', set_search_path )
Obviously this needs to be executed before the first connection is created (eg in the application initiallization)
The problem I see with the session.execute(...) solution is that this executes on a specific connection used by the session. However I cannot see anything in sqlalchemy that guarantees that the session will continue to use the same connection indefinitely. If it picks up a new connection from the connection pool, then it will lose the search path setting.
I am needing an approach like this in order to set the application search_path, which is different to the database or user search path. I'd like to be able to set this in the engine configuration, but cannot see a way to do this. Using the connect event does work. I'd be interested in a simpler solution if anyone has one.
On the other hand, if you are wanting to handle multiple clients within an application, then this won't work - and I guess the session.execute(...) approach may be the best approach.

from sqlalchemy 1.1,
this can be done easily using using schema_translation_map.
https://docs.sqlalchemy.org/en/11/changelog/migration_11.html#multi-tenancy-schema-translation-for-table-objects
connection = engine.connect().execution_options(
schema_translate_map={None: "user_schema_one"})
result = connection.execute(user_table.select())
Here is a detailed reviews of all options available:
https://github.com/sqlalchemy/sqlalchemy/issues/4081

It's possible to solve this on DB level. I suppose you have a dedicated user for your application who is granted some privileges on the schema. Just set search_path for him to this schema:
ALTER ROLE your_user IN DATABASE your_db SET search_path TO your_schema;

There is a schema property in Table definitions
I'm not sure if it works but you can try:
Table(CP.get('users', metadata, schema='client1',....)

I tried:
con.execute('SET search_path TO {schema}'.format(schema='myschema'))
and that didn't work for me. I then used the schema= parameter in the init function:
# We then bind the connection to MetaData()
meta = sqlalchemy.MetaData(bind=con, reflect=True, schema='myschema')
Then I qualified the table with the schema name
house_table = meta.tables['myschema.houses']
and everything worked.

You can just change your search_path. Issue
set search_path=client9;
at the start of your session and then just keep your tables unqualified.
You can also set a default search_path at a per-database or per-user level. I'd be tempted to set it to an empty schema by default so you can easily catch any failure to set it.
http://www.postgresql.org/docs/current/static/ddl-schemas.html#DDL-SCHEMAS-PATH

I found none of the above answers worked with SqlAlchmeny 1.2.4. This is the solution that worked for me.
from sqlalchemy import MetaData, Table
from sqlalchemy import create_engine
def table_schemato_psql(schema_name, table_name):
conn_str = 'postgresql://{username}:{password}#localhost:5432/{database}'.format(
username='<username>',
password='<password>',
database='<database name>'
)
engine = create_engine(conn_str)
with engine.connect() as conn:
conn.execute('SET search_path TO {schema}'.format(schema=schema_name))
meta = MetaData()
table_data = Table(table_name, meta,
autoload=True,
autoload_with=conn,
postgresql_ignore_search_path=True)
for column in table_data.columns:
print column.name

I use the following pattern.
engine = sqlalchemy.create_engine("postgresql://postgres:mypass#172.17.0.2/mydb")
for schema in ['schema1', 'schema2']:
engine.execute(CreateSchema(schema))
tmp_engine = engine.execution_options(schema_translate_map = { None: schema } )
Base.metadata.create_all(tmp_engine)

For anyone who is coming here, for a more general solution that can support MYSQL or Oracle, please refer to this guide.
So basically it set the schemas for the engine when the first connection to the database is made.
engine = create_engine("engine_url")
#event.listens_for(engine, "connect", insert=True)
def set_current_schema(dbapi_connection, connection_record):
cursor_obj = dbapi_connection.cursor()
cursor_obj.execute(f"USE {self.schemas_name}")
cursor_obj.close()
the query to execute depends is specific to the database you are using, so for PSQL you will have a different query, for ORACLE, you will have a different, etc.

Python flask-sqlalchemy: Do I have to commit session after a query?

I am writing an app in python flask-sqlalchemy with MySQL DB (https://flask-sqlalchemy.palletsprojects.com/en/2.x/) and I am wondering if I have to make "db.session.commit() or db.session.rollback()" after GET call, which only query DB .
For example:
#app.route('/getOrders')
def getOrders():
orders = Order.query.all()
# Do I have to put here "db.session.commit()" or "db.session.rollback" ?
return { 'orders': [order.serialize() for order in orders] }

orders = Order.query.all() is a SELECT query that could be extended to include additional filters (WHERE etc.). It doesn't alter the database, it simply reads values from it. You don't need to commit on a read for precisely that reason - what would you store other than "I just read this data"? Databases are concerned about this in other ways i.e. access permissions and logs.
Given the above, rollback doesn't make any sense because there are no changes to actually roll back.
Flask-SQLAlchemy does a bit of magic around sessions amongst other things. It's roughly equivalent to:
from sqlalchemy.orm import scoped_session, sessionmaker
Session = sessionmaker(bind=engine, autocommit=False, autoflush=False)
db_session = scoped_session(Session)
Followed with a method to close sessions:
def init_db(app):
app.teardown_appcontext(teardown_session)
def teardown_session(exception=None):
db_session.remove()
Bottom line being: no, you don't have to worry about commit or rollback here, even in SQL, and the session management (completely separate) is handled by Flask-SQLALchemy

Insert not working for SQLAlchemy database session

Why isn't a record being inserted? There is an id returned but when I check the database there is no new record.
From models.py
from zope.sqlalchemy import ZopeTransactionExtension
DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
And views.py
DBSession.execute(text('INSERT INTO (a,b,c) VALUES (\'a\',\'b\',\'c\') RETURNING id'), params=dict(a=a,b=b,c=c))
I've tried committing with transaction.commit() which doesn't get an error but doesn't insert a record. result.fetchone()[0] is getting an id.
And DBSession.commit which gets
assert self.transaction_manager.get().status == ZopeStatus.COMMITTING, "Transaction must be committed using the transaction manager"

This is because you are not using ORM to insert new rows threfore transaction doesn't know it should commit on it's own, because transaction state is not marked as dirty.
Place the following code after you DBSession.execute the query in your views.py.
from zope.sqlalchemy import mark_changed
session = DBSession()
session.execute(...your query...)
mark_changed(session)
At this point transaction should be able to properly commit your query, alternatively use ORM to insert the new row.
Here is a bit more on this subject:
https://pypi.python.org/pypi/zope.sqlalchemy/0.7.4#id15
By default, zope.sqlalchemy puts sessions in an 'active' state when they are first used. ORM write operations automatically move the session into a 'changed' state. This avoids unnecessary database commits. Sometimes it is necessary to interact with the database directly through SQL. It is not possible to guess whether such an operation is a read or a write. Therefore we must manually mark the session as changed when manual SQL statements write to the DB.

try
DBSession.flush()
after execute

When should I be calling flush() on SQLAlchemy?

I'm new to SQLAlchemy and have inherited a somewhat messy codebase without access to the original author.
The code is litered with calls to DBSession.flush(), seemingly any time the author wanted to make sure data was being saved. At first I was just following patterns I saw in this code, but as I'm reading docs, it seems this is unnecessary - that autoflushing should be in place. Additionally, I've gotten into a few cases with AJAX calls that generate the error "InvalidRequestError: Session is already flushing".
Under what scenarios would I legitimately want to keep a call to flush()?
This is a Pyramid app, and SQLAlchemy is being setup with:
DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension(), expire_on_commit=False))
Base = declarative_base()

The ZopeTransactionExtension on the DBSession in conjunction with the pyramid_tm being active on your project will handle all commits for you. The situations where you need to flush are:
You want to create a new object and get back the primary key.
DBSession.add(obj)
DBSession.flush()
log.info('look, my new object got primary key %d', obj.id)
You want to try to execute some SQL in a savepoint and rollback if it fails without invalidating the entire transaction.
sp = transaction.savepoint()
try:
foo = Foo()
foo.id = 5
DBSession.add(foo)
DBSession.flush()
except IntegrityError:
log.error('something already has id 5!!')
sp.rollback()
In all other cases involving the ORM, the transaction will be aborted for you upon exception, or committed upon success automatically by pyramid_tm. If you execute raw SQL, you will need to execute transaction.commit() yourself or mark the session as dirty via zope.sqlalchemy.mark_changed(DBSession) otherwise there is no way for the ZTE to know the session has changed.
Also you should leave expire_on_commit at the default of True unless you have a really good reason.

How to create a non-persistent Elixir/SQLAlchemy object?

Because of legacy data which is not available in the database but some external files, I want to create a SQLAlchemy object which contains data read from the external files, but isn't written to the database if I execute session.flush()
My code looks like this:
try:
return session.query(Phone).populate_existing().filter(Phone.mac == ident).one()
except:
return self.createMockPhoneFromLicenseFile(ident)
def createMockPhoneFromLicenseFile(self, ident):
# Some code to read necessary data from file deleted....
phone = Phone()
phone.mac = foo
phone.data = bar
phone.state = "Read from legacy file"
phone.purchaseOrderPosition = self.getLegacyOrder(ident)
# SQLAlchemy magic doesn't seem to work here, probably because we don't insert the created
# phone object into the database. So we set the id fields manually.
phone.order_id = phone.purchaseOrderPosition.order_id
phone.order_position_id = phone.purchaseOrderPosition.order_position_id
return phone
Everything works fine except that on a session.flush() executed later in the application SQLAlchemy tries to write the created Phone object to the database (which fortunately doesn't succeed, because phone.state is longer than the data type allows), which breaks the function which issues the flush.
Is there any way to prevent SQLAlchemy from trying to write such an object?
Update
While I didn't find anything on
using_mapper_options(save_on_init=False)
in the Elixir documentation (maybe you can provide a link?), it seemed to me worth a try (I would have preferred a way to prevent a single instance from being written instead of the whole entity).
At first it seemed that the statement has no effect and I suspected my SQLAlchemy/Elixir versions to be too old, but then I found out that the connection to the PurchaseOrderPosition entity (which I didn't modify) made with
phone.purchaseOrderPosition = self.getLegacyOrder(ident)
causes the phone object to be written again. If I remove the statement, everything seems to be fine.

You need to do
import elixir
elixir.options_defaults['mapper_options'] = { 'save_on_init': False }
to prevent Entity instances which you instantiate being auto-added to the session. Ideally, this should be done as early as possible in your code. You can also do this on a per-entity basis, via using_mapper_options(save_on_init=False) - see the Elixir documentation for more details.
Update:
See this post on the Elixir mailing list indicating that this is the solution.
Also, as Ants Aasma points out, you can use cascade options on the Elixir relationship to set up cascade options in SQLAlchemy. See this page for more details.

Well, sqlalchemy doesn't, by default.
Consider the following self-contained example code.
from sqlalchemy import Column, Integer, Unicode, create_engine
from sqlalchemy.orm import create_session
from sqlalchemy.ext.declarative import declarative_base
e = create_engine('sqlite://')
Base = declarative_base(bind=e)
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
name = Column(Unicode(50))
# create the empty table and a session
Base.metadata.create_all()
s = create_session(bind=e, autoflush=False, autocommit=False)
# assert the table is empty
assert s.query(User).all() == []
# create a new User instance but don't save it to database:
u = User()
u.name = 'siebert'
# I could run s.add(u) here but I won't
s.flush()
s.commit()
# assert the table is still empty
assert s.query(User).all() == []
So I'm not sure what's implicity adding your instances to the session. Normally you have to manually call s.add(u) to make it go to the session. I'm not familiar with elixir so perhaps this is some elixir trickery... Maybe you could remove it from the session, by using session.expunge().

Old post but I came across a similar issue, in my case in sqlalchemy it was caused by cascading on backrefs:
http://docs.sqlalchemy.org/en/rel_0_7/orm/session.html#backref-cascade
Turn it off on your backrefs so that you have to explicitly add things to the session

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.