SQLAlchamy Database Construction & Reuse

SQLAlchamy Database Construction & Reuse - python

there's something I'm struggling to understand with SQLAlchamy from it's documentation and tutorials.
I see how to autoload classes from a DB table, and I see how to design a class and create from it (declaratively or using the mapper()) a table that is added to the DB.
My question is how does one write code that both creates the table (e.g. on first run) and then reuses it?
I don't want to have to create the database with one tool or one piece of code and have separate code to use the database.
Thanks in advance,
Peter

create_all() does not do anything if a table exists already, so just call it as soon as you set up your engine or connection.
(Note that if you change your table schema, create_all() will not update it! So you still need "another program" to do that.)
This is the usual pattern:
def createEngine(metadata, dsn, **args):
engine = create_engine(dsn, **args)
metadata.create_all(engine)
return engine
def doStuff(engine):
res = engine.execute('select * from mytable')
# etc etc
def main():
engine = createEngine(metadata, 'sqlite:///:memory:')
doStuff(engine)
if __name__=='__main__':
main()

I think you're perhaps over-thinking the situation. If you want to create the database afresh, you normally just call Base.metadata.create_all() or equivalent, and if you don't want to do that, you don't call it.
You could try calling it every time and handling the exception if it goes wrong, assuming that the database is already set up.
Or you could try querying for a certain table and if that fails, call create_all() to put everything in place.
Every other part of your app should work in the same way whether you perform the db creation or not.

Related

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation XXXXXXXX does not exist ------- Error using SQLAlchemy ORM [duplicate]

We host a multitenant app with SQLAlchemy and postgres. I am looking at moving from having separate databases for each tenant to a single database with multiple schemas. Does SQLAlchemy support this natively? I basically just want every query that comes out to be prefixed with a predetermined schema... e.g
select * from client1.users
instead of just
select * from users
Note that I want to switch the schema for all tables in a particular request/set of requests, not just a single table here and there.
I imagine that this could be accomplished with a custom query class as well but I can't imagine that something hasn't been done in this vein already.

well there's a few ways to go at this and it depends on how your app is structured. Here is the most basic way:
meta = MetaData(schema="client1")
If the way your app runs is one "client" at a time within the whole application, you're done.
But what may be wrong with that here is, every Table from that MetaData is on that schema. If you want one application to support multiple clients simultaneously (usually what "multitenant" means), this would be unwieldy since you'd need to create a copy of the MetaData and dupe out all the mappings for each client. This approach can be done, if you really want to, the way it works is you'd access each client with a particular mapped class like:
client1_foo = Client1Foo()
and in that case you'd be working with the "entity name" recipe at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/EntityName in conjunction with sometable.tometadata() (see http://docs.sqlalchemy.org/en/latest/core/metadata.html#sqlalchemy.schema.Table.tometadata).
So let's say the way it really works is multiple clients within the app, but only one at a time per thread. Well actually, the easiest way to do that in Postgresql would be to set the search path when you start working with a connection:
# start request
# new session
sess = Session()
# set the search path
sess.execute("SET search_path TO client1")
# do stuff with session
# close it. if you're using connection pooling, the
# search path is still set up there, so you might want to
# revert it first
sess.close()
The final approach would be to override the compiler using the #compiles extension to stick the "schema" name in within statements. This is doable, but would be tricky as there's not a consistent hook for everywhere "Table" is generated. Your best bet is probably setting the search path on each request.

If you want to do this at the connection string level then use the following:
dbschema='schema1,schema2,public' # Searches left-to-right
engine = create_engine(
'postgresql+psycopg2://dbuser#dbhost:5432/dbname',
connect_args={'options': '-csearch_path={}'.format(dbschema)})
But, a better solution for a multi-client (multi-tenant) application is to configure a different db user for each client, and configure the relevant search_path for each user:
alter role user1 set search_path = "$user", public

It can now be done using schema translation map in Sqlalchemy 1.1.
class User(Base):
__tablename__ = 'user'
id = Column(Integer, primary_key=True)
__table_args__ = {'schema': 'per_user'}
On each request, the Session can be set up to refer to a different schema each time:
session = Session()
session.connection(execution_options={
"schema_translate_map": {"per_user": "account_one"}})
# will query from the ``account_one.user`` table
session.query(User).get(5)
Referred it from the SO answer here.
Link to the Sqlalchemy docs.

You may be able to manage this using the sqlalchemy event interface. So before you create the first connection, set up a listener along the lines of
from sqlalchemy import event
from sqlalchemy.pool import Pool
def set_search_path( db_conn, conn_proxy ):
print "Setting search path..."
db_conn.cursor().execute('set search_path=client9, public')
event.listen(Pool,'connect', set_search_path )
Obviously this needs to be executed before the first connection is created (eg in the application initiallization)
The problem I see with the session.execute(...) solution is that this executes on a specific connection used by the session. However I cannot see anything in sqlalchemy that guarantees that the session will continue to use the same connection indefinitely. If it picks up a new connection from the connection pool, then it will lose the search path setting.
I am needing an approach like this in order to set the application search_path, which is different to the database or user search path. I'd like to be able to set this in the engine configuration, but cannot see a way to do this. Using the connect event does work. I'd be interested in a simpler solution if anyone has one.
On the other hand, if you are wanting to handle multiple clients within an application, then this won't work - and I guess the session.execute(...) approach may be the best approach.

from sqlalchemy 1.1,
this can be done easily using using schema_translation_map.
https://docs.sqlalchemy.org/en/11/changelog/migration_11.html#multi-tenancy-schema-translation-for-table-objects
connection = engine.connect().execution_options(
schema_translate_map={None: "user_schema_one"})
result = connection.execute(user_table.select())
Here is a detailed reviews of all options available:
https://github.com/sqlalchemy/sqlalchemy/issues/4081

It's possible to solve this on DB level. I suppose you have a dedicated user for your application who is granted some privileges on the schema. Just set search_path for him to this schema:
ALTER ROLE your_user IN DATABASE your_db SET search_path TO your_schema;

There is a schema property in Table definitions
I'm not sure if it works but you can try:
Table(CP.get('users', metadata, schema='client1',....)

I tried:
con.execute('SET search_path TO {schema}'.format(schema='myschema'))
and that didn't work for me. I then used the schema= parameter in the init function:
# We then bind the connection to MetaData()
meta = sqlalchemy.MetaData(bind=con, reflect=True, schema='myschema')
Then I qualified the table with the schema name
house_table = meta.tables['myschema.houses']
and everything worked.

You can just change your search_path. Issue
set search_path=client9;
at the start of your session and then just keep your tables unqualified.
You can also set a default search_path at a per-database or per-user level. I'd be tempted to set it to an empty schema by default so you can easily catch any failure to set it.
http://www.postgresql.org/docs/current/static/ddl-schemas.html#DDL-SCHEMAS-PATH

I found none of the above answers worked with SqlAlchmeny 1.2.4. This is the solution that worked for me.
from sqlalchemy import MetaData, Table
from sqlalchemy import create_engine
def table_schemato_psql(schema_name, table_name):
conn_str = 'postgresql://{username}:{password}#localhost:5432/{database}'.format(
username='<username>',
password='<password>',
database='<database name>'
)
engine = create_engine(conn_str)
with engine.connect() as conn:
conn.execute('SET search_path TO {schema}'.format(schema=schema_name))
meta = MetaData()
table_data = Table(table_name, meta,
autoload=True,
autoload_with=conn,
postgresql_ignore_search_path=True)
for column in table_data.columns:
print column.name

I use the following pattern.
engine = sqlalchemy.create_engine("postgresql://postgres:mypass#172.17.0.2/mydb")
for schema in ['schema1', 'schema2']:
engine.execute(CreateSchema(schema))
tmp_engine = engine.execution_options(schema_translate_map = { None: schema } )
Base.metadata.create_all(tmp_engine)

For anyone who is coming here, for a more general solution that can support MYSQL or Oracle, please refer to this guide.
So basically it set the schemas for the engine when the first connection to the database is made.
engine = create_engine("engine_url")
#event.listens_for(engine, "connect", insert=True)
def set_current_schema(dbapi_connection, connection_record):
cursor_obj = dbapi_connection.cursor()
cursor_obj.execute(f"USE {self.schemas_name}")
cursor_obj.close()
the query to execute depends is specific to the database you are using, so for PSQL you will have a different query, for ORACLE, you will have a different, etc.

It is possible to check a variable value in a function in testing?

I have the following function:
def create_tables(tables: list) -> None:
template = jinja2.Template(
open("/opt/airflow/etl_scripts/postgres/staging_orientdb_create_tables.sql", "r").read()
)
pg_hook = PostgresHook(POSTGRES_CONN_ID) # this is airflow module for Postgres
conn = pg_hook.get_conn()
for table in tables:
sql = template.render(TABLE_NAME=table)
with conn.cursor() as cur:
cur.execute(sql)
conn.commit()
Is there a solution to check the content of the "execute" or "sql" internal variable?
My test looks like this, but because I return nothing, I can't check:
def test_create_tables(mocker):
pg_mock = MagicMock(name="pg")
mocker.patch.object(
PostgresHook,
"get_conn",
return_value=pg_mock
)
mock_file_content = "CREATE TABLE IF NOT EXISTS {{table}}"
mocker.patch("builtins.open", mock_open(read_data=mock_file_content))
create_tables(tables=["mock_table_1", "mock_table_2"])
Thanks,

You cannot access internal variables of a function from the outside.
I strongly suggest refactoring your create_tables function then, if you already see that you want to test a specific part of its algorithm.
You could create a function like get_sql_from_table for example. Then test that separately and mock it out in your test_create_tables.
Although, since all it does is call the template rendering function form an external library, I am not sure, if that is even something you should be testing. You should assume/know that they test their functions themselves.
Same goes for the Postgres functions.
But the general advice stands: If you want to verify that a specific part of your code does what you expect it to do, factor it out into its own unit/function and test that separately.

How to target PostgreSQL schema in SQLAlchemy DB URI? [duplicate]

We host a multitenant app with SQLAlchemy and postgres. I am looking at moving from having separate databases for each tenant to a single database with multiple schemas. Does SQLAlchemy support this natively? I basically just want every query that comes out to be prefixed with a predetermined schema... e.g
select * from client1.users
instead of just
select * from users
Note that I want to switch the schema for all tables in a particular request/set of requests, not just a single table here and there.
I imagine that this could be accomplished with a custom query class as well but I can't imagine that something hasn't been done in this vein already.

If you want to do this at the connection string level then use the following:
dbschema='schema1,schema2,public' # Searches left-to-right
engine = create_engine(
'postgresql+psycopg2://dbuser#dbhost:5432/dbname',
connect_args={'options': '-csearch_path={}'.format(dbschema)})
But, a better solution for a multi-client (multi-tenant) application is to configure a different db user for each client, and configure the relevant search_path for each user:
alter role user1 set search_path = "$user", public

It can now be done using schema translation map in Sqlalchemy 1.1.
class User(Base):
__tablename__ = 'user'
id = Column(Integer, primary_key=True)
__table_args__ = {'schema': 'per_user'}
On each request, the Session can be set up to refer to a different schema each time:
session = Session()
session.connection(execution_options={
"schema_translate_map": {"per_user": "account_one"}})
# will query from the ``account_one.user`` table
session.query(User).get(5)
Referred it from the SO answer here.
Link to the Sqlalchemy docs.

You may be able to manage this using the sqlalchemy event interface. So before you create the first connection, set up a listener along the lines of
from sqlalchemy import event
from sqlalchemy.pool import Pool
def set_search_path( db_conn, conn_proxy ):
print "Setting search path..."
db_conn.cursor().execute('set search_path=client9, public')
event.listen(Pool,'connect', set_search_path )
Obviously this needs to be executed before the first connection is created (eg in the application initiallization)
The problem I see with the session.execute(...) solution is that this executes on a specific connection used by the session. However I cannot see anything in sqlalchemy that guarantees that the session will continue to use the same connection indefinitely. If it picks up a new connection from the connection pool, then it will lose the search path setting.
I am needing an approach like this in order to set the application search_path, which is different to the database or user search path. I'd like to be able to set this in the engine configuration, but cannot see a way to do this. Using the connect event does work. I'd be interested in a simpler solution if anyone has one.
On the other hand, if you are wanting to handle multiple clients within an application, then this won't work - and I guess the session.execute(...) approach may be the best approach.

from sqlalchemy 1.1,
this can be done easily using using schema_translation_map.
https://docs.sqlalchemy.org/en/11/changelog/migration_11.html#multi-tenancy-schema-translation-for-table-objects
connection = engine.connect().execution_options(
schema_translate_map={None: "user_schema_one"})
result = connection.execute(user_table.select())
Here is a detailed reviews of all options available:
https://github.com/sqlalchemy/sqlalchemy/issues/4081

It's possible to solve this on DB level. I suppose you have a dedicated user for your application who is granted some privileges on the schema. Just set search_path for him to this schema:
ALTER ROLE your_user IN DATABASE your_db SET search_path TO your_schema;

There is a schema property in Table definitions
I'm not sure if it works but you can try:
Table(CP.get('users', metadata, schema='client1',....)

I tried:
con.execute('SET search_path TO {schema}'.format(schema='myschema'))
and that didn't work for me. I then used the schema= parameter in the init function:
# We then bind the connection to MetaData()
meta = sqlalchemy.MetaData(bind=con, reflect=True, schema='myschema')
Then I qualified the table with the schema name
house_table = meta.tables['myschema.houses']
and everything worked.

You can just change your search_path. Issue
set search_path=client9;
at the start of your session and then just keep your tables unqualified.
You can also set a default search_path at a per-database or per-user level. I'd be tempted to set it to an empty schema by default so you can easily catch any failure to set it.
http://www.postgresql.org/docs/current/static/ddl-schemas.html#DDL-SCHEMAS-PATH

I found none of the above answers worked with SqlAlchmeny 1.2.4. This is the solution that worked for me.
from sqlalchemy import MetaData, Table
from sqlalchemy import create_engine
def table_schemato_psql(schema_name, table_name):
conn_str = 'postgresql://{username}:{password}#localhost:5432/{database}'.format(
username='<username>',
password='<password>',
database='<database name>'
)
engine = create_engine(conn_str)
with engine.connect() as conn:
conn.execute('SET search_path TO {schema}'.format(schema=schema_name))
meta = MetaData()
table_data = Table(table_name, meta,
autoload=True,
autoload_with=conn,
postgresql_ignore_search_path=True)
for column in table_data.columns:
print column.name

I use the following pattern.
engine = sqlalchemy.create_engine("postgresql://postgres:mypass#172.17.0.2/mydb")
for schema in ['schema1', 'schema2']:
engine.execute(CreateSchema(schema))
tmp_engine = engine.execution_options(schema_translate_map = { None: schema } )
Base.metadata.create_all(tmp_engine)

For anyone who is coming here, for a more general solution that can support MYSQL or Oracle, please refer to this guide.
So basically it set the schemas for the engine when the first connection to the database is made.
engine = create_engine("engine_url")
#event.listens_for(engine, "connect", insert=True)
def set_current_schema(dbapi_connection, connection_record):
cursor_obj = dbapi_connection.cursor()
cursor_obj.execute(f"USE {self.schemas_name}")
cursor_obj.close()
the query to execute depends is specific to the database you are using, so for PSQL you will have a different query, for ORACLE, you will have a different, etc.

When should I be calling flush() on SQLAlchemy?

I'm new to SQLAlchemy and have inherited a somewhat messy codebase without access to the original author.
The code is litered with calls to DBSession.flush(), seemingly any time the author wanted to make sure data was being saved. At first I was just following patterns I saw in this code, but as I'm reading docs, it seems this is unnecessary - that autoflushing should be in place. Additionally, I've gotten into a few cases with AJAX calls that generate the error "InvalidRequestError: Session is already flushing".
Under what scenarios would I legitimately want to keep a call to flush()?
This is a Pyramid app, and SQLAlchemy is being setup with:
DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension(), expire_on_commit=False))
Base = declarative_base()

The ZopeTransactionExtension on the DBSession in conjunction with the pyramid_tm being active on your project will handle all commits for you. The situations where you need to flush are:
You want to create a new object and get back the primary key.
DBSession.add(obj)
DBSession.flush()
log.info('look, my new object got primary key %d', obj.id)
You want to try to execute some SQL in a savepoint and rollback if it fails without invalidating the entire transaction.
sp = transaction.savepoint()
try:
foo = Foo()
foo.id = 5
DBSession.add(foo)
DBSession.flush()
except IntegrityError:
log.error('something already has id 5!!')
sp.rollback()
In all other cases involving the ORM, the transaction will be aborted for you upon exception, or committed upon success automatically by pyramid_tm. If you execute raw SQL, you will need to execute transaction.commit() yourself or mark the session as dirty via zope.sqlalchemy.mark_changed(DBSession) otherwise there is no way for the ZTE to know the session has changed.
Also you should leave expire_on_commit at the default of True unless you have a really good reason.

Creating pooled objects (Python)

I am writing a script that requires interacting with several databases (not concurrently). In order to facilitate this, I am mainting the db related information (connections etc) in a dictionary. As an aside, I am using sqlAlchemy for all interaction with the db. I don't know whether that is relevant to this question or not.
I have a function to set up the pool. It looks somewhat like this:
def setupPool():
global pooled_objects
for name in NAMES:
engine = create_engine("postgresql+psycopg2://postgres:pwd#localhost/%s" % name)
metadata = MetaData(engine)
conn = engine.connect()
tbl = Table('my_table', metadata, autoload=True)
info = {'db_connection': conn, 'table': tbl }
pooled_objects[name] = info
I am not sure if there are any gotchas in the code above, since I am using the same variable names, and its not clear (to me atleast), how the underlying pointers to the resources (connection are being handled). For example, will creating another engine (to a different db) and assigning it to the 'engine' variable cause the previous instance to be 'harvested' by the GC (since no code is using that reference yet - the pool is still being setup).
In short, is the code above OK?, and if not, why not - i.e. how may I fix it with respect to the issues mentioned above?

The code you have is perfectly good.
Just because you use the same variable name does not mean you are overriding (or freeing) another object that was assigned to that variable. In fact, you can look at the names as temporary labels to your objects.
Now, you store the final objects in the global dictionary pooled_objects, which means that until your program is done or your delete data from there explicitely, GC is not going to free them.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.