How to automap a backend-agnostic Base in SQLAlchemy - python

I am trying to take a simple database schema in Oracle and migrate it to a mssql database. There are other ways I can do this but my first thought was to utilize SQLAlchemy's automap and create_all functionality to do it pretty much instantaneously.
Unfortunately when I attempt to do so I run into some conversion errors:
Input:
from sqlalchemy.ext.automap import automap_base
from custom_connections import connect_to_oracle, connect_to_mssql
Base = automap_base()
oracle_engine = connect_to_oracle()
mssql_engine = connect_to_mssql()
Base.prepare(oracle_engine, reflect=True, schema = ‘ORACLE_MAIN_DB’)
Base.metadata.create_all(mssql_engine)
(Note that the connect_to functions are custom functions which return sqlalchemy engines. Currently they just return engines with base settings.)
Output:
CompileError: (in table 'acct', column 'acctnbr'): Compiler <sqlalchemy.dialects.mssql.base.MSTypeCompiler object at 0x00000268E8FF6DA0> can't render element of type <class 'sqlalchemy.dialects.oracle.base.NUMBER'>
The issue is that while Sqlalchemy is converting most types to sqlalchemy types when mapping the Base, it doesn't do the same with Oracle NUMBER types. I attempted a similar trick using alembic autogeneration off the automapped Base, but the Oracle NUMBER types caused issues there as well.
Given all the power behind it, I would have thought Sqlalchemy would be able to handle this without any issues. Is there a technique or setting I could use when running this code which would cause it to convert all types to their Sqlalchemy equivalent when mapping the base instead of just most types?

This is described in the SQLAlchemy docs about reflection:
from sqlalchemy import MetaData, Table, create_engine
from sqlalchemy import event
mysql_engine = create_engine("mysql://scott:tiger#localhost/test")
metadata_obj = MetaData()
my_mysql_table = Table("my_table", metadata_obj, autoload_with=mysql_engine)
# my_table should already exist in your db, this is the table you want to convert
#event.listens_for(metadata_obj, "column_reflect")
def genericize_datatypes(inspector, tablename, column_dict):
column_dict["type"] = column_dict["type"].as_generic()
my_generic_table = Table("my_table", metadata_obj, autoload_with=mysql_engine)
# generic table will be a generic version of my_table
sqlite_eng = create_engine("sqlite:///test.db", echo=True)\
my_generic_table.create(sqlite_eng)

Related

How to specify fillfactor for a table in sqlalchemy?

Let's say I have a simple table. In raw SQL it looks like this:
CREATE TABLE events (id INT PRIMARY KEY) WITH (fillfactor=60);
My question is how to specify fillfactor for a table using sqlalchemy declarative base?
Of course I can achieve that by using raw SQL in sqlalchemy like execute("ALTER TABLE events SET (fillfactor=60)"), but I'm interested whether there is a way to do that using native sqlalchemy tools.
I've tried the following approach, but that didnt't work:
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class SimpleExampleTable(Base):
__tablename__ = 'events'
__table_args__ = {'comment': 'events table', "fillfactor": 60}
id = Column(Integer, primary_key=True)
TypeError: Additional arguments should be named <dialectname>_<argument>, got 'fillfactor'
Looking through documentation I've managed to find information only about fillfactor usage in indexes.
My environment:
python 3.9
sqlalchemy 1.3.22
PostgreSQL 11.6
Tbh, have the same question, and the only thing I found which is related to the fillfactor in the SQLAlchemy docs is the one with index (link to docs):
PostgreSQL allows storage parameters to be set on indexes. The storage
parameters available depend on the index method used by the index.
Storage parameters can be specified on Index using the postgresql_with
keyword argument:
Index('my_index', my_table.c.data, postgresql_with={"fillfactor": 50})
But it seems, there is no setting option where you can set the fillfactor directly for the table.
But there is still an option to run the raw SQL query (as the alembic migration, let's say):
ALTER TABLE mytable SET (fillfactor = 70);
Note that setting fillfactor on an existing table will not rearrange
the data, it will only apply to future inserts. But you can use
VACUUM to rewrite the table, which will respect the new fillfactor
setting.
The previous quote is taken from here
Extending the answer from Max Kapustin, you can use an event listener to automatically execute the ALTER TABLE statement when the table is created.
import sqlalchemy as sa
engine = sa.create_engine('postgresql:///test', echo=True, future=True)
tablename = 't65741211'
tbl = sa.Table(
tablename,
sa.MetaData(),
sa.Column('id', sa.Integer, primary_key=True),
listeners=[
(
'after_create',
sa.schema.DDL(
f"""ALTER TABLE "{tablename}" SET (fillfactor = 70)"""
),
)
],
)
tbl.drop(engine, checkfirst=True)
tbl.create(engine)

SQLAlchemy's automap_base creates bad collections for one-to-many tables

I am using Flask-SQLAlchemy to recreate tables in an existing SQL Server database. The usual declarative base wasn't working out so I am trying to use the automap base.
My DB has a table called 'orders' and I am trying to query the data:
db = SQLAlchemy(app)
db.Model.metadata.reflect(bind=db.engine)
Base = automap_base()
Base.prepare(db.engine, reflect=True)
db.session.query(Base.classes.orders).all()
However, an ArgumentError is thrown:
ArgumentError: orders.orders and back-reference
orders.orders_collection are both of the same
direction symbol('ONETOMANY'). Did you mean to set remote_side on the
many-to-one side ?
The SQLAlchemy doc tells me that the latter orders.orders_collection is automatically created. What could I do to make sure the creation of this collection doesn't lead to this ArgumentError?

why use sqlalchemy declarative api?

New to sqlalchemy and somewhat novice with programing and python. I had wanted to query a table. It seems I can use the all() function when querying but cannot filter without creating a class.
1.) Can I filter without creating a class and using the declarative api? Is the filtering example stated below incorrect?
2.) When would it be appropriate to use declarative api in sqlalchemy and when would it not be appropriate?
import sqlalchemy as sql
from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm import sessionmaker
db = sql.create_engine('postgresql://postgres:password#localhost:5432/postgres')
engine = db.connect()
meta = MetaData(engine)
session = sessionmaker(bind=engine)
session = session()
files = Table('files',meta,
Column('file_id',Integer,primary_key=True),
Column('file_name',String(256)),
Column('query',String(256)),
Column('results',Integer),
Column('totalresults',Integer),
schema='indeed')
session.query(files).all() #ok
session.query(files).filter(files.file_name = 'test.json') #not ok
If you want to filter by a Table construct, it should be:
session.query(files).filter(files.c.file_name == 'test.json')
You need to create mapped classes if you want to use the ORM features of SQLAlchemy. For example, with the code you currently have, in order to do an update you have to do
session.execute(files.update().values(...))
As opposed to:
file = session.query(File).first()
file.file_name = "new file name"
session.commit()
The declarative API happens to be the easiest way of constructing mapped classes, so use it if you want to use the ORM.
Filter using declarative api this way:
session.query(files).filter(files.file_name == 'test.json').all()
You can also use raw sql queries (docs).
Whether using declarative api or not may depend on your queries complexity, because sometimes sqlalchemy doesn't optimize them right way.

Can SQLAlchemy automatically create relationships from a database schema?

Starting from an existing (SQLite) database with foreign keys, can SQLAlchemy automatically build relationships?
SQLAlchemy classes are automatically created via __table_args__ = {'autoload': True}.
The goal would be to easily access data from related tables without having to add all the relationships one by one by hand (i.e. without using sqlalchemy.orm.relationship() and sqlalchemy.orm.backref).
[Update] As of SQLAlchemy 0.9.1 there is Automap extension for doing that.
For SQLAlchemy < 0.9.0 it is possible to use sqlalchemy reflection.
SQLAlchemy reflection loads foreign/primary keys relations between tables. But doesn't create relations between mapped classes. Actually reflection doesn't create mapped classes for you - you have to specify mapped class name.
Actually I think that reflection support for loading foreign keys is a great helper and time saving tool. Using it you can build a query using joins without need to specify which columns to use for a join.
from sqlalchemy import *
from sqlalchemy import create_engine, orm
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
metadata = MetaData()
Base = declarative_base()
Base.metadata = metadata
db = create_engine('<db connection URL>',echo=False)
metadata.reflect(bind=db)
cause_code_table = metadata.tables['cause_code']
ndticket_table = metadata.tables['ndticket']
sm = orm.sessionmaker(bind=db, autoflush=True, autocommit=True, expire_on_commit=True)
session = orm.scoped_session(sm)
q = session.query(ndticket_table,cause_code_table).join(cause_code_table)
for r in q.limit(10):
print r
Also when I was using reflection to run queries to existing database - I had to define only mapped classes names, table bindings, relations, BUT there were no need to define table columns for these relations.
class CauseCode(Base):
__tablename__ = "cause_code"
class NDTicket(Base):
__tablename__ = "ndticket"
cause_code = relationship("CauseCode", backref = "ndticket")
q = session.query(NDTicket)
for r in q.limit(10):
print r.ticket_id, r.cause_code.cause_code
Overall SQLAlchemy reflection is already powerful tool and save me time, so adding relations manually is a small overhead for me.
If I would have to develop functionality that will add relations between mapped objects using existing foreign keys, I would start from using reflection with inspector. Using get_foreign_keys() method gives all information required to build relations - referred table name, referred column name and column name in target table. And would use this information for adding property with relationship into mapped class.
insp = reflection.Inspector.from_engine(db)
print insp.get_table_names()
print insp.get_foreign_keys(NDTicket.__tablename__)
>>>[{'referred_table': u'cause_code', 'referred_columns': [u'cause_code'], 'referred_schema': None, 'name': u'SYS_C00135367', 'constrained_columns': [u'cause_code_id']}]
As of SQLAlchemy 0.9.1 the (for now experimental) Automap extension would seem to do just that: http://docs.sqlalchemy.org/en/rel_0_9/orm/extensions/automap.html

How do I get a raw, compiled SQL query from a SQLAlchemy expression?

I have a SQLAlchemy query object and want to get the text of the compiled SQL statement, with all its parameters bound (e.g. no %s or other variables waiting to be bound by the statement compiler or MySQLdb dialect engine, etc).
Calling str() on the query reveals something like this:
SELECT id WHERE date_added <= %s AND date_added >= %s ORDER BY count DESC
I've tried looking in query._params but it's an empty dict. I wrote my own compiler using this example of the sqlalchemy.ext.compiler.compiles decorator but even the statement there still has %s where I want data.
I can't quite figure out when my parameters get mixed in to create the query; when examining the query object they're always an empty dictionary (though the query executes fine and the engine prints it out when you turn echo logging on).
I'm starting to get the message that SQLAlchemy doesn't want me to know the underlying query, as it breaks the general nature of the expression API's interface all the different DB-APIs. I don't mind if the query gets executed before I found out what it was; I just want to know!
This blogpost by Nicolas Cadou provides an updated answer.
Quoting from the blog post, this is suggested and worked for me:
from sqlalchemy.dialects import postgresql
print str(q.statement.compile(dialect=postgresql.dialect()))
Where q is defined as:
q = DBSession.query(model.Name).distinct(model.Name.value) \
.order_by(model.Name.value)
Or just any kind of session.query().
The documentation uses literal_binds to print a query q including parameters:
print(q.statement.compile(compile_kwargs={"literal_binds": True}))
the above approach has the caveats that it is only supported for basic types, such as ints and strings, and furthermore if a bindparam() without a pre-set value is used directly, it won’t be able to stringify that either.
The documentation also issues this warning:
Never use this technique with string content received from untrusted
input, such as from web forms or other user-input applications.
SQLAlchemy’s facilities to coerce Python values into direct SQL string
values are not secure against untrusted input and do not validate the
type of data being passed. Always use bound parameters when
programmatically invoking non-DDL SQL statements against a relational
database.
This should work with Sqlalchemy >= 0.6
from sqlalchemy.sql import compiler
from psycopg2.extensions import adapt as sqlescape
# or use the appropiate escape function from your db driver
def compile_query(query):
dialect = query.session.bind.dialect
statement = query.statement
comp = compiler.SQLCompiler(dialect, statement)
comp.compile()
enc = dialect.encoding
params = {}
for k,v in comp.params.iteritems():
if isinstance(v, unicode):
v = v.encode(enc)
params[k] = sqlescape(v)
return (comp.string.encode(enc) % params).decode(enc)
Thing is, sqlalchemy never mixes the data with your query. The query and the data are passed separately to your underlying database driver - the interpolation of data happens in your database.
Sqlalchemy passes the query as you've seen in str(myquery) to the database, and the values will go in a separate tuple.
You could use some approach where you interpolate the data with the query yourself (as albertov suggested below), but that's not the same thing that sqlalchemy is executing.
For the MySQLdb backend I modified albertov's awesome answer (thanks so much!) a bit. I'm sure they could be merged to check if comp.positional was True but that's slightly beyond the scope of this question.
def compile_query(query):
from sqlalchemy.sql import compiler
from MySQLdb.converters import conversions, escape
dialect = query.session.bind.dialect
statement = query.statement
comp = compiler.SQLCompiler(dialect, statement)
comp.compile()
enc = dialect.encoding
params = []
for k in comp.positiontup:
v = comp.params[k]
if isinstance(v, unicode):
v = v.encode(enc)
params.append( escape(v, conversions) )
return (comp.string.encode(enc) % tuple(params)).decode(enc)
First let me preface by saying that I assume you're doing this mainly for debugging purposes -- I wouldn't recommend trying to modify the statement outside of the SQLAlchemy fluent API.
Unfortunately there doesn't seem to be a simple way to show the compiled statement with the query parameters included. SQLAlchemy doesn't actually put the parameters into the statement -- they're passed into the database engine as a dictionary. This lets the database-specific library handle things like escaping special characters to avoid SQL injection.
But you can do this in a two-step process reasonably easily. To get the statement, you can do as you've already shown, and just print the query:
>>> print(query)
SELECT field_1, field_2 FROM table WHERE id=%s;
You can get one step closer with query.statement, to see the parameter names. Note :id_1 below vs %s above -- not really a problem in this very simple example, but could be key in a more complicated statement.
>>> print(query.statement)
>>> print(query.statement.compile()) # seems to be equivalent, you can also
# pass in a dialect if you want
SELECT field_1, field_2 FROM table WHERE id=:id_1;
Then, you can get the actual values of the parameters by getting the params property of the compiled statement:
>>> print(query.statement.compile().params)
{u'id_1': 1}
This worked for a MySQL backend at least; I would expect it's also general enough for PostgreSQL without needing to use psycopg2.
For postgresql backend using psycopg2, you can listen for the do_execute event, then use the cursor, statement and type coerced parameters along with Cursor.mogrify() to inline the parameters. You can return True to prevent actual execution of the query.
import sqlalchemy
class QueryDebugger(object):
def __init__(self, engine, query):
with engine.connect() as connection:
try:
sqlalchemy.event.listen(engine, "do_execute", self.receive_do_execute)
connection.execute(query)
finally:
sqlalchemy.event.remove(engine, "do_execute", self.receive_do_execute)
def receive_do_execute(self, cursor, statement, parameters, context):
self.statement = statement
self.parameters = parameters
self.query = cursor.mogrify(statement, parameters)
# Don't actually execute
return True
Sample usage:
>>> engine = sqlalchemy.create_engine("postgresql://postgres#localhost/test")
>>> metadata = sqlalchemy.MetaData()
>>> users = sqlalchemy.Table('users', metadata, sqlalchemy.Column("_id", sqlalchemy.String, primary_key=True), sqlalchemy.Column("document", sqlalchemy.dialects.postgresql.JSONB))
>>> s = sqlalchemy.select([users.c.document.label("foobar")]).where(users.c.document.contains({"profile": {"iid": "something"}}))
>>> q = QueryDebugger(engine, s)
>>> q.query
'SELECT users.document AS foobar \nFROM users \nWHERE users.document #> \'{"profile": {"iid": "something"}}\''
>>> q.statement
'SELECT users.document AS foobar \nFROM users \nWHERE users.document #> %(document_1)s'
>>> q.parameters
{'document_1': '{"profile": {"iid": "something"}}'}
The following solution uses the SQLAlchemy Expression Language and works with SQLAlchemy 1.1. This solution does not mix the parameters with the query (as requested by the original author), but provides a way of using SQLAlchemy models to generate SQL query strings and parameter dictionaries for different SQL dialects. The example is based on the tutorial http://docs.sqlalchemy.org/en/rel_1_0/core/tutorial.html
Given the class,
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class foo(Base):
__tablename__ = 'foo'
id = Column(Integer(), primary_key=True)
name = Column(String(80), unique=True)
value = Column(Integer())
we can produce a query statement using the select function.
from sqlalchemy.sql import select
statement = select([foo.name, foo.value]).where(foo.value > 0)
Next, we can compile the statement into a query object.
query = statement.compile()
By default, the statement is compiled using a basic 'named' implementation that is compatible with SQL databases such as SQLite and Oracle. If you need to specify a dialect such as PostgreSQL, you can do
from sqlalchemy.dialects import postgresql
query = statement.compile(dialect=postgresql.dialect())
Or if you want to explicitly specify the dialect as SQLite, you can change the paramstyle from 'qmark' to 'named'.
from sqlalchemy.dialects import sqlite
query = statement.compile(dialect=sqlite.dialect(paramstyle="named"))
From the query object, we can extract the query string and query parameters
query_str = str(query)
query_params = query.params
and finally execute the query.
conn.execute( query_str, query_params )
You can use events from ConnectionEvents family: after_cursor_execute or before_cursor_execute.
In sqlalchemy UsageRecipes by #zzzeek you can find this example:
Profiling
...
#event.listens_for(Engine, "before_cursor_execute")
def before_cursor_execute(conn, cursor, statement,
parameters, context, executemany):
conn.info.setdefault('query_start_time', []).append(time.time())
logger.debug("Start Query: %s" % statement % parameters)
...
Here you can get access to your statement
UPDATE: Came up with yet another case where the previous solution here wasn't properly producing the correct SQL statement. After a bit of diving around in SQLAlchemy, it becomes apparent that you not only need to compile for a particular dialect, you also need to take the compiled query and initialize it for the correct DBAPI connection context. Otherwise, things like type bind processors don't get executed and values like JSON.NULL don't get properly translated.
Note, this makes this solution very particular to Flask + Flask-SQLAlchemy + psycopg2 + PostgreSQL. You may need to translate this solution to your environment by changing the dialect and how you reference your connection. However, I'm pretty confident this produces the exact SQL for all data types.
The result below is a simple method to drop in and occasionally but reliably grab the exact, compiled SQL that would be sent to my PostgreSQL backend by just interrogating the query itself:
import sqlalchemy.dialects.postgresql.psycopg2
from flask import current_app
def query_to_string(query):
dialect = sqlalchemy.dialects.postgresql.psycopg2.dialect()
compiled_query = query.statement.compile(dialect=dialect)
sqlalchemy_connection = current_app.db.session.connection()
context = dialect.execution_ctx_cls._init_compiled(
dialect,
sqlalchemy_connection,
sqlalchemy_connection.connection,
compiled_query,
None
)
mogrified_query = sqlalchemy_connection.connection.cursor().mogrify(
context.statement,
context.parameters[0]
)
return mogrified_query.decode()
query = [ .... some ORM query .... ]
print(f"compiled SQL = {query_to_string(query)}")
I've created this little function that I import when I want to print the full query, considering I'm in the middle of a test when the dialect is already bound:
import re
def print_query(query):
regex = re.compile(":(?P<name>\w+)")
params = query.statement.compile().params
sql = regex.sub("'{\g<name>}'", str(query.statement)).format(**params)
print(f"\nPrinting SQLAlchemy query:\n\n")
print(sql)
return sql
I think .statement would possibly do the trick:
http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=query
>>> local_session.query(sqlalchemy_declarative.SomeTable.text).statement
<sqlalchemy.sql.annotation.AnnotatedSelect at 0x6c75a20; AnnotatedSelectobject>
>>> x=local_session.query(sqlalchemy_declarative.SomeTable.text).statement
>>> print(x)
SELECT sometable.text
FROM sometable
If with SQLAlchemy you are using PyMySQL, you can do one trick.
I was in a hurry and lost a lot of time, so I changed the driver for print the current statement with parameters.
SQLAlchemy intentionally does not support full stringification of literal values.
But PyMySQL has 'mogrify' method which does it, but, SQLALchemy has no HOOK for call it when using ORM insert/update (when it controls the cursor) like db.add or commit/flush (for update).
So, Just go where the driver is using (to know where use):
pip show pycharm
In the folder, find and edit the file cursors.py.
In the method:
def execute(self, query, args=None):
Under the line:
query = self.mogrify(query, args)
Just Add:
print(query)
Will work like a charm, debug, resolve the issue and remove the print.

Categories