I'd really like to be able to print out valid SQL for my application, including values, rather than bind parameters, but it's not obvious how to do this in SQLAlchemy (by design, I'm fairly sure).
Has anyone solved this problem in a general way?
In the vast majority of cases, the "stringification" of a SQLAlchemy statement or query is as simple as:
print(str(statement))
This applies both to an ORM Query as well as any select() or other statement.
Note: the following detailed answer is being maintained in the SQLAlchemy documentation.
To get the statement as compiled for a specific dialect or engine, if the statement itself is not already bound to one, you can pass it to compile():
print(statement.compile(someengine))
or without an engine:
from sqlalchemy.dialects import postgresql
print(statement.compile(dialect=postgresql.dialect()))
When given an ORM Query object, in order to get at the compile() method we only need to access the .statement accessor first:
statement = query.statement
print(statement.compile(someengine))
With regard to the original stipulation that bound parameters be "inlined" into the final string: SQLAlchemy is normally not tasked with this, because it is handled appropriately by the Python DBAPI, and bypassing bound parameters is probably the most widely exploited security hole in modern web applications. SQLAlchemy has a limited ability to do this stringification in certain circumstances, such as when emitting DDL. To access this functionality, pass the 'literal_binds' flag via compile_kwargs:
from sqlalchemy.sql import table, column, select
t = table('t', column('x'))
s = select([t]).where(t.c.x == 5)
print(s.compile(compile_kwargs={"literal_binds": True}))
The above approach has the caveats that it is only supported for basic
types, such as ints and strings, and that a bindparam without a
pre-set value, if used directly, cannot be stringified either.
To support inline literal rendering for types not supported, implement
a TypeDecorator for the target type which includes a
TypeDecorator.process_literal_param method:
from sqlalchemy import TypeDecorator, Integer
class MyFancyType(TypeDecorator):
    impl = Integer
    def process_literal_param(self, value, dialect):
        return "my_fancy_formatting(%s)" % value
from sqlalchemy import Table, Column, MetaData
tab = Table('mytable', MetaData(), Column('x', MyFancyType()))
print(
    tab.select().where(tab.c.x > 5).compile(
        compile_kwargs={"literal_binds": True})
)
producing output like:
SELECT mytable.x
FROM mytable
WHERE mytable.x > my_fancy_formatting(5)
Given that what you want makes sense only when debugging, you could start SQLAlchemy with echo=True, to log all SQL queries. For example:
engine = create_engine(
    "mysql://scott:tiger@hostname/dbname",
    encoding="latin1",
    echo=True,
)
This can also be modified for just a single request:
echo=False – if True, the Engine will log all statements as well as a repr() of their parameter lists to the engine's logger, which defaults to sys.stdout. The echo attribute of Engine can be modified at any time to turn logging on and off. If set to the string "debug", result rows will be printed to the standard output as well. This flag ultimately controls a Python logger; see Configuring Logging for information on how to configure logging directly.
Source: SQLAlchemy Engine Configuration
If used with Flask, you can simply set
app.config["SQLALCHEMY_ECHO"] = True
to get the same behaviour.
This works in Python 2 and 3 and is a bit cleaner than before, but requires SQLAlchemy >= 1.0.
from sqlalchemy.engine.default import DefaultDialect
from sqlalchemy.sql.sqltypes import String, DateTime, NullType
# python2/3 compatible.
PY3 = str is not bytes
text = str if PY3 else unicode
int_type = int if PY3 else (int, long)
str_type = str if PY3 else (str, unicode)
class StringLiteral(String):
    """Teach SA how to literalize various things."""
    def literal_processor(self, dialect):
        super_processor = super(StringLiteral, self).literal_processor(dialect)
        def process(value):
            if isinstance(value, int_type):
                return text(value)
            if not isinstance(value, str_type):
                value = text(value)
            result = super_processor(value)
            if isinstance(result, bytes):
                result = result.decode(dialect.encoding)
            return result
        return process
class LiteralDialect(DefaultDialect):
    colspecs = {
        # prevent various encoding explosions
        String: StringLiteral,
        # teach SA about how to literalize a datetime
        DateTime: StringLiteral,
        # don't format py2 long integers to NULL
        NullType: StringLiteral,
    }
def literalquery(statement):
    """NOTE: This is entirely insecure. DO NOT execute the resulting strings."""
    import sqlalchemy.orm
    if isinstance(statement, sqlalchemy.orm.Query):
        statement = statement.statement
    return statement.compile(
        dialect=LiteralDialect(),
        compile_kwargs={'literal_binds': True},
    ).string
Demo:
# coding: UTF-8
from datetime import datetime
from decimal import Decimal
from literalquery import literalquery
def test():
    from sqlalchemy.sql import table, column, select
    mytable = table('mytable', column('mycol'))
    values = (
        5,
        u'snowman: ☃',
        b'UTF-8 snowman: \xe2\x98\x83',
        datetime.now(),
        Decimal('3.14159'),
        10 ** 20,  # a long integer
    )
    statement = select([mytable]).where(mytable.c.mycol.in_(values)).limit(1)
    print(literalquery(statement))
if __name__ == '__main__':
    test()
Gives this output: (tested in python 2.7 and 3.4)
SELECT mytable.mycol
FROM mytable
WHERE mytable.mycol IN (5, 'snowman: ☃', 'UTF-8 snowman: ☃',
'2015-06-24 18:09:29.042517', 3.14159, 100000000000000000000)
LIMIT 1
We can use the compile() method for this purpose. From the docs:
from sqlalchemy.sql import text
from sqlalchemy.dialects import postgresql
stmt = text("SELECT * FROM users WHERE users.name BETWEEN :x AND :y")
stmt = stmt.bindparams(x="m", y="z")
print(stmt.compile(dialect=postgresql.dialect(),compile_kwargs={"literal_binds": True}))
Result:
SELECT * FROM users WHERE users.name BETWEEN 'm' AND 'z'
Warning from docs:
Never use this technique with string content received from untrusted
input, such as from web forms or other user-input applications.
SQLAlchemy’s facilities to coerce Python values into direct SQL string
values are not secure against untrusted input and do not validate the
type of data being passed. Always use bound parameters when
programmatically invoking non-DDL SQL statements against a relational
database.
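The warning above can be made concrete with a small sketch. It uses the standard-library sqlite3 module rather than SQLAlchemy, and the users table and the hostile string are made up for illustration; the only point is to contrast a value inlined into the SQL text with the same value passed as a bound parameter.

```python
import sqlite3

# In-memory database with a hypothetical two-row table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# A value as it might arrive from an untrusted form field.
hostile = "nobody' OR '1'='1"

# Inlined: the value becomes part of the SQL text and rewrites the WHERE clause.
inlined_sql = "SELECT name FROM users WHERE name = '%s'" % hostile
inlined_rows = conn.execute(inlined_sql).fetchall()

# Bound: the driver passes the value separately, so it is only ever compared as data.
bound_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(len(inlined_rows), len(bound_rows))
```

The inlined query matches every row, while the bound query matches none, which is why rendered strings like the ones in this thread are best kept for reading and logging only.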
So building on @zzzeek's comments on @bukzor's code I came up with this to easily get a "pretty-printable" query:
def prettyprintable(statement, dialect=None, reindent=True):
    """Generate an SQL expression string with bound parameters rendered inline
    for the given SQLAlchemy statement. The function can also receive a
    `sqlalchemy.orm.Query` object instead of statement.
    WARNING: Should only be used for debugging. Inlining parameters is not
    safe when handling user created data.
    """
    import sqlparse
    import sqlalchemy.orm
    if isinstance(statement, sqlalchemy.orm.Query):
        if dialect is None:
            dialect = statement.session.get_bind().dialect
        statement = statement.statement
    compiled = statement.compile(dialect=dialect,
                                 compile_kwargs={'literal_binds': True})
    return sqlparse.format(str(compiled), reindent=reindent)
I personally have a hard time reading code which is not indented so I've used sqlparse to reindent the SQL. It can be installed with pip install sqlparse.
This code is based on the brilliant existing answer from @bukzor. I just added custom rendering of datetime.datetime values as Oracle's TO_DATE().
Feel free to update code to suit your database:
import decimal
import datetime
def printquery(statement, bind=None):
    """
    print a query, with values filled in
    for debugging purposes *only*
    for security, you should always separate queries from their values
    please also note that this function is quite slow
    """
    # NOTE: written for Python 2 (basestring and long do not exist on Python 3)
    import sqlalchemy.orm
    if isinstance(statement, sqlalchemy.orm.Query):
        if bind is None:
            bind = statement.session.get_bind(
                statement._mapper_zero_or_none()
            )
        statement = statement.statement
    elif bind is None:
        bind = statement.bind
    dialect = bind.dialect
    compiler = statement._compiler(dialect)
    class LiteralCompiler(compiler.__class__):
        def visit_bindparam(
                self, bindparam, within_columns_clause=False,
                literal_binds=False, **kwargs
        ):
            return super(LiteralCompiler, self).render_literal_bindparam(
                bindparam, within_columns_clause=within_columns_clause,
                literal_binds=literal_binds, **kwargs
            )
        def render_literal_value(self, value, type_):
            """Render the value of a bind parameter as a quoted literal.
            This is used for statement sections that do not accept bind parameters
            on the target driver/database.
            This should be implemented by subclasses using the quoting services
            of the DBAPI.
            """
            if isinstance(value, basestring):
                value = value.replace("'", "''")
                return "'%s'" % value
            elif value is None:
                return "NULL"
            elif isinstance(value, (float, int, long)):
                return repr(value)
            elif isinstance(value, decimal.Decimal):
                return str(value)
            elif isinstance(value, datetime.datetime):
                return "TO_DATE('%s','YYYY-MM-DD HH24:MI:SS')" % value.strftime("%Y-%m-%d %H:%M:%S")
            else:
                raise NotImplementedError(
                    "Don't know how to literal-quote value %r" % value)
    compiler = LiteralCompiler(dialect, statement)
    print(compiler.process(statement))
I would like to point out that the solutions given above do not "just work" with non-trivial queries. One issue I came across was more complicated types, such as pgsql ARRAYs. I did find a solution that, for me, did just work even with pgsql ARRAYs:
borrowed from:
https://gist.github.com/gsakkis/4572159
The linked code seems to be based on an older version of SQLAlchemy: you'll get an error saying that the attribute _mapper_zero_or_none doesn't exist. Here's an updated version that works with a newer version; you simply replace _mapper_zero_or_none with bind. Additionally, this has support for pgsql arrays:
# adapted from:
# https://gist.github.com/gsakkis/4572159
from datetime import date, timedelta
from datetime import datetime
from sqlalchemy.orm import Query
try:
    basestring
except NameError:  # Python 3
    basestring = str
try:
    long
except NameError:  # Python 3 has no separate long type
    long = int
def render_query(statement, dialect=None):
    """
    Generate an SQL expression string with bound parameters rendered inline
    for the given SQLAlchemy statement.
    WARNING: This method of escaping is insecure, incomplete, and for debugging
    purposes only. Executing SQL statements with inline-rendered user values is
    extremely insecure.
    Based on http://stackoverflow.com/questions/5631078/sqlalchemy-print-the-actual-query
    """
    if isinstance(statement, Query):
        if dialect is None:
            dialect = statement.session.bind.dialect
        statement = statement.statement
    elif dialect is None:
        dialect = statement.bind.dialect
    class LiteralCompiler(dialect.statement_compiler):
        def visit_bindparam(self, bindparam, within_columns_clause=False,
                            literal_binds=False, **kwargs):
            return self.render_literal_value(bindparam.value, bindparam.type)
        def render_array_value(self, val, item_type):
            if isinstance(val, list):
                return "{%s}" % ",".join([self.render_array_value(x, item_type) for x in val])
            return self.render_literal_value(val, item_type)
        def render_literal_value(self, value, type_):
            if isinstance(value, long):
                return str(value)
            elif isinstance(value, (basestring, date, datetime, timedelta)):
                return "'%s'" % str(value).replace("'", "''")
            elif isinstance(value, list):
                return "'{%s}'" % (",".join([self.render_array_value(x, type_.item_type) for x in value]))
            return super(LiteralCompiler, self).render_literal_value(value, type_)
    return LiteralCompiler(dialect, statement).process(statement)
Tested to two levels of nested arrays.
To log SQL queries using Python logging instead of the echo=True flag:
import logging
logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)
per the documentation.
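As a rough sketch of where those log lines can be sent, the snippet below attaches its own handler to the 'sqlalchemy.engine' logger named above. The in-memory buffer, the format string and the two hand-emitted records are illustrative stand-ins (no Engine is created here); a real application would more likely use a file or its existing logging setup.

```python
import io
import logging

# Collect log output in memory so it can be inspected; a FileHandler would work the same way.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

sa_logger = logging.getLogger("sqlalchemy.engine")
sa_logger.setLevel(logging.INFO)   # INFO lets statement-level messages through
sa_logger.addHandler(handler)
sa_logger.propagate = False        # keep these records out of the root logger

# Stand-ins for what an Engine would emit: a statement at INFO, row details at DEBUG.
sa_logger.info("SELECT 1")
sa_logger.debug("row details")     # filtered out at INFO level

captured = buffer.getvalue()
print(captured)
```

Only the INFO record reaches the buffer; lowering the level to logging.DEBUG would let the second record through as well.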
Just a simple colored example with ORM's Query and pygments.
import sqlparse
from pygments import highlight
from pygments.formatters.terminal import TerminalFormatter
from pygments.lexers import SqlLexer
from sqlalchemy import create_engine
from sqlalchemy.orm import Query
engine = create_engine("sqlite+pysqlite:///db.sqlite", echo=True, future=True)
def format_sql(query: Query):
    compiled = query.statement.compile(
        engine, compile_kwargs={"literal_binds": True})
    parsed = sqlparse.format(str(compiled), reindent=True, keyword_case='upper')
    print(highlight(parsed, SqlLexer(), TerminalFormatter()))
Or a version without sqlparse (the output then has fewer line breaks):
def format_sql(query: Query):
    compiled = query.statement.compile(
        engine, compile_kwargs={"literal_binds": True})
    print(highlight(str(compiled), SqlLexer(), TerminalFormatter()))
This is my approach:
# query is a statement built with sqlalchemy's select()
def raw_query(query):
    compiled = query.compile()
    q = str(compiled)
    p = compiled.params
    for k in p.keys():
        v = p.get(k)
        if isinstance(v, (int, float, complex)):
            # numbers are substituted as-is
            q = q.replace(f":{k}", f"{v}")
        else:
            # everything else is wrapped in single quotes (no escaping is done)
            q = q.replace(f":{k}", f"'{v}'")
    print(q)
How to use it:
from datetime import datetime
from sqlalchemy import func, select
# any_model_table is assumed to be an existing sqlalchemy Table
select_query = select([
    any_model_table.c["id_account"],
    any_model_table.c["id_provider"],
    any_model_table.c["id_service"],
    func.sum(any_model_table.c["items"]).label("items"),
    # #eaf
    func.date_format(func.now(), "%Y-%m-%d").label("some_date"),
    func.date_format(func.now(), "%Y").label("as_year"),
    func.date_format(func.now(), "%m").label("as_month"),
    func.date_format(func.now(), "%d").label("as_day"),
]).group_by(
    any_model_table.c.id_account,
    any_model_table.c.id_provider,
    any_model_table.c.id_service
).where(
    any_model_table.c.id == 5
).where(
    func.date_format(any_model_table.c.dt, "%Y-%m-%d") == datetime.utcnow().strftime('%Y-%m-%d')
)
raw_query(select_query)
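One thing to watch with plain str.replace substitution is placeholder names that share a prefix, such as :id_1 and :id_10. The sketch below works on a hand-written SQL string and a plain dict instead of a compiled statement; inline_params and the sample names are hypothetical, and the quoting is only good enough for debugging output.

```python
import re

def inline_params(sql, params):
    """Replace :name placeholders with rendered values, matching whole names only."""
    def render(value):
        if value is None:
            return "NULL"
        if isinstance(value, (int, float)):
            return str(value)
        # Double embedded single quotes; for display only, not safe to execute.
        return "'" + str(value).replace("'", "''") + "'"

    def substitute(match):
        name = match.group(1)
        # Leave anything that is not a known parameter untouched.
        return render(params[name]) if name in params else match.group(0)

    # (?<!:) skips '::' casts; \w+ consumes the full placeholder name.
    return re.sub(r"(?<!:):(\w+)", substitute, sql)

sql = "SELECT * FROM t WHERE id = :id_1 AND other = :id_10 AND name = :name_1"
params = {"id_1": 5, "id_10": 7, "name_1": "O'Brien"}

# Naive approach: replacing ':id_1' also eats the start of ':id_10'.
naive = sql
for key, value in params.items():
    naive = naive.replace(":" + key, str(value))

rendered = inline_params(sql, params)
print(naive)
print(rendered)
```

With the naive loop the second condition comes out as other = 50, while the regex version keeps the two parameters apart.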
I'm trying to use the google-api for python.
I've managed to store the credentials using a CredentialsField implementation (basically copying this one).
I can get a storage object:
>>> storage = Storage(CredentialsModel, 'id', user, 'credential')
>>> storage
<oauth2client.django_orm.Storage object at 0x7f1f8f1260f0>
no problem. But when I try to get the credentials object:
>>> credential = storage.get()
>>> credential
I just get a massive string (7482 characters) instead of a credentials object. What gives? (I think the string might be a bytearray, since it begins with '\\x67414e6a6.)
I'm also using Python 3.
Any thoughts on why I'm getting a string instead of a Credentials object?
I've actually found the answer, if you still need it:
The problem is that the CredentialsField class in django_orm defines a __metaclass__ attribute, which Python 3 no longer supports.
Therefore, it is necessary to change it to something like this:
class CredentialsField(models.Field, metaclass=models.SubfieldBase):
Someone has opened an issue in the GitHub repo:
https://github.com/google/oauth2client/issues/168
My solution for Python 3 and Django 1.8 with a Postgres database:
There is a missing step right before saving the byte data to the database, and right after retrieving it from the database: the byte data needs to be converted to/from a string first. You can convert bytes to a string and vice versa with decode("utf-8") and encode("utf-8").
You also do not need the __metaclass__, but you do need the get_prep_value() and from_db_value() methods.
The entire class CredentialsField should be rewritten like so:
import base64
import pickle
import oauth2client.client
from django.db import models
class CredentialsField(models.Field):
    def __init__(self, *args, **kwargs):
        if 'null' not in kwargs:
            kwargs['null'] = True
        super().__init__(*args, **kwargs)
    def get_internal_type(self):
        return "TextField"
    def to_python(self, value):
        if value is None:
            return None
        if isinstance(value, oauth2client.client.Credentials):
            return value
        value = value.encode("utf-8")  # string to byte
        return pickle.loads(base64.b64decode(value))
    def from_db_value(self, value, expression, connection, context):
        return self.to_python(value)
    def get_db_prep_value(self, value, connection, prepared=False):
        if value is None:
            return None
        byte_repr = base64.b64encode(pickle.dumps(value))
        return byte_repr.decode("utf-8")  # byte to string
    def get_prep_value(self, value):
        return self.get_db_prep_value(value)
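The bytes-to-string step can be checked in isolation. The sketch below performs the same round trip outside Django, with a plain dict standing in for the credentials object; to_db_text and from_db_text are hypothetical helper names, not part of oauth2client or Django.

```python
import base64
import pickle

def to_db_text(value):
    """Pickle, base64-encode, then decode to str so it fits a text column."""
    return base64.b64encode(pickle.dumps(value)).decode("utf-8")

def from_db_text(text):
    """Reverse of to_db_text: str -> bytes -> base64-decode -> unpickle."""
    return pickle.loads(base64.b64decode(text.encode("utf-8")))

# Stand-in for a credentials object.
original = {"access_token": "abc", "expires_in": 3600}

stored = to_db_text(original)
restored = from_db_text(stored)
print(type(stored).__name__, restored == original)
```

Since pickle.loads can run arbitrary code, this only makes sense for data the application wrote itself.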
Not too sure if this is really simple or not, but I can't really find anything on the topic. But, either using the regular MongoEngine library, or even Flask-MongoEngine for my Flask based website, would it be possible to return a MongoEngine document as straight JSON?
Thanks!
In 0.8 there are helpers; see https://github.com/MongoEngine/mongoengine/issues/1
In the meantime you have to use pymongo's json_util directly:
from bson import json_util
json_util.dumps(MyDoc._collection_obj.find(MyDoc.objects()._query))
Ross's and Jellyflower's workarounds don't work when field projection or ordering is used.
More general workaround:
from bson import json_util
json = json_util.dumps(query._cursor)
The correct workaround should probably be:
from bson import json_util
objects = MyDoc.objects()
json_util.dumps(objects._collection_obj.find(objects._query))
Update: thanks to Lo-Tan for to_mongo() method usage suggestion.
Eventually I came up with the following solution:
import json
from json import JSONEncoder
from mongoengine.base import BaseDocument
class MongoEncoder(JSONEncoder):
    def default(self, o):
        if isinstance(o, BaseDocument):
            data = o.to_mongo()
            # might not be present if EmbeddedDocument
            o_id = data.pop('_id', None)
            if o_id:
                # to_mongo() yields an ObjectId here, so convert it directly
                data['id'] = str(o_id)
            data.pop('_cls', None)
            return data
        else:
            return JSONEncoder.default(self, o)
# consider `obj` to be MongoEngine object
json_data = json.dumps(obj, cls=MongoEncoder)
It uses to_json() method, added as the response to the aforementioned issue.
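To see how the default() hook is driven by json.dumps without a MongoDB setup, here is a small self-contained sketch. FakeDocument and FakeObjectId are made-up stand-ins for a MongoEngine document and a BSON ObjectId, and the extra datetime branch is an assumption about what a real document might contain.

```python
import json
from datetime import datetime

class FakeObjectId:
    """Stand-in for bson.ObjectId: not JSON-serializable, but has a str() form."""
    def __init__(self, hex_value):
        self.hex_value = hex_value
    def __str__(self):
        return self.hex_value

class FakeDocument:
    """Stand-in for a document exposing a to_mongo()-like method."""
    def __init__(self, **fields):
        self._fields = fields
    def to_mongo(self):
        return dict(self._fields)

class SketchEncoder(json.JSONEncoder):
    def default(self, o):
        # json.dumps calls default() only for objects it cannot serialize itself.
        if isinstance(o, FakeDocument):
            data = o.to_mongo()
            o_id = data.pop("_id", None)
            if o_id is not None:
                data["id"] = str(o_id)
            return data  # the returned dict is encoded again, recursively
        if isinstance(o, FakeObjectId):
            return str(o)
        if isinstance(o, datetime):
            return o.isoformat()
        return super().default(o)

doc = FakeDocument(
    _id=FakeObjectId("507f1f77bcf86cd799439011"),
    title="hello",
    created=datetime(2015, 6, 24, 18, 9, 29),
)
encoded = json.dumps(doc, cls=SketchEncoder, sort_keys=True)
print(encoded)
```

Because the dict returned from default() is fed back through the encoder, nested values such as the datetime get their own turn in default().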