How can I check whether two SQLAlchemy queries are the same? - python

I have a function that returns a SQLAlchemy query object, and I want to test that it builds the correct query.
For example:
import sqlalchemy
metadata = sqlalchemy.MetaData()
users = sqlalchemy.Table(
    "users",
    metadata,
    sqlalchemy.Column("email", sqlalchemy.String(255), nullable=False, unique=True),
    sqlalchemy.Column("username", sqlalchemy.String(50), nullable=False, unique=True),
)
def select_first_users(n):
    return users.select().limit(n)
def test_queries_are_equal(self):
    expected_query = users.select().limit(10)
    assert select_first_users(10) == expected_query  # fails here
    assert select_first_users(10).compare(expected_query)  # fails here too
I have no idea how to compare two queries for equality. == doesn't work here because, as far as I can see, these objects do not define the __eq__ method, so the objects are compared by identity (their address in memory), which of course fails. The compare method also seems to do an is comparison.
The only solution I can see is something like:
assert str(q1.compile()) == str(q2.compile())
but that feels odd, and the compiled string contains placeholders instead of the actual values.
So how can I compare two SQLAlchemy queries for equality?
I use Python 3.7.4, SQLAlchemy==1.3.10.

There is a parameter to the compile function that solves the placeholder problem:
query.compile(compile_kwargs={"literal_binds": True})
So instead of
SELECT users.email,
users.username
FROM users
LIMIT :param_1
you get
SELECT users.email,
users.username
FROM users
LIMIT 10
So I think you could do something like this:
import sqlparse
def format_query(query):
    return sqlparse.format(str(query.compile(compile_kwargs={"literal_binds": True})),
                           reindent=True, keyword_case='upper')
def test_queries_are_equal():
    expected_query = users.select().limit(10)
    assert format_query(expected_query) == format_query(select_first_users(10))
If the comparison is based on an exact string match, I think it's better to ensure consistent formatting, hence the use of sqlparse.
Out of the box, literal_binds only handles basic types such as ints and strings, but it can be extended; see the documentation for more information:
https://docs.sqlalchemy.org/en/13/faq/sqlexpressions.html#faq-sql-expression-string
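If you would rather not add sqlparse as a test dependency, a rough sketch of the same idea is to collapse whitespace yourself before comparing. The helper name normalized_sql is made up for this example, and the table is a copy of the one in the question. Note that this only compares the rendered SQL text, so two queries that are logically equivalent but built differently would still compare unequal.

```python
import sqlalchemy

metadata = sqlalchemy.MetaData()
users = sqlalchemy.Table(
    "users",
    metadata,
    sqlalchemy.Column("email", sqlalchemy.String(255), nullable=False, unique=True),
    sqlalchemy.Column("username", sqlalchemy.String(50), nullable=False, unique=True),
)


def select_first_users(n):
    return users.select().limit(n)


def normalized_sql(query):
    """Render the query with literal values inlined and whitespace collapsed.

    Hypothetical helper for tests only; never feed it untrusted values.
    """
    compiled = query.compile(compile_kwargs={"literal_binds": True})
    return " ".join(str(compiled).split())


expected_sql = normalized_sql(users.select().limit(10))
actual_sql = normalized_sql(select_first_users(10))
different_sql = normalized_sql(select_first_users(5))
```

Because the values are inlined, a query with a different limit renders to different text, which would not be the case if only the placeholders were compared.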

Related

SQLAlchemy, programmatically check argument

I want to programmatically check a variable to see if it is one of several allowed strings. I could also add a check constraint in the SQL code, but I don't really want to do that. I know I can access arguments passed into SQLAlchemy objects via kwargs. What is the best way to assert that a passed-in argument is allowed?
class Attend(db.Model):
    __tablename__ = 'attend'
    uid = db.Column(db.Integer, db.ForeignKey('user.uid'), primary_key=True)
    gid = db.Column(db.Integer, db.ForeignKey('group.gid'), primary_key=True)
    # assert(user_role in GroupRoles.roles) -- want to do something like this
    user_role = db.Column(db.String)
    user = db.relationship('User', back_populates='registered_groups')
    group = db.relationship('Group', back_populates='registered_users')
If you want to validate a particular column when it's assigned in Python (including in the default constructor), you can use the validates decorator:
from sqlalchemy.orm import validates
class Attend(db.Model):
    ...
    @validates("user_role")
    def _validate_user_role(self, key, value):
        # value is the new value being assigned to user_role
        assert value in GroupRoles.roles
        return value
Your particular use case, however, seems to fit an enum better:
user_role = db.Column(db.Enum(*GroupRoles.roles))
This produces server-side checks that the value is valid. In SQLAlchemy 1.1 (currently unreleased), this also performs Python-side checks, obviating the need for the _validate_user_role function above (still necessary in SQLAlchemy 1.0 and before).
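For illustration, here is a minimal self-contained sketch of the validates approach. It uses plain SQLAlchemy declarative rather than Flask-SQLAlchemy, leaves out the foreign keys and relationships, and uses a made-up ALLOWED_ROLES set as a stand-in for GroupRoles.roles, so treat those details as assumptions. It raises ValueError instead of using assert, since assert statements are skipped when Python runs with -O.

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import validates

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # SQLAlchemy 1.3 and earlier
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

ALLOWED_ROLES = {"member", "admin"}  # hypothetical stand-in for GroupRoles.roles


class Attend(Base):
    __tablename__ = "attend"

    uid = Column(Integer, primary_key=True)
    gid = Column(Integer, primary_key=True)
    user_role = Column(String)

    @validates("user_role")
    def _validate_user_role(self, key, value):
        # Called whenever user_role is assigned, including in the constructor.
        if value not in ALLOWED_ROLES:
            raise ValueError("invalid %s: %r" % (key, value))
        return value


ok = Attend(uid=1, gid=1, user_role="admin")

try:
    Attend(uid=1, gid=2, user_role="superuser")
    rejected = False
except ValueError:
    rejected = True
```

No database connection is needed for the check to fire, because the validator runs on attribute assignment in Python.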

SQLAlchemy hybrid_property and expressions

I am working on storing some data produced by an external process in a postgres database using sqlalchemy. The external data has several dates stored as strings that I would like to use as datetime objects for comparison and duration calculation and I'd like the conversion to happen in the data model to maintain consistency. I'm trying to use a hybrid_property but I am running into problems based on the different ways that SQLAlchemy uses the hybrid_property as an instance or class.
A (simplified) case looks like this...
class Contact(Base):
    id = Column(String(100), primary_key=True)
    status_date = Column(String(100))
    @hybrid_property
    def real_status_date(self):
        return convert_from_outside_date(self.status_date)
with the conversion function something like this (the function can return a date, False on conversion failure or None on being passed None)...
def convert_from_outside_date(in_str):
    out_date = None
    if in_str != None:
        try:
            out_date = datetime.datetime.strptime(in_str,"%Y-%m-%d")
        except ValueError:
            out_date = False
    return out_date
When I use an instance of Contact, contact.real_status_date properly works as a datetime. The problem is when Contact.real_status_date is used in a query filter.
db_session.query(Contact).filter(
Contact.real_status_date > datetime.datetime.now())
Gets me a "TypeError: Boolean value of this clause is not defined" exception, with the
in_str != None
line of the conversion function as the last part of the stack trace.
Some answers (https://stackoverflow.com/a/14504695/416308) show the use of a setter function and the addition of a new column in the data model. Other answers (https://stackoverflow.com/a/13642708/416308) show the addition of a @property.expression function that returns something SQLAlchemy can interpret as a SQL expression.
Adding a setter to the Contact class works, but adding new columns seems like it shouldn't be necessary, and it makes some table metadata parsing more difficult later, so I'd like to avoid it if I can.
_real_status_date = Column(DateTime())
@hybrid_property
def real_status_date(self):
    return self._real_status_date
@real_status_date.setter
def value(self):
    self._real_status_date = convert_from_outside_date(self.status_date)
If I used an @.expression decorator, would I have to implement a strptime function that is more SQL compatible? What would that look like? Is there something wrong with the conversion function that is causing trouble here?
As zzzeek mentions, you could add the following to your class
Depending on your database, it might already interpret a Python datetime object,
so it may be enough to modify your conversion function to:
def convert_from_outside_date(in_str):
    if in_str:
        try:
            return datetime.datetime.strptime(in_str, "%Y-%m-%d")
        except ValueError:
            # Treat an unparseable date like a missing one
            pass
    # Return None for a Null date
    return None
Otherwise you need to add an expression function:
@real_status_date.expression
def real_status_date(cls):
    # At class level, let the database cast the stored string to a DATE
    return sqlalchemy.cast(cls.status_date, sqlalchemy.Date)
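To make the instance-level versus class-level split concrete, here is a small sketch, assuming that a plain CAST of the string column to DATE is acceptable on the database side (it should be for PostgreSQL with YYYY-MM-DD strings, but that is an assumption about your data). The __tablename__ and the inlined conversion logic are additions for the example. The sketch only compiles the filter criterion for the PostgreSQL dialect, so no database is required.

```python
import datetime

from sqlalchemy import Column, Date, String, cast
from sqlalchemy.dialects import postgresql
from sqlalchemy.ext.hybrid import hybrid_property

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # SQLAlchemy 1.3 and earlier
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Contact(Base):
    __tablename__ = "contact"

    id = Column(String(100), primary_key=True)
    status_date = Column(String(100))

    @hybrid_property
    def real_status_date(self):
        # Instance level: self.status_date is a plain Python string here.
        if not self.status_date:
            return None
        try:
            return datetime.datetime.strptime(self.status_date, "%Y-%m-%d").date()
        except ValueError:
            return None

    @real_status_date.expression
    def real_status_date(cls):
        # Class level: cls.status_date is a column, so build SQL instead.
        return cast(cls.status_date, Date)


contact = Contact(id="c1", status_date="2020-01-02")
criterion = Contact.real_status_date > datetime.date(2020, 1, 1)
criterion_sql = str(criterion.compile(dialect=postgresql.dialect()))
```

The Python conversion function is never called with a column object this way, which is what triggered the "Boolean value of this clause is not defined" error in the question.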

What is the correct way to make SQLalchemy store strings as lowercase?

I'm using SQLAlchemy to talk to my database. Because not many people will be using my application (at least initially), I figure SQLite is the quickest/easiest back end.
I've got a User, and it has a unique ID that's string based, e.g. asdf@asdf.com, or Mr. Fnord. I don't care what format the id is in - just that it's unique. However, I want this to be a case-insensitive uniqueness. So Mr. Fnord and mr. fNoRd would be equivalent.
Apparently there's a COLLATE setting on the schema you can use, but (at least with sqlite) it doesn't seem to be straightforward. My solution was to use properties on the class to lowercase everything before it went to the table, but that seemed brittle/hackish.
Are properties the best way to handle lowercasing everything, or is there a better way to make things case insensitive via SQLAlchemy/SQLite?
I haven't tried this personally, but perhaps you could use a custom comparator operator? I.e.
class CaseInsensitiveColumnComparator(ColumnProperty.Comparator):
    def __eq__(self, other):
        return func.lower(self.__clause_element__()) == func.lower(other)
This way, your ID can be stored in any case, but will compare as lowercase.
Another idea - augment sqlalchemy.types.Text, and use that for your ID.
import sqlalchemy.types as types
class LowerCaseText(types.TypeDecorator):
    '''Converts strings to lower case on the way in.'''
    impl = types.Text
    def process_bind_param(self, value, dialect):
        # None (SQL NULL) has no .lower(), so pass it through unchanged
        return value.lower() if value is not None else None
class User(Base):
    __tablename__ = 'user'
    id = Column(LowerCaseText, primary_key=True)
    ...
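For comparison, the COLLATE route mentioned in the question can be tried directly against SQLite with the standard library. This is only a sketch at the SQLite level, not a SQLAlchemy recipe, and the table layout is an assumption. Keep in mind that SQLite's built-in NOCASE collation only folds ASCII letters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOCASE makes comparisons (and therefore the uniqueness check) ignore ASCII case.
conn.execute("CREATE TABLE user (id TEXT COLLATE NOCASE PRIMARY KEY)")
conn.execute("INSERT INTO user (id) VALUES (?)", ("Mr. Fnord",))

try:
    conn.execute("INSERT INTO user (id) VALUES (?)", ("mr. fNoRd",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Lookups ignore case as well, while the original spelling is preserved.
stored = conn.execute(
    "SELECT id FROM user WHERE id = ?", ("MR. FNORD",)
).fetchone()
```

Unlike the lowercasing type above, this keeps the ID exactly as the user typed it and leaves the case handling to the database.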

How do I get a raw, compiled SQL query from a SQLAlchemy expression?

I have a SQLAlchemy query object and want to get the text of the compiled SQL statement, with all its parameters bound (e.g. no %s or other variables waiting to be bound by the statement compiler or MySQLdb dialect engine, etc).
Calling str() on the query reveals something like this:
SELECT id WHERE date_added <= %s AND date_added >= %s ORDER BY count DESC
I've tried looking in query._params but it's an empty dict. I wrote my own compiler using this example of the sqlalchemy.ext.compiler.compiles decorator but even the statement there still has %s where I want data.
I can't quite figure out when my parameters get mixed in to create the query; when examining the query object they're always an empty dictionary (though the query executes fine and the engine prints it out when you turn echo logging on).
I'm starting to get the message that SQLAlchemy doesn't want me to know the underlying query, as that would break the general nature of the expression API's interface to all the different DB-APIs. I don't mind if the query gets executed before I find out what it was; I just want to know!
This blogpost by Nicolas Cadou provides an updated answer.
Quoting from the blog post, this is suggested and worked for me:
from sqlalchemy.dialects import postgresql
print(str(q.statement.compile(dialect=postgresql.dialect())))
Where q is defined as:
q = DBSession.query(model.Name).distinct(model.Name.value) \
    .order_by(model.Name.value)
Or just any kind of session.query().
The documentation uses literal_binds to print a query q including parameters:
print(q.statement.compile(compile_kwargs={"literal_binds": True}))
The above approach has the caveats that it is only supported for basic types, such as ints and strings, and that a bindparam() without a pre-set value cannot be stringified if it is used directly.
The documentation also issues this warning:
Never use this technique with string content received from untrusted
input, such as from web forms or other user-input applications.
SQLAlchemy’s facilities to coerce Python values into direct SQL string
values are not secure against untrusted input and do not validate the
type of data being passed. Always use bound parameters when
programmatically invoking non-DDL SQL statements against a relational
database.
This should work with SQLAlchemy >= 0.6 (the code below is Python 2, as it uses iteritems and unicode):
from sqlalchemy.sql import compiler
from psycopg2.extensions import adapt as sqlescape
# or use the appropriate escape function from your db driver
def compile_query(query):
    dialect = query.session.bind.dialect
    statement = query.statement
    comp = compiler.SQLCompiler(dialect, statement)
    comp.compile()
    enc = dialect.encoding
    params = {}
    for k,v in comp.params.iteritems():
        if isinstance(v, unicode):
            v = v.encode(enc)
        params[k] = sqlescape(v)
    return (comp.string.encode(enc) % params).decode(enc)
Thing is, sqlalchemy never mixes the data with your query. The query and the data are passed separately to your underlying database driver - the interpolation of data happens in your database.
Sqlalchemy passes the query as you've seen in str(myquery) to the database, and the values will go in a separate tuple.
You could use some approach where you interpolate the data with the query yourself (as albertov suggested below), but that's not the same thing that sqlalchemy is executing.
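The separation described above can be seen at the DB-API level with the standard library's sqlite3 module. This is only an illustration of the principle, not of what SQLAlchemy does internally, and the table name and value are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")

# The SQL text keeps its placeholder; the values travel in a separate tuple.
statement = "INSERT INTO notes (body) VALUES (?)"
parameters = ("Robert'); DROP TABLE notes;--",)

conn.execute(statement, parameters)

# The value comes back exactly as given; it was never parsed as SQL.
stored_body = conn.execute("SELECT body FROM notes").fetchone()[0]
```

Since the statement string is never rewritten with the data, there is no fully interpolated query anywhere on the Python side to print.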
For the MySQLdb backend I modified albertov's awesome answer (thanks so much!) a bit. I'm sure they could be merged to check if comp.positional was True but that's slightly beyond the scope of this question.
def compile_query(query):
    from sqlalchemy.sql import compiler
    from MySQLdb.converters import conversions, escape
    dialect = query.session.bind.dialect
    statement = query.statement
    comp = compiler.SQLCompiler(dialect, statement)
    comp.compile()
    enc = dialect.encoding
    params = []
    for k in comp.positiontup:
        v = comp.params[k]
        if isinstance(v, unicode):
            v = v.encode(enc)
        params.append( escape(v, conversions) )
    return (comp.string.encode(enc) % tuple(params)).decode(enc)
First let me preface by saying that I assume you're doing this mainly for debugging purposes -- I wouldn't recommend trying to modify the statement outside of the SQLAlchemy fluent API.
Unfortunately there doesn't seem to be a simple way to show the compiled statement with the query parameters included. SQLAlchemy doesn't actually put the parameters into the statement -- they're passed into the database engine as a dictionary. This lets the database-specific library handle things like escaping special characters to avoid SQL injection.
But you can do this in a two-step process reasonably easily. To get the statement, you can do as you've already shown, and just print the query:
>>> print(query)
SELECT field_1, field_2 FROM table WHERE id=%s;
You can get one step closer with query.statement, to see the parameter names. Note :id_1 below vs %s above -- not really a problem in this very simple example, but could be key in a more complicated statement.
>>> print(query.statement)
>>> print(query.statement.compile()) # seems to be equivalent, you can also
# pass in a dialect if you want
SELECT field_1, field_2 FROM table WHERE id=:id_1;
Then, you can get the actual values of the parameters by getting the params property of the compiled statement:
>>> print(query.statement.compile().params)
{u'id_1': 1}
This worked for a MySQL backend at least; I would expect it's also general enough for PostgreSQL without needing to use psycopg2.
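The two-step approach can be reproduced without a database or session using a Core table. The items table below is invented for the example; with an ORM query you would call compile() on query.statement instead.

```python
import sqlalchemy

metadata = sqlalchemy.MetaData()
items = sqlalchemy.Table(
    "items",
    metadata,
    sqlalchemy.Column("id", sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column("name", sqlalchemy.String(50)),
)

statement = items.select().where(items.c.id == 1)
compiled = statement.compile()

sql_text = str(compiled)        # SQL with named placeholders such as :id_1
bound_values = compiled.params  # dict mapping placeholder names to values
```

Printing sql_text and bound_values side by side is usually enough for debugging, and it avoids any string interpolation of the values.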
For postgresql backend using psycopg2, you can listen for the do_execute event, then use the cursor, statement and type coerced parameters along with Cursor.mogrify() to inline the parameters. You can return True to prevent actual execution of the query.
import sqlalchemy
class QueryDebugger(object):
    def __init__(self, engine, query):
        with engine.connect() as connection:
            try:
                sqlalchemy.event.listen(engine, "do_execute", self.receive_do_execute)
                connection.execute(query)
            finally:
                sqlalchemy.event.remove(engine, "do_execute", self.receive_do_execute)
    def receive_do_execute(self, cursor, statement, parameters, context):
        self.statement = statement
        self.parameters = parameters
        self.query = cursor.mogrify(statement, parameters)
        # Don't actually execute
        return True
Sample usage:
>>> engine = sqlalchemy.create_engine("postgresql://postgres@localhost/test")
>>> metadata = sqlalchemy.MetaData()
>>> users = sqlalchemy.Table('users', metadata, sqlalchemy.Column("_id", sqlalchemy.String, primary_key=True), sqlalchemy.Column("document", sqlalchemy.dialects.postgresql.JSONB))
>>> s = sqlalchemy.select([users.c.document.label("foobar")]).where(users.c.document.contains({"profile": {"iid": "something"}}))
>>> q = QueryDebugger(engine, s)
>>> q.query
'SELECT users.document AS foobar \nFROM users \nWHERE users.document @> \'{"profile": {"iid": "something"}}\''
>>> q.statement
'SELECT users.document AS foobar \nFROM users \nWHERE users.document @> %(document_1)s'
>>> q.parameters
{'document_1': '{"profile": {"iid": "something"}}'}
The following solution uses the SQLAlchemy Expression Language and works with SQLAlchemy 1.1. This solution does not mix the parameters with the query (as requested by the original author), but provides a way of using SQLAlchemy models to generate SQL query strings and parameter dictionaries for different SQL dialects. The example is based on the tutorial http://docs.sqlalchemy.org/en/rel_1_0/core/tutorial.html
Given the class,
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer(), primary_key=True)
    name = Column(String(80), unique=True)
    value = Column(Integer())
we can produce a query statement using the select function.
from sqlalchemy.sql import select
statement = select([foo.name, foo.value]).where(foo.value > 0)
Next, we can compile the statement into a query object.
query = statement.compile()
By default, the statement is compiled using a basic 'named' implementation that is compatible with SQL databases such as SQLite and Oracle. If you need to specify a dialect such as PostgreSQL, you can do
from sqlalchemy.dialects import postgresql
query = statement.compile(dialect=postgresql.dialect())
Or if you want to explicitly specify the dialect as SQLite, you can change the paramstyle from 'qmark' to 'named'.
from sqlalchemy.dialects import sqlite
query = statement.compile(dialect=sqlite.dialect(paramstyle="named"))
From the query object, we can extract the query string and query parameters
query_str = str(query)
query_params = query.params
and finally execute the query.
conn.execute( query_str, query_params )
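A quick way to see the effect of the dialect argument is to compile one statement several times and compare the placeholder styles. This sketch uses a Core Table version of foo instead of the declarative class, purely to keep it short; neither dialect needs its database driver installed just to compile.

```python
import sqlalchemy
from sqlalchemy.dialects import postgresql, sqlite

metadata = sqlalchemy.MetaData()
foo = sqlalchemy.Table(
    "foo",
    metadata,
    sqlalchemy.Column("id", sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column("name", sqlalchemy.String(80), unique=True),
    sqlalchemy.Column("value", sqlalchemy.Integer),
)

statement = foo.select().where(foo.c.value > 0)

default_sql = str(statement.compile())                               # named: :value_1
sqlite_sql = str(statement.compile(dialect=sqlite.dialect()))        # qmark: ?
postgres_sql = str(statement.compile(dialect=postgresql.dialect()))  # pyformat: %(value_1)s
```

The parameter values themselves are the same in every case; only the placeholder syntax in the SQL string changes with the dialect.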
You can use events from ConnectionEvents family: after_cursor_execute or before_cursor_execute.
In sqlalchemy UsageRecipes by @zzzeek you can find this example:
Profiling
...
@event.listens_for(Engine, "before_cursor_execute")
def before_cursor_execute(conn, cursor, statement,
                          parameters, context, executemany):
    conn.info.setdefault('query_start_time', []).append(time.time())
    logger.debug("Start Query: %s" % statement % parameters)
...
Here you can get access to your statement
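As a runnable sketch of the event approach, the listener below is attached to a single in-memory SQLite engine and simply records what it sees. The table and the captured list are made up for the example; with another driver the shape of parameters (tuple or dict) may differ.

```python
import sqlalchemy
from sqlalchemy import event

engine = sqlalchemy.create_engine("sqlite://")
metadata = sqlalchemy.MetaData()
users = sqlalchemy.Table(
    "users",
    metadata,
    sqlalchemy.Column("id", sqlalchemy.Integer, primary_key=True),
)
metadata.create_all(engine)

captured = []  # (statement, parameters) pairs seen by the listener


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    # statement is the final SQL string, parameters the values sent with it
    captured.append((statement, parameters))


with engine.connect() as conn:
    conn.execute(users.select().where(users.c.id == 1)).fetchall()

last_statement, last_parameters = captured[-1]
```

This captures exactly what is handed to the driver, which is still a statement plus separate parameters rather than one interpolated string.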
UPDATE: Came up with yet another case where the previous solution here wasn't properly producing the correct SQL statement. After a bit of diving around in SQLAlchemy, it becomes apparent that you not only need to compile for a particular dialect, you also need to take the compiled query and initialize it for the correct DBAPI connection context. Otherwise, things like type bind processors don't get executed and values like JSON.NULL don't get properly translated.
Note, this makes this solution very particular to Flask + Flask-SQLAlchemy + psycopg2 + PostgreSQL. You may need to translate this solution to your environment by changing the dialect and how you reference your connection. However, I'm pretty confident this produces the exact SQL for all data types.
The result below is a simple method to drop in and occasionally but reliably grab the exact, compiled SQL that would be sent to my PostgreSQL backend by just interrogating the query itself:
import sqlalchemy.dialects.postgresql.psycopg2
from flask import current_app
def query_to_string(query):
    dialect = sqlalchemy.dialects.postgresql.psycopg2.dialect()
    compiled_query = query.statement.compile(dialect=dialect)
    sqlalchemy_connection = current_app.db.session.connection()
    context = dialect.execution_ctx_cls._init_compiled(
        dialect,
        sqlalchemy_connection,
        sqlalchemy_connection.connection,
        compiled_query,
        None
    )
    mogrified_query = sqlalchemy_connection.connection.cursor().mogrify(
        context.statement,
        context.parameters[0]
    )
    return mogrified_query.decode()
query = [ .... some ORM query .... ]
print(f"compiled SQL = {query_to_string(query)}")
I've created this little function that I import when I want to print the full query, considering I'm in the middle of a test when the dialect is already bound:
import re
def print_query(query):
    # Raw strings keep \w and \g from being treated as string escapes
    regex = re.compile(r":(?P<name>\w+)")
    params = query.statement.compile().params
    sql = regex.sub(r"'{\g<name>}'", str(query.statement)).format(**params)
    print("\nPrinting SQLAlchemy query:\n\n")
    print(sql)
    return sql
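The substitution step on its own can be sketched with just the standard library, which makes it easy to test. The helper name inline_params is hypothetical, and this variant differs from the function above in two ways that are my own choices: it uses repr() for quoting, and it skips PostgreSQL-style :: casts and unknown names. It is meant for debug output only, since repr() is not SQL escaping.

```python
import re

# A single colon followed by a name; the lookbehind skips "::" casts.
_PLACEHOLDER = re.compile(r"(?<!:):(?P<name>\w+)")


def inline_params(sql, params):
    """Replace :name placeholders with repr() of the matching value (debug only)."""

    def substitute(match):
        name = match.group("name")
        if name not in params:
            return match.group(0)  # leave unknown placeholders untouched
        return repr(params[name])

    return _PLACEHOLDER.sub(substitute, sql)


rendered = inline_params(
    "SELECT * FROM t WHERE id = :id_1 AND name = :name_1",
    {"id_1": 1, "name_1": "bob"},
)
```

With an ORM query, the two arguments would be str(query.statement) and query.statement.compile().params.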
I think .statement would possibly do the trick:
http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=query
>>> local_session.query(sqlalchemy_declarative.SomeTable.text).statement
<sqlalchemy.sql.annotation.AnnotatedSelect at 0x6c75a20; AnnotatedSelect object>
>>> x=local_session.query(sqlalchemy_declarative.SomeTable.text).statement
>>> print(x)
SELECT sometable.text
FROM sometable
If you are using PyMySQL with SQLAlchemy, there is one trick you can use.
I was in a hurry and had already lost a lot of time, so I patched the driver to print the current statement with its parameters.
SQLAlchemy intentionally does not support full stringification of literal values.
PyMySQL has a mogrify method that does this, but SQLAlchemy has no hook for calling it when the ORM controls the cursor, as with db.add or commit/flush (for updates).
So just go to where the driver is installed (to find the location, run):
pip show pymysql
In that folder, find and edit the file cursors.py.
In the method:
def execute(self, query, args=None):
under the line:
query = self.mogrify(query, args)
just add:
print(query)
It works like a charm: debug, resolve the issue, and then remove the print.

How can I get all rows with keys provided in a list using SQLalchemy?

I have sequence of IDs I want to retrieve. It's simple:
session.query(Record).filter(Record.id.in_(seq)).all()
Is there a better way to do it?
Your code is absolutely fine.
IN is like a bunch of X=Y joined with OR and is pretty fast in contemporary databases.
However, if your list of IDs is long, you could make the query a bit more efficient by passing a sub-query returning the list of IDs.
The code as is is completely fine. However, someone is asking me for some system of hedging between the two approaches of doing a big IN vs. using get() for individual IDs.
If someone is really trying to avoid the SELECT, then the best way to do that is to set up the objects you need in memory ahead of time. Such as, you're working on a large table of elements. Break up the work into chunks, such as, order the full set of work by primary key, or by date range, whatever, then load everything for that chunk locally into a cache:
all_ids = [<huge list of ids>]
all_ids.sort()
while all_ids:
    chunk = all_ids[0:1000]
    # bonus exercise! Throw each chunk into a multiprocessing.pool()!
    all_ids = all_ids[1000:]
    my_cache = dict(
        Session.query(Record.id, Record).filter(
            Record.id.between(chunk[0], chunk[-1]))
    )
    for id_ in chunk:
        my_obj = my_cache[id_]
        <work on my_obj>
That's the real world use case.
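The slicing part of that loop can be pulled out into a small generator, sketched here with the standard library only. The name iter_chunks and the default size are assumptions for the example; the database query for each chunk would go inside the loop that consumes it.

```python
def iter_chunks(ids, size=1000):
    """Yield consecutive slices of the sorted ids, each at most `size` long."""
    ordered = sorted(ids)
    for start in range(0, len(ordered), size):
        yield ordered[start:start + size]


chunks = list(iter_chunks([7, 3, 9, 1, 5], size=2))
```

Unlike the while loop, this does not re-slice the remaining list on every iteration, which matters a little when the list of IDs is very large.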
But to also illustrate some SQLAlchemy API, we can make a function that does the IN for records we don't have and a local get for those we do. Here is that:
from sqlalchemy import inspect
def get_all(session, cls, seq):
    mapper = inspect(cls)
    lookup = set()
    for ident in seq:
        key = mapper.identity_key_from_primary_key((ident, ))
        if key in session.identity_map:
            yield session.identity_map[key]
        else:
            lookup.add(ident)
    if lookup:
        for obj in session.query(cls).filter(cls.id.in_(lookup)):
            yield obj
Here is a demonstration:
from sqlalchemy import Column, Integer, create_engine, String
from sqlalchemy.orm import Session
from sqlalchemy.ext.declarative import declarative_base
import random
Base = declarative_base()
class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    data = Column(String)
e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)
ids = range(1, 50)
s = Session(e)
s.add_all([A(id=i, data='a%d' % i) for i in ids])
s.commit()
s.close()
already_loaded = s.query(A).filter(A.id.in_(random.sample(ids, 10))).all()
assert len(s.identity_map) == 10
to_load = set(random.sample(ids, 25))
all_ = list(get_all(s, A, to_load))
assert set(x.id for x in all_) == to_load
If you use composite primary keys, you can use tuple_, as in
from sqlalchemy import tuple_
session.query(Record).filter(tuple_(Record.id1, Record.id2).in_(seq)).all()
Note that this is not available on SQLite (see doc).
I'd recommend taking a look at the SQL it produces. You can just print str(query) to see it.
I'm not aware of an ideal way of doing it with standard SQL.
There is one other way. If it's reasonable to expect that the objects in question are already loaded into the session (because you've accessed them before in the same transaction), you can instead do:
map(session.query(Record).get, seq)
In the case where those objects are already present, this will be much faster, since there won't be any queries to retrieve those objects. On the other hand, if more than a tiny number of those objects are not loaded, it will be much, much slower, since it will cause a query per missing instance instead of a single query for all objects.
This can be useful when you are doing joinedload() queries before reaching the above step, so you can be sure that they have been loaded already. In general, you should use the solution in the question by default, and only explore this solution when you have seen that you are querying for the same objects over and over.
