Can't convert this SQL String (with VALUES ... AS) to SQLAlchemy Code - python

The SQL query I have can identify the Max Edit Time from the 3 tables that it is joining together:
Select Identity.SSN, Schedule.First_Class, Students.Last_Name,
(SELECT Max(v)
FROM (VALUES (Students.Edit_DtTm), (Schedule.Edit_DtTm),
(Identity.Edit_DtTm)) AS value(v)) as [MaxEditDate]
FROM Schedule
LEFT JOIN Students ON Schedule.stdnt_id=Students.Student_Id
LEFT JOIN Identity ON Schedule.std_id=Identity.std_id
I need this in SQLAlchemy so that I can reference the columns elsewhere in my code. Below is the simplest version of what I'm trying to do, but it doesn't work. I've tried restructuring the query, but I either get a SQL error saying that I'm using VALUES incorrectly, or the subquery isn't correlated with the outer query and returns the overall highest value in those columns instead of the per-row maximum.
max_edit_subquery = sa.func.values(Students.Edit_DtTm, Schedule.Edit_DtTm, Identity.Edit_DtTm)
base_query = (sa.select([Identity.SSN, Schedule.First_Class, Students.Last_Name,
(sa.select([sa.func.max(self.max_edit_subquery)]))]).
select_from(Schedule.__table__.join(Students, Schedule.stdnt_id == Students.stdnt_id).
join(Ident, Schedule.std_id == Identity.std_id)))

I am not an expert at SQLAlchemy but you could exchange VALUES with UNION ALL:
Select Identity.SSN, Schedule.First_Class, Students.Last_Name,
(SELECT Max(v)
FROM (SELECT Students.Edit_DtTm AS v
UNION ALL SELECT Schedule.Edit_DtTm
UNION ALL SELECT Identity.Edit_DtTm) s
) as [MaxEditDate]
FROM Schedule
LEFT JOIN Students ON Schedule.stdnt_id=Students.Student_Id
LEFT JOIN Identity ON Schedule.std_id=Identity.std_id;
Another approach is to use the GREATEST function (not available in T-SQL before SQL Server 2022):
Select Identity.SSN, Schedule.First_Class, Students.Last_Name,
GREATEST(Students.Edit_DtTm, Schedule.Edit_DtTm,Identity.Edit_DtTm)
as [MaxEditDate]
FROM Schedule
LEFT JOIN Students ON Schedule.stdnt_id=Students.Student_Id
LEFT JOIN Identity ON Schedule.std_id=Identity.std_id;
I hope that it will help you to translate it to ORM version.
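To check that the UNION ALL form really gives a per-row maximum, here is a minimal sketch run against an in-memory SQLite database rather than SQL Server. The table layout, the lowercase names and the sample dates are assumptions made for the demo, and SQLite's multi-argument max() stands in for GREATEST.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schedule (std_id INTEGER, edit_dttm TEXT);
    CREATE TABLE students (std_id INTEGER, edit_dttm TEXT);
    CREATE TABLE identity (std_id INTEGER, edit_dttm TEXT);
    INSERT INTO schedule VALUES (1, '2020-01-01'), (2, '2020-06-01');
    INSERT INTO students VALUES (1, '2020-03-01'), (2, '2020-02-01');
    INSERT INTO identity VALUES (1, '2020-02-01'), (2, '2020-05-01');
""")

rows = conn.execute("""
    SELECT sc.std_id,
           -- correlated subquery: the three values come from the current outer row
           (SELECT MAX(v)
              FROM (SELECT st.edit_dttm AS v
                    UNION ALL SELECT sc.edit_dttm
                    UNION ALL SELECT idn.edit_dttm) AS s) AS max_edit_union,
           -- SQLite's scalar max() used here in place of GREATEST
           max(st.edit_dttm, sc.edit_dttm, idn.edit_dttm) AS max_edit_scalar
      FROM schedule AS sc
      LEFT JOIN students AS st ON sc.std_id = st.std_id
      LEFT JOIN identity AS idn ON sc.std_id = idn.std_id
     ORDER BY sc.std_id
""").fetchall()

for row in rows:
    print(row)
```

One difference to keep in mind with LEFT JOINs: the aggregate MAX over the UNION ALL skips NULLs, while SQLite's scalar max() returns NULL as soon as one argument is NULL.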

I had a similar problem and solved it with the approach below. I have included the full code and the resulting query. The code was run against MS SQL Server. I originally used different tables; in the snippet below I have substituted the tables and columns from your question.
from sqlalchemy import *
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.types import String
from sqlalchemy.sql.expression import FromClause
# Custom FROM element that renders a VALUES list
class values(FromClause):
    def __init__(self, *args):
        self.column_names = args
# Register the compiler hook that turns the element into SQL text
@compiles(values)
def compile_values(element, compiler, asfrom=False, **kwargs):
    values = "VALUES %s" % ", ".join("(%s)" % compiler.render_literal_value(elem, String()) for elem in element.column_names)
    if asfrom:
        values = "(%s)" % values
    return values
base_query = self.db_session.query(Schedule.Edit_DtTm.label("Schedule_Edit_DtTm"),
Identity.Edit_DtTm.label("Identity_Edit_DtTm"),
Students.Edit_DtTm.label("Students_Edit_DtTm"),
Identity.SSN
).outerjoin(Students, Schedule.stdnt_id==Students.Student_Id
).outerjoin(Identity, Schedule.std_id==Identity.std_id).subquery()
values_at_from_clause = values(("Students_Edit_DtTm"), ("Schedule_Edit_DtTm"), ("Identity_Edit_DtTm")
).alias('values(MaxEditDate)')
get_max_from_values = self.db_session.query(func.max(text('MaxEditDate'))
).select_from(values_at_from_clause)
output_query = self.db_session.query(get_max_from_values.subquery()
).label("MaxEditDate")
Output of print(output_query):
SELECT
anon_1.Schedule_Edit_DtTm AS anon_1_Schedule_Edit_DtTm,
anon_1.Students_Edit_DtTm AS anon_1_Students_Edit_DtTm,
anon_1.Identity_Edit_DtTm AS anon_1_Identity_Edit_DtTm,
anon_1.SSN AS anon_1_SSN,
(
SELECT
anon_2.max_1
FROM
(
SELECT
max( MaxEditDate ) AS max_1
FROM
(
VALUES (Students_Edit_DtTm),
(Schedule_Edit_DtTm),
(Identity_Edit_DtTm)
) AS values(MaxEditDate)
) AS anon_2
) AS MaxEditDate
FROM
(
SELECT
Schedule.Edit_DtTm AS Schedule_Edit_DtTm,
Students.Edit_DtTm AS Students_Edit_DtTm,
Identity.Edit_DtTm AS Identity_Edit_DtTm,
Identity.SSN AS SSN
FROM
Schedule WITH(NOLOCK)
LEFT JOIN Students WITH(NOLOCK) ON
Schedule.stdnt_id = Students.Student_Id
LEFT JOIN Identity WITH(NOLOCK) ON
Schedule.std_id = Identity.std_id
) AS anon_1

Related

Can we make correlated queries with SQLAlchemy

I'm trying to translate this SQL query into a Flask-SQLAlchemy call:
SELECT *
FROM "ENVOI"
WHERE "ID_ENVOI" IN (SELECT d."ID_ENVOI"
FROM "DECLANCHEMENT" d
WHERE d."STATUS" = 0
AND d."DATE" = (SELECT max("DECLANCHEMENT"."DATE")
FROM "DECLANCHEMENT"
WHERE "DECLANCHEMENT"."ID_ENVOI" = d."ID_ENVOI"))
As you can see, it uses subqueries and, most importantly, one of the subqueries is a correlated query (it uses the d alias defined in an outer query).
I know how to use subqueries with the subquery() function, but I can't find documentation about correlated queries in SQLAlchemy. Do you know a way to do it?
Yes, we can.
Have a look at the following example (especially the correlate method call):
from sqlalchemy import select, func, table, Column, Integer
table1 = table('table1', Column('col', Integer))
table2 = table('table2', Column('col', Integer))
subquery = select(
    [func.if_(table1.c.col == 1, table2.c.col, None)]
).correlate(table1)
query = (
    select([table1.c.col,
            subquery.label('subquery')])
    .select_from(table1)
)
if __name__ == '__main__':
    print(query)
will result in the following query
SELECT table1.col, (SELECT if(table1.col = :col_1, table2.col, NULL) AS if_1
FROM table2) AS subquery
FROM table1
As you can see, if you call correlate on a select, the given Table will not be added to its FROM clause.
You have to do this even when you specify select_from directly, as SQLAlchemy will happily add any table it finds in the columns.
Based on the link in univerio's comment, I wrote this code for my query:
Declch = db.aliased(Declanchement)
maxdate_sub = db.select([db.func.max(Declanchement.date)])\
.where(Declanchement.id_envoi == Declch.id_envoi)
decs_sub = db.session.query(Declch.id_envoi)\
.filter(Declch.status == SMS_EN_ATTENTE)\
.filter(Declch.date < since)\
.filter(Declch.date == maxdate_sub).subquery()
envs = Envoi.query.filter(Envoi.id_envoi.in_(decs_sub)).all()
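For comparison, here is a sketch of the original SQL written with the expression language and an explicit correlate() call. It assumes SQLAlchemy 1.4 or newer (the select(column, ...) calling style and scalar_subquery()); the Table definitions are guesses reconstructed from the column names in the question, not the asker's real models.

```python
from sqlalchemy import Column, Date, Integer, MetaData, Table, func, select

metadata = MetaData()
# Hypothetical table definitions, reduced to the columns the query needs
envoi = Table("ENVOI", metadata, Column("ID_ENVOI", Integer, primary_key=True))
decl = Table(
    "DECLANCHEMENT", metadata,
    Column("ID_ENVOI", Integer),
    Column("STATUS", Integer),
    Column("DATE", Date),
)

d = decl.alias("d")  # plays the role of the outer "d" alias

# Innermost query: correlate(d) keeps "d" out of this FROM clause,
# so d.ID_ENVOI refers to the row of the enclosing query.
max_date = (
    select(func.max(decl.c.DATE))
    .where(decl.c.ID_ENVOI == d.c.ID_ENVOI)
    .correlate(d)
    .scalar_subquery()
)

pending_ids = select(d.c.ID_ENVOI).where(d.c.STATUS == 0, d.c.DATE == max_date)

stmt = select(envoi).where(envoi.c.ID_ENVOI.in_(pending_ids))
sql = str(stmt)
print(sql)
```

In the printed statement the alias d should appear in a FROM clause only once, in the middle query, while the innermost SELECT reads from the unaliased table.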

Adding a join to an SQL Alchemy expression that already has a select_from()

Note: this is a question about SQL Alchemy's expression language not the ORM
SQL Alchemy is fine for adding WHERE or HAVING clauses to an existing query:
q = select([bmt_gene.c.id]).select_from(bmt_gene)
q = q.where(bmt_gene.c.ensembl_id == "ENSG00000000457")
print q
SELECT bmt_gene.id
FROM bmt_gene
WHERE bmt_gene.ensembl_id = %s
However if you try to add a JOIN in the same way you'll get an exception:
q = select([bmt_gene.c.id]).select_from(bmt_gene)
q = q.join(bmt_gene_name)
sqlalchemy.exc.NoForeignKeysError: Can't find any foreign key relationships between 'Select object' and 'bmt_gene_name'
If you specify the columns it creates a subquery (which is incomplete SQL anyway):
q = select([bmt_gene.c.id]).select_from(bmt_gene)
q = q.join(bmt_gene_name, q.c.id == bmt_gene_name.c.gene_id)
(SELECT bmt_gene.id AS id FROM bmt_gene)
JOIN bmt_gene_name ON id = bmt_gene_name.gene_id
But what I actually want is this:
SELECT
bmt_gene.id AS id
FROM
bmt_gene
JOIN bmt_gene_name ON id = bmt_gene_name.gene_id
edit: Adding the JOIN has to be after the creation of the initial query expression q. The idea is that I make a basic query skeleton then I iterate over all the joins requested by the user and add them to the query.
Can this be done in SQL Alchemy?
The first error (NoForeignKeysError) means that your table lacks a foreign key definition. Fix this if you don't want to write the join clauses by hand:
from sqlalchemy.types import Integer
from sqlalchemy.schema import MetaData, Table, Column, ForeignKey
meta = MetaData()
bmt_gene_name = Table(
'bmt_gene_name', meta,
Column('id', Integer, primary_key=True),
Column('gene_id', Integer, ForeignKey('bmt_gene.id')),
# ...
)
Joins in the SQLAlchemy expression language work a little differently from what you expect. You need to create a Join object that joins all the tables, and only then pass it to the Select object:
q = select([bmt_gene.c.id])
q = q.where(bmt_gene.c.ensembl_id == 'ENSG00000000457')
j = bmt_gene # Initial table to join.
table_list = [bmt_gene_name, some_other_table, ...]
for table in table_list:
    j = j.join(table)
q = q.select_from(j)
The reason you see the subquery in your join is that a Select object is treated like a table (which it essentially is), and you asked to join it to another table.
You can access the current select_from of a query with the froms attribute, and then join it with another table and update the select_from.
As explained in the documentation, calling select_from usually adds another selectable to the FROM list, however:
Passing a Join that refers to an already present Table or other selectable will have the effect of concealing the presence of that selectable as an individual element in the rendered FROM list, instead rendering it into a JOIN clause.
So you can add a join like this, for example:
q = select([bmt_gene.c.id]).select_from(bmt_gene)
q = q.select_from(
join(q.froms[0], bmt_gene_name,
bmt_gene.c.id == bmt_gene_name.c.gene_id)
)
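As a side note, on SQLAlchemy 1.4 and newer the behaviour described in the question has changed: Select.join() adds a JOIN to the existing FROM clause instead of wrapping the whole select in a subquery. The sketch below assumes that version and uses made-up column definitions for the two tables; requested_joins is a hypothetical list standing in for the joins chosen by the user.

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table, select

metadata = MetaData()
# Hypothetical definitions; only the columns used below are declared
bmt_gene = Table(
    "bmt_gene", metadata,
    Column("id", Integer, primary_key=True),
    Column("ensembl_id", String),
)
bmt_gene_name = Table(
    "bmt_gene_name", metadata,
    Column("id", Integer, primary_key=True),
    Column("gene_id", Integer, ForeignKey("bmt_gene.id")),
)

# Build the query skeleton first ...
q = select(bmt_gene.c.id).select_from(bmt_gene)
q = q.where(bmt_gene.c.ensembl_id == "ENSG00000000457")

# ... then add the requested joins one by one afterwards.
requested_joins = [(bmt_gene_name, bmt_gene.c.id == bmt_gene_name.c.gene_id)]
for target, onclause in requested_joins:
    q = q.join(target, onclause)

sql = str(q)
print(sql)
```

On older releases the select_from(join(...)) approach shown above remains the way to go.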

SQL- How to grab a large list of data instead of iterating each query

I gather a list of items, and for each item I query the database with the following code:
SELECT *
FROM task_activity as ja
join task as j on ja.task_id = j.id
WHERE j.name = '%s'
AND ja.avg_runtime <> 0
AND ja.avg_runtime is not NULL
AND ja.id = (SELECT MAX(id) FROM task_activity
WHERE task_id = ja.task_id
and avg_runtime <> 0
AND ja.avg_runtime is not NULL)
% str(task.get('name'))).fetchall()
But do I need to iterate through the list and run a query for every item? The list is quite large at times. Can I make just one query and get back the whole data set?
In this particular query I'm only looking for the column avg_runtime with the task id and the maximum id will be the last calculated runtime.
I don't have access to the database other than to make queries. Using Microsoft SQL Server 2012 (SP1) - 11.0.3349.0 (X64)
You might be able to speed this up using row_number(). Note, I think there's a bug in your original query. Should ja.avg_runtime in the subquery just be avg_runtime?
sql = """with x as (
select
ja.task_id,
ja.avg_runtime,
ja.id,
row_number() over (partition by ja.task_id order by ja.id desc) rn
from
task_activity as ja
join
task as j
on ja.task_id = j.id
where
j.name in ({0}) and
ja.avg_runtime <> 0 and
ja.avg_runtime is not null
) select
task_id,
avg_runtime,
id
from
x
where
rn = 1;"""
# build up ?,?,? for parameter substitution
# assume tasknames is the list containing the task names.
params = ",".join("?" for _ in tasknames)
# connection is your db connection
cursor = connection.cursor()
# interpolate the ?,?,? and bind parameters
cursor.execute(sql.format(params), tasknames)
cursor.fetchall()
The following index should make this query pretty fast (although that depends on how many rows are excluded by the filters on ja.avg_runtime):
create index ix_task_id_id on task_activity (task_id, id desc);
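To try the single-query idea without access to SQL Server, here is a small sketch against an in-memory SQLite database (window functions need SQLite 3.25 or newer). The schema and sample rows are invented for the demo; only the shape of the query and the ?-placeholder expansion follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE task_activity (id INTEGER PRIMARY KEY, task_id INTEGER, avg_runtime REAL);
    INSERT INTO task VALUES (1, 'build'), (2, 'deploy'), (3, 'lint');
    INSERT INTO task_activity VALUES
        (1, 1, 10.0), (2, 1, 12.5), (3, 1, 0),
        (4, 2, 7.0),  (5, 2, NULL),
        (6, 3, 3.0);
""")

tasknames = ["build", "deploy"]
# One "?" per task name, so the values are bound rather than interpolated
placeholders = ",".join("?" for _ in tasknames)

sql = """
WITH x AS (
    SELECT ja.task_id, ja.avg_runtime, ja.id,
           ROW_NUMBER() OVER (PARTITION BY ja.task_id ORDER BY ja.id DESC) AS rn
      FROM task_activity AS ja
      JOIN task AS j ON ja.task_id = j.id
     WHERE j.name IN ({0})
       AND ja.avg_runtime <> 0
       AND ja.avg_runtime IS NOT NULL
)
SELECT task_id, avg_runtime, id FROM x WHERE rn = 1 ORDER BY task_id
""".format(placeholders)

# One round trip returns the latest usable runtime for every requested task
latest = conn.execute(sql, tasknames).fetchall()
print(latest)
```

Only the placeholder string is formatted into the SQL text; the task names themselves are passed as bound parameters.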

How to count rows with SELECT COUNT(*) with SQLAlchemy?

I'd like to know if it's possible to generate a SELECT COUNT(*) FROM TABLE statement in SQLAlchemy without explicitly asking for it with execute().
If I use:
session.query(table).count()
then it generates something like:
SELECT count(*) AS count_1 FROM
(SELECT table.col1 as col1, table.col2 as col2, ... from table)
which is significantly slower in MySQL with InnoDB. I am looking for a solution that doesn't require the table to have a known primary key, as suggested in Get the number of rows in table using SQLAlchemy.
Query for just a single known column:
session.query(MyTable.col1).count()
I managed to render the following SELECT with SQLAlchemy on both layers.
SELECT count(*) AS count_1
FROM "table"
Usage from the SQL Expression layer
from sqlalchemy import select, func, Integer, Table, Column, MetaData
metadata = MetaData()
table = Table("table", metadata,
Column('primary_key', Integer),
Column('other_column', Integer) # just to illustrate
)
print select([func.count()]).select_from(table)
Usage from the ORM layer
You just subclass Query (which you have probably done anyway) and provide a specialized count method, like this one:
from sqlalchemy.orm import Query
from sqlalchemy.sql.expression import func
class BaseQuery(Query):
    def count_star(self):
        # Swap the selected columns for count(*) and drop any ORDER BY
        count_query = (self.statement.with_only_columns([func.count()])
                       .order_by(None))
        return self.session.execute(count_query).scalar()
Please note that order_by(None) resets the ordering of the query, which is irrelevant to the counting.
Using this method you can have a count(*) on any ORM Query that will honor all the filter and join conditions already specified.
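For the expression-layer variant, a minimal end-to-end sketch follows. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite engine; the items table and its rows are made up for the demo.

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, func, select

metadata = MetaData()
# Hypothetical table used only for this demo
items = Table(
    "items", metadata,
    Column("id", Integer, primary_key=True),
    Column("other_column", Integer),
)

engine = create_engine("sqlite://")  # in-memory database
metadata.create_all(engine)

# count(*) taken directly from the table, with no wrapping subquery
count_stmt = select(func.count()).select_from(items)

with engine.begin() as conn:
    conn.execute(items.insert(), [{"other_column": n} for n in range(5)])
    total = conn.execute(count_stmt).scalar()

rendered = str(count_stmt)
print(rendered)
print(total)
```

The rendered statement selects count(*) straight from items, which is the form the question asks for.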
I needed to do a count of a very complex query with many joins. I was using the joins as filters, so I only wanted to know the count of the actual objects. count() was insufficient, but I found the answer in the docs here:
http://docs.sqlalchemy.org/en/latest/orm/tutorial.html
The code would look something like this (to count user objects):
from sqlalchemy import func
session.query(func.count(User.id)).scalar()
Addition to the Usage from the ORM layer in the accepted answer: count(*) can be done for ORM using the query.with_entities(func.count()), like this:
session.query(MyModel).with_entities(func.count()).scalar()
It can also be used in more complex cases, when we have joins and filters - the important thing here is to place with_entities after joins, otherwise SQLAlchemy could raise the Don't know how to join error.
For example:
we have a User model (id, name) and a Song model (id, title, genre)
we have user-song data in the UserSong model (user_id, song_id, is_liked), where user_id + song_id is the primary key
We want to get the number of rock songs a user has liked:
SELECT count(*)
FROM user_song
JOIN song ON user_song.song_id = song.id
WHERE user_song.user_id = %(user_id)
AND user_song.is_liked IS 1
AND song.genre = 'rock'
This query can be generated in the following way:
user_id = 1
query = session.query(UserSong)
query = query.join(Song, Song.id == UserSong.song_id)
query = query.filter(
and_(
UserSong.user_id == user_id,
UserSong.is_liked.is_(True),
Song.genre == 'rock'
)
)
# Note: important to place `with_entities` after the join
query = query.with_entities(func.count())
liked_count = query.scalar()
Complete example is here.
If you are using the SQL Expression style, there is another way to construct the count statement if you already have your table object.
First, get the table object. Reflection is only one of several ways to do this:
import sqlalchemy
database_engine = sqlalchemy.create_engine("connection string")
# Populate existing database via reflection into sqlalchemy objects
database_metadata = sqlalchemy.MetaData()
database_metadata.reflect(bind=database_engine)
table_object = database_metadata.tables.get("table_name") # This is just for illustration how to get the table_object
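Issuing the count query on the table_object (note that Table.count() is deprecated in newer SQLAlchemy releases, where func.count() with select_from(table_object) is the suggested replacement):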
query = table_object.count()
# This will produce something like, where id is a primary key column in "table_name" automatically selected by sqlalchemy
# 'SELECT count(table_name.id) AS tbl_row_count FROM table_name'
count_result = database_engine.scalar(query)
I'm not clear on what you mean by "without explicitly asking for it with execute()", so this might be exactly what you are not asking for.
OTOH, this might help others.
You can just run the textual SQL:
your_query="""
SELECT count(*) from table
"""
the_count = session.execute(text(your_query)).scalar()
def test_query(val: str):
    query = f"select count(*) from table where col1='{val}'"
    rtn = database_engine.query(query)
    cnt = rtn.one().count
but you can find the way if you checked debug watch
query = session.query(table.column).filter().with_entities(func.count(table.column.distinct()))
count = query.scalar()
this worked for me.
Gives the query:
SELECT count(DISTINCT table.column) AS count_1
FROM table where ...
Below is the way to find the count of any query.
aliased_query = alias(query)
db.session.query(func.count('*')).select_from(aliased_query).scalar()
Here is the link to the reference document if you want to explore more options or read details.

Sqlite3 / Webpy: "no such column" with double left join

I am trying to run a SQLite3 query via the web.py framework. The query works in SQLiteManager, but with web.db I get "sqlite3.OperationalError no such column a.id".
Is this a webpy bug?
import web
db = web.database(dbn='sqlite', db='data/feed.db')
account = 1
query='''
SELECT a.id, a.url, a.title, a.description, a.account_count, b.id subscribed FROM
(SELECT feed.id, feed.url, feed.title, feed.description, count(account_feed.id) account_count
FROM feed
LEFT OUTER JOIN account_feed
ON feed.id=account_feed.feed_id AND feed.actived=1
GROUP BY feed.id, feed.url, feed.title, feed.description
ORDER BY count(account_feed.id) DESC, feed.id DESC)
a LEFT OUTER JOIN account_feed b ON a.id=b.feed_id AND b.account_id=$account'''
return list(self._db.query(query,vars=locals()))
Traceback is here: http://pastebin.com/pUA7zB9H
Not sure why you are getting the error "no such column a.id", but it may help to:
use a multiline string (easier to read),
use a parametrized argument for account (in web.py, $account is the placeholder filled from vars; with the plain sqlite3 module the placeholder is ?).
query = '''
SELECT a.id, a.url, a.title, a.description, a.account_count, b.id subscribed
FROM ( {q} ) a
LEFT OUTER JOIN account_feed b
ON a.id=b.feed_id
AND b.account_id = ?'''.format(q=query)
args=[account]
cursor.execute(query,args)
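Here is a self-contained sketch of that suggestion using the standard sqlite3 module. The schema and rows are invented for the demo. Giving the subquery columns explicit aliases (feed.id AS id and so on) is an assumption on my part about what might avoid the "no such column a.id" error, not a confirmed diagnosis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feed (id INTEGER PRIMARY KEY, url TEXT, title TEXT,
                       description TEXT, actived INTEGER);
    CREATE TABLE account_feed (id INTEGER PRIMARY KEY, account_id INTEGER, feed_id INTEGER);
    INSERT INTO feed VALUES (1, 'u1', 'Feed one', '', 1), (2, 'u2', 'Feed two', '', 1);
    INSERT INTO account_feed VALUES (1, 1, 1), (2, 2, 1), (3, 2, 2);
""")

# Inner query: every output column gets an explicit name
inner = """
    SELECT feed.id AS id, feed.url AS url, feed.title AS title,
           feed.description AS description,
           count(account_feed.id) AS account_count
      FROM feed
      LEFT OUTER JOIN account_feed
        ON feed.id = account_feed.feed_id AND feed.actived = 1
     GROUP BY feed.id, feed.url, feed.title, feed.description
"""

# Outer query: the account id is bound through a "?" placeholder
query = """
    SELECT a.id, a.title, a.account_count, b.id AS subscribed
      FROM ({q}) AS a
      LEFT OUTER JOIN account_feed AS b
        ON a.id = b.feed_id AND b.account_id = ?
     ORDER BY a.account_count DESC, a.id DESC
""".format(q=inner)

account = 1
rows = conn.execute(query, [account]).fetchall()
print(rows)
```

A feed the account is not subscribed to comes back with subscribed set to None, which is what the second LEFT OUTER JOIN is for.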
