How to count rows with SELECT COUNT(*) with SQLAlchemy? - python

I'd like to know if it's possible to generate a SELECT COUNT(*) FROM TABLE statement in SQLAlchemy without explicitly asking for it with execute().
If I use:
session.query(table).count()
then it generates something like:
SELECT count(*) AS count_1 FROM
(SELECT table.col1 as col1, table.col2 as col2, ... from table)
which is significantly slower in MySQL with InnoDB. I am looking for a solution that doesn't require the table to have a known primary key, as suggested in Get the number of rows in table using SQLAlchemy.

Query for just a single known column:
session.query(MyTable.col1).count()

I managed to render the following SELECT with SQLAlchemy on both layers.
SELECT count(*) AS count_1
FROM "table"
Usage from the SQL Expression layer
from sqlalchemy import select, func, Integer, Table, Column, MetaData
metadata = MetaData()
table = Table("table", metadata,
    Column('primary_key', Integer),
    Column('other_column', Integer)  # just to illustrate
)
print(select([func.count()]).select_from(table))
Usage from the ORM layer
You just subclass Query (you probably have anyway) and provide a specialized count() method, like this one:
from sqlalchemy.orm import Query
from sqlalchemy.sql.expression import func
class BaseQuery(Query):
    def count_star(self):
        count_query = (self.statement.with_only_columns([func.count()])
                       .order_by(None))
        return self.session.execute(count_query).scalar()
Please note that order_by(None) resets the ordering of the query, which is irrelevant to the counting.
Using this method you can have a count(*) on any ORM Query, and it will honor all the filter and join conditions already specified.
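On SQLAlchemy 1.4 and later, the list form select([...]) is deprecated in favor of positional arguments. A minimal sketch of the same count(*) in that calling style follows; the items table and the in-memory SQLite engine are assumptions made only so the snippet runs on its own.

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, func, select

# Hypothetical table for this sketch; note that no primary key is declared.
metadata = MetaData()
items = Table(
    "items", metadata,
    Column("id", Integer),
    Column("value", Integer),
)

engine = create_engine("sqlite://")  # throwaway in-memory database
metadata.create_all(engine)

# 1.4+ style: columns are passed positionally, not wrapped in a list.
count_stmt = select(func.count()).select_from(items)

with engine.begin() as conn:
    conn.execute(items.insert(), [{"id": i, "value": i * 10} for i in range(3)])
    row_count = conn.execute(count_stmt).scalar()

print(count_stmt)  # shows the rendered SELECT count(*) statement
print(row_count)
```

Because the statement is built from func.count() with no column argument, it does not depend on the table having a known primary key.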

I needed to do a count of a very complex query with many joins. I was using the joins as filters, so I only wanted to know the count of the actual objects. count() was insufficient, but I found the answer in the docs here:
http://docs.sqlalchemy.org/en/latest/orm/tutorial.html
The code would look something like this (to count user objects):
from sqlalchemy import func
session.query(func.count(User.id)).scalar()

Addition to the Usage from the ORM layer in the accepted answer: count(*) can be done for ORM using the query.with_entities(func.count()), like this:
session.query(MyModel).with_entities(func.count()).scalar()
It can also be used in more complex cases, when we have joins and filters - the important thing here is to place with_entities after joins, otherwise SQLAlchemy could raise the Don't know how to join error.
For example:
we have a User model (id, name) and a Song model (id, title, genre)
we have user-song data: the UserSong model (user_id, song_id, is_liked), where user_id + song_id is the primary key
We want to get the number of rock songs a user has liked:
SELECT count(*)
FROM user_song
JOIN song ON user_song.song_id = song.id
WHERE user_song.user_id = %(user_id)s
AND user_song.is_liked IS 1
AND song.genre = 'rock'
This query can be generated in the following way:
user_id = 1
query = session.query(UserSong)
query = query.join(Song, Song.id == UserSong.song_id)
query = query.filter(
    and_(
        UserSong.user_id == user_id,
        UserSong.is_liked.is_(True),
        Song.genre == 'rock'
    )
)
# Note: important to place `with_entities` after the join
query = query.with_entities(func.count())
liked_count = query.scalar()
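For SQLAlchemy 1.4 or later, the same count can also be written with select(), where select_from() names the left side of the join explicitly so the ordering concern does not arise. This is only a sketch: the model definitions and sample rows below are assumptions reconstructed from the description above, and the User model is left out because the query never touches it.

```python
from sqlalchemy import (Boolean, Column, ForeignKey, Integer, String,
                        create_engine, func, select)
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Song(Base):
    __tablename__ = "song"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    genre = Column(String)


class UserSong(Base):
    __tablename__ = "user_song"
    user_id = Column(Integer, primary_key=True)
    song_id = Column(Integer, ForeignKey("song.id"), primary_key=True)
    is_liked = Column(Boolean)


engine = create_engine("sqlite://")  # throwaway in-memory database
Base.metadata.create_all(engine)

user_id = 1
count_stmt = (
    select(func.count())
    .select_from(UserSong)  # explicit left side of the join
    .join(Song, Song.id == UserSong.song_id)
    .where(UserSong.user_id == user_id)
    .where(UserSong.is_liked.is_(True))
    .where(Song.genre == "rock")
)

with Session(engine) as session:
    # Made-up sample rows
    session.add_all([
        Song(id=1, title="A", genre="rock"),
        Song(id=2, title="B", genre="rock"),
        Song(id=3, title="C", genre="jazz"),
        UserSong(user_id=1, song_id=1, is_liked=True),
        UserSong(user_id=1, song_id=2, is_liked=False),
        UserSong(user_id=1, song_id=3, is_liked=True),
        UserSong(user_id=2, song_id=2, is_liked=True),
    ])
    session.commit()
    liked_count = session.execute(count_stmt).scalar()

print(liked_count)
```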

If you are using the SQL Expression style, there is another way to construct the count statement once you already have your table object.
First, get the table object. Reflection, shown here, is only one of several ways to do that:
import sqlalchemy
database_engine = sqlalchemy.create_engine("connection string")
# Populate existing database via reflection into sqlalchemy objects
database_metadata = sqlalchemy.MetaData()
database_metadata.reflect(bind=database_engine)
table_object = database_metadata.tables.get("table_name") # This is just for illustration how to get the table_object
Issuing the count query on the table_object:
query = table_object.count()
# This produces something like the following, where id is a primary key
# column of "table_name" that SQLAlchemy selects automatically:
# 'SELECT count(table_name.id) AS tbl_row_count FROM table_name'
count_result = database_engine.scalar(query)

I'm not clear on what you mean by "without explicitly asking for it with execute()", so this might be exactly what you are not asking for.
OTOH, this might help others.
You can just run the textual SQL:
from sqlalchemy import text
your_query = """
SELECT count(*) from table
"""
the_count = session.execute(text(your_query)).scalar()

def test_query(val: str):
    query = f"select count(*) from table where col1='{val}'"
    rtn = database_engine.query(query)
    cnt = rtn.one().count
If the attribute name is different in your setup, you can find the right one by inspecting the result in a debugger watch window.
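Formatting the value straight into the SQL string breaks on values that contain quotes and is open to SQL injection. A hedged sketch of the same filtered count with a bound parameter, using SQLAlchemy's text(), is shown here; the demo table, its rows, and the count_matching helper are made up for illustration.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # throwaway in-memory database


def count_matching(conn, val: str) -> int:
    """Count rows whose col1 equals val, passing val as a bound parameter."""
    stmt = text("SELECT count(*) FROM demo WHERE col1 = :val")
    return conn.execute(stmt, {"val": val}).scalar()


with engine.begin() as conn:
    # Hypothetical table and rows, only so the sketch runs on its own
    conn.execute(text("CREATE TABLE demo (col1 TEXT)"))
    conn.execute(
        text("INSERT INTO demo (col1) VALUES (:v)"),
        [{"v": "a"}, {"v": "a"}, {"v": "o'brien"}],
    )
    count_a = count_matching(conn, "a")
    count_quoted = count_matching(conn, "o'brien")  # the quote needs no escaping

print(count_a, count_quoted)
```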

query = session.query(table.column).filter().with_entities(func.count(table.column.distinct()))
count = query.scalar()
This worked for me.
It gives the query:
SELECT count(DISTINCT table.column) AS count_1
FROM table where ...

Below is the way to find the count of any query.
aliased_query = alias(query)
db.session.query(func.count('*')).select_from(aliased_query).scalar()
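In the select() style of SQLAlchemy 1.4 and later, the same idea (wrap the statement as a subquery, then count its rows) might look like the sketch below. The people table and its rows are assumptions made for the example.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, func, select)

metadata = MetaData()
# Hypothetical table for this sketch
people = Table(
    "people", metadata,
    Column("id", Integer, primary_key=True),
    Column("city", String),
)

engine = create_engine("sqlite://")  # throwaway in-memory database
metadata.create_all(engine)

# Any statement whose rows we want to count
inner = select(people.c.city).where(people.c.city != "Oslo").distinct()

# Wrap it as a subquery and count the rows it yields
count_stmt = select(func.count()).select_from(inner.subquery())

with engine.begin() as conn:
    conn.execute(
        people.insert(),
        [{"city": c} for c in ["Oslo", "Rome", "Rome", "Lima"]],
    )
    distinct_cities = conn.execute(count_stmt).scalar()

print(distinct_cities)
```

Counting over a subquery is the general-purpose form, but it is also the shape the original question wanted to avoid for plain table counts, so it is best kept for statements that really need it (DISTINCT, GROUP BY, LIMIT).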

Related

Dynamically add filter to SQLAlchemy TextClause

Assume I have a SQLAlchemy table which looks like:
class Country:
name = VARCHAR
population = INTEGER
continent = VARCHAR
num_states = INTEGER
My application allows viewing name and population for all countries, so I have a TextClause which looks like:
"select name, population from Country"
I allow raw queries in my application, so I don't have the option of changing this to a selectable.
At runtime, I want to let my users choose a field name and a field value to filter on. For example, a user can say "I only want to see name and population for countries where continent is Asia". So I want to dynamically add the filter
.where(Country.c.continent == 'Asia')
But I can't add .where to a TextClause.
Similarly, my user may choose to see name and population for countries where num_states is greater than 10. So I dynamically want to add the filter
.where(Country.c.num_states > 10)
But again I can't add .where to a TextClause.
What are the options I have to solve this problem?
Could subquery help here in any way?
Add a filter based on the conditions; filter() is what adds WHERE conditions in SQLAlchemy.
Country.query.filter(Country.num_states > 10).all()
You can also do this:
query = Country.query.filter(Country.continent == 'Asia')
if user_input == 'states':
    query = query.filter(Country.num_states > 10)
query = query.all()
This is not doable in a general sense without parsing the query. In relational algebra terms, the user applies projection and selection operations to a table, and you want to apply selection operations to it. Since the user can apply arbitrary projections (e.g. user supplies SELECT id FROM table), you are not guaranteed to be able to always apply your filters on top, so you have to apply your filters before the user does. That means you need to rewrite it to SELECT id FROM (some subquery), which requires parsing the user's query.
However, we can sort of cheat depending on the database that you are using, by having the database engine do the parsing for you. The way to do this is with CTEs, by basically shadowing the table name with a CTE.
Using your example, it looks like the following. User supplies query
SELECT name, population FROM country;
You shadow country with a CTE:
WITH country AS (
SELECT * FROM country
WHERE continent = 'Asia'
) SELECT name, population FROM country;
Unfortunately, because of the way SQLAlchemy's CTE support works, it is tough to get it to generate a CTE for a TextClause. The solution is to basically generate the string yourself, using a custom compilation extension, something like this:
from sqlalchemy import select, text
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import ClauseElement, Executable
class WrappedQuery(Executable, ClauseElement):
    def __init__(self, name, outer, inner):
        self.name = name
        self.outer = outer
        self.inner = inner
@compiles(WrappedQuery)
def compile_wrapped_query(element, compiler, **kwargs):
    return "WITH {} AS ({}) {}".format(
        element.name,
        compiler.process(element.outer),
        compiler.process(element.inner))
c = Country.__table__
cte = select(["*"]).select_from(c).where(c.c.continent == "Asia")
query = WrappedQuery("country", cte, text("SELECT name, population FROM country"))
session.execute(query)
From my tests, this only works in PostgreSQL. SQLite and SQL Server both treat it as recursive instead of shadowing, and MySQL did not support CTEs before version 8.0.
I couldn't find anything nice for this in the documentation. I ended up resorting to pretty much just string processing... but at least it works!
from sqlalchemy.sql import text
query = """select name, population from Country"""
params = {}
if continent is not None:
    # The leading space keeps the clause apart from the table name, and the
    # value is bound as a parameter rather than formatted into the string.
    query = query + """ WHERE continent = :continent"""
    params["continent"] = continent
text_clause = text(query)
with sql_connection() as conn:
    results = conn.execute(text_clause, params)
You could also chain this logic with more clauses, although you'll have to create a boolean flag for the first WHERE clause and then use AND for the subsequent ones.
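The chaining described above can be sketched without a boolean flag by collecting the conditions in a list and joining them with AND. Everything here is an assumption for illustration: the build_filtered_query helper, the allow-lists, and the sample rows. It also only works when the raw query has no WHERE, GROUP BY, or ORDER BY clause of its own, which is the limitation the earlier answer points out.

```python
from sqlalchemy import create_engine, text

ALLOWED_COLUMNS = {"continent", "num_states"}  # hypothetical allow-list
ALLOWED_OPERATORS = {"=", ">", "<"}


def build_filtered_query(base_sql, filters):
    """Append bound-parameter conditions to a raw SELECT that has no WHERE.

    filters maps a column name to an (operator, value) pair. Column names and
    operators cannot be bound, so they are checked against allow-lists.
    """
    clauses, params = [], {}
    for column_name, (operator, value) in filters.items():
        if column_name not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported filter: {column_name} {operator}")
        clauses.append(f"{column_name} {operator} :{column_name}")
        params[column_name] = value
    sql = base_sql
    if clauses:
        # WHERE introduces the first condition; AND joins the rest
        sql += " WHERE " + " AND ".join(clauses)
    return text(sql), params


engine = create_engine("sqlite://")  # throwaway in-memory database
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE country "
        "(name TEXT, population INTEGER, continent TEXT, num_states INTEGER)"
    ))
    # Made-up rows, not real data
    conn.execute(
        text("INSERT INTO country VALUES (:n, :p, :c, :s)"),
        [
            {"n": "Alpha", "p": 10, "c": "Asia", "s": 28},
            {"n": "Beta", "p": 20, "c": "Asia", "s": 47},
            {"n": "Gamma", "p": 30, "c": "Asia", "s": 8},
            {"n": "Delta", "p": 40, "c": "Europe", "s": 16},
        ],
    )
    stmt, params = build_filtered_query(
        "select name, population from country",
        {"continent": ("=", "Asia"), "num_states": (">", 10)},
    )
    names = sorted(row[0] for row in conn.execute(stmt, params))

print(stmt.text)
print(names)
```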

Can we make correlated queries with SQLAlchemy

I'm trying to translate this SQL query into a Flask-SQLAlchemy call:
SELECT *
FROM "ENVOI"
WHERE "ID_ENVOI" IN (SELECT d."ID_ENVOI"
FROM "DECLANCHEMENT" d
WHERE d."STATUS" = 0
AND d."DATE" = (SELECT max("DECLANCHEMENT"."DATE")
FROM "DECLANCHEMENT"
WHERE "DECLANCHEMENT"."ID_ENVOI" = d."ID_ENVOI"))
As you can see, it uses subqueries and, most importantly, one of the subqueries is a correlated query (it uses the d table defined in an outer query).
I know how to use subqueries with the subquery() function, but I can't find documentation about correlated queries with SQLAlchemy. Do you know a way to do it?
Yes, we can.
Have a look at the following example (especially the correlate method call):
from sqlalchemy import select, func, table, Column, Integer
table1 = table('table1', Column('col', Integer))
table2 = table('table2', Column('col', Integer))
subquery = select(
    [func.if_(table1.c.col == 1, table2.c.col, None)]
).correlate(table1)
query = (
    select([table1.c.col,
            subquery.label('subquery')])
    .select_from(table1)
)
if __name__ == '__main__':
    print(query)
will result in the following query
SELECT table1.col, (SELECT if(table1.col = :col_1, table2.col, NULL) AS if_1
FROM table2) AS subquery
FROM table1
As you can see, if you call correlate on a select, the given Table will not be added to its FROM clause.
You have to do this even when you specify select_from directly, as SQLAlchemy will happily add any table it finds in the columns.
Based on the link from univerio's comment, I wrote this code for my query:
Declch = db.aliased(Declanchement)
maxdate_sub = db.select([db.func.max(Declanchement.date)])\
    .where(Declanchement.id_envoi == Declch.id_envoi)
decs_sub = db.session.query(Declch.id_envoi)\
    .filter(Declch.status == SMS_EN_ATTENTE)\
    .filter(Declch.date < since)\
    .filter(Declch.date == maxdate_sub).subquery()
envs = Envoi.query.filter(Envoi.id_envoi.in_(decs_sub)).all()
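For reference, a self-contained sketch of the original SQL in the Core select() style of SQLAlchemy 1.4 and later, with the correlation made explicit through correlate() and scalar_subquery(). The table definitions, the integer stand-in for the date column, and the sample rows are assumptions; only the query shape follows the question.

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, func, select

metadata = MetaData()
envoi = Table("envoi", metadata, Column("id_envoi", Integer, primary_key=True))
declanchement = Table(
    "declanchement", metadata,
    Column("id", Integer, primary_key=True),
    Column("id_envoi", Integer),
    Column("status", Integer),
    Column("date", Integer),  # integer stand-in for a real date column
)

d = declanchement.alias("d")

# Innermost query: latest date per envoi, correlated to the outer alias d
max_date = (
    select(func.max(declanchement.c.date))
    .where(declanchement.c.id_envoi == d.c.id_envoi)
    .correlate(d)
    .scalar_subquery()
)

# Middle query: rows of d that are pending (status 0) and are the latest row
latest_pending = (
    select(d.c.id_envoi)
    .where(d.c.status == 0)
    .where(d.c.date == max_date)
)

stmt = select(envoi.c.id_envoi).where(envoi.c.id_envoi.in_(latest_pending))

engine = create_engine("sqlite://")  # throwaway in-memory database
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(envoi.insert(), [{"id_envoi": i} for i in (1, 2, 3)])
    conn.execute(declanchement.insert(), [
        {"id_envoi": 1, "status": 1, "date": 3},
        {"id_envoi": 1, "status": 0, "date": 5},  # latest for 1, pending
        {"id_envoi": 2, "status": 0, "date": 1},
        {"id_envoi": 2, "status": 1, "date": 9},  # latest for 2, not pending
        {"id_envoi": 3, "status": 0, "date": 4},  # latest for 3, pending
    ])
    envoi_ids = sorted(conn.execute(stmt).scalars().all())

print(envoi_ids)
```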

How to convert SQL scalar subquery to SQLAlchemy expression

I need a little help expressing a query like this in SQLAlchemy:
SELECT
    s.agent_id,
    s.property_id,
    p.address_zip,
    (
        SELECT v.valuation
        FROM property_valuations v WHERE v.zip_code = p.address_zip
        ORDER BY ABS(DATEDIFF(v.as_of, s.date_sold))
        LIMIT 1
    ) AS back_valuation
FROM sales s
JOIN properties p ON s.property_id = p.id
The inner subquery is meant to get the property value from the table property_valuations, with columns (zip_code INT, valuation DECIMAL, as_of DATE), that is closest to the date of sale from the table sales. I know how to rewrite it, but I am completely stuck on the order_by expression: I cannot prepare the subquery so that the ordering member can be passed in later.
Currently I have the following queries:
subquery = (
    session.query(PropertyValuation)
    .filter(PropertyValuation.zip_code == Property.address_zip)
    .order_by(func.abs(func.datediff(PropertyValuation.as_of, Sale.date_sold)))
    .limit(1)
)
query = session.query(Sale).join(Sale.property_)
How to combine these queries together?
To combine these queries, use as_scalar() or label():
subquery = (
    session.query(PropertyValuation.valuation)
    .filter(PropertyValuation.zip_code == Property.address_zip)
    .order_by(func.abs(func.datediff(PropertyValuation.as_of, Sale.date_sold)))
    .limit(1)
)
query = session.query(Sale.agent_id,
                      Sale.property_id,
                      Property.address_zip,
                      # `subquery.as_scalar()` or
                      subquery.label('back_valuation'))\
    .join(Property)
Using as_scalar() limits returned columns and rows to 1, so you cannot get the whole model object using it (as query(PropertyValuation) is a select of all the attributes of PropertyValuation), but getting just the valuation attribute works.
As for "I completely stuck on order_by expression - I cannot prepare subquery to pass ordering member later": there's no need to pass it later. Your current way of declaring the subquery is fine as it is, since SQLAlchemy can automatically correlate FROM objects to those of an enclosing query. I tried creating models that somewhat represent what you have, and here's how the query above works out (with added line breaks and indentation for readability):
In [10]: print(query)
SELECT sale.agent_id AS sale_agent_id,
       sale.property_id AS sale_property_id,
       property.address_zip AS property_address_zip,
       (SELECT property_valuations.valuation
        FROM property_valuations
        WHERE property_valuations.zip_code = property.address_zip
        ORDER BY abs(datediff(property_valuations.as_of, sale.date_sold))
        LIMIT ? OFFSET ?) AS back_valuation
FROM sale
JOIN property ON property.id = sale.property_id

Adding a join to an SQL Alchemy expression that already has a select_from()

Note: this is a question about SQL Alchemy's expression language not the ORM
SQL Alchemy is fine for adding WHERE or HAVING clauses to an existing query:
q = select([bmt_gene.c.id]).select_from(bmt_gene)
q = q.where(bmt_gene.c.ensembl_id == "ENSG00000000457")
print(q)
SELECT bmt_gene.id
FROM bmt_gene
WHERE bmt_gene.ensembl_id = %s
However if you try to add a JOIN in the same way you'll get an exception:
q = select([bmt_gene.c.id]).select_from(bmt_gene)
q = q.join(bmt_gene_name)
sqlalchemy.exc.NoForeignKeysError: Can't find any foreign key relationships between 'Select object' and 'bmt_gene_name'
If you specify the columns it creates a subquery (which is incomplete SQL anyway):
q = select([bmt_gene.c.id]).select_from(bmt_gene)
q = q.join(bmt_gene_name, q.c.id == bmt_gene_name.c.gene_id)
(SELECT bmt_gene.id AS id FROM bmt_gene)
JOIN bmt_gene_name ON id = bmt_gene_name.gene_id
But what I actually want is this:
SELECT
bmt_gene.id AS id
FROM
bmt_gene
JOIN bmt_gene_name ON id = bmt_gene_name.gene_id
edit: Adding the JOIN has to be after the creation of the initial query expression q. The idea is that I make a basic query skeleton then I iterate over all the joins requested by the user and add them to the query.
Can this be done in SQL Alchemy?
The first error (NoForeignKeysError) means that your table lacks a foreign key definition. Fix this if you don't want to write join clauses by hand:
from sqlalchemy.types import Integer
from sqlalchemy.schema import MetaData, Table, Column, ForeignKey
meta = MetaData()
bmt_gene_name = Table(
    'bmt_gene_name', meta,
    Column('id', Integer, primary_key=True),
    Column('gene_id', Integer, ForeignKey('bmt_gene.id')),
    # ...
)
Joins in the SQLAlchemy expression language work a little differently from what you expect. You need to create a Join object in which you join all the tables, and only then provide it to the Select object:
q = select([bmt_gene.c.id])
q = q.where(bmt_gene.c.ensembl_id == 'ENSG00000000457')
j = bmt_gene  # Initial table to join.
table_list = [bmt_gene_name, some_other_table, ...]
for table in table_list:
    j = j.join(table)
q = q.select_from(j)
The reason you see the subquery in your join is that the Select object is treated like a table (which it essentially is), and you asked to join it to another table.
You can access the current select_from of a query with the froms attribute, and then join it with another table and update the select_from.
As explained in the documentation, calling select_from usually adds another selectable to the FROM list, however:
Passing a Join that refers to an already present Table or other selectable will have the effect of concealing the presence of that selectable as an individual element in the rendered FROM list, instead rendering it into a JOIN clause.
So you can add a join like this, for example:
from sqlalchemy import join
q = select([bmt_gene.c.id]).select_from(bmt_gene)
q = q.select_from(
    join(q.froms[0], bmt_gene_name,
         bmt_gene.c.id == bmt_gene_name.c.gene_id)
)
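On SQLAlchemy 1.4 and later, Select.join() was reworked so that it adds a JOIN to the statement's existing FROM clause instead of wrapping the whole SELECT in a subquery, which is the behavior the question asks for. A small sketch of adding joins in a loop, with assumed table definitions and a hypothetical requested_joins list:

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table, select

meta = MetaData()
bmt_gene = Table(
    "bmt_gene", meta,
    Column("id", Integer, primary_key=True),
    Column("ensembl_id", String),
)
bmt_gene_name = Table(
    "bmt_gene_name", meta,
    Column("id", Integer, primary_key=True),
    Column("gene_id", Integer, ForeignKey("bmt_gene.id")),
    Column("name", String),
)

# Basic query skeleton, built first
q = select(bmt_gene.c.id).where(bmt_gene.c.ensembl_id == "ENSG00000000457")

# Hypothetical list of joins requested by the user: (target, ON clause)
requested_joins = [
    (bmt_gene_name, bmt_gene.c.id == bmt_gene_name.c.gene_id),
]
for target, onclause in requested_joins:
    q = q.join(target, onclause)  # 1.4+: extends the FROM clause in place

rendered = " ".join(str(q).split())  # collapse whitespace for display
print(rendered)
```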

How to use NOT IN clause in sqlalchemy ORM query

How do I convert the following MySQL query to SQLAlchemy?
SELECT * FROM `table_a` ta, `table_b` tb where 1
AND ta.id = tb.id
AND ta.id not in (select id from `table_c`)
So far I have this for SQLAlchemy:
query = session.query(table_a, table_b)
query = query.filter(table_a.id == table_b.id)
The ORM internals describe the not_in() operator (previously notin_()), so you can say:
query = query.filter(table_a.id.not_in(subquery))
#                               ^^^^^^
From the docs:
inherited from the ColumnOperators.not_in() method of ColumnOperators
implement the NOT IN operator.
This is equivalent to using negation with ColumnOperators.in_(), i.e. ~x.in_(y).
Note that version 1.4 states:
The not_in() operator is renamed from notin_() in previous releases. The previous name remains available for backwards compatibility.
So you may find notin_() in some cases.
Try this:
subquery = session.query(table_c.id)
query = query.filter(~table_a.id.in_(subquery))
Note: table_a, table_b and table_c should be mapped classes, not Table instances.
Here is the full code:
# join table_a and table_b
query = session.query(table_a, table_b)
query = query.filter(table_a.id == table_b.id)
# create subquery
subquery = session.query(table_c.id)
# select all from table_a not in subquery
query = query.filter(~table_a.id.in_(subquery))
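To see the NOT IN filter run end to end, here is a minimal sketch with three mapped classes on an in-memory SQLite database. The class names, columns, and rows are assumptions made for the example, and not_in() requires SQLAlchemy 1.4 or later (the ~...in_() spelling works on older versions too).

```python
from sqlalchemy import Column, Integer, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class TableA(Base):
    __tablename__ = "table_a"
    id = Column(Integer, primary_key=True)


class TableB(Base):
    __tablename__ = "table_b"
    id = Column(Integer, primary_key=True)


class TableC(Base):
    __tablename__ = "table_c"
    id = Column(Integer, primary_key=True)


engine = create_engine("sqlite://")  # throwaway in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Made-up rows: ids 1-3 in A and B, id 2 excluded through C
    session.add_all([TableA(id=i) for i in (1, 2, 3)])
    session.add_all([TableB(id=i) for i in (1, 2, 3)])
    session.add(TableC(id=2))
    session.commit()

    excluded_ids = select(TableC.id)
    query = (
        session.query(TableA, TableB)
        .filter(TableA.id == TableB.id)
        .filter(TableA.id.not_in(excluded_ids))  # same as ~TableA.id.in_(...)
    )
    pairs = sorted((a.id, b.id) for a, b in query.all())

print(pairs)
```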
