SQLAlchemy select with max, group_by and order_by - python

I have to list the last modified resources for each group, for that I can do this query:
model.Session.query(
    model.Resource, func.max(model.Resource.last_modified)
).group_by(model.Resource.resource_group_id).order_by(
    model.Resource.last_modified.desc())
But SQLAlchemy complains with:
ProgrammingError: (ProgrammingError) column "resource.id" must appear in
the GROUP BY clause or be used in an aggregate function
How can I select only the resource_group_id and last_modified columns?
In SQL what I want is this:
SELECT resource_group_id, max(last_modified) AS max_1
FROM resource GROUP BY resource_group_id ORDER BY max_1 DESC

model.Session.query(
    model.Resource.resource_group_id, func.max(model.Resource.last_modified)
).group_by(model.Resource.resource_group_id).order_by(
    func.max(model.Resource.last_modified).desc())

You already got it, but I'll try to explain what's going on with the original query for future reference.
In SQLAlchemy, if you pass a model reference such as query(model.Resource, ...), it will list every column of the resource table in the generated SELECT statement, so your original query would look something like:
SELECT resource.resource_group_id AS resource_group_id,
       resource.extra_column1 AS extra_column1,
       resource.extra_column2 AS extra_column2,
       ...
       max(resource.last_modified) AS max_1
FROM resource
GROUP BY resource_group_id ORDER BY max_1 DESC;
This won't work with a GROUP BY.
A common way to avoid this is to specify explicitly which columns you want to select, by listing them in the query method: .query(model.Resource.resource_group_id).
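To make the pattern concrete, here is a minimal, self-contained sketch against a throwaway SQLite database. The Resource model and its column names mirror the question; everything else (the engine, the sample data) is invented for illustration:

```python
import datetime

from sqlalchemy import Column, DateTime, Integer, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Resource(Base):
    __tablename__ = "resource"
    id = Column(Integer, primary_key=True)
    resource_group_id = Column(Integer)
    last_modified = Column(DateTime)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Resource(resource_group_id=1, last_modified=datetime.datetime(2020, 1, 1)),
        Resource(resource_group_id=1, last_modified=datetime.datetime(2020, 6, 1)),
        Resource(resource_group_id=2, last_modified=datetime.datetime(2020, 3, 1)),
    ])
    session.commit()

    # Select only the grouped column and the aggregate, and order by the
    # aggregate itself rather than the raw column.
    rows = (
        session.query(Resource.resource_group_id, func.max(Resource.last_modified))
        .group_by(Resource.resource_group_id)
        .order_by(func.max(Resource.last_modified).desc())
        .all()
    )
```

Group 1's latest modification (2020-06-01) sorts before group 2's (2020-03-01), which is exactly the "last modified resource per group" listing the question asks for.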

Related

count subquery in sqlalchemy

I'm having some trouble translating a subquery into SQLAlchemy. I have two tables that both have a store_id column that is a foreign key (but it isn't a direct many-to-many relationship), and I need to return the id, store_id and name from table 1 along with the number of records from table 2 that have the same store_id. I know the SQL that I would use to return those records; I'm just not sure how to do it using SQLAlchemy.
SELECT
    table_1.id,
    table_1.store_id,
    table_1.name,
    (
        SELECT count(table_2.id)
        FROM table_2
        WHERE table_1.store_id = table_2.store_id
    ) AS store_count
FROM table_1;
This post actually answered my question. I must have missed it when I was searching initially. My solution below.
Generate sql with subquery as a column in select statement using SQLAlchemy
store_count = session.query(func.count(Table2.id)).filter(Table2.store_id == Table1.store_id).label("store_count")
session.query(Table1.id, Table1.name, Table1.store_id, store_count)
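Here is a runnable sketch of the same correlated-count pattern, using `.scalar_subquery()` (the SQLAlchemy 1.4+ spelling) on a throwaway SQLite database. Table1 and Table2 match the names in the question; the columns beyond id/store_id/name and the sample data are invented:

```python
from sqlalchemy import Column, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Table1(Base):
    __tablename__ = "table_1"
    id = Column(Integer, primary_key=True)
    store_id = Column(Integer)
    name = Column(String)

class Table2(Base):
    __tablename__ = "table_2"
    id = Column(Integer, primary_key=True)
    store_id = Column(Integer)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Table1(store_id=10, name="first"),
        Table1(store_id=20, name="second"),
        Table2(store_id=10), Table2(store_id=10), Table2(store_id=20),
    ])
    session.commit()

    # Correlated scalar subquery: for each table_1 row, count the table_2
    # rows sharing its store_id. SQLAlchemy correlates Table1 automatically
    # because it appears in the enclosing query.
    store_count = (
        session.query(func.count(Table2.id))
        .filter(Table2.store_id == Table1.store_id)
        .scalar_subquery()
        .label("store_count")
    )
    rows = session.query(Table1.id, Table1.store_id, Table1.name, store_count).all()
```

The first store has two matching rows in table_2, the second has one, mirroring the hand-written SQL in the question.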

Django group_by argument depending on order_by

I'm struggling (again) with Django's annotate functionality where the actual SQL query is quite clear to me.
Goal:
I want to get the number of users with a certain let's say status (it could be just any column of the model).
Approach(es):
1) User.objects.values('status').annotate(count=Count('*'))
This results in the following SQL query:
SELECT users_user.status, COUNT(*) as count
FROM users_user
GROUP BY users_user.id
ORDER BY users_user.id ASC
However, this will give me a queryset of all users each "annotated" with the count value. This is kind of the behaviour I would have expected.
2) User.objects.values('status').annotate(count=Count('*')).order_by()
This results in the following SQL query:
SELECT users_user.status, COUNT(*) as count
FROM users_user
GROUP BY users_user.status
No ORDER BY, and now the GROUP BY argument is the status column. This is not what I expected, but the result I was looking for.
Question:
Why does Django's order_by() without any argument affect the SQL GROUP BY argument? (Or broader, why does the second approach "work"?)
Some details:
django 2.2.9
postgres 9.4
This is explained in the Django documentation on aggregation:
Fields that are mentioned in the order_by() part of a queryset (or which are used in the default ordering on a model) are used when selecting the output data, even if they are not otherwise specified in the values() call.
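Setting the ORM mechanics aside, the effect of the two GROUP BY clauses is easy to reproduce with plain SQL. A throwaway sqlite3 table stands in for users_user (the table and column names come from the question; the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_user (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO users_user (status) VALUES (?)",
    [("active",), ("active",), ("banned",)],
)

# Approach 1: grouping by the primary key yields one row per user,
# so every count is 1 -- the "queryset of all users" behaviour.
per_user = conn.execute(
    "SELECT status, COUNT(*) FROM users_user GROUP BY id ORDER BY id"
).fetchall()

# Approach 2: grouping by status collapses the rows into the
# per-status counts the question is after.
per_status = conn.execute(
    "SELECT status, COUNT(*) FROM users_user GROUP BY status"
).fetchall()
```

With three users (two active, one banned), the first query returns three rows each counting 1, while the second returns one row per status with the real counts.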

Passing a value into Sqlite3 python

I am trying to pass a method parameter into an sqlite command and I am having problems.
I have tried several different methods including:
c.execute('''SELECT ?, count(*)
FROM posts
GROUP BY 1''', (var,))
c.execute('''SELECT :variable, count(*)
FROM posts
GROUP BY 1''', {"variable" : var})
I have looked at the python docs on this and believe I am following them.
Neither of these methods selects the column; both return
[('lang', 3284469)]
Where lang is the name of the column and the value of the variable being passed. It should look more like
[('en', 3289)]
[('es', 845)]
....
[('ze', 39)]
The only method I have been able to get working is:
c.execute('''SELECT '''+var+''', count(*)
FROM posts
GROUP BY 1''')
I am not particularly happy with this. Is there a better way to do it? What am I doing wrong? Thanks
That is not how parameterized queries work. Parameters are for values only, never for identifiers: a parameter cannot be a column name or a table name, and it can appear only where a value can, such as in the WHERE or HAVING clause.
So if you want the column name or the table name to be variable, you must build the query string dynamically, and when you want values in the WHERE clause to be dynamic, you use parameters:
c.execute('''SELECT '''+var+''', count(*) as total
FROM posts
GROUP BY 1
HAVING total > ?''', (minval,))
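Since the identifier has to be interpolated into the string, one defensive refinement is to validate it against an explicit allow-list first, so only known column names can ever reach the SQL. A minimal sketch, reusing the posts/lang names from the question with invented data (the allow-list and helper function are hypothetical additions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (lang TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [("en", "a"), ("en", "b"), ("es", "c")],
)

# Hypothetical allow-list: the only identifiers we ever interpolate.
ALLOWED_COLUMNS = {"lang", "author"}

def count_by(conn, column, minval):
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column!r}")
    # The identifier is interpolated only after validation; the
    # threshold value still goes through a placeholder.
    sql = f"SELECT {column}, COUNT(*) FROM posts GROUP BY 1 HAVING COUNT(*) >= ?"
    return conn.execute(sql, (minval,)).fetchall()

rows = count_by(conn, "lang", 2)
```

With the sample rows above, grouping by lang gives en=2 and es=1, and the HAVING threshold of 2 keeps only the en group; an unlisted column name raises before any SQL is built.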

sqlalchemy using INTERSECT and UNNEST

I'm trying to translate a raw SQL to sqlalchemy core/orm but I'm having some difficulties. Here is the SQL:
SELECT
(SELECT UNNEST(MyTable.my_array_column)
INTERSECT
SELECT UNNEST(ARRAY['VAL1', 'VAL2']::varchar[])) AS matched
FROM
MyTable
WHERE
my_array_column && ARRAY['VAL1', 'VAL2']::varchar[];
The following query gives me a FROM clause that I don't need in my nested SELECT:
matched = select([func.unnest(MyTable.my_array_column)]).intersect(select([func.unnest('VAL1', 'VAL2')]))
# SELECT unnest(MyTable.my_array_colum) AS unnest_1
# FROM MyTable INTERSECT SELECT unnest(%(unnest_3)s, %(unnest_4)s) AS unnest_2
How can I tell the select to not include the FROM clause? Note that func.unnest() only accepts a column. So I cannot use func.unnest('my_array_column').
Referring to a table of an enclosing query from within a subquery is known as correlation, which SQLAlchemy attempts to do automatically. In this case it doesn't quite work, I believe, because your INTERSECT query is a "selectable" rather than a scalar value, so SQLAlchemy puts it in the FROM list instead of the SELECT list.
The solution is twofold. We need to make SQLAlchemy put the INTERSECT query in the SELECT list by applying a label, and make it correlate MyTable correctly:
select([
select([func.unnest(MyTable.my_array_column)]).correlate(MyTable)
.intersect(select([func.unnest('VAL1', 'VAL2')]))
.label("matched")
]).select_from(MyTable)
# SELECT (SELECT unnest("MyTable".my_array_column) AS unnest_1 INTERSECT SELECT unnest(%(unnest_3)s, %(unnest_4)s) AS unnest_2) AS matched
# FROM "MyTable"

SQLAlchemy: filter on operator in a many-to-many relationship

I have two classes with a many-to-many relationship, Items and Categories.
Categories have an associated value.
I would like to query for all Items for which the highest Categories.value (if there is any) is less than a given value.
So far I have tried queries like this:
from sqlalchemy.sql import functions
Session.query(Items).join(Categories,Items.categories).filter(functions.max(Categories.value)<3.14).all()
But in this case I get a (OperationalError) misuse of aggregate function max() error.
Is there a way to make this query?
You need GROUP BY and HAVING instead of just WHERE for filtering on an aggregate.
Session.query(Items).join(Items.categories).group_by(Items.id).having(functions.max(Categories.value)<3.14).all()
Edit: To also include Items without any category, I believe you can do an outer join and put an OR in the HAVING clause:
Session.query(Items).outerjoin(Items.categories).group_by(Items.id)\
.having( (functions.max(Categories.value)<3.14) | (functions.count(Categories.id)==0) )\
.all()
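A self-contained sketch of the outer-join version, runnable against SQLite. The Items/Categories class names come from the question; the association table, the relationship setup and the sample data are invented, and func.max/func.count are equivalent to the functions.max/functions.count used above:

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, Table, create_engine, func
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

# Hypothetical association table for the many-to-many link.
item_category = Table(
    "item_category", Base.metadata,
    Column("item_id", ForeignKey("items.id"), primary_key=True),
    Column("category_id", ForeignKey("categories.id"), primary_key=True),
)

class Items(Base):
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    categories = relationship("Categories", secondary=item_category)

class Categories(Base):
    __tablename__ = "categories"
    id = Column(Integer, primary_key=True)
    value = Column(Float)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    low = Categories(value=1.0)
    high = Categories(value=9.0)
    session.add_all([
        Items(categories=[low]),        # max value 1.0  -> kept
        Items(categories=[low, high]),  # max value 9.0  -> filtered out
        Items(),                        # no categories  -> kept by the OR branch
    ])
    session.commit()

    items = (
        session.query(Items)
        .outerjoin(Items.categories)
        .group_by(Items.id)
        .having(
            (func.max(Categories.value) < 3.14)
            | (func.count(Categories.id) == 0)
        )
        .all()
    )
    kept_ids = sorted(i.id for i in items)
```

The COUNT branch is what rescues the category-less item: its outer join produces only NULL category values, so MAX is NULL and the comparison alone would discard it.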
