Group by & count function in sqlalchemy

I want a "group by and count" command in sqlalchemy. How can I do this?

The documentation on counting says that for group_by queries it is better to use func.count():
from sqlalchemy import func
session.query(Table.column, func.count(Table.column)).group_by(Table.column).all()

If you are using Table.query property:
from sqlalchemy import func
Table.query.with_entities(Table.column, func.count(Table.column)).group_by(Table.column).all()
If you are using session.query() method (as stated in miniwark's answer):
from sqlalchemy import func
session.query(Table.column, func.count(Table.column)).group_by(Table.column).all()

You can also group by several columns at once and count each combination:
self.session.query(func.count(Table.column1), Table.column1, Table.column2).group_by(Table.column1, Table.column2).all()
The query above returns one count for every combination of column1 and column2 values that actually occurs in the table.
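To make the two grouping forms above concrete, here is a minimal sketch against an in-memory SQLite database. The Pet model and its data are made up for illustration, and the imports assume SQLAlchemy 1.4 or later:

```python
from sqlalchemy import Column, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Pet(Base):  # hypothetical model, not from the original question
    __tablename__ = "pets"
    id = Column(Integer, primary_key=True)
    species = Column(String)
    color = Column(String)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Pet(species="cat", color="black"),
        Pet(species="cat", color="black"),
        Pet(species="cat", color="white"),
        Pet(species="dog", color="black"),
    ])
    session.commit()

    # One row per species: (species, count)
    by_species = (
        session.query(Pet.species, func.count(Pet.id))
        .group_by(Pet.species)
        .order_by(Pet.species)
        .all()
    )

    # One row per (species, color) pair that occurs in the data
    by_pair = (
        session.query(Pet.species, Pet.color, func.count(Pet.id))
        .group_by(Pet.species, Pet.color)
        .order_by(Pet.species, Pet.color)
        .all()
    )

counts_by_species = [tuple(row) for row in by_species]
counts_by_pair = [tuple(row) for row in by_pair]
```

Note that there is no ("dog", "white") row in counts_by_pair: GROUP BY only produces groups for value pairs that exist in the table.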

Related

In SQLAlchemy, is there a way to eager-load multiple aliased selectables of the same class using one query?

I have an SQLAlchemy mapped class MyClass, and two aliases for it. I can eager-load a relationship MyClass.relationship on each alias separately using selectinload() like so:
alias_1, alias_2 = aliased(MyClass), aliased(MyClass)
q = session.query(alias_1, alias_2).options(
    selectinload(alias_1.relationship),
    selectinload(alias_2.relationship))
However, this results in 2 separate SQL queries on MyClass.relationship (in addition to the main query on MyClass, but this is irrelevant to the question). Since these 2 queries on MyClass.relationship are to the same table, I think that it should be possible to merge the primary keys generated within the IN clause in these queries, and just run 1 query on MyClass.relationship.
My best guess for how to do this is:
alias_1, alias_2 = aliased(MyClass), aliased(MyClass)
q = session.query(alias_1, alias_2).options(
    selectinload(MyClass.relationship))
But it clearly didn't work:
sqlalchemy.exc.ArgumentError: Mapped attribute "MyClass.relationship" does not apply to any of the root entities in this query, e.g. aliased(MyClass), aliased(MyClass). Please specify the full path from one of the root entities to the target attribute.
Is there a way to do this in SQLAlchemy?

This is exactly the same issue we had, and the docs explain how to handle it.
You need to add selectin_polymorphic. If you are using with_polymorphic in your select, remove it.
from sqlalchemy.orm import selectin_polymorphic
query = session.query(MyClass).options(
    selectin_polymorphic(MyClass, [alias_1, alias_2]),
    selectinload(MyClass.relationship)
)

How to sort sqlalchemy table with python function?

I have a table with a column equation, in which I store mathematical equations like x+12-25+z**2, x*z-12, etc. Each row holds a different equation, and I want to sort my table by each equation's output. X and Z are Python variables; you can think of them as numpy.array. The variables are updated every 15-30 minutes.
My table looks like this:
class Table(Base):
    ....
    equation = Column(String(1024))
I evaluate the equations in Python with my function calculate_equation(string), which substitutes all the variables, performs the math operations, and returns a single number.
I tried a hybrid_property, which looks like this:
@hybrid_property
def equation_value(self):
    return calculate_equation(self.equation)
And sort it with:
session.query(Table).order_by(Table.equation_value).all()
But it throws errors.
Any advice on how to do it? Is this even the right approach? Should I use a different data storage mechanism?
Suggestions are appreciated.

The .order_by() method emits an SQL ORDER BY clause, so the thing you're ordering by needs to be an SQL expression, not an arbitrary Python function. Just tagging the method with @hybrid_property isn't enough; you also need to implement @equation_value.expression as an SQLAlchemy expression.
You have two options:
1. Implement your calculate_equation function in SQL (or as an SQLAlchemy expression/hybrid property), which will allow you to use it in an ORDER BY clause. From your description, this is probably very difficult.
2. Just query for everything and do the sorting afterwards in memory. The sort key has to pass each row's equation string to your function:
sorted(session.query(Table).all(), key=lambda row: calculate_equation(row.equation))
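The in-memory option can be sketched end to end as follows. The Formula model, the VARIABLES values, and the body of calculate_equation are all stand-ins I made up (the asker's real function is not shown), and eval() appears only to keep the sketch short:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Formula(Base):  # hypothetical stand-in for the question's Table
    __tablename__ = "formulas"
    id = Column(Integer, primary_key=True)
    equation = Column(String(1024))


# Hypothetical variable values; the question updates these periodically.
VARIABLES = {"x": 2, "z": 3}


def calculate_equation(equation):
    # Stand-in for the asker's function. eval() is used only for brevity;
    # do not use it on untrusted input.
    return eval(equation, {"__builtins__": {}}, dict(VARIABLES))


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Formula(equation="x+12-25+z**2"),  # 2 + 12 - 25 + 9 = -2
        Formula(equation="x*z-12"),        # 6 - 12 = -6
        Formula(equation="x+z"),           # 2 + 3 = 5
    ])
    session.commit()

    rows = session.query(Formula).all()
    # Sort in Python: the key reads the stored string from each row.
    rows.sort(key=lambda row: calculate_equation(row.equation))
    ordered = [row.equation for row in rows]
```

Because the sort happens after the query, it always uses the current variable values, at the cost of loading every row into memory.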

Python SQLAlchemy Query using labeled OVER clause with ORM

This other question says how to use the OVER clause on sqlalchemy:
Using the OVER window function in SQLAlchemy
But how to do that using ORM? I have something like:
q = self.session.query(self.entity, func.count().over().label('count_over'))
This fails when I call q.all() with the following message:
sqlalchemy.exc.InvalidRequestError:
Ambiguous column name 'count(*) OVER ()' in result set! try 'use_labels' option on select statement
How can I solve this?

You have the over syntax almost correct; it should be something like this:
import sqlalchemy
q = self.session.query(
    self.entity,
    sqlalchemy.over(func.count()).label('count_over'),
)
Example from the docs:
from sqlalchemy import over
over(func.row_number(), order_by='x')

SQLAlchemy Query object has with_entities method that can be used to customize the list of columns the query returns:
Model.query.with_entities(Model.foo, func.count().over().label('count_over'))
Resulting in following SQL:
SELECT models.foo AS models_foo, count(*) OVER () AS count_over FROM models

You got the functions right. The way to use them to produce the desired result would be as follows:
from sqlalchemy import func
q = self.session.query(self.entity, func.count(self.entity).over().label('count_over'))
This will produce a COUNT(*) statement since no Entity.field was specified. I use the following format:
from myschema import MyEntity
from sqlalchemy import func
q = self.session.query(MyEntity, func.count(MyEntity.id).over().label('count'))
That is if there is an id field, of course. But you get the mechanics :-)
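As a rough end-to-end check of the labeled window count, here is a sketch using a made-up Item model on in-memory SQLite. It assumes SQLAlchemy 1.4 or later and an SQLite build with window function support (3.25 or newer):

```python
from sqlalchemy import Column, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Item(Base):  # hypothetical entity, standing in for self.entity
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    name = Column(String)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Item(name=n) for n in ("a", "b", "c")])
    session.commit()

    q = (
        session.query(Item, func.count(Item.id).over().label("count_over"))
        .order_by(Item.id)
        .limit(2)
    )
    # Each result row is (Item instance, total row count before LIMIT).
    page = [(item.name, total) for item, total in q.all()]
```

The window count is computed before LIMIT is applied, which is why this pattern is often used to fetch one page of rows together with the total.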

Setting a default value in sqlalchemy

I would like to set a column default value that is based on another table in my SQLAlchemy model.
Currently I have this:
Column('version', Integer, default=1)
What I need is (roughly) this:
Column('version', Integer, default="SELECT MAX(1, MAX(old_versions)) FROM version_table")
How can I implement this in SQLAlchemy?

The documentation gives the following possibilities for default:
A scalar, Python callable, or ClauseElement representing the default
value for this column, which will be invoked upon insert if this
column is otherwise not specified in the VALUES clause of the insert.
You may look into using a simple function, or you may just be able to use a select() object.
In your case, maybe something along the lines of:
from sqlalchemy.sql import select, func
...
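Column('version', Integer, default=select([func.max(1, func.max(version_table.c.old_versions))]))
Note that on SQLAlchemy 1.4 and later, select() takes the columns as positional arguments rather than as a list.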

You want server_default:
Column('version', Integer, server_default="SELECT MAX(1, MAX(old_versions)) FROM version_table")

If you want to use a DML statement to generate the default value, you can simply use the text function to indicate that you are passing DML. You may also need an extra set of parentheses if the engine writes this inside a VALUES clause, e.g.:
from sqlalchemy import text
Column('version', Integer, default=text("(SELECT MAX(1, MAX(old_versions)) FROM version_table)"))
I've used this technique to use a sequence to override the server default ID generation, e.g.:
Column('version', Integer, default=text("NEXT VALUE FOR someSequence"))
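For comparison, here is a sketch of the select()-based default using the SQLAlchemy 1.4+ calling style on in-memory SQLite. The documents table is hypothetical, and I use coalesce(max(old_versions), 1) instead of the question's MAX(1, MAX(old_versions)), because SQLite's two-argument max returns NULL when the table is empty; treat that substitution as my assumption, not the asker's requirement:

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, func, select,
)

metadata = MetaData()

version_table = Table(
    "version_table", metadata,
    Column("id", Integer, primary_key=True),
    Column("old_versions", Integer),
)

documents = Table(  # hypothetical table that owns the 'version' column
    "documents", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column(
        "version", Integer,
        # Rendered inline in the INSERT when no value is supplied.
        default=select(
            func.coalesce(func.max(version_table.c.old_versions), 1)
        ).scalar_subquery(),
    ),
)

engine = create_engine("sqlite://")  # in-memory database
metadata.create_all(engine)

with engine.begin() as conn:
    # version_table is still empty, so the default falls back to 1.
    conn.execute(documents.insert().values(name="first"))
    conn.execute(
        version_table.insert(), [{"old_versions": 3}, {"old_versions": 7}]
    )
    # Now the subquery sees max(old_versions) = 7.
    conn.execute(documents.insert().values(name="second"))
    result = conn.execute(
        select(documents.c.name, documents.c.version).order_by(documents.c.id)
    ).all()

versions = [tuple(row) for row in result]
```

This is a client-side default: SQLAlchemy adds the subquery to each INSERT it emits, so rows inserted by other tools will not get it.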

How to use avg and sum in SQLAlchemy query

I'm trying to return a totals/averages row from my dataset which contains the SUM of certain fields and the AVG of others.
I could do this in SQL via:
SELECT SUM(field1) as SumFld, AVG(field2) as AvgFld
FROM Rating WHERE url=[url_string]
My attempt to translate this into SQLAlchemy is as follows:
totals = Rating.query(func.avg(Rating.field2)).filter(Rating.url==url_string.netloc)
But this is erroring out with:
TypeError: 'BaseQuery' object is not callable

You should use something like:
from sqlalchemy.sql import func
session.query(func.avg(Rating.field2).label('average')).filter(Rating.url==url_string.netloc)
You cannot use MyObject.query here, because SQLAlchemy tries to find a field to put the result of the avg function into, and it fails.

"You cannot use MyObject.query here, because SQLAlchemy tries to find a field to put the result of the avg function into, and it fails."
This isn't exactly true. func.avg(Rating.field2).label('average') returns a labeled column expression, so you can pass it to the with_entities method of the query object.
This is how you would do it for your example:
Rating.query.with_entities(func.avg(Rating.field2).label('average')).filter(Rating.url == url_string.netloc)

attention = (
    Attention_scores.query
    .with_entities(func.avg(Attention_scores.score))
    .filter(Attention_scores.classroom_number == classroom_number)
    .all()
)
I tried it like this and it gave the correct average.
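To cover the SUM part of the original question as well, both aggregates can go into one query. This is a minimal sketch on in-memory SQLite; the Rating columns follow the question's SQL, while the sample rows and the "example.com" filter value are made up:

```python
from sqlalchemy import Column, Float, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Rating(Base):  # column names taken from the question's SQL
    __tablename__ = "rating"
    id = Column(Integer, primary_key=True)
    url = Column(String)
    field1 = Column(Integer)
    field2 = Column(Float)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Rating(url="example.com", field1=1, field2=2.0),
        Rating(url="example.com", field1=3, field2=4.0),
        Rating(url="other.org", field1=10, field2=10.0),
    ])
    session.commit()

    # SELECT SUM(field1) AS SumFld, AVG(field2) AS AvgFld ... WHERE url = ?
    totals = (
        session.query(
            func.sum(Rating.field1).label("SumFld"),
            func.avg(Rating.field2).label("AvgFld"),
        )
        .filter(Rating.url == "example.com")
        .one()
    )
    sum_fld, avg_fld = totals.SumFld, totals.AvgFld
```

The labels become attribute names on the result row, which mirrors the AS aliases in the raw SQL.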
