Using window functions to LIMIT a query with SqlAlchemy on Postgres - python

I'm trying to write the following sql query with sqlalchemy ORM:
SELECT * FROM
(SELECT *, row_number() OVER(w)
FROM (select distinct on (grandma_id, author_id) * from contents) as c
WINDOW w AS (PARTITION BY grandma_id ORDER BY RANDOM())) AS v1
WHERE row_number <= 4;
This is what I've done so far:
s = Session()
unique_users_contents = (s.query(Content).distinct(Content.grandma_id,
                                                   Content.author_id)
                         .subquery())
windowed_contents = (s.query(Content,
                             func.row_number()
                             .over(partition_by=Content.grandma_id,
                                   order_by=func.random()))
                     .select_from(unique_users_contents)).subquery()
contents = (s.query(Content).select_from(windowed_contents)
            .filter(row_number <= 4))  ## how can I reference the row_number() value?
result = contents
for content in result:
    print "%s\t%s\t%s" % (content.id, content.grandma_id,
                          content.author_id)
As you can see, it's pretty much modeled, but I have no idea how to reference the row_number() result of the subquery from the outer query's WHERE clause. I tried something like windowed_contents.c.row_number and adding a label() call on the window function, but it's not working, and I couldn't find a similar example in the official docs or on Stack Overflow.
How can this be accomplished? And also, could you suggest a better way to do this query?

Referencing windowed_contents.c.row_number after giving the window function a label() is how you'd do it, and it works for me. Note that the select_entity_from() method is new in SQLAlchemy 0.8.2 and is needed here in 0.9 in place of select_from():
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Content(Base):
    __tablename__ = 'contents'
    grandma_id = Column(Integer, primary_key=True)
    author_id = Column(Integer, primary_key=True)
s = Session()
unique_users_contents = s.query(Content).distinct(
    Content.grandma_id, Content.author_id).\
    subquery('c')
q = s.query(
    Content,
    func.row_number().over(
        partition_by=Content.grandma_id,
        order_by=func.random()).label("row_number")
).select_entity_from(unique_users_contents).subquery()
q = s.query(Content).select_entity_from(q).filter(q.c.row_number <= 4)
print(q)
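For readers on SQLAlchemy 1.4 or later, the same label-then-filter idea can be sketched with select() and aliased() instead of select_entity_from(). This is only a minimal sketch, not the answer's original code: it assumes an in-memory SQLite database with window-function support (SQLite 3.25+), adds a hypothetical id primary key, and leaves out the Postgres-only DISTINCT ON step.

```python
from collections import Counter

from sqlalchemy import Column, Integer, create_engine, func, select
from sqlalchemy.orm import Session, aliased, declarative_base

Base = declarative_base()


class Content(Base):
    __tablename__ = "contents"
    id = Column(Integer, primary_key=True)  # hypothetical surrogate key
    grandma_id = Column(Integer, nullable=False)
    author_id = Column(Integer, nullable=False)


engine = create_engine("sqlite:///:memory:")  # assumption: SQLite for the demo
Base.metadata.create_all(engine)

# Number the rows of each grandma_id partition in random order and
# expose the number under the label "row_number".
ranked = select(
    Content,
    func.row_number()
    .over(partition_by=Content.grandma_id, order_by=func.random())
    .label("row_number"),
).subquery()

# Map Content onto the subquery so the outer query returns Content objects,
# then filter on the labeled column through the subquery's .c collection.
ranked_content = aliased(Content, ranked)
stmt = select(ranked_content).where(ranked.c.row_number <= 4)

with Session(engine) as session:
    # Two grandmas with six rows each.
    session.add_all(
        [Content(grandma_id=g, author_id=a) for g in (1, 2) for a in range(6)]
    )
    session.commit()
    rows = session.execute(stmt).scalars().all()
    per_grandma = Counter(row.grandma_id for row in rows)

print(dict(per_grandma))
```

Which four rows come back for each grandma_id varies between runs because of the random ordering, but no partition contributes more than four.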

Related

sqlalchemy orm using select where with parameters

I am trying something really simple, but I cannot find the proper way to do it in any of the sqlalchemy orm tutorials I can find. I want to do the equivalent of the following from Adonisjs:
Database.query('SELECT * FROM tbl WHERE user = ? AND age = ?', ['Tester', 18])
How do I do parameters in the below sqlalchemy python code? What am I doing wrong?
from sqlalchemy import create_engine, select
from sqlalchemy.orm import Session
engine = create_engine("postgresql+psycopg2://test:test@localhost:5432/test", echo=False, future=True)
session = Session(engine)
sql = select(User).where(User.first_name == 'Tester').where(User.age == 18)
user = session.execute(sql)
So instead of User.first_name == 'Tester', I'd like it to be a binding placeholder. The same goes for User.age == 18. Then in session.execute(sql) I'd like to add the bindings. Is there a way to do this, or am I approaching this the wrong way? I want to use the ORM, so the syntax above. I'm trying to learn the newest sqlalchemy with the ORM instead of Core.
As far as I know, bind parameters like the ones in your qmark style query are only available on text based queries like a TextClause.
ORM and textual queries are compatible via Select.from_statement.
import sqlalchemy as sa
from sqlalchemy import orm
Base = orm.declarative_base()
class User(Base):
    __tablename__ = "user"
    id = sa.Column(sa.Integer, primary_key=True)
    first_name = sa.Column(sa.String)
    age = sa.Column(sa.Integer)
    def __repr__(self):
        return f"User(first_name={self.first_name}, age={self.age})"
engine = sa.create_engine("sqlite:///:memory:", echo=True, future=True)
Base.metadata.create_all(engine)
u1 = User(first_name="Alice", age=21)
u2 = User(first_name="Bob", age=20)
session = orm.Session(engine)
session.add_all([u1, u2])
session.flush()
# textual SQL with named bind parameters, loaded as User objects
stmt = sa.select(User).from_statement(
    sa.text("SELECT * FROM user WHERE first_name = :fn AND age = :age")
)
session.execute(stmt, {"fn": "Alice", "age": 21}).scalars().one()
# plain ORM query; SQLAlchemy sends the literal values as bound parameters
stmt = sa.select(User).where(User.first_name == "Alice", User.age == 21)
session.execute(stmt).scalars().one()
# or with variables
fn = "Alice"
age = 21
stmt = sa.select(User).where(User.first_name == fn, User.age == age)
session.execute(stmt).scalars().one()
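If the goal is to keep the ORM select() construct but supply the values only at execution time, one option worth trying is sqlalchemy.bindparam(), which places a named placeholder inside an expression. The sketch below assumes the same User model on an in-memory SQLite database; the parameter names fn and user_age are arbitrary choices for illustration.

```python
import sqlalchemy as sa
from sqlalchemy import orm

Base = orm.declarative_base()


class User(Base):
    __tablename__ = "user"
    id = sa.Column(sa.Integer, primary_key=True)
    first_name = sa.Column(sa.String)
    age = sa.Column(sa.Integer)


engine = sa.create_engine("sqlite:///:memory:")  # assumption: SQLite for the demo
Base.metadata.create_all(engine)

# The statement is built once with named placeholders instead of literal values.
stmt = sa.select(User).where(
    User.first_name == sa.bindparam("fn"),
    User.age == sa.bindparam("user_age"),
)

with orm.Session(engine) as session:
    session.add_all(
        [User(first_name="Alice", age=21), User(first_name="Bob", age=20)]
    )
    session.commit()
    # The values are passed as a dictionary when the statement is executed.
    alice = session.execute(stmt, {"fn": "Alice", "user_age": 21}).scalars().one()
    bob = session.execute(stmt, {"fn": "Bob", "user_age": 20}).scalars().one()
    names = (alice.first_name, bob.first_name)

print(names)
```

The same statement object is reused for both lookups; only the parameter dictionary changes.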

SQLAlchemy best way to filter a table based on values from another table

I apologize in advance if my question is banal: I am a total beginner of SQL.
I want to create a simple database, with two tables: Students and Answers.
Basically, each student will answer three questions (possible answers are True or False for each question), and their answers will be stored in the Answers table.
Students can have two "experience" levels: "Undergraduate" and "Graduate".
What is the best way to obtain all Answers that were given by Students with "Graduate" experience level?
This is how I define SQLAlchemy classes for entries in Students and Answers tables:
import random
from sqlalchemy import create_engine
from sqlalchemy import Column, Integer, String, Date, Boolean, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship
db_uri = "sqlite:///simple_answers.db"
db_engine = create_engine(db_uri)
db_connect = db_engine.connect()
Session = sessionmaker()
Session.configure(bind=db_engine)
db_session = Session()
Base = declarative_base()
class Student(Base):
    __tablename__ = "Students"
    id = Column(Integer, primary_key=True)
    experience = Column(String, nullable=False)
class Answer(Base):
    __tablename__ = "Answers"
    id = Column(Integer, primary_key=True)
    student_id = Column(Integer, ForeignKey("Students.id"), nullable=False)
    answer = Column(Boolean, nullable=False)
Base.metadata.create_all(db_connect)
Then, I insert some random entries in the database:
categories_experience = ["Undergraduate", "Graduate"]
categories_answer = [True, False]
n_students = 20
n_answers_by_each_student = 3
random.seed(1)
for _ in range(n_students):
    student = Student(experience=random.choice(categories_experience))
    db_session.add(student)
    db_session.commit()
    answers = [Answer(student_id=student.id, answer=random.choice(categories_answer))
               for _ in range(n_answers_by_each_student)]
    db_session.add_all(answers)
    db_session.commit()
Then, I obtain Student.id of all "Graduate" students:
ids_graduates = db_session.query(Student.id).filter(Student.experience == "Graduate").all()
ids_graduates = [result.id for result in ids_graduates]
And finally, I select Answers from "Graduate" Students using .in_ operator:
answers_graduates = db_session.query(Answer).filter(Answer.student_id.in_(ids_graduates)).all()
I manually checked the answers, and they are right. But, since I am a total beginner of SQL, I suspect that there is some better way to achieve the same result.
Is there such an objectively "best" way (more Pythonic, more efficient...)? I would like to achieve my result with SQLAlchemy, possibly using the ORM interface.
When I asked the question, I was in a hurry.
Since then, I have had the time to study SQLAlchemy ORM documentation.
There are two recommended ways to filter tables based on values from another table.
The first way is actually very similar to what I had originally tried:
query_graduates = (
    db_session
    .query(Student.id)
    .filter(Student.experience == "Graduate")
)
query_answers_graduates = (
    db_session
    .query(Answer)
    .filter(Answer.student_id.in_(query_graduates))
)
answers_graduates = query_answers_graduates.all()
It uses the .in_ operator, which accepts either a list of values or another query as its argument.
The second way uses the .join method:
query_answers_graduates = (
    db_session
    .query(Answer)
    .join(Student)
    .filter(Student.experience == "Graduate")
)
The second approach is more concise. I timed both solutions, and the second approach, which uses .join, is slightly faster.
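To make the comparison concrete, here is a small self-contained sketch of both approaches written in the select() style of SQLAlchemy 1.4+. It assumes the Student and Answer models from the question and an in-memory SQLite database seeded with three made-up answers; it is meant as an illustration rather than a benchmark.

```python
from sqlalchemy import (
    Boolean, Column, ForeignKey, Integer, String, create_engine, select,
)
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Student(Base):
    __tablename__ = "Students"
    id = Column(Integer, primary_key=True)
    experience = Column(String, nullable=False)


class Answer(Base):
    __tablename__ = "Answers"
    id = Column(Integer, primary_key=True)
    student_id = Column(Integer, ForeignKey("Students.id"), nullable=False)
    answer = Column(Boolean, nullable=False)


engine = create_engine("sqlite:///:memory:")  # assumption: SQLite for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    grad = Student(experience="Graduate")
    undergrad = Student(experience="Undergraduate")
    session.add_all([grad, undergrad])
    session.flush()  # assigns primary keys so they can be used below
    session.add_all([
        Answer(student_id=grad.id, answer=True),
        Answer(student_id=grad.id, answer=False),
        Answer(student_id=undergrad.id, answer=True),
    ])
    session.commit()

    # 1) IN with a subquery: the id lookup stays inside the database.
    graduate_ids = select(Student.id).where(Student.experience == "Graduate")
    via_in = select(Answer).where(Answer.student_id.in_(graduate_ids))

    # 2) JOIN: the ON clause is inferred from the foreign key.
    via_join = (
        select(Answer).join(Student).where(Student.experience == "Graduate")
    )

    ids_in = sorted(a.id for a in session.execute(via_in).scalars())
    ids_join = sorted(a.id for a in session.execute(via_join).scalars())

print(ids_in, ids_join)
```

Both statements select the same two answers here. Passing the query itself to in_(), rather than a Python list of ids, avoids a separate round trip to fetch the ids first.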
You mention SQL, but I'm not sure whether you want to do this particular step in Python or in SQL. If SQL, something like this could work:
select * from Students s
inner join Answers a on s.id = a.student_id
where s.experience = 'Graduate';
Updated code
I have never used SQLAlchemy before but something similar to this may work...
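from sqlalchemy import text
# wrap raw SQL in text(); plain strings are rejected by SQLAlchemy 2.0
sql = text("""select s.id, a.answer from Students s
inner join Answers a on s.id = a.student_id
where s.experience = 'Graduate';""")
with db_session as con:
    rows = con.execute(sql)
    for row in rows:
        print(row)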

SQLAlchemy Nested CTE Query

The sqlalchemy core query builder appears to unnest and relocate CTE queries to the "top" of the compiled sql.
I'm converting an existing Postgres query that selects deeply joined data as a single JSON object. The syntax is pretty contrived but it significantly reduces network overhead for large queries. The goal is to build the query dynamically using the sqlalchemy core query builder.
Here's a minimal working example of a nested CTE
with res_cte as (
    select
        account_0.name acct_name,
        (
            with offer_cte as (
                select
                    offer_0.id
                from
                    offer offer_0
                where
                    offer_0.account_id = account_0.id
            )
            select
                array_agg(offer_cte.id)
            from
                offer_cte
        ) as offer_arr
    from
        account account_0
)
select
    acct_name::text, offer_arr::text
from res_cte
Result
acct_name, offer_arr
---------------------
oliver, null
rachel, {3}
buddy, {4,5}
My (presumably incorrect) use of the core query builder unnests offer_cte, which results in every offer.id being associated with every acct_name in the result.
There's no need to re-implement this exact query in an answer, any example that results in a similarly nested CTE would be perfect.
I just implemented the nesting cte feature. It should land with 1.4.24 release.
Pull request: https://github.com/sqlalchemy/sqlalchemy/pull/6709
import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base
# Model declaration
Base = declarative_base()
class Offer(Base):
    __tablename__ = "offer"
    id = sa.Column(sa.Integer, primary_key=True)
    account_id = sa.Column(sa.Integer, nullable=False)
class Account(Base):
    __tablename__ = "account"
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.TEXT, nullable=False)
# Query construction
account_0 = sa.orm.aliased(Account)
# Watch the nesting keyword set to True
offer_cte = (
sa.select(Offer.id)
.where(Offer.account_id == account_0.id)
.select_from(Offer)
.correlate(account_0).cte("offer_cte", nesting=True)
)
offer_arr = sa.select(sa.func.array_agg(offer_cte.c.id).label("offer_arr"))
res_cte = sa.select(
account_0.name.label("acct_name"),
offer_arr.scalar_subquery().label("offer_arr"),
).cte("res_cte")
final_query = sa.select(
sa.cast(res_cte.c.acct_name, sa.TEXT),
sa.cast(res_cte.c.offer_arr, sa.TEXT),
)
It constructs this query that returns the result you expect:
WITH res_cte AS
(
    SELECT
        account_1.name AS acct_name
        , (
            WITH offer_cte AS
            (
                SELECT
                    offer.id AS id
                FROM
                    offer
                WHERE
                    offer.account_id = account_1.id
            )
            SELECT
                array_agg(offer_cte.id) AS offer_arr
            FROM
                offer_cte
        ) AS offer_arr
    FROM
        account AS account_1
)
SELECT
    CAST(res_cte.acct_name AS TEXT) AS acct_name
    , CAST(res_cte.offer_arr AS TEXT) AS offer_arr
FROM
    res_cte

Insert and update with core SQLAlchemy

I have a database that I don't have metadata or orm classes for (the database already exists).
I managed to get the select stuff working by:
from sqlalchemy.sql.expression import ColumnClause
from sqlalchemy.sql import table, column, select, update, insert
from sqlalchemy.ext.declarative import *
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine
import pyodbc
db = create_engine('mssql+pyodbc://pytest')
Session = sessionmaker(bind=db)
session = Session()
list = []
list.append (column("field1"))
list.append (column("field2"))
list.append (column("field3"))
s = select(list)
s.append_from('table')
s.append_whereclause("field1 = 'abc'")
s = s.limit(10)
result = session.execute(s)
out = result.fetchall()
print(out)
So far so good.
The only way I can get an update/insert working is by executing a raw query like:
session.execute(<Some sql>)
I would like to make it so I can make a class out of that like:
u = Update("table")
u.Set("field1", "some value")
u.Where(<some condition>)
session.execute(u)
Tried (this is just one of the approaches I tried):
i = insert("table")
v = i.values([{"name":"name1"}, {"name":"name2"}])
u = update("table")
u = u.values({"name": "test1"})
I can't get that to execute on:
session.execute(i)
or
session.execute(u)
Any suggestion how to construct an insert or update without writing ORM models?
As you can see from the SQLAlchemy Overview documentation, sqlalchemy is built with two layers: ORM and Core. Currently you are using only some constructs of the Core and building everything manually.
In order to use Core you should let SQLAlchemy know some meta information about your database in order for it to operate on it. Assuming you have a table mytable with columns field1, field2, field3 and a defined primary key, the code below should perform all the tasks you need:
from sqlalchemy import MetaData, Table
from sqlalchemy.sql import select, update, insert
# define meta information by reflecting the existing table;
# `db` is the engine created in the question
metadata = MetaData()
mytable = Table('mytable', metadata, autoload_with=db)
# select
s = mytable.select() # or:
#s = select([mytable]) # or (if only certain columns):
#s = select([mytable.c.field1, mytable.c.field2, mytable.c.field3])
s = s.where(mytable.c.field1 == 'abc')
result = session.execute(s)
out = result.fetchall()
print(out)
# insert
i = insert(mytable)
i = i.values({"field1": "value1", "field2": "value2"})
session.execute(i)
# update
u = update(mytable)
u = u.values({"field3": "new_value"})
u = u.where(mytable.c.id == 33)
session.execute(u)
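The reflect-then-operate flow can be tried end to end without an existing server. This sketch assumes an in-memory SQLite database and creates a stand-in mytable with raw DDL first, so that there is something to reflect. It uses engine.begin() so each block commits on exit.

```python
from sqlalchemy import MetaData, Table, create_engine, insert, select, text, update

engine = create_engine("sqlite:///:memory:")  # assumption: SQLite for the demo

# Stand-in for the pre-existing table that has no Python model.
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE mytable ("
        "id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT, field3 TEXT)"
    ))

# Reflect column definitions from the database instead of declaring them.
metadata = MetaData()
mytable = Table("mytable", metadata, autoload_with=engine)

with engine.begin() as conn:
    # insert
    conn.execute(insert(mytable).values(field1="abc", field2="value2"))
    # update
    conn.execute(
        update(mytable)
        .where(mytable.c.field1 == "abc")
        .values(field3="new_value")
    )
    # select
    result = conn.execute(select(mytable.c.field1, mytable.c.field3))
    rows = [tuple(row) for row in result]

print(rows)
```

With a real database, only the create_engine() URL changes and the CREATE TABLE step is dropped.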

Converting SQL commands to Python's ORM

How would you convert the following code to use a Python ORM such as SQLAlchemy?
#1 Putting data to Pg
import os, pg, sys, re, psycopg2
#conn = psycopg2.connect("dbname='tkk' host='localhost' port='5432' user='noa' password='123'")
conn = psycopg2.connect("dbname=tk user=naa password=123")
cur = conn.cursor()
cur.execute("""INSERT INTO courses (course_nro)
VALUES ( %(course_nro)s )""", dict(course_nro='abcd'))
conn.commit()
#2 Fetching
cur.execute("SELECT * FROM courses")
print cur.fetchall()
Examples about the two commands in SQLalchemy
insert
sqlalchemy.sql.expression.insert(table, values=None, inline=False, **kwargs)
select
sqlalchemy.sql.expression.select(columns=None, whereclause=None, from_obj=[], **kwargs)
After the initial declarations, you can do something like this:
o = Course(course_nro='abcd')
session.add(o)
session.commit()
and
print session.query(Course).all()
The declarations could look something like this:
from sqlalchemy import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
# create an engine, and a base class
engine = create_engine('postgresql://naa:123@localhost/tk')
DeclarativeBase = declarative_base()
metadata = DeclarativeBase.metadata
# create a session bound to the engine
Session = sessionmaker(bind=engine)
session = Session()
# declare the models
class Course(DeclarativeBase):
    __tablename__ = 'courses'
    # the ORM needs a primary key to map a class; treating course_nro
    # as the key is an assumption about the table
    course_nro = Column('course_nro', CHAR(12), primary_key=True)
This declarative method is just one way of using sqlalchemy.
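For comparison, here is how the same insert and fetch might look with the SQLAlchemy 1.4+ API. It is a sketch under stated assumptions: an in-memory SQLite database stands in for the Postgres server, and the id column is a hypothetical primary key that the original courses table may not have.

```python
from sqlalchemy import CHAR, Column, Integer, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Course(Base):
    __tablename__ = "courses"
    id = Column(Integer, primary_key=True)  # hypothetical primary key
    course_nro = Column(CHAR(12))


engine = create_engine("sqlite:///:memory:")  # assumption: SQLite for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    # 1) INSERT INTO courses (course_nro) VALUES ('abcd')
    session.add(Course(course_nro="abcd"))
    session.commit()
    # 2) SELECT course_nro FROM courses
    course_numbers = session.execute(select(Course.course_nro)).scalars().all()

print(course_numbers)
```

Against the Postgres database from the question, only the create_engine() URL would need to change.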
Even though this is old, more examples can't hurt, right? I thought I'd demonstrate how to do this with PyORMish.
from pyormish import Model
class Course(Model):
    _TABLE_NAME = 'courses'
    _PRIMARY_FIELD = 'id' # or whatever your primary field is
    _SELECT_FIELDS = ('id','course_nro')
    _COMMIT_FIELDS = ('course_nro',)
Model.db_config = dict(
    DB_TYPE='postgres',
    DB_CONN_STRING='postgre://naa:123@localhost/tk'
)
To create:
new_course = Course().create(course_nro='abcd')
To select:
# return the first row WHERE course_nro='abcd'
new_course = Course().get_by_fields(course_nro='abcd')
