How can I get the youngest objects from SQLAlchemy? - python

Each row in my table has a date. The dates are not unique: the same date can appear more than once.
I want to get all objects with the youngest (most recent) date.
My solution works, but I am not sure it is an elegant SQLAlchemy way.
query = _session.query(Table._date) \
.order_by(Table._date.desc()) \
.group_by(Table._date)
# this is the youngest date (type is date.datetime)
young = query.first()
query = _session.query(Table).filter(Table._date==young)
result = query.all()
Isn't there a way to put all this in one query object or something like that?

You need a HAVING clause, and you need to import func to get the max function.
Your query will then be:
from sqlalchemy import func
stmt = _session.query(Table) \
.group_by(Table._date) \
.having(Table._date == func.max(Table._date))
This produces a SQL statement like the following:
SELECT my_table.*
FROM my_table
GROUP BY my_table._date
HAVING my_table._date = MAX(my_table._date)
If you construct your SQL statement with a select, you can examine the SQL it produces with str(), as follows. (I'm not sure whether this also works for Query objects.)
str(stmt)

Two ways of doing this using a sub-query:
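from sqlalchemy import func
from sqlalchemy.orm import aliased
# note: the alias is not required; it is used here only to specify `name`
# note: on SQLAlchemy 1.4+, Query.scalar_subquery() replaces the deprecated as_scalar()
T1 = aliased(MyTable, name="T1")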
# version-1:
subquery = (session.query(func.max(T1._date).label("max_date"))
.as_scalar()
)
# version-2:
subquery = (session.query(T1._date.label("max_date"))
.order_by(T1._date.desc())
.limit(1)
.as_scalar()
)
qry = session.query(MyTable).filter(MyTable._date == subquery)
results = qry.all()
The output should be similar to:
# version-1
SELECT my_table.id AS my_table_id, my_table.name AS my_table_name, my_table._date AS my_table__date
FROM my_table
WHERE my_table._date = (
SELECT max("T1"._date) AS max_date
FROM my_table AS "T1")
# version-2
SELECT my_table.id AS my_table_id, my_table.name AS my_table_name, my_table._date AS my_table__date
FROM my_table
WHERE my_table._date = (
SELECT "T1"._date AS max_date
FROM my_table AS "T1"
ORDER BY "T1"._date DESC LIMIT ? OFFSET ?
)
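To check that both SQL shapes above select the same rows, here is a minimal sketch that runs them directly through the standard-library sqlite3 module instead of SQLAlchemy. The table contents are made up for illustration, and the dates are stored as ISO strings so that string comparison matches date order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT, _date TEXT)")
con.executemany(
    "INSERT INTO my_table (name, _date) VALUES (?, ?)",
    [("a", "2021-01-01"), ("b", "2021-03-05"),
     ("c", "2021-03-05"), ("d", "2020-12-31")],  # illustrative rows
)

# version-1 shape: compare against a scalar MAX() subquery
v1 = con.execute(
    "SELECT name FROM my_table "
    "WHERE _date = (SELECT MAX(_date) FROM my_table) "
    "ORDER BY name"
).fetchall()

# version-2 shape: ORDER BY ... DESC LIMIT 1 as the scalar subquery
v2 = con.execute(
    "SELECT name FROM my_table "
    "WHERE _date = (SELECT _date FROM my_table ORDER BY _date DESC LIMIT 1) "
    "ORDER BY name"
).fetchall()

print(v1)
print(v2)
```

Both queries return every row that shares the latest date, not just one of them.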

Related

SQLAlchemy Select from Join of two Subqueries

Need help translating this SQL query into SQLAlchemy:
select
COALESCE(DATE_1,DATE_2) as DATE_COMPLETE,
QUESTIONS_CNT,
ANSWERS_CNT
from (
(select DATE as DATE_1,
count(distinct QUESTIONS) as QUESTIONS_CNT
from GUEST_USERS
where LOCATION like '%TEXAS%'
and DATE = '2021-08-08'
group by DATE
) temp1
full join
(select DATE as DATE_2,
count(distinct ANSWERS) as ANSWERS_CNT
from USERS
where LOCATION like '%TEXAS%'
and DATE = '2021-08-08'
group by DATE
) temp2
on temp1.DATE_1=temp2.DATE_2
)
Mainly struggling with the join of the two subqueries. I've tried this (just for the join part of the SQL):
query1 = db.session.query(
GUEST_USERS.DATE_WEEK_START.label("DATE_1"),
func.count(GUEST_USERS.QUESTIONS).label("QUESTIONS_CNT")
).filter(
GUEST_USERS.LOCATION.like("%TEXAS%"),
GUEST_USERS.DATE == "2021-08-08"
).group_by(GUEST_USERS.DATE)
query2 = db_session_stg.query(
USERS.DATE.label("DATE_2"),
func.count(USERS.ANSWERS).label("ANSWERS_CNT")
).filter(
USERS.LOCATION.like("%TEXAS%"),
USERS.DATE == "2021-08-08"
).group_by(USERS.DATE)
sq2 = query2.subquery()
query1_results = query1.join(
sq2,
sq2.c.DATE_2 == GUEST_USERS.DATE
).all()
With this I receive only the DATE_1 and QUESTIONS_CNT columns. Any idea why the columns selected in the subquery are not returned in the result?
Not sure if this is the best solution, but this is how I got it to work, essentially using three subqueries:
query1 = db.session.query(
GUEST_USERS.DATE_WEEK_START.label("DATE_1"),
func.count(GUEST_USERS.QUESTIONS).label("QUESTIONS_CNT")
).filter(
GUEST_USERS.LOCATION.like("%TEXAS%"),
GUEST_USERS.DATE == "2021-08-08"
).group_by(GUEST_USERS.DATE)
query2 = db_session_stg.query(
USERS.DATE.label("DATE_2"),
func.count(USERS.ANSWERS).label("ANSWERS_CNT")
).filter(
USERS.LOCATION.like("%TEXAS%"),
USERS.DATE == "2021-08-08"
).group_by(USERS.DATE)
sq1 = query1.subquery()
sq2 = query2.subquery()
query3 = db.session.query(sq1, sq2).join(
sq2,
sq2.c.DATE_2 == sq1.c.DATE_1)
sq3 = query3.subquery()
query4 = db.session.query(
func.coalesce(
sq3.c.DATE_1, sq3.c.DATE_2),
sq3.c.QUESTIONS_CNT,
sq3.c.ANSWERS_CNT
)
results = query4.all()
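For comparison, this is roughly the SQL that the working version boils down to, sketched with the standard-library sqlite3 module rather than SQLAlchemy. The key point is that the outer SELECT lists columns from both derived tables; in the first attempt only query1's columns were selected, and adding a join does not add columns. The sample rows are invented, and a plain JOIN is used here (as .join() produces) instead of the FULL JOIN from the original SQL, which not every database supports.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE guest_users (date TEXT, location TEXT, questions TEXT);
CREATE TABLE users (date TEXT, location TEXT, answers TEXT);
INSERT INTO guest_users VALUES
    ('2021-08-08', 'AUSTIN TEXAS', 'q1'),
    ('2021-08-08', 'AUSTIN TEXAS', 'q1'),
    ('2021-08-08', 'DALLAS TEXAS', 'q2');
INSERT INTO users VALUES
    ('2021-08-08', 'HOUSTON TEXAS', 'a1'),
    ('2021-08-08', 'OHIO', 'a2');
""")

sql = """
SELECT COALESCE(t1.date_1, t2.date_2) AS date_complete,
       t1.questions_cnt,
       t2.answers_cnt
FROM (SELECT date AS date_1, COUNT(DISTINCT questions) AS questions_cnt
      FROM guest_users
      WHERE location LIKE '%TEXAS%' AND date = :day
      GROUP BY date) AS t1
JOIN (SELECT date AS date_2, COUNT(DISTINCT answers) AS answers_cnt
      FROM users
      WHERE location LIKE '%TEXAS%' AND date = :day
      GROUP BY date) AS t2
  ON t1.date_1 = t2.date_2
"""
# The same named parameter is bound in both subqueries.
rows = con.execute(sql, {"day": "2021-08-08"}).fetchall()
print(rows)
```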

Convert rank and partition query to SqlAlchemy

I would like to convert the following query to SqlAlchemy, but the documentation isn't very helpful:
select * from (
select *,
RANK() OVER (PARTITION BY id ORDER BY date desc) AS RNK
from table1
) d
where RNK = 1
Any suggestions?
Use an over() expression:
from sqlalchemy import func
subquery = db.session.query(
table1,
func.rank().over(
order_by=table1.c.date.desc(),
partition_by=table1.c.id
).label('rnk')
).subquery()
query = db.session.query(subquery).filter(
subquery.c.rnk==1
)
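To see what the rank-and-filter query returns, the original SQL can be run as-is against an in-memory SQLite database. This is only a sketch with made-up rows and an extra illustrative `value` column; it assumes the SQLite library bundled with Python is version 3.25 or newer, since older versions lack window functions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (id INTEGER, date TEXT, value TEXT)")
con.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "2021-01-01", "old"), (1, "2021-02-01", "new"),
     (2, "2021-01-15", "only")],  # illustrative rows
)

# Rank rows within each id by date (newest first) and keep rank 1.
rows = con.execute("""
SELECT id, date, value FROM (
    SELECT *, RANK() OVER (PARTITION BY id ORDER BY date DESC) AS rnk
    FROM table1
) AS d
WHERE rnk = 1
ORDER BY id
""").fetchall()
print(rows)
```

Because RANK() assigns the same rank to ties, two rows with the same id and the same latest date would both be returned.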

SQL/Impala Breaking the nested query into a more readable format

I have the following working Python code that connects to Impala and runs a query:
import pandas as pd
query = 'select my_c_instance_id, count(my_c_instance_id) as my_ins_id_count from ' + \
'(select * from my_table where my_c_id like "%small%") as small_table' + \
' group by(my_c_instance_id)'
cursor = impala_con.cursor()
cursor.execute('USE my_db')
cursor.execute(query)
df_result = as_pandas(cursor)
df_result
The code works fine, but I am wondering if it is possible to break it into two more readable pieces, something like:
small_table = 'select * from my_table where my_c_id like "%small%"'
query = 'select my_c_instance_id, count(my_c_instance_id) as my_ins_id_count from small_table group by(my_c_instance_id)'
cursor = impala_con.cursor()
cursor.execute('USE my_db')
cursor.execute(query)
df_result = as_pandas(cursor)
df_result
If possible, how do I make the above idea actually work? Thanks.
Unless I'm misunderstanding something, there's no need for the subquery at all, just move the where criteria to the main query:
select my_c_instance_id, count(my_c_instance_id) as my_ins_id_count
from my_table
where my_c_id like '%small%'
group by my_c_instance_id
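If you do want to keep the two-piece layout from the question, one option is to keep the inner query in its own Python string and give it a name with a WITH clause (a common table expression). The sketch below demonstrates the idea with the standard-library sqlite3 module and invented rows; it assumes your Impala version accepts the same WITH syntax, and the cursor calls would be replaced by the impala_con ones from the question.

```python
import sqlite3

# The reusable piece: just the filtered rows.
small_table = "SELECT * FROM my_table WHERE my_c_id LIKE '%small%'"

# The outer query refers to that piece by name through a WITH clause.
query = f"""
WITH small_table AS ({small_table})
SELECT my_c_instance_id, COUNT(my_c_instance_id) AS my_ins_id_count
FROM small_table
GROUP BY my_c_instance_id
ORDER BY my_c_instance_id
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_table (my_c_id TEXT, my_c_instance_id TEXT)")
con.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("small-1", "i1"), ("small-2", "i1"),
     ("large-1", "i2"), ("xsmall", "i3")],  # illustrative rows
)
rows = con.execute(query).fetchall()
print(rows)
```

The string is only interpolated from a constant defined in the code; values that come from users should still be passed as query parameters.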

Get table name for field in database result in Python (PostgreSQL)

I'm trying to get the table name for each field in a result set that I got from the database (Python, Postgres). There is a PHP function that returns the table name for a field; I have used it and it works, so I know it can be done (in PHP). I'm looking for a similar function in Python.
PHP's pg_field_table() function takes a result and a field number and "returns the name of the table that field belongs to". That is exactly what I need, but in Python.
Simple example: create tables, insert rows, select data:
CREATE TABLE table_a (
id INT,
name VARCHAR(10)
);
CREATE TABLE table_b (
id INT,
name VARCHAR(10)
);
INSERT INTO table_a (id, name) VALUES (1, 'hello');
INSERT INTO table_b (id, name) VALUES (1, 'world');
When using psycopg2 or SQLAlchemy I get the right data and the right field names, but no information about the table name.
import psycopg2
query = '''
SELECT *
FROM table_a A
LEFT JOIN table_b B
ON A.id = B.id
'''
con = psycopg2.connect('dbname=testdb user=postgres password=postgres')
cur = con.cursor()
cur.execute(query)
data = cur.fetchall()
print('fields', [desc[0] for desc in cur.description])
print('data', data)
The example above prints field names. The output is:
fields ['id', 'name', 'id', 'name']
data [(1, 'hello', 1, 'world')]
I know that there is cursor.description, but it does not contain the table name, just the field name.
What I need is some way to retrieve the table names for the fields in a result set when using raw SQL to query data.
EDIT 1: I need to know if "hello" came from "table_a" or "table_b"; both fields have the same name ("name"). Without the table name you can't tell which table a value came from.
EDIT 2: I know that there are workarounds such as SQL aliases (SELECT table_a.name AS name1, table_b.name AS name2), but I'm really asking how to retrieve the table name from the result set.
EDIT 3: I'm looking for a solution that allows me to write any raw SQL query, sometimes SELECT *, sometimes SELECT A.id, B.id ..., and after executing that query get the field names and table names for the fields in the result set.
It is necessary to query the pg_attribute catalog for the table-qualified column names:
query = '''
select
string_agg(format(
'%%1$s.%%2$s as "%%1$s.%%2$s"',
attrelid::regclass, attname
) , ', ')
from pg_attribute
where attrelid = any (%s::regclass[]) and attnum > 0 and not attisdropped
'''
cursor.execute(query, ([t for t in ('a','b')],))
select_list = cursor.fetchone()[0]
query = '''
select {}
from a left join b on a.id = b.id
'''.format(select_list)
print(cursor.mogrify(query).decode())
cursor.execute(query)
print([desc[0] for desc in cursor.description])
Output:
select a.id as "a.id", a.name as "a.name", b.id as "b.id", b.name as "b.name"
from a left join b on a.id = b.id
['a.id', 'a.name', 'b.id', 'b.name']
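The aliasing idea itself (give every column an alias of the form "table.column" so the name survives into cursor.description) can be tried without a Postgres server. The sketch below uses the standard-library sqlite3 module and the tables from the question. The helper `qualified_select_list` is a hypothetical name, and the `PRAGMA table_info` lookup is only a SQLite stand-in for the pg_attribute query above.

```python
import sqlite3

def qualified_select_list(columns_by_table):
    """Build 't.col AS "t.col"' for every column of every table."""
    return ", ".join(
        f'{table}.{col} AS "{table}.{col}"'
        for table, cols in columns_by_table.items()
        for col in cols
    )

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table_a (id INT, name VARCHAR(10));
CREATE TABLE table_b (id INT, name VARCHAR(10));
INSERT INTO table_a (id, name) VALUES (1, 'hello');
INSERT INTO table_b (id, name) VALUES (1, 'world');
""")

# SQLite stand-in for the catalog lookup: column names per table.
columns_by_table = {
    t: [row[1] for row in con.execute(f"PRAGMA table_info({t})")]
    for t in ("table_a", "table_b")
}
select_list = qualified_select_list(columns_by_table)

cur = con.execute(
    f"SELECT {select_list} "
    "FROM table_a LEFT JOIN table_b ON table_a.id = table_b.id"
)
fields = [desc[0] for desc in cur.description]
data = cur.fetchall()
print(fields)
print(data)
```

Note that this only works when you control the select list; it does not recover table names for an arbitrary SELECT * written by someone else.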

Error: 'Cursor' object has no attribute '_last_executed'

I have this cursor
cursor.execute("""SELECT price FROM Items WHERE itemID = (
SELECT item_id FROM Purchases
WHERE purchaseID = %d AND customer_id = %d)""",
[self.purchaseID, self.customer])
I get this error
'Cursor' object has no attribute '_last_executed'
But when I try this:
cursor.execute("""SELECT price FROM Items WHERE itemID = (
SELECT item_id FROM Purchases
WHERE purchaseID = 1 AND customer_id = 1)"""
)
there is no error. How do I fix this?
I encountered this problem too. Changing the %d to %s solved it. I hope this is useful for you.
The problem is that you are not making substitutions properly in your select string. From the docs:
def execute(self, query, args=None):
"""Execute a query.
query -- string, query to execute on server
args -- optional sequence or mapping, parameters to use with query.
Note: If args is a sequence, then %s must be used as the
parameter placeholder in the query. If a mapping is used,
%(key)s must be used as the placeholder.
Returns long integer rows affected, if any
"""
So, with a sequence of arguments, it should use %s as the placeholder:
cursor.execute("""SELECT price FROM Items WHERE itemID = (
SELECT item_id FROM Purchases
WHERE purchaseID = %s AND customer_id = %s)""",
(self.purchaseID, self.customer))
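The docstring's rule can be illustrated with a small runnable sketch. The first part shows, in plain Python, why a numeric %d placeholder fails once arguments have been turned into strings; that drivers of this kind escape arguments to strings before formatting is an assumption here, not something stated in the question. The second part runs the same query with driver-side parameter binding through the standard-library sqlite3 module, whose placeholder is `?`; the driver in the question expects %s instead, as the docstring says. Table contents are invented.

```python
import sqlite3

# Assumption: the driver converts each argument to an escaped string
# and then applies %-formatting, so %d receives a str and fails.
try:
    "purchaseID = %d" % ("1",)
    percent_d_ok = True
except TypeError:
    percent_d_ok = False

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Items (itemID INTEGER, price REAL);
CREATE TABLE Purchases (purchaseID INTEGER, customer_id INTEGER, item_id INTEGER);
INSERT INTO Items VALUES (10, 9.5);
INSERT INTO Purchases VALUES (1, 1, 10);
""")

# Values are passed separately from the SQL text; the driver binds them.
price = con.execute(
    """SELECT price FROM Items WHERE itemID = (
           SELECT item_id FROM Purchases
           WHERE purchaseID = ? AND customer_id = ?)""",
    (1, 1),
).fetchone()[0]

print(percent_d_ok, price)
```

Whatever the placeholder style, the values go in the second argument to execute() and are never formatted into the query string by hand.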
The reason is that you are using '%d'. When you use '%' in the SQL string, execute() interprets it as a format specifier. You should write your statement like this:
cursor.execute("""SELECT price FROM Items WHERE itemID = (
SELECT item_id FROM Purchases
WHERE purchaseID = %%d AND customer_id = %%d)""",
[self.purchaseID, self.customer])
Depending on your SQL package, you may need to use cursor.statement instead.
Using a double %% worked for me:
"SELECT title, address from table t1, table t2 on t1.id=t2.id where t1.title like '%%Brink%%' "
from django.db import connection
print(connection.queries)
The code above should display all the queries executed during the request.
