SQLAlchemy subquery access outer tables - python

I'm having trouble converting this SQL into a valid SQLAlchemy query:
select *
from A
join B on B.Id = (
select top 1 Id
from B
where B.name = A.name
order by B.date
)
I've tried using the subquery but it fails:
query = session.query(A, B)
sub_query = session.query(B)
sub_query = sub_query.filter(B.name == A.name)
sub_query = sub_query.order_by(B.date.desc()).limit(1)
sub_query = sub_query.subquery()
query = query.join(B, B.id == sub_query.c.Id)
By accessing the A in the subquery, SqLAlchemy will add it to the subquery from clause and doesn't use the A from the outer query.
I've seen many SQLAlchemy subquery examples but none of them uses the outer fields.

By using correlate(A) in the subquery we tell the SQLAlchemy that reuses A from the outer query.
For making the join work we should access the Id of the subquery, so we should return only Id and use scalar_subquery() to convert the subquery to a scalar subquery:
query = session.query(A, B)
sub_query = session.query(B.Id)
sub_query = sub_query.filter(B.name == A.name)
sub_query = sub_query.order_by(B.date.desc()).limit(1)
sub_query = sub_query.correlate(A)
query = query.join(B, B.id == sub_query.scalar_subquery())

Related

SQLAlchemy Select from Join of two Subqueries

Need help translating this SQL query into SQLAlchemy:
select
COALESCE(DATE_1,DATE_2) as DATE_COMPLETE,
QUESTIONS_CNT,
ANSWERS_CNT
from (
(select DATE as DATE_1,
count(distinct QUESTIONS) as QUESTIONS_CNT
from GUEST_USERS
where LOCATION like '%TEXAS%'
and DATE = '2021-08-08'
group by DATE
) temp1
full join
(select DATE as DATE_2,
count(distinct ANSWERS) as ANSWERS_CNT
from USERS
where LOCATION like '%TEXAS%'
and DATE = '2021-08-08'
group by DATE
) temp2
on temp1.DATE_1=temp2.DATE_2
)
Mainly struggling with the join of the two subqueries. I've tried this (just for the join part of the SQL):
query1 = db.session.query(
GUEST_USERS.DATE_WEEK_START.label("DATE_1"),
func.count(GUEST_USERS.QUESTIONS).label("QUESTIONS_CNT")
).filter(
GUEST_USERS.LOCATION.like("%TEXAS%"),
GUEST_USERS.DATE == "2021-08-08"
).group_by(GUEST_USERS.DATE)
query2 = db_session_stg.query(
USERS.DATE.label("DATE_2"),
func.count(USERS.ANSWERS).label("ANSWERS_CNT")
).filter(
USERS.LOCATION.like("%TEXAS%"),
USERS.DATE == "2021-08-08"
).group_by(USERS.DATE)
sq2 = query2.subquery()
query1_results = query1.join(
sq2,
sq2.c.DATE_2 == GUEST_USERS.DATE)
).all()
In this output I receive only the DATE_1 column and the QUESTIONS_CNT columns. Any idea why the selected output from the subquery is not being returned in the result?
Not sure if this is the best solution but this is how I got it to work. Using 3 subqueries essentially.
query1 = db.session.query(
GUEST_USERS.DATE_WEEK_START.label("DATE_1"),
func.count(GUEST_USERS.QUESTIONS).label("QUESTIONS_CNT")
).filter(
GUEST_USERS.LOCATION.like("%TEXAS%"),
GUEST_USERS.DATE == "2021-08-08"
).group_by(GUEST_USERS.DATE)
query2 = db_session_stg.query(
USERS.DATE.label("DATE_2"),
func.count(USERS.ANSWERS).label("ANSWERS_CNT")
).filter(
USERS.LOCATION.like("%TEXAS%"),
USERS.DATE == "2021-08-08"
).group_by(USERS.DATE)
sq1 = query1.subquery()
sq2 = query2.subquery()
query3 = db.session.query(sq1, sq2).join(
sq2,
sq2.c.DATE_2 == sq1.c.DATE_1)
sq3 = query3.subquery()
query4 = db.session.query(
func.coalesce(
sq3.c.DATE_1, sq3.c.DATE_2),
sq3.c.QUESTIONS_CNT,
sq3.c.ANSWERS_CNT
)
results = query4.all()

MySQL - select table name if it contains record for list of tables

I am interested in finding the most efficient manner to query the following:
For a list of table names, return the table name if it contains at least one record that meet the conditions
Essentially, something similar to the following Python code in a single query:
dfs = [pd.read_sql('SELECT name FROM {} WHERE a=1 AND b=2'.format(table), engine) for table in tables]
tables = [table for table, df in zip(tables, dfs) if not df.empty]
Is this possible in MySQL?
Assuming you trust the table names in tables not to contain any surprises leading to SQL injection, you could device something like:
from sqlalchemy import text
selects = [f'SELECT :table_{i} FROM {table} WHERE a = 1 AND b = 2'
for i, table in enumerate(tables)]
stmt = ' UNION '.join(selects)
stmt = text(stmt)
results = engine.execute(
stmt, {f'table_{i}': table for i, table in enumerate(tables)})
or you could use SQLAlchemy constructs to build the same query safely:
from sqlalchemy import table, column, union, and_, select, Integer, literal
tbls = [table(name,
column('a', Integer),
column('b', Integer)) for name in tables]
stmt = union(*[select([literal(name).label('name')]).
select_from(tbl).
where(and_(tbl.c.a == 1, tbl.c.b == 2))
for tbl, name in zip(tbls, tables)])
results = engine.execute(stmt)
You can use a UNION of queries that search each table.
(SELECT 'table1' AS table_name
FROM table1
WHERE a = 1 AND b = 2
LIMIT 1)
UNION
(SELECT 'table2' AS table_name
FROM table2
WHERE a = 1 AND b = 2
LIMIT 1)
UNION
(SELECT 'table3' AS table_name
FROM table3
WHERE a = 1 AND b = 2
LIMIT 1)
...

contains_eager and limits in SQLAlchemy

I have 2 classes:
class A(Base):
id = Column(Integer, primary_key=True)
name = Column(String)
children = relationship('B')
class B(Base):
id = Column(Integer, primary_key=True)
id_a = Column(Integer, ForeignKey('a.id'))
name = Column(String)
Now I need all object A which contains B with some name and A object will contain all B objects filtered.
To achieve it I build query.
query = db.session.query(A).join(B).options(db.contains_eager(A.children)).filter(B.name=='SOME_TEXT')
Now I need only 50 items of query so I do:
query.limit(50).all()
Result contain less then 50 even if without limit there is more than 50. I read The Zen of Eager Loading. But there must be some trick to achieve it. One of my idea is to make 2 query. One with innerjoin to take ID's then use this ID's in first query.
But maybe there is better solve for this.
First, take a step back and look at the SQL. Your current query is
SELECT * FROM a JOIN b ON b.id_a = a.id WHERE b.name == '...' LIMIT 50;
Notice the limit is on a JOIN b and not a, but if you put the limit on a you can't filter by the field in b. There are two solutions to this problem. The first is to use a scalar subquery to filter on b.name, like this:
SELECT * FROM a
WHERE EXISTS (SELECT 1 FROM b WHERE b.id_a = a.id AND b.name = '...')
LIMIT 50;
This can be inefficient depending on the DB backend. The second solution is to do a DISTINCT on a after the join, like this:
SELECT DISTINCT a.* FROM a JOIN b ON b.id_a = a.id
WHERE b.name == '...'
LIMIT 50;
Notice how in either case you do not get any column from b. How do we get them? Do another join!
SELECT * FROM (
SELECT DISTINCT a.* FROM a JOIN b ON b.id_a = a.id
WHERE b.name == '...'
LIMIT 50;
) a JOIN b ON b.id_a = a.id
WHERE b.name == '...';
Now, to write all of this in SQLAlchemy:
subquery = (
session.query(A)
.join(B)
.with_entities(A) # only select A's columns
.filter(B.name == '...')
.distinct()
.limit(50)
.subquery() # convert to subquery
)
aliased_A = aliased(A, subquery)
query = (
session.query(aliased_A)
.join(B)
.options(contains_eager(aliased_A.children))
.filter(B.name == "...")
)

Performing union with three queries - SQLAlchemy

In my project setup querying is being done based on the SQLAlchemy.
As per my previous requirements I have done the union with two queries.
Now I need to do Union with three queries.
Code is as follows:
query1 = query1.filter(model.name == "in-addr.arpa.")
query2 = query2.filter(model.tenant_id.in_(tenant_ids))
query = query1.union(query2)
Now Here I need to add one more query as follows:
query3 = query3.filter(model.tenant_id == context.tenant_id)
So I need to perform Union with all the three queries.
The solution is following:
query1 = query1.filter(model.name == "in-addr.arpa.")
query2 = query2.filter(model.tenant_id.in_(tenant_ids))
query3 = query3.filter(model.tenant_id == context.tenant_id)
query = query1.union(query2,query3)
This is how I did this in SQLAlchemy 1.3
from sqlalchemy import union
query1 = query1.filter(model.name == "in-addr.arpa.")
query2 = query2.filter(model.tenant_id.in_(tenant_ids))
query3 = query3.filter(model.tenant_id == context.tenant_id)
all_queries = [query1, query2, query3]
golden_set = union(*all_queries)
The change here is that the union method accepts a list of SQLAlchemy selectables.
In SQLAlchemy 1.4 you will need to use the function union and pass the queries as positional arguments instead of a list.
from sqlalchemy import union
query1 = query1.filter(model.name == "in-addr.arpa.")
query2 = query2.filter(model.tenant_id.in_(tenant_ids))
query3 = query3.filter(model.tenant_id == context.tenant_id)
query = union(query1, query2, query3)

SQLAlchemy Joining with subquery issue

I am trying to translate SQL into SQLAlchemy. The SQL version of the query I want is as follows:
SELECT * from calendarEventAttendee
JOIN calendarEventAttendanceActual ON calendarEventAttendanceActual.id = calendarEventAttendee.attendanceActualId
LEFT JOIN
(SELECT bill.id, bill.personId, billToEvent.eventId FROM bill JOIN billToEvent ON bill.id = billToEvent.billId) b
ON b.eventId = calendarEventAttendee.eventId AND b.personId = calendarEventAttendee.personId
WHERE b.id is NULL
My SQLAlchemy query is as follows:
query = db.session.query(CalendarEventAttendee).join(CalendarEventAttendanceActual)
sub_query = db.session.query(Bill, BillToEvent).join(BillToEvent, BillToEvent.billId == Bill.id).subquery()
query = query.outerjoin(sub_query, and_(sub_query.Bill.personId == CalendarEventAttendee.personId, Bill.eventId == CalendarEventAttendee.eventId))
results = query.all()
I am getting an error AttributeError: 'Alias' object has no attribute 'Bill'
If I adjust the SQLAlchemy query to the following:
sub_query = db.session.query(Bill, BillToEvent).join(BillToEvent, BillToEvent.billId == Bill.id).subquery()
query = query.outerjoin(sub_query, and_(sub_query.Bill.personId == CalendarEventAttendee.personId, sub_query.BillToEvent.eventId == CalendarEventAttendee.eventId))
results = query.all()
I get an error AttributeError: Bill
Any help would be appreciated, thanks!
Once you call subquery(), there is no access to objects, but only to columns via .c.{column_name} accessor.
Do the following for sub_query instead: load only the columns you need in order to avoid any name collisions:
sub_query = db.session.query(
Bill.id, Bill.personId, BillToEvent.eventId
).join(BillToEvent, BillToEvent.billId == Bill.id).subquery()
Then in your query use column names with .c.column_name:
query = query.outerjoin(
sub_query, and_(
sub_query.c.personId == CalendarEventAttendee.personId,
sub_query.c.eventId == CalendarEventAttendee.eventId)
)
results = query.all()

Categories