contains_eager and limits in SQLAlchemy - python

I have two classes:
class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    children = relationship('B')
class B(Base):
    __tablename__ = 'b'
    id = Column(Integer, primary_key=True)
    id_a = Column(Integer, ForeignKey('a.id'))
    name = Column(String)
Now I need all A objects that have at least one B with a given name, and each A's children collection should contain only the matching B objects.
To achieve this I build the query:
query = db.session.query(A).join(B).options(db.contains_eager(A.children)).filter(B.name=='SOME_TEXT')
Now I need only 50 items from the query, so I do:
query.limit(50).all()
The result contains fewer than 50 A objects, even though without the limit there are more than 50. I have read The Zen of Eager Loading, but there must be some trick to achieve this. One idea is to run two queries: one with an inner join to fetch the IDs, then use those IDs in the first query.
But maybe there is a better solution.

First, take a step back and look at the SQL. Your current query is
SELECT * FROM a JOIN b ON b.id_a = a.id WHERE b.name = '...' LIMIT 50;
Notice that the limit applies to the rows of a JOIN b and not to the rows of a, so several of the 50 rows can belong to the same a. But if you put the limit on a alone, you can't filter by a field in b. There are two solutions to this problem. The first is to use a correlated EXISTS subquery to filter on b.name, like this:
SELECT * FROM a
WHERE EXISTS (SELECT 1 FROM b WHERE b.id_a = a.id AND b.name = '...')
LIMIT 50;
This can be inefficient depending on the DB backend. The second solution is to do a DISTINCT on a after the join, like this:
SELECT DISTINCT a.* FROM a JOIN b ON b.id_a = a.id
WHERE b.name = '...'
LIMIT 50;
Notice how in either case you do not get any column from b. How do we get them? Do another join!
SELECT * FROM (
    SELECT DISTINCT a.* FROM a JOIN b ON b.id_a = a.id
    WHERE b.name = '...'
    LIMIT 50
) a JOIN b ON b.id_a = a.id
WHERE b.name = '...';
Now, to write all of this in SQLAlchemy:
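from sqlalchemy.orm import aliased, contains_eager
# Inner query: the first 50 distinct A rows that have a matching B
subquery = (
    session.query(A)
    .join(B)
    .with_entities(A)  # only select A's columns
    .filter(B.name == '...')
    .distinct()
    .limit(50)
    .subquery()  # convert to subquery
)
aliased_A = aliased(A, subquery)
# Outer query: join the limited A rows back to B and populate children from the join
query = (
    session.query(aliased_A)
    .join(B)
    .options(contains_eager(aliased_A.children))
    .filter(B.name == "...")
)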
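To see the difference between limiting the joined rows and limiting the parents, here is a minimal sketch that runs the two SQL shapes from above against an in-memory SQLite database using Python's standard sqlite3 module. The sample rows, the limit of 2 and the ORDER BY clauses are assumptions added only to keep the output small and deterministic; they are not part of the original queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, id_a INTEGER REFERENCES a(id), name TEXT);
""")
# Three parents, each with two matching children ('x') and one non-matching child ('y').
for a_id in (1, 2, 3):
    conn.execute("INSERT INTO a (id, name) VALUES (?, ?)", (a_id, f"a{a_id}"))
    for child_name in ("x", "x", "y"):
        conn.execute("INSERT INTO b (id_a, name) VALUES (?, ?)", (a_id, child_name))

# LIMIT applied to the joined rows: both rows belong to the same parent.
naive = conn.execute("""
    SELECT a.id, b.id FROM a JOIN b ON b.id_a = a.id
    WHERE b.name = 'x' ORDER BY a.id, b.id LIMIT 2
""").fetchall()
naive_parents = {row[0] for row in naive}

# LIMIT applied to the distinct parents first, then joined back to b.
fixed = conn.execute("""
    SELECT a.id, b.id FROM (
        SELECT DISTINCT a.* FROM a JOIN b ON b.id_a = a.id
        WHERE b.name = 'x' ORDER BY a.id LIMIT 2
    ) AS a JOIN b ON b.id_a = a.id
    WHERE b.name = 'x'
""").fetchall()
fixed_parents = {row[0] for row in fixed}

print(naive_parents, fixed_parents, len(fixed))
```

With this data the first query yields a single parent, while the second yields two parents with all of their matching children.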

Related

SQLAlchemy subquery access outer tables

I'm having trouble converting this SQL into a valid SQLAlchemy query:
select *
from A
join B on B.Id = (
select top 1 Id
from B
where B.name = A.name
order by B.date
)
I've tried using the subquery but it fails:
query = session.query(A, B)
sub_query = session.query(B)
sub_query = sub_query.filter(B.name == A.name)
sub_query = sub_query.order_by(B.date.desc()).limit(1)
sub_query = sub_query.subquery()
query = query.join(B, B.id == sub_query.c.Id)
By accessing A in the subquery, SQLAlchemy adds it to the subquery's FROM clause instead of using the A from the outer query.
I've seen many SQLAlchemy subquery examples, but none of them use fields from the outer query.
By calling correlate(A) on the subquery, we tell SQLAlchemy to reuse A from the outer query instead of adding it to the subquery's FROM clause.
For the join to work we need to compare against the Id returned by the subquery, so the subquery should select only that column, and scalar_subquery() converts it to a scalar subquery:
query = session.query(A, B)
sub_query = session.query(B.id)
sub_query = sub_query.filter(B.name == A.name)
sub_query = sub_query.order_by(B.date.desc()).limit(1)
sub_query = sub_query.correlate(A)
query = query.join(B, B.id == sub_query.scalar_subquery())
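The SQL this is aiming for can be tried in isolation. The following is a minimal sketch using Python's standard sqlite3 module rather than SQLAlchemy; the table contents are made up, the inner alias b2 is introduced only to keep the two uses of b apart, and LIMIT 1 stands in for TOP 1, which SQLite does not support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT, date TEXT);
    INSERT INTO a (id, name) VALUES (1, 'foo'), (2, 'bar');
    INSERT INTO b (id, name, date) VALUES
        (10, 'foo', '2020-01-01'),
        (11, 'foo', '2021-06-01'),
        (12, 'bar', '2019-03-01');
""")

# For every row of a, join the single most recent b row with the same name.
rows = conn.execute("""
    SELECT a.id, b.id, b.date
    FROM a
    JOIN b ON b.id = (
        SELECT b2.id FROM b AS b2
        WHERE b2.name = a.name      -- refers to the outer query's a
        ORDER BY b2.date DESC
        LIMIT 1
    )
    ORDER BY a.id
""").fetchall()

print(rows)
```

The inner SELECT has no a in its own FROM clause, which is what correlate(A) asks SQLAlchemy to render.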

MySQL - select table name if it contains record for list of tables

I am interested in finding the most efficient manner to query the following:
For a list of table names, return the table name if it contains at least one record that meet the conditions
Essentially, something similar to the following Python code in a single query:
dfs = [pd.read_sql('SELECT name FROM {} WHERE a=1 AND b=2'.format(table), engine) for table in tables]
tables = [table for table, df in zip(tables, dfs) if not df.empty]
Is this possible in MySQL?
Assuming you trust the table names in tables not to contain any surprises leading to SQL injection, you could devise something like:
from sqlalchemy import text
# Table names cannot be bound as parameters, so they are formatted into the SQL
selects = [f'SELECT :table_{i} FROM {table} WHERE a = 1 AND b = 2'
           for i, table in enumerate(tables)]
stmt = ' UNION '.join(selects)
stmt = text(stmt)
results = engine.execute(
    stmt, {f'table_{i}': table for i, table in enumerate(tables)})
or you could use SQLAlchemy constructs to build the same query safely:
from sqlalchemy import table, column, union, and_, select, Integer, literal
tbls = [table(name,
              column('a', Integer),
              column('b', Integer)) for name in tables]
stmt = union(*[select([literal(name).label('name')]).
               select_from(tbl).
               where(and_(tbl.c.a == 1, tbl.c.b == 2))
               for tbl, name in zip(tbls, tables)])
results = engine.execute(stmt)
You can use a UNION of queries that search each table.
(SELECT 'table1' AS table_name
FROM table1
WHERE a = 1 AND b = 2
LIMIT 1)
UNION
(SELECT 'table2' AS table_name
FROM table2
WHERE a = 1 AND b = 2
LIMIT 1)
UNION
(SELECT 'table3' AS table_name
FROM table3
WHERE a = 1 AND b = 2
LIMIT 1)
...
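As a rough, runnable sketch of the UNION idea, the snippet below builds such a statement for an arbitrary list of tables. It uses Python's standard sqlite3 module as a stand-in for MySQL, so two details differ from the SQL above: each limited SELECT is wrapped in a derived table instead of parentheses (SQLite does not accept parenthesized members in a compound SELECT), and UNION ALL is used because each member returns at most one row anyway. The helper name tables_with_match and the sample tables are hypothetical.

```python
import sqlite3

def tables_with_match(conn, tables):
    """Return the names in `tables` that hold at least one row with a=1 AND b=2."""
    for name in tables:
        # Table names cannot be bound as parameters, so validate them instead.
        if not name.isidentifier():
            raise ValueError(f"unsafe table name: {name!r}")
    if not tables:
        return []
    # One limited SELECT per table; the reported name travels as a bound value.
    selects = [
        f"SELECT * FROM (SELECT ? AS table_name FROM {name} "
        f"WHERE a = 1 AND b = 2 LIMIT 1)"
        for name in tables
    ]
    sql = " UNION ALL ".join(selects)
    return [row[0] for row in conn.execute(sql, list(tables))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER, b INTEGER);
    CREATE TABLE t2 (a INTEGER, b INTEGER);
    CREATE TABLE t3 (a INTEGER, b INTEGER);
    INSERT INTO t1 VALUES (1, 2), (1, 2);
    INSERT INTO t2 VALUES (1, 3);
    INSERT INTO t3 VALUES (0, 0), (1, 2);
""")
matching = tables_with_match(conn, ["t1", "t2", "t3"])
print(sorted(matching))
```

The LIMIT 1 lets the database stop scanning a table as soon as one matching row is found.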

annotate(Sum) does not work when a fourth field is added to values()

I'm using Django 1.4 and Python 2.7.
I'm computing a Sum over some values. When I do this, it works perfectly:
CategoryAnswers.objects.using('mam').filter(category=cat["category"], brand=cat["brand"], category__segment_category=cat["category__segment_category"]).values('category__name', 'brand__name','brand__pk').annotate(total=Sum('answer'))
It generates this query:
SELECT `category`.`name`, `brand`.`name`, `category_answers`.`brand_id`, SUM(`category_answers`.`answer`) AS `total`
FROM `category_answers`
INNER JOIN `category`
ON (`category_answers`.`category_id` = `category`.`id`)
INNER JOIN `brand`
ON (`category_answers`.`brand_id` = `brand`.`id`)
WHERE (`category_answers`.`category_id` = 6 AND
`category_answers`.`brand_id` = 1 AND
`category`.`segment_category_id` = 1 )
GROUP BY `category`.`name`, `brand`.`name`, `category_answers`.`brand_id`
ORDER BY NULL
But when I add another field to values(), it no longer works:
CategoryAnswers.objects.using('mam').order_by().filter(category=cat["category"], brand=cat["brand"], category__segment_category=cat["category__segment_category"]).values('category__name','category__pk','brand__name','brand__pk').annotate(total=Sum('answer'))
Looking at the generated query, the problem is that Django adds a wrong field (category_answers.id) to the GROUP BY:
SELECT `category`.`name`, `category_answers`.`category_id`, `brand`.`name`, `category_answers`.`brand_id`,
SUM(`category_answers`.`answer`) AS `total`
FROM `category_answers`
INNER JOIN `category`
ON (`category_answers`.`category_id` = `category`.`id`)
INNER JOIN `brand`
ON (`category_answers`.`brand_id` = `brand`.`id`)
WHERE (`category_answers`.`category_id` = 6 AND
`category_answers`.`brand_id` = 1 AND
`category`.`segment_category_id` = 1 )
GROUP BY `category_answers`.`id`, `category`.`name`, `category_answers`.`category_id`, `brand`.`name`, `category_answers`.`brand_id`
ORDER BY NULL
If I remove any one of the fields it works, so I don't believe the problem is tied to a specific field. Am I doing something wrong?
I couldn't resolve this, so I did it with a raw SQL query:
cursor = connections["mam"].cursor()
cursor.execute("SELECT B.name, A.category_id, A.brand_id, SUM(A.answer) AS total, C.name FROM category_answers A INNER JOIN category B ON A.category_id = B.id INNER JOIN brand C ON A.brand_id = C.id WHERE A.brand_id = %s AND A.category_id = %s AND B.segment_category_id = %s", [cat["brand"],cat["category"],cat["category__segment_category"]])
c_answers = cursor.fetchone()
This is not the best way, but it works. :)
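For illustration, the effect of the extra category_answers.id in the GROUP BY can be reproduced outside Django. This is a minimal sketch with Python's standard sqlite3 module and made-up rows; the joins to category and brand are left out, so it only shows how grouping by the primary key changes the sums, not why Django adds that column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category_answers (
        id INTEGER PRIMARY KEY, category_id INTEGER, brand_id INTEGER, answer INTEGER
    );
    INSERT INTO category_answers (category_id, brand_id, answer)
    VALUES (6, 1, 10), (6, 1, 20), (6, 1, 30);
""")

# Grouping only by the selected columns: one row holding the full sum.
grouped = conn.execute("""
    SELECT category_id, brand_id, SUM(answer) FROM category_answers
    GROUP BY category_id, brand_id
""").fetchall()

# Grouping additionally by the primary key: every row becomes its own group.
per_row = conn.execute("""
    SELECT category_id, brand_id, SUM(answer) FROM category_answers
    GROUP BY id, category_id, brand_id
    ORDER BY id
""").fetchall()

print(grouped)
print(per_row)
```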

How can I write an SQLAlchemy Query with a Join and an Aggregate?

I have a table that has 3 columns: type, content and time (an integer). For each 'type', I want to select the entry with the greatest (most recent) 'time' integer and the corresponding data. How can I do this using SQLAlchemy and Python? I could do this using SQL by performing:
select
c.type,
c.time,
b.data
from
parts as b
inner join
(select
a.type,
max(a.time) as time
from parts as a
group by a.type) as c
on
b.type = c.type and
b.time = c.time
But how can I accomplish this in SQLAlchemy?
The table mapping:
class Structure(Base):
    __tablename__ = 'structure'
    id = Column(Integer, primary_key=True)
    type = Column(Text)
    content = Column(Text)
    time = Column(Integer)
    def __init__(self, type, content):
        self.type = type
        self.content = content
        self.time = time.time()
    def serialise(self):
        return {"type" : self.type,
                "content" : self.content}
The attempted query:
max = func.max(Structure.time).alias("time")
c = DBSession.query(max)\
    .add_columns(Structure.type, Structure.time)\
    .group_by(Structure.type)\
    .subquery()
c.alias("c")
b = DBSession.query(Structure.content)\
    .add_columns(c.c.type, c.c.time)\
    .join(c, Structure.type == c.c.type)
Gives me:
sqlalchemy.exc.OperationalError: (OperationalError) near "(": syntax error u'SELECT structure.content AS structure_content, anon_1.type AS anon_1_type, anon_1.time AS anon_1_time \nFROM structure JOIN (SELECT time.max_1 AS max_1, structure.type AS type, structure.time AS time \nFROM max(structure.time) AS time, structure GROUP BY structure.type) AS anon_1 ON structure.type = anon_1.type' ()
I'm essentially stabbing in the dark, so any help would be appreciated.
Try the code below, which uses a subquery:
from sqlalchemy import and_, func
# Latest time per type
subq = (
    session.query(
        Structure.type,
        func.max(Structure.time).label("max_time"),
    )
    .group_by(Structure.type)
    .subquery()
)
# Join back to pick the full row matching each (type, max_time) pair
qry = session.query(Structure).join(
    subq,
    and_(Structure.type == subq.c.type, Structure.time == subq.c.max_time),
)
print(qry)
producing SQL:
SELECT structure.id AS structure_id, structure.type AS structure_type, structure.content AS structure_content, structure.time AS structure_time
FROM structure
JOIN (SELECT structure.type AS type, max(structure.time) AS max_time
FROM structure GROUP BY structure.type) AS anon_1
ON structure.type = anon_1.type
AND structure.time = anon_1.max_time
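To check that this shape of query returns the newest row per type, here is a small sketch that runs equivalent SQL against an in-memory SQLite database with Python's standard sqlite3 module. The sample rows and the alias latest are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE structure (
        id INTEGER PRIMARY KEY, type TEXT, content TEXT, time INTEGER
    );
    INSERT INTO structure (type, content, time) VALUES
        ('news', 'old news', 100),
        ('news', 'fresh news', 200),
        ('blog', 'only post', 150);
""")

# Join every row to the per-type maximum and keep only the rows that match it.
rows = conn.execute("""
    SELECT structure.type, structure.content, structure.time
    FROM structure
    JOIN (SELECT type, max(time) AS max_time
          FROM structure GROUP BY type) AS latest
      ON structure.type = latest.type
     AND structure.time = latest.max_time
    ORDER BY structure.type
""").fetchall()

print(rows)
```

If two rows of the same type share the maximum time, both are returned, which is also true of the SQLAlchemy query above.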

sqlalchemy: union query few columns from multiple tables with condition

I'm trying to adapt some part of a MySQLdb application to sqlalchemy in declarative base. I'm only beginning with sqlalchemy.
The legacy tables are defined something like:
student: id_number*, semester*, stateid, condition, ...
choice: id_number*, semester*, choice_id, school, program, ...
We have 3 tables for each of them (student_tmp, student_year, student_summer, choice_tmp, choice_year, choice_summer), so each pair (_tmp, _year, _summer) contains information for a specific moment.
select *
from `student_tmp`
inner join `choice_tmp` using (`id_number`, `semester`)
What I actually need is the equivalent of the following select:
SELECT t.*
FROM (
(
SELECT st.*, ct.*
FROM `student_tmp` AS st
INNER JOIN `choice_tmp` as ct USING (`id_number`, `semester`)
WHERE (ct.`choice_id` = IF(right(ct.`semester`, 1)='1', '3', '4'))
AND (st.`condition` = 'A')
) UNION (
SELECT sy.*, cy.*
FROM `student_year` AS sy
INNER JOIN `choice_year` as cy USING (`id_number`, `semester`)
WHERE (cy.`choice_id` = 4)
AND (sy.`condition` = 'A')
) UNION (
SELECT ss.*, cs.*
FROM `student_summer` AS ss
INNER JOIN `choice_summer` as cs USING (`id_number`, `semester`)
WHERE (cs.`choice_id` = 3)
AND (ss.`condition` = 'A')
)
) as t
The * is used only to shorten the select; I'm actually querying only about 7 of the 50 available columns.
This information is used in many flavors... "Do I have new students? Do I still have all students from a given date? Which students are subscribed after the given date? etc..." The result of this select statement is to be saved in another database.
Would it be possible to achieve this with a single view-like class? The information is read-only, so I don't need to be able to modify/create/delete. Or do I have to declare a class for each table (ending up with 6 classes) and remember to filter every time I need to query?
Thanks for pointers.
EDIT: I don't have modification access to the database (I cannot create a view). Both databases may not be on the same server (so I cannot create a view on my second DB).
My concern is to avoid the full table scan before filtering on condition and choice_id.
EDIT 2: I've set up declarative classes like this:
class BaseStudent(object):
    id_number = sqlalchemy.Column(sqlalchemy.String(7), primary_key=True)
    semester = sqlalchemy.Column(sqlalchemy.String(5), primary_key=True)
    unique_id_number = sqlalchemy.Column(sqlalchemy.String(7))
    stateid = sqlalchemy.Column(sqlalchemy.String(12))
    condition = sqlalchemy.Column(sqlalchemy.String(3))
class Student(BaseStudent, Base):
    __tablename__ = 'student'
    choices = orm.relationship('Choice', backref='student')
#class StudentYear(BaseStudent, Base):...
#class StudentSummer(BaseStudent, Base):...
class BaseChoice(object):
    id_number = sqlalchemy.Column(sqlalchemy.String(7), primary_key=True)
    semester = sqlalchemy.Column(sqlalchemy.String(5), primary_key=True)
    choice_id = sqlalchemy.Column(sqlalchemy.String(1))
    school = sqlalchemy.Column(sqlalchemy.String(2))
    program = sqlalchemy.Column(sqlalchemy.String(5))
class Choice(BaseChoice, Base):
    __tablename__ = 'choice'
    __table_args__ = (
        sqlalchemy.ForeignKeyConstraint(['id_number', 'semester',],
                                        [Student.id_number, Student.semester,]),
    )
#class ChoiceYear(BaseChoice, Base): ...
#class ChoiceSummer(BaseChoice, Base): ...
Now, the query that gives me correct SQL for one set of tables is:
q = session.query(StudentYear, ChoiceYear) \
.select_from(StudentYear) \
.join(ChoiceYear) \
.filter(StudentYear.condition=='A') \
.filter(ChoiceYear.choice_id=='4')
but it throws an exception...
"Could not locate column in row for column '%s'" % key)
sqlalchemy.exc.NoSuchColumnError: "Could not locate column in row for column '*'"
How do I use that query to create myself a class I can use?
If you can create this view on the database, then you simply map the view as if it was a table. See Reflecting Views.
# DB VIEW
CREATE VIEW my_view AS -- #todo: your select statements here
# SA
my_view = Table('my_view', metadata, autoload=True)
# define view object
class ViewObject(object):
    def __repr__(self):
        return "ViewObject %s" % str((self.id_number, self.semester,))
# map the view to the object
view_mapper = mapper(ViewObject, my_view)
# query the view
q = session.query(ViewObject)
for _ in q:
    print(_)
If you cannot create a VIEW on the database level, you could create a selectable and map the ViewObject to it. The code below should give you the idea:
student_tmp = Table('student_tmp', metadata, autoload=True)
choice_tmp = Table('choice_tmp', metadata, autoload=True)
# your SELECT part with the columns you need
qry = select([student_tmp.c.id_number, student_tmp.c.semester, student_tmp.c.stateid, choice_tmp.c.school])
# your INNER JOIN condition
qry = qry.where(student_tmp.c.id_number == choice_tmp.c.id_number).where(student_tmp.c.semester == choice_tmp.c.semester)
# other WHERE clauses
qry = qry.where(student_tmp.c.condition == 'A')
You can create 3 queries like this, then combine them with union_all and use the resulting query in the mapper:
view_mapper = mapper(ViewObject, my_combined_qry)
In both cases, though, you have to ensure that a primary key is properly defined on the view, and you might need to override the autoloaded view and specify the primary key explicitly (see the link above). Otherwise you will either receive an error or might not get proper results from the query.
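The combined selectable boils down to a UNION ALL of per-table joins, each with its own filter. As a rough sketch of that shape, the snippet below builds it with Python's standard sqlite3 module rather than SQLAlchemy. The sample rows and the WANTED_CHOICE mapping are made up for the example, and the _tmp pair with its IF(...) rule is left out to keep it short.

```python
import sqlite3

# choice_id wanted for each table pair, following the SELECT in the question.
WANTED_CHOICE = {"year": "4", "summer": "3"}

conn = sqlite3.connect(":memory:")
for suffix in WANTED_CHOICE:
    conn.executescript(f"""
        CREATE TABLE student_{suffix} (
            id_number TEXT, semester TEXT, stateid TEXT, condition TEXT,
            PRIMARY KEY (id_number, semester));
        CREATE TABLE choice_{suffix} (
            id_number TEXT, semester TEXT, choice_id TEXT, school TEXT, program TEXT,
            PRIMARY KEY (id_number, semester));
    """)
conn.execute("INSERT INTO student_year VALUES ('S1', '20121', 'ST1', 'A')")
conn.execute("INSERT INTO choice_year VALUES ('S1', '20121', '4', 'SC', 'P1')")
conn.execute("INSERT INTO student_summer VALUES ('S2', '20122', 'ST2', 'A')")
conn.execute("INSERT INTO choice_summer VALUES ('S2', '20122', '3', 'SC', 'P2')")
conn.execute("INSERT INTO student_summer VALUES ('S3', '20122', 'ST3', 'B')")  # wrong condition
conn.execute("INSERT INTO choice_summer VALUES ('S3', '20122', '3', 'SC', 'P3')")

# One filtered join per table pair; the filters sit inside each member of the union.
selects = [
    f"""SELECT s.id_number, s.semester, s.stateid, c.school, c.program
        FROM student_{suffix} AS s
        JOIN choice_{suffix} AS c
          ON c.id_number = s.id_number AND c.semester = s.semester
        WHERE c.choice_id = ? AND s.condition = 'A'"""
    for suffix in WANTED_CHOICE
]
sql = " UNION ALL ".join(selects) + " ORDER BY 1"
rows = conn.execute(sql, list(WANTED_CHOICE.values())).fetchall()

print(rows)
```

The pair (id_number, semester) identifies a row within each member, which is the kind of key the mapper would need to be told about.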
Answer to EDIT-2:
qry = (session.query(StudentYear, ChoiceYear).
       select_from(StudentYear).
       join(ChoiceYear).
       filter(StudentYear.condition == 'A').
       filter(ChoiceYear.choice_id == '4')
       )
The result will be a list of tuple pairs: (StudentYear, ChoiceYear).
But if you want to create a new mapped class for the query, then you can create a selectable as the sample above:
student_tmp = StudentTmp.__table__
choice_tmp = ChoiceTmp.__table__
.... (see sample code above)
This is to show what I ended up doing, any comment welcomed.
class JoinedYear(Base):
    __table__ = sqlalchemy.select(
        [
            StudentYear.id_number,
            StudentYear.semester,
            StudentYear.stateid,
            ChoiceYear.school,
            ChoiceYear.program,
        ],
        from_obj=StudentYear.__table__.join(ChoiceYear.__table__),
    ) \
    .where(StudentYear.condition == 'A') \
    .where(ChoiceYear.choice_id == '4') \
    .alias('YearView')
and I will elaborate from there...
Thanks @van
