I have two SQLAlchemy query objects (q1 and q2). Both query the same table, and I want to intersect them. Since my database is MySQL, q1.intersect(q2) throws an SQL syntax error. Is there a way to perform an intersect on MySQL in SQLAlchemy? My research pointed at subqueries, aliases and left joins, but all of those solutions are written in native SQL. I am looking for SQLAlchemy syntax.
Query:
q1 = Model1().query().filter(Model1.description.ilike('%aws%'))
q2 = Model1().query().filter(Model1.tags.ilike('%cloud%'))
I want to return q1.intersect(q2)
Also, the queries shown here are just one case of a broader set. I have a function that takes an operator (and/or) and two operands (SQLAlchemy query objects, q1 and q2), which can be different and complex for different function calls. In this case I cannot use a nested filter; I need to work with just q1 and q2.
For this simple case, you could just use two filters in the same query:
results = db.query(Model1).filter(
    Model1.description.ilike('%aws%'),
    Model1.tags.ilike('%cloud%')
)
This would return the same results as an intersect.
With two separate queries:
stmt = q2.subquery()
results = q1.outerjoin(stmt, Model1.id == stmt.c.id).filter(stmt.c.id.isnot(None))
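To cover the and/or function mentioned in the question, the join approach above can be wrapped in a small helper. This is only a sketch: combine_queries is a hypothetical name, the Model1 columns are assumed from the question, and an in-memory SQLite database stands in for MySQL so the snippet runs on its own. It uses an inner join, which should be equivalent to the outer join plus NOT NULL filter, and it assumes both queries select Model1 and that matching on the primary key is what "intersect" means here.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Model1(Base):
    # Assumed shape of the model from the question.
    __tablename__ = "model1"
    id = Column(Integer, primary_key=True)
    description = Column(String)
    tags = Column(String)


def combine_queries(operator, q1, q2):
    """Hypothetical helper: combine two Model1 queries with 'and' or 'or'."""
    if operator == "or":
        # UNION is supported by MySQL, so the built-in method can be used.
        return q1.union(q2)
    if operator == "and":
        # Emulate INTERSECT: keep q1 rows whose primary key also occurs in q2.
        sub = q2.subquery()
        return q1.join(sub, Model1.id == sub.c.id)
    raise ValueError("unsupported operator: %r" % (operator,))


engine = create_engine("sqlite://")  # in-memory stand-in for the real database
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([
    Model1(id=1, description="AWS tips", tags="cloud, infra"),
    Model1(id=2, description="aws billing", tags="finance"),
    Model1(id=3, description="gardening", tags="cloud"),
])
session.commit()

q1 = session.query(Model1).filter(Model1.description.ilike("%aws%"))
q2 = session.query(Model1).filter(Model1.tags.ilike("%cloud%"))

both_ids = sorted(m.id for m in combine_queries("and", q1, q2))
either_ids = sorted(m.id for m in combine_queries("or", q1, q2))
```

Because the helper only touches q1 and q2 as opaque query objects, it does not need to know how they were built, which is the constraint described in the question.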
With a "partially unknown query", I mean a query which is composed of sub-queries where the original SqlAlchemy sub-query objects are not available / not known to the entity that is working with the query.
Consider the following example: I have a function that produces some query that contains sub-queries, which serves as a basis for specialized queries. An example of such a function could look like this:
def get_base_query(user_id: int) -> Query:
    max_reps = get_max_reps_query().subquery()
    user_reps = get_user_reps_query(user_id).subquery()
    return (
        session
        .query(
            max_reps.c.topic.label('topic'),
            (max_reps.c.reps - user_reps.c.reps).label('reps'),
        )
        .select_from(
            max_reps.join(user_reps, max_reps.c.topic == user_reps.c.topic)
        )
    )
Some other function is going to receive that Query object and wants to extend it. This function only knows the structure of the input query result (i.e. two columns, topic and reps). Say this function needs to extend the query so that it limits the rows to those matching a certain topic.
I've tried the following ways to achieve this, but the behavior is as outlined in the comments below:
from sqlalchemy import func as F
def query_filter_topic(query: Query, topic: str) -> Query:
    # Doesn't actually filter the rows.
    query = query.filter(Exercise.topic == topic)
    # Doesn't actually filter the rows.
    query = query.filter(query.subquery().c.topic == topic)
    # Results in 0 rows.
    query = query.filter(F.topic == topic)
    ...
My hypothesis is that none of these column references match any of the columns of the outer-most SELECT of the query (because they are dynamically created columns). I was really expecting the subquery().c.topic bit to work.
Is there a "correct" way to extend the query? Given that I explicitly labeled the columns in get_base_query(), I feel like there should be a way to reference those columns.
On a side note, I would expect SqlAlchemy to throw an error rather than silently accepting these columns that it apparently can't process. The filter() calls that don't filter rows don't seem to change the Query object at all (judging by the str() representation).
The only way I could figure out to get it working is the below, but it doesn't feel like the right way to do it.
def query_filter_topic(query: Query, topic: str) -> Query:
    query = query.subquery()
    query = session.query(*query.c).filter(query.c.topic == topic)
    return query
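For reference, here is a self-contained sketch of that wrapping approach. The Exercise model, the sample rows and the simplified get_base_query are assumptions made up for illustration (the real base query joins two subqueries), and SQLite in memory is used only so the snippet runs. The point it shows is that once the query is turned into a subquery, its labels become real columns on .c that the outer query can filter on.

```python
from sqlalchemy import Column, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Exercise(Base):
    # Hypothetical table, used only to have something to aggregate.
    __tablename__ = "exercise"
    id = Column(Integer, primary_key=True)
    topic = Column(String)
    reps = Column(Integer)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([
    Exercise(topic="legs", reps=10),
    Exercise(topic="legs", reps=20),
    Exercise(topic="arms", reps=5),
])
session.commit()


def get_base_query():
    # Simplified stand-in for the base query: two labeled columns.
    return session.query(
        Exercise.topic.label("topic"),
        func.sum(Exercise.reps).label("reps"),
    ).group_by(Exercise.topic)


def query_filter_topic(query, topic):
    # Wrap the unknown query; its labels are now addressable as sub.c.<label>.
    sub = query.subquery()
    return session.query(*sub.c).filter(sub.c.topic == topic)


rows = [tuple(r) for r in query_filter_topic(get_base_query(), "legs")]
```

The receiving function only relies on the label names (topic, reps), not on the tables or subqueries the base query was built from.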
SQLAlchemy produces an incorrect SQL query in my application, but I can't determine under which conditions.
I use Flask-SQLAlchemy, and initially it is just MyModel.query, which is rendered as a simple SELECT with JOINs. But when the .limit() method is applied, the query is transformed: it uses a subquery to fetch the main objects and only then applies the JOINs. The problem is the ORDER BY clause, which stays the same and ignores the subquery definition.
Here's an example (I've simplified the selected fields):
-- Initially
SELECT *
FROM customer_rates
LEFT OUTER JOIN seasons AS seasons_1 ON seasons_1.id = customer_rates.season_id
LEFT OUTER JOIN users AS users_1 ON users_1.id = customer_rates.customer_id
-- other joins ...
ORDER BY customer_rates.id, customer_rates.id
-- Then .limit()
SELECT anon_1.*, *
FROM (
SELECT customer_rates.*
FROM customer_rates
LIMIT :param_1) AS anon_1
LEFT OUTER JOIN seasons AS seasons_1 ON seasons_1.id = anon_1.customer_rates_season_id
LEFT OUTER JOIN users AS users_1 ON users_1.id = anon_1.customer_rates_customer_id
-- other joins
ORDER BY customer_rates.id, customer_rates.id
This query gives the following error:
ProgrammingError: (psycopg2.ProgrammingError) missing FROM-clause entry for table "customer_rates"
The last line in query should be:
ORDER BY anon_1.customer_rates_id
The code that produces these queries is part of a large application. I've tried to reproduce this from scratch in a small Flask application, but I can't. In the small application it always uses a JOIN.
So I need to know when SQLAlchemy decides to use a subquery.
I use Python 2.7 and PostgreSQL 9.
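The answer is pretty straightforward. SQLAlchemy uses a subquery when a joined table has a many-to-one relationship to the queried model, that is, when one queried row can match several joined rows. To still return the correct number of results, it applies the limit to the queried rows inside the subquery and joins afterwards.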
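One way to see this behaviour in isolation is to compile two queries and compare the SQL. The sketch below assumes the joins come from joined eager loading (joinedload or lazy="joined"), which the question does not state explicitly; the Customer and Rate models are hypothetical. No database is needed because the queries are only rendered with str().

```python
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

Base = declarative_base()


class Customer(Base):
    # Hypothetical parent model: one customer has many rates.
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)
    rates = relationship("Rate", back_populates="customer")


class Rate(Base):
    # Hypothetical child model: each rate points at one customer.
    __tablename__ = "rate"
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customer.id"))
    customer = relationship("Customer", back_populates="rates")


session = Session()  # no engine bound: the queries are compiled, never executed

# Eager-loading a collection: each Customer row may fan out into many joined
# rows, so the LIMIT has to be applied inside a subquery.
collection_sql = str(
    session.query(Customer).options(joinedload(Customer.rates)).limit(10)
)

# Eager-loading a single related object: at most one joined row per Rate,
# so a plain JOIN with LIMIT is enough.
scalar_sql = str(
    session.query(Rate).options(joinedload(Rate.customer)).limit(10)
)

print(collection_sql)
print(scalar_sql)
```

If the small reproduction only has the second kind of relationship, that would explain why it always renders a plain JOIN.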
As explained in this question, you can use string literals to do order by in unions.
For example, this works with Oracle:
querypart1 = select([t1.c.col1.label("a")]).order_by(t1.c.col1).limit(limit)
querypart2 = select([t2.c.col2.label("a")]).order_by(t2.c.col2).limit(limit)
query = querypart1.union_all(querypart2).order_by("a").limit(limit)
The order-by can take a string literal, which is the name of the column in the union result.
(There are gazillions of rows in partitioned tables and I'm trying to paginate the damn things)
When running against SQLite3, however, this generates an exception:
sqlalchemy.exc.OperationalError: (OperationalError) near "ORDER": syntax error
How can you order by the results of a union?
The queries that are part of a union query must not be sorted.
To be able to use limits inside a compound query, you must wrap the individual queries inside a separate subquery:
SELECT * FROM (SELECT ... LIMIT ...)
UNION ALL
SELECT * FROM (SELECT ... LIMIT ...)
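In SQLAlchemy, turn each limited SELECT into a subquery and select from it before combining:
q1 = select(...).limit(...).subquery()
q2 = select(...).limit(...).subquery()
query = q1.select().union_all(q2.select())...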
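The difference can be checked against SQLite directly with the standard sqlite3 module. This is a small sketch with made-up tables t1 and t2 and sample values; it shows the unwrapped form being rejected and the wrapped form accepting a final ORDER BY on the shared column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (col1 INTEGER);
    CREATE TABLE t2 (col2 INTEGER);
    INSERT INTO t1 VALUES (5), (1), (3);
    INSERT INTO t2 VALUES (4), (2), (6);
""")

# Sorting and limiting a part of the compound query directly is rejected.
direct = (
    "SELECT col1 AS a FROM t1 ORDER BY col1 LIMIT 2 "
    "UNION ALL "
    "SELECT col2 AS a FROM t2 ORDER BY col2 LIMIT 2"
)
try:
    conn.execute(direct)
    direct_failed = False
except sqlite3.OperationalError:
    direct_failed = True

# Wrapping each part in its own subquery keeps the inner ORDER BY/LIMIT legal,
# and the outer ORDER BY can refer to the shared column name "a".
wrapped = (
    "SELECT * FROM (SELECT col1 AS a FROM t1 ORDER BY col1 LIMIT 2) "
    "UNION ALL "
    "SELECT * FROM (SELECT col2 AS a FROM t2 ORDER BY col2 LIMIT 2) "
    "ORDER BY a LIMIT 3"
)
rows = [row[0] for row in conn.execute(wrapped)]
```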
If I've got an SQLAlchemy ORM query:
admin_users = Session.query(User).filter_by(is_admin=True)
Is it possible to modify the columns returned by that query?
For example, so that I could select only the User.id column, and use that in a sub query:
admin_email_addresses = Session.query(EmailAddress)\
    .filter(EmailAddress.user_id.in_(admin_users.select_columns(User.id)))
Note: the .values() method will not work, as it executes the query and returns an iterable of results (so, for example, EmailAddress.user_id.in_(admin_users.values(User.id)) will perform two queries, not one).
I know that I could modify the first query to be Session.query(User.id), but I'm specifically wondering how I could modify the columns returned by a query.
I feel your pain on the values() thing. In 0.6.5 I added with_entities() which is just like values() except doesn't iterate:
q = q.with_entities(User.id)
Assuming that your EmailAddress.user_id defines a ForeignKey, the query below will do the job more efficiently than the IN operator:
admin_email_addresses = session.query(EmailAddress).\
join(User).filter(User.is_admin==True)
If you do not have a ForeignKey (although you should), you can specify the join condition explicitly:
admin_email_addresses = session.query(EmailAddress).\
join(User, User.id==EmailAddress.user_id).filter(User.is_admin==True)
But if you really would like to do it with in_ operator, here you go (note the subquery):
subq = session.query(User.id).filter(User.is_admin==True).subquery()
admin_email_addresses = session.query(EmailAddress).\
filter(EmailAddress.user_id.in_(subq))
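Putting the two answers together, an existing query can be narrowed with with_entities() and then used as the IN subquery without being executed first. The sketch below is an assumption-laden illustration: the User and EmailAddress columns are guessed from the question, the table names and sample rows are made up, and SQLite in memory is used so it runs standalone.

```python
from sqlalchemy import Boolean, Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    is_admin = Column(Boolean, default=False)


class EmailAddress(Base):
    __tablename__ = "email_addresses"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"))
    email = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([
    User(id=1, is_admin=True),
    User(id=2, is_admin=False),
    EmailAddress(user_id=1, email="root@example.com"),
    EmailAddress(user_id=2, email="guest@example.com"),
])
session.commit()

# The pre-existing query that selects whole User entities.
admin_users = session.query(User).filter_by(is_admin=True)

# with_entities() returns a new Query selecting only User.id; nothing runs yet.
admin_ids = admin_users.with_entities(User.id)

# Passing a subquery to in_() follows the answer above; recent SQLAlchemy
# versions may emit a coercion warning and prefer an explicit select().
admin_email_addresses = session.query(EmailAddress).filter(
    EmailAddress.user_id.in_(admin_ids.subquery())
)

emails = [address.email for address in admin_email_addresses]
```

Only the final iteration hits the database, so this is a single SELECT with a nested subquery rather than two round trips.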
I have a very large database that I am working with, and I need to know how to select a large set of ids which don't have any real pattern to them. This is the segment of code I have so far:
longIdList = [1, 3, 5, 8, ...]
for id in longIdList:
    sql = "select * from Table where id = %s" % id
    result = cursor.execute(sql)
    print(result.fetchone())
I was thinking that there must be a quicker way of doing this. My script needs to search through a database that has over 4 million ids. Is there a way to use a single select command to grab them all in one shot? Could I use the where clause with a list of ids? Thanks
Yes, you can use SQL's IN() predicate to compare a column to a set of values. This is standard SQL and it's supported by every SQL database.
There may be a practical limit to the number of values you can put in an IN() predicate before it becomes too inefficient or simply exceeds a length limit on SQL queries. The largest practical list of values depends on what database you use (in Oracle it's 1000, MS SQL Server it's around 2000). My feeling is that if your list exceeds a few dozen values, I'd seek another solution.
For example, @ngroot suggests using a temp table in his answer. For analysis of this solution, see this blog by StackOverflow regular @Quassnoi: Passing parameters in MySQL: IN list vs. temporary table.
Parameterizing a list of values into an SQL query in a safe way can be tricky. You should be mindful of the risk of SQL injection.
Also see this popular question on Stack Overflow: Parameterizing a SQL IN clause?
You can use IN to look for multiple items simultaneously:
SELECT * FROM Table WHERE id IN (x, y, z, ...)
So maybe something like:
sql = "select * from Table where id in (%s)" % (', '.join(str(id) for id in longIdList))
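If the ids come from anywhere untrusted, or the list is long, a variant with bound parameters and chunking may be safer. The sketch below uses the standard sqlite3 module, whose placeholder is ? (other drivers use %s or named styles); fetch_by_ids, the items table and the chunk size of 500 are all assumptions for illustration, not limits taken from any particular database.

```python
import sqlite3


def fetch_by_ids(conn, ids, chunk_size=500):
    """Fetch rows whose id is in `ids`, binding the values in chunks."""
    ids = list(ids)
    rows = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        # Only the placeholders are formatted into the SQL text;
        # the values themselves are passed as bound parameters.
        placeholders = ", ".join("?" for _ in chunk)
        sql = "SELECT id, name FROM items WHERE id IN (%s)" % placeholders
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, "item%d" % i) for i in range(1, 11)],
)

result = sorted(fetch_by_ids(conn, [1, 3, 5, 8], chunk_size=3))
```

Chunking keeps each statement under whatever parameter or length limit the database imposes, at the cost of a few extra round trips.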
Serialize the list in some fashion (comma-separated or XML would be reasonable choices), then have a stored procedure on the other side that will deserialize the list into a temp table. You can then do an INNER JOIN against the temp table.
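A rough sketch of the temp-table idea, again with sqlite3: SQLite has no stored procedures, so here the client fills the temp table itself with executemany instead of sending a serialized list to be unpacked on the server. The names fetch_via_temp_table, wanted_ids and items are hypothetical.

```python
import sqlite3


def fetch_via_temp_table(conn, ids):
    """Load ids into a temp table, then INNER JOIN against it."""
    conn.execute("CREATE TEMP TABLE wanted_ids (id INTEGER PRIMARY KEY)")
    try:
        # set() removes duplicates so the primary key constraint is not violated.
        conn.executemany(
            "INSERT INTO wanted_ids (id) VALUES (?)",
            [(i,) for i in set(ids)],
        )
        return conn.execute(
            "SELECT items.id, items.name FROM items "
            "INNER JOIN wanted_ids ON wanted_ids.id = items.id "
            "ORDER BY items.id"
        ).fetchall()
    finally:
        # Drop the temp table so the helper can be called again.
        conn.execute("DROP TABLE wanted_ids")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, "item%d" % i) for i in range(1, 11)],
)

result = fetch_via_temp_table(conn, [8, 3, 3, 42])
```

Ids that do not exist in the main table (42 here) simply drop out of the join, which matches what an IN list would do.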