sqlalchemy using INTERSECT and UNNEST - python

I'm trying to translate a raw SQL to sqlalchemy core/orm but I'm having some difficulties. Here is the SQL:
SELECT
(SELECT UNNEST(MyTable.my_array_column)
INTERSECT
SELECT UNNEST(ARRAY['VAL1', 'VAL2']::varchar[])) AS matched
FROM
MyTable
WHERE
my_array_column && ARRAY['VAL1', 'VAL2']::varchar[];
The following query, gives me a FROM clause which I don't need in my nested SELECT:
matched = select([func.unnest(MyTable.my_array_column)]).intersect(select([func.unnest('VAL1', 'VAL2')]))
# SELECT unnest(MyTable.my_array_colum) AS unnest_1
# FROM MyTable INTERSECT SELECT unnest(%(unnest_3)s, %(unnest_4)s) AS unnest_2
How can I tell the select to not include the FROM clause? Note that func.unnest() only accepts a column. So I cannot use func.unnest('my_array_column').

Referring to a table of an enclosing query in a subquery is the process of correlation, which SQLAlchemy attempts to do automatically. In this case, it doesn't quite work, I believe, because your INTERSECT query is a "selectable", not a scalar value, which SQLAlchemy attempts to put in the FROM list instead of the SELECT list.
The solution is twofold. We need to make SQLAlchemy put the INTERSECT query in the SELECT list by applying a label, and make it correlate MyTable correctly:
select([
select([func.unnest(MyTable.my_array_column)]).correlate(MyTable)
.intersect(select([func.unnest('VAL1', 'VAL2')]))
.label("matched")
]).select_from(MyTable)
# SELECT (SELECT unnest("MyTable".my_array_column) AS unnest_1 INTERSECT SELECT unnest(%(unnest_3)s, %(unnest_4)s) AS unnest_2) AS matched
# FROM "MyTable"

Related

jaydebeapi Getting column alias names

Is there a way to return the aliased column names from a sql query returned from JayDeBeApi?
For example, I have the following query:
sql = """ SELECT visitorid AS id_alias FROM table LIMIT 1 """
I then run the following (connect_to_vdm() establishes a connection to my DB):
curs = connect_to_vdm().cursor()
curs.execute(sql)
vals = curs.fetchall()
I normally retrieve column names like so:
desc = curs.description
column_names = [col[0] for col in desc]
This returns the original column name "visitorid" and not the alias specified in the query "id_alias".
I know I could swap the names for the value in Python, but hoping to be able to have this done within the query since it is already defined in the Select statement. This behaves as expected in a SQL client, but I cannot seem to get the Aliases to return when using python/JayDeBeApi. Is there a way to do this using JayDeBeApi?
EDIT:
I have discovered that structuring my query with a CTE seems to help fix the problem, but still wondering if there is a more straightforward solution out there. Here is how I rewrote the same query:
sql = """ WITH cte (id_alias) AS (SELECT visitorid AS id_alias FROM table LIMIT 1) SELECT id_alias from cte"""
I was able to fix this using a CTE (Common Table Expression)
sql = """ WITH cte (id_alias) AS (SELECT visitorid AS id_alias FROM table LIMIT 1) SELECT id_alias from cte"""
Hat tip to pybokeh on Github, but this worked for me.
According to IBM (here and here), the behavior of JDBC drivers changed at some point. Bizarrely, the column aliases display just fine when using a tool like DBVisualizer, but not by querying through jaydebeapi.
To fix, add the following to the end of your DB URL:
:useJDBC4ColumnNameAndLabelSemantics=false;
Example:
jdbc:db2://[DBSERVER]:[PORT]/[DBNAME]:useJDBC4ColumnNameAndLabelSemantics=false;

When SQLAlchemy decides to use subquery with .limit() method?

I have an error, when SQLAlchemy produced wrong SQL query, but I can't determine conditions.
I use Flask-SQLAlchemy and initially it's a just MyModel.query and it represented by simple SELECT with JOINs. But when .limit() method is applied, it transforms and uses subquery for fetch main objects and only then apply JOINs. The problem is in ORDER BY statement, which remains the same and ignores the subquery definition.
Here's example and I've simplify select fields:
-- Initially
SELECT *
FROM customer_rates
LEFT OUTER JOIN seasons AS seasons_1 ON seasons_1.id = customer_rates.season_id
LEFT OUTER JOIN users AS users_1 ON users_1.id = customer_rates.customer_id
-- other joins ...
ORDER BY customer_rates.id, customer_rates.id
-- Then .limit()
SELECT anon_1.*, *
FROM (
SELECT customer_rates.*
FROM customer_rates
LIMIT :param_1) AS anon_1
LEFT OUTER JOIN seasons AS seasons_1 ON seasons_1.id = anon_1.customer_rates_season_id
LEFT OUTER JOIN users AS users_1 ON users_1.id = anon_1.customer_rates_customer_id
-- other joins
ORDER BY customer_rates.id, customer_rates.id
And this query gives following error:
ProgrammingError: (psycopg2.ProgrammingError) missing FROM-clause entry for table "customer_rates"
The last line in query should be:
ORDER BY anon_1.customer_rates_id
The code, that produces this queries is a part of large application. I've tried to implement this from scratch in a small flask application, But I can't reproduce it. In small application it always uses a JOIN.
So I need to know, when SQLAlchemy decides to use subquery.
I use python 2.7 and PostgreSQL 9
The answer is pretty straightforward. It uses subquery when it joined table has many-to-one relations with queried model. So for producing correct number of results it limits the queried rows in the subquery

SQLite Inner Join with Limit on Left Table

Firstly, let me describe a scenario similar to the one I am facing; to better explain my issue. In this scenario, I am creating a system which needs to select 'n' random blog posts from a table and then get all the replies for those selected posts.
Imagine my structure like so:
blog_posts(id INTEGER PRIMARY KEY, thepost TEXT)
blog_replies(id INTEGER PRIMARY KEY, postid INTEGER FOREIGN KEY REFERENCES blog_posts(id), thereply TEXT)
This is the current SQL I have, but am getting an error:
SELECT blog_post.id, blog_post.thepost, blog_replies.id, blog_replies.thereply
FROM (SELECT blog_post.id, blog_post.thepost FROM blog_post ORDER BY RANDOM() LIMIT ?)
INNER JOIN blog_replies
ON blog_post.id=blog_replies_options.postid;
Here is the error:
sqlite3.OperationalError: no such column: hmquestion.id
Your query needs an alias added to the subquery. This will allow you to reference the fields from your subquery within the outer query:
SELECT hmquestion.id, hmquestion.question,
hmquestion_options.id, hmquestion_options.option
FROM (SELECT hmquestion.id, hmquestion.question
FROM hmquestion ORDER BY RANDOM() LIMIT ?) AS hmquestion <--Add Alias Here
INNER JOIN hmquestion_options
ON hmquestion.id=hmquestion_options.questionid;
As is, you're outer query doesn't know what hmquestion references.

Sort by a column in a union query in SqlAlchemy SQLite

As explained in this question, you can use string literals to do order by in unions.
For example, this works with Oracle:
querypart1 = select([t1.c.col1.label("a")]).order_by(t1.c.col1).limit(limit)
querypart2 = select([t2.c.col2.label("a")]).order_by(t2.c.col2).limit(limit)
query = querypart1.union_all(querypart2).order_by("a").limit(limit)
The order-by can take a string literal, which is the name of the column in the union result.
(There are gazillions of rows in partitioned tables and I'm trying to paginate the damn things)
When running against SQLite3, however, this generates an exception:
sqlalchemy.exc.OperationalError: (OperationalError) near "ORDER": syntax error
How can you order by the results of a union?
The queries that are part of a union query must not be sorted.
To be able to use limits inside a compound query, you must wrap the individual queries inside a separate subquery:
SELECT * FROM (SELECT ... LIMIT ...)
UNION ALL
SELECT * FROM (SELECT ... LIMIT ...)
q1 = select(...).limit(...).subquery()
q2 = select(...).limit(...).subquery()
query = q1.union_all(q2)...

SqlAlchemy select with max, group_by and order_by

I have to list the last modified resources for each group, for that I can do this query:
model.Session.query(
model.Resource, func.max(model.Resource.last_modified)
).group_by(model.Resource.resource_group_id).order_by(
model.Resource.last_modified.desc())
But SqlAlchemy complains with:
ProgrammingError: (ProgrammingError) column "resource.id" must appear in
the GROUP BY clause or be used in an aggregate function
How I can select only resource_group_id and last_modified columns?
In SQL what I want is this:
SELECT resource_group_id, max(last_modified) AS max_1
FROM resource GROUP BY resource_group_id ORDER BY max_1 DESC
model.Session.query(
model.Resource.resource_group_id, func.max(model.Resource.last_modified)
).group_by(model.Resource.resource_group_id).order_by(
func.max(model.Resource.last_modified).desc())
You already got it, but I'll try to explain what's going on with the original query for future reference.
In sqlalchemy if you specified query(model.Resource, ...), a model reference, it will list each column on the resource table in the generated SQL select statement, so your original query would look something like:
SELECT resource.resource_group_id AS resource_group_id,
resource.extra_column1 AS extra_column1,
resource.extra_column2 AS extra_column2,
...
count(resource.resource_group_id) AS max_1
GROUP BY resource_group_id ORDER BY max_1 DESC;
This won't work with a GROUP BY.
A common way to avoid this is to specify what columns you want to select explicitly by adding them to the query method .query(model.Resource.resource_group_id)

Categories