I have a function that returns a query that fetches «New priorities for emails» for a given account id.
First it selects a domain name for that account, and then selects a data structure for it.
Everything should be OK IMO, but it isn't: SQLAlchemy generates SQL that is syntactically wrong, and I can't understand how to fix it. Here are the samples:
def unprocessed_by_account_id(account_id: str):
    account_domain = select(
        [tables.organizations.c.organization_id]
    ).select_from(
        tables.accounts.join(
            tables.email_addresses,
            tables.accounts.c.account_id == tables.email_addresses.c.email_address,
        ).join(tables.organizations)
    ).where(
        tables.accounts.c.account_id == account_id,
    )
    domain_with_subdomains = concat('%', account_domain)

    fields = [
        tables.users.c.first_name,
        …
        tables.priorities.c.name,
    ]
    fromclause = tables.users.join(
        …
    ).join(tables.organizations)
    whereclause = and_(
        …
        tables.organizations.c.organization_id.notlike(
            domain_with_subdomains),
    )
    stmt = select(fields).select_from(fromclause).where(whereclause)
    return stmt

print(unprocessed_by_account_id('foo'))
So it generates:
SELECT
users.first_name,
…
priorities.name
FROM (SELECT organizations.organization_id AS organization_id
FROM accounts
JOIN email_addresses
ON accounts.account_id = email_addresses.email_address
JOIN organizations
ON organizations.organization_id = email_addresses.organization_id
WHERE accounts.account_id = :account_id_1), users
JOIN
…
JOIN organizations
ON organizations.organization_id = email_addresses.organization_id
WHERE emails.account_id = :account_id_2 AND
priorities_new_emails.status = :status_1 AND
organizations.organization_id NOT LIKE
concat(:concat_1, (SELECT organizations.organization_id
FROM accounts
JOIN email_addresses ON accounts.account_id =
email_addresses.email_address
JOIN organizations
ON organizations.organization_id =
email_addresses.organization_id
WHERE accounts.account_id = :account_id_1))
But the first
(SELECT organizations.organization_id AS organization_id
FROM accounts
JOIN email_addresses
ON accounts.account_id = email_addresses.email_address
JOIN organizations
ON organizations.organization_id = email_addresses.organization_id
WHERE accounts.account_id = :account_id_1)
is redundant here and produces:
[2017-05-29 23:49:51] [42601] ERROR: subquery in FROM must have an alias
[2017-05-29 23:49:51] Hint: For example, FROM (SELECT ...) [AS] foo.
[2017-05-29 23:49:51] Position: 245
I tried using account_domain = account_domain.cte(), but no luck, except that the subquery moved into a WITH clause as expected.
Also I tried with_only_columns with no effect at all.
I think SQLAlchemy is adding this statement because it sees the subquery inside the WHERE clause and assumes that without it in FROM the filtering would fail, but I'm not sure.
I should also mention that in a previous version of the code the statement was almost the same, except there was no concat('%', account_domain) and notlike was !=.
I also tried inserting an alias here and there, but had no success with that either. And if I manually delete that first subquery from the generated SQL, I get the expected results.
Any help is appreciated, thank you.
If you're using a subquery as a value, you need to declare it with as_scalar():
domain_with_subdomains = concat('%', account_domain.as_scalar())
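For what it's worth, a minimal sketch under the assumption that you are on SQLAlchemy 1.4 or newer, where as_scalar() is deprecated in favour of scalar_subquery(); only this one line of the original function changes:
# same intent as above; scalar_subquery() marks the SELECT as a single value,
# so it is no longer pulled into the outer FROM clause
domain_with_subdomains = concat('%', account_domain.scalar_subquery())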
Related
Answering this question, I found out that window functions can't be combined with filter() (technically they can, but the filter clause affects the window). The hint is to wrap the window function in an inner query, so that the final SQL looks like this (as I understand it):
SELECT * FROM (
SELECT *, *window_function* FROM TABLE)
WHERE *filtering_conditions*
The question is: how can I write this query with the Django ORM?
Another solution is Common Table Expressions (CTE), and with the help of django-cte, you could achieve what you want:
from django.db.models import Window
from django_cte import With

cte = With(
    YourModel.objects.annotate(
        your_window_function=Window(...),
    )
)
qs = cte.queryset().with_cte(cte).filter(your_window_function='something')
Which translates roughly to:
WITH cte AS (
    SELECT *, WINDOW(...) AS your_window_function
    FROM yourmodel
)
SELECT *
FROM cte
WHERE cte.your_window_function = 'something'
There are developers interested in solving it but it's not something possible with the ORM right now.
One proposed solution would be to add a QuerySet.subquery() or .wrap() method that pushes the queryset within a subquery so it can then be filtered.
Update: Django 4.2 will support it.
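For illustration, a hedged sketch of what that looks like on Django 4.2+, where filtering on window-function annotations is allowed; the model and field names below are made up, not from the original question:
from django.db.models import F, Window
from django.db.models.functions import RowNumber

# Django 4.2+ wraps the query in a subquery automatically, so the filter
# no longer affects the window itself.
qs = (
    YourModel.objects
    .annotate(
        rn=Window(
            expression=RowNumber(),
            partition_by=[F("category")],
            order_by=F("created").desc(),
        )
    )
    .filter(rn=1)  # keep only the first row per category
)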
You need to use a raw query in order to do multiple queries at once. For further information, see the Django documentation.
for p in Person.objects.raw('''
        SELECT * FROM (SELECT *, *window_function* FROM TABLE)
        WHERE *filtering_conditions*'''):
    print(p)
    # John Smith
    # Jane Jones
Another thing you can do is the following.
models.py
class Category(models.Model):
    name = models.CharField(max_length=100)


class Hero(models.Model):
    # ...
    name = models.CharField(max_length=100)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    benevolence_factor = models.PositiveSmallIntegerField(
        help_text="How benevolent this hero is?",
        default=50,
    )
querySet.py
from django.db.models import OuterRef, Subquery

hero_qs = Hero.objects.filter(category=OuterRef("pk")).order_by("-benevolence_factor")
Category.objects.all().annotate(
    most_benevolent_hero=Subquery(hero_qs.values('name')[:1])
)
The generated SQL would look like this:
SELECT "entities_category"."id",
"entities_category"."name",
(SELECT U0."name"
FROM "entities_hero" U0
WHERE U0."category_id" = ("entities_category"."id")
ORDER BY U0."benevolence_factor" DESC
LIMIT 1) AS "most_benevolent_hero"
FROM "entities_category"
Each row in my table has a date. The date is not unique. The same date is present more than once.
I want to get all objects with the youngest date.
My solution works, but I am not sure if this is an elegant SQLAlchemy way.
query = _session.query(Table._date) \
    .order_by(Table._date.desc()) \
    .group_by(Table._date)

# this is the youngest date (type is datetime.date)
young = query.first()

query = _session.query(Table).filter(Table._date == young)
result = query.all()
Isn't there a way to put all this in one query object or something like that?
You need a HAVING clause, and you need to import the max function; then your query will be:
from sqlalchemy import func

stmt = _session.query(Table) \
    .group_by(Table._date) \
    .having(Table._date == func.max(Table._date))
This produces a sql statement like the following.
SELECT my_table.*
FROM my_table
GROUP BY my_table._date
HAVING my_table._date = MAX(my_table._date)
If you construct your SQL statement with a select(), you can examine the SQL produced in your case using the following. (I'm not sure if this would work with Query objects.)
str(stmt)
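As a hedged aside (not part of the original answer): for ORM Query objects the equivalent also works, e.g.:
qry = _session.query(Table).group_by(Table._date)
print(str(qry))       # renders the SQL with parameter placeholders
print(qry.statement)  # the underlying Core select() construct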
Two ways of doing this using a sub-query:
from sqlalchemy import func
from sqlalchemy.orm import aliased

# note: we do not need to alias, but do so in order to specify `name`
T1 = aliased(MyTable, name="T1")

# version-1:
subquery = (session.query(func.max(T1._date).label("max_date"))
            .as_scalar()
            )

# version-2:
subquery = (session.query(T1._date.label("max_date"))
            .order_by(T1._date.desc())
            .limit(1)
            .as_scalar()
            )

qry = session.query(MyTable).filter(MyTable._date == subquery)
results = qry.all()
The output should be similar to:
# version-1
SELECT my_table.id AS my_table_id, my_table.name AS my_table_name, my_table._date AS my_table__date
FROM my_table
WHERE my_table._date = (
SELECT max("T1"._date) AS max_date
FROM my_table AS "T1")
# version-2
SELECT my_table.id AS my_table_id, my_table.name AS my_table_name, my_table._date AS my_table__date
FROM my_table
WHERE my_table._date = (
SELECT "T1"._date AS max_date
FROM my_table AS "T1"
ORDER BY "T1"._date DESC LIMIT ? OFFSET ?
)
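As a side note, a hedged sketch of the same single query in 2.0-style SQLAlchemy (assuming Table is the mapped class from the question; scalar_subquery() is the newer spelling of as_scalar()):
from sqlalchemy import select, func

# correlate the max date as a scalar subquery and filter on it in one statement
max_date = select(func.max(Table._date)).scalar_subquery()
stmt = select(Table).where(Table._date == max_date)
results = _session.execute(stmt).scalars().all()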
Django throws a DatabaseError when I try to use a merged queryset. My code is:
assetsNetwork = Asset.objects.filter(client=myClient, module__label__in=network_label_list)
vulnsNetworkRaw = Vuln.objects.none()
for asset in assetsNetwork:
    vulnsNetworkRaw = vulnsNetworkRaw | asset.latest_vulns
    logging.debug("+++%s+++" % vulnsNetworkRaw)
The error message is
DatabaseError: more than one row returned by a subquery used as an expression
The latest_vulns property is:
@property
def latest_scan(self):
    from arachni.models import WebScan, Vulns as WebVuln
    my_module = self.module
    try:
        return Scan.objects.filter(assets__id=self.id, status='Audit Complete').latest('completed_Date')
    except:
        return Scan.objects.none()

@property
def latest_vulns(self):
    from arachni.models import WebScan, Vulns as WebVuln
    latest_scan = self.latest_scan
    return Vuln.objects.filter(scan=latest_scan, host=self.IP_Address)
Query:
2012-08-07 16:44:38 EDT STATEMENT: SELECT "pegasus_vuln"."id", "pegasus_vuln"."nvt_id", "pegasus_vuln"."scan_id", "pegasus_vuln"."host", "pegasus_vuln"."port", "pegasus_vuln"."risk_factor", "pegasus_vuln"."cvss_score", "pegasus_vuln"."status", "pegasus_vuln"."change", "pegasus_vuln"."comment", "pegasus_vuln"."description", "pegasus_vuln"."solution", "pegasus_vuln"."_order" FROM "pegasus_vuln" WHERE (("pegasus_vuln"."host" = '192.168.2.251' AND "pegasus_vuln"."scan_id" = 95 ) OR ("pegasus_vuln"."host" = '192.168.2.5' AND "pegasus_vuln"."scan_id" = (SELECT U0."id" FROM "pegasus_scan" U0)) OR ("pegasus_vuln"."host" = '10.1.10.244' AND "pegasus_vuln"."scan_id" = 109 ) OR ("pegasus_vuln"."host" = '192.168.2.5' AND "pegasus_vuln"."scan_id" = (SELECT U0."id" FROM "pegasus_scan" U0)) OR ("pegasus_vuln"."host" = '192.168.2.248' AND "pegasus_vuln"."scan_id" = (SELECT U0."id" FROM "pegasus_scan" U0))) ORDER BY "pegasus_vuln"."_order" ASC LIMIT 21
2012-08-07 16:44:38 EDT ERROR: more than one row returned by a subquery used as an expression
It logs successfully several times, but then the error is also raised on the logging line. Could anybody help me? Thanks a lot.
Check for code like this in your SQL. You need to use the IN operator if more than one row can be returned by the nested SELECT.
"pegasus_vuln"."scan_id" = (SELECT U0."id" FROM "pegasus_scan" U0))
The problem has been solved, weirdly. I added logging in latest_vulns to evaluate the queryset, and then everything worked fine. It keeps working even after I removed the logging.
I'm trying to adapt part of a MySQLdb application to SQLAlchemy using the declarative base. I'm only beginning with SQLAlchemy.
The legacy tables are defined something like:
student: id_number*, semester*, stateid, condition, ...
choice: id_number*, semester*, choice_id, school, program, ...
We have 3 tables for each of them (student_tmp, student_year, student_summer, choice_tmp, choice_year, choice_summer), so each pair of tables with a given suffix (_tmp, _year, _summer) contains information for a specific moment.
select *
from `student_tmp`
inner join `choice_tmp` using (`id_number`, `semester`)
My problem is that the information important to me requires the equivalent of the following select:
SELECT t.*
FROM (
(
SELECT st.*, ct.*
FROM `student_tmp` AS st
INNER JOIN `choice_tmp` as ct USING (`id_number`, `semester`)
WHERE (ct.`choice_id` = IF(right(ct.`semester`, 1)='1', '3', '4'))
AND (st.`condition` = 'A')
) UNION (
SELECT sy.*, cy.*
FROM `student_year` AS sy
INNER JOIN `choice_year` as cy USING (`id_number`, `semester`)
WHERE (cy.`choice_id` = 4)
AND (sy.`condition` = 'A')
) UNION (
SELECT ss.*, cs.*
FROM `student_summer` AS ss
INNER JOIN `choice_summer` as cs USING (`id_number`, `semester`)
WHERE (cs.`choice_id` = 3)
AND (ss.`condition` = 'A')
)
) as t
* is used to shorten the select; I'm actually only querying about 7 columns out of the 50 available.
This information is used in many flavors... "Do I have new students? Do I still have all students from a given date? Which students are subscribed after the given date? etc..." The result of this select statement is to be saved in another database.
Would it be possible for me to achieve this with a single view-like class? The information is read-only, so I don't need to be able to modify/create/delete. Or do I have to declare a class for each table (ending up with 6 classes) and remember to apply the filter every time I query?
Thanks for pointers.
EDIT: I don't have modification access to the database (I cannot create a view). Both databases may not be on the same server (so I cannot create a view on my second DB).
My concern is to avoid the full table scan before filtering on condition and choice_id.
EDIT 2: I've set up declarative classes like this:
class BaseStudent(object):
    id_number = sqlalchemy.Column(sqlalchemy.String(7), primary_key=True)
    semester = sqlalchemy.Column(sqlalchemy.String(5), primary_key=True)
    unique_id_number = sqlalchemy.Column(sqlalchemy.String(7))
    stateid = sqlalchemy.Column(sqlalchemy.String(12))
    condition = sqlalchemy.Column(sqlalchemy.String(3))


class Student(BaseStudent, Base):
    __tablename__ = 'student'

    choices = orm.relationship('Choice', backref='student')

# class StudentYear(BaseStudent, Base): ...
# class StudentSummer(BaseStudent, Base): ...


class BaseChoice(object):
    id_number = sqlalchemy.Column(sqlalchemy.String(7), primary_key=True)
    semester = sqlalchemy.Column(sqlalchemy.String(5), primary_key=True)
    choice_id = sqlalchemy.Column(sqlalchemy.String(1))
    school = sqlalchemy.Column(sqlalchemy.String(2))
    program = sqlalchemy.Column(sqlalchemy.String(5))


class Choice(BaseChoice, Base):
    __tablename__ = 'choice'
    __table_args__ = (
        sqlalchemy.ForeignKeyConstraint(
            ['id_number', 'semester'],
            [Student.id_number, Student.semester],
        ),
    )

# class ChoiceYear(BaseChoice, Base): ...
# class ChoiceSummer(BaseChoice, Base): ...
Now, the query that gives me correct SQL for one set of table is:
q = session.query(StudentYear, ChoiceYear) \
    .select_from(StudentYear) \
    .join(ChoiceYear) \
    .filter(StudentYear.condition == 'A') \
    .filter(ChoiceYear.choice_id == '4')
but it throws an exception...
"Could not locate column in row for column '%s'" % key)
sqlalchemy.exc.NoSuchColumnError: "Could not locate column in row for column '*'"
How do I use that query to create myself a class I can use?
If you can create this view on the database, then you simply map the view as if it was a table. See Reflecting Views.
# DB VIEW
CREATE VIEW my_view AS -- #todo: your select statements here

# SA
my_view = Table('my_view', metadata, autoload=True)

# define view object
class ViewObject(object):
    def __repr__(self):
        return "ViewObject %s" % str((self.id_number, self.semester,))

# map the view to the object
view_mapper = mapper(ViewObject, my_view)

# query the view
q = session.query(ViewObject)
for _ in q:
    print _
If you cannot create a VIEW on the database level, you could create a selectable and map the ViewObject to it. The code below should give you the idea:
student_tmp = Table('student_tmp', metadata, autoload=True)
choice_tmp = Table('choice_tmp', metadata, autoload=True)

# your SELECT part with the columns you need
qry = select([student_tmp.c.id_number, student_tmp.c.semester,
              student_tmp.c.stateid, choice_tmp.c.school])

# your INNER JOIN condition
qry = qry.where(student_tmp.c.id_number == choice_tmp.c.id_number).where(
    student_tmp.c.semester == choice_tmp.c.semester)

# other WHERE clauses
qry = qry.where(student_tmp.c.condition == 'A')
You can create 3 queries like this, then combine them with union_all and use the resulting query in the mapper:
view_mapper = mapper(ViewObject, my_combined_qry)
In both cases, though, you have to ensure that a primary key is properly defined on the view, and you might need to override the autoloaded view and specify the primary key explicitly (see the link above). Otherwise you will either receive an error or might not get proper results from the query.
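A hedged sketch of how those pieces could fit together, in the same classical-mapper style as above (qry_tmp, qry_year and qry_summer stand for the three selectables built as shown; the alias name and the explicit primary key columns are assumptions on my part):
from sqlalchemy import union_all
from sqlalchemy.orm import mapper

# combine the three per-suffix queries and give the result an alias
my_combined_qry = union_all(qry_tmp, qry_year, qry_summer).alias('student_choice_view')

# a plain SELECT has no primary key, so tell the mapper which columns form it
view_mapper = mapper(
    ViewObject,
    my_combined_qry,
    primary_key=[my_combined_qry.c.id_number, my_combined_qry.c.semester],
)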
Answer to EDIT-2:
qry = (session.query(StudentYear, ChoiceYear).
select_from(StudentYear).
join(ChoiceYear).
filter(StudentYear.condition == 'A').
filter(ChoiceYear.choice_id == '4')
)
The result will be tuple pairs: (Student, Choice).
But if you want to create a new mapped class for the query, then you can create a selectable as the sample above:
student_tmp = StudentTmp.__table__
choice_tmp = ChoiceTmp.__table__
.... (see sample code above)
This is to show what I ended up doing; any comments welcome.
class JoinedYear(Base):
    __table__ = sqlalchemy.select(
        [
            StudentYear.id_number,
            StudentYear.semester,
            StudentYear.stateid,
            ChoiceYear.school,
            ChoiceYear.program,
        ],
        from_obj=StudentYear.__table__.join(ChoiceYear.__table__),
    ) \
        .where(StudentYear.condition == 'A') \
        .where(ChoiceYear.choice_id == '4') \
        .alias('YearView')
and I will elaborate from there...
Thanks @van
I am using SQLAlchemy without the ORM, i.e. using hand-crafted SQL statements to directly interact with the backend database. I am using PG as my backend database (psycopg2 as DB driver) in this instance - I don't know if that affects the answer.
I have statements like this (for brevity, assume that conn is a valid connection to the database):
conn.execute("INSERT INTO user (name, country_id) VALUES ('Homer', 123)")
Assume also that the user table consists of the columns (id [SERIAL PRIMARY KEY], name, country_id)
How may I obtain the id of the new user, ideally, without hitting the database again?
You might be able to use the RETURNING clause of the INSERT statement like this:
result = conn.execute("INSERT INTO user (name, country_id) VALUES ('Homer', 123)
RETURNING *")
If you only want the resulting id:
result = conn.execute("INSERT INTO user (name, country_id) VALUES ('Homer', 123)
RETURNING id")
[new_id] = result.fetchone()
Use lastrowid:
result = conn.execute("INSERT INTO user (name, country_id) VALUES ('Homer', 123)")
result.lastrowid
Current SQLAlchemy documentation suggests
result.inserted_primary_key should work!
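A small hedged sketch of how that looks; note that inserted_primary_key is populated when the INSERT is built from a Table construct rather than a raw SQL string (the Table definition below just mirrors the columns described in the question):
from sqlalchemy import Table, Column, Integer, String, MetaData

metadata = MetaData()
user = Table(
    'user', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('country_id', Integer),
)

result = conn.execute(user.insert().values(name='Homer', country_id=123))
new_id = result.inserted_primary_key[0]  # the generated id, without a second round trip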
Python + SQLAlchemy
After commit, you get the primary_key column id (autoincremented) updated in your object.
db.session.add(new_usr)
db.session.commit() #will insert the new_usr data into database AND retrieve id
idd = new_usr.usrID # usrID is the autoincremented primary_key column.
return jsonify(idd),201 #usrID = 12, correct id from table User in Database.
This question has been asked many times on Stack Overflow, and no answer I have seen is comprehensive. Googling 'sqlalchemy insert get id of new row' brings up a lot of them.
There are three levels to SQLAlchemy.
Top: the ORM.
Middle: Database abstraction (DBA) with Table classes etc.
Bottom: SQL using the text function.
To an OO programmer the ORM level looks natural, but to a database programmer it looks ugly and the ORM gets in the way. The DBA layer is an OK compromise. The SQL layer looks natural to database programmers and would look alien to an OO-only programmer.
Each level has its own syntax, similar but different enough to be frustrating. On top of this there is almost too much documentation online, making it very hard to find the answer.
I will describe how to get the inserted id AT THE SQL LAYER for the RDBMS I use.
Table: User(user_id integer primary autoincrement key, user_name string)
conn: Is a Connection obtained within SQLAlchemy to the DBMS you are using.
SQLite
======
insstmt = text(
'''INSERT INTO user (user_name)
VALUES (:usernm) ''' )
# Execute within a transaction (optional)
txn = conn.begin()
result = conn.execute(insstmt, usernm='Jane Doe')
# The id!
recid = result.lastrowid
txn.commit()
MS SQL Server
=============
insstmt = text(
'''INSERT INTO user (user_name)
OUTPUT inserted.record_id
VALUES (:usernm) ''' )
txn = conn.begin()
result = conn.execute(insstmt, usernm='Jane Doe')
# The id!
recid = result.fetchone()[0]
txn.commit()
MariaDB/MySQL
=============
insstmt = text(
'''INSERT INTO user (user_name)
VALUES (:usernm) ''' )
txn = conn.begin()
result = conn.execute(insstmt, usernm='Jane Doe')
# The id!
recid = conn.execute(text('SELECT LAST_INSERT_ID()')).fetchone()[0]
txn.commit()
Postgres
========
insstmt = text(
'''INSERT INTO user (user_name)
VALUES (:usernm)
RETURNING user_id ''' )
txn = conn.begin()
result = conn.execute(insstmt, usernm='Jane Doe')
# The id!
recid = result.fetchone()[0]
txn.commit()
result.inserted_primary_key
Worked for me. The only thing to note is that this returns a list that contains the last_insert_id.
Make sure you use fetchrow/fetch to receive the returned object:
insert_stmt = user.insert().values(name="homer", country_id="123").returning(user.c.id)
row_id = await conn.fetchrow(insert_stmt)
For Postgres inserts from Python code, it is simple to use the RETURNING keyword with col_id (the name of the column whose last-inserted row id you want) at the end of the insert statement.
syntax -
from sqlalchemy import create_engine

conn_string = "postgresql://USERNAME:PSWD@HOSTNAME/DATABASE_NAME"
db = create_engine(conn_string)
conn = db.connect()

insert_sql = ("INSERT INTO emp_table (col_id, Name, Age) "
              "VALUES (3, 'xyz', 30) RETURNING col_id;")

or (if the col_id column is auto increment):

insert_sql = ("INSERT INTO emp_table (Name, Age) "
              "VALUES ('xyz', 30) RETURNING col_id;")

result = conn.execute(insert_sql)
[last_row_id] = result.fetchone()
print(last_row_id)
# output = 3
ex -