I'm trying to use the Django ORM to build a queryset, and I can't figure out how to use an OuterRef in the join condition of a FilteredRelation.
What I have in Django
Main queryset
queryset = LineOutlier.objects.filter(report=self.kwargs['report_pk'], report__apn__customer__cen_id=self.kwargs['customer_cen_id']) \
.select_related('category__traffic') \
.select_related('category__frequency') \
.select_related('category__stability') \
.prefetch_related('category__traffic__labels') \
.prefetch_related('category__frequency__labels') \
.prefetch_related('category__stability__labels') \
.annotate(history=subquery)
The subquery
subquery = ArraySubquery(
    LineOutlierReport.objects
    .filter(
        (Q(lineoutlier__imsi=OuterRef('imsi')) | Q(lineoutlier__isnull=True))
        & Q(id__in=last_5_reports_ids)
    )
    .values(json=JSONObject(
        severity='lineoutlier__severity',
        report_id='id',
        report_start_date='start_date',
        report_end_date='end_date',
    ))
)
The query executes, but the generated SQL is not exactly what I want:
SQL Generated
SELECT "mlformalima_lineoutlier"."id",
"mlformalima_lineoutlier"."imsi",
ARRAY(
SELECT JSONB_BUILD_OBJECT('severity', V1."severity", 'report_id', V0."id", 'report_start_date', V0."start_date", 'report_end_date', V0."end_date") AS "json"
FROM "mlformalima_lineoutlierreport" V0
LEFT OUTER JOIN "mlformalima_lineoutlier" V1
ON (V0."id" = V1."report_id")
WHERE ((V1."imsi" = ("mlformalima_lineoutlier"."imsi") OR V1."id" IS NULL) AND V0."id" IN (SELECT DISTINCT ON (U0."id") U0."id" FROM "mlformalima_lineoutlierreport" U0 WHERE U0."apn_id" = 2 ORDER BY U0."id" ASC, U0."end_date" DESC LIMIT 5))
) AS "history",
FROM "mlformalima_lineoutlier"
The problem here is that the OuterRef condition (V1."imsi" = ("mlformalima_lineoutlier"."imsi")) ends up in the WHERE clause, and I want it in the JOIN's ON clause.
What I want in SQL
SELECT "mlformalima_lineoutlier"."id",
"mlformalima_lineoutlier"."imsi",
ARRAY(
SELECT JSONB_BUILD_OBJECT('severity', V1."severity", 'report_id', V0."id", 'report_start_date', V0."start_date", 'report_end_date', V0."end_date") AS "json"
FROM "mlformalima_lineoutlierreport" V0
LEFT OUTER JOIN "mlformalima_lineoutlier" V1
ON (V0."id" = V1."report_id" AND ((V1."id" IS NULL) OR V1."imsi" = ("mlformalima_lineoutlier"."imsi")))
WHERE V0."id" IN (SELECT DISTINCT ON (U0."id") U0."id" FROM "mlformalima_lineoutlierreport" U0 WHERE U0."apn_id" = 2 ORDER BY U0."id" ASC, U0."end_date" DESC LIMIT 5)
) AS "history",
FROM "mlformalima_lineoutlier"
What I tried in Django
I tried to use FilteredRelation to change the JOIN condition, but I can't seem to use it in combination with an OuterRef:
subquery = ArraySubquery(
    LineOutlierReport.objects
    .annotate(filtered_relation=FilteredRelation(
        'lineoutlier',
        condition=Q(lineoutlier__imsi=OuterRef('imsi')) | Q(lineoutlier__isnull=True),
    ))
    .filter(Q(id__in=last_5_reports_ids))
    .values(json=JSONObject(
        severity='filtered_relation__severity',
        report_id='id',
        report_start_date='start_date',
        report_end_date='end_date',
    ))
)
I can't execute this query because of the following error
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
How can I modify my query to make it work?
This looks like this Django bug. As a workaround you can annotate another column and reference it in the FilteredRelation, like so:
subquery = ArraySubquery(
    LineOutlierReport.objects
    .annotate(
        outer_imsi=OuterRef('imsi'),
        filtered_relation=FilteredRelation(
            'lineoutlier',
            condition=Q(lineoutlier__imsi=F('outer_imsi')) | Q(lineoutlier__isnull=True),
        ),
    )
    .filter(Q(id__in=last_5_reports_ids))
    .values(json=JSONObject(
        severity='filtered_relation__severity',
        report_id='id',
        report_start_date='start_date',
        report_end_date='end_date',
    ))
)
That way the OuterRef is resolved in a plain annotation, and FilteredRelation only ever sees an F() reference, so it is never asked to process the OuterRef itself.
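The placement the question is chasing matters for any LEFT JOIN: a filter in the WHERE clause discards the unmatched (NULL-extended) rows, while the same filter in the ON clause keeps them. A minimal, self-contained sketch of that difference using sqlite3 with toy `report`/`outlier` tables (hypothetical names, not the real schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE report (id INTEGER PRIMARY KEY);
    CREATE TABLE outlier (report_id INT, imsi TEXT, severity INT);
    INSERT INTO report VALUES (1), (2);
    INSERT INTO outlier VALUES (1, 'A', 5), (1, 'B', 7);
""")

# Condition in WHERE: the NULL-extended row for report 2 fails o.imsi = 'A',
# so report 2 vanishes from the result.
in_where = con.execute("""
    SELECT r.id, o.severity
    FROM report r LEFT JOIN outlier o ON r.id = o.report_id
    WHERE o.imsi = 'A'
    ORDER BY r.id
""").fetchall()
print(in_where)   # [(1, 5)]

# Condition in ON: report 2 survives, with a NULL severity.
in_on = con.execute("""
    SELECT r.id, o.severity
    FROM report r LEFT JOIN outlier o
      ON r.id = o.report_id AND o.imsi = 'A'
    ORDER BY r.id
""").fetchall()
print(in_on)      # [(1, 5), (2, None)]
```

This is why the `| Q(lineoutlier__isnull=True)` escape hatch in the WHERE version only partially compensates, and why moving the condition into the join via FilteredRelation is the cleaner fix.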
Related
I have a complex SQL query, shown below, which I am using to query a MySQL database from a Python script.
sql_query_vav = """SELECT t1.deviceId, t1.date, t1.vavId, t1.timestamp, t1.nvo_airflow as airflow, t1.nvo_air_damper_position as damper_position , t1.nvo_temperature_sensor_pps as vavTemperature , d.MILO as miloId ,m1.timestamp as miloTimestamp, m1.temperature as miloTemperature
FROM
(SELECT deviceId, date, nvo_airflow, nvo_air_damper_position, nvo_temperature_sensor_pps, vavId, timestamp, counter from vavData where date=%s and floor=%s) t1
INNER JOIN
(SELECT date,max(timestamp) as timestamp,vavId from vavData where date=%s and floor=%s group by vavId) t2
ON (t1.timestamp = t2.timestamp)
INNER JOIN
(SELECT VAV,MILO,floor from VavMiloMapping where floor = %s) d
ON (t1.vavId = d.VAV )
INNER JOIN
(SELECT t1.deviceId,t1.date,t1.timestamp,t1.humidity,t1.temperature,t1.block,t1.floor,t1.location
FROM
(SELECT deviceId,date,timestamp,humidity,temperature,block,floor,location from miloData WHERE date=%s and floor=%s) t1
INNER JOIN
(SELECT deviceId,max(timestamp) as timestamp,location from miloData where date=%s and floor=%s GROUP BY deviceId) t2
ON (t1.timestamp = t2.timestamp)) m1
ON (d.MILO = m1.location) order by t1.vavId"""
I get an error with the above query which says
mysql.connector.errors.ProgrammingError: 1055 (42000): Expression #3 of SELECT list is not in GROUP BY
clause and contains nonaggregated column 'minniedb.miloData.location' which is not functionally dependent
on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
I have tried to change the SQL mode by executing
SET GLOBAL sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));
and tried to restart the MySQL service using
sudo service mysql restart
I think I have done everything required. Why am I still getting the same error?
To find the source of the error, start by formatting the query carefully:
SELECT t1.deviceId,
t1.date,
t1.vavId,
t1.timestamp,
t1.nvo_airflow as airflow,
t1.nvo_air_damper_position as damper_position ,
t1.nvo_temperature_sensor_pps as vavTemperature ,
d.MILO as miloId ,
m1.timestamp as miloTimestamp,
m1.temperature as miloTemperature
FROM ( SELECT deviceId,
date,
nvo_airflow,
nvo_air_damper_position,
nvo_temperature_sensor_pps,
vavId,
timestamp,
counter
from vavData
where date=%s
and floor=%s
) t1
INNER JOIN ( SELECT date,
max(timestamp) as timestamp,
vavId
from vavData
where date=%s
and floor=%s
group by vavId
) t2 ON (t1.timestamp = t2.timestamp)
INNER JOIN ( SELECT VAV,
MILO,
floor
from VavMiloMapping
where floor = %s
) d ON (t1.vavId = d.VAV )
INNER JOIN ( SELECT t1.deviceId,
t1.date,
t1.timestamp,
t1.humidity,
t1.temperature,
t1.block,
t1.floor,
t1.location
FROM ( SELECT deviceId,
date,
timestamp,
humidity,
temperature,
block,
floor,
location
from miloData
WHERE date=%s
and floor=%s
) t1
INNER JOIN ( SELECT deviceId,
max(timestamp) as timestamp,
location
from miloData
where date=%s
and floor=%s
GROUP BY deviceId
) t2 ON (t1.timestamp = t2.timestamp)
) m1 ON (d.MILO = m1.location)
order by t1.vavId
Now it is visible that there are two problem spots. Both problematic subqueries are aliased t2 and look like:
SELECT some_Id,
max(timestamp) as timestamp,
some_another_field
from some_table
where some_conditions
GROUP BY some_Id
The field marked some_another_field appears in neither the GROUP BY expression nor an aggregate function, which is exactly what ONLY_FULL_GROUP_BY forbids.
Correct these subqueries.
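One standard correction is to let each t2 subquery return only the grouping key and the aggregate, and pull every other column from t1 through the join; joining on both the key and the max timestamp also repairs the latent bug of joining on the timestamp alone. A self-contained sketch of the pattern with sqlite3 (a toy table, not the original schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE miloData (deviceId TEXT, timestamp INT, location TEXT);
    INSERT INTO miloData VALUES
        ('d1', 10, 'loc-a'), ('d1', 20, 'loc-a'),
        ('d2', 15, 'loc-b');
""")

# t2 returns only the grouping key and the aggregate; every other column
# (here: location) comes from t1, so ONLY_FULL_GROUP_BY has nothing to reject.
rows = con.execute("""
    SELECT t1.deviceId, t1.timestamp, t1.location
    FROM miloData t1
    JOIN (SELECT deviceId, MAX(timestamp) AS timestamp
          FROM miloData GROUP BY deviceId) t2
      ON t1.deviceId = t2.deviceId AND t1.timestamp = t2.timestamp
    ORDER BY t1.deviceId
""").fetchall()
print(rows)   # [('d1', 20, 'loc-a'), ('d2', 15, 'loc-b')]
```

The same reshaping applies to both t2 subqueries in the original query.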
I have a function that returns a query that fetches "new priorities for emails" for a given account id.
First it selects the domain name for that account, and then selects a data structure for it.
Everything should be OK IMO, but not this time: SQLAlchemy is generating SQL that is syntactically wrong, and I can't understand how to fix it. Here are the samples:
def unprocessed_by_account_id(account_id: str):
account_domain = select(
[tables.organizations.c.organization_id]).select_from(
tables.accounts.join(
tables.email_addresses,
tables.accounts.c.account_id == tables.email_addresses.c.email_address,
).join(tables.organizations)
).where(
tables.accounts.c.account_id == account_id,
)
domain_with_subdomains = concat('%', account_domain)
fields = [
tables.users.c.first_name,
…
tables.priorities.c.name,
]
fromclause = tables.users.join(
…
).join(tables.organizations)
whereclause = and_(
…
tables.organizations.c.organization_id.notlike(
domain_with_subdomains),
)
stmt = select(fields).select_from(fromclause).where(whereclause)
return stmt
print(unprocessed_by_account_id('foo'))
So it generates:
SELECT
users.first_name,
…
priorities.name
FROM (SELECT organizations.organization_id AS organization_id
FROM accounts
JOIN email_addresses
ON accounts.account_id = email_addresses.email_address
JOIN organizations
ON organizations.organization_id = email_addresses.organization_id
WHERE accounts.account_id = :account_id_1), users
JOIN
…
JOIN organizations
ON organizations.organization_id = email_addresses.organization_id
WHERE emails.account_id = :account_id_2 AND
priorities_new_emails.status = :status_1 AND
organizations.organization_id NOT LIKE
concat(:concat_1, (SELECT organizations.organization_id
FROM accounts
JOIN email_addresses ON accounts.account_id =
email_addresses.email_address
JOIN organizations
ON organizations.organization_id =
email_addresses.organization_id
WHERE accounts.account_id = :account_id_1))
But the first
(SELECT organizations.organization_id AS organization_id
FROM accounts
JOIN email_addresses
ON accounts.account_id = email_addresses.email_address
JOIN organizations
ON organizations.organization_id = email_addresses.organization_id
WHERE accounts.account_id = :account_id_1)
Is redundant here and produces
[2017-05-29 23:49:51] [42601] ERROR: subquery in FROM must have an alias
[2017-05-29 23:49:51] Hint: For example, FROM (SELECT ...) [AS] foo.
[2017-05-29 23:49:51] Position: 245
I tried to use account_domain = account_domain.cte(), but no luck, except that the subquery went to WITH clause as expected.
Also I tried with_only_columns with no effect at all.
I think that Alchemy is adding this statement, because it sees it inside WHERE clause and thinks that without it the filtering will result in an error, but I’m not sure.
Also I should mention that in a previous version of the code the statement was almost the same, except there was no concat('%', account_domain) and notlike was !=.
Also I tried inserting aliases here and there, but had no success with that either. And if I manually delete that first subquery from the generated SQL, I get the expected results.
Any help is appreciated, thank you.
If you're using a subquery as a value, you need to declare it as_scalar():
domain_with_subdomains = concat('%', account_domain.as_scalar())
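For reference, as_scalar() tells SQLAlchemy to render the SELECT as a parenthesized scalar expression instead of a FROM-clause entry (in SQLAlchemy 1.4+ the method was renamed scalar_subquery()). The SQL-level effect can be sketched with plain sqlite3, using toy tables with hypothetical names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE accounts (account_id TEXT, organization_id TEXT);
    CREATE TABLE organizations (organization_id TEXT);
    INSERT INTO accounts VALUES ('foo', 'example.com');
    INSERT INTO organizations VALUES
        ('example.com'), ('sub.example.com'), ('other.org');
""")

# The inner SELECT is used as a *value* inside the NOT LIKE pattern, so it is
# parenthesized in place rather than appearing as an unaliased FROM entry.
rows = con.execute("""
    SELECT organization_id FROM organizations
    WHERE organization_id NOT LIKE
          '%' || (SELECT organization_id FROM accounts
                  WHERE account_id = 'foo')
""").fetchall()
print(rows)   # [('other.org',)]
```

Once the subquery is scalar, SQLAlchemy no longer tries to place it in the FROM list, which is what triggered the "subquery in FROM must have an alias" error.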
I'm building a query in Python using SQLAlchemy 1.0.0 against a MySQL server.
We have no foreign keys defined in the schemas.
I want to construct the following query:
select s.visit_date, ss.store_number_1, count(p.pk) num_of_probes
from probedata.session s
inner join probedata.scene ps
on s.session_uid = ps.session_uid
inner join probedata.probe p
on ps.pk = p.scene_fk
inner join static.stores ss
on s.store_fk = ss.pk
where s.status = 'Completed'
group by s.visit_date, ss.store_number_1
The code for the query is:
num_of_probes = session.query(StoreSession.visit_date, Stores.store_number_1, func.count(Probe.pk).label('num_of_probes')) \
.select_from(StoreSession) \
.join(Scene, StoreSession.session_uid == Scene.session_uid) \
.join(Probe, Scene.pk == Probe.scene_fk) \
.join(Stores, StoreSession.store_fk == Stores.pk) \
.filter(StoreSession.status == 'Completed') \
.group_by(StoreSession.visit_date, Stores.store_number_1)
The query I'm receiving is:
SELECT probedata.session.visit_date AS probedata_session_visit_date, static.stores.store_number_1 AS static_stores_store_number_1, count(probedata.probe.pk) AS num_of_probes
FROM probedata.session
JOIN probedata.scene
ON probedata.session.session_uid = probedata.scene.session_uid AND NULL
JOIN probedata.probe ON probedata.scene.pk = probedata.probe.scene_fk AND NULL
JOIN static.stores ON probedata.session.store_fk = static.stores.pk AND NULL
WHERE probedata.session.status = :status_1
GROUP BY probedata.session.visit_date, static.stores.store_number_1
The problem is that SQLAlchemy adds "AND NULL" to each of the joins in the SQL query.
I understand it does this because no foreign keys are defined; SQLAlchemy 0.9.8 doesn't add it.
How can I get rid of the 'AND NULL' condition in 1.0.0?
Thanks,
Keren
Relevant information about the issue:
Append AND NULL condition on joins
If this is that known 1.0.0 regression, upgrading to a later 1.0.x release should make the "AND NULL" disappear.
Each row in my table has a date. The date is not unique; the same date is present more than once.
I want to get all objects with the youngest date.
My solution works, but I am not sure whether it is the elegant SQLAlchemy way.
query = _session.query(Table._date) \
.order_by(Table._date.desc()) \
.group_by(Table._date)
# this is the youngest date (type is datetime.date)
young = query.first()
query = _session.query(Table).filter(Table._date==young)
result = query.all()
Isn't there a way to put all this in one query object or something like that?
You need a HAVING clause, and you need to import the max function.
Then your query will be:
from sqlalchemy import func
stmt = _session.query(Table) \
.group_by(Table._date) \
.having(Table._date == func.max(Table._date))
This produces a sql statement like the following.
SELECT my_table.*
FROM my_table
GROUP BY my_table._date
HAVING my_table._date = MAX(my_table._date)
If you construct your statement with a select(), you can examine the generated SQL with the following (I'm not sure whether this also works with Query objects):
str(stmt)
Two ways of doing this using a sub-query:
# note: aliasing is not required, but we do it in order to specify `name`
T1 = aliased(MyTable, name="T1")
# version-1:
subquery = (session.query(func.max(T1._date).label("max_date"))
.as_scalar()
)
# version-2:
subquery = (session.query(T1._date.label("max_date"))
.order_by(T1._date.desc())
.limit(1)
.as_scalar()
)
qry = session.query(MyTable).filter(MyTable._date == subquery)
results = qry.all()
The output should be similar to:
# version-1
SELECT my_table.id AS my_table_id, my_table.name AS my_table_name, my_table._date AS my_table__date
FROM my_table
WHERE my_table._date = (
SELECT max("T1"._date) AS max_date
FROM my_table AS "T1")
# version-2
SELECT my_table.id AS my_table_id, my_table.name AS my_table_name, my_table._date AS my_table__date
FROM my_table
WHERE my_table._date = (
SELECT "T1"._date AS max_date
FROM my_table AS "T1"
ORDER BY "T1"._date DESC LIMIT ? OFFSET ?
)
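Both versions boil down to a scalar subquery compared against `_date`, and the SQL shape is easy to sanity-check with sqlite3 (a toy table standing in for my_table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, _date TEXT)")
con.executemany("INSERT INTO my_table (_date) VALUES (?)",
                [("2024-01-01",), ("2024-03-01",), ("2024-03-01",)])

# version-1 shape: WHERE _date = (SELECT MAX(_date) ...)
# All rows carrying the newest date come back in a single query.
rows = con.execute("""
    SELECT id, _date FROM my_table
    WHERE _date = (SELECT MAX(_date) FROM my_table)
    ORDER BY id
""").fetchall()
print(rows)   # [(2, '2024-03-01'), (3, '2024-03-01')]
```

version-2 merely replaces the MAX aggregate with ORDER BY ... LIMIT 1 inside the same scalar position.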
I'm using Django 1.4 and Python 2.7.
I'm doing a Sum of some values... when I do this, it works perfectly:
CategoryAnswers.objects.using('mam').filter(category=cat["category"], brand=cat["brand"], category__segment_category=cat["category__segment_category"]).values('category__name', 'brand__name','brand__pk').annotate(total=Sum('answer'))
And it generates this query:
SELECT `category`.`name`, `brand`.`name`, `category_answers`.`brand_id`, SUM(`category_answers`.`answer`) AS `total`
FROM `category_answers`
INNER JOIN `category`
ON (`category_answers`.`category_id` = `category`.`id`)
INNER JOIN `brand`
ON (`category_answers`.`brand_id` = `brand`.`id`)
WHERE (`category_answers`.`category_id` = 6 AND
`category_answers`.`brand_id` = 1 AND
`category`.`segment_category_id` = 1 )
GROUP BY `category`.`name`, `brand`.`name`, `category_answers`.`brand_id`
ORDER BY NULL
But when I add a new value, it does not work:
CategoryAnswers.objects.using('mam').order_by().filter(category=cat["category"], brand=cat["brand"], category__segment_category=cat["category__segment_category"]).values('category__name','category__pk','brand__name','brand__pk').annotate(total=Sum('answer'))
Looking at the generated query, the problem is that Django adds a wrong field (category_answers.id) to the GROUP BY:
SELECT `category`.`name`, `category_answers`.`category_id`, `brand`.`name`, `category_answers`.`brand_id`,
SUM(`category_answers`.`answer`) AS `total`
FROM `category_answers`
INNER JOIN `category`
ON (`category_answers`.`category_id` = `category`.`id`)
INNER JOIN `brand`
ON (`category_answers`.`brand_id` = `brand`.`id`)
WHERE (`category_answers`.`category_id` = 6 AND
`category_answers`.`brand_id` = 1 AND
`category`.`segment_category_id` = 1 )
GROUP BY `category_answers`.`id`, `category`.`name`, `category_answers`.`category_id`, `brand`.`name`, `category_answers`.`brand_id`
ORDER BY NULL
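To see why that stray `category_answers`.`id` is harmful: grouping by a unique key puts every row in its own group, so SUM(answer) degenerates to the per-row value instead of one aggregated total. A toy sqlite3 illustration (simplified columns, not the real schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE category_answers (id INTEGER PRIMARY KEY,
                                   category_id INT, answer INT);
    INSERT INTO category_answers VALUES (1, 6, 2), (2, 6, 3), (3, 6, 5);
""")

# Intended grouping: one row per category, answers summed.
good = con.execute("""
    SELECT category_id, SUM(answer) FROM category_answers
    GROUP BY category_id
""").fetchall()
print(good)   # [(6, 10)]

# With the unique id forced into GROUP BY, each group is a single row.
bad = con.execute("""
    SELECT category_id, SUM(answer) FROM category_answers
    GROUP BY id, category_id ORDER BY id
""").fetchall()
print(bad)    # [(6, 2), (6, 3), (6, 5)]
```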
If I remove any one parameter it works, so I don't believe the problem is a specific parameter... Am I doing something wrong?
I couldn't resolve this, so I did it with a raw SQL query:
cursor = connections["mam"].cursor()
cursor.execute("SELECT B.name, A.category_id, A.brand_id, SUM(A.answer) AS total, C.name FROM category_answers A INNER JOIN category B ON A.category_id = B.id INNER JOIN brand C ON A.brand_id = C.id WHERE A.brand_id = %s AND A.category_id = %s AND B.segment_category_id = %s", [cat["brand"],cat["category"],cat["category__segment_category"]])
c_answers = cursor.fetchone()
This is not the best way, but it works. :)