Converting LEFT OUTER JOIN query to Django orm queryset/query - python

Given PostgreSQL 9.2.10, Django 1.8, python 2.7.5 and the following models:
class restProdAPI(models.Model):
    rest_id = models.PositiveIntegerField(primary_key=True)
    rest_host = models.CharField(max_length=20)
    rest_ip = models.GenericIPAddressField(default='0.0.0.0')
    rest_mode = models.CharField(max_length=20)
    rest_state = models.CharField(max_length=20)
class soapProdAPI(models.Model):
    soap_id = models.PositiveIntegerField(primary_key=True)
    soap_host = models.CharField(max_length=20)
    soap_ip = models.GenericIPAddressField(default='0.0.0.0')
    soap_asset = models.CharField(max_length=20)
    soap_state = models.CharField(max_length=20)
And the following raw query which returns exactly what I am looking for:
SELECT
app_restProdAPI.rest_id, app_soapProdAPI.soap_id, app_restProdAPI.rest_host, app_restProdAPI.rest_ip, app_soapProdAPI.soap_asset, app_restProdAPI.rest_mode, app_restProdAPI.rest_state
FROM
app_soapProdAPI
LEFT OUTER JOIN
app_restProdAPI
ON
((app_restProdAPI.rest_host = app_soapProdAPI.soap_host)
OR
(app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip))
WHERE
app_restProdAPI.rest_mode = 'Excluded';
Which returns a result like this:
rest_id | soap_id | rest_host | rest_ip | soap_asset | rest_mode | rest_state
---------+---------+---------------+----------------+------------+-----------+-----------
1234 | 12345 | 1G24019123ABC | 123.123.123.12 | A1234567 | Excluded | Up
What would be the best method for making this work using Django's model and orm structure?
I have been looking around for possible methods for joining the two tables entirely without a relationship but there does not seem to be a clean or efficient way to do this. I have also tried looking for methods to do left outer joins in django, but again documentation is sparse or difficult to decipher.
I know I will probably have to use Q objects to do the or clause I have in there. Additionally I have looked at relationships and it looks like a foreignkey() may work but I am unsure if this is the best method of doing it. Any and all help would be greatly appreciated. Thank you in advance.
** EDIT 1 **
So far Todor has offered a working solution that uses an INNER JOIN. I may have found a solution HERE if anyone can decipher that mess of inline raw html.
** EDIT 2 **
Is there a way to filter on a field (where something = 'something'), as in my query above, given Todor's answer? I tried the following, but it still includes all records even though my equivalent PostgreSQL query works as expected. It seems I cannot put everything in the where clause the way I do here: when I remove one of the OR conditions and use only an AND, the Excluded filter is applied.
soapProdAPI.objects.extra(
    select = {
        'rest_id' : 'app_restprodapi.rest_id',
        'rest_host' : 'app_restprodapi.rest_host',
        'rest_ip' : 'app_restprodapi.rest_ip',
        'rest_mode' : 'app_restprodapi.rest_mode',
        'rest_state' : 'app_restprodapi.rest_state'
    },
    tables = ['app_restprodapi'],
    where = ['app_restprodapi.rest_mode=%s \
        AND app_restprodapi.rest_host=app_soapprodapi.soap_host \
        OR app_restprodapi.rest_ip=app_soapprodapi.soap_ip'],
    params = ['Excluded']
)
** EDIT 3 / CURRENT SOLUTION IN PLACE **
To date Todor has provided the most complete answer, using an INNER JOIN, but the hope is that this question will prompt further thought on how a LEFT OUTER JOIN might still be accomplished. As this does not seem to be inherently possible, any and all suggestions are welcome, as they may lead to better solutions. That being said, using Todor's answer, I was able to accomplish the exact query I needed:
restProdAPI.objects.extra(
    select = {
        'soap_id' : 'app_soapprodapi.soap_id',
        'soap_asset' : 'app_soapprodapi.soap_asset'
    },
    tables = ['app_soapprodapi'],
    where = ['app_restprodapi.rest_mode = %s',
             'app_soapprodapi.soap_host = app_restprodapi.rest_host OR \
             app_soapprodapi.soap_ip = app_restprodapi.rest_ip'
    ],
    params = ['Excluded']
)
** TLDR **
I would like to convert this PostgreSQL query to the ORM provided by Django WITHOUT using .raw() or any raw query code at all. I am completely open to changing the model to use a ForeignKey if that facilitates this and is, from a performance standpoint, the best method. I am going to use the returned objects in conjunction with django-datatables-view, if that helps in terms of design.

Solving it with INNER JOIN
If you only need the soapProdAPI rows that have a corresponding restProdAPI (that is, linked by host or IP as in your join condition), you can try the following:
soapProdAPI.objects.extra(
    select = {
        'rest_id'   : "app_restProdAPI.rest_id",
        'rest_host' : "app_restProdAPI.rest_host",
        'rest_ip'   : "app_restProdAPI.rest_ip",
        'rest_mode' : "app_restProdAPI.rest_mode",
        'rest_state': "app_restProdAPI.rest_state"
    },
    tables = ["app_restProdAPI"],
    where = ["app_restProdAPI.rest_host = app_soapProdAPI.soap_host \
              OR app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip"]
)
How to filter more?
Since we are using .extra, I would advise reading the docs carefully. In general, we can't use .filter on the fields inside the select dict, because they are not part of soapProdAPI and Django can't resolve them. We have to stick with the where kwarg of .extra, and since it's a list, we can simply add another element.
where = ["app_restProdAPI.rest_host = app_soapProdAPI.soap_host \
          OR app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip",
         "app_restProdAPI.rest_mode=%s"
],
params = ['Excluded']
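To see why the INNER JOIN form is enough for the original query, and why the extra condition belongs in its own where element, here is a minimal sketch using Python's sqlite3 module as a stand-in for PostgreSQL. The table names (rest, soap) and the sample rows are made up for illustration. It relies on two things: a WHERE condition on a column of the outer-joined table discards the NULL-extended rows, so the LEFT OUTER JOIN behaves like an INNER JOIN here; and AND binds tighter than OR, so an unparenthesized condition lets non-Excluded rows through. As far as I know, Django ANDs the separate where entries together, which gives the parenthesized form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rest (rest_id INTEGER, rest_host TEXT, rest_ip TEXT, rest_mode TEXT);
    CREATE TABLE soap (soap_id INTEGER, soap_host TEXT, soap_ip TEXT);
    INSERT INTO rest VALUES (1, 'hostA', '10.0.0.1', 'Excluded'),
                            (2, 'hostB', '10.0.0.2', 'Included');
    INSERT INTO soap VALUES (11, 'hostA', '10.9.9.9'),
                            (12, 'hostX', '10.0.0.2'),
                            (13, 'hostZ', '10.7.7.7');
""")


def run(sql):
    """Run a query and return its rows in a deterministic order."""
    return conn.execute(sql + " ORDER BY rest_id, soap_id").fetchall()


# The original query shape: LEFT OUTER JOIN plus a WHERE on the joined table.
left_join = run("""
    SELECT rest_id, soap_id FROM soap
    LEFT OUTER JOIN rest ON (rest_host = soap_host OR rest_ip = soap_ip)
    WHERE rest_mode = 'Excluded'
""")

# The shape .extra(tables=..., where=[cond1, cond2]) is expected to produce:
# an implicit inner join with each condition in its own parentheses.
inner_join = run("""
    SELECT rest_id, soap_id FROM soap, rest
    WHERE (rest_host = soap_host OR rest_ip = soap_ip)
      AND (rest_mode = 'Excluded')
""")

# Everything in one string without parentheses: parsed as (A AND B) OR C.
unparenthesized = run("""
    SELECT rest_id, soap_id FROM soap, rest
    WHERE rest_mode = 'Excluded' AND rest_host = soap_host OR rest_ip = soap_ip
""")

print(left_join)        # [(1, 11)]
print(inner_join)       # [(1, 11)]
print(unparenthesized)  # [(1, 11), (2, 12)]
```

The last result includes the 'Included' row matched by IP, which is the symptom described in EDIT 2 of the question.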
Repeated subquery
If you really need all soapProdAPI rows, whether or not they have a corresponding restProdAPI, I can only think of one ugly approach, where a subquery is repeated for each field you need.
soapProdAPI.objects.extra(
    select = {
        'rest_id'   : "(select rest_id from app_restProdAPI where app_restProdAPI.rest_host = app_soapProdAPI.soap_host OR app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip)",
        'rest_host' : "(select rest_host from app_restProdAPI where app_restProdAPI.rest_host = app_soapProdAPI.soap_host OR app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip)",
        'rest_ip'   : "(select rest_ip from app_restProdAPI where app_restProdAPI.rest_host = app_soapProdAPI.soap_host OR app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip)",
        'rest_mode' : "(select rest_mode from app_restProdAPI where app_restProdAPI.rest_host = app_soapProdAPI.soap_host OR app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip)",
        'rest_state': "(select rest_state from app_restProdAPI where app_restProdAPI.rest_host = app_soapProdAPI.soap_host OR app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip)"
    },
)

I think this could be useful for you. Effectively, you can use Q to construct your query.
I tried it in the Django shell: I created some data and did something like this:
restProdAPI.objects.filter(Q(rest_host=s1.soap_host)|Q(rest_ip=s1.soap_ip))
Where s1 is a soapProdAPI.
This is all the code I wrote; you can try it and see if it helps:
from django.db.models import Q
from core.models import restProdAPI, soapProdAPI
# Pick one soapProdAPI row, then find the restProdAPI rows matching it by host or IP.
s1 = soapProdAPI.objects.get(soap_id=1)
restProdAPI.objects.filter(Q(rest_host=s1.soap_host)|Q(rest_ip=s1.soap_ip))

Related

Complex Django query involving an ArrayField & coefficients

On the one hand, let's consider this Django model:
from uuid import uuid4
from django.contrib.postgres.fields import ArrayField
from django.db import models
class Entry(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid4, editable=False)
    value = models.DecimalField(decimal_places=12, max_digits=22)
    items = ArrayField(base_field=models.UUIDField(null=False, blank=False), default=list)
On the other hand, let's say we have this dictionary:
coefficients = {item1_uuid: item1_coef, item2_uuid: item2_coef, ... }
Entry.value is intended to be distributed among the Entry.items according to coefficients.
Using Django ORM, what would be the most efficient way (in a single SQL query) to get the sum of the values of my Entries for a single Item, given the coefficients?
For instance, for item1 below I want to get 168.5454..., that is to say 100 * 1 + 150 * (0.2 / (0.2 + 0.35)) + 70 * 0.2.
Entry ID | Value | Items
---------+-------+--------------------------------------
uuid1    | 100   | [item1_uuid]
uuid2    | 150   | [item1_uuid, item2_uuid]
uuid3    | 70    | [item1_uuid, item2_uuid, item3_uuid]
coefficients = { item1_uuid: Decimal("0.2"), item2_uuid: Decimal("0.35"), item3_uuid: Decimal("0.45") }
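As a sanity check on the expected figure, the distribution rule can be sketched in plain Python, without the ORM. The short string keys stand in for the UUIDs, and the helper name value_for_item is made up for this illustration; it assumes every item of an entry has a coefficient.

```python
from decimal import Decimal

# Stand-ins for the UUID keys used in the question.
coefficients = {
    "item1": Decimal("0.2"),
    "item2": Decimal("0.35"),
    "item3": Decimal("0.45"),
}

# (value, items) pairs mirroring the three example entries.
entries = [
    (Decimal("100"), ["item1"]),
    (Decimal("150"), ["item1", "item2"]),
    (Decimal("70"), ["item1", "item2", "item3"]),
]


def value_for_item(item, entries, coefficients):
    """Sum each entry's value times the item's share of that entry's coefficients."""
    total = Decimal(0)
    for value, items in entries:
        if item not in items:
            continue
        # Normalise by the coefficients of the items present in this entry only.
        weight_sum = sum(coefficients[i] for i in items)
        total += value * coefficients[item] / weight_sum
    return total


result = value_for_item("item1", entries, coefficients)
print(round(result, 4))  # 168.5455
```

A single-item entry contributes its full value, because the item's coefficient is divided by itself.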
Bonus question: how could I adapt my models for this query to run faster? I've deliberately chosen to use an ArrayField and decided not to use a ManyToManyField, was that a bad idea? How to know where I could add db_index[es] for this specific query?
I am using Python 3.10, Django 4.1 and Postgres 14.
I've found a solution to my own question, but I'm sure someone here could come up with a more efficient & cleaner approach.
The idea here is to chain the .alias() methods (cf. Django documentation) and the conditional expressions with Case and When in a for loop.
This results in an overly complex query, which at least does work as expected:
from decimal import Decimal
from django.db.models import Case, F, Q, Sum, Value, When
def get_value_for_item(coefficients, item):
    item_coef = coefficients.get(item.pk, Decimal(0))
    if not item_coef:
        return Decimal(0)
    # Entries with a single item keep their full value; only the others are split.
    several = Q(items__len__gt=1)
    queryset = (
        Entry.objects
        .filter(items__contains=[item.pk])
        .alias(total=Case(When(several, then=Value(Decimal(0)))))
    )
    # Accumulate, per entry, the coefficients of the items it contains.
    for k, v in coefficients.items():
        has_k = Q(items__contains=[k])
        queryset = queryset.alias(total=Case(
            When(several & has_k, then=Value(v) + F("total")),
            default="total",
        ))
    return (
        queryset.annotate(
            coef_applied=Case(
                When(several, then=Value(item_coef) / F("total") * F("value")),
                default="value",
            )
        ).aggregate(Sum("coef_applied", default=Decimal(0)))
    )["coef_applied__sum"]
With the example I gave in my question and for item1, the output of this function is Decimal(168.5454...) as expected.

Django: Add a "configuration" list for different code sections to access

I use these different code snippets at different parts in my code. To avoid potential errors over time I would like to implement one configuration list that both these sections can access. The list gets longer over time with more entries. Do you have an idea about how to achieve that?
Here is the "configuration" list that both #1 and #2 should access in order to perform the filter and the if statement:
list = [TYPE_OF_PEOPLE_ATTENDING, HEARING_ABOUT_THE_EVENT, MISSING_EVENT_INFORMATION, REASON_FOR_ATTENDING]
#1
entities = (
    Entity.objects.values("answer__question__focus", "name")
    .annotate(count=Count("pk"))
    .annotate(total_salience=Sum("salience"))
    .filter(
        Q(answer__question__focus=QuestionFocus.TYPE_OF_PEOPLE_ATTENDING) |
        Q(answer__question__focus=QuestionFocus.HEARING_ABOUT_THE_EVENT) |
        Q(answer__question__focus=QuestionFocus.MISSING_EVENT_INFORMATION) |
        Q(answer__question__focus=QuestionFocus.REASON_FOR_ATTENDING)
    )
)
#2
if (
    answer_obj.question.focus == QuestionFocus.TYPE_OF_PEOPLE_ATTENDING
    or answer_obj.question.focus == QuestionFocus.HEARING_ABOUT_THE_EVENT
    or answer_obj.question.focus == QuestionFocus.MISSING_EVENT_INFORMATION
    or answer_obj.question.focus == QuestionFocus.REASON_FOR_ATTENDING
):
    entities = analyze_entities(answer_obj.answer)
    bulk_create_entities(entities, response, answer_obj)
You should be able to rewrite both statements to use a single list directly:
VALID_TYPES = [
    QuestionFocus.TYPE_OF_PEOPLE_ATTENDING,
    QuestionFocus.HEARING_ABOUT_THE_EVENT,
    QuestionFocus.MISSING_EVENT_INFORMATION,
    QuestionFocus.REASON_FOR_ATTENDING,
]
#1
entities = (
    Entity.objects.values("answer__question__focus", "name")
    .annotate(count=Count("pk"))
    .annotate(total_salience=Sum("salience"))
    .filter(answer__question__focus__in=VALID_TYPES)
)
#2
if answer_obj.question.focus in VALID_TYPES:
    entities = analyze_entities(answer_obj.answer)
    bulk_create_entities(entities, response, answer_obj)
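A self-contained sketch of the same idea, outside Django: one module-level constant that every code path imports, plus a small helper for the membership test. The QuestionFocus enum below is a hypothetical stand-in for the project's own class (its values and the OTHER member are invented), and the names ENTITY_ANALYSIS_FOCUSES and needs_entity_analysis are made up for this example.

```python
from enum import Enum


class QuestionFocus(Enum):
    """Hypothetical stand-in for the project's QuestionFocus choices."""
    TYPE_OF_PEOPLE_ATTENDING = "type_of_people_attending"
    HEARING_ABOUT_THE_EVENT = "hearing_about_the_event"
    MISSING_EVENT_INFORMATION = "missing_event_information"
    REASON_FOR_ATTENDING = "reason_for_attending"
    OTHER = "other"


# Single source of truth: add new focuses here and both call sites pick them up.
ENTITY_ANALYSIS_FOCUSES = (
    QuestionFocus.TYPE_OF_PEOPLE_ATTENDING,
    QuestionFocus.HEARING_ABOUT_THE_EVENT,
    QuestionFocus.MISSING_EVENT_INFORMATION,
    QuestionFocus.REASON_FOR_ATTENDING,
)


def needs_entity_analysis(focus):
    """Return True when answers with this focus should be analysed for entities."""
    return focus in ENTITY_ANALYSIS_FOCUSES


print(needs_entity_analysis(QuestionFocus.REASON_FOR_ATTENDING))  # True
print(needs_entity_analysis(QuestionFocus.OTHER))                 # False
```

A tuple is used rather than a list so the constant cannot be mutated by accident; the same object can be passed to an __in lookup in the queryset.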

Django ORM filter by Max column value of two related models

I have 3 related models:
class Program(Model):
    ...  # which aggregates ProgramVersions
class ProgramVersion(Model):
    program = ForeignKey(Program)
    index = IntegerField()
class UserProgramVersion(Model):
    user = ForeignKey(User)
    version = ForeignKey(ProgramVersion)
    index = IntegerField()
ProgramVersion and UserProgramVersion are orderable models based on index field - object with highest index in the table is considered latest/newest object (this is handled by some custom logic, not relevant).
I would like to select all the latest UserProgramVersions, i.e. for each Program, the latest UPV that points to it.
This can be handled by the following UserProgramVersion queryset method:
def latest_user_program_versions(self):
    latest = self\
        .order_by('version__program_id', '-version__index', '-index')\
        .distinct('version__program_id')
    return self.filter(id__in=latest)
This works fine; however, I'm looking for a solution that does NOT use .distinct().
I tried something like this:
def latest_user_program_versions(self):
    latest = self\
        .annotate(
            max_version_index=Max('version__index'),
            max_index=Max('index'))\
        .filter(
            version__index=F('max_version_index'),
            index=F('max_index'))
    return self.filter(id__in=latest)
This, however, does not work.
Use Subquery() expressions, available since Django 1.11. The example in the docs is similar, and its purpose is also to get the newest item for the required parent records.
(You could probably start from that example with your objects, but I have also written a more complete, more complicated suggestion that avoids possible performance pitfalls.)
from django.db.models import OuterRef, Subquery
...
def latest_user_program_versions(self, *args, **kwargs):
    # You should filter users by args or kwargs here, for performance reasons.
    # If you do it here, the filter is also applied to the subquery, which is
    # much faster on a big db.
    qs = self.filter(*args, **kwargs)
    parent = Program.objects.filter(pk__in=qs.values('version__program'))
    newest = (
        qs.filter(version__program=OuterRef('pk'))
        .order_by('-version__index', '-index')
    )
    pks = (
        parent.annotate(newest_id=Subquery(newest.values('pk')[:1]))
        .values_list('newest_id', flat=True)
    )
    # You may prefer to uncomment this so that it is compiled into two shorter SQL queries.
    # pks = list(pks)
    return self.filter(pk__in=pks)
If you improve on it considerably, post the solution in your own answer.
EDIT: The problem in your second solution:
Nobody can cut the branch he is sitting on, not even in SQL, but I can sit on a temporary copy of it in a subquery and survive :-) That is also why I ask for a filter at the beginning. The second problem is that Max('version__index') and Max('index') could come from two different objects, so no valid intersection is found.
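The second problem can be illustrated in plain Python with two made-up rows that belong to the same Program (the names upv_a and upv_b and their index values are invented for this sketch). Taking each maximum independently describes a row that does not exist, while ordering by the pair, which is what order_by('-version__index', '-index') with a LIMIT 1 does, always picks a real row.

```python
# (name, version_index, index) for two UserProgramVersions of one Program.
rows = [
    ("upv_a", 2, 1),  # newest ProgramVersion, low own index
    ("upv_b", 1, 5),  # older ProgramVersion, high own index
]

# Independent maxima: 2 comes from upv_a, 5 comes from upv_b.
max_version_index = max(version_index for _, version_index, _ in rows)
max_index = max(index for _, _, index in rows)

# No single row carries both maxima, so filtering on both finds nothing.
both_max = [
    name
    for name, version_index, index in rows
    if version_index == max_version_index and index == max_index
]

# Ordering by the pair (version_index first, then index) selects an existing row.
latest = max(rows, key=lambda row: (row[1], row[2]))

print(both_max)   # []
print(latest[0])  # upv_a
```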
EDIT 2: Verified. The internal SQL generated by my query is complicated, but it seems correct.
SELECT app_userprogramversion.id,...
FROM app_userprogramversion
WHERE app_userprogramversion.id IN
    (SELECT
        (SELECT U0.id
         FROM app_userprogramversion U0
         INNER JOIN app_programversion U2 ON (U0.version_id = U2.id)
         WHERE (U0.user_id = 123 AND U2.program_id = (V0.id))
         ORDER BY U2.index DESC, U0.index DESC LIMIT 1
        ) AS newest_id
     FROM app_program V0 WHERE V0.id IN
        (SELECT U2.program_id AS Col1
         FROM app_userprogramversion U0
         INNER JOIN app_programversion U2 ON (U0.version_id = U2.id)
         WHERE U0.user_id = 123
        )
    )

Generated django queryset works, but running it in django fails

We have a model InstructionsStep, which has a foreign key to an Instructions model, which in turn is connected to a Library. An InstructionsStep has a description, but since multiple languages might exist, this description is stored in a separate model containing a language code and the description translated into that language.
But for performance reasons, we need to be able to get a queryset of Instructionssteps, where the description is annotated in the default language (which is stored in the Library). To achieve this and to circumvent django's limitations on joins within annotations, we created a custom Aggregate function that retrieves this language. (DefaultInstructionsStepTranslationDescription)
class InstructionsStepTranslationQuerySet(models.query.QuerySet):
    def language(self, language):
        class DefaultInstructionsStepTranslationDescription(Aggregate):
            template = '''
            (%(function)s %(distinct)s INNER_QUERY."%(expressions)s" FROM (
                SELECT "myApp_Instructionssteptranslation"."description" AS "description",
                MIN("myUser_library"."default_language") AS "default_language"
                FROM "myApp_Instructionssteptranslation"
                INNER JOIN "myApp_Instructionsstep" A_ST ON ("myApp_Instructionssteptranslation"."Instructions_step_id" = A_ST."id")
                INNER JOIN "myApp_Instructions" ON (A_ST."Instructions_id" = "myApp_Instructions"."id")
                LEFT OUTER JOIN "myUser_library" ON ("myApp_Instructions"."library_id" = "myUser_library"."id")
                WHERE "myApp_Instructionssteptranslation"."Instructions_step_id" = "myApp_Instructionsstep"."id"
                and "myApp_Instructionssteptranslation"."language" = default_language
                GROUP BY "myApp_Instructionssteptranslation"."id"
            ) AS INNER_QUERY
            LIMIT 1)
            '''
            function = 'SELECT'
            def __init__(self, expression='', **extra):
                super(DefaultInstructionsStepTranslationDescription, self).__init__(
                    expression,
                    distinct='',
                    output_field=CharField(),
                    **extra
                )
        return self.annotate(
            t_description=
            Case(
                When(id__in = InstructionsStepTranslation.objects\
                        .annotate( default_language = Min(F("Instructions_step__Instructions__library__default_language")))\
                        .filter( language=F("default_language") )\
                        .values_list("Instructions_step_id"),
                    then=DefaultInstructionsStepTranslationDescription(Value("description"))
                ),
                default=Value("error"),
                output_field=CharField()
            )
        )
This generates the following sql-query (the database is a postgres database)
SELECT "myApp_Instructionsstep"."id",
"myApp_Instructionsstep"."original_id",
"myApp_Instructionsstep"."number",
"myApp_Instructionsstep"."Instructions_id",
"myApp_Instructionsstep"."ccp",
CASE
WHEN "myApp_Instructionsstep"."id" IN
(SELECT U0."Instructions_step_id"
FROM "myApp_Instructionssteptranslation" U0
INNER JOIN "myApp_Instructionsstep" U1 ON (U0."Instructions_step_id" = U1."id")
INNER JOIN "myApp_Instructions" U2 ON (U1."Instructions_id" = U2."id")
LEFT OUTER JOIN "myUser_library" U3 ON (U2."library_id" = U3."id")
GROUP BY U0."id"
HAVING U0."language" = (MIN(U3."default_language"))) THEN
(SELECT INNER_QUERY."description"
FROM
(SELECT "myApp_Instructionssteptranslation"."description" AS "description",
MIN("myUser_library"."default_language") AS "default_language"
FROM "myApp_Instructionssteptranslation"
INNER JOIN "myApp_Instructionsstep" A_ST ON ("myApp_Instructionssteptranslation"."Instructions_step_id" = A_ST."id")
INNER JOIN "myApp_Instructions" ON (A_ST."Instructions_id" = "myApp_Instructions"."id")
LEFT OUTER JOIN "myUser_library" ON ("myApp_Instructions"."library_id" = "myUser_library"."id")
WHERE "myApp_Instructionssteptranslation"."Instructions_step_id" = "myApp_Instructionsstep"."id"
and "myApp_Instructionssteptranslation"."language" = default_language
GROUP BY "myApp_Instructionssteptranslation"."id") AS INNER_QUERY
LIMIT 1)
ELSE 'error'
END AS "t_description"
FROM "myApp_Instructionsstep"
WHERE "myApp_Instructionsstep"."id" = 438
GROUP BY "myApp_Instructionsstep"."id"
ORDER BY "myApp_Instructionsstep"."number" ASC
Which works correctly when pasted in Postico.
However, running this in django,
step_id = 438
# InstructionsStep.objects is overridden with a custom manager that uses the custom queryset defined above
step_queryset = InstructionsStep.objects.language('en').filter(id=step_id)
retrieved_steps = step_queryset.all()
gives the following error:
LINE 1: ...ge" = (MIN(U3."default_language"))) THEN (SELECT INNER_QUER...
^
HINT: Perhaps you meant to reference the column "inner_query.description".
I've tried replacing INNER_QUERY with "myApp_Instructionssteptranslation", and also leaving it out, but this just gives other errors.
So, how come the generated query seems to work correctly when run on its own, but fails when we retrieve its results using Django?
And how can I fix the issue so that it behaves the way I want it to?
Meanwhile, I've found that the query printed with the .query attribute differs from the query that is actually executed.
In this case it printed SELECT INNER_QUERY."description", but it executed SELECT INNER_QUERY."'description'". The single quotes are added because of the Value("description") expression passed to the aggregate in InstructionsStepTranslationQuerySet.
In the end I solved my problem by passing the id field (F("id")) instead and using it in place of A_ST."id". (Sadly this is necessary, as Aggregate does not allow an empty expression to be passed.)
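One way to picture the difference between the printed and the executed query is the rough model below, written in plain Python. It assumes, as far as I understand it, that Value("description") compiles to a %s placeholder with a separate parameter, that printing the query interpolates parameters without quoting them, and that the database driver quotes string parameters when executing. The quote_literal helper is a hypothetical stand-in for the driver's quoting, not a real API.

```python
# The template fragment INNER_QUERY."%(expressions)s" once the Value()
# expression has been compiled to a plain "%s" placeholder.
sql = 'SELECT INNER_QUERY."%s"'
params = ("description",)

# What printing the query shows: parameters interpolated as-is.
printed = sql % params


def quote_literal(value):
    """Hypothetical stand-in for the driver's quoting of string parameters."""
    return "'" + value.replace("'", "''") + "'"


# What the database receives: parameters quoted as SQL string literals,
# here landing inside the identifier's double quotes.
executed = sql % tuple(quote_literal(p) for p in params)

print(printed)   # SELECT INNER_QUERY."description"
print(executed)  # SELECT INNER_QUERY."'description'"
```

Under this model, a bound parameter can never serve as a column name, which fits the fix of passing a real column reference such as F("id") instead.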

Sqlalchemy select all where join not exists mysql using orm

We have two tables: Users and Permissions
We want to select all the users who do NOT have a "guest" permission. Now, it is possible for users to have multiple permissions (not just 1) so simply querying for !"guest" won't work. Here is what our query looks like now:
query = session.query(Users).join(Permission, and_(
    Permission.userId == theUser.uid, Permission.deviceId == theDevice.uid))
query.join(Permission).filter(~exists().where(and_(
    Permission.level == SqlConstants.PermissionLevels.GUEST,
    Users.uid == Permission.userId)))
I'm not sure if the join in the first line is relevant to the problem we are having, but we are using it, so I'm including it here. (I'll edit it out if it isn't relevant.)
The above returns the following exception:
returned no FROM clauses due to auto-correlation; specify correlate(<tables>) to control correlation manually.
I gleaned this pattern from the following SO post:
Using NOT EXISTS clause in sqlalchemy ORM query
as well as from the sqlalchemy documentation (which is shallow concerning not exists):
http://docs.sqlalchemy.org/en/rel_1_0/orm/query.html
It isn't clear to me what I'm doing wrong or whether there is a better way.
I'm not completely sure that I've understood your problem, mostly because the solution I came up with is quite simple. I'll give it a try, and anyway I hope it helps you in some way.
I was able to reproduce easily the exception you get when using exists. I think it happens because in the where parameters you are mixing columns from the two tables in the join. It would not give the exception if you rewrite it more or less like this,
sq = session.query(Users.pk).join(Permission).filter(Permission.level==SqlConstants.PermissionLevels.GUEST)
q = session.query(Users).join(Permission).filter(~sq.exists())
However, it does not work: as soon as there is one row in Permission with the GUEST level, the query returns no results at all.
But why not rewrite it like this?
sq = session.query(Users.pk).join(Permission).filter(Permission.level==SqlConstants.PermissionLevels.GUEST)
q = session.query(Users).filter(~Users.pk.in_(sq))
In my trials, if I understood your problem properly, it works.
FYI, this is the toy example I used, where table A corresponds to Users, B to Permission, and B.attr would store the permission level.
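In [2]:
# Imports and Base were not shown in the original cell; this is the usual declarative setup.
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
Base = declarative_base()
class A(Base):
    __tablename__ = 'A'
    pk = Column('pk', Integer, primary_key=True)
    name = Column('name', String)
class B(Base):
    __tablename__ = 'B'
    pk = Column('pk', Integer, primary_key=True)
    fk = Column('fk', Integer, ForeignKey('A.pk'))
    attr = Column('attr', Integer)
    a = relationship("A", backref='B')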
This is the data I have inserted,
In [4]:
q = session.query(B)
print(q)
for x in q.all():
    print(x.pk, x.fk, x.attr)
q = session.query(A)
print(q)
for x in q.all():
    print(x.pk, x.name)
SELECT "B".pk AS "B_pk", "B".fk AS "B_fk", "B".attr AS "B_attr"
FROM "B"
1 1 1
2 1 2
3 2 0
4 2 4
5 1 4
SELECT "A".pk AS "A_pk", "A".name AS "A_name"
FROM "A"
1 one
2 two
3 three
And this is the result of the query:
In [16]:
from sqlalchemy import exists, and_, tuple_
sq = session.query(A.pk).join(B).filter(B.attr==2)
print(sq)
q = session.query(A).filter(~A.pk.in_(sq))
print(q)
for x in q.all():
    print(x.pk, x.name)
SELECT "A".pk AS "A_pk"
FROM "A" JOIN "B" ON "A".pk = "B".fk
WHERE "B".attr = :attr_1
SELECT "A".pk AS "A_pk", "A".name AS "A_name"
FROM "A"
WHERE "A".pk NOT IN (SELECT "A".pk AS "A_pk"
FROM "A" JOIN "B" ON "A".pk = "B".fk
WHERE "B".attr = :attr_1)
2 two
3 three
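For comparison, the three SQL shapes discussed above can be run directly against the same toy data with Python's sqlite3 module. This is only a sketch at the SQL level, not SQLAlchemy code. The alias inner_a is introduced here to make explicit that the subquery in the failing variant does not refer to the outer row, whereas the correlated NOT EXISTS does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (pk INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE B (pk INTEGER PRIMARY KEY, fk INTEGER REFERENCES A(pk), attr INTEGER);
    INSERT INTO A VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO B VALUES (1, 1, 1), (2, 1, 2), (3, 2, 0), (4, 2, 4), (5, 1, 4);
""")

# NOT IN with a subquery listing the A rows that have attr = 2.
not_in = conn.execute("""
    SELECT outer_a.pk, outer_a.name FROM A AS outer_a
    WHERE outer_a.pk NOT IN (
        SELECT inner_a.pk FROM A AS inner_a JOIN B ON inner_a.pk = B.fk
        WHERE B.attr = 2)
    ORDER BY outer_a.pk
""").fetchall()

# Correlated NOT EXISTS: the subquery is evaluated per outer row.
not_exists = conn.execute("""
    SELECT A.pk, A.name FROM A
    WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.fk = A.pk AND B.attr = 2)
    ORDER BY A.pk
""").fetchall()

# Uncorrelated NOT EXISTS: one matching row anywhere empties the whole result.
uncorrelated = conn.execute("""
    SELECT outer_a.pk, outer_a.name FROM A AS outer_a
    WHERE NOT EXISTS (
        SELECT inner_a.pk FROM A AS inner_a JOIN B ON inner_a.pk = B.fk
        WHERE B.attr = 2)
    ORDER BY outer_a.pk
""").fetchall()

print(not_in)        # [(2, 'two'), (3, 'three')]
print(not_exists)    # [(2, 'two'), (3, 'three')]
print(uncorrelated)  # []
```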
Hope it helps!
