On the one hand, let's consider this Django model:
from uuid import uuid4

from django.contrib.postgres.fields import ArrayField
from django.db import models

class Entry(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid4, editable=False)
    value = models.DecimalField(decimal_places=12, max_digits=22)
    items = ArrayField(base_field=models.UUIDField(null=False, blank=False), default=list)
On the other hand, let's say we have this dictionary:
coefficients = {item1_uuid: item1_coef, item2_uuid: item2_coef, ... }
Entry.value is intended to be distributed among the Entry.items according to coefficients.
Using Django ORM, what would be the most efficient way (in a single SQL query) to get the sum of the values of my Entries for a single Item, given the coefficients?
For instance, for item1 below I want to get 168.5454..., that is to say 100 * 1 + 150 * (0.2 / (0.2 + 0.35)) + 70 * 0.2.
 Entry ID | Value | Items
----------+-------+--------------------------------------
 uuid1    | 100   | [item1_uuid]
 uuid2    | 150   | [item1_uuid, item2_uuid]
 uuid3    | 70    | [item1_uuid, item2_uuid, item3_uuid]
coefficients = { item1_uuid: Decimal("0.2"), item2_uuid: Decimal("0.35"), item3_uuid: Decimal("0.45") }
Bonus question: how could I adapt my models for this query to run faster? I deliberately chose an ArrayField over a ManyToManyField; was that a bad idea? And how can I tell where adding indexes (db_index) would help for this specific query?
I am using Python 3.10, Django 4.1 and Postgres 14.
I've found a solution to my own question, but I'm sure someone here could come up with a more efficient & cleaner approach.
The idea here is to chain the .alias() methods (cf. Django documentation) and the conditional expressions with Case and When in a for loop.
This results in an overly complex query, which at least does work as expected:
from decimal import Decimal

from django.db.models import Case, F, Q, Sum, Value, When

def get_value_for_item(coefficients, item):
    item_coef = coefficients.get(item.pk, Decimal(0))
    if not item_coef:
        return Decimal(0)
    # Entries shared between several items have their value split by coefficient.
    several = Q(items__len__gt=1)
    queryset = (
        Entry.objects
        .filter(items__contains=[item.pk])
        .alias(total=Case(When(several, then=Value(Decimal(0)))))
    )
    # Accumulate, per entry, the coefficients of every item the entry contains.
    for k, v in coefficients.items():
        has_k = Q(items__contains=[k])
        queryset = queryset.alias(
            total=Case(
                When(several & has_k, then=Value(v) + F("total")),
                default="total",
            )
        )
    return (
        queryset.annotate(
            coef_applied=Case(
                When(several, then=Value(item_coef) / F("total") * F("value")),
                default="value",
            )
        ).aggregate(Sum("coef_applied", default=Decimal(0)))
    )["coef_applied__sum"]
With the example I gave in my question and for item1, the output of this function is Decimal(168.5454...) as expected.
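As a cross-check for any ORM version of this query, the same distribution rule can be written in plain Python. This is only a sketch under assumptions: `expected_value_for_item` and the `(value, items)` tuple layout are names made up for the illustration, and plain strings stand in for the UUIDs.

```python
from decimal import Decimal


def expected_value_for_item(entries, coefficients, item):
    """Reference computation; `entries` is a list of (value, items) tuples (hypothetical layout)."""
    item_coef = coefficients.get(item, Decimal(0))
    if not item_coef:
        return Decimal(0)
    total = Decimal(0)
    for value, items in entries:
        if item not in items:
            continue
        if len(items) == 1:
            # A single-item entry goes entirely to that item.
            total += value
        else:
            # Otherwise the item's share is its coefficient divided by the sum
            # of the coefficients of all items attached to the entry.
            weight_sum = sum((coefficients.get(i, Decimal(0)) for i in items), Decimal(0))
            total += item_coef / weight_sum * value
    return total


coefficients = {"item1": Decimal("0.2"), "item2": Decimal("0.35"), "item3": Decimal("0.45")}
entries = [
    (Decimal(100), ["item1"]),
    (Decimal(150), ["item1", "item2"]),
    (Decimal(70), ["item1", "item2", "item3"]),
]
result = expected_value_for_item(entries, coefficients, "item1")
print(result)  # approximately 168.5454
```

Comparing this value with the ORM result on a small fixture is a cheap way to catch mistakes when refactoring the query.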
Background
Suppose we have a set of questions, and a set of students that answered these questions.
The answers have been reviewed, and scores have been assigned, on some unknown range.
Now, we need to normalize the scores with respect to the extreme values within each question.
For example, if question 1 has a minimum score of 4 and a maximum score of 12, those scores would be normalized to 0 and 1 respectively. Scores in between are interpolated linearly (as described e.g. in Normalization to bring in the range of [0,1]).
Then, for each student, we would like to know the mean of the normalized scores for all questions combined.
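To make the target numbers concrete, here is the computation in plain Python, independent of Django. It is a sketch only: the function name, the `(student, question, value)` tuple layout and the sample data are invented for the illustration, and questions where all scores are equal are simply skipped, which is an assumption rather than something the question specifies.

```python
from collections import defaultdict
from statistics import mean


def mean_normalized_scores(scores):
    """scores: list of (student, question, value) tuples (hypothetical layout)."""
    by_question = defaultdict(list)
    for _, question, value in scores:
        by_question[question].append(value)
    limits = {q: (min(vs), max(vs)) for q, vs in by_question.items()}

    per_student = defaultdict(list)
    for student, question, value in scores:
        lo, hi = limits[question]
        if hi == lo:
            continue  # normalization undefined; skipped here by assumption
        per_student[student].append((value - lo) / (hi - lo))
    return {student: mean(values) for student, values in per_student.items()}


scores = [
    ("alice", "q1", 4.0), ("bob", "q1", 12.0), ("carol", "q1", 8.0),
    ("alice", "q2", 10.0), ("bob", "q2", 0.0), ("carol", "q2", 10.0),
]
means = mean_normalized_scores(scores)
print(means)
```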
Minimal example
Here's a very naive minimal implementation, just to illustrate what we would like to achieve:
from statistics import mean

from django.db import models

class Question(models.Model):
    pass

class Student(models.Model):
    def mean_normalized_score(self):
        normalized_scores = []
        for score in self.score_set.all():
            normalized_scores.append(score.normalized_value())
        return mean(normalized_scores) if normalized_scores else None

class Score(models.Model):
    student = models.ForeignKey(to=Student, on_delete=models.CASCADE)
    question = models.ForeignKey(to=Question, on_delete=models.CASCADE)
    value = models.FloatField()

    def normalized_value(self):
        # Runs one extra aggregate query for every score.
        limits = Score.objects.filter(question=self.question).aggregate(
            min=models.Min('value'), max=models.Max('value'))
        return (self.value - limits['min']) / (limits['max'] - limits['min'])
This works, but it is quite inefficient: it issues one aggregate query per score, on top of the query that fetches each student's scores.
Goal
Instead of the implementation above, I would prefer to offload the number-crunching on to the database.
What I've tried
Consider, for example, these two use cases:
list the normalized_value for all Score objects
list the mean_normalized_score for all Student objects
The first use case can be covered using window functions in a query, something like this:
from django.db.models import F, Max, Min, Window

w_min = Window(expression=Min('value'), partition_by=[F('question')])
w_max = Window(expression=Max('value'), partition_by=[F('question')])
annotated_scores = Score.objects.annotate(
    normalized_value=(F('value') - w_min) / (w_max - w_min))
This works nicely, so the Score.normalized_value() method from the example is no longer needed.
Now, I would like to do something similar for the second use case, to replace the Student.mean_normalized_score() method by a single database query.
The raw SQL could look something like this (for sqlite):
SELECT id, student_id, AVG(normalized_value) AS mean_normalized_score
FROM (
    SELECT
        myapp_score.*,
        ((myapp_score.value - MIN(myapp_score.value) OVER (PARTITION BY myapp_score.question_id))
         / (MAX(myapp_score.value) OVER (PARTITION BY myapp_score.question_id)
            - MIN(myapp_score.value) OVER (PARTITION BY myapp_score.question_id)))
        AS normalized_value
    FROM myapp_score
)
GROUP BY student_id
I can make this work as a raw Django query, but I have not yet been able to reproduce this query using Django's ORM.
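For anyone who wants to try the query outside Django, a trimmed version of it can be run against an in-memory database with Python's built-in sqlite3 module. This is a sketch with invented sample rows; it assumes the bundled SQLite is version 3.25 or newer (needed for window functions) and it selects only student_id rather than the bare id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myapp_score ("
    "id INTEGER PRIMARY KEY, student_id INTEGER, question_id INTEGER, value REAL)"
)
# (student_id, question_id, value): invented sample data
conn.executemany(
    "INSERT INTO myapp_score (student_id, question_id, value) VALUES (?, ?, ?)",
    [(1, 1, 4.0), (2, 1, 12.0), (3, 1, 8.0), (1, 2, 10.0), (2, 2, 0.0), (3, 2, 10.0)],
)

sql = """
SELECT student_id, AVG(normalized_value) AS mean_normalized_score
FROM (
    SELECT
        student_id,
        (value - MIN(value) OVER (PARTITION BY question_id))
        / (MAX(value) OVER (PARTITION BY question_id)
           - MIN(value) OVER (PARTITION BY question_id)) AS normalized_value
    FROM myapp_score
)
GROUP BY student_id
ORDER BY student_id
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Each row is a (student_id, mean_normalized_score) pair, which is the shape the ORM query should eventually reproduce.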
I've tried building on the annotated_scores queryset described above, using Django's Subquery, annotate(), aggregate(), Prefetch, and combinations of those, but I must be making a mistake somewhere.
Probably the closest I've gotten is this:
subquery = Subquery(annotated_scores.values('normalized_value'))
Score.objects.values('student_id').annotate(mean=Avg(subquery))
But this is incorrect.
Could someone point me in the right direction, without resorting to raw queries?
I may have found a way to do this using subqueries. The main obstacle is that, at least in Django, we cannot aggregate over window functions, and that is what blocks computing the mean of the normalized values. I've added comments to explain what each step does:
# Get the minimum score per question
min_subquery = Score.objects.filter(question=OuterRef('question')).values('question').annotate(min=Min('value'))
# Get the maximum score per question
max_subquery = Score.objects.filter(question=OuterRef('question')).values('question').annotate(max=Max('value'))
# Calculate the normalized value per score, then get the average by grouping by students
mean_subquery = Score.objects.filter(student=OuterRef('pk')).annotate(
    min=Subquery(min_subquery.values('min')[:1]),
    max=Subquery(max_subquery.values('max')[:1]),
    normalized=ExpressionWrapper((F('value') - F('min'))/(F('max') - F('min')), output_field=FloatField())
).values('student').annotate(mean=Avg('normalized'))
# Get the calculated mean per student
Student.objects.annotate(mean=Subquery(mean_subquery.values('mean')[:1]))
The resulting SQL is:
SELECT
    "student"."id",
    "student"."name",
    (
        SELECT
            AVG(
                (
                    (
                        V0."value" - (
                            SELECT MIN(U0."value") AS "min"
                            FROM "score" U0
                            WHERE U0."question_id" = (V0."question_id")
                            GROUP BY U0."question_id"
                            LIMIT 1
                        )
                    ) / (
                        (
                            SELECT MAX(U0."value") AS "max"
                            FROM "score" U0
                            WHERE U0."question_id" = (V0."question_id")
                            GROUP BY U0."question_id"
                            LIMIT 1
                        ) - (
                            SELECT MIN(U0."value") AS "min"
                            FROM "score" U0
                            WHERE U0."question_id" = (V0."question_id")
                            GROUP BY U0."question_id"
                            LIMIT 1
                        )
                    )
                )
            ) AS "mean"
        FROM "score" V0
        WHERE V0."student_id" = ("student"."id")
        GROUP BY V0."student_id"
        LIMIT 1
    ) AS "mean"
FROM "student"
As mentioned by @bdbd, and judging from this Django issue, it appears that aggregating over a windowed annotation is not yet possible (using Django 3.2).
As a temporary workaround, I refactored @bdbd's excellent Subquery solution as follows.
class ScoreQuerySet(models.QuerySet):
    def annotate_normalized(self):
        w_min = Subquery(self.filter(
            question=OuterRef('question')).values('question').annotate(
            min=Min('value')).values('min')[:1])
        w_max = Subquery(self.filter(
            question=OuterRef('question')).values('question').annotate(
            max=Max('value')).values('max')[:1])
        return self.annotate(normalized=(F('value') - w_min) / (w_max - w_min))

    def aggregate_student_mean(self):
        return self.annotate_normalized().values('student_id').annotate(
            mean=Avg('normalized'))

class Score(models.Model):
    objects = ScoreQuerySet.as_manager()
    ...
Note: If necessary, we can add more Student lookups to the values() in aggregate_student_mean(), e.g. student__name, as long as we take care not to mess up the grouping.
Now, if it ever becomes possible to filter and annotate windowed querysets, we can simply replace the Subquery lines by the much simpler Window implementation:
w_min = Window(expression=Min('value'), partition_by=[F('question')])
w_max = Window(expression=Max('value'), partition_by=[F('question')])
I have the following models:
class LocationPoint(models.Model):
    latitude = models.DecimalField(max_digits=16, decimal_places=12)
    longitude = models.DecimalField(max_digits=16, decimal_places=12)

    class Meta:
        unique_together = (
            ('latitude', 'longitude',),
        )

class GeoLogEntry(models.Model):
    device = models.ForeignKey(Device, on_delete=models.PROTECT)
    location_point = models.ForeignKey(LocationPoint, on_delete=models.PROTECT)
    recorded_at = models.DateTimeField(db_index=True)
    created_at = models.DateTimeField(auto_now_add=True, db_index=True)
I have lots of incoming records to create (probably thousands at once).
Currently I create them like this:
# Simplified map function contents (mapping from dict removed, as it's unrelated to the question topic)
points_models = map(lambda point: LocationPoint(latitude=latitude, longitude=longitude), points)
LocationPoint.objects.bulk_create(
    points_models,
    ignore_conflicts=True
)
# Simplified map function contents (mapping from dict removed, as it's unrelated to the question topic)
geo_log_entries = map(
    lambda log_entry: GeoLogEntry(
        device=device,
        location_point=LocationPoint.objects.get(latitude=latitude, longitude=longitude),
        recorded_at=log_entry.recorded_at,
    ),
    log_entries
)
GeoLogEntry.objects.bulk_create(geo_log_entries, ignore_conflicts=True)
But I think this is not very efficient, because it runs N SELECT queries for N records. Is there a better way to do it?
I use Python 3.9, Django 3.1.2 and PostgreSQL 12.4.
The main problem is fetching, in bulk, the objects to link to. We can do that once all of these objects have been stored:
from django.db.models import Q

points_models = [
    LocationPoint(latitude=point.latitude, longitude=point.longitude)
    for point in points
]
LocationPoint.objects.bulk_create(
    points_models,
    ignore_conflicts=True
)
qfilter = Q(
    *[
        Q(('latitude', point.latitude), ('longitude', point.longitude))
        for point in log_entries
    ],
    _connector=Q.OR
)
data = {
    (lp.longitude, lp.latitude): lp.pk
    for lp in LocationPoint.objects.filter(qfilter)
}
geo_log_entries = [
    GeoLogEntry(
        device=entry.device,
        location_point_id=data[entry.longitude, entry.latitude],
        recorded_at=entry.recorded_at
    )
    for entry in log_entries
]
GeoLogEntry.objects.bulk_create(geo_log_entries, ignore_conflicts=True)
We thus fetch all the objects we need to link to in bulk (with a single query), build a dictionary that maps the longitude and latitude to the primary key, and then set location_point_id from that dictionary.
It is important, however, to use decimals, or at least a type that will match exactly. Floating-point numbers are tricky, since they can easily pick up rounding errors (which is why longitudes and latitudes are often stored as "fixed point" numbers, for example integers scaled by a factor of 1'000 or 1'000'000). Otherwise you need a matching algorithm that is consistent with the data returned by the query.
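The rounding concern can be illustrated with a small key-normalization helper. This is a sketch, not part of the answer's code: `point_key` and `PLACES` are hypothetical names, and the 12 decimal places are taken from the decimal_places=12 setting on the model in the question.

```python
from decimal import Decimal

# 12 decimal places, matching decimal_places=12 on LocationPoint
PLACES = Decimal("0.000000000001")


def point_key(latitude, longitude):
    """Normalise coordinates to the stored precision so they can be used as dict keys."""
    return (
        Decimal(str(latitude)).quantize(PLACES),
        Decimal(str(longitude)).quantize(PLACES),
    )


# Floats that "should" be equal are not, so they make unreliable dict keys.
float_mismatch = (0.1 + 0.2) != 0.3

# After normalisation both spellings map to the same key.
lookup = {point_key("0.3", "1"): 42}  # 42 plays the role of a primary key
found = lookup.get(point_key(0.1 + 0.2, 1))
print(float_mismatch, found)
```

Applying the same helper both when building the dictionary and when looking entries up keeps the two sides consistent.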
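bulk_create(...) returns the created objects as a list, so you can look them up on the Python side instead of querying the database again. Note, however, that according to the Django documentation, passing ignore_conflicts=True disables setting the primary key on the returned instances, so check that they carry the pk you need before linking to them.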
location_points = LocationPoint.objects.bulk_create(
    points_models,
    ignore_conflicts=True
)
geo_log_entries = map(
    lambda log_entry: GeoLogEntry(
        device=device,
        location_point=get_location_point(log_entry, location_points),
        recorded_at=log_entry.recorded_at
    ),
    log_entries
)
GeoLogEntry.objects.bulk_create(geo_log_entries, ignore_conflicts=True)
All you need to do is implement get_location_point to suit your needs.
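One possible shape for get_location_point is a plain coordinate match. This is a hypothetical sketch: it assumes each log entry exposes latitude and longitude attributes holding Decimal values, and SimpleNamespace objects stand in for model instances.

```python
from decimal import Decimal
from types import SimpleNamespace


def get_location_point(log_entry, location_points):
    """Return the point whose coordinates equal the log entry's, or None."""
    for point in location_points:
        if (point.latitude, point.longitude) == (log_entry.latitude, log_entry.longitude):
            return point
    return None


# Stand-ins for model instances, for illustration only.
points = [
    SimpleNamespace(latitude=Decimal("48.85"), longitude=Decimal("2.35")),
    SimpleNamespace(latitude=Decimal("51.50"), longitude=Decimal("-0.12")),
]
entry = SimpleNamespace(latitude=Decimal("51.5"), longitude=Decimal("-0.12"))
match = get_location_point(entry, points)
```

The linear scan costs one pass over the points per log entry; with thousands of records, building a dictionary keyed by (latitude, longitude) once and looking entries up in it would likely be the better choice.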
I have 3 related models:
class Program(Model):
    ...  # which aggregates ProgramVersions

class ProgramVersion(Model):
    program = ForeignKey(Program)
    index = IntegerField()

class UserProgramVersion(Model):
    user = ForeignKey(User)
    version = ForeignKey(ProgramVersion)
    index = IntegerField()
ProgramVersion and UserProgramVersion are orderable models based on the index field: the object with the highest index in the table is considered the latest/newest object (this is handled by some custom logic, not relevant here).
I would like to select all the latest UserProgramVersions, i.e. for each Program, the latest UPV that points to it.
This can be handled by this UserProgramVersion queryset method:
def latest_user_program_versions(self):
    latest = self\
        .order_by('version__program_id', '-version__index', '-index')\
        .distinct('version__program_id')
    return self.filter(id__in=latest)
This works fine; however, I'm looking for a solution which does NOT use .distinct().
I tried something like this:
def latest_user_program_versions(self):
    latest = self\
        .annotate(
            max_version_index=Max('version__index'),
            max_index=Max('index'))\
        .filter(
            version__index=F('max_version_index'),
            index=F('max_index'))
    return self.filter(id__in=latest)
This, however, does not work.
Use Subquery() expressions, available since Django 1.11. The example in the docs is similar: its purpose is also to get the newest item for each required parent record.
(You could probably start from that example with your own objects, but I have also written a more complete, more complicated suggestion that avoids possible performance pitfalls.)
from django.db.models import OuterRef, Subquery
...
def latest_user_program_versions(self, *args, **kwargs):
    # You should filter users by args or kwargs here, for performance reasons.
    # A filter applied here is also applied to the subquery, which is much faster on a big db.
    qs = self.filter(*args, **kwargs)
    parent = Program.objects.filter(pk__in=qs.values('version__program'))
    newest = (
        qs.filter(version__program=OuterRef('pk'))
        .order_by('-version__index', '-index')
    )
    pks = (
        parent.annotate(newest_id=Subquery(newest.values('pk')[:1]))
        .values_list('newest_id', flat=True)
    )
    # You may prefer to uncomment this so that it is compiled as two shorter SQL queries.
    # pks = list(pks)
    return self.filter(pk__in=pks)
If you considerably improve it, write the solution in your answer.
EDIT Your problem in your second solution:
Nobody can saw off the branch they are sitting on, not even in SQL, but I can sit on a temporary copy of it in a subquery and survive :-) That is also why I ask for a filter at the beginning. The second problem is that Max('version__index') and Max('index') could come from two different objects, in which case no row satisfies both conditions.
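The second problem can be shown with two invented rows in plain Python (the values are made up; each tuple is a (version index, index) pair for the UPVs of one program):

```python
# (version_index, index) pairs for the UPVs of one program; values invented.
rows = [(2, 1), (1, 5)]

max_version_index = max(v for v, _ in rows)  # 2, taken from the first row
max_index = max(i for _, i in rows)          # 5, taken from the second row

# No single row holds both maxima, so filtering on both returns nothing.
both_maxima = [r for r in rows if r == (max_version_index, max_index)]

# Ordering by version index first and index second always yields a real row,
# which is what order_by('-version__index', '-index') with [:1] does.
newest = sorted(rows, key=lambda r: (-r[0], -r[1]))[0]
print(both_maxima, newest)
```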
EDIT2: Verified: The internal SQL from my query is complicated, but seems correct.
SELECT app_userprogramversion.id,...
FROM app_userprogramversion
WHERE app_userprogramversion.id IN
    (SELECT
        (SELECT U0.id
         FROM app_userprogramversion U0
         INNER JOIN app_programversion U2 ON (U0.version_id = U2.id)
         WHERE (U0.user_id = 123 AND U2.program_id = (V0.id))
         ORDER BY U2.index DESC, U0.index DESC LIMIT 1
        ) AS newest_id
     FROM app_program V0 WHERE V0.id IN
        (SELECT U2.program_id AS Col1
         FROM app_userprogramversion U0
         INNER JOIN app_programversion U2 ON (U0.version_id = U2.id)
         WHERE U0.user_id = 123
        )
    )
Using Django ORM, can one do something like queryset.objects.annotate(Count('queryset_objects', gte=VALUE)). Catch my drift?
Here's a quick example to use for illustrating a possible answer:
In a Django website, content creators submit articles, and regular users view (i.e. read) the said articles. Articles can either be published (i.e. available for all to read), or in draft mode. The models depicting these requirements are:
class Article(models.Model):
    author = models.ForeignKey(User)
    published = models.BooleanField(default=False)

class Readership(models.Model):
    reader = models.ForeignKey(User)
    which_article = models.ForeignKey(Article)
    what_time = models.DateTimeField(auto_now_add=True)
My question is: How can I get all published articles, sorted by unique readership from the last 30 mins? I.e. I want to count how many distinct (unique) views each published article got in the last half an hour, and then produce a list of articles sorted by these distinct views.
I tried:
date = datetime.now()-timedelta(minutes=30)
articles = Article.objects.filter(published=True).extra(select = {
    "views" : """
        SELECT COUNT(*)
        FROM myapp_readership
        JOIN myapp_article on myapp_readership.which_article_id = myapp_article.id
        WHERE myapp_readership.reader_id = myapp_user.id
        AND myapp_readership.what_time > %s """ % date,
}).order_by("-views")
This raised the error: syntax error at or near "01" (where "01" came from the datetime object interpolated into extra). It's not much to go on.
For django >= 1.8
Use Conditional Aggregation:
from django.db.models import Count, Case, When, IntegerField

# threshold is the datetime cutoff to compare against
Article.objects.annotate(
    numviews=Count(Case(
        When(readership__what_time__lt=threshold, then=1),
        output_field=IntegerField(),
    ))
)
Explanation:
A normal query over your articles will be annotated with a numviews field. That field is built as a CASE/WHEN expression wrapped in Count: it returns 1 for each readership row matching the criteria and NULL for each row that does not. Count ignores NULLs and counts only the remaining values. (To count views from the last 30 minutes, as in the question, compare with __gt against a cutoff of now minus 30 minutes.)
You will get zeros for articles that haven't been viewed recently, and you can use the numviews field for sorting and filtering.
Query behind this for PostgreSQL will be:
SELECT
    "app_article"."id",
    "app_article"."author",
    "app_article"."published",
    COUNT(
        CASE WHEN "app_readership"."what_time" < 2015-11-18 11:04:00.000000+01:00 THEN 1
        ELSE NULL END
    ) as "numviews"
FROM "app_article" LEFT OUTER JOIN "app_readership"
    ON ("app_article"."id" = "app_readership"."which_article_id")
GROUP BY "app_article"."id", "app_article"."author", "app_article"."published"
If we want to count only unique views, we can pass distinct=True to Count and make the When clause return the value we want to be distinct on.
from django.db.models import Count, Case, When, CharField, F

Article.objects.annotate(
    numviews=Count(Case(
        When(readership__what_time__lt=threshold, then=F('readership__reader')),  # it can also be `readership__reader_id`, it doesn't matter
        output_field=CharField(),
    ), distinct=True)
)
That will produce:
SELECT
    "app_article"."id",
    "app_article"."author",
    "app_article"."published",
    COUNT(
        DISTINCT CASE WHEN "app_readership"."what_time" < 2015-11-18 11:04:00.000000+01:00 THEN "app_readership"."reader_id"
        ELSE NULL END
    ) as "numviews"
FROM "app_article" LEFT OUTER JOIN "app_readership"
    ON ("app_article"."id" = "app_readership"."which_article_id")
GROUP BY "app_article"."id", "app_article"."author", "app_article"."published"
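The behaviour of COUNT(DISTINCT CASE ...) over a LEFT OUTER JOIN can be checked with Python's built-in sqlite3 module. This is a sketch with invented table names and rows; integer timestamps stand in for datetimes, and the comparison is `>` so that only views after the cutoff are counted, as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (id INTEGER PRIMARY KEY);
CREATE TABLE readership (article_id INTEGER, reader_id INTEGER, what_time INTEGER);
INSERT INTO article (id) VALUES (1), (2), (3);
-- article 1: reader 10 twice after the cutoff, reader 11 before it
INSERT INTO readership VALUES (1, 10, 100), (1, 10, 110), (1, 11, 50);
-- article 2 has no views; article 3 has two distinct recent readers
INSERT INTO readership VALUES (3, 12, 120), (3, 13, 130);
""")

cutoff = 90
rows = conn.execute(
    """
    SELECT a.id,
           COUNT(DISTINCT CASE WHEN r.what_time > ? THEN r.reader_id END) AS numviews
    FROM article a
    LEFT OUTER JOIN readership r ON a.id = r.article_id
    GROUP BY a.id
    ORDER BY numviews DESC, a.id
    """,
    (cutoff,),
).fetchall()
print(rows)
```

The article without any readership rows still appears, with a count of zero, because the CASE yields NULL for it and COUNT skips NULLs.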
For django < 1.8 and PostgreSQL
You can use raw to execute the SQL statement generated by newer versions of Django. Apparently there is no simple and optimized way to query this data without raw (even with extra there are problems with injecting the required JOIN clause). Pass the cutoff as a query parameter rather than pasting it into the SQL string.
Article.objects.raw(
    'SELECT'
    ' "app_article"."id",'
    ' "app_article"."author",'
    ' "app_article"."published",'
    ' COUNT('
    ' DISTINCT CASE WHEN "app_readership"."what_time" < %s THEN "app_readership"."reader_id"'
    ' ELSE NULL END'
    ' ) as "numviews"'
    ' FROM "app_article" LEFT OUTER JOIN "app_readership"'
    ' ON ("app_article"."id" = "app_readership"."which_article_id")'
    ' GROUP BY "app_article"."id", "app_article"."author", "app_article"."published"',
    [threshold],
)
For django >= 2.0 you can use Conditional aggregation with a filter argument in the aggregate functions:
from datetime import timedelta

from django.db.models import Count, Q
from django.utils import timezone

Article.objects.annotate(
    numviews=Count(
        'readership__reader__id',
        filter=Q(readership__what_time__gt=timezone.now() - timedelta(minutes=30)),
        distinct=True
    )
)
Given PostgreSQL 9.2.10, Django 1.8, python 2.7.5 and the following models:
class restProdAPI(models.Model):
    rest_id = models.PositiveIntegerField(primary_key=True)
    rest_host = models.CharField(max_length=20)
    rest_ip = models.GenericIPAddressField(default='0.0.0.0')
    rest_mode = models.CharField(max_length=20)
    rest_state = models.CharField(max_length=20)

class soapProdAPI(models.Model):
    soap_id = models.PositiveIntegerField(primary_key=True)
    soap_host = models.CharField(max_length=20)
    soap_ip = models.GenericIPAddressField(default='0.0.0.0')
    soap_asset = models.CharField(max_length=20)
    soap_state = models.CharField(max_length=20)
And the following raw query which returns exactly what I am looking for:
SELECT
    app_restProdAPI.rest_id, app_soapProdAPI.soap_id, app_restProdAPI.rest_host, app_restProdAPI.rest_ip, app_soapProdAPI.soap_asset, app_restProdAPI.rest_mode, app_restProdAPI.rest_state
FROM
    app_soapProdAPI
LEFT OUTER JOIN
    app_restProdAPI
ON
    ((app_restProdAPI.rest_host = app_soapProdAPI.soap_host)
    OR
    (app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip))
WHERE
    app_restProdAPI.rest_mode = 'Excluded';
Which returns like this:
rest_id | soap_id | rest_host | rest_ip | soap_asset | rest_mode | rest_state
---------+---------+---------------+----------------+------------+-----------+-----------
1234 | 12345 | 1G24019123ABC | 123.123.123.12 | A1234567 | Excluded | Up
What would be the best method for making this work using Django's models and ORM?
I have been looking around for possible methods for joining the two tables entirely without a relationship but there does not seem to be a clean or efficient way to do this. I have also tried looking for methods to do left outer joins in django, but again documentation is sparse or difficult to decipher.
I know I will probably have to use Q objects to do the or clause I have in there. Additionally I have looked at relationships and it looks like a foreignkey() may work but I am unsure if this is the best method of doing it. Any and all help would be greatly appreciated. Thank you in advance.
** EDIT 1 **
So far Todor has offered a solution that uses an INNER JOIN and works. I may have found a solution HERE if anyone can decipher that mess of inline raw HTML.
** EDIT 2 **
Is there a way to filter on a field (where something = 'something'), like in my query above, given Todor's answer? I tried the following, but it still includes all records, even though my equivalent PostgreSQL query works as expected. It seems I cannot put everything in the where clause the way I do: when I remove one of the OR conditions and keep only an AND, the Excluded filter is applied.
soapProdAPI.objects.extra(
    select = {
        'rest_id'   : 'app_restprodapi.rest_id',
        'rest_host' : 'app_restprodapi.rest_host',
        'rest_ip'   : 'app_restprodapi.rest_ip',
        'rest_mode' : 'app_restprodapi.rest_mode',
        'rest_state' : 'app_restprodapi.rest_state'
    },
    tables = ['app_restprodapi'],
    where = ['app_restprodapi.rest_mode=%s \
             AND app_restprodapi.rest_host=app_soapprodapi.soap_host \
             OR app_restprodapi.rest_ip=app_soapprodapi.soap_ip'],
    params = ['Excluded']
)
** EDIT 3 / CURRENT SOLUTION IN PLACE **
To date Todor has provided the most complete answer, using an INNER JOIN, but the hope is that this question will generate thought about how a LEFT OUTER JOIN may still be accomplished. As this does not seem to be inherently possible, any and all suggestions are welcome, as they may lead to better solutions. That being said, using Todor's answer, I was able to accomplish the exact query I needed:
restProdAPI.objects.extra(
    select = {
        'soap_id'    : 'app_soapprodapi.soap_id',
        'soap_asset' : 'app_soapprodapi.soap_asset'
    },
    tables = ['app_soapprodapi'],
    where = ['app_restprodapi.rest_mode = %s',
             'app_soapprodapi.soap_host = app_restprodapi.rest_host OR \
              app_soapprodapi.soap_ip = app_restprodapi.rest_ip'
             ],
    params = ['Excluded']
)
** TLDR **
I would like to convert this PostgreSQL query to the ORM provided by Django WITHOUT using .raw() or any raw query code at all. I am completely open to changing the model to use a foreign key if that facilitates this and is, from a performance standpoint, the best method. I am going to be using the returned objects in conjunction with django-datatables-view, if that helps in terms of design.
Solving it with INNER JOIN
If you only need the soapProdAPI rows that have a corresponding restProdAPI (in terms of your join condition, i.e. linked by host or IP), you can try the following:
soapProdAPI.objects.extra(
    select = {
        'rest_id'   : "app_restProdAPI.rest_id",
        'rest_host' : "app_restProdAPI.rest_host",
        'rest_ip'   : "app_restProdAPI.rest_ip",
        'rest_mode' : "app_restProdAPI.rest_mode",
        'rest_state': "app_restProdAPI.rest_state"
    },
    tables = ["app_restProdAPI"],
    where = ["app_restProdAPI.rest_host = app_soapProdAPI.soap_host \
              OR app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip"]
)
How to filter more?
Since we are using .extra, I would advise reading the docs carefully. In general we can't use .filter with the fields inside the select dict, because they are not part of soapProdAPI and Django can't resolve them. We have to stick with the where kwarg in .extra, and since it's a list, it is best to add another element.
where = ["app_restProdAPI.rest_host = app_soapProdAPI.soap_host \
          OR app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip",
         "app_restProdAPI.rest_mode=%s"
         ],
params = ['Excluded']
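Keeping the mode filter as its own list element matters because of operator precedence: in SQL, AND binds more tightly than OR, and the elements of the where list are ANDed together (as far as I know, each one is wrapped in parentheses first). The effect can be reproduced with Python's built-in sqlite3 module; the table and its columns below are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (mode TEXT, host_match INTEGER, ip_match INTEGER)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?, ?)",
    [
        ("Excluded", 1, 0),  # wanted
        ("Included", 0, 1),  # not wanted, but matches on IP
        ("Included", 1, 0),  # not wanted
    ],
)

# One string: parsed as (mode = ? AND host_match = 1) OR ip_match = 1
loose = conn.execute(
    "SELECT COUNT(*) FROM pairs WHERE mode = ? AND host_match = 1 OR ip_match = 1",
    ("Excluded",),
).fetchone()[0]

# Separate, parenthesised conditions: mode = ? AND (host_match = 1 OR ip_match = 1)
strict = conn.execute(
    "SELECT COUNT(*) FROM pairs WHERE (mode = ?) AND (host_match = 1 OR ip_match = 1)",
    ("Excluded",),
).fetchone()[0]
print(loose, strict)
```

The first query lets the 'Included' row through via its IP match, which is the same symptom as the filter being ignored in EDIT 2 of the question.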
Repeated subquery
If you really need all soapProdAPI rows, whether or not they have a corresponding restProdAPI, I can only think of one ugly approach, where a subquery is repeated for each field you need.
soapProdAPI.objects.extra(
    select = {
        'rest_id'   : "(select rest_id from app_restProdAPI where app_restProdAPI.rest_host = app_soapProdAPI.soap_host OR app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip)",
        'rest_host' : "(select rest_host from app_restProdAPI where app_restProdAPI.rest_host = app_soapProdAPI.soap_host OR app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip)",
        'rest_ip'   : "(select rest_ip from app_restProdAPI where app_restProdAPI.rest_host = app_soapProdAPI.soap_host OR app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip)",
        'rest_mode' : "(select rest_mode from app_restProdAPI where app_restProdAPI.rest_host = app_soapProdAPI.soap_host OR app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip)",
        'rest_state': "(select rest_state from app_restProdAPI where app_restProdAPI.rest_host = app_soapProdAPI.soap_host OR app_restProdAPI.rest_ip = app_soapProdAPI.soap_ip)"
    },
)
I think this could be useful for you! You can use Q objects to construct your query.
I tried it in the Django shell: I created some data and did something like this:
restProdAPI.objects.filter(Q(rest_host=s1.soap_host)|Q(rest_ip=s1.soap_ip))
where s1 is a soapProdAPI.
This is all the code I wrote; you can try it and see if it helps:
from django.db.models import Q
from core.models import restProdAPI, soapProdAPI
s1 = soapProdAPI.objects.get(soap_id=1)
restProdAPI.objects.filter(Q(rest_host=s1.soap_host)|Q(rest_ip=s1.soap_ip))