Django-queryset join without foreignkey - python

model.py
class Tdzien(models.Model):
    dziens = models.SmallIntegerField(primary_key=True, db_column='DZIENS')
    dzienrok = models.SmallIntegerField(unique=True, db_column='ROK')

class Tnogahist(models.Model):
    id_noga = models.ForeignKey(Tenerg, primary_key=True, db_column='ID_ENERG')
    dziens = models.SmallIntegerField(db_column='DZIENS')
What I want is to get id_noga where dzienrok=1234. I know that dziens should be
dziens = models.ForeignKey(Tdzien)
but it isn't and I can't change that. Normally I would use something like
Tnogahist.objects.filter(dziens__dzienrok=1234)
but I don't know how to join and filter these tables without a foreign key.

No joins without a foreign key as far as I know, but you could use two queries:
Tnogahist.objects.filter(dziens__in=Tdzien.objects.filter(dzienrok=1234))
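A close variant of the same idea (just a sketch, reusing the field names from the question) keeps the subquery on a single column and returns only the id_noga values:

# Subquery over the DZIENS values for the requested year, then pull out id_noga.
dziens_for_year = Tdzien.objects.filter(dzienrok=1234).values_list('dziens', flat=True)
id_nogas = Tnogahist.objects.filter(dziens__in=dziens_for_year).values_list('id_noga', flat=True)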

It's possible to join two tables by performing a raw SQL query, but for this case it's quite nasty, so I recommend rewriting your models.py instead.
You can check how to do this here.
It would be something like this:
from django.db import connection

def my_custom_sql(self):
    cursor = connection.cursor()
    cursor.execute(
        "select a.id_noga "
        "from myapp_tnogahist a "
        "inner join myapp_tdzien b on a.dziens = b.dziens "
        "where b.dzienrok = 1234"
    )
    row = cursor.fetchone()
    return row
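If you do go the raw-SQL route, a parameterized variant is safer than embedding the value in the SQL string. This is only a sketch with a hypothetical helper name; it keeps the lowercase myapp_* table and column names used in this answer, although with the db_column settings from the question the actual column names would be ID_ENERG, DZIENS and ROK:

from django.db import connection

def id_noga_for_dzienrok(dzienrok):  # hypothetical helper, not from the original post
    with connection.cursor() as cursor:
        # Let the database driver bind the value instead of formatting it in.
        cursor.execute(
            "select a.id_noga "
            "from myapp_tnogahist a "
            "inner join myapp_tdzien b on a.dziens = b.dziens "
            "where b.dzienrok = %s",
            [dzienrok],
        )
        return cursor.fetchone()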

Could you do this with .extra? From https://docs.djangoproject.com/en/dev/ref/models/querysets/#extra:
where / tables
You can define explicit SQL WHERE clauses — perhaps to perform
non-explicit joins — by using where. You can manually add tables to
the SQL FROM clause by using tables.

To provide a little more context around paul-tomblin's answer: it's worth mentioning that for the vast majority of Django users, the best course of action is to implement a conventional foreign key. Django strongly recommends avoiding extra(), saying "use this method as a last resort". However, extra() is still preferable to raw queries using Manager.raw() or to executing custom SQL directly with django.db.connection.
Here's an example of how you would achieve this using Django's .extra() method:
Tnogahist.objects.extra(
    tables=['myapp_tdzien'],
    where=[
        'myapp_tnogahist.dziens=myapp_tdzien.dziens',
        'myapp_tdzien.dzienrok=%s',
    ],
    params=[1234],
)
The primary appeal of extra() over the other approaches is that it plays nicely with the rest of Django's queryset stack (filter, exclude, defer, values, slicing, and so on), so you can usually plug it in alongside traditional Django query logic. For example: Tnogahist.objects.filter(...).extra(...).values('id_noga')[:10]

Related

Django: How to "join" two querysets using Prefetch Object?

Context
I am quite new to Django and I am trying to write a complex query that I think would be easy to write in raw SQL, but that I am struggling to express with the ORM.
Models
I have several models named SignalValue, SignalCategory, SignalSubcategory, SignalType, and SignalSubtype that share the same structure as the following model:
class MyModel(models.Model):
    id = models.BigAutoField(primary_key=True)
    name = models.CharField()
    fullname = models.CharField()
I also have explicit models that represent the relationships between the model SignalValue and the other models SignalCategory, SignalSubcategory, SignalType, and SignalSubtype. These relationship models are named SignalValueCategory, SignalValueSubcategory, SignalValueType, and SignalValueSubtype, respectively. Below is the SignalValueCategory model as an example:
class SignalValueCategory(models.Model):
    signal_value = models.OneToOneField(SignalValue, on_delete=models.CASCADE)
    signal_category = models.ForeignKey(SignalCategory, on_delete=models.CASCADE)
Finally, I also have the two following models. ResultSignal stores all the signals related to the model Result:
class Result(models.Model):
    pass

class ResultSignal(models.Model):
    id = models.BigAutoField(primary_key=True)
    result = models.ForeignKey(Result, on_delete=models.CASCADE)
    signal_value = models.ForeignKey(SignalValue, on_delete=models.CASCADE)
Query
What I am trying to achieve is the following.
For a given Result, I want to retrieve all the ResultSignals that belong to it, filter them to keep the ones of interest, and annotate them with two fields that we will call filter_group_id and filter_group_name. The values of these two fields are determined by the SignalValue of each ResultSignal.
From my perspective, the easiest way to achieve this would be first to annotate the SignalValues with their corresponding filter_group_name and filter_group_id, and then to join the resulting QuerySet with the ResultSignals. However, I think that it is not possible to join two QuerySets together in Django. Consequently, I thought that we could maybe use Prefetch objects to achieve what I am trying to do, but it seems that I am unable to make it work properly.
Code
I will now describe the current state of my queries.
First, annotating the SignalValues with their corresponding filter_group_name and filter_group_id. Note that filter_aggregator in the following code is just a complex filter that selects only the wanted SignalValues, and group_filters is the same filter expressed as a list of subfilters. Additionally, filter_name_case is a conditional expression (a Case() construct):
# Attribute a filter_group_id and a filter_group_name to each signal
signal_filters = SignalValue.objects.filter(
    filter_aggregator
).annotate(
    filter_group_id=Window(
        expression=DenseRank(),
        order_by=group_filters
    ),
    filter_group_name=filter_name_case
)
Then, trying to join/annotate the ResultSignals:
prefetch_object = Prefetch(
    lookup="signal_value",
    queryset=signal_filters,
    to_attr="test"
)
result_signals: QuerySet = (
    last_interview_result
    .resultsignal_set
    .filter(signal_value__in=signal_values_of_interest)
    .select_related(
        'signal_value__signalvaluecategory__signal_category',
        'signal_value__signalvaluesubcategory__signal_subcategory',
        'signal_value__signalvaluetype__signal_type',
        'signal_value__signalvaluesubtype__signal_subtype',
    )
    .prefetch_related(
        prefetch_object
    )
    .values(
        "signal_value",
        "test",
        category=F('signal_value__signalvaluecategory__signal_category__name'),
        subcategory=F('signal_value__signalvaluesubcategory__signal_subcategory__name'),
        type=F('signal_value__signalvaluetype__signal_type__name'),
        subtype=F('signal_value__signalvaluesubtype__signal_subtype__name'),
    )
)
Normally, from my understanding, the resulting QuerySet should have a "test" field available that contains the fields of signal_filters, the first QuerySet. However, Django complains that "test" is not found when calling .values(...) in the last part of my code: Cannot resolve keyword 'test' into field. Choices are: [...]. It is as if the to_attr parameter of the Prefetch object was not taken into account at all.
Questions
Did I misunderstand how the annotate() and prefetch_related() functions work? If not, what am I doing wrong in my code for the specified to_attr parameter to be missing from my resulting QuerySet?
Is there a better way to join two QuerySets in Django, or am I better off using raw SQL? An alternative would be to switch to pandas and do the join in memory, but it is usually more efficient to do such transformations on the SQL side with well-designed queries.
You're on the right path, but you're just missing what prefetch_related() does.
Your annotations are correct, but the prefetched "test" isn't a field you can query on. Prefetching batches up the SELECT * FROM signal_value queries so you don't execute one select per row; it only caches the results on the instances. Just drop "test" from the values() call and you should be fine. https://docs.djangoproject.com/en/3.2/ref/models/querysets/#prefetch-related
Please don't use pandas; it's definitely not necessary here and adds a ton of overhead. As you say yourself, it's more efficient to do the transformations on the SQL side.
From the docs on prefetch_related:
Remember that, as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query.
It's not obvious, but the values() call is one of those chained methods that imply a different query, and it will effectively cancel the prefetch_related(). This should work if you remove it.
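For illustration, here is a sketch of the fixed query (variable names are taken from the question; the select_related() chain is omitted for brevity). It assumes that, since signal_value is a single-valued relation, to_attr="test" stores the one annotated SignalValue directly on each ResultSignal, or None if it did not match the prefetch queryset's filter:

result_signals = (
    last_interview_result
    .resultsignal_set
    .filter(signal_value__in=signal_values_of_interest)
    .prefetch_related(prefetch_object)  # no .values() afterwards
)

rows = []
for rs in result_signals:
    sv = rs.test  # the annotated SignalValue cached by Prefetch(to_attr="test")
    if sv is not None:
        rows.append({
            "signal_value": sv.pk,
            "filter_group_id": sv.filter_group_id,
            "filter_group_name": sv.filter_group_name,
        })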

In SQLAlchemy, is there a way to eager-load multiple aliased selectables of the same class using one query?

I have an SQLAlchemy mapped class MyClass, and two aliases for it. I can eager-load a relationship MyClass.relationship on each alias separately using selectinload() like so:
alias_1, alias_2 = aliased(MyClass), aliased(MyClass)
q = session.query(alias_1, alias_2).options(
    selectinload(alias_1.relationship),
    selectinload(alias_2.relationship),
)
However, this results in 2 separate SQL queries on MyClass.relationship (in addition to the main query on MyClass, but this is irrelevant to the question). Since these 2 queries on MyClass.relationship are to the same table, I think that it should be possible to merge the primary keys generated within the IN clause in these queries, and just run 1 query on MyClass.relationship.
My best guess for how to do this is:
alias_1, alias_2 = aliased(MyClass), aliased(MyClass)
q = session.query(alias_1, alias_2).options(
    selectinload(MyClass.relationship),
)
But it clearly didn't work:
sqlalchemy.exc.ArgumentError: Mapped attribute "MyClass.relationship" does not apply to any of the root entities in this query, e.g. aliased(MyClass), aliased(MyClass). Please specify the full path from one of the root entities to the target attribute.
Is there a way to do this in SQLAlchemy?
So, this is exactly the same issue we had. The docs explain how to do it: you need to add selectin_polymorphic. For anyone else: if you are using with_polymorphic in your select, remove it.
from sqlalchemy.orm import selectin_polymorphic

query = session.query(MyClass).options(
    selectin_polymorphic(MyClass, [alias_1, alias_2]),
    selectinload(MyClass.relationship),
)

Pass a queryset as the argument to __in in django?

I have a list of object IDs that I am getting from a query in a model's method, and then I'm using that list to delete objects from a different model:
class SomeObject(models.Model):
    # [...]
    def do_stuff(self, some_param):
        # [...]
        ids_to_delete = {item.id for item in self.items.all()}
        other_object, _ = OtherObject.objects.get_or_create(some_param=some_param)
        other_object.items.filter(item_id__in=ids_to_delete).delete()
What I don't like is that this takes 2 queries (well, technically 3 for the get_or_create() but in the real code it's actually .filter(some_param=some_param).first() instead of the .get(), so I don't think there's any easy way around that).
How do I pass in an unevaluated queryset as the argument to an __in lookup?
I would like to do something like:
ids_to_delete = self.items.all().values("id")
other_object.items.filter(item_id__in=ids_to_delete).delete()
You can pass a QuerySet to the query:
other_object.items.filter(id__in=self.items.all()).delete()
This will turn it into a subquery. But not all databases (especially MySQL) handle such subqueries well. Furthermore, Django handles .delete() manually: it first makes a query to fetch the primary keys of the items, and then triggers the delete logic (which also removes items that have a CASCADE dependency). So .delete() is not done as one query, but as at least two, and often more because of ForeignKeys with an on_delete trigger.
Note, however, that this removes the Item objects themselves; it does not just "unlink" them from other_object. For the latter, .remove(…) [Django-doc] can be used.
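For example, a minimal sketch of the unlink variant (same names as in the question, assuming other_object.items is a ManyToManyField manager pointing at the same model as self.items):

# Deletes only the link rows between other_object and these items;
# the item rows themselves are kept.
other_object.items.remove(*self.items.all())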
I should've tried the code sample I posted; you can in fact do this. It's given as an example in the documentation, though the docs say to "be cautious about using nested queries and understand your database server's performance characteristics" and suggest, if in doubt, evaluating the subquery into a list instead:
values = Blog.objects.filter(
    name__contains='Cheddar').values_list('pk', flat=True)
entries = Entry.objects.filter(blog__in=list(values))

How to bulk-associate an object to multiple objects that have ManyToManyField?

I have a model that looks like this:
class Keyword(models.Model):
    name = models.CharField(unique=True)

class Post(models.Model):
    title = models.CharField()
    keywords = models.ManyToManyField(
        Keyword, related_name="posts_that_have_this_keyword"
    )
Now I want to migrate all Posts of a wrongly named Keyword to a new properly named Keyword. And there are multiple wrongly named Keywords.
I can do the following but it leads to a number of SQL queries.
for keyword in Keyword.objects.filter(is_wrongly_named=True).iterator():
    old = keyword
    new, _ = Keyword.objects.get_or_create(name='some proper name')
    for post in old.posts_that_have_this_keyword.all():
        post.keywords.add(new)
    old.delete()
Is there a way I can achieve this while minimizing the SQL queries executed?
I prefer a Django ORM solution to a raw SQL one, because I jumped right into the Django ORM without studying SQL in depth, so I'm not very familiar with it.
Thank you.
If you want to perform bulk operations with M2M relationships I suggest that you act directly on the table that joins the two objects. Django allows you to access this otherwise anonymous table by using the through attribute on the M2M attribute on an object.
So, to get the table that joins Keywords and Posts you could reference either Keyword.posts_that_have_this_keyword.through or Post.keywords.through. I'd suggest you assign a nicely named variable to this like:
KeywordPost = Post.keywords.through
Once you get hold of that table, bulk operations can be performed.
Bulk remove bad entries:
KeywordPost.objects.filter(keyword__is_wrongly_named=True).delete()
Bulk create new entries:
invalid_keyword_posts = KeywordPost.objects.filter(keyword__is_wrongly_named=True)
post_ids_to_update = invalid_keyword_posts.values_list("post_id", flat=True)
new_keyword_posts = [KeywordPost(post_id=p_id, keyword=new_keyword) for p_id in post_ids_to_update]
KeywordPost.objects.bulk_create(new_keyword_posts)
Basically you get access to all the features that the ORM provides on this join table. You should be able to achieve much better performance that way.
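Putting the two halves together, here is a rough sketch (it reuses the names above; new_keyword is assumed to be the already-created, properly named Keyword, and ignore_conflicts requires Django 2.2+). The post ids are collected before anything is deleted, otherwise the second filter would match rows that no longer exist:

KeywordPost = Post.keywords.through

# Collect the affected post ids up front, before touching any rows.
post_ids_to_update = list(
    KeywordPost.objects.filter(keyword__is_wrongly_named=True)
    .values_list("post_id", flat=True)
    .distinct()
)

# One INSERT for all the new link rows; ignore_conflicts skips posts that
# already have new_keyword (the auto-created through table is unique per pair).
KeywordPost.objects.bulk_create(
    [KeywordPost(post_id=p_id, keyword=new_keyword) for p_id in post_ids_to_update],
    ignore_conflicts=True,
)

# One DELETE for all the old link rows.
KeywordPost.objects.filter(keyword__is_wrongly_named=True).delete()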
You can read up more on the through attribute here: https://docs.djangoproject.com/en/3.0/ref/models/fields/#django.db.models.ManyToManyField.through
Good luck!

Enhancing a queryset's performance

I'm learning DRF and experimenting with a queryset, trying to optimize it to work as efficiently as possible. The goal is to get a list of grades for active students who are majoring in 'Art'.
Based on database optimization techniques, I've run some different updates and don't see a difference in the time reported in the console's Network tab. I DO, however, see less output for the Seq Scan when I run the .explain() method on the model filtering. Am I accomplishing anything by doing that?
For example:
Grades.objects.filter(student_id__in=list(student_list)).order_by()
Is there anything else I can do to improve the code below that I might be missing, outside of adding any foreign or primary key model changes?
class GradeViewSet(viewsets.ModelViewSet):
    serializer_class = GradesSerializer

    def retrieve(self, request, *args, **kwargs):
        active_students = Student.objects.filter(active=True)
        student_list = active_students.filter(major='Art').values_list('student_id')
        queryset = Grades.objects.filter(student_id__in=student_list)
        serializers = GradesSerializer(queryset, many=True)
        return Response(serializers.data)
This is the SQL query I'm attempting to create in Django:
select * from app_grades g
join app_students s on g.student_id = s.student_id
where s.active = true and s.major = 'Art'
Your code will execute two separate database queries; I suggest you try the following query instead:
queryset = Grades.objects.filter(student__active=True, student__major='Art')
This code will retrieve the exact same records while performing only one query, with the appropriate JOIN clause.
You probably want to take a look at this part of the documentation.
Because the lack of model relations forbids the use of related lookups, I suggest you use an Exists subquery. In this specific case the query would be as follows:
queryset = Grades.objects.annotate(
    student_passes_filter=Exists(
        Student.objects.filter(id=OuterRef('student_id'), active=True, major='Art')
    )
).filter(student_passes_filter=True)
You will need to import Exists and OuterRef. Note that these are available from Django 1.11 onwards.
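For context, here is a sketch of how that annotation could slot into the viewset from the question (class and field names are taken from the question; adjust the id vs student_id lookup to whatever key column Student actually uses):

from django.db.models import Exists, OuterRef
from rest_framework import viewsets
from rest_framework.response import Response


class GradeViewSet(viewsets.ModelViewSet):
    serializer_class = GradesSerializer

    def retrieve(self, request, *args, **kwargs):
        # Single query: a correlated EXISTS replaces the separate Student lookup.
        queryset = Grades.objects.annotate(
            student_passes_filter=Exists(
                Student.objects.filter(
                    id=OuterRef('student_id'), active=True, major='Art'
                )
            )
        ).filter(student_passes_filter=True)
        serializer = GradesSerializer(queryset, many=True)
        return Response(serializer.data)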
You should probably regroup those lines to reduce the number of queries:
active_students = Student.objects.filter(active=True)
student_list = active_students.filter(major='Art').values_list('student_id')
into:
active_students = Student.objects.filter(active=True, major='Art')
and then convert the result to a list before passing it to the __in lookup.
