I'm currently performing a raw query in my database because i use the MySQL function instr. I would like to translate is into a django Func class.I've spend several days reading the docs, Django custom for complex Func (sql function) and Annotate SQL Function to Django ORM Queryset `FUNC` / `AGGREGATE` but i still fail to write succesfully my custom Func.
This is my database
from django.db import models
class Car(models.Model):
brand = models.CharField("brand name", max_length=50)
#then add the __str__ function
Then I populate my database for this test
Car.objects.create(brand="mercedes")
Car.objects.create(brand="bmw")
Car.objects.create(brand="audi")
I want to check if something in my table is in my user input. This is how i perform my SQL query currently
query = Car.objects.raw("SELECT * FROM myAppName_car WHERE instr(%s, brand)>0", ["my audi A3"])
# this return an sql query with one element in this example
I'm trying to tranform it in something that would look like this
from django.db.models import Func
class Instr(Func):
function = 'INSTR'
query = Car.objects.filter(brand=Instr('brand'))
EDIT
Thank to the response, the correct answer is
from django.db.models import Value
from django.db.models.functions import StrIndex
query = Car.objects.annotate(pos=StrIndex(Value("my audi A3"), "brand")).filter(pos__gt=0)
Your custom database function is totally correct, but you're using it in the wrong way.
When you look at your usage in the raw SQL function, you can clearly see that you need 3 parameters for your filtering to work correctly: a string, a column name and a threshold (in your case it is always zero)
If you want to use this function in the same way in your query, you should do it like this:
query = Car.objects.annotate(
position_in_brand=Instr("my audi A3")
).filter(position_in_brand__gt=0)
For this to work properly, you also need to add output_field = IntegerField() to your function class, so Django knows that the result will always be an integer.
But... You've missed 2 things and your function is not needed at all, because it already exist in Django as StrIndex.
And the 2nd thing is: you can achieve the same outcome by using already existing django lookups, like contains or icontains:
query = Car.objects.filter(brand__contains="my audi A3")
You can find out more about lookups in Django docs
Related
Context
I am quite new to Django and I am trying to write a complex query that I think would be easily writable in raw SQL, but for which I am struggling using the ORM.
Models
I have several models named SignalValue, SignalCategory, SignalSubcategory, SignalType, SignalSubtype that have the same structure like the following model:
class MyModel(models.Model):
id = models.BigAutoField(primary_key=True)
name = models.CharField()
fullname = models.CharField()
I also have explicit models that represent the relationships between the model SignalValue and the other models SignalCategory, SignalSubcategory, SignalType, SignalSubtype. Each of these relationships are named SignalValueCategory, SignalValueSubcategory, SignalValueType, SignalValueSubtype respectively. Below is the SignalValueCategory model as an example:
class SignalValueCategory(models.Model):
signal_value = models.OneToOneField(SignalValue)
signal_category = models.ForeignKey(SignalCategory)
Finally, I also have the two following models. ResultSignal stores all the signals related to the model Result:
class Result(models.Model):
pass
class ResultSignal(models.Model):
id = models.BigAutoField(primary_key=True)
result = models.ForeignKey(
Result
)
signal_value = models.ForeignKey(
SignalValue
)
Query
What I am trying to achieve is the following.
For a given Result, I want to retrieve all the ResultSignals that belong to it, filter them to keep the ones of my interest, and annotate them with two fields that we will call filter_group_id and filter_group_name. The values of two fields are determined by the SignalValue of the given ResultSignal.
From my perspective, the easiest way to achieve this would be first to annotate the SignalValues with their corresponding filter_group_name and filter_group_id, and then to join the resulting QuerySet with the ResultSignals. However, I think that it is not possible to join two QuerySets together in Django. Consequently, I thought that we could maybe use Prefetch objects to achieve what I am trying to do, but it seems that I am unable to make it work properly.
Code
I will now describe the current state of my queries.
First, annotating the SignalValues with their corresponding filter_group_name and filter_group_id. Note that filter_aggregator in the following code is just a complex filter that allows me to select the wanted SignalValues only. group_filter is the same filter but as a list of subfilters. Additionally, filter_name_case is a conditional expression (Case() construct):
# Attribute a group_filter_id and group_filter_name for each signal
signal_filters = SignalValue.objects.filter(
filter_aggregator
).annotate(
filter_group_id=Window(
expression=DenseRank(),
order_by=group_filters
),
filter_group_name=filter_name_case
)
Then, trying to join/annotate the SignalResults:
prefetch_object = Prefetch(
lookup="signal_value",
queryset=signal_filters,
to_attr="test"
)
result_signals: QuerySet = (
last_interview_result
.resultsignal_set
.filter(signal_value__in=signal_values_of_interest)
.select_related(
'signal_value__signalvaluecategory__signal_category',
'signal_value__signalvaluesubcategory__signal_subcategory',
'signal_value__signalvaluetype__signal_type',
'signal_value__signalvaluesubtype__signal_subtype',
)
.prefetch_related(
prefetch_object
)
.values(
"signal_value",
"test",
category=F('signal_value__signalvaluecategory__signal_category__name'),
subcategory=F('signal_value__signalvaluesubcategory__signal_subcategory__name'),
type=F('signal_value__signalvaluetype__signal_type__name'),
subtype=F('signal_value__signalvaluesubtype__signal_subtype__name'),
)
)
Normally, from my understanding, the resulting QuerySet should have a field "test" that is now available, that would contain the fields of signal_filter, the first QuerySet. However, Django complains that "test" is not found when calling .values(...) in the last part of my code: Cannot resolve keyword 'test' into field. Choices are: [...]. It is like the to_attr parameter of the Prefetch object was not taken into account at all.
Questions
Did I missunderstand the functioning of annotate() and prefetch_related() functions? If not, what am I doing wrong in my code for the specified parameter to_attr to not exist in my resulting QuerySet?
Is there a better way to join two QuerySets in Django or am I better off using RawSQL? An alternative way would be to switch to Pandas to make the join in-memory, but it is very often more efficient to do such transformations on the SQL side with well-designed queries.
You're on the right path, but just missing what prefetch does.
Your annotations are correct, but the "test" prefetch isn't really an attribute. You batch up the SELECT * FROM signal_value queries so you don't have to execute the select per row. Just drop the "test" annotation and you should be fine. https://docs.djangoproject.com/en/3.2/ref/models/querysets/#prefetch-related
Please don't use pandas, it's definitely not necessary and is a ton of overhead. As you say yourself, it's more efficient to do the transforms on the sql side
From the docs on prefetch_related:
Remember that, as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query.
It's not obvious but the values() call is part of these chained methods that imply a different query, and will actually cancel prefetch_related. This should work if you remove it.
I'm learning DRF and experimenting with a queryset. I'm trying to optimize to work as efficiently as possible. The goal being to get a list of grades for active students who are majoring in 'Art'.
Based on database optimization techniques,
I've ran some different updates and don't see a difference when I look at the Time returned via the console's Network tab. I DO however, see less logs in the Seq scan when I run the .explain() method on the model filtering. Am I accomplishing anything by doing that?
For example:
Grades.objects.filter(student_id__in=list(student_list)).order_by()
Anything else I can do to improve the below code that I might be missing? - Outside of adding any Foreign or Primary key model changes.
class GradeViewSet(viewsets.ModelViewSet):
serializer_class = GradesSerializer
def retrieve(self, request, *args, **kwargs):
active_students = Student.objects.filter(active=True)
student_list = active_students.filter(major='Art').values_list('student_id')
queryset = Grades.objects.filter(student_id__in=student_list)
serializers = GradesSerializer(queryset, many=True)
return Response(serializers.data)
SQL query I'm attempting to create in Django.
select * from app_grades g
join app_students s on g.student_id = s.student_id
where s.active = true and s.major = 'Art'
Your code will execute two separate database queries, I suggest that you try the following query instead:
queryset = Grades.objects.filter(student__active=True, student__major='Art')
this code will retrieve the exact same records but performing only one query with the appropriate JOIN clause.
You probably want to take a look at this part of the documentation.
Because of the lack of model relations that forbids the use of lookups I suggest that you use an Exists subuery. In this specific case the query will be as follows:
queryset = Grades.objects.annotate(student_passes_filter=Exists(
Student.objects.filter(id=OuterRef('student_id'), active=True, major='Art')
)).filter(student_passes_filter=True)
You will need to import Exists and OuterRef. Note that these are available from Django 1.11 onwards.
You should probably regroup those lines to reduce the number of queries:
active_students = Student.objects.filter(active=True)
student_list = active_students.filter(major='Art').values_list('student_id')
Into:
active_students = Student.objects.filter(active=True, major=βArtβ)
And converting to list then
model.py
class Tdzien(models.Model):
dziens = models.SmallIntegerField(primary_key=True, db_column='DZIENS')
dzienrok = models.SmallIntegerField(unique=True, db_column='ROK')
class Tnogahist(models.Model):
id_noga = models.ForeignKey(Tenerg, primary_key=True, db_column='ID_ENERG')
dziens = models.SmallIntegerField(db_column='DZIENS')
What I want is to get id_noga where dzienrok=1234. I know that dziens should be
dziens = models.ForeignKey(Tdzien)
but it isn't and I can't change that. Normally I would use something like
Tnogahist.objects.filter(dziens__dzienrok=1234)
but I don't know how to join and filter those tables without foreignkey.
No joins without a foreign key as far as I know, but you could use two queries:
Tnogahist.objects.filter(dziens__in=Tdzien.objects.filter(dzienrok=1234))
It's possible to join two tables by performing a raw sql query. But for this case it's quite nasty, so I recommend you to rewrite your models.py.
You can check how to do this here
It would be something like this:
from django.db import connection
def my_custom_sql(self):
cursor = connection.cursor()
cursor.execute("select id_noga
from myapp_Tnogahist a
inner join myapp_Tdzien b on a.dziens=b.dziens
where b.dzienrok = 1234")
row = cursor.fetchone()
return row
Could you do this with .extra? From https://docs.djangoproject.com/en/dev/ref/models/querysets/#extra:
where / tables
You can define explicit SQL WHERE clauses β perhaps to perform
non-explicit joins β by using where. You can manually add tables to
the SQL FROM clause by using tables.
To provide a little more context around #paul-tomblin's answer,
It's worth mentioning that for the vast majority of django users; the best course of action is to implement a conventional foreign key. Django strongly recommends avoiding the use of extra() saying "use this method as a last resort". However, extra() is still preferable to raw queries using Manager.raw() or executing custom SQL directly using django.db.connection
Here's an example of how you would achieve this using django's .extra() method:
Tnogahist.objects.extra(
tables = ['myapp_tdzien'],
where = [
'myapp_tnogahist.dziens=myapp_tdzien.dziens',
'myapp_tdzien.dzienrok=%s',
],
params = [1234],
)
The primary appeal for using extra() over other approaches is that it plays nicely with the rest of django's queryset stack, like filter, exclude, defer, values, and slicing. So you can probably plug it in alongside traditional django query logic. For example: Tnogahist.objects.filter(...).extra(...).values('id_noga')[:10]
I'm using django ORM's exact() method to query only selected fields from a set of models to save RAM. I can't use defer() or only() due to some constraints on the ORM manager I am using (it's not the default one).
The following code works without an error:
q1 = Model.custom_manager.all().extra(select={'field1':'field1'})
# I only want one field from this model
However, when I jsonify the q1 queryset, I get every single field of the model.. so extra() must not have worked, or am I doing something wrong?
print SafeString(serializers.serialize('json', q1))
>>> '{ everything!!!!!}'
To be more specific, the custom manager I am using is django-sphinx. Model.search.query(...) for example.
Thanks.
So, Im not sure if you can do exactly what you want to do. However, if you only want the values for a particular field or a few fields, you can do it with values
It likely does the full query, but the result will only have the values you want. Using your example:
q1 = Model.custom_manager.values('field1', 'field2').all()
This should return a ValuesQuerySet. Which you will not be able to use with serializers.serialize so you will have to do something like this:
from django.utils import simplejson
data = [value for value in q1]
json_dump = simplejson.dumps(data)
Another probably better solution is to just do your query like originally intended, forgetting extra and values and just use the fields kwarg in the serialize method like this:
print SafeString(serializers.serialize('json', q1, fields=('field1', 'field2')))
The downside is that none of these things actually do the same thing as Defer or Only(all the fields are returned from the database), but you get the output you desire.
I'm running a bunch of filters on one of my models. Specifically, I'm doing something like this in one of my views:
cities = City.objects.filter(name__icontains=request.GET['name']
cities = City.objects.filter(population__gte=request.GET['lowest_population']
return cities
Now I'd like to add one other, different type of filter. Specifically, I'd like to include only those cities that are a certain distance away from a particular zip code. I already have the relevant function for this, i.e. something like:
distanceFromZipCode(city, zipCode)
# This returns 110 miles, for example
How do I combine django's queryset filtering with this additional filter I'd like to add? I know that if cities were merely a list, I could just use .filter() and pass in the appropriate lambda (e.g. return true if the distance from the relevant zip code is <100).
But I'm dealing with query sets, not simple lists, so how would I do this?
The root of the issue is that you're trying to mix sql filters, which are done within the db, and a python filter, which is done once the records are materialized from the db. You can't do that without taking the items from the database and then filtering on top of that.
You can't, do this via your python function, but you can do this via geodjango:
https://docs.djangoproject.com/en/dev/ref/contrib/gis/db-api/#distance-queries
cites = cities.filter(distance_lt=101)
would get you what you want
You're trying to mix a Python method with a database query, and that's not possible. Either you write the SQL to perform the distance calculation (fast), or you fetch every row and call your method (slow). Django filters simply translate parameters into a SQL WHERE clause, so if you can't express it in SQL, you probably can't express it in a filter.
If you are storing the location of the city as geometry you can use a distance spatial filter chained with the rest of your filters:
from django.contrib.gis.measure import D
zipCode = ZipCode.objects.all()[0]
cities = City.objects.filter(point__distance_lte=(zipCode.geom, D(mi=110)))
This assumes you have a ZipCode model with geometry of each zip code and the geometry is stored in a field called 'geom' and your City object has a point field called 'point'.
In my opinion you should use the Queryset object, and define complex filter method inside custom manager.
from django.db import models
from django.db.models import Q
class CityManager(models.Manager):
def get_filtered_cities(self, name=None, lowest_population=None, zip_code=None):
query = Q()
if(name):
query = Q(name__icontains=name)
if(lowest_population):
query = query & Q(population__gte=lowest_population)
if(zip_code):
pass #other query object
return self.get_query_set().filter(query)