Say I have 3 models in Django
class Instrument(models.Model):
    ticker = models.CharField(max_length=30, unique=True, db_index=True)

class Instrument_df(models.Model):
    instrument = models.OneToOneField(
        Instrument,
        on_delete=models.CASCADE,
        primary_key=True,
    )

class Quote(models.Model):
    instrument = models.ForeignKey(Instrument, on_delete=models.CASCADE)
I just want to query all Quotes that correspond to an instrument of 'DF' type. In SQL I would join Quote and Instrument_df on the instrument id.
Using Django's ORM I came up with
Quote.objects.filter(instrument__instrument_df__instrument_id__gte=-1)
I think this does the job, but I see two drawbacks:
1) I am joining 3 tables, when in fact table Instrument would not need to be involved.
2) I had to insert the trivial id > -1 condition, that holds always. This looks awfully artificial.
How should this query be written?
Thanks!
Assuming Instrument_df has other fields not shown in the snippet (else this table is just useless and could be replaced by a flag in Instrument), a possible solution could be to use either a subquery or two queries:
# with a subquery
dfids = Instrument_df.objects.values_list("instrument", flat=True)
Quote.objects.filter(instrument__in=dfids)
# with two queries (can be faster on MySQL)
dfids = list(Instrument_df.objects.values_list("instrument", flat=True))
Quote.objects.filter(instrument__in=dfids)
Whether this will perform better than your actual solution depends on your db vendor and version (MySQL was known for being very bad at handling subqueries, don't know if it's still the case) and actual content.
But I think the best solution here would be a plain raw query - this is a bit less portable and may require more care in case of a schema update (hint: use a custom manager and write this query as a manager method so you have one single point of truth - you don't want to scatter your views with raw sql queries).
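To make the raw-query idea concrete, here is a minimal sketch of the two-table join, run against an in-memory SQLite database. The table and column names (app_quote, app_instrument_df, instrument_id) are assumptions about what Django would generate for an app called "app", and the sample rows are made up; check your real schema before reusing the SQL.

```python
import sqlite3

# Hypothetical table names and sample data; only the shape of the join matters.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_instrument (id INTEGER PRIMARY KEY, ticker TEXT UNIQUE);
    CREATE TABLE app_instrument_df (instrument_id INTEGER PRIMARY KEY);
    CREATE TABLE app_quote (id INTEGER PRIMARY KEY, instrument_id INTEGER);
    INSERT INTO app_instrument VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC');
    INSERT INTO app_instrument_df VALUES (1), (3);
    INSERT INTO app_quote VALUES (10, 1), (11, 2), (12, 3), (13, 3);
""")

# Join Quote directly to Instrument_df; the Instrument table is never read.
df_quote_ids = [
    row[0]
    for row in conn.execute("""
        SELECT q.id
        FROM app_quote AS q
        JOIN app_instrument_df AS df ON df.instrument_id = q.instrument_id
        ORDER BY q.id
    """)
]
print(df_quote_ids)
```

In a Django project the same SQL could probably be wrapped in a manager method that calls Quote.objects.raw(...), which keeps the raw query in one place as suggested above.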
How can I write a query like
select name where id in (select id from ...)
using the Django ORM? I could use one for loop to collect some results and a second for loop to use them, but that does not seem practical. This is simple to write in SQL, so I would expect it to be simple in Python too.
I have these models:
class Invoice (models.Model):
    factura_id = models.IntegerField(unique=True)
    created_date = models.DateTimeField()
    store_id = models.ForeignKey(Store,blank=False)

class invoicePayments(models.Model):
    invoice = models.ForeignKey(Factura)
    date = models.DateTimeField()  # auto_now=True
    money = models.DecimalField(max_digits=9,decimal_places=0)
I need to get the payments of an invoice, filtered by store_id and by date of payment.
I wrote this query in MySQL using a select ... in (select ...). It is a simple query, but the only way I can think of to do something similar with the Django ORM is a for loop, and I don't like this idea:
invoiceXstore = invoice.objects.filter(local=3)
for a in invoiceXstore:
    payments = invoicePayments.objects.filter(invoice=a.id,
                                              date__range=["2016-05-01", "2016-05-06"])
You can traverse ForeignKey relations using double underscores (__) in Django ORM. For example, your query could be implemented as:
payments = invoicePayments.objects.filter(invoice__store_id=3,
                                          date__range=["2016-05-01", "2016-05-06"])
I guess you renamed your classes to English before posting here. In this case, you may need to change the first part to factura__local=3.
As a side note, it is recommended to rename your model class to InvoicePayments (with a capital I) to be more compliant with PEP8.
Your mysql raw query is a sub query.
select name where id in (select id from ...)
In MySQL this will usually be slower than an INNER JOIN (see http://dev.mysql.com/doc/refman/5.7/en/rewriting-subqueries.html), so you can rewrite your raw query as an INNER JOIN, which will look like this:
SELECT ip.* FROM invoicepayments ip INNER JOIN invoice i ON
ip.invoice_id = i.id
You can then use a WHERE clause to apply the filtering.
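As a rough check that the IN (subquery) form and the INNER JOIN plus WHERE form select the same payments, here is a small sketch using an in-memory SQLite database. The table layout and the sample rows are assumptions made for illustration (store 3 and the date range are taken from the question), and SQLite stands in for MySQL here, so this says nothing about relative speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, store_id INTEGER);
    CREATE TABLE invoicepayments (
        id INTEGER PRIMARY KEY, invoice_id INTEGER, date TEXT, money INTEGER);
    -- made-up sample data
    INSERT INTO invoice VALUES (1, 3), (2, 3), (3, 7);
    INSERT INTO invoicepayments VALUES
        (1, 1, '2016-05-02', 100),
        (2, 2, '2016-05-10', 250),
        (3, 3, '2016-05-03', 400),
        (4, 2, '2016-05-05', 50);
""")

params = (3, "2016-05-01", "2016-05-06")

# Original style: filter payments with a subquery on invoice.
subquery_rows = conn.execute("""
    SELECT ip.id FROM invoicepayments AS ip
    WHERE ip.invoice_id IN (SELECT i.id FROM invoice AS i WHERE i.store_id = ?)
      AND ip.date BETWEEN ? AND ?
    ORDER BY ip.id
""", params).fetchall()

# Rewritten style: INNER JOIN, with the filtering moved into WHERE.
join_rows = conn.execute("""
    SELECT ip.id FROM invoicepayments AS ip
    INNER JOIN invoice AS i ON ip.invoice_id = i.id
    WHERE i.store_id = ?
      AND ip.date BETWEEN ? AND ?
    ORDER BY ip.id
""", params).fetchall()

print(subquery_rows, join_rows)
```

Both queries return the payments with ids 1 and 4 for this data; the join version is the shape the ORM filter with double underscores produces.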
The looping approach you have tried does work, but it is not recommended because it executes one query per invoice. Instead you can do:
InvoicePayments.objects.filter(invoice__local=3,
                               date__range=("2016-05-01", "2016-05-06"))
I am not quite sure what 'local' stands for because your model does not show any field like that. Please update your model with the correct field or edit the query as appropriate.
To learn about __range see https://docs.djangoproject.com/en/1.9/ref/models/querysets/#range
I have a model where I needed historical data for a couple specific fields, so I put those fields into a separate model with a foreign key relationship.
Something sort of like this:
class DataThing(models.Model):
    # a bunch of fields here...

class DataThingHistory(models.Model):
    datathing_id = models.ForeignKey('DataThing', on_delete=models.CASCADE)
    text_with_history = models.CharField(max_length=500, null=True, blank=True)
    # other similar fields...
    timestamp = models.DateTimeField()
Now I'm trying to filter the former model using a text field in the latest corresponding entry in the latter.
Basically if these were not separate models I'd just try this:
search_results = DataThing.objects.filter(text_with_history__icontains=searchterm)
But I haven't figured out a good way to do this across this one-to-many relationship and using only the entry with the latest timestamp in the latter model, at least by using the Django ORM.
I have an idea of how to do the query I want using raw SQL, but I'd really like to avoid using raw if at all possible.
This solution makes use of distinct(*fields) which is currently only supported by Postgres:
latest_things = (DataThingHistory.objects
                 .order_by('datathing_id_id', '-timestamp')
                 .distinct('datathing_id_id'))
lt_with_searchterm = DataThingHistory.objects.filter(
    id__in=latest_things, text_with_history__icontains=searchterm)
search_results = DataThing.objects.filter(datathinghistory__in=lt_with_searchterm)
This should result in a single db query. I have split the query for readability, but you can nest it into a single statement. Btw, as you might see here, foo_id is not a good name for a ForeignKey field.
You would do the same by querying DataThing while referring to DataThingHistory:
search_results = DataThing.objects.filter(datathinghistory__text_with_history__icontains=searchterm)
Check django doc on how to query on reverse relationship.
Edit:
My previous answer is incomplete. In order to search on latest history for each DataThing, you need to annotate on timestamp using Max:
from django.db.models import Max
search_results = search_results.values('field1', 'field2',...).annotate(latest_history=Max('datathinghistory__timestamp'))
This wouldn't give you complete DataThing objects, but you could add as many fields to values as you want.
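For comparison, the "search only the latest history row of each DataThing" requirement can be written as a correlated subquery that works on any backend. The sketch below uses an in-memory SQLite database; the table names and sample rows are assumptions, and the helper search_latest is hypothetical, not part of Django.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datathing (id INTEGER PRIMARY KEY);
    CREATE TABLE datathinghistory (
        id INTEGER PRIMARY KEY,
        datathing_id INTEGER,
        text_with_history TEXT,
        timestamp TEXT);
    -- made-up sample data: two things, two history rows each
    INSERT INTO datathing VALUES (1), (2);
    INSERT INTO datathinghistory VALUES
        (1, 1, 'old apple note', '2020-01-01'),
        (2, 1, 'new pear note',  '2020-02-01'),
        (3, 2, 'old pear note',  '2020-01-01'),
        (4, 2, 'new apple note', '2020-02-01');
""")

def search_latest(term):
    """Return ids of datathings whose most recent history text contains term."""
    rows = conn.execute("""
        SELECT h.datathing_id
        FROM datathinghistory AS h
        WHERE h.timestamp = (
                SELECT MAX(h2.timestamp)
                FROM datathinghistory AS h2
                WHERE h2.datathing_id = h.datathing_id)
          AND h.text_with_history LIKE '%' || ? || '%'
        ORDER BY h.datathing_id
    """, (term,))
    return [row[0] for row in rows]

print(search_latest("apple"), search_latest("pear"))
```

Only thing 2 matches "apple", because thing 1 mentions it only in an older row. In Django 1.11 and later, the Subquery and OuterRef expressions are likely the closest ORM equivalent of this pattern.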
I have a problem with building right index for my query.
I have a model like this:
from django.db import models

class Record(models.Model):
    user = models.ForeignKey(User, db_index=True, related_name='records')
    action = models.ForeignKey(Action, db_index=True)
    time = models.DateTimeField(db_index=True, default=timezone.now)

    class Meta:
        index_together = (
            ('user', 'time'),
            ('action', 'user', 'time'),
        )
As you can see, there are two custom indexes for this model.
If I wanna get all records, related to specific user, filtered by time, I use this query: user.records.filter(time__gt=some_moment). It works OK and uses first custom index (according to Django Debug Toolbar).
Now, in my situation result must be sorted by action. I use this query: user.records.filter(time__gt=some_moment).order_by('action').
But, although an appropriate index exists, it is not used.
What am I doing wrong? How to build correct index for this query?
Django version = 1.8.4, all migrations are applied, database backend = mysql.
UPD: there is my query:
SELECT *** FROM `appname_record`
WHERE (`appname_record`.`user_id` = 1896158 AND
`appname_record`.`time` > '2015-10-19 06:39:30.992790')
ORDER BY `appname_record`.`action_id` ASC
there is full django toolbar explanation:
ID: 1
SELECT_TYPE: SIMPLE
TABLE: appname_record
TYPE: ALL
POSSIBLE_KEYS: appname_record_user_id_3214bab8a46891cc_idx, appname_record_07cc694b
KEY: None
KEY_LEN: None
REF: None
ROWS: 240
EXTRA: Using where; Using filesort
There is mysql show create table appname_record; part about keys:
PRIMARY KEY (`id`),
KEY `appname_record_action_id_3e42ba1d5288899c_idx` (`action_id`, `user_id`,`time`),
KEY `appname_record_user_id_3214bab8a46891cc_idx` (`user_id`,`time`),
KEY `appname_record_07cc694b` (`time`),
So it seems like right index isn't even in possible keys.
If a query does not use any indexes at all, that's often because there isn't enough data in the table for an index to be really useful. However with 500 records there is a good chance that an index ought to come into play.
In the query that you have used, the appname_record_user_id_3214bab8a46891cc_idx is indeed a likely candidate but it's still not used. Why? Because your query apparently causes the database to look at approximately half the table, so an index cannot speed things up.
You seem to be on the right track with dropping one index. Too many similar indexes aren't really useful either. I would try this index instead:
class Meta:
    index_together = (
        ('user', 'time', 'action'),
    )
The difference here is in the order of the fields. This is important:
MySQL can use multiple-column indexes for queries that test all the
columns in the index, or queries that test just the first column, the
first two columns, the first three columns, and so on. If you specify
the columns in the right order in the index definition, a single
composite index can speed up several kinds of queries on the same
table.
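The leftmost-prefix rule quoted above can be observed with a query planner. The sketch below uses SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN, so it only illustrates the general principle; the table, the index name idx_user_time_action and the helper plan are assumptions, and MySQL's optimizer may decide differently on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record (
        id INTEGER PRIMARY KEY, user_id INTEGER, action_id INTEGER, time TEXT);
    -- composite index in the order suggested above: user, time, action
    CREATE INDEX idx_user_time_action ON record (user_id, time, action_id);
""")

def plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# Filters on the first two index columns: the index can be searched.
prefix_plan = plan(
    "SELECT id FROM record WHERE user_id = ? AND time > ? ORDER BY action_id",
    (1, "2015-10-19"))

# Filters only on the last index column: no leftmost prefix, so no index search.
non_prefix_plan = plan("SELECT id FROM record WHERE action_id = ?", (1,))

print(prefix_plan)
print(non_prefix_plan)
```

The exact wording of the plan text depends on the SQLite version, but the first plan should report a SEARCH using the index while the second falls back to a scan.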
I found a solution. It is not elegant, but it worked for me. Since I couldn't build any query that would use the 3-column index, I just dropped the 2-column one, so now both of my queries use the 3-column index. I have no idea why it was ignored previously. Maybe some complex MySQL optimization.
In newer versions of Django, the indexes Meta option (using the Index class) is recommended over index_together.
from django.db import models

class Customer(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)

    class Meta:
        indexes = [
            models.Index(fields=['last_name', 'first_name']),
            models.Index(fields=['first_name'], name='first_name_idx'),
        ]
https://docs.djangoproject.com/en/3.2/ref/models/options/#unique-together
https://docs.djangoproject.com/en/3.2/ref/models/options/#index-together
I have the following models which I'm testing with SQLite3 and MySQL:
# (various model fields extraneous to discussion removed...)
class Run(models.Model):
    runNumber = models.IntegerField()

class Snapshot(models.Model):
    t = models.DateTimeField()

class SnapshotRun(models.Model):
    snapshot = models.ForeignKey(Snapshot)
    run = models.ForeignKey(Run)
    # other fields which make it possible to have multiple distinct Run objects per Snapshot
I want a query which will give me a set of runNumbers & snapshot IDs for which the Snapshot.id is below some specified value. Naively I would expect this to work:
print SnapshotRun.objects.filter(snapshot__id__lte=ss_id)\
.order_by("run__runNumber", "-snapshot__id")\
.distinct("run__runNumber", "snapshot__id")\
.values("run__runNumber", "snapshot__id")
But this blows up with
NotImplementedError: DISTINCT ON fields is not supported by this database backend
for both database backends. Postgres is unfortunately not an option.
Time to fall back to raw SQL?
Update:
Since Django's ORM won't help me out of this one (thanks #jknupp) I did manage to get the following raw SQL to work:
cursor.execute("""
    SELECT r.runNumber, ssr1.snapshot_id
    FROM livedata_run AS r
    JOIN livedata_snapshotrun AS ssr1
    ON ssr1.id =
    (
        SELECT id
        FROM livedata_snapshotrun AS ssr2
        WHERE ssr2.run_id = r.id
        AND ssr2.snapshot_id <= %s
        ORDER BY snapshot_id DESC
        LIMIT 1
    );
""", [max_ss_id])
Here livedata is the Django app these tables live in.
The note in the Django documentation is pretty clear:
Note:
Any fields used in an order_by() call are included in the SQL SELECT columns. This can sometimes lead to unexpected results when used in conjunction with distinct(). If you order by fields from a related model, those fields will be added to the selected columns and they may make otherwise duplicate rows appear to be distinct. Since the extra columns don’t appear in the returned results (they are only there to support ordering), it sometimes looks like non-distinct results are being returned.
Similarly, if you use a values() query to restrict the columns selected, the columns used in any order_by() (or default model ordering) will still be involved and may affect uniqueness of the results.
The moral here is that if you are using distinct() be careful about ordering by related models. Similarly, when using distinct() and values() together, be careful when ordering by fields not in the values() call.
Also, below that:
This ability to specify field names (with distinct) is only available in PostgreSQL.
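When only the run number and the newest qualifying snapshot id are needed, a plain GROUP BY with MAX avoids DISTINCT ON altogether and runs on SQLite and MySQL. Here is a minimal sketch against an in-memory SQLite database; the table names follow the raw SQL in the question, while the sample rows and the helper latest_snapshot_per_run are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE livedata_run (id INTEGER PRIMARY KEY, runNumber INTEGER);
    CREATE TABLE livedata_snapshotrun (
        id INTEGER PRIMARY KEY, snapshot_id INTEGER, run_id INTEGER);
    -- made-up sample data: run 100 appears in snapshots 1, 2, 5; run 200 in 3
    INSERT INTO livedata_run VALUES (1, 100), (2, 200);
    INSERT INTO livedata_snapshotrun VALUES
        (1, 1, 1), (2, 2, 1), (3, 5, 1), (4, 3, 2);
""")

def latest_snapshot_per_run(max_ss_id):
    """For each run, return (runNumber, newest snapshot id <= max_ss_id)."""
    rows = conn.execute("""
        SELECT r.runNumber, MAX(ssr.snapshot_id)
        FROM livedata_snapshotrun AS ssr
        JOIN livedata_run AS r ON r.id = ssr.run_id
        WHERE ssr.snapshot_id <= ?
        GROUP BY r.id, r.runNumber
        ORDER BY r.runNumber
    """, (max_ss_id,))
    return rows.fetchall()

print(latest_snapshot_per_run(4))
```

In the ORM, the same grouping should be expressible with values("run__runNumber") followed by annotate(Max("snapshot__id")). If other columns of the matching SnapshotRun row are needed too, the correlated subquery from the question is still the way to go.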
I'm trying to use the Django ORM for a task that requires a JOIN in SQL. I
already have a workaround that accomplishes the same task with multiple queries
and some off-DB processing, but I'm not satisfied by the runtime complexity.
First, I'd like to give you a short introduction to the relevant part of my
model. After that, I'll explain the task in English, SQL and (inefficient) Django ORM.
The Model
In my CMS model, posts are multi-language: For each post and each language, there can be one instance of the post's content. Also, when editing posts, I don't UPDATE, but INSERT new versions of them.
So, PostContent is unique on post, language and version. Here's the class:
class PostContent(models.Model):
    """ contains all versions of a post, in all languages. """
    language = models.ForeignKey(Language)
    # the Post object itself only contains slug and id.
    post = models.ForeignKey(Post)
    version = models.IntegerField(default=0)
    # further metadata and content left out

    class Meta:
        unique_together = (("post", "language", "version"),)
The Task in SQL
And this is the task: I'd like to get a list of the most recent versions of all posts in each language, using the ORM. In SQL, this translates to a JOIN on a subquery that does GROUP BY and MAX to get the maximum of version for each unique pair of post and language. The perfect answer to this question would be a number of ORM calls that produce the following SQL statement:
SELECT
    id,
    post_id,
    version,
    v
FROM
    cms_postcontent,
    (SELECT
        post_id as p,
        max(version) as v,
        language_id as l
    FROM
        cms_postcontent
    GROUP BY
        post_id,
        language_id
    ) as maxv
WHERE
    post_id=p
    AND version=v
    AND language_id=l;
Solution in Django
My current solution using the Django ORM does not produce such a JOIN, but two separate SQL
queries, and one of those queries can become very large. I first execute the subquery (the inner SELECT from above):
maxv = PostContent.objects.values('post','language').annotate(
    max_version=Max('version'))
Now, instead of joining maxv, I explicitly ask for every single post in maxv, by
filtering PostContent.objects.all() for each tuple of post, language, max_version. The resulting SQL looks like
SELECT * FROM PostContent WHERE
post=P1 and language=L1 and version=V1
OR post=P2 and language=L2 and version=V2
OR ...;
In Django:
import operator
from functools import reduce
from django.db.models import Q
# one AND-ed condition per (post, language, max_version) tuple
conjunc = [Q(version=pc['max_version']) & Q(post=pc['post']) &
           Q(language=pc['language']) for pc in maxv]
# OR all conditions together into a single filter
result = PostContent.objects.filter(reduce(operator.or_, conjunc))
If maxv is sufficiently small, e.g. when retrieving a single post, this might be
a good solution, but the size of the query and the time to create it grow linearly with
the number of posts. The complexity of parsing the query is also at least linear.
Is there a better way to do this, apart from using raw SQL?
You can join (in the sense of union) querysets with the | operator, as long as the querysets query the same model.
However, it sounds like you want something like PostContent.objects.order_by('version').distinct('language'); as you can't quite do that in 1.3.1, consider using values in combination with distinct() to get the effect you need.
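For reference, the join on a grouped subquery that the question asks for can be tried out on a tiny dataset. This is a minimal sketch against an in-memory SQLite database; the table name cms_postcontent comes from the question's SQL, while the explicit JOIN ... ON spelling and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cms_postcontent (
        id INTEGER PRIMARY KEY,
        post_id INTEGER, language_id INTEGER, version INTEGER,
        UNIQUE (post_id, language_id, version));
    -- made-up sample data: (id, post, language, version)
    INSERT INTO cms_postcontent VALUES
        (1, 1, 1, 0), (2, 1, 1, 1), (3, 1, 2, 0),
        (4, 2, 1, 0), (5, 2, 1, 1), (6, 2, 1, 2);
""")

# Keep only the row with the highest version per (post, language) pair.
latest = conn.execute("""
    SELECT pc.id, pc.post_id, pc.language_id, pc.version
    FROM cms_postcontent AS pc
    JOIN (SELECT post_id, language_id, MAX(version) AS max_version
          FROM cms_postcontent
          GROUP BY post_id, language_id) AS maxv
      ON pc.post_id = maxv.post_id
     AND pc.language_id = maxv.language_id
     AND pc.version = maxv.max_version
    ORDER BY pc.id
""").fetchall()

print(latest)
```

Because the query selects the primary key, the same statement could probably be handed to PostContent.objects.raw(...) if no pure ORM formulation turns out to be available.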