Is there a more elegant/faster way to do this?
Here are my models:
class X(Model):
(...)
class A(Model):
xs = ManyToManyField(X)
class B(A):
(...)
class C(A): # NOTE: not relevant here
(...)
class Y(Model):
b = ManyToOneField(B)
Here is what I want to do:
def foo(x):
# NOTE: b's pk (b.a_ptr) is actually a's pk (a.id)
return Y.objects.filter(b__in=x.a_set.all())
But it returns the following error:
<repr(<django.db.models.query.QuerySet at 0x7f6f2c159a90>) failed:
django.core.exceptions.FieldError: Cannot resolve keyword u'a_ptr'
into field. Choices are: ...enumerate all fields of A...>
And here is what I'm doing right now in order to minimize the queries:
def foo(x):
a_set = x.a_set
a_set.model = B # NOTE: force the model to be B
return Y.filter(b__in=a_set.all())
It works but it's not one line. It would be cool to have something like this:
def foo(x):
return Y.filter(b__in=x.a_set._as(B).all())
The simpler way in Django seems to be the following:
def foo(x):
return Y.filter(b__in=B.objects.filter(pk__in=x.a_set.all()))
...but this makes useless sub-queries in SQL:
SELECT Y.* FROM Y WHERE Y.b IN (
SELECT B.a_ptr FROM B WHERE B.a_ptr IN (
SELECT A.id FROM A WHERE ...));
This is what I want:
SELECT Y.* FROM Y WHERE Y.b IN (SELECT A.id FROM A WHERE ...);
(This SQL example is a bit simplified because the relation between A and X is actually a ManyToMany table, I substituted this with SELECT A.id FROM A WHERE ... for clarity sake.)
You might get a better result if you follow the relationship one more step, rather than using in.
Y.objects.filter(b__xs=x)
I know next to nothing about databases, but there are many warnings in the Django literature about concrete inheritance class B(A) causing database inefficiency because of repeated INNER JOIN operations.
The underlying problem is that relational databases aren't object-orientated.
If you use Postgres, then Django has native support (from 1.8) for Hstore fields, which are searchable but schema-less collections of keyword-value pairs. I intend to be using one of these in my base class (your A) to eliminate any need for derived classes (your B and C). Instead, I'll maintain a type field in A (equal to "B" or "C" in your example) and use that and/or careful key presence checking to access subfields of the Hstore field which are "guaranteed" to exist if (say) instance.type=="C" (which constraint isn't enforced at the DB integrity level).
I also discovered the django-hstore module which existed prior to Django 1.8, and which has even greater functionality. I've yet to do anything with it so cannot comment further.
I also found reference to wide tables in the literature, where you simply define a model A with a potentially large number of fields which contain useful data only if the object is of type B, and more yet fields for types C,D,... I don't have the DB knowledge to assess the cost of lots of blank or null fields attached to every row of a table. Clearly if you have a million sheep records and one pedigree-racehorse record and one stick-insect record, as "subclasses" of Animal in the same wide table, it gets silly.
I'd appreciate feedback on either idea if you have the DB understanding that I lack.
Oh, and for completeness, there's the django-polymorphic module, but that builds on concrete inheritance with its attendant inefficiencies.
Related
I have a concrete base model, from which other models inherit (all models in this question have been trimmed for brevity):
class Order(models.Model):
state = models.ForeignKey('OrderState')
Here are a few examples of the "child" models:
class BorrowOrder(Order):
parts = models.ManyToManyField('Part', through='BorrowOrderPart')
class ReturnOrder(Order):
parts = models.ManyToManyField('Part', through='ReturnOrderPart')
As you can see from these examples, each child model has a many-to-many relationship of Parts through a custom table. Those custom through-tables look something like this:
class BorrowOrderPart(models.Model):
borrow_order = models.ForeignKey('BorrowOrder', related_name='borrowed_parts')
part = models.ForeignKey('Part')
qty_borrowed = models.PositiveIntegerField()
class ReturnOrderPart(models.Model):
return_order = models.ForeignKey('ReturnOrder', related_name='returned_parts')
part = models.ForeignKey('Part')
qty_returned = models.PositiveIntegerField()
Note that the "quantity" field in each through table has a custom name (unfortunately): qty_borrowed or qty_returned. I'd like to be able to query the base table (so that I'm searching across all order types), and include an annotated field for each that sums these quantity fields:
# Not sure what I specify in the Sum() call here, given that the fields
# I'm interested in are different depending on the child's type.
qs = models.Order.objects.annotate(total_qty=Sum(???))
# For a single model, I would do something like:
qs = models.BorrowOrder.objects.annotate(
total_qty=Sum('borrowed_parts__qty_borrowed'))
So I guess I have two related questions:
Can I annotate a child-model's data through a query on the parent model?
If so, can I conditionally specify the field to be annotated, given that the actual field name changes depending on the model in question?
This feels to me like a place where using When() and Case() might be helpful, but I'm not sure how I'd build the necessary logic.
The problem is that, when you are querying the base model (in multi-table inheritance), it's hard to find out which subclass the object actually is. See How to know which is the child class of a model.
The query might be achievable in theory, with something like
SELECT
CASE
WHEN child1.base_ptr_id IS NOT NULL THEN ...
WHEN child2.base_ptr_id IS NOT NULL THEN ...
END AS ...
FROM base
LEFT JOIN child1 ON child1.base_ptr_id = base.id
LEFT JOIN child2 ON child2.base_ptr_id = base.id
...
but I don't know how to translate that in Django and I think it would be too much trouble to do it. It could be done, if not anything else using raw queries.
Another solution would be to add to the base class a field that specifies which actual subclass each object is; in that case, you'd need to make as many queries as there are subclasses and join them. I don't like this solution either. Update: After I slept on this I conclude that the most Django-like solution would be not to query the parent model in the first place; simply query the submodels and join the results. I would explore the third option below only if there were performance or other practical problems.
Another idea is to create a database view (with CREATE VIEW) based on the above SQL query and translate it into a Django model with managed = False, and query that one. Maybe this is somewhat cleaner than the other solutions, but it is a bit non-standard.
Model C has a foreign key link to model B. Model B has a foreign key link to model A. This is, model A instance may have many model B instances, and Model B may have many model C instances.
Pseudo-code:
class A:
...
class B:
a = models.ForeignKey(A, ...)
class C:
b = models.ForeignKey(B, ...)
I would like to retrieve the entire list of elements of model A from the database, annotating it with the count of B elements it has, where their respective count of C elements is not none (or any random number, for that matter).
I tried this:
A.objects.annotate(
at_least_one_count=Count('b', filter=Q(b__c__isnull=False))
)
To the best of my knowledge, this should be working. However, numbers returned are not correct (in a practical case). It returns a higher number than without the filter, which is obviously not possible:
A.objects.annotate(
at_least_one_count=Count('b')
)
Edit: when I run the following query individually for every A instance, I get the same numbers, which makes me think there might be something wrong in my code:
A.objects.first().b_set.filter(c__isnull=False).__len__()
Note: I would like to perform this query without SQL. If I have to utilise some more advance Pythonic tools that Django provides, I am happy to do it, as long as I stay Object-Oriented. I am trying to move away from using raw SQL for all database operations, and re-write them all with Django ORM. However, it seems to be overly complicated.
The answer is simple: after applying queries that translate to join statements, one must execute a distinct on filter, which in Django is done by calling .distinct(...) on the query set.
In this case, if you are using a filter, and the distinct restriction is on the filter object, then you want to use:
...=Count('b', filter=Q(b__c__isnull=False), distinct=True)
How can I apply annotations and filters from a custom manager queryset when filtering via a related field? Here's some code to demonstrate what I mean.
Manager and models
from django.db.models import Value, BooleanField
class OtherModelManager(Manager):
def get_queryset(self):
return super(OtherModelManager, self).get_queryset().annotate(
some_flag=Value(True, output_field=BooleanField())
).filter(
disabled=False
)
class MyModel(Model):
other_model = ForeignKey(OtherModel)
class OtherModel(Model):
disabled = BooleanField()
objects = OtherModelManager()
Attempting to filter the related field using the manager
# This should only give me MyModel objects with related
# OtherModel objects that have the some_flag annotation
# set to True and disabled=False
my_model = MyModel.objects.filter(some_flag=True)
If you try the above code you will get the following error:
TypeError: Related Field got invalid lookup: some_flag
To further clarify, essentially the same question was reported as a bug with no response on how to actually achieve this: https://code.djangoproject.com/ticket/26393.
I'm aware that this can be achieved by simply using the filter and annotation from the manager directly in the MyModel filter, however the point is to keep this DRY and ensure this behaviour is repeated everywhere this model is accessed (unless explicitly instructed not to).
How about running nested queries (or two queries, in case your backend is MySQL; performance).
The first to fetch the pk of the related OtherModel objects.
The second to filter the Model objects on the fetched pks.
other_model_pks = OtherModel.objects.filter(some_flag=...).values_list('pk', flat=True)
my_model = MyModel.objects.filter(other_model__in=other_model_pks)
# use (...__in=list(other_model_pks)) for MySQL to avoid a nested query.
I don't think what you want is possible.
1) I think you are miss-understanding what annotations do.
Generating aggregates for each item in a QuerySet
The second way to generate summary values is to generate an
independent summary for each object in a QuerySet. For example, if you
are retrieving a list of books, you may want to know how many authors
contributed to each book. Each Book has a many-to-many relationship
with the Author; we want to summarize this relationship for each book
in the QuerySet.
Per-object summaries can be generated using the annotate() clause.
When an annotate() clause is specified, each object in the QuerySet
will be annotated with the specified values.
The syntax for these annotations is identical to that used for the
aggregate() clause. Each argument to annotate() describes an aggregate
that is to be calculated.
So when you say:
MyModel.objects.annotate(other_model__some_flag=Value(True, output_field=BooleanField()))
You are not annotation some_flag over other_model.
i.e. you won't have: mymodel.other_model.some_flag
You are annotating other_model__some_flag over mymodel.
i.e. you will have: mymodel.other_model__some_flag
2) I'm not sure how familiar SQL is for you, but in order to preserve MyModel.objects.filter(other_model__some_flag=True) possible, i.e. to keep the annotation when doing JOINS, the ORM would have to do a JOIN over subquery, something like:
INNER JOIN
(
SELECT other_model.id, /* more fields,*/ 1 as some_flag
FROM other_model
) as sub on mymodel.other_model_id = sub.id
which would be super slow and I'm not surprised they are not doing it.
Possible solution
don't annotate your field, but add it as a regular field in your model.
The simplified answer is that models are authoritative on the field collection and Managers are authoritative on collections of models. In your efforts to make it DRY you made it WET, cause you alter the field collection in your manager.
In order to fix it, you would have to teach the model about the lookup and need to do that using the Lookup API.
Now I'm assuming that you're not actually annotating with a fixed value, so if that annotation is in fact reducible to fields, then you may just get it done, because in the end it needs to be mapped to database representation.
I have ModelA and ModelB classes, and I want to define an attribute on ModelA called number_of_b_models whose value is the result of the next query:
SELECT count(modelb.id) FROM modelb WHERE modelb.a_id = :1 AND modelb.status = 2
where :1 is replaced by a.id from ModelA.
Since the query itself is so straightforward and self-explanatory it doesn't have to use the func.count construct, or any other SQLAlchemy construct - just to allow me to comfortably use number_of_b_models from an instance of ModelA.
How do i define number_of_b_models on ModelA?
As a property on the SQLAlchemy model (ModelA) itself. This solution is not ideal in terms of internal implementation - using the Session.object_session classmethod. This can be replaced by insepct(self).session - but this is pretty much the same thing. This is obviously not suitable for mass queries (If you want the number_of_b_models from n instance of ModelA you will trigger n database calls). It DOES make ModelA's API cleaner.
#property
def number_of_b_models(self):
return Session.object_session(self).query(
func.count(ModelB.id)
).filter_by(a_id=self.id, status=2).first()[0]
I have these two simple models, A and B:
from django.db import models
class A(models.Model):
name = models.CharField(max_length=10)
class B(A):
age = models.IntegerField()
Now, how can I query for all instances of A which do not have an instance of B?
The only way I found requires an explicitly unique field on each subclass, which is NOT NULL, so that I can do A.objects.filter(b__this_is_a_b=None), for example, to get instances that are not also B instances. I'm looking for a way to do this without adding an explicit silly flag like that.
I also don't want to query for all of the objects and then filter them in Python. I want to get the DB to do it for me, which is basically something like SELECT * FROM A WHERE A.id in (SELECT id from B)
Since some version of django or python this works as well:
A.Objects.all().filter(b__isnull=True)
because if a is an A object a.b gives the subclass B of a when it exists
I Know this is an old question, but my answer might help new searchers on this subject.
see also:
multi table inheritance
And one of my own questions about this: downcasting a super class to a sub class
I'm not sure it's possible to do this purely in the DB with Django's
ORM, in a single query. Here's the best I've been able to do:
A.objects.exclude(id__in=[r[0] for r in B.objects.values_list("a_ptr_id")])
This is 2 DB queries, and works best with a simplistic inheritance
graph - each subclass of A would require a new database query.
Okay, it took a lot of trial and error, but I have a solution. It's
ugly as all hell, and the SQL is probably worse than just going with
two queries, but you can do something like so:
A.objects.exclude(b__age__isnull=True).exclude(b__age_isnull=False)
There's no way to get Django to do the join without referencing a
field on b. But with these successive .exclude()s, you make any A
with a B subclass match one or the other of the excludes. All you're left with are A's
without a B subclass.
Anyway, this is an interesting use case, you should bring it up on django-dev...
I don't work with django, but it looks like you want the isinstance(obj, type) built-in python method.
Edit:
Would A.objects.exclude(id__exact=B__id) work?