Undo `lazyload()` with the relationship default - python

I have a Query object which was initially configured to lazyload() all relations on a model:
query = session.query(Article).options(lazyload('author'))
Is it possible to revert the relationship loading back to default? E.g. the relationship was configured with lazy='joined', and I want the query to have joinedload() behavior without using joinedload() explicitly.
I was expecting defaultload() to have this behavior, but in fact it does not: it references the query default instead of the relationship default. So I'm searching for a kind of resetload() solution.
The reason for doing this is because I'm creating a JSON-based query syntax, and no relations should be loaded unless the user explicitly names them.
Currently, I'm using lazyload() on all relations that were not explicitly requested, but want to go the other way around: lazyload() all relations first, and then override it for some of them.
This would have made the code more straightforward.

Just to be clear:
By default, all inter-object relationships are lazy loading.
http://docs.sqlalchemy.org/en/latest/orm/loading.html
So we are talking about a case in which a relation has been specifically marked as eager loading, then the queries are configured as lazy loading, then you want to "override the override" as it were.
Chaining calls to options will override earlier calls. I did test this a bit.
q = s.query(User) # lazy loads 'addresses'
q = s.query(User).options(contains_eager('addresses')) # eager loads
q = s.query(User).options(contains_eager('addresses'))\
.options(lazyload('addresses')) # lazy loads
q = s.query(User).options(contains_eager('addresses'))\
.options(lazyload('addresses'))\
.options(contains_eager('addresses')) # eager loads
However, it sounds like you're talking about just reverting the lazyload option, whereas the above case involves an explicit change to eager loading.
The defaultload docstring says its use case is to be chained to other loader options, so I don't think it's related.
Based on a glance through the source, I don't think this behavior is supported. When you update the loading strategy option, it updates a dictionary with the new loading strategy and I don't think there's still a reference to the old strategy, at least as far as I can tell.
You could keep a reference to the query object before .options(lazyload(...)), or just have an option to generate the query with or without the lazyload on everything.

To force everything to lazyload, ignoring what was specified on the relationship, you can use the '*' target. From the docs:
affecting all relationships not otherwise specified in the query. This
feature is available by passing the string '*' as the argument to any
of these options:
session.query(Article).options(lazyload('*'))
Then you can specify whatever load types you want per relationship or relationship chain.
# not sure how you are mapping json data to relationships
# once you know the relationships, you can build a list of them to load
my_loads = [joinedload(rel) for rel in json_rel_data]
query = session.query(Article).options(lazyload('*'), *my_loads)
# query lazy loads **everything** except the explicitly set joined loads
If you are joining on the relationships for query purposes, you can use contains_eager instead of joinedload in the options to use the already joined relationship.
my_eagers = [contains_eager(rel) for rel in json_rel_joins]
my_loads = [joinedload(rel) for rel in json_rel_loads]
query = session.query(Article
).join(*json_rel_joins
).options(lazyload('*'), *my_eagers, *my_loads)
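For completeness, here is a minimal runnable sketch of this pattern (SQLAlchemy 1.4+). The Article/Author models and the requested list are assumptions standing in for the question's setup:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, joinedload, lazyload, relationship

Base = declarative_base()

class Author(Base):
    __tablename__ = "author"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Article(Base):
    __tablename__ = "article"
    id = Column(Integer, primary_key=True)
    author_id = Column(Integer, ForeignKey("author.id"))
    # lazy="joined" makes eager loading this relationship's *default*
    author = relationship(Author, lazy="joined")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Article(author=Author(name="a")))
    session.commit()

with Session(engine) as session:
    # build loader options from the (assumed) JSON request
    requested = ["author"]
    loads = [joinedload(getattr(Article, name)) for name in requested]
    article = session.query(Article).options(lazyload("*"), *loads).first()
    session.expunge_all()       # detach: a lazy load would now fail
    print(article.author.name)  # already loaded by joinedload, no query needed
```

Anything not named in requested stays lazy despite its relationship-level default, which is exactly the "lazyload everything first, then override" behavior the question asks for.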


Django filter related field using related model's custom manager

How can I apply annotations and filters from a custom manager queryset when filtering via a related field? Here's some code to demonstrate what I mean.
Manager and models
from django.db.models import (
    BooleanField, CASCADE, ForeignKey, Manager, Model, Value,
)

class OtherModelManager(Manager):
    def get_queryset(self):
        return super(OtherModelManager, self).get_queryset().annotate(
            some_flag=Value(True, output_field=BooleanField())
        ).filter(
            disabled=False
        )

class MyModel(Model):
    other_model = ForeignKey('OtherModel', on_delete=CASCADE)

class OtherModel(Model):
    disabled = BooleanField()

    objects = OtherModelManager()
Attempting to filter the related field using the manager
# This should only give me MyModel objects with related
# OtherModel objects that have the some_flag annotation
# set to True and disabled=False
my_model = MyModel.objects.filter(some_flag=True)
If you try the above code you will get the following error:
TypeError: Related Field got invalid lookup: some_flag
To further clarify, essentially the same question was reported as a bug with no response on how to actually achieve this: https://code.djangoproject.com/ticket/26393.
I'm aware that this can be achieved by simply using the filter and annotation from the manager directly in the MyModel filter, however the point is to keep this DRY and ensure this behaviour is repeated everywhere this model is accessed (unless explicitly instructed not to).
How about running a nested query (or two separate queries, if your backend is MySQL, for performance reasons)?
The first fetches the pks of the matching OtherModel objects.
The second filters the MyModel objects on the fetched pks.
other_model_pks = OtherModel.objects.filter(some_flag=...).values_list('pk', flat=True)
my_model = MyModel.objects.filter(other_model__in=other_model_pks)
# use (...__in=list(other_model_pks)) for MySQL to avoid a nested query.
I don't think what you want is possible.
1) I think you are misunderstanding what annotations do.
Generating aggregates for each item in a QuerySet
The second way to generate summary values is to generate an
independent summary for each object in a QuerySet. For example, if you
are retrieving a list of books, you may want to know how many authors
contributed to each book. Each Book has a many-to-many relationship
with the Author; we want to summarize this relationship for each book
in the QuerySet.
Per-object summaries can be generated using the annotate() clause.
When an annotate() clause is specified, each object in the QuerySet
will be annotated with the specified values.
The syntax for these annotations is identical to that used for the
aggregate() clause. Each argument to annotate() describes an aggregate
that is to be calculated.
So when you say:
MyModel.objects.annotate(other_model__some_flag=Value(True, output_field=BooleanField()))
You are not annotating some_flag over other_model.
i.e. you won't have: mymodel.other_model.some_flag
You are annotating other_model__some_flag over mymodel.
i.e. you will have: mymodel.other_model__some_flag
2) I'm not sure how familiar you are with SQL, but in order to make MyModel.objects.filter(other_model__some_flag=True) possible, i.e. to keep the annotation when doing JOINs, the ORM would have to do a JOIN over a subquery, something like:
INNER JOIN (
    SELECT other_model.id, /* more fields, */ 1 AS some_flag
    FROM other_model
) AS sub ON mymodel.other_model_id = sub.id
which would be super slow and I'm not surprised they are not doing it.
Possible solution
Don't annotate the field; add it as a regular field on your model.
The simplified answer is that models are authoritative on the field collection and Managers are authoritative on collections of models. In your effort to make it DRY you made it WET, because you alter the field collection in your manager.
In order to fix it, you would have to teach the model about the lookup and need to do that using the Lookup API.
Now I'm assuming that you're not actually annotating with a fixed value, so if that annotation is in fact reducible to fields, then you may just get it done, because in the end it needs to be mapped to database representation.

Is there a way to verify that all model relationships are successfully eager loaded in Sqlalchemy?

I'm familiar with the joinedload and subqueryload options in Sqlalchemy, and I'm using them to query a large result set that's later expunged from the session and cached.
Is there a way to verify that every possible relationship from the top-level model on down is eager-loaded at this point?
The supported way to ensure that you've eager loaded all the relationships you need is to put lazy="raise" on all of your relationships. It won't tell you that you did something wrong until you do it, but EAFP.
children = relationship(Child, lazy="raise")
The following iterator will iterate over all relationships reachable from a given model. It yields a tuple of (model_class, relationship_name). You can modify to look at prop.lazy or similar, or use this to construct loader options to lazy load the right things, or whatever seems appropriate.
from sqlalchemy import inspect

def recursive_relations(model, already_traversed=None):
    if already_traversed is None:
        already_traversed = set()
    inspection = inspect(model)
    already_traversed.add(model)
    for name, prop in inspection.relationships.items():
        yield (model, name)
        if prop.mapper.class_ not in already_traversed:
            already_traversed.add(prop.mapper.class_)
            yield from recursive_relations(prop.mapper.class_, already_traversed)
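Stripped of the SQLAlchemy specifics, the cycle guard in the iterator above is an ordinary visited-set graph traversal. A stdlib-only sketch of the same pattern, on a toy graph with hypothetical names:

```python
def recursive_edges(graph, node, seen=None):
    """Yield (node, neighbor) pairs reachable from `node`, visiting each node once."""
    if seen is None:
        seen = set()
    seen.add(node)
    for neighbor in graph[node]:
        yield (node, neighbor)
        if neighbor not in seen:
            seen.add(neighbor)
            yield from recursive_edges(graph, neighbor, seen)

# A -> B, B -> A (cycle), B -> C
graph = {"A": ["B"], "B": ["A", "C"], "C": []}
print(list(recursive_edges(graph, "A")))
# [('A', 'B'), ('B', 'A'), ('B', 'C')]
```

Note that the edge back into an already-visited node is still yielded (B -> A above); only the recursion into it is skipped, exactly as the relationship iterator yields every relationship but only recurses into unseen mapper classes.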

How to explicitly load relationships on an existing object?

I have a SQLAlchemy model Foo which contains a lazy-loaded relationship bar which points to another model that also has a lazy-loaded relationship foobar.
When querying normally I would use this code to ensure that all objects are loaded with a single query:
session.query(Foo).options(joinedload('bar').joinedload('foobar'))
However, now I have a case where a base class already provides me a Foo instance that was retrieved using session.query(Foo).one(), so the relationships are lazy-loaded (which is the default, and I don't want to change that).
For a single level of nesting I wouldn't mind it being loaded once I access foo.bar, but since I also need to access foo.bar[x].foobar I really prefer to avoid sending queries in a loop (which would happen whenever I access foobar).
I'm looking for a way to make SQLAlchemy load the foo.bar relationship while also using the joinedload strategy for foobar.
I ran into a similar situation recently, and ended up doing the following:
eager_loaded = (db.session.query(Bar)
                .options(joinedload('foobar'))
                .filter_by(bar_fk=foo.foo_pk)
                .all())
Assuming you can recreate the bar join condition in the filter_by arguments, all the objects in the collection will be loaded into the identity map, and foo.bar[x].foobar will not need to go to the database.
One caveat: It looks like the identity map may dispose of the loaded entities if they are no longer strongly referenced - thus the assignment to eager_loaded.
The SQLAlchemy wiki contains the Disjoint Eager Loading recipe. A query is issued for the parent collection, then the children are queried and combined. For the most part, this was implemented in SQLAlchemy as the subquery strategy, but the recipe covers the case where you explicitly need to make the query later, not just separately.
The idea is that you order the child query and group the results by the remote columns linking the relationship, then populate the attribute for each parent item with the group of children. The following is slightly modified from the recipe to allow passing in a custom child query with extra options, rather than building it from the parent query. This does mean that you have to construct the child query more carefully: if your parent query has filters, then the child should join and filter as well, to prevent loading unneeded rows.
from itertools import groupby
from sqlalchemy.orm import attributes

def disjoint_load(parents, rel, q):
    local_cols, remote_cols = zip(*rel.prop.local_remote_pairs)
    q = q.join(rel).order_by(*remote_cols)
    if rel.prop.order_by:
        q = q.order_by(*rel.prop.order_by)
    collections = dict(
        (k, list(v))
        for k, v in groupby(
            q, lambda x: tuple(getattr(x, c.key) for c in remote_cols)
        )
    )
    for p in parents:
        attributes.set_committed_value(
            p, rel.key,
            collections.get(tuple(getattr(p, c.key) for c in local_cols), ()),
        )
    return parents
# load the parents
devices = session.query(Device).filter(Device.active).all()
# build the child query with extras, use the same filter
findings = session.query(Finding
    ).join(Device.findings
    ).filter(Device.active
    ).options(joinedload(Finding.scans))
for d in disjoint_load(devices, Device.findings, findings):
print(d.cn, len(d.findings))
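The core of the recipe is "order by the join key, then group": itertools.groupby only merges adjacent rows, which is why the child query must be ordered by the remote columns before grouping. A stdlib illustration with hypothetical (parent_id, child_name) rows, standing in for the ordered child query:

```python
from itertools import groupby

# (parent_id, child_name) rows, as the child query would return them
# after order_by(parent_id)
rows = [(1, "a"), (1, "b"), (2, "c")]

collections = {
    parent_id: [name for _, name in group]
    for parent_id, group in groupby(rows, key=lambda r: r[0])
}
print(collections)  # {1: ['a', 'b'], 2: ['c']}
```

If the rows were not sorted by the key, groupby would emit the same key multiple times and the dict would silently keep only the last group, which is the failure mode the order_by call in disjoint_load guards against.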

What is the difference between a mongoengine.DynamicEmbeddedDocument vs mongoengine.DictField?

A mongoengine.DynamicEmbeddedDocument can be used to leverage MongoDB's flexible schema-less design. It's expandable and doesn't apply type constraints to the fields, afaik.
A mongoengine.DictField similarly allows for use of MongoDB's schema-less nature. In the documentation they simply say (w.r.t. the DictField)
This is similar to an embedded document, but the structure is not defined.
Does that mean, then, the mongoengine.fields.DictField and the mongoengine.DynamicEmbeddedDocument are completely interchangeable?
EDIT (for more information):
mongoengine.DynamicEmbeddedDocument inherits from mongoengine.EmbeddedDocument which, from the code is:
A mongoengine.Document that isn't stored in its own collection. mongoengine.EmbeddedDocuments should be used as fields on mongoengine.Documents through the mongoengine.EmbeddedDocumentField field type.
A mongoengine.fields.EmbeddedDocumentField is
An embedded document field - with a declared document_type. Only valid values are subclasses of EmbeddedDocument.
Does this mean the only thing that makes the DictField and DynamicEmbeddedDocument not totally interchangeable is that the DynamicEmbeddedDocument has to be defined through the EmbeddedDocumentField field type?
From what I’ve seen, the two are similar, but not entirely interchangeable. Each approach may have a slight advantage based on your needs. First of all, as you point out, the two approaches require differing definitions in the document, as shown below.
class ExampleDynamicEmbeddedDoc(DynamicEmbeddedDocument):
    pass

class ExampleDoc(Document):
    dict_approach = DictField()
    dynamic_doc_approach = EmbeddedDocumentField(
        ExampleDynamicEmbeddedDoc, default=ExampleDynamicEmbeddedDoc
    )
Note: The default is not required, but the dynamic_doc_approach field will need to be set to an ExampleDynamicEmbeddedDoc object in order to save. (i.e. trying to save after setting example_doc_instance.dynamic_doc_approach = {} would throw an exception). Also, you could use the GenericEmbeddedDocumentField if you don’t want to tie the field to a specific type of EmbeddedDocument, but the field would still need to point to an object subclassed from EmbeddedDocument in order to save.
Once set up, the two are functionally similar in that you can save data to them as needed and without restrictions:
e = ExampleDoc()
e.dict_approach["test"] = 10
e.dynamic_doc_approach.test = 10
However, the one main difference that I’ve seen is that you can query against any values added to a DictField, whereas you cannot with a DynamicEmbeddedDoc.
ExampleDoc.objects(dict_approach__test = 10) # Returns a QuerySet containing our entry.
ExampleDoc.objects(dynamic_doc_approach__test = 10) # Throws an exception.
That being said, using an EmbeddedDocument has the advantage of validating fields which you know will be present in the document. (We simply would need to add them to the ExampleDynamicEmbeddedDoc definition). Because of this, I think it is best to use a DynamicEmbeddedDocument when you have a good idea of a schema for the field and only anticipate adding fields minimally (which you will not need to query against). However, if you are not concerned about validation or anticipate adding a lot of fields which you’ll query against, go with a DictField.

hybrid property with join in sqlalchemy

I have probably not grasped the use of @hybrid_property fully. But what I try to do is to make it easy to access a calculated value based on a column in another table, and thus a join is required.
So what I have is something like this (which works but is awkward and feels wrong):
class Item():
    # ...

    @hybrid_property
    def days_ago(self):
        # Can I even write a python version of this?
        pass

    @days_ago.expression
    def days_ago(cls):
        return func.datediff(func.NOW(), func.MAX(Event.date_started))
This requires me to add the join on the Action table by the caller when I need to use the days_ago property. Is the hybrid_property even the correct approach to simplifying my queries where I need to get hold of the days_ago value ?
One way or another you need to load or access Action rows either via join or via lazy load (note here it's not clear what Event vs. Action is, I'm assuming you have just Item.actions -> Action).
The non-"expression" version of days_ago intends to function against Action objects that are relevant only to the current instance. Normally within a hybrid, this means just iterating through Item.actions and performing the operation in Python against loaded Action objects. Though in this case you're looking for a simple aggregate you could instead opt to run a query, but again it would be local to self so this is like object_session(self).query(func.datediff(...)).select_from(Action).with_parent(self).scalar().
The expression version of the hybrid when formed against another table typically requires that the query in which it is used already have the correct FROM clauses set up, so it would look like session.query(Item).join(Item.actions).filter(Item.days_ago == xyz). This is explained at Join-Dependent Relationship Hybrid.
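A runnable sketch of the two sides of such a hybrid (SQLite's julianday stands in for datediff, and the Item/Action names are assumptions based on the question):

```python
import datetime

from sqlalchemy import Column, Date, ForeignKey, Integer, func
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Item(Base):
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    actions = relationship("Action", back_populates="item")

    @hybrid_property
    def days_ago(self):
        # Python side: operate on the loaded Action objects of this instance
        latest = max(a.date_started for a in self.actions)
        return (datetime.date.today() - latest).days

    @days_ago.expression
    def days_ago(cls):
        # SQL side: the caller must supply the JOIN (and GROUP BY) to Action
        return func.julianday("now") - func.julianday(func.max(Action.date_started))

class Action(Base):
    __tablename__ = "action"
    id = Column(Integer, primary_key=True)
    item_id = Column(Integer, ForeignKey("item.id"))
    date_started = Column(Date)
    item = relationship(Item, back_populates="actions")

# The Python side needs no database at all once the actions are loaded:
item = Item(actions=[
    Action(date_started=datetime.date.today() - datetime.timedelta(days=3))
])
print(item.days_ago)  # 3
```

On the SQL side the hybrid would be used along the lines of session.query(Item).join(Item.actions).group_by(Item.id).having(Item.days_ago < 30), with the join supplied explicitly as described above.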
Your expression here might be better produced as a column_property, if you can afford a correlated subquery. See http://docs.sqlalchemy.org/en/latest/orm/mapping_columns.html#using-column-property-for-column-level-options.
