How to remove all items from a many-to-many collection in SQLAlchemy? - python

When I need to remove an object from a declarative ORM many-to-many relationship, I am supposed to do this:
blogpost.tags.remove(tag)
Well. What am I supposed to do if I need to purge all these relations (not only one)? Typical situation: I'd like to set a new list of tags to my blogpost. So I need to...:
Remove all existing relations between that blogpost and tags.
Set new relations and create new tags if they don't exist.
Of course, there could be a better way of doing this. In that case please let me know.

This is the standard Python idiom for clearing a list – assigning to the “entire list” slice:
blogpost.tags[:] = []
Instead of the empty list, you may want to assign the new set of tags directly:
blogpost.tags[:] = new_tags
SQLAlchemy's relations are instrumented attributes, meaning that they keep the interface of a list (or set, dict, etc), but any changes are reflected in the database. This means that anything you can do with a list is possible with a relation, while SQLA transparently listens to the changes and updates the database accordingly.
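To see why slice assignment is enough, here is a toy stand-in for an instrumented collection. It is not SQLAlchemy code: RecordingList and its events attribute are hypothetical names, used only to show that `[:] =` passes through a hook the collection can observe, which is roughly how an instrumented list gets to notice the change.

```python
class RecordingList(list):
    """Toy stand-in for an instrumented collection (hypothetical, not SQLAlchemy)."""

    def __init__(self, iterable=()):
        super().__init__(iterable)
        self.events = []  # one (removed_items, added_items) pair per assignment

    def __setitem__(self, index, value):
        # Compare the contents before and after the assignment to see what changed.
        before = list(self)
        super().__setitem__(index, value)
        after = list(self)
        removed = [item for item in before if item not in after]
        added = [item for item in after if item not in before]
        self.events.append((removed, added))


tags = RecordingList(["python", "orm"])
tags[:] = ["orm", "sql"]   # replace the whole contents in place
print(list(tags))          # ['orm', 'sql']
print(tags.events)         # [(['python'], ['sql'])]
```

The list object stays the same throughout; only its contents change, so anything holding a reference to it sees the new items.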

Confirmed what Two-Bit Alchemist has reported.
blogpost.tags[:] = new_tags
will complain about
TypeError: 'AppenderBaseQuery' object does not support item assignment
But
blogpost.tags = new_tags
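seems to work fine. (The AppenderBaseQuery in the error message suggests the relationship was declared with lazy='dynamic', in which case the attribute is a query object rather than a list.)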

Pass a queryset as the argument to __in in django?

I have a list of object IDs that I am getting from a query in a model's method, then I'm using that list to delete objects from a different model:
from django.db import models

class SomeObject(models.Model):
    # [...]
    def do_stuff(self, some_param):
        # [...]
        ids_to_delete = {item.id for item in self.items.all()}
        # get_or_create() returns an (object, created) tuple, so unpack it
        other_object, _ = OtherObject.objects.get_or_create(some_param=some_param)
        other_object.items.filter(item_id__in=ids_to_delete).delete()
What I don't like is that this takes 2 queries (well, technically 3 for the get_or_create() but in the real code it's actually .filter(some_param=some_param).first() instead of the .get(), so I don't think there's any easy way around that).
How do I pass in an unevaluated queryset as the argument to an __in lookup?
I would like to do something like:
ids_to_delete = self.items.all().values("id")
other_object.items.filter(item_id__in=ids_to_delete).delete()
You can pass a QuerySet to the lookup:
other_object.items.filter(id__in=self.items.all()).delete()
This turns it into a subquery. Not all databases handle such subqueries well, MySQL in particular. Furthermore, Django handles .delete() itself rather than delegating it entirely to the database: it first runs a query to fetch the primary keys of the items, then triggers the delete logic (which also removes items that have a CASCADE dependency). So .delete() takes at least two queries, and often more when ForeignKeys with on_delete handlers are involved.
Note, however, that this deletes the Item objects themselves; it does not merely "unlink" them from other_object. To unlink them, use .remove(…).
I should've tried the code sample I posted; you can in fact do this. It's given as an example in the documentation, but the documentation says "be cautious about using nested queries and understand your database server’s performance characteristics" and, for backends that handle nested queries poorly, recommends evaluating the subquery into a list first:
values = Blog.objects.filter(
    name__contains='Cheddar').values_list('pk', flat=True)
entries = Entry.objects.filter(blog__in=list(values))
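For comparison, this is roughly what the two variants look like at the SQL level. The sketch uses the standard-library sqlite3 module instead of Django, and the tables source_item and other_item are hypothetical stand-ins for the models in the question. It also ignores the extra queries Django's own .delete() logic may issue.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE source_item (id INTEGER PRIMARY KEY);
        CREATE TABLE other_item (id INTEGER PRIMARY KEY, item_id INTEGER);
        INSERT INTO source_item (id) VALUES (1), (2);
        INSERT INTO other_item (id, item_id) VALUES (10, 1), (11, 2), (12, 3);
        """
    )
    return conn


def delete_with_subquery(conn):
    # One statement: the database evaluates the inner SELECT itself.
    conn.execute(
        "DELETE FROM other_item WHERE item_id IN (SELECT id FROM source_item)"
    )


def delete_with_list(conn):
    # Two statements: fetch the ids first, then pass them in as parameters.
    ids = [row[0] for row in conn.execute("SELECT id FROM source_item")]
    if not ids:
        return  # an empty IN () list is not portable SQL, so skip the delete
    placeholders = ", ".join("?" for _ in ids)
    conn.execute(
        f"DELETE FROM other_item WHERE item_id IN ({placeholders})", ids
    )


def remaining(conn):
    return conn.execute("SELECT id, item_id FROM other_item ORDER BY id").fetchall()


nested, listed = make_db(), make_db()
delete_with_subquery(nested)
delete_with_list(listed)
print(remaining(nested), remaining(listed))
```

Both variants leave the same rows behind; which one is faster depends on the database server, which is why the documentation suggests benchmarking.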

Undo `lazyload()` with the relationship default

I have a Query object which was initially configured to lazyload() all relations on a model:
query = session.query(Article).options(lazyload('author'))
Is it possible to revert the relationship loading back to default? E.g. the relationship was configured with lazy='joined', and I want the query to have joinedload() behavior without using joinedload() explicitly.
I was expecting defaultload() to have this behavior, but in fact it does not: it references the query default instead of the relationship default. So I'm looking for a kind of resetload() solution.
The reason for doing this is because I'm creating a JSON-based query syntax, and no relations should be loaded unless the user explicitly names them.
Currently, I'm using lazyload() on all relations that were not explicitly requested, but want to go the other way around: lazyload() all relations first, and then override it for some of them.
This would make the code more straightforward.
Just to be clear:
By default, all inter-object relationships are lazy loading.
http://docs.sqlalchemy.org/en/latest/orm/loading.html
So we are talking about a case in which a relation has been specifically marked as eager loading, then the queries are configured as lazy loading, then you want to "override the override" as it were.
Chaining calls to options will override earlier calls. I did test this a bit.
q = s.query(User)  # lazy loads 'addresses'
q = s.query(User).options(contains_eager('addresses'))  # eager loads
q = s.query(User).options(contains_eager('addresses'))\
    .options(lazyload('addresses'))  # lazy loads
q = s.query(User).options(contains_eager('addresses'))\
    .options(lazyload('addresses'))\
    .options(contains_eager('addresses'))  # eager loads
However, it sounds like you're talking about just reverting the lazyload option, whereas the above case involves an explicit change to eager loading.
The defaultload docstring says its use case is to be chained to other loader options, so I don't think it's related.
Based on a glance through the source, I don't think this behavior is supported. When you update the loading strategy option, it updates a dictionary with the new loading strategy and I don't think there's still a reference to the old strategy, at least as far as I can tell.
You could keep a reference to the query object before .options(lazyload(...)), or just have an option to generate the query with or without the lazyload on everything.
To force everything to lazyload, ignoring what was specified on the relationship, you can use the '*' target. From the docs:
affecting all relationships not otherwise specified in the query. This
feature is available by passing the string '*' as the argument to any
of these options:
session.query(Article).options(lazyload('*'))
Then you can specify whatever load types you want per relationship or relationship chain.
# not sure how you are mapping json data to relationships
# once you know the relationships, you can build a list of them to load
my_loads = [joinedload(rel) for rel in json_rel_data]
query = session.query(Article).options(lazyload('*'), *my_loads)
# query lazy loads **everything** except the explicitly set joined loads
If you are joining on the relationships for query purposes, you can use contains_eager instead of joinedload in the options to use the already joined relationship.
my_eagers = [contains_eager(rel) for rel in json_rel_joins]
my_loads = [joinedload(rel) for rel in json_rel_loads]
query = (
    session.query(Article)
    .join(*json_rel_joins)
    .options(lazyload('*'), *my_eagers, *my_loads)
)

SQLAlchemy introspection of declarative classes

I'm writing a small sqlalchemy shim to export data from a MySQL database with some lightweight data transformations—mostly changing field names. My current script works fine but requires me to essentially describe my model twice—once in the class declaration and once as a list of field names to iterate over.
I'm trying to figure out how to use introspection to identify properties on row-objects that are column accessors. The following works almost perfectly:
for attr, value in self.__class__.__dict__.items():
    if isinstance(value, sqlalchemy.orm.attributes.InstrumentedAttribute):
        self.__class__._columns.append(attr)
except that my to-many relation accessors are also instances of sqlalchemy.orm.attributes.InstrumentedAttribute, and I need to skip those. Is there any way to distinguish between the two while I am inspecting the class dictionary?
Most of the documentation I'm finding on sqlalchemy introspection involves looking at metadata.table, but since I'm renaming columns, that data isn't trivially mappable.
The Mapper of each mapped entity has an attribute columns with all column definitions. For example, if you have a declarative class User you can access the mapper with User.__mapper__ and the columns with:
list(User.__mapper__.columns)
Each column has several attributes, including name (which might not be the same as the mapped attribute named key), nullable, unique and so on...
I'd still like to see an answer to this question, but I've worked around it by name-mangling the relationship accessors (e.g. '_otherentity' instead of 'otherentity') and then filtering on the name. Works fine for my purposes.
An InstrumentedAttribute instance has an attribute called impl that is in practice a ScalarAttributeImpl, a ScalarObjectAttributeImpl, or a CollectionAttributeImpl.
I'm not sure how brittle this is, but I just check which one it is to determine whether an instance will ultimately return a list or a single object.

SQLAlchemy: Shallow copy avoiding lazy loading

I'm trying to automatically build a shallow copy of a SA-mapped
object. At the moment my function is just:
newobj = src.__class__()
for prop in class_mapper(src.__class__).iterate_properties:
    setattr(newobj, prop.key, getattr(src, prop.key))
but I'm having troubles with lazy relations... Obviously getattr
triggers the lazy loading, but since I don't need their values right
away, I'd like to just copy the "this should be lazy loaded"-state of
the attribute... Is this possible?
Edit: I need this for a "data logging" system. That is, whenever someone updates a persisted entity, I have to generate a new record and then mark the old one as such.
In order to do this I create a shallow copy of the entity (so SQLA issues an INSERT instead of an UPDATE) and work from there.
The system works quite nicely (it's been in production use for months), but now I'd like to enhance it so that it no longer needs all the relations to be lazy-loaded first.
What you need is to copy column properties only, which can be easily filtered using isinstance(prop, sqlalchemy.orm.ColumnProperty). Note that you HAVE to copy externally stored relations (all many-to-many), since there are no columns corresponding to them in the main table. This can't be done with the high-level interface without lazy-loading, so I'd accept this trade-off. Many-to-many relations can be identified with isinstance(prop, RelationshipProperty) (called RelationProperty in older releases) plus a test on prop.secondary. The resulting code will look like the following:
from sqlalchemy.orm import object_mapper, ColumnProperty, RelationshipProperty
newobj = type(src)()
for prop in object_mapper(src).iterate_properties:
    # copy plain columns, plus many-to-many relations (those with a secondary table)
    if (isinstance(prop, ColumnProperty) or
            (isinstance(prop, RelationshipProperty) and prop.secondary is not None)):
        setattr(newobj, prop.key, getattr(src, prop.key))
Also note that SQLAlchemy is designed to keep a single loaded object for each identity, and your copy breaks this when the identity (primary key) properties are copied too. This is probably not an issue in your case if you are storing the copy under a new (versioned) identifier.

Recursive delete in google app engine

I'm using google app engine with django 1.0.2 (and the django-helper) and wonder how people go about doing recursive delete.
Suppose you have a model that's something like this:
class Top(BaseModel):
    pass

class Bottom(BaseModel):
    daddy = db.ReferenceProperty(Top)
Now, when I delete an object of type 'Top', I want all the associated 'Bottom' objects to be deleted as well.
As things are now, when I delete a 'Top' object, the 'Bottom' objects stay and then I get data that doesn't belong anywhere. When accessing the datastore in a view, I end up with:
Caught an exception while rendering: ReferenceProperty failed to be resolved.
I could of course find all objects and delete them, but since my real model is at least 5 levels deep, I'm hoping there's a way to make sure this can be done automatically.
I've found this article about how it works with Java and that seems to be pretty much what I want as well.
Anyone know how I could get that behavior in django as well?
You need to implement this manually, by looking up affected records and deleting them at the same time as you delete the parent record. You can simplify this, if you wish, by overriding the .delete() method on your parent class to automatically delete all related records.
For performance reasons, you almost certainly want to use key-only queries (allowing you to get the keys of entities to be deleted without having to fetch and decode the actual entities), and batch deletes. For example:
db.delete(Bottom.all(keys_only=True).filter("daddy =", top).fetch(1000))
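With several levels, the same idea (collect keys only, then delete in one batch) can be applied level by level. The following is a minimal sketch of that traversal, not App Engine code: a plain dictionary mapping each key to its parent key stands in for the datastore, and descendant_keys and delete_tree are hypothetical helper names.

```python
def descendant_keys(parent_of, root_key):
    """Return root_key plus every key whose chain of parents leads to it."""
    # Index children by parent once, so each level is a cheap lookup.
    children = {}
    for key, parent in parent_of.items():
        children.setdefault(parent, []).append(key)

    found, stack = [], [root_key]
    while stack:
        key = stack.pop()
        found.append(key)
        stack.extend(children.get(key, []))
    return found


def delete_tree(parent_of, root_key):
    """Delete root_key and all of its descendants; return the deleted keys."""
    doomed = descendant_keys(parent_of, root_key)
    for key in doomed:  # in a real datastore this would be one batch delete
        parent_of.pop(key, None)
    return doomed


# Hypothetical data: key -> parent key (None for top-level records)
records = {"top": None, "mid1": "top", "mid2": "top", "bot1": "mid1", "other": None}
deleted = delete_tree(records, "top")
print(sorted(deleted), records)
```

Overriding .delete() on the parent class, as suggested above, would then only need to call such a helper before deleting the parent itself.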
Actually that behavior is GAE-specific. Django's ORM simulates "ON DELETE CASCADE" on .delete().
I know that this is not an answer to your question, but maybe it can help you from looking in the wrong places.
Reconsider the data structure. If the relationship will never change during the record's lifetime, you could use the "ancestors" feature of GAE:
class Top(db.Model): pass
class Middle(db.Model): pass
class Bottom(db.Model): pass
top = Top()
middles = [Middle(parent=top) for i in range(0,10)]
bottoms = [Bottom(parent=middle) for i in range(0,10) for middle in middles]
Then querying for ancestor=top will find all the records from all levels. So it will be easy to delete them.
descendants = list(db.Query().ancestor(top))
# should return [top] + middles + bottoms
If your hierarchy is only a small number of levels deep, then you might be able to do something with a field that looks like a file path:
daddy.ancestry = "greatgranddaddy/granddaddy/daddy/"
me.ancestry = daddy.ancestry + me.uniquename + "/"
sort of thing. You do need unique names, at least unique among siblings.
The path in object IDs sort of does this already, but IIRC that's bound up with entity groups, which you're advised not to use to express relationships in the data domain.
Then you can construct a query to return all of granddaddy's descendants using the initial substring trick, like this:
query = Person.all()
query.filter("ancestry >", gdaddy.ancestry + u"\u0001")
query.filter("ancestry <", gdaddy.ancestry + u"\uffff")
Obviously this is no use if you can't fit the ancestry into a 500 byte StringProperty.
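The initial substring trick can be tried out without a datastore. In this sketch a plain list of strings stands in for the ancestry property of stored entities, and ordinary Python string comparison stands in for the two inequality filters; descendants_of is a hypothetical helper name.

```python
def descendants_of(ancestry_values, ancestor_path):
    """Return the paths strictly below ancestor_path, using only range comparisons."""
    low = ancestor_path + "\u0001"   # just above the ancestor's own path
    high = ancestor_path + "\uffff"  # just below the next sibling prefix
    return [path for path in ancestry_values if low < path < high]


paths = [
    "greatgranddaddy/",
    "greatgranddaddy/granddaddy/",
    "greatgranddaddy/granddaddy/daddy/",
    "greatgranddaddy/granddaddy/daddy/me/",
    "greatgranddaddy/uncle/",
]
print(descendants_of(paths, "greatgranddaddy/granddaddy/"))
```

The trailing "/" on every path matters: without it, a sibling whose name merely starts with the same characters would fall inside the range.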