SQLAlchemy: Shallow copy avoiding lazy loading - python

I'm trying to automatically build a shallow copy of a SA-mapped
object.. At the moment my function is just:
newobj = src.__class__()
for prop in class_mapper(src.__class__).iterate_properties:
setattr(newobj, prop.key, getattr(src, prop.key))
but I'm having troubles with lazy relations... Obviously getattr
triggers the lazy loading, but since I don't need their values right
away, I'd like to just copy the "this should be lazy loaded"-state of
the attribute... Is this possible?
Edit: I need this for a "data logging" system.. That is, whenever someone updates a persisted entity, I have to generate a new record and then mark the old one as such.
In order to do this I create a shallow copy of the entity (so SQLA issues an INSERT instead of an UPDATE) and work from there..
The system works quite nicely (it's been in production use for months) but now I'd like to enhance it so that it won't need that all the relations get lazy-loaded first..

What you need is to copy column properties only, which can be easily filtered using isinstance(prop, sqlalchemy.orm.ColumnProperty). Note, that you HAVE to copy externally stored relations (all many-to-many), since there is no columns corresponding to them in the main table. This can't be done with high-level interface without lazy-loading, so I'd prefer to accept this trade-off. Many-to-many relations can be determined with isinstance(prop, RelationProperty) and prop.secondary test. The resulting code will look like the following:
from sqlalchemy.orm import object_mapper, ColumnProperty, RelationProperty
newobj = type(src)()
for prop in object_mapper(src).iterate_properties:
if (isinstance(prop, ColumnProperty) or
isinstance(prop, RelationProperty) and prop.secondary):
setattr(newobj, prop.key, getattr(src, prop.key))
Also note, that SQLAlchemy is designed to maintain single object loaded for each identity, while your copy breaks this when identity (primary key) properties are copied too, but this is probably not your case if your are storing with new (versioned) identifier.

Related

Undo `lazyload()` with the relationship default

I have a Query object which was initially configured to lazyload() all relations on a model:
query = session.query(Article).options(lazyload('author'))
Is it possible to revert the relationship loading back to default? E.g. the relationship was configured with lazy='joined', and I want the query to have joinedload() behavior without using joinedload() explicitly.
I was expecting defaultload() to have this behavior, but in fact it does not: it references the query default instead of the relationship default. So I'm searching for kinda resetload() solution.
The reason for doing this is because I'm creating a JSON-based query syntax, and no relations should be loaded unless the user explicitly names them.
Currently, I'm using lazyload() on all relations that were not explicitly requested, but want to go the other way around: lazyload() all relations first, and then override it for some of them.
This would have made the code more straigntforward.
Just to be clear:
By default, all inter-object relationships are lazy loading.
http://docs.sqlalchemy.org/en/latest/orm/loading.html
So we are talking about a case in which a relation has been specifically marked as eager loading, then the queries are configured as lazy loading, then you want to "override the override" as it were.
Chaining calls to options will override earlier calls. I did test this a bit.
q = s.query(User) # lazy loads 'addresses'
q = s.query(User).options(contains_eager('addresses')) # eager loads
q = s.query(User).options(contains_eager('addresses'))\
.options(lazyload('addresses')) # lazy loads
q = s.query(User).options(contains_eager('addresses'))\
.options(lazyload('addresses'))\
.options(contains_eager('addresses')) # eager loads
However, it sounds like you're talking about just reverting the lazyload option, whereas the above case involves an explicit change to eager loading.
The defaultload docstring says its use case is to be chained to other loader options, so I don't think it's related.
Based on a glance through the source, I don't think this behavior is supported. When you update the loading strategy option, it updates a dictionary with the new loading strategy and I don't think there's still a reference to the old strategy, at least as far as I can tell.
You could keep a reference to the query object before .options(lazyload(...)), or just have an option to generate the query with or without the lazyload on everything.
To force everything to lazyload, ignoring what was specified on the relationship, you can use the '*' target. From the docs:
affecting all relationships not otherwise specified in the query. This
feature is available by passing the string '*' as the argument to any
of these options:
session.query(Article).options(lazyload('*'))
Then you can specify whatever load types you want per relationship or relationship chain.
# not sure how you are mapping json data to relationships
# once you know the relationships, you can build a list of them to load
my_loads = [joinedload(rel) for rel in json_rel_data]
query = session.query(Article).options(lazyload('*'), *my_loads)
# query lazy loads **everything** except the explicitly set joined loads
If you are joining on the relationships for query purposes, you can use contains_eager instead of joinedload in the options to use the already joined relationship.
my_eagers = [contains_eager(rel) for rel in json_rel_joins]
my_loads = [joinedload(rel) for rel in json_rel_loads]
query = session.query(Article
).join(*json_rel_joins
).options(lazyload('*'), *my_eagers, *my_loads)

hybrid property with join in sqlalchemy

I have probably not grasped the use of #hybrid_property fully. But what I try to do is to make it easy to access a calculated value based on a column in another table and thus a join is required.
So what I have is something like this (which works but is awkward and feels wrong):
class Item():
:
#hybrid_property
def days_ago(self):
# Can I even write a python version of this ?
pass
#days_ago.expression
def days_ago(cls):
return func.datediff(func.NOW(), func.MAX(Event.date_started))
This requires me to add the join on the Action table by the caller when I need to use the days_ago property. Is the hybrid_property even the correct approach to simplifying my queries where I need to get hold of the days_ago value ?
One way or another you need to load or access Action rows either via join or via lazy load (note here it's not clear what Event vs. Action is, I'm assuming you have just Item.actions -> Action).
The non-"expression" version of days_ago intends to function against Action objects that are relevant only to the current instance. Normally within a hybrid, this means just iterating through Item.actions and performing the operation in Python against loaded Action objects. Though in this case you're looking for a simple aggregate you could instead opt to run a query, but again it would be local to self so this is like object_session(self).query(func.datediff(...)).select_from(Action).with_parent(self).scalar().
The expression version of the hybrid when formed against another table typically requires that the query in which it is used already have the correct FROM clauses set up, so it would look like session.query(Item).join(Item.actions).filter(Item.days_ago == xyz). This is explained at Join-Dependent Relationship Hybrid.
your expression here might be better produced as a column_property, if you can afford using a correlated subquery. See that at http://docs.sqlalchemy.org/en/latest/orm/mapping_columns.html#using-column-property-for-column-level-options.

How to remove all items from many-to-many collection in SqlAlchemy?

when I need to remove an object from declarative ORM many-to-many relationship, I am supposed to do this:
blogpost.tags.remove(tag)
Well. What am I supposed to do if I need to purge all these relations (not only one)? Typical situation: I'd like to set a new list of tags to my blogpost. So I need to...:
Remove all existing relations between that blogpost and tags.
Set new relations and create new tags if they don't exist.
Of course, there could be a better way of doing this. In that case please let me know.
This is the standard Python idiom for clearing a list – assigning to the “entire list” slice:
blogpost.tags[:] = []
Instead of the empty list, you may want assign the new set of tags directly.
blogpost.tags[:] = new_tags
SQLAlchemy's relations are instrumented attributes, meaning that they keep the interface of a list (or set, dict, etc), but any changes are reflected in the database. This means that anything you can do with a list is possible with a relation, while SQLA transparently listens to the changes and updates the database accordingly.
Confrimed for what Two-Bit Alchemist has reported.
blogpost.tags[:] = new_tags
will complain about
TypeError: 'AppenderBaseQuery' object does not support item assignment
But
blogpost.tags = new_tags
seems to work fine.

Concurent Access to datastore in app engine

i want to know if db.run_in_transaction() acts as a lock for Data store operations
and helps in case of concurrent access on same entity.
Does in following code it is guarantied that a concurrent access will not cause a race and instead of creating new entity it will not do a over-write
Is db.run_in_transaction() correct/best way to do so
in following code i m trying to create new unique entity with following code
def txn(charmer=None):
new = None
key = my_magic() + random_part()
sk = Snake.get_by_name(key)
if not sk:
new = Snake(key_name=key, charmer= charmer)
new.put()
return new
db.run_in_transaction(txn, charmer)
That is a safe method. Should the same name get generated twice, only one entity would be created.
It sounds like you have already looked at the transactions documentation. There is also a more detailed description.
Check out the docs (specifically the equivalent code) on Model.get_or_insert, it answers exactly the question you are asking:
The get and subsequent (possible) put
are wrapped in a transaction to ensure
atomicity. Ths means that
get_or_insert() will never overwrite
an existing entity, and will insert a
new entity if and only if no entity
with the given kind and name exists.
What you've done is right and sort of duplicates the Model.get_or_insert, like Robert already explained.
I don't know if this can be called a 'lock'... the way this works is optimistic concurrency - the operation will execute assuming that no one else is trying to do the same thing at the same time, and if someone is, it will give you an exception. You'll need to figure out what you want to do in that case. Maybe ask the user to choose a new name?

SQLAlchemy introspection of declarative classes

I'm writing a small sqlalchemy shim to export data from a MySQL database with some lightweight data transformations—mostly changing field names. My current script works fine but requires me to essentially describe my model twice—once in the class declaration and once as a list of field names to iterate over.
I'm trying to figure out how to use introspection to identify properties on row-objects that are column accessors. The following works almost perfectly:
for attr, value in self.__class__.__dict__.iteritems():
if isinstance(value, sqlalchemy.orm.attributes.InstrumentedAttribute):
self.__class__._columns.append(attr)
except that my to-many relation accessors are also instances of sqlalchemy.orm.attributes.InstrumentedAttribute, and I need to skip those. Is there any way to distinguish between the two while I am inspecting the class dictionary?
Most of the documentation I'm finding on sqlalchemy introspection involves looking at metadata.table, but since I'm renaming columns, that data isn't trivially mappable.
The Mapper of each mapped entity has an attribute columns with all column definitions. For example, if you have a declarative class User you can access the mapper with User.__mapper__ and the columns with:
list(User.__mapper__.columns)
Each column has several attributes, including name (which might not be the same as the mapped attribute named key), nullable, unique and so on...
I'd still like to see an answer to this question, but I've worked around it by name-mangling the relationship accessors (e.g. '_otherentity' instead of 'otherentity') and then filtering on the name. Works fine for my purposes.
An InstrumentedAttribute instance has an an attribute called impl that is in practice a ScalarAttributeImpl, a ScalarObjectAttributeImpl, or a CollectionAttributeImpl.
I'm not sure how brittle this is, but I just check which one it is to determine whether an instance will ultimately return a list or a single object.

Categories