SQLAlchemy introspection of declarative classes - python

I'm writing a small sqlalchemy shim to export data from a MySQL database with some lightweight data transformations—mostly changing field names. My current script works fine but requires me to essentially describe my model twice—once in the class declaration and once as a list of field names to iterate over.
I'm trying to figure out how to use introspection to identify properties on row-objects that are column accessors. The following works almost perfectly:
for attr, value in self.__class__.__dict__.items():
    if isinstance(value, sqlalchemy.orm.attributes.InstrumentedAttribute):
        self.__class__._columns.append(attr)
except that my to-many relation accessors are also instances of sqlalchemy.orm.attributes.InstrumentedAttribute, and I need to skip those. Is there any way to distinguish between the two while I am inspecting the class dictionary?
Most of the documentation I'm finding on sqlalchemy introspection involves looking at metadata.table, but since I'm renaming columns, that data isn't trivially mappable.

The Mapper of each mapped entity has an attribute columns with all column definitions. For example, if you have a declarative class User you can access the mapper with User.__mapper__ and the columns with:
list(User.__mapper__.columns)
Each column has several attributes, including name (which might not be the same as the mapped attribute named key), nullable, unique and so on...
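As a rough sketch of this approach, the mapper can also be asked for column attributes and relationships separately, which avoids the class dictionary altogether. The User and Address models below are hypothetical stand-ins, and the snippet assumes a SQLAlchemy release that provides inspect(), Mapper.column_attrs and Mapper.relationships:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, inspect
from sqlalchemy.orm import relationship

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

Base = declarative_base()


class User(Base):  # hypothetical model, not from the question
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    # The attribute is called "full_name" but the column is called "name".
    full_name = Column("name", String(50))
    addresses = relationship("Address", back_populates="user")


class Address(Base):  # hypothetical model, not from the question
    __tablename__ = "addresses"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"))
    user = relationship("User", back_populates="addresses")


mapper = inspect(User)  # same object as User.__mapper__

# Attribute names of column-backed properties only.
column_keys = [attr.key for attr in mapper.column_attrs]
# Attribute names of relationships only.
relationship_keys = [rel.key for rel in mapper.relationships]
# Database-side column names, which may differ from the attribute names.
column_names = [col.name for col in mapper.columns]
```

Here column_keys holds the renamed attribute names, so the export script could iterate over it instead of a hand-maintained list of fields.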

I'd still like to see an answer to this question, but I've worked around it by name-mangling the relationship accessors (e.g. '_otherentity' instead of 'otherentity') and then filtering on the name. Works fine for my purposes.

An InstrumentedAttribute instance has an attribute called impl that is in practice a ScalarAttributeImpl, a ScalarObjectAttributeImpl, or a CollectionAttributeImpl.
I'm not sure how brittle this is, but I just check which one it is to determine whether an instance will ultimately return a list or a single object.
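If relying on impl feels too brittle, a possibly safer variant of the same idea is to look at the property attribute of each InstrumentedAttribute, which holds the mapper-level ColumnProperty or RelationshipProperty. This is only a sketch: the Parent and Child models are hypothetical, and configure_mappers() is called explicitly on the assumption that relationship details such as uselist are only final once the mappers are configured.

```python
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import (
    ColumnProperty,
    RelationshipProperty,
    configure_mappers,
    relationship,
)
from sqlalchemy.orm.attributes import InstrumentedAttribute

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

Base = declarative_base()


class Parent(Base):  # hypothetical model
    __tablename__ = "parents"
    id = Column(Integer, primary_key=True)
    label = Column(String(50))
    children = relationship("Child")  # to-many accessor that should be skipped


class Child(Base):  # hypothetical model
    __tablename__ = "children"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parents.id"))


configure_mappers()  # resolve relationship targets before inspecting them

columns = []
to_many = []
for attr, value in vars(Parent).items():
    if not isinstance(value, InstrumentedAttribute):
        continue
    if isinstance(value.property, ColumnProperty):
        columns.append(attr)  # plain column accessor
    elif isinstance(value.property, RelationshipProperty) and value.property.uselist:
        to_many.append(attr)  # collection-valued relationship
```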

Related

force object to be `dirty` in sqlalchemy

Is there a way to force an object mapped by sqlalchemy to be considered dirty? For example, given the context of sqlalchemy's Object Relational Tutorial the problem is demonstrated,
a=session.query(User).first()
a.__dict__['name']='eh'
session.dirty
yielding,
IdentitySet([])
I am looking for a way to force the user a into a dirty state.
This problem arises because a class mapped with sqlalchemy takes control of the attribute getter/setter methods; writing to __dict__ directly bypasses them, which prevents sqlalchemy from registering the change.
I came across the same problem recently and it was not obvious.
Objects themselves are not dirty, but their attributes are, since as far as I know SQLAlchemy writes back only the changed attributes, not the whole object.
If you set an attribute using set_attribute and it is different from the original attribute data, SQLAlchemy finds out the object is dirty (TODO: I need details on how it does the comparison):
from sqlalchemy.orm.attributes import set_attribute
set_attribute(obj, data_field_name, data)
If you want to mark the object dirty regardless of the original attribute value, no matter if it has changed or not, use flag_modified:
from sqlalchemy.orm.attributes import flag_modified
flag_modified(obj, data_field_name)
The flag_modified approach works if one knows that the attribute has a value present. The SQLAlchemy documentation states:
Mark an attribute on an instance as ‘modified’.
This sets the ‘modified’ flag on the instance and establishes an
unconditional change event for the given attribute. The attribute must
have a value present, else an InvalidRequestError is raised.
Starting with version 1.2, if one wants to mark an entire instance then flag_dirty is the solution:
Mark an instance as ‘dirty’ without any specific attribute mentioned.
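To make the behaviour concrete, here is a minimal sketch that reproduces the question's scenario against an in-memory SQLite database and then applies flag_modified. The User class is a hypothetical stand-in for the tutorial's model, and the names dirty_before, dirty_after and saved_name are only for illustration:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.attributes import flag_modified

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

Base = declarative_base()


class User(Base):  # hypothetical stand-in for the tutorial's User
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(User(name="ed"))
session.commit()

a = session.query(User).first()
a.__dict__["name"] = "eh"           # bypasses the instrumented setter
dirty_before = a in session.dirty   # the change went unnoticed

flag_modified(a, "name")            # unconditionally mark "name" as changed
dirty_after = a in session.dirty

session.commit()                    # flushes the flagged attribute
saved_name = session.query(User.name).scalar()
```

Assigning a.name = "eh" through the normal attribute would of course make the object dirty without any of this; flag_modified is mainly useful when the value was changed behind SQLAlchemy's back.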

What is the difference between a mongoengine.DynamicEmbeddedDocument vs mongoengine.DictField?

A mongoengine.DynamicEmbeddedDocument can be used to leverage MongoDB's flexible schema-less design. It's expandable and doesn't apply type constraints to the fields, afaik.
A mongoengine.DictField similarly allows for use of MongoDB's schema-less nature. In the documentation they simply say (w.r.t. the DictField)
This is similar to an embedded document, but the structure is not defined.
Does that mean, then, the mongoengine.fields.DictField and the mongoengine.DynamicEmbeddedDocument are completely interchangeable?
EDIT (for more information):
mongoengine.DynamicEmbeddedDocument inherits from mongoengine.EmbeddedDocument which, from the code is:
A mongoengine.Document that isn't stored in its own collection. mongoengine.EmbeddedDocuments should be used as fields on mongoengine.Documents through the mongoengine.EmbeddedDocumentField field type.
A mongoengine.fields.EmbeddedDocumentField is
An embedded document field - with a declared document_type. Only valid values are subclasses of EmbeddedDocument.
Does this mean the only thing that makes the DictField and DynamicEmbeddedDocument not totally interchangeable is that the DynamicEmbeddedDocument has to be defined through the EmbeddedDocumentField field type?
From what I’ve seen, the two are similar, but not entirely interchangeable. Each approach may have a slight advantage based on your needs. First of all, as you point out, the two approaches require differing definitions in the document, as shown below.
from mongoengine import DictField, Document, DynamicEmbeddedDocument, EmbeddedDocumentField
class ExampleDynamicEmbeddedDoc(DynamicEmbeddedDocument):
    pass
class ExampleDoc(Document):
    dict_approach = DictField()
    dynamic_doc_approach = EmbeddedDocumentField(ExampleDynamicEmbeddedDoc, default=ExampleDynamicEmbeddedDoc())
Note: The default is not required, but the dynamic_doc_approach field will need to be set to an ExampleDynamicEmbeddedDoc object in order to save (i.e. trying to save after setting example_doc_instance.dynamic_doc_approach = {} would throw an exception). Also, you could use the GenericEmbeddedDocumentField if you don't want to tie the field to a specific type of EmbeddedDocument, but the field would still need to point to an object subclassed from EmbeddedDocument in order to save.
Once set up, the two are functionally similar in that you can save data to them as needed and without restrictions:
e = ExampleDoc()
e.dict_approach["test"] = 10
e.dynamic_doc_approach.test = 10
However, the one main difference that I’ve seen is that you can query against any values added to a DictField, whereas you cannot with a DynamicEmbeddedDoc.
ExampleDoc.objects(dict_approach__test = 10) # Returns a QuerySet containing our entry.
ExampleDoc.objects(dynamic_doc_approach__test = 10) # Throws an exception.
That being said, using an EmbeddedDocument has the advantage of validating fields which you know will be present in the document. (We simply would need to add them to the ExampleDynamicEmbeddedDoc definition). Because of this, I think it is best to use a DynamicEmbeddedDocument when you have a good idea of a schema for the field and only anticipate adding fields minimally (which you will not need to query against). However, if you are not concerned about validation or anticipate adding a lot of fields which you’ll query against, go with a DictField.

How to remove all items from many-to-many collection in SqlAlchemy?

When I need to remove an object from a declarative ORM many-to-many relationship, I am supposed to do this:
blogpost.tags.remove(tag)
Well. What am I supposed to do if I need to purge all these relations (not only one)? Typical situation: I'd like to set a new list of tags to my blogpost. So I need to...:
Remove all existing relations between that blogpost and tags.
Set new relations and create new tags if they don't exist.
Of course, there could be a better way of doing this. In that case please let me know.
This is the standard Python idiom for clearing a list – assigning to the “entire list” slice:
blogpost.tags[:] = []
Instead of the empty list, you may want to assign the new set of tags directly.
blogpost.tags[:] = new_tags
SQLAlchemy's relations are instrumented attributes, meaning that they keep the interface of a list (or set, dict, etc), but any changes are reflected in the database. This means that anything you can do with a list is possible with a relation, while SQLA transparently listens to the changes and updates the database accordingly.
Confirming what Two-Bit Alchemist has reported.
blogpost.tags[:] = new_tags
will complain about
TypeError: 'AppenderBaseQuery' object does not support item assignment
But
blogpost.tags = new_tags
seems to work fine.
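For an ordinary list-based relationship, the slice assignment can be tried out end to end. The sketch below uses an in-memory SQLite database; the BlogPost and Tag models and the post_tags association table are hypothetical. It does not cover relationships declared with lazy="dynamic", which is presumably where the AppenderBaseQuery error above comes from.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import relationship, sessionmaker

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

Base = declarative_base()

# Hypothetical association table for the many-to-many relationship.
post_tags = Table(
    "post_tags",
    Base.metadata,
    Column("post_id", Integer, ForeignKey("posts.id"), primary_key=True),
    Column("tag_id", Integer, ForeignKey("tags.id"), primary_key=True),
)


class Tag(Base):
    __tablename__ = "tags"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


class BlogPost(Base):
    __tablename__ = "posts"
    id = Column(Integer, primary_key=True)
    tags = relationship(Tag, secondary=post_tags)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

blogpost = BlogPost(tags=[Tag(name="old-a"), Tag(name="old-b")])
session.add(blogpost)
session.commit()

# Replace every existing link in one step.
blogpost.tags[:] = [Tag(name="new")]
session.commit()

# Only the link to the new tag should remain in the association table.
link_count = len(session.execute(post_tags.select()).fetchall())
tag_names = sorted(tag.name for tag in blogpost.tags)
```

Note that only the rows in the association table are removed; the old Tag rows themselves stay in the tags table.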

SQLAlchemy: Shallow copy avoiding lazy loading

I'm trying to automatically build a shallow copy of a SA-mapped
object. At the moment my function is just:
newobj = src.__class__()
for prop in class_mapper(src.__class__).iterate_properties:
    setattr(newobj, prop.key, getattr(src, prop.key))
but I'm having troubles with lazy relations... Obviously getattr
triggers the lazy loading, but since I don't need their values right
away, I'd like to just copy the "this should be lazy loaded"-state of
the attribute... Is this possible?
Edit: I need this for a "data logging" system. That is, whenever someone updates a persisted entity, I have to generate a new record and then mark the old one as such.
In order to do this I create a shallow copy of the entity (so SQLA issues an INSERT instead of an UPDATE) and work from there.
The system works quite nicely (it's been in production use for months), but now I'd like to enhance it so that it no longer needs all the relations to be lazy-loaded first.
What you need is to copy column properties only, which can be easily filtered using isinstance(prop, sqlalchemy.orm.ColumnProperty). Note that you HAVE to copy externally stored relations (all many-to-many) as well, since there are no columns corresponding to them in the main table. This can't be done through the high-level interface without lazy loading, so I'd accept this trade-off. Many-to-many relations can be detected with isinstance(prop, RelationshipProperty) (named RelationProperty in older SQLAlchemy releases) and a check that prop.secondary is set. The resulting code looks like the following:
from sqlalchemy.orm import object_mapper, ColumnProperty, RelationshipProperty
newobj = type(src)()
for prop in object_mapper(src).iterate_properties:
    # Copy plain columns, plus many-to-many relations (those with a secondary table).
    if (isinstance(prop, ColumnProperty) or
            (isinstance(prop, RelationshipProperty) and prop.secondary is not None)):
        setattr(newobj, prop.key, getattr(src, prop.key))
Also note that SQLAlchemy is designed to maintain a single loaded object for each identity, and your copy breaks this when identity (primary key) properties are copied too, but this is probably not your case if you are storing with a new (versioned) identifier.
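One way to sidestep the identity issue is to skip primary key columns while copying, so the copy starts out without an identity and gets a fresh one on INSERT. The following is a sketch only, limited to column properties; the shallow_copy helper and the Entry model are hypothetical names, not part of SQLAlchemy:

```python
from sqlalchemy import Column, Integer, String, inspect
from sqlalchemy.orm import ColumnProperty

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

Base = declarative_base()


class Entry(Base):  # hypothetical versioned entity
    __tablename__ = "entries"
    id = Column(Integer, primary_key=True)
    title = Column(String(100))
    body = Column(String(1000))


def shallow_copy(src):
    """Copy column values of src into a new instance, skipping primary keys."""
    mapper = inspect(src).mapper
    newobj = type(src)()
    for prop in mapper.iterate_properties:
        if not isinstance(prop, ColumnProperty):
            continue  # relationships are left alone, so nothing is lazy-loaded
        if any(col.primary_key for col in prop.columns):
            continue  # leave the identity unset so an INSERT assigns a new one
        setattr(newobj, prop.key, getattr(src, prop.key))
    return newobj


original = Entry(id=1, title="v1", body="text")
copy = shallow_copy(original)
```

Many-to-many relations would still need the extra branch from the answer above if they have to be carried over.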

How do I get the key value of a db.ReferenceProperty without a database hit?

Is there a way to get the key (or id) value of a db.ReferenceProperty, without dereferencing the actual entity it points to? I have been digging around - it looks like the key is stored as the property name preceded by an _, but I have been unable to get any code working. Examples would be much appreciated. Thanks.
EDIT: Here is what I have unsuccessfully tried:
class Comment(db.Model):
    series = db.ReferenceProperty(reference_class=Series)
    def series_id(self):
        return self._series
And in my template:
more
The result:
more
Actually, the way that you are advocating accessing the key for a ReferenceProperty might well not exist in the future. Attributes that begin with '_' in Python are generally accepted to be "protected": code that is closely bound to the implementation can use them, but it must be updated whenever the implementation changes.
However, there is a way through the public interface that you can access the key for your reference-property so that it will be safe in the future. I'll revise the above example:
class Comment(db.Model):
    series = db.ReferenceProperty(reference_class=Series)
    def series_id(self):
        return Comment.series.get_value_for_datastore(self)
When you access a property via the class it is defined on, you get the property object itself, which has a public method that returns the underlying value.
You're correct - the key is stored as the property name prefixed with '_'. You should just be able to access it directly on the model object. Can you demonstrate what you're trying? I've used this technique in the past with no problems.
Edit: Have you tried calling series_id() directly, or referencing _series in your template directly? I'm not sure whether Django automatically calls methods with no arguments if you specify them in this context. You could also try putting the @property decorator on the method.
