I'm working on a project in SQLAlchemy. I have a Command class with custom serialization/deserialization methods, toBinArray() and fromBinArray(bytes). I use them for TCP communication (I don't want to use pickle because my functions produce smaller output).
Command has several subclasses, let's call them CommandGet, CommandSet, etc. They have additional methods and attributes, and they redefine the serialization methods to keep track of their own attributes. I'm keeping all of them in one table using the polymorphic_identity mechanism.
The problem is that there are a lot of subclasses and each has different attributes. I previously wrote a mapping for each of them, but that way the table ends up with a huge number of columns.
I would like a mechanism that serializes every instance (using self.toBinArray()) into the attribute self._bin_array (stored in a Binary column) before every write to the DB, and loads the attributes (using self.fromBinArray(value)) after every load of an instance from the DB.
I have already found the answer to part of my question: I can call self.fromBinArray(self._bin_array) in a function with the @orm.reconstructor decorator. It is inherited by every Command subclass and executes the proper inherited version of fromBinArray(). My question is: how do I automate serialization when writing to the DB? (I know I can set self._bin_array manually, but that's very troublesome.)
P.S. Part of my code, my main class:
from sqlalchemy import Column, ForeignKey, Integer, Sequence, SmallInteger, String, Binary, orm

class Command(Base):
    __tablename__ = "commands"

    dbid = Column(Integer, Sequence("commands_seq"), primary_key=True)
    cmd_id = Column(SmallInteger)
    instance_dbid = Column(Integer, ForeignKey("instances.dbid"))
    type = Column(String(20))
    _bin_array = Column(Binary)

    __mapper_args__ = {
        "polymorphic_on": type,
        "polymorphic_identity": "Command",
    }

    @orm.reconstructor
    def init_on_load(self):
        self.fromBinArray(self._bin_array)

    def fromBinArray(self, b):
        (...)

    def toBinArray(self):
        (...)
EDIT: I've found a solution (below, in my answer), but are there any other solutions? Maybe some shortcut to register the event-listening function inside the class body?
It turns out the solution was simpler than I expected: you need to use an event listener for the before_insert (and/or before_update) event. I found information (source) that
reconstructor() is a shortcut into a larger system of “instance level”
events, which can be subscribed to using the event API - see
InstanceEvents for the full API description of these events.
And that gave me the clue:
@event.listens_for(Command, 'before_insert', propagate=True)
def serialize_before_insert(mapper, connection, target):
    print("serialize_before_insert")
    target._bin_array = target.toBinArray()
You can also use the event.listen() function to "bind" an event listener, but I personally prefer the decorator way. It's very important to add propagate=True in the declaration so subclasses inherit the listener!
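For completeness, here is what the event.listen() form could look like. This is only a sketch assuming the same Command class as above; it also registers before_update so that updates get serialized too:

from sqlalchemy import event

def serialize_before_write(mapper, connection, target):
    # store the custom binary serialization before the row is written
    target._bin_array = target.toBinArray()

# same effect as the decorator form; propagate=True makes subclasses inherit it
event.listen(Command, "before_insert", serialize_before_write, propagate=True)
event.listen(Command, "before_update", serialize_before_write, propagate=True)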
I have seen different "patterns" for handling this case, so I am wondering whether one has any drawbacks compared to the other.
So let's assume that we wish to create a new object of class MyClass and add it to the database. We can do the following:
class MyClass:
    pass

def builder_method_for_myclass():
    # A lot of code here..
    return MyClass()

my_object = builder_method_for_myclass()
with db.managed_session() as s:
    s.add(my_object)
which seems to keep the session open only for adding the new object. But I have also seen cases where the entire builder method is called and executed within the managed session, like so:
class MyClass:
    pass

def builder_method_for_myclass():
    # A lot of code here..
    return MyClass()

with db.managed_session() as s:
    my_object = builder_method_for_myclass()
Are there any downsides to either of these methods, and if so, what are they? I can't find anything specific about this in the documentation.
When you build objects that depend on objects fetched from a session, you have to be in a session. So a factory function can only execute outside a session in the simplest cases. Usually you have to pass the session around or make it available on a thread local.
For example, in this case, to build a product I need to fetch the product category from the database into the session. So my product factory function depends on the session instance. The new product is created and added to the same session the category is in. An implicit commit also occurs when the session ends, i.e. when the context manager completes.
def build_product(session, category_name):
    category = session.query(ProductCategory).where(
        ProductCategory.name == category_name).first()
    return Product(category=category)

with db.managed_session() as s:
    my_product = build_product(s, "clothing")
    s.add(my_product)
I have a function which is registered as an event on an SQLAlchemy model, as shown in the code snippets below (not fully functional, as I don't show the db fixture, but they should be enough to explain the problem).
root/myapp/models.py:
class MyModel:
    id = Column(UUID, primary_key=True)
    value = ''

    @classmethod
    def register_hook(cls, hook_fn):
        event.listen(cls, "after_update", hook_fn, propagate=True)
root/myapp/app.py:
from models import MyModel

def hook_fn(mapper, connection, target):
    print('fired hook!')

MyModel.register_hook(hook_fn)
root/test/conftest.py:
@pytest.fixture
def patched_hook_fn(mocker):
    with mocker.patch("root.myapp.app.hook_fn") as patched:
        yield patched
root/test/tests.py:
def test_hook_fires_on_change(db, patched_hook_fn):
    model = MyModel(value="initial")
    db.session.commit()
    model.value = "changed"
    db.session.commit()  # hook fires here
    assert patched_hook_fn.called  # assert fails
What I'd like to know is:
Why doesn't the patched function get called?
Is there a simple way, in a debug session, to see what I should be patching in the with mocker.patch("root.myapp.app.hook_fn") as patched line?
It doesn't get called because you've already registered the unpatched version with the event system. SQLAlchemy does not look up root.myapp.app.hook_fn every time the event fires, so even if you later set root.myapp.app.hook_fn = some_other_function (which is what patch is doing), there is no visible effect.
The way to fix this is to force your app to look up the name every time the event fires, by introducing a level of indirection:
MyModel.register_hook(lambda *args: hook_fn(*args))
This takes advantage of the way Python resolves names at call time: the lambda looks up hook_fn in the module's globals each time the event fires, so rebinding root.myapp.app.hook_fn (which is what the patch does) changes what the lambda calls.
As for your second question, there's no straightforward way to figure out what you need to patch, because patching the registered callable directly would require finding where it is stored in SQLAlchemy's internals, and depending on those internals, even in tests, is quite fragile.
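If you just want to confirm in a debug session what is registered, SQLAlchemy's public event API has a membership check; a small sketch using the names above:

from sqlalchemy import event

# True only if this exact callable was registered for the event
print(event.contains(MyModel, "after_update", hook_fn))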
I'm working on a website based on Flask and Flask-SQLAlchemy with MySQL. I have a handful of feeds; each feed holds a bit of data, but it also needs an update function.
At first I used MySQL-python (with raw SQL) to store data, and the feeds were on a plugin system where each feed overrode an update() function to import its data in its own way.
Now I've changed to Flask-SQLAlchemy and added a Feed model to the database, since the SQLAlchemy ORM helps with everything else, but I'm stuck on how to handle the update() function. The options I see:
1. Keep the plugin system in parallel with the database model, but I think that's impractical/ineffective.
2. Extend the model class; I'm not sure if that's possible, e.g. FeedOne(Feed) would represent item(name="one") only.
3. Make one update() function handle all feeds, using if self.name == "" statements.
I've added some code bits below.
Feed model:
class Feed(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(255))
    datapieces = db.relationship('Datapiece', backref='feed', lazy='dynamic')
The update() function:
def update(self):
    data = parsedata(self.data_source)
    for item in data.items:
        new_datapiece = Datapiece(feed=self.id, name=item.name, value=item.value)
        db.session.add(new_datapiece)
    db.session.commit()
What I hope to achieve with option 2 is:
for feed in Feed.query.all():
    feed.update()
And every feed will use its own class's update().
Extending the class and adding an .update() method is exactly how option 2 is supposed to work.
I don't see any problem with it (and I'm using that style of coding with Flask/SQLAlchemy all the time).
And if you (can) omit the lazy='dynamic' attribute, you could also do something like:
self.datapieces.append(new_datapiece)
in your Feed's update function.
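For illustration, here is a minimal sketch of option 2 using single-table inheritance, so Feed.query.all() returns subclass instances and the right update() runs. The FeedOne name and the polymorphic columns are assumptions, not the asker's schema:

class Feed(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(255))
    type = db.Column(db.String(50))

    __mapper_args__ = {'polymorphic_on': type, 'polymorphic_identity': 'feed'}

    def update(self):
        raise NotImplementedError

class FeedOne(Feed):
    __mapper_args__ = {'polymorphic_identity': 'one'}

    def update(self):
        # feed-specific import logic for the "one" feed goes here
        pass

# rows with type == 'one' come back as FeedOne instances,
# so each feed dispatches to its own update() implementation
for feed in Feed.query.all():
    feed.update()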
Is there a way to perform validation on an object after (or as) the properties are set but before the session is committed?
For instance, I have a domain model Device that has a mac property. I would like to ensure that the mac property contains a valid and sanitized mac value before it is added to or updated in the database.
It looks like the Pythonic approach is to do most things as properties (including in SQLAlchemy). If I had coded this in PHP or Java, I would probably have opted to create getter/setter methods to protect the data and give me the flexibility to handle this in the domain model itself.
public function mac() { return $this->mac; }

public function setMac($mac) {
    return $this->mac = $this->sanitizeAndValidateMac($mac);
}

public function sanitizeAndValidateMac($mac) {
    if ( ! preg_match(self::$VALID_MAC_REGEX, $mac) ) {
        throw new InvalidMacException($mac);
    }
    return strtolower($mac);
}
What is a Pythonic way to handle this type of situation using SQLAlchemy?
(While I'm aware that validation should also be handled elsewhere (i.e., in the web framework), I would like to figure out how to handle some of these domain-specific validation rules, as they are bound to come up frequently.)
UPDATE
I know that I could use property to do this under normal circumstances. The key part is that I am using SQLAlchemy with these classes. I do not understand exactly how SQLAlchemy performs its magic, but I suspect that creating and overriding these properties on my own could lead to unstable and/or unpredictable results.
You can add data validation inside your SQLAlchemy classes using the @validates() decorator.
From the docs - Simple Validators:
An attribute validator can raise an exception, halting the process of mutating the attribute’s value, or can change the given value into something different.
from sqlalchemy.orm import validates

class EmailAddress(Base):
    __tablename__ = 'address'

    id = Column(Integer, primary_key=True)
    email = Column(String)

    @validates('email')
    def validate_email(self, key, address):
        # you can use assertions, such as
        # assert '@' in address
        # or raise an exception:
        if '@' not in address:
            raise ValueError('Email address must contain an @ sign.')
        return address
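A quick usage sketch (assumed names): the validator fires on attribute assignment, not at flush or commit time:

addr = EmailAddress()
addr.email = 'user@example.com'  # passes validate_email() and is stored
addr.email = 'no-at-sign'        # raises ValueError immediately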
Yes. This can be done nicely using a MapperExtension.
import sqlalchemy.orm.interfaces

# uses SQLAlchemy hooks to run data-model-class-specific validators before update and insert
class ValidationExtension(sqlalchemy.orm.interfaces.MapperExtension):

    def before_update(self, mapper, connection, instance):
        """Not every instance here is actually updated in the db, see
        http://www.sqlalchemy.org/docs/reference/orm/interfaces.html?highlight=mapperextension#sqlalchemy.orm.interfaces.MapperExtension.before_update"""
        instance.validate()
        return sqlalchemy.orm.interfaces.MapperExtension.before_update(
            self, mapper, connection, instance)

    def before_insert(self, mapper, connection, instance):
        instance.validate()
        return sqlalchemy.orm.interfaces.MapperExtension.before_insert(
            self, mapper, connection, instance)

sqlalchemy.orm.mapper(model, table, extension=ValidationExtension(), **mapper_args)
You may want to check the before_update reference, because not every instance that passes through it is actually updated in the db.
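The extension assumes every mapped class provides a validate() method; a hypothetical model to pair with it might look like this:

import re

MAC_RE = re.compile(r'^([0-9a-f]{2}:){5}[0-9a-f]{2}$')

class Device(object):
    def validate(self):
        # raise to abort the flush when the data is invalid
        if not MAC_RE.match(self.mac.lower()):
            raise ValueError("invalid MAC address: %r" % self.mac)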
"It looks like the Pythonic approach is to do most things as properties"
It varies, but that's close.
"If I had coded this in PHP or Java, I would probably have opted to create getter/setter methods..."
Good. That's Pythonic enough. Your getter and setter functions are bound up in a property; that's pretty good.
What's the question?
Are you asking how to spell property?
However, "transparent validation" -- if I read your example code correctly -- may not really be all that good an idea.
Your model and your validation should probably be kept separate. It's common to have multiple validations for a single model. For some users, fields are optional, fixed or not used; this leads to multiple validations.
You'll be happier following the Django design pattern of using a Form for validation, separate from the model.
I have a complex network of objects being spawned from a sqlite database using SQLAlchemy ORM mappings. I have quite a few deeply nested loops like:
for parent in owner.collection:
    for child in parent.collection:
        for foo in child.collection:
            # do lots of calcs with foo.property
            ...
My profiling shows that the SQLAlchemy instrumentation is taking a lot of time in this use case.
The thing is: I don't ever change the object model (mapped properties) at runtime, so once they are loaded I don't NEED the instrumentation, or indeed any sqlalchemy overhead at all. After much research, I'm thinking I might have to clone a 'pure python' set of objects from my already loaded 'instrumented objects', but that would be a pain.
Performance is really crucial here (it's a simulator), so maybe writing those layers as C extensions using sqlite api directly would be best. Any thoughts?
If you reference a single attribute of a single instance lots of times, a simple trick is to store it in a local variable.
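For example (a sketch; do_calc stands in for the real computation):

# every foo.property read goes through SQLAlchemy's instrumented descriptor;
# hoisting it into a local pays that cost once instead of once per iteration
prop = foo.property
for i in range(1000000):
    do_calc(prop)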
If you want a way to create cheap pure python clones, share the dict object with the original object:
class CheapClone(object):
    def __init__(self, original):
        self.__dict__ = original.__dict__
Creating a copy like this costs about half as much as one instrumented attribute access, and attribute lookups on the clone are as fast as normal.
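Usage sketch (attribute names assumed): because the dict is shared, any lazy-loaded attributes must already be populated, and writes through the clone show up on the original too:

clone = CheapClone(foo)    # shares foo.__dict__, no copying
value = clone.some_column  # plain dict lookup, no instrumentation overhead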
There might also be a way to have the mapper create instances of an uninstrumented class instead of the instrumented one. If I have some time, I might take a look how deeply ingrained is the assumption that populated instances are of the same type as the instrumented class.
I found a quick and dirty way that seems to at least somewhat work on 0.5.8 and 0.6. I didn't test it with inheritance or other features that might interact badly. Also, this touches some non-public APIs, so beware of breakage when changing versions.
from sqlalchemy.orm.attributes import ClassManager, instrumentation_registry

class ReadonlyClassManager(ClassManager):
    """Enables configuring a mapper to return instances of uninstrumented
    classes instead. To use, add a readonly_type attribute referencing the
    desired class to use instead of the instrumented one."""

    def __init__(self, class_):
        ClassManager.__init__(self, class_)
        self.readonly_version = getattr(class_, 'readonly_type', None)
        if self.readonly_version:
            # default instantiation logic doesn't know to install finders
            # for our alternate class
            instrumentation_registry._dict_finders[self.readonly_version] = self.dict_getter()
            instrumentation_registry._state_finders[self.readonly_version] = self.state_getter()

    def new_instance(self, state=None):
        if self.readonly_version:
            instance = self.readonly_version.__new__(self.readonly_version)
            self.setup_instance(instance, state)
            return instance
        return ClassManager.new_instance(self, state)

Base = declarative_base()
Base.__sa_instrumentation_manager__ = ReadonlyClassManager
Usage example:
class ReadonlyFoo(object):
    pass

class Foo(Base, ReadonlyFoo):
    __tablename__ = 'foo'

    id = Column(Integer, primary_key=True)
    name = Column(String(32))

    readonly_type = ReadonlyFoo

assert type(session.query(Foo).first()) is ReadonlyFoo
You should be able to disable lazy loading on the relationships in question, and SQLAlchemy will fetch them all in a single query.
Try using a single query with JOINs instead of the Python loops.
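A sketch of what that could look like with eager loading; the Owner/Parent/Child class names and relationship attributes are assumptions based on the question's loops:

from sqlalchemy.orm import joinedload

# one round trip: the whole tree is loaded via JOINs instead of
# firing a lazy-load query at each level of the nested loops
owners = (
    session.query(Owner)
    .options(
        joinedload(Owner.collection)
        .joinedload(Parent.collection)
        .joinedload(Child.collection)
    )
    .all()
)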