I'm working on a website based on Flask and Flask-SQLAlchemy with MySQL. I have a handful of feeds; each feed stores a few pieces of data but also needs an update function.
At first I used MySQL-python (with raw SQL) to store the data, and the feeds lived in a plugin system where each feed overrode an update() function to import data in its own way.
Now I have switched to Flask-SQLAlchemy and added a Feed model to the database to take advantage of the SQLAlchemy ORM, but I'm stuck on how to handle the update() function. The options I see:
1. Keep the plugin system in parallel with the database model, but I think that's impractical and inefficient.
2. Extend the model class; I'm not sure that's possible, e.g. FeedOne(Feed) would represent the item(name="one") feed only.
3. Make one update() function handle all feeds, using if self.name == "" statements.
Added some code bits.
Feed model:
class Feed(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(255))
    datapieces = db.relationship('Datapiece', backref='feed', lazy='dynamic')
The update() function:
def update(self):
    data = parsedata(self.data_source)
    for item in data.items:
        # note: the backref 'feed' expects the Feed instance itself, not its id
        new_datapiece = Datapiece(feed=self, name=item.name, value=item.value)
        db.session.add(new_datapiece)
    db.session.commit()
What I hope to achieve in option 2 is:
for feed in Feed.query.all():
    feed.update()
And every feed would run its own class's update().
Extending the class and adding an .update() method is exactly how option 2 is supposed to work. I don't see any problem with it (and I'm using that style of coding with Flask/SQLAlchemy all the time).
And if you can omit the dynamic lazy attribute, you could also do something like:
self.datapieces.append(new_datapiece)
in your Feed's update() function.
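For illustration, here is a minimal sketch of what option 2 can look like with SQLAlchemy's single-table inheritance, using the name column as the polymorphic discriminator. The FeedOne class and the 'one' identity are just examples, parsedata/data_source are the hypothetical helpers from the question, and the dynamic lazy option is dropped so that append() works, as suggested above:

class Feed(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(255))
    datapieces = db.relationship('Datapiece', backref='feed')

    # 'name' doubles as the discriminator, so Feed.query.all()
    # instantiates the matching subclass for each row
    # (every stored name needs a corresponding subclass identity)
    __mapper_args__ = {'polymorphic_on': name}

    def update(self):
        raise NotImplementedError  # each feed subclass imports its own data

class FeedOne(Feed):
    __mapper_args__ = {'polymorphic_identity': 'one'}

    def update(self):
        data = parsedata(self.data_source)
        for item in data.items:
            self.datapieces.append(Datapiece(name=item.name, value=item.value))
        db.session.commit()

With that in place, the loop from the question (for feed in Feed.query.all(): feed.update()) dispatches to each subclass's update() automatically.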
Related
I'm working on a project in SQLAlchemy. I've got a Command class which has custom serialization/deserialization methods called toBinArray() and fromBinArray(bytes). I use them for TCP communication (I don't want to use pickle because my functions create smaller outputs).
Command has several subclasses, let's call them CommandGet, CommandSet, etc. They have additional methods and attributes, and redefine the serialization methods to keep track of their own attributes. I'm keeping all of them in one table using the polymorphic_identity mechanism.
The problem is that there are a lot of subclasses and each has different attributes. Previously I wrote a mapping for each of them, but that way the table ends up with a huge number of columns.
I would like to write a mechanism that serializes every instance (using self.toBinArray()) into the attribute self._bin_array (stored in a Binary column) before every write to the DB, and loads the attributes (using self.fromBinArray(value)) after every load of an instance from the DB.
I have already found the answer to part of my question: I can call self.fromBinArray(self._bin_array) in a function with the @orm.reconstructor decorator. It is inherited by every Command subclass and executes the proper inherited version of fromBinArray(). My question is how to automate serialization on writing to the DB (I know I can set self._bin_array manually, but that's very troublesome)?
P.S. Part of my code, my main class:
class Command(Base):
    __tablename__ = "commands"

    dbid = Column(Integer, Sequence("commands_seq"), primary_key=True)
    cmd_id = Column(SmallInteger)
    instance_dbid = Column(Integer, ForeignKey("instances.dbid"))
    type = Column(String(20))
    _bin_array = Column(Binary)

    __mapper_args__ = {
        "polymorphic_on": type,
        "polymorphic_identity": "Command",
    }

    @orm.reconstructor
    def init_on_load(self):
        self.fromBinArray(self._bin_array)

    def fromBinArray(self, b):
        (...)

    def toBinArray(self):
        (...)
EDIT: I've found a solution (below in an answer), but are there any other solutions? Maybe some shortcut to register the event-listening function inside the class body?
It looks like the solution was simpler than I expected: you need to use an event listener for the before_insert (and/or before_update) event. I found information (source) that:
reconstructor() is a shortcut into a larger system of “instance level”
events, which can be subscribed to using the event API - see
InstanceEvents for the full API description of these events.
And that gave me the clue:
@event.listens_for(Command, 'before_insert', propagate=True)
def serialize_before_insert(mapper, connection, target):
    print("serialize_before_insert")
    target._bin_array = target.toBinArray()
You can also use the event.listen() function to bind the event listener, but I personally prefer the decorator way. It's very important to add propagate=True in the declaration so subclasses can inherit the listener!
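For reference, the event.listen() spelling mentioned above is equivalent; a sketch covering both events named in the question:

from sqlalchemy import event

def serialize_before_save(mapper, connection, target):
    # same serialization step as the decorator version above
    target._bin_array = target.toBinArray()

# propagate=True again lets Command subclasses inherit the listener
event.listen(Command, 'before_insert', serialize_before_save, propagate=True)
event.listen(Command, 'before_update', serialize_before_save, propagate=True)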
I'm a beginner in SQLAlchemy and found that a query can be done in two ways:
Approach 1:
DBSession = scoped_session(sessionmaker())

class _Base(object):
    query = DBSession.query_property()

Base = declarative_base(cls=_Base)

class SomeModel(Base):
    key = Column(Unicode, primary_key=True)
    value = Column(Unicode)

# When querying
result = SomeModel.query.filter(...)
Approach 2:
DBSession = scoped_session(sessionmaker())

Base = declarative_base()

class SomeModel(Base):
    key = Column(Unicode, primary_key=True)
    value = Column(Unicode)

# When querying
session = DBSession()
result = session.query(SomeModel).filter(...)
Is there any difference between them?
In the code above, there is no difference. This is because, in the query = DBSession.query_property() line of the first example:
- the query property is explicitly bound to DBSession
- there is no custom Query object passed to query_property
As @petr-viktorin points out in the answer here, in the first example there must be a session available before you define your model, which might be problematic depending on the structure of your application.
If, however, you need a custom query that adds additional query parameters automatically to all queries, then only the first example will allow that. A custom query class that inherits from sqlalchemy.orm.query.Query can be passed as an argument to query_property. This question shows an example of that pattern.
Even if a model object has a custom query property defined on it, that property is not used when querying with session.query, as in the last line of the second example. This means something like the first example is the only option if you need a custom query class.
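As a sketch of that first-example-only pattern (the FilteredQuery name, the active() helper and the deleted column are illustrative assumptions, not part of the question's code):

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Query, scoped_session, sessionmaker

class FilteredQuery(Query):
    # every Model.query gains this helper automatically
    def active(self):
        return self.filter_by(deleted=False)  # assumes a boolean 'deleted' column

DBSession = scoped_session(sessionmaker())

class _Base(object):
    # pass the custom class to query_property
    query = DBSession.query_property(query_cls=FilteredQuery)

Base = declarative_base(cls=_Base)

# later: SomeModel.query.active().filter(...)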
I see these downsides to query_property:
- You cannot use it on a different Session than the one you've configured (though you could always use session.query then).
- You need a session object available before you define your schema.
These could bite you when you want to write tests, for example.
Also, session.query fits better with how SQLAlchemy works; query_property looks like it's just added on top for convenience (or similarity with other systems?).
I'd recommend you stick to session.query.
An answer (here) to a different SQLAlchemy question might help. That answer starts with:
You can use Model.query, because the Model (or usually its base class, especially in cases where declarative extension is used) is assigned Session.query_property. In this case the Model.query is equivalent to Session.query(Model).
I have been using SQLAlchemy for a few years now as a replacement for Django models. I have found it very convenient to have custom methods attached to these models, i.e.:
class Widget(Base):
    __tablename__ = 'widgets'

    id = Column(Integer, primary_key=True)
    name = Column(Unicode(100))

    def get_slug(self, max_length=50):
        return slugify(self.name)[:max_length]
Is there a performance hit when doing things like session.query(Widget) if the model has a few dozen complex methods (50-75 lines each)? Are these loaded into memory for every row returned, and would it be more efficient to move some of the less-used methods into helper functions and import them as necessary?
def some_helper_function(widget):
    """:param widget: an instance of Widget"""
    # do something
Thanks!
You won't take any performance hit when loading the objects from the database using SA with just a session.query(...).
And you definitely should not move any methods out to helper functions for the sake of performance, as in doing so you would basically destroy the object-oriented paradigm of your model.
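If you want to convince yourself of that, note that methods live on the class object, not in each instance's __dict__, so nothing method-related is loaded per row at all:

w = session.query(Widget).first()
assert 'get_slug' not in w.__dict__         # nothing stored per instance
assert 'get_slug' in Widget.__dict__        # the function object lives on the class
assert type(w).get_slug is Widget.get_slug  # and is shared by every instance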
Is there a way to perform validation on an object after (or as) the properties are set but before the session is committed?
For instance, I have a domain model Device that has a mac property. I would like to ensure that the mac property contains a valid and sanitized mac value before it is added to or updated in the database.
It looks like the Pythonic approach is to do most things as properties (including SQLAlchemy). If I had coded this in PHP or Java, I would probably have opted to create getter/setter methods to protect the data and give me the flexibility to handle this in the domain model itself.
public function mac() { return $this->mac; }

public function setMac($mac) {
    return $this->mac = $this->sanitizeAndValidateMac($mac);
}

public function sanitizeAndValidateMac($mac) {
    if ( ! preg_match(self::$VALID_MAC_REGEX, $mac) ) {
        throw new InvalidMacException($mac);
    }
    return strtolower($mac);
}
What is a Pythonic way to handle this type of situation using SQLAlchemy?
(While I'm aware that validation can and should also be handled elsewhere (i.e., in the web framework), I would like to figure out how to handle some of these domain-specific validation rules, as they are bound to come up frequently.)
UPDATE
I know that I could use property to do this under normal circumstances. The key part is that I am using SQLAlchemy with these classes. I do not understand exactly how SQLAlchemy is performing its magic but I suspect that creating and overriding these properties on my own could lead to unstable and/or unpredictable results.
You can add data validation inside your SQLAlchemy classes using the @validates() decorator.
From the docs - Simple Validators:
An attribute validator can raise an exception, halting the process of mutating the attribute’s value, or can change the given value into something different.
from sqlalchemy.orm import validates

class EmailAddress(Base):
    __tablename__ = 'address'

    id = Column(Integer, primary_key=True)
    email = Column(String)

    @validates('email')
    def validate_email(self, key, address):
        # you can use assertions, such as
        # assert '@' in address
        # or raise an exception:
        if '@' not in address:
            raise ValueError('Email address must contain an @ sign.')
        return address
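The validator fires on assignment, well before a flush, so (assuming the model above) usage looks like:

addr = EmailAddress()
addr.email = 'user@example.com'   # accepted
addr.email = 'no-at-sign-here'    # raises ValueError immediately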
Yes. This can be done nicely using a MapperExtension.
# uses sqlalchemy hooks to run data-model-class-specific validators before update and insert
class ValidationExtension(sqlalchemy.orm.interfaces.MapperExtension):

    def before_update(self, mapper, connection, instance):
        """not every instance here is actually updated to the db, see
        http://www.sqlalchemy.org/docs/reference/orm/interfaces.html?highlight=mapperextension#sqlalchemy.orm.interfaces.MapperExtension.before_update"""
        instance.validate()
        return sqlalchemy.orm.interfaces.MapperExtension.before_update(self, mapper, connection, instance)

    def before_insert(self, mapper, connection, instance):
        instance.validate()
        return sqlalchemy.orm.interfaces.MapperExtension.before_insert(self, mapper, connection, instance)

sqlalchemy.orm.mapper(model, table, extension=ValidationExtension(), **mapper_args)
You may want to check the before_update reference, because not every instance passed there is actually updated in the db.
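Note that on current SQLAlchemy versions MapperExtension is deprecated in favor of mapper events, so the same idea can be written like the serialization listener earlier on this page; a sketch, with model and validate() the same hypothetical names as above:

from sqlalchemy import event

@event.listens_for(model, 'before_insert')
@event.listens_for(model, 'before_update')
def validate_before_save(mapper, connection, target):
    target.validate()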
"It looks like the Pythonic approach is to do most things as properties"
It varies, but that's close.
"If I had coded this in PHP or Java, I would probably have opted to create getter/setter methods..."
Good. That's Pythonic enough. Your getter and setter functions are bound up in a property; that's pretty good.
What's the question?
Are you asking how to spell property?
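If that is the question, a plain-Python translation of the PHP accessors would look roughly like this (a sketch only; the regex is an illustrative placeholder, and per the questioner's own caveat a plain property needs care on SQLAlchemy-mapped columns, where the @validates approach above is safer):

import re

class Device(object):
    MAC_RE = re.compile(r'^([0-9a-f]{2}:){5}[0-9a-f]{2}$')  # illustrative pattern

    @property
    def mac(self):
        return self._mac

    @mac.setter
    def mac(self, value):
        # sanitize, then validate, mirroring sanitizeAndValidateMac()
        value = value.lower()
        if not self.MAC_RE.match(value):
            raise ValueError('invalid MAC: %r' % value)
        self._mac = value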
However, "transparent validation" -- if I read your example code correctly -- may not really be all that good an idea.
Your model and your validation should probably be kept separate. It's common to have multiple validations for a single model. For some users, fields are optional, fixed or not used; this leads to multiple validations.
You'll be happier following the Django design pattern of using a Form for validation, separate from the model.
I have a complex network of objects being spawned from a sqlite database using sqlalchemy ORM mappings. I have quite a few deeply nested loops like:
for parent in owner.collection:
    for child in parent.collection:
        for foo in child.collection:
            ...  # do lots of calcs with foo.property
My profiling is showing me that the sqlalchemy instrumentation is taking a lot of time in this use case.
The thing is: I don't ever change the object model (mapped properties) at runtime, so once they are loaded I don't NEED the instrumentation, or indeed any sqlalchemy overhead at all. After much research, I'm thinking I might have to clone a 'pure python' set of objects from my already loaded 'instrumented objects', but that would be a pain.
Performance is really crucial here (it's a simulator), so maybe writing those layers as C extensions using sqlite api directly would be best. Any thoughts?
If you reference a single attribute of a single instance lots of times, a simple trick is to store it in a local variable.
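e.g., something like this hoists the instrumented access out of the hot path (some_attr stands in for the question's foo.property):

value = foo.some_attr    # one instrumented attribute access
total = 0
for _ in range(100000):
    total += value       # plain local-variable reads from here on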
If you want a way to create cheap pure python clones, share the dict object with the original object:
class CheapClone(object):
    def __init__(self, original):
        # share the attribute dict with the original instance
        self.__dict__ = original.__dict__
Creating a copy like this costs about half of an instrumented attribute access, and attribute lookups are as fast as normal.
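Usage is just wrapping an already-loaded instance (one caveat: attributes that were never loaded won't lazy-load through the clone, since it bypasses the instrumentation):

foo = session.query(Foo).first()
plain = CheapClone(foo)   # shares foo's attribute dict
name = plain.name         # plain attribute lookup, no SQLAlchemy descriptors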
There might also be a way to have the mapper create instances of an uninstrumented class instead of the instrumented one. If I have some time, I might take a look how deeply ingrained is the assumption that populated instances are of the same type as the instrumented class.
Found a quick and dirty way that seems to at least somewhat work on 0.5.8 and 0.6. Didn't test it with inheritance or other features that might interact badly. Also, this touches some non-public APIs, so beware of breakage when changing versions.
from sqlalchemy.orm.attributes import ClassManager, instrumentation_registry

class ReadonlyClassManager(ClassManager):
    """Enables configuring a mapper to return instances of uninstrumented
    classes instead. To use, add a readonly_type attribute referencing the
    desired class to use instead of the instrumented one."""

    def __init__(self, class_):
        ClassManager.__init__(self, class_)
        self.readonly_version = getattr(class_, 'readonly_type', None)
        if self.readonly_version:
            # default instantiation logic doesn't know to install finders
            # for our alternate class
            instrumentation_registry._dict_finders[self.readonly_version] = self.dict_getter()
            instrumentation_registry._state_finders[self.readonly_version] = self.state_getter()

    def new_instance(self, state=None):
        if self.readonly_version:
            instance = self.readonly_version.__new__(self.readonly_version)
            self.setup_instance(instance, state)
            return instance
        return ClassManager.new_instance(self, state)

Base = declarative_base()
Base.__sa_instrumentation_manager__ = ReadonlyClassManager
Usage example:
class ReadonlyFoo(object):
    pass

class Foo(Base, ReadonlyFoo):
    __tablename__ = 'foo'

    id = Column(Integer, primary_key=True)
    name = Column(String(32))

    readonly_type = ReadonlyFoo

assert type(session.query(Foo).first()) is ReadonlyFoo
You should be able to disable lazy loading on the relationships in question and sqlalchemy will fetch them all in a single query.
Try using a single query with JOINs instead of the python loops.
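For example, with eager-loading options the whole tree can come back in one round trip; a sketch assuming mapped classes Owner, Parent and Child matching the loops in the question (chained joinedload as on recent SQLAlchemy versions):

from sqlalchemy.orm import joinedload

owners = (
    session.query(Owner)
    .options(
        joinedload(Owner.collection)
        .joinedload(Parent.collection)
        .joinedload(Child.collection)
    )
    .all()
)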