I have a complex network of objects being spawned from a sqlite database using SQLAlchemy ORM mappings. I have quite a few deeply nested loops like:
for parent in owner.collection:
    for child in parent.collection:
        for foo in child.collection:
            # do lots of calcs with foo.property
My profiling is showing me that the SQLAlchemy instrumentation is taking a lot of time in this use case.
The thing is: I don't ever change the object model (mapped properties) at runtime, so once they are loaded I don't NEED the instrumentation, or indeed any SQLAlchemy overhead at all. After much research, I'm thinking I might have to clone a 'pure Python' set of objects from my already loaded 'instrumented objects', but that would be a pain.
Performance is really crucial here (it's a simulator), so maybe writing those layers as C extensions using the sqlite API directly would be best. Any thoughts?
If you reference a single attribute of a single instance lots of times, a simple trick is to store it in a local variable.
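A minimal sketch of that trick (the loop bound and the calculation are hypothetical placeholders):

prop = foo.property  # one instrumented lookup, bound to a plain local
total = 0.0
for i in range(1000000):
    total += prop * i  # each iteration now uses fast local access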
If you want a way to create cheap pure python clones, share the dict object with the original object:
class CheapClone(object):
    def __init__(self, original):
        self.__dict__ = original.__dict__
Creating a copy like this costs about half of an instrumented attribute access, and attribute lookups on the clone are as fast as on any plain Python object.
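For example, cloning the innermost objects once before the hot loops (note the caveat: any lazily loaded attributes must already be populated, since the uninstrumented clone cannot trigger loads):

# one-time cloning cost per object, then plain-Python attribute access
fast_foos = [CheapClone(foo) for foo in child.collection]
for foo in fast_foos:
    ...  # calcs with foo.property, free of instrumentation overhead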
There might also be a way to have the mapper create instances of an uninstrumented class instead of the instrumented one. If I have some time, I might take a look how deeply ingrained is the assumption that populated instances are of the same type as the instrumented class.
Found a quick and dirty way that seems to at least somewhat work on 0.5.8 and 0.6. Didn't test it with inheritance or other features that might interact badly. Also, this touches some non-public APIs, so beware of breakage when changing versions.
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.attributes import ClassManager, instrumentation_registry

class ReadonlyClassManager(ClassManager):
    """Enables configuring a mapper to return instances of uninstrumented
    classes instead. To use, add a readonly_type attribute referencing the
    desired class to use instead of the instrumented one."""
    def __init__(self, class_):
        ClassManager.__init__(self, class_)
        self.readonly_version = getattr(class_, 'readonly_type', None)
        if self.readonly_version:
            # default instantiation logic doesn't know to install finders
            # for our alternate class
            instrumentation_registry._dict_finders[self.readonly_version] = self.dict_getter()
            instrumentation_registry._state_finders[self.readonly_version] = self.state_getter()

    def new_instance(self, state=None):
        if self.readonly_version:
            instance = self.readonly_version.__new__(self.readonly_version)
            self.setup_instance(instance, state)
            return instance
        return ClassManager.new_instance(self, state)

Base = declarative_base()
Base.__sa_instrumentation_manager__ = ReadonlyClassManager
Usage example:
from sqlalchemy import Column, Integer, String

class ReadonlyFoo(object):
    pass

class Foo(Base, ReadonlyFoo):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    name = Column(String(32))

    readonly_type = ReadonlyFoo

assert type(session.query(Foo).first()) is ReadonlyFoo
You should be able to disable lazy loading on the relationships in question, and SQLAlchemy will fetch them all in a single query.
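A sketch of that, using hypothetical Owner/Parent/Child model names to stand in for the question's classes; chained joinedload options make SQLAlchemy fetch the whole tree eagerly:

from sqlalchemy.orm import joinedload

owners = (
    session.query(Owner)
    .options(joinedload(Owner.collection)
             .joinedload(Parent.collection)
             .joinedload(Child.collection))
    .all()
)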
Try using a single query with JOINs instead of the Python loops.
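A hedged sketch of that, again with hypothetical models and foreign-key columns; selecting only the needed column flattens the three nested loops into a single round trip:

values = (
    session.query(Foo.property)
    .join(Child, Foo.child_id == Child.id)
    .join(Parent, Child.parent_id == Parent.id)
    .filter(Parent.owner_id == owner.id)
    .all()
)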
This is actually language agnostic, but I always prefer Python.
The builder design pattern is used to validate that a configuration is valid prior to creating an object, via delegation of the creation process.
Some code to clarify:
class A:
    def __init__(self, m1, m2):  # obviously more complex in real life
        self._m1 = m1
        self._m2 = m2

class ABuilder:
    def __init__(self):
        self._m1 = None
        self._m2 = None

    def set_m1(self, m1):
        self._m1 = m1
        return self

    def set_m2(self, m2):
        self._m2 = m2
        return self

    def _validate(self):
        # complicated validations
        assert self._m1 < 1000
        assert self._m1 < self._m2

    def build(self):
        self._validate()
        return A(self._m1, self._m2)
My problem is similar, with an extra constraint that I can't re-create the object each time due to performance limitations.
Instead, I want to only update an existing object.
Bad solutions I came up with:
I could do as suggested here and just use setters like so
class A:
    ...
    def set_m1(self, m1):
        self._m1 = m1
    # and so on
But this is bad, because using setters:
Defeats the purpose of encapsulation.
Defeats the purpose of the builder (now updater), which is supposed to validate that some complex configuration is preserved after the creation, or update in this case.
As I mentioned earlier, I can't recreate the object every time, as this is expensive and I only want to update some fields, or sub-fields, and still validate or sub-validate.
I could add update and validation methods to A and call those, but this beats the purpose of delegating the responsibility of updates, and is intractable in the number of fields.
class A:
    ...
    def update1(self, m1):
        pass  # complex_logic1

    def update2(self, m2):
        pass  # complex_logic2

    def update12(self, m1, m2):
        pass  # complex_logic12
I could just force an update of every single field of A in one method with optional parameters:
class A:
    ...
    def update(self, **fields):  # optional parameters covering every field of A
        pass
Which again is not tractable, as this method will soon become a god method given the many possible combinations.
Forcing the method to always accept all changes to A, and validating in the Updater, also can't work, as the Updater would need to inspect A's internal state to make a decision, causing a circular dependency.
How can I delegate updating fields in my object A, in a way that:
Doesn't break encapsulation of A
Actually delegates the responsibility of updating to another object
Is tractable as A becomes more complicated
I feel like I am missing something trivial to extend building to updating.
I am not sure I understand all of your concerns, but I want to try and answer your post. From what you have written I assume:
Validation is complex and multiple properties of an object must be checked to decide if any change to the object is valid.
The object must always be in a valid state. Changes that make the object invalid are not permitted.
It is too expensive to copy the object, make the change, validate the object, and then reject the change if the validation fails.
First, move the validation logic out of the builder and into a separate class, say ModelValidator, with an isValid(model) method (the name used in the snippets below).
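A minimal sketch of that split, reusing the hypothetical _m1/_m2 fields from the A example above:

class ModelValidator:
    @staticmethod
    def isValid(model):
        # the complicated validations, pulled out of the builder
        return model._m1 < 1000 and model._m1 < model._m2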
The first option is to use a command pattern.
Create an abstract class or interface named Update (Python doesn't really have interfaces, but an abstract base class or plain duck typing works fine). The Update interface declares two methods, execute() and undo().
A concrete class has a name like UpdateAddress, UpdatePortfolio, or UpdatePaymentInfo.
Each concrete Update object also holds a reference to your model object.
The concrete classes hold the state needed for a particular kind of update. Imagine these methods exist on the UpdateAddress class:
UpdateAddress
    setStreetNumber(...)
    setCity(...)
    setPostcode(...)
    setCountry(...)
The update object needs to hold both the current and new values of a property. Like:
def setStreetNumber(self, aString):
    self.oldStreetNumber = self.model.getStreetNumber()
    self.newStreetNumber = aString
When the execute method is called, the model is updated:
def execute(self):
    self.model.setStreetNumber(self.newStreetNumber)
    self.model.setCity(self.newCity)
    # set postcode and country the same way
    if not ModelValidator.isValid(self.model):
        self.undo()
        raise ValidationError
and the undo method looks like:
def undo(self):
    self.model.setStreetNumber(self.oldStreetNumber)
    self.model.setCity(self.oldCity)
    # restore postcode and country the same way
That is a lot of typing, but it would work. Mutating your model object is nicely encapsulated by the different kinds of updates. You can execute or undo changes by calling those methods on an update object. You can even store a list of update objects for multi-level undo and retry.
However, it is a lot of typing for the programmer. Consider using persistent data structures instead. Persistent data structures can be used to copy objects very quickly, in roughly constant time. Here is a Python library of persistent data structures.
Let's assume your data was in a persistent data structure version of a dict. The library I referenced calls it a PMap.
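For instance, assuming the library in question is pyrsistent (an assumption on my part), its pmap behaves like an immutable dict:

from pyrsistent import pmap

address = pmap({'streetNumber': '12', 'city': 'Oslo'})
updated = address.set('streetNumber', '14')  # returns a brand new PMap
assert address['streetNumber'] == '12'       # the original is untouched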
The implementation of the update classes can be simpler. Starting with the constructor:
class UpdateAddress:
    def __init__(self, pmap):
        self.oldPmap = pmap
        self.newPmap = pmap
The setters are easier:
def setStreetNumber(self, aString):
    self.newPmap = self.newPmap.set('streetNumber', aString)
Execute passes back a new instance of the model, with all the updates.
def execute(self):
    newModel = self.newPmap
    if ModelValidator.isValid(newModel):
        return newModel
    else:
        raise ValidationError
The original object has not changed at all, thanks to the magic of persistent data structures.
The best thing is to not do any of this. Instead, use an ORM or object database. That is the "enterprise grade" solution. These libraries give you sophisticated tools like transactions and object version history.
I'm working with some LDAP data in Python (I'm not great at Python) and trying to organize a class object to hold the LDAP variables. Since it's LDAP data, the end result will be many copies of the same data structure (one per user) collected in an iterable list.
I started with hard-coded attribute names for the __slots__, which seemed to be working, but as things progressed I realized those LDAP attributes ought to be immutable constants of some sort to minimize hard-coded text/typos. I assigned variables to the __slots__ attributes, but it seems this is not such a workable plan:
AttributeError: 'LDAP_USER' object has no attribute 'ATTR_GIVEN_NAME'
Now that I think about it, I'm not actually creating immutable "constants" with the ATTR_ definitions so those values could theoretically be changed during runtime. I can see why Python might be having a problem with this design.
What is a better way to reduce the usage of hard coded text in the code while maintaining a class object which can be instantiated?
ATTR_DN = 'dn'
ATTR_GIVEN_NAME = 'givenName'
ATTR_SN = 'sn'
ATTR_EMP_TYPE = 'employeeType'

class LDAP_USER(object):
    __slots__ = ATTR_GIVEN_NAME, ATTR_DN, ATTR_SN, ATTR_EMP_TYPE

user = LDAP_USER()
user.ATTR_GIVEN_NAME = "milton"
user.ATTR_SN = "waddams"
user.ATTR_EMP_TYPE = "collating"
print("user is " + user.ATTR_GIVEN_NAME)
With __slots__ defined as [ATTR_GIVEN_NAME, ATTR_DN], the attributes should be referenced using user.givenName and user.dn, since those are the string values in __slots__.
If you want to actually reference the attribute as user.ATTR_GIVEN_NAME then that should be the value in the __slots__ array. You can then add a mapping routine to convert the object attributes to LDAP fields when performing LDAP operations with the object.
Referencing a property not in __slots__ will generate an error, so typos will be caught at runtime.
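To illustrate with the classes from the question: the slot names are the constants' string values, so attribute access uses those strings, and the constants still guard against typos when used with getattr/setattr:

user = LDAP_USER()
setattr(user, ATTR_GIVEN_NAME, "milton")  # equivalent to user.givenName = "milton"
user.sn = "waddams"
print("user is " + getattr(user, ATTR_GIVEN_NAME))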
I'm using a declarative SQLAlchemy class to perform computations. Part of the computations require me to run them for every configuration provided by a different table, one which has no foreign key relationship to the first table.
This analogy is nothing like my real application, but hopefully will help to comprehend what I want to happen.
I have a set of cars and a list of paint colors.
The car object has a factory which provides a car in all possible colors
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

def PaintACar(car, color):
    pass

Base = declarative_base()

class Colors(Base):
    __tablename__ = u'colors'
    id = Column('id', Integer, primary_key=True)
    color = Column('color', Unicode)

class Car(Base):
    __tablename__ = u'car'
    id = Column('id', Integer, primary_key=True)
    model = Column('model', Unicode)

    # is this somehow possible?
    all_color_objects = collection(...)

    # I know this is possible, but would like to know if there's another way
    @property
    def all_colors(self):
        s = Session.object_session(self)
        return s.query(Colors).all()

    def CarColorFactory(self):
        for color in self.all_color_objects:
            yield PaintACar(self, color)
My question: Is it possible to produce all_color_objects somehow, without having to resort to finding the session and manually issuing a query as in the all_colors property?
It's been a while, so I'm providing the best answer I saw (as a comment by zzzeek). Basically, I was looking for one-off syntactic sugar. My original 'ugly' implementation works just fine.
What better way would there be here besides getting a Session and producing the query you want? Are you looking for being able to add to the collection and that automatically flushes things? (Just add the objects to the Session?) Do you not like using object_session(self)? (You can build some mixin class or something that hides that for you.) It's not really clear what the problem is. The objects here have no relationship to the parent class, so there's no particular intelligence SQLAlchemy would be able to add.
– zzzeek Jun 17 at 5:03
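A sketch of the mixin idea from that comment, assuming the Colors model above; Session.object_session is the standard way to find an object's session:

from sqlalchemy.orm import Session

class AllColorsMixin(object):
    # hides the object_session plumbing behind a property
    @property
    def all_color_objects(self):
        return Session.object_session(self).query(Colors).all()

Having Car inherit from this mixin gives it the collection without repeating the query in every model.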
I'm working on a website based on Flask and Flask-SQLAlchemy with MySQL. I have a handful of feeds; each feed holds a few pieces of data, but each needs its own update() function.
At first, I used MySQL-python (with raw SQL) to store data, and the feeds were on a plugin system, so each feed overrode the update() function to import data its own way.
Now I have changed to Flask-SQLAlchemy and added a Feed model to the database, as the SQLAlchemy ORM helps, but I'm stuck on how to handle the update() function. The options I see:
1. Keep the plugin system in parallel with the database model, but I think that's impractical/ineffective.
2. Extend the model class; I'm not sure if that's possible, e.g. FeedOne(Feed) would represent the item with name="one" only.
3. Make one update() function handle all feeds, using if self.name == "..." statements.
Added some code bits.
Feed model:
class Feed(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(255))
    datapieces = db.relationship('Datapiece', backref='feed', lazy='dynamic')
update() function
def update(self):
    data = parsedata(self.data_source)
    for item in data.items:
        new_datapiece = Datapiece(feed=self.id, name=item.name, value=item.value)
        db.session.add(new_datapiece)
    db.session.commit()
What I hope to achieve in option 2 is:
for feed in Feed.query.all():
    feed.update()
And every feed will use its own class update().
Extending the class and adding an .update() method is just how it is supposed to work for option 2.
I don't see any problem with it (and I'm using that style of coding with Flask/SQLAlchemy all the time).
And if you can omit the dynamic lazy attribute, you could also do a thing like:
self.datapieces.append(new_datapiece)
in your Feed's update function.
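For completeness, a hedged sketch of option 2 using SQLAlchemy's single-table inheritance, so that Feed.query.all() yields the right subclass per row (the polymorphic identity values here are assumptions):

class Feed(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(255))
    __mapper_args__ = {'polymorphic_on': name,
                       'polymorphic_identity': 'feed'}

    def update(self):
        raise NotImplementedError

class FeedOne(Feed):
    __mapper_args__ = {'polymorphic_identity': 'one'}

    def update(self):
        pass  # this feed's own import logic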
I have been using SQLAlchemy for a few years now as a replacement for Django models. I have found it very convenient to have custom methods attached to these models, e.g.:
class Widget(Base):
    __tablename__ = 'widgets'
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(100))

    def get_slug(self, max_length=50):
        return slugify(self.name)[:max_length]
Is there a performance hit when doing things like session.query(Widget) if the model has a few dozen complex methods (50-75 lines each)? Are these loaded into memory for every row returned, and would it be more efficient to move some of the less-used methods into helper functions and import them as necessary?
def some_helper_function(widget):
    """:param widget: an instance of Widget"""
    # do something
Thanks!
You would not see any performance hit when loading the objects from the database with a plain session.query(...): methods are defined once on the class object, not copied into each instance, so the number of methods has no effect on the per-row cost.
And you definitely should not move any methods out to helper functions for the sake of performance, as doing so would just dismantle the object-oriented design of your model.
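A quick way to convince yourself of that, with a hypothetical plain class (no SQLAlchemy needed):

class Gadget(object):
    def get_slug(self):
        return 'slug'

g = Gadget()
assert 'get_slug' not in vars(g)   # instances carry no copy of the method
assert 'get_slug' in vars(Gadget)  # it lives once, on the class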