How can I cache/memoize my SQLAlchemy functions?

I am using Flask-OAuthlib and want to do some caching/memoization using Flask-Cache. I've got caching set up on my views, but I'm having trouble caching this function:
@oauth.clientgetter
@cache.memoize(timeout=86400)
def load_client(client_id):
    return DBSession.query(Client).filter_by(client_id=client_id).first()
The first time the function runs (nothing cached yet) it works fine, but when the result comes from the cache something goes wrong and I get an invalid client error. I don't know whether it's caching the object incorrectly or whether the @oauth.clientgetter decorator somehow interferes with the caching. Everything works fine without caching, and the client is valid. I've tried moving the function around like so, but get the same result:
class Client(Base):
    __tablename__ = 'client'
    __table_args__ = {'autoload': True}
    user = relationship('User')

    @classmethod
    @cache.memoize(timeout=86400)
    def get_client(cls, client_id):
        return DBSession.query(cls).filter_by(client_id=client_id).first()
Then, in my view I have:
@oauth.clientgetter
def load_client(client_id):
    return Client.get_client(client_id)
But this gives the same result. I am using redis as my cache backend and the keys/values I have are:
1) "flask_cache_Pwd2uVDVikMYMDNB+gVWlW"
2) "flask_cache_api.models.Client.get_client_memver"
3) "flask_cache_http://lvho.st:5000/me"
GET flask_cache_Pwd2uVDVikMYMDNB+gVWlW:
"!ccopy_reg\n_reconstructor\np1\n(capi.models\nClient\np2\nc__builtin__\nobject\np3\nNtRp4\n(dp5\nS'_sa_instance_state'\np6\ng1\n(csqlalchemy.orm.state\nInstanceState\np7\ng3\nNtRp8\n(dp9\nS'manager'\np10\ng1\n(csqlalchemy.orm.instrumentation\n_SerializeManager\np11\ng3\nNtRp12\n(dp13\nS'class_'\np14\ng2\nsbsS'class_'\np15\ng2\nsS'modified'\np16\nI00\nsS'committed_state'\np17\n(dp18\nsS'instance'\np19\ng4\nsS'callables'\np20\n(dp21\nsS'key'\np22\n(g2\n(S'Iu6copdawXIQIskY5kwPgxFgU7JoE9lTSqmlqw29'\np23\nttp24\nsS'expired'\np25\nI00\nsbsVuser_id\np26\nL4L\nsVname\np27\nS'Default'\np28\nsV_default_scopes\np29\nS'email'\np30\nsVclient_id\np31\ng23\nsV_redirect_uris\np32\nS'http://localhost:8000/authorized/'\np33\nsVactive\np34\nI1\nsVclient_secret\np35\nS'Vnw0YJjgNzR06KiwXWmYz7aSPu1ht7JnY1eRil4s5vXLM9N2ph'\np36\nsVdescription\np37\nNsb."
GET flask_cache_api.models.Client.get_client_memver:
"!S'+gVWlW'\np1\n."

Try reversing the order of your decorators:
@cache.memoize(timeout=86400)
@oauth.clientgetter
def load_client(client_id):
    return DBSession.query(Client).filter_by(client_id=client_id).first()
EDIT
The problem seems to be that a Client object is not picklable, while cache.memoize relies on objects being picklable. Therefore, in one case you end up with an invalid-client error (the client object did not "survive" the pickle-dump-then-pickle-load process), and in the other case with some kind of caching error which (silently) prevents the object from being cached (I'm not sure what mechanism causes this silent handling).
In any case, it seems to me you shouldn't attempt to memoize your client object in the first place.

Related

Drawbacks of executing code in an SQLAlchemy managed session and if so why?

I have seen different "patterns" for handling this case, so I am wondering whether one has any drawbacks compared to the other.
So let's assume that we wish to create a new object of class MyClass and add it to the database. We can do the following:
class MyClass:
    pass

def builder_method_for_myclass():
    # A lot of code here..
    return MyClass()

my_object = builder_method_for_myclass()
with db.managed_session() as s:
    s.add(my_object)
This seems to keep the session open only for adding the new object, but I have also seen cases where the entire builder method is called and executed within the managed session, like so:
class MyClass:
    pass

def builder_method_for_myclass():
    # A lot of code here..
    return MyClass()

with db.managed_session() as s:
    my_object = builder_method_for_myclass()
Are there any downsides to either of these methods, and if so, what are they? I can't find anything specific about this in the documentation.

When you build objects depending on objects fetched from a session you have to be in a session. So a factory function can only execute outside a session for the simplest cases. Usually you have to pass the session around or make it available on a thread local.
For example, in this case, to build a product I need to fetch the product category from the database into the session, so my product factory function depends on the session instance. The new product is created and added to the same session that the category is in. An implicit commit should also occur when the session ends, i.e. when the context manager completes.
def build_product(session, category_name):
    # The factory needs the session to look up the existing category row
    category = session.query(ProductCategory).filter(
        ProductCategory.name == category_name).first()
    return Product(category=category)

with db.managed_session() as s:
    my_product = build_product(s, "clothing")
    s.add(my_product)
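`db.managed_session()` in the question is a project helper, not part of SQLAlchemy itself. Here is a minimal sketch of what such a helper might look like, assuming it commits on success, rolls back on error and always closes the session, together with the session-dependent factory from the answer. The models, table names and in-memory SQLite URL are illustrative assumptions, and SQLAlchemy 1.4 or later is assumed.

```python
from contextlib import contextmanager

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class ProductCategory(Base):
    __tablename__ = "product_category"
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True, nullable=False)


class Product(Base):
    __tablename__ = "product"
    id = Column(Integer, primary_key=True)
    category_id = Column(Integer, ForeignKey("product_category.id"))
    category = relationship(ProductCategory)


engine = create_engine("sqlite://")  # in-memory database for the example
Base.metadata.create_all(engine)
SessionFactory = sessionmaker(bind=engine)


@contextmanager
def managed_session():
    """Hypothetical helper: commit on success, roll back on error, always close."""
    session = SessionFactory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


def build_product(session, category_name):
    # The factory must run inside the session because it loads an existing row.
    category = (
        session.query(ProductCategory)
        .filter(ProductCategory.name == category_name)
        .one()
    )
    return Product(category=category)


with managed_session() as s:
    s.add(ProductCategory(name="clothing"))

with managed_session() as s:
    s.add(build_product(s, "clothing"))

with managed_session() as s:
    product_count = s.query(Product).count()
```

The trade-off between the two patterns in the question is visible here: a factory that reads from the database has to run inside the block, which keeps the transaction open for as long as the factory runs, while a factory that needs no reads can run outside and keep the session short-lived.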

How to patch a function in pytest, registered on sqlalchemy event.listen "after_update"?

I have a function which is registered as an event on a SQLAlchemy model, as shown in the code snippets below (not fully functional, as I don't show the db fixture), which should be enough to explain the problem.
root/myapp/models.py:
class MyModel:
    id = Column(UUID, primary_key=True)
    value = ''

    @classmethod
    def register_hook(cls, hook_fn):
        event.listen(cls, "after_update", hook_fn, propagate=True)
root/myapp/app.py:
from models import MyModel

def hook_fn(mapper, connection, target):
    print('fired hook!')

MyModel.register_hook(hook_fn)
root/test/conftest.py:
@pytest.fixture
def patched_hook_fn(mocker):
    # mocker undoes the patch automatically at teardown; no "with" block is needed
    patched = mocker.patch("root.myapp.app.hook_fn")
    yield patched
root/test/tests.py:
def test_hook_fires_on_change(db, patched_hook_fn):
    model = MyModel(value="initial")
    db.session.commit()
    model.value = "changed"
    db.session.commit()  # hook fires here
    assert patched_hook_fn.called  # assert fails
What I'd like to know is:
Why doesn't the patched function get called?
Is there a simple way, in a debug session, to see what target I should be passing to mocker.patch("myapp.app.hook_fn")?

It doesn't get called because you've already registered the unpatched version with the event system. SQLAlchemy does not read the value at root.myapp.app.hook_fn every time the event is fired, so even if you later set root.myapp.app.hook_fn = some_other_function (which is what patch is doing), it has no visible effect.
The way to fix this is to simply force your app to read the value every time the event is fired, by introducing a level of indirection:
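MyModel.register_hook(lambda mapper, connection, target: hook_fn(mapper, connection, target))
The lambda has to accept the same (mapper, connection, target) arguments that SQLAlchemy passes to after_update listeners; a zero-argument lambda: hook_fn() would raise a TypeError as soon as the event fires. This works because the lambda looks up the module-level name hook_fn every time it is called, so when patch replaces root.myapp.app.hook_fn, the lambda calls the replacement.
As for your second question, there's no straightforward way to figure out what you need to patch: to patch the registered function directly you would need to find where it is stored in SQLAlchemy's internals, and depending on that, even in your tests, is quite fragile.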

Celery pickling not playing nice with Cassandra driver, can't figure out the root cause

I'm experiencing some behavior that I can't quite figure out. I'm using Cassandra to store message objects, and I'm using Celery for async pulls and pushes to the database. Everything is working fine, except for a single Celery task; the other tasks that use the same code/classes work. Here's a rough breakdown of the code logic:
db_manager = DBManager()

class User(object):
    def __init__(self, user_id):
        ... normal init stuff ...
        self.loader()

    @run_async
    def loader(self):
        ... loads from database if found, otherwise pulls from API ...

    # THIS WORKS
    @celery.task(name='user-to-db', filter=task_method)
    def to_db(self):
        # db_manager is a custom backend that handles relevant db reads, writes, etc.
        db_manager.add('users', self.user_payload)

    # THIS WORKS
    @celery.task(name='load-friends', filter=task_method)
    def load_friends(self):
        # Checks secondary redis index for friends of user
        friends = redis.srandmember('users:the-users-id:friends', self.id, 20)
        if not friends:
            profiles = load_friends_from_api(user_id=self.id)
        else:
            query = "SELECT * FROM keyspace.users WHERE id IN ({friends})".format(friends=friends)
        # Init a User object for every friend
        loaded_friends = [User(friend) for friend in profiles]
        # Returns a class container with all the instances of User(friend), accessible through a class property
        return FriendContainer(self.id, loaded_friends)

    # THIS DOES NOT WORK
    @celery.task(name='get-user-messages', filter=task_method)
    def get_user_messages(self):
        # THIS IS WHERE IT FAILS #
        messages = db_manager.get("SELECT message FROM keyspace.message_timelines WHERE user_id = {user_id}".format(user_id=self.id))
        # THAT LINE ABOVE #
        # Init a message class object for every message payload in database
        msgs = [Message(m, user=self) for m in messages]
        # Returns a message container class holding all the message objects, accessible through a class property
        return MessageContainer(msgs)
This last class method throws an error:
File "/usr/local/lib/python2.7/dist-packages/kombu/serialization.py", line 356, in pickle_dumps
return dumper(obj, protocol=pickle_protocol)
EncodeError: Can't pickle <class 'cassandra.io.eventletreactor.message'>: attribute lookup cassandra.io.eventletreactor.message failed
cassandra.io.eventletreactor.message points to a user-defined type in Cassandra that I use as a container for message objects per user. The line that throws this error is:
messages = db_manager.get("SELECT message FROM keyspace.message_timelines WHERE user_id = {user_id}".format(user_id=self.id))
This is the method from DBManager():
class DBManager(object):
    ... stuff ...

    def get(self, query):
        # I do some stuff to prepare the query, namely substituting `WHERE this = that` for `WHERE this = ?` to create a Cassandra prepared statement.
        statement = cassandra.prepare(query_prepared)
        # I want these messages as a dict, not the default namedtuple
        cassandra.row_factory = dict_factory
        # User id is parsed out of query
        results = cassandra.execute(statement, (user_id,))
        rows = results.current_rows
        # rows is a list of dicts, no weird class references or anything in there
        return rows
I've read that making Celery tasks out of class methods is (or was) somewhat experimental, but I can't figure out why all the other methods used as tasks, which use the same instance of DBManager, are working.
The problem seems to be localized to some issue with the user-defined type message that's not playing nice within the Cassandra driver; however, if I run the get method from DBManager within the Celery task itself, it works. That is, if I copy/paste the code that is throwing the error from DBManager.get into User.get_user_messages, it works fine. If I try to call DBManager.get from within User.get_user_messages, it breaks.
I just can't figure out where the problem is. I can do all the following just fine:
Run the get_user_messages method without Celery, and it works.
Run the get_user_messages method WITH Celery if I run the get method code right in the Celery task method itself.
I can run other methods registered as Celery tasks that point to other methods in DBManager that use the Cassandra driver, even ones that insert the same message user-defined type into the database.
I've tried pickling ALL THE THINGS all the way down myself, and in various combinations, and can't reproduce the error.
What I have not tried:
Change serializer to json or yaml. There are a few convenience items in the db payload that won't serialize with either of those two.
Use dill instead of pickle. It seems like this should work without having to switch serializers given that I can get various parts working separately.
I could just say screw it and run the query directly through the Cassandra driver instead of my DBManager class, but I feel like this should be solvable and I'm just missing something really, really obvious, so obvious that I'm not seeing it. Any suggestions on where to look would be greatly appreciated.
In case of relevance: Cassandra 3.3, CQL 3.4, DataStax python driver 3.1

Meh, I found the problem, and it WAS really obvious. I guess I didn't actually try pickling all the things, just most of the things, and I didn't catch this in my 4am debugging stupor.
At any rate, cassandra.row_factory = dict_factory, when called on a user defined type, doesn't actually return everything as a dict. It gives a dict of {'label': message(x='this', y='that')}, where message is a namedtuple. The Cassandra driver dynamically creates the namedtuple inside of a class instance, and so pickle couldn't find it.
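The failure can be reproduced with the standard library alone: pickle serialises a class by looking its name up in its module, so a namedtuple class generated at runtime under a name that is not a module attribute cannot be pickled. In this sketch `make_udt_class` is a hypothetical imitation of what the answer describes the driver doing, and `to_plain` is one possible fix (an assumption, not the driver's API): convert rows to plain dicts before they reach Celery's pickle serializer.

```python
import pickle
from collections import namedtuple


def make_udt_class():
    # Hypothetical mimic of a driver building a UDT class at runtime: the class
    # is named "message", but no module attribute called "message" exists.
    return namedtuple("message", ["x", "y"])


MessageUDT = make_udt_class()


def to_plain(value):
    """Recursively turn namedtuples (and containers of them) into dicts/lists."""
    if isinstance(value, tuple) and hasattr(value, "_asdict"):
        return {key: to_plain(item) for key, item in value._asdict().items()}
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value


row = {"message": MessageUDT(x="this", y="that")}

try:
    pickle.dumps(row)
    raw_row_pickles = True
except (pickle.PicklingError, AttributeError, ImportError):
    # pickle cannot find a module attribute named "message" for the class
    raw_row_pickles = False

restored = pickle.loads(pickle.dumps(to_plain(row)))
```

In `DBManager.get`, that would mean returning `[to_plain(r) for r in rows]` instead of `rows`.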

Python - caching a property to avoid future calculations

In the following example, cached_attr is used to get or set an attribute on a model instance when a database-expensive property (related_spam in the example) is called. In the example, I use cached_spam to save queries. I put print statements when setting and when getting values so that I could test it out. I tested it in a view by passing an Egg instance into the view and in the view using {{ egg.cached_spam }}, as well as other methods on the Egg model that make calls to cached_spam themselves. When I finished and tested it out the shell output in Django's development server showed that the attribute cache was missed several times, as well as successfully gotten several times. It seems to be inconsistent. With the same data, when I made small changes (as little as changing the print statement's string) and refreshed (with all the same data), different amounts of misses / successes happened. How and why is this happening? Is this code incorrect or highly problematic?
class Egg(models.Model):
    ... fields

    @property
    def related_spam(self):
        # Each time this property is called the database is queried (expected).
        return Spam.objects.filter(egg=self).all()  # Spam has foreign key to Egg.

    @property
    def cached_spam(self):
        # This should call self.related_spam the first time, and then return
        # cached results every time after that.
        return self.cached_attr('related_spam')

    def cached_attr(self, attr):
        """This method (normally attached via an abstract base class, but put
        directly on the model for this example) attempts to return a cached
        version of a requested attribute, and calls the actual attribute when
        the cached version isn't available."""
        try:
            value = getattr(self, '_p_cache_{0}'.format(attr))
            print('GETTING - {0}'.format(value))
        except AttributeError:
            value = getattr(self, attr)
            print('SETTING - {0}'.format(value))
            setattr(self, '_p_cache_{0}'.format(attr), value)
        return value

Nothing wrong with your code, as far as it goes. The problem probably isn't there, but in how you use that code.
The main thing to realise is that model instances don't have identity. That means that if you instantiate an Egg object somewhere, and a different one somewhere else, even if they refer to the same underlying database row they won't share internal state. So calling cached_attr on one won't cause the cache to be populated in the other.
For example, assuming you have a RelatedObject class with a ForeignKey to Egg:
my_first_egg = Egg.objects.get(pk=1)
my_related_object = RelatedObject.objects.get(egg__pk=1)
my_second_egg = my_related_object.egg
Here my_first_egg and my_second_egg both refer to the database row with pk 1, but they are not the same object:
>>> my_first_egg.pk == my_second_egg.pk
True
>>> my_first_egg is my_second_egg
False
So, filling the cache on my_first_egg doesn't fill it on my_second_egg.
And, of course, objects won't persist across requests (unless they're specifically made global, which is horrible), so the cache won't persist either.
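The per-instance behaviour is easy to see with a plain Python class. This sketch swaps in `functools.cached_property` (Python 3.8+; Django ships a similar `django.utils.functional.cached_property`) for the question's `cached_attr`, and uses a hypothetical `fetch_related_spam` counter in place of the real query.

```python
from functools import cached_property

query_count = 0


def fetch_related_spam(egg_pk):
    # Hypothetical stand-in for Spam.objects.filter(egg=self): counts "queries".
    global query_count
    query_count += 1
    return [f"spam-{egg_pk}-{i}" for i in range(3)]


class Egg:
    def __init__(self, pk):
        self.pk = pk

    @cached_property
    def cached_spam(self):
        # Computed on first access, then stored in this instance's __dict__.
        return fetch_related_spam(self.pk)


first_egg = Egg(pk=1)
second_egg = Egg(pk=1)  # same "row", but a different Python object

first_egg.cached_spam   # miss: runs the "query"
first_egg.cached_spam   # hit: served from first_egg's own cache
second_egg.cached_spam  # miss again: second_egg's cache is empty
```

Two "queries" run for one row, which is the same pattern of hits and misses the question observed when a template reaches the same Egg through different objects.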

HTTP servers that scale are shared-nothing; you can't rely on anything being a singleton. To share state, you need to connect to a special-purpose service.
Django's caching support is appropriate for your use case. It isn't necessarily a global singleton either; if you use locmem://, it will be process-local, which could be the more efficient choice.
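In current Django versions the process-local cache is configured through the `CACHES` setting rather than a `locmem://` URL. A minimal settings sketch (the `LOCATION` string is an arbitrary, illustrative name):

```python
# settings.py: process-local, in-memory cache (one copy per worker process)
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "egg-spam-cache",
    }
}
```

Values can then be stored and fetched through `django.core.cache.cache`, for example `cache.get_or_set(f"egg:{egg.pk}:spam", lambda: list(egg.related_spam), timeout=300)`. Keying on the primary key rather than on the Python object means every `Egg` instance for the same row shares the entry.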

SQLAlchemy DetachedInstanceError with regular attribute (not a relation)

I just started using SQLAlchemy and get a DetachedInstanceError and can't find much information on this anywhere. I am using the instance outside a session, so it is natural that SQLAlchemy is unable to load any relations if they are not already loaded, however, the attribute I am accessing is not a relation, in fact this object has no relations at all. I found solutions such as eager loading, but I can't apply to this because this is not a relation. I even tried "touching" this attribute before closing the session, but it still doesn't prevent the exception. What could be causing this exception for a non-relational property even after it has been successfully accessed once before? Any help in debugging this issue is appreciated. I will meanwhile try to get a reproducible stand-alone scenario and update here.
Update: This is the actual exception message with a few stacks:
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/attributes.py", line 159, in __get__
return self.impl.get(instance_state(instance), instance_dict(instance))
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/attributes.py", line 377, in get
value = callable_(passive=passive)
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/state.py", line 280, in __call__
self.manager.deferred_scalar_loader(self, toload)
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/mapper.py", line 2323, in _load_scalar_attributes
(state_str(state)))
DetachedInstanceError: Instance <ReportingJob at 0xa41cd8c> is not bound to a Session; attribute refresh operation cannot proceed
The partial model looks like this:
metadata = MetaData()
ModelBase = declarative_base(metadata=metadata)

class ReportingJob(ModelBase):
    __tablename__ = 'reporting_job'
    job_id = Column(BigInteger, Sequence('job_id_sequence'), primary_key=True)
    client_id = Column(BigInteger, nullable=True)
And the field client_id is what is causing this exception with a usage like the below:
Query:
jobs = session \
    .query(ReportingJob) \
    .filter(ReportingJob.job_id == job_id) \
    .all()
if jobs:
    # FIXME(Hari): Workaround for the attribute getting lazy-loaded.
    jobs[0].client_id
    return jobs[0]
This is what triggers the exception later out of the session scope:
msg = msg + ", client_id: %s" % job.client_id

I found the root cause while trying to narrow down the code that caused the exception. I placed the same attribute access code at different places after session close and found that it definitely doesn't cause any issue immediately after the close of query session. It turns out the problem starts appearing after closing a fresh session that is opened to update the object. Once I understood that the state of the object is unusable after a session close, I was able to find this thread that discussed this same issue. Two solutions that come out of the thread are:
Keep a session open (which is obvious)
Specify expire_on_commit=False to sessionmaker().
The 3rd option is to manually set expire_on_commit to False on the session once it is created, something like: session.expire_on_commit = False. I verified that this solves my issue.
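A minimal reproduction of both the error and the `expire_on_commit=False` fix, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The model mirrors `ReportingJob` from the question, except that `job_id` is an `Integer` without a `Sequence` so that SQLite autoincrements it; `save_and_close` is a hypothetical helper.

```python
from sqlalchemy import BigInteger, Column, Integer, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm.exc import DetachedInstanceError

Base = declarative_base()


class ReportingJob(Base):
    __tablename__ = "reporting_job"
    job_id = Column(Integer, primary_key=True)  # Integer so SQLite autoincrements
    client_id = Column(BigInteger, nullable=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)


def save_and_close(session_factory, client_id):
    """Hypothetical helper: insert a job in a short-lived session, return it detached."""
    session = session_factory()
    job = ReportingJob(client_id=client_id)
    session.add(job)
    session.commit()  # with the default expire_on_commit=True, attributes expire here
    session.close()   # the instance is now detached from any session
    return job


expiring_factory = sessionmaker(bind=engine)
non_expiring_factory = sessionmaker(bind=engine, expire_on_commit=False)

expired_job = save_and_close(expiring_factory, 42)
try:
    expired_job.client_id  # needs a reload, but there is no session to use
    default_raised = False
except DetachedInstanceError:
    default_raised = True

kept_job = save_and_close(non_expiring_factory, 7)
kept_client_id = kept_job.client_id  # still loaded, no database access needed
```

The trade-off is that with `expire_on_commit=False` the in-memory values can go stale if the row is changed in the database after the commit.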

We were getting similar errors, even with expire_on_commit set to False. In the end it was actually caused by having two sessionmakers that were both getting used to make sessions in different requests. I don't really understand what was going on, but if you see this exception with expire_on_commit=False, make sure you don't have two sessionmakers initialized.

I had a similar problem with the DetachedInstanceError: Instance <> is not bound to a Session;
The situation was quite simple: I pass the session and the record to be updated to my function, which merges the record and commits it to the database. With the first version below I would get the error. I was lazy and thought I could just return the merged object so that my working record would be updated (i.e. its is_modified value would be False). It did return the updated record, and is_modified was now False, but subsequent uses threw the error. I think this was compounded by related child records, but I'm not entirely sure of that.
def EditStaff(self, session, record):
    try:
        r = session.merge(record)
        session.commit()
        return r
    except Exception:
        return False
After much googling and reading about sessions etc, I realized that since I had captured the instance r before the commit and returned it, when that same record was sent back to this function for another edit/commit it had lost its session.
So to fix this I just query the database for the record just updated and return it to keep it in session and mark its is_modified value back to false.
def EditStaff(self, session, record):
    try:
        session.merge(record)
        session.commit()
        r = self.GetStaff(session, record)
        return r
    except Exception:
        return False
Setting the expire_on_commit=False also avoided the error as mentioned above, but I don't think it actually addresses the error, and could lead to many other issues IMO.

To throw my cause & solution into the ring, I use flask and flask-sqlalchemy to manage all my session stuff. This is fine when I'm doing things via the site, but when doing things via command line and scripts, you have to ensure that anything that's doing flask-y things has to do it with the flask context.
So, in my situation, I needed to get things from a database (using flask-sqlalchemy), then render them to templates (using flask's render_template), then email them (using flask-mail).
In code, what I'd done was something like,
def render_obj(db_obj):
    with app.app_context():
        return render_template('template_for_my_db_obj.html', db_obj=db_obj)

def get_renders():
    my_db_objs = MyDbObj.query.all()
    renders = []
    for day, _db_objs in itertools.groupby(my_db_objs, MyDbObj.get_date):
        renders.extend(list(map(render_obj, _db_objs)))
    return renders

def email_report():
    renders = get_renders()
    report = '\n'.join(renders)
    with app.app_context():
        mail.send(Message('Subject', ['me@me.com'], html=report))
(this is basically pseudocode, I was doing other things in the grouping section)
And when I was running, I'd get through the first _db_obj, but then I'd get the error on any run after.
The culprit? with app.app_context().
Basically it does a few things when you come out of that context, including kinda freshening up the db connections. One of the things that comes from that is getting rid of the last session that was around, which was the session that all the my_db_objs were associated with.
There are a few different options for solutions, but I went with a variant of:
def render_obj(db_obj):
    return render_template('template_for_my_db_obj.html', db_obj=db_obj)

def get_renders():
    my_db_objs = MyDbObj.query.all()
    renders = []
    for day, _db_objs in itertools.groupby(my_db_objs, MyDbObj.get_date):
        renders.extend(list(map(render_obj, _db_objs)))
    return renders

def email_report():
    with app.app_context():
        renders = get_renders()
        report = '\n'.join(renders)
        mail.send(Message('Subject', ['me@me.com'], html=report))
Only one with app.app_context() block, which wraps everything. The main thing you need to do (if you have a setup like mine) is ensure that any DB object you're using stays "inside" whatever app_context you're using. If you do what I did in the first iteration, all your DB objects will lose their session, ending in a DetachedInstanceError like mine.

My solution was a simple oversight:
I created an object, added it and committed it to the DB, and after that I tried to access one of the original object's attributes without refreshing it with session.refresh(object).
user = UserFactory()
session.add(user)
session.commit()
# missing session.refresh(user) and causing the problem
print(user.name)

As for me (a newbie), I made an indentation mistake and closed the session inside my loop, in which I loop over each row, do some operation and commit each time.
So for newbies like me: check your code before setting things like expire_on_commit=False; it may lead you into another trap.

My solution to this error was also a simple oversight, which I don't think any of the other answers cover.
My function is fetching object x, modifying it, then returning the original x, because I would like the older version.
Before committing and returning x, I was calling expunge_all, but it was "too late", as the object was already marked dirty.
The solution was simply to expunge the object as early as possible.
# pseudo code
x = session.fetch_x()
# adding the following line fixed it
session.expunge(x)
y = session.update(x)
return x

I had a similar problem in my current project, and this fix worked for me: check your DB relationships for the option lazy=True and change it to lazy='dynamic'.
