I have this Google App Engine Datastore NDB model:
from datetime import datetime

from google.appengine.ext import ndb


class pySessions(ndb.Model):
    # sid is the key/id
    data = ndb.BlobProperty(required=True)
    expiry = ndb.DateTimeProperty(required=True)

    @classmethod
    def get_sid(cls, sid):
        sid_key = ndb.Key(cls, sid)
        return cls.query(cls.key == sid_key,
                         cls.expiry >= datetime.utcnow()).get()
To get a specific sid I can use something like this:
session = pySessions.get_by_id(sid)
if session and session.expiry >= datetime.utcnow():
    return session.data
return {}
Or I could use the @classmethod get_sid:
data = pySessions.get_sid(sid)
Both work, but while doing some tests I noticed that the @classmethod was behaving slower, or not reading the updated session data.
I was testing with a basic counter: after incrementing it I reloaded the page (via a Location header redirect), and that is where I noticed that, for some unknown reason, querying NDB via the @classmethod get_sid was having issues. The only way I was able to make it work was by using the pdb debugger, since its 'pause' gave the code time to read the data.
Any idea what the difference is between using the @classmethod and a custom query?
I'm not sure why you think the speed difference has anything to do with using a classmethod. The two methods are doing completely different things: the classmethod is doing a query (by key and expiry), whereas the other code is doing a straight get, which is much quicker, but then only returning the data if the expiry has not yet passed.
Queries are always much slower than gets in the datastore, but the get-based approach has the disadvantage of always fetching the data - and therefore incurring a cost - even if the expiry date has passed.
The other potential downside of the query is that queries (unlike gets) are subject to eventual consistency, so there is a likelihood of not seeing the most recently updated data.
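If you want to keep the lookup wrapped in a classmethod but still get the speed and strong consistency of a get, one option (a sketch based on the question's model, not part of the original answer) is to build the key, call get() on it, and do the expiry check in Python:
from datetime import datetime

from google.appengine.ext import ndb


class pySessions(ndb.Model):
    # Same fields as in the question.
    data = ndb.BlobProperty(required=True)
    expiry = ndb.DateTimeProperty(required=True)

    @classmethod
    def get_sid(cls, sid):
        # A get by key instead of a query: faster and strongly consistent.
        session = ndb.Key(cls, sid).get()
        if session and session.expiry >= datetime.utcnow():
            return session
        return None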
Related
I have a Python app split across different files. One of them, models.py, contains, among PyQt5 table models, several maps referred to from several PyQt5 form files:
# first lines:
agents_id_map = \
    {agent.name: agent.id for agent in db.session.query(db.Agent, db.Agent.id)}
# ....
# about 2,000 lines like this
I want to keep these kinds of maps centralized in a single place. I'm also using SQLAlchemy. The Agent class is defined in a db.py file. I use these maps to fill in the foreign key of another object, say an invoice, like:
invoice = db.Invoice()
# Here is a reference
invoice.agent_id = models.agents_id_map[agent_combo.currentText()]

db.session.add(invoice)
db.session.commit()
The problem is that the models.py module gets cached, so several parts of the application access stale data: if one running instance A of the app creates a new agent and another running instance B wants to create a new invoice, B won't see the new Agent created by A unless it restarts the app. The same thing happens if a user in a single running instance creates an agent and then wants to create an invoice. My candidate solutions are:
Reload the module, to get the whole code executed again, but this could be very expensive.
Isolate the code that builds those maps in another file, say maps.py, which would be less expensive to reload, and refactor all the code that references it.
Is there a solution that would allow me to touch only the code building those maps, leaving the rest of the application ignorant of the change, so that every time a map is referenced from another module (or even the same one) the code gets executed, effectively rebuilding the maps with fresh data?
Certainly: put your maps inside a function, or even better, a class.
If I understand this problem correctly, you have stateful data (maps) which need regenerating under some condition (every time they are accessed? Or just every time the db is updated?). I would do something like this:
class Mappings:
    def __init__(self, db):
        self._db = db
        ...  # do any initial db setup you need here

    def id_map(self, thing):
        db_thing = getattr(self._db, thing.title())
        return {x.name: x.id for x in self._db.session.query(db_thing, db_thing.id)}

    def other_property_map(self, prop):
        ...  # etc.


mapping = Mappings(db)
mapping.id_map("agent")
This assumes that the mapping example you've given is your major use-case, but this model could easily be adapted for almost any other mapping you might want.
You would write a method for every kind of 'mapping' you need, and it would return the desired dictionary. Note that here I've assumed you handle setting up the db elsewhere and pass a fully initialised db access object to the class, which is probably what you want to do---this class is just about encapsulating mapper state, not re-inventing your ORM.
Caching
I have not provided any caching. But if you have complete control over the db, it is easy enough to run a hook before you do any db commits looking to see if you've touched any particular model, and then state that those need rebuilding. Something like this:
class DbAccess(Mappings):
    def __init__(self, db, models):
        super().__init__(db)
        self._cached_map = {model: {} for model in models}

    def db_update(self, model: str, params: dict):
        if model in self._cached_map:
            self._cached_map[model] = {}  # wipe the cache for this model
        self._db.update_with_model(model, params)  # dummy fn

    def id_map(self, thing: str):
        try:
            return self._cached_map[thing]["id"]
        except KeyError:
            self._cached_map[thing]["id"] = super().id_map(thing)
            return self._cached_map[thing]["id"]
I don't really think DbAccess should inherit from Mappings---put it all in one class, or have a DB class and a Mappings mixin and inherit from both. I just didn't want to write everything out again.
I've not written any real db access routines (hence my dummy fn), as I don't know how you're doing it (though you're clearly using an ORM). The basic idea is just to handle the caching yourself: store the mapping each time it is built, and delete the stored mappings whenever you commit a transaction involving the model in question (thus rebuilding the cache as needed).
Aside
Note that if you really do have 2,000 lines of manually declared mappings of the form thing.name: thing.id, you really should generate them at runtime anyhow. Declarative is all very well and good, but writing out 2,000 permutations of the same thing isn't declarative, it's just time-consuming---and it does by hand a job that a simple loop putting the data in RAM could do for you at startup, as in the sketch below.
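A minimal sketch of that idea, assuming the model classes each have name and id columns (the build_id_maps helper and the model list are hypothetical, not from the post):
# Hypothetical sketch: build all the name -> id maps in one loop at startup
# instead of declaring each dict by hand.
def build_id_maps(session, model_classes):
    maps = {}
    for model in model_classes:
        maps[model.__name__.lower() + "_id_map"] = {
            row.name: row.id
            for row in session.query(model.name, model.id)
        }
    return maps

# e.g. id_maps = build_id_maps(db.session, [db.Agent, db.Invoice])
# id_maps["agent_id_map"]["Some Agent"] -> that agent's id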
@property
def returns_an_int(self):
    some_var = do_something_that_returns_int()
    if some_var:
        return int(some_var)
    else:
        return 0
I have a property using the code above, which is supposed to (and does) return an int. (I put the int cast on it to double-check.) However, when I try to use it to order a fetch, cls.query().order(cls.returns_an_int), I get a type error: TypeError: order() expects a Property or query Order; received <property object at 0xsome-hex-address>. I have tried prefixing it with + or - to see if I can make it get a value, because I saw that in other questions, but it didn't work. What am I doing wrong? Am I missing something basic to make it work?
When you query the datastore, you need to query on one of the various ndb.Property types. You are querying on a vanilla Python property. This doesn't work because the Python property's data never gets stored in the datastore, and so the datastore can't order by data that it doesn't know about.
It's hard to give definitive advice on how to fix this. You can do your query without the .order and sort in client code:
return sorted(cls.query(), key=lambda item: item.returns_an_int)
but that won't work for entity types that have lots of entities. You could also change your property to an ndb.ComputedProperty:
@ndb.ComputedProperty
def returns_an_int(self):
    some_var = do_something_that_returns_int()
    if some_var:
        return int(some_var)
    else:
        return 0
The downside here is that you're storing (and paying for) more datastore data and you also execute the returns_an_int method for every datastore put. If it is a fast method to execute, that shouldn't be a problem, but if it is a slow method you might start to notice the difference.
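With the computed property in place, the original order() call should then be accepted, since the value is now a real datastore property. A small usage sketch (the model name MyModel is assumed):
# Assumes returns_an_int is now an ndb.ComputedProperty on MyModel.
ascending = MyModel.query().order(MyModel.returns_an_int).fetch()
descending = MyModel.query().order(-MyModel.returns_an_int).fetch()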
I am using Flask-OAuthlib and want to do some caching/memoization using Flask-Cache. I've got caching set up on my views, but I'm having trouble with caching this function:
@oauth.clientgetter
@cache.memoize(timeout=86400)
def load_client(client_id):
    return DBSession.query(Client).filter_by(client_id=client_id).first()
The first time the function is run (when nothing is cached yet) it works fine, but when the result comes from the cache something gets messed up and I get an invalid-client error. I don't know if it's being cached incorrectly or if the @oauth.clientgetter decorator somehow interferes with the caching. Everything works fine without caching and the client is valid. I've tried moving the function around like so, but I get the same result:
class Client(Base):
    __tablename__ = 'client'
    __table_args__ = {'autoload': True}
    user = relationship('User')

    @classmethod
    @cache.memoize(timeout=86400)
    def get_client(cls, client_id):
        return DBSession.query(cls).filter_by(client_id=client_id).first()
Then, in my view I have:
@oauth.clientgetter
def load_client(client_id):
    return Client.get_client(client_id)
But this gives the same result. I am using redis as my cache backend and the keys/values I have are:
1) "flask_cache_Pwd2uVDVikMYMDNB+gVWlW"
2) "flask_cache_api.models.Client.get_client_memver"
3) "flask_cache_http://lvho.st:5000/me"
GET flask_cache_Pwd2uVDVikMYMDNB+gVWlW:
"!ccopy_reg\n_reconstructor\np1\n(capi.models\nClient\np2\nc__builtin__\nobject\np3\nNtRp4\n(dp5\nS'_sa_instance_state'\np6\ng1\n(csqlalchemy.orm.state\nInstanceState\np7\ng3\nNtRp8\n(dp9\nS'manager'\np10\ng1\n(csqlalchemy.orm.instrumentation\n_SerializeManager\np11\ng3\nNtRp12\n(dp13\nS'class_'\np14\ng2\nsbsS'class_'\np15\ng2\nsS'modified'\np16\nI00\nsS'committed_state'\np17\n(dp18\nsS'instance'\np19\ng4\nsS'callables'\np20\n(dp21\nsS'key'\np22\n(g2\n(S'Iu6copdawXIQIskY5kwPgxFgU7JoE9lTSqmlqw29'\np23\nttp24\nsS'expired'\np25\nI00\nsbsVuser_id\np26\nL4L\nsVname\np27\nS'Default'\np28\nsV_default_scopes\np29\nS'email'\np30\nsVclient_id\np31\ng23\nsV_redirect_uris\np32\nS'http://localhost:8000/authorized/'\np33\nsVactive\np34\nI1\nsVclient_secret\np35\nS'Vnw0YJjgNzR06KiwXWmYz7aSPu1ht7JnY1eRil4s5vXLM9N2ph'\np36\nsVdescription\np37\nNsb."
GET flask_cache_api.models.Client.get_client_memver:
"!S'+gVWlW'\np1\n."
Try reversing the order of your decorators:
@cache.memoize(timeout=86400)
@oauth.clientgetter
def load_client(client_id):
    return DBSession.query(Client).filter_by(client_id=client_id).first()
EDIT
The problem seems to be that a Client object is not pickle-able, while cache.memoize relies on objects' pickle-ability. Therefore, in one case you end up with an invalid-client error (the client object did not "survive" the pickle-dump-then-pickle-load round trip), and in the other case with some kind of caching error which (silently) prevents the object from being cached (I'm not sure what mechanism causes this silent handling).
In any case, it seems to me you shouldn't attempt to memoize your client object in the first place.
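If some caching is still wanted, one workaround (a sketch, not part of the answer above) is to memoize only plain, picklable column values and rebuild the client object from them on each request. The field names below mirror the pickle dump shown earlier, but the helper itself is hypothetical:
@cache.memoize(timeout=86400)
def get_client_fields(client_id):
    client = DBSession.query(Client).filter_by(client_id=client_id).first()
    if client is None:
        return None
    # Only plain, picklable values go into the cache.
    return {
        'client_id': client.client_id,
        'client_secret': client.client_secret,
        '_redirect_uris': client._redirect_uris,
        '_default_scopes': client._default_scopes,
    }


@oauth.clientgetter
def load_client(client_id):
    fields = get_client_fields(client_id)
    if fields is None:
        return None
    # Rebuild a detached Client from the cached primitives; it is never
    # attached to the session, so unpickling problems don't arise.
    return Client(**fields)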
In my application, there is a class for each model that holds commonly used queries (I guess it's somewhat of a "Repository" in DDD language). Each of these classes is passed the SQLAlchemy session object upon construction to create queries with. I'm having a little difficulty figuring out the best way to assert that certain queries are being run in my unit tests. Using the ubiquitous blog example, let's say I have a "Post" model with columns and attributes "date" and "content". I also have a "PostRepository" with the method "find_latest" that is supposed to query for all posts in descending order by "date". It looks something like:
from myapp.models import Post


class PostRepository(object):
    def __init__(self, session):
        self._s = session

    def find_latest(self):
        return self._s.query(Post).order_by(Post.date.desc())
I'm having trouble mocking the Post.date.desc() call. Right now I'm monkey patching a mock in for Post.date.desc in my unit test, but I feel that there is likely a better approach.
Edit: I'm using mox for mock objects; my current unit test looks something like:
import unittest

import mox


class TestPostRepository(unittest.TestCase):
    def setUp(self):
        self._mox = mox.Mox()

    def _create_session_mock(self):
        from sqlalchemy.orm.session import Session
        return self._mox.CreateMock(Session)

    def _create_query_mock(self):
        from sqlalchemy.orm.query import Query
        return self._mox.CreateMock(Query)

    def _create_desc_mock(self):
        from myapp.models import Post
        return self._mox.CreateMock(Post.date.desc)

    def test_find_latest(self):
        from myapp.models.repositories import PostRepository
        from myapp.models import Post

        expected_result = 'test'
        session_mock = self._create_session_mock()
        query_mock = self._create_query_mock()
        desc_mock = self._create_desc_mock()

        # Monkey patch
        tmp = Post.date.desc
        Post.date.desc = desc_mock

        session_mock.query(Post).AndReturn(query_mock)
        query_mock.order_by(Post.date.desc().AndReturn('test')).AndReturn(query_mock)
        query_mock.offset(0).AndReturn(query_mock)
        query_mock.limit(10).AndReturn(expected_result)

        self._mox.ReplayAll()

        r = PostRepository(session_mock)
        result = r.find_latest()

        self._mox.VerifyAll()
        self.assertEquals(expected_result, result)

        Post.date.desc = tmp
This does work, though it feels ugly, and I'm not sure why it fails without the AndReturn('test') piece of Post.date.desc().AndReturn('test').
I don't think you're really gaining much benefit by using mocks for testing your queries. Testing should be testing the logic of the code, not the implementation. A better solution would be to create a fresh database, add some objects to it, run the query on that database, and determine if you're getting the correct results back. For example:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Create the engine. This starts a fresh in-memory database.
engine = create_engine('sqlite://')

# Fill the database with the tables needed.
# If you use declarative, the metadata for your tables can be found using Base.metadata
metadata.create_all(engine)

# Create a session to this database
session = sessionmaker(bind=engine)()

# Create some posts using the session and commit them
...

# Test your repository object...
repo = PostRepository(session)
results = repo.find_latest()

# Run your assertions on results
...
Now, you're actually testing the logic of the code. This means that you can change the implementation of your method, but so long as the query works correctly, the tests should still pass. If you want, you could write this method as a query that gets all the objects, then slices the resulting list. The test would pass, as it should. Later on, you could change the implementation to run the query using the SA expression APIs, and the test would pass.
One thing to keep in mind is that you might have problems with sqlite behaving differently than another database type. Using sqlite in-memory gives you fast tests, but if you want to be serious about these tests, you'll probably want to run them against the same type of database you'll be using in production as well.
If you still want to create a unit test with mock input, you can create instances of your model with fake data.
In cases where the result proxy returns rows with data from more than one model (for instance when you join two tables), you can use the collections data structure called namedtuple, as in the sketch below.
We use it to mock the results of join queries.
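A minimal sketch of that idea (the joined model, field names, and fake values are assumed for illustration, not taken from the post):
from collections import namedtuple
from datetime import date

from myapp.models import Post  # the author column below is hypothetical

# Shape of one row of a joined query result; field names are illustrative.
JoinedRow = namedtuple('JoinedRow', ['Post', 'author_name'])

# Fake rows you can hand to the code under test instead of the real result
# of something like session.query(Post, Author.name).join(Author).
fake_rows = [
    JoinedRow(Post=Post(date=date(2020, 1, 1), content='first'), author_name='alice'),
    JoinedRow(Post=Post(date=date(2020, 1, 2), content='second'), author_name='bob'),
]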
In the following example, cached_attr is used to get or set an attribute on a model instance when a database-expensive property (related_spam in the example) is called. In the example, I use cached_spam to save queries. I added print statements when setting and when getting values so that I could test it out. I tested it in a view by passing an Egg instance into the view and using {{ egg.cached_spam }} there, as well as via other methods on the Egg model that call cached_spam themselves. When I finished and tested it, the shell output in Django's development server showed that the attribute cache was missed several times, as well as successfully hit several times. It seems to be inconsistent. With the same data, when I made small changes (as little as changing a print statement's string) and refreshed (with all the same data), different numbers of misses/successes happened. How and why is this happening? Is this code incorrect or highly problematic?
class Egg(models.Model):
    ...  # fields

    @property
    def related_spam(self):
        # Each time this property is called the database is queried (expected).
        return Spam.objects.filter(egg=self).all()  # Spam has foreign key to Egg.

    @property
    def cached_spam(self):
        # This should call self.related_spam the first time, and then return
        # cached results every time after that.
        return self.cached_attr('related_spam')

    def cached_attr(self, attr):
        """This method (normally attached via an abstract base class, but put
        directly on the model for this example) attempts to return a cached
        version of a requested attribute, and calls the actual attribute when
        the cached version isn't available."""
        try:
            value = getattr(self, '_p_cache_{0}'.format(attr))
            print('GETTING - {0}'.format(value))
        except AttributeError:
            value = getattr(self, attr)
            print('SETTING - {0}'.format(value))
            setattr(self, '_p_cache_{0}'.format(attr), value)
        return value
Nothing wrong with your code, as far as it goes. The problem probably isn't there, but in how you use that code.
The main thing to realise is that model instances don't have identity. That means that if you instantiate an Egg object somewhere, and a different one somewhere else, even if they refer to the same underlying database row they won't share internal state. So calling cached_attr on one won't cause the cache to be populated in the other.
For example, assuming you have a RelatedObject class with a ForeignKey to Egg:
my_first_egg = Egg.objects.get(pk=1)
my_related_object = RelatedObject.objects.get(egg__pk=1)
my_second_egg = my_related_object.egg
Here my_first_egg and my_second_egg both refer to the database row with pk 1, but they are not the same object:
>>> my_first_egg.pk == my_second_egg.pk
True
>>> my_first_egg is my_second_egg
False
So, filling the cache on my_first_egg doesn't fill it on my_second_egg.
And, of course, objects won't persist across requests (unless they're specifically made global, which is horrible), so the cache won't persist either.
HTTP servers that scale are shared-nothing; you can't rely on anything being a singleton. To share state, you need to connect to a special-purpose service.
Django's caching support is appropriate for your use case. It isn't necessarily a global singleton either; if you use locmem://, it will be process-local, which could be the more efficient choice.
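A minimal sketch of what cached_attr could look like on top of Django's cache framework (the key scheme and timeout are assumptions, not part of the answer):
from django.core.cache import cache
from django.db import models


class Egg(models.Model):
    # ... fields, related_spam and cached_spam as in the question ...

    def cached_attr(self, attr, timeout=300):
        """Store the value in Django's cache, keyed by pk and attribute name,
        so it is shared across instances and, depending on the backend,
        across processes."""
        key = 'egg:{0}:{1}'.format(self.pk, attr)  # hypothetical key scheme
        value = cache.get(key)
        if value is None:
            value = getattr(self, attr)
            cache.set(key, value, timeout)
        return value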