I'm designing an API with SQLAlchemy (querying MySQL) and I would like to force all my queries to have page_size (LIMIT) and page_number (OFFSET) parameters.
Is there a clean way of doing this with SQLAlchemy? Perhaps building a factory of some sort to create a custom Query object? Or maybe there is a good way to do this with a mixin class?
I tried the obvious thing and it didn't work because .limit() and .offset() must be called after all filter conditions have been applied:
def q(page=0, page_size=None):
    q = session.query(...)
    if page_size: q = q.limit(page_size)
    if page: q = q.offset(page*page_size)
    return q
When I try using this, I get the exception:
sqlalchemy.exc.InvalidRequestError: Query.filter() being called on a Query which already has LIMIT or OFFSET applied. To modify the row-limited results of a Query, call from_self() first. Otherwise, call filter() before limit() or offset() are applied.
Try adding a first, required argument, which must be a group of query filters. Thus,
# q({'id': 5}, 2, 50)
def q(filters, page=0, page_size=None):
    query = session.query(...).filter_by(**filters)
    if page_size:
        query = query.limit(page_size)
    if page:
        query = query.offset(page*page_size)
    return query
or,
# q(Model.id == 5, 2, 50)
def q(filter, page=0, page_size=None):
    query = session.query(...).filter(filter)
    if page_size:
        query = query.limit(page_size)
    if page:
        query = query.offset(page*page_size)
    return query
This wasn't an option at the time the question was asked, but since version 1.0.0 you can take advantage of Query events to ensure the limit and offset methods are always called just before your query object is compiled, after any manipulation is performed by the users of your q function:
from sqlalchemy.event import listen

def q(page=0, page_size=None):
    query = session.query()
    listen(query, 'before_compile', apply_limit(page, page_size), retval=True)
    return query

def apply_limit(page, page_size):
    def wrapped(query):
        if page_size:
            query = query.limit(page_size)
        if page:
            query = query.offset(page * page_size)
        return query
    return wrapped
You can call query.limit(None) or query.offset(None) to remove a previously applied limit or offset.
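The point of the event hook is that callers can keep composing the query after q() returns; a hedged usage sketch (User is an assumed mapped class, not from the question):

# Hypothetical usage: filters added after q() still work, because the
# listener applies LIMIT/OFFSET only when the query is compiled.
query = q(page=2, page_size=50)
query = query.filter(User.name == 'alice')
results = query.all()  # compiled with LIMIT 50 OFFSET 100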
Related
Many times I find myself writing code similar to:
query = MyModel.objects.all()
if request.GET.get('filter_by_field1'):
    query = query.filter(field1=True)
if request.GET.get('filter_by_field2'):
    query = query.filter(field2=False)
field3_filter = request.GET.get('field3')
if field3_filter is not None:
    query = query.filter(field3=field3_filter)
field4_filter = request.GET.get('field4')
if field4_filter:
    query = query.filter(field4=field4_filter)
# etc...
return query
Is there a better, more generic way of building queries such as the one above?
If the only things that are ever going to be in request.GET are potential query arguments, you could do this:
query = MyModel.objects.filter(**request.GET)
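If arbitrary GET parameters might sneak in, a hedged variant whitelists the expected keys first (ALLOWED_FILTERS and the field names are hypothetical):

# Only pass through known filter parameters, so unexpected GET keys
# don't raise a FieldError.
ALLOWED_FILTERS = {'field1', 'field2', 'field3', 'field4'}
params = {k: v for k, v in request.GET.items() if k in ALLOWED_FILTERS}
query = MyModel.objects.filter(**params)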
I have the following query call:
SearchList = (DBSession.query(
func.count(ExtendedCDR.uniqueid).label("CallCount"),
func.sum(ExtendedCDR.duration).label("TotalSeconds"),
ExtendedCDR,ExtensionMap)
.filter(or_(ExtensionMap.exten == ExtendedCDR.extension,ExtensionMap.prev_exten == ExtendedCDR.extension))
.filter(between(ExtendedCDR.start,datebegin,dateend))
.filter(ExtendedCDR.extension.in_(SelectedExtension))
.group_by(ExtendedCDR.extension)
.order_by(func.count(ExtendedCDR.uniqueid).desc()))
.all()
)
I would like to be able to define the order_by clause prior to calling .query(). Is this possible?
I tried doing what this Stack Overflow answer suggests for a filter spec, but I had no idea how to create the filter_group syntax.
From that post:
filter_group = list(Column.in_('a','b'),Column.like('%a'))
query = query.filter(and_(*filter_group))
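For what it's worth, that quoted snippet won't run as written, since list() takes a single iterable; the usual pattern is a plain list literal unpacked into and_(). A sketch using this question's columns purely for illustration:

from sqlalchemy import and_, between

# Build the criteria up front as ordinary expression objects...
filter_group = [
    ExtendedCDR.extension.in_(SelectedExtension),
    between(ExtendedCDR.start, datebegin, dateend),
]
# ...then unpack them into a single AND-ed filter call.
query = query.filter(and_(*filter_group))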
You build a SQL query with the DBSession.query() call, and this query is not executed until you call .all() on it.
You can store the intermediary results and add more filters or other clauses as needed:
search = DBSession.query(
    func.count(ExtendedCDR.uniqueid).label("CallCount"),
    func.sum(ExtendedCDR.duration).label("TotalSeconds"),
    ExtendedCDR, ExtensionMap)

search = search.filter(or_(
    ExtensionMap.exten == ExtendedCDR.extension,
    ExtensionMap.prev_exten == ExtendedCDR.extension))
search = search.filter(between(ExtendedCDR.start, datebegin, dateend))
search = search.filter(ExtendedCDR.extension.in_(SelectedExtension))
search = search.group_by(ExtendedCDR.extension)
search = search.order_by(func.count(ExtendedCDR.uniqueid).desc())
The value you pass to order_by can be created ahead of time:
search_order = func.count(ExtendedCDR.uniqueid).desc()
then used like:
search = search.order_by(search_order)
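Filters work the same way, since the criteria are ordinary expression objects that can be built before the query exists; for example:

# Criteria built ahead of time, applied in a loop.
criteria = [
    between(ExtendedCDR.start, datebegin, dateend),
    ExtendedCDR.extension.in_(SelectedExtension),
]
for crit in criteria:
    search = search.filter(crit)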
Once your query is complete, get the results by calling .all():
SearchList = search.all()
In the GAE documentation, it states:
Because each get() or put() operation invokes a separate remote
procedure call (RPC), issuing many such calls inside a loop is an
inefficient way to process a collection of entities or keys at once.
Who knows how many other inefficiencies I have in my code, so I'd like to minimize them as much as I can. Currently, I do have a for loop where each iteration has a separate query. Let's say I have a User, and a user has friends. I want to get the latest updates for every friend of the user. So what I have is an array of that user's friends:
for friend_dic in friends:
    email = friend_dic['email']
    lastUpdated = friend_dic['lastUpdated']
    userKey = Key('User', email)
    query = ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 AND modifiedDate > :2', userKey, lastUpdated)
    qit = query.iter()
    while (yield qit.has_next_async()):
        status = qit.next()
        status_list.append(status.to_dict())

raise ndb.Return(status_list)
Is there a more efficient way to do this, maybe somehow batch all these into one single query?
Try looking at NDB's map function: https://developers.google.com/appengine/docs/python/ndb/queryclass#Query_map_async
Example (assuming you keep your friend relationships in a separate model; here I assumed a Relationships model):
@ndb.tasklet
def callback(entity):
    # entity is a Relationships instance; assuming it stores the friend's
    # email and last-updated timestamp as properties.
    userKey = Key('User', entity.email)
    query = ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 AND modifiedDate > :2',
                    userKey, entity.lastUpdated)
    status_updates = yield query.fetch_async()
    raise ndb.Return(status_updates)

qry = ndb.gql("SELECT * FROM Relationships WHERE friend_to = :1", user.key)
updates = yield qry.map_async(callback)
# updates will now be a list of status updates
Update:
With a better understanding of your data model:
queries = []
status_list = []
for friend_dic in friends:
    email = friend_dic['email']
    lastUpdated = friend_dic['lastUpdated']
    userKey = Key('User', email)
    queries.append(ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 AND modifiedDate > :2',
                           userKey, lastUpdated).fetch_async())

for query in queries:
    statuses = yield query
    status_list.extend([x.to_dict() for x in statuses])

raise ndb.Return(status_list)
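Remember that yield and ndb.Return only work inside a tasklet, so this code needs to live in one; a minimal sketch (the function name get_updates_async is hypothetical):

@ndb.tasklet
def get_updates_async(friends):
    # Start all the fetches before waiting on any of them.
    futures = [ndb.gql('SELECT * FROM StatusUpdates WHERE ANCESTOR IS :1 '
                       'AND modifiedDate > :2',
                       Key('User', f['email']), f['lastUpdated']).fetch_async()
               for f in friends]
    status_list = []
    for future in futures:
        statuses = yield future
        status_list.extend(s.to_dict() for s in statuses)
    raise ndb.Return(status_list)

# Callers either yield this from another tasklet or block on the future:
updates = get_updates_async(friends).get_result()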
You could perform those queries concurrently using the ndb async methods:
from google.appengine.ext import ndb

class Bar(ndb.Model):
    pass

class Foo(ndb.Model):
    pass

bars = ndb.put_multi([Bar() for i in range(10)])
ndb.put_multi([Foo(parent=bar) for bar in bars])

futures = [Foo.query(ancestor=bar).fetch_async(10) for bar in bars]
for f in futures:
    print(f.get_result())
This launches 10 concurrent Datastore query RPCs, and the overall latency depends only on the slowest one instead of the sum of all of them.
Also see the official ndb documentation for more detail on how to use async APIs with ndb.
I'm trying to fetch results in a python2.7 appengine app using cursors, but each time I use with_cursor() it fetches the same result set.
query = Model.all().filter("profile =", p_key).order('-created')
if r.get('cursor'):
    query = query.with_cursor(start_cursor=r.get('cursor'))
cursor = query.cursor()
objs = query.fetch(limit=10)
count = len(objs)
for obj in objs:
    ...
Each time through I'm getting the same 10 results. I'm thinking it has to do with using end_cursor, but how do I get that value if query.cursor() is returning the start_cursor? I've looked through the docs, but this is poorly documented.
Your formatting is a bit screwy, by the way. Looking at your code (which is incomplete and therefore potentially leaving something out), I have to assume you have forgotten to store the cursor after fetching results (or return it to the user; I am assuming r is a request?).
So after you have fetched some data, you need to call cursor() on the query. For example, this function counts all entities using a cursor:
def count_entities(kind):
    c = None
    count = 0
    q = kind.all(keys_only=True)
    while True:
        if c:
            q.with_cursor(c)
        i = q.fetch(1000)
        count = count + len(i)
        if not i:
            break
        c = q.cursor()
    return count
Note how, after fetch() has been called, c = q.cursor() grabs the cursor, and it is used as the start cursor the next time through the loop.
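Usage is just, e.g. with the Model kind from the question:

total = count_entities(Model)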
Here's what finally worked:
query = Model.all().filter("profile =", p_key).order('-created')
if request.get('cursor'):
    query = query.with_cursor(request.get('cursor'))
objs = query.fetch(limit=10)
cursor = query.cursor()
for obj in objs:
    ...
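To page across requests, the cursor has to make it back to the client; a hedged sketch of a JSON response (the response shape is an assumption, not from the question):

import json

# query.cursor() returns a websafe string, so it can be sent to the client,
# which echoes it back as the 'cursor' parameter on the next request.
response_body = json.dumps({
    'cursor': cursor,
    'count': len(objs),
})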
I've just introspected a pretty nasty schema from a CRM app with sqlalchemy. All of the tables have a deleted column on them, and I wanted to automatically filter out all entities and relations flagged as deleted. Here's what I came up with:
class CustomizableQuery(Query):
    """An overridden sqlalchemy.orm.query.Query to filter entities.

    Filters itself by the BinaryExpressions found in :attr:`CONDITIONS`.
    """
    CONDITIONS = []

    def __init__(self, mapper, session=None):
        super(CustomizableQuery, self).__init__(mapper, session)
        for cond in self.CONDITIONS:
            self._add_criterion(cond)

    def _add_criterion(self, criterion):
        criterion = self._adapt_clause(criterion, False, True)
        if self._criterion is not None:
            self._criterion = self._criterion & criterion
        else:
            self._criterion = criterion
And it's used like this:
class UndeletedContactQuery(CustomizableQuery):
    CONDITIONS = [contacts.c.deleted != True]

    def by_email(self, email_address):
        return EmailInfo.query.by_module_and_address('Contacts', email_address).contact

    def by_username(self, uname):
        return self.filter_by(twod_username_c=uname).one()

class Contact(object):
    query = session.query_property(UndeletedContactQuery)

Contact.query.by_email('someone@some.com')
EmailInfo is the class that's mapped to the join table between emails and the other Modules that they're related to.
Here's an example of a mapper:
contacts_map = mapper(Contact, join(contacts, contacts_cstm), {
    '_emails': dynamic_loader(EmailInfo,
                              foreign_keys=[email_join.c.bean_id],
                              primaryjoin=contacts.c.id == email_join.c.bean_id,
                              query_class=EmailInfoQuery),
})

class EmailInfoQuery(CustomizableQuery):
    CONDITIONS = [email_join.c.deleted != True]
    # More methods here
This gives me what I want in that I've filtered out all deleted Contacts, and I can also use this as the query_class argument to dynamic_loader in my mappers. However:
Is there a better way to do this? I'm not really happy poking around in the internals of a complicated class like Query.
Has anyone solved this in a different way that they can share?
You can map to a select. Like this:
mapper(EmailInfo, select([email_join], email_join.c.deleted == False))
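Applied to the contacts table, the same idea might look like this (a hedged sketch; UndeletedContact is a hypothetical class, and the select is aliased so the mapper has a named selectable):

from sqlalchemy import select
from sqlalchemy.orm import mapper

# Map a class to a pre-filtered select so every query against it
# sees only undeleted rows.
undeleted = select([contacts], contacts.c.deleted == False).alias('undeleted_contacts')

class UndeletedContact(object):
    pass

mapper(UndeletedContact, undeleted)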
I'd consider seeing if it was possible to create views for these tables that filter out the deleted elements, and then you might be able to map directly to that view instead of the underlying table, at least for querying operations. However I've never tried this myself!