Intercept all queries on a model in SQLAlchemy - python

I need to intercept all queries that concern a model in SQLAlchemy, in a way that I can inspect it at the point where any of the query methods (all(), one(), scalar(), etc.) is executed.
I have thought about the following approaches:
1. Subclass the Query class
I could subclass sqlalchemy.orm.Query and override the execution code, starting basically from something like this.
However, I am writing a library that can be used in other SQLAlchemy applications, and thus the creation of the declarative base, let alone engines and sessions, is outside my scope.
Maybe I have missed something and it is possible to override the Query class for my models without knowledge of the session?
2. Use the before_execute Core Event
I have also thought of hooking into execution with the before_execute event.
The problem is thatit is bound to an engine (see above). Also, I need to modify objects in the session, and I got the impression that I do not have access to a session from within this event.
What I want to be able to do is something like:
session.query(MyModel).filter_by(foo="bar").all() is executed.
Intercept that query and do something like storing the query in a log table within the same database (not literally that, but a set of different things that basically need the exact same functionality as this example operation)
Let the query execute like normal.
What I am trying to do in the end is inject items from another data store into the SQLAlchemy database on-the-fly upon querying. While this seems stupid - trust me, it might be less stupid than it sounds (or even more stupid) ;).

The before_compile query event might be useful for you.
from weakref import WeakSet
from sqlalchemy import event
from sqlalchemy.orm import Query
visited_queries = WeakSet()
#event.listens_for(Query, 'before_compile')
def log_query(query):
# You can get the session
session = query.session
# Prevent recursion if you want to compile the query to log it!
if query not in visited_queries:
visited_queries.add(query)
# do something with query.statement
You can look at query.column_descriptions to see if your model is being queried.

Related

SQLAlchemy: use related object when session is closed

I have many models with relational links to each other which I have to use. My code is very complicated so I cannot keep session alive after a query. Instead, I try to preload all the objects:
def db_get_structure():
with Session(my_engine) as session:
deps = {x.id: x for x in session.query(Department).all()}
...
return (deps, ...)
def some_logic(id):
struct = db_get_structure()
return some_other_logic(struct.deps[id].owner)
However, I get the following error anyway regardless of the fact that all the objects are already loaded:
sqlalchemy.orm.exc.DetachedInstanceError: Parent instance <Department at 0x10476e780> is not bound to a Session; lazy load operation of attribute 'owner' cannot proceed
Is it possible to link preloaded objects with each other so that the relations will work after session get closed?
I know about joined queries (.options(joinedload(), but this approach leads to more code lines and bigger DB request, and I think this should be solved simpler, because all the objects are already loaded into Python objects.
It's even possible now to request the related objects like struct.deps[struct.deps[id].owner_id], but I think the ORM should do this and provide shorter notation struct.deps[id].owner using some "cached load".
Whenever you access an attribute on a DB entity that has not yet been loaded from the DB, SQLAlchemy will issue an implicit SQL statement to the DB to fetch that data. My guess is that this is what happens when you issue struct.deps[struct.deps[id].owner_id].
If the object in question has been removed from the session it is in a "detached" state and SQLAlchemy protects you from accidentally running into inconsistent data. In order to work with that object again it needs to be "re-attached".
I've done this already fairly often with session.merge:
attached_object = new_session.merge(detached_object)
But this will reconile the object instance with the DB and potentially issue updates to the DB if necessary. The detached_object is taken as "truth".
I believe you can do the reverse (attaching it by reading from the DB instead of writing to it) by using session.refresh(detached_object), but I need to verify this. I'll update the post if I found something.
Both ways have to talk to the DB with at least a select to ensure the data is consistent.
In order to avoid loading, issue session.merge(..., load=False). But this has some very important cavetas. Have a look at the docs of session.merge() for details.
I will need to read up on your link you added concerning your "complicated code". I would like to understand why you need to throw away your session the way you do it. Maybe there is an easier way?

Do we need to join in the controller or model when you use SQL Alchemy?

In MVC frameworks, where do you usually embed SQL Alchemy Code, is it ideal to put the query in Controller Methods or just use the Model Methods?
query = session.query(User, Document, DocumentsPermissions).join(Document).join(DocumentsPermissions).filter_by(Document.name=="Something")
Or I just delegate this to a Model Method which takes a args? What is the preferred way to do this? One of the benefits of the latter is that it can be re-used and it almost presents a view for the API programmers. Another advantage is that I can easily over-ride this if I make it a class method. This is usually helpful in customizations especially in commercial softwares.
#Ctrl.py
self.getdocument("Foo")
#Mdl.py
def getdocument(name):
query = session.query(User, Document, DocumentsPermissions).join(Document).join(DocumentsPermissions).filter_by(Document.name=="Something")
TL;DR: Isn't the concept of "M" in MVC blurred when you use ORM's like SQL Alchemy? I didn't have any problems with Model View Controller design patterns.
[PS: I am not sure if this belongs to Code Review Site, if so please let me know, I can just transfer over.]
I strongly prefer the second approach. It has a few advantages:
Your controller code can be dumb. This is good. Controllers that just fetch data from the backend, possibly reformat it a little bit, and pass it on to views are very easy to reason about.
It's easier to test that method in isolation. You can run getdocument('valid_name'), getdocument(None), getdocument(123), etc. to ensure they all work or fail as expected without dealing with all the surrounding controller code.
It's easier to test the controller. You can write a mock for getdocument() so that it always returns a known value and test that your controller processes it correctly.
I tend to put database query code in the Controller rather than the Model. As my understanding goes, Model methods are used to transform the data of the model into something else.
For example, a UserModel may have a FullName() method to return the concatenation of the user's first and last names.
Whereas, a UserController contains a GetAll() method to get a list of all users, which is where the database query code is found.

How to replace template method pattern with functional style?

You can see the code here
The concrete problem that I'm trying to solve is this.
Say that I need to provide REST interface to some entities modeled (sqlalchemy in my case) with some tool stored in database. Say that this collection is called parents.
I would need handlers like this
GET /parents
GET /parents/some_id
POST /parents
DELETE /parents/some_id
Conceptually all this handlers are very similar.
They all take ids from url, then create appropriate query. Then they fetching data with that query, then turn this data into dict and then call jsonify to create correct http response.
So with OOP I could design this like that.
class AbstractHandler():
def __init__(serializer, **specs):
self.specs = specs
def handle_request(self, spec_data, *_ids):
q = self.create_query(_ids)
d = self.fetch_data(self.specs[spec_data['name']](**(spec_data['args'] + (query, ))
out = serializer(d)
return jsonify(out)
The spec is a function that takes some parameters and query and produce more refined query based of this parameters.
So for example
GET /parents?spec={'name': 'by_name', 'args': ['adam'}
would return parent named Adam from collection.
This code has some flaws but I hope you see the point how template method makes flow of control here and subclasses can change how they would create query, how they would fetch data (item handler would need to call query.one() and collection handler would need to call query.all() for example)
So I can replace create_query, fetch_data with dependency injection instead. But that would create a problem that someone could create wrong configuration by giving the wrong dependency. That's basically what I've done, but with using partial functions instead.
So what I'm thinking right now is that I can solve this problem by creating factory functions for every type of handler that I need, that would give appropriate dependency to handler.
That's very much alike with template method solution I think. The difference basically is that in template method correctness dependencies is guarantied by object type and in my solution it's guarantied by type of factory function.
So enough with what I think, I'd like to know what you think about that?
How people in functional world solve this kinds of problem?

SQLAlchemy: Is there a way to track the line number and calling class of a query done through SQLAlchemy?

Relatively new to SQL and SQLAlchemy, so please forgive any ignorance on my part as to the proper terminology or syntax.
I have a MySQL database to which many queries are made through SQLAlchemy from various files. I want to know from which file and from which line the queries are being made. Knowing which class is calling the query would be helpful as well. So I need some kind of identifying information from each query about its source file, or some way to retrieve that information.
Is there a way to do this?
I'm looking at Event listeners for SQLAlchemy and they do not seem to intercept any information about the code making the SQLAlchemy queries. It would also be too impractical to go into each file and insert comments into queries identifying the source file, since there are many such files.
In other words, is there a solution that does not modify the actual files from which the SQLAlchemy queries are originating?
One solution would be to create a wrapper for the session.query method (assuming that's whats being used) and monkey patch (decorate) it. You can then employ traceback module to print the current call stack.
def query_wrapper(orig_query_method):
def _query(*args, **kwargs):
traceback.print_tb(limit=2)
#print(traceback.extract_stack(limit=2))
return orig_query_method(*args, **kwargs)
return _query
And then wherever the session is initialized (hopefully, in only one file) manually apply the decorator.
from sqlalchemy import sessionmaker
# There should already be some initialization similar to this
Session = sessionmaker(bind=db_engine)
session = Session()
session.query = query_wrapper(session.query)
So whenever session.query is called, it first its last calls that got the execution to that point.

Design pattern to organize non-trivial ORM queries?

I am developing a web API with 10 tables or so in the backend, with several one-to-many and many-to-many associations. The API essentially is a database wrapper that performs validated updates and conditional queries. It's written in Python, and I use SQLAlchemy for ORM and CherryPy for HTTP handling.
So far I have separated the 30-some queries the API performs into functions of their own, which look like this:
# in module "services.inventory"
def find_inventories(session, user_id, *inventory_ids, **kwargs):
query = session.query(Inventory, Product)
query = query.filter_by(user_id=user_id, deleted=False)
...
return query.all()
def find_inventories_by(session, app_id, user_id, by_app_id, by_type, limit, page):
....
# in another service module
def remove_old_goodie(session, app_id, user_id):
try:
old = _current_goodie(session, app_id, user_id)
services.inventory._remove(session, app_id, user_id, [old.id])
except ServiceException, e:
# log it and do stuff
....
The CherryPy request handler calls the query methods, which are scattered across several service modules, as needed. The rationale behind this solution is, since they need to access multiple model classes, they don't belong to individual models, and also these database queries should be separated out from direct handling of API accesses.
I realize that the above code might be called Foreign Methods in the realm of refactoring. I could well live with this way of organizing for a while, but as things are starting to look a little messy, I'm looking for a way to refactor this code.
Since the queries are tied directly to the API and its business logic, they are hard to generalize like getters and setters.
It smells to repeat the session argument like that, but as the current implementation of the API creates a new CherryPy handler instance for each API call and therefore the session object, there is no global way of getting at the current session.
Is there a well-established pattern to organize such queries? Should I stick with the Foreign Methods and just try to unify the function signature (argument ordering, naming conventions etc.)? What would you suggest?
The standard way to have global access to the current session in a threaded environment is ScopedSession. There are some important aspects to get right when integrating with your framework, mainly transaction control and clearing out sessions between requests. A common pattern is to have an autocommit=False (the default) ScopedSession in a module and wrap any business logic execution in a try-catch clause that rolls back in case of exception and commits if the method succeeded, then finally calls Session.remove(). The business logic would then import the Session object into global scope and use it like a regular session.
There seems to be an existing CherryPy-SQLAlchemy integration module, but as I'm not too familiar with CherryPy, I can't comment on its quality.
Having queries encapsulated as functions is just fine. Not everything needs to be in a class. If they get too numerous just split into separate modules by topic.
What I have found useful is too factor out common criteria fragments. They usually fit rather well as classmethods on model classes. Aside from increasing readability and reducing duplication, they work as implementation hiding abstractions up to some extent, making refactoring the database less painful. (Example: instead of (Foo.valid_from <= func.current_timestamp()) & (Foo.valid_until > func.current_timestamp()) you'd have Foo.is_valid())
SQLAlchemy strongly suggests that the session maker be part of some global configuration.
It is intended that the sessionmaker()
function be called within the global
scope of an application, and the
returned class be made available to
the rest of the application as the
single class used to instantiate
sessions.
Queries which are in separate modules isn't an interesting problem. The Django ORM works this way. A web site usually consists of multiple Django "applications", which sounds like your site that has many "service modules".
Knitting together multiple services is the point of an application. There aren't a lot of alternatives that are better.

Categories