Design pattern to organize non-trivial ORM queries? - python

I am developing a web API with 10 tables or so in the backend, with several one-to-many and many-to-many associations. The API essentially is a database wrapper that performs validated updates and conditional queries. It's written in Python, and I use SQLAlchemy for ORM and CherryPy for HTTP handling.
So far I have separated the 30-some queries the API performs into functions of their own, which look like this:
# in module "services.inventory"
def find_inventories(session, user_id, *inventory_ids, **kwargs):
    query = session.query(Inventory, Product)
    query = query.filter_by(user_id=user_id, deleted=False)
    ...
    return query.all()

def find_inventories_by(session, app_id, user_id, by_app_id, by_type, limit, page):
    ....
# in another service module
def remove_old_goodie(session, app_id, user_id):
    try:
        old = _current_goodie(session, app_id, user_id)
        services.inventory._remove(session, app_id, user_id, [old.id])
    except ServiceException as e:
        # log it and do stuff
        ....
The CherryPy request handler calls these query functions, scattered across several service modules, as needed. The rationale behind this arrangement is that, since the queries need to access multiple model classes, they don't belong on any individual model, and they should also be kept separate from the code that handles API requests directly.
I realize that the above code might be called Foreign Methods in the realm of refactoring. I could well live with this way of organizing for a while, but as things are starting to look a little messy, I'm looking for a way to refactor this code.
Since the queries are tied directly to the API and its business logic, they are hard to generalize like getters and setters.
It smells to repeat the session argument like that, but since the current implementation creates a new CherryPy handler instance (and therefore a new session object) for each API call, there is no global way of getting at the current session.
Is there a well-established pattern to organize such queries? Should I stick with the Foreign Methods and just try to unify the function signature (argument ordering, naming conventions etc.)? What would you suggest?

The standard way to have global access to the current session in a threaded environment is ScopedSession. There are some important aspects to get right when integrating with your framework, mainly transaction control and clearing out sessions between requests. A common pattern is to have a ScopedSession with autocommit=False (the default) in a module, wrap any business-logic execution in a try/except block that rolls back on exception and commits if the method succeeded, and then finally call Session.remove(). The business logic would then import the Session object into global scope and use it like a regular session.
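A minimal sketch of that pattern, assuming a thread-per-request setup (the engine URL, do_business_logic() and handle_request() are placeholders, not any particular framework's API):

from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("sqlite://")  # placeholder engine URL
Session = scoped_session(sessionmaker(bind=engine))  # autocommit=False is the default

def do_business_logic():
    # business code imports Session at module scope and uses it like a regular session
    Session.execute(text("SELECT 1"))  # placeholder work

def handle_request():
    try:
        do_business_logic()
        Session.commit()
    except Exception:
        Session.rollback()
        raise
    finally:
        Session.remove()  # clear out the thread-local session between requests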
There seems to be an existing CherryPy-SQLAlchemy integration module, but as I'm not too familiar with CherryPy, I can't comment on its quality.
Having queries encapsulated as functions is just fine. Not everything needs to be in a class. If they get too numerous, just split them into separate modules by topic.
What I have found useful is to factor out common criteria fragments. They usually fit rather well as classmethods on model classes. Aside from increasing readability and reducing duplication, they work as implementation-hiding abstractions to some extent, making refactoring the database less painful. (Example: instead of (Foo.valid_from <= func.current_timestamp()) & (Foo.valid_until > func.current_timestamp()) you'd have Foo.is_valid().)
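A sketch of that idea, built around the example criterion (the Foo model here is assumed for illustration):

from sqlalchemy import Column, DateTime, Integer, func
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    valid_from = Column(DateTime)
    valid_until = Column(DateTime)

    @classmethod
    def is_valid(cls):
        # hides the schema details behind a readable name
        return (cls.valid_from <= func.current_timestamp()) & \
               (cls.valid_until > func.current_timestamp())

# usage: session.query(Foo).filter(Foo.is_valid())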

SQLAlchemy strongly suggests that the session maker be part of some global configuration.
"It is intended that the sessionmaker() function be called within the global scope of an application, and the returned class be made available to the rest of the application as the single class used to instantiate sessions."
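Following that advice, a minimal sketch might look like this (the module layout and connection URL are assumptions):

# myapp/db.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite:///app.db")  # placeholder URL
Session = sessionmaker(bind=engine)  # the single class used to instantiate sessions

# elsewhere in the application:
#     from myapp.db import Session
#     session = Session()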
Queries kept in separate modules aren't an interesting problem. The Django ORM works this way: a web site usually consists of multiple Django "applications", which sounds much like your site with its many "service modules".
Knitting together multiple services is the point of an application. There aren't a lot of alternatives that are better.

Related

Intercept all queries on a model in SQLAlchemy

I need to intercept all queries that concern a model in SQLAlchemy, in a way that lets me inspect them at the point where any of the query methods (all(), one(), scalar(), etc.) is executed.
I have thought about the following approaches:
1. Subclass the Query class
I could subclass sqlalchemy.orm.Query and override the execution code, starting basically from something like this.
However, I am writing a library that can be used in other SQLAlchemy applications, and thus the creation of the declarative base, let alone engines and sessions, is outside my scope.
Maybe I have missed something and it is possible to override the Query class for my models without knowledge of the session?
2. Use the before_execute Core Event
I have also thought of hooking into execution with the before_execute event.
The problem is that it is bound to an engine (see above). Also, I need to modify objects in the session, and I got the impression that I do not have access to a session from within this event.
What I want to be able to do is something like:
1. session.query(MyModel).filter_by(foo="bar").all() is executed.
2. Intercept that query and do something like storing the query in a log table within the same database (not literally that, but a set of different things that basically need the exact same functionality as this example operation).
3. Let the query execute as normal.
What I am trying to do in the end is inject items from another data store into the SQLAlchemy database on-the-fly upon querying. While this seems stupid - trust me, it might be less stupid than it sounds (or even more stupid) ;).
The before_compile query event might be useful for you.
from weakref import WeakSet

from sqlalchemy import event
from sqlalchemy.orm import Query

visited_queries = WeakSet()

@event.listens_for(Query, 'before_compile')
def log_query(query):
    # You can get the session
    session = query.session
    # Prevent recursion if you want to compile the query to log it!
    if query not in visited_queries:
        visited_queries.add(query)
        # do something with query.statement
You can look at query.column_descriptions to see if your model is being queried.
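For example, restricting the listener above to a single model might look like this (MyModel is a stand-in for your mapped class):

@event.listens_for(Query, 'before_compile')
def log_my_model_queries(query):
    for desc in query.column_descriptions:
        if desc['type'] is MyModel:
            # only queries that select MyModel end up here
            print(query.statement)
            break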

Advice on database access approach in a custom environment using Python Pyramid

I’m new to Pyramid. I’ve used Python for a few months. I've created a Python application on Linux to maintain an Oracle database using weekly data feeds from a vendor. To get that done, one of the things I did was to create a customized database wrapper class using the cx_Oracle package. I had specific requirements for maintaining history in the DB. All Oracle access goes through the methods in this wrapper. I now want to use Pyramid to create a simple reporting browser interface to the Oracle DB. To allow me the greatest flexibility, I’d like to use the wrapper I already have to get to the data in Oracle instead of SQLAlchemy (or possibly with it, I'm not sure).
In my Pyramid app, I’ve considered importing my wrapper in my views.py __init__ method, but that seems to get executed with every browser submit.
Can anyone suggest how I might create a persistent connection to Oracle that I can use over and over from my reporting application which uses my wrapper class? I’m finding Pyramid a bit opaque. I’m never sure what’s happening behind the scenes but I’m willing to operate on trust until I get the swing of it. I need the benefit of the automatic authorization/ authentication and login.
What I’m really looking for is a good approach from experienced Pyramid users before going down the wrong track.
Many thanks.
You should definitely use SQLAlchemy, as it provides connection pooling and the like. In your case SQLAlchemy would use cx_Oracle underneath, but it would let you concentrate on writing actual code instead of maintaining connections, pooling them, and so on.
You should follow the patterns in the Wiki2 tutorial to set up basic SQLAlchemy.
So, basically, the question boils down to "how do I use my existing API with Pyramid", right? This is quite easy as Pyramid is database-agnostic and very transparent in this area despite what you say :) Basically, you import it, call its methods, retrieve data and send it to the template:
from pyramid.httpexceptions import HTTPFound
from pyramid.view import view_config

import pokemon_api as api

@view_config(route_name='list_pokemon', renderer='pokemon_list.mako')
def list_pokemon(request):
    # To illustrate how to get data from the request to send to your API
    batch_start = int(request.GET.get('batch_start', 0))
    batch_end = batch_start + int(request.GET.get('batch_size', 0))
    sort_order = request.GET.get('sort_by', 'name')
    # Here we call our API - we don't actually care where it gets the data from,
    # it's a black box ("from" is a reserved word in Python, so the hypothetical
    # API takes start/end instead)
    pokemon = api.retrieve_pokemon(start=batch_start, end=batch_end, sort=sort_order)
    # send the data to the renderer/template
    return {
        'pokemon': pokemon
    }

@view_config(route_name='add_pokemon', request_method='POST')
def add_pokemon(request):
    """
    Add a new Pokemon
    """
    name = request.POST.get('name', '')
    weight = request.POST.get('weight', 0)
    height = request.POST.get('height', 0)
    api.create(name=name, weight=weight, height=height)
    # go back to the listing
    return HTTPFound('/pokemon_list')
And if your API needs some initialization, you can do it at startup time:
from pyramid.config import Configurator

import pokemon_api as api

def main(global_config, **settings):
    """ This function returns a Pyramid WSGI application.
    """
    config = Configurator(settings=settings)
    ...
    api.init("MAGIC_CONNECTION_STRING")
    return config.make_wsgi_app()
Of course, this assumes your API already handles transactions, connections, pooling and other boring stuff :)
One last point to mention - in Python you generally don't import things inside methods; you import them at the top of the file, at module-level scope. There are exceptions to that rule, but I don't see why you would need one in this case. Also, importing a module should be free of side effects (which you might have, since "importing my wrapper in my views.py __init__ method" seems to be causing trouble).

Do we need to join in the controller or model when we use SQLAlchemy?

In MVC frameworks, where do you usually embed SQL Alchemy Code, is it ideal to put the query in Controller Methods or just use the Model Methods?
query = session.query(User, Document, DocumentsPermissions).join(Document).join(DocumentsPermissions).filter(Document.name == "Something")
Or do I just delegate this to a model method which takes arguments? What is the preferred way to do this? One of the benefits of the latter is that it can be re-used, and it almost presents a view for the API programmers. Another advantage is that I can easily override it if I make it a class method. This is usually helpful in customizations, especially in commercial software.
# Ctrl.py
self.getdocument("Foo")

# Mdl.py
def getdocument(name):
    query = session.query(User, Document, DocumentsPermissions).join(Document).join(DocumentsPermissions).filter(Document.name == name)
TL;DR: Isn't the concept of the "M" in MVC blurred when you use ORMs like SQLAlchemy? I never had any problems with the Model-View-Controller pattern before.
[PS: I am not sure if this belongs to Code Review Site, if so please let me know, I can just transfer over.]
I strongly prefer the second approach. It has a few advantages:
Your controller code can be dumb. This is good. Controllers that just fetch data from the backend, possibly reformat it a little bit, and pass it on to views are very easy to reason about.
It's easier to test that method in isolation. You can run getdocument('valid_name'), getdocument(None), getdocument(123), etc. to ensure they all work or fail as expected without dealing with all the surrounding controller code.
It's easier to test the controller. You can write a mock for getdocument() so that it always returns a known value and test that your controller processes it correctly.
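A sketch of that kind of test, reusing the Ctrl/Mdl module names from the question (the show_document() entry point is hypothetical):

from unittest import mock

import Ctrl

def test_controller_renders_document():
    with mock.patch('Mdl.getdocument', return_value={'name': 'Foo'}) as fake:
        result = Ctrl.show_document('Foo')  # hypothetical controller entry point
        fake.assert_called_once_with('Foo')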
I tend to put database query code in the Controller rather than the Model. As I understand it, Model methods are used to transform the model's data into something else.
For example, a UserModel may have a FullName() method to return the concatenation of the user's first and last names.
Whereas, a UserController contains a GetAll() method to get a list of all users, which is where the database query code is found.
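A sketch of that split (the names are illustrative, and User is assumed to be a mapped ORM class):

class UserModel(object):
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def full_name(self):
        # transforms the model's own data into something else
        return '%s %s' % (self.first_name, self.last_name)

class UserController(object):
    def __init__(self, session):
        self.session = session

    def get_all(self):
        # the database query code lives in the controller
        return self.session.query(User).all()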

Django Admin using RESTful API vs. Database

This is a bit of a strange question, I know, but bear with me. We've developed a RESTful platform using Python for one of our iPhone apps. The webapp version has been built using Django, which makes use of this API as well. We were thinking it would be a great idea to use Django's built-in control panel capabilities to help manage the data.
This itself isn't the issue. The problem is that everyone has decided it would be best if the admin center were essentially a client that sits on top of the RESTful platform.
So, my question is: is there a way to manipulate the model layer of Django to access our API directly, rather than communicating directly with the database? The model layer would act as the client, passing requests and responses to and from the admin center.
I'm sure this is possible, but I'm not so sure as to where I would start. Any input?
I remember I once thought about doing such a thing. At the time, I created a custom Manager using a custom QuerySet, overrode some methods such as _filter_or_exclude(), count(), exists(), select_related(), ... and added some properties. It took less than a week to become a total mess that probably had no chance of ever working. So I immediately stopped everything and found a more suitable solution.
If I had to do it once again, I would take a long time to consider alternatives. And if it really sounds like the best thing to do, I'd probably create a custom database backend. This backend would, rather than converting Django ORM queries to SQL queries, convert them to HTTP requests.
To do so, I think the best starting point would be to get familiar with django source code concerning database backends.
I also think there are some important things to consider before starting such development:
Is the API able to handle any Django ORM request? Put another way: Will any Django ORM query be translatable to an API request?
If not, may "untranslatable" queries be safely ignored? For instance, an ORDER BY clause might be safe to ignore. While a GROUP BY clause is very unlikely to be safely dismissed.
If some queries can't be neither translated nor ignored, may them be reasonably emulated. For instance, if your API does not support a COUNT() operation, you could emulate it by getting the whole data and count it in python with len(), but is this reasonable?
If they are still some queries that you won't be able to handle (which is more than likely): Are all "common" queries (in this case, all queries potentially used by Django Admin) covered and will it be possible to upgrade the API if an uncovered case is discovered lately or is introduced in a future version of Django?
According to the use case, there are probably tons of other considerations to take, such as:
the integrity of the data
support of transactions
the latency of a query, which will probably be much higher than just querying a local (or even remote) database.

Alternative to singleton?

I'm a Python & App Engine (and server-side!) newbie, and I'm trying to create a very simple CMS. Each deployment of the application would have one (and only one) company object, instantiated from something like:
class Company(db.Model):
    name = db.StringProperty()
    profile = db.TextProperty()
    addr = db.TextProperty()
I'm trying to provide the facility to update the company profile and other details.
My first thought was to have a Company entity singleton. But having looked at (although far from totally grasped) this thread I get the impression that it's difficult, and inadvisable, to do this.
So then I thought that perhaps for each deployment of the CMS I could, as a one-off, run a script (triggered by a totally obscure URL) which simply instantiates Company. From then on, I would get this instance with theCompany = Company.all()[0]
Is this advisable?
Then I remembered that someone in that thread suggested simply using a module. So I just created a Company.py file and stuck a few variables in it. I've tried this in the SDK and it seems to work; to my surprise, modified variable values "survived" between requests.
Forgive my ignorance, but I assume these values are only held in memory rather than on disk, unlike Datastore entities? Is this a robust solution? (And would the module variables be in scope for all invocations of my application's scripts?)
Global variables are "app-cached." This means that each particular instance of your app will remember these variables' values between requests. However, when an instance is shutdown these values will be lost. Thus I do not think you really want to store these values in module-level variables (unless they are constants which do not need to be updated).
I think your original solution will work fine. You could even create the original entity using the remote API tool so that you don't need an obscure page to instantiate the one and only Company object.
You can also make the retrieval of the singleton Company entity a bit faster if you retrieve it by key.
If you will need to retrieve this entity frequently, then you can avoid round-trips to the datastore by using a caching technique. The fastest would be to app-cache the Company entity after you've retrieved it from the datastore. To protect against the entity from becoming too out of date, you can also app-cache the time you last retrieved the entity and if that time is more than N seconds old then you could re-fetch it from the datastore. For more details on this option and how it compares to alternatives, check out Nick Johnson's article Storage options on App Engine.
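A sketch of that timed app-cache (the 60-second window and the key name are illustrative):

import time

_company = None
_fetched_at = 0.0
MAX_AGE_SECONDS = 60

def get_company():
    global _company, _fetched_at
    if _company is None or time.time() - _fetched_at > MAX_AGE_SECONDS:
        _company = Company.get_by_key_name('c')  # fetch by key, no query needed
        _fetched_at = time.time()
    return _company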
It sounds like you are trying to provide a way for your app to be configurable on a per-application basis.
Why not use the datastore to store your company entity with a key_name? Then you will always know how to fetch the company entity, and you'll be able to edit the company without redeploying.
company = Company(key_name='c')
# set stuff on company....
company.put()
# later in code...
company = Company.get_by_key_name('c')
Use memcache to store the details of the company and avoid repeated datastore calls.
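A sketch of that approach using App Engine's memcache API (the key name and expiry are illustrative):

from google.appengine.api import memcache

def get_company():
    company = memcache.get('company')
    if company is None:
        company = Company.get_by_key_name('c')
        memcache.set('company', company, time=300)  # cache for five minutes
    return company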
In addition to memcache, you can use module variables to cache the values. They are cached, as you have seen, between requests.
I think the approach you read about is the simplest:
Use module variables, initialized to None.
Provide accessors (getters/setters) for these variables.
When a variable is accessed, if its value is None, fetch it from the database. Otherwise, just use it.
This way, you'll have app-wide variables provided by the module (which won't be instantiated again and again); they will be shared, and you won't lose them.
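A minimal sketch of that accessor pattern (Company and the key name follow the earlier answers):

_company = None

def get_company():
    global _company
    if _company is None:
        _company = Company.get_by_key_name('c')
    return _company

def set_company(company):
    global _company
    company.put()
    _company = company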
