Pyramid with SQLAlchemy: scoped or non-scoped database session - python

For older versions of Pyramid, the SQLAlchemy session setup was done with scoped_session, similar to this:
from sqlalchemy.orm import scoped_session, sessionmaker
import zope.sqlalchemy

DBSession = scoped_session(
    sessionmaker(
        autoflush=True,
        expire_on_commit=False,
        extension=zope.sqlalchemy.ZopeTransactionExtension()
    )
)
However, I see that newer tutorials, as well as the Pyramid docs, 'promote' SQLAlchemy with no threadlocals, where the DBSession is attached to the request object.
Is the 'old' way broken, and what is the advantage of going without threadlocals?

I spearheaded this transition with help from several other contributors who had blogged [1] about some advantages. It basically boils down to following the Pyramid philosophy of making it possible to write applications that do not require any global variables. This is really important when writing reusable, composable code. It makes your code's dependencies (API surface) clear, instead of having random functions depend on your database despite their function signatures / member variables not exposing that dependency. It also makes it easier to test code, because you don't have to worry as much about threadlocal variables. With globals you need to track down which modules may be holding references to them and patch them to use the new object. Without globals, you simply pass in the objects you want to use and the code uses them, just like any other parameter to a function or state on an object.
A lot of people complain about having to pass their database to tons of functions. This is a smell and just means you aren't designing your APIs well. Many times you can structure things as an object that's created once per request and stores the handle as something like self.dbsession, and each method on the object then has access to it (see the sketch below).
[1] https://metaclassical.com/testing-pyramid-apps-without-a-scoped-session/
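To make that concrete, here is a minimal sketch of the request-bound style, assuming the session has already been attached to the request (e.g. via Pyramid's config.add_request_method); UserService, User, and my_view are hypothetical names, not from the question:

class UserService:
    def __init__(self, dbsession):
        # The session is injected, not pulled from a global.
        self.dbsession = dbsession

    def by_name(self, name):
        # `User` is an assumed mapped class, for illustration only.
        return self.dbsession.query(User).filter_by(name=name).first()

def my_view(request):
    # request.dbsession was wired up once, at configuration time.
    service = UserService(request.dbsession)
    return {'user': service.by_name(request.params['name'])}

Every dependency is visible in a signature, so a test can pass in any session it likes.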

Related

Getting and serializing the state of dynamically created python instances to a relational model

I'm developing a framework of sorts. I'm providing a base class, that will be subclassed by other developers to add behavior to the system. The instances of those classes will have attributes that my framework doesn't necessarily expect, except by inspecting those instances' __dict__. To make things even more interesting, some of those classes can be created dynamically, at any time.
I'd like some things to be handled by the framework, namely, I will need to persist those instances, display their attribute values to the user, and let her search/filter instances using those values.
I have to use a relational database. I know there are some decent Python OO databases out there, but unfortunately they're not an option in this case.
I'm not looking for a full-blown ORM either... and it may not even be an option, given that some of the classes can be created dynamically.
So, my question is, what state of a python instance do I need to serialize to ensure that I can deserialize it later on? Is it enough to look at __dict__, or are there other private attributes that I should be using?
Pickling the instances is not enough, because I'll need to unpickle them to search/filter the attribute values, and I'm afraid it's too much data to do it in-memory (instead of letting the database do it).
Just use an ORM. This is what they are for.
What you are proposing to do is create your own half-assed ORM on your own time. Save your time for your own code that does things, and use the effort other people put for free into solving this problem for you.
Note that all class creation in python is "dynamic" - this is not an issue, for, well, anything at all. In fact, if you are assembling classes programmatically, it is probably slightly easier with an ORM, because they provide reifications of fields.
In the worst case, if you really do need to store your objects in a fake nosql-type schema, you will still only have to write your own backend driver if you use an existing ORM, rather than coding the whole stack yourself. (As it happens, you're not the first person to face this - solutions exist. Google "python orm store dynamically created models" and "sqlalchemy store dynamically created models".)
Candidates include:
Django ORM
SQLAlchemy
Some others you can find by googling "Python ORM".
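On the "dynamically created classes" worry, here is a rough sketch of building a mapped class at runtime, assuming SQLAlchemy 1.4+; make_model and Widget are invented names for illustration:

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

def make_model(name, fields):
    # Build a mapped class at runtime; each name in `fields`
    # becomes a String column.
    attrs = {'__tablename__': name.lower(),
             'id': Column(Integer, primary_key=True)}
    attrs.update({f: Column(String) for f in fields})
    return type(name, (Base,), attrs)

Widget = make_model('Widget', ['color', 'size'])
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(Widget(color='red', size='large'))
    session.commit()

The database can then do the searching and filtering via ordinary queries, rather than unpickling everything into memory.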

Passing custom Python objects to nosetests

I am attempting to re-organize our test libraries for automation and nose seems really promising. My question is, what is the best strategy for passing Python objects into nose tests?
Our tests are organized in a testlib with a bunch of modules that exercise different types of request operations. Something like this:
testlib
\-testmoda
\-testmodb
\-testmodc
In some cases the test modules (e.g. testmoda) are nothing but test_something1(), test_something2() functions, while in other cases we have a TestModB class in testmodb with test_anotherthing1(), test_anotherthing2() methods. The cool thing is that nose easily finds both.
Most of those test functions are request factory stuff that can easily share a single connection to our server farm. Thus we do a lot of test_something1(cnn), TestModB.test_anotherthing2(cnn), etc.
Currently we don't use nose, instead we have a hodge-podge of homegrown driver scripts with hard-coded lists of tests to execute. Each of those driver scripts creates its own connection object. Maintaining those scripts and the connection minutia is painful.
I'd like to take advantage of nose's beautiful discovery functionality for free, passing in a connection object of my choosing.
Thanks in advance!
Rob
P.S. The connection objects are not pickle-able. :(
Could you use a factory to create the connections, then have functions like test_something1() (taking no arguments) use the factory to get a connection?
As far as I can tell, there is no easy way to simply pass custom objects to Nose.
However, as Matt pointed out, there are some viable workarounds to achieve similar results.
Basically, do this:
Setup a data dictionary as a package level global
Add custom objects to that dictionary
Create some factory functions that return those custom objects if they're present/suitable, or create new ones
Refactor the existing testlib\testmod* modules to use the factory (sketched below)
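A minimal sketch of that workaround, where Connection is a hypothetical stand-in for your real, unpicklable connection class:

# testlib/__init__.py
_registry = {}

def get_connection():
    # Create the shared connection lazily on first use.
    if 'cnn' not in _registry:
        _registry['cnn'] = Connection('server-farm-address')  # hypothetical
    return _registry['cnn']

# testlib/testmoda.py
from testlib import get_connection

def test_something1():
    cnn = get_connection()
    # ... exercise a request operation over `cnn` ...

nose discovers test_something1 as usual, and the connection is shared without ever being passed as a test argument.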

Issues with scoped_session in sqlalchemy - how does it work?

I'm not really sure how scoped_session works, other than it seems to be a wrapper that hides several real sessions, keeping them separate for different requests. Does it do this with thread locals?
Anyway the trouble is as follows:
S = elixir.session # = scoped_session(...)
f = Foo(bar=1)
S.add(f) # ERROR, f is already attached to session (different session)
Not sure how f ended up in a different session, I've not had problems with that before. Elsewhere I have code that looks just like that, but actually works. As you can imagine I find that very confusing.
I just don't know anything here, f seems to be magically added to a session in the constructor, but I don't seem to have any references to the session it uses. Why would it end up in a different session? How can I get it to end up in the right session? How does this scoped_session thing work anyway? It just seems to work sometimes, and other times it just doesn't.
I'm definitely very confused.
Scoped session creates a proxy object that keeps a registry of (by default) per-thread session objects, created on demand from the passed session factory. When you access a session method such as ScopedSession.add, it finds the session corresponding to the current thread and returns the add method bound to that session. The active session can be removed using the ScopedSession.remove() method.
ScopedSession has a few convenience methods. One is query_property, which creates a property that returns a query object bound to the scoped session it was created on and the class it was accessed on. The other is ScopedSession.mapper, which adds a default __init__(**kwargs) constructor and by default adds created objects to the scoped session the mapper was created from. This behavior can be controlled by the save_on_init keyword argument to the mapper. ScopedSession.mapper is deprecated because of exactly the problem in this question. This is one case where the Python "explicit is better than implicit" philosophy really applies. Unfortunately, Elixir still uses ScopedSession.mapper by default.
It turns out Elixir sets save_on_init=True on the created mappers. This can be disabled by:
using_mapper_options(save_on_init=False)
This solves the problem. Kudos to stepz on #sqlalchemy for figuring out what was going on immediately. Although I am still curious how scoped_session really works, so if someone answers that, they'll get credit for answering the question.
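Since the "how does it really work" part is still open: conceptually, scoped_session behaves roughly like the following toy re-implementation (heavily simplified; the real registry lives inside SQLAlchemy and also supports custom scope functions):

import threading

class ToyScopedSession:
    def __init__(self, session_factory):
        self._factory = session_factory
        self._local = threading.local()  # one session slot per thread

    def _current(self):
        # Create this thread's session on first use.
        if not hasattr(self._local, 'session'):
            self._local.session = self._factory()
        return self._local.session

    def __getattr__(self, name):
        # Proxy add/query/commit/... to the thread-local session.
        return getattr(self._current(), name)

    def remove(self):
        # Close and discard the current thread's session.
        if hasattr(self._local, 'session'):
            self._local.session.close()
            del self._local.session

So "it just seems to work sometimes" usually means two pieces of code ran on different threads (or before/after a remove()) and therefore got different real sessions.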

What's a good general way to log SQLAlchemy transactions, complete with authenticated user, etc.?

I'm using SQLAlchemy's declarative extension. I'd like all changes to tables to be logged, including changes in many-to-many relationships (mapping tables). Each table should have a separate "log" table with a similar schema, but with additional columns specifying when the change was made, who made the change, etc.
My programming model would be something like this:
row.foo = 1
row.log_version(username, change_description, ...)
Ideally, the system wouldn't allow the transaction to commit without row.log_version being called.
Thoughts?
There are too many questions in one, so full answers to all of them won't fit the Stack Overflow answer format. I'll try to describe hints in short; ask a separate question for any of them if this isn't enough.
Assigning user and description to transaction
The most popular way to do this is assigning the user (and other info) to some global object (threading.local() in a threaded application). This is a very bad way that causes hard-to-discover bugs.
A better way is assigning the user to the session. This is OK when a session is created for each web request (in fact, it's the best design for an application with authentication anyway), since there is only one user using this session. But passing the description this way is not as good.
And my favorite solution is to extend the Session.commit() method to accept an optional user (and probably other info) parameter and assign it to the current transaction. This is the most flexible, and it suits passing a description well too. Note that the info is bound to a single transaction and is passed in an obvious way when the transaction is closed.
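A rough sketch of that commit-extension idea; AuditedSession and the audit_* keys are invented names, but Session.info is a real free-form dict that flush hooks can read:

from sqlalchemy.orm import Session

class AuditedSession(Session):
    def commit(self, user=None, description=None):
        # Stash audit info where before-flush hooks can pick it up.
        self.info['audit_user'] = user
        self.info['audit_description'] = description
        super().commit()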
Discovering changes
There is sqlalchemy.orm.attributes.instance_state(obj), which contains all the information you need. The most useful for you is probably the state.committed_state dictionary, which contains the original state for changed fields (including many-to-many relations!). There is also a state.get_history() method (or the sqlalchemy.orm.attributes.get_history() function) returning a history object with a has_changes() method and added and deleted properties for the new and old values respectively. In the latter case, use state.manager.keys() (or state.manager.attributes) to get a list of all fields.
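As a sketch of that inspection API, using sqlalchemy.inspect(), which returns the same InstanceState object; the helper name changed_fields is invented:

from sqlalchemy import inspect

def changed_fields(obj):
    # Map attribute name -> previous value(s) for anything modified
    # since the last flush.
    state = inspect(obj)
    changes = {}
    for name, attr in state.attrs.items():
        history = attr.history
        if history.has_changes():
            changes[name] = history.deleted
    return changes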
Automatically storing changes
SQLAlchemy supports mapper extensions that can provide hooks before and after update, insert, and delete. You need to provide your own extension with all the before hooks (you can't use the after hooks, since the state of the objects is already changed on flush). For the declarative extension it's easy to write a subclass of DeclarativeMeta that adds a mapper extension to all your models. Note that you have to flush changes twice if you use mapped objects for the log, since a unit of work doesn't account for objects created in hooks.
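A sketch of the same idea using SQLAlchemy's event API (the modern successor to those mapper-extension hooks); the Foo model and foo_log table are invented for illustration:

from sqlalchemy import Column, Integer, String, Table, event
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    bar = Column(Integer)

foo_log = Table(
    'foo_log', Base.metadata,
    Column('id', Integer, primary_key=True),
    Column('foo_id', Integer),
    Column('description', String),
)

@event.listens_for(Foo, 'before_update')
def log_update(mapper, connection, target):
    # Write through the Connection directly: creating new ORM objects
    # here would need the second flush mentioned above.
    connection.execute(
        foo_log.insert().values(
            foo_id=target.id,
            description=getattr(target, '_change_description', None),
        )
    )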
We have a pretty comprehensive "versioning" recipe at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/LogVersions . It seems some other users have contributed some variants on it. The mechanics of "add a row when something changes at the ORM level" are all there.
Alternatively you can also intercept at the execution level using ConnectionProxy, search through the SQLA docs for how to use that.
edit: versioning is now an example included with SQLA: http://docs.sqlalchemy.org/en/rel_0_8/orm/examples.html#versioned-objects

Propagating application settings

Probably a very common question, but I couldn't find a suitable answer yet...
I have a (Python w/ C++ modules) application that makes heavy use of an SQLite database and its path gets supplied by user on application start-up.
Every time some part of the application needs access to the database, I plan to acquire a new session and discard it when done. For that to happen, I obviously need access to the path supplied on startup. Here are a couple of ways I see it happening:
1. Explicit arguments
The database path is passed everywhere it needs to be through an explicit parameter, and a database session is instantiated with that explicit path. This is perhaps the most modular, but seems incredibly awkward.
2. Database path singleton
The database session object would look like:
import foo.options

class DatabaseSession(object):
    def __init__(self, path=foo.options.db_path):
        ...
I consider this to be the lesser-evil singleton, since we're storing only constant strings, which don't change during application runtime. This leaves it possible to override the default and unit test the DatabaseSession class if necessary.
3. Database path singleton + static factory method
Perhaps slight improvement over the above:
def make_session(path=None):
    import foo.options
    if path is None:
        path = foo.options.db_path
    return DatabaseSession(path)

class DatabaseSession(object):
    def __init__(self, path):
        ...
This way the module doesn't depend on foo.options at all, unless we're using the factory method. Additionally, the method can perform stuff like session caching or whatnot.
And then there are other patterns, which I don't know of. I vaguely saw something similar in web frameworks, but I don't have any experience with those. My example is quite specific, but I imagine it also expands to other application settings, hence the title of the post.
I would like to hear your thoughts about what would be the best way to arrange this.
Yes, there are others. Your option 3 though is very Pythonic.
Use a standard Python module to encapsulate options (this is the way web frameworks like Django do it)
Use a factory to emit properly configured sessions.
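A brief sketch of those two points together, assuming the module layout from the question (foo/options.py as the settings module) and plain sqlite3 standing in for the real driver:

# foo/options.py -- plain module-level settings, filled in at startup
db_path = None

# foo/db.py
import sqlite3
import foo.options

def make_session(path=None):
    # Factory: fall back to the configured default path.
    if path is None:
        path = foo.options.db_path
    return sqlite3.connect(path)

# application entry point
# foo.options.db_path = path_supplied_by_user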
Since SQLite already has a "connection", why not use that? What does your DatabaseSession class add that the built-in connection lacks?
