I'm constructing my app such that each user has their own database (for easy isolation, and to minimize the need for sharding). This means that each web request, and all of the background scripts, need to connect to a different database based on which user is making the request, and use that connection for all function calls.
I figure I can make some sort of middleware that would pass the right connection to my web requests by attaching it to the request variable, but I don't know how I should ensure that all functions and model methods called by the request use this connection.
Well how to "ensure that all functions and model methods called by the request use this connection" is easy. You pass the connection into your api as with any well-designed code that isn't relying on global variables for such things. So you have a database session object loaded per-request, and you pass it down. It's very easy for model objects to turtle that session object further without explicitly passing it because each managed object knows what session owns it, and you can query it from there.
db = request.db
user = db.query(User).get(1)
user.add_group('foo')
class User(Base):
    def add_group(self, name):
        # Look up the session that already owns this User instance,
        # instead of having the caller pass it in explicitly.
        db = sqlalchemy.orm.object_session(self)
        group = Group(name=name)
        db.add(group)
I'm not recommending you use that exact pattern but it serves as an example of how to grab the session from a managed object, avoiding having to pass the session everywhere explicitly.
On to your original question, how to handle multi-tenancy... In your data model! Designing a system where you are splitting things up at that low a level is a big maintenance burden and it does not scale well. For example, it becomes very difficult to use any kind of connection pooling when you have an arbitrary number of independent connections. To get around that, people commonly use the SQL SCHEMA feature supported by some databases. That allows you to use the same connection but have access to a different table structure per session. That's better, but again, managing all of those schemas independently should raise some red flags: it violates DRY with all of that duplication in your data model, and any duplication at that level quickly becomes a burden you need to be ready for.
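If you do go the schema route, here is a minimal sketch of what selecting a tenant's schema per request can look like, assuming PostgreSQL and SQLAlchemy; the DSN and the schema-name convention ("tenant_42" and so on) are placeholders I've made up, not something from the question:

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://app:secret@localhost/appdb")  # placeholder DSN

def connection_for_tenant(schema_name):
    """Return a connection whose search_path points at one tenant's schema."""
    conn = engine.connect()
    # schema_name must come from your own tenant registry, never from user
    # input, because it is interpolated directly into the statement.
    conn.execute(text("SET search_path TO %s" % schema_name))
    return conn

# e.g. bind a session to it: Session(bind=connection_for_tenant("tenant_42"))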
Background:
This is the situation I am facing and so far my current solution seems rather clunky. I want to improve on it. Right now:
I set up connections to each database in the main function of the Pyramid application:
def main(global_config, **settings):
    a_engine = engine_from_config(settings, 'A.')
    b_engine = engine_from_config(settings, 'B.')
    ASession.configure(bind=a_engine)
    BSession.configure(bind=b_engine)
"ASession" and "BSession" are simply globally defined scoped_session in /models/init.py.
ASession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
I define the model base class like so. For example:
ABase = declarative_base()

class user(ABase):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String)
This somehow already doesn't feel very clean. But now that this model is supposed to be accessed from a different application, I also need to define the engine and connection again in that application. This feels extremely redundant.
Problem Abstracted:
Assume that there are 2 different databases:
A and B
Also assume that you want A and B to be accessible from 2 different applications (e.g.: Pyramid application, Bokeh Server App which uses Tornado) using the same model.
In short, how would one best pattern objects/models/classes/functions to produce clean non-redundant code in Python3?
Initial Thought After The Question Was Posted:
Thinking about this a bit more, I think I want each model to be somehow "self-contained". The model should bring with it methods for initiating connections. In other words, the initiation of db connections should be decoupled from the web application itself.
And it should be done in an instance kind of manner. So that multiple applications can use the same models. Each application would have its own session connection to either DB.
How would the community pattern this? Friday afternoons don't lend themselves to finding answers to these kinds of questions, for me at least.
I have done this. My recommendation below is how I like doing it, but is not the only way. I would ditch scoped sessions and the transaction manager and make explicit session management objects, with request lifecycle callbacks handling creation, closing, committing, or rolling back your sessions. Basically scoped sessions are a way to simulate a global by getting the same item for that thread of execution. The other way to do this in Pyramid is to attach things to the registry and the request, because you have those everywhere. You attach shared components to the registry (the ZCA) and per-request objects to the request.
When you have multiple sessions, I've found it much easier to reason about them and keep track of them if they are handled by components that wrap up everything for that engine. So for a case like that you describe, I've made two different DB engine components, that are created on start up, attached to the registry, and have a method for getting a fresh session. If you create these components properly, they should be usable in any application, whether it's Pyramid, Tornado, or your test script. You just make sure it has a constructor with some sane way of passing in settings for setting up the engine, whether it's a settings dict or kwargs. I then make my data model(s) live in their own python packages and it's easy to have any app in the family import the model, instantiate the engine components and go to town. Note that if you like using the ZCA registry (and I love it, it's a fantastic DI system), there's nothing preventing you from using it in non-pyramid apps, you just set it up manually in your server start up code.
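As a rough sketch of such a component (the class name, the 'finance.url' setting key, and the registration lines are my own illustrations, not anything Pyramid or SQLAlchemy prescribe):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

class DBComponent:
    """Wraps one engine and hands out fresh sessions on demand."""

    def __init__(self, url, **engine_kwargs):
        self.engine = create_engine(url, **engine_kwargs)
        self.Session = sessionmaker(bind=self.engine)

    @classmethod
    def from_settings(cls, settings, prefix):
        # Works for a Pyramid settings dict, a Tornado config, or a test harness.
        return cls(settings[prefix + 'url'])

    def make_session(self):
        return self.Session()

# At application start-up (Pyramid shown, but any app can do the same):
#     finance_db = DBComponent.from_settings(settings, 'finance.')
#     config.registry.finance_db = finance_db        # or registerUtility() if you use the ZCA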
In Pyramid specifically, I make a custom Request class and use the reify decorator to allow other Pyramid code to get the session(s) for that request. The request class has end-of-life callbacks attached to close out the sessions and to do rollbacks or commits. There is a bit more boilerplate, but for me it's cleaner in that I can very easily trace where and when, in code and in time, my session management is happening. It also makes testing easier.
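A condensed sketch of that request class, reusing the hypothetical finance_db component from the snippet above (stashed on the registry at start-up); whether you commit in a finished callback or a response callback is a judgment call, and request.exception here just distinguishes errored requests:

from pyramid.decorator import reify
from pyramid.request import Request

class MyRequest(Request):

    @reify
    def finance_session(self):
        # Created lazily the first time a view asks for it; reify caches it
        # on the request for the rest of the request's life.
        session = self.registry.finance_db.make_session()
        self.add_finished_callback(close_finance_session)
        return session

def close_finance_session(request):
    # Runs at the very end of the request lifecycle.
    session = request.__dict__.get('finance_session')
    if session is None:
        return
    try:
        if request.exception is None:
            session.commit()
        else:
            session.rollback()
    finally:
        session.close()

# In main(): config.set_request_factory(MyRequest)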
That said, there are lots of smart folks in SQLAlchemy/Pyramid land who swear by scoped sessions and the transaction manager, so there are other valid approaches. Hope that helps.
I'm building a small website and I already have all my models in SQLAlchemy. The website is to publish some information from some calculations which are done offline. Only the results will be published to a slimmed-down database, i.e. it contains the results, not the raw data, but the website needs to query those results.
I'm going to use Flask, as my models are already driven with Python (and some heavy lifting in C++ via SWIG) and I don't want to use Django.
Now this has been asked before I'm sure and the usual mantra without much justification is to 'use Flask-SQLAlchemy'. The question is why?
If I write some session handling myself, why do I have to go through the additional layer of redefining my database in Flask-SQLAlchemy? Other than having to write some code like the following somewhere in my Flask app:
@app.before_request
def before_request():
    g.db = connect_db()

@app.teardown_request
def teardown_request(exception):
    db = getattr(g, 'db', None)
    if db is not None:
        db.close()
What else do I need to worry about? SQLAlchemy even does connection pooling for me by default.
You are building a Flask web application that does its database work through SQLAlchemy. When your application handles multiple requests, each working with a database session, you have to make sure that sessions are created and closed carefully.
If you read the SQLAlchemy docs, they recommend keeping the lifecycle of the session separate and external from the functions and objects that access and/or manipulate database data. This greatly helps with achieving a predictable and consistent transactional scope.
A web application is the easiest case because such an application is already constructed around a single, consistent scope - this is the request, which represents an incoming request from a browser, the processing of that request to formulate a response, and finally the delivery of that response back to the client. Integrating web applications with the Session is then the straightforward task of linking the scope of the Session to that of the request. The Session can be established as the request begins, or using a lazy initialization pattern which establishes one as soon as it is needed. The request then proceeds, with some system in place where application logic can access the current Session in a manner associated with how the actual request object is accessed. As the request ends, the Session is torn down as well, usually through the usage of event hooks provided by the web framework. The transaction used by the Session may also be committed at this point, or alternatively the application may opt for an explicit commit pattern, only committing for those requests where one is warranted, but still always tearing down the Session unconditionally at the end.
In layman's terms, what I mean is this:
SQLAlchemy gives that advice because sessions in a web application should be scoped to the request, meaning that each request handler creates and destroys its own session.
This is necessary because web servers can be multi-threaded, so multiple requests might be served at the same time, each working with a different database session.
This means that if you use plain SQLAlchemy with Flask, you have to handle sessions manually: create a scoped session and carefully remove it on each request, or you can get into serious trouble. That adds an extra layer of complexity to your web application.
This is where Flask-SQLAlchemy (an extension of the SQLAlchemy library for Flask apps) comes in: it provides the infrastructure to assist in aligning the lifespan of a Session with that of each web request. In fact, the SQLAlchemy docs themselves recommend using it with Flask.
Flask-SQLAlchemy creates a fresh scoped session for each request. If you dig into its source you will find that it also installs a hook on app.teardown_appcontext (for Flask >= 0.9), app.teardown_request (for Flask 0.7-0.8), or app.after_request (for Flask < 0.7), and that is where it calls db.session.remove().
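For a feel of what that buys you, here is a minimal sketch in the Flask-SQLAlchemy style; the model, route, and SQLite URI are placeholders I made up, and db.session is the request-scoped session the extension manages for you:

from flask import Flask, jsonify
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///results.db'  # placeholder
db = SQLAlchemy(app)

class Result(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    value = db.Column(db.Float)

@app.route('/results/<int:result_id>')
def show_result(result_id):
    # The extension removes the session for you at the end of the app context.
    result = Result.query.get_or_404(result_id)
    return jsonify(id=result.id, value=result.value)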
The code you put in the question is not actually valid for SQLAlchemy integration in Flask. I know it is just an example, but I'm saying that just in case.
For SQLAlchemy integration, all you need to do is make sure the current DbSession is cleaned up at the end of the request via something like this:
@app.teardown_appcontext
def shutdown_session(exception=None):
    DbSession.remove()
where DbSession is the scoped session.
Here is the documentation for the case when you don't want to use the Flask-SQLAlchemy package.
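Put together, a minimal sketch of that plain-SQLAlchemy setup might look like this (DbSession, the engine URL, and the module layout are placeholders):

from flask import Flask
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine('sqlite:///results.db')  # placeholder URL
DbSession = scoped_session(sessionmaker(bind=engine))

app = Flask(__name__)

@app.teardown_appcontext
def shutdown_session(exception=None):
    # Returns the connection to the pool and discards the thread-local session.
    DbSession.remove()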
I am using a Python module (PyCLIPS) and Django 1.3.
I want to develop a thread-safe class that implements the Object Pool and Singleton patterns and that also has to be shared between requests in Django.
For example, I want to do the following:
A request gets the object with some ID from the pool, does something with it, pushes it back to the pool, and then sends a response containing the object's ID.
Another request that has the object's ID gets the object with that ID from the pool and repeats the steps above.
But the object's state has to be preserved while it sits in the pool, for as long as the server is running.
It should be like a Singleton Session Bean in Java EE
How should I do it? Is there something I should read?
Update:
I can't store objects from the pool in a database, because these objects are wrappers around a library written in C, which is the API for the CLIPS expert system engine.
Thanks!
Well, I think a different angle is necessary here. Django is not like Java; the solution should be tailored for a multi-process environment, not a multi-threaded one.
Django has no immediate equivalent of a singleton session bean.
That said, I see no reason your description does not fit a classic database model. You want to save per object data, which should always go in the DB layer.
Otherwise, you can always save stuff on the session, which Django provides for both logged-in users as well as for anonymous ones - see the docs on Django sessions.
Usage of any other pattern you might be familiar with from a Java environment will ultimately fail, considering the vast difference between running a Java web container, and the Python/Django multi-process environment.
Edit: well, considering these objects are not native to your app but rather accessed via a third-party library, it does complicate things. My gut feeling is that these objects should not be handled by the web layer but rather by some sort of external service which you can access from a multi-process environment. As Daniel mentioned, you can always throw them in the cache (if said objects are pickle-able). But it feels as if these objects do not belong in the web tier.
Assuming the object cannot be pickled, you will need to create an app to manage the object and all of the interactions that need to happen against it. Probably the easiest implementation would be to create a single process wsgi app (on a different port) that exposes an api to do all of the operations that you need. Whether you use a RESTful api or form posts is up to your personal preference.
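To make that concrete, here is a very rough sketch of such a single-process service using only the standard library; the in-memory pool, URL scheme, and port are all invented for illustration (the real CLIPS wrapper objects would live where POOL is):

import json
from wsgiref.simple_server import make_server

POOL = {}   # object_id -> long-lived (unpicklable) object, all in one process

def application(environ, start_response):
    path = environ.get('PATH_INFO', '')
    if path.startswith('/objects/'):
        obj_id = path.split('/')[2]
        obj = POOL.get(obj_id)
        body = json.dumps({'id': obj_id, 'found': obj is not None})
        start_response('200 OK', [('Content-Type', 'application/json')])
        return [body.encode('utf-8')]
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'not found']

if __name__ == '__main__':
    # Run single-threaded on its own port; the Django app talks to it over HTTP.
    make_server('127.0.0.1', 8001, application).serve_forever()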
Are these database objects? Because if so, the db itself is really the pool, and there's no need to do anything special - each request can independently load the instance from the db, modify it, and save it back.
Edit after comment: Well, the biggest problem is that a production web server environment is likely to be multi-process, so any global variables (i.e. the pool) are not shared between processes. You will need to store them somewhere that's globally accessible. A shot in the dark, but are they serializable using Pickle? If so, then perhaps memcache might work.
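If they do turn out to be picklable, a small sketch of the cache approach via Django's cache framework (the key naming and timeout are arbitrary choices on my part):

from django.core.cache import cache

def check_out(object_id):
    # Returns None if the object is missing or has been evicted.
    return cache.get('pool:%s' % object_id)

def check_in(object_id, obj, timeout=3600):
    # Django pickles the value for you when the backend is memcached.
    cache.set('pool:%s' % object_id, obj, timeout)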
I have a Pylons-based web application which connects via Sqlalchemy (v0.5) to a Postgres database. For security, rather than follow the typical pattern of simple web apps (as seen in just about all tutorials), I'm not using a generic Postgres user (e.g. "webapp") but am requiring that users enter their own Postgres userid and password, and am using that to establish the connection. That means we get the full benefit of Postgres security.
Complicating things still further, there are two separate databases to connect to. Although they're currently in the same Postgres cluster, they need to be able to move to separate hosts at a later date.
We're using sqlalchemy's declarative package, though I can't see that this has any bearing on the matter.
Most examples of sqlalchemy show trivial approaches such as setting up the Metadata once, at application startup, with a generic database userid and password, which is used throughout the web application. This is usually done with Metadata.bind = create_engine(), sometimes even at module level in the database model files.
My question is, how can we defer establishing the connections until the user has logged in, and then (of course) re-use those connections, or re-establish them using the same credentials, for each subsequent request.
We have this working -- we think -- but I'm not only not certain of the safety of it, I also think it looks incredibly heavy-weight for the situation.
Inside the __call__ method of the BaseController we retrieve the userid and password from the web session, call sqlalchemy create_engine() once for each database, then call a routine which calls Session.bind_mapper() repeatedly, once for each table that may be referenced on each of those connections, even though any given request usually references only one or two tables. It looks something like this:
# in lib/base.py, on the BaseController class
def __call__(self, environ, start_response):
    # note: web session contains {'username': XXX, 'password': YYY}
    url1 = 'postgres://%(username)s:%(password)s@server1/finance' % session
    url2 = 'postgres://%(username)s:%(password)s@server2/staff' % session
    finance = create_engine(url1)
    staff = create_engine(url2)
    db_configure(staff, finance)  # see below
    ... etc

# in another file
Session = scoped_session(sessionmaker())

def db_configure(staff, finance):
    s = Session()

    from db.finance import Employee, Customer, Invoice
    for c in [Employee, Customer, Invoice]:
        s.bind_mapper(c, finance)

    from db.staff import Project, Hour
    for c in [Project, Hour]:
        s.bind_mapper(c, staff)

    s.close()  # prevents leaking connections between sessions?
So the create_engine() calls occur on every request... I can see that being needed, and the Connection Pool probably caches them and does things sensibly.
But calling Session.bind_mapper() once for each table, on every request? Seems like there has to be a better way.
Obviously, since a desire for strong security underlies all this, we don't want any chance that a connection established for a high-security user will inadvertently be used in a later request by a low-security user.
Binding global objects (mappers, metadata) to a user-specific connection is not a good way, and neither is using a scoped session. I suggest creating a new session for each request and configuring it to use user-specific connections. The following sample assumes that you use separate metadata objects for each database:
binds = {}
finance_engine = create_engine(url1)
binds.update(dict.fromkeys(finance_metadata.sorted_tables, finance_engine))
# The following line is required when mappings to joint tables are used (e.g.
# in joint table inheritance) due to bug (or misfeature) in SQLAlchemy 0.5.4.
# This issue might be fixed in newer versions.
binds.update(dict.fromkeys([Employee, Customer, Invoice], finance_engine))
staff_engine = create_engine(url2)
binds.update(dict.fromkeys(staff_metadata.sorted_tables, staff_engine))
# See comment above.
binds.update(dict.fromkeys([Project, Hour], staff_engine))
session = sessionmaker(binds=binds)()
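One way to wire that into the per-request flow sketched in the question (the helper name is mine, and the class-level bind workaround from the comments above is omitted for brevity):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def make_user_session(username, password):
    """Build a throwaway session bound to this user's own credentials."""
    finance_engine = create_engine(
        'postgres://%s:%s@server1/finance' % (username, password))
    staff_engine = create_engine(
        'postgres://%s:%s@server2/staff' % (username, password))
    binds = {}
    binds.update(dict.fromkeys(finance_metadata.sorted_tables, finance_engine))
    binds.update(dict.fromkeys(staff_metadata.sorted_tables, staff_engine))
    return sessionmaker(binds=binds)()

# in BaseController.__call__:
#     self.db = make_user_session(session['username'], session['password'])
#     ... and self.db.close() when the request is finished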
I would look at the connection pooling and see if you can't find a way to have one pool per user.
You can dispose() the pool when the user's session has expired.
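A sketch of what "one pool per user" could look like: keep an engine per user and dispose of it when their web session expires (the cache dict and the expiry hook are my assumptions, not Pylons or SQLAlchemy APIs):

from sqlalchemy import create_engine

_engines = {}

def engine_for(username, password, host, dbname):
    key = (username, host, dbname)
    if key not in _engines:
        _engines[key] = create_engine(
            'postgres://%s:%s@%s/%s' % (username, password, host, dbname))
    return _engines[key]

def on_session_expired(username, host, dbname):
    engine = _engines.pop((username, host, dbname), None)
    if engine is not None:
        engine.dispose()  # closes all pooled connections for that user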
I'm new to D-Bus (and to Python, double whammy!) and I am trying to figure out the best way to do something that was discussed in the tutorial.
However, a text editor application could as easily own multiple bus names (for example, org.kde.KWrite in addition to generic TextEditor), have multiple objects (maybe /org/kde/documents/4352 where the number changes according to the document), and each object could implement multiple interfaces, such as org.freedesktop.DBus.Introspectable, org.freedesktop.BasicTextField, org.kde.RichTextDocument.
For example, say I want to create a wrapper around flickrapi such that the service can expose a handful of Flickr API methods (say, urls_lookupGroup()). This is relatively straightforward if I want to assume that the service will always be specifying the same API key and that the auth information will be the same for everyone using the service.
Especially in the latter case, I cannot really assume this will be true.
Based on the documentation quoted above, I am assuming there should be something like this:
# Get the connection proxy object.
flickrConnectionService = bus.get_object("com.example.FlickrService",
                                          "/Connection")

# Ask the connection object to connect; the return value would be
# maybe something like "/Connection/5512" ...
flickrObjectPath = flickrConnectionService.connect("MY_APP_API_KEY",
                                                   "MY_APP_API_SECRET",
                                                   flickrUsername)

# Get the service proxy object.
flickrService = bus.get_object("com.example.FlickrService",
                               flickrObjectPath)

# Ask the flickr service object to get group information.
groupInfo = flickrService.getFlickrGroupInfo('s3a-belltown')
So, my questions:
1) Is this how this should be handled?
2) If so, how will the service know when the client is done? Is there a way to detect if the current client has broken connection so that the service can cleanup its dynamically created objects? Also, how would I create the individual objects in the first place?
3) If this is not how this should be handled, what are some other suggestions for accomplishing something similar?
I've read through a number of D-Bus tutorials and various documentation and about the closest I've come to seeing what I am looking for is what I quoted above. However, none of the examples look to actually do anything like this so I am not sure how to proceed.
1) Mostly yes, I would only change one thing in the connect method as I explain in 2).
2) D-Bus connections are not persistent; everything is done with request/response messages, and no connection state is stored unless you implement it yourself in extra objects, as you do with your Flickr connection object. The D-Bus objects in the Python bindings are mostly proxies that abstract the remote objects as if you were "connected" to them, but what they really do is build messages based on the information you give at proxy instantiation (object path, interface, and so on). So the service cannot know when the client is done unless the client announces it with another explicit call.
To handle unexpected client termination you can create a D-Bus object in the client and send its object path to the service when connecting; change your connect method to also accept an object path parameter. The service can then listen to the NameOwnerChanged signal to know when a client has died.
To create the individual objects you only have to instantiate an object in the same service, as you do with your "/Connection", but you have to be sure you are using a path that doesn't already exist. You could have a "/Connection/Manager" and various "/Connection/1", "/Connection/2", and so on.
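A sketch of that layout with dbus-python follows; the bus name and paths mirror the question, but the method signatures, the counter scheme, and the signal handling are my own illustration, and a running main loop (e.g. via DBusGMainLoop) is assumed so the signal actually gets delivered:

import dbus
import dbus.service

class FlickrConnection(dbus.service.Object):
    """One per-client object, exported at /Connection/N."""
    def __init__(self, bus_name, path, api_key, api_secret, user):
        dbus.service.Object.__init__(self, bus_name, path)
        self.api_key, self.api_secret, self.user = api_key, api_secret, user

class ConnectionManager(dbus.service.Object):
    def __init__(self, bus):
        self._bus_name = dbus.service.BusName("com.example.FlickrService", bus)
        dbus.service.Object.__init__(self, self._bus_name, "/Connection")
        self._counter = 0
        self._clients = {}  # client unique bus name -> FlickrConnection
        # Watch for clients dropping off the bus without saying goodbye.
        bus.add_signal_receiver(self._name_owner_changed,
                                signal_name="NameOwnerChanged",
                                dbus_interface="org.freedesktop.DBus")

    @dbus.service.method("com.example.FlickrService",
                         in_signature="sss", out_signature="o",
                         sender_keyword="sender")
    def connect(self, api_key, api_secret, user, sender=None):
        self._counter += 1
        path = "/Connection/%d" % self._counter
        self._clients[sender] = FlickrConnection(
            self._bus_name, path, api_key, api_secret, user)
        return path

    def _name_owner_changed(self, name, old_owner, new_owner):
        if new_owner == "":
            obj = self._clients.pop(name, None)
            if obj is not None:
                obj.remove_from_connection()  # unexport the orphaned object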
3) If you need to store the connection state, you have to do something like that.