I am writing a Python app to work with Mongodb on the backend with pymongo.
I have created a model classes to mirror documents inserted on the database
this is a quick example of models in action:
from kivy.app import App   # the App base class with build()/on_stop()
from pymongo import MongoClient

import models

class MyApp(App):
    # keep a handle on the client so its connections can be closed on exit;
    # the model classes use the 'test' database on this client for their collections
    db_client = MongoClient(replicaset='erpRS')

    def build(self):
        model = models.UserModel(name='NewUser', cpf='01234567890')
        print(model)
        model.save()
        model_from_db = models.UserModel.objects.find_one()
        print(model_from_db)

    def on_stop(self):
        self.db_client.close()

if __name__ == '__main__':
    MyApp().run()
My model classes (e.g. UserModel) create an instance of Collection as a class property to be used for CRUD operations. So in the snippet above, I create a user and save it to the db, then I fetch a previously saved user from the db. When the app stops, on_stop() is called and the pymongo client and its connections are closed.
This code is working as it is intended.
But monitoring the database I can see that just this snippet above, with a single model instance, and thus a single Collection instance, and two calls to CRUD operations, opens as many as 6 connections to the database.
On further inspection MongoDB itself seems to keep a number of connections open. After a single run of this program, my 3-node replica set, all running on the same machine, has 19 connections open.
Is this a reasonable behavior? Are there supposed to be this many open connections?
MongoDB uses a connection pool, so you shouldn't worry too much about having 19 connections open. The cost of one connection is roughly 1 MB of memory on the node.
What you should think about is refactoring your code to use dependency injection (DI) and pass a single MongoClient object around.
Creating a new MongoClient, and with it a new connection pool, every time you instantiate a class is far too expensive.
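For instance, a minimal sketch of that idea; the UserModel here is illustrative, not the asker's actual model class:
from pymongo import MongoClient

# One client for the whole app; every model reuses its connection pool.
client = MongoClient(replicaset='erpRS')

class UserModel(object):
    def __init__(self, collection, **fields):
        # the Collection is injected instead of being created inside the class
        self.collection = collection
        self.fields = fields

    def save(self):
        return self.collection.insert_one(self.fields)

# usage
users = client.test.users
UserModel(users, name='NewUser', cpf='01234567890').save()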
I am using peewee to access a SQLite DB.
I have made a model.py like:
from peewee import *

db = SqliteDatabase('people.db')

class Person(Model):
    name = CharField()
    birthday = DateField()
    is_relative = BooleanField()

    class Meta:
        database = db
In another Python file (with import model) I then manipulate the DB with calls like Person.create() or Person.select(name=='Joe').delete_instance().
The Quickstart says at the end to call db.close() to close the connection. Does this apply to my case as well? Am I supposed to call something like model.db.close()?
According to Charles Leifer, the maker of peewee, it is the programmer's job to terminate connections. The documentation about connection pools says that all connections are thread-local, so as long as the model is in use the connection stays open, and it dies when the thread containing the transaction joins the main thread.
Charles explicitly answers a question about the connection pool. The answer is a bit generalized, but I suppose it applies to all connections equally: About connection pool
Implicit answers on the topic:
Error 2006: MySQL server has gone away
Excerpt from the docs Quickstart Page:
Although it’s not necessary to open the connection explicitly, it is good practice since it will reveal any errors with your database connection immediately, as opposed to some arbitrary time later when the first query is executed. It is also good to close the connection when you are done – for instance, a web app might open a connection when it receives a request, and close the connection when it sends the response.
The final answer to your question, based on this information, is: No.
You open and close the connection manually.
In your case (with db = SqliteDatabase('people.db')),
you establish the connection to the database with:
db.connect()
Then you do whatever you want with the database, and finally you close the connection with:
db.close()
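Put together, a minimal sketch of that pattern using the Person model from your model.py; the create/get calls are only illustrative:
from datetime import date

from model import Person, db   # the model.py shown in the question

db.connect()
try:
    Person.create(name='Joe', birthday=date(1985, 1, 1), is_relative=False)
    joe = Person.get(Person.name == 'Joe')
    joe.delete_instance()
finally:
    db.close()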
I'm having a problem using the Perspective Broker feature of Twisted Python. The structure of my code is like this:
from twisted.application import service
from twisted.spread import pb

class DBService(service.Service):
    def databaseOperation(self, input):
        # insert input into DB
        pass

class PerspectiveRoot(pb.Root):
    def __init__(self, service):
        self.service = service

    def remote_databaseOperation(self, input):
        return self.service.databaseOperation(input)

db_service = DBService()
pb_factory = pb.PBServerFactory(PerspectiveRoot(db_service))
I hook up the factory to a TCP server and then multiple clients connect, who are able to insert records into the database via the remote_databaseOperation function.
This works fine until the number of requests gets large, at which point I end up with duplicate inputs and missing inputs. I assume this is because DBService's input variable persists and gets overwritten during simultaneous requests. Is this correct? And, if so, what's the best way to rewrite my code so it can deal with simultaneous requests?
My first thought is to have DBService maintain a list of DB additions and loop through it, while clients are able to append to the list. Is this the most 'Twisted' way to do it?
Alternatively, is there a separate pb.Root instance for every client? In which case, I could move the database operation into there since its variables won't get overwritten.
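For what it's worth, here is a rough sketch of the "list of pending additions" idea from two paragraphs above; LoopingCall is standard Twisted, while insert_into_db is only a stand-in for the real database write:
from twisted.application import service
from twisted.internet import task

def insert_into_db(item):
    print('inserting', item)   # stand-in for the real database write

class DBService(service.Service):
    def __init__(self):
        self.pending = []                        # inputs queued by clients
        self.loop = task.LoopingCall(self.flush)

    def startService(self):
        service.Service.startService(self)
        self.loop.start(1.0)                     # drain the queue once per second

    def databaseOperation(self, input):
        # each call gets its own local 'input'; it is only appended here
        self.pending.append(input)

    def flush(self):
        batch, self.pending = self.pending, []
        for item in batch:
            insert_into_db(item)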
I'm constructing my app such that each user has their own database (for easy isolation, and to minimize the need for sharding). This means that each web request, and all of the background scripts, need to connect to a different database based on which user is making the request, and use that connection for all function calls.
I figure I can make some sort of middleware that would pass the right connection to my web requests by attaching it to the request variable, but I don't know how I should ensure that all functions and model methods called by the request use this connection.
Well, how to "ensure that all functions and model methods called by the request use this connection" is easy: you pass the connection into your API, as with any well-designed code that doesn't rely on global variables for such things. So you have a database session object loaded per request, and you pass it down. It's also easy for model objects to carry that session object further down without it being passed explicitly, because each managed object knows which session owns it, and you can query the session from there.
db = request.db
user = db.query(User).get(1)
user.add_group('foo')

class User(Base):
    def add_group(self, name):
        db = sqlalchemy.orm.object_session(self)
        group = Group(name=name)
        db.add(group)
I'm not recommending you use that exact pattern but it serves as an example of how to grab the session from a managed object, avoiding having to pass the session everywhere explicitly.
On to your original question, how to handle multi-tenancy... In your data model! Designing a system where you are splitting things up at that low of a level is a big maintenance burden and it does not scale well. For example it becomes very difficult to use any type of connection pooling when you have an arbitrary number of independent connections. To get around that people commonly use the SQL SCHEMA feature supported by some databases. That allows you to use the same connection but have access to a different table structure per session. That's better, but again managing all of those schemas independently should raise some red flags, violating DRY with all of that duplication in your data model. Any duplication at that level quickly becomes a burden that you need to be ready for.
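As a concrete illustration of the SCHEMA route, here is a minimal sketch assuming PostgreSQL and a reasonably recent SQLAlchemy (the schema_translate_map execution option); the DSN and tenant names are made up:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# One engine, one connection pool, shared by every tenant.
engine = create_engine('postgresql://app:secret@localhost/app')

def session_for_tenant(tenant_schema):
    # Unqualified table names in the ORM mappings (schema=None) are rewritten
    # to live inside this tenant's schema for the life of the session.
    tenant_engine = engine.execution_options(
        schema_translate_map={None: tenant_schema})
    return sessionmaker(bind=tenant_engine)()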
Taking the following code into account:
http_server = tornado.httpserver.HTTPServer(app)
http_server.bind(options.port)
http_server.start(5)
What is the relation between the five subprocesses? Does the database connection instance created along with the application get shared by the subprocesses?
What is the best practice for using http_server.start(5)?
Many thanks.
It depends on where you connect to the DB. The simplest method is to use one shared DB connection, kept as an attribute on the Application or on a RequestHandler class. In that case, the single connection instance will be shared among all server processes.
For an example implementation, see the Blog demo app.
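For instance, a minimal sketch of that layout, loosely modelled on the blog demo; sqlite3 merely stands in for whatever driver the app actually uses:
import sqlite3

import tornado.web

class Application(tornado.web.Application):
    def __init__(self):
        handlers = [(r'/', MainHandler)]
        tornado.web.Application.__init__(self, handlers)
        # Created once when the Application is built (i.e. before
        # http_server.start(5) forks); handlers reach it via self.application.db.
        self.db = sqlite3.connect('blog.db')

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        value = self.application.db.execute('SELECT 1').fetchone()[0]
        self.write('db says %d' % value)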
I have a Pylons-based web application which connects via Sqlalchemy (v0.5) to a Postgres database. For security, rather than follow the typical pattern of simple web apps (as seen in just about all tutorials), I'm not using a generic Postgres user (e.g. "webapp") but am requiring that users enter their own Postgres userid and password, and am using that to establish the connection. That means we get the full benefit of Postgres security.
Complicating things still further, there are two separate databases to connect to. Although they're currently in the same Postgres cluster, they need to be able to move to separate hosts at a later date.
We're using sqlalchemy's declarative package, though I can't see that this has any bearing on the matter.
Most examples of sqlalchemy show trivial approaches such as setting up the Metadata once, at application startup, with a generic database userid and password, which is used through the web application. This is usually done with Metadata.bind = create_engine(), sometimes even at module-level in the database model files.
My question is, how can we defer establishing the connections until the user has logged in, and then (of course) re-use those connections, or re-establish them using the same credentials, for each subsequent request.
We have this working -- we think -- but I'm not only not certain of the safety of it, I also think it looks incredibly heavy-weight for the situation.
Inside the __call__ method of the BaseController we retrieve the userid and password from the web session, call sqlalchemy create_engine() once for each database, then call a routine which calls Session.bind_mapper() repeatedly, once for each table that may be referenced on each of those connections, even though any given request usually references only one or two tables. It looks something like this:
# in lib/base.py, on the BaseController class
def __call__(self, environ, start_response):
    # note: web session contains {'username': XXX, 'password': YYY}
    url1 = 'postgres://%(username)s:%(password)s@server1/finance' % session
    url2 = 'postgres://%(username)s:%(password)s@server2/staff' % session
    finance = create_engine(url1)
    staff = create_engine(url2)
    db_configure(staff, finance)  # see below
    ... etc

# in another file
Session = scoped_session(sessionmaker())

def db_configure(staff, finance):
    s = Session()

    from db.finance import Employee, Customer, Invoice
    for c in [Employee, Customer, Invoice]:
        s.bind_mapper(c, finance)

    from db.staff import Project, Hour
    for c in [Project, Hour]:
        s.bind_mapper(c, staff)

    s.close()  # prevents leaking connections between sessions?
So the create_engine() calls occur on every request... I can see that being needed, and the Connection Pool probably caches them and does things sensibly.
But calling Session.bind_mapper() once for each table, on every request? Seems like there has to be a better way.
Obviously, since a desire for strong security underlies all this, we don't want any chance that a connection established for a high-security user will inadvertently be used in a later request by a low-security user.
Binding global objects (mappers, metadata) to a user-specific connection is not a good way to do this, and neither is using a scoped session. I suggest creating a new session for each request and configuring it to use user-specific connections. The following sample assumes that you use separate metadata objects for each database:
binds = {}
finance_engine = create_engine(url1)
binds.update(dict.fromkeys(finance_metadata.sorted_tables, finance_engine))
# The following line is required when mappings to joint tables are used (e.g.
# in joint table inheritance) due to bug (or misfeature) in SQLAlchemy 0.5.4.
# This issue might be fixed in newer versions.
binds.update(dict.fromkeys([Employee, Customer, Invoice], finance_engine))
staff_engine = create_engine(url2)
binds.update(dict.fromkeys(staff_metadata.sorted_tables, staff_engine))
# See comment above.
binds.update(dict.fromkeys([Project, Hour], staff_engine))
session = sessionmaker(binds=binds)()
I would look at the connection pooling and see if you can't find a way to have one pool per user.
You can dispose() of the pool when the user's session has expired.
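A minimal sketch of that idea, assuming the per-user engines are cached in a plain dict; the URL and the logout hook are placeholders, not part of the question:
from sqlalchemy import create_engine

_engines = {}

def engine_for(username, password):
    # One engine, and therefore one connection pool, per Postgres user.
    if username not in _engines:
        url = 'postgres://%s:%s@server1/finance' % (username, password)
        _engines[username] = create_engine(url)
    return _engines[username]

def on_session_expired(username):
    # Dropping the engine closes every pooled connection for that user.
    engine = _engines.pop(username, None)
    if engine is not None:
        engine.dispose()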