pymongo connection pooling and client requests - python

I know pymongo is thread safe and has an inbuilt connection pool.
In a web app that I am working on, I am creating a new connection instance on every request.
My understanding is that since pymongo manages the connection pool, it isn't the wrong approach to create a new connection on each request: at the end of the request the connection instance is reclaimed and is available to subsequent requests.
Am I correct here, or should I just create a single instance to use across multiple requests?

The "wrong approach" depends upon the architecture of your application. With pymongo being thread-safe and automatic connection pooling, the actual use of a single shared connection, or multiple connections, is going to "work". But the results will depend on what you expect the behavior to be. The documentation comments on both cases.
If your application is threaded, from the docs, each thread accessing a connection will get its own socket. So whether you create a single shared connection, or request a new one, it comes down to whether your requests are threaded or not.
When using gevent, you can have a socket per greenlet. This means you don't have to have a true thread per request. The requests can be async, and still get their own socket.
In a nutshell:
If your webapp requests are threaded, then it doesn't matter which way you access a connection. The result will be the same (socket per thread).
If your webapp is async via gevent, then it doesn't matter which way you access a connection. The result will be the same (socket per greenlet).
If your webapp is async, but NOT via gevent, then you have to take into consideration the notes on the best suggested workflow.
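As a minimal sketch of the shared-client approach, assuming modern pymongo where MongoClient replaced the old Connection class (the URI, database name, and pool size here are illustrative, not from the question):

from pymongo import MongoClient

# Create the client once at application startup. It is thread-safe and
# maintains its own connection pool internally (maxPoolSize is optional).
client = MongoClient("mongodb://localhost:27017", maxPoolSize=50)
db = client["mydb"]

def handle_request(user_id):
    # No per-request connection setup: each thread/greenlet checks a
    # socket out of the client's pool for the duration of the operation.
    return db.users.find_one({"_id": user_id})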

Should I open a DB connection for every thread?

I am developing a kind of chat app.
The server side is Python and PostgreSQL; the client sides are iOS (Xcode) and Android (Java), with web planned as the next phase.
The server program runs continuously on Ubuntu Linux, and it creates a thread for every client connection (the server is written in Python). I haven't decided how to handle the database operations:
Should I create one general DB connection and use it for every client's DB operations (insert, update, delete, etc.)? If I create one general connection, I expect locking issues in the future (e.g. when I try to fetch the chat message list while another user is inserting).
If I create a DB connection whenever a client connects to my server, won't there be too many connections, giving me performance problems in the future?
If I create a DB connection before each DB operation, there will be a lot of connection open/close overhead.
What's your opinion? What's the best way?
The best way would be to maintain a pool of database connections on the server side.
For each request, use the available connection from the pool to do database operations and release it back to the pool once you're done.
This way you will not be creating new db connections for each request, which would be a costly operation.
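A minimal sketch of that idea, assuming psycopg2 since the server side is PostgreSQL (the pool sizes, DSN, and table are illustrative placeholders):

from psycopg2.pool import ThreadedConnectionPool

# Created once at server startup; safe to share across client threads.
pool = ThreadedConnectionPool(minconn=2, maxconn=20,
                              dsn="dbname=chat user=chat_user password=secret")

def save_message(sender_id, text):
    conn = pool.getconn()          # borrow a connection from the pool
    try:
        with conn.cursor() as cur:
            cur.execute("INSERT INTO messages (sender_id, body) VALUES (%s, %s)",
                        (sender_id, text))
        conn.commit()
    finally:
        pool.putconn(conn)         # always return it, even on error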

Concurrency-safe way to initialize global data connections in Flask

Global variables are not thread-safe or "process-safe" in Flask.
However, I need to open connections to services that each worker will use, such as a PubSub client or a Cloud Storage client. It seems like these still need to be global so that any function in the application can access them. To lazily initialize them, I check if the variable is None, and this needs to be thread-safe. What is the recommended approach for opening connections that each request will use? Should I use a thread lock to synchronize?
The question you linked is talking about data, not connections. Having multiple workers mutating global data is not good because you can't reason about where those workers are in a web application to keep them in sync.
The solution to that question is to use an external data source, like a database, which must be connected to somehow. Your idea to have one global connection is not safe though, since multiple worker threads would interact with it concurrently and either mess with each other's state or wait one at a time to acquire the resource. The simplest way to handle this is to establish a connection in each view when you need it.
This example shows how to have a unique connection per request, without globals, reusing the connection once it's established for the request. The g object, while it looks like a global, is implemented as a thread-local behind the scenes, so each worker gets its own g instance, and a connection is stored on it during one request only.
from flask import Flask, g, jsonify

app = Flask(__name__)

def get_conn():
    """Use this function to establish or get the already established
    connection during a request. The connection is closed at the end
    of the request. This avoids having a global connection by storing
    the connection on the g object per request.
    """
    if "conn" not in g:
        g.conn = make_connection(...)
    return g.conn

@app.teardown_request
def close_conn(e):
    """Automatically close the connection after the request if
    it was opened.
    """
    conn = g.pop("conn", None)
    if conn is not None:
        conn.close()

@app.route("/get_data")
def get_data():
    # If something else has already used get_conn during the
    # request, this will return the same connection. Anything
    # that uses it after this will also use the same connection.
    conn = get_conn()
    data = conn.query(...)
    return jsonify(data)
You might eventually find that establishing a new connection each request is too expensive once you have many thousands of concurrent requests. One solution is to build a connection pool to store a list of connections globally, with a thread-safe way to acquire and replace a connection in the list as needed. SQLAlchemy (and Flask-SQLAlchemy) uses this technique. Many libraries already provide connection pool implementations, so either use them or use them as a reference for your own.
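If you do end up rolling your own, a rough sketch of such a thread-safe global pool might look like this (the factory callable and pool size are placeholders; real libraries like SQLAlchemy or redis-py implement this far more robustly):

import queue

class SimplePool:
    """A thread-safe pool: connections are created up front and handed
    out one per caller; queue.Queue provides the locking for us."""

    def __init__(self, factory, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

# Created once and shared by all workers in the process, e.g.:
# pool = SimplePool(factory=lambda: make_connection(...), size=10)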

Should a connection to Redis cluster be made on each Flask request?

I have a Flask API, it connects to a Redis cluster for caching purposes. Should I be creating and tearing down a Redis connection on each flask api call? Or, should I try and maintain a connection across requests?
My argument against the second option is that I should really try to keep the API as stateless as possible, and I also don't know whether keeping something persistent across requests might cause thread race conditions or other side effects.
However, if I want to persist a connection, should it be saved on the session or on the application context?
This is about performance and scale. To get those 2 buzzwords buzzing you'll in fact need persistent connections.
Any race conditions will be no different than with a reconnect on every request, so that shouldn't be a problem. They will depend on how you're using Redis, but if it's just caching there's not much room for error.
I understand the desired statelessness of an API from the client-side POV, but I'm not so sure what you mean about the server side.
I'd suggest you put them in the application context, not the session (sessions could become too numerous), whereas the app context gives you the optimal one connection per process (created immediately at startup). Scaling this way becomes easy-peasy: you'll never have to worry about hitting the max connection count on the Redis box (and the less multiplexing the better).
It's a good idea from a performance standpoint to keep connections to a database open between requests. Opening and closing connections is not free and takes time, which may become a problem when you have too many requests. Another issue is that a database can only handle up to a certain number of connections, and if you open more, database performance will degrade, so you need to control how many connections are open at the same time.
To solve both of these issues you may use a connection pool. A connection pool contains a number of opened database connections and provides access to them. When a database operation is to be performed, a connection is taken from the pool; when the operation is completed, the connection is returned to the pool. If a connection is requested while all connections are taken, the caller has to wait until a connection is returned. Since no new connections are opened in this process (they are all opened in advance), this ensures the database will not be overloaded with too many parallel connections.
If the connection pool is used correctly, a single connection will be used by only one thread at any moment.
Despite the fact that a connection pool has state (it tracks which connections are currently in use), your API will still be stateless. From the API perspective, "stateless" means: no state or side effects visible to an API user. Your server can perform operations that change its internal state, like writing to log files or writing to a cache, but since this does not influence what data is returned in reply to API calls, it does not make the API "stateful".
You can see some examples of using a Redis connection pool here.
Regarding where it should be stored, I would use the application context, since that fits its purpose better.
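As a rough sketch of that approach, assuming the redis-py client (the host, port, and route are illustrative): the pool is created once per worker process and each request just borrows a connection from it.

import redis
from flask import Flask, jsonify

app = Flask(__name__)

# One pool per worker process, created at startup. redis-py's Redis client
# checks connections out of this pool per command, so it is safe to share.
redis_pool = redis.ConnectionPool(host="localhost", port=6379, db=0)

def get_cache():
    return redis.Redis(connection_pool=redis_pool)

@app.route("/cached/<key>")
def cached(key):
    value = get_cache().get(key)
    return jsonify({"key": key, "value": value.decode() if value else None})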

Making moves w/ websockets and python / django ( / twisted? )

The fun part of websockets is sending essentially unsolicited content from the server to the browser right?
Well, I'm using django-websocket by Gregor Müllegger. It's a really wonderful early crack at making websockets work in Django.
I have accomplished "hello world." The way this works is: when a request is a websocket, an object, websocket, is appended to the request object. Thus, I can, in the view interpreting the websocket, do something like:
request.websocket.send('We are the knights who say ni!')
That works fine. I get the message back in the browser like a charm.
But what if I want to do that without issuing a request from the browser at all?
OK, so first I save the websocket in the session dictionary:
request.session['websocket'] = request.websocket
Then, in a shell, I go and grab the session by session key. Sure enough, there's a websocket object in the session dictionary. Happy!
However, when I try to do:
>>> session.get_decoded()['websocket'].send('With a herring!')
I get:
Traceback (most recent call last):
File "<console>", line 1, in <module>
error: [Errno 9] Bad file descriptor
Sad. :-(
OK, so I don't know much of anything about sockets, but I know enough to sniff around in a debugger, and lo and behold, I see that the socket in my debugger (which is tied to the genuine websocket from the request) has fd=6, while the one that I grabbed from the session-saved websocket has fd=-1.
Can a socket-oriented person help me sort this stuff out?
I'm the author of django-websocket. I'm not a real expert in the topic of websockets and networking, but I think I have a decent understanding of what's going on. Sorry for going into great detail. Even if most of the answer isn't specific to your question, it might help you at some other point. :-)
How websockets work
Let me explain shortly what a websocket is. A websocket starts as something that really looks like a plain HTTP request, established from the browser. It indicates through an HTTP header that it wants to "upgrade" the protocol to be a websocket instead of an HTTP request. If the server supports websockets, it agrees on the handshake, and both server and client now know that they will use the established TCP socket formerly used for the HTTP request as a connection to interchange websocket messages.
Besides sending and waiting for messages, they of course also have the ability to close the connection at any time.
How django-websocket abuses Python's WSGI request environment to hijack the socket
Now let's get into the details of how django-websocket implements the "upgrading" of the HTTP request in a Django request-response cycle.
Django usually uses the WSGI specification to talk to the web server (Apache, gunicorn, etc.). This specification was designed with only the very limited communication model of HTTP in mind. It assumes that it gets an HTTP request (only incoming data) and returns the response (only outgoing data). This makes it tricky to force Django into the concept of a websocket, where bidirectional communication is allowed.
What I'm doing in django-websocket to achieve this is to dig very deeply into the internals of WSGI and Django's request object to retrieve the underlying socket. This TCP socket is then used to handle the upgrade of the HTTP request to a websocket instance directly.
Now to your original question ...
I hope the above makes it obvious that when a websocket is established, there is no point in returning an HttpResponse. This is why you usually don't return anything in a view that is handled by django-websocket.
However I wanted to stick close to the concept of a view that holds the logic and returns data based on the input. This is why you should only use the code in your view to handle the websocket.
After you return from the view, the websocket is automatically closed. This is done for a reason: We don't want to keep the socket open for an undefined amount of time and relying on the client (the browser) to close it.
This is why you cannot access a websocket with django-websocket outside of your view. The file descriptor is then of course set to -1, indicating that it's already closed.
Disclaimer
I explained above that I'm digging in the surrounding environment of Django to get, in a very hackish way, access to the underlying socket. This is very fragile and also not supposed to work, since WSGI is not designed for this! I also explained above that the websocket is closed after the view ends. However, after the websocket closes down (AND closes the TCP socket), Django's WSGI implementation tries to send an HTTP response; it doesn't know about websockets and thinks it is in a normal HTTP request-response cycle. But the socket is already closed and the sending will fail. This usually causes an exception in Django.
This didn't affect my testing with the development server. The browser will never notice (you know, the socket is already closed ;-), but raising an unhandled error in every request is not a very good concept: it may leak memory, doesn't handle database connection shutdown correctly, and many other things will break at some point if you use django-websocket for more than experimenting.
This is why I would really advise you not to use websockets with Django yet. It doesn't work by design. Django, and especially WSGI, would need a total overhaul to solve these problems (see this discussion for websockets and WSGI). Until then, I would suggest using something like eventlet. Eventlet has a working websocket implementation (I borrowed some code from eventlet for the initial version of django-websocket), and since it's just plain Python code you can import your models and everything else from Django. The only drawback is that you need a second webserver running just to handle websockets.
As Gregor Müllegger pointed out, Websockets can't be properly handled by WSGI, because that protocol never was designed to handle such a feature.
uWSGI, since version 1.9.11, can handle websockets out of the box. Here uWSGI communicates with the application server using raw HTTP rather than the WSGI protocol. A server written that way can therefore handle the protocol internals and keep the connection open over a long period. Having long-living connections handled by a Django view is not a good idea either, because they would then block a worker thread, which is a limited resource.
The main purpose of Websockets, is to have the server push messages to the client in an asynchronous way. This can be a Django view triggered by other browsers (ex.: chat clients, multiplayer games), or an event triggered by, say django-celery (ex.: sport results). It therefore is fundamental for these Django services, to use a message queue for pushing messages to the client.
To handle this in a scalable way, I wrote django-websocket-redis, a Django module which can keep open all those long living Websocket connections in one single thread/process using Redis as the backend message queue.
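For reference, here is a minimal sketch of uWSGI's raw websocket API mentioned above (this assumes the uwsgi Python module, which only exists when the code runs under uWSGI >= 1.9.11; the echo behaviour is just illustrative):

import uwsgi  # only available when running inside uWSGI

def application(env, start_response):
    # Complete the websocket handshake instead of returning a WSGI response.
    uwsgi.websocket_handshake(env["HTTP_SEC_WEBSOCKET_KEY"],
                              env.get("HTTP_ORIGIN", ""))
    while True:
        # Blocks until a message arrives, then echoes it back to the client.
        msg = uwsgi.websocket_recv()
        uwsgi.websocket_send(msg)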
You could give stargate a bash: http://boothead.github.com/stargate/ and http://pypi.python.org/pypi/stargate/.
It's built on top of pyramid and eventlet (I also contributed a fair bit of the websocket support and tests to eventlet). The big advantage of pyramid for this sort of thing is that it's got the concept of a resource which the url maps to, rather than just the result of a callable. So you end up with a graph of persistent resources that maps to your url structure and websocket connections are simply routed and connected to those resources.
So you end up only needing to do two things:
class YourView(WebSocketView):
    def handler(self, websocket):
        self.request.context.add_listener(websocket)
        while True:
            msg = websocket.wait()
            # Do something with message
To receive messages
and
resource.send(some_other_message)
Here resource is an instance of a stargate.resource.WebSocketAwareContext (as is self.request.context above), and the send method sends the message to all clients connected with the add_listener method.
To publish a message to all of the connected clients you just call node.send(message)
I'm hopefully going to write up a little example app in the next week or two to demonstrate this a little better.
Feel free to ping me on github if you want some help with it.
request.websocket probably gets closed when you return from the request handler (the view). The simple solution is to keep the handler alive (by not returning from the view). If your server is not multi-threaded, you won't be able to accept any other simultaneous requests.

Python XMLRPC with concurrent requests

I'm looking for a way to prevent multiple hosts from issuing simultaneous commands to a Python XMLRPC listener. The listener is responsible for running scripts to perform tasks on that system that would fail if multiple users tried to issue these commands at the same time. Is there a way I can block all incoming requests until the single instance has completed?
I think Python's SimpleXMLRPCServer module is what you want. I believe the default behavior of that module is to block new requests while the current request is being processed. That default behavior gave me lots of trouble, and I changed it by mixing in the ThreadingMixIn class so that my XML-RPC server could respond to multiple requests at the same time.
class RPCThreading(SocketServer.ThreadingMixIn, SimpleXMLRPCServer.SimpleXMLRPCServer):
    pass
If I understand your question correctly, SimpleXMLRPCServer is the solution. Just use it directly.
Can you have another communication channel? If yes, then have a "call me back when it is my turn" protocol running between the server and the clients.
In other words, each client would register its intention to issue requests to the server and the said server would "callback" the next-up client when it is ready.
There are several choices:
Use a single-process, single-threaded server like SimpleXMLRPCServer to process requests sequentially.
Use threading.Lock() in a threaded server (see the sketch below).
Use some external locking mechanism (like the lockfile module or the GET_LOCK() function in MySQL) in a multiprocess server.
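A minimal sketch of the threading.Lock() option, using the Python 3 module names (the run_task function and the port are illustrative, not from the question):

import threading
from socketserver import ThreadingMixIn
from xmlrpc.server import SimpleXMLRPCServer

task_lock = threading.Lock()

def run_task(name):
    # Only one caller at a time gets past this lock; other requests block
    # here until the current script run has finished.
    with task_lock:
        # ... run the script that must not execute concurrently ...
        return "finished %s" % name

class ThreadedXMLRPCServer(ThreadingMixIn, SimpleXMLRPCServer):
    pass

server = ThreadedXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
server.register_function(run_task)
server.serve_forever()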
