Testing Memcached connection - python

I want to run a basic service which shows the status of various other services on the system (i.e. Mongo, Redis, Memcached, etc).
For Memcached, I thought that I might do something like this:
from django.core.cache import cache
host, name = cache._cache._get_server('test')
This seems to return a host and an arbitrary string. Does the host object confirm that I'm connecting to Memcached successfully?
I know that the returned host object has a connect() method. I'm slightly scared to open a new connection in production environment and I don't have an easy Dev setup to test that method. I assume that it's in one of the Python Memcached libraries, but I'm not sure which one is relevant here.
Can I just use the _get_server method to test Memecached connection success, or should I use the connect method?

There are various things you could monitor, like memcache process up, memcache logs moving etc. that don't require network connectivity. Next level of test would be to see that you can open a socket at the memcache port. But the real test is of course to set and get a value from memcache. For that kind of testing I would probably just use the python-memcached package directly to make the connection and set and get values.

Related

Flask-SQLAlchemy "MySQL server has gone away" when using HAproxy

I've built a small python REST service using Flask, with Flask-SQLAlchemy used for talking to the MySQL DB.
If I connect directly to the MySQL server everything is good, no problems at all. If I use HAproxy (handles HA/failover, though in this dev environment there is only one DB server) then I constantly get MySQL server has gone away errors if the application doesn't talk to the DB frequently enough.
My HAproxy client timeout is set to 50 seconds, so what I think is happening is it cuts the stream, but the application isn't aware and tries to make use of an invalid connection.
Is there a setting I should be using when using services like HAproxy?
Also it doesn't seem to reconnect automatically, but if I issue a request manually I get Can't reconnect until invalid transaction is rolled back, which is odd since it is just a select() call I'm making, so I don't think it is a commit() I'm missing - or should I be calling commit() after every ORM based query?
Just to tidy up this question with an answer I'll post what I (think I) did to solve the issues.
Problem 1: HAproxy
Either increase the HAproxy client timeout value (globally, or in the frontend definition) to a value longer than what MySQL is set to reset on (see this interesting and related SF question)
Or set SQLALCHEMY_POOL_RECYCLE = 30 (30 in my case was less than HAproxy client timeout) in Flask's app.config so that when the DB is initialised it will pull in those settings and recycle connections before HAproxy cuts them itself. Similar to this issue on SO.
Problem 2: Can't reconnect until invalid transaction is rolled back
I believe I fixed this by tweaking the way the DB is initialised and imported across various modules. I basically now have a module that simply has:
from flask.ext.sqlalchemy import SQLAlchemy
db = SQLAlchemy()
Then in my main application factory I simply:
from common.database import db
db.init_app(app)
Also since I wanted to easily load table structures automatically I initialised the metadata binds within the app context, and I think it was this which cleanly handled the commit() issue/error I was getting, as I believe the database sessions are now being correctly terminated after each request.
with app.app_context():
# Setup DB binding
db.metadata.bind = db.engine

Amazon EC2 file structure / web app with separate Python backend?

I'm currently running a t2.micro instance on EC2 right now. I have the html/web interface side of it working, along with a MySQL database.
The site allows users to register and stores them in the DB via a PHP script.
I want there to be an actual Python application that queries the MySQL database and returns user data, to then be executed in a Python script.
What I cannot find is whether I host this Python application as a totally separate instance or if it can exist on the same instance, in a different directory. I ultimately just need to query the database, which makes me thing it must exist on the same instance.
Could someone please provide some guidance?
Let me just be clear: this is not a Python web app. This Python backend is entirely separate except making queries against the database.
Either approach is possible, but there are pros & cons to each.
Running separate Python app on the same server:
Pros:
Setting up local access to the database is fairly simple
Only need to handle backups or making snapshots, etc. for a single instance
Cons:
Harder to scale up individual pieces if you need more memory, processing power, etc. in the future
Running the Python app on a separate server:
Pros:
Separate pieces means you can scale up & down the hardware each piece is running on, according to their individual needs
If you're using all micro instances, you get more resources to work with, without any extra costs (assuming you're still meeting all the other 'free tier eligible' criteria)
Cons:
In general, more pieces == more time spent on configuration, administration tasks, etc.
You have to open up the database to non-local access
Simplest: open up the database to access from anywhere (e.g. all remote IP addresses), and have the Python app log in via the internet
Somewhat safer, more complex: set the Python app server up with an elastic IP, open up the database to access only from that address
Much safer, more complex: set up your own virtual private cloud (VPC), and allow connections to the database only from within the VPC. You'd have to configure public access for each of the servers for whatever public traffic you'll have, presumably ports 80 and/or 443.

In Django, how can I set db connection timeout?

OK, I know it's not that simple. I have two db connections defined in my settings.py: default and cache. I'm using DatabaseCache backend from django.core.cache. I have database router defined so I can use separate database/schema/table for my models and for cache. Perfect!
Now sometimes my cache DB is not available and there are two cases:
Connection to databse was established already when DB crashed - this is easy - I can use this recipe: http://code.activestate.com/recipes/576780-timeout-for-nearly-any-callable/ and wrap my query like this:
try:
timelimited(TIMEOUT, self._meta.cache.get, cache_key))
expect TimeLimitExprired:
# live without cache
Connection to database wasn't yet established - so I need to wrap in timelimited some portion of code that actually establishes database connection. But I don't know where such code exists and how to wrap it selectively (i.e. wrap only cache connection, leave default connection without timeout)
Do you know how to do point 2?
Please note, this answer https://stackoverflow.com/a/1084571/940208 is not correct:
grep -R "connect_timeout" /usr/local/lib/python2.7/dist-packages/django/db
gives no results and cx_Oracle driver doesn't support this parameter as far as I know.

Managing a pool of connections to a hosted Elastic Search provider

I need a way to manage connections to a hosted Elastic Search provider, to speed up search on my website. We are running Django on Heroku, using the Found ElasticSearch add-on, and pyes, which is an ElasticSearch Python library.
The standard way of setting up a connection to ElasticSearch with pyes is by passing the provider URL into an ES object, like so:
(1) connection = ES(my_elasticsearch_url)
Pyes uses the ES object behind the scenes to establish an open HTTP connection to my ElasticSearch provider, so I can run searches like this:
(2) results = connection.search(some_query, index_name)
Previously, I was doing both those steps in my Django view for search -- every time a user did a search, it opened a new HTTP connection then ran the search. Consequentially the search call was slow.
I sped up search by moving (1) into my app's __init__.py file -- now, I am setting up the connection only once, and importing it into the search view. But I'm worried it will choke that HTTP connection if lots of people are trying search at once.
I'm looking for ideas on how to set up a pool of connections, initiate them once on app start up, and then dole them out to my search view as needed. Ideally I'd like to be able to scale the size of the pool up and down easily with minimal changes to my code.
I can think of a few ways to approach it, but it seems like a common computing related problem, so I'm sure that a lot of you have ideas on good design and best practices for such a system. I'd love to hear them.
Thanks a lot!
Clay
If your running in a multi-threaded environment, it's merely a matter of extending Queue.Queue to create an instance that can fetch and instantiate connections on demand, from multiple threads in which your views are handling the request-response flow. You'll probably want to have a certain cap on how many connections your retaining by limiting the maximum size of the queue, although you can instantiate more connections beyond that and simply discard them if you can put them back into the queue.
The downside of using Queue.Queue is that it can create cross-cutting concerns if your views are responsible for retrieving connections from and returning them back into the queue. You can get a healthier design if you only queue the actual object from pyes.ES that holds the connection and create a wrapper for ES that, when performing a query, creates a new ES instance, fetches a connection from the queue, sets it on the instance, performs the query, returns the connection back into the queue, discards the ES instance and returns the query results.

Django statelessness?

I'm just wondering if Django was designed to be a fully stateless framework?
It seems to encourage statelessness and external storage mechanisms (databases and caches) but I'm wondering if it is possible to store some things in the server's memory while my app is in develpoment and runs via manage.py runserver.
Sure it's possible. But if you are writing a web application you probably won't want to do that because of threading issues.
That depends on what you mean by "store things in the server's memory." It also depends on the type of data. If you can, you're better off storing "global data" in a database or in the file system somewhere. Unless it is needed every request it doesn't really make sense to store it in the Django instance itself. You'll need to implement some form of locking to prevent race conditions, but you'd need to worry about race conditions if you stored everything on the server object anyway.
Of course, if you're talking about user-by-user data, Django does support sessions. Or, and this is another perfectly good option if you're willing to make the user save the data, cookies.
The best way to maintain state in a django app on a per-user basis is request.session (see django sessions) which is a dictionary you can use to remember things about the current user.
For Application-wide state you should use the a persistent datastore (database or key/value store)
example view for sessions:
def my_view(request):
pages_viewed = request.session.get('pages_viewed', 1) + 1
request.session['pages_viewed'] = pages_viewed
...
If you wanted to maintain local variables on a per app-instance basis you can just define module level variables like so
# number of times my_view has been served since by this server
# instance since the last restart
served_since_restart = 0
def my_view(request):
served_since_restart += 1
...
If you wanted to maintain some server state across ALL app servers (like total number of pages viewed EVER) you should probably use a persistent key/value store like redis, memcachedb, or riak. There is a decent comparison of all these options here: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
You can do it with redis (via redis-py) like so (assuming your redis server is at "127.0.0.1" (localhost) and it's port 6379 (the default):
import redis
def my_view(request):
r = redis.Redis(host='127.0.0.1', port="6379")
served = r.get('pages_served_all_time', 0)
served += 1
r.set('pages_served_all_time', served)
...
There is LocMemCache cache backend that stores data in-process. You can use it with sessions (but with great care: this cache is not cross-process so you will have to use single process for deployment because it will not be guaranteed that subsequent requests will be handled by the same process otherwise). Global variables may also work (use threadlocals if they shouldn't be shared for all process threads; the warning about cross-process communication also applies here).
By the way, what's wrong with external storage? External storage provides easy cross-process data sharing and other features (like memory limiting algorithms for cache or persistance with databases).

Categories