Why does my remote MongoDB connection require authentication on every query? - python

After fighting with different things here and there, I was finally able to get BottlePY running on Apache and run a MongoDB powered site. I am used to running Django apps, so I will be relating to that a bit in my question.
The Problem
Every time a page is loaded via BottlePY, the connection to the MongoDB database located on MongoHQ.com needs to be re-authenticated (meaning it probably had to reconnect).
What I Found
I attached a db.keep_alive() function to the top of each model function, so that before any mongodb query is run, it trys to run a simple query. If it fails, it catches the OperationFailure or AutoReconnect errors and then calls the db.authenticate() function. After it reauthenticates, I have it add a log to a logs db to monitor how often it needs to reauthenticate. Currently, it needs to reauthenticate on every page load (that requires running a query). This isn't right.
Difference from Django
I use this same concept in django, and have found that the db connection only needs to be authenticated after 10-15 minutes of no queries being run.
I don't understand why creating a pymongo connection in django would be different from creating one in bottle, since I am using the same driver, functions and methods. I am not using any ORMS or anything like that either.
Versions
Bottle: 0.9.dev
Django: 1.2.1 final
PyMongo: 1.8
I appreciate the help!
Update: A friend was able to take a quick look and noticed the following that may help with answering my question.
It appears that each request is
launching a new Python process, as
opposed to Django, in which a single
process remains running for a long
period of time.

This just ended up to be a weird thing between Bottle and MongoHQ. No real solution was found, but I couldn't recreate it with other frameworks. Any other ideas are appreciated.

does your apache xxx.conf contain something like:
WSGIDaemonProcess project user=mysite group=www-data processes=5 threads=1
WSGIProcessGroup project
I think most important should be threads=1

Related

How to change settings in a middleware?

I'd like to change the environment variable DJANGO_SETTINGS_MODULE (along with a few others) and then have ALL relevant modules like django.conf, django.db etc reloaded to reflect the information from the new settings module. The new settings module will have different database. I will be doing this in a middleware.
I was able to achieve this by reloading a few modules along with django.conf and django.db. All new SQL statements were fired against the new DB.
But this appears to be so hackish.
The main reason for me wanting to do this is to have the same apache child process serve requests for different django applications (different settings and not different apps) without having to recreate a new apache child process which reloads the whole thing.
Is there a clean way of achieving what I want to do?
Thanks,
UPDATE (19-Sept-2014): I have accepted Daniel Roseman's answer as that seems to be the reality in the context of the question asked. The Router approach suggested by him was something that I explored but couldn't use because django's transaction classes don't use the router. The router I presume exists for a different reason. The application code base I'm working on, which is pretty large, has tons of transaction.commit_manually for the default or a specific db alias. I was trying to get this to support multiple client databases without changing the application code.
However, I did manage to solve the main problem which was to support multiple client DBs and other settings. I don't try to change the settings on the fly nor do I use the router. I instead have a single settings.py with all client DB information. I monkey patched the connection handler to return a different database connection for 'default' alias (or other specific alias used by the code) based on certain env variables set in the middleware. So far this has worked fine. I will post an update if I run into any issues or if someone else can point out a potential issue with the approach.
No, there is no way to do this, and that's a very good thing as it is a bad idea. There is no reason to use the same Apache process for different sites: instead you should have different virtual hosts for each of your sites, and let Apache manage them.

Python Web Backend

I am an experienced Python developer starting to work on web service
backend system. The system feeds data (constantly) from the web to a
MySQL database. This data is later displayed by a frontend side (there
is no connection between the frontend and the backend). The backend
system constantly downloads flight information from the web (some of
the data is fetched via APIs, and some by downloading and parsing
text / xls files). I already have a script that downloads the data,
parses it, and inserts it to the MySQL db - all in a big loop. The
frontend side is just a bunch of php pages that properly display the
data by querying the MySQL server.
It is crucial that this web service be robust, strong and reliable.
Therefore, I have been looking into the proper ways to design it, and came across the following parts to comprise my system:
1) django as a framework (for HTTP connections and for using Piston)
2) Piston as an API provider (this is great because then my front-end can use the API instead of actually running queries)
3) SQLAlchemy as the DB layer (I don't like the little control you get when using django ORM, I want to be able to run a more complex DB framework)
4) Apache with mod_wsgi to run everything
5) And finally, Celery (or django-cron) to actually run my infinite loop that pulls the data off the web - hopefully in some sort of organized tasks format). This is the part I am least sure of, and any pointers are appreciated.
This all sounds great. I used django before to write websites (aka
request handlers that return data). However, other than using Celery or django-cron I can't really see how it fits a role of a constant data feeding backend.
I just wanted to run this by you guys to hear your ideas / comments. Any input you have / pointers to documentation and/or other libraries would be greatly greatly appreciated!
If You are about to use SQLAlchemy, I would refrain from using Django: Django is fine if You are using the whole stack, but as You are about to rip Models off, I do not see much value in using it and I would take a look at another option (perhaps Pylons or pure old CherryPy would do).
Even more so if FEs will not run queries, but only ask API providers.
As for robustness, I am more satisfied with starting separate fcgi processess with supervise and using more lightweight web server (ligty / nginx), but that's a matter of taste.
For the "infinite loop" part, it depends on what behavior you want: if there is a problem with the source, would you just like to skip the step or repeat it multiple times when source is back up?
Periodic Tasks might be good for former, while cron that would just spawn scraping tasks is better for latter.

Django ORM and PostgreSQL connection limits

I'm running a Django project on Postgresql 8.1.21 (using Django 1.1.1, Python2.5, psycopg2, Apache2 with mod_wsgi 3.2). We've recently encountered this lovely error:
OperationalError: FATAL: connection limit exceeded for non-superusers
I'm not the first person to run up against this. There's a lot of discussion about this error, specifically with psycopg, but much of it centers on older versions of Django and/or offer solutions involving edits to code in Django itself. I've yet to find a succinct explanation of how to solve the problem of the Django ORM (or psycopg, whichever is really responsible, in this case) leaving open Postgre connections.
Will simply adding connection.close() at the end of every view solve this problem? Better yet, has anyone conclusively solved this problem and kicked this error's ass?
Edit: we later upped Postgresql's limit to 500 connections; this prevented the error from cropping up, but replaced it with excessive memory usage.
This could be caused by other things. For example, configuring Apache/mod_wsgi in a way that theoretically it could accept more concurrent requests than what the database itself may be able to accept at the same time. Have you reviewed your Apache/mod_wsgi configuration and compared limit on maximum clients to that of PostgreSQL to make sure something like that hasn't been done. Obviously this presumes though that you have managed to reach that limit in Apache some how and also depends on how any database connection pooling is set up.

How to Disable Django / mod_WSGI Page Caching

I have Django running in Apache via mod_wsgi. I believe Django is caching my pages server-side, which is causing some of the functionality to not work correctly.
I have a countdown timer that works by getting the current server time, determining the remaining countdown time, and outputting that number to the HTML template. A javascript countdown timer then takes over and runs the countdown for the user.
The problem arises when the user refreshes the page, or navigates to a different page with the countdown timer. The timer appears to jump around to different times sporadically, usually going back to the same time over and over again on each refresh.
Using HTTPFox, the page is not being loaded from my browser cache, so it looks like either Django or Apache is caching the page. Is there any way to disable this functionality? I'm not going to have enough traffic to worry about caching the script output. Or am I completely wrong about why this is happening?
[Edit] From the posts below, it looks like caching is disabled in Django, which means it must be happening elsewhere, perhaps in Apache?
[Edit] I have a more thorough description of what is happening: For the first 7 (or so) requests made to the server, the pages are rendered by the script and returned, although each of those 7 pages seems to be cached as it shows up later. On the 8th request, the server serves up the first page. On the 9th request, it serves up the second page, and so on in a cycle. This lasts until I restart apache, when the process starts over again.
[Edit] I have configured mod_wsgi to run only one process at a time, which causes the timer to reset to the same value in every case. Interestingly though, there's another component on my page that displays a random image on each request, using order('?'), and that does refresh with different images each time, which would indicate the caching is happening in Django and not in Apache.
[Edit] In light of the previous edit, I went back and reviewed the relevant views.py file, finding that the countdown start variable was being set globally in the module, outside of the view functions. Moving that setting inside the view functions resolved the problem. So it turned out not to be a caching issue after all. Thanks everyone for your help on this.
From my experience with mod_wsgi in Apache, it is highly unlikely that they are causing caching. A couple of things to try:
It is possible that you have some proxy server between your computer and the web server that is appropriately or inappropriately caching pages. Sometimes ISPs run proxy servers to reduce bandwidth outside their network. Can you please provide the HTTP headers for a page that is getting cached (Firebug can give these to you). Headers that I would specifically be interested in include Cache-Control, Expires, Last-Modified, and ETag.
Can you post your MIDDLEWARE_CLASSES from your settings.py file. It possible that you have a Middleware that performs caching for you.
Can you grep your code for the following items "load cache", "django.core.cache", and "cache_page". A *grep -R "search" ** will work.
Does the settings.py (or anything it imports like "from localsettings import *") include CACHE_BACKEND?
What happens when you restart apache? (e.g. sudo services apache restart). If a restart clears the issue, then it might be apache doing caching (it is possible that this could also clear out a locmen Django cache backend)
Did you specifically setup Django caching? From the docs it seems you would clearly know if Django was caching as it requires work beforehand to get it working. Specifically, you need to define where the cached files are saved.
http://docs.djangoproject.com/en/dev/topics/cache/
Are you using a multiprocess configuration for Apache/mod_wsgi? If you are, that will account for why different responses can have a different value for the timer as likely that when timer is initialised will be different for each process handling requests. Thus why it can jump around.
Have a read of:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
Work out in what mode or configuration you are running Apache/mod_wsgi and perhaps post what that configuration is. Without knowing, there are too many unknowns.
I just came across this:
Support for Automatic Reloading To help deployment tools you can
activate support for automatic reloading. Whenever something changes
the .wsgi file, mod_wsgi will reload all the daemon processes for us.
For that, just add the following directive to your Directory section:
WSGIScriptReloading On

How to improve Trac's performance

I have noticed that my particular instance of Trac is not running quickly and has big lags. This is at the very onset of a project, so not much is in Trac (except for plugins and code loaded into SVN).
Setup Info: This is via a SELinux system hosted by WebFaction. It is behind Apache, and connections are over SSL. Currently the .htpasswd file is what I use to control access.
Are there any recommend ways to improve the performance of Trac?
It's hard to say without knowing more about your setup, but one easy win is to make sure that Trac is running in something like mod_python, which keeps the Python runtime in memory. Otherwise, every HTTP request will cause Python to run, import all the modules, and then finally handle the request. Using mod_python (or FastCGI, whichever you prefer) will eliminate that loading and skip straight to the good stuff.
Also, as your Trac database grows and you get more people using the site, you'll probably outgrow the default SQLite database. At that point, you should think about migrating the database to PostgreSQL or MySQL, because they'll be able to handle concurrent requests much faster.
We've had the best luck with FastCGI. Another critical factor was to only use https for authentication but use http for all other traffic -- I was really surprised how much that made a difference.
I have noticed that if
select disctinct name from wiki
takes more than 5 seconds (for example due to a million rows in this table - this is a true story (We had a script that filled it)), browsing wiki pages becomes very slow and takes over 2*t*n, where t is time of execution of the quoted query (>5s of course), and n is a number of tracwiki links present on the viewed page.
This is due to trac having a (hardcoded) 5s cache expire for this query. It is used by trac to tell what the colour should the link be. We re-hardcoded the value to 30s (We need that many pages, so every 30s someone has to wait 6-7s).
It may not be what caused Your problem, but it may be. Good luck on speeding up Your Trac instance.
Serving the chrome files statically with and expires-header could help too. See the end of this page.

Categories