How to configure NGINX to start a cacheRefresher program on server startup - python

Is there any way I can configure a Python module to be invoked as part of startup, whenever the NGINX server starts?
The role of the Python module is to query the database and cache the results in an in-memory store like Redis. It also has to refresh the cache periodically, every 5 minutes for example.
I am using NGINX as the reverse proxy server and uWSGI as the application server. The idea of caching is to reduce the application server response time from minutes to milliseconds.
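For reference, a rough sketch of the kind of refresher I have in mind (the Redis connection details and load_rows() are placeholders for my real setup):

import json
import time

import redis  # assumes the redis-py client package is installed

r = redis.Redis(host="localhost", port=6379)

def load_rows():
    # Placeholder for the real database query.
    return [{"id": 1, "value": "example"}]

def refresh_cache():
    # Serialize the query results and store them in Redis.
    r.set("cached_rows", json.dumps(load_rows()))

if __name__ == "__main__":
    while True:
        refresh_cache()
        time.sleep(300)  # refresh every 5 minutes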

Originally your question sounded like you wanted an nginx caching proxy...
https://www.nginx.com/resources/wiki/start/topics/examples/reverseproxycachingexample/
Preloading an application cache like Redis is a different problem. What goes in the cache, and in what format, is hard to generalize: you may want to cache HTML fragments, I may want to cache JSON data from an API, and someone else may want to cache query results from a database.
Personally, if you're serving requests and your application uses the cache, then use the application itself to preload the cache. Here is a concrete approach I've used with Squid, but it will also work with nginx:
1. Parse your logs and get stats for the most-used URLs. This is your heatmap.
2. Start nginx.
3. Hit those URLs.
This loads the recently used items into the cache. I've also used the following method, i.e. get IDs from the database and then hit the app... (forgive my Python):
import grequests

base_url = "http://localhost:70/?id="
ids = ['1', '2', '3', '4', '5']

# Build the full URLs, then issue the requests concurrently;
# the app populates its own cache as it serves each one.
urls = [base_url + i for i in ids]
rs = (grequests.get(u) for u in urls)
grequests.map(rs)
This loops over a list of IDs and hits each URL; the app then caches everything it needs.
You can use any combination of the above. I'd rather load only what's needed, to avoid caching rubbish that's rarely requested.
The old answer follows...
Your question doesn't read very well, but any time I hear nginx and caching together it's normally about how to configure nginx to cache pages. Some more details here:
https://serversforhackers.com/nginx-caching/
Have a look at the nginx proxy module documentation for caching with timeouts:
http://nginx.org/en/docs/http/ngx_http_proxy_module.html
Look for proxy_cache_valid on that page; a sketch of how it fits together follows.
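A minimal sketch, not a complete configuration (the cache path, zone name, upstream address, and times are examples):

proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m;

server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:8000;  # your uWSGI/app upstream
        proxy_cache app_cache;
        proxy_cache_valid 200 10m;  # keep successful responses for 10 minutes
        proxy_cache_valid 404 1m;   # cache 404s only briefly
    }
}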

Related

Does python with wsgi (uwsgi) under nginx have some small default cache?

In my small web site I feel the need to make some data widely available, to avoid a database round trip for every request. E.g. this could be the list of current users shown at the bottom of every page, or the time of the last ranking update.
The site runs on Python (Flask) on top of nginx + uWSGI (this Docker image).
I wonder: do I get some small cache or shared memory for keeping such information "out of the box", or do I need to explicitly set up a dedicated cache? Or is something like this provided by nginx?
Alternatively, I could still use the database, which has its own cache, I think.
Sorry if the question seems naive/silly. I come from the Java world (where things are a bit different, as we serve all requests with one fat instance of a Java application) and have some difficulty grasping what WSGI/uWSGI provides. Thanks in advance!
Firstly, nginx has a cache:
https://www.nginx.com/blog/nginx-caching-guide/
But for Flask caching you also have options:
https://pythonhosted.org/Flask-Cache/
http://flask.pocoo.org/docs/1.0/patterns/caching/
Did you have a look at the caching section of the Flask docs?
It literally says:
Flask itself does not provide caching for you, but Werkzeug, one of the libraries it is based on, has some very basic cache support
You create a cache object once and keep it around, similar to how Flask objects are created. If you are using the development server you can create a SimpleCache object, that one is a simple cache that keeps the item stored in the memory of the Python interpreter:
from werkzeug.contrib.cache import SimpleCache
cache = SimpleCache()
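A minimal usage sketch on top of that (query_users_from_db() is a hypothetical stand-in for the real query):

from werkzeug.contrib.cache import SimpleCache

cache = SimpleCache()

def query_users_from_db():
    # Hypothetical stand-in for the real database query.
    return ["alice", "bob"]

def current_users():
    users = cache.get('current-users')  # returns None on a miss
    if users is None:
        users = query_users_from_db()
        cache.set('current-users', users, timeout=5 * 60)
    return users

Keep in mind that under uWSGI each worker process gets its own SimpleCache, so this is per-process memory, not shared between workers; for truly shared state you'd want something external like Redis or memcached.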
-- UPDATE --
Or you could solve it on the frontend by storing the data in the browser's local storage.
If there's nothing in local storage, you call the DB; otherwise you use the information from local storage rather than making a DB call.
Hope it helps.

mongodb application level caching worthwhile?

I'm using MongoDB as a container for a set of help HTML documents that are served by a (CherryPy-based) REST server using pymongo. As the database is relatively small (~50 documents) and seldom changes (only when someone edits through my web server frontend, which runs on the same machine as the REST server), I wonder whether I could speed up performance by caching the documents. Three options:
a) I could cache locally (in the Python process) and then listen for, say, an invalidation trigger sent by my web server frontend (e.g. through a UNIX socket, a signal, or whatever means).
b) I could cache via memcached and have my web server frontend invalidate by removing or updating the affected documents when something changes.
c) I could do nothing and rely on MongoDB automatically keeping the working set cached in memory (does it?).
What would you think is the best strategy for such use cases? For reference, option (a) might look roughly like the sketch below.
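A minimal sketch of option (a), a TTL-based in-process cache with an invalidation hook (the loader callback stands in for the pymongo query):

import time

class LocalDocCache:
    """In-process cache: keep documents in the Python process and drop
    them when invalidated by the frontend, or after a TTL as a backstop."""

    def __init__(self, ttl=300):
        self.ttl = ttl
        self._store = {}  # doc_id -> (timestamp, document)

    def get(self, doc_id, loader):
        entry = self._store.get(doc_id)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]
        doc = loader(doc_id)  # falls through to pymongo on a miss
        self._store[doc_id] = (time.time(), doc)
        return doc

    def invalidate(self, doc_id=None):
        # Called from the edit frontend's trigger; None clears everything.
        if doc_id is None:
            self._store.clear()
        else:
            self._store.pop(doc_id, None)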

Web application: Hold large object between requests

I'm working on a web application related to genome searching. This application makes use of this suffix tree library through Cython bindings. Objects of this type are large (hundreds of MB up to ~10GB) and take as long to load from disk as it takes to process them in response to a page request. I'm looking for a way to load several of these objects once on server boot and then use them for all page requests.
I have tried using a remote manager / client setup using the multiprocessing module, modeled after this demo, but it fails when the client connects with an error message that says the object is not picklable.
I would suggest writing a small Flask application (or even raw WSGI, but it's probably simpler to use Flask, since it's easier to get up and running quickly) which loads the genome database once and then exposes a simple API. Something like this:
from flask import Flask

app = Flask(__name__)
database = load_database()  # load the large object once, at startup

@app.route('/get_genomes')
def get_genomes():
    return database.all_genomes()

app.run(debug=True)
Or, you know, something a bit more sensible.
Also, if you need to handle more than one request at a time (I believe app.run will only handle one at a time), start with threading… And if that's too slow, you can os.fork() after the database is loaded and run multiple request handlers from there (that way they will all share the same database in memory); a rough sketch of the fork idea follows.
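A rough sketch of that fork approach (POSIX only; load_database() is the question's placeholder and serve_forever() a hypothetical request loop):

import os

def load_database():
    # Placeholder for the expensive suffix-tree load from the question.
    return object()

def serve_forever(db):
    # Hypothetical request loop; in practice this runs the WSGI server.
    pass

database = load_database()  # the expensive load happens exactly once

children = []
for _ in range(4):  # four workers sharing the database via copy-on-write
    pid = os.fork()
    if pid == 0:
        serve_forever(database)
        os._exit(0)
    children.append(pid)

for pid in children:
    os.waitpid(pid, 0)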

Early Django Admin Logout

I'm working on a Django 1.2.3 project, and I'm finding that the admin session seems to time out extremely early, about a minute after logging in, even while I'm actively using it.
Initially, I had these settings:
SESSION_COOKIE_AGE = 1800
SESSION_EXPIRE_AT_BROWSER_CLOSE = True
I thought the problem might be that my session storage was misconfigured, so I tried storing sessions in local memory by adding:
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
CACHE_BACKEND = 'locmem://'
However, the problem still occurs. Is there something else that would cause admin sessions to time out early even when the user is active?
Caching sessions in locmem:// means that you lose the session whenever the Python process restarts. If you're running under the dev server, that would be any time you save a file. In a production environment it will vary based on your infrastructure: mod_wsgi in Apache, for example, will restart Python after a certain number of requests (which is highly configurable). If you have multiple Python processes configured, you'll lose your session whenever a request goes to a different process.
What's more, if you have multiple servers in a production environment, locmem:// will only refer to one server process.
In other words, don't use locmem:// for session storage.
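For example, here is a sketch of backends that survive restarts and are shared across processes, in the Django 1.2-era settings syntax (the memcached address is a placeholder):

# settings.py

# Database-backed sessions survive process restarts:
SESSION_ENGINE = "django.contrib.sessions.backends.db"

# Or keep cache speed, but back it with memcached, which is shared
# across processes and servers:
# SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
# CACHE_BACKEND = "memcached://127.0.0.1:11211/"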

How to Disable Django / mod_WSGI Page Caching

I have Django running in Apache via mod_wsgi. I believe Django is caching my pages server-side, which is causing some of the functionality to not work correctly.
I have a countdown timer that works by getting the current server time, determining the remaining countdown time, and outputting that number to the HTML template. A javascript countdown timer then takes over and runs the countdown for the user.
The problem arises when the user refreshes the page, or navigates to a different page with the countdown timer. The timer appears to jump around to different times sporadically, usually going back to the same time over and over again on each refresh.
Using HTTPFox, the page is not being loaded from my browser cache, so it looks like either Django or Apache is caching the page. Is there any way to disable this functionality? I'm not going to have enough traffic to worry about caching the script output. Or am I completely wrong about why this is happening?
[Edit] From the posts below, it looks like caching is disabled in Django, which means it must be happening elsewhere, perhaps in Apache?
[Edit] I have a more thorough description of what is happening: For the first 7 (or so) requests made to the server, the pages are rendered by the script and returned, although each of those 7 pages seems to be cached as it shows up later. On the 8th request, the server serves up the first page. On the 9th request, it serves up the second page, and so on in a cycle. This lasts until I restart apache, when the process starts over again.
[Edit] I have configured mod_wsgi to run only one process at a time, which causes the timer to reset to the same value in every case. Interestingly, there's another component on my page that displays a random image on each request, using order_by('?'), and that does refresh with a different image each time, which would indicate the caching is happening in Django and not in Apache.
[Edit] In light of the previous edit, I went back and reviewed the relevant views.py file and found that the countdown start variable was being set globally in the module, outside of the view functions. Moving that assignment inside the view functions resolved the problem. So it turned out not to be a caching issue after all. Thanks everyone for your help.
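To illustrate the bug described in that last edit (render_timer() is a made-up stand-in for building the response; the pattern is what matters):

from datetime import datetime

# Buggy: this module-level line runs once per worker process, at import
# time. Every request served by that worker reuses the value, and each
# worker has its own, so the timer appears to jump around.
countdown_start = datetime.now()

def buggy_view(request):
    return render_timer(countdown_start)

def fixed_view(request):
    # Fixed: establish the value inside the view, once per request.
    countdown_start = datetime.now()
    return render_timer(countdown_start)

def render_timer(start):
    # Hypothetical stand-in for rendering the countdown template.
    return "countdown started at %s" % start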
From my experience with mod_wsgi in Apache, it is highly unlikely that they are causing the caching. A couple of things to try:
It is possible that you have some proxy server between your computer and the web server that is appropriately or inappropriately caching pages. Sometimes ISPs run proxy servers to reduce bandwidth outside their network. Can you provide the HTTP headers for a page that is getting cached (Firebug can give these to you)? Headers I would specifically be interested in include Cache-Control, Expires, Last-Modified, and ETag.
Can you post the MIDDLEWARE_CLASSES from your settings.py file? It's possible that you have middleware that performs caching for you.
Can you grep your code for the following items: "load cache", "django.core.cache", and "cache_page"? A grep -R "search" * will work.
Does the settings.py (or anything it imports, like "from localsettings import *") include CACHE_BACKEND?
What happens when you restart Apache (e.g. sudo service apache2 restart)? If a restart clears the issue, then it might be Apache doing the caching (a restart could also clear out a locmem Django cache backend).
Did you specifically set up Django caching? From the docs, it seems you would clearly know if Django was caching, as it requires configuration beforehand; specifically, you need to define where the cached files are saved (see the sketch after the link below).
http://docs.djangoproject.com/en/dev/topics/cache/
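For reference, a sketch of that setup in the Django 1.2-era syntax (the path is an example):

# settings.py: file-based cache with an explicit storage location.
CACHE_BACKEND = 'file:///var/tmp/django_cache'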
Are you using a multiprocess configuration for Apache/mod_wsgi? If you are, that will account for why different responses can show a different value for the timer: the time at which the timer is initialised will likely differ for each process handling requests, which is why it can jump around.
Have a read of:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
Work out what mode or configuration you are running Apache/mod_wsgi in, and perhaps post that configuration. Without knowing it, there are too many unknowns.
I just came across this:
Support for Automatic Reloading: To help deployment tools you can activate support for automatic reloading. Whenever something changes the .wsgi file, mod_wsgi will reload all the daemon processes for us. For that, just add the following directive to your Directory section:
WSGIScriptReloading On
