Using Django Caching with Mod_WSGI

Is anyone aware of any issues with Django's caching framework when deployed to Apache/Mod_WSGI?
When testing with the caching framework locally with the dev server, using the profiling middleware and either the FileBasedCache or LocMemCache, Django's very fast. My request time goes from ~0.125 sec to ~0.001 sec. Fantastic.
I deploy the identical code to a remote machine running Apache/Mod_WSGI and my request time goes from ~0.155 sec (before I deployed the change) to ~0.400 sec (post deployment). That's right, caching slowed everything down.
I've spent hours digging through everything, looking for something I'm missing. I've tried using FileBasedCache with a location on tmpfs, but that also failed to improve performance.
I've monitored the remote machine with top, and it shows no other processes and it has 6GB available memory, so basically Django should have full rein. I love Django, but it's incredibly slow, and so far I've never been able to get the caching framework to make any noticeable impact in a production environment. Is there anything I'm missing?
EDIT: I've also tried memcached, with the same result. I confirmed memcached was running by telnetting into it.
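For reference, the kind of cache configuration being tested looks something like this (a sketch; backend choice and locations are illustrative, and the memcached backend name varies by Django version):

# settings.py -- illustrative cache configurations, pick one
CACHES = {
    "default": {
        # in-process memory cache
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    },
    # file-based variant on tmpfs:
    # "default": {
    #     "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
    #     "LOCATION": "/dev/shm/django_cache",
    # },
    # memcached variant:
    # "default": {
    #     "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
    #     "LOCATION": "127.0.0.1:11211",
    # },
}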

Indeed, Django can be slow, but I must say most of the slowness comes from the app itself. Django just nudges you (by providing bad examples in the docs) into lazy patterns that are slow in production.
First of all: try nginx + uWSGI. It is just the best.
To optimize your app, you need to find out what is causing the slowness. It can be:
slow database queries (too many queries, or just slow queries)
a slow database itself
a slow filesystem (NFS, for example)
Try logging the queries each request runs and watch iostat or iotop, or something like that.
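For example, one way to log every SQL query to the console (a sketch; this logger only emits queries when DEBUG is True):

# settings.py -- log all SQL statements executed per request
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}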
I had this scenario with Apache + mod_wsgi: the first request from a browser was very slow, then a few requests from the same browser were fast, then after sitting idle for two minutes it was again very slow. I don't know whether that was an improperly configured Apache shutting down the WSGI app and restarting it for every keep-alive request. It just put me off: I installed nginx, and with nginx + FastCGI everything was a lot faster than Apache + mod_wsgi.

I had a similar problem with an app using memcached. The solution was running mod_wsgi in daemon mode instead of embedded mode, and Apache in mpm_worker mode. After that, the application worked much faster.
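For anyone hitting the same thing, the mod_wsgi side of that change looks roughly like this (a sketch; process/thread counts and paths are examples, not tuned values):

# Apache virtual host -- run the app in mod_wsgi daemon mode instead of embedded mode
WSGIDaemonProcess myapp processes=2 threads=15
WSGIProcessGroup myapp
WSGIScriptAlias / /srv/myapp/wsgi.py

mpm_worker is selected separately, when Apache is built or through your distribution's Apache configuration.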

The same thing happened to me, and I was wondering what was taking so much time.
Each cache get was taking around 100 milliseconds.
So I debugged Django's locmem cache code and found out that pickle was taking a lot of time (I was caching a whole table in LocMemCache).
I ended up wrapping locmem myself since I didn't want anything advanced; if you remove the pickle/unpickle step on get and set, you will see a major improvement.
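Something along these lines is what a pickle-free local cache can look like (a sketch of the idea, not my actual wrapper; only safe for data treated as read-only within a single process):

# a minimal in-process cache that stores object references directly, skipping pickle
import time

class RawLocalCache:
    def __init__(self):
        self._store = {}  # key -> (expires_at, value)

    def set(self, key, value, timeout=300):
        self._store[key] = (time.monotonic() + timeout, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]
            return default
        return value

table_cache = RawLocalCache()  # e.g. table_cache.set("my_table", list(MyModel.objects.all()))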
Hope it helps someone.

Related

django-plotly-dash multi session on CPU intensive pages

Running django-plotly-dash, I have multiple python pages. The issue is that when I am loading one of the pages while it is running some calculations, I can not run the same page or other pages from a different session, and the webserver is not responding for other users. If I look at the runserver output, it is busy rendering the first request only.
If I look at the runserver output, it is busy rendering the first request only.
If I understand correctly, this means you use Django's development server, and thus that you are in development (if you use django-admin runserver in production, that's a serious issue).
Now’s a good time to note: don’t use this server in anything resembling a production environment. It’s intended only for use while developing. (We’re in the business of making web frameworks, not web servers.)
Django's development server is supposed to be multithreaded and support concurrent requests. However, from my experience I also noticed that it can handle only one request at a time. I didn't dig too much into it but I assume it might be caused by an app that overrides the runserver command and disables multithreading.
In development this shouldn't be too much of an issue. And in production you won't suffer this kind of blocks as real WSGI servers such as gunicorn will be able to handle several concurrent requests (provided it is configured to use the available resources correctly, and the hardware is able to handle the load).
However, if your pages are actually slow to respond, this can be an issue for the user loading the page, and it will also require more resources to handle more concurrent requests. It all depends on whether "slow" means 2 seconds, 5 seconds, 30 seconds or even more. Reducing the response time will depend a lot on where the bottleneck in your code is, and could include:
Optimizing the algorithms
Reducing and optimizing SQL queries (See Database access optimization)
Offloading to Celery the calculations that do not affect the response (see the sketch after this list)
Using WebSockets to stream and display the data as it is calculated, without blocking the client until the whole page is ready (see django-channels)
Using asyncio to avoid staying idle while waiting for I/O operations.
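As an illustration of the Celery point above, offloading a heavy calculation might look roughly like this (module, task and function names are made up for the example):

# tasks.py -- a calculation that does not affect the HTTP response
from celery import shared_task

@shared_task
def recompute_dashboard_data(dataset_id):
    # hypothetical expensive calculation, persisted somewhere the page can read later
    return run_expensive_calculation(dataset_id)

# views.py -- enqueue the work and answer immediately
from django.shortcuts import render
from .tasks import recompute_dashboard_data

def dashboard_view(request, dataset_id):
    recompute_dashboard_data.delay(dataset_id)
    return render(request, "dashboard.html", {"status": "calculating"})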

Django: how to emulate a slow server?

The production server my work is pushed to performs significantly worse than my local development environment, where everything runs fast and smoothly, so I cannot tell whether a change I make will perform badly in production.
In that server, responses take a lot of time, mostly I assume because of the database queries, rather than the server's processing power and memory capacity.
I wonder if there is any way to set up a server configuration to emulate these bad conditions: how can I reduce the power of my local Django server so that responses take longer, processing power is low and, most importantly, database connection is slow?
I hope this is not a crazy ask, but I really need to figure out how my code will behave in production, since I have no way of telling that from my local environment.
As some comments suggest above, you could slow down the dev server by doing what this answer suggests.
If you suspect the database, you could change the QuerySet methods in db/models/query.py in the django code and add something like this:
from time import sleep
sleep(0.5) # this is seconds, not milliseconds ;)
but this is all guesses.
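A less invasive variant of the same idea (my own sketch, not from the linked answer) is a middleware that sleeps on every request, so you don't have to edit Django's source:

# middleware.py -- artificial latency for local testing only; add it to MIDDLEWARE in settings
import time

class ArtificialLatencyMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        time.sleep(0.5)  # seconds, not milliseconds
        return self.get_response(request)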
You could set up something like opbeat in your code to get actual production info about your code, so you can work out what the problem really is (and speed up production instead of slowing down dev!).
For your local dev environment, you could also try django-silk to get a sense of how many queries your code runs.

Are Heroku instances persistent? (Or, can I use dict/array as a cache?)

So my friend told me that instances on Heroku are persistent (I'm not sure if the vocab is right, but he implied that all users share the same instance).
So, if I have app.py, and an instance runs it, then all users share that instance. That means we can use a dict as a temporary cache for storing small things for faster response time.
So for example, if I'm serving an API, I can maybe define a cache like this and then use it.
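Roughly this sort of thing (an illustration, not the actual linked code; names are made up):

# app.py -- module-level dict used as a naive in-process cache
cache = {}

def get_profile(user_id):
    if user_id in cache:
        return cache[user_id]            # served from the dict, no backend hit
    profile = expensive_lookup(user_id)  # hypothetical slow call (DB, external API, ...)
    cache[user_id] = profile
    return profile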
How true is that? I tried looking this up, but could not find anything.
I deployed the linked API to Heroku on 1 dyno, and with just a few requests per second, it was taking over 100 seconds to serve them. So my understanding is that the cache wasn't working. (It might be useful to note here that the majority of the time was due to request queueing, according to New Relic.)
The Heroku Devcenter has several articles about the Heroku architecture.
Processes don't share memory. Moreover, your code is compiled into a slug and optimized for distribution to the dyno manager. In simple words, it means you don't even know which machine will execute your code. Theoretically, 5 users hitting your app may be routed to 5 different machines and processes.
Last but not least, keep in mind that if your app has only a single web dyno running, that web dyno will sleep. You have to have more than one web dyno to prevent web dynos from sleeping. When a dyno enters sleep mode, its memory is released and you will lose all the data held in memory.
This means that your approach will not work.
Generally speaking, in Heroku you should use external storages. For example, you can use the Memcached add-on and store your cache information in Memcached.
Also note that you should not use the file system as a cache. Not only is it slower than Memcached, but the Cedar stack file system should also be considered ephemeral.

Is there a better way to serve the results of an expensive, blocking python process over HTTP?

We have a web service which serves small, arbitrary segments of a fixed inventory of larger MP3 files. The MP3 files are generated on-the-fly by a python application. The model is, make a GET request to a URL specifying which segments you want, get an audio/mpeg stream in response. This is an expensive process.
We're using Nginx as the front-end request handler. Nginx takes care of caching responses for common requests.
We initially tried using Tornado on the back-end to handle requests from Nginx. As you would expect, the blocking MP3 operation kept Tornado from doing its thing (asynchronous I/O). So, we went multithreaded, which solved the blocking problem, and performed quite well. However, it introduced a subtle race condition (under real world load) that we haven't been able to diagnose or reproduce yet. The race condition corrupts our MP3 output.
So we decided to set our application up as a simple WSGI handler behind Apache/mod_wsgi (still w/ Nginx up front). This eliminates the blocking issue and the race condition, but creates a cascading load (i.e. Apache creates too many processes) on the server under real world conditions. We're working on tuning Apache/mod_wsgi right now, but still at a trial-and-error phase. (Update: we've switched back to Tornado. See below.)
Finally, the question: are we missing anything? Is there a better way to serve CPU-expensive resources over HTTP?
Update: Thanks to Graham's informed article, I'm pretty sure this is an Apache tuning problem. In the mean-time, we've gone back to using Tornado and are trying to resolve the data-corruption issue.
For those who were so quick to throw more iron at the problem, Tornado and a bit of multi-threading (despite the data integrity problem introduced by threading) handles the load acceptably on a small (single core) Amazon EC2 instance.
Have you tried Spawning? It is a WSGI server with a flexible assortment of threading modes.
Are you making the mistake of using embedded mode of Apache/mod_wsgi? Read:
http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html
Ensure you use daemon mode if using Apache/mod_wsgi.
You might consider a queuing system with AJAX notification methods.
Whenever there is a request for your expensive resource, and that resource needs to be generated, add that request to the queue (if it's not already there). That queuing operation should return an ID of an object that you can query to get its status.
Next you have to write a background service that spins up worker threads. These workers simply dequeue a request, generate the data, then save the data's location in the request object.
The webpage can make AJAX calls to your server to find out the progress of the generation and to give a link to the file once it's available.
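A bare-bones sketch of that pattern (all names are illustrative; a real deployment would use a proper broker and a persistent job store rather than in-process structures):

import threading, queue, uuid

jobs = {}                    # job_id -> {"status": ..., "location": ...}
work_queue = queue.Queue()

def enqueue_render(segment_spec):
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "location": None}
    work_queue.put((job_id, segment_spec))
    return job_id            # the client polls a status URL with this ID

def worker():
    while True:
        job_id, segment_spec = work_queue.get()
        jobs[job_id]["status"] = "rendering"
        path = render_mp3_segment(segment_spec)   # hypothetical expensive call
        jobs[job_id].update(status="done", location=path)

threading.Thread(target=worker, daemon=True).start()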
This is how LARGE media sites work - those that have to deal with video in particular. It might be overkill for your MP3 work however.
Alternatively, look into running a couple of machines to distribute the load. Your threads on Apache will still block, but at least you won't consume resources on the web server.
Please define "cascading load", as it has no common meaning.
Your most likely problem is going to be if you're running too many Apache processes.
For a load like this, make sure you're using the prefork mpm, and make sure you're limiting yourself to an appropriate number of processes (no less than one per CPU, no more than two).
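Roughly speaking, something like this in the Apache configuration (the numbers are only an example for a small single-core box; the exact directive names differ slightly between Apache versions):

<IfModule mpm_prefork_module>
    StartServers          2
    MinSpareServers       1
    MaxSpareServers       2
    MaxClients            4
    MaxRequestsPerChild   1000
</IfModule>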
It looks like you are doing things right -- just lacking CPU power. Can you determine what the CPU load is while generating these MP3s?
I think the next thing to do is add more hardware to render the MP3s on other machines, or else find a way to deliver pre-rendered MP3s (maybe you can cache some of your media?).
BTW, scaling for the web was the theme of a keynote lecture by Jacob Kaplan-Moss at PyCon Brasil this year, and it is far from a closed problem. The stack of technologies one needs to handle is quite impressive. (I could not find an online copy of the presentation, though; sorry for that.)

benchmarking django apps

I'm interested in testing the performance of my django apps as I go, what is the best way to get line by line performance data?
note: Googling this returns lots of people benchmarking django itself. I'm not looking for a benchmarks of django, I'm trying to test the performance of the django apps that I'm writing :)
Thanks!
edit: By "line by line" I just mean timing individual functions, db calls, etc to find out where the bottlenecks are on a very granular level
There are two layers to this. We have most of #1 in place for our testing. We're about to start on #2.
Django in isolation. The ordinary Django unit tests work well here. Create some tests that cycle through a few (fewer than 6) "typical" use cases. Get this, post that, etc. Collect timing data. This isn't real web performance, but it's an easy-to-work-with test scenario that you can use for tuning.
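For example, collecting timing data from the ordinary test machinery can be as simple as this (a sketch; the URLs are placeholders for your own use cases):

# tests.py -- time a "typical" use case with Django's test client
import time
from django.test import TestCase

class TypicalFlowTimingTests(TestCase):
    def test_browse_and_post(self):
        start = time.perf_counter()
        self.client.get("/")                        # placeholder URLs for the use case
        self.client.get("/items/")
        self.client.post("/items/", {"name": "x"})
        elapsed = time.perf_counter() - start
        print(f"typical flow took {elapsed:.3f}s")  # collect the numbers, don't assert yet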
Your whole web stack. In this case, you need a regular server running Squid, Apache, Django, MySQL, whatever. You need a second computer (or computers) to act as a client, exercising your web site through urllib2 with a few (fewer than 6) "typical" use cases. Get this, post that, etc. Collect timing data. This still isn't "real" web performance, because it isn't through the internet, but it's as close as you're going to get without a really elaborate setup.
Note that #2 (end-to-end) includes a great deal of caching for performance. If your client scripts are doing similar work, caching will be really beneficial. If your client scripts do unique things each time, caching will be less beneficial.
The hardest part is determining what the "typical" workload is. This isn't functional testing, so the workload doesn't have to include everything. Also, the more concurrent sessions your client is running, the slower it becomes. Don't struggle trying to optimize your server when your test client is the slowest part of the processing.
Edit
If "line-by-line" means "profiling", well, you've got to get a Python profiler running.
https://docs.python.org/library/profile.html
Note that there's plenty of caching in the Django ORM layer. So running a view function a half-dozen times to get a meaningful set of measurements isn't sensible. You have to run a "typical" set of operations and then find hot-spots in the profile.
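A quick way to do that from a Django shell, for example (the URL is a placeholder for one of your "typical" operations):

import cProfile, pstats
from django.test import Client

client = Client()
profiler = cProfile.Profile()
profiler.enable()
for _ in range(10):              # repeat the typical operation so one-off costs average out
    client.get("/reports/42/")   # placeholder request
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)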
Generally, your application is easy to optimize -- you shouldn't be doing much. Your view functions should be short and have no processing to speak of. Your form and model method functions, similarly, should be very short.
One way to get line by line performance data (profiling) your Django app is to use a WSGI middleware component like repoze.profile.
Assuming you are using mod_wsgi with Apache you can insert repoze.profile into your app like this:
...
application = django.core.handlers.wsgi.WSGIHandler()
...
from repoze.profile.profiler import AccumulatingProfileMiddleware
application = AccumulatingProfileMiddleware(
    application,
    log_filename='/path/to/logs/profile.log',
    discard_first_request=True,
    flush_at_shutdown=True,
    path='/_profile',
)
And now you can point your browser to /_profile to view your profile data. Of course this won't work with mod_python or the internal Django server.
