I'm interested in testing the performance of my django apps as I go, what is the best way to get line by line performance data?
note: Googling this returns lots of people benchmarking django itself. I'm not looking for benchmarks of Django itself, I'm trying to test the performance of the django apps that I'm writing :)
Thanks!
edit: By "line by line" I just mean timing individual functions, db calls, etc to find out where the bottlenecks are on a very granular level
There are two layers to this. We have most of #1 in place for our testing. We're about to start on #2.
Django in isolation. The ordinary Django unit tests work well here. Create some tests that cycle through a few (less than 6) "typical" use cases. Get this, post that, etc. Collect timing data. This isn't real web performance, but it's an easy-to-work-with test scenario that you can use for tuning.
Your whole web stack. In this case, you need a regular server running Squid, Apache, Django, MySQL, whatever. You need a second computer (or computers) to act as a client, exercising your web site through urllib2 (a rough sketch of such a client follows below), doing a few (less than 6) "typical" use cases. Get this, post that, etc. Collect timing data. This still isn't "real" web performance, because it isn't through the internet, but it's as close as you're going to get without a really elaborate setup.
Note that #2 (end-to-end) includes a great deal of caching for performance. If your client scripts are doing similar work, caching will be really beneficial. If your client scripts do unique things each time, caching will be less beneficial.
The hardest part is determining what the "typical" workload is. This isn't functional testing, so the workload doesn't have to include everything. Also, the more concurrent sessions your client is running, the slower it becomes. Don't struggle trying to optimize your server when your test client is the slowest part of the processing.
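As a rough illustration of layer #2, a timing client can be very small. This is just a sketch, assuming Python 2's urllib2 (as mentioned above); the URLs are placeholders for your own "typical" use cases.

# Minimal timing-client sketch for the end-to-end layer (#2).
# Assumes Python 2's urllib2; on Python 3 use urllib.request instead.
# The URLs below are placeholders for your own "typical" use cases.
import time
import urllib2

USE_CASES = [
    'http://testserver.example.com/',
    'http://testserver.example.com/some/list/',
    'http://testserver.example.com/some/detail/1/',
]

def time_use_cases():
    for url in USE_CASES:
        start = time.time()
        response = urllib2.urlopen(url)
        response.read()
        print '%s took %.3f seconds' % (url, time.time() - start)

if __name__ == '__main__':
    time_use_cases()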
Edit
If "line-by-line" means "profiling", well, you've got to get a Python profiler running.
https://docs.python.org/library/profile.html
Note that there's plenty of caching in the Django ORM layer. So running a view function a half-dozen times to get a meaningful set of measurements isn't sensible. You have to run a "typical" set of operations and then find hot-spots in the profile.
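For example, a rough way to profile a "typical" set of operations through the Django test client, then look for hot spots. This is only a sketch: it assumes DJANGO_SETTINGS_MODULE is set, and the URLs and form fields are placeholders.

# Rough profiling sketch: drive a "typical" set of operations through the
# Django test client under cProfile, then look for hot spots.
# Assumes DJANGO_SETTINGS_MODULE is set; the URLs are placeholders.
import cProfile
import pstats
from django.test.client import Client

def typical_workload():
    client = Client()
    for url in ('/', '/some/list/', '/some/detail/1/'):
        client.get(url)
    client.post('/some/form/', {'field': 'value'})

if __name__ == '__main__':
    cProfile.run('typical_workload()', 'workload.prof')
    stats = pstats.Stats('workload.prof')
    stats.sort_stats('cumulative').print_stats(20)   # 20 most expensive call paths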
Generally, your application is easy to optimize -- you shouldn't be doing much. Your view functions should be short and have no processing to speak of. Your form and model method functions, similarly, should be very short.
One way to get line by line performance data (profiling) your Django app is to use a WSGI middleware component like repoze.profile.
Assuming you are using mod_wsgi with Apache you can insert repoze.profile into your app like this:
...
application = django.core.handlers.wsgi.WSGIHandler()
...
from repoze.profile.profiler import AccumulatingProfileMiddleware
application = AccumulatingProfileMiddleware(
    application,
    log_filename='/path/to/logs/profile.log',
    discard_first_request=True,
    flush_at_shutdown=True,
    path='/_profile'
)
And now you can point your browser to /_profile to view your profile data. Of course this won't work with mod_python or the internal Django server.
Related
Is anyone aware of any issues with Django's caching framework when deployed to Apache/Mod_WSGI?
When testing with the caching framework locally with the dev server, using the profiling middleware and either the FileBasedCache or LocMemCache, Django's very fast. My request time goes from ~0.125 sec to ~0.001 sec. Fantastic.
I deploy the identical code to a remote machine running Apache/Mod_WSGI and my request time goes from ~0.155 sec (before I deployed the change) to ~.400 sec (post deployment). That's right, caching slowed everything down.
I've spent hours digging through everything, looking for something I'm missing. I've tried using FileBasedCache with a location on tmpfs, but that also failed to improve performance.
I've monitored the remote machine with top, and it shows no other processes and it has 6GB available memory, so basically Django should have full rein. I love Django, but it's incredibly slow, and so far I've never been able to get the caching framework to make any noticeable impact in a production environment. Is there anything I'm missing?
EDIT: I've also tried memcached, with the same result. I confirmed memcached was running by telnetting into it.
Indeed, Django can be slow, but I must say most of the slowness comes from the app itself. Django just pushes you (by providing bad examples in the docs) toward lazy patterns that are slow in production.
First of all: try nginx + uWSGI. It is just the best.
To optimize your app, you need to find out what is causing the slowness. It can be:
slow database queries (a lot of queries or just slow queries)
slow database itself
slow filesystem (nfs for example)
Try logging the queries each request runs, and watch iostat or iotop or something like that.
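For the query side, here is a minimal sketch of per-request query logging. It assumes DEBUG = True (Django only records queries then), the middleware name is made up, and it's written old-style; newer Django uses callable middleware.

# Hedged sketch: log each request's SQL queries to spot slow or excessive queries.
# Only works with DEBUG = True, since Django records queries only then.
# Old-style middleware class; newer Django versions use callable middleware instead.
import logging
from django.db import connection

logger = logging.getLogger('query_log')

class QueryLoggingMiddleware(object):
    def process_response(self, request, response):
        total = sum(float(q['time']) for q in connection.queries)
        logger.info('%s: %d queries, %.3fs in SQL',
                    request.path, len(connection.queries), total)
        return response

Add it to your middleware setting and the log will quickly show which views are query-heavy.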
I had this scenario with Apache + mod_wsgi: the first request from the browser was very slow... then a few requests from the same browser were fast... then, after sitting idle for 2 minutes, it was very slow again. I don't know if Apache was improperly configured and was shutting down the WSGI app and restarting it for each keepalive request. It just put me off; I installed nginx, and with nginx + FastCGI everything was a lot faster than Apache + mod_wsgi.
I had a similar problem with an app using memcached. The solution was running mod_wsgi in daemon mode instead of embedded mode, and Apache with the mpm_worker MPM. After that, the application worked much faster.
The same thing happened to me, and I was wondering what was taking so much time.
Each cache get was taking around 100 milliseconds.
So I debugged Django's locmem cache code and found out that pickling was taking a lot of time (I was caching a whole table in the locmem cache).
I wrapped the locmem backend myself since I didn't want anything advanced; if you remove the pickle/unpickle step and store the objects directly, you will see a major improvement.
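A minimal sketch of that idea follows. This is a hypothetical class, not a drop-in Django cache backend, and it is only safe if nothing mutates the cached objects.

# Hedged sketch: a process-local cache that stores Python objects directly
# instead of pickling them. Hypothetical class, not a drop-in Django backend.
import time

class RawLocalCache(object):
    def __init__(self):
        self._store = {}

    def set(self, key, value, timeout=300):
        self._store[key] = (value, time.time() + timeout)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires = entry
        if time.time() > expires:
            del self._store[key]
            return default
        return value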
Hope it helps someone.
I'm working on a turn-based web game that will perform all world updates (player orders, physics, scripted events, etc.) on the server. For now, I could simply update the world in a web request callback. Unfortunately, that naive approach is not at all scalable. I don't want to bog down my web server when I start running many concurrent games.
So what is the best way to separate the load from the web server, ideally in a way that could even be run on a separate machine?
A simple python module with infinite loop?
A distributed task in something like Celery?
Some sort of cross-platform Cron scheduler?
Some other fancy Django feature or third-party library that I don't know about?
I also want to minimize code duplication by using the same model layer. That probably means my service would need access to the Django model code, so that definitely determines how I architect the service.
I think Celery, which you mention in your question, is the way to go here. It will interface nicely with the rest of your setup, support your eventual aim of separating out the systems, and is compatible with Django.
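A minimal sketch of what a world-update task might look like with Celery; the Game model and its methods are hypothetical names, and on older Celery/django-celery versions the decorator was celery.task.task instead.

# tasks.py -- hedged sketch; Game and its methods are hypothetical names.
from celery import shared_task

from myapp.models import Game

@shared_task
def advance_world(game_id):
    """Run one turn of world updates for a single game, off the web process."""
    game = Game.objects.get(pk=game_id)
    game.apply_player_orders()
    game.step_physics()
    game.run_scripted_events()
    game.save()

The web request then just calls advance_world.delay(game.id) and returns immediately; a Celery worker, possibly on another machine, does the actual work.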
I'd write the backend to just use the Django database interface (look at the setup code in your manage.py), spawn it as its own process, and interface with it using Protocol Buffers. That route should move to a separate machine with little work. MPI may be an option, too.
Pipes, FIFOs, and most other IPC mechanisms require both processes to be on the same box.
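To illustrate the "spawn it as its own process" part, here is a rough sketch of a standalone worker that reuses the Django model layer; "mysite.settings" and the World model are placeholders for your own project.

# Standalone worker sketch; "mysite.settings" and the World model are placeholders.
import os
import time

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')

import django
django.setup()   # needed on Django 1.7+; older versions only needed the env var

from myapp.models import World

def run_forever():
    while True:
        for world in World.objects.filter(needs_update=True):
            world.advance_turn()
            world.save()
        time.sleep(1)

if __name__ == '__main__':
    run_forever()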
Though I have to point out a flaw in your premise:
Unfortunately, that naive approach is not at all scalable. I don't want to bog down my web server when I start running many concurrent games.
If you run concurrent games, so long as you keep all the parts for a given game on the same server, this is a non-issue, unless there's a common resource needed by all games. Then the real issue becomes load balancing across the servers.
I'm looking for some recommendations for a python web application. We have some memory restrictions and we try to keep it small and lean.
We thought about using WSGI (and a python webserver) and building the rest ourselves. We already have a template engine we'd like to use, but we are open to suggestions regarding the whole request handling (the controller).
The application has to run in a single process and the requests have to be processed with multiple threads.
We've looked at django, but we are not sure if it fits into our memory budget.
Your feedback is very welcome!
Cheers,
Reto
I've been using Werkzeug because it's more a small collection of really useful components than a whole framework. It runs behind a wsgi server of your choice (and comes with a built-in one). If you want something even easier, Flask might be worth a look. Also, you might want to bookmark the rather speedy Jinja in case your template engine doesn't pan out. Those folks over at pocoo.org have been releasing some nice stuff.
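For a sense of scale, a complete Werkzeug application can be this small; this is just a sketch with a placeholder response.

# Tiny Werkzeug sketch: a complete WSGI application plus a development server.
from werkzeug.wrappers import Request, Response

@Request.application
def application(request):
    return Response('Hello from a small WSGI app', mimetype='text/plain')

if __name__ == '__main__':
    from werkzeug.serving import run_simple
    run_simple('localhost', 8000, application)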
You can easily run a Django application in 20 MB of memory; most likely it will use even less than that.
I'd also advise you to check out web.py and CherryPy,
but I'm a big fan of Django. If you have 20 MB of memory to run the application, Django will give you everything it has.
I'd go for bottle. It has all the conciseness of web.py but with some nice routing features.
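For example, a minimal Bottle app with a routed URL parameter; the route and handler are just illustrative.

# Minimal Bottle sketch showing its routing; the route and handler are illustrative.
from bottle import route, run

@route('/hello/<name>')
def hello(name):
    return 'Hello, %s!' % name

if __name__ == '__main__':
    run(host='localhost', port=8080)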
You could take a look at Twisted, which has a module twisted.web. That seems to be fairly light-weight. I'm currently using it, and with a simple app it starts almost instantaneously, so it can't be all that resource intensive :)
I don't know whether Twisted uses different threads.
webpy (http://webpy.org/) has a very minimal memory footprint but is a highly usable framework. It all depends on how complex your application is going to be, though.
Also, please take a look at WHIFF. It's tiny and very flexible (see the WHIFF documentation).
We have a web service which serves small, arbitrary segments of a fixed inventory of larger MP3 files. The MP3 files are generated on-the-fly by a python application. The model is, make a GET request to a URL specifying which segments you want, get an audio/mpeg stream in response. This is an expensive process.
We're using Nginx as the front-end request handler. Nginx takes care of caching responses for common requests.
We initially tried using Tornado on the back-end to handle requests from Nginx. As you would expect, the blocking MP3 operation kept Tornado from doing its thing (asynchronous I/O). So, we went multithreaded, which solved the blocking problem, and performed quite well. However, it introduced a subtle race condition (under real world load) that we haven't been able to diagnose or reproduce yet. The race condition corrupts our MP3 output.
So we decided to set our application up as a simple WSGI handler behind Apache/mod_wsgi (still w/ Nginx up front). This eliminates the blocking issue and the race condition, but creates a cascading load (i.e. Apache creates too many processes) on the server under real world conditions. We're working on tuning Apache/mod_wsgi right now, but still at a trial-and-error phase. (Update: we've switched back to Tornado. See below.)
Finally, the question: are we missing anything? Is there a better way to serve CPU-expensive resources over HTTP?
Update: Thanks to Graham's informed article, I'm pretty sure this is an Apache tuning problem. In the meantime, we've gone back to using Tornado and are trying to resolve the data-corruption issue.
For those who were so quick to throw more iron at the problem, Tornado and a bit of multi-threading (despite the data integrity problem introduced by threading) handles the load acceptably on a small (single core) Amazon EC2 instance.
Have you tried Spawning? It is a WSGI server with a flexible assortment of threading modes.
Are you making the mistake of using embedded mode of Apache/mod_wsgi? Read:
http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html
Ensure you use daemon mode if using Apache/mod_wsgi.
You might consider a queuing system with AJAX notification methods.
Whenever there is a request for your expensive resource, and that resource needs to be generated, add that request to the queue (if it's not already there). That queuing operation should return an ID of an object that you can query to get its status.
Next you have to write a background service that spins up worker threads. These workers simply dequeue a request, generate the data, then save the data's location in the request object.
The webpage can make AJAX calls to your server to find out the progress of the generation and to give a link to the file once it's available.
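A rough in-process sketch of that flow; render_mp3() and the status dict are hypothetical stand-ins for your generation code and status store, and a real setup would use a proper task queue and database.

# Hedged sketch of the queue/worker idea. render_mp3() and the status dict are
# hypothetical stand-ins; a real setup would use a proper task queue and database.
import threading
import uuid
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

work_queue = queue.Queue()
status = {}   # request_id -> None while pending, or the path of the finished file

def worker():
    while True:
        request_id, segments = work_queue.get()
        status[request_id] = render_mp3(segments)   # hypothetical expensive generation step
        work_queue.task_done()

def enqueue(segments):
    request_id = str(uuid.uuid4())
    status[request_id] = None
    work_queue.put((request_id, segments))
    return request_id          # the AJAX status view polls status[request_id]

for _ in range(4):             # a few background workers
    t = threading.Thread(target=worker)
    t.daemon = True
    t.start()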
This is how LARGE media sites work - those that have to deal with video in particular. It might be overkill for your MP3 work however.
Alternatively, look into running a couple of machines to distribute the load. Your threads on Apache will still block, but at least you won't consume resources on the web server.
Please define "cascading load", as it has no common meaning.
Your most likely problem is going to be if you're running too many Apache processes.
For a load like this, make sure you're using the prefork mpm, and make sure you're limiting yourself to an appropriate number of processes (no less than one per CPU, no more than two).
It looks like you are doing things right -- you're just lacking CPU power. Can you determine what the CPU load is while generating these MP3s?
I think the next thing to do is add more hardware to render the MP3s on other machines. Either that, or find a way to deliver pre-rendered MP3s (maybe you can cache some of your media?).
BTW, scaling for the web was the theme of a keynote lecture by Jacob Kaplan-Moss at PyCon Brasil this year, and it is far from being a solved problem. The stack of technologies one needs to handle is quite impressive. (I could not find an online copy of the presentation, though -- sorry for that.)
I have a website that right now, runs by creating static html pages from a cron job that runs nightly.
I'd like to add some search and filtering features using a CGI type script, but my script will have enough of a startup time (maybe a few seconds?) that I'd like it to stay resident and serve multiple requests.
This is a side-project I'm doing for fun, and it's not going to be super complex. I don't mind using something like Pylons, but I don't feel like I need or want an ORM layer.
What would be a reasonable approach here?
EDIT: I wanted to point out that for the load I'm expecting and processing I need to do on a request, I'm confident that a single python script in a single process could handle all requests without any slowdowns, especially since my dataset would be memory-resident.
That's exactly what WSGI is for ;)
I don't know offhand the simplest way to turn a CGI script into a WSGI application (I've always had that managed by a framework), but it shouldn't be too tricky.
That said, An Introduction to the Python Web Server Gateway Interface (WSGI) seems to be a reasonable introduction, and you'll also want to take a look at mod_wsgi (assuming you're using Apache...).
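For reference, the bare WSGI interface is small enough to show in full. In this sketch, search() is a hypothetical stand-in for your resident, in-memory search logic.

# Minimal WSGI sketch; search() is a hypothetical stand-in for your in-memory search.
from urlparse import parse_qs   # Python 2; on Python 3: from urllib.parse import parse_qs

def application(environ, start_response):
    params = parse_qs(environ.get('QUERY_STRING', ''))
    query = params.get('q', [''])[0]
    body = search(query)                     # hypothetical: runs against your resident dataset
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [body]

Because the process stays resident under mod_wsgi, your dataset only has to be loaded once at startup, which addresses the few-seconds startup cost you mentioned.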
Maybe you should direct your search towards inter-process communication and make a search process that returns the results to the web server. This search process would be running all the time, assuming you have your own server.