I'm running a WSGI application on Apache mod_wsgi in daemon mode.
I have these lines in the configuration:
WSGIDaemonProcess app processes=2 threads=3 display-name=%{GROUP}
WSGIProcessGroup app
How do I find the optimal combination/tuning of processes and threads?
EDIT:
This link [given in the answer below] was quite useful:
https://serverfault.com/questions/145617/apache-2-2-mpm-worker-more-threads-or-more-processes/146382#146382
Now, my question is this: If my server gives quite good performance for my needs, should I reduce the number of threads to increase stability / reliability? Can I even set it to 1?
You might get more information on ServerFault as well. For example: https://serverfault.com/questions/145617/apache-2-2-mpm-worker-more-threads-or-more-processes
This is another good resource for the topic: http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading#The_mod_wsgi_Daemon_Processes
which briefly describes the options -- including setting threads = 1.
I haven't done this yet, but it sounds like it doesn't matter much. Multiple threads and multiple processes are both well supported. But for my experience level (and probably yours) it's worthwhile to eliminate threading as an extra source of concern -- even if it is theoretically rock solid.
Your best bet is probably to try different benchmarks. You can use the Apache benchmark tool (ab) to get a rough estimate of how your configuration is doing. A lot of the tweaking will depend on how CPU- or I/O-bound your web app is. Performance will also depend on the specs of the server you are hosting on, and so on.
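For example (the URL and the numbers are only placeholders), you could run something like ab -n 500 -c 10 http://localhost/myapp/ against each candidate configuration -- say threads=1 with several processes, then more threads per process -- and compare the requests-per-second and failed-request counts that ab reports.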
TL;DR:
Is there a possibility to easily let the workers decide which tasks they can work on, depending on their (local) configuration and the task args/kwargs?
A quick and dirty solution I thought of would be to raise Reject() in all workers that find themselves unsuitable, but I was hoping there's a more elegant one.
Details:
The application is an (educational) programming-assignment evaluation tool, maybe comparable to continuous integration: a web application accepts submissions of source code for (previously specified) programming languages (or better: programming environments), which then need to be compiled and executed with several test cases. Especially for use in a high-performance computing course with GPUs, compiling and executing cannot happen on the host where the web application runs (for other cases, just think of security reasons).
To make this easily configurable for administrators, I'd like each worker to have a configuration file where locally available resources, compiler types and paths, etc. are configured, and which the worker uses to decide whether to work on a task or not.
Simply using different queues with a custom router does not seem appealing to me, because the number and configuration of queues could vary at runtime, and it would look a little messy, I think.
Is there an elegant way to achieve something like that? To be honest, the documentation on Extensions and Bootsteps didn't give me much guidance on this.
Thanks in advance for any tips and pointers.
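For reference, the "quick and dirty" Reject() idea from the TL;DR might look roughly like this (the task, broker URL and config loading are hypothetical; late acknowledgement is assumed so a rejected task can be requeued for another worker):
from celery import Celery
from celery.exceptions import Reject

app = Celery('grader', broker='amqp://localhost')

# Hypothetical: read the locally available environments from a per-worker config file
LOCAL_ENVIRONMENTS = {'c', 'cuda'}

@app.task(bind=True, acks_late=True)
def evaluate_submission(self, submission_id, environment):
    if environment not in LOCAL_ENVIRONMENTS:
        # This worker is unsuitable; put the task back for a worker that can handle it
        raise Reject('environment not available on this worker', requeue=True)
    # ... compile and run the test cases here ...
    return {'submission': submission_id, 'status': 'done'}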
I'm creating an app in several different Python web frameworks to see which has the best balance of being comfortable for me to program in and performance. Is there a way of reporting the memory usage of a particular app that is being run in a virtualenv?
If not, how can I find the average, maximum and minimum memory usage of my web framework apps?
It depends on how you're going to run the application in your environment. There are many different ways to run Python web apps. Recently popular methods seem to be Gunicorn and uWSGI. So you'd be best off running the application as you would in your environment, and you can simply use a process monitor to see how much memory and CPU is being used by the process running your application.
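For instance (assuming a Gunicorn-style deployment and a made-up PID), you could watch a worker's resident memory from the shell with ps or top, or sample it from Python with psutil:
import psutil

proc = psutil.Process(12345)                 # PID of the worker process serving your app
print(proc.memory_info().rss / (1024 * 1024), "MB resident")
Sampling that periodically while exercising the app gives you the average, maximum and minimum figures you're after.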
I'll second Matt W's note about the application environment being a major factor (Gunicorn, uWSGI, nginx->paster/pserve, eventlet, Apache+mod_wsgi, etc.).
I'll also add this: the year is 2012. In 1999, memory and CPU for stuff like this were huge concerns. But it's 2012. Computers are significantly more powerful, expanding them is much easier and cheaper, and frameworks are coded better.
You're essentially looking at benchmarking things that have no practical impact and will only be theoretically 'neat' and informative.
The performance bottlenecks on Python webapps are usually:
database communications bottleneck
database schema
concurrent connections / requests-per-second
In terms of the database communications bottleneck, the general approaches to solving it are:
communicate less
aggressive caching (see the sketch after this list)
optimize your SQL queries and result sets, so there's less data
upgrade your db infrastructure
dedicated machine(s)
cluster master/slave or shard
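As a sketch of the "aggressive caching" idea (using Django's cache framework purely as an example; compute_scoreboard is a made-up placeholder for an expensive query):
from django.core.cache import cache

def get_scoreboard():
    data = cache.get('scoreboard')
    if data is None:
        data = compute_scoreboard()            # placeholder for the expensive DB work
        cache.set('scoreboard', data, 60 * 5)  # cache the result for five minutes
    return data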
In terms of database schema, convenience comes at a price. It's faster to get certain things done in Django -- but you're going to be largely stuck with the schema it creates. Pyramid+SQLAlchemy is more flexible and you can build against a finely tuned database with it... but you're not going to get any of the automagic tools that Django gives.
For concurrent connections / requests per second, it's largely down to the environment. Running the same app under paster, uWSGI and other deployment strategies will give different results.
Here's a link to a good, but old, benchmark: http://nichol.as/benchmark-of-python-web-servers
You'll note there's a slide for peak memory usage there, and although there are a few outliers and a decent amount of clustering going on, the worst performer had 122MB. That's nothing.
You could interpret gevent as awesome for having 3MB compared to uWSGI's 15 or Cogen's 122... but these are all a small fraction of a modern system's memory.
The frameworks have such a small overhead that they will barely be a factor in operating performance. Even the database portions are nothing. See this posting about SQLAlchemy (Why is SQLAlchemy insert with sqlite 25 times slower than using sqlite3 directly?), where the maintainer gives some impressive performance figures: straight-up SQL generation was ~0.5s for 100k rows; when a full ORM with integrity checks etc. is involved, it becomes 16s for the same number of rows. That is nothing.
So, my point is simple. The two factors you should consider are:
how fast / comfortably can I program now
how fast / comfortably can I program a year from now (i.e. how likely is my project to accrue 'technical debt' using this framework, and how much of a problem will that become)
Play with the frameworks to decide which one you like the most, but don't waste your time on performance testing, because all you're going to do is waste time.
The choice of hosting mechanism isn't the cause of memory usage; it is how you configure it, plus what fat Python web application you decide to run.
The benchmark being quoted of:
http://nichol.as/benchmark-of-python-web-servers
is a good example of where benchmarks can get it quite wrong.
The configurations of the different hosting mechanisms in that benchmark were not comparable and so there is no way you can use the results to evaluate memory usage of each properly. I would not pay much attention to that benchmark if memory is your concern.
Ignoring memory, some of the other comments made about where the real bottlenecks are going to be are valid. For a lot more detail on this whole issue see my PyCon talk.
http://lanyrd.com/2012/pycon/spcdg/
This question is relevant to this and this;
the difference is, I'd prefer something with possibly more precision and lower load (a per-minute cron job isn't preferable for those) and with minimal overhead (i.e. installing Celery with RabbitMQ seems like big overkill).
An example task would be a personal reminders server (with reminders that could be edited over the web and sent out through e-mail or XMPP).
I'm probably looking for something more like node.js's setTimeout, but for Django (and though I might prefer to implement reminders in node.js anyway, it's still a possibly interesting question).
For example, it's possible to start new threads in a Django app (with functions consisting of sleep() and send()); in what ways can this be bad?
The problems with using threads for this are the typical problems with Python threads that always drive people towards multi-process solutions instead. The problem is compounded here by the fact that your thread isn't driven by the normal request-response cycle. This is summarized nicely by Malcolm Tredinnick here:
Have to disagree. Threads are not a good solution to this problem. The issue is process management. As written, your threads will never be rejoined. Webserver processes have a lifecycle uncontrollable by you (the MaxRequestsPerChild Apache parameter and similar things in other servers) and you are messing with that by using threads.
If you need a process with a lifecycle that is not matched by the request-response path -- something long running and independent of the response -- a completely separate process is definitely the right model to use. Using a thread ties it to the response lifecycle, which will have unintended side-effects.
A possible solution for you might be to have a long-running process performing your tasks, which gets a wake-up signal from a light cron process.
Another possibility would be to build something using 0MQ, which is much lighter than AMQP-style queues (at the cost of some features, of course). Tarek Ziade is working on a Mozilla project called powerhose that uses 0MQ; it looks super simple, and has a heartbeat capability with resolution to the second.
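As a minimal sketch of that kind of setup with pyzmq (the address, message format and send_reminder are all made up): the web app pushes reminder jobs to a separate long-running consumer process.
# consumer.py -- a separate long-running process, started outside the web server
import time
import zmq

context = zmq.Context()
receiver = context.socket(zmq.PULL)
receiver.bind("tcp://127.0.0.1:5557")

while True:
    job = receiver.recv_json()            # e.g. {"due": 1339380000, "text": "..."}
    delay = job["due"] - time.time()
    if delay > 0:
        time.sleep(delay)                 # naive: a real version would keep a schedule
    send_reminder(job["text"])            # placeholder for the e-mail/XMPP delivery

# In the Django view, handing over a job is just:
#     sender = zmq.Context().socket(zmq.PUSH)
#     sender.connect("tcp://127.0.0.1:5557")
#     sender.send_json({"due": due_timestamp, "text": reminder_text})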
Is it just me or is having to run multiple instances of a web server to scale a hack?
Am I wrong in this?
Clarification
I am referring to how I read people run multiple instances of a web service on a single server. I am not talking about a cluster of servers.
Not really; people were running multiple frontends across a cluster of servers before multi-core CPUs became widespread.
So all the infrastructure for supporting sessions properly across multiple frontends had been in place for quite some time before it became really advantageous to run a bunch of threads on one machine.
In fact, using asynchronous-style frontends gives better performance on the same hardware than a multithreaded approach, so I would say that not running multiple instances in favour of a multithreaded monster is a hack.
Since we are now moving towards more cores rather than faster processors, to scale more and more you will need to be running more instances.
So yes, I reckon you are wrong.
This does not by any means condone brain-dead programming with the excuse that you can just scale it horizontally; that just seems foolish.
With no details, it is very difficult to see what you are getting at. That being said, it is quite possible that you are simply not using the right approach for your problem.
Sometimes multiple separate instances are better. Sometimes your Python services are actually better deployed behind a single Apache instance (using mod_wsgi), which may elect to use more than a single process. I don't know enough about Ruby to offer an opinion there.
In short, if you want to make your service scalable then the way to do so depends heavily on additional details. Is it scaling up or scaling out? What is the operating system and available or possibly installable server software? Is the service itself easily parallelized and how much is it database dependent? How is the database deployed?
Even if the Ruby/Python interpreters were perfect and could utilize all available CPU with a single process, you would still reach the maximum capacity of a single server sooner or later and have to scale across several machines, going back to running several instances of your app.
I would hesitate to say that the issue is a "hack". Or indeed that threaded solutions are necessarily superior.
The situation is a result of design decisions used in the interpreters of languages like Ruby and Python.
I work with Ruby, so the details may be different for other languages.
BUT ... essentially, Ruby uses a Global Interpreter Lock to prevent threading issues:
http://en.wikipedia.org/wiki/Global_Interpreter_Lock
The side-effect of this is that, to achieve concurrency with frameworks like Rails, rather than relying on multiple threads within the VM, we use multiple processes, each with its own interpreter and instance of your framework and application code.
Each instance of the app handles a single request at a time. To achieve concurrency we have to spin up multiple instances.
In the olden days (2-3 years ago) we would run multiple mongrel (or similar) instances behind a proxy (generally apache). Passenger changed some of this because it is smart enough to manage the processes itself, rather than requiring manual setup. You tell Passenger how many processes it can use and off it goes.
The whole structure is actually not as bad as the thread-orthodoxy would have you believe. For a start, it's pretty easy to make this type of architecture work in a multicore environment. Any modern database is designed to handle highly concurrent loads, so having multiple processes has very little if any effect at that level.
If you use a language like JRuby you can deploy into a threaded app server like Tomcat and have a deployment that looks much more "java-like". However, this is not as big a win as you might think, because now your application needs to be much more thread-aware and you can see side effects and strangeness from threading issues.
Your assumption that Tomcat's and IIS's single process per server is superior is flawed. The choice of a multi-threaded server and a multi-process server depends on a lot of variables.
One main thing is the underlying operating system. Unix systems have always had great support for multi-processing because of the copy-on-write nature of the fork system call. This makes multiple processes a really attractive option, because web serving is usually very shared-nothing and you don't have to worry about locking. Windows, on the other hand, had much heavier processes and lighter threads, so programs like IIS would gravitate to a multi-threading model.
As for the question of whether it's a hack to run multiple servers, it really depends on your perspective. If you look at Apache, it comes with a variety of pluggable engines to choose from. The MPM-prefork one is the default because it allows the programmer to easily use non-thread-safe C/Perl/database libraries without having to throw locks and semaphores all over the place. To some that might be a hack to work around poorly implemented libraries. To me it's a brilliant way of leaving it to the OS to handle the problems and letting me get back to work.
Also a multi-process model comes with a few features that would be very difficult to implement in a multi-threaded server. Because they are just processes, zero-downtime rolling-updates are trivial. You can do it with a bash script.
It also has its shortcomings. In a single-process model, setting up a singleton that holds some global state is trivial, while in a multi-process model you have to serialize that state to a database or Redis server. (Of course, if your single-process server outgrows a single server you'll have to do that anyway.)
Is it a hack? Yes and no. Both original implementations (MRI and CPython) have Global Interpreter Locks that will prevent a multi-core server from operating at its 100% potential. On the other hand, multi-process has its advantages (especially on the Unix side of the fence).
There's also nothing inherent in the languages themselves that makes them require a GIL, so you can run your application with Jython, JRuby, IronPython or IronRuby if you really want to share state inside a single process.
My memory usage increases over time and restarting Django is not kind to users.
I am unsure how to go about profiling the memory usage but some tips on how to start measuring would be useful.
I have a feeling that there are some simple steps that could produce big gains. Ensuring 'debug' is set to 'False' is an obvious biggie.
Can anyone suggest others? How much improvement would caching make on low-traffic sites?
In this case I'm running under Apache 2.x with mod_python. I've heard mod_wsgi is a bit leaner but it would be tricky to switch at this stage unless I know the gains would be significant.
Edit: Thanks for the tips so far. Any suggestions how to discover what's using up the memory? Are there any guides to Python memory profiling?
Also, as mentioned, there are a few things that will make it tricky to switch to mod_wsgi, so I'd like to have some idea of the gains I could expect before ploughing forwards in that direction.
Edit: Carl posted a slightly more detailed reply here that is worth reading: Django Deployment: Cutting Apache's Overhead
Edit: Graham Dumpleton's article is the best I've found on the MPM and mod_wsgi related stuff. I am rather disappointed that no-one could provide any info on debugging the memory usage in the app itself though.
Final Edit: Well I have been discussing this with Webfaction to see if they could assist with recompiling Apache and this is their word on the matter:
"I really don't think that you will get much of a benefit by switching to an MPM Worker + mod_wsgi setup. I estimate that you might be able to save around 20MB, but probably not much more than that."
So! This brings me back to my original question (which I am still none the wiser about). How does one go about identifying where the problem lies? It's a well-known maxim that you don't optimize without testing to see where you need to optimize, but there is very little in the way of tutorials on measuring Python memory usage and none at all specific to Django.
Thanks for everyone's assistance but I think this question is still open!
Another final edit ;-)
I asked this on the django-users list and got some very helpful replies
Honestly the last update ever!
This was just released. Could be the best solution yet: Profiling Django object size and memory usage with Pympler
Make sure you are not keeping global references to data. That prevents the Python garbage collector from releasing the memory.
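For example, a module-level cache that only ever grows will keep a reference to everything it has ever stored (a made-up illustration; build_report is a placeholder for expensive work):
_results_cache = {}   # module-level: lives as long as the process does

def get_report(key):
    if key not in _results_cache:
        _results_cache[key] = build_report(key)   # hypothetical expensive call
    return _results_cache[key]   # every report ever built stays referenced here
Bounding such caches (or using Django's cache framework, which supports timeouts) keeps the garbage collector able to do its job.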
Don't use mod_python. It loads an interpreter inside Apache. If you need to use Apache, use mod_wsgi instead. It is not tricky to switch; it is very easy. mod_wsgi is way easier to configure for Django than brain-dead mod_python.
If you can remove Apache from your requirements, that would be even better for your memory. Spawning seems to be the new fast, scalable way to run Python web applications.
EDIT: I don't see how switching to mod_wsgi could be "tricky". It should be a very easy task. Please elaborate on the problem you are having with the switch.
If you are running under mod_wsgi, and presumably spawning since it is WSGI compliant, you can use Dozer to look at your memory usage.
Under mod_wsgi just add this at the bottom of your WSGI script:
from dozer import Dozer
application = Dozer(application)
Then point your browser at http://domain/_dozer/index to see a list of all your memory allocations.
I'll also just add my voice of support for mod_wsgi. It makes a world of difference in terms of performance and memory usage over mod_python. Graham Dumpleton's support for mod_wsgi is outstanding, both in terms of active development and in helping people on the mailing list to optimize their installations. David Cramer at curse.com has posted some charts (which I can't seem to find now unfortunately) showing the drastic reduction in cpu and memory usage after they switched to mod_wsgi on that high traffic site. Several of the django devs have switched. Seriously, it's a no-brainer :)
These are the Python memory profiler solutions I'm aware of (not Django related):
Heapy
pysizer (discontinued)
Python Memory Validator (commercial)
Pympler
Disclaimer: I have a stake in the latter.
The individual project's documentation should give you an idea of how to use these tools to analyze memory behavior of Python applications.
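As a rough illustration with Pympler (just one way to use these tools), you can take a snapshot of all tracked objects and print a summary of types, counts and sizes:
from pympler import muppy, summary

all_objects = muppy.get_objects()          # snapshot of all tracked Python objects
summary.print_(summary.summarize(all_objects))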
The following is a nice "war story" that also gives some helpful pointers:
Reducing the footprint of python applications
Additionally, check that you are not using any of the known leakers. MySQLdb is known to leak enormous amounts of memory with Django due to a bug in its Unicode handling. Other than that, the Django Debug Toolbar might help you track down the hogs.
In addition to not keeping around global references to large data objects, try to avoid loading large datasets into memory at all wherever possible.
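For instance, with Django's ORM you can often stream rows instead of materializing the whole result set (a generic sketch; the model and the per-row function are made up):
# QuerySet.iterator() streams rows from the database cursor instead of
# caching the entire result set on the queryset:
for entry in LogEntry.objects.all().iterator():
    process(entry)   # hypothetical per-row work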
Switch to mod_wsgi in daemon mode, and use Apache's worker mpm instead of prefork. This latter step can allow you to serve many more concurrent users with much less memory overhead.
Webfaction actually has some tips for keeping django memory usage down.
The major points:
Make sure debug is set to false (you already know that).
Use "ServerLimit" in your apache config
Check that no big objects are being loaded in memory
Consider serving static content in a separate process or server.
Use "MaxRequestsPerChild" in your apache config
Find out and understand how much memory you're using
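The "ServerLimit" and "MaxRequestsPerChild" points above might look something like this in an Apache 2.2 prefork configuration (the values are illustrative only and depend on your RAM and per-process footprint):
<IfModule mpm_prefork_module>
    ServerLimit          8
    MaxClients           8
    MaxRequestsPerChild  500
</IfModule>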
Another plus for mod_wsgi: set a maximum-requests parameter in your WSGIDaemonProcess directive and mod_wsgi will restart the daemon process every so often. There should be no visible effect for the user, other than a slow page load the first time a fresh process is hit, as it'll be loading Django and your application code into memory.
But even if you do have memory leaks, that should keep the process size from getting too large, without having to interrupt service to your users.
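For example, building on the directive from the first question (the numbers are illustrative only):
WSGIDaemonProcess app processes=2 threads=3 maximum-requests=1000 display-name=%{GROUP}
Here each daemon process is recycled after serving 1000 requests, which caps the damage any slow leak can do.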
Here is the script I use for mod_wsgi (called wsgi.py, and put in the root of my Django project):
import os
import sys
from os import path

import django.core.handlers.wsgi

# mod_wsgi restricts writing to stdout, so send stray prints to /dev/null instead
sys.stdout = open('/dev/null', 'a+')
sys.stderr = open('/dev/null', 'a+')

# Make the project's parent directory importable so 'myproject' can be found
sys.path.append(path.join(path.dirname(__file__), '..'))

os.environ['DJANGO_SETTINGS_MODULE'] = 'myproject.settings'

application = django.core.handlers.wsgi.WSGIHandler()
Adjust myproject.settings and the path as needed. I redirect all output to /dev/null since mod_wsgi by default prevents printing. Use logging instead.
For apache:
<VirtualHost *>
ServerName myhost.com
ErrorLog /var/log/apache2/error-myhost.log
CustomLog /var/log/apache2/access-myhost.log common
DocumentRoot "/var/www"
WSGIScriptAlias / /path/to/my/wsgi.py
</VirtualHost>
Hopefully this should at least help you set up mod_wsgi so you can see if it makes a difference.
Caches: make sure they're being flushed. It's easy for something to land in a cache but never be GC'd because of the cache reference.
SWIG'd code: make sure any memory management is being done correctly; it's really easy to miss these issues in Python, especially with third-party libraries.
Monitoring: If you can, get data about memory usage and hits. Usually you'll see a correlation between a certain type of request and memory usage.
We stumbled over a bug in Django with big sitemaps (10,000+ items). It seems Django tries to load them all into memory when generating the sitemap: http://code.djangoproject.com/ticket/11572 - it effectively kills the Apache process when Google pays a visit to the site.