I'm writing an Oracle of Bacon-type website that involves a breadth-first search on a very large directed graph (>5 million nodes with an average of perhaps 30 outbound edges each). This is also essentially all the site will do, aside from displaying a few mostly-text pages (how it works, contact info, etc.). I currently have a test implementation running in Python, but even using Python arrays to represent the data efficiently, it takes >1.5 GB of RAM to hold the whole thing. Clearly Python is the wrong language for a low-level algorithmic problem like this, so I plan to rewrite most of it in C using the Python/C bindings. I estimate that this'll take about 300 MB of RAM.
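For the curious, the kind of compact CSR-style layout I have in mind, sketched here with the stdlib array module and a tiny made-up graph, looks roughly like this:

    # CSR-style adjacency: node IDs are dense integers 0..n-1, and
    # edges[offsets[v]:offsets[v+1]] lists the neighbours of v.
    from array import array
    from collections import deque

    offsets = array('l', [0, 2, 3, 3])  # n+1 entries
    edges = array('l', [1, 2, 2])       # flattened outbound edge lists

    def bfs_distance(src, dst):
        """Hop count from src to dst, or -1 if unreachable."""
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            if v == dst:
                return dist[v]
            for u in edges[offsets[v]:offsets[v + 1]]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        return -1

    print(bfs_distance(0, 2))  # -> 1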
Based on my current configuration, this will run through mod_wsgi in Apache 2.2.14, which is set to use mpm_worker_module. Each child Apache server will then load up the whole Python setup (which loads the C extension), thus using 300 MB, and I only have 4 GB of RAM. This'll take time to load, and it seems like it'd potentially keep the number of server instances lower than it could otherwise be. If I understand correctly, data-heavy (and not client-interaction-heavy) tasks like this would typically get divorced from the server by setting up an SQL database or something of the sort that all the server processes could then query. But I don't know of a database framework that'd fit my needs.
So, how to proceed? Is it worth trying to set up a database divorced from the web server, or in some other way move the application a step farther out than mod_wsgi, in order to maybe get a few more server instances running? If so, how could this be done?
My first impression is that the database, and not the server, is always going to be the limiting factor. It looks like the typical Apache mpm_worker_module configuration has ServerLimit 16 anyway, so I'd probably only get a few more servers. And if I did divorce the database from the server, I'd have to have some way to run multiple instances of the database as well (I already know that just one probably won't cut it for the traffic levels I want to support) and make them play nice with the server. So I've perhaps mostly answered my own question, but this is a kind of odd situation, so I figured it'd be worth seeing if anyone's got a firmer handle on it. Anything I'm missing? Does this implementation make sense? Thanks in advance!
Technical details: it's a Django website that I'm going to serve using Apache 2.2.14 on Ubuntu 10.04.
First up, look at the daemon mode of mod_wsgi and don't use embedded mode, as daemon mode lets you control the number of Python WSGI application processes separately from the Apache child processes. Secondly, you would be better off putting the memory-hungry bits in a separate backend process. You might use XML-RPC or another message-queueing system to communicate with the backend processes, or perhaps see whether you can use Celery in some way.
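As a minimal sketch of the XML-RPC split (assuming Python 3's stdlib xmlrpc; bfs_distance here is a hypothetical stand-in for the real C-backed search):

    # backend.py -- one long-running process holds the big graph in memory.
    from xmlrpc.server import SimpleXMLRPCServer

    def bfs_distance(src, dst):
        return 0  # placeholder: call into the C extension here

    server = SimpleXMLRPCServer(("127.0.0.1", 8000), allow_none=True)
    server.register_function(bfs_distance)
    server.serve_forever()

Each (now small) WSGI process then makes a cheap RPC call instead of loading the 300 MB graph itself:

    import xmlrpc.client

    backend = xmlrpc.client.ServerProxy("http://127.0.0.1:8000/", allow_none=True)
    distance = backend.bfs_distance(12345, 67890)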
Related
I have written a script to do some research on HTTP Archive data. This script needs to make HTTP requests to sites scraped by HTTP Archive in order to classify the sites into groups (e.g., Drupal, WordPress). The script is working really well; however, the list of sites that I am handling is 300,000 sites long.
I would like to be able to complete the categorization of sites as fast as possible. I have experimented with running multiple instances of the script at the same time and it is working well with appropriate locks in place to prevent race conditions.
How can I max this out to get all of these operations completed as fast as possible? For instance, I am looking at spinning up a VPS with 8 CPUs and 16 GB RAM. How do I maximize these resources to make sure I'm using every bit of processing power possible? I may consider spinning up something more powerful, but I want to make sure I understand how to get the most out of it so I'm not wasting money.
Thanks!
The multiprocessing module is the best option; it lets you harness the maximum power of your 8 CPUs:
https://docs.python.org/3.3/library/multiprocessing.html
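A rough sketch of how that might look for this workload (Python 3, stdlib only; the URL list and the platform fingerprints are hypothetical):

    # Classify sites in parallel with a process pool. For network-bound
    # work you can often use many more workers than CPUs.
    import multiprocessing
    import urllib.request

    def classify(url):
        """Fetch a page and guess the platform from well-known markers."""
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                body = resp.read(100000).decode("utf-8", errors="replace")
        except Exception:
            return (url, "error")
        if "wp-content" in body:
            return (url, "WordPress")
        if "Drupal.settings" in body:
            return (url, "Drupal")
        return (url, "unknown")

    if __name__ == "__main__":
        sites = ["http://example.com"]  # in reality, the 300,000-site list
        with multiprocessing.Pool(processes=8) as pool:
            for url, kind in pool.imap_unordered(classify, sites):
                print(url, kind)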
I want to create a Python script that polls a folder that a Java server will fill with images when a user transfers them; however, I want this script to be almost as invisible as possible in terms of noticeable effects. Keep in mind that this computer has many servers on it, and memory usage and speed are things I want to optimize. What is the best way to poll this directory without clogging up the system? Would I want to put sleep calls in there, or does that cause even more problems?
If your server is Linux, the best and cleanest way to do this is with inotify, a kernel facility designed for exactly this need. Python has a binding for it as part of the Twisted network programming framework; it is loosely coupled, so you can use it while keeping things simple. Just check out this example:
http://twistedmatrix.com/documents/10.2.0/api/twisted.internet.inotify.html
It is quite straightforward.
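The gist of it (adapted from that page; the watched path here is made up):

    # React to filesystem events instead of polling in a loop.
    from twisted.internet import inotify, reactor
    from twisted.python import filepath

    def on_change(ignored, path, mask):
        print("event %s on %s" % (
            ", ".join(inotify.humanReadableMask(mask)), path))

    notifier = inotify.INotify()
    notifier.startReading()
    notifier.watch(filepath.FilePath("/path/to/images"), callbacks=[on_change])
    reactor.run()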
Is it just me or is having to run multiple instances of a web server to scale a hack?
Am I wrong in this?
Clarification
I am referring to what I have read about people running multiple instances of a web service on a single server. I am not talking about a cluster of servers.
Not really; people were running multiple frontends across a cluster of servers before multicore CPUs became widespread.
So all the infrastructure for supporting sessions properly across multiple frontends existed for quite some time before it became really advantageous to run a bunch of threads on one machine.
In fact, using asynchronous-style frontends gives better performance on the same hardware than a multithreaded approach, so I would say that not running multiple instances in favour of a multithreaded monster is a hack.
Since we are now moving towards more cores rather than faster processors, in order to scale more and more you will need to be running more instances.
So yes, I reckon you are wrong.
This does not by any means condone brain-dead programming with the excuse that you can just scale it horizontally; that approach is just careless.
With no details, it is very difficult to see what you are getting at. That being said, it is quite possible that you are simply not using the right approach for your problem.
Sometimes multiple separate instances are better. Sometimes your Python services are actually better deployed behind a single Apache instance (using mod_wsgi), which may elect to use more than a single process. I don't know enough about Ruby to offer an opinion there.
In short, if you want to make your service scalable then the way to do so depends heavily on additional details. Is it scaling up or scaling out? What is the operating system and available or possibly installable server software? Is the service itself easily parallelized and how much is it database dependent? How is the database deployed?
Even if the Ruby/Python interpreters were perfect and could utilize all available CPU from a single process, you would still reach the maximum capacity of a single server sooner or later and have to scale across several machines, going back to running several instances of your app.
I would hesitate to say that the issue is a "hack". Or indeed that threaded solutions are necessarily superior.
The situation is a result of design decisions used in the interpreters of languages like Ruby and Python.
I work with Ruby, so the details may be different for other languages.
BUT ... essentially, Ruby uses a Global Interpreter Lock to prevent threading issues:
http://en.wikipedia.org/wiki/Global_Interpreter_Lock
The side effect of this is that to achieve concurrency with frameworks like Rails, rather than relying on multiple threads within the VM, we use multiple processes, each with its own interpreter and instance of the framework and application code.
Each instance of the app handles a single request at a time. To achieve concurrency we have to spin up multiple instances.
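The same effect is easy to see in CPython, which has the same kind of lock. A purely illustrative sketch (timings will vary):

    # CPU-bound work: threads in CPython serialize on the GIL, while
    # processes run truly in parallel on separate cores.
    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def burn(n):
        while n:  # pure-Python busy loop; holds the GIL
            n -= 1

    if __name__ == "__main__":
        for pool_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
            start = time.time()
            with pool_cls(max_workers=4) as pool:
                list(pool.map(burn, [5000000] * 4))
            print(pool_cls.__name__, round(time.time() - start, 2), "s")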
In the olden days (2-3 years ago) we would run multiple Mongrel (or similar) instances behind a proxy (generally Apache). Passenger changed some of this because it is smart enough to manage the processes itself, rather than requiring manual setup. You tell Passenger how many processes it can use and off it goes.
The whole structure is actually not as bad as the thread-orthodoxy would have you believe. For a start, it's pretty easy to make this type of architecture work in a multicore environment. Any modern database is designed to handle highly concurrent loads, so having multiple processes has very little if any effect at that level.
If you use a language like JRuby you can deploy into a threaded app server like Tomcat and have a deployment that looks much more "java-like". However, this is not as big a win as you might think, because now your application needs to be much more thread-aware and you can see side effects and strangeness from threading issues.
Your assumption that Tomcat's and IIS's single process per server is superior is flawed. The choice between a multi-threaded server and a multi-process server depends on a lot of variables.
One main thing is the underlying operating system. Unix systems have always had great support for multi-processing because of the copy-on-write nature of the fork system call. This makes multi-processes a really attractive option because web-serving is usually very shared-nothing and you don't have to worry about locking. Windows on the other hand had much heavier processes and lighter threads so programs like IIS would gravitate to a multi-threading model.
As for whether it's a hack, that really depends on your perspective. If you look at Apache, it comes with a variety of pluggable engines to choose from. The MPM-prefork one is the default because it allows the programmer to easily use non-thread-safe C/Perl/database libraries without having to throw locks and semaphores all over the place. To some that might be a hack to work around poorly implemented libraries. To me it's a brilliant way of leaving the problem to the OS and getting back to work.
Also, a multi-process model comes with a few features that would be very difficult to implement in a multi-threaded server. Because they are just processes, zero-downtime rolling updates are trivial. You can do it with a bash script.
It also has its shortcomings. In a single-process model, setting up a singleton that holds some global state is trivial, while in a multi-process model you have to serialize that state to a database or Redis server. (Of course, if your single-process server outgrows a single machine, you'll have to do that anyway.)
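For example (a hypothetical hit counter, assuming the redis-py client):

    # Shared state across worker processes lives in an external store,
    # not in a module-level singleton.
    import redis

    r = redis.Redis(host="localhost", port=6379)
    r.incr("page_hits")  # atomic, so safe across any number of processes
    print(int(r.get("page_hits")))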
Is it a hack? Yes and no. Both original implementations (MRI and CPython) have Global Interpreter Locks that will prevent a multi-core server from operating at its full potential. On the other hand, multi-process has its advantages (especially on the Unix side of the fence).
There's also nothing inherent in the languages themselves that requires a GIL, so you can run your application with Jython, JRuby, IronPython or IronRuby if you really want to share state inside a single process.
We have a web service which serves small, arbitrary segments of a fixed inventory of larger MP3 files. The MP3 files are generated on-the-fly by a python application. The model is, make a GET request to a URL specifying which segments you want, get an audio/mpeg stream in response. This is an expensive process.
We're using Nginx as the front-end request handler. Nginx takes care of caching responses for common requests.
We initially tried using Tornado on the back-end to handle requests from Nginx. As you would expect, the blocking MP3 operation kept Tornado from doing its thing (asynchronous I/O). So, we went multithreaded, which solved the blocking problem, and performed quite well. However, it introduced a subtle race condition (under real world load) that we haven't been able to diagnose or reproduce yet. The race condition corrupts our MP3 output.
So we decided to set our application up as a simple WSGI handler behind Apache/mod_wsgi (still with Nginx up front). This eliminates the blocking issue and the race condition, but creates a cascading load (i.e., Apache creates too many processes) on the server under real-world conditions. We're working on tuning Apache/mod_wsgi right now, but still at a trial-and-error phase. (Update: we've switched back to Tornado. See below.)
Finally, the question: are we missing anything? Is there a better way to serve CPU-expensive resources over HTTP?
Update: Thanks to Graham's informed article, I'm pretty sure this is an Apache tuning problem. In the meantime, we've gone back to using Tornado and are trying to resolve the data-corruption issue.
For those who were so quick to throw more iron at the problem, Tornado and a bit of multi-threading (despite the data integrity problem introduced by threading) handles the load acceptably on a small (single core) Amazon EC2 instance.
Have you tried Spawning? It is a WSGI server with a flexible assortment of threading modes.
Are you making the mistake of using embedded mode of Apache/mod_wsgi? Read:
http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html
Ensure you use daemon mode if using Apache/mod_wsgi.
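In concrete terms, that means directives along these lines in the virtual host (the process/thread counts here are just illustrative):

    # Daemon mode: the WSGI app runs in its own managed processes,
    # decoupled from however many Apache children exist.
    WSGIDaemonProcess myapp processes=2 threads=15
    WSGIProcessGroup myapp
    WSGIScriptAlias / /path/to/myapp/app.wsgi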
You might consider a queuing system with AJAX notification methods.
Whenever there is a request for your expensive resource, and that resource needs to be generated, add that request to the queue (if it's not already there). That queuing operation should return an ID of an object that you can query to get its status.
Next you have to write a background service that spins up worker threads. These workers simply dequeue a request, generate the data, then save the data's location in the request object.
The webpage can make AJAX calls to your server to find out the progress of the generation and to give a link to the file once it's available.
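A bare-bones sketch of the idea (in-memory queue and job table, with made-up names; a real deployment would put the queue in a database or a dedicated broker):

    import threading
    import queue
    import uuid

    jobs = {}             # job_id -> {"status": ..., "location": ...}
    work = queue.Queue()

    def render_mp3(spec):
        # placeholder for the real CPU-heavy MP3 generation
        return "/var/cache/mp3/%s.mp3" % spec

    def enqueue(segment_spec):
        """Called from the web tier; returns an ID the client can poll."""
        job_id = str(uuid.uuid4())
        jobs[job_id] = {"status": "queued", "location": None}
        work.put((job_id, segment_spec))
        return job_id

    def worker():
        while True:
            job_id, spec = work.get()
            jobs[job_id]["status"] = "working"
            path = render_mp3(spec)
            jobs[job_id] = {"status": "done", "location": path}

    for _ in range(4):
        threading.Thread(target=worker, daemon=True).start()

The AJAX status endpoint then just looks up jobs[job_id] and returns it.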
This is how LARGE media sites work - those that have to deal with video in particular. It might be overkill for your MP3 work, however.
Alternatively, look into running a couple of machines to distribute the load. Your threads on Apache will still block, but at least you won't consume resources on the web server.
Please define "cascading load", as it has no common meaning.
Your most likely problem is going to be if you're running too many Apache processes.
For a load like this, make sure you're using the prefork mpm, and make sure you're limiting yourself to an appropriate number of processes (no less than one per CPU, no more than two).
It looks like you are doing things right; you're just lacking CPU power. Can you determine the CPU load of the MP3-generation process?
I think the next thing you have to do is add more hardware to render the MP3s on other machines. Either that, or find a way to deliver pre-rendered MP3s (maybe you can cache some of your media?).
BTW, scaling for the web was the theme of a keynote lecture by Jacob Kaplan-Moss at PyCon Brasil this year, and it is far from being a solved problem. The stack of technologies one needs to handle is quite impressive. (I could not find an online copy of the presentation, though; sorry for that.)
I have a website that right now runs by creating static HTML pages from a cron job that runs nightly.
I'd like to add some search and filtering features using a CGI type script, but my script will have enough of a startup time (maybe a few seconds?) that I'd like it to stay resident and serve multiple requests.
This is a side-project I'm doing for fun, and it's not going to be super complex. I don't mind using something like Pylons, but I don't feel like I need or want an ORM layer.
What would be a reasonable approach here?
EDIT: I wanted to point out that for the load I'm expecting and processing I need to do on a request, I'm confident that a single python script in a single process could handle all requests without any slowdowns, especially since my dataset would be memory-resident.
That's exactly what WSGI is for ;)
I don't know offhand the simplest way to turn a CGI script into a WSGI application (I've always had that managed by a framework), but it shouldn't be too tricky.
That said, An Introduction to the Python Web Server Gateway Interface (WSGI) seems to be a reasonable introduction, and you'll also want to take a look at mod_wsgi (assuming you're using Apache...)
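To give a feel for how little is involved, here is a minimal WSGI application with a memory-resident dataset, runnable standalone with the stdlib reference server (the dataset and lookup logic are made up):

    from wsgiref.simple_server import make_server

    # Loaded once at startup and reused across requests -- exactly the
    # "stay resident" behaviour a plain CGI script can't give you.
    DATASET = {"spam": "a canned meat", "eggs": "laid by chickens"}

    def application(environ, start_response):
        term = environ.get("QUERY_STRING", "").partition("=")[2]
        body = DATASET.get(term, "no match").encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    if __name__ == "__main__":
        make_server("", 8000, application).serve_forever()

Under Apache you would point mod_wsgi at the same application callable instead of running the reference server.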
Maybe you should direct your search towards inter-process communication and make a search process that returns the results to the web server. This search process would be running all the time, assuming you have your own server.
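A sketch of that shape over a local TCP socket (the protocol and dataset here are made up for illustration):

    # search_server.py -- a long-lived process that loads the dataset once,
    # then answers one-line queries from the web server over a local socket.
    import socketserver

    DATASET = ["apple pie", "banana bread", "cherry tart"]  # stand-in data

    class SearchHandler(socketserver.StreamRequestHandler):
        def handle(self):
            term = self.rfile.readline().decode().strip()
            hits = [s for s in DATASET if term in s]
            self.wfile.write(("\n".join(hits) + "\n").encode())

    if __name__ == "__main__":
        socketserver.TCPServer(("127.0.0.1", 9999), SearchHandler).serve_forever()

The CGI script (or WSGI app) then opens a socket per request, sends the query, and reads back the hits, so the slow startup cost is paid only once.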