I am a bit confused about the multiprocessing feature of mod_wsgi and about the general design of WSGI applications that will be executed on WSGI servers with multiprocessing ability.
Consider the following directive:
WSGIDaemonProcess example processes=5 threads=1
If I understand correctly, mod_wsgi will spawn 5 Python (e.g. CPython) processes and any of these processes can receive a request from a user.
The documentation says that:
Where shared data needs to be visible to all application instances, regardless of which child process they execute in, and changes made to the data by one application are immediately available to another, including any executing in another child process, an external data store such as a database or shared memory must be used. Global variables in normal Python modules cannot be used for this purpose.
But in that case it gets really heavy if one wants to be sure that an app runs correctly under any WSGI conditions (including multiprocessing ones).
For example, take a simple variable that holds the current number of connected users - should it be read from and written to memcached, a database, or (where such out-of-the-standard-library mechanisms are available) shared memory, in a process-safe way?
And will code like the following

counter = 0

@app.route('/login')
def login():
    global counter
    ...
    counter += 1
    ...

@app.route('/logout')
def logout():
    global counter
    ...
    counter -= 1
    ...

@app.route('/show_users_count')
def show_users_count():
    return str(counter)

behave unpredictably in a multiprocessing environment?
Thank you!
There are several aspects to consider in your question.
First, the interaction between Apache MPMs and mod_wsgi applications. If you run the mod_wsgi application in embedded mode (no WSGIDaemonProcess needed, WSGIProcessGroup %{GLOBAL}) you inherit multiprocessing/multithreading from the Apache MPMs. This should be the fastest option, and you end up having multiple processes and multiple threads per process, depending on your MPM configuration. By contrast, if you run mod_wsgi in daemon mode, with WSGIDaemonProcess <name> [options] and WSGIProcessGroup <name>, you have fine control over multiprocessing/multithreading at the cost of a small overhead.
Within a single apache2 server you may define zero, one, or more named WSGIDaemonProcess groups, and each application can be run in one of these process groups (WSGIProcessGroup <name>) or in embedded mode with WSGIProcessGroup %{GLOBAL}.
You can check multiprocessing/multithreading by inspecting the wsgi.multithread and wsgi.multiprocess variables.
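For example, a minimal WSGI application (a hedged sketch, nothing mod_wsgi specific) that just echoes those two environ keys defined by PEP 3333:

def application(environ, start_response):
    # wsgi.multiprocess and wsgi.multithread are required environ keys (PEP 3333)
    body = 'multiprocess={0}, multithread={1}\n'.format(
        environ['wsgi.multiprocess'], environ['wsgi.multithread']).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]

Hitting it under embedded mode versus daemon mode with different processes/threads settings shows you exactly which execution model you are in.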
With your configuration, WSGIDaemonProcess example processes=5 threads=1, you have 5 independent processes, each with a single thread of execution: no global data and no shared memory, since you are not in control of spawning subprocesses - mod_wsgi is doing it for you. To share a global state you already listed some possible options: a DB to which your processes interface, some sort of file-system-based persistence, or a daemon process (started outside Apache) with socket-based IPC.
As pointed out by Roland Smith, the latter could be implemented using the high-level API provided by multiprocessing.managers: outside Apache you create and start a BaseManager server process
import multiprocessing.managers
m = multiprocessing.managers.BaseManager(address=('', 12345), authkey=b'secret')
m.get_server().serve_forever()
and inside your apps you connect:
m = multiprocessing.managers.BaseManager(address=('', 12345), authkey=b'secret')
m.connect()
The example above is a dummy, since m has no useful method registered, but here (Python docs) you will find how to create and proxy an object (like the counter in your example) among your processes.
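For instance, here is a hedged sketch along those lines: a small counter object served by a standalone manager process (run outside Apache). The class and method names are illustrative only, not anything mod_wsgi or Flask defines.

# counter_server.py - run outside Apache as a small standalone daemon
import threading
import multiprocessing.managers

class SharedCounter(object):
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()
    def change(self, delta):
        with self._lock:
            self._value += delta
            return self._value
    def value(self):
        with self._lock:
            return self._value

counter = SharedCounter()

class CounterManager(multiprocessing.managers.BaseManager):
    pass

# register a callable that always returns the single shared instance
CounterManager.register('get_counter', callable=lambda: counter)

if __name__ == '__main__':
    manager = CounterManager(address=('', 12345), authkey=b'secret')
    manager.get_server().serve_forever()

and each mod_wsgi process connects to it and works with a proxy:

# inside the web app - every process talks to the same counter
from multiprocessing.managers import BaseManager

class CounterManager(BaseManager):
    pass

CounterManager.register('get_counter')

manager = CounterManager(address=('127.0.0.1', 12345), authkey=b'secret')
manager.connect()
counter = manager.get_counter()   # a proxy; method calls run in the server process
counter.change(+1)                # e.g. in login()
counter.change(-1)                # e.g. in logout()
print(counter.value())            # e.g. in show_users_count()

All increments happen in the single manager process, so the per-process counter problem from the question goes away, at the cost of one extra round trip per call.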
A final comment on your example with processes=5 threads=1. I understand this is just an example, but in real-world applications I suspect that performance will be comparable to processes=1 threads=5: you should go into the intricacies of sharing data between processes only if the expected performance boost over the 'single process, many threads' model is significant.
From the docs on processes and threading for wsgi:
When Apache is run in a mode whereby there are multiple child processes, each child process will contain sub interpreters for each WSGI application.
This means that in your configuration, 5 processes with 1 thread each, there will be 5 interpreters and no shared data. Your counter object will be unique to each interpreter. You would need to either build some custom solution to count sessions (one common process you can communicate with, some kind of persistence based solution, etc.) OR, and this is definitely my recommendation, use a prebuilt solution (Google Analytics and Chartbeat are fantastic options).
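To make the "persistence based" option concrete, here is a hedged sketch where every process increments a row in a small SQLite file; the path, table and column names are illustrative, and in practice this would more likely be your main database, memcached or Redis.

import sqlite3

DB_PATH = '/tmp/active_users.db'   # illustrative path; shared by all processes on this host

def _connect():
    conn = sqlite3.connect(DB_PATH)
    conn.execute('CREATE TABLE IF NOT EXISTS counters (name TEXT PRIMARY KEY, value INTEGER)')
    conn.execute("INSERT OR IGNORE INTO counters (name, value) VALUES ('active_users', 0)")
    conn.commit()
    return conn

def change_counter(delta):
    conn = _connect()
    with conn:   # one atomic transaction; SQLite serialises writers across processes
        conn.execute("UPDATE counters SET value = value + ? WHERE name = 'active_users'", (delta,))
    conn.close()

def read_counter():
    conn = _connect()
    value = conn.execute("SELECT value FROM counters WHERE name = 'active_users'").fetchone()[0]
    conn.close()
    return value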
I tend to think of using globals to share data as a form of global abuse. It's a source of bugs and portability issues in most of the environments I've done parallel processing in. What if your application suddenly had to run on multiple virtual machines? That would break your code no matter what sharing model you used for threads and processes.
If you are using multiprocessing, there are multiple ways to share data between processes. Values and Arrays only work if the processes have a parent/child relationship (they are shared by inheritance). If that is not the case, use a Manager and Proxy objects.
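A hedged sketch of both mechanisms, which only applies when you spawn the processes yourself (so not under mod_wsgi, where Apache forks for you); the worker function and the counts are illustrative:

import multiprocessing

def inherit_worker(shared_value):
    with shared_value.get_lock():   # Value carries its own lock
        shared_value.value += 1

if __name__ == '__main__':
    # 1) multiprocessing.Value: works because the children inherit it from the parent
    counter = multiprocessing.Value('i', 0)
    procs = [multiprocessing.Process(target=inherit_worker, args=(counter,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)            # 4

    # 2) Manager proxies: also usable by processes that merely connect to the manager,
    #    which is why they are the fallback when there is no parent/child relation
    with multiprocessing.Manager() as manager:
        shared = manager.dict(count=0)
        shared['count'] += 1        # note: read-modify-write on a proxy is not atomic
        print(shared['count'])      # 1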
Related
Exposing a database through a Flask-based API, I use locks in view functions to avoid issues with non-atomic database operations.
E.g. when dealing with a PUT, if the database driver does not provide an atomic upsert feature, I just grab the lock, read, update, write, then release the lock.
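Roughly like this sketch (the db_read/db_write helpers and the route are placeholders standing in for the real driver calls):

import threading
from flask import Flask, request, jsonify

app = Flask(__name__)
_upsert_lock = threading.Lock()

_fake_db = {}                     # stand-in for the real database

def db_read(key):                 # placeholder helpers
    return _fake_db.get(key)

def db_write(key, value):
    _fake_db[key] = value

@app.route('/items/<item_id>', methods=['PUT'])
def put_item(item_id):
    payload = request.get_json() or {}
    with _upsert_lock:            # serialises read-update-write, but only within this process
        current = db_read(item_id) or {}
        merged = {**current, **payload}
        db_write(item_id, merged)
    return jsonify(merged)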
AFAIU, this works in a multi-threaded environment, but since the lock belongs to the Flask app, it fails if multiple processes are used. Is this correct?
If so, how do people deal with locks when using multiple processes? Do they use an external store such as Redis to hold the locks?
Subsidiary question. My apache config is
WSGIDaemonProcess app_name threads=5
Can I conclude that I'm on the safe side as long as I don't add processes=N in there?
This seems like a pretty obvious question, but a lot of the documentation around this is very confusing, warning me not to keep global state instead of telling me how to.
For example, if I need to have a database connection pool (I'm not using SQLAlchemy), or a pool of object instances (both of which need to be global pools, centrally managed), how do I do that?
If I use flask.g, that's not shared between threads, and if I use a python global, that's not shared between multiple processes of the same application (which, as I understand, can be spawned in the case of large production flask servers). Do I use flask.current_app? Do I make the pool a separate process itself? Something else?
The warnings about "not keeping per-process global state" in a web backend app (you'll have the very same issue with Django or any WSGI app) only apply to state that you expect to be shared between requests AND processes.
If it's OK for you to have per-process state (for example, a db connection is typically per-process state) then it's not an issue. With respect to connection pooling, you could (or could not) decide that having distinct pools per server process is fine.
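For instance, a hedged sketch of such per-process state: a tiny module-level pool built once per worker process (the SimplePool class and sqlite3 are illustrative stand-ins for whatever pooling your driver provides):

import queue
import sqlite3

class SimplePool(object):
    def __init__(self, factory, size=5):
        self._conns = queue.Queue()
        for _ in range(size):
            self._conns.put(factory())
    def acquire(self):
        return self._conns.get()      # blocks if all connections are checked out
    def release(self, conn):
        self._conns.put(conn)

# module level: each worker process builds its own pool when it first imports this module
pool = SimplePool(lambda: sqlite3.connect('/tmp/app.db', check_same_thread=False))

def handle_request():
    conn = pool.acquire()
    try:
        return conn.execute('SELECT 1').fetchone()
    finally:
        pool.release(conn)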
For anything else - any state that needs to be shared amongst processes - this is usually handled by using some external database or cache process, so if you want one single connection pool for all your Flask processes you will indeed have to use a distinct server process to maintain the pool.
Also note that:
multiple processes of the same application (which, as I understand, can be spawned in the case of large production flask servers)
Actually this has nothing to do with being "large". With a traditional "blocking" server, you can only handle concurrent requests by using either multithreading or multiprocessing. The Unix philosophy traditionally favors multiprocessing (the "prefork" model) for various reasons, and Python's multithreading is bordering on useless anyway (at least in this context), so you don't have much choice if you hope to serve more than one request at a time.
To make a long story short: just about any production setup for a WSGI app will run multiple processes in the background, period.
I have Apache + mod_wsgi + Django app. mod_wsgi runs in daemon mode.
I have one view that fetches a significant queryset from the DB and additionally allocates an array by computing results from the queryset, and then returns this array. I'm not using thread-local storage, global variables or anything alike.
The problem is that my app eats memory in proportion to the number of threads I set for mod_wsgi.
I've made a small experiment, setting various numbers of threads in mod_wsgi and then hitting my view with curl, checking how far the wsgi process can climb in memory.
It goes like this:
1 thread - 256Mb
2 threads - 400Mb
3 threads - 535Mb
4 threads - 650Mb
So each thread adds about 120-140Mb to the peak memory usage.
It seems like the initial memory allocated for the first request is never freed up. In the single-thread scenario, it's reused when a second request (to the same view) arrives. I can live with that.
But when I use multiple threads, then when a request is processed by a thread that has never run this request before, that thread "saves" another 140Mb somewhere locally.
How can I fix this?
Probably Django saves some data in thread-local storage. If that is the case, how can I disable it?
Alternatively, as a workaround, is it possible to bind request execution to a certain thread in mod_wsgi?
Thanks.
PS. DEBUG is set to False in settings.py
In this sort of situation, what you should do is vertically partition your web application so that it runs across multiple mod_wsgi daemon process groups. That way you can tailor the configuration of each daemon process group to the requirements of the subset of URLs you delegate to it. Because the admin interface URLs of a Django application often have high transient memory requirements yet aren't used very often, it can be recommended to do:
WSGIScriptAlias / /my/path/site/wsgi.py
WSGIApplicationGroup %{GLOBAL}
WSGIDaemonProcess main processes=3 threads=5
WSGIProcessGroup main
WSGIDaemonProcess admin threads=2 inactivity-timeout=60
<Location /admin>
WSGIProcessGroup admin
</Location>
So what this does is create two daemon process groups. By default URLs will be handled in the main daemon process group where the processes are persistent.
The URLs of the admin interface, however, will be directed to the admin daemon process group, which is set up as a single process with a reduced number of threads, plus an inactivity timeout so that the process will be restarted automatically if the admin interface isn't used for 60 seconds, thereby reclaiming any excessive transient memory usage.
This means that submitting a request to the admin interface can be slightly slower if the process has been recycled since the last time, as everything has to be loaded again, but since it is the admin interface and not a public URL, this is generally acceptable.
In a regular application (like one on Windows), when objects/variables are created at the global level, they are available to the entire program for the entire duration the program is running.
In a web application written in PHP for instance, all variables/objects are destroyed at the end of the script so everything has to be written to the database.
a) So what about python running under apache/modwsgi? How does that work in regards to the memory?
b) How do you create objects that persist between web page requests and how do you ensure there isn't threading issues in apache/modwsgi?
Go read the following from the official mod_wsgi documentation:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
It explains the various modes things can be run in and gives some general guidelines about data scope and sharing.
All Python globals are created when the module is imported. When the module is re-imported, the same globals are used.
Python web servers do not do threading, but use pre-forked processes. Thus there are no threading issues with Apache.
The lifecycle of Python processes under Apache varies. Apache has settings for how many child processes are spawned, kept in reserve, and killed. This means that you can use globals in Python processes for caching (an in-process cache), but the process may terminate after any request, so you cannot put any persistent data in the globals. However, the process does not necessarily terminate, and in this regard Python is much more efficient than PHP (the source code is not parsed on every request - but you need to run the server in reload mode to pick up source code changes during development).
Since globals are per-process and there can be N processes, the processes share "web server global" state using mechanisms like memcached.
Usually Python globals only contain:
Settings set during process initialization
Cached data (session/user neutral)
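A hedged sketch of that kind of in-process cache kept in a module global (the function names and TTL are illustrative, and load_settings_from_db in the usage comment is hypothetical); anything that must stay consistent across processes would instead go to memcached or a database as noted above:

import time

_local_cache = {}                 # module global: lives and dies with this worker process

def cached_settings(loader, ttl=300):
    entry = _local_cache.get('settings')
    if entry is None or time.time() - entry[0] > ttl:
        entry = (time.time(), loader())   # refresh at most once per ttl seconds
        _local_cache['settings'] = entry
    return entry[1]

# usage: settings = cached_settings(lambda: load_settings_from_db())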
I have a Python (well, it's PHP now but we're rewriting it) function that takes some parameters (A and B) and computes some results (finding the best path from A to B in a graph; the graph is read-only). In a typical scenario one call takes 0.1s to 0.9s to complete. This function is accessed by users as a simple REST web service (GET bestpath.php?from=A&to=B). The current implementation is quite stupid - it's a simple PHP script + Apache + mod_php + APC; every request needs to load all the data (over 12MB in PHP arrays), create all structures, compute a path and exit. I want to change that.
I want a setup with N independent workers (X per server with Y servers), where each worker is a Python app running in a loop (get request -> process -> send reply -> get request...), and each worker can process one request at a time. I need something that will act as a frontend: get requests from users, manage a queue of requests (with a configurable timeout) and feed my workers one request at a time.
How do I approach this? Can you propose some setup? nginx + FastCGI or WSGI or something else? HAProxy? As you can see, I'm a newbie in Python, reverse proxies, etc. I just need a starting point for the architecture (and data flow).
btw, the workers use read-only data, so there is no need to maintain locking and communication between them
The typical way to handle this sort of arrangement using threads in Python is to use the standard library module Queue. An example of using the Queue module for managing workers can be found here: Queue Example
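A hedged sketch of that pattern (the module is spelled queue in Python 3; find_best_path is a placeholder for the real routing function):

import queue
import threading

def find_best_path(src, dst):
    return [src, dst]              # placeholder for the real graph search

requests_q = queue.Queue()
results_q = queue.Queue()

def worker():
    while True:
        src, dst = requests_q.get()
        results_q.put(find_best_path(src, dst))
        requests_q.task_done()

for _ in range(4):                 # N worker threads sharing the read-only graph in this process
    threading.Thread(target=worker, daemon=True).start()

requests_q.put(('A', 'B'))
requests_q.join()                  # wait until the queued request has been processed
print(results_q.get())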
Looks like you need the "workers" to be separate processes (at least some of them, and therefore you might as well make them all separate processes rather than bunches of threads divided into several processes). The multiprocessing module in the standard library of Python 2.6 and later offers good facilities to spawn a pool of processes and communicate with them via FIFO "queues"; if for some reason you're stuck with Python 2.5 or even earlier, there are versions of multiprocessing on PyPI that you can download and use with those older versions of Python.
The "frontend" can and should be pretty easily made to run with WSGI (with either Apache or Nginx), and it can deal with all communications to/from worker processes via multiprocessing, without the need to use HTTP, proxying, etc, for that part of the system; only the frontend would be a web app per se, the workers just receive, process and respond to units of work as requested by the frontend. This seems the soundest, simplest architecture to me.
There are other distributed processing approaches available in third party packages for Python, but multiprocessing is quite decent and has the advantage of being part of the standard library, so, absent other peculiar restrictions or constraints, multiprocessing is what I'd suggest you go for.
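For instance, a hedged sketch of that frontend/worker split using multiprocessing.Pool, assuming a fork-based platform such as Linux; load_graph and find_best_path are placeholders for the real code:

import multiprocessing

_graph = None

def load_graph():
    return {}                      # placeholder for building the ~12MB read-only graph

def init_worker():
    global _graph
    _graph = load_graph()          # built once per worker process, then reused

def find_best_path(pair):
    src, dst = pair
    return [src, dst]              # placeholder for the real search over _graph

pool = multiprocessing.Pool(processes=4, initializer=init_worker)

def application(environ, start_response):
    # the WSGI "frontend": in real code, parse ?from=A&to=B out of environ['QUERY_STRING']
    path = pool.apply_async(find_best_path, (('A', 'B'),)).get(timeout=5)
    body = repr(path).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [body]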
There are many FastCGI modules with a preforked mode and a WSGI interface for Python around; the best known is flup. My personal preference for such a task is superfcgi with nginx. Both will launch several processes and dispatch requests to them. 12MB is not that much to load separately in each process, but if you'd like to share data among workers you need threads, not processes. Note that heavy math in Python with a single process and many threads won't use several CPUs/cores efficiently because of the GIL. Probably the best approach is to use several processes (as many as you have cores), each running several threads (the default mode in superfcgi).
The simplest solution in this case is to use the webserver to do all the heavy lifting. Why should you handle threads and/or processes when the webserver will do all that for you?
The standard arrangement in deployments of Python is:
The webserver starts a number of processes, each running a complete Python interpreter and loading all your data into memory.
An HTTP request comes in and gets dispatched to some process.
The process does your calculation and returns the result directly to the webserver and user.
When you need to change your code or the graph data, you restart the webserver and go back to step 1.
This is the architecture used by Django and other popular web frameworks.
I think you can configure mod_wsgi/Apache so it will have several "hot" Python interpreters in separate processes ready to go at all times, and also reuse them for new accesses (and spawn a new one if they are all busy). In this case you could load all the preprocessed data as module globals, and they would only get loaded once per process and get reused for each new access. In fact I'm not sure this isn't the default configuration for mod_wsgi/Apache.
The main problem here is that you might end up consuming a lot of "core" memory (but that may not be a problem either).
I think you can also configure mod_wsgi for single process/multiple threads - but in that case you may only be using one CPU because of the Python Global Interpreter Lock (the infamous GIL), I think.
Don't be afraid to ask on the mod_wsgi mailing list - they are very responsive and friendly.
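A hedged sketch of the "module globals loaded once per process" idea (load_graph is a placeholder for building the 12MB structure):

def load_graph():
    return {'A': {'B': 1}}         # placeholder for the preprocessed, read-only data

GRAPH = load_graph()               # module global: built once per interpreter/process at import

def application(environ, start_response):
    # every request served by this process just reads GRAPH; read-only, so no locking needed
    body = 'graph has {0} nodes\n'.format(len(GRAPH)).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [body]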
You could use an nginx load balancer to proxy to PythonPaste paster (which serves WSGI, for example Pylons), which launches each request in a separate thread anyway.
Another option is a queue table in the database.
The worker processes run in a loop or off cron and poll the queue table for new jobs.
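A hedged sketch of such a polling worker, using SQLite only to keep the example self-contained; the jobs table and its columns are illustrative, and a real multi-worker setup would need stricter locking (e.g. BEGIN IMMEDIATE, or SELECT ... FOR UPDATE on a real RDBMS) so two workers can't claim the same row:

import sqlite3
import time

def claim_next_job(conn):
    with conn:  # one transaction per claim
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    return row

def worker_loop(db_path):
    conn = sqlite3.connect(db_path)
    while True:
        job = claim_next_job(conn)
        if job is None:
            time.sleep(1)          # queue empty: poll again shortly
            continue
        job_id, payload = job
        # ... do the actual work on payload here ...
        with conn:
            conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))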