I have a Python function (well, it's PHP right now, but we're rewriting it) that takes two parameters (A and B) and computes a result (it finds the best path from A to B in a graph; the graph is read-only). In the typical scenario one call takes 0.1s to 0.9s to complete. Users access this function as a simple REST web service (GET bestpath.php?from=A&to=B). The current implementation is quite naive: a simple PHP script + Apache + mod_php + APC, where every request has to load all the data (over 12MB in PHP arrays), build all the structures, compute a path, and exit. I want to change that.
I want a setup with N independent workers (X per server across Y servers), where each worker is a Python app running in a loop (get request -> process -> send reply -> get request...) and each worker can process one request at a time. I need something to act as a frontend: take requests from users, manage a queue of requests (with a configurable timeout), and feed my workers one request at a time.
How should I approach this? Can you propose a setup? nginx + FastCGI, or WSGI, or something else? HAProxy? As you can see, I'm a newbie with Python, reverse proxies, etc. I just need a starting point for the architecture (and the data flow).
By the way, the workers use read-only data, so there is no need for locking or communication between them.
The typical way to handle this sort of arrangement with threads in Python is the standard library's Queue module. An example of using Queue to manage workers can be found here: Queue Example
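As a minimal sketch of that pattern (in Python 3 the module is named queue; find_best_path stands in for your own graph search and is hypothetical):

import queue
import threading

tasks = queue.Queue()

def worker():
    while True:
        a, b = tasks.get()             # blocks until a request is queued
        result = find_best_path(a, b)  # hypothetical: your read-only graph search
        print(result)
        tasks.task_done()

for _ in range(4):                     # N worker threads
    threading.Thread(target=worker, daemon=True).start()

tasks.put(("A", "B"))                  # a frontend would enqueue requests like this
tasks.join()                          # wait until every queued request is processed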
Looks like you need the "workers" to be separate processes (at least some of them, so you might as well make them all separate processes rather than bunches of threads divided among several processes). The multiprocessing module in the standard library of Python 2.6 and later offers good facilities for spawning a pool of processes and communicating with them via FIFO "queues"; if for some reason you're stuck with Python 2.5 or earlier, there are versions of multiprocessing on PyPI that you can download and use with those older versions of Python.
The "frontend" can and should be pretty easily made to run with WSGI (with either Apache or Nginx), and it can deal with all communications to/from worker processes via multiprocessing, without the need to use HTTP, proxying, etc, for that part of the system; only the frontend would be a web app per se, the workers just receive, process and respond to units of work as requested by the frontend. This seems the soundest, simplest architecture to me.
There are other distributed processing approaches available in third-party Python packages, but multiprocessing is quite decent and has the advantage of being part of the standard library, so, absent other peculiar restrictions or constraints, it's what I'd suggest you go for.
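A rough sketch of that shape with multiprocessing (load_graph and find_best_path are placeholders for your own code):

import multiprocessing as mp

def worker(requests, replies):
    graph = load_graph()  # placeholder: each worker loads its own read-only copy
    while True:
        req_id, a, b = requests.get()
        replies.put((req_id, find_best_path(graph, a, b)))

if __name__ == "__main__":
    requests, replies = mp.Queue(), mp.Queue()
    for _ in range(mp.cpu_count()):
        mp.Process(target=worker, args=(requests, replies), daemon=True).start()

    # The WSGI frontend would enqueue each HTTP request and wait for its reply:
    requests.put((1, "A", "B"))
    print(replies.get())

With several concurrent frontend requests in flight you would match replies to requests by the id (or use one reply queue per request).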
There are many FastCGI modules with a preforked mode and a WSGI interface for Python around; the best known is flup. My personal preference for this kind of task is superfcgi with nginx. Both will launch several processes and dispatch requests among them. 12MB is not that much to load separately in each process, but if you want to share data among workers you need threads, not processes. Note that heavy math in Python with a single process and many threads won't use several CPUs/cores efficiently, because of the GIL. Probably the best approach is several processes (as many as you have cores), each running several threads (the default mode in superfcgi).
The simplest solution in this case is to let the webserver do all the heavy lifting. Why should you handle threads and/or processes yourself when the webserver will do it for you?
The standard arrangement in Python deployments is:
1) The webserver starts a number of processes, each running a complete Python interpreter and loading all your data into memory.
2) An HTTP request comes in and gets dispatched to one of the processes.
3) The process does your calculation and returns the result directly to the webserver and user.
4) When you need to change your code or the graph data, you restart the webserver and go back to step 1.
This is the architecture used by Django and the other popular web frameworks.
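As a sketch, a WSGI app for this would load the graph once at import time and keep it in a module global (load_graph and find_best_path are placeholders):

from urllib.parse import parse_qs

graph = load_graph()  # placeholder: parse the ~12MB dataset once per process

def application(environ, start_response):
    qs = parse_qs(environ.get("QUERY_STRING", ""))
    a, b = qs["from"][0], qs["to"][0]
    body = str(find_best_path(graph, a, b)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]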
I think you can configure mod_wsgi/Apache so it will have several "hot" Python interpreters in separate processes ready to go at all times, and also reuse them for new requests (spawning a new one if they are all busy). In this case you could load all the preprocessed data as module globals, so it would only get loaded once per process and be reused for each new request. In fact, I'm not sure this isn't the default configuration for mod_wsgi/Apache.
The main problem here is that you might end up consuming a lot of "core" memory (but that may not be a problem either). I think you can also configure mod_wsgi for single process/multiple threads, but in that case you may only be using one CPU because of the Python Global Interpreter Lock (the infamous GIL), I think.
Don't be afraid to ask on the mod_wsgi mailing list; they are very responsive and friendly.
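An untested sketch of that configuration (the process group name and paths are placeholders); daemon mode with several single-threaded processes keeps one hot interpreter, with your module globals loaded, per process:

# Apache httpd.conf, mod_wsgi daemon mode
WSGIDaemonProcess bestpath processes=4 threads=1
WSGIProcessGroup bestpath
WSGIScriptAlias /bestpath /var/www/bestpath/app.wsgi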
You could use an nginx load balancer to proxy to Python Paste's paster (which serves WSGI applications, for example Pylons), which launches each request in a separate thread anyway.
Another option is a queue table in the database.
The worker processes run in a loop or off cron and poll the queue table for new jobs.
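A bare-bones sketch of such a polling worker, using SQLite (the table layout and process() are assumptions; with more than one worker you would want an atomic claim, e.g. an UPDATE ... WHERE status = 'new' that only one worker wins):

import sqlite3
import time

db = sqlite3.connect("jobs.db")

while True:
    row = db.execute(
        "SELECT id, payload FROM jobs WHERE status = 'new' LIMIT 1").fetchone()
    if row is None:
        time.sleep(1)      # queue is empty; back off before polling again
        continue
    job_id, payload = row
    db.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (job_id,))
    db.commit()
    process(payload)       # placeholder for the actual work
    db.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
    db.commit()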
Related
I have a Python application running inside a pod in Kubernetes which subscribes to a Google Pub/Sub topic and, on each message, downloads a file from a Google Cloud Storage bucket.
The issue is that I can't process the workload quickly enough with a single-threaded Python application. I would normally run a number of pods to handle the workload, but the problem is that all the files have to end up on the same filesystem, to be processed by another application.
I have tried spawning a new thread for each request but the volume is too great.
What I would like to do is:
1) Have a number of processes that can process new messages
2) Keep the processes alive and use them to respond to new requests coming in.
All the examples for multiprocessing in Python are single-workload examples, such as feeding 10 numbers to a square function, which isn't what I'm trying to achieve.
I've used gunicorn in the past, which spawns a number of worker processes for a Flask application; what I want is to do something similar without Flask.
First, try to separate the IO-bound tasks (e.g. making requests, reading/writing files) from the CPU-bound ones (parsing JSON/XML, calculating, etc.).
For the IO-bound case, use threading or ThreadPoolExecutor, which reuses its worker threads automatically. Keep in mind that writing to disk is a blocking operation!
If you want parallelism for the CPU-bound part, use Process or ProcessPoolExecutor. To synchronize between processes you can use a shared object (a proxy object), a file, a pipe, Redis, etc. Shared objects such as Managers (Namespaces, dicts, etc.) are preferred if you want to stay in pure Python.
To avoid blocking when working with files, use a dedicated thread or go async; for asyncio there is the aiofile library.
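For the Pub/Sub scenario above, a small sketch of that split might look like this (the pool sizes and both function bodies are placeholders): threads for the blocking downloads, a process pool for the CPU-heavy part.

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def download(message):
    # IO-bound placeholder: fetch the file named in the Pub/Sub message
    return "/data/%s" % message

def parse(path):
    # CPU-bound placeholder: heavy processing of the downloaded file
    return len(path)

io_pool = ThreadPoolExecutor(max_workers=16)   # many threads; they mostly wait
cpu_pool = ProcessPoolExecutor(max_workers=4)  # roughly one process per core

def handle(message):
    path = io_pool.submit(download, message).result()
    return cpu_pool.submit(parse, path).result()

In a real subscriber callback you would avoid the blocking .result() calls and chain the futures instead, so one slow message doesn't stall the others.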
I have a stateful proof-of-concept Tornado application which enables users (a small user base) to run CPU-bound tasks on potentially large in-memory objects. This poses a problem in a single-threaded configuration, because one user can impact the experience of another.
I could use multiprocessing to ship the task to another process, but that would require repeatedly copying this large data, which is not ideal.
I found that Tornado can be configured to be multi-process. I thought this would resolve my issue for the time being: different users get different processes. However, what I've found is that references to objects go missing when interacting with the web application. I assume it's because Tornado sends me to a potentially different process per API invocation, and the object I previously interacted with doesn't exist in the current process.
Thus my question: can I configure Tornado to route a client/user to the same process repeatedly?
You can't do this with Tornado's multi-process mode, or with any solution that involves multiple processes all listening on a single port. Instead, you need to run your Tornado processes independently on different ports and use a separate load balancer that can distribute requests to them intelligently (for example, nginx with the ip_hash option).
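A minimal nginx sketch of that (the ports are placeholders; each is a separately started Tornado process):

upstream tornado {
    ip_hash;                    # pin each client IP to one backend
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}
server {
    listen 80;
    location / {
        proxy_pass http://tornado;
    }
}

Note that ip_hash keys on the client address, so users behind the same NAT will share a process.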
I want to set up a communication link between Jython and Python. I have a Django app and Python scripts that I use for a front end and for sysadmin/automation tasks, and I use Jython for WebLogic 9/10. What I want is to be able to hand the Jython system a task to do, such as task A with args a, b, c, and have it return a message when it's done.
I want this because WLST/Jython is slow to start, and that becomes a pain when I need to do a deploy or check the status of a server or servers (up to 100 right now). So what would be the easiest way to share information back to the main script or Python class, while keeping the Jython/WLST system alive so it can easily accept requests?
The way I have been doing it is with pickle: get all the data, spit it out to a file, then load the file back into the Python app/script.
Have you considered Celery or some other standard queue/broker messaging system? django-celery is quite mature and well developed, and is specifically designed for this kind of task.
Django -> Celery --> Worker Process (always running)
   ^             |-> Worker Process
   |             `-> Worker Process -,
   \_______ Job Complete ___________/
The basic idea is that you have always-running worker processes (on one or more servers) waiting for messages to come in (these can be pickled objects, JSON, or whatever you want). They idle, waiting for Celery (and its RabbitMQ backend) to dish out a message/job to them. Once the message/job is processed, a notification comes back through the broker and triggers a callback in Django, in which you update the status.
Celery is a task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed concurrently on one or more worker servers. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
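A minimal sketch with the current Celery API (the broker URL and task body are placeholders):

from celery import Celery

app = Celery("tasks", broker="amqp://guest@localhost//")

@app.task
def deploy(target, version):
    # placeholder for the actual WLST/deploy work
    return "deployed %s to %s" % (version, target)

From Django you would then call result = deploy.delay("server01", "1.2") and later check result.ready() / result.get() (fetching the return value requires a result backend to be configured).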
For this kind of messaging I like to use a real language-agnostic message queueing system that can be reused in future projects. Look at AMQP if you can handle having a message-queue broker in the middle managing all the queues; if you don't want a third-party broker, look at ZeroMQ.
In both cases you can send messages using a pub-sub queue that can handle several workers per queue if needed. The messages can be simple text strings (http://tnetstrings.org/), JSON objects, or, if you are careful, even pickled Python objects along with code to execute. Personally I like using JSON objects (a subset of JSON) and unpacking them into Python dicts.
I have used both AMQP and ZeroMQ in systems with around 20 communicating Python processes. It works well, and if you need to connect to something non-Python, you will find that there is already an AMQP module and a ZeroMQ library out there.
An interesting extension of your scenario is to have 3 kinds of worker processes written in Jython, CPython and IronPython. That way you can leverage 3rd party Java and .NET modules as well as binary CPython modules like lxml. Combine it with something like Redis so that the processes are completely decoupled and can run on multiple servers if necessary. The workers would put their results into Redis instead of gumming up the message queuing system with big messages interleaved with small ones. If necessary a worker can publish a message containing a Redis key so that another process can retrieve the value.
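For example, the worker side of a ZeroMQ PUSH/PULL pipeline (which, unlike PUB/SUB, hands each message to exactly one worker) might look like this sketch using pyzmq; the address and message format are assumptions:

import json
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)
sock.connect("tcp://127.0.0.1:5557")   # the dispatcher binds a PUSH socket here

while True:
    job = json.loads(sock.recv())      # e.g. {"task": "deploy", "args": [...]}
    handle(job)                        # placeholder dispatch function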
Pickling is fine; use cPickle for efficiency. However, you should not write it to a file. Rather, use other IPC mechanisms, like sockets or pipes (e.g. see https://stackoverflow.com/search?q=python+named+pipes), which avoid the disk overhead.
I want to use Python's multiprocessing for concurrent processing without locks (locks, to me, are the opposite of multiprocessing), because I want to build multiple reports from different resources at the exact same time during a web request (which normally takes about 3 seconds, but with multiprocessing I can do it in 0.5 seconds).
My problem is that if I expose such a feature to the web and 10 users pull the same report at the same time, I suddenly have 60 interpreters open at once (which would crash the system). Is this just the common-sense result of using multiprocessing, or is there a trick to get around this potential nightmare?
Thanks
If you're really worried about having too many instances, you could think about protecting the call with a Semaphore object. If I understand what you're doing, you can use the threading Semaphore object:
from threading import Semaphore

sem = Semaphore(10)  # at most 10 concurrent multiprocessing calls
with sem:            # blocks while 10 other requests hold the semaphore
    make_multiprocessing_call()
I'm assuming that make_multiprocessing_call() will clean up after itself.
This way only 10 "extra" instances of Python will ever be open; if another request comes along, it will just have to wait until the previous ones have completed. Unfortunately this won't be in "queue" order... or any order in particular.
Hope that helps
You are barking up the wrong tree if you are trying to use multiprocessing to add concurrency to a network app, and up a completely wrong tree if you're creating processes for each request. multiprocessing is not what you want (at least not as a concurrency model).
There's a good chance you want an asynchronous networking framework like Twisted.
Locks are only ever necessary if you have multiple agents writing to a resource. If they are only reading, locks are not needed (and, as you said, would defeat the purpose of multiprocessing).
Are you sure that would crash the system? On a web server using CGI, each request spawns a new process, so it's not unusual to see thousands of simultaneous processes (granted, in Python one should use WSGI and avoid this), and they do not crash the system.
I suggest you test your theory -- it shouldn't be difficult to manufacture 10 simultaneous accesses -- and see if your server really does crash.
My question is: which python framework should I use to build my server?
Notes:
This server talks HTTP with its clients: GET and POST (via pyAMF)
Clients "submit" "tasks" for processing and, then, sometime later, retrieve the associated "task_result"
submit and retrieve might be separated by days - different HTTP connections
The "task" is a lump of XML describing a problem to be solved, and a "task_result" is a lump of XML describing an answer.
When a server gets a "task", it queues it for processing
The server manages this queue and, when tasks get to the top, organises that they are processed.
The processing is performed by a long-running (15 mins?) external program (via subprocess) which is fed the task XML and produces a "task_result" lump of XML, which the server picks up and stores (for later client retrieval).
It serves a couple of basic HTML pages showing the queue and processing status (admin purposes only)
I've experimented with twisted.web, using SQLite as the database and threads to handle the long running processes.
But I can't help feeling that I'm missing a simpler solution. Am I? If you were faced with this, what technology mix would you use?
I'd recommend using an existing message queue. There are many to choose from (see below), and they vary in complexity and robustness.
Also, avoid threads: let your processing tasks run in a different process (why do they have to run in the webserver?)
By using an existing message queue, you only need to worry about producing messages (in your webserver) and consuming them (in your long running tasks). As your system grows you'll be able to scale up by just adding webservers and consumers, and worry less about your queuing infrastructure.
Some popular python implementations of message queues:
http://code.google.com/p/stomper/
http://code.google.com/p/pyactivemq/
http://xph.us/software/beanstalkd/
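As a taste of how small the moving parts are, here is a rough sketch with beanstalkd through the beanstalkc client library (the host/port are beanstalkd's defaults; process_task is a placeholder):

import beanstalkc

beanstalk = beanstalkc.Connection(host="localhost", port=11300)

# Producer (in the webserver): enqueue the task XML.
beanstalk.put("<task>...</task>")

# Consumer (in a long-running worker): block for a job, process it, delete it.
job = beanstalk.reserve()
process_task(job.body)   # placeholder handler
job.delete()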
I'd suggest the following. (Since it's what we're doing.)
A simple WSGI server (wsgiref or werkzeug). The HTTP requests coming in will naturally form a queue. No further queueing needed. You get a request, you spawn the subprocess as a child and wait for it to finish. A simple list of children is about all you need.
I used a modification of the main "serve forever" loop in wsgiref to periodically poll all of the children to see how they're doing.
A simple SQLite database can track request status. Even this may be overkill because your XML inputs and results can just lay around in the file system.
That's it. Queueing and threads don't really enter into it. A single long-running external process is too complex to coordinate. It's simplest if each request is a separate, stand-alone, child process.
If you get immense bursts of requests, you might want a simple governor to prevent creating thousands of children. The governor could be a simple queue, built using a list with append() and pop(). Every request goes in, but only requests that fit within some "max number of children" limit are taken out.
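A sketch of that governor inside a WSGI app (the solver command, task path, and limit are placeholders):

import subprocess

MAX_CHILDREN = 4
children = []

def application(environ, start_response):
    # drop children that have finished since the last request
    children[:] = [c for c in children if c.poll() is None]
    if len(children) >= MAX_CHILDREN:
        start_response("503 Service Unavailable", [("Content-Type", "text/plain")])
        return [b"busy, try again later"]
    children.append(subprocess.Popen(["solver", "/tmp/task.xml"]))  # placeholder command
    start_response("202 Accepted", [("Content-Type", "text/plain")])
    return [b"accepted"]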
My reaction is to suggest Twisted, but you've already looked at it. Still, I stick by my answer. Without knowing your personal pain points, I can at least share some things that helped me reduce almost all of the deferred madness that arises when you have several dependent, blocking actions to perform for a client.
Inline callbacks (lightly documented here: http://twistedmatrix.com/documents/8.2.0/api/twisted.internet.defer.html) provide a means to make long chains of deferreds much more readable (to the point of looking like straight-line code). There is an excellent example of the complexity reduction this affords here: http://blog.mekk.waw.pl/archives/14-Twisted-inlineCallbacks-and-deferredGenerator.html
You don't always have to get your bulk processing to integrate nicely with Twisted. Sometimes it is easier to break a large piece of your program off into a stand-alone, easily testable/tweakable/implementable command-line tool and have Twisted invoke this tool in another process. Twisted's ProcessProtocol provides a fairly flexible way of launching and interacting with external helper programs. Furthermore, if you suddenly decide you want to cloudify your application, it is not all that big of a deal to use a ProcessProtocol to simply run your bulk processing on a remote server (random EC2 instances, perhaps) via ssh, assuming you have the keys set up already.
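For illustration, an inlineCallbacks sketch (the helper functions are assumed to return Deferreds):

from twisted.internet import defer

@defer.inlineCallbacks
def handle_task(task_xml):
    yield store_task(task_xml)           # each yield suspends until the Deferred fires
    result = yield run_solver(task_xml)  # e.g. launched via ProcessProtocol
    yield store_result(result)
    defer.returnValue(result)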
You could have a look at Celery.
It seems any Python web framework will suit your needs. I work with a similar system on a daily basis, and I can tell you that your solution with threads and SQLite for queue storage is about as simple as it gets.
Assuming order doesn't matter in your queue, threads should be acceptable. It's important to make sure you don't create race conditions with your queues, for example by having two of the same job type running simultaneously. If that is a concern, I'd suggest a single-threaded application that works through the items in the queue one by one.