How to control a simulation in Python

How to control a simulation in Python - python

I have a fairly high-level question about Python and running interactive simulations. Here is the setup:
I am porting to Python some simulation software I originally wrote in Smalltalk (VW). It is a kind of Recurrent Neural Network controlled interactively from a graphical interface. The interface allows the manipulation of most the network's parameters in real time, in addition to controlling the simulation itself (starting it, stopping it, etc). In the original Smalltalk implementation, I had two processes running with different priority levels:
The interface itself with a higher priority
The neural network running forever at a lower priority
Communication between the two processes was trivial, because all Smalltalk processes share the same address space (the Object memory).
I am now starting to realize that replicating a similar setup in Python is not so trivial. The threading module does not allow its threads to share address space, as far as I can tell. The multiprocessing module does, but in a rather complex way (with Queues, etc).
So I am starting to think that my Smalltalk perspective is leading me astray and I am approaching a relatively simple problem from the wrong angle altogether. Problem is, I don't know what is the right angle! How would you recommend I approach the problem? I am fairly new to Python (obviously) and more than willing to learn. But I would greatly appreciate suggestions on how to frame the issues and which multiprocessing modules (if any!) I should delve into.
Thanks,
Stefano

I'll offer my take on how to approach this problem. Within the multiprocessing module the Pipe and Queue IPC mechanisms are really the best way to go; in spite of the added complexity you allude to, it's worth learning how they work. The Pipe is fairly straightforward so I'll use that to illustrate.
Here's the code, followed by some explanation:
import sys
import os
import random
import time
import multiprocessing
class computing_task(multiprocessing.Process):
def __init__(self, name, pipe):
# call this before anything else
multiprocessing.Process.__init__(self)
# then any other initialization
self.name = name
self.ipcPipe = pipe
self.number1 = 0.0
self.number2 = 0.0
sys.stdout.write('[%s] created: %f\n' % (self.name, self.number1))
# Do some kind of computation
def someComputation(self):
try:
count = 0
while True:
count += 1
self.number1 = (random.uniform(0.0, 10.0)) * self.number2
sys.stdout.write('[%s]\t%d \t%g \t%g\n' % (self.name, count, self.number1, self.number2))
# Send result via pipe to parent process.
# Can send lists, whatever - anything picklable.
self.ipcPipe.send([self.name, self.number1])
# Get new data from parent process
newData = self.ipcPipe.recv()
self.number2 = newData[0]
time.sleep(0.5)
except KeyboardInterrupt:
return
def run(self):
sys.stdout.write('[%s] started ... process id: %s\n'
% (self.name, os.getpid()))
self.someComputation()
# When done, send final update to parent process and close pipe.
self.ipcPipe.send([self.name, self.number1])
self.ipcPipe.close()
sys.stdout.write('[%s] task completed: %f\n' % (self.name, self.number1))
def main():
# Create pipe
parent_conn, child_conn = multiprocessing.Pipe()
# Instantiate an object which contains the computation
# (give "child process pipe" to the object so it can phone home :) )
computeTask = computing_task('foo', child_conn)
# Start process
computeTask.start()
# Continually send and receive updates to/from the child process
try:
while True:
# receive data from child process
result = parent_conn.recv()
print "recv: ", result
# send new data to child process
parent_conn.send([random.uniform(0.0, 1.0)])
except KeyboardInterrupt:
computeTask.join()
parent_conn.close()
print "joined, exiting"
if (__name__ == "__main__"):
main()
I have encapsulated the computing to be done inside a class derived from Process. This isn't strictly necessary but makes the code easier to understand and extend, in most cases. From the main process you can start your computing task with the start() method on an instance of this class (this will start a separate process to run the contents of your object).
As you can see, we use Pipe in the parent process to create two connectors ("ends" of the pipe) and give one to the child while the the parent holds the other. Each of these connectors is a two-way communication mechanism between the processes holding the ends, with send() and recv() methods for doing what their names imply. In this example I've used the pipe to transmit lists of numbers and text, but in general you can send lists, tuples, objects, or anything that's picklable (i.e. serializable with Python's pickle facility). So you've got some latitude for what you send back and forth between processes.
So you set up your connectors, invoke start() on your new process, and you're off and computing. Here we're just multiplying two numbers, but you can see it's being done "interactively" in the subprocess with updates sent from the parent. Likewise the parent process is informed regularly of new results from the computing process.
Note that the connector's recv() method is blocking, i.e. if the other end hasn't sent anything yet, recv() will wait until something is there to read, and prevent anything else from happening in the meantime. So just be aware of that.
Hope this helps. Again, this is a barebones example and in real life you'll want to do more error handling, possibly use poll() on the connection objects, and so forth, but hopefully this conveys the major ideas and gets you started.

Related

Python simplest form of multiprocessing

Ive been trying to read up on threading and multiprocessing but all the examples are to intricate and advanced for my level of python/programming knowlegde. I want to run a function, which consists of a while loop, and while that loop runs I want to continue with the program and eventually change the condition for the while-loop and end that process. This is the code:
class Example():
def __init__(self):
self.condition = False
def func1(self):
self.condition = True
while self.condition:
print "Still looping"
time.sleep(1)
print "Finished loop"
def end_loop(self):
self.condition = False
The I make the following function-calls:
ex = Example()
ex.func1()
time.sleep(5)
ex.end_loop()
What I want is for the func1 to run for 5s before the end_loop() is called and changes the condition and ends the loop and thus also the function. I.e I want one process to start and "go" into func1 and at the same time I want time.sleep(5) to be called, so the processes "split" when arriving at func1, one process entering the function while the other continues down the program and start with the time.sleep(5) execution.
This must be the most basic example of a multiprocess, still Ive had trouble finding a simple way to do it!
Thank you
EDIT1: regarding do_something. In my real problem do_something is replaced by some code that communicates with another program via a socket and receives packages with coordinates every 0.02s and stores them in membervariables of the class. I want this constant updating of the coordinates to start and then be able to to read the coordinates via other functions at the same time.
However that is not so relevant. What if do_something is replaced by:
time.sleep(1)
print "Still looping"
How do I solve my problem then?
EDIT2: I have tried multiprocessing like this:
from multiprocessing import Process
ex = Example()
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
p1.start()
time.sleep(5)
p2.start()
When I ran this, I never got to p2.start(), so that did not help. Even if it had this is not really what Im looking for either. What I want would be just to start the process p1, and then continue with time.sleep and ex.end_loop()

The first problem with your code are the calls
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
With ex.func1() you're calling the function and pass the return value as target parameter. Since the function doesn't return anything, you're effectively calling
p1 = Process(target=None)
p2 = Process(target=None)
which makes, of course, no sense.
After fixing that, the next problem will be shared data: when using the multiprocessing package, you implement concurrency using multiple processes which, by default, cannot simply share data afaik. Have a look at Sharing state between processes in the package's documentation to read about this. Especially take the first sentence into account: "when doing concurrent programming it is usually best to avoid using shared state as far as possible"!
So you might want to also have a look at Exchanging objects between processes to read about how to send/receive data between two different processes. So, instead of simply setting a flag to stop the loop, it might be better to send a message to signal the loop should be terminated.
Also note that processes are a heavyweight form of multiprocessing, they spawn multiple OS processes which comes with a relatively big overhead. multiprocessing's main purpose is to avoid problems imposed by Python's Global Interpreter Lock (google about this to read more...) If your problem is'nt much more complex than what you've told us, you might want to use the threading package instead: threads come with less overhead than processes and also allow to access the same data (although you really should read about synchronization when doing this...)
I'm afraid, multiprocessing is an inherently complex subject. So I think you will need to advance your programming/python skills to successfully use it. But I'm sure you'll manage this, the python documentation about this is comprehensive and there are a lot of other resources about this.

To tackle your EDIT2 problem, you could try using the shared memory map Value.
import time
from multiprocessing import Process, Value
class Example():
def func1(self, cond):
while (cond.value == 1):
print('do something')
time.sleep(1)
return
if __name__ == '__main__':
ex = Example()
cond = Value('i', 1)
proc = Process(target=ex.func1, args=(cond,))
proc.start()
time.sleep(5)
cond.value = 0
proc.join()
(Note the target=ex.func1 without the parentheses and the comma after cond in args=(cond,).)
But look at the answer provided by MartinStettner to find a good solution.

How to use multiprocessing with class instances in Python?

I am trying to create a class than can run a separate process to go do some work that takes a long time, launch a bunch of these from a main module and then wait for them all to finish. I want to launch the processes once and then keep feeding them things to do rather than creating and destroying processes. For example, maybe I have 10 servers running the dd command, then I want them all to scp a file, etc.
My ultimate goal is to create a class for each system that keeps track of the information for the system in which it is tied to like IP address, logs, runtime, etc. But that class must be able to launch a system command and then return execution back to the caller while that system command runs, to followup with the result of the system command later.
My attempt is failing because I cannot send an instance method of a class over the pipe to the subprocess via pickle. Those are not pickleable. I therefore tried to fix it various ways but I can't figure it out. How can my code be patched to do this? What good is multiprocessing if you can't send over anything useful?
Is there any good documentation of multiprocessing being used with class instances? The only way I can get the multiprocessing module to work is on simple functions. Every attempt to use it within a class instance has failed. Maybe I should pass events instead? I don't understand how to do that yet.
import multiprocessing
import sys
import re
class ProcessWorker(multiprocessing.Process):
"""
This class runs as a separate process to execute worker's commands in parallel
Once launched, it remains running, monitoring the task queue, until "None" is sent
"""
def __init__(self, task_q, result_q):
multiprocessing.Process.__init__(self)
self.task_q = task_q
self.result_q = result_q
return
def run(self):
"""
Overloaded function provided by multiprocessing.Process. Called upon start() signal
"""
proc_name = self.name
print '%s: Launched' % (proc_name)
while True:
next_task_list = self.task_q.get()
if next_task is None:
# Poison pill means shutdown
print '%s: Exiting' % (proc_name)
self.task_q.task_done()
break
next_task = next_task_list[0]
print '%s: %s' % (proc_name, next_task)
args = next_task_list[1]
kwargs = next_task_list[2]
answer = next_task(*args, **kwargs)
self.task_q.task_done()
self.result_q.put(answer)
return
# End of ProcessWorker class
class Worker(object):
"""
Launches a child process to run commands from derived classes in separate processes,
which sit and listen for something to do
This base class is called by each derived worker
"""
def __init__(self, config, index=None):
self.config = config
self.index = index
# Launce the ProcessWorker for anything that has an index value
if self.index is not None:
self.task_q = multiprocessing.JoinableQueue()
self.result_q = multiprocessing.Queue()
self.process_worker = ProcessWorker(self.task_q, self.result_q)
self.process_worker.start()
print "Got here"
# Process should be running and listening for functions to execute
return
def enqueue_process(target): # No self, since it is a decorator
"""
Used to place an command target from this class object into the task_q
NOTE: Any function decorated with this must use fetch_results() to get the
target task's result value
"""
def wrapper(self, *args, **kwargs):
self.task_q.put([target, args, kwargs]) # FAIL: target is a class instance method and can't be pickled!
return wrapper
def fetch_results(self):
"""
After all processes have been spawned by multiple modules, this command
is called on each one to retreive the results of the call.
This blocks until the execution of the item in the queue is complete
"""
self.task_q.join() # Wait for it to to finish
return self.result_q.get() # Return the result
#enqueue_process
def run_long_command(self, command):
print "I am running number % as process "%number, self.name
# In here, I will launch a subprocess to run a long-running system command
# p = Popen(command), etc
# p.wait(), etc
return
def close(self):
self.task_q.put(None)
self.task_q.join()
if __name__ == '__main__':
config = ["some value", "something else"]
index = 7
workers = []
for i in range(5):
worker = Worker(config, index)
worker.run_long_command("ls /")
workers.append(worker)
for worker in workers:
worker.fetch_results()
# Do more work... (this would actually be done in a distributor in another class)
for worker in workers:
worker.close()
Edit: I tried to move the ProcessWorker class and the creation of the multiprocessing queues outside of the Worker class and then tried to manually pickle the worker instance. Even that doesn't work and I get an error
RuntimeError: Queue objects should only be shared between processes
through inheritance
. But I am only passing references of those queues into the worker instance?? I am missing something fundamental. Here is the modified code from the main section:
if __name__ == '__main__':
config = ["some value", "something else"]
index = 7
workers = []
for i in range(1):
task_q = multiprocessing.JoinableQueue()
result_q = multiprocessing.Queue()
process_worker = ProcessWorker(task_q, result_q)
worker = Worker(config, index, process_worker, task_q, result_q)
something_to_look_at = pickle.dumps(worker) # FAIL: Doesn't like queues??
process_worker.start()
worker.run_long_command("ls /")

So, the problem was that I was assuming that Python was doing some sort of magic that is somehow different from the way that C++/fork() works. I somehow thought that Python only copied the class, not the whole program into a separate process. I seriously wasted days trying to get this to work because all of the talk about pickle serialization made me think that it actually sent everything over the pipe. I knew that certain things could not be sent over the pipe, but I thought my problem was that I was not packaging things up properly.
This all could have been avoided if the Python docs gave me a 10,000 ft view of what happens when this module is used. Sure, it tells me what the methods of multiprocess module does and gives me some basic examples, but what I want to know is what is the "Theory of Operation" behind the scenes! Here is the kind of information I could have used. Please chime in if my answer is off. It will help me learn.
When you run start a process using this module, the whole program is copied into another process. But since it is not the "__main__" process and my code was checking for that, it doesn't fire off yet another process infinitely. It just stops and sits out there waiting for something to do, like a zombie. Everything that was initialized in the parent at the time of calling multiprocess.Process() is all set up and ready to go. Once you put something in the multiprocess.Queue or shared memory, or pipe, etc. (however you are communicating), then the separate process receives it and gets to work. It can draw upon all imported modules and setup just as if it was the parent. However, once some internal state variables change in the parent or separate process, those changes are isolated. Once the process is spawned, it now becomes your job to keep them in sync if necessary, either through a queue, pipe, shared memory, etc.
I threw out the code and started over, but now I am only putting one extra function out in the ProcessWorker, an "execute" method that runs a command line. Pretty simple. I don't have to worry about launching and then closing a bunch of processes this way, which has caused me all kinds of instability and performance issues in the past in C++. When I switched to launching processes at the beginning and then passing messages to those waiting processes, my performance improved and it was very stable.
BTW, I looked at this link to get help, which threw me off because the example made me think that methods were being transported across the queues: http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html
The second example of the first section used "next_task()" that appeared (to me) to be executing a task received via the queue.

Instead of attempting to send a method itself (which is impractical), try sending a name of a method to execute.
Provided that each worker runs the same code, it's a matter of a simple getattr(self, task_name).
I'd pass tuples (task_name, task_args), where task_args were a dict to be directly fed to the task method:
next_task_name, next_task_args = self.task_q.get()
if next_task_name:
task = getattr(self, next_task_name)
answer = task(**next_task_args)
...
else:
# poison pill, shut down
break

REF: https://stackoverflow.com/a/14179779
Answer on Jan 6 at 6:03 by David Lynch is not factually correct when he says that he was misled by
http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html.
The code and examples provided are correct and work as advertised. next_task() is executing a task received via the queue -- try and understand what the Task.__call__() method is doing.
In my case what, tripped me up was syntax errors in my implementation of run(). It seems that the sub-process will not report this and just fails silently -- leaving things stuck in weird loops! Make sure you have some kind of syntax checker running e.g. Flymake/Pyflakes in Emacs.
Debugging via multiprocessing.log_to_stderr()F helped me narrow down the problem.

Why are Python multiprocessing Pipe unsafe?

I don't understand why Pipes are said unsafe when there are multiple senders and receivers.
How the following code can be turned into code using Queues if this is the case ? Queues don't throw EOFError when closed, so my processes can't stop. Should I send endlessly 'Poison' messages to tell them to stop (this way, i'm sure all my processes receive at least one poison) ?
I would like to keep the pipe p1 open until I decide otherwise (here it's when I have sent the 10 messages).
from multiprocessing import Pipe, Process
from random import randint, random
from time import sleep
def job(name, p_in, p_out):
print(name + ' starting')
nb_msg = 0
try:
while True:
x = p_in.recv()
print(name + ' receives ' + x)
nb_msg = nb_msg + 1
p_out.send(x)
sleep(random())
except EOFError:
pass
print(name + ' ending ... ' + str(nb_msg) + ' message(s)')
if __name__ == '__main__':
p1_in, p1_out = Pipe()
p2_in, p2_out = Pipe()
proc = []
for i in range(3):
p = Process(target=job, args=(str(i), p1_out, p2_in))
p.start()
proc.append(p)
for x in range(10):
p1_in.send(chr(97+x))
p1_in.close()
for p in proc:
p.join()
p1_out.close()
p2_in.close()
try:
while True:
print(p2_out.recv())
except EOFError:
pass
p2_out.close()

Essentially, the problem is that Pipe is a thin wrapper around a platform-defined pipe object. recv simply repeatedly receives a buffer of bytes until a complete Python object is obtained. If two threads or processes use recv on the same pipe, the reads may interleave, leaving each process with half a pickled object and thus corrupting the data. Queues do proper synchronization between processes, at the expense of more complexity.
As the multiprocessing documentation puts it:
Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.
You don't have to endlessly send poison pills; one per worker is all you need. Each worker picks up exactly one poison pill before exiting, so there's no danger that a worker will somehow miss the message.
You should also consider using multiprocessing.Pool instead of reimplementing the "worker process" model -- Pool has a lot of methods which make distributing work across multiple threads very easy.

I don't understand why Pipes are said unsafe when there are multiple senders and receivers.
Consider you put water into a pipe from source A and B simultaneously. On the other end of the pipe, it will be impossible for you to find out which part of the water came from A or B, right? :)
A pipe transports a data stream on the byte level. Without a communication protocol on top of it, it does not know what a message is and therefore can't ensure message integrity. Therefore, it is not only 'unsafe' to use pipes with multiple senders. It is a major design flaw and will most likely lead to communication problems.
Queues, however, are implemented on a higher level. They are designed for communicating messages (or even abstract objects). Queues are made for keeping a message/object self-contained. Multiple sources can put objects into a queue and multiple consumers can pull these objects while being 100 % sure that whatever got into the queue as a unit also comes out of it as a unit.
Edit after quite a while:
I should add that in the byte stream, all bytes are retrieved in the same order as sent (guaranteed). The issue with multiple senders is that the sending order (the order of input) might already be unclear or random, i.e. multiple streams might mix in an unpredictable fashion.
A common queue implementation guarantees that single messages are kept intact, even if there are multiple senders. Messages are retrieved in the order as sent, too. With multiple competing senders and without further synchronization mechanisms there is, however, again no guarantee about the order of input messages.

python multiprocessing each with own subprocess (Kubuntu,Mac)

I've created a script that by default creates one multiprocessing Process; then it works fine. When starting multiple processes, it starts to hang, and not always in the same place. The program's about 700 lines of code, so I'll try to summarise what's going on. I want to make the most of my multi-cores, by parallelising the slowest task, which is aligning DNA sequences. For that I use the subprocess module to call a command-line program: 'hmmsearch', which I can feed in sequences through /dev/stdin, and then I read out the aligned sequences through /dev/stdout. I imagine the hang occurs because of these multiple subprocess instances reading / writing from stdout / stdin, and I really don't know the best way to go about this...
I was looking into os.fdopen(...) & os.tmpfile(), to create temporary filehandles or pipes where I can flush the data through. However, I've never used either before & I can't picture how to do that with the subprocess module. Ideally I'd like to bypass using the hard-drive entirely, because pipes are much better with high-throughput data processing!
Any help with this would be super wonderful!!
import multiprocessing, subprocess
from Bio import SeqIO
class align_seq( multiprocessing.Process ):
def __init__( self, inPipe, outPipe, semaphore, options ):
multiprocessing.Process.__init__(self)
self.in_pipe = inPipe ## Sequences in
self.out_pipe = outPipe ## Alignment out
self.options = options.copy() ## Modifiable sub-environment
self.sem = semaphore
def run(self):
inp = self.in_pipe.recv()
while inp != 'STOP':
seq_record , HMM = inp # seq_record is only ever one Bio.Seq.SeqRecord object at a time.
# HMM is a file location.
align_process = subprocess.Popen( ['hmmsearch', '-A', '/dev/stdout', '-o',os.devnull, HMM, '/dev/stdin'], shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE )
self.sem.acquire()
align_process.stdin.write( seq_record.format('fasta') )
align_process.stdin.close()
for seq in SeqIO.parse( align_process.stdout, 'stockholm' ): # get the alignment output
self.out_pipe.send_bytes( seq.seq.tostring() ) # send it to consumer
align_process.wait() # Don't know if there's any need for this??
self.sem.release()
align_process.stdout.close()
inp = self.in_pipe.recv()
self.in_pipe.close() #Close handles so don't overshoot max. limit on number of file-handles.
self.out_pipe.close()
Having spent a while debugging this, I've found a problem that was always there and isn't quite solved yet, but have fixed some other inefficiencies in the process (of debugging).
There are two initial feeder functions, this align_seq class and a file parser parseHMM() which loads a position specific scoring matrix (PSM) into a dictionary.
The main parent process then compares the alignment to the PSM, using a dictionary (of dictionaries) as a pointer to the relevant score for each residue. In order to calculate the scores I want I have two separate multiprocessing.Process classes, one class logScore() that calculates the log odds ratio (with math.exp() ); I parallelise this one; and it Queues the calculated scores to the last Process, sumScore() which just sums these scores (with math.fsum), returning the sum and all position specific scores back to the parent process as a dictionary.
i.e.
Queue.put( [sum, { residue position : position specific score , ... } ] )
I find this exceptionally confusing to get my head around (too many queue's!), so I hope that readers are managing to follow... After all the above calculations are done, I then give the option to save the cumulative scores as tab-delimited output. This is where it now (since last night) sometimes breaks, as I ensure it prints out a score for every position where there should be a score. I think that due to latency (computer timings being out-of-sync), sometimes what gets put in the Queue first for logScore doesn't reach sumScore first.
In order that sumScore knows when to return the tally and start again, I put 'endSEQ' into the Queue for the last logScore process that performed a calculation. I thought that then it should reach sumScore last too, but that's not always the case; only sometimes does it break. So now I don't get a deadlock anymore, but instead a KeyError when printing or saving the results.
I believe the reason for sometimes getting KeyError is because I create a Queue for each logScore process, but that instead they should all use the same Queue. Now, where I have something like:-
class logScore( multiprocessing.Process ):
def __init__( self, inQ, outQ ):
self.inQ = inQ
...
def scoreSequence( processes, HMMPSM, sequenceInPipe ):
process_index = -1
sequence = sequenceInPipe.recv_bytes()
for residue in sequence:
.... ## Get the residue score.
process_index += 1
processes[process_index].inQ.put( residue_score )
## End of sequence
processes[process_index].inQ.put( 'endSEQ' )
logScore_to_sumScoreQ = multiprocessing.Queue()
logScoreProcesses = [ logScore( multiprocessing.Queue() , logScore_to_sumScoreQ ) for i in xrange( options['-ncpus'] ) ]
sumScoreProcess = sumScore( logScore_to_sumScoreQ, scoresOut )
whereas I should create just one Queue to share between all the logScore instances. i.e.
logScore_to_sumScoreQ = multiprocessing.Queue()
scoreSeq_to_logScore = multiprocessing.Queue()
logScoreProcesses = [ logScore( scoreSeq_to_logScore , logScore_to_sumScoreQ ) for i in xrange( options['-ncpus'] ) ]
sumScoreProcess = sumScore( logScore_to_sumScoreQ, scoresOut )

That's not quite how pipelining works... but to put your mind to ease, here's an excerpt from the subprocess documentation:
stdin, stdout and stderr specify the
executed programs’ standard input,
standard output and standard error
file handles, respectively. Valid
values are PIPE, an existing file
descriptor (a positive integer), an
existing file object, and None. PIPE
indicates that a new pipe to the child
should be created. With None, no
redirection will occur; the child’s
file handles will be inherited from
the parent.
The likeliest areas at fault would be in the communication with the main process or in your management of the semaphore. Maybe state transitions / synchronization are not proceeding as expected due to a bug? I suggest debugging by adding logging/print statements before & after each blocking call - where you're communicating with the main process and where you acquire/release the semaphore to narrow down where things have gone wrong.
Also I'm curious - is the semaphore absolutely necessary?

I also wanted to parallelize simple tasks and for that I created a little python script. You can take a look at:
http://bioinf.comav.upv.es/psubprocess/index.html
Is a little more general than what you want, but for simple tasks is quite easy to use. It might be at least of some insparation to you.
Jose Blanca

It could be a deadlock in subprocess, have you tried using the communicate method rather than wait?
http://docs.python.org/library/subprocess.html

Asynchronous subprocess on Windows

First of all, the overall problem I am solving is a bit more complicated than I am showing here, so please do not tell me 'use threads with blocking' as it would not solve my actual situation without a fair, FAIR bit of rewriting and refactoring.
I have several applications which are not mine to modify, which take data from stdin and poop it out on stdout after doing their magic. My task is to chain several of these programs. Problem is, sometimes they choke, and as such I need to track their progress which is outputted on STDERR.
pA = subprocess.Popen(CommandA, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# ... some more processes make up the chain, but that is irrelevant to the problem
pB = subprocess.Popen(CommandB, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=pA.stdout )
Now, reading directly through pA.stdout.readline() and pB.stdout.readline(), or the plain read() functions, is a blocking matter. Since different applications output in different paces and different formats, blocking is not an option. (And as I wrote above, threading is not an option unless at a last, last resort.) pA.communicate() is deadlock safe, but since I need the information live, that is not an option either.
Thus google brought me to this asynchronous subprocess snippet on ActiveState.
All good at first, until I implement it. Comparing the cmd.exe output of pA.exe | pB.exe, ignoring the fact both output to the same window making for a mess, I see very instantaneous updates. However, I implement the same thing using the above snippet and the read_some() function declared there, and it takes over 10 seconds to notify updates of a single pipe. But when it does, it has updates leading all the way upto 40% progress, for example.
Thus I do some more research, and see numerous subjects concerning PeekNamedPipe, anonymous handles, and returning 0 bytes available even though there is information available in the pipe. As the subject has proven quite a bit beyond my expertise to fix or code around, I come to Stack Overflow to look for guidance. :)
My platform is W7 64-bit with Python 2.6, the applications are 32-bit in case it matters, and compatibility with Unix is not a concern. I can even deal with a full ctypes or pywin32 solution that subverts subprocess entirely if it is the only solution, as long as I can read from every stderr pipe asynchronously with immediate performance and no deadlocks. :)

How bad is it to have to use threads? I encountered much the same problem and eventually decided to use threads to gather up all the data on a sub-process's stdout and stderr and put it onto a thread-safe queue which which the main thread can read in a blocking fashion, without having to worry about the threading going on behind the scenes.
It's not clear what trouble you anticipate with a solution based on threads and blocking. Are you worried about having to make the rest of your code thread-safe? That shouldn't be an issue since the IO thread wouldn't need to interact with any of the rest of your code or data. If you have very restrictive memory requirements or your pipeline is particularly long then perhaps you may feel unhappy about spawning so many threads. I don't know enough about your situation so I couldn't say if this is likely to be a problem, but it seems to me that since you're already spawning off extra processes a few threads to interact with them should not be a terrible burden. In my situation I have not found these IO threads to be particularly problematic.
My thread function looked something like this:
def simple_io_thread(pipe, queue, tag, stop_event):
"""
Read line-by-line from pipe, writing (tag, line) to the
queue. Also checks for a stop_event to give up before
the end of the stream.
"""
while True:
line = pipe.readline()
while True:
try:
# Post to the queue with a large timeout in case the
# queue is full.
queue.put((tag, line), block=True, timeout=60)
break
except Queue.Full:
if stop_event.isSet():
break
continue
if stop_event.isSet() or line=="":
break
pipe.close()
When I start up the subprocess I do this:
outputqueue = Queue.Queue(50)
stop_event = threading.Event()
process = subprocess.Popen(
command,
cwd=workingdir,
env=env,
shell=useshell,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
stderr_thread = threading.Thread(
target=simple_io_thread,
args=(process.stderr, outputqueue, "STDERR", stop_event)
)
stdout_thread = threading.Thread(
target=simple_io_thread,
args=(process.stdout, outputqueue, "STDOUT", stop_event)
)
stderr_thread.daemon = True
stdout_thread.daemon = True
stderr_thread.start()
stdout_thread.start()
Then when I want to read I can just block on outputqueue - each item read from it contains either a string to identify which pipe it came from and a line of text from that pipe. Very little code runs in a separate thread, and it only communicates with the main thread via a thread-safe queue (plus an event in case I need to give up early). Perhaps this approach would be useful and allow you to solve the problem with threads and blocking but without having to rewrite lots of code?
(My solution is made more complicated because I sometimes wish to terminate the subprocesses early, and want to be sure that the threads will all finish. If that's not an issue you can get rid of all the stop_event stuff and it becomes pretty succinct.)

I assume that the process pipeline will not deadlock if it only uses stdin and stdout; and the problem you're trying to solve is how to make it not deadlock if they write to stderr (and have to deal with stderr possibly getting backed up).
If you're letting multiple processes write to stderr, you have to watch out for their output being intermingled. I'm guessing you have that sorted somehow; just putting it out there to be sure.
Be aware of the -u flag to python; it is helpful when testing to see if OS buffering is screwing you up.
If you want to emulate select() on file handles in win32, your only choice is to use PeekNamedPipe() and friends. I have a snippet of code that reads line-oriented output from multiple processes at once, which you may even be able to use directly -- try passing the list of proc.stderr handles to it and go.
class NoLineError(Exception): pass
class NoMoreLineError(Exception): pass
class LineReader(object):
"""Helper class for multi_readlines."""
def __init__(self, f):
self.fd = f.fileno()
self.osf = msvcrt.get_osfhandle(self.fd)
self.buf = ''
def getline(self):
"""Returns a line of text, or raises NoLineError, or NoMoreLineError."""
try:
_, avail, _ = win32pipe.PeekNamedPipe(self.osf, 0)
bClosed = False
except pywintypes.error:
avail = 0
bClosed = True
if avail:
self.buf += os.read(self.fd, avail)
idx = self.buf.find('\n')
if idx >= 0:
ret, self.buf = self.buf[:idx+1], self.buf[idx+1:]
return ret
elif bClosed:
if self.buf:
ret, self.buf = self.buf, None
return ret
else:
raise NoMoreLineError
else:
raise NoLineError
def multi_readlines(fs, timeout=0):
"""Read lines from |fs|, a list of file objects.
The lines come out in arbitrary order, depending on which files
have output available first."""
if type(fs) not in (list, tuple):
raise Exception("argument must be a list.")
objs = [LineReader(f) for f in fs]
for i,obj in enumerate(objs): obj._index = i
while objs:
yielded = 0
for i,obj in enumerate(objs):
try:
yield (obj._index, obj.getline())
yielded += 1
except NoLineError:
#time.sleep(timeout)
pass
except NoMoreLineError:
del objs[i]
break # Because we mutated the array
if not yielded:
time.sleep(timeout)
pass
I have never seen the "Peek returns 0 bytes even though data is available" issue myself. If this happens to others, I bet their libc is buffering their stdout/stderr before sending the data to the OS; there is nothing you can do about that from outside. You have to make the app use unbuffered output somehow (-u to python; win32/libc calls to modify the stderr file handle, ...)
The fact that you are seeing nothing, then a ton of updates, makes me think that your problem is buffering on the source end. win32 libc may buffer differently if it writes to a pipe rather than a console. Again, the best you can do from outside those programs is to aggressively drain their output.

What about using Twisted's FD's? http://twistedmatrix.com/documents/8.1.0/api/twisted.internet.fdesc.html
It's not asynchronous but it is non-blocking. For asynchronous stuff, can you port to using Twisted?

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.