In my application I'm using pipes from the multiprocessing module to communicate between Python processes.
Lately I've observed some weird behaviour depending on the size of the data I'm sending through them.
According to the Python documentation, these pipes are based on connection objects and should behave asynchronously, yet sometimes they get stuck on send. If I enable full duplex on each connection everything works fine, even though I'm not using the connections for both sending and receiving.
Can anyone explain this behaviour?
100 floats, full duplex disabled
The code works, taking advantage of the asynchronous behaviour.
100 floats, full duplex enabled
The example works fine, as expected.
10000 floats, full duplex disabled
The execution blocks forever, even though it was fine with the smaller data.
10000 floats, full duplex enabled
Fine again.
Code (it's not my production code, it just illustrates what I mean):
from collections import deque
from multiprocessing import Process, Pipe
from numpy.random import randn
from os import getpid

PROC_NR = 4
DATA_POINTS = 100
# DATA_POINTS = 10000


def arg_passer(pipe_in, pipe_out, list_):
    my_pid = getpid()
    print "{}: Before send".format(my_pid)
    pipe_out.send(list_)
    print "{}: After send, before recv".format(my_pid)
    buf = pipe_in.recv()
    print "{}: After recv".format(my_pid)


if __name__ == "__main__":
    pipes = [Pipe(False) for _ in range(PROC_NR)]
    # pipes = [Pipe(True) for _ in range(PROC_NR)]
    pipes_in = deque(p[0] for p in pipes)
    pipes_out = deque(p[1] for p in pipes)
    pipes_in.rotate(1)
    pipes_out.rotate(-1)

    data = [randn(DATA_POINTS) for foo in xrange(PROC_NR)]
    processes = [Process(target=arg_passer, args=(pipes_in[foo], pipes_out[foo], data[foo]))
                 for foo in xrange(PROC_NR)]

    for proc in processes:
        proc.start()

    for proc in processes:
        proc.join()
First of all, it's worth noting the implementation of the multiprocessing.Pipe() function...
def Pipe(duplex=True):
    '''
    Returns pair of connection objects at either end of a pipe
    '''
    if duplex:
        s1, s2 = socket.socketpair()
        s1.setblocking(True)
        s2.setblocking(True)
        c1 = _multiprocessing.Connection(os.dup(s1.fileno()))
        c2 = _multiprocessing.Connection(os.dup(s2.fileno()))
        s1.close()
        s2.close()
    else:
        fd1, fd2 = os.pipe()
        c1 = _multiprocessing.Connection(fd1, writable=False)
        c2 = _multiprocessing.Connection(fd2, readable=False)

    return c1, c2
The difference is that half-duplex 'Pipes' use an anonymous pipe, whereas full-duplex 'Pipes' actually use a Unix domain socket, since anonymous pipes are unidirectional by nature.
I'm not sure what you mean by the term "asynchronous" in this context. If you mean "non-blocking I/O" then it's worth noting that both implementations use blocking I/O by default.
Secondly, it's worth noting the pickled size of the data you're trying to send...
>>> from numpy.random import randn
>>> from cPickle import dumps
>>> len(dumps(randn(100)))
2479
>>> len(dumps(randn(10000)))
237154
Thirdly, from the pipe(7) manpage...
Pipe Capacity
A pipe has a limited capacity. If the pipe is full, then a write(2) will block
or fail, depending on whether the O_NONBLOCK flag is set (see below). Different
implementations have different limits for the pipe capacity. Applications should
not rely on a particular capacity: an application should be designed so that a
reading process consumes data as soon as it is available, so that a writing process
does not remain blocked.
In Linux versions before 2.6.11, the capacity of a pipe was the same as the system
page size (e.g., 4096 bytes on i386). Since Linux 2.6.11, the pipe capacity is
65536 bytes.
So, in effect, you've created a deadlock where all the subprocesses have blocked on the pipe_out.send() call, and none of them can receive any data from the other processes, because you're sending all 237,154 bytes of data in one hit, which has filled the 65,536 byte buffer.
You might be tempted just to use the Unix domain socket version, but the only reason it works at present is that it has a larger buffer size than a pipe, and you'll find that solution will also fail if you increase the number of DATA_POINTS to 100,000.
The "quick n' dirty hack" solution is to break the data into smaller chunks for sending, but it's not good practice to rely on the buffers being a specific size.
A better solution would be to use non-blocking I/O on the pipe_out.send() call, although I'm not familiar enough with the multiprocessing module to determine the best way to achieve it using that module.
The pseudocode would be along the lines of...
while 1:
    if we have sent all data and received all data:
        break

    send as much data as we can without blocking
    receive as much data as we can without blocking

    if we didn't send or receive anything in this iteration:
        sleep for a bit so we don't waste CPU time
        continue
...or you can use the Python select module to avoid sleeping for longer than is necessary, but, again, integrating it with multiprocessing.Pipe might be tricky.
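For what it's worth, here is a rough, Unix-only sketch of that idea using select() on the Connection objects (which expose fileno()); it sidesteps the lack of a true non-blocking send by assuming each chunk is already small enough to fit in the buffer:

import select

def exchange(pipe_in, pipe_out, chunks, n_expected):
    # chunks: list of already-small pieces to send; n_expected: how many to receive
    to_send = list(chunks)
    received = []
    while to_send or len(received) < n_expected:
        if to_send:
            pipe_out.send(to_send.pop(0))      # assumed small enough not to block
        if len(received) < n_expected:
            # wait at most 100 ms for incoming data instead of sleeping blindly
            ready, _, _ = select.select([pipe_in], [], [], 0.1)
            if ready:
                received.append(pipe_in.recv())
    return received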
It's possible that the multiprocessing.Queue class does all this for you, but I've never used it before, so you'd have to do some experiments.
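For reference, an untested sketch of the example rewritten with multiprocessing.Queue; put() hands the data to a background feeder thread and returns promptly, so the workers should no longer block each other:

from multiprocessing import Process, Queue
from numpy.random import randn

PROC_NR = 4
DATA_POINTS = 10000

def arg_passer(q_in, q_out, list_):
    q_out.put(list_)   # handed to the queue's feeder thread, returns quickly
    buf = q_in.get()   # blocks only until the neighbour's data arrives

if __name__ == "__main__":
    queues = [Queue() for _ in range(PROC_NR)]
    data = [randn(DATA_POINTS) for _ in range(PROC_NR)]
    # same ring as before: worker i reads from queue i and writes to queue i+1
    processes = [Process(target=arg_passer,
                         args=(queues[i], queues[(i + 1) % PROC_NR], data[i]))
                 for i in range(PROC_NR)]
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()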
Related
Can a process #1 start a process #2 with Popen, and pass to it a reference to a dict D, so that process #2 can read its content?
# process1.py
import subprocess, time
D = {'foo': 'bar'}
subprocess.Popen(['python3', 'process2.py', str(id(D))])
time.sleep(3600)
# process2.py
import sys
ref = sys.argv[1] # can we turn this address into an (at least read-only) object?
# and read foo/bar?
If not, I know about sockets (and, more generally, messaging techniques involving networking), /dev/shm on Linux, and other IPC techniques for making two different processes communicate, but is there an even simpler solution that just shares an in-memory object?
I guess that, for security reasons, the processes would have to be started with a special option authorizing their memory to be shared with other processes.
After further research, it seems mmap is perfect for this (note that it shares raw bytes rather than an object; pickle can be used for serialization, though).
To map anonymous memory, -1 should be passed as the fileno along with the length.
tagname, if specified and not None, is a string giving a tag name for the mapping. Windows allows you to have many different mappings against the same file. If you specify the name of an existing tag, that tag is opened, otherwise a new tag of this name is created.
Thus, this is a very simple shared-memory inter process communication without using a socket:
Server:
import mmap, time

mm = mmap.mmap(-1, 20, tagname="foo")
while True:
    mm.seek(0)
    mm.write(str(time.time()).encode())
    mm.flush()
    time.sleep(1)
Client:
import mmap, time

mm = mmap.mmap(-1, 20, tagname="foo")
while True:
    mm.seek(0)
    buf = mm.read(128)
    print(buf)
    time.sleep(1)
Note: this is only for Windows:
Avoiding the use of the tag parameter will assist in keeping your code portable between Unix and Windows.
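On Python 3.8+, a similar pattern can be written portably with multiprocessing.shared_memory, which provides named shared-memory blocks on both Unix and Windows (the name and size below are only illustrative):

from multiprocessing import shared_memory
import time

# Writer: create a named 20-byte block and update it every second.
shm = shared_memory.SharedMemory(name="foo", create=True, size=20)
try:
    while True:
        stamp = str(time.time()).encode()
        shm.buf[:len(stamp)] = stamp
        time.sleep(1)
finally:
    shm.close()
    shm.unlink()

# Reader (in another process): attach by name and read the raw bytes.
#   shm = shared_memory.SharedMemory(name="foo")
#   print(bytes(shm.buf[:20]))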
Another example with video data: https://github.com/off99555/python-mmap-ipc
From the Python documentation
Warning Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
I'm trying to understand why this would deadlock. For some background, I am spawning N processes in parallel:
for c in commands:
    h = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
    handles.append(h)
Then printing the output of each process 1-by-1:
for handle in handles:
    while handle.poll() is None:
        try:
            line = handle.stdout.readline()
        except UnicodeDecodeError:
            line = "((INVALID UNICODE))\n"
        sys.stdout.write(line)

    if handle.returncode != 0:
        print(handle.stdout.read(), file=sys.stdout)
    if handle.returncode != 0:
        print(handle.stderr.read(), file=sys.stderr)
Occasionally this does in fact deadlock. Unfortunately, the documentation's recommendation to use communicate() is not going to work for me, because this process could take several minutes to run, and I don't want it to appear dead during this time. It should print output in real time.
I have several options, such as changing the bufsize argument, or polling in a different thread for each handle. But in order to decide the best way to fix this, I think I need to understand what the fundamental reason for the deadlock is in the first place. Something to do with buffer sizes, apparently, but what? I can hypothesize that maybe all of these processes share a single OS kernel object, and because I'm only draining the buffer of one of the processes, the others fill it up, in which case the second option above would probably fix it. But maybe that's not even the real problem.
Can anyone shed some light on this?
The bidirectional communication between the parent and child processes uses two unidirectional pipes. One for each direction. OK, stderr is the third one, but the idea is the same.
A pipe has two ends, one for writing, one for reading. The capacity of a pipe was 4K and is now 64K on modern Linux; one can expect similar values on other systems. This means the writer can write to a pipe without problems up to its limit, but then the pipe gets full and a write to it blocks until the reader reads some data from the other end.
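A quick way to see this limit for yourself (Unix only; the exact figure is platform-dependent) is to fill a pipe in non-blocking mode until the kernel refuses the write:

import fcntl, os

r, w = os.pipe()
fcntl.fcntl(w, fcntl.F_SETFL, os.O_NONBLOCK)
total = 0
try:
    while True:
        total += os.write(w, b"x" * 1024)  # returns the number of bytes accepted
except OSError:
    pass  # EAGAIN: the pipe is full and nobody is reading
print("pipe capacity:", total, "bytes")    # typically 65536 on modern Linux
os.close(r)
os.close(w)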
From the reader's point of view, the situation is obvious: a regular read blocks until data is available.
To summarize: a deadlock occurs when a process attempts to read from a pipe nobody is writing to, or when it writes data larger than the pipe's capacity to a pipe nobody is reading from.
Typically the two processes act as a client and server and use some kind of request/response style communication. Something like half-duplex: one side writes, the other reads, then they switch roles. This is practically the most complex setup we can handle with standard synchronous programming. And a deadlock can still occur when the client and server somehow get out of sync, for example because of an empty response, an unexpected error message, etc.
If there are several child processes, or the communication protocol is not so simple, or we just want a robust solution, we need the parent to operate on all the pipes. communicate() uses threads for this purpose. The other approach is asynchronous I/O: first check which pipe (or socket) is ready to do I/O, and only then read from or write to it. The old and deprecated asyncore library implemented that.
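Roughly, the threaded approach (what communicate() does internally, adapted here to stream lines as they arrive) might look like the sketch below; commands is the list from the question:

import subprocess, sys, threading

def drain(stream, prefix):
    # Only this dedicated thread blocks on readline(), so the pipes never back up.
    for line in iter(stream.readline, ''):
        sys.stdout.write(prefix + line)
    stream.close()

handles, threads = [], []
for c in commands:
    h = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                         universal_newlines=True)
    handles.append(h)
    for stream, tag in ((h.stdout, 'OUT: '), (h.stderr, 'ERR: ')):
        t = threading.Thread(target=drain, args=(stream, tag), daemon=True)
        t.start()
        threads.append(t)

for h in handles:
    h.wait()
for t in threads:
    t.join()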
On the low level, the select (or similar) system call checks which file handles from a given set are ready for I/O. But at that low level, we can do only one read or write before re-checking. That is the problem of this snippet:
while handle.poll() is None:
    try:
        line = handle.stdout.readline()
    except UnicodeDecodeError:
        line = "((INVALID UNICODE))\n"
The poll check tells us there is something to be read, but this does not mean we will be able to read repeatedly until a newline! We can only do one read and append the data to an input buffer. If the buffer contains a newline, we can extract the whole line and process it. If not, we need to wait for the next successful poll and read.
Writes behave similarly. We can write once, check the number of bytes written and remove that many bytes from the output buffer.
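A bare-bones sketch of that read/buffer/extract loop, using the selectors module on the pipes' file descriptors (Unix only, and assuming the pipes are in binary mode rather than universal_newlines):

import os, selectors

def stream_lines(files):
    """Yield (fileobj, line) pairs from several pipes as complete lines arrive.
    Usage sketch: for f, line in stream_lines([h.stdout for h in handles]): ..."""
    sel = selectors.DefaultSelector()
    buffers = {}
    for f in files:
        os.set_blocking(f.fileno(), False)
        sel.register(f, selectors.EVENT_READ)
        buffers[f] = b""
    while buffers:
        for key, _ in sel.select():
            f = key.fileobj
            chunk = os.read(f.fileno(), 4096)     # exactly one read per poll
            if not chunk:                         # EOF: flush and forget this pipe
                if buffers[f]:
                    yield f, buffers[f]
                sel.unregister(f)
                del buffers[f]
                continue
            buffers[f] += chunk
            while b"\n" in buffers[f]:            # extract only complete lines
                line, buffers[f] = buffers[f].split(b"\n", 1)
                yield f, line + b"\n"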
That implies that line buffering and all that higher-level stuff needs to be implemented on top of that. Fortunately, the successor of asyncore offers what we need: asyncio and its subprocess support.
I hope this explains the deadlock. The solution is the expected one: if you need to do several things at once, use either threading or asyncio.
UPDATE:
Below is a short asyncio test program. It reads inputs from several child processes and prints the data line by line.
But first, a cmd.py helper which prints a line in several small chunks to demonstrate the line buffering. Try it, e.g. with python3 cmd.py 10.
import sys
import time

def countdown(n):
    print('START', n)
    while n >= 0:
        print(n, end=' ', flush=True)
        time.sleep(0.1)
        n -= 1
    print('END')

if __name__ == '__main__':
    args = sys.argv[1:]
    if len(args) != 1:
        sys.exit(3)
    countdown(int(args[0]))
And the main program:
import asyncio

PROG = 'cmd.py'
NPROC = 12

async def run1(*execv):
    """Run a program, read input lines."""
    proc = await asyncio.create_subprocess_exec(
        *execv,
        stdin=asyncio.subprocess.DEVNULL,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.DEVNULL)
    # proc.stdout is a StreamReader object
    async for line in proc.stdout:
        print("Got line:", line.decode().strip())

async def manager(prog, nproc):
    """Spawn 'nproc' copies of python script 'prog'."""
    tasks = [asyncio.create_task(run1('python3', prog, str(i))) for i in range(nproc)]
    await asyncio.wait(tasks)

if __name__ == '__main__':
    asyncio.run(manager(PROG, NPROC))
The async for line ... is a feature of StreamReader, similar to the for line in file: idiom. It can be replaced with:
while True:
    line = await proc.stdout.readline()
    if not line:
        break
    print("Got line:", line.decode().strip())
To begin with, we're given the following piece of code:
from validate_email import validate_email
import time
import os

def verify_emails(email_path, good_filepath, bad_filepath):
    good_emails = open(good_filepath, 'w+')
    bad_emails = open(bad_filepath, 'w+')

    emails = set()
    with open(email_path) as f:
        for email in f:
            email = email.strip()
            if email in emails:
                continue
            emails.add(email)
            if validate_email(email, verify=True):
                good_emails.write(email + '\n')
            else:
                bad_emails.write(email + '\n')

if __name__ == "__main__":
    os.system('cls')
    verify_emails("emails.txt", "good_emails.txt", "bad_emails.txt")
I expect contacting SMTP servers to be by far the most expensive part of my program when emails.txt contains a large number of lines (>1k). Using some form of parallel or asynchronous I/O should speed this up a lot, since I can wait for multiple servers to respond at once instead of waiting sequentially.
As far as I have read:
Asynchronous I/O operates by queuing a request for I/O to the file
descriptor, tracked independently of the calling process. For a file
descriptor that supports asynchronous I/O (raw disk devices
typically), a process can call aio_read() (for instance) to request a
number of bytes be read from the file descriptor. The system call
returns immediately, whether or not the I/O has completed. Some time
later, the process then polls the operating system for the completion
of the I/O (that is, buffer is filled with data).
To be honest, I didn't quite understand how to implement async I/O in my program. Can anybody take a little time and explain the whole process to me?
EDIT, as PArakleta suggested:
from validate_email import validate_email
import time
import os
from multiprocessing import Pool
import itertools

def validate_map(e):
    return (validate_email(e.strip(), verify=True), e)

seen_emails = set()

def unique(e):
    if e in seen_emails:
        return False
    seen_emails.add(e)
    return True

def verify_emails(email_path, good_filepath, bad_filepath):
    good_emails = open(good_filepath, 'w+')
    bad_emails = open(bad_filepath, 'w+')

    with open(email_path, "r") as f:
        for result in Pool().imap_unordered(validate_map,
                                            itertools.ifilter(unique, f)):
            (good, email) = result
            if good:
                good_emails.write(email)
            else:
                bad_emails.write(email)

    good_emails.close()
    bad_emails.close()

if __name__ == "__main__":
    os.system('cls')
    verify_emails("emails.txt", "good_emails.txt", "bad_emails.txt")
You're asking the wrong question
Having looked at the validate_email package, your real problem is that you're not efficiently batching your results. You should only be doing the MX lookup once per domain, and then only connect to each MX server once, go through the handshake, and check all of the addresses for that server in a single batch. Thankfully the validate_email package does the MX result caching for you, but you still need to group the email addresses by server to batch the query to the server itself.
You need to edit the validate_email package to implement batching, and then probably give a thread to each domain using the actual threading library rather than multiprocessing.
It's always important to profile your program if it's slow and figure out where it is actually spending the time rather than trying to apply optimisation tricks blindly.
The requested solution
IO is already asynchronous if you are using buffered IO and your use case fits with the OS buffering. The only place you could potentially get some advantage is in read-ahead but Python already does this if you use the iterator access to a file (which you are doing). AsyncIO is an advantage to programs that are moving large amounts of data and have disabled the OS buffers to prevent copying the data twice.
You need to actually profile/benchmark your program to see if it has any room for improvement. If your disks aren't already throughput bound then there is a chance to improve performance by processing each email (address?) in parallel. The easiest way to check this is probably to see whether the core running your program is maxed out (i.e. you are CPU bound and not IO bound).
If you are CPU bound then you need to look at threading. Unfortunately Python threading doesn't work in parallel unless you have non-Python work to be done so instead you'll have to use multiprocessing (I'm assuming validate_email is a Python function).
How exactly you proceed depends on where the bottlenecks in your program are and how much of a speed-up you need to get to the point where you are IO bound (since you cannot actually go any faster than that, you can stop optimising when you hit that point).
The emails set object is hard to share because you'll need to lock around it so it's probably best that you keep that in one thread. Looking at the multiprocessing library the easiest mechanism to use is probably Process Pools.
Using this, you would need to wrap your file iterable in an itertools.ifilter which discards duplicates, feed this into a Pool.imap_unordered, and then iterate over the result and write into your two output files.
Something like:
with open(email_path) as f:
    for result in Pool().imap_unordered(validate_map,
                                        itertools.ifilter(unique, f)):
        (good, email) = result
        if good:
            good_emails.write(email)
        else:
            bad_emails.write(email)
The validate_map function should be something simple like:
def validate_map(e):
    return (validate_email(e.strip(), verify=True), e)
The unique function should be something like:
seen_emails = set()

def unique(e):
    if e in seen_emails:
        return False
    seen_emails.add(e)
    return True
ETA: I just realised that validate_email is a library which actually contacts SMTP servers. Given that it's not busy running Python code, you can use threading. The threading API is not as convenient as the multiprocessing library, but you can use multiprocessing.dummy to get a thread-based Pool.
If you are CPU bound then it's not really worth having more threads/processes than cores but since your bottleneck is network IO you can benefit from many more threads/processes. Since processes are expensive you want to swap to threads and then crank up the number running in parallel (although you should be polite not to DOS-attack the servers you are connecting to).
Consider from multiprocessing.dummy import Pool as ThreadPool and then call ThreadPool(processes=32).imap_unordered().
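Concretely, that might look something like the sketch below; it reuses validate_map and unique from above, and the 32 threads are an arbitrary choice:

from multiprocessing.dummy import Pool as ThreadPool
import itertools

def verify_emails(email_path, good_filepath, bad_filepath):
    good_emails = open(good_filepath, 'w+')
    bad_emails = open(bad_filepath, 'w+')
    pool = ThreadPool(processes=32)   # threads, not processes, since the work is network-bound
    with open(email_path) as f:
        for good, email in pool.imap_unordered(validate_map,
                                               itertools.ifilter(unique, f)):
            (good_emails if good else bad_emails).write(email)
    pool.close()
    pool.join()
    good_emails.close()
    bad_emails.close()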
I don't understand why Pipes are said to be unsafe when there are multiple senders and receivers.
How can the following code be turned into code using Queues, if this is the case? Queues don't throw EOFError when closed, so my processes can't stop. Should I endlessly send 'poison' messages to tell them to stop (that way, I'm sure all my processes receive at least one poison)?
I would like to keep the pipe p1 open until I decide otherwise (here it's when I have sent the 10 messages).
from multiprocessing import Pipe, Process
from random import randint, random
from time import sleep

def job(name, p_in, p_out):
    print(name + ' starting')
    nb_msg = 0
    try:
        while True:
            x = p_in.recv()
            print(name + ' receives ' + x)
            nb_msg = nb_msg + 1
            p_out.send(x)
            sleep(random())
    except EOFError:
        pass
    print(name + ' ending ... ' + str(nb_msg) + ' message(s)')

if __name__ == '__main__':
    p1_in, p1_out = Pipe()
    p2_in, p2_out = Pipe()

    proc = []
    for i in range(3):
        p = Process(target=job, args=(str(i), p1_out, p2_in))
        p.start()
        proc.append(p)

    for x in range(10):
        p1_in.send(chr(97+x))
    p1_in.close()

    for p in proc:
        p.join()

    p1_out.close()
    p2_in.close()

    try:
        while True:
            print(p2_out.recv())
    except EOFError:
        pass
    p2_out.close()
Essentially, the problem is that Pipe is a thin wrapper around a platform-defined pipe object. recv simply repeatedly receives a buffer of bytes until a complete Python object is obtained. If two threads or processes use recv on the same pipe, the reads may interleave, leaving each process with half a pickled object and thus corrupting the data. Queues do proper synchronization between processes, at the expense of more complexity.
As the multiprocessing documentation puts it:
Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.
You don't have to endlessly send poison pills; one per worker is all you need. Each worker picks up exactly one poison pill before exiting, so there's no danger that a worker will somehow miss the message.
You should also consider using multiprocessing.Pool instead of reimplementing the "worker process" model -- Pool has a lot of methods which make distributing work across multiple processes very easy.
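To make that concrete, here is an untested sketch of the example rewritten with Queues and exactly one pill (None here) per worker:

from multiprocessing import Process, Queue
from random import random
from time import sleep

def job(name, q_in, q_out):
    print(name + ' starting')
    nb_msg = 0
    while True:
        x = q_in.get()
        if x is None:                  # poison pill: each worker eats exactly one
            break
        print(name + ' receives ' + x)
        nb_msg = nb_msg + 1
        q_out.put(x)
        sleep(random())
    print(name + ' ending ... ' + str(nb_msg) + ' message(s)')

if __name__ == '__main__':
    q1, q2 = Queue(), Queue()
    procs = [Process(target=job, args=(str(i), q1, q2)) for i in range(3)]
    for p in procs:
        p.start()

    for x in range(10):
        q1.put(chr(97 + x))
    for _ in procs:
        q1.put(None)                   # one pill per worker is enough

    for _ in range(10):                # all 10 messages get forwarded to q2
        print(q2.get())
    for p in procs:
        p.join()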
I don't understand why Pipes are said to be unsafe when there are multiple senders and receivers.
Consider you put water into a pipe from source A and B simultaneously. On the other end of the pipe, it will be impossible for you to find out which part of the water came from A or B, right? :)
A pipe transports a data stream on the byte level. Without a communication protocol on top of it, it does not know what a message is and therefore can't ensure message integrity. Therefore, it is not only 'unsafe' to use pipes with multiple senders. It is a major design flaw and will most likely lead to communication problems.
Queues, however, are implemented on a higher level. They are designed for communicating messages (or even abstract objects). Queues are made for keeping a message/object self-contained. Multiple sources can put objects into a queue and multiple consumers can pull these objects while being 100 % sure that whatever got into the queue as a unit also comes out of it as a unit.
Edit after quite a while:
I should add that in the byte stream, all bytes are retrieved in the same order as sent (guaranteed). The issue with multiple senders is that the sending order (the order of input) might already be unclear or random, i.e. multiple streams might mix in an unpredictable fashion.
A common queue implementation guarantees that single messages are kept intact, even if there are multiple senders. Messages are retrieved in the order they were sent, too. With multiple competing senders and without further synchronization mechanisms, however, there is again no guarantee about the order of the input messages.
First of all, the overall problem I am solving is a bit more complicated than I am showing here, so please do not tell me 'use threads with blocking' as it would not solve my actual situation without a fair, FAIR bit of rewriting and refactoring.
I have several applications which are not mine to modify, which take data from stdin and poop it out on stdout after doing their magic. My task is to chain several of these programs. Problem is, sometimes they choke, and as such I need to track their progress which is outputted on STDERR.
pA = subprocess.Popen(CommandA, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# ... some more processes make up the chain, but that is irrelevant to the problem
pB = subprocess.Popen(CommandB, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=pA.stdout )
Now, reading directly through pA.stdout.readline() and pB.stdout.readline(), or the plain read() functions, is a blocking matter. Since different applications output in different paces and different formats, blocking is not an option. (And as I wrote above, threading is not an option unless at a last, last resort.) pA.communicate() is deadlock safe, but since I need the information live, that is not an option either.
Thus google brought me to this asynchronous subprocess snippet on ActiveState.
All good at first, until I implement it. Comparing with the cmd.exe output of pA.exe | pB.exe, ignoring the fact that both output to the same window and make a mess, I see near-instantaneous updates. However, when I implement the same thing using the above snippet and the read_some() function declared there, it takes over 10 seconds to report updates for a single pipe. But when it does, it has updates leading all the way up to 40% progress, for example.
Thus I do some more research, and see numerous subjects concerning PeekNamedPipe, anonymous handles, and returning 0 bytes available even though there is information available in the pipe. As the subject has proven quite a bit beyond my expertise to fix or code around, I come to Stack Overflow to look for guidance. :)
My platform is W7 64-bit with Python 2.6, the applications are 32-bit in case it matters, and compatibility with Unix is not a concern. I can even deal with a full ctypes or pywin32 solution that subverts subprocess entirely if it is the only solution, as long as I can read from every stderr pipe asynchronously with immediate performance and no deadlocks. :)
How bad is it to have to use threads? I encountered much the same problem and eventually decided to use threads to gather up all the data on a sub-process's stdout and stderr and put it onto a thread-safe queue which the main thread can read in a blocking fashion, without having to worry about the threading going on behind the scenes.
It's not clear what trouble you anticipate with a solution based on threads and blocking. Are you worried about having to make the rest of your code thread-safe? That shouldn't be an issue since the IO thread wouldn't need to interact with any of the rest of your code or data. If you have very restrictive memory requirements or your pipeline is particularly long then perhaps you may feel unhappy about spawning so many threads. I don't know enough about your situation so I couldn't say if this is likely to be a problem, but it seems to me that since you're already spawning off extra processes a few threads to interact with them should not be a terrible burden. In my situation I have not found these IO threads to be particularly problematic.
My thread function looked something like this:
def simple_io_thread(pipe, queue, tag, stop_event):
    """
    Read line-by-line from pipe, writing (tag, line) to the
    queue. Also checks for a stop_event to give up before
    the end of the stream.
    """
    while True:
        line = pipe.readline()

        while True:
            try:
                # Post to the queue with a large timeout in case the
                # queue is full.
                queue.put((tag, line), block=True, timeout=60)
                break
            except Queue.Full:
                if stop_event.isSet():
                    break
                continue

        if stop_event.isSet() or line == "":
            break
    pipe.close()
When I start up the subprocess I do this:
outputqueue = Queue.Queue(50)

stop_event = threading.Event()

process = subprocess.Popen(
    command,
    cwd=workingdir,
    env=env,
    shell=useshell,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)

stderr_thread = threading.Thread(
    target=simple_io_thread,
    args=(process.stderr, outputqueue, "STDERR", stop_event)
)
stdout_thread = threading.Thread(
    target=simple_io_thread,
    args=(process.stdout, outputqueue, "STDOUT", stop_event)
)

stderr_thread.daemon = True
stdout_thread.daemon = True

stderr_thread.start()
stdout_thread.start()
Then when I want to read I can just block on outputqueue - each item read from it contains a string identifying which pipe it came from and a line of text from that pipe. Very little code runs in a separate thread, and it only communicates with the main thread via a thread-safe queue (plus an event in case I need to give up early). Perhaps this approach would be useful and allow you to solve the problem with threads and blocking but without having to rewrite lots of code?
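The consumer side in the main thread then looks roughly like this (illustrative only, reusing the names from the snippets above):

while stderr_thread.is_alive() or stdout_thread.is_alive() or not outputqueue.empty():
    try:
        tag, line = outputqueue.get(block=True, timeout=1)
    except Queue.Empty:
        continue
    if line:
        print "%s %s" % (tag, line.rstrip())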
(My solution is made more complicated because I sometimes wish to terminate the subprocesses early, and want to be sure that the threads will all finish. If that's not an issue you can get rid of all the stop_event stuff and it becomes pretty succinct.)
I assume that the process pipeline will not deadlock if it only uses stdin and stdout; and the problem you're trying to solve is how to make it not deadlock if they write to stderr (and have to deal with stderr possibly getting backed up).
If you're letting multiple processes write to stderr, you have to watch out for their output being intermingled. I'm guessing you have that sorted somehow; just putting it out there to be sure.
Be aware of the -u flag to python; it is helpful when testing to see if OS buffering is screwing you up.
If you want to emulate select() on file handles in win32, your only choice is to use PeekNamedPipe() and friends. I have a snippet of code that reads line-oriented output from multiple processes at once, which you may even be able to use directly -- try passing the list of proc.stderr handles to it and go.
import os
import time
import msvcrt        # standard library on Windows
import win32pipe     # from the pywin32 package
import pywintypes    # from the pywin32 package

class NoLineError(Exception): pass
class NoMoreLineError(Exception): pass

class LineReader(object):
    """Helper class for multi_readlines."""
    def __init__(self, f):
        self.fd = f.fileno()
        self.osf = msvcrt.get_osfhandle(self.fd)
        self.buf = ''

    def getline(self):
        """Returns a line of text, or raises NoLineError, or NoMoreLineError."""
        try:
            _, avail, _ = win32pipe.PeekNamedPipe(self.osf, 0)
            bClosed = False
        except pywintypes.error:
            avail = 0
            bClosed = True

        if avail:
            self.buf += os.read(self.fd, avail)

        idx = self.buf.find('\n')
        if idx >= 0:
            ret, self.buf = self.buf[:idx+1], self.buf[idx+1:]
            return ret
        elif bClosed:
            if self.buf:
                ret, self.buf = self.buf, None
                return ret
            else:
                raise NoMoreLineError
        else:
            raise NoLineError


def multi_readlines(fs, timeout=0):
    """Read lines from |fs|, a list of file objects.
    The lines come out in arbitrary order, depending on which files
    have output available first."""
    if type(fs) not in (list, tuple):
        raise Exception("argument must be a list.")
    objs = [LineReader(f) for f in fs]
    for i, obj in enumerate(objs):
        obj._index = i

    while objs:
        yielded = 0
        for i, obj in enumerate(objs):
            try:
                yield (obj._index, obj.getline())
                yielded += 1
            except NoLineError:
                # time.sleep(timeout)
                pass
            except NoMoreLineError:
                del objs[i]
                break  # Because we mutated the array

        if not yielded:
            time.sleep(timeout)
I have never seen the "Peek returns 0 bytes even though data is available" issue myself. If this happens to others, I bet their libc is buffering their stdout/stderr before sending the data to the OS; there is nothing you can do about that from outside. You have to make the app use unbuffered output somehow (-u to python; win32/libc calls to modify the stderr file handle, ...)
The fact that you are seeing nothing, then a ton of updates, makes me think that your problem is buffering on the source end. win32 libc may buffer differently if it writes to a pipe rather than a console. Again, the best you can do from outside those programs is to aggressively drain their output.
What about using Twisted's FD's? http://twistedmatrix.com/documents/8.1.0/api/twisted.internet.fdesc.html
It's not asynchronous but it is non-blocking. For asynchronous stuff, can you port to using Twisted?