Why are Python multiprocessing Pipe unsafe?

Why are Python multiprocessing Pipe unsafe? - python

I don't understand why Pipes are said unsafe when there are multiple senders and receivers.
How the following code can be turned into code using Queues if this is the case ? Queues don't throw EOFError when closed, so my processes can't stop. Should I send endlessly 'Poison' messages to tell them to stop (this way, i'm sure all my processes receive at least one poison) ?
I would like to keep the pipe p1 open until I decide otherwise (here it's when I have sent the 10 messages).
from multiprocessing import Pipe, Process
from random import randint, random
from time import sleep
def job(name, p_in, p_out):
print(name + ' starting')
nb_msg = 0
try:
while True:
x = p_in.recv()
print(name + ' receives ' + x)
nb_msg = nb_msg + 1
p_out.send(x)
sleep(random())
except EOFError:
pass
print(name + ' ending ... ' + str(nb_msg) + ' message(s)')
if __name__ == '__main__':
p1_in, p1_out = Pipe()
p2_in, p2_out = Pipe()
proc = []
for i in range(3):
p = Process(target=job, args=(str(i), p1_out, p2_in))
p.start()
proc.append(p)
for x in range(10):
p1_in.send(chr(97+x))
p1_in.close()
for p in proc:
p.join()
p1_out.close()
p2_in.close()
try:
while True:
print(p2_out.recv())
except EOFError:
pass
p2_out.close()

Essentially, the problem is that Pipe is a thin wrapper around a platform-defined pipe object. recv simply repeatedly receives a buffer of bytes until a complete Python object is obtained. If two threads or processes use recv on the same pipe, the reads may interleave, leaving each process with half a pickled object and thus corrupting the data. Queues do proper synchronization between processes, at the expense of more complexity.
As the multiprocessing documentation puts it:
Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.
You don't have to endlessly send poison pills; one per worker is all you need. Each worker picks up exactly one poison pill before exiting, so there's no danger that a worker will somehow miss the message.
You should also consider using multiprocessing.Pool instead of reimplementing the "worker process" model -- Pool has a lot of methods which make distributing work across multiple threads very easy.

I don't understand why Pipes are said unsafe when there are multiple senders and receivers.
Consider you put water into a pipe from source A and B simultaneously. On the other end of the pipe, it will be impossible for you to find out which part of the water came from A or B, right? :)
A pipe transports a data stream on the byte level. Without a communication protocol on top of it, it does not know what a message is and therefore can't ensure message integrity. Therefore, it is not only 'unsafe' to use pipes with multiple senders. It is a major design flaw and will most likely lead to communication problems.
Queues, however, are implemented on a higher level. They are designed for communicating messages (or even abstract objects). Queues are made for keeping a message/object self-contained. Multiple sources can put objects into a queue and multiple consumers can pull these objects while being 100 % sure that whatever got into the queue as a unit also comes out of it as a unit.
Edit after quite a while:
I should add that in the byte stream, all bytes are retrieved in the same order as sent (guaranteed). The issue with multiple senders is that the sending order (the order of input) might already be unclear or random, i.e. multiple streams might mix in an unpredictable fashion.
A common queue implementation guarantees that single messages are kept intact, even if there are multiple senders. Messages are retrieved in the order as sent, too. With multiple competing senders and without further synchronization mechanisms there is, however, again no guarantee about the order of input messages.

Related

Why can `popen.stdout.readline` deadlock and what to do about it?

From the Python documentation
Warning Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
I'm trying to understand why this would deadlock. For some background, I am spawning N processes in parallel:
for c in commands:
h = subprocess.Popen(c, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
handles.append(h)
Then printing the output of each process 1-by-1:
for handle in handles:
while handle.poll() is None:
try:
line = handle.stdout.readline()
except UnicodeDecodeError:
line = "((INVALID UNICODE))\n"
sys.stdout.write(line)
if handle.returncode != 0:
print(handle.stdout.read(), file=sys.stdout)
if handle.returncode != 0:
print(handle.stderr.read(), file=sys.stderr)
Occasionally this does in fact deadlock. Unfortunately, the documentation's recommendation to use communicate() is not going to work for me, because this process could take several minutes to run, and I don't want it to appear dead during this time. It should print output in real time.
I have several options, such as changing the bufsize argument, polling in a different thread for each handle, etc. But in order to decide what the best way to fix this, I think I need to understand what the fundamental reason for the deadlock is in the first place. Something to do with buffer sizes, apparently, but what? I can hypothesize that maybe all of these processes are sharing a single OS kernel object, and because I'm only draining the buffer of one of the processes, the other ones fill it up, in which case option 2 above would probably fix it. But maybe that's not even the real problem.
Can anyone shed some light on this?

The bidirectional communication between the parent and child processes uses two unidirectional pipes. One for each direction. OK, stderr is the third one, but the idea is the same.
A pipe has two ends, one for writing, one for reading. The capacity of a pipe was 4K and is now 64K on modern Linux. One can expect similar values on other systems. This means, the writer can write to a pipe without problems up to its limit, but then the pipe gets full and a write to it blocks until the reader reads some data from the other end.
From the reader's view is the situation obvious. A regular read blocks until data is available.
To summarize: a deadlock occurs when a process attempts to read from a pipe where nobody is writing to or when it writes data larger that the pipe's capacity to a pipe nobody is reading from.
Typically the two processes act as a client & server and utilize some kind of request/response style communication. Something like half-duplex. One side is writing and the other one is reading. Then they switch the roles. This is practically the most complex setup we can handle with standard synchronous programming. And a deadlock can stil occur when the client and server get somehow out of sync. This can be caused by an empty response, unexpected error message, etc.
If there are several child processes or when the communication protocol is not so simple or we just want a robust solution, we need the parent to operate on all the pipes. communicate() uses threads for this purpose. The other approach is asynchronous I/O: first check, which one is ready to do I/O and only then read or write from that pipe (or socket). The old and deprecated asyncore library implemented that.
On the low level, the select (or similar) system call checks which file handles from a given set are ready for I/O. But at that low level, we can do only one read or write before re-checking. That is the problem of this snippet:
while handle.poll() is None:
try:
line = handle.stdout.readline()
except UnicodeDecodeError:
line = "((INVALID UNICODE))\n"
The poll check tells us there is something to be read, but this does not mean we will be able to read repeatedly until a newline! We can only do one read and append the data to an input buffer. If there is a newline, we can extract the whole line and process it. If not, we need to wait to next succesfull poll and read.
Writes behave similarly. We can write once, check the number of bytes written and remove that many bytes from the output buffer.
That implies that line buffering and all that higher level stuff needs to be implemented on top of that. Fortunately, the successor of asyncore offers what we need: asyncio > subprocesses.
I hope I could explain the deadlock. The solution could be expected. If you need to do several things, use either threading or asyncio.
UPDATE:
Below is a short asyncio test program. It reads inputs from several child processes and prints the data line by line.
But first a cmd.py helper which prints a line in several small chunks to demonstrate the line buffering. Try the usage e.g. with python3 cmd.py 10.
import sys
import time
def countdown(n):
print('START', n)
while n >= 0:
print(n, end=' ', flush=True)
time.sleep(0.1)
n -= 1
print('END')
if __name__ == '__main__':
args = sys.argv[1:]
if len(args) != 1:
sys.exit(3)
countdown(int(args[0]))
And the main program:
import asyncio
PROG = 'cmd.py'
NPROC = 12
async def run1(*execv):
"""Run a program, read input lines."""
proc = await asyncio.create_subprocess_exec(
*execv,
stdin=asyncio.subprocess.DEVNULL,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.DEVNULL)
# proc.stdout is a StreamReader object
async for line in proc.stdout:
print("Got line:", line.decode().strip())
async def manager(prog, nproc):
"""Spawn 'nproc' copies of python script 'prog'."""
tasks = [asyncio.create_task(run1('python3', prog, str(i))) for i in range(nproc)]
await asyncio.wait(tasks)
if __name__ == '__main__':
asyncio.run(manager(PROG, NPROC))
The async for line ... is a feature of StreamReader similar to the for line in file: idiom. It can be replaced it with:
while True:
line = await proc.stdout.readline()
if not line:
break
print("Got line:", line.decode().strip())

Troubleshooting data inconsistencies with Python multiprocessing/threading

TL;DR: Getting different results after running code with threading and multiprocessing and single threaded. Need guidance on troubleshooting.
Hello, I apologize in advance if this may be a bit too generic, but I need a bit of help troubleshooting an issue and I am not sure how best to proceed.
Here is the story; I have a bunch of data indexed into a Solr Collection (~250m items), all items in that collection have a sessionid. Some items can share the same session id. I am combing through the collection to extract all items that have the same session, massage the data a bit and spit out another JSON file for indexing later.
The code has two main functions:
proc_day - accepts a day and processes all the sessions for that day
and
proc_session - does everything that needs to happen for a single session.
Multiprocessing is implemented on proc_day, so each day would be processed by a separate process, the proc_session function can be ran with threads. Below is the code I am using for threading/multiprocessing below. It accepts a function, a list of arguments and number of threads / multiprocesses. It will then create a queue based on input args, then create processes/threads and let them go through it. I am not posting the actual code, since it generally runs fine single threaded without any issues, but can post it if needed.
autoprocs.py
import sys
import logging
from multiprocessing import Process, Queue,JoinableQueue
import time
import multiprocessing
import os
def proc_proc(func,data,threads,delay=10):
if threads < 0:
return
q = JoinableQueue()
procs = []
for i in range(threads):
thread = Process(target=proc_exec,args=(func,q))
thread.daemon = True;
thread.start()
procs.append(thread)
for item in data:
q.put(item)
logging.debug(str(os.getpid()) + ' *** Processes started and data loaded into queue waiting')
s = q.qsize()
while s > 0:
logging.info(str(os.getpid()) + " - Proc Queue Size is:" + str(s))
s = q.qsize()
time.sleep(delay)
for p in procs:
logging.debug(str(os.getpid()) + " - Joining Process {}".format(p))
p.join(1)
logging.debug(str(os.getpid()) + ' - *** Main Proc waiting')
q.join()
logging.debug(str(os.getpid()) + ' - *** Done')
def proc_exec(func,q):
p = multiprocessing.current_process()
logging.debug(str(os.getpid()) + ' - Starting:{},{}'.format(p.name, p.pid))
while True:
d = q.get()
try:
logging.debug(str(os.getpid()) + " - Starting to Process {}".format(d))
func(d)
sys.stdout.flush()
logging.debug(str(os.getpid()) + " - Marking Task as Done")
q.task_done()
except:
logging.error(str(os.getpid()) + " - Exception in subprocess execution")
logging.error(sys.exc_info()[0])
logging.debug(str(os.getpid()) + 'Ending:{},{}'.format(p.name, p.pid))
autothreads.py:
import threading
import logging
import time
from queue import Queue
def thread_proc(func,data,threads):
if threads < 0:
return "Thead Count not specified"
q = Queue()
for i in range(threads):
thread = threading.Thread(target=thread_exec,args=(func,q))
thread.daemon = True
thread.start()
for item in data:
q.put(item)
logging.debug('*** Main thread waiting')
s = q.qsize()
while s > 0:
logging.debug("Queue Size is:" + str(s))
s = q.qsize()
time.sleep(1)
logging.debug('*** Main thread waiting')
q.join()
logging.debug('*** Done')
def thread_exec(func,q):
while True:
d = q.get()
#logging.debug("Working...")
try:
func(d)
except:
pass
q.task_done()
I am running into problems with validating data after python runs under different multiprocessing/threading configs. There is a lot of data, so I really need to get multiprocessing working. Here are the results of my test yesterday.
Only with multiprocessing - 10 procs:
Days Processed 30
Sessions Found 3,507,475
Sessions Processed 3,514,496
Files 162,140
Data Output: 1.9G
multiprocessing and multithreading - 10 procs 10 threads
Days Processed 30
Sessions Found 3,356,362
Sessions Processed 3,272,402
Files 424,005
Data Output: 2.2GB
just threading - 10 threads
Days Processed 31
Sessions Found 3,595,263
Sessions Processed 3,595,263
Files 733,664
Data Output: 3.3GB
Single process/ no threading
Days Processed 31
Sessions Found 3,595,263
Sessions Processed 3,595,263
Files 162,190
Data Output: 1.9GB
These counts were gathered by grepping and counties entries in the log files (1 per main process). The first thing that jumps out is that days processed doesn't match. However, I manually checked the log files and it looks like a log entry was missing, there are follow on log entries to indicate that the day was actually processed. I have no idea why it was omitted.
I really don't want to write more code to validate this code, just seems like a terrible waste of time, is there any alternative?

I gave some general hints in the comments above. I think there are multiple problems with your approach, at very different levels of abstraction. You are also not showing all code of relevance.
The issue might very well be
in the method you are using to read from solr or in preparing read data before feeding it to your workers.
in the architecture you have come up with for distributing the work among multiple processes.
in your logging infrastructure (as you have pointed out yourself).
in your analysis approach.
You have to go through all of these points, and as of the complexity of the issue surely nobody here will be able to identify the exact issues for you.
Regarding points (3) and (4):
If you are not sure about the completeness of your log files, you should perform the analysis based on the payload output of your processing engine. What I am trying to say: the log files probably are just a side product of your data processing. The primary product is the thing you should analyze. Of course it is also important to get your logs right. But these two problems should be treated independently.
My contribution regarding point (2) in the list above:
What is especially suspicious about your multiprocessing-based solution is your way to wait for the workers to finish. You seem not to be sure by which method you should wait for your workers, so you apply three different methods:
First, you are monitoring the size of the queue in a while loop and wait for it to become 0. This is a non-canonical approach, which might actually work.
Secondly, you join() your processes in a weird way:
for p in procs:
logging.debug(str(os.getpid()) + " - Joining Process {}".format(p))
p.join(1)
Why are you defining a timeout of one second here and do not respond to whether the process actually terminated within that time frame? You should either really join a process, i.e. wait until it has terminated or you specify a timeout and, if that timeout expires before the process finishes, treat that situation specially. Your code does not distinguish these situations, so p.join(1) is like writing time.sleep(1) instead.
Thirdly, you join the queue.
So, after making sure that q.qsize() returns 0 and after waiting for another second, do you really think that joining the queue is important? Does it make any difference? One of these approaches should be enough, and you need to think about which of these criteria is most important to your problem. That is, one of these conditions should deterministically implicate the other two.
All this looks like a quick & dirty hack of a multiprocessing solution, whereas you yourself are not really sure how that solution should behave. One of the most important insights I have obtained while working on concurrency architectures: You, the architect, must be 100 % aware of how the communication and control flow works in your system. Not properly monitoring and controlling the state of your worker processes may very well be the source of the issues you are observing.

I figured it out, I followed Jan-Philip's advice and started examining the output data of the multiprocess/multithreaded process. Turned out that an object that does all these things with the data from Solr was shared among threads. I did not have any locking mechanisms, so in a case it had mixed data from multiple sessions which caused inconsistent output. I validated this by instantiating a new object for every thread and the counts matched up. It is a bit slower, but still workable.
Thanks

Synchronous/Asynchronous behaviour of python Pipes

In my application I'm using pipes from the multiprocessing module to communicate between python processes.
Lately I've observed a weird behaviour depending on the size of data I'm sending through them.
According to the python documentation these pipes are based on the connections and should behave in an asynchronous manner, yet sometimes they get stuck at sending. If I enable full duplex in each connection everything works fine, even though I'm not using the connections for both sending and listening.
Can anyone explain this behaviour?
100 floats, full duplex disable
The code works, utilizing the asynchronousness.
100 floats, full duplex enable
The example works fine as expected.
10000 floats, full duplex disable
The execution is blocked forever even though it was fine with the smaller data.
10000 floats, full duplex enable
Fine again.
Code (it's not my production code, it just illustrates what I mean):
from collections import deque
from multiprocessing import Process, Pipe
from numpy.random import randn
from os import getpid
PROC_NR = 4
DATA_POINTS = 100
# DATA_POINTS = 10000
def arg_passer(pipe_in, pipe_out, list_):
my_pid = getpid()
print "{}: Before send".format(my_pid)
pipe_out.send(list_)
print "{}: After send, before recv".format(my_pid)
buf = pipe_in.recv()
print "{}: After recv".format(my_pid)
if __name__ == "__main__":
pipes = [Pipe(False) for _ in range(PROC_NR)]
# pipes = [Pipe(True) for _ in range(PROC_NR)]
pipes_in = deque(p[0] for p in pipes)
pipes_out = deque(p[1] for p in pipes)
pipes_in.rotate(1)
pipes_out.rotate(-1)
data = [randn(DATA_POINTS) for foo in xrange(PROC_NR)]
processes = [Process(target=arg_passer, args=(pipes_in[foo], pipes_out[foo], data[foo]))
for foo in xrange(PROC_NR)]
for proc in processes:
proc.start()
for proc in processes:
proc.join()

First of all, it's worth noting the implementation of the multiprocessing.Pipe class...
def Pipe(duplex=True):
'''
Returns pair of connection objects at either end of a pipe
'''
if duplex:
s1, s2 = socket.socketpair()
s1.setblocking(True)
s2.setblocking(True)
c1 = _multiprocessing.Connection(os.dup(s1.fileno()))
c2 = _multiprocessing.Connection(os.dup(s2.fileno()))
s1.close()
s2.close()
else:
fd1, fd2 = os.pipe()
c1 = _multiprocessing.Connection(fd1, writable=False)
c2 = _multiprocessing.Connection(fd2, readable=False)
return c1, c2
The difference being that half-duplex 'Pipes' use an anonymous pipe, but full-duplex 'Pipes' actually use a Unix domain socket, since anonymous pipes are unidirectional by nature.
I'm not sure what you mean by the term "asynchronous" in this context. If you mean "non-blocking I/O" then it's worth noting that both implementations use blocking I/O by default.
Secondly, it's worth noting the pickled size of the data you're trying to send...
>>> from numpy.random import randn
>>> from cPickle import dumps
>>> len(dumps(randn(100)))
2479
>>> len(dumps(randn(10000)))
237154
Thirdly, from the pipe(7) manpage...
Pipe Capacity
A pipe has a limited capacity. If the pipe is full, then a write(2) will block
or fail, depending on whether the O_NONBLOCK flag is set (see below). Different
implementations have different limits for the pipe capacity. Applications should
not rely on a particular capacity: an application should be designed so that a
reading process consumes data as soon as it is available, so that a writing process
does not remain blocked.
In Linux versions before 2.6.11, the capacity of a pipe was the same as the system
page size (e.g., 4096 bytes on i386). Since Linux 2.6.11, the pipe capacity is
65536 bytes.
So, in effect, you've created a deadlock where all the subprocesses have blocked on the pipe_out.send() call, and none of them can receive any data from the other processes, because you're sending all 237,154 bytes of data in one hit, which has filled the 65,536 byte buffer.
You might be tempted just to use the Unix domain socket version, but the only reason it works at present is that it has a larger buffer size than a pipe, and you'll find that solution will also fail if you increase the number of DATA_POINTS to 100,000.
The "quick n' dirty hack" solution is to break the data into smaller chunks for sending, but it's not good practice to rely on the buffers being a specific size.
A better solution would be to use non-blocking I/O on the pipe_out.send() call, although I'm not familiar enough with the multiprocessing module to determine the best way to achieve it using that module.
The pseudocode would be along the lines of...
while 1:
if we have sent all data and received all data:
break
send as much data as we can without blocking
receive as much data as we can without blocking
if we didn't send or receive anything in this iteration:
sleep for a bit so we don't waste CPU time
continue
...or you can use the Python select module to avoid sleeping for longer than is necessary, but, again, integrating it with multiprocessing.Pipe might be tricky.
It's possible that the multiprocessing.Queue class does all this for you, but I've never used it before, so you'd have to do some experiments.

How to control a simulation in Python

I have a fairly high-level question about Python and running interactive simulations. Here is the setup:
I am porting to Python some simulation software I originally wrote in Smalltalk (VW). It is a kind of Recurrent Neural Network controlled interactively from a graphical interface. The interface allows the manipulation of most the network's parameters in real time, in addition to controlling the simulation itself (starting it, stopping it, etc). In the original Smalltalk implementation, I had two processes running with different priority levels:
The interface itself with a higher priority
The neural network running forever at a lower priority
Communication between the two processes was trivial, because all Smalltalk processes share the same address space (the Object memory).
I am now starting to realize that replicating a similar setup in Python is not so trivial. The threading module does not allow its threads to share address space, as far as I can tell. The multiprocessing module does, but in a rather complex way (with Queues, etc).
So I am starting to think that my Smalltalk perspective is leading me astray and I am approaching a relatively simple problem from the wrong angle altogether. Problem is, I don't know what is the right angle! How would you recommend I approach the problem? I am fairly new to Python (obviously) and more than willing to learn. But I would greatly appreciate suggestions on how to frame the issues and which multiprocessing modules (if any!) I should delve into.
Thanks,
Stefano

I'll offer my take on how to approach this problem. Within the multiprocessing module the Pipe and Queue IPC mechanisms are really the best way to go; in spite of the added complexity you allude to, it's worth learning how they work. The Pipe is fairly straightforward so I'll use that to illustrate.
Here's the code, followed by some explanation:
import sys
import os
import random
import time
import multiprocessing
class computing_task(multiprocessing.Process):
def __init__(self, name, pipe):
# call this before anything else
multiprocessing.Process.__init__(self)
# then any other initialization
self.name = name
self.ipcPipe = pipe
self.number1 = 0.0
self.number2 = 0.0
sys.stdout.write('[%s] created: %f\n' % (self.name, self.number1))
# Do some kind of computation
def someComputation(self):
try:
count = 0
while True:
count += 1
self.number1 = (random.uniform(0.0, 10.0)) * self.number2
sys.stdout.write('[%s]\t%d \t%g \t%g\n' % (self.name, count, self.number1, self.number2))
# Send result via pipe to parent process.
# Can send lists, whatever - anything picklable.
self.ipcPipe.send([self.name, self.number1])
# Get new data from parent process
newData = self.ipcPipe.recv()
self.number2 = newData[0]
time.sleep(0.5)
except KeyboardInterrupt:
return
def run(self):
sys.stdout.write('[%s] started ... process id: %s\n'
% (self.name, os.getpid()))
self.someComputation()
# When done, send final update to parent process and close pipe.
self.ipcPipe.send([self.name, self.number1])
self.ipcPipe.close()
sys.stdout.write('[%s] task completed: %f\n' % (self.name, self.number1))
def main():
# Create pipe
parent_conn, child_conn = multiprocessing.Pipe()
# Instantiate an object which contains the computation
# (give "child process pipe" to the object so it can phone home :) )
computeTask = computing_task('foo', child_conn)
# Start process
computeTask.start()
# Continually send and receive updates to/from the child process
try:
while True:
# receive data from child process
result = parent_conn.recv()
print "recv: ", result
# send new data to child process
parent_conn.send([random.uniform(0.0, 1.0)])
except KeyboardInterrupt:
computeTask.join()
parent_conn.close()
print "joined, exiting"
if (__name__ == "__main__"):
main()
I have encapsulated the computing to be done inside a class derived from Process. This isn't strictly necessary but makes the code easier to understand and extend, in most cases. From the main process you can start your computing task with the start() method on an instance of this class (this will start a separate process to run the contents of your object).
As you can see, we use Pipe in the parent process to create two connectors ("ends" of the pipe) and give one to the child while the the parent holds the other. Each of these connectors is a two-way communication mechanism between the processes holding the ends, with send() and recv() methods for doing what their names imply. In this example I've used the pipe to transmit lists of numbers and text, but in general you can send lists, tuples, objects, or anything that's picklable (i.e. serializable with Python's pickle facility). So you've got some latitude for what you send back and forth between processes.
So you set up your connectors, invoke start() on your new process, and you're off and computing. Here we're just multiplying two numbers, but you can see it's being done "interactively" in the subprocess with updates sent from the parent. Likewise the parent process is informed regularly of new results from the computing process.
Note that the connector's recv() method is blocking, i.e. if the other end hasn't sent anything yet, recv() will wait until something is there to read, and prevent anything else from happening in the meantime. So just be aware of that.
Hope this helps. Again, this is a barebones example and in real life you'll want to do more error handling, possibly use poll() on the connection objects, and so forth, but hopefully this conveys the major ideas and gets you started.

Asynchronous subprocess on Windows

First of all, the overall problem I am solving is a bit more complicated than I am showing here, so please do not tell me 'use threads with blocking' as it would not solve my actual situation without a fair, FAIR bit of rewriting and refactoring.
I have several applications which are not mine to modify, which take data from stdin and poop it out on stdout after doing their magic. My task is to chain several of these programs. Problem is, sometimes they choke, and as such I need to track their progress which is outputted on STDERR.
pA = subprocess.Popen(CommandA, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# ... some more processes make up the chain, but that is irrelevant to the problem
pB = subprocess.Popen(CommandB, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=pA.stdout )
Now, reading directly through pA.stdout.readline() and pB.stdout.readline(), or the plain read() functions, is a blocking matter. Since different applications output in different paces and different formats, blocking is not an option. (And as I wrote above, threading is not an option unless at a last, last resort.) pA.communicate() is deadlock safe, but since I need the information live, that is not an option either.
Thus google brought me to this asynchronous subprocess snippet on ActiveState.
All good at first, until I implement it. Comparing the cmd.exe output of pA.exe | pB.exe, ignoring the fact both output to the same window making for a mess, I see very instantaneous updates. However, I implement the same thing using the above snippet and the read_some() function declared there, and it takes over 10 seconds to notify updates of a single pipe. But when it does, it has updates leading all the way upto 40% progress, for example.
Thus I do some more research, and see numerous subjects concerning PeekNamedPipe, anonymous handles, and returning 0 bytes available even though there is information available in the pipe. As the subject has proven quite a bit beyond my expertise to fix or code around, I come to Stack Overflow to look for guidance. :)
My platform is W7 64-bit with Python 2.6, the applications are 32-bit in case it matters, and compatibility with Unix is not a concern. I can even deal with a full ctypes or pywin32 solution that subverts subprocess entirely if it is the only solution, as long as I can read from every stderr pipe asynchronously with immediate performance and no deadlocks. :)

How bad is it to have to use threads? I encountered much the same problem and eventually decided to use threads to gather up all the data on a sub-process's stdout and stderr and put it onto a thread-safe queue which which the main thread can read in a blocking fashion, without having to worry about the threading going on behind the scenes.
It's not clear what trouble you anticipate with a solution based on threads and blocking. Are you worried about having to make the rest of your code thread-safe? That shouldn't be an issue since the IO thread wouldn't need to interact with any of the rest of your code or data. If you have very restrictive memory requirements or your pipeline is particularly long then perhaps you may feel unhappy about spawning so many threads. I don't know enough about your situation so I couldn't say if this is likely to be a problem, but it seems to me that since you're already spawning off extra processes a few threads to interact with them should not be a terrible burden. In my situation I have not found these IO threads to be particularly problematic.
My thread function looked something like this:
def simple_io_thread(pipe, queue, tag, stop_event):
"""
Read line-by-line from pipe, writing (tag, line) to the
queue. Also checks for a stop_event to give up before
the end of the stream.
"""
while True:
line = pipe.readline()
while True:
try:
# Post to the queue with a large timeout in case the
# queue is full.
queue.put((tag, line), block=True, timeout=60)
break
except Queue.Full:
if stop_event.isSet():
break
continue
if stop_event.isSet() or line=="":
break
pipe.close()
When I start up the subprocess I do this:
outputqueue = Queue.Queue(50)
stop_event = threading.Event()
process = subprocess.Popen(
command,
cwd=workingdir,
env=env,
shell=useshell,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
stderr_thread = threading.Thread(
target=simple_io_thread,
args=(process.stderr, outputqueue, "STDERR", stop_event)
)
stdout_thread = threading.Thread(
target=simple_io_thread,
args=(process.stdout, outputqueue, "STDOUT", stop_event)
)
stderr_thread.daemon = True
stdout_thread.daemon = True
stderr_thread.start()
stdout_thread.start()
Then when I want to read I can just block on outputqueue - each item read from it contains either a string to identify which pipe it came from and a line of text from that pipe. Very little code runs in a separate thread, and it only communicates with the main thread via a thread-safe queue (plus an event in case I need to give up early). Perhaps this approach would be useful and allow you to solve the problem with threads and blocking but without having to rewrite lots of code?
(My solution is made more complicated because I sometimes wish to terminate the subprocesses early, and want to be sure that the threads will all finish. If that's not an issue you can get rid of all the stop_event stuff and it becomes pretty succinct.)

I assume that the process pipeline will not deadlock if it only uses stdin and stdout; and the problem you're trying to solve is how to make it not deadlock if they write to stderr (and have to deal with stderr possibly getting backed up).
If you're letting multiple processes write to stderr, you have to watch out for their output being intermingled. I'm guessing you have that sorted somehow; just putting it out there to be sure.
Be aware of the -u flag to python; it is helpful when testing to see if OS buffering is screwing you up.
If you want to emulate select() on file handles in win32, your only choice is to use PeekNamedPipe() and friends. I have a snippet of code that reads line-oriented output from multiple processes at once, which you may even be able to use directly -- try passing the list of proc.stderr handles to it and go.
class NoLineError(Exception): pass
class NoMoreLineError(Exception): pass
class LineReader(object):
"""Helper class for multi_readlines."""
def __init__(self, f):
self.fd = f.fileno()
self.osf = msvcrt.get_osfhandle(self.fd)
self.buf = ''
def getline(self):
"""Returns a line of text, or raises NoLineError, or NoMoreLineError."""
try:
_, avail, _ = win32pipe.PeekNamedPipe(self.osf, 0)
bClosed = False
except pywintypes.error:
avail = 0
bClosed = True
if avail:
self.buf += os.read(self.fd, avail)
idx = self.buf.find('\n')
if idx >= 0:
ret, self.buf = self.buf[:idx+1], self.buf[idx+1:]
return ret
elif bClosed:
if self.buf:
ret, self.buf = self.buf, None
return ret
else:
raise NoMoreLineError
else:
raise NoLineError
def multi_readlines(fs, timeout=0):
"""Read lines from |fs|, a list of file objects.
The lines come out in arbitrary order, depending on which files
have output available first."""
if type(fs) not in (list, tuple):
raise Exception("argument must be a list.")
objs = [LineReader(f) for f in fs]
for i,obj in enumerate(objs): obj._index = i
while objs:
yielded = 0
for i,obj in enumerate(objs):
try:
yield (obj._index, obj.getline())
yielded += 1
except NoLineError:
#time.sleep(timeout)
pass
except NoMoreLineError:
del objs[i]
break # Because we mutated the array
if not yielded:
time.sleep(timeout)
pass
I have never seen the "Peek returns 0 bytes even though data is available" issue myself. If this happens to others, I bet their libc is buffering their stdout/stderr before sending the data to the OS; there is nothing you can do about that from outside. You have to make the app use unbuffered output somehow (-u to python; win32/libc calls to modify the stderr file handle, ...)
The fact that you are seeing nothing, then a ton of updates, makes me think that your problem is buffering on the source end. win32 libc may buffer differently if it writes to a pipe rather than a console. Again, the best you can do from outside those programs is to aggressively drain their output.

What about using Twisted's FD's? http://twistedmatrix.com/documents/8.1.0/api/twisted.internet.fdesc.html
It's not asynchronous but it is non-blocking. For asynchronous stuff, can you port to using Twisted?

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.