Is there a best-practices approach to poll for the stdout/stderr from a subprocess.Popen as well as a zmq socket?
In my case, I have my main program spawning a Popen subprocess. The subprocess publishes messages via zmq which I then want to subscribe to in my main program.
Waiting on multiple zmq sockets is not complicated with zmq.Poller, but when I want to interleave this with the output from the subprocess itself, I am unsure how best to do it without risking blocking waits or needless busy loops.
In the end, I would like to use it like so:
process = Popen([prog, '--publish-to', 'tcp://127.0.0.1:89890'],
                stdout=subprocess.PIPE, stderr=subprocess.PIPE, ...)

for (origin, data) in interleave(process, 'tcp://127.0.0.1:89890'):
    if origin == 'STDOUT': pass
    if origin == 'STDERR': pass
    if origin == 'ZMQ': pass
prog --publish-to tcp://127.0.0.1:89890 will then open a zmq.PUB socket and publish data, whereas the interleave function will subscribe to this and also poll for stdout and stderr, yielding whatever data reaches it first.
I know how to define interleave with multiple daemon threads and queues, but I don't know whether this approach has caveats with regard to lazy reading (i.e., stdout might not be processed until the end of the program?) or other things I have not yet thought about (it also seems like quite a bit of overhead for such a task).
I will be thankful for all ideas or insights.
I aim for at least Python 3.3/3.4 but if this turns out to be much easier with the new async/await tools, I could also use Python 3.5 for the code.
Use zmq.Poller: http://pyzmq.readthedocs.io/en/latest/api/zmq.html#polling. You can register zmq sockets and native file descriptors (e.g. process.stdout.fileno() and process.stderr.fileno()) there, and it will wait until input is available on at least one of the registered sources.
I don't know whether it works on Windows; you should try it.
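For illustration, here is a minimal sketch of what an interleave generator built on zmq.Poller might look like (untested; the SUB socket setup, the 100 ms timeout, and the line-based reads are assumptions on my part):

import subprocess
import zmq

def interleave(process, endpoint):
    """Yield (origin, data) tuples from the subprocess pipes and a zmq SUB socket."""
    ctx = zmq.Context.instance()
    sub = ctx.socket(zmq.SUB)
    sub.connect(endpoint)
    sub.setsockopt(zmq.SUBSCRIBE, b'')   # subscribe to every topic

    poller = zmq.Poller()
    poller.register(sub, zmq.POLLIN)
    poller.register(process.stdout.fileno(), zmq.POLLIN)
    poller.register(process.stderr.fileno(), zmq.POLLIN)

    while process.poll() is None:
        # poll() returns (socket-or-fd, event) pairs for whatever is readable
        for source, _event in poller.poll(timeout=100):
            if source is sub:
                yield ('ZMQ', sub.recv())
            elif source == process.stdout.fileno():
                yield ('STDOUT', process.stdout.readline())
            elif source == process.stderr.fileno():
                yield ('STDERR', process.stderr.readline())

Note that readline() can still block briefly until a full line arrives; reading with os.read() on the raw descriptors avoids that if it matters.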
Related
I'm using ProcessPoolExecutor to run an external command. Is there any way to capture stderr when the process is finished (similar to subprocess)? I'm capturing the executor.submit() future result, but that only returns 0 or 1.
This might not be a full answer, but it points in that direction and is far too long for a comment, so here goes.
I would say no to that. You might be able to achieve it by tinkering with stderr file descriptors, redirecting them to a stream of your own and returning that as the worker result, but I wonder whether ProcessPoolExecutor is suitable for your task at all; of course, I don't know what that task is.
A worker process created by a process pool does not finish the way a subprocess you create yourself does. It stays alive, waiting for more work to arrive, until you close the pool. If a worker writes to stdout or stderr, the output goes to the same place your main process directs its own stdout and stderr.
Your workers will also process many different tasks. If your pool size is four and you submit ten tasks, how do you then tell from a plain stderr capture which task produced which message?
I have a hunch this needs to be redesigned. You could raise exceptions in your workers and later capture those from your future objects. Or it might be that your task is something for which a pool is simply not suitable. If subprocesses do what you want them to do, why not use them instead?
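For the exception route, something along these lines could work; the run_tool wrapper and the command name are invented for illustration:

import subprocess
from concurrent.futures import ProcessPoolExecutor

def run_tool(args):
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        # A plain exception pickles cleanly on its way back from the worker.
        raise RuntimeError('%r failed (%d): %s' % (args, proc.returncode, err))
    return out

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_tool, ['some-external-cmd', arg])
                   for arg in ('a', 'b')]
        for future in futures:
            try:
                print(future.result())
            except RuntimeError as exc:
                print('worker reported:', exc)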
Pools are good for parallelising repetitive tasks that receive and return modest amounts of data (the underlying queues are not miracle performers), with a standard interface and standardised output/error handling. Pools simplify your code by providing the routine part. If your subtasks require different inputs, if their outputs and error handling vary greatly, or if there is a lot of data to be transmitted, you may be better off building the parallel-processing part yourself.
I am trying to create a Monitor script that monitors all the threads of a huge Python script which has several loggers and several threads running.
From Monitor.py I could run a subprocess and forward its STDOUT, which might contain the status of the threads, but since several loggers are running I am seeing their logging mixed into that output.
Question: How can I run the main script as a separate process and get custom messages and thread status without interfering with the logging? (By passing a PIPE as an argument?)
Main_Script.py
* Runs several threads
* Each thread has a separate logger
Monitor.py
* Spins up Main_Script.py
* Monitors each of the threads in Main_Script.py (and may obtain other messages from Main_Script.py in the future)
So far, I have tried subprocess and Process from multiprocessing.
subprocess lets me start Main_Script.py and forward its stdout back to the monitor, but I see the logging from the threads coming in through the same STDOUT. I am using the logging library to write the data from each thread to separate files.
I also tried Process from multiprocessing. I had to call the main function of Main_Script.py as a process and pass a PIPE argument to it from Monitor.py, and now I can't see Main_Script.py as a separate process when I run the top command.
Normally, you want to change the child process to work like a typical Unix userland tool: the logging and other side-band information goes to stderr (or to a file, or syslog, etc.), and only the actual output goes to stdout.
Then, the problem is easy: just capture stdout to a PIPE that you process, and either capture stderr to a different PIPE, or pass it through to real stderr.
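A minimal sketch of that convention, assuming Main_Script.py is adjusted so that only status messages go to stdout (handle_status is a placeholder for whatever the monitor does with them):

import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, 'Main_Script.py'],
    stdout=subprocess.PIPE,          # status messages for the monitor
    stderr=None,                     # logging falls through to the real stderr
    universal_newlines=True,
)
for line in proc.stdout:             # one status message per line
    handle_status(line.rstrip())     # placeholder handler
proc.wait()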
If that's not appropriate for some reason, you need to come up with some other mechanism for IPC: Unix or Windows named pipes, anonymous pipes that you pass by leaking the file descriptor across the fork/exec and then pass the fd as an argument, Unix-domain sockets, TCP or UDP localhost sockets, a higher-level protocol like a web service on top of TCP sockets, mmapped files, anonymous mmaps or pipes that you pass between processes via a Unix-domain socket or Windows API calls, …
As you can see, there are a huge number of options. Without knowing anything about your problem other than that you want "custom messages", it's impossible to tell you which one you want.
While we're at it: If you can rewrite your code around multiprocessing rather than subprocess, there are nice high-level abstractions built in to that module. For example, you can use a Queue that automatically manages synchronization and blocking, and also manages pickling/unpickling so you can just pass any (picklable) object rather than having to worry about serializing to text and parsing the text. Or you can create shared memory holding arrays of int32 objects, or NumPy arrays, or arbitrary structures that you define with ctypes. And so on. Of course you could build the same abstractions yourself, without needing to use multiprocessing, but it's a lot easier when they're there out of the box.
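For example, with multiprocessing the status channel can simply be a Queue; this is a rough sketch with an invented worker body:

import multiprocessing

def worker(status_queue):
    # Any picklable object can go on the queue; no text protocol to parse.
    status_queue.put({'thread': 'downloader', 'state': 'running'})
    status_queue.put({'thread': 'downloader', 'state': 'done'})

if __name__ == '__main__':
    status_queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=worker, args=(status_queue,))
    proc.start()
    for _ in range(2):
        print(status_queue.get())    # blocks until a message arrives
    proc.join()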
Finally, while your question is tagged ipc and pipe, and titled "Interprocess Communication", your description refers to threads, not processes. If you actually are using a bunch of threads in a single process, you don't need any of this.
You can just stick your results on a queue.Queue, or store them in a list or deque with a Lock around it, or pass in a callback to be called with each new result, or use a higher-level abstraction like concurrent.futures.ThreadPoolExecutor and return a Future object or an iterator of Futures, etc.
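If it really is all threads in one process, something like this already gives you per-task results and errors (the task function is invented):

from concurrent.futures import ThreadPoolExecutor, as_completed

def task(name):
    # ... do the real work of one thread here ...
    return '%s finished' % name

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(task, name) for name in ('reader', 'writer')]
    for future in as_completed(futures):
        print(future.result())       # or future.exception() for failures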
This question is NOT a duplicate of
Communicate multiple times with a process without breaking the pipe?
That question was solved because its use case allows all the inputs to be sent together, which is not true when the program is interactive (as illustrated by the use case here).
The documentation for subprocess.Popen says:
communicate(input=None)
Interact with process: Send data to stdin. Read data from stdout
and stderr, until end-of-file is reached. Wait for process to
terminate. ...
Is it possible to communicate multiple times with the subprocess before its termination, like with a terminal or with a network socket?
For example, if the subprocess is bc, the parent process may want to send it different inputs for calculation as needed. Since the inputs sent to bc may depend on user input, it is not possible to send everything at once.
This turns out to be not very difficult:
import os
import subprocess

proc = subprocess.Popen(['bc'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
os.write(proc.stdin.fileno(), b'100+200\n')
print(os.read(proc.stdout.fileno(), 4096))
Basically: Non-blocking read on a subprocess.PIPE in Python.
Set the proc pipes (proc.stdout, proc.stdin, ...) to non-blocking mode via fcntl and then write/read them directly.
You might want to use select or epoll, via the select or selectors modules, for more efficiency.
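Roughly like this on Unix (a sketch only; error handling and the stderr pipe are omitted):

import fcntl
import os
import select
import subprocess

proc = subprocess.Popen(['bc'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# Put the stdout pipe into non-blocking mode.
fd = proc.stdout.fileno()
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

proc.stdin.write(b'100+200\n')
proc.stdin.flush()

# Wait up to five seconds for the fd to become readable, then read what is there.
readable, _, _ = select.select([fd], [], [], 5.0)
if readable:
    print(os.read(fd, 4096))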
I am using python's subprocess module to start a new process. I would like to capture the output of the new process in real time so I can do things with it (display it, parse it, etc.). I have seen many examples of how this can be done, some use custom file-like objects, some use threading and some attempt to read the output until the process has completed.
File Like Objects Example (click me)
I would prefer not to use custom file-like objects because I want to allow users to supply their own values for stdin, stdout and stderr.
Threading Example (click me)
I do not really understand why threading is required, so I am reluctant to follow this example. If someone can explain why the threading example makes sense, I would be happy to listen. However, this example also restricts users from supplying their own stdout and stderr values.
Read Output Example (see below)
The example which makes the most sense to me is to read the stdout, stderr until the process has finished. Here is some example code:
import subprocess
import sys

# Start a process which prints the options to the python program.
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

# While the process is running, display the output to the user.
while True:
    # Read standard output data.
    for stdout_line in iter(process.stdout.readline, ""):
        # Display standard output data.
        sys.stdout.write(stdout_line)
    # Read standard error data.
    for stderr_line in iter(process.stderr.readline, ""):
        # Display standard error data.
        sys.stderr.write(stderr_line)
    # If the process is complete - exit loop.
    if process.poll() != None:
        break
My question is,
Q. Is there a recommended approach for capturing the output of a process using python?
First, your design is a bit silly, since you can do the same thing like this:
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,
    stdout=sys.stdout,
    stderr=sys.stderr
)
… or, even better:
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1
)
However, I'll assume that's just a toy example, and you might want to do something more useful.
The main problem with your design is that it won't read anything from stderr until stdout is done.
Imagine you're driving an MP3 player that prints each track name to stdout, and logging info to stderr, and you want to play 10 songs. Do you really want to wait 30 minutes before displaying any of the logging to your users?
If that is acceptable, then you might as well just use communicate, which takes care of all of the headaches for you.
Plus, even if it's acceptable for your model, are you sure you can queue up that much unread data in the pipe without it blocking the child? On every platform?
Just breaking up the loop to alternate between the two won't help, because you could end up blocking on stdout.readline() for 5 minutes while stderr is piling up.
So that's why you need some way to read from both at once.
How do you read from two pipes at once?
This is the same problem (but smaller) as handling 1000 network clients at once, and it has the same solutions: threading, or multiplexing (and the various hybrids, like doing green threads on top of a multiplexor and event loop, or using a threaded proactor, etc.).
The best sample code for the threaded version is communicate from the 3.2+ source code. It's a little complicated, but if you want to handle all of the edge cases properly on both Windows and Unix there's really no avoiding a bit of complexity.
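If you do roll the threaded version yourself, the usual shape is one reader thread per pipe feeding a queue; here is a simplified sketch that skips the edge cases the stdlib version handles:

import queue
import subprocess
import threading

def _pump(pipe, tag, out_queue):
    # Read lines until EOF and label each one with its origin.
    for line in iter(pipe.readline, b''):
        out_queue.put((tag, line))
    pipe.close()

proc = subprocess.Popen(['python', '-h'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
lines = queue.Queue()
for pipe, tag in ((proc.stdout, 'STDOUT'), (proc.stderr, 'STDERR')):
    threading.Thread(target=_pump, args=(pipe, tag, lines), daemon=True).start()

while proc.poll() is None or not lines.empty():
    try:
        tag, line = lines.get(timeout=0.1)
    except queue.Empty:
        continue
    print(tag, line.decode(), end='')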
For multiplexing, you can use the select module, but keep in mind that this only works on Unix (you can't select on pipes on Windows), and it's buggy without 3.2+ (or the subprocess32 backport), and to really get all the edge cases right you need to add a signal handler to your select. Unless you really, really don't want to use threading, this is the harder answer.
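And a bare-bones Unix-only sketch of the select approach, without any of the signal or edge-case handling mentioned above:

import os
import select
import subprocess
import sys

proc = subprocess.Popen(['python', '-h'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
targets = {proc.stdout.fileno(): sys.stdout, proc.stderr.fileno(): sys.stderr}

while targets:
    # Block until at least one pipe has data (or has hit EOF).
    readable, _, _ = select.select(list(targets), [], [])
    for fd in readable:
        chunk = os.read(fd, 4096)
        if not chunk:                # EOF on this pipe
            del targets[fd]
            continue
        targets[fd].write(chunk.decode())
proc.wait()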
But the easy answer is to use someone else's implementation. There are a dozen or more modules on PyPI specifically for async subprocesses. Alternatively, if you already have a good reason to write your app around an event loop, just about every modern event-loop-driven async networking library (including the stdlib's asyncio) includes subprocess support out of the box that works on both Unix and Windows.
Is there a recommended approach for capturing the output of a process using python?
It depends on who you're asking; a thousand Python developers might have a thousand different answers… or at least half a dozen. If you're asking what the core devs would recommend, I can take a guess:
If you don't need to capture it asynchronously, use communicate (but make sure to upgrade to at least 3.2 for important bug fixes). If you do need to capture it asynchronously, use asyncio (which requires 3.4).
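For the asyncio route, a sketch looks roughly like this (written with the 3.5 async/await syntax; on 3.4 you would spell the coroutines with @asyncio.coroutine and yield from instead):

import asyncio
import sys

async def pump(stream, out):
    # Forward lines from one subprocess stream as they arrive.
    while True:
        line = await stream.readline()
        if not line:
            break
        out.write(line.decode())

async def main():
    proc = await asyncio.create_subprocess_exec(
        sys.executable, '-h',
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE)
    await asyncio.gather(pump(proc.stdout, sys.stdout),
                         pump(proc.stderr, sys.stderr))
    await proc.wait()

asyncio.get_event_loop().run_until_complete(main())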
I wish to have a python script that runs an external program in a loop sequentially. I also want to limit each execution of the program to a max running time. If it is exceeded, then kill the program. What is the best way to accomplish this?
Thanks!
To run an external program from Python you'll normally want to use the subprocess module.
You could "roll your own" subprocess handling using os.fork() and os.execve() (or one of its exec* cousins) ... with any file descriptor plumbing and signal handling magic you like. However, the subprocess.Popen() function has implemented and exposed most of the features for what you'd want to do for you.
To arrange for the program to die after a given period of time, you can have your Python script kill it after the timeout. Naturally you'll want to check whether the process has already completed before then. Here's a dirt-stupid example (using the split function from the shlex module for additional readability):
from shlex import split as splitsh
import subprocess
import time

TIMEOUT = 10

cmd = splitsh('/usr/bin/sleep 60')
proc = subprocess.Popen(cmd)
time.sleep(TIMEOUT)

pstatus = proc.poll()
if pstatus is None:
    proc.kill()
    # Could use os.kill() to send a specific signal
    # such as HUP or TERM, check status again and
    # then resort to proc.kill() or os.kill() for
    # SIGKILL only if necessary
As noted there are a few ways to kill your subprocess. Note that I check for "is None" rather than testing pstatus for truth. If your process completed with an exit value of zero (conventionally indicating that no error occurred) then a naïve test of the proc.poll() results would conflate that completion with the still running process status.
There are also a few ways to determine if sufficient time has passed. In this example we sleep, which is somewhat silly if there's anything else we could be doing. That just leaves our Python process (the parent of your external program) laying about idle.
You could capture the start time using time.time() then launch your subprocess, then do other work (launch other subprocesses, for example) and check the time (perhaps in a loop of other activity) until your desired timeout has been exceeded.
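That polling variant could look something like this (the 0.1-second poll interval is an arbitrary choice):

import shlex
import subprocess
import time

TIMEOUT = 10
proc = subprocess.Popen(shlex.split('/usr/bin/sleep 60'))
deadline = time.time() + TIMEOUT

while proc.poll() is None:
    if time.time() > deadline:
        proc.kill()               # or terminate() first, then kill() if needed
        break
    time.sleep(0.1)               # or do other useful work here instead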
If any of your other activity involves file or socket (network) operations then you'd want to consider using the select module as a way to return a list of file descriptors which are readable, writable or ready with "exceptional" events. The select.select() function also takes an optional "timeout" value. A call to select.select([],[],[],x) is essentially the same as time.sleep(x) (in the case where we aren't providing any file descriptors for it to select among).
In lieu of select.select() it's also possible to use the fcntl module to set your file descriptor into non-blocking mode and then use os.read() (NOT the normal file object .read() methods, but the lower-level functionality from the os module). Again, it's better to use the higher-level interfaces where possible and only resort to the lower-level functions when you must. If you use non-blocking I/O, then all your os.read() or similar operations must be done within exception-handling blocks, since Python will represent the EWOULDBLOCK condition as an OSError, like "OSError: [Errno 11] Resource temporarily unavailable" (Linux). The precise number of the error might vary from one OS to another, but it should be portable (at least for POSIX systems) to compare against the EWOULDBLOCK value from the errno module.
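In code, that exception handling looks something like this:

import errno
import os

def try_read(fd, size=4096):
    """Return available bytes from a non-blocking fd, or None if nothing is ready."""
    try:
        return os.read(fd, size)
    except OSError as exc:
        if exc.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
            return None           # nothing to read right now
        raise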
(I realize I'm going down a rathole here, but information on how your program can do something useful while your child processes are running external programs is a natural extension of how to manage the timeouts for them).
Ugly details about non-blocking file I/O (including portability issues with MS Windows) have been discussed here in the past: Stackoverflow: non-blocking read on a stream in Python
As others have commented, it's better to provide more detailed questions and include short, focused snippets of code which show what effort you've already undertaken. Usually you won't find people here inclined to write tutorials rather than answers.
If you are able to use Python 3.3:
From the docs:
subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False, timeout=None)
subprocess.call(["ls", "-l"])
0
subprocess.call("exit 1", shell=True)
1
The timeout parameter should do the trick.
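For example (the command is just a stand-in):

import subprocess

try:
    # call() kills the child and raises TimeoutExpired if it runs longer than 10 s.
    returncode = subprocess.call(['/usr/bin/sleep', '60'], timeout=10)
except subprocess.TimeoutExpired:
    print('program exceeded the time limit and was killed')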