Communicate multiple times with a subprocess in Python

This question is NOT a duplicate of
Communicate multiple times with a process without breaking the pipe?
That question was solvable because its use case allows all inputs to be sent at once, which is not possible when the program is interactive (as the use case here illustrates).
The documentation for subprocess.Popen says:
communicate(input=None)
Interact with process: Send data to stdin. Read data from stdout
and stderr, until end-of-file is reached. Wait for process to
terminate. ...
Is it possible to communicate multiple times with the subprocess before its termination, like with a terminal or with a network socket?
For example, if the subprocess is bc, the parent process may want to send it different inputs for calculation as needed. Since the inputs sent to bc may depend on user input, it is not possible to send them all at once.

This turns out to be not very difficult:
import os
import subprocess

proc = subprocess.Popen(['bc'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
os.write(proc.stdin.fileno(), b'100+200\n')   # send one expression to bc
print(os.read(proc.stdout.fileno(), 4096))    # read bc's reply, e.g. b'300\n'
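You can keep the pipes open and repeat the exchange as often as needed. A slightly tidier sketch using the pipe file objects directly (it relies on bc answering each input line promptly, which the approach above already assumes; the ask helper is just an illustrative name):

import subprocess

proc = subprocess.Popen(['bc'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

def ask(expression):
    # Write one line, flush so bc actually sees it, then read one reply line.
    proc.stdin.write(expression.encode() + b'\n')
    proc.stdin.flush()
    return proc.stdout.readline().decode().strip()

print(ask('100+200'))   # -> 300
print(ask('3*7'))       # -> 21

proc.stdin.close()
proc.wait()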

Basically the approach from Non-blocking read on a subprocess.PIPE in python:
Set the proc pipes (proc.stdout, proc.stdin, ...) to non-blocking mode via fcntl and then write/read them directly.
You might want to use epoll or select via the select or selectors modules for more efficiency.
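A minimal sketch of that approach (Unix-only, since it uses fcntl; error handling and partial reads are glossed over):

import fcntl
import os
import select
import subprocess

proc = subprocess.Popen(['bc'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# Put the child's stdout into non-blocking mode.
fd = proc.stdout.fileno()
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

proc.stdin.write(b'2^10\n')
proc.stdin.flush()

# Wait until the descriptor is readable, then read whatever is available.
select.select([fd], [], [], 5.0)
print(os.read(fd, 4096))   # b'1024\n' once bc has answered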

Related

Capture stderr in ProcessPoolExecutor

I'm using ProcessPoolExecutor to execute an external command. Is there any way to capture stderr when the process is finished (similar to subprocess)? I'm capturing the executor.submit() future result, but that only returns 0 or 1.
This might not be an answer, but it points in that direction and is far too long for a comment, so here goes.
I would say no to that. You might be able to achieve it by tinkering with the stderr file descriptor, redirecting it to a stream of your own and returning that as the worker result, but I wonder whether ProcessPoolExecutor is suitable for your task at all (not knowing, of course, what it is).
A subprocess created by a process pool does not finish like a subprocess created by yourself. It stays alive waiting for more work to arrive until you close the pool. If your worker produces stdout or stderr, they go to the same place where your main process directs its stdout and stderr.
Your workers will also process many different tasks. If your pool size is four and you submit ten tasks, how do you then decipher from a plain stderr capture which task created what message?
I have a hunch this needs to be redesigned. You would be able to raise exceptions in your workers and then later capture those from your future objects. Or it might be that your task is something where a pool is just not suitable. If subprocesses do what you want them to do, why not use them instead?
Pools are good for parallelising repetitive tasks that receive and return modest amounts of data (the transfers are implemented as queues, which are not miracle performers), with a standard interface and standardised output/error handling. Pools simplify your code by providing the routine part. If your subtasks require different inputs, if their outputs and error handling vary greatly, or if there is a lot of data to be transmitted, you might be better off building the parallel-processing part yourself.
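To illustrate the last suggestion: if running the external command is itself the unit of work, the worker can invoke it with subprocess and hand back the captured stderr as part of its result. A sketch (subprocess.run's capture_output and text need Python 3.7+; on older versions pass stdout=PIPE, stderr=PIPE instead):

import subprocess
from concurrent.futures import ProcessPoolExecutor

def run_command(args):
    # Run the external command inside the worker and return the exit code
    # together with the captured streams.
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr

if __name__ == '__main__':
    with ProcessPoolExecutor() as pool:
        future = pool.submit(run_command, ['ls', '/nonexistent'])
        code, out, err = future.result()
        print(code, err)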

Interprocess Communication between two python scripts without STDOUT

I am trying to create a Monitor script that monitors all the threads of a huge python script which has several loggers and several threads running.
From Monitor.py I can run a subprocess and forward its STDOUT, which might contain the status of the threads, but since several loggers are running I see their output mixed in as well.
Question: How can I run the main script as a separate process and get custom messages and thread status without interfering with logging (passing a PIPE as an argument?).
Main_Script.py: runs several threads; each thread has a separate logger.
Monitor.py: spins up Main_Script.py and monitors each of the threads in Main_Script.py (and may obtain other messages from Main_Script.py in the future).
So far, I have tried subprocess and Process from multiprocessing.
subprocess lets me start Main_Script.py and forward its stdout back to the monitor, but I see the logging of the threads coming through the same STDOUT. I am using the logging library to log the data from each thread to separate files.
I also tried Process from multiprocessing. I had to call the main function of Main_Script.py as a process and send a PIPE argument to it from Monitor.py. Now I can't see Main_Script.py as a separate process when I run the top command.
Normally, you want to change the child process to work like a typical Unix userland tool: the logging and other side-band information goes to stderr (or to a file, or syslog, etc.), and only the actual output goes to stdout.
Then, the problem is easy: just capture stdout to a PIPE that you process, and either capture stderr to a different PIPE, or pass it through to real stderr.
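A minimal sketch of that split, assuming Main_Script.py has been changed so that only real status lines go to stdout and all logging goes to stderr (or files):

import subprocess

# Capture the child's stdout for processing; leave stderr alone so the
# child's logging is simply inherited and stays visible on our own stderr.
proc = subprocess.Popen(['python', 'Main_Script.py'],
                        stdout=subprocess.PIPE,
                        universal_newlines=True)

for line in proc.stdout:
    # Each stdout line is treated as one status message from the child.
    print('status from child:', line.rstrip())
proc.wait()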
If that's not appropriate for some reason, you need to come up with some other mechanism for IPC: Unix or Windows named pipes, anonymous pipes that you pass by leaking the file descriptor across the fork/exec and then pass the fd as an argument, Unix-domain sockets, TCP or UDP localhost sockets, a higher-level protocol like a web service on top of TCP sockets, mmapped files, anonymous mmaps or pipes that you pass between processes via a Unix-domain socket or Windows API calls, …
As you can see, there are a huge number of options. Without knowing anything about your problem other than that you want "custom messages", it's impossible to tell you which one you want.
While we're at it: If you can rewrite your code around multiprocessing rather than subprocess, there are nice high-level abstractions built in to that module. For example, you can use a Queue that automatically manages synchronization and blocking, and also manages pickling/unpickling so you can just pass any (picklable) object rather than having to worry about serializing to text and parsing the text. Or you can create shared memory holding arrays of int32 objects, or NumPy arrays, or arbitrary structures that you define with ctypes. And so on. Of course you could build the same abstractions yourself, without needing to use multiprocessing, but it's a lot easier when they're there out of the box.
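For instance, a minimal Queue sketch (the worker and its status messages are made up purely for illustration):

import multiprocessing

def worker(status_queue):
    # The child can put arbitrary picklable objects on the queue.
    status_queue.put({'thread': 'downloader', 'state': 'running'})
    status_queue.put({'thread': 'downloader', 'state': 'done'})
    status_queue.put(None)   # sentinel: no more messages

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    child = multiprocessing.Process(target=worker, args=(queue,))
    child.start()
    for message in iter(queue.get, None):   # read until the sentinel
        print('status:', message)
    child.join()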
Finally, while your question is tagged ipc and pipe, and titled "Interprocess Communication", your description refers to threads, not processes. If you actually are using a bunch of threads in a single process, you don't need any of this.
You can just stick your results on a queue.Queue, or store them in a list or deque with a Lock around it, or pass in a callback to be called with each new result, or use a higher-level abstraction like concurrent.futures.ThreadPoolExecutor and return a Future object or an iterator of Futures, etc.
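A toy sketch of the Future-based option, with illustrative names:

from concurrent.futures import ThreadPoolExecutor, as_completed

def task(name):
    # Stand-in for whatever work one of your threads does.
    return 'result from %s' % name

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(task, 'worker-%d' % i) for i in range(10)]
    for future in as_completed(futures):
        print(future.result())   # results arrive as each task finishes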

Interleaving stdout from Popen with messages from ZMQ recv

Is there a best-practices approach to poll for the stdout/stderr from a subprocess.Popen as well as a zmq socket?
In my case, I have my main program spawning a Popen subprocess. The subprocess publishes messages via zmq which I then want to subscribe to in my main program.
Waiting on multiple zmq sockets is not complicated with the zmq.Poller but when I want to interleave this with the output from my subprocess itself, I am unsure how to do it in the best way without risking waits or having needless loops.
In the end, I would like to use it like so:
process = Popen([prog, '--publish-to', 'tcp://127.0.0.1:89890'],
                stdout=subprocess.PIPE, stderr=subprocess.PIPE, ...)

for (origin, data) in interleave(process, 'tcp://127.0.0.1:89890'):
    if origin == 'STDOUT': pass
    if origin == 'STDERR': pass
    if origin == 'ZMQ': pass
prog --publish-to tcp://127.0.0.1:89890 will then open a zmq.PUB socket and publish data, whereas the interleave function will subscribe to this and also poll for stdout and stderr, yielding whatever data reaches it first.
I know how to define interleave with multiple daemon threads and queues, but I don't know whether this approach has caveats with regard to lazy reading (i.e. stdout might not be processed until the end of the program?) or other things I have not yet thought about (it also seems like quite a bit of overhead for such a task).
I will be thankful for all ideas or insights.
I aim for at least Python 3.3/3.4 but if this turns out to be much easier with the new async/await tools, I could also use Python 3.5 for the code.
Use zmq.Poller: http://pyzmq.readthedocs.io/en/latest/api/zmq.html#polling. You can register zmq sockets and native file descriptors (e.g. process.stdout.fileno() and process.stderr.fileno()) there, and it will wait until input is available on at least one of the registered sources.
I don't know whether it works on Windows; you should try.
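Along those lines, a sketch of interleave built on zmq.Poller could look like this (the chunked os.read calls and the 100 ms timeout are illustrative; it assumes a platform where raw pipe descriptors can be registered with the poller):

import os
import zmq

def interleave(process, endpoint):
    # Yield (origin, data) tuples from the child's pipes and a zmq SUB socket.
    context = zmq.Context.instance()
    sub = context.socket(zmq.SUB)
    sub.connect(endpoint)
    sub.setsockopt(zmq.SUBSCRIBE, b'')

    out_fd, err_fd = process.stdout.fileno(), process.stderr.fileno()
    poller = zmq.Poller()
    poller.register(sub, zmq.POLLIN)
    poller.register(out_fd, zmq.POLLIN)
    poller.register(err_fd, zmq.POLLIN)

    while process.poll() is None:
        events = dict(poller.poll(timeout=100))   # milliseconds
        if events.get(out_fd):
            yield 'STDOUT', os.read(out_fd, 4096)
        if events.get(err_fd):
            yield 'STDERR', os.read(err_fd, 4096)
        if events.get(sub):
            yield 'ZMQ', sub.recv()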

Should I use Popen's wait or communicate to read stdout in subprocess in Python 3? [duplicate]

This question already has answers here:
read subprocess stdout line by line
I am trying to run a subprocess in Python 3 and constantly read the output.
In the documentation for subprocess in Python 3 I see the following:
Popen.wait(timeout=None)
Wait for child process to terminate. Set and return returncode attribute.
Warning This will deadlock when using stdout=PIPE and/or stderr=PIPE
and the child process generates enough output to a pipe such that it
blocks waiting for the OS pipe buffer to accept more data. Use
communicate() to avoid that.
Which makes me think I should use communicate as the amount of data from stdout is quite large. However, reading the documentation again shows this:
Popen.communicate(input=None, timeout=None)...
Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached.
Note The data read is buffered in memory, so do not use this method if the data size is large or
unlimited.
So again, it seems like there are problems with reading standard output from subprocesses this way. Can someone please tell me the best/safest way to run a subprocess and read all of its (potentially large amount of) stdout?
I think you should use communicate. The note warns you about memory use with the default behaviour of the method. In fact, there is a bufsize parameter to the Popen constructor that can be tuned to improve performance considerably for large data sizes.
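For completeness, a minimal sketch of the communicate route (the timeout is optional but cheap insurance against a hung child):

import subprocess

proc = subprocess.Popen(['python', '-h'],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        universal_newlines=True)
try:
    # Blocks until the child exits; both streams are drained concurrently,
    # so neither pipe can fill up and deadlock the child.
    out, err = proc.communicate(timeout=60)
except subprocess.TimeoutExpired:
    proc.kill()
    out, err = proc.communicate()
print(len(out), 'characters of stdout,', len(err), 'characters of stderr')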
I hope it will help :)

What is the best way to capture output from a process using python?

I am using python's subprocess module to start a new process. I would like to capture the output of the new process in real time so I can do things with it (display it, parse it, etc.). I have seen many examples of how this can be done, some use custom file-like objects, some use threading and some attempt to read the output until the process has completed.
File-Like Objects Example
I would prefer not to use custom file-like objects because I want to allow users to supply their own values for stdin, stdout and stderr.
Threading Example
I do not really understand why threading is required, so I am reluctant to follow this example. If someone can explain why the threading example makes sense, I would be happy to listen. However, this example also restricts users from supplying their own stdout and stderr values.
Read Output Example (see below)
The example which makes the most sense to me is to read the stdout, stderr until the process has finished. Here is some example code:
import subprocess
import sys

# Start a process which prints the options to the python program.
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,   # text mode, so readline() returns "" at EOF
)
# While the process is running, display the output to the user.
while True:
    # Read standard output data.
    for stdout_line in iter(process.stdout.readline, ""):
        # Display standard output data.
        sys.stdout.write(stdout_line)
    # Read standard error data.
    for stderr_line in iter(process.stderr.readline, ""):
        # Display standard error data.
        sys.stderr.write(stderr_line)
    # If the process is complete - exit loop.
    if process.poll() is not None:
        break
My question is,
Q. Is there a recommended approach for capturing the output of a process using python?
First, your design is a bit silly, since you can do the same thing like this:
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1,
    stdout=sys.stdout,
    stderr=sys.stderr
)
… or, even better:
process = subprocess.Popen(
    ["python", "-h"],
    bufsize=1
)
However, I'll assume that's just a toy example, and you might want to do something more useful.
The main problem with your design is that it won't read anything from stderr until stdout is done.
Imagine you're driving an MP3 player that prints each track name to stdout, and logging info to stderr, and you want to play 10 songs. Do you really want to wait 30 minutes before displaying any of the logging to your users?
If that is acceptable, then you might as well just use communicate, which takes care of all of the headaches for you.
Plus, even if it's acceptable for your model, are you sure you can queue up that much unread data in the pipe without it blocking the child? On every platform?
Just breaking up the loop to alternate between the two won't help, because you could end up blocking on stdout.readline() for 5 minutes while stderr is piling up.
So that's why you need some way to read from both at once.
How do you read from two pipes at once?
This is the same problem (but smaller) as handling 1000 network clients at once, and it has the same solutions: threading, or multiplexing (and the various hybrids, like doing green threads on top of a multiplexor and event loop, or using a threaded proactor, etc.).
The best sample code for the threaded version is communicate from the 3.2+ source code. It's a little complicated, but if you want to handle all of the edge cases properly on both Windows and Unix there's really no avoiding a bit of complexity.
For multiplexing, you can use the select module, but keep in mind that this only works on Unix (you can't select on pipes on Windows), and it's buggy without 3.2+ (or the subprocess32 backport), and to really get all the edge cases right you need to add a signal handler to your select. Unless you really, really don't want to use threading, this is the harder answer.
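For the Unix-only select route, a rough sketch (chunk-based, without the signal handling and edge cases mentioned above) looks something like this:

import select
import subprocess

proc = subprocess.Popen(['python', '-h'],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)

# Map each pipe's file descriptor to its stream object.
streams = {proc.stdout.fileno(): proc.stdout,
           proc.stderr.fileno(): proc.stderr}
while streams:
    readable, _, _ = select.select(list(streams), [], [])
    for fd in readable:
        chunk = streams[fd].read1(4096)   # read whatever is available now
        if not chunk:                     # EOF: this pipe is done
            del streams[fd]
        else:
            print(fd, chunk)
proc.wait()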
But the easy answer is to use someone else's implementation. There are a dozen or more modules on PyPI specifically for async subprocesses. Alternatively, if you already have a good reason to write your app around an event loop, just about every modern event-loop-driven async networking library (including the stdlib's asyncio) includes subprocess support out of the box, that works on both Unix and Windows.
Is there a recommended approach for capturing the output of a process using python?
It depends on who you're asking; a thousand Python developers might have a thousand different answers… or at least half a dozen. If you're asking what the core devs would recommend, I can take a guess:
If you don't need to capture it asynchronously, use communicate (but make sure to upgrade to at least 3.2 for important bug fixes). If you do need to capture it asynchronously, use asyncio (which requires 3.4).
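For the asynchronous case, a compact asyncio sketch (async/await needs 3.5+, and asyncio.run needs 3.7+; on 3.4 the same structure works with @asyncio.coroutine and an explicit event loop):

import asyncio
import sys

async def pump(stream, label, out):
    # Forward each line from one of the child's pipes as soon as it arrives.
    while True:
        line = await stream.readline()
        if not line:            # EOF: the child closed this pipe
            break
        out.write('[%s] %s' % (label, line.decode()))

async def main():
    proc = await asyncio.create_subprocess_exec(
        sys.executable, '-h',
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE)
    # Drain both pipes concurrently, then wait for the child to exit.
    await asyncio.gather(pump(proc.stdout, 'stdout', sys.stdout),
                         pump(proc.stderr, 'stderr', sys.stderr))
    await proc.wait()

asyncio.run(main())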
