Twisted, how can ProcessProtocol receive stdout w/o buffering? - python

I'm using an external process which writes a short line of output for each chunk of data it processes. I would like to react to each of these lines without any additional delay. However, it seems that .outReceived() of ProcessProtocol is buffered. The docs state:
.outReceived(data): This is called with data that was received from
the process' stdout pipe. Pipes tend to provide data in larger
chunks than sockets (one kilobyte is a common buffer size), so you
may not experience the "random dribs and drabs" behavior typical of
network sockets, but regardless you should be prepared to deal if you
don't get all your data in a single call. To do it properly,
outReceived ought to simply accumulate the data and put off doing
anything with it until the process has finished.
The result is that I get the output in one chunk after the whole processing is done. How can I force ProcessProtocol not to buffer stdout?

The buffering is happening in the producer process, not the consumer. The standard C library's stdout is line-buffered only when connected to a terminal; otherwise it is fully buffered. This is what causes the producer process to output data in large chunks rather than line by line when it is not connected to a terminal.
Use the stdbuf utility to force the producer process' stdout to be line-buffered.
If the producer process is a Python script, use the -u interpreter switch to turn off buffering of the standard streams entirely; the stdbuf utility is the better option, though.
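For example, the producer can be wrapped in stdbuf when it is spawned, and the protocol can split complete lines as they arrive (a minimal, untested sketch; the producer name and its arguments are placeholders):

from twisted.internet import reactor, protocol

class LineChunkProtocol(protocol.ProcessProtocol):
    def __init__(self):
        self._buf = b''

    def outReceived(self, data):
        # Accumulate and split on newlines; a partial line stays buffered.
        self._buf += data
        while b'\n' in self._buf:
            line, self._buf = self._buf.split(b'\n', 1)
            print('got line:', line.decode())

    def processEnded(self, reason):
        reactor.stop()

reactor.spawnProcess(
    LineChunkProtocol(),
    '/usr/bin/stdbuf',
    args=['stdbuf', '-oL', 'producer', '--some-arg'],  # placeholder command line
    env=None,  # inherit the parent environment
)
reactor.run()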

Related

Interprocess Communication between two python scripts without STDOUT

I am trying to create a Monitor script that monitors all the threads of a huge python script which has several loggers and several threads running.
From Monitor.py I could run a subprocess and forward the STDOUT, which might contain the status of the threads, but since several loggers are running I am seeing other logging mixed in with it.
Question: How can I run the main script as a separate process and get custom messages and thread status without interfering with the logging? (by passing a PIPE as an argument?)
Main_Script.py
* Runs several threads
* Each thread has a separate logger
Monitor.py
* Spins up Main_Script.py
* Monitors each of the threads in Main_Script.py (and may obtain other messages from Main_Script.py in the future)
So far, I have tried subprocess and Process from multiprocessing.
subprocess lets me start Main_Script.py and forward the stdout back to the monitor, but I see the logging of the threads coming in through the same STDOUT. I am using the logging library to log the data from each thread to a separate file.
I also tried Process from multiprocessing. I had to call the main function of Main_Script.py as a process and send a PIPE argument to it from Monitor.py, but now I can't see Main_Script.py as a separate process when I run the top command.
Normally, you want to change the child process to work like a typical Unix userland tool: the logging and other side-band information goes to stderr (or to a file, or syslog, etc.), and only the actual output goes to stdout.
Then, the problem is easy: just capture stdout to a PIPE that you process, and either capture stderr to a different PIPE, or pass it through to real stderr.
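For instance, a rough sketch of that split, assuming Main_Script.py writes its status lines to stdout and routes its logging to stderr or log files (handle_status is a hypothetical callback):

import subprocess

proc = subprocess.Popen(
    ['python', 'Main_Script.py'],
    stdout=subprocess.PIPE,  # status messages for the monitor
    stderr=None,             # logging passes through to the real stderr
)
for line in proc.stdout:                    # status lines arrive incrementally
    handle_status(line.decode().rstrip())   # hypothetical handler
proc.wait()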
If that's not appropriate for some reason, you need to come up with some other mechanism for IPC: Unix or Windows named pipes, anonymous pipes that you pass by leaking the file descriptor across the fork/exec and then pass the fd as an argument, Unix-domain sockets, TCP or UDP localhost sockets, a higher-level protocol like a web service on top of TCP sockets, mmapped files, anonymous mmaps or pipes that you pass between processes via a Unix-domain socket or Windows API calls, …
As you can see, there are a huge number of options. Without knowing anything about your problem other than that you want "custom messages", it's impossible to tell you which one you want.
While we're at it: If you can rewrite your code around multiprocessing rather than subprocess, there are nice high-level abstractions built in to that module. For example, you can use a Queue that automatically manages synchronization and blocking, and also manages pickling/unpickling so you can just pass any (picklable) object rather than having to worry about serializing to text and parsing the text. Or you can create shared memory holding arrays of int32 objects, or NumPy arrays, or arbitrary structures that you define with ctypes. And so on. Of course you could build the same abstractions yourself, without needing to use multiprocessing, but it's a lot easier when they're there out of the box.
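For instance, the Queue abstraction from multiprocessing looks roughly like this (an illustrative sketch, with invented worker and message names):

import multiprocessing

def worker(q):
    q.put({'status': 'started'})
    # ... do the real work here ...
    q.put({'status': 'finished'})

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    print(q.get())   # {'status': 'started'}
    print(q.get())   # {'status': 'finished'}
    p.join()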
Finally, while your question is tagged ipc and pipe, and titled "Interprocess Communication", your description refers to threads, not processes. If you actually are using a bunch of threads in a single process, you don't need any of this.
You can just stick your results on a queue.Queue, or store them in a list or deque with a Lock around it, or pass in a callback to be called with each new result, or use a higher-level abstraction like concurrent.futures.ThreadPoolExecutor and return a Future object or an iterator of Futures, etc.
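The threads-only version is even simpler, for example with queue.Queue (again, names are invented for illustration):

import queue
import threading

results = queue.Queue()

def worker(n):
    # ... do the real work here ...
    results.put((n, 'done'))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for _ in threads:
    print(results.get())   # blocks until the next result is available
for t in threads:
    t.join()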

Communicate multiple times with a subprocess in Python

This question is NOT a duplicate of
Communicate multiple times with a process without breaking the pipe?
That question is solved because its use case allows inputs to be sent together, but this is not true if your program is interactive (as illustrated in the use case here).
The documentation for subprocess.Popen says:
communicate(input=None)
Interact with process: Send data to stdin. Read data from stdout
and stderr, until end-of-file is reached. Wait for process to
terminate. ...
Is it possible to communicate multiple times with the subprocess before its termination, like with a terminal or with a network socket?
For example, if the subprocess is bc, the parent process may want to send it different inputs for calculation as needed. Since the inputs sent to bc may depend on user input, it is not possible to send all inputs at once.
This turns out to be not very difficult:
import os
import subprocess

proc = subprocess.Popen(['bc'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
os.write(proc.stdin.fileno(), b'100+200\n')   # write directly to the pipe, bypassing Python-level buffering
print(os.read(proc.stdout.fileno(), 4096))    # e.g. b'300\n'
Basically: Non-blocking read on a subprocess.PIPE in python.
Set the proc pipes (proc.stdout, proc.stdin, ...) to non-blocking mode via fcntl and then write/read them directly.
You might want to use epoll or select, via the select or selectors modules, for more efficiency.
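A minimal sketch of that approach, reusing the bc example from above (Unix-only; select.select avoids busy-waiting on the non-blocking pipe):

import fcntl
import os
import select
import subprocess

proc = subprocess.Popen(['bc'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# Put the stdout pipe into non-blocking mode.
fd = proc.stdout.fileno()
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

proc.stdin.write(b'100+200\n')
proc.stdin.flush()

# Wait (up to 5 seconds) until bc has written something, then read it.
select.select([fd], [], [], 5.0)
print(os.read(fd, 4096))   # e.g. b'300\n'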

Interleaving stdout from Popen with messages from ZMQ recv

Is there a best-practices approach to poll for the stdout/stderr from a subprocess.Popen as well as a zmq socket?
In my case, I have my main program spawning a Popen subprocess. The subprocess publishes messages via zmq which I then want to subscribe to in my main program.
Waiting on multiple zmq sockets is not complicated with zmq.Poller, but when I want to interleave this with the output from my subprocess itself, I am unsure how to do it in the best way without risking waits or having needless loops.
In the end, I would like to use it like so:
process = Popen([prog, '--publish-to', 'tcp://127.0.0.1:89890'],
                stdout=subprocess.PIPE, stderr=subprocess.PIPE, ...)

for (origin, data) in interleave(process, 'tcp://127.0.0.1:89890'):
    if origin == 'STDOUT': pass
    if origin == 'STDERR': pass
    if origin == 'ZMQ': pass
prog --publish-to tcp://127.0.0.1:89890 will then open a zmq.PUB socket and publish data, whereas the interleave function will subscribe to this and also poll for stdout and stderr, yielding whatever data reaches it first.
I know how to define interleave with multiple daemon threads and queues, but I don't know if this approach has some caveats with regard to lazy reading (i.e. stdout might not be processed until the end of the program?) or other things that I have not yet thought about (it also seems to be quite a bit of overhead for such a task).
I will be thankful for all ideas or insights.
I aim for at least Python 3.3/3.4 but if this turns out to be much easier with the new async/await tools, I could also use Python 3.5 for the code.
Use zmq.Poller: http://pyzmq.readthedocs.io/en/latest/api/zmq.html#polling. You can register zmq sockets and native file descriptors (e.g. process.stdout.fileno() and process.stderr.fileno()) there, and it will wait until input is available on at least one of the registered sources.
I don't know if it works on Windows; you should try.
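A possible shape for interleave() along those lines, registering the SUB socket and the child's pipes with a single zmq.Poller (an untested sketch; polling raw pipe descriptors like this is Unix-oriented):

import os
import zmq

def interleave(process, address):
    ctx = zmq.Context.instance()
    sub = ctx.socket(zmq.SUB)
    sub.connect(address)
    sub.setsockopt(zmq.SUBSCRIBE, b'')

    poller = zmq.Poller()
    poller.register(sub, zmq.POLLIN)
    poller.register(process.stdout, zmq.POLLIN)
    poller.register(process.stderr, zmq.POLLIN)

    while process.poll() is None:
        events = dict(poller.poll(timeout=100))   # milliseconds
        if sub in events:
            yield ('ZMQ', sub.recv())
        if process.stdout in events:
            yield ('STDOUT', os.read(process.stdout.fileno(), 4096))
        if process.stderr in events:
            yield ('STDERR', os.read(process.stderr.fileno(), 4096))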

Should I use Popen's wait or communicate to read stdout in subprocess in Python 3? [duplicate]

This question already has answers here:
read subprocess stdout line by line
I am trying to run a subprocess in Python 3 and constantly read the output.
In the documentation for subprocess in Python 3 I see the following:
Popen.wait(timeout=None)
Wait for child process to terminate. Set and return returncode attribute.
Warning This will deadlock when using stdout=PIPE and/or stderr=PIPE
and the child process generates enough output to a pipe such that it
blocks waiting for the OS pipe buffer to accept more data. Use
communicate() to avoid that.
This makes me think I should use communicate, as the amount of data from stdout is quite large. However, reading the documentation again shows this:
Popen.communicate(input=None, timeout=None)...
Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached.
Note The data read is buffered in memory, so do not use this method if the data size is large or
unlimited.
So again, it seems like there are problems with reading standard output from subprocesses this way. Can someone please tell me the best / safest way to run a subprocess and read all of its (potentially large amount of) stdout?
I think you should use communicate. The message warns you about performance issues with the default behaviour of the method. In fact, there is a buffer size parameter to the Popen constructor (bufsize) that can be tuned to improve performance a lot for large data sizes.
I hope it will help :)
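For reference, the duplicate target ("read subprocess stdout line by line") boils down to reading the pipe incrementally, which never holds the whole output in memory (some_command and process_line are placeholders):

import subprocess

proc = subprocess.Popen(['some_command'], stdout=subprocess.PIPE)
for line in proc.stdout:   # read line by line as output arrives
    process_line(line)     # hypothetical per-line handler
proc.wait()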

Do Python threads wait for standard output?

If you run a couple of threads but they all have to print to the same stdout, does this mean they have to wait on each other? So, say all 4 threads have something to write: do they have to pause and wait for stdout to be free so they can get on with their work?
Deep deep (deep deep deep...) down in the OS's system calls, yes. Modern OSes have thread-safe terminal printing routines which usually just lock around the critical sections that do the actual device access (or buffer, depending on what you're writing into and what its settings are). These waits are very short, however. Keep in mind that this is IO you're dealing with here, so the wait times are likely to be negligible relative to the actual IO execution.
It depends. If stdout is a pipe, each pipe gets a 4KB buffer which you can override when the pipe is created. Buffers are flushed when the buffer is full or with a call to flush().
If stdout is a terminal, output is usually line buffered. So until you print a newline, all threads can write to their buffers. When the newline is written, the whole buffer is dumped on the console and all other threads that are writing newlines at the same time have to wait.
Since threads do other things than writing newlines, each thread gets some CPU. So even in the worst case, the congestion should be pretty small.
There is one exception, though: if you write a lot of data, or if the console is slow (like the Linux kernel debug console, which uses the serial port). When the console can't cope with the amount of data, more and more threads will block in the write of the newline, waiting for the buffers to flush.
