Read from pty without endless hanging - python

I have a script that prints colored output if it is running on a tty. A bunch of them execute in parallel, so I can't attach their stdout to a tty. I don't have control over the script's code either (to force coloring), so I want to fake a tty via a pty. My code:
invocation = get_invocation()
master, slave = pty.openpty()
subprocess.call(invocation, stdout=slave)
print string_from_fd(master)
And I can't figure out what should go into string_from_fd. For now, I have something like
def string_from_fd(fd):
    return os.read(fd, 1000)
It works, but that number 1000 looks strange. I think the output can be quite large, and no fixed number there may be sufficient. I tried a lot of solutions from Stack Overflow, but none of them works (they print nothing or hang forever).
I am not very familiar with file descriptors and all that, so any clarification on whether I'm doing something wrong would be much appreciated.
Thanks!

This won't work for long outputs: subprocess.call will block once the PTY's buffer is full. That's why subprocess.communicate exists, but that won't work with a PTY.
The standard/easiest solution is to use the external module pexpect, which uses PTYs internally: For example,
pexpect.spawn("/bin/ls --color=auto").read()
will give you the ls output with color codes.
If you'd like to stick to subprocess, then you must use subprocess.Popen for the reason stated above. You are right in your assumption that by passing 1000, you read at most 1000 bytes, so you'll have to use a loop.

os.read blocks if there is nothing to read and waits for data to appear. The catch is how to recognize when the process has terminated: in this case, you know that no more data will arrive, and the next call to os.read would block forever. Luckily, the operating system helps you detect this situation: if all file descriptors to the pseudo-terminal that could be used for writing are closed, then os.read will either return an empty string or raise an error, depending on the OS. You can check for this condition and exit the loop when it happens.

Now the final piece to understanding the following code is to understand how open file descriptors and subprocess go together: subprocess.Popen internally calls fork(), which duplicates the current process including all open file descriptors, and then within one of the two execution paths calls exec(), which replaces the current process image with a new one. In the other execution path, control returns to your Python script. So after calling subprocess.Popen there are two valid file descriptors for the slave end of the PTY: one belongs to the spawned process, one to your Python script. If you close yours, then the only file descriptor that could be used to send data to the master end belongs to the spawned process. Upon its termination, it is closed, and the PTY enters the state where calls to read on the master end fail.
Here's the code:
import os
import pty
import subprocess

master, slave = pty.openpty()
process = subprocess.Popen("/bin/ls --color", shell=True, stdout=slave,
                           stdin=slave, stderr=slave, close_fds=True)
os.close(slave)

output = []
while True:
    try:
        data = os.read(master, 1024)
    except OSError:
        break
    if not data:
        break
    output.append(data)  # In Python 3, append ".decode()" to os.read()
output = "".join(output)

Related

Writing input to a process opened with Popen

I have a program called my_program that operates a system. The program runs on Linux, and I'm trying to automate it using Python.
my_program is constantly generating output and is supposed to receive input and respond to it.
When I run my_program in bash it works like it should: I receive constant output from the program, and when I type a certain sequence (for instance /3 to change the mode of the system), the program responds with an output.
To start the process I am using:
self.process = Popen(my_program,stdin=PIPE,stdout=PIPE,text=True)
And in order to write input to the system I am using:
self.process.stdin.write('/3')
But the writing does not seem to work. I also tried using:
self.process.communicate('/3')
But since my system constantly generates output, this deadlocks the process and the whole program gets stuck.
Any solution for writing to a process that is constantly generating output?
Edit:
I don't think I can provide code that reproduces the problem because I'm using unique software that my company has, but it goes something like this:
self.process = Popen(my_program, stdin=PIPE, stdout=PIPE, text=True)
self.process.stdin.write('/3')
# try to find a specific string that indicates that the input string was received
string_received = False
while not string_received:
    response = self.process.stdout.readline().strip()
    if response == expected_string:
        break
The operating system and the I/O libraries implement buffered I/O between processes unless you specifically request otherwise.
Very briefly, the output buffer will be flushed and written when it fills up, or (with line buffering) when you write a newline.
You can request line buffering when you create the Popen object:
self.process = Popen(my_program, stdin=PIPE, stdout=PIPE, text=True, bufsize=1)
... or you can explicitly flush() the file handle when you want to force writing.
self.process.stdin.flush()
However, as the documentation warns you, if you can't predict when the subprocess can read and when it can write, you can easily end up in deadlock. A more maintainable solution might be to run the subprocess via pexpect or similar.
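Putting those pieces together, a minimal sketch might look like the following. my_program is the placeholder from the question; the trailing newline on the command and the expected_string marker are assumptions about how the program's prompt protocol works, not something the question confirms:

from subprocess import Popen, PIPE

my_program = ["my_program"]        # placeholder command from the question
expected_string = "mode changed"   # hypothetical marker that the input was received

process = Popen(my_program, stdin=PIPE, stdout=PIPE, text=True, bufsize=1)

process.stdin.write('/3\n')   # assuming the program expects a newline-terminated command
process.stdin.flush()         # explicit flush; harmless with line buffering

while True:
    line = process.stdout.readline()
    if not line:                   # EOF: the child closed its stdout
        break
    if line.strip() == expected_string:
        break

Reading line by line keeps the parent from blocking on a full read() while the child keeps producing output, but the deadlock caveat above still applies if the child ever stops reading its stdin.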

Reading output from child process using python

The Context
I am using the subprocess module to start a process from python. I want to be able to access the output (stdout, stderr) as soon as it is written/buffered.
The solution must support Windows 7. I require a solution for Unix systems too but I suspect the Windows case is more difficult to solve.
The solution should support Python 2.6. I am currently restricted to Python 2.6 but solutions using later versions of Python are still appreciated.
The solution should not use third party libraries. Ideally I would love a solution using the standard library but I am open to suggestions.
The solution must work for just about any process. Assume there is no control over the process being executed.
The Child Process
For example, imagine I want to run a python file called counter.py via a subprocess. The contents of counter.py is as follows:
import sys

for index in range(10):
    # Write data to standard out.
    sys.stdout.write(str(index))
    # Push buffered data to disk.
    sys.stdout.flush()
The Parent Process
The parent process responsible for executing the counter.py example is as follows:
import subprocess

cmd = ['python', 'counter.py']
process = subprocess.Popen(
    cmd,
    bufsize=1,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
The Issue
Using the counter.py example I can access the data before the process has completed. This is great! This is exactly what I want. However, removing the sys.stdout.flush() call prevents the data from being accessed at the time I want it. This is bad! This is exactly what I don't want. My understanding is that the flush() call forces the data to be written to disk and before the data is written to disk it exists only in a buffer. Remember I want to be able to run just about any process. I do not expect the process to perform this kind of flushing but I still expect the data to be available in real time (or close to it). Is there a way to achieve this?
A quick note about the parent process. You may notice I am using bufsize=1 for line buffering. I was hoping this would cause a flush to disk for every line but it doesn't seem to work that way. How does this argument work?
You will also notice I am using subprocess.PIPE. This is because it appears to be the only value which produces IO objects between the parent and child processes. I have come to this conclusion by looking at the Popen._get_handles method in the subprocess module (I'm referring to the Windows definition here). There are two important variables, c2pread and c2pwrite, which are set based on the stdout value passed to the Popen constructor. For instance, if stdout is not set, the c2pread variable is not set. This is also the case when using file descriptors and file-like objects.

I don't really know whether this is significant or not, but my gut instinct tells me I would want both read and write IO objects for what I am trying to achieve - this is why I chose subprocess.PIPE. I would be very grateful if someone could explain this in more detail. Likewise, if there is a compelling reason to use something other than subprocess.PIPE I am all ears.
Method For Retrieving Data From The Child Process
import time
import subprocess
import threading
import Queue


class StreamReader(threading.Thread):
    """
    Threaded object used for reading process output stream (stdout, stderr).
    """

    def __init__(self, stream, queue, *args, **kwargs):
        super(StreamReader, self).__init__(*args, **kwargs)
        self._stream = stream
        self._queue = queue

        # Event used to terminate thread. This way we will have a chance to
        # tie up loose ends.
        self._stop = threading.Event()

    def stop(self):
        """
        Stop thread. Call this function to terminate the thread.
        """
        self._stop.set()

    def stopped(self):
        """
        Check whether the thread has been terminated.
        """
        return self._stop.isSet()

    def run(self):
        while True:
            # Flush buffered data (not sure this actually works?)
            self._stream.flush()

            # Read available data.
            for line in iter(self._stream.readline, b''):
                self._queue.put(line)

            # Breather.
            time.sleep(0.25)

            # Check whether thread has been terminated.
            if self.stopped():
                break


cmd = ['python', 'counter.py']
process = subprocess.Popen(
    cmd,
    bufsize=1,
    stdout=subprocess.PIPE,
)

stdout_queue = Queue.Queue()
stdout_reader = StreamReader(process.stdout, stdout_queue)
stdout_reader.daemon = True
stdout_reader.start()

# Read standard out of the child process whilst it is active.
while True:

    # Attempt to read available data.
    try:
        line = stdout_queue.get(timeout=0.1)
        print '%s' % line

    # If data was not read within time out period. Continue.
    except Queue.Empty:
        # No data currently available.
        pass

    # Check whether child process is still active.
    if process.poll() != None:

        # Process is no longer active.
        break

# Process is no longer active. Nothing more to read. Stop reader thread.
stdout_reader.stop()
Here I am performing the logic which reads standard out from the child process in a thread. This allows for the scenario in which the read is blocking until data is available. Instead of waiting for some potentially long period of time, we check whether there is available data, to be read within a time out period, and continue looping if there is not.
I have also tried another approach using a kind of non-blocking read. This approach uses the ctypes module to access Windows system calls. Please note that I don't fully understand what I am doing here - I have simply tried to make sense of some example code I have seen in other posts. In any case, the following snippet doesn't solve the buffering issue. My understanding is that it's just another way to combat a potentially long read time.
import os
import subprocess
import ctypes
import ctypes.wintypes
import msvcrt

cmd = ['python', 'counter.py']
process = subprocess.Popen(
    cmd,
    bufsize=1,
    stdout=subprocess.PIPE,
)


def read_output_non_blocking(stream):
    data = ''
    available_bytes = 0

    c_read = ctypes.c_ulong()
    c_available = ctypes.c_ulong()
    c_message = ctypes.c_ulong()

    fileno = stream.fileno()
    handle = msvcrt.get_osfhandle(fileno)

    # Read available data.
    buffer_ = None
    bytes_ = 0
    status = ctypes.windll.kernel32.PeekNamedPipe(
        handle,
        buffer_,
        bytes_,
        ctypes.byref(c_read),
        ctypes.byref(c_available),
        ctypes.byref(c_message),
    )
    if status:
        available_bytes = int(c_available.value)

    if available_bytes > 0:
        data = os.read(fileno, available_bytes)
        print data

    return data


while True:

    # Read standard out for child process.
    stdout = read_output_non_blocking(process.stdout)
    print stdout

    # Check whether child process is still active.
    if process.poll() != None:

        # Process is no longer active.
        break
Comments are much appreciated.
Cheers
At issue here is buffering by the child process. Your subprocess code already works as well as it could, but if you have a child process that buffers its output then there is nothing that subprocess pipes can do about this.
I cannot stress this enough: the buffering delays you see are the responsibility of the child process, and how it handles buffering has nothing to do with the subprocess module.
You already discovered this; this is why adding sys.stdout.flush() in the child process makes the data show up sooner; the child process uses buffered I/O (a memory cache to collect written data) before sending it down the sys.stdout pipe.[1]
Python automatically uses line-buffering when sys.stdout is connected to a terminal; the buffer flushes whenever a newline is written. When using pipes, sys.stdout is not connected to a terminal and a fixed-size buffer is used instead.
Now, the Python child process can be told to handle buffering differently; you can set an environment variable or use a command-line switch to alter how it uses buffering for sys.stdout (and sys.stderr and sys.stdin). From the Python command line documentation:
-u
Force stdin, stdout and stderr to be totally unbuffered. On systems where it matters, also put stdin, stdout and stderr in binary mode.
[...]
PYTHONUNBUFFERED
If this is set to a non-empty string it is equivalent to specifying the -u option.
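Either option can be applied from the parent process when it spawns the child. A minimal sketch, using the counter.py example from the question:

import os
import subprocess

# Option 1: pass -u on the child's command line.
process = subprocess.Popen(
    ['python', '-u', 'counter.py'],
    stdout=subprocess.PIPE,
)

# Option 2: set PYTHONUNBUFFERED in the child's environment.
env = dict(os.environ, PYTHONUNBUFFERED='1')
process = subprocess.Popen(
    ['python', 'counter.py'],
    stdout=subprocess.PIPE,
    env=env,
)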
If you are dealing with child processes that are not Python processes and you experience buffering issues with those, you'll need to look at the documentation of those processes to see if they can be switched to use unbuffered I/O, or be switched to more desirable buffering strategies.
One thing you could try is to use the script -c command to provide a pseudo-terminal to a child process. This is a POSIX tool, however, and is probably not available on Windows.
[1] It should be noted that when flushing a pipe, no data is 'written to disk'; all data remains entirely in memory here. I/O buffers are just memory caches to get the best performance out of I/O by handling data in larger chunks. Only if you have a disk-based file object would fileobj.flush() cause it to push any buffers to the OS, which usually means that data is indeed written to disk.
expect has a command called 'unbuffer':
http://expect.sourceforge.net/example/unbuffer.man.html
that will disable buffering for any command.
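A hedged sketch of how that might be combined with subprocess; it assumes unbuffer (from the expect package) is installed and on the PATH, and some_command is just a placeholder:

import subprocess

# Prefix the real command with unbuffer so its stdout is attached to a
# pseudo-terminal instead of a block-buffered pipe.
process = subprocess.Popen(
    ['unbuffer', 'some_command', 'arg1'],   # some_command is a placeholder
    stdout=subprocess.PIPE,
)
for line in iter(process.stdout.readline, b''):
    print line,   # Python 2 style, matching the rest of this question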

Python Multiprocessing - sending inputs to child processes

I am using the multiprocessing module in Python to launch a few processes in parallel. These processes are independent of each other. They generate their own output and write the results to different files. Each process calls an external tool using the subprocess.call method.
It was working fine until I discovered an issue in the external tool where, due to some error condition, it goes into a 'prompt' mode and waits for user input. Now in my Python script I use the join method to wait till all the processes finish their tasks. This is causing the whole thing to wait for this erroneous subprocess call. I could put a timeout on each of the processes, but I do not know in advance how long each one is going to run, so this option is ruled out.
How do I figure out if any child process is waiting for user input, and how do I send an 'exit' command to it? Any pointers or suggestions to relevant modules in Python would be really appreciated.
My code here:
import subprocess
import sys
import os
import multiprocessing

def write_script(fname, e):
    f = open(fname, 'w')
    f.write("Some useful command calling external tool")
    f.close()
    subprocess.call(['chmod', '+x', os.path.abspath(fname)])
    return os.path.abspath(fname)

def run_use(mname, script):
    print "ssh "+mname+" "+script
    subprocess.call(['ssh', mname, script])

if __name__ == '__main__':
    dict1 = {}
    dict1['mod1'] = ['pp1', 'ext2', 'les3', 'pw4']
    dict1['mod2'] = ['aaa', 'bbb', 'ccc', 'ddd']
    machines = ['machine1', 'machine2', 'machine3', 'machine4']
    log_file.write(str(dict1.keys()))

    for key in dict1.keys():
        arr = []
        for mod in dict1[key]:
            d = {}
            arr.append(mod)
            if (mod == dict1[key][-1]) | (len(arr) % 4 == 0):
                for i in range(0, len(arr)):
                    e = arr.pop()
                    script = write_script(e + "_temp.sh", e)
                    d[i] = multiprocessing.Process(target=run_use, args=(machines[i], script,))
                    d[i].daemon = True
                for pp in d:
                    d[pp].start()
                for pp in d:
                    d[pp].join()
Since you're writing a shell script to run your subcommands, can you simply tell them to read input from /dev/null?
#!/bin/bash
# ...
my_other_command -a -b arg1 arg2 < /dev/null
# ...
This may stop them blocking on input and is a really simple solution. If this doesn't work for you, read on for some other options.
The subprocess.call() function is simply shorthand for constructing a subprocess.Popen instance and then calling the wait() method on it. So, your spawned processes could instead create their own subprocess.Popen instances and poll them with the poll() method on the object instead of wait() (in a loop with a suitable delay). This leaves them free to remain in communication with the main process so you can, for example, allow the main process to tell the child process to terminate the Popen instance with the terminate() or kill() methods and then itself exit.
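A minimal sketch of that Popen/poll()/terminate() variant of run_use(). The fixed time limit here is only a stand-in for whatever signal your main process would actually use to request termination; it is not part of the original code:

import time
import subprocess

def run_use(mname, script, silence_limit=30):
    # Spawn the command ourselves instead of using subprocess.call(),
    # so we keep the Popen object and stay in control of it.
    proc = subprocess.Popen(['ssh', mname, script])
    waited = 0
    while proc.poll() is None:       # still running
        time.sleep(1)
        waited += 1
        if waited > silence_limit:   # stand-in for "main process told us to stop"
            proc.terminate()         # or proc.kill() if SIGTERM is ignored
            break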
So, the question is how does the child process tell whether the subprocess is awaiting user input, and that's a trickier question. I would say perhaps the easiest approach is to monitor the output of the subprocess and search for the user input prompt, assuming that it always uses some string that you can look for. Alternatively, if the subprocess is expected to generate output continually then you could simply look for any output and if a configured amount of time goes past without any output then you declare that process dead and terminate it as detailed above.
Since you're reading the output, actually you don't need poll() or wait() - the process closing its output file descriptor is good enough to know that it's terminated in this case.
Here's an example of a modified run_use() method which watches the output of the subprocess:
def run_use(mname, script):
    print "ssh "+mname+" "+script
    proc = subprocess.Popen(['ssh', mname, script], stdout=subprocess.PIPE)
    for line in proc.stdout:
        if "UserPrompt>>>" in line:
            proc.terminate()
            break
In this example we assume that the process either gets hung up on UserPrompt>>> (replace with the appropriate string) or it terminates naturally. If it were to get stuck in an infinite loop, for example, then your script would still not terminate - you can only really address that with an overall timeout, but you didn't seem keen to do that. Hopefully your subprocess won't misbehave in that way, however.
Finally, if you don't know in advance the prompt that will be given by your process, then your job is rather harder. Effectively what you're asking to do is monitor an external process and know when it's blocked reading on a file descriptor, and I don't believe there's a particularly clean solution to this. You could consider running a process under strace or similar, but that's quite an awful hack and I really wouldn't recommend it. Things like strace are great for manual diagnostics, but they really shouldn't be part of a production setup.

About piping stdio and subprocess.Popen

I have one Python program that opens another Python program via subprocess.Popen. The 1st is supposed to output some text to the console (just for info) and write some text to the 2nd program it spawned. Then, it should wait for the 2nd program to respond (read() from it) and print that response.
The 2nd one is supposed to listen to the first one's input (via raw_input()) and then print text to the 1st.
To understand what exactly was happening, I had put a 5 second delay into the 2nd, and the result surprised me a bit.
Here's the code:
import subprocess
print "1st starting."
app = subprocess.Popen("name", shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE) #<--- B
print "Writing something to app's STDIN..."
app.stdin.write(some_text)
print "Reading something from my STDIN..." #<--- A
result = app.stdout.read()
print "Result:"
print result
And for the 2nd one:
import time
print "app invoked."
print "Waiting for text from STDIN..."
text = raw_input()
#process(text)
time.sleep(5)
print "magic"
When I ran this code, it paused at point A, as that was the last console output.
After 5 seconds, the "Result:\n" line would be outputted, and everything the 2nd program had printed would show up in the console.
Why did the 1st program pause when reading the stdout of the 2nd one? Does it have to wait for its child to terminate before reading its output? How can this be changed so I can pass messages between programs?
I'm running Debian Linux 7.0.
The answer lies not in any magic related to the subprocess module, but in the typical behaviour of the read() method on Python objects.
If you run this:
import subprocess
p = subprocess.Popen(['ls'], stdout=subprocess.PIPE)
help(p.stdout.read)
You'll see this:
read(...)
    read([size]) -> read at most size bytes, returned as a string.

    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.
(END)
The same thing applies to all file-like objects. It's very simple: calling read() with no argument reads the stream until EOF is reached.
EOF is not sent until either:
the subprocess calls sys.stdout.close(), or
the subprocess exits and the Python runtime and/or OS kernel clean up its file descriptors
Beware that os.read has different behaviour - much more like typical buffered I/O in C. The built-in Python help function is useless, but if you're on any UNIXy system you should be able to run man 3 read; the Python behaviour more or less matches what's there.
A word of warning
The program above is fine, but patterns like that sometimes lead to a deadlock. The docs for the subprocess module warns about this where Popen.wait() is documented:
Warning
This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
It's possible to get in a similar situation if you're not careful during two-way communication with a subprocess, depending on what the subprocess is doing.
edit:
By the way, this page covers the behaviour of pipes with EOF:
If all file descriptors referring to the write end of a pipe have been
closed, then an attempt to read(2) from the pipe will see end-of-file
(read(2) will return 0).
edit 2:
As Lennart mentioned in the other answer, if you want truly two-way communication that goes beyond write-once read-once, you'll also need to beware of buffering. If you read this you'll get some idea of it, but you should be aware that this is how buffered IO almost always works in UNIX-based systems - it's not a Python quirk. Run man stdio.h for more information.
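A minimal sketch of line-by-line two-way exchange, under the assumption that the child answers each input line with exactly one output line. Here "child.py" is a hypothetical stand-in for the question's second program, and it must flush its own stdout after printing (or be started with python -u, as shown):

import subprocess

app = subprocess.Popen(["python", "-u", "child.py"],
                       stdin=subprocess.PIPE, stdout=subprocess.PIPE)

app.stdin.write("hello\n")        # the trailing newline lets raw_input() return
app.stdin.flush()                 # defeat block buffering on our side
result = app.stdout.readline()    # read one line instead of read()ing to EOF
print "Result:", result.strip()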
You are asking program 1 to read input from program 2. And you are pausing program two for five seconds before it outputs anything. Obviously program 1 then needs to wait those five seconds. So what happens is perfectly expected.
Does it have to wait for its child to terminate before reading its output?
To some extent, yes, because input and output is buffered, so it's possible that even if you move the delay to after you print something the same will happen.
raw_input() will wait for a linefeed, in any case.

Best way to fork multiple shell commands/processes in Python?

Most of the examples I've seen with os.fork and the subprocess/multiprocessing modules show how to fork a new instance of the calling Python script or a chunk of Python code. What would be the best way to spawn a set of arbitrary shell commands concurrently?
I suppose I could just use subprocess.call or one of the Popen commands and pipe the output to a file, which I believe will return immediately, at least to the caller. I know this is not that hard to do; I'm just trying to figure out the simplest, most Pythonic way to do it.
Thanks in advance
All calls to subprocess.Popen return immediately to the caller. It's the calls to wait and communicate which block. So all you need to do is spin up a number of processes using subprocess.Popen (set stdin to /dev/null for safety), and then one by one call communicate until they're all complete.
Naturally I'm assuming you're just trying to start a bunch of unrelated (i.e. not piped together) commands.
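For example, a minimal sketch of that pattern (the commands here are arbitrary illustrations):

import os
import subprocess

devnull = open(os.devnull, 'r')
commands = [['sleep', '2'], ['ls', '-l'], ['uname', '-a']]   # arbitrary examples

# Start everything first; Popen returns immediately.
procs = [subprocess.Popen(cmd, stdin=devnull, stdout=subprocess.PIPE)
         for cmd in commands]

# Then collect the results one by one; communicate() blocks until
# that particular process has finished and returns its output.
for proc in procs:
    out, _ = proc.communicate()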
I like to use PTYs instead of pipes. For a bunch of processes where I only want to capture error messages I did this.
RNULL = open('/dev/null', 'r')
WNULL = open('/dev/null', 'w')
logfile = open("myprocess.log", "a", 1)
REALSTDERR = sys.stderr
sys.stderr = logfile
This next part was in a loop spawning about 30 processes.
sys.stderr = REALSTDERR
master, slave = pty.openpty()
self.subp = Popen(self.parsed, shell=False, stdin=RNULL, stdout=WNULL, stderr=slave)
sys.stderr = logfile
After this I had a select loop which collected any error messages and sent them to the single log file. Using PTYs meant that I never had to worry about partial lines getting mixed up because the line discipline provides simple framing.
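The answer doesn't show the select loop itself; a hedged sketch of what it might look like, assuming the master fds from each openpty() call were collected in a list, is:

import os
import select

def drain_errors(masters, logfile):
    """Collect stderr output from every PTY master fd until all children close."""
    open_fds = list(masters)
    while open_fds:
        readable, _, _ = select.select(open_fds, [], [])
        for fd in readable:
            try:
                data = os.read(fd, 1024)
            except OSError:          # e.g. EIO once the slave side is closed
                data = b''
            if not data:
                open_fds.remove(fd)  # that child is finished
            else:
                logfile.write(data)  # on Python 3, decode or open the log in binary mode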
There is no best for all possible circumstances. The best depends on the problem at hand.
Here's how to spawn a process and save its output to a file combining stdout/stderr:
import os
import subprocess
import sys

def spawn(cmd, output_file):
    on_posix = 'posix' in sys.builtin_module_names
    return subprocess.Popen(cmd, close_fds=on_posix, bufsize=-1,
                            stdin=open(os.devnull, 'rb'),
                            stdout=output_file,
                            stderr=subprocess.STDOUT)
To spawn multiple processes that can run in parallel with your script and each other:
processes, files = [], []
try:
    for i, cmd in enumerate(commands):
        files.append(open('out%d' % i, 'wb'))
        processes.append(spawn(cmd, files[-1]))
finally:
    for p in processes:
        p.wait()
    for f in files:
        f.close()
Note: cmd is a list everywhere.
I suppose, I could just use subprocess.call or one of the Popen
commands and pipe the output to a file, which I believe will return
immediately, at least to the caller.
That's not a good way to do it if you want to process the data.
In this case, better do
sp = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE)
and then sp.communicate() or read directly from sp.stdout.read().
If the data shall be processed in the calling program at a later time, there are two ways to go:
You can retrieve the data ASAP, maybe via a separate thread, reading them and storing them somewhere where the consumer can get them.
You can have the producing subprocess block and retrieve the data from it when you need it. The subprocess produces as much data as fits in the pipe buffer (usually 64 kiB) and then blocks on further writes. As soon as you need the data, you read() from the subprocess object's stdout (maybe stderr as well) and use it - or, again, you use sp.communicate() at that later time.
Way 1 would be the way to go if producing the data takes much time, so that your program would have to wait.
Way 2 would be preferred if the size of the data is quite huge and/or the data is produced so fast that buffering would make no sense.
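As a minimal sketch of way 2:

import subprocess

sp = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE)

# ... do other work; ls runs concurrently and blocks once the pipe
# buffer (usually 64 kiB) is full ...

out, _ = sp.communicate()   # collect everything when the data is finally needed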
See an older answer of mine, including code snippets, that does the following:
Uses processes, not threads, for blocking I/O because they can be terminated more reliably (p.terminate())
Implements a retriggerable timeout watchdog that restarts counting whenever some output happens
Implements a long-term timeout watchdog to limit overall runtime
Can feed in stdin (although I only need to feed in one-time short strings)
Can capture stdout/stderr in the usual Popen means (only stdout is coded, and stderr is redirected to stdout; but they can easily be separated)
It's almost realtime because it only checks every 0.2 seconds for output. But you could decrease this or remove the waiting interval easily
Lots of debugging printouts still enabled to see what's happening and when.
For spawning multiple concurrent commands, you would need to alter the class RunCmd to instantiate multiple read output/write input queues and to spawn multiple Popen subprocesses.
