Python subprocess.Popen pipe IO blocking unexpectedly

I am trying to use subprocess.Popen to control an ssh process and interact with it via pipes, like so:
import subprocess

p = subprocess.Popen(['ssh', '-tt', 'LOGIN@HOSTNAME'], stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     universal_newlines=True)
while True:
    (out_stdout, out_stderr) = p.communicate(timeout=10)
    if out_stderr:
        print(out_stderr)
    if not out_stdout:
        raise EOFError
    print(out_stdout)
This works fine without the '-tt' option to ssh. However, the program I need to interact with on the remote side of the ssh breaks if there is no pseudo-tty allocated, so I am forced to use it.
With '-tt', though, the p.communicate() call blocks indefinitely (or until the timeout), even when output is available to read.
I have rewritten this using lower-level calls to io.read, select.select etc. to avoid going through Popen.communicate. select will actually report the file descriptor as ready, but a subsequent io.read on that file descriptor still blocks. If I disable universal_newlines and set bufsize=0 in the Popen call it then works fine, but then I am forced to do the binary/unicode conversion and line-ending processing myself.
It's worth saying, though, that disabling universal_newlines in the p.communicate version also blocks indefinitely, so it's not just that.
Any advice on how I can get line buffered input working properly here without having to reimplement everything?
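For reference, the lower-level workaround described above looks roughly like this (a sketch only, error handling omitted; the manual decoding and line splitting at the end are exactly the parts I would rather not reimplement):
import os
import select
import subprocess

p = subprocess.Popen(['ssh', '-tt', 'LOGIN@HOSTNAME'],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE, bufsize=0)
buf = b''
while True:
    ready, _, _ = select.select([p.stdout], [], [], 10)
    if not ready:
        continue
    chunk = os.read(p.stdout.fileno(), 4096)
    if not chunk:
        raise EOFError
    buf += chunk
    *lines, buf = buf.split(b'\n')   # manual line-ending handling
    for line in lines:
        print(line.decode())         # manual binary/unicode conversion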

There are library alternatives to subprocess and the SSH binary that are better suited for such tasks.
parallel-ssh:
from pssh.pssh2_client import ParallelSSHClient

client = ParallelSSHClient(['HOSTNAME'], user='LOGIN')
output = client.run_command('echo', use_pty=True)
for host, host_output in output.items():
    for line in host_output.stdout:
        print(line)
Replace echo with the command you need to run, or leave it as-is if no command is required; the library requires that some command be passed in even if the remote side runs something automatically.
See also the documentation for the single-host SSHClient in the same project.
Per the documentation, line parsing and encoding are handled by the library, which is also cross-platform.
There are others like paramiko and ssh2-python that are lower level and need more code for the equivalent above - see their respective home pages for examples.
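For example, a rough paramiko sketch with a pseudo-tty might look like this (HOSTNAME, LOGIN and the echo command are placeholders; key/agent authentication and a permissive host-key policy are assumed for brevity):
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # simplification
client.connect('HOSTNAME', username='LOGIN')

# get_pty=True allocates a pseudo-tty, roughly what `ssh -tt` does
stdin, stdout, stderr = client.exec_command('echo hello', get_pty=True)
for line in stdout:
    print(line, end='')
client.close()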

Related

missing stdout before subprocess.Popen crash [duplicate]

I am using a 3rd-party Python module which is normally called through terminal commands. When called that way it has a verbose option which prints to the terminal in real time.
I then have another python program which calls the 3rd-party program through subprocess. Unfortunately, when called through subprocess the terminal output no longer flushes, and is only returned on completion (the process takes many hours so I would like real-time progress).
I can see the source code of the 3rd-party module and it does not set printing to be flushed such as print('example', flush=True). Is there a way to force the flushing through my module without editing the 3rd-party source code? Furthermore, can I send this output to a log file (again in real time)?
Thanks for any help.
The issue is most likely that many programs work differently if run interactively in a terminal or as part of a pipeline (i.e. called using subprocess). It has very little to do with Python itself, but more with the Unix/Linux architecture.
As you have noted, it is possible to force a program to flush stdout even when run in a pipeline, but it requires changes to the source code, by manually adding stdout.flush() calls.
Another way to get output to the screen is to "trick" the program into thinking it is working with an interactive terminal, using a so-called pseudo-terminal. There is a supporting module for this in the Python standard library, namely pty. Using that, you do not explicitly call subprocess.run (or Popen or ...); instead you use the pty.spawn call:
import os
import pty

def prout(fd):
    # Read from the pty master and echo everything to our own stdout.
    data = os.read(fd, 1024)
    while data:
        print(data.decode(), end="")
        data = os.read(fd, 1024)

pty.spawn("./callee.py", prout)
As can be seen, this requires a special function for handling stdout. Above, I just print it to the terminal, but of course it is possible to do other things with the text as well (such as logging or parsing it).
Another way to trick the program is to use an external program called unbuffer. unbuffer takes your command as its argument and makes the program think (as with the pty call) that it is called from a terminal. This is arguably simpler if unbuffer is installed, or you are allowed to install it, on your system (it is part of the expect package). All you have to do then is to change your subprocess call to
p = subprocess.Popen(["unbuffer", "./callee.py"], stdout=subprocess.PIPE)
and then of course handle the output as usual, e.g. with some code like
for line in p.stdout:
    print(line.decode(), end="")
print(p.communicate()[0].decode(), end="")
or similar. But this last part I think you have already covered, as you seem to be doing something with the output.

Execute scp using Popen without having to enter password

I have the following script
test.py
#!/usr/bin/env python2
from subprocess import Popen, PIPE, STDOUT
proc = Popen(['scp', 'test_file', 'user@192.168.120.172:/home/user/data'], stdout=PIPE, stdin=PIPE, stderr=STDOUT)
out, err = proc.communicate(input='userpass\n')
print('stdout: ' + out)
print('stderr: ' + str(err))
which is meant to copy test_file to a remote directory /home/user/data located at 10.0.0.2, logging in as a given user user. In order to do that I must use scp. No key authentication is allowed (don't ask why, it's just how things are and I cannot change it).
Even though I am piping userpass to the process I still get a prompt inside the terminal to enter password. I want to just run test.py on the local machine and then the remote gets the file without any user interaction.
I thought that I wasn't using communicate() correctly, so I manually called
proc.stdin.write('userpass\n')
proc.stdin.flush()
out, err = proc.communicate()
but nothing changed and I still got that password prompt.
When scp or ssh attempt to read a password they do not read it from stdin. Instead they open /dev/tty and read the password directly from the connected terminal.
sshpass works by creating its own dummy terminal and spawning ssh or scp in a child process controlled by that terminal. That's basically the only way to intercept the password prompt. The recommended solution is to use public key authentication, but you say you cannot do that.
If as you say you cannot install sshpass and also cannot use a secure form of authentication then about the only thing you can do is re-implement sshpass in your own code. sshpass itself is licensed under the GPL, so if you copy the existing code be sure not to infringe on its copyleft.
Here's the comment from the sshpass source which describes how it manages to spoof the input:
/*
Comment no. 3.14159
This comment documents the history of code.
We need to open the slavept inside the child process, after "setsid", so that it becomes the controlling
TTY for the process. We do not, otherwise, need the file descriptor open. The original approach was to
close the fd immediately after, as it is no longer needed.
It turns out that (at least) the Linux kernel considers a master ptty fd that has no open slave fds
to be unused, and causes "select" to return with "error on fd". The subsequent read would fail, causing us
to go into an infinite loop. This is a bug in the kernel, as the fact that a master ptty fd has no slaves
is not a permenant problem. As long as processes exist that have the slave end as their controlling TTYs,
new slave fds can be created by opening /dev/tty, which is exactly what ssh is, in fact, doing.
Our attempt at solving this problem, then, was to have the child process not close its end of the slave
ptty fd. We do, essentially, leak this fd, but this was a small price to pay. This worked great up until
openssh version 5.6.
Openssh version 5.6 looks at all of its open file descriptors, and closes any that it does not know what
they are for. While entirely within its prerogative, this breaks our fix, causing sshpass to either
hang, or do the infinite loop again.
Our solution is to keep the slave end open in both parent AND child, at least until the handshake is
complete, at which point we no longer need to monitor the TTY anyways.
*/
So what sshpass does is open a pseudo-terminal device (using posix_openpt), fork, and in the child process make the slave end the controlling terminal for the process. Then it can exec the scp command.
I don't know if you can get this to work from Python, but the good news is the standard library does include functions for working with pseudo terminals: https://docs.python.org/3.6/library/pty.html
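As a rough illustration only (an untested sketch: the 'password:' prompt match and the lack of error handling are assumptions, and host-key checking is not addressed):
import os
import pty

def scp_with_password(argv, password):
    # Fork with a pseudo-terminal; the child's controlling TTY is the pty
    # slave, so scp's reads from /dev/tty come from the master we hold.
    pid, master_fd = pty.fork()
    if pid == 0:
        os.execvp(argv[0], argv)             # child: becomes scp
    output = b''
    while b'password:' not in output.lower():
        output += os.read(master_fd, 1024)   # wait for the password prompt
    os.write(master_fd, (password + '\n').encode())
    try:
        while True:                          # drain remaining output
            if not os.read(master_fd, 1024):
                break
    except OSError:
        pass                                 # master raises EIO once scp exits
    os.waitpid(pid, 0)

scp_with_password(['scp', 'test_file', 'user@192.168.120.172:/home/user/data'],
                  'userpass')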

python 2.7 Popen: what does `close_fds` do?

I have a web server in Python (2.7) that uses Popen to delegate some work to a child process:
import tempfile
from subprocess import Popen, PIPE

url_arg = "http://localhost/index.html?someparam=somevalue"
call = ('phantomjs', 'some/phantom/script.js', url_arg)
imageB64data = tempfile.TemporaryFile()
errordata = tempfile.TemporaryFile()
p = Popen(call, stdout=imageB64data, stderr=errordata, stdin=PIPE)
p.communicate(input="")
I am seeing intermittent issues where after some number of these Popens have occurred (roughly 64), the process runs out of file descriptors and is unable to function -- it becomes completely unresponsive and all threads seem to block forever if they attempt to open any files or sockets.
(Possibly relevant: the phantomjs child process loads a URL that calls back into the server that spawned it.)
Based on this Python bug report, I believe I need to set close_fds=True on all Popen calls from inside my server process in order to mitigate the leaking of file descriptors. However, I am unfamiliar with the machinery around exec-ing subprocesses and inheritance of file descriptors so much of the Popen documentation and the notes in the aforementioned bug report are unclear to me.
It sounds like it would actually close all open file descriptors (which includes active request sockets, log file handles, etc.) in my process before executing the subprocess. This sounds like it would be strictly better than leaking the sockets, but would still result in errors.
However, in practice, when I use close_fds=True during a web request, it seems to work fine and thus far I have been unable to construct a scenario where it actually closes any other request sockets, database requests, etc.
The docs state:
If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed.
So my question is: is it "safe" and "correct" to pass close_fds=True to Popen in a multithreaded Python web server? Or should I expect this to have side effects if other requests are doing file/socket IO at the same time?
I tried the following test with the subprocess32 backport of Python 3.2/3.3's subprocess:
import tempfile
import subprocess32 as subprocess
fp = open('test.txt', 'w')
fp.write("some stuff")
echoed = tempfile.TemporaryFile()
p = subprocess.Popen(("echo", "this", "stuff"), stdout=echoed, close_fds=True)
p.wait()
echoed.seek(0)
fp.write("whatevs")
fp.write(echoed.read())
fp.close()
and I got the expected result of some stuffwhatevsecho this stuff in test.txt.
So it appears that close in close_fds does not mean that open files (sockets, etc.) in the parent process will be unusable after executing a child process.
Also worth noting: subprocess32 defaults close_fds=True on POSIX systems, AFAICT. This implies to me that it is not as dangerous as it sounds.
I suspect that close_fds solves the problem of file descriptors leaking to subprocesses. Imagine opening a file, and then running some task using subprocess. Without close_fds, the file descriptor is copied to the subprocess, so even if the parent process closes the file, the file remains open due to the subprocess. Now, let's say we want to delete the directory with the file in another thread using shutil.rmtree. On a regular filesystem, this should not be an issue. The directory is just removed as expected. However, when the file resides on NFS, the following happens: First, Python will try to delete the file. Since the file is still in use, it gets renamed to .nfsXXX instead, where XXX is a long hexadecimal number. Next, Python will try to delete the directory, but that has become impossible because the .nfsXXX file still resides in it.
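A hypothetical Python 2.7 sketch of that leak (the file name and the sleep command are placeholders, not taken from the question):
import subprocess

# The parent opens a file, then starts a long-running child. On Python 2.7
# close_fds defaults to False, so the child inherits the descriptor.
f = open('report.tmp', 'w')
child = subprocess.Popen(['sleep', '600'])
f.close()   # the child still holds a copy; on NFS the file lingers as .nfsXXX

# With close_fds=True the inherited copy is closed in the child before exec,
# so closing the file in the parent really releases it.
f = open('report.tmp', 'w')
child = subprocess.Popen(['sleep', '600'], close_fds=True)
f.close()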

Python subprocess interaction, why does my process work with Popen.communicate, but not Popen.stdout.read()?

I am trying to communicate with a command-line chat bot with Python using the subprocess module. (http://howie.sourceforge.net/ using the compiled win32 binary, I have my reasons!)
This works:
proc = Popen('Howie/howie.exe', stdout=PIPE, stderr=STDOUT, stdin=PIPE)
output = proc.communicate()
But Popen.communicate waits for the process to terminate (and sends it EOF?), whereas I want to be able to interact with it. The apparent solution for this was to read stdout / write stdin like so:
This doesn't work:
proc = Popen('Howie/howie.exe', stdout=PIPE, stderr=STDOUT, stdin=PIPE)
while True:
    print proc.stdout.readline()
(Note that I am actually using more complex code based on http://code.activestate.com/recipes/440554/ but the issue is the same.)
The problem is, the second approach works perfectly for communicating with cmd, but when I run the chatbot, nothing. So my question is: how does this differ from Popen.communicate() in the way it captures output?
i.e. I can use the second approach to use the command line as per normal, until I run the chatbot, at which point I stop receiving output. Using the first approach correctly displays the first few lines of output from the bot, but leaves me unable to interact with it.
One major difference between the two is that communicate() closes stdin after sending the data. I don't know about your particular case, but in many cases this means that if a process is waiting for the end of user input, it will get it when communicate() is used, and will never get it when the code blocks on read() or readline().
Try adding Popen.stdin.close() first and see if it affects your case.
If you want to interact with the program after sending the EOF, rather than using Popen.stdin.close(), you can manually send the command-line end-of-file character, which has the same effect but leaves stdin open.
In Python its escape sequence is '\x1a' (Ctrl-Z, the Windows console end-of-file character).
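As a rough sketch (untested; whether this particular bot treats '\x1a' as end-of-input is an assumption):
from subprocess import Popen, PIPE, STDOUT

proc = Popen('Howie/howie.exe', stdout=PIPE, stderr=STDOUT, stdin=PIPE)
proc.stdin.write('hello bot\n')
proc.stdin.write('\x1a')       # Ctrl-Z instead of closing stdin
proc.stdin.flush()
print proc.stdout.readline()   # stdin stays open for the next exchange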

Communicating multiple times w/ subprocess (multiple calls to stdout)

There's a similar question to mine on another thread.
I want to send a command to my subprocess, interpret the response, then send another command. It would seem a shame to have to start a new subprocess to accomplish this, particularly if subprocess2 must perform many of the same tasks as subprocess1 (e.g. ssh, open mysql).
I tried the following:
subprocess1.stdin.write([my commands])
subprocess1.stdin.flush()
subprocess1.stdout.read()
But without a definite parameter for bytes to read(), the program gets stuck executing that instruction, and I can't supply an argument for read() because I can't guess how many bytes are available in the stream.
I'm running WinXP, Py2.7.1
EDIT
Credit goes to @regularfry for giving me the best solution for my real intention (read the comments in his response, as they pertain to accomplishing my goal through an SSH tunnel). (His/her answer has been voted up.) For the benefit of any viewer who hereafter comes for an answer to the title question, however, I've accepted @Mike Pennington's answer.
Your choices are:
Use a line-oriented protocol (and use readline() rather than read()), and ensure that every possible line sent is a valid message (a sketch of this option follows the list);
Use read(1) and a parser to tell you when you've read a full message; or
Pickle message objects into the stream from the subprocess, then unpickle them in the parent. This handles the message length problem for you.
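For example, the line-oriented option might look roughly like this (a sketch; the worker command and the one-reply-line-per-command convention are assumptions, not from the question):
from subprocess import Popen, PIPE

proc = Popen(['python', '-u', 'worker.py'], stdin=PIPE, stdout=PIPE,
             universal_newlines=True)

def send(cmd):
    proc.stdin.write(cmd + '\n')
    proc.stdin.flush()
    return proc.stdout.readline().rstrip('\n')   # read exactly one reply line

print(send('first command'))
print(send('second command'))
proc.stdin.close()
proc.wait()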
@JellicleCat, I'm following up on the comments. I believe wexpect is part of sage... AFAIK it is not packaged separately, but you can download wexpect here.
Honestly, if you're going to drive programmatic ssh sessions, use paramiko. It is supported as an independent installation, has good packaging, and should install natively on Windows.
EDIT
Sample paramiko script to cd to a directory, execute an ls and exit... capturing all results...
import sys
sys.stderr = open('/dev/null')  # Silence silly warnings from paramiko
import paramiko as pm
sys.stderr = sys.__stderr__
import os

class AllowAllKeys(pm.MissingHostKeyPolicy):
    def missing_host_key(self, client, hostname, key):
        return

HOST = '127.0.0.1'
USER = ''
PASSWORD = ''

client = pm.SSHClient()
client.load_system_host_keys()
client.load_host_keys(os.path.expanduser('~/.ssh/known_hosts'))
client.set_missing_host_key_policy(AllowAllKeys())
client.connect(HOST, username=USER, password=PASSWORD)

channel = client.invoke_shell()
stdin = channel.makefile('wb')
stdout = channel.makefile('rb')
stdin.write('''
cd tmp
ls
exit
''')
print stdout.read()
stdout.close()
stdin.close()
client.close()
This approach will work (I've done this) but will take some time and it uses Unix-specific calls. You'll have to abandon the subprocess module and roll your own equivalent based on fork/exec and os.pipe().
Use the fcntl.fcntl function to place the stdin/stdout file descriptors (read and write) for your child process into non-blocking mode (O_NONBLOCK option constant) after creating them with os.pipe().
Use the select.select function to poll or wait for availability on your file descriptors. To avoid deadlocks you will need to use select() to ensure that writes will not block, just like reads. Even still, you must account for OSError exceptions when you read and write, and retry when you get EAGAIN errors. (Even when using select before read/write, EAGAIN can occur in non-blocking mode; this is a common kernel bug that has proven difficult to fix.)
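A rough illustration of the non-blocking read side only (a sketch, not the full fork/exec rewrite described above; 'some_command' is a placeholder and the write side is omitted):
import errno
import fcntl
import os
import select
import subprocess

proc = subprocess.Popen(['some_command'], stdout=subprocess.PIPE,
                        stdin=subprocess.PIPE, bufsize=0)

# Put the read end of the stdout pipe into non-blocking mode.
fd = proc.stdout.fileno()
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

while proc.poll() is None:
    readable, _, _ = select.select([fd], [], [], 1.0)
    if not readable:
        continue
    try:
        chunk = os.read(fd, 4096)
    except OSError as e:
        if e.errno == errno.EAGAIN:   # spurious readiness: just retry
            continue
        raise
    if chunk:
        print(chunk.decode(), end='')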
If you are willing to build on the Twisted framework, they have supposedly solved this problem for you; all you have to do is write a ProcessProtocol subclass. But I haven't tried that myself yet.
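For reference, a minimal Twisted sketch of that idea ('some_command' is a placeholder; untested here):
from twisted.internet import protocol, reactor

class LineWatcher(protocol.ProcessProtocol):
    def outReceived(self, data):
        # Called whenever the child writes to its stdout; parse or log here.
        print(data.decode())

    def processEnded(self, reason):
        reactor.stop()

reactor.spawnProcess(LineWatcher(), 'some_command', ['some_command'])
reactor.run()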
