prevent unexpected stdin reads and lock in subprocess - python

A simple case I'm trying to solve for all situations.
I am running a subprocess to perform a certain task, and I don't expect it to ask for stdin, but in rare cases that I might not even anticipate, it might try to read from it.
I would like to prevent it from hanging in that case.
Here is a classic example:
import subprocess
p = subprocess.Popen(["unzip", "-tqq", "encrypted.zip"])
p.wait()
This will hang forever.
I have already tried adding
stdin=open(os.devnull)
and similar, with no luck.
I will post if I find a working solution.
It would be enough for me to receive an exception in the parent process, instead of hanging on communicate()/wait() endlessly.
Update: it seems the problem might be more complicated than I initially expected. The subprocess (for password prompts and other cases) reads from other file descriptors, like /dev/tty, to interact with the terminal directly. It might not be as easy to solve as I thought.

If your child process may ask for a password, it may do so outside of the standard input/output/error streams if a tty is available; see the first reason in Q: Why not just use a pipe (popen())?
As you've noticed, creating a new session prevents the subprocess from using the parent's tty. For example, if you have an ask-password.py script:
#!/usr/bin/env python
"""Ask for password. It defaults to working with a terminal directly."""
from getpass import getpass
try:
    _ = getpass()
except EOFError:
    pass # ignore
else:
    assert 0
then, to call it as a subprocess so that it does not hang waiting for the password, you could use the start_new_session=True parameter:
#!/usr/bin/env python3
import subprocess
import sys
subprocess.check_call([sys.executable, 'ask-password.py'],
                      stdin=subprocess.DEVNULL, start_new_session=True,
                      stderr=subprocess.DEVNULL)
stderr is redirected here too because getpass() uses it as a fallback, to print warnings and the prompt.
To emulate start_new_session=True on Unix on Python 2, you could use preexec_fn=os.setsid.
To emulate subprocess.DEVNULL on Python 2, you could use DEVNULL=open(os.devnull, 'r+b', 0) or pass stdin=PIPE and close it immediately using .communicate():
#!/usr/bin/env python2
import os
import sys
from subprocess import Popen, PIPE
Popen([sys.executable, 'ask-password.py'],
      stdin=PIPE, preexec_fn=os.setsid,
      stderr=PIPE).communicate() #NOTE: assume small output on stderr
Note: you don't need .communicate() unless you use subprocess.PIPE. check_call() is perfectly safe if you use an object with a real file descriptor (.fileno()) such as returned by open(os.devnull, ..). The redirection occurs before the child process is executed (after fork(), before exec()) -- there is no reason to use .communicate() instead of check_call() here.
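Putting the pieces of this answer together, here is a minimal Python 3 sketch (POSIX only, Python 3.5+ for subprocess.run()). The inline child script and the run_without_input() helper are hypothetical stand-ins for unzip or ask-password.py. The child tries both stdin and /dev/tty: with stdin=DEVNULL its read gets EOF at once, and after start_new_session=True it has no controlling terminal, so opening /dev/tty fails. The timeout is an extra last-resort guard that is not part of the answer above.

```python
import subprocess
import sys

# Hypothetical child: reports what it reads from stdin and whether /dev/tty opens.
CHILD = r'''
import sys
data = sys.stdin.read()          # EOF at once when stdin is /dev/null
try:
    open("/dev/tty").close()
    tty = "opened"
except OSError:
    tty = "unavailable"          # no controlling terminal in a new session
print(repr(data), tty)
'''

def run_without_input(args, timeout=10):
    """Run args with no usable stdin and no controlling tty; return its stdout."""
    return subprocess.run(
        args,
        stdin=subprocess.DEVNULL,     # reads get EOF instead of blocking
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,    # getpass() falls back to stderr for its prompt
        start_new_session=True,       # setsid(): the child loses the parent's tty
        timeout=timeout,              # raises TimeoutExpired if it still hangs
        check=True,                   # raises CalledProcessError on nonzero exit
        universal_newlines=True,
    ).stdout

if __name__ == "__main__":
    print(run_without_input([sys.executable, "-c", CHILD]), end="")
```

Instead of hanging, a child that insists on a password now either fails (raising CalledProcessError via check=True) or, at worst, hits the timeout, which gives the parent the exception the question asks for.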

Apparently the culprit is the direct use of /dev/tty and the like.
On Linux at least, one solution is to add the following parameter to the Popen call:
preexec_fn=os.setsid
which makes the child the leader of a new session with no controlling terminal, so it can no longer read from the tty directly. I will probably use the following code (closing stdin is just in case):
import subprocess
import os
p = subprocess.Popen(["unzip", "-tqq", "encrypted.zip"],
                     stdin=subprocess.PIPE, preexec_fn=os.setsid)
p.stdin.close() #just in case
p.wait()
The last two lines can be replaced by one call:
p.communicate()
since communicate() closes the stdin file after sending all the input supplied.
Simple and elegant it seems.
Alternatively:
import subprocess
import os
p = subprocess.Popen(["unzip", "-tqq", "encrypted.zip"],
                     stdin=open(os.devnull), preexec_fn=os.setsid)
p.communicate()


No stdout from killed subprocess

I have a homework assignment to capture a 4-way handshake between a client and an AP using scapy. I'm trying to use "aircrack-ng capture.pcap" to check for valid handshakes in the capture file I created using scapy.
I launch the program using Popen. The program waits for user input, so I have to kill it. When I try to get stdout after killing it, the output is empty.
I've tried stdout.read(), I've tried communicate(), I've tried reading stderr, and I've tried it both with and without shells.
check=Popen("aircrack-ng capture.pcap",shell=True,stdin=PIPE,stdout=PIPE,stderr=PIPE)
check.kill()
print(check.stdout.read())
While you shouldn't do this (relying on hardcoded delays is inherently race-condition-prone), you can demonstrate that the issue is caused by your kill() being delivered while sh is still starting up: the problem is "solved" (not reliably, but well enough for a demonstration) by a tiny sleep long enough to let the shell start up and the echo run:
import time
from subprocess import Popen, PIPE
check=Popen("echo hello && sleep 1000", shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE)
time.sleep(0.01) # BAD PRACTICE: Race-condition-prone, use one of the below instead.
check.kill()
print(check.stdout.read())
That said, a much better solution would be to close the stdin descriptor so that reads immediately return 0-byte results. On newer versions of Python (3.3+), you can do that with DEVNULL:
from subprocess import Popen, PIPE, DEVNULL
# `read` gets EOF from /dev/null and fails, so the && chain stops before `sleep`
check=Popen("echo hello && read input && sleep 1000",
            shell=True, stdin=DEVNULL, stdout=PIPE, stderr=PIPE)
print(check.stdout.read())
...or, with Python 2.x, a similar effect can be achieved by passing an empty string to communicate(), thus close()ing the stdin pipe immediately:
from subprocess import Popen, PIPE
check=Popen("echo hello && read input && sleep 1000",
            shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE)
print(check.communicate('')[0])
Never, and I mean, never kill a process as part of normal operation. There's no guarantee whatsoever how far it has proceeded by the time you kill it, so you cannot expect any specific results from it in such a case.
To explicitly pass nothing to a subprocess as input, so that it doesn't hang when it tries to read stdin, either:
connect its stdin to /dev/null (nul on Windows), as per "run a process to /dev/null in python":
p=Popen(<...>, stdin=open(os.devnull)) #or stdin=subprocess.DEVNULL in Python 3.3+
or use stdin=PIPE and <process>.communicate() without arguments; this will pass an empty stream.
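Both options can be checked with a small sketch. The sh script below is a hypothetical stand-in for any child that unexpectedly reads stdin; with either option its `read` gets end-of-file at once and the script carries on instead of blocking.

```python
import subprocess

# Hypothetical child: prints, tries to read a line from stdin, then prints again.
SCRIPT = "echo hello; read line; echo done"

def run_with_devnull(script):
    # Option 1: stdin is /dev/null, so `read` sees EOF immediately.
    return subprocess.check_output(["sh", "-c", script],
                                   stdin=subprocess.DEVNULL,
                                   universal_newlines=True)

def run_with_closed_pipe(script):
    # Option 2: stdin is a pipe that communicate() closes without writing anything.
    p = subprocess.Popen(["sh", "-c", script],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         universal_newlines=True)
    out, _ = p.communicate()
    return out

if __name__ == "__main__":
    print(run_with_devnull(SCRIPT), end="")
    print(run_with_closed_pipe(SCRIPT), end="")
```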
Use <process>.communicate(), or use subprocess.check_output() instead of Popen to read output reliably
A process, in the general case, is not guaranteed to output any data at any particular moment due to I/O buffering. So you need to read the output stream after the process completes to be sure you've got everything.
At the same time, you need to keep reading the stream in the meantime if the process can produce enough output to fill an I/O buffer[1]. Otherwise, it will hang waiting for you to read the buffered data. If both stdout and stderr are PIPEs, you need to read them both in parallel, e.g. in different threads.
communicate() and check_output() (which uses the former under the hood) achieve this by reading stdout and stderr concurrently: with two threads on Windows, and with select/poll on POSIX.
Prefer convenience functions to Popen for common use cases (in your case, check_output()), as they take care of all the aforementioned caveats for you.
[1] Pipes are fully buffered, and a typical pipe buffer size is 64 KB.
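To illustrate why communicate() matters, this sketch writes well over 64 KB to both stdout and stderr from a hypothetical inline child. Reading one pipe to the end before touching the other could deadlock once the other pipe fills up; communicate() drains both at the same time.

```python
import subprocess
import sys

# Hypothetical child that fills both pipes far beyond a typical 64 KB buffer.
CHILD = "import sys; sys.stdout.write('o' * 200000); sys.stderr.write('e' * 200000)"

def capture_both(args):
    p = subprocess.Popen(args, stdin=subprocess.DEVNULL,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads stdout and stderr concurrently, then waits for exit.
    out, err = p.communicate()
    return p.returncode, out, err

if __name__ == "__main__":
    code, out, err = capture_both([sys.executable, "-c", CHILD])
    print(code, len(out), len(err))
```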

Filter out command that needs a terminal in Python subprocess module

I am developing a robot that accepts commands from the network (XMPP), uses the subprocess module in Python to execute them, and sends back the output of the commands. Essentially it is an SSH-like XMPP-based non-interactive shell.
The robot only executes commands from authenticated trusted sources, so arbitrary shell commands are allowed (shell=True).
However, when I accidentally send a command that needs a tty, the robot gets stuck.
For example:
subprocess.check_output(['vim'], shell=False)
subprocess.check_output('vim', shell=True)
If either of the above commands is received, the robot gets stuck, and the terminal from which the robot is run is broken.
Though the robot only receives commands from authenticated trusted sources, humans err. How could I make the robot filter out commands that would break it? I know there is os.isatty, but how could I use it? Is there a way to detect those "bad" commands and refuse to execute them?
TL;DR:
Say, there are two kinds of commands:
Commands like ls: does not need a tty to run.
Commands like vim: needs a tty; breaks subprocess if no tty is given.
How could I tell whether a command is ls-like or vim-like, and refuse to run it if it is vim-like?
What you want is a function that receives a command as input and returns meaningful output by running the command.
Since the command is arbitrary, requiring a tty is just one of many bad cases that may happen (others include running an infinite loop). Your function should only be concerned with the command's running time; in other words, whether a command is “bad” should be determined by whether it ends within a limited time. And since subprocesses run asynchronously by nature, you can simply run the command and handle it at a higher level.
Demo code to play with; you can change the cmd value to see how it behaves differently:
#!/usr/bin/env python
# coding: utf-8
from __future__ import print_function
import time
import subprocess
from subprocess import PIPE
#cmd = ['ls']
#cmd = ['sleep', '3']
cmd = ['vim', '-u', '/dev/null']
print('call cmd')
# shell=False: with shell=True and a list, the items after cmd[0] go to the shell, not to the command
p = subprocess.Popen(cmd, shell=False,
                     stdin=PIPE, stderr=PIPE, stdout=PIPE)
print('called', p)
time_limit = 2
timer = 0
time_gap = 0.2
ended = False
while True:
    time.sleep(time_gap)
    returncode = p.poll()
    print('process status', returncode)
    if returncode is not None:
        ended = True
        break
    timer += time_gap
    if timer >= time_limit:
        print('timeout, kill process')
        p.kill()
        p.wait()  # reap the killed process so it doesn't linger as a zombie
        break
if ended:
    print('process ended by', returncode)
    print('read')
    out, err = p.communicate()
    print('out', repr(out))
    print('error', repr(err))
else:
    print('process failed')
Three points are notable in the above code:
We use Popen instead of check_output to run the command. Unlike check_output, which waits for the process to end, Popen returns immediately, so we can do further things to control the process.
We implement a timer to check the process's status. If it runs for too long, we kill it manually, because we consider a process not meaningful if it cannot end within a limited time. This way your original problem is solved: vim will never end, so it will definitely be killed as a “meaningless” command.
After the timer helps us filter out bad commands, we can get the stdout and stderr of the command by calling the communicate method of the Popen object; after that, it's your choice what to return to the user.
Conclusion
TTY simulation is not needed: we should run the subprocess asynchronously, then control it with a timer to decide whether it should be killed. For those that end normally, it's safe and easy to get the output.
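On Python 3, the same "kill it if it doesn't finish in time" idea can be expressed with the timeout parameter instead of a hand-written polling loop. A sketch, assuming Python 3.5+ for subprocess.run(); run_limited() is a hypothetical helper name, and stdin=DEVNULL plus start_new_session=True (POSIX) are additions beyond the answer above that also keep the child away from the robot's own terminal.

```python
import subprocess
import sys

def run_limited(cmd, time_limit=2.0):
    """Return (finished, output). A command that does not finish within
    time_limit seconds is killed and treated as "bad"."""
    try:
        result = subprocess.run(cmd,
                                stdin=subprocess.DEVNULL,     # nothing to read
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                start_new_session=True,       # no controlling tty
                                timeout=time_limit,           # run() kills the child on expiry
                                universal_newlines=True)
    except subprocess.TimeoutExpired:
        return False, None
    return True, result.stdout

if __name__ == "__main__":
    print(run_limited([sys.executable, "-c", "print('hi')"]))
    print(run_limited([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

Note that run() kills only the direct child on timeout; with shell=True, grandchildren started by the shell may survive.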
Well, SSH is already a tool that allows users to remotely execute commands and be authenticated at the same time. The authentication piece is extremely tricky; please be aware that building the software you're describing is a bit risky from a security perspective.
There isn't a way to determine whether a process is going to need a tty or not. And os.isatty won't help: it only tells you whether a file descriptor in your own process refers to a terminal, and even if a subprocess needed one, that wouldn't mean there was one. :)
In general, it would probably be safer from a security perspective, and also a solution to this problem, to consider a whitelist of commands. You could choose the whitelist to avoid things that need a tty, because I don't think you'll easily get around this.
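A minimal sketch of the whitelist idea, assuming a hypothetical ALLOWED set and helper names. The check only means something if the accepted argv is then run with shell=False, because a shell would still interpret things like `ls && vim`.

```python
import shlex
import subprocess

# Hypothetical whitelist: the only programs the robot agrees to run.
ALLOWED = {"ls", "df", "uptime", "whoami"}

def is_allowed(command_line):
    """True if the command line parses and its program name is whitelisted."""
    try:
        argv = shlex.split(command_line)
    except ValueError:  # e.g. an unterminated quote
        return False
    return bool(argv) and argv[0] in ALLOWED

def run_allowed(command_line, timeout=10):
    """Run a whitelisted command without a shell and return its output."""
    if not is_allowed(command_line):
        raise PermissionError("not on the whitelist: %r" % command_line)
    # shell=False: the checked argv is exactly what runs, so `;`, `&&`, `|`
    # reach the program as plain arguments instead of being interpreted.
    return subprocess.check_output(shlex.split(command_line),
                                   stdin=subprocess.DEVNULL,
                                   stderr=subprocess.STDOUT,
                                   timeout=timeout,
                                   universal_newlines=True)
```

For example, is_allowed("ls && vim") is True, but with shell=False the `&&` and `vim` are just passed to ls as file names, so vim never starts.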
Thanks a lot for J.F. Sebastian's help (see the comments under the question); I've found a solution (workaround?) for my case.
The reason vim breaks the terminal while ls does not is that vim needs a tty. As Sebastian says, we can feed vim a pty using pty.openpty(). Feeding it a pty guarantees the command will not break the terminal, and we can add a timeout to auto-kill such processes. Here is a (dirty) working example:
#!/usr/bin/env python3
import pty
from subprocess import STDOUT, check_output, TimeoutExpired
master_fd, slave_fd = pty.openpty()
try:
    output1 = check_output(['ls', '/'], stdin=slave_fd, stderr=STDOUT, universal_newlines=True, timeout=3)
    print(output1)
except TimeoutExpired:
    print('Timed out')
try:
    output2 = check_output(['vim'], stdin=slave_fd, stderr=STDOUT, universal_newlines=True, timeout=3)
    print(output2)
except TimeoutExpired:
    print('Timed out')
Note it is stdin that we need to take care of, not stdout or stderr.
You can refer to my answer at https://stackoverflow.com/a/43012138/3555925, which uses a pseudo-terminal to make stdout non-blocking and uses select to handle stdin/stdout.
I only need to change the command variable to 'vim', and the script works fine.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import sys
import select
import termios
import tty
import pty
from subprocess import Popen
command = 'vim'
# save original tty setting then set it to raw mode
old_tty = termios.tcgetattr(sys.stdin)
tty.setraw(sys.stdin.fileno())
try:
    # open pseudo-terminal to interact with subprocess
    master_fd, slave_fd = pty.openpty()
    # use os.setsid() to make the child the leader of a new session, or bash job control will not be enabled
    p = Popen(command,
              preexec_fn=os.setsid,
              stdin=slave_fd,
              stdout=slave_fd,
              stderr=slave_fd,
              universal_newlines=True)
    while p.poll() is None:
        r, w, e = select.select([sys.stdin, master_fd], [], [])
        if sys.stdin in r:
            d = os.read(sys.stdin.fileno(), 10240)
            os.write(master_fd, d)
        elif master_fd in r:
            o = os.read(master_fd, 10240)
            if o:
                os.write(sys.stdout.fileno(), o)
finally:
    # restore tty settings back, even if the loop above raises
    termios.tcsetattr(sys.stdin, termios.TCSADRAIN, old_tty)

Waiting for output from a subprocess which does not terminate

I need to run a subprocess from my script. The subprocess is an interactive (shell-like) application, to which I issue commands through the subprocess' stdin.
After I issue a command, the subprocess outputs the result to stdout and then waits for the next command (but does not terminate).
For example:
from subprocess import Popen, PIPE
p = Popen(args = [...], stdin = PIPE, stdout = PIPE, stderr = PIPE, shell = False)
# Issue a command:
p.stdin.write('command\n')
# *** HERE: get the result from p.stdout ***
# CONTINUE with the rest of the script once there is no more data in p.stdout
# NOTE that the subprocess is still running and waiting for the next command
# through stdin.
My problem is getting the result from p.stdout. The script needs to get the output while there is new data in p.stdout; but once there is no more data, I want to continue with the script.
The subprocess does not terminate, so I cannot use communicate() (which waits for the process to terminate).
I tried reading from p.stdout after issuing the command, like this:
res = p.stdout.read()
But the subprocess is not fast enough, and I just get an empty result.
I thought about polling p.stdout in a loop until I get something, but then how do I know I got everything? And it seems wasteful anyway.
Any suggestions?
Use gevent.subprocess in gevent-1.0 as a substitute for the standard subprocess module. It can handle concurrent tasks using synchronous-looking logic without blocking the script. Here is a brief tutorial about gevent.subprocess
Use circuits.io.Process in circuits-dev to wrap an asynchronous call to subprocess.
Example: https://bitbucket.org/circuits/circuits-dev/src/tip/examples/ping.py
After investigating several options I reached two solutions:
Setting the subprocess' stdout stream to be non-blocking by using the fcntl module.
Using a thread to collect the subprocess' output to a proxy queue, and then reading the queue from the main thread.
I describe both solutions (and the problem and its origin) in this post.
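The second approach (a reader thread feeding a queue) can be sketched as follows. The child script is a hypothetical interactive program that answers each command and keeps running. Deciding that there is "no more data" from a quiet period is a heuristic, not a guarantee that the child has finished answering; if the child's protocol has an end-of-answer marker, prefer reading up to that.

```python
import queue
import subprocess
import sys
import threading

# Hypothetical interactive child: answers each line on stdin and waits for the next.
CHILD = r'''
import sys
for line in sys.stdin:
    print("result of", line.strip())
    print("done")
    sys.stdout.flush()
'''

def start_reader(stream, q):
    """Copy lines from stream into q on a daemon thread until EOF."""
    def pump():
        for line in iter(stream.readline, ''):
            q.put(line)
    t = threading.Thread(target=pump, daemon=True)
    t.start()
    return t

def read_available(q, first_wait=5.0, quiet=0.5):
    """Wait up to first_wait for output, then collect lines until
    none arrive for `quiet` seconds (a heuristic end-of-answer)."""
    lines = []
    timeout = first_wait
    while True:
        try:
            lines.append(q.get(timeout=timeout))
        except queue.Empty:
            return lines
        timeout = quiet

def run_demo():
    p = subprocess.Popen([sys.executable, "-c", CHILD],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         universal_newlines=True)
    q = queue.Queue()
    reader = start_reader(p.stdout, q)
    p.stdin.write("command\n")
    p.stdin.flush()                 # the pipe is buffered; push the command out now
    answer = read_available(q)      # the child is still running at this point
    p.stdin.close()                 # EOF ends the child's loop
    p.wait()
    reader.join()
    p.stdout.close()
    return answer

if __name__ == "__main__":
    print(run_demo())
```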

Proper way of re-using and closing a subprocess object

I have the following code in a loop:
while True:
    # Define shell_command
    p1 = Popen(shell_command, shell=shell_type, stdout=PIPE, stderr=PIPE, preexec_fn=os.setsid)
    result = p1.stdout.read()
    # Define condition
    if condition:
        break
where shell_command is something like ls (it just prints stuff).
I have read in different places that I can close/terminate/exit a Popen object in a variety of ways, e.g. :
p1.stdout.close()
p1.stdin.close()
p1.terminate()
p1.kill()
My question is:
What is the proper way of closing a subprocess object once we are done using it?
Considering the nature of my script, is there a way to open a subprocess object only once and reuse it with different shell commands? Would that be more efficient in any way than opening new subprocess objects each time?
Update
I am still a bit confused about the sequence of steps to follow depending on whether I use p1.communicate() or p1.stdout.read() to interact with my process.
From what I understood in the answers and the comments:
If I use p1.communicate() I don't have to worry about releasing resources, since communicate() would wait until the process is finished, grab the output and properly close the subprocess object
If I follow the p1.stdout.read() route (which I think fits my situation, since the shell command is just supposed to print stuff) I should call things in this order:
p1.wait()
p1.stdout.read()
p1.terminate()
Is that right?
What is the proper way of closing a subprocess object once we are done using it?
stdout.close() and stdin.close() will not terminate a process unless it exits itself on end of input or on write errors.
.terminate() and .kill() both do the job, with kill being a bit more "drastic" on POSIX systems, as SIGKILL is sent, which cannot be ignored by the application. Specific differences are explained in this blog post, for example. On Windows, there's no difference.
Also, remember to .wait() and to close the pipes after killing a process to avoid zombies and force the freeing of resources.
A special case that is often encountered is processes which read from STDIN and write their result to STDOUT, closing themselves when EOF is encountered. With these kinds of programs, it's often sensible to use Popen.communicate():
>>> p = Popen(["sort"], stdin=PIPE, stdout=PIPE)
>>> p.communicate("4\n3\n1")
('1\n3\n4\n', None)
>>> p.returncode
0
This can also be used for programs which print something and exit right after:
>>> p = Popen(["ls", "/home/niklas/test"], stdin=PIPE, stdout=PIPE)
>>> p.communicate()
('file1\nfile2\n', None)
>>> p.returncode
0
Considering the nature of my script, is there a way to open a subprocess object only once and reuse it with different shell commands? Would that be more efficient in any way than opening new subprocess objects each time?
I don't think the subprocess module supports this and I don't see what resources could be shared here, so I don't think it would give you a significant advantage.
Considering the nature of my script, is there a way to open a subprocess object only once and reuse it with different shell commands?
Yes.
#!/usr/bin/env python
from __future__ import print_function
import uuid
import random
from subprocess import Popen, PIPE, STDOUT
MARKER = str(uuid.uuid4())
shell_command = 'echo a'
p = Popen('sh', stdin=PIPE, stdout=PIPE, stderr=STDOUT,
          universal_newlines=True) # text mode: locale encoding, newlines become '\n'
while True:
    # write next command
    print(shell_command, file=p.stdin)
    # insert MARKER into stdout to separate output from different shell_command
    print("echo '%s'" % MARKER, file=p.stdin)
    p.stdin.flush() # stdin is buffered on Python 3; make sure sh sees the commands
    # read command output
    for line in iter(p.stdout.readline, MARKER+'\n'):
        if line.endswith(MARKER+'\n'):
            print(line[:-len(MARKER)-1])
            break # command output ended without a newline
        print(line, end='')
    # exit on condition
    if random.random() < 0.1:
        break
# cleanup
p.stdout.close()
if p.stderr:
    p.stderr.close()
p.stdin.close()
p.wait()
Put while True inside try: ... finally: to perform the cleanup in case of exceptions. On Python 3.2+ you could use with Popen(...): instead.
Would that be more efficient in any way than opening new subprocess objects each time?
Does it matter in your case? Don't guess. Measure it.
The "correct" order is:
Create a thread to read stdout (and a second one to read stderr, unless you merged them into one).
Write commands to be executed by the child to stdin. If you're not reading stdout at the same time, writing to stdin can block.
Close stdin (this is the signal for the child that it can now terminate by itself whenever it is done)
When stdout returns EOF, the child has terminated. Note that you need to synchronize the stdout reader thread and your main thread.
call wait() to see if there was a problem and to clean up the child process
If you need to stop the child process for any reason (maybe the user wants to quit), then you can:
Close stdin if the child terminates when it reads EOF.
Kill the child with terminate(). This is the correct solution for child processes which ignore stdin.
If the child doesn't respond, try kill()
In all three cases, you must call wait() to clean up the dead child process.
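The order above can be sketched like this. feed_and_collect() and stop() are hypothetical helper names, and the upper-casing child is an inline stand-in for a real program. stop() covers the second list: terminate(), then kill() if the child does not exit within a grace period, and wait() in every case.

```python
import subprocess
import sys
import threading

def feed_and_collect(args, lines):
    """Reader thread, write, close stdin, join, then wait()."""
    p = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         universal_newlines=True)
    chunks = []
    # 1. Drain stdout on another thread so the child never blocks on a full pipe.
    reader = threading.Thread(target=lambda: chunks.append(p.stdout.read()))
    reader.start()
    try:
        # 2. Write the input for the child.
        for line in lines:
            p.stdin.write(line + "\n")
    finally:
        # 3. Closing stdin sends EOF: the child may now finish on its own.
        p.stdin.close()
    # 4. The reader returns once stdout reaches EOF; join() synchronizes with it.
    reader.join()
    p.stdout.close()
    # 5. wait() reaps the child and returns its exit status.
    return p.wait(), "".join(chunks)

def stop(p, grace=5.0):
    """Ask the child to exit with terminate(); fall back to kill(); always wait()."""
    p.terminate()
    try:
        return p.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        p.kill()
        return p.wait()

# Hypothetical child: upper-cases everything it reads until EOF.
UPPER = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]

if __name__ == "__main__":
    print(feed_and_collect(UPPER, ["a", "b"]))
```

On POSIX, a child ended by terminate() reports a negative return code (minus the signal number).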
It depends on what you expect the process to do; you should always call p1.wait() in order to avoid zombies. Other steps depend on the behaviour of the subprocess: if it produces any output, you should consume the output (e.g. p1.stdout.read() ...but this could eat lots of memory) and only then call p1.wait(); or you may wait for some timeout and call p1.terminate() to kill the process if you think it doesn't work as expected, and then call p1.wait() to clean up the zombie.
Alternatively, p1.communicate(...) would do the handling of I/O and the waiting for you (not the killing).
Subprocess objects aren't supposed to be reused.

How do I write to a Python subprocess' stdin?

I'm trying to write a Python script that starts a subprocess, and writes to the subprocess stdin. I'd also like to be able to determine an action to be taken if the subprocess crashes.
The process I'm trying to start is a program called nuke, which has its own built-in version of Python that I'd like to submit commands to, and then tell it to quit after the commands execute. So far I've worked out that if I start Python on the command prompt like and then start nuke as a subprocess, I can type in commands to nuke. But I'd like to put this all in a script, so that the master Python program can start nuke, write to its standard input (and thus into its built-in version of Python), and tell it to do snazzy things. So I wrote a script that starts nuke like this:
subprocess.call(["C:/Program Files/Nuke6.3v5/Nuke6.3", "-t", "E:/NukeTest/test.nk"])
Then nothing happens because nuke is waiting for user input. How would I now write to standard input?
I'm doing this because I'm running a plugin with nuke that causes it to crash intermittently when rendering multiple frames. So I'd like this script to be able to start nuke, tell it to do something and then if it crashes, try again. So if there is a way to catch a crash and still be OK then that'd be great.
It might be better to use communicate:
from subprocess import Popen, PIPE, STDOUT
p = Popen(['myapp'], stdout=PIPE, stdin=PIPE, stderr=PIPE)
stdout_data = p.communicate(input='data_to_write')[0]
"Better", because of this warning:
Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
To clarify some points:
As jro has mentioned, the right way is to use Popen.communicate().
Yet, when feeding stdin through communicate() with input, you need to start the subprocess with stdin=subprocess.PIPE, according to the docs.
Note that if you want to send data to the process’s stdin, you need to create the Popen object with stdin=PIPE. Similarly, to get anything other than None in the result tuple, you need to give stdout=PIPE and/or stderr=PIPE too.
Also, qed has mentioned in the comments that for Python 3.4 you need to encode the string, meaning you need to pass bytes to input rather than a string. This is not entirely true. According to the docs, if the streams were opened in text mode, the input should be a string (source is the same page).
If streams were opened in text mode, input must be a string. Otherwise, it must be bytes.
So, if the streams were not opened explicitly in text mode, then something like below should work:
import subprocess
command = ['myapp', '--arg1', 'value_for_arg1']
p = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
output = p.communicate(input='some data'.encode())[0]
I've left the stderr value above deliberately as STDOUT as an example.
That being said, sometimes you might want the output of another process rather than building it up from scratch. Let's say you want to run the equivalent of echo -n 'CATCH\nme' | grep -i catch | wc -m. This should normally return the number of characters in 'CATCH' plus a newline character, which results in 6. The point of the echo here is to feed the CATCH\nme data to grep. So we can feed the data to grep through stdin in the Python subprocess chain as a variable, and then pass its stdout as a PIPE to the wc process' stdin (getting rid of the extra newline character in the meantime):
import subprocess
what_to_catch = 'catch'
what_to_feed = 'CATCH\nme'
# We create the first subprocess, note that we need stdin=PIPE and stdout=PIPE
p1 = subprocess.Popen(['grep', '-i', what_to_catch], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# We immediately run the first subprocess and get the result
# Note that we encode the data, otherwise we'd get a TypeError
p1_out = p1.communicate(input=what_to_feed.encode())[0]
# Well the result includes an '\n' at the end,
# if we want to get rid of it in a VERY hacky way
p1_out = p1_out.decode().strip().encode()
# We create the second subprocess, note that we need stdin=PIPE
p2 = subprocess.Popen(['wc', '-m'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# We run the second subprocess feeding it with the first subprocess' output.
# We decode the output to convert to a string
# We still have a '\n', so we strip that out
output = p2.communicate(input=p1_out)[0].decode().strip()
This is somewhat different from the response here, where you pipe two processes directly without adding data directly in Python.
Hope that helps someone out.
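For comparison, here is a sketch of the direct-pipe variant, still feeding the initial data from Python: grep's stdout is handed to wc as its stdin, so the intermediate result never passes through Python. Unlike the example above, it keeps grep's trailing newline, so it returns 6 ('CATCH' plus a newline). grep_count() is a hypothetical helper name; grep and wc must be on PATH.

```python
import subprocess

def grep_count(pattern, data):
    """Feed data from Python into grep, and pipe grep's stdout straight into wc -m."""
    p1 = subprocess.Popen(["grep", "-i", pattern],
                          stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(["wc", "-m"], stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()              # only wc holds the read end now; grep gets SIGPIPE if wc exits early
    p1.stdin.write(data.encode())  # pipes are binary unless text mode is requested
    p1.stdin.close()               # EOF: grep can finish, and then wc
    out = p2.communicate()[0]
    p1.wait()
    return int(out.decode().strip())

if __name__ == "__main__":
    print(grep_count("catch", "CATCH\nme"))
```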
Since Python 3.5, there is the subprocess.run() function, which provides a convenient way to initialize and interact with Popen() objects. run() takes an optional input argument, through which you can pass things to stdin (like you would using Popen.communicate(), but all in one go). Note that the capture_output and text arguments used below require Python 3.7+.
Adapting jro's example to use run() would look like:
import subprocess
p = subprocess.run(['myapp'], input='data_to_write', capture_output=True, text=True)
After execution, p will be a CompletedProcess object. By setting capture_output to True, we make available a p.stdout attribute which gives us access to the output, if we care about it. text=True tells it to work with regular strings rather than bytes. If you want, you might also add the argument check=True to make it throw an error if the exit status (accessible regardless via p.returncode) isn't 0.
This is the "modern"/quick and easy way to do this.
One can write data to the subprocess object on-the-fly, instead of collecting all the input in a string beforehand to pass through the communicate() method.
This example sends a list of animals names to the Unix utility sort, and sends the output to standard output.
import sys, subprocess
p = subprocess.Popen('sort', stdin=subprocess.PIPE, stdout=sys.stdout)
for v in ('dog','cat','mouse','cow','mule','chicken','bear','robin'):
    p.stdin.write(v.encode() + b'\n')
p.communicate()
Note that writing to the process is done via p.stdin.write(v.encode()). I tried using print(v.encode(), file=p.stdin), but that failed with the message TypeError: a bytes-like object is required, not 'str'. I haven't figured out how to get print() to work with this.
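The TypeError happens because print() always writes str, while p.stdin is a binary stream by default. A sketch assuming the pipes are opened in text mode instead (universal_newlines=True, or text=True on Python 3.7+), in which case print() can write to p.stdin directly; sort_lines() is a hypothetical helper name.

```python
import subprocess

def sort_lines(words):
    # Text mode: p.stdin accepts str, so print() can write to it.
    p = subprocess.Popen(["sort"], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         universal_newlines=True)
    for v in words:
        print(v, file=p.stdin)
    out, _ = p.communicate()   # flushes and closes stdin, then reads all output
    return out.splitlines()

if __name__ == "__main__":
    print(sort_lines(['dog', 'cat', 'mouse', 'cow', 'mule', 'chicken', 'bear', 'robin']))
```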
You can provide a file-like object to the stdin argument of subprocess.call().
The documentation for the Popen object applies here.
To capture the output, you should instead use subprocess.check_output(), which takes similar arguments. From the documentation:
>>> subprocess.check_output(
... "ls non_existent_file; exit 0",
... stderr=subprocess.STDOUT,
... shell=True)
'ls: non_existent_file: No such file or directory\n'
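subprocess hands the child the object's underlying file descriptor, so the object passed as stdin needs a real .fileno(); an in-memory io.StringIO will not work. A sketch using a temporary file, where the cat-like child and run_with_file_input() are hypothetical:

```python
import subprocess
import sys
import tempfile

# Hypothetical child that echoes its stdin back.
CAT = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]

def run_with_file_input(args, data):
    """Pass a real file (it has a .fileno()) as the child's stdin."""
    with tempfile.TemporaryFile(mode="w+") as f:
        f.write(data)
        f.seek(0)   # flushes the write buffer; the child reads from this offset
        return subprocess.check_output(args, stdin=f, universal_newlines=True)

if __name__ == "__main__":
    print(run_with_file_input(CAT, "hello\n"), end="")
```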
