Python 2.7 Popen: what does `close_fds` do?

I have a web server in Python (2.7) that uses Popen to delegate some work to a child process:
import tempfile
from subprocess import Popen, PIPE

url_arg = "http://localhost/index.html?someparam=somevalue"
call = ('phantomjs', 'some/phantom/script.js', url_arg)
imageB64data = tempfile.TemporaryFile()
errordata = tempfile.TemporaryFile()
p = Popen(call, stdout=imageB64data, stderr=errordata, stdin=PIPE)
p.communicate(input="")
I am seeing intermittent issues where after some number of these Popens have occurred (roughly 64), the process runs out of file descriptors and is unable to function -- it becomes completely unresponsive and all threads seem to block forever if they attempt to open any files or sockets.
(Possibly relevant: the phantomjs child process loads a URL that calls back into the server that spawned it.)
Based on this Python bug report, I believe I need to set close_fds=True on all Popen calls from inside my server process in order to mitigate the leaking of file descriptors. However, I am unfamiliar with the machinery around exec-ing subprocesses and inheritance of file descriptors so much of the Popen documentation and the notes in the aforementioned bug report are unclear to me.
It sounds like close_fds=True would actually close all open file descriptors (which include active request sockets, log file handles, etc.) in my process before executing the subprocess. This sounds like it would be strictly better than leaking the sockets, but would still result in errors.
However, in practice, when I use close_fds=True during a web request, it seems to work fine and thus far I have been unable to construct a scenario where it actually closes any other request sockets, database requests, etc.
The docs state:
If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed.
So my question is: is it "safe" and "correct" to pass close_fds=True to Popen in a multithreaded Python web server? Or should I expect this to have side effects if other requests are doing file/socket IO at the same time?
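For concreteness, the mitigation being asked about is the original call with close_fds=True added. A minimal sketch, reusing the placeholders from the code above:

import tempfile
from subprocess import Popen, PIPE

url_arg = "http://localhost/index.html?someparam=somevalue"
call = ('phantomjs', 'some/phantom/script.js', url_arg)
imageB64data = tempfile.TemporaryFile()
errordata = tempfile.TemporaryFile()
# close_fds=True makes the child close every inherited descriptor except
# 0, 1 and 2 after fork and before exec; the parent's descriptors stay open.
p = Popen(call, stdout=imageB64data, stderr=errordata, stdin=PIPE,
          close_fds=True)
p.communicate(input="")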

I tried the following test with the subprocess32 backport of Python 3.2/3.3's subprocess:
import tempfile
import subprocess32 as subprocess
fp = open('test.txt', 'w')
fp.write("some stuff")
echoed = tempfile.TemporaryFile()
p = subprocess.Popen(("echo", "this", "stuff"), stdout=echoed, close_fds=True)
p.wait()
echoed.seek(0)
fp.write("whatevs")
fp.write(echoed.read())
fp.close()
and I got the expected result of some stuffwhatevsecho this stuff in test.txt.
So it appears that the meaning of close in close_fds does not mean that open files (sockets, etc.) in the parent process will be unusable after executing a child process.
Also worth noting: subprocess32 defaults close_fds=True on POSIX systems, AFAICT. This implies to me that it is not as dangerous as it sounds.

I suspect that close_fds solves the problem of file descriptors leaking to subprocesses. Imagine opening a file, and then running some task using subprocess. Without close_fds, the file descriptor is copied to the subprocess, so even if the parent process closes the file, the file remains open due to the subprocess.

Now, let's say we want to delete the directory containing the file in another thread, using shutil.rmtree. On a regular filesystem, this should not be an issue: the directory is just removed as expected. However, when the file resides on NFS, the following happens: first, Python will try to delete the file. Since the file is still in use, it gets renamed to .nfsXXX instead, where XXX is a long hexadecimal number. Next, Python will try to delete the directory, but that has become impossible, because the .nfsXXX file still resides in it.
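A minimal sketch of the leak described above, assuming Python 2's default of close_fds=False and a Linux system (the /proc lookup and the sleep command are only there for illustration):

import os
import subprocess

fp = open('data.txt', 'w')           # descriptor we expect to leak

# Without close_fds=True the child inherits a copy of fp's descriptor.
child = subprocess.Popen(['sleep', '30'])
fp.close()                           # the parent's copy is now closed...

# ...but the child's copy still shows up in its descriptor table, so the
# file stays "in use" until the child exits.
print(os.listdir('/proc/%d/fd' % child.pid))

child.terminate()
child.wait()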

Related

Python subprocess.Popen pipe IO blocking unexpectedly

I am trying to use subprocess.Popen to control an ssh process and interact with it via pipes, like so:
import subprocess

p = subprocess.Popen(['ssh', '-tt', 'LOGIN@HOSTNAME'],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     universal_newlines=True)
while True:
    (out_stdout, out_stderr) = p.communicate(timeout=10)
    if out_stderr:
        print(out_stderr)
    if not out_stdout:
        raise EOFError
    print(out_stdout)
This works fine without the '-tt' option to ssh. However the program I need to interact with on the remote side of the ssh breaks if there is no pseudo tty allocated, so I am forced to use it.
What seems to happen is that the p.communicate() read then blocks indefinitely (or until the timeout), even if input is available.
I have rewritten this using lower level calls to io.read, select.select etc to avoid going through Popen.communicate. Select will actually return the file descriptor as ready, but a subsequent io.read to that file descriptor will also block. If I disable 'universal newlines' and set 'bufsize=0' in the Popen call it then works fine, but then I am forced to do binary/unicode conversion and line ending processing myself.
It's worth saying, though, that disabling universal_newlines in the p.communicate() version also blocks indefinitely, so it's not just that.
Any advice on how I can get line buffered input working properly here without having to reimplement everything?
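A minimal sketch of the lower-level workaround the question alludes to, with unbuffered binary pipes and the newline/decoding handling done by hand (LOGIN@HOSTNAME is a placeholder, as above):

import subprocess

p = subprocess.Popen(['ssh', '-tt', 'LOGIN@HOSTNAME'],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     bufsize=0)          # unbuffered, binary pipes

buf = b''
while True:
    chunk = p.stdout.read(1)             # blocks only until one byte arrives
    if not chunk:
        break                            # EOF: the ssh process has exited
    buf += chunk
    if chunk in (b'\n', b'\r'):          # crude line-ending handling by hand
        print(buf.decode('utf-8', 'replace').rstrip())
        buf = b''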
There are library alternatives to subprocess and the SSH binary that are better suited for such tasks.
parallel-ssh:
from pssh.pssh2_client import ParallelSSHClient

client = ParallelSSHClient(['HOSTNAME'], user='LOGIN')
output = client.run_command('echo', use_pty=True)
for host, host_output in output.items():
    for line in host_output.stdout:
        print(line)
Replace echo with the command you need to run, or leave it as-is if no command is required. The library requires that some command be passed in, even if the remote side executes something automatically.
See also documentation for the single host SSHClient of the same project.
Per the documentation, line parsing and encoding are handled by the library, which is also cross-platform.
There are others like paramiko and ssh2-python that are lower level and need more code for the equivalent above - see their respective home pages for examples.
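For comparison, a rough paramiko sketch of the same idea (an illustrative outline only; host, user and command are placeholders, and real code should handle host keys and authentication properly):

import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # convenient, but not secure for production
client.connect('HOSTNAME', username='LOGIN')

# get_pty=True requests a pseudo-tty, matching the ssh -tt in the question.
stdin, stdout, stderr = client.exec_command('echo', get_pty=True)
for line in stdout:
    print(line.rstrip())

client.close()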

Should I always close stdout explicitly?

I am trying to integrate a small Win32 C++ program which reads from stdin and writes the decoded result (~128 kbytes) to the output stream.
I read the entire input into a buffer with
while (std::cin.get(c)) { }
Afterwards, I write the entire output to stdout.
Everything works fine when I run the application from the command line, e.g. test.exe < input.bin > output.bin; however, this small app is supposed to be run from Python.
I expect that Python's subprocess communicate() is supposed to be used; the docs say:
Interact with process: Send data to stdin. Read data from stdout and
stderr, until end-of-file is reached. Wait for process to terminate.
So communicate() will wait for end-of-file before waiting for my app to finish - is EOF supposed to happen when my application exits? Or should I explicitly call fclose(stderr) and fclose(stdout)?
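For reference, the Python side being described would look roughly like this (a sketch; test.exe and the file names are placeholders):

import subprocess

with open('input.bin', 'rb') as f:
    input_data = f.read()

p = subprocess.Popen(['test.exe'],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)

# communicate() sends input_data, closes the child's stdin (which is what the
# C++ side sees as EOF on std::cin), then reads stdout until the child closes
# it, which normally happens when the process exits.
output_data, _ = p.communicate(input_data)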
Don't close stdout
In the general case, it is actually wrong, since it is possible to register a function with atexit() which tries to write to stdout, and this will break if stdout is closed.
When the process terminates, all handles are closed by the operating system automatically. This includes stdout, so you are not responsible for closing it manually.
(Technically, the C++ runtime will normally try to flush and close all C++ streams before the OS even has a chance to get involved, but the OS absolutely must close any handles which the runtime, for whatever reason, misses.)
In specialized circumstances, it may be useful to close standard streams (for example, when daemonizing), but it should be done with great care. It's usually a good idea to redirect to or from the null device (/dev/null on Unix, nul on Windows) so that code expecting to interact with those streams will still work. On Unix, this is done with freopen(3); Windows has an equivalent function, but it's part of the POSIX API and may not work well with standard Windows I/O.
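As a rough Python illustration of that last point (a sketch of the daemonizing case only, not something the question's program needs to do):

import os
import sys

# Point the standard streams at the null device instead of closing them, so
# later writes to stdout/stderr do not fail or land on an unrelated descriptor.
devnull = open(os.devnull, 'r+b')
os.dup2(devnull.fileno(), sys.stdin.fileno())
os.dup2(devnull.fileno(), sys.stdout.fileno())
os.dup2(devnull.fileno(), sys.stderr.fileno())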

Accessing an ALREADY running process, with Python

Question: Is there a way, using Python, to access the stdout of a running process? This process has not been started by Python.
Context: There is a program called mayabatch that renders out images from 3D Maya scene files. If I were to run the program from the command line, I would see progress messages from mayabatch. Sometimes, artists close these windows, leaving the progress untraceable until the program finishes. That led me along this route of trying to read its stdout after it's been spawned by a foreign process.
Background:
OS: Windows 7 64-bit
My research so far: I have only found questions and answers of how to do this if it was a subprocess, using the subprocess module. I also looked briefly into psutil, but I could not find any way to read a process' stdout.
Any help would be really appreciated. Thank you.
I don't think you can get to the stdout of a process outside of the code that created it.
The lazy way is just to pipe the output of mayabatch to a text file, and then poll the text file periodically in your own code so it's under your control, rather than forcing you to wait on the pipe (which is especially hard on Windows, since Windows select doesn't work with the pipes used by subprocess).
I think this is what maya does internally too: by default mayaBatch logs its results to a file called mayaRenderLog.txt in the user's Maya directory.
If you're running mayabatch from the command line or a bat file, you can funnel stdout to a file with a > character:
mayabatch.exe "file.ma" > log.txt
You should be able to poll that text file from the outside using standard python as long as you only open it for reading. The advantage of doing it this way is that you control the frequency at which you check the file.
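A minimal sketch of that polling approach (the log path and poll interval are arbitrary, and the loop runs until interrupted):

import time

def follow(path, interval=2.0):
    # Yield lines appended to the file at `path`, checking every `interval` seconds.
    with open(path, 'r') as f:
        while True:
            line = f.readline()
            if line:
                yield line.rstrip('\n')
            else:
                time.sleep(interval)   # nothing new yet; poll again later

for line in follow('log.txt'):
    print line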
OTOH, if you're doing it from Python, it's a little tougher unless you don't mind having your Python script idle until mayabatch completes. The usual subprocess recipe, which uses popen.communicate(), is going to wait for an end-of-process return code:
import subprocess

test = subprocess.Popen(["mayabatch.exe", "filename.mb"], stdout=subprocess.PIPE)
print test.communicate()[0]
This works, but it won't report anything until the process dies. However, calling readline on the process's stdout will report output one line at a time:
test = subprocess.Popen(["mayabatch.exe", "filename.mb"], stdout=subprocess.PIPE)
reader = iter(test.stdout.readline, "")
for line in reader:
    print line

Python Seccomp Allow STDIN

I'm working on a project where I will be running potentially malicious code. Its basic organization is that there is a master and a slave process. The slave process runs the potentially malicious code and has seccomp enabled.
import prctl
prctl.set_seccomp(True)
This is how seccomp is turned on. I can communicate fine FROM the slave TO the master, but not the other way around. When I don't turn on seccomp, I can use:
import sys
lines = sys.stdin.read()
Or something along those lines. I found this quite odd; I should have access to read and write given the default parameters of seccomp, especially for stdin/stdout. I have even tried opening stdin before I turn on seccomp. For example:
stdinFile = sys.stdin
prctl.set_seccomp(True)
lines = stdinFile.read()
But still to no avail. I have also tried readlines(), which doesn't work. A friend suggested that I try Unix domain sockets, opening one before seccomp goes on, and then just using the write() call. This didn't work either. If anyone has any suggestions on how to combat this problem, please post them! I have seen some code in C for something like
seccomp_add_rule(stuff)
But I have been unsuccessful at using this in Python with the cffi module.
sys.stdin is not a raw file handle; you need to open it and get a file handle before calling set_seccomp. You could use os.fdopen for this. The file descriptor for stdin/stdout is available as sys.stdin.fileno().
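A minimal sketch of what that suggests, assuming the python-prctl module used in the question (untested under a real seccomp filter):

import os
import sys
import prctl  # python-prctl, as in the question

# Re-open stdin through its file descriptor before entering seccomp mode,
# so later reads go through a plain file object backed by fd 0.
stdin_fh = os.fdopen(sys.stdin.fileno(), 'r')

prctl.set_seccomp(True)  # strict mode: only read/write/_exit/sigreturn are allowed

lines = stdin_fh.read()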

How to get open files of a subprocess?

I opened a subprocess which generates files, and I want to get the file descriptors of these files so I can call fsync on them.
So if I have code like this:
import subprocess

p = subprocess.Popen([
    'some_program'
])
The process p generates some files.
I can get the process ID of the subprocess using:
p.pid
But how can I get the file descriptors of these files to call flush() and fsync() on them?
Actually, I found a utility called "lsof" (list open files), but it is not installed or supported on my system, so I did not investigate it further, as I really need a standard way.
Thanks.
Each process has its own table of file descriptors. If you know that a child process has a certain file open with FD 8 (which is easy enough, just take a listing of /proc/<pid>/fd), when you do fsync(8) you are sync'ing a file of your process, not the child's.
The same applies to all functions that use file descriptors: fread, fwrite, dup, close...
To get the effect of fsync, you might call sync instead.
What you could do instead is implement some kind of an RPC mechanism. For example you could add a signal handler that makes the child run fsync on all open FDs when it receives SIGUSR1.
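A rough sketch of that signal-based approach (Linux-only, since it walks /proc/self/fd; purely illustrative):

# --- in the child process ---
import os
import signal

def fsync_everything(signum, frame):
    # fsync every descriptor this process has open; skip pipes, sockets and
    # anything else that does not support fsync.
    for name in os.listdir('/proc/self/fd'):
        try:
            os.fsync(int(name))
        except OSError:
            pass

signal.signal(signal.SIGUSR1, fsync_everything)

# --- in the parent process ---
# os.kill(p.pid, signal.SIGUSR1)   # ask the child to fsync its open files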
If you want a packaged solution instead of going through /proc/<pid>/fd, an option is to use lsof or psutil.
You can't fsync on behalf of another process. Also, you probably want flushing, not fsync. You can't flush on behalf of another process either. Rethink your requirements.
