Question: Is there a way, using Python, to access the stdout of a running process? This process has not been started by Python.
Context: There is a program called mayabatch that renders out images from 3D Maya scene files. If I were to run the program from the command line, I would see progress messages from mayabatch. Sometimes artists close these windows, leaving the progress untraceable until the program finishes. That led me down this route of trying to read its stdout after it's been spawned by a foreign process.
Background:
OS: Windows 7 64-bit
My research so far: I have only found questions and answers of how to do this if it was a subprocess, using the subprocess module. I also looked briefly into psutil, but I could not find any way to read a process' stdout.
Any help would be really appreciated. Thank you.
I don't think you can get to the stdout of a process outside of the code that created it.
The lazy way is just to pipe the output of mayabatch to a text file, and then poll the text file periodically in your own code so it's under your control, rather than forcing you to wait on the pipe (which is especially hard on Windows, since Windows select doesn't work with the pipes used by subprocess).
I think this is what maya does internally too: by default mayaBatch logs its results to a file called mayaRenderLog.txt in the user's Maya directory.
If you're running mayabatch from the command line or a bat file, you can funnel stdout to a file with a > character:
mayabatch.exe "file.ma" > log.txt
You should be able to poll that text file from the outside using standard python as long as you only open it for reading. The advantage of doing it this way is that you control the frequency at which you check the file.
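For example, a minimal sketch of that polling approach (the log path and check interval are placeholders, and the Python 2 print statement matches the rest of this thread):

import time

def follow(path, interval=2.0):
    # Tail a growing log file: print new lines as they appear, sleep when idle.
    # In practice you'd break out of the loop once the render has finished.
    with open(path, 'r') as f:
        while True:
            line = f.readline()
            if line:
                print line.rstrip()
            else:
                time.sleep(interval)  # no new output yet; check again shortly

follow('log.txt')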
On the other hand, if you're doing it from Python, it's a little tougher unless you don't mind having your Python script sit idle until mayabatch completes. The usual subprocess recipe, which uses popen.communicate(), is going to wait for an end-of-process return code:
test = subprocess.Popen(["mayabatch.exe","filename.mb"], stdout=subprocess.PIPE)
print test.communicate()[0]
works, but won't report anything until the process dies. Calling readline on the process's stdout, however, will report the output one line at a time:
test = subprocess.Popen(["mayabatch.exe","filename.mb"], stdout=subprocess.PIPE)
reader = iter(test.subprocess.readlines, "")
for line in reader:
print line
More discussion here
Related
I have a web server in Python (2.7) that uses Popen to delegate some work to a child process:
url_arg = "http://localhost/index.html?someparam=somevalue"
call = ('phantomjs', 'some/phantom/script.js', url_arg)
imageB64data = tempfile.TemporaryFile()
errordata = tempfile.TemporaryFile()
p = Popen(call, stdout=imageB64data, stderr=errordata, stdin=PIPE)
p.communicate(input="")
I am seeing intermittent issues where after some number of these Popens have occurred (roughly 64), the process runs out of file descriptors and is unable to function -- it becomes completely unresponsive and all threads seem to block forever if they attempt to open any files or sockets.
(Possibly relevant: the phantomjs child process loads a URL that calls back into the server that spawned it.)
Based on this Python bug report, I believe I need to set close_fds=True on all Popen calls from inside my server process in order to mitigate the leaking of file descriptors. However, I am unfamiliar with the machinery around exec-ing subprocesses and inheritance of file descriptors so much of the Popen documentation and the notes in the aforementioned bug report are unclear to me.
It sounds like close_fds=True would actually close all open file descriptors (which includes active request sockets, log file handles, etc.) in my process before executing the subprocess. This sounds like it would be strictly better than leaking the sockets, but would still result in errors.
However, in practice, when I use close_fds=True during a web request, it seems to work fine and thus far I have been unable to construct a scenario where it actually closes any other request sockets, database requests, etc.
The docs state:
If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed.
So my question is: is it "safe" and "correct" to pass close_fds=True to Popen in a multithreaded Python web server? Or should I expect this to have side effects if other requests are doing file/socket IO at the same time?
I tried the following test with the subprocess32 backport of Python 3.2/3.3's subprocess:
import tempfile
import subprocess32 as subprocess
fp = open('test.txt', 'w')
fp.write("some stuff")
echoed = tempfile.TemporaryFile()
p = subprocess.Popen(("echo", "this", "stuff"), stdout=echoed, close_fds=True)
p.wait()
echoed.seek(0)
fp.write("whatevs")
fp.write(echoed.read())
fp.close()
and I got the expected result of some stuffwhatevsecho this stuff in test.txt.
So it appears that the close in close_fds does not mean that open files (sockets, etc.) in the parent process become unusable after executing a child process.
Also worth noting: subprocess32 defaults close_fds=True on POSIX systems, AFAICT. This implies to me that it is not as dangerous as it sounds.
I suspect that close_fds solves the problem of file descriptors leaking to subprocesses. Imagine opening a file, and then running some task using subprocess. Without close_fds, the file descriptor is copied to the subprocess, so even if the parent process closes the file, the file remains open due to the subprocess. Now, let's say we want to delete the directory with the file in another thread using shutil.rmtree. On a regular filesystem, this should not be an issue. The directory is just removed as expected. However, when the file resides on NFS, the following happens: First, Python will try to delete the file. Since the file is still in use, it gets renamed to .nfsXXX instead, where XXX is a long hexadecimal number. Next, Python will try to delete the directory, but that has become impossible because the .nfsXXX file still resides in it.
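To make that concrete, here is a minimal sketch of the leak on a POSIX system (the file name is a placeholder and 'sleep' merely stands in for a long-running task):

import subprocess
import time

fp = open('leaky.txt', 'w')   # descriptor opened in the parent

# Started without close_fds=True (the Python 2.7 default), the child inherits
# a copy of fp's descriptor.
child = subprocess.Popen(['sleep', '30'])

fp.close()   # the parent's handle is closed, but the inherited copy keeps the
             # underlying file open until the child exits; on NFS this is what
             # produces the .nfsXXX rename described above

time.sleep(1)
child.terminate()
child.wait()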
I want my program to wait until a specific file contains text instead of an empty string. Another program writes data to the file. When I run the first program, my computer starts overheating because of the while loop that continuously checks the file content. What can I do instead of that loop?
A better solution would be to start that process from within your Python script:
from subprocess import call
retcode = call(['myprocess', 'arg1', 'arg2', 'argN'])
Check whether retcode is zero; zero means success, i.e. your process ran with no problems. You could also use os.system instead of subprocess.call. Once the process is finished, you know you can read the file.
Why is this method better than monitoring files?
The process might fail and there might be no output in the file you're trying to read from.
In this scenario, your process will check the file again and again, looking for data, which wastes kernel I/O time. There is nothing that guarantees the process will succeed every time.
The process may receive signals (e.g. STOP and CONT). If the process receives the STOP signal, the kernel stops it and there might be nothing for you to read from the output file, especially if you intend to read all the data at once, as when you're sorting a file. Once the process receives the CONT signal, it starts again. Basically, this means your Python script would be trying to read from the file while the process is stopped.
The disadvantage of this method is that the process needs to finish before your Python script can process the output from the file. subprocess.call blocks: the next line won't be executed by the Python interpreter until the spawned process finishes. You could instead use subprocess.Popen, which is non-blocking. Even better, if possible, redirect the output of the process to stdout, use Popen to read the output from the process's stdout, and then write that output from the Python script to a file.
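For example, a minimal sketch of that last suggestion (myprocess and its arguments are placeholders), reading the child's output line by line and writing it to a file as it arrives:

import subprocess

proc = subprocess.Popen(['myprocess', 'arg1', 'arg2'],
                        stdout=subprocess.PIPE)

with open('output.txt', 'w') as out:
    # readline blocks only until the next line is available, so this keeps up
    # with the child instead of spinning in a busy loop rereading the file
    for line in iter(proc.stdout.readline, ''):
        out.write(line)

proc.wait()   # reap the child once its stdout has been drained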
I'm using subprocess.Popen to launch an external program with arguments, but when I've opened it the script is hanging, waiting for the program to finish and if I close the script the program immediately quits.
I thought I was just using a similar process before without issue, so I'm unsure if I've actually done it wrong or I'm misremembering what Popen can do. This is how I'm calling my command:
subprocess.Popen(["rv", rvFile, '-nc'])
raw_input("= Opened file")
The raw_input part is only there so the user has a chance to see the message and know that the file should be opened now. But what I end up getting is all the information that the process itself is spitting back out as if it were called in the command line directly. My understanding was that Popen made it an independent child process that would allow me to close the script and leave the other process open.
The linked duplicate question does have a useful answer for my purposes, though it's still not working as I want it.
This is the answer. And this is how I changed my code:
DETACHED_PROCESS = 0x00000008
pid = subprocess.Popen(["rv", rvFile, '-nc'], creationflags=DETACHED_PROCESS).pid
raw_input("= Opened file")
It works from IDLE but not when I run the py file through the command prompt style interface. It's still tied to that window, printing the output and quitting the program as soon as I've run the script.
The stackoverflow question Calling an external command in python has a lot of useful answers which are related.
Take a look at os.spawnl; its mode argument takes flags such as os.P_NOWAIT and os.P_WAIT.
import os

# spawnl needs the executable path plus at least arg0 (conventionally the program name)
os.spawnl(os.P_NOWAIT, 'some_program.exe', 'some_program')
The os.P_NOWAIT mode makes spawnl return the process ID of the spawned task.
Sorry for such a short answer but I have not earned enough points to leave comments yet. Anyhow, put the raw_input("= Opened file") inside the file you are actually opening, rather than the program you are opening it from.
If the file you are opening is not a Python file, then it will close upon finishing, regardless of what you declare from within Python. If that is the case, you could always try detaching it from its parent using:
from subprocess import Popen, CREATE_NEW_PROCESS_GROUP

# CREATE_NEW_PROCESS_GROUP belongs in creationflags; close_fds is a separate boolean
Popen(["rv", rvFile, '-nc'], close_fds=True, creationflags=CREATE_NEW_PROCESS_GROUP)
This is specifically for running the python script as a commandline process, but I eventually got this working by combining two answers that people suggested.
Using DETACHED_PROCESS as suggested in this answer worked for running it through IDLE, but not through the commandline interface. But using shell=True (as ajsp suggested) together with the DETACHED_PROCESS parameter allows me to close the python script window and leave the other program still running.
DETACHED_PROCESS = 0x00000008
pid = subprocess.Popen(["rv", rvFile, '-nc'], creationflags=DETACHED_PROCESS, shell=True).pid
raw_input("= Opened file")
I'm using Node to execute a Python script. The Python script SSH's into a server, and then runs a Pig job. I want to be able to get the standard out from the Pig job, and display it in the browser.
I'm using the PExpect library to make the SSH calls, but this will not print the output of the pig call until it has totally completed (at least the way I have it written). Any tips on how to restructure it?
child.sendline(command)
child.expect(COMMAND_PROMPT)
print(child.before)
I know I shouldn't be expecting the command prompt (cause that will only show up when the process ends), but I'm not sure what I should be expecting.
Repeating my comment as an answer, since it solved the issue:
If you set child.logfile_read to a writable file-like object (e.g. sys.stdout), Pexpect will forward the output there as it reads it:
import sys

child.logfile_read = sys.stdout
child.sendline(command)
child.expect(COMMAND_PROMPT)
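A slightly fuller sketch of the same idea, assuming a key-based ssh login so no password prompt gets in the way (the host, prompt pattern, and Pig invocation are all placeholders):

import sys
import pexpect

COMMAND_PROMPT = r'\$ '                      # placeholder prompt pattern
child = pexpect.spawn('ssh user@somehost')   # placeholder connection
child.logfile_read = sys.stdout              # everything Pexpect reads is echoed immediately

child.expect(COMMAND_PROMPT)
child.sendline('pig -f somejob.pig')         # placeholder Pig job
child.expect(COMMAND_PROMPT, timeout=None)   # Pig's output streams to stdout while we wait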
I'm trying to build an application that displays in a GUI the contents of a log file, written by a separate program that I call through subprocess. The application runs in Windows, and is a binary that I have no control over. Also, this application (Actel Designer if anyone cares) will write its output to a log file regardless of how I redirect the output of subprocess, so using a pipe for the output doesn't seem to be an option. The bottom line is that I seem to be forced into reading from a log file at the same time another thread may be writing to it. My question is if there is a way that I can keep the GUI's display of the log file's contents up to date in a robust way?
I've tried the following:
Naively opening the file for reading periodically while the child process is running causes Python to crash (I'm guessing because the child thread is writing to the file while I'm attempting to read its contents).
Next, I tried opening a file handle to the log file with GENERIC_READ and SHARED_READ | SHARED_WRITE | SHARED_DELETE before invoking the child process, and reading back from that handle. With this approach, the file appears empty.
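For reference, an attempt along those lines might look like this sketch using pywin32 (the log path is a placeholder and the constants are the win32con equivalents of the flags above):

import win32file
import win32con

# Open the log with generous sharing so the child process can keep writing to it
handle = win32file.CreateFile(
    'designer.log',                            # placeholder log path
    win32con.GENERIC_READ,
    win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE | win32con.FILE_SHARE_DELETE,
    None,
    win32con.OPEN_EXISTING,
    0,
    None)

rc, data = win32file.ReadFile(handle, 4096)    # rc is 0 on success
print repr(data)
win32file.CloseHandle(handle)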
Thanks for any help you can provide - I'm not a professional programmer and I've been pulling my hair out over this for a week.
You should register for notifications on file change, the way tail -f does (you can find out what system calls it uses by executing strace tail -f logfile).
pyinotify provides a Python interface for these file change notifications.
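A minimal pyinotify sketch (the log path is a placeholder, and note that pyinotify wraps Linux inotify, so on the asker's Windows setup an equivalent directory-change watcher would be needed instead):

import pyinotify

LOGFILE = '/path/to/logfile'   # placeholder path

class LogChanged(pyinotify.ProcessEvent):
    def process_IN_MODIFY(self, event):
        # Re-read (or tail) the file here and push the new content to the GUI
        print "log modified:", event.pathname

wm = pyinotify.WatchManager()
notifier = pyinotify.Notifier(wm, LogChanged())
wm.add_watch(LOGFILE, pyinotify.IN_MODIFY)
notifier.loop()   # blocks, dispatching events to LogChanged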