Python subprocess.call() spawning new process for every call - python

I am trying to send .mp4 files to an MP4 tagging application. My problem is that on Windows, every time I call subprocess.call() or subprocess.Popen(), a new process is spawned.
What I want is to open the file in the existing process if the process is already running... is this possible, or will it depend on how the called process handles new invocations?
Here is what I have:
def sendToTagger(self, file):
    msg = "-- " + self.getDateStamp() + "-- Sending " + os.path.basename(file) + " to Tagger...\r\n"
    self.logFile.write(msg)
    print(msg)
    p = subprocess.Popen(['C:\\Program Files (x86)\\Tagger\\Tagger.exe', file], shell=False, stdin=None, stdout=None)

It has to spawn a new process, as you are calling an external command that is not native to your Python code. But you can, if you wish, wait for the process to complete by calling p.wait().

subprocess.Popen always opens a new process (that is its purpose). You need to determine how Tagger.exe allows another program to programmatically request that it open a new file. In the simplest case you can communicate with it over stdin (in which case you need to set stdin, and possibly stdout, to PIPE). However, the program may require some other method of inter-process communication (IPC), such as sockets or shared memory. I am not familiar with the methods on Windows, but if Tagger is a graphical program, there is a good chance you will need to do something more sophisticated.
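If Tagger.exe does accept commands on its stdin (an assumption; you would need to check its documentation), a minimal sketch would start it once and reuse the same Popen handle, writing one filename per line:
import subprocess

# Start the tagger once and keep the handle around.
# Assumes (hypothetically) that Tagger.exe reads filenames from its stdin.
tagger = subprocess.Popen(
    ['C:\\Program Files (x86)\\Tagger\\Tagger.exe'],
    stdin=subprocess.PIPE,
)

def sendToTagger(file):
    # Hand the path to the already-running process, one line per file.
    tagger.stdin.write((file + "\r\n").encode())  # bytes for the binary pipe
    tagger.stdin.flush()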

Related

python 2.7 Popen: what does `close_fds` do?

I have a web server in Python (2.7) that uses Popen to delegate some work to a child process:
import tempfile
from subprocess import Popen, PIPE

url_arg = "http://localhost/index.html?someparam=somevalue"
call = ('phantomjs', 'some/phantom/script.js', url_arg)
imageB64data = tempfile.TemporaryFile()
errordata = tempfile.TemporaryFile()
p = Popen(call, stdout=imageB64data, stderr=errordata, stdin=PIPE)
p.communicate(input="")
I am seeing intermittent issues where after some number of these Popens have occurred (roughly 64), the process runs out of file descriptors and is unable to function -- it becomes completely unresponsive and all threads seem to block forever if they attempt to open any files or sockets.
(Possibly relevant: the phantomjs child process loads a URL that calls back into the server that spawned it.)
Based on this Python bug report, I believe I need to set close_fds=True on all Popen calls from inside my server process in order to mitigate the leaking of file descriptors. However, I am unfamiliar with the machinery around exec-ing subprocesses and inheritance of file descriptors so much of the Popen documentation and the notes in the aforementioned bug report are unclear to me.
It sounds like it would actually close all open file descriptors (which includes active request sockets, log file handles, etc.) in my process before executing the subprocess. This sounds like it would be strictly better than leaking the sockets, but would still result in errors.
However, in practice, when I use close_fds=True during a web request, it seems to work fine and thus far I have been unable to construct a scenario where it actually closes any other request sockets, database requests, etc.
The docs state:
If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed.
So my question is: is it "safe" and "correct" to pass close_fds=True to Popen in a multithreaded Python web server? Or should I expect this to have side effects if other requests are doing file/socket IO at the same time?
I tried the following test with the subprocess32 backport of Python 3.2/3.3's subprocess:
import tempfile
import subprocess32 as subprocess
fp = open('test.txt', 'w')
fp.write("some stuff")
echoed = tempfile.TemporaryFile()
p = subprocess.Popen(("echo", "this", "stuff"), stdout=echoed, close_fds=True)
p.wait()
echoed.seek(0)
fp.write("whatevs")
fp.write(echoed.read())
fp.close()
and I got the expected result of some stuffwhatevsecho this stuff in test.txt.
So it appears that the meaning of close in close_fds does not mean that open files (sockets, etc.) in the parent process will be unusable after executing a child process.
Also worth noting: subprocess32 defaults close_fds=True on POSIX systems, AFAICT. This implies to me that it is not as dangerous as it sounds.
I suspect that close_fds solves the problem of file descriptors leaking to subprocesses. Imagine opening a file, and then running some task using subprocess. Without close_fds, the file descriptor is copied to the subprocess, so even if the parent process closes the file, the file remains open due to the subprocess. Now, let's say we want to delete the directory with the file in another thread using shutil.rmtree. On a regular filesystem, this should not be an issue. The directory is just removed as expected. However, when the file resides on NFS, the following happens: First, Python will try to delete the file. Since the file is still in use, it gets renamed to .nfsXXX instead, where XXX is a long hexadecimal number. Next, Python will try to delete the directory, but that has become impossible because the .nfsXXX file still resides in it.
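To make the leak concrete, here is a small POSIX-only sketch under Python 2 semantics (where close_fds defaults to False); the paths and the sleep child are placeholders for illustration:
import subprocess

f = open("/tmp/example.txt", "w")

# Without close_fds=True the child inherits f's descriptor, so even after
# the parent calls f.close() the file stays open until the child exits.
leaky = subprocess.Popen(["sleep", "60"])
f.close()

# With close_fds=True the inherited copy is closed in the child before exec,
# so the parent's close() really does release the file.
g = open("/tmp/example2.txt", "w")
clean = subprocess.Popen(["sleep", "60"], close_fds=True)
g.close()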

Python other way to wait for an event

I want my program to wait until a specific file contains text instead of an empty string. Another program writes data to the file. When I run the first program, my computer starts overheating because of the while loop that continuously checks the file content. What can I do instead of that loop?
A better solution would be to start that process from within your Python script:
from subprocess import call
retcode = call(['myprocess', 'arg1', 'arg2', 'argN'])
Check whether retcode is zero; zero means success, i.e. your process ran with no problems. You could also use os.system instead of subprocess.call. Once the process is finished, you know you can read the file.
Why is this method better than monitoring the file?
The process might fail and there might be no output in the file you're trying to read from.
In that scenario, your script would check the file again and again looking for data; this wastes kernel I/O time, and there is nothing that guarantees the process will succeed every time.
The process may receive signals (e.g. STOP and CONT). If the process receives the STOP signal, the kernel stops it and there might be nothing you can read from the output file, especially if you intend to read all the data at once, as when sorting a file. Once the process receives the CONT signal, it starts again. Basically, this means your Python script could be trying to read from the file while the process is stopped.
The disadvantage of this method is that the process needs to finish before your Python script can process the output from the file. subprocess.call blocks: the next line won't be executed by the Python interpreter until the spawned process finishes. You could instead use subprocess.Popen, which is non-blocking. Even better, if possible, have the process write its output to stdout and use Popen to read that output from the child's stdout, then write it to a file from the Python script, as sketched below.
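A minimal sketch of that last approach, assuming the hypothetical myprocess writes its results to stdout one line at a time:
import subprocess

# Capture the child's stdout instead of polling a file it writes to.
p = subprocess.Popen(['myprocess', 'arg1', 'arg2', 'argN'],
                     stdout=subprocess.PIPE)

with open('output.txt', 'w') as out:
    # Read lines as the child produces them and copy each one to the file.
    for line in iter(p.stdout.readline, b''):
        out.write(line.decode())

retcode = p.wait()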

Accessing an ALREADY running process, with Python

Question: Is there a way, using Python, to access the stdout of a running process? This process has not been started by Python.
Context: There is a program called mayabatch that renders out images from 3D Maya scene files. If I were to run the program from the command line, I would see progress messages from mayabatch. Sometimes, artists close these windows, leaving the progress untraceable until the program finishes. That led me down this route of trying to read its stdout after it's been spawned by a foreign process.
Background:
OS: Windows 7 64-bit
My research so far: I have only found questions and answers of how to do this if it was a subprocess, using the subprocess module. I also looked briefly into psutil, but I could not find any way to read a process' stdout.
Any help would be really appreciated. Thank you.
I don't think you can get to the stdout of a process outside of the code that created it.
The lazy way is just to pipe the output of mayabatch to a text file, and then poll the text file periodically in your own code so it's under your control, rather than forcing you to wait on the pipe (which is especially hard on Windows, since Windows select doesn't work with the pipes used by subprocess).
I think this is what maya does internally too: by default mayaBatch logs its results to a file called mayaRenderLog.txt in the user's Maya directory.
If you're running mayabatch from the command line or a bat file, you can funnel stdout to a file with a > character:
mayabatch.exe "file.ma" > log.txt
You should be able to poll that text file from the outside using standard python as long as you only open it for reading. The advantage of doing it this way is that you control the frequency at which you check the file.
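A rough sketch of that polling loop, assuming mayabatch keeps appending to log.txt while it renders (the path and poll interval are placeholders):
import time

def follow(path, poll_seconds=2.0):
    # Yield new lines appended to path, checking every poll_seconds.
    with open(path, 'r') as f:
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)

for line in follow('log.txt'):
    print(line.rstrip())  # break out yourself once the render log says it is done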
OTOH, if you're doing it from Python, it's a little tougher unless you don't mind having your Python script idle until mayabatch completes. The usual subprocess recipe, which uses Popen.communicate(), is going to wait for an end-of-process return code:
import subprocess

test = subprocess.Popen(["mayabatch.exe", "filename.mb"], stdout=subprocess.PIPE)
print test.communicate()[0]
This works, but it won't report anything until the process dies. Calling readline on the process's stdout, however, will report the output one line at a time:
test = subprocess.Popen(["mayabatch.exe", "filename.mb"], stdout=subprocess.PIPE)
reader = iter(test.stdout.readline, "")
for line in reader:
    print line
More discussion here

Cleaning up temp folder after long-running subprocess exits

I have a Python script (running inside another application) which generates a bunch of temporary images. I then use subprocess to launch an application to view these.
When the image-viewing process exits, I want to remove the temporary images.
I can't do this from Python, as the Python process may have exited before the subprocess completes, i.e. I cannot do the following:
import os
import subprocess

p = subprocess.Popen(["imgviewer", "/example/image1.jpg", "/example/image2.jpg"])
p.communicate()
os.unlink("/example/image1.jpg")
os.unlink("/example/image2.jpg")
...as this blocks the main thread, nor could I check for the pid exiting in a thread, etc.
The only solution I can think of means I have to use shell=True, which I would rather avoid:
import pipes
import subprocess

cmd = ['imgviewer']
cmd.append("/example/image2.jpg")
for x in cleanup:  # cleanup: the list of temp image paths to delete afterwards
    cmd.extend(["&&", "rm", pipes.quote(x)])
cmdstr = " ".join(cmd)
subprocess.Popen(cmdstr, shell=True)
This works, but is hardly elegant..
Basically, I have a background subprocess, and want to remove the temp files when it exits, even if the Python process no longer exists.
If you're on any variant of Unix, you could fork your Python program, and have the parent process go on with its life while the child process daemonizes, runs the viewer (doesn't matter in the least if that blocks the child process, which has no other job in life anyway;-), and cleans up after it. The original Python process may or may not exist at this point, but the "waiting to clean up" child process of course will (some process or other has to do the clean-up, after all, right?-).
If you're on Windows, or need cross-platform code, then have your Python program "spawn" (i.e., just start with subprocess, then go on with life) another (much smaller) one, which is the one tasked to run the viewer (blocking, who cares) and then do the clean-up. (If on Unix, even in this case you may want to daemonize, otherwise the child process might go away when the parent process does).
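A minimal sketch of that cross-platform variant, using a hypothetical helper script cleanup_viewer.py that blocks on the viewer and then removes the files:
# cleanup_viewer.py (hypothetical helper): run the viewer, then delete the files.
import os
import subprocess
import sys

viewer, files = sys.argv[1], sys.argv[2:]
subprocess.call([viewer] + files)  # blocks until the viewer exits
for path in files:
    try:
        os.unlink(path)
    except OSError:
        pass
The main script then just fires off the helper and carries on (or exits) without waiting:
import subprocess
import sys

subprocess.Popen([sys.executable, "cleanup_viewer.py",
                  "imgviewer", "/example/image1.jpg", "/example/image2.jpg"])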

Starting and Controlling an External Process via STDIN/STDOUT with Python

I need to launch an external process that is to be controlled via messages sent back and forth via stdin and stdout. Using subprocess.Popen I am able to start the process but am unable to control the execution via stdin as I need to.
The flow of what I'm trying to complete is to:
Start the external process
Iterate for some number of steps
Tell the external process to complete the next processing step by writing a new-line character to its stdin
Wait for the external process to signal it has completed the step by writing a new-line character to its stdout
Close the external process's stdin to indicate to the external process that execution has completed.
I have come up with the following so far:
process = subprocess.Popen([PathToProcess], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
for i in xrange(StepsToComplete):
    print "Forcing step # %s" % i
    process.communicate(input='\n')
When I run the above code, the '\n' is not communicated to the external process, and I never get beyond step #0. The code blocks at process.communicate() and does not proceed any further. Am I using the communicate() method incorrectly?
Also how would I implement the "wait until the external process writes a new line" piece of functionality?
process.communicate(input='\n') is wrong. As the Python docs note, it writes your string to the stdin of the child, then reads all output from the child until the child exits. From docs.python.org:
Popen.communicate(input=None)
Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached. Wait for process to terminate. The optional input argument should be a string to be sent to the child process, or None, if no data should be sent to the child.
Instead, you want to just write to the stdin of the child. Then read from it in your loop.
Something more like:
process = subprocess.Popen([PathToProcess], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
for i in xrange(StepsToComplete):
    print "Forcing step # %s" % i
    process.stdin.write("\n")
    result = process.stdout.readline()
This will do something more like what you want.
You could use Twisted, by using reactor.spawnProcess and LineReceiver.
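A hedged sketch of the Twisted route, using a ProcessProtocol rather than a LineReceiver since each step is signalled by a single newline; PathToProcess and StepsToComplete carry over from the question, and the assumption that the child prints a newline per completed step comes from the question as well:
from twisted.internet import protocol, reactor

class StepDriver(protocol.ProcessProtocol):
    def __init__(self, steps):
        self.remaining = steps

    def connectionMade(self):
        # Kick off the first step by writing a newline to the child's stdin.
        self.transport.write("\n")

    def outReceived(self, data):
        # Assumes the child writes a newline to stdout when a step completes.
        self.remaining -= 1
        if self.remaining > 0:
            self.transport.write("\n")
        else:
            self.transport.closeStdin()

    def processEnded(self, reason):
        reactor.stop()

reactor.spawnProcess(StepDriver(StepsToComplete), PathToProcess, [PathToProcess])
reactor.run()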
