How can I terminate a process created with subprocess.run in Python 3?
The documentation of subprocess.run is here, but it doesn't specify it.
The documentation of the return-value is here, but there's no hint for it in there either.
With subprocess.Popen it's easy:
p = subprocess.Popen(...)
...
p.terminate()
How can I do the same when using subprocess.run?
You cannot, because control only returns to the Python interpreter once the process has ended.
You could try running it in a thread, getting hold of the PID and killing the process from outside, but...
For those cases, Popen is the better solution, as it lets you control the process's input/output and its termination.
From the documentation:
The underlying process creation and management in this module is handled by the Popen class. It offers a lot of flexibility so that developers are able to handle the less common cases not covered by the convenience functions.
Note that the documentation starts by describing run, then Popen, then the older check_call, check_output, ... calls (which are not deprecated, but are superseded by run for most uses).
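For instance, a minimal sketch of that Popen approach, keeping a handle you can terminate (the command and the five-second limit are placeholders):
import subprocess

# Start the child the way run() would, but keep the Popen handle around
p = subprocess.Popen(['sleep', '60'])
try:
    p.wait(timeout=5)          # give the child a few seconds to finish on its own
except subprocess.TimeoutExpired:
    p.terminate()              # send SIGTERM (TerminateProcess on Windows)
    p.wait()                   # reap the child so it doesn't linger as a zombie
If all you need is a time limit rather than terminating on demand, subprocess.run itself accepts a timeout argument and kills the child when it expires.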
Related
When calling a linux binary which takes a relatively long time through Python's subprocess module, does this release the GIL?
I want to parallelise some code which calls a binary program from the command line. Is it better to use threads (through threading and a multiprocessing.pool.ThreadPool) or multiprocessing? My assumption is that if subprocess releases the GIL then choosing the threading option is better.
When calling a linux binary which takes a relatively long time through Python's subprocess module, does this release the GIL?
Yes, it releases the Global Interpreter Lock (GIL) in the calling process.
As you are likely aware, on POSIX platforms subprocess offers convenience interfaces atop the "raw" components from fork, execve, and waitpid.
By inspection of the CPython 2.7.9 sources, fork and execve do not release the GIL. However, those calls do not block, so we'd not expect the GIL to be released.
waitpid, of course, does block, but we can see its implementation does give up the GIL using the ALLOW_THREADS macros:
static PyObject *
posix_waitpid(PyObject *self, PyObject *args)
{
....
Py_BEGIN_ALLOW_THREADS
pid = waitpid(pid, &status, options);
Py_END_ALLOW_THREADS
....
This could also be tested by calling out to some long running program like sleep from a demonstration multithreaded python script.
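For example, a rough demonstration along those lines (the sleep length and thread count are arbitrary): if the GIL were held across the wait, the threads would run one after another and the total time would be roughly five times the sleep.
import subprocess
import threading
import time

def run_sleep():
    # check_call() blocks in waitpid(), which releases the GIL
    subprocess.check_call(['sleep', '2'])

start = time.time()
threads = [threading.Thread(target=run_sleep) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Prints roughly 2 seconds, not 10, because the waits overlap
print('elapsed: %.1f s' % (time.time() - start))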
The GIL doesn't span multiple processes. subprocess.Popen starts a new process. If it starts a Python process, then that process will have its own GIL.
You don't need multiple threads (or processes created by multiprocessing) if all you want is to run some linux binaries in parallel:
from subprocess import Popen
# start all processes
processes = [Popen(['program', str(i)]) for i in range(10)]
# now all processes run in parallel
# wait for processes to complete
for p in processes:
    p.wait()
You could use multiprocessing.pool.ThreadPool to limit the number of concurrently running programs.
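For example, a sketch of that idea, reusing the hypothetical program name from above and capping the pool at four workers:
from multiprocessing.pool import ThreadPool
from subprocess import call

def run(i):
    # each worker thread blocks in the subprocess wait, releasing the GIL
    return call(['program', str(i)])

pool = ThreadPool(4)                       # at most 4 programs run at the same time
returncodes = pool.map(run, range(10))
pool.close()
pool.join()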
Since subprocess is for running executables (it is essentially a wrapper around os.fork() and os.execve()), it probably makes more sense to use it. You can use subprocess.Popen. Something like:
import subprocess
process = subprocess.Popen(["binary"])
This will run as a separate process, hence not being affected by the GIL. You can then use the Popen.poll() method to check if the child process has terminated:
if process.poll() is not None:
    # poll() returns None while the child is still running,
    # so reaching this branch means the process has finished its work
    returncode = process.returncode
You just need to make sure you don't call any of the methods that wait for the process to finish its work (e.g. Popen.communicate()), to avoid your Python script blocking.
As mentioned in this answer
multiprocessing is for running functions within your existing (Python) code with support for more flexible communications among the family of processes. The multiprocessing module is intended to provide interfaces and features which are very similar to threading while allowing CPython to scale your processing among multiple CPUs/cores despite the GIL.
So, given your use-case, subprocess seems to be the right choice.
I see a number of posts here about subprocess module and it looks like it's changed quite a bit over the years. From reading documentation, I think I understand the answer to my question, but I'm asking for either confirmation or for someone to tell me what I'm missing. I'm using python 3.6.3.
I have a simple use case for subprocess. I need to build and execute a command. I need to capture stdout and stderr.
I do not need live results.
The size of stdout and stderr will be small relative to the server, so I don't think I need to be concerned about memory issues.
While stdout and std error will be small, I will expect multiple concurrent instances of the parent process to be invoked by a job scheduler.
I've been reading about deadlocks and pipes. Basically, the documentation warns against using stdout=PIPE (together with Popen.wait() or similar) because the child's output can fill the OS pipe buffer and deadlock.
But I think I can use subprocess.run and pass in
stdout=subprocess.PIPE
stderr=subprocess.PIPE
I think this option is different from the stdout=PIPE usage the documentation warns about, because subprocess.run uses Popen.communicate() under the covers, which takes care of making sure the OS pipe doesn't fill up. Thus no deadlock problems, and after .run() finishes I can post-process stdout and stderr.
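For concreteness, the pattern I have in mind looks roughly like this (the command is a placeholder):
import subprocess

result = subprocess.run(
    ['some-command', '--flag'],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,   # decode bytes to str (on 3.7+ you can use text=True)
)

# run() has already drained both pipes via communicate(), so nothing blocks;
# post-process the captured output after the child has exited
print(result.returncode)
print(result.stdout)
print(result.stderr)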
Is this an accurate assessment or am I missing something?
Update #1:
from subprocess import Popen, PIPE

process = Popen(command, stdout=PIPE, stderr=PIPE)
stdout, stderr = process.communicate()
In this case, are the following two things true?
the potential deadlock issues are eliminated by use of .communicate()
.communicate() will allow process to fully complete
This code seems pretty simple, but I've seen an array of other code that is far more complicated and I'm not sure if that code is solving a problem that I do not have.
It seems like subprocess.Popen() and os.fork() are both able to create a child process. I would, however, like to know what the difference between them is. When would you use which one? I tried looking at their source code, but I couldn't find fork()'s source code on my machine, and it wasn't totally clear how Popen works on Unix machines.
Could somebody please elaborate?
Thanks
subprocess.Popen lets you execute an arbitrary program/command/executable/whatever in its own process.
os.fork only allows you to create a child process that will continue executing the same script from the exact line in which you called it. As its name suggests, it "simply" forks the current process into two.
os.fork is only available on Unix, whereas subprocess.Popen is cross-platform.
So I read the documentation for you. Results:
os.fork only exists on Unix. It creates a child process (by cloning the existing process), but that's all it does. When it returns, you have two (mostly) identical processes, both running the same code, both returning from os.fork (but the new process gets 0 from os.fork while the parent process gets the PID of the child process).
subprocess.Popen is more portable (in particular, it works on Windows). It creates a child process, but you must specify another program that the child process should execute. On Unix, it is implemented by calling os.fork (to clone the parent process), then os.execvp (to load the program into the new child process). Because Popen is all about executing a program, it lets you customize the initial environment of the program. You can redirect its standard handles, specify command line arguments, override environment variables, set its working directory, etc. None of this applies to os.fork.
In general, subprocess.Popen is more convenient to use. If you use os.fork, there's a lot you need to handle manually, and it'll only work on Unix systems. On the other hand, if you actually want to clone a process and not execute a new program, os.fork is the way to go.
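A tiny sketch of the fork behaviour described above, for reference:
import os

pid = os.fork()
if pid == 0:
    # child: same script, same point of execution, but fork() returned 0
    print('child, my pid is', os.getpid())
    os._exit(0)          # exit the child without running parent cleanup
else:
    # parent: fork() returned the child's PID
    os.waitpid(pid, 0)   # reap the child
    print('parent, child pid was', pid)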
subprocess.Popen() spawns a new OS-level process.
os.fork() creates another process which will resume at exactly the same place as this one. So at the point of the call you get a fork, after which you have two processes: the "original" one (in which os.fork() returns the PID of the child) and the forked one (in which it returns 0).
I wish to have a python script that runs an external program in a loop sequentially. I also want to limit each execution of the program to a max running time. If it is exceeded, then kill the program. What is the best way to accomplish this?
Thanks!
To run an external program from Python you'll normally want to use the subprocess module.
You could "roll your own" subprocess handling using os.fork() and os.execve() (or one of its exec* cousins) ... with any file descriptor plumbing and signal handling magic you like. However, the subprocess.Popen() function has implemented and exposed most of the features for what you'd want to do for you.
To arrange for the program to die after a given period of time you can have your Python script kill it after the timeout. Naturally you'll want to check to see if the process has already completed before then. Here's a dirt-stupid example (using the split function from the shlex module for additional readability):
from shlex import split as splitsh
import subprocess
import time
TIMEOUT=10
cmd = splitsh('/usr/bin/sleep 60')
proc = subprocess.Popen(cmd)
time.sleep(TIMEOUT)
pstatus = proc.poll()
if pstatus is None:
    proc.kill()
    # Could use os.kill() to send a specific signal
    # such as HUP or TERM, check status again and
    # then resort to proc.kill() or os.kill() for
    # SIGKILL only if necessary
As noted there are a few ways to kill your subprocess. Note that I check for "is None" rather than testing pstatus for truth. If your process completed with an exit value of zero (conventionally indicating that no error occurred) then a naïve test of the proc.poll() results would conflate that completion with the still running process status.
There are also a few ways to determine if sufficient time has passed. In this example we sleep, which is somewhat silly if there's anything else we could be doing. That just leaves our Python process (the parent of your external program) laying about idle.
You could capture the start time using time.time() then launch your subprocess, then do other work (launch other subprocesses, for example) and check the time (perhaps in a loop of other activity) until your desired timeout has been exceeded.
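A sketch of that polling approach, reusing the TIMEOUT and proc names from the example above:
import time

start = time.time()
while proc.poll() is None:
    if time.time() - start > TIMEOUT:
        proc.kill()
        break
    time.sleep(0.1)   # or do another slice of useful work instead of sleeping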
If any of your other activity involves file or socket (network) operations then you'd want to consider using the select module as a way to return a list of file descriptors which are readable, writable or ready with "exceptional" events. The select.select() function also takes an optional "timeout" value. A call to select.select([],[],[],x) is essentially the same as time.sleep(x) (in the case where we aren't providing any file descriptors for it to select among).
In lieu of select.select() it's also possible to use the fcntl module to set your file descriptor into non-blocking mode and then use os.read() (NOT the normal file object .read() methods, but the lower-level functionality from the os module). Again, it's better to use the higher-level interfaces where possible and only resort to the lower-level functions when you must. If you use non-blocking I/O, then all your os.read() or similar operations must be done within exception-handling blocks, since Python will represent the EWOULDBLOCK condition as an OSError (exception) like: "OSError: [Errno 11] Resource temporarily unavailable" (Linux). The precise number of the error might vary from one OS to another; however, it should be portable (at least for POSIX systems) to compare against the EWOULDBLOCK value from the errno module.
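A sketch of that lower-level approach (the command is a placeholder); on Python 3 the EWOULDBLOCK condition surfaces as BlockingIOError, a subclass of OSError:
import errno
import fcntl
import os
import subprocess

proc = subprocess.Popen(['some-long-command'], stdout=subprocess.PIPE)
fd = proc.stdout.fileno()

# switch the pipe's read end to non-blocking mode
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

try:
    chunk = os.read(fd, 4096)
except OSError as e:
    if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
        chunk = b''        # nothing available right now; try again later
    else:
        raise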
(I realize I'm going down a rathole here, but information on how your program can do something useful while your child processes are running external programs is a natural extension of how to manage the timeouts for them).
Ugly details about non-blocking file I/O (including portability issues with MS Windows) have been discussed here in the past: Stackoverflow: non-blocking read on a stream in Python
As others have commented, it's better to provide more detailed questions and include short, focused snippets of code which show what effort you've already undertaken. Usually you won't find people here inclined to write tutorials rather than answers.
If you are able to use Python 3.3 or later, subprocess.call accepts a timeout argument. From the docs:
subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False, timeout=None)
subprocess.call(["ls", "-l"])
0
subprocess.call("exit 1", shell=True)
1
Should do the trick.
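For the loop in the original question that would look roughly like this (the program name and time limit are placeholders); call() kills the child and raises TimeoutExpired when the limit is exceeded:
import subprocess

MAX_SECONDS = 30                                      # per-run time limit
commands = [['./myprog', str(i)] for i in range(5)]   # hypothetical program

for cmd in commands:
    try:
        rc = subprocess.call(cmd, timeout=MAX_SECONDS)
        print(cmd, 'exited with', rc)
    except subprocess.TimeoutExpired:
        # call() has already killed the child before raising
        print(cmd, 'killed after', MAX_SECONDS, 'seconds')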
I'm trying to find a good and simple method to signal child processes (created through SocketServer with ForkingMixIn) from the parent process.
While Unix signals could be used, I want to avoid them since only children who are interested should receive the signal, and it would be overkill and complicated to require some kind of registration mechanism to identify to the parent process who is interested.
(Please don't suggest threads, as this particular program won't work with threads, and thus has to use forks.)
Since you are on a Unix system, semaphores should be the easy answer. Unfortunately, Python does not seem to offer a way to call the semop system call.
If you are using Python 2.6, you may be able to use the multiprocessing module's Condition class.
I have come up with the idea of using a pipe file descriptor that the parent could write and then read/flush in combination with select, but this doesn't really qualify as a very elegant design.
In more detail: The parent would create a pipe, the subprocesses would inherit it, the parent process would write to the pipe, thereby waking up any subprocess select():ing on the file descriptor, but the parent would then immediately read from the read end of the pipe and empty it - the only effect being that those child processes that were select():ing on the pipe have woken up.
As I said, this feels odd and ugly, but I haven't found anything really better yet.
Edit:
It turns out that this doesn't work - some child processes are woken up and some aren't. I've resorted to using a Condition from the multiprocessing module.
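For reference, a rough sketch of that Condition approach, assuming the children are forked after the Condition is created so that they inherit it (the child count and the one-second pause are placeholders; a real server would make sure the children are actually waiting before notifying):
import os
import time
from multiprocessing import Condition

wakeup = Condition()

for _ in range(3):
    if os.fork() == 0:
        # child: block until the parent signals
        with wakeup:
            wakeup.wait()
        print('child %d woken up' % os.getpid())
        os._exit(0)

# parent: give the children a moment to start waiting, then wake them all
time.sleep(1)
with wakeup:
    wakeup.notify_all()

for _ in range(3):
    os.wait()   # reap the children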