subprocess.Popen() getting stuck - python

My question
I encountered a hang-up issue caused by the combination of threading, multiprocessing, and subprocess. I have simplified my situation to the script below.
import subprocess
import threading
import multiprocessing

class dummy_proc(multiprocessing.Process):
    def run(self):
        print('run')
        while True:
            pass

class popen_thread(threading.Thread):
    def run(self):
        proc = subprocess.Popen('ls -la'.split(), shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout_byte, stderr_byte = proc.communicate()
        rc = proc.returncode
        print(rc)

if __name__ == '__main__':
    print('start')
    t = popen_thread()
    t.start()
    p = dummy_proc()
    p.start()
    t.join()
    p.terminate()
In this script, one thread and one process are created. The thread just runs the system command ls -la. The process just loops forever. Once the thread has obtained the return code of the system command, the main program terminates the process and exits immediately.
When I run this script repeatedly, it sometimes hangs. I googled this situation and found some articles that seem to be related.
Is it safe to fork from within a thread?
Issue with python's subprocess,popen (creating a zombie and getting stuck)
So my guess is that the hang happens something like this:
The process is forked between Popen() and communicate().
It inherits some "blocking" state from the thread, and that state is never released.
This prevents the thread from ever getting the result of communicate().
But I'm not 100% confident, so it would be great if someone could explain what actually happens here.
My environment
I used the following environment.
$ uname -a
Linux dell-vostro5490 5.10.96-1-MANJARO #1 SMP PREEMPT Tue Feb 1 16:57:46 UTC 2022 x86_64 GNU/Linux
$ python3 --version
Python 3.9.2
I also tried the following environment and got the same result.
$ uname -a
Linux raspberrypi 5.10.17+ #2 Tue Jul 6 21:58:58 PDT 2021 armv6l GNU/Linux
$ python3 --version
Python 3.7.3
What I tried
Use spawn instead of fork for multiprocessing.
Use thread instead of process for dummy_proc.
In both cases the issue disappeared, so I guess this issue is related to the behavior of fork... Concretely, the only change for the spawn case is sketched below.
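For reference, a minimal sketch of the spawn change (only the __main__ block differs; dummy_proc and popen_thread are unchanged from the script above):

import multiprocessing

if __name__ == '__main__':
    # Force the spawn start method so dummy_proc is started in a fresh
    # interpreter instead of being forked from this (multithreaded) process.
    multiprocessing.set_start_method('spawn')
    print('start')
    t = popen_thread()
    t.start()
    p = dummy_proc()
    p.start()
    t.join()
    p.terminate()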

This is a bit too long for a comment and so ...
I am having a problem understanding your statement that the problem disappears when you "Use thread instead of process for dummy_proc."
The hanging problem as I understand it is that "fork() only copies the calling thread, and any mutexes held in child threads will be forever locked in the forked child." In other words, the hanging problem arises when a fork is done while there exist one or more threads other than the main thread (i.e., the one associated with the main process).
If you execute a subprocess.Popen call from a newly created process or a newly created thread, then either way there is, by definition, a new thread in existence prior to the fork done to implement the Popen call, and I would think the potential for hanging exists.
import subprocess
import threading
import multiprocessing
import os

class popen_process(multiprocessing.Process):
    def run(self):
        print(f'popen_process, PID = {os.getpid()}, TID={threading.current_thread().ident}')
        proc = subprocess.Popen('ls -la'.split(), shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout_byte, stderr_byte = proc.communicate()
        rc = proc.returncode

if __name__ == '__main__':
    print(f'main process, PID = {os.getpid()}, TID={threading.current_thread().ident}')
    multiprocessing.set_start_method('spawn')
    p = popen_process()
    p.start()
    p.join()
Prints:
main process, PID = 14, TID=140301923051328
popen_process, PID = 16, TID=140246240732992
Note the new thread with TID=140246240732992
It seems to me that you need to use the start method spawn, as long as you are doing the Popen call from another thread or process, if you want to be sure of not hanging. For what it's worth, on my Windows Subsystem for Linux I could not get your code to hang with fork after quite a few tries, so I am just going by what the linked answer warns against.
In any event, in your example code there seems to be a potential race condition. Let's assume that even though your popen_thread is a new thread, its properties are such that it does not give rise to the hanging problem (no mutexes are being held). Then the problem would arise from the creation of the dummy_proc process/thread. The question then becomes whether your call to t.start() completes the starting of the new process that ultimately runs the ls -la command before or after the creation of the dummy_proc process/thread. That timing determines whether the new dummy_proc thread (there will be one regardless of whether dummy_proc inherits from Process or Thread, as we have seen) exists prior to the creation of the ls -la process. This race condition might explain why you were only sometimes hanging. I have no explanation for why you never hang if you make dummy_proc inherit from threading.Thread. A sketch of one way to force that ordering deterministically follows.
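For illustration only (this is my own sketch, not code from the question; the threading.Event is something I am adding), here is one way to guarantee that dummy_proc is not forked while the Popen call is still in flight:

import subprocess
import threading
import multiprocessing

popen_done = threading.Event()  # illustrative: set once Popen() has returned

class dummy_proc(multiprocessing.Process):
    def run(self):
        while True:
            pass

class popen_thread(threading.Thread):
    def run(self):
        proc = subprocess.Popen('ls -la'.split(), shell=False,
                                stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        popen_done.set()  # the ls -la process now exists and the pipes are set up
        stdout_byte, stderr_byte = proc.communicate()
        print(proc.returncode)

if __name__ == '__main__':
    t = popen_thread()
    t.start()
    popen_done.wait()   # don't fork dummy_proc until the Popen call has completed
    p = dummy_proc()
    p.start()
    t.join()
    p.terminate()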

Related

Python subprocess hang when a lot of executable are called

I have a problem with using Python to run Windows executables in parallel.
I will explain my problem in more detail.
I was able to write some code that creates a number of threads equal to the number of cores. Each thread executes the following function, which starts an executable with subprocess.Popen().
The executables are unit tests for an application. The tests use the gtest library. As far as I know, they just read and write on the file system.
def _execute(self, test_file_path) -> None:
    test_path = self._get_test_path_without_extension(test_file_path)
    process = subprocess.Popen(test_path,
                               shell=False,
                               stdout=sys.stdout,
                               stderr=sys.stderr,
                               universal_newlines=True)
    try:
        process.communicate(timeout=TEST_TIMEOUT_IN_SECONDS)
        if process.returncode != 0:
            print(f'Test fail')
    except subprocess.TimeoutExpired:
        process.kill()
During the execution it happens that some processes hang and never end. I set a timeout as a workaround, but I am wondering why some of these applications never terminate, because this blocks the execution of the Python code.
The following code shows the creation of the threads. The function _execute_tests just takes a test from the queue (with the .get() function) and passes it to the function _execute(test_file_path); a simplified sketch of that worker loop is shown after the code below.
# Piece of code used to spawn the threads
for i in range(int(core_num)):
    thread = threading.Thread(target=self._execute_tests,
                              args=(tests,),
                              daemon=True)
    threads.append(thread)
    thread.start()
for thread in threads:
    thread.join()
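Roughly, _execute_tests looks like this (a simplified, illustrative sketch; the exact queue and error handling in my real code differ):

import queue

def _execute_tests(self, tests) -> None:
    # Pull test paths from the queue until it is empty and run each one.
    while True:
        try:
            test_file_path = tests.get(block=False)
        except queue.Empty:
            break
        self._execute(test_file_path)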
I have already tried to:
use subprocess.run, subprocess.call, and the other functions explained on the documentation page
use a larger buffer with the use of bufsize parameter
disable the buffer
move the stdout to a file per thread
move the stdout to subprocess.DEVNULL
remove the use of subprocess.communicate()
remove the use of threading
use multiprocessing
On my local machine with 16 cores / 64 GB RAM I can run 16 threads without problems; all of them always terminate. To be able to reproduce the problem I need to increase the number of threads to 30/40.
On an Azure machine with 8 cores / 32 GB RAM the issue can be reproduced with just 8 threads in parallel.
If I run the executables from a bat file
for /r "." %%a in (*.exe) do start /B "" "%%~fa"
the problem never happens.
Does anyone have an idea of what the problem could be?

Have subprocess.Popen only wait on its child process to return, but not any grandchildren

I have a python script that does this:
p = subprocess.Popen(pythonscript.py, stdin=PIPE, stdout=PIPE, stderr=PIPE, shell=False)
theStdin=request.input.encode('utf-8')
(outputhere,errorshere) = p.communicate(input=theStdin)
It works as expected: it waits for the subprocess to finish via p.communicate(). However, within pythonscript.py I want to "fire and forget" a "grandchild" process. I'm currently doing this by overriding the join method:
from multiprocessing import Process

class EverLastingProcess(Process):
    def join(self, *args, **kwargs):
        pass  # Overrides join so that it doesn't block. Otherwise the parent waits.
    def __del__(self):
        pass
And starting it like this:
p = EverLastingProcess(target=nameOfMyFunction, args=(arg1, etc,), daemon=False)
p.start()
This also works fine if I just run pythonscript.py in a bash terminal or bash script: control and a response return while the process started by EverLastingProcess keeps going. However, when I run pythonscript.py with Popen as shown above, it looks from the timings as though Popen is waiting on the grandchild to finish.
How can I make it so that the Popen only waits on the child process, and not any grandchild processes?
A previous solution (overriding the join method together with the shell=True change, shown in the other answer below) stopped working when we upgraded our Python recently.
There are many references on the internet about the pieces and parts of this, but it took me some doing to come up with a useful solution to the entire problem.
The following solution has been tested in Python 3.9.5 and 3.9.7.
Problem Synopsis
The names of the scripts match those in the code example below.
A top-level program (grandparent.py):
Uses subprocess.run or subprocess.Popen to call a program (parent.py)
Checks return value from parent.py for sanity.
Collects stdout and stderr from the main process 'parent.py'.
Does not want to wait around for the grandchild to complete.
The called program (parent.py)
Might do some stuff first.
Spawns a very long process (the grandchild - "longProcess" in the code below).
Might do a little more work.
Returns its results and exits while the grandchild (longProcess) continues doing what it does.
Solution Synopsis
The important part isn't so much what happens with subprocess. Instead, the method for creating the grandchild/longProcess is the critical part. It is necessary to ensure that the grandchild is truly emancipated from parent.py.
Subprocess only needs to be used in a way that captures output.
The longProcess (grandchild) needs the following to happen:
It should be started using multiprocessing.
It needs multiprocessing's 'daemon' set to False.
It should also be invoked using the double-fork procedure.
In the double-fork, extra work needs to be done to ensure that the process is truly separate from parent.py. Specifically:
Move the execution away from the environment of parent.py.
Use file handling to ensure that the grandchild no longer uses the file handles (stdin, stdout, stderr) inherited from parent.py.
Example Code
grandparent.py - calls parent.py using subprocess.run()
#!/usr/bin/env python3
import subprocess
p = subprocess.run(["/usr/bin/python3", "/path/to/parent.py"], capture_output=True)
## Comment the following if you don't need reassurance
print("The return code is: " + str(p.returncode))
print("The standard out is: ")
print(p.stdout)
print("The standard error is: ")
print(p.stderr)
parent.py - starts the longProcess/grandchild and exits, leaving the grandchild running. After 10 seconds, the grandchild will write timing info to /tmp/timelog.
#!/usr/bin/env python3
import time

def longProcess():
    time.sleep(10)
    fo = open("/tmp/timelog", "w")
    fo.write("I slept! The time now is: " + time.asctime(time.localtime()) + "\n")
    fo.close()

import os, sys

def spawnDaemon(func):
    # do the UNIX double-fork magic, see Stevens' "Advanced
    # Programming in the UNIX Environment" for details (ISBN 0201563177)
    try:
        pid = os.fork()
        if pid > 0:  # parent process
            return
    except OSError as e:
        print("fork #1 failed. See next.")
        print(e)
        sys.exit(1)
    # Decouple from the parent environment.
    os.chdir("/")
    os.setsid()
    os.umask(0)
    # do second fork
    try:
        pid = os.fork()
        if pid > 0:
            # exit from second parent
            sys.exit(0)
    except OSError as e:
        print("fork #2 failed. See next.")
        print(e)
        sys.exit(1)
    # Redirect standard file descriptors.
    # Here, they are reassigned to /dev/null, but they could go elsewhere.
    sys.stdout.flush()
    sys.stderr.flush()
    si = open('/dev/null', 'r')
    so = open('/dev/null', 'a+')
    se = open('/dev/null', 'a+')
    os.dup2(si.fileno(), sys.stdin.fileno())
    os.dup2(so.fileno(), sys.stdout.fileno())
    os.dup2(se.fileno(), sys.stderr.fileno())
    # Run your daemon
    func()
    # Ensure that the daemon exits when complete
    os._exit(os.EX_OK)

import multiprocessing

daemonicGrandchild = multiprocessing.Process(target=spawnDaemon, args=(longProcess,))
daemonicGrandchild.daemon = False
daemonicGrandchild.start()
print("have started the daemon")  # This will get captured as stdout by grandparent.py
References
The code above was mainly inspired by the following two resources.
This reference is succinct about the use of the double-fork but does not include the file handling we need in this situation.
This reference contains the needed file handling, but does many other things that we do not need.
Edit: the solution below stopped working after a Python upgrade; see the accepted answer from Lachele above.
Working answer from a colleague, change to shell=True like this:
p = subprocess.Popen(pythonscript.py, stdin=PIPE, stdout=PIPE, stderr=PIPE, shell=True)
I've tested it, and the grandchild processes stay alive after the child process returns, without Popen waiting for them to finish.

Issue with python's subprocess,popen (creating a zombie and getting stuck)

An issue I have with Python's (3.4) subprocess.Popen:
Very rarely (once in several thousand calls), a call to Popen seems to create another forked process in addition to the intended process, and that extra process hangs (possibly waiting?), resulting in the intended process becoming a zombie.
Here's the call sequence:
with subprocess.Popen(['prog', 'arg1', 'arg2'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE) as p:
    std_out, std_err = p.communicate()
    p.wait()
Note: the above call sequence is run itself from a forked process (a form of process pooling, see process list below)
The issue happens with multiple programs (7z for example) so I assume the problem is with the caller and not the callee.
prog is zombiefied, so I assume the p.wait() statement is never reached or not executed properly.
The resulting process list (ps -ef output):
my_user 18219 18212 9 16:16 pts/1 00:18:11 python3 script.py # original process
my_user 1045 18219 0 16:18 pts/1 00:00:14 python3 script.py # Intentionally forked from original (poor man's process pool) - Seems to be stuck or waiting
my_user 2834 1045 0 16:18 pts/1 00:00:00 [prog] <defunct> # Program run by subprocess.popen - Zombie
my_user 2841 1045 0 16:18 pts/1 00:00:00 python3 script.py # !!!! Should not be here, also stuck or waiting, never finishes
Edited (added code sample as requested):
The code in questions:
import os
import subprocess
import logging

pid = os.fork()
if pid == 0:
    # child
    file_name = 'test.zip'
    out_dir = '/tmp'
    while True:
        with subprocess.Popen(['7z', 'x', '-y', '-p', '-o' + out_dir, file_name], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE) as p:
            try:
                std_out, std_err = p.communicate(timeout=600)
            except subprocess.TimeoutExpired:
                p.kill()
                std_out, std_err = p.communicate()
                logging.critical('7z failed, a timeout has occurred during waiting')
            except:
                p.kill()
                p.wait()
                raise
            return_code = p.poll()
        # do something
else:
    # parent
    wpid, status = os.waitpid(pid, 0)
    exit_code = status >> 8
I believe this is an effect of mixing forking and threading, which is a bad thing to do on Linux. Here are a couple of references:
Is it safe to fork from within a thread?
https://rachelbythebay.com/w/2011/06/07/forked/
I believe your process is multithreaded once you import the logging module. (In my case, I was sometimes seeing my program hang while waiting on a logging futex and sometimes hang while waiting inside subprocess with the subprocess having become a zombie.) That module uses OS locks to ensure that it can be called in a thread-safe manner. Once you fork, that lock's state is inherited by the child process. So the child (which is single threaded but inherited the memory of the parent) can't acquire the logging lock because the lock was sometimes locked when the fork happened.
(I'm not super confident in my explanation. My problem went away when I switched from using multiprocessing's default fork behavior to using spawn behavior. In the latter, a child does not inherit its parent's memory, and subprocess and logging no longer caused hangs for me.)
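For illustration, here is a sketch of that switch (my own example, using a multiprocessing spawn context in place of the raw os.fork() above; the function and variable names are illustrative):

import multiprocessing
import subprocess

def extract(file_name, out_dir):
    # Same 7z call as above, but run in a spawned (not forked) child,
    # so it does not inherit the parent's thread and lock state.
    with subprocess.Popen(['7z', 'x', '-y', '-p', '-o' + out_dir, file_name],
                          stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE) as p:
        std_out, std_err = p.communicate(timeout=600)
    return p.returncode

if __name__ == '__main__':
    ctx = multiprocessing.get_context('spawn')  # fresh interpreter for the child
    child = ctx.Process(target=extract, args=('test.zip', '/tmp'))
    child.start()
    child.join()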
subprocess indeed forks before running the command. This is mentioned in PEP 324 (Ctrl-F for "fork").
The reason is that the command is run using exec, which replaces the calling process by the executed one.
As you can see in the process list, the extra process shows up with the same command line as the calling script, so it looks like the same process, but it is not the Python interpreter that is actually being run.
So, as long as that child process does not return, the calling Python process can't either.
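As a rough illustration, this is the fork-and-exec pattern in simplified form (a sketch only; the real CPython implementation lives in _posixsubprocess and also sets up pipes, closes file descriptors, and reports errors):

import os

pid = os.fork()
if pid == 0:
    # Child: until exec succeeds, this is still a copy of the Python process,
    # which is why it can show up in ps with the parent's command line.
    os.execvp('ls', ['ls', '-la'])  # replaces the child with the command
else:
    # Parent: wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    print('exit code:', os.WEXITSTATUS(status))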

Kill a chain of sub processes on KeyboardInterrupt

I'm having a strange problem I've encountered as I wrote a script to start my local JBoss instance.
My code looks something like this:
with open("/var/run/jboss/jboss.pid", "wb") as f:
process = subprocess.Popen(["/opt/jboss/bin/standalone.sh", "-b=0.0.0.0"])
f.write(str(process.pid))
try:
process.wait()
except KeyboardInterrupt:
process.kill()
It should be fairly simple to understand: write the PID to a file while it's running, and once I get a KeyboardInterrupt, kill the child process.
The problem is that JBoss keeps running in the background after I send the kill signal, as it seems that the signal doesn't propagate down to the Java process started by standalone.sh.
I like the idea of using Python to write system management scripts, but there are a lot of weird edge cases like this where, if I had written it in Bash, everything would have just worked™.
How can I kill the entire subprocess tree when I get a KeyboardInterrupt?
You can do this using the psutil library:
import psutil
#..
proc = psutil.Process(process.pid)
for child in proc.children(recursive=True):
    child.kill()
proc.kill()
As far as I know the subprocess module does not offer any API function to retrieve the children spawned by subprocesses, nor does the os module.
A better way of killing the processes would probably be the following:
proc = psutil.Process(process.pid)
procs = proc.children(recursive=True)
procs.append(proc)
for proc in procs:
    proc.terminate()
gone, alive = psutil.wait_procs(procs, timeout=1)
for p in alive:
    p.kill()
This gives the processes a chance to terminate correctly, and when the timeout expires the remaining processes are killed.
Note that psutil also provides a Popen class that has the same interface as subprocess.Popen plus all the extra functionality of psutil.Process. You may want to simply use that instead of subprocess.Popen. It is also safer, because psutil checks that PIDs don't get reused when a process terminates, while subprocess doesn't. A sketch of that approach is below.
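For example, a minimal sketch of that approach (the JBoss command is taken from the question; the child-killing logic is the same as above):

import psutil

# psutil.Popen behaves like subprocess.Popen but is also a psutil.Process,
# so children()/terminate()/kill() are available directly on it.
process = psutil.Popen(["/opt/jboss/bin/standalone.sh", "-b=0.0.0.0"])
try:
    process.wait()
except KeyboardInterrupt:
    procs = process.children(recursive=True)
    procs.append(process)
    for p in procs:
        p.terminate()
    gone, alive = psutil.wait_procs(procs, timeout=1)
    for p in alive:
        p.kill()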

Threading behavior is weird in python subprocess

So I essentially have a case like this where in my main script I have
command = 'blender -b ' + settings.BLENDER_ROOT + 'uploadedFileCheck.blend -P ' + settings.BLENDER_ROOT + 'uploadedFileCheck.py -noaudio'
process = Popen(command.split(' ') ,stdout=PIPE, stderr=PIPE)
out, err = process.communicate()
And in the subprocess script uploadedFileCheck.py I have the lines
exportFile(fileIn, fileOut)
Thread(target=myfunction).start()
So I expect the subprocess to finish, or at least to return out, err, after the exportFile() call, but it seems it's waiting for the Thread to finish as well. Does anyone understand this behavior?
Also, in case you're wondering, I'm calling that other Python file as a subprocess because the main script is in Python 2 and that script (Blender) is in Python 3, but that's irrelevant (and can't be changed).
A process won't exit until all its non-daemon threads have exited. By default, Thread objects in Python are created as non-daemon threads. If you want your script to exit as soon as the main thread is done, rather than waiting for the spawned thread to finish, set the daemon flag on the Thread object to True prior to starting it:
t = Thread(target=myfunction)
t.daemon = True
t.start()
Note that this will kill the daemon thread in a non-graceful way, without any cleanup occurring. If you're doing any kind of work in that thread that needs to be cleaned up, you should consider an approach where you signal the thread to shut itself down instead, as sketched below.
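For illustration, a sketch of the signalling approach (my own example; the Event name and the worker loop are illustrative, not code from the question):

import threading
import time

stop_requested = threading.Event()  # illustrative shutdown flag

def myfunction():
    # Do the work in small chunks and check the flag so the thread can exit cleanly.
    while not stop_requested.is_set():
        time.sleep(0.1)  # placeholder for a unit of real work
    # ...perform any cleanup here before returning...

t = threading.Thread(target=myfunction)
t.start()
# ...main work happens here; when the script is ready to exit:
stop_requested.set()
t.join()  # returns promptly because the thread shuts itself down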
