Threading behavior is weird in python subprocess

So I essentially have a case like this where in my main script I have
command = 'blender -b ' + settings.BLENDER_ROOT + 'uploadedFileCheck.blend -P ' + settings.BLENDER_ROOT + 'uploadedFileCheck.py -noaudio'
process = Popen(command.split(' ') ,stdout=PIPE, stderr=PIPE)
out, err = process.communicate()
And in the subprocess script uploadedFileCheck.py I have the line
exportFile(fileIn, fileOut)
Thread(target=myfunction).start()
So I expect the subprocess to be finished, or at least to return to out, err after the exportFile() call, but it seems it's waiting for the Thread to finish as well. Does anyone understand this behavior?
Also, in case you're wondering, I'm calling that other python file as a subprocess because the main script is in python2 and that script (blender) is in python3, but that's irrelevant (and can't change)

A process won't exit until all its non-daemon threads have exited. By default, Thread objects in Python are created as non-daemon threads. If you want your script to exit as soon as the main thread is done, rather than waiting for the spawned thread to finish, set the daemon flag on the Thread object to True prior to starting it:
t = Thread(target=myfunction)
t.daemon = True
t.start()
Note that this will kill the daemon thread in a non-graceful way, without any cleanup occurring. If you're doing any kind of work in that thread that needs to be cleaned up, you should consider an approach where you signal the thread to shut itself down instead.
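One way to signal a thread to shut itself down is a threading.Event that the worker checks between units of work. The following is only a minimal sketch: the body of myfunction, the stop_event name, and the timings are assumptions made for illustration, not part of the original script.

```python
import threading
import time


def myfunction(stop_event):
    """Work in small steps and check the stop flag between them."""
    steps = 0
    while not stop_event.is_set():
        steps += 1             # stand-in for one unit of real work
        stop_event.wait(0.01)  # pause briefly, but wake at once if signaled
    # Cleanup runs here because the loop ended voluntarily.
    print("cleaning up after", steps, "steps")


stop_event = threading.Event()
t = threading.Thread(target=myfunction, args=(stop_event,))
t.start()

time.sleep(0.05)  # the main thread does its own work here
stop_event.set()  # ask the worker to finish
t.join()          # wait until its cleanup has completed
```

With this pattern the thread can stay non-daemon: the process exits shortly after stop_event.set(), once the worker has finished its cleanup.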

Related

How to ensure subprocess is killed on timeout when using `run`?

I am using the following code to launch a subprocess :
# Run the program
subprocess_result = subprocess.run(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    check=False,
    timeout=timeout,
    cwd=directory,
    env=env,
    preexec_fn=set_memory_limits,
)
The launched subprocess is also a Python program, with a shebang.
This subprocess may last for longer than the specified timeout.
The subprocess does heavy computations and write results in a file and does not contain any signal handler.
According to the documentation https://docs.python.org/3/library/subprocess.html#subprocess.run, subprocess.run kills a child that times out:
The timeout argument is passed to Popen.communicate(). If the timeout
expires, the child process will be killed and waited for. The
TimeoutExpired exception will be re-raised after the child process has
terminated.
When my subprocess times out, I always receive the subprocess.TimeoutExpired exception, but from time to time the subprocess is not killed and keeps consuming resources on my machine.
So my question is: am I doing something wrong here? If yes, what? If no, why do I have this issue and how can I solve it?
Note: I am using Python 3.10 on Ubuntu 22.04

The most likely culprit for the behaviour you see is that the subprocess you are spawning is probably using multiprocessing and spawning its own child processes. Killing the parent process does not automatically kill the whole set of descendants. The grandchildren are inherited by the init process (i.e. the process with PID 1) and will continue to run.
You can verify this from the source code of subprocess.run:
with Popen(*popenargs, **kwargs) as process:
    try:
        stdout, stderr = process.communicate(input, timeout=timeout)
    except TimeoutExpired as exc:
        process.kill()
        if _mswindows:
            # Windows accumulates the output in a single blocking
            # read() call run on child threads, with the timeout
            # being done in a join() on those threads. communicate()
            # _after_ kill() is required to collect that and add it
            # to the exception.
            exc.stdout, exc.stderr = process.communicate()
        else:
            # POSIX _communicate already populated the output so
            # far into the TimeoutExpired exception.
            process.wait()
        raise
    except: # Including KeyboardInterrupt, communicate handled that.
        process.kill()
        # We don't call process.wait() as .__exit__ does that for us.
        raise
Here you can see that the timeout is set on the communicate call; if it fires, the subprocess is .kill()ed at the top of the except TimeoutExpired block. The kill method sends a SIGKILL, which immediately kills the subprocess without any cleanup. It's a signal that cannot be caught by the subprocess, so it's not possible that the child is somehow ignoring it.
The TimeoutExpired exception is then re-raised by the raise at the end of that block, so if your parent process sees this exception the subprocess is already dead.
This however says nothing about grandchild processes. Those will continue to run as children of PID 1.
I don't see any way in which you can customize how subprocess.run handles subprocess termination. For example, if it used SIGTERM instead of SIGKILL you could modify your child process or write a wrapper process that will catch the signal and properly kill all its descendants. But SIGKILL doesn't give you this luxury.
So I believe that for your use case you cannot use the subprocess.run facade but you should use Popen directly. You can look at the subprocess.run implementation and take just the things that you need, maybe dropping support for platforms you don't use.
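As a rough sketch of what using Popen directly could look like on POSIX: start the child in its own session, so that it leads a new process group, and on timeout send SIGKILL to that whole group instead of only to the child. The helper name run_killing_group is hypothetical, and the sketch assumes the descendants stay in the child's process group (a descendant that calls setsid() itself would escape).

```python
import os
import signal
import subprocess
import sys


def run_killing_group(cmd, timeout, **popen_kwargs):
    """Run cmd; on timeout, SIGKILL its whole process group (POSIX only)."""
    # start_new_session=True makes the child call setsid(), so it becomes
    # the leader of a new process group whose ID equals its PID.
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        start_new_session=True,
        **popen_kwargs,
    )
    try:
        stdout, stderr = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Kill the child and every descendant still in its process group.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.communicate()  # reap the child and drain the pipes
        raise
    return subprocess.CompletedProcess(cmd, proc.returncode, stdout, stderr)


if __name__ == "__main__":
    result = run_killing_group([sys.executable, "-c", "print('done')"], timeout=10)
    print(result.returncode, result.stdout)
```

Extra keyword arguments such as cwd or env are passed through to Popen unchanged.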
Note: There are extremely rare situations in which the subprocesses won't die immediately on SIGKILL. I believe the only situation in which this happens is if the subprocess is performing a very long system call or other kernel operation, which might not be interrupted immediately. If the operation is in deadlock this might prevent the process from terminating forever. However I don't think that this is your case, since you did not mention that the process is stuck doing nothing, but from what you said the process simply seems to continue running.

subprocess.Popen() getting stuck

My question
I encountered a hang-up issue with the combination of threading, multiprocessing, and subprocess. I simplified my situation as below.
import subprocess
import threading
import multiprocessing
class dummy_proc(multiprocessing.Process):
    def run(self):
        print('run')
        while True:
            pass
class popen_thread(threading.Thread):
    def run(self):
        proc = subprocess.Popen('ls -la'.split(), shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout_byte, stderr_byte = proc.communicate()
        rc = proc.returncode
        print(rc)
if __name__ == '__main__':
    print('start')
    t = popen_thread()
    t.start()
    p = dummy_proc()
    p.start()
    t.join()
    p.terminate()
In this script, a thread and a process are generated, respectively. The thread just issues the system command ls -la. The process just loops infinitely. When the thread finishes getting the return code of the system command, it terminates the process and exits immediately.
When I run this script again and again, it sometimes hangs up. I googled this situation and found some articles which seem to be related.
Is it safe to fork from within a thread?
Issue with python's subprocess,popen (creating a zombie and getting stuck)
So, I guess the hang-up issue is explained something like below.
The process is generated between Popen() and communicate().
It inherits some "blocking" status of the thread, and it is never released.
It prevents the thread from acquiring the result of communicate().
But I'm not 100% confident, so it would be great if someone helped me explain what happens here.
My environment
I used the following environment.
$ uname -a
Linux dell-vostro5490 5.10.96-1-MANJARO #1 SMP PREEMPT Tue Feb 1 16:57:46 UTC 2022 x86_64 GNU/Linux
$ python3 --version
Python 3.9.2
I also tried the following environment and got the same result.
$ uname -a
Linux raspberrypi 5.10.17+ #2 Tue Jul 6 21:58:58 PDT 2021 armv6l GNU/Linux
$ python3 --version
Python 3.7.3
What I tried
Use spawn instead of fork for multiprocessing.
Use thread instead of process for dummy_proc.
In both cases, the issue disappeared. So, I guess this issue is related to the behavior of fork...

This is a bit too long for a comment and so ...
I am having a problem understanding your statement that the problem disappears when you "Use thread instead of process for dummy_proc."
The hanging problem as I understand it is "that fork() only copies the calling thread, and any mutexes held in child threads will be forever locked in the forked child." In other words, the hanging problem arises when a fork is done while one or more threads other than the main thread (i.e., the one associated with the main process) exist.
If you execute a subprocess.Popen call from a newly created subprocess or a newly created thread, either way there will be by definition a new thread in existence prior to the fork done to implement the Popen call and I would think the potential for hanging exists.
import subprocess
import threading
import multiprocessing
import os
class popen_process(multiprocessing.Process):
    def run(self):
        print(f'popen_process, PID = {os.getpid()}, TID={threading.current_thread().ident}')
        proc = subprocess.Popen('ls -la'.split(), shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout_byte, stderr_byte = proc.communicate()
        rc = proc.returncode
if __name__ == '__main__':
    print(f'main process, PID = {os.getpid()}, TID={threading.current_thread().ident}')
    multiprocessing.set_start_method('spawn')
    p = popen_process()
    p.start()
    p.join()
Prints:
main process, PID = 14, TID=140301923051328
popen_process, PID = 16, TID=140246240732992
Note the new thread with TID=140246240732992
It seems to me that you need to use startup method spawn as long as you are doing the Popen call from another thread or process if you want to be sure of not hanging. For what it's worth, on my Windows Subsystem for Linux I could not get it to hang with fork using your code after quite a few tries. So I am just going by what the linked answer warns against.
In any event, in your example code, there seems to be a potential race condition. Let's assume that even though your popen_thread is a new thread, its properties are such that it does not give rise to the hanging problem (no mutexes are being held). Then the problem would be arising from the creation of the dummy_proc process/thread. The question then becomes whether your call to t.start() completes the starting of the new process that ultimately runs the ls -la command prior to or after the completion of the creation of the dummy_proc process/thread. This timing will determine whether the new dummy_proc thread (there will be one regardless of whether dummy_proc inherits from Process or Thread, as we have seen) will exist prior to the creation of the ls -la process. This race condition might explain why you were sometimes hanging. I have no explanation for why you never hang if you make dummy_proc inherit from threading.Thread.

Weird behaviour with threads and processes mixing

I'm running the following python code:
import threading
import multiprocessing
def forever_print():
    while True:
        print("")
def main():
    t = threading.Thread(target=forever_print)
    t.start()
    return
if __name__=='__main__':
    p = multiprocessing.Process(target=main)
    p.start()
    p.join()
    print("main process on control")
It terminates.
When I unwrapped main from the new process, and just ran it directly, like this:
if __name__ == '__main__':
    main()
The script went on forever, as I thought it should. Am I wrong to assume that, given that t is a non-daemon thread, p shouldn't halt in the first case?
I basically set up this little test because I've been developing an app in which threads are spawned inside subprocesses, and it's been showing some weird behaviour (sometimes it terminates properly, sometimes it doesn't). I guess what I wanted to know, in a broader sense, is if there is some sort of "gotcha" when mixing these two python libs.
My running environment: Python 2.7, Ubuntu 14.04 LTS

For now, threads created by multiprocessing worker processes act like daemon threads with respect to process termination: the worker process exits without waiting for the threads it created to terminate. This is because worker processes shut down with os._exit(), which skips most normal shutdown processing; in particular, it skips the normal exit processing (sys.exit()) that .join()s non-daemon threading.Threads.
The easiest workaround is for worker processes to explicitly .join() the non-daemon threads they create.
There's an open bug report about this behavior, but it hasn't made much progress: http://bugs.python.org/issue18966
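The explicit-join workaround might look like the sketch below. The background_work and run_worker helpers and the results queue are assumptions added so that the effect is observable; the fork start method is requested explicitly, which makes the sketch POSIX-only.

```python
import multiprocessing
import threading


def background_work(results):
    # Stand-in for the thread's real work (hypothetical).
    results.put("thread finished")


def main(results):
    t = threading.Thread(target=background_work, args=(results,))
    t.start()
    # Explicitly wait for the thread, so the outcome does not depend on
    # whether the worker's shutdown path joins non-daemon threads.
    t.join()


def run_worker():
    ctx = multiprocessing.get_context("fork")  # POSIX only
    results = ctx.Queue()
    p = ctx.Process(target=main, args=(results,))
    p.start()
    message = results.get(timeout=10)  # raises queue.Empty if nothing arrives
    p.join()
    return message


if __name__ == "__main__":
    print(run_worker())
```

The only line that matters for the workaround is the t.join() inside main; everything else is scaffolding for the demonstration.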

You need to call t.join() in your main function.
When your main function returns, the process is terminated along with both of its threads.
p.join() blocks the main thread, waiting for the spawned process to end. Your spawned process then creates a thread but does not wait for it to end. It returns immediately, thus killing the thread as well.
Threads share memory; processes don't. Therefore, the thread you create in the newly spawned process stays confined to that process. The parent process is not aware of it.

The gotcha is that the multiprocessing machinery calls os._exit() after your target function exits, which violently kills the child process, even if it has background threads running.
The code for Process.start() looks like this:
def start(self):
    '''
    Start child process
    '''
    assert self._popen is None, 'cannot start a process twice'
    assert self._parent_pid == os.getpid(), \
           'can only start a process object created by current process'
    assert not _current_process._daemonic, \
           'daemonic processes are not allowed to have children'
    _cleanup()
    if self._Popen is not None:
        Popen = self._Popen
    else:
        from .forking import Popen
    self._popen = Popen(self)
    _current_process._children.add(self)
Popen.__init__ looks like this:
def __init__(self, process_obj):
    sys.stdout.flush()
    sys.stderr.flush()
    self.returncode = None
    self.pid = os.fork() # This forks a new process
    if self.pid == 0: # This if block runs in the new process
        if 'random' in sys.modules:
            import random
            random.seed()
        code = process_obj._bootstrap() # This calls your target function
        sys.stdout.flush()
        sys.stderr.flush()
        os._exit(code) # Violent death of the child process happens here
The _bootstrap method is the one that actually executes the target function you passed to the Process object. In your case, that's main. main returns right after you start your background thread, but the process doesn't exit at that point, because there's still a non-daemon thread running.
However, as soon as execution hits os._exit(code), the child process is killed, regardless of any non-daemon threads still executing.

Why doesn't the daemon program exit without join()

The answer might be right in front of me on the link below but I still don't understand. I'm sure after someone explains this to me, Darwin will be making a call to me.
The example is at this link here, although I've made some changes to try to experiment and help my understanding.
Here's the code:
import multiprocessing
import time
import sys
def daemon():
    p = multiprocessing.current_process()
    print 'Starting: ', p.name, p.pid
    sys.stdout.flush()
    time.sleep(2)
    print 'Exiting: ', p.name, p.pid
    sys.stdout.flush()
def non_daemon():
    p = multiprocessing.current_process()
    print 'Starting: ', p.name, p.pid
    sys.stdout.flush()
    time.sleep(6)
    print 'Exiting: ', p.name, p.pid
    sys.stdout.flush()
if __name__ == '__main__':
    d = multiprocessing.Process(name='daemon', target=daemon)
    d.daemon = True
    n = multiprocessing.Process(name='non-daemon', target=non_daemon)
    n.daemon = False
    d.start()
    time.sleep(1)
    n.start()
    # d.join()
And the output of the code is:
Starting: daemon 6173
Starting: non-daemon 6174
Exiting: non-daemon 6174
If the join() at the end is uncommented, then the output is:
Starting: daemon 6247
Starting: non-daemon 6248
Exiting: daemon 6247
Exiting: non-daemon 6248
I'm confused because the sleep of the daemon is 2 sec, whereas the non-daemon's is 6 sec. Why doesn't it print out the "Exiting" message in the first case? The daemon should have woken up before the non-daemon and printed the message.
The explanation from the site is as such:
The output does not include the “Exiting” message from the daemon
process, since all of the non-daemon processes (including the main
program) exit before the daemon process wakes up from its 2 second
sleep.
but I changed it such that the daemon should have woken up before the non-daemon does. What am I missing here? Thanks in advance for your help.
EDIT: Forgot to mention I'm using python 2.7 but apparently this problem is also in python 3.x

This was a fun one to track down. The docs are somewhat misleading, in that they describe the non-daemon processes as if they are all equivalent; the existence of any non-daemon process means the process "family" is alive. But that's not how it's implemented. The parent process is "more equal" than others; multiprocessing registers an atexit handler that does the following:
for p in active_children():
    if p.daemon:
        info('calling terminate() for daemon %s', p.name)
        p._popen.terminate()
for p in active_children():
    info('calling join() for process %s', p.name)
    p.join()
So when the main process finishes, it first terminates all daemon child processes, then joins all child processes to wait on non-daemon children and clean up resources from daemon children.
Because it performs cleanup in this order, a moment after your non-daemon Process starts, the main process begins cleanup and forcibly terminates the daemon Process.
Note that fixing this can be as simple as joining the non-daemon process manually, rather than joining the daemon process (which defeats the whole point of a daemon). The main process then stays alive until the non-daemon child finishes, which delays the atexit handler and the cleanup that would terminate the daemon child.
It's arguably a bug (one that seems to exist up through 3.5.1; I reproduced it myself), but whether it's a behavior bug or a docs bug is arguable.
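A small sketch of the manual join on the non-daemon process, written for Python 3 and with shortened sleeps. The sleeper and run_demo helpers and the events queue are assumptions added for illustration; the fork start method is requested explicitly, so the sketch is POSIX-only.

```python
import multiprocessing
import time


def sleeper(label, seconds, events):
    events.put("Starting: " + label)
    time.sleep(seconds)
    events.put("Exiting: " + label)


def run_demo():
    ctx = multiprocessing.get_context("fork")  # POSIX only
    events = ctx.Queue()
    d = ctx.Process(target=sleeper, args=("daemon", 0.5, events), daemon=True)
    n = ctx.Process(target=sleeper, args=("non-daemon", 1.5, events))
    d.start()
    n.start()
    # Join the non-daemon child, not the daemon: the parent stays alive
    # until n is done, so the exit handler that terminates daemon children
    # cannot run before the daemon has finished its shorter sleep.
    n.join()
    return [events.get(timeout=5) for _ in range(4)]


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

Because the daemon's sleep is shorter than the non-daemon's, both "Exiting" messages are recorded before the parent proceeds to exit.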

Wait until nested python script ends before continuing in current python script

I have a python script that calls another python script. The other python script spawns some threads. How do I make the calling script wait until the called script is completely done running?
This is my code :
while(len(mProfiles) < num):
    print distro + " " + str(len(mProfiles))
    mod_scanProfiles.main(distro)
    time.sleep(180)
    mProfiles = readProfiles(mFile,num,distro)
    print "yoyo"
How do I wait until mod_scanProfiles.main() and all its threads are completely finished? (I used time.sleep(180) for now, but it's not a good programming habit.)

You want to modify the code in mod_scanProfiles.main to block until all its threads are finished.
Assuming you make a call to subprocess.Popen in that function, just do:
# in mod_scanProfiles.main:
p = subprocess.Popen(...)
p.wait() # wait until the process completes.
If you're not currently waiting for your threads to end, you'll also want to call Thread.join to wait for them to complete. For example:
# assuming you have a list of thread objects somewhere
threads = [MyThread(), ...]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
