Python subprocess hangs when many executables are called - python

I have a problem with using Python to run Windows executables in parallel.
I will explain my problem in more detail.
I wrote some code that creates a number of threads equal to the number of cores. Each thread executes the following function, which starts the executable with subprocess.Popen().
The executables are unit tests for an application. The tests use the gtest library. As far as I know, they just read and write on the file system.
def _execute(self, test_file_path) -> None:
    test_path = self._get_test_path_without_extension(test_file_path)
    process = subprocess.Popen(test_path,
                               shell=False,
                               stdout=sys.stdout,
                               stderr=sys.stderr,
                               universal_newlines=True)
    try:
        process.communicate(timeout=TEST_TIMEOUT_IN_SECONDS)
        if process.returncode != 0:
            print('Test failed')
    except subprocess.TimeoutExpired:
        process.kill()
During execution it happens that some processes hang and never end. I set a timeout as a workaround, but I am wondering why some of these applications never terminate; this blocks the execution of the Python code.
The following code shows the creation of the threads. The function _execute_tests just takes a test from the queue (with the .get() function) and passes it to the function _execute(test_file_path).
# Piece of code used to spawn the threads
for i in range(int(core_num)):
    thread = threading.Thread(target=self._execute_tests,
                              args=(tests,),
                              daemon=True)
    threads.append(thread)
    thread.start()
for thread in threads:
    thread.join()
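For completeness, here is a minimal sketch of what _execute_tests might look like, assuming tests is a queue.Queue of test file paths (the loop and the empty-queue handling are assumptions, not the author's code):

import queue

def _execute_tests(self, tests: queue.Queue) -> None:
    # Each worker thread runs this loop until the shared queue is drained.
    while True:
        try:
            test_file_path = tests.get(block=False)
        except queue.Empty:
            return  # no tests left for this worker
        self._execute(test_file_path)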
I have already tried to:
use subprocess.run, subprocess.call and the other functions explained on the documentation page
use a larger buffer via the bufsize parameter
disable the buffer
move the stdout to a file per thread
move the stdout to subprocess.DEVNULL (see the sketch after this list)
remove the use of subprocess.communicate()
remove the use of threading
use multiprocessing
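For concreteness, a hedged sketch of the subprocess.run + DEVNULL variant from that list, reusing the names from the snippet above (not the author's exact code):

import subprocess

def _execute(self, test_file_path) -> None:
    test_path = self._get_test_path_without_extension(test_file_path)
    try:
        # Discard all child output so a full pipe buffer can never block
        # the child; run() itself waits for the process to exit and kills
        # it if the timeout expires.
        result = subprocess.run(test_path,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=TEST_TIMEOUT_IN_SECONDS)
        if result.returncode != 0:
            print('Test failed')
    except subprocess.TimeoutExpired:
        print(f'Test timed out: {test_path}')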
On my local machine with 16 cores / 64 GB RAM I can run 16 threads without problems; all of them always terminate. To be able to reproduce the problem I need to increase the number of threads to 30-40.
On an Azure machine with 8 cores / 32 GB RAM the issue can be reproduced with just 8 threads in parallel.
If I run the executables from a bat file,
for /r "." %%a in (*.exe) do start /B "" "%%~fa"
the problem never happens.
Does anyone have an idea of what the problem could be?

Related

subprocess.Popen() getting stuck

My question
I encountered a hang-up issue with the combination of threading, multiprocessing, and subprocess. I have simplified my situation to the example below.
import subprocess
import threading
import multiprocessing

class dummy_proc(multiprocessing.Process):
    def run(self):
        print('run')
        while True:
            pass

class popen_thread(threading.Thread):
    def run(self):
        proc = subprocess.Popen('ls -la'.split(), shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout_byte, stderr_byte = proc.communicate()
        rc = proc.returncode
        print(rc)

if __name__ == '__main__':
    print('start')
    t = popen_thread()
    t.start()
    p = dummy_proc()
    p.start()
    t.join()
    p.terminate()
In this script, a thread and a process are created. The thread just runs the system command ls -la. The process just loops infinitely. When the thread has obtained the return code of the system command, the main script terminates the process and exits immediately.
When I run this script again and again, it sometimes hangs up. I googled this situation and found some articles which seem to be related.
Is it safe to fork from within a thread?
Issue with python's subprocess,popen (creating a zombie and getting stuck)
So, I guess the hang-up issue can be explained something like this:
The process is generated between Popen() and communicate().
It inherits some "blocking" status of the thread, and it is never released.
It prevents the thread from acquiring the result of communicate().
But I'm not 100% confident, so it would be great if someone could help me explain what happens here.
My environment
I used the following environment.
$ uname -a
Linux dell-vostro5490 5.10.96-1-MANJARO #1 SMP PREEMPT Tue Feb 1 16:57:46 UTC 2022 x86_64 GNU/Linux
$ python3 --version
Python 3.9.2
I also tried the following environment and got the same result.
$ uname -a
Linux raspberrypi 5.10.17+ #2 Tue Jul 6 21:58:58 PDT 2021 armv6l GNU/Linux
$ python3 --version
Python 3.7.3
What I tried
Use spawn instead of fork for multiprocessing.
Use a thread instead of a process for dummy_proc.
In both cases, the issue disappeared. So, I guess this issue is related to the behavior of fork...
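For reference, a minimal sketch of the spawn workaround; set_start_method must be called once, in the main module, before any Process is started:

import multiprocessing

class dummy_proc(multiprocessing.Process):
    def run(self):
        print('run')

if __name__ == '__main__':
    # 'spawn' launches a fresh interpreter instead of fork()ing the
    # current, possibly multi-threaded, one; the child therefore does
    # not inherit locked mutexes from other threads.
    multiprocessing.set_start_method('spawn')
    p = dummy_proc()
    p.start()
    p.join()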
This is a bit too long for a comment and so ...
I am having a problem understanding your statement that the problem disappears when you "Use thread instead of process for dummy_proc."
The hanging problem as I understand it is "that fork() only copies the calling thread, and any mutexes held in child threads will be forever locked in the forked child." In other words, the hanging problem arises when a fork is done when there exists one or more threads other than the main thread (i.e, the one associated with the main process).
If you execute a subprocess.Popen call from a newly created subprocess or a newly created thread, either way there will, by definition, be a new thread in existence prior to the fork done to implement the Popen call, and I would think the potential for hanging exists.
import subprocess
import threading
import multiprocessing
import os

class popen_process(multiprocessing.Process):
    def run(self):
        print(f'popen_process, PID = {os.getpid()}, TID={threading.current_thread().ident}')
        proc = subprocess.Popen('ls -la'.split(), shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout_byte, stderr_byte = proc.communicate()
        rc = proc.returncode

if __name__ == '__main__':
    print(f'main process, PID = {os.getpid()}, TID={threading.current_thread().ident}')
    multiprocessing.set_start_method('spawn')
    p = popen_process()
    p.start()
    p.join()
Prints:
main process, PID = 14, TID=140301923051328
popen_process, PID = 16, TID=140246240732992
Note the new thread with TID=140246240732992
It seems to me that you need to use the spawn start method as long as you are doing the Popen call from another thread or process, if you want to be sure of not hanging. For what it's worth, on my Windows Subsystem for Linux I could not get it to hang with fork using your code after quite a few tries, so I am just going by what the linked answer warns against.
In any event, in your example code there seems to be a potential race condition. Let's assume that even though your popen_thread is a new thread, its properties are such that it does not give rise to the hanging problem (no mutexes are being held). Then the problem would arise from the creation of the dummy_proc process/thread. The question then becomes whether your call to t.start() completes the starting of the new process that ultimately runs the ls -la command prior to, or after, the completion of the creation of the dummy_proc process/thread. This timing determines whether the new dummy_proc thread (there will be one regardless of whether dummy_proc inherits from Process or Thread, as we have seen) exists prior to the creation of the ls -la process. This race condition might explain why you were only sometimes hanging. I would have no explanation for why you never hang if you make dummy_proc inherit from threading.Thread.

Why does os.remove() raise PermissionError?

On a Windows 7 platform I'm using Python 3.6 as a framework to start worker processes (written in C).
For starting the processes, subprocess.Popen is used. The following shows the relevant code (one thread per process to be started).
redirstream = open(redirfilename, "w")
proc = subprocess.Popen(batchargs, shell=False, stdout=redirstream)
outs, errs = proc.communicate(timeout=60)
# wait for job to be finished
ret = proc.wait()
...
if ret == 0:  # changed !!
    redirstream.flush()
    redirstream.close()
    os.remove(redirfilename)
communicate is just used to be able to terminate the executable after 60 seconds, in case it hangs. redirstream is used to write output from the executable (written in C) to a file, for general debugging purposes (not related to this issue). Of course, all processes are passed redirfiles with different filenames.
Up to ten such subprocesses are started in that way from independent python threads.
Although it works, I made a mysterious observation:
When an executable has finished without errors, I want to delete redirfilename, because it is not needed anymore.
Now let's say I have started processes A, B and C.
Processes A and B have finished and returned 0 as result.
Process C, however, intentionally doesn't get data (just for testing, a serial connection has been disconnected) and waits for input from a named pipe (created from Python) using the Windows ReadFile function:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365467(v=vs.85).aspx
In that case, while "C" is still waiting for ReadFile to finish, os.remove(redirfilename) for A and B sometimes throws a PermissionError, saying that the file is still in use by another process. But from Task Manager I can see that processes A and B no longer exist (as expected).
I tried to catch the PermissionError and repeat the delete command after some delay. Only after "C" has terminated (timeout after 60 seconds) can the redirfile for A or B be deleted.
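For illustration, a sketch of such a retry loop (the helper name and the retry limits are made up):

import os
import time

def remove_with_retry(path, attempts=10, delay=0.5):
    # Hypothetical helper: retry the delete while the file is still
    # reported as in use by another process.
    for _ in range(attempts):
        try:
            os.remove(path)
            return True
        except PermissionError:
            time.sleep(delay)
    return False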
Why is the redirstream still blocked and somehow in use although the process behind it is not alive anymore, and why is it held up by ReadFile() in a completely unrelated process, which is definitely not related to that particular file? Is this an issue in Python or in my implementation?
Any hints are highly appreciated...

Python script can't be terminated through Ctrl+C or Ctrl+Break

I have this simple Python script called myMain.py that executes another Python program automatically with an incrementing number, and I'm running it on CentOS 7:
#!/usr/bin/python
import os
import sys
import time

def main():
    step_indicator = ""
    arrow = ">"
    step = 2
    try:
        for i in range(0, 360, step):
            step_percentage = float(i)/360.0 * 100
            if i % 10 == 0:
                step_indicator += "="
            os.system("python myParsePDB.py -i BP1.pdb -c 1 -s %s" % step)
            print("step_percentage%s%s%.2f" % (step_indicator, arrow, step_percentage) + "%")
    except KeyboardInterrupt:
        print("Stop me!")
        sys.exit(0)

if __name__ == "__main__":
    main()
For now I only know that this script runs in a single thread, but I can't terminate it with a Ctrl+C keyboard interrupt.
I have read some related questions, such as Cannot kill Python script with Ctrl-C and Stopping python using ctrl+c, and I realized that Ctrl+Z does not kill the process; it only pauses it and keeps it in the background. Ctrl+Break doesn't work for my case either; I think it only terminates my main thread but keeps the child process.
I also noticed that calling os.system() spawns a child process from the currently executing process. At the same time, I have os file I/O functions, and os.system("rm -rf legacy/*") is invoked in myParsePDB.py, which means the myParsePDB.py child process spawns a child process as well. Then, if I want to catch Ctrl+C in myMain.py, should I daemonize only myMain.py, or should I daemonize each process as it spawns?
This is a general problem that can arise when dealing with signal handling. Python's signal module is no exception: it is a wrapper around operating system signals. Therefore, signal processing in Python depends on the operating system, the hardware, and many other conditions. However, how to deal with these problems is similar.
According to this tutorial, I'll quote the following paragraphs: signal – Receive notification of asynchronous system events
Signals are an operating system feature that provide a means of notifying your program of an event, and having it handled asynchronously. They can be generated by the system itself, or sent from one process to another. Since signals interrupt the regular flow of your program, it is possible that some operations (especially I/O) may produce error if a signal is received in the middle.
Signals are identified by integers and are defined in the operating system C headers. Python exposes the signals appropriate for the platform as symbols in the signal module. For the examples below, I will use SIGINT and SIGUSR1. Both are typically defined for all Unix and Unix-like systems.
In my code:
os.system("python myParsePDB.py -i BP1.pdb -c 1 -s %s" % step) inside the for loop will be executed for a bit of time and will spend some time on I/O files. If the keyboard interrupt is passing too fast and do not catch asynchronously after writing files, the signal might be blocked in operating system, so my execution will still remain the try clause for loop. (Errors detected during execution are called exceptions and are not unconditionally fatal: Python Errors and Exceptions).
Therefore the simplest way to make them asynchonous is wait:
try:
    for i in range(0, 360, step):
        os.system("python myParsePDB.py -i BP1.pdb -c 1 -s %s" % step)
        time.sleep(0.2)
except KeyboardInterrupt:
    print("Stop me!")
    sys.exit(0)
It might hurt performance, but it guarantees that the signal can be caught after waiting for the execution of os.system(). You might also want to use other sync/async functions to solve the problem if better performance is required.
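As a sketch of one such replacement: subprocess.run blocks without making the parent ignore SIGINT (unlike os.system, as the answer further below explains), so the loop could drop the extra sleep. step is the variable from the original script; this is an untested sketch, not a verified fix:

import subprocess
import sys

try:
    for i in range(0, 360, step):
        # Unlike os.system(), a subprocess call here does not make the
        # parent ignore SIGINT, so Ctrl+C raises KeyboardInterrupt in
        # this loop promptly.
        subprocess.run(["python", "myParsePDB.py",
                        "-i", "BP1.pdb", "-c", "1", "-s", str(step)])
except KeyboardInterrupt:
    print("Stop me!")
    sys.exit(0)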
For more unix signal reference, please also look at: Linux Signal Manpage

python: how to kill a Popen process with shell=False [why standard methods don't work]

I'm trying to kill a subprocess started with:
playing_long = Popen(["omxplayer", "/music.mp3"], stdout=subprocess.PIPE)
and after a while
pid = playing_long.pid
playing_long.terminate()
os.kill(pid,0)
playing_long.kill()
Which doesn't work.
Nor does the solution pointed out here:
How to terminate a python subprocess launched with shell=True
Note that I am using threads, and it is not recommended to use preexec_fn when you use threads (or at least that's what I read; anyway, it doesn't work either).
Why is it not working? There's no error message in the code, but I have to manually kill -9 the process to stop hearing the mp3 file.
Thanks
EDIT:
From here, I have added a wait() after the kill().
Surprisingly, before re-starting the process I check whether it is still alive, so that I don't start a chorus with the mp3 file.
Without the wait(), the system sees that the process is alive.
With the wait(), the system understands that the process is dead and starts it again.
However, the mp3 is still playing. I definitely can't seem to get it killed.
EDIT2: The problem is that omxplayer starts a second process that I don't kill, and that process is the one responsible for the actual music.
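Side note: one common way on POSIX systems to take down a child together with anything it spawns is to start it in its own session and signal the whole process group. A hedged sketch (Python 3, not specific to omxplayer):

import os
import signal
import subprocess

# start_new_session=True detaches the child into its own session and
# process group, so the group id can be used to signal the player and
# anything it spawned.
playing_long = subprocess.Popen(["omxplayer", "/music.mp3"],
                                stdout=subprocess.PIPE,
                                start_new_session=True)

# ... later, to stop everything in that group:
os.killpg(os.getpgid(playing_long.pid), signal.SIGTERM)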
I've tried to use this code, found in several places on the internet; it seems to work for everyone but not for me:
playing_long.stdin.write('q')
playing_long.stdin.flush()
And it prints 'NoneType' object has no attribute 'write'. Even when using this code immediately after starting the Popen process, it fails with the same message:
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"], stdout=subprocess.PIPE)
time.sleep(5)
playing_long.stdin.write('q')
playing_long.stdin.flush()
EDIT3: The problem then was that I wasn't setting up stdin in the Popen call. Now it is:
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
time.sleep(5)
playing_long.stdin.write(b'q')
playing_long.stdin.flush()
(note that what I write to stdin must be specified as bytes)
Final solution then (see the edits to the question above):
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
time.sleep(5)
playing_long.stdin.write(b'q')
playing_long.stdin.flush()

Python threads with os.system() calls. Main thread doesn't exit on ctrl+c

Please don't consider this a duplicate before reading: there are a lot of questions about multithreading and keyboard interrupts, but I didn't find any that consider os.system, and it looks like that is important here.
I have a python script which makes some external calls in worker threads.
I want it to exit when I press Ctrl+C, but it looks like the main thread ignores it.
Something like this:
from threading import Thread
import sys
import os

def run(i):
    while True:
        os.system("sleep 10")
        print i

def main():
    threads = []
    try:
        for i in range(0, 3):
            threads.append(Thread(target=run, args=(i,)))
            threads[i].daemon = True
            threads[i].start()
        for i in range(0, 3):
            while True:
                threads[i].join(10)
                if not threads[i].isAlive():
                    break
    except (KeyboardInterrupt, SystemExit):
        sys.exit("Interrupted by ctrl+c\n")

if __name__ == '__main__':
    main()
Surprisingly, it works fine if I change os.system("sleep 10") to time.sleep(10).
I'm not sure which operating system and shell you are using. I describe Mac OS X and Linux with zsh (bash/sh should act similarly).
When you hit Ctrl+C, all programs running in the foreground in your current terminal receive the signal SIGINT. In your case that is your main Python process and all processes spawned by os.system.
Processes spawned by os.system then terminate their execution. Usually, when a Python script receives SIGINT, it raises a KeyboardInterrupt exception, but your main process ignores SIGINT because of os.system(). Python's os.system() calls the standard C function system(), which makes the calling process ignore SIGINT (man Linux / man Mac OS X).
So none of your Python threads receives SIGINT; only the child processes get it.
When you remove os.system() call, your python process stops ignoring SIGINT, and you get KeyboardInterrupt.
You can replace os.system("sleep 10") with subprocess.call(["sleep", "10"]). subprocess.call() doesn't make your process ignore SIGINT.
I've had this same problem more times than I can count back when I was first learning Python multithreading.
Adding the sleep call within the loop makes your main thread block, which allows it to still hear and honor exceptions. What you want to do is utilize the Event class to set an event in your child threads that serves as an exit flag to break execution on. You can set this flag in your KeyboardInterrupt exception handler; just put the except clause for that in your main thread (see the sketch at the end of this answer).
I'm not entirely certain what is going on with the different behaviors between the Python-specific sleep and the os-called one, but the remedy I am offering should work for your desired end result. Just offering a guess: the os-called one probably blocks the interpreter itself in a different way?
Keep in mind that generally in most situations where threads are required the main thread is going to keep executing something, in which case the "sleeping" in your simple example would be implied.
http://docs.python.org/2/library/threading.html#event-objects
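A minimal sketch combining the two suggestions above (subprocess.call instead of os.system, plus an Event used as an exit flag), written in Python 3 syntax rather than the question's Python 2:

import subprocess
import sys
import threading

stop_event = threading.Event()

def run(i):
    # Check the exit flag on every iteration instead of looping forever.
    while not stop_event.is_set():
        subprocess.call(["sleep", "10"])
        print(i)

def main():
    threads = [threading.Thread(target=run, args=(i,), daemon=True)
               for i in range(3)]
    for t in threads:
        t.start()
    try:
        while any(t.is_alive() for t in threads):
            for t in threads:
                t.join(0.5)
    except (KeyboardInterrupt, SystemExit):
        stop_event.set()  # workers exit at their next flag check
        sys.exit("Interrupted by ctrl+c\n")

if __name__ == '__main__':
    main()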
