Python gurus, I need your help. I'm facing quite strange behavior:
an empty Python Process hangs on joining. It looks like it forks some locked resource.
Env:
Python version: 3.5.3
OS: Ubuntu 16.04.2 LTS
Kernel: 4.4.0-75-generic
Problem description:
1) I have a logger with a background thread to handle messages and a queue for that thread. Logger source code (slightly simplified).
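Since the original logger source is only summarized, the following is a reconstructed sketch of the kind of queue-backed logger the rest of the post assumes. The class and method names (QueueHandler, QueueListener, _monitor, _sentinel) are taken from the answer below; everything else is my guess, not the exact original code:

# my_logging.py -- a reconstructed sketch, not the exact original source
import logging
import threading
from multiprocessing import Queue


class QueueHandler(logging.Handler):
    """Puts log records on a queue instead of emitting them directly."""
    def __init__(self, queue):
        logging.Handler.__init__(self)
        self.queue = queue

    def emit(self, record):
        self.queue.put_nowait(record)


class QueueListener(object):
    """Background thread that pulls records off the queue and writes them out."""
    _sentinel = None

    def __init__(self, queue, handler):
        self.queue = queue
        self.handler = handler
        self._thread = None

    def start(self):
        self._thread = threading.Thread(target=self._monitor)
        self._thread.daemon = True
        self._thread.start()

    def _monitor(self):
        while True:
            record = self.queue.get()
            if record is self._sentinel:
                break
            self.handler.handle(record)

    def stop(self):
        self.queue.put_nowait(self._sentinel)
        self._thread.join()


def get_logger(name):
    queue = Queue()
    stream_handler = logging.StreamHandler()  # writes to sys.stderr by default
    listener = QueueListener(queue, stream_handler)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(QueueHandler(queue))

    # Expose the listener thread's start/stop on the logger object to match
    # the logger.start()/logger.stop() calls in the test script below.
    logger.start = listener.start
    logger.stop = listener.stop
    return logger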
2) And I have a simple script which uses my logger (just code to demonstrate the problem):
import os
from multiprocessing import Process
from my_logging import get_logger

def func():
    pass

if __name__ == '__main__':
    logger = get_logger(__name__)
    logger.start()

    for _ in range(2):
        logger.info('message')

    proc = Process(target=func)
    proc.start()
    proc.join(timeout=3)
    print('TEST PROCESS JOINED: is_alive={0}'.format(proc.is_alive()))

    logger.stop()
    print('EXIT')
Sometimes this test script hangs. It hangs on joining the process "proc" (as the script finishes execution). The test process "proc" stays alive.
To reproduce this problem you can run the script in a loop:
$ for i in {1..100} ; do /opt/python3.5.3/bin/python3.5 test.py ; done
Investigation:
strace shows the following:
strace: Process 25273 attached
futex(0x2275550, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff
And I figured out the place where the process hangs. It hangs in the multiprocessing module, file process.py, line 269 (Python 3.5.3), on flushing STDERR:
...
267 util.info('process exiting with exitcode %d' % exitcode)
268 sys.stdout.flush()
269 sys.stderr.flush()
...
If line 269 is commented out, the script always completes successfully.
My thoughts:
By default logging.StreamHandler uses sys.stderr as its stream.
If the process is forked while the logger is flushing data to STDERR, the child's process context inherits the locked resource and later hangs when it flushes STDERR itself.
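As an illustration of that hypothesis (not code from the original post), here is a minimal sketch of the general fork-while-a-lock-is-held problem: a child forked while another thread holds a lock inherits the lock in its locked state and deadlocks when it tries to acquire it, which is exactly what a flush of a locked sys.stderr handler would do:

# fork_lock_demo.py -- not from the original post; POSIX only
import os
import sys
import threading
import time

lock = threading.Lock()

def holder():
    # Simulates the logger thread holding the stream's internal lock
    # while it is in the middle of a flush.
    with lock:
        time.sleep(2)

threading.Thread(target=holder).start()
time.sleep(0.1)            # make sure the lock is held at the moment of fork

pid = os.fork()
if pid == 0:
    # Child: fork() copied only the main thread, so nothing will ever
    # release the inherited, already-locked lock; this blocks forever,
    # just like the sys.stderr.flush() in multiprocessing does.
    lock.acquire()
    print('the child never gets here')
    sys.exit(0)
else:
    os.waitpid(pid, 0)     # the parent hangs here, waiting for the stuck child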
Some workarounds which solve the problem:
Use Python 2.7. I can't reproduce it with Python 2.7; maybe the timing there just prevents me from reproducing the problem.
Use a process instead of a thread to handle messages in the logger.
Do you have any ideas on this behavior? Where is the problem? Am I doing something wrong?
It looks like this behaviour is related to this issue: http://bugs.python.org/issue6721
Question: Sometimes ... The test process "proc" stays alive.
I could only reproduce your
TEST PROCESS:0 JOINED: is_alive=True
by adding a time.sleep(5) to def func():.
Since you use proc.join(timeout=3), that's the expected behavior.
Conclusion:
Overloading your system (in my environment this starts at around 30 processes running) triggers your proc.join(timeout=3) timeout.
You may want to rethink your test case to reproduce your problem.
One approach, I think, is fine-tuning your process/thread with some time.sleep(0.05) calls to give up a timeslice.
You are using from multiprocessing import Queue;
use from queue import Queue instead.
From the Documentation
Class multiprocessing.Queue
A queue class for use in a multi-processing (rather than multi-threading) context.
In class QueueHandler(logging.Handler):, prevent
self.queue.put_nowait(record)
from being called after
class QueueListener(object):
    ...
    def stop(self):
        ...
has been called. Implement, for instance:
class QueueHandler(logging.Handler):
    def __init__(self):
        self.stop = Event()
        ...
In def _monitor(self): use only ONE while ... loop.
Wait until self._thread has stopped:
class QueueListener(object):
    ...
    def stop(self):
        self.handler.stop.set()
        while not self.queue.empty():
            time.sleep(0.5)
        # Don't use double flags
        #self._stop.set()
        self.queue.put_nowait(self._sentinel)
        self._thread.join()
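Putting those suggestions together, a hedged sketch of what the reworked handler/listener pair could look like (a thread-safe queue.Queue, a stop Event on the handler, a single monitor loop); this is only an illustration of the advice above, not the exact code being reviewed:

# sketch_logger.py -- an illustration of the suggestions above, not the reviewed code
import logging
import threading
import time
from queue import Queue          # thread queue, as suggested, not multiprocessing.Queue
from threading import Event


class QueueHandler(logging.Handler):
    def __init__(self, queue):
        logging.Handler.__init__(self)
        self.queue = queue
        self.stop = Event()      # set by the listener's stop(); blocks further puts

    def emit(self, record):
        if not self.stop.is_set():
            self.queue.put_nowait(record)


class QueueListener(object):
    _sentinel = None

    def __init__(self, queue, handler, target_handler):
        self.queue = queue
        self.handler = handler                # the QueueHandler above
        self.target_handler = target_handler  # e.g. a StreamHandler
        self._thread = threading.Thread(target=self._monitor)
        self._thread.daemon = True

    def start(self):
        self._thread.start()

    def _monitor(self):
        # Only ONE while loop, as recommended above.
        while True:
            record = self.queue.get()
            if record is self._sentinel:
                break
            self.target_handler.handle(record)

    def stop(self):
        self.handler.stop.set()              # no new records after this point
        while not self.queue.empty():
            time.sleep(0.5)                  # let the monitor drain what is queued
        self.queue.put_nowait(self._sentinel)
        self._thread.join()


if __name__ == '__main__':
    q = Queue()
    qhandler = QueueHandler(q)
    listener = QueueListener(q, qhandler, logging.StreamHandler())

    log = logging.getLogger('demo')
    log.setLevel(logging.INFO)
    log.addHandler(qhandler)

    listener.start()
    log.info('hello through the queue')
    listener.stop()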
Related
My question
I encountered a hang-up issue with the combination of threading, multiprocessing, and subprocess. I simplified my situation as below.
import subprocess
import threading
import multiprocessing

class dummy_proc(multiprocessing.Process):
    def run(self):
        print('run')
        while True:
            pass

class popen_thread(threading.Thread):
    def run(self):
        proc = subprocess.Popen('ls -la'.split(), shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout_byte, stderr_byte = proc.communicate()
        rc = proc.returncode
        print(rc)

if __name__ == '__main__':
    print('start')
    t = popen_thread()
    t.start()
    p = dummy_proc()
    p.start()
    t.join()
    p.terminate()
In this script, a thread and a process are created. The thread just issues the system command ls -la. The process just loops infinitely. When the thread finishes getting the return code of the system command, the main program terminates the process and exits.
When I run this script again and again, it sometimes hangs. I googled this situation and found some articles which seem to be related.
Is it safe to fork from within a thread?
Issue with python's subprocess,popen (creating a zombie and getting stuck)
So, I guess the hang-up issue can be explained something like this:
The process is forked between Popen() and communicate().
It inherits some "blocking" state from the thread, and that state is never released.
This prevents the thread from acquiring the result of communicate().
But I'm not 100% confident, so it would be great if someone could explain what happens here.
My environment
I used the following environment.
$ uname -a
Linux dell-vostro5490 5.10.96-1-MANJARO #1 SMP PREEMPT Tue Feb 1 16:57:46 UTC 2022 x86_64 GNU/Linux
$ python3 --version
Python 3.9.2
I also tried the following environment and got the same result.
$ uname -a
Linux raspberrypi 5.10.17+ #2 Tue Jul 6 21:58:58 PDT 2021 armv6l GNU/Linux
$ python3 --version
Python 3.7.3
What I tried:
Use spawn instead of fork for multiprocessing (a sketch of this variant follows below).
Use a thread instead of a process for dummy_proc.
In both cases, the issue disappeared. So I guess this issue is related to the behavior of fork...
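For reference, a minimal sketch (mine, not the questioner's exact script) of the spawn variant that made the issue disappear; only the start method changes:

# spawn_variant.py -- my sketch of the "spawn" workaround, not the original script
import subprocess
import threading
import multiprocessing

class dummy_proc(multiprocessing.Process):
    def run(self):
        print('run')
        while True:
            pass

class popen_thread(threading.Thread):
    def run(self):
        proc = subprocess.Popen('ls -la'.split(), shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout_byte, stderr_byte = proc.communicate()
        print(proc.returncode)

if __name__ == '__main__':
    # spawn starts the child from a fresh interpreter instead of fork()ing the
    # (already multi-threaded) parent, so no locked state or stray pipe ends
    # are inherited.
    multiprocessing.set_start_method('spawn')
    t = popen_thread()
    t.start()
    p = dummy_proc()
    p.start()
    t.join()
    p.terminate()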
This is a bit too long for a comment and so ...
I am having a problem understanding your statement that the problem disappears when you "Use thread instead of process for dummy_proc."
The hanging problem, as I understand it, is "that fork() only copies the calling thread, and any mutexes held in child threads will be forever locked in the forked child." In other words, the hanging problem arises when a fork is done while there exists one or more threads other than the main thread (i.e., the one associated with the main process).
If you execute a subprocess.Popen call from a newly created subprocess or a newly created thread, then either way there will, by definition, be a new thread in existence prior to the fork done to implement the Popen call, and I would think the potential for hanging exists.
import subprocess
import threading
import multiprocessing
import os

class popen_process(multiprocessing.Process):
    def run(self):
        print(f'popen_process, PID = {os.getpid()}, TID={threading.current_thread().ident}')
        proc = subprocess.Popen('ls -la'.split(), shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout_byte, stderr_byte = proc.communicate()
        rc = proc.returncode

if __name__ == '__main__':
    print(f'main process, PID = {os.getpid()}, TID={threading.current_thread().ident}')
    multiprocessing.set_start_method('spawn')
    p = popen_process()
    p.start()
    p.join()
Prints:
main process, PID = 14, TID=140301923051328
popen_process, PID = 16, TID=140246240732992
Note the new thread with TID=140246240732992
It seems to me that you need to use the spawn start method as long as you are doing the Popen call from another thread or process if you want to be sure of not hanging. For what it's worth, on my Windows Subsystem for Linux I could not get it to hang with fork using your code after quite a few tries, so I am just going by what the linked answer warns against.
In any event, in your example code there seems to be a potential race condition. Let's assume that even though your popen_process is a new thread, its properties are such that it does not give rise to the hanging problem (no mutexes are being held). Then the problem would arise from the creation of the dummy_proc process/thread. The question then becomes whether your call to t.start() completes the starting of the new process that ultimately runs the ls -la command before or after the creation of the dummy_proc process/thread completes. This timing determines whether the new dummy_proc thread (there will be one regardless of whether dummy_proc inherits from Process or Thread, as we have seen) exists prior to the creation of the ls -la process. This race condition might explain why you were sometimes hanging. I have no explanation for why you never hang if you make dummy_proc inherit from threading.Thread.
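To make that race concrete, here is a sketch (my own illustration, not code from either post) that serializes the two forks with a threading.Event, so dummy_proc can only start after the ls -la process has already been created:

# serialize_forks.py -- my illustration of removing the race, not from either post
import subprocess
import threading
import multiprocessing

popen_done = threading.Event()

class popen_thread(threading.Thread):
    def run(self):
        proc = subprocess.Popen('ls -la'.split(), shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        popen_done.set()                 # the ls -la process now exists
        stdout_byte, stderr_byte = proc.communicate()
        print(proc.returncode)

class dummy_proc(multiprocessing.Process):
    def run(self):
        while True:
            pass

if __name__ == '__main__':
    t = popen_thread()
    t.start()
    popen_done.wait()                    # don't fork dummy_proc until Popen is done
    p = dummy_proc()
    p.start()
    t.join()
    p.terminate()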
I'm using multiprocessing in a larger code base where some of the import statements have side effects. How can I run a function in a background process without having it inherit global imports?
# helper.py:
print('This message should only print once!')

# main.py:
import multiprocessing as mp
import helper  # This prints the message.

def worker():
    pass  # Unfortunately this also prints the message again.

if __name__ == '__main__':
    mp.set_start_method('spawn')
    process = mp.Process(target=worker)
    process.start()
    process.join()
Background: Importing TensorFlow initializes CUDA, which reserves some amount of GPU memory. As a result, spawning too many processes leads to a CUDA OOM error, even though the processes don't use TensorFlow.
Similar question without an answer:
How to avoid double imports with the Python multiprocessing module?
Is there a resource that explains exactly what the multiprocessing
module does when starting an mp.Process?
Super quick version (using the spawn context, not fork):
Some stuff (a pair of pipes for communication, cleanup callbacks, etc.) is prepared, then a new process is created with fork() followed by exec(). On Windows it's CreateProcessW(). The new Python interpreter is called with a startup script spawn_main() and is passed the communication pipe file descriptors via a crafted command string and the -c switch. The startup script cleans up the environment a little bit, then unpickles the Process object from its communication pipe. Finally it calls the run method of the process object.
So what about importing of modules?
Pickle semantics handle some of it, but __main__ and sys.modules need some tlc, which is handled here (during the "cleans up the environment" bit).
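As a quick experiment (not from the original answer), you can check from inside the spawned child which modules actually got imported; the helper module name below is the hypothetical side-effect module from the question:

# spawn_modules_check.py -- a quick experiment, not from the original answer
import multiprocessing as mp
import sys

def report_modules():
    # 'helper' is the hypothetical side-effect module from the question.
    # Prints False here because this script never imports helper at module
    # level; it would print True if __main__ imported helper at the top.
    print('helper imported in child:', 'helper' in sys.modules)

if __name__ == '__main__':
    ctx = mp.get_context('spawn')        # independent of the global start method
    p = ctx.Process(target=report_modules)
    p.start()
    p.join()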
# helper.py:
print('This message should only print once!')

# main.py:
import multiprocessing as mp

def worker():
    pass

def main():
    # Importing the module only locally so that the background
    # worker won't import it again.
    import helper
    mp.set_start_method('spawn')
    process = mp.Process(target=worker)
    process.start()
    process.join()

if __name__ == '__main__':
    main()
I'm using a commercial application that uses Python as part of its scripting API. One of the functions provided is something called App.run(). When this function is called, it starts a new Java process that does the rest of the execution. (Unfortunately, I don't really know what it's doing under the hood as the supplied Python modules are .pyc files, and many of the Python functions are SWIG generated).
The trouble I'm having is that I'm building the App.run() call into a larger Python application that needs to do some guaranteed cleanup code (closing a database, etc.). Unfortunately, if the subprocess is interrupted with Ctrl+C, it aborts and returns to the command line without returning control to the main Python program. Thus, my cleanup code never executes.
So far I've tried:
Registering a function with atexit... doesn't work
Putting cleanup in a class __del__ destructor... doesn't work. (App.run() is inside the class)
Creating a signal handler for Ctrl+C in the main Python app... doesn't work
Putting App.run() in a Thread... results in a Memory Fault after the Ctrl+C
Putting App.run() in a Process (from multiprocessing)... doesn't work
Any ideas what could be happening?
This is just an outline, but something like this?
import os

cpid = os.fork()
if not cpid:
    # change stdio handles etc
    os.setsid()  # Probably not needed
    App.run()
    os._exit(0)

os.waitpid(cpid, 0)
# clean up here
(os.fork is *nix only)
The same idea could be implemented with subprocess in an OS-agnostic way. The idea is to run App.run() in a child process and then wait for the child process to exit, regardless of how the child process died. On POSIX, you could also trap SIGCHLD (child process death). I'm not a Windows guru, so if that's applicable and subprocess doesn't work, someone else will have to chime in here.
After App.run() is called, I'd be curious what the process tree looks like. It's possible it's running an exec and taking over the Python process space. If that's happening, creating a child process is the only way I can think of to trap it.
If try: App.run() finally: cleanup() doesn't work, you could try running it in a subprocess:
import sys
from subprocess import call
rc = call([sys.executable, 'path/to/run_app.py'])
cleanup()
Or, if you have the code in a string, you could use the -c option, e.g.:
rc = call([sys.executable, '-c', '''import sys
print(sys.argv)
'''])
You could implement @tMC's suggestion using subprocess by adding the
preexec_fn=os.setsid argument (note: no ()), though I don't see how creating a process group might help here. Or you could try the shell=True argument to run it in a separate shell.
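A hedged sketch of that preexec_fn variant; run_app.py is the same assumed wrapper script as above, and cleanup() stands in for the questioner's real cleanup code:

# preexec_variant.py -- a sketch; run_app.py and cleanup() are assumed names
import os
import sys
from subprocess import call

def cleanup():
    print('closing database, etc.')      # placeholder for the real cleanup code

try:
    # os.setsid in the child puts App.run() into its own session and process
    # group, so a Ctrl+C typed in the terminal is not delivered to it directly.
    rc = call([sys.executable, 'path/to/run_app.py'], preexec_fn=os.setsid)
finally:
    cleanup()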
You might give another try to multiprocessing:
import multiprocessing as mp
if __name__=="__main__":
p = mp.Process(target=App.run)
p.start()
p.join()
cleanup()
Are you able to wrap the App.Run() call in a try/except?
Something like:
try:
    App.Run()
except (KeyboardInterrupt, SystemExit):
    print "User requested an exit..."
    cleanup()
(there is a follow-up to this question here)
I am working on trying to write a Python-based init system for Linux, but I'm having an issue getting signals to my Python init script. From the 'man 2 kill' page:
The only signals that can be sent to process ID 1, the init process, are those for which init has explicitly installed signal handlers.
In my Python-based init, I have a test function and a signal handler set up to call that function:
def SigTest(SIG, FRM):
    print "Caught SIGHUP!"

signal.signal(signal.SIGHUP, SigTest)
From another TTY (the init script executes sh on another tty), if I send a signal with kill -HUP 1, it is completely ignored and the text is never printed.
I found this issue because I wrote a reaping function for my Python init to reap its child processes as they die, but they all just became zombies; it took a while to figure out that Python was never getting the SIGCHLD signal. Just to ensure my environment is sane, I wrote a C program that forks and has the child send PID 1 a signal, and it did register.
How do I install a signal handler the system will acknowledge if signal.signal(SIG, FUNC) isn't working?
I'm going to try using ctypes to register my handler with C code and see if that works, but I'd rather have a pure Python answer if at all possible.
Ideas?
(I'm not a programmer, I'm really in over my head here :p)
Test code below...
import os
import sys
import time
import signal

def SigTest(SIG, FRM):
    print "SIGINT Caught"

print "forking for ash"
cpid = os.fork()
if cpid == 0:
    os.closerange(0, 4)
    sys.stdin = open('/dev/tty2', 'r')
    sys.stdout = open('/dev/tty2', 'w')
    sys.stderr = open('/dev/tty2', 'w')
    os.execv('/bin/ash', ('ash',))

print "ash started on tty2"
signal.signal(signal.SIGHUP, SigTest)

while True:
    time.sleep(5.0)
Signal handlers mostly work in Python, but there are some problems. One is that your handler won't run until the interpreter re-enters its bytecode loop; if your program is blocked in a C function, the signal handler is not called until that function returns. You don't show the code where you are waiting. Are you using signal.pause()?
Another is that if you are in a system call, you will get an exception after the signal handler returns. You need to wrap all system calls with a retry handler (at least on Linux).
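For example, a minimal retry wrapper (a sketch, not part of the original answer) that re-issues a call when it is interrupted by a signal (EINTR):

# retry_on_eintr.py -- a sketch, not part of the original answer
import errno

def retry_on_eintr(func, *args, **kwargs):
    # Re-issue the call if it was interrupted by a signal handler (EINTR).
    while True:
        try:
            return func(*args, **kwargs)
        except (OSError, IOError) as e:
            if e.errno != errno.EINTR:
                raise

# usage, e.g. wrapping a blocking read:
# data = retry_on_eintr(os.read, fd, 1024)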
It's interesting that you are writing an init replacement... That's something like a process manager. The proctools code might interest you, since it does handle SIGCHLD.
By the way, this code:
import signal

def SigTest(SIG, FRM):
    print "SIGINT Caught"

signal.signal(signal.SIGHUP, SigTest)

while True:
    signal.pause()

does work on my system.
I have Python 2.6 on MacOS X and a multithreaded operation. The following test code works fine and shuts down the app on Ctrl-C:
import threading, time, os, sys, signal

def SigIntHandler( signum, frame ) :
    sys.exit( 0 )

signal.signal( signal.SIGINT, SigIntHandler )

class WorkThread( threading.Thread ) :
    def run( self ) :
        while True :
            time.sleep( 1 )

thread = WorkThread()
thread.start()

time.sleep( 1000 )
But if I change only one line, adding some real work to the worker thread, the app will never terminate on Ctrl-C:
import threading, time, os, sys, signal

def SigIntHandler( signum, frame ) :
    sys.exit( 0 )

signal.signal( signal.SIGINT, SigIntHandler )

class WorkThread( threading.Thread ) :
    def run( self ) :
        while True :
            os.system( "svn up" ) # This is really slow and can fail.
            time.sleep( 1 )

thread = WorkThread()
thread.start()

time.sleep( 1000 )
Is it possible to fix this, or is Python not intended to be used with threading?
A couple of things which may be causing your problem:
The Ctrl-C is perhaps being caught by svn, which is ignoring it.
You are creating a thread which is a non-daemon thread, then just exiting the process. This will cause the process to wait until the thread exits - which it never will. You need to either make the thread a daemon, or give it a way to terminate and join() it before exiting (a sketch of this approach follows below). While it always seems to stop on my Linux system, MacOS X behaviour may be different.
Python works well enough with threads :-)
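A sketch of that second point (my own illustration, reusing the WorkThread from the question): mark the thread as a daemon and give it an Event so it can be asked to stop and joined before exiting:

# daemon_worker.py -- my illustration, reusing WorkThread from the question
import threading, time, os, sys, signal

def SigIntHandler( signum, frame ) :
    sys.exit( 0 )

signal.signal( signal.SIGINT, SigIntHandler )

class WorkThread( threading.Thread ) :
    def __init__( self ) :
        threading.Thread.__init__( self )
        self.daemon = True                  # don't let this thread keep the process alive
        self.stop_event = threading.Event()

    def run( self ) :
        while not self.stop_event.is_set() :
            os.system( "svn up" )
            self.stop_event.wait( 1 )       # interruptible replacement for sleep(1)

thread = WorkThread()
thread.start()
try :
    time.sleep( 1000 )
finally :
    thread.stop_event.set()                 # ask the worker to finish...
    thread.join( 5 )                        # ...and wait a bounded time; the daemon
                                            # flag lets the process exit regardless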
Update: You could try using subprocess, setting up the child process so that file handles are not inherited, and setting the child's stdin to subprocess.PIPE.
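A minimal sketch of that suggestion (assumed, not from the original answer), replacing the os.system("svn up") call:

# subprocess_variant.py -- a sketch of the suggestion above
import subprocess

# close_fds=True keeps the parent's file handles out of the child, and a PIPE
# stdin stops svn from reading the controlling terminal.
proc = subprocess.Popen( [ "svn", "up" ], close_fds = True, stdin = subprocess.PIPE )
proc.communicate()
rc = proc.returncode    # non-zero when "svn up" fails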
You likely do not need threads at all.
Try using Python's subprocess module, or even Twisted's process support.
I'm not an expert on threads in Python, but quickly reading the docs leads to a few conclusions.
1) Calling os.system() spawns a new subshell and is not encouraged. Instead the subprocess module should be used. http://docs.python.org/release/2.6.6/library/os.html?highlight=os.system#os.system
2) The threading module doesn't seem to give a whole lot of control to the threads; maybe try using the thread module, which at least has a thread.exit() function. Also, the threading docs say that dummy threads may be created, which are always alive and daemonic; furthermore,
"… the entire Python program exits when only daemon threads are left."
So, I would imagine that the least you need to do is signal the currently running threads that they need to exit before exiting the main thread, or join them on Ctrl-C to allow them to finish (although this would obviously defeat the point of Ctrl-C), or perhaps just using the subprocess module to spawn the svn up would do the trick.