Why is my threading/multiprocessing python script not exiting properly? - python

I have a server script that I need to be able to shutdown cleanly. While testing the usual try..except statements I realized that Ctrl-C didn't work the usual way. Normally I'd wrap long running tasks like this
try:
...
except KeyboardInterrupt:
#close the script cleanly here
so the task could be shutdown cleanly on Ctrl-C. I have never ran into any problems with this before, but somehow when I hit Ctrl-C when this particular script is running the script just exits without catching the Ctrl-C.
The initial version was implemented using Process from multiprocessing. I rewrote the script using Thread from threading, but same issue there. I have used threading many times before, but I am new to the multiprocessing library. Either way, I have never experienced this Ctrl-C behavior before.
Normally I have always implemented sentinels etc to close down Queues and Thread instances in an orderly fashion, but this script just exits without any response.
Last, I tried overriding signal.SIGINT as well like this
def handler(signal, frame):
print 'Ctrl+C'
signal.signal(signal.SIGINT, handler)
...
Here Ctrl+C was actually caught, but the handler doesn't execute, it never prints anything.
Besides the threading / multiprocessing aspect, parts of the script contains C++ SWIG objects. I don't know if that has anything to do with it. I am running Python 2.7.2 on OS X Lion.
So, a few questions:
What's going on here?
How can I debug this?
What do I need to learn in order to understand the root cause?
PLEASE NOTE: The internals of the script is proprietary so I can't give code examples. I am however very willing to receive pointers so I could debug this myself. I am experienced enough to be able to figure it out if someone could point me in the right direction.
EDIT: I started commenting out imports etc to see what caused the weird behavior, and I narrowed it down to an import of a C++ SWIG library. Any ideas why importing a C++ SWIG library 'steals' Ctrl-C? I am not the author of the guilty library however and my SWIG experience is limited so don't really know where to start...
EDIT 2: I just tried the same script on a windows machine, and in Windows 7 the Ctrl-C is caught as expected. I'm not really going to bother with the OS X part, the script will be run in an Windows environment anyway.

This might have to do with the way Python manages threads, signals and C calls.
In short - Ctrl-C cannot interrupt C calls, since the implementation requires that a python thread will handle the signal, and not just any thread, but the main thread (often blocked, waiting for other threads).
In fact, long operations can block everything.
Consider this:
>>> nums = xrange(100000000)
>>> -1 in nums
False (after ~ 6.6 seconds)
>>>
Now, Try hitting Ctrl-C (uninterruptible!)
>>> nums = xrange(100000000)
>>> -1 in nums
^C^C^C (nothing happens, long pause)
...
KeyboardInterrupt
>>>
The reason Ctrl-C doesn't work with threaded programs is that the main thread is often blocked on an uninterruptible thread-join or lock (e.g, any 'wait', 'join' or just a plain empty 'main' thread, which in the background causes python to 'join' on any spawned threads).
Try to insert a simple
while True:
time.sleep(1)
in your main thread.
If you have a long running C function, do signal handling in C-level (May the Force be with you!).
This is largely based on David Beazley's video on the subject.

It exits because something else is likely catching the KeyboardInterupt and then raising some other exception, or simply returning None. You should still get a traceback to help debug. You need to capture the stderr output or run your script with the -i commandline option so you can see traceback. Also, add another except block to catch all other exceptions.
If you suspect the C++ function call to be catching the CTRL+C try catching it's output. If the C function is not returning anything then there isn't much you can do except ask the author to add some exception handling, return codes, etc.
try:
#Doing something proprietary ...
#catch the function call output
result = yourCFuncCall()
#raise an exception if it's not what you expected
if result is None:
raise ValueError('Unexpected Result')
except KeyboardInterupt:
print('Must be a CTRL+C')
return
except:
print('Unhandled Exception')
raise

how about atexit?
http://docs.python.org/library/atexit.html#module-atexit

Related

How exactly is SIGILL generated?

I have a program using tensorflow on a non-supported hardware, so everytime i run it, i get the "Illegal instruction (Core dumped)" error
my main goal is to capture this error. i don't want to solve it.
The error is not printed to the stderr of my program, it's printed to the stderr of bash.
then my program exists with code 33792 which is 132 (SIGILL)
And i cannot capture it using the method mentioned here, because i'm running my command using docker run and i can't pass it the curly brackets
Is there any way to capture the stdout of bash without the curly brackets?
Also how exactly is SIGILL generated? what exactly is happening behind the scenes?
Is SIGILL triggered in the parent process (bash in my case) and passed to the child process (my program)? or vice versa?
i tried adding a SIGILL handler in my program to see if i can capture it, but my program froze instead of printing the "illegal instruction" error.
I'm using Debian 11 and my program is written in python.
Edit:
The SIGILL kills my python program and my goal is to capture the SIGILL from inside my program, print some error and kill my program afterward.
I don't want the (Illegal instruction) error printed to be printed in the bash's stderr, I want it to be printed to my program's stderr or stdout.
Edit: here's the sigill handler I have in my code
def sigill_handler(sig, frame):
print("Illegal Instruction. terminating.")
signal.signal(signal.SIGILL, sigill_handler)
notice that this is the only signal I'm handling in my code
Citing https://docs.python.org/3/library/signal.html:
Execution of Python signal handlers
A Python signal handler does not get executed inside the low-level (C) signal handler. Instead, the low-level signal handler sets a flag which tells the virtual machine to execute the corresponding Python signal handler at a later point(for example at the next bytecode instruction). This has consequences:
It makes little sense to catch synchronous errors like SIGFPE or SIGSEGV that are caused by an invalid operation in C code. Python will return from the signal handler to the C code, which is likely to raise the same signal again, causing Python to apparently hang. From Python 3.3 onwards, you can use the faulthandler module to report on synchronous errors.
A long-running calculation implemented purely in C (such as regular expression matching on a large body of text) may run uninterrupted for an arbitrary amount of time, regardless of any signals received. The Python signal handlers will be called when the calculation finishes.
If the handler raises an exception, it will be raised “out of thin air” in the main thread. See the note below for a discussion.
According to https://docs.python.org/3/library/faulthandler.html, all the faulthandler can do is to dump a stack trace, so it does not help for your requirement.
What you could do is to run your possibly failing program from your own wrapper program where you can check the wait status and decide what you display to the user if the program was killed by SIGILL.
It would be better to check if your program runs on a supported platform before using any tensorflow functions.

Python. How do I capture <cntl>C anywhere in a program, then clean up?

I am developing a program to control a machine. Frequently the movement needs to be stopped before it does harm. At the moment, immediate control is by switching off the power because the machine continues to respond to stored commands. This leaves a lot of loose ends. I need to send a command to stop the machine immediately, and clear the queue before it can be started up. This can happen at any time/place during the running of the program.
All the examples I have seen here appear to assume that the placing of the C and keyboardinterrupt response is predicable, e.g.
How do I capture SIGINT in Python?
How do I capture C at any (unpredicted) point in the program?
This question reveals that I don't really understand how the underlying processes work.
You could execute all of your code within a "main" function and catch it within an except.
def Main():
# Running code...
try:
Main()
except KeyboardInterrupt:
# Execute actions to stop immediately.
except Exception as e:
print("An unexpected error occurred:\n" + str(e))
# Execute actions to stop immediately.

Guaranteeing calling to destruction on process termination

After reading A LOT of data on the subject I still couldn't find any actual solution to my problem (there might not be any).
My problem is as following:
In my project I have multiple drivers working with various hardware's (IO managers, programmable loads, power supplies and more).
Initializing connection to these hardware's is costly (in time), and I cant open and then close the connection for every communication iteration between us.
Meaning I cant do this (Assuming programmable load implements enter / exit):
start of code...
with programmable_load(args) as program_instance:
programmable_load_instance.do_something()
rest of code...
So I went for a different solution :
class programmable_load():
def __init__(self):
self.handler = handler_creator()
def close_connection(self):
self.handler.close_connection()
self.handler = None
def __del__(self):
if (self.handler != None):
self.close_connection()
For obvious reasons I dont 'trust' the destructor to actually get called so I explicitly call close_connection() when I want to end my program (for all drivers).
The problem happens when I abruptly terminate the process, for example when I run via debug mode and quit debugging.
In these cases the process terminates without running through any destructors.
I understand that the OS will clear all memory unused at this point, but is there any way to clear the memory in an organized manner?
and if not, is there a way to make the quit debugging function pass through a certain set of functions? Does the python process know it got a quite debugging event or does it treat it as a normal termination?
Operating system: Windows
According to this documentation:
If a process is terminated by TerminateProcess, all threads of the
process are terminated immediately with no chance to run additional
code.
(Emphasis mine.) This implies that there is nothing you can do in this case.
As detailed here, signals don't work very well on ms-windows.
As was mentioned in a comment, you could use atexit to do the cleanup. But that only works if the process is asked to close (e.g. QUIT signal on Linux) and not just killed (as is likely the case when stopping the debugging session). Similarily if you force your computer to turn off (e.g. long press power button or remove power) then it won't be called either. There is no 'solution' to that for obvious reasons. Your program can't expect to be called when the power suddenly goes off or when it is forcefully killed. The point of forcefully killing is to definitely kill the process now. If it first called your clean-up code then you could delay that which defeats the purpose. That is why there are signals such as to ask your process to stop. This is not Python specific. The same concept also applies across operating systems.
Bonus (design suggestion, not a solution): I would argue that you can still make use of the context manager (using with). Your problem is not unique. Database connections are usually kept alive for longer as well. It is a question of the scope. Move the context further up to the application level. Then it is clear what the boundary is and you don't need any magic (you are probably also aware of #contextmanager to make that a breeze).
I haven't tested properly as I don't have wingide installed over here so I can't grant you this will work but what about using setconsolectrlhandler? For instance, try something like this:
import os
import sys
import win32api
if __name__ == "__main__":
def callback(sig, func=None):
print("Exit handler called!")
try:
win32api.SetConsoleCtrlHandler(callback, True)
except Exception as e:
print("Captured exception", e)
sys.exit(1)
print("Press to quit")
input()
print("Bye!")
It'll be able to handle CTRL+C and CTRL+BREAK signals:

Python Testing with Kill Process

When testing a Python script I will often use a 'raw_input()' or 'input()' as a marker in the script, and when that marker is reached do a ctrl + c to kill the process in my command prompt.
I was worried this would lead to memory leaks, this brief thread indicates it shouldn't - killing python process leads memory leak?
It is bad practice to test scripts this way? Can it lead to any negative effects?
Usually there's no problem with using raw_input() or input() with ctrl + c to terminate your program. When you press ctrl+ c during raw_input or input invocation you're just raising a KeyboardInterrupt exception and Python knows how to handle exceptions appropriately. If you don't handle the KeyboardInterrupt, this exception will be handled by the default top-level exception handler, this default exception handler prints the stack trace from which the exception occurred just before exiting the interpreter.
For some applications, they might not free their resources after they exit or after you kill the process, most operating systems today are smart enough to free memory when its unused. But this is not applicable to your question.
Not sure about the leaks but I'm sure breakpoints are your friend here. In idle right click on the line you want to stop on and click set breakpoint, then clear when you're finished debugging.

Python multi-threading: How to keep daemon thread running

I'm stuck at a tricky problem to test whether a daemon thread is running. The daemon thread I made should run at the background to keep the service running, so I do the following to create it and keep it alive:
Creation:
ASThread = threading.Thread(target = initAirserv, args=[],)
ASThread.setDaemon(True)
ASThread.start()
Inside the initAirserv() method:
def initAirserv(self, channel="15"):
interface = self.execAirmon(options="start", interface=self.interface)
port = self.plug_port
if interface != "removed":
if channel=="15":
command = "airserv-ng -d " +str(interface)+" -p "+str(port)
else:
command = "airserv-ng -d " +str(interface)+" -p "+str(port)+" -c"+str(channel)
else:
return None
AServConn=self.init_Plug()
if AServConn:
(stdin, stdout, stderr) = AServConn.exec_command(command)
serv_op = stdout
serv_er = stderr
##### keep the daemon thread run persistently ####
a = 0
while 1:
a += 1
else:
logging.debug( "SSH Error" )
The purpose of the last several lines is to keep the thread busy using a stupid way. However, after starting this daemon thread and I did something else, when I came back and check the thread as follows:
if ASThread.is_alive() == 1:
# do something
the if body never gets executed. Can someone explain to me why does this happen? What's the best way to run a thread which executes something that needs to be busy all the time? Thanks very much.
The posted code doesn't add up. initAirserv as posted is a method on a class, but the initAirserv passed to the Thread constructor is not.
It's also hard to say anything concrete without knowing what execAirmon and init_Plug does, and what else happens in your application.
In general, I'd say you have it right. This should work. The fact that it doesn't means your assumptions are wrong.
Are you sure execAirmon returns something that is not equal to "removed"?
Are you sure init_Plug returns a non-false object?
Are you sure no exceptions are thrown? (I assume you would notice a spurious stacktrace, so are there other parts of your application that could swallow them up unnoticed?)
Some of my information is a few months old, and things may have changed, so please bear with me.
If you are using standard C based Python and are writing a multi-threaded application, you need to be aware of the Global Interpreter Lock (GIL) restriction. That is only one thread can run at a time. If you are willing to use one of the Python C interface packages and write a lot of your code in C, the C portion of your function call can be threaded and is not subject to the GIL restriction.
Python has excellent multi-process support and libraries, and because you are synchronizing processes, the GIL restriction does not apply.
There is talk about fixing the GIL restriction, but for now this is a problem you have to accept.
IMHO, I chose Python to write software in Python, not in C, unless a very specific problem had to be addressed. Python is an excellent language for a lot of things, but the GIL restriction encouraged me to learn a language that would support better event synchronization, aka a multi-threaded environment.
I hope this helps.

Categories