Python - Catch signals and exceptions in remote script

Python - Catch signals and exceptions in remote script - python

I am running a remote Python script on AWS (EC2 ubuntu) in background. The script performs some file manipulations, launches a long running simulation (subprocess run with os.system(...)) and writes some log files. I would like to manage the status of the running script and hopefully exit gracefully from various conditions. Specifically:
The sub-process is interrupted by the user with signal 15.
The simulation (sub-process) fails (signal 8 - Floating point exception)
The vm is rebooted
The vm is terminated. I am using Elastic File System, so even if the instance is destroyed, all the files are not.
I know how to handle basic exceptions, but I am a bit lost when I need to catch exceptions from subprocesses. Can you recommend a solid approach?
EDIT: Please notice the bold part.

For your given scenarios, try with signal handling. In given cases, case 1 (signal 15) and case 3 (vm is getting rebooted), are similar(generally signal 15/SIGTERM is part of shutdown sequence or maybe triggered by user with proper privileges. Nonetheless it serves the required purpose).
signal 8 - SIGFPE
import signal
def signalHandler(sigNum, frameObject):
if sigNum == 15:
# Code for handling signal 15 goes here
elif sigNum == 8:
# Code for handling signal 8 goes here
signal.signal(signal.SIGTERM, signalHandler) # signal 15
signal.signal(signal.SIGFPE, signalHandler) # signal 8

I might be misunderstanding you but just put all exception causing code in try-except blocks. You seem pretty knowledgeable but I'll give an example anyways
try:
//some potentially error causing code
except (errorType): //need to know what type of exception it will throw
//code for what to do if the error occurs

Related

How exactly is SIGILL generated?

I have a program using tensorflow on a non-supported hardware, so everytime i run it, i get the "Illegal instruction (Core dumped)" error
my main goal is to capture this error. i don't want to solve it.
The error is not printed to the stderr of my program, it's printed to the stderr of bash.
then my program exists with code 33792 which is 132 (SIGILL)
And i cannot capture it using the method mentioned here, because i'm running my command using docker run and i can't pass it the curly brackets
Is there any way to capture the stdout of bash without the curly brackets?
Also how exactly is SIGILL generated? what exactly is happening behind the scenes?
Is SIGILL triggered in the parent process (bash in my case) and passed to the child process (my program)? or vice versa?
i tried adding a SIGILL handler in my program to see if i can capture it, but my program froze instead of printing the "illegal instruction" error.
I'm using Debian 11 and my program is written in python.
Edit:
The SIGILL kills my python program and my goal is to capture the SIGILL from inside my program, print some error and kill my program afterward.
I don't want the (Illegal instruction) error printed to be printed in the bash's stderr, I want it to be printed to my program's stderr or stdout.
Edit: here's the sigill handler I have in my code
def sigill_handler(sig, frame):
print("Illegal Instruction. terminating.")
signal.signal(signal.SIGILL, sigill_handler)
notice that this is the only signal I'm handling in my code

Citing https://docs.python.org/3/library/signal.html:
Execution of Python signal handlers
A Python signal handler does not get executed inside the low-level (C) signal handler. Instead, the low-level signal handler sets a flag which tells the virtual machine to execute the corresponding Python signal handler at a later point(for example at the next bytecode instruction). This has consequences:
It makes little sense to catch synchronous errors like SIGFPE or SIGSEGV that are caused by an invalid operation in C code. Python will return from the signal handler to the C code, which is likely to raise the same signal again, causing Python to apparently hang. From Python 3.3 onwards, you can use the faulthandler module to report on synchronous errors.
A long-running calculation implemented purely in C (such as regular expression matching on a large body of text) may run uninterrupted for an arbitrary amount of time, regardless of any signals received. The Python signal handlers will be called when the calculation finishes.
If the handler raises an exception, it will be raised “out of thin air” in the main thread. See the note below for a discussion.
According to https://docs.python.org/3/library/faulthandler.html, all the faulthandler can do is to dump a stack trace, so it does not help for your requirement.
What you could do is to run your possibly failing program from your own wrapper program where you can check the wait status and decide what you display to the user if the program was killed by SIGILL.
It would be better to check if your program runs on a supported platform before using any tensorflow functions.

Python. How do I capture <cntl>C anywhere in a program, then clean up?

I am developing a program to control a machine. Frequently the movement needs to be stopped before it does harm. At the moment, immediate control is by switching off the power because the machine continues to respond to stored commands. This leaves a lot of loose ends. I need to send a command to stop the machine immediately, and clear the queue before it can be started up. This can happen at any time/place during the running of the program.
All the examples I have seen here appear to assume that the placing of the C and keyboardinterrupt response is predicable, e.g.
How do I capture SIGINT in Python?
How do I capture C at any (unpredicted) point in the program?
This question reveals that I don't really understand how the underlying processes work.

You could execute all of your code within a "main" function and catch it within an except.
def Main():
# Running code...
try:
Main()
except KeyboardInterrupt:
# Execute actions to stop immediately.
except Exception as e:
print("An unexpected error occurred:\n" + str(e))
# Execute actions to stop immediately.

Python: "Unbreakable" infinite loop was exited

I'm recording data with a RuuviTag Bluetooth sensor which sends temperature values to my RaspberryPi.
According to the RuuviTag Python library docu I'm supposed to use the function RuuviTagSensor.get_datas(handle_data) which starts an infinite loop of the function handle_data().
In my case, I designed it like:
def handle_data(found_data):
temperature_measurement = found_data[1]['temperature']
publish_via_mqtt(temperature_measurement)
and I wrapped it all up in:
while True:
try:
RuuviTagSensor.get_datas(handle_data)
except Exception:
logger.exception(f"An error occurred at {datetime.now(timezone.utc)}")
reconnect_mqtt()
However, overnight, I broke ...
The logs say:
INFO:ruuvitag_sensor.ble_communication:Problem with hciconfig reset. Retry reset.
INFO:ruuvitag_sensor.ble_communication:Problem with hciconfig reset. Retry reset.
INFO:ruuvitag_sensor.ble_communication:Problem with hciconfig reset. Retry reset.
INFO:ruuvitag_sensor.ble_communication:Problem with hciconfig reset. Exit.
So it seems like the RaspberryPi hat a Bluetooth problem and tried to reconnect with the Ruuvis ... and when that did not work 3 times, he just EXITED PYTHON? Am I correct? Is there a way to restart the whole script if it is exited or handle this exit?

In the library code, exit(1) is called if the sensor cannot be started after three attempts.
Calling exit raises the SystemExit exception. SystemExit does not inherit from Exception, and so is not caught by the try/except block in the code in the question. This behaviour seems to be intentional, according to the project's README:
In case of errors, application tries to exit immediately, so it can be automatically restarted.
Trapping SystemExit within the program is possible (except SystemExit:) if you want to force the library to keep retrying.

How to debug silent crash

I have a Python3.4 and PyQt5 application. This application communicate with an embedded device (Send and receive some frames).
I have a method (run from a QThread) to retrieve the device events ( It can be 10 events or more than 600 ). This methode work well in "release" mode.
But when I start the program in "debug" mode with Pycharm, it will work without breakpoint but will crash with the exit code 0 if I put a breakpoint.
I have a retry button to launch this process.
So in release mode if I retry again and again it will also fail with the the exit code 0.
Moreover, the application doesn't crash each time at the same moment, if the amount of data to read from the device is large, the soft will crash earlier, else it will be longer.
So I was thinking about memory, but I can't catch any exception.
I tried to re-raise every exception in my program, nothing, so I tried to add thoses lines in my main :
def on_exception_triggered(type_except, value, tb):
import traceback
trace = "".join(traceback.format_exception(type_except, value, tb))
print("ERROR HOOKED : ", trace)
sys.__excepthook__(type_except, value, tb)
sys.excepthook = on_exception_triggered
But it catch nothing more

Actually the best thing that you can do is try to make your code independent from pyqt and debug it look for issues fix them and make connection with pyqt, because otherwise even if your code works fine you will get just the interface on a screen and you can not see what happen

Why is my threading/multiprocessing python script not exiting properly?

I have a server script that I need to be able to shutdown cleanly. While testing the usual try..except statements I realized that Ctrl-C didn't work the usual way. Normally I'd wrap long running tasks like this
try:
...
except KeyboardInterrupt:
#close the script cleanly here
so the task could be shutdown cleanly on Ctrl-C. I have never ran into any problems with this before, but somehow when I hit Ctrl-C when this particular script is running the script just exits without catching the Ctrl-C.
The initial version was implemented using Process from multiprocessing. I rewrote the script using Thread from threading, but same issue there. I have used threading many times before, but I am new to the multiprocessing library. Either way, I have never experienced this Ctrl-C behavior before.
Normally I have always implemented sentinels etc to close down Queues and Thread instances in an orderly fashion, but this script just exits without any response.
Last, I tried overriding signal.SIGINT as well like this
def handler(signal, frame):
print 'Ctrl+C'
signal.signal(signal.SIGINT, handler)
...
Here Ctrl+C was actually caught, but the handler doesn't execute, it never prints anything.
Besides the threading / multiprocessing aspect, parts of the script contains C++ SWIG objects. I don't know if that has anything to do with it. I am running Python 2.7.2 on OS X Lion.
So, a few questions:
What's going on here?
How can I debug this?
What do I need to learn in order to understand the root cause?
PLEASE NOTE: The internals of the script is proprietary so I can't give code examples. I am however very willing to receive pointers so I could debug this myself. I am experienced enough to be able to figure it out if someone could point me in the right direction.
EDIT: I started commenting out imports etc to see what caused the weird behavior, and I narrowed it down to an import of a C++ SWIG library. Any ideas why importing a C++ SWIG library 'steals' Ctrl-C? I am not the author of the guilty library however and my SWIG experience is limited so don't really know where to start...
EDIT 2: I just tried the same script on a windows machine, and in Windows 7 the Ctrl-C is caught as expected. I'm not really going to bother with the OS X part, the script will be run in an Windows environment anyway.

This might have to do with the way Python manages threads, signals and C calls.
In short - Ctrl-C cannot interrupt C calls, since the implementation requires that a python thread will handle the signal, and not just any thread, but the main thread (often blocked, waiting for other threads).
In fact, long operations can block everything.
Consider this:
>>> nums = xrange(100000000)
>>> -1 in nums
False (after ~ 6.6 seconds)
>>>
Now, Try hitting Ctrl-C (uninterruptible!)
>>> nums = xrange(100000000)
>>> -1 in nums
^C^C^C (nothing happens, long pause)
...
KeyboardInterrupt
>>>
The reason Ctrl-C doesn't work with threaded programs is that the main thread is often blocked on an uninterruptible thread-join or lock (e.g, any 'wait', 'join' or just a plain empty 'main' thread, which in the background causes python to 'join' on any spawned threads).
Try to insert a simple
while True:
time.sleep(1)
in your main thread.
If you have a long running C function, do signal handling in C-level (May the Force be with you!).
This is largely based on David Beazley's video on the subject.

It exits because something else is likely catching the KeyboardInterupt and then raising some other exception, or simply returning None. You should still get a traceback to help debug. You need to capture the stderr output or run your script with the -i commandline option so you can see traceback. Also, add another except block to catch all other exceptions.
If you suspect the C++ function call to be catching the CTRL+C try catching it's output. If the C function is not returning anything then there isn't much you can do except ask the author to add some exception handling, return codes, etc.
try:
#Doing something proprietary ...
#catch the function call output
result = yourCFuncCall()
#raise an exception if it's not what you expected
if result is None:
raise ValueError('Unexpected Result')
except KeyboardInterupt:
print('Must be a CTRL+C')
return
except:
print('Unhandled Exception')
raise

how about atexit?
http://docs.python.org/library/atexit.html#module-atexit

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.