Auto-generate full stack trace to STDERR on thread crash - python

Problem
I have to use a library written in Python 2.7. It has several bugs, and one of them occasionally (though rarely) causes the calling thread to crash. I would like to generate a stack trace so I can determine which thread is dying when the library crashes. I get a trace dumped to STDERR describing what went wrong in the library, e.g.:
A problem occurred in a Python script. Here is the sequence of
function calls leading up to the error, in the order they occurred.
I've also tried GDB (which works wonders for my C/C++ projects), using a guide I found on Stack Overflow to get "Python plus GDB" working (so I can attach to a running Python application). However, I don't see anything helpful that relates to the (now dead) thread.
Question
Is it possible, in Python 2.7, to force a thread (when it crashes) to report a full stack trace to STDOUT, STDERR, or a log file, when this sort of issue (i.e. a library call crashing the calling thread) occurs?
Thank you.

If you have access to the thread definition, you can write a wrapper thread:

import threading
import logging

log = logging.getLogger(__name__)

class WrapperThread(threading.Thread):
    def __init__(self, innerThread):
        threading.Thread.__init__(self)
        self.innerThread = innerThread

    def start(self):
        try:
            # Run the inner thread's logic in the current context with run().
            self.innerThread.run()
        except Exception:
            # exc_info=True makes logging print the full stack trace.
            log.error("%s has crashed.", self, exc_info=True)
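A quick usage sketch (buggy_library_call is a hypothetical stand-in for the misbehaving library function):

logging.basicConfig()  # make sure log output actually appears

def buggy_library_call():
    # hypothetical stand-in for the library function that crashes
    raise RuntimeError("crash inside the library")

inner = threading.Thread(target=buggy_library_call)
WrapperThread(inner).start()  # the crash is logged with a full stack trace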
Depending on the library you are using, you may be able to apply a decorator to the thread definition, though I don't recommend code like this ever being included in released code:
import threading
import logging
logging.basicConfig()

def loggingThread(clazz, logger):
    class _Thread(clazz):
        def __init__(self, *args, **kwargs):
            clazz.__init__(self, *args, **kwargs)
        def run(self):
            try:
                clazz.run(self)
            except Exception:
                # exc_info=True makes logging print the full stack trace.
                logger.error("%s has crashed!", self, exc_info=True)
    return _Thread

# Monkey-patch threading.Thread so every new thread logs its own crash.
threading.Thread = loggingThread(threading.Thread, logging)

import random

def ohNo(range1, range2):
    for x in xrange(1, range1):
        if x % random.randint(1, range2) == 0:
            raise ValueError("Oh no. %d is an illegal value!" % (x,))

test = threading.Thread(target=ohNo, args=(500, 100))
test.start()

Related

'sys.excepthook' and multiprocessing

I'm trying to use a custom sys.excepthook with the multiprocessing library to handle exceptions on all threads. I know there's an outstanding bug in Python that prevents this from working correctly with the Threading library, and testing shows that it also affects multiprocessing.
The Python bug and the Stack Overflow post that led me to it both have workarounds for the Threading library, but nothing for multiprocessing. I have tried to adapt the workaround for use with multiprocessing, but the exception is still thrown as usual.
import multiprocessing
import sys

def install_thread_excepthook():
    start_old = multiprocessing.Process.start
    def start(*args, **kwargs):
        try:
            start_old(*args, **kwargs)
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            sys.excepthook(*sys.exc_info())
    multiprocessing.Process.start = start
How do I make sys.excepthook work properly with multiprocessing?
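One note on why a direct adaptation of the Threading workaround cannot work: Process.start() runs in the parent and returns as soon as the child is spawned, so an exception raised in the child never propagates through start(). A sketch of the same trick applied where the child actually executes its work, the run() method (untested; on spawn-based platforms the custom excepthook must also be installed in the child, since the parent's hook is not inherited there):

import multiprocessing
import sys

def install_process_excepthook():
    run_old = multiprocessing.Process.run
    def run(self, *args, **kwargs):
        try:
            run_old(self, *args, **kwargs)
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            # This executes in the child process, so the hook fires there.
            sys.excepthook(*sys.exc_info())
    multiprocessing.Process.run = run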

External programs are running when multiprocessing Python is closed

In Python (3.5), I started running external executables (written in C++) via multiprocessing.Pool.map + subprocess from an Xshell connection. However, the Xshell connection was interrupted due to a bad internet connection.
After connecting again, I see that the managing Python process is gone but the C++ executables are still running (and seemingly correctly; the Pool still seems to control them).
The question is whether this is a bug, and what I should do in this case. I cannot kill or kill -9 them.
Added later: after removing all the sublst_file files by hand, all running executables (cmd) are gone. It seems the except sub.SubprocessError as e: part is still working.
The basic frame of my program is outlined in the following.
import subprocess as sub
import multiprocessing as mp
import itertools as it
import os
import time

def chunks(lst, chunksize=5):
    return it.zip_longest(*[iter(lst)] * chunksize)

class Work():
    def __init__(self, lst):
        self.lst = lst

    def _work(self, sublst):
        retry_times = 6
        for i in range(retry_times):
            try:
                cmd = 'my external c++ cmd'
                sublst_file = 'a config file generated from sublst'
                sub.check_call([cmd, sublst_file])
                os.remove(sublst_file)
                return sublst  # return successful sublst
            except sub.SubprocessError as e:
                if i == (retry_times - 1):
                    print('\n[ERROR] %s %s failed after %d tries\n' % (cmd, sublst_file, retry_times))
                    return []
                else:
                    print('\n[WARNING] %dth sleep, please wait for restart\n' % (i + 1))
                    time.sleep(1 + i)

    def work(self):
        with mp.Pool(4) as pool:
            results = pool.map(self._work, chunks(self.lst, 5))
        for r in it.chain(results):
            # other work on success items
            print(r)
The multiprocessing.Pool does terminate its workers upon terminate(), which is also called by __del__, which in turn is called upon module unload (at exit).
The reason these processes are orphaned is that the children spawned by subprocess.check_call are not terminated upon exit.
This fact is not mentioned explicitly in the reference documentation, but there is no indication anywhere that the spawned processes are terminated. A brief review of the source code also left me with no findings, and the behavior is easily testable.
To clean up upon parent termination, use the Popen interface together with this answer: Killing child process when parent crashes in python. A sketch of that technique follows.
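The technique in that answer boils down to asking the Linux kernel to signal the child when its parent dies. A minimal sketch (Linux-only; PR_SET_PDEATHSIG is the constant 1 from linux/prctl.h, preexec_fn runs in the child between fork and exec, and the command line is a placeholder):

import ctypes
import signal
import subprocess

PR_SET_PDEATHSIG = 1  # from linux/prctl.h

def _die_with_parent():
    # Ask the kernel to send SIGTERM to this process when its parent exits.
    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    libc.prctl(PR_SET_PDEATHSIG, signal.SIGTERM)

proc = subprocess.Popen(['my_cmd', 'my_config_file'],  # placeholder command
                        preexec_fn=_die_with_parent)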

Python timeout decorator

I'm using the code solution mentioned here.
I'm new to decorators, and don't understand why this solution doesn't work if I want to write something like the following:
@timeout(10)
def main_func():
    nested_func()
    while True:
        continue

@timeout(5)
def nested_func():
    print "finished doing nothing"
The result of this is no timeout at all: we get stuck in the endless loop. However, if I remove the @timeout decorator from nested_func, I get a timeout error as expected.
For some reason we can't use the decorator on a function and on a nested function at the same time. Any idea why, and how can I fix it? Assume that the containing function's timeout must always be bigger than the nested timeout.
This is a limitation of the signal module's timing functions, which the decorator you linked uses. Here's the relevant piece of the documentation (with emphasis added by me):
signal.alarm(time)
If time is non-zero, this function requests that a SIGALRM signal be sent to the process in time seconds. Any previously scheduled alarm is canceled (only one alarm can be scheduled at any time). The returned value is then the number of seconds before any previously set alarm was to have been delivered. If time is zero, no alarm is scheduled, and any scheduled alarm is canceled. If the return value is zero, no alarm is currently scheduled. (See the Unix man page alarm(2).) Availability: Unix.
So, what you're seeing is that when your nested_func is called, its timer cancels the outer function's timer.
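A minimal illustration of that cancellation behavior (Unix only):

import signal
import time

signal.alarm(10)             # schedule an alarm 10 seconds from now
time.sleep(1)
remaining = signal.alarm(5)  # schedule a new alarm; the old one is cancelled
print(remaining)             # ~9: the seconds the cancelled alarm had left
signal.alarm(0)              # clean up: cancel the pending alarm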
You can update the decorator to pay attention to the return value of the alarm call (which will be the time before the previous alarm (if any) was due). It's a bit complicated to get the details right, since the inner timer needs to track how long its function ran for, so it can modify the time remaining on the previous timer. Here's an untested version of the decorator that I think gets it mostly right (but I'm not entirely sure it works correctly for all exception cases):
import time
import signal

class TimeoutError(Exception):
    def __init__(self, value="Timed Out"):
        self.value = value
    def __str__(self):
        return repr(self.value)

def timeout(seconds_before_timeout):
    def decorate(f):
        def handler(signum, frame):
            raise TimeoutError()
        def new_f(*args, **kwargs):
            old = signal.signal(signal.SIGALRM, handler)
            old_time_left = signal.alarm(seconds_before_timeout)
            if 0 < old_time_left < seconds_before_timeout:  # never lengthen existing timer
                signal.alarm(old_time_left)
            start_time = time.time()
            try:
                result = f(*args, **kwargs)
            finally:
                if old_time_left > 0:  # deduct f's run time from the saved timer
                    old_time_left -= time.time() - start_time
                signal.signal(signal.SIGALRM, old)
                # alarm() needs a non-negative whole number of seconds
                signal.alarm(max(0, int(old_time_left)))
            return result
        new_f.func_name = f.func_name
        return new_f
    return decorate
As Blckknght pointed out, you can't use signals for nested decorators, but you can use multiprocessing to achieve the same effect.
You might use this decorator, which supports nested decorators: https://github.com/bitranox/wrapt_timeout_decorator
And as ABADGER1999 points out in his blog https://anonbadger.wordpress.com/2018/12/15/python-signal-handlers-and-exceptions/
using signals and the TimeoutException is probably not the best idea, because the exception can be caught in the decorated function.
Of course you can use your own exception, derived from the base Exception class, but the code might still not work as expected;
see the next example, which you may try out in Jupyter: https://mybinder.org/v2/gh/bitranox/wrapt_timeout_decorator/master?filepath=jupyter_test_wrapt_timeout_decorator.ipynb
import time
from wrapt_timeout_decorator import *

# Caveats when using signals: the TimeoutError raised by the signal may be caught
# inside the decorated function.
# So you might use your own exception, derived from the base Exception class.
# In the Python 3.7.1 stdlib there are over 300 pieces of code that will catch your timeout
# if you base your exception on Exception. If you base your exception on BaseException,
# there are still 231 places that can potentially catch it.
# You should use use_signals=False if you want to make sure the timeout is handled correctly!
# (That is why the default value for use_signals is False on this decorator.)

@timeout(5, use_signals=True)
def mytest(message):
    try:
        print(message)
        for i in range(1, 10):
            time.sleep(1)
            print('{} seconds have passed - lets assume we read a big file here'.format(i))
    # TimeoutError is a subclass of OSError - therefore it is caught here!
    except OSError:
        for i in range(1, 10):
            time.sleep(1)
            print('Whats going on here? - Oops, the timeout exception is caught by the OSError! {}'.format(i))
    except Exception:
        # even worse!
        pass
    except:
        # the worst - this pattern exists more than 300x in the actual Python 3.7 stdlib code!
        # So you can never really rely on catching the TimeoutError when using signals!
        pass

if __name__ == '__main__':
    try:
        mytest('starting')
        print('no timeout occurred')
    except TimeoutError:
        # this will never be printed because the decorated function implicitly catches the TimeoutError!
        print('timeout occurred')
There's a better version of the timeout decorator currently on PyPI. It supports both UNIX and non-UNIX based operating systems; the parts that mention signals apply specifically to UNIX.
Assuming you aren't using UNIX, below is the decorator's signature, showing the parameters you can use as required.

def timeout(seconds=None, use_signals=True, timeout_exception=TimeoutError, exception_message=None)

For an implementation on a non-UNIX based operating system, this is what I would do:

import time
import timeout_decorator

@timeout_decorator.timeout(10, use_signals=False)
def main_func():
    nested_func()
    while True:
        continue

@timeout_decorator.timeout(5, use_signals=False)
def nested_func():
    print "finished doing nothing"

As you can see, I'm passing use_signals=False. That's all; you should be good to go.

Catching exception from timer

I am trying to create a watchdog class, that will throw an exception after specified time:
from threading import Timer
from time import sleep

class watchdog():
    def _timeout(self):
        #raise self
        raise TypeError

    def __init__(self):
        self.t = Timer(1, self._timeout)

    def start(self):
        self.t.start()

try:
    w = watchdog()
    w.start()
    sleep(2)
except TypeError, e:
    print "Exception caught"
else:
    print "Of course I didn't catch the exception"
This exception is not caught, because it is thrown from a completely different context; hence we see the last message.
My question is: how can I modify the code so the exception will be caught?
This is not possible, as you suggested, and there is no API for abruptly stopping a thread either, which rules out other potential solutions.
I believe your best solution is to let the watchdog set a flag and let the test read it at certain points; similarly, your test can simply check the elapsed duration from time to time. A sketch of the flag approach follows below.
Note that if the "flag" were set in a way that causes the main thread to raise an exception (for example, deleting attributes from objects), it would be just as effective.
The other possibility is to use multiprocessing instead of multithreading, if that is possible for your application.
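A minimal sketch of the flag approach (class and variable names here are illustrative): the timer only sets a flag, and the main thread raises its own exception when it notices the flag.

from threading import Timer
from time import sleep

class FlagWatchdog(object):
    def __init__(self, timeout):
        self.expired = False
        self.t = Timer(timeout, self._timeout)

    def _timeout(self):
        self.expired = True  # only set a flag; never raise in the timer thread

    def start(self):
        self.t.start()

w = FlagWatchdog(1)
w.start()
for step in range(4):
    sleep(0.5)
    if w.expired:  # check the flag at convenient points
        raise RuntimeError("watchdog expired")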

QObject::connect: Cannot queue arguments of type 'QTextCursor'

I'm trying to send a signal from a non-main thread in PyQt, but I don't know what I am doing wrong. When I execute the program it fails with this error:
QObject::connect: Cannot queue arguments of type 'QTextCursor'
(Make sure 'QTextCursor' is registered using qRegisterMetaType().)
Here is my code:

class Sender(QtCore.QThread):
    def __init__(self, q):
        super(Sender, self).__init__()
        self.q = q

    def run(self):
        while True:
            try:
                line = self.q.get_nowait()  # or q.get(timeout=.1)
            except Empty:
                pass
            else:
                self.emit(QtCore.SIGNAL('tri()'))
class Workspace(QMainWindow, Ui_MainWindow):
    """ This class is for managing the whole GUI `Workspace'.
        Currently a Workspace is similar to a MainWindow
    """
    def __init__(self):
        try:
            from Queue import Queue, Empty
        except ImportError:
            #from queue import Queue, Empty # python 3.x
            while True:
                print "error"
        ON_POSIX = 'posix' in sys.builtin_module_names

        def enqueue_output(out, queue):
            for line in iter(out.readline, b''):
                queue.put(line)
            out.close()

        p = Popen(["java -Xmx256m -jar bin/HelloWorld.jar"],
                  cwd=r'/home/karen/sphinx4-1.0beta5-src/sphinx4-1.0beta5/',
                  stdout=PIPE, shell=True, bufsize=4024)
        q = Queue()
        t = threading.Thread(target=enqueue_output, args=(p.stdout, q))
        t.daemon = True  # thread dies with the program
        t.start()

        self.sender = Sender(q)
        self.connect(self.sender, QtCore.SIGNAL('tri()'), self.__action_About)
        self.sender.start()
I think my way of sending a parameter to the thread is wrong...
I need to know how to send parameters to a thread; in my case I need to send q to the worker thread.
Quite new to PyQt5, but this appears to happen when you try to do a GUI operation from a thread which is not the "application thread". I put this in quotes because it appears to be a mistake to think that, even in a fairly simple PyQt5 app, QApplication.instance().thread() will always return the same object.
The thing to do is to use the signal/slot mechanism to send any kind of data from a worker thread (a thread created in my case by extending QtCore.QRunnable, one other pattern apparently being QtCore.QThread and QtCore.QObject.moveToThread, see here).
Then also include a check in all your slot methods which are likely to receive data from a non-"application thread". Example which logs messages visually during execution:
def append_message(self, message):
    # this "instance" method is very useful!
    app_thread = QtWidgets.QApplication.instance().thread()
    curr_thread = QtCore.QThread.currentThread()
    if app_thread != curr_thread:
        raise Exception('attempt to call MainWindow.append_message from non-app thread')
    ms_now = datetime.datetime.now().isoformat(sep=' ', timespec='milliseconds')
    self.messages_text_box.insertPlainText(f'{ms_now}: {message}\n')
    # scroll to bottom
    self.messages_text_box.moveCursor(QtGui.QTextCursor.End)
It's all too easy to just call this inadvertently and directly from a non-"application thread".
Having such a mistake raise an exception is good, because it gives you a stack trace showing the culprit call. Then change the call so that it instead sends a signal to the GUI class, the slot for which could be the method in the GUI class (here append_message), or alternatively one which in turn calls append_message.
In my example I've included the "scroll to bottom" line above because it was only when I added that line that these "cannot queue" errors started happening. In other words, it is perfectly possible to get away with a certain amount of non-compliant handling (in this case adding some more text with each call) without any error being raised... and only later do you then run into difficulties. To prevent this, I suggest that EVERY method in a GUI class with GUI functionality should include such a check!
Make sure 'QTextCursor' is registered using qRegisterMetaType().
Did you try to use the qRegisterMetaType function?
The official manual says:
The class is used as a helper to marshall types in QVariant and in queued signals and slots connections. It associates a type name to a type so that it can be created and destructed dynamically at run-time. Declare new types with Q_DECLARE_METATYPE() to make them available to QVariant and other template-based functions. Call qRegisterMetaType() to make the type available to non-template based functions, such as the queued signal and slot connections.
I would like to add the following notes to @mike rodent's post, which solved my problem (I'm using PyQt5):
Custom signals and slots can be used to avoid directly modifying the GUI from a thread other than the "application thread" (I'm using the Python threading module, and the equivalent there is probably the "main thread"). I find this website very useful for a basic custom signal and slot setup. Pay attention to using a class (and not an instance) attribute.
To avoid the QObject::connect: Cannot queue arguments of type 'QTextCursor' message I needed to find the following locations and add some code (a self-contained sketch follows below):
Before the __init__ function of the MainWindow class: definition of a class attribute; I needed to use something like class_attribute = pyqtSignal(str).
In the __init__ function: self.class_attribute.connect(self.slot_name)
Inside the thread (I mean the thread which is not the main thread): self.class_attribute.emit(str)
In the slot inside the main thread: the "safety mechanism" proposed by @mike rodent.
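A minimal self-contained sketch of that setup (PyQt5; append_message is carried over from above, all other names are illustrative):

import sys
import threading
import time
from PyQt5 import QtWidgets
from PyQt5.QtCore import pyqtSignal

class MainWindow(QtWidgets.QMainWindow):
    # class attribute (not an instance attribute!)
    message_signal = pyqtSignal(str)

    def __init__(self):
        super().__init__()
        self.messages_text_box = QtWidgets.QPlainTextEdit(self)
        self.setCentralWidget(self.messages_text_box)
        # connect the signal to the slot while in the application thread
        self.message_signal.connect(self.append_message)
        threading.Thread(target=self.worker, daemon=True).start()

    def worker(self):
        # non-application thread: emit the signal, never touch widgets directly
        for i in range(3):
            time.sleep(1)
            self.message_signal.emit('tick %d' % i)

    def append_message(self, message):
        # runs in the application thread via the queued connection
        self.messages_text_box.insertPlainText(message + '\n')

if __name__ == '__main__':
    app = QtWidgets.QApplication(sys.argv)
    w = MainWindow()
    w.show()
    sys.exit(app.exec_())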
