Consistent assert for parallel runs in Python?

It seems to me that in Python code that runs in parallel, an assert that fails on at least one processor should abort all the processors, so that:
1) the error message is clearly visible (with the stack trace), and
2) the remaining processors do not keep waiting forever.
However, this is not what the standard assert does.
This question has already been asked in
python script running with mpirun not stopping if assert on processor 0 fails
but I am not satisfied by the answer. There it is suggested to use comm.Abort(), but that only addresses point 2) above.
So I was wondering: is there a standard "assert" function for parallel codes (e.g. with mpi4py), or should I write my own assert for that purpose?
Thanks!
Edit -- here is my attempt (inside a class, but it could live outside), which can surely be improved:
import mpi4py.MPI as mpi
import traceback

class My_code():
    def __init__(self, some_parameter=None):
        self.current_com = mpi.COMM_WORLD
        self.rank = self.current_com.rank
        self.nb_procs = self.current_com.size
        self.my_assert(some_parameter is not None)
        self.parameter = some_parameter
        print "Ok, parameter set to " + repr(self.parameter)

    # some class functions here...

    def my_assert(self, assertion):
        """
        this is a try for an assert function that kills
        every process in a parallel run
        """
        if not assertion:
            print 'Traceback (most recent call last):'
            for line in traceback.format_stack()[:-1]:
                print(line.strip())
            print 'AssertionError'
            if self.nb_procs == 1:
                exit()
            else:
                self.current_com.Abort()

I think the following piece of code answers the question. It is derived from the discussion pointed to by Dan D.
import mpi4py.MPI as mpi
import sys

# put this somewhere but before calling the asserts
sys_excepthook = sys.excepthook
def mpi_excepthook(type, value, traceback):
    sys_excepthook(type, value, traceback)
    if mpi.COMM_WORLD.size > 1:
        mpi.COMM_WORLD.Abort(1)
sys.excepthook = mpi_excepthook

# example:
if mpi.COMM_WORLD.rank == 0:
    # with sys.excepthook redefined as above this will kill every processor
    # otherwise this would only kill processor 0
    assert 1 == 0

# assume here we have a lot of print messages
for i in range(50):
    print "rank = ", mpi.COMM_WORLD.rank

# with std asserts the code would be stuck here
# and the error message from the failed assert above would hardly be visible
mpi.COMM_WORLD.Barrier()
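As far as I know there is no standard collective assert in mpi4py, but one can be sketched on top of the excepthook above. The helper below is my own sketch, not a library API (collective_assert is a made-up name): it gathers the local result from every rank with allgather, so all ranks raise together and none is left waiting at a barrier. Since allgather is a collective operation, every rank must call it.

import mpi4py.MPI as mpi

def collective_assert(comm, condition, message=""):
    # every rank contributes its local result; this is a collective call,
    # so all ranks must reach it
    results = comm.allgather(bool(condition))
    if not all(results):
        # every rank raises, so the stack trace is printed everywhere and,
        # with the excepthook redefined as above, the whole run is aborted
        raise AssertionError(message)

# hypothetical usage: the condition fails on rank 0 only,
# yet all ranks abort together
collective_assert(mpi.COMM_WORLD, mpi.COMM_WORLD.rank != 0, "rank 0 failed")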

Related

Callback from Ctypes sometimes fails

I have registered a Python callback with a DLL using the ctypes library. When the callback is triggered, I try to free up an asyncio future I have set up. Since the callback happens in a separate thread spawned by the DLL, I use the loop.call_soon_threadsafe() function to get back to the event loop that started it all.
Mostly this works fine, but every once in a while the future fails to be unblocked. In the minimal example below this also happens sometimes, but there I see that in those cases the callback doesn't even arrive (or at least the corresponding print doesn't happen).
I have only tried this with Python 3.8.5 so far. Is there some race condition here that I did not notice?
Here's a minimal example:
import asyncio
import ctypes
import os

class testClass:
    loop = None
    future = None
    exampleDll = None

    def finish(self):
        # now in the right c thread and eventloop.
        print("callback in eventloop")
        self.future.set_result(999)

    def trampoline(self):
        # still in the other c thread
        self.loop.call_soon_threadsafe(self.finish)

    def example_callback(self):
        # in another c thread, so we need to do threadsafety stuff
        print("callback has arrived")
        self.trampoline()
        return

    async def register_and_wait(self):
        self.loop = asyncio.get_event_loop()
        self.future = self.loop.create_future()
        callback_type = ctypes.CFUNCTYPE(None)
        callback_as_cfunc = callback_type(self.example_callback)
        # now register the callback and wait
        self.exampleDll.fnminimalExample(callback_as_cfunc, ctypes.c_int(1))
        await self.future
        print("future has finished")

    def main(self):
        path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "minimalExample.dll")
        # print(path)
        ctypes.cdll.LoadLibrary(path)
        # for easy access
        self.exampleDll = ctypes.cdll.minimalExample
        asyncio.run(self.register_and_wait())

if __name__ == "__main__":
    for i in range(0, 100000):
        print(i)
        test = testClass()
        test.main()
You can get the compiled example DLL and its source from the repository here to reproduce.
The issue (at least in this minimal example) no longer shows up if I reuse the same event loop instead of spawning a new one for every iteration with asyncio.run.
The problem is thus fixed, but it doesn't feel right.
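For reference, a minimal sketch of that workaround, with names taken from the example above: create one event loop up front and drive every iteration with run_until_complete, instead of letting asyncio.run create and close a fresh loop each time.

import asyncio
import ctypes
import os

# load the DLL once, as in main() above
path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "minimalExample.dll")
ctypes.cdll.LoadLibrary(path)

# one shared loop for all iterations
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
    for i in range(100000):
        print(i)
        test = testClass()
        test.exampleDll = ctypes.cdll.minimalExample
        # drive the coroutine on the shared loop instead of asyncio.run()
        loop.run_until_complete(test.register_and_wait())
finally:
    loop.close()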

Python PDB won't stop

I have a [large] program which has suddenly started having an issue somewhere in an infinite loop. I cannot find this loop.
I did:
import pdb
pdb.run ( 'main()' )
So when the program enters the infinite loop, I hit Ctrl-C and... it doesn't do anything. In fact, when I don't use pdb, Ctrl-C doesn't work either.
I'm not overriding the signals. Even if I do, Ctrl-C does nothing.
I ran this in lldb to see if the problem was somewhere in C++-land, and it's not - it's definitely frozen executing Python code (on thread #7, if that matters).
How do I get pdb to actually break on Ctrl-C?
Here's a simple 'debugger' that counts the number of times each line is passed over and raises an error when a line is hit too many times. Hopefully it can help find the loop if there really is one.
from bdb import Bdb
from collections import Counter

class LoopDetector(Bdb):
    def __init__(self, maxhits):
        Bdb.__init__(self)
        self.counter = Counter()
        self.maxhits = maxhits

    def do_clear(self, arg):
        pass

    def user_line(self, frame):
        filename = frame.f_code.co_filename
        lineno = frame.f_lineno
        key = (filename, lineno)
        self.counter[key] += 1
        if self.counter[key] >= self.maxhits:
            raise ValueError('Too many hits at %s:%s' % key)

LoopDetector(1000).set_trace()

x = 1
y = x + 2
for i in range(200):
    y += i
while True:  # An exception gets raised here
    y -= 1
print 'Does not get here'
This has to be done once per thread since it only affects the current thread.
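Since the loop might also live in a worker thread, here is a sketch of one way to cover threads started later, assuming they are created through the threading module (sharing one Bdb instance across threads is not strictly thread-safe, so treat this as a diagnostic hack):

import threading

detector = LoopDetector(1000)
# trace the current thread (this also resets the detector's internal state)
detector.set_trace()
# trace threads created via the threading module after this call;
# already-running threads are unaffected
threading.settrace(detector.trace_dispatch)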
Take a look at the PDB docs.
You should add a breakpoint in your function (main in your example) using pdb.set_trace().
Then you can run the program from the command line (e.g. python myprog.py) and it will stop where you set the breakpoint.
import pdb

def main():
    i = 0
    while i < 10:
        print i
        if i == 8:
            pdb.set_trace()
        i += 1
In the example above, the program will stop for debugging when i == 8.

thread.start_new_thread: transfer exception to main-thread [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Catch a thread’s exception in the caller thread in Python
I have a given code and there is a
thread.start_new_thread()
As I just read in the Python docs: "When the function terminates with an unhandled exception, a stack trace is printed and then the thread exits (but other threads continue to run)."
But I want to terminate the main thread as well when the (new) function terminates with an exception - the exception should be transferred to the main thread. How can I do this?
Edit: here is part of my code:
def CaptureRegionAsync(region=SCREEN, name="Region", asyncDelay=None, subDir="de"):
    if asyncDelay is None:
        CaptureRegion(region, name, subDir)
    else:
        thread.start_new_thread(_CaptureRegionAsync, (region, name, asyncDelay, subDir))

def _CaptureRegionAsync(region, name, asyncDelay, subDir):
    time.sleep(max(0, asyncDelay))
    CaptureRegion(region, name, subDir)

def CaptureRegion(region=SCREEN, name="Region", subDir="de"):
    ...
    if found:
        return
    else:
        raise Exception(u"[warn] Screenshot has changed: %s" % filename)

CaptureRegionAsync(myregion, "name", 2)
UPD: This may not be the best solution, as your question is different from what I thought it was: I expected you were trying to deal with an exception in thread code you cannot modify. However, I decided not to delete the answer.
It's hard to catch an exception from another thread. Usually it should be transferred manually.
Can you wrap the thread function? If you wrap the worker function that crashes in try...except, then you can perform the actions needed to exit the main thread in the except block (sys.exit seems to be useless, though that's a surprise to me, but there are thread.interrupt_main() and os.abort(), though you may need something more graceful, like setting a flag that the main thread checks regularly).
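A minimal sketch of that wrapping approach (run_guarded is a made-up helper name):

import thread

def run_guarded(func, args):
    # wrap the worker so that an unhandled exception interrupts the main thread
    def wrapper(*a):
        try:
            func(*a)
        except:
            thread.interrupt_main()  # raises KeyboardInterrupt in the main thread
            raise
    thread.start_new_thread(wrapper, args)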
However, if you cannot wrap the function (cannot modify the third-party code calling start_new_thread), you may try monkey patching the thread module (or the third-party module itself). The patched version of start_new_thread() should wrap the function you worry about:
import thread
import sys
import time

def t():
    print "Thread started"
    raise Exception("t Kaboom")

def t1():
    print "Thread started"
    raise Exception("t1 Kaboom")

def decorate(s_n_t):
    def decorated(f, a, kw={}):
        if f != t:
            return s_n_t(f, a, kw)
        def thunk(*args, **kwargs):
            try:
                f(*args, **kwargs)
            except:
                print "Kaboom"
                thread.interrupt_main()  # or os.abort()
        return s_n_t(thunk, a, kw)
    return decorated

# Now let's do monkey patching:
thread.start_new_thread = decorate(thread.start_new_thread)
thread.start_new_thread(t1, ())
time.sleep(5)
thread.start_new_thread(t, ())
time.sleep(5)
Here an exception in t() causes thread.interrupt_main()/os.abort(). Other thread functions in the application are not affected.

Is there a python library for notification and waiting?

I'm using python-zookeeper for locking, and I'm trying to figure out a way of getting the execution to wait for notification when it's watching a file, because zookeeper.exists() returns immediately, rather than blocking.
Basically, I have the code listed below, but I'm unsure of the best way to implement the notify() and wait_for_notification() functions. It could be done with os.kill() and signal.pause(), but I'm sure that's likely to cause problems if I later have multiple locks in one program - is there a specific Python library that is good for this sort of thing?
def get_lock(zh):
    lockfile = zookeeper.create(zh, lockdir + '/guid-lock-', 'lock',
                                [ZOO_OPEN_ACL_UNSAFE],
                                zookeeper.EPHEMERAL | zookeeper.SEQUENCE)
    while True:
        # this won't work for more than one waiting process, fix later
        children = zookeeper.get_children(zh, lockdir)
        if len(children) == 1 and children[0] == basename(lockfile):
            return lockfile
        # yeah, there's a problem here, I'll fix it later
        for child in children:
            if child < basename(lockfile):
                break
        # exists will call notify when the watched file changes
        if zookeeper.exists(zh, lockdir + '/' + child, notify):
            # Process should wait here until notify() wakes it
            wait_for_notification()

def drop_lock(zh, lockfile):
    zookeeper.delete(zh, lockfile)

def notify(zh, unknown1, unknown2, lockfile):
    pass

def wait_for_notification():
    pass
The Condition variables from Python's threading module are probably a very good fit for what you're trying to do:
http://docs.python.org/library/threading.html#condition-objects
I've extended the example to make it a little more obvious how you would adapt it for your purposes:
#!/usr/bin/env python
from collections import deque
from threading import Thread, Condition

QUEUE = deque()

def an_item_is_available():
    return bool(QUEUE)

def get_an_available_item():
    return QUEUE.popleft()

def make_an_item_available(item):
    QUEUE.append(item)

def consume(cv):
    cv.acquire()
    while not an_item_is_available():
        cv.wait()
    print 'We got an available item', get_an_available_item()
    cv.release()

def produce(cv):
    cv.acquire()
    make_an_item_available('an item to be processed')
    cv.notify()
    cv.release()

def main():
    cv = Condition()
    Thread(target=consume, args=(cv,)).start()
    Thread(target=produce, args=(cv,)).start()

if __name__ == '__main__':
    main()
My answer may not be relevant to your question, but it is relevant to the question title.
from threading import Thread, Event

locker = Event()

def MyJob(locker):
    while True:
        #
        # do some logic here
        #
        locker.clear()  # Set event state to 'False'
        locker.wait()   # suspend the thread until event state is 'True'

worker_thread = Thread(target=MyJob, args=(locker,))
worker_thread.start()

#
# some main thread logic here
#
locker.set()  # This sets the event state to 'True' and thus resumes the worker_thread
More information here: https://docs.python.org/3/library/threading.html#event-objects
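To tie this back to the stubs in the question, here is a minimal sketch of notify() and wait_for_notification() built on such an Event (the callback signature is copied from the question; with several locks in one program you would want one Event per lock attempt rather than a module-level one):

from threading import Event

_watch_event = Event()

def notify(zh, unknown1, unknown2, lockfile):
    # invoked by zookeeper from its watcher thread
    _watch_event.set()

def wait_for_notification():
    _watch_event.wait()   # block until notify() fires
    _watch_event.clear()  # re-arm for the next watch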

How can I register a function to be called only on *successful* exit of my Python program?

I want to run a task when my Python program finishes, but only if it finishes successfully. As far as I know, using the atexit module means that my registered function will always be run at program termination, regardless of success. Is there a similar functionality to register a function so that it runs only on successful exit? Alternatively, is there a way for my exit function to detect whether the exit was normal or exceptional?
Here is some code that demonstrates the problem. It will print that the program succeeded, even when it has failed.
import atexit

def myexitfunc():
    print "Program succeeded!"

atexit.register(myexitfunc)
raise Exception("Program failed!")
Output:
$ python atexittest.py
Traceback (most recent call last):
  File "atexittest.py", line 8, in <module>
    raise Exception("Program failed!")
Exception: Program failed!
Program succeeded!
Out of the box, atexit is not quite suited for what you want to do: it's primarily used for resource cleanup at the very last moment, as things are shutting down and exiting. By analogy, it's the "finally" of a try/except, whereas what you want is the "else" of a try/except.
The simplest way I can think of is to create a global flag which you set only when your script "succeeds"... and then have all the functions you attach to atexit check that flag, and do nothing unless it's been set.
Eg:
import atexit

_success = False

def atsuccess(func, *args, **kwds):
    def wrapper():
        if _success:
            func(*args, **kwds)
    atexit.register(wrapper)

def set_success():
    global _success
    _success = True

# then call atsuccess() to attach your callbacks,
# and call set_success() before your script returns
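A hypothetical usage of those helpers (main_work and report are made-up names):

def report():
    print("All done, cleaning up happily")

atsuccess(report)  # registered, but only fires if set_success() was called
main_work()        # your script's real work; may raise
set_success()      # only reached when main_work() returns normally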
One limitation is if you have any code which calls sys.exit(0) before setting the success flag. Such code should (probably) be refactored to return to the main function first, so that you call set_success and sys.exit in only one place. Failing that, you'll need to add something like the following wrapper around the main entry point of your script:
try:
    main()
except SystemExit, err:
    if err.code == 0:
        set_success()
    raise
Wrap the body of your program in a with statement and define a corresponding context object that only performs your action when no exceptions have been raised. Something like:
class AtExit(object):
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_value is None:
            print "Success!"
        else:
            print "Failure!"

if __name__ == "__main__":
    with AtExit():
        print "Running"
        # raise Exception("Error")
