Event Handling in Python Luigi

I've been trying to integrate Luigi as our workflow handler. Currently we are using Concourse, but many of the things we're trying to do are a hassle to work around in Concourse, so we made the switch to Luigi as our dependency manager. No problems so far; workflows trigger and execute properly.
The issue comes in when a task fails for whatever reason. In this case it's specifically the requires block of a task, but all cases need to be handled. As of right now, Luigi gracefully handles the error and writes it to STDOUT. It still exits with code 0, though, which Concourse interprets as the job passing. A false positive.
I've been trying to use event handling to fix this, but I cannot get it to trigger, even with an extremely simple job:
import sys
import luigi

@luigi.Task.event_handler(luigi.Event.FAILURE)
def mourn_failure(task, exception):
    with open('/root/luigi', 'a') as f:
        f.write("we got the exception!")  # testing in the Concourse image
    sys.exit(luigi.retcodes.retcode().unhandled_exception)

class Test(luigi.Task):
    def requires(self):
        raise Exception()
        return []

    def run(self):
        pass

    def output(self):
        return []
Then I run the following in a Python shell:
luigi.run(main_task_cls=Test, local_scheduler=True)
The exception gets raised, but the event doesn't seem to fire.
The file doesn't get written and the exit code is still 0.
Also, if it makes a difference, I have my Luigi config at /etc/luigi/client.cfg, which contains:
[retcode]
already_running=10
missing_data=20
not_run=25
task_failed=30
scheduling_error=35
unhandled_exception=40
I'm at a loss as to why the event handler won't trigger, but somehow I need the process to fail on an error.

It seems like the problem is where you place the raise Exception call.
If you place it in the requires method, it effectively runs before your Test task's run method, so it's not your Test task that failed, but the task it depends on (which is, right now, empty).
For example, if you move the raise into run, your code will behave as you expect:
def run(self):
    print('start')
    raise Exception()
To handle the case where your dependency fails (here, the exception is raised in the requires method), you can register another type of Luigi event handler, luigi.Event.BROKEN_TASK.
That will make Luigi emit the non-zero return code you expect.
Cheers!

If you'd like to catch exceptions in requires(), use the following:
@luigi.Task.event_handler(luigi.Event.BROKEN_TASK)
def mourn_failure(task, exception):
    ...
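For instance, a handler along these lines (a sketch that simply mirrors the mourn_failure handler from the question; the handler name and log path are illustrative) would record the error and force a non-zero exit when requires() raises:

import sys
import luigi

@luigi.Task.event_handler(luigi.Event.BROKEN_TASK)
def mourn_broken_task(task, exception):
    # Hypothetical handler mirroring mourn_failure above.
    with open('/root/luigi', 'a') as f:
        f.write("requires() raised: %r\n" % (exception,))
    sys.exit(luigi.retcodes.retcode().unhandled_exception)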

If I understand correctly, you just want Luigi to return an error code when a task fails. I had many issues with this one, but it turns out to be quite simple: you just need to run it with the luigi command on the command line, not with python. Like this:
luigi --module my_module MyTask
I don't know if that was your problem too, but I was running it with python, and then Luigi ignored the retcodes in luigi.cfg. Hope it helps.
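If you do need to launch from Python rather than the luigi CLI, one alternative (a sketch, not something from the answers above; it assumes a Luigi version where luigi.build returns a success boolean) is to translate the result into an exit code yourself:

import sys
import luigi

if __name__ == '__main__':
    # luigi.build returns True only when the run succeeded, so the caller
    # can turn a failure into a non-zero exit code that Concourse will see.
    ok = luigi.build([Test()], local_scheduler=True)
    sys.exit(0 if ok else 30)  # 30 mirrors the task_failed retcode above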

Related

Is it possible to test a while True loop with pytest (I'm trying with a timeout)?

I have a Python function foo with a while True loop inside.
For background: it is expected to stream info from the web, do some writing, and run indefinitely. The asserts test whether the writing was done correctly.
Clearly I need it to stop at some point in order to test it.
What I did was run it via multiprocessing and introduce a timeout there; however, when I look at the test coverage, the function that ran through multiprocessing is not marked as covered.
Question 1: Why does pytest not work this way?
Question 2: How can I make this work?
I was thinking it's probably because I technically exit the loop, so maybe pytest does not mark this as tested...
import time
import multiprocessing

def test_a_while_loop():
    # Start through multiprocessing in order to have a timeout.
    p = multiprocessing.Process(
        target=foo,
        name="Foo",
    )
    try:
        p.start()
        # My timeout.
        time.sleep(10)
        p.terminate()
    finally:
        # Cleanup.
        p.join()
    # Asserts below
    ...
More info
I looked into adding a decorator such as @pytest.mark.timeout(5), but that did not work, and it stops the whole function, so I never get to the asserts (as suggested here).
If I don't find a way, I will just test the parts, but ideally I would like to find a way to test by breaking the loop.
I know I can rewrite my code to give it a timeout, but that would mean changing the code just to make it testable, which I don't think is good design.
Mocks I have not tried (as suggested here), because I don't believe I can mock what I do, since it writes info from the web. I need to actually see the "original" working.
Break out the functionality you want to test into a helper method. Test the helper method.
def scrape_web_info(url):
    data = get_it(url)
    return data

# In production:
while True:
    scrape_web_info(...)

# During test:
def test_web_info():
    assert scrape_web_info(...) == ...
Yes, it is possible, and the code above shows one way to do it (run the function through multiprocessing with a timeout).
Since the asserts were running fine, I found out that the issue was not with pytest but with the coverage report not accounting for multiprocessing properly.
I describe how I fixed this (now separate) issue here.
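For reference, coverage.py needs to be told to measure subprocesses; a minimal sketch of the relevant configuration (assuming coverage.py 4.0+, with the per-process data files merged afterwards via coverage combine) would be:

# .coveragerc (standard coverage.py options)
[run]
parallel = True
concurrency = multiprocessing

Note that terminating the child process abruptly can still prevent its coverage data from being flushed, so a clean shutdown path in the child helps.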
Actually, I had the same problem with an endless task to test under coverage. However, in my code there is a .run_forever() method which runs a .run_once() method inside an infinite loop, so I can write a unit test for the .run_once() method to test its functionality. Nevertheless, if you want to test your forever function anyway (despite the halting problem) to get more code coverage, I propose the following approach using a timeout, regardless of the tools you've mentioned (multiprocessing or @pytest.mark.timeout(5)), which didn't work for me either.
First, install the interruptingcow PyPI package to get a nice timeout that raises an optional exception: pip install interruptingcow
Then:
import pytest
import asyncio
from interruptingcow import timeout
from <path-to-loop-the-module> import EventLoop

class TestCase:
    @pytest.mark.parametrize("test_case", ['none'])
    def test_events(self, test_case: list):
        assert EventLoop().run_once()  # the usual unit test

    @pytest.mark.parametrize("test_case", ['none'])
    def test_events2(self, test_case: list):
        try:
            with timeout(10, exception=asyncio.CancelledError):
                EventLoop().run_forever()
            assert False
        except asyncio.CancelledError:
            assert True

Is there a way for workers in multiprocessing.Pool's apply_async to catch errors and continue?

When using multiprocessing.Pool's apply_async(), what happens when code breaks? By that I think I mean just exceptions, but there may be other things that make the worker functions fail.
import multiprocessing as mp

pool = mp.Pool(mp.cpu_count())
for f in files:
    pool.apply_async(workerfunct, args=(f,), callback=callbackfunct)
As I understand it right now, the worker process fails (all other processes continue) and anything past the thrown error is not executed, EVEN if I catch the error with try/except.
As an example, usually I'd catch errors, substitute a default value and/or print an error message, and the code would continue. If my callback function involves writing to a file, that would be done with the default values.
This answerer wrote a little about it:
I suspect the reason you're not seeing anything happen with your example code is because all of your worker function calls are failing. If a worker function fails, callback will never be executed. The failure won't be reported at all unless you try to fetch the result from the AsyncResult object returned by the call to apply_async. However, since you're not saving any of those objects, you'll never know the failures occurred. If I were you, I'd try using pool.apply while you're testing so that you see errors as soon as they occur.
If you're using Python 3.2+, you can use the error_callback keyword argument to handle exceptions raised in workers:
pool.apply_async(workerfunct, args=args, callback=callbackfunct, error_callback=handle_error)
handle_error will be called with the exception object as an argument.
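A self-contained sketch of that pattern (the work, on_result, and on_error names are made up for illustration):

import multiprocessing as mp

def work(x):
    # Hypothetical worker: fails on odd inputs to show the error path.
    if x % 2:
        raise ValueError("bad input: %d" % x)
    return x * 2

def on_result(result):
    print("got result:", result)

def on_error(exc):
    # Called with the exception raised inside the worker (Python 3.2+).
    print("worker failed:", exc)

if __name__ == "__main__":
    pool = mp.Pool(2)
    for i in range(4):
        pool.apply_async(work, args=(i,), callback=on_result, error_callback=on_error)
    pool.close()
    pool.join()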
If you're not, you have to wrap all your worker functions in a try/except to ensure your callback is executed. (I think you got the impression that this wouldn't work from my answer to that other question, but that's not the case. Sorry!):
def workerfunct(*args):
    try:
        ...  # stuff
    except Exception as e:
        return e  # do something here, maybe return the exception?

pool.apply_async(workerfunct, args=args, callback=callbackfunct)
You could also use a wrapper function if you can't/don't want to change the function you actually want to call:
def wrapper(func, *args):
    try:
        return func(*args)
    except Exception as e:
        return e

pool.apply_async(wrapper, args=(workerfunct, *args), callback=callbackfunct)

Block python's atexit during a crash?

I have a Python script which uses atexit.register() to run a function that persists a list of dictionaries when the program exits. However, this code also runs when the script exits due to a crash or runtime error, and that usually results in the data becoming corrupted.
Is there any way to block it from running when the program exits abnormally?
EDIT: To clarify, this is a program using Flask, and I'm trying to prevent the data-persistence code from running on an exit caused by a raised error.
You don't want to use atexit with Flask. You want to use Flask signals. It sounds like you are specifically looking for the request_finished signal.
from flask import request_finished

def request_finished_handler(sender, response, **extra):
    sender.logger.debug('Request context is about to close down. '
                        'Response: %s', response)
    # Do some fancy storage stuff.

request_finished.connect(request_finished_handler, app)
The benefit of request_finished is that it only fires after a successful response. That means that as long as there isn't an error in another signal handler, you should be good.
One way: at global level in the main program:

abnormal_termination = False

def your_cleanup_function():
    # Add the next two lines at the top:
    if abnormal_termination:
        return
    # ...

# At the end of the main program:
try:
    ...  # your original code goes here
except Exception:  # replace according to what *you* consider "abnormal"
    abnormal_termination = True  # stop the atexit handler

Not pretty, but straightforward ;-)
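Another option (a sketch of the same flag idea, assuming the crash surfaces as an uncaught exception in the main thread; persist_data and flag_crash are names made up here) is to set the flag from sys.excepthook so the atexit handler can skip its work:

import atexit
import sys

abnormal_termination = False

def persist_data():
    if abnormal_termination:
        return
    # ... write the list of dictionaries to disk ...

def flag_crash(exc_type, exc_value, tb):
    # Mark the run as abnormal, then let the default hook print the traceback.
    global abnormal_termination
    abnormal_termination = True
    sys.__excepthook__(exc_type, exc_value, tb)

atexit.register(persist_data)
sys.excepthook = flag_crash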

gevent.Timeout not raised

I have a server that does "some stuff" in a section wrapped in with gevent.Timeout(5). I have some checks going on in another greenlet, and through those I noticed that one of the greenlets doing that "some stuff" had been running for 45 minutes. I eventually had to restart the program to kill it (I know of other ways of killing it, but that's not the problem).
I am monkey patching using gevent.monkey.patch_all() as well. The "some stuff" part does involve network connections, and I am guessing something got stuck in one of those places. I don't understand why the timeout exception was not raised. Does anyone have any idea why the gevent.Timeout exception might not have been raised?
Whenever I've used gevent.Timeout, I've also used it as a context manager, but with the second argument set to False. That way the context manager suppresses the exception and simply leaves the block, and you can follow up by checking whether the block managed to set a value:
import gevent

result = None
with gevent.Timeout(5, False):
    ...  # something that stalls
if result is None:
    ...  # take care of business
This has worked for me very reliably. It seems that the default for the second argument (exception) to gevent.Timeout is None; have you tried replacing it with your own exception type, or even Exception?
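A small sketch of that suggestion (StuffTimedOut is a made-up exception name; gevent.sleep stands in for the stalling network call):

import gevent

class StuffTimedOut(Exception):
    pass

try:
    with gevent.Timeout(5, StuffTimedOut):
        gevent.sleep(10)  # stand-in for the stalling network work
except StuffTimedOut:
    print("gave up after 5 seconds")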

Overriding basic signals (SIGINT, SIGQUIT, SIGKILL??) in Python

I'm writing a program that adds normal UNIX accounts (i.e. modifies /etc/passwd, /etc/group, and /etc/shadow) according to our corp's policy. It also does some slightly fancy stuff like sending an email to the user.
I've got all the code working, but there are three pieces of code that are very critical: the ones that update the three files above. The code is already fairly robust because it locks those files (e.g. /etc/passwd.lock), writes to a temporary file (e.g. /etc/passwd.tmp), and then overwrites the original file with the temporary one. I'm fairly pleased that it won't interfere with other running instances of my program or the system useradd, usermod, passwd, etc. programs.
The thing I'm most worried about is a stray Ctrl+C, Ctrl+D, or kill command in the middle of these sections. This has led me to the signal module, which seems to do precisely what I want: ignore certain signals during the "critical" region.
I'm using an older version of Python, which doesn't have signal.SIG_IGN, so I have an awesome "pass" function:
def passer(*a):
    pass
The problem that I'm seeing is that signal handlers don't work the way that I expect.
Given the following test code:
import signal
import sys
import time

def passer(a=None, b=None):
    pass

def signalhander(enable):
    # Note: SIGKILL cannot actually be caught or ignored.
    signallist = (signal.SIGINT, signal.SIGQUIT, signal.SIGABRT, signal.SIGPIPE,
                  signal.SIGALRM, signal.SIGTERM, signal.SIGKILL)
    if enable:
        for i in signallist:
            signal.signal(i, passer)
    else:
        for i in signallist:
            signal.signal(i, abort)
    return

def abort(a=None, b=None):
    sys.exit('\nAccount was not created.\n')
    return

signalhander(True)
print('Enabled')
time.sleep(10)  # ^C during this sleep
The problem with this code is that a ^C (SIGINT) during the time.sleep(10) call causes that function to stop, and then my signal handler takes over as desired. However, that doesn't solve my "critical" region problem above, because I can't tolerate whatever statement encounters the signal failing.
I need some sort of signal handler that will just completely ignore SIGINT and SIGQUIT.
The Fedora/RH command yum is written in Python and does basically exactly what I want. If you do a ^C while it's installing anything, it will print a message like "Press ^C within two seconds to force kill." Otherwise, the ^C is ignored. I don't really care about the two-second warning since my program completes in a fraction of a second.
Could someone help me implement a signal handler for CPython 2.3 that doesn't cause the current statement/function to be cancelled before the signal is ignored?
As always, thanks in advance.
Edit: After S.Lott's answer, I've decided to abandon the signal module.
I'm just going to go back to try/except blocks. Looking at my code, there are two things that happen in each critical region that cannot be aborted: overwriting the file with file.tmp, and removing the lock once finished (otherwise other tools will be unable to modify the file until the lock is manually removed). I've put each of those in its own function inside a try block, and the except simply calls the function again. That way the function will just re-call itself in the event of a KeyboardInterrupt or EOFError until the critical code is completed.
I don't think I can get into too much trouble since I'm only catching user-provided exit commands, and even then only for two to three lines of code. Theoretically, if those exceptions could be raised fast enough, I suppose I could hit the "maximum recursion depth exceeded" error, but that seems far-fetched.
Any other concerns?
Pseudo-code:
import os
import shutil

def criticalRemoveLock(file):
    try:
        if os.path.isfile(file):
            os.remove(file)
        else:
            return True
    except (KeyboardInterrupt, EOFError):
        return criticalRemoveLock(file)

def criticalOverwrite(tmp, file):
    try:
        if os.path.isfile(tmp):
            shutil.copy2(tmp, file)
            os.remove(tmp)
        else:
            return True
    except (KeyboardInterrupt, EOFError):
        return criticalOverwrite(tmp, file)
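For reference, a sketch of the "record the signal, act on it afterwards" pattern that yum-style behaviour amounts to (shown for modern Python, 3.5+, where interrupted system calls are retried automatically; this is not from the question or the answers below):

import signal

pending = []

def defer(signum, frame):
    # Just remember that a signal arrived; don't interrupt the critical work.
    pending.append(signum)

old_int = signal.signal(signal.SIGINT, defer)
old_quit = signal.signal(signal.SIGQUIT, defer)
try:
    ...  # critical region: overwrite the file, remove the lock
finally:
    signal.signal(signal.SIGINT, old_int)
    signal.signal(signal.SIGQUIT, old_quit)
    if pending:
        raise SystemExit('\nAccount was not created.\n')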
There is no real way to make your script truly safe. Of course you can ignore signals and catch a keyboard interrupt using try/except, but it is up to your application to be idempotent against such interrupts, and it must be able to resume operations after dealing with an interrupt at some kind of savepoint.
The only thing that you can really do is to work on temporary files (not the original files) and move them into their final destination once the work is done. I think such file operations are supposed to be "atomic" from the filesystem's perspective. Otherwise, in case of an interrupt, restart your processing from the start with clean data.
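A minimal sketch of that temporary-file-plus-rename idea (atomic_write is a name invented here; on POSIX, os.rename within one filesystem replaces the target atomically):

import os
import tempfile

def atomic_write(path, data):
    # Write to a temp file in the same directory, then rename it over the target.
    dir_name = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise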
