This is a follow-up to Handle an exception thrown in a generator and discusses a more general problem.
I have a function that reads data in different formats. All formats are line- or record-oriented and for each format there's a dedicated parsing function, implemented as a generator. So the main reading function gets an input and a generator, which reads its respective format from the input and delivers records back to the main function:
def read(stream, parsefunc):
    for record in parsefunc(stream):
        do_stuff(record)
where parsefunc is something like:
def parsefunc(stream):
    while not eof(stream):
        rec = read_record(stream)
        # do some stuff
        yield rec
The problem I'm facing is that while parsefunc can throw an exception (e.g. when reading from a stream), it has no idea how to handle it. The function responsible for handling exceptions is the main read function. Note that exceptions occur on a per-record basis, so even if one record fails, the generator should continue its work and yield records back until the whole stream is exhausted.
In the previous question I tried to put next(parsefunc) in a try block, but as it turned out, this is not going to work. So I have to add try-except to the parsefunc itself and then somehow deliver exceptions to the consumer:
def parsefunc(stream):
    while not eof(stream):
        try:
            rec = read_record(stream)
            yield rec
        except Exception as e:
            ?????
I'm rather reluctant to do this because
it makes no sense to use try in a function that isn't intended to handle any exceptions
it's unclear to me how to pass exceptions to the consuming function
there are going to be many formats and many parsefuncs, and I don't want to clutter them with too much helper code.
Has anyone suggestions for a better architecture?
A note for googlers: in addition to the top answer, pay attention to senderle's and Jon's posts - very smart and insightful stuff.
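To make the problem concrete, here is a minimal Python 3 sketch of why wrapping next() in a try block does not help. The parsefunc below is a hypothetical stand-in that parses floats, not the real parser: once an exception escapes a generator, the generator is finished and the remaining records are never produced.

```python
def parsefunc(lines):
    # Hypothetical parser: fails on any line that is not a number.
    for line in lines:
        yield float(line)


gen = parsefunc(["1", "oops", "3"])
results = []
while True:
    try:
        results.append(next(gen))
    except StopIteration:
        break
    except ValueError as exc:
        # The generator is already finished at this point; the next call
        # to next() raises StopIteration, so "3" is never parsed.
        results.append("error: %s" % exc)

print(results)
```

The list ends after the error entry, which is the behaviour the question is trying to work around.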
You can return a tuple of record and exception in the parsefunc and let the consumer function decide what to do with the exception:
import random

def get_record(line):
    num = random.randint(0, 3)
    if num == 3:
        raise Exception("3 means danger")
    return line

def parsefunc(stream):
    for line in stream:
        try:
            rec = get_record(line)
        except Exception as e:
            yield (None, e)
        else:
            yield (rec, None)

if __name__ == '__main__':
    with open('temp.txt') as f:
        for rec, e in parsefunc(f):
            if e:
                print "Got an exception %s" % e
            else:
                print "Got a record %s" % rec
Thinking deeper about what would happen in a more complex case kind of vindicates the Python choice of avoiding bubbling exceptions out of a generator.
If I got an I/O error from a stream object the odds of simply being able to recover and continue reading, without the structures local to the generator being reset in some way, would be low. I would somehow have to reconcile myself with the reading process in order to continue: skip garbage, push back partial data, reset some incomplete internal tracking structure, etc.
Only the generator has enough context to do that properly. Even if you could keep the generator context, having the outer block handle the exceptions would totally flout the Law of Demeter. All the important information that the surrounding block needs to reset and move on is in local variables of the generator function! And getting or passing that information, though possible, is disgusting.
The resulting exception would almost always be thrown after cleaning up, in which case the reader-generator will already have an internal exception block. Trying very hard to maintain this cleanliness in the brain-dead-simple case only to have it break down in almost every realistic context would be silly. So just have the try in the generator, you are going to need the body of the except block anyway, in any complex case.
It would be nice if exceptional conditions could look like exceptions, though, and not like return values. So I would add an intermediate adapter to allow for this: the generator would yield either data or exceptions, and the adapter would re-raise the exception if applicable. The adapter should be called first thing inside the for loop, so that we have the option of catching the exception within the loop and cleaning up to continue, or breaking out of the loop to catch it and abandon the process. And we should put some kind of lame wrapper around the setup to indicate that tricks are afoot, and to force the adapter to get called if the function is adapting.
That way each layer is presented errors that it has the context to handle, at the expense of the adapter being a tiny bit intrusive (and perhaps also easy to forget).
So we would have:
def read(stream, parsefunc):
    try:
        for source in frozen(parsefunc(stream)):
            try:
                record = source.thaw()
                do_stuff(record)
            except Exception, e:
                log_error(e)
                if not is_recoverable(e):
                    raise
                recover()
    except Exception, e:
        properly_give_up()
    wrap_up()
(Where the two try blocks are optional.)
The adapter looks like:
class Frozen(object):
    def __init__(self, item):
        self.value = item
    def thaw(self):
        # re-raise stored exceptions, return ordinary values
        if isinstance(self.value, Exception):
            raise self.value
        return self.value

def frozen(generator):
    for item in generator:
        yield Frozen(item)
And parsefunc looks like:
def parsefunc(stream):
    while not eof(stream):
        try:
            rec = read_record(stream)
            do_some_stuff()
            yield rec
        except Exception, e:
            properly_skip_record_or_prepare_retry()
            yield e
To make it harder to forget the adapter, we could also change frozen from a function to a decorator on parsefunc.
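def frozen_results(func):
    # func is captured by the closure, so the caller's first positional
    # argument is passed through to func rather than replacing it
    def freezer(*args, **kw):
        for item in func(*args, **kw):
            yield Frozen(item)
    return freezer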
In which case we would declare:
@frozen_results
def parsefunc(stream):
    ...
And we would obviously not bother to declare frozen, or wrap it around the call to parsefunc.
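Putting the pieces together, here is a self-contained Python 3 sketch of the adapter idea. The parse_floats parser and the shape of read are hypothetical stand-ins chosen so the example runs on its own; they are not the original functions.

```python
import functools


class Frozen:
    """Holds either a value or an exception produced by a generator."""

    def __init__(self, item):
        self.value = item

    def thaw(self):
        # Re-raise stored exceptions; hand back ordinary values.
        if isinstance(self.value, Exception):
            raise self.value
        return self.value


def frozen_results(func):
    """Decorator: wrap every item a generator function yields in Frozen."""
    @functools.wraps(func)
    def freezer(*args, **kwargs):
        for item in func(*args, **kwargs):
            yield Frozen(item)
    return freezer


@frozen_results
def parse_floats(lines):
    # Hypothetical parser: yields the exception instead of raising it.
    for line in lines:
        try:
            rec = float(line)
        except ValueError as exc:
            yield exc
        else:
            yield rec


def read(lines):
    records, errors = [], []
    for source in parse_floats(lines):
        try:
            records.append(source.thaw())
        except ValueError as exc:
            errors.append(exc)  # the consumer decides what to do
    return records, errors


records, errors = read(["4", "5h", "7"])
print(records, len(errors))
```

The generator survives the bad record because it never raises; the error only turns back into an exception inside the consumer's loop.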
Without knowing more about the system, I think it's difficult to tell what approach will work best. However, one option that no one has suggested yet would be to use a callback. Given that only read knows how to deal with exceptions, might something like this work?
def read(stream, parsefunc):
    some_closure_data = {}

    def error_callback_1(e):
        manipulate(some_closure_data, e)
    def error_callback_2(e):
        transform(some_closure_data, e)

    for record in parsefunc(stream, error_callback_1):
        do_stuff(record)
Then, in parsefunc:
def parsefunc(stream, error_callback):
    while not eof(stream):
        try:
            rec = read_record(stream)
            yield rec
        except Exception as e:
            error_callback(e)
I used a closure over a mutable local here; you could also define a class. Note also that you can access the traceback info via sys.exc_info() inside the callback.
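A runnable Python 3 sketch of the callback idea follows. The float-parsing parsefunc, the line numbers, and the remember_error name are assumptions made for illustration; the point is only that the error-handling policy lives in read while the generator keeps going.

```python
def parsefunc(lines, error_callback):
    # Hypothetical parser: reports bad lines to the callback and moves on.
    for lineno, line in enumerate(lines, start=1):
        try:
            rec = float(line)
        except ValueError as exc:
            error_callback(lineno, exc)
        else:
            yield rec


def read(lines):
    errors = {}  # closure data shared with the callback

    def remember_error(lineno, exc):
        errors[lineno] = exc

    records = list(parsefunc(lines, remember_error))
    return records, errors


records, errors = read(["4", "5h", "7"])
print(records, sorted(errors))
```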
Another interesting approach might be to use send. This would work a little differently; basically, instead of defining a callback, read could check the result of yield, do a lot of complex logic, and send a substitute value, which the generator would then re-yield (or do something else with). This is a bit more exotic, but I thought I'd mention it in case it's useful:
>>> def parsefunc(it):
...     default = None
...     for x in it:
...         try:
...             rec = float(x)
...         except ValueError as e:
...             default = yield e
...             yield default
...         else:
...             yield rec
...
>>> parsed_values = parsefunc(['4', '6', '5', '5h', '22', '7'])
>>> for x in parsed_values:
...     if isinstance(x, ValueError):
...         x = parsed_values.send(0.0)
...     print x
...
4.0
6.0
5.0
0.0
22.0
7.0
On its own this is a bit useless ("Why not just print the default directly from read?" you might ask), but you could do more complex things with default inside the generator: resetting values, going back a step, and so on. You could even wait to send a callback at this point based on the error you receive. But note that sys.exc_info() is cleared as soon as the generator yields, so you'll have to send everything from sys.exc_info() if you need access to the traceback.
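As a side note, assuming Python 3 (the session above is Python 2), the exception object carries its own traceback in its __traceback__ attribute, so yielding the exception itself may be enough. A small sketch, with a hypothetical float parser:

```python
import traceback


def parsefunc(items):
    for x in items:
        try:
            rec = float(x)
        except ValueError as exc:
            yield exc  # the exception object keeps its own traceback
        else:
            yield rec


formatted = []
for item in parsefunc(["4", "5h"]):
    if isinstance(item, ValueError):
        # __traceback__ is still available after the generator has yielded.
        formatted.append("".join(
            traceback.format_exception(type(item), item, item.__traceback__)))

print(formatted[0])
```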
Here's an example of how you might combine the two options:
import string
digits = set(string.digits)

def digits_only(v):
    return ''.join(c for c in v if c in digits)

def parsefunc(it):
    default = None
    for x in it:
        try:
            rec = float(x)
        except ValueError as e:
            callback = yield e
            yield float(callback(x))
        else:
            yield rec

parsed_values = parsefunc(['4', '6', '5', '5h', '22', '7'])
for x in parsed_values:
    if isinstance(x, ValueError):
        x = parsed_values.send(digits_only)
    print x
An example of a possible design:
from StringIO import StringIO
import csv

blah = StringIO('this,is,1\nthis,is\n')

def parse_csv(stream):
    for row in csv.reader(stream):
        try:
            yield int(row[2])
        except (IndexError, ValueError) as e:
            pass # don't yield but might need something
        # All others have to go up a level - so it wasn't parsable
        # So if it's an IOError you know why, but this needs to catch
        # exceptions potentially, just let the major ones propagate

for record in parse_csv(blah):
    print record
I like the given answer with the Frozen stuff. Based on that idea I came up with this, solving two aspects I did not yet like. The first was the boilerplate needed to write it down. The second was the loss of the stack trace when yielding an exception. I tried my best to solve the first by using decorators as far as possible. I tried keeping the stack trace by using sys.exc_info() instead of the exception alone.
My generator normally (i.e. without my stuff applied) would look like this:
def generator():
    def f(i):
        return float(i) / (3 - i)
    for i in range(5):
        yield f(i)
If I can transform it into using an inner function to determine the value to yield, I can apply my method:
def generator():
    def f(i):
        return float(i) / (3 - i)
    for i in range(5):
        def generate():
            return f(i)
        yield generate()
This doesn't yet change anything and calling it like this would raise an error with a proper stack trace:
for e in generator():
    print e
Now, applying my decorators, the code would look like this:
@excepterGenerator
def generator():
    def f(i):
        return float(i) / (3 - i)
    for i in range(5):
        @excepterBlock
        def generate():
            return f(i)
        yield generate()
Not much change optically. And you still can use it the way you used the version before:
for e in generator():
    print e
And you still get a proper stack trace when calling. (Just one more frame is in there now.)
But now you also can use it like this:
it = generator()
while it:
    try:
        for e in it:
            print e
    except Exception as problem:
        print 'exc', problem
This way you can handle in the consumer any exception raised in the generator without too much syntactic hassle and without losing stack traces.
The decorators are spelled out like this:
import sys

def excepterBlock(code):
    def wrapper(*args, **kwargs):
        try:
            return (code(*args, **kwargs), None)
        except Exception:
            return (None, sys.exc_info())
    return wrapper

class Excepter(object):
    def __init__(self, generator):
        self.generator = generator
        self.running = True
    def next(self):
        try:
            v, e = self.generator.next()
        except StopIteration:
            self.running = False
            raise
        if e:
            raise e[0], e[1], e[2]
        else:
            return v
    def __iter__(self):
        return self
    def __nonzero__(self):
        return self.running

def excepterGenerator(generator):
    return lambda *args, **kwargs: Excepter(generator(*args, **kwargs))
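The code above is Python 2 (next, __nonzero__, three-argument raise). For readers on Python 3, a rough adaptation might look like the sketch below. The snake_case names are my own renaming, and it stores the exception object instead of sys.exc_info(), relying on the fact that Python 3 exceptions carry their traceback with them.

```python
def excepter_block(code):
    """Return (result, None) on success or (None, exception) on failure."""
    def wrapper(*args, **kwargs):
        try:
            return code(*args, **kwargs), None
        except Exception as exc:
            return None, exc
    return wrapper


class Excepter:
    def __init__(self, generator):
        self.generator = generator
        self.running = True

    def __iter__(self):
        return self

    def __next__(self):
        try:
            value, exc = next(self.generator)
        except StopIteration:
            self.running = False
            raise
        if exc is not None:
            raise exc  # the original traceback travels with the exception
        return value

    def __bool__(self):
        return self.running


def excepter_generator(genfunc):
    return lambda *args, **kwargs: Excepter(genfunc(*args, **kwargs))


@excepter_generator
def generator():
    for i in range(5):
        @excepter_block
        def generate():
            return float(i) / (3 - i)
        yield generate()


values, problems = [], []
it = generator()
while it:
    try:
        for value in it:
            values.append(value)
    except ZeroDivisionError as problem:
        problems.append(problem)

print(values, problems)
```

Because the exception is raised by the Excepter wrapper and not by the underlying generator, iteration can resume after the failing item.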
(I answered the other question linked in the OP but my answer applies to this situation as well)
I have needed to solve this problem a couple of times and came upon this question after a search for what other people have done.
One option- which will probably require refactoring things a little bit- would be to simply create an error handling generator, and throw the exception in the generator (to another error handling generator) rather than raise it.
Here is what the error handling generator function might look like:
def err_handler():
    # a generator for processing errors
    while True:
        try:
            # errors are thrown to this point in function
            yield
        except Exception1:
            handle_exc1()
        except Exception2:
            handle_exc2()
        except Exception3:
            handle_exc3()
        except Exception:
            raise
An additional handler argument is provided to the parsefunc function so it has a place to put the errors:
def parsefunc(stream, handler):
    # the handler argument fixes errors/problems separately
    while not eof(stream):
        try:
            rec = read_record(stream)
            # do some stuff
            yield rec
        except Exception as e:
            handler.throw(e)
    handler.close()
Now just use almost the original read function, but now with an error handler:
def read(stream, parsefunc):
    handler = err_handler()
    next(handler)  # advance to the first yield so throw() lands inside the try block
    for record in parsefunc(stream, handler):
        do_stuff(record)
This isn't always going to be the best solution, but it's certainly an option, and relatively easy to understand.
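Here is a small runnable Python 3 sketch of the same idea, with a hypothetical float parser and a handler that just records errors in a list. One detail worth noting: the handler generator has to be advanced to its first yield before throw() is used, otherwise the exception is raised at the very start of the generator body, outside the try block.

```python
def err_handler(log):
    # Generator that receives errors via throw() and records them.
    while True:
        try:
            yield
        except ValueError as exc:
            log.append(exc)


def parsefunc(lines, handler):
    for line in lines:
        try:
            rec = float(line)
        except ValueError as exc:
            handler.throw(exc)  # hand the error to the handler, then continue
        else:
            yield rec
    handler.close()


def read(lines):
    log = []
    handler = err_handler(log)
    next(handler)  # prime: advance to the first yield inside the try block
    records = list(parsefunc(lines, handler))
    return records, log


records, log = read(["4", "5h", "7"])
print(records, len(log))
```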
About your point of propagating exception from generator to consuming function,
you can try to use an error code (set of error codes) to indicate the error.
Though not elegant that is one approach you can think of.
For example in the below code yielding a value like -1 where you were expecting
a set of positive integers would signal to the calling function that there was
an error.
In [1]: def f():
   ...:     yield 1
   ...:     try:
   ...:         2/0
   ...:     except ZeroDivisionError,e:
   ...:         yield -1
   ...:     yield 3
   ...:
In [2]: g = f()
In [3]: next(g)
Out[3]: 1
In [4]: next(g)
Out[4]: -1
In [5]: next(g)
Out[5]: 3
Actually, generators are quite limited in several aspects. You found one: the raising of exceptions is not part of their API.
You could have a look at the Stackless Python stuff like greenlets or coroutines which offer a lot more flexibility; but diving into that is a bit out of scope here.
Related
I have this dict
dic = {'wow': 77, 'yy': 'gt', 'dwe': {'dwdw': {'fefe': 2006}}}
and I have this function
def get_missing_key(data, nest, default_value):
    try:
        return data + nest
    except KeyError as err:
        return default_value
and this is how I call it:
get_missing_key(dic, ['dwe']['dwdw']['fefe'], 16)
What I want is for the second parameter to be treated as a normal Python expression and applied to data.
I want it to be like this
def get_missing_key(data, nest, default_value):
    try:
        return data['dwe']['dwdw']['fefe']
    except KeyError as err:
        return default_value
is there a way to achieve this?
But what I have clearly doesn't work, since I can't concatenate a dict with a list
You could use reduce like @kyle-parsons did, or you could manually loop:
lookup = ["dwe", "dwdw", "fefe"]
def find_missing(data, lookup, default):
    found = data
    for i in lookup:
        try:
            found = found[i]
        except KeyError:
            return default
    return found
You should pass your keys as a list.
from functools import reduce
def get_missing_key(data, nest, default_value):
    try:
        # return the value found by walking the keys in order
        return reduce(dict.__getitem__, nest, data)
    except KeyError:
        return default_value
In general, Python eagerly evaluates expressions and there's no way to delay that short of passing in strings of code to be built up and execed, but that's really not a good idea.
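For completeness, a runnable sketch of the key-list approach applied to the dict from the question. The name get_nested and the use of operator.getitem (which also works when an intermediate value is a list) are my own choices; catching TypeError covers the case where an intermediate value is not subscriptable at all.

```python
from functools import reduce
from operator import getitem


def get_nested(data, keys, default=None):
    """Walk data by successive keys; return default if any step fails."""
    try:
        return reduce(getitem, keys, data)
    except (KeyError, IndexError, TypeError):
        return default


dic = {'wow': 77, 'yy': 'gt', 'dwe': {'dwdw': {'fefe': 2006}}}

print(get_nested(dic, ['dwe', 'dwdw', 'fefe'], 16))     # 2006
print(get_nested(dic, ['dwe', 'missing', 'fefe'], 16))  # 16
print(get_nested(dic, ['wow', 'fefe'], 16))             # 16 (77 is not subscriptable)
```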
I often have functions that return multiple outputs which are structured like so:
def f(vars):
    ...
    if something_unexpected():
        return None, None
    ...
    # normal return
    return value1, value2
In this case, there might be an infrequent problem that something_unexpected detects (say, an empty dataframe when the routine expects at least one row of data), and so I want to return a value to the caller that says to ignore the output and skip over it. If this were a single-return function then returning None once would seem fine, but when I'm returning multiple values it seems sloppy to return multiple copies of None just so the caller has the right number of values to unpack.
What are some better ways of coding up this construct? Is simply having the caller use a try-except block and the function raising an exception the way to go, or is there another example of good practice to use here?
Edit: Of course I could return the pair of outputs into a single variable, but then I'd have to call the function like
results = f(inputs)
if results is None:
    continue
varname1, varname2 = results[0], results[1]
rather than the more clean-seeming
varname1, varname2 = f(inputs)
if varname1 is None:
    continue
Depends on where you want to handle this behavior, but exceptions are a pretty standard way to do this. Without exceptions, you could still return None, None:
a, b = f(inputs)
if None in (a, b):
    print("Got something bad!")
    continue
Though, I think it might be better to raise in your function and catch it instead:
def f():
    if unexpected:
        raise ValueError("Got empty values")
    else:
        return val1, val2

try:
    a, b = f()
except ValueError:
    print("bad behavior in f, skipping")
    continue
The best practice is to raise an exception:
if something_unexpected():
    raise ValueError("Something unexpected happened")
REFERENCES:
Explicit is better than implicit.
Errors should never pass silently.
Unless explicitly silenced.
PEP 20 -- The Zen of Python
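As a concrete illustration of the raise-and-skip pattern, here is a small sketch. EmptyInputError and summarize are hypothetical names invented for the example; a dedicated exception class lets the caller skip only the expected failure and not unrelated ValueErrors.

```python
class EmptyInputError(ValueError):
    """Hypothetical error for input that has no rows to process."""


def summarize(rows):
    if not rows:
        raise EmptyInputError("expected at least one row")
    return min(rows), max(rows)


results = []
for rows in ([3, 1, 2], [], [5]):
    try:
        low, high = summarize(rows)
    except EmptyInputError:
        continue  # skip this input and move on
    results.append((low, high))

print(results)
```

The caller keeps the clean two-name unpacking, and the empty input never reaches the code that uses low and high.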
a = []
b = [1,2,'x']
try:
    for i in b:
        a.append(i%4)
except:
    print('Not possible')
finally:
    print("It's over")
print(a)
Result:
Not possible
It's over
[1, 2]
I always thought try in python is similar to transactions; commit() and rollback() in SQL. So in a way, the operation would not return partial results, as it does in my case. This is a dummy case, for example, but does python offer solutions in a way it doesn't commit the change to a list if the error was imposed in the process? So it would return a blank list in this example.
Please note, I am aware of how to fix this problem, I am curious about solving the issue with a try & except.
Wrap your operation in a list comprehension
a = []
b = [1, 2, 'x']
try:
    a += [i % 4 for i in b]
except Exception:
    print("failed")
This way nothing will be appended to a since the list comprehension failed to instantiate.
Try is a catcher, patiently waiting for an exception to be thrown so that it can catch it and send the code down the Except block. The only way an exception will be thrown is if it executes code, so no, it won't automatically rollback. Note that:
1. The code inside the Try has no idea it exists within a Try-Except block.
2. Because of 1., if there was a rollback, Python would really be taking all of your content inside of the Try and storing changes or copies of it. You will certainly have nested Trys. If you import a package, some of the functions you use in it will have Trys inside. This means the stack would be keeping track of multiple copies of everything you do in the block. Note that in most (many?) DBMS's, you cannot start a second transaction without ending your first. This is not the case in python.
If you want to rollback the changes, you should do that in the first few lines of your Except statement.
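A minimal sketch of doing the rollback by hand in the except block, using the list from the question (here a starts with one element so the rollback is visible). The checkpoint variable is my own name; the idea is simply to remember the list's length before the attempt and truncate back to it on failure.

```python
a = [0]
b = [1, 2, 'x']

checkpoint = len(a)  # remember where the list ended before the attempt
try:
    for i in b:
        a.append(i % 4)
except TypeError:
    del a[checkpoint:]  # roll back: drop everything appended in the try

print(a)
```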
In Python, anything occurring in a try will execute (as if it weren't in a try/except block at all) all the way until an Exception is encountered. At that point, it then proceeds to the except block(s). finally is executed whether except was entered or not (aka, it will always fire). try/except/finally is not meant to be atomic like SQL.
You can define a new metaclass such that every class using this metaclass implements try/except the way you mean.
For instance, you could define:
from copy import deepcopy
import types

class TryExceptRollbackMeta(type):
    def __new__(cls, name, bases, attrs):
        new_attrs = {}
        # attr_name (not name) so the class name passed to super() is not overwritten
        for attr_name, value in attrs.items():
            if attr_name == "__init__" or not isinstance(value, types.FunctionType):
                new_attrs[attr_name] = value
                continue
            # We know from now on that we're dealing with a non-static function
            # If for some reason, a non-static method is defined without being passed self as an argument
            if value.__code__.co_argcount == 0:
                new_attrs[attr_name] = value
                continue
            new_attrs[f"updated_{attr_name}"] = TryExceptRollbackMeta.generate_updated_method(value)
        return super().__new__(cls, name, bases, new_attrs)

    @staticmethod
    def generate_updated_method(func):
        def updated_method(*args, **kwargs):
            original = deepcopy(args[0])
            try:
                result = func(*args, **kwargs)
            except Exception as e:
                print(f"Exception {type(e)} has occured: {e}. Reverting state...")
                args[0].__dict__.update(original.__dict__)
                return None
            return result
        return updated_method

class Test(metaclass=TryExceptRollbackMeta):
    def __init__(self):
        self.a = []
    def correct(self):
        for i in [1, 2, 3]:
            self.a.append(i % 4)
    def incorrect(self):
        for i in [1, 2, 'x']:
            self.a.append(i % 4)
Then, this would work like this:
>>> test = Test()
>>> test.updated_correct()
>>> print(test.a)
[1, 2, 3]
>>> test.updated_incorrect()
Exception <class 'TypeError'> has occured: not all arguments converted during string formatting. Reverting state...
>>> print(test.a)
[1, 2, 3]
By doing so, you have full control over the way you want to deal with an Exception: you can act according to the Exception type, print the line at which it failed, etc.
The problem is that the deepcopy can potentially be very slow, depending on what your object's attributes are. It is still possible to target specifically the attributes you want to revert, though; this is just the general case where you don't know which attributes can be affected by a method.
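One possible way to target specific attributes, sketched here as a context manager and not as a metaclass. The rollback_on_error helper is a hypothetical name of my own; it copies only the named attributes and restores them if the block raises.

```python
from contextlib import contextmanager
from copy import deepcopy


@contextmanager
def rollback_on_error(obj, *attr_names):
    """Restore the named attributes of obj if the block raises."""
    saved = {name: deepcopy(getattr(obj, name)) for name in attr_names}
    try:
        yield obj
    except Exception:
        for name, value in saved.items():
            setattr(obj, name, value)
        raise  # the caller still sees the original exception


class Test:
    def __init__(self):
        self.a = [1, 2, 3]


test = Test()
try:
    with rollback_on_error(test, "a"):
        for i in [1, 2, 'x']:
            test.a.append(i % 4)
except TypeError:
    pass

print(test.a)
```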
So I'm trying to work out a generic solution that will collect all values from a function and append them to a list that is later accessible. This is to be used during concurrent.futures or threading type tasks. Here is a solution I have using a global master_list:
from concurrent.futures import ThreadPoolExecutor

master_list = []

def return_from_multithreaded(func):
    # master_list = []
    def wrapper(*args, **kwargs):
        # nonlocal master_list
        global master_list
        master_list += func(*args, **kwargs)
    return wrapper

@return_from_multithreaded
def f(n):
    return [n]

with ThreadPoolExecutor(max_workers=20) as exec:
    exec.map(f, range(1, 100))

print(master_list)
I would like to find a solution that does not include globals, and perhaps can return the commented out master_list that is stored as a closure?
If you don't want to use globals, don't discard the results of map. map is giving you back the values returned by each function, you just ignored them. This code could be made much simpler by using map for its intended purpose:
def f(n):
    return n  # No need to wrap in list

with ThreadPoolExecutor(max_workers=20) as exec:
    master_list = list(exec.map(f, range(1, 100)))

print(master_list)
If you need a master_list that shows the results computed so far (maybe some other thread is watching it), you just make the loop explicit:
def f(n):
    return n  # No need to wrap in list

master_list = []
with ThreadPoolExecutor(max_workers=20) as exec:
    for result in exec.map(f, range(1, 100)):
        master_list.append(result)

print(master_list)
This is what the Executor model is designed for; normal threads aren't intended to return values, but Executors provide a channel for returning values under the covers so you don't have to manage it yourself. Internally, this is using Queues of some form or another, with additional metadata to keep the results in order, but you don't need to deal with that complexity; from your perspective, it's equivalent to the regular map function, it just happens to parallelize the work.
Update to cover dealing with exceptions:
map will raise any exceptions raised in the workers when the result is hit. Thus, as written, the first set of code will not store anything if any of the tasks fail (the list will be partially constructed, but thrown away when the exception raises). The second example will only keep results before the first exception is thrown, with the rest discarded (you'd have to store the map iterator and use some awkward code to avoid it). If you need to store all successful results, ignoring failures (or just logging them in some way), it's easiest to use submit to create a list of Future objects, then wait on them, either serially or in order of completion, wrapping the .result() calls in try/except to avoid throwing away good results. For example, to store results in order of submission, you'd do:
master_list = []
with ThreadPoolExecutor(max_workers=20) as exec:
    futures = [exec.submit(f, i) for i in range(1, 100)]
    exec.shutdown(False)  # Optional: workers terminate as soon as all futures finish,
                          # rather than waiting for all results to be processed
    for fut in futures:
        try:
            master_list.append(fut.result())
        except Exception:
            pass  # log error here
For more efficient code, you can retrieve results in order of completion, not submission, using concurrent.futures.as_completed to eagerly retrieve results as they finish. The only change from the previous code is that:
for fut in futures:
becomes:
for fut in concurrent.futures.as_completed(futures):
where as_completed does the work of yielding completed/cancelled futures as soon as they complete, instead of delaying until all futures submitted earlier complete and get handled.
There are more complicated options involving using add_done_callback so the main thread isn't involved in explicitly handling the results at all, but that's usually unnecessary, and often confusing, so it's best avoided if possible.
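A complete, runnable sketch of the as_completed variant follows. The task f that fails for one input and the future_to_input dictionary are assumptions added for illustration; mapping each future back to its input is one common way to know which task a result or failure belongs to once ordering is lost.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def f(n):
    # Hypothetical task: fails for one input to show the error path.
    if n == 3:
        raise ValueError("bad input: %d" % n)
    return n * n


results, failures = {}, {}
with ThreadPoolExecutor(max_workers=4) as executor:
    # Map each future back to its input so results can be identified.
    future_to_input = {executor.submit(f, n): n for n in range(1, 6)}
    for fut in as_completed(future_to_input):
        n = future_to_input[fut]
        try:
            results[n] = fut.result()
        except ValueError as exc:
            failures[n] = exc  # keep going; good results are not lost

print(sorted(results.items()), sorted(failures))
```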
I have faced this issue in the past: Running multiple asynchronous function and get the returned value of each function. This was my approach to do it:
from multiprocessing import Process, Queue

def async_call(func_list):
    """
    Runs the list of function asynchronously.
    :param func_list: Expects list of lists to be of format
        [[func1, args1, kwargs1], [func2, args2, kwargs2], ...]
    :return: List of output of the functions
        [output1, output2, ...]
    """
    def worker(function, f_args, f_kwargs, queue, index):
        """
        Runs the function and appends the output to list, and the Exception in the case of error
        """
        response = {
            'index': index,  # For tracking the index of each function in actual list.
                             # Since, this function is called asynchronously, order in
                             # queue may differ
            'data': None,
            'error': None
        }
        # Handle error in the function call
        try:
            response['data'] = function(*f_args, **f_kwargs)
        except Exception as e:
            response['error'] = e  # send back the exception along with the queue
        queue.put(response)

    queue = Queue()
    processes = [Process(target=worker, args=(func, args, kwargs, queue, i)) \
                 for i, (func, args, kwargs) in enumerate(func_list)]
    for process in processes:
        process.start()
    response_list = []
    for process in processes:
        # Wait for process to finish
        process.join()
        # Get back the response from the queue
        response = queue.get()
        if response['error']:
            raise response['error']  # Raise exception if the function call failed
        response_list.append(response)
    return [content['data'] for content in sorted(response_list, key=lambda x: x['index'])]
Sample run:
def my_sum(x, y):
    return x + y

def your_mul(x, y):
    return x*y

my_func_list = [[my_sum, [1], {'y': 2}], [your_mul, [], {'x':1, 'y':2}]]
async_call(my_func_list)
# Value returned: [3, 2]
There are some cases where it's convenient to use a generator with yield to pass data back to the caller over an extended period. Is there a way to do something similar to yield, without having to make the function into a generator?
The reason for this is that in some cases I end up having to make all callees into generators, even when those nested functions have useful return values.
# currently this works fine, but requires a return arg
def nested(return_store):
    return_store[0] = some_test()
    yield from some_generator()

def do_stuff(return_store):
    yield some_data
    for more_data in data:
        yield more_data
    # Annoying workaround!
    return_store = [None]
    yield from nested(return_store)
    if return_store[0]:
        pass # do anything

def main():
    return Reply(do_stuff())
Instead I'd like to pass an object as an argument which I can pass arguments to (instead of using yield)
# is something like this possible?
def nested(iter_obj):
    iter_obj.yield_replacement(some_generator())
    return some_test()

def do_stuff(iter_obj):
    iter_obj.yield_replacement(some_data)
    for more_data in data:
        iter_obj.yield_replacement(more_data)
    # No annoying workaround
    if nested(iter_obj):
        pass # do anything

def main():
    iter_obj = yield_replacement_object(consumer=print)
    # sets up the generator (Reply should consume iter_obj)
    do_stuff(iter_obj)
    return Reply(iter_obj)
Generators are just one form of iterators. Anything that implements the iterator protocol will do.
This means you can replace your nested function with an object with more attributes:
class Nested():
    def __iter__(self):
        self.some_flag = some_test()
        yield data
I implemented the __iter__ method as a generator function even.
Then use the object in your generator at will:
n = Nested()
yield from n
if n.some_flag:
    # ...
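A runnable sketch of the iterator-object idea, with made-up data so it can be executed on its own (the items argument and the length check standing in for some_test() are assumptions):

```python
class Nested:
    """Iterable whose __iter__ is a generator that also records a flag."""

    def __init__(self, items):
        self.items = items
        self.some_flag = None

    def __iter__(self):
        # Side result stored on the instance rather than passed back.
        self.some_flag = len(self.items) > 2
        yield from self.items


def do_stuff():
    yield "start"
    n = Nested([1, 2, 3])
    yield from n  # delegates to the generator returned by n.__iter__()
    if n.some_flag:
        yield "flag was set"


print(list(do_stuff()))
```

The flag is only meaningful after iteration has started, since it is set inside __iter__.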
Another method is to throw exceptions; if you are trying to communicate some out-of-band state change, throw an exception and catch it in the parent generator.