Maintaining a roll-backable flow of code in Python without extreme indentation - python

I've encountered a situation where I'm working on a piece of code that commands changes on a remote object (one I can't duplicate and work on a clone of), then asks the remote object for some operation in the new state, and finally reverts all the changes by issuing a sequence of opposite commands.
The problem is that if I encounter an error in the middle of all these changes, I want to be able to roll back all the changes I made so far.
The best-fitting solution that came to my mind is Python's try/finally workflow, but it becomes rather problematic when the sequence of commands is long:
try:
    # perform action
    try:
        # perform action
        try:
            # ...
        finally:
            # unroll
    finally:
        # unroll
finally:
    # unroll
This way, the more commands I need, the deeper my indentation and nesting go, and the less readable my code becomes.
I've considered some other solutions such as maintaining a stack where for every command I push a rollback action, but this could get rather complicated, and I dislike pushing bound methods into stacks.
I've also considered incrementing a counter for every action I perform and then, in a single finally, deciding on the kind of rollback I want depending on the counter, but again, the maintainability of such code becomes a pain.
Most hits I got on searches for "transactions" and "rollback" were DB related and didn't fit very well to a more generic kind of code...
Does anyone have a good idea as to how to systematically flatten this atrocity?

Wouldn't Context Manager objects and the with statement improve the situation? Especially if you can use a version of Python where the with statement supports multiple context expressions, such as 2.7 or 3.x. Here's an example:
class Action(object):
    def __init__(self, count):
        self.count = count

    def perform(self):
        print "perform " + str(self.count)
        if self.count == 2:
            raise Exception("self.count is " + str(self.count))

    def commit(self):
        print "commit " + str(self.count)

    def rollback(self):
        print "rollback " + str(self.count)

    def __enter__(self):
        self.perform()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_value is None:
            self.commit()
        else:
            self.rollback()

with Action(1), Action(2), Action(3):
    pass
You'd have to move your code to a set of "transactional" classes, such as Action above, where the action to be performed is executed in the __enter__() method and, if this terminates normally, you are guaranteed that the corresponding __exit__() method will be called.
Note that my example doesn't correspond exactly to yours; you'd have to tune what to execute in the __enter__() methods and what to execute in the with statement's body. In that case you might want to use the following syntax:
with Action(1) as a1, Action(2) as a2:
    pass
This lets you access the Action objects from within the body of the with statement.
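For what it's worth, the standard library also offers a way to flatten this that matches the questioner's "stack of rollback actions" idea: contextlib.ExitStack (Python 3.3+; the contextlib2 backport provides it for 2.7). Here is a minimal sketch, assuming hypothetical remote.command_a()/remote.undo_a() methods, not the questioner's real API:
from contextlib import ExitStack  # contextlib2.ExitStack is the 2.7 backport

def do_remote_work(remote):
    # Whether the block below finishes normally or raises part-way through,
    # every callback registered so far runs on the way out, newest first.
    with ExitStack() as undo:
        remote.command_a()                 # hypothetical remote command
        undo.callback(remote.undo_a)       # register its opposite command

        remote.command_b()
        undo.callback(remote.undo_b)

        return remote.query_new_state()
Because each callback is only registered after its command succeeds, an error in the middle rolls back exactly the commands issued so far, in reverse order, with no extra nesting. If the rollback should only happen on error, calling undo.pop_all() as the last statement of the block detaches the callbacks so the success path skips them.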

Related

Should I use finally after try/except?

I have a bunch of functions similar to this structure:
def df():
    try:
        foo = # do some computation
    except Exception:
        foo = # do other computation
    return foo
I was wondering what would be the difference with this other implementation:
def df():
    try:
        foo = # do some computation
    except Exception:
        foo = # do other computation
    finally:
        return foo
What should I use in this case? It seems a little redundant to me, and I'm also concerned about execution time, because I have many more functions with this same structure and I don't know whether adding finally would increase the execution time too much.
If you are catching a generic exception like that and not throwing it back to the calling method, then both are functionally the same. The finally block is guaranteed to run after the try/except has been processed, so in those examples it makes no real difference. Typically, finally is used to ensure things like thread state or connection closure after the try/except block has executed. If those examples are truly representative of your code, then I wouldn't use finally.
"finally" is executed even if the exception is raised. In your specific case it wouldn't be required.

Raising exception in a generator, handle it elsewhere and vice versa in python

I'm thinking in a more advanced direction, and it has been difficult to find solutions to this problem. Before coming to any decision, I thought I'd ask for expert advice on how to address it.
The enhanced generators have new methods .send() and .throw() that allow the caller to pass messages or to raise exceptions into the generator (coroutine).
From the Python documentation: This can be very handy, especially the .throw() method that requests the generator to handle exceptions raised in the caller.
Request #1: Any example code for the above statement. I didn't find any code snippets for this explanation.
However, I'm considering the inverse problem as well: can a generator raise an exception, pass it to the caller, let the caller "repair" it, and continue the generator's own execution? That is what I would like to call a "reverse throw".
Request #2: Any example code for the above statement. I didn't find any code snippets for this explanation.
Simply raising exceptions in the generator is not OK. I tried "raise SomeException" in the generator, and that didn't work: after a "raise" the generator can no longer be executed. It simply stops, and further attempts to run it cause the StopIteration exception. In other words, "raise" is much more final than "yield": a generator can resume after yielding to the caller, but a "raise" sends it to a dead end.
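Here is a minimal demonstration of what I mean (a toy generator made up for illustration): once the body lets an exception propagate, the generator is finished and every later next() call raises StopIteration.
def broken():
    yield 1
    raise ValueError("boom")
    yield 2                  # never reached

g = broken()
print(next(g))               # 1
try:
    next(g)                  # the ValueError propagates to the caller...
except ValueError:
    pass
try:
    next(g)
except StopIteration:
    print("the generator is now dead; it only raises StopIteration")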
I wonder if there are simple ways to do the "reverse throw" in Python? That will enable us to write coroutines that cooperate by throwing exceptions at each other. But why use exceptions? Well, I dunno... it all began as some rough idea.
CASE STUDY CODE:
class MyException(Exception): pass

def handleError(func):
    ''' handle an error '''
    errors = []
    def wrapper(arg1):
        result = func(arg1)
        for err in findError(result):
            errors.append(err)
        print errors
        return result
    return wrapper

def findError(result):
    '''
    Find an error if any
    '''
    print result
    for k, v in result.iteritems():
        error_nr = v % 2
        if error_nr == 0:
            pass
        elif error_nr > 0:
            yield MyException

@handleError
def numGen(input):
    ''' This function takes the input and generates random numbers.
    The random numbers are saved in the result dictionary with indices.
    The find-error decorator is called based on the result dictionary. '''
    from random import randint
    result = {}
    errors = []
    for i in range(9):
        j = randint(0, 4)
        result[i] = input + j
    return result

if __name__ == '__main__':
    numGen(4)
Could anyone please explain both ideas based on the case study example (raising an exception in a generator and handling it elsewhere, and vice versa)? I would also like to hear the pros and cons of both methods.
Thanks in advance.
Request #1 (Example for .throw())
I have never actually used this, but you could use it to change behaviour in the generator after the fact. You can also do this with .send of course, but then you'll need to deal with it in the line with the yield expressions (which might be in several locations in the code), rather than centralized with a try-except block.
def getstuff():
    i = 0
    try:
        while True:
            yield i
            i += 1
    except ValueError:
        while True:
            yield i**2
            i += 1

generator = getstuff()
print("Get some numbers...")
print(next(generator))
print(next(generator))
print(next(generator))
print("Oh, actually, I want squares!")
print(generator.throw(ValueError))
print(next(generator))
print(next(generator))
Request #1: Any example code for the above statement. I didn't find any code snippets for this explanation.
Take a look at the asyncio source code:
https://github.com/python/asyncio/search?utf8=%E2%9C%93&q=.throw
Request #2: Any example code for the above statement. I didn't find any code snippets for this explanation.
There is no way* to do it today in Python; maybe (if proven useful) it could be a nice enhancement.
*That is, you can use yield to signal a framework to raise an exception elsewhere.
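To illustrate that footnote, here is a minimal sketch (not from the original answer, with made-up names) of a generator that yields an exception object instead of raising it, letting the caller repair the problem and send() a replacement value back in:
def parse_numbers(lines):
    for line in lines:
        try:
            yield int(line)
        except ValueError as exc:
            # "Reverse throw": hand the problem object to the caller and
            # pause until the caller send()s back a repaired value.
            fixed = yield exc
            yield fixed

gen = parse_numbers(["1", "oops", "3"])
results = []
for value in gen:
    if isinstance(value, Exception):
        value = gen.send(0)   # the caller "repairs" the error with a default
    results.append(value)
print(results)                # [1, 0, 3]
The generator survives the bad input, but the price is that the yielded values now have two possible meanings, which every caller has to know about.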
I have needed to solve this problem a couple of times and came upon this question after a search for what other people have done. I don't think I would use either of the methods suggested by the OP; they're pretty complicated.
One option, which will probably require refactoring things a little bit, would be to simply throw the exception into another error-handling generator rather than raise it. Here is what that might look like:
def f(handler):
    # the handler argument fixes errors/problems separately
    while something():
        try:
            yield something_else()
        except Exception as e:
            handler.throw(e)
    handler.close()

def err_handler():
    # a generator for processing errors
    while True:
        try:
            yield
        except Exception1:
            handle_exc1()
        except Exception2:
            handle_exc2()
        except Exception3:
            handle_exc3()
        except Exception:
            raise

def process():
    handler = err_handler()
    next(handler)          # prime the handler so it is paused at its yield
    for item in f(handler):
        pass               # do stuff with each item
This isn't always going to be the best solution, but it's certainly an option, and relatively easy to understand.

How not to return to a calling function?

In Python, is there a way to not return to the calling function if a certain event happens in the called function? For example...
def acquire_image(sdkobject):
    ret = sdkobject.PrepareAcquisition()
    error_check(ret)
    ret = sdkobject.StartAcquisition()
    error_check(ret)
error_check is a function that checks the return code to see if the SDK call had an error. If it is an error, then I would like to not go back to acquire_image but instead go to another function to reinitialise the SDK and start from the beginning again. Is there a Pythonic way of doing this?
Have your error_check function raise an exception (like SDKError) if there is a problem, then run all the commands in a while loop.
class SDKError(Exception):
    pass

# Perhaps define a separate exception for each possible
# error code, and make a dict that maps error codes to the
# appropriate exception.
class SDKType1Error(SDKError):
    pass

class SDKType5Error(SDKError):
    pass

sdk_errors = {
    1: SDKType1Error,
    5: SDKType5Error,
}

# Either return, if there was no error, or raise
# the appropriate exception
def error_check(return_code):
    if return_code == 0:
        return  # No error
    else:
        raise sdk_errors[return_code]

# Example of how to deal with specific SDKError subclasses, or a generic
# catch-all SDKError
def acquire_image(sdkobject):
    while True:
        try:
            # initialize sdk here
            error_check(sdkobject.PrepareAcquisition())
            error_check(sdkobject.StartAcquisition())
        except SDKType1Error:
            pass  # Special handling for this error
        except SDKError:
            pass
        else:
            break
Return the error and use an if condition to check whether the returned value is an error; if it is, call the reinitialization code from the calling function.
Use return for the happy scenario
Returning to the calling function is done by a simple return or return response.
Use it for the typical run of your code, when all goes well.
Raise an exception when something goes wrong
If something goes wrong, call raise Exception(). In many situations, your code does not have to do it explicitly; it raises the exception on its own.
You may even define your own Exception subclasses and use them to pass the caller more information about what went wrong.
It took me a while to learn this approach, and it made my code much simpler and shorter.
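As a small sketch of that idea (with names made up around the acquisition example above, not a real SDK), a custom exception can carry the details the caller needs:
class AcquisitionError(Exception):
    """Raised when the SDK reports a non-zero return code."""
    def __init__(self, code, message):
        super(AcquisitionError, self).__init__(message)
        self.code = code

def acquire_image(sdkobject):
    ret = sdkobject.PrepareAcquisition()
    if ret != 0:
        raise AcquisitionError(ret, "PrepareAcquisition failed")
    return sdkobject.StartAcquisition()

# The caller decides what the failure means for it:
# try:
#     acquire_image(sdkobject)
# except AcquisitionError as err:
#     reinitialise_sdk_and_retry(err.code)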
Do not care about what your calling code will do with it
Let your function do the task or fail if there are problems.
Trying to take on the caller's responsibility inside your function will mess up your code and will not be a complete solution anyway.
Ignore who is calling you
In OOP this is the principle of client anonymity. Just serve the request and do not care who is calling.
Things to avoid
Do not attempt to use exceptions as a replacement for returning a value
Sometimes people use the fact that an exception can pass some information to the caller. But this is rather an antipattern (though there are always exceptions).

Memoize a function so that it isn't reset when I rerun the file in Python

I often do interactive work in Python that involves some expensive operations that I don't want to repeat often. I'm generally running whatever Python file I'm working on frequently.
If I write:
import functools32

@functools32.lru_cache()
def square(x):
    print "Squaring", x
    return x*x
I get this behavior:
>>> square(10)
Squaring 10
100
>>> square(10)
100
>>> runfile(...)
>>> square(10)
Squaring 10
100
That is, rerunning the file clears the cache. This works:
try:
    safe_square
except NameError:
    @functools32.lru_cache()
    def safe_square(x):
        print "Squaring", x
        return x*x
but when the function is long it feels strange to have its definition inside a try block. I can do this instead:
def _square(x):
    print "Squaring", x
    return x*x

try:
    safe_square_2
except NameError:
    safe_square_2 = functools32.lru_cache()(_square)
but it feels pretty contrived (for example, calling the decorator without an '@' sign).
Is there a simple way to handle this, something like:
@non_resetting_lru_cache()
def square(x):
    print "Squaring", x
    return x*x
?
Writing a script to be executed repeatedly in the same session is an odd thing to do.
I can see why you'd want to do it, but it's still odd, and I don't think it's unreasonable for the code to expose that oddness by looking a little odd, and having a comment explaining it.
However, you've made things uglier than necessary.
First, you can just do this:
@functools32.lru_cache()
def _square(x):
    print "Squaring", x
    return x*x

try:
    safe_square_2
except NameError:
    safe_square_2 = _square
There is no harm in attaching a cache to the new _square definition. It won't waste any time, or more than a few bytes of storage, and, most importantly, it won't affect the cache on the previous _square definition. That's the whole point of closures.
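A small sketch (not from the original answer) of why the old binding keeps its own cache: re-executing the decorated definition creates a brand-new function object with a brand-new cache, and any name still pointing at the old object keeps using the old cache.
import functools32

@functools32.lru_cache()
def _square(x):
    print "Squaring", x
    return x*x

safe_square_2 = _square        # what the try/except NameError idiom preserves

@functools32.lru_cache()       # simulates re-running the file: new function, new cache
def _square(x):
    print "Squaring", x
    return x*x

safe_square_2(10)              # prints "Squaring 10"
safe_square_2(10)              # served from the old function's cache: prints nothing
_square(10)                    # new function, empty cache: prints "Squaring 10" again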
There is a potential problem here with recursive functions. It's already inherent in the way you're working, and the cache doesn't add to it in any way, but you might only notice it because of the cache, so I'll explain it and show how to fix it. Consider this function:
@lru_cache()
def _fact(n):
    if n < 2:
        return 1
    return _fact(n-1) * n
When you re-exec the script, even if you have a reference to the old _fact, it's going to end up calling the new _fact, because it's accessing _fact as a global name. It has nothing to do with the @lru_cache; remove that, and the old function will still end up calling the new _fact.
But if you're using the renaming trick above, you can just call the renamed version:
@lru_cache()
def _fact(n):
    if n < 2:
        return 1
    return fact(n-1) * n
Now the old _fact will call fact, which is still the old _fact. Again, this works identically with or without the cache decorator.
Beyond that initial trick, you can factor that whole pattern out into a simple decorator. I'll explain step by step below, or see this blog post.
Anyway, even with the less-ugly version, it's still a bit ugly and verbose. And if you're doing this dozens of times, my "well, it should look a bit ugly" justification will wear thin pretty fast. So, you'll want to handle this the same way you always factor out ugliness: wrap it in a function.
You can't really pass names around as objects in Python. And you don't want to use a hideous frame hack just to deal with this. So you'll have to pass the names around as strings, like this:
globals().setdefault('fact', _fact)
The globals function just returns the current scope's global dictionary. Which is a dict, which means it has the setdefault method, which means this will set the global name fact to the value _fact if it didn't already have a value, but do nothing if it did. Which is exactly what you wanted. (You could also use setattr on the current module, but I think this way emphasizes that the script is meant to be (repeatedly) executed in someone else's scope, not used as a module.)
So, here that is wrapped up in a function:
def new_bind(name, value):
    globals().setdefault(name, value)
… which you can turn into a decorator almost trivially:
def new_bind(name):
    def wrap(func):
        globals().setdefault(name, func)
        return func
    return wrap
Which you can use like this:
@new_bind('foo')
def _foo():
    print(1)
But wait, there's more! The func that new_bind gets is going to have a __name__, right? If you stick to a naming convention, like that the "private" name must be the "public" name with a _ prefixed, we can do this:
def new_bind(func):
    assert func.__name__[0] == '_'
    globals().setdefault(func.__name__[1:], func)
    return func
And you can see where this is going:
@new_bind
@lru_cache()
def _square(x):
    print "Squaring", x
    return x*x
There is one minor problem: if you use any other decorators that don't wrap the function properly, they will break your naming convention. So… just don't do that. :)
And I think this works exactly the way you want in every edge case. In particular, if you've edited the source and want to force the new definition with a new cache, you just del square before rerunning the file, and it works.
And of course if you want to merge those two decorators into one, it's trivial to do so, and call it non_resetting_lru_cache.
However, I'd keep them separate. I think it's more obvious what they do. And if you ever want to wrap another decorator around @lru_cache, you're probably still going to want @new_bind to be the outermost decorator, right?
What if you want to put new_bind into a module that you can import? Then it's not going to work, because it will be referring to the globals of that module, not the one you're currently writing.
You can fix that by explicitly passing your globals dict, or your module object, or your module name as an argument, like @new_bind(__name__), so it can find your globals instead of its own. But that's ugly and repetitive.
You can also fix it with an ugly frame hack. At least in CPython, sys._getframe() can be used to get your caller's frame, and frame objects have a reference to their globals namespace, so:
import sys

def new_bind(func):
    assert func.__name__[0] == '_'
    g = sys._getframe(1).f_globals
    g.setdefault(func.__name__[1:], func)
    return func
Notice the big box in the docs that tells you this is an "implementation detail" that may only apply to CPython and is "for internal and specialized purposes only". Take this seriously. Whenever someone has a cool idea for the stdlib or builtins that could be implemented in pure Python, but only by using _getframe, it's generally treated almost the same as an idea that can't be implemented in pure Python at all. But if you know what you're doing, and you want to use this, and you only care about present-day versions of CPython, it will work.
There is no persistent_lru_cache in the stdlib. But you can build one pretty easily.
The functools source is linked directly from the docs, because this is one of those modules that's as useful as sample code as it is for using it directly.
As you can see, the cache is just a dict. If you replace that with, say, a shelf, it will become persistent automatically:
def persistent_lru_cache(filename, maxsize=128, typed=False):
    """new docstring explaining what dbpath does"""
    # same code as before up to here
    def decorating_function(user_function):
        cache = shelve.open(filename)
        # same code as before from here on.
Of course that only works if your arguments are strings. And it could be a little slow.
So, you might want to instead keep it as an in-memory dict, and just write code that pickles it to a file atexit, and restores it from a file if present at startup:
def decorating_function(user_function):
    # ...
    try:
        with open(filename, 'rb') as f:
            cache = pickle.load(f)
    except:
        cache = {}

    def cache_save():
        with lock:
            with open(filename, 'wb') as f:
                pickle.dump(cache, f)
    atexit.register(cache_save)

    # …
    wrapper.cache_save = cache_save
    wrapper.cache_filename = filename
Or, if you want it to write every N new values (so you don't lose the whole cache on, say, an _exit or a segfault or someone pulling the cord), add this to the second and third versions of wrapper, right after the misses += 1:
if misses % N == 0:
    cache_save()
See here for a working version of everything up to this point (using save_every as the "N" argument, and defaulting to 1, which you probably don't want in real life).
If you want to be really clever, maybe copy the cache and save that in a background thread.
You might want to extend the cache_info to include something like number of cache writes, number of misses since last cache write, number of entries in the cache at startup, …
And there are probably other ways to improve this.
From a quick test, with save_every=1, this makes the cache on both get_pep and fib (from the functools docs) persistent, with no measurable slowdown to get_pep and a very small slowdown to fib the first time (note that fib(100) has 100097 hits vs. 101 misses…), and of course a large speedup to get_pep (but not fib) when you re-run it. So, just what you'd expect.
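For completeness, here is a self-contained sketch of the same idea that doesn't require editing the functools source: a plain dict-backed memoizer that reloads its cache at import time and pickles it at exit. It is an illustration only (simple positional-argument keys, no maxsize, no lock), not the lru_cache-based version described above.
import atexit
import os
import pickle
from functools import wraps

def persistent_memoize(filename):
    def decorating_function(user_function):
        # restore the cache from the previous run, if any
        if os.path.exists(filename):
            with open(filename, 'rb') as f:
                cache = pickle.load(f)
        else:
            cache = {}

        @wraps(user_function)
        def wrapper(*args):
            if args not in cache:
                cache[args] = user_function(*args)
            return cache[args]

        def cache_save():
            with open(filename, 'wb') as f:
                pickle.dump(cache, f)

        atexit.register(cache_save)      # write the cache out when the process exits
        wrapper.cache_save = cache_save
        return wrapper
    return decorating_function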
I can't say I won't just use @abarnert's "ugly frame hack", but here is the version that requires you to pass in the calling module's globals dict. I think it's worth posting, given that decorator functions with arguments are tricky and meaningfully different from those without arguments.
def create_if_not_exists_2(my_globals):
    def wrap(func):
        if "_" != func.__name__[0]:
            raise Exception("Function names used in cine must begin with '_'")
        my_globals.setdefault(func.__name__[1:], func)
        def wrapped(*args):
            return func(*args)
        return wrapped
    return wrap
Which you can then use in a different module like this:
from functools32 import lru_cache
from cine import create_if_not_exists_2

@create_if_not_exists_2(globals())
@lru_cache()
def _square(x):
    print "Squaring", x
    return x*x

assert "_square" in globals()
assert "square" in globals()
I've gained enough familiarity with decorators during this process that I was comfortable taking a swing at solving the problem another way:
from functools32 import lru_cache

try:
    my_cine
except NameError:
    class my_cine(object):
        _reg_funcs = {}

        @classmethod
        def func_key(cls, f):
            try:
                name = f.func_name
            except AttributeError:
                name = f.__name__
            return (f.__module__, name)

        def __init__(self, f):
            k = self.func_key(f)
            self._f = self._reg_funcs.setdefault(k, f)

        def __call__(self, *args, **kwargs):
            return self._f(*args, **kwargs)

if __name__ == "__main__":
    @my_cine
    @lru_cache()
    def fact_my_cine(n):
        print "In fact_my_cine for", n
        if n < 2:
            return 1
        return fact_my_cine(n-1) * n

    x = fact_my_cine(10)
    print "The answer is", x
@abarnert, if you are still watching, I'd be curious to hear your assessment of the downsides of this method. I know of two:
You have to know in advance what attributes to look in for a name to associate with the function. My first stab at it only looked at func_name which failed when passed an lru_cache object.
Resetting a function is painful: del my_cine._reg_funcs[('__main__', 'fact_my_cine')], and the swing I took at adding a __delitem__ was unsuccessful.

Creating an asynchronous method with Google App Engine's NDB

I want to make sure I understand how to create tasklets and asynchronous methods. What I have is a method that returns a list. I want it to be called from somewhere, and immediately allow other calls to be made. So I have this:
future_1 = get_updates_for_user(userKey, aDate)
future_2 = get_updates_for_user(anotherUserKey, aDate)
somelist.extend(future_1)
somelist.extend(future_2)
....
@ndb.tasklet
def get_updates_for_user(userKey, lastSyncDate):
    noteQuery = ndb.GqlQuery('SELECT * FROM Comments WHERE ANCESTOR IS :1 AND modifiedDate > :2', userKey, lastSyncDate)
    note_list = list()
    qit = noteQuery.iter()
    while (yield qit.has_next_async()):
        note = qit.next()
        noteDic = note.to_dict()
        note_list.append(noteDic)
    raise ndb.Return(note_list)
Is this code doing what I'd expect it to do? Namely, will the two calls run asynchronously? Am I using futures correctly?
Edit: Well, after testing, the code does produce the desired results. I'm a newbie to Python; what are some ways to test whether the methods are running asynchronously?
It's pretty hard to verify for yourself that the methods are running concurrently -- you'd have to put copious logging in. Also in the dev appserver it'll be even harder as it doesn't really run RPCs in parallel.
Your code looks okay, it uses yield in the right place.
My only recommendation is to name your function get_updates_for_user_async() -- that matches the convention NDB itself uses and is a hint to the reader of your code that the function returns a Future and should be yielded to get the actual result.
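As a brief sketch of what that convention implies for the calling code (not part of the original answer): kick off both queries first so their RPCs can overlap, then wait for the results, either with get_result() in plain code or with yield inside another tasklet.
# Plain (non-tasklet) calling code: start both futures, then wait for both.
future_1 = get_updates_for_user_async(userKey, aDate)
future_2 = get_updates_for_user_async(anotherUserKey, aDate)
somelist.extend(future_1.get_result())
somelist.extend(future_2.get_result())

# Inside another tasklet, yield the futures instead; yielding a list of
# futures waits for all of them and returns their results as a list.
@ndb.tasklet
def get_all_updates_async(userKeys, lastSyncDate):
    futures = [get_updates_for_user_async(k, lastSyncDate) for k in userKeys]
    results = yield futures
    raise ndb.Return([note for sub in results for note in sub])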
An alternative way to do this is to use the map_async() method on the Query object; it would let you write a callback that just contains the to_dict() call:
@ndb.tasklet
def get_updates_for_user_async(userKey, lastSyncDate):
    noteQuery = ndb.gql('...')
    note_list = yield noteQuery.map_async(lambda note: note.to_dict())
    raise ndb.Return(note_list)
Advanced tip: you can simplify this even more by dropping the @ndb.tasklet decorator and just returning the Future returned by map_async():
def get_updates_for_user_async(userKey, lastSyncDate):
    noteQuery = ndb.gql('...')
    return noteQuery.map_async(lambda note: note.to_dict())
This is a general slight optimization for async functions that contain only one yield and immediately return the yielded value. (If you don't immediately get this, you're in good company, and it runs the risk of being broken by a future maintainer who doesn't either. :-)
