Closures in Python

I've been trying to learn Python, and while I'm enthusiastic about using closures in Python, I've been having trouble getting some code to work properly:
def memoize(fn):
    def get(key):
        return (False,)
    def vset(key, value):
        global get
        oldget = get
        def newget(ky):
            if key==ky: return (True, value)
            return oldget(ky)
        get = newget
    def mfun(*args):
        cache = get(args)
        if (cache[0]): return cache[1]
        val = apply(fn, args)
        vset(args, val)
        return val
    return mfun

def fib(x):
    if x<2: return x
    return fib(x-1)+fib(x-2)

def fibm(x):
    if x<2: return x
    return fibm(x-1)+fibm(x-2)

fibm = memoize(fibm)
Basically, what this is supposed to do is use closures to maintain the memoized state of the function. I realize there are probably many faster, easier to read, and in general more 'Pythonic' ways to implement this; however, my goal is to understand exactly how closures work in Python, and how they differ from Lisp, so I'm not interested in alternative solutions, just why my code doesn't work and what I can do (if anything) to fix it.
The problem I'm running into is when I try to use fibm - Python insists that get isn't defined:
Python 2.6.1 (r261:67515, Feb 1 2009, 11:39:55)
[GCC 4.0.1 (Apple Inc. build 5488)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import memoize
>>> memoize.fibm(35)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "memoize.py", line 14, in mfun
cache = get(args)
NameError: global name 'get' is not defined
>>>
Seeing as I'm new to Python, I don't know if I've done something wrong, or if this is just a limitation of the language. I'm hoping it's the former. :-)

The problem is in your scoping, not in your closures. If you're up for some heavy reading, then you can try http://www.python.org/dev/peps/pep-3104/.
If that's not the case, here's the simple explanation:
The problem is in the statement global get. global refers to the module-level (outermost) scope, and since there isn't any global function get, the call raises a NameError.
What you need is a way to name variables in the enclosing scope, not the global scope.
In Python 3, the nonlocal keyword is exactly what you need, in place of global:
nonlocal get
...
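For concreteness, here is a minimal sketch of the questioner's memoize with just that one change (Python 3 only; apply(fn, args) is also replaced with fn(*args), since apply no longer exists there):

def memoize(fn):
    def get(key):
        return (False,)
    def vset(key, value):
        nonlocal get            # rebind the enclosing get, not a global
        oldget = get
        def newget(ky):
            if key == ky:
                return (True, value)
            return oldget(ky)
        get = newget
    def mfun(*args):
        cache = get(args)       # reads the same cell that vset rebinds
        if cache[0]:
            return cache[1]
        val = fn(*args)
        vset(args, val)
        return val
    return mfun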
In Python 2.x, I just removed the global get and the oldget references and it works properly.

def memoize(fn):
    get = [lambda key: (False, None)]
    def vset(args):
        value = fn(*args)
        oldget = get[0]
        def newget(key):
            if args == key:
                return (True, value)
            return oldget(key)
        get[0] = newget
        return value
    def mfun(*args):
        found, value = get[0](args)
        if found:
            return value
        return vset(args)
    return mfun
CALLS = 0
def fib(x):
    global CALLS
    CALLS += 1
    if x<2: return x
    return fib(x-1)+fib(x-2)

@memoize
def fibm(x):
    global CALLS
    CALLS += 1
    if x<2: return x
    return fibm(x-1)+fibm(x-2)

CALLS = 0
print "fib(35) is", fib(35), "and took", CALLS, "calls"
CALLS = 0
print "fibm(35) is", fibm(35), "and took", CALLS, "calls"
Output is:
fib(35) is 9227465 and took 29860703 calls
fibm(35) is 9227465 and took 36 calls
Similar to other answers, however this one works. :)
The important change from the code in the question is how the non-global, non-local get is rebound (through a one-element list rather than a bare assignment); however, I also made some improvements while trying to maintain your *cough*broken*cough* closure use. Usually the cache is a dict instead of a linked list of closures.
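For comparison, a minimal sketch of that dict-based approach (not part of the answer's code above):

def memoize_dict(fn):
    cache = {}                      # maps the args tuple to the result
    def mfun(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    return mfun

Because cache is only mutated, never rebound, no global or nonlocal declaration is needed.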

You want to put global get at the beginning of every function (except get itself).
The def get statement is an assignment to the name get, so you want get to be declared global before that.
Putting global get in mfun and vset makes them work. I can't point to the scoping rule that makes this necessary, but it works ;-)
Your conses are quite lispy too... :)

get is not global, but local to the surrounding function; that's why the global declaration fails.
If you remove the global, it still fails, because you can't assign to the captured variable name. To work around that, you can use an object as the variable captured by your closures and then just change attributes of that object:
class Memo(object):
    pass

def memoize(fn):
    def defaultget(key):
        return (False,)
    memo = Memo()
    memo.get = defaultget
    def vset(key, value):
        oldget = memo.get
        def newget(ky):
            if key==ky: return (True, value)
            return oldget(ky)
        memo.get = newget
    def mfun(*args):
        cache = memo.get(args)
        if cache[0]: return cache[1]
        val = apply(fn, args)
        vset(args, val)
        return val
    return mfun
This way you don't need to assign to the captured variable names but still get what you wanted.
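A quick usage sketch, reusing the Fibonacci definition from the question (Python 2, since this version still uses apply):

def fibm(x):
    if x < 2: return x
    return fibm(x-1) + fibm(x-2)
fibm = memoize(fibm)

print fibm(35)   # 9227465, with the recursive calls served from the cache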

Probably because you want the global get while it isn't a global?
By the way, apply is deprecated, use fn(*args) instead.
def memoize(fn):
    def get(key):
        return (False,)
    def vset(key, value):
        def newget(ky):
            if key==ky: return (True, value)
            return get(ky)
        get = newget
    def mfun(*args):
        cache = get(args)
        if (cache[0]): return cache[1]
        val = fn(*args)
        vset(args, val)
        return val
    return mfun

def fib(x):
    if x<2: return x
    return fib(x-1)+fib(x-2)

def fibm(x):
    if x<2: return x
    return fibm(x-1)+fibm(x-2)

fibm = memoize(fibm)

I think the best way would be:
class Memoized(object):
    def __init__(self, func):
        self.cache = {}
        self.func = func
    def __call__(self, *args):
        if args in self.cache: return self.cache[args]
        else:
            self.cache[args] = self.func(*args)
            return self.cache[args]
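For example, applied to the recursive Fibonacci from the question (a usage sketch, assuming the class above):

@Memoized
def fibm(x):
    if x < 2:
        return x
    return fibm(x-1) + fibm(x-2)

print(fibm(35))   # 9227465; the recursive calls go through __call__ and hit the cache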

Related

Is there something like the threading macro from Clojure in Python?

In Clojure I can do something like this:
(-> path
clojure.java.io/resource
slurp
read-string)
instead of doing this:
(read-string (slurp (clojure.java.io/resource path)))
This is called threading in Clojure terminology and helps getting rid of a lot of parentheses.
In Python, if I try to use functional constructs like map, any, or filter, I have to nest them inside one another. Is there a construct in Python with which I can do something similar to threading (or piping) in Clojure?
I'm not looking for a fully featured version since there are no macros in Python, I just want to do away with a lot of parentheses when I'm doing functional programming in Python.
Edit: I ended up using toolz, which supports piping.
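For reference, the toolz usage looks roughly like this (a sketch, assuming toolz is installed; the helper functions are made up):

from toolz import pipe

def double(x): return 2 * x
def inc(x): return x + 1

pipe(5, double, inc)   # == inc(double(5)) == 11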
Here is a simple implementation of @deceze's idea (although, as @Carcigenicate points out, it is at best a partial solution):
import functools

def apply(x, f): return f(x)

def thread(*args):
    return functools.reduce(apply, args)
For example:
def f(x): return 2*x+1
def g(x): return x**2
thread(5,f,g) #evaluates to 121
I wanted to take this to the extreme and do it all dynamically.
Basically, the below Chain class lets you chain functions together similar to Clojure's -> and ->> macros. It supports both threading into the first and last arguments.
Functions are resolved in this order:
1. Object method
2. Locally defined variable
3. Built-in
The code:
class Chain(object):
    def __init__(self, value, index=0):
        self.value = value
        self.index = index

    def __getattr__(self, item):
        append_arg = True
        try:
            prop = getattr(self.value, item)
            append_arg = False
        except AttributeError:
            try:
                prop = locals()[item]
            except KeyError:
                prop = getattr(__builtins__, item)

        if callable(prop):
            def fn(*args, **kwargs):
                orig = list(args)
                if append_arg:
                    if self.index == -1:
                        orig.append(self.value)
                    else:
                        orig.insert(self.index, self.value)
                return Chain(prop(*orig, **kwargs), index=self.index)
            return fn
        else:
            return Chain(prop, index=self.index)
Thread each result as first arg
file = Chain(__file__).open('r').readlines().value
Thread each result as last arg
result = Chain(range(0, 100), index=-1).map(lambda x: x * x).reduce(lambda x, y: x + y).value

Identifying pure functions in Python

I have a decorator @pure that registers a function as pure, for example:
@pure
def rectangle_area(a,b):
    return a*b

@pure
def triangle_area(a,b,c):
    return ((a+(b+c))*(c-(a-b))*(c+(a-b))*(a+(b-c)))**0.5/4
Next, I want to identify a newly defined pure function
def house_area(a,b,c):
    return rectangle_area(a,b) + triangle_area(a,b,c)
Obviously house_area is pure, since it only calls pure functions.
How can I discover all pure functions automatically (perhaps by using ast)?
Assuming operators are all pure, essentially you only need to check all the function calls. This can indeed be done with the ast module.
First I defined the pure decorator as:
def pure(f):
    f.pure = True
    return f
Adding an attribute that marks the function as pure allows skipping it early, or "forcing" a function to be identified as pure. This is useful if you need a function like math.sin to be treated as pure; it's also necessary because you can't add attributes to built-in functions:
@pure
def sin(x):
    return math.sin(x)
All in all, use the ast module to visit all the nodes. Then, for each Call node, check whether the function being called is pure.
import ast

class PureVisitor(ast.NodeVisitor):
    def __init__(self, visited):
        super().__init__()
        self.pure = True
        self.visited = visited

    def visit_Name(self, node):
        return node.id

    def visit_Attribute(self, node):
        name = [node.attr]
        child = node.value
        while child is not None:
            if isinstance(child, ast.Attribute):
                name.append(child.attr)
                child = child.value
            else:
                name.append(child.id)
                break
        name = ".".join(reversed(name))
        return name

    def visit_Call(self, node):
        if not self.pure:
            return
        name = self.visit(node.func)
        if name not in self.visited:
            self.visited.append(name)
            try:
                callee = eval(name)
                if not is_pure(callee, self.visited):
                    self.pure = False
            except NameError:
                self.pure = False
Then check whether the function has the pure attribute. If not, get its source code and check whether all the function calls in it can be classified as pure.
import inspect, textwrap

def is_pure(f, _visited=None):
    try:
        return f.pure
    except AttributeError:
        pass
    try:
        code = inspect.getsource(f.__code__)
    except AttributeError:
        return False
    code = textwrap.dedent(code)
    node = compile(code, "<unknown>", "exec", ast.PyCF_ONLY_AST)

    if _visited is None:
        _visited = []
    visitor = PureVisitor(_visited)
    visitor.visit(node)
    return visitor.pure
Note that print(is_pure(lambda x: math.sin(x))) doesn't work, since inspect.getsource(f.__code__) returns source on a line-by-line basis, so the source returned by getsource would also include the print and is_pure calls, thus yielding False (unless those functions are overridden).
To verify that it works, test it by doing:
print(is_pure(house_area)) # Prints: True
To list through all the functions in the current module:
import sys, types

for k in dir(sys.modules[__name__]):
    v = globals()[k]
    if isinstance(v, types.FunctionType):
        print(k, is_pure(v))
The visited list keeps track of which functions have already been verified as pure. This helps circumvent problems related to recursion: since the code isn't executed, the visitor would otherwise keep recursively visiting factorial.
@pure
def factorial(n):
    return 1 if n == 1 else n * factorial(n - 1)
Note that you might need to revise the following code, choosing another way to obtain a function from its name:
try:
    callee = eval(name)
    if not is_pure(callee, self.visited):
        self.pure = False
except NameError:
    self.pure = False

Decorator that adds a variable to a closure

I want to write a decorator that injects a custom local variable into a function.
The interface might look like this:
def enclose(name, value):
    ...
    def decorator(func):
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        return wrapper
    return decorator
expectation:
@enclose('param1', 1)
def f():
    param1 += 1
    print param1
f() will compile and run without error
output:
2
Is it possible to do this in Python? Why or why not?
I thought I'd try this out just to see how hard it would be. Pretty hard as it turns out.
The first thing was how to implement this: should the extra parameter be an injected local variable, an additional argument to the function, or a nonlocal variable? An injected local variable will be a fresh object each time, but then there's the question of how to create more complicated objects. An additional argument will record mutations to the object, but assignments to the name will be forgotten between function invocations; additionally, this requires either parsing the source to find where to place the argument, or directly manipulating code objects. Finally, declaring the variables nonlocal will record both mutations to the object and assignments to the name. Effectively a nonlocal is global, but only reachable by the decorated function. Again, using a nonlocal requires parsing the source and finding where to place the nonlocal declaration, or direct manipulation of a code object.
In the end I decided on using a nonlocal variable and parsing the function source. Originally I was going to manipulate code objects, but it seemed too complicated.
Here is the code for the decorator:
import re
import types
import inspect

class DummyInject:
    def __call__(self, **kwargs):
        return lambda func: func
    def __getattr__(self, name):
        return self

class Inject:
    function_end = re.compile(r"\)\s*:\s*\n")
    indent = re.compile("\s+")
    decorator = re.compile("@([a-zA-Z0-9_]+)[.a-zA-Z0-9_]*")
    exec_source = """
def create_new_func({closure_names}):
{func_source}
{indent}return {func_name}"""
    nonlocal_declaration = "{indent}nonlocal {closure_names};"

    def __init__(self, **closure_vars):
        self.closure_vars = closure_vars

    def __call__(self, func):
        lines, line_number = inspect.getsourcelines(func)
        self.inject_nonlocal_declaration(lines)
        new_func = self.create_new_function(lines, func)
        return new_func

    def inject_nonlocal_declaration(self, lines):
        """hides nonlocal declaration in first line of function."""
        function_body_start = self.get_function_body_start(lines)
        nonlocals = self.nonlocal_declaration.format(
            indent=self.indent.match(lines[function_body_start]).group(),
            closure_names=", ".join(self.closure_vars)
        )
        lines[function_body_start] = nonlocals + lines[function_body_start]
        return lines

    def get_function_body_start(self, lines):
        line_iter = enumerate(lines)
        found_function_header = False
        for i, line in line_iter:
            if self.function_end.search(line):
                found_function_header = True
                break
        assert found_function_header
        for i, line in line_iter:
            if not line.strip().startswith("#"):
                break
        return i

    def create_new_function(self, lines, func):
        # prepares source -- eg. making sure indenting is correct
        declaration_indent, body_indent = self.get_indent(lines)
        if not declaration_indent:
            lines = [body_indent + line for line in lines]
        exec_code = self.exec_source.format(
            closure_names=", ".join(self.closure_vars),
            func_source="".join(lines),
            indent=declaration_indent if declaration_indent else body_indent,
            func_name=func.__name__
        )
        # create new func -- mainly only want code object contained by new func
        lvars = {"closure_vars": self.closure_vars}
        gvars = self.get_decorators(exec_code, func.__globals__)
        exec(exec_code, gvars, lvars)
        new_func = eval("create_new_func(**closure_vars)", gvars, lvars)
        # add back bits that enable function to work well
        # includes original global references and
        new_func = self.readd_old_references(new_func, func)
        return new_func

    def readd_old_references(self, new_func, old_func):
        """Adds back globals, function name and source reference."""
        func = types.FunctionType(
            code=self.add_src_ref(new_func.__code__, old_func.__code__),
            globals=old_func.__globals__,
            name=old_func.__name__,
            argdefs=old_func.__defaults__,
            closure=new_func.__closure__
        )
        func.__doc__ = old_func.__doc__
        return func

    def add_src_ref(self, new_code, old_code):
        return types.CodeType(
            new_code.co_argcount,
            new_code.co_kwonlyargcount,
            new_code.co_nlocals,
            new_code.co_stacksize,
            new_code.co_flags,
            new_code.co_code,
            new_code.co_consts,
            new_code.co_names,
            new_code.co_varnames,
            old_code.co_filename,  # reuse filename
            new_code.co_name,
            old_code.co_firstlineno,  # reuse line number
            new_code.co_lnotab,
            new_code.co_freevars,
            new_code.co_cellvars
        )

    def get_decorators(self, source, global_vars):
        """Creates a namespace for exec function creation in. Must remove
        any reference to Inject decorator to prevent infinite recursion."""
        namespace = {}
        for match in self.decorator.finditer(source):
            decorator = eval(match.group()[1:], global_vars)
            basename = match.group(1)
            if decorator is Inject:
                namespace[basename] = DummyInject()
            else:
                namespace[basename] = global_vars[basename]
        return namespace

    def get_indent(self, lines):
        """Takes a set of lines used to create a function and returns the
        outer indentation that the function is declared in and the inner
        indentation of the body of the function."""
        body_indent = None
        function_body_start = self.get_function_body_start(lines)
        for line in lines[function_body_start:]:
            match = self.indent.match(line)
            if match:
                body_indent = match.group()
                break
        assert body_indent
        match = self.indent.match(lines[0])
        if not match:
            declaration_indent = ""
        else:
            declaration_indent = match.group()
        return declaration_indent, body_indent

if __name__ == "__main__":
    a = 1

    @Inject(b=10)
    def f(c, d=1000):
        "f uses injected variables"
        return a + b + c + d

    @Inject(var=None)
    def g():
        """Purposefully generate exception to show stacktraces are still
        meaningful."""
        create_name_error  # line number 164

    print(f(100))  # prints 1111
    assert f(100) == 1111
    assert f.__doc__ == "f uses injected variables"  # show doc is retained
    try:
        g()
    except NameError:
        raise
    else:
        assert False
    # stack trace shows NameError on line 164
Which outputs the following:
1111
Traceback (most recent call last):
  File "inject.py", line 171, in <module>
    g()
  File "inject.py", line 164, in g
    create_name_error # line number 164
NameError: name 'create_name_error' is not defined
The whole thing is hideously ugly, but it works. It's also worth noting that if Inject is used on a method, then any injected values are shared between all instances of the class.
You can do it using globals but I don't recommend this approach.
def enclose(name, value):
    globals()[name] = value
    def decorator(func):
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        return wrapper
    return decorator

@enclose('param1', 1)
def f():
    global param1
    param1 += 1
    print(param1)

f()

How does compile() work in Python?

I have two pieces of code which really confuse me.
def get_context():
    __gc = globals()
    __lc = locals()
    def precompiler(code):
        exec code in __lc
    def compiler(script, scope):
        return compile(script, scope, 'eval')
    def executor(expr):
        return eval(expr, __gc, __lc)
    return precompiler, compiler, executor
maker1, compiler1, executor1 = get_context()
maker2, compiler2, executor2 = get_context()
maker1("abc = 123")
maker2("abc = 345")
expr1 = compiler1("abc == 123", "test.py")
print "executor1(abc == 123):", executor1(expr1)
print "executor2(abc == 123):", executor2(expr1)
the result is:
executor1(abc == 123): True
executor2(abc == 123): False
Why is the code compiled only once, inside one closure, yet the resulting byte code can run in both?
And here is another piece of code:
def get_context():
    __gc = globals()
    __lc = locals()
    test_var = 123
    def compiler(script, scope):
        return compile(script, scope, 'eval')
    def executor(expr):
        return eval(expr, __gc, __lc)
    return compiler, executor
compiler1, executor1 = get_context()
compiler2, executor2 = get_context()
expr1 = compiler1("test_var == 123", "test.py")
print "executor1(test_var == 123):", executor1(expr1)
print "executor2(test_var == 123):", executor2(expr1)
the result is:
NameError: name 'test_var' is not defined
And how did this happen?
Why does compile need to check the environment (variables and so on) of the closure when it is not dependent on the closure? This is what confuses me!
In your first example, you are executing 'abc = 123' in your first context and 'abc = 345' in your second context, so 'abc == 123' is true in your first context and false in your second context.
In your second example, you have caught an interesting situation where the interpreter has removed test_var from the context because test_var isn't referenced.
For your first question: compile just takes the Python code and produces the byte code. It is not dependent in any way on the closure where you compiled it. It's no different than if you had produced, say, a string: that string isn't permanently tied to the function where it was created, and neither is the code object.
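To illustrate that point, the same compiled expression can be evaluated against completely different namespaces (a small sketch, separate from the question's code):

code = compile("x * 2", "<demo>", "eval")
print eval(code, {"x": 3})    # 6
print eval(code, {"x": 10})   # 20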
For your second question: locals() builds a dictionary of the local variables at the moment it is called. Since you set up test_var after calling locals(), the dictionary doesn't have it. If you want test_var inside the dictionary, you need to call locals() after the assignment.
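A small sketch of that snapshot behaviour:

def get_context():
    __lc = locals()     # snapshot taken here
    test_var = 123      # assigned after the snapshot, so __lc never sees it
    return __lc

print "test_var" in get_context()   # False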

Real-world examples of nested functions

I asked previously how nested functions work, but unfortunately I still don't quite get it. To understand them better, can someone please show some real-world, practical usage examples of nested functions?
Many thanks
Your question made me curious, so I looked in some real-world code: the Python standard library. I found 67 examples of nested functions. Here are a few, with explanations.
One very simple reason to use a nested function is simply that the function you're defining doesn't need to be global, because only the enclosing function uses it. A typical example from Python's quopri.py standard library module:
def encode(input, output, quotetabs, header = 0):
    ...
    def write(s, output=output, lineEnd='\n'):
        # RFC 1521 requires that the line ending in a space or tab must have
        # that trailing character encoded.
        if s and s[-1:] in ' \t':
            output.write(s[:-1] + quote(s[-1]) + lineEnd)
        elif s == '.':
            output.write(quote(s) + lineEnd)
        else:
            output.write(s + lineEnd)
    ... # 35 more lines of code that call write in several places
Here there was some common code within the encode function, so the author simply factored it out into a write function.
Another common use for nested functions is re.sub. Here's some code from the json/encode.py standard library module:
def encode_basestring(s):
    """Return a JSON representation of a Python string
    """
    def replace(match):
        return ESCAPE_DCT[match.group(0)]
    return '"' + ESCAPE.sub(replace, s) + '"'
Here ESCAPE is a regular expression, and ESCAPE.sub(replace, s) finds all matches of ESCAPE in s and replaces each one with replace(match).
In fact, any API, like re.sub, that accepts a function as a parameter can lead to situations where nested functions are convenient. For example, in turtle.py there's some silly demo code that does this:
def baba(xdummy, ydummy):
    clearscreen()
    bye()

...
tri.write("  Click me!", font = ("Courier", 12, "bold") )
tri.onclick(baba, 1)
onclick expects you to pass an event-handler function, so we define one and pass it in.
Decorators are a very popular use for nested functions. Here's an example of a decorator that prints a statement before and after any call to the decorated function.
def entry_exit(f):
    def new_f(*args, **kwargs):
        print "Entering", f.__name__
        f(*args, **kwargs)
        print "Exited", f.__name__
    return new_f

@entry_exit
def func1():
    print "inside func1()"

@entry_exit
def func2():
    print "inside func2()"

func1()
func2()
print func1.__name__
Nested functions avoid cluttering other parts of the program with other functions and variables that only make sense locally.
A function that return Fibonacci numbers could be defined as follows:
>>> def fib(n):
        def rec():
            return fib(n-1) + fib(n-2)
        if n == 0:
            return 0
        elif n == 1:
            return 1
        else:
            return rec()

>>> map(fib, range(10))
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
EDIT: In practice, generators would be a better solution for this, but the example shows how to take advantage of nested functions.
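For completeness, a sketch of the generator approach the edit alludes to (my addition, not part of the original answer):

from itertools import islice

def fib_gen():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

list(islice(fib_gen(), 10))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]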
They are useful when using functions that take other functions as input. Say you're in a function, and want to sort a list of items based on the items' value in a dict:
import random

def f(items):
    vals = {}
    for i in items: vals[i] = random.randint(0,100)
    def key(i): return vals[i]
    items.sort(key=key)
You can just define key right there and have it use vals, a local variable.
Another use-case is callbacks.
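For instance, a rough sketch of a callback (the widget API here is made up purely for illustration):

def make_click_counter(button):
    count = [0]                      # mutable cell the nested handler closes over
    def on_click(event):
        count[0] += 1
        print "clicked %d times" % count[0]
    button.bind_click(on_click)      # hypothetical registration call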
I have only had to use nested functions when creating decorators. A nested function is basically a way of adding some behavior to a function without knowing what the function is that you are adding behavior to.
from functools import wraps
from types import InstanceType

def printCall(func):
    def getArgKwargStrings(*args, **kwargs):
        argsString = "".join(["%s, " % (arg) for arg in args])
        kwargsString = "".join(["%s=%s, " % (key, value) for key, value in kwargs.items()])
        if not len(kwargs):
            if len(argsString):
                argsString = argsString[:-2]
        else:
            kwargsString = kwargsString[:-2]
        return argsString, kwargsString

    @wraps(func)
    def wrapper(*args, **kwargs):
        ret = None
        if args and isinstance(args[0], InstanceType) and getattr(args[0], func.__name__, None):
            instance, args = args[0], args[1:]
            argsString, kwargsString = getArgKwargStrings(*args, **kwargs)
            ret = func(instance, *args, **kwargs)
            print "Called %s.%s(%s%s)" % (instance.__class__.__name__, func.__name__, argsString, kwargsString)
            print "Returned %s" % str(ret)
        else:
            argsString, kwargsString = getArgKwargStrings(*args, **kwargs)
            ret = func(*args, **kwargs)
            print "Called %s(%s%s)" % (func.__name__, argsString, kwargsString)
            print "Returned %s" % str(ret)
        return ret
    return wrapper

def sayHello(name):
    print "Hello, my name is %s" % (name)

if __name__ == "__main__":
    sayHelloAndPrintDebug = printCall(sayHello)
    name = "Nimbuz"
    sayHelloAndPrintDebug(name)
Ignore all the mumbo jumbo in the "printCall" function for right now and focus only on the "sayHello" function and below. What we're doing here is that we want to print out how the "sayHello" function was called every time it is called, without knowing or altering what the "sayHello" function does. So we redefine the "sayHello" function by passing it to "printCall", which returns a NEW function that does what the "sayHello" function does AND prints how the "sayHello" function was called. This is the concept of decorators.
Putting "@printCall" above the sayHello definition accomplishes the same thing:
@printCall
def sayHello(name):
    print "Hello, my name is %s" % (name)

if __name__ == "__main__":
    name = "Nimbuz"
    sayHello(name)
Yet another (very simple) example. A function that returns another function. Note how the inner function (that is returned) can use variables from the outer function's scope.
def create_adder(x):
    def _adder(y):
        return x + y
    return _adder

add2 = create_adder(2)
add100 = create_adder(100)

>>> add2(50)
52
>>> add100(50)
150
Python Decorators
This is actually another topic to learn, but if you look at the stuff on 'Using Functions as Decorators', you'll see some examples of nested functions.
OK, besides decorators: say you had an application where you needed to sort a list of strings based on substrings which varied from time to time. Now the sorted function takes a key= argument, which is a function of one argument: the items (strings in this case) to be sorted. So how do you tell this function which substrings to sort on? A closure, or nested function, is perfect for this:
def sort_key_factory(start, stop):
    def sort_key(string):
        return string[start: stop]
    return sort_key
Simple eh? You can expand on this by encapsulating start and stop in a tuple or a slice object and then passing a sequence or iterable of these to the sort_key_factory.
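A quick usage sketch:

words = ["apple", "banana", "cherry"]
words.sort(key=sort_key_factory(1, 3))   # sort by the substring at positions 1..3
print(words)   # ['banana', 'cherry', 'apple']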
