Python lambda capturing external variables - python

Consider the following example:
test = 123
f = lambda x,test=test: print(x, test)
del test
f('hello')
prints
hello 123
By capturing the variable in the lambda definition, the original variable seems to be retained.
Is the lambda ok to use when you could also use a simple object to store the same data?
class Test:
    def __init__(self, some_data):
        self.some_data = some_data
    def f(self, x):
        print(x, self.some_data)
t = Test('123')
t.f('hello')

By capturing the variable in the lambda definition, the original variable seems to be retained.
Because you made it the default value for the test argument of the lambda, and Python only evaluates argument defaults once, when the function is created. FWIW, this has nothing to do with the use of the lambda keyword - lambda is just syntactic sugar and you'd get exactly the same result with a "full-blown" function, i.e.:
test = 123
def f(x, test=test):
    print(x, test)
test = 456
f('hello')
Is the lambda ok to use when you could also use a simple object to store the same data?
Oranges and apples, really. A function is not meant to be used to "store" anything, but to perform some computation. The fact that it can capture some values, either via argument defaults or via a closure, while quite useful for some use cases, is not meant as a replacement for proper objects or collections. So the answer is: it depends on what you want to do with those values and your function.
NB: technically what you're doing here is (almost) what is known as "partial application" (you'd just have to rename Test.f to Test.__call__ and use t("hello") in the second example). A partial application of a function is the mechanism by which a function of N arguments, when called with N - x arguments (with 0 < x < N), returns a function of x arguments that, when called with the missing arguments, returns the result of the original function, i.e.:
def foo(a, b=None):
    if b is None:
        # only `a` was given: return a function that still expects `b`
        return lambda b: foo(a, b)
    return a + b
# here, `f` is a partial application of function `foo` to `1`
f = foo(1)
print(f)
print(f(2))
In this case we use a closure to capture a, in the functional programming tradition - "partial application" is also mostly a functional programming concept FWIW. Now while it does support some FP features and idioms, Python is first and foremost an OO language, and since closures are the FP equivalent of objects (closures are a way to encapsulate state and behaviour together), it also makes sense to implement partial application as a proper class, either with "ad hoc" specialized objects (your Test class but also Method objects), or with a more generic "partial" class - which already exists in the stdlib as functools.partial.
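As a minimal sketch of the functools.partial route mentioned above (the two-argument foo and the name add_one are only illustrative, not part of the original question):

```python
from functools import partial

def foo(a, b):
    return a + b

# Bind the first positional argument; the result is a callable that expects `b`.
add_one = partial(foo, 1)
result = add_one(2)          # equivalent to foo(1, 2)

# A partial object keeps references to what it wraps.
wrapped = add_one.func       # the original function
bound = add_one.args         # the positional arguments bound so far
print(result, wrapped is foo, bound)
```

Unlike the lambda with a default argument, the bound value here is stored as explicit, inspectable state on the partial object.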

The default value of an argument is evaluated once, when the function or lambda is defined. It is not evaluated at each call site. (This is why you usually should not use [] as a default argument; instead you specify None and write code that allocates a fresh list each time the function is called.)
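To illustrate the point about list defaults, here is a small sketch (the function names append_shared and append_fresh are made up for this example):

```python
def append_shared(item, bucket=[]):
    # The default list is created once, at definition time,
    # so every call without `bucket` reuses the same object.
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    # None is the sentinel; a new list is allocated on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

first = append_shared(1)
second = append_shared(2)    # same list object as `first`
third = append_fresh(1)
fourth = append_fresh(2)     # independent of `third`
print(second, third, fourth)
```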
It really depends on the surrounding code if a lambda, function, or class is appropriate. Usually, if there is only one operation and the state cannot be mutated directly from the outside (only within the operation), a lambda or function is appropriate.

Related

How to map function on all argument values, as a list? but have explicit argument names in the function definition

I want to define a function using explicit argument names ff(a,b,c) in the function definition, but I also want to map a function over all arguments to get a list:
def ff(a, b, c):
    return list(map(myfunc, [a, b, c]))
However, I don't want to explicitly write parameter names inside function as a,b,c. I want to do it like
def ff(a, b, c):
    return list(map(myfunc, getArgValueList()))
getArgValueList() will retrieve the argument values in order and form a list. How to do this? Is there a built-in function like getArgValueList()?
What you're trying to do is impossible without ugly hacks. You either take *args and get a sequence of parameter values that you can use as args:
def ff(*args):
    return list(map(myfunc, args))
… or you take three explicit parameters and use them by name:
def ff(a, b, c):
    return list(map(myfunc, (a, b, c)))
… but it's one or the other, not both.
Of course you can put those values in a sequence yourself if you want:
def ff(a, b, c):
    args = a, b, c
    return list(map(myfunc, args))
… but I'm not sure what that buys you.
If you really want to know how to write a getArgValueList function anyway, I'll explain how to do it. However, if you're looking to make your code more readable, more efficient, more idiomatic, easier to understand, more concise, or almost anything else, it will have the exact opposite effect. The only reason I could imagine doing something like this is if you had to generate functions dynamically or something—and even then, I can't think of a reason you couldn't just use *args. But, if you insist:
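import inspect
def getArgValueList():
    # frame of the function that called getArgValueList
    frame = inspect.currentframe().f_back
    code = frame.f_code
    # the first co_argcount local names are the positional parameters
    vars = code.co_varnames[:code.co_argcount]
    return [frame.f_locals[var] for var in vars]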
If you want to know how it works, most of it's in the inspect module docs:
currentframe() gets the current frame—the frame of getArgValueList.
f_back gets the parent frame—the frame of whoever called getArgValueList.
f_code gets the code object compiled from the function body of whoever called getArgValueList.
co_varnames is a list of all local variables in that body, starting with the parameters.
co_argcount is a count of explicit positional-or-keyword parameters.
f_locals is a dict with a copy of the locals() environment of the frame.
This of course only works for a function that takes no *args, keyword-only args, or **kwargs, but you can extend it to work for them as well with a bit of work. (See co_kwonlyargcount, co_flags, CO_VARARGS, and CO_VARKEYWORDS for details.)
Also, this only works for CPython, not most other interpreters, and it could break in some future version, because it's pretty blatantly relying on implementation details of the interpreter.
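For what it's worth, the standard library's inspect.getargvalues performs a similar lookup on a frame, so a variant of the hack can lean on it instead of reading co_varnames directly. This is only a sketch with the same caveats as above: the names get_arg_value_list and myfunc are placeholders, and it still depends on inspect.currentframe, which is an implementation detail.

```python
import inspect

def get_arg_value_list():
    # Frame of whoever called this helper (may be None on non-CPython interpreters).
    frame = inspect.currentframe().f_back
    info = inspect.getargvalues(frame)
    # info.args lists the named parameters in declaration order;
    # info.locals maps names to their current values in that frame.
    return [info.locals[name] for name in info.args]

def myfunc(x):  # hypothetical stand-in for the question's myfunc
    return x + 1

def ff(a, b, c):
    return list(map(myfunc, get_arg_value_list()))

result = ff(1, 2, 3)
print(result)
```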
The *args construction will give you the arguments as a tuple:
>>> def f(*args): return list(map(lambda x:x+1, args))
>>> f(1,2,3)
[2, 3, 4]
If you are bound to the signature of f, you'll have to use the inspect module:
import inspect
def f(a, b, c):
    f_locals = locals()
    values = [f_locals[name] for name in inspect.signature(f).parameters]
    return list(map(lambda x: x+1, values))
inspect.signature(f).parameters gives you the parameter names in the correct order (it is an ordered mapping keyed by name). The values are in locals().

Confused about the lambda expression in python

I understand the normal lambda expression, such as
g = lambda x: x**2
However, for some complex ones, I am a little confused about them. For example:
for split in ['train', 'test']:
    sets = (lambda split=split: newspaper(split, newspaper_devkit_path))
def get_imdb():
    return sets()
Where newspaper is a function. I was wondering what actually the sets is and why the get_imdb function can return the value sets()
Thanks for your help!
Added:
The code is actually from here: factory.py
sets is being assigned a lambda that is not really supposed to accept inputs, which you see from the way it is invoked. Lambdas in general behave like normal functions, and can therefore be assigned to variables like g or sets. The definition of sets is surrounded by an extra set of parentheses for no apparent reason. You can ignore those outer parens.
Lambdas can have all the same types of positional, keyword and default arguments a normal function can. The lambda sets has a default parameter named split. This is a common idiom to ensure that sets in each iteration of the loop gets the value of split corresponding to that iteration rather than just the one from the last iteration in all cases.
Without a default parameter, split would be evaluated within the lambda based on the namespace at the time it was called. Once the loop completes, split in the outer function's namespace will just be the last value it had for the loop.
Default parameters are evaluated immediately when a function object is created. This means that the value of the default parameter split will be wherever it is in the iteration of the loop that creates it.
Your example is a bit misleading because it discards all the actual values of sets besides the last one, making the default parameter to the lambda meaningless. Here is an example illustrating what happens if you keep all the lambdas. First with the default parameter:
sets = []
for split in ['train', 'test']:
    sets.append(lambda split=split: split)
print([fn() for fn in sets])
I have truncated the lambdas to just return their input parameter for purposes of illustration. This example will print ['train', 'test'], as expected.
If you do the same thing without the default parameter, the output will be ['test', 'test'] instead:
sets = []
for split in ['train', 'test']:
    sets.append(lambda: split)
print([fn() for fn in sets])
This is because 'test' is the value of split by the time any of the lambdas is actually called.
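Besides the default-parameter idiom, two other common ways to bind the loop value at creation time are functools.partial and a small factory function. The sketch below uses made-up helper names (identity, make_getter) purely for illustration:

```python
from functools import partial

def identity(split):
    return split

def make_getter(value):
    # Each call creates a new scope, so each lambda closes over its own `value`.
    return lambda: value

with_partial = []
with_factory = []
for split in ['train', 'test']:
    with_partial.append(partial(identity, split))   # value bound immediately
    with_factory.append(make_getter(split))

partial_results = [fn() for fn in with_partial]
factory_results = [fn() for fn in with_factory]
print(partial_results, factory_results)
```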
A lambda function:
func = lambda x: x**2
can be rewritten almost equivalently:
def func(x):
    return x**2
Using either way, you can call the function in this manner:
func(4)
In your example,
sets = lambda split=split: newspaper(split, newspaper_devkit_path)
can be rewritten:
def sets(split=split):
    return newspaper(split, newspaper_devkit_path)
and so can be called:
sets()
When you write the following:
def get_imdb():
    return sets()
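you are defining a function that refers to the name sets from the surrounding scope (when that scope is itself a function, this is called a "closure"). The name sets is looked up each time get_imdb is called, so the function it refers to can be called later, wherever get_imdb is called.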
Maybe you are confused about the split=split part. This has the same meaning as it would have in a regular function: the split on the left is a parameter of the lambda function and the split on the right is the default value the left split takes when no value is provided. In this case, the default value would be the variable split defined in the for loop.
So, answering your first question (what is sets?):
sets is a variable to which an anonymous function (or lambda function) is assigned. This allows the lambda function to be referenced and used via the variable sets.
To your second question (why can sets() be returned?), I respond:
Since sets is a variable that acts as a function, adding parentheses after it calls the lambda function. Because no parameters are given, the parameter split takes the value 'test', which is the last value the for loop variable split takes. It is worth noting here that, since sets is not defined inside the function get_imdb, the interpreter looks for a definition of sets outside the scope of get_imdb (and finds the one that refers to the lambda function).
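The default value discussed above is stored on the function object itself, which can be checked through its __defaults__ attribute. In this sketch the lambda simply returns its parameter, since newspaper is not available here:

```python
for split in ['train', 'test']:
    sets = lambda split=split: split

# Rebinding the outer name afterwards does not affect the stored default.
split = 'changed later'

stored_defaults = sets.__defaults__   # tuple of default values, captured at definition
without_argument = sets()             # falls back to the stored default
with_argument = sets('validation')    # an explicit argument overrides it
print(stored_defaults, without_argument, with_argument)
```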

Caching in python using *args and lambda functions

I recently attempted Google's foo.bar challenge. After my time was up I decided to try to find a solution to the problem I couldn't do and found a solution here (includes the problem statement if you're interested). I'd previously been making a dictionary for every function I wanted to cache but it looks like in this solution any function/input can be cached using the same syntax.
Firstly, I'm confused about how the code is even working: the *args variable isn't passed as an argument (and prints as empty). Here's a modified minimal example to illustrate my confusion:
mem = {}
def memoize(key, func, *args):
    """
    Helper to memoize the output of a function
    """
    print(args)
    if key not in mem:
        # store the output of the function in memory
        mem[key] = func(*args)
    return mem[key]
def example(n):
    return memoize(
        n,
        lambda: longrun(n),
    )
def example2(n):
    return memoize(
        n,
        longrun(n),
    )
def longrun(n):
    for i in range(10000):
        for j in range(100000):
            2**10
    return n
Here I use the same memoize function but with a print. The function example returns memoize(n, a lambda function,). The function longrun is just an identity function with lots of useless computation so it's easy to see if the cache is working (example(2) will take ~5 seconds the first time and be almost instant after).
Here are my confusions:
Why is the third argument of memoize empty? When args is printed in memoize it prints (). Yet somehow mem[key] stores func(*args) as func(key)?
Why does this behavior only work when using the lambda function (example will cache but example2 won't)? I thought lambda: longrun(n) is just a short way of giving as input a function which returns longrun(n).
As a bonus, does anyone know how you could memoize functions using a decorator?
Also I couldn't think of a more descriptive title, edits welcome. Thanks.
The notation *args stands for a variable number of positional arguments. For example, print can be used as print(1), print(1, 2), print(1, 2, 3) and so on. Similarly, **kwargs stands for a variable number of keyword arguments.
Note that the names args and kwargs are just a convention - it's the * and ** symbols that make them variadic.
Anyways, memoize uses this to accept basically any input to func. If the result of func isn't cached, it's called with the arguments. In a function call, *args is basically the reverse of *args in a function definition. For example, the following are equivalent:
# provide *args explicitly
print(1, 2, 3)
# unpack iterable to *args
arguments = 1, 2, 3
print(*arguments)
If args is empty, then calling print(*args) is the same as calling print() - no arguments are passed to it.
Functions and lambda functions are the same in python. It's simply a different notation for creating a function object.
The problem is that in example2, you are not passing a function. You call a function, then pass on its result. Instead, you have to pass on the function and its argument separately.
def example2(n):
    return memoize(
        n,
        longrun,  # no () means no call, just the function object
        # all following parameters are put into *args
        n
    )
Now, some implementation details: why is args empty and why is there a separate key?
The empty args comes from your definition of the lambda. Let's write that as a function for clarity:
def example3(n):
    def nonlambda():
        return longrun(n)
    return memoize(n, nonlambda)
Note how nonlambda takes no arguments. The parameter n is bound from the containing scope as a closure. As such, you don't have to pass it to memoize - it is already bound inside nonlambda. Thus, args is empty in memoize, even though longrun does receive a parameter, because the two don't interact directly.
Now, why is it mem[key] = f(*args), not mem[key] = f(key)? That's actually slightly the wrong question; the right question is "why isn't it mem[f, args] = f(*args)?".
Memoization works because the same input to the same function leads to the same output. That is, f, args identifies your output. Ideally, your key would be f, args as that's the only relevant information.
The problem is you need a way to look up f and args inside mem. If you ever tried putting a list inside a dict, you know there are some types which don't work in mappings (or any other suitable lookup structure, for that matter). So if you define key = f, args, you cannot memoize functions taking mutable/unhashable types. Python's functools.lru_cache actually has this limitation.
Defining an explicit key is one way of solving this problem. It has the advantage that the caller can select an appropriate key, for example taking n without any modifications. This offers the best optimization potential. However, it breaks easily - using just n misses out the actual function called. Memoizing a second function with the same input would break your cache.
There are alternative approaches, each with pros and cons. Common is the explicit conversion of types: list to tuple, set to frozenset, and so on. This is slow, but the most precise. Another approach is to just call str or repr as in key = repr((f, args, sorted(kwargs.items()))), but it relies on every value having a proper repr.
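On the bonus question about decorators, a minimal sketch could look like the following. It keeps one cache per decorated function (so two functions with the same input no longer collide) and uses the argument tuple as the key, which means it only works for hashable positional arguments. The names memoize, slow_identity and calls are illustrative; functools.lru_cache is the ready-made stdlib equivalent.

```python
import functools

def memoize(func):
    cache = {}  # private to this decorated function

    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args):
        # args is a tuple, so it can be a dict key if its items are hashable
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

calls = []  # records real invocations, to show the cache is hit

@memoize
def slow_identity(n):
    calls.append(n)
    return n

slow_identity(2)
slow_identity(2)  # served from the cache, no second real call

@functools.lru_cache(maxsize=None)  # stdlib alternative with the same hashability limit
def cached_square(n):
    return n * n

print(calls, cached_square(3))
```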

Why doesn't functools.partial return a real function (and how to create one that does)?

So I was playing around with currying functions in Python and one of the things that I noticed was that functools.partial returns a partial object rather than an actual function. One of the things that annoyed me about this was that if I did something along the lines of:
five = partial(len, 'hello')
five('something')
then we get
TypeError: len() takes exactly 1 argument (2 given)
but what I want to happen is
TypeError: five() takes no arguments (1 given)
Is there a clean way to make it work like this? I wrote a workaround, but it's too hacky for my taste (doesn't work yet for functions with varargs):
def mypartial(f, *args):
    argcount = f.func_code.co_argcount - len(args)
    params = ''.join('a' + str(i) + ',' for i in xrange(argcount))
    code = '''
def func(f, args):
    def %s(%s):
        return f(*(args+(%s)))
    return %s
''' % (f.func_name, params, params, f.func_name)
    exec code in locals()
    return func(f, args)
Edit: I think it might be helpful if I added more context. I'm writing a decorator that will automatically curry a function like so:
@curry
def add(a, b, c):
    return a + b + c
f = add(1, 2) # f is a function
assert f(5) == 8
I want to hide the fact that f was created from a partial (maybe a bad idea :P). The TypeError message above is one example of where the fact that something is a partial can be revealed. I want to change that.
This needs to be generalizable so EnricoGiampieri's and mgilson's suggestions only work in that specific case.
You definitely don't want to do this with exec.
You can find recipes for partial in pure Python, such as this one—many of them are mislabeled as curry recipes, so look for that as well. At any rate, these will show you the proper way to do it without exec, and you can just pick one and modify it to do what you want.
Or you could just wrap partial…
However, whatever you do, there's no way the wrapper can know that it's defining a function named "five"; that's just the name of the variable you store the function in. So if you want a custom name, you'll have to pass it in to the function:
five = my_partial('five', len, 'hello')
At that point, you have to wonder why this is any better than just defining a new function.
However, I don't think this is what you actually want anyway. Your ultimate goal is to define a @curry decorator that creates a curried version of the decorated function, with the same name (and docstring, arg list, etc.) as the decorated function. The whole idea of replacing the name of the intermediate partial is a red herring; use functools.wraps properly inside your curry function, and it won't matter how you define the curried function, it'll preserve the name of the original.
In some cases, functools.wraps doesn't work. And in fact, this may be one of those times—you need to modify the arg list, for example, so curry(len) can take either 0 or 1 parameter instead of requiring 1 parameter, right? See update_wrapper, and the (very simple) source code for wraps and update_wrapper to see how the basics work, and build from there.
Expanding on the previous: To curry a function, you pretty much have to return something that takes (*args) or (*args, **kw) and parse the args explicitly, and possibly raise TypeError and other appropriate exceptions explicitly. Why? Well, if foo takes 3 params, curry(foo) takes 0, 1, 2, or 3 params, and if given 0-2 params it returns a function that takes 0 through n-1 params.
The reason you might want **kw is that it allows callers to specify params by name—although then it gets much more complicated to check when you're done accumulating arguments, and arguably this is an odd thing to do with currying—it may be better to first bind the named params with partial, then curry the result and pass in all remaining params in curried style…
If foo has default-value or keyword args, it gets even more complicated, but even without those problems, you already need to deal with this problem.
For example, let's say you implement curry as a class that holds the function and all already-curried parameters as instance members. Then you'll have something like this:
def __call__(self, *args):
    if len(args) + len(self.curried_args) > self.fn.func_code.co_argcount:
        raise TypeError('%s() takes exactly %d arguments (%d given)' %
                        (self.fn.func_name, self.fn.func_code.co_argcount,
                         len(args) + len(self.curried_args)))
    self.curried_args += args
    if len(self.curried_args) == self.fn.func_code.co_argcount:
        return self.fn(*self.curried_args)
    else:
        return self
This is horribly oversimplified, but it shows how to handle the basics.
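For comparison, here is a rough Python 3 sketch of the same idea written as a decorator. It is not the asker's implementation; it assumes the decorated function takes only plain positional parameters (no defaults, *args or **kwargs), and it returns a fresh callable at each step instead of mutating shared state. functools.wraps makes the intermediate callables report the original name.

```python
import functools
import inspect

def curry(fn):
    # Number of parameters, assuming only plain positional ones.
    arity = len(inspect.signature(fn).parameters)

    def accumulate(collected):
        @functools.wraps(fn)  # intermediate callables keep fn's name and docstring
        def curried(*args):
            combined = collected + args
            if len(combined) > arity:
                raise TypeError('%s() takes %d arguments (%d given)'
                                % (fn.__name__, arity, len(combined)))
            if len(combined) == arity:
                return fn(*combined)
            # Not enough arguments yet: return a new callable holding them.
            return accumulate(combined)
        return curried

    return accumulate(())

@curry
def add(a, b, c):
    return a + b + c

f = add(1, 2)  # f is a function waiting for one more argument
print(f(5), add(1)(2)(3), f.__name__)
```

Because each step builds a new tuple, f can be reused with different final arguments without earlier calls leaking into later ones.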
My guess is that partial just delays the execution of the function; it does not create a whole new function out of it.
My guess is that it is just easier to define a new function directly in place:
def five(): return len('hello')
This is a very simple line, won't clutter your code and is quite clear, so I wouldn't bother writing a function to replace it, especially if you don't need this in a large number of cases.

What does "lambda" mean in Python, and what's the simplest way to use it?

Can you give an example and other examples that show when and when not to use Lambda?
My book gives me examples, but they're confusing.
Lambda, which originated from Lambda Calculus and (AFAIK) was first implemented in Lisp, is basically an anonymous function: a function which doesn't have a name and is used in-line. You can also assign an identifier to a lambda function in a single expression, like this:
>>> addTwo = lambda x: x+2
>>> addTwo(2)
4
This binds the name addTwo to the anonymous function, which accepts one argument x and adds 2 to it. The body of a lambda is a single expression whose value is returned automatically, so there's no return keyword.
The code above is roughly equivalent to:
>>> def addTwo(x):
...     return x+2
...
>>> addTwo(2)
4
Except you're not using a function definition, you're assigning an identifier to the lambda.
The best place to use them is when you don't really want to define a function with a name, possibly because that function will only be used once. If it will be used numerous times, you would be better off with a function definition.
Example of a hash (a dict) using lambdas:
>>> mapTree = {
...     'number': lambda x: x**x,
...     'string': lambda x: x[1:]
... }
>>> otype = 'number'
>>> mapTree[otype](2)
4
>>> otype = 'string'
>>> mapTree[otype]('foo')
'oo'
In this example I don't really want to define a name to either of those functions because I'll only use them within the hash, therefore I'll use lambdas.
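Another typical "use once, in place" situation is passing a small function as the key argument of sorted. The word list below is just sample data for the sketch:

```python
words = ['pear', 'Fig', 'banana']

# Throwaway functions passed inline; neither needs a name of its own.
case_insensitive = sorted(words, key=lambda w: w.lower())
by_length = sorted(words, key=lambda w: len(w))  # key=len would work here too

# The same logic as a named function, which is preferable once it is reused.
def lowercase_key(w):
    return w.lower()

same_as_lambda = sorted(words, key=lowercase_key)
print(case_insensitive, by_length)
```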
I do not know which book you are using, but Dive into Python has a section which I think is informative.
Use of lambda is sort of a style thing. When you can get away with a very simple function, and usually where you are just storing it somewhere (in a list of functions perhaps, or in a GUI toolkit data structure, etc.) people feel lambda reduces clutter in their code.
In Python it is only possible to make a lambda that returns a single expression, and the lambda cannot span multiple lines (unless you join the multiple lines by using the backslash-at-the-end-of-a-line trick). People have requested that Python add improvements for lambda, but it hasn't happened. As I understand it, the changes to make lambda able to write any function would significantly complicate the parsing code in Python. And, since we already have def to define a function, the gain is not considered worth the complication. So there are some cases where you might wish to use lambda where it is not possible. In that case, you can just use a def:
object1.register_callback_function(lambda x: x.foo() > 3)
def fn(x):
    if x.foo() > 3:
        x.recalibrate()
        return x.value() > 9
    elif x.bar() > 3:
        x.do_something_else()
        return x.other_value < 0
    else:
        x.whatever()
        return True
object2.register_callback_function(fn)
del(fn)
The first callback function was simple and a lambda sufficed. For the second one, it is simply not possible to use a lambda. We achieve the same effect by using def and making a function object that is bound to the name fn, and then passing fn to register_callback_function(). Then, just to show we can, we call del() on the name fn to unbind it. Now the name fn no longer is bound with any object, but register_callback_function() still has a reference to the function object so the function object lives on.
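The point about del can be checked with a self-contained sketch. Here register_callback_function is a hypothetical stand-in that just stores the callback in a list, since the original object1 and object2 are not shown:

```python
callbacks = []  # hypothetical storage standing in for the real registry

def register_callback_function(func):
    callbacks.append(func)

def fn(x):
    return x > 3

register_callback_function(fn)
del fn  # unbinds the name only; the list still references the function object

try:
    fn
    name_still_bound = True
except NameError:
    name_still_bound = False

still_works = callbacks[0](5)  # the function object is alive and callable
print(name_still_bound, still_works)
```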
