I understand the normal lambda expression, such as
g = lambda x: x**2
However, for some complex ones, I am a little confused about them. For example:
for split in ['train', 'test']:
    sets = (lambda split=split: newspaper(split, newspaper_devkit_path))
def get_imdb():
    return sets()
Here, newspaper is a function. I was wondering what sets actually is, and why the get_imdb function can return sets().
Thanks for your help!
Added:
The code actually comes from factory.py.
sets is being assigned a lambda that is not really supposed to accept inputs, which you see from the way it is invoked. Lambdas in general behave like normal functions, and can therefore be assigned to variables like g or sets. The definition of sets is surrounded by an extra set of parentheses for no apparent reason. You can ignore those outer parens.
Lambdas can have all the same types of positional, keyword and default arguments a normal function can. The lambda sets has a default parameter named split. This is a common idiom to ensure that sets in each iteration of the loop gets the value of split corresponding to that iteration rather than just the one from the last iteration in all cases.
Without a default parameter, split would be evaluated within the lambda based on the namespace at the time it was called. Once the loop completes, split in the outer namespace just holds the last value it took in the loop.
Default parameters are evaluated once, immediately when the function object is created. This means that the default value of split will be whatever split was bound to in the loop iteration that created the lambda.
Your example is a bit misleading because it discards all the actual values of sets besides the last one, making the default parameter to the lambda meaningless. Here is an example illustrating what happens if you keep all the lambdas. First with the default parameter:
sets = []
for split in ['train', 'test']:
    sets.append(lambda split=split: split)
print([fn() for fn in sets])
I have truncated the lambdas to just return their input parameter for purposes of illustration. This example will print ['train', 'test'], as expected.
If you do the same thing without the default parameter, the output will be ['test', 'test'] instead:
sets = []
for split in ['train', 'test']:
    sets.append(lambda: split)
print([fn() for fn in sets])
This is because 'test' is the value of split when all the lambdas get evaluated.
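To tie this back to the question's loop, here is a sketch that keeps one lambda per split in a dict instead of overwriting sets on every iteration. The newspaper function and newspaper_devkit_path below are hypothetical stand-ins (the real ones aren't shown in the question), and looking the lambda up by name in get_imdb is an assumption about how such a factory might be organized.

```python
# Hypothetical stand-ins for the question's newspaper() and devkit path.
newspaper_devkit_path = "/path/to/devkit"

def newspaper(split, devkit_path):
    # Pretend "dataset" object: just report what it was built with.
    return ("newspaper", split, devkit_path)

# One zero-argument lambda per split; each freezes its own split value
# through the default parameter.
sets = {}
for split in ['train', 'test']:
    sets[split] = lambda split=split: newspaper(split, newspaper_devkit_path)

def get_imdb(name):
    # Look up the stored lambda and call it; nothing is built until now.
    return sets[name]()

print(get_imdb('train'))  # ('newspaper', 'train', '/path/to/devkit')
print(get_imdb('test'))   # ('newspaper', 'test', '/path/to/devkit')
```

Because each lambda is only called inside get_imdb, the (possibly expensive) dataset construction is deferred until someone actually asks for that split.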
A lambda function:
func = lambda x: x**2
can be rewritten almost equivalently:
def func(x):
    return x**2
Using either way, you can call the function in this manner:
func(4)
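One place the "almost" shows up is the function's __name__, which appears in tracebacks and debugging output. A small sketch (square_lambda and square_def are just illustrative names):

```python
square_lambda = lambda x: x**2

def square_def(x):
    return x**2

# Both behave the same when called...
print(square_lambda(4), square_def(4))  # 16 16

# ...the main visible difference is the name attribute.
print(square_lambda.__name__)  # <lambda>
print(square_def.__name__)     # square_def
```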
In your example,
sets = lambda split=split: newspaper(split, newspaper_devkit_path)
can be rewritten:
def sets(split=split):
    return newspaper(split, newspaper_devkit_path)
and so can be called:
sets()
When you write the following:
def get_imdb():
    return sets()
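you are defining a function that refers to sets by name. Strictly speaking this is not a "closure" when sets is a module-level name: get_imdb does not save a reference to the function. Instead, each time get_imdb is called, Python looks up the name sets in the enclosing (here, global) scope and calls whatever function it refers to at that moment.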
Maybe you are confused about the split=split part. This has the same meaning as it would have in a regular function: the split on the left is a parameter of the lambda function and the split on the right is the default value the left split takes when no value is provided. In this case, the default value would be the variable split defined in the for loop.
So, answering your first question (what is sets?):
sets is a variable to which an anonymous function (or lambda function) is assigned. This allows the lambda function to be referenced and used via the variable sets.
To your second question (why can sets() be returned?), I respond:
Since sets is a variable that refers to a function, adding parentheses after it calls the lambda function. Because no arguments are given, the parameter split takes the value 'test', which is the last value the for loop variable split takes. It is worth noting here that, since sets is not defined inside the function get_imdb, the interpreter looks for a definition of sets outside the scope of get_imdb (and finds the one that refers to the lambda function).
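A minimal sketch of that lookup, using a hypothetical make_a function in place of the lambda: because sets is resolved when get_imdb() runs rather than when it is defined, rebinding the name afterwards changes the result.

```python
def make_a():
    return 'a'

sets = make_a

def get_imdb():
    # `sets` is not local, so Python looks it up in the module's
    # global namespace each time get_imdb() runs.
    return sets()

print(get_imdb())  # a

# Rebinding the global name changes what get_imdb() calls.
sets = lambda: 'b'
print(get_imdb())  # b
```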
Related
I have a question about arguments in functions, in particular initialising an array or other data structure within the function call, like the following:
def helper(root, result=[]):
    ...
My question is, what is the difference between the above vs. doing:
def helper(root):
    result = []
I can see why this would be necessary if we were using recursion, i.e. we would need to use the first form in some instances.
But are there any other instances, and am I right in saying it is necessary in some cases for recursion, or can we always use the latter instead?
Thanks
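Python evaluates default values once, when the function is defined, and every call that doesn't pass that argument shares the same object. So initializing a list or any other mutable object as a default value in the function definition is a bad idea.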
The best way of doing it is like this:
def helper(root, result=None):
    # None is a safe sentinel: create a fresh list on each call that needs one
    if result is None:
        result = []
Now if you only pass one argument to the function, the "result" will be an empty list.
If you initialize the list in the function definition (as the default value), calling the function multiple times won't reset "result"; it will keep the values from previous calls.
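A short sketch of both behaviours, plus the recursion case from the question. The nested (value, children) tuple used as a tree is a hypothetical representation chosen for illustration; the point is that recursive calls pass the accumulator along explicitly, so the None default only kicks in on the outermost call and the mutable default is never needed.

```python
def append_bad(item, result=[]):
    # The default list is created once, at definition time, and shared.
    result.append(item)
    return result

print(append_bad(1))  # [1]
print(append_bad(2))  # [1, 2]  <- leftover from the previous call

def append_good(item, result=None):
    # A fresh list is created on every call that doesn't supply one.
    if result is None:
        result = []
    result.append(item)
    return result

print(append_good(1))  # [1]
print(append_good(2))  # [2]

# Recursion: the accumulator is passed down explicitly.
def collect(node, result=None):
    if result is None:
        result = []
    value, children = node
    result.append(value)
    for child in children:
        collect(child, result)
    return result

tree = (1, [(2, []), (3, [(4, [])])])
print(collect(tree))  # [1, 2, 3, 4]
print(collect(tree))  # [1, 2, 3, 4] again, no leftovers
```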
For example, when I have:
def function(text):
    print(text)
mylist = [function('yes'),function('no')]
mylist[0]
It just prints yes and no and doesn't do anything with mylist[0].
I want it to be able to call the function with parameters in the list and not have the functions run when the program starts. Is this possible?
It sounds like you want to store the functions pre-bound with parameters. You can use functools.partial to give you a function with some or all of the parameters already defined:
from functools import partial
def f(text):
    print(text)
mylist = [partial(f, 'yes'), partial(f,'no')]
mylist[0]()
# yes
mylist[1]()
# no
You can store functions in a list (or pass them around as arguments), since functions in Python are first-class objects, which means you can treat them like any other value.
In your case, it isn't working because you're calling the function, not referencing the function.
mylist = [functionA, functionB]
mylist[0]('yes')
Here I've stored references to the functions in a list, instead of calling them.
When you use (), you're effectively calling the function on the spot.
If you print mylist you will see that it contains None, which is the default return value of functions that don't have a return statement. You are, in fact, simply calling your function and saving the return value in a list.
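Putting the answers above together, a sketch contrasting three options. The lambda and the (function, arguments) tuple variants are alternatives to functools.partial shown for comparison, not something the answers above prescribe.

```python
def function(text):
    print(text)

# Calling inside the list literal runs the functions immediately;
# the list only receives their return values.
called = [function('yes'), function('no')]   # prints yes, then no
print(called)  # [None, None]

# Storing callables instead: nothing runs until you add ().
deferred = [lambda: function('yes'), lambda: function('no')]
deferred[0]()  # prints yes

# Or store the function and its arguments separately.
jobs = [(function, ('yes',)), (function, ('no',))]
func, args = jobs[1]
func(*args)    # prints no
```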
Consider the following example:
test = 123
f = lambda x,test=test: print(x, test)
del test
f('hello')
prints
hello 123
By capturing the variable in the lambda definition, the original variable seems to be retained.
Is the lambda ok to use when you could also use a simple object to store the same data?
class Test:
    def __init__(self, some_data):
        self.some_data = some_data
    def f(self, x):
        print(x, self.some_data)
t = Test('123')
t.f('hello')
By capturing the variable in the lambda definition, the original variable seems to be retained.
Because you made it the default value for the test argument of the lambda, and Python only evaluates argument defaults once, when the function is created. FWIW, this has nothing to do with the use of the lambda keyword: lambda is just syntactic sugar, and you'd get exactly the same result with a "full-blown" function, i.e.:
test = 123
def f(x, test=test):
    print(x, test)
test = 456
f('hello')
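To see that it's the default-argument evaluation (not the lambda) doing the work, compare a default with a plain global lookup. This sketch returns tuples instead of printing so the difference is easy to check:

```python
test = 123

def with_default(x, test=test):
    # Default captured once, when the def statement ran.
    return (x, test)

def with_global(x):
    # No default: `test` is looked up in the global scope at call time.
    return (x, test)

test = 456
print(with_default('hello'))  # ('hello', 123)
print(with_global('hello'))   # ('hello', 456)
```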
Is the lambda ok to use when you could also use a simple object to store the same data?
Oranges and apples, really. A function is not meant to be used to "store" anything, but to perform some computation. The fact that it can capture some values, either via argument defaults or via a closure, while quite useful for some use cases, is not meant as a replacement for proper objects or collections. So the answer is: it depends on what you want to do with those values and your function.
NB: technically what you're doing here is (almost) what is known as "partial application" (you'd just have to rename Test.f to Test.__call__ and use t("hello") in the second example). A partial application of a function is the mechanism by which a function of N arguments, when called with N - x (with 0 < x < N) arguments, returns a function of the x missing arguments that, when called with them, returns the result of the original function, i.e.:
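def foo(a, b=None):
    if b is None:
        # `a` is captured by the closure; pass the arguments in their original order
        return lambda b: foo(a, b)
    return a + b
# here, `f` is a partial application of function `foo` to `1`
f = foo(1)
print(f)  # prints a function object, not a number
f(2)      # 3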
In this case we use a closure to capture a, in the functional programming tradition ("partial application" is also mostly a functional programming concept, FWIW). Now, while Python does support some FP features and idioms, it is first and foremost an OO language, and since closures are the FP equivalent of objects (closures are a way to encapsulate state and behaviour together), it also makes sense to implement partial application as a proper class, either with "ad hoc" specialized objects (your Test class, but also method objects) or with a more generic "partial" class, which already exists in the stdlib as functools.partial.
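A sketch of both class-based forms mentioned above: functools.partial for the generic case, and a Test-like class that defines __call__ for the ad hoc case. Here __call__ returns a tuple instead of printing, purely so the result can be checked.

```python
from functools import partial

def foo(a, b):
    return a + b

# Generic partial application from the standard library.
add_one = partial(foo, 1)
print(add_one(2))  # 3

# The "ad hoc object" version: a class whose instances are callable.
class Test:
    def __init__(self, some_data):
        self.some_data = some_data

    def __call__(self, x):
        return (x, self.some_data)

t = Test('123')
print(t('hello'))  # ('hello', '123')
```

A partial object also keeps the pieces it was built from (add_one.func, add_one.args), which can help when debugging.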
The default argument value of an argument is evaluated once, when the function or lambda is defined. It is not evaluated at each call site. (This is why you usually cannot use [] as a default argument and have to specify None instead, and write code for allocating a fresh list each time the function is called.)
It really depends on the surrounding code if a lambda, function, or class is appropriate. Usually, if there is only one operation and the state cannot be mutated directly from the outside (only within the operation), a lambda or function is appropriate.
I recently attempted Google's foo.bar challenge. After my time was up, I decided to try to find a solution to the problem I couldn't do and found a solution here (it includes the problem statement if you're interested). I'd previously been making a dictionary for every function I wanted to cache, but it looks like in this solution any function/input can be cached using the same syntax.
Firstly, I'm confused about how the code even works: nothing is passed in for *args (it prints as an empty tuple). Here's a modified minimal example to illustrate my confusion:
mem = {}
def memoize(key, func, *args):
    """
    Helper to memoize the output of a function
    """
    print(args)
    if key not in mem:
        # store the output of the function in memory
        mem[key] = func(*args)
    return mem[key]
def example(n):
    return memoize(
        n,
        lambda: longrun(n),
    )
def example2(n):
    return memoize(
        n,
        longrun(n),
    )
def longrun(n):
    for i in range(10000):
        for j in range(100000):
            2**10
    return n
Here I use the same memoize function but with a print. The function example returns memoize(n, a lambda function,). The function longrun is just an identity function with lots of useless computation so it's easy to see if the cache is working (example(2) will take ~5 seconds the first time and be almost instant after).
Here are my confusions:
Why is the third argument of memoize empty? When args is printed in memoize it prints (). Yet somehow mem[key] stores func(*args) as func(key)?
Why does this behavior only work when using the lambda function (example will cache but example2 won't)? I thought lambda: longrun(n) is just a short way of giving as input a function which returns longrun(n).
As a bonus, does anyone know how you could memoize functions using a decorator?
Also I couldn't think of a more descriptive title, edits welcome. Thanks.
The notation *args stands for a variable number of positional arguments. For example, print can be used as print(1), print(1, 2), print(1, 2, 3) and so on. Similarly, **kwargs stands for a variable number of keyword arguments.
Note that the names args and kwargs are just a convention - it's the * and ** symbols that make them variadic.
Anyways, memoize uses this to accept basically any input to func. If the result of func isn't cached, it's called with the arguments. In a function call, *args is basically the reverse of *args in a function definition. For example, the following are equivalent:
# provide *args explicitly
print(1, 2, 3)
# unpack iterable to *args
arguments = 1, 2, 3
print(*arguments)
If args is empty, then calling print(*args) is the same as calling print() - no arguments are passed to it.
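A tiny sketch (show_args is a made-up name) of why print(args) inside memoize showed (): a *args parameter always collects the extra positional arguments into a tuple, which is empty when there are none.

```python
def show_args(*args):
    # args is always a tuple; it's empty if no extra positionals are given.
    return args

print(show_args())         # ()
print(show_args(1, 2, 3))  # (1, 2, 3)

# In a call, * unpacks an iterable; an empty tuple unpacks to nothing.
args = ()
print(show_args(*args))    # ()
```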
Functions and lambda functions are the same in Python. A lambda is simply a different notation for creating a function object.
The problem is that in example2, you are not passing a function. You call a function, then pass on its result. Instead, you have to pass on the function and its argument separately.
def example2(n):
    return memoize(
        n,
        longrun,  # no () means no call, just the function object
        # all following parameters are put into *args
        n
    )
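A quick check of the fixed example2, assuming a cheap longrun that just records each real call in a hypothetical calls list instead of burning CPU:

```python
mem = {}
calls = []  # records each real call to longrun, to make caching visible

def memoize(key, func, *args):
    if key not in mem:
        mem[key] = func(*args)
    return mem[key]

def longrun(n):
    calls.append(n)  # stand-in for the slow loop in the question
    return n

def example2(n):
    # Pass the function object and its argument separately.
    return memoize(n, longrun, n)

print(example2(2), example2(2))  # 2 2
print(calls)                     # [2]  -> longrun ran only once
```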
Now, some implementation details: why is args empty and why is there a separate key?
The empty args comes from your definition of the lambda. Let's write that as a function for clarity:
def example3(n):
    def nonlambda():
        return longrun(n)
    return memoize(n, nonlambda)
Note how nonlambda takes no arguments. The parameter n is bound from the containing scope as a closure. As such, you don't have to pass it to memoize; it is already bound inside nonlambda. Thus, args is empty in memoize, even though longrun does receive a parameter, because the two don't interact directly.
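The closure can be inspected directly. This sketch uses a modified example3 that returns nonlambda instead of passing it to memoize, plus a cheap longrun stand-in; __code__.co_freevars and __closure__ are standard attributes of Python function objects.

```python
def longrun(n):
    return n  # cheap stand-in for the slow function

def example3(n):
    def nonlambda():
        return longrun(n)
    return nonlambda  # return the inner function instead of memoizing it

inner = example3(5)
# nonlambda takes no parameters, yet remembers n through a closure cell.
print(inner.__code__.co_argcount)          # 0
print(inner.__code__.co_freevars)          # ('n',)
print(inner.__closure__[0].cell_contents)  # 5
print(inner())                             # 5
```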
Now, why is it mem[key] = f(*args), not mem[key] = f(key)? That's actually slightly the wrong question; the right question is "why isn't it mem[f, args] = f(*args)?".
Memoization works because the same input to the same function leads to the same output. That is, f, args identifies your output. Ideally, your key would be f, args as that's the only relevant information.
The problem is you need a way to look up f and args inside mem. If you ever tried putting a list inside a dict, you know there are some types which don't work in mappings (or any other suitable lookup structure, for that matter). So if you define key = f, args, you cannot memoize functions taking mutable/unhashable types. Python's functools.lru_cache actually has this limitation.
Defining an explicit key is one way of solving this problem. It has the advantage that the caller can select an appropriate key, for example taking n without any modifications. This offers the best optimization potential. However, it breaks easily: using just n leaves out the actual function called, so memoizing a second function with the same input would break your cache.
There are alternative approaches, each with pros and cons. Common is the explicit conversion of types: list to tuple, set to frozenset, and so on. This is slow, but the most precise. Another approach is to just call str or repr as in key = repr((f, args, sorted(kwargs.items()))), but it relies on every value having a proper repr.
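For the bonus question, here is a sketch of memoization as a decorator (memoize and slow_square are illustrative names, not from the original solution). Each decorated function gets its own cache dict, so the key only needs the arguments and different functions cannot collide; like functools.lru_cache, it requires hashable arguments.

```python
import functools

def memoize(func):
    """Cache func's results, keyed on its positional and keyword arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # One cache per decorated function, so func itself need not be in the key.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    wrapper.cache = cache  # exposed only so the caching can be inspected
    return wrapper

@memoize
def slow_square(n):
    return n * n

print(slow_square(4))          # 16 (computed)
print(slow_square(4))          # 16 (from the cache)
print(len(slow_square.cache))  # 1

# The standard library already provides this.
@functools.lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040
```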
Can you give an example and other examples that show when and when not to use Lambda?
My book gives me examples, but they're confusing.
Lambda, which originated from Lambda Calculus and (AFAIK) was first implemented in Lisp, is basically an anonymous function: a function which doesn't have a name and is used inline. You can also bind a lambda to a name in a single statement, like this:
>>> addTwo = lambda x: x+2
>>> addTwo(2)
4
This binds the name addTwo to an anonymous function that accepts one argument, x, and adds 2 to it. A lambda's body is a single expression whose value is returned automatically, so there's no return keyword.
The code above is roughly equivalent to:
>>> def addTwo(x):
... return x+2
...
>>> addTwo(2)
4
Except you're not using a function definition, you're assigning an identifier to the lambda.
The best place to use them is when you don't really want to define a function with a name, possibly because that function will only be used once. If it will be used numerous times, you would be better off with a function definition.
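A common one-off case is a key function for sorting. A small sketch (the pairs data and count_of are made up for illustration):

```python
pairs = [('banana', 3), ('apple', 5), ('cherry', 1)]

# A one-off key function: no need to give it a name.
by_count = sorted(pairs, key=lambda pair: pair[1])
print(by_count)  # [('cherry', 1), ('banana', 3), ('apple', 5)]

# If the same logic is needed in several places, a named def is clearer.
def count_of(pair):
    return pair[1]

print(max(pairs, key=count_of))  # ('apple', 5)
```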
Example of a dictionary (hash table) of lambdas, used as a dispatch table:
>>> mapTree = {
... 'number': lambda x: x**x,
... 'string': lambda x: x[1:]
... }
>>> otype = 'number'
>>> mapTree[otype](2)
4
>>> otype = 'string'
>>> mapTree[otype]('foo')
'oo'
In this example I don't really want to give either of those functions a name, because I'll only use them within the dictionary; therefore I'll use lambdas.
I do not know which book you are using, but Dive into Python has a section which I think is informative.
Use of lambda is sort of a style thing. When you can get away with a very simple function, and usually where you are just storing it somewhere (in a list of functions perhaps, or in a GUI toolkit data structure, etc.) people feel lambda reduces clutter in their code.
In Python it is only possible to make a lambda that returns a single expression, and the lambda cannot span multiple lines (unless you join the multiple lines by using the backslash-at-the-end-of-a-line trick). People have requested that Python add improvements for lambda, but it hasn't happened. As I understand it, the changes to make lambda able to write any function would significantly complicate the parsing code in Python. And, since we already have def to define a function, the gain is not considered worth the complication. So there are some cases where you might wish to use lambda where it is not possible. In that case, you can just use a def:
object1.register_callback_function(lambda x: x.foo() > 3)
def fn(x):
    if x.foo() > 3:
        x.recalibrate()
        return x.value() > 9
    elif x.bar() > 3:
        x.do_something_else()
        return x.other_value < 0
    else:
        x.whatever()
        return True
object2.register_callback_function(fn)
del fn
The first callback function was simple and a lambda sufficed. For the second one, it is simply not possible to use a lambda. We achieve the same effect by using def to make a function object bound to the name fn, and then passing fn to register_callback_function(). Then, just to show we can, we use the del statement to unbind the name fn. Now the name fn is no longer bound to any object, but register_callback_function() still has a reference to the function object, so the function object lives on.
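A runnable sketch of that last point. Widget is a hypothetical class standing in for object2, with a made-up fire method that invokes the registered callbacks; del only removes the name, while the list inside the object still holds the function.

```python
class Widget:
    """Hypothetical object with a callback registry, for illustration."""

    def __init__(self):
        self.callbacks = []

    def register_callback_function(self, fn):
        self.callbacks.append(fn)

    def fire(self, x):
        return [cb(x) for cb in self.callbacks]

object2 = Widget()

def fn(x):
    return x * 2

object2.register_callback_function(fn)
del fn  # unbinds the name only

print(object2.fire(21))  # [42]  the function object still exists
```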