I have a function get_knng_graph that takes two parameters: a set of points and an integer k. I want to generate a sequence of functions, each of which accepts only the set of points, with a different value of k embedded in each function.
Consider the code below:
# definition of get_knng_graph(....) here
graph_fns = []
for k in range(1,5):
    def knng(pts):
        return get_knng_graph(pts,k)
    graph_fns.append(knng);
Is this reasonable code? By which I mean can I be assured that the values of the parameter k embedded inside each of the elements of graph_fns will continue to be different?
In the Haskell world, of course, this is nothing but currying, but this is the first time I am doing something like this in Python.
I tried it, and the code doesn't work. If I place a print(k) in the code above, then when I execute successive functions in the array, it prints 4 for every function call.
The problem you are seeing arises because Python captures a reference to the name k, not the value k had when the function was defined, so your code is equivalent to this:
graph_fns = []
def knng(pts):
    return get_knng_graph(pts,k)
for k in range(1,5):
    graph_fns.append(knng);
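To see the late binding concretely, here is a minimal sketch. The body of get_knng_graph below is a hypothetical stand-in (it only echoes its arguments), since the real definition isn't shown in the question.

```python
# Hypothetical stand-in: the real get_knng_graph is not shown in the question.
def get_knng_graph(pts, k):
    return (len(pts), k)

graph_fns = []
for k in range(1, 5):
    def knng(pts):
        # k is looked up when knng is *called*, not when it is defined
        return get_knng_graph(pts, k)
    graph_fns.append(knng)

# Call every function after the loop has finished.
late_bound = [fn([0, 1, 2]) for fn in graph_fns]
print(late_bound)  # [(3, 4), (3, 4), (3, 4), (3, 4)]
```

All four functions report k == 4, the value the name k was left with when the loop ended.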
If you want to bind the value of k to the function, there are a couple of solutions.
The most trivial code change is to add an extra argument with a default parameter:
graph_fns = []
for k in range(1,5):
    def knng(pts, k=k):
        return get_knng_graph(pts, k)
    graph_fns.append(knng)
You might also find it a bit cleaner to use functools.partial:
from functools import partial
graph_fns = []
for k in range(1,5):
    knng = partial(get_knng_graph, k=k)
    graph_fns.append(knng)
and by that time you could just use a list comprehension:
from functools import partial
graph_fns = [partial(get_knng_graph, k=k) for k in range(1, 5)]
There are some other options discussed on this page, like creating a class for this.
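For completeness, a class-based version might look roughly like the sketch below. The class name KnngWithK and the stand-in body of get_knng_graph are my own assumptions for illustration, not something from the linked page.

```python
# Hypothetical stand-in: the real get_knng_graph is not shown in the question.
def get_knng_graph(pts, k):
    return (len(pts), k)

class KnngWithK:
    """Hypothetical helper: remembers k and forwards calls to graph_fn."""

    def __init__(self, graph_fn, k):
        self.graph_fn = graph_fn
        self.k = k  # stored per instance, so each object keeps its own k

    def __call__(self, pts):
        return self.graph_fn(pts, self.k)

graph_fns = [KnngWithK(get_knng_graph, k) for k in range(1, 5)]
results = [fn([0, 1, 2]) for fn in graph_fns]
print(results)  # [(3, 1), (3, 2), (3, 3), (3, 4)]
```

This is more verbose than partial, but it gives you a place to hang extra state or a useful repr if you need one.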
In Python, scopes are function-wide; that is, a for loop does not introduce a new nested scope. Thus in this example k is rebound on every iteration, and the k in every knng closure refers to that same variable. If you call any of them after the loop has run its course, it will see the last value (4 in this case). The standard Python way to deal with this is to shadow it with a default argument:
graph_fns = []
for k in range(1,5):
    def knng(pts, k=k):
        return get_knng_graph(pts,k)
    graph_fns.append(knng)
This works because default arguments are bound when the definition is executed and the closure is created.
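You can check this yourself by inspecting the stored defaults. A small sketch, using a throwaway function show_k (a name I made up) instead of the real graph function:

```python
fns = []
for k in range(1, 5):
    def show_k(k=k):  # the default is evaluated now, once per iteration
        return k
    fns.append(show_k)

stored_defaults = [fn.__defaults__ for fn in fns]
called_without_args = [fn() for fn in fns]
overridden = fns[0](99)  # a caller can still pass k explicitly

print(stored_defaults)      # [(1,), (2,), (3,), (4,)]
print(called_without_args)  # [1, 2, 3, 4]
print(overridden)           # 99
```

The last line shows the one drawback: the extra parameter is part of the function's signature, so a caller can override it by accident.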
Seems to me, this is a good case for using partial from the functools module.
So say I have a function that takes an input, squares it and adds some variable to it before returning the result:
def x_squared_plus_n(x,n):
    return (x**2)+n
If I want to curry, or redefine that function, so that a fixed number (say 5) is always squared and then has a number n added to it, I can do so by using partial:
from functools import partial
five_squared_plus_n = partial(x_squared_plus_n,5)
Now, I have a new function five_squared_plus_n for which the first x parameter in the original function's parameter signature is fixed to x=5. The new function has a parameter signature containing only the remaining parameters, here n.
So calling:
five_squared_plus_n(15)
or equivalently,
five_squared_plus_n(n=15)
returns 40.
Any combination of parameters can be fixed like this and the resulting "curried" function be assigned to a new function name. It's a very powerful tool.
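As a small sketch of fixing a different parameter, here n is bound by keyword instead of x by position. The name x_squared_plus_ten is just something I picked for the example.

```python
from functools import partial

def x_squared_plus_n(x, n):
    return (x ** 2) + n

# Fix the first parameter by position, as in the answer above.
five_squared_plus_n = partial(x_squared_plus_n, 5)

# Fix the second parameter by keyword instead.
x_squared_plus_ten = partial(x_squared_plus_n, n=10)

print(five_squared_plus_n(15))  # 40
print(x_squared_plus_ten(3))    # 19

# partial objects record what they have bound
print(five_squared_plus_n.args)     # (5,)
print(x_squared_plus_ten.keywords)  # {'n': 10}
```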
In your example, you could wrap your partial calls in a loop, over which the values of different values could be fixed, and assign the resultant functions to values in a dictionary. Using my simple example, that might look something like:
func_dict = {}
for k in range(1,5):
    func_dict[k]=partial(x_squared_plus_n,k)
This prepares a series of functions, callable by reference to that dictionary, so:
func_dict[1](5)
returns 1**2 + 5 = 6, while
func_dict[3](12)
returns 3**2 + 12 = 21.
It is possible to assign proper python names to these functions, but that's probably for a different question - here, just imagine that the dictionary hosts a series of functions, accessible by key. I've used a numeric key, but you could assign strings or other values to help access the function you've prepared in this way.
Python's support for Haskell-style "functional" programming is fairly strong; you just need to dig around a little to find the appropriate hooks. I think, subjectively, there's perhaps less purity in terms of functional design, but for most practical purposes there is a functional solution.
Related
Look at this code. I am creating 3 lists of lambda functions (stored in the variables plus_n, plus_n_, and plus_n__). They are supposed to be exactly the same. However, only plus_n_ shows the expected behavior.
MAX=5
plus_n=[lambda x: x+i for i in range(MAX)]
plus_n_=[]
for i in range(MAX):
    plus_n_.append(lambda x: x+i)
def all_plus_n():
    plus_ns=[]
    for i in range(MAX):
        plus_ns.append(lambda x: x+i)
    return plus_ns
plus_n__=all_plus_n()
for i in range(len(plus_n)):
    print('plus_n[{}]({})={}'.format(i,3,plus_n[i](3)))
    print('plus_n_[{}]({})={}'.format(i,3,plus_n_[i](3)))
    print('plus_n__[{}]({})={}'.format(i,3,plus_n__[i](3)))
    print()
The output:
plus_n[0](3)=7
plus_n_[0](3)=3
plus_n__[0](3)=7
plus_n[1](3)=7
plus_n_[1](3)=4
plus_n__[1](3)=7
plus_n[2](3)=7
plus_n_[2](3)=5
plus_n__[2](3)=7
plus_n[3](3)=7
plus_n_[3](3)=6
plus_n__[3](3)=7
plus_n[4](3)=7
plus_n_[4](3)=7
plus_n__[4](3)=7
See, the exact same code gives different results depending on whether it is in a function or in a list comprehension...
So, what is the difference between the 3 approaches? What is happening?
If I want to use this variable in multiple functions, do I have to use it as a global variable? Because it seems that I can't use a function to get the variable values...
Thanks in advance.
This is a somewhat interesting variation on the usual question. Normally, the plus_n_ version wouldn't work either, but you happen to have reused i as the iteration variable for your testing at the end of the code. Since plus_n_ captures the global i, and the test loop also sets the global i, the lambda retrieved from plus_n_ uses the correct value each time through the loop - even though it's late binding on i. The list comprehension has its own scoped i which has a value of 4 after evaluation (and doesn't change after that); similarly for the loop in the function.
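A minimal sketch of that effect: the same plus_n_ list, tested once with a different loop variable and once with i reused. The result names with_other_name and with_same_name are mine.

```python
MAX = 5
plus_n_ = []
for i in range(MAX):
    plus_n_.append(lambda x: x + i)

# Test with a *different* loop variable: i keeps its final value, 4.
with_other_name = [plus_n_[j](3) for j in range(MAX)]
print(with_other_name)  # [7, 7, 7, 7, 7]

# Test with i reused: each call sees whatever this loop just assigned to i.
with_same_name = []
for i in range(MAX):
    with_same_name.append(plus_n_[i](3))
print(with_same_name)  # [3, 4, 5, 6, 7]
```

So the "working" output is an accident of the test loop, not a property of the lambdas.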
The clean, explicit, simple way to bind function parameters in Python is functools.partial from the standard library:
from functools import partial
MAX = 5
# Or we could use `operator.add`
def add(i, x):
    return i + x
# Works as expected, whatever happens to `i` later in any scope
plus_n = [partial(add, i) for i in range(MAX)]
This way, each call to partial produces a callable object that binds its own value for i, rather than late-binding to the name i:
for j in range(MAX):
    i = {"this isn't even a valid operand": 'lol'} # utterly irrelevant
    print(plus_n[j](3))
Do notice, however, that parameters bound positionally this way need to be at the beginning of the signature; parameters further along can only be bound by keyword.
Another way to solve the specific example problem is to rely on bound method calls:
plus_n = [i.__add__ for i in range(MAX)]
You can also hand-roll your own currying, but why reinvent the wheel?
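For reference, a hand-rolled version is usually just a factory function. A minimal sketch, where make_plus is a name I made up:

```python
def make_plus(n):
    # n is a local variable of this particular call, so every returned
    # function closes over its own n rather than a shared loop variable
    def plus(x):
        return x + n
    return plus

MAX = 5
plus_n = [make_plus(i) for i in range(MAX)]
outputs = [f(3) for f in plus_n]
print(outputs)  # [3, 4, 5, 6, 7]
```

It works because each call to make_plus creates a fresh scope, which is exactly what the bare loop or comprehension does not do.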
Finally, it is possible to use default parameters to lambdas to bind parameters - although I greatly dislike this method, since it is abusing the behaviour that causes another common problem and implying the existence of a parameter that could be overridden but isn't designed for it:
plus_n = [lambda x, i=i: x+i for i in range(MAX)]
I wanted to make a dictionary that looks like this:
example = dict(C# = "o.ooo.")
Because there is '#' symbol, the rest greys out.
I know I can fix this problem by doing this:
test = [("C#", "o.ooo.")]
example = dict(test)
I was wondering if there was something that could fix my problem such as:
example = dict(r(C#) = "o.ooo.") - which obviously doesn't work.
Like other programming languages, Python has several ways of reusing the same code in different places in a program. One of them is the function (called a procedure in some languages), whose arguments/parameters we can classify by type:
Positional Function Parameters
In this category a value for the parameter is assigned by position. So if we have the following function definition
def newLengths(bridge1,bridge2,bridge3):
    #updating lengths
    pass
and we call it like this: newLengths(1200,1001,1110), the parameter bridge2 will take the value 1001 meters because it is in the second position.
Named/Keyword Python Functional Parameters
In this case we explicitly tell Python which parameter we want to assign a value to, rather than letting it be determined implicitly by position.
So now for the previous function we can write newLengths(1200, bridge3=1110, bridge2=1001); note that keyword arguments must come after any positional ones. Named parameters are useful in various situations, especially together with default values for parameters that are not passed.
The important part about keyword parameters is that the keyword needs to be a valid Python identifier to be used like this; otherwise Python will parse it as something else, such as a number. In your case you are trying to use #, which has the special meaning of starting a line comment and can't be part of an identifier.
So there is no way to use the hash character here unless you modify Python's syntax so that it becomes a valid character in a variable name.
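If you are unsure whether a given string could be written as a keyword-argument name, a quick check along these lines may help. The helper usable_as_keyword is a hypothetical name of mine; str.isidentifier and keyword.iskeyword are standard.

```python
import keyword

def usable_as_keyword(name):
    # Must be a valid identifier and must not be a reserved word like "class".
    return name.isidentifier() and not keyword.iskeyword(name)

candidates = ["C#", "Csharp", "class", "bridge2"]
checks = {name: usable_as_keyword(name) for name in candidates}
print(checks)
# {'C#': False, 'Csharp': True, 'class': False, 'bridge2': True}
```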
However, there are some ways to create the "C#" key:
Use the string directly as the key in a dictionary literal: {"C#": "o.ooo."}
Create an iterable of key-value pairs and pass it to dict()
The simplest way to make such an iterable is the function zip(), which takes two lists and pairs each element of the first list with the element in the same position of the second list.
keys = ["C#","Java","Python"]
values = ["Book1","Book1","Book0"]
example = dict(zip(keys,values))
Passing keywords to the dict() built-in function is problematic here, because the # in your key is being
misinterpreted as the beginning of a comment.
Instead, use the literal syntax:
example = {"C#": "o.ooo."}
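As a quick sanity check, the literal form and the two dict() forms that take pairs all build the same dictionary. The variable names here are only for illustration.

```python
from_literal = {"C#": "o.ooo."}
from_pairs = dict([("C#", "o.ooo.")])     # a list of (key, value) tuples
from_zip = dict(zip(["C#"], ["o.ooo."]))  # parallel lists of keys and values

all_equal = from_literal == from_pairs == from_zip
print(all_equal)           # True
print(from_literal["C#"])  # o.ooo.
```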
I wrote a function (testFunction) with four return values in Python:
diff1, diff2, sameCount, vennPlot
where the first 3 values (in the output tuple) were used to plot "vennPlot" inside of the function.
A similar question was asked: How can I plot output from a function which returns multiple values in Python?, but in my case, I also want to know two additional things:
I will likely use this function later, and it seems like I need to memorize the order of the return values so that I can extract the correct one for downstream work. Am I correct here? If so, are there better ways to refer to the tuple return than output[1] or output[2]? (output=testFunction(...))
Generally speaking, is it appropriate to have multiple outputs from a function? (E.g. in my case, I could just return the first three values and draw the venn diagram outside of the function.)
Technically, every function returns exactly one value; that value, however, can be a tuple, a list, or some other type that contains multiple values.
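A tiny illustration, using a made-up function min_and_max:

```python
def min_and_max(values):
    # The comma builds a single tuple; that tuple is the one return value.
    return min(values), max(values)

result = min_and_max([3, 1, 2])
print(type(result).__name__)  # tuple
print(result)                 # (1, 3)
```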
That said, you can return something that uses something other than just the order of values to distinguish them. You can return a dict:
def testFunction(...):
    ...
    return dict(diff1=..., diff2=..., sameCount=..., venn=...)
x = testFunction(...)
print(x['diff1'])
or you can define a named tuple:
import collections
ReturnType = collections.namedtuple('ReturnType', 'diff1 diff2 sameCount venn')
def testFunction(...):
    ...
    return ReturnType(diff1=..., diff2=..., sameCount=..., venn=...)
x = testFunction(...)
print(x.diff1) # or x[0], if you still want to use the index
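Since the snippets above use ... placeholders, here is a runnable sketch of the named-tuple variant. The function compare and its set arithmetic are my assumptions about what the diffs might mean, and the plot is left out entirely; only the field names come from the question.

```python
from collections import namedtuple

# Field names follow the question; the computation below is an assumed stand-in.
Comparison = namedtuple("Comparison", "diff1 diff2 sameCount")

def compare(a, b):
    a, b = set(a), set(b)
    return Comparison(diff1=a - b, diff2=b - a, sameCount=len(a & b))

out = compare([1, 2, 3], [2, 3, 4])
print(out.diff1, out.diff2, out.sameCount)  # {1} {4} 2

# It still behaves like a plain tuple, so indexing and unpacking keep working.
diff1, diff2, same_count = out
print(out[2] == same_count)  # True
```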
To answer your first question, you can unpack tuples returned from a function as such:
diff1, diff2, samecount, vennplot = testFunction(...)
Secondly, there is nothing wrong with multiple outputs from a function, though using multiple return statements within the same function is typically best avoided if possible for clarity's sake.
I will likely use this function later, and it seems like I need to memorize the order of the return values so that I can extract the correct one for downstream work. Am I correct here?
It seems you're correct (depends on your use case).
If so, are there better ways to refer to the tuple return than output[1] or output[2]? (output=testFunction(...))
You could use a namedtuple: docs
or, if order is not important, you could just return a dictionary, so you can access the values by name.
Generally speaking, is it appropriate to have multiple outputs from a function? (E.g. in my case, I could just return the first three values and draw the venn diagram outside of the function.)
Sure, as long as it's documented, then it's just what the function does and the programmer knows then how to handle the return values.
Python supports direct unpacking into variables. So downstream, when you call the function, you can retrieve the return values into separate variables as simply as:
diff1, diff2, sameCount, vennPlot = testFunction(...)
EDIT: You can even "swallow" the ones you don't need. For example:
diff1, *stuff_in_the_middle, vennPlot = testFunction(...)
in which case stuff_in_the_middle will be a list containing the two middle values.
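A quick check of what the starred name receives (Python collects the starred items into a list). The function four_values and its return values are placeholders I invented.

```python
def four_values():
    # Hypothetical placeholder values standing in for the real results.
    return "d1", "d2", 7, "plot"

first, *middle, last = four_values()
print(first, middle, last)    # d1 ['d2', 7] plot
print(type(middle).__name__)  # list

# If you don't need the middle values at all, underscores are a common idiom.
first, _, _, last = four_values()
```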
It is quite appropriate AFAIK, even standard library modules return tuples.
For example - Popen.communicate() from the subprocess module.
I have an assignment in a mooc where I have to code a function that returns the cumulative sum, cumulative product, max and min of an input list.
This part of the course was about functional programming, so I wanted to go all out on this, even though I can use other ways.
So I tried this:
from operator import mul
from itertools import repeat
from functools import reduce
def reduce2(l):
    print(l)
    return reduce(*l)
def numbers(l):
    return tuple(map(reduce2, zip([sum, mul,min, max], repeat(l,4))))
l=[1,2,3,4,5]
numbers(l)
My problem is that it doesn't work. zip will pass only one object to reduce if I use it inside map, and unpacking the zip will yield the 4 tuple of (function and argument list l) so I defined reduce2 for this reason, I wanted to unpack the zip inside it but it did not work.
Python returns a TypeError: 'int' object is not iterable
I thought that I could use return reduce(l[0],l[1]) in reduce2, but there is still the same Error.
I don't understand the behavior of python here.
If I merely use return reduce(l), it returns again a TypeError: reduce expected at least 2 arguments, got 1
What's happening here? How could I make it work?
Thanks for your help.
Effectively, you are trying to execute code like this:
xs = [1, 2, 3, 4, 5]
reduce(sum, xs)
But sum takes an iterable and isn't really compatible with direct use via reduce. Instead, you need a function that takes 2 arguments and returns their sum -- a function analogous to mul. You can get that from operator:
from operator import mul, add
Then just change sum to add in your program.
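To make the difference visible, here is a small sketch that runs both the working and the failing call; the variable names are mine.

```python
from functools import reduce
from operator import add, mul

xs = [1, 2, 3, 4, 5]
total = reduce(add, xs)    # ((((1 + 2) + 3) + 4) + 5)
product = reduce(mul, xs)
print(total, product)      # 15 120

try:
    reduce(sum, xs)        # reduce starts by calling sum(1, 2)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)       # 'int' object is not iterable
```

reduce hands its function two items at a time, and sum tries to iterate over the first of them, which is the plain integer 1.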
BTW, functional programming has a variable naming convention that is really cool: x for one thing, and xs for a list of them. It's much better than the hard-to-read l variable name. Also it uses singular/plural to tell you whether you are dealing with a scalar value or a collection.
FMc's answer correctly diagnoses the error in your code. I just want to add a couple of alternatives to your map + zip approach.
For one, instead of defining a special version of reduce, you can use itertools.starmap instead of map, which is designed specifically for this purpose:
from itertools import starmap
def numbers(xs):
    return tuple(starmap(reduce, zip([add, mul, min, max], repeat(xs))))
However, even better would be to use the often ignored variadic version of map instead of manually zipping the arguments:
def numbers(xs):
    return tuple(map(reduce, [add, mul, min, max], repeat(xs)))
It essentially does the zip + starmap for you. In terms of functional programming, this version of map is analogous to Haskell's zipWith function.
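Put together with its imports, the map-based version runs as a self-contained script like this:

```python
from functools import reduce
from itertools import repeat
from operator import add, mul

def numbers(xs):
    # map stops at the shortest input, so the endless repeat(xs) is safe here
    return tuple(map(reduce, [add, mul, min, max], repeat(xs)))

summary = numbers([1, 2, 3, 4, 5])
print(summary)  # (15, 120, 1, 5)
```

Note that min and max work under reduce because each also accepts two plain arguments, unlike sum.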
What is considered better programming practice when dealing with multiple objects at a time (but with the option to process just one object)?
A: LOOP INSIDE FUNCTION
Function can be called with one or more objects and it is iterating inside function:
class Object:
    def __init__(self, a, b):
        self.var_a = a
        self.var_b = b
    var_a = ""
    var_b = ""
def func(obj_list):
    if type(obj_list) != list:
        obj_list = [obj_list]
    for obj in obj_list:
        # do whatever with an object
        print(obj.var_a, obj.var_b)
obj_list = [Object("a1", "a2"), Object("b1", "b2")]
obj_alone = Object("c1", "c2")
func(obj_list)
func(obj_alone)
B: LOOP OUTSIDE FUNCTION
The function deals with one object only, and when there are more objects it must be called multiple times.
class Object:
    def __init__(self, a, b):
        self.var_a = a
        self.var_b = b
    var_a = ""
    var_b = ""
def func(obj):
    # do whatever with an object
    print(obj.var_a, obj.var_b)
obj_list = [Object("a1", "a2"), Object("b1", "b2")]
obj_alone = Object("c1", "c2")
for obj in obj_list:
    func(obj)
func(obj_alone)
I personally like the first one (A) more, because for me it makes cleaner code when calling the function, but maybe it's not the right approach. Is there some method generally better than the other? And if not, what are the cons and pros of each method?
A function should have a defined input and output and follow the single responsibility principle. You need to be able to clearly define your function in terms of "I put foo in, I get bar back". The more qualifiers you need to make in this statement to properly describe your function probably means your function is doing too much. "I put foo in and get bar back, unless I put baz in then I also get bar back, unless I put a foo-baz in then it'll error".
In this particular case, you can pass an object or a list of objects. Try to generalise that to a value or a list of values. What if you want to pass a list as a value? Now your function behaviour is ambiguous. You want the single list object to be your value, but the function treats it as multiple arguments instead.
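A small sketch of that ambiguity, using a hypothetical function describe written in the style of approach A:

```python
def describe(value_or_values):
    # Mirrors approach A: a bare value gets wrapped, a list gets iterated.
    if not isinstance(value_or_values, list):
        value_or_values = [value_or_values]
    return [repr(v) for v in value_or_values]

single = describe(1)
as_list = describe([1, 2])    # treated as two separate values
wrapped = describe([[1, 2]])  # the only way to pass "one value that is a list"

print(single)   # ['1']
print(as_list)  # ['1', '2']
print(wrapped)  # ['[1, 2]']
```

The caller has to know about the special case and wrap the list again, which is the kind of qualifier described above.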
In practice, it's trivial to adapt a function which takes one argument to work on multiple values. There's no reason to complicate the function's design by making it adaptable to multiple arguments. Write the function as simply and clearly as possible, and if you need it to work through a list of things then you can loop it through that list of things outside the function.
This might become clearer if you try to give an actual useful name to your function which describes what it does. Do you need to use plural or singular terms? foo_the_bar(bar) does something else than foo_the_bars(bars).
Move loops outside functions (when possible)
Generally speaking, keep loops that do nothing but iterate over the parameter outside of functions. This gives the caller maximum control and assumes the least about how the client will use the function.
The rule of thumb is to use the most minimal parameter complexity that the function needs to do its job.
For example, let's say you have a function that processes one item. You've anticipated that a client might conceivably want to process multiple items, so you changed the parameter to an iterable, baked a loop into the function, and are now returning a list. Why not? It could save the client from writing an ugly loop in the caller, you figure, and the basic functionality is still available -- and then some!
But this turns out to be a serious constraint. Now the caller needs to pack (and possibly unpack, if the function returns a list of results in addition to a list of arguments) that single item into a list just to use the function. This is confusing and potentially expensive on heap memory:
>>> def square(it): return [x ** 2 for x in it]
...
>>> square(range(6)) # you're thinking ...
[0, 1, 4, 9, 16, 25]
>>> result, = square([3]) # ... but the client just wants to square 1 number
>>> result
9
Here's a much better design for this particular function, intuitive and flexible:
>>> def square(x): return x ** 2
...
>>> square(3)
9
>>> [square(x) for x in range(6)]
[0, 1, 4, 9, 16, 25]
>>> list(map(square, range(6)))
[0, 1, 4, 9, 16, 25]
>>> (square(x) for x in range(6))
<generator object <genexpr> at 0x00000166D122CBA0>
>>> all(square(x) % 2 for x in range(6))
False
This brings me to a second problem with the functions in your code: they have a side-effect, print. I realize these functions are just for demonstration, but designing functions like this makes the example somewhat contrived. Functions typically return values rather than simply produce side-effects, and the parameters and return values are often related, as in the above example -- changing the parameter type bound us to a different return type.
When does it make sense to use an iterable argument? A good example is sort -- the smallest unit of operation for a sorting function is an iterable, so the problem of packing and unpacking in the square example above is a non-issue.
Following this logic a step further, would it make sense for a sort function to accept a list (or variable arguments) of lists? No -- if the caller wants to sort multiple lists, they should loop over them explicitly and call sort on each one, as in the second square example.
Consider variable arguments
A nice feature that bridges the gap between iterables and single arguments is support for variable arguments, which many languages offer. This sometimes gives you the best of both worlds, and some functions go so far as to accept either args or an iterable:
>>> max([1, 3, 2])
3
>>> max(1, 3, 2)
3
One reason max is nice as a variable argument function is that it's a reduction function, so you'll always get a single value as output. If it were a mapping or filtering function, the output is always a list (or generator) so the input should be as well.
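If you want to offer both calling styles in your own reduction function, one possible sketch is below. The name largest and its exact behavior are my own illustration, not how the built-in max is implemented.

```python
def largest(*args):
    # Hypothetical helper imitating the two calling styles of max.
    if len(args) == 1:
        args = tuple(args[0])  # a single argument is treated as an iterable
    if not args:
        raise ValueError("largest() needs at least one value")
    best = args[0]
    for candidate in args[1:]:
        if candidate > best:
            best = candidate
    return best

from_iterable = largest([1, 3, 2])
from_varargs = largest(1, 3, 2)
print(from_iterable, from_varargs)  # 3 3
```

Either way the caller gets a single value back, which is why the dual interface causes no confusion here.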
To take another example, a sort routine wouldn't make much sense with varargs because it's a classically in-place algorithm that works on lists, so you'd need to unpack the list into the arguments with the * operator pretty much every time you invoke the function -- not cool.
There's no real need for a call like sort(1, 3, 4, 2) as there is with max, where the parameters are just as likely to be loose variables as they are a packed iterable. Varargs are usually used when you have a small number of arguments, or the thing you're unpacking is a small pair or tuple-type element, as is often the case with zip.
There's definitely a "feel" to when to offer parameters as varargs, an iterable, or a single value (i.e. let the caller handle looping), but as long as you follow the rule of avoiding iterables unless they're essential to the function, it's hard to go wrong.
As a final tip, try to write your functions with similar contracts to the library functions in your language or the tools you use frequently. These are pretty much always designed well; mimic good design.
If you implement B then you will make it harder for yourself to achieve A.
If you implement A then it isn't too difficult to achieve B. You also have many tools already available to apply this function to a list of arguments (the loop method you described, using something like map, or even a multiprocessing approach if needed)
Therefore I would choose to implement A, and if it makes things neater or easier in a given case you can think about also implementing B (using A) also so that you have both.