I was looking at an article on Peter Norvig's website, where he's trying to answer the following question (this is not my question, btw)
"Can I do the equivalent of (test ? result : alternative) in Python?"
Here's one of the options he lists:
def if_(test, result, alternative=None):
    "If test is true, 'do' result, else alternative. 'Do' means call if callable."
    if test:
        if callable(result): result = result()
        return result
    else:
        if callable(alternative): alternative = alternative()
        return alternative
And here's a usage example.
>>> fact = lambda n: if_(n <= 1, 1, lambda: n * fact(n-1))
>>> fact(6)
720
I understand how this works (I think), but I was just playing with the code, and decided to see what happens when I change the third argument in the definition of 'fact' above to
n * fact(n-1), that is, change it to a non-callable expression. On running it, the interpreter goes into a never-ending loop. I have a rough idea of why that is happening: the if_ function is returning the same expression it receives. But what is the type of that expression? What exactly is going on here? I am not looking for a detailed explanation, but just for some pointers to Python's evaluation model which might help my understanding.
Thanks!
The reason the loop never terminates when you change the third argument of fact to n * fact(n-1) is that n * fact(n-1) has to be evaluated first, before it is passed as the third argument to if_. Evaluating it leads to another call to fact, ad infinitum (since there is no longer any base case to stop it).
Previously, you were passing a function object (a lambda), which would not be called until the body of if_ decided to, based on test.
This is known (I believe) as eager evaluation, where function arguments are evaluated before they are passed to the function. In a lazy-evaluation scheme, the arguments would not be evaluated until they were used in the function body.
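As a rough illustration of that difference (the pick and trace helpers below are made up for the example, not part of Norvig's article): arguments to a Python call are evaluated eagerly, and wrapping them in zero-argument lambdas ("thunks") is the usual manual way to delay that work.
def trace(label, value):
    # Illustrative helper: announce when an argument expression is evaluated.
    print("evaluating " + label)
    return value

def pick(test, result, alternative):
    return result if test else alternative

# Eager: both argument expressions run before pick() does, even though only one is used.
pick(True, trace("result", 1), trace("alternative", 2))
# prints: evaluating result
#         evaluating alternative

# Lazy by hand: wrap each branch in a thunk; only the chosen one is ever evaluated.
pick(True, lambda: trace("result", 1), lambda: trace("alternative", 2))()
# prints: evaluating result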
I had a very difficult time with understanding the root cause of a problem in an algorithm. Then, by simplifying the functions step by step I found out that evaluation of default arguments in Python doesn't behave as I expected.
The code is as follows:
class Node(object):
    def __init__(self, children=[]):
        self.children = children
The problem is that every instance of Node class shares the same children attribute, if the attribute is not given explicitly, such as:
>>> n0 = Node()
>>> n1 = Node()
>>> id(n1.children)
25000176
>>> id(n0.children)
25000176
I don't understand the logic of this design decision. Why did Python's designers decide that default arguments are evaluated at definition time? This seems very counter-intuitive to me.
The alternative would be quite heavyweight -- storing "default argument values" in the function object as "thunks" of code to be executed over and over again every time the function is called without a specified value for that argument -- and would make it much harder to get early binding (binding at def time), which is often what you want. For example, in Python as it exists:
def ack(m, n, _memo={}):
    key = m, n
    if key not in _memo:
        if m == 0: v = n + 1
        elif n == 0: v = ack(m-1, 1)
        else: v = ack(m-1, ack(m, n-1))
        _memo[key] = v
    return _memo[key]
...writing a memoized function like the above is quite an elementary task. Similarly:
for i in range(len(buttons)):
    buttons[i].onclick(lambda i=i: say('button %s', i))
...the simple i=i, relying on the early-binding (definition time) of default arg values, is a trivially simple way to get early binding. So, the current rule is simple, straightforward, and lets you do all you want in a way that's extremely easy to explain and understand: if you want late binding of an expression's value, evaluate that expression in the function body; if you want early binding, evaluate it as the default value of an arg.
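To make the early-vs-late-binding distinction concrete, here is a small self-contained sketch (not from the original answer):
# Late binding: every lambda closes over the same name i and sees its final value.
late = [lambda: i for i in range(5)]
print([f() for f in late])    # [4, 4, 4, 4, 4]

# Early binding: the default i=i is evaluated at def time, once per lambda.
early = [lambda i=i: i for i in range(5)]
print([f() for f in early])   # [0, 1, 2, 3, 4]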
The alternative, forcing late binding for both situations, would not offer this flexibility, and would force you to jump through hoops (such as wrapping your function into a closure factory) every time you needed early binding, as in the above examples -- yet more heavyweight boilerplate forced on the programmer by this hypothetical design decision (beyond the "invisible" costs of generating and repeatedly evaluating thunks all over the place).
In other words, "There should be one, and preferably only one, obvious way to do it [1]": when you want late binding, there's already a perfectly obvious way to achieve it (since all of the function's code is only executed at call time, obviously everything evaluated there is late-bound); having default-arg evaluation produce early binding gives you an obvious way to achieve early binding as well (a plus!-) rather than giving TWO obvious ways to get late binding and no obvious way to get early binding (a minus!-).
[1]: "Although that way may not be obvious at first unless you're Dutch."
The issue is this.
It's too expensive to evaluate a function as an initializer every time the function is called.
0 is a simple literal. Evaluate it once, use it forever.
int is a function (like list) that would have to be evaluated each time it's required as an initializer.
The construct [] is a literal, like 0, that means "this exact object".
The problem is that some people expect it to mean list, as in "evaluate this function for me, please, to get the object that is the initializer".
It would be a crushing burden to add the necessary if statement to do this evaluation all the time. It's better to take all arguments as literals and not do any additional function evaluation as part of calling the function.
Also, more fundamentally, it's technically impossible to implement argument defaults as function evaluations.
Consider, for a moment the recursive horror of this kind of circularity. Let's say that instead of default values being literals, we allow them to be functions which are evaluated each time a parameter's default values are required.
[This would parallel the way collections.defaultdict works.]
def aFunc(a=another_func):
    return a*2

def another_func(b=aFunc):
    return b*3
What is the value of another_func()? To get the default for b, it must evaluate aFunc, which requires an eval of another_func. Oops.
Of course, in your situation it is difficult to understand at first. But you have to see that evaluating default args on every call would lay a heavy runtime burden on the system.
Also, you should know that this problem occurs with container types in particular -- but you can circumvent it by making the thing explicit:
def __init__(self, children=None):
    if children is None:
        children = []
    self.children = children
The workaround for this, discussed here (and very solid), is:
class Node(object):
    def __init__(self, children=None):
        self.children = [] if children is None else children
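A quick usage check with the class just shown (illustrative only, not part of the original answer): each instance now gets its own list.
n0 = Node()
n1 = Node()
n0.children.append("x")
print(n0.children)   # ['x']
print(n1.children)   # [] -- no longer shared between instances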
As for why: look for an answer from von Löwis, but it's likely because the function definition creates a code object due to the architecture of Python, and there might not be a facility for working with reference types like this in default arguments.
I thought this was counterintuitive too, until I learned how Python implements default arguments.
A function is an object. When the def statement is executed, Python creates the function object, evaluates the defaults in the def statement, puts them into a tuple, and adds that tuple as an attribute of the function named func_defaults. Then, when the function is called, if the call doesn't provide a value, Python grabs the default value out of func_defaults.
For instance:
>>> class C():
...     pass
...
>>> def f(x=C()):
...     pass
...
>>> f.func_defaults
(<__main__.C instance at 0x0298D4B8>,)
So all calls to f that don't provide an argument will use the same instance of C, because that's the default value.
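You can inspect that tuple directly; under Python 3 the attribute is spelled __defaults__ (func_defaults is the Python 2 name used above). A small demonstration, with a mutable default so the sharing is visible:
def f(x=[]):
    x.append(1)
    return x

print(f.__defaults__)   # ([],)
f()
f()
print(f.__defaults__)   # ([1, 1],) -- the single stored default object was mutated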
As for why Python does it this way: well, that tuple could instead contain functions that would get called every time a default argument value was needed. Apart from the immediately obvious performance cost, you start getting into a universe of special cases, like storing literal values instead of functions for immutable types to avoid unnecessary function calls.
The actual behavior is really simple. And there's a trivial workaround, in the case where you want a default value to be produced by a function call at runtime:
def f(x=None):
    if x is None:
        x = g()
This comes from Python's emphasis on syntax and execution simplicity. A def statement occurs at a certain point during execution. When the Python interpreter reaches that point, it evaluates the code on that line, and then creates a code object from the body of the function, which will be run later, when you call the function.
It's a simple split between function declaration and function body. The declaration is executed when it is reached in the code. The body is executed at call time. Note that the declaration is executed every time it is reached, so you can create multiple functions by looping.
funcs = []
for x in xrange(5):
    def foo(x=x, lst=[]):
        lst.append(x)
        return lst
    funcs.append(foo)

for func in funcs:
    print "1: ", func()
    print "2: ", func()
Five separate functions have been created, with a separate list created each time the function declaration was executed. On each pass through funcs, the same function is executed twice, using the same list each time. This gives the results:
1: [0]
2: [0, 0]
1: [1]
2: [1, 1]
1: [2]
2: [2, 2]
1: [3]
2: [3, 3]
1: [4]
2: [4, 4]
Others have given you the workaround of using param=None and assigning a list in the body if the value is None, which is fully idiomatic Python. It's a little ugly, but the simplicity is powerful, and the workaround is not too painful.
Edited to add: For more discussion on this, see effbot's article here: http://effbot.org/zone/default-values.htm, and the language reference, here: http://docs.python.org/reference/compound_stmts.html#function
I'll provide a dissenting opinion, by addressing the main arguments in the other posts.
Evaluating default arguments when the function is executed would be bad for performance.
I find this hard to believe. If default argument assignments like foo='some_string' really add an unacceptable amount of overhead, I'm sure it would be possible to identify assignments to immutable literals and precompute them.
If you want a default assignment with a mutable object like foo = [], just use foo = None, followed by foo = foo or [] in the function body.
While this may be unproblematic in individual instances, as a design pattern it's not very elegant. It adds boilerplate code and obscures default argument values. Patterns like foo = foo or ... don't work if foo can be an object like a numpy array with undefined truth value. And in situations where None is a meaningful argument value that may be passed intentionally, it can't be used as a sentinel and this workaround becomes really ugly.
The current behaviour is useful for mutable default objects that should be shared across function calls.
I would be happy to see evidence to the contrary, but in my experience this use case is much less frequent than mutable objects that should be created anew every time the function is called. To me it also seems like a more advanced use case, whereas accidental default assignments with empty containers are a common gotcha for new Python programmers. Therefore, the principle of least astonishment suggests default argument values should be evaluated when the function is executed.
In addition, it seems to me that there exists an easy workaround for mutable objects that should be shared across function calls: initialise them outside the function.
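As a hypothetical sketch of what "initialise them outside the function" could look like for the shared-state use case (the names here are made up for illustration):
# State shared across calls lives at module level, where the sharing is explicit.
_cache = {}

def cached_square(n):
    # Populate the shared cache lazily on first use of each key.
    if n not in _cache:
        _cache[n] = n * n
    return _cache[n]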
So I would argue that this was a bad design decision. My guess is that it was chosen because its implementation is actually simpler and because it has a valid (albeit limited) use case. Unfortunately, I don't think this will ever change, since the core Python developers want to avoid a repeat of the amount of backwards incompatibility that Python 3 introduced.
Python function definitions are just code, like all the other code; they're not "magical" in the way that some languages are. For example, in Java you could refer "now" to something defined "later":
public static void foo() { bar(); }
public static void main(String[] args) { foo(); }
public static void bar() {}
but in Python
def foo(): bar()
foo() # boom! "bar" has no binding yet
def bar(): pass
foo() # ok
So, the default argument is evaluated at the moment that that line of code is evaluated!
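A small demonstration of that point (purely illustrative):
t = 5
def f(x=t):       # the default is evaluated right now, while t is 5
    return x

t = 10
print(f())        # 5, not 10 -- rebinding t afterwards has no effect on the default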
Because if they had, then someone would post a question asking why it wasn't the other way around :-p
Suppose now that they had. How would you implement the current behaviour if needed? It's easy to create new objects inside a function, but you cannot "uncreate" them (you can delete them, but it's not the same).
In my work with Python, I have sometimes deleted a variable assignment and just left the string literal inline in my code. I have noticed you can do this and there is no notification at all from the compiler, nor any runtime error, when this occurs. Can someone tell me why this does not error out?
c = 1
"this causes no error or notification to the developer"
b = 2
This appears to work just like a docstring, but there are no triple quotes, just a single pair. I could not find any documentation on the subject.
As in many languages, it is perfectly valid to have an expression that is not assigned to a variable.
3.14
"Boo!"
False
2 + 2
callable(func) and func()
Python will print the results of such expressions when they are executed interactively, but in a script, these values are simply discarded.
Such expressions may actually have side effects (such as the last one, which calls func if it's callable, and func can do anything at all) so it is not necessarily pointless to write such code. In fact, a call to a function that does not return anything you need to keep around afterward is exactly the sort of thing you're objecting to, and is obviously useful.
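Tying this back to the docstring comparison in the question: a plain string is only treated specially when it is the first statement of a module, class, or function (it becomes the __doc__ attribute); anywhere else it is an ordinary expression statement whose value is simply discarded. A small sketch:
def f():
    "This first string expression is kept as the docstring."
    "This one is just an expression statement; its value is discarded."
    return 42

print(f.__doc__)   # This first string expression is kept as the docstring.
print(f())         # 42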
Because Python defines and allows that; these are known as expression statements.
I quote:
Expression statements are used (mostly interactively) to compute and write a value, or (usually) to call a procedure (a function that returns no meaningful result; in Python, procedures return the value None). Other uses of expression statements are allowed and occasionally useful.
Like the title says, I'm wondering why the default value in a next() call is evaluated regardless of whether it is used. An example of this is below.
class Test(object):
    def __init__(self):
        print("Test object created")

print(next((i for i in range(1)), Test()))
I would have expected this to print only 0, not Test object created followed by 0. I recently found out this wasn't the case through a bug caused by this behavior.
Is there some reason for this? Was this a design decision?
The design decision here is that next is just an ordinary function, not special language syntax. And in any function call, all arguments are evaluated before the function starts executing. next is nothing special, it just follows the same rules as everyone else.
Function arguments are always evaluated before the function is called, regardless of whether the function ends up needing them. You could design a language that doesn't work that way, but it has far-reaching effects throughout the language design and how functions are used, since you have to worry about the argument expression evaluating to something different if the variables it depends on change before it's evaluated.
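If the default is expensive to construct, one possible workaround (a sketch, not the only option) is to pass a cheap sentinel and only build the real default when it turns out to be needed:
class Test(object):
    def __init__(self):
        print("Test object created")

_missing = object()                            # cheap sentinel, safe to build eagerly
value = next((i for i in range(1)), _missing)
if value is _missing:
    value = Test()                             # only constructed when actually needed
print(value)                                   # 0, and no Test object is created here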
Can anyone familiar with Python's internals (CPython, or other implementations) explain why list addition is required to be homogeneous:
In [1]: x = [1]
In [2]: x+"foo"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
C:\Users\Marcin\<ipython-input-2-94cd84126ddc> in <module>()
----> 1 x+"foo"
TypeError: can only concatenate list (not "str") to list
In [3]: x+="foo"
In [4]: x
Out[4]: [1, 'f', 'o', 'o']
Why shouldn't the x+"foo" above return the same value as the final value of x in the above transcript?
This question follows on from NPE's question here: Is the behaviour of Python's list += iterable documented anywhere?
Update: I know it is not required that heterogeneous += work (but it does) and likewise, it is not required that heterogeneous + be an error. This question is about why that latter choice was made.
It is too much to say that the results of adding a sequence to a list are uncertain. If that were a sufficient objection, it would make sense to prevent heterogeneous += as well. Update 2: In particular, Python always delegates operator calls to the left-hand operand, so no issue of "what is the right thing to do" arises: the left-hand object always governs (unless it delegates to the right).
Update3: For anyone arguing that this is a design decision, please explain (a) why it is not documented; or (b) where it is documented.
Update 4: "what should [1] + (2, ) return?" It should return a result equal to the value of a variable x initially holding [1] immediately after x += (2, ). This result is well-defined.
From the Zen of Python:
In the face of ambiguity, refuse the temptation to guess.
Let's look at what happens here:
x + y
This gives us a value, but of what type? When we add things in real life, we expect the type to be the same as the input types, but what if they are disparate? Well, in the real world, we refuse to add 1 and "a"; it doesn't make sense.
What if we have similar types? In the real world, we look at context. The computer can't do this, so it has to guess. Python picks the left operand and lets that decide. Your issue occurs because of this lack of context.
Say a programmer wants to do ["a"] + "bc" - this could mean they want "abc" or ["a", "b", "c"]. Currently, the solution is to either call "".join() on the first operand or list() on the second, which allows the programmer to do what they want and is clear and explicit.
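Spelled out (just to make the two explicit options from the previous paragraph concrete):
>>> "".join(["a"]) + "bc"    # explicitly ask for string concatenation
'abc'
>>> ["a"] + list("bc")       # explicitly ask for list concatenation
['a', 'b', 'c']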
Your suggestion is for Python to guess (by having a built-in rule to pick a given operand), so the programmer can do the same thing by doing the addition - why is that better? It just means it's easier to get the wrong type by mistake, and we have to remember an arbitrary rule (left operand picks type). Instead, we get an error so we can give Python the information it needs to make the right call.
So why is += different? Well, that's because we are giving Python that context. With the in-place operation we are telling Python to modify a value, so we know that we are dealing with something of the type the value we are modifying is. This is the context Python needs to make the right call, so we don't need to guess.
When I talk about guessing, I'm talking about Python guessing the programmer's intent. This is something Python does a lot - see division in 3.x: / does float division, correcting the error of it being integer division in 2.x.
This is because we are implicitly asking for float division when we try to divide. Python takes this into account and its operations are done accordingly. Likewise, it's about guessing intent here. When we add with +, our intent is unclear. When we use +=, it is very clear.
These bug reports suggest that this design quirk was a mistake.
Issue12318:
Yes, this is the expected behavior and yes, it is inconsistent.
It's been that way for a long while and Guido said he wouldn't do it again (it's in his list of regrets). However, we're not going to break code by changing it (list.__iadd__ working like list.extend).
Issue575536:
The intent was that list.__iadd__ correspond exactly to list.extend(). There's no need to hypergeneralize list.__add__() too: it's a feature that people who don't want to get surprised by Martin-like examples can avoid them by using plain + for lists.
(Of course, there are those of us who find this behaviour quite surprising, including the developer who opened that bug report).
(Thanks to #Mouad for finding these).
I believe Python's designers made addition work this way so that the '+' operator stays consistently commutative with regard to result type: type(a + b) == type(b + a).
Everybody expects that 1 + 2 has the same result as 2 + 1. Would you expect [1] + 'foo' to be the same as 'foo' + [1]? If yes, what should be the result?
You have 3 choices, you either pick left operand as result type, right operand as result type, or raise an error.
+= is not commutative because it contains assignment. In this case you either pick the left operand as the result type or throw an error. The surprise here is that a += b is not the same as a = a + b. a += b does not translate in English to "add a and b and assign the result to a". It translates to "add b to a in place". That's why in-place modification doesn't work on immutables such as strings or tuples.
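A small illustration of the in-place behaviour, and of the fallback to rebinding for immutable types (a sketch added for clarity, not part of the original answer):
a = [1]
before = id(a)
a += (2,)                   # in-place extend: the same list object is modified
print(a)                    # [1, 2]
print(id(a) == before)      # True

t = (1,)
before = id(t)
t += (2,)                   # tuples are immutable: t is rebound to a new tuple object
print(t)                    # (1, 2)
print(id(t) == before)      # False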
My guess is that Python is strongly typed, and there's not a clear indication of the right thing to do here. Are you asking Python to append the string itself, or to cast the string to a list (which is what you indicated you'd like it to do)?
Remember, explicit is better than implicit. In the most common case, neither of those guesses is correct and you're accidentally trying to do something you didn't intend. Raising a TypeError and letting you sort it out is the safest, most Pythonic thing to do here.
I was thinking about this recently since Python 3 is changing print from a statement to a function.
However, Ruby and CoffeeScript take the opposite approach, since you often leave out parentheses from functions, thereby blurring the distinction between keywords/statements and functions. (A function call without parentheses looks a lot like a keyword.)
Generally, what's the difference between a keyword and a function? It seems to me that some keywords are really just functions. For example, return 3 could equally be thought of as return(3) where the return function is implemented natively in the language. Or in JavaScript, typeof is a keyword, but it seems very much like a function, and can be called with parentheses.
Thoughts?
A function is executed within a stack frame, whereas a keyword statement isn't necessarily. A good example is the return statement: if it were a function executing in its own stack frame, there would be no way it could control the execution flow in the way it does.
Keywords and functions are ambiguous. Whether or not parentheses are necessary is completely dependent upon the design of the language syntax.
Consider an integer declaration, for instance:
int my_integer = 4;
vs
my_integer = int(4)
Both of these examples are logically equivalent, but vary by the language syntax.
Programming languages use keywords to reserve their finite number of basic functions. When you write a function, you are extending a language.
Keywords are lower-level building blocks than functions, and can do things that functions can't.
You cite return in your question, which is a good example: In all the languages you mention, there's no way to use a function to provide the same behavior as return x.
In Python, parentheses are used for function calls, for creating tuples, or just for defining precedence.
a = (1)        # same as a = 1
a = (1,)       # tuple with one element
print a        # prints the value of a
print(a)       # same thing, as (a) == a

def foo(x):
    return x+1

foo(10)        # function call, one argument
foo(10,)       # function call, also one argument
foo 10         # not allowed!
foo(10)*2      # 11 times 2 = 22

def foo2(y):
    return (y*2)*2   # not a function call; same thing as y*4
Also, keywords can't be used as assignment targets, while function names can be rebound.
def foo(x):
    return x**2

foo = 1234     # foo with new value
return = 10    # invalid!
PS: Another use for parentheses is generator expressions. They are just like list comprehensions, but they aren't evaluated when created, only when iterated over.
(x**2 for x in range(10))
sum(x+1 for x in [1,2,3])  # parentheses of the function call are 'shared' with the generator
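To make the laziness visible (a small sketch; the noisy helper is made up for the example):
def noisy(x):
    # Announce when a value is actually computed.
    print("computing %d" % x)
    return x**2

gen = (noisy(x) for x in range(3))   # nothing printed yet: the expression is lazy
print(list(gen))                     # prints "computing 0", "computing 1", "computing 2", then [0, 1, 4]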