This question already has answers here:
"Least Astonishment" and the Mutable Default Argument
(33 answers)
Closed 6 months ago.
I had a very difficult time with understanding the root cause of a problem in an algorithm. Then, by simplifying the functions step by step I found out that evaluation of default arguments in Python doesn't behave as I expected.
The code is as follows:
class Node(object):
def __init__(self, children = []):
self.children = children
The problem is that every instance of Node class shares the same children attribute, if the attribute is not given explicitly, such as:
>>> n0 = Node()
>>> n1 = Node()
>>> id(n1.children)
Out[0]: 25000176
>>> id(n0.children)
Out[0]: 25000176
I don't understand the logic of this design decision? Why did Python designers decide that default arguments are to be evaluated at definition time? This seems very counter-intuitive to me.
The alternative would be quite heavyweight -- storing "default argument values" in the function object as "thunks" of code to be executed over and over again every time the function is called without a specified value for that argument -- and would make it much harder to get early binding (binding at def time), which is often what you want. For example, in Python as it exists:
def ack(m, n, _memo={}):
key = m, n
if key not in _memo:
if m==0: v = n + 1
elif n==0: v = ack(m-1, 1)
else: v = ack(m-1, ack(m, n-1))
_memo[key] = v
return _memo[key]
...writing a memoized function like the above is quite an elementary task. Similarly:
for i in range(len(buttons)):
buttons[i].onclick(lambda i=i: say('button %s', i))
...the simple i=i, relying on the early-binding (definition time) of default arg values, is a trivially simple way to get early binding. So, the current rule is simple, straightforward, and lets you do all you want in a way that's extremely easy to explain and understand: if you want late binding of an expression's value, evaluate that expression in the function body; if you want early binding, evaluate it as the default value of an arg.
The alternative, forcing late binding for both situation, would not offer this flexibility, and would force you to go through hoops (such as wrapping your function into a closure factory) every time you needed early binding, as in the above examples -- yet more heavy-weight boilerplate forced on the programmer by this hypothetical design decision (beyond the "invisible" ones of generating and repeatedly evaluating thunks all over the place).
In other words, "There should be one, and preferably only one, obvious way to do it [1]": when you want late binding, there's already a perfectly obvious way to achieve it (since all of the function's code is only executed at call time, obviously everything evaluated there is late-bound); having default-arg evaluation produce early binding gives you an obvious way to achieve early binding as well (a plus!-) rather than giving TWO obvious ways to get late binding and no obvious way to get early binding (a minus!-).
[1]: "Although that way may not be obvious at first unless you're Dutch."
The issue is this.
It's too expensive to evaluate a function as an initializer every time the function is called.
0 is a simple literal. Evaluate it once, use it forever.
int is a function (like list) that would have to be evaluated each time it's required as an initializer.
The construct [] is literal, like 0, that means "this exact object".
The problem is that some people hope that it to means list as in "evaluate this function for me, please, to get the object that is the initializer".
It would be a crushing burden to add the necessary if statement to do this evaluation all the time. It's better to take all arguments as literals and not do any additional function evaluation as part of trying to do a function evaluation.
Also, more fundamentally, it's technically impossible to implement argument defaults as function evaluations.
Consider, for a moment the recursive horror of this kind of circularity. Let's say that instead of default values being literals, we allow them to be functions which are evaluated each time a parameter's default values are required.
[This would parallel the way collections.defaultdict works.]
def aFunc( a=another_func ):
return a*2
def another_func( b=aFunc ):
return b*3
What is the value of another_func()? To get the default for b, it must evaluate aFunc, which requires an eval of another_func. Oops.
Of course in your situation it is difficult to understand. But you must see, that evaluating default args every time would lay a heavy runtime burden on the system.
Also you should know, that in case of container types this problem may occur -- but you could circumvent it by making the thing explicit:
def __init__(self, children = None):
if children is None:
children = []
self.children = children
The workaround for this, discussed here (and very solid), is:
class Node(object):
def __init__(self, children = None):
self.children = [] if children is None else children
As for why look for an answer from von Löwis, but it's likely because the function definition makes a code object due to the architecture of Python, and there might not be a facility for working with reference types like this in default arguments.
I thought this was counterintuitive too, until I learned how Python implements default arguments.
A function's an object. At load time, Python creates the function object, evaluates the defaults in the def statement, puts them into a tuple, and adds that tuple as an attribute of the function named func_defaults. Then, when a function is called, if the call doesn't provide a value, Python grabs the default value out of func_defaults.
For instance:
>>> class C():
pass
>>> def f(x=C()):
pass
>>> f.func_defaults
(<__main__.C instance at 0x0298D4B8>,)
So all calls to f that don't provide an argument will use the same instance of C, because that's the default value.
As far as why Python does it this way: well, that tuple could contain functions that would get called every time a default argument value was needed. Apart from the immediately obvious problem of performance, you start getting into a universe of special cases, like storing literal values instead of functions for non-mutable types to avoid unnecessary function calls. And of course there are performance implications galore.
The actual behavior is really simple. And there's a trivial workaround, in the case where you want a default value to be produced by a function call at runtime:
def f(x = None):
if x == None:
x = g()
This comes from python's emphasis on syntax and execution simplicity. a def statement occurs at a certain point during execution. When the python interpreter reaches that point, it evaluates the code in that line, and then creates a code object from the body of the function, which will be run later, when you call the function.
It's a simple split between function declaration and function body. The declaration is executed when it is reached in the code. The body is executed at call time. Note that the declaration is executed every time it is reached, so you can create multiple functions by looping.
funcs = []
for x in xrange(5):
def foo(x=x, lst=[]):
lst.append(x)
return lst
funcs.append(foo)
for func in funcs:
print "1: ", func()
print "2: ", func()
Five separate functions have been created, with a separate list created each time the function declaration was executed. On each loop through funcs, the same function is executed twice on each pass through, using the same list each time. This gives the results:
1: [0]
2: [0, 0]
1: [1]
2: [1, 1]
1: [2]
2: [2, 2]
1: [3]
2: [3, 3]
1: [4]
2: [4, 4]
Others have given you the workaround, of using param=None, and assigning a list in the body if the value is None, which is fully idiomatic python. It's a little ugly, but the simplicity is powerful, and the workaround is not too painful.
Edited to add: For more discussion on this, see effbot's article here: http://effbot.org/zone/default-values.htm, and the language reference, here: http://docs.python.org/reference/compound_stmts.html#function
I'll provide a dissenting opinion, by addessing the main arguments in the other posts.
Evaluating default arguments when the function is executed would be bad for performance.
I find this hard to believe. If default argument assignments like foo='some_string' really add an unacceptable amount of overhead, I'm sure it would be possible to identify assignments to immutable literals and precompute them.
If you want a default assignment with a mutable object like foo = [], just use foo = None, followed by foo = foo or [] in the function body.
While this may be unproblematic in individual instances, as a design pattern it's not very elegant. It adds boilerplate code and obscures default argument values. Patterns like foo = foo or ... don't work if foo can be an object like a numpy array with undefined truth value. And in situations where None is a meaningful argument value that may be passed intentionally, it can't be used as a sentinel and this workaround becomes really ugly.
The current behaviour is useful for mutable default objects that should be shared accross function calls.
I would be happy to see evidence to the contrary, but in my experience this use case is much less frequent than mutable objects that should be created anew every time the function is called. To me it also seems like a more advanced use case, whereas accidental default assignments with empty containers are a common gotcha for new Python programmers. Therefore, the principle of least astonishment suggests default argument values should be evaluated when the function is executed.
In addition, it seems to me that there exists an easy workaround for mutable objects that should be shared across function calls: initialise them outside the function.
So I would argue that this was a bad design decision. My guess is that it was chosen because its implementation is actually simpler and because it has a valid (albeit limited) use case. Unfortunately, I don't think this will ever change, since the core Python developers want to avoid a repeat of the amount of backwards incompatibility that Python 3 introduced.
Python function definitions are just code, like all the other code; they're not "magical" in the way that some languages are. For example, in Java you could refer "now" to something defined "later":
public static void foo() { bar(); }
public static void main(String[] args) { foo(); }
public static void bar() {}
but in Python
def foo(): bar()
foo() # boom! "bar" has no binding yet
def bar(): pass
foo() # ok
So, the default argument is evaluated at the moment that that line of code is evaluated!
Because if they had, then someone would post a question asking why it wasn't the other way around :-p
Suppose now that they had. How would you implement the current behaviour if needed? It's easy to create new objects inside a function, but you cannot "uncreate" them (you can delete them, but it's not the same).
This question already has answers here:
Correct Style for Python functions that mutate the argument
(4 answers)
Closed 6 years ago.
I want to do some list modification in a function and then continue to use the modified list after calling the function, which is the better approach to do this:
def modify(alist):
alist.append(4)
alist = [1,2,3]
modify(alist)
alist.append(5)
Or this:
def modify(alist):
alist.append(4)
return alist
alist = [1,2,3]
alist = modify(alist)
alist.append(5)
Is the first some kind of bad tone?
Ordinarily a function should return the results it generates. However when you pass a list, you either need to make a copy of the list or accept the fact that it will be modified; it's redundant to return the modified list. It also leads to a problem if you ever provide a default argument, since the default will also be modified.
I generally prefer to make arguments read-only unless it's obvious that they will be modified inplace.
My recommendation:
def modify(alist=[]):
alist = alist[:] # make a copy
alist.append(4)
return alist
Since you're modifying the list inplace, it would not make much sense to return the same list you have modified.
Your inplace mutation propagates from the reference to the list, so better to leave out the return statement and raise an exception if an error occurs somewhere in the function, but not return the modified object.
There should be one-- and preferably only one --obvious way to do it.
It's safe to return the list from a function.
Formally, both approaches fullfil the task you want - to modify a list.
But, the second approach with function return some value which is stored in your variable is safer, and such code is much-much easier to maintain and develop, especially if you have many lists to modify, for example. Or in general you have a code that solves some difficult task, for which you need a lot of functions and variables. "Safe" means you don't have to worry about variable name conflict inside you code - everything you create inside your function namespace, stays local (except the case when you create a class attribute). So, it's generally considered a better practice. Good luck!
This question already has answers here:
How do I pass a variable by reference?
(39 answers)
Closed 8 years ago.
Since Python doesn't have pointers, I am wondering how I can pass a reference to an object through to a function instead of copying the entire object. This is a very contrived example, but say I am writing a function like this:
def some_function(x):
c = x/2 + 47
return c
y = 4
z = 12
print some_function(y)
print some_function(z)
From my understanding, when I call some_function(y), Python allocates new space to store the argument value, then erases this data once the function has returned c and it's no longer needed. Since I am not actually altering the argument within some_function, how can I simply reference y from within the function instead of copying y when I pass it through? In this case it doesn't matter much, but if y was very large (say a giant matrix), copying it could eat up some significant time and space.
Your understanding is, unfortunately, completely wrong. Python does not copy the value, nor does it allocate space for a new one. It passes a value which is itself a reference to the object. If you modify that object (rather than rebinding its name), then the original will be modified.
Edit
I wish you would stop worrying about memory allocation: Python is not C++, almost all of the time you don't need to think about memory.
It's easier to demonstrate rebinding via the use of something like a list:
def my_func(foo):
foo.append(3) # now the source list also has the number 3
foo = [3] # we've re-bound 'foo' to something else, severing the relationship
foo.append(4) # the source list is unaffected
return foo
original = [1, 2]
new = my_func(original)
print original # [1, 2, 3]
print new # [3, 4]
It might help if you think in terms of names rather than variables: inside the function, the name "foo" starts off being a reference to the original list, but then we change that name to point to a new, different list.
Python parameters are always "references".
The way parameters in Python works and the way they are explained on the docs can be confusing and misleading to newcomers to the languages, specially if you have a background on other languages which allows you to choose between "pass by value" and "pass by reference".
In Python terms, a "reference" is just a pointer with some more metadata to help the garbage collector do its job. And every variable and every parameter are always "references".
So, internally, Python pass a "pointer" to each parameter. You can easily see this in this example:
>>> def f(L):
... L.append(3)
...
>>> X = []
>>> f(X)
>>> X
[3]
The variable X points to a list, and the parameter L is a copy of the "pointer" of the list, and not a copy of the list itself.
Take care to note that this is not the same as "pass-by-reference" as C++ with the & qualifier, or pascal with the var qualifier.
I am trying to understand how exactly assignment operators, constructors and parameters passed in functions work in python specifically with lists and objects. I have a class with a list as a parameter. I want to initialize it to an empty list and then want to populate it using the constructor. I am not quite sure how to do it.
Lets say my class is --
class A:
List = [] # Point 1
def __init1__(self, begin=[]): # Point 2
for item in begin:
self.List.append(item)
def __init2__(self, begin): # Point 3
List = begin
def __init3__(self, begin=[]): # Point 4
List = list()
for item in begin:
self.List.append(item)
listObj = A()
del(listObj)
b = listObj
I have the following questions. It will be awesome if someone could clarify what happens in each case --
Is declaring an empty like in Point 1 valid? What is created? A variable pointing to NULL?
Which of Point 2 and Point 3 are valid constructors? In Point 3 I am guessing that a new copy of the list passed in (begin) is not made and instead the variable List will be pointing to the pointer "begin". Is a new copy of the list made if I use the constructor as in Point 2?
What happens when I delete the object using del? Is the list deleted as well or do I have to call del on the List before calling del on the containing object? I know Python uses GC but if I am concerned about cleaning unused memory even before GC kicks in is it worth it?
Also assigning an object of type A to another only makes the second one point to the first right? If so how do I do a deep copy? Is there a feature to overload operators? I know python is probably much simpler than this and hence the question.
EDIT:
5. I just realized that using Point 2 and Point 3 does not make a difference. The items from the list begin are only copied by reference and a new copy is not made. To do that I have to create a new list using list(). This makes sense after I see it I guess.
Thanks!
In order:
using this form is simply syntactic sugar for calling the list constructor - i.e. you are creating a new (empty) list. This will be bound to the class itself (is a static field) and will be the same for all instances.
apart from the constructor name which must always be init, both are valid forms, but mean different things.
The first constructor can be called with a list as argument or without. If it is called without arguments, the empty list passed as default is used within (this empty list is created once during class definition, and not once per constructor call), so no items are added to the static list.
The second must be called with a list parameter, or python will complain with an error, but using it without the self. prefix like you are doing, it would just create a new local variable name List, accessible only within the constructor, and leave the static A.List variable unchanged.
Deleting will only unlink a reference to the object, without actually deleting anything. Once all references are removed, however, the garbage collector is free to clear the memory as needed.
It is usually a bad idea to try to control the garbage collector. instead. just make sure you don't hold references to objects you no longer need and let it make its work.
Assigning a variable with an object will only create a new reference to the same object, yes. To create a deep copy use the related functions or write your own.
Operator overloading (use with care, it can make things more confusing instead of clearer if misused) can be done by overriding some special methods in the class definition.
About your edit: like i pointed above, when writing List=list() inside the constructor, without the self. (or better, since the variable is static, A.) prefix, you are just creating an empty variable, and not overriding the one you defined in the class body.
For reference, the usual way to handle a list as default argument is by using a None placeholder:
class A(object):
def __init__(self, arg=None):
self.startvalue = list(arg) if arg is not None else list()
# making a defensive copy of arg to keep the original intact
As an aside, do take a look at the python tutorial. It is very well written and easy to follow and understand.
"It will be awesome if someone could clarify what happens in each case" isn't that the purpose of the dis module ?
http://docs.python.org/2/library/dis.html
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Python: How do I pass a variable by reference?
I'm trying to write a function that modifies one of the passed parameters. Here's the current code:
def improve_guess(guess, num):
return (guess + (num/guess)) / 2
x = 4.0
guess = 2.0
guess = improve_guess(guess, x)
However, I want to write the code in such a way that I don't have to do the final assignment. That way, I can just call:
improve_guess(guess,x)
and get the new value in guess.
(I intentionally didn't mention passing-by-reference because during my net-searching, I found a lot of academic discussion about the topic but no clean way of doing this. I don't really want to use globals or encapsulation in a list for this.)
You can't do this directly since integer and floating-point types are immutable in Python.
You could wrap guess into a mutable structure of some sort (e.g. a list or a custom class), but that would get very ugly very quickly.
P.S. I personally really like the explicit nature of guess = improve_guess(guess, x) since it leaves no doubt as to what exactly is being modified. I don't even need to know anything about improve_guess() to figure that out.
But those are the only two ways to do it: either use a global, or use a mutable type like a list. You can't modify a non-mutable variable in Python: doing so just rebinds the local name to a new value.
If you really don't like wrapping in a list, you could create your own class and pass around an instance that will be mutable, but I can't see any benefit in doing that.