If I pass a dataframe to a function and modify it inside the function, is it pass-by-value or pass-by-reference?
I ran the following code:
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
def letgo(df):
    df = df.drop('b',axis=1)
letgo(a)
The value of a does not change after the function call. Does that mean it is pass-by-value?
I also tried the following:
xx = np.array([[1,2], [3,4]])
def letgo2(x):
    x[1,1] = 100
def letgo3(x):
    x = np.array([[3,3],[3,3]])
It turns out letgo2() does change xx and letgo3() does not. Why is that?
The short answer is, Python always does pass-by-value, but every Python variable is actually a pointer to some object, so sometimes it looks like pass-by-reference.
In Python every object is either mutable or immutable. For example, lists, dicts, modules and Pandas data frames are mutable, and ints, strings and tuples are immutable. Mutable objects can be changed internally (e.g., add an element to a list), but immutable objects cannot.
As I said at the start, you can think of every Python variable as a pointer to an object. When you pass a variable to a function, the variable (pointer) within the function is always a copy of the variable (pointer) that was passed in. So if you assign something new to the internal variable, all you are doing is changing the local variable to point to a different object. This doesn't alter (mutate) the original object that the variable pointed to, nor does it make the external variable point to the new object. At this point, the external variable still points to the original object, but the internal variable points to a new object.
If you want to alter the original object (only possible with mutable data types), you have to do something that alters the object without assigning a completely new value to the local variable. This is why letgo() and letgo3() leave the external item unaltered, but letgo2() alters it.
As @ursan pointed out, if letgo() used something like this instead, then it would alter (mutate) the original object that df points to, which would change the value seen via the global a variable:
def letgo(df):
    df.drop('b', axis=1, inplace=True)
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
letgo(a) # will alter a
In some cases, you can completely hollow out the original variable and refill it with new data, without actually doing a direct assignment, e.g. this will alter the original object that v points to, which will change the data seen when you use v later:
def letgo3(x):
    x[:] = np.array([[3,3],[3,3]])
v = np.empty((2, 2))
letgo3(v) # will alter v
Notice that I'm not assigning something directly to x; I'm assigning something to the entire internal range of x.
If you absolutely must create a completely new object and make it visible externally (which is sometimes the case with pandas), you have two options. The 'clean' option would be just to return the new object, e.g.,
def letgo(df):
    df = df.drop('b',axis=1)
    return df
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
a = letgo(a)
Another option would be to reach outside your function and directly alter a global variable. This changes a to point to a new object, and any function that refers to a afterward will see that new object:
def letgo():
    global a
    a = a.drop('b',axis=1)
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
letgo() # will alter a!
Directly altering global variables is usually a bad idea, because anyone who reads your code will have a hard time figuring out how a got changed. (I generally use global variables for shared parameters used by many functions in a script, but I don't let them alter those global variables.)
To add to @Mike Graham's answer, who pointed to a very good read:
In your case, what is important to remember is the difference between names and values. a, df, xx, x, are all names, but they refer to the same or different values at different points of your examples:
In the first example, letgo rebinds df to another value, because df.drop returns a new DataFrame unless you set the argument inplace = True (see doc). That means that the name df (local to the letgo function), which was referring to the value of a, is now referring to a new value, here the df.drop return value. The value a is referring to still exists and hasn't changed.
In the second example, letgo2 mutates x, without rebinding it, which is why xx is modified by letgo2. Unlike the previous example, here the local name x always refers to the value the name xx is referring to, and changes that value in place, which is why the value xx is referring to has changed.
In the third example, letgo3 rebinds x to a new np.array. That causes the name x, local to letgo3 and previously referring to the value of xx, to now refer to another value, the new np.array. The value xx is referring to hasn't changed.
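The three cases above boil down to rebinding a name versus mutating a value. Here is a minimal sketch of that distinction using a plain list instead of a DataFrame or NumPy array (that substitution, and the function names rebind and mutate, are my own, chosen only to keep the example dependency-free):

```python
def rebind(items):
    # Assigning to the bare name only changes what the local name refers to.
    items = [0, 0]

def mutate(items):
    # Item assignment changes the object itself, so the caller sees it.
    items[0] = 100

data = [1, 2]

rebind(data)
after_rebind = list(data)   # snapshot taken after the rebinding call

mutate(data)
after_mutate = list(data)   # snapshot taken after the mutating call

print(after_rebind, after_mutate)  # [1, 2] [100, 2]
```

rebind behaves like letgo and letgo3 in the question, and mutate behaves like letgo2.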
The question isn't PBV vs. PBR. These names only cause confusion in a language like Python; they were invented for languages that work like C or like Fortran (as the quintessential PBV and PBR languages). It is true, but not enlightening, that Python always passes by value. The question here is whether the value itself is mutated or whether you get a new value. Pandas usually errs on the side of the latter.
http://nedbatchelder.com/text/names.html explains very well what Python's system of names is.
Python is neither pass by value nor pass by reference. It is pass by assignment.
Supporting reference, the Python FAQ:
https://docs.python.org/3/faq/programming.html#how-do-i-write-a-function-with-output-parameters-call-by-reference
IOW:
If you pass an immutable value, changes to it do not change its
value in the caller - because you are rebinding the name to a new
object.
If you pass a mutable value, changes made in the called function,
also change the value in the caller, so long as you do not rebind
that name to a new object. If you reassign the variable,
creating a new object, that change and subsequent changes to the
name are not seen in the caller.
So if you pass a list, and change its 0th value, that change is seen in both the called and the caller. But if you reassign the list with a new list, this change is lost. But if you slice the list and replace that with a new list, that change is seen in both the called and the caller.
EG:
def change_it(list_):
    # This change would be seen in the caller if we left it alone
    list_[0] = 28
    # This change is also seen in the caller, and replaces the above
    # change
    list_[:] = [1, 2]
    # This change is not seen in the caller.
    # If this were pass by reference, this change too would be seen in
    # caller.
    list_ = [3, 4]
thing = [10, 20]
change_it(thing)
# here, thing is [1, 2]
If you're a C fan, you can think of this as passing a pointer by value - not a pointer to a pointer to a value, just a pointer to a value.
HTH.
Here is the doc for drop:
Return new object with labels in requested axis removed.
So a new dataframe is created. The original has not changed.
But as for all objects in python, the data frame is passed to the function by reference.
You need to declare 'a' as global at the start of the function; otherwise it is a local variable and does not change the 'a' in the main code.
Short answer:
By value: df2 = df.copy()
By reference: df2 = df
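A rough sketch of what that short answer means, using a plain list in place of a DataFrame (an assumption made only so the example runs without pandas; the names original, alias and snapshot are illustrative):

```python
original = [1, 2, 3]

alias = original            # a second name for the same object
snapshot = original.copy()  # a new, independent list (shallow copy)

original.append(4)          # mutate through the first name

print(alias)     # [1, 2, 3, 4]  (same object, so the change shows up)
print(snapshot)  # [1, 2, 3]     (separate object, unaffected)
```

Note that neither line is really about how arguments are passed; plain assignment never copies, and .copy() always has to be requested explicitly.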
I am aware that numeric values are immutable in python. I have also read how everything is an object in python. I just want to know if numeric types are also objects in python. Because if they are objects, then the variables are actually reference variables right? Does it mean that if I pass a number to a function and modify it inside a function, then two number objects with two references are created? Is there a concept of primitive data types in python?
Note: I too was thinking of them as objects. But visualizing it in Python Tutor shows something different:
http://www.pythontutor.com/visualize.html#mode=edit
def test(a):
    a+=10
b=100
test(b)
Or is it a defect in the visualization tool?
Are numeric types objects?
>>> isinstance(1, object)
True
Apparently they are. :-).
Note that you might need to adjust your mental model of an object a little. It seems to me that you're thinking of object as something that is "mutable" -- that isn't the case. In reality, we need to think of python names as a reference to an object. That object may hold references to other objects.
name = something
Here, the right hand side is evaluated -- All the names are resolved into objects and the result of the expression (an object) is referenced by "name".
Ok, now let's consider what happens when you pass something to a function.
def foo(x):
    x = 2
z = 3
foo(z)
print(z)
What do we expect to happen here? Well, first we create the function foo. Next, we create the object 3 and reference it by the name z. After that, we look up the value that z references and pass that value to foo. Upon entering foo, that value gets referenced by the (local) name x. We then create the object 2 and reference it by the local name x. Note, x has nothing to do with the global z -- They're independent references. Just because they were referencing the same object when you entered the function doesn't mean that they have to reference the same object for all time. We can change what a name references at any point by using an assignment statement.
Note, your example with += may seem to complicate things, but you can think of a += 10 as a = a + 10 if it helps in this context. For more information on += check out: When is "i += x" different from "i = i + x" in Python?
Everything in Python is an object, and that includes the numbers. There are no "primitive" types, only built-in types.
Numbers, however, are immutable. When you perform an operation with a number, you are creating a new number object.
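A small sketch of that point, based on the test function from the question (the name add_ten and the return statement are my additions, so the new object can be inspected from outside):

```python
def add_ten(n):
    n += 10      # builds a new int object and rebinds the local name n
    return n

b = 1000
result = add_ten(b)

print(isinstance(b, object))  # True: ints are objects too
print(b, result)              # 1000 1010: the caller's b is untouched
```

So two int objects exist after the call: the original one still referenced by b, and the new one created by the addition.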
We all know the dogma that global variables are bad. As I began to learn Python I read that parameters passed to functions are treated as local variables inside the function. This seems to be at least half of the truth:
def f(i):
    print("Calling f(i)...")
    print("id(i): {}\n".format(id(i)))
    print("Inside f(): i += 1")
    i += 1
    print("id(i): {}".format(id(i)))
    return
i = 1
print("\nBefore function call...")
print("id(i): {}\n".format(id(i)))
f(i)
This evaluates to:
Before function call...
id(i): 507107200
Calling f(i)...
id(i): 507107200
Inside f(): i += 1
id(i): 507107232
As I read now, the calling mechanism of functions in Python is "call by object reference". This means an argument is initially passed by its object reference, but if it is modified inside the function, a new object is created. This seems reasonable to me to avoid a design in which functions unintentionally modify global variables.
But what happens if we pass a list as an argument?
def g(l):
    print("Calling f(l)...")
    print("id(l): {}\n".format(id(l)))
    print("Inside f(): l[0] += 1")
    l[0] += 1
    print("id(l): {}".format(id(l)))
    return
l = [1, 2, 3]
print("\nBefore function call...")
print("id(l): {}\n".format(id(l)))
g(l)
This results in:
Before function call...
id(l): 120724616
Calling f(l)...
id(l): 120724616
Inside f(): l[0] += 1
id(l): 120724616
As we can see, the object reference remains the same! So we work on a global variable, don't we?
I know we can easily overcome this by passing a copy of the list to the function with:
g(l[:])
But my question is: What is the reason to implement two different behaviors of function parameters in Python? If we intend to manipulate a global variable, we could also use the "global" keyword for lists like we would do for integers, couldn't we? How is this behavior consistent with the Zen of Python's "explicit is better than implicit"?
Python has two types of objects - mutable and immutable. Most of the built-in types, like int, string or float, are immutable. This means they cannot change. Types like list, dict or array are mutable, which means that their state can be changed. Almost all user-defined objects are mutable too.
When you do i += 1, you assign a new value to i, which is i + 1. This doesn't mutate i in any way, it just says that it should forget i and replace it with value of i + 1. Then i becomes replaced by a completely new object.
But when you do l[0] += 1 on a list l, you tell the list that it should replace element 0 with l[0] + 1. This means that id(l[0]) will change, because element 0 is now a new object, and the state of the list l will change, but its identity remains the same - it's the same object it was, only changed.
Note that this does not work for strings: they are immutable, so item assignment such as s[0] = 'x' raises a TypeError, and "changing" one character always means building a completely new string object.
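To make the string case concrete, here is a minimal sketch (the variable names are illustrative only):

```python
s = "hello"

try:
    s[0] = "j"                      # strings do not support item assignment
    item_assignment_worked = True
except TypeError:
    item_assignment_worked = False

t = "j" + s[1:]                     # "changing" a string means building a new one

print(item_assignment_worked)  # False
print(s, t)                    # hello jello
```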
Why are int & list function parameters differently treated?
They are not. All parameters are treated the same, regardless of type.
You are seeing different behavior between the two cases because you are doing different things to l.
First, let's simplify the += into an = and a +: l = l + 1 in the first case, and l[0] = l[0] + 1 in the second. (+= doesn't always equal an assignment and +; it depends on the runtime class of the object on the left side, which can override it; but here, for ints, it is equivalent to an assignment and +.) Also, the right side of the assignment just reads stuff and is not interesting, so let's just ignore it for now; so you have:
l = something (in the first case)
l[0] = something (in the second case)
The second one is "assigning to an element", which is actually syntactic sugar for a call to the method .__setitem__():
l.__setitem__(0, something)
So now you can see the difference between the two --
In the first case, you are assigning to the variable l. Python is pass-by-value, so this has no effect on outside code. Assigning to the variable simply makes it point to a new object; it has no effect on the object that it used to point to. If you had assigned something to l in the second case, it would also have had no effect on the original object.
In the second case, you are calling a method on the object pointed to by l. This method happens to be a mutating method on lists, and so it modifies the contents of the list object, the original list object to which a pointer was passed in to the function. It is true that int (the runtime class of l in the first case) happens to have no methods that are mutating, but that is beside the point.
If you had done the same thing to l in both cases (if that were possible), then you can expect the same semantics.
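Here is a short sketch of the cases discussed above, side by side. The function names are made up for illustration; the behavior of += on lists (in-place extension via list.__iadd__) is standard Python:

```python
def int_plus_equals(n):
    n += 1                 # int has no in-place add, so this is n = n + 1

def list_plus_equals(xs):
    xs += [4]              # list.__iadd__ extends the existing list in place

def list_plus_then_assign(xs):
    xs = xs + [4]          # builds a new list and rebinds only the local name

def set_item(xs):
    xs.__setitem__(0, 99)  # what xs[0] = 99 does under the hood

number = 1
int_plus_equals(number)

first = [1, 2, 3]
list_plus_equals(first)

second = [1, 2, 3]
list_plus_then_assign(second)

third = [1, 2, 3]
set_item(third)

print(number)  # 1
print(first)   # [1, 2, 3, 4]
print(second)  # [1, 2, 3]
print(third)   # [99, 2, 3]
```

The parameter passing is identical in all four calls; only what the function body does to the object differs.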
This is pretty common across a bunch of languages (Ruby, for example).
The variable itself is scoped to the function. But that variable is just a pointer to an object floating around in memory somewhere -- and that object can be changed.
In Python everything is an object, and hence everything is represented by reference. The most notable thing about variables in Python is that they contain references to objects, not the objects themselves. Now, when arguments are passed to functions, they are passed by reference. Consequently, inside the scope of a function, every parameter is assigned the reference of the argument and then treated as a local variable inside the function. When you assign a new value to a parameter, you are changing which object it refers to, so you have a new object, and any changes to it (even if it's a mutable object) will not be seen outside the scope of the function in question and are not related in any way to the passed argument. That said, when you don't assign a new reference to the parameter, it keeps holding the reference of the argument, and any changes to it (if and only if it's mutable) will be seen outside the scope of the function.
IMO Python is pass-by-value if the parameter is a basic type, like a number or a boolean:
def func_a(bool_value):
    bool_value = True
This will not change the outside bool_value, right?
So my question is: how can I make the change to bool_value take effect on the outside one (pass by reference)?
You can use a list to enclose the inout variable:
def func(container):
    container[0] = True
container = [False]
func(container)
print(container[0])
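If wrapping the value in a list feels awkward, another option is simply to return the new value and rebind the name at the call site. A minimal sketch, reusing the func_a name from the question (the return statement is my addition):

```python
def func_a(bool_value):
    # Compute the new value and hand it back instead of trying to
    # change the caller's name from inside the function.
    return True

bool_value = False
bool_value = func_a(bool_value)   # the caller rebinds its own name explicitly

print(bool_value)  # True
```

This keeps the change visible at the call site, which is usually easier to follow than an in/out container.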
The call-by-value/call-by-reference misnomer is an old debate. Python's semantics are more accurately described by CLU's call-by-sharing. See Fredrik Lundh's write up of this for more detail:
Call By Object
Python (always), like Java (mostly), passes arguments (and, in simple assignment, binds names) by object reference. There is no concept of "pass by value", nor any concept of "reference to a variable" -- only reference to a value (some express this by saying that Python doesn't have "variables"... it has names, which get bound to values -- and that is all that can ever happen).
Mutable objects can have mutating methods (some of which look like operators or even assignment, e.g. a.b = c actually means type(a).__setattr__(a, 'b', c), which calls a method that may well be a mutating one).
But simple assignment to a barename (and argument passing, which is exactly the same as simple assignment to a barename) never has anything at all to do with any mutating methods.
Quite independently of the types involved, simple barename assignment (and, identically, argument passing) only ever binds or rebinds the specific name on the left of the =, never affecting any other name nor any object in any way whatsoever. You're very mistaken if you believe that types have anything to do with the semantics of argument passing (or, identically, simple assignment to barenames).
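A small sketch of the difference between attribute assignment (a method call that may mutate) and barename assignment (pure rebinding). The Settings class and the function names are hypothetical, invented for this illustration:

```python
class Settings:
    pass

def enable(obj):
    obj.enabled = True                           # attribute assignment mutates obj

def enable_explicit(obj):
    type(obj).__setattr__(obj, "verbose", True)  # the same mechanism, spelled out

def rebind(obj):
    obj = Settings()                             # barename assignment: local rebinding only
    obj.enabled = False                          # affects the new object, not the caller's

s = Settings()
enable(s)
enable_explicit(s)
rebind(s)

print(s.enabled, s.verbose)  # True True
```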
With immutable types you can't, but if you pass a user-defined class instance, a list or a dictionary, you can change it and keep working with only one object.
Like this:
def add1(my_list):
    my_list.append(1)
a = []
add1(a)
print(a)
But if you do my_list = [1], you obtain a new instance and lose the original reference inside the function. That's why you can't just do "my_bool = False" and hope that your variable outside of the function gets that False.
"Learning Python, 4th Ed." mentions that:
the enclosing scope variable is looked up when the nested functions
are later called..
However, I thought that when a function exits, all of its local references disappear.
def makeActions():
    acts = []
    for i in range(5): # Tries to remember each i
        acts.append(lambda x: i ** x) # All remember same last i!
    return acts
makeActions()[n] is the same for every n because the variable i is somehow looked up at call time. How does Python look up this variable? Shouldn't it not exist at all because makeActions has already exited? Why doesn't Python do what the code intuitively suggests, and define each function by replacing i with its current value within the for loop as the loop is running?
I think it's pretty obvious what happens when you think of i as a name not some sort of value. Your lambda function does something like "take x: look up the value of i, calculate i**x" ... so when you actually run the function, it looks up i just then so i is 4.
You can also use the current number, but you have to make Python bind it to another name:
def makeActions():
    def make_lambda(j):
        return lambda x: j ** x  # the j here is still a name, but now it won't change anymore
    acts = []
    for i in range(5):
        # now you're pushing the current i as a value to another scope and
        # binding it there, under a new name
        acts.append(make_lambda(i))
    return acts
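Two other common ways to bind the current number, sketched next to the late-binding original for comparison: a default argument (evaluated once, when each lambda is created) and functools.partial. The function names make_actions_late, make_actions_default and make_actions_partial are my own:

```python
from functools import partial

def make_actions_late():
    acts = []
    for i in range(5):
        acts.append(lambda x: i ** x)       # i is looked up when the lambda is called
    return acts

def make_actions_default():
    acts = []
    for i in range(5):
        acts.append(lambda x, i=i: i ** x)  # default value is evaluated now, per lambda
    return acts

def make_actions_partial():
    # partial(pow, i)(x) calls pow(i, x), i.e. i ** x, with i fixed at creation time
    return [partial(pow, i) for i in range(5)]

late = [f(2) for f in make_actions_late()]
by_default = [f(2) for f in make_actions_default()]
by_partial = [f(2) for f in make_actions_partial()]

print(late)        # [16, 16, 16, 16, 16]
print(by_default)  # [0, 1, 4, 9, 16]
print(by_partial)  # [0, 1, 4, 9, 16]
```

The default-argument form is short, but it adds an extra parameter that a caller could accidentally override; the helper-function and partial forms avoid that.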
It might seem confusing, because you often get taught that a variable and its value are the same thing -- which is true, but only in languages that actually use variables. Python has no variables, but names instead.
About your comment, actually I can illustrate the point a bit better:
i = 5
myList = [i, i, i]
i = 6
print(myList) # myList is still [5, 5, 5].
You said you changed i to 6, but that is not what actually happened: i=6 means "I have a value, 6, and I want to name it i". The fact that you already used i as a name matters nothing to Python; it will just reassign the name, not change its value (that only works with variables).
You could say that in myList = [i, i, i], whatever value i currently points to (the number 5) gets three new names: myList[0], myList[1], myList[2]. That's the same thing that happens when you call a function: The arguments are given new names. But that is probably going against any intuition about lists ...
This can explain the behavior in the example: You assign myList[0]=5, myList[1]=5, myList[2]=5 - no wonder they don't change when you reassign i. If i were something mutable, for example a list, then mutating that object (rather than reassigning i) would show up in all entries of myList too, because you just have different names for the same value!
The simple fact that you can use mylist[0] on the left hand of a = proves that it is indeed a name. I like to call = the assign name operator: It takes a name on the left, and a expression on the right, then evaluates the expression (call function, look up the values behind names) until it has a value and finally gives the name to the value. It does not change anything.
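A quick sketch of the mutable variant mentioned above, where i names a list rather than a number (mutated_view is just a snapshot variable I added for inspection):

```python
i = [5]
myList = [i, i, i]      # three more names for the same list object

i.append(6)             # mutates the one object all three entries refer to
mutated_view = [list(entry) for entry in myList]

i = [7]                 # rebinding i gives the name to a new value ...
print(mutated_view)     # [[5, 6], [5, 6], [5, 6]]
print(myList)           # [[5, 6], [5, 6], [5, 6]]  ... and myList is unaffected
```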
For Mark's comment about compiling functions:
Well, references (and pointers) only make sense when we have some sort of addressable memory. The values are stored somewhere in memory and references lead you that place. Using a reference means going to that place in memory and doing something with it. The problem is that none of these concepts are used by Python!
The Python VM has no concept of memory - values float somewhere in space and names are little tags connected to them (by a little red string). Names and values exist in separate worlds!
This makes a big difference when you compile a function. If you have references, you know the memory location of the object you refer to. Then you can simply replace the reference with this location.
Names on the other hand have no location, so what you have to do (during runtime) is follow that little red string and use whatever is on the other end. That is the way Python compiles functions: Wherever there is a name in the code, it adds an instruction that will figure out what that name stands for.
So basically Python does fully compile functions, but names are compiled as lookups in the nesting namespaces, not as some sort of reference to memory.
When you use a name, the Python compiler will try to figure out which namespace it belongs to. This results in an instruction to load that name from the namespace it found.
Which brings you back to your original problem: In lambda x: i ** x, the i is compiled as a lookup in the makeActions namespace (because i was used there). Python has no idea, nor does it care about the value behind it (it does not even have to be a valid name). Once that code runs, the i gets looked up in its original namespace and gives the more or less expected value.
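You can see this from the compiled code object itself. A small sketch, assuming CPython-style code objects (the co_freevars attribute lists the names a function looks up in an enclosing scope):

```python
def makeActions():
    acts = []
    for i in range(5):
        acts.append(lambda x: i ** x)
    return acts

first = makeActions()[0]

# The lambda's code object records i as a name to be resolved in the
# enclosing function's scope; no value of i is baked into the code.
free_names = first.__code__.co_freevars

print(free_names)  # ('i',)
print(first(2))    # 16, because i is looked up at call time and is 4 by then
```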
What happens when you create a closure:
The closure is constructed with a pointer to the frame (or roughly, block) that it was created in: in this case, the for block.
The closure actually assumes shared ownership of that frame, by incrementing the frame's ref count and stashing the pointer to that frame in the closure. That frame, in turn, keeps around references to the frames it was enclosed in, for variables that were captured further up the stack.
The value of i in that frame keeps changing as long as the for loop is running – each assignment to i updates the binding of i in that frame.
Once the for loop exits, the frame is popped off the stack, but it isn't thrown away as it might usually be! Instead, it's kept around because the closure's reference to the frame is still active. At this point, though, the value of i is no longer updated.
When the closure is invoked, it picks up whatever value of i is in the parent frame at the time of invocation. Since in the for loop you create closures, but don't actually invoke them, the value of i upon invocation will be the last value it had after all the looping was done.
Future calls to makeActions will create different frames. You won't reuse the for loop's previous frame, or update that previous frame's i value, in that case.
In short: frames are garbage-collected just like other Python objects, and in this case, an extra reference is kept around to the frame corresponding to the for block so it doesn't get destroyed when the for loop goes out of scope.
To get the effect you want, you need to have a new frame created for each value of i you want to capture, and each lambda needs to be created with a reference to that new frame. You won't get that from the for block itself, but you could get that from a call to a helper function which will establish the new frame. See THC4k's answer for one possible solution along these lines.
The local references persist because they're contained in the local scope, which the closure keeps a reference to.
I thought that when a function exits, all of its local references disappear.
Except for those locals which are closed over in a closure. Those do not disappear, even when the function to which they are local has returned.
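As a rough illustration of that, in CPython a closed-over local lives in a "cell" object that each inner function holds on to through its __closure__ attribute. The sketch below inspects those cells after makeActions has returned (the variable names cells, shared and value are mine):

```python
def makeActions():
    acts = []
    for i in range(5):
        acts.append(lambda x: i ** x)
    return acts

acts = makeActions()

# Each lambda closes over exactly one name (i), so it has one cell.
cells = [f.__closure__[0] for f in acts]

shared = all(c is cells[0] for c in cells)  # do all lambdas share one cell?
value = cells[0].cell_contents              # what i refers to now

print(shared)  # True
print(value)   # 4
```

All five lambdas share a single cell for i, which is why they all see the last value, and that cell stays alive as long as any of the lambdas does.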
Intuitively one might think i would be captured in its current state but that is not the case. Think of each layer as a dictionary of name value pairs.
Level 1:
acts
i
Level 2:
x
Every time you create a closure for the inner lambda you are capturing a reference to level one. I can only assume that the run-time will perform a look-up of the variable i, starting in level 2 and making its way to level 1. Since you are not executing these functions immediately they will all use the final value of i.
Experts?