We all know the dogma that global variables are bad. As I began to learn python I read parameters passed to functions are treated as local variables inside the funktion. This seems to be at least half of the truth:
def f(i):
print("Calling f(i)...")
print("id(i): {}\n".format(id(i)))
print("Inside f(): i += 1")
i += 1
print("id(i): {}".format(id(i)))
return
i = 1
print("\nBefore function call...")
print("id(i): {}\n".format(id(i)))
f(i)
This evaluates to:
Before function call...
id(i): 507107200
Calling f(i)...
id(i): 507107200
Inside f(): i += 1
id(i): 507107232
As I read now, the calling mechanism of functions in Python is "Call by object reference". This means an argument is initially passed by it's object reference, but if it is modified inside the function, a new object variable is created. This seems reasonable to me to avoid a design in which functions unintendedly modify global variables.
But what happens if we pass a list as an argument?
def g(l):
print("Calling f(l)...")
print("id(l): {}\n".format(id(l)))
print("Inside f(): l[0] += 1")
l[0] += 1
print("id(l): {}".format(id(l)))
return
l = [1, 2, 3]
print("\nBefore function call...")
print("id(l): {}\n".format(id(l)))
g(l)
This results in:
Before function call...
id(l): 120724616
Calling f(l)...
id(l): 120724616
Inside f(): l[0] += 1
id(l): 120724616
As we can see, the object reference remains the same! So we work on a global variable, don't we?
I know we can easily overcome this by passing a copy of the list to the function with:
g(l[:])
But my question is: What is the reason the implement two different behaviors of function parameters in Python? If we intend to manipulate a global variable, we could also use the "global"-keyword for list like we would do for integers, couldn't we? How is this behavior consistent with the zen of python "explicit is better than implicit"?
Python has two types of objects - mutable and inmutable. Most of build-in types, like int, string or float, are inmutable. This means they cannot change. Types like list, dict or array are mutable, which means that their state can be changed. Almost all user defined objects are mutable too.
When you do i += 1, you assign a new value to i, which is i + 1. This doesn't mutate i in any way, it just says that it should forget i and replace it with value of i + 1. Then i becomes replaced by a completely new object.
But when you do i[0] += 1 in list, you say to the list that is should replace element 0 with i[0] + 1. This means that id(i[0]) will be changed with new object, and the state of list i will change, but it's identity remains the same - it's the same object it was, only changed.
Note that in Python this is not true for strings, as they are immutable and changing one element will copy the string with updated values and create new object.
Why are int & list function parameters differently treated?
They are not. All parameters are treated the same, regardless of type.
You are seeing different behavior between the two cases because you are doing different things to l.
First, let's simplify the += into an = and a +: l = l + 1 in the first case, and l[0] = l[0] + 1 in the second. (+= doesn't always equal an assignment and +; it depends on the runtime class of the object on the left side, which can override it; but here, for ints, it is equivalent to an assignment and +.) Also, the right side of the assignment just reads stuff and is not interesting, so let's just ignore it for now; so you have:
l = something (in the first case)
l[0] = something (in the second case)
The second one is "assigning to an element", which is actually syntactic sugar for a call to the method . __setitem__():
l.__setitem__(0, something)
So now you can see the difference between the two --
In the first case, you are assigning to the variable l. Python is pass-by-value, so this has no effect on outside code. Assigning to the variable simply makes it point to a new object; it has no effect on the object that it used to point to. If you had assigned something to l in the second case, it would also have had no effect on the original object.
In the second case, you are calling a method on the object pointed to by l. This method happens to be a mutating method on lists, and so modifies the contents of the list object, the original list object a pointer to which was passed in to the method. It is true that int (the runtime class of l in the first case) happens to have no methods that are mutating, but that is besides the point.
If you had done the same thing to l in both cases (if that were possible), then you can expect the same semantics.
This is pretty common across a bunch of languages (Ruby, for example).
The variable itself is scoped to the function. But that variable is just a pointer to an object floating around in memory somewhere -- and that object can be changed.
In Python everything is an object, and hence everything is represented by reference. The most notable thing about variables in Python is that they contain references to objects, not the objects themselves. Now, when arguments are passed to functions, they are passed by reference. Consequently, Inside the scope of a function, every parameter is assigned to the reference of the argument and then treated as a local variable inside the function. When you assign a new value to a parameter, you are changing the object it refers to, and so you have a new object and any changes to it (even if it's a mutable object) will not be seen outside the scope of the function in question, and not related anyway to the passed argument. That said, when you don't assign a new reference to the parameter, it stays holding the reference of the argument, and any changes to it (if and only if it's mutable) will be seen outside the scope of the function.
Related
How does Python manage to locate a global/nonlocal variable so quickly even if the recursion is thousands of layers deep? Also why do only need global/nonlocal for primitive data types and not pointers?
How does Python manage to locate a global/nonlocal variable so quickly even if the recursion is thousands of layers deep?
A useful reference (as far as CPython goes) is: TenThousandMeters. The key points from this is that (1) globals are accessible through the bytecode instruction LOAD_GLOBAL, (2) declared nonlocals are 'cell variables' directly accessible to a code block. The VM doesn't need to somehow iterate through the previous frames on the call stack to get to them.
Also why do only need global/nonlocal for primitive data types and not
pointers?
I think what you're getting at here is: why can we do something like my_list.append(42) if my_list comes from an enclosing scope, without needing a nonlocal declaration, whereas if we do x = x + 42 we do need to have declared nonlocal x?
The key point here is that x = x + 42 reassigns the name x to refer to a different object and requires a nonlocal declaration. But my_list.append(42) doesn't reassign a name. my_list refers to the same object before and after the statement: the effect of the statement is to perform an operation on that mutable object, and nonlocal isn't needed here: just like it isn't when you call a method on my_list which happens to leave it unchanged.
This isn't exactly about primitive data types vs others. For example, my_list = [i+2 for i in my_list] does reassign my_list to point to a newly created object, and needs nonlocal.
If I pass a dataframe to a function and modify it inside the function, is it pass-by-value or pass-by-reference?
I run the following code
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
def letgo(df):
df = df.drop('b',axis=1)
letgo(a)
the value of a does not change after the function call. Does it mean it is pass-by-value?
I also tried the following
xx = np.array([[1,2], [3,4]])
def letgo2(x):
x[1,1] = 100
def letgo3(x):
x = np.array([[3,3],[3,3]])
It turns out letgo2() does change xx and letgo3() does not. Why is it like this?
The short answer is, Python always does pass-by-value, but every Python variable is actually a pointer to some object, so sometimes it looks like pass-by-reference.
In Python every object is either mutable or non-mutable. e.g., lists, dicts, modules and Pandas data frames are mutable, and ints, strings and tuples are non-mutable. Mutable objects can be changed internally (e.g., add an element to a list), but non-mutable objects cannot.
As I said at the start, you can think of every Python variable as a pointer to an object. When you pass a variable to a function, the variable (pointer) within the function is always a copy of the variable (pointer) that was passed in. So if you assign something new to the internal variable, all you are doing is changing the local variable to point to a different object. This doesn't alter (mutate) the original object that the variable pointed to, nor does it make the external variable point to the new object. At this point, the external variable still points to the original object, but the internal variable points to a new object.
If you want to alter the original object (only possible with mutable data types), you have to do something that alters the object without assigning a completely new value to the local variable. This is why letgo() and letgo3() leave the external item unaltered, but letgo2() alters it.
As #ursan pointed out, if letgo() used something like this instead, then it would alter (mutate) the original object that df points to, which would change the value seen via the global a variable:
def letgo(df):
df.drop('b', axis=1, inplace=True)
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
letgo(a) # will alter a
In some cases, you can completely hollow out the original variable and refill it with new data, without actually doing a direct assignment, e.g. this will alter the original object that v points to, which will change the data seen when you use v later:
def letgo3(x):
x[:] = np.array([[3,3],[3,3]])
v = np.empty((2, 2))
letgo3(v) # will alter v
Notice that I'm not assigning something directly to x; I'm assigning something to the entire internal range of x.
If you absolutely must create a completely new object and make it visible externally (which is sometimes the case with pandas), you have two options. The 'clean' option would be just to return the new object, e.g.,
def letgo(df):
df = df.drop('b',axis=1)
return df
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
a = letgo(a)
Another option would be to reach outside your function and directly alter a global variable. This changes a to point to a new object, and any function that refers to a afterward will see that new object:
def letgo():
global a
a = a.drop('b',axis=1)
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
letgo() # will alter a!
Directly altering global variables is usually a bad idea, because anyone who reads your code will have a hard time figuring out how a got changed. (I generally use global variables for shared parameters used by many functions in a script, but I don't let them alter those global variables.)
To add to #Mike Graham's answer, who pointed to a very good read:
In your case, what is important to remember is the difference between names and values. a, df, xx, x, are all names, but they refer to the same or different values at different points of your examples:
In the first example, letgo rebinds df to another value, because df.drop returns a new DataFrame unless you set the argument inplace = True (see doc). That means that the name df (local to the letgo function), which was referring to the value of a, is now referring to a new value, here the df.drop return value. The value a is referring to still exists and hasn't changed.
In the second example, letgo2 mutates x, without rebinding it, which is why xx is modified by letgo2. Unlike the previous example, here the local name x always refers to the value the name xx is referring to, and changes that value in place, which is why the value xx is referring to has changed.
In the third example, letgo3 rebinds x to a new np.array. That causes the name x, local to letgo3 and previously referring to the value of xx, to now refer to another value, the new np.array. The value xx is referring to hasn't changed.
The question isn't PBV vs. PBR. These names only cause confusion in a language like Python; they were invented for languages that work like C or like Fortran (as the quintessential PBV and PBR languages). It is true, but not enlightening, that Python always passes by value. The question here is whether the value itself is mutated or whether you get a new value. Pandas usually errs on the side of the latter.
http://nedbatchelder.com/text/names.html explains very well what Python's system of names is.
Python is neither pass by value nor pass by reference. It is pass by assignment.
Supporting reference, the Python FAQ:
https://docs.python.org/3/faq/programming.html#how-do-i-write-a-function-with-output-parameters-call-by-reference
IOW:
If you pass an immutable value, changes to it do not change its
value in the caller - because you are rebinding the name to a new
object.
If you pass a mutable value, changes made in the called function,
also change the value in the caller, so long as you do not rebind
that name to a new object. If you reassign the variable,
creating a new object, that change and subsequent changes to the
name are not seen in the caller.
So if you pass a list, and change its 0th value, that change is seen in both the called and the caller. But if you reassign the list with a new list, this change is lost. But if you slice the list and replace that with a new list, that change is seen in both the called and the caller.
EG:
def change_it(list_):
# This change would be seen in the caller if we left it alone
list_[0] = 28
# This change is also seen in the caller, and replaces the above
# change
list_[:] = [1, 2]
# This change is not seen in the caller.
# If this were pass by reference, this change too would be seen in
# caller.
list_ = [3, 4]
thing = [10, 20]
change_it(thing)
# here, thing is [1, 2]
If you're a C fan, you can think of this as passing a pointer by value - not a pointer to a pointer to a value, just a pointer to a value.
HTH.
Here is the doc for drop:
Return new object with labels in requested axis removed.
So a new dataframe is created. The original has not changed.
But as for all objects in python, the data frame is passed to the function by reference.
you need to make 'a' global at the start of the function otherwise it is a local variable and does not change the 'a' in the main code.
Short answer:
By value: df2 = df.copy()
By references : df2 = df
I am aware that numeric values are immutable in python. I have also read how everything is an object in python. I just want to know if numeric types are also objects in python. Because if they are objects, then the variables are actually reference variables right? Does it mean that if I pass a number to a function and modify it inside a function, then two number objects with two references are created? Is there a concept of primitive data types in python?
Note: I too was thinking it as objects. But visualizing in python tutor says differnt:
http://www.pythontutor.com/visualize.html#mode=edit
def test(a):
a+=10
b=100
test(b)
Or is it a defect in the visualization tool?
Are numeric types objects?
>>> isinstance(1, object)
True
Apparently they are. :-).
Note that you might need to adjust your mental model of an object a little. It seems to me that you're thinking of object as something that is "mutable" -- that isn't the case. In reality, we need to think of python names as a reference to an object. That object may hold references to other objects.
name = something
Here, the right hand side is evaluated -- All the names are resolved into objects and the result of the expression (an object) is referenced by "name".
Ok, now lets consider what happens when you pass something to a function.
def foo(x):
x = 2
z = 3
foo(z)
print(z)
What do we expect to happen here? Well, first we create the function foo. Next, we create the object 3 and reference it by the name z. After that, we look up the value that z references and pass that value to foo. Upon entering foo, that value gets referenced by the (local) name x. We then create the object 2 and reference it by the local name x. Note, x has nothing to do with the global z -- They're independent references. Just because they were referencing the same object when you enter the function doesn't mean that they have to reference the function for all time. We can change what a name references at any point by using an assignment statement.
Note, your example with += may seem to complicate things, but you can think of a += 10 as a = a + 10 if it helps in this context. For more information on += check out: When is "i += x" different from "i = i + x" in Python?
Everything in Python is an object, and that includes the numbers. There are no "primitive" types, only built-in types.
Numbers, however, are immutable. When you perform an operation with a number, you are creating a new number object.
This question already has answers here:
Why variable = object doesn't work like variable = number
(10 answers)
Closed 4 years ago.
There is this code:
# assignment behaviour for integer
a = b = 0
print a, b # prints 0 0
a = 4
print a, b # prints 4 0 - different!
# assignment behaviour for class object
class Klasa:
def __init__(self, num):
self.num = num
a = Klasa(2)
b = a
print a.num, b.num # prints 2 2
a.num = 3
print a.num, b.num # prints 3 3 - the same!
Questions:
Why assignment operator works differently for fundamental type and
class object (for fundamental types it copies by value, for class object it copies by reference)?
How to copy class objects only by value?
How to make references for fundamental types like in C++ int& b = a?
This is a stumbling block for many Python users. The object reference semantics are different from what C programmers are used to.
Let's take the first case. When you say a = b = 0, a new int object is created with value 0 and two references to it are created (one is a and another is b). These two variables point to the same object (the integer which we created). Now, we run a = 4. A new int object of value 4 is created and a is made to point to that. This means, that the number of references to 4 is one and the number of references to 0 has been reduced by one.
Compare this with a = 4 in C where the area of memory which a "points" to is written to. a = b = 4 in C means that 4 is written to two pieces of memory - one for a and another for b.
Now the second case, a = Klass(2) creates an object of type Klass, increments its reference count by one and makes a point to it. b = a simply takes what a points to , makes b point to the same thing and increments the reference count of the thing by one. It's the same as what would happen if you did a = b = Klass(2). Trying to print a.num and b.num are the same since you're dereferencing the same object and printing an attribute value. You can use the id builtin function to see that the object is the same (id(a) and id(b) will return the same identifier). Now, you change the object by assigning a value to one of it's attributes. Since a and b point to the same object, you'd expect the change in value to be visible when the object is accessed via a or b. And that's exactly how it is.
Now, for the answers to your questions.
The assignment operator doesn't work differently for these two. All it does is add a reference to the RValue and makes the LValue point to it. It's always "by reference" (although this term makes more sense in the context of parameter passing than simple assignments).
If you want copies of objects, use the copy module.
As I said in point 1, when you do an assignment, you always shift references. Copying is never done unless you ask for it.
Quoting from Data Model
Objects are Python’s abstraction for data. All data in a Python
program is represented by objects or by relations between objects. (In
a sense, and in conformance to Von Neumann’s model of a “stored
program computer,” code is also represented by objects.)
From Python's point of view, Fundamental data type is fundamentally different from C/C++. It is used to map C/C++ data types to Python. And so let's leave it from the discussion for the time being and consider the fact that all data are object and are manifestation of some class. Every object has an ID (somewhat like address), Value, and a Type.
All objects are copied by reference. For ex
>>> x=20
>>> y=x
>>> id(x)==id(y)
True
>>>
The only way to have a new instance is by creating one.
>>> x=3
>>> id(x)==id(y)
False
>>> x==y
False
This may sound complicated at first instance but to simplify a bit, Python made some types immutable. For example you can't change a string. You have to slice it and create a new string object.
Often copying by reference gives unexpected results for ex.
x=[[0]*8]*8 may give you a feeling that it creates a two dimensional list of 0s. But in fact it creates a list of the reference of the same list object [0]s. So doing x[1][1] would end up changing all the duplicate instance at the same time.
The Copy module provides a method called deepcopy to create a new instance of the object rather than a shallow instance. This is beneficial when you intend to have two distinct object and manipulate it separately just as you intended in your second example.
To extend your example
>>> class Klasa:
def __init__(self, num):
self.num = num
>>> a = Klasa(2)
>>> b = copy.deepcopy(a)
>>> print a.num, b.num # prints 2 2
2 2
>>> a.num = 3
>>> print a.num, b.num # prints 3 3 - different!
3 2
It doesn't work differently. In your first example, you changed a so that a and b reference different objects. In your second example, you did not, so a and b still reference the same object.
Integers, by the way, are immutable. You can't modify their value. All you can do is make a new integer and rebind your reference. (like you did in your first example)
Suppose you and I have a common friend. If I decide that I no longer like her, she is still your friend. On the other hand, if I give her a gift, your friend received a gift.
Assignment doesn't copy anything in Python, and "copy by reference" is somewhere between awkward and meaningless (as you actually point out in one of your comments). Assignment causes a variable to begin referring to a value. There aren't separate "fundamental types" in Python; while some of them are built-in, int is still a class.
In both cases, assignment causes the variable to refer to whatever it is that the right-hand-side evaluates to. The behaviour you're seeing is exactly what you should expect in that environment, per the metaphor. Whether your "friend" is an int or a Klasa, assigning to an attribute is fundamentally different from reassigning the variable to a completely other instance, with the correspondingly different behaviour.
The only real difference is that the int doesn't happen to have any attributes you can assign to. (That's the part where the implementation actually has to do a little magic to restrict you.)
You are confusing two different concepts of a "reference". The C++ T& is a magical thing that, when assigned to, updates the referred-to object in-place, and not the reference itself; that can never be "reseated" once the reference is initialized. This is useful in a language where most things are values. In Python, everything is a reference to begin with. The Pythonic reference is more like an always-valid, never-null, not-usable-for-arithmetic, automatically-dereferenced pointer. Assignment causes the reference to start referring to a different thing completely. You can't "update the referred-to object in-place" by replacing it wholesale, because Python's objects just don't work like that. You can, of course, update its internal state by playing with its attributes (if there are any accessible ones), but those attributes are, themselves, also all references.
IMO python is pass by value if the parameter is basic types, like number, boolean
func_a(bool_value):
bool_value = True
Will not change the outside bool_value, right?
So my question is how can I make the bool_value change takes effect in the outside one(pass by reference?
You can use a list to enclose the inout variable:
def func(container):
container[0] = True
container = [False]
func(container)
print container[0]
The call-by-value/call-by-reference misnomer is an old debate. Python's semantics are more accurately described by CLU's call-by-sharing. See Fredrik Lundh's write up of this for more detail:
Call By Object
Python (always), like Java (mostly) passes arguments (and, in simple assignment, binds names) by object reference. There is no concept of "pass by value", neither does any concept of "reference to a variables" -- only reference to a value (some express this by saying that Python doesn't have "variables"... it has names, which get bound to values -- and that is all that can ever happen).
Mutable objects can have mutating methods (some of which look like operators or even assignment, e.g a.b = c actually means type(a).__setattr__(a, 'b', c), which calls a method which may likely be a mutating ones).
But simple assignment to a barename (and argument passing, which is exactly the same as simple assignment to a barename) never has anything at all to do with any mutating methods.
Quite independently of the types involved, simple barename assignment (and, identically, argument passing) only ever binds or rebinds the specific name on the left of the =, never affecting any other name nor any object in any way whatsoever. You're very mistaken if you believe that types have anything to do with the semantics of argument passing (or, identically, simple assignment to barenames).
Unmutable types can't, but if you send a user-defined class instance, a list or a dictionary, you can change it and keep with only one object.
Like this:
def add1(my_list):
my_list.append(1)
a = []
add1(a)
print a
But, if you do my_list = [1], you obtain a new instance, losing the original reference inside the function, that's why you can't just do "my_bool = False" and hope that outside of the function your variable get that False