This question already has answers here:
Why can a function modify some arguments as perceived by the caller, but not others?
(13 answers)
Closed 8 years ago.
I am currently taking the introductory Python course offered by MIT on edX, and the professor is teaching the use of functions as objects. I am confused by one piece of code in which he defines a function that takes two arguments: a list and another function.
def applyToEach(l,f):
    for i in range(len(l)):
        l[i]=f(l[i])
l=[1,-2,3.4]
applyToEach(l,abs)
print(l)
My question is: the modifications to the list take place inside the function, so how are they reflected outside it? As I understand it, the list should be unchanged when I print it, because I made all the changes inside the function. Each function gets its own environment, and that is how it worked in the previous exercises, where changes were not reflected in the main environment. Yet when I run this code, the modified list is printed. So where is my reasoning wrong?
Because they are the same object. You can verify that with id() or is.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
def applyToEach(l, f):
    print('inside id is: ', id(l))
    for i in range(len(l)):
        l[i] = f(l[i])
l = [1, -2, 3.4]
print('outside id is: ', id(l))
applyToEach(l, abs)
print(l)
In Python, passing a value to a function is equivalent to the assignment 'para = value'. However, '=' only adds another reference to the value; it does not make a copy.
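A minimal sketch of the 'para = value' idea above. The function names mutate and rebind are made up for illustration. Mutating through the parameter changes the caller's list, while assigning to the parameter only changes what the local name refers to.

```python
def mutate(param):
    # param names the same list object as the caller's variable
    param[0] = 'changed'


def rebind(param):
    # '=' makes the local name refer to a new list; the caller's list is untouched
    param = ['new list']


values = [1, 2, 3]
alias = values            # plain assignment: a second name, not a copy
mutate(values)
rebind(values)

print(values)             # ['changed', 2, 3]
print(alias is values)    # True
```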
The list is passed as an argument to the function. When a list is created in Python, it is an object stored at some memory location. When you pass the list (or assign it to another name), what gets passed is a reference to that same object, not a copy. Any modification made through that reference changes the list itself, and its memory location stays the same. So whether you access it from outside or inside the function, you reach the same object, and the changes made inside are visible outside the function.
Let the list be
Fruit = ['Mango', 'Apple', 'Orange']
When we assign the list to the name Fruit, a memory location is assigned to it (say 4886),
so id(Fruit) = 4886
Now, suppose the list 'Fruit' is passed to a function that modifies it as follows:
def fun(li=[]):
    li[1] = 'Grape'
    li.extend(['Apple', 'Banana'])
What happens behind the scenes is that when the list is passed to the function as an argument, it is actually a reference to the list (its address) that is passed. So when you access and modify the list within the function, you modify the original list, because the parameter refers to that same list.
After the modification, the list outside shows the same updates that were made inside the function, and its memory location is unchanged, i.e. id(Fruit) is still 4886.
The same happens if you assign the list to another name, e.g. temp = Fruit: temp refers to the same location as Fruit, so both names point to the same list.
To avoid this, make a copy:
temp = list(Fruit)
This creates a new list at a new memory location, so id(Fruit) will differ from id(temp).
The following link gives a better understanding of how lists work in Python:
http://www.python-course.eu/deep_copy.php
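As a rough illustration of the difference between an alias, a copy made with list(), and a deep copy, here is a small sketch. The variable names (fruit, alias, shallow, nested and so on) are chosen for this example only. Note that list() copies just the outer list, so nested lists are still shared unless copy.deepcopy is used.

```python
import copy

fruit = ['Mango', 'Apple', 'Orange']
alias = fruit                  # same object, second name
shallow = list(fruit)          # new list object with the same elements

fruit[1] = 'Grape'
print(alias)                   # ['Mango', 'Grape', 'Orange']
print(shallow)                 # ['Mango', 'Apple', 'Orange']
print(id(alias) == id(fruit), id(shallow) == id(fruit))   # True False

# With nested lists, list() only copies the outer level
nested = [['Mango'], ['Apple']]
shallow_nested = list(nested)
deep_nested = copy.deepcopy(nested)
nested[0].append('Grape')
print(shallow_nested[0])       # ['Mango', 'Grape']
print(deep_nested[0])          # ['Mango']
```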
I'm an absolute Python newbie and I have some trouble with the following function. I hope you can help me. Thank you very much for your help in advance!
I have created a list of zip-files in a directory via a list-comprehension:
zips_in_folder = [file for file in os.listdir(my_path) if file.endswith('.zip')]
I then wanted to define a function that replaces the character at a certain index in every element of the list with "-":
print(zips_in_folder)
def replacer_zip_names(r_index, replacer, zips_in_folder=zips_in_folder):
    for index, element in enumerate(zips_in_folder):
        x = list(element)
        x[r_index] = replacer
        zips_in_folder[index]=''.join(x)
replacer_zip_names(5,"-")
print(zips_in_folder)
Output:
['12345#6', '22345#6']
['12345-6', '22345-6']
The function worked, but what I cannot wrap my head around is: why does my function update the actual list "zips_in_folder"? I thought the "zips_in_folder" list within the function would only be a "shadow" of the actual list outside the function. Is the scope of the for-loop global instead of local in this case?
In other functions I wrote the scope of the variables was always local...
I was searching for an answer for hours now, I hope my question isn't too obvious!
Thanks again!
Best
Felix
This is a rather intermediate topic. In one line: Python is pass-by-object-reference.
What this means
zips_in_folder is an object. An object has a reference (think of it like an address) that points to its location in memory. To access an object, you need to use its reference.
Now, here's the key part:
For objects Python passes their reference as value
This means that a copy of the reference of the object is created but again, the new reference is pointing to the same location in the memory.
As a consequence, if you use the reference's copy to access the object, then the original object will be modified.
In your function, zips_in_folder is a variable storing a new copy of the reference.
The following line is using the new copy to access the original object:
zips_in_folder[index]=''.join(x)
However, if you reassign the variable that stores the reference, nothing happens to the object or to the original reference: you have only rebound the variable holding the copy of the reference, not modified the original object. Meaning that:
def reassign(a):
    a = []
a = [1,0]
reassign(a)
print(a) # output: [1, 0]
A simple way to think about it is that lists are mutable, this means that the following will be true:
a = [1, 2, 3]
b = a # a, b are referring to the same object
a[1] = 20 # b now is [1, 20, 3]
That is because lists are objects in Python, and a function receives the object itself rather than a copy of it, so the function changes the "original" list.
The same is true for an instance of any class, user-defined or otherwise: a function manipulating an object will not make a copy of the object, it will change the "original" object passed to it.
If you know C++ or a similar language, this is close to passing a pointer by value: changes made through the parameter are visible to the caller, but assigning a new object to the parameter is not. It is not quite the same as C++ pass-by-reference.
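If the in-place update is not wanted, one option is to build and return a new list instead of assigning into the one passed in. The sketch below assumes a hypothetical function name, replaced_zip_names, and uses the sample file names from the question's output rather than a real directory listing.

```python
def replaced_zip_names(names, r_index, replacer):
    """Return a new list; the list passed in is left unchanged."""
    # Slicing builds a new string for each name, and the list
    # comprehension collects them into a brand-new list object.
    return [name[:r_index] + replacer + name[r_index + 1:] for name in names]


zips_in_folder = ['12345#6', '22345#6']
renamed = replaced_zip_names(zips_in_folder, 5, '-')

print(zips_in_folder)   # ['12345#6', '22345#6']
print(renamed)          # ['12345-6', '22345-6']
```

Whether to mutate in place or return a new list is a design choice; returning a new list makes it obvious at the call site that the data changed.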
I am trying to understand how exactly assignment operators, constructors and parameters passed in functions work in python specifically with lists and objects. I have a class with a list as a parameter. I want to initialize it to an empty list and then want to populate it using the constructor. I am not quite sure how to do it.
Lets say my class is --
class A:
    List = [] # Point 1
    def __init1__(self, begin=[]): # Point 2
        for item in begin:
            self.List.append(item)
    def __init2__(self, begin): # Point 3
        List = begin
    def __init3__(self, begin=[]): # Point 4
        List = list()
        for item in begin:
            self.List.append(item)
listObj = A()
del(listObj)
b = listObj
I have the following questions. It will be awesome if someone could clarify what happens in each case --
1. Is declaring an empty list like in Point 1 valid? What is created? A variable pointing to NULL?
2. Which of Point 2 and Point 3 are valid constructors? In Point 3 I am guessing that a new copy of the list passed in (begin) is not made and instead the variable List will be pointing to the pointer "begin". Is a new copy of the list made if I use the constructor as in Point 2?
3. What happens when I delete the object using del? Is the list deleted as well, or do I have to call del on the List before calling del on the containing object? I know Python uses GC, but if I am concerned about cleaning unused memory even before GC kicks in, is it worth it?
4. Also, assigning an object of type A to another only makes the second one point to the first, right? If so, how do I do a deep copy? Is there a feature to overload operators? I know Python is probably much simpler than this, hence the question.
EDIT:
5. I just realized that using Point 2 and Point 3 does not make a difference. The items from the list begin are only copied by reference and a new copy is not made. To do that I have to create a new list using list(). This makes sense after I see it I guess.
Thanks!
In order:
1. Using this form is simply syntactic sugar for calling the list constructor, i.e. you are creating a new (empty) list. It will be bound to the class itself (it is a static field) and will be the same for all instances.
2. Apart from the constructor name, which must always be __init__, both are valid forms, but they mean different things.
The first constructor can be called with or without a list as argument. If it is called without arguments, the empty list given as default is used (this empty list is created once during class definition, and not once per constructor call), so no items are added to the static list.
The second must be called with a list argument, or Python will complain with an error. But without the self. prefix, as you have written it, it just creates a new local variable named List, accessible only within the constructor, and leaves the static A.List variable unchanged.
3. Deleting only unlinks a reference to the object, without actually deleting anything. Once all references are removed, however, the garbage collector is free to reclaim the memory as needed.
It is usually a bad idea to try to control the garbage collector. Instead, just make sure you don't hold references to objects you no longer need and let it do its work.
4. Assigning an object to a variable only creates a new reference to the same object, yes. To create a deep copy, use copy.deepcopy from the copy module or write your own.
Operator overloading (use with care, it can make things more confusing instead of clearer if misused) can be done by overriding some special methods in the class definition.
About your edit: as I pointed out above, when writing List=list() inside the constructor without the self. (or better, since the variable is static, A.) prefix, you are just creating a new local variable bound to an empty list, and not overriding the one you defined in the class body.
For reference, the usual way to handle a list as default argument is by using a None placeholder:
class A(object):
    def __init__(self, arg=None):
        self.startvalue = list(arg) if arg is not None else list()
        # making a defensive copy of arg to keep the original intact
As an aside, do take a look at the python tutorial. It is very well written and easy to follow and understand.
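To make the difference between the shared class-level list and a per-instance list concrete, here is a small sketch. The class names Shared and Separate are invented for this example and are not from the question.

```python
class Shared:
    items = []                        # class attribute: one list for all instances

    def __init__(self, begin=()):
        for item in begin:
            self.items.append(item)   # mutates the single class-level list


class Separate:
    def __init__(self, begin=None):
        # each instance gets its own list, copied from the argument
        self.items = list(begin) if begin is not None else []


a, b = Shared([1]), Shared([2])
print(a.items, a.items is b.items)    # [1, 2] True

source = [1]
c, d = Separate(source), Separate()
c.items.append(99)
print(source, c.items, d.items)       # [1] [1, 99] []
```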
"It will be awesome if someone could clarify what happens in each case" isn't that the purpose of the dis module ?
http://docs.python.org/2/library/dis.html
I have a nested dictionary containing a bunch of data on a number of different objects (where I mean object in the non-programming sense of the word). The format of the dictionary is allData[i][someDataType], where i is a number designation of the object that I have data on, and someDataType is a specific data array associated with the object in question.
Now, I have a function that I have defined that requires a particular data array for a calculation to be performed for each object. The data array is called cleanFDF. So I feed this to my function, along with a bunch of other things it requires to work. I call it like this:
rm.analyze4complexity(allData[i]['cleanFDF'], other data, other data, other data)
Inside the function itself, I straight away re-assign the cleanFDF data to another variable name, namely clFDF. I.e. The end result is:
clFDF = allData[i]['cleanFDF']
I then have to zero out all of the data that lies below a certain threshold, as such:
clFDF[ clFDF < threshold ] = 0
OK - the function works as it is supposed to. But now when I try to plot the original cleanFDF data back in the main script, the entries that got zeroed out in clFDF are also zeroed out in allData[i]['cleanFDF']. WTF? Obviously something is happening here that I do not understand.
To make matters even weirder (from my point of view), I've tried to do a bodgy kludge to get around this by 'saving' the array to another variable before calling the function. I.e. I do
saveFDF = allData[i]['cleanFDF']
then run the function, then update the cleanFDF entry with the 'saved' data:
allData[i].update( {'cleanFDF':saveFDF} )
but somehow, simply performing clFDF[ clFDF < threshold ] = 0 within the function modifies clFDF, saveFDF and allData[i]['cleanFDF'] in the main friggin' script, zeroing out all the entries at the same array indexes! It is like they are all associated global variables somehow, but I've made no such declarations anywhere...
I am a hopeless Python newbie, so no doubt I'm not understanding something about how it works. Any help would be greatly appreciated!
You are passing the value at allData[i]['cleanFDF'] by reference (decent explanation at https://stackoverflow.com/a/430958/337678). Any changes made to it will be made to the object it refers to, which is still the same object as the original, just assigned to a different variable.
Making a deep copy of the data will likely fix your issue (Python's copy module has a deepcopy function that should do the trick ;)).
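A hedged sketch of the copy-before-modifying idea. Plain Python lists stand in for the arrays here, and the names all_data and zero_below are made up for the example. If the data is a NumPy array, as the boolean-mask indexing in the question suggests, calling the array's copy() method before zeroing should serve the same purpose.

```python
import copy

# Stand-in for allData[i]['cleanFDF']; the values are invented
all_data = {0: {'cleanFDF': [0.2, 1.5, 0.1, 3.0]}}


def zero_below(values, threshold):
    # Work on a private copy so the caller's data stays intact
    local = copy.deepcopy(values)
    for idx, value in enumerate(local):
        if value < threshold:
            local[idx] = 0
    return local


cleaned = zero_below(all_data[0]['cleanFDF'], 1.0)
print(cleaned)                    # [0, 1.5, 0, 3.0]
print(all_data[0]['cleanFDF'])    # [0.2, 1.5, 0.1, 3.0]
```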
Everything is a reference in Python.
def function(y):
    y.append('yes')
    return y
example = list()
function(example)
print(example)
It prints ['yes'] even though I am not directly assigning to the variable 'example'.
See Why does list.append evaluate to false?, Python append() vs. + operator on lists, why do these give different results?, Python lists append return value.
Closed 10 years ago.
Possible Duplicate:
Modifying list while iterating
I am writing a python script where I am trying to append objects to a List created in the body of a class, from inside a method.
My code so far is this:
class Worker:
    myList = ['one item', 'second item']
    def itter_List_Func(self, list_param):
        for item in list_param:
            more_items = item.split()
            self.myList[:] = [self.myList + item for item in more_items]
but, strangely, I run into some kind of 'cannot modify list while iterating' error. Should I drop the in-place change and create a new list object with the new items instead? Or would that create more problems because the reference to list_param is lost, or something?
To be explicit about the "make a new list directly" idea, you want something like:
myList = sum((item.split() for item in myList), [])
That's the simply-written way, that unfortunately gets slow if you have a lot of items (because sum relies on addition, and addition isn't an efficient way to join lists in Python). Using an explicit loop:
result = []
for item in myList:
    result.extend(item.split())
myList = result
That is: we don't create a copy of the list and try to modify it; we create a blank list and iteratively transform it into what we want, using the original list as input for the process.
By the way, you have two likely design issues here: you seem to be expecting the function to be passed a specific value every time it is called, and you have defined a class attribute where you probably want an instance attribute instead.
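One possible way to address both design points is sketched below, under the assumption that each Worker should own its list. The attribute name my_list and the method name split_items are invented for this example.

```python
class Worker:
    def __init__(self, items):
        self.my_list = list(items)        # instance attribute, own copy

    def split_items(self):
        result = []
        for item in self.my_list:         # iterate over the old list...
            result.extend(item.split())   # ...while building a new one
        self.my_list = result


w = Worker(['one item', 'second item'])
w.split_items()
print(w.my_list)    # ['one', 'item', 'second', 'item']
```

Because the method works on the instance's own data, it no longer needs a list parameter, and two Worker objects do not interfere with each other.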
OK, from the links that avasal provided, and some further reading here on Stack Overflow, I understand that what I am trying to do is a bad idea. Maybe I should iterate over a copy of the original list, or make assignments to a copy of the original list, because if I change the list while it is being iterated, the iterators are not informed about it, resulting in very weird behaviour or run-time errors.
"Learning Python, 4th Ed." mentions that:
the enclosing scope variable is looked up when the nested functions are later called.
However, I thought that when a function exits, all of its local references disappear.
def makeActions():
    acts = []
    for i in range(5): # Tries to remember each i
        acts.append(lambda x: i ** x) # All remember same last i!
    return acts
makeActions()[n] is the same for every n because the variable i is somehow looked up at call time. How does Python look up this variable? Shouldn't it not exist at all because makeActions has already exited? Why doesn't Python do what the code intuitively suggests, and define each function by replacing i with its current value within the for loop as the loop is running?
I think it's pretty obvious what happens when you think of i as a name not some sort of value. Your lambda function does something like "take x: look up the value of i, calculate i**x" ... so when you actually run the function, it looks up i just then so i is 4.
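The late lookup is easy to observe by calling every function in the list. This is just the question's makeActions again, followed by a check that each lambda sees i as 4.

```python
def makeActions():
    acts = []
    for i in range(5):
        acts.append(lambda x: i ** x)   # i is looked up when the lambda runs
    return acts


acts = makeActions()
print([f(2) for f in acts])   # [16, 16, 16, 16, 16]
```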
You can also use the current number, but you have to make Python bind it to another name:
def makeActions():
    def make_lambda( j ):
        return lambda x: j ** x # the j here is still a name, but now it won't change anymore
    acts = []
    for i in range(5):
        # now you're pushing the current i as a value to another scope and
        # bind it there, under a new name
        acts.append(make_lambda(i))
    return acts
It might seem confusing, because you are often taught that a variable and its value are the same thing, which is true, but only in languages that actually use variables. Python has no variables, but names instead.
About your comment, actually i can illustrate the point a bit better:
i = 5
myList = [i, i, i]
i = 6
print(myList) # myList is still [5, 5, 5].
You said you changed i to 6, but that is not what actually happened: i=6 means "I have a value, 6, and I want to name it i". The fact that you already used i as a name matters nothing to Python; it will just reassign the name, not change its value (that only works with variables).
You could say that in myList = [i, i, i], whatever value i currently points to (the number 5) gets three new names: mylist[0], mylist[1], mylist[2]. That's the same thing that happens when you call a function: the arguments are given new names. But that is probably going against any intuition about lists ...
This can explain the behavior in the example: you assign mylist[0]=5, mylist[1]=5, mylist[2]=5, so no wonder they don't change when you reassign i. If i were something mutable, for example a list, then changing i would be reflected in all entries of myList too, because you just have different names for the same value!
The simple fact that you can use mylist[0] on the left-hand side of a = proves that it is indeed a name. I like to call = the assign-name operator: it takes a name on the left and an expression on the right, then evaluates the expression (calls functions, looks up the values behind names) until it has a value, and finally gives the name to the value. It does not change anything.
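The mutable case mentioned above can be tried directly. In this sketch (the name my_list is chosen for the example), mutating the list that i names shows up in every entry, while rebinding i afterwards changes nothing.

```python
i = [5]
my_list = [i, i, i]      # three more names for the same list object
i.append(6)              # mutate the object: visible through every name
print(my_list)           # [[5, 6], [5, 6], [5, 6]]

i = [7]                  # rebind the name i: my_list is unaffected
print(my_list)           # [[5, 6], [5, 6], [5, 6]]
print(my_list[0] is my_list[1])   # True
```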
For Mark's comment about compiling functions:
Well, references (and pointers) only make sense when we have some sort of addressable memory. The values are stored somewhere in memory and references lead you to that place. Using a reference means going to that place in memory and doing something with it. The problem is that none of these concepts are used by Python!
The Python VM has no concept of memory: values float somewhere in space and names are little tags connected to them (by a little red string). Names and values exist in separate worlds!
This makes a big difference when you compile a function. If you have references, you know the memory location of the object you refer to. Then you can simply replace the reference with this location.
Names, on the other hand, have no location, so what you have to do (during runtime) is follow that little red string and use whatever is on the other end. That is the way Python compiles functions: wherever there is a name in the code, it adds an instruction that will figure out what that name stands for.
So basically Python does fully compile functions, but names are compiled as lookups in the nesting namespaces, not as some sort of reference to memory.
When you use a name, the Python compiler will try to figure out which namespace it belongs to. This results in an instruction to load that name from the namespace it found.
Which brings you back to your original problem: in lambda x: i ** x, the i is compiled as a lookup in the makeActions namespace (because i was used there). Python has no idea about, nor does it care about, the value behind it (it does not even have to be a valid name). Once that code runs, the i gets looked up in its original namespace and gives the more or less expected value.
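In CPython this lookup can be inspected on the function object itself. The following sketch uses the standard function attributes __code__.co_freevars and __closure__; it shows that the lambda records the name i as a free variable and that all the lambdas share one cell holding its current value.

```python
def makeActions():
    acts = []
    for i in range(5):
        acts.append(lambda x: i ** x)
    return acts


acts = makeActions()
f = acts[0]

print(f.__code__.co_freevars)            # ('i',)
print(f.__closure__[0].cell_contents)    # 4
# Every lambda refers to the very same cell for i
print(acts[0].__closure__[0] is acts[4].__closure__[0])   # True
```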
What happens when you create a closure:
The closure is constructed with a reference to the scope it was created in: in this case, the local scope of the makeActions call (in Python a for loop does not create a scope of its own).
The closure shares ownership of the variables it captures from that scope, so they are kept alive as long as the closure is. (In CPython this is done with small "cell" objects, one per captured variable, rather than by keeping the whole frame.) Variables captured from scopes further out are kept alive the same way.
The value of i in that scope keeps changing as long as the for loop is running: each assignment to i updates the binding of i there.
Once makeActions returns, its frame is popped off the stack, but the captured i isn't thrown away as it might usually be! Instead, it's kept around because the closure's reference to it is still active. At this point, though, the value of i is no longer updated.
When the closure is invoked, it picks up whatever value i has in the enclosing scope at the time of invocation. Since in the for loop you create closures, but don't actually invoke them, the value of i upon invocation will be the last value it had after all the looping was done.
Future calls to makeActions will create different scopes. You won't reuse the previous call's scope, or update its i value, in that case.
In short: captured variables are garbage-collected just like other Python objects, and in this case an extra reference to i is kept by each closure, so it doesn't get destroyed when makeActions returns.
To get the effect you want, you need a new scope for each value of i you want to capture, and each lambda needs to be created with a reference to that new scope. You won't get that from the for block itself, but you can get it from a call to a helper function, which establishes a new scope. See THC4k's answer for one possible solution along these lines.
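Besides a helper function, two other common ways to capture the current value are a default argument on the lambda and functools.partial. The sketch below uses made-up function names (make_actions_default, make_actions_partial) and is only one of several ways to write this.

```python
from functools import partial


def make_actions_default():
    acts = []
    for i in range(5):
        # The default value is evaluated when the lambda is created,
        # so each lambda keeps its own i
        acts.append(lambda x, i=i: i ** x)
    return acts


def make_actions_partial():
    # partial(pow, i) fixes the first argument: pow(i, x) == i ** x
    return [partial(pow, i) for i in range(5)]


print([f(2) for f in make_actions_default()])   # [0, 1, 4, 9, 16]
print([f(2) for f in make_actions_partial()])   # [0, 1, 4, 9, 16]
```

The default-argument form has one caveat: a caller can override i by passing a second argument, which the helper-function and partial forms do not allow.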
The local references persist because they're contained in the local scope, which the closure keeps a reference to.
I thought that when a function exits, all of its local references disappear.
Except for those locals which are closed over in a closure. Those do not disappear, even when the function to which they are local has returned.
Intuitively one might think i would be captured in its current state but that is not the case. Think of each layer as a dictionary of name value pairs.
Level 1:
acts
i
Level 2:
x
Every time you create a closure for the inner lambda you are capturing a reference to level one. I can only assume that the run-time will perform a look-up of the variable i, starting in level 2 and making its way to level 1. Since you are not executing these functions immediately they will all use the final value of i.
Experts?