"Learning Python, 4th Ed." mentions that:
the enclosing scope variable is looked up when the nested functions
are later called.
However, I thought that when a function exits, all of its local references disappear.
def makeActions():
    acts = []
    for i in range(5):  # Tries to remember each i
        acts.append(lambda x: i ** x)  # All remember same last i!
    return acts
makeActions()[n] is the same for every n because the variable i is somehow looked up at call time. How does Python look up this variable? Shouldn't it not exist at all because makeActions has already exited? Why doesn't Python do what the code intuitively suggests, and define each function by replacing i with its current value within the for loop as the loop is running?
I think it's pretty obvious what happens when you think of i as a name not some sort of value. Your lambda function does something like "take x: look up the value of i, calculate i**x" ... so when you actually run the function, it looks up i just then so i is 4.
You can also use the current number, but you have to make Python bind it to another name:
def makeActions():
    def make_lambda(j):
        return lambda x: j ** x  # the j here is still a name, but now it won't change anymore
    acts = []
    for i in range(5):
        # now you're pushing the current i as a value to another scope and
        # binding it there, under a new name
        acts.append(make_lambda(i))
    return acts
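Another common way to freeze the current value, not used in the answer above, is to bind it as a default argument, since defaults are evaluated when the lambda is created rather than when it is called. A minimal sketch comparing the two behaviours (make_actions_late and make_actions_bound are names made up for this illustration):

```python
def make_actions_late():
    acts = []
    for i in range(5):
        acts.append(lambda x: i ** x)       # i is looked up at call time
    return acts


def make_actions_bound():
    acts = []
    for i in range(5):
        acts.append(lambda x, i=i: i ** x)  # default evaluated now, once per iteration
    return acts


late = [f(2) for f in make_actions_late()]
bound = [f(2) for f in make_actions_bound()]
print(late)   # [16, 16, 16, 16, 16]
print(bound)  # [0, 1, 4, 9, 16]
```

The default-argument form is shorter than a helper function, at the cost of adding a second parameter that a caller could override by accident.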
It might seem confusing, because you often get taught that a variable and its value are the same thing -- which is true, but only in languages that actually use variables. Python has no variables, but names instead.
About your comment, actually I can illustrate the point a bit better:
i = 5
myList = [i, i, i]
i = 6
print(myList) # myList is still [5, 5, 5].
You said you changed i to 6, but that is not what actually happened: i = 6 means "I have a value, 6, and I want to name it i". The fact that you already used i as a name matters nothing to Python; it will just reassign the name, not change its value (that only works with variables).
You could say that in myList = [i, i, i], whatever value i currently points to (the number 5) gets three new names: myList[0], myList[1], myList[2]. That's the same thing that happens when you call a function: the arguments are given new names. But that is probably going against any intuition about lists ...
This can explain the behavior in the example: you assign myList[0] = 5, myList[1] = 5, myList[2] = 5, so no wonder they don't change when you reassign i. If i were something mutable, for example a list, then mutating that list through i would show up in all entries of myList too, because you just have different names for the same value!
The simple fact that you can use myList[0] on the left-hand side of a = proves that it is indeed a name. I like to call = the assign-name operator: it takes a name on the left and an expression on the right, then evaluates the expression (calls functions, looks up the values behind names) until it has a value, and finally gives the name to the value. It does not change anything.
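The difference between rebinding a name and mutating the object behind it can be checked directly. A small sketch (box and boxes are names invented for this example):

```python
i = 5
my_list = [i, i, i]
i = 6                      # rebinds the name i; the list still holds 5
print(my_list)             # [5, 5, 5]

box = [5]                  # a mutable object
boxes = [box, box, box]    # three more names for the same list
box.append(6)              # mutates the object, no rebinding involved
print(boxes)               # [[5, 6], [5, 6], [5, 6]]
print(boxes[0] is box)     # True

box = [7]                  # rebinding box leaves the old list untouched
print(boxes)               # [[5, 6], [5, 6], [5, 6]]
```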
For Mark's comment about compiling functions:
Well, references (and pointers) only make sense when we have some sort of addressable memory. The values are stored somewhere in memory and references lead you to that place. Using a reference means going to that place in memory and doing something with it. The problem is that none of these concepts are used by Python!
The Python VM has no concept of memory - values float somewhere in space and names are little tags connected to them (by a little red string). Names and values exist in separate worlds!
This makes a big difference when you compile a function. If you have references, you know the memory location of the object you refer to. Then you can simply replace the reference with this location.
Names on the other hand have no location, so what you have to do (during runtime) is follow that little red string and use whatever is on the other end. That is the way Python compiles functions: wherever there is a name in the code, it adds an instruction that will figure out what that name stands for.
So basically Python does fully compile functions, but names are compiled as lookups in the nesting namespaces, not as some sort of reference to memory.
When you use a name, the Python compiler will try to figure out which namespace it belongs to. This results in an instruction to load that name from the namespace it found.
Which brings you back to your original problem: in lambda x: i ** x, the i is compiled as a lookup in the makeActions namespace (because i was used there). Python has no idea, nor does it care, about the value behind it (it does not even have to be a valid name). Once that code runs, the i gets looked up in its original namespace and gives the more or less expected value.
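In CPython, the result of this compile-time decision can be inspected on the code objects. A short sketch, reusing makeActions from the question (the exact bytecode printed by dis differs between CPython versions, so no listing is shown here):

```python
import dis


def makeActions():
    acts = []
    for i in range(5):
        acts.append(lambda x: i ** x)
    return acts


inner = makeActions()[0]
print(makeActions.__code__.co_cellvars)  # ('i',): i is shared with a nested function
print(inner.__code__.co_freevars)        # ('i',): i comes from an enclosing scope
dis.dis(inner)  # prints the lambda's bytecode, including the lookup instruction for i
```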
What happens when you create a closure:
Conceptually, the closure is constructed with a reference to the scope it was created in: in this case, the local scope of the running makeActions call. (A for loop does not get a scope or frame of its own in Python, so i belongs to makeActions.)
The closure takes shared ownership of the variables it captures from that scope, so they stay alive for as long as the closure does. In CPython this is done with small "cell" objects, one per captured variable, rather than with a pointer to the whole frame. Variables captured from scopes further out are kept alive in the same way.
The value of i keeps changing as long as the for loop is running: each assignment to i updates the one binding of i that all the lambdas share.
Once makeActions returns, its frame is popped off the stack, but the binding of i isn't thrown away as it usually would be! Instead, it's kept around because the closures still reference it. At this point, though, the value of i is no longer updated.
When the closure is invoked, it picks up whatever value i has at the time of invocation. Since in the for loop you create closures, but don't actually invoke them, the value of i upon invocation will be the last value it had after all the looping was done.
Future calls to makeActions will create new bindings. They won't reuse or update the i captured by closures from a previous call.
In short: captured variables are garbage-collected just like other Python objects, and in this case the closures hold an extra reference to i, so it doesn't get destroyed when makeActions returns.
To get the effect you want, you need a new scope for each value of i you want to capture, and each lambda needs to be created inside that new scope. You won't get that from the for block itself, but you could get that from a call to a helper function, which establishes a new scope on every call. See THC4k's answer for one possible solution along these lines.
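In CPython, what each lambda keeps alive can be inspected through the function's __closure__ attribute, which holds one cell object per captured variable. A small sketch, again reusing makeActions from the question:

```python
def makeActions():
    acts = []
    for i in range(5):
        acts.append(lambda x: i ** x)
    return acts


first = makeActions()
second = makeActions()

cell_a = first[0].__closure__[0]
cell_b = first[4].__closure__[0]
print(cell_a is cell_b)                    # True: one shared binding for i
print(cell_a.cell_contents)                # 4
print(cell_a is second[0].__closure__[0])  # False: a new call gets a new binding
```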
The local references persist because they're contained in the local scope, which the closure keeps a reference to.
I thought that when a function exits, all of its local references disappear.
Except for those locals which are closed over in a closure. Those do not disappear, even when the function to which they are local has returned.
Intuitively one might think i would be captured in its current state but that is not the case. Think of each layer as a dictionary of name value pairs.
Level 1:
acts
i
Level 2:
x
Every time you create a closure for the inner lambda you are capturing a reference to level one. I can only assume that the run-time will perform a look-up of the variable i, starting in level 2 and making its way to level 1. Since you are not executing these functions immediately they will all use the final value of i.
Experts?
How does Python manage to locate a global/nonlocal variable so quickly even if the recursion is thousands of layers deep? Also, why do we only need global/nonlocal for primitive data types and not pointers?
How does Python manage to locate a global/nonlocal variable so quickly even if the recursion is thousands of layers deep?
A useful reference (as far as CPython goes) is: TenThousandMeters. The key points are that (1) globals are accessible through the bytecode instruction LOAD_GLOBAL, and (2) declared nonlocals are 'cell variables' directly accessible to a code block. The VM doesn't need to somehow iterate through the previous frames on the call stack to get to them.
Also, why do we only need global/nonlocal for primitive data types and not pointers?
I think what you're getting at here is: why can we do something like my_list.append(42) if my_list comes from an enclosing scope, without needing a nonlocal declaration, whereas if we do x = x + 42 we do need to have declared nonlocal x?
The key point here is that x = x + 42 reassigns the name x to refer to a different object and requires a nonlocal declaration. But my_list.append(42) doesn't reassign a name. my_list refers to the same object before and after the statement: the effect of the statement is to perform an operation on that mutable object, and nonlocal isn't needed here: just like it isn't when you call a method on my_list which happens to leave it unchanged.
This isn't exactly about primitive data types vs others. For example, my_list = [i+2 for i in my_list] does reassign my_list to point to a newly created object, and needs nonlocal.
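The distinction between rebinding and mutating can be seen side by side in one closure. A minimal sketch (make_counter, bump and snapshot are hypothetical names for this illustration):

```python
def make_counter():
    count = 0
    history = []

    def bump():
        nonlocal count          # required: the next line rebinds count
        count = count + 1
        history.append(count)   # no declaration needed: history is only mutated
        return count

    def snapshot():
        return count, list(history)

    return bump, snapshot


bump, snapshot = make_counter()
bump()
bump()
print(snapshot())  # (2, [1, 2])
```

Removing the nonlocal line makes bump raise UnboundLocalError on its first call, because count would then be treated as a local of bump.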
I am a beginner in Python and using Mark Lutz's book to learn the fundamentals of Python.
Here's an example that the author uses to demonstrate storing state information using lists:
def tester(start):
    def nested(label):
        print(label, state[0])
        state[0] += 1
    state = [start]
    return nested
Here's the code to test state information:
F = tester(3)
F('sam')
F('sam')
You would see that the counter starts at 3 and then keeps increasing. In essence, the above code stores the initial state start (passed when tester is called) in the list state and increments it every time nested is called.
However, I am unsure why Python doesn't throw an error in nested block. Specifically, [state] is local to tester and not nested.
To demonstrate what I mean, I am going to replace state[0] by state.
def tester(start):
    def nested(label):
        print(label, state)  # Replaced state[0] with state
        state += 1  # Replaced state[0] with state
        print("after:", state)
    state = start  # Replaced state[0] with state
    return nested
Technically, the above code should also work fine because all I have done is replace the list with a plain variable. However, PyCharm wouldn't even run this code. I get an error: UnboundLocalError: local variable 'state' referenced before assignment
Can someone please explain why the version with the list works fine? The author has stated that "this leverages the mutability of lists, and relies on the fact that in-place object changes do not classify a name as local."
I am not really sure what that means. Can someone please help me? Thanks for any help extended to me.
You should read this section of the documentation.
Basically, in both versions the scope of the nested block allows it to interact with the namespace of the encompassing block. The difference is that you are not reassigning state in the first example, you're mutating it.
In the second example, Python knows that you are going to assign a value to that reference later in the function, so it is treated as a name from the local nested namespace, rather than the outer tester namespace.
You can use the nonlocal keyword to circumvent this and use the other reference from the other namespace:
def tester(start):
    def nested(label):
        nonlocal state
        print(label, state)
        state += 1
        print("after:", state)
    state = start
    return nested
From what I understand, because nested is defined inside tester, it can read any name that belongs to tester's scope. This is lexical (enclosing-scope) name lookup rather than inheritance, and it is why Python does not produce an error in the first version.
As for replacing state[0] with state: state += 1 is an assignment to the name state, so Python treats state as local to nested, and the earlier print(label, state) then reads a local that has no value yet. state[0] += 1 works because it only modifies an element of the list that state refers to; the name state itself is never reassigned inside nested.
It's a function of 1) how Python variable assignment actually just creates aliases (pointers) to underlying values in memory, and the difference between how mutable and immutable types are treated; and 2) some Python "magic" related to closures. The crux of your question is really the first point.
To address that, take for example the following:
a = 3
b = 3
In CPython, both a and b point to the same underlying object (small integers are cached and reused), so
assert hex(id(a)) == hex(id(b))
passes. However, then setting b = 4 will cause b to point to a different object in memory: an int is immutable, so it cannot be changed in place, and the name is rebound instead.
However, a list is mutable (modifiable "in place"). For example, after c = [2], the name c refers to the same object, at the same memory location, before and after an operation like c[0] = 3.
There are a lot of implications of this very basic explanation that take some time to interpret. For example, variables can’t “point to” other variables, but rather still point to underlying objects.
As a result, lists can exhibit "weird" behavior (another common, related confusion revolves around setting a default parameter value as a list that's then modified in the function), but can also be taken advantage of in the way your example shows.
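Both points, and the default-parameter confusion mentioned above, can be reproduced in a few lines. A minimal sketch (remember and seen are hypothetical names):

```python
c = [2]
before = id(c)
c[0] = 3                  # in-place change
print(id(c) == before)    # True: still the same list object

n = 3
alias = n
n += 1                    # ints are immutable, so n is rebound to a new object
print(alias, n)           # 3 4


def remember(item, seen=[]):  # the default list is created once, at def time
    seen.append(item)
    return seen


print(remember('a'))  # ['a']
print(remember('b'))  # ['a', 'b']
```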
I'm new to coding and I'm a little confused. How/why can a for loop use a variable that isn't defined yet?
For example:
demond = {'green':'grass', 'red':'fire', 'yellow':'sun'}
for i in demond:
    print(i)
Output:
green
yellow
red
In Python, you don't need to declare variables. In C/C++/Java etc. you have to declare them first and then use them.
In Python, a variable is a name bound to an object; the interpreter allocates memory for the object based on its type. Python variables do not need an explicit declaration: the name comes into existence automatically when you assign a value to it.
There are two things that you need to keep in mind:
Because Python is a dynamically typed language, you do not need to explicitly declare any variable to be of a certain object type. This is something you already know, and it is why you can assign things without having to state what type they will be.
For loop constructs do a lot of things in the background that you don't explicitly see. This means that although it doesn't LOOK like anything is being defined, it is.
With that in mind, I don't really want to explain how for loops work, because there are already answers available for that, but the main point is that a for loop in Python is the same as the following pseudo code.
# set up your iterable
demond = SOME_ITERABLE_OBJECT  # this can be a list, string, dict, etc.
# this
for i in demond:
    do_something(i)
# is the same as this
i = demond[0]  # the first item in demond
do_something(i)
i = demond[1]  # the second item in demond
do_something(i)
i = demond[2]
do_something(i)
...
i = demond[n]  # the last item in demond
do_something(i)
Now your follow-up question may be this: what makes it so that, in your code, for i in demond sets i equal to its keys? Well, that is just part of the design of Python, specifically how dicts work. What the for loop is ACTUALLY doing is asking the iterable for an iterator and calling next() on it until the iterator is exhausted. Each iterable can produce a different result in a for loop (see the first link).
NOTE:
In my code example, I am setting i = demond[some_index]. This looks like a list index grab, but it is really only meant to show that the loop is iterating through the items in some sort of order. IT IS PSEUDO CODE. Just keep that in mind.
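For comparison with the pseudo code, here is a runnable sketch of roughly what the for loop does, written with the built-in iter() and next() (seen and iterator are names added for this example):

```python
demond = {'green': 'grass', 'red': 'fire', 'yellow': 'sun'}

seen = []
iterator = iter(demond)       # iterating a dict yields its keys
while True:
    try:
        i = next(iterator)    # this is where i gets bound, once per pass
    except StopIteration:     # raised when the iterator is exhausted
        break
    seen.append(i)

print(seen)  # ['green', 'red', 'yellow'] on Python 3.7+, where dicts keep insertion order
```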
How would I go about making reference to an element from a list inside that list? For example,
settings = ["Exposure", "0", random_time(settings[0])]
Where the third element makes reference to the first. I could verbosely state "Exposure" but I am trying to set it up so that even if the first element is changed the third changes with it.
Edit:
I think maybe my question wasn't clear enough. There will be more than one setting each using the generic function "random_time", hence the need to pass the keyword of the setting. The reference to the first element is so I only have to make modifications to the code in one place. This value will not change once the script is running.
I will try and use a list of keywords that the settings list makes reference to.
The right-hand expression is evaluated first, so when you evaluate
["Exposure", "0", random_time(settings[0])]
the variable settings is not defined yet.
A little example:
a = 1 + 2
First 1 + 2 is evaluated and the result is 3, after it's evaluated, then the assignment is done:
a = 3
One way you could handle this is storing the "changing" string to a variable:
var1 = "Exposure"
settings = [var1, "0", random_time(var1)]
This will work in the list definition, but if you change var1 after creating the list settings, the third element won't change with it. If you want that to happen, you can try implementing a class Settings, which will be a lot more flexible.
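One possible shape for such a class is sketched below. It is only an illustration: the body of random_time is not shown in the question, so the version here is a made-up stand-in, and Setting, time and as_list are hypothetical names.

```python
import random


def random_time(keyword):
    # Hypothetical stand-in for the asker's function, whose body is not shown.
    return f"{keyword}:{random.randint(0, 59)}"


class Setting:
    """One setting whose third field is derived from its name on access."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    @property
    def time(self):
        # Recomputed on every access, so it always follows the current name.
        return random_time(self.name)

    def as_list(self):
        return [self.name, self.value, self.time]


exposure = Setting("Exposure", "0")
print(exposure.as_list()[0])               # Exposure
exposure.name = "Gain"
print(exposure.time.startswith("Gain:"))   # True
```

Because time is a property, the derived value is never stored, which avoids the stale third element at the cost of calling random_time on every access.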
AFAIK you can't. This is common to most programming languages because when you're running your function there the item hasn't been completely created yet.
You can't directly.
You could have both refer to something else, though, and use an attribute of that.
class SettingObj:
    name = "Exposure"

settings = [SettingObj, "0", random_time(SettingObj)]
Now, change the way you work with your settings list so that you look for your name attribute for 1st and 3rd items on the list.
As others have told you, the syntax you've chosen will try to reference settings before it is created, and therefore it will not work (unless settings already exists because another object was assigned to it on a previous line).
More importantly, in Python, assigning a string to two places will not make it so that changing it in one place will change it in the other. This applies to all forms of binding, including variable names, lists and object attributes.
Strings are immutable in Python -- they cannot be changed, only rebound. And rebinding only affects a single name (or list position, etc.) at a time. This is different from, say, C, where two names can contain pointers that reference the same spot in memory, and you can edit that spot in memory and affect both places.
If you really need to do this, you can wrap the string in an object (custom class, presumably). You could even make the object's interface look like a string in all respects, except that it's not a string primitive but an object with an attribute (say contents) that's bound to a string. Then when you want to change the string, you rebind the object's attribute (that is, obj.contents or whatever). Since you are not reassigning the names bound to the object itself, but only a name inside the object, it will change in both places.
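A bare-bones version of such a wrapper might look like the following. Box is a hypothetical class name, and the sketch leaves out the string-like interface mentioned above:

```python
class Box:
    """Hypothetical wrapper holding a string in its contents attribute."""

    def __init__(self, contents):
        self.contents = contents


name = Box("Exposure")
settings = [name, "0", name]   # two positions refer to the same Box
name.contents = "Gain"         # rebind the attribute inside the object
print(settings[0].contents, settings[2].contents)  # Gain Gain

plain = "Exposure"
pair = [plain, plain]
plain = "Gain"                 # rebinding the name does not touch the list
print(pair)                    # ['Exposure', 'Exposure']
```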
In this particular case you don't just have the same string in both places but you actually have a string in the first position but the result of a function performed on the string in the third position. So even if you use an object wrapper, it won't work the way you seem to want it to, because the function needs to be re-run every time.
There are ways to design your program so that this is not a problem, but without knowing more about your ultimate goal I can't say what they are.
I have a nested dictionary containing a bunch of data on a number of different objects (where I mean object in the non-programming sense of the word). The format of the dictionary is allData[i][someDataType], where i is a number designation of the object that I have data on, and someDataType is a specific data array associated with the object in question.
Now, I have a function that I have defined that requires a particular data array for a calculation to be performed for each object. The data array is called cleanFDF. So I feed this to my function, along with a bunch of other things it requires to work. I call it like this:
rm.analyze4complexity(allData[i]['cleanFDF'], other data, other data, other data)
Inside the function itself, I straight away re-assign the cleanFDF data to another variable name, namely clFDF. I.e. The end result is:
clFDF = allData[i]['cleanFDF']
I then have to zero out all of the data that lies below a certain threshold, as such:
clFDF[ clFDF < threshold ] = 0
OK - the function works as it is supposed to. But now when I try to plot the original cleanFDF data back in the main script, the entries that got zeroed out in clFDF are also zeroed out in allData[i]['cleanFDF']. WTF? Obviously something is happening here that I do not understand.
To make matters even weirder (from my point of view), I've tried to do a bodgy kludge to get around this by 'saving' the array to another variable before calling the function. I.e. I do
saveFDF = allData[i]['cleanFDF']
then run the function, then update the cleanFDF entry with the 'saved' data:
allData[i].update( {'cleanFDF':saveFDF} )
but somehow, simply performing clFDF[ clFDF < threshold ] = 0 within the function modifies clFDF, saveFDF and allData[i]['cleanFDF'] in the main friggin' script, zeroing out all the entries at the same array indexes! It is like they are all associated global variables somehow, but I've made no such declarations anywhere...
I am a hopeless Python newbie, so no doubt I'm not understanding something about how it works. Any help would be greatly appreciated!
You are passing the value at allData[i]['cleanFDF'] by reference (decent explanation at https://stackoverflow.com/a/430958/337678). Any changes made to it will be made to the object it refers to, which is still the same object as the original, just assigned to a different variable.
Making a deep copy of the data will likely fix your issue (the deepcopy function in Python's copy module should do the trick ;)).
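The difference between a second name and a real copy can be shown with plain lists. In this sketch the data values and the helper zero_below are made up, and a list stands in for the asker's array:

```python
import copy

all_data = {0: {'cleanFDF': [0.2, 5.0, 0.1, 7.5]}}  # made-up stand-in data


def zero_below(values, threshold):
    # Mutates the list it is given, like clFDF[clFDF < threshold] = 0 does.
    for k, v in enumerate(values):
        if v < threshold:
            values[k] = 0
    return values


save = all_data[0]['cleanFDF']                    # another name, not a copy
backup = copy.deepcopy(all_data[0]['cleanFDF'])   # an independent object

zero_below(all_data[0]['cleanFDF'], 1.0)
print(save)    # [0, 5.0, 0, 7.5]
print(backup)  # [0.2, 5.0, 0.1, 7.5]
```

If the data is a NumPy array, as the boolean-mask indexing in the question suggests, calling the array's copy() method before modifying it gives an independent array as well.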
Everything is a reference in Python.
def function(y):
    y.append('yes')
    return y

example = list()
function(example)
print(example)
It prints ['yes'] even though I am not directly changing the variable 'example'.
See Why does list.append evaluate to false?, Python append() vs. + operator on lists, why do these give different results?, Python lists append return value.